Session 07-06 - Contingency Tables
Section 07: Probability & Statistics
Entry Quiz - 10 Minutes
Quick Review from Session 07-05
State Bayes’ Theorem.
A test has sensitivity 80% and specificity 90%. If prevalence is 10%, calculate PPV.
What’s the difference between sensitivity and PPV?
If PPV is low but NPV is high, what does this tell us about the test?
Homework Discussion - 12 Minutes
Your Questions from Session 07-05
Bring Bayes questions and screening interpretation issues.
- Sensitivity vs specificity vs PPV/NPV
- Formula method vs contingency table method
- Interpreting low-prevalence results
Learning Objectives
What You’ll Master Today
- Construct contingency tables from word problems
- Complete tables with missing values
- Read probabilities from tables: marginal, joint, conditional
- Test independence using table values
- Connect tables to Bayes’ theorem
. . .
Contingency tables are a key exam format - expect at least one problem!
Part A: Table Structure
Discussion Prompt
Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?
Two-Way Contingency Table
A contingency table shows the joint distribution of two categorical variables.
. . .
| | \(B\) | \(\bar{B}\) | Total |
|---|---|---|---|
| \(A\) | \(n_{AB}\) | \(n_{A\bar{B}}\) | \(n_A\) |
| \(\bar{A}\) | \(n_{\bar{A}B}\) | \(n_{\bar{A}\bar{B}}\) | \(n_{\bar{A}}\) |
| Total | \(n_B\) | \(n_{\bar{B}}\) | \(n\) |
. . .
- Cells: Joint frequencies (both conditions)
- Row totals: Marginal frequencies for A
- Column totals: Marginal frequencies for B
Reading Probabilities from Tables
| Type | Formula | Location in Table |
|---|---|---|
| Marginal | \(P(A)\) | Row total / Grand total |
| Joint | \(P(A \cap B)\) | Cell / Grand total |
| Conditional | \(P(A\|B)\) | Cell / Column total |
Example: Market Research
Survey of 500 customers about product preference and age:
| | Age < 30 | Age ≥ 30 | Total |
|---|---|---|---|
| Prefers A | 120 | 80 | 200 |
| Prefers B | 130 | 170 | 300 |
| Total | 250 | 250 | 500 |
. . .
Calculate:
- \(P(\text{Prefers A}) = \frac{200}{500} = 0.40\)
- \(P(\text{Age} < 30 \cap \text{Prefers A}) = \frac{120}{500} = 0.24\)
- \(P(\text{Prefers A} | \text{Age} < 30) = \frac{120}{250} = 0.48\)
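The three readings above can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and the helper names `row_total` and `col_total` are illustrative choices, not part of the course material.

```python
# Counts from the market research table (rows: preference, columns: age group).
table = {
    ("Prefers A", "Age < 30"): 120,
    ("Prefers A", "Age >= 30"): 80,
    ("Prefers B", "Age < 30"): 130,
    ("Prefers B", "Age >= 30"): 170,
}

grand_total = sum(table.values())


def row_total(row):
    """Sum of all cells in the given row (hypothetical helper)."""
    return sum(n for (r, _), n in table.items() if r == row)


def col_total(col):
    """Sum of all cells in the given column (hypothetical helper)."""
    return sum(n for (_, c), n in table.items() if c == col)


# Marginal: row total / grand total
p_marginal = row_total("Prefers A") / grand_total
# Joint: cell / grand total
p_joint = table[("Prefers A", "Age < 30")] / grand_total
# Conditional: cell / total of the condition (here the "Age < 30" column)
p_conditional = table[("Prefers A", "Age < 30")] / col_total("Age < 30")

print(p_marginal, p_joint, p_conditional)
```

Only the denominator changes between the joint and the conditional probability; the numerator is the same cell.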
Part B: Constructing Tables from Word Problems
Discussion Prompt
Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?
Strategy for Word Problems
- Identify the two variables and their categories
- Create empty table with row/column labels
- Fill in given values (often percentages → convert to counts)
- Use relationships to complete missing cells
- Verify: Every row and column must sum to its total, and the row totals and the column totals must each sum to the grand total
Example: Building a Table
In a city of 10,000 residents:
- 40% are employed
- 70% are adults (age ≥ 18)
- 35% are employed adults
. . .
Construct the contingency table.
. . .
| | Adult | Minor | Total |
|---|---|---|---|
| Employed | 3500 | ? | 4000 |
| Not Employed | ? | ? | 6000 |
| Total | 7000 | 3000 | 10000 |
Completing the Table
| | Adult | Minor | Total |
|---|---|---|---|
| Employed | 3500 | 500 | 4000 |
| Not Employed | 3500 | 2500 | 6000 |
| Total | 7000 | 3000 | 10000 |
. . .
Now we can answer questions like:
- \(P(\text{Employed}|\text{Minor}) = \frac{500}{3000} = \frac{1}{6} \approx 0.167\)
- \(P(\text{Adult}|\text{Employed}) = \frac{3500}{4000} = 0.875\)
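The completion step is pure subtraction, which the following sketch makes explicit. The function name `complete_2x2` and its argument order are assumptions made for illustration; percentages are passed as whole numbers so the counts stay exact integers.

```python
def complete_2x2(n, row1_total, col1_total, cell11):
    """Fill a 2x2 table by subtraction, given the grand total,
    the first row total, the first column total and the top-left cell."""
    cell12 = row1_total - cell11          # rest of row 1
    cell21 = col1_total - cell11          # rest of column 1
    row2_total = n - row1_total
    cell22 = row2_total - cell21          # rest of row 2
    if min(cell11, cell12, cell21, cell22) < 0:
        raise ValueError("inconsistent inputs: a cell would be negative")
    return [[cell11, cell12], [cell21, cell22]]


n = 10_000
employed = n * 40 // 100          # 4000
adults = n * 70 // 100            # 7000
employed_adults = n * 35 // 100   # 3500

cells = complete_2x2(n, employed, adults, employed_adults)
print(cells)  # [[3500, 500], [3500, 2500]]

# Conditional probabilities: divide by the total of the condition.
minors = n - adults
p_employed_given_minor = cells[0][1] / minors
p_adult_given_employed = cells[0][0] / employed
```

The negative-cell check is a cheap way to catch givens that contradict each other, for example a joint percentage larger than one of its marginals.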
Exam-Style Problem
A company surveyed 200 customers:
- 60% are satisfied with the product
- 45% are repeat customers
- Of the satisfied customers, 60% are repeat customers
. . .
Build the table:
Step 1: Fill in what we know directly
| | Repeat | New | Total |
|---|---|---|---|
| Satisfied | ? | ? | 120 |
| Not Satisfied | ? | ? | 80 |
| Total | 90 | 110 | 200 |
Solution Continued
Step 2: Use “Of satisfied, 60% are repeat”
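\(P(\text{Repeat}|\text{Satisfied}) = 0.60\), so \(120 \times 0.60 = 72\) customers are both satisfied and repeat customers. The remaining cells follow by subtraction: \(120 - 72 = 48\), \(90 - 72 = 18\), \(80 - 18 = 62\).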
| | Repeat | New | Total |
|---|---|---|---|
| Satisfied | 72 | 48 | 120 |
| Not Satisfied | 18 | 62 | 80 |
| Total | 90 | 110 | 200 |
. . .
Verify: All rows and columns sum correctly ✓
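The key step in this problem is that the 60% in "of the satisfied customers" applies to the satisfied row (120), not to all 200 customers. A short sketch of the arithmetic, with variable names chosen only for illustration:

```python
n = 200
satisfied = n * 60 // 100    # 120, marginal for "Satisfied"
repeat = n * 45 // 100       # 90, marginal for "Repeat"

# Conditional information: 60% OF THE SATISFIED customers are repeat customers,
# so the base is the satisfied row total, not the grand total.
satisfied_repeat = satisfied * 60 // 100                  # 72

# Remaining cells by subtraction.
satisfied_new = satisfied - satisfied_repeat              # 48
unsatisfied_repeat = repeat - satisfied_repeat            # 18
unsatisfied_new = (n - satisfied) - unsatisfied_repeat    # 62

# Verification: the four inner cells must add up to the grand total.
inner_sum = satisfied_repeat + satisfied_new + unsatisfied_repeat + unsatisfied_new
print(inner_sum == n)  # True
```

Multiplying 200 by 0.60 instead would give 120 for the joint cell, which is a typical denominator mistake.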
Break - 10 Minutes
Part C: Independence Testing
Discussion Prompt
Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?
When Are Variables Independent?
Variables A and B are independent if and only if for all cells:
\[P(A \cap B) = P(A) \cdot P(B)\]
Or equivalently: \(\frac{\text{Cell count}}{\text{Total}} = \frac{\text{Row total}}{\text{Total}} \times \frac{\text{Column total}}{\text{Total}}\)
Testing Independence: Example
From our customer survey:
| | Repeat | New | Total |
|---|---|---|---|
| Satisfied | 72 | 48 | 120 |
| Not Satisfied | 18 | 62 | 80 |
| Total | 90 | 110 | 200 |
. . .
Test independence for (Satisfied, Repeat):
Question: If the two variables were independent, what cell count would you expect for (Satisfied, Repeat)?
- Expected if independent: \(\frac{120}{200} \times \frac{90}{200} \times 200 = 0.60 \times 0.45 \times 200 = 54\)
- Observed: 72
. . .
\(72 \neq 54\), so satisfaction and repeat status are NOT independent. One cell that violates the product rule is enough to reject independence.
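The expected counts for all four cells can be computed the same way. The sketch below uses plain nested lists; the exact comparison of observed and expected counts follows the rule stated in this session, and the variable names are illustrative.

```python
# Observed counts: rows = (Satisfied, Not Satisfied), columns = (Repeat, New).
observed = [[72, 48], [18, 62]]

row_totals = [sum(row) for row in observed]          # [120, 80]
col_totals = [sum(col) for col in zip(*observed)]    # [90, 110]
n = sum(row_totals)                                  # 200

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)  # [[54.0, 66.0], [36.0, 44.0]]

# Independent (in the sense used here) only if every cell matches.
independent = all(
    observed[i][j] == expected[i][j]
    for i in range(2)
    for j in range(2)
)
print(independent)  # False
```

The expected table keeps the same row and column totals as the observed table, which is a useful sanity check.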
Interpretation
The data suggests:
- Satisfied customers are MORE likely to be repeat customers
- \(P(\text{Repeat}|\text{Satisfied}) = \frac{72}{120} = 0.60\)
- \(P(\text{Repeat}|\text{Not Satisfied}) = \frac{18}{80} = 0.225\)
. . .
Satisfied customers are about 2.7 times as likely to be repeat customers as unsatisfied customers (\(0.60 / 0.225 \approx 2.67\))!
Part D: Connecting to Bayes’ Theorem
Discussion Prompt
Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?
Tables and Bayes
The contingency table method from Session 07-05 is exactly this technique: build the table, then read the conditional probability from it!
. . .
Medical testing example:
| | Disease | No Disease | Total |
|---|---|---|---|
| Test + | TP | FP | All + |
| Test − | FN | TN | All − |
| Total | Diseased | Healthy | Population |
. . .
- PPV = \(P(D|+) = \frac{\text{TP}}{\text{All +}}\)
- This is Bayes’ theorem applied to the table!
For conditional probabilities, the denominator is the condition.
- \(P(A|B)\) -> divide by “all \(B\)”
- \(P(B|A)\) -> divide by “all \(A\)”
Always say the condition in words before dividing.
Converting Between Approaches
Given: Sensitivity = 90%, Specificity = 95%, Prevalence = 2%
For 10,000 people:
| | Disease (200) | No Disease (9800) | Total |
|---|---|---|---|
| Test + | 180 | 490 | 670 |
| Test − | 20 | 9310 | 9330 |
. . .
Direct calculations:
- PPV = \(\frac{180}{670} \approx 0.269\)
- NPV = \(\frac{9310}{9330} \approx 0.998\)
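The conversion from rates to counts can be sketched as a small function. The name `screening_table` and the choice to pass percentages as whole numbers are assumptions for illustration; the integer division presumes the counts come out whole, as they do for a population of 10,000 here.

```python
def screening_table(population, prevalence_pct, sensitivity_pct, specificity_pct):
    """Turn prevalence, sensitivity and specificity (whole-number percentages)
    into the four cell counts of a screening table."""
    diseased = population * prevalence_pct // 100
    healthy = population - diseased
    tp = diseased * sensitivity_pct // 100   # diseased and test positive
    fn = diseased - tp                       # diseased and test negative
    tn = healthy * specificity_pct // 100    # healthy and test negative
    fp = healthy - tn                        # healthy and test positive
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


t = screening_table(10_000, prevalence_pct=2, sensitivity_pct=90, specificity_pct=95)
print(t)  # {'TP': 180, 'FP': 490, 'FN': 20, 'TN': 9310}

# PPV: condition is "test positive", so divide by all positives.
ppv = t["TP"] / (t["TP"] + t["FP"])
# NPV: condition is "test negative", so divide by all negatives.
npv = t["TN"] / (t["TN"] + t["FN"])
print(round(ppv, 3), round(npv, 3))  # 0.269 0.998
```

Sensitivity and specificity are applied column by column (to the diseased and to the healthy), while PPV and NPV are read row by row.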
Algorithm Card: Build and Read Any 2x2 Table
- Define event symbols first (for example, \(A\), \(B\)).
- Fill totals and direct givens.
- Use row/column subtraction to complete missing cells.
- Verify all four internal cells sum to grand total.
- Read probabilities with the correct denominator:
  - joint -> cell over grand total
  - marginal -> row or column total over grand total
  - conditional -> cell over the total of the condition
Quick Check - 6 Minutes
Reading Probabilities from Tables
Work individually
Given table:
| | \(B\) | \(B'\) | Total |
|---|---|---|---|
| \(A\) | 24 | 36 | 60 |
| \(A'\) | 16 | 24 | 40 |
| Total | 40 | 60 | 100 |
- Compute \(P(A \cap B)\).
- Compute \(P(A|B)\).
- Compute \(P(B|A)\).
- Are \(A\) and \(B\) independent?
Guided Practice - 20 Minutes
Practice Problem 1
A survey of 400 employees found:
- 55% work full-time
- 40% have a graduate degree
- 25% work full-time AND have a graduate degree
Tasks:

a) Construct the contingency table
b) Find \(P(\text{Grad degree}|\text{Full-time})\)
c) Find \(P(\text{Full-time}|\text{Grad degree})\)
d) Are full-time status and graduate degree independent?
Practice Problem 2 (2025 Exam Style)
A company produces items at two factories. Quality control data:
- Factory A produces 3000 items, 5% defective
- Factory B produces 2000 items, 8% defective
Tasks:

a) Construct a contingency table
b) An item is randomly selected and found defective. What’s the probability it came from Factory A?
c) What percentage of all items are defective?
Practice Problem 3 (Business Interpretation)
In a retail database, let \(K\) = customer bought this month, and \(T\) = customer uses payback card.
Given:
| | \(T\) | \(T'\) | Total |
|---|---|---|---|
| \(K\) | 180 | 120 | 300 |
| \(K'\) | 90 | 210 | 300 |
| Total | 270 | 330 | 600 |
- State in words what each event means: \(T'\), \(K' \cap T\), \(T \setminus K\).
- Compute \(P(T \setminus K)\).
- Write in words what \(P(T|K')\) means and compute it.
Chained Exam Mini-Problem - 8 Minutes
Work individually, then compare
From a table: total \(n=500\), with \(n_{AB}=90\), row total \(n_A=200\), column total \(n_B=150\).
- Compute \(P(A\cap B)\).
- Use the result of the first task and the totals to compute \(P(A|B)\).
- Compare \(P(A\cap B)\) with \(P(A)P(B)\) and assess independence.
Coffee Break - 10 Minutes
Collaborative Problem-Solving - 20 Minutes
Group Challenge: Campaign Conversion Table
Think individually (2 min), pair (3 min), then work in groups of 3-4 and share
A campaign contacts 800 customers.
- 320 clicked the email (\(C\))
- 180 purchased (\(P\))
- 110 both clicked and purchased
- Build the full 2x2 contingency table.
- Compute \(P(P|C)\) and \(P(P|C')\).
- Test whether clicking and purchasing are independent.
- Give one practical marketing interpretation.
Confidence Check - 2 Minutes
Rate your confidence for today’s goals on a 1-5 scale (1 = not confident, 5 = exam-ready):
- Constructing complete contingency tables
- Reading marginal/joint/conditional probabilities
- Choosing correct denominators in conditional probabilities
Final Assessment - 5 Minutes
Exit Ticket
Work individually
- In a table, where do you read \(P(A \cap B)\) from?
- If cell \((A,B)=36\) and column total for \(B\) is 120, what is \(P(A|B)\)?
- In words, what does \(K' \cap T\) mean in a customer table?
Wrap-Up & Key Takeaways
Today’s Essential Concepts
- Table structure: Cells (joint), margins (marginal)
- Reading probabilities: Marginal, joint, conditional
- Building tables: Use given percentages and relationships
- Independence test: Expected = row% × col% × total
- Connection to Bayes: Tables provide visual Bayes calculations
Next Session Preview
Coming Up: Binomial Distribution
- Discrete probability distributions
- Binomial formula: \(P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}\)
- “Exactly k”, “at most k”, “at least k”
- Expected value and variance
. . .
Complete Tasks 07-06 - practice building and reading contingency tables!