Section 07: Probability & Statistics
State Bayes’ Theorem.
A test has sensitivity 80% and specificity 90%. If prevalence is 10%, calculate PPV.
What’s the difference between sensitivity and PPV?
If PPV is low but NPV is high, what does this tell us about the test?
Bring Bayes questions and screening interpretation issues.
Contingency tables are a key exam format - expect at least one problem!
Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?
A contingency table shows the joint distribution of two categorical variables.
| | \(B\) | \(\bar{B}\) | Total |
|---|---|---|---|
| \(A\) | \(n_{AB}\) | \(n_{A\bar{B}}\) | \(n_A\) |
| \(\bar{A}\) | \(n_{\bar{A}B}\) | \(n_{\bar{A}\bar{B}}\) | \(n_{\bar{A}}\) |
| Total | \(n_B\) | \(n_{\bar{B}}\) | \(n\) |
| Type | Formula | Location in Table |
|---|---|---|
| Marginal | \(P(A)\) | Row total / Grand total |
| Joint | \(P(A \cap B)\) | Cell / Grand total |
| Conditional | \(P(A\|B)\) | Cell / Column total |
Survey of 500 customers about product preference and age:
| | Age < 30 | Age ≥ 30 | Total |
|---|---|---|---|
| Prefers A | 120 | 80 | 200 |
| Prefers B | 130 | 170 | 300 |
| Total | 250 | 250 | 500 |
Calculate:
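The three probability types from the table above can be read off the survey counts directly. The following is a minimal Python sketch; the variable names and the choice of events (prefers A, age under 30) are illustrative assumptions, not necessarily the quantities asked for on the original slide.

```python
# Counts from the survey table (rows: preference, columns: age group).
counts = {
    ("A", "<30"): 120, ("A", ">=30"): 80,
    ("B", "<30"): 130, ("B", ">=30"): 170,
}
n = sum(counts.values())  # grand total: 500

# Marginal: row total / grand total
row_total_A = counts[("A", "<30")] + counts[("A", ">=30")]
p_A = row_total_A / n

# Joint: cell / grand total
p_A_and_young = counts[("A", "<30")] / n

# Conditional: cell / column total (the condition is the denominator)
col_total_young = counts[("A", "<30")] + counts[("B", "<30")]
p_A_given_young = counts[("A", "<30")] / col_total_young

print(p_A, p_A_and_young, p_A_given_young)  # 0.4 0.24 0.48
```

Note that the joint and conditional probabilities share the same numerator (the cell count 120); only the denominator changes.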
Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?
Step-by-Step Approach
In a city of 10,000 residents:
Construct the contingency table.
| | Adult | Minor | Total |
|---|---|---|---|
| Employed | 3500 | ? | 4000 |
| Not Employed | ? | ? | 6000 |
| Total | 7000 | 3000 | 10000 |
| | Adult | Minor | Total |
|---|---|---|---|
| Employed | 3500 | 500 | 4000 |
| Not Employed | 3500 | 2500 | 6000 |
| Total | 7000 | 3000 | 10000 |
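Filling in the missing cells is plain subtraction from the margins. As a sketch, assuming one known cell plus all row and column totals, a hypothetical helper `complete_2x2` (not part of the course material) could look like this:

```python
def complete_2x2(cell_11, row_totals, col_totals):
    """Fill a 2x2 table from its top-left cell and the margins."""
    r1, r2 = row_totals
    c1, c2 = col_totals
    if r1 + r2 != c1 + c2:
        raise ValueError("row and column totals must give the same grand total")
    cell_12 = r1 - cell_11   # rest of the first row
    cell_21 = c1 - cell_11   # rest of the first column
    cell_22 = r2 - cell_21   # rest of the second row
    table = [[cell_11, cell_12], [cell_21, cell_22]]
    if min(min(row) for row in table) < 0:
        raise ValueError("inputs are inconsistent: a cell would be negative")
    return table


# City example: 3500 employed adults, rows (Employed, Not Employed),
# columns (Adult, Minor).
table = complete_2x2(3500, row_totals=(4000, 6000), col_totals=(7000, 3000))
print(table)  # [[3500, 500], [3500, 2500]]
```

The consistency checks mirror the manual habit of confirming that every row and column still sums to its total.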
Now we can answer questions like:
A company surveyed 200 customers:
Build the table:
Step 1: Fill in what we know directly
| | Repeat | New | Total |
|---|---|---|---|
| Satisfied | ? | ? | 120 |
| Not Satisfied | ? | ? | 80 |
| Total | 90 | 110 | 200 |
Step 2: Use “Of satisfied, 60% are repeat”
\(P(\text{Repeat}|\text{Satisfied}) = 0.60\), so \(120 \times 0.60 = 72\) customers are both repeat AND satisfied. The remaining cells follow by subtraction from the row and column totals.
| | Repeat | New | Total |
|---|---|---|---|
| Satisfied | 72 | 48 | 120 |
| Not Satisfied | 18 | 62 | 80 |
| Total | 90 | 110 | 200 |
Verify: All rows and columns sum correctly ✓
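The two steps and the final check can be sketched in Python as follows. The variable names are illustrative assumptions; the numbers are the ones from the survey above.

```python
# Step 1: margins known directly.
n_satisfied, n_not_satisfied = 120, 80   # row totals
n_repeat, n_new = 90, 110                # column totals

# Step 2: a conditional probability times the size of the conditioning
# group gives the cell count.
p_repeat_given_satisfied = 0.60
satisfied_repeat = round(p_repeat_given_satisfied * n_satisfied)  # 72

# Remaining cells by subtraction from the margins.
satisfied_new = n_satisfied - satisfied_repeat                    # 48
not_satisfied_repeat = n_repeat - satisfied_repeat                # 18
not_satisfied_new = n_not_satisfied - not_satisfied_repeat        # 62

# Verify: the column that was not used in the subtraction must also sum
# correctly, and all cells must add up to the grand total.
assert satisfied_new + not_satisfied_new == n_new
assert (satisfied_repeat + satisfied_new
        + not_satisfied_repeat + not_satisfied_new) == 200

print(satisfied_repeat, satisfied_new, not_satisfied_repeat, not_satisfied_new)
# 72 48 18 62
```

The key point is that 60% applies to the 120 satisfied customers (the condition), not to the 200 customers overall.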
Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?
Independence in Tables
Variables A and B are independent if and only if for all cells:
\[P(A \cap B) = P(A) \cdot P(B)\]
Or equivalently: \(\frac{\text{Cell count}}{\text{Total}} = \frac{\text{Row total}}{\text{Total}} \times \frac{\text{Column total}}{\text{Total}}\)
From our customer survey:
| | Repeat | New | Total |
|---|---|---|---|
| Satisfied | 72 | 48 | 120 |
| Not Satisfied | 18 | 62 | 80 |
| Total | 90 | 110 | 200 |
Test independence for (Satisfied, Repeat):
Question: If variables were independent, what cell count would you expect for (Satisfied, Repeat)?
Expected count under independence: \(\frac{120 \times 90}{200} = 54\). Since \(72 \neq 54\), satisfaction and repeat status are NOT independent.
The data suggests:
Satisfied customers are about 2.7 times as likely as unsatisfied customers to be repeat customers (\(72/120 = 0.60\) vs. \(18/80 = 0.225\))!
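The independence check and the comparison of conditional probabilities can be reproduced with a short Python sketch. It applies the cell-by-cell rule from this section (observed count versus row total times column total divided by grand total); the variable names are illustrative assumptions.

```python
import math

# Observed counts (rows: Satisfied, Not Satisfied; columns: Repeat, New).
observed = [[72, 48], [18, 62]]

row_totals = [sum(row) for row in observed]        # [120, 80]
col_totals = [sum(col) for col in zip(*observed)]  # [90, 110]
n = sum(row_totals)                                # 200

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Independence in the table requires observed == expected in every cell.
independent = all(
    math.isclose(o, e)
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Compare the conditional probabilities of being a repeat customer.
p_repeat_given_satisfied = observed[0][0] / row_totals[0]      # 72 / 120
p_repeat_given_not_satisfied = observed[1][0] / row_totals[1]  # 18 / 80
ratio = p_repeat_given_satisfied / p_repeat_given_not_satisfied

print(expected[0][0], independent, round(ratio, 2))  # 54.0 False 2.67
```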
Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?
The contingency table method from Session 07-05 is actually using this technique!
Medical testing example:
| | Disease | No Disease | Total |
|---|---|---|---|
| Test + | TP | FP | All + |
| Test − | FN | TN | All − |
| Total | Diseased | Healthy | Population |
For conditional probabilities, the denominator is the condition.
Always say the condition in words before dividing.
Given: Sensitivity = 90%, Specificity = 95%, Prevalence = 2%
For 10,000 people:
| | Disease (200) | No Disease (9800) | Total |
|---|---|---|---|
| Test + | 180 | 490 | 670 |
| Test − | 20 | 9310 | 9330 |
Direct calculations:

- PPV = \(\frac{180}{670} \approx 0.269\)
- NPV = \(\frac{9310}{9330} \approx 0.998\)
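The same natural-frequency approach can be sketched in Python. The function name `screening_table` and the default population of 10,000 are assumptions chosen to match the worked example; any population size gives the same PPV and NPV.

```python
def screening_table(sensitivity, specificity, prevalence, population=10_000):
    """Return the four cell counts of a diagnostic-test table."""
    diseased = prevalence * population
    healthy = population - diseased
    tp = sensitivity * diseased   # diseased and test positive
    fn = diseased - tp            # diseased but test negative
    tn = specificity * healthy    # healthy and test negative
    fp = healthy - tn             # healthy but test positive
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


cells = screening_table(sensitivity=0.90, specificity=0.95, prevalence=0.02)

# PPV conditions on a positive test: denominator is the "Test +" row total.
ppv = cells["TP"] / (cells["TP"] + cells["FP"])
# NPV conditions on a negative test: denominator is the "Test -" row total.
npv = cells["TN"] / (cells["TN"] + cells["FN"])

print(round(ppv, 3), round(npv, 3))  # 0.269 0.998
```

Sensitivity and specificity divide by column totals (true disease status), while PPV and NPV divide by row totals (test result), which is why a test with 90% sensitivity can still have a PPV below 30% when prevalence is low.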
Work individually
Given table:
| | \(B\) | \(B'\) | Total |
|---|---|---|---|
| \(A\) | 24 | 36 | 60 |
| \(A'\) | 16 | 24 | 40 |
| Total | 40 | 60 | 100 |
A survey of 400 employees found:
Tasks:

- a) Construct the contingency table
- b) Find \(P(\text{Grad degree}|\text{Full-time})\)
- c) Find \(P(\text{Full-time}|\text{Grad degree})\)
- d) Are full-time status and graduate degree independent?
A company produces items at two factories. Quality control data:
Tasks:

- a) Construct a contingency table
- b) An item is randomly selected and found defective. What’s the probability it came from Factory A?
- c) What percentage of all items are defective?
In a retail database, let \(K\) = customer bought this month, and \(T\) = customer uses payback card.
Given:
| | \(T\) | \(T'\) | Total |
|---|---|---|---|
| \(K\) | 180 | 120 | 300 |
| \(K'\) | 90 | 210 | 300 |
| Total | 270 | 330 | 600 |
Work individually, then compare
From a table: total \(n=500\), with \(n_{AB}=90\), row total \(n_A=200\), column total \(n_B=150\).
Think individually (2 min), pair (3 min), then work in groups of 3-4 and share
A campaign contacts 800 customers.
Rate your confidence for today’s goals on a 1-5 scale (1 = not confident, 5 = exam-ready):
Work individually
Homework
Complete Tasks 07-06 - practice building and reading contingency tables!
Session 07-06 - Contingency Tables | Dr. Nikolai Heinrichs & Dr. Tobias Vlćek