
Section 07: Probability & Statistics
Test your understanding of Bayes’ Theorem
State Bayes’ Theorem.
A test has a true positive rate of 80% and a true negative rate of 90%. If the base rate is 10%, calculate the Positive Predictive Value.
What’s the difference between the true positive rate and the Positive Predictive Value?
If the Positive Predictive Value is low but the Negative Predictive Value is high, what does this tell us about the test?
Bring Bayes questions and screening interpretation issues.
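For checking your own answer to the calculation question, here is a minimal Python sketch of Bayes' Theorem applied to a test. The function name `positive_predictive_value` and its parameter names are illustrative choices, not part of the course material.

```python
def positive_predictive_value(tpr: float, tnr: float, base_rate: float) -> float:
    """P(condition | positive test), computed with Bayes' Theorem."""
    true_pos = tpr * base_rate                # P(positive AND condition)
    false_pos = (1 - tnr) * (1 - base_rate)   # P(positive AND no condition)
    return true_pos / (true_pos + false_pos)  # divide by P(positive)


# Values from the question: TPR = 80%, TNR = 90%, base rate = 10%
ppv = positive_predictive_value(tpr=0.80, tnr=0.90, base_rate=0.10)
print(f"PPV = {ppv:.3f}")
```

The denominator is the overall probability of a positive result, which is why a low base rate pulls the Positive Predictive Value down even for a fairly accurate test.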
Contingency tables are a key exam format — expect at least one problem!
Shows the joint distribution of two categorical variables.
| | \(B\) | \(\bar{B}\) | Total |
|---|---|---|---|
| \(A\) | \(n_{AB}\) | \(n_{A\bar{B}}\) | \(n_A\) |
| \(\bar{A}\) | \(n_{\bar{A}B}\) | \(n_{\bar{A}\bar{B}}\) | \(n_{\bar{A}}\) |
| Total | \(n_B\) | \(n_{\bar{B}}\) | \(n\) |
| Type | Formula | Where to Look |
|---|---|---|
| Marginal | \(P(A)\) | Row total / Grand total |
| Joint | \(P(A \cap B)\) | Cell / Grand total |
| Conditional | \(P(A \mid B)\) | Cell / Column total |
The most common exam mistake: using the wrong denominator for conditional probabilities!
Survey of 500 customers about product preference and age:
| | Age < 30 | Age ≥ 30 | Total |
|---|---|---|---|
| Prefers A | 120 | 80 | 200 |
| Prefers B | 130 | 170 | 300 |
| Total | 250 | 250 | 500 |
Calculate:

The right panel shows that preference depends on age — younger customers favour A more.
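The three probability types can be read off the survey table as in the following sketch. The dictionary layout and variable names are assumptions made for illustration; only the counts come from the table.

```python
# Counts from the survey table: key = (preference, age group)
counts = {
    ("A", "<30"): 120, ("A", ">=30"): 80,
    ("B", "<30"): 130, ("B", ">=30"): 170,
}
n = sum(counts.values())  # grand total: 500

# Marginal: row total / grand total
p_prefers_a = (counts[("A", "<30")] + counts[("A", ">=30")]) / n

# Joint: cell / grand total
p_a_and_young = counts[("A", "<30")] / n

# Conditional: cell / total of the column we condition on
young_total = counts[("A", "<30")] + counts[("B", "<30")]
older_total = counts[("A", ">=30")] + counts[("B", ">=30")]
p_a_given_young = counts[("A", "<30")] / young_total
p_a_given_older = counts[("A", ">=30")] / older_total

print(p_prefers_a, p_a_and_young, p_a_given_young, p_a_given_older)
```

Comparing the two conditional values (0.48 for under 30 against 0.32 for 30 and over) is what supports the statement that younger customers favour A more.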
Step-by-Step Approach:
In a city of 10,000 residents:
Step 1: Fill in what we know
| | Adult | Minor | Total |
|---|---|---|---|
| Employed | 3,500 | ? | 4,000 |
| Not Employed | ? | ? | 6,000 |
| Total | 7,000 | 3,000 | 10,000 |
Step 2: Use subtraction to fill remaining cells
| | Adult | Minor | Total |
|---|---|---|---|
| Employed | 3,500 | 500 | 4,000 |
| Not Employed | 3,500 | 2,500 | 6,000 |
| Total | 7,000 | 3,000 | 10,000 |
Now we can answer conditional questions:
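The subtraction step and a few conditional readings can be sketched as follows. The helper `complete_table` is a hypothetical name; it assumes one internal cell, one row total, one column total and the grand total are known, as in the city example.

```python
def complete_table(cell_11, row_1_total, col_1_total, grand_total):
    """Fill a 2x2 table by subtraction from one cell and the margins."""
    cell_12 = row_1_total - cell_11          # rest of the first row
    cell_21 = col_1_total - cell_11          # rest of the first column
    row_2_total = grand_total - row_1_total
    cell_22 = row_2_total - cell_21          # last remaining cell
    return [[cell_11, cell_12], [cell_21, cell_22]]


# City example: 3,500 employed adults, 4,000 employed, 7,000 adults, 10,000 residents
table = complete_table(3500, 4000, 7000, 10000)
print(table)  # rows: Employed / Not Employed, columns: Adult / Minor

adult_total = table[0][0] + table[1][0]
minor_total = table[0][1] + table[1][1]
employed_total = table[0][0] + table[0][1]

p_employed_given_adult = table[0][0] / adult_total    # condition: Adult
p_employed_given_minor = table[0][1] / minor_total    # condition: Minor
p_adult_given_employed = table[0][0] / employed_total # condition: Employed
print(p_employed_given_adult, p_employed_given_minor, p_adult_given_employed)
```

Note how the same cell (3,500) gives different answers depending on which total is used as the denominator.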
A company surveyed 200 customers:
Step 1: Fill in direct values
| | Repeat | New | Total |
|---|---|---|---|
| Satisfied | ? | ? | 120 |
| Not Satisfied | ? | ? | 80 |
| Total | 90 | 110 | 200 |
Step 2: Use “Of satisfied, 60% are repeat”
\(P(\text{Repeat} \mid \text{Satisfied}) = 0.60\), so \(0.60 \times 120 = 72\) customers are repeat AND satisfied
| | Repeat | New | Total |
|---|---|---|---|
| Satisfied | 72 | 48 | 120 |
| Not Satisfied | 18 | 62 | 80 |
| Total | 90 | 110 | 200 |
Always read conditional statements carefully: “of the satisfied” means the total number of satisfied is 120, not 200!
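A short sketch of how the conditional statement turns into a cell count, and why the grand total is the wrong base. Variable names are illustrative assumptions; the numbers are those of the customer survey.

```python
satisfied_total, not_satisfied_total = 120, 80
repeat_total, new_total = 90, 110
grand_total = 200
share_repeat_among_satisfied = 0.60  # "of the satisfied, 60% are repeat"

# Correct base: the 120 satisfied customers
satisfied_repeat = round(share_repeat_among_satisfied * satisfied_total)

# Wrong base: the 200 customers overall
wrong_guess = round(share_repeat_among_satisfied * grand_total)

# Remaining cells follow by subtraction
satisfied_new = satisfied_total - satisfied_repeat
not_satisfied_repeat = repeat_total - satisfied_repeat
not_satisfied_new = not_satisfied_total - not_satisfied_repeat

print(satisfied_repeat, satisfied_new, not_satisfied_repeat, not_satisfied_new)
print("wrong base gives", wrong_guess, "but only", repeat_total, "repeat customers exist")
```

The wrong base produces a cell larger than its own column total, which is a quick sanity check that the denominator was misread.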
Work individually
A company has 300 employees. 180 are in Sales, the rest in Operations. Of the Sales employees, 40% received a bonus. Of Operations employees, 25% received a bonus.
Independence in Tables
Variables A and B are independent if and only if for every cell:
\[P(A \cap B) = P(A) \cdot P(B)\]
Equivalently, the expected count under independence is:
\[E_{ij} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}\]
If observed \(\neq\) expected in any cell, the variables are NOT independent.
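As a sketch of the expected-count formula, the code below computes every \(E_{ij}\) for a table of counts and compares it with the observed cells. The function name `expected_counts` is an assumption for illustration; the data is the completed city table (Employed / Not Employed by Adult / Minor).

```python
import math


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [[3500, 500], [3500, 2500]]
expected = expected_counts(observed)
print(expected)

# Independent only if every observed cell equals its expected count
independent = all(
    math.isclose(o, e)
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
print("independent:", independent)
```

Here the (Employed, Adult) cell is 3,500 observed against 2,800 expected, so employment and age group are not independent in this table.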
From our customer survey:
| | Repeat | New | Total |
|---|---|---|---|
| Satisfied | 72 | 48 | 120 |
| Not Satisfied | 18 | 62 | 80 |
| Total | 90 | 110 | 200 |
Question: If satisfaction and repeat status were independent, what would we expect in the (Satisfied, Repeat) cell?
\[E = \frac{120 \times 90}{200} = 54\]

Large differences between observed and expected → strong evidence against independence.
The data suggests:
Satisfied customers are about 2.7 times more likely to be repeat customers!
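The "about 2.7 times" figure comes from comparing two conditional probabilities, as this small sketch shows. Variable names are illustrative; the counts are from the customer survey table.

```python
# Observed counts: rows = Satisfied / Not Satisfied, columns = Repeat / New
satisfied_repeat, satisfied_total = 72, 120
not_satisfied_repeat, not_satisfied_total = 18, 80

# Condition on the row, so the row total is the denominator
p_repeat_given_satisfied = satisfied_repeat / satisfied_total
p_repeat_given_not_satisfied = not_satisfied_repeat / not_satisfied_total

ratio = p_repeat_given_satisfied / p_repeat_given_not_satisfied
print(f"{p_repeat_given_satisfied:.3f} / {p_repeat_given_not_satisfied:.3f} = {ratio:.2f}")
```

Under independence both conditional probabilities would equal the marginal \(P(\text{Repeat}) = 90/200 = 0.45\) and the ratio would be 1.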
The table method from Session 07-05 is exactly this technique!
Classification as a contingency table:
| | Condition (+) | Condition (−) | Total |
|---|---|---|---|
| Predicted + | TP | FP | All + |
| Predicted − | FN | TN | All − |
| Total | Condition + | Condition − | Population |
For conditional probabilities, the denominator is always the total of the event you condition on (the event after the bar)!
Always state the condition in words before choosing the denominator.
Given: TPR = 90%, TNR = 95%, BR = 2%, Population = 10,000
| | Condition + (200) | Condition − (9,800) | Total |
|---|---|---|---|
| Predicted + | 180 | 490 | 670 |
| Predicted − | 20 | 9,310 | 9,330 |
| Total | 200 | 9,800 | 10,000 |
Direct calculations:
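The table above and the two predictive values can be reproduced with the following sketch. The function name `screening_table` is hypothetical, and rounding to whole persons is an assumption that happens to be exact for these inputs.

```python
def screening_table(tpr, tnr, base_rate, population):
    """Return (TP, FP, FN, TN) counts for a test applied to a population."""
    cond_pos = round(base_rate * population)  # people with the condition
    cond_neg = population - cond_pos          # people without the condition
    tp = round(tpr * cond_pos)                # correctly flagged
    fn = cond_pos - tp                        # missed cases
    tn = round(tnr * cond_neg)                # correctly cleared
    fp = cond_neg - tn                        # false alarms
    return tp, fp, fn, tn


tp, fp, fn, tn = screening_table(tpr=0.90, tnr=0.95, base_rate=0.02, population=10_000)

ppv = tp / (tp + fp)  # condition on "Predicted +": denominator is the row total 670
npv = tn / (tn + fn)  # condition on "Predicted -": denominator is the row total 9,330
print(tp, fp, fn, tn)
print(f"PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

TPR and TNR use the column totals as denominators, while PPV and NPV use the row totals. The cell in the numerator is the same (TP for both TPR and PPV); only the condition changes.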
Sometimes tables give proportions instead of counts:
| | \(B\) | \(B'\) | Total |
|---|---|---|---|
| \(A\) | 0.15 | 0.25 | 0.40 |
| \(A'\) | 0.35 | 0.25 | 0.60 |
| Total | 0.50 | 0.50 | 1.00 |
The same rules apply — cells are joint probabilities, margins are marginal probabilities.
From the previous table, test independence:
\[P(A) \cdot P(B) = 0.40 \times 0.50 = 0.20\]
\[P(A \cap B) = 0.15\]
\(0.15 \neq 0.20\), so A and B are NOT independent.
With a relative frequency table, you never need to multiply by a population size — just compare the cell value directly with the product of the marginals.
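A minimal sketch of the same check in Python. The tolerance in `math.isclose` is an assumption added to avoid false alarms from floating-point rounding; it is not part of the statistical rule.

```python
import math

# Relative frequencies: rows = A / A', columns = B / B'
rel = [[0.15, 0.25],
       [0.35, 0.25]]

p_a = rel[0][0] + rel[0][1]   # marginal of A: first row total
p_b = rel[0][0] + rel[1][0]   # marginal of B: first column total
product = p_a * p_b           # joint probability expected under independence

independent = math.isclose(rel[0][0], product, abs_tol=1e-9)
print(f"P(A)P(B) = {product:.2f}, P(A and B) = {rel[0][0]:.2f}, independent: {independent}")
```

In a 2x2 table it is enough to check one cell: if it matches the product of its marginals, the other three do as well.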
Question: Given a relative table, how do you get a count table?
Multiply every entry by the population size!
If \(n = 200\):
| | \(B\) | \(B'\) | Total |
|---|---|---|---|
| \(A\) | \(0.15 \times 200 = 30\) | \(0.25 \times 200 = 50\) | 80 |
| \(A'\) | \(0.35 \times 200 = 70\) | \(0.25 \times 200 = 50\) | 120 |
| Total | 100 | 100 | 200 |
If a problem gives percentages and asks for counts, pick a convenient total and multiply.
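The conversion can be sketched in a couple of lines. Rounding with `round` is an assumption that guards against floating-point noise; with a conveniently chosen total the products are whole numbers anyway.

```python
# Relative frequencies: rows = A / A', columns = B / B'
rel = [[0.15, 0.25],
       [0.35, 0.25]]
n = 200  # chosen population size

# Multiply every entry by n to get counts
counts = [[round(p * n) for p in row] for row in rel]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
print(counts, row_totals, col_totals)
```

Any total that makes all cells whole numbers works, since conditional probabilities and independence checks do not depend on \(n\).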
Checklist:
Define event symbols first (e.g., \(A\), \(B\))
Fill totals and directly given values
Subtract to complete missing cells
Verify all four internal cells sum to grand total
Read probabilities with the correct denominator:
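The verification step of the checklist can be automated with a small helper. The function `check_table` is a hypothetical name for this sketch; it is shown on the completed customer survey table.

```python
def check_table(cells, row_totals, col_totals, grand_total):
    """Return a list of problems; an empty list means the table is consistent."""
    problems = []
    if sum(sum(row) for row in cells) != grand_total:
        problems.append("internal cells do not sum to the grand total")
    if [sum(row) for row in cells] != list(row_totals):
        problems.append("row totals do not match")
    if [sum(col) for col in zip(*cells)] != list(col_totals):
        problems.append("column totals do not match")
    return problems


# Rows: Satisfied / Not Satisfied, columns: Repeat / New
cells = [[72, 48], [18, 62]]
problems = check_table(cells, row_totals=[120, 80], col_totals=[90, 110], grand_total=200)
print(problems or "table is consistent")
```

Running this kind of check before reading off any probability catches arithmetic slips from the subtraction step.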
Work in pairs
A survey of 400 employees found:
Work in pairs
A company produces items at two factories:
In a retail database, let \(K\) = customer bought this month, and \(T\) = customer uses loyalty card.
| | \(T\) | \(T'\) | Total |
|---|---|---|---|
| \(K\) | 180 | 120 | 300 |
| \(K'\) | 90 | 210 | 300 |
| Total | 270 | 330 | 600 |
A study reports the following relative frequency table:
| | Online | In-Store | Total |
|---|---|---|---|
| Return | 0.12 | 0.03 | 0.15 |
| Keep | 0.48 | 0.37 | 0.85 |
| Total | 0.60 | 0.40 | 1.00 |
Work individually, then compare
From a table: total \(n=500\), with \(n_{AB}=90\), row total \(n_A=200\), column total \(n_B=150\).
Think individually (2 min), then work in groups of 3-4
A campaign contacts 800 customers.
Think individually, then work in groups
A university tracks 600 students across two dimensions: study method (lecture-only vs lecture + tutorial) and exam result (pass vs fail).
360 students attended tutorials in addition to lectures. 480 students passed the exam. Of those who attended tutorials, 85% passed.
Think individually, then work in groups
A factory inspector tests items from two production shifts. The following relative frequency table is given:
| | Day Shift | Night Shift | Total |
|---|---|---|---|
| Conforming | ? | ? | ? |
| Non-conforming | 0.04 | 0.06 | 0.10 |
| Total | 0.55 | 0.45 | 1.00 |
Work individually
Homework
Complete Tasks 07-06 — practice building and reading contingency tables!
Session 07-06 - Contingency Tables | Dr. Nikolai Heinrichs & Dr. Tobias Vlćek