Session 07-06 - Contingency Tables

Section 07: Probability & Statistics

Dr. Nikolai Heinrichs & Dr. Tobias Vlćek

Entry Quiz - 10 Minutes

Quick Review from Session 07-05

State Bayes’ Theorem.
A test has sensitivity 80% and specificity 90%. If prevalence is 10%, calculate PPV.
What’s the difference between sensitivity and PPV?
If PPV is low but NPV is high, what does this tell us about the test?

Homework Discussion - 12 Minutes

Your Questions from Session 07-05

Bring Bayes questions and screening interpretation issues.

Sensitivity vs specificity vs PPV/NPV
Formula method vs contingency table method
Interpreting low-prevalence results

Learning Objectives

What You’ll Master Today

Construct contingency tables from word problems
Complete tables with missing values
Read probabilities from tables: marginal, joint, conditional
Test independence using table values
Connect tables to Bayes’ theorem

Contingency tables are a key exam format - expect at least one problem!

Part A: Table Structure

Discussion Prompt

Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?

Two-Way Contingency Table

A contingency table shows the joint distribution of two categorical variables.

	\(B\)	\(\bar{B}\)	Total
\(A\)	\(n_{AB}\)	\(n_{A\bar{B}}\)	\(n_A\)
\(\bar{A}\)	\(n_{\bar{A}B}\)	\(n_{\bar{A}\bar{B}}\)	\(n_{\bar{A}}\)
Total	\(n_B\)	\(n_{\bar{B}}\)	\(n\)

Cells: Joint frequencies (both conditions)
Row totals: Marginal frequencies for A
Column totals: Marginal frequencies for B

Reading Probabilities from Tables

Type	Formula	Location in Table
Marginal	\(P(A)\)	Row total / Grand total
Joint	\(P(A \cap B)\)	Cell / Grand total
Conditional	\(P(A|B)\)	Cell / Column total

Example: Market Research

Survey of 500 customers about product preference and age:

	Age < 30	Age ≥ 30	Total
Prefers A	120	80	200
Prefers B	130	170	300
Total	250	250	500

Calculate:

\(P(\text{Prefers A}) = \frac{200}{500} = 0.40\)
\(P(\text{Age} < 30 \cap \text{Prefers A}) = \frac{120}{500} = 0.24\)
\(P(\text{Prefers A} | \text{Age} < 30) = \frac{120}{250} = 0.48\)
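The three readings above can be reproduced with a minimal Python sketch. The nested-dictionary layout and the helper names `row_total` and `col_total` are illustrative assumptions, not part of the course material; only the counts come from the survey table.

```python
# Counts from the market research table (rows: preference, columns: age group).
table = {
    "Prefers A": {"Age < 30": 120, "Age >= 30": 80},
    "Prefers B": {"Age < 30": 130, "Age >= 30": 170},
}


def row_total(t, row):
    """Marginal frequency of one row category (hypothetical helper)."""
    return sum(t[row].values())


def col_total(t, col):
    """Marginal frequency of one column category (hypothetical helper)."""
    return sum(r[col] for r in t.values())


# Grand total: sum of all four internal cells.
n = sum(row_total(table, row) for row in table)

# Marginal: row total / grand total (200 / 500).
p_marginal = row_total(table, "Prefers A") / n

# Joint: cell / grand total (120 / 500).
p_joint = table["Prefers A"]["Age < 30"] / n

# Conditional: cell / total of the condition, here the "Age < 30" column (120 / 250).
p_conditional = table["Prefers A"]["Age < 30"] / col_total(table, "Age < 30")

print(p_marginal, p_joint, p_conditional)
```

Only the denominator changes between the joint and the conditional probability; the numerator is the same cell in both cases.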

Part B: Constructing Tables from Word Problems

Discussion Prompt

Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?

Strategy for Word Problems

Step-by-Step Approach

Identify the two variables and their categories
Create empty table with row/column labels
Fill in given values (often percentages → convert to counts)
Use relationships to complete missing cells
Verify: Each row and column must add up to its total, and the row totals and column totals must each sum to the grand total

Example: Building a Table

In a city of 10,000 residents:

40% are employed
70% are adults (age ≥ 18)
35% are employed adults

Construct the contingency table.

	Adult	Minor	Total
Employed	3500	?	4000
Not Employed	?	?	6000
Total	7000	3000	10000

Completing the Table

	Adult	Minor	Total
Employed	3500	500	4000
Not Employed	3500	2500	6000
Total	7000	3000	10000

Now we can answer questions like:

\(P(\text{Employed}|\text{Minor}) = \frac{500}{3000} = \frac{1}{6} \approx 0.167\)
\(P(\text{Adult}|\text{Employed}) = \frac{3500}{4000} = 0.875\)
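The completion step is plain subtraction, which a short Python sketch can make explicit. The function name `complete_2x2` and its argument order are assumptions made for illustration; the inputs are the percentages given for the city example.

```python
def complete_2x2(n, row1_total, col1_total, cell11):
    """Fill a 2x2 table from the grand total, the first row total,
    the first column total, and the top-left cell (hypothetical helper)."""
    cell12 = row1_total - cell11           # rest of row 1
    cell21 = col1_total - cell11           # rest of column 1
    cell22 = (n - row1_total) - cell21     # rest of row 2
    cells = [[cell11, cell12], [cell21, cell22]]
    if any(c < 0 for row in cells for c in row):
        raise ValueError("inconsistent inputs: a cell would be negative")
    return cells


n = 10_000
employed = round(0.40 * n)          # row total "Employed"
adults = round(0.70 * n)            # column total "Adult"
employed_adults = round(0.35 * n)   # joint cell (Employed, Adult)

cells = complete_2x2(n, employed, adults, employed_adults)

# Verification: the four internal cells must sum to the grand total.
assert sum(sum(row) for row in cells) == n

p_employed_given_minor = cells[0][1] / (n - adults)   # 500 / 3000
p_adult_given_employed = cells[0][0] / employed       # 3500 / 4000
print(cells, p_employed_given_minor, p_adult_given_employed)
```

The negative-cell check is a cheap way to catch inconsistent givens, for example a joint percentage larger than one of its margins.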

Exam-Style Problem

A company surveyed 200 customers:

60% are satisfied with the product
45% are repeat customers
Of the satisfied customers, 60% are repeat customers

Build the table:

Step 1: Fill in what we know directly

	Repeat	New	Total
Satisfied	?	?	120
Not Satisfied	?	?	80
Total	90	110	200

Solution Continued

Step 2: Use “Of satisfied, 60% are repeat”

\(P(\text{Repeat}|\text{Satisfied}) = 0.60\), so \(120 \times 0.60 = 72\) repeat AND satisfied

	Repeat	New	Total
Satisfied	72	48	120
Not Satisfied	18	62	80
Total	90	110	200

Verify: All rows and columns sum correctly ✓
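The key move in this problem is turning a conditional percentage into a joint count. The following Python sketch retraces the solution; the variable names are illustrative assumptions.

```python
n = 200
satisfied = round(0.60 * n)    # row total: 120
repeat = round(0.45 * n)       # column total: 90

# "Of the satisfied customers, 60% are repeat" is conditional on the row,
# so the 60% is applied to the row total (120), not to the grand total (200).
sat_repeat = round(0.60 * satisfied)

# Remaining cells follow by subtraction from the row and column totals.
sat_new = satisfied - sat_repeat
unsat_repeat = repeat - sat_repeat
unsat_new = (n - satisfied) - unsat_repeat

# Verification: cells sum to the grand total, and the "New" column matches.
assert sat_repeat + sat_new + unsat_repeat + unsat_new == n
assert sat_new + unsat_new == n - repeat

print(sat_repeat, sat_new, unsat_repeat, unsat_new)
```

Applying the 60% to all 200 customers instead would give 120 for the joint cell, which exceeds the column total of 90 and so cannot be correct.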

Break - 10 Minutes

Part C: Independence Testing

Discussion Prompt

Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?

When Are Variables Independent?

Independence in Tables

Variables A and B are independent if and only if for all cells:

\[P(A \cap B) = P(A) \cdot P(B)\]

Or equivalently: \(\frac{\text{Cell count}}{\text{Total}} = \frac{\text{Row total}}{\text{Total}} \times \frac{\text{Column total}}{\text{Total}}\)

Testing Independence: Example

From our customer survey:

	Repeat	New	Total
Satisfied	72	48	120
Not Satisfied	18	62	80
Total	90	110	200

Test independence for (Satisfied, Repeat):

Question: If variables were independent, what cell count would you expect for (Satisfied, Repeat)?

Expected if independent: \(\frac{120}{200} \times \frac{90}{200} \times 200 = 0.60 \times 0.45 \times 200 = 54\)
Observed: 72

\(72 \neq 54\), so satisfaction and repeat status are NOT independent.
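The same comparison can be run for every cell at once. This Python sketch computes the expected counts under independence from the margins; it applies the exact product rule from this part and is not a statistical significance test. The variable names are illustrative assumptions.

```python
# Observed counts (rows: Satisfied, Not Satisfied; columns: Repeat, New).
observed = [[72, 48], [18, 62]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Independence in the table sense requires observed == expected in every cell.
independent = all(
    abs(o - e) < 1e-9
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected, independent)
```

In a 2x2 table with fixed margins, a mismatch in one cell forces a mismatch in the other three, so checking a single cell is enough on the exam.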

Interpretation

The data suggests:

Satisfied customers are MORE likely to be repeat customers
\(P(\text{Repeat}|\text{Satisfied}) = \frac{72}{120} = 0.60\)
\(P(\text{Repeat}|\text{Not Satisfied}) = \frac{18}{80} = 0.225\)

Satisfied customers are about 2.7 times as likely to be repeat customers (\(0.60 / 0.225 \approx 2.67\))!

Part D: Connecting to Bayes’ Theorem

Discussion Prompt

Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?

Tables and Bayes

The contingency table method from Session 07-05 is actually using this technique!

Medical testing example:

	Disease	No Disease	Total
Test +	TP	FP	All +
Test −	FN	TN	All −
Total	Diseased	Healthy	Population

PPV = \(P(D|+) = \frac{\text{TP}}{\text{All +}}\)
This is Bayes’ theorem applied to the table!
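One way to see the connection is to divide numerator and denominator by the population size. Here \(n\) stands for the population size and \(\bar{D}\) for "no disease"; both symbols are notation assumed for this sketch, following the bar notation from Part A.

\[P(D|+) = \frac{\text{TP}}{\text{TP} + \text{FP}} = \frac{\text{TP}/n}{\text{TP}/n + \text{FP}/n} = \frac{P(+|D)\,P(D)}{P(+|D)\,P(D) + P(+|\bar{D})\,P(\bar{D})}\]

The middle step uses \(\text{TP}/n = P(+ \cap D) = P(+|D)\,P(D)\) and \(\text{FP}/n = P(+ \cap \bar{D}) = P(+|\bar{D})\,P(\bar{D})\).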

For conditional probabilities, the denominator is the condition.

\(P(A|B)\) -> divide by “all \(B\)”
\(P(B|A)\) -> divide by “all \(A\)”

Always say the condition in words before dividing.

Converting Between Approaches

Given: Sensitivity = 90%, Specificity = 95%, Prevalence = 2%

For 10,000 people:

	Disease (200)	No Disease (9800)	Total
Test +	180	490	670
Test −	20	9310	9330

Direct calculations:

PPV = \(\frac{180}{670} \approx 0.269\)
NPV = \(\frac{9310}{9330} \approx 0.998\)
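As a cross-check, the table route and the formula route can be computed side by side. In this Python sketch the function name `screening_table` and the default population of 10,000 are assumptions chosen to match the example above.

```python
def screening_table(sensitivity, specificity, prevalence, population=10_000):
    """Return (TP, FP, FN, TN) counts for a hypothetical population."""
    diseased = prevalence * population
    healthy = population - diseased
    tp = sensitivity * diseased    # diseased and test positive
    fn = diseased - tp             # diseased and test negative
    tn = specificity * healthy     # healthy and test negative
    fp = healthy - tn              # healthy and test positive
    return tp, fp, fn, tn


sens, spec, prev = 0.90, 0.95, 0.02
tp, fp, fn, tn = screening_table(sens, spec, prev)

# Table method: divide by the row total of the condition.
ppv_table = tp / (tp + fp)   # condition: all positive tests
npv_table = tn / (tn + fn)   # condition: all negative tests

# Formula method: Bayes' theorem with the same three inputs.
ppv_formula = sens * prev / (sens * prev + (1 - spec) * (1 - prev))

print(round(ppv_table, 3), round(npv_table, 3), round(ppv_formula, 3))
```

Both routes give the same PPV because the table is the formula with every probability multiplied by the population size.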

Algorithm Card: Build and Read Any 2x2 Table

Define event symbols first (for example, \(A\), \(B\)).
Fill totals and direct givens.
Use row/column subtraction to complete missing cells.
Verify that the four internal cells sum to the grand total.
Read probabilities with the correct denominator:
- joint -> grand total
- marginal -> row/column total over grand total
- conditional -> condition total

Quick Check - 6 Minutes

Reading Probabilities from Tables

Work individually

Given table:

	\(B\)	\(B'\)	Total
\(A\)	24	36	60
\(A'\)	16	24	40
Total	40	60	100

Compute \(P(A \cap B)\).
Compute \(P(A|B)\).
Compute \(P(B|A)\).
Are \(A\) and \(B\) independent?

Guided Practice - 20 Minutes

Practice Problem 1

A survey of 400 employees found:

55% work full-time
40% have a graduate degree
25% work full-time AND have a graduate degree

Tasks:

a) Construct the contingency table
b) Find \(P(\text{Grad degree}|\text{Full-time})\)
c) Find \(P(\text{Full-time}|\text{Grad degree})\)
d) Are full-time status and graduate degree independent?

Practice Problem 2 (2025 Exam Style)

A company produces items at two factories. Quality control data:

Factory A produces 3000 items, 5% defective
Factory B produces 2000 items, 8% defective

Tasks:

a) Construct a contingency table
b) An item is randomly selected and found defective. What’s the probability it came from Factory A?
c) What percentage of all items are defective?

Practice Problem 3 (Business Interpretation)

In a retail database, let \(K\) = customer bought this month, and \(T\) = customer uses payback card.

Given:

	\(T\)	\(T'\)	Total
\(K\)	180	120	300
\(K'\)	90	210	300
Total	270	330	600

State in words what each event means: \(T'\), \(K' \cap T\), \(T \setminus K\).
Compute \(P(T \setminus K)\).
Write in words what \(P(T|K')\) means and compute it.

Chained Exam Mini-Problem - 8 Minutes

Work individually, then compare

From a table: total \(n=500\), with \(n_{AB}=90\), row total \(n_A=200\), column total \(n_B=150\).

Compute \(P(A\cap B)\).
Use your result for \(P(A\cap B)\) and the totals to compute \(P(A|B)\).
Compare \(P(A\cap B)\) with \(P(A)P(B)\) and assess independence.

Coffee Break - 10 Minutes

Collaborative Problem-Solving - 20 Minutes

Group Challenge: Campaign Conversion Table

Think individually (2 min), pair (3 min), then work in groups of 3-4 and share

A campaign contacts 800 customers.

320 clicked the email (\(C\))
180 purchased (\(P\))
110 both clicked and purchased

Build the full 2x2 contingency table.
Compute \(P(P|C)\) and \(P(P|C')\).
Test whether clicking and purchasing are independent.
Give one practical marketing interpretation.

Confidence Check - 2 Minutes

Rate your confidence for today’s goals on a 1-5 scale (1 = not confident, 5 = exam-ready):

Constructing complete contingency tables
Reading marginal/joint/conditional probabilities
Choosing correct denominators in conditional probabilities

Final Assessment - 5 Minutes

Exit Ticket

Work individually

In a table, where do you read \(P(A \cap B)\) from?
If cell \((A,B)=36\) and column total for \(B\) is 120, what is \(P(A|B)\)?
In words, what does \(K' \cap T\) mean in a customer table?

Wrap-Up & Key Takeaways

Today’s Essential Concepts

Table structure: Cells (joint), margins (marginal)
Reading probabilities: Marginal, joint, conditional
Building tables: Use given percentages and relationships
Independence test: Expected = row% × col% × total
Connection to Bayes: Tables provide visual Bayes calculations

Next Session Preview

Coming Up: Binomial Distribution

Discrete probability distributions
Binomial formula: \(P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}\)
“Exactly k”, “at most k”, “at least k”
Expected value and variance

Homework

Complete Tasks 07-06 - practice building and reading contingency tables!