Session 07-06 - Contingency Tables

Section 07: Probability & Statistics

Author

Dr. Nikolai Heinrichs & Dr. Tobias Vlćek

Entry Quiz - 10 Minutes

Quick Review from Session 07-05

  1. State Bayes’ Theorem.

  2. A test has sensitivity 80% and specificity 90%. If prevalence is 10%, calculate PPV.

  3. What’s the difference between sensitivity and PPV?

  4. If PPV is low but NPV is high, what does this tell us about the test?

Homework Discussion - 12 Minutes

Your Questions from Session 07-05

Bring Bayes questions and screening interpretation issues.

  • Sensitivity vs specificity vs PPV/NPV
  • Formula method vs contingency table method
  • Interpreting low-prevalence results

Learning Objectives

What You’ll Master Today

  • Construct contingency tables from word problems
  • Complete tables with missing values
  • Read probabilities from tables: marginal, joint, conditional
  • Test independence using table values
  • Connect tables to Bayes’ theorem

. . .

Contingency tables are a key exam format - expect at least one problem!

Part A: Table Structure

Discussion Prompt

Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?

Two-Way Contingency Table

A contingency table shows the joint distribution of two categorical variables.

. . .

\(B\) \(\bar{B}\) Total
\(A\) \(n_{AB}\) \(n_{A\bar{B}}\) \(n_A\)
\(\bar{A}\) \(n_{\bar{A}B}\) \(n_{\bar{A}\bar{B}}\) \(n_{\bar{A}}\)
Total \(n_B\) \(n_{\bar{B}}\) \(n\)

. . .

  • Cells: Joint frequencies (both conditions)
  • Row totals: Marginal frequencies for A
  • Column totals: Marginal frequencies for B

Reading Probabilities from Tables

Type Formula Location in Table
Marginal \(P(A)\) Row total / Grand total
Joint \(P(A \cap B)\) Cell / Grand total
Conditional \(P(A\|B)\) Cell / Column total

Example: Market Research

Survey of 500 customers about product preference and age:

Age < 30 Age ≥ 30 Total
Prefers A 120 80 200
Prefers B 130 170 300
Total 250 250 500

. . .

Calculate:

  • \(P(\text{Prefers A}) = \frac{200}{500} = 0.40\)
  • \(P(\text{Age} < 30 \cap \text{Prefers A}) = \frac{120}{500} = 0.24\)
  • \(P(\text{Prefers A} | \text{Age} < 30) = \frac{120}{250} = 0.48\)

Part B: Constructing Tables from Word Problems

Discussion Prompt

Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?

Strategy for Word Problems

TipStep-by-Step Approach
  1. Identify the two variables and their categories
  2. Create empty table with row/column labels
  3. Fill in given values (often percentages → convert to counts)
  4. Use relationships to complete missing cells
  5. Verify: Row and column totals must match

Example: Building a Table

In a city of 10,000 residents:

  • 40% are employed
  • 70% are adults (age ≥ 18)
  • 35% are employed adults

. . .

Construct the contingency table.

. . .

Adult Minor Total
Employed 3500 ? 4000
Not Employed ? ? 6000
Total 7000 3000 10000

Completing the Table

Adult Minor Total
Employed 3500 500 4000
Not Employed 3500 2500 6000
Total 7000 3000 10000

. . .

Now we can answer questions like:

  • \(P(\text{Employed}|\text{Minor}) = \frac{500}{3000} = \frac{1}{6} \approx 0.167\)
  • \(P(\text{Adult}|\text{Employed}) = \frac{3500}{4000} = 0.875\)

Exam-Style Problem

A company surveyed 200 customers:

  • 60% are satisfied with the product
  • 45% are repeat customers
  • Of the satisfied customers, 60% are repeat customers

. . .

Build the table:

Step 1: Fill in what we know directly

Repeat New Total
Satisfied ? ? 120
Not Satisfied ? ? 80
Total 90 110 200

Solution Continued

Step 2: Use “Of satisfied, 60% are repeat”

\(P(\text{Repeat}|\text{Satisfied}) = 0.60\), so \(120 \times 0.60 = 72\) repeat AND satisfied

Repeat New Total
Satisfied 72 48 120
Not Satisfied 18 62 80
Total 90 110 200

. . .

Verify: All rows and columns sum correctly ✓

Break - 10 Minutes

Part C: Independence Testing

Discussion Prompt

Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?

When Are Variables Independent?

ImportantIndependence in Tables

Variables A and B are independent if and only if for all cells:

\[P(A \cap B) = P(A) \cdot P(B)\]

Or equivalently: \(\frac{\text{Cell count}}{\text{Total}} = \frac{\text{Row total}}{\text{Total}} \times \frac{\text{Column total}}{\text{Total}}\)

Testing Independence: Example

From our customer survey:

Repeat New Total
Satisfied 72 48 120
Not Satisfied 18 62 80
Total 90 110 200

. . .

Test independence for (Satisfied, Repeat):

Question: If variables were independent, what cell count would you expect for (Satisfied, Repeat)?

  • Expected if independent: \(\frac{120}{200} \times \frac{90}{200} \times 200 = 0.60 \times 0.45 \times 200 = 54\)
  • Observed: 72

. . .

\(72 \neq 54\), so satisfaction and repeat status are NOT independent.

Interpretation

The data suggests:

  • Satisfied customers are MORE likely to be repeat customers
  • \(P(\text{Repeat}|\text{Satisfied}) = \frac{72}{120} = 0.60\)
  • \(P(\text{Repeat}|\text{Not Satisfied}) = \frac{18}{80} = 0.225\)

. . .

Satisfied customers are about 2.7 times more likely to be repeat customers!

Part D: Connecting to Bayes’ Theorem

Discussion Prompt

Question: Discuss with a partner: What is the key decision rule from this part, and where can students confuse it on the exam?

Tables and Bayes

The contingency table method from Session 07-05 is actually using this technique!

. . .

Medical testing example:

Disease No Disease Total
Test + TP FP All +
Test − FN TN All −
Total Diseased Healthy Population

. . .

  • PPV = \(P(D|+) = \frac{\text{TP}}{\text{All +}}\)
  • This is Bayes’ theorem applied to the table!

For conditional probabilities, the denominator is the condition.

  • \(P(A|B)\) -> divide by “all \(B\)
  • \(P(B|A)\) -> divide by “all \(A\)

Always say the condition in words before dividing.

Converting Between Approaches

Given: Sensitivity = 90%, Specificity = 95%, Prevalence = 2%

For 10,000 people:

Disease (200) No Disease (9800) Total
Test + 180 490 670
Test − 20 9310 9330

. . .

Direct calculations: - PPV = \(\frac{180}{670} \approx 0.269\) - NPV = \(\frac{9310}{9330} \approx 0.998\)

Algorithm Card: Build and Read Any 2x2 Table

  1. Define event symbols first (for example, \(A\), \(B\)).
  2. Fill totals and direct givens.
  3. Use row/column subtraction to complete missing cells.
  4. Verify all four internal cells sum to grand total.
  5. Read probabilities with the correct denominator:
    • joint -> grand total
    • marginal -> row/column total over grand total
    • conditional -> condition total

Quick Check - 6 Minutes

Reading Probabilities from Tables

Work individually

Given table:

\(B\) \(B'\) Total
\(A\) 24 36 60
\(A'\) 16 24 40
Total 40 60 100
  1. Compute \(P(A \cap B)\).
  2. Compute \(P(A|B)\).
  3. Compute \(P(B|A)\).
  4. Are \(A\) and \(B\) independent?

Guided Practice - 20 Minutes

Practice Problem 1

A survey of 400 employees found:

  • 55% work full-time
  • 40% have a graduate degree
  • 25% work full-time AND have a graduate degree

Tasks: a) Construct the contingency table b) Find \(P(\text{Grad degree}|\text{Full-time})\) c) Find \(P(\text{Full-time}|\text{Grad degree})\) d) Are full-time status and graduate degree independent?

Practice Problem 2 (2025 Exam Style)

A company produces items at two factories. Quality control data:

  • Factory A produces 3000 items, 5% defective
  • Factory B produces 2000 items, 8% defective

Tasks: a) Construct a contingency table b) An item is randomly selected and found defective. What’s the probability it came from Factory A? c) What percentage of all items are defective?

Practice Problem 3 (Business Interpretation)

In a retail database, let \(K\) = customer bought this month, and \(T\) = customer uses payback card.

Given:

\(T\) \(T'\) Total
\(K\) 180 120 300
\(K'\) 90 210 300
Total 270 330 600
  1. State in words what each event means: \(T'\), \(K' \cap T\), \(T \setminus K\).
  2. Compute \(P(T \setminus K)\).
  3. Write in words what \(P(T|K')\) means and compute it.

Chained Exam Mini-Problem - 8 Minutes

Work individually, then compare

From a table: total \(n=500\), with \(n_{AB}=90\), row total \(n_A=200\), column total \(n_B=150\).

  1. Compute \(P(A\cap B)\).
  2. Use (a) and totals to compute \(P(A|B)\).
  3. Compare \(P(A\cap B)\) with \(P(A)P(B)\) and assess independence.

Coffee Break - 10 Minutes

Collaborative Problem-Solving - 20 Minutes

Group Challenge: Campaign Conversion Table

Think individually (2 min), pair (3 min), then work in groups of 3-4 and share

A campaign contacts 800 customers.

  • 320 clicked the email (\(C\))
  • 180 purchased (\(P\))
  • 110 both clicked and purchased
  1. Build the full 2x2 contingency table.
  2. Compute \(P(P|C)\) and \(P(P|C')\).
  3. Test whether clicking and purchasing are independent.
  4. Give one practical marketing interpretation.

Confidence Check - 2 Minutes

Rate your confidence for today’s goals on a 1-5 scale (1 = not confident, 5 = exam-ready):

  • Constructing complete contingency tables
  • Reading marginal/joint/conditional probabilities
  • Choosing correct denominators in conditional probabilities

Final Assessment - 5 Minutes

Exit Ticket

Work individually

  1. In a table, where do you read \(P(A \cap B)\) from?
  2. If cell \((A,B)=36\) and column total for \(B\) is 120, what is \(P(A|B)\)?
  3. In words, what does \(K' \cap T\) mean in a customer table?

Wrap-Up & Key Takeaways

Today’s Essential Concepts

  • Table structure: Cells (joint), margins (marginal)
  • Reading probabilities: Marginal, joint, conditional
  • Building tables: Use given percentages and relationships
  • Independence test: Expected = row% × col% × total
  • Connection to Bayes: Tables provide visual Bayes calculations

Next Session Preview

Coming Up: Binomial Distribution

  • Discrete probability distributions
  • Binomial formula: \(P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}\)
  • “Exactly k”, “at most k”, “at least k”
  • Expected value and variance

. . .

TipHomework

Complete Tasks 07-06 - practice building and reading contingency tables!