Tasks 07-06 - Contingency Tables & Bayes

Section 07: Probability & Statistics

Problem 1: Bayes’ Theorem Basics (x)

A medical test has the following characteristics:

  • True positive rate: 90%
  • True negative rate: 95%
  • Base rate: 5% (5% of the population has the disease)

  1. What is \(P(+|D)\)?
  2. What is \(P(-|D')\)?
  3. Calculate the Positive Predictive Value \(P(D|+)\)
  4. Calculate the Negative Predictive Value \(P(D'|-)\)

Solution:

  1. \(P(+|D) = 0.90\) (this is the true positive rate)

  2. \(P(-|D') = 0.95\) (this is the true negative rate)

  3. Positive Predictive Value using Bayes, with false positive rate \(P(+|D') = 1 - 0.95 = 0.05\) and \(P(D') = 1 - 0.05 = 0.95\): \[P(D|+) = \frac{P(+|D) \cdot P(D)}{P(+|D) \cdot P(D) + P(+|D') \cdot P(D')}\] \[= \frac{0.90 \times 0.05}{0.90 \times 0.05 + 0.05 \times 0.95}\] \[= \frac{0.045}{0.045 + 0.0475} = \frac{0.045}{0.0925} \approx 0.486\]

  4. Negative Predictive Value, with false negative rate \(P(-|D) = 1 - 0.90 = 0.10\): \[P(-) = P(-|D) \cdot P(D) + P(-|D') \cdot P(D') = 0.10 \times 0.05 + 0.95 \times 0.95 = 0.005 + 0.9025 = 0.9075\] \[P(D'|-) = \frac{P(-|D') \cdot P(D')}{P(-)} = \frac{0.95 \times 0.95}{0.9075} = \frac{0.9025}{0.9075} \approx 0.994\]
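
As a cross-check, the two Bayes calculations above can be sketched in Python. The function name `predictive_values` and its parameter names are illustrative choices, not part of the task; the inputs are the three rates given in the problem.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    p_d = base_rate
    p_not_d = 1.0 - base_rate
    # Law of total probability for each test outcome
    p_pos = sensitivity * p_d + (1.0 - specificity) * p_not_d
    p_neg = (1.0 - sensitivity) * p_d + specificity * p_not_d
    ppv = sensitivity * p_d / p_pos        # P(D | +)
    npv = specificity * p_not_d / p_neg    # P(D' | -)
    return ppv, npv


ppv, npv = predictive_values(sensitivity=0.90, specificity=0.95, base_rate=0.05)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")  # PPV = 0.486, NPV = 0.994
```

Here `sensitivity` is the true positive rate and `specificity` is the true negative rate.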

Problem 2: Contingency Table Construction (xx)

A company surveyed 200 employees about their commute method and job satisfaction:

  • 55% commute by car
  • 40% are highly satisfied
  • 30% commute by car AND are highly satisfied
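
  1. Construct a complete contingency table
  2. Find \(P(\text{Car}|\text{Highly Satisfied})\)
  3. Find \(P(\text{Highly Satisfied}|\text{Car})\)
  4. Are commute method and satisfaction independent?

Solution:

  1. Contingency Table (with \(0.55 \times 200 = 110\) car commuters, \(0.40 \times 200 = 80\) highly satisfied, \(0.30 \times 200 = 60\) both; the remaining cells follow by subtraction):

|           | Car | Other | Total |
|-----------|-----|-------|-------|
| High Sat  | 60  | 20    | 80    |
| Lower Sat | 50  | 70    | 120   |
| Total     | 110 | 90    | 200   |

  2. \(P(\text{Car}|\text{HS}) = \frac{60}{80} = 0.75\)

  3. \(P(\text{HS}|\text{Car}) = \frac{60}{110} \approx 0.545\)

  4. Independence check: \(P(\text{Car}) \times P(\text{HS}) = 0.55 \times 0.40 = 0.22\), but \(P(\text{Car} \cap \text{HS}) = \frac{60}{200} = 0.30\). Since \(0.22 \neq 0.30\), commute method and satisfaction are NOT independent.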
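
The table construction and the independence check can be sketched as follows. The helper `two_by_two` is a hypothetical name introduced for illustration; it assumes the two marginal probabilities and the joint probability are given, as in this problem.

```python
import math


def two_by_two(n, p_row, p_col, p_both):
    """Return [[both, row only], [col only, neither]] as counts out of n."""
    both = round(n * p_both)
    row_total = round(n * p_row)
    col_total = round(n * p_col)
    return [
        [both, row_total - both],
        [col_total - both, n - row_total - col_total + both],
    ]


# Rows: High Sat / Lower Sat; columns: Car / Other
table = two_by_two(200, p_row=0.40, p_col=0.55, p_both=0.30)
print(table)  # [[60, 20], [50, 70]]

p_car_given_hs = table[0][0] / sum(table[0])                 # 60 / 80
p_hs_given_car = table[0][0] / (table[0][0] + table[1][0])   # 60 / 110

# Independence would require P(Car and HS) == P(Car) * P(HS)
independent = math.isclose(0.30, 0.55 * 0.40)
print(p_car_given_hs, round(p_hs_given_car, 3), independent)
```

Comparing with `math.isclose` rather than `==` avoids a false verdict caused by floating-point rounding when the two values really are equal.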

Problem 3: Medical Testing - Full Analysis (xx)

A screening test for a disease has:

  • True positive rate = 85%
  • True negative rate = 92%
  • The disease affects 3% of the population (base rate)

  1. Create a contingency table for a population of 10,000
  2. Calculate the Positive Predictive Value directly from the table
  3. Calculate the Negative Predictive Value directly from the table
  4. If you test positive, how worried should you be? Interpret the Positive Predictive Value.

Solution:

  1. Table for 10,000 people:

Population: 300 with disease, 9,700 without

|             | Condition + | Condition − | Total  |
|-------------|-------------|-------------|--------|
| Predicted + | 255         | 776         | 1,031  |
| Predicted − | 45          | 8,924       | 8,969  |
| Total       | 300         | 9,700       | 10,000 |

Calculations:

  • True Positive: \(300 \times 0.85 = 255\)
  • False Negative: \(300 \times 0.15 = 45\)
  • True Negative: \(9700 \times 0.92 = 8924\)
  • False Positive: \(9700 \times 0.08 = 776\)

  2. \(\text{Pos. Predictive Value} = \frac{255}{1031} \approx 0.247\) or 24.7%

  3. \(\text{Neg. Predictive Value} = \frac{8924}{8969} \approx 0.995\) or 99.5%

  4. Interpretation: A positive result means only about a 25% chance of actually having the disease. This seems counterintuitive given the test’s high true positive and true negative rates, but the base rate is low (3%): the 776 false positives from the large healthy population outnumber the 255 true positives about three to one. A negative result is very reassuring (99.5% chance of being healthy).
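
The natural-frequency approach used in this solution (count people first, then divide) can be sketched in Python. The function `confusion_counts` and the dictionary keys are illustrative names, not part of the task.

```python
def confusion_counts(population, sensitivity, specificity, base_rate):
    """Split a population into the four cells of the test/condition table."""
    sick = round(population * base_rate)
    healthy = population - sick
    tp = round(sick * sensitivity)      # sick and test positive
    fn = sick - tp                      # sick but test negative
    tn = round(healthy * specificity)   # healthy and test negative
    fp = healthy - tn                   # healthy but test positive
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


cells = confusion_counts(10_000, sensitivity=0.85, specificity=0.92, base_rate=0.03)
print(cells)  # {'TP': 255, 'FP': 776, 'FN': 45, 'TN': 8924}

# Predictive values are read off the rows of the table
ppv = cells["TP"] / (cells["TP"] + cells["FP"])   # 255 / 1031
npv = cells["TN"] / (cells["TN"] + cells["FN"])   # 8924 / 8969
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")  # PPV = 0.247, NPV = 0.995
```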

Problem 4: Factory Quality (xx)

A factory has two machines:

  • Machine A produces 70% of output, 4% defect rate
  • Machine B produces 30% of output, 6% defect rate

  1. What is the overall defect rate?
  2. A defective item is found. What’s the probability it came from Machine A?
  3. Create a contingency table for 1000 items
  4. Verify your answer to part 2 using the table

Solution:

  1. \(P(D) = P(D|A)P(A) + P(D|B)P(B) = 0.04(0.70) + 0.06(0.30) = 0.028 + 0.018 = 0.046 = 4.6\%\)

  2. \(P(A|D) = \frac{P(D|A)P(A)}{P(D)} = \frac{0.04 \times 0.70}{0.046} = \frac{0.028}{0.046} \approx 0.609\)

  3. Table for 1000 items:

|           | Machine A | Machine B | Total |
|-----------|-----------|-----------|-------|
| Defective | 28        | 18        | 46    |
| Good      | 672       | 282       | 954   |
| Total     | 700       | 300       | 1000  |

  4. From table: \(P(A|D) = \frac{28}{46} \approx 0.609\)
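
The same two steps (law of total probability, then Bayes) extend to any number of machines. A minimal sketch, assuming the machine data is held in a dictionary; the variable names and key names are made up for illustration.

```python
machines = {
    "A": {"share": 0.70, "defect_rate": 0.04},
    "B": {"share": 0.30, "defect_rate": 0.06},
}

# Law of total probability: P(D) = sum over machines of P(D | M) * P(M)
p_defect = sum(m["share"] * m["defect_rate"] for m in machines.values())

# Bayes: P(M | D) = P(D | M) * P(M) / P(D) for each machine M
posterior = {
    name: m["share"] * m["defect_rate"] / p_defect
    for name, m in machines.items()
}

print(f"P(D) = {p_defect:.3f}")  # P(D) = 0.046
for name, p in posterior.items():
    print(f"P({name} | D) = {p:.3f}")
```

The posteriors sum to 1, which is a quick sanity check on the arithmetic.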

Problem 5: Exam-Style Problem - 2025 Format (xxx)

In a city, a rapid test for a virus is available:

  • The test correctly identifies 92% of infected people (true positive rate)
  • The test correctly identifies 97% of non-infected people (true negative rate)
  • Currently 8% of the population is infected (base rate)

A person tests positive.

  1. Calculate the probability that this person is actually infected (Positive Predictive Value).

  2. Now suppose the base rate increases to 20% due to an outbreak. Recalculate the Positive Predictive Value.

  3. Explain why the Positive Predictive Value changes with the base rate.

  4. At what base rate would the Positive Predictive Value equal 80%? (Set up the equation and solve)

Solution:

  1. Positive Predictive Value with 8% base rate, using false positive rate \(P(+|D') = 1 - 0.97 = 0.03\) and \(P(D') = 1 - 0.08 = 0.92\): \[P(D|+) = \frac{0.92 \times 0.08}{0.92 \times 0.08 + 0.03 \times 0.92}\] \[= \frac{0.0736}{0.0736 + 0.0276} = \frac{0.0736}{0.1012} \approx 0.727\]

Positive Predictive Value ≈ 72.7%

  2. Positive Predictive Value with 20% base rate: \[P(D|+) = \frac{0.92 \times 0.20}{0.92 \times 0.20 + 0.03 \times 0.80}\] \[= \frac{0.184}{0.184 + 0.024} = \frac{0.184}{0.208} \approx 0.885\]

Positive Predictive Value ≈ 88.5%

  3. Explanation: When the base rate increases, a larger proportion of the tested population is actually infected. This means:
  • More true positives (infected people correctly identified)
  • Fewer false positives relative to true positives (the false positive rate is unchanged, but it applies to a smaller non-infected population)
  • Result: Higher Positive Predictive Value

  4. Finding base rate for Positive Predictive Value = 0.80:

Let \(p\) = base rate

\[0.80 = \frac{0.92p}{0.92p + 0.03(1-p)}\]

\[0.80(0.92p + 0.03 - 0.03p) = 0.92p\] \[0.736p + 0.024 - 0.024p = 0.92p\] \[0.024 = 0.92p - 0.712p\] \[0.024 = 0.208p\] \[p = \frac{0.024}{0.208} \approx 0.115\]

Answer: A base rate of approximately 11.5% yields a Positive Predictive Value of 80%
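
The dependence of the Positive Predictive Value on the base rate, and the inverse question of which base rate gives a target value, can be sketched in Python. The function names `ppv` and `base_rate_for_ppv` are illustrative; the closed form comes from solving \(t = \frac{sp}{sp + f(1-p)}\) for \(p\), where \(s\) is the true positive rate, \(f\) the false positive rate, and \(t\) the target value.

```python
def ppv(sensitivity, specificity, base_rate):
    """Positive Predictive Value P(D | +) via Bayes' theorem."""
    fpr = 1.0 - specificity
    true_pos = sensitivity * base_rate
    false_pos = fpr * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


def base_rate_for_ppv(target, sensitivity, specificity):
    """Solve target = s*p / (s*p + f*(1 - p)) for the base rate p."""
    fpr = 1.0 - specificity
    return target * fpr / (sensitivity * (1.0 - target) + target * fpr)


for p in (0.08, 0.20):
    print(f"base rate {p:.2f} -> PPV {ppv(0.92, 0.97, p):.3f}")

p_star = base_rate_for_ppv(0.80, sensitivity=0.92, specificity=0.97)
print(f"base rate for PPV 0.80: {p_star:.3f}")  # 0.115
```

Plugging `p_star` back into `ppv` returns 0.80 up to floating-point rounding, which confirms the algebra.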

Problem 6: Exam-Style Problem - 2023 Format (xxx)

A company conducts employee surveys. Based on historical data:

  • 60% of employees are satisfied with their job
  • Of satisfied employees, 75% recommend the company to others
  • Of unsatisfied employees, 20% still recommend the company

  1. Create a contingency table for 500 employees
  2. What proportion of employees recommend the company?
  3. An employee recommends the company. What’s the probability they are satisfied?
  4. Are satisfaction and recommendation independent? Justify with calculations.

Solution:

  1. Contingency Table for 500 employees:

300 satisfied, 200 unsatisfied

|                 | Satisfied | Unsatisfied | Total |
|-----------------|-----------|-------------|-------|
| Recommend       | 225       | 40          | 265   |
| Don’t Recommend | 75        | 160         | 235   |
| Total           | 300       | 200         | 500   |

Calculations:

  • Satisfied & Recommend: \(300 \times 0.75 = 225\)
  • Satisfied & Don’t: \(300 \times 0.25 = 75\)
  • Unsatisfied & Recommend: \(200 \times 0.20 = 40\)
  • Unsatisfied & Don’t: \(200 \times 0.80 = 160\)

  2. \(P(\text{Recommend}) = \frac{265}{500} = 0.53 = 53\%\)

  3. \(P(\text{Satisfied}|\text{Recommend}) = \frac{225}{265} \approx 0.849\)

  4. Independence test: \(P(\text{S}) \times P(\text{R}) = 0.60 \times 0.53 = 0.318\), while \(P(\text{S} \cap \text{R}) = \frac{225}{500} = 0.45\).

    Since \(0.318 \neq 0.45\), the events are NOT independent.

    This makes sense: satisfied employees are much more likely to recommend (75% vs 20%).
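
Because every probability in this problem is a simple percentage, the whole solution can be checked with exact rational arithmetic. A minimal sketch using Python's standard `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

p_s = Fraction(60, 100)               # P(Satisfied)
p_r_given_s = Fraction(75, 100)       # P(Recommend | Satisfied)
p_r_given_not_s = Fraction(20, 100)   # P(Recommend | Unsatisfied)

# Law of total probability
p_r = p_r_given_s * p_s + p_r_given_not_s * (1 - p_s)

# Joint probability and Bayes
p_s_and_r = p_r_given_s * p_s
p_s_given_r = p_s_and_r / p_r

# Exact comparison is safe here because Fractions do not round
independent = p_s_and_r == p_s * p_r

print(p_r)               # 53/100
print(500 * p_s_and_r)   # 225
print(p_s_given_r)       # 45/53
print(independent)       # False
```

Using `Fraction` instead of floats means the independence test is an exact equality check rather than a tolerance-based comparison.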