Tasks 07-05 - Bayes’ Theorem
Section 07: Probability & Statistics
Problem 1: Basic Bayes’ Formula (x)
In a factory, 60% of products come from Machine A and 40% from Machine B. Machine A has a 2% defect rate, while Machine B has a 5% defect rate.
- What is the probability that a randomly selected product is defective?
- If a product is defective, what is the probability it came from Machine A?
Let D = defective, A = from Machine A, B = from Machine B
Given: P(A) = 0.6, P(B) = 0.4, P(D|A) = 0.02, P(D|B) = 0.05
Using Law of Total Probability: \(P(D) = P(D|A)P(A) + P(D|B)P(B)\) \(P(D) = 0.02 \times 0.6 + 0.05 \times 0.4 = 0.012 + 0.02 = 0.032\) (3.2%)
Using Bayes’ Theorem: \(P(A|D) = \frac{P(D|A)P(A)}{P(D)} = \frac{0.02 \times 0.6}{0.032} = \frac{0.012}{0.032} = 0.375\) (37.5%)
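The two steps above (total probability, then Bayes’ Theorem) can be sketched in Python. The function name `posterior` and the dictionary layout are illustrative choices, not part of the task itself.

```python
def posterior(priors, likelihoods):
    """Return (P(E), {H: P(H|E)}) for a partition of hypotheses H.

    priors:      {H: P(H)}, assumed to sum to 1
    likelihoods: {H: P(E|H)}
    """
    # Law of total probability: P(E) = sum over H of P(E|H) * P(H)
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(joint.values())
    # Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
    return evidence, {h: j / evidence for h, j in joint.items()}


p_defective, by_machine = posterior(
    priors={"A": 0.6, "B": 0.4},
    likelihoods={"A": 0.02, "B": 0.05},
)
print(round(p_defective, 3))      # 0.032
print(round(by_machine["A"], 3))  # 0.375
```

The same helper works for any number of sources, as long as the priors cover every case exactly once.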
Problem 2: Medical Testing Basics (x)
A disease affects 1% of the population. A screening test has:
- True positive rate (sensitivity): 95%
- True negative rate (specificity): 90%
- If a person tests positive, what is the probability they have the disease (Positive Predictive Value)?
- If a person tests negative, what is the probability they don’t have the disease (Negative Predictive Value)?
Let D = has disease, T+ = tests positive, T- = tests negative
Given: P(D) = 0.01, P(D’) = 0.99, P(T+|D) = 0.95, P(T-|D’) = 0.90, so P(T-|D) = 0.05 and P(T+|D’) = 0.10
Positive Predictive Value = P(D|T+)
First find P(T+): \(P(T+) = P(T+|D)P(D) + P(T+|D')P(D')\) \(P(T+) = 0.95 \times 0.01 + 0.10 \times 0.99 = 0.0095 + 0.099 = 0.1085\)
\(P(D|T+) = \frac{P(T+|D)P(D)}{P(T+)} = \frac{0.0095}{0.1085} = 0.0876\) (8.76%)
Negative Predictive Value = P(D’|T-)
First find P(T-): \(P(T-) = P(T-|D)P(D) + P(T-|D')P(D')\) \(P(T-) = 0.05 \times 0.01 + 0.90 \times 0.99 = 0.0005 + 0.891 = 0.8915\)
\(P(D'|T-) = \frac{P(T-|D')P(D')}{P(T-)} = \frac{0.891}{0.8915} = 0.9994\) (99.94%)
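As a cross-check, both predictive values can be computed from the four joint probabilities (true/false positives and negatives). This is a minimal sketch; the function and parameter names are assumptions made for illustration.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a binary test."""
    tp = sensitivity * prevalence              # P(T+ and D)
    fp = (1 - specificity) * (1 - prevalence)  # P(T+ and D')
    fn = (1 - sensitivity) * prevalence        # P(T- and D)
    tn = specificity * (1 - prevalence)        # P(T- and D')
    return tp / (tp + fp), tn / (tn + fn)


ppv, npv = predictive_values(prevalence=0.01, sensitivity=0.95, specificity=0.90)
print(f"PPV = {ppv:.4f}, NPV = {npv:.4f}")  # PPV = 0.0876, NPV = 0.9994
```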
Problem 3: Quality Control (xx)
A company has three suppliers:
- Supplier A: 50% of parts, 4% defect rate
- Supplier B: 30% of parts, 2% defect rate
- Supplier C: 20% of parts, 6% defect rate
- What is the overall defect rate?
- A part is found defective. What is the probability it came from Supplier C?
- Which supplier should be investigated first for quality issues?
Let D = defective
\(P(D) = P(D|A)P(A) + P(D|B)P(B) + P(D|C)P(C)\) \(P(D) = 0.04 \times 0.5 + 0.02 \times 0.3 + 0.06 \times 0.2\) \(P(D) = 0.02 + 0.006 + 0.012 = 0.038\) (3.8%)
\(P(C|D) = \frac{P(D|C)P(C)}{P(D)} = \frac{0.06 \times 0.2}{0.038} = \frac{0.012}{0.038} = 0.316\) (31.6%)
Calculate probability of each supplier given defect:
\(P(A|D) = \frac{0.04 \times 0.5}{0.038} = \frac{0.02}{0.038} = 0.526\) (52.6%)
\(P(B|D) = \frac{0.02 \times 0.3}{0.038} = \frac{0.006}{0.038} = 0.158\) (15.8%)
\(P(C|D) = 0.316\) (31.6%, as computed above)
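Investigate Supplier A first: it accounts for the largest share of defective parts (52.6%) because it supplies 50% of all parts, even though its 4% defect rate is not the highest. Supplier C has the highest defect rate (6%) and accounts for 31.6% of defective parts, so it is the natural second candidate.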
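The supplier ranking can also be reproduced with exact arithmetic, which avoids rounding in the intermediate steps. The sketch below uses Python's standard `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

# (share of parts, defect rate) per supplier, as exact fractions
suppliers = {
    "A": (Fraction(50, 100), Fraction(4, 100)),
    "B": (Fraction(30, 100), Fraction(2, 100)),
    "C": (Fraction(20, 100), Fraction(6, 100)),
}

# P(D and supplier) for each supplier, then P(D) by total probability
joint = {name: share * rate for name, (share, rate) in suppliers.items()}
p_defect = sum(joint.values())  # 19/500 = 0.038

# P(supplier | D) by Bayes' theorem
shares_of_defects = {name: j / p_defect for name, j in joint.items()}

# Rank suppliers by their share of all defective parts, largest first
ranking = sorted(shares_of_defects, key=shares_of_defects.get, reverse=True)
print(p_defect, ranking)  # 19/500 ['A', 'C', 'B']
```

In exact form the posteriors are 10/19, 3/19 and 6/19 for A, B and C, which round to the 52.6%, 15.8% and 31.6% found above.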
Problem 4: Two-Stage Testing (xx)
A rare disease affects 0.5% of the population. A two-stage testing protocol:
- Stage 1 test: 98% true positive rate, 85% true negative rate
- Stage 2 test (only if Stage 1 positive): 99% true positive rate, 95% true negative rate
- What is the probability of testing positive in Stage 1?
- Given a positive Stage 1 result, what is the probability of having the disease?
- Given positive results in both stages, what is the probability of having the disease?
Let D = disease, T1+ = Stage 1 positive, T2+ = Stage 2 positive
Given: P(D) = 0.005, P(D’) = 0.995, P(T1+|D) = 0.98, P(T1+|D’) = 1 - 0.85 = 0.15, P(T2+|D) = 0.99, P(T2+|D’) = 1 - 0.95 = 0.05
\(P(T1+) = P(T1+|D)P(D) + P(T1+|D')P(D')\) \(P(T1+) = 0.98 \times 0.005 + 0.15 \times 0.995 = 0.0049 + 0.14925 = 0.15415\)
15.4% probability of positive Stage 1
\(P(D|T1+) = \frac{P(T1+|D)P(D)}{P(T1+)} = \frac{0.0049}{0.15415} = 0.0318\)
3.18% probability of disease after Stage 1 positive
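After a positive Stage 1 result, the “prior” for Stage 2 is P(D|T1+) = 0.0318. This assumes the two tests are conditionally independent given disease status, so that P(T2+|D,T1+) = P(T2+|D) = 0.99 and P(T2+|D’,T1+) = P(T2+|D’) = 0.05.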
Using updated probabilities: \(P(T2+|T1+) = P(T2+|D,T1+)P(D|T1+) + P(T2+|D',T1+)P(D'|T1+)\) \(P(T2+|T1+) = 0.99 \times 0.0318 + 0.05 \times 0.9682 = 0.0315 + 0.0484 = 0.0799\)
\(P(D|T1+,T2+) = \frac{P(T2+|D)P(D|T1+)}{P(T2+|T1+)} = \frac{0.99 \times 0.0318}{0.0799} = \frac{0.0315}{0.0799} = 0.394\)
39.4% probability of disease after both tests positive
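The sequential update (the posterior after Stage 1 becomes the prior for Stage 2) can be sketched as follows. The helper name `update` is hypothetical, and the code assumes the two tests are conditionally independent given disease status. The last lines cross-check the result by conditioning on both positives in a single step.

```python
def update(prior, tpr, tnr):
    """Posterior P(D | positive result) for one test, given the prior P(D)."""
    true_pos = tpr * prior
    false_pos = (1 - tnr) * (1 - prior)
    return true_pos / (true_pos + false_pos)


after_stage1 = update(prior=0.005, tpr=0.98, tnr=0.85)
after_stage2 = update(prior=after_stage1, tpr=0.99, tnr=0.95)

# Cross-check: condition on both positives at once
joint_d = 0.005 * 0.98 * 0.99
joint_not_d = 0.995 * 0.15 * 0.05
direct = joint_d / (joint_d + joint_not_d)

print(f"{after_stage1:.4f} {after_stage2:.4f} {direct:.4f}")  # 0.0318 0.3940 0.3940
```

Updating step by step and conditioning on both results at once give the same answer under the independence assumption.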
Problem 5: Email Spam Filter (xx)
A spam filter classifies emails. From historical data:
- 30% of emails are spam
- When an email is spam, the filter correctly identifies it 92% of the time
- When an email is not spam, the filter incorrectly marks it as spam 8% of the time
- What percentage of all emails are marked as spam by the filter?
- If an email is marked as spam, what is the probability it’s actually spam?
- If an email passes the filter (not marked as spam), what is the probability it’s actually not spam?
Let S = spam, M = marked as spam
Given: P(S) = 0.30, P(S’) = 0.70, P(M|S) = 0.92 (true positive rate), P(M|S’) = 0.08 (false positive rate)
\(P(M) = P(M|S)P(S) + P(M|S')P(S')\) \(P(M) = 0.92 \times 0.30 + 0.08 \times 0.70 = 0.276 + 0.056 = 0.332\)
33.2% of emails are marked as spam
\(P(S|M) = \frac{P(M|S)P(S)}{P(M)} = \frac{0.276}{0.332} = 0.831\)
83.1% of marked emails are actually spam (precision)
\(P(M') = 1 - 0.332 = 0.668\)
\(P(S'|M') = \frac{P(M'|S')P(S')}{P(M')} = \frac{0.92 \times 0.70}{0.668} = \frac{0.644}{0.668} = 0.964\)
96.4% of passed emails are legitimate
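The same answers can be read off a table of counts ("natural frequencies"). The sketch below assumes a hypothetical batch of 1,000 emails; the batch size is only an illustration and does not come from the task.

```python
total = 1000                      # hypothetical batch of emails
spam = total * 30 // 100          # 300 spam
ham = total - spam                # 700 legitimate

spam_marked = spam * 92 // 100    # 276 spam correctly marked
ham_marked = ham * 8 // 100       # 56 legitimate wrongly marked
spam_passed = spam - spam_marked  # 24 spam that slip through
ham_passed = ham - ham_marked     # 644 legitimate that pass

marked = spam_marked + ham_marked  # 332
passed = spam_passed + ham_passed  # 668

print(f"marked: {marked / total:.1%}")                     # marked: 33.2%
print(f"P(spam | marked): {spam_marked / marked:.1%}")     # P(spam | marked): 83.1%
print(f"P(not spam | passed): {ham_passed / passed:.1%}")  # P(not spam | passed): 96.4%
```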
Problem 6: Effect of Base Rate on Positive Predictive Value (xxx)
A test has a true positive rate of 95% and a true negative rate of 90%. Calculate the Positive Predictive Value for three different base rates:
- Base rate = 10% (common condition)
- Base rate = 1% (uncommon condition)
- Base rate = 0.1% (rare condition)
- What pattern do you observe? Explain why this matters for screening.
True positive rate = P(T+|D) = 0.95
True negative rate = P(T-|D’) = 0.90, so P(T+|D’) = 0.10
Base rate 10%: \(P(T+) = 0.95 \times 0.10 + 0.10 \times 0.90 = 0.095 + 0.090 = 0.185\) \(P(D|T+) = \frac{0.095}{0.185} = 0.514\) (51.4%)
Base rate 1%: \(P(T+) = 0.95 \times 0.01 + 0.10 \times 0.99 = 0.0095 + 0.099 = 0.1085\) \(P(D|T+) = \frac{0.0095}{0.1085} = 0.088\) (8.8%)
Base rate 0.1%: \(P(T+) = 0.95 \times 0.001 + 0.10 \times 0.999 = 0.00095 + 0.0999 = 0.10085\) \(P(D|T+) = \frac{0.00095}{0.10085} = 0.0094\) (0.94%)
Pattern observed:
| Base Rate | Pos. Predictive Value |
|---|---|
| 10% | 51.4% |
| 1% | 8.8% |
| 0.1% | 0.94% |

As the base rate decreases, the Positive Predictive Value drops dramatically.
Why this matters: For rare conditions, even highly accurate tests produce mostly false positives. This is why:
- Screening for rare conditions requires a very high true negative rate
- Confirmatory tests are needed after positive screening results
- Mass screening for very rare conditions is hard to justify unless the true negative rate is very high and positives are followed by a confirmatory test
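One way to see why the base rate dominates is the odds form of Bayes’ Theorem: posterior odds = prior odds × likelihood ratio. This form is not used in the solution above and is shown here only as an equivalent alternative. For this test the likelihood ratio is fixed at 0.95 / 0.10 = 9.5, so the posterior can only be as large as the prior odds allow. The function name is illustrative.

```python
def ppv_from_odds(base_rate, tpr=0.95, tnr=0.90):
    """PPV via the odds form of Bayes' theorem."""
    likelihood_ratio = tpr / (1 - tnr)        # 0.95 / 0.10 = 9.5
    prior_odds = base_rate / (1 - base_rate)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability
    return posterior_odds / (1 + posterior_odds)


for base_rate in (0.10, 0.01, 0.001):
    print(f"base rate {base_rate:>6.1%} -> PPV {ppv_from_odds(base_rate):.2%}")
```

A positive result multiplies the odds by 9.5 regardless of the base rate, which is far too little to make a 1-in-1000 condition likely.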
Problem 7: What Base Rate is Needed? (xxx)
A new diagnostic test has a true positive rate of 90% and a true negative rate of 95%.
- What base rate would be needed for a Positive Predictive Value of 50%?
- What base rate would be needed for a Positive Predictive Value of 80%?
- What base rate would be needed for a Positive Predictive Value of 95%?
Let p = base rate = P(D)
True positive rate = 0.90, True negative rate = 0.95, so P(T+|D’) = 0.05
\(PPV = \frac{P(T+|D) \cdot p}{P(T+|D) \cdot p + P(T+|D') \cdot (1-p)} = \frac{0.90p}{0.90p + 0.05(1-p)}\)
Setting Positive Predictive Value = target and solving for p:
\(PPV = \frac{0.90p}{0.90p + 0.05 - 0.05p} = \frac{0.90p}{0.85p + 0.05}\)
\(PPV(0.85p + 0.05) = 0.90p\)
\(0.85 \cdot PPV \cdot p + 0.05 \cdot PPV = 0.90p\)
\(0.05 \cdot PPV = 0.90p - 0.85 \cdot PPV \cdot p\)
\(0.05 \cdot PPV = p(0.90 - 0.85 \cdot PPV)\)
\(p = \frac{0.05 \cdot PPV}{0.90 - 0.85 \cdot PPV}\)
For Pos. Predictive Value = 50%: \(p = \frac{0.05 \times 0.50}{0.90 - 0.85 \times 0.50} = \frac{0.025}{0.90 - 0.425} = \frac{0.025}{0.475} = 0.053\) Base rate needed: 5.3%
For Pos. Predictive Value = 80%: \(p = \frac{0.05 \times 0.80}{0.90 - 0.85 \times 0.80} = \frac{0.04}{0.90 - 0.68} = \frac{0.04}{0.22} = 0.182\) Base rate needed: 18.2%
For Pos. Predictive Value = 95%: \(p = \frac{0.05 \times 0.95}{0.90 - 0.85 \times 0.95} = \frac{0.0475}{0.90 - 0.8075} = \frac{0.0475}{0.0925} = 0.514\) Base rate needed: 51.4%
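The closed-form solution for p can be checked with a round trip: compute the base rate for a target, then plug it back into the Positive Predictive Value formula. This is a sketch with illustrative function names; the general formula in the comment reduces to the one derived above when the true positive rate is 0.90 and the false positive rate is 0.05.

```python
def base_rate_for_ppv(target_ppv, tpr=0.90, tnr=0.95):
    """Base rate p at which the test reaches the target PPV."""
    fpr = 1 - tnr
    # From PPV = tpr*p / (tpr*p + fpr*(1 - p)), solved for p
    return fpr * target_ppv / (tpr - (tpr - fpr) * target_ppv)


def ppv(p, tpr=0.90, tnr=0.95):
    """Positive predictive value at base rate p."""
    fpr = 1 - tnr
    return tpr * p / (tpr * p + fpr * (1 - p))


for target in (0.50, 0.80, 0.95):
    p = base_rate_for_ppv(target)
    # Round trip: plugging p back in should recover the target PPV
    print(f"PPV {target:.0%}: base rate {p:.1%}, check {ppv(p):.2%}")
```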
Problem 8: Insurance Risk Assessment (xxx)
An insurance company categorizes drivers:
- 20% are “high risk” (accident probability 15%)
- 50% are “medium risk” (accident probability 5%)
- 30% are “low risk” (accident probability 1%)
- What is the overall accident probability for a randomly selected driver?
- A new customer has an accident in their first year. What is the probability they are high risk?
- The company wants to assign risk categories based on accident history. After 3 accident-free years, what is the probability a driver is low risk? (Assume independence)
Let A = accident, H = high risk, M = medium risk, L = low risk
Given: P(H) = 0.20, P(M) = 0.50, P(L) = 0.30, P(A|H) = 0.15, P(A|M) = 0.05, P(A|L) = 0.01
\(P(A) = P(A|H)P(H) + P(A|M)P(M) + P(A|L)P(L)\) \(P(A) = 0.15 \times 0.20 + 0.05 \times 0.50 + 0.01 \times 0.30\) \(P(A) = 0.03 + 0.025 + 0.003 = 0.058\) (5.8%)
\(P(H|A) = \frac{P(A|H)P(H)}{P(A)} = \frac{0.03}{0.058} = 0.517\) (51.7%)
Let A’ = no accident in a year. For 3 accident-free years (A’A’A’):
\(P(A'A'A'|H) = (0.85)^3 = 0.614\) \(P(A'A'A'|M) = (0.95)^3 = 0.857\) \(P(A'A'A'|L) = (0.99)^3 = 0.970\)
\(P(A'A'A') = 0.614 \times 0.20 + 0.857 \times 0.50 + 0.970 \times 0.30\) \(= 0.1228 + 0.4285 + 0.291 = 0.8423\)
\(P(L|A'A'A') = \frac{P(A'A'A'|L)P(L)}{P(A'A'A')} = \frac{0.970 \times 0.30}{0.8423} = \frac{0.291}{0.8423} = 0.345\)
34.5% probability of being low risk after 3 accident-free years
(Note: this is up from the prior of 30%, because an accident-free record is evidence of lower risk)
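The accident-free update generalizes to any number of years. The sketch below assumes, as the task does, that years are independent given the risk category; the function name and the choice of 0, 3 and 10 years are illustrative.

```python
priors = {"high": 0.20, "medium": 0.50, "low": 0.30}
accident_prob = {"high": 0.15, "medium": 0.05, "low": 0.01}


def posterior_after_clean_years(years):
    """P(category | `years` consecutive accident-free years)."""
    # Years are assumed independent given the category, so the likelihood
    # of a clean record is (1 - accident probability) ** years.
    joint = {c: (1 - accident_prob[c]) ** years * priors[c] for c in priors}
    total = sum(joint.values())
    return {c: j / total for c, j in joint.items()}


for years in (0, 3, 10):
    post = posterior_after_clean_years(years)
    print(years, {c: round(p, 3) for c, p in post.items()})
```

With zero years of history the function returns the priors unchanged, and each additional clean year shifts probability from the high-risk category toward the low-risk one.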
Problem 9: Serial Testing (xxx)
A condition has a 5% base rate. Two independent tests are available (that is, their results are independent given whether the condition is present):
- Test A: 90% true positive rate, 85% true negative rate
- Test B: 95% true positive rate, 80% true negative rate
Compare two testing strategies:
- Strategy 1: Use only Test A. What is the Positive Predictive Value?
- Strategy 2: First use Test B, then if positive, confirm with Test A. What is the final Positive Predictive Value?
- Which strategy is better? Why might a system choose the strategy with the worse Positive Predictive Value?
P(D) = 0.05, P(D’) = 0.95
Strategy 1 (Test A only): \(P(A+) = 0.90 \times 0.05 + 0.15 \times 0.95 = 0.045 + 0.1425 = 0.1875\) \(P(D|A+) = \frac{0.045}{0.1875} = 0.24\) (24%)
Strategy 2 (Test B then Test A):
First, probability of positive on Test B: \(P(B+) = 0.95 \times 0.05 + 0.20 \times 0.95 = 0.0475 + 0.19 = 0.2375\)
After B+, updated probability of condition: \(P(D|B+) = \frac{0.0475}{0.2375} = 0.20\)
Now apply Test A to this population: \(P(A+|B+) = 0.90 \times 0.20 + 0.15 \times 0.80 = 0.18 + 0.12 = 0.30\)
\(P(D|A+,B+) = \frac{0.90 \times 0.20}{0.30} = \frac{0.18}{0.30} = 0.60\) (60%)
Strategy 2 is better (60% vs 24% Positive Predictive Value)
Why choose the strategy with the worse Positive Predictive Value?
- Cost: Strategy 1 runs one test per person, while Strategy 2 adds a Test A for everyone who is positive on Test B (23.75% of those tested)
- Speed: Only one test vs. two
- True positive rate: Strategy 2 misses more true cases, because a case must test positive on both tests to be detected
  - Strategy 1 overall true positive rate: 90%
  - Strategy 2 overall true positive rate: 0.95 × 0.90 = 85.5%
- If the consequence of a false negative is severe, the single-test strategy might be preferred
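The trade-off between the two strategies can be put side by side in a few lines. This sketch assumes the tests are conditionally independent given the condition, and treats a strategy as "positive" only when every test in it is positive; the function name is illustrative.

```python
base_rate = 0.05
test_a = {"tpr": 0.90, "tnr": 0.85}
test_b = {"tpr": 0.95, "tnr": 0.80}


def strategy_metrics(tests):
    """PPV and overall sensitivity when every listed test must be positive.

    Assumes the tests are conditionally independent given disease status.
    """
    p_all_pos_given_d = 1.0      # overall true positive rate
    p_all_pos_given_not_d = 1.0  # overall false positive rate
    for t in tests:
        p_all_pos_given_d *= t["tpr"]
        p_all_pos_given_not_d *= 1 - t["tnr"]
    true_pos = p_all_pos_given_d * base_rate
    false_pos = p_all_pos_given_not_d * (1 - base_rate)
    return true_pos / (true_pos + false_pos), p_all_pos_given_d


ppv1, sens1 = strategy_metrics([test_a])
ppv2, sens2 = strategy_metrics([test_b, test_a])
print(f"Strategy 1: PPV {ppv1:.1%}, sensitivity {sens1:.1%}")  # PPV 24.0%, sensitivity 90.0%
print(f"Strategy 2: PPV {ppv2:.1%}, sensitivity {sens2:.1%}")  # PPV 60.0%, sensitivity 85.5%
```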
Problem 10: Exam-Style Problem (xxxx)
A company produces electronic components. Components are manufactured by three machines:
- Machine 1: produces 40% of output, 3% defect rate
- Machine 2: produces 35% of output, 2% defect rate
- Machine 3: produces 25% of output, 5% defect rate
An automatic inspection system tests each component:
- If a component is defective, the system detects it with 95% probability
- If a component is good, the system incorrectly flags it as defective with 8% probability
- What is the overall probability that a randomly selected component is defective?
- What is the probability that a component flagged by the inspection system is actually defective?
- Given that a component passed inspection, what is the probability it came from Machine 2?
- The company wants to improve quality. If they eliminate Machine 3, what would be the new overall defect rate?
Let D = defective, F = flagged by inspection
Given: P(M1) = 0.40, P(M2) = 0.35, P(M3) = 0.25
P(D|M1) = 0.03, P(D|M2) = 0.02, P(D|M3) = 0.05
P(F|D) = 0.95, P(F|D’) = 0.08
Overall defect rate: \(P(D) = P(D|M1)P(M1) + P(D|M2)P(M2) + P(D|M3)P(M3)\) \(P(D) = 0.03 \times 0.40 + 0.02 \times 0.35 + 0.05 \times 0.25\) \(P(D) = 0.012 + 0.007 + 0.0125 = 0.0315\) (3.15%)
P(D|F): \(P(F) = P(F|D)P(D) + P(F|D')P(D')\) \(P(F) = 0.95 \times 0.0315 + 0.08 \times 0.9685\) \(P(F) = 0.029925 + 0.07748 = 0.107405 \approx 0.1074\)
\(P(D|F) = \frac{P(F|D)P(D)}{P(F)} = \frac{0.029925}{0.107405} = 0.279\) (27.9%)
P(M2|F’) where F’ = passed inspection: \(P(F') = 1 - 0.1074 = 0.8926\)
Need P(F’|M2): \(P(D|M2) = 0.02\), so \(P(D'|M2) = 0.98\) \(P(F'|M2) = P(F'|D,M2)P(D|M2) + P(F'|D',M2)P(D'|M2)\) \(P(F'|M2) = 0.05 \times 0.02 + 0.92 \times 0.98 = 0.001 + 0.9016 = 0.9026\)
\(P(M2|F') = \frac{P(F'|M2)P(M2)}{P(F')} = \frac{0.9026 \times 0.35}{0.8926} = \frac{0.3159}{0.8926} = 0.354\) (35.4%)
(Slightly higher than the prior of 35%, because Machine 2 has the lowest defect rate and its components are therefore slightly more likely to pass inspection)
Without Machine 3: New proportions: M1 = 40/(40+35) = 53.3%, M2 = 35/(40+35) = 46.7%
\(P(D) = 0.03 \times 0.533 + 0.02 \times 0.467 = 0.016 + 0.0093 = 0.0253\) (2.53%)
Reduction from 3.15% to 2.53% (a relative reduction of about 20%)
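With three variables in play (machine, defective, flagged), it can be easier to enumerate the full joint distribution and sum over the outcomes of interest. The sketch below does this with `itertools.product`; the helper `prob` is an illustrative name, and the code assumes the inspection system behaves the same for every machine. It uses unrounded intermediates, so the third decimal can differ from a hand calculation that rounds along the way.

```python
from itertools import product

machine_share = {"M1": 0.40, "M2": 0.35, "M3": 0.25}
defect_rate = {"M1": 0.03, "M2": 0.02, "M3": 0.05}
flag_rate = {True: 0.95, False: 0.08}  # P(flagged | defective?), same for all machines

# Joint probability of every (machine, defective, flagged) outcome
joint = {}
for m, d, f in product(machine_share, (True, False), (True, False)):
    p_d = defect_rate[m] if d else 1 - defect_rate[m]
    p_f = flag_rate[d] if f else 1 - flag_rate[d]
    joint[(m, d, f)] = machine_share[m] * p_d * p_f


def prob(event):
    """Total probability of the outcomes (m, d, f) for which event is true."""
    return sum(p for outcome, p in joint.items() if event(*outcome))


p_defective = prob(lambda m, d, f: d)
p_def_given_flag = prob(lambda m, d, f: d and f) / prob(lambda m, d, f: f)
p_m2_given_pass = prob(lambda m, d, f: m == "M2" and not f) / prob(lambda m, d, f: not f)
print(f"{p_defective:.4f} {p_def_given_flag:.3f} {p_m2_given_pass:.3f}")
```

Every conditional probability in the problem is then a ratio of two calls to `prob`.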
Problem 11: Tree Diagram Analysis (xxx)
Draw a probability tree and solve:
A marketing campaign targets customers. Historical data shows:
- 40% of customers receive email marketing
- 50% of customers receive phone marketing
- 30% of customers receive both
- For customers who receive email marketing only: 10% purchase
- For customers who receive phone marketing only: 15% purchase
- For customers who receive both: 25% purchase
- For customers who receive neither: 2% purchase
- What is the overall purchase rate?
- If a customer made a purchase, what is the probability they received both types of marketing?
- Which marketing channel is most effective in terms of increasing purchase probability?
Interpreting the customer segments:
- Email only: 10%
- Phone only: 20%
- Both: 30%
- Neither: 40%
(These sum to 100%)
Given purchase rates: P(Purchase|Email only) = 0.10 P(Purchase|Phone only) = 0.15 P(Purchase|Both) = 0.25 P(Purchase|Neither) = 0.02
Overall purchase rate: \(P(Purchase) = 0.10 \times 0.10 + 0.15 \times 0.20 + 0.25 \times 0.30 + 0.02 \times 0.40\) \(= 0.01 + 0.03 + 0.075 + 0.008 = 0.123\) (12.3%)
P(Both|Purchase): \(P(Both|Purchase) = \frac{P(Purchase|Both)P(Both)}{P(Purchase)} = \frac{0.25 \times 0.30}{0.123} = \frac{0.075}{0.123} = 0.610\)
(61.0%)
Marketing effectiveness:
Comparing to baseline (neither) of 2%:
- Email only: 10% (5× baseline)
- Phone only: 15% (7.5× baseline)
- Both: 25% (12.5× baseline)
Phone marketing alone is more effective than email alone, but the combination is best. The combined rate (25%) slightly exceeds what purely additive effects would give (2% baseline + 8% email lift + 13% phone lift = 23%), which suggests a small positive interaction between the channels.
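The segment split is the step most likely to go wrong, so it is worth checking in code. The sketch below derives the four disjoint segments from the overlapping percentages with inclusion-exclusion and then repeats the purchase calculations; the variable names are illustrative.

```python
p_email, p_phone, p_both = 0.40, 0.50, 0.30

# Split the overlapping groups into four disjoint segments
segments = {
    "email only": p_email - p_both,
    "phone only": p_phone - p_both,
    "both": p_both,
    "neither": 1 - (p_email + p_phone - p_both),  # inclusion-exclusion
}
purchase_rate = {"email only": 0.10, "phone only": 0.15, "both": 0.25, "neither": 0.02}

# Law of total probability over the segments, then Bayes' theorem
p_purchase = sum(segments[s] * purchase_rate[s] for s in segments)
p_both_given_purchase = segments["both"] * purchase_rate["both"] / p_purchase
print(f"{p_purchase:.3f} {p_both_given_purchase:.3f}")  # 0.123 0.610
```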
Problem 12: Comprehensive Medical Testing (xxxx)
A hospital screens for a disease with the following characteristics:
- Base rate in the screening population: 2%
- Screening test: 92% true positive rate, 88% true negative rate
- Confirmatory test: 99% true positive rate, 97% true negative rate
Testing protocol: All patients get the screening test. Those who test positive get the confirmatory test.
- What percentage of the screening population will need the confirmatory test?
- Of those who test positive on the screening test, what percentage will test positive on the confirmatory test?
- If a patient tests positive on both tests, what is the probability they have the disease?
- The hospital can only afford confirmatory tests for 10% of patients. What true negative rate would the screening test need to achieve this?
Given: P(D) = 0.02, P(D’) = 0.98
Screening: TPR = 0.92, TNR = 0.88 (so FPR = 0.12)
Confirmatory: TPR = 0.99, TNR = 0.97 (so FPR = 0.03)
Percentage needing confirmatory test = P(Screen+): \(P(S+) = 0.92 \times 0.02 + 0.12 \times 0.98 = 0.0184 + 0.1176 = 0.136\)
13.6% will need confirmatory testing
P(Confirm+|Screen+): First, find P(D|Screen+): \(P(D|S+) = \frac{0.0184}{0.136} = 0.1353\)
\(P(C+|S+) = P(C+|D)P(D|S+) + P(C+|D')P(D'|S+)\) \(= 0.99 \times 0.1353 + 0.03 \times 0.8647\) \(= 0.134 + 0.026 = 0.160\)
16.0% of those screened positive will confirm positive
P(D|S+,C+): \(P(D|S+,C+) = \frac{P(C+|D)P(D|S+)}{P(C+|S+)} = \frac{0.99 \times 0.1353}{0.160} = \frac{0.134}{0.160} = 0.838\)
83.8% probability of disease after both positive
Required true negative rate for P(S+) = 0.10: Let true negative rate = x, so false positive rate = (1-x)
\(P(S+) = 0.92 \times 0.02 + (1-x) \times 0.98 = 0.10\)
\(0.0184 + 0.98 - 0.98x = 0.10\)
\(0.9984 - 0.98x = 0.10\)
\(0.98x = 0.8984\)
\(x = 0.917\)
Required true negative rate: 91.7% (up from current 88%)
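The last step can be checked by solving for the true negative rate in general and plugging the result back into the formula for P(S+). This is a sketch; `required_tnr` and its parameter names are illustrative.

```python
def required_tnr(max_positive_rate, base_rate, tpr):
    """Smallest screening TNR that keeps P(S+) at or below the given cap."""
    # P(S+) = tpr*p + (1 - tnr)*(1 - p)  =>  tnr = 1 - (P(S+) - tpr*p) / (1 - p)
    return 1 - (max_positive_rate - tpr * base_rate) / (1 - base_rate)


tnr = required_tnr(max_positive_rate=0.10, base_rate=0.02, tpr=0.92)
positive_rate = 0.92 * 0.02 + (1 - tnr) * 0.98  # plug back in as a check
print(f"required TNR = {tnr:.3f}, resulting P(S+) = {positive_rate:.3f}")
# required TNR = 0.917, resulting P(S+) = 0.100
```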