Tasks 07-05 - Bayes’ Theorem

Section 07: Probability & Statistics

Problem 1: Basic Bayes’ Formula (x)

In a factory, 60% of products come from Machine A and 40% from Machine B. Machine A has a 2% defect rate, while Machine B has a 5% defect rate.

  1. What is the probability that a randomly selected product is defective?
  2. If a product is defective, what is the probability it came from Machine A?

Let D = defective, A = from Machine A, B = from Machine B

Given: P(A) = 0.6, P(B) = 0.4, P(D|A) = 0.02, P(D|B) = 0.05

  1. Using Law of Total Probability: \(P(D) = P(D|A)P(A) + P(D|B)P(B)\) \(P(D) = 0.02 \times 0.6 + 0.05 \times 0.4 = 0.012 + 0.02 = 0.032\) (3.2%)

  2. Using Bayes’ Theorem: \(P(A|D) = \frac{P(D|A)P(A)}{P(D)} = \frac{0.02 \times 0.6}{0.032} = \frac{0.012}{0.032} = 0.375\) (37.5%)
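The two steps above (total probability, then Bayes' Theorem) work the same way for any number of sources. A minimal Python sketch follows; the function name `posterior` and the dictionary layout are illustrative choices, not part of the problem statement.

```python
def posterior(priors, likelihoods):
    """Return (P(E), {H: P(H|E)}) from priors P(H) and likelihoods P(E|H)."""
    # Law of Total Probability: P(E) = sum over H of P(E|H) * P(H)
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(joint.values())
    # Bayes' Theorem: P(H|E) = P(E|H) * P(H) / P(E)
    return evidence, {h: joint[h] / evidence for h in joint}


# Problem 1: machines A and B, evidence = "product is defective"
p_defect, post = posterior({"A": 0.6, "B": 0.4}, {"A": 0.02, "B": 0.05})
print(round(p_defect, 3), round(post["A"], 3))  # 0.032 0.375
```

The posteriors always sum to 1, which is a quick sanity check on any hand calculation.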

Problem 2: Medical Testing Basics (x)

A disease affects 1% of the population. A screening test has:

  • True positive rate (sensitivity): 95%
  • True negative rate (specificity): 90%
  1. If a person tests positive, what is the probability they have the disease (Positive Predictive Value)?
  2. If a person tests negative, what is the probability they don’t have the disease (Negative Predictive Value)?

Let D = has disease, T+ = tests positive, T- = tests negative

Given: P(D) = 0.01, P(D’) = 0.99, P(T+|D) = 0.95, P(T-|D’) = 0.90

  1. Positive Predictive Value = P(D|T+)

    First find P(T+): \(P(T+) = P(T+|D)P(D) + P(T+|D')P(D')\) \(P(T+) = 0.95 \times 0.01 + 0.10 \times 0.99 = 0.0095 + 0.099 = 0.1085\)

    \(P(D|T+) = \frac{P(T+|D)P(D)}{P(T+)} = \frac{0.0095}{0.1085} = 0.0876\) (8.76%)

  2. Negative Predictive Value = P(D’|T-)

    First find P(T-): \(P(T-) = P(T-|D)P(D) + P(T-|D')P(D')\) \(P(T-) = 0.05 \times 0.01 + 0.90 \times 0.99 = 0.0005 + 0.891 = 0.8915\)

    \(P(D'|T-) = \frac{P(T-|D')P(D')}{P(T-)} = \frac{0.891}{0.8915} = 0.9994\) (99.94%)
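Both predictive values can be computed from the four joint probabilities (true/false positives and negatives). The sketch below is one possible way to organize that; the function name `predictive_values` and its parameter names are assumptions made for illustration.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a binary test."""
    true_pos = sensitivity * prevalence                # P(T+ and D)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(T+ and D')
    true_neg = specificity * (1 - prevalence)          # P(T- and D')
    false_neg = (1 - sensitivity) * prevalence         # P(T- and D)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


ppv, npv = predictive_values(0.01, 0.95, 0.90)
print(f"PPV = {ppv:.4f}, NPV = {npv:.4f}")  # PPV = 0.0876, NPV = 0.9994
```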

Problem 3: Quality Control (xx)

A company has three suppliers:

  • Supplier A: 50% of parts, 4% defect rate
  • Supplier B: 30% of parts, 2% defect rate
  • Supplier C: 20% of parts, 6% defect rate
  1. What is the overall defect rate?
  2. A part is found defective. What is the probability it came from Supplier C?
  3. Which supplier should be investigated first for quality issues?

Let D = defective

  1. \(P(D) = P(D|A)P(A) + P(D|B)P(B) + P(D|C)P(C)\) \(P(D) = 0.04 \times 0.5 + 0.02 \times 0.3 + 0.06 \times 0.2\) \(P(D) = 0.02 + 0.006 + 0.012 = 0.038\) (3.8%)

  2. \(P(C|D) = \frac{P(D|C)P(C)}{P(D)} = \frac{0.06 \times 0.2}{0.038} = \frac{0.012}{0.038} = 0.316\) (31.6%)

  3. Calculate probability of each supplier given defect:

    \(P(A|D) = \frac{0.04 \times 0.5}{0.038} = \frac{0.02}{0.038} = 0.526\) (52.6%)

    \(P(B|D) = \frac{0.02 \times 0.3}{0.038} = \frac{0.006}{0.038} = 0.158\) (15.8%)

    \(P(C|D) = 31.6\%\) (from part 2)

    Investigate Supplier A first - they contribute the most defectives (52.6%) despite only having a 4% defect rate, because they supply 50% of parts.
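The same answer can be reached by counting parts instead of multiplying probabilities. The sketch below assumes a hypothetical batch of 10,000 parts (a size chosen only so that every count is a whole number) and tallies defective parts per supplier.

```python
BATCH = 10_000  # hypothetical batch size, not given in the problem

# supplier: (share of parts in %, defect rate in %)
suppliers = {"A": (50, 4), "B": (30, 2), "C": (20, 6)}

# Expected number of defective parts from each supplier
defective = {
    name: BATCH * share * rate // (100 * 100)
    for name, (share, rate) in suppliers.items()
}
total_defective = sum(defective.values())

print(defective)        # {'A': 200, 'B': 60, 'C': 120}
print(total_defective)  # 380
for name, count in defective.items():
    # Share of all defective parts attributable to each supplier
    print(name, f"{count / total_defective:.1%}")
```

Out of 380 defective parts, 200 come from Supplier A, which is the 52.6% found above.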

Problem 4: Two-Stage Testing (xx)

A rare disease affects 0.5% of the population. A two-stage testing protocol:

  • Stage 1 test: 98% true positive rate, 85% true negative rate
  • Stage 2 test (only if Stage 1 positive): 99% true positive rate, 95% true negative rate
  1. What is the probability of testing positive in Stage 1?
  2. Given a positive Stage 1 result, what is the probability of having the disease?
  3. Given positive results in both stages, what is the probability of having the disease?

Let D = disease, T1+ = Stage 1 positive, T2+ = Stage 2 positive

Given: P(D) = 0.005, P(D’) = 0.995

  1. \(P(T1+) = P(T1+|D)P(D) + P(T1+|D')P(D')\) \(P(T1+) = 0.98 \times 0.005 + 0.15 \times 0.995 = 0.0049 + 0.14925 = 0.15415\)

    15.4% probability of positive Stage 1

  2. \(P(D|T1+) = \frac{P(T1+|D)P(D)}{P(T1+)} = \frac{0.0049}{0.15415} = 0.0318\)

    3.18% probability of disease after Stage 1 positive

  3. After Stage 1+, the “prior” for Stage 2 is P(D|T1+) = 0.0318. Assume the Stage 2 result depends only on disease status and not on the Stage 1 result, so P(T2+|D,T1+) = P(T2+|D) = 0.99 and P(T2+|D’,T1+) = P(T2+|D’) = 0.05.

    Using updated probabilities: \(P(T2+|T1+) = P(T2+|D,T1+)P(D|T1+) + P(T2+|D',T1+)P(D'|T1+)\) \(P(T2+|T1+) = 0.99 \times 0.0318 + 0.05 \times 0.9682 = 0.0315 + 0.0484 = 0.0799\)

    \(P(D|T1+,T2+) = \frac{P(T2+|D,T1+)P(D|T1+)}{P(T2+|T1+)} = \frac{0.99 \times 0.0318}{0.0799} = \frac{0.0315}{0.0799} = 0.394\)

    39.4% probability of disease after both tests positive
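Sequential testing is the same update applied twice, with the posterior from Stage 1 reused as the prior for Stage 2. A minimal sketch, assuming the two test results are independent given disease status; the helper name `update` is hypothetical.

```python
def update(prior, tpr, tnr):
    """Posterior P(D | positive result) from a prior and the test's rates."""
    fpr = 1 - tnr
    p_positive = tpr * prior + fpr * (1 - prior)
    return tpr * prior / p_positive


after_stage1 = update(0.005, tpr=0.98, tnr=0.85)
after_stage2 = update(after_stage1, tpr=0.99, tnr=0.95)
print(f"{after_stage1:.4f} {after_stage2:.3f}")  # 0.0318 0.394

# Cross-check: condition on both positives in a single step
both_pos_and_d = 0.005 * 0.98 * 0.99
both_pos_and_not_d = 0.995 * 0.15 * 0.05
one_shot = both_pos_and_d / (both_pos_and_d + both_pos_and_not_d)
```

Under the independence assumption, `one_shot` equals `after_stage2` up to floating-point error, so the order of updating does not matter.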

Problem 5: Email Spam Filter (xx)

A spam filter classifies emails. From historical data:

  • 30% of emails are spam
  • When an email is spam, the filter correctly identifies it 92% of the time
  • When an email is not spam, the filter incorrectly marks it as spam 8% of the time
  1. What percentage of all emails are marked as spam by the filter?
  2. If an email is marked as spam, what is the probability it’s actually spam?
  3. If an email passes the filter (not marked as spam), what is the probability it’s actually not spam?

Let S = spam, M = marked as spam

Given: P(S) = 0.30, P(S’) = 0.70, P(M|S) = 0.92 (true positive rate), P(M|S’) = 0.08 (false positive rate)

  1. \(P(M) = P(M|S)P(S) + P(M|S')P(S')\) \(P(M) = 0.92 \times 0.30 + 0.08 \times 0.70 = 0.276 + 0.056 = 0.332\)

    33.2% of emails are marked as spam

  2. \(P(S|M) = \frac{P(M|S)P(S)}{P(M)} = \frac{0.276}{0.332} = 0.831\)

    83.1% of marked emails are actually spam (precision)

  3. \(P(M') = 1 - 0.332 = 0.668\)

    \(P(S'|M') = \frac{P(M'|S')P(S')}{P(M')} = \frac{0.92 \times 0.70}{0.668} = \frac{0.644}{0.668} = 0.964\)

    96.4% of passed emails are legitimate
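A simulation offers an independent check on these three answers. The sketch below draws synthetic emails using the rates given in the problem; the function name `simulate`, the sample size, and the seed are arbitrary choices for illustration.

```python
import random


def simulate(n_emails, seed=0):
    """Return (share marked, P(spam | marked), P(not spam | passed))."""
    rng = random.Random(seed)
    marked_total = marked_spam = passed_total = passed_ham = 0
    for _ in range(n_emails):
        is_spam = rng.random() < 0.30
        # Filter marks spam 92% of the time and legitimate mail 8% of the time
        marked = rng.random() < (0.92 if is_spam else 0.08)
        if marked:
            marked_total += 1
            marked_spam += is_spam
        else:
            passed_total += 1
            passed_ham += not is_spam
    return (marked_total / n_emails,
            marked_spam / marked_total,
            passed_ham / passed_total)


rate, precision, pass_accuracy = simulate(200_000)
print(f"{rate:.3f} {precision:.3f} {pass_accuracy:.3f}")
```

With a sample this large, the three printed values should land close to the exact answers 0.332, 0.831, and 0.964, though the last digits vary with the seed.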

Problem 6: Effect of Base Rate on Positive Predictive Value (xxx)

A test has a true positive rate of 95% and a true negative rate of 90%. Calculate the Positive Predictive Value for three different base rates:

  1. Base rate = 10% (common condition)
  2. Base rate = 1% (uncommon condition)
  3. Base rate = 0.1% (rare condition)
  4. What pattern do you observe? Explain why this matters for screening.

Given: True positive rate = P(T+|D) = 0.95; true negative rate = P(T-|D’) = 0.90, so P(T+|D’) = 0.10

  1. Base rate 10%: \(P(T+) = 0.95 \times 0.10 + 0.10 \times 0.90 = 0.095 + 0.090 = 0.185\) \(P(D|T+) = \frac{0.095}{0.185} = 0.514\) (51.4%)

  2. Base rate 1%: \(P(T+) = 0.95 \times 0.01 + 0.10 \times 0.99 = 0.0095 + 0.099 = 0.1085\) \(P(D|T+) = \frac{0.0095}{0.1085} = 0.088\) (8.8%)

  3. Base rate 0.1%: \(P(T+) = 0.95 \times 0.001 + 0.10 \times 0.999 = 0.00095 + 0.0999 = 0.10085\) \(P(D|T+) = \frac{0.00095}{0.10085} = 0.0094\) (0.94%)

  4. Pattern observed:

    Base Rate    Pos. Predictive Value
    10%          51.4%
    1%           8.8%
    0.1%         0.94%

    As the base rate decreases, the Positive Predictive Value drops dramatically.

    Why this matters: For rare conditions, even highly accurate tests produce mostly false positives. This is why:

    • Screening for rare conditions requires a very high true negative rate
    • Confirmatory tests are needed after positive screening results
    • Mass screening for very rare conditions is hard to justify unless the true negative rate is extremely high
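The pattern is easiest to see in the odds form of Bayes' Theorem: posterior odds = prior odds × likelihood ratio. For this test the positive likelihood ratio is 0.95 / 0.10 = 9.5 at every base rate, so a positive result multiplies the odds by the same factor each time and only the starting odds change. A short sketch (the function name `ppv_from_odds` is illustrative):

```python
def ppv_from_odds(base_rate, tpr, tnr):
    """PPV computed via prior odds times the positive likelihood ratio."""
    prior_odds = base_rate / (1 - base_rate)
    likelihood_ratio = tpr / (1 - tnr)  # LR+ = P(T+|D) / P(T+|D')
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # convert odds to probability


for base_rate in (0.10, 0.01, 0.001):
    print(f"{base_rate:>6.1%}  {ppv_from_odds(base_rate, 0.95, 0.90):.2%}")
```

At a base rate of 0.1% the prior odds are about 1 to 999, so a factor of 9.5 still leaves the posterior odds below 1 to 100.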

Problem 7: What Base Rate is Needed? (xxx)

A new diagnostic test has a true positive rate of 90% and a true negative rate of 95%.

  1. What base rate would be needed for a Positive Predictive Value of 50%?
  2. What base rate would be needed for a Positive Predictive Value of 80%?
  3. What base rate would be needed for a Positive Predictive Value of 95%?

Let p = base rate = P(D)

Given: True positive rate = 0.90, true negative rate = 0.95, so P(T+|D’) = 0.05

\(PPV = \frac{P(T+|D) \cdot p}{P(T+|D) \cdot p + P(T+|D') \cdot (1-p)} = \frac{0.90p}{0.90p + 0.05(1-p)}\)

Setting Positive Predictive Value = target and solving for p:

\(PPV = \frac{0.90p}{0.90p + 0.05 - 0.05p} = \frac{0.90p}{0.85p + 0.05}\)

\(PPV(0.85p + 0.05) = 0.90p\)
\(0.85 \cdot PPV \cdot p + 0.05 \cdot PPV = 0.90p\)
\(0.05 \cdot PPV = 0.90p - 0.85 \cdot PPV \cdot p\)
\(0.05 \cdot PPV = p(0.90 - 0.85 \cdot PPV)\)
\(p = \frac{0.05 \cdot PPV}{0.90 - 0.85 \cdot PPV}\)

  1. For Pos. Predictive Value = 50%: \(p = \frac{0.05 \times 0.50}{0.90 - 0.85 \times 0.50} = \frac{0.025}{0.90 - 0.425} = \frac{0.025}{0.475} = 0.053\) Base rate needed: 5.3%

  2. For Pos. Predictive Value = 80%: \(p = \frac{0.05 \times 0.80}{0.90 - 0.85 \times 0.80} = \frac{0.04}{0.90 - 0.68} = \frac{0.04}{0.22} = 0.182\) Base rate needed: 18.2%

  3. For Pos. Predictive Value = 95%: \(p = \frac{0.05 \times 0.95}{0.90 - 0.85 \times 0.95} = \frac{0.0475}{0.90 - 0.8075} = \frac{0.0475}{0.0925} = 0.514\) Base rate needed: 51.4%
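The closed-form solution can be checked by plugging each base rate back into the Positive Predictive Value formula. In the sketch below, `required_base_rate` is a hypothetical helper that generalizes the derivation to any true positive rate (TPR) and false positive rate (FPR): p = FPR · PPV / (TPR - (TPR - FPR) · PPV).

```python
def ppv(base_rate, tpr, tnr):
    """Positive Predictive Value from base rate and test rates."""
    fpr = 1 - tnr
    return tpr * base_rate / (tpr * base_rate + fpr * (1 - base_rate))


def required_base_rate(target_ppv, tpr, tnr):
    """Base rate at which the test reaches the target PPV."""
    fpr = 1 - tnr
    return fpr * target_ppv / (tpr - (tpr - fpr) * target_ppv)


for target in (0.50, 0.80, 0.95):
    p = required_base_rate(target, tpr=0.90, tnr=0.95)
    # Round trip: the recovered PPV should match the target
    print(f"target {target:.0%}: base rate {p:.1%}, "
          f"check PPV {ppv(p, 0.90, 0.95):.0%}")
```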

Problem 8: Insurance Risk Assessment (xxx)

An insurance company categorizes drivers:

  • 20% are “high risk” (accident probability 15%)
  • 50% are “medium risk” (accident probability 5%)
  • 30% are “low risk” (accident probability 1%)
  1. What is the overall accident probability for a randomly selected driver?
  2. A new customer has an accident in their first year. What is the probability they are high risk?
  3. The company wants to assign risk categories based on accident history. After 3 accident-free years, what is the probability a driver is low risk? (Assume that, for a given driver, accidents in different years are independent.)

Let A = accident, H = high risk, M = medium risk, L = low risk

Given: P(H) = 0.20, P(M) = 0.50, P(L) = 0.30, P(A|H) = 0.15, P(A|M) = 0.05, P(A|L) = 0.01

  1. \(P(A) = P(A|H)P(H) + P(A|M)P(M) + P(A|L)P(L)\) \(P(A) = 0.15 \times 0.20 + 0.05 \times 0.50 + 0.01 \times 0.30\) \(P(A) = 0.03 + 0.025 + 0.003 = 0.058\) (5.8%)

  2. \(P(H|A) = \frac{P(A|H)P(H)}{P(A)} = \frac{0.03}{0.058} = 0.517\) (51.7%)

  3. Let A’ = no accident in a year. For 3 accident-free years (A’A’A’):

    \(P(A'A'A'|H) = (0.85)^3 = 0.614\) \(P(A'A'A'|M) = (0.95)^3 = 0.857\) \(P(A'A'A'|L) = (0.99)^3 = 0.970\)

    \(P(A'A'A') = 0.614 \times 0.20 + 0.857 \times 0.50 + 0.970 \times 0.30\) \(= 0.1228 + 0.4285 + 0.291 = 0.8423\)

    \(P(L|A'A'A') = \frac{P(A'A'A'|L)P(L)}{P(A'A'A')} = \frac{0.970 \times 0.30}{0.8423} = \frac{0.291}{0.8423} = 0.345\)

    34.5% probability of being low risk after 3 accident-free years

    (Note: Increased from prior 30% - the accident-free record is evidence of lower risk)
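Part 3 can also be solved one year at a time: after each accident-free year, the posterior over risk categories becomes the prior for the next year. The sketch below assumes independence between years for a given driver, as stated in the problem; the helper name `update_after_clean_year` is hypothetical.

```python
priors = {"H": 0.20, "M": 0.50, "L": 0.30}
p_accident = {"H": 0.15, "M": 0.05, "L": 0.01}


def update_after_clean_year(beliefs):
    """One Bayes update on the evidence 'no accident this year'."""
    joint = {k: (1 - p_accident[k]) * beliefs[k] for k in beliefs}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


beliefs = dict(priors)
for year in range(1, 4):
    beliefs = update_after_clean_year(beliefs)
    print(year, {k: round(v, 3) for k, v in beliefs.items()})
```

After the third update, the value for `L` agrees with the direct calculation using cubed probabilities, because the normalizing constants cancel.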

Problem 9: Serial Testing (xxx)

A condition has a 5% base rate. Two tests are available, and their results are independent given a person’s true condition:

  • Test A: 90% true positive rate, 85% true negative rate
  • Test B: 95% true positive rate, 80% true negative rate

Compare two testing strategies:

  1. Strategy 1: Use only Test A. What is the Positive Predictive Value?
  2. Strategy 2: First use Test B, then if positive, confirm with Test A. What is the final Positive Predictive Value?
  3. Which strategy is better? Why might a system choose the strategy with the worse Positive Predictive Value?

P(D) = 0.05, P(D’) = 0.95

  1. Strategy 1 (Test A only): \(P(A+) = 0.90 \times 0.05 + 0.15 \times 0.95 = 0.045 + 0.1425 = 0.1875\) \(P(D|A+) = \frac{0.045}{0.1875} = 0.24\) (24%)

  2. Strategy 2 (Test B then Test A):

    First, probability of positive on Test B: \(P(B+) = 0.95 \times 0.05 + 0.20 \times 0.95 = 0.0475 + 0.19 = 0.2375\)

    After B+, updated probability of condition: \(P(D|B+) = \frac{0.0475}{0.2375} = 0.20\)

    Now apply Test A to this population: \(P(A+|B+) = 0.90 \times 0.20 + 0.15 \times 0.80 = 0.18 + 0.12 = 0.30\)

    \(P(D|A+,B+) = \frac{0.90 \times 0.20}{0.30} = \frac{0.18}{0.30} = 0.60\) (60%)

  3. Strategy 2 has the better Positive Predictive Value (60% vs 24%).

    Why choose the strategy with the worse Positive Predictive Value (Strategy 1)?

    • Cost: Strategy 1 needs one test per person; Strategy 2 tests everyone with Test B and then retests the 23.75% who are positive with Test A
    • Speed: Only one test vs. two
    • True positive rate: Strategy 2 misses more true cases, because a case must test positive twice
      • Strategy 1 overall true positive rate: 90%
      • Strategy 2 overall true positive rate: 0.95 × 0.90 = 85.5%
    • If the consequence of a false negative is severe, the single-test strategy might be preferred
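The trade-off can be made concrete by treating Strategy 2 as a single combined test that is positive only when both tests are positive. The sketch below assumes the two tests are independent given the true condition; the dictionary layout and variable names are illustrative.

```python
def ppv(base_rate, tpr, tnr):
    """Positive Predictive Value from base rate and test rates."""
    fpr = 1 - tnr
    return tpr * base_rate / (tpr * base_rate + fpr * (1 - base_rate))


BASE_RATE = 0.05
test_a = {"tpr": 0.90, "tnr": 0.85}
test_b = {"tpr": 0.95, "tnr": 0.80}

# Serial rule: positive overall only if Test B and then Test A are positive
serial = {
    "tpr": test_b["tpr"] * test_a["tpr"],
    "tnr": 1 - (1 - test_b["tnr"]) * (1 - test_a["tnr"]),
}

# Everyone takes Test B; only those positive on B go on to Test A
p_b_positive = test_b["tpr"] * BASE_RATE + (1 - test_b["tnr"]) * (1 - BASE_RATE)
tests_per_person = 1 + p_b_positive

print(f"Strategy 1 PPV: {ppv(BASE_RATE, **test_a):.0%}")
print(f"Strategy 2 PPV: {ppv(BASE_RATE, **serial):.0%}")
print(f"Strategy 2 TPR: {serial['tpr']:.1%}, TNR: {serial['tnr']:.0%}")
print(f"Strategy 2 tests per person: {tests_per_person:.4f}")
```

Serial testing raises the combined true negative rate to 97% at the price of a lower true positive rate (85.5%) and about 1.24 tests per person.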

Problem 10: Exam-Style Problem (xxxx)

A company produces electronic components. Components are manufactured by three machines:

  • Machine 1: produces 40% of output, 3% defect rate
  • Machine 2: produces 35% of output, 2% defect rate
  • Machine 3: produces 25% of output, 5% defect rate

An automatic inspection system tests each component:

  • If a component is defective, the system detects it with 95% probability
  • If a component is good, the system incorrectly flags it as defective with 8% probability
  1. What is the overall probability that a randomly selected component is defective?
  2. What is the probability that a component flagged by the inspection system is actually defective?
  3. Given that a component passed inspection, what is the probability it came from Machine 2?
  4. The company wants to improve quality. If they eliminate Machine 3, what would be the new overall defect rate?

Let D = defective, F = flagged by inspection

Given: P(M1) = 0.40, P(M2) = 0.35, P(M3) = 0.25, P(D|M1) = 0.03, P(D|M2) = 0.02, P(D|M3) = 0.05, P(F|D) = 0.95, P(F|D’) = 0.08

  1. Overall defect rate: \(P(D) = P(D|M1)P(M1) + P(D|M2)P(M2) + P(D|M3)P(M3)\) \(P(D) = 0.03 \times 0.40 + 0.02 \times 0.35 + 0.05 \times 0.25\) \(P(D) = 0.012 + 0.007 + 0.0125 = 0.0315\) (3.15%)

  2. P(D|F): \(P(F) = P(F|D)P(D) + P(F|D')P(D')\) \(P(F) = 0.95 \times 0.0315 + 0.08 \times 0.9685\) \(P(F) = 0.0299 + 0.0775 = 0.1074\)

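    \(P(D|F) = \frac{P(F|D)P(D)}{P(F)} = \frac{0.029925}{0.107405} = 0.279\) (27.9%; the unrounded products are used here, since the rounded values 0.0299 and 0.1074 would give 0.278)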

  3. P(M2|F’) where F’ = passed inspection: \(P(F') = 1 - 0.1074 = 0.8926\)

    Need P(F’|M2): \(P(D|M2) = 0.02\), so \(P(D'|M2) = 0.98\) \(P(F'|M2) = P(F'|D,M2)P(D|M2) + P(F'|D',M2)P(D'|M2)\) \(P(F'|M2) = 0.05 \times 0.02 + 0.92 \times 0.98 = 0.001 + 0.9016 = 0.9026\)

    \(P(M2|F') = \frac{P(F'|M2)P(M2)}{P(F')} = \frac{0.9026 \times 0.35}{0.8926} = \frac{0.3159}{0.8926} = 0.354\) (35.4%)

    (Slightly higher than the prior 35%, because Machine 2 has the lowest defect rate)

  4. Without Machine 3: New proportions: M1 = 40/(40+35) = 53.3%, M2 = 35/(40+35) = 46.7%

    \(P(D) = 0.03 \times 0.533 + 0.02 \times 0.467 = 0.016 + 0.0093 = 0.0253\) (2.53%)

    Reduction from 3.15% to 2.53% (about a 20% relative reduction in the defect rate)
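Parts 1 to 3 can be cross-checked by enumerating the full joint distribution over (machine, defective, flagged) and summing the relevant cells. The sketch below assumes, as the solution does, that the inspection result depends only on whether the component is defective and not on which machine made it; the helper `prob` is hypothetical.

```python
from itertools import product

share = {"M1": 0.40, "M2": 0.35, "M3": 0.25}
defect_rate = {"M1": 0.03, "M2": 0.02, "M3": 0.05}
P_FLAG = {True: 0.95, False: 0.08}  # P(flagged | defective is True/False)

# Joint probability of every (machine, defective, flagged) combination
joint = {}
for machine, defective, flagged in product(share, (True, False), (True, False)):
    p_d = defect_rate[machine] if defective else 1 - defect_rate[machine]
    p_f = P_FLAG[defective] if flagged else 1 - P_FLAG[defective]
    joint[(machine, defective, flagged)] = share[machine] * p_d * p_f


def prob(condition):
    """Sum the joint probability over outcomes satisfying the condition."""
    return sum(p for outcome, p in joint.items() if condition(*outcome))


p_defective = prob(lambda m, d, f: d)
p_def_given_flag = prob(lambda m, d, f: d and f) / prob(lambda m, d, f: f)
p_m2_given_pass = (prob(lambda m, d, f: m == "M2" and not f)
                   / prob(lambda m, d, f: not f))
print(f"{p_defective:.4f} {p_def_given_flag:.3f} {p_m2_given_pass:.3f}")
```

Results are computed without intermediate rounding, so the third decimal can differ from a hand calculation that rounds along the way.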

Problem 11: Tree Diagram Analysis (xxx)

Draw a probability tree and solve:

A marketing campaign targets customers. Historical data shows:

  • 40% of customers receive email marketing
  • 50% of customers receive phone marketing
  • 30% of customers receive both

  • Customers who receive email marketing only: 10% purchase
  • Customers who receive phone marketing only: 15% purchase
  • Customers who receive both: 25% purchase
  • Customers who receive neither: 2% purchase

  1. What is the overall purchase rate?
  2. If a customer made a purchase, what is the probability they received both types of marketing?
  3. Which marketing channel is most effective in terms of increasing purchase probability?

Interpreting the customer segments:

  • Email only: 40% - 30% = 10%
  • Phone only: 50% - 30% = 20%
  • Both: 30%
  • Neither: 100% - (10% + 20% + 30%) = 40%

(These sum to 100%)

Given purchase rates:

  • P(Purchase|Email only) = 0.10
  • P(Purchase|Phone only) = 0.15
  • P(Purchase|Both) = 0.25
  • P(Purchase|Neither) = 0.02

  1. Overall purchase rate: \(P(Purchase) = 0.10 \times 0.10 + 0.15 \times 0.20 + 0.25 \times 0.30 + 0.02 \times 0.40\) \(= 0.01 + 0.03 + 0.075 + 0.008 = 0.123\) (12.3%)

  2. P(Both|Purchase): \(P(Both|Purchase) = \frac{P(Purchase|Both)P(Both)}{P(Purchase)} = \frac{0.25 \times 0.30}{0.123} = \frac{0.075}{0.123} = 0.610\)

    (61.0%)

  3. Marketing effectiveness:

    Comparing to baseline (neither) of 2%:

    • Email only: 10% (5× baseline)
    • Phone only: 15% (7.5× baseline)
    • Both: 25% (12.5× baseline)

    Phone marketing alone is more effective than email alone, but the combination is best. The combined rate (25%) is slightly above what purely additive effects would give (10% + 15% - 2% = 23%), which suggests a small positive interaction between the channels.
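The segment sizes and both answers follow mechanically from the three given marketing percentages. A minimal sketch, with variable names chosen for illustration:

```python
p_email, p_phone, p_both = 0.40, 0.50, 0.30

# Split the customers into four disjoint segments (leaves of the tree)
segments = {
    "email only": p_email - p_both,
    "phone only": p_phone - p_both,
    "both": p_both,
    "neither": 1 - (p_email + p_phone - p_both),
}
purchase_rate = {
    "email only": 0.10,
    "phone only": 0.15,
    "both": 0.25,
    "neither": 0.02,
}

# Total probability over the segments, then Bayes' Theorem per segment
p_purchase = sum(segments[s] * purchase_rate[s] for s in segments)
posterior = {s: segments[s] * purchase_rate[s] / p_purchase for s in segments}

print(f"{p_purchase:.3f}")         # 0.123
print(f"{posterior['both']:.3f}")  # 0.610
```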

Problem 12: Comprehensive Medical Testing (xxxx)

A hospital screens for a disease with the following characteristics:

  • Base rate in the screening population: 2%
  • Screening test: 92% true positive rate, 88% true negative rate
  • Confirmatory test: 99% true positive rate, 97% true negative rate

Testing protocol: All patients get the screening test. Those who test positive get the confirmatory test.

  1. What percentage of the screening population will need the confirmatory test?
  2. Of those who test positive on the screening test, what percentage will test positive on the confirmatory test?
  3. If a patient tests positive on both tests, what is the probability they have the disease?
  4. The hospital can only afford confirmatory tests for 10% of patients. What true negative rate would the screening test need to achieve this?

Given: P(D) = 0.02, P(D’) = 0.98

Screening: true positive rate (TPR) = 0.92, true negative rate (TNR) = 0.88, so false positive rate (FPR) = 0.12

Confirmatory: TPR = 0.99, TNR = 0.97, so FPR = 0.03

  1. Percentage needing confirmatory test = P(Screen+): \(P(S+) = 0.92 \times 0.02 + 0.12 \times 0.98 = 0.0184 + 0.1176 = 0.136\)

    13.6% will need confirmatory testing

  2. P(Confirm+|Screen+): First, find P(D|Screen+): \(P(D|S+) = \frac{0.0184}{0.136} = 0.1353\)

    \(P(C+|S+) = P(C+|D)P(D|S+) + P(C+|D')P(D'|S+)\) \(= 0.99 \times 0.1353 + 0.03 \times 0.8647\) \(= 0.134 + 0.026 = 0.160\)

    16.0% of those screened positive will confirm positive

  3. P(D|S+,C+): \(P(D|S+,C+) = \frac{P(C+|D)P(D|S+)}{P(C+|S+)} = \frac{0.99 \times 0.1353}{0.160} = \frac{0.134}{0.160} = 0.838\)

    83.8% probability of disease after both positive

  4. Required true negative rate for P(S+) = 0.10: Let true negative rate = x, so false positive rate = (1-x)

    \(P(S+) = 0.92 \times 0.02 + (1-x) \times 0.98 = 0.10\) \(0.0184 + 0.98 - 0.98x = 0.10\) \(0.9984 - 0.98x = 0.10\) \(0.98x = 0.8984\) \(x = 0.917\)

    Required true negative rate: 91.7% (up from current 88%)
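Part 4 rearranges the total-probability formula to isolate the true negative rate. The sketch below does the same in code and then verifies the result by substituting it back; the function name `required_tnr` is hypothetical.

```python
def required_tnr(target_positive_rate, base_rate, tpr):
    """Screening TNR at which P(S+) equals the target positive rate."""
    # P(S+) = TPR * p + (1 - TNR) * (1 - p), solved for TNR
    return 1 - (target_positive_rate - tpr * base_rate) / (1 - base_rate)


tnr = required_tnr(0.10, base_rate=0.02, tpr=0.92)
print(f"{tnr:.3f}")  # 0.917

# Substitute back: the share screening positive should be 10%
p_screen_positive = 0.92 * 0.02 + (1 - tnr) * 0.98
```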