Session 09-04: Tasks
Probability & Statistics - Exam Review
Problem 1: Basic Probability (x)
a) A die is rolled twice. Find the probability that the sum is 7.
b) A bag contains 5 red, 3 blue, and 2 green balls. Find the probability of drawing a red or green ball.
c) Two cards are drawn from a standard 52-card deck without replacement. Find the probability both are aces.
d) Events A and B satisfy \(P(A) = 0.5\), \(P(B) = 0.4\), \(P(A \cap B) = 0.2\). Find \(P(A \cup B)\) and \(P(A | B)\).
a) The total number of outcomes when rolling two dice is \(6 \times 6 = 36\).
The pairs that sum to 7 are: \((1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\) — that is 6 pairs.
\[P(\text{sum} = 7) = \frac{6}{36} = \frac{1}{6} \approx 0.1667\]
b) Total balls: \(5 + 3 + 2 = 10\). Red or green balls: \(5 + 2 = 7\).
\[P(\text{red or green}) = \frac{7}{10} = 0.7\]
c) There are 4 aces in a 52-card deck. Without replacement:
\[P(\text{both aces}) = \frac{4}{52} \cdot \frac{3}{51} = \frac{12}{2652} = \frac{1}{221} \approx 0.00452\]
d) Using the addition rule:
\[P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.5 + 0.4 - 0.2 = 0.7\]
Using the definition of conditional probability:
\[P(A | B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.4} = 0.5\]
Interpretation: Given that B occurs, the probability that A also occurs is 0.5.
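The counting arguments in Problem 1 can be cross-checked by brute-force enumeration and exact fractions. The following Python sketch is only an illustration; the variable names are chosen here and are not part of the exercise.

```python
from fractions import Fraction
from itertools import product

# a) Enumerate all 36 ordered outcomes of two dice and count sums of 7.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [pair for pair in outcomes if sum(pair) == 7]
p_sum_7 = Fraction(len(favourable), len(outcomes))

# c) Two aces without replacement: multiply the conditional probabilities.
p_both_aces = Fraction(4, 52) * Fraction(3, 51)

# d) Addition rule and conditional probability with exact fractions.
p_a, p_b, p_a_and_b = Fraction(1, 2), Fraction(2, 5), Fraction(1, 5)
p_a_or_b = p_a + p_b - p_a_and_b
p_a_given_b = p_a_and_b / p_b

print(p_sum_7, p_both_aces, p_a_or_b, p_a_given_b)
```

Working with `Fraction` avoids rounding, so the results can be compared directly with the reduced fractions in the solutions.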
Problem 2: Combinatorics (x)
a) In how many ways can 5 books be arranged on a shelf?
b) A team of 4 is chosen from 12 people. How many teams are possible?
c) A PIN consists of 4 digits (0–9). How many PINs are possible if digits can repeat? If digits cannot repeat?
a) This is a permutation of 5 objects:
\[5! = 5 \times 4 \times 3 \times 2 \times 1 = 120\]
There are 120 ways to arrange 5 books on a shelf.
b) Order does not matter, so we use the combination formula:
\[\binom{12}{4} = \frac{12!}{4! \cdot 8!} = \frac{12 \times 11 \times 10 \times 9}{4 \times 3 \times 2 \times 1} = \frac{11880}{24} = 495\]
There are 495 possible teams.
c) With repetition: Each of the 4 positions can be any digit from 0–9, so:
\[10^4 = 10000\]
Without repetition: The first digit has 10 choices, the second has 9, the third has 8, the fourth has 7:
\[10 \times 9 \times 8 \times 7 = 5040\]
There are 10,000 PINs with repetition and 5,040 PINs without repetition.
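These counts map directly onto functions in Python's standard `math` module (`math.comb` and `math.perm` require Python 3.8 or later). A minimal sketch, with variable names chosen here for illustration:

```python
import math

arrangements = math.factorial(5)          # a) 5 books in a row
teams = math.comb(12, 4)                  # b) unordered choice of 4 from 12
pins_with_repeats = 10 ** 4               # c) each of 4 positions has 10 options
pins_without_repeats = math.perm(10, 4)   # c) ordered choice, no repeated digit

print(arrangements, teams, pins_with_repeats, pins_without_repeats)
```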
Problem 3: Contingency Table Reading (x)
A survey of 200 students:
| | Math Major | Science Major | Business Major | Total |
|---|---|---|---|---|
| Satisfied | 30 | 25 | 45 | 100 |
| Neutral | 10 | 15 | 25 | 50 |
| Dissatisfied | 10 | 10 | 30 | 50 |
| Total | 50 | 50 | 100 | 200 |
a) Find \(P(\text{Satisfied})\).
b) Find \(P(\text{Business Major} | \text{Dissatisfied})\).
c) Find \(P(\text{Dissatisfied} | \text{Math Major})\).
d) Are “Satisfied” and “Math Major” independent? Check using the definition.
a)
\[P(\text{Satisfied}) = \frac{100}{200} = 0.5\]
b) Among dissatisfied students (50 total), 30 are Business Majors:
\[P(\text{Business Major} | \text{Dissatisfied}) = \frac{30}{50} = 0.6\]
c) Among Math Majors (50 total), 10 are dissatisfied:
\[P(\text{Dissatisfied} | \text{Math Major}) = \frac{10}{50} = 0.2\]
d) Two events are independent if \(P(A \cap B) = P(A) \cdot P(B)\).
\[P(\text{Satisfied} \cap \text{Math Major}) = \frac{30}{200} = 0.15\]
\[P(\text{Satisfied}) \cdot P(\text{Math Major}) = 0.5 \times \frac{50}{200} = 0.5 \times 0.25 = 0.125\]
Since \(0.15 \neq 0.125\), the events are not independent.
Interpretation: Being a Math Major and being satisfied are dependent — Math Majors are somewhat more likely to be satisfied than the overall student population.
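Reading marginal, conditional, and joint probabilities off a contingency table can be sketched in Python as follows. The dictionary layout and helper names (`row_total`, `col_total`) are assumptions made for this illustration; the counts are those of the survey table.

```python
from fractions import Fraction

# Survey counts: rows = satisfaction level, columns = major.
counts = {
    "Satisfied":    {"Math": 30, "Science": 25, "Business": 45},
    "Neutral":      {"Math": 10, "Science": 15, "Business": 25},
    "Dissatisfied": {"Math": 10, "Science": 10, "Business": 30},
}
total = sum(sum(row.values()) for row in counts.values())

def row_total(level):
    return sum(counts[level].values())

def col_total(major):
    return sum(row[major] for row in counts.values())

p_satisfied = Fraction(row_total("Satisfied"), total)
p_business_given_dissatisfied = Fraction(
    counts["Dissatisfied"]["Business"], row_total("Dissatisfied"))
p_dissatisfied_given_math = Fraction(
    counts["Dissatisfied"]["Math"], col_total("Math"))

# Independence check: joint probability versus product of the marginals.
p_joint = Fraction(counts["Satisfied"]["Math"], total)
p_product = p_satisfied * Fraction(col_total("Math"), total)
independent = p_joint == p_product
```

Conditioning on a row or column simply means dividing by that row or column total instead of the grand total.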
Problem 4: Contingency Table Construction (xx)
In a factory, 60% of products are from Machine A and 40% from Machine B. Machine A has a 4% defect rate, Machine B has a 6% defect rate.
a) Construct a contingency table for 500 products.
b) What is the overall defect rate?
c) A defective product is found. What is the probability it came from Machine A?
d) A non-defective product is found. What is the probability it came from Machine B?
a) Machine A produces \(500 \times 0.60 = 300\) items, Machine B produces \(500 \times 0.40 = 200\) items.
Defective from A: \(300 \times 0.04 = 12\). Defective from B: \(200 \times 0.06 = 12\).
| | Machine A | Machine B | Total |
|---|---|---|---|
| Defective | 12 | 12 | 24 |
| Non-defective | 288 | 188 | 476 |
| Total | 300 | 200 | 500 |
b) Overall defect rate:
\[P(\text{Defective}) = \frac{24}{500} = 0.048 = 4.8\%\]
Alternatively: \(0.60 \times 0.04 + 0.40 \times 0.06 = 0.024 + 0.024 = 0.048\).
c) Using Bayes’ theorem:
\[P(\text{Machine A} | \text{Defective}) = \frac{P(\text{Defective} | \text{A}) \cdot P(\text{A})}{P(\text{Defective})} = \frac{0.04 \times 0.60}{0.048} = \frac{0.024}{0.048} = 0.5\]
From the table: \(\frac{12}{24} = 0.5\). There is a 50% chance it came from Machine A.
d) Using Bayes’ theorem:
\[P(\text{Machine B} | \text{Non-defective}) = \frac{P(\text{Non-def.} | \text{B}) \cdot P(\text{B})}{P(\text{Non-def.})} = \frac{0.94 \times 0.40}{0.952} = \frac{0.376}{0.952} \approx 0.3950\]
From the table: \(\frac{188}{476} \approx 0.3950\). There is approximately a 39.5% chance it came from Machine B.
Problem 5: Bayes’ Theorem with Tree Diagram (xx)
A store has three suppliers:
- Supplier X: 50% of stock, 5% defective
- Supplier Y: 30% of stock, 3% defective
- Supplier Z: 20% of stock, 8% defective
a) Draw a tree diagram with all probabilities.
b) Find the overall defect rate.
c) An item is defective. Find the probability it’s from Supplier Z.
d) An item is not defective. Find the probability it’s from Supplier X.
a) Tree diagram structure:
- Branch X (0.50):
  - Defective: \(0.50 \times 0.05 = 0.025\)
  - Non-defective: \(0.50 \times 0.95 = 0.475\)
- Branch Y (0.30):
  - Defective: \(0.30 \times 0.03 = 0.009\)
  - Non-defective: \(0.30 \times 0.97 = 0.291\)
- Branch Z (0.20):
  - Defective: \(0.20 \times 0.08 = 0.016\)
  - Non-defective: \(0.20 \times 0.92 = 0.184\)
b) Overall defect rate (total probability):
\[P(D) = 0.025 + 0.009 + 0.016 = 0.050 = 5.0\%\]
c) Bayes’ theorem:
\[P(Z | D) = \frac{P(D | Z) \cdot P(Z)}{P(D)} = \frac{0.08 \times 0.20}{0.050} = \frac{0.016}{0.050} = 0.32\]
There is a 32% probability the defective item is from Supplier Z.
d) First, \(P(\text{Non-defective}) = 1 - 0.050 = 0.950\).
\[P(X | \text{Non-def.}) = \frac{P(\text{Non-def.} | X) \cdot P(X)}{P(\text{Non-def.})} = \frac{0.95 \times 0.50}{0.950} = \frac{0.475}{0.950} = 0.5\]
There is a 50% probability the non-defective item is from Supplier X.
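The tree-diagram calculation generalizes to any number of branches: multiply along each branch, sum the leaves for the total probability, then divide to invert the conditioning. A short Python sketch, with dictionary names that are assumptions of this example:

```python
# Prior share of stock and conditional defect rate for each supplier.
priors = {"X": 0.50, "Y": 0.30, "Z": 0.20}
defect_rates = {"X": 0.05, "Y": 0.03, "Z": 0.08}

# Leaves of the tree: joint probabilities P(supplier and outcome).
joint_defective = {s: priors[s] * defect_rates[s] for s in priors}
joint_ok = {s: priors[s] * (1 - defect_rates[s]) for s in priors}

# Law of total probability: sum the leaves with the same outcome.
p_defective = sum(joint_defective.values())
p_ok = sum(joint_ok.values())

# Bayes' theorem: each leaf divided by the total for its outcome.
posterior_given_defective = {s: joint_defective[s] / p_defective for s in priors}
posterior_given_ok = {s: joint_ok[s] / p_ok for s in priors}

print(round(p_defective, 4), posterior_given_defective, posterior_given_ok)
```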
Problem 6: Binomial Distribution Calculations (xx)
A multiple-choice exam has 15 questions, each with 4 options (one correct).
a) If a student guesses randomly, what distribution models the number correct? State parameters.
b) Find \(P(X = 5)\).
c) Find \(P(X \geq 3)\).
d) Find the expected number of correct answers and the standard deviation.
e) The pass mark is 8 correct. What is \(P(\text{pass by guessing})\)?
a) Each question is an independent Bernoulli trial with \(p = \frac{1}{4} = 0.25\) (probability of guessing correctly). With \(n = 15\) questions:
\[X \sim \text{Bin}(n = 15, p = 0.25)\]
b) Using the binomial formula:
\[P(X = 5) = \binom{15}{5} (0.25)^5 (0.75)^{10}\]
\[= 3003 \times 0.0009766 \times 0.05631\]
\[= 3003 \times 0.00005499\]
\[\approx 0.1651\]
c) It is easier to compute \(P(X \geq 3) = 1 - P(X \leq 2) = 1 - [P(X=0) + P(X=1) + P(X=2)]\).
\[P(X = 0) = \binom{15}{0}(0.25)^0(0.75)^{15} = 1 \times 1 \times 0.01336 = 0.01336\]
\[P(X = 1) = \binom{15}{1}(0.25)^1(0.75)^{14} = 15 \times 0.25 \times 0.01782 = 0.06680\]
\[P(X = 2) = \binom{15}{2}(0.25)^2(0.75)^{13} = 105 \times 0.0625 \times 0.02376 = 0.15592\]
\[P(X \leq 2) = 0.01336 + 0.06680 + 0.15592 = 0.23608\]
\[P(X \geq 3) = 1 - 0.23608 = 0.76392 \approx 0.7639\]
d) Expected value:
\[E(X) = n \cdot p = 15 \times 0.25 = 3.75\]
Standard deviation:
\[\sigma = \sqrt{n \cdot p \cdot (1-p)} = \sqrt{15 \times 0.25 \times 0.75} = \sqrt{2.8125} \approx 1.677\]
On average, a student guessing randomly gets 3.75 correct answers with a standard deviation of about 1.68.
e) We need \(P(X \geq 8)\) with \(X \sim \text{Bin}(15, 0.25)\).
\[P(X \geq 8) = \sum_{k=8}^{15} \binom{15}{k} (0.25)^k (0.75)^{15-k}\]
Computing the dominant terms:
\[P(X=8) = \binom{15}{8}(0.25)^8(0.75)^7 = 6435 \times 1.526 \times 10^{-5} \times 0.1335 \approx 0.01311\]
\[P(X=9) = \binom{15}{9}(0.25)^9(0.75)^6 = 5005 \times 3.815 \times 10^{-6} \times 0.1780 \approx 0.00340\]
\[P(X=10) = \binom{15}{10}(0.25)^{10}(0.75)^5 = 3003 \times 9.537 \times 10^{-7} \times 0.2373 \approx 0.00068\]
The remaining terms (\(k = 11, \ldots, 15\)) contribute only about 0.00012 combined, most of it from \(P(X=11) \approx 0.00010\).
\[P(X \geq 8) \approx 0.01311 + 0.00340 + 0.00068 + \ldots \approx 0.0173\]
The probability of passing by pure guessing is approximately 1.7% — extremely unlikely.
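The hand calculations for Problem 6 can be checked with a few lines of Python using only the standard library. The helper `binom_pmf` is a name introduced here for illustration:

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 15, 0.25

p_exactly_5 = binom_pmf(5, n, p)                                  # part b)
p_at_least_3 = 1 - sum(binom_pmf(k, n, p) for k in range(3))      # part c)
mean = n * p                                                      # part d)
std_dev = math.sqrt(n * p * (1 - p))                              # part d)
p_pass = sum(binom_pmf(k, n, p) for k in range(8, n + 1))         # part e)

print(round(p_exactly_5, 4), round(p_at_least_3, 4), mean,
      round(std_dev, 3), round(p_pass, 4))
```

Summing the full tail in part e) avoids having to judge by hand which terms are negligible.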
Problem 7: Descriptive Statistics from Frequency Table (xx)
Weekly overtime hours for 50 employees:
| Hours | 0–2 | 2–4 | 4–6 | 6–8 | 8–10 |
|---|---|---|---|---|---|
| Frequency | 8 | 14 | 16 | 9 | 3 |
a) Calculate the estimated mean.
b) Find the median class and estimate the median.
c) Calculate the standard deviation.
d) The company policy allows a maximum average of 5 hours overtime. Is the company in compliance?
a) Using class midpoints \(m_i\): 1, 3, 5, 7, 9.
\[\bar{x} = \frac{\sum f_i \cdot m_i}{\sum f_i} = \frac{8(1) + 14(3) + 16(5) + 9(7) + 3(9)}{50}\]
\[= \frac{8 + 42 + 80 + 63 + 27}{50} = \frac{220}{50} = 4.4 \text{ hours}\]
b) Cumulative frequencies: 8, 22, 38, 47, 50.
The median is the value at position \(\frac{50}{2} = 25\). The cumulative frequency first exceeds 25 in the class 4–6 (cumulative frequency reaches 38). So the median class is 4–6.
Using linear interpolation:
\[\tilde{x} = L + \frac{\frac{n}{2} - F}{f} \cdot w\]
where \(L = 4\) (lower boundary), \(F = 22\) (cumulative frequency before median class), \(f = 16\) (frequency of median class), \(w = 2\) (class width).
\[\tilde{x} = 4 + \frac{25 - 22}{16} \times 2 = 4 + \frac{3}{16} \times 2 = 4 + 0.375 = 4.375 \text{ hours}\]
c) Using the formula \(s = \sqrt{\frac{\sum f_i(m_i - \bar{x})^2}{n-1}}\):
| Class | \(m_i\) | \(f_i\) | \(m_i - \bar{x}\) | \((m_i - \bar{x})^2\) | \(f_i(m_i - \bar{x})^2\) |
|---|---|---|---|---|---|
| 0–2 | 1 | 8 | \(-3.4\) | 11.56 | 92.48 |
| 2–4 | 3 | 14 | \(-1.4\) | 1.96 | 27.44 |
| 4–6 | 5 | 16 | 0.6 | 0.36 | 5.76 |
| 6–8 | 7 | 9 | 2.6 | 6.76 | 60.84 |
| 8–10 | 9 | 3 | 4.6 | 21.16 | 63.48 |
\[s = \sqrt{\frac{92.48 + 27.44 + 5.76 + 60.84 + 63.48}{49}} = \sqrt{\frac{250.00}{49}} = \sqrt{5.1020} \approx 2.259 \text{ hours}\]
d) The estimated mean overtime is 4.4 hours, which is below the 5-hour maximum. The company is in compliance with its overtime policy.
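The grouped-data estimates can be reproduced programmatically. The sketch below follows the same conventions as the solution (class midpoints as proxies, linear interpolation for the median, divisor \(n-1\) for the standard deviation); the variable names are assumptions of this example.

```python
import math

# Class boundaries and frequencies from the overtime table.
classes = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
freqs = [8, 14, 16, 9, 3]

n = sum(freqs)
midpoints = [(lo + hi) / 2 for lo, hi in classes]
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Median: interpolate inside the first class whose cumulative
# frequency reaches n / 2.
cumulative = 0
for (lo, hi), f in zip(classes, freqs):
    if cumulative + f >= n / 2:
        median = lo + (n / 2 - cumulative) / f * (hi - lo)
        break
    cumulative += f

# Sample standard deviation (divisor n - 1) based on the midpoints.
ss = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))
std_dev = math.sqrt(ss / (n - 1))

print(mean, median, round(std_dev, 3))
```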
Problem 8: Probability System of Equations (xx)
Events A and B satisfy:
- \(P(A \cup B) = 0.8\)
- \(P(A \cap B) = 0.15\)
- \(P(A) = 2 \cdot P(B)\)
a) Set up a system of equations.
b) Solve for \(P(A)\) and \(P(B)\).
c) Find \(P(A | B)\) and \(P(B | A)\).
d) Are A and B independent?
a) Let \(a = P(A)\) and \(b = P(B)\). From the addition rule and given conditions:
\[a + b - 0.15 = 0.8 \quad \Rightarrow \quad a + b = 0.95 \quad \text{...(1)}\]
\[a = 2b \quad \text{...(2)}\]
b) Substituting (2) into (1):
\[2b + b = 0.95\]
\[3b = 0.95\]
\[b = \frac{0.95}{3} \approx 0.3167\]
\[a = 2 \times 0.3167 \approx 0.6333\]
So \(P(A) = \frac{19}{30} \approx 0.6333\) and \(P(B) = \frac{19}{60} \approx 0.3167\).
c) Conditional probabilities:
\[P(A | B) = \frac{P(A \cap B)}{P(B)} = \frac{0.15}{19/60} = \frac{0.15 \times 60}{19} = \frac{9}{19} \approx 0.4737\]
\[P(B | A) = \frac{P(A \cap B)}{P(A)} = \frac{0.15}{19/30} = \frac{0.15 \times 30}{19} = \frac{4.5}{19} = \frac{9}{38} \approx 0.2368\]
d) For independence we need \(P(A \cap B) = P(A) \cdot P(B)\):
\[P(A) \cdot P(B) = \frac{19}{30} \times \frac{19}{60} = \frac{361}{1800} \approx 0.2006\]
Since \(P(A \cap B) = 0.15 \neq 0.2006\), events A and B are not independent.
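Because the solution mixes decimals and fractions, an exact check with Python's `fractions.Fraction` can be reassuring. This is a small sketch of the substitution step, with variable names chosen here:

```python
from fractions import Fraction

p_union = Fraction(8, 10)
p_intersection = Fraction(15, 100)

# a + b - P(A and B) = P(A or B) with a = 2b  =>  3b = P(A or B) + P(A and B)
p_b = (p_union + p_intersection) / 3
p_a = 2 * p_b

p_a_given_b = p_intersection / p_b
p_b_given_a = p_intersection / p_a

# Independence would require P(A and B) == P(A) * P(B).
independent = p_intersection == p_a * p_b

print(p_a, p_b, p_a_given_b, p_b_given_a, p_a * p_b, independent)
```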
Problem 9: Conditional Probability Word Problem (xx)
At a university, 70% of students who study regularly pass the exam, while only 20% of those who don’t study regularly pass. If 60% of students study regularly:
a) What is the overall pass rate?
b) Given that a student passed, what is the probability they studied regularly?
c) Given that a student failed, what is the probability they did not study regularly?
Let \(S\) = studies regularly, \(P\) = passes the exam.
Given: \(P(P|S) = 0.70\), \(P(P|S^c) = 0.20\), \(P(S) = 0.60\), \(P(S^c) = 0.40\).
a) Total probability:
\[P(P) = P(P|S) \cdot P(S) + P(P|S^c) \cdot P(S^c) = 0.70 \times 0.60 + 0.20 \times 0.40 = 0.42 + 0.08 = 0.50\]
The overall pass rate is 50%.
b) Bayes’ theorem:
\[P(S|P) = \frac{P(P|S) \cdot P(S)}{P(P)} = \frac{0.70 \times 0.60}{0.50} = \frac{0.42}{0.50} = 0.84\]
Given that a student passed, there is an 84% probability they studied regularly.
c) First, \(P(\text{Fail}) = 1 - 0.50 = 0.50\).
\(P(\text{Fail} | S^c) = 1 - 0.20 = 0.80\).
\[P(S^c | \text{Fail}) = \frac{P(\text{Fail} | S^c) \cdot P(S^c)}{P(\text{Fail})} = \frac{0.80 \times 0.40}{0.50} = \frac{0.32}{0.50} = 0.64\]
Given that a student failed, there is a 64% probability they did not study regularly.
Problem 10: Expected Value Calculations (xx)
A game costs €5 to play. You roll two dice:
- Sum of 12: win €50
- Sum of 7: win €10
- Any other sum: win nothing
a) Find the probability distribution of winnings.
b) Calculate the expected winnings.
c) Calculate the expected profit (winnings minus cost).
d) If you play 100 times, what is your expected total profit/loss?
a) Total outcomes: \(6 \times 6 = 36\).
- Sum of 12: only \((6,6)\) — 1 outcome. \(P(\text{sum}=12) = \frac{1}{36}\)
- Sum of 7: \((1,6),(2,5),(3,4),(4,3),(5,2),(6,1)\) — 6 outcomes. \(P(\text{sum}=7) = \frac{6}{36} = \frac{1}{6}\)
- Any other sum: \(36 - 1 - 6 = 29\) outcomes. \(P(\text{other}) = \frac{29}{36}\)
| Winnings \(w\) | €50 | €10 | €0 |
|---|---|---|---|
| \(P(W = w)\) | \(\frac{1}{36}\) | \(\frac{6}{36}\) | \(\frac{29}{36}\) |
b) Expected winnings:
\[E(W) = 50 \times \frac{1}{36} + 10 \times \frac{6}{36} + 0 \times \frac{29}{36}\]
\[= \frac{50}{36} + \frac{60}{36} = \frac{110}{36} \approx €3.056\]
c) Expected profit per game:
\[E(\text{Profit}) = E(W) - 5 = 3.056 - 5 = -€1.944\]
On average, you lose about €1.94 per game.
d) Over 100 games:
\[E(\text{Total profit}) = 100 \times (-1.9444) \approx -€194.44\]
You would expect to lose approximately €194.44 after 100 games. This is not a favorable game for the player.
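The expected value can also be obtained by averaging the payout over all 36 equally likely outcomes, which sidesteps the intermediate rounding. A minimal Python sketch; `payout` and `COST` are names introduced for this example:

```python
from fractions import Fraction
from itertools import product

COST = 5  # stake per game in euros

def payout(total):
    """Winnings in euros for a given sum of two dice."""
    if total == 12:
        return 50
    if total == 7:
        return 10
    return 0

# Each of the 36 ordered outcomes has probability 1/36.
rolls = list(product(range(1, 7), repeat=2))
expected_winnings = sum(Fraction(payout(a + b), len(rolls)) for a, b in rolls)
expected_profit = expected_winnings - COST
expected_profit_100 = 100 * expected_profit

print(expected_winnings, expected_profit, float(expected_profit_100))
```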
Problem 11: Quality Control Multi-Step Bayes (xxx)
A screening process has two stages:
- Stage 1 test: Sensitivity 90%, Specificity 85%
- Stage 2 test (only applied to Stage 1 positives): Sensitivity 95%, Specificity 92%
- Base rate of defects: 3%
a) What fraction of items test positive in Stage 1?
b) Among Stage 1 positives, what is the new “prevalence” of actual defects? (i.e., PPV of Stage 1)
c) What fraction of Stage 1 positives also test positive in Stage 2?
d) What is the final PPV after both stages?
e) How does the two-stage process compare to a single test with the same sensitivity and specificity as Stage 2?
a) Using the total probability of a positive result in Stage 1:
\[P(T_1^+) = P(T_1^+ | D) \cdot P(D) + P(T_1^+ | D^c) \cdot P(D^c)\]
\[= 0.90 \times 0.03 + 0.15 \times 0.97 = 0.027 + 0.1455 = 0.1725\]
So 17.25% of all items test positive in Stage 1.
b) The PPV of Stage 1 (i.e., the probability of being truly defective given a positive Stage 1 result):
\[\text{PPV}_1 = P(D | T_1^+) = \frac{P(T_1^+ | D) \cdot P(D)}{P(T_1^+)} = \frac{0.027}{0.1725} \approx 0.1565\]
Among Stage 1 positives, about 15.65% are truly defective. This becomes the “prior” for Stage 2.
c) Among Stage 1 positives, we now apply Stage 2 with the updated prevalence \(p' = 0.1565\):
\[P(T_2^+ | T_1^+) = P(T_2^+ | D) \cdot P(D | T_1^+) + P(T_2^+ | D^c) \cdot P(D^c | T_1^+)\]
\[= 0.95 \times 0.1565 + 0.08 \times 0.8435 = 0.14868 + 0.06748 = 0.21616\]
About 21.62% of Stage 1 positives also test positive in Stage 2.
d) The PPV after both stages:
\[\text{PPV}_2 = \frac{P(T_2^+ | D) \cdot P(D | T_1^+)}{P(T_2^+ | T_1^+)} = \frac{0.95 \times 0.1565}{0.21616} = \frac{0.14868}{0.21616} \approx 0.6878\]
After both stages, about 68.78% of double-positive items are truly defective.
e) For comparison, a single Stage 2 test applied to the original population (3% prevalence):
\[P(T_2^+) = 0.95 \times 0.03 + 0.08 \times 0.97 = 0.0285 + 0.0776 = 0.1061\]
\[\text{PPV}_{\text{single}} = \frac{0.0285}{0.1061} \approx 0.2686\]
A single Stage 2 test gives a PPV of only about 26.86%, while the two-stage process achieves 68.78%. The two-stage process dramatically improves the positive predictive value by pre-filtering with Stage 1, effectively increasing the prevalence among the tested group before applying the more accurate Stage 2 test.
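The chained update can be written as one reusable function applied twice. The sketch below assumes, as the worked solution implicitly does, that the two tests are conditionally independent given the true defect status; the function name `ppv` is chosen here for illustration.

```python
def ppv(prior, sensitivity, specificity):
    """Posterior probability of a defect after one positive test."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)

base_rate = 0.03

ppv_stage1 = ppv(base_rate, 0.90, 0.85)
# The Stage 1 posterior becomes the prior for Stage 2.
ppv_two_stage = ppv(ppv_stage1, 0.95, 0.92)
# Stage 2 test applied directly to the whole population, for comparison.
ppv_single_stage2 = ppv(base_rate, 0.95, 0.92)

print(round(ppv_stage1, 4), round(ppv_two_stage, 4), round(ppv_single_stage2, 4))
```

Passing the previous posterior in as the next prior is exactly the step described in part c).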
Problem 12: Binomial + Expected Profit (xxx)
A company produces batches of 100 items. Each item has a 2% probability of being defective (independent).
a) What is the expected number of defective items per batch?
b) Each defective item costs €150 to replace. What is the expected replacement cost per batch?
c) The company offers a warranty: if more than 5 items in a batch are defective, the entire batch is replaced at a cost of €8,000. Find \(P(X > 5)\).
d) What is the expected cost per batch including the warranty?
e) Should the company price the warranty at €200 per batch? Justify.
a) With \(X \sim \text{Bin}(100, 0.02)\):
\[E(X) = n \cdot p = 100 \times 0.02 = 2\]
The expected number of defective items per batch is 2.
b) Expected replacement cost:
\[E(\text{cost}) = E(X) \times 150 = 2 \times 150 = €300\]
c) We need \(P(X > 5) = 1 - P(X \leq 5)\).
Using the binomial distribution with \(n=100\), \(p=0.02\):
\[P(X = k) = \binom{100}{k}(0.02)^k(0.98)^{100-k}\]
\[P(X=0) = (0.98)^{100} \approx 0.1326\]
\[P(X=1) = \binom{100}{1}(0.02)(0.98)^{99} = 100 \times 0.02 \times 0.1353 \approx 0.2707\]
\[P(X=2) = \binom{100}{2}(0.02)^2(0.98)^{98} = 4950 \times 0.0004 \times 0.1380 \approx 0.2734\]
\[P(X=3) = \binom{100}{3}(0.02)^3(0.98)^{97} = 161700 \times 0.000008 \times 0.1409 \approx 0.1823\]
\[P(X=4) = \binom{100}{4}(0.02)^4(0.98)^{96} = 3921225 \times 0.00000016 \times 0.1437 \approx 0.0902\]
\[P(X=5) = \binom{100}{5}(0.02)^5(0.98)^{95} = 75287520 \times 3.2 \times 10^{-9} \times 0.1466 \approx 0.0353\]
\[P(X \leq 5) \approx 0.1326 + 0.2707 + 0.2734 + 0.1823 + 0.0902 + 0.0353 = 0.9845\]
\[P(X > 5) \approx 1 - 0.9845 = 0.0155\]
There is approximately a 1.55% chance that more than 5 items are defective.
d) Expected cost including warranty:
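- Regular replacement cost: as a simplification, the per-item replacement cost is kept at its overall average of €300 per batch. This slightly overstates the cost, because batches with \(X > 5\) are replaced as a whole rather than item by item.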
- Additional warranty cost: \(P(X > 5) \times 8000 = 0.0155 \times 8000 = €124\).
Expected total cost per batch:
\[E(\text{total cost}) = 300 + 0.0155 \times 8000 = 300 + 124 = €424\]
e) The expected warranty payout is approximately €124 per batch. If the company charges €200 per batch for the warranty:
\[\text{Expected profit per warranty} = 200 - 124 = €76\]
Yes, the company should price the warranty at €200. This provides an expected profit of €76 per warranty sold (a 38% margin), which is a comfortable buffer above the expected cost.
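The tail probability in part c) and the warranty payout it implies can be checked numerically. This Python sketch uses only the standard library; the constant and function names are assumptions of the example.

```python
import math

n, p = 100, 0.02
ITEM_COST = 150     # euros per defective item
BATCH_COST = 8000   # euros when the whole batch is replaced

def binom_pmf(k):
    """P(X = k) for X ~ Bin(100, 0.02)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

expected_defects = n * p
expected_item_cost = expected_defects * ITEM_COST

# Complement rule: P(X > 5) = 1 - P(X <= 5).
p_more_than_5 = 1 - sum(binom_pmf(k) for k in range(6))
expected_warranty_payout = p_more_than_5 * BATCH_COST

print(expected_defects, expected_item_cost,
      round(p_more_than_5, 4), round(expected_warranty_payout, 2))
```

The unrounded payout differs from €124 by a few cents because the solution rounds \(P(X > 5)\) to 0.0155 first.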
Problem 13: Insurance Pricing Word Problem (xxx)
An insurance company models car accident claims:
- 80% of drivers are “safe” with 2% annual accident probability
- 20% of drivers are “risky” with 10% annual accident probability
- Average claim: €8,000
a) What is the overall probability of a claim?
b) A driver files a claim. What is the probability they are “risky”? (Bayes)
c) The company charges a flat premium. What minimum annual premium ensures expected profit?
d) If the company charges €500/year, what is the expected profit per policyholder?
e) After a claim-free year, what is the updated probability that a policyholder is “risky”? (Bayes with updating)
Let \(S\) = safe driver, \(R\) = risky driver, \(C\) = claim filed.
Given: \(P(S) = 0.80\), \(P(R) = 0.20\), \(P(C|S) = 0.02\), \(P(C|R) = 0.10\), average claim = €8,000.
a) Total probability:
\[P(C) = P(C|S) \cdot P(S) + P(C|R) \cdot P(R) = 0.02 \times 0.80 + 0.10 \times 0.20 = 0.016 + 0.020 = 0.036\]
The overall probability of a claim is 3.6%.
b) Bayes’ theorem:
\[P(R|C) = \frac{P(C|R) \cdot P(R)}{P(C)} = \frac{0.10 \times 0.20}{0.036} = \frac{0.020}{0.036} \approx 0.5556\]
Given that a driver files a claim, there is about a 55.56% probability they are a risky driver. Note how the prior of 20% risky jumps to over 55% after observing a claim.
c) Expected claim cost per policyholder per year:
\[E(\text{cost}) = P(C) \times 8000 = 0.036 \times 8000 = €288\]
The minimum annual premium to ensure expected profit is any amount greater than €288.
d) Expected profit per policyholder:
\[E(\text{profit}) = 500 - 288 = €212\]
The expected profit per policyholder is €212 per year.
e) We want \(P(R | C^c)\), i.e., the probability a driver is risky given no claim in one year.
First: \(P(C^c) = 1 - 0.036 = 0.964\).
\[P(R | C^c) = \frac{P(C^c | R) \cdot P(R)}{P(C^c)} = \frac{0.90 \times 0.20}{0.964} = \frac{0.18}{0.964} \approx 0.1867\]
After a claim-free year, the probability the policyholder is risky decreases from 20% to approximately 18.67%. The absence of a claim provides slight evidence that the driver is safer.
Problem 14: “How Many Trials Needed” with Logarithms (xxx)
a) A biased coin has \(P(\text{heads}) = 0.6\). How many flips are needed so that \(P(\text{at least one head}) > 0.999\)? Solve using logarithms.
b) A machine produces items with 1% defect rate. How many items must be inspected to be 95% sure of finding at least one defect?
c) A radioactive substance has a half-life of 5 years. After how many years is less than 1% of the original amount remaining? (Use \(0.5^{t/5} < 0.01\) and logarithms.)
a) We need \(P(\text{at least one head in } n \text{ flips}) > 0.999\).
\[1 - P(\text{no heads}) > 0.999\]
\[1 - (0.4)^n > 0.999\]
\[(0.4)^n < 0.001\]
Taking logarithms (base 10):
\[n \cdot \log(0.4) < \log(0.001)\]
\[n \cdot (-0.39794) < -3\]
Since we divide by a negative number, the inequality flips:
\[n > \frac{-3}{-0.39794} = \frac{3}{0.39794} \approx 7.539\]
Since \(n\) must be an integer, we need \(n = 8\) flips.
b) We need \(P(\text{at least one defect in } n \text{ items}) > 0.95\).
\[1 - (0.99)^n > 0.95\]
\[(0.99)^n < 0.05\]
\[n \cdot \ln(0.99) < \ln(0.05)\]
\[n > \frac{\ln(0.05)}{\ln(0.99)} = \frac{-2.9957}{-0.01005} \approx 298.1\]
We need to inspect at least 299 items.
c) We solve \(0.5^{t/5} < 0.01\):
\[\frac{t}{5} \cdot \ln(0.5) < \ln(0.01)\]
\[\frac{t}{5} \cdot (-0.6931) < -4.6052\]
\[\frac{t}{5} > \frac{4.6052}{0.6931} \approx 6.644\]
\[t > 5 \times 6.644 = 33.22\]
Less than 1% of the original radioactive substance remains once \(t\) exceeds about 33.22 years, i.e. from year 34 onward if only whole years are counted.
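Parts a) and b) share the same pattern, \(1 - (1-p)^n > c\), so one helper covers both. The function `min_trials` below is a hypothetical name introduced for this sketch; it returns the smallest integer strictly above the logarithmic bound.

```python
import math

def min_trials(p_success, confidence):
    """Smallest n with 1 - (1 - p_success)**n > confidence."""
    bound = math.log(1 - confidence) / math.log(1 - p_success)
    return math.floor(bound) + 1

flips_needed = min_trials(0.6, 0.999)    # part a)
items_needed = min_trials(0.01, 0.95)    # part b)

# part c): solve 0.5**(t / 5) < 0.01 for t (in years)
years_bound = 5 * math.log(0.01) / math.log(0.5)

print(flips_needed, items_needed, round(years_bound, 2))
```

The base of the logarithm does not matter, since it cancels in the quotient; the natural logarithm is used here for convenience.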
Problem 15: Sequential Bayes (xxx)
A coin is either fair (\(P(H) = 0.5\)) or biased (\(P(H) = 0.8\)). Initially, you believe there’s a 50% chance the coin is biased.
a) You flip the coin and get heads. Update the probability the coin is biased.
b) You flip again and get heads. Update again.
c) You flip a third time and get tails. Update again.
d) After these 3 flips (H, H, T), what is the probability the coin is biased?
e) How many heads in a row would you need to be 99% sure the coin is biased?
Let \(B\) = coin is biased, \(F\) = coin is fair. Initially \(P(B) = P(F) = 0.5\).
a) After observing Heads:
\[P(B | H) = \frac{P(H|B) \cdot P(B)}{P(H|B) \cdot P(B) + P(H|F) \cdot P(F)}\]
\[= \frac{0.8 \times 0.5}{0.8 \times 0.5 + 0.5 \times 0.5} = \frac{0.40}{0.40 + 0.25} = \frac{0.40}{0.65} \approx 0.6154\]
After one head, \(P(B) \approx 0.6154\).
b) Now use \(P(B) = 0.6154\) as the prior and observe another H:
\[P(B | H) = \frac{0.8 \times 0.6154}{0.8 \times 0.6154 + 0.5 \times 0.3846} = \frac{0.4923}{0.4923 + 0.1923} = \frac{0.4923}{0.6846} \approx 0.7191\]
After two heads, \(P(B) \approx 0.7191\).
c) Now use \(P(B) = 0.7191\) as the prior and observe Tails. For the biased coin \(P(T|B) = 0.2\), for the fair coin \(P(T|F) = 0.5\):
\[P(B | T) = \frac{P(T|B) \cdot P(B)}{P(T|B) \cdot P(B) + P(T|F) \cdot P(F)}\]
\[= \frac{0.2 \times 0.7191}{0.2 \times 0.7191 + 0.5 \times 0.2809} = \frac{0.1438}{0.1438 + 0.1405} = \frac{0.1438}{0.2843} \approx 0.5059\]
After observing tails, the probability drops back to approximately \(P(B) \approx 0.5059\).
d) After 3 flips (H, H, T), the probability the coin is biased is approximately 0.5059, i.e., about 50.6%. The tails result nearly cancels out the evidence from the two heads, since tails is much less likely under the biased coin.
We can verify this directly. The likelihood of H, H, T under each hypothesis:
- Biased: \(0.8 \times 0.8 \times 0.2 = 0.128\)
- Fair: \(0.5 \times 0.5 \times 0.5 = 0.125\)
\[P(B | H,H,T) = \frac{0.128 \times 0.5}{0.128 \times 0.5 + 0.125 \times 0.5} = \frac{0.064}{0.064 + 0.0625} = \frac{0.064}{0.1265} \approx 0.5059 \checkmark\]
e) After \(n\) heads in a row, with equal priors:
\[P(B | n \text{ heads}) = \frac{0.8^n \times 0.5}{0.8^n \times 0.5 + 0.5^n \times 0.5} = \frac{0.8^n}{0.8^n + 0.5^n}\]
We need this to exceed 0.99:
\[\frac{0.8^n}{0.8^n + 0.5^n} > 0.99\]
\[0.8^n > 0.99 \times (0.8^n + 0.5^n)\]
\[0.01 \times 0.8^n > 0.99 \times 0.5^n\]
\[\frac{0.8^n}{0.5^n} > \frac{0.99}{0.01} = 99\]
\[\left(\frac{0.8}{0.5}\right)^n > 99\]
\[1.6^n > 99\]
\[n > \frac{\ln(99)}{\ln(1.6)} = \frac{4.5951}{0.4700} \approx 9.78\]
You would need 10 heads in a row to be 99% sure the coin is biased.
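Sequential updating is a natural fit for a loop in which each posterior becomes the next prior. The following Python sketch mirrors parts a) to e); the names `update`, `history`, and `streak` are assumptions of this example.

```python
P_HEADS = {"biased": 0.8, "fair": 0.5}

def update(p_biased, flip):
    """One Bayes update of P(biased) after observing 'H' or 'T'."""
    like_b = P_HEADS["biased"] if flip == "H" else 1 - P_HEADS["biased"]
    like_f = P_HEADS["fair"] if flip == "H" else 1 - P_HEADS["fair"]
    numerator = like_b * p_biased
    return numerator / (numerator + like_f * (1 - p_biased))

# Parts a) to d): update after each flip of the sequence H, H, T.
belief = 0.5
history = []
for flip in "HHT":
    belief = update(belief, flip)
    history.append(belief)

# Part e): count consecutive heads until the posterior exceeds 0.99.
streak, p = 0, 0.5
while p <= 0.99:
    p = update(p, "H")
    streak += 1

print([round(b, 4) for b in history], streak)
```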
Problem 16: Full Exam-Style Problem (xxxx)
A company produces electronic components on two assembly lines:
- Line A: 60% of production, 4% defect rate
- Line B: 40% of production, 7% defect rate
The company tests all components. The test has:
- Sensitivity: 92% (correctly detects defective)
- Specificity: 88% (correctly passes non-defective)
a) Construct a full contingency table for 10,000 components (include production line AND test result).
b) A component tests positive. What is the probability it is truly defective?
c) A component from Line A tests positive. What is the probability it is truly defective?
d) The company discards all test-positive items. What fraction of discarded items are actually good? (This is the share of false positives among all positives, i.e. \(1 - \text{PPV}\).)
e) Each false positive costs €20 (wasted good component). Each false negative costs €200 (defective reaches customer). What is the expected total quality cost per 10,000 components?
f) Would it be cost-effective to add a second test (at €0.50 per component) to retest all positives? The second test has the same sensitivity and specificity. Calculate and compare.
a) From 10,000 components:
Line A (6,000 components):
- Defective: \(6000 \times 0.04 = 240\)
- Non-defective: \(6000 - 240 = 5760\)
Line B (4,000 components):
- Defective: \(4000 \times 0.07 = 280\)
- Non-defective: \(4000 - 280 = 3720\)
Total: Defective = 520, Non-defective = 9,480.
Now apply the test (sensitivity 92%, specificity 88%):
- True Positives (defective, test +): \(520 \times 0.92 = 478.4\)
- False Negatives (defective, test −): \(520 \times 0.08 = 41.6\)
- False Positives (non-defective, test +): \(9480 \times 0.12 = 1137.6\)
- True Negatives (non-defective, test −): \(9480 \times 0.88 = 8342.4\)
| | Test Positive | Test Negative | Total |
|---|---|---|---|
| Defective | 478.4 | 41.6 | 520 |
| Non-defective | 1137.6 | 8342.4 | 9480 |
| Total | 1616 | 8384 | 10000 |
b) PPV (Positive Predictive Value):
\[P(\text{Defective} | \text{Test}^+) = \frac{478.4}{1616} \approx 0.2960\]
About 29.60% of components that test positive are truly defective.
c) For Line A specifically:
- Defective from A: 240. Test positive: \(240 \times 0.92 = 220.8\).
- Non-defective from A: 5760. Test positive: \(5760 \times 0.12 = 691.2\).
- Total test positive from A: \(220.8 + 691.2 = 912\).
\[P(\text{Defective} | \text{Test}^+, \text{Line A}) = \frac{220.8}{912} \approx 0.2421\]
About 24.21% of Line A positives are truly defective. This is lower than the overall PPV because Line A has a lower defect rate.
d) Discarded items = all test-positive items = 1616.
Good items among discarded (false positives): 1137.6.
\[\text{Fraction of discarded that are good} = \frac{1137.6}{1616} \approx 0.7040\]
About 70.40% of discarded items are actually good — a significant waste.
e) Quality costs:
- False positive cost: \(1137.6 \times €20 = €22,752\)
- False negative cost: \(41.6 \times €200 = €8,320\)
\[\text{Total quality cost} = €22,752 + €8,320 = €31,072 \text{ per 10,000 components}\]
f) The second test is applied to the 1616 items that tested positive in the first test.
Among these 1616 positives:
- 478.4 are truly defective
- 1137.6 are truly non-defective
Apply the second test:
- True Positives (2nd test): \(478.4 \times 0.92 = 440.13\)
- False Negatives (2nd test): \(478.4 \times 0.08 = 38.27\)
- False Positives (2nd test): \(1137.6 \times 0.12 = 136.51\)
- True Negatives (2nd test): \(1137.6 \times 0.88 = 1001.09\)
After two tests, items discarded = those positive on both tests: \(440.13 + 136.51 = 576.64\).
Items that escape (defective but not caught):
- False negatives from first test: 41.6
- Defective items positive on first test but negative on second: 38.27
- Total false negatives: \(41.6 + 38.27 = 79.87\)
New quality costs:
- False positives (good items discarded after both tests): \(136.51 \times €20 = €2,730.20\)
- False negatives (defectives reaching customer): \(79.87 \times €200 = €15,974.00\)
- Second test cost: \(1616 \times €0.50 = €808\)
\[\text{Total cost with 2nd test} = €2,730.20 + €15,974.00 + €808 = €19,512.20\]
Comparison:
- Single test: €31,072
- Two tests: €19,512.20
- Savings: \(€31,072 - €19,512.20 = €11,559.80\)
Yes, adding the second test is cost-effective, saving approximately €11,560 per 10,000 components. The second test cuts false positives from 1137.6 to 136.51, which saves on wasted good components. False negatives nearly double (from 41.6 to 79.87), but the net savings remain substantial.
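The cost comparison in parts e) and f) can be recomputed without intermediate rounding. The Python sketch below assumes, like the solution, that the retest behaves independently of the first test given the true status; constant names are chosen for illustration. The unrounded totals differ from the figures above by a few cents.

```python
SENS, SPEC = 0.92, 0.88
COST_FP, COST_FN, COST_RETEST = 20, 200, 0.50

defective = 6000 * 0.04 + 4000 * 0.07   # 520 of 10,000 components
good = 10_000 - defective

# Single test: every positive is discarded.
tp1, fn1 = defective * SENS, defective * (1 - SENS)
fp1 = good * (1 - SPEC)
cost_single = fp1 * COST_FP + fn1 * COST_FN

# Retest all first-round positives; discard only double positives.
tp2, fn2 = tp1 * SENS, tp1 * (1 - SENS)
fp2 = fp1 * (1 - SPEC)
retest_cost = (tp1 + fp1) * COST_RETEST
cost_double = fp2 * COST_FP + (fn1 + fn2) * COST_FN + retest_cost

savings = cost_single - cost_double

print(round(cost_single, 2), round(cost_double, 2), round(savings, 2))
```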
Problem 17: Multi-Stage Quality Assurance with Cost Optimization (xxxx)
A pharmaceutical company tests drug batches. Each batch either passes (good) or fails (contaminated). The prior probability of contamination is 5%.
Test characteristics: Sensitivity 98%, Specificity 90%.
a) Calculate the PPV after one test.
b) If a batch tests positive, it can be retested. Using the PPV from (a) as the new prior, calculate the PPV after a second positive test.
c) Each test costs €1,000. A contaminated batch that reaches market costs €500,000 in liability. A good batch destroyed costs €50,000 in production loss. Set up the expected cost for the following strategies:
- Strategy 1: No testing (ship all batches)
- Strategy 2: One test (discard if positive)
- Strategy 3: Two tests (discard only if both positive)
d) Which strategy minimizes expected cost?
e) At what contamination rate does Strategy 3 become cheaper than Strategy 2?
a) With prevalence \(P(D) = 0.05\):
\[P(T^+) = P(T^+|D) \cdot P(D) + P(T^+|D^c) \cdot P(D^c)\]
\[= 0.98 \times 0.05 + 0.10 \times 0.95 = 0.049 + 0.095 = 0.144\]
\[\text{PPV}_1 = P(D|T^+) = \frac{0.049}{0.144} \approx 0.3403\]
After one positive test, the probability of actual contamination is about 34.03%.
b) Using \(\text{PPV}_1 = 0.3403\) as the new prior:
\[P(T_2^+) = 0.98 \times 0.3403 + 0.10 \times 0.6597 = 0.33349 + 0.06597 = 0.39946\]
\[\text{PPV}_2 = \frac{0.98 \times 0.3403}{0.39946} = \frac{0.33349}{0.39946} \approx 0.8349\]
After two consecutive positive tests, the probability of contamination is about 83.49%.
c) We compute expected cost per batch for each strategy.
Strategy 1: No testing (ship all batches)
- Contaminated batches reach market: \(P(D) = 0.05\), cost = €500,000
- No testing cost, no destruction cost
\[E_1 = 0.05 \times 500{,}000 = €25{,}000\]
Strategy 2: One test (discard if positive)
- Testing cost: €1,000 per batch
- Discard if positive: \(P(T^+) = 0.144\)
  - Good batch destroyed (false positive): \(P(D^c \cap T^+) = 0.095\), cost = €50,000
  - Contaminated batch correctly caught: \(P(D \cap T^+) = 0.049\), no liability
- Ship if negative: \(P(T^-) = 0.856\)
  - Contaminated batch escapes (false negative): \(P(D \cap T^-) = P(T^-|D) \cdot P(D) = 0.02 \times 0.05 = 0.001\), cost = €500,000
\[E_2 = 1{,}000 + 0.095 \times 50{,}000 + 0.001 \times 500{,}000\]
\[= 1{,}000 + 4{,}750 + 500 = €6{,}250\]
Strategy 3: Two tests (discard only if both positive)
- Testing cost: €1,000 for first test on every batch. Second test only on positives: \(0.144 \times 1{,}000 = €144\)
- Total testing cost: \(€1{,}000 + €144 = €1{,}144\)
- Discard if both positive: items positive on both tests
  - \(P(D \cap T_1^+ \cap T_2^+) = 0.05 \times 0.98 \times 0.98 = 0.04802\) (contaminated, caught by both)
  - \(P(D^c \cap T_1^+ \cap T_2^+) = 0.95 \times 0.10 \times 0.10 = 0.0095\) (good, false positive on both)
  - Good batch destroyed: \(0.0095 \times 50{,}000 = €475\)
- Ship if not both positive (first test negative OR second test negative):
  - Contaminated escapes if first test negative: \(P(D \cap T_1^-) = 0.05 \times 0.02 = 0.001\)
  - Contaminated escapes if first test positive but second test negative: \(P(D \cap T_1^+ \cap T_2^-) = 0.05 \times 0.98 \times 0.02 = 0.00098\)
  - Total contaminated escaping: \(0.001 + 0.00098 = 0.00198\)
  - Liability cost: \(0.00198 \times 500{,}000 = €990\)
\[E_3 = 1{,}144 + 475 + 990 = €2{,}609\]
d) Summary of expected costs per batch:
| Strategy | Expected Cost |
|---|---|
| Strategy 1 (no testing) | €25,000 |
| Strategy 2 (one test) | €6,250 |
| Strategy 3 (two tests) | €2,609 |
Strategy 3 minimizes expected cost at approximately €2,609 per batch. The second test sharply reduces false positives (saving on destroyed good batches), which outweighs both the slightly higher liability from escaped contaminated batches and the additional €144 testing cost.
e) Let \(p\) be the contamination rate. We need to find \(p\) where \(E_3 = E_2\).
For Strategy 2:
\[E_2 = 1{,}000 + (1-p)(0.10)(50{,}000) + p(0.02)(500{,}000)\]
\[= 1{,}000 + 5{,}000(1-p) + 10{,}000p\]
\[= 1{,}000 + 5{,}000 - 5{,}000p + 10{,}000p = 6{,}000 + 5{,}000p\]
For Strategy 3:
Testing cost: \(1{,}000 + [p \times 0.98 + (1-p) \times 0.10] \times 1{,}000\)
Let \(r = P(T_1^+) = 0.98p + 0.10(1-p) = 0.10 + 0.88p\).
\[E_3 = 1{,}000 + r \times 1{,}000 + (1-p)(0.10)(0.10)(50{,}000) + p[0.02 + 0.98 \times 0.02](500{,}000)\]
\[= 1{,}000 + 1{,}000(0.10 + 0.88p) + (1-p)(0.01)(50{,}000) + p(0.0396)(500{,}000)\]
\[= 1{,}000 + 100 + 880p + 500(1-p) + 19{,}800p\]
\[= 1{,}600 + 880p - 500p + 19{,}800p = 1{,}600 + 20{,}180p\]
Setting \(E_2 = E_3\):
\[6{,}000 + 5{,}000p = 1{,}600 + 20{,}180p\]
\[4{,}400 = 15{,}180p\]
\[p = \frac{4{,}400}{15{,}180} \approx 0.2899\]
Strategy 3 is cheaper than Strategy 2 when the contamination rate is below approximately 29%. Since our actual contamination rate (5%) is well below this threshold, Strategy 3 is clearly preferred. For very high contamination rates, the additional false negatives from the two-test requirement make Strategy 2 preferable, as catching contaminated batches quickly becomes more important than avoiding false positives.
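The three strategies and the break-even rate can be expressed as functions of the contamination rate \(p\). The Python sketch below uses names chosen for illustration and assumes conditional independence of the two tests, as in the solution. Since both cost functions are linear in \(p\), evaluating each at \(p = 0\) and \(p = 1\) recovers intercept and slope.

```python
SENS, SPEC = 0.98, 0.90
TEST, LIABILITY, LOSS = 1_000, 500_000, 50_000

def cost_no_test(p):
    return p * LIABILITY

def cost_one_test(p):
    false_pos = (1 - p) * (1 - SPEC)
    false_neg = p * (1 - SENS)
    return TEST + false_pos * LOSS + false_neg * LIABILITY

def cost_two_tests(p):
    first_positive = p * SENS + (1 - p) * (1 - SPEC)
    both_false_pos = (1 - p) * (1 - SPEC) ** 2
    escaped = p * ((1 - SENS) + SENS * (1 - SENS))
    return TEST + first_positive * TEST + both_false_pos * LOSS + escaped * LIABILITY

costs_at_5_percent = [f(0.05) for f in (cost_no_test, cost_one_test, cost_two_tests)]

def line(f):
    """Intercept and slope of a cost function that is linear in p."""
    intercept = f(0.0)
    return intercept, f(1.0) - intercept

(i2, s2), (i3, s3) = line(cost_one_test), line(cost_two_tests)
break_even = (i2 - i3) / (s3 - s2)

print([round(c, 2) for c in costs_at_5_percent], round(break_even, 4))
```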