Course Cheatsheet
Section 07: Probability & Statistics
Descriptive Statistics
Measures of Central Tendency
| Measure | Formula | When to Use |
|---|---|---|
| Mean | \(\bar{x} = \frac{\sum x_i}{n}\) | Symmetric data, no outliers |
| Median | Middle value when sorted | Skewed data, outliers present |
| Mode | Most frequent value | Categorical data |
Median calculation:
- Odd \(n\): middle value
- Even \(n\): average of the two middle values
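A quick check of the three measures with Python's standard `statistics` module, using a small hypothetical sample chosen so the even-\(n\) median rule applies:

```python
import statistics

# Hypothetical sample (n = 6, so the median averages two middle values).
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)       # sum(x_i) / n = 30 / 6
median = statistics.median(data)   # even n: (3 + 5) / 2
mode = statistics.mode(data)       # most frequent value

# Odd n: the median is the single middle value of the sorted data.
odd_median = statistics.median([9, 1, 4])  # sorted: [1, 4, 9]
```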
Measures of Spread
| Measure | Formula |
|---|---|
| Range | \(\text{Max} - \text{Min}\) |
| Sample Variance | \(s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\) |
| Sample Std. Dev. | \(s = \sqrt{s^2}\) |
| Population Variance | \(\sigma^2 = \frac{\sum(x_i - \mu)^2}{N}\) |
Use \(n-1\) in denominator for sample variance. Use \(N\) for population variance.
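The \(n-1\) versus \(N\) distinction maps directly onto separate functions in Python's `statistics` module. The sketch below uses hypothetical data and also recomputes the sample variance from the formula to show they agree:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5

s2 = statistics.variance(data)       # sample variance: divides by n - 1
s = statistics.stdev(data)           # sample standard deviation
sigma2 = statistics.pvariance(data)  # population variance: divides by N

# The same sample variance computed directly from the formula.
n = len(data)
xbar = sum(data) / n
manual_s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)
```

Here the squared deviations sum to 32, so \(s^2 = 32/7\) while \(\sigma^2 = 32/8 = 4\).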
Five-Number Summary
- Minimum
- First Quartile (Q1) - 25th percentile
- Median (Q2) - 50th percentile
- Third Quartile (Q3) - 75th percentile
- Maximum
Interquartile Range (IQR): \(\text{IQR} = Q3 - Q1\)
Outlier Detection (values below the lower fence or above the upper fence are flagged as potential outliers):
- Lower fence: \(Q1 - 1.5 \times \text{IQR}\)
- Upper fence: \(Q3 + 1.5 \times \text{IQR}\)
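A sketch of the five-number summary and the 1.5 × IQR fences on hypothetical data. Quartile conventions differ between textbooks and software; this uses `statistics.quantiles(..., method="inclusive")` (linear interpolation), which may not match the hand method your course uses. For this data, the "median of each half" method would give Q1 = 4 and Q3 = 14 instead.

```python
import statistics

data = sorted([1, 3, 5, 7, 9, 11, 13, 15, 100])  # hypothetical sample

# Quartiles by linear interpolation ("inclusive"); other conventions exist.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (data[0], q1, q2, q3, data[-1])

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```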
Probability Fundamentals
Basic Probability Rules
| Rule | Formula |
|---|---|
| Complement | \(P(A') = 1 - P(A)\) |
| Addition | \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) |
| Multiplication | \(P(A \cap B) = P(A) \cdot P(B \mid A)\) |
For mutually exclusive events: \(P(A \cup B) = P(A) + P(B)\)
For independent events: \(P(A \cap B) = P(A) \cdot P(B)\)
Independence Test
Events A and B are independent if and only if: \[P(A \cap B) = P(A) \cdot P(B)\]
or equivalently, provided \(P(B) > 0\): \[P(A|B) = P(A)\]
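Both forms of the test can be checked exactly with fractions on a fair six-sided die (an assumed example). Events \(A\) and \(B\) below turn out independent, while \(A\) and \(C\) do not:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # fair six-sided die (assumed for illustration)
A = {2, 4, 6}               # roll is even
B = {1, 2, 3, 4}            # roll is at most 4
C = {1, 2, 3}               # roll is at most 3

def prob(event):
    """Equally likely outcomes: |event| / |sample space|."""
    return Fraction(len(event), len(omega))

def independent(e1, e2):
    return prob(e1 & e2) == prob(e1) * prob(e2)

a_b_independent = independent(A, B)   # 1/3 == 1/2 * 2/3
a_c_independent = independent(A, C)   # 1/6 != 1/2 * 1/2
p_a_given_b = prob(A & B) / prob(B)   # equals P(A) when independent
```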
Conditional Probability
Definition
\[P(A|B) = \frac{P(A \cap B)}{P(B)}\]
Read as: “Probability of A given B”. Defined only when \(P(B) > 0\).
Law of Total Probability
If \(B_1, B_2, ..., B_n\) partition the sample space: \[P(A) = \sum_{i=1}^{n} P(A|B_i) \cdot P(B_i)\]
For a two-set partition \(\{B, B'\}\): \[P(A) = P(A|B) \cdot P(B) + P(A|B') \cdot P(B')\]
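A minimal sketch of the law of total probability over an \(n\)-part partition. The three-machine scenario and its percentages are hypothetical, and `total_probability` is an illustrative helper name:

```python
from fractions import Fraction as F

def total_probability(p_b, p_a_given_b):
    """P(A) = sum of P(A|B_i) * P(B_i) over a partition B_1..B_n."""
    if sum(p_b) != 1:
        raise ValueError("P(B_i) must sum to 1 for a partition")
    return sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))

# Hypothetical: three machines make 50%, 30%, 20% of items (B_1, B_2, B_3),
# with defect rates P(A|B_i) of 1%, 2%, 3%.
p_b = [F(50, 100), F(30, 100), F(20, 100)]
p_a_given_b = [F(1, 100), F(2, 100), F(3, 100)]

p_defective = total_probability(p_b, p_a_given_b)
```

The weighted sum is \(0.005 + 0.006 + 0.006 = 0.017\).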
Bayes’ Theorem
Formula
\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\]
Expanded Form
\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|A') \cdot P(A')}\]
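The expanded form translates into a small function whose denominator is \(P(B)\) computed by total probability. The inputs are hypothetical, and `bayes` is an illustrative name:

```python
from fractions import Fraction as F

def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) via the expanded form; the denominator is P(B)."""
    numerator = p_b_given_a * p_a
    p_b = numerator + p_b_given_not_a * (1 - p_a)
    return numerator / p_b

# Hypothetical inputs: P(B|A) = 0.90, P(A) = 0.01, P(B|A') = 0.05
posterior = bayes(F(9, 10), F(1, 100), F(5, 100))
```

Here \(P(B) = 0.009 + 0.0495 = 0.0585\), so \(P(A|B) = 0.009 / 0.0585 = 2/13 \approx 0.154\). The same function also works with plain floats.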
For complex Bayes problems, create a hypothetical population (e.g., 10,000 people), fill in a 2×2 table using the given probabilities, then read answers directly from counts!
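A sketch of the hypothetical-population method, assuming 10,000 people and made-up values prevalence 0.01, sensitivity 0.90, specificity 0.95. Rounding to whole people is an assumption that is harmless here because every product is already a whole number:

```python
population = 10_000
prevalence, sensitivity, specificity = 0.01, 0.90, 0.95  # hypothetical

diseased = round(population * prevalence)   # 100
healthy = population - diseased             # 9,900
true_pos = round(diseased * sensitivity)    # 90
false_neg = diseased - true_pos             # 10
true_neg = round(healthy * specificity)     # 9,405
false_pos = healthy - true_neg              # 495

# Read answers straight from the counts.
ppv = true_pos / (true_pos + false_pos)     # P(D|+) = 90 / 585
npv = true_neg / (true_neg + false_neg)     # P(D'|-) = 9,405 / 9,415
```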
Medical Testing Terminology
| Metric | Definition | Formula |
|---|---|---|
| Sensitivity | True positive rate | \(P(+ \mid D)\) |
| Specificity | True negative rate | \(P(- \mid D')\) |
| Prevalence | Disease rate | \(P(D)\) |
| PPV | Positive predictive value | \(P(D \mid +)\) |
| NPV | Negative predictive value | \(P(D' \mid -)\) |
False positive rate: \(P(+|D') = 1 - \text{Specificity}\)
False negative rate: \(P(-|D) = 1 - \text{Sensitivity}\)
PPV and NPV Formulas
\[\text{PPV} = P(D|+) = \frac{\text{Sensitivity} \times \text{Prevalence}}{\text{Sensitivity} \times \text{Prevalence} + (1-\text{Specificity}) \times (1-\text{Prevalence})}\]
\[\text{NPV} = P(D'|-) = \frac{\text{Specificity} \times (1-\text{Prevalence})}{\text{Specificity} \times (1-\text{Prevalence}) + (1-\text{Sensitivity}) \times \text{Prevalence}}\]
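Low prevalence can produce a low PPV even when sensitivity and specificity are high: false positives from the large disease-free group can outnumber true positives from the small diseased group.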
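The PPV and NPV formulas as Python functions, with a hypothetical test (sensitivity = specificity = 0.99) showing the prevalence effect: PPV is roughly 0.09 at 0.1% prevalence but roughly 0.96 at 20% prevalence.

```python
def ppv(sensitivity, specificity, prevalence):
    """P(D | +): true positives over all positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    """P(D' | -): true negatives over all negatives."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# Hypothetical test with sensitivity = specificity = 0.99
rare = ppv(0.99, 0.99, 0.001)   # prevalence 0.1%
common = ppv(0.99, 0.99, 0.20)  # prevalence 20%
```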
Combinatorics
Factorial
\[n! = n \times (n-1) \times (n-2) \times ... \times 2 \times 1\]
Special cases: \(0! = 1\), \(1! = 1\)
Permutations (Order Matters)
All \(n\) objects: \(P(n) = n!\)
\(r\) objects from \(n\) (without replacement): \[P(n,r) = \frac{n!}{(n-r)!}\]
Combinations (Order Doesn’t Matter)
\[\binom{n}{r} = C(n,r) = \frac{n!}{r!(n-r)!}\]
- Permutation: Arranging people in a line (order matters)
- Combination: Selecting a committee (order doesn’t matter)
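Python's `math.perm`, `math.comb`, and `math.factorial` (Python 3.8+) implement these formulas directly. This sketch uses five hypothetical people and confirms the counts by brute-force enumeration with `itertools`:

```python
import math
from itertools import combinations, permutations

people = ["Ana", "Ben", "Cai", "Dee", "Eli"]  # hypothetical names
n, r = len(people), 3

line_ups = math.perm(n, r)    # n! / (n - r)!: order matters
committees = math.comb(n, r)  # n! / (r! (n - r)!): order ignored

# Brute-force enumeration agrees with the formulas.
assert line_ups == len(list(permutations(people, r)))
assert committees == len(list(combinations(people, r)))

# Each committee of r people can be lined up in r! ways.
assert line_ups == committees * math.factorial(r)
```

The last check captures the link between the two: \(P(n,r) = \binom{n}{r} \cdot r!\).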
Counting Principles
| Principle | Description |
|---|---|
| Multiplication | If task A has \(m\) ways and task B has \(n\) ways, together: \(m \times n\) |
| Addition | If choices are mutually exclusive, add the counts |
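Both principles in a few lines, using hypothetical wardrobe and menu items. `itertools.product` enumerates every (shirt, pants) pair, matching \(m \times n\):

```python
from itertools import product

shirts = ["red", "blue", "green"]               # 3 ways
pants = ["jeans", "khakis", "shorts", "cords"]  # 4 ways

# Multiplication: one shirt AND one pair of pants.
outfits = list(product(shirts, pants))

pastas = ["penne", "ravioli"]
salads = ["caesar", "greek", "garden"]
# Addition: a single dish is a pasta OR a salad (no dish is both).
single_dish_choices = len(pastas) + len(salads)
```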
Contingency Tables
Reading a 2x2 Table
|  | B | B’ | Total |
|---|---|---|---|
| A | a | b | a+b |
| A’ | c | d | c+d |
| Total | a+c | b+d | n |
Probability Calculations from Tables
| Probability | Formula | Name |
|---|---|---|
| \(P(A)\) | \(\frac{a+b}{n}\) | Marginal |
| \(P(B)\) | \(\frac{a+c}{n}\) | Marginal |
| \(P(A \cap B)\) | \(\frac{a}{n}\) | Joint |
| \(P(A \mid B)\) | \(\frac{a}{a+c}\) | Conditional |
| \(P(B \mid A)\) | \(\frac{a}{a+b}\) | Conditional |
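The table formulas applied to hypothetical counts \(a = 30\), \(b = 20\), \(c = 10\), \(d = 40\). Conditionals restrict attention to one row or column, which is why their denominators are row or column totals rather than \(n\):

```python
from fractions import Fraction as F

# Hypothetical cell counts, laid out as in the 2x2 table
a, b = 30, 20   # row A:  (A and B), (A and B')
c, d = 10, 40   # row A': (A' and B), (A' and B')
n = a + b + c + d

p_a = F(a + b, n)          # marginal
p_b = F(a + c, n)          # marginal
p_a_and_b = F(a, n)        # joint
p_a_given_b = F(a, a + c)  # conditional: restrict to column B
p_b_given_a = F(a, a + b)  # conditional: restrict to row A
```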
Independence Test in Tables
Events are independent if: \[P(A \cap B) = P(A) \times P(B)\]
Or equivalently, each observed cell count equals its expected count: \[\text{Expected} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\]
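A sketch of the expected-count check on two hypothetical 2×2 tables (`expected_counts` and `is_independent` are illustrative names). Exact equality is the textbook criterion; counts from a real sample will rarely match exactly even when the events are independent.

```python
from fractions import Fraction

def expected_counts(observed):
    """Row total * column total / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[Fraction(r * c, grand_total) for c in col_totals] for r in row_totals]

def is_independent(observed):
    expected = expected_counts(observed)
    return all(
        o == e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical tables of counts [[a, b], [c, d]]
dependent_table = [[30, 20], [10, 40]]   # expected a = 50 * 40 / 100 = 20, observed 30
independent_table = [[20, 30], [20, 30]]
```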
Binomial Distribution
Conditions
- Fixed number of trials \(n\)
- Two outcomes: success (probability \(p\)) or failure (probability \(1-p\))
- Independent trials
- Constant probability \(p\)
Probability Mass Function
\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]
Key Formulas
| Measure | Formula |
|---|---|
| Expected Value | \(\mu = E[X] = np\) |
| Variance | \(\sigma^2 = np(1-p)\) |
| Standard Deviation | \(\sigma = \sqrt{np(1-p)}\) |
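The PMF and the summary formulas in Python, with hypothetical \(n = 10\), \(p = 0.3\). Summing the PMF confirms the probabilities total 1 and that \(\sum k \, P(X=k)\) reproduces \(np\):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # hypothetical parameters
mean = n * p
variance = n * p * (1 - p)
sd = variance ** 0.5

total = sum(binom_pmf(k, n, p) for k in range(n + 1))
mean_from_pmf = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
```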
Common Probability Calculations
| Question | Calculation |
|---|---|
| Exactly k | \(P(X = k)\) |
| At most k | \(P(X \leq k) = \sum_{i=0}^{k} P(X=i)\) |
| At least k | \(P(X \geq k) = 1 - P(X \leq k-1)\) |
| Between a and b | \(P(a \leq X \leq b) = \sum_{i=a}^{b} P(X=i)\) |
For “at least” problems, often easier to calculate: \(P(X \geq k) = 1 - P(X < k) = 1 - P(X \leq k-1)\)
For \(P(\text{at least one success in } n \text{ trials})\), use the complement: \(P(X \geq 1) = 1 - P(X = 0) = 1 - (1-p)^n\)
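The "exactly / at most / at least / between" translations as small helper functions (illustrative names), with `at_least` using the complement rule:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def at_most(k, n, p):
    """P(X <= k)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def at_least(k, n, p):
    """P(X >= k) = 1 - P(X <= k - 1)."""
    return 1 - at_most(k - 1, n, p)

def between(a, b, n, p):
    """P(a <= X <= b), inclusive at both ends."""
    return sum(binom_pmf(i, n, p) for i in range(a, b + 1))
```

For example, with hypothetical \(n = 5\), \(p = 0.2\), `at_least(1, 5, 0.2)` equals \(1 - 0.8^5 = 0.67232\).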
Geometric Distribution
Definition
Probability that the first success occurs on trial \(n\): \[P(X = n) = (1-p)^{n-1} \cdot p\]
Key Formulas
| Measure | Formula |
|---|---|
| Expected Trials | \(E[X] = \frac{1}{p}\) |
| P(First success by trial n) | \(P(X \leq n) = 1 - (1-p)^n\) |
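A sketch of the geometric formulas, assuming the example of waiting for a six on a fair die (\(p = 1/6\)). Summing the PMF over trials 1 to 10 reproduces the closed-form "by trial \(n\)" probability:

```python
def geometric_pmf(n, p):
    """P(X = n): first success on trial n (n = 1, 2, 3, ...)."""
    return (1 - p) ** (n - 1) * p

def geometric_cdf(n, p):
    """P(X <= n): first success on or before trial n."""
    return 1 - (1 - p) ** n

p = 1 / 6                # e.g., rolling a six with a fair die (assumed example)
expected_trials = 1 / p  # E[X] = 1/p

# The CDF equals the sum of the PMF over trials 1..n.
by_trial_10 = sum(geometric_pmf(k, p) for k in range(1, 11))
```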
Normal Distribution
The 68-95-99.7 Rule
For a normal distribution, approximately:
- 68% of data lie within \(\mu \pm 1\sigma\)
- 95% of data lie within \(\mu \pm 2\sigma\)
- 99.7% of data lie within \(\mu \pm 3\sigma\)
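The rule's percentages are rounded values of normal-curve areas. Python's `statistics.NormalDist` (3.8+) computes them exactly; the \(\mu = 100\), \(\sigma = 15\) distribution is a hypothetical example showing that the coverage does not depend on \(\mu\) or \(\sigma\):

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# P(mu - k*sigma < X < mu + k*sigma) for k = 1, 2, 3
coverage = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

# Any normal gives the same coverage, e.g., a hypothetical mu = 100, sigma = 15.
scores = NormalDist(mu=100, sigma=15)
within_one_sd = scores.cdf(115) - scores.cdf(85)
```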
Normal Approximation to Binomial
When \(np \geq 5\) and \(n(1-p) \geq 5\): \[\text{Binomial}(n, p) \approx \text{Normal}(\mu = np, \sigma = \sqrt{np(1-p)})\]
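A sketch comparing an exact binomial probability with its normal approximation for hypothetical \(n = 100\), \(p = 0.5\), \(k = 55\). It evaluates the normal CDF at \(k + 0.5\), the commonly used continuity correction, which is a refinement not listed on this sheet; without it the approximation here is noticeably worse (about 0.84 instead of about 0.86).

```python
from math import comb
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx(n, p):
    """Normal(np, sqrt(np(1-p))), only when the sheet's conditions hold."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("requires np >= 5 and n(1-p) >= 5")
    return NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)

n, p, k = 100, 0.5, 55
exact = binom_cdf(k, n, p)
approx = normal_approx(n, p).cdf(k + 0.5)  # +0.5: continuity correction
```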
Problem-Solving Strategies
For Bayes Problems
- Identify what you need: Usually \(P(D|+)\) or \(P(D|-)\)
- Extract given information: sensitivity, specificity, prevalence
- Calculate \(P(+)\) or \(P(-)\) using law of total probability
- Apply Bayes’ theorem
- Interpret the result
For Binomial Problems
- Verify binomial conditions are met
- Identify \(n\), \(p\), and \(k\)
- Translate question: “exactly”, “at most”, “at least”
- Calculate using appropriate formula
- Use complement rule when helpful
Alternative: Contingency Table Method
For Bayes problems, use a hypothetical population:
- Choose convenient population size (e.g., 10,000)
- Fill in table using given probabilities
- Read probabilities directly from counts
Common Mistakes to Avoid
- Confusing \(P(A|B)\) with \(P(B|A)\)
- Forgetting to use complement rule for “at least” problems
- Using permutations when combinations are needed (or vice versa)
- Assuming independence without checking
- Mixing up sensitivity with PPV