Course Cheatsheet
Section 07: Probability & Statistics
Descriptive Statistics
Measures of Central Tendency
| Measure | Formula | When to Use |
|---|---|---|
| Mean | \(\bar{x} = \frac{\sum x_i}{n}\) | Symmetric data, no outliers |
| Median | Middle value when sorted | Skewed data, outliers present |
| Mode | Most frequent value | Categorical data |
Median calculation:
- Odd \(n\): Middle value at position \(\frac{n+1}{2}\)
- Even \(n\): Average of the two middle values at positions \(\frac{n}{2}\) and \(\frac{n}{2}+1\)
Use the mean when the data is roughly symmetric. Use the median when there are outliers or heavy skew – the median is robust against extreme values. Use the mode for categorical data where numerical averages are meaningless.
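The three measures can be computed directly with Python's standard `statistics` module; the data values below are made up, with one deliberate outlier to show why the median is robust:

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 40]  # hypothetical sample; 40 is a deliberate outlier

mean_val = statistics.mean(data)      # pulled upward by the outlier (68/7 ~ 9.71)
median_val = statistics.median(data)  # robust middle value of the sorted data: 5
mode_val = statistics.mode(data)      # most frequent value: 3
```

Note how a single extreme value drags the mean well above every typical observation, while the median stays in the middle of the bulk of the data.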
Measures of Spread
| Measure | Formula |
|---|---|
| Range | \(\text{Max} - \text{Min}\) |
| Sample Variance | \(s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\) |
| Sample Std. Dev. | \(s = \sqrt{s^2}\) |
| Population Variance | \(\sigma^2 = \frac{\sum(x_i - \mu)^2}{N}\) |
| Population Std. Dev. | \(\sigma = \sqrt{\sigma^2}\) |
Use \(n-1\) in the denominator for sample variance (Bessel’s correction). Use \(N\) for population variance. On exams, read carefully whether you are given a sample or a full population.
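The sample/population distinction maps directly onto two different standard-library functions; the data below is made up:

```python
import statistics

data = [4, 8, 6, 5, 3]  # hypothetical sample

s2 = statistics.variance(data)       # sample variance: divides by n-1 (Bessel's correction)
sigma2 = statistics.pvariance(data)  # population variance: divides by N

# Sum of squared deviations is 14.8, so s2 = 14.8/4 = 3.7 and sigma2 = 14.8/5 = 2.96
```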
Five-Number Summary and Box Plots
- Minimum
- First Quartile (Q1) – 25th percentile
- Median (Q2) – 50th percentile
- Third Quartile (Q3) – 75th percentile
- Maximum
Interquartile Range (IQR): \(\text{IQR} = Q3 - Q1\) (contains the middle 50% of data)
Outlier Detection:
- Lower fence: \(Q1 - 1.5 \times \text{IQR}\)
- Upper fence: \(Q3 + 1.5 \times \text{IQR}\)
- Any value below the lower fence or above the upper fence is an outlier
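The fence-based outlier check can be sketched with `statistics.quantiles`. Note that quartile conventions differ between textbooks; `method="inclusive"` is one common choice, and the data here is made up:

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 100]  # hypothetical data; 100 is a deliberate outlier

q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                      # middle 50% of the data
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

Here Q1 = 5 and Q3 = 13, so IQR = 8, the fences are -7 and 25, and only 100 is flagged.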
Frequency and Relative Frequency
Frequency is the raw count of observations in a category. Relative frequency is \(\frac{\text{Frequency}}{\text{Total count}}\), which gives the proportion or percentage. Relative frequencies sum to 1 (or 100%) and serve as probability estimates via the frequentist interpretation: \(\text{Relative Frequency} \approx \text{Probability}\).
Random Variables
Definition
A random variable \(X\) assigns a numerical value to each outcome in the sample space. Formally: \(X : S \to \mathbb{R}\).
Examples: Number of defective items in a batch, number shown on a die, number of heads in coin flips.
Probability Mass Function (PMF)
For a discrete random variable \(X\), the PMF is:
\[p_X(x) = P(X = x)\]
Properties:
- \(p_X(x) \geq 0\) for all \(x\)
- \(\sum_x p_X(x) = 1\)
Cumulative Distribution Function (CDF)
The CDF gives cumulative probabilities up to a threshold:
\[F_X(x) = P(X \leq x) = \sum_{k \leq x} P(X = k)\]
PMF vs. CDF Quick Reference
| Question Wording | Use | Expression |
|---|---|---|
| “Exactly \(k\)” | PMF | \(P(X = k)\) |
| “At most \(k\)” | CDF | \(P(X \leq k)\) |
| “At least \(k\)” | Complement of CDF | \(1 - P(X \leq k-1)\) |
| “More than \(k\)” | Complement of CDF | \(1 - P(X \leq k)\) |
Do not confuse \(P(X = k)\) with \(P(X \leq k)\). Always check whether the question asks for “exactly” (PMF) or “at most” / “at least” (CDF). For example, a distribution might have \(P(X = 1) = 0.35\) but \(P(X \leq 1) = 0.45\) – the two expressions answer different questions.
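The PMF/CDF distinction can be checked mechanically on a small dictionary-based PMF; the probabilities below are made up (chosen so that \(P(X=1) = 0.35\) and \(P(X \leq 1) = 0.45\)):

```python
# Hypothetical PMF of a discrete random variable X
pmf = {0: 0.10, 1: 0.35, 2: 0.30, 3: 0.25}

p_exactly_1 = pmf[1]                                    # PMF: P(X = 1)
p_at_most_1 = sum(p for x, p in pmf.items() if x <= 1)  # CDF: P(X <= 1)
p_at_least_2 = 1 - p_at_most_1                          # complement: P(X >= 2)
```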
Expected Value and Variance of Discrete Random Variables
Expected Value
\[E[X] = \sum_i x_i \cdot P(X = x_i)\]
The expected value is the long-run average outcome, weighted by probabilities.
Variance and Standard Deviation
\[\text{Var}(X) = E[(X - \mu)^2] = \sum_i (x_i - \mu)^2 \cdot P(X = x_i)\]
\[\sigma_X = \sqrt{\text{Var}(X)}\]
Linear Transformation Rules
For constants \(a\) and \(b\):
\[E[aX + b] = a \cdot E[X] + b\]
\[\text{Var}(aX + b) = a^2 \cdot \text{Var}(X)\]
Sum of Independent Random Variables
If \(X\) and \(Y\) are independent:
\[E[X + Y] = E[X] + E[Y]\]
\[\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)\]
Adding a constant shifts the mean but does not change the variance. Expectation always adds for sums. Variance adds for sums only under independence.
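The transformation rules can be verified exactly on a small PMF; the distribution and the constants \(a, b\) below are made up:

```python
# Hypothetical distribution of X
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

def expectation(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

a, b = 3, 10
# Y = aX + b has the same probabilities on a shifted/scaled support
pmf_y = {a * x + b: p for x, p in pmf.items()}

# E[aX + b] = a*E[X] + b, and Var(aX + b) = a^2 * Var(X)
assert abs(expectation(pmf_y) - (a * expectation(pmf) + b)) < 1e-12
assert abs(variance(pmf_y) - a ** 2 * variance(pmf)) < 1e-12
```

Here \(E[X] = 2.1\) and \(\text{Var}(X) = 0.49\), so \(E[Y] = 16.3\) and \(\text{Var}(Y) = 4.41\).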
Probability Fundamentals
Sample Space and Events
- Sample space \(S\) (or \(\Omega\)): Set of all possible outcomes
- Event \(A\): Any subset of the sample space
- Classical probability (equally likely outcomes): \(P(A) = \frac{|A|}{|S|}\)
Set Operations on Events
| Operation | Notation | Meaning |
|---|---|---|
| Union | \(A \cup B\) | A or B (or both) |
| Intersection | \(A \cap B\) | A and B |
| Complement | \(A'\) or \(\bar{A}\) | Not A |
| Set difference | \(A \setminus B = A \cap B'\) | A but not B |
Basic Probability Rules
| Rule | Formula |
|---|---|
| Complement | \(P(A') = 1 - P(A)\) |
| Addition | \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) |
| Multiplication | \(P(A \cap B) = P(A) \cdot P(B|A)\) |
For mutually exclusive events: \(P(A \cap B) = 0\) and \(P(A \cup B) = P(A) + P(B)\)
For independent events: \(P(A \cap B) = P(A) \cdot P(B)\)
Mutually Exclusive vs. Independent
| Property | Mutually Exclusive | Independent |
|---|---|---|
| \(P(A \cap B)\) | \(= 0\) | \(= P(A) \cdot P(B)\) |
| Can occur together? | No | Yes |
| A occurred tells us… | B did not occur | Nothing about B |
If \(P(A) > 0\) and \(P(B) > 0\), then mutually exclusive events cannot be independent. Mutually exclusive means they never happen together; independent means they do not affect each other.
Combinatorics
Factorial
\[n! = n \times (n-1) \times (n-2) \times \ldots \times 2 \times 1\]
Special cases: \(0! = 1\) and \(1! = 1\)
Counting Methods Summary
| Method | Formula | When to Use |
|---|---|---|
| Counting Principle | \(n_1 \times n_2 \times \ldots \times n_k\) | Sequential independent choices |
| Permutation | \(P(n,r) = \frac{n!}{(n-r)!}\) | Order matters, without replacement |
| Combination | \(\binom{n}{r} = \frac{n!}{r!(n-r)!}\) | Order does not matter |
Ask: Does order matter?
- Yes (rankings, codes, arrangements) \(\to\) Permutation
- No (committees, teams, selections) \(\to\) Combination
Key relationship: \(\binom{n}{r} = \frac{P(n,r)}{r!}\)
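Python 3.8+ provides these counting functions directly in the `math` module; the values of \(n\) and \(r\) below are arbitrary:

```python
import math

n, r = 10, 3
perms = math.perm(n, r)  # ordered selections: 10*9*8 = 720
combs = math.comb(n, r)  # unordered selections: 720/3! = 120

# Key relationship: C(n, r) = P(n, r) / r!
assert combs == perms // math.factorial(r)
```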
Conditional Probability
Definition
\[P(A|B) = \frac{P(A \cap B)}{P(B)} \quad \text{where } P(B) > 0\]
Read as “probability of A given B.” The condition restricts the sample space to B.
Multiplication Rule
\[P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)\]
Extended: \(P(A \cap B \cap C) = P(A) \cdot P(B|A) \cdot P(C|A \cap B)\)
Law of Total Probability
If \(B_1, B_2, \ldots, B_n\) partition the sample space (mutually exclusive and exhaustive):
\[P(A) = \sum_{i=1}^{n} P(A|B_i) \cdot P(B_i)\]
For two partitions:
\[P(A) = P(A|B) \cdot P(B) + P(A|B') \cdot P(B')\]
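A minimal numeric sketch of the two-partition case; the machine names and rates below are hypothetical:

```python
# Hypothetical factory: machine B makes 60% of parts with a 2% defect rate;
# the other machine (B') makes 40% with a 5% defect rate.
p_b = 0.60
p_def_given_b = 0.02
p_def_given_not_b = 0.05

# Law of total probability: P(defect) = P(def|B)P(B) + P(def|B')P(B')
p_def = p_def_given_b * p_b + p_def_given_not_b * (1 - p_b)
# = 0.012 + 0.020 = 0.032
```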
Tree Diagrams
Tree diagrams visualize the multiplication rule for sequential events:
- Branches represent outcomes at each stage
- Branch labels are conditional probabilities
- Multiply along a path to get joint probability
- Add across paths to get union probability
- All final path probabilities must sum to 1
For problems with sequential events or “with/without replacement,” draw a tree diagram first. Label every branch with its probability, then read joint probabilities by multiplying along paths. This approach prevents errors with conditional probabilities.
Bayes’ Theorem
Formula
\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\]
Expanded Form
\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|A') \cdot P(A')}\]
Medical Testing / Classification Terminology
| Metric | Definition | Formula |
|---|---|---|
| True Positive Rate (Sensitivity) | Correctly detecting condition | \(P(+|D)\) |
| True Negative Rate (Specificity) | Correctly ruling out condition | \(P(-|D')\) |
| Base Rate (Prevalence) | Proportion with condition | \(P(D)\) |
| PPV (Positive Predictive Value) | Condition given positive test | \(P(D|+)\) |
| NPV (Negative Predictive Value) | No condition given negative test | \(P(D'|-)\) |
Derived rates:
- False positive rate: \(P(+|D') = 1 - \text{Specificity}\)
- False negative rate: \(P(-|D) = 1 - \text{Sensitivity}\)
PPV and NPV Formulas
\[\text{PPV} = \frac{\text{Sensitivity} \times \text{Prevalence}}{\text{Sensitivity} \times \text{Prevalence} + (1 - \text{Specificity}) \times (1 - \text{Prevalence})}\]
\[\text{NPV} = \frac{\text{Specificity} \times (1 - \text{Prevalence})}{\text{Specificity} \times (1 - \text{Prevalence}) + (1 - \text{Sensitivity}) \times \text{Prevalence}}\]
Low prevalence leads to low PPV, even with high sensitivity and specificity. When the condition is rare, most positive results are false positives because the large healthy population generates more false positives than the small sick population generates true positives.
For complex Bayes problems, create a hypothetical population (e.g., 10,000 people), fill in a 2x2 table using the given probabilities, then read answers directly from counts. This avoids formula errors and is easy to verify.
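The hypothetical-population method can be sketched in a few lines; the prevalence, sensitivity, and specificity below are made-up illustration values:

```python
# Hypothetical screening test: prevalence 1%, sensitivity 95%, specificity 90%
population = 10_000
prevalence, sensitivity, specificity = 0.01, 0.95, 0.90

sick = population * prevalence           # 100 people have the condition
healthy = population - sick              # 9900 people do not
true_pos = sick * sensitivity            # 100 * 0.95 = 95 true positives
false_pos = healthy * (1 - specificity)  # 9900 * 0.10 = 990 false positives

# PPV = P(D | +): true positives over all positives
ppv = true_pos / (true_pos + false_pos)
```

Despite 95% sensitivity and 90% specificity, the PPV is only about 8.8% (95 of roughly 1085 positives) because the large healthy group generates far more false positives than the small sick group generates true positives.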
Contingency Tables
Reading a 2x2 Table
| | \(B\) | \(B'\) | Total |
|---|---|---|---|
| \(A\) | \(a\) | \(b\) | \(a+b\) |
| \(A'\) | \(c\) | \(d\) | \(c+d\) |
| Total | \(a+c\) | \(b+d\) | \(n\) |
Probability Types from Tables
| Probability | Formula | Where to Look |
|---|---|---|
| Marginal \(P(A)\) | \(\frac{a+b}{n}\) | Row total / Grand total |
| Marginal \(P(B)\) | \(\frac{a+c}{n}\) | Column total / Grand total |
| Joint \(P(A \cap B)\) | \(\frac{a}{n}\) | Cell / Grand total |
| Conditional \(P(A|B)\) | \(\frac{a}{a+c}\) | Cell / Column total |
| Conditional \(P(B|A)\) | \(\frac{a}{a+b}\) | Cell / Row total |
Independence Test in Tables
Events are independent if for every cell:
\[P(A \cap B) = P(A) \times P(B)\]
Or equivalently, the observed cell count equals the expected count:
\[\text{Expected} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\]
The most common exam error is using the wrong denominator for conditional probabilities. For \(P(A|B)\), the denominator is the column total for \(B\), not the grand total. Always state the condition in words before choosing the denominator.
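The table-reading rules translate directly into arithmetic; the cell counts below are made up to match the \(a, b, c, d\) layout above:

```python
# Hypothetical 2x2 cell counts: a = A&B, b = A&B', c = A'&B, d = A'&B'
a, b, c, d = 30, 20, 10, 40
n = a + b + c + d  # grand total: 100

p_A = (a + b) / n          # marginal: row total / grand total = 0.50
p_B = (a + c) / n          # marginal: column total / grand total = 0.40
p_A_and_B = a / n          # joint: cell / grand total = 0.30
p_A_given_B = a / (a + c)  # conditional: cell / COLUMN total = 0.75
p_B_given_A = a / (a + b)  # conditional: cell / ROW total = 0.60
```

Note that \(P(A \cap B) = 0.30 \neq P(A)P(B) = 0.20\) here, so these events are not independent.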
Binomial Distribution
Four Conditions
- Fixed number of trials \(n\)
- Two outcomes: Success (probability \(p\)) or Failure (probability \(1-p\))
- Independent trials
- Constant probability \(p\) for each trial
Probability Mass Function
\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]
Interpretation: \(\binom{n}{k}\) counts the arrangements, \(p^k\) is the probability of \(k\) successes, \((1-p)^{n-k}\) is the probability of \(n-k\) failures.
Parameters
| Measure | Formula |
|---|---|
| Expected Value | \(\mu = E[X] = np\) |
| Variance | \(\sigma^2 = np(1-p)\) |
| Standard Deviation | \(\sigma = \sqrt{np(1-p)}\) |
Common Probability Calculations
| Question | Calculation |
|---|---|
| Exactly \(k\) | \(P(X = k)\) |
| At most \(k\) | \(P(X \leq k) = \sum_{i=0}^{k} P(X=i)\) |
| At least \(k\) | \(P(X \geq k) = 1 - P(X \leq k-1)\) |
| Between \(a\) and \(b\) (inclusive) | \(\sum_{i=a}^{b} P(X=i)\) |
For \(P(X \geq k)\), use the complement: \(1 - P(X \leq k-1)\). This is especially efficient for “at least one”: \(P(X \geq 1) = 1 - P(X = 0) = 1 - (1-p)^n\).
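The PMF and the wording translations can be sketched with `math.comb`; the values of \(n\) and \(p\) below are arbitrary:

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
p_exactly_3 = binom_pmf(3, n, p)                        # "exactly 3"
p_at_most_3 = sum(binom_pmf(i, n, p) for i in range(4)) # "at most 3": sum i = 0..3
p_at_least_1 = 1 - binom_pmf(0, n, p)                   # "at least 1": 1 - (1-p)^n
```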
Hypergeometric Distribution
When to Use
Use the hypergeometric distribution when sampling without replacement from a finite population with two types of items. This violates the independence assumption of the binomial.
Formula
\[P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}\]
Where:
- \(N\) = population size
- \(K\) = number of “success” items in the population
- \(n\) = sample size (drawn without replacement)
- \(k\) = number of successes in the sample
Binomial vs. Hypergeometric
| Feature | Binomial | Hypergeometric |
|---|---|---|
| Sampling | With replacement / large population | Without replacement / finite population |
| Independence | Yes | No |
| Typical wording | “in \(n\) trials” | “from a lot of \(N\) items, draw \(n\)” |
| Formula | \(\binom{n}{k}p^k(1-p)^{n-k}\) | \(\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\) |
If the population is very large relative to the sample (\(n / N < 0.05\)), the hypergeometric distribution is well approximated by the binomial with \(p = K/N\). For small populations or large samples, you must use the hypergeometric formula.
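A small comparison shows the difference for a lot where the sample is a sizeable fraction of the population; the lot sizes below are made up:

```python
import math

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes when drawing n without replacement
    from N items of which K are successes."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

# Hypothetical lot: N = 20 items, K = 5 defective, draw n = 4 without replacement
p_hyper = hypergeom_pmf(2, 20, 5, 4)               # exact: 10*105/4845 ~ 0.2167
# Binomial with p = K/N = 0.25 wrongly assumes independent draws
p_binom = math.comb(4, 2) * 0.25**2 * 0.75**2      # ~ 0.2109
```

With \(n/N = 0.20\), well above the 0.05 rule of thumb, the two answers differ noticeably; for a much larger lot they would nearly coincide.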
Geometric Distribution
Definition
Probability that the first success occurs on trial \(n\):
\[P(X = n) = (1-p)^{n-1} \cdot p\]
Key Formulas
| Formula | Use Case |
|---|---|
| \(P(X = n) = (1-p)^{n-1} p\) | First success on trial \(n\) |
| \(P(X \leq n) = 1 - (1-p)^n\) | At least one success within \(n\) trials |
| \(P(X > n) = (1-p)^n\) | No success in first \(n\) trials |
| \(E[X] = \frac{1}{p}\) | Expected number of trials until first success |
Solving for Minimum \(n\)
To find the minimum number of trials so that \(P(\text{at least one success}) \geq c\):
\[1 - (1-p)^n \geq c \implies (1-p)^n \leq 1 - c\]
\[n \geq \frac{\ln(1-c)}{\ln(1-p)}\]
Since trials are discrete, always round \(n\) up to the next integer. For example, if you compute \(n \geq 28.43\), then \(n = 29\).
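The logarithm-and-round-up recipe in code form; the success probability and target confidence below are made up:

```python
import math

p = 0.12  # hypothetical per-trial success probability
c = 0.95  # required probability of at least one success

# 1 - (1-p)^n >= c  =>  (1-p)^n <= 1-c.
# Dividing by ln(1-p) < 0 flips the inequality, giving n >= ln(1-c)/ln(1-p).
n_exact = math.log(1 - c) / math.log(1 - p)  # about 23.4
n = math.ceil(n_exact)                       # trials are discrete: round UP
```

Sanity check: \(n\) satisfies the requirement while \(n-1\) does not, confirming it is the minimum.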
Memoryless Property
\[P(X > m + n \mid X > m) = P(X > n)\]
If no success has occurred yet, the process “restarts” probabilistically. Past failures do not affect future success probabilities.
Normal Distribution and z-Scores
The 68-95-99.7 Rule
For normally distributed data \(X \sim N(\mu, \sigma)\):
- 68% of data falls within \(\mu \pm 1\sigma\)
- 95% of data falls within \(\mu \pm 2\sigma\)
- 99.7% of data falls within \(\mu \pm 3\sigma\)
Standardization (z-Score)
Any normal distribution can be converted to the standard normal \(Z \sim N(0, 1)\):
\[z = \frac{X - \mu}{\sigma}\]
Typical Probability Conversions
| Question | z-Form |
|---|---|
| \(P(X \leq a)\) | \(P\!\left(Z \leq \frac{a - \mu}{\sigma}\right)\) |
| \(P(X \geq a)\) | \(1 - P\!\left(Z \leq \frac{a - \mu}{\sigma}\right)\) |
| \(P(a \leq X \leq b)\) | \(P\!\left(Z \leq \frac{b - \mu}{\sigma}\right) - P\!\left(Z \leq \frac{a - \mu}{\sigma}\right)\) |
Reverse Question (Finding the Quantile)
Given a target probability \(P(X \leq x) = c\), find the corresponding \(x\):
\[x = \mu + z_c \cdot \sigma\]
where \(z_c\) is the standard normal quantile for probability \(c\).
A z-score tells you how many standard deviations a value is from the mean. A z-score of 1.5 means the value is 1.5 standard deviations above the mean. Negative z-scores indicate values below the mean.
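Both the forward and reverse questions are covered by `statistics.NormalDist` (Python 3.8+); the mean and standard deviation below are made-up exam-score values:

```python
from statistics import NormalDist

# Hypothetical exam scores: mean 70, standard deviation 8
scores = NormalDist(mu=70, sigma=8)

z = (80 - 70) / 8                            # z-score of 80: 1.25 SDs above the mean
p_below_80 = scores.cdf(80)                  # forward: P(X <= 80)
p_between = scores.cdf(80) - scores.cdf(60)  # P(60 <= X <= 80)
x_90th = scores.inv_cdf(0.90)               # reverse: x with P(X <= x) = 0.90
```

`inv_cdf` returns \(\mu + z_c \sigma\) directly, matching the quantile formula above.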
Quick Reference: Formula Collection
| Topic | Formula |
|---|---|
| Mean | \(\bar{x} = \frac{\sum x_i}{n}\) |
| Sample Variance | \(s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\) |
| IQR | \(Q3 - Q1\) |
| Complement Rule | \(P(A') = 1 - P(A)\) |
| Addition Rule | \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) |
| Conditional Prob. | \(P(A|B) = \frac{P(A \cap B)}{P(B)}\) |
| Bayes’ Theorem | \(P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\) |
| Permutation | \(P(n,r) = \frac{n!}{(n-r)!}\) |
| Combination | \(\binom{n}{r} = \frac{n!}{r!(n-r)!}\) |
| Binomial PMF | \(P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}\) |
| Binomial Mean/Var | \(\mu = np\), \(\sigma^2 = np(1-p)\) |
| Hypergeometric | \(P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\) |
| Geometric PMF | \(P(X=n) = (1-p)^{n-1}p\) |
| z-Score | \(z = \frac{X - \mu}{\sigma}\) |
| Expected Value | \(E[X] = \sum x_i P(X = x_i)\) |
| Linearity of \(E\) | \(E[aX+b] = aE[X]+b\) |
| Variance scaling | \(\text{Var}(aX+b) = a^2 \text{Var}(X)\) |
Problem-Solving Strategies
For Probability Problems
- Define events clearly with symbols
- Identify what type of probability is needed (joint, conditional, marginal)
- Choose the right rule (addition, multiplication, complement, Bayes)
- Draw a diagram if helpful (Venn diagram, tree diagram, contingency table)
- Check that your answer is between 0 and 1
For Bayes Problems
- Identify what you need (usually PPV or reversed conditional)
- Extract given information: base rate, true positive rate, true negative rate
- Calculate \(P(B)\) using the law of total probability
- Apply Bayes’ theorem
- Verify with a contingency table using a hypothetical population
For Binomial / Distribution Problems
- Verify conditions (fixed \(n\), two outcomes, independence, constant \(p\))
- Check if sampling is with or without replacement (binomial vs. hypergeometric)
- Identify \(n\), \(p\), and \(k\)
- Translate wording: “exactly,” “at most,” “at least”
- Use the complement rule when helpful
For Geometric / Minimum-n Problems
- Set up the inequality: \(1 - (1-p)^n \geq c\)
- Take logarithms on both sides
- Remember to flip the inequality when dividing by a negative number
- Round up to the next integer
Common Mistakes
- Confusing \(P(A|B)\) with \(P(B|A)\) – these have different denominators
- Forgetting to use the complement rule for “at least” problems
- Using permutations when combinations are needed (or vice versa)
- Assuming independence without verifying
- Mixing up sensitivity (\(P(+|D)\)) with PPV (\(P(D|+)\))
- Using the wrong denominator for conditional probability in contingency tables
- Rounding \(n\) down instead of up in minimum-\(n\) problems
- Applying binomial when sampling is without replacement from a small population
- Confusing PMF (\(P(X = k)\)) with CDF (\(P(X \leq k)\)) based on exam wording