Course Cheatsheet
Section 07: Probability & Statistics
Descriptive Statistics
Measures of Central Tendency
| Measure | Formula | When to Use |
|---|---|---|
| Mean | \(\bar{x} = \frac{\sum x_i}{n}\) | Symmetric data, no outliers |
| Median | Middle value when sorted | Skewed data, outliers present |
| Mode | Most frequent value | Categorical data |
Median calculation:
- Odd \(n\): Middle value at position \(\frac{n+1}{2}\)
- Even \(n\): Average of the two middle values at positions \(\frac{n}{2}\) and \(\frac{n}{2}+1\)
Use the mean when the data is roughly symmetric. Use the median when there are outliers or heavy skew – the median is robust against extreme values. Use the mode for categorical data where numerical averages are meaningless.
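The three measures can be computed directly with Python's standard `statistics` module; the data values below are made up, with one deliberate outlier to show why the median is robust:

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 40]  # hypothetical sample; 40 is a deliberate outlier

mean_val = statistics.mean(data)      # pulled upward by the outlier (68/7 ~ 9.71)
median_val = statistics.median(data)  # robust middle value of the sorted data: 5
mode_val = statistics.mode(data)      # most frequent value: 3
```

Note how a single extreme value drags the mean well above every typical observation, while the median stays in the middle of the bulk of the data.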
Measures of Spread
| Measure | Formula |
|---|---|
| Range | \(\text{Max} - \text{Min}\) |
| Sample Variance | \(s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\) |
| Sample Std. Dev. | \(s = \sqrt{s^2}\) |
| Population Variance | \(\sigma^2 = \frac{\sum(x_i - \mu)^2}{N}\) |
| Population Std. Dev. | \(\sigma = \sqrt{\sigma^2}\) |
Use \(n-1\) in the denominator for sample variance (Bessel’s correction). Use \(N\) for population variance. On exams, read carefully whether you are given a sample or a full population.
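The sample/population distinction maps directly onto two different standard-library functions; the data below is made up:

```python
import statistics

data = [4, 8, 6, 5, 3]  # hypothetical sample

s2 = statistics.variance(data)       # sample variance: divides by n-1 (Bessel's correction)
sigma2 = statistics.pvariance(data)  # population variance: divides by N

# Sum of squared deviations is 14.8, so s2 = 14.8/4 = 3.7 and sigma2 = 14.8/5 = 2.96
```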
Five-Number Summary and Box Plots
- Minimum
- First Quartile (Q1) – 25th percentile
- Median (Q2) – 50th percentile
- Third Quartile (Q3) – 75th percentile
- Maximum
Interquartile Range (IQR): \(\text{IQR} = Q3 - Q1\) (contains the middle 50% of data)
Outlier Detection:
- Lower fence: \(Q1 - 1.5 \times \text{IQR}\)
- Upper fence: \(Q3 + 1.5 \times \text{IQR}\)
- Any value below the lower fence or above the upper fence is an outlier
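The fence-based outlier check can be sketched with `statistics.quantiles`. Note that quartile conventions differ between textbooks; `method="inclusive"` is one common choice, and the data here is made up:

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 100]  # hypothetical data; 100 is a deliberate outlier

q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                      # middle 50% of the data
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

Here Q1 = 5 and Q3 = 13, so IQR = 8, the fences are -7 and 25, and only 100 is flagged.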
Frequency and Relative Frequency
Frequency is the raw count of observations in a category. Relative frequency is \(\frac{\text{Frequency}}{\text{Total count}}\), which gives the proportion or percentage. Relative frequencies sum to 1 (or 100%) and serve as probability estimates via the frequentist interpretation: \(\text{Relative Frequency} \approx \text{Probability}\).
Random Variables
Definition
A random variable \(X\) assigns a numerical value to each outcome in the sample space. Formally: \(X : S \to \mathbb{R}\).
Examples: Number of defective items in a batch, number shown on a die, number of heads in coin flips.
Probability Mass Function (PMF)
For a discrete random variable \(X\), the PMF is:
\[p_X(x) = P(X = x)\]
Properties:
- \(p_X(x) \geq 0\) for all \(x\)
- \(\sum_x p_X(x) = 1\)
Cumulative Distribution Function (CDF)
The CDF gives cumulative probabilities up to a threshold:
\[F_X(x) = P(X \leq x) = \sum_{k \leq x} P(X = k)\]
PMF vs. CDF Quick Reference
| Question Wording | Use | Expression |
|---|---|---|
| “Exactly \(k\)” | PMF | \(P(X = k)\) |
| “At most \(k\)” | CDF | \(P(X \leq k)\) |
| “At least \(k\)” | Complement of CDF | \(1 - P(X \leq k-1)\) |
| “More than \(k\)” | Complement of CDF | \(1 - P(X \leq k)\) |
Do not confuse \(P(X = k)\) with \(P(X \leq k)\). Always check whether the question asks for “exactly” (PMF) or “at most” / “at least” (CDF). For example, a distribution might have \(P(X = 1) = 0.35\) but \(P(X \leq 1) = 0.45\) – the two expressions answer different questions.
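The PMF/CDF distinction can be checked mechanically on a small dictionary-based PMF; the probabilities below are made up (chosen so that \(P(X=1) = 0.35\) and \(P(X \leq 1) = 0.45\)):

```python
# Hypothetical PMF of a discrete random variable X
pmf = {0: 0.10, 1: 0.35, 2: 0.30, 3: 0.25}

p_exactly_1 = pmf[1]                                    # PMF: P(X = 1)
p_at_most_1 = sum(p for x, p in pmf.items() if x <= 1)  # CDF: P(X <= 1)
p_at_least_2 = 1 - p_at_most_1                          # complement: P(X >= 2)
```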
Expected Value and Variance of Discrete Random Variables
Expected Value
\[E[X] = \sum_i x_i \cdot P(X = x_i)\]
The expected value is the long-run average outcome, weighted by probabilities.
Variance and Standard Deviation
\[\text{Var}(X) = E[(X - \mu)^2] = \sum_i (x_i - \mu)^2 \cdot P(X = x_i)\]
\[\sigma_X = \sqrt{\text{Var}(X)}\]
Linear Transformation Rules
For constants \(a\) and \(b\):
\[E[aX + b] = a \cdot E[X] + b\]
\[\text{Var}(aX + b) = a^2 \cdot \text{Var}(X)\]
Sum of Independent Random Variables
If \(X\) and \(Y\) are independent:
\[E[X + Y] = E[X] + E[Y]\]
\[\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)\]
Adding a constant shifts the mean but does not change the variance. Expectation always adds for sums. Variance adds for sums only under independence.
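The transformation rules can be verified exactly on a small PMF; the distribution and the constants \(a, b\) below are made up:

```python
# Hypothetical distribution of X
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

def expectation(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

a, b = 3, 10
# Y = aX + b has the same probabilities on a shifted/scaled support
pmf_y = {a * x + b: p for x, p in pmf.items()}

# E[aX + b] = a*E[X] + b, and Var(aX + b) = a^2 * Var(X)
assert abs(expectation(pmf_y) - (a * expectation(pmf) + b)) < 1e-12
assert abs(variance(pmf_y) - a ** 2 * variance(pmf)) < 1e-12
```

Here \(E[X] = 2.1\) and \(\text{Var}(X) = 0.49\), so \(E[Y] = 16.3\) and \(\text{Var}(Y) = 4.41\).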
Probability Fundamentals
Sample Space and Events
- Sample space \(S\) (or \(\Omega\)): Set of all possible outcomes
- Event \(A\): Any subset of the sample space
- Classical probability (equally likely outcomes): \(P(A) = \frac{|A|}{|S|}\)
Set Operations on Events
| Operation | Notation | Meaning |
|---|---|---|
| Union | \(A \cup B\) | A or B (or both) |
| Intersection | \(A \cap B\) | A and B |
| Complement | \(A'\) or \(\bar{A}\) | Not A |
| Set difference | \(A \setminus B = A \cap B'\) | A but not B |
Basic Probability Rules
| Rule | Formula |
|---|---|
| Complement | \(P(A') = 1 - P(A)\) |
| Addition | \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) |
| Multiplication | \(P(A \cap B) = P(A) \cdot P(B|A)\) |
For mutually exclusive events: \(P(A \cap B) = 0\) and \(P(A \cup B) = P(A) + P(B)\)
For independent events: \(P(A \cap B) = P(A) \cdot P(B)\)
Mutually Exclusive vs. Independent
| Property | Mutually Exclusive | Independent |
|---|---|---|
| \(P(A \cap B)\) | \(= 0\) | \(= P(A) \cdot P(B)\) |
| Can occur together? | No | Yes |
| A occurred tells us… | B did not occur | Nothing about B |
If \(P(A) > 0\) and \(P(B) > 0\), then mutually exclusive events cannot be independent. Mutually exclusive means they never happen together; independent means they do not affect each other.
Combinatorics
Factorial
\[n! = n \times (n-1) \times (n-2) \times \ldots \times 2 \times 1\]
Special cases: \(0! = 1\) and \(1! = 1\)
Counting Methods Summary
| Method | Formula | When to Use |
|---|---|---|
| Counting Principle | \(n_1 \times n_2 \times \ldots \times n_k\) | Sequential independent choices |
| Permutation | \(P(n,r) = \frac{n!}{(n-r)!}\) | Order matters, without replacement |
| Combination | \(\binom{n}{r} = \frac{n!}{r!(n-r)!}\) | Order does not matter |
Ask: Does order matter?
- Yes (rankings, codes, arrangements) \(\to\) Permutation
- No (committees, teams, selections) \(\to\) Combination
Key relationship: \(\binom{n}{r} = \frac{P(n,r)}{r!}\)
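Python 3.8+ provides these counting functions directly in the `math` module; the values of \(n\) and \(r\) below are arbitrary:

```python
import math

n, r = 10, 3
perms = math.perm(n, r)  # ordered selections: 10*9*8 = 720
combs = math.comb(n, r)  # unordered selections: 720/3! = 120

# Key relationship: C(n, r) = P(n, r) / r!
assert combs == perms // math.factorial(r)
```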
Conditional Probability
Definition
\[P(A|B) = \frac{P(A \cap B)}{P(B)} \quad \text{where } P(B) > 0\]
Read as “probability of A given B.” The condition restricts the sample space to B.
Multiplication Rule
\[P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)\]
Extended: \(P(A \cap B \cap C) = P(A) \cdot P(B|A) \cdot P(C|A \cap B)\)
Law of Total Probability
If \(B_1, B_2, \ldots, B_n\) partition the sample space (mutually exclusive and exhaustive):
\[P(A) = \sum_{i=1}^{n} P(A|B_i) \cdot P(B_i)\]
For two partitions:
\[P(A) = P(A|B) \cdot P(B) + P(A|B') \cdot P(B')\]
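A minimal numeric sketch of the two-partition case; the machine names and rates below are hypothetical:

```python
# Hypothetical factory: machine B makes 60% of parts with a 2% defect rate;
# the other machine (B') makes 40% with a 5% defect rate.
p_b = 0.60
p_def_given_b = 0.02
p_def_given_not_b = 0.05

# Law of total probability: P(defect) = P(def|B)P(B) + P(def|B')P(B')
p_def = p_def_given_b * p_b + p_def_given_not_b * (1 - p_b)
# = 0.012 + 0.020 = 0.032
```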
Tree Diagrams
Tree diagrams visualize the multiplication rule for sequential events:
- Branches represent outcomes at each stage
- Branch labels are conditional probabilities
- Multiply along a path to get joint probability
- Add across paths to get union probability
- All final path probabilities must sum to 1
For problems with sequential events or “with/without replacement,” draw a tree diagram first. Label every branch with its probability, then read joint probabilities by multiplying along paths. This approach prevents errors with conditional probabilities.
Bayes’ Theorem
Formula
\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\]
Expanded Form
\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|A') \cdot P(A')}\]
Medical Testing / Classification Terminology
| Metric | Definition | Formula |
|---|---|---|
| True Positive Rate (Sensitivity) | Correctly detecting condition | \(P(+|D)\) |
| True Negative Rate (Specificity) | Correctly ruling out condition | \(P(-|D')\) |
| Base Rate (Prevalence) | Proportion with condition | \(P(D)\) |
| PPV (Positive Predictive Value) | Condition given positive test | \(P(D|+)\) |
| NPV (Negative Predictive Value) | No condition given negative test | \(P(D'|-)\) |
Derived rates:
- False positive rate: \(P(+|D') = 1 - \text{Specificity}\)
- False negative rate: \(P(-|D) = 1 - \text{Sensitivity}\)
PPV and NPV Formulas
\[\text{PPV} = \frac{\text{Sensitivity} \times \text{Prevalence}}{\text{Sensitivity} \times \text{Prevalence} + (1 - \text{Specificity}) \times (1 - \text{Prevalence})}\]
\[\text{NPV} = \frac{\text{Specificity} \times (1 - \text{Prevalence})}{\text{Specificity} \times (1 - \text{Prevalence}) + (1 - \text{Sensitivity}) \times \text{Prevalence}}\]
Low prevalence leads to low PPV, even with high sensitivity and specificity. When the condition is rare, most positive results are false positives because the large healthy population generates more false positives than the small sick population generates true positives.
For complex Bayes problems, create a hypothetical population (e.g., 10,000 people), fill in a 2x2 table using the given probabilities, then read answers directly from counts. This avoids formula errors and is easy to verify.
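The hypothetical-population method can be sketched in a few lines; the prevalence, sensitivity, and specificity below are made-up illustration values:

```python
# Hypothetical screening test: prevalence 1%, sensitivity 95%, specificity 90%
population = 10_000
prevalence, sensitivity, specificity = 0.01, 0.95, 0.90

sick = population * prevalence           # 100 people have the condition
healthy = population - sick              # 9900 people do not
true_pos = sick * sensitivity            # 100 * 0.95 = 95 true positives
false_pos = healthy * (1 - specificity)  # 9900 * 0.10 = 990 false positives

# PPV = P(D | +): true positives over all positives
ppv = true_pos / (true_pos + false_pos)
```

Despite 95% sensitivity and 90% specificity, the PPV is only about 8.8% (95 of roughly 1085 positives) because the large healthy group generates far more false positives than the small sick group generates true positives.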
Contingency Tables
Reading a 2x2 Table
| | \(B\) | \(B'\) | Total |
|---|---|---|---|
| \(A\) | \(a\) | \(b\) | \(a+b\) |
| \(A'\) | \(c\) | \(d\) | \(c+d\) |
| Total | \(a+c\) | \(b+d\) | \(n\) |
Probability Types from Tables
| Probability | Formula | Where to Look |
|---|---|---|
| Marginal \(P(A)\) | \(\frac{a+b}{n}\) | Row total / Grand total |
| Marginal \(P(B)\) | \(\frac{a+c}{n}\) | Column total / Grand total |
| Joint \(P(A \cap B)\) | \(\frac{a}{n}\) | Cell / Grand total |
| Conditional \(P(A|B)\) | \(\frac{a}{a+c}\) | Cell / Column total |
| Conditional \(P(B|A)\) | \(\frac{a}{a+b}\) | Cell / Row total |
Independence Test in Tables
Events are independent if for every cell:
\[P(A \cap B) = P(A) \times P(B)\]
Or equivalently, the observed cell count equals the expected count:
\[\text{Expected} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\]
The most common exam error is using the wrong denominator for conditional probabilities. For \(P(A|B)\), the denominator is the column total for \(B\), not the grand total. Always state the condition in words before choosing the denominator.
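The table-reading rules translate directly into arithmetic; the cell counts below are made up to match the \(a, b, c, d\) layout above:

```python
# Hypothetical 2x2 cell counts: a = A&B, b = A&B', c = A'&B, d = A'&B'
a, b, c, d = 30, 20, 10, 40
n = a + b + c + d  # grand total: 100

p_A = (a + b) / n          # marginal: row total / grand total = 0.50
p_B = (a + c) / n          # marginal: column total / grand total = 0.40
p_A_and_B = a / n          # joint: cell / grand total = 0.30
p_A_given_B = a / (a + c)  # conditional: cell / COLUMN total = 0.75
p_B_given_A = a / (a + b)  # conditional: cell / ROW total = 0.60
```

Note that \(P(A \cap B) = 0.30 \neq P(A)P(B) = 0.20\) here, so these events are not independent.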
Binomial Distribution
Four Conditions
- Fixed number of trials \(n\)
- Two outcomes: Success (probability \(p\)) or Failure (probability \(1-p\))
- Independent trials
- Constant probability \(p\) for each trial
Probability Mass Function
\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]
Interpretation: \(\binom{n}{k}\) counts the arrangements, \(p^k\) is the probability of \(k\) successes, \((1-p)^{n-k}\) is the probability of \(n-k\) failures.
Parameters
| Measure | Formula |
|---|---|
| Expected Value | \(\mu = E[X] = np\) |
| Variance | \(\sigma^2 = np(1-p)\) |
| Standard Deviation | \(\sigma = \sqrt{np(1-p)}\) |
Common Probability Calculations
| Question | Calculation |
|---|---|
| Exactly \(k\) | \(P(X = k)\) |
| At most \(k\) | \(P(X \leq k) = \sum_{i=0}^{k} P(X=i)\) |
| At least \(k\) | \(P(X \geq k) = 1 - P(X \leq k-1)\) |
| Between \(a\) and \(b\) (inclusive) | \(\sum_{i=a}^{b} P(X=i)\) |
For \(P(X \geq k)\), use the complement: \(1 - P(X \leq k-1)\). This is especially efficient for “at least one”: \(P(X \geq 1) = 1 - P(X = 0) = 1 - (1-p)^n\).
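The PMF and the wording translations can be sketched with `math.comb`; the values of \(n\) and \(p\) below are arbitrary:

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
p_exactly_3 = binom_pmf(3, n, p)                        # "exactly 3"
p_at_most_3 = sum(binom_pmf(i, n, p) for i in range(4)) # "at most 3": sum i = 0..3
p_at_least_1 = 1 - binom_pmf(0, n, p)                   # "at least 1": 1 - (1-p)^n
```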
Hypergeometric Distribution
When to Use
Use the hypergeometric distribution when sampling without replacement from a finite population with two types of items. This violates the independence assumption of the binomial.
Formula
\[P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}\]
Where:
- \(N\) = population size
- \(K\) = number of “success” items in the population
- \(n\) = sample size (drawn without replacement)
- \(k\) = number of successes in the sample
Binomial vs. Hypergeometric
| Feature | Binomial | Hypergeometric |
|---|---|---|
| Sampling | With replacement / large population | Without replacement / finite population |
| Independence | Yes | No |
| Typical wording | “in \(n\) trials” | “from a lot of \(N\) items, draw \(n\)” |
| Formula | \(\binom{n}{k}p^k(1-p)^{n-k}\) | \(\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\) |
If the population is very large relative to the sample (\(n / N < 0.05\)), the hypergeometric distribution is well approximated by the binomial with \(p = K/N\). For small populations or large samples, you must use the hypergeometric formula.
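A small comparison shows the difference for a lot where the sample is a sizeable fraction of the population; the lot sizes below are made up:

```python
import math

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes when drawing n without replacement
    from N items of which K are successes."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

# Hypothetical lot: N = 20 items, K = 5 defective, draw n = 4 without replacement
p_hyper = hypergeom_pmf(2, 20, 5, 4)               # exact: 10*105/4845 ~ 0.2167
# Binomial with p = K/N = 0.25 wrongly assumes independent draws
p_binom = math.comb(4, 2) * 0.25**2 * 0.75**2      # ~ 0.2109
```

With \(n/N = 0.20\), well above the 0.05 rule of thumb, the two answers differ noticeably; for a much larger lot they would nearly coincide.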
Geometric Distribution
Definition
Probability that the first success occurs on trial \(n\):
\[P(X = n) = (1-p)^{n-1} \cdot p\]
Key Formulas
| Formula | Use Case |
|---|---|
| \(P(X = n) = (1-p)^{n-1} p\) | First success on trial \(n\) |
| \(P(X \leq n) = 1 - (1-p)^n\) | At least one success within \(n\) trials |
| \(P(X > n) = (1-p)^n\) | No success in first \(n\) trials |
| \(E[X] = \frac{1}{p}\) | Expected number of trials until first success |
Solving for Minimum \(n\)
To find the minimum number of trials so that \(P(\text{at least one success}) \geq c\):
\[1 - (1-p)^n \geq c \implies (1-p)^n \leq 1 - c\]
\[n \geq \frac{\ln(1-c)}{\ln(1-p)}\]
Since trials are discrete, always round \(n\) up to the next integer. For example, if you compute \(n \geq 28.43\), then \(n = 29\).
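The logarithm-and-round-up recipe in code form; the success probability and target confidence below are made up:

```python
import math

p = 0.12  # hypothetical per-trial success probability
c = 0.95  # required probability of at least one success

# 1 - (1-p)^n >= c  =>  (1-p)^n <= 1-c.
# Dividing by ln(1-p) < 0 flips the inequality, giving n >= ln(1-c)/ln(1-p).
n_exact = math.log(1 - c) / math.log(1 - p)  # about 23.4
n = math.ceil(n_exact)                       # trials are discrete: round UP
```

Sanity check: \(n\) satisfies the requirement while \(n-1\) does not, confirming it is the minimum.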
Memoryless Property
\[P(X > m + n \mid X > m) = P(X > n)\]
If no success has occurred yet, the process “restarts” probabilistically. Past failures do not affect future success probabilities.
Normal Distribution and z-Scores
The 68-95-99.7 Rule
For normally distributed data \(X \sim N(\mu, \sigma)\):
- 68% of data falls within \(\mu \pm 1\sigma\)
- 95% of data falls within \(\mu \pm 2\sigma\)
- 99.7% of data falls within \(\mu \pm 3\sigma\)
Standardization (z-Score)
Any normal distribution can be converted to the standard normal \(Z \sim N(0, 1)\):
\[z = \frac{X - \mu}{\sigma}\]
Typical Probability Conversions
| Question | z-Form |
|---|---|
| \(P(X \leq a)\) | \(P\!\left(Z \leq \frac{a - \mu}{\sigma}\right)\) |
| \(P(X \geq a)\) | \(1 - P\!\left(Z \leq \frac{a - \mu}{\sigma}\right)\) |
| \(P(a \leq X \leq b)\) | \(P\!\left(Z \leq \frac{b - \mu}{\sigma}\right) - P\!\left(Z \leq \frac{a - \mu}{\sigma}\right)\) |
Reverse Question (Finding the Quantile)
Given a target probability \(P(X \leq x) = c\), find the corresponding \(x\):
\[x = \mu + z_c \cdot \sigma\]
where \(z_c\) is the standard normal quantile for probability \(c\).
A z-score tells you how many standard deviations a value is from the mean. A z-score of 1.5 means the value is 1.5 standard deviations above the mean. Negative z-scores indicate values below the mean.
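Both the forward and reverse questions are covered by `statistics.NormalDist` (Python 3.8+); the mean and standard deviation below are made-up exam-score values:

```python
from statistics import NormalDist

# Hypothetical exam scores: mean 70, standard deviation 8
scores = NormalDist(mu=70, sigma=8)

z = (80 - 70) / 8                            # z-score of 80: 1.25 SDs above the mean
p_below_80 = scores.cdf(80)                  # forward: P(X <= 80)
p_between = scores.cdf(80) - scores.cdf(60)  # P(60 <= X <= 80)
x_90th = scores.inv_cdf(0.90)               # reverse: x with P(X <= x) = 0.90
```

`inv_cdf` returns \(\mu + z_c \sigma\) directly, matching the quantile formula above.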
Quick Reference: Formula Collection
| Topic | Formula |
|---|---|
| Mean | \(\bar{x} = \frac{\sum x_i}{n}\) |
| Sample Variance | \(s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\) |
| IQR | \(Q3 - Q1\) |
| Complement Rule | \(P(A') = 1 - P(A)\) |
| Addition Rule | \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) |
| Conditional Prob. | \(P(A|B) = \frac{P(A \cap B)}{P(B)}\) |
| Bayes’ Theorem | \(P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\) |
| Permutation | \(P(n,r) = \frac{n!}{(n-r)!}\) |
| Combination | \(\binom{n}{r} = \frac{n!}{r!(n-r)!}\) |
| Binomial PMF | \(P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}\) |
| Binomial Mean/Var | \(\mu = np\), \(\sigma^2 = np(1-p)\) |
| Hypergeometric | \(P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\) |
| Geometric PMF | \(P(X=n) = (1-p)^{n-1}p\) |
| z-Score | \(z = \frac{X - \mu}{\sigma}\) |
| Expected Value | \(E[X] = \sum x_i P(X = x_i)\) |
| Linearity of \(E\) | \(E[aX+b] = aE[X]+b\) |
| Variance scaling | \(\text{Var}(aX+b) = a^2 \text{Var}(X)\) |
Problem-Solving Strategies
For Probability Problems
- Define events clearly with symbols
- Identify what type of probability is needed (joint, conditional, marginal)
- Choose the right rule (addition, multiplication, complement, Bayes)
- Draw a diagram if helpful (Venn diagram, tree diagram, contingency table)
- Check that your answer is between 0 and 1
For Bayes Problems
- Identify what you need (usually PPV or reversed conditional)
- Extract given information: base rate, true positive rate, true negative rate
- Calculate \(P(B)\) using the law of total probability
- Apply Bayes’ theorem
- Verify with a contingency table using a hypothetical population
For Binomial / Distribution Problems
- Verify conditions (fixed \(n\), two outcomes, independence, constant \(p\))
- Check if sampling is with or without replacement (binomial vs. hypergeometric)
- Identify \(n\), \(p\), and \(k\)
- Translate wording: “exactly,” “at most,” “at least”
- Use the complement rule when helpful
For Geometric / Minimum-n Problems
- Set up the inequality: \(1 - (1-p)^n \geq c\)
- Take logarithms on both sides
- Remember to flip the inequality when dividing by a negative number
- Round up to the next integer
Common Mistakes
- Confusing \(P(A|B)\) with \(P(B|A)\) – these have different denominators
- Forgetting to use the complement rule for “at least” problems
- Using permutations when combinations are needed (or vice versa)
- Assuming independence without verifying
- Mixing up sensitivity (\(P(+|D)\)) with PPV (\(P(D|+)\))
- Using the wrong denominator for conditional probability in contingency tables
- Rounding \(n\) down instead of up in minimum-\(n\) problems
- Applying binomial when sampling is without replacement from a small population
- Confusing PMF (\(P(X = k)\)) with CDF (\(P(X \leq k)\)) based on exam wording