Course Cheatsheet

Section 07: Probability & Statistics

Descriptive Statistics

Measures of Central Tendency

| Measure | Formula | When to Use |
|---|---|---|
| Mean | \(\bar{x} = \frac{\sum x_i}{n}\) | Symmetric data, no outliers |
| Median | Middle value when sorted | Skewed data, outliers present |
| Mode | Most frequent value | Categorical data |

Median calculation:

  • Odd \(n\): Middle value at position \(\frac{n+1}{2}\)
  • Even \(n\): Average of the two middle values at positions \(\frac{n}{2}\) and \(\frac{n}{2}+1\)

Tip: Choosing the Right Measure

Use the mean when the data is roughly symmetric. Use the median when there are outliers or heavy skew – the median is robust against extreme values. Use the mode for categorical data where numerical averages are meaningless.
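A quick sketch of all three measures with Python's standard-library `statistics` module; the data values are invented for illustration:

```python
import statistics

data = [2, 3, 3, 5, 7, 9, 40]  # 40 is an outlier

mean = statistics.mean(data)      # pulled upward by the outlier
median = statistics.median(data)  # robust: middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)  # mean near 9.86, median 5, mode 3
```

Note how the single outlier drags the mean far above the median, illustrating why the median is preferred for skewed data.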

Measures of Spread

| Measure | Formula |
|---|---|
| Range | \(\text{Max} - \text{Min}\) |
| Sample Variance | \(s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\) |
| Sample Std. Dev. | \(s = \sqrt{s^2}\) |
| Population Variance | \(\sigma^2 = \frac{\sum(x_i - \mu)^2}{N}\) |
| Population Std. Dev. | \(\sigma = \sqrt{\sigma^2}\) |

Important: Sample vs. Population

Use \(n-1\) in the denominator for sample variance (Bessel’s correction). Use \(N\) for population variance. On exams, read carefully whether you are given a sample or a full population.
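The standard library exposes both conventions directly; a minimal check of Bessel's correction on made-up data:

```python
import statistics

data = [4, 8, 6, 5, 3, 7]

s2 = statistics.variance(data)       # sample variance, divides by n-1
sigma2 = statistics.pvariance(data)  # population variance, divides by N

# The two differ exactly by the factor (n-1)/n:
n = len(data)
assert abs(sigma2 - s2 * (n - 1) / n) < 1e-12
```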

Five-Number Summary and Box Plots

  1. Minimum
  2. First Quartile (Q1) – 25th percentile
  3. Median (Q2) – 50th percentile
  4. Third Quartile (Q3) – 75th percentile
  5. Maximum

Interquartile Range (IQR): \(\text{IQR} = Q3 - Q1\) (contains the middle 50% of data)

Outlier Detection:

  • Lower fence: \(Q1 - 1.5 \times \text{IQR}\)
  • Upper fence: \(Q3 + 1.5 \times \text{IQR}\)
  • Any value below the lower fence or above the upper fence is an outlier
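The fences above can be computed with `statistics.quantiles`; the data is illustrative, and `method="inclusive"` is one of two quartile conventions (texts differ, so results may vary slightly):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]

q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, outliers)  # quartiles 3.5 and 8.5, IQR 5, outlier 100
```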

Note: Frequency vs. Relative Frequency

Frequency is the raw count of observations in a category. Relative frequency is \(\frac{\text{Frequency}}{\text{Total count}}\), which gives the proportion or percentage. Relative frequencies sum to 1 (or 100%) and serve as probability estimates via the frequentist interpretation: \(\text{Relative Frequency} \approx \text{Probability}\).
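A one-liner version of this idea with `collections.Counter`; the category labels are made up:

```python
from collections import Counter

observations = ["red", "blue", "red", "green", "red", "blue"]

freq = Counter(observations)  # raw counts per category
total = sum(freq.values())
rel_freq = {k: v / total for k, v in freq.items()}  # proportions

assert abs(sum(rel_freq.values()) - 1) < 1e-12  # relative frequencies sum to 1
```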

Random Variables

Definition

A random variable \(X\) assigns a numerical value to each outcome in the sample space. Formally: \(X : S \to \mathbb{R}\).

Examples: Number of defective items in a batch, number shown on a die, number of heads in coin flips.

Probability Mass Function (PMF)

For a discrete random variable \(X\), the PMF is:

\[p_X(x) = P(X = x)\]

Properties:

  • \(p_X(x) \geq 0\) for all \(x\)
  • \(\sum_x p_X(x) = 1\)

Cumulative Distribution Function (CDF)

The CDF gives cumulative probabilities up to a threshold:

\[F_X(x) = P(X \leq x) = \sum_{k \leq x} P(X = k)\]

PMF vs. CDF Quick Reference

| Question Wording | Use | Expression |
|---|---|---|
| “Exactly \(k\)” | PMF | \(P(X = k)\) |
| “At most \(k\)” | CDF | \(P(X \leq k)\) |
| “At least \(k\)” | Complement of CDF | \(1 - P(X \leq k-1)\) |
| “More than \(k\)” | Complement of CDF | \(1 - P(X \leq k)\) |

Warning: PMF vs. CDF – A Common Exam Error

Do not confuse \(P(X = k)\) with \(P(X \leq k)\). Always check whether the question asks for “exactly” (PMF) or “at most” / “at least” (CDF). For example, a distribution might have \(P(X = 1) = 0.35\) but \(P(X \leq 1) = 0.45\): two different questions, two different values.

Expected Value and Variance of Discrete Random Variables

Expected Value

\[E[X] = \sum_i x_i \cdot P(X = x_i)\]

The expected value is the long-run average outcome, weighted by probabilities.

Variance and Standard Deviation

\[\text{Var}(X) = E[(X - \mu)^2] = \sum_i (x_i - \mu)^2 \cdot P(X = x_i)\]

\[\sigma_X = \sqrt{\text{Var}(X)}\]
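These sums translate directly into code; a sketch with an invented PMF:

```python
# A discrete random variable given as a PMF dict {value: probability}.
pmf = {0: 0.1, 1: 0.35, 2: 0.3, 3: 0.25}
assert abs(sum(pmf.values()) - 1) < 1e-12  # PMF must sum to 1

mu = sum(x * p for x, p in pmf.items())               # E[X]
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # Var(X)
sd = var ** 0.5                                       # standard deviation
```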

Linear Transformation Rules

For constants \(a\) and \(b\):

\[E[aX + b] = a \cdot E[X] + b\]

\[\text{Var}(aX + b) = a^2 \cdot \text{Var}(X)\]

Sum of Independent Random Variables

If \(X\) and \(Y\) are independent:

\[E[X + Y] = E[X] + E[Y]\]

\[\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)\]

Important: Key Insight on Variance Rules

Adding a constant shifts the mean but does not change the variance. Expectation always adds for sums. Variance adds for sums only under independence.
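A numerical check of the linear-transformation rules on a small made-up PMF, with arbitrary constants \(a\) and \(b\):

```python
pmf = {0: 0.2, 1: 0.5, 2: 0.3}
a, b = 3, 4  # arbitrary constants

E = lambda f: sum(f(x) * p for x, p in pmf.items())
mu = E(lambda x: x)
var = E(lambda x: (x - mu) ** 2)

mu_y = E(lambda x: a * x + b)                 # E[aX + b] computed directly
var_y = E(lambda x: (a * x + b - mu_y) ** 2)  # Var(aX + b) computed directly

assert abs(mu_y - (a * mu + b)) < 1e-12   # E[aX + b] = aE[X] + b
assert abs(var_y - a ** 2 * var) < 1e-12  # Var(aX + b) = a^2 Var(X): b drops out
```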

Probability Fundamentals

Sample Space and Events

  • Sample space \(S\) (or \(\Omega\)): Set of all possible outcomes
  • Event \(A\): Any subset of the sample space
  • Classical probability (equally likely outcomes): \(P(A) = \frac{|A|}{|S|}\)

Set Operations on Events

| Operation | Notation | Meaning |
|---|---|---|
| Union | \(A \cup B\) | A or B (or both) |
| Intersection | \(A \cap B\) | A and B |
| Complement | \(A'\) or \(\bar{A}\) | Not A |
| Set difference | \(A \setminus B = A \cap B'\) | A but not B |

Basic Probability Rules

| Rule | Formula |
|---|---|
| Complement | \(P(A') = 1 - P(A)\) |
| Addition | \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) |
| Multiplication | \(P(A \cap B) = P(A) \cdot P(B \mid A)\) |

For mutually exclusive events: \(P(A \cap B) = 0\) and \(P(A \cup B) = P(A) + P(B)\)

For independent events: \(P(A \cap B) = P(A) \cdot P(B)\)

Mutually Exclusive vs. Independent

| Property | Mutually Exclusive | Independent |
|---|---|---|
| \(P(A \cap B)\) | \(= 0\) | \(= P(A) \cdot P(B)\) |
| Can occur together? | No | Yes |
| A occurred tells us… | B did not occur | Nothing about B |

Warning: Do Not Confuse These Two Concepts

If \(P(A) > 0\) and \(P(B) > 0\), then mutually exclusive events cannot be independent. Mutually exclusive means they never happen together; independent means they do not affect each other.

Combinatorics

Factorial

\[n! = n \times (n-1) \times (n-2) \times \ldots \times 2 \times 1\]

Special cases: \(0! = 1\) and \(1! = 1\)

Counting Methods Summary

| Method | Formula | When to Use |
|---|---|---|
| Counting Principle | \(n_1 \times n_2 \times \ldots \times n_k\) | Sequential independent choices |
| Permutation | \(P(n,r) = \frac{n!}{(n-r)!}\) | Order matters, without replacement |
| Combination | \(\binom{n}{r} = \frac{n!}{r!(n-r)!}\) | Order does not matter |

Tip: Permutation vs. Combination Decision

Ask: Does order matter?

  • Yes (rankings, codes, arrangements) \(\to\) Permutation
  • No (committees, teams, selections) \(\to\) Combination

Key relationship: \(\binom{n}{r} = \frac{P(n,r)}{r!}\)
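Both counts (and the key relationship between them) are available in the standard library; the numbers are illustrative:

```python
import math

n, r = 10, 3
perms = math.perm(n, r)  # ordered selections: 10 * 9 * 8
combs = math.comb(n, r)  # unordered selections

# C(n, r) = P(n, r) / r!
assert combs == perms // math.factorial(r)
```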

Conditional Probability

Definition

\[P(A|B) = \frac{P(A \cap B)}{P(B)} \quad \text{where } P(B) > 0\]

Read as “probability of A given B.” The condition restricts the sample space to B.

Multiplication Rule

\[P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)\]

Extended: \(P(A \cap B \cap C) = P(A) \cdot P(B|A) \cdot P(C|A \cap B)\)

Law of Total Probability

If \(B_1, B_2, \ldots, B_n\) partition the sample space (mutually exclusive and exhaustive):

\[P(A) = \sum_{i=1}^{n} P(A|B_i) \cdot P(B_i)\]

For two partitions:

\[P(A) = P(A|B) \cdot P(B) + P(A|B') \cdot P(B')\]
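A minimal numerical instance of the two-partition case; all probabilities are made up:

```python
p_B = 0.3             # P(B)
p_A_given_B = 0.8     # P(A|B)
p_A_given_notB = 0.2  # P(A|B')

# Law of total probability: weight each conditional by its branch probability.
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)  # 0.8*0.3 + 0.2*0.7 = 0.38
```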

Tree Diagrams

Tree diagrams visualize the multiplication rule for sequential events:

  • Branches represent outcomes at each stage
  • Branch labels are conditional probabilities
  • Multiply along a path to get joint probability
  • Add across paths to get union probability
  • All final path probabilities must sum to 1

Tip: Tree Diagram Strategy

For problems with sequential events or “with/without replacement,” draw a tree diagram first. Label every branch with its probability, then read joint probabilities by multiplying along paths. This approach prevents errors with conditional probabilities.

Bayes’ Theorem

Formula

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\]

Expanded Form

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|A') \cdot P(A')}\]

Medical Testing / Classification Terminology

| Metric | Definition | Formula |
|---|---|---|
| True Positive Rate (Sensitivity) | Correctly detecting condition | \(P(+ \mid D)\) |
| True Negative Rate (Specificity) | Correctly ruling out condition | \(P(- \mid D')\) |
| Base Rate (Prevalence) | Proportion with condition | \(P(D)\) |
| PPV (Positive Predictive Value) | Condition given positive test | \(P(D \mid +)\) |
| NPV (Negative Predictive Value) | No condition given negative test | \(P(D' \mid -)\) |

Derived rates:

  • False positive rate: \(P(+|D') = 1 - \text{Specificity}\)
  • False negative rate: \(P(-|D) = 1 - \text{Sensitivity}\)

PPV and NPV Formulas

\[\text{PPV} = \frac{\text{Sensitivity} \times \text{Prevalence}}{\text{Sensitivity} \times \text{Prevalence} + (1 - \text{Specificity}) \times (1 - \text{Prevalence})}\]

\[\text{NPV} = \frac{\text{Specificity} \times (1 - \text{Prevalence})}{\text{Specificity} \times (1 - \text{Prevalence}) + (1 - \text{Sensitivity}) \times \text{Prevalence}}\]

Warning: The Base Rate Effect

Low prevalence leads to low PPV, even with high sensitivity and specificity. When the condition is rare, most positive results are false positives because the large healthy population generates more false positives than the small sick population generates true positives.

Tip: Contingency Table Method for Bayes

For complex Bayes problems, create a hypothetical population (e.g., 10,000 people), fill in a 2x2 table using the given probabilities, then read answers directly from counts. This avoids formula errors and is easy to verify.
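A sketch of this hypothetical-population method; the prevalence, sensitivity, and specificity values are invented, and the final assertion checks the counts against the PPV formula:

```python
# Hypothetical population of 10,000 people; fill in the 2x2 table as counts.
population = 10_000
prevalence, sensitivity, specificity = 0.01, 0.90, 0.95

diseased = population * prevalence       # 100 people with the condition
healthy = population - diseased          # 9,900 without it

true_pos = diseased * sensitivity        # 90 correct positives
false_pos = healthy * (1 - specificity)  # 495 false positives

# Read PPV straight off the counts: P(D | +).
ppv = true_pos / (true_pos + false_pos)
print(round(ppv, 3))  # low despite a good test: the base rate effect

# Same answer as the PPV formula from the section above.
formula = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)
assert abs(ppv - formula) < 1e-9
```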

Contingency Tables

Reading a 2x2 Table

| | \(B\) | \(B'\) | Total |
|---|---|---|---|
| \(A\) | \(a\) | \(b\) | \(a+b\) |
| \(A'\) | \(c\) | \(d\) | \(c+d\) |
| Total | \(a+c\) | \(b+d\) | \(n\) |

Probability Types from Tables

| Probability | Formula | Where to Look |
|---|---|---|
| Marginal \(P(A)\) | \(\frac{a+b}{n}\) | Row total / Grand total |
| Marginal \(P(B)\) | \(\frac{a+c}{n}\) | Column total / Grand total |
| Joint \(P(A \cap B)\) | \(\frac{a}{n}\) | Cell / Grand total |
| Conditional \(P(A \mid B)\) | \(\frac{a}{a+c}\) | Cell / Column total |
| Conditional \(P(B \mid A)\) | \(\frac{a}{a+b}\) | Cell / Row total |

Independence Test in Tables

Events are independent if for every cell:

\[P(A \cap B) = P(A) \times P(B)\]

Or equivalently, the observed cell count equals the expected count:

\[\text{Expected} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\]

Warning: Denominator Mistakes in Contingency Tables

The most common exam error is using the wrong denominator for conditional probabilities. For \(P(A|B)\), the denominator is the column total for \(B\), not the grand total. Always state the condition in words before choosing the denominator.

Binomial Distribution

Four Conditions

  1. Fixed number of trials \(n\)
  2. Two outcomes: Success (probability \(p\)) or Failure (probability \(1-p\))
  3. Independent trials
  4. Constant probability \(p\) for each trial

Probability Mass Function

\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]

Interpretation: \(\binom{n}{k}\) counts the arrangements, \(p^k\) is the probability of \(k\) successes, \((1-p)^{n-k}\) is the probability of \(n-k\) failures.

Parameters

| Measure | Formula |
|---|---|
| Expected Value | \(\mu = E[X] = np\) |
| Variance | \(\sigma^2 = np(1-p)\) |
| Standard Deviation | \(\sigma = \sqrt{np(1-p)}\) |

Common Probability Calculations

| Question | Calculation |
|---|---|
| Exactly \(k\) | \(P(X = k)\) |
| At most \(k\) | \(P(X \leq k) = \sum_{i=0}^{k} P(X=i)\) |
| At least \(k\) | \(P(X \geq k) = 1 - P(X \leq k-1)\) |
| Between \(a\) and \(b\) (inclusive) | \(\sum_{i=a}^{b} P(X=i)\) |

Tip: Complement Strategy for “At Least” Problems

For \(P(X \geq k)\), use the complement: \(1 - P(X \leq k-1)\). This is especially efficient for “at least one”: \(P(X \geq 1) = 1 - P(X = 0) = 1 - (1-p)^n\).
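A from-scratch sketch of the binomial PMF, CDF, and the complement strategy, using only `math.comb`; the parameters \(n = 10\), \(p = 0.3\) are arbitrary:

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k): sum the PMF from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 0.3
p_at_least_2 = 1 - binom_cdf(1, n, p)  # complement: 1 - P(X <= 1)
p_at_least_1 = 1 - (1 - p) ** n        # shortcut: 1 - P(X = 0)

assert abs(sum(binom_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-12
assert abs(p_at_least_1 - (1 - binom_cdf(0, n, p))) < 1e-12
```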

Hypergeometric Distribution

When to Use

Use the hypergeometric distribution when sampling without replacement from a finite population with two types of items. This violates the independence assumption of the binomial.

Formula

\[P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}\]

Where:

  • \(N\) = population size
  • \(K\) = number of “success” items in the population
  • \(n\) = sample size (drawn without replacement)
  • \(k\) = number of successes in the sample

Binomial vs. Hypergeometric

| Feature | Binomial | Hypergeometric |
|---|---|---|
| Sampling | With replacement / large population | Without replacement / finite population |
| Independence | Yes | No |
| Typical wording | “in \(n\) trials” | “from a lot of \(N\) items, draw \(n\)” |
| Formula | \(\binom{n}{k}p^k(1-p)^{n-k}\) | \(\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\) |

Note: When Are They Approximately Equal?

If the population is very large relative to the sample (\(n / N < 0.05\)), the hypergeometric distribution is well approximated by the binomial with \(p = K/N\). For small populations or large samples, you must use hypergeometric.
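A sketch of the hypergeometric PMF via `math.comb`, plus a numerical check of the binomial approximation; the lot sizes are invented:

```python
import math

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement from N items, K successes."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

# Small population: draws are clearly dependent, binomial is a poor model.
N, K, n, k = 20, 5, 4, 2
exact = hypergeom_pmf(k, N, K, n)

# Large population relative to the sample (n/N < 0.05): binomial with p = K/N is close.
N_big, K_big = 10_000, 2_500
p = K_big / N_big
approx = math.comb(n, k) * p**k * (1 - p) ** (n - k)
close = hypergeom_pmf(k, N_big, K_big, n)

assert abs(close - approx) < 1e-3
```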

Geometric Distribution

Definition

Probability that the first success occurs on trial \(n\):

\[P(X = n) = (1-p)^{n-1} \cdot p\]

Key Formulas

| Formula | Use Case |
|---|---|
| \(P(X = n) = (1-p)^{n-1} p\) | First success on trial \(n\) |
| \(P(X \leq n) = 1 - (1-p)^n\) | At least one success within \(n\) trials |
| \(P(X > n) = (1-p)^n\) | No success in first \(n\) trials |
| \(E[X] = \frac{1}{p}\) | Expected number of trials until first success |

Solving for Minimum \(n\)

To find the minimum number of trials so that \(P(\text{at least one success}) \geq c\):

\[1 - (1-p)^n \geq c \implies (1-p)^n \leq 1 - c\]

\[n \geq \frac{\ln(1-c)}{\ln(1-p)}\]

Important: Always Round Up

Since trials are discrete, always round \(n\) up to the next integer. For example, if you compute \(n \geq 28.43\), then \(n = 29\).
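The derivation above in code, with illustrative values \(p = 0.1\) and \(c = 0.95\) (which happen to reproduce the 28.43-to-29 example):

```python
import math

p, c = 0.1, 0.95  # success probability per trial, required confidence

# Solve 1 - (1-p)^n >= c; dividing by log(1-p) < 0 flips the inequality.
n_real = math.log(1 - c) / math.log(1 - p)
n = math.ceil(n_real)  # trials are discrete: always round up

print(n_real, n)  # roughly 28.43 -> 29

assert 1 - (1 - p) ** n >= c        # n trials suffice
assert 1 - (1 - p) ** (n - 1) < c   # n - 1 trials do not
```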

Memoryless Property

\[P(X > m + n \mid X > m) = P(X > n)\]

If no success has occurred yet, the process “restarts” probabilistically. Past failures do not affect future success probabilities.

Normal Distribution and z-Scores

The 68-95-99.7 Rule

For normally distributed data \(X \sim N(\mu, \sigma)\):

  • 68% of data falls within \(\mu \pm 1\sigma\)
  • 95% of data falls within \(\mu \pm 2\sigma\)
  • 99.7% of data falls within \(\mu \pm 3\sigma\)

Standardization (z-Score)

Any normal distribution can be converted to the standard normal \(Z \sim N(0, 1)\):

\[z = \frac{X - \mu}{\sigma}\]

Typical Probability Conversions

| Question | z-Form |
|---|---|
| \(P(X \leq a)\) | \(P\!\left(Z \leq \frac{a - \mu}{\sigma}\right)\) |
| \(P(X \geq a)\) | \(1 - P\!\left(Z \leq \frac{a - \mu}{\sigma}\right)\) |
| \(P(a \leq X \leq b)\) | \(P\!\left(Z \leq \frac{b - \mu}{\sigma}\right) - P\!\left(Z \leq \frac{a - \mu}{\sigma}\right)\) |

Reverse Question (Finding the Quantile)

Given a target probability \(P(X \leq x) = c\), find the corresponding \(x\):

\[x = \mu + z_c \cdot \sigma\]

where \(z_c\) is the standard normal quantile for probability \(c\).
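Both directions (probability from a value, and value from a probability) are covered by `statistics.NormalDist`; the exam-score distribution \(N(70, 10)\) here is made up:

```python
from statistics import NormalDist

mu, sigma = 70, 10
X = NormalDist(mu, sigma)

z = (85 - mu) / sigma              # z-score of 85: 1.5 SDs above the mean
p_below = X.cdf(85)                # P(X <= 85)
p_between = X.cdf(85) - X.cdf(60)  # P(60 <= X <= 85)

# Reverse question: the x with P(X <= x) = 0.90 (the 90th percentile).
x90 = X.inv_cdf(0.90)
assert abs(x90 - (mu + NormalDist().inv_cdf(0.90) * sigma)) < 1e-9
```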

Tip: z-Score Interpretation

A z-score tells you how many standard deviations a value is from the mean. A z-score of 1.5 means the value is 1.5 standard deviations above the mean. Negative z-scores indicate values below the mean.

Quick Reference: Formula Collection

| Topic | Formula |
|---|---|
| Mean | \(\bar{x} = \frac{\sum x_i}{n}\) |
| Sample Variance | \(s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\) |
| IQR | \(Q3 - Q1\) |
| Complement Rule | \(P(A') = 1 - P(A)\) |
| Addition Rule | \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) |
| Conditional Prob. | \(P(A \mid B) = \frac{P(A \cap B)}{P(B)}\) |
| Bayes’ Theorem | \(P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}\) |
| Permutation | \(P(n,r) = \frac{n!}{(n-r)!}\) |
| Combination | \(\binom{n}{r} = \frac{n!}{r!(n-r)!}\) |
| Binomial PMF | \(P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}\) |
| Binomial Mean/Var | \(\mu = np\), \(\sigma^2 = np(1-p)\) |
| Hypergeometric | \(P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\) |
| Geometric PMF | \(P(X=n) = (1-p)^{n-1}p\) |
| z-Score | \(z = \frac{X - \mu}{\sigma}\) |
| Expected Value | \(E[X] = \sum x_i P(X = x_i)\) |
| Linearity of \(E\) | \(E[aX+b] = aE[X]+b\) |
| Variance scaling | \(\text{Var}(aX+b) = a^2 \text{Var}(X)\) |

Problem-Solving Strategies

For Probability Problems

  1. Define events clearly with symbols
  2. Identify what type of probability is needed (joint, conditional, marginal)
  3. Choose the right rule (addition, multiplication, complement, Bayes)
  4. Draw a diagram if helpful (Venn diagram, tree diagram, contingency table)
  5. Check that your answer is between 0 and 1

For Bayes Problems

  1. Identify what you need (usually PPV or reversed conditional)
  2. Extract given information: base rate, true positive rate, true negative rate
  3. Calculate \(P(B)\) using the law of total probability
  4. Apply Bayes’ theorem
  5. Verify with a contingency table using a hypothetical population

For Binomial / Distribution Problems

  1. Verify conditions (fixed \(n\), two outcomes, independence, constant \(p\))
  2. Check if sampling is with or without replacement (binomial vs. hypergeometric)
  3. Identify \(n\), \(p\), and \(k\)
  4. Translate wording: “exactly,” “at most,” “at least”
  5. Use the complement rule when helpful

For Geometric / Minimum-n Problems

  1. Set up the inequality: \(1 - (1-p)^n \geq c\)
  2. Take logarithms on both sides
  3. Remember to flip the inequality when dividing by a negative number
  4. Round up to the next integer

Warning: Common Mistakes to Avoid

  • Confusing \(P(A|B)\) with \(P(B|A)\) – these have different denominators
  • Forgetting to use the complement rule for “at least” problems
  • Using permutations when combinations are needed (or vice versa)
  • Assuming independence without verifying
  • Mixing up sensitivity (\(P(+|D)\)) with PPV (\(P(D|+)\))
  • Using the wrong denominator for conditional probability in contingency tables
  • Rounding \(n\) down instead of up in minimum-\(n\) problems
  • Applying binomial when sampling is without replacement from a small population
  • Confusing PMF (\(P(X = k)\)) with CDF (\(P(X \leq k)\)) based on exam wording