
Section 07: Probability & Statistics
Test your understanding of Integration
Find \(\int x \cdot e^x \, dx\) using integration by parts.
Evaluate \(\int_0^1 (2x + 1) \, dx\)
A company’s marginal profit is \(MP(x) = 60 - 2x\). Find the profit function if \(P(0) = -100\).
Find the area between \(y = x\) and \(y = x^2\) from \(x = 0\) to \(x = 1\).
Bring up anything unclear from integration and applications.
Today we switch from calculus to data description, but we still need the same habits: clear setup, careful notation, and interpretation in context.
Probability accounts for approximately 25% of the Feststellungsprüfung!
This is foundational material - brief coverage to prepare for probability!
How do we summarize a data set with a single number?
Three Measures of Center
I’ll assume you already know what these are, but let’s work through an example.
Monthly sales (in thousands €) for a store:
\(12, 15, 14, 18, 15, 22, 15, 16, 14, 19\)
Question: What are the mean, median, and mode of the data?
Easy, right?
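To check the answer, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative; the data are the monthly sales from the example.

```python
import statistics

# Monthly sales in thousands of euros, copied from the example above.
sales = [12, 15, 14, 18, 15, 22, 15, 16, 14, 19]

mean_sales = statistics.mean(sales)      # sum of all values divided by their count
median_sales = statistics.median(sales)  # average of the two middle values of the sorted data
mode_sales = statistics.mode(sales)      # most frequent value

print(f"mean = {mean_sales}, median = {median_sales}, mode = {mode_sales}")
```

The mean (16) sits slightly above the median and mode (both 15) because the single large value 22 pulls it upward.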

Mean: Best for symmetric data without outliers; Median: Best for skewed data or data with outliers; Mode: Best for categorical data
Two datasets can have the same mean but different spreads:
Question: Do you know any measures of spread?
Simplest measure of spread: the range, \(\text{Range} = \text{Max} - \text{Min}\).
Range only uses two values - sensitive to outliers!
Outliers are extreme values that are far from other observations.
Variance measures the spread around the mean.
Population variance, used to compute the population standard deviation:
\[\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}\]
Sample variance, used to compute the standard deviation of a sample:
\[s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}\]
The standard deviation is the square root of the variance.
\[\sigma = \sqrt{\sigma^2} \quad \text{or} \quad s = \sqrt{s^2}\]
Data: \(4, 8, 6, 5, 3, 2, 8, 9, 2, 5\) (n = 10)
Step 1: Calculate mean \[\bar{x} = \frac{4+8+6+5+3+2+8+9+2+5}{10} = \frac{52}{10} = 5.2\]
Step 2: Sum the squared deviations \[(4-5.2)^2 + (8-5.2)^2 + \dots = 1.44 + 7.84 + \dots = 57.6\]
Step 3: Variance and SD \[s^2 = 57.6/9 = 6.4 \quad \Rightarrow \quad s = \sqrt{6.4} \approx 2.53\]
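The three steps can be reproduced in a few lines of Python. This is a sketch: the helper `sample_variance` is a hypothetical name written for this example, and it follows the \(n - 1\) formula above.

```python
import math

data = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]

def sample_variance(values):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n                                   # Step 1: mean
    squared_deviations = [(x - mean) ** 2 for x in values]   # Step 2: squared deviations
    return sum(squared_deviations) / (n - 1)                 # Step 3: divide by n - 1

s2 = sample_variance(data)
s = math.sqrt(s2)  # standard deviation is the square root of the variance

print(f"s^2 = {s2:.2f}, s = {s:.2f}")
```

Dividing by \(N\) instead of \(n - 1\) in the last step would give the population variance.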
Raw data: Test scores of 30 students
\(65, 72, 78, 81, 65, 73, 85, 92, 78, 72,\)
\(65, 88, 91, 73, 78, 82, 76, 72, 85, 78,\)
\(65, 73, 82, 79, 88, 73, 78, 85, 92, 78\)
Question: How can we summarize this data effectively?
We can use a frequency table or a histogram!
A frequency table organizes data into groups (bins):
| Score Range | Frequency | Relative Frequency |
|---|---|---|
| 60-69 | 4 | 4/30 = 13.3% |
| 70-79 | 15 | 15/30 = 50.0% |
| 80-89 | 8 | 8/30 = 26.7% |
| 90-99 | 3 | 3/30 = 10.0% |
| Total | 30 | 100% |
Relative frequency = Frequency / Total, which can be interpreted as an estimated probability!
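Counting by hand is error-prone, so the tallies are worth checking by machine. The sketch below assumes bins of width 10 starting at 60, matching the score ranges in the table, and uses the 30 raw scores listed above.

```python
from collections import Counter

scores = [
    65, 72, 78, 81, 65, 73, 85, 92, 78, 72,
    65, 88, 91, 73, 78, 82, 76, 72, 85, 78,
    65, 73, 82, 79, 88, 73, 78, 85, 92, 78,
]

# Map each score to the lower edge of its bin: 65 -> 60, 78 -> 70, and so on.
counts = Counter((score // 10) * 10 for score in scores)
total = len(scores)

for lower in sorted(counts):
    frequency = counts[lower]
    relative = frequency / total
    print(f"{lower}-{lower + 9}: frequency {frequency}, relative frequency {relative:.1%}")
```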

A histogram is a graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable.
Work individually
Quantiles divide sorted data into equal-sized groups:
| Quartile | Percentile | Meaning |
|---|---|---|
| Q1 | 25th | 25% of values fall below Q1 |
| Q2 | 50th | 50% of values fall below Q2 (= Median) |
| Q3 | 75th | 75% of values fall below Q3 |
Quartiles are the most commonly used quantiles in descriptive statistics!
The five-number summary (minimum, Q1, median, Q3, maximum) provides a concise overview of a dataset’s distribution.
Interquartile Range (IQR): \(\text{IQR} = Q3 - Q1\), contains the middle 50% of the data!
A box-plot can be used to visualize the five-number summary! Let’s take a look at one.

Standardized way of displaying the distribution of data based on the five-number summary!
Outliers are values that fall outside the fences \(Q1 - 1.5 \cdot \text{IQR}\) and \(Q3 + 1.5 \cdot \text{IQR}\).
Any value below 35 or above 115 would be an outlier.

If the data are approximately normally distributed, then about 68% of the data fall within \(\mu \pm 1\sigma\); about 95% fall within \(\mu \pm 2\sigma\); about 99.7% fall within \(\mu \pm 3\sigma\).
Example: Test scores have \(\mu = 75\) and \(\sigma = 10\)
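For these values the three intervals follow directly from the rule. A small sketch, where `empirical_intervals` is a hypothetical helper written for this example:

```python
def empirical_intervals(mu, sigma):
    """Return the intervals mu ± k*sigma for k = 1, 2, 3, keyed by approximate coverage."""
    labels = {1: "68%", 2: "95%", 3: "99.7%"}
    return {labels[k]: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}

intervals = empirical_intervals(75, 10)

for coverage, (lower, upper) in intervals.items():
    print(f"about {coverage} of scores lie between {lower} and {upper}")
```

So roughly 68% of scores lie between 65 and 85, 95% between 55 and 95, and 99.7% between 45 and 105, provided the normality assumption holds.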
Based on the assumption that data is normally distributed, we can make informed guesses about certain things. We’ll see later what this allows us to do in the context of a business application.
A factory measures the diameter of manufactured bolts (in mm):
\[10.2, \; 10.1, \; 10.0, \; 10.3, \; 9.9, \; 10.1, \; 10.0, \; 10.2, \; 10.1, \; 10.0\]
Target: 10.0 mm with tolerance ±0.3 mm (acceptable: 9.7 – 10.3 mm)
Task: Compute the mean, median, and mode.
The mean (10.09) exceeds the target (10.0), indicating a slight upward bias in production.
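A quick check of the three measures with Python's `statistics` module (a sketch; variable names are illustrative). `multimode` is used instead of `mode` because 10.0 and 10.1 each occur three times, so the data have two modes.

```python
import statistics

# Bolt diameters in mm, copied from the case study.
diameters = [10.2, 10.1, 10.0, 10.3, 9.9, 10.1, 10.0, 10.2, 10.1, 10.0]

mean_d = statistics.mean(diameters)
median_d = statistics.median(diameters)
modes_d = statistics.multimode(diameters)  # list of all most frequent values

print(f"mean = {mean_d:.2f} mm, median = {median_d:.1f} mm, modes = {sorted(modes_d)}")
```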
Same data: \(10.2, \; 10.1, \; 10.0, \; 10.3, \; 9.9, \; 10.1, \; 10.0, \; 10.2, \; 10.1, \; 10.0\)
Task: Compute range, sample variance, and sample standard deviation.
The standard deviation (0.12 mm) is small relative to the tolerance (±0.3 mm) and the target (10.0 mm). This suggests the process has low variability!
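The spread measures can be verified the same way. In this sketch, `statistics.variance` and `statistics.stdev` are the sample versions, which divide by \(n - 1\).

```python
import statistics

diameters = [10.2, 10.1, 10.0, 10.3, 9.9, 10.1, 10.0, 10.2, 10.1, 10.0]

data_range = max(diameters) - min(diameters)  # largest minus smallest value
s2 = statistics.variance(diameters)           # sample variance (n - 1 in the denominator)
s = statistics.stdev(diameters)               # sample standard deviation

print(f"range = {data_range:.1f} mm, s^2 = {s2:.4f} mm^2, s = {s:.2f} mm")
```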
Task: Sort the data and find the five-number summary and IQR.
Sorted: \(9.9, \; 10.0, \; 10.0, \; 10.0, \; 10.1, \; 10.1, \; 10.1, \; 10.2, \; 10.2, \; 10.3\)
| Measure | Value |
|---|---|
| Min | 9.9 |
| Q1 | 10.0 |
| Median | 10.1 |
| Q3 | 10.2 |
| Max | 10.3 |
\(\text{IQR} = Q3 - Q1 = 10.2 - 10.0 = 0.2\) mm
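Quartile conventions differ between textbooks and software. The sketch below assumes the "median of each half" convention, which reproduces the table; `five_number_summary` is a hypothetical helper, not a library function.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), taking Q1 and Q3 as medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For an odd number of values, the middle value belongs to neither half.
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (
        ordered[0],
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
        ordered[-1],
    )

diameters = [10.2, 10.1, 10.0, 10.3, 9.9, 10.1, 10.0, 10.2, 10.1, 10.0]
minimum, q1, median, q3, maximum = five_number_summary(diameters)
iqr = q3 - q1

print(f"min={minimum}, Q1={q1}, median={median}, Q3={q3}, max={maximum}, IQR={iqr:.1f}")
```

Other conventions (for example, interpolation between neighbouring values) can give slightly different quartiles on small samples.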
Task: Compute the fences and check for outliers.
The tolerance is \([9.7, \; 10.3]\) mm. The upper fence lies above the upper tolerance limit, and no value falls outside the fences or the tolerance, so there are no outliers to worry about.
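A minimal sketch of the outlier check, assuming the common \(1.5 \cdot \text{IQR}\) rule for the fences and taking Q1 = 10.0 and Q3 = 10.2 from the five-number summary:

```python
diameters = [10.2, 10.1, 10.0, 10.3, 9.9, 10.1, 10.0, 10.2, 10.1, 10.0]

q1, q3 = 10.0, 10.2  # quartiles from the five-number summary
iqr = q3 - q1

# Assumption: fences are placed 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in diameters if x < lower_fence or x > upper_fence]

print(f"fences: [{lower_fence:.1f}, {upper_fence:.1f}] mm, outliers: {outliers}")
```

Under this rule the fences are about 9.7 mm and 10.5 mm, and the list of outliers is empty.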
Task: Assuming normality with \(\bar{x} = 10.09\) and \(s = 0.12\), compute the 68-95-99.7 intervals. Do all fall within the tolerance \([9.7, \; 10.3]\)?
| Rule | Interval | Within tolerance? |
|---|---|---|
| 68% | \(10.09 \pm 0.12 = [9.97, \; 10.21]\) | Yes |
| 95% | \(10.09 \pm 0.24 = [9.85, \; 10.33]\) | No, upper end exceeds 10.3 |
| 99.7% | \(10.09 \pm 0.36 = [9.73, \; 10.45]\) | No, upper end clearly outside |
About 95% of bolts are expected within tolerance, but the upper tail extends beyond the 10.3 mm limit due to the upward-shifted mean.
Keep in mind that we only have \(n = 10\) measurements. With such a small sample, the 68-95-99.7 rule is only a rough approximation. Larger samples give more reliable estimates of \(\bar{x}\) and \(s\).
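The interval check, and the expected share of bolts within tolerance, can be sketched with `statistics.NormalDist` (Python 3.8+). The normal model with \(\bar{x} = 10.09\) and \(s = 0.12\) is the same assumption as in the task, not a verified property of the process.

```python
from statistics import NormalDist

mean, sd = 10.09, 0.12   # sample mean and standard deviation from the case study
low, high = 9.7, 10.3    # tolerance limits in mm

# Check whether each mean ± k*sd interval lies entirely inside the tolerance.
within = {}
for k, label in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    lower, upper = mean - k * sd, mean + k * sd
    within[label] = low <= lower and upper <= high
    print(f"{label}: [{lower:.2f}, {upper:.2f}] within tolerance: {within[label]}")

# Expected share of bolts inside the tolerance under the assumed normal model.
bolts = NormalDist(mu=mean, sigma=sd)
share_within = bolts.cdf(high) - bolts.cdf(low)
print(f"expected share within tolerance: {share_within:.1%}")
```

Under this model almost all out-of-tolerance bolts are too large rather than too small, which fits the upward-shifted mean.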
Task: Should the factory manager be concerned?
Summary of findings:
Recommendation: Recalibrate the machine to shift the mean closer to 10.0 mm. The low standard deviation means the process only needs recentering, not a reduction in variability.
Work in pairs for 5 minutes
Problem 1: Customer wait times (minutes): \(3, 5, 2, 8, 4, 6, 3, 7, 2, 10\)
Problem 2: Create a frequency table for exam scores:
Work individually, then compare
A shop tracks daily visitors and purchases:
Think individually, then work in groups of 3-4 and share
A retail manager reports weekly sales (in thousand euro):
\[41, 44, 43, 45, 46, 47, 44, 90\]
Key connection:
\[\text{Relative Frequency} \approx \text{Probability}\]
Example: If 30% of customers wait more than 5 minutes, then the probability that a randomly selected customer waits more than 5 minutes is approximately 0.30.
This is the frequentist interpretation of probability, which states that probability equals long-run relative frequency!
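The long-run idea can be illustrated with a small simulation. This is a sketch under stated assumptions: the true probability of a long wait is set to 0.30 to match the example, customers are treated as independent, and the seed is fixed only to make the run repeatable.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
true_probability = 0.30  # assumed probability that a customer waits more than 5 minutes

def relative_frequency(n_customers):
    """Simulate n_customers independent customers and return the share with a long wait."""
    long_waits = sum(rng.random() < true_probability for _ in range(n_customers))
    return long_waits / n_customers

for n in (10, 100, 10_000, 100_000):
    print(f"n = {n:>6}: relative frequency = {relative_frequency(n):.3f}")
```

With few customers the relative frequency can be far from 0.30; as the number of customers grows, it tends to settle close to it.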
Work individually
Homework
Complete Tasks 07-01:
Session 07-01 - Descriptive Statistics Essentials | Dr. Nikolai Heinrichs & Dr. Tobias Vlćek