Tasks 07-01 - Descriptive Statistics

Section 07: Probability & Statistics

Problem 1: Measures of Central Tendency (x)

For the dataset: \(15, 22, 18, 25, 22, 19, 22, 28, 17, 22\)

Calculate the mean.
Find the median.
Find the mode.
Which measure best represents the “typical” value? Why?

Solution

Mean: \(\bar{x} = \frac{15+22+18+25+22+19+22+28+17+22}{10} = \frac{210}{10} = 21\)
Sorted: \(15, 17, 18, 19, 22, 22, 22, 22, 25, 28\). With \(n = 10\), the median is the average of the 5th and 6th values: Median = \(\frac{22 + 22}{2} = 22\)
Mode = 22 (appears 4 times)
The mode (22) best represents the typical value: it is the most frequent value and coincides with the median. The mean (21) is pulled slightly below it by the low values 15 and 17.
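The three measures can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not part of the worksheet.

```python
import statistics

data = [15, 22, 18, 25, 22, 19, 22, 28, 17, 22]

mean = statistics.mean(data)      # sum of values divided by their count
median = statistics.median(data)  # averages the two middle values when n is even
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
```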

Problem 2: Variance and Standard Deviation (x)

For the dataset: \(8, 12, 15, 11, 14\)

Calculate the mean.
Calculate the sample variance.
Calculate the sample standard deviation.

Solution

Mean: \(\bar{x} = \frac{8+12+15+11+14}{5} = \frac{60}{5} = 12\)
Variance: Squared deviations from the mean: \((8-12)^2 = 16\), \((12-12)^2 = 0\), \((15-12)^2 = 9\), \((11-12)^2 = 1\), \((14-12)^2 = 4\)

Sum of squared deviations: \(16 + 0 + 9 + 1 + 4 = 30\)

Sample variance: \(s^2 = \frac{30}{5-1} = \frac{30}{4} = 7.5\)
Standard deviation: \(s = \sqrt{7.5} \approx 2.74\)
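The same steps can be sketched in Python, once by hand and once with the standard `statistics` module, to show that both divide by \(n - 1\). The variable names are assumptions made for this illustration.

```python
import math
import statistics

data = [8, 12, 15, 11, 14]

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance divides by n - 1 (Bessel's correction), not n.
sample_variance = sum(squared_deviations) / (len(data) - 1)
sample_sd = math.sqrt(sample_variance)

# The library functions use the same n - 1 convention.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(sample_sd, statistics.stdev(data))

print(sample_variance, round(sample_sd, 2))
```

Dividing by \(n\) instead (`statistics.pvariance`) would treat the five values as a whole population rather than a sample.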

Problem 3: Range and IQR (x)

For the dataset: \(42, 55, 63, 48, 71, 59, 45, 67, 52, 58, 61, 49\)

Find the range.
Find Q1 (first quartile).
Find Q3 (third quartile).
Calculate the interquartile range (IQR).

Solution

Sorted data: \(42, 45, 48, 49, 52, 55, 58, 59, 61, 63, 67, 71\)

Range = Max - Min = \(71 - 42 = 29\)
Lower half: \(42, 45, 48, 49, 52, 55\). Q1 = median of lower half = \(\frac{48 + 49}{2} = 48.5\)
Upper half: \(58, 59, 61, 63, 67, 71\). Q3 = median of upper half = \(\frac{61 + 63}{2} = 62\)
IQR = Q3 - Q1 = \(62 - 48.5 = 13.5\)
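A short Python sketch of the "median of each half" method used above. The helper `quartiles_by_halves` is a hypothetical name introduced here, and its handling of odd-sized datasets (dropping the middle value) is an assumption; library routines often use other quartile conventions and may return slightly different values.

```python
import statistics

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of values the middle value is left
    out of both halves. Other quartile conventions give other results.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])

data = [42, 55, 63, 48, 71, 59, 45, 67, 52, 58, 61, 49]
q1, q3 = quartiles_by_halves(data)
data_range = max(data) - min(data)
iqr = q3 - q1
print(data_range, q1, q3, iqr)
```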

Problem 4: Outlier Detection (xx)

For the dataset: \(25, 28, 30, 32, 27, 29, 31, 85, 26, 30\)

Calculate Q1, Q3, and IQR.
Determine the lower and upper fences for outliers.
Are there any outliers? If so, which value(s)?
Recalculate the mean with and without outliers.

Solution

Sorted: \(25, 26, 27, 28, 29, 30, 30, 31, 32, 85\)

Lower half: \(25, 26, 27, 28, 29\) → Q1 = 27. Upper half: \(30, 30, 31, 32, 85\) → Q3 = 31. IQR = \(31 - 27 = 4\)
Lower fence = Q1 - 1.5 × IQR = \(27 - 1.5(4) = 27 - 6 = 21\). Upper fence = Q3 + 1.5 × IQR = \(31 + 1.5(4) = 31 + 6 = 37\)
Values below 21 or above 37 are outliers, so 85 is the only outlier (\(85 > 37\)).
With outlier: \(\bar{x} = \frac{25+26+27+28+29+30+30+31+32+85}{10} = \frac{343}{10} = 34.3\)

Without outlier: \(\bar{x} = \frac{25+26+27+28+29+30+30+31+32}{9} = \frac{258}{9} = 28.67\)

The single outlier raises the mean by \(34.3 - 28.67 = 5.63\).
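The 1.5 × IQR rule can be sketched in a few lines of Python. Quartiles are taken as medians of the lower and upper halves, as in the solution; the variable names are illustrative.

```python
import statistics

data = [25, 28, 30, 32, 27, 29, 31, 85, 26, 30]
ordered = sorted(data)
half = len(ordered) // 2

# Quartiles as medians of the lower and upper halves, as in the solution.
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[-half:])
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
kept = [x for x in data if lower_fence <= x <= upper_fence]

mean_with = statistics.mean(data)
mean_without = statistics.mean(kept)
print(outliers, mean_with, round(mean_without, 2))
```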

Problem 5: Frequency Distribution (x)

Test scores for 20 students: \(65, 72, 78, 85, 91, 68, 74, 82, 88, 95, 71, 77, 83, 89, 73, 79, 84, 92, 76, 81\)

Create a frequency table using intervals: 65-74, 75-84, 85-94, 95-100
Calculate the relative frequency for each interval.
What percentage of students scored between 75 and 84?

Solution

Frequency table with relative frequencies (first two parts):

Score Range	Tally	Frequency	Relative Frequency
65-74	IIII I	6	6/20 = 30%
75-84	IIII III	8	8/20 = 40%
85-94	IIII	5	5/20 = 25%
95-100	I	1	1/20 = 5%
Total		20	100%

40% of students scored between 75 and 84.
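The table can be reproduced by counting how many scores fall into each interval. A minimal Python sketch, assuming inclusive interval bounds as in the task:

```python
scores = [65, 72, 78, 85, 91, 68, 74, 82, 88, 95,
          71, 77, 83, 89, 73, 79, 84, 92, 76, 81]
intervals = [(65, 74), (75, 84), (85, 94), (95, 100)]  # inclusive bounds

table = []
for low, high in intervals:
    frequency = sum(1 for s in scores if low <= s <= high)
    table.append((f"{low}-{high}", frequency, frequency / len(scores)))

for label, frequency, relative in table:
    print(f"{label}\t{frequency}\t{relative:.0%}")
```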

Problem 6: Comparing Datasets (xx)

Two sales teams’ weekly sales (in units):

Team A: \(45, 52, 48, 55, 50\)
Team B: \(30, 70, 45, 60, 45\)

Calculate the mean for each team.
Calculate the standard deviation for each team.
Which team is more consistent? Why?
Which team would you prefer to manage? Justify your answer.

Solution

Team A: \(\bar{x}_A = \frac{45+52+48+55+50}{5} = \frac{250}{5} = 50\). Team B: \(\bar{x}_B = \frac{30+70+45+60+45}{5} = \frac{250}{5} = 50\)
Team A: Squared deviations: \((45-50)^2=25\), \((52-50)^2=4\), \((48-50)^2=4\), \((55-50)^2=25\), \((50-50)^2=0\), so \(s_A^2 = \frac{25+4+4+25+0}{4} = \frac{58}{4} = 14.5\) and \(s_A = \sqrt{14.5} \approx 3.81\)

Team B: Squared deviations: \((30-50)^2=400\), \((70-50)^2=400\), \((45-50)^2=25\), \((60-50)^2=100\), \((45-50)^2=25\), so \(s_B^2 = \frac{400+400+25+100+25}{4} = \frac{950}{4} = 237.5\) and \(s_B = \sqrt{237.5} \approx 15.41\)
Team A is more consistent because its standard deviation (3.81) is much lower than Team B’s (15.41).
Answers may vary. Team A is more predictable and easier to plan around. Team B has higher highs but also lower lows - more variable performance.
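A compact Python sketch of the comparison, using the sample standard deviation from the `statistics` module. The dictionary layout and names are illustrative assumptions.

```python
import statistics

teams = {
    "A": [45, 52, 48, 55, 50],
    "B": [30, 70, 45, 60, 45],
}

summary = {
    name: (statistics.mean(sales), statistics.stdev(sales))
    for name, sales in teams.items()
}
for name, (mean, sd) in summary.items():
    print(f"Team {name}: mean={mean}, sd={sd:.2f}")

# The team with the smaller standard deviation is the more consistent one.
most_consistent = min(summary, key=lambda name: summary[name][1])
```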

Problem 7: Five-Number Summary (xx)

Monthly revenue data (in thousands Euro): \(120, 145, 132, 158, 175, 142, 138, 165, 155, 148, 162, 170\)

Find the five-number summary (Min, Q1, Median, Q3, Max).
Calculate the IQR.
Describe the shape of the distribution based on the five-number summary.

Solution

Sorted: \(120, 132, 138, 142, 145, 148, 155, 158, 162, 165, 170, 175\)

Five-Number Summary:
- Minimum: 120
- Q1: median of {120, 132, 138, 142, 145, 148} = \(\frac{138+142}{2} = 140\)
- Median: \(\frac{148+155}{2} = 151.5\)
- Q3: median of {155, 158, 162, 165, 170, 175} = \(\frac{162+165}{2} = 163.5\)
- Maximum: 175
IQR = Q3 - Q1 = \(163.5 - 140 = 23.5\)
Shape analysis:
- Distance from Min to Q1: \(140 - 120 = 20\)
- Distance from Q1 to Median: \(151.5 - 140 = 11.5\)
- Distance from Median to Q3: \(163.5 - 151.5 = 12\)
- Distance from Q3 to Max: \(175 - 163.5 = 11.5\)
The distribution is slightly left-skewed (longer left tail), as the distance from minimum to Q1 is larger than from Q3 to maximum.
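The five-number summary and the four gaps used in the shape analysis can be computed as follows. This is a sketch using the median-of-halves quartile method from the solution; names are illustrative.

```python
import statistics

revenue = [120, 145, 132, 158, 175, 142, 138, 165, 155, 148, 162, 170]
ordered = sorted(revenue)
half = len(ordered) // 2

five_numbers = (
    ordered[0],
    statistics.median(ordered[:half]),   # Q1: median of the lower half
    statistics.median(ordered),
    statistics.median(ordered[-half:]),  # Q3: median of the upper half
    ordered[-1],
)

# Widths of the four quarters: Min-Q1, Q1-Median, Median-Q3, Q3-Max.
gaps = [b - a for a, b in zip(five_numbers, five_numbers[1:])]
print(five_numbers, gaps)
```

A first gap that is clearly wider than the last one points to a longer left tail.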

Problem 8: Grouped Data (xxx)

Employee salaries (in thousands Euro) at a company are grouped:

Salary Range	Frequency
30-39	8
40-49	15
50-59	22
60-69	12
70-79	3

Estimate the mean salary using midpoints.
Find the modal class.
Estimate the median class.
Calculate the relative frequency for each class.

Solution

Range	Midpoint (m)	Freq (f)	f × m	Rel. Freq
30-39	34.5	8	276	8/60 = 13.3%
40-49	44.5	15	667.5	15/60 = 25%
50-59	54.5	22	1199	22/60 = 36.7%
60-69	64.5	12	774	12/60 = 20%
70-79	74.5	3	223.5	3/60 = 5%
Total		60	3140	100%

Estimated mean: \(\bar{x} = \frac{3140}{60} = 52.33\) thousand Euro
Modal class: 50-59 (highest frequency of 22)
Median position: \(\frac{60+1}{2} = 30.5\)th value. Cumulative frequencies: 8, 23, 45, 57, 60. The 30.5th value falls in the 50-59 class, the first class whose cumulative frequency (45) exceeds 30.5, so the median class is 50-59.
Relative frequencies shown in table above.
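The grouped-data estimates can be sketched in Python. Each class is represented by its midpoint, and the median class is found from cumulative frequencies; the tuple layout and names are assumptions made for this example.

```python
from itertools import accumulate

# (lower bound, upper bound, frequency) for each salary class
classes = [(30, 39, 8), (40, 49, 15), (50, 59, 22), (60, 69, 12), (70, 79, 3)]

frequencies = [f for _, _, f in classes]
n = sum(frequencies)

# Each class is represented by its midpoint, e.g. (30 + 39) / 2 = 34.5.
estimated_mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / n

modal_class = max(classes, key=lambda c: c[2])

# The median class is the first one whose cumulative frequency
# reaches the median position (n + 1) / 2.
cumulative = list(accumulate(frequencies))
median_position = (n + 1) / 2
median_class = next(c for c, total in zip(classes, cumulative)
                    if total >= median_position)

print(round(estimated_mean, 2), modal_class[:2], median_class[:2])
```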

Problem 9: Business Application (xx)

A quality control manager measures the diameter of manufactured bolts (in mm):

\(10.02, 9.98, 10.05, 9.97, 10.01, 10.03, 9.99, 10.02, 10.00, 9.96, 10.04, 10.01\)

Target diameter: 10.00 mm with tolerance ±0.05 mm

Calculate the mean diameter.
Calculate the standard deviation.
Are all bolts within specification?
If bolts outside tolerance are rejected, what is the reject rate?

Solution

Mean: \(\bar{x} = \frac{10.02+9.98+10.05+9.97+10.01+10.03+9.99+10.02+10.00+9.96+10.04+10.01}{12}\) \(= \frac{120.08}{12} \approx 10.007\) mm
Squared deviations from 10.007: \((0.013)^2, (-0.027)^2, (0.043)^2, (-0.037)^2, (0.003)^2, (0.023)^2, (-0.017)^2, (0.013)^2, (-0.007)^2, (-0.047)^2, (0.033)^2, (0.003)^2\)

Sum = \(0.000169 + 0.000729 + 0.001849 + 0.001369 + 0.000009 + 0.000529 + 0.000289 + 0.000169 + 0.000049 + 0.002209 + 0.001089 + 0.000009 = 0.008468\)

\(s^2 = \frac{0.008468}{11} \approx 0.00077\), so \(s = \sqrt{0.00077} \approx 0.028\) mm
Tolerance range: 9.95 to 10.05 mm. The smallest value is 9.96 mm and the largest is 10.05 mm, which lies exactly on the upper limit, so every value falls within 9.95-10.05 mm. Yes, all bolts are within specification.
Reject rate = 0% (all bolts pass)
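A hedged Python sketch of the specification check. The summary statistics use floats, while the tolerance test uses `decimal.Decimal` so that a reading exactly on the limit (10.05) is compared exactly; this split is a design choice for the example, not something the worksheet prescribes.

```python
from decimal import Decimal
import statistics

readings = ("10.02 9.98 10.05 9.97 10.01 10.03 "
            "9.99 10.02 10.00 9.96 10.04 10.01").split()

# Floats are fine for the summary statistics.
values = [float(r) for r in readings]
mean = statistics.mean(values)
sd = statistics.stdev(values)

# Decimal keeps the boundary case exact: in binary floating point,
# 10.05 - 10.00 does not come out as exactly 0.05, so a float check
# could wrongly reject a bolt that sits right on the tolerance limit.
target, tolerance = Decimal("10.00"), Decimal("0.05")
rejected = [r for r in readings if abs(Decimal(r) - target) > tolerance]
reject_rate = len(rejected) / len(readings)

print(round(mean, 3), round(sd, 3), rejected, f"{reject_rate:.0%}")
```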

Problem 10: Coefficient of Variation (xx)

Compare the variability of these two datasets using the coefficient of variation:

Dataset X (prices in Euro): \(50, 55, 45, 60, 40\)
Dataset Y (prices in cents): \(5000, 5500, 4500, 6000, 4000\)

Calculate mean and standard deviation for both datasets.
Calculate the coefficient of variation (CV = s/mean × 100%) for both.
Which dataset has more relative variability?

Solution

Dataset X: Mean: \(\bar{x}_X = \frac{50+55+45+60+40}{5} = 50\) Euro. Variance: \(s_X^2 = \frac{(0)^2+(5)^2+(-5)^2+(10)^2+(-10)^2}{4} = \frac{250}{4} = 62.5\). Std Dev: \(s_X = \sqrt{62.5} \approx 7.91\) Euro

Dataset Y: Mean: \(\bar{x}_Y = \frac{5000+5500+4500+6000+4000}{5} = 5000\) cents. Variance: \(s_Y^2 = \frac{(0)^2+(500)^2+(-500)^2+(1000)^2+(-1000)^2}{4} = \frac{2500000}{4} = 625000\). Std Dev: \(s_Y = \sqrt{625000} \approx 790.6\) cents
Coefficient of Variation: \(CV_X = \frac{7.91}{50} \times 100\% \approx 15.8\%\) and \(CV_Y = \frac{790.6}{5000} \times 100\% \approx 15.8\%\)
Both datasets have the same relative variability (about 15.8%). With unrounded standard deviations the two CVs are identical; a difference in the second decimal place arises only from rounding.

This makes sense because Dataset Y is just Dataset X expressed in different units (cents instead of Euros). The CV is unit-free, so it shows they have the same underlying variability.
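The unit-independence of the CV can be illustrated with a short Python sketch. The helper `coefficient_of_variation` is a hypothetical name defined here, not a standard-library function.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

prices_euro = [50, 55, 45, 60, 40]
prices_cents = [100 * p for p in prices_euro]  # same prices, different unit

cv_x = coefficient_of_variation(prices_euro)
cv_y = coefficient_of_variation(prices_cents)
print(f"CV_X = {cv_x:.2f}%, CV_Y = {cv_y:.2f}%")
```

Multiplying every value by 100 scales both the mean and the standard deviation by 100, so the ratio is unchanged.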

Problem 11: Percentiles (xxx)

For the dataset: \(12, 15, 18, 22, 25, 28, 31, 35, 38, 42, 45, 48, 52, 55, 58, 62, 65, 68, 72, 75\)

Find the 25th percentile (P25).
Find the 75th percentile (P75).
Find the 90th percentile (P90).
If a value is at the 60th percentile, how many values are below it?

Solution

Data is sorted, n = 20 values.

P25 position: \(0.25 \times (20+1) = 5.25\), so P25 = 5th value + 0.25 × (6th - 5th) = \(25 + 0.25(28-25) = 25 + 0.75 = 25.75\)
P75 position: \(0.75 \times 21 = 15.75\), so P75 = 15th value + 0.75 × (16th - 15th) = \(58 + 0.75(62-58) = 58 + 3 = 61\)
P90 position: \(0.90 \times 21 = 18.9\), so P90 = 18th value + 0.9 × (19th - 18th) = \(68 + 0.9(72-68) = 68 + 3.6 = 71.6\)
60th percentile means 60% of values are below it. \(0.60 \times 20 = 12\) values are below the 60th percentile.
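The interpolation rule used for P25, P75, and P90 (position \(p \times (n+1)\), then linear interpolation between neighbouring values) can be sketched as a small Python function. `percentile` is a hypothetical helper written for this example; library functions may default to other percentile conventions.

```python
def percentile(sorted_values, p):
    """Percentile by linear interpolation at position p * (n + 1).

    Assumes the values are already sorted and 0 < p < 1.
    """
    n = len(sorted_values)
    position = p * (n + 1)   # 1-based, may be fractional
    k = int(position)        # rank of the value just below the position
    if k < 1:
        return sorted_values[0]
    if k >= n:
        return sorted_values[-1]
    fraction = position - k
    below, above = sorted_values[k - 1], sorted_values[k]
    return below + fraction * (above - below)

data = [12, 15, 18, 22, 25, 28, 31, 35, 38, 42,
        45, 48, 52, 55, 58, 62, 65, 68, 72, 75]

p25, p75, p90 = (percentile(data, p) for p in (0.25, 0.75, 0.90))
print(p25, p75, round(p90, 2))
```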

Problem 12: Comprehensive Analysis (xxxx)

A store tracks daily customer counts for 30 days:

\(42, 58, 65, 38, 71, 45, 52, 67, 55, 48, 63, 72, 44, 59, 68, 51, 56, 74, 41, 62, 49, 57, 69, 46, 54, 70, 43, 60, 66, 50\)

Calculate all measures of central tendency (mean, median, mode).
Calculate range, variance, standard deviation, and IQR.
Construct the five-number summary.
Identify any outliers using the 1.5 × IQR rule.
Create a frequency distribution with 5 equal-width classes.
What can you conclude about the store’s daily customer traffic?

Solution

Sorted data: \(38, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 59, 60, 62, 63, 65, 66, 67, 68, 69, 70, 71, 72, 74\)

Mean: \(\bar{x} = \frac{1695}{30} = 56.5\) customers

Median: Average of 15th and 16th values = \(\frac{56+57}{2} = 56.5\) customers

Mode: No mode (all values appear once)
Range: \(74 - 38 = 36\) customers

Variance: Sum of squared deviations = 3197.5, so \(s^2 = \frac{3197.5}{29} \approx 110.26\)

Standard deviation: \(s = \sqrt{110.26} \approx 10.50\) customers

IQR (simple method: average the two values on either side of the quartile position): Q1 (position \(0.25 \times 31 = 7.75\)): \(\frac{46+48}{2} = 47\). Q3 (position \(0.75 \times 31 = 23.25\)): \(\frac{66+67}{2} = 66.5\). IQR = \(66.5 - 47 = 19.5\)
Five-Number Summary:
- Min: 38
- Q1: 47
- Median: 56.5
- Q3: 66.5
- Max: 74
Outlier Detection: Lower fence = \(47 - 1.5(19.5) = 47 - 29.25 = 17.75\). Upper fence = \(66.5 + 1.5(19.5) = 66.5 + 29.25 = 95.75\)

All values are between 17.75 and 95.75. No outliers.
Frequency Distribution: Range = 36, Class width = 36/5 = 7.2, rounded up to 8

Class	Frequency	Rel. Freq.
38-45	6	20%
46-53	6	20%
54-61	7	23.3%
62-69	7	23.3%
70-77	4	13.3%
Total	30	100%
Conclusions:
- Average daily traffic is about 56.5 customers
- Traffic is fairly consistent (CV = 10.50/56.5 ≈ 18.6%)
- The distribution is roughly symmetric (mean = median = 56.5)
- No extreme days (no outliers)
- Most days see between 46 and 69 customers (20 of 30 days, about 67%)
- The store can plan staffing around 50-65 customers with reasonable confidence
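The summary statistics and the equal-width frequency distribution can be recomputed from the raw data as a check. This is a minimal Python sketch; starting the first class at the minimum and rounding the class width up to a whole number are assumptions that match the solution's choice of width 8.

```python
import math
import statistics

counts = [42, 58, 65, 38, 71, 45, 52, 67, 55, 48,
          63, 72, 44, 59, 68, 51, 56, 74, 41, 62,
          49, 57, 69, 46, 54, 70, 43, 60, 66, 50]

# Summary statistics, recomputed directly from the raw data.
total = sum(counts)
mean = statistics.mean(counts)
variance = statistics.variance(counts)  # sample variance (n - 1)
sd = statistics.stdev(counts)
print(total, mean, round(variance, 2), round(sd, 2))

# Five equal-width classes starting at the minimum; the width is
# the range divided by 5, rounded up to a whole number.
num_classes = 5
start = min(counts)
width = math.ceil((max(counts) - start) / num_classes)

frequencies = [0] * num_classes
for c in counts:
    frequencies[(c - start) // width] += 1

for i, f in enumerate(frequencies):
    low = start + i * width
    print(f"{low}-{low + width - 1}\t{f}\t{f / len(counts):.1%}")
```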
