I want to discuss a question one of my students raised while I was teaching how to find the five-number summary. The case is simple: if we have only five numbers (say 1, 2, 3, 4, and 5), why don't calculators and the usual algorithm give the five-number summary as Minimum = 1, first quartile (Q1) = 2, second quartile (Median) = 3, third quartile (Q3) = 4, and Maximum = 5? Instead, the calculator reports the five-number summary as 1, 1.5, 3, 4.5, and 5. Neither answer is right or wrong; the difference lies in how we treat the data.
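Software libraries expose this choice directly. As an illustration (Python is our choice here; it is not what a calculator runs internally), the standard-library function `statistics.quantiles` (Python 3.8+) takes a `method` argument. For these five values, `"inclusive"` reproduces the quartiles 2, 3, 4, and `"exclusive"` reproduces the calculator's 1.5, 3, 4.5. The agreement with the calculator is specific to this small example and is not a claim about how any particular calculator computes quartiles.

```python
import statistics

data = [1, 2, 3, 4, 5]

# "inclusive" treats the data as the whole population (it interpolates
# between the observed minimum and maximum), giving the quartiles 2, 3, 4.
inclusive = statistics.quantiles(data, n=4, method="inclusive")

# "exclusive" (the default) treats the data as a sample; for these five
# values it matches the calculator's quartiles 1.5, 3, 4.5.
exclusive = statistics.quantiles(data, n=4, method="exclusive")

print("inclusive:", [min(data), *inclusive, max(data)])  # [1, 2.0, 3.0, 4.0, 5]
print("exclusive:", [min(data), *exclusive, max(data)])  # [1, 1.5, 3.0, 4.5, 5]
```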

Quantiles [1] / Percentiles [2]

If the data X are treated as a population, each of the values 1, 2, …, 5 has probability 0.2. The kth q-quantile is a value $x_k$ such that $P(X < x_k) \le k/q \le P(X \le x_k)$. For example, the first quartile (k = 1, q = 4) is the value $x_1$ that satisfies $P(X < x_1) \le 0.25 \le P(X \le x_1)$. Solving gives $x_1 = 2$, because $P(X < 2) = 0.2 \le 0.25 \le 0.4 = P(X \le 2)$. Applying the same argument to the median and Q3 gives Minimum = 1, first quartile (Q1) = 2, second quartile (Median) = 3, third quartile (Q3) = 4, and Maximum = 5.
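The definition can be checked mechanically. The following is a minimal sketch, assuming every listed value is equally likely and searching only among the data values themselves; the function name `population_quantile` is hypothetical, introduced just for this illustration. Exact fractions avoid floating-point trouble at the boundaries.

```python
from fractions import Fraction


def population_quantile(values, k, q):
    """Hypothetical helper: return the k-th q-quantile of a population in
    which every listed value is equally likely, using the definition
    P(X < x) <= k/q <= P(X <= x)."""
    n = len(values)
    target = Fraction(k, q)
    for x in sorted(set(values)):
        p_less = Fraction(sum(v < x for v in values), n)      # P(X < x)
        p_less_eq = Fraction(sum(v <= x for v in values), n)  # P(X <= x)
        if p_less <= target <= p_less_eq:
            return x  # smallest data value satisfying the definition
    raise ValueError("k/q must lie between 0 and 1")


data = [1, 2, 3, 4, 5]
quartiles = [population_quantile(data, k, 4) for k in range(5)]
print(quartiles)  # [1, 2, 3, 4, 5]
```

Here k = 0 and k = 4 return the minimum and maximum, so the list is the full five-number summary under the population view.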

Five-number summary [3]

If we treat the data as a sample from the population, the median is the number that splits the data in half. The first quartile is then the number that splits the lower half of the data in half again; similarly, the third quartile splits the upper half in half. Hence, if our sample is 1, 2, 3, 4, and 5, the median is 3. The lower half (1 and 2, leaving out the median itself) is split by 1.5, so Q1 = 1.5; by the same reasoning, the upper half (4 and 5) gives Q3 = 4.5. The result differs from the quantile definition above.
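The halving rule above can be sketched in a few lines. This assumes one common convention, the one used in the example, where the overall median is left out of both halves when the sample size is odd; other textbooks and calculators may split the data differently. The function name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Hypothetical helper: five-number summary by the median-of-halves
    rule. Q1 and Q3 are the medians of the lower and upper halves; when
    the sample size is odd, the middle value is excluded from both."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values to form halves")
    half = n // 2
    lower = xs[:half]            # values below the median position
    upper = xs[half + n % 2:]    # skip the middle value when n is odd
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])


print(five_number_summary([1, 2, 3, 4, 5]))  # (1, 1.5, 3, 4.5, 5)
```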

References

  1. Quantile, Wikipedia
  2. Percentile, Wikipedia
  3. PSU Stat 200