GRE Quartiles and the Interquartile Range


Statisticians point out that it’s often useful to “chunk” data to understand it. What does it mean to “chunk” data? It means dividing a long list into smaller chunks so that, with a few well-chosen numbers, we can get a sense of the layout of the list.

The fundamental “chunking” number is the median. The median is the middle of the sorted list; that is, it divides the list into two chunks, a lower list and an upper list. This one number marks the boundary between them: no entry in the lower list is above the median, and no entry in the upper list is below it.

Quartiles

Quartiles extend this idea. First, find the median, which divides the entire list into a “top 50%” list and a “bottom 50%” list. Now, find the median of each of these lists. The median of the “bottom 50%” is called the first quartile, or Q1. The median of the “top 50%” is called the third quartile, or Q3. The quartiles are called “quartiles” because the two quartiles and the median divide the list into four equal chunks:

the lowest 25% of the list is below the first quartile
the next 25% of the list is between the first quartile and the median
the next 25% of the list is between the median and third quartile
the highest 25% is above the third quartile.

Notice that we don’t use the term “second quartile” because the median plays the role of the second quartile.
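
The median-of-halves recipe above can be sketched in a few lines of Python. This is only an illustration: the function name `quartiles` and the sample list are made up for this sketch, and it assumes that when the list has an odd number of entries, the middle entry is left out of both halves (other conventions exist).

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: for an odd-length list, the middle entry is excluded
    from both halves. The list needs at least two entries.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # the "bottom 50%"
    upper = data[n - half:]    # the "top 50%"
    return median(lower), median(data), median(upper)

sample = [1, 3, 4, 6, 8, 9, 12, 15]  # hypothetical data, not from a GRE problem
q1, q2, q3 = quartiles(sample)
print(q1, q2, q3)  # 3.5 7.0 10.5
```

Here two of the eight entries fall in each of the four chunks, which is the "four equal chunks" idea in miniature.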

Example:

Set S = {2, 5, 7, 11, 16, 24, 28, 50, 52, 101, 120, 130}

What is the average of the first quartile (“Q1”) and the third quartile (“Q3”) of set S?

(A) 9

(B) 26

(C) 42.75

(D) 76.5

(E) 85.5

The Interquartile Range

Often, statisticians are bothered by outliers, that is, extremely high or low values. An outlier is a member of the list that is not representative of most of the list. In the list of household incomes in the US, the incomes of Bill Gates and Warren Buffett are not representative of the rest of us: they are outliers. Outliers, by definition, will always be at the very top or the very bottom of a list.

Notice that any outliers will necessarily fall in either the “top 50%” or the “bottom 50%.” Would it be possible to talk about a “half” of the population that leaves out the extremes at both ends? Instead of the “top 50%” or the “bottom 50%,” we could take the “middle 50%.” Suppose we look at all the folks between the first quartile and the third quartile. A quarter of the population is between the first quartile and the median, and a quarter is between the median and the third quartile, so 50% of the population lies between the first and third quartiles, and it’s the 50% in the middle.

This middle half is the interquartile range: the data entries from the first quartile to the third quartile. When it is reported as a single number, the interquartile range is the distance between the quartiles, Q3 - Q1. It’s a big deal because it’s neither the upper half nor the lower half but the middle half of the data, with the extreme values at both ends left out. For this reason, statisticians feel it gives a very good representation of where the typical data lie.
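
To see why the middle half is less bothered by outliers, here is a small Python sketch. The income figures are invented for illustration, the helper name `middle_half` is hypothetical, and the quartiles are found with the same median-of-halves rule described earlier. As a single number, the interquartile range is Q3 - Q1.

```python
from statistics import median

def middle_half(values):
    """Return (Q1, Q3, entries between Q1 and Q3) via median-of-halves."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])                # median of the bottom 50%
    q3 = median(data[len(data) - half:])    # median of the top 50%
    middle = [x for x in data if q1 <= x <= q3]
    return q1, q3, middle

# Hypothetical incomes in thousands of dollars; 1000 is the outlier.
incomes = [20, 25, 30, 35, 40, 45, 50, 1000]
q1, q3, middle = middle_half(incomes)

print(q1, q3)                       # 27.5 47.5
print(middle)                       # [30, 35, 40, 45]
print(q3 - q1)                      # 20.0  (interquartile range)
print(max(incomes) - min(incomes))  # 980   (full range, dominated by the outlier)
```

The single extreme income stretches the full range to 980 but leaves the interquartile range at 20.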

Explanation:

Set S = {2, 5, 7, 11, 16, 24, 28, 50, 52, 101, 120, 130}

What is the average of the first quartile (“Q1”) and the third quartile (“Q3”) of set S?

A quartile is defined as the median of half of a set of data. Set S has 12 entries, already in increasing order, so the median, (24 + 28)/2 = 26, splits it into a lower half and an upper half of six entries each.

The first quartile (or Q1) of a set of data is the median of the lower half of the data. For the lower half, {2, 5, 7, 11, 16, 24}, the median is (7 + 11)/2 = 9 = Q1.

The third quartile (or Q3) of a set of data is the median of the upper half of the data. For the upper half, {28, 50, 52, 101, 120, 130}, the median is (52 + 101)/2 = 76.5 = Q3.

Now find the average of Q1 and Q3: (9 + 76.5)/2 = 42.75.

Answer: C
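
As a quick check of the arithmetic, the same steps can be run in Python. This is just a sketch that mirrors the hand calculation for set S; the variable names are chosen for this illustration.

```python
from statistics import median

S = [2, 5, 7, 11, 16, 24, 28, 50, 52, 101, 120, 130]

# S is already sorted and has 12 entries, so each half has 6.
lower, upper = S[:6], S[6:]

q1 = median(lower)         # (7 + 11) / 2
q3 = median(upper)         # (52 + 101) / 2
average = (q1 + q3) / 2

print(q1, q3, average)     # 9.0 76.5 42.75
```

Library quantile routines that interpolate between entries, such as `statistics.quantiles`, follow different conventions and can return slightly different quartiles for the same list, so the median-of-halves rule is the one to use when checking this answer.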