by Professor Pi

Tue Oct 09 2001 at 6:31:21

The Interquartile Mean (IQM) is a statistical measure of central tendency, much like the mean (in more popular terms called the average), the median, and the mode.

The IQM is very similar to the scoring method used in sports that are evaluated by a panel of judges: discard the lowest and the highest scores; calculate the mean value of the remaining scores.

In calculation of the IQM, the lowest 25% and the highest 25% of the scores are discarded. These points are called the first and third quartiles, hence the name of the IQM. (Note that the second quartile is also called the median). The method is best explained with an example:

Consider the following dataset:

	5, 8, 4, 38, 8, 6, 9, 7, 7, 3, 1, 6

First sort the list from lowest-to-highest:

	1, 3, 4, 5, 6, 6, 7, 7, 8, 8, 9, 38

There are 12 observations (datapoints) in the dataset, thus we have 4 quartiles of 3 numbers. Discard the lowest and the highest 3 values:

	1, 3, 4, 5, 6, 6, 7, 7, 8, 8, 9, 38

We now have 6 of the 12 observations remaining; next, we calculate the average of the mean of these numbers:

	x_IQM = (5 + 6 + 6 + 7 + 7 + 8) / 6 = 6.5

The Interquartile Mean shares some properties from both the mean as well as the median:

Like the median, the IQM insensitive to outliers; in the example given, the highest value (38) was an obvious outlier of the dataset, but its value is not used in the calculation of the IQM. On the other hand, the common average (the arithmetic mean) is sensitive to these outliers: x_mean = 8.5.
Like the mean, the IQM is a discrete parameter, based on a large number of observations from the dataset. The median is always equal to one of the observations in the dataset (assuming an odd number of observations). The mean can be equal to any value between the lowest and highest observation, depending on the value of all the other observations. The IQM can be equal to any value between the first and third quartiles, depending on all the observations in the interquartile range.

The above example consisted of 12 observations in the dataset, which made the determination of the quartiles very easy. Of course, not all datasets have a number of observations that is divisible by 4. We can adjust the method of calculating the IQM to accommodate this. Ideally we want to have the IQM equal to the mean for symmetric distributions, e.g.:

	1, 2, 3, 4, 5

has a mean value x_mean = 3, and since it is a symmetric distribution, x_IQM = 3 would be desired.

We can solve this by using a weighted average of the quartiles and the interquartile dataset:

Consider the following dataset of 9 observations:

	1, 3, 5, 7, 9, 11, 13, 15, 17

There are 9/4 = 2.25 observations in each quartile, and 4.5 observations in the interquartile range. Truncate the fractional quartile size, and remove this number from the 1st and 3rd quartiles (2.25 observations in each quartile, thus the lowest 2 and the highest 2 are removed).

	
	1, 3, (5), 7, 9, 11, (13), 15, 17

Thus, there are 3 full observations in the interquartile range, and 2 fractional observations. Since we have a total of 4.5 observations in the interquartile range, the two fractional observations each count for 0.75 (and thus 3x1 + 2x0.75 = 4.5 observations). The IQM is now calculated as follows:

	x_IQM = {(7 + 9 + 11) + 0.75 x (5 + 13)} / 4.5 = 9

In the above example, the mean has a value x_mean = 9. The same as the IQM, as was expected. The method of calculating the IQM for any number of observations is analogous; the fractional contributions to the IQM can be either 0, 0.25, 0.50, or 0.75.

Note: I worked out a method similar to Tom Rook, but didn't bother to write it down. This one is easier since it uses/corrects the built in trimmean function in Excel. It works with 65536 values (or even more if adapted to multiple columns) One also does not need to sort the data. It goes as follows:

Put the data in column A
Define a name for a range in the sheet: (insert... Name... Define...) e.g. name it "IQM"
In the reference field put: "=Sheet1!$A:$A;Sheet1!$A:$A;Sheet1!$A:$A;Sheet1!$A:$A". Thus, the name "IQM" should refer to 4 times the column A.
The interquartile mean can be calculated with this function: "=TRIMMEAN(IQM;0.5)"

I think this one is almost winning the beauty contest, except for using a name range. Unfortunately one cannot use: TRIMMEAN(Sheet1!$A:$A;Sheet1!$A:$A;Sheet1!$A:$A;Sheet1!$A:$A;0.5)

If someone knows a method to put it all in one function without using a name range, let me know.

Update 20070103: DutchDemon writes me that it is actually possible to put everything into one formula as follows: =TRIMMEAN((A:A;A:A;A:A;A:A);0.5). Magic.

I like it!

5 C!s

(idea)

by SharQ

Sun Mar 31 2002 at 23:26:42

How to calculate the IQM in Microsoft Excel

There is, sadly enough, no simple formula for calculating the IQM in Microsoft Excel (well there is one way, but that isn't exact, as it strips away the partial numbers, which is no good). However, there is a complicated way :)

The formula

=IF(((COUNT(A1:A1000)/4)-INT(COUNT(A1:A1000)/4))=0, (AVERAGE(OFFSET(A1,(COUNT(A1:A1000))/4,0,(COUNT(A1:A1000))/2,1))), ((SUM(OFFSET(A1,((COUNT(A1:A1000))/4)+1,0,((COUNT(A1:A1000))/2)-1,1)))+ (((OFFSET(A1,((COUNT(A1:A1000))/4),0,1,1))+(OFFSET(A1,(((COUNT(A1:A1000))/4)+ ((COUNT(A1:A1000))/2)),0,1,1))))*(1-((COUNT(A1:A100)/4)-INT(COUNT(A1:A1000)/4)))) /(COUNT(OFFSET(A1,((COUNT(A1:A1000))/4)+1,0,((COUNT(A1:A1000))/2)-1,1))+ ((1-((COUNT(A1:A100)/4)-INT(COUNT(A1:A1000)/4)))*2)))

For readability, the code has spaces inserted. If you care to use it, please remove the spaces on the end of each line. This goes for all the code in this w/u.

How to use it

This formula makes certain assumptions:

That your list of cells for which you want to calculate the IQM starts on cell A1

That your cells are sorted (ascending or descending does not matter)

That you have less than a thousand cells in row A

That you have nothing else but the list of numbers in row A

How to customise it

If you would like to tweak the formula, it is probably easier to split the formula over in two different cells:

In cell B1, add the following formula:

=COUNT(A1:A1000)

Where you want the IQM to show, add the following formula:

=IF(((B1/4)-INT(B1/4))=0,(AVERAGE(OFFSET(A1,(B1)/4,0,(B1)/2,1))), ((SUM(OFFSET(A1,((B1)/4)+1,0,((B1)/2)-1,1)))+(((OFFSET(A1,((B1)/4),0,1,1))+ (OFFSET(A1,(((B1)/4)+((B1)/2)),0,1,1))))*(1-((B1/4)-INT(B1/4))))/ (COUNT(OFFSET(A1,((B1)/4)+1,0,((B1)/2)-1,1))+((1-((B1/4)-INT(B1/4)))*2)))

Now, in B1, you will want to change A1 to the first cell of your list. Change A1000 to a cell that is greater than your list (i.e if you use more than 1000 cells, you want this number to be larger than 1000)

In the long formula, change the references to A1 to whatever is your first cell, and the formula should work just fine.

What the formula does

First, the formula decides which main formula to use.

IF(((COUNT(A1:A1000)/4)-INT(COUNT(A1:A1000)/4))=0,

If the number of cells is divisible by four, the following formula is used:

AVERAGE(OFFSET(A1,(COUNT(A1:A1000))/4,0,(COUNT(A1:A1000))/2,1)),

This formula counts how many cells there are, and then calculates the average of half the cells, starting on the cell one fourth into the list

If the number of cells is not divisible by four, that is where the fun begins...

((SUM(OFFSET(A1,((COUNT(A1:A1000))/4)+1,0,((COUNT(A1:A1000))/2)-1,1)))+

The above section of the formula calculates the sum of the whole cells - i.e the same thing as above, but it does not count the cells that will only count partially towards the interquartile mean.

(((OFFSET(A1,((COUNT(A1:A1000))/4),0,1,1))+(OFFSET(A1,(((COUNT(A1:A1000))/4)+ ((COUNT(A1:A1000))/2)),0,1,1))))*

The above section of the formula finds the first and last cell of the selection - in other words, the cells that were omitted from the previous section of the formula. When those values are found, they are added to each other and then...

(1-((COUNT(A1:A100)/4)-INT(COUNT(A1:A1000)/4))))/

... multiplied with the correct fraction of which the two numbers count towards the interquartile mean. This number is added to the other cells, meaning that we now have a sum - the total of all the cells in our middle 50% selection of the list.

(COUNT(OFFSET(A1,((COUNT(A1:A1000))/4)+1,0,((COUNT(A1:A1000))/2)-1,1))+ ((1-((COUNT(A1:A100)/4)-INT(COUNT(A1:A1000)/4)))*2)))

Finally, the formula determines how many items there are in the selection, and adds either 0.5, 1.0 or 1.5 items to the list, depending on how many there are in the list.

When it has determined how many items there are in the selection, the sum found earlier is divided by the number of items.

The final result? The interquartile mean!

Note:

I consider it highly likely that there are several ways to tweak my formula so it looks nicer, or perhaps replace it with something different altogether. If you have any ideas as to how, i'd appreciate a message - I've spent a ridiculous amount of time on this, and I would like to get it right :)

I like it!

2 C!s

(idea)

by Tom Rook

Tue Sep 12 2006 at 8:48:29

Update: Professor Pi has posted a much simpler method for Excel in his write-up, using a named range and the TRIMMEAN function. I originally used the MS Works spreadsheet, which has no TRIMMEAN function (as far as I know) so the below might be nice in Works, although a less painful alternative would be just to buy Excel.

Inspired by Sharq's write-up above, I decided to try my hand at a simpler way of calculating the interquartile mean in Excel. The operative word there is "try."

SharQ's write-up above shows that we can calculate the I-Q mean with one formula. We're going to break it down into several formulas, each of which will be a tiny step, so that at the end the process is clear.

To help us with the cases when the number of data points cannot be split evenly into fours, we will use this property of the interquartile mean: the I-Q mean of {x1, x2, x3, ..., xn} is the same as the I-Q mean of {x1, x1, x1, x1; x2, x2, x2, x2; ...; xn, xn, xn, xn}. In other words, if we create a new data set consisting of four copies of each element of the original set, the two sets will have the same interquartile mean. (This somewhat makes sense intuitively, since this is how the normal average works. If you are in a class with two exams and you score 90 and 95, your final grade will be the same as in a class with four exams in which you score 90, 90, 95, 95.) We will refer to the set with four copies of each item as the expanded set. Now that our expanded set is guaranteed to have a size that is a multiple of four, we no longer have to handle four cases in our formula, just one.

Enough talk. If you have the volition, fire up Excel and follow me.

Let's make some headers first. In A1, put "Data". In B1, put "IQ Mean". Our input will go into column A (starting on row 3), and our result will show up in cell B3. Skip C; label D "Range", E "Counters", F "Original", G "Expanded", H "Contribution", I "Sum", and J "Count."

In D3, put the formula = COUNT(A3:A10000)
In D4, put the formula = D3 * 3
In E2, put -1. (Read nothing into that.)
In E3, put the formula = E2 + 1
In F3, put the formula = A3
In G3, put the formula = VLOOKUP(INT(E3/4), $E$3:$F$10000, 2) and don't panic.
In H3, put the formula = IF(AND(E3>=$D$3, E3<$D$4), G3, 0)
In I3, put the formula = SUM(H3:H10000)
In J3, put the formula = 2*COUNT(A3:A10000)
In B3, put the formula = I3/J3

Now we want to copy the formulas in E3, F3, G3, H3 down at least a thousand times. To do this fast, highlight those four cells, hold Shift while hitting Page Down repeatedly, and then type Ctrl-D.

The construction of our spreadsheet is done. To use it, just enter in values (no more than 250) starting in cell A3 and going down the A column, then sort the values (using "Sort" on the Tools menu). Cell B3 will show the interquartile mean.

Although figuring it out yourself is half the fun, I'll explain it if you like...

In column F, we duplicate the entries of column A for convenience (and so that VLOOKUP works; that's later). The extra zeroes in the column (coming from empty cells in column A) can be ignored. In column E3, we number each value in column F, starting from 0 (since I'm a C-array kind of guy).

The formula in the G column is the magic part. In G3, it reads = VLOOKUP(INT(E3/4), $E$3:$F$10000, 2).

Huh?

Well, imagine that our columns E and F are a hashtable, mapping a value in column E to a value in column F. VLOOKUP(SOMEVALUE, $E$3:$F$10000, 2) returns the value in column F next to the cell in column E containing SOMEVALUE. (The parameter 2 tells Excel we want the second column in the range (the middle term).) In our actual formula, SOMEVALUE is INT(E3/4); this means that each number in column F will appear four times in column G. (INT rounds a decimal down to an integer.) So our column G looks like F[INT(0/4)], F[INT(1/4)], F[INT(2/4)], F[INT(3/4)], F[INT(4/4)], F[INT(5/4)], F[INT(6/4)], F[INT(7/4)], which is F[0], F[0], F[0], F[0], F[1], F[1], F[1], F[1]..., our expanded data set. (The VLOOKUP function is best understood by a warped mind; I am pleased to be at your service.)

In column H we eliminate the contributions of the low and high values from our expanded set. If the index of a value is not within our desired range (which comes from D3 and D4), we replace its original value with a zero. If the index is in the desired range, we pass the value on unmodified.

Finally, we add up column H, count how many values from the expanded set still remain, and find the quotient. This quotient is the interquartile mean.

This method uses a lot more space than SharQ's does above, but it requires (I think) a little less typing, complemented by a lot of copy-down-the-rows hullabaloo.

So now, armed with the power to calcuate interquartile means for any set with 250 elements or less, have fun, um, finding out your merit or something....

If you're lazy like me and just skipped to the bottom of the write-up (I only do that sparingly...) then here is a portion of the spreadsheet, with my current reputations entered in as data.

Data  IQ Mean       Range  Counters  Original  Expanded  Contrib  Sum  Count
                              -1
  -1  6.529...        17       0        -1        -1        0     222    34
  -1                  51       1        -1        -1        0
   1                           2         1        -1        0
   2                           3         2        -1        0
   3                           4         3        -1        0
   4                           .         4        -1        0
   5                           .         5        -1        0
   6                           .         6        -1        0
   6                                     6         1        0
   6                                     6         1        0
   8                                     8         1        0
  10                                    10         1        .
  11                                    11         2        .
  12                                    12         2        .
  13                                    13         2      (below,
  13                                    13         2       non-zero)
  17                                    17        ...

(This calculation of merit is slightly off from the one displayed on Honor Roll and you; I'm not sure why.)

I and my army of imaginary minions have striven to ensure that no slip-ups occurred in the transcription of data from Excel to E2. But we are not perfect, so if you attempt this and it fails or Excel attacks you, let me know. Thanks.

I like it!

1 C!

Honor Roll	inter-quartile mean	measures of central tendency	Everything Statistics - September 29, 2001 (3)
Interdecile Mean	Everything Statistics - September 29, 2001	Devotion	XP stoicism
Median	Incentivizing antisocial behavior	first quartile	Everything Statistics - January 20, 2002
Everything Statistics - September 29, 2001 (2)	Logarithmic mean	Albert Camus, in Kurt Vonnegut's style	Standard deviation from the median
weighted average	mean	Penn and Teller	Five number summary
IQM	Statistics every writer should know	Speed limits should be higher for motorcycles	The Bell Curve