Averages: What Do They Really Mean? Part 2

In Part 1, we noted that means work pretty well for many HR analytics measures like salary…unless Bill Gates’ salary or a gaggle of entry-level accountants come along to throw off our calculations. How can we tell when a mean might be misleading? What can we do about it?

Summary Points

  • There are two common causes of misleading means: outliers (extreme scores) or a set of scores that are shifted strongly in one direction or another (skewed distributions).
  • Plotting your data will tell you if outliers or skewed distributions are impacting your mean.
  • The median (the midway point in your data) can be used instead of the mean to limit the impact of outliers and skewed distributions.

The Basics

Outliers: What are they?

Outliers are simply scores that are extremely high or extremely low. Outliers can have a dramatic effect on the mean.

Example:

In Part 1, we saw how the simple addition of Bill Gates’ $2 billion annual salary dramatically impacted the average salary for a sample of people over the age of 50 with some college education. But are outliers really relevant for day-to-day Human Resources and other organizational measures? Absolutely.

Let’s suppose we have a group of 15 recruiters. Of these, 14 have been hired within the last 3 years because of a dramatic expansion in talent acquisition and normal attrition. However, our most senior recruiter has been with the company since its founding 14 years ago. So what?

Figure 1. Impact of a single outlier on average recruiter tenure.

As we can see in Figure 1, this single outlier increases our average recruiter tenure from 1.27 years to 2.1 years. That’s an increase of roughly two-thirds. If we looked just at the mean without knowing the underlying data, we would substantially overestimate the typical amount of company experience at this critical position.
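The arithmetic behind that jump can be checked from the figures quoted above. The short Python sketch below uses only the numbers in the text (14 recruiters averaging 1.27 years, plus one 14-year veteran); the variable names are illustrative.

```python
# Figures taken from the text above; variable names are illustrative.
n_recent = 14            # recruiters hired within the last 3 years
mean_recent = 1.27       # their average tenure in years (rounded in the text)
outlier_tenure = 14.0    # the most senior recruiter

# Total years of tenure = group mean * group size, plus the outlier.
total_years = n_recent * mean_recent + outlier_tenure
mean_all = total_years / (n_recent + 1)

# Relative increase in the mean caused by adding one person.
relative_increase = (mean_all - mean_recent) / mean_recent

print(f"Mean with the outlier: {mean_all:.2f} years")
print(f"Relative increase: {relative_increase:.0%}")
```

Because 1.27 is itself a rounded value, the result is approximate, but it lands at about 2.1 years, in line with Figure 1.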

Plotting our data using a histogram before calculating the mean can help us spot such outliers immediately:

Figure 2. Histogram of tenure data.

Figure 2 shows that most of our recruiters are on the far left, indicating that they have comparatively limited experience. Our outlier with 14 years of experience, on the other hand, is all the way on the right and has a huge impact on our average for a group this size. If we plot our data first before rushing to the average, we can spot these issues.
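If you do not have a plotting tool handy, even a rough text histogram will surface an outlier like this one. The sketch below is one minimal way to do it in plain Python. The tenure values are hypothetical (they are not the data behind Figure 2), and the one-year bin width is an arbitrary choice.

```python
from collections import Counter

# Hypothetical tenure values in years (NOT the data behind Figure 2):
# 14 recent hires plus one 14-year veteran.
tenures = [0.2, 0.4, 0.5, 0.7, 0.9, 1.0, 1.13, 1.29,
           1.4, 1.6, 1.8, 2.0, 2.3, 2.56, 14.0]

# Count how many recruiters fall into each whole-year bin: [0, 1), [1, 2), ...
bin_counts = Counter(int(t) for t in tenures)

# Print one row per bin, including empty bins, so the gap stays visible.
for year in range(max(bin_counts) + 1):
    bar = "#" * bin_counts[year]
    print(f"{year:>2}-{year + 1:<2} yrs | {bar}")
```

The long run of empty rows between the 2-3 year bin and the lone mark at 14 years is exactly the kind of gap that should prompt a second look before reporting a mean.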

Conclusion? Outliers can produce misleading averages in real HR/Human Capital contexts, not just hypothetical examples involving Bill Gates and a huge salary.

Skewed Distributions: What are they?

A skewed distribution is a distribution in which most values cluster toward one end of the scale while a tail of less common values stretches toward the other end (see the “Negatively Skewed” and “Positively Skewed” panels in Figure 3). These can be contrasted with the normal distribution (“the bell curve”), which is essentially symmetrical.

Figure 3. The impact of distribution shapes on the mean.

There are two key ideas here:

  • When our scores are generally balanced (like the “normal”, green distribution), the mean serves as a good representation of the data.
  • When our scores are not balanced, the mean can be distorted. Specifically, for the “Negatively Skewed” figure, we see that the mean is pulled in the negative direction, away from the bulk of the scores, by the comparatively low scores on the far left. We see the opposite effect in the “Positively Skewed” figure; the mean is pulled in a positive direction, away from the bulk of the scores and towards the limited number of extremely high scores.
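The direction of that pull is easy to see with a couple of toy samples. The sketch below uses Python's standard `statistics` module; both lists are invented scores on a 1-10 scale (the second is simply a mirror image of the first), not data from Figure 3.

```python
from statistics import mean, median

# Hypothetical scores on a 1-10 scale, invented for illustration.
positively_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 9, 10]   # a few very high scores
negatively_skewed = [1, 2, 7, 8, 8, 9, 9, 9, 10, 10]  # a few very low scores

for label, scores in [("positive skew", positively_skewed),
                      ("negative skew", negatively_skewed)]:
    gap = mean(scores) - median(scores)
    direction = "above" if gap > 0 else "below"
    print(f"{label}: mean={mean(scores):.1f}, median={median(scores):.1f} "
          f"(mean is {direction} the median)")
```

In the positively skewed sample the mean sits above the median, and in the negatively skewed sample it sits below: in both cases the mean moves toward the tail.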

We can use the median instead to limit the impact of these distortions.

The Median: What is it?

The median is the midpoint value for a set of measures; half the measures will be below the median value and half will be above the median value.

As we can see in Figure 3, the median is less impacted by outlier scores or skewed distributions; the medians are not shifted away from the bulk of our data like the means are. If you plot your data and see either outliers or distorted distributions pushed in one direction or the other, you should strongly consider using the median as a key summary statistic instead of the mean.

As an example, consider again our preceding calculations for company tenure with our 14-year outlier. There, including the outlier shifted our average tenure measure from 1.27 years to 2.1 years. The median would produce a cleaner result, giving us 1.21 years without the outlier and 1.29 with the outlier. The median in this case better reflects the high number of low-tenure people currently in this position.
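To see why the median barely moves, it helps to run the comparison on a concrete list. The tenure values below are hypothetical: they are not the raw data behind this post, and were picked only so that the summary statistics land on the figures quoted above.

```python
from statistics import mean, median

# Hypothetical tenures (years) for the 14 recent hires. These are NOT the
# post's raw data; they were picked so the summary statistics land on the
# figures quoted in the text.
recent_hires = [0.2, 0.4, 0.5, 0.7, 0.9, 1.0, 1.13, 1.29,
                1.4, 1.6, 1.8, 2.0, 2.3, 2.56]
with_veteran = recent_hires + [14.0]

for label, data in [("without outlier", recent_hires),
                    ("with outlier", with_veteran)]:
    print(f"{label}: mean={mean(data):.2f}, median={median(data):.2f}")
```

With 14 values, the median is the average of the 7th and 8th sorted values; with 15, it is simply the 8th. Adding the 14-year veteran therefore shifts the midpoint by half a position, no matter how large the new value is, while the mean has to absorb all 14 years.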

Why Should I Care?

The examples here highlight how the mean can be distorted in either direction because of a handful of scores. So what?

Means are tossed around in everything from politics to sports to reinforce an argument. But when was the last time someone presented information about both the means AND the distributions? It’s rare and that’s a shame because those distributions can tell us much more about a measure and a group than any single number can.

What is the take-home lesson? Before you make a decision based on a comparison of means, be sure to ask questions about the distribution of the underlying scores. If possible, ask for the medians as well. If you are doing the analyses yourself, take the extra step and look at the data first.

Conclusion and Final Recommendations

The mean is the most commonly reported statistic for good reasons: it is an extremely powerful summary measure and differences in means can often tell an accurate and compelling story. Still, means can also be misleading. Before rushing off to report means and make bar graphs, be sure to do the following:

  1. PLOT YOUR DATA! Scatterplots are great but even a simple histogram like the one in Figure 2 is a nice start. Does the mean accurately represent what is really going on or is there another story to tell?
  2. Even if everything looks fine, go ahead and calculate the median anyway. If the mean and median are meaningfully different (broadly defined), look at your data again and ask why.
  3. Include that histogram with your results. If your presentation window is short, then at least stick it in the back of your slide deck and have it handy. The knowing members of your audience will appreciate the extra attention to detail. If you have the opportunity to bring it up, other members of your audience may also have insights about your measure or your sample worth additional discussion.
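The mean-versus-median check in step 2 can be wrapped in a small helper. The sketch below is one possible version: the function name `mean_median_gap` and the 10% `tolerance` are assumptions made for illustration, since "meaningfully different" has no fixed definition, and the sample values are invented.

```python
from statistics import mean, median

def mean_median_gap(values, tolerance=0.10):
    """Compare mean and median; flag gaps larger than `tolerance`.

    `tolerance` is an arbitrary illustrative threshold (10% of the median),
    not a statistical rule.
    """
    avg, mid = mean(values), median(values)
    # Express the gap relative to the median; handle a zero median separately.
    if mid == 0:
        relative_gap = 0.0 if avg == 0 else float("inf")
    else:
        relative_gap = abs(avg - mid) / abs(mid)
    return {"mean": avg, "median": mid,
            "relative_gap": relative_gap,
            "look_again": relative_gap > tolerance}

# Hypothetical tenure values in years, invented for illustration.
report = mean_median_gap([0.5, 1.0, 1.2, 1.5, 2.0, 14.0])
print(report)
```

A flag from a helper like this is only a prompt to go back to the plot, not a verdict; choose a threshold that makes sense for the measure at hand.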

Comments or Questions?

Add your comments OR just send me an email: john@hranalytics101.com

I would be happy to answer them!
