Tutorial: The (F)Law of Small Numbers and What to Do About It

One thing that is often overlooked in all the hubbub of analytics is the number of observations (or sample size). The dirty secret is that a small number of observations can dramatically distort what we believe about our organizations and lead to very bad, very inaccurate decisions.

The culprit is the “Law of Small Numbers”, the evil twin of the Law of Large Numbers.

Today You Will Discover:

  • What the Law of Small Numbers is
  • What it means in the context of HR
  • How to limit its impact (even if you have smaller groups) by using a moving average (“running average” or “rolling average”).

The Law of Large Numbers

Let’s start with the Law of Large Numbers.

The Law of Large Numbers states that as a sample size grows, the mean of the sample will get closer and closer to the true mean of the population.

Suppose you want to estimate the height of the average American. If you take a random sample of 100,000 people, you can be pretty sure that the mean of your sample will be pretty close to the true mean of the population.

Said differently, a sample of 100,000 people will give you a pretty good idea of what the true mean is. Moreover, you can be pretty sure that the mean of a different sample of another 100,000 people will be very similar.

But what happens if the sample is small?

The Law of Small Numbers

You can think of the Law of Small Numbers as the opposite of the Law of Large Numbers. As the sample size shrinks, we increase the chances that the mean of the sample deviates from that of the overall population. In statistical jargon, our sampling error increases.
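For readers who want the formula behind that jargon: a standard result is that, for independent random draws, the standard error of a sample mean shrinks with the square root of the sample size. The 4-inch standard deviation plugged in below is an illustrative assumption, not a measured figure.

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
\qquad
n = 5:\ \frac{4}{\sqrt{5}} \approx 1.79 \text{ in}
\qquad
n = 100{,}000:\ \frac{4}{\sqrt{100{,}000}} \approx 0.013 \text{ in}
```

Under that assumption, a five-person sample mean is typically off by almost two inches, while a 100,000-person sample mean is typically off by about a hundredth of an inch.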

As an extreme case, let’s suppose we again want to estimate the height of the average American but we use a sample size of only five people. Intuitively, you know this is a bad idea.

Your sample of five people might include a 7-foot basketball player or a 4-foot-6 Olympic gymnast but not enough other people to balance things out. Or it might include only men or only women.

Even without such extremes, though, estimates that depend on small samples are much more likely to vary wildly.

If I took another sample of just five people, the results from that sample would probably be pretty different from the first sample too. Contrast this with the 100,000-person example above, where it would be shocking (and statistically highly unlikely) for two huge samples to differ wildly from each other.
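Here is a rough sketch of that contrast in R. The population is hypothetical: heights are assumed to be normally distributed with a mean of 67 inches and a standard deviation of 4 inches, and the seed and the number of repeated samples (200) are arbitrary choices made only for illustration.

```r
# Hypothetical population: heights in inches, normally distributed.
# The mean (67) and standard deviation (4) are illustrative assumptions.
set.seed(1)
true_mean <- 67
true_sd   <- 4

# Draw 200 independent samples at each size and record each sample's mean.
sample_mean <- function(n) mean(rnorm(n, mean = true_mean, sd = true_sd))
means_small <- replicate(200, sample_mean(5))
means_large <- replicate(200, sample_mean(100000))

# How far apart do the sample means land from one another?
range(means_small)  # typically spans several inches
range(means_large)  # typically spans a small fraction of an inch
sd(means_small) / sd(means_large)  # roughly sqrt(100000 / 5), i.e. about 141
```

The two sets of samples come from exactly the same population. Only the sample size differs.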

Observe also that this variation in outcomes with small samples has nothing to do with how we selected people. It’s just what happens if we have a limited number of observations.

And this lesson brings us to… HR Analytics!

Same Process but “Different” Outcomes

To see how, let’s take two different business units, Accounting and Sales. We’ll say Accounting has 20 people and Sales has 100.

Let us further suppose that each person in Accounting and each person in Sales has a 5% chance of quitting every month. We run a simple simulation here for 48 months.

To simplify the example, we’ll just say that we hire as many new people as we need to replace the quitters, so each group stays the same size.

Observe that there is absolutely nothing different between these two groups when it comes to the quitting process. Everyone has a 5% chance of quitting.

Yet huge differences between Accounting (small) and Sales (big) emerge when we look at the monthly turnover data.

An Example in R

The code for this simulation is provided below for those following along at home. If you don’t care about code that’s fine. Just skip those parts and focus on the figures and discussion.

library(ggplot2)#plotting library
library(reshape2)#reshaping data
library(scales)#using percentages
library(TTR) # for the simple moving averages functions

set.seed(42) # set the random seed to replicate results

# Set Up Small Sample
# Randomly determine # leaving each month
s1 <- rbinom(n = 48, size = 20, prob = .05) #48 draws (months), 20 people, .05 prob of leaving
sp <- s1/20 # calculate the percentage leaving each month
mean(sp) # average monthly turnover rate for the small group
## [1] 0.06875

# Set Up Big Sample
# Randomly determine number leaving each month
b1 <- rbinom(n = 48, size = 100, prob = .05) #48 draws (months), 100 people, .05 prob of leaving
bp <- b1/100 # calculate the percentage leaving each month
mean(bp) # average monthly turnover rate for the big group
## [1] 0.04604167

Figure 1 shows the simulated turnover rate (y axis) for the small Accounting group, calculated every month for 48 straight months assuming each person has a 5% chance of leaving their job.

The rate skips all over the place relative to the 5% baseline (black line), shooting up to 15% or 20% in some months and then plummeting to 0% in others. Moreover, we also see a few instances of high rates in consecutive observations (see time 30-32).

The mean turnover rate for the small Accounting group over these 48 months is about 6.9% (0.06875), kinda sorta close to the 5% value that we specified in our simulation process.

### Structure data and get the figure
all <- data.frame(id = seq_along(bp), small = sp, big = bp) # setting up the data frame
all <- melt(all, id.vars = "id") # reshaping for use with ggplot2

### Plot Small Sample Only
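ggplot(data = all[all$variable == "small", ], aes(x = id, y = value, group = variable, color = variable)) + 
    # linewidth replaces size for lines in ggplot2 >= 3.4.0; use size = 1.2 on older versions
    geom_line(linewidth = 1.2) + scale_colour_brewer(palette = "Dark2", name = "Group Size") + 
    geom_hline(yintercept = .05) + # black reference line at the true 5% rate
    # limits go inside scale_y_continuous(); a separate ylim() would be replaced by this scale
    scale_y_continuous(labels = percent, name = "Turnover Rate", limits = c(0, .25))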

What happens to our turnover numbers when the group is larger? Figure 2 compares the smaller Accounting group results with the larger Sales group turnover.

We certainly see some ups and downs for Sales, but the rates stay consistently closer to the true 5% rate used in the simulation (the observed 48-month average is actually 4.6%). Gone are the huge swings all the way up to 15% or 20%.

### Big and Small Sample Together
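ggplot(data = all, aes(x = id, y = value, group = variable, color = variable)) + 
    # linewidth replaces size for lines in ggplot2 >= 3.4.0; use size = 1.2 on older versions
    geom_line(linewidth = 1.2) + scale_colour_brewer(palette = "Dark2", name = "Group Size") + 
    geom_hline(yintercept = .05) + # black reference line at the true 5% rate
    # limits go inside scale_y_continuous(); a separate ylim() would be replaced by this scale
    scale_y_continuous(labels = percent, name = "Turnover Rate", limits = c(0, .25))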

Remember that the ONLY difference between these two groups is the size of the group, not the quitting process. Every employee within these simulations is operating under a 5% probability of quitting each month. That’s it.

What’s the lesson?

Changes in the group size (or sample size) can have a dramatic impact on the variability of a measure even when the underlying processes are exactly the same.
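We can also put rough numbers on this without simulating anything. The sketch below assumes the same binomial model as the simulation (every person independently has a 5% chance of quitting each month) and uses the standard formula for the standard deviation of a proportion. The helper name rate_sd is my own, and the choice of 15% as the cutoff for a "spike" month is only for illustration.

```r
# Theoretical spread of a monthly turnover rate when each of n people
# independently quits with probability p (the binomial model used above).
p <- 0.05
rate_sd <- function(n) sqrt(p * (1 - p) / n)  # hypothetical helper name

rate_sd(20)                  # about 0.049: one SD is nearly the whole 5% rate
rate_sd(100)                 # about 0.022
rate_sd(20) / rate_sd(100)   # sqrt(5), so about 2.24 times noisier

# Chance that nobody at all quits in a given month in the 20-person group
dbinom(0, size = 20, prob = p)         # about 0.36

# Chance of a "spike" month (turnover of 15% or more) purely by luck
1 - pbinom(2, size = 20, prob = p)     # 3+ of 20 quit: about 0.075
1 - pbinom(14, size = 100, prob = p)   # 15+ of 100 quit: well under 0.001
```

In other words, under these assumptions a 20-person group should expect a 15%-or-worse month roughly once every 13 months from chance alone, while a 100-person group should almost never see one.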

If you are using small groups, don’t jump to conclusions when the rates go sky high or drop through the floor.

How to Deal with Small Numbers

What’s one to do?

There is no perfect solution. Statistics are statistics and if you are measuring outcomes with smaller groups, the core numbers will simply bounce around more.

Still, I think there are at least three basic things one can do to limit bad or rash decisions arising from the Law of Small Numbers.

Suggestion #1: Reduce Reporting Frequency

People want up-to-the-minute data, but the Law of Small Numbers suggests that much of the fluctuation we see is due to random variation and sampling error, not meaningful change.

One obvious solution is to report data less frequently, say, on a quarterly rather than a monthly basis.

If that sounds unrealistic, consider the following three benefits.

  1. Reduced time spent creating reports
  2. Reduced leader time trying to understand and explain random noise (vs. true change)
  3. Reduced likelihood of making decisions based on imaginary patterns.

There’s an argument to be made for simply reporting less often.
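As a minimal sketch of what quarterly reporting does to the numbers, the code below re-creates a 20-person group and rolls the monthly rates up by quarter. The seed is arbitrary, so the draws will not match the figures in this post, and I am assuming the quarterly figure is reported as the average monthly rate within the quarter (which keeps it comparable to the 5% monthly baseline).

```r
# Re-create a small group's monthly quits (20 people, 5% monthly quit chance).
# The seed is arbitrary, so these draws will not match the article's figures.
set.seed(7)
headcount <- 20
quits     <- rbinom(n = 48, size = headcount, prob = 0.05)
monthly   <- quits / headcount

# Label each month with its quarter: 1,1,1,2,2,2,...,16,16,16
quarter <- rep(1:16, each = 3)

# Quarterly figure = average monthly rate within the quarter
# (assumes headcount is held at 20 by replacement hiring)
quarterly <- tapply(monthly, quarter, mean)

range(monthly)    # lowest and highest monthly rates
range(quarterly)  # quarterly rates can never range wider than the monthly ones
```

Each quarterly number pools three months of data, so it behaves like a measurement taken on a group three times the size.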

Suggestion #2: Use a Simple Moving Average

If you can’t reduce reporting frequency, you should consider using a moving average. This is also sometimes referred to as a “running average” or “rolling average”.

A simple moving average is the average of some value over multiple consecutive time periods. In the example below, we average the current value with the preceding two values. We are using a “bucket size” (window) of three, but the choice is up to you.

If this is the month of March, for example, then we would average the turnover data from March with the two months immediately preceding it (January and February). Similarly in April we would average the April data with that from February and March and so on.

To see how a simple moving average works, compare the first ten observations from the small sample data (the raw number of people leaving each month) with the simple moving average of that same data.

Notice that the first two spots in the moving average are empty. That is because we don’t have two preceding observations to help us calculate a moving average.

Looking at the raw numbers, we can see that the average of 2, 3, and 0 is 1.67, giving us the first simple moving average value.

s1_sma <- SMA(s1, 3) #making a simple moving average, averaging over 3 items
head(s1, 10) #first 10 raw observations
##  [1] 2 3 0 2 1 1 2 0 1 1
head(s1_sma, 10) # first 10 moving average
##  [1]        NA        NA 1.6666667 1.6666667 1.0000000 1.3333333 1.3333333
##  [8] 1.0000000 1.0000000 0.6666667
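If you would rather not depend on TTR, the same trailing three-month average can be sketched in base R with stats::filter. This is just one way to do it, not necessarily how SMA works internally. The input is the first ten raw counts printed above.

```r
# First ten monthly quit counts from the output above
x <- c(2, 3, 0, 2, 1, 1, 2, 0, 1, 1)

# Trailing 3-month average: equal weights on the current and two prior months.
# sides = 1 uses only past and current values, so the first two slots are NA.
# stats::filter is spelled out because dplyr, if loaded, masks filter().
sma3 <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 1))
round(sma3, 2)
## [1]   NA   NA 1.67 1.67 1.00 1.33 1.33 1.00 1.00 0.67
```

Changing rep(1 / 3, 3) to rep(1 / 6, 6) would give a six-month window instead, which smooths more but leaves five empty slots at the start.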

Moving averages help smooth out the data. This makes it easier to spot trends and, in this current case, ignore fluctuating noisy data.

Figure 3 shows you the original 20-person Accounting turnover rate data together with the moving average data. Note how the moving average eliminates the huge peaks and troughs.

In practical terms this means eliminating the compulsion to explain striking but nonetheless random rises and falls.

s1_sma_perc <- s1_sma/20 # converting from a raw number of leavers to a rate
plot(sp, col = "red", type = "l", lwd = 2.5, main = "Monthly v. Simple Moving Average", 
     xlab = "Observation Month", ylab = "Turnover")
lines(s1_sma_perc, col = "blue", lwd = 2.5)

legend(x = 30,y = .18, legend = c("Monthly","Moving Average"), lty=c(1,1),
lwd=c(2.5,2.5),col=c("red","blue")) # gives the legend lines the correct color and width

Suggestion #3: Find Other Measures Less Impacted

A final option is finding measures of “operational health” that might be less impacted by the Law of Small Numbers. For instance, instead of looking at completed sales numbers which might vary substantially from month to month (especially with just a few sales reps), look instead at more persistent, process-based measures like contacts initiated or outgoing current client calls.

Changes in sales numbers might be subject to a variety of external factors in any given month. Changes in the effort that sales reps put forward to secure those sales likely are not.

Final Thoughts

Two months from now, when you are looking at turnover rates, new hire rates, sales, or some other metric, remember the Law of Small Numbers. In the presence of fewer observations, a few bad months don’t necessarily reflect poor management decisions or a call to action. It might just be the luck of the draw.

The reverse is also true. A few great months don’t necessarily reflect your genius or that of other leaders. If small numbers are at play, be careful what you conclude.

Comments or Questions?

Add your comments OR just send me an email: john@hranalytics101.com

I would be happy to answer them!
