Trouble in Turnover Land: Signal, Noise, and HR Reporting Part 2
Introduction
In our first post on reporting frequency and noise, we showed how reporting on outcomes more frequently leads to noisier, less accurate findings. As a result, we end up wasting time and energy creating the reports with the latest data, confusing random shifts with real change, and taking strong actions to correct problems we don’t have.
In today’s post, I’ll apply these basic insights to everyone’s favorite HR measure: turnover. My aim is to show you just how pernicious increased reporting can be in HR analytics and help you develop a case for resisting louder calls for ever more frequent reports. As you will see once again, more is definitely less.
What You Will Learn
- Why monthly turnover measures can vary widely
- Why annualized turnover measures are often wildly inflated
- What to do about it
Monthly Turnover: More Reporting Means More Outliers
Let’s look at the impact of reporting frequency on monthly turnover data. In this example, we’ll use the following setup:
- Situation:
  - 10% annual turnover
  - 1000 employees
  - 100 departures per year on average
We’ll keep the number of employees at 1000 throughout to keep things simple. That would normally vary depending on a host of factors like backfill rate, new positions, reorgs, etc. but there is no value in going to that level of detail.
- Three Time Intervals Tested:
  - Annual (10% = 100 expected departures per year)
  - Quarterly (100/4 = 25 expected departures per quarter on average)
  - Monthly (100/12 = 8.3 expected departures per month on average)
Turnover numbers are typically annualized to make comparisons easier. That is what we will do here and, as you will see, there are some negative consequences.
Monthly Turnover: Monte Carlo Simulations
To create the data we need to understand the impact of reporting frequency, we’ll use Monte Carlo simulations. If you are not familiar with them, you can think of a Monte Carlo simulation as using a computer to roll a die or flip a coin 1 million times and saving the results. We can then look at those results and easily get some intuitive insights that would be hard to get otherwise.
In these simulations, we’ll make 1,000,000 observations for each of our three time conditions.
In each condition, we have 1000 employees and we make an individual stay/quit “decision” for each employee according to the expected departure rate.
For example, when simulating our quarterly turnover, we are essentially flipping a weighted coin for each of our 1000 employees, with the chance of any one individual quitting in a given quarter set at 2.5% (0.1 / 4 quarters = 0.025).
Doing this a million times will give us a set of independent observed turnover rates for one million quarterly observations.
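Before scaling up to a million observations, it can help to see a single quarter on its own. The following is a minimal sketch of one simulated quarter, assuming the same 1000 employees and 10% annual rate; the variable names and the seed are my own choices for illustration and are not part of the simulation code further down.

```r
# One simulated quarter, employee by employee (illustrative sketch)
set.seed(1) # arbitrary seed so the run is repeatable

n_employees <- 1000
p_quit_quarter <- 0.1 / 4 # 2.5% chance that any one employee quits this quarter

# One weighted coin flip per employee: 1 = quit, 0 = stay
quits <- rbinom(n_employees, size = 1, prob = p_quit_quarter)

departures <- sum(quits) # departures observed in this one quarter
annualized_rate <- departures * 4 / n_employees # what we would report as "annual" turnover

departures
annualized_rate
```

Rerun it with different seeds and the annualized rate will bounce around 10% even though nothing about the underlying quit probability has changed.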
To preview our results, when we annualize the quarterly and monthly turnover results, we see a massive increase in the range of reported “annualized turnover rates”. The impact of this noise is magnified when we annualize our results, leading us to report many more severe outcomes.
Two Big Things
I’m providing the code so you can play along at home but there are really just two big things to remember.
- When we “annualize” turnover we convert the results from our given time scale to the annual scale. If you have a monthly turnover rate, you multiply that value by 12 to see what that rate would look like over the course of a year. For instance, if you lost 4% in a month then you would expect to lose 48% over the course of a year if you continued at that rate (see this post for more on annualized turnover rates).
- The importance of the signal-to-noise ratio (SNR). As we learned in the first post, SNR is just the ratio of the mean to the standard deviation. Annualizing our turnover results from quarterly or monthly turnover data dramatically increases the standard deviation and therefore the random swings in our results. Cranking up the noise means lowering the SNR, and, ultimately, big movements in the reported turnover results without any real change in the underlying system.
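These two ideas are small enough to write down as code. Here is a minimal sketch; the helper names `annualize` and `snr` are hypothetical and only used for illustration.

```r
# Hypothetical helpers for the two ideas above (illustrative sketch)

# Scale a per-period rate up to a yearly rate
annualize <- function(period_rate, periods_per_year) {
  period_rate * periods_per_year
}

# Signal-to-noise ratio: mean divided by standard deviation
snr <- function(x) {
  mean(x) / sd(x)
}

annualize(0.04, 12) # 0.48: losing 4% in a month annualizes to 48%
snr(c(8, 10, 12))   # 5: mean of 10 divided by an SD of 2
```

One thing worth noticing: multiplying every value by 12 scales the mean and the SD by the same factor, so `snr(x * 12)` equals `snr(x)`. The low SNR comes from the small number of departures in a short window, and annualizing then puts those swings on the much larger annual scale.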
The aim here is to show you how this happens to all of us every day when we are scrambling to get our latest HR data into the hands of leaders.
Now on to the results:
# Monte Carlo Annualized Turnover
library(plyr)
library(dplyr)
library(knitr)
# Creating the matrix
set.seed(42)
obs <- 1000000
sets_ann <- matrix(data = NA, nrow = obs, ncol = 3)
for (i in seq_len(3)) {
  time <- c(1, 4, 12)[i] # reporting periods per year: annual, quarterly, monthly
  p <- 1000 # number of employees
  pr <- .1 / time # probability that any one employee quits in a single period
  # Multiplying the simulated departures by the time factor "annualizes" them
  sets_ann[, i] <- rbinom(obs, p, pr) * time # annualized departures
}
sets_df <- data.frame(Annually = sets_ann[,1], Quarterly = sets_ann[,2], Monthly = sets_ann[,3])
# Get the mean, SD, and SNR of the annualized results for each frequency ----
SNR <- lapply(sets_df[, 1:3], FUN = function(x) {
  data.frame(round(mean(x), 2), round(sd(x), 2), round(mean(x) / sd(x), 2))
})
SNR <- ldply(SNR) # stack the list into one data frame; list names become the first column
names(SNR) <- c("Frequency", "Ann_Mean", "SD", "SNR")
kable(SNR)
| Frequency | Ann_Mean | SD | SNR |
|---|---|---|---|
| Annually | 100.00 | 9.49 | 10.54 |
| Quarterly | 99.98 | 19.76 | 5.06 |
| Monthly | 100.04 | 34.48 | 2.90 |
Monte Carlo Results: By the Numbers
What do our results show us? First, note that our average annualized results all give 100 as we would expect. That is, on average, we are losing 100 people per year regardless of whether we calculate that based on annual, quarterly, or monthly data. This is just a sanity check that we are simulating what we think we are simulating.
The key thing we see is a dramatic increase in the standard deviation of our results. Just moving from annual to quarterly reporting doubles our SD and cuts our SNR in half. Moving from quarterly to monthly reporting of annualized turnover is similarly damaging.
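You don’t have to take the simulation’s word for it. Because each period is a binomial draw, the SD of the annualized count can be worked out directly as the time factor multiplied by sqrt(n * p * (1 - p)), where p is the per-period quit probability. The sketch below is my own cross-check under that assumption, not part of the original simulation; the variable names are illustrative.

```r
# Analytic cross-check of the simulated SDs and SNRs (illustrative sketch)
n <- 1000
periods <- c(Annually = 1, Quarterly = 4, Monthly = 12) # periods per year
p <- 0.1 / periods # per-period quit probability

# SD of a binomial count is sqrt(n * p * (1 - p));
# annualizing multiplies the count, and therefore its SD, by the time factor
sd_annualized <- periods * sqrt(n * p * (1 - p))

# The annualized mean is n * p * periods, which is 100 in every case
snr_theory <- (n * p * periods) / sd_annualized

round(sd_annualized, 2)
round(snr_theory, 2)
```

The values should land very close to the simulated table above. They also show the rough rule at work: the SD grows approximately with the square root of the number of periods per year, which is why going from annual to quarterly (a factor of 4) roughly doubles it.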
What this means in practice is that we’ll end up with many more “good” periods of turnover but also many more “bad” periods of turnover as we increase reporting frequency. Whether we’re patting ourselves on the back or gnashing our teeth, we’ll often overreact based solely on noisier data and not on any real underlying change.
Monte Carlo Results: By the Pictures
A picture is worth a thousand tables so let’s use some histograms. A common rule of thumb treats any result more than roughly 2 SDs from the mean as an outlier. We know that the SD for our true annual Monte Carlo results is 9.5. We’ll draw that histogram with 2 red bars to highlight the boundaries for these outlier values. About 5% of our true annual observations fall outside these boundaries.
set.seed(42)
temp_samp <- sets_df[sample(nrow(sets_df), size = 8000), ]
# Getting the Outliers for the annual distributions
lout_2 <- mean(sets_df$Annually) - 1.96*sd(sets_df$Annually)
rout_2 <- mean(sets_df$Annually) + 1.96*sd(sets_df$Annually)
hist(temp_samp[,1], breaks = 20, xlim = c(0,250), main = "Annual")
abline(v = lout_2, col = 'red', lwd = 3)
abline(v = rout_2, col = 'red', lwd = 3)
mtext(text = '-2 SDs', side = 1, line = .5, at = lout_2, cex = .7)
mtext(text = '+2 SDs', side = 1, line = .5, at = rout_2, cex = .7)
Now we’ll use those same outlier values but apply them to the annualized QUARTERLY turnover. When we do this, a funny thing happens: we end up with 36% of our annualized quarterly turnover values falling into the outlier category, not 5%.
These extreme results are simply due to noise but I assure you that won’t be the interpretation when we present the results to senior leadership.
hist(temp_samp[,2], breaks = 20, xlim = c(0,250), main = "Quarterly Annualized")
abline(v = lout_2, col = 'red', lwd = 3)
abline(v = rout_2, col = 'red', lwd = 3)
mtext(text = '-2 SDs', side = 1, line = .5, at = lout_2, cex = .7)
mtext(text = '+2 SDs', side = 1, line = .5, at = rout_2, cex = .7)
It gets worse with annualized MONTHLY turnover numbers: a full 60% of our annualized observations fall into the outlier range.
hist(temp_samp$Monthly, breaks = 20, xlim = c(0,250), main = "Monthly Annualized")
abline(v = lout_2, col = 'red', lwd = 3)
abline(v = rout_2, col = 'red', lwd = 3)
mtext(text = '-2 SDs', side = 1, line = .5, at = lout_2, cex = .7)
mtext(text = '+2 SDs', side = 1, line = .5, at = rout_2, cex = .7)
Summary and Recommendations
In this post, I applied the issue of reporting frequency to turnover. This is a big topic and a huge concern for employers big and small. Unfortunately, in our haste to get the most recent data and quickly react to changes in our organizations, we end up mistaking noise for signal.
The following table summarizes these results, showing the percentage of outliers we get depending on the frequency of the underlying measure. Increasing reporting frequency increases the noise in our data. When we then turn around and annualize those outcomes, the results are ugly.
# Getting percentages of sims that exceed 2 SDs
outs <- lapply(sets_df, FUN = function(x) round(mean(as.numeric(x > rout_2 | x < lout_2)),2))
outs <- ldply(outs)
names(outs) <- c("Frequency", "Prop_Outliers")
print(outs)
## Frequency Prop_Outliers
## 1 Annually 0.05
## 2 Quarterly 0.36
## 3 Monthly 0.60
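If you want to confirm these proportions without running a million simulations, the binomial distribution gives them directly. This is a minimal sketch of my own, assuming the same 1000 employees, 10% annual rate, and the +/- 1.96 SD boundaries around the annual mean of 100; the variable names are illustrative.

```r
# Exact outlier proportions from the binomial distribution (illustrative sketch)
n <- 1000
periods <- c(Annually = 1, Quarterly = 4, Monthly = 12)

# Outlier boundaries taken from the true annual distribution
sd_annual <- sqrt(n * 0.1 * 0.9)
lower <- 100 - 1.96 * sd_annual
upper <- 100 + 1.96 * sd_annual

prop_out <- sapply(periods, function(k) {
  p <- 0.1 / k
  # The annualized value is k * X, where X ~ Binomial(n, p) is the per-period count
  below <- pbinom(ceiling(lower / k) - 1, n, p) # P(k * X < lower)
  above <- 1 - pbinom(floor(upper / k), n, p)   # P(k * X > upper)
  below + above
})

round(prop_out, 2)
```

The results should agree closely with the simulated proportions in the table above, which is a nice check that the outliers really are a property of the arithmetic and not a quirk of one simulation run.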
What do we do? I have the following recommendations:
- Take these issues to heart and repeatedly emphasize the practical consequences of constantly reporting on the latest data. In practice, you’ll stand a decent chance of shifting some key reporting measures such as turnover to a quarterly basis instead of monthly. That might seem like a small victory, but you should take it.
- Closely related, you can also make a meaningful case that efforts to report monthly findings are instead better directed to other areas of HR analytics. By reducing the reporting load, you’ll not only get more accurate data but save valuable resources.
- When you are reporting turnover (or other cumulative, time-sensitive measures) you would do well to use a running average over some time window (say 3-4 months). This is a sort of compromise that allows you to take in the most recent data without letting your results get totally swamped by the noise that accompanies frequent measurement.
- Fight the good fight. Take your company’s historic data and actually show people the difference between the results obtained in a given month versus those for the year overall. Show how annualizing data led to results that did not ultimately reflect reality. You may find you can make a compelling case for reducing reporting frequency and preventing leaders from making big decisions based on the noise instead of the signal.
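To make the running average recommendation concrete, here is a minimal sketch that compares annualized single-month turnover against an annualized trailing 3-month average. The data are simulated under the same assumptions as above (1000 employees, 10% annual turnover); the window length, seed, and variable names are my own choices for illustration, and you would swap in your own monthly departure counts.

```r
# Trailing 3-month average vs. single-month annualized turnover (illustrative sketch)
set.seed(2023) # arbitrary seed, for reproducibility only

n_employees <- 1000
n_months <- 12000 # 1,000 simulated years of monthly data
departures <- rbinom(n_months, n_employees, 0.1 / 12)

# Annualized turnover rate based on each single month
monthly_annualized <- departures * 12 / n_employees

# Trailing 3-month average of departures, then annualized.
# stats::filter is called explicitly because dplyr masks filter().
rolling_departures <- as.numeric(stats::filter(departures, rep(1 / 3, 3), sides = 1))
rolling_annualized <- rolling_departures * 12 / n_employees

sd(monthly_annualized)
sd(rolling_annualized, na.rm = TRUE) # the first two months have no full window
```

The rolling series still updates every month, but its swings should be noticeably smaller than those of the single-month figures because each reported value pools three months of departures.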
Comments or Questions?
Add your comments OR just send me an email: john@hranalytics101.com
I would be happy to answer them!
- © 2023 HR Analytics 101