## Calculate Employee Turnover the Easy Way

In today’s tutorial I show you how to calculate employee turnover from just three columns of data: employee id, start date, and the termination date.

For simplicity, we’ll stick with monthly turnover calculations and we’ll assume all turnover is voluntary but the basic techniques can be adapted to suit your needs.

Unlike some of my earlier posts from several years ago, I make healthy use of the tidyverse approach to R… a real lifesaver. I’ve covered the fundamental logic of calculating turnover (see here, here, and here).

If you are new to R and not familiar with the tidyverse, check out the resources here. The most valuable package will be dplyr.

If you want to skip the details about creating the dates for the sample data, just run the code to make your sample data and then skip ahead to the second part where we deal with the meat of the turnover calculations themselves.

## Libraries

We’ll use the tidyverse and lubridate libraries.

library(tidyverse)
library(lubridate) 

## Part 1: Making Data and Data Wrangling

### Create Basic Data

Let’s first make some dummy data so we can calculate employee turnover.

You need to run this data creation step for your sample data but if you want to understand how it works, in brief it does the following:

• Creates a sample of 2000 people, each with a start date
• Randomly assign 25% of the people to the quit (terminated) category.
• Assigns a term date to those 25% who left the organization at some point.
set.seed(42)

d <- tibble(start_date = sample(seq(as.Date('2010/01/01'), Sys.Date(), by = "day"),
size = 2e3, replace = T),
quit = sample(c(0, 1), size = 2e3, replace = T,
prob = c(.75, .25))) %>%
rowwise() %>%
mutate(term_date = case_when(quit == 0 ~ as_date(NA),
TRUE ~ start_date +
sample(1:(Sys.Date() - start_date), 1))) %>%
ungroup() %>%
mutate(id = row_number()) %>%
select(id, start_date, term_date, -quit) 

So now we have our absolute minimum data:

• Employee ID
• Start Date
• Term Date (when applicable)
glimpse(d)
## Rows: 2,000
## Columns: 3
## $id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, … ##$ start_date <date> 2017-02-21, 2021-02-20, 2016-06-26, 2013-01-02, 2013-06-05…
## $term_date <date> 2021-10-09, NA, 2018-09-11, 2015-07-16, NA, NA, NA, NA, 20… ## Part 2: Turnover Fest! ### Creating a Sequence of Measurement Periods Now the fun part! A key tool here is the seq which allows us to make a series of dates. We can see how this seq function works in the example below going from a random date to today’s date. seq(as_date("2021-07-21"), Sys.Date(), by = "month") ## [1] "2021-07-21" "2021-08-21" "2021-09-21" "2021-10-21" "2021-11-21" ## [6] "2021-12-21" "2022-01-21" "2022-02-21" "2022-03-21" "2022-04-21" ## [11] "2022-05-21" "2022-06-21" "2022-07-21" "2022-08-21" "2022-09-21" Here, we’ll use the seq function to create a series of months, beginning with the first month of our company all the way until today. Then we’ll apply the floor_date function to just get the first of the month for each of our series of dates. To the last of day of the processing month, we’ll use the ceiling_date date function (minus one day). processing_month_start <- seq(min(d$start_date), Sys.Date(), by = "month") %>%
floor_date(unit = "month")

head(processing_month_start)
## [1] "2010-01-01" "2010-02-01" "2010-03-01" "2010-04-01" "2010-05-01"
## [6] "2010-06-01"
processing_month_end <- ceiling_date(processing_month_start, unit = "month") - days(1)

head(processing_month_end)
## [1] "2010-01-31" "2010-02-28" "2010-03-31" "2010-04-30" "2010-05-31"
## [6] "2010-06-30"

### Writing the Function to Calculate Employee Turnover

Now we are going to create function that take in the processing start and end dates one at time, figures out who was present at the beginning of the the month, who left or started at some point in the month, and how many we ended the month with.

These are the key pieces of data we need to calculate turnover in each month.

I’ve included comments in the code below so that you can understand each piece.

If you want to explore the outputs within the function in detail, use the browser() function as you would when debugging.

calc_monthly_turnover <- function(data, period_start, period_end){

# browser() # You can uncomment this to explore each output as it runs

# Present at the start of the period if you joined before on
# the start of the period AND (you have never terminated OR
# your term date was on or after the start of the period

employees_present_start <-  data %>%
filter(start_date <= period_start &
(is.na(term_date) |term_date >= period_start))

# Count the number of people who were present at period start

num_employees_present_start <- nrow(employees_present_start)

# Present at the end of period if your start date was before the
# the end of the period AND (you never terminated OR your term date
# was after the end of the period)

employees_present_end <- data %>%
filter(start_date <= period_end &
(is.na(term_date) | term_date > period_end))

# Count number of people present at end of period

num_employees_present_end <- nrow(employees_present_end)

# Count number of people terminating in this same period
# Here using months as the unit of time

employees_terminating <- format_ISO8601(period_start, precision = "ym") ==
format_ISO8601(data\$term_date, precision = "ym")

# Counting the number of terminations in this month period
num_employee_terms <- sum(employees_terminating, na.rm = T)

# Calculating final turnover value for the month
# Number of terminations in month divided by the
# average of number of employees start and end
turnover <- num_employee_terms/(mean(c(num_employees_present_start,num_employees_present_end )))

return (turnover)
}

### Testing with One Period

Before I show you how to run this for all the periods, let’s just try to calculate employee turnover for a single sample period.

We’ll choose a start and end date for our month of interest and then plug in those values.

I added on a call to the scales::percent function to give us a percent instead of a tiny decimal.

sample_start_date <- as.Date("2018-09-01")

sample_end_date <- as.Date("2018-09-30")

calc_monthly_turnover(d, period_start = sample_start_date,
period_end = sample_end_date) %>%
scales::percent(accuracy = .01)
## [1] "0.34%"

### Run for All Periods

Now we are going to take our sequence of processing start and end dates we built earlier plus our data and calculate turnover for every period.

To do this, we are going to use the map function from the purrr package. If you are not familiar with the map function and the purrr package, it’s definitely worth the investment…see the resources here.

In brief, the map function takes a vector of input(s) and plugs those into a function. It’s like a for loop but way more powerful and flexible.

Here we are going to feed the vector of process starting dates, the vector of processing end dates, and the data into our calc_monthly_turnover function. The map will apply the two vectors to our function. We’ll actually use the map2 function because are providing two vectors of input.

turnover_rates <- purrr::map2_dbl(processing_month_start, processing_month_end, ~calc_monthly_turnover(data = d, .x, .y))

Finally, we can put our vectors of processing dates and the rates into a dataframe and then make a simple visualization in ggplot.

This is just randomly generated data so we don’t want to spend too much time interpreting things but it is worth pointing that things seem highly variable in the beginning.

The reason is that we had far fewer employees in the beginning. If you only have 4 employees and 1 quits, BAM!, there goes 25%…followed by a bunch of 0s when no one quits.

But as we added employees, the rate starts
to stabilize from one period to the next. With fewer employees we have more seemingly “extreme events” but as we add employees, we have more and more flips of the coin and the numbers start to approach the long term mean.

turnover_df <- tibble(period_start = processing_month_start,
period_end = processing_month_end,
turnover_rate = turnover_rates)

ggplot(turnover_df, aes(period_start, turnover_rates)) +
geom_point() +
geom_line() +
scale_y_continuous(labels = scales::percent_format(scale = 100), limits = c(0, .05)) +
xlab(label = "Period") +
ylab(label = "Turnover Rate") +
ggtitle(label = "Monthly Turnover Rates") +
theme_bw()

## Conclusion

In today’s tutorial you learned how to calculate employee turnover with just three columns of data….and really two given that we didn’t even use the id col.

The bottom line is that if you pause and think through the fundamental logic of how to count when someone is or is not at the company, you can come up with simpler code that easier to understand, explain, modify, and implement.

Here we compared the start and termination dates with the start and end of the periods.

To really understand the logic, work through the code, experiment with your own data, compare the results with your expectations and give it a go! It’s the only way to learn.