Visualizing Employee Turnover and Movement

Employee turnover and internal movements are a constant concern for HR professionals and business leaders. But sometimes we get caught up in the details and miss the forest for the trees.

Shifting our mindset from static percentages to employee flows and networks is a great way to communicate our findings and move our focus to relationships over time.

In today’s tutorial, we’ll borrow a page from economists and demographers to see how interactive Sankey diagrams (like the one above) can help us understand employee flows in our organizations.

The key idea is to think of employee turnover and movement as a network that reflects changes or flows of human capital over time.

If you are new to HR Analytics, I hope this post will help you think about your reporting and analytics work a bit differently. Play around with the figures and think about your different reporting and decision needs.

If you are a little further along in your HR Analytics journey, you have some good starter code to help you tackle turnover and employee movement with some useful visualizations.

Making The Data

First we’ll need some data. As always, I strongly recommend playing along at home.

library(dplyr)
library(networkD3)
library(knitr)
set.seed(42)

# Creating our basic data

df <- data.frame(Time1 = sample(c('HR', 'IT', 'Finance', 'Operations', 'Sales'),
                                size = 300, replace = TRUE, prob = c(.1, .2, .2, .2, .3)),
                 Time2 = character(300),
                 stringsAsFactors = FALSE)
# Fill in Time2 Business Area
destinations <- c('HR', 'IT', 'Finance', 'Operations', 'Sales', 'Departure')

# Probability of each destination, given the Time1 business area
move_probs <- list(
    HR         = c(.7, .05, .05, .05, .05, .1),
    IT         = c(.05, .6, .05, .05, .05, .2),
    Finance    = c(.05, .15, .5, .15, .05, .1),
    Operations = c(.1, .05, .05, .6, .05, .15),
    Sales      = c(0, 0, .1, .1, .55, .25)
)

# One draw per employee, using the probabilities for their starting area
for (i in seq_along(df$Time1)) {
    df$Time2[i] <- sample(destinations, size = 1, prob = move_probs[[df$Time1[i]]])
}
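
Before plotting, it can help to confirm that the simulated moves look the way we intended. The sketch below is an optional check and not part of the original walkthrough. It builds a tiny, made-up data frame with the same Time1 and Time2 columns and uses base R's table() and prop.table() to show the share of each starting area that ends up in each destination. The toy_df name and its values are hypothetical; swap in df from above to inspect the real simulation.

```r
# Hypothetical toy data with the same column names as df above
toy_df <- data.frame(
  Time1 = c("HR", "HR", "IT", "IT", "IT", "Sales", "Sales", "Sales"),
  Time2 = c("HR", "IT", "IT", "Departure", "IT", "Sales", "Departure", "Finance"),
  stringsAsFactors = FALSE
)

# Counts of moves: rows are the starting area, columns the destination
move_counts <- table(Start = toy_df$Time1, End = toy_df$Time2)

# Row-wise proportions, so each starting area sums to 1
move_rates <- prop.table(move_counts, margin = 1)

print(move_counts)
print(round(move_rates, 2))
```

Reading across a row tells you where people from one area went, which is the same information the Sankey threads will show visually.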

Visualizing Employee Movement and Turnover

Now the fun stuff! Time for our first interactive Sankey diagram. I’ve added comments so you can understand the code.

In the present example we’ll stick with just two points in time for a small subset of possible roles.

df_fil <- filter(df, Time1!=Time2) # filtering to just those moving

# Adding the different labels to distinguish the nodes at time 1 and time2
df_fil$Time1 <- paste(df_fil$Time1, "_1", sep = "")
df_fil$Time2 <- paste(df_fil$Time2, "_2", sep = "")

### Getting the counts for each source-target pair
df2 <- df_fil %>%
    group_by(Time1, Time2) %>%
    dplyr::summarize(counts = n()) %>%
    ungroup() %>%
    arrange(desc(counts))

# Setting up the nodes and links for the network
name_vec <- c(unique(df2$Time1), unique(df2$Time2))

nodes <- data.frame(name = name_vec, id = 0:(length(name_vec)-1)) #length of ids must equal number of unique locations

links <- df2 %>%
    left_join(nodes, by = c('Time1' = 'name')) %>%
    rename(origin_id = id) %>%
    left_join(nodes, by = c('Time2' = 'name')) %>%
    rename(dest_id = id)

# Sankey Visualization 1 --------------------------------------------------

# The tooltips count people moving or leaving, so we label the units as employees
sankeyNetwork(Links = links, Nodes = nodes, Source = 'origin_id',
             Target = 'dest_id', Value = 'counts', NodeID = 'name',
             fontSize = 12, nodeWidth = 30, units = 'employees')

Hovering over each thread will show you the number of people involved in each move.

You can also drag the nodes (labels) at the top or the bottom to move each of the pieces to help highlight particular comparisons of interest. Play around with the diagram and you’ll quickly appreciate how Sankey diagrams can help your employee movement and turnover numbers come to life.
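
If you expect to repeat the node and link setup for other pairs of columns, the steps can be wrapped in a small function. The sketch below is one possible way to do that with base R only. make_sankey_data() is a hypothetical helper (it is not part of networkD3), and the example moves are made up. It assumes you want the same "_1" and "_2" suffixes used above and zero-based ids.

```r
# Build zero-indexed nodes and links from two character vectors.
# make_sankey_data() is a hypothetical helper, not part of networkD3.
make_sankey_data <- function(from, to) {
  # Suffix the labels so the same area at time 1 and time 2 is a distinct node
  from <- paste0(from, "_1")
  to   <- paste0(to, "_2")

  # Count each source-target pair and drop pairs that never occur
  counts <- as.data.frame(table(from = from, to = to), stringsAsFactors = FALSE)
  counts <- counts[counts$Freq > 0, ]

  # One row per node; the position in node_names (minus 1) is the node id
  node_names <- c(unique(counts$from), unique(counts$to))
  nodes <- data.frame(name = node_names, stringsAsFactors = FALSE)

  links <- data.frame(
    source = match(counts$from, node_names) - 1,
    target = match(counts$to, node_names) - 1,
    value  = counts$Freq
  )
  list(nodes = nodes, links = links)
}

# Made-up moves for illustration
moves <- make_sankey_data(
  from = c("HR", "IT", "IT", "Sales"),
  to   = c("IT", "Departure", "Departure", "Finance")
)

# Only draw the diagram if networkD3 is installed
if (requireNamespace("networkD3", quietly = TRUE)) {
  sankey_plot <- networkD3::sankeyNetwork(
    Links = moves$links, Nodes = moves$nodes, Source = "source",
    Target = "target", Value = "value", NodeID = "name",
    fontSize = 12, nodeWidth = 30, units = "employees"
  )
  # Print sankey_plot (or type its name at the console) to view it
}
```

Using match() against the node names keeps the ids and the node order in sync, which is the detail most likely to go wrong when the tables are built by hand.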

Visualizing Talent Sources

Let’s apply this to our sources of talent. The code for creating the network below is a bit more manual and transparent, although admittedly not nearly as elegant.

set.seed(42)

### Creating our names for the sources and business area target
temp_name <- c('Competitor 1', 'Competitor 2', 'Consultancy 1', 'Consultancy 2','College Recruiting', 
                             'HR', 'IT', 'Finance', 'Operations', 'Sales')

# Setting up the nodes in the figure
# Note that we need to start our indexing from 0 instead of 1.
nodes <- data.frame(name = temp_name, id = 0:9)

#getting all possible combinations of source and target using the ids
# Note that we need to start our indexing from 0 instead of 1.
# Our 5 sources are 0-4 and the 5 targets are 5 to 9
temp_cross <- expand.grid(c(0:4),c(5:9))


links <- data.frame(source = temp_cross$Var1, target = temp_cross$Var2, 
                    value = sample(c(5:25), dim(temp_cross)[1],replace = T))

sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
             Target = 'target', Value = 'value', NodeID = 'name',
             fontSize = 12, nodeWidth = 30, units = 'hires')
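
The same links table that feeds the diagram can also be summarized directly if you want numbers to go with the picture. The sketch below is an illustrative add-on rather than part of the original code. It rebuilds the nodes from the example above, but the hire counts are made up and fixed (not sampled) so the result is easy to follow, and by_source is a hypothetical name.

```r
# Node names as in the talent-source example above
temp_name <- c('Competitor 1', 'Competitor 2', 'Consultancy 1', 'Consultancy 2',
               'College Recruiting', 'HR', 'IT', 'Finance', 'Operations', 'Sales')
nodes <- data.frame(name = temp_name, id = 0:9, stringsAsFactors = FALSE)

# All source-target pairs; the hire counts here are made up for illustration
temp_cross <- expand.grid(source = 0:4, target = 5:9)
links <- data.frame(source = temp_cross$source,
                    target = temp_cross$target,
                    value  = rep(c(10, 5, 8, 7, 20), times = 5))

# Total hires per source, with readable names (ids are zero-based, so add 1)
by_source <- aggregate(value ~ source, data = links, FUN = sum)
by_source$name  <- nodes$name[by_source$source + 1]
by_source$share <- by_source$value / sum(by_source$value)

# Largest sources first
by_source <- by_source[order(-by_source$value), ]
print(by_source)
```

A table like this is a handy companion to the diagram when someone asks for the exact share behind a thick or thin thread.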

Visualizing Talent Sources and Promotion

Now let’s use Sankey diagrams to visualize human capital flows from initial hiring up to manager-level hires (or even higher depending on the data you have available). Remember the key idea: turnover, movement, and hiring as flows of human capital over time.

library(dplyr)
library(networkD3)

set.seed(42)

### Creating our names for the talent sources, consultant roles, and manager roles
temp_name <- c('Non-Exempt (Hourly) Worker Pool', 'Interns', 'College Recruiting', 'Competitor 1', 'Competitor 2',
               'HR Consultant', 'IT Consultant', 'Finance Consultant', 'Operations Consultant', 'Sales Consultant',
               'HR Manager', 'IT Manager', 'Finance Manager', 'Operations Manager', 'Sales Manager')

# Setting up the nodes in the figure
# Note that we need to start our indexing from 0 instead of 1.
nodes <- data.frame(name = temp_name, id = 0:14)

# Getting all possible combinations of source and target using the ids.
# Our 3 entry-level sources are 0 to 2 and the 5 consultant targets are 5 to 9.
# The competitors (3 and 4) are linked to the manager roles further down.
temp_cross <- expand.grid(c(0:2),c(5:9))


links <- data.frame(source = temp_cross$Var1, target = temp_cross$Var2, 
                    value = sample(c(5:30), dim(temp_cross)[1],replace = T))

### Adding a few more moves to illustrate the entire flow
temp_links1 <- data.frame(source = c(5:9) , target = c(10:14) , value = c(3, 2, 1, 3, 3))

temp_cross <- expand.grid(c(3:4),c(10:14)) #get all possible combinations

temp_links2 <- data.frame(source = temp_cross$Var1 , target = temp_cross$Var2 , 
                          value = sample(c(1:3), size = 10, replace = T))

links <- rbind(links, temp_links1, temp_links2)

sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
             Target = 'target', Value = 'value', NodeID = 'name',
             fontSize = 12, nodeWidth = 30, units = 'hires')

At a glance, we can see where our foundational consultant talent is coming from and how we are falling short when it comes to finding internal talent for critical managerial roles. Picking off talent from competitors is nice, but it’s clear we have some employee development issues that need to be addressed. This storyline would be harder to detect using the typical tables and bar graphs in HR reports.
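
To put a number on the internal-versus-external story, you can cross-tabulate the manager-level inflows by where they came from. The sketch below is a minimal illustration under stated assumptions: the manager_links data frame, its source_type labels, and all of the counts are hypothetical, so replace them with your own data.

```r
# Hypothetical manager-level inflows; swap in your own links table
manager_links <- data.frame(
  source_type = c("Internal promotion", "Internal promotion",
                  "Competitor hire", "Competitor hire", "Competitor hire"),
  target = c("HR Manager", "IT Manager", "HR Manager", "IT Manager", "IT Manager"),
  value  = c(3, 2, 2, 3, 1),
  stringsAsFactors = FALSE
)

# Rows: manager role, columns: where the hire came from
fill_counts <- xtabs(value ~ target + source_type, data = manager_links)

# Share of each role filled from each source (each row sums to 1)
fill_share <- prop.table(fill_counts, margin = 1)
print(round(fill_share, 2))
```

A low internal share for a role is the numeric version of the thin promotion thread in the diagram.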

Putting It All Together: Hiring Sources, Promotion, and Turnover

Let’s throw in employee turnover to see a complete movement picture.

set.seed(42)

### Creating our names for the sources and business area target
temp_name <- c('Non-Exempt (Hourly) Worker Pool','Interns', 'College Recruiting','Competitor 1', 'Competitor 2', 
                             'HR Consultant', 'IT Consultant', 'Finance Consultant', 'Operations Consultant', 'Sales Consultant',
               'HR Manager', 'IT Manager', 'Finance Manager', 'Operations Manager', 'Sales Manager')

# Setting up the nodes in the figure
# Note that we need to start our indexing from 0 instead of 1.
nodes <- data.frame(name = temp_name, id = 0:14)

# Getting all possible combinations of source and target using the ids.
# Our 3 entry-level sources are 0 to 2 and the 5 consultant targets are 5 to 9.
# The competitors (3 and 4) are linked to the manager roles further down.
temp_cross <- expand.grid(c(0:2),c(5:9))


links <- data.frame(source = temp_cross$Var1, target = temp_cross$Var2, 
                    value = sample(c(5:30), dim(temp_cross)[1],replace = T))

### Adding a few more moves to illustrate the entire flow
temp_links1 <- data.frame(source = c(5:9) , target = c(10:14) , value = c(3, 2, 1, 3, 3))

temp_cross <- expand.grid(c(3:4),c(10:14))

temp_links2 <- data.frame(source = temp_cross$Var1 , target = temp_cross$Var2 , 
                          value = sample(c(1:3), size = 10, replace = T))

links <- rbind(links, temp_links1, temp_links2)

### Adding some departures ### 
# Adding the departure node
temp_nodes <- data.frame(name = "Departure", id = 15)
nodes <- rbind(nodes, temp_nodes)

# adding the links to departure and how many left
temp_links <- data.frame(source = c(5:14) , target = 15 , value = c(16, 12, 3, 11, 10, 3, 1, 2, 2,2))

links <- rbind(links, temp_links)

### Final Sankey

# This diagram mixes hires, promotions, and departures, so we label the units as employees
sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
             Target = 'target', Value = 'value', NodeID = 'name',
             fontSize = 12, nodeWidth = 30, units = 'employees')

This one does get a little crowded, so move the pieces around to your liking. Regardless, visualizing this movement as a flow over time instead of a series of static snapshots is a big step in the right direction.
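
Besides dragging nodes by hand, you can give a crowded diagram more room up front. The sketch below shows one way to do that with the height, width, and nodePadding arguments of sankeyNetwork(). The small network and the specific pixel values are made up for illustration; treat them as starting points to adjust rather than recommended settings.

```r
# Small, made-up network with a shared Departure node (ids start at 0)
nodes <- data.frame(
  name = c("College Recruiting", "Competitor 1", "HR Consultant",
           "HR Manager", "Departure"),
  id = 0:4,
  stringsAsFactors = FALSE
)
links <- data.frame(
  source = c(0, 1, 2, 2, 3),
  target = c(2, 3, 3, 4, 4),
  value  = c(20, 2, 3, 16, 3)
)

# Only draw the diagram if networkD3 is installed
if (requireNamespace("networkD3", quietly = TRUE)) {
  roomy_sankey <- networkD3::sankeyNetwork(
    Links = links, Nodes = nodes, Source = "source", Target = "target",
    Value = "value", NodeID = "name", units = "employees",
    fontSize = 12, nodeWidth = 30,
    nodePadding = 25,   # more vertical space between nodes
    height = 600,       # taller canvas, in pixels
    width = 900         # wider canvas, in pixels
  )
  # Print roomy_sankey (or type its name at the console) to view it
}
```

Extra padding and a taller canvas tend to help most when many thin threads converge on a single node such as Departure.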

Final Thoughts

Yes, we will continue to be asked about those turnover numbers but that doesn’t mean we can’t find more compelling and informative ways to present that information. Context matters and Sankey diagrams give us the ability to see that context by combining time and networks.

Demographers and economists have known this for a long time and now we can bring their visualization insights to bear on analytics challenges within HR.

And Special Thanks Goes To….

You should check out this fantastic post by Kyle Walker. I used his code in creating our first network. I found the whole thing hugely helpful and very inspiring. We’ll definitely be leveraging some of these techniques in future posts.

Comments or Questions?

Add your comments OR just send me an email: john@hranalytics101.com

I would be happy to answer them!

