Tutorial: How to Handle and Summarize Multiple Rows of Employee Movement Data
HR Analytics often depends on data with a single row for each individual employee. Things change when it comes to movement data, though, because each move is usually captured with a new line of data.
This can be a headache for those just starting in HR analytics Knowing how to extract useful movement data tied a single individual employee can lead better insights that include the number of job changes for each person, the length of time in those jobs, or anything else you think important.
We have two sets of data. One in the standard single row/ single person format and the other creates a new line of data capturing every role for each person. Our goal is to process that role movement data, get some meaningful movement information for each employee into a single line, and the combine that with our standard structure data.
Note: This will be essentially the same trick we used for adding team-level information. Sometimes it helps to see the same thing in different ways.
Make the Data
Once again, we will create some very simple data standard data. You can just copy and paste the following code into your Rstudio script.
### Standard Employee Data emp_id <- 1:5 # employee number role <- c('Marketing', 'Sales','Accounting', 'Sales', 'Marketing') d <- data.frame(emp_id, role)
Now let’s create the movement data with a new line of data for each change in role.
### Movement Data emp_id <- rep(1:5,c(2,2,4,1,3)) # same people # Roles occuppied role <- c('Sales', 'Marketing', 'Marketing', 'Sales', 'Sales', 'Accounting', 'Sales', 'Accounting', 'Sales', 'Marketing', 'Accounting', 'Marketing') # Years in Each Role yrs <- c(1,2,4,2,1,3,2,1,4,5,2,7) # Combining the data together move <- data.frame(emp_id, role, yrs)
Summarizing Individual Movement Data with the Aggregate Function
As I noted previously, the aggregate function is a foundational tool in R for those in HR Analytics. It works by splitting one variable up according to the values of some other variable. Then it applies a function (such as the mean) to each of those pieces separately.
This is something that you might do in an Excel pivot table but we are doing this in R because we want to combine it with our standard data without cutting and pasting (cutting and pasting = bad).
Writing the Aggregate Statement
In the current case, we are “modeling” (that is, breaking down) our our variable for years of experience (yrs) according the employee id. Here, we just want to get the mean number of years in the differing roles. Note that we are using a formula format and specifying that the data comes from the “move” dataset we created.
yrs_role_avg <- aggregate(yrs ~ emp_id, data = move, FUN = mean) yrs_role_avg
## emp_id yrs ## 1 1 1.500000 ## 2 2 3.000000 ## 3 3 1.750000 ## 4 4 4.000000 ## 5 5 4.666667
Note that now we have a single line for each individual employee.
Merging Summarized Movement Data Result
We can now combine this with our standard structure data.
We’ll do this by merging the original data (d) with the summarized movement data, merging everything according to the employee number.
## emp_id role ## 1 1 Marketing ## 2 2 Sales ## 3 3 Accounting ## 4 4 Sales ## 5 5 Marketing
new_d <- merge(d, yrs_role_avg, by = "emp_id") new_d # It works!
## emp_id role yrs ## 1 1 Marketing 1.500000 ## 2 2 Sales 3.000000 ## 3 3 Accounting 1.750000 ## 4 4 Sales 4.000000 ## 5 5 Marketing 4.666667
Counting the Number of Moves with the Aggregate Function
Of course, we are not just limited to the average number of years. For example, we could also count the number of roles they have had. We’ll do this with the “length” function, which just counts the length of the roles vector for each individual.
# Counting the Number of Roles num_roles <- aggregate(role ~ emp_id, data = move, FUN = length) num_roles
## emp_id role ## 1 1 2 ## 2 2 2 ## 3 3 4 ## 4 4 1 ## 5 5 3
# Let's change the name to "num_roles" # Avoid confusion with "role" in d names(num_roles) <- "num_roles" # Merge the data as before new_d2 <- merge(d, num_roles, by = "emp_id") new_d2
## emp_id role num_roles ## 1 1 Marketing 2 ## 2 2 Sales 2 ## 3 3 Accounting 4 ## 4 4 Sales 1 ## 5 5 Marketing 3
Movement data reveals many important aspects of our employees and organizational changes over time. Sometimes, though, we just need to connect it with our standard 1-line/ 1-person information. Knowing just a little bit of R can help us here.
As I noted in a previous post, there are different ways of getting these operations done (e.g. tapply, ddply). The point is to explore and learn…but above all else getting the job done while also expanding your skillset.
Like this post?
Get our FREE Turnover Mini Course!
You’ll get 5 insight-rich daily lessons delivered right to your inbox.
In this series you’ll discover:
- How to calculate this critical HR metric
- How turnover can actually be a GOOD thing for your organization
- How to develop your own LEADING INDICATORS
- Other insightful workforce metrics to use today
There’s a bunch more too. All free. All digestible. Right to your inbox.
Yes! Sign Me Up!
Comments or Questions?
Add your comments OR just send me an email: firstname.lastname@example.org
I would be happy to answer them!
- © 2023 HR Analytics 101