Tutorial: How to Handle and Summarize Multiple Rows of Employee Movement Data

Gelatin silver print
Employees changes roles which is great for them…but a potential headache for those trying to corral the extra rows of employee data for each move. Today, I will show you how to summarize multiple rows of employee role movement data and get something useful into a single line.


HR Analytics often depends on data with a single row for each individual employee. Things change when it comes to movement data, though, because each move is usually captured with a new line of data.

This can be a headache for those just starting in HR analytics Knowing how to extract useful movement data tied a single individual employee can lead better insights that include the number of job changes for each person, the length of time in those jobs, or anything else you think important.

The Goal

We have two sets of data. One in the standard single row/ single person format and the other creates a new line of data capturing every role for each person. Our goal is to process that role movement data, get some meaningful movement information for each employee into a single line, and the combine that with our standard structure data.

Note: This will be essentially the same trick we used for adding team-level information. Sometimes it helps to see the same thing in different ways.

Make the Data

Once again, we will create some very simple data standard data. You can just copy and paste the following code into your Rstudio script.

### Standard Employee Data
emp_id <- 1:5 # employee number
role <- c('Marketing', 'Sales','Accounting', 'Sales', 'Marketing')
d <- data.frame(emp_id, role) 

Now let’s create the movement data with a new line of data for each change in role.

### Movement Data

emp_id <- rep(1:5,c(2,2,4,1,3)) # same people

# Roles occuppied
role <- c('Sales', 'Marketing', 'Marketing', 'Sales', 'Sales', 'Accounting',
          'Sales', 'Accounting', 'Sales', 'Marketing', 'Accounting', 'Marketing')

# Years in Each Role
yrs <- c(1,2,4,2,1,3,2,1,4,5,2,7)

# Combining the data together
move <- data.frame(emp_id, role, yrs)

Summarizing Individual Movement Data with the Aggregate Function

As I noted previously, the aggregate function is a foundational tool in R for those in HR Analytics. It works by splitting one variable up according to the values of some other variable. Then it applies a function (such as the mean) to each of those pieces separately.

This is something that you might do in an Excel pivot table but we are doing this in R because we want to combine it with our standard data without cutting and pasting (cutting and pasting = bad).

Writing the Aggregate Statement

In the current case, we are “modeling” (that is, breaking down) our our variable for years of experience (yrs) according the employee id. Here, we just want to get the mean number of years in the differing roles. Note that we are using a formula format and specifying that the data comes from the “move” dataset we created.

yrs_role_avg <- aggregate(yrs ~ emp_id, data = move, FUN = mean)
##   emp_id      yrs
## 1      1 1.500000
## 2      2 3.000000
## 3      3 1.750000
## 4      4 4.000000
## 5      5 4.666667

Note that now we have a single line for each individual employee.

Merging Summarized Movement Data Result

We can now combine this with our standard structure data.

We’ll do this by merging the original data (d) with the summarized movement data, merging everything according to the employee number.

##   emp_id       role
## 1      1  Marketing
## 2      2      Sales
## 3      3 Accounting
## 4      4      Sales
## 5      5  Marketing
new_d <- merge(d, yrs_role_avg, by = "emp_id")
new_d # It works!
##   emp_id       role      yrs
## 1      1  Marketing 1.500000
## 2      2      Sales 3.000000
## 3      3 Accounting 1.750000
## 4      4      Sales 4.000000
## 5      5  Marketing 4.666667

Counting the Number of Moves with the Aggregate Function

Of course, we are not just limited to the average number of years. For example, we could also count the number of roles they have had. We’ll do this with the “length” function, which just counts the length of the roles vector for each individual.

# Counting the Number of Roles
num_roles <- aggregate(role ~ emp_id, data = move, FUN = length)
##   emp_id role
## 1      1    2
## 2      2    2
## 3      3    4
## 4      4    1
## 5      5    3
# Let's change the name to "num_roles"
# Avoid confusion with "role" in d

names(num_roles)[2] <- "num_roles"

# Merge the data as before
new_d2 <- merge(d, num_roles, by = "emp_id")
##   emp_id       role num_roles
## 1      1  Marketing         2
## 2      2      Sales         2
## 3      3 Accounting         4
## 4      4      Sales         1
## 5      5  Marketing         3


Movement data reveals many important aspects of our employees and organizational changes over time. Sometimes, though, we just need to connect it with our standard 1-line/ 1-person information. Knowing just a little bit of R can help us here.

As I noted in a previous post, there are different ways of getting these operations done (e.g. tapply, ddply). The point is to explore and learn…but above all else getting the job done while also expanding your skillset.

Like this post?

Get our FREE Turnover Mini Course!

You’ll get 5 insight-rich daily lessons delivered right to your inbox.

In this series you’ll discover:

  • How to calculate this critical HR metric
  • How turnover can actually be a GOOD thing for your organization
  • How to develop your own LEADING INDICATORS
  • Other insightful workforce metrics to use today

There’s a bunch more too. All free. All digestible. Right to your inbox.

Yes! Sign Me Up!

Comments or Questions?

Add your comments OR just send me an email: john@hranalytics101.com

I would be happy to answer them!

Contact Us

Yes, I would like to receive newsletters from HR Analytics 101.