Tutorial: How to Move HR Data From Excel to R and Back

traffic_excel_backIt’s pretty easy to move data from Excel into R, do a few fancy things, and then ship things right back to your Excel comfort zone.

In today’s tutorial I will show you the basic steps.

What You Will Learn

  • How to open an Excel file in R
  • How to do a team-level calculation using that data
  • How to merge the new team-level data with your original data
  • How to save your new combined file for use with Excel

Preliminaries: The Excel File Format

Excel files are typically saved in the .xlsx format.

In the HR analytics though, it is generally best to deal with .csv files instead. The “.csv” stands for comma separated values and tells data tools how to separate columns.

When later applying the lessons from this tutorial to your own work, you will want save your Excel data as a csv file first.

Tutorial

Step 1: Get the Tutorial Data

Download the Tutorial Data (Easy)

I have created some data for this exercise in the form of a csv file.

Download the tutorial data here!

…Or Create the Tutorial Data Yourself (More Work)

If you prefer to create the data yourself in R  instead of downloading it, you can run the following:

# Only run this if you want to make your own data
# instead of using the csv I provided in the link.

# Create thing basic data
set.seed(42) # set random seed for replication
dept <- c(rep("HR", 20), rep("IT", 100), rep("Acct", 40))
team <- c(rep(1:5, each = 4), rep(1:10, each = 10), rep(1:5, each = 8))
emp <- runif(length(dept)) # create an employee number
yrs <- c(rnorm(n = 20, mean = 8, sd = 3), rnorm(n = 100, mean = 4, sd = 2), rnorm(n = 40, mean = 10, sd = 4)) # create years at company

# remove any negative values for years (this is simulated data afterall)
yrs <- ifelse(yrs < 0, 0, yrs) 

# Put all of that data together
d <- data.frame(emp, dept, yrs, team) 

# mixing the order so it's messy just like real HR data
ind <- order(d$emp)
d <- d[ind,]
d$emp <- seq_along(d$emp) # assigning employee numbers

# dropping row names
rownames(d) <- c()

# write out as CSV in the current working directory
write.csv(d,"./hr101_excel_back.csv", row.names = F) 

# note: try "getwd()" in the command line without the quotes to see where it was saved if you can't find it.

Step 2: Let’s Import!

Importing Option 1 (Easiest): The RStudio Import Button

Using the interface in RStudio is the easiest way what I usually choose.

  1. Click “Import Dataset” from the Environment tab in your RStudio interface.
  2. Select “From Local File”
  3. Select the options as needed. I keep the defaults but usually de-select the “Strings as factors” option.

Importing Option 2: Importing the Excel File with R code

Sometimes you will want to write the code out yourself instead of using the Import button. This is especially valuable if you are using a set of scripts that you run automatically.

We will use the read.csv function for this.

# General format
# your_dataframe_name <- read.csv("yourfilename", stringsAsFactors=FALSE)

hr101_excel_back <- read.csv("./hr101_excel_back.csv") 

Be sure that your filename includes both the location and the name itself.

Step 4: Shorten the Data Name

You should now have a dataframe object called “hr101_excel_back” with all of the data in it.

But that’s a long name. Let’s just copy it into a new dataframe “d” to make it easier.

Shorter object names really do end up saving tons time.

d <- hr101_excel_back

Step 5: Examine Your File

Let’s make sure the data looks like it should using “head(d)”

# View(d) Use this if you want to see all of the data
head(d) # just show the first 6 rows
##   emp dept       yrs team
## 1   1   IT 5.6732381    8
## 2   2   IT 4.8075498   10
## 3   3   IT 6.5753505    6
## 4   4   IT 0.8108237    9
## 5   5   IT 0.6778018    2
## 6   6   IT 2.9746995    2

Everything looks good.

Now let’s do something with our data that we can’t easily do in Excel.

Step 6: Do Some R Stuff

HR data is typically organized by the individual.

But usually people work as part of a team. Including some basic team-level measures in the individual data can be useful.

In this example, we’re going to calculate the average years of experience for each team and then add that team-level data to our individual-level data.

Click HERE if you want a deeper “how-to” on creating team-level measures from individual data as we do in the example below.

Calculating Team-level Years of Experience

The aggregate function is perfect for our task.

I will cover the aggregate function in a future tutorial.

In brief  though, aggregate works here by breaking down the years data (yrs) into pieces according to the department (dept) and the team number (team).

Then it calculates the mean for each of those pieces.

e <- aggregate(yrs ~ dept + team, data = d, mean)
head(e) # take a look at the data
##   dept team       yrs
## 1 Acct    1  8.638059
## 2   HR    1  9.303629
## 3   IT    1  4.458496
## 4 Acct    2 10.280665
## 5   HR    2  7.263329
## 6   IT    2  3.602855

Everything looks good so far.

Now let’s change the name of the “yrs” column to “team_avg_yrs” so it is labeled correctly.

names(e) # checking names of columns
## [1] "dept" "team" "yrs"
names(e)[3] <- "team_avg_yrs" # change name of column 3

Step 7: Merge the Data

We now have 2 sets of data, the original data “d” that contained the individual level data and the team-level “e” data we created using the aggregate function.

Now we want to add the team-level data to the individual dataset “d” so that each person will have the correct team-level average years of experience for their specific team.

We will use the merge function to bring these two datasets together and create a new data set that combines the right data.

For this to work we need to tell R what column names to match on. In this case the right match depends on the dept and team variables.

f <- merge(d, e, by = c("dept", "team"))
head(f) # looks good!
##   dept team emp       yrs team_avg_yrs
## 1 Acct    1  83 18.238157     8.638059
## 2 Acct    1 108  4.492554     8.638059
## 3 Acct    1  57  5.396578     8.638059
## 4 Acct    1  59 11.335109     8.638059
## 5 Acct    1 154  5.783777     8.638059
## 6 Acct    1 143  7.176714     8.638059

When we check our data we now see that f has all of the original individual data together with the new column we got when we merged the team-level data, matching on dept and team.

Step 8: Save Your New Data

Now we will save out new combined data in “f” to a new csv file that Excel can use.

fn = "your_file_name_here.csv" # include the folder and the .csv part
write.csv(f, fn) #f is the current data, fn is the file

Finally….

Just open up your new csv datafile  in Excel and you are back to working in Excel but with some new team-level data courtesy of R!

Like this post?

Get our FREE Turnover Mini Course!

You’ll get 5 insight-rich daily lessons delivered right to your inbox.

In this series you’ll discover:

  • How to calculate this critical HR metric
  • How turnover can actually be a GOOD thing for your organization
  • How to develop your own LEADING INDICATORS
  • Other insightful workforce metrics to use today

There’s a bunch more too. All free. All digestible. Right to your inbox.

Yes! Sign Me Up!

Comments or Questions?

Add your comments OR just send me an email: john@hranalytics101.com

I would be happy to answer them!

Contact Us

Yes, I would like to receive newsletters from HR Analytics 101.