Tutorial: How to Move HR Data From Excel to R and Back
In today’s tutorial I will show you the basic steps.
What You Will Learn
- How to open an Excel file in R
- How to do a team-level calculation using that data
- How to merge the new team-level data with your original data
- How to save your new combined file for use with Excel
Preliminaries: The Excel File Format
Excel files are typically saved in the .xlsx format.
In the HR analytics though, it is generally best to deal with .csv files instead. The “.csv” stands for comma separated values and tells data tools how to separate columns.
When later applying the lessons from this tutorial to your own work, you will want save your Excel data as a csv file first.
Step 1: Get the Tutorial Data
Download the Tutorial Data (Easy)
I have created some data for this exercise in the form of a csv file.
…Or Create the Tutorial Data Yourself (More Work)
If you prefer to create the data yourself in R instead of downloading it, you can run the following:
# Only run this if you want to make your own data # instead of using the csv I provided in the link. # Create thing basic data set.seed(42) # set random seed for replication dept <- c(rep("HR", 20), rep("IT", 100), rep("Acct", 40)) team <- c(rep(1:5, each = 4), rep(1:10, each = 10), rep(1:5, each = 8)) emp <- runif(length(dept)) # create an employee number yrs <- c(rnorm(n = 20, mean = 8, sd = 3), rnorm(n = 100, mean = 4, sd = 2), rnorm(n = 40, mean = 10, sd = 4)) # create years at company # remove any negative values for years (this is simulated data afterall) yrs <- ifelse(yrs < 0, 0, yrs) # Put all of that data together d <- data.frame(emp, dept, yrs, team) # mixing the order so it's messy just like real HR data ind <- order(d$emp) d <- d[ind,] d$emp <- seq_along(d$emp) # assigning employee numbers # dropping row names rownames(d) <- c() # write out as CSV in the current working directory write.csv(d,"./hr101_excel_back.csv", row.names = F) # note: try "getwd()" in the command line without the quotes to see where it was saved if you can't find it.
Step 2: Let’s Import!
Importing Option 1 (Easiest): The RStudio Import Button
Using the interface in RStudio is the easiest way what I usually choose.
- Click “Import Dataset” from the Environment tab in your RStudio interface.
- Select “From Local File”
- Select the options as needed. I keep the defaults but usually de-select the “Strings as factors” option.
Importing Option 2: Importing the Excel File with R code
Sometimes you will want to write the code out yourself instead of using the Import button. This is especially valuable if you are using a set of scripts that you run automatically.
We will use the read.csv function for this.
# General format # your_dataframe_name <- read.csv("yourfilename", stringsAsFactors=FALSE) hr101_excel_back <- read.csv("./hr101_excel_back.csv")
Be sure that your filename includes both the location and the name itself.
Step 4: Shorten the Data Name
You should now have a dataframe object called “hr101_excel_back” with all of the data in it.
But that’s a long name. Let’s just copy it into a new dataframe “d” to make it easier.
Shorter object names really do end up saving tons time.
d <- hr101_excel_back
Step 5: Examine Your File
Let’s make sure the data looks like it should using “head(d)”
# View(d) Use this if you want to see all of the data head(d) # just show the first 6 rows
## emp dept yrs team ## 1 1 IT 5.6732381 8 ## 2 2 IT 4.8075498 10 ## 3 3 IT 6.5753505 6 ## 4 4 IT 0.8108237 9 ## 5 5 IT 0.6778018 2 ## 6 6 IT 2.9746995 2
Everything looks good.
Now let’s do something with our data that we can’t easily do in Excel.
Step 6: Do Some R Stuff
HR data is typically organized by the individual.
But usually people work as part of a team. Including some basic team-level measures in the individual data can be useful.
In this example, we’re going to calculate the average years of experience for each team and then add that team-level data to our individual-level data.
Click HERE if you want a deeper “how-to” on creating team-level measures from individual data as we do in the example below.
Calculating Team-level Years of Experience
The aggregate function is perfect for our task.
I will cover the aggregate function in a future tutorial.
In brief though, aggregate works here by breaking down the years data (yrs) into pieces according to the department (dept) and the team number (team).
Then it calculates the mean for each of those pieces.
e <- aggregate(yrs ~ dept + team, data = d, mean) head(e) # take a look at the data
## dept team yrs ## 1 Acct 1 8.638059 ## 2 HR 1 9.303629 ## 3 IT 1 4.458496 ## 4 Acct 2 10.280665 ## 5 HR 2 7.263329 ## 6 IT 2 3.602855
Everything looks good so far.
Now let’s change the name of the “yrs” column to “team_avg_yrs” so it is labeled correctly.
names(e) # checking names of columns
##  "dept" "team" "yrs"
names(e) <- "team_avg_yrs" # change name of column 3
Step 7: Merge the Data
We now have 2 sets of data, the original data “d” that contained the individual level data and the team-level “e” data we created using the aggregate function.
Now we want to add the team-level data to the individual dataset “d” so that each person will have the correct team-level average years of experience for their specific team.
We will use the merge function to bring these two datasets together and create a new data set that combines the right data.
For this to work we need to tell R what column names to match on. In this case the right match depends on the dept and team variables.
f <- merge(d, e, by = c("dept", "team")) head(f) # looks good!
## dept team emp yrs team_avg_yrs ## 1 Acct 1 83 18.238157 8.638059 ## 2 Acct 1 108 4.492554 8.638059 ## 3 Acct 1 57 5.396578 8.638059 ## 4 Acct 1 59 11.335109 8.638059 ## 5 Acct 1 154 5.783777 8.638059 ## 6 Acct 1 143 7.176714 8.638059
When we check our data we now see that f has all of the original individual data together with the new column we got when we merged the team-level data, matching on dept and team.
Step 8: Save Your New Data
Now we will save out new combined data in “f” to a new csv file that Excel can use.
fn = "your_file_name_here.csv" # include the folder and the .csv part write.csv(f, fn) #f is the current data, fn is the file
Just open up your new csv datafile in Excel and you are back to working in Excel but with some new team-level data courtesy of R!
Like this post?
Get our FREE Turnover Mini Course!
You’ll get 5 insight-rich daily lessons delivered right to your inbox.
In this series you’ll discover:
- How to calculate this critical HR metric
- How turnover can actually be a GOOD thing for your organization
- How to develop your own LEADING INDICATORS
- Other insightful workforce metrics to use today
There’s a bunch more too. All free. All digestible. Right to your inbox.
Yes! Sign Me Up!
Comments or Questions?
Add your comments OR just send me an email: firstname.lastname@example.org
I would be happy to answer them!
- © 2023 HR Analytics 101