Reproducible HR Analytics, Part 1
One of the most common challenges in HR analytics (and really analytics in general) is understanding and reconstructing work you did before.
In the moment of creation, juggling three different files, dropping these columns, cleaning up that text, and pushing it all together to create the tables and figures you need, it’s easy to forget one thing: three months from now you won’t have a clue what you did.
The problem is that your analytics research was not reproducible.
In this post (Part 1 of 2), I will briefly introduce you to the idea of reproducible research and show you how to get started on your reproducibility journey using R Markdown.
In Part 2, I’ll cover a few additional techniques to help you accelerate your Reproducible HR Analytics process.
Reproducible HR Analytics Research
Roger Peng at Johns Hopkins (and his wonderful course at Coursera) provides a tidy definition:
Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them
We might not care about scientific publications in HR Analytics, but we often face the same essential problems:
- Replicating and verifying our results
- Enabling others and ourselves to build on the work
As part of replicating our results, there is also an essential need to code up the analysis steps once and then reuse that code later on.
If we get in the habit of creating reproducible HR Analytics research, we can quickly verify and reproduce our results and also dramatically increase our productivity.
Moreover, as I will show quickly below, we can also summarize and share our findings for greater organizational impact.
R Markdown: Your New Best Friend
What exactly is R Markdown? I’ve put some great introductory resources for you below, but in simplest terms, R Markdown is a document format that lets you integrate analysis and text to create documents that show your results and capture all of the precise steps you took to create those results.
Imagine never needing to name a file “final_final” again.
Imagine never wondering where your data came from.
Imagine returning to a challenging analysis from 7 months ago and having all of your work right there in front of you along with all of your notes and summary explanations.
Imagine writing code to solve a problem one time and then being able to use it again and again later to reproduce the same analysis with updated data.
How Do I Get Started?
I’ve put a few resources below, but you can just follow these 4 simple steps to get started.
- Open RStudio
  - If you don’t have RStudio installed yet, just go here and select the FREE RStudio Desktop version.
- Click the Install button in RStudio’s Packages pane and install the rmarkdown package
- Create your first new R Markdown file by going to File –> New File –> R Markdown…, then select the “Document” option
- Select the “Knit” button at the top, choose the kind of output you want, and let it rip!
The file will be created in the current working directory. If you can’t find it, run `getwd()` in the console and check that location.
Open it up and gaze upon ye mighty works!
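If you prefer the console to the menus, the same steps can be sketched in a few lines of R. This is only a sketch under some assumptions: `my_first_report.Rmd` is a hypothetical file name (use whatever you saved yours as), and the rendering step is skipped unless the rmarkdown package, pandoc (bundled with RStudio), and that file are all available.

```r
# Install rmarkdown once (uncomment on first use; needs an internet connection)
# install.packages("rmarkdown")

# Hypothetical file name: replace with the name of your own R Markdown file
rmd_file <- "my_first_report.Rmd"

# A relative file name is resolved against the working directory,
# so this is the folder to check if you cannot find your output
output_dir <- getwd()
print(output_dir)

# Console equivalent of pressing the "Knit" button; by default the
# output is written to the same folder as the .Rmd file
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available() &&
  file.exists(rmd_file)
if (can_render) {
  rmarkdown::render(rmd_file, output_format = "html_document")
}

# List any R Markdown files and HTML output in the working directory
list.files(output_dir, pattern = "\\.(Rmd|html)$")
```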
Cool…But What Am I Looking At?
The R Markdown file (“Rmd” for short) can be a bit confusing at first, so I’ve put a quick and dirty summary view of it below.
Really though, there are just three parts:
- The “YAML” header
  - This just tells R how to handle this file, what title to include, etc. There are a million options available that you can google but for now just stick with changing the title, author, info etc. to get a feel for it.
- The code chunks
  - These contain all of the R code that you want to execute
  - The little `r` inside the curly brackets just means “Run this code using R”; all the rest of the R coding rules and power apply here…it’s just R code.
- The text chunks
  - Write up whatever you need to explain your steps or describe your results.
  - The format is based on standard markdown formatting which is very powerful but simple.
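To make the three parts concrete, here is a minimal sketch that writes a tiny Rmd file from R and prints it back. The title, author, file name, and the use of R’s built-in `cars` data are all placeholders I made up for illustration; the default file RStudio creates for you will look a little different.

```r
# Build the triple-backtick chunk delimiter programmatically so this
# example can itself sit inside a fenced code block
fence <- strrep("`", 3)

rmd_lines <- c(
  # 1. The YAML header: title, author, and output format
  "---",
  "title: \"My First HR Report\"",
  "author: \"Your Name\"",
  "output: html_document",
  "---",
  "",
  # 2. A text chunk, written in plain markdown
  "## Summary",
  "",
  "This report uses the built-in cars data as a placeholder.",
  "",
  # 3. A code chunk: the r in the curly brackets means "run this using R"
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Write the file to a temporary folder (hypothetical file name)
rmd_path <- file.path(tempdir(), "three_parts.Rmd")
writeLines(rmd_lines, rmd_path)

# Print the file to see the three parts together
cat(readLines(rmd_path), sep = "\n")
```

Opening the resulting file in RStudio and pressing “Knit” should give you a short document with a heading, one sentence, and the output of `summary(cars)`.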
Your Next Learning Steps
Don’t worry about understanding everything. You don’t need to.
You just need to know enough to get started. Google what you need to when you need to, but no sooner!
As a starting point, try the steps below and then follow up with the elements that are particularly important for you and your work.
- Start with the default Rmd file, change the text, and reknit…What happened? Did it do what you expected? Why or why not?
- Check out the RMarkdown cheatsheet and specifically look at the Markdown formatting section (section 3) showing how to write the syntax to produce different text output styles…then start experimenting yourself.
  - Create a bulleted list
  - Create a numbered list
  - Create different headers
- Analyze your own data inside one of the coding chunks
  - Import your data
  - Create a table
  - Make a basic plot
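For that last step, the body of a code chunk might look something like the sketch below. The data here is made up purely for illustration (the column names `employee_id`, `department`, and `tenure_years` are my own assumptions), and it is written to a temporary CSV only so the import step has something to read; swap in the path to your own file.

```r
# Made-up illustrative data; replace this step with your own file
example_data <- data.frame(
  employee_id  = 1:6,
  department   = c("Sales", "Sales", "HR", "IT", "IT", "IT"),
  tenure_years = c(1.5, 4.0, 2.5, 0.5, 6.0, 3.0)
)
csv_path <- file.path(tempdir(), "example_hr_data.csv")
write.csv(example_data, csv_path, row.names = FALSE)

# Import your data
hr_data <- read.csv(csv_path)

# Create a table: headcount by department
headcount <- table(hr_data$department)
print(headcount)

# Make a basic plot of the same counts
barplot(headcount, main = "Headcount by department", ylab = "Employees")
```

Base R functions are used here so nothing extra needs to be installed; once this works, you can knit the file and the table and plot will appear in your output document.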
- Introduction
  - You can find a nice little intro to R Markdown here: https://rmarkdown.rstudio.com/
- Don’t Be Scared
  - https://bookdown.org/yihui/rmarkdown/
- Reproducible Research Course on Coursera (totally Free!): https://www.coursera.org/learn/reproducible-research
- RMarkdown cheatsheet: https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf