What does “Statistically Significant” Mean? What the HR Pro Needs to Know
Sooner or later, you will be presenting data and someone will ask you whether the difference you see between two groups is “significant” or “statistically significant”.
Most HR professionals finding themselves in this position will fall into one of two groups:
- They’ve heard the term before, but don’t know what it means and did not run a test for significance.
- They’ve heard the term before and don’t know what it means but they ran a test for significance anyway because they had a vague sense they should.
Whether you find yourself genuinely pinched in this situation or merely out of your comfort zone, it’s not a fun place to be.
In today’s tutorial, we’re going to provide a very basic, intuitive understanding of what “statistical significance” means so you can present your findings with reasonable confidence. This tutorial is for those with little or no statistical background. Deeper treatments of the underlying statistical concepts are available on numerous other websites.
“Significant” DOES NOT Mean “Important”:
The first thing to know is that “significant” in the statistical sense does not mean “important”.
If I could choose only one thing for you to remember from this post, this would be it.
So What DOES “Significant” Mean?
Let’s ground our explanation in an example.
Suppose we have two factories, one in New Albany, Indiana (100 employees) and the other in Cincinnati, Ohio (100 employees). The VP of HR wants to know whether the experience level of those in New Albany differs from that of those in Cincinnati.
We’ll start with data.
# making some data
set.seed(42)
new_alb <- rnorm(100, 7.1, 3)
new_alb <- ifelse(new_alb < 0, 0, new_alb) # eliminating any negative values
cin <- rnorm(100, 6.9, 3)
cin <- ifelse(cin < 0, 0, cin) # eliminating any negatives
exper <- data.frame(new_alb, cin)
To answer this question given the data, you first decide to simply compare the average number of years of experience between the two factories. New Albany has a mean experience level of 7.23, Cincinnati 6.64. This gives us a difference of .59 years.
##     new_alb            cin
##  Min.   : 0.000   Min.   : 0.826
##  1st Qu.: 5.250   1st Qu.: 5.126
##  Median : 7.369   Median : 6.692
##  Mean   : 7.229   Mean   : 6.638
##  3rd Qu.: 9.085   3rd Qu.: 8.285
##  Max.   :13.960   Max.   :15.006
mean(exper$new_alb) - mean(exper$cin)
## [1] 0.5911222
So in one sense, your job is done. You compared the means and they are not the same!
The Means Aren’t the Same BUT…
But you know this is not really satisfactory.
After all, was there any chance that the mean years of experience for the two factories would ACTUALLY be the same?
Intuitively, you know this won’t be the case because there are different people in each factory. Some people have just been hired, some have been there for 15 years, and most others are in between. In short, we have differences.
Even if you took all 200 people and just randomly assigned them to one group or another, the average for those two groups would not be EXACTLY the same.
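That thought experiment can be sketched in a few lines of R. This is a minimal illustration that regenerates the simulated data so it runs on its own; the names `pooled`, `shuffled`, `group_a`, `group_b`, and `random_diff` are made up for this example and are not part of the original analysis.

```r
# Recreate the simulated experience data used in this post
set.seed(42)
new_alb <- rnorm(100, 7.1, 3)
new_alb <- ifelse(new_alb < 0, 0, new_alb)
cin <- rnorm(100, 6.9, 3)
cin <- ifelse(cin < 0, 0, cin)

# Pool all 200 employees and deal them into two groups at random
pooled <- c(new_alb, cin)
shuffled <- sample(pooled)        # random reordering of the 200 values
group_a <- shuffled[1:100]
group_b <- shuffled[101:200]

# The two random groups still differ a little, purely by chance
random_diff <- mean(group_a) - mean(group_b)
print(random_diff)
```

Rerunning the `sample()` step gives a slightly different difference each time, which is exactly the chance variation we need to account for.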
The Question You Are Really Asking
What you really want to know then is the following: How likely am I to observe a value as extreme as the one I got if there are no differences between the groups?
Here, that means we are asking “How likely am I to get a difference as extreme as .59 if I assume there are no differences between the two factories?”
This assumption that any difference between the two groups is just chance is referred to as the Null Hypothesis.
The Alternative Hypothesis is that the observed difference is not just chance (i.e. there is a “real” difference between the two).
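One way to build intuition for this question is to simulate the Null Hypothesis directly: shuffle the factory labels many times and count how often chance alone produces a difference at least as large as the one observed. The sketch below uses a shuffle-based (permutation) approach, which is not the t-test this post relies on, and names such as `n_shuffles`, `shuffled_diffs`, and `share_extreme` are assumptions made for illustration.

```r
# Recreate the simulated experience data used in this post
set.seed(42)
new_alb <- rnorm(100, 7.1, 3)
new_alb <- ifelse(new_alb < 0, 0, new_alb)
cin <- rnorm(100, 6.9, 3)
cin <- ifelse(cin < 0, 0, cin)

observed_diff <- mean(new_alb) - mean(cin)
pooled <- c(new_alb, cin)

# If the null hypothesis is true, factory labels are arbitrary,
# so reassign 100 people to "New Albany" at random, many times over
n_shuffles <- 5000
shuffled_diffs <- replicate(n_shuffles, {
  idx <- sample(length(pooled), length(new_alb))
  mean(pooled[idx]) - mean(pooled[-idx])
})

# Share of shuffles whose difference is at least as extreme
# as the observed one, in either direction
share_extreme <- mean(abs(shuffled_diffs) >= abs(observed_diff))
print(share_extreme)
```

The share printed at the end plays the same role as the p-value introduced in the next section: a large share means the observed difference is the kind of thing chance produces routinely.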
Testing the Difference
With that setup, we now need to figure out whether the observed difference was chance or not.
We’ll answer this question by comparing these two groups using something statisticians call a t-test.
I won’t go into a deep dive on this because I promised a basic explanation, but t-tests are common and you can run them in Excel, R, or almost any analysis software.
The t-test will return something called a p-value. The p-value represents the probability of getting a difference at least as extreme as the one you observed IF the null hypothesis is true (i.e. if there is no “real” difference between the groups). A p-value will always be between 0 and 1.
It works like this.
If we get a high p-value it means the chances of getting the observed difference (.59 years) given no real difference between the groups were pretty good. In casual terms it says “Hey, this outcome was pretty likely if we assume these groups are really the same. There is nothing special here”.
When we get a high p-value, we say the difference we observed is “non-significant”.
If we get a low p-value it means the chances of getting a score as extreme as the observed difference (.59 years) given no “real” difference between the groups were really low.
When we get a low p-value, we say that observed difference is “significant”.
In essence, a small p-value says “If you assume these two groups are the same, then the chances of getting a difference like the one you got are tiny. Therefore, your assumption that these groups are the same is probably wrong.”
By convention, anything below .05 is called “significant” and anything above .05 is “non-significant”.
I feel compelled to note that many statisticians think this is not ultimately the right way to handle this decision process (and I more or less agree with them).
However, right or wrong, this is general convention and at least understanding the convention is a key early step in honing your HR Analytics skills.
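The .05 convention amounts to a simple rule, sketched below. The helper `label_result` and its `alpha` argument are hypothetical names for this illustration, not functions from R or any package.

```r
# Label p-values using the conventional .05 cutoff
# (label_result is a made-up helper for this example)
label_result <- function(p_value, alpha = 0.05) {
  ifelse(p_value < alpha, "significant", "non-significant")
}

# Try it on a low and a high p-value
print(label_result(c(0.01, 0.20)))
```

Keeping the cutoff as an argument makes it explicit that .05 is a convention you chose rather than a law of nature.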
Interpreting Our Significance Test Result
Let’s run the t-test and see what we get:
# comparing two separate groups of people so paired = FALSE
t.test(exper$new_alb, exper$cin, paired = FALSE)
##
##  Welch Two Sample t-test
##
## data:  exper$new_alb and exper$cin
## t = 1.45, df = 195.43, p-value = 0.1487
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2128877  1.3951321
## sample estimates:
## mean of x mean of y
##  7.228671  6.637549
There are lots of things here but we’ll just focus on that p-value today.
The result of .15 says “You have roughly a 15% chance of getting a difference at least as extreme as .59 years given your assumption that the groups are the same”. Because .15 is above the conventional .05 cutoff, the difference is non-significant. Statisticians are conservative by nature when it comes to hypotheses so given these results, we will not reject the Null Hypothesis; we’ll stick with our going assumption that there is no “real” difference between the two groups.
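If you would rather pull the p-value out of the result than read it off the printout, the object returned by `t.test()` stores it in `p.value`. The sketch below regenerates the data so it runs on its own; the variable names `result`, `p_val`, `diff_means`, and `verdict` are illustrative choices.

```r
# Recreate the simulated experience data used in this post
set.seed(42)
new_alb <- rnorm(100, 7.1, 3)
new_alb <- ifelse(new_alb < 0, 0, new_alb)
cin <- rnorm(100, 6.9, 3)
cin <- ifelse(cin < 0, 0, cin)

# Store the test result instead of just printing it
result <- t.test(new_alb, cin, paired = FALSE)

# Pull out the pieces we care about
p_val <- result$p.value
diff_means <- unname(result$estimate[1] - result$estimate[2])

# Apply the conventional .05 cutoff
verdict <- if (p_val < 0.05) "significant" else "non-significant"
print(diff_means)
print(p_val)
print(verdict)
```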
I’ll be short and sweet.
- “Significant” DOES NOT mean “important”.
- A significant difference is a difference that would be unlikely to occur if we assume that any observed differences are just chance.
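To see the first point in action, here is a small simulation with deliberately hypothetical numbers: two very large groups (200,000 people each, far bigger than the factories above) whose true average experience differs by only a tenth of a year. The names `site_a`, `site_b`, `big_test`, and `diff_years` are assumptions made for this illustration.

```r
# Hypothetical example: huge groups, tiny true difference
set.seed(1)
n <- 200000
site_a <- rnorm(n, mean = 7.1, sd = 3)
site_b <- rnorm(n, mean = 7.0, sd = 3)

big_test <- t.test(site_a, site_b)
diff_years <- mean(site_a) - mean(site_b)

# The difference is about a tenth of a year, yet the p-value is tiny
print(diff_years)
print(big_test$p.value)
```

With this much data the test flags the difference as significant, even though a gap of roughly five weeks of experience is unlikely to matter for any HR decision.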
If you have additional questions or want more information on this topic, email me at email@example.com or simply post a comment.
- © 2022 HR Analytics 101