What Is “Machine Learning” and Why Should I Care? (Part 2)
Prediction Is Not Understanding
In our previous discussion, we succinctly defined Machine Learning as getting a computer to learn from data, identify patterns, and make predictions. Naturally, someone will want to use those predictions, but there is a not-so-obvious twist: sometimes the more useful model actually produces LESS accurate predictions. Huh?
The issue hinges on what we mean by “useful”. In the analytics world, modelers often distinguish between “models for prediction” and “models for interpretation”.
A Model for Prediction
Consider your spam email detector, a classic model for prediction. Its central goal is to shuttle spam to the trash, away from your inbox, while leaving your legitimate emails alone. We care only that it accurately separates spam from everything else. Trying to understand, from our limited human perspective, the tangled web of relationships that led to each prediction just isn’t relevant here.
Moreover, it wasn’t really relevant even to those who wrote the machine learning algorithm. All that mattered was that the program learned from the user’s spam/not-spam decisions and quickly started doing its job.
The lesson? If we are modeling simply to maximize our predictive power, we don’t necessarily care about understanding the “why” behind the prediction.
A Model for Interpretation
But now consider a case where we need a model to help us take corrective action, say, reducing voluntary turnover. In this classic human capital analytics problem, our goal is not simply to predict who is likely to leave voluntarily. We also need to identify one or two comprehensible factors that we can isolate to help us decide what actions to take.
For example, an easily interpreted model might tell us that employees living more than 25 miles from the office are more likely to quit. That could suggest offering those employees more work-from-home opportunities, or more flexible hours to reduce their time in traffic.
Or maybe our model indicates that employees tend to leave after three years. This might instead point to a need for more developmental or promotional opportunities. Whatever the identifiable factor(s), the key idea is that we must be able to understand what the model means if we want it to guide our subsequent actions.
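To make the idea concrete, here is a minimal Python sketch of about the simplest interpretable model there is: one readable rule (“commute more than 25 miles”) and the quit rate on each side of it. The `Employee` fields, the helper names, and the eight records are all hypothetical, made up purely for illustration; they are not real data or findings, and a real analysis would use far more records and a proper statistical model (logistic regression, for instance).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Employee:
    # Hypothetical fields, chosen only to mirror the examples in the text
    commute_miles: float
    tenure_years: float
    quit: bool  # True if the employee left voluntarily


def quit_rate(group: List[Employee]) -> float:
    """Share of employees in the group who left voluntarily (0.0 for an empty group)."""
    return sum(e.quit for e in group) / len(group) if group else 0.0


def rule_rates(employees: List[Employee],
               rule: Callable[[Employee], bool]) -> Tuple[float, float]:
    """Quit rate for employees who match a readable rule vs. those who don't."""
    matching = [e for e in employees if rule(e)]
    others = [e for e in employees if not rule(e)]
    return quit_rate(matching), quit_rate(others)


# Made-up toy records for illustration only, NOT real data
employees = [
    Employee(32, 1.5, True),
    Employee(40, 2.0, True),
    Employee(28, 4.0, False),
    Employee(30, 3.5, True),
    Employee(10, 3.2, True),
    Employee(8, 1.0, False),
    Employee(12, 5.0, False),
    Employee(5, 2.5, False),
]

long_commute, short_commute = rule_rates(employees, lambda e: e.commute_miles > 25)
three_plus, under_three = rule_rates(employees, lambda e: e.tenure_years >= 3)

print(f"Commute > 25 mi: {long_commute:.0%} quit vs. {short_commute:.0%} otherwise")
print(f"Tenure >= 3 yrs: {three_plus:.0%} quit vs. {under_three:.0%} otherwise")
```

In these toy numbers the commute rule separates leavers from stayers (75% vs. 25%) while the tenure rule does not (50% vs. 50%), and that kind of plain-language contrast is something a manager can act on. A black-box model might rank individuals more accurately yet offer no such readable split.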
Predictive accuracy will always matter. But simply being handed a list of 50 employees who are highly likely to quit, for reasons we don’t understand, is not helpful if we intend to intervene. To influence the outcome, we are probably better off with a model that produces a potentially less accurate list of 30 names along with one or two identifiable factors signaling why they are likely to leave.
We will touch on the topic of predictive accuracy versus understanding throughout these posts. For now, the take-home idea is simply that the aims of our predictive model must align with the aims of our action. To be sure, there will be times (as with our spam detector) when we do want the most predictive power available. But in the more complex world of human and organizational behavior, actionable guidance from our second-best model might be better after all.
Coming Up Next…
Our next installment: “The Data Scientist Will See You Now.”
Comments or Questions?
Add your comments OR just send me an email: firstname.lastname@example.org
I would be happy to answer them!
Photo credit: ENIAC (http://www.flickr.com/photos/8136496@N05/2196367188) via photopin (http://photopin.com), license: https://creativecommons.org/licenses/by/2.0/
- © 2022 HR Analytics 101