Why Don’t Combine Metrics Predict Success?

Jeffrey Levy


As we enter the fantasy football off-season, the dynasty community shifts its attention to the upcoming NFL Combine at the end of February and the draft in April. Those particularly into scouting and the numbers side of things may find themselves rather awkwardly spending the regular season looking forward to this time. All the football, none of the weekly disappointment! The running backs on your roster might even go a week without injuring their knees and hamstrings.

While the Combine provides us with a wealth of new data, anyone who studies player metrics will inevitably face a troubling fact: they don’t seem very good at predicting success. This is at odds with the common-sense idea that physical skills obviously matter for being a good football player (Christine Michael notwithstanding).

Given that, I’m going to try to explain how it’s possible that physical metrics DO matter even though the data doesn’t really show it. I’ll need to use a bit of math and statistics to do so, but I’ll try to keep it simple and minimize the jargon.

Models and Omitted Variable Bias


Most data analysis involves creating an underlying model: an assumption about how we think the data works, which we then use to estimate results. To use a common example, suppose you create a simple graph with a metric on the horizontal axis (e.g. SPARQ, BAS3, or a single measure like 40 time) and a measure of NFL success on the vertical axis (e.g. total fantasy points, ADP, number of years in the top ten at their position), then plot a best-fit line. The model you're implicitly assuming is:

success = a + b*metric

Those of you who recall high school algebra might recognize it as the slope-intercept form of a line, often written as y = mx + b or y = a + bx. In our version, b is the slope of the best-fit line and a is where it crosses the y-axis.

Or to put the math in simple English: “If I change my metric by one, how much does success change by?”
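As a rough illustration (the numbers here are invented, not real player data), fitting that implicit model in Python might look like this:

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical data, invented purely for illustration: one combine metric
# and one success measure per player.
metric = np.array([4.35, 4.42, 4.48, 4.51, 4.60, 4.66])       # e.g. 40 times
success = np.array([210.0, 145.0, 180.0, 90.0, 160.0, 75.0])  # e.g. fantasy points

fit = linregress(metric, success)
print(fit.intercept, fit.slope)  # the "a" and "b" in success = a + b*metric
print(fit.rvalue ** 2)           # R-squared: how well the line actually fits
```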

If you were to look at such a plot, you would almost certainly notice that the best fit line isn’t really a very good fit. It may even slope the wrong way, suggesting that an increase in your metric leads to a decrease in NFL success. A big part of the problem you’re running into is omitted variable bias.

Omitted variable bias comes about when we create a model like the one above, but are missing loads of explanatory variables. NFL success isn’t just a function of metrics; it’s a function of metrics, and skill, and work ethic, and off-field problems, and injuries, and scheme, and luck (with a lower-case l, not the Andrew variety), and Luck (the Andrew variety, aka quality teammates), and being in the NFC East, and many more. Many of them depend on each other, and many of them are very hard, or even impossible, to measure. To get around this messy and missing data, we’re going to create a simplified simulation and fill it with some randomly generated players.

The Simulation

To create our simulated world, we'll follow a series of steps. First, we'll decide the exact formula for success; second, we'll create a batch of players by giving them random attributes; and third, we'll use the first two steps to calculate each player's level of success.

Each of our players will have eight attributes, which we’ll call “a” through “h”. For simplicity there are no other attributes, no error, and nothing is unmeasured. All of those attributes together completely determine success, according to this formula I made up:

success = .3*a + .2*b - .1*c + .7*d + .4*e - .2*f + .8*g + .4*h

The numbers in there are called coefficients, but really they’re just telling us how much success changes when we change an attribute by one. Hopefully a quick look at it will show you that if you raise “e” by 1, success rises by .4. Raise “f” by 1, success falls by .2.

Now that our simulated world has a structure, we have to create players. To do so I'm going to generate some random values from a normal distribution with a mean of 0 and a standard deviation of 1. In simple terms, most players will be near average on each attribute, with fewer and fewer players being very good or very bad. This type of number can be described in shorthand as N(0,1), which I'll use below, and can be generated in Excel (for those of you following along at home) with the formula =NORM.INV(RAND(),0,1). Our world has eight attributes; I'm going to create the first six, "a" through "f", this way.

The last two attributes, “g” and “h”, we will create as interactions of the other attributes. Imagine, for example, that “a” is physical metrics and “b” is work ethic. Both of those obviously matter, on their own, for NFL success. However, what also matters is the combination of the two, or how they feed back into each other. For example, if you’re physically gifted and have a high work ethic, that matters to success above and beyond each one alone. However, if you have a bad work ethic then being physically gifted actually makes things even worse. Such a player may have never had to work hard for success in their life, because their physical skills always made it easy. This is called an interaction term, which here we’ll simply model as a * b = g and c * d = h. You can probably imagine all sorts of real-world interaction terms.

We then plug the values for those attributes into our success formula to determine a value for how each player does. An example player, then, will look like this:

[Image: an example simulated player's attribute values "a" through "h" and the resulting success score]
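If you'd rather follow along in Python than Excel, here's a minimal sketch of the whole player-creation step; the seed and the numpy generator are my choices for reproducibility, not part of the original setup:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary fixed seed so the draw is reproducible
n = 50                          # fifty simulated players

# Attributes "a" through "f" are independent draws from N(0, 1).
a, b, c, d, e, f = rng.standard_normal((6, n))

# "g" and "h" are the interaction terms described above.
g = a * b
h = c * d

# The made-up success formula; nothing is unmeasured and there is no error term.
success = 0.3*a + 0.2*b - 0.1*c + 0.7*d + 0.4*e - 0.2*f + 0.8*g + 0.4*h
```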

Keep in mind as we’re doing this that my numbers and structure are mostly arbitrary; there are countless ways this could be done that will change the results. My goal is to use a relatively simple design to illustrate omitted variable bias.

Now we'll create fifty such players and then pretend some simulated scientists are studying this world to figure out how to predict NFL player success.

Simulated Scientists

For our first study, our scientists don't know the coefficients that were used to create the world, because all they see for each player are the attributes "a" through "h" and the resulting success. However, they DO know that success is a function of only those eight things, and they have measurements for all of them.

We're going to use a simple method called a regression to try to find the relationship between the attribute variables on the right and the success variable on the left. I won't go into the details of what this entails or what it assumes, but suffice it to say that with our simulated data it would exactly return all of the coefficients we used to create the players. The p-values (measures of the significance of our results, where lower means more confidently estimated) would all be effectively zero, and the R-squared would be 1 (meaning the entire variance in success is captured; 0 would mean none of it is). In simple English: if you accurately model the entirety of the way the simulated world works, and accurately measure all the determining variables "a" through "h", you can perfectly estimate success.
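Continuing the sketch above, that full regression might look like this (statsmodels is my assumption; any OLS routine would do):

```python
import numpy as np
import statsmodels.api as sm

# Stack all eight attributes into a design matrix and add an intercept column.
X = sm.add_constant(np.column_stack([a, b, c, d, e, f, g, h]))
full_fit = sm.OLS(success, X).fit()

print(full_fit.params)    # recovers ~[0, .3, .2, -.1, .7, .4, -.2, .8, .4]
print(full_fit.rsquared)  # 1.0 (to floating-point precision): no error term
```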

But that’s not very realistic, so now let’s handicap our scientists. Much like us when we make a simple plot with success on one axis and physical metrics on the other, our scientists will only have measurements for success and for attribute “a”. They either don’t know that “b” through “h” matter, or they don’t have a way to measure them. Can our simulated scientists still figure out how “a” matters to success? Let’s take a look at a plot for the data I created:

[Plot: attribute "a" against success for the fifty simulated players]

Well that looks terrible. What might our scientists conclude from this about the relationship between attribute “a” and success? The obvious answer is that attribute “a” is essentially irrelevant. In fact, if they run a regression using this model they would find the p-value to be insignificant and the R-squared to be very, very close to zero. But we know the underlying truth of this simulated world because we created it, and we know that attribute “a” is a determining factor for success.
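In code, the handicapped study is just a one-variable regression. Continuing the same sketch, and keeping in mind that the exact numbers depend on the random draw:

```python
from scipy.stats import linregress

# Regress success on attribute "a" alone, ignoring everything else.
poor_fit = linregress(a, success)

print(poor_fit.slope)      # a noisy estimate of the true 0.3
print(poor_fit.pvalue)     # likely insignificant with only fifty players
print(poor_fit.rvalue**2)  # R-squared close to zero
```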

Our simulated scientists are suffering from omitted variable bias, just like we do in the real world. Because they don’t know all the other factors that go into success, they can’t come very close to estimating the effect of the one they do know.

The Moral of the Story

My little simulation was created specifically to illustrate omitted variable bias. You can try changing it around, and you’ll find very different results. You may even find a relationship between success and a single attribute, if the simulation you create is very simple or that attribute determines a large enough part of success relative to the others. If attribute “a” determines 90% of success and attributes “b” through “h” determine the remaining 10%, then we can very likely get close to the truth even without the others. The degree to which a plot may be deceptive will generally be proportional to what’s missing from the explanation.

This is important, of course, because in the real world the number of things that determine an NFL player's success is enormous. Not only do we not know what all of them are, many of the ones we suspect to matter either aren't measurable or aren't comparable between players. Physical metrics are easy on both counts, which is likely why we talk about them constantly. That is, we can measure bench press, clearly say that 20 reps are better than 15 reps, and say exactly how much better. But what about work ethic, or the drive to win, or the strength of a player's ACL? Even if you could estimate them somehow (e.g. "this player's work ethic is high"), how would you put numbers to them and compare them (e.g. "this player's work ethic is 20, and this other player's is 15")?

The fact that we can't measure everything that goes into success means that unraveling the relationship between physical metrics and success is a difficult task. I think it's fairly obvious that being faster, more agile, or stronger is better, and given the attention the Combine receives, I think most would agree. We just have to be very careful when we're attempting to quantify that advantage.

As with our simulated scientists, simple two-variable graphs are one of the most common ways we deceive ourselves in this sort of analysis. If you take away only one thing from this article, let it be this: the next time you see a graph like that, your first reaction should be suspicion, followed by the question, "what is missing from this?"

