Burst, Agility, Strength and Speed in Standard Deviations
Author’s Note: This article talks about some of the theory and data behind the BAS3 measure of athleticism. I’ve tried to keep math and cumbersome statistical terms out of it as much as possible, but I’m afraid some was inevitable. It might facilitate reading if you first download the Excel spreadsheet and explore the calculations and instructions a bit.
Also, a quick special thanks for some early input from George Kritikos and Brian Malone at DLF, Telperion from the DLF Forum for his football data warehouse, TheFFGhost for his help with the final presentation of the data and Marty Jackson for help with some of the math and with figuring out the name.
And finally, I encourage anyone to use the BAS3 measure in their own research, or to modify it. I just ask that you please cite the author and the Dynasty League Football site when you do so.
The question of player athleticism is an important and controversial one. It seems obvious athleticism should somehow correlate with success in the NFL, and both dynasty players and NFL teams themselves closely watch the NFL Combine and Pro Days. Uncovering the relationship between athleticism and success is difficult, because many other things also matter that aren’t accounted for and are difficult or impossible to measure. A relatively unathletic player may have excellent technique, a top-notch work ethic and high coachability that more than offset any physical limitations. Conversely, an athletic superstar may struggle with any or all of those things, fall to nagging injuries, or even lack the opportunity to show what they’re capable of because they’re buried behind an entrenched starter.
My goal here is not to dig into the question of how athleticism relates to performance, but to instead back up a step and think about how we measure athleticism.
SPARQ, Speed Score, and Others
There are a number of existing measures that attempt to summarize player athleticism into a single number, perhaps the most well-known of which is SPARQ. This was developed by NIKE for prep athletes on their way to college. Unfortunately, the formula for how they determine SPARQ scores is secret – we know what measures they use (height, weight, 40 yard dash, kneeling power ball throw, shuttle and vertical jump), but the end number emerges from a proverbial black box. A player’s measurements go in and a final number comes out.
There have been several attempts to reverse-engineer SPARQ scores based on the known inputs and apply them to players headed to the NFL, which do a fair job of approximating the final results. There are several problems with this, however:
- We’re assuming this formula applies the same to high school athletes and college athletes.
- We have to substitute bench press for kneeling power ball throw and we have to drop some measures we often have that NIKE doesn’t use (3 cone, broad jump). If possible, you never want to end up discarding relevant data when you already have it.
- We’re assuming SPARQ is an optimal way to measure athleticism. Since we don’t know how SPARQ is calculated, this assumption is based almost entirely on whatever credibility it gains from the NIKE name. This underscores one of the main attributes of an “athleticism” measure – it’s not a real value we can measure directly. Rather, it’s one we have to infer from things we can measure by using good reasoning and methods.
A New Measure of Athleticism: BAS3
[am4show have=’g1;’ guest_error=’sub_message’ user_error=’sub_message’ ]
In developing a new measure of athleticism, I had two primary goals: One, create a logical, consistent and transparent measure that anyone can access, and two, create a measure that does not directly relate unit changes in dissimilar events. A good example of why this second goal matters is the common version of “Speed Score” (SS) that uses the equation: SS = (200 * Weight) / 40Time^4. I’ll spare you the math, but suffice to say that if you set the Speed Score to any constant value, this equation ties every value of 40Time to exactly one corresponding value for Weight. Change 40Time by any amount, and the equation tells you exactly how much Weight has to change to keep the same Speed Score (a faster time has to be offset by a lower weight).
That’s a very specific and highly dubious thing to propose without lots of supporting theory. Now, imagine making an assumption like this to relate all eight metrics we gain just from the NFL Combine. What’s the tradeoff between broad jump and bench press? Shuttle and height? All models involve some assumptions, but to avoid this particular mess of prickly assumption-making, the core of BAS3 revolves around using standard deviations from the mean.
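Returning to the Speed Score equation for a moment, the tradeoff it implies can be made concrete with a minimal Python sketch. The function names and the sample weight and times are hypothetical, chosen only for illustration; the formula itself is the one quoted above.

```python
def speed_score(weight_lbs, forty_time_s):
    """Common Speed Score as quoted in the article: 200 * Weight / 40Time^4."""
    return 200.0 * weight_lbs / forty_time_s ** 4


def weight_for_same_score(score, forty_time_s):
    """Weight required to hold a given Speed Score at a different 40 time."""
    return score * forty_time_s ** 4 / 200.0


# Hypothetical player: 220 pounds, 4.50-second 40-yard dash.
base = speed_score(220, 4.50)

# Holding the score fixed, each 40 time maps to exactly one weight.
for forty_time in (4.40, 4.50, 4.60):
    print(forty_time, round(weight_for_same_score(base, forty_time), 1))
```

Every row printed has the same Speed Score, which is exactly the kind of fixed exchange rate between two dissimilar measures that BAS3 tries to avoid assuming.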
Again sparing the math, a “standard deviation” is a way to measure how dispersed a set of data is, expressed in the units of measure themselves. That is to say, the standard deviation of all players’ 40-yard dash times is expressed in seconds, the standard deviation of bench press is measured in reps, and so on. Once the standard deviation for a given measure is calculated, you can simply find out how far an individual player is from the average, then divide that difference by the standard deviation – the result is the number of standard deviations they are above or below the mean.
Note: At this point some of you may be recalling some statistics and thinking of the normal distribution (or “bell curve”). This is a type of distribution where each standard deviation corresponds to a percentage of the population, which tells us, for example, that about 95% of a population falls within two standard deviations of the mean. It’s possible all of the measures we are looking at here come from normal distributions, but that assumption is not necessary to the result. In short, don’t assume those corresponding percentages apply to these values.
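For readers who prefer code to prose, here is a rough Python sketch of that calculation. It is not the spreadsheet's exact method: the sample numbers are made up, the use of the population standard deviation is an assumption, and flipping the sign for timed events (so that faster counts as "above" the mean) is also an assumption of mine.

```python
from statistics import mean, pstdev


def sds_from_mean(values, higher_is_better=True):
    """Return each value's distance from the mean, in standard deviations."""
    avg = mean(values)
    sd = pstdev(values)  # population SD: an assumption, not confirmed here
    # Assumption: for timed events a lower value is better, so flip the sign.
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - avg) / sd for v in values]


# Made-up sample values, not real Combine results.
bench_reps = [12, 18, 20, 22, 28]
forty_times = [4.40, 4.50, 4.55, 4.60, 4.70]

bench_sds = sds_from_mean(bench_reps)
forty_sds = sds_from_mean(forty_times, higher_is_better=False)

print([round(x, 2) for x in bench_sds])
print([round(x, 2) for x in forty_sds])
```

The outputs are unitless, so a bench press result and a 40 time can sit side by side even though one started in reps and the other in seconds.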
By using standard deviations from the mean, we can compare dissimilar measures without needing to consider messy questions like how a bench press score relates to a 3-cone score. What we are comparing instead is how far a player is from average in each category, expressed in a standardized unit. So, a player who is zero standard deviations from the mean in both the 40-yard dash and bench press is exactly average in each metric, and THAT is something we can relate across events.
One fairly small side effect of this is that adding new players to our sample will very slightly change the scores for all players in the sample. So when we include data from the 2016 class, the score of every player from 1999-2015 will be tweaked slightly. This results from the fact that averages and standard deviations are calculated based on the entire population’s values. Change those values, change the results. However, unless two players were extremely close to each other in the end score AND the incoming group of players differs strongly from the previous players, all ordinal rankings will remain the same.
For ease of interpretation, the final BAS3 figures are normalized so the least athletic player in the sample will always have a value of 0 and the most athletic will have a value of 100. They are also rounded to one decimal place, even though this makes a few players look equal when they are really separated by a small amount. The level of precision needed to distinctly rank all players is five decimal places, which makes the numbers cumbersome to read. Since the number of players affected is very small, it wasn’t much of a sacrifice to make in the name of readability.
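One simple way to get a 0-to-100 scale like this is a linear (min-max) rescaling. The sketch below assumes that approach, which may differ from what the spreadsheet actually does, and the raw scores in it are invented purely to show the rounding effect.

```python
def rescale_0_to_100(raw_scores):
    """Linearly rescale so the lowest score is 0 and the highest is 100."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        raise ValueError("need at least two distinct scores to rescale")
    # Round to one decimal place, as the published figures are.
    return [round(100.0 * (s - lo) / (hi - lo), 1) for s in raw_scores]


# Invented raw scores; the middle two differ only by a tiny amount.
raw = [2.0, 3.5, 3.50004, 9.0]
scaled = rescale_0_to_100(raw)
print(scaled)
```

The two middle players end up with the same rounded value even though their raw scores differ, which is the readability tradeoff described above.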
The Inputs
In order to be relevant to as many players as possible, I restricted myself to only the inputs that are widely available, such as those officially reported by the Combine and commonly recorded at Pro Days. This means measures such as arm length, hand size and the 10 and 20 yard dash times are discarded. As mentioned before, we never want to discard data if we can avoid it. In this case, however, including them resulted in dropping too many observations. It’s certainly possible (with better data) to work on including them in a future version of BAS3.
One other element I am not using is height. I find height to be implausible as a measure of athleticism. Imagine two hypothetical players:
Does the top player really seem more athletic? I find that hard to argue. Additionally, height can be a physical advantage or disadvantage. RBs tend to be shorter, WRs tend to be taller. But some receivers are short (Steve Smith Sr, 5’9”) and some running backs are tall (Adrian Peterson, 6’1”). I feel like height is better considered in addition to an athleticism score, not as part of it.
Weight, on the other hand, is used just like the other metrics. If you imagined those two players from above, but dropped height and made one of them weigh 180 pounds and the other 240 while keeping all other measures the same, I think it is fairly obvious that the same scores for the heavier player suggest a higher level of athleticism than they do for the lighter player. Take, for example, JJ Watt (6’5”, 290 pounds) and Justin Hardy (5’10”, 192 pounds), who both had 4.21 Shuttle times. It strikes me as self-evident that Watt’s time suggests far greater athleticism due to his extra 98 pounds.
One of the assumptions I make is that, broadly speaking, being truly elite in one category and average in all others is “more athletic” than being slightly above average in all categories. Or in a simple example, being four standard deviations above the mean in one category and exactly at the mean in the other six categories results in a higher end score than being one standard deviation above the mean in four categories and exactly at the mean in the remaining three. Mathematically, this is accomplished through squaring the figures. This strikes me as a good assumption for two reasons:
- The specifics are subject to the exact distribution of the population, but it’s reasonable to assume that being two standard deviations from the mean is more than twice as hard as being one standard deviation from the mean.
- A player who is truly elite in one category should have an opportunity to exploit that category to their advantage. That is, a player who is extremely fast can play the game in a way that lets them use that speed, thus giving them an advantage over a player who is merely above average at everything.
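The simple example above can be checked with a few lines of Python. This is only a sketch: it assumes the squared figures are added together, and it uses only values at or above the mean, since how the spreadsheet treats below-average values is not described here.

```python
def sum_of_squares(sds_above_mean):
    """Square each category's standard deviations and add them up."""
    return sum(sd ** 2 for sd in sds_above_mean)


# Seven categories each, in standard deviations above the mean.
specialist = [4, 0, 0, 0, 0, 0, 0]  # elite in one, average elsewhere
generalist = [1, 1, 1, 1, 0, 0, 0]  # slightly above average in four

print(sum_of_squares(specialist))  # 16
print(sum_of_squares(generalist))  # 4
```

Without squaring, both players would total four standard deviations and look identical. Squaring is what lets the specialist come out ahead.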
A problem arises from this emphasis on elite over above average, because two pairs of our measures are highly related – broad and vertical jump, and 3 cone and shuttle. The first two are measures of explosion and lower-body power, while the other two are measures of agility. If a player is particularly agile, then they likely have a very good value in both 3 cone and shuttle. If we square both of those values in our calculations, we’re essentially double-emphasizing the same measure. Therefore, I combine 3 cone and shuttle into a single “agility” score by averaging a player’s standard deviations from the mean in each. I do the same for broad and vertical jump into a “burst” category. This move is supported both by the descriptions of events on the NFL Combine website, and by plotting the categories against each other and looking at how related they are.
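To see why the averaging matters, consider a hypothetical agile player. The numbers below are invented, and the helper name is my own; the sketch only illustrates the averaging step described above and how it interacts with squaring.

```python
def combine_related(sds_first, sds_second):
    """Average two related events into a single category score."""
    return (sds_first + sds_second) / 2.0


# Invented standard deviations above the mean for one player.
player = {"three_cone": 2.0, "shuttle": 1.8, "vertical": 0.4, "broad": 0.6}

agility = combine_related(player["three_cone"], player["shuttle"])
burst = combine_related(player["vertical"], player["broad"])

# Squaring both agility events separately rewards the same trait twice.
squared_separately = player["three_cone"] ** 2 + player["shuttle"] ** 2
squared_combined = agility ** 2

print(round(agility, 2), round(burst, 2))
print(round(squared_separately, 2), round(squared_combined, 2))
```

Squared separately, the two agility events contribute roughly twice as much as the single combined agility score does, which is the double emphasis the combined categories are meant to remove.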
Rather than go through the rest of the calculations here, the exact steps (and some more detailed instructions) are available in the accompanying Excel spreadsheet.
Without Further Ado
If you’ve stuck with me this far, thank you! Now that I’ve kept you waiting through a glance into the muck (if you skipped through it all to the end, I forgive you), here’s a snapshot of the top 15 most athletic players of 2015 and 2014 according to BAS3. Note that my sample does not include linemen.
I’ll explore some of the data and its implications more in a future article, but a glance at this snapshot suggests it is quite in line with the general narrative about who is “athletic.”
Vic Beasley is an example of a player who is above average in every category, and by enough that he still beats out guys who are particularly elite in a smaller number of categories, such as Chris Conley. Beasley’s weakest category is his 40-yard dash, where he is still .41 standard deviations above the mean, while his strongest is Bench, where he is almost 3 standard deviations above it. Conley, on the other hand, is actually below average in Weight, Bench and Shuttle and almost exactly average in 3 cone. However, he is 3.2 standard deviations above the mean in Vert and Broad, and 1.66 in 40Yard. Nikita Whitlock is an even more extreme example of a truly elite athlete in one category. His 43 Bench is an unheard-of 4.4 standard deviations above the mean, the highest value for any player in any category since 1999.
Of course, he’s also an example of how athleticism doesn’t directly translate to NFL success, as he was an undrafted free agent who has done short stints with at least three teams (mainly on practice squads) in the span of less than two years. Hopefully, with the help of BAS3, we can do more exploration into who succeeds and who doesn’t at the NFL level.
[/am4show]

Why would a higher weight suggest a player is more athletic? It seems that if all criteria are equal then they are athletically equal; one is just larger than the other. Bigger is better is the old saying, but unless you are driving railroad spikes, using too big of a hammer will just get you tired faster. Bigger may allow different uses but the athleticism is still the same. The weight could actually be a detriment in that it could affect stamina, perhaps? Perhaps not? Unless it is a tug of war, in which case the heavier player may be considered automatically more valuable.
The other thing I see is if you rely too heavily on the metric-based information you will get fired from your football scouting job and have a mediocre to terrible fantasy football team. I admire the work that you put forth and I always check out the various metric-based studies done by DLF and the message board users, but I have yet to see one metric that can come close to being used as a fail-safe over just watching players play the sport and deciding who is better at that sport.
Also, I want to add that one of my goals in making the data fully available was that others can modify it if they wish. So if you think it would be better without weight figured in, just recalculate it! I tried dozens of different specifications before settling on this one.
Just let us know if you find anything interesting.
The case for including weight is quite easy to make. A wildly simplified way to think of athleticism is as a measure of the functional force someone can generate. Running fast, jumping high, lifting weights, throwing balls: these are all applications of force.
Force = mass x acceleration. The greater the mass, the greater the force required to achieve equal acceleration. Imagine you run the 40-yard dash in 5 seconds. Now imagine you rest up and run it again tomorrow, this time wearing a 30-pound weighted vest. Your time is going to be longer than 5 seconds, right? Because the amount of force you can generate did not go up, but with the greater mass, the amount of acceleration that force creates has gone down.
The case for not including height is also relatively easy to make. Let’s say you’re 5’6″ tall. You run a 40-yard dash. How far do your feet travel? In theory, 40 yards. Now, let’s say you drink a magical potion that makes you grow a foot overnight. You run the 40-yard dash again tomorrow at 6’6″. How far do your feet travel? Again, it’s just 40 yards. Assuming your weight did not increase, it takes the exact same amount of energy to propel you through the run, too.
Height will probably confound some of these measures of athleticism a bit. Longer arms are a detriment in the bench press. Longer legs are an asset in the broad jump. I don’t know if height-adjusting either of those scores would be a net improvement or not. But it definitely functions entirely differently from weight, and I think it’s justified to include the latter but not the former.
A much more detailed defense than I offered in the article. Thanks Adam!
Great article Jeff!… I have just a couple questions.
1) Do you plan on weighting any of these metrics based on position played/weight of the player? For example, you can imagine that a LB will quite possibly put up better rep metrics than a CB… so it would be more impressive for the CB to be 1 STD.P above average for the bench than it would be for a LB, assuming all positions are being compared on a single grand scale.
2) How do we get around the idea of an off day for small standard deviation metrics? If, say, in the 40 time, a particular player happens to run a 0.1 s slower time… that can have a significant impact on their BAS3 score, especially if the off day translates to other areas of running/jumping. This can be seen in the comparison of Rowe and Shaw from this year… they look relatively even the way things stood, but given a different day those BAS3 numbers could have been up to 4 points different with just a 1% bump or drop in metric performance. This could very well mean that all the CBs on this year’s list fall within the standard error of the test, and could be considered possibly equal. Thoughts?
This is pretty similar to what I’ve been doing the last few years with WRs. I tried applying a similar type of measurement to other positions, but I’ve found the correlation to be pretty low for positions like RB and QB since there are so many non-quantifiable traits which lead to success at those positions. For example, it doesn’t matter how fast a RB is if they lack the instincts and vision to feel and see the holes opening up. Unfortunately it is really tough to measure someone’s vision on the football field with a number. For that reason, I’ve gone back to just looking at WRs. It also allows me to tailor what measurements I use. I’ve found measurements like the bench press have very minimal impact on a wide receiver’s success, so I’ve removed it from my calculations.
One thing to consider instead of just weight is looking at something which looks at it as more of a ratio. BMI is far from an exact measurement, but it at least takes height and weight into account instead of just weight.
Here’s the link to part one of the latest wide receiver article for reference: http://dynastyleaguefootball.com/2015/04/12/wide-receiver-combine-analysis-part-one/
I actually considered BMI in some earlier versions of BAS3, but decided height wasn’t part of athleticism, so went back to just weight.
As for the correlations to success, that’s actually not surprising to me. I would be shocked if simple athleticism correlated with success – too many other omitted variables. But I’m working on another way to consider it as a piece of the puzzle, and my hope is that others will too.