Statterbrained: Draft with your PALs

Brian Malone

Combining expertise

Suppose you know only three rookie wide receiver “experts”: Ali, Ben, and Carla.  When given the choice between two prospects, Ali predicts the better one 60% of the time; Ben predicts the better one 70% of the time; and Carla predicts the better one 80% of the time.

Now suppose you’re on the clock in your rookie draft choosing between Leonte Carroo and Tyler Boyd.  You text Ali, Ben, and Carla.  Ali and Ben say to draft Carroo, but Carla says to draft Boyd.  What should you do?


Well, it depends.  If the three experts are purely independent — that is, there’s no correlation between when one chooses correctly and when another chooses correctly — you can weigh the two possible scenarios directly.

Think of it this way: imagine Ali, Ben, and Carla are each monkeys pulling marbles from a jar.  Ali has 60 red marbles and 40 blue; Ben 70 red and 30 blue; Carla 80 red and 20 blue.  Each monkey pulls a marble.  I tell you that Ali and Ben have pulled the same color marble, but Carla has pulled the other color.  If you guess which monkey(s) pulled a red marble, you win.

Surprisingly, you should bet on Carla.  The chance that Ali and Ben both pulled blue while Carla pulled red is 0.4*0.3*0.8 = 0.096, while the chance that they both pulled red and Carla pulled blue is 0.6*0.7*0.2 = 0.084.  Carla’s accuracy edge just barely outweighs the pair.  But the balance tips the other way as soon as the gap narrows: if Carla were right only 75% of the time, Ali and Ben would be the better bet, 0.6*0.7*0.25 = 0.105 to 0.4*0.3*0.75 = 0.09.  That’s the intuition behind Condorcet’s jury theorem: several independent, better-than-chance voters, taken together, can outvote a single stronger one.
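Under the independence assumption, the two scenarios can be weighed with a few lines of Python — a minimal sketch:

```python
def split_decision(p_a, p_b, p_c):
    """Given independent accuracies, return the probability of each
    scenario when experts A and B agree and expert C dissents."""
    pair_right = p_a * p_b * (1 - p_c)        # A and B right, C wrong
    lone_right = (1 - p_a) * (1 - p_b) * p_c  # A and B wrong, C right
    return pair_right, lone_right

# Ali 60%, Ben 70%, Carla 80%: 0.084 vs 0.096 — Carla's dissent wins.
print(split_decision(0.6, 0.7, 0.8))
# If Carla were only 75% accurate: 0.105 vs 0.090 — the pair wins.
print(split_decision(0.6, 0.7, 0.75))
```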

But NFL draft experts aren’t monkeys pulling marbles from a jar.  Well, most of them aren’t.  And that stinks, because it means their opinions aren’t independent.  Film people are watching the same games.  Metrics people are analyzing the same combine and production numbers.  And they’re all talking to each other on Twitter.  Their approaches may differ, but because of their shared inputs and interactions, their hit rates won’t be independent.

Still, if two experts’ approaches are different enough, we should be able to combine their rankings into a model that beats each expert on his own — and maybe even beats the NFL draft.  But first we need a catchy name.  How about the “Pairwise Agreement Lends Support” model?  Call it “PALs” for short.

Building a PALs model

I used Jon Moore and Matt Waldman for this version of PALs.  Specifically, I used Moore’s Phenom Index and Waldman’s pre-draft RSP ranks.  Both have data going back at least to 2009, and they are as close to independent as possible: the Phenom Index relies solely on age and college production, while the RSP ranks rely almost entirely on Waldman’s film study.

I collected Phenom ranks, RSP ranks, and NFL draft position for 173 WR prospects from the 2009 to 2014 rookie classes.  If a player didn’t have a listed Phenom rank, Waldman didn’t rank him, or he went undrafted, he was not included.

Using these prospects, I generated hundreds of “prospect pairs” within each season —  for example, Brian Quick and Alshon Jeffery were one pair in 2012.   I then scored only pairs where (1) Player A had an earlier NFL draft position while (2) Player B had a better Phenom rank and RSP rank.  So Quick and Jeffery were scored because both the RSP and the Phenom Index favored Jeffery, while Quick was drafted first.
To score the pairs, I used each player’s maximum fantasy points per game (using 0.5 PPR scoring) in his first three NFL seasons.  Only seasons with at least eight games counted.
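To make the scoring concrete, here’s a sketch of the pair filter in Python.  The rank and scoring values below are placeholders, not the actual data, but the logic follows the two conditions above:

```python
from itertools import combinations

# (name, draft_pick, rsp_rank, phenom_rank, best_ppg) — best_ppg is the
# player's max half-PPR points per game over his first three NFL seasons
# (eight-game minimum).  The numbers here are illustrative only.
prospects_2012 = [
    ("Brian Quick",    33, 12, 15,  4.0),
    ("Alshon Jeffery", 45,  5,  6, 14.0),
]

def scoreable_pairs(prospects):
    """Yield (a, b) pairs where a was drafted earlier (condition 1) but b
    was ranked better by BOTH the RSP and the Phenom Index (condition 2)."""
    for a, b in combinations(prospects, 2):
        if a[1] > b[1]:                   # ensure a is the earlier pick
            a, b = b, a
        if b[2] < a[2] and b[3] < a[3]:   # lower rank = better
            yield a, b

# A pair counts as a win for the experts when the later pick scored more.
experts_right = sum(b[4] > a[4] for a, b in scoreable_pairs(prospects_2012))
```

Applying the same filter within each season produces the full set of scored pairs.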

When both experts disagreed with the draft, the experts were right 59% of the time.  In comparison, when each expert was matched head-to-head with the draft, the experts were right only 44% of the time.[1]  So you’re better off betting on the draft than a single expert, but when the two experts agree, you should go with them.  Here are the season-by-season and total results:

[Table: season-by-season and total results]

That kind of success is very unlikely to have occurred by chance.  To illustrate, I ran 10,000 simulations of 200 coin flips.  (118 wins out of 200 is the experts’ 59% hit rate.)  One side won 118 or more times in only 1.23% of the simulations.
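The simulation is easy to reproduce.  A minimal sketch (the seed, and the either-side reading of “one side won,” are my choices):

```python
import random

random.seed(1)
trials, flips, cutoff = 10_000, 200, 118

extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(flips))
    if max(heads, flips - heads) >= cutoff:   # either side hit 118+
        extreme += 1

print(extreme / trials)   # around a percent, in line with the text above
```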

Creating PALs rankings

We’ve shown that when PALs and the NFL draft disagree, PALs wins more than 50% of the time.  So now we can just plug all the 2016 prospects into PALs to make our rankings … or not.

Consider this trio of players:

[Table: Draft, RSP, and Phenom ranks for Carroo, Thomas, and Boyd]

The PALs model prefers Carroo to Thomas because Carroo has a better RSP rank and Phenom rank.  And the model prefers Thomas to Boyd because Thomas has a better Draft rank and RSP rank.  But the model prefers Boyd to Carroo because Boyd has a better Draft rank and Phenom rank.  In other words, according to PALs, Carroo > Thomas > Boyd > Carroo.  Which is … um … not good.

Actually, this is the classic Condorcet paradox, which underlies Arrow’s impossibility theorem: once three or more “voters” rank three or more candidates, majority preferences can cycle, and mischief ensues.
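The cycle can be verified mechanically.  The three orderings below are the ones over the trio implied by the comparisons above, with each ranking treated as one voter:

```python
from itertools import combinations

# Each "voter" ranks the trio best-to-worst, as implied by the text above.
ballots = {
    "Draft":  ["Thomas", "Boyd", "Carroo"],
    "RSP":    ["Carroo", "Thomas", "Boyd"],
    "Phenom": ["Boyd", "Carroo", "Thomas"],
}

def majority_winner(a, b):
    """Return whichever of a, b is ranked higher on a majority of ballots."""
    votes_a = sum(r.index(a) < r.index(b) for r in ballots.values())
    return a if votes_a * 2 > len(ballots) else b

for a, b in combinations(["Carroo", "Thomas", "Boyd"], 2):
    print(f"{a} vs {b}: {majority_winner(a, b)} wins 2-1")
```

Carroo beats Thomas, Thomas beats Boyd, and Boyd beats Carroo — each by a 2-1 vote.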

As the name “impossibility theorem” suggests, there’s no easy fix.  I came up with two workarounds: the conservative approach and the aggressive approach.

The conservative approach uses NFL draft position as the default ranking.  From there, the worst player is matched up against the second worst player.  Draft rank, RSP rank, and Phenom rank each count as one vote, and whoever the majority prefers “wins.”  The winner is then matched up against the next worst prospect.  Keep going until everyone has been in at least one matchup.
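Here’s a sketch of the conservative pass — my reading of the procedure as a single bottom-up pass of adjacent matchups — using the cyclic trio from above as input:

```python
# Rankings for the cyclic trio, as implied by the text above.
ballots = {
    "Draft":  ["Thomas", "Boyd", "Carroo"],
    "RSP":    ["Carroo", "Thomas", "Boyd"],
    "Phenom": ["Boyd", "Carroo", "Thomas"],
}

def conservative_pals(ballots):
    """Start from NFL draft order; one bottom-up pass in which each adjacent
    pair is re-ordered only if a majority of the three ranks prefers it."""
    ranking = list(ballots["Draft"])              # default: draft order
    for i in range(len(ranking) - 1, 0, -1):
        hi, lo = ranking[i - 1], ranking[i]
        votes_lo = sum(r.index(lo) < r.index(hi) for r in ballots.values())
        if votes_lo * 2 > len(ballots):           # majority upset
            ranking[i - 1], ranking[i] = lo, hi
    return ranking

print(conservative_pals(ballots))   # draft order survives for this trio
```

For the cyclic trio, no single adjacent upset gets a majority, so the draft order stands — which is exactly the deference this approach intends.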

The aggressive approach gives no extra deference to NFL draft position.  Instead, each prospect is matched up against every other prospect in a “round robin” style tournament.  Again, whoever gets a majority of “votes” wins the matchup.  We then rank everyone by the number of matchups won.
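And a sketch of the aggressive approach — a Copeland-style count of round-robin wins.  Note that the cyclic trio leaves every player with exactly one win, so the method needs a tiebreaker; here ties simply fall back to list order:

```python
from itertools import combinations

# Rankings for the cyclic trio, as implied by the text above.
ballots = {
    "Draft":  ["Thomas", "Boyd", "Carroo"],
    "RSP":    ["Carroo", "Thomas", "Boyd"],
    "Phenom": ["Boyd", "Carroo", "Thomas"],
}

def aggressive_pals(ballots):
    """Round robin: every prospect faces every other; rank by number of
    head-to-head majority wins (a Copeland-style count)."""
    players = ballots["Draft"]
    wins = {p: 0 for p in players}
    for a, b in combinations(players, 2):
        votes_a = sum(r.index(a) < r.index(b) for r in ballots.values())
        wins[a if votes_a * 2 > len(ballots) else b] += 1
    return sorted(players, key=lambda p: -wins[p])

print(aggressive_pals(ballots))   # all tied at one win apiece for this trio
```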

Each approach is imperfect — and indeed violates some fundamental notions about rational voting.  But that’s the whole Impossibility Theorem thing.  Anyhow, here are both rankings:

[Table: conservative and aggressive PALs rankings]

Making new PALs

If you fancy yourself a player evaluation expert, you can still use the PALs model.  All you need are two sets of independent rankings and NFL draft position.  If your rankings are film based, I recommend combining them with a metrics-based ranking system — and vice versa.

Neither ranking system has to be better than the NFL draft; they just have to be fairly independent and each better than 50/50 at choosing between prospect pairs.  That said, I can’t promise that any particular pair of rankings will beat the draft.

If you have two sets of rankings and want help combining them using this method, find me on Twitter or in the DLF forums.

[1] I didn’t test Waldman.  Instead, I relied on Rotoviz’s test of Waldman in “Figuring out the NFL draft is hard.”  That’s a great article, and it inspired this one.

