Predictiveness of ATP Rankings – Research Notes
I’m working on some bigger projects right now that might take some time before they see the light of day. In the meantime, here are a couple of things I’ve discovered about ATP rankings and how well they predict the outcome of matches.
1. In my earlier research, I found that in the “buckets” of matches that the favorite is most likely to win, my algorithm is still reasonably accurate. In other words, if the ranking points predict that Nadal, say, has a 98% chance of beating the 140th ranked player, his chances are in fact that high. The algorithm was as accurate on the extreme high end as it was anywhere else on the spectrum.
However, I only included matches in my sample where both players were ranked inside the top 200. I thought that was an innocuous enough cutoff, but I see now why it was misleading. If we limit the sample that way, the most extreme favorites will only be the very top players. In fact, the only players whom my algorithm gives a 95% chance of beating the 200th ranked player are the top 5.
When I expanded the sample to players ranked outside of the top 200, the high end broke down. In other words, in the bucket of matches where the favorite had a 90% or better chance of winning, the favorite didn’t win as often as the algorithm predicted.
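The bucket check described above can be sketched in a few lines of Python. This is only an illustration, not the code behind these results: the function name `calibration_by_bucket`, the bucket edges, and the sample matches are all made up, and the predicted probabilities would come from whatever ranking-based algorithm is being tested.

```python
from bisect import bisect_right

def calibration_by_bucket(matches, edges=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Compare predicted and actual win rates for the favorite, per bucket.

    matches: iterable of (predicted_prob, favorite_won) pairs, where
    predicted_prob is the favorite's predicted chance of winning (>= 0.5).
    edges: lower bounds of the buckets (hypothetical choice); the last
    bucket runs from its lower bound up to 1.0.
    Returns a list of (lower_bound, n_matches, avg_predicted, actual_rate).
    """
    stats = [[0, 0.0, 0] for _ in edges]  # count, sum of predictions, wins
    for prob, favorite_won in matches:
        i = bisect_right(edges, prob) - 1  # index of the bucket holding prob
        if i < 0:
            raise ValueError("favorite's probability is below the first edge")
        stats[i][0] += 1
        stats[i][1] += prob
        stats[i][2] += int(favorite_won)

    report = []
    for lower, (n, prob_sum, wins) in zip(edges, stats):
        if n:  # skip empty buckets
            report.append((lower, n, prob_sum / n, wins / n))
    return report

# Made-up sample: four heavy favorites (one upset) and two near coin flips.
sample = [(0.95, True), (0.95, True), (0.95, False), (0.95, True),
          (0.55, True), (0.55, False)]
for lower, n, predicted, actual in calibration_by_bucket(sample):
    print(f"{lower:.0%}+ bucket: {n} matches, "
          f"predicted {predicted:.3f}, actual {actual:.3f}")
```

If the algorithm is well calibrated, the predicted and actual columns should roughly agree in every bucket. The breakdown described here would show up as an actual rate well below the predicted one in the 90%+ bucket.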
There are several possible explanations for this, none of which accounts for the entire effect, but many of which surely play a part:
- I’m still only looking at ATP-level matches, and if a player outside of the top 200 is in an ATP main draw match, he was not exactly randomly selected. He may be playing at “home” on a wild card, he may be hot after a solid week in qualifying, he is probably on his favorite surface, and his ranking may be misleading due to injury.
- Outside of the top 5 or top 10, players are substantially less consistent. It’s tough to imagine Robin Soderling losing to a qualifier right now, but easy to see, say, Fernando Verdasco or Ivan Ljubicic doing so.
More fundamentally, I suspect that the further down the rankings you go, the less the difference in points really means. Certainly there’s much more movement: once you get outside the top 50, one good showing can easily gain you 10, 20, or more spots. That doesn’t mean that a player is suddenly more skilled, which is the way my algorithm has to treat him.
Controlling for surface, wild card status, and more will help reconcile some of these differences, but ultimately, matches between drastically mismatched (on paper) opponents may have to be treated differently than matches between more closely matched peers.
2. Eliminating some quirks of the ATP ranking system doesn’t break it, at least not for my purposes. In the process of my current projects, I wanted to be able to more easily tweak the parameters of the ranking system, so I started by rebuilding the existing one. But there are a lot of quirks:
- The top 4 or 5 players get a lot of points from the Tour Championships.
- Davis Cup players get points.
- Rankings are limited to a player’s top 18 tournaments, but there are some limitations on what those tournaments must be, resulting in cases where a player gets credited for a poor showing at a Grand Slam, but does not get credited for a better showing (worth more points) at a smaller tournament.
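To make that last quirk concrete, here is a toy Python sketch. It assumes a much-simplified rule in which every "mandatory" event counts no matter how poor the result, with the remaining slots filled by the best other results. The real ATP rules are more detailed, and the function names, the three-event cap, and the point values are all illustrative assumptions.

```python
def points_with_mandatory(results, n_counted=18):
    """Total points when mandatory events must always be counted.

    results: list of (points, is_mandatory) pairs for one player.
    Simplifying assumption: every mandatory result counts, and the best
    non-mandatory results fill whatever slots remain.
    """
    mandatory = [pts for pts, is_mandatory in results if is_mandatory]
    optional = sorted((pts for pts, is_mandatory in results if not is_mandatory),
                      reverse=True)
    slots_left = max(n_counted - len(mandatory), 0)
    return sum(mandatory) + sum(optional[:slots_left])

def points_best_n(results, n_counted=18):
    """Total points when the best n results count, whatever the event."""
    best = sorted((pts for pts, _ in results), reverse=True)
    return sum(best[:n_counted])

# Made-up season with a three-event cap to keep the example small:
# two early losses at mandatory events, three good weeks at smaller ones.
season = [(10, True), (45, True), (250, False), (150, False), (90, False)]
print(points_with_mandatory(season, n_counted=3))  # 10 + 45 + 250 = 305
print(points_best_n(season, n_counted=3))          # 250 + 150 + 90 = 490
```

In this toy season the two weak mandatory results crowd out the 150- and 90-point showings, which is the kind of case described in the last bullet.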
All of these quirks have their purposes, given that the ATP’s priorities are built around keeping fans interested and ensuring that top players focus on the most important events. But they are a pain in the butt to incorporate in an on-the-fly system, so I just ignored them.
And as it turns out, they are not affecting my results in any meaningful way. I’ve re-run a couple of earlier projects with my “improvised” rankings, and nothing is changing by more than a percent or two. Occasionally the effect is strong on a certain player (I think the improvised system bumps Juan Carlos Ferrero from #29 to inside the top 15 at 2010 year-end), but in the aggregate, it makes no difference.
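One simple way to spot players like that is to rank everyone under both point totals and look at the difference in position. The Python sketch below assumes both systems cover the same set of players. The helper names and the point totals are hypothetical, not real ranking data.

```python
def rank_positions(points_by_player):
    """Map each player to a 1-based rank, highest points first.

    Ties are broken alphabetically by name, which is an arbitrary choice.
    """
    ordered = sorted(points_by_player,
                     key=lambda name: (-points_by_player[name], name))
    return {name: pos for pos, name in enumerate(ordered, start=1)}

def rank_shifts(official_points, improvised_points):
    """Positions gained (positive) or lost (negative) under the improvised system."""
    official = rank_positions(official_points)
    improvised = rank_positions(improvised_points)
    shared = official.keys() & improvised.keys()
    return {name: official[name] - improvised[name] for name in shared}

# Made-up totals for three placeholder players.
official = {"Player A": 1000, "Player B": 800, "Player C": 600}
improvised = {"Player A": 900, "Player B": 500, "Player C": 700}
shifts = rank_shifts(official, improvised)
for name in sorted(shifts):
    print(name, shifts[name])
```

Sorting the shifts by absolute size would surface the handful of players whose ranking depends heavily on the quirks, even when the aggregate results barely move.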