Lefties in Tennis: Doubles and Prize Money
A few days ago, I offered some numbers on the prevalence of lefties in men’s tennis. It turned out that, in the top 300 of the ATP singles rankings, lefties don’t show up much more than you would expect them to.
A reasonable follow-up question would be: What about doubles?
Being left-handed may not make one a better doubles player, but being left-handed does have the potential to make one part of a better doubles team. Case in point: Five of the eight doubles teams that earned a spot in the ATP Tour Finals last year were a righty/lefty duo, including the top two teams in the year-end rankings.
And indeed, it turns out that left-handers are more prevalent in the top ranks of men’s doubles. As we’ve seen, in November 2010, five of the sixteen players (31 percent) included in the ATP Tour Finals were left-handed.
The most current ATP doubles rankings tell a similar, if less extreme, story. Of the top 100 ranked doubles players, 18 are left-handed. That’s considerably higher than the 12 of 100 at the top of the singles rankings. (Both top 100s include Rafael Nadal, who plays left-handed but was born right-hand dominant. These calculations consider him left-handed.)
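How surprising is 18 out of 100? A quick binomial calculation gives a feel for it. This is a sketch under loud assumptions: a 10 percent base rate for left-handedness (published estimates range from about 8 to 15 percent) and independent top-100 spots, which rankings certainly aren’t. `prob_at_least` is a helper name I made up.

```python
from math import comb

# Assumption: roughly 10% of people are left-handed, and each top-100 spot is
# an independent draw from that population (a simplification for illustration).
BASE_RATE = 0.10


def prob_at_least(k: int, n: int, p: float) -> float:
    """Exact binomial tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts from the text: 18 lefties in the doubles top 100, 12 in the singles top 100.
doubles_tail = prob_at_least(18, 100, BASE_RATE)
singles_tail = prob_at_least(12, 100, BASE_RATE)
print(f"P(at least 18 of 100): {doubles_tail:.3f}")
print(f"P(at least 12 of 100): {singles_tail:.3f}")
```

Under those assumptions, 18 or more lefties would turn up by chance only about 1 percent of the time, while 12 or more is unremarkable (about 30 percent). Raising the base rate toward 15 percent makes the doubles figure far less striking.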
Prize money
The majority of players participate in both singles and doubles, at least on occasion. To determine some general level of “success” for ATP players, we could look at total prize money. This weights singles much more heavily, but it has the advantage of being a reasonable measure of a sustainable career in professional tennis.
So, do left-handers have a better chance at making money in tennis than we would expect, given their prevalence in the general population?
It doesn’t look like there is any substantial advantage. Of the top 100 money-winners, 13 are left-handed, including Nadal. Four of those lefties are doubles specialists, out of only 13 doubles specialists in the top 100 overall.
If we go further, we find an additional five lefties from 101 to 150, and six more from 151 to 200.
Left-handers do seem to have a better chance than right-handers of reaching a certain level of success in men’s doubles. Beyond that, there is little in the way of a handedness advantage. Whatever the advantages of playing tennis left-handed and the challenges of facing a lefty, they don’t translate into an overwhelming number of left-handers at the top of the professional game, or a disproportionate level of success for left-handed professionals.
The Prevalence of Lefties in Men’s Tennis
Many people, in and out of tennis, believe that left-handed players have an advantage of some kind. The perceived advantage may just be one of unfamiliarity; a junior or club-level player doesn’t see many lefties, so he is unaccustomed to the angles and spins that come off a left-hander’s racquet.
In any event, we need some hard data. Are lefties overrepresented in the top ranks of professional men’s tennis?
The short answer: Not really.
There’s no universal consensus on the prevalence of left-hand dominance in the general population. You’ll frequently see the figure 10 percent, or a range between 8 and 15 percent. How does that compare to the number of lefties in the ATP rankings?
Here is a breakdown of lefties in the ATP rankings of 7 Feb 2011:
- Top 10: 2 (20%)
- Top 20: 3 (15%)
- Top 50: 6 (12%)
- Top 100: 12 (12%)
- Top 200: 29 (14.5%)
- Top 300: 40 (13.3%)
An interesting case is Rafael Nadal, who was born right-hand dominant, but was taught to play left-handed. So if we are looking at the success rates of left-hand dominant players, we could subtract one from each of the raw totals above. Of course, there may be other players who were taught to play with their non-dominant hand.
(An odder case is that of Guillermo Olaso, who is listed on the ATP site as ambidextrous. Other resources show him as right-handed. I saw him play a couple of years ago and don’t remember anything unique about his game, so I left him in the righty category.)
The advantage, if any
A perspective that I’ve heard (I have no idea from where) is that lefties can exploit their opponents’ unfamiliarity early in their careers, giving them a foundation of success that earns them more matches, more support, more coaching, and the like. The left-handedness doesn’t make them better players, exactly, but it sets off other things that improve their play.
Depending on how long that advantage persists, we might expect to see a “bulge” in the number of lefties somewhere in the rankings. There’s a bit of a blip in the 101-200 range, and there’s a bigger one if we narrow our focus to 151-200, where 10 of the 50 men play left-handed. Perhaps unfamiliarity helps them get to some level, but when they start meeting opponents at higher levels, the unfamiliarity advantage is not enough.
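The band-by-band figures come from differencing the cumulative counts in the 7 Feb 2011 list. A small sketch of that subtraction; `band_counts` is my own name for it.

```python
def band_counts(cumulative):
    """Turn cumulative (rank cutoff, lefties at or above cutoff) pairs into bands.

    Returns (first_rank, last_rank, lefties_in_band, share_of_band) tuples.
    """
    bands = []
    prev_rank, prev_count = 0, 0
    for rank, count in cumulative:
        lefties = count - prev_count
        bands.append((prev_rank + 1, rank, lefties, lefties / (rank - prev_rank)))
        prev_rank, prev_count = rank, count
    return bands


# Cumulative counts from the 7 Feb 2011 list.
feb_2011 = [(10, 2), (20, 3), (50, 6), (100, 12), (200, 29), (300, 40)]
for first, last, lefties, share in band_counts(feb_2011):
    print(f"{first:>3}-{last:<3} {lefties:>2} lefties ({share:.0%})")
```

This gives 17 lefties (17 percent) in ranks 101-200 and 11 (11 percent) in 201-300. The 151-200 figure can’t be recovered this way, since the list has no cutoff at 150.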
The blip between 101 and 200 might not mean anything; perhaps if we went further down the rankings, or even into the national or junior rankings, we’d see something more pronounced. Alas, it was hard enough to get handedness for the top 300 players, so any larger project will have to wait for another day.
Quantifying the Bias of an ATP Draw
ATP tennis draws are biased in favor of top-ranked players. If you’re ranked in the top four, you won’t face another top-16 player until the round of 16, you won’t face a top-8 player until the quarters, and you won’t face a fellow top-4 player until the semis. If you’re unseeded (out of the top 32 in slams, 16 or 8 in smaller tourneys), you’ll probably have to face a top-16 player just to get into the round of 16 … and you might draw a top-4 player in the first round.
This is the way it is, and it’s not going to change anytime soon. Since it’s the nature of the beast, we should better understand the effects of this system.
In short: The more highly ranked you are, the easier it probably will be to win the first few matches of a tournament. The further you go in the draw, the more points you earn, and the higher your ranking stays. A higher pre-tournament ranking–regardless of actual skill!–increases your odds of a better performance.
Thus, the rankings lag behind changes in skill level, creating a bias against rapidly-improving youngsters and players returning from long absences.
An example
Let’s play around with this a bit. Before the Australian Open started, I published the odds that each player in the main draw would reach any given round. From that, we can calculate “expected points,” which gives us a way of directly comparing each player’s chances, given his skill level and his draw.
For instance, Nadal’s expected points were 1056, Federer’s were 857, and 11th-seed Jurgen Melzer’s were 227.
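“Expected points” weights each possible exit round by its probability. A minimal sketch, assuming the 2011 Grand Slam allocation of 10, 45, 90, 180, 360, 720, 1,200, and 2,000 points for exiting in the first round through winning the title (the post doesn’t list the schedule, so treat it as an assumption). Fed Nadal’s row from the 2011 Australian Open simulation table, it lands at about 1,056.5, matching the figure above to within rounding of the table’s percentages.

```python
# Assumed 2011 Grand Slam allocation: points for exiting in each round
# (first-round loss through title). Not taken from the post itself.
SLAM_POINTS = [10, 45, 90, 180, 360, 720, 1200, 2000]


def expected_points(reach_probs, points=SLAM_POINTS):
    """Expected ranking points from the chances of reaching each later round.

    reach_probs: P(reach R64), P(reach R32), ..., P(win title), in order.
    A player who reaches round r but not round r+1 earns points[r].
    """
    reach = [1.0] + list(reach_probs)  # everyone in the draw plays the first round
    if len(reach) != len(points):
        raise ValueError("need one reach probability per round after the first")
    total = 0.0
    for r, pts in enumerate(points):
        p_next = reach[r + 1] if r + 1 < len(reach) else 0.0
        total += (reach[r] - p_next) * pts  # probability of exiting exactly here
    return total


# Nadal's row from the 2011 Australian Open simulation table (R64 through W).
nadal = [0.969, 0.927, 0.870, 0.781, 0.661, 0.496, 0.345]
print(expected_points(nadal))
```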
What happens if we swap two players’ positions in the draw? Let’s try #4 and the top-ranked non-seed. Going into the tournament, #4 was Robin Soderling, and Philipp Kohlschreiber was #34, unseeded. In the draw as it actually happened, Soderling’s expected points were 515 and Kohlschreiber’s were 56, thanks in part to a 2nd round matchup against Tomas Berdych.
If we exchange Soderling’s and Kohlschreiber’s draw positions and run the simulation, we get very different results. Soderling’s expected points are 353 (down 31 percent) and Kohlschreiber’s expected points improve to 103 (up 84 percent).
Randomization
The Soderling/Kohlschreiber swap may be an outlier. We can do better, and besides, I don’t want to type “Kohlschreiber” anymore.
Let’s try a new simulation. For each run, we’ll randomize the draw positions, so Nadal has an equal chance of drawing Marcos Daniel, Roger Federer, or anybody else in the first round.
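The randomized-draw experiment can be sketched as a Monte Carlo loop that optionally shuffles the draw before each run. Assumptions: the draw is a bracket-ordered list of (name, points) pairs, and `win_prob` is a hypothetical points-ratio model, not the author’s algorithm; `simulated_points` and `points_by_wins` are illustrative names.

```python
import random


def win_prob(pts_a, pts_b):
    """Hypothetical stand-in for the post's ranking-points model: a plain points ratio."""
    return pts_a / (pts_a + pts_b)


def play_draw(draw, rng):
    """Play one single-elimination draw of (name, points) pairs in bracket order.

    Returns {name: matches won}.
    """
    wins = {name: 0 for name, _ in draw}
    alive = list(draw)
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[0::2], alive[1::2]):
            winner = a if rng.random() < win_prob(a[1], b[1]) else b
            wins[winner[0]] += 1
            next_round.append(winner)
        alive = next_round
    return wins


def simulated_points(draw, points_by_wins, runs=20000, randomize=False, seed=1):
    """Average points per player over many simulated draws.

    points_by_wins[k] is the points for winning exactly k matches.
    With randomize=True the draw is shuffled before every run.
    """
    rng = random.Random(seed)
    totals = {name: 0.0 for name, _ in draw}
    for _ in range(runs):
        order = list(draw)
        if randomize:
            rng.shuffle(order)  # any player can land in any slot
        for name, won in play_draw(order, rng).items():
            totals[name] += points_by_wins[won]
    return {name: total / runs for name, total in totals.items()}


# Made-up four-player draw: top seed A opposite the weakest player D.
draw = [("A", 1000), ("D", 100), ("C", 300), ("B", 500)]
title_only = [0, 0, 1]  # count only the title, so the result is a win probability
seeded = simulated_points(draw, title_only)
shuffled = simulated_points(draw, title_only, randomize=True)
print(f"A seeded: {seeded['A']:.3f}, A shuffled: {shuffled['A']:.3f}")
```

In the made-up draw, the top seed’s title chance is about 64 percent when he opens against the weakest player and about 57 percent when the draw is shuffled, the same direction of bias the full Australian Open simulation shows for high seeds.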
The differences in the results are substantial. Nearly 75 percent of players see their expected points change by more than 10 percent. Of the 128 players, 39 see their expected points decrease with randomization; those players are disproportionately seeds, and the seeds among them are disproportionately high seeds.
Two types of players seem to benefit from the status quo:
- High seeds. They are guaranteed non-seeded opponents for at least two rounds, and lower-seeded opponents for a round or two after that.
- Marginal players who get lucky draws. The player whom the Aussie Open draw benefited most was wild card Benoit Paire. He was one of the weakest players in the field, but in the first round, he drew Flavio Cipolla, one of the few competitors who was even weaker.
Of the 39 players who do better under the original draw, 19 are seeds and 9 are WCs or qualifiers, mostly in situations like Paire’s. That leaves only 11 middle-of-the-pack, unseeded players who weren’t disadvantaged by the draw.
If the draw had been randomized, half of the field (mostly unseeded players) would have seen their expected points increase by more than 10 percent. 52 players would have jumped by 20 percent or more, 37 by more than 30 percent, and 22 by more than 50 percent.
Season-long bias
To some extent, the bias is mitigated over the course of a season. Players like Kohlschreiber are disadvantaged by the draw so long as their ranking stays in the unseeded-but-good 33-50 range for slams, but in smaller tournaments, such a player is often seeded.
And, of course, because players enter 20-30 tournaments a year, the luck of the draw evens out for many of them. Paire got lucky by drawing Cipolla in Melbourne, but he could just as easily have found himself pitted against a top seed.
As is intuitively obvious, draws are biased in favor of the top players, and that is one thing that isn’t mitigated by a year’s worth of tournaments. The top 12 seeds all did better in the actual draw simulation than in the randomized simulation, and I expect that would be true for the vast majority of tournaments.
If some players are consistently “winning” through draw bias, there must be losers. As we’ve seen, lower-ranked players can win big or lose big in a draw, but it stands to reason that, over the course of the season, they lose a little bit, at least until they overcome that disadvantage and become top-ranked players themselves.
Home Court Advantage Run Amok
This week in Johannesburg is the only time of year that an ATP tour-level event goes to South Africa. Accordingly, all the South Africans take part, the wild cards are generally awarded to South Africans, and a disproportionate number of entries in the qualifying draw are South Africans. And they performed unexpectedly well.
Thus, of the main draw of 32, 6 players were South Africans. They included 4th seed Kevin Anderson (ranked 59th), wild cards Fritz Wolmarans (261), Rik de Voest (183), and Izak van der Merwe (170), along with qualifiers Raven Klaasen (307) and Nikala Scholtz (662).
It isn’t uncommon to see someone ranked as low as Anderson win a 250-level tournament; for example, another local player, 84th-ranked Croatian Ivan Dodig, took the title in Zagreb this week. But rarely do home favorites make such comprehensive work of a draw.
Anderson won the tournament–though he’s not all that pertinent to our theme, since he outranked every one of his opponents. All five other South Africans exceeded expectations.
The qualifiers, Klaasen and Scholtz, didn’t win a main draw match, but neither would have been expected to come through qualifying. Scholtz had to beat Pierre-Ludovic Duclos and Thiago Alves, ranked 443rd and 178th, respectively. Klaasen had to get past Rajeev Ram, currently ranked 188th but ranked inside the top 80 only a year ago.
Of the wild cards, only Wolmarans failed to reach the quarters. He did win his first round match against Igor Sijsling, who outranks him by 130 places.
Rik de Voest defeated Stefano Galvani (ranked 321) and 8th seed Michal Przysiezny (81), one of his best ATP-level results. And van der Merwe made it to the semifinals, beating Stephane Robert, Dustin Brown, and Simon Greul, all players who have spent substantial time in the top 100.
It is tempting to wonder if some locations lend themselves to a greater home court advantage. South Africa, in particular, is one of the more far-flung spots on the ATP map.
But it would be foolish to draw any conclusions based on one tournament. After all, last year, South Africans won a grand total of two matches in the Johannesburg main draw. The results of this year’s event are at least partly due to an unusually weak field: only the top four seeds were among the world’s top 65. Some challenger-level events may be similarly competitive.
In any event, this week’s results are certainly a boost for tennis in South Africa; maybe the draw will be stronger next year.
The Wild Card Effect
I’ve written before about the types of players awarded wild cards into professional men’s tennis tournaments. While they can be categorized in different ways, there are two characteristics that are true of almost all wild cards:
- Without a wild card, they would not be able to play in the tournament.
- Tournament organizers see them as an asset to the event.
The first isn’t quite true; many wild cards would otherwise enter the qualifying draw, and some would reach the main draw that way. We can still conclude that WCs are, at least according to ATP entry rankings, inferior to other players who appear in the main draw. The only possible exceptions worth mentioning are qualifiers and other wild cards.
The second doesn’t necessarily tell us anything about the skill level of a player. Simply having James Blake in the draw probably boosts ticket sales for any event in the U.S. Other WCs are awarded to promote a tournament in other ways, perhaps by giving one WC to the winner of a junior event, or a special qualifying tournament for local amateurs.
While these cases are common enough, a major factor in the awarding of wild cards is the tournament organizer’s belief that a WC can compete. So the WC goes to a player returning from injury, or a veteran coming back from retirement. Or a junior who is rocketing up the rankings, or who has recently won a major collegiate event.
All this is to say that, in the aggregate, players granted wild cards are usually better than their rankings say they are.
Thus, when we look at matches with one wild card and one non-wild card and apply my algorithm to predict the winner, we should anticipate that wild cards outperform expectations.
Empirical results
In fact, they do. The effect is substantial, and it holds at multiple levels of competition.
In testing the hypothesis, I controlled for home court advantage, an important consideration that is easily conflated with the wild card effect. After all, a large percentage of wild cards are granted to local players, so without careful analysis, it would not be clear how much of the advantage can be attributed to the wild card selection or the benefits of playing in one’s home country.
I ran the numbers with a dataset comprising all ATP main draw, ATP qualifying draw, and Challenger main draw matches from 2008 to 2010. The results were fairly consistent from year to year.
At the ATP main draw level, the dataset yielded over 900 matches between a wild card and a non-wild card. The wild card won the match about 15% more often than expected. We can approximate this effect by multiplying the WC’s ranking points by 1.3.
The other two levels showed even larger effects over about 2600 relevant matches. In ATP qualifying and Challenger main draw matches, wild cards won more than 25% more than expected. We can approximate this effect by multiplying the WC’s ranking points by 1.55.
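The post reports each effect both as extra wins and as a ranking-points multiplier without spelling out the conversion. One way to do it, sketched here: compare actual to expected wins, then bisect for the factor on the wild card’s points that closes the gap. `win_prob` is a hypothetical points-ratio model standing in for the author’s algorithm, so multipliers from it needn’t match 1.3 or 1.55.

```python
def win_prob(pts_a, pts_b):
    """Hypothetical stand-in for the author's algorithm: a plain points ratio."""
    return pts_a / (pts_a + pts_b)


def excess_win_rate(matches):
    """matches: (wc_points, opponent_points, wc_won) tuples.

    Returns actual wins / expected wins - 1 (0.15 means 15% more than expected).
    """
    expected = sum(win_prob(a, b) for a, b, _ in matches)
    actual = sum(1 for _, _, won in matches if won)
    return actual / expected - 1


def points_multiplier(matches, lo=0.1, hi=10.0, iterations=100):
    """Bisect for the factor m on the wild card's points that makes expected wins equal actual wins."""
    actual = sum(1 for _, _, won in matches if won)

    def gap(m):
        # Expected wins rise monotonically with m, so the sign of gap brackets the root.
        return sum(win_prob(m * a, b) for a, b, _ in matches) - actual

    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("no multiplier in [lo, hi] matches the observed wins")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Made-up example: four wild-card matches, two wins.
sample = [(200, 400, True), (300, 300, False), (150, 600, True), (500, 250, False)]
print(f"{excess_win_rate(sample):+.1%} vs expected, multiplier {points_multiplier(sample):.2f}")
```

Bisection works because expected wins rise steadily as the multiplier grows; the range check guards against a sample where no multiplier in the window fits.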
Commentary
The existence of a positive “wild card effect” is not a surprise, nor is the magnitude. Essentially, when a player is awarded a wild card, we’re given more information about him than ranking points otherwise offer.
I suspect the difference in magnitude between the higher and lower levels is fairly straightforward, as well. While some players receive ATP wild cards straight from the amateur ranks, as can be the case with collegiate champions, most ATP wild cards go to somewhat established players on the fringes of success. These players are often inside the top 150, meaning that they’ve played a lot of professional tournaments, so while their ranking might undervalue them slightly, it is a fairly accurate gauge of their ability level.
By contrast, qualifying and challenger-level wild cards often go to less experienced players. They may not be full-time professionals or they may spend most of their time playing collegiate or junior tournaments. They usually have rankings, but the point totals may only be based on a handful of events.
Example from Australia
The most successful wild card in the Australian Open was Aussie youngster Bernard Tomic, who reached the third round, beating Jeremy Chardy and Feliciano Lopez before losing to Rafael Nadal.
As he was a local and a wild card, we now know to adjust his ranking points twice before estimating his likelihood of winning a match. Instead of estimating his talent with his pre-tourney ranking point total of 239, we adjust upward to 435. That still puts him as an underdog against Chardy’s 960 points, but it means we would have given him a 30% chance of winning instead of an 18% chance.
Of course, the 2011 Australian Open isn’t very instructive here, since six of the other wild cards lost their first matches, while the final WC, Benoit Paire, drew qualifier Flavio Cipolla in the first round and was the favorite.
Tennis Home Court – Research Notes
I’ve built out my men’s tennis results database quite a bit in the last couple of months, so I thought I’d revisit my research into home court advantage.
To recall, I started with ATP main draw matches from 2009. I focused on the subset of matches where the tournament was in the home country of one player, but not the other. I excluded matches where either player was a wild card entry–that usually applies to the home player. I did so because I think there is a separate “wild card” effect that reflects selection bias. (Tourney organizers choose players who did not make the cut but whose chances, for whatever reason, are better than their ranking would suggest.)
As I reported in my initial research, using about 450 matches from the 2009 main draw dataset, the home player won 17% more matches than expected. (“Expected” wins are derived from my bare-bones algorithm for predicting the winner of each match.) Using ranking points, this is roughly equivalent to giving the home player credit for 50% more ranking points than he actually has.
For example, Lleyton Hewitt is currently ranked 54th, with 870 ranking points. If we make this adjustment for the Australian Open, we’d say he’ll play at a level equal to someone with 1,305 ranking points, which would be 32nd in the world. Instead of giving him a 36% chance of winning his first round match against David Nalbandian, the home-court-adjusted number would give him a 47% chance. In this case the results might bear us out: The match went to 9-7 in the fifth set.
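The adjustment is a single multiplication applied before the points reach the win-probability model. A sketch with hypothetical names; the defaults are the rough ATP main-draw estimates from this research (1.5 for home court in 2009, 1.3 for wild cards), not fixed constants.

```python
def adjusted_points(points, home=False, wild_card=False, home_mult=1.5, wc_mult=1.3):
    """Scale ranking points before they go into a win-probability model.

    Defaults are rough ATP main-draw estimates (home court 2009, wild cards);
    both multipliers vary by level and year, so treat them as tunable.
    """
    if home:
        points *= home_mult
    if wild_card:
        points *= wc_mult
    return points


# Lleyton Hewitt at the Australian Open: 870 points, at home, not a wild card.
print(adjusted_points(870, home=True))  # 1305.0
```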
The surprise came when I expanded the dataset to include Challenger main draw matches and ATP-level qualifier matches. In 2009 Challengers, home players only won 6% more often than expected–equivalent to a ranking points multiplier of 1.15. In 2009 ATP qualies, the home court advantage was only 2%–a multiplier of about 1.05. Whatever confers the home court advantage in ATP main draw matches may not apply at all levels.
I next looked at the same datasets for 2010. Here are the home court advantages (and ranking points multipliers) observed last year:
- ATP main draw: 12% (1.35)
- Challenger main draw: 4% (1.1)
- ATP qualifiers: 14% (1.3)
The first two numbers don’t differ much from the ’09 observations, but the qualifier numbers come out of nowhere.
Until I’m able to look at more matches from before 2009, I hesitate to draw any conclusions about the qualifiers. That still leaves us with a fairly consistent gap between the home court advantage observed at the ATP main draw and Challenger main draw levels.
To the extent that crowd involvement plays a part, it seems reasonable to expect that players would get a bigger boost on a bigger stage. Even on outer courts in the early rounds, fans tend to pull for the locals. At challengers, the atmosphere is often more like a club tournament where the audience is next to nonexistent.
Another major possibility is that some combination of selection bias and the inadequacy of my prediction algorithm accounts for the lack of observed home court advantage in challengers. Players have more choice of where to play at the lower levels, so they will tend to stay closer to home. It may mean that, even exclusive of wild cards, the distribution of home-country players and non-home-country players is different; perhaps the bottom ends of challenger draws are disproportionately packed with home-country players. This is something that I can investigate further.
UPDATE: Just ran the numbers for 2008. The ATP main draw home court advantage remained consistent, at a 16% boost for the home player. The ATP qualifier pool also showed the same home court advantage. However, 2008 differed from later years in that in Challenger main draw matches, home players got an 11% boost, much bigger than in 2009 or 2010.
Marginal ATP Rankings
ATP rankings are frustrating: They are a decent approximation for player skill, but there are so many obvious flaws. Some of those flaws derive from the problem of needing one number–there’s no accounting for surface, for instance.
The one that frustrates me the most is how much luck is allowed to creep into a player’s ranking. When a player is awarded points for his performance in a certain tournament, there is no consideration of the skill level of the players he defeated. So two players who lose in the second round get the same number of points, even if one defeated a 16-year-old wild card in the first round and the other defeated Rafael Nadal in the first round.
There are plenty of arguments in favor of the present way of doing things.
- First, there’s the circular problem of finding a starting point–if ranking points aren’t an adequate measure of skill, how do you give numerical credit based on the skill of opponents?
- Second, players don’t display consistent levels of skill; if Milos Raonic is in the fourth round of the Australian Open, he is probably playing better than he was four months ago when he lost in the first round of the U.S. Open. Perhaps the person who defeats him in Melbourne deserves more points than the guys who beat him in qualifiers and challengers last fall. Players also display different levels of skill depending on surface; beating Juan Carlos Ferrero is more impressive on clay than on grass, and you’re more likely to do so in a later round on clay.
- Third, you could say that it all comes out in the wash. Pros play a lot of tournaments, and while you might only get 20 points for beating a top-10 player in the first round, you might get an additional 90 points for beating an unseeded player three rounds later.
We could settle for the status quo, or we could experiment with a different approach and test it. Testing these things is an enormous task, so for today I’m just presenting the experiment itself.
Opponent-based point awards
I looked at all ATP-level main draw and qualifying draw matches, along with Challenger-level main draw matches. I figured out the marginal points awarded to the winner of each match (e.g., by winning in the third round in the Aussie Open, you get 180 points instead of 90 points, for 90 marginal points) and the ranking points of the loser at the time of the match.
For instance, when Nadal beat Federer in the Madrid final, Nadal was awarded 400 marginal points, and Federer had 10,690 ranking points. Summing each type of points across every match in the dataset, the total marginal points awarded come to approximately 4.5% of the losers’ ranking points.
Thus, if we use a simple linear model, instead of giving Nadal 400 marginal points for winning that match, we give him 4.5% of 10,690, or 463 points. In this case, not a big difference. But when top players are upset in early rounds, the adjustment is huge.
To take a very different example: In Miami last year, Olivier Rochus beat Novak Djokovic in the round of 64. For advancing to the round of 32, Rochus earned 20 marginal points. Djokovic’s ranking point total at that point was 8,220, so if we give Rochus 4.5% of that, he gets 365 points. As we’ll see, that single adjustment rockets him up the rankings.
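A sketch of the opponent-based model: derive the rate from all matches, then credit each winner with that share of the loser’s ranking points and sort. The function names are mine, and the match tuples are an assumed input format. Applying a flat 4.5 percent to the figures above gives about 481 points for the Madrid final and about 370 for Rochus, slightly more than the 463 and 365 quoted, so the 4.5 percent should be read as approximate.

```python
from collections import defaultdict


def opponent_based_rate(matches):
    """matches: (winner, loser, marginal_points, loser_ranking_points) tuples.

    Returns total marginal points divided by total loser ranking points.
    """
    total_marginal = sum(m[2] for m in matches)
    total_loser_points = sum(m[3] for m in matches)
    return total_marginal / total_loser_points


def opponent_based_rankings(matches, rate=None):
    """Credit each winner with rate * loser's ranking points; return (player, points), best first."""
    if rate is None:
        rate = opponent_based_rate(matches)
    totals = defaultdict(float)
    for winner, _loser, _marginal, loser_points in matches:
        totals[winner] += rate * loser_points
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# The two matches from the text, with the rounded 4.5% rate.
examples = [("Nadal", "Federer", 400, 10690), ("Rochus", "Djokovic", 20, 8220)]
for player, pts in opponent_based_rankings(examples, rate=0.045):
    print(player, round(pts))
```

When the rate is computed from the same matches, the awarded points sum to the total marginal points actually handed out; the model only redistributes them.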
Pros and Cons
Compared to the present ATP ranking system, this approach gives more credit to the players who are capable of a top-10 performance, even if they play at that level very rarely. As we’ll see, a single major upset can make a huge difference, so perhaps it weighs a single match too heavily. If Rochus happened to play Djokovic on a day when Djokovic had the flu, does he really deserve 365 points?
Another potential problem is that this model doesn’t consider the level of the opponents that a player loses to. Nikolay Davydenko is known for his ability to beat Federer or Nadal, but in consecutive weeks in October, he lost to Pablo Cuevas and Mischa Zverev. Should we rank someone based on their ability to defeat “better” players, or their inability to defeat “lesser” players? As always, the standard ATP ranking system appears to be a decent compromise.
For my purposes, what matters is how well a ranking system predicts future results. I hope that soon I’ll be able to report on how this one performs.
In the meantime, here are the 2010 year-end top 100, using the opponent-based model I’ve described. I’ve also included each player’s actual 2010 year-end ranking and the difference between their placement in the two systems.
Rk Player Pts Actual Diff
1 Rafael Nadal 4562 1 0
2 Roger Federer 4529 2 0
3 Robin Soderling 3905 5 2
4 David Ferrer 3450 7 3
5 Andy Murray 3347 4 -1
6 Tomas Berdych 2891 6 0
7 Jurgen Melzer 2772 11 4
8 Novak Djokovic 2730 3 -5
9 Fernando Verdasco 2697 9 0
10 Andy Roddick 2331 8 -2
11 Gael Monfils 2266 12 1
12 Nikolay Davydenko 2070 22 10
13 Mikhail Youzhny 2059 10 -3
14 Ivan Ljubicic 1952 17 3
15 Guillermo Garcia-Lopez 1948 33 18
16 Nicolas Almagro 1908 15 -1
17 Marcos Baghdatis 1859 20 3
18 Marin Cilic 1843 14 -4
19 Albert Montanes 1832 25 6
20 Michael Llodra 1822 23 3
21 Ernests Gulbis 1695 24 3
22 Viktor Troicki 1654 28 6
23 Mardy Fish 1646 16 -7
24 Jo-Wilfried Tsonga 1577 13 -11
25 Stanislas Wawrinka 1573 21 -4
26 Richard Gasquet 1570 30 4
27 Florian Mayer 1497 37 10
28 John Isner 1473 19 -9
29 Philipp Kohlschreiber 1456 34 5
30 Feliciano Lopez 1396 32 2
31 David Nalbandian 1395 27 -4
32 Juan Monaco 1394 26 -6
33 Samuel Querrey 1353 18 -15
34 Xavier Malisse 1342 60 26
35 Jeremy Chardy 1272 45 10
36 Andrei Goloubev 1236 36 0
37 Juan Carlos Ferrero 1227 29 -8
38 Jarkko Nieminen 1214 39 1
39 Gilles Simon 1180 41 2
40 Janko Tipsarevic 1178 49 9
41 Benjamin Becker 1145 53 12
42 Michael Berrer 1144 58 16
43 Thomaz Bellucci 1125 31 -12
44 Alexander Dolgopolov 1058 48 4
45 Denis Istomin 1058 40 -5
46 Andreas Seppi 1047 52 6
47 Thiemo de Bakker 1033 43 -4
48 Potito Starace 1031 47 -1
49 Daniel Gimeno 983 56 7
50 Olivier Rochus 971 113 63
51 Lleyton Hewitt 944 54 3
52 Julien Benneteau 941 44 -8
53 Marcel Granollers 937 42 -11
54 Juan Ignacio Chela 872 38 -16
55 Pablo Cuevas 868 63 8
56 Tommy Robredo 851 50 -6
57 Philipp Petzschner 817 57 0
58 Sergey Stakhovsky 813 46 -12
59 Dudi Sela 808 75 16
60 Santiago Giraldo 805 64 4
61 Michael Zverev 797 82 21
62 Radek Stepanek 789 62 0
63 Fabio Fognini 784 55 -8
64 Mikhail Kukushkin 781 59 -5
65 Yen-Hsun Lu 745 35 -30
66 Igor Andreev 722 79 13
67 Carlos Berlocq 719 66 -1
68 Ryan Sweeting 716 116 48
69 Teimuraz Gabashvili 714 80 11
70 Arnaud Clement 703 78 8
71 Lukas Lacko 691 89 18
72 Tobias Kamke 652 67 -5
73 Pere Riba 646 72 -1
74 Rainer Schuettler 642 84 10
75 Robin Haase 626 65 -10
76 Florent Serra 625 69 -7
77 Leonardo Mayer 622 94 17
78 Rui Machado 618 93 15
79 Kevin Anderson 596 61 -18
80 Albert Ramos 595 123 43
81 Ivo Karlovic 567 73 -8
82 Frederico Gil 554 101 19
83 Daniel Brands 546 104 21
84 Alejandro Falla 544 105 21
85 Simon Greul 534 130 45
86 Simone Bolelli 521 107 21
87 Filippo Volandri 509 91 4
88 Ilia Marchenko 488 81 -7
89 Marco Chiudinelli 486 117 28
90 Filip Krajinovic 483 214 124
91 Victor Hanescu 481 51 -40
92 Bjorn Phau 479 102 10
93 Ivan Dodig 478 88 -5
94 Kei Nishikori 477 98 4
95 Evgueni Korolev 468 140 45
96 James Blake 467 135 39
97 Ruben Ramirez-Hidalgo 466 77 -20
98 Ricardo Mello 461 76 -22
99 Grigor Dimitrov 460 106 7
100 Brian Dabul 458 85 -15
Predictiveness of ATP Rankings – Research Notes
I’m working on some bigger projects right now that might take some time before they see the light. In the meantime, here are a couple of things I’ve discovered about ATP rankings and their use to predict the outcome of matches.
1. In my earlier research, I found that in the “buckets” of matches that the favorite is most likely to win, my algorithm is still reasonably accurate. In other words, if the ranking points predict that Nadal, say, has a 98% chance of beating the 140th ranked player, his chances are in fact that high. The algorithm was as accurate on the extreme high end as it was anywhere else on the spectrum.
However, I only included matches in my sample where both players were ranked inside the top 200. I thought that was an innocuous enough cutoff, but I see now why it was misleading. If we limit the sample that way, the most extreme favorites will only be the very top players. In fact, the only players to whom my algorithm gives a 95% chance of beating the 200th-ranked player are the top 5.
When I expanded the sample to include players ranked outside the top 200, the high end broke down. In other words, in the bucket of matches where the favorite had a 90% or better chance of winning, favorites won less often than the algorithm predicted.
There are several possible explanations for this, none of which account for the entire effect, but many of which surely play a part:
- I’m still only looking at ATP-level matches, and if a player outside of the top 200 is in an ATP main draw match, he was not exactly randomly selected. He may be playing at “home” on a wild card, he may be hot after a solid week in qualifying, he is probably on his favorite surface, and his ranking may be misleading due to injury.
- Outside of the top 5 or top 10, players are substantially less consistent. It’s tough to imagine Robin Soderling losing to a qualifier right now, but easy to see, say, Fernando Verdasco or Ivan Ljubicic doing so.
More fundamentally, I suspect that the further down the rankings you go, the less the differences in points really mean. Certainly there’s much more movement: once you get outside the top 50, one good showing can easily gain you 10, 20, or more spots. That doesn’t mean that a player is suddenly more skilled, which is how my algorithm has to treat him.
Controlling for surface, wild card status, and more will help reconcile some of these differences, but ultimately, matches between drastically mismatched (on paper) opponents may have to be treated differently than matches between more closely matched peers.
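The bucket comparison in point 1 can be sketched as follows: group matches by the favorite’s predicted chance, then compare the average prediction with how often favorites actually won. The bucket edges and names are my assumptions; any prediction source can feed it.

```python
from bisect import bisect_right


def calibration_table(predictions, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Compare predicted and actual favorite win rates, bucketed by prediction.

    predictions: (favorite_win_prob, favorite_won) pairs, with prob >= edges[0].
    Returns {(low, high): (matches, mean_predicted, actual_rate)} for non-empty buckets.
    """
    buckets = {}
    for prob, won in predictions:
        # bisect_right finds the bucket; prob == 1.0 is folded into the top bucket.
        i = min(bisect_right(edges, prob) - 1, len(edges) - 2)
        if i < 0:
            raise ValueError("favorite probability must be at least edges[0]")
        buckets.setdefault(i, []).append((prob, won))
    table = {}
    for i, rows in sorted(buckets.items()):
        n = len(rows)
        mean_pred = sum(p for p, _ in rows) / n
        actual = sum(1 for _, won in rows if won) / n
        table[(edges[i], edges[i + 1])] = (n, mean_pred, actual)
    return table


# Made-up predictions: a well-calibrated top bucket would show actual close to predicted.
for bucket, (n, pred, actual) in calibration_table(
    [(0.55, True), (0.55, False), (0.93, True), (0.97, False), (0.96, True)]
).items():
    print(bucket, n, f"predicted {pred:.2f}", f"actual {actual:.2f}")
```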
2. Eliminating some quirks of the ATP ranking system doesn’t break it, at least not for my purposes. In the process of my current projects, I wanted to be able to more easily tweak the parameters of the ranking system, so I started by rebuilding the existing one. But there are a lot of quirks:
- The top 4 or 5 players get a lot of points from the Tour Championships.
- Davis Cup players get points.
- Rankings are limited to a player’s top 18 tournaments, but there are some limitations on what those tournaments must be, resulting in cases where a player gets credited for a poor showing at a grand slam but not for a better showing (worth more points) at a smaller tournament.
All of these quirks have their purposes, given that the ATP’s priorities are built around keeping fans interested and ensuring that top players focus on the most important events. But they are a pain in the butt to incorporate into an on-the-fly system, so I just ignored them.
And as it turns out, they are not affecting my results in any meaningful way. I’ve re-run a couple of earlier projects with my “improvised” rankings, and nothing is changing by more than a percent or two. Occasionally the effect is strong on a certain player (I think the improvised system bumps Juan Carlos Ferrero from #29 to inside the top #15 at 2010 year-end), but in the aggregate, it makes no difference.
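For illustration, a stripped-down system like this can be sketched as summing each player’s 18 best tournament results and skipping every special case listed above. This is a minimal sketch under that assumption. It is not the ATP’s actual rule set, and not necessarily the exact improvised system used here. The function name and the (player, points) input format are hypothetical.

```python
from collections import defaultdict


def improvised_rankings(results, best_n=18):
    """Rank players by the sum of their best `best_n` tournament results.

    results: iterable of (player, points) pairs, one per tournament played
    in the ranking window (assumed here to be the last 52 weeks).
    Deliberately ignores the ATP's special cases: no Tour Finals bonus,
    no Davis Cup points, no rules about which events must count.
    """
    points_by_player = defaultdict(list)
    for player, points in results:
        points_by_player[player].append(points)

    totals = {
        player: sum(sorted(points, reverse=True)[:best_n])
        for player, points in points_by_player.items()
    }
    # Highest total first; ties broken alphabetically so the order is stable.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical example: with best_n=1, only each player's single best result counts.
sample = [("A", 500), ("A", 250), ("B", 1000), ("B", 10), ("C", 300)]
print(improvised_rankings(sample, best_n=1))  # [('B', 1000), ('A', 500), ('C', 300)]
```

Changing `best_n`, or the points fed in, is the kind of parameter tweak that is awkward to do inside the official system’s special cases.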
2011 Aussie Open Simulation Results
Using my simple ranking-points-based algorithm to determine the odds that each player wins a match, I ran simulations using the 2011 Australian Open draw.
As usual, the keyword is “simple,” and you can easily find all sorts of intuitive reasons to discount the results. There’s no consideration for surface, so clay-court specialists are generally overrated. Players returning from injury (Del Potro especially, and Karlovic) have taken a hit in the rankings and are thus underrated here as well.
I’m also publishing the code that I use to generate these sims. It should work for any single-elimination tournament of up to 128 competitors, and it is easily expandable to handle larger brackets. The function ‘calcWP’ is specific to my tennis algorithm, but you could swap in something like log5 very easily. I also included the .csv file I used for the draw, so you can see the format, or tinker with the parameters and come up with your own Aussie sim.
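Here is a minimal sketch of that kind of simulation. It is not the published code. It plays out a single-elimination draw many times with a pluggable win-probability function and uses log5 (Bill James’s formula) in the role calcWP plays. log5 assumes each player has a known winning percentage against an average opponent. That input, the function names `simulate_draw` and `log5`, and the sample win rates are all hypothetical.

```python
import random


def log5(p_a, p_b):
    """Bill James's log5: chance that A beats B, given each player's
    winning percentage against an average opponent (0 < p < 1)."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def simulate_draw(draw, win_prob, n_sims=10_000, seed=None):
    """Monte Carlo simulation of a single-elimination draw.

    draw: distinct player names in bracket order; the length must be a
          power of two (a bye can be modeled as an opponent who never wins).
    win_prob(a, b): probability that a beats b.
    Returns {player: [P(wins round 1), P(wins round 2), ..., P(wins title)]}.
    """
    size = len(draw)
    n_rounds = size.bit_length() - 1
    if size < 2 or 2 ** n_rounds != size:
        raise ValueError("draw size must be a power of two")

    rng = random.Random(seed)
    wins = {player: [0] * n_rounds for player in draw}
    for _ in range(n_sims):
        alive = list(draw)
        for rnd in range(n_rounds):
            survivors = []
            # Adjacent players in the bracket meet each other.
            for a, b in zip(alive[::2], alive[1::2]):
                winner = a if rng.random() < win_prob(a, b) else b
                wins[winner][rnd] += 1
                survivors.append(winner)
            alive = survivors
    return {player: [count / n_sims for count in counts]
            for player, counts in wins.items()}


# Hypothetical four-player draw; the win rates are made up for illustration.
rates = {"A": 0.80, "B": 0.50, "C": 0.60, "D": 0.40}
probs = simulate_draw(list(rates), lambda a, b: log5(rates[a], rates[b]), seed=1)
print(probs["A"])
```

Passing `win_prob` as a separate argument means the bracket logic never needs to know how match odds are produced, so a ranking-points model, log5, or anything else can be dropped in. A 128-player draw needs seven rounds, which the function works out from the length of the draw list.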
Your 2011 Australian Open…
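Key: a number after a player’s name is his seed; Q marks a qualifier and W a wild card. Points are ATP ranking points. Each percentage is the probability of reaching that round, and the last column (Title) is the probability of winning the tournament.
Player Points R64 R32 R16 QF SF F Title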
Nadal 1 12390 96.9% 92.7% 87.0% 78.1% 66.1% 49.6% 34.5%
Daniel 564 3.1% 1.4% 0.5% 0.1% 0.0% 0.0% 0.0%
Sweeting Q 486 35.3% 1.6% 0.5% 0.1% 0.0% 0.0% 0.0%
Gimeno-Traver 844 64.7% 4.3% 1.9% 0.7% 0.2% 0.0% 0.0%
Tomic W 239 17.9% 3.1% 0.1% 0.0% 0.0% 0.0% 0.0%
Chardy 960 82.1% 39.6% 3.7% 1.4% 0.4% 0.1% 0.0%
Falla 540 27.3% 11.3% 0.7% 0.2% 0.0% 0.0% 0.0%
Lopez F 31 1310 72.7% 46.0% 5.6% 2.6% 0.9% 0.2% 0.0%
Isner 20 1850 74.0% 56.8% 31.7% 5.8% 2.5% 0.8% 0.2%
Serra 711 26.0% 14.0% 4.6% 0.5% 0.1% 0.0% 0.0%
Stepanek 735 62.1% 20.4% 6.9% 0.6% 0.2% 0.0% 0.0%
Gremelmayr Q 469 37.9% 8.8% 2.2% 0.1% 0.0% 0.0% 0.0%
Machado 573 41.2% 10.2% 3.3% 0.2% 0.0% 0.0% 0.0%
Giraldo 785 58.8% 18.3% 7.2% 0.7% 0.2% 0.0% 0.0%
Young D Q 435 14.6% 5.4% 1.4% 0.1% 0.0% 0.0% 0.0%
Cilic 15 2140 85.4% 66.1% 42.8% 8.7% 4.1% 1.4% 0.4%
Youzhny 10 2920 85.6% 70.1% 51.9% 29.2% 8.1% 3.3% 1.1%
Ilhan 574 14.4% 6.2% 2.2% 0.5% 0.0% 0.0% 0.0%
Kavcic Q 552 38.0% 7.1% 2.4% 0.5% 0.0% 0.0% 0.0%
Anderson K 868 62.0% 16.6% 7.4% 2.1% 0.2% 0.0% 0.0%
Raonic Q 351 36.4% 6.8% 1.0% 0.2% 0.0% 0.0% 0.0%
Phau 581 63.6% 18.0% 4.1% 0.9% 0.1% 0.0% 0.0%
Chela 1070 39.3% 27.8% 9.9% 3.2% 0.4% 0.1% 0.0%
Llodra 22 1575 60.7% 47.4% 21.0% 8.8% 1.6% 0.4% 0.1%
Nalbandian 27 1480 64.2% 49.1% 18.4% 8.2% 1.4% 0.4% 0.1%
Hewitt 870 35.8% 23.1% 6.1% 2.0% 0.2% 0.0% 0.0%
Berankis 589 61.1% 19.1% 3.9% 1.0% 0.1% 0.0% 0.0%
Matosevic W 392 38.9% 8.8% 1.4% 0.2% 0.0% 0.0% 0.0%
Russell 547 67.0% 10.1% 3.6% 0.8% 0.1% 0.0% 0.0%
Ebden W 288 33.0% 2.7% 0.6% 0.1% 0.0% 0.0% 0.0%
Nieminen 1062 20.2% 14.5% 7.7% 2.8% 0.4% 0.1% 0.0%
Ferrer 7 3735 79.8% 72.7% 58.4% 39.4% 12.6% 5.8% 2.4%
Soderling 4 5785 87.9% 83.6% 71.9% 58.3% 35.9% 15.6% 7.9%
Starace 945 12.1% 8.9% 4.2% 1.7% 0.3% 0.0% 0.0%
Muller Q 466 76.9% 6.9% 2.0% 0.5% 0.1% 0.0% 0.0%
Stadler Q 155 23.1% 0.7% 0.1% 0.0% 0.0% 0.0% 0.0%
Istomin 1031 86.2% 41.8% 8.8% 3.6% 0.8% 0.1% 0.0%
Hernych Q 196 13.8% 1.9% 0.1% 0.0% 0.0% 0.0% 0.0%
Mello 627 30.0% 12.8% 1.9% 0.6% 0.1% 0.0% 0.0%
Bellucci 30 1355 70.0% 43.5% 11.0% 5.3% 1.4% 0.2% 0.1%
Gulbis 24 1505 64.3% 41.5% 20.7% 6.3% 1.9% 0.4% 0.1%
Becker 870 35.7% 17.9% 6.4% 1.3% 0.2% 0.0% 0.0%
Dolgopolov 928 53.6% 22.8% 8.6% 1.8% 0.4% 0.0% 0.0%
Kukushkin 815 46.4% 17.9% 6.3% 1.2% 0.2% 0.0% 0.0%
Seppi 900 59.6% 19.2% 8.7% 1.9% 0.4% 0.0% 0.0%
Clement 627 40.4% 9.9% 3.5% 0.6% 0.1% 0.0% 0.0%
Petzschner 839 24.3% 12.6% 5.5% 1.1% 0.2% 0.0% 0.0%
Tsonga 13 2345 75.7% 58.2% 40.4% 15.8% 6.3% 1.6% 0.5%
Melzer 11 2785 91.2% 77.7% 54.3% 22.9% 10.4% 3.0% 1.0%
Millot Q 334 8.8% 3.3% 0.7% 0.1% 0.0% 0.0% 0.0%
Ball W 344 32.5% 4.1% 0.9% 0.1% 0.0% 0.0% 0.0%
Riba 672 67.5% 14.8% 5.5% 1.0% 0.2% 0.0% 0.0%
Sela 568 77.8% 21.8% 5.0% 0.7% 0.1% 0.0% 0.0%
Del Potro 180 22.2% 2.4% 0.2% 0.0% 0.0% 0.0% 0.0%
Zemlja Q 376 15.1% 6.9% 1.2% 0.1% 0.0% 0.0% 0.0%
Baghdatis 21 1785 84.9% 68.9% 32.2% 10.7% 3.8% 0.8% 0.2%
Garcia-Lopez 32 1300 62.1% 44.0% 10.6% 4.2% 1.2% 0.2% 0.0%
Berrer 835 37.9% 22.8% 3.9% 1.1% 0.2% 0.0% 0.0%
Schwank 580 50.6% 16.9% 2.3% 0.5% 0.1% 0.0% 0.0%
Mayer L 572 49.4% 16.3% 2.1% 0.4% 0.1% 0.0% 0.0%
Marchenko 624 49.3% 5.5% 2.3% 0.6% 0.1% 0.0% 0.0%
Ramirez Hidalgo 638 50.7% 5.7% 2.4% 0.6% 0.1% 0.0% 0.0%
Beck K 543 7.0% 3.2% 1.2% 0.3% 0.0% 0.0% 0.0%
Murray 5 5760 93.0% 85.5% 75.3% 56.7% 35.5% 15.6% 7.9%
Berdych 6 3955 96.4% 78.5% 63.1% 42.3% 22.0% 9.6% 3.4%
Crugnola Q 194 3.6% 0.5% 0.1% 0.0% 0.0% 0.0% 0.0%
Kohlschreiber 1215 63.8% 15.2% 8.3% 3.1% 0.9% 0.2% 0.0%
Kamke 724 36.2% 5.8% 2.4% 0.7% 0.1% 0.0% 0.0%
Harrison W 313 32.3% 6.7% 0.6% 0.1% 0.0% 0.0% 0.0%
Mannarino 612 67.7% 22.8% 3.9% 0.9% 0.1% 0.0% 0.0%
Dancevic Q 172 9.0% 2.2% 0.1% 0.0% 0.0% 0.0% 0.0%
Gasquet 28 1385 91.0% 68.3% 21.5% 8.8% 2.5% 0.6% 0.1%
Davydenko 23 1555 60.0% 41.5% 17.1% 6.5% 2.0% 0.5% 0.1%
Mayer F 1073 40.0% 23.9% 8.0% 2.3% 0.6% 0.1% 0.0%
Fognini 855 59.6% 22.7% 6.5% 1.7% 0.3% 0.0% 0.0%
Nishikori 599 40.4% 12.0% 2.7% 0.5% 0.1% 0.0% 0.0%
Zverev 611 38.3% 7.2% 2.4% 0.5% 0.1% 0.0% 0.0%
Tipsarevic 935 61.7% 16.0% 7.2% 2.0% 0.4% 0.1% 0.0%
Schuettler 597 13.5% 5.8% 1.9% 0.4% 0.1% 0.0% 0.0%
Verdasco 9 3240 86.5% 71.1% 54.1% 30.3% 14.2% 5.7% 1.8%
Almagro 14 2160 84.5% 68.0% 41.9% 15.4% 6.8% 2.2% 0.5%
Robert Q 460 15.5% 6.6% 1.8% 0.2% 0.0% 0.0% 0.0%
Andreev 622 52.1% 13.7% 4.5% 0.8% 0.1% 0.0% 0.0%
Volandri 574 47.9% 11.8% 3.6% 0.5% 0.1% 0.0% 0.0%
Cipolla Q 190 32.6% 3.4% 0.4% 0.0% 0.0% 0.0% 0.0%
Paire W 366 67.4% 12.5% 2.5% 0.3% 0.0% 0.0% 0.0%
Luczak W 400 14.7% 8.6% 1.9% 0.2% 0.0% 0.0% 0.0%
Ljubicic 17 1965 85.3% 75.5% 43.4% 15.1% 6.2% 1.8% 0.4%
Troicki 29 1385 86.2% 64.4% 16.2% 7.2% 2.4% 0.5% 0.1%
Tursunov 263 13.8% 4.5% 0.3% 0.0% 0.0% 0.0% 0.0%
Dabul 584 58.6% 19.8% 2.7% 0.7% 0.1% 0.0% 0.0%
Mahut Q 424 41.4% 11.3% 1.1% 0.2% 0.0% 0.0% 0.0%
Karlovic 670 52.8% 6.2% 2.5% 0.7% 0.1% 0.0% 0.0%
Dodig 606 47.2% 5.0% 2.0% 0.5% 0.1% 0.0% 0.0%
Granollers 993 11.6% 7.2% 3.6% 1.4% 0.4% 0.1% 0.0%
Djokovic 3 6240 88.4% 81.6% 71.5% 56.9% 40.2% 21.9% 10.2%
Roddick 8 3565 88.5% 78.1% 61.4% 42.2% 16.8% 8.1% 2.7%
Hajek 560 11.5% 5.8% 2.0% 0.4% 0.0% 0.0% 0.0%
Przysiezny 590 51.7% 8.5% 2.9% 0.7% 0.1% 0.0% 0.0%
Kunitsyn 551 48.3% 7.6% 2.5% 0.7% 0.1% 0.0% 0.0%
Berlocq 725 47.1% 16.8% 4.0% 1.2% 0.2% 0.0% 0.0%
Haase 803 52.9% 20.0% 5.2% 1.7% 0.3% 0.0% 0.0%
Benneteau 965 38.5% 21.8% 6.3% 2.3% 0.4% 0.1% 0.0%
Monaco 26 1480 61.5% 41.5% 15.7% 7.2% 1.7% 0.5% 0.1%
Wawrinka 19 1855 76.7% 52.3% 28.1% 12.8% 3.5% 1.2% 0.2%
Gabashvili 626 23.3% 9.4% 2.6% 0.6% 0.1% 0.0% 0.0%
Dimitrov Q 518 29.5% 7.5% 1.8% 0.4% 0.0% 0.0% 0.0%
Golubev 1135 70.5% 30.8% 12.7% 4.2% 0.8% 0.2% 0.0%
Gil 551 40.2% 8.3% 2.4% 0.5% 0.0% 0.0% 0.0%
Cuevas 790 59.8% 16.4% 6.1% 1.6% 0.2% 0.0% 0.0%
De Bakker 950 25.2% 14.9% 6.2% 1.9% 0.3% 0.1% 0.0%
Monfils 12 2560 74.8% 60.4% 40.2% 21.5% 7.2% 2.9% 0.7%
Fish 16 1996 70.1% 52.0% 32.0% 8.2% 3.9% 1.3% 0.3%
Hanescu 915 29.9% 16.4% 6.8% 1.0% 0.3% 0.0% 0.0%
Robredo 915 65.2% 23.4% 9.9% 1.5% 0.4% 0.1% 0.0%
Devvarman 514 34.8% 8.2% 2.4% 0.2% 0.0% 0.0% 0.0%
Stakhovsky 925 64.4% 24.8% 10.2% 1.6% 0.4% 0.1% 0.0%
Brands 541 35.6% 9.3% 2.6% 0.3% 0.1% 0.0% 0.0%
Kubot 670 24.5% 11.4% 3.9% 0.5% 0.1% 0.0% 0.0%
Querrey 18 1860 75.5% 54.5% 32.1% 7.8% 3.4% 1.1% 0.2%
Montanes 25 1495 74.3% 48.4% 8.8% 4.5% 1.7% 0.5% 0.1%
Brown 573 25.7% 10.3% 0.9% 0.2% 0.0% 0.0% 0.0%
Andujar 683 40.9% 14.9% 1.4% 0.4% 0.1% 0.0% 0.0%
Malisse 956 59.1% 26.4% 3.4% 1.4% 0.4% 0.1% 0.0%
Lu 1141 53.7% 6.2% 3.2% 1.4% 0.5% 0.1% 0.0%
Simon 1005 46.3% 4.8% 2.3% 0.9% 0.3% 0.1% 0.0%
Lacko 553 4.3% 1.4% 0.4% 0.1% 0.0% 0.0% 0.0%
Federer 2 9245 95.7% 87.6% 79.6% 70.0% 56.6% 40.3% 22.4%
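Because every match has exactly one winner, each column of the table should add up to 100% times the number of players who win that round: 6,400% for R64, 3,200% for R32, and so on down to 100% for the title. A simulation can also be cross-checked against an exact calculation. Assuming matches are independent, a player’s chance of winning a given round is his chance of getting there, multiplied by a sum over every possible opponent from the neighboring section of the draw: that opponent’s chance of getting there times the head-to-head win probability. The sketch below is illustrative only. The name `exact_round_probs` and the rating-ratio example model are assumptions and are not part of the code published with this post.

```python
def exact_round_probs(draw, win_prob):
    """Exact probability that each player wins each round of a
    single-elimination draw, assuming independent match outcomes.

    draw: distinct player names in bracket order (length a power of two).
    win_prob(a, b): probability that a beats b; should equal 1 - win_prob(b, a).
    Returns {player: [P(wins round 1), ..., P(wins title)]}.
    """
    size = len(draw)
    n_rounds = size.bit_length() - 1
    if size < 2 or 2 ** n_rounds != size:
        raise ValueError("draw size must be a power of two")

    reach = [1.0] * size  # chance each player is still alive entering this round
    result = {player: [] for player in draw}
    for rnd in range(n_rounds):
        block = 2 ** rnd  # each player entering this round emerged from a sub-bracket this size
        new_reach = []
        for i in range(size):
            opp_block = (i // block) ^ 1  # the neighboring sub-bracket i must face
            opponents = range(opp_block * block, (opp_block + 1) * block)
            p_win = reach[i] * sum(reach[j] * win_prob(draw[i], draw[j])
                                   for j in opponents)
            new_reach.append(p_win)
            result[draw[i]].append(p_win)
        reach = new_reach
    return result


# Hypothetical ratings; a simple rating-ratio model stands in for a real one.
ratings = {"A": 4.0, "B": 1.0, "C": 2.0, "D": 1.0}
exact = exact_round_probs(list(ratings), lambda a, b: ratings[a] / (ratings[a] + ratings[b]))
for rnd in range(2):
    # Each round's probabilities should sum to the number of winners in that round.
    print(rnd + 1, round(sum(p[rnd] for p in exact.values()), 10))
```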