The Summer of Jeff

Head Over to HeavyTopspin.com

Posted in Meta, tennis by Jeff on February 28, 2011

It was only a matter of time before I started a dedicated tennis blog.

My archived tennis studies will remain here, but from now on, I’ll be publishing tennis commentary (and additional research) at my new site, HeavyTopspin.com.

Click on over, tell your friends, and learn more than you ever wanted to know about men’s tennis.


Lefties in Tennis: Doubles and Prize Money

Posted in tennis by Jeff on February 16, 2011

A few days ago, I offered some numbers on the prevalence of lefties in men’s tennis.  It turned out that, in the top 300 of the ATP singles rankings, lefties don’t show up much more than you would expect them to.

A reasonable follow-up question would be: What about doubles?

Being left-handed may not make one a better doubles player, but being left-handed does have the potential to make one part of a better doubles team.  Case in point: Five of the eight doubles teams that earned a spot in the ATP Tour Finals last year were a righty/lefty duo, including the top two teams in the year-end rankings.

And indeed, it turns out that left-handers are more prevalent in the top ranks of men’s doubles.  As we’ve seen, in November 2010, five of the sixteen players (31 percent) included in the ATP Tour Finals were left-handed.

The most current ATP doubles rankings tell a similar, if less extreme, story.  Of the top 100 ranked doubles players, 18 are left-handed.  That’s considerably higher than the 12 of 100 at the top of the singles rankings.  (Both top 100s include Rafael Nadal, who plays left-handed but was born right-hand dominant.  These calculations consider him left-handed.)
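To get a rough sense of whether 18 of 100 is more than chance alone would produce, one option is a binomial tail probability.  This is only a sketch and not part of the original analysis: it assumes a 10 percent base rate of left-handedness (a commonly cited figure, though estimates vary) and treats each ranking slot as an independent draw, which is a simplification.  The function name is made up for illustration.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

BASE_RATE = 0.10  # assumed share of lefties in the general population

for label, lefties in [("singles top 100", 12), ("doubles top 100", 18)]:
    prob = binom_tail(lefties, 100, BASE_RATE)
    print(f"{label}: P(at least {lefties} lefties by chance) = {prob:.3f}")
```

The result is sensitive to the assumed base rate: under a higher figure such as 15 percent, 18 lefties in 100 looks much less unusual.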

Prize money

The majority of players participate in both singles and doubles, at least on occasion.  To determine some general level of “success” for ATP players, we could look at total prize money.  This weights singles much more heavily.  An advantage is that it is a reasonable measure of a sustainable career in professional tennis.

So, do left-handers have a better chance at making money in tennis than we would expect, given their prevalence in the general population?

It doesn’t look like there is any substantial advantage.  Of the top 100 money-winners, 13 are left-handed, including Nadal.  Those 13 lefties do include four doubles specialists, out of only 13 total doubles specialists in the top 100.

If we go further, we find an additional five lefties from 101 to 150, and six more from 151 to 200.

Left-handers do seem to have a better chance than right-handers of reaching a certain level of success in men’s doubles.  Beyond that, there is little in the way of a handedness advantage.  Whatever the advantages of playing tennis left-handed and the challenges of facing a lefty, they don’t translate into an overwhelming number of left-handers at the top of the professional game, or a disproportionate level of success for left-handed professionals.


The Prevalence of Lefties in Men’s Tennis

Posted in tennis by Jeff on February 12, 2011

Many people, in and out of tennis, believe that left-handed players have an advantage of some kind.  The perceived advantage may just be one of unfamiliarity; a junior or club-level player doesn’t see many lefties, so he is unaccustomed to the angles and spins that come off a left-hander’s racquet.

In any event, we need some hard data.  Are lefties overrepresented in the top ranks of professional men’s tennis?

The short answer: Not really.

There’s no universal consensus on the prevalence of left-hand dominance in the general population.  You’ll frequently see the figure 10 percent, or a range between 8 and 15 percent.  How does that compare to the number of lefties in the ATP rankings?

Here is a breakdown of lefties in the ATP rankings of 7 Feb 2011:

  • Top 10: 2 (20%)
  • Top 20: 3 (15%)
  • Top 50: 6 (12%)
  • Top 100: 12 (12%)
  • Top 200: 29 (14.5%)
  • Top 300: 40 (13.3%)

An interesting case is Rafael Nadal, who was born right-hand dominant, but was taught to play left-handed.  So if we are looking at the success rates of left-hand dominant players, we could subtract one from each of the raw totals above.  Of course, there may be other players who were taught to play with their non-dominant hand.

(An odder case is that of Guillermo Olaso, who is listed on the ATP site as ambidextrous.  Other resources show him as right-handed.  I saw him play a couple of years ago and don’t remember anything unique about his game, so I left him in the righty category.)

The advantage, if any

A perspective that I’ve heard (I have no idea from where) is that lefties can exploit the unfamiliarity advantage early in their careers, giving them a foundation of success that earns them more matches, more support, more coaching, and the like.  The left-handedness doesn’t make them better players, exactly, but it causes other things that lead to an improvement in their play.

Depending on how long that advantage persists, we might expect to see a “bulge” in the number of lefties somewhere in the rankings.  There’s a bit of a blip in the 101-200 range, and there’s a bigger one if we narrow our focus to 151-200, where 10 of the 50 men play left-handed.  Perhaps unfamiliarity helps them get to some level, but when they start meeting opponents at higher levels, the unfamiliarity advantage is not enough.
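The cumulative counts listed earlier can be turned into per-band counts, which makes the blip easier to see.  The sketch below uses only the figures from the 7 Feb 2011 breakdown; the function name is just for illustration.

```python
# Cumulative lefty counts from the 7 Feb 2011 rankings: (ranking cutoff, lefties).
cumulative = [(10, 2), (20, 3), (50, 6), (100, 12), (200, 29), (300, 40)]

def band_counts(cumulative):
    """Turn cumulative (cutoff, lefties) pairs into (low, high, lefties, share) per band."""
    bands = []
    prev_cutoff, prev_lefties = 0, 0
    for cutoff, lefties in cumulative:
        size = cutoff - prev_cutoff
        count = lefties - prev_lefties
        bands.append((prev_cutoff + 1, cutoff, count, count / size))
        prev_cutoff, prev_lefties = cutoff, lefties
    return bands

for low, high, count, share in band_counts(cumulative):
    print(f"{low:>3}-{high:<3}: {count:>2} lefties ({share:.1%})")
```

The 101-200 band comes out at 17 of 100, against 12 of 100 in the top 100 and 11 of 100 for 201-300.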

The blip between 101 and 200 might not mean anything; perhaps if we went further down the rankings, or even into the national or junior rankings, we’d see something more pronounced.  Alas, it was hard enough to get handedness for the top 300 players, so any larger project will have to wait for another day.

Quantifying the Bias of an ATP Draw

Posted in tennis by Jeff on February 10, 2011

ATP tennis draws are biased in favor of top-ranked players.  If you’re ranked in the top four, you won’t face another top-16 player until the round of 16, you won’t face a top-8 player until the quarters, and you won’t face a fellow top-4 player until the semis.  If you’re unseeded (out of the top 32 in slams, 16 or 8 in smaller tourneys), you’ll probably have to face a top-16 player just to get into the round of 16 … and you might draw a top-4 player in the first round.

This is the way it is, and it’s not going to change anytime soon.  Since it’s the nature of the beast, we should better understand the effects of this system.

In short: The more highly ranked you are, the easier it probably will be to win the first few matches of a tournament.  The further you go in the draw, the more points you earn, and the higher your ranking stays.  A higher pre-tournament ranking–regardless of actual skill!–increases your odds of a better performance.

Thus, the rankings lag behind changes in skill level, creating a bias against rapidly-improving youngsters and players returning from long absences.

An example

Let’s play around with this a bit.  Before the Australian Open started, I published the odds that each player in the main draw would reach any given round.  From that, we can calculate “expected points,” which gives us a way of directly comparing each player’s chances, given his skill level and his draw.

For instance, Nadal’s expected points were 1056, Federer’s were 857, and 11th-seed Jurgen Melzer’s were 227.
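Here is a minimal sketch of how expected points can be computed from round-by-round odds.  The points table is the 2011 Grand Slam distribution as I understand it (10 for a first-round loss up to 2000 for the title).  The probabilities are invented for illustration and are not any player's actual odds.

```python
# Ranking points by best finish at a Grand Slam, from first-round loss to title.
SLAM_POINTS = [10, 45, 90, 180, 360, 720, 1200, 2000]

def expected_points(p_reach, points=SLAM_POINTS):
    """Expected ranking points for one player.

    p_reach[i] is the probability of winning at least i + 1 matches,
    so the last entry is the probability of winning the title.
    """
    if len(p_reach) != len(points) - 1:
        raise ValueError("need one probability per round after the first")
    total = points[0]  # everyone in the main draw banks first-round points
    for i, prob in enumerate(p_reach):
        # Each additional win adds the marginal points for the next round.
        total += prob * (points[i + 1] - points[i])
    return total

# Hypothetical odds for a strong favorite (made-up numbers).
favorite = [0.95, 0.90, 0.82, 0.70, 0.55, 0.40, 0.28]
print(f"expected points: {expected_points(favorite):.0f}")
```

Because the points roughly double with each round, most of a top player's expected points come from his chances of reaching the last few rounds.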

What happens if we swap two players’ positions in the draw?  Let’s try #4 and the top-ranked non-seed.  Going into the tournament, #4 was Robin Soderling, and Philipp Kohlschreiber was #34, unseeded.  In the draw as it actually happened, Soderling’s expected points were 515 and Kohlschreiber’s were 56, thanks in part to a 2nd round matchup against Tomas Berdych.

If we exchange Soderling’s and Kohlschreiber’s draw positions and run the simulation, we get very different results.  Soderling’s expected points are 353 (down 31 percent) and Kohlschreiber’s expected points improve to 103 (up 84 percent).

Randomization

The Soderling/Kohlschreiber swap may be an outlier.  We can do better, and besides, I don’t want to type “Kohlschreiber” anymore.

Let’s try a new simulation.  For each run, we’ll randomize the draw positions, so Nadal has an equal chance of drawing Marcos Daniel, Roger Federer, or anybody else in the first round.
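The idea can be sketched with a toy eight-player bracket.  Everything here is an assumption for illustration: the players and their ranking points are made up, and the win model (chance of winning proportional to ranking points) is a stand-in, not the prediction algorithm used for the numbers in this post.

```python
import random

def win_prob(a_points, b_points):
    # Assumed toy model: win chance proportional to ranking points.
    return a_points / (a_points + b_points)

def play_bracket(order, points, rng):
    """Simulate one single-elimination bracket; return match wins per player."""
    wins = {player: 0 for player in order}
    alive = list(order)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            winner = a if rng.random() < win_prob(points[a], points[b]) else b
            wins[winner] += 1
            next_round.append(winner)
        alive = next_round
    return wins

def average_wins(points, seeded_order, n_runs, shuffle, seed=0):
    """Average match wins per player, with a fixed or a randomized draw."""
    rng = random.Random(seed)
    totals = {player: 0 for player in seeded_order}
    for _ in range(n_runs):
        order = list(seeded_order)
        if shuffle:
            rng.shuffle(order)  # every draw position equally likely
        for player, won in play_bracket(order, points, rng).items():
            totals[player] += won
    return {player: total / n_runs for player, total in totals.items()}

# Hypothetical field, strongest (A) to weakest (H).
points = {"A": 8000, "B": 6000, "C": 4000, "D": 3000,
          "E": 1500, "F": 1200, "G": 900, "H": 600}
# Seeded bracket: 1 v 8, 4 v 5, 3 v 6, 2 v 7.
seeded_order = ["A", "H", "D", "E", "C", "F", "B", "G"]

seeded = average_wins(points, seeded_order, 2000, shuffle=False)
randomized = average_wins(points, seeded_order, 2000, shuffle=True)
for player in points:
    print(f"{player}: seeded {seeded[player]:.2f}, random {randomized[player]:.2f}")
```

Swapping average wins for expected points is a matter of replacing the win counter with a points table.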

The differences in the results are substantial.  Nearly 75 percent of players have their expected points change by more than 10 percent.  39 of the 128 players see their expected points decrease with randomization, and those players are disproportionately seeds.  The seeds are disproportionately high seeds.

Two types of players seem to benefit from the status quo:

  • High seeds.  They are guaranteed non-seeded opponents for at least two rounds, and lower-seeded opponents for a round or two after that.
  • Marginal players who get lucky draws.  The player who the Aussie Open draw benefited the most was wild card Benoit Paire.  He was one of the weakest players in the field, but in the first round, he drew Flavio Cipolla, one of the few competitors who was even weaker.

Of the 39 players who do better under the original draw, 19 are seeds and 9 are WCs or qualifiers, mostly in situations like Paire’s.  That leaves only 11 middle-of-the-pack, unseeded players who weren’t disadvantaged by the draw.

If the draw had been randomized, half of the field (mostly unseeded players) would have seen their expected points increase by more than 10 percent.  52 players would have jumped by 20 percent or more, 37 by more than 30 percent, and 22 by more than 50 percent.

Season-long bias

To some extent, the bias is mitigated over the course of a season.  Players like Kohlschreiber are disadvantaged by the draw so long as their ranking stays in the unseeded-but-good 33-50 range for slams, but in smaller tournaments, such a player is often seeded.

And, of course, over 20-30 tournaments, the draws are effectively randomized for some players.  Paire got lucky by drawing Cipolla in Melbourne, but he could just as easily have found himself pitted against a top seed.

As is intuitively obvious, draws are biased in favor of the top players, and that is one thing that isn’t mitigated by a year’s worth of tournaments.  The top 12 seeds all did better in the actual draw simulation than in the randomized simulation, and I expect that would be true for the vast majority of tournaments.

If some players are consistently “winning” through draw bias, there must be losers.  As we’ve seen, lower-ranked players can win big or lose big in a draw, but it stands to reason that, over the course of the season, they lose a little bit.  At least until they overcome the disadvantage and become top-ranked players themselves.


Home Court Advantage Run Amok

Posted in tennis by Jeff on February 6, 2011

This week in Johannesburg is the only time of year that an ATP tour-level event goes to South Africa.  Accordingly, all the South Africans take part, the wild cards are generally awarded to South Africans, and a disproportionate number of entries in the qualifying draw are South Africans.  And they performed unexpectedly well.

Thus, of the main draw of 32, 6 players were South Africans.  They included 4th seed Kevin Anderson (ranked 59th), wild cards Fritz Wolmarans (261), Rik de Voest (183), and Izak van der Merwe (170), along with qualifiers Raven Klaasen (307) and Nikala Scholtz (662).

It isn’t uncommon to see someone ranked as low as Anderson win a 250-level tournament; for example, another local player, 84th-ranked Croatian Ivan Dodig, took the title in Zagreb this week.  But rarely do home favorites make such comprehensive work of a draw.

Anderson won the tournament–though he’s not all that pertinent to our theme, since he outranked every one of his opponents.  All five other South Africans exceeded expectations.

The qualifiers, Klaasen and Scholtz, didn’t win a main draw match, but neither would have been expected to come through qualifying.  Scholtz had to beat Pierre-Ludovic Duclos and Thiago Alves, ranked 443rd and 178th, respectively.  Klaasen had to get past Rajeev Ram, currently ranked 188th but ranked inside the top 80 only a year ago.

Of the wild cards, only Wolmarans failed to reach the quarters.  He did win his first round match against Igor Sijsling, who outranks him by 130 places.

Rik de Voest defeated Stefano Galvani (ranked 321) and 8th seed Michal Przysiezny (81), one of his best ATP-level results.  And van der Merwe made it to the semifinals, beating Stephane Robert, Dustin Brown, and Simon Greul, all players who have spent substantial time in the top 100.

It is tempting to wonder if some locations lend themselves to a greater home court advantage.  South Africa, in particular, is one of the more far-flung spots on the ATP map.

But it would be foolish to draw any conclusions based on one tournament.  After all, last year, South Africans won a grand total of two matches in the Johannesburg main draw.  The results of this year’s event are at least partly due to an unusually weak field: only the top four seeds were among the world’s top 65.  Some challenger-level events may be similarly competitive.

In any event, this week’s results are certainly a boost for tennis in South Africa; maybe the draw will be stronger next year.


The Wild Card Effect

Posted in tennis by Jeff on January 24, 2011

I’ve written before about the types of players awarded wild cards into professional men’s tennis tournaments.  While they can be categorized in different ways, there are two characteristics that are true of almost all wild cards:

  1. Without a wild card, they would not be able to play in the tournament.
  2. Tournament organizers see them as an asset to the event.

The first isn’t quite true; many wild cards would otherwise enter the qualifying draw, and some would reach the main draw that way.  We can still conclude that WCs are, at least according to ATP entry rankings, inferior to other players who appear in the main draw.  The only possible exceptions worth mentioning are qualifiers and other wild cards.

The second doesn’t necessarily tell us anything about the skill level of a player.  Simply having James Blake in the draw probably boosts ticket sales for any event in the U.S.  Other WCs are awarded to promote a tournament in other ways, perhaps by giving one WC to the winner of a junior event, or a special qualifying tournament for local amateurs.

While these cases are common enough, a major factor in the awarding of wild cards is the tournament organizer’s belief that a WC can compete.  So the WC goes to a player returning from injury, or a veteran coming back from retirement.  Or a junior who is rocketing up the rankings, or who has recently won a major collegiate event.

All this is to say, in the aggregate, players granted wild cards are usually better than their ranking says they are.

Thus, when we look at matches with one wild card and one non-wild card and apply my algorithm to predict the winner, we should anticipate that wild cards outperform expectations.

Empirical results

In fact, they do.  The effect is substantial, and it holds at multiple levels of competition.

In testing the hypothesis, I controlled for home court advantage, an important consideration that is easily conflated with the wild card effect.  After all, a large percentage of wild cards are granted to local players, so without careful analysis, it would not be clear how much of the advantage can be attributed to the wild card selection or the benefits of playing in one’s home country.

I ran the numbers with a dataset comprising all ATP main draw, ATP qualifying draw, and Challenger main draw matches from 2008 to 2010.  The results were fairly consistent from year to year.

At the ATP main draw level, the dataset yielded over 900 matches between a wild card and a non-wild card.  The wild card won the match about 15% more often than expected.  We can approximate this effect by multiplying the WC’s ranking points by 1.3.

The other two levels showed even larger effects over about 2600 relevant matches.  In ATP qualifying and Challenger main draw matches, wild cards won more than 25% more often than expected.  We can approximate this effect by multiplying the WC’s ranking points by 1.55.

Commentary

The existence of a positive “wild card effect” is not a surprise, nor is the magnitude.  Essentially, when a player is awarded a wild card, we’re given more information about him than ranking points otherwise offer.

I suspect the difference in magnitude between the higher and lower levels is fairly straightforward, as well.  While some players receive ATP wild cards straight from the amateur ranks, as can be the case with collegiate champions, most ATP wild cards go to somewhat established players on the fringes of success.  These players are often inside the top 150, meaning that they’ve played a lot of professional tournaments, so while their ranking might undervalue them slightly, it is a fairly accurate gauge of their ability level.

By contrast, qualifying and challenger-level wild cards often go to less experienced players.  They may not be full-time professionals or they may spend most of their time playing collegiate or junior tournaments.  They usually have rankings, but the point totals may only be based on a handful of events.

Example from Australia

The most successful wild card in the Australian Open was Aussie youngster Bernard Tomic, who reached the third round, beating Jeremy Chardy and Feliciano Lopez before losing to Rafael Nadal.

As he was a local and a wild card, we now know to adjust his ranking points twice before estimating his likelihood of winning a match.  Instead of estimating his talent with his pre-tourney ranking point total of 239, we adjust upward to 435.  That still puts him as an underdog against Chardy’s 960 points, but it means we would have given him a 30% chance of winning instead of an 18% chance.
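The double adjustment is just two multiplications.  The sketch below uses the ATP main draw wild card multiplier of 1.3 and the two ATP main draw home-court multipliers reported in my home court research notes (1.5 for 2009, 1.35 for 2010).  It is illustrative only: neither combination reproduces the 435 figure exactly, which implies a combined multiplier of about 1.82, in between the two.

```python
WC_MULTIPLIER = 1.3  # ATP main draw wild card effect
HOME_MULTIPLIERS = {"2009 estimate": 1.5, "2010 estimate": 1.35}  # ATP main draw home court

def adjusted_points(points, *multipliers):
    """Apply each multiplier in turn to a raw ranking point total."""
    result = points
    for multiplier in multipliers:
        result *= multiplier
    return result

tomic_points = 239  # pre-tournament ranking points
for label, home_multiplier in HOME_MULTIPLIERS.items():
    adjusted = adjusted_points(tomic_points, WC_MULTIPLIER, home_multiplier)
    print(f"{label}: {tomic_points} -> {adjusted:.0f}")

print(f"implied combined multiplier for 435: {435 / tomic_points:.2f}")
```

With these inputs the adjusted totals land at roughly 419 and 466.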

Of course, the 2011 Australian Open isn’t very instructive here, since six of the other wild cards lost their first matches, while the final WC, Benoit Paire, drew qualifier Flavio Cipolla in the first round, and was a favorite.


Tennis Home Court – Research Notes

Posted in tennis by Jeff on January 23, 2011

I’ve built out my men’s tennis results database quite a bit in the last couple of months, so I thought I’d revisit my research into home court advantage.

To recap, I started with ATP main draw matches from 2009.  I focused on the subset of matches where the tournament was in the home country of one player, but not the other.  I excluded matches where either player was a wild card entry–that usually applies to the home player.  I did so because I think there is a separate “wild card” effect that reflects selection bias.  (Tourney organizers choose players who did not make the cut but whose chances, for whatever reason, are better than their ranking would suggest.)

As I reported in my initial research, using about 450 matches from the 2009 main draw dataset, the home player won 17% more matches than expected.  (“Expected” wins are derived from my bare-bones algorithm to predict the winner of the match.)  Using ranking points, this is roughly equivalent to giving the home player credit for 50% more ranking points than he actually has.

For example, Lleyton Hewitt is currently ranked 54th, with 870 ranking points.  If we make this adjustment for the Australian Open, we’d say he’ll play at a level equal to someone with 1,305 ranking points, which would be 32nd in the world.  Instead of giving him a 36% chance of winning his first round match against David Nalbandian, the home-court-adjusted number would give him a 47% chance.  In this case the results might bear us out: The match went to 9-7 in the fifth set.
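One plausible way to arrive at a figure like "17% more matches than expected" is to add up the predicted win probabilities for the home players and compare that sum with the number of matches they actually won.  The sketch below assumes that reading; the match list is invented, and the function name is hypothetical.

```python
def outperformance(matches):
    """Fraction by which actual wins exceed expected wins.

    matches: list of (predicted_win_prob_for_home_player, home_player_won).
    Returns 0.17 if the home players won 17% more often than predicted.
    """
    expected_wins = sum(prob for prob, _ in matches)
    actual_wins = sum(1 for _, won in matches if won)
    return actual_wins / expected_wins - 1

# Made-up sample: four matches where the home player was a slight underdog.
sample = [(0.40, True), (0.40, False), (0.40, True), (0.40, False)]
print(f"home players won {outperformance(sample):+.0%} relative to expectation")
```

A ranking points multiplier can then be tuned until the adjusted predictions bring this figure back to zero.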

The surprise came when I expanded the dataset to include Challenger main draw matches and ATP-level qualifier matches.  In 2009 Challengers, home players only won 6% more often than expected–equivalent to a ranking points multiplier of 1.15.  In 2009 ATP qualies, the home court advantage was only 2%–a multiplier of about 1.05.  Whatever confers the home court advantage in ATP main draw matches may not apply at all levels.

I next looked at the same datasets for 2010.  Here are the home court advantages (and ranking points multipliers) observed last year:

  • ATP main draw: 12% (1.35)
  • Challenger main draw: 4% (1.1)
  • ATP qualifiers: 14% (1.3)

The first two numbers don’t differ much from the ’09 observations, but the qualifier numbers come out of nowhere.

Until I’m able to look at more matches from before 2009, I hesitate to draw any conclusions about the qualifiers.  That still leaves us with a fairly consistent gap between the home court advantage observed at the ATP main draw and Challenger main draw levels.

To the extent that crowd involvement plays a part, it seems reasonable to expect that players would get a bigger boost on a bigger stage.  Even on outer courts in the early rounds, fans tend to pull for the locals.  At challengers, the atmosphere is often more like a club tournament where the audience is next to nonexistent.

Another major possibility is that some combination of selection bias and the inadequacy of my prediction algorithm accounts for the lack of observed home court advantage in challengers.  Players have more choice of where to play at the lower levels, so they will tend to stay closer to home.  It may mean that, even exclusive of wild cards, the distribution of home-country players and non-home-country players is different; perhaps the bottom ends of challenger draws are disproportionately packed with home-country players.  This is something that I can investigate further.

UPDATE: Just ran the numbers for 2008.  The ATP main draw home court advantage remained consistent, at a 16% boost for the home player.  The ATP qualifier pool also showed the same home court advantage.  However, 2008 differed from later years in that in Challenger main draw matches, home players got an 11% boost, much bigger than in 2009 or 2010.

Marginal ATP Rankings

Posted in tennis by Jeff on January 22, 2011

ATP rankings are frustrating: They are a decent approximation for player skill, but there are so many obvious flaws.  Some of those flaws derive from the problem of needing one number–there’s no accounting for surface, for instance.

The one that frustrates me the most is how much luck is allowed to creep into a player’s ranking.  When a player is awarded points for his performance in a certain tournament, there is no consideration of the skill level of the players he defeated.  So two players who lose in the second round get the same number of points, even if one defeated a 16-year-old wild card in the first round and the other defeated Rafael Nadal in the first round.

There are plenty of arguments in favor of the present way of doing things.

  • First, there’s the circular problem of finding a starting point–if ranking points aren’t an adequate measure of skill, how do you give numerical credit based on the skill of opponents?
  • Second, players don’t display consistent levels of skill; if Milos Raonic is in the fourth round of the Australian Open, he is probably playing better than he was four months ago when he lost in the first round of the U.S. Open.  Perhaps the person who defeats him in Melbourne deserves more points than the guys who beat him in qualifiers and challengers last fall.  Players also display different levels of skill depending on surface; beating Juan Carlos Ferrero is more impressive on clay than on grass, and you’re more likely to do so in a later round on clay.
  • Third, you could say that it all comes out in the wash.  Pros play a lot of tournaments, and while you might only get 20 points for beating a top-10 player in the first round, you might get an additional 90 points for beating an unseeded player three rounds later.

We could settle for the status quo, or we could experiment with a different approach and test it.  Testing these things is an enormous task, so for today I’m just presenting the experiment itself.

Opponent-based point awards

I looked at all ATP-level main draw and qualifying draw matches, along with Challenger-level main draw matches.  I figured out the marginal points awarded to the winner of each match (e.g., by winning in the third round in the Aussie Open, you get 180 points instead of 90 points, for 90 marginal points) and the ranking points of the loser at the time of the match.

For instance, when Nadal beat Federer in the Madrid final, Nadal was awarded 400 marginal points, and Federer had 10,690 ranking points.  Add up those two types of points, and it turns out that the total marginal points awarded in these matches are approximately 4.5% of the ranking points of the losers.

Thus, if we use a simple linear model, instead of giving Nadal 400 marginal points for winning that match, we give him 4.5% of 10,690, or 463 points.  In this case, not a big difference.  But when top players are upset in early rounds, the adjustment is huge.

To take a very different example: In Miami last year, Olivier Rochus beat Novak Djokovic in the round of 64.  For advancing to the round of 32, Rochus earned 20 marginal points.  Djokovic’s ranking point total at that point was 8,220, so if we give Rochus 4.5% of that, he gets 365 points.  As we’ll see, that single adjustment rockets him up the rankings.
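The linear model is simple enough to sketch in a few lines.  The coefficient is estimated as total marginal points divided by total loser ranking points, then applied to each match.  The function names are made up for illustration, and the two matches are the examples from this post.  Using the rounded 4.5% figure gives about 481 and 370 points, slightly above the 463 and 365 quoted above, so the fitted coefficient was presumably a little under 4.5%.

```python
def fit_coefficient(matches):
    """Estimate the share of the loser's ranking points awarded per win.

    matches: list of (marginal_points_awarded, loser_ranking_points).
    """
    total_marginal = sum(marginal for marginal, _ in matches)
    total_loser_points = sum(loser for _, loser in matches)
    return total_marginal / total_loser_points

def opponent_based_points(loser_points, coefficient=0.045):
    """Points for a win under the linear opponent-based model."""
    return coefficient * loser_points

examples = [
    ("Nadal d. Federer, Madrid final", 400, 10690),
    ("Rochus d. Djokovic, Miami round of 64", 20, 8220),
]
for label, marginal, loser_points in examples:
    new_points = opponent_based_points(loser_points)
    print(f"{label}: {marginal} marginal points -> {new_points:.0f} opponent-based")
```

In practice `fit_coefficient` would be run over the full match dataset rather than a couple of hand-picked results.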

Pros and Cons

Compared to the present ATP ranking system, this approach gives more credit to the players who are capable of a top-10 performance, even if they play at that level very rarely.  As we’ll see, a single major upset can make a huge difference, so perhaps it too heavily weighs a single match.  If Rochus happened to play Djokovic on a day when Djokovic had the flu, does he really deserve 365 points?

Another potential problem is that this model doesn’t consider the level of the opponents that a player loses to.  Nikolay Davydenko is known for his ability to beat Federer or Nadal, but in consecutive weeks in October, he lost to Pablo Cuevas and Mischa Zverev.  Should we rank someone based on their ability to defeat “better” players, or their inability to defeat “lesser” players?  As always, the standard ATP ranking system appears to be a decent compromise.

For my purposes, what matters is how well a ranking system predicts future results.  I hope that soon I’ll be able to report on how this one performs.

In the meantime, here are the 2010 year-end top 100, using the opponent-based model I’ve described.  I’ve also included each player’s actual 2010 year-end ranking and the difference between their placement in the two systems.

Rk   Player                   Pts  Actual  Diff  
1    Rafael Nadal            4562       1     0  
2    Roger Federer           4529       2     0  
3    Robin Soderling         3905       5     2  
4    David Ferrer            3450       7     3  
5    Andy Murray             3347       4    -1  
6    Tomas Berdych           2891       6     0  
7    Jurgen Melzer           2772      11     4  
8    Novak Djokovic          2730       3    -5  
9    Fernando Verdasco       2697       9     0  
10   Andy Roddick            2331       8    -2  
11   Gael Monfils            2266      12     1  
12   Nikolay Davydenko       2070      22    10  
13   Mikhail Youzhny         2059      10    -3  
14   Ivan Ljubicic           1952      17     3  
15   Guillermo Garcia-Lopez  1948      33    18  
16   Nicolas Almagro         1908      15    -1  
17   Marcos Baghdatis        1859      20     3  
18   Marin Cilic             1843      14    -4  
19   Albert Montanes         1832      25     6  
20   Michael Llodra          1822      23     3  
21   Ernests Gulbis          1695      24     3  
22   Viktor Troicki          1654      28     6  
23   Mardy Fish              1646      16    -7  
24   Jo-Wilfried Tsonga      1577      13   -11  
25   Stanislas Wawrinka      1573      21    -4  
26   Richard Gasquet         1570      30     4  
27   Florian Mayer           1497      37    10  
28   John Isner              1473      19    -9  
29   Philipp Kohlschreiber   1456      34     5  
30   Feliciano Lopez         1396      32     2  
31   David Nalbandian        1395      27    -4  
32   Juan Monaco             1394      26    -6  
33   Samuel Querrey          1353      18   -15  
34   Xavier Malisse          1342      60    26  
35   Jeremy Chardy           1272      45    10  
36   Andrei Goloubev         1236      36     0  
37   Juan Carlos Ferrero     1227      29    -8  
38   Jarkko Nieminen         1214      39     1  
39   Gilles Simon            1180      41     2  
40   Janko Tipsarevic        1178      49     9  
41   Benjamin Becker         1145      53    12  
42   Michael Berrer          1144      58    16  
43   Thomaz Bellucci         1125      31   -12  
44   Alexander Dolgopolov    1058      48     4  
45   Denis Istomin           1058      40    -5  
46   Andreas Seppi           1047      52     6  
47   Thiemo de Bakker        1033      43    -4  
48   Potito Starace          1031      47    -1  
49   Daniel Gimeno            983      56     7  
50   Olivier Rochus           971     113    63  
51   Lleyton Hewitt           944      54     3  
52   Julien Benneteau         941      44    -8  
53   Marcel Granollers        937      42   -11  
54   Juan Ignacio Chela       872      38   -16  
55   Pablo Cuevas             868      63     8  
56   Tommy Robredo            851      50    -6  
57   Philipp Petzschner       817      57     0  
58   Sergey Stakhovsky        813      46   -12  
59   Dudi Sela                808      75    16  
60   Santiago Giraldo         805      64     4  
61   Mischa Zverev            797      82    21  
62   Radek Stepanek           789      62     0  
63   Fabio Fognini            784      55    -8  
64   Mikhail Kukushkin        781      59    -5  
65   Yen-Hsun Lu              745      35   -30  
66   Igor Andreev             722      79    13  
67   Carlos Berlocq           719      66    -1  
68   Ryan Sweeting            716     116    48  
69   Teimuraz Gabashvili      714      80    11  
70   Arnaud Clement           703      78     8  
71   Lukas Lacko              691      89    18  
72   Tobias Kamke             652      67    -5  
73   Pere Riba                646      72    -1  
74   Rainer Schuettler        642      84    10  
75   Robin Haase              626      65   -10  
76   Florent Serra            625      69    -7  
77   Leonardo Mayer           622      94    17  
78   Rui Machado              618      93    15  
79   Kevin Anderson           596      61   -18  
80   Albert Ramos             595     123    43  
81   Ivo Karlovic             567      73    -8  
82   Frederico Gil            554     101    19  
83   Daniel Brands            546     104    21  
84   Alejandro Falla          544     105    21  
85   Simon Greul              534     130    45  
86   Simone Bolelli           521     107    21  
87   Filippo Volandri         509      91     4  
88   Ilia Marchenko           488      81    -7  
89   Marco Chiudinelli        486     117    28  
90   Filip Krajinovic         483     214   124  
91   Victor Hanescu           481      51   -40  
92   Bjorn Phau               479     102    10  
93   Ivan Dodig               478      88    -5  
94   Kei Nishikori            477      98     4  
95   Evgueni Korolev          468     140    45  
96   James Blake              467     135    39  
97   Ruben Ramirez-Hidalgo    466      77   -20  
98   Ricardo Mello            461      76   -22  
99   Grigor Dimitrov          460     106     7  
100  Brian Dabul              458      85   -15

Predictiveness of ATP Rankings – Research Notes

Posted in tennis by Jeff on January 19, 2011

I’m working on some bigger projects right now that might take some time before they see the light of day. In the meantime, here are a couple of things I’ve discovered about ATP rankings and their use in predicting the outcome of matches.

1. In my earlier research, I found that in the “buckets” of matches that the favorite is most likely to win, my algorithm is still reasonably accurate.  In other words, if the ranking points predict that Nadal, say, has a 98% chance of beating the 140th ranked player, his chances are in fact that high.  The algorithm was as accurate on the extreme high end as it was anywhere else on the spectrum.

However, I only included matches in my sample where both players were ranked inside the top 200.  I thought that was an innocuous enough cutoff, but I see now why it was misleading.  If we limit the sample that way, the most extreme favorites will only be the very top players.  In fact, the only players whom my algorithm gives a 95% chance of beating the 200th ranked player are the top 5.

When I expanded the sample to include players ranked outside of the top 200, the high end broke down.  In other words, in the bucket of matches where the favorite had a 90% or better chance of winning, the favorite didn’t win nearly that often.

There are several possible explanations for this, none of which account for the entire effect, but many of which surely play a part:

  • I’m still only looking at ATP-level matches, and if a player outside of the top 200 is in an ATP main draw match, he was not exactly randomly selected.  He may be playing at “home” on a wild card, he may be hot after a solid week in qualifying, he is probably on his favorite surface, and his ranking may be misleading due to injury.
  • Outside of the top 5 or top 10, players are substantially less consistent.  It’s tough to imagine Robin Soderling losing to a qualifier right now, but easy to see, say, Fernando Verdasco or Ivan Ljubicic doing so.

More fundamentally, I suspect that the further down the rankings you go, the less the difference in points really means.  Certainly there’s much more movement: once you get outside the top 50, one good showing can easily gain you 10, 20, or more spots.  That doesn’t mean that a player is suddenly more skilled, which is the way my algorithm has to treat him.

Controlling for surface, wild card status, and more will help reconcile some of these differences, but ultimately, matches between drastically mismatched (on paper) opponents may have to be treated differently than matches between more closely matched peers.
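
The bucket check described in point 1 can be sketched roughly as follows. This is only an illustration, not the code behind the numbers above: the function name `calibration_by_bucket`, the 10-point bucket width, and the input format (pairs of predicted favorite win probability and whether the favorite won) are all assumptions.

```python
import math
from collections import defaultdict

def calibration_by_bucket(matches):
    """Compare predicted and actual favorite win rates in 10-point buckets.

    matches: iterable of (prob, favorite_won) pairs, where prob is the
    predicted chance (0.5 to 1.0) that the favorite wins.
    Returns {bucket lower bound in percent: (n, mean predicted, actual rate)}.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # matches, sum of probs, favorite wins
    for prob, favorite_won in matches:
        # Work in whole percents to avoid float surprises such as 0.7 / 0.1 < 7.
        pct = math.floor(prob * 100 + 1e-9)
        lower = min(pct // 10 * 10, 90)  # a 100% prediction joins the 90+ bucket
        row = totals[lower]
        row[0] += 1
        row[1] += prob
        row[2] += 1 if favorite_won else 0
    return {
        lower: (n, prob_sum / n, wins / n)
        for lower, (n, prob_sum, wins) in sorted(totals.items())
    }
```

If the algorithm is well calibrated, the mean predicted rate and the actual rate in each bucket should be close.  The breakdown described above would show up as an actual rate well below the mean prediction in the 90 bucket.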

2. Eliminating some quirks of the ATP ranking system doesn’t break it, at least not for my purposes.  In the process of my current projects, I wanted to be able to more easily tweak the parameters of the ranking system, so I started by rebuilding the existing one.  But there are a lot of quirks:

  • The top 4 or 5 players get a lot of points from the Tour Championships.
  • Davis Cup players get points.
  • Rankings are limited to a player’s top 18 tournaments, but there are some limitations on what those tournaments must be, resulting in cases where a player gets credited for a poor showing at a grand slam, but does not get credited for a better showing (worth more points) at a smaller tournament.

All of these quirks have their purposes, given that the ATP’s priorities are built around keeping fans interested and ensuring that top players focus on the most important events.  But they are a pain in the butt to incorporate in an on-the-fly system, so I just ignored them.

And as it turns out, they are not affecting my results in any meaningful way.  I’ve re-run a couple of earlier projects with my “improvised” rankings, and nothing is changing by more than a percent or two.  Occasionally the effect is strong on a certain player (I think the improvised system bumps Juan Carlos Ferrero from #29 to inside the top 15 at 2010 year-end), but in the aggregate, it makes no difference.
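
For illustration, a stripped-down ranking along these lines might look like the sketch below.  It assumes the simplest possible reading of "ignoring the quirks": each player's total is just the sum of his best 18 tournament results, with no required events.  The function names and the input format are hypothetical, and the caller is assumed to have already left out Tour Championships and Davis Cup points.

```python
def improvised_points(results, best_n=18):
    """Sum a player's best_n tournament results (points per event).

    Assumption: no event is mandatory, so a poor grand slam showing is
    simply displaced by a better result at a smaller tournament.
    """
    return sum(sorted(results, reverse=True)[:best_n])

def rank_players(results_by_player, best_n=18):
    """Return (player, points) pairs sorted from most to fewest points."""
    totals = {
        player: improvised_points(results, best_n)
        for player, results in results_by_player.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Changing `best_n` is the kind of parameter tweak that becomes trivial once the special cases are gone.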

2011 Aussie Open Simulation Results

Posted in programming, tennis by Jeff on January 16, 2011

Using my simple ranking-points-based algorithm to determine the odds that each player wins a match, I ran simulations using the 2011 Australian Open draw.

As usual, the keyword is “simple,” and you can easily find all sorts of intuitive reasons to discount the results.  There’s no consideration for surface, so clay-court specialists are generally overrated.  Players returning from injury (Del Potro, especially, and Karlovic) have taken a hit in the rankings, and are thus underrated here, as well.

I’m also publishing the code that I use to generate these sims. It should work for any single-elimination tournament up to 128 competitors, and is easily expandable to handle larger brackets.  The function ‘calcWP’ is specific to my tennis algorithm, but you could swap in something like log5 very easily. I also included the .csv file I used for the draw, so you can see the format, or tinker with the parameters and come up with your own Aussie sim.
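
For readers who just want the general shape of such a simulation, here is a minimal sketch.  It is not the published code: `simulate_draw` and its arguments are hypothetical names, and log5 stands in for ‘calcWP’, which means the ratings passed in must be win rates strictly between 0 and 1 rather than ranking points.

```python
import random

def log5(a, b):
    """Log5 estimate that a side with win rate a beats a side with win rate b.

    Both rates must be strictly between 0 and 1.
    """
    return (a - a * b) / (a + b - 2 * a * b)

def simulate_draw(draw, win_prob, n_sims=10000, seed=0):
    """Monte Carlo simulation of a single-elimination bracket.

    draw: list of (name, rating) in bracket order; length must be a power of two.
    win_prob: function(rating_a, rating_b) -> chance that the first player wins.
    Returns {name: [share of sims in which the player won round 1, round 2, ...]};
    the last entry is the share of sims in which he won the title.
    """
    rng = random.Random(seed)
    n_rounds = len(draw).bit_length() - 1  # 128 players -> 7 rounds
    counts = {name: [0] * n_rounds for name, _ in draw}
    for _ in range(n_sims):
        alive = list(draw)
        for rnd in range(n_rounds):
            winners = []
            # Adjacent entries in the list meet each other in this round.
            for i in range(0, len(alive), 2):
                a, b = alive[i], alive[i + 1]
                winner = a if rng.random() < win_prob(a[1], b[1]) else b
                counts[winner[0]][rnd] += 1
                winners.append(winner)
            alive = winners
    return {name: [c / n_sims for c in row] for name, row in counts.items()}
```

With a 128-player draw, each player's list has seven entries, matching the R64 through W columns in the table below.  Swapping in a different `win_prob` function changes the model without touching the bracket logic.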

Your 2011 Australian Open…

Player               points    R64    R32    R16     QF     SF      F      W  
Nadal             1   12390  96.9%  92.7%  87.0%  78.1%  66.1%  49.6%  34.5%  
Daniel                  564   3.1%   1.4%   0.5%   0.1%   0.0%   0.0%   0.0%  
Sweeting          Q     486  35.3%   1.6%   0.5%   0.1%   0.0%   0.0%   0.0%  
Gimeno-Traver           844  64.7%   4.3%   1.9%   0.7%   0.2%   0.0%   0.0%  
Tomic             W     239  17.9%   3.1%   0.1%   0.0%   0.0%   0.0%   0.0%  
Chardy                  960  82.1%  39.6%   3.7%   1.4%   0.4%   0.1%   0.0%  
Falla                   540  27.3%  11.3%   0.7%   0.2%   0.0%   0.0%   0.0%  
Lopez F          31    1310  72.7%  46.0%   5.6%   2.6%   0.9%   0.2%   0.0%  
                                                                              
Isner            20    1850  74.0%  56.8%  31.7%   5.8%   2.5%   0.8%   0.2%  
Serra                   711  26.0%  14.0%   4.6%   0.5%   0.1%   0.0%   0.0%  
Stepanek                735  62.1%  20.4%   6.9%   0.6%   0.2%   0.0%   0.0%  
Gremelmayr        Q     469  37.9%   8.8%   2.2%   0.1%   0.0%   0.0%   0.0%  
Machado                 573  41.2%  10.2%   3.3%   0.2%   0.0%   0.0%   0.0%  
Giraldo                 785  58.8%  18.3%   7.2%   0.7%   0.2%   0.0%   0.0%  
Young D           Q     435  14.6%   5.4%   1.4%   0.1%   0.0%   0.0%   0.0%  
Cilic            15    2140  85.4%  66.1%  42.8%   8.7%   4.1%   1.4%   0.4%  
                                                                              
Youzhny          10    2920  85.6%  70.1%  51.9%  29.2%   8.1%   3.3%   1.1%  
Ilhan                   574  14.4%   6.2%   2.2%   0.5%   0.0%   0.0%   0.0%  
Kavcic            Q     552  38.0%   7.1%   2.4%   0.5%   0.0%   0.0%   0.0%  
Anderson K              868  62.0%  16.6%   7.4%   2.1%   0.2%   0.0%   0.0%  
Raonic            Q     351  36.4%   6.8%   1.0%   0.2%   0.0%   0.0%   0.0%  
Phau                    581  63.6%  18.0%   4.1%   0.9%   0.1%   0.0%   0.0%  
Chela                  1070  39.3%  27.8%   9.9%   3.2%   0.4%   0.1%   0.0%  
Llodra           22    1575  60.7%  47.4%  21.0%   8.8%   1.6%   0.4%   0.1%  
                                                                              
Nalbandian       27    1480  64.2%  49.1%  18.4%   8.2%   1.4%   0.4%   0.1%  
Hewitt                  870  35.8%  23.1%   6.1%   2.0%   0.2%   0.0%   0.0%  
Berankis                589  61.1%  19.1%   3.9%   1.0%   0.1%   0.0%   0.0%  
Matosevic         W     392  38.9%   8.8%   1.4%   0.2%   0.0%   0.0%   0.0%  
Russell                 547  67.0%  10.1%   3.6%   0.8%   0.1%   0.0%   0.0%  
Ebden             W     288  33.0%   2.7%   0.6%   0.1%   0.0%   0.0%   0.0%  
Nieminen               1062  20.2%  14.5%   7.7%   2.8%   0.4%   0.1%   0.0%  
Ferrer            7    3735  79.8%  72.7%  58.4%  39.4%  12.6%   5.8%   2.4%  
                                                                              
Soderling         4    5785  87.9%  83.6%  71.9%  58.3%  35.9%  15.6%   7.9%  
Starace                 945  12.1%   8.9%   4.2%   1.7%   0.3%   0.0%   0.0%  
Muller            Q     466  76.9%   6.9%   2.0%   0.5%   0.1%   0.0%   0.0%  
Stadler           Q     155  23.1%   0.7%   0.1%   0.0%   0.0%   0.0%   0.0%  
Istomin                1031  86.2%  41.8%   8.8%   3.6%   0.8%   0.1%   0.0%  
Hernych           Q     196  13.8%   1.9%   0.1%   0.0%   0.0%   0.0%   0.0%  
Mello                   627  30.0%  12.8%   1.9%   0.6%   0.1%   0.0%   0.0%  
Bellucci         30    1355  70.0%  43.5%  11.0%   5.3%   1.4%   0.2%   0.1%  
                                                                              
Gulbis           24    1505  64.3%  41.5%  20.7%   6.3%   1.9%   0.4%   0.1%  
Becker                  870  35.7%  17.9%   6.4%   1.3%   0.2%   0.0%   0.0%  
Dolgopolov              928  53.6%  22.8%   8.6%   1.8%   0.4%   0.0%   0.0%  
Kukushkin               815  46.4%  17.9%   6.3%   1.2%   0.2%   0.0%   0.0%  
Seppi                   900  59.6%  19.2%   8.7%   1.9%   0.4%   0.0%   0.0%  
Clement                 627  40.4%   9.9%   3.5%   0.6%   0.1%   0.0%   0.0%  
Petzschner              839  24.3%  12.6%   5.5%   1.1%   0.2%   0.0%   0.0%  
Tsonga           13    2345  75.7%  58.2%  40.4%  15.8%   6.3%   1.6%   0.5%  
                                                                              
Melzer           11    2785  91.2%  77.7%  54.3%  22.9%  10.4%   3.0%   1.0%  
Millot            Q     334   8.8%   3.3%   0.7%   0.1%   0.0%   0.0%   0.0%  
Ball              W     344  32.5%   4.1%   0.9%   0.1%   0.0%   0.0%   0.0%  
Riba                    672  67.5%  14.8%   5.5%   1.0%   0.2%   0.0%   0.0%  
Sela                    568  77.8%  21.8%   5.0%   0.7%   0.1%   0.0%   0.0%  
Del Potro               180  22.2%   2.4%   0.2%   0.0%   0.0%   0.0%   0.0%  
Zemlja            Q     376  15.1%   6.9%   1.2%   0.1%   0.0%   0.0%   0.0%  
Baghdatis        21    1785  84.9%  68.9%  32.2%  10.7%   3.8%   0.8%   0.2%  
                                                                              
Garcia-Lopez     32    1300  62.1%  44.0%  10.6%   4.2%   1.2%   0.2%   0.0%  
Berrer                  835  37.9%  22.8%   3.9%   1.1%   0.2%   0.0%   0.0%  
Schwank                 580  50.6%  16.9%   2.3%   0.5%   0.1%   0.0%   0.0%  
Mayer L                 572  49.4%  16.3%   2.1%   0.4%   0.1%   0.0%   0.0%  
Marchenko               624  49.3%   5.5%   2.3%   0.6%   0.1%   0.0%   0.0%  
Ramirez Hidalgo         638  50.7%   5.7%   2.4%   0.6%   0.1%   0.0%   0.0%  
Beck K                  543   7.0%   3.2%   1.2%   0.3%   0.0%   0.0%   0.0%  
Murray            5    5760  93.0%  85.5%  75.3%  56.7%  35.5%  15.6%   7.9%  
                                                                              
Berdych           6    3955  96.4%  78.5%  63.1%  42.3%  22.0%   9.6%   3.4%  
Crugnola          Q     194   3.6%   0.5%   0.1%   0.0%   0.0%   0.0%   0.0%  
Kohlschreiber          1215  63.8%  15.2%   8.3%   3.1%   0.9%   0.2%   0.0%  
Kamke                   724  36.2%   5.8%   2.4%   0.7%   0.1%   0.0%   0.0%  
Harrison          W     313  32.3%   6.7%   0.6%   0.1%   0.0%   0.0%   0.0%  
Mannarino               612  67.7%  22.8%   3.9%   0.9%   0.1%   0.0%   0.0%  
Dancevic          Q     172   9.0%   2.2%   0.1%   0.0%   0.0%   0.0%   0.0%  
Gasquet          28    1385  91.0%  68.3%  21.5%   8.8%   2.5%   0.6%   0.1%  
                                                                              
Davydenko        23    1555  60.0%  41.5%  17.1%   6.5%   2.0%   0.5%   0.1%  
Mayer F                1073  40.0%  23.9%   8.0%   2.3%   0.6%   0.1%   0.0%  
Fognini                 855  59.6%  22.7%   6.5%   1.7%   0.3%   0.0%   0.0%  
Nishikori               599  40.4%  12.0%   2.7%   0.5%   0.1%   0.0%   0.0%  
Zverev                  611  38.3%   7.2%   2.4%   0.5%   0.1%   0.0%   0.0%  
Tipsarevic              935  61.7%  16.0%   7.2%   2.0%   0.4%   0.1%   0.0%  
Schuettler              597  13.5%   5.8%   1.9%   0.4%   0.1%   0.0%   0.0%  
Verdasco          9    3240  86.5%  71.1%  54.1%  30.3%  14.2%   5.7%   1.8%  
                                                                              
Almagro          14    2160  84.5%  68.0%  41.9%  15.4%   6.8%   2.2%   0.5%  
Robert            Q     460  15.5%   6.6%   1.8%   0.2%   0.0%   0.0%   0.0%  
Andreev                 622  52.1%  13.7%   4.5%   0.8%   0.1%   0.0%   0.0%  
Volandri                574  47.9%  11.8%   3.6%   0.5%   0.1%   0.0%   0.0%  
Cipolla           Q     190  32.6%   3.4%   0.4%   0.0%   0.0%   0.0%   0.0%  
Paire             W     366  67.4%  12.5%   2.5%   0.3%   0.0%   0.0%   0.0%  
Luczak            W     400  14.7%   8.6%   1.9%   0.2%   0.0%   0.0%   0.0%  
Ljubicic         17    1965  85.3%  75.5%  43.4%  15.1%   6.2%   1.8%   0.4%  
                                                                              
Troicki          29    1385  86.2%  64.4%  16.2%   7.2%   2.4%   0.5%   0.1%  
Tursunov                263  13.8%   4.5%   0.3%   0.0%   0.0%   0.0%   0.0%  
Dabul                   584  58.6%  19.8%   2.7%   0.7%   0.1%   0.0%   0.0%  
Mahut             Q     424  41.4%  11.3%   1.1%   0.2%   0.0%   0.0%   0.0%  
Karlovic                670  52.8%   6.2%   2.5%   0.7%   0.1%   0.0%   0.0%  
Dodig                   606  47.2%   5.0%   2.0%   0.5%   0.1%   0.0%   0.0%  
Granollers              993  11.6%   7.2%   3.6%   1.4%   0.4%   0.1%   0.0%  
Djokovic          3    6240  88.4%  81.6%  71.5%  56.9%  40.2%  21.9%  10.2%  
                                                                              
Roddick           8    3565  88.5%  78.1%  61.4%  42.2%  16.8%   8.1%   2.7%  
Hajek                   560  11.5%   5.8%   2.0%   0.4%   0.0%   0.0%   0.0%  
Przysiezny              590  51.7%   8.5%   2.9%   0.7%   0.1%   0.0%   0.0%  
Kunitsyn                551  48.3%   7.6%   2.5%   0.7%   0.1%   0.0%   0.0%  
Berlocq                 725  47.1%  16.8%   4.0%   1.2%   0.2%   0.0%   0.0%  
Haase                   803  52.9%  20.0%   5.2%   1.7%   0.3%   0.0%   0.0%  
Benneteau               965  38.5%  21.8%   6.3%   2.3%   0.4%   0.1%   0.0%  
Monaco           26    1480  61.5%  41.5%  15.7%   7.2%   1.7%   0.5%   0.1%  
                                                                              
Wawrinka         19    1855  76.7%  52.3%  28.1%  12.8%   3.5%   1.2%   0.2%  
Gabashvili              626  23.3%   9.4%   2.6%   0.6%   0.1%   0.0%   0.0%  
Dimitrov          Q     518  29.5%   7.5%   1.8%   0.4%   0.0%   0.0%   0.0%  
Golubev                1135  70.5%  30.8%  12.7%   4.2%   0.8%   0.2%   0.0%  
Gil                     551  40.2%   8.3%   2.4%   0.5%   0.0%   0.0%   0.0%  
Cuevas                  790  59.8%  16.4%   6.1%   1.6%   0.2%   0.0%   0.0%  
De Bakker               950  25.2%  14.9%   6.2%   1.9%   0.3%   0.1%   0.0%  
Monfils          12    2560  74.8%  60.4%  40.2%  21.5%   7.2%   2.9%   0.7%  
                                                                              
Fish             16    1996  70.1%  52.0%  32.0%   8.2%   3.9%   1.3%   0.3%  
Hanescu                 915  29.9%  16.4%   6.8%   1.0%   0.3%   0.0%   0.0%  
Robredo                 915  65.2%  23.4%   9.9%   1.5%   0.4%   0.1%   0.0%  
Devvarman               514  34.8%   8.2%   2.4%   0.2%   0.0%   0.0%   0.0%  
Stakhovsky              925  64.4%  24.8%  10.2%   1.6%   0.4%   0.1%   0.0%  
Brands                  541  35.6%   9.3%   2.6%   0.3%   0.1%   0.0%   0.0%  
Kubot                   670  24.5%  11.4%   3.9%   0.5%   0.1%   0.0%   0.0%  
Querrey          18    1860  75.5%  54.5%  32.1%   7.8%   3.4%   1.1%   0.2%  
                                                                              
Montanes         25    1495  74.3%  48.4%   8.8%   4.5%   1.7%   0.5%   0.1%  
Brown                   573  25.7%  10.3%   0.9%   0.2%   0.0%   0.0%   0.0%  
Andujar                 683  40.9%  14.9%   1.4%   0.4%   0.1%   0.0%   0.0%  
Malisse                 956  59.1%  26.4%   3.4%   1.4%   0.4%   0.1%   0.0%  
Lu                     1141  53.7%   6.2%   3.2%   1.4%   0.5%   0.1%   0.0%  
Simon                  1005  46.3%   4.8%   2.3%   0.9%   0.3%   0.1%   0.0%  
Lacko                   553   4.3%   1.4%   0.4%   0.1%   0.0%   0.0%   0.0%  
Federer           2    9245  95.7%  87.6%  79.6%  70.0%  56.6%  40.3%  22.4%