The Summer of Jeff

2009 US Open Men’s Qualifiers

Posted in tennis by Jeff on August 30, 2009

I spent most of the last week at the National Tennis Center watching men’s US Open qualifying matches. I saw almost all of the guys who ended up qualifying. Here are some tidbits about each one.  Each player’s country and current ATP ranking are in parentheses.

Giovanni Lapentti (ECU – 182)

I saw Giovanni’s first-round match against Nicolas Mahut.  Mahut is a smart player, but simply couldn’t keep up with Lapentti on the ground.  Lapentti looked to have a solid all-around game, but without any one facet that stands out.  It’s tough to imagine him ever breaking the top fifty, or presenting much of a threat in the main draw.  His first-round matchup is with Simon Greul.  If he makes it through that one, he’ll get Roger Federer.

Donald Young (USA – 185)

Young didn’t get a wild card into the main draw, but he managed to fight his way in anyway.  I saw his first-rounder with Marco Crugnola.  Crugnola (ranked #199) looked to me like the better player, but neither competitor looked particularly strong mentally.  Young choked less, and he went on to beat Guillermo Olaso (who I was impressed with, incidentally) and Lukas Rosol.  He gets Tommy Robredo in the first round, and I can’t imagine it’s worth speculating what would happen after that.

Michael Yani (USA – 248)

I hadn’t heard of Yani before this week, and just happened to see most of his qualifying-round match against Peter Luczak.  Luczak is currently ranked #78 (more on him in a bit), so I was surprised that this match was close at all.  Yani looked as good as anyone I saw all week.  Big serve, nice forehand, but most of all, rock-solid consistent throughout the match.  He got better as each set progressed and put away Luczak with relative ease in a second-set tiebreak.  I’ll be rooting for him, but probably not for long, since he gets Sam Querrey in the first round.

Carsten Ball (AUS – 154)

The tall, big-serving lefty got his big break this summer, reaching the finals in LA.  But he just barely made it into the main draw.  I saw his second-round match against South African Rik De Voest, a player I like a lot.  It was a choke-fest that went to a third-set tiebreak.  De Voest played better but choked more, including a disaster in that final tiebreak.  Ball is one-dimensional, but Ivo Karlovic can tell you that one dimension can get you a long way in this game.

Ball draws another qualifier, Argentine Juan Pablo Brzezicki.  I didn’t see Brzezicki so I don’t have anything to say about him.  As the Aussie showed in LA, the booming lefty serve is a threat to anyone.  But…if he beats Brzezicki, he’ll get Novak Djokovic.  Maybe next year.

Jesse Witten (USA – 270)

I didn’t see Witten in this year’s qualies, but I’ve seen him several times in the past.  He’ll always be a favorite of mine thanks to a late-night outer-court match a few years ago against Paul Goldstein.  It went five sets and Witten was entertaining for every minute of it.

He’s a fun player to watch, but he’s eked every last bit of performance out of his game.  He beat Go Soeda, Stephane Bohli, and Alexander Peya to get this far.  Not a bad week, and he may yet make Igor Andreev work to get to the second round.

Somdev Devvarman (IND – 161)

I saw parts of two of Devvarman’s matches this week–a relatively easy win against Alex Bogomolov Jr. and then a three-setter against Igor Sijsling.  Devvarman has an interesting game, mostly based on speed and consistency.  He hits his groundies with a ton of topspin, often dropping them short, but never letting his opponent get into a rhythm with a consistent depth.  That tactic was certainly driving Bogomolov nuts.

Most impressive was Devvarman’s straight-sets dispatching of Jerzy Janowicz, the big-serving 18-year-old from Poland.  The Indian may have the best shot of all the qualifiers to make some noise in the tournament.  He gets Frederico Gil in the first round, then a probable matchup with Philipp Kohlschreiber.  Even after that, it’s likely Radek Stepanek in the third round.  Devvarman shouldn’t be favored in any of those contests, but he’s a legitimate threat.

Alejandro Falla (COL – 159)

I’ve seen the Colombian in qualifying for several years running.  He’s great fun to watch, running down every ball and driving his opponents crazy with seemingly impossible lobs.  He draws Tommy Haas in the first round, which isn’t a particularly good matchup for Falla, who appears every bit as emotionally unstable as the German.

Marsel Ilhan (TUR – 230)

Before this week, I didn’t know they played tennis in Turkey.  As it turns out, Ilhan is not only his country’s top-ranked player, but also the only Turk in the top thousand.  I saw parts of all three of his matches, and frankly I’m surprised he has gotten this far.

On Wednesday, he faced 17-year-old American wildcard Ryan Harrison, a big server with tons of potential.  Harrison took the second set 6-0 before losing the feel for his serve and going down two breaks in the final set.  Ilhan served for the match at 5-2 but did some faltering of his own before eventually winning 7-5.  He beat a cranky Frenchman, Sebastian De Chaunac, 7-6 7-6 in the second round before easily dispatching Ricardo Mello Saturday.

As might be inferred from the performance against Harrison, Ilhan isn’t the most consistent player.  He’s a typical big serve/forehand package with somewhat unorthodox groundstrokes.  They get the job done, though–I can’t remember the last time I saw more groundies drop right on the baseline than Ilhan placed there against Mello.  He did seem to get stronger as the week progressed; perhaps he’s gotten more comfortable playing on a relatively big stage.  He has a chance to make a mark next week, with Christophe Rochus in the first round, followed by the winner of Isner-Hanescu.

Peter Luczak (AUS – 78)

Luczak was unlucky in that he didn’t make the cut (around #104) when the US Open entry list was determined, but lucky in that he made the main draw anyway after losing in the final round of qualifying.  He lost to the aforementioned Michael Yani, and he was flat-out outplayed.  I don’t have much to say about Luczak–he looked like a solid player, especially from the baseline, but he couldn’t do enough against Yani.  He gets Viktor Troicki in the first round, a winnable match that he probably won’t win.

Michael Berrer (GER – 120)

I watched most of Berrer’s first-rounder against Sam Warburg, and I am now a fan of his.  He suffered from a smattering of bad calls toward the end of the third set, and after one exchange with the umpire, he yelled, pleadingly: “What is your mission?!”  Warburg was hurting, and that’s the only reason Berrer got through the match in what turned out to be an easy third-set tiebreak.

That said, I liked Berrer’s game.  He mixes things up and seems to have a good idea of how to use his lefthandedness to his advantage.  I don’t see him having much more success than he has already enjoyed, but he’s definitely a guy I’ll try to watch again.  In the first round, he’ll face…

Horacio Zeballos (ARG – 76)

Zeballos is an exciting young player, an aggressive lefty with a potentially huge one-handed backhand.  I watched him play circles around Sergei Bubka in the second round.  He then qualified easily against Stefan Koubek.  Zeballos looked to be a little erratic–perhaps that’s the youth speaking–but he’s got a big, big game and much of the inconsistency stemmed from his willingness to go for big shots.  This is a guy we’ll be hearing more about.

Zeballos-Berrer will be a fun first-rounder, and perhaps Berrer can make it a tight one with his veteran wiles.

Josselin Ouanna (FRA – 103)

Ouanna is one of two qualifiers (the other is Thomaz Bellucci, more on him later) who are ranked higher than their first-round opponents.  Ouanna draws Rajeev Ram, an American having a breakout season.

I saw Ouanna’s third-rounder on Saturday against the feisty Brazilian Julio Silva.  Silva’s a tough guy to play: He looks and often acts like a counterpuncher, but he’s got a surprisingly big serve.  It took Ouanna three sets to beat him, but no one watching the match could’ve doubted which player had the better chance in the main draw.

Ouanna is a big man who pummels his serve.  I didn’t watch the entire match but I saw multiple games in which the Frenchman put three aces down the T.  He plays a lot like Tsonga, with some of Tsonga’s flaws as well: The only reason the match went as deep as it did is that he would miss some easy balls, and occasionally the backhand just deserted him for a game or two.  Despite that, I’d give Ouanna the slight edge over Ram.

Thomaz Bellucci (BRA – 68)

Bellucci’s ranking skyrocketed thanks to winning Gstaad earlier this summer, but not in time to make the US Open main draw.  While he had to get in the hard way, he made it look easy, not dropping a set and never being pushed farther than 6-4 against a couple of dark-horse threats in Grigor Dimitrov and Scoville Jenkins.

Bellucci is yet another lefty.  He’s tall and uses it to his advantage, whipping groundstrokes when you think he’ll just barely get a racquet on the ball.  He looked shaky for much of the first set against Dimitrov and occasionally against Jenkins, but the difference in ability level was too great for it to matter much.  Like Zeballos, Bellucci is a name we’ll be hearing for years to come, probably associated with numbers a lot smaller than 68.

As mentioned above, Bellucci is technically the favorite in his opening match against Yen-Hsun Lu, who is ranked #71.  I saw Lu at Indian Wells this year–he’s a talented player and a blast to watch, but I suspect he won’t have enough answers to give Bellucci much of a challenge.

Others

The other qualifiers are:

  • Peter Polansky (CAN – 200)
  • Marco Chiudinelli (SUI – 160)
  • Dieter Kindlmann (GER – 219)
  • Juan Pablo Brzezicki (ARG – 188)

I didn’t see any of those guys play.  Kindlmann is likely to have a short stay in New York.  He takes on Nikolay Davydenko in the second match tomorrow.  Aside from Brzezicki, who faces fellow qualifier Ball in the first round, only Chiudinelli has a relatively easy path, facing Potito Starace to open the tournament.

There you go — 1,700 words on a bunch of guys who probably won’t be in the tournament come Saturday.


2009 US Open Predictatron

Posted in tennis by Jeff on August 30, 2009

Now that the qualifiers are entered in the men’s draw for the 2009 US Open, I’ve used my brand-spankin’-new formula to predict the outcome of the tournament.

I shouldn’t have to say this, but: This is just a toy.  If you’re using these numbers to place your bets, you probably aren’t very smart.  In the post linked above, I list a number of flaws in the system.  There are probably more that I haven’t thought of yet.

Because my method is based on ATP ranking points, it should come as no surprise that a player’s ATP ranking is very highly correlated to their chances of winning the tournament.  Once you get to #10 or so, though, there are some quirks.

Here are the ten players with the best shot at winning it, along with their percent chances for the semis, finals, and championship:

Player           points     SF      F      W
Federer (1)      12040   68.83  48.57  31.17
Murray (2)       9610    54.27  34.98  19.19
Nadal (3)        9025    54.29  32.49  17.34
Djokovic (4)     7660    47.19  23.58  12.42
Roddick (5)      5720    28.73  12.32   5.55
Del Potro (6)    5325    21.06  10.58   4.24
Tsonga (7)       3920    19.31   7.64   2.52
Davydenko (8)    3655    13.82   5.77   2.00
Simon (9)        3410    10.68   4.22   1.26
Gonzalez F (11)  2825     8.87   2.86   0.75

Freddie just barely edges out 10th-seeded Fernando Verdasco for the last spot.

And since that isn’t nearly enough data, what follows is the complete draw, along with each player’s chance of reaching each round.

Player              points    2nd    3rd    4th     QF     SF      F      W
Federer (1)         12040   99.98  95.78  90.27  81.39  68.83  48.57  31.17
Britton             4        0.02   0.00   0.00   0.00   0.00   0.00   0.00
Lapentti G (Q)      341     27.18   0.53   0.14   0.02   0.00   0.00   0.00
Greul               836     72.82   3.69   1.79   0.63   0.16   0.02   0.00
Chela               467     38.33  11.63   0.49   0.11   0.02   0.00   0.00
Hernandez Oscar     720     61.67  25.18   1.57   0.50   0.11   0.01   0.00
Alves               592     29.89  14.69   0.76   0.22   0.04   0.01   0.00
Hewitt (31)         1285    70.11  48.50   4.97   2.28   0.80   0.16   0.02

Blake (21)          1520    75.35  47.68  23.37   3.34   1.29   0.30   0.05
Ramirez-Hidalgo     551     24.65   8.99   2.32   0.15   0.03   0.00   0.00
Rochus O            678     37.12  13.27   3.97   0.31   0.07   0.01   0.00
Kunitsyn            1098    62.88  30.07  12.31   1.38   0.43   0.08   0.01
Polansky (Q)        301     21.40   3.08   0.61   0.02   0.00   0.00   0.00
Garcia-Lopez        984     78.60  27.44  12.70   1.31   0.38   0.07   0.01
Young D (Q)         336     11.41   3.37   0.71   0.03   0.00   0.00   0.00
Robredo (14)        2165    88.59  66.11  44.02   8.29   3.95   1.17   0.28

Soderling (12)      2665    74.83  58.65  40.49  21.44   5.34   1.82   0.50
Montanes            991     25.17  13.90   6.04   1.82   0.22   0.04   0.01
Zverev              930     58.48  17.60   7.39   2.12   0.24   0.04   0.00
Granollers          680     41.52   9.85   3.36   0.77   0.07   0.01   0.00
Kim                 632     32.23  10.56   2.87   0.63   0.05   0.01   0.00
Sela                1244    67.77  33.20  14.01   4.91   0.71   0.14   0.02
Yani (Q)            220     10.63   1.75   0.20   0.02   0.00   0.00   0.00
Querrey (22)        1520    89.37  54.49  25.64  10.17   1.74   0.41   0.08

Mathieu (26)        1390    62.62  45.79  15.45   7.04   1.11   0.24   0.04
Youzhny             870     37.38  23.20   5.66   1.93   0.22   0.03   0.00
Starace             665     64.65  22.77   4.61   1.28   0.11   0.01   0.00
Chiudinelli (Q)     386     35.35   8.24   1.07   0.19   0.01   0.00   0.00
Hernych             812     56.81  10.96   5.03   1.63   0.17   0.02   0.00
Schuettler          635     43.19   6.83   2.71   0.74   0.06   0.01   0.00
Kindlmann (Q)       258      5.18   1.26   0.25   0.03   0.00   0.00   0.00
Davydenko (8)       3655    94.82  80.95  65.22  45.27  13.82   5.77   2.00

Djokovic (4)        7660    90.59  87.43  78.49  68.08  47.19  23.58  12.42
Ljubicic            976      9.41   7.01   3.44   1.47   0.35   0.04   0.01
Brzezicki (Q)       334     44.91   2.25   0.55   0.11   0.01   0.00   0.00
Ball (Q)            402     55.09   3.30   0.91   0.21   0.02   0.00   0.00
Beck K              654     46.61  16.97   1.99   0.65   0.11   0.01   0.00
Gonzalez Maximo     739     53.39  21.05   2.75   0.98   0.18   0.02   0.00
Witten (Q)          197     10.87   2.16   0.10   0.01   0.00   0.00   0.00
Andreev (29)        1330    89.13  59.82  11.76   5.96   1.71   0.27   0.04

Kohlschreiber (23)  1510    61.46  47.40  25.22   6.13   1.92   0.35   0.07
Seppi               990     38.54  26.23  11.12   2.03   0.47   0.06   0.01
Devvarman (Q)       386     39.07   8.35   1.80   0.15   0.02   0.00   0.00
Gil                 579     60.93  18.03   5.35   0.64   0.10   0.01   0.00
Golubev             901     52.01  18.58   8.69   1.45   0.32   0.04   0.00
Mayer               834     47.99  16.32   7.30   1.15   0.24   0.03   0.00
Bolelli             816     27.18  13.10   5.77   0.91   0.18   0.02   0.00
Stepanek (15)       2000    72.82  52.01  34.75  10.09   3.81   0.85   0.19

Verdasco (10)       3015    76.15  60.89  43.40  20.18   8.14   2.39   0.72
Becker              1054    23.85  13.26   5.98   1.49   0.30   0.04   0.01
Serra               866     50.29  13.09   5.24   1.15   0.21   0.02   0.00
Tipsarevic          855     49.71  12.76   5.07   1.10   0.19   0.02   0.00
Kendrick            747     42.81  14.67   4.13   0.80   0.13   0.01   0.00
Vassallo Arguello   973     57.19  23.11   7.79   1.86   0.36   0.05   0.01
Falla (Q)           386     15.88   4.67   0.78   0.09   0.01   0.00   0.00
Haas (20)           1760    84.12  57.56  27.61   9.57   2.82   0.57   0.12

Hanescu (28)        1346    60.55  41.64  10.68   4.48   1.11   0.19   0.03
Isner               914     39.45  23.49   4.58   1.48   0.28   0.03   0.00
Ilhan (Q)           243     21.24   3.31   0.21   0.02   0.00   0.00   0.00
Rochus C            799     78.76  31.56   5.54   1.68   0.29   0.03   0.00
Gicquel             729     36.00   4.76   1.99   0.57   0.09   0.01   0.00
Tursunov            1230    64.00  12.83   7.13   2.85   0.66   0.10   0.01
Phau                707      9.17   3.68   1.50   0.42   0.07   0.01   0.00
Roddick (5)         5720    90.83  78.72  68.37  52.27  28.73  12.32   5.55

Tsonga (7)          3920    99.77  83.94  67.92  46.02  19.31   7.64   2.52
Buchanan Chase      16       0.23   0.00   0.00   0.00   0.00   0.00   0.00
Nieminen            924     54.50   9.33   4.39   1.46   0.25   0.03   0.00
Fognini             785     45.50   6.73   2.88   0.85   0.13   0.02   0.00
Cipolla             537     31.68  10.11   1.45   0.32   0.03   0.00   0.00
Benneteau           1079    68.32  34.09   8.65   3.16   0.57   0.09   0.01
Luczak (LL)         744     34.74  15.98   3.06   0.87   0.12   0.01   0.00
Troicki (30)        1320    65.26  39.83  11.67   4.79   1.02   0.20   0.03

Berdych             1945    75.67  58.11  29.81  12.74   3.53   0.88   0.18
Odesnik             694     24.33  12.58   3.57   0.77   0.10   0.01   0.00
Berrer (Q)          521     39.98   9.89   2.26   0.38   0.04   0.00   0.00
Zeballos (Q)        750     60.02  19.41   5.75   1.31   0.19   0.02   0.00
Ouanna (Q)          623     54.18  11.59   3.82   0.75   0.09   0.01   0.00
Ram                 534     45.82   8.64   2.58   0.45   0.05   0.01   0.00
Massu               678     17.18   9.35   3.27   0.69   0.08   0.01   0.00
Gonzalez F (11)     2825    82.82  70.42  48.95  25.44   8.87   2.86   0.75

Monfils (13)        2475    67.67  50.02  33.16   9.91   4.75   1.40   0.33
Chardy              1267    32.33  18.69   9.20   1.80   0.57   0.10   0.01
Korolev             732     38.58   9.86   3.49   0.46   0.10   0.01   0.00
Beck A              1115    61.42  21.44   9.90   1.75   0.50   0.08   0.01
Daniel              890     47.99  17.79   6.20   0.96   0.23   0.03   0.00
Acasuso             960     52.01  20.22   7.33   1.20   0.30   0.04   0.01
Martin              664     24.10   9.89   2.76   0.33   0.07   0.01   0.00
Ferrer (18)         1890    75.90  52.10  27.97   7.09   2.93   0.73   0.14

Almagro (32)        1285    68.38  44.31   7.56   3.31   1.05   0.19   0.03
Darcis              638     31.62  14.67   1.51   0.41   0.07   0.01   0.00
Ginepri             765     97.22  40.97   4.81   1.50   0.32   0.04   0.00
Pavel               30       2.78   0.06   0.00   0.00   0.00   0.00   0.00
Llodra              540     52.11   3.49   1.25   0.30   0.05   0.00   0.00
Kiefer              500     47.89   2.98   1.01   0.23   0.03   0.00   0.00
Gasquet             1045     8.54   5.85   3.12   1.21   0.34   0.05   0.01
Nadal (3)           9025    91.46  87.68  80.74  69.54  54.29  32.49  17.34

Del Potro (6)       5325    84.82  72.76  62.38  45.04  21.06  10.58   4.24
Monaco              1110    15.18   7.87   4.12   1.45   0.25   0.05   0.01
Melzer              1160    57.53  12.19   6.58   2.36   0.42   0.08   0.01
Safin               880     42.47   7.18   3.36   1.01   0.15   0.02   0.00
Cuevas              813     57.20  22.92   4.55   1.30   0.18   0.03   0.00
Guccione            625     42.80  14.32   2.29   0.54   0.06   0.01   0.00
Koellerer           862     36.81  20.08   4.16   1.23   0.18   0.03   0.00
Fish (25)           1410    63.19  42.68  12.55   5.08   1.06   0.24   0.03

Ferrero JC (24)     1490    59.27  37.20  14.71   5.07   1.12   0.25   0.04
Santoro             1060    40.73  22.07   7.06   1.97   0.33   0.06   0.01
Petzschner          1147    72.44  33.80  11.45   3.37   0.61   0.11   0.01
Stakhovsky          476     27.56   6.93   1.24   0.19   0.02   0.00   0.00
Bellucci (Q)        813     50.31  11.28   4.54   1.03   0.14   0.02   0.00
Lu                  803     49.69  11.08   4.44   1.02   0.14   0.02   0.00
Gimeno-Traver       627     13.43   5.80   1.97   0.38   0.04   0.00   0.00
Simon (9)           3410    86.57  71.84  54.59  28.94  10.68   4.22   1.26

Cilic (16)          1985    83.40  64.45  39.73   9.53   3.69   1.04   0.20
Sweeting            458     16.60   6.81   1.87   0.15   0.02   0.00   0.00
Gabashvili          769     63.19  20.41   7.83   0.95   0.19   0.02   0.00
Levine              470     36.81   8.32   2.30   0.18   0.02   0.00   0.00
Istomin             587     53.63  15.01   4.58   0.45   0.07   0.01   0.00
Evans               514     46.37  11.80   3.29   0.29   0.04   0.00   0.00
Lapentti N          378     14.91   5.93   1.30   0.09   0.01   0.00   0.00
Wawrinka (19)       1845    85.09  67.26  39.10   8.93   3.27   0.87   0.16

Karlovic (27)       1360    74.02  43.55   6.94   3.28   1.00   0.21   0.03
Navarro             525     25.98   9.14   0.70   0.18   0.03   0.00   0.00
Lopez F             1205    81.74  43.40   6.34   2.80   0.77   0.15   0.02
Dent                309     18.26   3.92   0.19   0.03   0.00   0.00   0.00
Capdeville          680     51.62   3.81   1.42   0.43   0.08   0.01   0.00
Crivoi              640     48.38   3.38   1.21   0.35   0.06   0.01   0.00
Gulbis              640      4.80   2.35   0.81   0.24   0.04   0.00   0.00
Murray (2)          9610    95.20  90.47  82.38  72.15  54.27  34.98  19.19
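
For anyone curious how round-by-round percentages like these can be produced, here is a minimal Python sketch.  It is an illustration rather than the exact code behind the table: the function name `round_probabilities` and the default exponent `e=1.0` are assumptions, and it simply pushes the ratio formula through a bracket, one round at a time.  In each round a player's possible opponents are everyone in the neighbouring sub-bracket, weighted by their chance of getting that far.

```python
def round_probabilities(points, e=1.0):
    """points: ranking points in draw order; length must be a power of two.

    Returns one list per player: the chance of winning round 1, round 2, ...
    (the last entry is the chance of winning the whole bracket passed in).
    """
    n = len(points)
    if n < 2 or n & (n - 1):
        raise ValueError("draw size must be a power of two")

    def beats(i, j):
        # Ratio formula: chance that player i beats player j
        return points[i] ** e / (points[i] ** e + points[j] ** e)

    alive = [1.0] * n            # chance each player is still in before this round
    results = [[] for _ in range(n)]
    block = 1                    # size of the sub-bracket each player has come through
    while block < n:
        advanced = [0.0] * n
        for i in range(n):
            # Opponents come from the neighbouring sub-bracket of the same size
            start = ((i // block) ^ 1) * block
            chance = sum(alive[j] * beats(i, j) for j in range(start, start + block))
            advanced[i] = alive[i] * chance
            results[i].append(advanced[i])
        alive = advanced
        block *= 2
    return results


# Hypothetical four-player bracket: one strong player and three equal ones
for name, probs in zip("ABCD", round_probabilities([3, 1, 1, 1])):
    print(name, [round(100 * p, 2) for p in probs])
```

Feeding it all 128 players' points in draw order would give seven columns per player.  The exponent is left as a parameter, so the output should not be expected to match the table above digit for digit.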

Predicting the outcome of men’s tennis matches

Posted in tennis by Jeff on August 28, 2009

It’s easy to come up with a rough prediction of the outcome of an MLB game.  Take a strength rating (usually winning percentage or some variant thereof) and apply Bill James’s invention, the Log5 method.

There’s no reason something similar can’t be applied to men’s tennis matches.  (Women’s tennis matches, too, but the necessary data is more readily available for men’s tennis, and I’m more interested in the men’s game.)  We need two pieces of data for such a calculation:

  1. A numerical representation of the strength of each competitor.  (In baseball, as mentioned, that’s generally a winning percentage.)
  2. An algorithm (such as Log5) to translate those two numbers into an expected winning percentage.

I’ve been mulling this over for some time, most of which I’ve spent stuck on #1.  I’m convinced that ATP rankings are a flawed representation of each player’s talent.  They are probably about right for the top 10 or so, but as you go deeper, I have more problems.  But despite some experimentation, I haven’t come up with a system that comes anywhere close to replacing it.

So ATP rankings it is.  Specifically: ATP ranking points.  At the moment, then, Federer’s rating is 12,040, Murray’s 9,610, and so on.

For #2, I saw no reason to reinvent the wheel.  While Log5 is designed to work with percentages, a little algebra can shift it to something workable with large integers.  The result:

x^e / (x^e + y^e)

x and y are the ratings of each player, and e is a constant exponent.  As it turns out, e = 1 is a pretty good approximation of the best possible results available from this formula.  That gives us a much simpler formula; a ratio, really:

x / (x+y)

It’s easy to use, and it’s reasonably accurate.
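
As a rough illustration, here is the formula as a few lines of Python.  The function names (`win_probability`, `log5`) are mine, not from any library, and the winning percentages in the check at the bottom are made up.  That check shows the "little algebra" connection: Log5 is the same ratio applied to each team's odds, a / (1 - a), instead of to the raw percentages.

```python
def win_probability(x, y, e=1.0):
    """Chance that the player rated x beats the player rated y."""
    return x ** e / (x ** e + y ** e)


def log5(a, b):
    """Bill James's Log5 for winning percentages a and b, each strictly between 0 and 1."""
    return (a - a * b) / (a + b - 2 * a * b)


# Ranking points quoted above, with the simplified e = 1 form
federer, murray = 12040, 9610
print(f"Federer over Murray: {100 * win_probability(federer, murray):.1f}%")

# Made-up winning percentages: Log5 equals the ratio formula applied to odds
a, b = 0.600, 0.450
odds_a, odds_b = a / (1 - a), b / (1 - b)
assert abs(log5(a, b) - win_probability(odds_a, odds_b)) < 1e-12
```

Keeping `e` as an argument makes it easy to try other exponents against the same set of matches.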

Testing

To develop and test this approach, I built a database of every ATP and ITF challenger-level match from 2009, through last week’s results in Cincinnati.  I took only those matches involving two players ranked in the top 200.  That gave me a group of 2,766 contests. Side note: 1,106 (37%) of those matches were won by the lower-ranked player.

To test a variety of formulae and exponents for the formula above, I sorted those 2,766 matches by expected winning percentage (i.e., the theoretical percent chance that the higher-ranked player would win), then split them into “buckets” of about 100 matches.  For each bucket, I calculated the average expected winning percentage and compared it to the actual results of the matches within that bucket.

Here’s a graph of the results:

The buckets are plotted from left to right in order of expected winning percentage.  For instance, the 100 matches closest on paper gave the favorites a 50.3 percent chance of winning (on average).  As it turned out, 55.4 percent of those matches were upsets.  With 100-match buckets, there will be some noise, especially when a difference that looks substantial on the graph only represents 5 or 6 matches.
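
The bucket test described above can be sketched roughly like this in Python.  The tuple layout (higher-ranked player's points, lower-ranked player's points, whether the favorite won) and the name `bucket_calibration` are assumptions made for illustration.  The sample data is invented, not taken from my match database.

```python
def bucket_calibration(matches, bucket_size=100, e=1.0):
    """matches: list of (higher_points, lower_points, favorite_won) tuples.

    Returns (mean expected win %, actual win %, matches in bucket) per bucket,
    ordered from the closest matchups on paper to the most lopsided.
    """
    scored = []
    for hi, lo, favorite_won in matches:
        expected = hi ** e / (hi ** e + lo ** e)
        scored.append((expected, 1 if favorite_won else 0))
    scored.sort(key=lambda pair: pair[0])

    rows = []
    for start in range(0, len(scored), bucket_size):
        bucket = scored[start:start + bucket_size]
        n = len(bucket)
        mean_expected = sum(p for p, _ in bucket) / n
        actual = sum(w for _, w in bucket) / n
        rows.append((100 * mean_expected, 100 * actual, n))
    return rows


# Invented sample: two coin-flip matches and two lopsided ones, buckets of two
sample = [(100, 100, True), (100, 100, False), (300, 100, True), (300, 100, True)]
for expected, actual, n in bucket_calibration(sample, bucket_size=2):
    print(f"expected {expected:.1f}%  actual {actual:.1f}%  ({n} matches)")
```

A well-behaved formula should give buckets whose expected and actual percentages track each other, allowing for the noise that comes with only 100 matches per bucket.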

New toy

Let’s look at some results, based on the rankings of August 24.  We get a “smell test” for the model, plus the fun of some hypothetical matchups.

  • Federer vs. Murray: 56.2%
  • Federer vs. Tsonga: 77.7%
  • Federer vs. Wawrinka: 88.9%
  • Federer vs. Kevin Kim (#100 right now): 96.3%
  • Nadal vs. Soderling: 79.5%
  • Roddick vs. Querrey: 81.3%
  • Tsonga vs. Simon: 53.9%
  • Verdasco vs. Feliciano Lopez: 73.5%

Here are a few interesting first-round matchups from the U.S. Open draw:

  • Mathieu vs. Youzhny: 62.7%
  • Djokovic vs. Ljubicic: 90.8%
  • Monfils vs. Chardy: 67.8%
  • Melzer vs. Safin: 57.6%

After the qualifiers are placed in the draw, I’ll run the numbers on the whole thing.  Predictions of this sort aren’t all that interesting when they are based on commonly-used rankings–it would take an incredibly stacked draw for anyone other than Federer to be the favorite–but it will be interesting to see who has theoretically tough draws, as well as seeing the percentage chances each of the top players have of taking the title.

But

Of course, I could write another thousand words about the limitations of such a system as this.  Here are a few:

  • It doesn’t take surface into account.
  • It doesn’t consider the difference between best-of-three and best-of-five matches.
  • It doesn’t know about injuries.
  • Related: it doesn’t consider the quality of recent performance.  Sam Querrey’s success of late shows up in his ranking points, but it isn’t weighted any more heavily than if he had enjoyed an equally successful February and March, instead.  While streakiness has been devalued in the analysis of team sports, it sure seems like an important component in tennis performance.
  • It doesn’t consider matchups.  Based purely on anecdotal evidence, I’d guess a disproportionate number of outliers come from guys like Karlovic and Isner.  Maybe the same is true with unknown wildcards.
  • I limited the study to matchups within the top 200.  This is a comically extreme example, but what about next week’s matchup between Federer and Devin Britton?  Britton hasn’t played much professional tennis, so he has 4 ranking points.  That gives Roger a 99.9986% chance of winning.  I’d put my money on Roger, but you have to believe in 1,000 such matches, even Federer would roll an ankle and have to retire at least once.
  • Similar to streakiness like Querrey’s current run, what about the momentum that qualifiers carry into a tournament?  Alejandro Falla is ranked #159, but after winning three matches in qualifying this week (including two quite handily), it’s tempting to think he is a bigger threat than a similarly-ranked player who was wildcarded into the event.

Coming soon

At the very least, I’d like to extend my study to more matches.  The data mining is a bit arduous, but another several thousand matches wouldn’t hurt.

Given the complexity of the ATP ranking system, it’s not an easy thing to tweak.  Part of the reason I seek an alternative method of ranking players is that it might be more flexible, admitting adjustments for surface and recent results.