The Wild Card Effect
I’ve written before about the types of players awarded wild cards into professional men’s tennis tournaments. While they can be categorized in different ways, there are two characteristics that are true of almost all wild cards:
- Without a wild card, they would not be able to play in the tournament.
- Tournament organizers see them as an asset to the event.
The first isn’t quite true; many wild cards would otherwise enter the qualifying draw, and some would reach the main draw that way. We can still conclude that WCs are, at least according to ATP entry rankings, inferior to other players who appear in the main draw. The only possible exceptions worth mentioning are qualifiers and other wild cards.
The second doesn’t necessarily tell us anything about the skill level of a player. Simply having James Blake in the draw probably boosts ticket sales for any event in the U.S. Other WCs are awarded to promote a tournament in other ways, perhaps by giving one WC to the winner of a junior event, or a special qualifying tournament for local amateurs.
While these cases are common enough, a major factor in the awarding of wild cards is the tournament organizer’s belief that a WC can compete. So the WC goes to a player returning from injury, or a veteran coming back from retirement. Or a junior who is rocketing up the rankings, or who has recently won a major collegiate event.
All this is to say, in the aggregate, players granted wild cards are usually better than their ranking says they are.
Thus, when we look at matches with one wild card and one non-wild card and apply my algorithm to predict the winner, we should anticipate that wild cards outperform expectations.
Empirical results
In fact, they do. The effect is substantial, and it holds at multiple levels of competition.
In testing the hypothesis, I controlled for home court advantage, an important consideration that is easily conflated with the wild card effect. After all, a large percentage of wild cards are granted to local players, so without careful analysis, it would not be clear how much of the advantage comes from wild card selection and how much from the benefits of playing in one’s home country.
I ran the numbers with a dataset comprising all ATP main draw, ATP qualifying draw, and Challenger main draw matches from 2008 to 2010. The results were fairly consistent from year to year.
At the ATP main draw level, the dataset yielded over 900 matches between a wild card and a non-wild card. The wild card won the match about 15% more often than expected. We can approximate this effect by multiplying the WC’s ranking points by 1.3.
The other two levels showed even larger effects over about 2600 relevant matches. In ATP qualifying and Challenger main draw matches, wild cards won more than 25% more than expected. We can approximate this effect by multiplying the WC’s ranking points by 1.55.
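As a minimal sketch of how these multipliers might be applied before feeding ranking points into a prediction algorithm: the two multipliers come from the figures above, while the function name and the level labels are hypothetical and chosen only for illustration.

```python
# Wild card multipliers quoted above; the dictionary keys are illustrative labels.
WILD_CARD_MULTIPLIER = {
    "atp_main": 1.30,
    "atp_qualifying": 1.55,
    "challenger_main": 1.55,
}


def adjusted_points(points, level, is_wild_card):
    """Scale ranking points by the wild card multiplier for the given level."""
    if not is_wild_card:
        return float(points)
    return points * WILD_CARD_MULTIPLIER[level]


# A hypothetical wild card with 400 ranking points in a Challenger main draw
# would be treated as if he had 400 * 1.55 points.
print(adjusted_points(400, "challenger_main", True))
```

The adjusted total simply replaces the raw ranking points as input to whatever win-probability function is in use.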
Commentary
The existence of a positive “wild card effect” is not a surprise, nor is the magnitude. Essentially, when a player is awarded a wild card, we’re given more information about him than ranking points otherwise offer.
I suspect the difference in magnitude between the higher and lower levels is fairly straightforward, as well. While some players receive ATP wild cards straight from the amateur ranks, as can be the case with collegiate champions, most ATP wild cards go to somewhat established players on the fringes of success. These players are often inside the top 150, meaning that they’ve played a lot of professional tournaments, so while their ranking might undervalue them slightly, it is a fairly accurate gauge of their ability level.
By contrast, qualifying and challenger-level wild cards often go to less experienced players. They may not be full-time professionals or they may spend most of their time playing collegiate or junior tournaments. They usually have rankings, but the point totals may only be based on a handful of events.
Example from Australia
The most successful wild card in the Australian Open was Aussie youngster Bernard Tomic, who reached the third round, beating Jeremy Chardy and Feliciano Lopez before losing to Rafael Nadal.
As he was a local and a wild card, we now know to adjust his ranking points twice before estimating his likelihood of winning a match. Instead of estimating his talent with his pre-tourney ranking point total of 239, we adjust upward to 435. That still puts him as an underdog against Chardy’s 960 points, but it means we would have given him a 30% chance of winning instead of an 18% chance.
Of course, the 2011 Australian Open isn’t very instructive here, since six of the other wild cards lost their first matches, while the final WC, Benoit Paire, drew qualifier Flavio Cipolla in the first round, and was a favorite.
Tennis Home Court – Research Notes
I’ve built out my men’s tennis results database quite a bit in the last couple of months, so I thought I’d revisit my research into home court advantage.
To recall, I started with ATP main draw matches from 2009. I focused on the subset of matches where the tournament was in the home country of one player, but not the other. I excluded matches where either player was a wild card entry–that usually applies to the home player. I did so because I think there is a separate “wild card” effect that reflects selection bias. (Tourney organizers choose players who did not make the cut but whose chances, for whatever reason, are better than their ranking would suggest.)
As I reported in my initial research, using about 450 matches from the 2009 main draw dataset, the home player won 17% more matches than expected. (“Expected” wins are derived from my bare-bones algorithm to predict the winner of the match.) Using ranking points, this is roughly equivalent to giving the home player credit for 50% more ranking points than he actually has.
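One plausible reading of "won X% more matches than expected" is actual wins divided by the sum of predicted win probabilities, minus one. The sketch below assumes that definition; the function name and input format are hypothetical, and the prediction algorithm itself is treated as a black box that has already produced a probability for each match.

```python
def outperformance(results):
    """Return actual wins / expected wins - 1 for one group of players.

    results: iterable of (predicted_win_prob, won) pairs, one per match,
    where predicted_win_prob is the pre-match estimate for the player of
    interest (for example, the home player) and won is True or False.
    """
    expected_wins = 0.0
    actual_wins = 0
    for prob, won in results:
        expected_wins += prob
        actual_wins += 1 if won else 0
    if expected_wins == 0:
        raise ValueError("expected wins is zero; nothing to compare against")
    return actual_wins / expected_wins - 1.0


# Toy input, not real match data: two underdogs at 40% who both win.
print(outperformance([(0.4, True), (0.4, True)]))  # 2 wins vs 0.8 expected
```

A value of 0.17 under this definition would correspond to the 17% figure quoted above.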
For example, Lleyton Hewitt is currently ranked 54th, with 870 ranking points. If we make this adjustment for the Australian Open, we’d say he’ll play at a level equal to someone with 1,305 ranking points, which would be 32nd in the world. Instead of giving him a 36% chance of winning his first round match against David Nalbandian, the home-court-adjusted number would give him a 47% chance. In this case the results might bear us out: The match went to 9-7 in the fifth set.
The surprise came when I expanded the dataset to include Challenger main draw matches and ATP-level qualifier matches. In 2009 Challengers, home players only won 6% more often than expected–equivalent to a ranking points multiplier of 1.15. In 2009 ATP qualies, the home court advantage was only 2%–a multiplier of about 1.05. Whatever confers the home court advantage in ATP main draw matches may not apply at all levels.
I next looked at the same datasets for 2010. Here are the home court advantages (and ranking points multipliers) observed last year:
- ATP main draw: 12% (1.35)
- Challenger main draw: 4% (1.1)
- ATP qualifiers: 14% (1.3)
The first two numbers don’t differ much from the ’09 observations, but the qualifier numbers come out of nowhere.
Until I’m able to look at more matches from before 2009, I hesitate to draw any conclusions about the qualifiers. That still leaves us with a fairly consistent gap between the home court advantage observed at the ATP main draw and Challenger main draw levels.
To the extent that crowd involvement plays a part, it seems reasonable to expect that players would get a bigger boost on a bigger stage. Even on outer courts in the early rounds, fans tend to pull for the locals. At challengers, the atmosphere is often more like a club tournament where the audience is next to nonexistent.
Another major possibility is that some combination of selection bias and the inadequacy of my prediction algorithm accounts for the lack of observed home court advantage in challengers. Players have more choice of where to play at the lower levels, so they will tend to stay closer to home. It may mean that, even exclusive of wild cards, the distribution of home-country players and non-home-country players is different; perhaps the bottom ends of challenger draws are disproportionately packed with home-country players. This is something that I can investigate further.
UPDATE: Just ran the numbers for 2008. The ATP main draw home court advantage remained consistent, at a 16% boost for the home player. The ATP qualifier pool also showed the same home court advantage. However, 2008 differed from later years in that in Challenger main draw matches, home players got an 11% boost, much bigger than in 2009 or 2010.
Marginal ATP Rankings
ATP rankings are frustrating: They are a decent approximation for player skill, but there are so many obvious flaws. Some of those flaws derive from the problem of needing one number–there’s no accounting for surface, for instance.
The one that frustrates me the most is how much luck is allowed to creep into a player’s ranking. When a player is awarded points for his performance in a certain tournament, there is no consideration of the skill level of the players he defeated. So two players who lose in the second round get the same number of points, even if one defeated a 16-year-old wild card in the first round and the other defeated Rafael Nadal in the first round.
There are plenty of arguments in favor of the present way of doing things.
- First, there’s the circular problem of finding a starting point–if ranking points aren’t an adequate measure of skill, how do you give numerical credit based on the skill of opponents?
- Second, players don’t display consistent levels of skill; if Milos Raonic is in the fourth round of the Australian Open, he is probably playing better than he was four months ago when he lost in the first round of the U.S. Open. Perhaps the person who defeats him in Melbourne deserves more points than the guys who beat him in qualifiers and challengers last fall. Players also display different levels of skill depending on surface; beating Juan Carlos Ferrero is more impressive on clay than on grass, and you’re more likely to do so in a later round on clay.
- Third, you could say that it all comes out in the wash. Pros play a lot of tournaments, and while you might only get 20 points for beating a top-10 player in the first round, you might get an additional 90 points for beating an unseeded player three rounds later.
We could settle for the status quo, or we could experiment with a different approach and test it. Testing these things is an enormous task, so for today I’m just presenting the experiment itself.
Opponent-based point awards
I looked at all ATP-level main draw and qualifying draw matches, along with Challenger-level main draw matches. I figured out the marginal points awarded to the winner of each match (e.g., by winning in the third round in the Aussie Open, you get 180 points instead of 90 points, for 90 marginal points) and the ranking points of the loser at the time of the match.
For instance, when Nadal beat Federer in the Madrid final, Nadal was awarded 400 marginal points, and Federer had 10,690 ranking points. Add up those two types of points, and it turns out that the total marginal points awarded in these matches are approximately 4.5% of the ranking points of the losers.
Thus, if we use a simple linear model, instead of giving Nadal 400 marginal points for winning that match, we give him 4.5% of 10,690, or 463 points. In this case, not a big difference. But when top players are upset in early rounds, the adjustment is huge.
To take a very different example: In Miami last year, Olivier Rochus beat Novak Djokovic in the round of 64. For advancing to the round of 32, Rochus earned 20 marginal points. Djokovic’s ranking point total at that point was 8,220, so if we give Rochus 4.5% of that, he gets 365 points. As we’ll see, that single adjustment rockets him up the rankings.
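The linear model can be sketched in a few lines. The function names are hypothetical, and the default rate is the rounded 4.5% quoted above. With that rounded rate the two examples come out near 481 and 370, a little above the 463 and 365 given in the text, which suggests the fitted rate sits slightly below 4.5%.

```python
def fit_rate(matches):
    """Estimate the award rate as total marginal points / total loser points.

    matches: iterable of (marginal_points_awarded, loser_ranking_points).
    """
    matches = list(matches)
    total_marginal = sum(marginal for marginal, _ in matches)
    total_loser = sum(loser for _, loser in matches)
    return total_marginal / total_loser


def opponent_based_award(loser_points, rate=0.045):
    """Points credited to the winner under the simple linear model."""
    return rate * loser_points


print(round(opponent_based_award(10690)))  # Nadal over Federer, Madrid final
print(round(opponent_based_award(8220)))   # Rochus over Djokovic, Miami
```

In practice `fit_rate` would be run over the full set of matches to recover the rate, and that unrounded value would be passed to `opponent_based_award`.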
Pros and Cons
Compared to the present ATP ranking system, this approach gives more credit to the players who are capable of a top-10 performance, even if they play at that level very rarely. As we’ll see, a single major upset can make a huge difference, so perhaps it too heavily weighs a single match. If Rochus happened to play Djokovic on a day when Djokovic had the flu, does he really deserve 365 points?
Another potential problem is that this model doesn’t consider the level of the opponents that a player loses to. Nikolay Davydenko is known for his ability to beat Federer or Nadal, but in consecutive weeks in October, he lost to Pablo Cuevas and Mischa Zverev. Should we rank someone based on their ability to defeat “better” players, or their inability to defeat “lesser” players? As always the standard ATP ranking system appears to be a decent compromise.
For my purposes, what matters is how well a ranking system predicts future results. I hope that soon I’ll be able to report on how this one performs.
In the meantime, here are the 2010 year-end top 100, using the opponent-based model I’ve described. I’ve also included each player’s actual 2010 year-end ranking and the difference between their placement in the two systems.
| Rk | Player | Pts | Actual | Diff |
|---|---|---|---|---|
| 1 | Rafael Nadal | 4562 | 1 | 0 |
| 2 | Roger Federer | 4529 | 2 | 0 |
| 3 | Robin Soderling | 3905 | 5 | 2 |
| 4 | David Ferrer | 3450 | 7 | 3 |
| 5 | Andy Murray | 3347 | 4 | -1 |
| 6 | Tomas Berdych | 2891 | 6 | 0 |
| 7 | Jurgen Melzer | 2772 | 11 | 4 |
| 8 | Novak Djokovic | 2730 | 3 | -5 |
| 9 | Fernando Verdasco | 2697 | 9 | 0 |
| 10 | Andy Roddick | 2331 | 8 | -2 |
| 11 | Gael Monfils | 2266 | 12 | 1 |
| 12 | Nikolay Davydenko | 2070 | 22 | 10 |
| 13 | Mikhail Youzhny | 2059 | 10 | -3 |
| 14 | Ivan Ljubicic | 1952 | 17 | 3 |
| 15 | Guillermo Garcia-Lopez | 1948 | 33 | 18 |
| 16 | Nicolas Almagro | 1908 | 15 | -1 |
| 17 | Marcos Baghdatis | 1859 | 20 | 3 |
| 18 | Marin Cilic | 1843 | 14 | -4 |
| 19 | Albert Montanes | 1832 | 25 | 6 |
| 20 | Michael Llodra | 1822 | 23 | 3 |
| 21 | Ernests Gulbis | 1695 | 24 | 3 |
| 22 | Viktor Troicki | 1654 | 28 | 6 |
| 23 | Mardy Fish | 1646 | 16 | -7 |
| 24 | Jo-Wilfried Tsonga | 1577 | 13 | -11 |
| 25 | Stanislas Wawrinka | 1573 | 21 | -4 |
| 26 | Richard Gasquet | 1570 | 30 | 4 |
| 27 | Florian Mayer | 1497 | 37 | 10 |
| 28 | John Isner | 1473 | 19 | -9 |
| 29 | Philipp Kohlschreiber | 1456 | 34 | 5 |
| 30 | Feliciano Lopez | 1396 | 32 | 2 |
| 31 | David Nalbandian | 1395 | 27 | -4 |
| 32 | Juan Monaco | 1394 | 26 | -6 |
| 33 | Samuel Querrey | 1353 | 18 | -15 |
| 34 | Xavier Malisse | 1342 | 60 | 26 |
| 35 | Jeremy Chardy | 1272 | 45 | 10 |
| 36 | Andrei Goloubev | 1236 | 36 | 0 |
| 37 | Juan Carlos Ferrero | 1227 | 29 | -8 |
| 38 | Jarkko Nieminen | 1214 | 39 | 1 |
| 39 | Gilles Simon | 1180 | 41 | 2 |
| 40 | Janko Tipsarevic | 1178 | 49 | 9 |
| 41 | Benjamin Becker | 1145 | 53 | 12 |
| 42 | Michael Berrer | 1144 | 58 | 16 |
| 43 | Thomaz Bellucci | 1125 | 31 | -12 |
| 44 | Alexander Dolgopolov | 1058 | 48 | 4 |
| 45 | Denis Istomin | 1058 | 40 | -5 |
| 46 | Andreas Seppi | 1047 | 52 | 6 |
| 47 | Thiemo de Bakker | 1033 | 43 | -4 |
| 48 | Potito Starace | 1031 | 47 | -1 |
| 49 | Daniel Gimeno | 983 | 56 | 7 |
| 50 | Olivier Rochus | 971 | 113 | 63 |
| 51 | Lleyton Hewitt | 944 | 54 | 3 |
| 52 | Julien Benneteau | 941 | 44 | -8 |
| 53 | Marcel Granollers | 937 | 42 | -11 |
| 54 | Juan Ignacio Chela | 872 | 38 | -16 |
| 55 | Pablo Cuevas | 868 | 63 | 8 |
| 56 | Tommy Robredo | 851 | 50 | -6 |
| 57 | Philipp Petzschner | 817 | 57 | 0 |
| 58 | Sergey Stakhovsky | 813 | 46 | -12 |
| 59 | Dudi Sela | 808 | 75 | 16 |
| 60 | Santiago Giraldo | 805 | 64 | 4 |
| 61 | Michael Zverev | 797 | 82 | 21 |
| 62 | Radek Stepanek | 789 | 62 | 0 |
| 63 | Fabio Fognini | 784 | 55 | -8 |
| 64 | Mikhail Kukushkin | 781 | 59 | -5 |
| 65 | Yen-Hsun Lu | 745 | 35 | -30 |
| 66 | Igor Andreev | 722 | 79 | 13 |
| 67 | Carlos Berlocq | 719 | 66 | -1 |
| 68 | Ryan Sweeting | 716 | 116 | 48 |
| 69 | Teimuraz Gabashvili | 714 | 80 | 11 |
| 70 | Arnaud Clement | 703 | 78 | 8 |
| 71 | Lukas Lacko | 691 | 89 | 18 |
| 72 | Tobias Kamke | 652 | 67 | -5 |
| 73 | Pere Riba | 646 | 72 | -1 |
| 74 | Rainer Schuettler | 642 | 84 | 10 |
| 75 | Robin Haase | 626 | 65 | -10 |
| 76 | Florent Serra | 625 | 69 | -7 |
| 77 | Leonardo Mayer | 622 | 94 | 17 |
| 78 | Rui Machado | 618 | 93 | 15 |
| 79 | Kevin Anderson | 596 | 61 | -18 |
| 80 | Albert Ramos | 595 | 123 | 43 |
| 81 | Ivo Karlovic | 567 | 73 | -8 |
| 82 | Frederico Gil | 554 | 101 | 19 |
| 83 | Daniel Brands | 546 | 104 | 21 |
| 84 | Alejandro Falla | 544 | 105 | 21 |
| 85 | Simon Greul | 534 | 130 | 45 |
| 86 | Simone Bolelli | 521 | 107 | 21 |
| 87 | Filippo Volandri | 509 | 91 | 4 |
| 88 | Ilia Marchenko | 488 | 81 | -7 |
| 89 | Marco Chiudinelli | 486 | 117 | 28 |
| 90 | Filip Krajinovic | 483 | 214 | 124 |
| 91 | Victor Hanescu | 481 | 51 | -40 |
| 92 | Bjorn Phau | 479 | 102 | 10 |
| 93 | Ivan Dodig | 478 | 88 | -5 |
| 94 | Kei Nishikori | 477 | 98 | 4 |
| 95 | Evgueni Korolev | 468 | 140 | 45 |
| 96 | James Blake | 467 | 135 | 39 |
| 97 | Ruben Ramirez-Hidalgo | 466 | 77 | -20 |
| 98 | Ricardo Mello | 461 | 76 | -22 |
| 99 | Grigor Dimitrov | 460 | 106 | 7 |
| 100 | Brian Dabul | 458 | 85 | -15 |
Predictiveness of ATP Rankings – Research Notes
I’m working on some bigger projects right now that might take some time before they see the light. In the meantime, here are a couple of things I’ve discovered about ATP rankings and their use to predict the outcome of matches.
1. In my earlier research, I found that in the “buckets” of matches that the favorite is most likely to win, my algorithm is still reasonably accurate. In other words, if the ranking points predict that Nadal, say, has a 98% chance of beating the 140th ranked player, his chances are in fact that high. The algorithm was as accurate on the extreme high end as it was anywhere else on the spectrum.
However, I only included matches in my sample where both players were ranked inside the top 200. I thought that was an innocuous enough cutoff, but I see now why it was misleading. If we limit the sample that way, the most extreme favorites will only be the very top players. In fact, the only players to whom my algorithm gives a 95% chance of beating the 200th ranked player are the top 5.
When I expanded the sample to players ranked outside of the top 200, the high end broke down. In other words, in the bucket of matches where the favorite had a 90% or better chance of winning, the favorite isn’t winning that often.
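The "bucket" check described here is a calibration test: group matches by the favorite's predicted win probability and compare the average prediction in each group with how often the favorite actually won. Below is a minimal sketch, assuming each match has already been reduced to a (predicted probability, outcome) pair; the function name and the bucket scheme are hypothetical.

```python
def calibration_table(matches, n_buckets=10):
    """Compare predicted and actual win rates by probability bucket.

    matches: iterable of (favorite_win_prob, favorite_won) pairs.
    Returns {bucket_index: (n_matches, mean_predicted, actual_win_rate)},
    where bucket k covers probabilities from k/n_buckets to (k+1)/n_buckets.
    """
    totals = {}
    for prob, won in matches:
        # A probability of exactly 1.0 is folded into the top bucket.
        idx = min(int(prob * n_buckets), n_buckets - 1)
        n, prob_sum, wins = totals.get(idx, (0, 0.0, 0))
        totals[idx] = (n + 1, prob_sum + prob, wins + (1 if won else 0))
    return {
        idx: (n, prob_sum / n, wins / n)
        for idx, (n, prob_sum, wins) in sorted(totals.items())
    }
```

If the algorithm is well calibrated, the second and third values in each bucket should be close. The breakdown described above would show up as an actual win rate well below the mean prediction in the top bucket.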
There are several possible explanations for this, none of which account for the entire effect, but many of which surely play a part:
- I’m still only looking at ATP-level matches, and if a player outside of the top 200 is in an ATP main draw match, he was not exactly randomly selected. He may be playing at “home” on a wild card, he may be hot after a solid week in qualifying, he is probably on his favorite surface, and his ranking may be misleading due to injury.
- Outside of the top 5 or top 10, players are substantially less consistent. It’s tough to imagine Robin Soderling losing to a qualifier right now, but easy to see, say, Fernando Verdasco or Ivan Ljubicic doing so.
More fundamentally, I suspect that the further down the rankings you go, the less the difference in points really means. Certainly there’s much more movement–once you get outside the top 50, one good showing can easily gain you 10, 20, or more spots. That doesn’t mean that a player is suddenly more skilled, which is the way my algorithm has to treat him.
Controlling for surface, wild card status, and more will help reconcile some of these differences, but ultimately, matches between drastically mismatched (on paper) opponents may have to be treated differently than matches between more closely matched peers.
2. Eliminating some quirks of the ATP ranking system doesn’t break it, at least not for my purposes. In the process of my current projects, I wanted to be able to more easily tweak the parameters of the ranking system, so I started by rebuilding the existing one. But there are a lot of quirks:
- The top 4 or 5 players get a lot of points from the Tour Championships.
- Davis Cup players get points.
- Rankings are limited to a player’s top 18 tournaments, but there are some limitations on what those tournaments must be, resulting in cases where a player gets credited for a poor showing at a grand slam, but does not get credited for a better showing (worth more points) at a smaller tournament.
All of these quirks have their purposes, given that the ATP’s priorities are built around keeping fans interested and ensuring that top players focus on the most important events. But they are a pain in the butt to incorporate in an on-the-fly system, so I just ignored them.
And as it turns out, they are not affecting my results in any meaningful way. I’ve re-run a couple of earlier projects with my “improvised” rankings, and nothing is changing by more than a percent or two. Occasionally the effect is strong on a certain player (I think the improvised system bumps Juan Carlos Ferrero from #29 to inside the top 15 at 2010 year-end), but in the aggregate, it makes no difference.
2011 Aussie Open Simulation Results
Using my simple ranking-points-based algorithm to determine the odds that each player wins a match, I ran simulations using the 2011 Australian Open draw.
As usual, the keyword is “simple,” and you can easily find all sorts of intuitive reasons to discount the results. There’s no consideration for surface, so clay-court specialists are generally overrated. Players returning from injury (Del Potro, especially, and Karlovic) have taken a hit in the rankings, and are thus underrated here.
I’m also publishing the code that I use to generate these sims. It should work for any single-elimination tournament up to 128 competitors, and is easily expandable to handle larger brackets. The function ‘calcWP’ is specific to my tennis algorithm, but you could swap in something like log5 very easily. I also included the .csv file I used for the draw, so you can see the format, or tinker with the parameters and come up with your own Aussie sim.
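For readers who want the general shape of such a simulation, here is a minimal Monte Carlo sketch. It is not the published code: `simulate_draw` is a hypothetical name, and log5 stands in for `calcWP`, so each player is given a win rate strictly between 0 and 1 rather than ranking points. Any other two-argument win-probability function could be passed instead.

```python
import random


def log5(a, b):
    """Log5 estimate that a side with win rate a beats a side with win rate b."""
    return (a - a * b) / (a + b - 2 * a * b)


def simulate_draw(draw, win_prob, n_sims=10000, seed=0):
    """Simulate a single-elimination bracket many times.

    draw: list of (name, rating) in bracket order; names must be unique and
    the length must be a power of two. Returns {name: [share of simulations
    in which the player won round 1, round 2, ..., the final]}.
    """
    n = len(draw)
    if n < 2 or n & (n - 1):
        raise ValueError("draw size must be a power of two")
    n_rounds = n.bit_length() - 1
    rng = random.Random(seed)
    counts = {name: [0] * n_rounds for name, _ in draw}
    for _ in range(n_sims):
        alive = list(draw)
        for rnd in range(n_rounds):
            survivors = []
            # Adjacent entries in the list meet each other in this round.
            for i in range(0, len(alive), 2):
                a, b = alive[i], alive[i + 1]
                winner = a if rng.random() < win_prob(a[1], b[1]) else b
                counts[winner[0]][rnd] += 1
                survivors.append(winner)
            alive = survivors
    return {name: [c / n_sims for c in row] for name, row in counts.items()}


# Four-player toy bracket with made-up win rates.
toy_draw = [("A", 0.7), ("B", 0.5), ("C", 0.5), ("D", 0.3)]
for name, shares in simulate_draw(toy_draw, log5, n_sims=2000).items():
    print(name, shares)
```

Each row of the output has the same meaning as a row of round-by-round percentages: the last entry is the share of simulations in which that player won the title.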
Your 2011 Australian Open…
| Player | Seed/Entry | Points | R64 | R32 | R16 | QF | SF | F | W |
|---|---|---|---|---|---|---|---|---|---|
| Nadal | 1 | 12390 | 96.9% | 92.7% | 87.0% | 78.1% | 66.1% | 49.6% | 34.5% |
| Daniel | | 564 | 3.1% | 1.4% | 0.5% | 0.1% | 0.0% | 0.0% | 0.0% |
| Sweeting | Q | 486 | 35.3% | 1.6% | 0.5% | 0.1% | 0.0% | 0.0% | 0.0% |
| Gimeno-Traver | | 844 | 64.7% | 4.3% | 1.9% | 0.7% | 0.2% | 0.0% | 0.0% |
| Tomic | W | 239 | 17.9% | 3.1% | 0.1% | 0.0% | 0.0% | 0.0% | 0.0% |
| Chardy | | 960 | 82.1% | 39.6% | 3.7% | 1.4% | 0.4% | 0.1% | 0.0% |
| Falla | | 540 | 27.3% | 11.3% | 0.7% | 0.2% | 0.0% | 0.0% | 0.0% |
| Lopez F | 31 | 1310 | 72.7% | 46.0% | 5.6% | 2.6% | 0.9% | 0.2% | 0.0% |
| Isner | 20 | 1850 | 74.0% | 56.8% | 31.7% | 5.8% | 2.5% | 0.8% | 0.2% |
| Serra | | 711 | 26.0% | 14.0% | 4.6% | 0.5% | 0.1% | 0.0% | 0.0% |
| Stepanek | | 735 | 62.1% | 20.4% | 6.9% | 0.6% | 0.2% | 0.0% | 0.0% |
| Gremelmayr | Q | 469 | 37.9% | 8.8% | 2.2% | 0.1% | 0.0% | 0.0% | 0.0% |
| Machado | | 573 | 41.2% | 10.2% | 3.3% | 0.2% | 0.0% | 0.0% | 0.0% |
| Giraldo | | 785 | 58.8% | 18.3% | 7.2% | 0.7% | 0.2% | 0.0% | 0.0% |
| Young D | Q | 435 | 14.6% | 5.4% | 1.4% | 0.1% | 0.0% | 0.0% | 0.0% |
| Cilic | 15 | 2140 | 85.4% | 66.1% | 42.8% | 8.7% | 4.1% | 1.4% | 0.4% |
| Youzhny | 10 | 2920 | 85.6% | 70.1% | 51.9% | 29.2% | 8.1% | 3.3% | 1.1% |
| Ilhan | | 574 | 14.4% | 6.2% | 2.2% | 0.5% | 0.0% | 0.0% | 0.0% |
| Kavcic | Q | 552 | 38.0% | 7.1% | 2.4% | 0.5% | 0.0% | 0.0% | 0.0% |
| Anderson K | | 868 | 62.0% | 16.6% | 7.4% | 2.1% | 0.2% | 0.0% | 0.0% |
| Raonic | Q | 351 | 36.4% | 6.8% | 1.0% | 0.2% | 0.0% | 0.0% | 0.0% |
| Phau | | 581 | 63.6% | 18.0% | 4.1% | 0.9% | 0.1% | 0.0% | 0.0% |
| Chela | | 1070 | 39.3% | 27.8% | 9.9% | 3.2% | 0.4% | 0.1% | 0.0% |
| Llodra | 22 | 1575 | 60.7% | 47.4% | 21.0% | 8.8% | 1.6% | 0.4% | 0.1% |
| Nalbandian | 27 | 1480 | 64.2% | 49.1% | 18.4% | 8.2% | 1.4% | 0.4% | 0.1% |
| Hewitt | | 870 | 35.8% | 23.1% | 6.1% | 2.0% | 0.2% | 0.0% | 0.0% |
| Berankis | | 589 | 61.1% | 19.1% | 3.9% | 1.0% | 0.1% | 0.0% | 0.0% |
| Matosevic | W | 392 | 38.9% | 8.8% | 1.4% | 0.2% | 0.0% | 0.0% | 0.0% |
| Russell | | 547 | 67.0% | 10.1% | 3.6% | 0.8% | 0.1% | 0.0% | 0.0% |
| Ebden | W | 288 | 33.0% | 2.7% | 0.6% | 0.1% | 0.0% | 0.0% | 0.0% |
| Nieminen | | 1062 | 20.2% | 14.5% | 7.7% | 2.8% | 0.4% | 0.1% | 0.0% |
| Ferrer | 7 | 3735 | 79.8% | 72.7% | 58.4% | 39.4% | 12.6% | 5.8% | 2.4% |
| Soderling | 4 | 5785 | 87.9% | 83.6% | 71.9% | 58.3% | 35.9% | 15.6% | 7.9% |
| Starace | | 945 | 12.1% | 8.9% | 4.2% | 1.7% | 0.3% | 0.0% | 0.0% |
| Muller | Q | 466 | 76.9% | 6.9% | 2.0% | 0.5% | 0.1% | 0.0% | 0.0% |
| Stadler | Q | 155 | 23.1% | 0.7% | 0.1% | 0.0% | 0.0% | 0.0% | 0.0% |
| Istomin | | 1031 | 86.2% | 41.8% | 8.8% | 3.6% | 0.8% | 0.1% | 0.0% |
| Hernych | Q | 196 | 13.8% | 1.9% | 0.1% | 0.0% | 0.0% | 0.0% | 0.0% |
| Mello | | 627 | 30.0% | 12.8% | 1.9% | 0.6% | 0.1% | 0.0% | 0.0% |
| Bellucci | 30 | 1355 | 70.0% | 43.5% | 11.0% | 5.3% | 1.4% | 0.2% | 0.1% |
| Gulbis | 24 | 1505 | 64.3% | 41.5% | 20.7% | 6.3% | 1.9% | 0.4% | 0.1% |
| Becker | | 870 | 35.7% | 17.9% | 6.4% | 1.3% | 0.2% | 0.0% | 0.0% |
| Dolgopolov | | 928 | 53.6% | 22.8% | 8.6% | 1.8% | 0.4% | 0.0% | 0.0% |
| Kukushkin | | 815 | 46.4% | 17.9% | 6.3% | 1.2% | 0.2% | 0.0% | 0.0% |
| Seppi | | 900 | 59.6% | 19.2% | 8.7% | 1.9% | 0.4% | 0.0% | 0.0% |
| Clement | | 627 | 40.4% | 9.9% | 3.5% | 0.6% | 0.1% | 0.0% | 0.0% |
| Petzschner | | 839 | 24.3% | 12.6% | 5.5% | 1.1% | 0.2% | 0.0% | 0.0% |
| Tsonga | 13 | 2345 | 75.7% | 58.2% | 40.4% | 15.8% | 6.3% | 1.6% | 0.5% |
| Melzer | 11 | 2785 | 91.2% | 77.7% | 54.3% | 22.9% | 10.4% | 3.0% | 1.0% |
| Millot | Q | 334 | 8.8% | 3.3% | 0.7% | 0.1% | 0.0% | 0.0% | 0.0% |
| Ball | W | 344 | 32.5% | 4.1% | 0.9% | 0.1% | 0.0% | 0.0% | 0.0% |
| Riba | | 672 | 67.5% | 14.8% | 5.5% | 1.0% | 0.2% | 0.0% | 0.0% |
| Sela | | 568 | 77.8% | 21.8% | 5.0% | 0.7% | 0.1% | 0.0% | 0.0% |
| Del Potro | | 180 | 22.2% | 2.4% | 0.2% | 0.0% | 0.0% | 0.0% | 0.0% |
| Zemlja | Q | 376 | 15.1% | 6.9% | 1.2% | 0.1% | 0.0% | 0.0% | 0.0% |
| Baghdatis | 21 | 1785 | 84.9% | 68.9% | 32.2% | 10.7% | 3.8% | 0.8% | 0.2% |
| Garcia-Lopez | 32 | 1300 | 62.1% | 44.0% | 10.6% | 4.2% | 1.2% | 0.2% | 0.0% |
| Berrer | | 835 | 37.9% | 22.8% | 3.9% | 1.1% | 0.2% | 0.0% | 0.0% |
| Schwank | | 580 | 50.6% | 16.9% | 2.3% | 0.5% | 0.1% | 0.0% | 0.0% |
| Mayer L | | 572 | 49.4% | 16.3% | 2.1% | 0.4% | 0.1% | 0.0% | 0.0% |
| Marchenko | | 624 | 49.3% | 5.5% | 2.3% | 0.6% | 0.1% | 0.0% | 0.0% |
| Ramirez Hidalgo | | 638 | 50.7% | 5.7% | 2.4% | 0.6% | 0.1% | 0.0% | 0.0% |
| Beck K | | 543 | 7.0% | 3.2% | 1.2% | 0.3% | 0.0% | 0.0% | 0.0% |
| Murray | 5 | 5760 | 93.0% | 85.5% | 75.3% | 56.7% | 35.5% | 15.6% | 7.9% |
| Berdych | 6 | 3955 | 96.4% | 78.5% | 63.1% | 42.3% | 22.0% | 9.6% | 3.4% |
| Crugnola | Q | 194 | 3.6% | 0.5% | 0.1% | 0.0% | 0.0% | 0.0% | 0.0% |
| Kohlschreiber | | 1215 | 63.8% | 15.2% | 8.3% | 3.1% | 0.9% | 0.2% | 0.0% |
| Kamke | | 724 | 36.2% | 5.8% | 2.4% | 0.7% | 0.1% | 0.0% | 0.0% |
| Harrison | W | 313 | 32.3% | 6.7% | 0.6% | 0.1% | 0.0% | 0.0% | 0.0% |
| Mannarino | | 612 | 67.7% | 22.8% | 3.9% | 0.9% | 0.1% | 0.0% | 0.0% |
| Dancevic | Q | 172 | 9.0% | 2.2% | 0.1% | 0.0% | 0.0% | 0.0% | 0.0% |
| Gasquet | 28 | 1385 | 91.0% | 68.3% | 21.5% | 8.8% | 2.5% | 0.6% | 0.1% |
| Davydenko | 23 | 1555 | 60.0% | 41.5% | 17.1% | 6.5% | 2.0% | 0.5% | 0.1% |
| Mayer F | | 1073 | 40.0% | 23.9% | 8.0% | 2.3% | 0.6% | 0.1% | 0.0% |
| Fognini | | 855 | 59.6% | 22.7% | 6.5% | 1.7% | 0.3% | 0.0% | 0.0% |
| Nishikori | | 599 | 40.4% | 12.0% | 2.7% | 0.5% | 0.1% | 0.0% | 0.0% |
| Zverev | | 611 | 38.3% | 7.2% | 2.4% | 0.5% | 0.1% | 0.0% | 0.0% |
| Tipsarevic | | 935 | 61.7% | 16.0% | 7.2% | 2.0% | 0.4% | 0.1% | 0.0% |
| Schuettler | | 597 | 13.5% | 5.8% | 1.9% | 0.4% | 0.1% | 0.0% | 0.0% |
| Verdasco | 9 | 3240 | 86.5% | 71.1% | 54.1% | 30.3% | 14.2% | 5.7% | 1.8% |
| Almagro | 14 | 2160 | 84.5% | 68.0% | 41.9% | 15.4% | 6.8% | 2.2% | 0.5% |
| Robert | Q | 460 | 15.5% | 6.6% | 1.8% | 0.2% | 0.0% | 0.0% | 0.0% |
| Andreev | | 622 | 52.1% | 13.7% | 4.5% | 0.8% | 0.1% | 0.0% | 0.0% |
| Volandri | | 574 | 47.9% | 11.8% | 3.6% | 0.5% | 0.1% | 0.0% | 0.0% |
| Cipolla | Q | 190 | 32.6% | 3.4% | 0.4% | 0.0% | 0.0% | 0.0% | 0.0% |
| Paire | W | 366 | 67.4% | 12.5% | 2.5% | 0.3% | 0.0% | 0.0% | 0.0% |
| Luczak | W | 400 | 14.7% | 8.6% | 1.9% | 0.2% | 0.0% | 0.0% | 0.0% |
| Ljubicic | 17 | 1965 | 85.3% | 75.5% | 43.4% | 15.1% | 6.2% | 1.8% | 0.4% |
| Troicki | 29 | 1385 | 86.2% | 64.4% | 16.2% | 7.2% | 2.4% | 0.5% | 0.1% |
| Tursunov | | 263 | 13.8% | 4.5% | 0.3% | 0.0% | 0.0% | 0.0% | 0.0% |
| Dabul | | 584 | 58.6% | 19.8% | 2.7% | 0.7% | 0.1% | 0.0% | 0.0% |
| Mahut | Q | 424 | 41.4% | 11.3% | 1.1% | 0.2% | 0.0% | 0.0% | 0.0% |
| Karlovic | | 670 | 52.8% | 6.2% | 2.5% | 0.7% | 0.1% | 0.0% | 0.0% |
| Dodig | | 606 | 47.2% | 5.0% | 2.0% | 0.5% | 0.1% | 0.0% | 0.0% |
| Granollers | | 993 | 11.6% | 7.2% | 3.6% | 1.4% | 0.4% | 0.1% | 0.0% |
| Djokovic | 3 | 6240 | 88.4% | 81.6% | 71.5% | 56.9% | 40.2% | 21.9% | 10.2% |
| Roddick | 8 | 3565 | 88.5% | 78.1% | 61.4% | 42.2% | 16.8% | 8.1% | 2.7% |
| Hajek | | 560 | 11.5% | 5.8% | 2.0% | 0.4% | 0.0% | 0.0% | 0.0% |
| Przysiezny | | 590 | 51.7% | 8.5% | 2.9% | 0.7% | 0.1% | 0.0% | 0.0% |
| Kunitsyn | | 551 | 48.3% | 7.6% | 2.5% | 0.7% | 0.1% | 0.0% | 0.0% |
| Berlocq | | 725 | 47.1% | 16.8% | 4.0% | 1.2% | 0.2% | 0.0% | 0.0% |
| Haase | | 803 | 52.9% | 20.0% | 5.2% | 1.7% | 0.3% | 0.0% | 0.0% |
| Benneteau | | 965 | 38.5% | 21.8% | 6.3% | 2.3% | 0.4% | 0.1% | 0.0% |
| Monaco | 26 | 1480 | 61.5% | 41.5% | 15.7% | 7.2% | 1.7% | 0.5% | 0.1% |
| Wawrinka | 19 | 1855 | 76.7% | 52.3% | 28.1% | 12.8% | 3.5% | 1.2% | 0.2% |
| Gabashvili | | 626 | 23.3% | 9.4% | 2.6% | 0.6% | 0.1% | 0.0% | 0.0% |
| Dimitrov | Q | 518 | 29.5% | 7.5% | 1.8% | 0.4% | 0.0% | 0.0% | 0.0% |
| Golubev | | 1135 | 70.5% | 30.8% | 12.7% | 4.2% | 0.8% | 0.2% | 0.0% |
| Gil | | 551 | 40.2% | 8.3% | 2.4% | 0.5% | 0.0% | 0.0% | 0.0% |
| Cuevas | | 790 | 59.8% | 16.4% | 6.1% | 1.6% | 0.2% | 0.0% | 0.0% |
| De Bakker | | 950 | 25.2% | 14.9% | 6.2% | 1.9% | 0.3% | 0.1% | 0.0% |
| Monfils | 12 | 2560 | 74.8% | 60.4% | 40.2% | 21.5% | 7.2% | 2.9% | 0.7% |
| Fish | 16 | 1996 | 70.1% | 52.0% | 32.0% | 8.2% | 3.9% | 1.3% | 0.3% |
| Hanescu | | 915 | 29.9% | 16.4% | 6.8% | 1.0% | 0.3% | 0.0% | 0.0% |
| Robredo | | 915 | 65.2% | 23.4% | 9.9% | 1.5% | 0.4% | 0.1% | 0.0% |
| Devvarman | | 514 | 34.8% | 8.2% | 2.4% | 0.2% | 0.0% | 0.0% | 0.0% |
| Stakhovsky | | 925 | 64.4% | 24.8% | 10.2% | 1.6% | 0.4% | 0.1% | 0.0% |
| Brands | | 541 | 35.6% | 9.3% | 2.6% | 0.3% | 0.1% | 0.0% | 0.0% |
| Kubot | | 670 | 24.5% | 11.4% | 3.9% | 0.5% | 0.1% | 0.0% | 0.0% |
| Querrey | 18 | 1860 | 75.5% | 54.5% | 32.1% | 7.8% | 3.4% | 1.1% | 0.2% |
| Montanes | 25 | 1495 | 74.3% | 48.4% | 8.8% | 4.5% | 1.7% | 0.5% | 0.1% |
| Brown | | 573 | 25.7% | 10.3% | 0.9% | 0.2% | 0.0% | 0.0% | 0.0% |
| Andujar | | 683 | 40.9% | 14.9% | 1.4% | 0.4% | 0.1% | 0.0% | 0.0% |
| Malisse | | 956 | 59.1% | 26.4% | 3.4% | 1.4% | 0.4% | 0.1% | 0.0% |
| Lu | | 1141 | 53.7% | 6.2% | 3.2% | 1.4% | 0.5% | 0.1% | 0.0% |
| Simon | | 1005 | 46.3% | 4.8% | 2.3% | 0.9% | 0.3% | 0.1% | 0.0% |
| Lacko | | 553 | 4.3% | 1.4% | 0.4% | 0.1% | 0.0% | 0.0% | 0.0% |
| Federer | 2 | 9245 | 95.7% | 87.6% | 79.6% | 70.0% | 56.6% | 40.3% | 22.4% |
Python Code for Marcel Projections
A while back, I posted retro-Marcel projections for over 100 seasons. They were generated with some python code, and now you can play with it.
You’ll also need some Baseball-Databank files. (Well, you don’t need them, but they will make the process much easier.)
The ‘import’ lines refer to a few utilities that I’ve written. Those are also available on GitHub. At some point, I’ll write up a summary of some of my Python utilities. I’m sure that none of them are original (for instance, turning a 2-d matrix into a .csv, or vice versa), but I use them all the time, and they might come in handy for you, too.
Python Code for Tennis Markov
I’ve published my code for the tennis Markov project. You can find it here:
- Single game outcome. Takes the server’s probability of winning a single point and the current score, returns server’s chance of winning game.
- Tiebreak outcome. Takes server’s probability of winning a single service point, prob of winning single return point, and current score, returns server’s chance of winning tiebreak.
- Single set outcome. Takes server’s probability of winning a single service point, prob of winning single return point, and current game score, returns server’s chance of winning set. (Assumes standard tiebreak set.)
- Match outcome. Takes server’s probability of winning a single service point, prob of winning single return point, current score in points, games, and sets, and number of sets, returns server’s chance of winning match.
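To give a flavor of the simplest of these, here is a minimal sketch of the single-game calculation. It is not the published code: the function name and signature are hypothetical. It uses the standard recursion over point scores, with the usual closed form p²/(p² + q²) once the game is level at deuce, and it assumes the server wins each point independently with the same probability.

```python
from functools import lru_cache


def game_win_prob(p, server_points=0, returner_points=0):
    """Chance the server wins the game from the given point score.

    p is the server's probability of winning any single point. Scores are
    counted in points won (0, 1, 2, 3 for love, 15, 30, 40), and deuce is
    any level score of 3-3 or later.
    """
    q = 1.0 - p
    deuce = p * p / (p * p + q * q)  # server must win two points in a row first

    @lru_cache(maxsize=None)
    def win(s, r):
        if s >= 4 and s - r >= 2:
            return 1.0
        if r >= 4 and r - s >= 2:
            return 0.0
        if s >= 3 and s == r:
            return deuce
        # Otherwise play one more point and recurse on both outcomes.
        return p * win(s + 1, r) + q * win(s, r + 1)

    return win(server_points, returner_points)


print(round(game_win_prob(0.6), 4))        # from love-all
print(round(game_win_prob(0.6, 0, 3), 4))  # serving at love-40
```

The tiebreak, set, and match calculations follow the same pattern with larger state spaces, which is where the bookkeeping gets knotty.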
The logic in the tiebreak problem is knotty, and the code reflects that; I’m sure there’s a better way of doing it, I just didn’t feel like working it out once I got to the answer.
In the other functions, the code is pretty clean, and I’ve commented it more than I otherwise would. The math gets a little hairy, though.
Isner Loses on Points, Wins Match…Again
Last month, I noticed that John Isner wins a whole lot of matches despite losing more than half of the total points. Such matches are not terribly rare, but Isner is in a class by himself–it happened eight times last season alone.
Sure enough, he’s picking up right where he left off. In his first match of the year, he beat Robin Haase 3-6, 7-6(4), 7-5, despite losing 110 of 211 total points.
While Isner managed to break serve once (in his one attempt), he was amazingly inept against Haase’s serve, winning only 24 percent of points on return. I haven’t made a thorough survey, but that’s one of the lowest return success rates I’ve ever seen.
Between this and the staggering numbers of tiebreaks played by the likes of Isner and Ivo Karlovic, it really seems like the tallest guys are playing a different sport.
Minor League Splits Databases Available
It’s been a good run, but Minor League Splits will no longer exist in its original form.
I took down the site a few months ago, and have decided to make all of the underlying databases freely available. This includes full play-by-play of all affiliated minor leagues in the U.S. from 2005 to 2010.
I haven’t decided whether I will update this with 2011 data at the end of next season. At the very least, I won’t be doing any in-season updates.
At some point in the future, I may open source the code I use to collect and analyze the play-by-play. That’s a ways off, though. This was my first major programming project five years ago, so the original code isn’t very good, and as MLBAM has changed things, I’ve added one ugly hack on top of another. As is, the code is probably not usable for anyone aside from me.
Click here to see what’s available.
Ivo Karlovic and the Inevitable Tiebreak
Ivo Karlovic is back. He missed most of last season due to a foot injury, but he’s healed, and playing just like he always has. In Doha last week, he reached the quarterfinals, beating Philipp Kohlschreiber in the round of 16.
What should come as a surprise to no one is that, before reaching the quarters, he played a total of five sets, every one of which went to a tiebreak. For most opponents, Karlovic is impossible to break, and since his game is so service-centered, he doesn’t break serve much himself.
That’s the anecdotal story, and it’s intuitively sound. Does the data back it up?
To find out, I used a data set of all ATP-level matches from 2001 to 2010 and counted, among other things, how many sets ended in a tiebreak.
In that span, about 17 percent of sets ended in tiebreaks. Indeed, Karlovic has played tiebreaks at a higher rate than anyone else. And it isn’t even close.
After eliminating everyone who played fewer than 200 sets in the last decade, we’re left with 205 players. Of those, 33 guys reached a tiebreak in at least 20 percent of sets. Only 7 played tiebreaks in more than a quarter of sets. Karlovic reached a tiebreak in 40 percent of sets, more than anyone else in this time period.
Rounding out the top of the list are Chris Guccione at 35% (in only 203 sets), John Isner and Wayne Arthurs at 33%, and Alexander Waske at 31%.
The highest possible range for top-10 level success seems to be about 24 percent. That’s where Ivan Ljubicic has been over the last decade, while Pete Sampras (albeit in only 252 sets) and Andy Roddick are at 23 percent. To find more elite-level players, we must go down to 21 percent, the level Marat Safin and Jo-Wilfried Tsonga have maintained.
But did they win?
Tennis fans often view success in tiebreaks as a proxy for clutch, and perhaps they are right to do so. If two players reach a tiebreak, they are fairly evenly matched, and the tiebreak itself doesn’t necessarily give an edge to either player.
This may explain why few top players end up in a high percentage of tiebreaks. Only a handful of players are able to sustain tiebreak winning percentages above 60 percent, but to have a very successful season, you need to win more than 60 percent of sets.
What came as a surprise to me is that the players who reach the most tiebreaks are not necessarily that successful in the tiebreaks. In fact, there is no meaningful correlation between the two rates.
Karlovic, for instance, won only 49 percent of tiebreaks in the last decade. Ljubicic only 52 percent, and Safin only 50 percent. Yet there are plenty of standouts at the high end of the spectrum: Isner has won 63 percent of tiebreaks, while Roddick has won 64 percent and Tsonga has won 61 percent.
The same variety is on display among those players who contest tiebreaks at the lowest rates. David Ferrer ends up in a tiebreak only 11 percent of the time, and wins only 47 percent of them, but Nicolas Kiefer played tiebreaks in 12 percent of his sets and won 58 percent.
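As a rough illustration of the "no meaningful correlation" claim, the sketch below computes a Pearson correlation using only the eight players for whom both rates are quoted in this post. That is a tiny, hand-picked subset of the 205 players in the full sample, so it can only hint at the pattern. On these eight pairs the coefficient comes out close to zero (roughly 0.08).

```python
from math import sqrt

# (share of sets reaching a tiebreak, share of tiebreaks won), as quoted above.
players = {
    "Karlovic": (0.40, 0.49),
    "Isner": (0.33, 0.63),
    "Ljubicic": (0.24, 0.52),
    "Roddick": (0.23, 0.64),
    "Safin": (0.21, 0.50),
    "Tsonga": (0.21, 0.61),
    "Kiefer": (0.12, 0.58),
    "Ferrer": (0.11, 0.47),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


tiebreak_rates = [rate for rate, _ in players.values()]
tiebreak_win_rates = [won for _, won in players.values()]
print(round(pearson(tiebreak_rates, tiebreak_win_rates), 2))
```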
Perhaps this is all reassuring. I suspect I’m not the only tennis fan to be annoyed watching Karlovic or Isner cruise to a tiebreak with what appears to be a minimal effort. At least, for many such players, winning the set is not such an automatic result.