Tennis Win Expectancy Graphs!
It’s time to toss a lot of this year’s work into the blender and see what comes out.
In the last few months, I’ve calculated win expectancy charts for different parts of tennis matches. Here’s the data for a single game, a tiebreak, and a set. All of these are adjustable for assumed serving and returning skill levels, so we can look at different win expectancies when the players are big servers, counterpunchers, or anything in between.
There’s not much tennis play-by-play out there, so the applications are lacking at this time. But I’m working on it.
2006 Davis Cup: Roddick vs. Pavel
It’s February 2006, the first round of Davis Cup World Group play. The USA has drawn Romania, and they are playing at home. Opening the tie is the USA #1 (and world #3) Andy Roddick against the Romanian Andrei Pavel.
While Pavel broke into the top 20 for a small chunk of his career, he was never a player of Roddick’s caliber. At the time of this match, he was ranked #82 in the world. Using only ranking points, Roddick had about an 87% chance of winning. Add in the home-field, home-surface advantage, and on paper, it was even more lopsided.
A roller-coaster match
Here’s a quick summary of the match, so you know what you’re looking at when we get to the graphs.
First Set: Roddick and Pavel trade holds to 6-6, then Roddick easily takes the tiebreak, 7-2.
Second Set: Roddick cruises, breaking twice (at 1-1 and 2-4), to win the set 6-2.
Third Set: Pavel breaks Roddick at 0-1, but Roddick breaks back at 4-2. They reach another tiebreak. They trade early mini-breaks, but at 8-8, Roddick loses a point on serve, and Pavel holds to take the tiebreak 10-8.
Fourth Set: Pavel keeps the pressure on, breaking Roddick at love in the first game. He breaks in the 5th game as well, winning this set 6-2 to even the match.
Fifth Set: This is almost unbelievable. Roddick gets broken in the first game. Pavel loses serve as well. Roddick gets broken again. Pavel reels off the next three games to reach 5-1, but Roddick fights back to 4-5. Finally, in a ten-point game (the second longest of the match), Pavel holds to win the set 6-4, and the match.
Any WinEx graph varies depending on your assumptions. Let’s start with what I’ll call the “baseball model.” In it, we assume that each player has an equal chance of winning every point, serve or return. (In professional men’s tennis, that’s false. But as in baseball, it’s a convenient fiction.)
All of these graphs are from Roddick’s perspective. The little gaps in the line separate each set. Unsurprisingly, the most drama is pictured at the beginning and end of the final set. The flip-flopping breaks are obvious at the beginning of the fifth set, and the hard-fought final game shows how close Roddick was to evening the score before finally losing.
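To make the 50/50 assumption concrete at the smallest scale, here is a minimal sketch of win expectancy inside a single game. It is not the code behind the graphs; the function name `game_win_expectancy` and the 0-to-3 point encoding (love, 15, 30, 40) are illustrative choices, and the only modeling assumption is that every point is won independently with the same probability `p`.

```python
from functools import lru_cache


def game_win_expectancy(p):
    """Return win(a, b): the chance a player wins the game from a given
    point score, if that player wins each point with probability p.

    Assumption: points are independent and p never changes.
    """
    q = 1.0 - p
    # From deuce, a player must win two points in a row before the opponent does.
    deuce = p * p / (p * p + q * q)

    @lru_cache(maxsize=None)
    def win(a, b):
        # a, b = points won so far (0, 1, 2, 3 stand for love, 15, 30, 40)
        if a == 3 and b == 3:
            return deuce
        if a == 4:
            return 1.0
        if b == 4:
            return 0.0
        return p * win(a + 1, b) + q * win(a, b + 1)

    return win


baseball = game_win_expectancy(0.5)
print(baseball(0, 0))  # start of the game
print(baseball(3, 0))  # 40-love
print(baseball(0, 3))  # love-40
```

With `p = 0.5`, the game starts at 50% and a 40-love lead is worth 93.75%. A full-match graph strings together the same kind of calculation across games and sets.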
Picturing the server advantage
Let’s change our assumptions to get closer to reality. On hard courts, the average player wins about 64% of points on serve. This is 2006 Roddick we’re talking about, so I edged it up to 65%. Note that I’m altering the assumption for both players: this graph keeps the fiction that the two players are equal, but adds the more truthful parameter that each is more likely to win points on serve.
Still looks about the same. The extremes are a bit more pronounced in the fifth set. Roddick still created some drama, but the lower win expectancy in the final game reflects the fact that Pavel had the ball on his racquet.
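A small edge on each point turns into a much bigger edge over a whole game, which is why breaks of serve move the line so much in this model. The sketch below uses the standard closed-form result for a game of independent points; the function name `hold_prob` is just for illustration, and the independence of points is an assumption of the model, not a fact about real matches.

```python
def hold_prob(p):
    """Chance the server wins a game when winning each point with probability p.

    Assumption: every point is independent with the same p.
    """
    q = 1.0 - p
    # Win the game to love, 15, or 30 ...
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # ... or reach deuce (20 ways to get to 3-3) and win from there.
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


for p in (0.50, 0.64, 0.65):
    print(f"{p:.0%} of service points -> holds {hold_prob(p):.1%} of games")
```

At 65% of points on serve, the server holds about 83% of the time, so a server who has just been broken is much worse off here than in the 50/50 model.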
Approximating skill level
I don’t yet have granular stats for 2005 or 2006, so I don’t know exactly what percentage of serve and return points Roddick and Pavel were winning back then. However, I do have their rankings, which generated the estimate that Roddick had about an 87% chance of winning. We can argue about the effect of the Davis Cup atmosphere, the fact that Pavel was selected for the team, and so on, but it’s clear that Roddick had a big edge, at least on paper.
To reflect that edge, I changed the assumption that each player would win 65% of points on serve. Now, we figure that Roddick wins 69% of points on serve while Pavel wins 62%. That spits out a win percentage of about 86% for Roddick, which is obvious in the graph, especially in the first set!
Roddick was up 3-0 in the third-set tiebreak. In this model, he had a 99.0% chance at that point of winning the match! What I find noteworthy is how, when a player is so heavily favored, the underdog can hack away at the advantage just by hanging in there. It’s intuitively true, but still interesting to see in practice. While Roddick started the match at 86%, Pavel got the edge down to 76% toward the end of the first set.
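For readers who want to see how a pair of serve percentages like 69/62 becomes a pre-match number, here is a rough sketch. It is not the model behind the graphs. It assumes independent points, a tiebreak at 6-6 in every set (including the fifth), best of five sets, and, purely for simplicity, that player A serves first in each set. All function names are illustrative.

```python
from functools import lru_cache


def game_win_prob(p):
    """Chance the server holds, winning each service point with probability p."""
    q = 1.0 - p
    return p**4 * (1 + 4 * q + 10 * q**2) + 20 * p**3 * q**3 * p**2 / (1 - 2 * p * q)


def tiebreak_win_prob(pa, pb):
    """Chance A wins a first-to-7 tiebreak, A serving the first point.
    pa, pb = each player's chance of winning a point on serve (0 < p < 1)."""
    @lru_cache(maxsize=None)
    def win(a, b):
        if a >= 7 and a - b >= 2:
            return 1.0
        if b >= 7 and b - a >= 2:
            return 0.0
        if a == b and a >= 6:
            # Next two points are one on each serve; repeat until someone wins both.
            up, down = pa * (1 - pb), (1 - pa) * pb
            return up / (up + down)
        n = a + b  # serve order by point number: A, B, B, A, A, B, B, ...
        p = pa if ((n + 1) // 2) % 2 == 0 else 1 - pb
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)

    return win(0, 0)


def set_win_prob(pa, pb):
    """Chance A wins a tiebreak set, A serving the first game (simplification)."""
    hold_a = game_win_prob(pa)
    break_b = 1.0 - game_win_prob(pb)  # A wins a game on B's serve

    @lru_cache(maxsize=None)
    def win(a, b):
        if a >= 6 and a - b >= 2:
            return 1.0
        if b >= 6 and b - a >= 2:
            return 0.0
        if a == 6 and b == 6:
            return tiebreak_win_prob(pa, pb)
        p = hold_a if (a + b) % 2 == 0 else break_b
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)

    return win(0, 0)


def match_win_prob(pa, pb, sets_to_win=3):
    """Chance A wins the match, treating every set as an identical coin flip."""
    s = set_win_prob(pa, pb)

    @lru_cache(maxsize=None)
    def win(a, b):
        if a == sets_to_win:
            return 1.0
        if b == sets_to_win:
            return 0.0
        return s * win(a + 1, b) + (1 - s) * win(a, b + 1)

    return win(0, 0)


print(f"65/65: {match_win_prob(0.65, 0.65):.1%}")
print(f"69/62: {match_win_prob(0.69, 0.62):.1%}")
```

Under these assumptions, the 65/65 case comes out to an even match, and the 69/62 case lands in the mid-80s, in line with the roughly 86% quoted above. Swapping the fifth-set tiebreak for an advantage set would shift the number slightly.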
All together now
Here’s a look at all three models. The red line is the 50/50 “baseball model,” the green line is the 65/65 model, and the blue line is the 69/62 skill-based model.
There’s a lot of great stuff here. In future posts, I’ll get into more of it.