The Summer of Jeff

Serving Against Markov

Posted in tennis by Jeff on December 5, 2010

Over the last few days I’ve presented a lot of theory. My win expectancy tables are based on what you might call a “video game” model of tennis: players are presumed to perform at the same level on every single point. In a one-on-one sport played by humans, I think it’s safe to assume that isn’t the case.

Some of the specific assumptions are more easily tested than others. Let’s start with one of the easy ones.

From the start of this project a few days ago, I’ve used Markov chains to translate service points won into service games won. There’s obviously a correlation, but does my model reflect reality?
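For readers who want the mechanics, here is a minimal Python sketch of that translation. It assumes, as the model does, that the server wins every point independently with the same probability p. The score is tracked as points won by each side, and deuce is resolved with the standard closed form p²/(p² + q²). The function name hold_probability is just a label chosen for this sketch.

```python
def hold_probability(p: float) -> float:
    """Chance that a server holds if he wins each point with probability p.

    Assumes every point is independent and identically distributed,
    i.e. the "video game" model of tennis.
    """
    q = 1.0 - p
    # From deuce the server needs two points in a row before the returner
    # gets two; summing over the deuce/ad loop gives p^2 / (p^2 + q^2).
    from_deuce = p * p / (p * p + q * q)

    def win_from(server: int, returner: int) -> float:
        # server/returner count points won: 0, 1, 2, 3 = love, 15, 30, 40.
        if server == 4:
            return 1.0  # game won before 40-40 (returner had at most 30)
        if returner == 4:
            return 0.0  # broken before deuce
        if server == 3 and returner == 3:
            return from_deuce
        # Markov step: win the next point with p, lose it with q.
        return p * win_from(server + 1, returner) + q * win_from(server, returner + 1)

    return win_from(0, 0)


for rate in (0.60, 0.65, 0.70):
    print(f"{rate:.2f} of service points -> {hold_probability(rate):.3f} of service games")
```

Rounded to two places, these come out to 0.74, 0.83, and 0.90, which match the Pred Gms column for players at those point rates in the table below.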

There are plenty of reasons to suspect that it may not. First, it doesn’t consider the differences between the deuce and ad courts: some players are stronger servers (or returners) on one side or the other. Second, anyone who has ever played or watched tennis can tell you that good serving isn’t always consistent. And even a perfectly consistent server isn’t going to put up the same numbers every match; the surface, the weather, and most of all his opponent will have something to say about that.

Tennis isn’t awash with numbers, but this is a case where we are able to use ATP statistics for a quick reality check.

Testing the model

I took the top 50 players in the most recent ATP rankings. In the table below, you’ll see each man’s number of service games played in 2010, his percentage of service points won, and his percentage of service games won. The next column shows the percentage of service games the Markov model predicts he would win, based on his percentage of service points won.

The final two columns show two ways of measuring the results. The first is the raw difference between the number of service games he actually won and the number my model predicts he would win. For 41 of the 50 players, this is negative! Clearly, the model is not quite an accurate representation of reality, at least at the aggregate level.

To show something a little different, the final column adjusts those numbers so that they sum to zero. You might think of this like wins above or below a Pythagorean expectation for a baseball team. As in the baseball analogue, we might be tempted to speculate that this plus/minus rating represents some kind of clutch skill. We’ll come back to that in a bit.
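To make the two measures concrete, here is a small Python sketch. The raw Diff is actual service games won minus the model’s expected number. How the adjusted column was forced to sum to zero isn’t spelled out, so this sketch assumes one plausible rule: spread the group’s total shortfall across players in proportion to their service games. The three players below are invented for illustration.

```python
def raw_diffs(games, won, predicted_rate):
    """Service games actually won minus the number the model expects."""
    return [w - g * r for g, w, r in zip(games, won, predicted_rate)]


def zero_sum_adjust(diffs, games):
    """Spread the group's total diff across players in proportion to their
    service games, so the adjusted values sum to zero (an assumed rule)."""
    per_game_bias = sum(diffs) / sum(games)
    return [d - per_game_bias * g for d, g in zip(diffs, games)]


# Hypothetical three-player group, not taken from the table.
games = [900, 600, 500]              # service games played
won = [800, 490, 400]                # service games won
predicted_rate = [0.88, 0.83, 0.81]  # model's predicted hold rate

diffs = raw_diffs(games, won, predicted_rate)
adjusted = zero_sum_adjust(diffs, games)
for d, a in zip(diffs, adjusted):
    print(f"Diff {d:6.2f}   Diff+ {a:6.2f}")
```

As a rough consistency check on that guess: the table’s Sv Gms column sums to 34,104 and its Diff column to -294, and the rule gives Isner about 11 + 916 × 294 / 34,104 ≈ 19, matching his Diff+. Spot checks of other rows land within about a game, consistent with rounding. This is a guess at the method, not a description of it.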

Here are the numbers:

First         Last             Sv Gms  Pts Won  Gms Won    Pred Gms    Diff  Diff+
John          Isner               916     0.69     0.90        0.89      11     19
Marin         Cilic               769     0.65     0.84        0.83       8     15
Novak         Djokovic            907     0.64     0.82        0.81       7     15
Ivan          Ljubicic            563     0.65     0.84        0.83       6     11
Albert        Montanes            724     0.63     0.80        0.79       4     10
Jurgen        Melzer              914     0.65     0.83        0.83       0      8
Rafael        Nadal              1001     0.70     0.90        0.90      -1      8
Feliciano     Lopez               581     0.66     0.85        0.85       2      8
Robin         Soderling           942     0.67     0.86        0.86      -1      7
Gael          Monfils             793     0.65     0.83        0.83       0      7
Sam           Querrey             810     0.67     0.86        0.86      -1      6
Andrey        Golubev             440     0.62     0.78        0.78       2      6
Mikhail       Youzhny             771     0.64     0.81        0.81      -2      5
Jarkko        Nieminen            669     0.64     0.81        0.81      -2      4
Tomas         Berdych             864     0.68     0.87        0.88      -4      3
Nicolas       Almagro             898     0.66     0.84        0.85      -5      3
Andy          Murray              791     0.66     0.84        0.85      -5      2
Potito        Starace             541     0.63     0.79        0.79      -3      2
Richard       Gasquet             724     0.66     0.84        0.85      -4      2
Stanislas     Wawrinka            641     0.66     0.84        0.85      -4      2
Juan Ignacio  Chela               593     0.60     0.73        0.74      -3      2
Sergiy        Stakhovsky          588     0.62     0.77        0.78      -3      2
Yen Hsun      Lu                  425     0.62     0.77        0.78      -2      1
Janko         Tipsarevic          515     0.65     0.82        0.83      -5     -1
Nikolay       Davydenko           560     0.65     0.82        0.83      -5     -1
Thiemo        De Bakker           620     0.67     0.85        0.86      -7     -1
Florian       Mayer               474     0.64     0.80        0.81      -6     -2
Roger         Federer             980     0.70     0.89        0.90     -11     -2
David         Nalbandian          387     0.63     0.78        0.79      -6     -2
Viktor        Troicki             721     0.64     0.80        0.81      -9     -3
Marcos        Baghdatis           823     0.64     0.80        0.81     -10     -3
Fernando      Verdasco            829     0.64     0.80        0.81     -10     -3
Alexandr      Dolgopolov          561     0.63     0.78        0.79      -8     -3
Andy          Roddick             868     0.72     0.91        0.92     -11     -4
Jeremy        Chardy              630     0.63     0.78        0.79      -9     -4
Jo Wilfried   Tsonga              607     0.68     0.86        0.88      -9     -4
Guillermo     Garcia Lopez        662     0.63     0.78        0.79     -10     -4
Julien        Benneteau           561     0.62     0.76        0.78      -9     -4
Juan          Monaco              584     0.62     0.76        0.78      -9     -4
Thomaz        Bellucci            706     0.63     0.78        0.79     -10     -4
Philipp       Kohlschreiber       723     0.66     0.83        0.85     -11     -5
Juan Carlos   Ferrero             536     0.65     0.81        0.83     -11     -6
Ernests       Gulbis              542     0.67     0.84        0.86     -11     -7
Marcel        Granollers          508     0.61     0.73        0.76     -13     -9
Gilles        Simon               462     0.65     0.80        0.83     -14    -10
David         Ferrer              950     0.65     0.81        0.83     -19    -10
Mardy         Fish                647     0.68     0.85        0.88     -16    -11
Michael       Llodra              551     0.67     0.83        0.86     -17    -12
Tommy         Robredo             512     0.64     0.78        0.81     -17    -12
Denis         Istomin             720     0.65     0.80        0.83     -21    -15

Any list with John Isner at the top or bottom is going to lend itself to some breezy conclusions, but a more thorough look tells us that strong and weak servers are scattered throughout the list.

Let’s return to the issue of consistency, and see how much it might account for these differences.

Impossible consistency

Imagine a hypothetical player who, on average, is a middle-of-the-pack server, winning 65% of his service points.  Whether because of his own inconsistency, or because of the variety in environments and opponents, he wins 60% of his service points half the time, and 70% of his service points the other half.

A perfectly consistent 65-percenter, according to the model, will win 83% of service games. But the half-60/half-70 server will win only about 81.8% of service games. For a player who compiles 800 service games, that’s nine fewer service games won than the algorithm predicts from his aggregate numbers.
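The arithmetic here is easy to reproduce. The Python sketch below uses the standard closed form for the same point-by-point model (win the game before deuce, or reach 40-40 in one of 20 ways and then win from deuce); the helper names are chosen for this sketch. The key step is averaging the hold probability over the two halves of the season rather than plugging in the averaged point rate. In this range the hold curve bends downward, so the swings cost games.

```python
def hold_probability(p: float) -> float:
    """Chance of holding serve if every point is won with probability p."""
    q = 1.0 - p
    # Win 4-0, 4-1, or 4-2 before deuce...
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # ...or reach 3-3 (20 orderings) and then win from deuce.
    via_deuce = 20 * p**3 * q**3 * p**2 / (p**2 + q**2)
    return before_deuce + via_deuce


def mixed_hold(rates, weights):
    """Expected hold rate when the point-win rate varies across matches."""
    return sum(w * hold_probability(p) for p, w in zip(rates, weights))


consistent = hold_probability(0.65)
mixed = mixed_hold([0.60, 0.70], [0.5, 0.5])
service_games = 800

print(f"consistent 65%:      {consistent:.3f}")
print(f"half 60%, half 70%:  {mixed:.3f}")
print(f"shortfall over {service_games} games: {(consistent - mixed) * service_games:.1f}")
```

This prints 0.830, 0.818, and a shortfall of 9.1 games.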

A fair amount of variance could be due solely to differences between opponents.  David Ferrer won 42% of return points this year, while Isner won only 29%.  Factor in higher success rates on hard courts or in warmer weather, and it’s a wonder that so many players come close to winning as many service games as are expected of them!

We may be able to get closer to the bottom of this by looking at single-match results.  Roger Federer probably wins different percentages of service points against Ferrer than he does against Isner, but perhaps when we adjust each match for the opponent’s return skills, we’ll discover that he regularly outperforms the model.
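A rough Python sketch of that single-match comparison, using an invented three-match log, might look like this. It applies the same point-to-game formula to each match’s own service-point rate and sums the expected holds, then applies it once to the pooled rate. In this made-up log the pooled figure comes out higher than the match-by-match one, the same effect as in the 60/70 example. No opponent adjustment is attempted here.

```python
def hold_probability(p: float) -> float:
    """Hold probability under the i.i.d. point model (closed form)."""
    q = 1.0 - p
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    via_deuce = 20 * p**3 * q**3 * p**2 / (p**2 + q**2)
    return before_deuce + via_deuce


# Hypothetical match log for one server; the numbers are invented.
# Each tuple: (service points won, service points played,
#              service games played, service games won)
matches = [
    (52, 80, 12, 10),
    (70, 100, 15, 14),
    (45, 75, 11, 8),
]

# Apply the model to each match's own point rate, then add up.
per_match = sum(g * hold_probability(w / n) for w, n, g, _ in matches)

# Apply the model once to the pooled season rate.
pts_won = sum(m[0] for m in matches)
pts_played = sum(m[1] for m in matches)
games = sum(m[2] for m in matches)
pooled = games * hold_probability(pts_won / pts_played)

actual = sum(m[3] for m in matches)
print(f"actual holds:             {actual}")
print(f"expected, match by match: {per_match:.2f}")
print(f"expected, pooled rate:    {pooled:.2f}")
```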

Strategy and clutch

One factor working in the other direction is strategy. As we saw in the single-game probability table, a reasonably good server at 30-0, 40-0, or 40-15 has little to fear. Nobody is tanking points at those scores, but some players may take the opportunity to try something riskier than usual, perhaps to keep their opponents guessing later in the match.

With sufficient point-by-point data, we could test this hypothesis, but without it, we can only speculate that servers may be less likely to win these specific points when the game is already heavily in their favor.  If this is frequently the case, players should be winning more service games than the model predicts.
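A toy version of this effect can be built into the model itself. In the Python sketch below, the server wins points at rate p except at 30-0, 40-0, and 40-15, where a hypothetical penalty delta (an assumption, not a measured value) lowers his chance. The function tracks expected points won and played alongside the hold probability, so the plain model can then be asked what hold rate it would predict from the resulting season-long point percentage.

```python
SAFE_SCORES = {(2, 0), (3, 0), (3, 1)}  # 30-0, 40-0, 40-15 as (server, returner) points


def hold_probability(p: float) -> float:
    """Hold probability under the plain i.i.d. point model (closed form)."""
    q = 1.0 - p
    return p**4 * (1 + 4 * q + 10 * q**2) + 20 * p**3 * q**3 * p**2 / (p**2 + q**2)


def game_with_letup(p: float, delta: float):
    """Return (hold probability, expected points won, expected points played)
    for a server who wins points with probability p, except p - delta at
    the scores in SAFE_SCORES. delta is a hypothetical risk-taking penalty."""

    def solve(server: int, returner: int):
        if server == 4:
            return 1.0, 0.0, 0.0
        if returner == 4:
            return 0.0, 0.0, 0.0
        if server == 3 and returner == 3:
            # Deuce is not a "safe" score, so points there are won at rate p.
            q = 1.0 - p
            decisive = p * p + q * q  # chance a pair of points ends the game
            return p * p / decisive, 2 * p / decisive, 2 / decisive
        x = p - delta if (server, returner) in SAFE_SCORES else p
        win_a, won_a, played_a = solve(server + 1, returner)
        win_b, won_b, played_b = solve(server, returner + 1)
        return (
            x * win_a + (1 - x) * win_b,
            x * (1 + won_a) + (1 - x) * won_b,
            1 + x * played_a + (1 - x) * played_b,
        )

    return solve(0, 0)


hold, won, played = game_with_letup(0.65, 0.10)
observed_rate = won / played  # long-run share of service points won
print(f"share of service points won: {observed_rate:.3f}")
print(f"actual hold rate:            {hold:.3f}")
print(f"model's prediction:          {hold_probability(observed_rate):.3f}")
```

With p = 0.65 and delta = 0.10, the point percentage drops noticeably while the hold rate barely moves, so the plain model’s prediction from that lower point percentage falls short of the actual hold rate, which is the direction anticipated above.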

Finally, back to clutch.  Because there are so many other factors at play, it would be foolish to point to the rightmost column in the table above and call it anything in particular, let alone “clutch.”  Among other problems, we don’t even know whether outperforming the model is a repeatable skill.  I’d love to show you the tables from previous years, but the necessary data collection is going to take some time.


2 Responses


  1. Alex said, on December 6, 2010 at 12:20 am

    wonderful.

  2. Håkon said, on December 6, 2010 at 10:02 am

    I think most people would agree that any list showing Cilic in second place can’t possibly be ranking clutch. 😉

    Great work though.

