The Summer of Jeff

Calculating the Probability of a Streak Within a Season

Posted in baseball analysis, programming by Jeff on September 18, 2017

The 2017 baseball season has had its share of team streaks. I wrote about the Indians’ record-breaking 22-game winning streak for the Economist, and needed to calculate the odds that a team of Cleveland’s quality would, over the course of a 162-game season, win that many consecutive games.

It turns out that this is a rather difficult problem. Baseball fans and pundits have published a number of probabilities over the course of the streak, but many fall into one of two categories: (a) the odds of winning all 22 games out of a specific set of 22, or (b) the odds of winning 22 out of 22, multiplied by 162/22, or the number of back-to-back 22-game stretches that fit in the baseball regular season. The first solution generates some extremely long odds, since it’s much harder to win 22 specific games than to assemble a streak over a longer time frame. The second solution is a bit better, but still gets it wrong: a streak can begin with any game on the schedule, not just at the start of one of those seven-odd stretches.
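To make the two shortcuts concrete, here is a minimal Python sketch of both. The single-game win probability of 0.6577 is borrowed from the Pythagorean figure quoted later in this post, and the variable names are purely illustrative.

```python
# Two naive estimates for a 22-game winning streak in a 162-game season.
p = 0.6577      # assumed single-game win probability
streak = 22
season = 162

# (a) probability of winning one specific set of 22 games
specific_set = p ** streak

# (b) the same figure scaled by the 162/22 back-to-back stretches in a season
scaled = specific_set * season / streak

print(f"(a) about 1 in {1 / specific_set:,.0f}")
print(f"(b) about 1 in {1 / scaled:,.0f}")
```

Both numbers come out far longer than the estimate developed in the rest of this post, because neither accounts for a streak starting with an arbitrary game.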

My solution isn’t 100% correct (and I’ll discuss its limitations in a bit), but it gets much closer, especially for rare events such as the Indians streak. We must calculate the probability of an exactly 22-game streak, then the probability of an exactly 23-game streak, all the way up to a full-season, 162-game streak. An exactly 22-game streak is really 24 games long. To exclude 23-gamers (or longer), the 22-game streak must be preceded and followed by losses. There are also the edge cases of streaks that begin or end the season–and thus cannot be both preceded and followed by losses–so we must handle those separately.

So, for the probability of an exactly 22-game streak:

  1. find the probability of a 24-game stretch consisting of a loss, 22 wins, and then a loss;
  2. count the number of 24-game stretches in the course of a season (the length of the season minus 24 plus 1, or 139, for the MLB regular season);
  3. multiply (1) by (2)
  4. to handle the edge cases, find the probability of a 23-game stretch starting (or ending) with a loss, followed (or preceded) by 22 wins;
  5. multiply (4) by 2
  6. add (3) and (5)

Then repeat the process for every streak length from 23 up to 161, and then calculate the probability of 162 wins in 162 games.
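The steps above can be sketched in Python roughly as follows. This is an illustrative reconstruction rather than the script itself: the function name and its arguments are made up here, and it assumes the interior stretches are counted as season length minus stretch length plus one.

```python
def approx_streak_probability(streak, season=162, p=0.5):
    """Approximate chance of at least one winning streak of `streak` or more games.

    Sums the chance of an exactly-k streak for every k from `streak` up to
    the full season. Seasons with several qualifying streaks are counted
    more than once, so this is an overestimate.
    """
    q = 1 - p  # single-game loss probability
    total = 0.0
    for k in range(streak, season):
        # Steps 1-3: loss, k wins, loss, placed anywhere inside the season.
        windows = max(season - (k + 2) + 1, 0)
        interior = windows * q * p ** k * q
        # Steps 4-5: a streak that opens or closes the season needs one loss.
        edges = 2 * q * p ** k
        # Step 6: combine the interior and edge cases.
        total += interior + edges
    # Finally, the full-season streak: win every game.
    total += p ** season
    return total


estimate = approx_streak_probability(22, season=162, p=0.6577)
print(f"about 1 in {1 / estimate:,.0f}")
```

With a win probability of 0.6577, the result is a bit under 0.005, in line with the roughly 200-to-1 figure discussed below.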

Clearly you’re not going to do this by hand. To manage it, I wrote up a Python script, which can be customized for three variables: streak length, season length, and the odds that the team will win a single game.

The problem with this method is that it double-counts any seasons with multiple qualifying streaks. The longer or rarer the streak, the less likely this is a problem: just imagine the odds against Cleveland winning 22 in a row twice in the same season, even if it were still possible with fewer than 22 games remaining. But for the sake of completeness, it’s important to realize the answer given by this algorithm is not precise. And if you use it to calculate the probability of, say, a 3-game winning streak at some point in the season, the error is going to be so great as to make the whole exercise worthless.

(For the more math-conversant, here’s a discussion of the problem along with a fully correct answer. It would be possible to expand my Python solution to make it exact, but doing so would increase the complexity quite a bit.)
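For comparison, one standard way to get an exact answer is a small dynamic program that tracks the length of the current winning run game by game. The sketch below is not the method from the linked discussion, nor the script described above; the function names are illustrative, and `expected_streak_count` is a closed-form stand-in for the approximation (assuming stretches are counted as season length minus stretch length plus one).

```python
def exact_streak_probability(streak, season=162, p=0.5):
    """Exact chance of at least one run of `streak` or more wins."""
    q = 1 - p
    # state[j]: probability that no qualifying streak has happened yet
    # and the team has currently won exactly j games in a row.
    state = [0.0] * streak
    state[0] = 1.0
    hit = 0.0  # probability that a qualifying streak has already occurred
    for _ in range(season):
        new = [0.0] * streak
        new[0] = q * sum(state)           # a loss resets the run
        for j in range(streak - 1):
            new[j + 1] = p * state[j]     # a win extends the run
        hit += p * state[streak - 1]      # a win completes the streak
        state = new
    return hit


def expected_streak_count(streak, season=162, p=0.5):
    """Expected number of separate runs of `streak` or more wins."""
    return p ** streak * (1 + (season - streak) * (1 - p))


for length, win_prob in [(22, 0.6577), (3, 0.5)]:
    exact = exact_streak_probability(length, 162, win_prob)
    approx = expected_streak_count(length, 162, win_prob)
    print(f"{length}-game streak: exact {exact:.6f}, approximation {approx:.6f}")
```

For the 22-game case the two figures nearly agree, while for a 3-game streak the approximation returns a value above 1, which is not a probability at all.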

Indians, Dodgers, and long odds

As I noted in the Economist piece, the odds that a team of the Indians’ quality (a Pythagorean winning percentage of 65.77% through the final game of the streak) would win 22 in a row at some point during a season are about 200 to 1. I was surprised it was that likely, but on the other hand, there are very few teams of that quality. Also, if we consider the variation in team quality from game to game due to the differences in starting pitching, the likelihood of such a streak goes down.

Playing around with the algorithm led me to a surprising finding: If we assume the Dodgers are also a very good team, as their record suggests, the likelihood of their own 11-game losing streak was lower than the probability of the Indians reeling off 22 wins in a row. At the same pythag of 65.77%, the odds against an 11-game losing streak are over 1,000 to 1, and even at a modest 60%, the odds against dropping 11 in a row are over 250 to 1.
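A losing streak is the same calculation with the single-game loss probability in place of the win probability. As a rough check on the comparison, here is a short sketch; `streak_estimate` is a hypothetical helper using a closed form that the step-by-step sum collapses to when stretches are counted as season length minus stretch length plus one, so its output may differ slightly from the script’s.

```python
def streak_estimate(p_event, streak, season=162):
    """Approximate chance of `streak` or more consecutive events in a season.

    p_event is the per-game probability of whatever is being streaked:
    the win probability for a winning streak, the loss probability for
    a losing streak.
    """
    return p_event ** streak * (1 + (season - streak) * (1 - p_event))


win_p = 0.6577  # Pythagorean winning percentage quoted above

indians = streak_estimate(win_p, 22)          # 22 straight wins
dodgers = streak_estimate(1 - win_p, 11)      # 11 straight losses, same quality
dodgers_60 = streak_estimate(1 - 0.60, 11)    # 11 straight losses for a .600 team

for label, prob in [("22 wins at .6577", indians),
                    ("11 losses at .6577", dodgers),
                    ("11 losses at .600", dodgers_60)]:
    print(f"{label}: about 1 in {1 / prob:,.0f}")
```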

That doesn’t even consider their almost-adjacent five-game losing streak, which meant that one of the best teams in baseball lost 16 out of 17. The probability of that requires solving a slightly different problem, one that I’ll leave as an exercise for the reader.

