The Summer of Jeff

Python code for WPA stats

Posted in baseball analysis, programming by Jeff on February 7, 2011

A long time ago I put together a Python version of the win expectancy/volatility calculations contained in Studes’s WPA spreadsheet.  Those were the days–if we wanted a post-game WPA graph, we had to do it ourselves :).

I’ve brushed off the cobwebs and published the code.  Click here to see it.

All this does is calculate the win expectancy and volatility (~leverage) in any situation.  It doesn’t calculate WPA on the play.  Of course, if you’re running this on a play-by-play log, it’s trivial to compare the WX of one play and the next.
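As a rough sketch of that comparison, the snippet below takes a list of win expectancies and returns the change from each state to the next.  The function name and the sample numbers are hypothetical, and it assumes every value is from the same team’s point of view; it is not part of the published code.

```python
def wpa_per_play(win_expectancies):
    """Return the change in win expectancy between consecutive game states."""
    return [
        round(after - before, 4)
        for before, after in zip(win_expectancies, win_expectancies[1:])
    ]


# Hypothetical win expectancies (one team's perspective) before each play,
# followed by the state after the last play.
wx_log = [0.50, 0.46, 0.58, 0.55]
print(wpa_per_play(wx_log))
```

Which player gets credit for each swing depends on who was batting and pitching, so a real play-by-play log would need that information alongside the WX values.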

‘Volatility’ is the difference between the win expectancies that would result from a home run and from a strikeout.  To normalize it so that the average volatility is 1.0, I have this code divide the result by 0.133.  Depending on your dataset, that might not be quite right.  There are more sophisticated ways to measure leverage, though this one is adequate for many purposes.
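In code, that normalization might look something like the sketch below.  The function and argument names are made up for illustration, and taking the absolute value (so that the sign convention of the inputs doesn’t matter) is an assumption rather than something taken from the published code.

```python
AVERAGE_SWING = 0.133  # normalizing constant from the post; may not suit every dataset


def volatility(wx_after_home_run, wx_after_strikeout, average_swing=AVERAGE_SWING):
    """Normalized gap between the home-run and strikeout win expectancies."""
    # abs() is an assumption: it makes the result independent of which
    # team's perspective the win expectancies are expressed in.
    return abs(wx_after_home_run - wx_after_strikeout) / average_swing


# Hypothetical situation: a homer would lift WX to .62, a strikeout drops it to .45.
print(round(volatility(0.62, 0.45), 2))
```

A result above 1.0 means the situation swings more than average; to fit a different dataset, pass that dataset’s own average swing as `average_swing`.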

Thank you Studes, Tango, and others for publishing all that you have.  As is so often the case, I’m just the code monkey.

Python Code for Marcel Projections

Posted in baseball analysis, programming by Jeff on January 14, 2011

A while back, I posted retro-Marcel projections for over 100 seasons.  They were generated with some Python code, and now you can play with it.

You’ll also need some Baseball-Databank files.  (Well, you don’t need them, but they will make the process much easier.)

The ‘import’ lines refer to a few utilities that I’ve written.  Those are also available on GitHub.  At some point, I’ll write up a summary of some of my Python utilities.  I’m sure that none of them are original (for instance, turning a 2-d matrix into a .csv, or vice versa), but I use them all the time, and they might come in handy for you, too.

Minor League Splits Databases Available

Posted in baseball analysis by Jeff on January 11, 2011

It’s been a good run, but Minor League Splits will no longer exist in its original form.

I took down the site a few months ago, and have decided to make all of the underlying databases freely available.  This includes full play-by-play of all affiliated minor leagues in the U.S. from 2005 to 2010.

I haven’t decided whether I will update this with 2011 data at the end of next season.  At the very least, I won’t be doing any in-season updates.

At some point in the future, I may open source the code I use to collect and analyze the play-by-play.  That’s a ways off, though.  This was my first major programming project five years ago, so the original code isn’t very good, and as MLBAM has changed things, I’ve added one ugly hack on top of another.  As is, the code is probably not usable for anyone aside from me.

Greinke to the Brewers

Posted in baseball analysis by Jeff on December 20, 2010

The evidence would suggest that Doug Melvin is a very single-minded man.

One year, he was all about relief pitching.  Another, focused on defense.  Yet another, building starting pitching depth.

This year, he has focused on high-quality starting pitching to the exclusion of everything else.  First, he swapped 2008 first round pick Brett Lawrie for Blue Jays starter Shaun Marcum.  Now, he’s bet the farm on Zack Greinke, giving up four youngsters in a deal for the 2009 Cy Young Award winner.

One thing is clear: The Brewers rotation looked dreadful a few weeks ago, and now it looks very solid.  Greinke, Marcum, and Yovani Gallardo comprise a 1-2-3 that any team (except the Phillies) would desire, and adding Randy Wolf to that threesome makes it look even better.

Of course, the wisdom of a deal isn’t just in what you get.  The real question is, did the Brewers give up too much to increase their odds of winning in 2011?

The end of the Prince Fielder era

I’ve long been surprised that no one blew away Melvin with a trade offer for Prince.  For whatever reason, it hasn’t happened, and Fielder is still in the fold … at least for a few more months.  The team will look very different without him, and it’s all but certain that the first baseman will be elsewhere in 2012.

If the Brewers are going to win with Prince, it’s going to have to be in 2011.  And if the Crew is to win in 2012 and beyond without Prince, the team will have to rely less on a deadly 3-4 combination.

At the risk of stating the obvious: The trades make the Brewers better in 2011.  The difference between 400+ innings of Greinke and Marcum and 400+ innings of whoever the hell else Melvin would have dug up will be huge.

Some of the discarded players–Escobar, Cain, and maybe Jeffress–were part of the Crew’s 2011 plans, but none were as crucial as the starting pitching gained in exchange.  Escobar should continue to improve at the major league level, but there’s no guarantee that’s going to happen soon, suggesting that his .614 2010 OPS might be better as somebody else’s problem.

Cain excelled in an extended audition last year, but we may have seen the best he’ll ever produce.  He’s turning 25 in April, making it possible he’s a late bloomer, or more likely, that he’ll have a few league-average seasons before becoming a part-timer in his 30s.  Jeffress has always had huge potential, but can only be viewed as a reliever at this point, and one with serious control issues.

The 2011 Brewers

In other words, there’s little short-term tradeoff.  I’m not convinced that the Greinke-Marcum duo makes the Brewers favorites in the Central next year, but you could certainly make the argument.

The biggest problems with the 2011 squad are the ones that existed before the deals.  A month ago, we wondered whether Escobar and some combination of Cain and Carlos Gomez would provide anything better than replacement-level offense.  Now the focus is on the same positions, but with less hope: both spots are up for grabs this spring, with Yuniesky Betancourt and Gomez presumably in the lead.

I think it’s safe to assume that at least one of those two positions will end up being dreadful.  Maybe both.  I also think the Brewers should be happy with that result.  I’d rather have two studs and two replacement level players than four mediocre to average players.

Frankly, it’s tough to imagine Doug Melvin making any other (realistic) pair of moves to better boost his team’s chances for the 2011 playoffs.  If you don’t like the deal, it can only be because you’re concerned he traded away too much of the future.

2012 and beyond

Did he?  As is always the case in these sorts of deals, it will be at least seven or eight years before we know the whole story.

Let’s take a specific look at 2012.  As mentioned, the Brewers assume they’ll be without Fielder.  They will, however, still have a substantial core–just about everybody else in the 2011 lineup except for Fielder and Rickie Weeks, and the same top four in the rotation.  Slot Mat Gamel into first base, assume you’ve got a bit of money to play with after Prince’s departure and everybody else’s raise, and that still looks like a pretty good team.

To find the possibility of a serious downside, you have to look further into the future.  Sure, it would be nice to have Lawrie playing second base as soon as Weeks departs, but it’s very possible Lawrie will never play a major league inning as a second baseman.

It’s fun to wishcast a 2013 or 2014 Brewers team with Escobar at short, Cain in center, and Lawrie somewhere.  Maybe Jeffress closing and Odorizzi enjoying a successful rookie year in the rotation.  Realistically, though, Escobar and Cain may never be reliably league-average hitters; Lawrie may end up stuck in a corner; Jeffress could just as easily flame out as last a full season as an 8th-inning guy; and Odorizzi hasn’t yet pitched above single-A.

The combined packages for Greinke and Marcum are better than what Milwaukee sent to Cleveland for CC Sabathia, but not hugely so.  Here, we get two starters for two years each, instead of one for half a season.  Just because that deal hasn’t panned out for Cleveland doesn’t mean these won’t for the Jays or Royals.  But the playoff run in 2008 showed us just how unimportant prospects are in September–unless you swap them for someone valuable.

Without these deals, the Brewers wouldn’t have made the playoffs next year.  It’s possible they wouldn’t have broken .500.  Now, they are contenders in 2011, and it will only take one or two solid moves to make them contenders again in 2012.  Literally or figuratively, “the entire farm system” might just be worth it.

If relegation met baseball

Posted in baseball analysis by Jeff on September 7, 2010

One tidbit jumped out at me from the book Soccernomics.  Fans turn out for games that matter. (Big insight, right?)  The category of “important games” of course includes contests such as those that help determine who makes the playoffs.  But in European soccer, fans also turn out for games that help determine which teams are relegated, and which will take their places.

For the uninitiated, here’s a quick background on relegation.  There are four leagues in British soccer.  At the end of each season, the worst performing teams in each league are demoted to the league one level lower, while the best performing teams are promoted to the league one level higher.  It’s kind of like promoting the Triple-A champions to the major leagues, but without the problem of team affiliations.

So, British soccer has stumbled upon what Major League Baseball might view as the holy grail: a way to get fans to come see the worst teams in the league.  Not every MLB team can have Trevor Hoffman chasing his 600th save.

Introducing relegation stateside

How would relegation work if applied to North American baseball?  It would require some massive changes to MLB’s structure, probably radical enough that we can be sure they’ll never happen.  Let’s consider it anyway.

First off, it’s important to throw away the metaphor of “promoting Triple-A teams to the majors.”  Unless we go back to the pre-Branch Rickey days of unaffiliated high-level minor league teams, that just doesn’t work.  Clearly, we can’t have the Scranton/Wilkes-Barre Yankees competing with the New York Yankees.

That means that if we’re going to have leagues of different levels, we have to carve them out of the current 30 squads.  Let’s say we take the 10 worst teams (by won-loss record) at the end of some season, and with them, form the “Challenger League.”  Leave the current American League and National League in place, and give each league a two-division structure.

Each team would play an unbalanced schedule, primarily playing teams within their league, but playing a fair number of interleague games.  We may be bored with Blue Jays-Phillies matchups, but I suspect the relegation aspect would spice things up.  Sure, AL/NL teams would be favored over Challengers, but they wouldn’t win every time, and Challenger League fans (like fans of British soccer) could even gloat over losses, if they were close enough.

A(nother) new playoff structure

At the end of each year, two (or maybe four) teams are promoted from the Challenger League to take the places of the worst-performing teams in the AL and NL.  Thus, not only do the pennant races matter in the Challenger League, but the cellar races matter in the AL and NL.
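For anyone who wants to play with the idea, here is a minimal sketch of that year-end swap.  Everything in it is hypothetical: the function name, the tiny made-up leagues, and the choice to rank teams by winning percentage.  It only picks who moves up and down; deciding which promoted team lands in which league is left open.

```python
def relegate(al, nl, cl, per_league=1):
    """Pick the teams to promote and demote at the end of a season.

    Each league argument maps team name -> winning percentage.
    per_league is how many teams drop from each of the AL and NL (at least 1).
    """
    if per_league < 1:
        raise ValueError("per_league must be at least 1")

    def ranked(league):
        # Best record first.
        return sorted(league, key=league.get, reverse=True)

    demoted = ranked(al)[-per_league:] + ranked(nl)[-per_league:]
    promoted = ranked(cl)[:len(demoted)]
    return promoted, demoted


# Made-up, undersized leagues purely for illustration.
al = {"A1": 0.600, "A2": 0.400}
nl = {"N1": 0.550, "N2": 0.450}
cl = {"C1": 0.580, "C2": 0.560, "C3": 0.420}
print(relegate(al, nl, cl))
```

Setting `per_league=2` gives the four-team version.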

The current playoff structure could be kept almost intact.  Award playoff spots to the winning teams in each division: 2 AL, 2 NL, and 2 CL, then give a wild card to the best remaining team in the AL and NL.  Maybe the CL division winners wouldn’t “deserve” a spot, but what the hell.  Worst case scenario, it’s a “bye” for the top-seeded team in each league, and it emphasizes the temporary nature of relegation.

The toughest aspect of managing relegation from year to year is keeping the leagues balanced and travel schedules under control.  Occasionally, a team would have to switch from the AL to NL, or perhaps one would be demoted from the AL, only to come back in the NL.  The geographical balance of each league would be temporary; perhaps it would be best if each league’s East and West divisions were allowed to vary between 4 and 6 members.

Introducing relegation would take a huge shift in a sport that isn’t very good at accepting huge shifts.  None of the individual steps (except for the promotion/demotion itself) is that huge: We’ve already seen league realignments, division realignments, interleague play, and changes to the playoff system.

But consider the counterpoint.  If MLB could sell the fans on this, 20 or more teams would be in a race for something all year long.  For Yankees fans, it would mean fewer Yankees-Orioles games.  (Until the O’s got good again, anyway.)  For Orioles fans, it would mean a shorter time horizon before making the playoffs through the Challenger League.  Teams like the Brewers and Blue Jays could maintain fan interest with their yearly battle to avoid relegation.

It’s not going to happen.  But it sounds like fun.

Posted in baseball analysis by Jeff on August 2, 2010

I’ve now uploaded year-by-year spreadsheets with Marcel projections for pitchers, dating back to 1906.  Each set of projections relies on the previous three years of data and uses Tango’s algorithm exactly, barring mistakes in my coding or interpretation.

I started with 1906 because the Baseball Databank spreadsheet only has batters faced (BF) back to 1903.  It would be easy enough to estimate BF, but for now, I’ve only projected years where I can use BF for all seasons the projections draw upon.

As with hitters, I’ve made no league-specific adjustments.  ERA is calculated using Marcel-projected IP and ER; I didn’t do any kind of DIPS or BsR adjustments.
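The ERA step is just the standard nine-inning formula applied to the projected totals.  A minimal sketch, with a hypothetical function name and made-up inputs, and with innings expressed as a true decimal (200.333 rather than the box-score style 200.1):

```python
def era(earned_runs, innings_pitched):
    """Earned runs allowed per nine innings."""
    if innings_pitched <= 0:
        return None  # undefined with no innings; this handling is an assumption
    return 9 * earned_runs / innings_pitched


# Hypothetical projection: 80 earned runs in 200 innings.
print(round(era(80, 200), 2))
```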

Posted in baseball analysis by Jeff on July 23, 2010

Following up on my earlier post about historical Marcels, I’ve just posted full spreadsheets with Marcel forecasts for hitters going back to 1901.

For each year, I included a prediction for every non-pitcher who appeared in any of the previous three seasons, or would be a rookie that year.  (For the rookies, of course, the prediction is very close to league average.)  I didn’t distinguish at all between leagues, which I’m sure creates some wonkiness, especially around the years of the Federal League, the years of World War II, and the last few seasons, when the AL/NL difference became stark.

Click here for the directory with single-year spreadsheets available for download.  At some point in the near future I’ll add pitchers, and at some point in the less-near future I’ll make a more user-friendly interface so that you can view the stats directly on the web.

Marcel forecasts 1941

Posted in baseball analysis by Jeff on July 18, 2010

Here’s something I’ve been meaning to do for a long time.

Using 1938-40 stats and the Marcel forecasting algorithm, I generated batter projections for the 1941 season.  You can download the full spreadsheet (CSV) here.

My intention was to follow Tango’s algorithm exactly.  I didn’t do any park adjustments, nor did I consider any league differences.  I included a “reliability” column to indicate how much data each projection was based on.  Those numbers top out at about 0.88 for a player who played full seasons in 1938-40, and are 0 for a rookie with no MLB experience.  I ran projections for every non-pitcher who played in 1938, ’39, ’40, or ’41.
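For a sense of where a reliability figure like that comes from, here is a small sketch.  It assumes the commonly described Marcel settings for hitters (season weights of 5/4/3, most recent first, and 1,200 plate appearances of league-average regression); those numbers are not stated in this post, and the function name is made up.

```python
# Assumptions (not stated in the post): season weights of 5/4/3, most recent
# first, and 1,200 plate appearances of league-average regression.
WEIGHTS = (5, 4, 3)
REGRESSION_PA = 1200


def reliability(pa_by_season):
    """pa_by_season: plate appearances for the last three seasons, newest first."""
    weighted_pa = sum(w * pa for w, pa in zip(WEIGHTS, pa_by_season))
    return weighted_pa / (weighted_pa + REGRESSION_PA)


print(round(reliability((650, 650, 650)), 3))  # three full seasons
print(reliability((0, 0, 0)))                  # rookie with no MLB experience
```

Under those assumptions, three 650-PA seasons come out a little under 0.87, and it takes an unusually heavy workload to push the figure toward 0.88.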

As you might imagine, Ted Williams and Joe DiMaggio come out near the top.  But neither would have been projected to have the best offensive season in 1941; that honor went to Hank Greenberg, who had to leave a 269/410/463 slash line when he was drafted in May.  Those three, along with Johnny Mize and Jimmie Foxx, would have been forecast as the best of the bunch; Marcel gives them all wOBAs between .431 and .441.  No one else is above .407.

Pitchers and other seasons to come soon.