Elite Pitchers and WAR
Many players began this off-season without a contract, and many of them are starting pitchers, including Zack Greinke, Jake Peavy and Anibal Sanchez. The research below suggests that WAR may systematically overvalue elite starting pitchers, something to be aware of during this off-season and those to follow.
When valuing players, there are two primary stats we turn to: Wins Above Replacement (WAR) and Win Probability Added (WPA). WAR completely strips every play of its context, valuing a walk-off grand slam the same as a first-inning solo shot. WPA, on the other hand, attempts to account for each unique situation, and it would value that walk-off slam quite a bit more than the first-inning solo home run. I think it’s fair to say that WAR’s primary advantage over WPA is its ability to filter out a lot of statistical noise and home in on a player’s true talent level. Though WPA may be better at explaining what happened and when it happened, it also gives players a lot of credit for something not believed to be a repeatable skill: clutch performance.
But it’s not my intention to get into an argument about the validity of either stat. I actually want to encourage the use of both in conjunction, especially when valuing elite starting pitchers as a group.
For an individual player, WPA’s inclusion of “clutch value” tends to make it a very fickle statistic. In 2003, Albert Pujols slashed .359/.439/.667 with 43 home runs. He would have won the NL MVP if not for some guy named Barry Bonds. But that season was just the fifth-best of Pujols’ career in terms of WPA, probably because the average leverage of his plate appearances that season was well below average (0.93, where 1.00 is average). Pretty much by dumb luck, Pujols didn’t get as many crucial trips to the plate. This is mostly why WPA is not used as often as WAR: too much is out of the player’s control.
However, while one player’s WPA may not be representative of his skill set, the WPA of an entire group of players can hint at trends to which we should pay attention. I think WAR is the product of a lot of smart thinking, but its greatest strength is also its primary shortcoming. Stripping context out of a value statistic entirely may ignore patterns in how and where that value is distributed throughout the season. For example, an elite pitcher has a lot of control over about 35 games per season, and virtually no control over the other 127. It would seem that good pitchers are more likely to be involved in blowout wins, since they have so much control over the outcome of those games relative to position players. Thus, WAR might tend to overvalue good starting pitchers, crediting them with larger chunks of WAR in situations where the extra value doesn’t help the team. But how to measure it… Oh I know! WPA!
To get started, I took the top 10% of hitters and the top 10% of pitchers by WAR in 2012 (through September 23rd). I then subtracted each player’s WPA from his WAR to get an idea of how far WAR exceeds WPA. While there are nuances to the concepts of “above replacement” (WAR) versus “above average” (WPA), those nuances should wash out later when I compare the two groups. For 2012, the top 10% of pitchers recorded a WAR 2.55 wins greater than WPA on average. The top 10% of hitters recorded a WAR only 2.11 wins greater than WPA. Though the 2.55 and 2.11 figures themselves shouldn’t mean much to us, the 0.44-win difference should. That figure suggests that WAR is overvaluing elite starting pitchers by roughly 0.44 wins on average, relative to the elite hitters. I think this is because a greater share of the WAR an elite starting pitcher accumulates is wasted on blowout wins.
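The comparison above can be sketched in a few lines of Python. This is only an illustration of the steps as described, not the actual code or data behind the numbers: the function names `top_fraction` and `mean_war_minus_wpa` are my own, and the (WAR, WPA) pairs are invented so the arithmetic is easy to follow.

```python
from statistics import mean

def top_fraction(players, fraction=0.10):
    """Return the top `fraction` of (war, wpa) pairs, ranked by WAR."""
    ranked = sorted(players, key=lambda p: p[0], reverse=True)
    keep = max(1, round(len(ranked) * fraction))  # always keep at least one
    return ranked[:keep]

def mean_war_minus_wpa(players):
    """Average of WAR minus WPA across a group of (war, wpa) pairs."""
    return mean(war - wpa for war, wpa in players)

# Invented (WAR, WPA) pairs for illustration only; not real player data.
pitchers = [
    (6.0, 3.0), (5.5, 3.5), (4.0, 2.5), (3.5, 1.0), (3.0, 1.5),
    (2.5, 0.5), (2.0, 1.0), (2.0, 0.0), (1.5, 0.5), (1.5, -0.5),
    (1.0, 0.0), (1.0, -1.0), (0.5, 0.0), (0.5, -0.5), (0.0, -0.5),
    (0.0, -1.0), (-0.5, -1.0), (-0.5, -1.5), (-1.0, -1.5), (-1.0, -2.0),
]
hitters = [
    (7.0, 5.0), (6.0, 4.0), (4.5, 3.0), (4.0, 2.0), (3.5, 2.5),
    (3.0, 1.0), (2.5, 1.5), (2.0, 0.5), (2.0, 1.0), (1.5, 0.0),
    (1.0, 0.5), (1.0, -0.5), (0.5, 0.0), (0.5, -1.0), (0.0, 0.0),
    (0.0, -0.5), (-0.5, -0.5), (-0.5, -1.0), (-1.0, -1.0), (-1.0, -1.5),
]

pitcher_gap = mean_war_minus_wpa(top_fraction(pitchers))
hitter_gap = mean_war_minus_wpa(top_fraction(hitters))

# The quantity of interest is the gap between the two gaps.
print(pitcher_gap, hitter_gap, pitcher_gap - hitter_gap)
```

With real data, each list would hold every qualifying pitcher or hitter for the season, and the last line would correspond to the 2.55, 2.11 and 0.44 figures.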
So while the WPA of one single player is highly variable, looking at WPA trends in 2012 for elite starters suggests that, on average, WAR is overvaluing top pitchers. But why use small sample sizes when you can use big sample sizes?
I performed the same test, combining the 2009 through 2011 seasons. However, to qualify for elite status, a player had to be at or above the 85th percentile for both WAR and WPA over the three seasons. This hopefully reduced any sampling bias from the first method. The hitter group recorded a WAR 5.9 wins greater than WPA, while the pitcher group’s difference climbed to 7.7 wins. Again, we expect WAR to outpace WPA since the baseline for WAR is lower, so these differences in and of themselves are not surprising. But when we compare the differences between the pitcher and hitter groups (the differences in the differences, if you will), the pitchers were “overvalued” by an additional 1.8 wins compared to the hitters (7.7 minus 5.9), or about 0.6 wins per season. In terms of money, that’s worth approximately $2.5M on the free-agent market these days, and not something that should be entirely ignored.
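For anyone who wants to check the arithmetic, here is a small sketch of the “difference in differences” for both samples, using only the figures quoted in this article. The helper name `excess_gap_per_season` is my own, and the dollars-per-win figure is simply what the $2.5M estimate implies when divided by 0.6 wins, not a number taken from a separate source.

```python
def excess_gap_per_season(pitcher_gap, hitter_gap, seasons):
    """Pitchers' extra WAR-minus-WPA gap over hitters, per season."""
    return (pitcher_gap - hitter_gap) / seasons

# Figures quoted in the article.
gap_2012 = excess_gap_per_season(2.55, 2.11, seasons=1)
gap_2009_2011 = excess_gap_per_season(7.7, 5.9, seasons=3)

# Price per win implied by valuing 0.6 wins at about $2.5M.
implied_millions_per_win = 2.5 / gap_2009_2011

print(round(gap_2012, 2))                  # 0.44
print(round(gap_2009_2011, 2))             # 0.6
print(round(implied_millions_per_win, 1))  # 4.2
```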
*It should be noted that the differences in WAR minus WPA between the two groups in the 2009 through 2011 tests were highly significant, according to both a t-test and a Wilcoxon rank-sum test.
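For readers unfamiliar with those two tests, here is a rough standard-library sketch of how each statistic is computed. Several things here are assumptions on my part: the Welch (unequal-variance) form of the t-test, the normal approximation for the rank-sum test with no tie correction, and the per-player WAR-minus-WPA values, which are invented for illustration. A statistics package would normally be used to get exact p-values.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_sum_z(a, b):
    """Wilcoxon rank-sum z-score (normal approximation, average ranks for ties)."""
    pooled = sorted(a + b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1..j
        i = j
    w = sum(ranks[x] for x in a)            # rank sum of the first sample
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2             # expected rank sum under the null
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma

def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Invented three-year WAR-minus-WPA values; not the real samples.
pitcher_gaps = [8.5, 7.9, 7.2, 8.1, 6.8, 7.7]
hitter_gaps = [5.5, 6.1, 6.3, 5.2, 6.6, 5.7]

t_stat = welch_t(pitcher_gaps, hitter_gaps)
z = rank_sum_z(pitcher_gaps, hitter_gaps)
print(t_stat, z, two_sided_p(z))
```

Both statistics ask the same question from different angles: the t-test compares group means, while the rank-sum test only uses the ordering of the values and so is less sensitive to a single outlying player.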