Steroids and Runs: Correlation or Causation?


Mark L. Baer-USA TODAY Sports

It’s not like we as fans are ignorant of the use of performance-enhancing drugs (PEDs). We didn’t necessarily need another scandal to reaffirm our suspicions. But that’s what we got. I’m not here to discuss the morality of PED usage, mostly because morality is subjective and I can’t support any of my feelings with numbers. But I can, somewhat objectively, assess the ponderings of one Carson Cistulli (among others).

In a brief conversation on his podcast (in which Cistulli, remarkably, covered all of baseball), Cistulli asked Dave Cameron, “What do we know about the ways in which PEDs help players?” Cameron replied, “We don’t know,” and alluded to the ongoing controversy in the sabermetric and scientific communities over the answer to that question. He added that there sure are an awful lot of shitty players who get caught taking PEDs. PEDs, Cameron suggested, can’t make a bad player good.

I can’t promise conclusive answers to the question above. If I could, I would be famous. But I can show you some graphs and transcribe a conversation I had with my computer. (Yeah, I’m weird.) Here goes.

Check out the following two graphs, showing first runs per game and then home run rates over the last century:

If we look at too small a window, say 1985 to 1995, we get caught up focusing on the run-scoring spike in the early 90s and ignore the greater upward trend that began in the late 1960s. This is not to say that PEDs had no effect, only that they were part of a long-term pattern and perhaps less influential than we tend to think.

Here we also see a spike in home run rates in the early 90s, but as with runs, the spike is part of a longer-term trend. In fact, there seem to be three major humps in both graphs at around the same time periods.* It’s tempting to go back and find events that corresponded to those spikes (the mound being lowered in 1969, or perhaps the addition of the DH in 1973) and claim causation. That could be right, or we could have simply found a few convenient events to confirm what we already believed.

At this point I felt it necessary to have a conversation with my computer, and here’s how it went down.

Me: Computer, can you possibly find any sudden changes in run-scoring since 1900?

Computer: No hints?

Me: No hints.

Computer: Let me try. [Thinks for 5 milliseconds]. 1939. But nothing else blows my mind.**

Me: How about home runs?

Computer: 1965.

Me: 1965? That’s interesting. What about the late 80s?

Computer: Nothing significant. Home runs continued their upward trend.

Me: What about the early 90s?

Computer: Ah, there’s something here between 1992 and 1993. Home runs increased. Statistical significance and all that jazz.

Computer told me what we already knew, namely that when we look at the year that home runs increased, we see that home runs increased. That sounds like a stupid statement, but here’s my point: when we look for what we think we already know, we find it. It’s called confirmation bias. However, when we let an unbiased algorithm look for major changes, it found something else. That something else was not necessarily contradictory, but the computer didn’t find the early 90s to be significant until I told it to look there.
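For the curious, here is roughly what "no hints" means in practice. This is only a minimal Python sketch of a least-squares search for a single shift in level, not the procedure I actually ran (that used R, per the footnote). The function name `best_single_break`, the `min_segment` parameter, and the series itself are all made up for illustration; the numbers are synthetic and are not real run-scoring data.

```python
def best_single_break(series, min_segment=5):
    """Return (index, cost) for the split that minimises the summed
    squared error around two separate segment means."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_index, best_cost = None, float("inf")
    # Try every allowed split point; no year is favoured in advance.
    for k in range(min_segment, len(series) - min_segment + 1):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index, best_cost


# Synthetic, made-up series (NOT real MLB data): a level near 4.0 for
# 30 "seasons", then near 5.0 for 30 more, with a small fixed wobble.
wobble = [0.1, -0.1, 0.05, -0.05, 0.0]
synthetic = [4.0 + wobble[i % 5] for i in range(30)] + \
            [5.0 + wobble[i % 5] for i in range(30)]

break_index, break_cost = best_single_break(synthetic)
print(break_index)
```

The point of the design is that every split is scored the same way, so the search cannot be steered toward a year we already suspect. A real analysis would also need to ask whether the best split is statistically significant and whether there is more than one break, which this sketch does not attempt.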

If you think about it, we don’t really know when the steroid era began. There is no definitive data set that tells us how many players were using during each season. We have been letting the increased power numbers (and Jose Canseco) dictate when the steroid era began, but then arguing that steroids led to the power. That’s circular! Steroids could very well have caused some of the power spike—I’m not saying they didn’t—but the way we’ve been going about it doesn’t prove anything.

Hypothetically, if we were to know that players started using en masse in 1986 (or 1987 or 1997), then that actually might provide evidence against steroids helping players hit home runs since the biggest jump occurred in 1993. We can’t let the power numbers determine the steroid era, and then turn around and claim the steroid era led to the increased power levels. As of right now, all we know is that there seems to be an approximate correlation between the two.

Unlike adding cork to the center of the ball in 1910, lowering the mound and reducing the size of the strike zone in 1969, or adding the DH in 1973, the steroid era does not have a definitive beginning. Looking at those graphs, there are plenty of sudden changes that don’t fall on or around any known alterations to the game of baseball. Why couldn’t one of those spikes have just happened between 1992 and 1993? How do we know that’s precisely the year more players began using? The answer is that we don’t.

*Basically the 1920s, 50s and 90s. Indeed, there was a statistically significant correlation of 0.29 between the two stats over the last 100 years.

**Using time series functions in R’s base packages as well as “urca” and “strucchange.”
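As a rough sanity check on the 0.29 figure in the first footnote, the standard t-test for a Pearson correlation can be worked through in a few lines of Python. This is a sketch under two assumptions of mine, not part of the original analysis: that there are about 100 observations (one per season), and that seasons can be treated as independent, which is doubtful for a time series.

```python
import math

r = 0.29   # correlation quoted in the footnote
n = 100    # ASSUMPTION: roughly one observation per season over 100 years

# Test statistic for H0: true correlation is zero, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Approximate two-sided 5% critical value for 98 degrees of freedom.
critical_t = 1.98

print(round(t_stat, 2), t_stat > critical_t)
```

Under those assumptions the statistic comes out at roughly 3, comfortably past the cutoff, which is consistent with calling the correlation significant. Because neighboring seasons tend to resemble each other, the effective sample size is probably smaller than 100, so treat this as a ballpark check only.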