EVALUATING HITTERS BY
MONTE CARLO SIMULATION
In chapters 2 and 3 we showed how to use Runs Created and Linear Weights to evaluate a hitter's effectiveness. These metrics were primarily developed to “fit” the relationship between runs scored by a team during a season and team statistics such as walks, singles, doubles, triples, and home runs. We pointed out that for players whose event frequencies differ greatly from typical team frequencies, these metrics might do a poor job of evaluating a hitter's effectiveness.
A simple example will show how Runs Created and Linear Weights can be very inaccurate.1 Consider a player (let's call him Joe Hardy, after the hero of the wonderful movie and play Damn Yankees) who hits a home run in 50% of his plate appearances and makes an out in the other 50%. Since Joe hits as many home runs as he makes outs, you would expect Joe "on average" to alternate HR, OUT, HR, OUT, HR, OUT, scoring three runs before recording the three outs that end an inning, for an average of 3 runs per inning. In the appendix to chapter 6 we will use the principle of conditional expectation to give a mathematical proof of this result.
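The intuition above can be checked with a quick Monte Carlo sketch (my own illustration in Python, not the book's program): simulate many innings in which each plate appearance is a home run with probability 0.5 and an out otherwise, and average the runs scored.

```python
import random

def simulate_inning(rng, p_hr=0.5):
    """Play one inning for a batter who homers with probability p_hr
    and otherwise makes an out; return the runs scored."""
    runs = outs = 0
    while outs < 3:
        if rng.random() < p_hr:
            runs += 1   # solo home run: one run, no runners left on base
        else:
            outs += 1
    return runs

rng = random.Random(42)
n = 100_000
avg = sum(simulate_inning(rng) for _ in range(n)) / n
print(f"average runs per inning: {avg:.2f}")   # close to 3
```

With 100,000 simulated innings the average lands very near 3 runs per inning (27 per game), in line with the conditional-expectation argument promised for chapter 6.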
In 162 nine-inning games Joe Hardy will make, on average, 4,374 outs (162 × 27 = 4,374) and hit 4,374 home runs. As shown in figure 4.1, we find that Runs Created predicts that Joe Hardy would generate 54 runs per game (or 6 per inning), while Linear Weights predicts that he would generate 36.77 runs per game (or 4.08 runs per inning). Both estimates are far from the true value of 27 runs per game.
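The Runs Created figure can be reproduced from the basic formula RC = (H + BB) × TB / (AB + BB) discussed in chapter 2. The short computation below is my own illustration, assuming Joe has no walks or other events, so every hit is a home run and every plate appearance is an at bat:

```python
# Joe Hardy's season totals from the text
hr = 4374
outs = 4374

hits = hr                    # every hit is a home run
total_bases = 4 * hr         # a home run is worth four total bases
at_bats = hr + outs          # no walks, so every plate appearance is an AB
bb = 0

# Basic Runs Created: (H + BB) * TB / (AB + BB)
rc_season = (hits + bb) * total_bases / (at_bats + bb)
rc_per_game = rc_season / 162
print(rc_per_game)           # 54.0
```

The formula credits Joe with 8,748 runs over the season, or 54 per game, exactly the overestimate reported in figure 4.1 and double the true value of 27.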
How can we show that our player generates 3 runs per inning, or 27 runs per game? We can do so by programming the computer to play out many
1 This was described to me by Jeff Sagarin, USA Today sports statistician.