Bridging Different Eras in Sports

This article compares the performances of athletes from different eras in three sports: baseball, hockey, and golf. A goal is to construct a statistical time machine in which we estimate how an athlete from one era would perform in another era. For example, we estimate how many home runs Babe Ruth would hit in modern baseball, how many points Wayne Gretzky would have scored in the tight-checking National Hockey League (NHL) of the 1950s, and how well Ben Hogan would do with the titanium drivers and extra-long golf balls of today's game.

Comparing players from different eras has long been pub fodder. The topic has been debated endlessly, generally to the conclusion that such comparisons are impossible. However, the data available in sports are well suited for such comparisons. In every sport there is a great deal of overlap in players' careers. Although a player who played in the early 1900s never played against contemporary players, that player did play against players who played against players, . . ., who played against contemporary players. This chain of overlapping careers forms a bridge from the early years of sport to the present that allows comparisons across eras.
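
The bridging idea can be illustrated with a toy calculation. The sketch below is not the model used in this article: it assumes a simple additive form, score = player ability + season effect, fits it by ordinary least squares, and deliberately ignores aging, the complication taken up next. The function name `fit_bridge`, the players A, B, and C, and all of the numbers are hypothetical. Player A and player C never share a season, but both overlap with player B, and that chain is enough to place all three on a common scale.

```python
import numpy as np

def fit_bridge(player_ids, season_ids, scores):
    """Least-squares fit of score = ability[player] + effect[season].

    The earliest season's effect is fixed at 0 so the model is identifiable.
    Illustrative assumption only; aging is ignored.
    """
    players = sorted(set(player_ids))
    seasons = sorted(set(season_ids))
    p_index = {p: i for i, p in enumerate(players)}
    s_index = {s: i for i, s in enumerate(seasons)}
    n_p, n_s = len(players), len(seasons)

    # One indicator column per player, one per season except the baseline.
    X = np.zeros((len(scores), n_p + n_s - 1))
    for row, (p, s) in enumerate(zip(player_ids, season_ids)):
        X[row, p_index[p]] = 1.0
        j = s_index[s]
        if j > 0:
            X[row, n_p + j - 1] = 1.0

    coef, *_ = np.linalg.lstsq(X, np.asarray(scores, dtype=float), rcond=None)
    ability = dict(zip(players, (float(c) for c in coef[:n_p])))
    season_effect = {seasons[0]: 0.0}
    season_effect.update(zip(seasons[1:], (float(c) for c in coef[n_p:])))
    return ability, season_effect

# Hypothetical noise-free data: scoring becomes easier in seasons 3 and 4.
# True abilities: A=12, B=11, C=10. True season effects: 0, 0, 3, 4.
player_ids = ["A", "A", "B", "B", "C", "C"]
season_ids = [1, 2, 2, 3, 3, 4]
scores = [12.0, 12.0, 11.0, 14.0, 13.0, 14.0]

ability, season_effect = fit_bridge(player_ids, season_ids, scores)

# Raw career averages mix ability with the difficulty of the seasons played.
raw_mean = {
    p: float(np.mean([y for q, y in zip(player_ids, scores) if q == p]))
    for p in sorted(set(player_ids))
}

# "Time machine": estimated score for player A transported to season 4.
a_in_season_4 = ability["A"] + season_effect[4]

print(raw_mean)
print(ability)
print(season_effect)
print(a_in_season_4)
```

In this toy data the raw averages rank player C above player A, because C played only in the high-scoring seasons. The fitted abilities reverse that ordering once the season effects are separated out. With real data the overlaps are far denser and the observations are noisy, so the estimates would carry uncertainty that this sketch does not show.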

A complication in making this bridge is that the overlapping of players' careers is confounded with the players' aging process; players in all sports tend to improve, peak, and then decline. To bridge the past to the present, the effects of aging on performance must be modeled. We use a nonparametric function to model these effects in each sport. An additional difficulty in modeling the effects of age on performance is that age does not have the same effect on all players. To handle such heterogeneity, we use random effects for each player's aging function, which allows for modeling players who deviate from the "standard" aging pattern. A desirable effect of using random curves is that each player is characterized by a career profile, rather than by a one-number summary. For example, player A may be better than player B when they are both 23 years old, yet worse than player B when they are both 33 years old. Section 3.4 discusses the age effect model.

By modeling the effects of age on the performance of each individual, we can simultaneously model the difficulty of each year and the ability of each player. We use hierarchical models (see Draper et al. 1992) to estimate the innate ability of each player. To capture the changing pool of players in each sport, we use separate distributions for each decade. This allows us to study the changing distribution of players in each sport over time. We also model the effect that year (season) has on player performance. We find, for example, that in the last 40 years, improved equipment and course conditions in golf have decreased scoring by approximately 1 shot per 18 holes. This is above and beyond any improvement in the abilities of the players over time. The estimated innate ability of each player and the evolution of each sport are discussed in Section 7.
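
As a schematic only, the ingredients named above can be written in an additive form. The notation here is an illustrative assumption and is not necessarily the specification developed in the later sections: y_{ij} denotes the performance of player i in the j-th season of his career, \theta_i the player's innate ability, \delta_t the effect of season t, t(i,j) the calendar season in which that performance occurred, f_i the player-specific aging function, a_{ij} the player's age, and d(i) the decade group to which the player is assigned. The normal form of the ability distribution is likewise assumed purely for illustration.

```latex
y_{ij} = \theta_i + \delta_{t(i,j)} + f_i(a_{ij}) + \varepsilon_{ij},
\qquad
\theta_i \sim N\!\left(\mu_{d(i)},\, \sigma^2_{d(i)}\right)
```

Under this reading, comparing \mu_d and \sigma^2_d across decades describes how the pool of players changes, while \delta_t captures how hard or easy a given season was for everyone who played in it.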

Gould (1996) hypothesized that the population of players in sport is continually improving. He claimed that there is a limit to human ability, a "wall" that will never be crossed. There will always be players close to this wall, but as time passes and the population increases, more and more players will be close to it. He argued that there are great players in all eras, but that the average players and those in the lower tail of the distribution are closer to the wall in each successive era. By separating out the innate ability of each player, we study the dynamic nature of the population of players. Section 8 describes our results regarding the population dynamics and discusses Gould's claims.

We have four main goals:

1. To describe the effects of aging on performance in each of the sports, including the degree of heterogeneity among players. The unadjusted performance of players over their careers is confounded with the changing nature of the players and the changing structure of the sports. …