Comment

We appreciate Albright's contribution to the growing literature on deviations from simple Bernoulli models in sports. His analysis of baseball hitting streaks parallels that of Tversky and Gilovich (1989) on basketball shooting streaks, but with two significant improvements. The main improvement is that when testing for streaks, Albright adjusts success probabilities for the varying situational difficulties that occur naturally in all sports (e.g., long-range shots or layups in basketball and varying pitching abilities in baseball). He also has a much larger sample than that available for the basketball analysis.

A large baseball data base has enabled Albright to assemble 501 player-seasons of batting records for his analyses. These data include some possibly important predictors of performance (e.g., the score while batting, the pitcher's ability, and the pitcher's left- or right-handedness), thereby allowing adjustments for situational difficulty. The author provided these data to the discussants, and we base our comments partly on a reanalysis of data on 40 full-time players available for each of 4 seasons.

Our main conclusions are summarized next.

1. The tool for detecting streaks emphasized in Albright's article, logistic regression, has almost no power to detect in a single player-season the kind of streak-hitting behavior that realistically might be expected. We believe that adding or subtracting .050 to a player's probability of success during a streak is nearly the maximum realistic effect size. But the power to detect such an effect is less than 10% (see Sec. 1 and Table 2).

Table 1. Null Distribution of Logistic Regression Z for 560 iid
Bernoulli Trials with p = .280 (10,000 Replications)

                                           Distribution of Z

  Predictor         Mean Z  Std. dev. Z    .025      .50      .975
                                         quantile  quantile  quantile

Y_{i-1}              -.046    .991        -1.934    -.038     1.948
Exp. avg. wt = .80   -.201    .984        -2.080    -.189     1.738
Exp. avg. wt = .95   -.289    .982        -2.197    -.288     1.618
Nominal values           0   1.000        -1.960        0     1.960

Table 2. Power of Logistic Regression Using Exponential
Moving Average With Weight .80 as Predictor

Model for data     Δ     Mean Z  Pr(Z > 1.645)  Pr(Z > 1.960)

Nominal null       0         0      .0500          .0250
Null               0     -.201      .0316          .0134
Markov-1        .025      .283      .0863          .0462
Markov-1        .050      .768      .1943          .1159
Markov-1        .075     1.277      .3580          .2560
Markov-1        .100     1.759      .5437          .4203
Markov-1        .150     2.732      .8592          .7745
Cyclical        .025     -.102      .0415          .0211
Cyclical        .050      .136      .0706          .0398
Cyclical        .075      .565      .1492          .0900
Cyclical        .100     1.065      .2983          .2026
Cyclical        .150     2.449      .7769          .6828
Random          .025     -.107      .0422          .0227
Random          .050      .122      .0685          .0348
Random          .075      .471      .1318          .0748
Random          .100      .951      .2580          .1704
Random          .150     2.173      .6871          .5804

NOTE: Each simulation is based on 10,000 replications of 560
at-bats with p = .280.
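
The kind of simulation behind Tables 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the discussants' code, and its output need not reproduce the tabled values. The excerpt does not define the Markov-1 alternative or the exponential average, so the sketch assumes a hit probability of p + Δ after a hit and p - Δp/(1 - p) after an out (which keeps the long-run average at p), and the recursion avg = w * avg + (1 - w) * Y_{i-1} started at the first outcome. All function names (wald_z, simulate_at_bats, predictor, simulate_z, rejection_rate) are hypothetical.

```python
import math
import random


def wald_z(x, y, max_iter=25, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return b1 / se(b1)."""
    ybar = sum(y) / len(y)
    b0, b1 = math.log(ybar / (1.0 - ybar)), 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Var(b1) is the slope entry of the inverse information matrix.
    return b1 / math.sqrt(h00 / det)


def simulate_at_bats(rng, n, p, delta):
    """Assumed Markov-1 model; delta = 0 gives iid Bernoulli(p) trials."""
    p_after_hit = p + delta
    p_after_out = p - delta * p / (1.0 - p)  # keeps the long-run average at p
    y = [1 if rng.random() < p else 0]
    for _ in range(n - 1):
        prob = p_after_hit if y[-1] == 1 else p_after_out
        y.append(1 if rng.random() < prob else 0)
    return y


def predictor(y, weight=None):
    """Lagged outcome Y_{i-1}, or an exponential average of past outcomes."""
    if weight is None:
        return y[:-1]
    avg, out = float(y[0]), []
    for prev in y[:-1]:
        avg = weight * avg + (1.0 - weight) * prev  # assumed recursion
        out.append(avg)
    return out


def simulate_z(n=560, p=0.280, delta=0.0, weight=None, reps=1000, seed=0):
    """Z statistics from repeated simulated seasons of n at-bats."""
    rng = random.Random(seed)
    zs = []
    for _ in range(reps):
        y = simulate_at_bats(rng, n, p, delta)
        zs.append(wald_z(predictor(y, weight), y[1:]))
    return zs


def rejection_rate(zs, cutoff=1.645):
    """Share of replications with Z above the one-sided cutoff."""
    return sum(z > cutoff for z in zs) / len(zs)


if __name__ == "__main__":
    null_z = simulate_z(weight=0.80, reps=300)
    alt_z = simulate_z(delta=0.050, weight=0.80, reps=300, seed=1)
    print("null mean Z:", sum(null_z) / len(null_z))
    print("null rejection rate:", rejection_rate(null_z))
    print("rejection rate at delta = .050:", rejection_rate(alt_z))
```

The sketch uses only the standard library so that the Z statistic is computed explicitly. Raising reps toward the 10,000 replications used in the tables reduces Monte Carlo noise at the cost of run time.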

2. The finite-sample bias of the logistic autoregression slope estimate is substantial for samples corresponding to 1 year of batting data. The bias is negative (the mean Z under the null model is below 0 for every predictor in Table 1), and because logistic regression is applied separately to the data for each player-season, it operates strongly to mask streaks in Albright's analyses (see Sec. 1 and Table 1).

3. An analysis reported by Albright that aggregates results across players does indicate that a modest streak-hitting effect exists, although this result is not emphasized. …