Mease, D. (2003), "A Penalized Maximum Likelihood Approach for the Ranking of College Football Teams Independent of Victory Margins," The American Statistician, 57, 241-248: Comment by Rothman and Reply
1. As with any Bayesian or crypto-Bayesian approach, one defect is that this kind of methodology produces poor results when applied to a set of teams having vastly different abilities. This phenomenon has been noted for the Colley system and has been named the Mount Union effect, since Mount Union often receives an absurdly high ranking when college football ranking systems are extended to incorporate lower-division schools. Yet even within Division I, the article's rankings of Miami (Ohio) in 1975 (ranked 8th when the polls' ranking averaged 14th and the Foundation for the Analysis of Competitions and Tournaments (FACT) ranked them 25th among all teams, 24th in Division I); Tulane after the 1998 regular season (7th when the polls averaged 10th and FACT only 14th); and Marshall after the 1999 regular season (5th when the polls averaged 11th and FACT had them 12th) show that such a method cannot handle even schools with very weak schedules within Division I. But the worst such case may have been in 1975, where the article does not state which team got the 4th position in the author's ranking. That team was most likely Arkansas State (unranked in both polls, while 13th in FACT's list of all teams, 12th in Division I). We see once again a serious upward bias introduced by a Bayesian procedure. Arkansas State's schedule was very weak, with only three opponents in FACT's top 100 (Louisiana Tech at 49th, Cincinnati at 64th, Memphis State at 67th), so even good margins and an 11-0 record can only help so much (in a non-Bayesian procedure).
2. This article uses the average of the major polls as the gold standard, and that is simply unacceptable. Our gold standard should be the standings in round robins, where a game played at the beginning of the tournament is weighted exactly the same as one at the end. But human voters are prone to weighting recent games more heavily, as we could see after a 1989 bowl loss by previously undefeated Colorado to Notre Dame pushed Colorado below even 10-2 Florida State, a result off by only one ordinal, but absurd when you study the records. Polls also use information from the previous season to shade the rankings, but surely we would not want such influence. The polls grow increasingly unreliable as the ordinal nears 25, so the author made a good choice in using only their top 15. However, the differences should have been normalized by some increasing function of those ordinals, because teams further down the list are likely to be more closely spaced than those at the top. Thus, an ordinal off by only 1 in 15th position is generally far less significant than one in 1st place. I would suggest that the square root of the ordinal be used for normalization, but the gold standard should be better than the polls.
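To make the suggestion concrete, here is a minimal sketch of how a discrepancy between a poll ordinal and a method's ordinal might be normalized by the square root of the ordinal, as proposed above. The function name and the choice of dividing by the square root of the poll ordinal are my own illustration, not a formula from the article.

```python
import math

def normalized_discrepancy(poll_ordinal, method_ordinal):
    """Divide the raw ordinal difference by the square root of the
    poll ordinal, so that a one-place miss at position 15 counts
    less than the same miss at position 1."""
    return abs(poll_ordinal - method_ordinal) / math.sqrt(poll_ordinal)

# A one-ordinal miss at 1st place scores 1.0, while the same miss
# at 15th place scores only 1/sqrt(15), roughly 0.26.
```

Any increasing function of the ordinal would have the same qualitative effect; the square root is simply the one suggested in the text.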
The FACT methodology was described fully in the 2002 Proceedings of the JSM, where I gave a paper detailing my history in sports ranking. The computer code for this methodology (a simple variation of the logistic model, in which grades 0-1 are assigned to margins) has been posted on the Internet for years, reproduced by Dr. Peter Wolfe, and applied to hockey and college basketball.
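Since the code itself is not reproduced here, the following is only a rough sketch of the kind of logistic-model variation described: each victory margin is graded onto the interval 0-1, and team ratings are then fit by maximizing the logistic likelihood with those grades standing in for the usual 0/1 win indicators. The grading scale and the gradient-ascent fitting loop are my assumptions for illustration, not FACT's actual implementation.

```python
import math

def margin_grade(margin, scale=14.0):
    """Grade a victory margin onto (0, 1); the scale of 14 points
    is a hypothetical choice, not FACT's."""
    return 1.0 / (1.0 + math.exp(-margin / scale))

def fit_ratings(games, teams, iters=2000, lr=0.01):
    """Gradient ascent on the logistic likelihood, with each game's
    observed outcome being the grade of its margin rather than 0/1.
    games: list of (team_a, team_b, margin), margin = a's score - b's."""
    r = {t: 0.0 for t in teams}
    for _ in range(iters):
        grad = {t: 0.0 for t in teams}
        for a, b, margin in games:
            g = margin_grade(margin)                      # graded outcome for a
            p = 1.0 / (1.0 + math.exp(-(r[a] - r[b])))    # predicted prob. a wins
            grad[a] += g - p
            grad[b] -= g - p
        for t in teams:
            r[t] += lr * grad[t]
    return r
```

For example, with games [("A", "B", 21), ("B", "C", 7), ("A", "C", 28)], the fitted ratings order the teams A above B above C, as the margins suggest.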
Foundation for the Analysis of Competitions and Tournaments (FACT)
I would like to thank the writer for his interesting comments. …