Distinguishing Judges: An Empirical Ranking of Judicial Quality in the United States Courts of Appeals
Anderson, Robert, IV, Missouri Law Review
How can one evaluate the performance of federal appellate judges? This question implicitly arises every time a federal appellate judge is nominated to the United States Supreme Court. And because the federal appellate bench is the most common source of Supreme Court nominees in recent decades, (1) this question is relevant to most modern Supreme Court nominations. But the question of judicial performance is at least as important outside the context of Supreme Court appointments, as the courts of appeals are the final arbiters of most disputes in the federal courts. Thus, the outcome of virtually every litigated matter in the federal system hinges on the quality of federal appellate decision-making, and therefore the performance of these judges implicates fundamental questions about the rule of law.
However, the importance of evaluating the performance of federal judges has not motivated systematic assessment of individual judges' work product in legal scholarship. Indeed, aside from anecdotal information, little is known about the performance of individual federal appellate judges. Of course, there is no dearth of scholarly critique of the federal courts' products--the opinions in individual cases--but these critiques are not systematically organized into evaluations of the producers of those opinions--the judges themselves. Thus, in spite of the fact that the performance of individual judges has important implications for the functioning of the judicial system and the rule of law, scholars still do not have a good idea of which judges are performing well and which judges are performing poorly. Few academic studies have even attempted to evaluate federal judges using quantitative data, and, when they have, they have generally received harsh criticism from scholarly commentators. (2)
The recent nomination of then-Judge Sonia Sotomayor to the Supreme Court illustrates the de facto alternative to systematic approaches to judicial quality. The Sotomayor nomination, like nominations of federal appellate judges in the past, tended to focus on detailed scrutiny of a small number of high-profile opinions, distracting from the broader, systematic examination of the nominee's body of work as a whole. (3) In the absence of reliable information about judicial performance, center stage in the debate is yielded to anecdotal accounts of anonymous sources, (4) isolated remarks from the judge's public appearances, and short passages in opinions culled from the tens of thousands of pages the nominee has written. Although we have the benefit of a more thorough evaluation from the American Bar Association, its approach has been called biased (5) and may be no more objective than the confirmation hearings. The result is that the evaluation of judicial performance is biased, subjective, and based on a narrow slice of information rather than on the judge's record as a whole.
The frustration with the prevailing approaches to assessing judicial quality, both in the context of Supreme Court appointments and otherwise, has led scholars and legal commentators to develop quantitative techniques to measure judicial performance. (6) The most prominent approaches in recent years use large databases of citations to evaluate the "influence," "prestige," or "quality" of judges. One of the first such papers, by Professors Landes, Lessig, and Solimine (hereinafter "Landes et al."), used citation counts to opinions to measure "judicial influence" in the federal courts of appeals. (7) More recently, Professors Choi and Gulati have expanded on the Landes et al. study, using citation counts to measure "productivity," "quality," and "independence" on the federal courts of appeals. (8) In contrast to the typical evaluation of judges' opinions that legal scholars perform in law reviews, the citation literature abstracts away from the details of the cases to systematically evaluate the whole body of the judges' work product.
The citation studies have revealed information about judicial performance that was not previously well known outside the ranks of experienced appellate advocates and federal judges themselves--if it was known at all. Perhaps for this reason, the studies have attracted considerable attention in legal scholarship, including multiple responses to Choi and Gulati's first judge ranking paper in the Southern California Law Review, (9) a symposium published in the Florida State University Law Review, (10) and conference proceedings published in the Duke Law Journal. (11) As might be expected, however, the responding scholars and judges have not enthusiastically welcomed this quantitative intrusion into the traditional purview of qualitative legal commentary. Although some of the commentators' responses have focused on very fair criticisms of the methodology employed by Choi and Gulati, several scholarly responses seem to have a broader point--one that rejects the very idea of quantitative assessment of judging. (12)
Those who reject the notion that judicial performance is quantifiable will, of course, be disappointed by any variation of the Choi and Gulati techniques. But for those who see quantitative methods as a valuable tool in systematic assessment of the judiciary, there is an opportunity to build on existing citation studies to reveal another perspective on judicial performance. The opportunity arises because of two key limitations that constrain the effectiveness of the existing citation studies. First, the citation studies are "count-based," meaning that the number of citations is the key variable of interest in evaluating the judges. Among other problems, this approach treats negative citations the same as positive citations, even though negative citations might reflect negatively on judicial quality. Second, the citation studies rate judges based only on opinions they have authored, rather than all cases in which they have participated. Opinion authorship, although closely tied to individual judges, raises a host of problems, not the least of which is selection bias in opinion assignment. Constructing a judicial evaluation technique that responds to these two problems offers a clearer, more comprehensive view of judicial performance.
This Article attempts to construct such a technique for ranking federal appellate judges--one that does not have the same drawbacks as the existing citation-count studies. The problem of treating positive and negative citations alike is addressed by using "treatment" data from Shepard's Citations (provided by LexisNexis) to rank judges according to the positive or negative citations of their peers: other federal appellate judges. Using treatment data allows judges whose decisions are cited more positively to receive higher rankings and judges whose decisions are cited more negatively to receive lower rankings. The problem of selection bias is addressed by using panel membership, rather than opinion authorship, as the link between judges and citations. This means that a judge receives credit for a positively cited decision whenever he or she served on the deciding panel, not only when he or she authored the opinion. These two innovations are designed to produce a measure of the quality of the average opinion produced jointly by an appellate panel, rather than the visibility or notoriety of opinions authored individually by a judge, as in the citation-count measures.
Although this project was conceived as a means of building on the existing citation-count studies, the results are so strikingly different from those of the citation-count models that this study is more properly viewed as a break with the existing literature. Indeed, the picture of judicial performance that emerges from this study poses a challenge to widely held conceptions about the identities of the "top" judges in the federal appellate courts. Some of the most prominent judges in the citation-count models and law review literature appear only average when ranked by the mix of positive and negative citations to their opinions. Similarly, some relatively low-profile judges who rarely make the pages of law review scholarship emerge as some of the nation's most highly rated judges in this ranking. The reason is relatively clear: while citation counts tend to reward the most provocative judicial entrepreneurs, this study rewards the careful judicial craftsperson. Thus, the results of this study provide a means of assessing the quality of the typical decision rendered by an appellate judge, rather than the notoriety of his or her high-profile decisions.
This Article outlines a judicial evaluation tool that is as transparent and objective as the method used by Choi and Gulati, but one that more directly measures the characteristics most people--especially litigants in the federal courts--are likely to think of and care about as judicial "quality." The justification for this alternative measure is that the average litigant likely cares much more about the quality of federal appellate judges than about the judges' passing of an ideological litmus test, or the likelihood the litigant's dispute will be immortalized in casebooks and law review articles. Moreover, this approach provides a means of assessing judicial performance for the purpose of judicial administration that is complementary to, rather than duplicative of, the productivity measures already used in judicial assessment, such as caseloads and backlogs. Finally, by using a much larger dataset and more detailed information than the existing studies, the quality measures in this study offer a preliminary but revealing look into the interplay of ideology and precedent in the federal appellate courts.
This Article proceeds as follows: Part II surveys the burgeoning literature evaluating judicial performance and explains the contribution of this Article in extending that literature. It outlines the theory that underlies the performance measurement strategy in this Article and how incorporating positive and negative citations improves that measure. Part III describes the dataset and methods used in this study. Part IV presents the results--a ranking of 383 federal appellate judges based on positive and negative citations to their decisions since 1960. This Part uses the opportunity presented by then-Judge Sotomayor's nomination to the Supreme Court to compare her performance relative to that of the other federal appellate judges considered for the nomination. Part V applies this research to broad normative policy questions, such as the relationship between ideology and precedent and whether the Ninth Circuit should be split. Part VI concludes with the observation that both ideology and judicial quality appear to drive judicial citation patterns.
II. LITERATURE AND THEORY
A. Introduction to Existing Literature
The quantitative literature evaluating judicial performance is still in its infancy. Although evaluations based on survey responses have been around since the Almanac of the Federal Judiciary was first published in 1984, (13) only recently have scholars begun to use large databases to evaluate the performance of judges. (14) The quantitative work on judicial quality can be roughly grouped into two broad categories according to the study's evaluative measure. (15) One approach uses judicial outcome measures to evaluate judges, such as voting patterns and reversal rates on appeal. (16) Another group of approaches uses the number of citations to opinions from other judicial opinions, law review articles, and so forth to evaluate judges. (17) In each case, the goal is to find an objective measurement that captures something important about judicial influence, prestige, or quality, while possibly controlling for one or more variables that would potentially confound the analysis. To illustrate these two approaches, the discussion below contrasts two recent studies: the Cross and Lindquist reversal-rate study and the Choi and Gulati citation analysis study.
The reversal-rate approach is probably the most intuitive way to think about assessing performance for judges who do not sit in a court of last resort. This is true in part because appellate courts themselves invite this interpretation by describing trial courts as having "erred" when they reverse the trial courts. This reversal-rate approach involves comparing the rates at which individual judges' decisions are reversed on appeal (or reversed on certiorari in the case of federal appellate judges). (18) Assuming that there is a "correct" disposition of most cases (an aggressive assumption), or at least that there are "incorrect" ways of resolving some cases (a less aggressive assumption), and that the superior court is "correct" more often than the lower court (certainly debatable, but not implausible), then the rate of reversal approach might capture a measure of judicial quality. (19) The interpretation is that lower rates of reversal, possibly as moderated by control variables, translate into higher judicial quality.
The strengths and limitations of the reversal rate approach are well illustrated by a recent study by Frank Cross and Stefanie Lindquist. (20) Cross and Lindquist examined Supreme Court review of federal appellate judges' decisions from 1989 to 2000, computing reversal rates and comparing rankings based on those rates with the findings from Choi and Gulati's citation analysis study, discussed below. (21) The authors found that highly cited judges tended to fare slightly worse at the Supreme Court level than less frequently cited judges, although the difference was not statistically significant. (22) Digging deeper into the data, however, Cross and Lindquist found that highly cited judges received more affirmances and reversals by the Supreme Court; in short, they tended to be reviewed by the Court more often. (23) Cross and Lindquist were cautious about their results, however, arguing that their approach "may capture only one dimension of judicial quality." (24) The authors therefore augmented their study with a cluster analysis of judicial "types," which placed judges into categories but did not purport to produce an ordinal ranking of the judges on a single scale. (25)
The reversal-rate approach has a certain appeal as a measure of performance, or at least as a measure of lower court fidelity to superior court preferences. But the approach has a number of weaknesses that limit its effectiveness. One defect is that cases granted certiorari are likely a highly biased sample of the work product of judges, both in terms of ideology and quality, (26) and, in any event, even if certiorari is granted in a case, the fact that five justices disagree with a particular decision is not necessarily an indication of the decision's low quality. (27) The problem of ideological bias could be ameliorated somewhat by using only unanimous reversals or summary reversals, but that remedy means reducing the sample size further, leading to the next, and more significant, limitation of the method. The sample size of Supreme Court review is simply too small for meaningful comparisons of individual appellate judges and possibly even too small for meaningful comparisons of whole circuits. Even the most frequently reviewed judges were not reversed or affirmed more than nine times in the Cross and Lindquist study, except for Judge Stephen Reinhardt of the Ninth Circuit, who had fourteen reversals. (28) Thus, the reversal-rate approach may be appropriate for evaluating district court judges, where the problems of bias are reduced and the sample size is considerably greater, but the approach may not work well for evaluating intermediate appellate judges. (29)
The second approach to evaluating judges--the citation analysis approach --uses citations by peers, rather than reversals by superiors, as the relevant data for evaluating judicial quality. (30) The idea of using citations to evaluate judges as historical figures is not new, (31) but systematic quantitative studies of the quality of judges have emerged only in the last decade. (32) The first significant step on this path was in 1998, when Landes et al. published their groundbreaking study of judicial influence. (33) In that study, the authors explored the advantages and disadvantages of citation analysis and ranked federal appellate judges by citation counts. (34) In the years that followed the Landes et al. study, a flurry of follow-up studies appeared, with scholars using citation analysis to study Supreme Court justices, (35) courts of appeals judges, (36) state supreme court judges, (37) and Australian judges. (38)
In this citation-analysis line of research, the most provocative recent work has come in a series of articles by Stephen Choi and Mitu Gulati. (39) Choi and Gulati's approach is similar to that of Landes et al. in that both use citation counts to evaluate judges. However, Choi and Gulati updated the analysis by using more recent data and introducing an innovation to the rankings in the form of an "independence" score. (40) But perhaps the most significant difference between the Landes et al. study and the work by Choi and Gulati is that Choi and Gulati explicitly make the normative argument that the rankings should be used to evaluate judges for promotion to the Supreme Court. (41) In brief, the authors propose that the Supreme Court nomination and confirmation process could be improved by a "tournament" in which federal appellate judges compete according to their quantitative criteria for elevation to the Court. (42) Thus, the authors go beyond the largely descriptive or theoretical work of the Landes et al. study to make aggressive normative arguments about how the rankings should be used.
The Choi and Gulati studies have generated significant response from both scholars and federal judges. Most of the response, however, has been negative, and some of it emphatically so. (43) Commentators have expressed a variety of criticisms, some specific to the Choi and Gulati project or its methodology and some that seem directed toward the very idea of quantitative studies of the judiciary. One line of criticism expresses doubt about the project of empirical assessment of judges, arguing that it is impossible to measure judicial performance with quantitative data. (44) Another line of criticism focuses on Choi and Gulati's normative claims and argues that although judicial performance might be measurable, the proposed "tournament" would create perverse incentives or otherwise would not translate into better Supreme Court justices. (45) Because some of these criticisms raise important issues about quantitative studies of judicial performance, the discussion of these issues is deferred until Part V of this Article.
A third line of criticism relates to methodological details of the techniques used by Choi and Gulati. Some of these arguments repeat well-known criticisms of citation analysis that were extensively catalogued in the original Landes et al. study. (46) Others relate more specifically to Choi and Gulati's methodology. (47) The next section details some of these criticisms and describes how the analysis used in this study responds to the limitations of the Choi and Gulati approach and citation analysis more generally. Indeed, the approach taken in this study was motivated by many of the criticisms of the citation analysis literature. (48)
B. Drawbacks of Existing Literature
1. Positive Versus Negative Citations
The most significant drawback of the existing citation studies as measures of judicial quality is their failure to distinguish among positive, negative, and neutral citations. As explained above, the citation analysis literature evaluates judges based on the number of citations to their opinions--the "citation count"--rather than the nature of the citations--the "citation treatment." (49) The use of citation counts to evaluate judges draws upon the widespread practice of counting citations to evaluate scholarly influence and quality. (50) But there are many reasons that judges and scholars cite one another, and not all of them are indicative of quality of the cited work. (51) This is because although many citations are positive, some are negative, (52) and, at least in the judicial context, most are neutral, incidental, or otherwise not meaningful. (53) Thus, using citation counts, even with control variables, may not accurately measure the quality of the cited work.
Of course, the citation studies recognize this potential objection and some of the studies even acknowledge that taking account of citation treatment would "refine" the analysis. (54) In general, however, the authors of citation-count studies typically argue that it is not necessary to distinguish between positive and negative citations. (55) This perspective is common in citation analysis of scholarly quality, where the argument is that "an article engendering hundreds of critical comments would undoubtedly be an extremely important, albeit controversial, contribution." (56) But this assumption, although appropriate in studies of scholars, is often extended by the citation-count literature to the judicial context, where it may not belong. The argument is that negative citations, like positive citations, also reflect judicial influence, because unpersuasive decisions, at least those outside the circuit, will be simply ignored. (57) As a result, the argument concludes that the mere mention of the cited work is an indication of some degree of influence.
This argument needs to be broken down into two overlapping distinctions, both of which are important for sorting out the relationship of citations to measuring influence and quality. The first distinction is that influence is not necessarily the same as quality, and citation counts are concededly a better measure of influence than of quality, at least in the judicial context. As Landes and Posner acknowledge in another paper:
[A] common criticism of citation analysis when it is used as an evaluative tool is inapplicable, or largely so, when it is used to study influence: that a critical citation should not be weighted as heavily as a favorable one and maybe should not be counted at all or given a negative weight. When speaking of influence rather than of quality, one has no call to denigrate critical citations. Scholars rarely bother to criticize work that they do not think is or is likely to become influential. They ignore it. (58)
Thus, negative citations may actually be positive indications of influence, if the alternative to the negative citation is ignoring the work. When measuring quality, on the other hand, negative citations are just that--negative. This is where a line can be drawn between the Landes et al. study and the Choi and Gulati study. The Landes et al. study and several other citation studies purport to measure the "influence" of federal judges, (59) not the "quality" of their opinions. For this purpose, not distinguishing between positive and negative citations may make sense. The Choi and Gulati approach, on the other hand, purports to use citation counts to measure "opinion quality." (60) For this purpose, treating positive, negative, and neutral citations alike may overlook the most important piece of information: the citing judge's treatment of the cited opinion. The use of citation counts to assess judges, therefore, is much more properly considered a measure of influence, rather than of quality or reputation. (61)
The second distinction is suggested by the first--namely, that negative citations, even if not a measure of the quality of a work, may be a measure of the influence of that work. There is an important assumption underlying this argument, however, which is that the alternative to negative citation is no citation at all. As suggested by the passage above, that may be an appropriate behavioral assumption for citations in scholarship, where unpersuasive work is regularly ignored. But the citation-count studies, with one notable exception, (62) assume that the prevailing behavioral norms in scholarship--ignoring unpersuasive work--translate into judicial opinions. In reality, however, judges often do not simply ignore the arguments of the losing party in their opinions; they engage those arguments, distinguishing or rejecting them if necessary. Thus, the alternative to citation is not to ignore the previous decision, but rather to cite it negatively. As a result, the mere fact that a judge cites a prior decision is not necessarily an indication of influence, and certainly not an indication of quality, but rather an indication of the arguments put forth by the litigants.
2. Inside- Versus Outside-Circuit Citations
The discussion of influence and quality in citation studies leads to the second drawback of those studies--their emphasis on "outside-circuit" citations as measures of influence or quality. As explained above, negative citations may reflect influence when the alternative to a negative citation is no citation at all. But even the citation studies recognize that in judicial decisions, the doctrine of binding precedent might require a citation--even a positive citation--to an unpersuasive or poorly reasoned opinion. (63) In contrast, the assumption is that a decision in another circuit will simply be ignored if it is not well-reasoned and influential. Thus, the authors of the citation studies argue that outside-circuit citations are driven by persuasion and inside-circuit citations are driven by precedent, (64) leading those authors to focus primarily on outside-circuit citations and to ignore inside-circuit citations.
The emphasis on outside-circuit citations, however, rests on problematic assumptions about how judges deal with unpersuasive decisions, both inside and outside their circuits. Although most people agree that judges cite well-reasoned, persuasive opinions positively, what do judges do with unpersuasive opinions? The citation studies assume that judges ignore those decisions if they are outside the circuit, unless they are so influential that they are impossible to ignore. This assumption leads Landes et al. to argue that "[c]ritical citations, in particular to opinions outside the citing circuit, are also a gauge of influence since it is easier to ignore an unimportant decision than to spell out reasons for not following it." (65) Thus, citations from outside the circuit are assumed to be positive because out-of-circuit decisions are merely persuasive authority: on this view, a judge cites such a decision only because it is persuasive.
The citation studies assume that unpersuasive decisions inside the circuit, on the other hand, will be cited and cited positively. The argument is that citations from the same circuit are uninformative about influence or quality because those citations are uniformly positive and compelled by precedent. (66) But this mechanical view of precedent ignores the fact that judges have a great deal of flexibility to avoid poorly reasoned opinions in their own circuits. Judges may well be bound by those decisions as precedent, but judges can often distinguish the binding precedent if they find it unpersuasive. Judges' options are not limited to following, rejecting, or ignoring a decision; judges can, in many cases, distinguish an unpersuasive decision and avoid its effect. Thus, the extent to which precedents are distinguished rather than followed is a negative indication of the persuasiveness of the precedent.
With respect to citations from other circuits, the mere fact of citation is not always an indication of quality, as suggested above, or even influence. This is because when there is no law directly on point--a likely situation in citation studies based on published opinions--the litigants will usually cite authority from other circuits. In such cases, the mere fact of citation to out-of-circuit decisions simply does not communicate anything about those decisions, other than the fact that they favored and were cited by one of the litigants. This is because, "[u]nlike scholars, courts often are not free simply to ignore authority that is, for example, expressly relied upon in a party's brief, but which the court finds unpersuasive. Instead, a court often will cite that authority and in the process criticize or at least distinguish it." (67)
Generally, when a judge cites persuasive authority from another circuit, he or she will need to respond to the decision from the other circuit, which means following, distinguishing, or criticizing the other circuit's decision. In any one of these treatments, the other circuit's decision will be cited.
There are two broad lessons from this discussion. First, inside-circuit citations still convey meaningful information if the study distinguishes between positive citations (e.g., following) and negative citations (e.g., criticizing or distinguishing). Second, outside-circuit citations do not necessarily convey meaningful information about influence or persuasiveness but rather may simply reflect the generality of a decision's reasoning or the capaciousness of its dicta. As a result, at least when measuring quality rather than influence, it makes sense to distinguish between positive and negative citations and to use both inside- and outside-circuit citations.
3. Opinion Authorship Versus Panel Membership
The third problem with citation-count studies is their reliance on opinion authorship rather than panel membership to assess influence or quality. (68) These studies evaluate judges based on citations to the opinions they write rather than to the decisions of the panels on which they sit, because authoring an opinion is more closely tied to the individual judge than is serving on a panel. (69) The advantage of the opinion-authorship strategy is that fewer observations need to be collected, because the random variation of other factors is attenuated when judges are linked to decisions through their opinion authorship. But as long as there is sufficient data and panel membership is determined randomly, linking judges to decisions through panel membership rather than opinion authorship allows measurement of much more than authored opinions alone can capture, as discussed below.
The advantages of using panel membership rather than opinion authorship are that the analysis can (1) mitigate the effects of selection bias in opinion assignment and (2) capture collegial factors that should enter into a measure of good judging. Because opinion authorship is not randomly assigned, the measure of quality may be biased by selection effects. Opinion assignment might affect citation counts, (70) especially when opinion assignment is combined with selective publication of opinions. (71) On the other hand, using panels, rather than authored opinions, eliminates the concern of self-selection, as judges are assigned randomly to panels. (72) Moreover, this approach minimizes concerns that critics of judge-ranking systems have had with gamesmanship of the rankings, (73) because it would be much more difficult to game randomly assigned panels than deliberately assigned opinions on the panels. Although judges might be able to selectively choose opinions to write or manipulate those opinions to maximize citations, it would be very hard to do that as one member of a three-judge panel.
The second advantage to linking performance to panel membership rather than opinion authorship is that panel membership might capture the intangible contributions that high-quality judges make other than writing opinions. Presumably, judges who serve on panels make some positive contribution in the cases in which they participate, even if they do not write the majority opinion. Participation in deliberations, comments on the opinion, discussion of the rationale, and even the actual vote on the disposition could all affect the persuasiveness of the resulting decision. Judge Jay Bybee (writing with Professor Thomas Miles) explains how other members of the panel contribute to the opinion ultimately produced:
A judge may contribute mightily to the quality of an opinion even if she is not its author. A thoughtful judge may ask penetrating questions from the bench that help shape the views of the other members of the panel. In conference discussions or in commenting on a colleague's draft opinion, a judge may influence an opinion's analysis. (74)
Similarly, Judge Harry Edwards of the United States Court of Appeals for the District of Columbia Circuit argues that, "[d]uring the course of judicial deliberations, judges more often than not persuade one another until a consensus is reached." (75) Thus, the ultimate opinion, which represents the "consensus," is the joint product of the entire panel, not merely of the opinion's author. As Choi and Gulati pointed out in their original "tournament" article, there is an important "team" aspect to the appellate panel that should be measured. (76)
Yet these contributions are not measured by techniques that focus exclusively on ranking judges by opinions they have authored. If judges are ranked based on the decisions in which they participate in addition to the opinions they author, the rankings reward rather than discourage fruitful collaboration, as some say Choi and Gulati's tournament does. (77) Indeed, the failure to capture deliberation, collegiality, and consensus-building is one of the prime criticisms of those who criticize empirical analysis of judging generally, not merely judge ranking studies in particular. (78) Linking judges to citations based on panel membership may help to capture some of the intangibles of collegiality that quantitative studies are often criticized for ignoring.
4. Productivity and Quality
The final criticism of the citation-count studies centers on their preference for quantity over quality in opinion writing. The citation-count studies implicitly favor judges who produce more published opinions because more published opinions produce more citations. Choi and Gulati go further, explicitly incorporating "productivity" into their rankings, because they believe productivity is one characteristic of a promising Supreme Court justice. (79) But by rewarding separate opinion writing in both the productivity and independence ranking categories, Choi and Gulati's method may actually measure negative traits of judges. (80) Moreover, although productivity is perhaps an important trait for judges on the courts of appeals, it does not seem particularly relevant for Supreme Court justices. (81) Thus, a better approach would be one that avoids this "volume" measure of performance entirely, which is exactly the approach outlined in the next section.
C. Theory of This Article
The considerations discussed above suggest that the citation-count studies probably reflect a mix of judicial productivity, influence, aggressiveness, and possibly creativity or originality, but not necessarily "quality" in the usual sense. (82) As Cross and Lindquist wrote, "The quality judges on the Choi and Gulati measure appear to be fairly aggressive in their decisionmaking, provoking more frequent Supreme Court review. They are relatively successful in achieving higher numbers of affirmances, but they also suffer more losses than the average judge." (83) Or, as Judge Richard Posner put it, the evaluative criteria in the citation studies "implicitly treat judicial creativity as the only, or at least the most important, attribute of a circuit judge." (84) Although these may be positive traits of appellate judges, (85) they do not directly measure what most people think of as judicial quality, as opposed to scholarly quality or judicial or scholarly influence. This Article aims to develop a measure specifically directed to judicial quality.
The first step in devising a measurement strategy for judicial quality is to articulate a clear theoretical mechanism that links the unobservable features of judicial quality to observable judicial outputs. As described above, many judicial traits, such as productivity, influence, aggressiveness, creativity, originality, and quality contribute to the number of citations to the decisions of a particular judge. (86) To help isolate the effect of quality, this Article focuses not on the number of citations but on the relative proportion of positive to negative citations to a judge's decisions. The theory underlying this measurement strategy is simple: the quality of an opinion's reasoning matters to judges in their citation practices. This theory assumes that judges tend to cite high-quality opinions more positively than low-quality opinions. As a result, in the absence of binding precedent, judges will be more likely to "follow" the reasoning of high-quality opinions and more likely to criticize or distinguish the reasoning of low-quality opinions.
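To make the theory concrete, the following minimal sketch in Python compares two judges by the share of their treated citations that are positive. The judges and counts are invented for illustration; the Article's actual measure is the regression coefficient described in Part III, not a raw proportion.

    # Illustration only: hypothetical judges and citation counts, not data
    # from this study.
    hypothetical_counts = {
        # judge: (positive citations, negative citations)
        "Judge A": (80, 20),
        "Judge B": (40, 60),
    }

    for judge, (pos, neg) in hypothetical_counts.items():
        share_positive = pos / (pos + neg)
        print(f"{judge}: {share_positive:.0%} of treated citations are positive")

On this crude measure, Judge A (80% positive) ranks above Judge B (40% positive) even if Judge B's opinions attract far more total citations, which is the sense in which a treatment-based measure differs from a citation count.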
But what about cases in which judges face binding precedents, such as those within the same circuit? In such cases, judges generally are not free to simply "criticize" or disregard the precedent, as the very idea of binding precedent is that it must be followed or distinguished. (87) But the power to distinguish a precedent leaves judges with considerable freedom to avoid decisions with which they do not agree. Judges may distinguish a prior case because the rule of the precedent does not apply to the facts of the present case, but judges may also distinguish a precedent because it was not well reasoned, while purporting to identify facts that distinguish the prior case. It is always possible to distinguish a precedent (although perhaps at the cost of making the distinguishing case itself less persuasive), so even in the presence of binding authority the proportion of cases "following" versus "distinguishing" a precedent will contain important information about the quality of the precedent.
There is a third reason a judge might seek to distinguish or otherwise avoid a prior decision, even if that decision is well-reasoned or precedential. The judge might simply have a preference for an outcome different from the one the prior decision would dictate. As a result, ideological differences between judges may account for some negative citations, because judges who are ideologically extreme may tend to be cited negatively, just as judges whose opinions are of lower quality will tend to be cited more negatively. Ideological behavior is often perceived as a negative characteristic in a judge, (88) and even if perceived neutrally, the more extreme the judge the less likely others will agree with him or her. Thus, both ideology and quality will probably affect the mix of positive versus negative citations in appellate courts.
The fact that both ideology and quality might affect citations, however, does not mean we cannot disentangle the two effects. Indeed, exploring the implications of ideology in judicial rankings will be the focus of Part V.D.1. Rather than attempting to make assumptions that would disaggregate the two effects, this Article presents multiple perspectives on the data and allows readers to draw their own conclusions.
III. DATA AND METHODS
The data for this study consist of 311,931 citations among published federal appellate court cases between 1960 and 2008. (89) The data include a pool of 120,906 cases that are cited by other cases (the "cited cases") and a pool of 117,280 cases that cite other cases (the "citing cases"). (90) Many of the cited cases are also citing cases, so that the total number of unique cases is 170,786. The data were collected using LexisNexis's Shepard's Citations service in mid-2008, so the data do not include any cited cases or citing cases decided after that time.
The distinctive feature of this dataset is that each of the citations is coded as "positive" or "negative" according to the Shepard's Citations treatment code assigned to the citation. These treatment codes are assigned by staff attorneys who read the cases and are designed to indicate the precedential value of the cited cases. (91) The majority of Shepard's Citations specify no treatment other than "citing," which does not indicate any positive or negative relationship between the cited case and the citing case. But a substantial percentage of citations include a treatment code, such as "following," "distinguishing," "criticizing," "limiting," "overruling," and so forth. These treatment codes reflect the relationship between the cited case and the citing case and provide the key data for this study. Scholars have conducted extensive tests on the treatment codes in Shepard's data and found that the treatment coding is generally quite valid and reliable. (92)
The dependent variable in this analysis is a dichotomously coded outcome indicating the treatment of the citations--i.e., "positive" or "negative." Consistent with other empirical work using Shepard's Citations, "positive" codes include the Shepard's treatment of "following," and negative codes include the Shepard's treatments of "distinguishing," "criticizing," "questioning," "overruling," and "limiting." (93) Although the "distinguishing" code is not obviously negative in the same way as "criticizing," as discussed in Part II.B above, the citing case is negative in the sense that the citing case seeks to avoid applying the reasoning of the cited case. Of course, the "distinguishing" code may genuinely indicate the citing case's fact pattern is not within the holding of the cited case, but those factual distinctions should occur at random and only make the measures noisy, not biased. Table I below presents the distribution of each treatment category in the data.
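As a rough sketch of this coding step, the Shepard's treatment labels can be mapped onto the binary outcome along the following lines. The small DataFrame below is a hypothetical stand-in for the actual Shepard's extract, and the column names are mine; only the treatment labels listed above are treated as informative.

    import pandas as pd

    # Treatment labels coded as positive or negative, per the text above.
    POSITIVE = {"following"}
    NEGATIVE = {"distinguishing", "criticizing", "questioning",
                "overruling", "limiting"}

    # Hypothetical stand-in for the citation-level Shepard's extract.
    citations = pd.DataFrame({
        "cited_case":  ["A", "A", "B", "B"],
        "citing_case": ["C", "D", "E", "F"],
        "treatment":   ["following", "citing", "distinguishing", "criticizing"],
    })

    # Drop citations with no informative treatment (bare "citing" references),
    # then code the dependent variable: 1 if positive, 0 if negative.
    treated = citations[citations["treatment"].isin(POSITIVE | NEGATIVE)].copy()
    treated["positive"] = treated["treatment"].isin(POSITIVE).astype(int)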
The parameters of interest are estimated using a linear probability model (94) with a binary dependent variable regressed on a large number of independent variables. (95) The dependent variable takes the value 1 if the citation is "positive" and 0 if the citation is "negative," as described above. This means that the linear probability model estimates relationships between the independent (explanatory) variables and whether the case is cited positively or negatively.
The independent variables of interest are indicator variables for each of the 466 judges included in the data. (96) The judge indicator variables (one for each judge) take the value of 1 if the judge served on the panel of the cited case and 0 otherwise. (97) If a judge dissented in the cited case, the judge is treated as having been removed from the panel, meaning that the variable takes the value 0. (98) The coefficient on the judge's indicator variable, therefore, may be interpreted as the contribution that judge makes toward the case being positively (rather than negatively) cited. If a judge's presence on the panel of the cited case is associated with the case being cited more positively, the judge will have a positive coefficient. If a judge's presence on the panel of the cited case is associated with the case being cited more negatively, the judge will have a negative coefficient. These coefficients are the main quantities of interest used to estimate judicial quality. Larger positive coefficients may be interpreted as indications of higher quality and smaller (or negative) coefficients may be interpreted as indications of lower quality.
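Written out, the specification just described takes roughly the following form; the notation is mine rather than the Article's, and X_c stands for the control variables discussed in the next paragraphs:

    \mathrm{positive}_{c} = \alpha + \sum_{j=1}^{466} \beta_{j}\,\mathrm{Judge}_{j,c} + \gamma' X_{c} + \varepsilon_{c}

Here positive_c equals 1 if citation c is coded positive and 0 if negative, Judge_{j,c} equals 1 if judge j sat on (and did not dissent from) the panel of the cited case, and each estimated beta_j is the "score" interpreted above as judge j's contribution toward being cited positively rather than negatively.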
The model also includes three principal types of control variables. The first control variables are the volume numbers of the Federal Reporter for the citing case and the cited case, which serve as proxies for a time variable. (99) It is well known that the number of citations to a precedent depreciates over time, (100) and this pattern translates into more negative citations as well--older cases are cited more negatively than recent cases. Thus, to make the measures comparable over time, a control variable for the volume number of the cited case is included. Moreover, older cases also cite other cases more negatively, even holding constant the date of the cited case. (101) This effect is not as strong as the first but also requires a control variable. There is an approximately linear trend in both relationships over time, so variables are included for the volume numbers of both the citing case and the cited case to control for time. (102)
The second set of variables controls for the effects of inside-circuit versus outside-circuit citations. As one might expect, judges cite cases within their own circuits much more frequently and favorably than cases outside their circuits. (103) Whether one attributes this effect to the constraining force of precedent or to the threat of en banc review, the effect is a considerable one, as the analysis in Part IV demonstrates. As a result, without control variables, if some judges tend to be cited more by their own circuit than others, those judges would appear to be of higher quality than the others, when in fact the judges receiving more outside-circuit citations might be higher quality. (104) The citation-count studies dealt with this complication by focusing primarily on outside-circuit citations. But using only outside-circuit citations means leaving out half the data, and it may have other disadvantages as well. (105)
Rather than focusing on outside-circuit citations such as those analyzed in previous studies, the approach in this Article controls for the inside-circuit effect, which also allows estimation of a separate inside-circuit effect for each circuit. The control variables therefore include twelve indicator variables, one for each circuit (the Federal Circuit is omitted because its specialized docket would make comparisons to judges in other circuits unreliable). These variables take the value 1 if the cited case and the citing case are in the same circuit and 0 otherwise. A thirteenth control variable is included for Eleventh Circuit cases that cite Fifth Circuit cases, as some Fifth Circuit opinions have precedential value in the Eleventh Circuit. (106) These variables control for in-circuit citation and also estimate the in-circuit citation effect--reflecting, in part, the constraining force of precedent.
A final set of variables attempts to capture ideological differences between the citing panel and the cited panel, using the political parties of appointing presidents as proxies for ideology. A panel with a majority of Democrat-appointed judges is coded as Democratic, and a panel with a majority of Republican-appointed judges is coded as Republican. (107) Using the political party of the appointing president as a proxy for ideology is as controversial as it is standard. (108) Of course, this variable will not capture all or perhaps even most ideological differences between panels, but the variable does reveal whether ideology, as measured by political party of the appointing president, affects whether the judge cites another case positively or negatively. As discussed below, these variables also help answer the question of the extent to which ideology and precedent affect inside-circuit and outside-circuit citations. (109)
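A minimal sketch of this full specification, under stated assumptions, might look like the following. The data here are synthetic and the column names (judge_smith, same_circuit_9, cited_dem_majority, and so on) are hypothetical; in particular, how the panel-party variables enter the model is my assumption rather than a detail specified in the text.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500  # synthetic citations, for illustration only

    # Synthetic stand-in for the citation-level data; in the study each row is
    # one citation between published federal appellate cases.
    df = pd.DataFrame({
        "positive": rng.integers(0, 2, n),             # 1 = positive citation
        "judge_smith": rng.integers(0, 2, n),          # hypothetical judge dummies
        "judge_jones": rng.integers(0, 2, n),
        "cited_volume": rng.integers(270, 530, n),     # Federal Reporter volumes
        "citing_volume": rng.integers(270, 530, n),    #   as time proxies
        "same_circuit_9": rng.integers(0, 2, n),       # one of the circuit dummies
        "old_fifth_in_eleventh": rng.integers(0, 2, n),
        "cited_dem_majority": rng.integers(0, 2, n),   # panel-party proxies
        "citing_dem_majority": rng.integers(0, 2, n),
    })

    judge_cols = [c for c in df.columns if c.startswith("judge_")]
    control_cols = [c for c in df.columns if c not in judge_cols + ["positive"]]

    X = sm.add_constant(df[judge_cols + control_cols])
    y = df["positive"]

    # Linear probability model: OLS on the binary outcome with
    # heteroskedasticity-robust standard errors; each judge's coefficient is
    # that judge's "score."
    results = sm.OLS(y, X).fit(cov_type="HC1")
    print(results.params[judge_cols])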
IV. RESULTS AND INTERPRETATION
A. A Ranking of Federal Appellate Judges Since 1960
The rankings are presented in Table II for the full dataset (column 4) and five subsets of the dataset (columns 5-9), discussion of which is deferred to Part IV.B below. The judges are ranked in order from the most positively cited judges to the most negatively cited judges according to column 4 (results for the full dataset). Column 1 lists the judges' names, column 2 lists the judges' circuits, and column 3 lists the judges' "scores," which are the coefficients on their indicator variables in the full dataset regression. Larger scores indicate a judge is more positively cited and smaller scores (including negative scores) indicate a judge is more negatively cited. Only judges whose coefficients had standard errors of .024 or less in the full dataset are presented, which eliminates some judges from the ranking if their positions are more uncertain because of a small number of observations. (110) It should be noted that because of the number of parameters estimated, most of the judges' scores could vary considerably. In drawing inferences about individual judges, only those consistently toward the top or the bottom of the rankings can be considered reliable.…
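Under the same assumptions as the regression sketch in Part III, the ranking in Table II could be assembled by applying the standard-error cutoff stated above and sorting by coefficient; with the tiny synthetic sample used in that sketch, few if any judges would clear the cutoff, but the mechanics are the same.

    # Continuing the earlier sketch: build the ranking from the judge scores
    # (coefficients) and their standard errors. The .024 cutoff is the one
    # stated in the text; column names are the hypothetical ones used above.
    judge_scores = pd.DataFrame({
        "score": results.params[judge_cols],
        "std_err": results.bse[judge_cols],
    })

    ranking = (judge_scores[judge_scores["std_err"] <= 0.024]
               .sort_values("score", ascending=False))
    print(ranking)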
Publication information: Robert Anderson IV, Distinguishing Judges: An Empirical Ranking of Judicial Quality in the United States Courts of Appeals, 76 Missouri Law Review 315 (Spring 2011).