Challenges of evaluation
In January 2010, a retrospective evaluation of the United Nations Children's Fund's multi-country Accelerated Child Survival and Development programme was published in the Lancet. (1) The authors found great variation in effectiveness of the programme's 14 interventions and could not account for the causes of these differences. (2) The journal's editors wrote that "evaluation must now become the top priority in global health" and called for a revised approach to evaluating large-scale programmes to account for contextual variation in timing, intensity and effectiveness. (3-6)
Evaluations of large-scale public health programmes should not only assess whether an intervention works, as randomized designs do, but also why and how it works. There are three main reasons for this need.
First, challenges in global health lie not in the identification of efficacious interventions, but rather in their effective scale-up. (7) This requires a nuanced understanding of how implementation varies in different contexts. Context can have greater influence on uptake of an intervention than any pre-specified implementation strategy. (3) Despite widespread understanding of this, existing evaluation techniques for scale-up of interventions do not prioritize an understanding of context. (5,7)
Second, health systems are constantly changing, which may influence the uptake of an intervention. To better and more rapidly inform service delivery, ongoing evaluations of effectiveness are needed to provide implementers with real-time continuous feedback on how changing contexts affect outcomes. (7,8) Summative evaluations that spend years collecting baseline data and report on results years after the conclusion of the intervention are no longer adequate.
Finally, study designs built to evaluate the efficacy of an intervention in a controlled setting are often mistakenly applied to provide definitive rulings on an intervention's effectiveness at a population level. (9,10) These designs, including the randomized controlled trial (RCT), are primarily capable of assessing an intervention in controlled situations that rarely imitate "real life". The findings of these studies are often taken out of their contexts as proof that an intervention will or will not work on a large scale. Instead, RCTs should serve as starting points for more comprehensive evaluations that account for contextual variations and link them to population-level health outcomes. (5,11-18)
The need for new evaluation designs that account for context has long been recognized. (18-20) Yet designs to evaluate effectiveness at scale are poorly defined, usually lack control groups, and are often disregarded as unsatisfactory or inadequate. (4) Recent attempts to roll out interventions across wide and varied populations have revealed two important needs: first, a flexible, contextually sensitive, data-driven approach to implementation and, second, a similarly agile evaluation effort. Numerous authors have proposed novel frameworks and designs to account for context, though few have been tested on a large scale. (22-25) Moreover, these frameworks have tended to focus on theories to guide evaluations rather than concrete tools to assist evaluators in identifying and collecting data related to context. In this paper, we review these proposals, present guiding principles for future evaluations and describe a tool that captures contextual differences between health facilities as well as implementation experiences, and that may be useful when considering how best to scale up an intervention.
Several evaluation designs have been proposed in response to the need to understand context in study settings (Table 1). Some of these designs are based on RCTs with changes to allow for greater flexibility. The adaptive RCT design allows for adjustment of study protocols at pre-determined times during the study as contextual conditions change. …