Academic journal article Bulletin of the World Health Organization

# Use of Time-Series Analysis in Infectious Disease Surveillance

## Article excerpt

### Introduction

Early identification of an outbreak of a reportable disease is the first step toward an effective intervention to contain it. However, outbreaks are often well under way before public health authorities become aware of them. Time-series analysis based on the Box-Jenkins or ARIMA (autoregressive, integrated, moving average) method models reported cases over time. It thereby permits forecasts of the expected numbers of reported cases and provides confidence intervals around these forecasts. Having forecasts at hand to compare with the observed numbers of cases helps in deciding whether an apparent excess represents an outbreak rather than random variation. This article discusses the use by nonspecialists of this method of time-series analysis for infectious disease surveillance, paying particular attention to the circumstances that favour or hinder its usefulness.
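The comparison between forecast and observed counts can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used by SSS1 or by the authors: the forecast and its standard error are taken as given (they would come from an already fitted ARIMA model), the forecast errors are assumed to be normally distributed, and the function names `forecast_interval` and `exceeds_forecast` are hypothetical.

```python
from statistics import NormalDist


def forecast_interval(forecast, std_error, level=0.95):
    """Return (lower, upper) limits of a normal-theory forecast interval."""
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return forecast - z * std_error, forecast + z * std_error


def exceeds_forecast(observed, forecast, std_error, level=0.95):
    """True when the observed count lies above the upper forecast limit."""
    _, upper = forecast_interval(forecast, std_error, level)
    return observed > upper


# Illustrative numbers only: forecast of 40 cases with standard error 6.
print(forecast_interval(40, 6))
print(exceeds_forecast(55, 40, 6), exceeds_forecast(48, 40, 6))
```

Only the upper limit is checked here because surveillance is concerned with an excess of cases; a count above that limit is a signal to investigate, not proof of an outbreak.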

Many textbooks and other sources (e.g. 1 and 2) explain the theory of time-series analysis and several statistical packages are available to carry out the calculations involved, e.g. BMDP, S-PLUS and SYSTAT. The reader interested in carrying out ARIMA modelling needs to have access to such a package. Since the Statistical Surveillance System 1 (SSS1) (3) is the most user-friendly software for ARIMA modelling, we will refer to it repeatedly in this article. It is produced by the Centers for Disease Control and Prevention (CDC) and is available free on the Internet at http://www.cdc.gov/epo/epi/software.htm. All the figures in this article were generated using SSS1.

ARIMA modelling is theoretically sound and practical, and it is not necessary to have a complete understanding of the underlying statistical theory to apply this method successfully (2). However, in order to gain some understanding of the method and be able to apply it prudently, a basic understanding of algebra (square root, reciprocal, logarithm), statistics (mean, moving average, variance, normal distribution, significance, confidence intervals, goodness-of-fit, correlation, and partial correlation) and, less importantly, estimation techniques (least-squares method, iterations, convergence), is needed.

All the examples are drawn from experience gained in Montreal, Canada, whose population is 1.7 million, with about 8000 infectious disease notifications per year.

### Selection of the time series

Time-series analysis requires a series of observations on the same population, usually repeated at equal time intervals. For the forecast intervals to be accurate, the observations should have a normal distribution, with a mean and variance that remain constant over time (a property called "stationarity") (1).
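A crude way to look for a drifting mean or variance is to compare the two halves of the series. The sketch below is an informal diagnostic only, not a formal test of stationarity and not a method described in this article; the function name `halves_summary` and the example counts are hypothetical.

```python
from statistics import mean, variance


def halves_summary(counts):
    """Compare mean and sample variance of the first and second half of a series."""
    if len(counts) < 4:
        raise ValueError("need at least 4 observations (2 per half)")
    mid = len(counts) // 2
    first, second = counts[:mid], counts[mid:]
    return {
        "mean_first": mean(first),
        "mean_second": mean(second),
        "var_first": variance(first),
        "var_second": variance(second),
    }


# Illustrative counts: the level roughly triples in the second half,
# which suggests the mean is not constant over time.
print(halves_summary([10, 12, 11, 13, 30, 32, 31, 33]))
```

A large difference between the halves suggests that the series may need to be shortened or transformed before modelling.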

The series must not be subdivided into intervals so short that the numbers of observations per interval (case reports, in our context) are too small to be normally distributed. The interval concerned can be a day, a week, a 28-day period, a month, etc. (all series in this article consist of 28-day intervals). However, for rare diseases only a few cases may occur per interval, even if the interval is made long, and ARIMA modelling may therefore not be useful for such diseases.
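Building a series of 28-day intervals from individual case reports amounts to counting reports per interval. The following sketch assumes each report carries a single date and that intervals are counted from a chosen start date; the function name `to_28_day_counts` and its parameters are hypothetical.

```python
from datetime import date


def to_28_day_counts(report_dates, start, n_intervals, interval_days=28):
    """Count case reports falling in consecutive fixed-length intervals from `start`."""
    counts = [0] * n_intervals
    for d in report_dates:
        idx = (d - start).days // interval_days
        # Ignore reports dated before `start` or after the last interval.
        if 0 <= idx < n_intervals:
            counts[idx] += 1
    return counts


reports = [date(2020, 1, 1), date(2020, 1, 28), date(2020, 1, 29), date(2019, 12, 31)]
print(to_28_day_counts(reports, start=date(2020, 1, 1), n_intervals=2))
```

Inspecting the resulting counts shows quickly whether the intervals hold enough cases; if most of them contain only a handful, a longer interval or a different method may be needed.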

The longer the series, the better; however, the series should not extend so far into the past as to include periods during which a different case definition was applied or in which any other reporting artifact resulted in a mean number of cases per interval that differs from the mean of recent intervals.

Experience shows that series with a clear periodicity (i.e., the numbers of observations per interval increase and decrease in cycles, generally of 1 year) are more likely to lead to useful forecasts than series without periodicity. Periodicity over 1 year is called seasonality, and for seasonality to become apparent the series should cover at least 2 years.
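One simple indicator of seasonality is the autocorrelation of the series at a lag of one year. With 28-day intervals, a year corresponds to 13 intervals (13 × 28 = 364 days). The sketch below computes the sample autocorrelation directly; it is an illustration rather than the output of any particular package, and the function name `autocorrelation` and the example pattern are hypothetical.

```python
from statistics import mean


def autocorrelation(counts, lag):
    """Sample autocorrelation of `counts` at the given non-negative lag."""
    if not 0 <= lag < len(counts):
        raise ValueError("lag must be between 0 and len(counts) - 1")
    m = mean(counts)
    denom = sum((x - m) ** 2 for x in counts)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((counts[t] - m) * (counts[t + lag] - m)
              for t in range(len(counts) - lag))
    return num / denom


# Illustrative series: the same 13-interval pattern repeated for 3 years.
pattern = [1, 2, 4, 7, 10, 12, 13, 12, 10, 7, 4, 2, 1]
series = pattern * 3
print(autocorrelation(series, 13))  # strongly positive at the yearly lag
print(autocorrelation(series, 6))   # negative at about half a year
```

A clearly positive value at lag 13 together with a negative value around half that lag is the pattern one would expect from a yearly cycle.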

Should an outbreak have occurred during the period covered by the series, one ought to consider excising from the series the intervals containing the excess cases, taking care to remove a whole year (or several years) of intervals in order to retain any seasonality present in the data. …
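Excising whole years can be sketched as follows. The sketch assumes 13 intervals of 28 days per year and treats "years" as consecutive blocks of 13 intervals counted from the start of the series, so that the remaining intervals keep their position within the seasonal cycle. The function name `excise_outbreak_years` and its parameters are hypothetical.

```python
def excise_outbreak_years(counts, outbreak_indices, intervals_per_year=13):
    """Drop every whole-year block that contains at least one outbreak interval."""
    years_to_drop = {i // intervals_per_year for i in outbreak_indices}
    return [c for i, c in enumerate(counts)
            if i // intervals_per_year not in years_to_drop]


# Illustrative 3-year series; the outbreak falls in intervals 15 and 16,
# so the whole second year (intervals 13 to 25) is removed.
series = list(range(39))
print(len(excise_outbreak_years(series, [15, 16])))
```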
