An Examination of Five Statistical Software Packages for Epidemiology
Oster, Robert A., The American Statistician
EPI INFO, Version 6.04b
Available for download from the Internet (site: www.cdc.gov), or for purchase with documentation from USD, Inc., 2075-A West Park Place, Stone Mountain, GA 30087; phone 770-469-4098. Released 1997. The software was developed and is supported by the Centers for Disease Control, and is in the public domain. The World Wide Web page is www.cdc.gov/epo/epi/epiinfo.htm.
EPICURE, Version 2.0
Available from Hirosoft International Corporation, 1463 East Republican, Suite 103, Seattle, WA 98112; phone 206-328-5301. Released 1996. The World Wide Web page is www.hirosoft.com/hirosoft.
EPILOG PLUS, Version 3.07
Available from Epicenter Software, P.O. Box 90073, Pasadena, CA 91109; phone 626-304-9487. Released 1995. The World Wide Web page is icarus2.hsc.usc.edu/epicenter.
STATA, Version 5.0
Available from STATA Corporation, 702 University Drive East, College Station, TX 77840; phone 409-696-4600 or 800-782-8272 (800-STATAPC). Released 1996. The World Wide Web page is www.stata.com.
TRUE EPISTAT, Version 5.3a
Available from Epistat Services, 2011 Cap Rock Circle, Richardson, TX 75080-3417; phone 972-680-1376 or 800-326-1488. Released 1997.
As applications of statistics to various research fields have multiplied, and as personal computers have become more versatile, statistical software has become considerably more specialized and sophisticated. Statistical software packages are currently available for diverse areas such as medicine, business and marketing, engineering, and the social sciences, and contain a wide variety of statistical procedures and graphics capabilities.
In recent years, several statistical software packages have been developed for epidemiology and clinical trials. Most of these packages were DOS-based when they were released, and many of them are now available in a Windows version.
This article reviews five recently released statistical packages for DOS, and compares them with respect to several criteria. Because some of these packages remain DOS-based, and because all of them are still available in a DOS version, only DOS versions of the software are examined. Capabilities of available (and forthcoming) Windows versions of these packages are briefly discussed, and are compared to capabilities of the corresponding DOS versions.
2. IMPORTANT CRITERIA OF EPIDEMIOLOGISTS AND BIOSTATISTICIANS
I believe that several criteria are important to virtually all users of any statistical package. Potential users typically inquire about these criteria before purchasing a statistical package. These criteria include smoothness of the installation, simplicity of the interface, ease of use, completeness and statistical quality of the documentation, ease of data entry, completeness and appearance of statistical graphics, accuracy of statistical computations, and ability to add user-defined commands. Each of these items will be examined and then rated for each of the five statistical packages.
I also believe that several criteria pertaining to statistical procedures are of particular interest to epidemiologists and biostatisticians when analyzing and interpreting data obtained from medical, clinical, and public health studies. These criteria include completeness of descriptive measures, the ability to create and update epidemiological tables and to calculate and standardize rates, analysis of multilevel contingency tables, presence of survey sampling procedures, and capability to perform analysis of variance (ANOVA), analysis of covariance (ANCOVA), least-squares regression, repeated measures analysis, logistic regression, survival analysis, Poisson analysis, receiver operating characteristic (ROC) analysis, meta-analysis, nonparametric tests, sample size estimation, and missing value imputation. Each of these items will also be examined and then rated for each of the five statistical packages.
Statistical procedures not commonly used by epidemiologists and biostatisticians, but used more frequently by industrial statisticians and econometricians, include time-series analysis (modeling and forecasting), quality control charts and plots, and techniques to analyze product and system reliability. These items are not examined in this article.
3. THE FIVE STATISTICAL PACKAGES
3.1 EPI INFO
EPI INFO is a general-purpose software package containing modules for word processing, database management, and public health statistics, and is intended primarily for applications to epidemiology and public health. The word processing portions of EPI INFO, including programs that can create questionnaires (including actual survey forms) for use in public health investigations, are not examined in this review.
A Windows version of EPI INFO, named EPI INFO 2000, is scheduled for release during the next few months. Some statistical features not present in EPI INFO will most likely be implemented in EPI INFO 2000.
3.2 EPICURE
EPICURE is a statistical software package containing five programs that are used for risk modeling and for creating multi-way tables. These programs perform the analysis of binary data, including unconditional logistic regression; the analysis of grouped survival data, including Poisson regression; the analysis of ungrouped survival data or case-cohort data, including Cox proportional hazards models; the analysis of matched case-control data, including conditional logistic regression; and the creation of multi-way, multivariate epidemiologic tables.
A Windows version of EPICURE is currently in development. Some statistical and database management features not present in the DOS version will most likely be implemented in the Windows version.
3.3 EPILOG PLUS
EPILOG PLUS is a statistical software package designed for epidemiological and clinical trials applications. This package includes programs for relative risk estimation, logistic regression, loglinear models, Poisson regression, survival analysis, Cox regression, exact estimation and testing for contingency tables, exact multivariate logistic regression, repeated measures random effects models, repeated measures ordinal logistic regression, meta-analysis, creating epidemiologic tables, and displaying maps of selected regions or countries.
A Windows version of EPILOG PLUS is currently in development. Some statistical and database management features not present in the DOS version will most likely be implemented in the Windows version.
3.4 STATA
STATA is a comprehensive statistical and graphical software package. Although STATA was not designed exclusively for the analysis of epidemiological studies and clinical trials, it contains a strong biostatistical component. Included in STATA's statistical procedures are one-, two-, and multi-sample parametric and nonparametric tests, multi-way ANOVA and ANCOVA, simple and multiple regression and correlation analysis, contingency-table analysis, Poisson regression, logistic regression, nonlinear regression, survival analysis, and sample size estimation and power determination. Graphical capabilities of STATA include numerous charts, graphs, and plots for quantitative and qualitative data.
STATA is now primarily a Windows product. Some statistical and database management features not present in the DOS version are implemented in the Windows version.
3.5 TRUE EPISTAT
TRUE EPISTAT is a comprehensive statistical and graphical software package designed for epidemiological and clinical trials applications. Included in TRUE EPISTAT's statistical procedures are one-, two-, and multi-sample parametric and nonparametric tests, multi-way ANOVA and ANCOVA, repeated measures analysis, simple and multiple regression and correlation analysis, contingency-table analysis, logistic regression, survival analysis, Bayesian analysis, meta-analysis, ROC analysis, and sample size estimation and power determination. Graphical capabilities of TRUE EPISTAT include numerous charts, graphs, and plots for quantitative and qualitative data.
There is no Windows version of TRUE EPISTAT, and none is currently in development.
4. INSTALLATION AND HARDWARE REQUIREMENTS
No problems were encountered during the installation of these statistical packages. Each one needed only a few minutes to install, and required only a small number of user responses. Both EPILOG PLUS and STATA are copy-protected.
The following amounts of hard drive space are needed for successful installation (M represents megabytes): EPI INFO requires 8.6 M; EPICURE requires 7.0 M; EPILOG PLUS requires 4.3 M; STATA requires 3.5 M; and TRUE EPISTAT requires 4.3 M.
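For readers who want to compare these footprints at a glance, the following minimal Python sketch tabulates the five figures quoted above, orders the packages from smallest to largest, and adds up the space needed to install all five. Only the numbers come from this review; the variable and function names are illustrative assumptions.

```python
# Hard drive space needed for installation, in megabytes (M),
# copied from the figures quoted in the paragraph above.
disk_space_mb = {
    "EPI INFO": 8.6,
    "EPICURE": 7.0,
    "EPILOG PLUS": 4.3,
    "STATA": 3.5,
    "TRUE EPISTAT": 4.3,
}


def ranked_by_footprint(sizes):
    """Return (package, size) pairs sorted from smallest to largest.

    Ties are broken alphabetically by package name.
    """
    return sorted(sizes.items(), key=lambda item: (item[1], item[0]))


# Round to one decimal place to hide floating-point noise in the sum.
total_mb = round(sum(disk_space_mb.values()), 1)

for name, size in ranked_by_footprint(disk_space_mb):
    print(f"{name:<13}{size:>5.1f} M")
print(f"{'All five':<13}{total_mb:>5.1f} M")
```

Installing all five packages side by side therefore takes a little under 28 M, with STATA the smallest and EPI INFO the largest.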
Each of the five reviewed software packages should run successfully on any IBM-compatible 386, 486, or Pentium computer. A Pentium computer is recommended for optimal performance. A 386 computer is not recommended due to slow processor speeds. A math co-processor and extended memory are highly recommended. Regarding printer support, an HP Laserjet-, Epson-, or IBM Proprinter-compatible printer is highly recommended.
EPI INFO has a Windows-like interface. The ways of maneuvering within EPI INFO include using a mouse with pull-down menus, using function keys for pre-defined commands, and typing commands from the keyboard (for statistical analysis and graphics). Pop-up menus are available to assist in data entry and analysis. EPI INFO 2000, which will have a Windows interface, promises to be even more user-friendly.
EPICURE, EPILOG PLUS, and STATA (DOS version) are completely command-driven, and do not have menus or windows. However, each of these packages has an interactive mode, in which commands are typed in individually and immediately executed, and a batch mode, in which a file of previously typed-in commands is submitted to the program and then executed. One nice feature of EPICURE is that commands issued in the interactive mode appear in a color different from that of the output, making it easier to distinguish commands from results and messages. The Windows version of STATA is considerably more user-friendly than the corresponding DOS version. The Windows version contains a command window, where commands are typed or read in; a results window, where results from statistical analysis are displayed; a review window, where the commands are displayed; and a variables window, where all current variables are described.
The forthcoming version of EPILOG PLUS will have a Windows interface, and will therefore be more user-friendly than the current DOS version. I examined an alpha version of it and found it to be a great improvement over the DOS version in terms of the interface and ease of use. The forthcoming Windows version of EPICURE will certainly be more user-friendly than the current DOS version.
TRUE EPISTAT has a Windows-like interface containing a series of cascading pull-down menus. No typing of commands is necessary to use the statistical and graphical procedures. In addition, a batch file can be created to run a series of macros (sessions involving fewer than 2048 keystrokes) without any user intervention.
Because EPI INFO and TRUE EPISTAT have Windows-like interfaces with menus, most people will find these two packages easier to learn and use than the DOS versions of the other three packages. Therefore, I rate both EPI INFO and TRUE EPISTAT high in the "ease of use" category, and I rate EPICURE, EPILOG PLUS, and STATA (DOS version) as average. The Windows version of STATA rates higher in this category than does the corresponding DOS version.
6.1 EPI INFO
The documentation for EPI INFO consists solely of a User's Guide. This guide is easy to read and easy to understand. Examples given in each chapter illustrate how to enter or manage data, or how to perform the specified analysis. It would be helpful if more such examples were given. Interpretations are provided for results obtained from statistical …
Publication information: Article title: An Examination of Five Statistical Software Packages for Epidemiology. Contributors: Oster, Robert A. - Author. Journal title: The American Statistician. Volume: 52. Issue: 3. Publication date: August 1998. Page number: 267+. © 1999 American Statistical Association. COPYRIGHT 1998 Gale Group.