Computer-Aided Teaching of Probabilistic Modeling for Biological Phenomena

In this paper we describe the preliminary version of a software system for the computer simulation of prototypical stochastic models for a variety of biological systems. Our primary motivation for developing this software is to provide a flexible teaching tool to be employed in probabilistic modeling courses geared towards undergraduate students in the biological and environmental sciences.

Proper accounting of uncertainty plays a major role in many problems of risk assessment and policy analysis [see, for example, Morgan and Henrion (1990)]. Most colleges and universities require introductory courses in statistics and data analysis as part of their general educational curricula. However, for a large majority of students in science, engineering, and public policy, knowledge of key concepts in applied statistics and probability modeling and understanding of their applications to substantive disciplines must go beyond the basics that are usually taught.

Appropriate mathematical modeling (both deterministic and stochastic) of the key features of the driving mechanisms of many natural, scientific, and engineering phenomena requires a thorough understanding of matrix algebra, ordinary and partial differential equations, probability theory, and statistical analysis. Often, the lack of a deep grasp of these concepts prevents students from understanding the modeling techniques and the consequences of uncertainty and variability. As educators in the statistical sciences, we should nevertheless strive to convey the fundamental ideas of probabilistic modeling effectively to science and engineering majors, while minimizing the need for the underlying mathematical analyses.

In particular, we should stress the importance of the modeling assumptions and foster an understanding of the practical consequences that they bear. In addition, students ought to be able to explore and visualize the effects of small changes in the parameters of the model. With respect to the latter point, a thorough development of simulation methodology and the availability of fast, inexpensive, and user-friendly computing and visualization tools would allow one to treat mathematical modeling as an experimental science, in addition to its being a mathematical science [see, for example, Spain (1982)]. Simulation, dynamic visualization, and data summarization tools can be effectively used to teach students how to build interesting deterministic and stochastic models of scientific phenomena and engineering processes, how to gain a better understanding of the underlying mechanisms, and how to find meaningful explanations of the observable behaviors.

Deterministic modeling of dynamic systems in many areas of application can be accomplished with STELLA II, a flexible software package that lets the user construct a diagram of the process from a simple set of graphical icons. The defining set of linear differential equations is then generated automatically. Sensitivity to the initial conditions and/or the values of the system parameters can be assessed with multirun analyses (Hannon and Matthias 1994). Stochastic behavior can be introduced by allowing some or all of these parameters to be random, but direct modeling of the probabilistic evolution of the process (in terms of the conditional distributions of its future states given its history) is not easy to implement.
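The distinction between a multirun sensitivity analysis and a model with random parameters can be illustrated with a minimal Python sketch. It is not STELLA II output, nor the authors' software. The linear growth equation dN/dt = (birth - death) N, the fixed-step Euler scheme, the normal distribution for the birth rate, and all function names and default values are assumptions chosen purely for illustration.

```python
import random


def euler_path(birth, death, n0, dt=0.1, steps=100):
    """Integrate dN/dt = (birth - death) * N with fixed-step Euler updates."""
    n = float(n0)
    path = [n]
    for _ in range(steps):
        n += dt * (birth - death) * n
        path.append(n)
    return path


def multirun(birth_rates, death=0.3, n0=10):
    """Multirun analysis: one deterministic run per candidate birth rate."""
    return {b: euler_path(b, death, n0) for b in birth_rates}


def random_parameter_runs(n_runs, mean_birth=0.3, sd_birth=0.05,
                          death=0.3, n0=10, seed=0):
    """Draw the birth rate at random once per run, then run deterministically.

    The normal distribution (truncated at zero) is an illustrative assumption.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        birth = max(0.0, rng.gauss(mean_birth, sd_birth))
        runs.append(euler_path(birth, death, n0))
    return runs


if __name__ == "__main__":
    for b, path in multirun([0.2, 0.3, 0.4]).items():
        print(f"birth rate {b}: final population {path[-1]:.2f}")
    for path in random_parameter_runs(3):
        print(f"random-parameter run: final population {path[-1]:.2f}")
```

In both functions each trajectory is fully determined once its parameters are fixed. Randomness enters only through the parameter draw, not through the evolution of the process itself, which is the limitation noted above.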

Stochastic modeling of dynamic systems in the engineering disciplines can be achieved, to a limited extent, with special-purpose simulation languages (e.g., SIMAN, SLAM, GPSS/H, SIMSCRIPT, WITNESS, SIMFACTORY) that have animation capabilities. However, the time and effort students need to become familiar with these programming environments tend to outweigh the educational benefits. From a technical point of view, the animation capabilities of these simulation languages are most useful for displaying a single sample path (realization over time) of a single process; they are weak at displaying multiple sample paths of multiple processes simultaneously. …
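To make the notion of multiple sample paths concrete, the following Python sketch simulates several independent realizations of a simple linear birth-death process. It does not reproduce the software described in this paper or any of the simulation languages named above. The choice of process, the event-by-event simulation with exponential waiting times, and all names and parameter values are assumptions made for illustration.

```python
import random


def birth_death_path(birth, death, n0, t_max, rng):
    """Simulate one sample path of a linear birth-death process.

    In state n, the waiting time to the next event is exponential with
    rate (birth + death) * n; the event is a birth with probability
    birth / (birth + death), otherwise a death. Returns (time, state) pairs.
    """
    t, n = 0.0, n0
    path = [(t, n)]
    while n > 0:  # state 0 is absorbing (extinction)
        t += rng.expovariate((birth + death) * n)
        if t > t_max:
            break
        n += 1 if rng.random() < birth / (birth + death) else -1
        path.append((t, n))
    return path


def simulate_paths(n_paths, birth=0.5, death=0.4, n0=10, t_max=5.0, seed=1):
    """Generate several independent sample paths from one seeded generator."""
    rng = random.Random(seed)
    return [birth_death_path(birth, death, n0, t_max, rng)
            for _ in range(n_paths)]


if __name__ == "__main__":
    for i, path in enumerate(simulate_paths(5)):
        last_time, last_state = path[-1]
        print(f"path {i}: {len(path) - 1} events, "
              f"state {last_state} at time {last_time:.2f}")
```

Each returned list is one realization over time. Plotting all of them on common axes would give the kind of simultaneous display of multiple sample paths discussed above.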