Statistical Modeling of Waterborne Pathogen Concentrations

EPA Grant Number: R827952
Title: Statistical Modeling of Waterborne Pathogen Concentrations
Investigators: Stedinger, Jery
Current Investigators: Stedinger, Jery; Ruppert, David
Institution: Cornell University
EPA Project Officer: Fields, Nigel
Project Period: January 21, 2000 through January 20, 2003
Project Amount: $305,493
RFA: Environmental Statistics (1999)
Research Category: Environmental Statistics, Health, Ecosystems


This research focuses on development of appropriate statistical methods to describe environmental distributions of microorganisms, such as Giardia lamblia and Cryptosporidium parvum protozoa, to support health risk analyses. The research addresses model formulation, parameter estimation and the precision of estimated pathogen concentrations. Innovative statistical methods are needed to address such problems because:

  • the assay of such pathogens results in counts, not a continuous response.
  • data at a limited number of sites must be "interpolated" to sites with limited sample data.
  • reported counts are subject to randomly varying recovery rates.
  • actual pathogen concentrations vary due to many environmental factors, some random.
  • sample volumes may vary, and concentrations, recovery rates, and sample volumes may be correlated.
  • risk assessments require integrating information from many different data sources, which is best done within a Bayesian framework.


Negative binomial GLMs (generalized linear models) and GLMMs (generalized linear mixed models) will be used to model overdispersed and correlated count data. In the past, inappropriate models have been applied to pathogen counts: for example, models that assume continuously distributed responses instead of counts, or that assume Poisson-distributed (rather than overdispersed) counts. Such models can give misleading descriptions of the distribution of pathogen concentrations and of the uncertainty in those descriptions, leading to erroneous health risk assessments. Assays of Giardia and Cryptosporidium protozoa, as well as of other microorganisms, often result in zero counts. In example Cryptosporidium data, 60% of the observations were zero counts. A zero count is not a "censored datum" below a detection limit, as erroneously assumed by the models for continuously distributed responses that were used in the past. Rather, zero counts are part of the sampling variation of count data and will be modeled as such by GLMs or GLMMs to provide accurate estimates of concentration distributions.
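The point about zero counts can be illustrated with a small calculation. The sketch below is not the project's model; it simply compares the probability of a zero count under a Poisson distribution and under a negative binomial distribution with the same mean. The mean count of 1.0 and the dispersion parameter of 0.3 are hypothetical values chosen for illustration, and the function names are assumptions introduced here.

```python
import math


def poisson_zero_prob(mean):
    """P(count = 0) for a Poisson distribution with the given mean."""
    return math.exp(-mean)


def negbin_zero_prob(mean, dispersion):
    """P(count = 0) for a negative binomial with the given mean.

    Uses the mean/dispersion parameterization, in which
    variance = mean + mean**2 / dispersion.
    """
    return (dispersion / (dispersion + mean)) ** dispersion


def negbin_variance(mean, dispersion):
    """Variance of the negative binomial in the same parameterization."""
    return mean + mean ** 2 / dispersion


# Hypothetical values, not taken from the project data.
mean_count = 1.0
dispersion = 0.3

print("Poisson P(0):          ", round(poisson_zero_prob(mean_count), 3))
print("Negative binomial P(0):", round(negbin_zero_prob(mean_count, dispersion), 3))
print("Poisson variance:          ", mean_count)
print("Negative binomial variance:", round(negbin_variance(mean_count, dispersion), 3))
```

Under these assumed values the Poisson model puts roughly 0.37 of its probability on zero, while the negative binomial puts roughly 0.64 there. A large share of zero counts can therefore arise from overdispersed count data alone, without treating zeros as censored observations.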

Expected Results:

Data on waterborne pathogen concentrations in surface waters are available at a limited number of sites. Information on site and sample characteristics needs to be integrated so as to characterize as accurately as possible the likely distribution of the organisms over time at any point of interest. An understanding of the variation of pathogen concentrations in space (considering water source) and over time (season), in association with measurable environmental series (rainfall, stream flow, turbidity), is required to improve the precision of health risk analyses. This research requires innovative application of GLMMs to model pathogen concentrations that may be measured with randomly varying recovery rates. The current ICR assay method for Cryptosporidium has an average recovery rate of 10% with a coefficient of variation that may exceed 100%. Reported concentrations are therefore imprecise, and models need to distinguish the true variability of Cryptosporidium levels from sampling variation, including recovery rate variability, and from the variation that can be explained by observable environmental parameters, season, and other factors. The examples and statistical methodological advances proposed will contribute to the toolbox of statistical methods available for environmental analysis, risk management, policy, and regulation.
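The effect of a randomly varying recovery rate on reported counts can be sketched with a short simulation. This is an illustrative sketch under stated assumptions, not the project's method: the recovery rate is assumed to follow a beta distribution with mean 10% and coefficient of variation 100%, the count given the recovery rate is assumed to be Poisson, and the true concentration (0.5 organisms per liter) and sample volume (100 liters) are hypothetical. All function names are introduced here for illustration.

```python
import math
import random
import statistics


def beta_parameters(mean, cv):
    """Beta shape parameters (a, b) matching a given mean and coefficient of variation."""
    variance = (mean * cv) ** 2
    if variance >= mean * (1.0 - mean):
        raise ValueError("cv is too large for a beta distribution with this mean")
    common = mean * (1.0 - mean) / variance - 1.0
    return mean * common, (1.0 - mean) * common


def sample_poisson(rng, lam):
    """Draw one Poisson variate by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(concentration, volume, mean_recovery, cv_recovery, n, seed=0):
    """Simulate n reported counts with a beta-distributed recovery rate per sample."""
    rng = random.Random(seed)
    a, b = beta_parameters(mean_recovery, cv_recovery)
    counts = []
    for _ in range(n):
        recovery = rng.betavariate(a, b)  # fraction of organisms recovered
        expected = concentration * volume * recovery
        counts.append(sample_poisson(rng, expected))
    return counts


# Hypothetical concentration (organisms/L) and volume (L); recovery
# mean and CV follow the 10% and 100% figures quoted in the text.
counts = simulate_counts(0.5, 100.0, 0.10, 1.0, n=20000, seed=1)
print("sample mean:    ", round(statistics.mean(counts), 2))
print("sample variance:", round(statistics.pvariance(counts), 2))
print("fraction zero:  ", round(counts.count(0) / len(counts), 3))
```

With a fixed recovery rate the counts would be Poisson, with variance equal to the mean. Under the assumptions above, the variance becomes mean + mean² × CV², so recovery variability alone makes the reported counts strongly overdispersed even when the true concentration is constant.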

Publications and Presentations:

Ten publications have been submitted on this project.

Journal Articles:

One journal article has been submitted on this project.

Supplemental Keywords:

RFA, Economic, Social, & Behavioral Science Research Program, Scientific Discipline, Environmental Chemistry, Health Risk Assessment, Environmental Microbiology, Environmental Statistics, Ecological Risk Assessment, health risk analysis, ecosystem assessment, multiple response variables, Bayesian method, computer models, waterborne pathogen concentrations, statistical models, data analysis, innovative statistical models, Cryptosporidium, Giardia lamblia, generalized linear models

Progress and Final Reports:

Final Report