Grantee Research Project Results
Final Report: Using Neural Networks to Create New Indices and Classification Schemes
EPA Grant Number: R829784
Title: Using Neural Networks to Create New Indices and Classification Schemes
Investigators: Brion, Gail M.; Lingireddy, Srinivasa
Institution: University of Kentucky
EPA Project Officer: Page, Angela
Project Period: July 1, 2002 through June 30, 2005 (Extended to June 30, 2006)
Project Amount: $523,938
RFA: Microbial Risk in Drinking Water (2001)
Research Category: Water, Drinking Water, Human Health
Objective:
To use advanced computer modeling techniques (artificial neural networks, ANNs) and a newly developed indicator of fecal age to predict enhanced pathogen risk, as represented by the presence or absence of enteric viruses or protozoa in a drinking water source, from other, more easily measured surrogate parameters.
Summary/Accomplishments (Outputs/Outcomes):
Kentucky River Database. The newly developed ratio of atypical to typical coliform colonies (AC/TC) was most useful in determining the overt presence of human fecal materials, and it also indicated the relative age of fecal material in the raw source waters of the Kentucky River. This river was impacted by animal and human sources of fecal materials and had levels of fecal indicator bacteria greater than those recommended for body contact. Results from modeling conducted on 100 multiparameter observations (out of the 108-observation database previously compiled for this study) demonstrate that the AC/TC ratio alone, in a simple logistic regression, could determine the presence or absence of culturable enteric viruses for individual observations with 77.6% and 66.7% accuracy, respectively. AC/TC ratios below 15 were associated with the majority (74%) of observations in which enteric viruses were present.
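The single-input result above can be illustrated with a minimal sketch of a one-variable logistic regression that is scored separately for presence and absence observations. The data values, the log10 transform, the 0.5 cutoff, the gradient-descent fit, and the function names are all illustrative assumptions; none of them come from the study's database or fitting procedure.

```python
import math

def fit_logistic(x, y, lr=0.5, epochs=5000):
    """Fit P(present | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += p - yi          # gradient of the log-loss w.r.t. b0
            g1 += (p - yi) * xi   # gradient of the log-loss w.r.t. b1
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def classwise_accuracy(x, y, b0, b1, cutoff=0.5):
    """Return (accuracy on presence observations, accuracy on absence observations)."""
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        predicted = 1 if p >= cutoff else 0
        totals[yi] += 1
        hits[yi] += int(predicted == yi)
    return hits[1] / totals[1], hits[0] / totals[0]

# Hypothetical AC/TC ratios and virus labels (1 = present, 0 = absent).
ratios = [2, 5, 8, 12, 14, 20, 25, 30, 40, 60]
virus = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
x = [math.log10(r) for r in ratios]  # assumed transform, not from the report

b0, b1 = fit_logistic(x, virus)
presence_acc, absence_acc = classwise_accuracy(x, virus, b0, b1)
print(f"slope={b1:.2f} presence={presence_acc:.0%} absence={absence_acc:.0%}")
```

Reporting the two accuracies separately, as the report does, keeps a model from looking good simply by favoring the more common class.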
To improve predictions from the single-input model discussed above, a modeling paradigm was applied that selected input parameters to indicate levels of fecal load, the predominant fecal source, and relative fecal age. Potential input parameters in these categories were evaluated for their correlation with enteric virus presence/absence in the raw source water. The presence of the fecal sterol epicoprostanol was strongly correlated with enteric virus presence from inputs of human sewage, whereas the presence of male-specific coliphages was not. Levels of coprostanol, E. coli, and fecal coliforms were good indicators of fecal loads in the river system and strongly correlated with enteric virus presence. The AC/TC ratio and temperature were strongly correlated with enteric virus presence and indicative of relative fecal age, with the AC/TC ratio dropping after rain events. A simple, 3-input multivariate logistic regression model containing densities of fecal coliforms, the presence or absence of epicoprostanol, and the AC/TC ratio was fit to the 100 observations and predicted the presence or absence of total culturable viruses with 84.5% and 78.6% overall accuracy, respectively. Levels of coprostanol could be substituted for the densities of fecal coliforms as the fecal load signal in the 3-input model with little effect on the overall accuracy of prediction.
The overall predictive capability of a multivariate logistic regression model using the age, load, and source input parameter paradigm described above was reevaluated using similar age (AC/TC ratio), load (E. coli and coprostanol concentrations), and source (epicoprostanol) input parameters. These were combined with additional input parameters that reflected changes in flow (Δ flow), changes in load (Δ somatic coliphages), and changes in a newly developed index variable (Δ change index) calculated from a combination of changes in flow, turbidity, alkalinity, and conductivity. The new, expanded multivariate logistic regression model could fit and predict with an overall 89.6% and 64.3% accuracy for enteric virus presence and absence, respectively.
The performance of a conventionally trained ANN model with a 7:3:1 architecture was compared with that of the expanded multivariate logistic regression discussed above. Identification of virus absence improved (90.5%), and prediction of virus presence was similar (87.9%). The separation between the frequency classification of enteric virus presence and absence was greatly enhanced by the application of the ANN model. Adding a dummy variable to the ANN model and using a newly developed scheme for determining the optimum training termination point produced an 8:2:1 ANN model that could predict enteric virus presence and absence with 87.9% and 81.0% overall accuracy, respectively, while preserving more input observations for a validation set by eliminating the testing set.
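For readers unfamiliar with the notation, an architecture such as 7:3:1 denotes 7 input nodes, 3 hidden nodes, and 1 output node. The sketch below builds such a network and counts its adjustable weights. The sigmoid activations, the bias term on each node, the random initial weights, and the function names are assumptions made for illustration; the report does not specify them.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def n_parameters(layers):
    """Count weights in a fully connected network, assuming one bias per node."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))

def init_network(layers, seed=0):
    """One weight list per node; the last entry of each list is the bias."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(layers, layers[1:])
    ]

def forward(network, inputs):
    """Propagate one observation through the network."""
    activations = list(inputs)
    for layer in network:
        # zip stops at the shorter sequence, so the trailing bias weight
        # is excluded from the weighted sum and added separately.
        activations = [
            sigmoid(w[-1] + sum(wi * a for wi, a in zip(w, activations)))
            for w in layer
        ]
    return activations

net_731 = init_network([7, 3, 1])
output = forward(net_731, [0.1, 0.4, 0.0, 1.0, 0.3, 0.7, 0.5])  # hypothetical scaled inputs

print(n_parameters([7, 3, 1]), n_parameters([8, 2, 1]))
```

Under these assumptions the 7:3:1 network has 28 adjustable weights and the 8:2:1 network has 21, which matters when only about 100 observations are available for training.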
An ANN model based on “forward selection analysis” for input variable selection was developed for predicting peak concentrations of encysted Cryptosporidium in the watershed. The model was able to select an optimal set of input variables in 13 of the possible 32,767 iterations. The use of a simple 5:1:1 architecture along with the “forward selection analysis” appears to eliminate the need for testing datasets while providing an average predictive accuracy of over 80% for both peak and non-peak concentrations of encysted Cryptosporidium.
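The figure of 32,767 equals 2^15 - 1, the number of non-empty subsets of 15 candidate inputs; reading it that way is an inference, since the report does not state the number of candidates. The sketch below shows a generic greedy forward selection and compares its evaluation count with an exhaustive search. The scoring function, variable names, and stopping rule are hypothetical, and the sketch does not attempt to reproduce the study's procedure or its count of 13 iterations.

```python
def forward_selection(candidates, score):
    """Greedily add the input that most improves the score; stop when none does."""
    selected = []
    remaining = list(candidates)
    best_score = float("-inf")
    evaluations = 0
    while remaining:
        trials = []
        for name in remaining:
            trials.append((score(selected + [name]), name))
            evaluations += 1
        top_score, top_name = max(trials)
        if top_score <= best_score:
            break  # no remaining candidate improves the model
        selected.append(top_name)
        remaining.remove(top_name)
        best_score = top_score
    return selected, best_score, evaluations

# Hypothetical scoring: three informative inputs, plus a penalty per input
# standing in for the loss of generality that comes with larger models.
USEFULNESS = {"x3": 3.0, "x7": 2.0, "x11": 1.0}

def toy_score(subset):
    return sum(USEFULNESS.get(name, 0.0) for name in subset) - 0.5 * len(subset)

candidates = [f"x{i}" for i in range(15)]
selected, best, evaluations = forward_selection(candidates, toy_score)
exhaustive = 2 ** len(candidates) - 1

print(selected, best, evaluations, exhaustive)
```

In practice the score would be a validation metric from a trained model, so each evaluation is costly and the gap between a greedy search and an exhaustive one becomes significant.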
Analyses with Databases Obtained from Other Researchers. A database obtained from a local watershed that was undergoing a TMDL study (Eagle Creek) was analyzed for the utility of the AC/TC ratio in detecting inputs of human fecal materials from inadequately sewered towns along its length. The study was conducted in two successive years, before and after a forced main was installed. At every location along the river where human sewage input was expected, the average AC/TC value dropped. After the forced main was installed, the AC/TC ratio in the creek rose overall. For towns that had not installed the forced main and were still inputting human wastes, there was a drop in the AC/TC value.
A database with observations on water quality and the detection of enteric viruses was obtained from a multinational team of researchers in Europe. The analysis found that ANN models predicted all types of viral presence and absence in shellfish with better precision than MLR models for this multi-country database. For overall presence/absence classification accuracy, ANN modeling had a performance rate of 95.9%, 98.9%, and 95.7% versus 60.5%, 75.0%, and 64.6% for the MLR for ADV, NLV, and EV, respectively. The selectivity (prediction of viral negatives) was greater than the sensitivity (prediction of viral positives) for both models and all virus types, with the ANN model showing greater sensitivity than the MLR. ANN models were able to illuminate site-specific relationships between microbial indicators chosen as model inputs and human virus presence. A validation study on ADV demonstrated that the MLR and ANN models differed in sensitivity and selectivity, with the ANN model correctly identifying ADV presence with greater precision.
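The terms used above can be made concrete with a short sketch that computes sensitivity, selectivity, and overall accuracy from paired observed and predicted labels. The labels below are hypothetical and are not drawn from the shellfish database; the function name is also an assumption.

```python
def classification_rates(observed, predicted):
    """Return (sensitivity, selectivity, overall accuracy) for 0/1 labels.

    Sensitivity: fraction of virus-positive samples predicted positive.
    Selectivity: fraction of virus-negative samples predicted negative.
    """
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    selectivity = tn / (tn + fp)
    overall = (tp + tn) / len(observed)
    return sensitivity, selectivity, overall

# Hypothetical labels: 1 = virus detected, 0 = virus not detected.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]

sensitivity, selectivity, overall = classification_rates(observed, predicted)
print(sensitivity, selectivity, overall)
```

When negatives outnumber positives, as in this toy example, overall accuracy is dominated by selectivity, so the two rates are worth reporting separately.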
A powerful criterion based on “Relative Strength Effect” was developed for determining the optimal training termination point for ANN models. The efficacy and robustness of the criterion were demonstrated on three different databases, including one microbial dataset from the Kentucky River. Application of this method proved useful (97% overall accuracy) in backfilling missing microbial data with a broad classification of fecal coliforms into normal (<200 CFU/100 mL) or peak (>250 CFU/100 mL) concentrations.
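The two-class labeling used for backfilling can be sketched as a simple threshold rule. Only the two thresholds come from the report; the function name is hypothetical, and returning no class for concentrations between the thresholds is an assumption, since the report does not say how such values were handled.

```python
NORMAL_BELOW = 200.0  # CFU/100 mL, threshold stated in the report
PEAK_ABOVE = 250.0    # CFU/100 mL, threshold stated in the report

def classify_fecal_coliform(cfu_per_100ml):
    """Label a fecal coliform concentration as 'normal' or 'peak'.

    Values from 200 to 250 CFU/100 mL fall between the two stated
    thresholds; this sketch leaves them unclassified (None).
    """
    if cfu_per_100ml < NORMAL_BELOW:
        return "normal"
    if cfu_per_100ml > PEAK_ABOVE:
        return "peak"
    return None

labels = [classify_fecal_coliform(c) for c in (150, 225, 300)]
print(labels)
```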
Conclusions:
The presence of encysted protozoa and total culturable virus can be modeled successfully by application of both simple and advanced multivariate models that capture signals relative to fecal load, fecal source, and fecal age from carefully selected physical, chemical, and microbial surrogate parameters. The AC/TC ratio is a new indicator for fecal age and source that should be applied to source water quality monitoring.
Journal Articles on this Report: 9 Displayed

Citation | Reports
---|---
Black LE, Brion GM, Freitas SJ. Multivariate logistic regression for predicting total culturable virus presence at the intake of a potable-water treatment plant: novel application of the atypical coliform/total coliform ratio. Applied and Environmental Microbiology 2007;73(12):3965-3974. | R829784 (Final)
Booth J, Brion GM. The utility of the AC/TC ratio for watershed management: a case study. Water Science & Technology 2004;50(1):199-203. | R829784 (2003), R829784 (Final)
Brion GM. The AC/TC bacterial ratio: a tool for watershed quality management. Journal of Water and Environment Technology 2005;3(2):271-277. | R829784 (2003), R829784 (2004), R829784 (Final)
Brion G, Lingireddy S, Neelakantan TR, Wang M, Girones R, Lees D, Allard A, Vantarakis A. Probing Norwalk-like virus presence in shellfish, using artificial neural networks. Water Science & Technology 2004;50(1):125-129. | R829784 (2003), R829784 (Final)
Brion G, Viswanathan C, Neelakantan TR, Lingireddy S, Girones R, Lees D, Allard A, Vantarakis A. Artificial neural network prediction of viruses in shellfish. Applied and Environmental Microbiology 2005;71(9):5244-5253. | R829784 (2003), R829784 (2004), R829784 (Final)
Chandramouli V, Brion G, Neelakantan TR, Lingireddy S. Backfilling missing microbial concentrations in a riverine database using artificial neural networks. Water Research 2007;41(1):217-227. | R829784 (2003), R829784 (2004), R829784 (Final)
Chandramouli V, Lingireddy S, Brion GM. Robust training termination criterion for back-propagation ANNs applicable to small datasets. Journal of Computing in Civil Engineering 2007;21(1):39-46. | R829784 (2004), R829784 (Final)
Chandramouli V, Neelakantan TR, Brion GM, Lingireddy S. Predicting enteric virus presence in surface waters using artificial neural network models. Environmental Engineering Science 2008;25(1):53-62. | R829784 (Final)
Freitas SJ, Brion GM, Black L, Coakley T. Predictive input parameters for enteric virus presence at the inlet of a potable water supply. Water Science and Technology 2006;54(3):17-21. | R829784 (2004), R829784 (Final)

Supplemental Keywords:
water quality, pathogens, indicators, modeling, artificial neural networks, RFA, Scientific Discipline, Geographic Area, Water, Environmental Chemistry, Ecological Risk Assessment, Ecology and Ecosystems, Drinking Water, Engineering, Chemistry, & Physics, Environmental Engineering, EPA Region, microbial risk assessment, alternative disinfection methods, microbial contamination, environmental monitoring, water contamination detection, region 4, bacteria, microbiological organisms, early warning, microbial pathogens, cryptosporidium, neural networks, emerging pathogens, water quality, ecological risk, drinking water contaminants
Progress and Final Reports:
Original Abstract
The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.