Science Inventory

Scaling Watershed Models: Modern Approaches to Science Computation with MapReduce, Parallelization, and Cloud Optimization

Citation:

Flaishans, J., M. Fry, J. Hook, N. Thurman, Jim Carleton, M. Thawley, K. Wolfe, D. Young, AND Tom Purucker. Scaling Watershed Models: Modern Approaches to Science Computation with MapReduce, Parallelization, and Cloud Optimization. 8th International Congress on Environmental Modelling and Software, Toulouse, FRANCE, July 11 - 14, 2016.

Impact/Purpose:

Presented at iEMSs 2016, the 8th International Congress on Environmental Modelling and Software.

Description:

Environmental models are products of the computer architectures and software tools available when they were developed. Scientifically sound algorithms may persist in their original form even as system architectures and software development practices evolve. Since the 1980s, the EPA has developed algorithms to estimate the flux of pesticides from treated fields to neighboring water bodies. Recent development of the EPA’s Spatial Aquatic Model (SAM) has provided an opportunity to redevelop, optimize, and modernize this code, which is used for regulatory decisions. Profiling identified a number of efficiencies that could be gained by updating the code to address execution time, memory utilization, CPU utilization, and disk I/O. Porting the code to Python, in order to access modern scientific computing and database libraries, has enabled several improvements and new use cases for SAM, including better scalability, deployment on cloud infrastructure, simpler cross-language communication, and the use of NoSQL databases. These improvements allow SAM to run as a service despite its intensive input data, processing, and large output data requirements. Running individual watersheds concurrently as embarrassingly parallel processes increases efficiency and scalability, and MapReduce methods speed up post-processing of the model outputs while accounting for the networked structure of watersheds. Converting to Python has also allowed the development process to leverage modern software testing frameworks and continuous integration techniques. We discuss the experience of modernizing this code base, with the goal of communicating useful design patterns for other science models.
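The abstract pairs two ideas: independent (embarrassingly parallel) model runs per watershed, and MapReduce-style post-processing that respects how watersheds drain into one another. The Python sketch below is a minimal illustration under stated assumptions, not SAM's implementation: the five-watershed network, the areas, the loading rate, and the function names simulate_watershed, run_watersheds, emit, and accumulate are all hypothetical. Because each watershed run needs no other watershed, the runs can be handed to any concurrent.futures executor. The map step then credits each watershed's load to itself and to every watershed downstream of it, and the reduce step sums those credits per watershed to give cumulative loads.

```python
"""Sketch: embarrassingly parallel watershed runs plus a MapReduce-style
aggregation over the drainage network.

Everything here is hypothetical (network, areas, loading rate, function
names); it illustrates the pattern, not SAM's actual code or data.
"""
from collections import defaultdict
from concurrent.futures import Executor
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical network: each watershed drains into one downstream watershed;
# None marks the outlet. A and B feed C; C and D feed E.
DOWNSTREAM: Dict[str, Optional[str]] = {
    "A": "C", "B": "C", "C": "E", "D": "E", "E": None,
}
AREAS_KM2: Dict[str, float] = {"A": 10.0, "B": 6.0, "C": 8.0, "D": 2.0, "E": 4.0}
LOAD_PER_KM2 = 0.5  # hypothetical loading rate, kg per km^2


def simulate_watershed(ws_id: str) -> Tuple[str, float]:
    """Stand-in for one independent model run; it reads no other watershed."""
    return ws_id, AREAS_KM2[ws_id] * LOAD_PER_KM2


def run_watersheds(ws_ids: Iterable[str],
                   executor: Optional[Executor] = None) -> Dict[str, float]:
    """Run every watershed independently; in parallel if an executor is given."""
    mapper = executor.map if executor is not None else map
    return dict(mapper(simulate_watershed, ws_ids))


def emit(ws_id: str, local_load: float) -> Iterator[Tuple[str, float]]:
    """Map step: pair this load with the watershed and each node downstream."""
    node: Optional[str] = ws_id
    while node is not None:
        yield node, local_load
        node = DOWNSTREAM[node]


def accumulate(local_loads: Dict[str, float]) -> Dict[str, float]:
    """Map (emit), shuffle (group by key), and reduce (sum), run serially here."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for ws_id, load in local_loads.items():
        for key, value in emit(ws_id, load):
            groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}


local = run_watersheds(list(DOWNSTREAM))  # serial run; no executor passed
cumulative = accumulate(local)
print(cumulative["E"])  # 15.0: the outlet receives every upstream load
```

To spread the independent runs across CPU cores, pass a process pool, for example `with ProcessPoolExecutor() as pool: run_watersheds(list(DOWNSTREAM), pool)`, from a script guarded by `if __name__ == "__main__":`. Emitting each load to every downstream node keeps the reduce a simple per-key sum, at the cost of producing roughly (number of watersheds × network depth) intermediate pairs; for very large networks, accumulating loads in topological order would use less memory.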

URLs/Downloads:

http://www.iemss.org/sites/iemss2016/

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 07/14/2016
Record Last Revised: 02/23/2017
OMB Category: Other
Record ID: 335435

Organization:

U.S. ENVIRONMENTAL PROTECTION AGENCY

OFFICE OF RESEARCH AND DEVELOPMENT

NATIONAL EXPOSURE RESEARCH LABORATORY

COMPUTATIONAL EXPOSURE DIVISION

WATERSHED EXPOSURE BRANCH