Science Inventory

A New Approach to Predict Tributary Phosphorus Loads Using Machine Learning– and Physics-Based Modeling Systems


Feng Chang, C., M. Astitha, Y. Yuan, C. Tang, P. Vlahos, V. Garcia, AND U. Khaira. A New Approach to Predict Tributary Phosphorus Loads Using Machine Learning– and Physics-Based Modeling Systems. Frontiers in Artificial Intelligence. Frontiers, Lausanne, Switzerland, 2(3):1-43, (2023).


Water bodies and coastal areas around the world are threatened by excessive amounts of nitrogen (N) and phosphorus (P) from upstream watersheds, which can cause rapid proliferation of algae. When these algal blooms produce toxins, they are called harmful algal blooms (HABs), and they negatively impact drinking water sources, aquatic species, and the recreational services of water bodies. Excess dissolved reactive phosphorus (DRP) is a major driver of HABs in Lake Erie, and identifying the factors that control DRP load is of paramount importance if EPA program offices and regional partners are to make informed decisions on controlling DRP load from agricultural fields.


Tributary phosphorus (P) loads are one of the main drivers of eutrophication problems in freshwater lakes. Being able to predict P loads can aid in understanding subsequent load patterns and elucidate potentially degraded water quality conditions in downstream surface waters. We demonstrate the development and performance of an integrated multimedia modeling system that uses machine learning (ML) to assess and predict monthly total P (TP) and dissolved reactive P (DRP) loads. Meteorological variables from the Weather Research and Forecasting model, hydrological variables from the Variable Infiltration Capacity model, and agricultural management practice variables from the Environmental Policy Integrated Climate agroecosystem model are used to train the ML models to predict P loads. Our study presents a new modeling methodology, using as testbeds the Maumee, Sandusky, Portage, and Raisin watersheds, which discharge into Lake Erie and contribute significant P loads to the lake. Two models were built: one for TP loads using ten environmental variables, and one for DRP loads using nine environmental variables. Both models ranked streamflow as the most important predictive variable. Compared to observations, TP and DRP loads were predicted very well temporally and spatially. Modeling results of TP loads are within the ranges of those obtained from other studies and in some cases are more accurate. Modeling results of DRP loads exceed performance measures from other studies. We explore the ability of both ML-based models to improve further as more data become available over time. This integrated multimedia approach is recommended for studying other freshwater systems and water quality variables using available decadal data from physics-based model simulations.

Record Details:

Product Published Date: 07/01/2023
Record Last Revised: 09/08/2023
OMB Category: Other
Record ID: 358905