Science Inventory

Estimating Probability of Lead Service Line Occurrence using Predictive Models

Citation:

Buahin, C. AND B. Dyson. Estimating Probability of Lead Service Line Occurrence using Predictive Models. Presented at EPA ORD's States LSL Identification Meeting, Cincinnati, OH, October 26, 2023.

Impact/Purpose:

This abstract describes a presentation that will be given to states to introduce some of the predictive models used to forecast the likelihood of lead service line occurrence in water distribution systems. It will involve no data collection or model development and will focus only on the underlying science and a review of publications describing the models in question. Predictive models are increasingly seen as tools for developing the service line inventories mandated by the Lead and Copper Rule Revisions (LCRR). They can be used in conjunction with other lead service line testing and identification approaches to address challenges such as poor records and the inconclusiveness and high cost of other testing methods. However, improper use or, worse, misinterpretation of the results from these models can lead to incorrect applications. The presentation will focus on demystifying this approach by shedding light on the various types of models and how they work, their input data requirements, and the costs of maintaining and improving these models over the lifetime of a replacement program. This effort will bring more clarity to utilities and practitioners who face a confusing array of technologies that are being marketed as solutions to the lead service line issue.

Description:

Addressing the adverse human health impacts of legacy lead (and copper) service lines (LSL) installed in drinking water distribution systems in many parts of the United States continues to be a major challenge. As a step in addressing this issue, the Lead and Copper Rule Revisions (LCRR) mandates the development and maintenance of service line inventories to inform replacement efforts by water utilities. Predictive data-driven modeling using machine learning and geo/statistical approaches is increasingly being proposed and employed to address various challenges surrounding the development of these inventories, including: 1) non-existent, poor, inaccurate, and/or incomplete records on the material types, locations, and distribution of service lines; 2) the inconclusiveness of other LSL identification methods; and 3) the prohibitive costs of other LSL identification methods. However, the dizzying array of modeling approaches and a lack of understanding of their underpinnings can lead to misinterpretation and misapplication of these models. In this presentation, we seek to provide an objective and scientifically based overview of these models. Specifically, we cover input data requirements, model performance evaluation methods, and how to use these models to guide the inventory development and replacement process in practice. We advance our view that predictive models should be considered only one of the tools in the arsenal, to be used in conjunction with other LSL identification approaches to inform thresholds for service line categorization and minimize uncertainty in the use of these methods. We highlight some common pitfalls that model developers and policy makers need to consider in interpreting and using these models. Finally, we discuss the true cost of developing these models in comparison to other LSL identification methods by considering the cost of obtaining the input data needed to train, test, and validate these models correctly and the need to continually maintain these models throughout the life cycle of a replacement program.

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 10/26/2023
Record Last Revised: 12/19/2023
OMB Category: Other
Record ID: 359915