Science Inventory

Benchmarking novel curation strategies for large publicly available water solubility compilations (ACS)

Citation:

Sinclair, G., C. Lowe, N. Charest, C. Ramsland, T. Martin, A. Richard, AND A. Williams. Benchmarking novel curation strategies for large publicly available water solubility compilations (ACS). ACS, Chicago, IL, August 21 - 25, 2022. https://doi.org/10.23645/epacomptox.20993776

Impact/Purpose:

N/A

Description:

Data curation is a challenge that underlies every quantitative structure-property or structure-activity relationship (QSPR/QSAR) modeling effort. Multi-source data aggregation complicates it further, introducing unique problems and magnifying problems already present in single-source data. This work considers strategies for aggregating publicly available water solubility data. We compare a baseline source-agnostic statistical curation workflow, applied to a dataset of over 82,000 water solubility measurements, against the workflows of two previous publications of large multi-source water solubility datasets, and propose new strategies that expand on those previously developed. These strategies are benchmarked by their performance in QSPR modeling and by metrics of data redundancy and consistency, and are compared against a prior single-source curation workflow and model. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 08/25/2022
Record Last Revised: 10/12/2022
OMB Category: Other
Record ID: 355882