Science Inventory

Defining uncertainty in publicly available high-throughput screening data from the ToxCast program

Citation:

Brown, J., E. Watt, W. Setzer, R. Judson, and K. Paul-Friedman. Defining uncertainty in publicly available high-throughput screening data from the ToxCast program. Presented at 57th Annual Meeting of the Society of Toxicology, San Antonio, Texas, March 11-15, 2018. https://doi.org/10.23645/epacomptox.7028990

Impact/Purpose:

The purpose of this work is to quantify the uncertainty in ToxCast data and to learn from general trends in the resulting uncertainty information.

Description:

The US Environmental Protection Agency (EPA) ToxCast data pipeline (tcpl) has been applied to >1000 assay endpoints to enable first-tier processing of heterogeneous bioactivity data from high-throughput screening. This analysis generated concentration-response parameters, including the 50% activity concentration (AC50), for each chemical sample-assay endpoint pair for which the data can be fit to a curve. There are multiple sources of potential variability in these AC50s, including biological variance, experimental error, and curve-fitting procedures. The primary objectives of this work were to: (1) implement a bootstrap resampling method, available as the toxboot R package, to generate uncertainty information for AC50s from tcpl; (2) develop a new level of tcpl to track uncertainty information and inform use of ToxCast data; and (3) derive trends in the uncertainty information to develop a greater understanding of ToxCast data. Briefly, toxboot uses smooth nonparametric bootstrap resampling, adding normally distributed random noise, to produce a resampled set of concentration-response values. The resampled data are fit to the three ToxCast models (constant, Hill, and gain-loss). This process is repeated 1000 times, and >50 variables relating to model fitting parameters are stored in a Mongo database. The resulting data were used to generate point estimates, winning model, and hitcall for each of the 1000 resamples. Summary statistics (hit percent, median AC50, and AC50 confidence interval) were generated from the toxboot resampling. Hit percent, the probability of a positive hitcall given the collection of resampled data, may be useful for predictive modeling in place of the binary hitcall. Overall, 78% of positive hitcalls in invitrodb corresponded to a hit percent ≥ 90, and the median AC50 confidence interval width was 0.368 log10 micromolar units. Together, the median AC50 and its confidence interval quantify AC50 variability.
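The resampling and summary steps described above can be illustrated with a minimal Python sketch. This is not the toxboot implementation (toxboot is an R package), and several simplifications are assumptions made here for illustration: responses are resampled with replacement within each concentration before Gaussian noise is added, only a zero-constant model and a Hill model are compared (the gain-loss model is omitted), fitting uses a coarse least-squares grid search with a Gaussian AIC rather than the tcpl fitting procedure, and the median AC50 and 95% interval are computed over resamples called hits. All function names, the `cutoff` and `noise_sd` parameters, and the synthetic data are hypothetical.

```python
import math
import random


def hill(log_conc, top, log_ac50, slope):
    """Hill curve evaluated on the log10 concentration scale."""
    return top / (1.0 + 10.0 ** ((log_ac50 - log_conc) * slope))


def smooth_resample(groups, noise_sd, rng):
    """Resample responses within each concentration, then add Gaussian noise."""
    xs, ys = [], []
    for log_conc, responses in groups.items():
        for _ in responses:
            xs.append(log_conc)
            ys.append(rng.choice(responses) + rng.gauss(0.0, noise_sd))
    return xs, ys


def aic(sse, n, k):
    """Gaussian AIC from a sum of squared errors (an assumed simplification)."""
    return n * math.log(max(sse, 1e-12) / n) + 2 * k


def fit_and_call(xs, ys, cutoff):
    """Grid-search Hill fit; hit if Hill beats the zero-constant model and top >= cutoff."""
    n, lo, hi = len(ys), min(xs), max(xs)
    best_sse, best_top, best_log_ac50 = float("inf"), 0.0, None
    for i in range(41):
        log_ac50 = lo + (hi - lo) * i / 40
        for slope in (0.5, 1.0, 2.0, 4.0, 8.0):
            f = [hill(x, 1.0, log_ac50, slope) for x in xs]
            # For fixed AC50 and slope, the least-squares top has a closed form.
            top = max(0.0, sum(y * fi for y, fi in zip(ys, f)) / sum(fi * fi for fi in f))
            sse = sum((y - top * fi) ** 2 for y, fi in zip(ys, f))
            if sse < best_sse:
                best_sse, best_top, best_log_ac50 = sse, top, log_ac50
    sse_const = sum(y * y for y in ys)
    hit = aic(best_sse, n, 4) < aic(sse_const, n, 1) and best_top >= cutoff
    return hit, (best_log_ac50 if hit else None)


def percentile(sorted_vals, p):
    """Linearly interpolated percentile of a sorted list, with p in [0, 1]."""
    k = (len(sorted_vals) - 1) * p
    lo = math.floor(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def summarize(results):
    """Hit percent, median log10 AC50, and 95% interval from (hit, log_ac50) pairs."""
    hit_percent = 100.0 * sum(1 for hit, _ in results if hit) / len(results)
    vals = sorted(v for hit, v in results if hit)
    if not vals:
        return {"hit_percent": hit_percent, "median_log_ac50": None, "ci_log_ac50": None}
    return {
        "hit_percent": hit_percent,
        "median_log_ac50": percentile(vals, 0.5),
        "ci_log_ac50": (percentile(vals, 0.025), percentile(vals, 0.975)),
    }


def bootstrap_summary(groups, cutoff, noise_sd, n_boot=1000, seed=0):
    """Repeat resample-and-fit n_boot times and summarize the results."""
    rng = random.Random(seed)
    results = [fit_and_call(*smooth_resample(groups, noise_sd, rng), cutoff)
               for _ in range(n_boot)]
    return summarize(results)


def make_demo_data(seed=0):
    """Synthetic active chemical: top=100, log10 AC50=0, slope=1, 3 replicates."""
    rng = random.Random(seed)
    log_concs = [-2.0 + 0.5 * i for i in range(9)]
    return {x: [hill(x, 100.0, 0.0, 1.0) + rng.gauss(0.0, 5.0) for _ in range(3)]
            for x in log_concs}


if __name__ == "__main__":
    print(bootstrap_summary(make_demo_data(), cutoff=20.0, noise_sd=5.0, n_boot=200))
```

In this sketch, the width of the returned interval (upper bound minus lower bound, in log10 units) plays the role of the AC50 confidence interval width reported above, and `hit_percent` is a continuous alternative to a single binary hitcall.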
Application of toxboot to ToxCast data provides a statistically robust means of estimating the uncertainty in AC50 values and evaluating the reproducibility of curve fits from the ToxCast pipeline. This abstract does not necessarily reflect U.S. EPA policy.

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 03/15/2018
Record Last Revised: 08/31/2018
OMB Category: Other
Record ID: 342150