Science Inventory

Applicability domains for classification problems: benchmarking of distance to models for AMES mutagenicity set

Citation:

Sushko, I., S. Novotarskyi, R. Körner, A. Pandey, A. Cherkasov, J. Li, P. Gramatica, K. Hansen, T. Schroeter, K. Mueller, L. Xi, H. Liu, X. Yao, T. Öberg, F. Hormozdiari, Dao, C. Sahinalp, R. Todeschini, P. Polishchuk, A. Artemenko, V. Kuz'min, T. M. MARTIN, D. M. YOUNG, D. Fourches, E. Muratov, A. Tropsha, I. Baskin, D. Horvath, G. Marcou, C. Muller, A. Varnek, V. V. Prokopenko, AND I. V. Tetko. Applicability domains for classification problems: benchmarking of distance to models for AMES mutagenicity set. Journal of Chemical Information and Modeling. American Chemical Society, Washington, DC, 50(12):2094-2111, (2010).

Impact/Purpose:

To inform the public.

Description:

For QSAR and QSPR modeling of biological and physicochemical properties, estimating the accuracy of predictions is a critical problem. The “distance to model” (DM) is a metric that quantifies, for a given property and a specific model, the similarity between the training set molecules and a test set compound. In our previous studies we rigorously examined several popular DMs for quantitative models and found that the DM based on the standard deviation (STD) calculated from an ensemble of models offered the best results. The current study extends this analysis to qualitative (classification) models using 30 models from the Ames mutagenicity challenge 2009. Besides the STD, we explore the difference between the predicted value and the class label, a measure that combines these two to provide a probabilistic estimate of misclassification, and several other DMs. We show that DMs based on an ensemble (consensus) model provided systematically better performance than other DMs. Moreover, the analyzed DM identified 30-60% of compounds whose prediction accuracy is similar to the cross-laboratory accuracy of the Ames test, which is estimated to be 85%. Thus, in silico predictions can be used to halve the cost of experimental measurements while providing similar prediction accuracy. The model developed by the HMGU group is publicly available at http://ochem.eu/models/1.
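
As an illustration only, the idea of ranking compounds by an ensemble-based DM and then asking what fraction can be predicted at a target accuracy might be sketched as below. This is not the code used in the study: the function names, the 0.5 decision threshold, the toy predictions, and the rule of taking the largest low-DM subset whose cumulative accuracy still meets the target are all assumptions made for this example.

```python
from statistics import mean, pstdev


def ensemble_std(ensemble_preds):
    """Per-compound standard deviation across ensemble members.

    ensemble_preds[m][i] is the value predicted by model m for compound i.
    """
    return [pstdev(column) for column in zip(*ensemble_preds)]


def consensus_labels(ensemble_preds, threshold=0.5):
    """Class label from the mean ensemble prediction (threshold is assumed)."""
    return [int(mean(column) >= threshold) for column in zip(*ensemble_preds)]


def coverage_at_accuracy(dm, predicted, observed, target=0.85):
    """Fraction of compounds in the largest low-DM subset whose
    cumulative accuracy is at least `target`."""
    order = sorted(range(len(dm)), key=lambda i: dm[i])  # most reliable first
    hits = 0
    best = 0
    for rank, i in enumerate(order, start=1):
        hits += predicted[i] == observed[i]
        if hits / rank >= target:
            best = rank
    return best / len(dm)


# Hypothetical toy data: 3 ensemble members, 4 compounds.
preds = [
    [0.9, 0.1, 0.6, 0.4],
    [0.9, 0.1, 0.3, 0.8],
    [0.9, 0.1, 0.8, 0.6],
]
observed = [1, 0, 0, 1]  # made-up experimental classes

dm = ensemble_std(preds)
predicted = consensus_labels(preds)
coverage = coverage_at_accuracy(dm, predicted, observed, target=0.85)
print(f"coverage at 85% accuracy: {coverage:.2f}")
```

In this toy case the one misclassified compound is also the one on which the ensemble members disagree most, so it is ranked last and excluded from the subset that meets the 85% target.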

Record Details:

Record Type: DOCUMENT (JOURNAL/PEER REVIEWED JOURNAL)
Product Published Date: 10/29/2010
Record Last Revised: 01/25/2011
OMB Category: Other
Record ID: 230624