Science Inventory

Development of a CSRML version of the Analog identification Methodology (AIM) fragments and their evaluation within the Generalised Read-Across (GenRA) approach

Citation:

Adams, M., H. Hilde, D. Chang, A. Richard, A. Williams, I. Shah, AND G. Patlewicz. Development of a CSRML version of the Analog identification Methodology (AIM) fragments and their evaluation within the Generalised Read-Across (GenRA) approach. Computational Toxicology. Elsevier B.V., Amsterdam, Netherlands, 25:100256, (2023). https://doi.org/10.1016/j.comtox.2022.100256

Impact/Purpose:

AIM fragments form the basis of the Analog Identification Methodology, a tool still used by the New Chemicals Division to search for and retrieve analogues for read-across. The publicly available AIM tool runs only on legacy Windows operating systems. An attempt was therefore made to codify the fragments in CSRML, which would facilitate refinements and be compatible with modern cheminformatics tools. In addition, the newly codified AIM fragments were compared with the public ToxPrints to evaluate how similar the two fragment sets are. Their utility was also evaluated on a large dataset of rat acute oral toxicity data as part of a GenRA approach.

Description:

The Analog Identification Methodology (AIM) was developed over 20 years ago to identify analogues to support read-across at the US Environmental Protection Agency. However, the current public version of the standalone tool, released in 2012, no longer runs on Windows operating systems supported by Microsoft. Additionally, the structural logic for analogue selection is based on older, customised Simplified Molecular Input Line Entry System (SMILES)-type features that are incompatible with modern cheminformatics tools. Given these limitations, a case study was undertaken to explore a more transparent, extensible method of implementing the AIM fragments using Chemical Subgraphs and Reactions Mark-up Language (CSRML). A CSRML file was developed to codify the original AIM fragments, and the extent to which the AIM fragments were faithfully replicated was assessed using the AIM Database. The overall mean performance of the CSRML-AIM across all fragments in terms of sensitivity, specificity, and Jaccard similarity was 89.5%, 99.9%, and 82.2%, respectively. Comparing the AIM fragments with public ToxPrints using a large set of ~25,000 substances of regulatory interest to EPA found them to be dissimilar, with an average maximum Jaccard score of 0.24 for AIM and 0.29 for ToxPrint fingerprints. Both fragment sets were then used as inputs in the automated read-across approach, Generalised Read-Across (GenRA), to evaluate the quality of fit in predicting rat acute oral toxicity LD50 values using the coefficient of determination (R²) and root mean squared error (RMSE). The performance of AIM fragments was R² = 0.434 and RMSE = 0.663, whereas that of ToxPrints was R² = 0.477 and RMSE = 0.638. A bootstrap resampling using 100 iterations found the mean and 95% confidence interval of R² to be 0.349 [0.319, 0.379] for AIM fragments and 0.377 [0.338, 0.412] for ToxPrints. Although AIM and ToxPrints performed similarly in predicting LD50, they differed in their performance at a local level, revealing that their features can offer complementary insights.
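
The replication metrics quoted above (sensitivity, specificity, and Jaccard similarity) can be illustrated for a single fragment with a minimal sketch. It assumes each fragment is represented as a 0/1 assignment per chemical, with the original AIM assignment as the reference and the CSRML assignment as the candidate. The function name `fragment_metrics` and the toy data are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def fragment_metrics(reference: Sequence[int], candidate: Sequence[int]) -> Dict[str, float]:
    """Compare two 0/1 fragment assignments over the same list of chemicals.

    reference: 1 where the original tool flags the fragment (assumed ground truth).
    candidate: 1 where the re-implemented fragment matches.
    """
    if len(reference) != len(candidate):
        raise ValueError("both assignments must cover the same chemicals")

    pairs = list(zip(reference, candidate))
    tp = sum(1 for r, c in pairs if r and c)          # flagged by both
    fp = sum(1 for r, c in pairs if not r and c)      # flagged by candidate only
    fn = sum(1 for r, c in pairs if r and not c)      # flagged by reference only
    tn = sum(1 for r, c in pairs if not r and not c)  # flagged by neither

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "jaccard": ratio(tp, tp + fp + fn),
    }


# Toy example: eight chemicals, one fragment (hypothetical data).
reference_bits = [1, 1, 1, 0, 0, 0, 0, 0]
candidate_bits = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = fragment_metrics(reference_bits, candidate_bits)
print(metrics)
```

Averaging these per-fragment values over all fragments would give overall means of the kind reported in the abstract. Because most chemicals lack any given fragment, true negatives dominate, which is one reason specificity can sit close to 100% while Jaccard similarity is noticeably lower.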
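
The read-across evaluation can be sketched in the same spirit. The example below assumes a GenRA-style prediction computed as the Jaccard-similarity-weighted mean of the k most similar source chemicals, scored with R² and RMSE, and a bootstrap that resamples observed/predicted pairs. The value of k, the percentile rule, the resampling scheme, and all function names are assumptions made for illustration; the record does not state how the study configured these steps.

```python
import math
import random
from typing import List, Sequence, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity between two sets of 'on' fragment indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def read_across(target: Set[int], sources: Sequence[Tuple[Set[int], float]], k: int = 10) -> float:
    """Similarity-weighted mean of the k most similar source chemicals (k is an assumption)."""
    scored = sorted(((jaccard(target, fp), y) for fp, y in sources),
                    key=lambda t: t[0], reverse=True)[:k]
    total = sum(s for s, _ in scored)
    if total == 0:
        # No structural overlap at all: fall back to an unweighted mean.
        return sum(y for _, y in scored) / len(scored)
    return sum(s * y for s, y in scored) / total


def r2_score(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def bootstrap_r2(obs: Sequence[float], pred: Sequence[float],
                 n_iter: int = 100, seed: int = 0) -> Tuple[float, float, float]:
    """Mean and approximate 95% percentile interval of R² over resampled pairs."""
    rng = random.Random(seed)
    n = len(obs)
    scores: List[float] = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        o = [obs[i] for i in idx]
        p = [pred[i] for i in idx]
        if len(set(o)) > 1:                          # skip samples where R² is undefined
            scores.append(r2_score(o, p))
    if not scores:
        raise ValueError("no usable bootstrap samples")
    scores.sort()
    lower = scores[int(0.025 * (len(scores) - 1))]   # simple nearest-rank percentiles
    upper = scores[int(0.975 * (len(scores) - 1))]
    return sum(scores) / len(scores), lower, upper


# Hypothetical toy data: fragment index sets paired with an activity value.
sources = [({1, 2, 3}, 2.0), ({1, 2}, 3.0), ({9}, 10.0)]
prediction = read_across({1, 2, 3}, sources, k=2)
print(prediction)
```

Running the same pipeline once with AIM fragments and once with ToxPrints as the fingerprint would yield a pair of R² and RMSE values that can be compared side by side, which is the kind of comparison the abstract reports.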

Record Details:

Record Type: DOCUMENT (JOURNAL/PEER REVIEWED JOURNAL)
Product Published Date: 02/01/2023
Record Last Revised: 01/03/2023
OMB Category: Other
Record ID: 356673