Science Inventory

INTEGRATED CHEMICAL INFORMATION TECHNOLOGIES APPLIED TO TOXICOLOGY

Impact/Purpose:

In the area of improving data resources for structure-based mining, the NCCT is supporting further development and expansion of the DSSTox (Distributed Structure-Searchable Toxicity) database network. The DSSTox project focuses on migrating toxicity data from diverse areas of study into structure-annotated, standardized form for use in relational structure-based searching and structure-activity model development. An essential element of this effort is bridging understanding, and forging productive linkages, between toxicology domain experts and data users and modelers by clarifying the chemistry content and summary presentation of the toxicology data. The larger goal is to overcome the inherent data constraints that limit focused domains of toxicological study (e.g., cancer, developmental toxicity, neurotoxicity) by expanding the searchable and mineable data network across both chemical and biological domains. As an extension of the DSSTox project, NCCT researchers are promoting the adoption of standardized chemical structure data fields for public toxicogenomics datasets, both to enable broader searchability across these data domains and to enable integration of these datasets with legacy toxicity data and other public data. In particular, a collaboration between the DSSTox project and the NIEHS Chemical Effects in Biological Systems (CEBS) project is working toward incorporating DSSTox data fields and providing structure-searching capability and linkages across CEBS data, public genomics data, DSSTox, and National Toxicology Program legacy toxicity databases. Chemical structure and genomic expression patterns provide common metrics for exploring diverse toxicological effects, and can provide the basis for developing predictive patterns, or signatures, of a toxicological effect.
Similarly, biological activity profiles consisting of experimentally determined or computationally predicted interaction spectra (receptors, proteins, enzymes) can be viewed as expanded “properties” of a chemical and could augment structure-based information to enhance toxicity classification and prediction algorithms. Finally, NCCT researchers are taking a lead in addressing the more fundamental need to migrate older paper legacy data (such as that held within EPA Program Offices, including OPP and OPPT) into electronic form suitable for incorporation into standardized, searchable relational databases. New commercial technologies from IBM, SciTegic, and others that allow more automated structure-annotation, chemical indexing, and retrieval procedures are being evaluated to facilitate efficient electronic conversion and structured content-annotation of legacy EPA data. In addition, related issues of chemical information quality control are being addressed, and Agency-wide chemical structure-browser capabilities are being explored.
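The cross-dataset linkage described above rests on each record carrying the same standardized chemical identifier fields. A minimal sketch of that idea follows; the field names (CASRN, SMILES, StudyID) and all record values are illustrative placeholders, not the actual DSSTox or CEBS schema:

```python
# Illustrative sketch (not the actual DSSTox/CEBS schema): two datasets, each
# annotated with a shared standardized chemical identifier field ("CASRN"),
# can be linked by indexing one dataset on that field and joining.

legacy_tox = [
    {"CASRN": "50-00-0", "SMILES": "C=O", "ChemicalName": "Formaldehyde"},
    {"CASRN": "71-43-2", "SMILES": "c1ccccc1", "ChemicalName": "Benzene"},
]

genomics = [
    {"CASRN": "71-43-2", "StudyID": "GEO-0001"},  # hypothetical study IDs
    {"CASRN": "71-43-2", "StudyID": "GEO-0002"},
]

def link_by_casrn(tox_records, omics_records):
    """Attach matching omics records to each toxicity record via CASRN."""
    index = {}
    for rec in omics_records:
        index.setdefault(rec["CASRN"], []).append(rec)
    return [
        {**tox, "LinkedStudies": index.get(tox["CASRN"], [])}
        for tox in tox_records
    ]

linked = link_by_casrn(legacy_tox, genomics)
```

Without an agreed identifier field, this join is impossible, which is why standardized chemical annotation is the precondition for searching across toxicity and "omics" data domains.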

Description:

A central regulatory mandate of the Environmental Protection Agency, spanning many Program Offices and issues, is to assess the potential health and environmental risks of large numbers of chemicals released into the environment, often in the absence of relevant test data. Models that predict potential adverse effects of chemicals primarily from chemical structure play a central role in prioritization and screening strategies, yet they are highly dependent on the data used to develop them. Hence, limits on data quantity, quality, and availability are considered by many to be the largest hurdles to improving prediction models in diverse areas of toxicology. Generation of new toxicity data for additional chemicals and endpoints, development of new high-throughput, mechanistically relevant bioassays, and increased generation of genomics and proteomics data that can clarify relevant mechanisms will all play important roles in improving future SAR prediction models. The potential for much greater immediate gains, across large domains of chemical and toxicity space, lies in maximizing the ability to mine and model useful information from existing toxicity data, which represent a huge past investment in research and testing. In addition, placing newer “omics” data, which potentially span many domains of toxicological effects, in the broader context of historical data is the means for optimizing the value of these new data.

The challenges for applying information technologies, including cheminformatics and bioinformatics, are fourfold: 1) to more efficiently migrate legacy toxicity data from diverse sources into standardized, electronic, open, and searchable forms in the public domain; 2) to employ new technologies to mine existing data for coherent patterns that can provide scientific underpinning for extrapolations; 3) to place a new chemical of unknown hazard appropriately in the context of existing data and chemical and biological understanding; and 4) to integrate data from different domains of toxicology and newer “omics” experiments to look beyond traditional means of classifying chemicals, inferring modes of action, and predicting potential adverse effects.

Record Details:

Record Type: PROJECT
Projected Completion Date: 09/30/2008
OMB Category: Other
Record ID: 149112