Science Inventory

A New Publicly Available Chemical Query Language, CSRML, to support Chemotype Representations for Application to Data-Mining and Modeling

Citation:

Yang, C., A. Tarkhov, J. Marusczyk, B. Bienfait, J. Gasteiger, T. Kleinoeder, T. Magdziarz, O. Sacher, C. Schwab, J. Schwoebel, L. Terfloth, K. Arvidson, A. Richard, A. Worth, AND J. Rathman. A New Publicly Available Chemical Query Language, CSRML, to support Chemotype Representations for Application to Data-Mining and Modeling. Journal of Chemical Information and Modeling. American Chemical Society, Washington, DC, 55(3):510-528, (2015).

Impact/Purpose:

This paper details the specifications for a new XML-based query language, CSRML, and the publicly available resources associated with it (ToxPrint chemotypes, ChemoTyper, KNIME node). These resources facilitate, and offer many advantages for, data mining applications, profiling, and the development of predictive models linking chemistry to bioassay results, such as those being generated in the ToxCast and Tox21 programs.

Description:

A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transformation (e.g., SMIRKS, reaction SMILES) queries currently in use. Chemotypes, a term used to represent advanced CSRML queries intended for repeated application, can be encoded not only with connectivity and topology, but also with properties of atoms, bonds, electronic systems, or molecules. The CSRML language has been developed in parallel with a public set of chemotypes, the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory, and commercial-use chemical space, as well as to represent features and frameworks believed to be especially relevant to toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation, so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML standard used in CSRML-based chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge.

Record Details:

Record Type: DOCUMENT (JOURNAL/PEER REVIEWED JOURNAL)
Product Published Date: 03/23/2015
Record Last Revised: 03/23/2015
OMB Category: Other
Record ID: 307132