Science Inventory

The limitations of the InChI and InChIKey for substance databases

Citation:

Grulke, Chris, A. Jacobs, A. Richard, AND A. Williams. The limitations of the InChI and InChIKey for substance databases. ACS, San Francisco, California, August 16 - 20, 2020.

Impact/Purpose:

Chemical curation is an expert-driven, manually intensive task intended to ensure the internal consistency and quality of the linkages between chemical identifiers, the source lists and data they are associated with, and chemical structures. Accurate structure representations, in turn, are essential underpinnings for all chemical structure-based models and methods used to predict properties and attributes of chemicals with limited available data. Collection and assignment of structures from public and commercial sources is typically replicated in many different organizations, often by non-cheminformatics experts, with results of variable quality. This work supports the development of a more robust cheminformatic infrastructure to account for complex substances and to expand the chemical space of DSSTox.

Description:

InChI has become a canonical identifier for the communication of well-defined chemicals. This has greatly simplified the linking and aggregation of most content in public structure-centric chemical databases. It has made inconsistencies in those public databases far easier to identify and has enabled the clear dissemination of chemical linked data. However, when attempting to document substances, there are many areas of chemistry that InChI cannot handle effectively. Substances with partially defined structures, complex chiral chemicals, inorganics, nanoparticles, and coordination complexes all point to limitations in the use of InChI and InChIKey for communicating chemical concepts with the level of detail necessary to enable research. These limitations, though, are not unique to InChI; they apply to most formats used to communicate chemical concepts. Examples of chemicals in common use and under ongoing research that InChI cannot support are plentiful. Until technological advances in chemical storage and canonicalization enable InChI to serve as an effective identifier for all substances of interest, substance registries are a vital bridge for communicating about the edges of chemistry that are currently not well supported.

URLs/Downloads:

GRULKE_ABSTRACT_ACS.DOCX

Record Details:

Record Type: DOCUMENT (PRESENTATION/ABSTRACT)
Product Published Date: 05/26/2020
Record Last Revised: 06/03/2020
OMB Category: Other
Record ID: 349014