Science Inventory

Predicting compound amenability with liquid chromatography-mass spectrometry to improve non-targeted analysis

Citation:

Lowe, C., K. Isaacs, A. McEachran, C. Grulke, J. Sobus, E. Ulrich, A. Richard, A. Chao, J. Wambaugh, AND A. Williams. Predicting compound amenability with liquid chromatography-mass spectrometry to improve non-targeted analysis. Analytical and Bioanalytical Chemistry. Springer, New York, NY, 413(30):7495-7508, (2021). https://doi.org/10.1007/s00216-021-03713-w

Impact/Purpose:

Whereas targeted analytical methods can be used to determine the presence and concentration of small numbers of chemicals (on the order of tens to hundreds) in a given sample, this approach is not feasible for comprehensive chemical analysis. Two alternative techniques, known as non-targeted analysis (NTA) and suspect screening analysis (SSA), provide a means to address such a need. NTA uses high-resolution mass spectrometry (HRMS) to deduce the identity of unknown or understudied compounds without the use of chemical standards or chemical suspect lists. Similarly, SSA uses HRMS to tentatively identify chemicals in samples of interest using lists of chemical suspects and, in many cases, supporting data (e.g., reference spectra). Non-targeted and suspect screening analyses are commonly performed using a chromatograph in tandem with a mass spectrometer. Both gas and liquid chromatography have been successfully used to aid in the characterization of large numbers of small molecules in various media. However, neither approach on its own is capable of determining the entire chemical composition of a sample, as some chemicals are not amenable to specific methods, ionization techniques, etc. In a recent evaluation of NTA method performance (part of the Environmental Protection Agency’s [EPA’s] Non-Targeted Analysis Collaborative Trial, ENTACT), 1,269 diverse chemical substances were analyzed using multiple LC-MS methods, with up to 40% noted as being unamenable to detection and/or identification. Considering this result, there would be a clear benefit to having models that can accurately predict the amenability of compounds in LC-MS experiments to aid in the interpretation of positive (compound reported as present) and negative (compound not reported as present) findings. Having this predictive capability could also reduce the time and resource costs associated with analyzing unamenable compounds.
Herein, we investigate the application of Quantitative Structure-Activity Relationship (QSAR) modeling, where “activity” is defined in this case as “amenability to detection with LC-MS”. Specifically, we collected a large dataset (6,342 representatives) of chemicals with known LC-MS amenability, represented the chemicals using PaDEL molecular descriptors, and built random forest models to predict the LC-MS amenability of compounds for detection using both positive and negative modes of an electrospray ionization source. Model predictivity was evaluated using statistics from Y-randomization, five-fold cross validation (CV), and external validation sets. An applicability domain was defined using the class probability estimates from each of the random forest models. These models provide a new technique to add weight-of-evidence when selecting and eliminating tentative chemical identities in NTA and/or SSA experiments. Whereas no model will likely ever predict the amenability of all (>10^60) organic molecules, the models presented here attempt to predict within the subspace of compounds commonly identified in environmental analysis.
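The idea of defining an applicability domain from class probability estimates can be illustrated with a minimal sketch. The record does not state the rule the authors used, so everything below is an assumption made for illustration: the function name `screen_predictions`, the 0.5 decision cutoff, and the 0.7 confidence threshold are all hypothetical. The sketch treats a prediction as inside the domain only when the probability of the winning class reaches the threshold.

```python
from typing import List, Sequence, Tuple


def screen_predictions(
    prob_amenable: Sequence[float],
    threshold: float = 0.7,  # hypothetical confidence threshold, not from the source
) -> List[Tuple[str, bool]]:
    """Label each compound and flag whether the prediction is in-domain.

    prob_amenable holds, for each compound, a classifier's estimated
    probability of the "amenable" class (for example, the fraction of
    trees in a random forest voting for that class).
    """
    results = []
    for p in prob_amenable:
        # Assumed decision rule: the class with the higher probability wins.
        label = "amenable" if p >= 0.5 else "not amenable"
        # Assumed domain rule: the winning class must reach the threshold.
        in_domain = max(p, 1.0 - p) >= threshold
        results.append((label, in_domain))
    return results


if __name__ == "__main__":
    # Illustrative probabilities only; these are not data from the study.
    for label, in_domain in screen_predictions([0.9, 0.55, 0.1]):
        print(label, "in domain" if in_domain else "outside domain")
```

Under these assumptions, a compound with a probability near 0.5 receives a label but is flagged as outside the domain, so its predicted amenability would carry little weight when confirming or eliminating a tentative identification.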

Description:

With the increasing availability of high-resolution mass spectrometers, suspect screening and non-targeted analysis are becoming popular compound identification tools for environmental researchers. Samples of interest often contain a large (unknown) number of chemicals spanning the detectable mass range of the instrument. In an effort to separate these chemicals prior to injection into the mass spectrometer, a chromatography method is often utilized. There are numerous types of gas and liquid chromatographs that can be coupled to commercially available mass spectrometers. Depending on the type of instrument used for analysis, the researcher is likely to observe a different subset of compounds based on the amenability of those chemicals to the selected experimental techniques and equipment. It would be advantageous if this subset of chemicals could be predicted prior to conducting the experiment, in order to minimize potential false-positive and false-negative identifications. In this work, we utilize experimental datasets to predict the amenability of chemical compounds to detection with liquid chromatography-electrospray ionization-mass spectrometry (LC–ESI–MS). The assembled dataset totals 5517 unique chemicals either explicitly detected or not detected with LC–ESI–MS. The resulting detected/not-detected matrix has been modeled using specific molecular descriptors to predict which chemicals are amenable to LC–ESI–MS, and to which form(s) of ionization. Random forest models, including a measure of the applicability domain of the model for both positive and negative modes of the electrospray ionization source, were successfully developed. The outcome of this work will help to inform future suspect screening and non-targeted analyses of chemicals by better defining the potential LC–ESI–MS detectable chemical landscape of interest.

Record Details:

Record Type: DOCUMENT (JOURNAL/PEER REVIEWED JOURNAL)
Product Published Date: 12/01/2021
Record Last Revised: 08/29/2022
OMB Category: Other
Record ID: 355553