EPA Science Inventory

Building associations between markers of environmental stressors and adverse human health impacts using frequent itemset mining

Citation:

Bell, S. AND S. Edwards. Building associations between markers of environmental stressors and adverse human health impacts using frequent itemset mining. Presented at SIAM International Conference on Data Mining, Philadelphia, PA, April 24 - 26, 2014.

Description:

Building associations between markers of exposure and effect using frequent itemset mining

The human-health impact of environmental contaminant exposures is unclear. While some exposure-effect relationships are well studied, health effects are unknown for the vast majority of the more than 83,000 chemicals in commerce. This creates challenges for manufacturers, regulators, and consumers trying to balance industrial needs against a complex landscape of health susceptibilities and exposures. The National Health and Nutrition Examination Survey (NHANES), a large-scale epidemiological survey aimed at determining the prevalence and risk factors of major diseases, is increasingly used to postulate relationships among chemicals and adverse health effects in the U.S. population. The interpretation of these studies is complicated, however, by the ad hoc data mining approaches typically employed. Here we describe the use of frequent itemset mining for identifying exposure-health associations in NHANES. From 9440 discretized samples, 983 two-itemset rules were generated describing associations between markers of health and environmental exposure (lift > 1, response threshold > 97.5th quantile). Odds ratios for the rules enable use of network approaches to facilitate knowledge discovery and hypothesis development, as well as comparison to traditional regression analyses. A case study using parathyroid hormone levels demonstrates how association rules can be used in data mining. This study demonstrates how rules can facilitate hypothesis development and improve traditional regression models by identifying potentially confounding variables even in the presence of missing information. Our approach is designed to enable effective knowledge discovery of potential health impacts of environmental chemicals by facilitating comprehensive data mining and meta-analysis of the NHANES dataset. In the long term, our representation of the information allows for integration with other disparate data, such as known biological pathways, to address the current data gaps.

The views expressed in this abstract are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency.
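As an illustration of the rule-generation step described in the abstract above, the following is a minimal sketch of how two-itemset rules with lift and odds ratios might be computed from samples that have already been discretized into sets of flagged items. It is not the authors' implementation: the function name pair_rules, the item labels, the toy samples, and the 0.5 continuity correction applied to the odds ratio are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_lift=1.0):
    """Return two-itemset rules whose lift exceeds min_lift.

    Each transaction is a set of item labels that were flagged for one
    sample (for example, a marker exceeding its quantile threshold).
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), n_ab in pair_counts.items():
        support = n_ab / n
        # Lift: observed co-occurrence relative to independence.
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        if lift <= min_lift:
            continue
        # Cells of the 2x2 contingency table for items a and b.
        n_a_only = item_counts[a] - n_ab
        n_b_only = item_counts[b] - n_ab
        n_neither = n - n_ab - n_a_only - n_b_only
        # Assumed choice: add 0.5 to every cell so that empty cells
        # do not cause a division by zero.
        odds_ratio = ((n_ab + 0.5) * (n_neither + 0.5)) / (
            (n_a_only + 0.5) * (n_b_only + 0.5)
        )
        rules.append(
            {"items": (a, b), "support": support,
             "lift": lift, "odds_ratio": odds_ratio}
        )
    return rules


# Hypothetical toy data, not drawn from NHANES.
samples = [
    {"exposure_x_high", "pth_high"},
    {"exposure_x_high", "pth_high"},
    {"exposure_x_high"},
    {"pth_high"},
    set(), set(), set(), set(),
]
rules = pair_rules(samples)
```

For these eight toy samples the single rule has a lift of about 1.78 and a corrected odds ratio of 5.0. Missing measurements are not modeled here; a sample simply lacks the corresponding item.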

Purpose/Objective:

This abstract will be presented at the Society for Industrial and Applied Mathematics (SIAM) International Conference on Data Mining (SDM14), April 24-26, 2014, in Philadelphia, PA. This study demonstrates how rules can facilitate hypothesis development and improve traditional regression models by identifying potentially confounding variables even in the presence of missing information. Our approach is designed to enable effective knowledge discovery of potential health impacts of environmental chemicals by facilitating comprehensive data mining and meta-analysis of the NHANES dataset.

URLs/Downloads:

ISTD-STICS-13-064-SIAMDM14ABSTRACT-FINAL.DOCX

Record Details:

Record Type: DOCUMENT (PRESENTATION/ABSTRACT)
Completion Date: 05/20/2014
Record Last Revised: 05/20/2014
Record Created: 05/20/2014
Record Released: 05/20/2014
OMB Category: Other
Record ID: 276255

Organization:

U.S. ENVIRONMENTAL PROTECTION AGENCY

OFFICE OF RESEARCH AND DEVELOPMENT

NATIONAL HEALTH AND ENVIRONMENTAL EFFECTS RESEARCH LAB

INTEGRATED SYSTEMS TOXICOLOGY DIVISION

SYSTEMS BIOLOGY BRANCH