Final Report: Database and Tools for Investigation of Climate-Mediated Human Disease

EPA Grant Number: R832753
Title: Database and Tools for Investigation of Climate-Mediated Human Disease
Investigators: Smith, Mark S., Feied, Craig, Gillam, Michael, Handler, Jonathan
Institution: Washington Hospital Center
EPA Project Officer: Chung, Serena
Project Period: August 1, 2006 through July 31, 2010
Project Amount: $443,420
RFA: Decision Support Systems Involving Climate Change and Public Health (2005)
Research Category: Global Climate Change, Health Effects, Health, Climate Change


Project Climate Query (PCQ) succeeded in creating the first publicly available, real-time-updated single database that integrates clinical data with climatologic data.
The clinical data are emergency department (ED) visit data from Washington Hospital Center in Washington, DC, with 88,000 patient visits per year as of 2010. The clinical data fields used in this combined clinical/climatologic database are age, gender, date, chief complaint, disposition, and final diagnosis. The climatologic data are from the National Climatic Data Center and include fields such as temperature, precipitation, barometric pressure, and wind speed.
The combined dataset extends from 1996 to 2010 and is updated daily through automatic data feeds. It contains data from more than 1,000,000 emergency department encounters.
The PCQ clinical/climatologic database permits researchers to pose and answer questions about the relationship between the presentation of specific symptoms or diseases (e.g., shortness of breath, chest pain, and vomiting as symptoms; asthma, myocardial infarction, and stroke as diagnoses) and climatologic conditions (e.g., temperature extremes, precipitation, and barometric pressure).
The database is accessible via a custom-designed Web tool to preauthorized public health researchers, who can use it to pose and answer questions of research and public health interest. Sample questions that can be analyzed include the relationship between mortality and high temperatures, or the association of temporal clusters of heart attacks with various meteorological variables. Additionally, the database can be used to instantiate an existing scientific paper, that is, to rerun its analysis and determine whether its conclusions hold true using data from the PCQ database.

Summary/Accomplishments (Outputs/Outcomes):

Project Climate Query (PCQ) integrates climate data from the National Climatic Data Center (NCDC) with patient information from the Washington Hospital Center Emergency Department. The resulting conjoined database contains clinical information selected from numerous data fields that are routinely collected for each patient at Washington Hospital Center. These clinical data are combined with detailed weather information. The combined dataset provides a uniquely granular, comprehensive, and continuously growing resource that extends from 1996 to the present. By filtering on any combination of fields, we were able to analyze time series and clusters and explore correlations between clinical and climatological quantities. The database is updated daily through automatic data feeds of climate and patient information as new patients enter the system. These patient-level data are being made available to public health researchers who wish to explore the relationship between climatologic conditions and presentations of clinical illness.
The database is complemented with a Web-based analysis platform that facilitates exploration and analysis. The platform offers an integrated set of computational tools that allow for rapid and easy queries, statistical analysis and visualization of the data. These tools are implemented in a novel format called "RoughDraft," which allows for the integration of research articles with their underlying analysis into a unified dynamic scientific document.
The PCQ database was developed in several stages. First, a data core was created with parsers that provide a daily feed containing both patient and climate data. This data core was integrated with the existing Amalga infrastructure and made accessible via SQL server queries for further analysis.
A Matlab-based data-mining software system was written to query the data core and perform specified statistical analyses in an automated way. The resulting findings are reported in user-definable HTML templates. In addition to the built-in Matlab analytic and visual functionality, this tool was also interfaced to the S+ statistical package. Use of the Matlab package requires VPN access to the Washington Hospital Center network.
To make the data core available to a wider community of public health researchers and to provide a more seamless integration between data, analysis, and report generation, we then developed a framework that implements and extends the functionality of the Matlab toolset as a PHP class library.
A public Web platform was established based on the PHP class library. Access to this Web platform will be granted to qualified public health researchers, who can apply for login privileges on the website. Only de-identified data fields are available on the site. First and foremost, the platform offers access to the unified PCQ database, including the ability to browse the fields, filter and sort by certain criteria, define custom columns, and download the data for further analysis using third-party software. The platform is open to approved public health researchers, and is available at


Core Data Table

The first component of the PCQ database consists of the full, real-time dataset recorded for every patient who registered in the Washington Hospital Center Emergency Department and stored in the Amalga database. The information for each patient includes the full range of clinical data recorded. These patient data are complemented with a daily data feed from the National Climatic Data Center (NCDC).
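The pairing of patient records with daily weather can be pictured as a join on the calendar date. The following is a minimal sketch only; the field names (`visit_date`, `max_temp_c`, and so on) and the sample values are hypothetical and do not reflect the actual PCQ or Amalga schema.

```python
from datetime import date

# Hypothetical, de-identified ED visit records (field names are illustrative).
visits = [
    {"visit_date": date(2009, 7, 1), "age": 54, "gender": "M", "chief_complaint": "chest pain"},
    {"visit_date": date(2009, 7, 1), "age": 31, "gender": "F", "chief_complaint": "shortness of breath"},
    {"visit_date": date(2009, 7, 2), "age": 67, "gender": "F", "chief_complaint": "chest pain"},
]

# Hypothetical daily weather summaries keyed by date.
weather_by_date = {
    date(2009, 7, 1): {"max_temp_c": 31.0, "precip_mm": 0.0, "pressure_hpa": 1012.0},
    date(2009, 7, 2): {"max_temp_c": 34.5, "precip_mm": 2.5, "pressure_hpa": 1008.0},
}


def join_visits_with_weather(visits, weather_by_date):
    """Attach the same-day weather fields to each visit.

    Visits on days with no weather record are skipped.
    """
    combined = []
    for visit in visits:
        weather = weather_by_date.get(visit["visit_date"])
        if weather is None:
            continue
        combined.append({**visit, **weather})
    return combined


combined = join_visits_with_weather(visits, weather_by_date)
```

Each row of `combined` then carries both the clinical fields and the weather fields for the day of the visit, which is the shape the filtering and correlation steps operate on.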
The database is constructed to answer questions of the form: Is there a relationship between temperature and a complaint of chest pain, between barometric pressure and a diagnosis of myocardial infarction, or between rainfall and a chief complaint of shortness of breath?
By filtering on any combination of fields in the database, it is possible to study time series, profile correlations, temporal clusters, and more. Furthermore, it is possible to define derived quantities such as lagged averages (e.g., to study the effect of the temperature over the 3 days prior to hospital admission) or weather extremes (e.g., the highest temperature in a 24-hour interval). This flexibility allowed us to replicate, in one computational framework, a wide variety of studies that had previously been carried out in dedicated settings.
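The two kinds of derived quantities mentioned above can be sketched as follows. This is an illustrative example under stated assumptions, not the PCQ implementation: the function names and the temperature readings are hypothetical, and the lagged mean is taken over the days strictly before the day of admission.

```python
def lagged_mean(values, index, lag_days):
    """Mean of the `lag_days` values strictly before `index`.

    Returns None when there is not enough history.
    """
    if index < lag_days:
        return None
    window = values[index - lag_days:index]
    return sum(window) / lag_days


def interval_maximum(readings):
    """Highest reading in an interval, e.g. 24 hourly temperatures."""
    return max(readings)


# Hypothetical daily mean temperatures (degrees C) for five consecutive days.
daily_temp = [20.0, 22.0, 24.0, 30.0, 28.0]
lagged = [lagged_mean(daily_temp, i, 3) for i in range(len(daily_temp))]

# Hypothetical hourly temperatures for one 24-hour interval.
hourly_temp = [18.0 + (hour % 12) for hour in range(24)]
day_extreme = interval_maximum(hourly_temp)
```

The first three entries of `lagged` are `None` because fewer than three prior days exist; from the fourth day onward each entry summarizes the preceding three days.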
In addition to this core set of patient and climate data, users can define custom columns that are derived from the core data. The analysis can be abstracted from the individual patient level to a broader daily census level. A typical derived piece of data would be the number of patients who registered on a given day. Custom columns allow for the definition of a filter that counts only those visitors whose complaint or diagnosis matches a specific criterion.
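A custom column of this kind amounts to a filtered daily count. The sketch below assumes hypothetical field names (`visit_date`, `chief_complaint`) and made-up records; it only illustrates moving from the patient level to the daily census level.

```python
from collections import Counter
from datetime import date

# Hypothetical, de-identified visit records.
visits = [
    {"visit_date": date(2009, 7, 1), "chief_complaint": "Chest pain"},
    {"visit_date": date(2009, 7, 1), "chief_complaint": "Shortness of breath"},
    {"visit_date": date(2009, 7, 2), "chief_complaint": "chest pain, radiating"},
]


def daily_counts(visits, predicate=lambda visit: True):
    """Count visits per day, keeping only those that satisfy `predicate`."""
    counts = Counter()
    for visit in visits:
        if predicate(visit):
            counts[visit["visit_date"]] += 1
    return counts


# Daily census: all registrations per day.
total_per_day = daily_counts(visits)

# Custom column: only visits whose chief complaint mentions chest pain.
chest_pain_per_day = daily_counts(
    visits, lambda visit: "chest pain" in visit["chief_complaint"].lower()
)
```

Dividing `chest_pain_per_day` by `total_per_day` for each date gives a daily rate relative to total registrations.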

Analyses Carried Out

As a first application of the PCQ Matlab package, we implemented the statistical methodology known as Generalized Additive Models (GAMs).
A test article investigating the relationship between mortality and temperature (Curriero et al., 2002) was successfully instantiated using the GAMs methodology.
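For orientation, a GAM for daily counts is often written in the generic form below. The report does not state which covariates or smoothers were used, so the specific terms here (a smooth of temperature, a smooth of time for trend and season, and day-of-week indicators) are assumptions chosen only to show the structure of the model.

```latex
\log \mathbb{E}[Y_t] = \beta_0 + s_1(\mathrm{temp}_t) + s_2(t) + \sum_{k} \gamma_k \, \mathrm{dow}_{k,t}
```

Here $Y_t$ is the count of events on day $t$, $s_1$ and $s_2$ are smooth functions estimated from the data, and $\mathrm{dow}_{k,t}$ are day-of-week indicator variables.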
Applications based on the Web platform primarily focused on the exploration of climate triggers of ED visits. In addition to testing our system, the main objectives of this phase of the project were to explore possible associations between specific daily ED presentations and climate variables lagged in time, and to identify possible anomalies in the time series distribution of ED presentations.
In this part of the exploratory analysis, ambient climatic conditions lagged in time up to a week were correlated with total daily ED presentations. Cross correlation maps (Curriero et al., 2005) were used to assess and visualize these correlations, allowing the lagged climate variables to represent conditions on a single day or summarized over an interval of consecutive days.
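The idea behind a cross correlation map can be sketched as follows: for every pair of lags (i, j) with i <= j, the outcome on day t is correlated with the climate variable averaged over days t-j through t-i. This is a simplified illustration with made-up data, not the code used in the project; averaging (rather than some other summary) and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def cross_correlation_map(outcome, climate, max_lag):
    """Return {(i, j): r} for 0 <= i <= j <= max_lag.

    r correlates outcome[t] with the mean of climate[t-j .. t-i].
    Days without a full max_lag history are excluded so that every
    cell of the map uses the same set of days.
    """
    ccm = {}
    for i in range(max_lag + 1):
        for j in range(i, max_lag + 1):
            summaries, outcomes = [], []
            for t in range(max_lag, len(outcome)):
                window = climate[t - j:t - i + 1]
                summaries.append(sum(window) / len(window))
                outcomes.append(outcome[t])
            ccm[(i, j)] = pearson(summaries, outcomes)
    return ccm


# Made-up series: the outcome follows the climate variable with a 2-day lag.
climate = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
outcome = [12, 14] + [10 + 2 * climate[t - 2] for t in range(2, len(climate))]

ccm = cross_correlation_map(outcome, climate, max_lag=3)
```

Cells on the diagonal (i equal to j) correspond to a single lagged day, and off-diagonal cells correspond to intervals of consecutive days. In this constructed example the cell (2, 2) recovers the built-in 2-day lag.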
Results from the correlation analysis performed on all ED outcomes and various climate variables did not yield any significant associations. 
A further analysis focused on identifying a temporal cluster: a set of consecutive days in which the combined rate for a specific ED presentation (with rates again defined as counts relative to total admissions) is significantly higher than in the rest of the time series. We employed the software SaTScan. Statistically significant temporal clusters were found for the incidence of both heart attack and chest pain.
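The general idea of a purely temporal scan can be sketched as below. This is not SaTScan and not the project's configuration; it is a simplified illustration that assumes a Poisson-type likelihood ratio with expected counts proportional to total admissions, a maximum window length chosen by the user, and Monte Carlo replications for significance. The data are made up, and every day is assumed to have at least one admission.

```python
import math
import random


def window_llr(observed, expected, total_cases):
    """Poisson log likelihood ratio for one window (0 if no excess)."""
    if observed <= expected:
        return 0.0
    llr = observed * math.log(observed / expected)
    if total_cases > observed:
        llr += (total_cases - observed) * math.log(
            (total_cases - observed) / (total_cases - expected)
        )
    return llr


def best_temporal_cluster(cases, admissions, max_len):
    """Return (llr, (start, end)) for the most likely window of consecutive days."""
    total_cases = sum(cases)
    total_admissions = sum(admissions)
    best = (0.0, None)
    for start in range(len(cases)):
        for end in range(start, min(start + max_len, len(cases))):
            observed = sum(cases[start:end + 1])
            expected = total_cases * sum(admissions[start:end + 1]) / total_admissions
            llr = window_llr(observed, expected, total_cases)
            if llr > best[0]:
                best = (llr, (start, end))
    return best


def monte_carlo_p_value(cases, admissions, max_len, n_sim=199, seed=0):
    """Share of null replications whose best LLR reaches the observed one."""
    rng = random.Random(seed)
    observed_llr, _ = best_temporal_cluster(cases, admissions, max_len)
    days = range(len(cases))
    exceed = 0
    for _ in range(n_sim):
        simulated = [0] * len(cases)
        # Under the null, cases fall on days in proportion to admissions.
        for day in rng.choices(days, weights=admissions, k=sum(cases)):
            simulated[day] += 1
        if best_temporal_cluster(simulated, admissions, max_len)[0] >= observed_llr:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Made-up example: 14 days with an excess of cases on days 6 through 8.
admissions = [200] * 14
cases = [4, 5, 3, 4, 5, 4, 12, 14, 13, 4, 5, 3, 4, 5]

llr, window = best_temporal_cluster(cases, admissions, max_len=7)
p_value = monte_carlo_p_value(cases, admissions, max_len=7)
```

In this constructed series the scan selects days 6 through 8 (zero-based) as the most likely cluster. The p-value is the rank of the observed statistic among the simulated ones.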

Journal Articles:

No journal articles were submitted with this report. Three publications are listed for this project.

Supplemental Keywords:

RFA, Health, Scientific Discipline, Air, Health Risk Assessment, climate change, Air Pollution Effects, Risk Assessments, Environmental Monitoring, Ecological Risk Assessment, Atmosphere, air quality modeling, ecosystem models, decision making database tool, climatic influence, Project Sentinel, modeling, climate models, demographics, human exposure, regional climate model, ambient air pollution, Global Climate Change

Relevant Websites:

The presentation and poster files can be downloaded from the project web page http://www.projectclimatequery.org.

Progress and Final Reports:

  • Original Abstract
  • 2007 Progress Report
  • 2008 Progress Report
  • 2009 Progress Report