Grantee Research Project Results
1999 Progress Report: Using Multilevel Statistical Models to Address Representativeness and Data at Different Spatial and Temporal Scales
EPA Grant Number: R826763
Title: Using Multilevel Statistical Models to Address Representativeness and Data at Different Spatial and Temporal Scales
Investigators: Berk, Richard; DeLeeuw, Jan; Ambrose, Richard; Turco, Richard; Gould, Robert
Current Investigators: Berk, Richard; Ambrose, Richard
Institution: University of California - Los Angeles
EPA Project Officer: Hahn, Intaek
Project Period: October 1, 1998 through September 30, 2000
Project Period Covered by this Report: October 1, 1998 through September 30, 1999
Project Amount: $414,149
RFA: Regional Scale Analysis and Assessment (1998)
Research Category: Aquatic Ecosystems, Ecological Indicators/Assessment/Restoration
Objective:
A wide variety of software can estimate multilevel linear models. As discussed in our proposal, these models may be used to explore the generalizability of findings from case studies and convenience samples. For example, one may address explicitly whether an association found at one site is also found at other sites. If it is not, one may then consider what it is about the sites that might explain why the association is not a general one.
The goal of this research is to broaden multilevel linear models to cover a much wider range of data structures likely to be used by practicing environmental scientists. We proposed to do this within Xlisp-Stat because it is a very powerful statistical platform and is free of charge.
Progress Summary:
In our proposal, we described five extensions of the basic linear model we intended to implement: (1) multiple response variables, (2) nonlinear functional forms, (3) missing data, (4) disturbance covariance matrices allowing for temporal and spatial dependencies, and (5) latent variables. To do this, we also had to provide for a wider range of estimation algorithms. To date, we have added: (1) a restricted maximum likelihood estimation algorithm, (2) the nonlinear functional forms associated with the generalized linear model, and (3) a way to handle temporal dependence among the disturbances. We are still working on ways to address spatial dependence, missing data, generally nonlinear functional forms, and latent variables.
As an example of progress to date, consider the following, where $i = 1, \ldots, N$ denotes level-2 units and $j = 1, \ldots, n_i$ denotes level-1 units nested within the $i$th level-2 unit:
$$Y_{ij} \sim \text{Binomial}(\pi_{ij}, 1) \qquad (1)$$
$$\pi_i = \left(1 + \exp\left(-(W_i \alpha + X_i \beta_i)\right)\right)^{-1}$$
$$\beta_i \sim N(0, \Sigma)$$
where $Y_{ij}$ is the $j$th level-1 response for the $i$th level-2 unit, $\pi_i$ is the $n_i$ by 1 vector with $j$th element $\pi_{ij}$, $W_i$ is the $i$th level-2 unit's design matrix for the fixed effects, $X_i$ is the $i$th level-2 unit's design matrix for the random effects, $\alpha$ is a vector of fixed effects, $\beta_i$ is a $p$ by 1 vector of normally distributed random effects, and $\Sigma$ is a $p$ by $p$ covariance matrix. Let $Y_i = [Y_{i1}, \ldots, Y_{in_i}]^{\top}$ be the $n_i$ by 1 vector of responses for the $i$th level-2 unit.
This is little more than the usual formulation for logistic regression allowing for data at two levels, such as observations characterizing what is going on within given wetlands and observations characterizing differences and similarities between wetlands. For example, one might study for a set of wetlands the relationship between a form of pollution and species diversity, and then how the relationships found may vary as a function of how fast water flows through the area. The technical point is that logistic regression, as contained within the generalized linear model, is an illustration of hierarchical models beyond the usual linear regression format.
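To make the two-level logistic structure concrete, the following minimal sketch simulates data of this form. It is an illustration only, not the project's Xlisp-Stat code: the function names, the single level-1 covariate, and the restriction to a random intercept (so the random-effects covariance matrix reduces to one variance) are all assumptions made for the example.

```python
import math
import random


def inverse_logit(eta):
    """Logistic inverse link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_two_level_logit(n_sites, n_per_site, fixed, sd_intercept, seed=0):
    """Simulate binary responses for level-1 units nested within level-2 sites.

    fixed is (intercept, slope) for the fixed effects. Each site receives its
    own normally distributed random intercept with standard deviation
    sd_intercept. All names here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_sites):
        site_effect = rng.gauss(0.0, sd_intercept)   # level-2 random effect
        for j in range(n_per_site):
            x = rng.uniform(-1.0, 1.0)               # hypothetical level-1 covariate
            eta = fixed[0] + fixed[1] * x + site_effect  # fixed part + random part
            pi_ij = inverse_logit(eta)
            y = 1 if rng.random() < pi_ij else 0     # a single Bernoulli trial
            rows.append((i, j, x, pi_ij, y))
    return rows
```

In the wetlands example, each site would be a wetland, `x` a pollution measure, and `y` a binary indicator; a larger `sd_intercept` means wetlands differ more in their baseline probability.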
Now consider the following more general model in the context of the generalized linear multilevel model.
$$Y_i = f_i(W_i \alpha + X_i \beta_i) + \varepsilon_i$$
We require the marginal mean and covariance of the observed data to match those implied by the model. Therefore, the function $f_i$ depends on the link, and the distribution and/or structure of the disturbance $\varepsilon_i$ depends on the distribution of the data. The right-hand side above is nonlinear in both the fixed effects $\alpha$ and the random effects $\beta_i$, which means standard linear estimation methods cannot be applied directly. Instead, the right-hand side of the model is linearized using a Taylor expansion. We use a first-order expansion of the fixed part, $W_i \alpha$, and a second-order expansion of the random part, $X_i \beta_i$. Estimation based on this particular expansion is referred to as second-order marginal quasi-likelihood (MQL). After linearization and some algebra, we arrive at a linear model with a transformed response and design matrix. With a linear model, estimates may be obtained using Iterative Generalized Least Squares (IGLS). This algorithm alternates between updating the fixed effects $\alpha$ and updating the variance parameters; generalized least squares is used for each update.
The practical point is that we can now properly analyze data with responses that are categorical (e.g., a taxonomy of wetland characteristics), ordinal (e.g., a rank order of species diversity), or counts (e.g., of species), as well as the usual equal-interval responses common in linear regression. We do this within a hierarchical framework that facilitates within-site and between-site analyses simultaneously.
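The alternating scheme can be sketched for the simplest case, a linear two-level model with a random intercept. This is a hedged illustration in Python with NumPy, not the project's Xlisp-Stat implementation: the function name, the starting values, the fixed iteration count, and the use of plain (unrestricted) maximum likelihood are assumptions, and the Taylor linearization step needed for nonlinear links is omitted.

```python
import numpy as np


def igls_random_intercept(y, X, groups, n_iter=20):
    """Sketch of IGLS for y_ij = x_ij' a + u_i + e_ij.

    Var(u_i) = s2_u (between sites), Var(e_ij) = s2_e (within sites).
    Names and starting values are hypothetical.
    """
    ids = np.unique(groups)
    p = X.shape[1]
    s2_u, s2_e = 0.0, float(np.var(y))     # crude starting values
    for _ in range(n_iter):
        # Step 1: GLS update of the fixed effects given current variances.
        A = np.zeros((p, p))
        b = np.zeros(p)
        blocks = []
        for g in ids:
            mask = groups == g
            Xi, yi = X[mask], y[mask]
            n = len(yi)
            V_inv = np.linalg.inv(s2_u * np.ones((n, n)) + s2_e * np.eye(n))
            A += Xi.T @ V_inv @ Xi
            b += Xi.T @ V_inv @ yi
            blocks.append((Xi, yi, V_inv))
        fixed = np.linalg.solve(A, b)
        # Step 2: GLS update of the variances, regressing residual
        # cross-products on the design matrices for s2_u and s2_e.
        M = np.zeros((2, 2))
        c = np.zeros(2)
        for Xi, yi, V_inv in blocks:
            r = yi - Xi @ fixed
            n = len(r)
            Z = [np.ones((n, n)), np.eye(n)]   # dV/ds2_u and dV/ds2_e
            VZ = [V_inv @ Zk for Zk in Z]
            Vr = V_inv @ r
            for k in range(2):
                c[k] += Vr @ Z[k] @ Vr
                for l in range(2):
                    M[k, l] += np.trace(VZ[k] @ VZ[l])
        s2_u, s2_e = np.linalg.solve(M, c)
    return fixed, s2_u, s2_e
```

For a generalized linear multilevel model, the same two steps would be applied to the transformed response and design matrix produced by the linearization, with the expansion recomputed between passes.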
Our software also now has the ability to estimate the parameters of multilevel models with disturbances that possess an AR(1), MA(1), or ARMA(1,1) form. Thus, if the data within sites are time series and, as a result, one is concerned about temporal dependence, three of the most common models for that dependence can now be exploited.
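As an illustration of what these three disturbance structures imply, the sketch below builds the within-site covariance matrix for an ARMA(1,1) process, which reduces to AR(1) when the moving-average coefficient is zero and to MA(1) when the autoregressive coefficient is zero. It uses the standard textbook autocovariance formulas; the function and parameter names are hypothetical and do not come from the project's software.

```python
def arma11_autocovariances(n, phi=0.0, theta=0.0, sigma2=1.0):
    """Autocovariances at lags 0..n-1 of e_t = phi*e_{t-1} + a_t + theta*a_{t-1},
    where a_t is white noise with variance sigma2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if abs(phi) >= 1.0:
        raise ValueError("stationarity requires |phi| < 1")
    gamma = [0.0] * n
    gamma[0] = sigma2 * (1.0 + 2.0 * phi * theta + theta ** 2) / (1.0 - phi ** 2)
    if n > 1:
        gamma[1] = sigma2 * (1.0 + phi * theta) * (phi + theta) / (1.0 - phi ** 2)
    for k in range(2, n):
        gamma[k] = phi * gamma[k - 1]      # geometric decay beyond lag 1
    return gamma


def disturbance_covariance(n, phi=0.0, theta=0.0, sigma2=1.0):
    """n-by-n covariance matrix for n equally spaced observations at one site."""
    gamma = arma11_autocovariances(n, phi, theta, sigma2)
    return [[gamma[abs(s - t)] for t in range(n)] for s in range(n)]
```

Calling `disturbance_covariance(n, phi=0.5)` gives an AR(1) matrix and `disturbance_covariance(n, theta=0.5)` an MA(1) matrix; in a multilevel fit, a matrix of this kind would replace the independent-errors block for each site's time series.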
Future Activities:
We will proceed as described in our original proposal, with illustrative data analyses beginning in the summer of 2000. We also anticipate working on a presentation/publication late in 2000.
Journal Articles:
No journal articles submitted with this report.
Supplemental Keywords:
statistical inference, external validity, hierarchical models, modeling, RFA, Economic, Social, & Behavioral Science Research Program, Ecosystem Protection/Environmental Exposure & Risk, Regional/Scaling, Environmental Statistics, data synthesis, regional environmental data, risk assessment, non-linear functional forms, ecosystem assessment, representativeness studies, multiple response variables, survey data, environmental risks, multilevel statistical model, hierarchical statistical inference, satellite data, statistical models, regional scale impacts, data analysis, spatial-temporal methods, spatial and temporal scales, representativeness, multiple response variable, data models, hierarchical statistical analysis, innovative statistical models, regional survey data, remotely sensed data, statistical methods
The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.