USE OF GIS IN ANALYZING ENVIRONMENTAL CANCER RISKS AS A FUNCTION OF GEOGRAPHIC SCALE

Mid-semester Report

 

By

Zheng Cai

 

The overall goal of the research project is to provide an estimate of the arsenic concentration in groundwater for every residence that each subject has occupied over the course of his or her lifetime.

 

During the first few weeks, I studied several articles on spatial analysis and learned from Dr. Myers about several software packages used for spatial analysis, as well as the theories behind them. Each package has its own strengths and weaknesses. We will need to choose the “best” one to use in the research.

 

The vast majority of effort on any GIS project generally involves data acquisition and preparing the data for analysis in the GIS. So far, I have been preparing water quality data from a variety of sources for geostatistical and geospatial analysis. These data have required a significant amount of processing before they can be imported into a GIS for exploratory spatial data analysis, deterministic spatial modeling and geostatistical analysis. I transferred the raw data from the different sources into Excel and then converted them into SPSS files. We are trying to keep the data in a format that will be convenient for the analyses in the next steps.

 

The next steps will provide further insight into the nature of arsenic in groundwater and the potential for human exposure. I will move on to exploratory spatial data analysis, deterministic spatial modeling and geostatistical modeling, as follows:

 

1.      Exploratory Spatial Data Analysis (ESDA)

Compile summary measures of central tendency and dispersion for each county in Arizona. Where the data permit, I will also summarize these measures for certain communities within counties. We will need univariate and bivariate descriptive statistics by county, community and well. Some of these analyses will require further use of SPSS. The analyses will focus on arsenic, along with other variables that may correlate with it, such as well depth and other contaminants.
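
As a rough illustration of the county-level summaries described above, the following Python sketch groups well records and computes measures of central tendency and dispersion for each group. It is only a sketch: the function name, the record layout and the field names ("county", "arsenic") are assumptions made for the example, the records are made up, and the actual work is planned in SPSS.

```python
import statistics
from collections import defaultdict


def summarize_by_group(records, group_key, value_key):
    """Return central tendency and dispersion of value_key for each group."""
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(value_key)
        if value is not None:  # skip wells with no measurement
            groups[rec[group_key]].append(value)

    summary = {}
    for name, values in groups.items():
        summary[name] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            # the sample standard deviation needs at least two observations
            "stdev": statistics.stdev(values) if len(values) > 1 else None,
            "min": min(values),
            "max": max(values),
        }
    return summary


# Made-up illustrative records, not project data
wells = [
    {"county": "A", "arsenic": 4.0},
    {"county": "A", "arsenic": 10.0},
    {"county": "A", "arsenic": 16.0},
    {"county": "B", "arsenic": 2.0},
    {"county": "B", "arsenic": None},
]
county_summary = summarize_by_group(wells, "county", "arsenic")
```

The same function could be called with a community or well identifier as the grouping key to obtain the finer-grained summaries.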

 

2.      Deterministic Spatial Modeling

After characterizing and describing the arsenic data for each well, we will interpolate arsenic concentrations at unmeasured locations. These interpolations will be conducted using ArcGIS and ArcInfo. We will interpolate arsenic concentration using inverse distance weighting (IDW), splines, radial basis functions, and local and global polynomials.
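
To make the idea of deterministic interpolation concrete, here is a minimal Python sketch of inverse distance weighting, the simplest of the methods listed above. It is not the ArcGIS implementation: the function name, the tuple layout of the samples and the default power of 2 are assumptions chosen for illustration, and all samples are used rather than a search neighborhood.

```python
import math


def idw_estimate(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at location (x, y).

    samples: list of (sx, sy, value) tuples for measured wells.
    power: distance exponent; 2 is a common default, assumed here.
    """
    if not samples:
        raise ValueError("at least one sample is required")

    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return value  # exactly on a sample: return the measured value
        weight = 1.0 / dist ** power  # nearer samples get larger weights
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

For example, with two hypothetical wells at (0, 0) and (2, 0), the estimate at the midpoint (1, 0) is the simple average of the two measured values, because both wells are equally distant.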

 

3.      Geostatistical Modeling

Kriging, the geostatistical interpolation method we will use, assumes that the distance or direction between sample points reflects a spatial correlation that can be used to explain variation in the surface. Kriging fits a mathematical function to a specified number of points, or to all points within a specified radius, to determine the output value for each location. Kriging is a multistep process: it includes exploratory statistical analysis of the data, variogram modeling, creating the surface, and exploring a variance surface. It is most appropriate when the data are known to have a spatially correlated distance or directional bias.
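
The variogram modeling step starts from an empirical semivariogram, which summarizes how dissimilar pairs of measurements are as a function of their separation distance. The Python sketch below computes one under simplifying assumptions: it is omnidirectional (direction is ignored), it uses fixed-width distance bins, and the function name and arguments are hypothetical. Fitting a variogram model to these points and solving the kriging system are not shown.

```python
import math
from itertools import combinations


def empirical_semivariogram(samples, lag_width, n_lags):
    """Return (mean pair distance, semivariance, pair count) for each non-empty lag bin.

    samples: list of (x, y, value) tuples.
    The semivariance in a bin is the sum of squared value differences
    over all pairs in the bin, divided by twice the number of pairs.
    """
    sq_diff_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags

    for (x1, y1, z1), (x2, y2, z2) in combinations(samples, 2):
        dist = math.hypot(x2 - x1, y2 - y1)
        lag = int(dist // lag_width)  # index of the distance bin
        if lag < n_lags:  # pairs farther apart than the last bin are ignored
            sq_diff_sums[lag] += (z1 - z2) ** 2
            dist_sums[lag] += dist
            counts[lag] += 1

    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            result.append(
                (dist_sums[k] / counts[k], sq_diff_sums[k] / (2 * counts[k]), counts[k])
            )
    return result
```

A semivariance that increases with distance and then levels off suggests the kind of spatial correlation that kriging is designed to exploit.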

Kriging has several advantages over the deterministic interpolation methods. In addition to kriging, however, we will explore the realm of stochastic simulations. In this type of analysis, observations are resampled a large number of times, and point estimates with confidence intervals may be developed from the resampled data (this should sound a lot like the bootstrap). While kriging has a tendency to smooth distributions, these simulations maintain a closer resemblance to the true 'shape' of the data. These simulations will be conducted using the GSLIB freeware package.
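
Since the resampling idea is compared to the bootstrap, the following Python sketch shows an ordinary (non-spatial) bootstrap that produces a point estimate and a percentile confidence interval for a mean. It is meant only to illustrate resampling: it treats observations as independent and so ignores spatial correlation, it is not the simulation algorithm used by GSLIB, and the function name, the number of resamples and the fixed seed are assumptions made for the example.

```python
import random


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap interval."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)

    means = []
    for _ in range(n_resamples):
        # draw n observations with replacement from the original data
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()

    # take the alpha/2 and 1 - alpha/2 quantiles of the resampled means
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(values) / n, lower, upper
```

Calling the function on a list of measured concentrations returns the sample mean together with an approximate 95% interval when alpha is left at its assumed default of 0.05.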