VALID MODELS FOR SPACE-TIME VARIOGRAMS

Supervisor: Donald E. Myers

Dibakor Roy Bapy

 

A current project with the EPA (Environmental Protection Agency) is examining the use of space-time variograms to detect vegetation change over a ten-year period, using data from the Oregon Pilot Study area. The project will use NDVI (Normalized Difference Vegetation Index) satellite data. The NDVI is widely used in a variety of biospheric and hydrologic studies; for instance, it plays an important role in the soil moisture mapping conducted in SGP 99. This project will advance the science of ecological monitoring and demonstrate techniques for regional-scale assessment of the condition of aquatic resources in the western United States. The data set provides a comprehensive growing-season profile of these ecosystems, is extremely useful for assessing seasonal variations in vegetation conditions, and provides a foundation for studying long-term changes resulting from human or natural factors.

 

The variogram is used (especially in geostatistics) to quantify spatial correlation, i.e., similarity or dissimilarity in a statistical sense as a function of separation distance and direction. Variograms are analogous to the (auto)covariance function or (auto)correlation function, except that they exist under weaker conditions. For a function defined on n-dimensional Euclidean space to be a valid variogram, it must satisfy certain conditions: for example, its growth rate must be less than quadratic, and it must be conditionally negative definite, i.e., Σ_i Σ_j λ_i λ_j γ(x_i − x_j) ≤ 0 for any points x_1, ..., x_m and any weights λ_1, ..., λ_m with Σ_i λ_i = 0. The second condition is not easy to check for a particular function, so the usual practice is to use known valid models or positive linear combinations of them (the class of valid models is closed under positive linear combinations). When extending the variogram to space-time, two general approaches might be used: one is to treat space-time simply as a higher-dimensional Euclidean space, and the second is to “separate” space and time. The disadvantage of the first approach is that it requires a metric or norm on space-time, which essentially means that time as a “dimension” is not really different from the other Euclidean dimensions; this contradicts some of the usual perceptions of time. The second approach is essentially the same as constructing a valid model in n-dimensional space from two models, one valid on k-dimensional space and the other on (n−k)-dimensional space (for space-time, use k = 1).
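Because conditional negative definiteness is hard to verify analytically, it can at least be probed numerically: a candidate γ is ruled out if some set of points and weights with Σλ_i = 0 gives a positive double sum. The Python/NumPy sketch below (all function names are hypothetical) tests one configuration of points. As an instance of the “separate space and time” approach, it also builds a space-time variogram from an exponential spatial covariance C_s and an exponential temporal covariance C_t combined in the product-sum form C(h, u) = k1 C_s(h) C_t(u) + k2 C_s(h) + k3 C_t(u), with γ(h, u) = C(0, 0) − C(h, u). The sills, ranges and coefficients are illustrative assumptions, not values fitted to the NDVI data.

```python
import numpy as np


def is_cnd(gamma, points, tol=1e-9):
    """Test conditional negative definiteness on ONE configuration of points.

    gamma maps a separation vector to a variogram value. The condition
    sum_ij lam_i lam_j gamma(x_i - x_j) <= 0 for all lam with sum(lam) == 0
    is equivalent to P G P being negative semidefinite, P = I - 11^T / n.
    """
    pts = np.asarray(points, dtype=float)
    if pts.ndim == 1:                      # 1-D coordinates -> column vector
        pts = pts[:, None]
    n = len(pts)
    G = np.array([[gamma(pts[i] - pts[j]) for j in range(n)] for i in range(n)])
    P = np.eye(n) - np.ones((n, n)) / n    # projector onto sum-zero weights
    largest = np.linalg.eigvalsh(P @ G @ P).max()
    return bool(largest <= tol * max(1.0, np.abs(G).max()))


def exp_cov(r, sill, range_):
    """Exponential covariance, valid in any dimension."""
    return sill * np.exp(-r / range_)


def product_sum_gamma(d, k1=1.0, k2=0.5, k3=0.5,
                      space=(1.0, 2000.0), time=(1.0, 3.0)):
    """Space-time variogram gamma(h, u) = C(0, 0) - C(h, u) with
    C(h, u) = k1*Cs(h)*Ct(u) + k2*Cs(h) + k3*Ct(u), k1 > 0, k2, k3 >= 0.
    d = (dx, dy, dt); the 2000 m and 3-year ranges are illustrative only."""
    h = np.hypot(d[0], d[1])               # spatial lag (Euclidean in space only)
    u = abs(d[2])                          # temporal lag, kept separate

    def C(a, b):
        cs, ct = exp_cov(a, *space), exp_cov(b, *time)
        return k1 * cs * ct + k2 * cs + k3 * ct

    return C(0.0, 0.0) - C(h, u)


# A valid power model passes; |h|^3 (faster than quadratic growth) fails.
print(is_cnd(lambda d: np.linalg.norm(d), [0.0, 1.0, 2.0]))       # True
print(is_cnd(lambda d: np.linalg.norm(d) ** 3, [0.0, 1.0, 2.0]))  # False
```

Passing the check on one configuration is evidence, not proof; failing it does rule a function out. The product-sum construction is valid by design (products and positive combinations of valid covariances on the spatial and temporal subspaces remain valid on space-time), so here the numerical check serves only as a sanity test.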

 

 

              The NDVI is the difference between the near-infrared (channel 2) and visible (channel 1) reflectance values, normalized by the sum of channels 1 and 2. It is based on the principle that actively growing green plants strongly absorb radiation in the visible region of the spectrum (the ‘PAR’, or ‘Photosynthetically Active Radiation’) while strongly reflecting radiation in the near-infrared region. The concept of vegetative ‘spectral signatures (patterns)’ is based on this principle. We use the following abbreviations:

      

         PAR   Value of Photosynthetically Active Radiation from a pixel

         NIR    Value of Near-Infrared Radiation from a pixel

 

 

The NDVI for a pixel is calculated from the following formula:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{PAR}}{\mathrm{NIR} + \mathrm{PAR}}$$

 

          This formula yields a value ranging from -1 (usually water) to +1 (strongest vegetative growth), where increasing positive values indicate increasing green vegetation and negative values indicate non-vegetated surface features such as water, barren land, ice, snow, or clouds. The NDVI can be derived at several points in the processing flow. To retain the most precision, it is derived after calibration of channels 1 and 2 and before scaling to byte range. Computation of the NDVI must also precede geometric registration and resampling to maintain precision in this calculation.
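The formula is applied pixel by pixel to the calibrated channel values before any scaling. A minimal NumPy sketch follows; the function name `ndvi` is hypothetical, and returning NaN where NIR + PAR = 0 (instead of raising a division error) is an assumption of this sketch.

```python
import numpy as np


def ndvi(nir, par):
    """NDVI = (NIR - PAR) / (NIR + PAR), element-wise.

    Inputs are calibrated channel values (channel 2 = NIR, channel 1 = PAR).
    Pixels with NIR + PAR == 0 are returned as NaN instead of dividing by zero.
    """
    nir = np.asarray(nir, dtype=float)
    par = np.asarray(par, dtype=float)
    total = nir + par
    out = np.full(total.shape, np.nan)
    np.divide(nir - par, total, out=out, where=total != 0)
    return out


# Approximately 0.667 (vegetated), -0.5 (non-vegetated) and NaN (no signal).
print(ndvi([0.5, 0.1, 0.0], [0.1, 0.3, 0.0]))
```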

 To scale the computed NDVI results to byte range, the computed NDVI value, which ranges from -1.0 to 1.0, is scaled to the range 0 to 200, where computed -1.0 maps to 0, computed 0 maps to 100, and computed 1.0 maps to 200. As a result, scaled NDVI values below 100 represent clouds, snow, water, and other non-vegetative surfaces, and values of 100 or more represent vegetative surfaces.
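The mapping described above is the linear transform scaled = 100 × (NDVI + 1), with inverse NDVI = scaled / 100 − 1. A short NumPy sketch (function names hypothetical; rounding to the nearest integer and clipping out-of-range input to [-1, 1] are assumptions, since the text does not say how fractional values are handled):

```python
import numpy as np


def ndvi_to_byte(ndvi):
    """Scale NDVI in [-1, 1] linearly to 0..200: -1 -> 0, 0 -> 100, 1 -> 200."""
    scaled = 100.0 * (np.clip(ndvi, -1.0, 1.0) + 1.0)
    return np.rint(scaled).astype(np.uint8)   # round to the nearest integer


def byte_to_ndvi(scaled):
    """Invert the scaling; exact only up to the 0.01 quantization step."""
    return np.asarray(scaled, dtype=float) / 100.0 - 1.0


b = ndvi_to_byte([-1.0, 0.0, 0.5, 1.0])
print(b)              # values 0, 100, 150, 200
print(b >= 100)       # vegetated if the scaled value is 100 or more
```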

 

          To monitor vegetation response, NDVI data can be used to determine greenness over time. The NDVI is presumably derived from cloud-free AVHRR observations: daily AVHRR values are composited to obtain biweekly cloud-free data. The NDVI data were calculated and scaled to range from 0 to 200 to represent terrestrial features on land. Before the analysis, the NDVI data were inspected and found to contain outliers, for example values that were low (<105) or equal to 200. Climate information was used to verify the data and the conditions on the day of each observation. Cleaning the data is therefore a necessary step before any analysis or inference. Low NDVI values (80-105) can result from snow, inland water bodies, exposed soils, and dust (Eastman and Fulk, 1993; Myneni et al., 1997). Values greater than 200 were probably default values and were therefore excluded from the data. The differences between two consecutive days were also examined. In a study separating dust-storm and cloud effects on NDVI used for drought assessment in Burkina Faso, Groten (1993) indicated that if an NDVI value is lower than that of the preceding day by more than 10%, the drop is a dust-storm effect, and he used an algorithm to substitute a value for the dust-storm day. We used his algorithm to substitute for low NDVI values when consecutive NDVI values differ by 20 or more, as follows:

Time   NDVI   NDVI used   Calculation
1      163    163
2      139    163.667     163 + (165 - 163)/3 = 163.667
3      142    164.333     (163.667 + 165)/2 = 164.333
4      165    165
5      158    158
6      165    165
7      141    163.5       (163 + 164)/2 = 163.5
8      164    164
9      150    165

         The cleaned NDVI data were within the acceptable range (105 < NDVI < 200), and the absolute value of the difference between any two consecutive values was less than 20.
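The substitution in the table can be read as linear interpolation across a run of suspect values: row 2 is 163 + (165 − 163)/3, and row 3 is the corresponding two-thirds point. The sketch below is a hypothetical reconstruction in Python, not the FORTRAN program used in the project, and it does not implement Groten's 10% criterion. It assumes a value is suspect when it lies outside 105 < NDVI < 200 (which also catches the -9999 default) or drops by 20 or more below the last accepted value. Each suspect run is filled by interpolating between the accepted values on either side, and runs at the start or end of a series, which lack a neighbour on one side, are left as NaN.

```python
import math


def clean_series(values, max_drop=20, low=105, high=200):
    """Hypothetical reconstruction of the substitution rule for one pixel.

    A value is flagged if it lies outside (low, high) (this also catches the
    -9999 default) or if it is max_drop or more below the last accepted
    value. Each run of flagged values is replaced by linear interpolation
    between the accepted values on either side; runs at either end of the
    series have only one neighbour and are left as NaN.
    """
    n = len(values)
    good = [False] * n
    last = None                              # last accepted value
    for i, v in enumerate(values):
        if not (low < v < high):
            continue
        if last is not None and last - v >= max_drop:
            continue
        good[i] = True
        last = v

    out = [float(v) if ok else math.nan for v, ok in zip(values, good)]
    i = 0
    while i < n:
        if good[i]:
            i += 1
            continue
        j = i
        while j < n and not good[j]:
            j += 1                           # flagged run is out[i:j]
        if i > 0 and j < n:
            left, right = out[i - 1], out[j]
            step = (right - left) / (j - i + 1)
            for k in range(i, j):
                out[k] = left + step * (k - i + 1)
        i = j
    return out


print([round(v, 3) for v in clean_series([163, 139, 142, 165])])
# [163.0, 163.667, 164.333, 165.0]
```

Comparing against the last accepted value, rather than the immediately preceding raw value, is what keeps 142 (row 3) flagged after 139; the same choice means a genuine, lasting drop would also be flagged until the series recovers.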

 

 

Before cleaning, the NDVI data file looks like the following:

 

-9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999

-9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 153 160 155 158 158 158 155 149 147 147 144 144 152 156 153 152 146 139 148 151 147 135 130 140 141 137 136 137 139 148 152 150 146 138 144 147 147 147 147 144 147 144 142 150 150 147 147 151 149 135 140 137 137 155 155 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999

-9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9

 

The value -9999 is the default (fill) reading.

 

A program written in FORTRAN was used to clean the NDVI data, and it was tested for accuracy, reliability and efficiency. The program was also translated into C in the expectation that it would be more efficient and less time-consuming; however, when both versions were run on the supercomputer, the FORTRAN code appeared to be more efficient. Because of the huge data volume, cleaning takes a long time on an ordinary computer, sometimes months, so a supercomputer is necessary to handle these large files. We ran the program on one year of data at a time on the supercomputer, which took about two days of continuous running to complete the job. After cleaning, a mean value was assigned to each pixel. The data are arranged as follows:

 

 

 

 

NDVI yearly mean (year 1989)

 

    Xcoord        Ycoord       MeanValue
  -1870500.     36500.00      -9999.000
  -1870500.     37500.00      -9999.000
  -1870500.     38500.00      -9999.000
  -1870500.     39500.00      -9999.000
  -1870500.     40500.00      -9999.000
  -1870500.     41500.00      -9999.000
  -1870500.     42500.00      -9999.000
  -1870500.     43500.00      -9999.000
  -1870500.     44500.00      -9999.000
  -1870500.     45500.00      -9999.000
  -1870500.     46500.00      -9999.000
  -1870500.     47500.00      -9999.000
  -1870500.     48500.00      -9999.000
  -1870500.     49500.00      -9999.000
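The per-pixel yearly mean can be sketched as follows (Python; `parse_raw_line`, `pixel_mean` and `format_row` are hypothetical names, not the project's FORTRAN routines). The sketch assumes each raw line holds one pixel's series for the year, which the text does not state. It treats both -9999 and NaN (values the cleaning step could not recover) as missing, and it writes -9999 when a pixel has no valid value, as in the 1989 rows shown above.

```python
import math

FILL = -9999.0   # default reading in the raw and output files


def parse_raw_line(line):
    """Split one whitespace-separated line of the raw file into floats."""
    return [float(tok) for tok in line.split()]


def pixel_mean(series, fill=FILL):
    """Mean of the valid values of one pixel's series, or fill if none.

    Both the -9999 default and NaN are treated as missing.
    """
    valid = [v for v in series if v != fill and not math.isnan(v)]
    return sum(valid) / len(valid) if valid else fill


def format_row(x, y, mean):
    """One output row: easting, northing, mean value."""
    return f"{x:14.1f}{y:15.2f}{mean:15.3f}"


print(format_row(-1870500.0, 36500.0, pixel_mean(parse_raw_line("-9999 -9999"))))
```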

 

Our next step is to merge all the folders of one year's data into a single large folder; the program that will do this is currently being tested. After this, we will begin the statistical analysis and use a decision-making tool (the space-time variogram) to reach a conclusion about vegetation change.
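One way to picture the merge is to combine several mean tables into a single list of (easting, northing, time, value) records, dropping pixels with the -9999 default. The sketch below is a hypothetical illustration in Python: it reads table text already loaded into memory (rather than walking real folders), assumes the “Xcoord Ycoord MeanValue” layout shown above, and uses the year as the time value.

```python
FILL = -9999.0


def parse_mean_table(text):
    """Rows (x, y, mean) from text laid out as 'Xcoord Ycoord MeanValue'."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue                        # blank or malformed line
        try:
            x, y, v = (float(p) for p in parts)
        except ValueError:
            continue                        # header line
        rows.append((x, y, v))
    return rows


def merge_tables(tables, fill=FILL):
    """Combine (year, text) pairs into one list of
    (easting, northing, year, value) records, dropping -9999 pixels."""
    records = []
    for year, text in tables:
        for x, y, v in parse_mean_table(text):
            if v != fill:
                records.append((x, y, year, v))
    records.sort(key=lambda r: (r[2], r[0], r[1]))   # by year, then location
    return records
```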

 

So far we have prepared the data: the data files have been organized and reorganized to refine them, and they are now ready for computation and analysis. Each data value is associated with four variables: easting, northing, time and pixel value, and the data files are arranged in these four columns. The next step is to apply the statistical tool to find the average squared difference (halved, for the semivariogram) between the values at pairs of points in space, holding time fixed. The separation between two points is a vector with a distance and a direction; each separation vector will be assigned to one of the angle classes 0-45, 45-90 or 90-135 degrees to give its direction. A computer program will compare the values at every pair of points: for each distance and direction class, it will search all the rows of the data file for the pairs with that separation and measure their similarity or dissimilarity. We are interested in how this difference behaves: points that are close together are expected to show a smaller difference, and points far apart a larger one.
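The pair search described above is the empirical directional semivariogram, γ(h) = Σ (z_i − z_j)² / (2N(h)), over the N(h) pairs in each distance and direction class. The NumPy sketch below is a hypothetical illustration for one year's records. It assumes angles are measured counterclockwise from the easting axis and folded into [0, 180), and it uses 45-degree classes over the whole half-circle, i.e. a fourth class 135-180 in addition to the three listed, so that every pair receives a direction. It forms all n(n−1)/2 pairs at once, so a full scene would need subsampling or chunking.

```python
import numpy as np


def directional_semivariogram(x, y, z, lag_width, n_lags, angle_width=45.0):
    """Empirical semivariogram for one year (time held fixed).

    gamma[d, k] = sum (z_i - z_j)^2 / (2 N) over the N pairs whose separation
    falls in distance class k ([k*lag_width, (k+1)*lag_width)) and direction
    class d (angle counterclockwise from the easting axis, folded into
    [0, 180) and cut into angle_width-degree classes).
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    i, j = np.triu_indices(len(z), k=1)          # each unordered pair once
    dx, dy = x[j] - x[i], y[j] - y[i]
    lag_idx = (np.hypot(dx, dy) // lag_width).astype(int)
    angle = np.degrees(np.arctan2(dy, dx)) % 180.0
    n_dirs = int(np.ceil(180.0 / angle_width))
    dir_idx = np.minimum((angle // angle_width).astype(int), n_dirs - 1)

    keep = lag_idx < n_lags                      # ignore pairs beyond the last class
    sq = (z[j] - z[i]) ** 2
    sums = np.zeros((n_dirs, n_lags))
    counts = np.zeros((n_dirs, n_lags), dtype=int)
    np.add.at(sums, (dir_idx[keep], lag_idx[keep]), sq[keep])
    np.add.at(counts, (dir_idx[keep], lag_idx[keep]), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        gamma = sums / (2.0 * counts)            # NaN where a class has no pairs
    return gamma, counts
```

Returning the pair counts alongside γ makes it easy to discard classes supported by too few pairs before plotting.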

 

A similar computation will be done with respect to time, holding location fixed: we will calculate the average squared difference between the values at the same point at different times and plot it against the time separation in years.

 The shape of the graph will reveal whether there is any change in the data over time and, in turn, give some understanding of how the vegetation changes over time.
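For the temporal computation, a minimal sketch (NumPy; the function name and the layout of one row per pixel and one column per year, with NaN for missing values, are assumptions) returns γ(u) for time lags u = 1, ..., max_lag years, pooled over all pixels:

```python
import numpy as np


def temporal_semivariogram(values, max_lag):
    """values: 2-D array, one row per pixel, one column per year (NaN = missing).

    Returns gamma[u - 1] = half the mean squared difference between values at
    the same pixel u years apart, for u = 1..max_lag, and the pair counts.
    """
    z = np.asarray(values, dtype=float)
    gamma = np.full(max_lag, np.nan)
    counts = np.zeros(max_lag, dtype=int)
    for u in range(1, max_lag + 1):
        d = z[:, u:] - z[:, :-u]              # same pixel, u years apart
        d = d[~np.isnan(d)]                   # drop pairs with a missing value
        counts[u - 1] = d.size
        if d.size:
            gamma[u - 1] = 0.5 * np.mean(d ** 2)
    return gamma, counts
```

Plotting `gamma` against u gives the graph the text describes.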

 

We have ten layers of data, one for each of the ten years, so these computations will be done either for each layer separately or for all the layers at once.
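Treating all layers at once means binning every pair of records by both spatial distance class h and time lag u, giving an empirical γ(h, u). The NumPy sketch below is a hypothetical illustration over (easting, northing, year, value) records. It ignores direction and assumes integer years. Its u = 0 column is the spatial variogram pooled over years, and pairs at the same pixel fall in distance class 0 whenever lag_width is smaller than the pixel spacing (the coordinates in the 1989 table step by 1000). Like the directional sketch, it forms all pairs at once, so it suits a subsample rather than a full scene.

```python
import numpy as np


def space_time_semivariogram(x, y, t, z, lag_width, n_lags, max_tlag):
    """Empirical gamma(h, u): half the mean squared difference over all pairs
    whose spatial separation is in distance class h and whose time
    separation is u years (u = 0 is the purely spatial variogram)."""
    x, y, t, z = (np.asarray(a, dtype=float) for a in (x, y, t, z))
    i, j = np.triu_indices(len(z), k=1)              # each unordered pair once
    h_idx = (np.hypot(x[j] - x[i], y[j] - y[i]) // lag_width).astype(int)
    u = np.rint(np.abs(t[j] - t[i])).astype(int)     # time lag in whole years
    keep = (h_idx < n_lags) & (u <= max_tlag)
    sq = (z[j] - z[i]) ** 2
    sums = np.zeros((n_lags, max_tlag + 1))
    counts = np.zeros((n_lags, max_tlag + 1), dtype=int)
    np.add.at(sums, (h_idx[keep], u[keep]), sq[keep])
    np.add.at(counts, (h_idx[keep], u[keep]), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        gamma = sums / (2.0 * counts)                # NaN where a class is empty
    return gamma, counts
```

The resulting surface can be compared with a candidate space-time model such as a product-sum form before choosing one.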