Polygon Pattern Analysis (Assessed Practical)
6F6Z2002 Environmental Risk Management
Tutor: Elias Symeonakis
e.symeonakis@mmu.ac.uk
2 February 2018
Introduction
Much of the health data readily accessible in the public domain is grouped or aggregated together so
that individuals cannot be recognised (for reasons of privacy and confidentiality). While this can pose
significant problems for researchers, notably the ecological fallacy and the modifiable areal unit
problem (Cromley & McLafferty 2002), areal data can still prove highly valuable, especially when one
considers the potential influence of the wider social and physical setting on the existence and possible
diffusion of a particular health event.
When working with areal data it is important that the data be standardised to prevent any
visualisations simply reflecting population distribution (Gatrell 2002). There are many levels of
aggregation available, with different units often employed by different agencies involved in the data
gathering process. Thankfully, in the UK we can access health data aggregated (and standardised) to
the national Census units – this makes its analysis and interpretation much easier, particularly when
trying to establish socioeconomic determinants of health.
Using GIS the analyst can explore the evidence for spatial patterning within health data (Gatrell 2002)
– there may be wider structural problems or factors influencing people’s health which can be seen
within aggregated data. Indeed, Gatrell (2002:60) continues to describe how one can “look at the
disease rate in one location and rates in neighbouring locations, up to a specified distance. This helps
[to] identify disease hotspots”.
The presence of such spatial patterns is common in health data (and geographical data much more
widely), which led Waldo Tobler to comment (in what is often described as the first law of
geography):
Everything is related to everything else, but near things are more related than distant things.
Different measures of spatial autocorrelation are available for spatial analysis including Moran’s I, the
Getis and Ord G-statistic and Anselin’s Local Indicators of Spatial Association (LISA). Increasingly these
measures are available within standard desktop GIS. They can determine the substance (if any) of the
apparent spatial pattern present within the geographical data (Figure 1). This is particularly important
because, as Slocum et al. (2005) note, there is a chance that the visual pattern is truly random rather
than clustered, as its appearance on screen would suggest.
Figure 1. Types of spatial patterns (after Boots & Getis 1988)
In today’s exercise you will employ two different measures of spatial autocorrelation:
1. Global – Moran’s I (or coefficient)
2. Local – Local Indicator of Spatial Association
Learning outcomes
On completion of this practical you should be able to:
Implement a range of areal pattern descriptors in ArcGIS for the analysis of health data
Understand the basic characteristics of the different pattern descriptors and recognise their
suitability for different health data analysis scenarios
Data
In this practical exercise you will be working with real health data for the 3104 counties of the
continental United States (the USA without Alaska, Hawaii and Puerto Rico). The data is presented in
shapefile format and can be found on the S:\ drive.
S:\Faculty of Science & Engineering\Environmental & Geographical Sciences\6F6Z2002\PolygonPatternAnalysis\
NOTE: Copy the ENTIRE \Data folder to your drive! Not just separate files from it!
Moran’s I: a global measure of spatial autocorrelation
All measures of spatial autocorrelation attempt to quantify Tobler’s law of geography – that features
in close geographical space are highly likely to share attribute properties. The Moran coefficient is
widely used because of its perceived statistical robustness (Slocum et al. 2005).
The method can be thought of as a spatial extension to the correlation technique used in standard
descriptive statistics, but in this instance, it looks at the spatial arrangement of the values of the
variable under investigation (O’Sullivan & Unwin 2003; Slocum et al. 2005). Figure 2, below, taken
from Slocum et al. (2005:52) provides a brief explanation of how Moran’s I is calculated:
Figure 2. Calculating Moran’s I (Source: Slocum, 2005, p. 52)
With the Moran’s I statistic the values of neighbouring areal units are compared against the overall
mean (this is the covariance measurement). Where neighbours share similar values in comparison to
this mean they will show positive covariance values. Dissimilar values will present negative values (see
O’Sullivan & Unwin 2003:197). These values are also weighted to identify neighbouring units, with a
value of 1 for adjacent units and 0 for non-neighbouring units. The computed Moran’s I value ranges
from -1 (perfect negative spatial autocorrelation – such as you would see on a chessboard) to +1
(perfect positive spatial autocorrelation – large homogeneous areas). A value of 0 is
indicative of a random pattern or distribution.
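The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the ArcGIS implementation: it assumes binary weights (1 for cells sharing an edge, 0 otherwise) on a small hypothetical 4 x 4 grid, and the function names morans_i and rook_weights are invented for this example.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a binary weights matrix.

    weights[i][j] is 1 when units i and j are neighbours and 0 otherwise.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]  # deviations from the overall mean
    # Cross-products of deviations, counted only for neighbouring pairs
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    total_weight = sum(sum(row) for row in weights)
    sum_sq = sum(d * d for d in dev)
    return (n / total_weight) * (cross / sum_sq)


def rook_weights(rows, cols):
    """Binary weights for a regular grid: cells sharing an edge are neighbours."""
    n = rows * cols
    w = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[i][rr * cols + cc] = 1
    return w


w = rook_weights(4, 4)
# Chessboard: every cell differs from all of its edge neighbours
chessboard = [(r + c) % 2 for r in range(4) for c in range(4)]
# Two homogeneous halves: left columns low, right columns high
halves = [0 if c < 2 else 1 for r in range(4) for c in range(4)]

i_chess = morans_i(chessboard, w)
i_halves = morans_i(halves, w)
print(round(i_chess, 3), round(i_halves, 3))
```

The chessboard gives I = -1 and the two homogeneous halves give a positive value (about 0.67), matching the interpretation of the range given above.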
A note of caution
Please take the utmost care when using these spatial statistics (indeed when using any statistics for
that matter). All these measures suggest is the likelihood of a spatial pattern being present (or not)
and the possibility that there may be some underlying factor or process responsible. They are not
explanatory in their own right and using the results of such statistics should be only one of the steps
undertaken in your exploration of health data – not the end point! The next stage would be to
consider dataset reliability and whether the particular statistical techniques are appropriate and
what further testing and analysis is necessary.
1. Start up ArcMap (ArcGIS) and create a new map file – add the US Health Data shapefile
(US_Pov_HD.shp) to your display.
2. Using the symbology tab create a graduated colour map display for heart disease (HeartDisea)
(deaths per 100,000 persons – men and women) (Figure 2b).
Figure 2b. Heart Disease Deaths per 100,000 persons in the continental USA
3. Spend a few minutes examining the data closely – can you see any obvious spatial patterns
present within the dataset?
4. To test the robustness of any spatial patterning you will calculate the Moran’s I measure of
spatial autocorrelation for the heart disease dataset. First, you must make sure that the
ArcToolbox is up and running. Click on the red toolbox icon on the toolbar to open it
within your display. You will find that there are many different functions available –
you will be using the Spatial Statistics Tools at the bottom of the list.
5. Double click on Spatial Statistics Tools and do the same for the buttons Analyzing Patterns
and Spatial Autocorrelation (Moran’s I). This will bring up a new dialogue box as shown in
Figure 3.
6. Select US_Pov_HD.shp as the Input Feature Class and HeartDisea as the Input Field. You
should also make sure that the Generate Report (optional) is ticked and that you use the
Inverse Distance option for determining the spatial relationship (Figure 3). Then click OK.
Figure 3. Running Moran’s I in ArcGIS
7. This process could take a few minutes. You can access the results from the main menu, by
selecting Geoprocessing > Results. The numerical results and the graphic from the Moran’s I
analysis can be visualised by double-clicking on the ‘Report File’ in the results panel:
Figure 4(a and b). Displaying the Spatial Autocorrelation (global) Moran’s I Results
LISA: a local measure of spatial autocorrelation
While the global measure of spatial autocorrelation described above is a robust and useful statistic,
researchers have recently begun to highlight that it is very unlikely that any kind of geographical
data (representing health events or otherwise) will be spatially homogenous across the entire
study region (Lee & Wong 2001). Using a single global measure is therefore pretty crude and a
more localised measure would be much more appropriate (O’Sullivan and Unwin, 2011). Such a
local measure would enable researchers to identify pockets of variable spatial autocorrelation
across the region of study – this is particularly useful in the search for health hotspots, e.g. the
clustering of people affected by similar illnesses or other health complaints.
Luc Anselin, an internationally recognised researcher in spatial science and the creator of the
GeoDa software (http://geodacenter.asu.edu/ ), has developed a local measure of spatial
autocorrelation known as the Local Indicator of Spatial Association - LISA (Anselin 1995). This
procedure works in the same way as the global measure, but instead of just obtaining a single
number – you can create a variety of outputs including: (1) a significance map, (2) a cluster map
and (3) a scatterplot. Each of the maps presents a thematic map display which highlights areas of
local spatial autocorrelation – as opposed to the global summary, or average, statistic provided by
the standard Moran’s I. The cluster map, for example, displays data in several colours to signify
the four quadrants (high-high, low-low = positive spatial autocorrelation; high-low, low-high =
negative spatial autocorrelation).
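The quadrant logic behind the cluster map can be sketched as follows. This is a simplified illustration under stated assumptions: the areal units are arranged in a line, the weights are binary and row-standardised, the rates are hypothetical, and no significance test is performed (in practice the cluster labels are only meaningful for statistically significant units). The function names are invented for this example.

```python
def chain_weights(n):
    """Binary weights for units arranged in a line: adjacent units are neighbours."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def local_morans(values, weights):
    """Local Moran's I and quadrant label for each areal unit.

    The spatial lag is the mean deviation of a unit's neighbours
    (row-standardised weights, an assumption of this sketch).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # variance of the values
    results = []
    for i in range(n):
        neighbours = [j for j in range(n) if weights[i][j]]
        lag = sum(dev[j] for j in neighbours) / len(neighbours)
        local_i = dev[i] * lag / m2
        if dev[i] >= 0:
            quadrant = "HH" if lag >= 0 else "HL"
        else:
            quadrant = "LL" if lag < 0 else "LH"
        results.append((local_i, quadrant))
    return results


# Hypothetical rates: a high run with one low outlier, then a low run
rates = [10, 10, 0, 10, 10, 2, 2, 2]
lisa = local_morans(rates, chain_weights(len(rates)))
for rate, (local_i, quadrant) in zip(rates, lisa):
    print(rate, round(local_i, 2), quadrant)
```

Units whose neighbours resemble them (HH, LL) receive a positive local value, while outliers among dissimilar neighbours (HL, LH) receive a negative one.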
8. Continuing with the analysis of heart disease (i.e. field HeartDisea), select Spatial
Statistics Tools > Mapping Clusters from the ArcToolbox menu and then choose Cluster and
Outlier Analysis (Anselin Local Moran’s I) from the list of options.
Figure 5.
After some processing you should see a new map display presentation in ArcGIS – this is colour
coded to help identify High-high spatial autocorrelation (SA) (hotspots) in black, High-low SA in
yellow, Low-high in white and Low-low in blue.
Figure 6.
The output of a LISA analysis is not only the map of the areas with HH, LL, LH and HL local spatial
autocorrelation. LISA also adds the following four attributes to the attribute table:
• The local Moran’s I (attribute: LMiIndex)
• The respective z-score (attribute: LMiZScore)
• The respective p-value (attribute: LMiPValue)
• The Cluster Type (attribute: COType)
To view the attribute table, right click on the layer in the table of contents on the left side of
the screen, and select Open Attribute Table:
Figure 7.
This will bring up the following table:
Figure 8.
You can visualise these new attributes by double-clicking on the layer in the table of contents
and selecting how you wish to visualise them in the Symbology tab:
Select a graduated colour display to visualise the p-values (LMiPValue), and edit the range of values
used so you can clearly see areas where the value is less than 0.05.
Figure 9.
Figure 10: The p-value output from LISA for the Heart disease attribute (mapped here
with 5 classes). The first class is modified accordingly to show all counties with p-values
below 0.05. Why? (HINT: use your statistical testing knowledge and check Figure 4b above)
Figure 11: The respective z-value output from LISA for the Heart disease attribute,
mapped here with classes that are meaningful in order to interpret them. Question: why is
the choice of these classes considered meaningful? (HINT: use your statistical testing
knowledge and check Figure 4b above).
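As a starting point for thinking about these class breaks, the sketch below shows how a z-score relates to a two-sided p-value. It assumes the p-value is derived from the z-score under a standard normal approximation, and it uses the conventional critical values 1.65, 1.96 and 2.58; the function names and the sample z-scores are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def significance_class(z):
    """Bin a z-score by the conventional critical values 1.65, 1.96 and 2.58."""
    size = abs(z)
    if size < 1.65:
        return "not significant"
    if size < 1.96:
        level = "p < 0.10"
    elif size < 2.58:
        level = "p < 0.05"
    else:
        level = "p < 0.01"
    direction = "positive" if z > 0 else "negative"
    return f"{direction}, {level}"


# Hypothetical z-scores, for illustration only
for z in (-3.1, -2.0, 0.4, 1.7, 2.2, 2.9):
    print(z, round(two_sided_p(z), 4), significance_class(z))
```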
What information does this give us? Compare this map with the heart disease deaths maps.
ASSESSED TASKS
Repeat this task generating the same type of outputs (global Moran’s I and local LISA) for
the following variable in the US_Pov_HD.shp shapefile located in S:\Faculty of Science &
Engineering\Environmental & Geographical Sciences\6F6Z2002\PolygonPatternAnalysis\:
Poverty (% of residents living in poverty at last census)
Produce Maps for the following:
– Heart Disease Deaths: A choropleth map (with appropriately chosen symbology, north
arrow, scale bar, title and your name)
– Heart Disease LISA outputs, i.e.:
• local Moran’s I
• Cluster Type
• p-value
• z-score
– Poverty (%): A choropleth map (with appropriately chosen symbology, north arrow, scale
bar, title and your name)
– Poverty LISA outputs, i.e.:
• local Moran’s I
• Cluster Type
• p-value
• z-score
Briefly discuss (no more than one side of A4) the global and local patterns of spatial autocorrelation
shown in your results. Are there any areas of the US which are particularly unhealthy in terms of
heart disease? Can you see any potential links between poorer health outcomes (in terms of heart
disease death) and socioeconomic status (poverty)? You should support your commentary with
some links to the academic literature.
Should you wish to comment on any additional data, such as race or education status, you will find
the “Interactive Atlas of Disease” produced by the Centers for Disease Control and Prevention (CDC) a useful resource.
http://www.cdc.gov/dhdsp/maps/atlas/
Environmental Risk Management
Spatial Epidemiology
Assessed practical:
Point Pattern Analysis
Tutor: Dr Elias Symeonakis (E410a)
e.symeonakis@mmu.ac.uk
26 January 2018
Introduction
In GIS we are able to utilise and analyse a variety of spatially‐referenced datasets. Typically
these spatial or geographical datasets are represented by what we term spatial entity data
models* or entities for short. Entities are essentially graphical components used by the
computer to represent the different phenomena of interest within the chosen study area.
There are several types of entity described below (after Chang (2003) and Heywood et al.
(2006)) including:
Point – a zero dimensional feature represented by a single coordinate XY pair or an
individual pixel
Line – a one dimensional feature which represents length and is encoded either as a
string of coordinate XY pairs or a linear series of contiguous pixels
Polygon – a two dimensional feature which has both an area and a perimeter. It is
represented either using a series of connected coordinate XY pairs with the same
start and end point coordinate or a cluster of contiguous pixels
Surface – this is a special form of entity which represents continuous phenomena
either using a raster grid or a Triangulated Irregular Network (TIN)
Network – this is another specialist entity representation which recognises the
interconnection of line features
*You can find a fuller explanation of the spatial data modelling process (including entity
selection) in Heywood et al. (2006:71-107).
In this practical exercise we will focus entirely on the point entity data model. Points are used
to represent the spatial location of events or activities known to have occurred in a defined
geographical area (Bailey & Gatrell 1995, Boots & Getis 1988). These are typically individual
events, or features, such as the centroid location or address point of a person (or persons)
affected by a particular illness or disease. This type of analysis is very commonly used in
spatial analysis, particularly in the areas of health, crime and ecology with a myriad of
academic papers available on the subject and several textbooks which focus on this particular
aspect of spatial analysis alone.
Point pattern analysis is a common procedure where centroid (or point location) data form
the primary dataset (Birkin et al. 1996). Researchers then employ a series of statistical
methods in an attempt to determine whether any patterns exist in the spatial or geographical
distribution of points (i.e. events) in the study area. Spatial point patterns specifically include
“a set of locations, irregularly distributed within a designated region and presumed to have
been generated by some form of [random or other] mechanism” (Diggle 2003:vii).
Rather than rely upon simple visual interpretation of the point distribution(s) which may
suggest specific patterns where none truly exists, specialist statistical methods are employed
to help identify whether any discernible point patterns exist and to help establish the
possible underlying causes for any evident spatial behaviours and patterns.
Types of point pattern
In undertaking point pattern analysis the user is exploring the dataset for evidence of specific
spatial or geographic properties. From this, the user can then begin to establish whether
there are specific processes which have generated the observed point pattern. Typically this
involves the study of the dispersion of points (location of point patterns with respect to the
geographical study area) or alternatively the arrangement of points with respect to each
other (Boots and Getis 1988). To understand these properties more clearly it is important to
define the possible patterns expected in a point pattern map display (Figure 1).
Figure 1. Point patterns (after Boots & Getis 1988)
The point pattern conditions include clustering (or aggregation), regularity, and randomness,
and are defined in the box below.
Clustering (Aggregation)
A concentration of events or objects (O’Sullivan & Unwin 2003), where the points are more
tightly grouped together than would be expected from a completely random pattern.
Dispersion (Evenly spaced)
The events or objects appear to be uniformly, or evenly, spaced. The observed average
distance between the points is also greater than that found within a completely random
pattern.
Randomness
Diggle (1983) describes the pattern of Complete Spatial Randomness (CSR) where the
points are characterised by uniformity and independence. More simply, the pattern of points
occurs by chance, with no variation in intensity across the study area. Boots & Getis (1988)
note that CSR is doubtful in real world situations where the likelihood is that no single
process (acting upon the points) is dominant, giving the appearance of a CSR pattern.
When studying an area of interest, it is useful to adopt point pattern analysis methods. Very
often data are collected at a number of discrete locations. Usually, we attempt to extrapolate
from the limited data to obtain information about the wider population or region. The analysis
of these points can allow us to identify whether there is any definable spatial component in
their behaviour. Take the following crime-based example: your hometown has recently
suffered a spate of break-ins, and the local police authority want to obtain further information
to help them catch the criminals and reduce the incidence of burglary. It is highly likely that
the police workers will record the burglaries in point form employing the household location as
the unit of observation. Using spatial statistics and GIS the police can begin to piece together
criminal activity in the area. First of all the police will be very interested in determining
whether there is any pattern to these burglaries. For instance, are there any localities that are
more affected than others by the criminal activity, i.e. hotspot areas where a greater number
of burglaries have been recorded? Obviously the distribution of points is likely to be affected
by the type of built environment and population numbers and dynamics. Once this factor has
been accounted for, the researcher or police worker can begin to examine the distribution of
point data to see whether there are any discernible patterns.
In this examination the police worker can start to determine whether there are clusters of
criminal activity. Using this evidence we can begin to hypothesise about the nature of the
burglaries, and potentially establish the reasons for such increased activity. For example, are
any areas affected in particular? And if so, is there any additional evidence that might be able
to help explain this? The following list of bullet points highlights some potential lines of
inquiry:
Higher incidence of burglaries in areas of socio-economic deprivation, potentially
as a result of poorer home security
Higher incidence in student areas, where multiple occupancy (e.g. flats in
renovated houses) is common offering greater opportunity for criminals
The use of such data exploration techniques to help develop hypotheses is a fairly typical aspect
of spatial data analysis. This type of technique can be used to help build spatial process models
and improve our understanding of the phenomenon under observation. Importantly, this is often
an iterative process, with many steps involved in the development of the spatial model.
Furthermore, there are a variety of different point pattern analysis methods available to the GIS
user and some of the key techniques are discussed below with example exercises for you to
complete later.
Distance Measures
One of the most common ways to detect any pattern within a point distribution is to examine the
distances – or spaces – between points (Gatrell et al. 1996), and compare these to another, typically
random, arrangement. Although relatively straightforward to calculate, such measures are particularly
effective in demonstrating what are known as second order effects, described by O’Sullivan and Unwin
(2003: 79) as indicative of some form of “interaction between locations”. In other words, second order
effects demonstrate local patterning or variation which
is distinct from the global pattern or first order effects (Bailey and Gatrell 1995). Two such distance
measures commonly used in point pattern analysis are described below. Nearest neighbour analysis is
explained first followed by Ripley’s K statistic.
Nearest Neighbour Analysis
Nearest neighbour analysis is based upon a solid geographical principle that those objects or
phenomena that are located in close proximity to one another are likely to share similar
properties. This procedure describes the point pattern through calculating the mean distance to
each point’s nearest neighbour (Kitchin and Tate 2000). Then, using relatively simple statistical
analyses that compare the average distance(s) between closest neighbouring point observations
with those of a previously known pattern (typically the analyst would select a random pattern in
this type of analysis) it is possible to establish whether there are clustering or dispersed patterns
within the point distribution. Cluster patterns are defined by the short distances between proximal
neighbouring points, while dispersed point patterns display greater observed average distances
between points when compared to a random distribution. To calculate the expected
average distance between nearest neighbours the following equation is used:
rexp = 1 / (2√(n / A))
Where A is the area of the study location and n is the number of points in the particular
distribution.
Lee and Wong (2001) identify another useful statistic based upon the average distance
information ‐ this is the randomness statistic, and is a simple ratio between observed and
expected distance between point locations.
R = robs / rexp
Where robs is the observed average distance between nearest neighbours and rexp is the
expected average distance between nearest neighbours using the basis of the theoretical
pattern.
Employing this statistic, it is relatively easy to determine whether point distributions follow
clustered, random or dispersed distributions. Where R is less than 1 the data set is
characterised by an increasing cluster tendency; in contrast, patterns with R values greater than 1
assume dispersed spatial behaviour (evenly spaced events).
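The two equations above can be combined into a short Python sketch. The point sets below are hypothetical (an evenly spaced grid and a tight cluster in a 100 x 100 study area) and are not the settlement data of the worked example; the function name nearest_neighbour_index is invented for this illustration, and no edge correction is applied.

```python
import math


def nearest_neighbour_index(points, area):
    """Return (r_obs, r_exp, R) for a list of (x, y) points in a study area."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to every other point; keep the smallest
        distances = [math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i]
        nearest.append(min(distances))
    r_obs = sum(nearest) / n                # observed mean nearest-neighbour distance
    r_exp = 1 / (2 * math.sqrt(n / area))   # expected distance for a random pattern
    return r_obs, r_exp, r_obs / r_exp


area = 100 * 100
# Hypothetical evenly spaced pattern: 16 points, 25 units apart
grid = [(12.5 + 25 * i, 12.5 + 25 * j) for i in range(4) for j in range(4)]
# Hypothetical clustered pattern: 9 points, 1 unit apart
cluster = [(50 + i, 50 + j) for i in range(3) for j in range(3)]

grid_result = nearest_neighbour_index(grid, area)
cluster_result = nearest_neighbour_index(cluster, area)
print("grid R:", round(grid_result[2], 2))
print("cluster R:", round(cluster_result[2], 2))
```

The evenly spaced grid yields R = 2 (dispersed), while the tight cluster yields a value well below 1.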
Nearest Neighbour Analysis: A Worked Example
So taking a hypothetical example of a study of town and city locations within a 100 x 100
kilometre study area, we can begin to establish the mean distances of the different events
and compare this to the expected average distance between nearest neighbours. The region
of interest contains 8 major towns and cities across its 10,000 square kilometre study area
as shown in Figure 2. The location of each settlement is provided in Table 1, as are the
details of distance to closest neighbour. The calculation of the nearest neighbour index is
given after the table.
Figure 2. Point display of settlement distribution in hypothetical study area
Ripley’s K Function
One of the problems associated with the nearest neighbour statistic is that it only considers the
closest neighbour and does not consider other spatial scale effects (O’Sullivan and Unwin 2003,
Mehrer and Westcott 2006). The K function originally developed by Ripley (1976) provides an
opportunity to explore spatial patterning at different spatial scales within the chosen study area.
To calculate K we must visit every event or point in the study area and then establish the mean
number of other points falling within a set distance of the start point (Bailey and Gatrell 1995).
Typically this distance is defined as a circle of radius d and is repeated for different radius values
(O’Sullivan and Unwin 2003) (Figure 3). The mean counts for each circle are then divided by
what is known as the mean intensity of the process – which is in effect the total number of events
or points divided by the study area (Fotheringham et al. 2000).
Figure 3. Determining the K function (source: Bailey & Gatrell (1995:93))
The results of the K function can be presented graphically and help to show at what spatial
scales different pattern behaviours (such as clustering) may occur (Figure 4). When the
observed K value is larger than the expected K value for a particular distance, the distribution is
more clustered than a random distribution at that distance (scale of analysis). When the
observed K value is smaller than the expected K value, the distribution is more dispersed than
a random distribution at that distance. When the observed K value is larger than the Higher
Confidence Envelope value, spatial clustering for that distance is statistically significant. When
the observed K value is smaller than the Lower Confidence Envelope value, spatial dispersion
for that distance is statistically significant.
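The comparison of observed and expected K values can be illustrated with a naive Python sketch. It follows the verbal description above (mean number of other points within distance d, divided by the intensity n / A) and compares the result with the value expected under complete spatial randomness, the area of a circle of radius d. It assumes no edge correction and does not simulate confidence envelopes; the point sets and function names are hypothetical.

```python
import math


def ripley_k(points, area, d):
    """Naive K(d): mean count of other points within d, divided by intensity.

    No edge correction is applied (an assumption of this sketch).
    """
    n = len(points)
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                pairs += 1
    mean_count = pairs / n
    intensity = n / area
    return mean_count / intensity


def expected_k(d):
    """K(d) expected under complete spatial randomness: the area of a circle."""
    return math.pi * d ** 2


area = 100 * 100
# Hypothetical patterns: a tight cluster and an evenly spaced grid
cluster = [(50 + i, 50 + j) for i in range(3) for j in range(3)]
grid = [(12.5 + 25 * i, 12.5 + 25 * j) for i in range(4) for j in range(4)]

k_cluster = ripley_k(cluster, area, 2)
k_grid = ripley_k(grid, area, 20)
print("cluster:", round(k_cluster, 1), "expected:", round(expected_k(2), 1))
print("grid:", round(k_grid, 1), "expected:", round(expected_k(20), 1))
```

Repeating the calculation over a range of d values gives the curve that is plotted against the expected values in the graphical output.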
Figure 4. Point pattern behaviour at different spatial scales (Source: ArcGIS 10.1 help
pages)
Intensity measures
Alternative approaches to measuring point patterns have moved away from basic measures of
distance to the intensity (or density) of points in a given area. One such method is quadrat
analysis, where simply the number of events (points) that occurs within a set of, typically square,
sampling frames is counted. This is used to establish a frequency distribution, which records the
number of events in each individual quadrat. This distribution can then be compared against
another distribution, commonly a random pattern. In a random pattern, the mean number of
points in each quadrat would approximate the variance of the number of points per quadrat. This
can be calculated by the Variance Mean Ratio (VMR), which equals 1 for a random distribution.
Where the VMR is greater than 1 then a cluster pattern is identified. Dispersed patterns are
shown by a VMR of less than 1. This type of method has significant problems, however, most
notably concerning the choice of quadrat size and the fact that it does not consider local density
– only measuring the number of points and not their spatial distribution within a single quadrat.
Thankfully there are a number of other intensity‐based measures, the most significant of which is
the Kernel Density Estimator described next.
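The quadrat approach and the Variance Mean Ratio can be sketched as follows. The quadrat grid (4 x 4 square cells over a 100 x 100 area), the point sets and the function names are assumptions made for this illustration; the sample variance (n - 1 denominator) is used.

```python
from statistics import mean, variance


def quadrat_counts(points, width, height, nx, ny):
    """Count points falling in each cell of an nx-by-ny grid of quadrats."""
    counts = [0] * (nx * ny)
    for x, y in points:
        col = min(int(x / (width / nx)), nx - 1)   # clamp points on the far edge
        row = min(int(y / (height / ny)), ny - 1)
        counts[row * nx + col] += 1
    return counts


def variance_mean_ratio(counts):
    """VMR of quadrat counts: about 1 for random, >1 clustered, <1 dispersed."""
    return variance(counts) / mean(counts)


# Hypothetical evenly spaced pattern: one point in every quadrat
grid = [(12.5 + 25 * i, 12.5 + 25 * j) for i in range(4) for j in range(4)]
# Hypothetical clustered pattern: all 16 points fall in a single quadrat
cluster = [(50 + 0.5 * i, 50 + 0.5 * j) for i in range(4) for j in range(4)]

vmr_grid = variance_mean_ratio(quadrat_counts(grid, 100, 100, 4, 4))
vmr_cluster = variance_mean_ratio(quadrat_counts(cluster, 100, 100, 4, 4))
print("grid VMR:", vmr_grid, "cluster VMR:", vmr_cluster)
```

Changing nx and ny in this sketch is a quick way to see how sensitive the result is to the choice of quadrat size.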
Kernel Density Estimation
The kernel density estimation technique involves the creation of a continuous (raster) surface
which represents the variation in the density of point events in a given study area (Chainey &
Ratcliffe 2005). Specifically the analysis involves the estimation of the density of points
across geographical space using kernels which have a defined search radius (Figure 5).
Figure 5. The kernel function (Source: Chang (2003:282))
The appearance of the resultant raster‐based output is strongly influenced by the choice of kernel bandwidth
– the radius used to search for other points around each event (O’Sullivan & Unwin 2003).
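The influence of the bandwidth can be seen in a small Python sketch of the density estimate at a single location. The quartic kernel used here is an assumption of this example (the handout does not specify a kernel function), and the event coordinates and the function name kernel_density are hypothetical.

```python
import math


def kernel_density(x, y, points, bandwidth):
    """Density at (x, y) from a quartic kernel centred on every point.

    Each kernel integrates to 1, so the surface sums to the number of points.
    """
    total = 0.0
    for px, py in points:
        d = math.hypot(x - px, y - py)
        if d < bandwidth:                      # points beyond the radius add nothing
            u = (d / bandwidth) ** 2
            total += 3 / (math.pi * bandwidth ** 2) * (1 - u) ** 2
    return total


# Hypothetical events: two close together and one further away
events = [(50, 50), (52, 50), (80, 20)]

narrow = kernel_density(51, 50, events, bandwidth=5)
wide = kernel_density(51, 50, events, bandwidth=40)
print("bandwidth 5:", round(narrow, 4), "bandwidth 40:", round(wide, 4))
```

A small bandwidth produces a sharp peak around the two neighbouring events, whereas a large bandwidth spreads the same events over a much wider area and flattens the surface.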
Software environments for point pattern analysis
There is a great deal of specialist software available for all kinds of spatial analysis including
point pattern detection. Many of the standard desktop GIS packages, such as ArcGIS and
IDRISI, include some (admittedly rather limited) point pattern analysis functionality, although
you will find standalone specialist packages such as CrimeStat and R more capable for the
task with a great range of point pattern analysis options available.
ArcGIS
ESRI’s ArcGIS software environment offers users the ability to undertake nearest neighbour, Ripley’s K and
kernel density estimation. This is primarily carried out through the ArcToolbox
function in ArcGIS desktop. You should see the ArcToolbox as a small icon on the main toolbar.
CrimeStat
CrimeStat is a standalone spatial statistical package for the analysis of point-based crime
data. It was created by Ned Levine for the analysis of US crime data and is freely available
for download for educational and research use. It offers a range of measures from basic
centrographic analysis through to complex spatio-temporal modelling.
http://www.icpsr.umich.edu/CRIMESTAT/
R – Statistical Computing
R is a statistical computing environment created by the academic community and freely
available for non-commercial use. Although it is used for many different statistical tasks it has
a very strong spatial statistical component based around a series of additional packages
which can be downloaded and added to the main R GUI interface.
http://www.r-project.org/
Working in ArcGIS
The practical exercise is to be completed using the desktop GIS package ArcGIS available in
the computer labs. Please note that you may not finish this task within the hour or so available
and therefore may need to work on this outside of the GIS lab class. You are required to submit
the map and table outputs from the different analyses and answer the questions set out below.
You should aim to write this up (including any figures) in 2 or 3 sides of A4.
Point Pattern Analysis with ArcGIS
The dataset provided for this practical exercise is:
Lancashire Lung Cancer data – this is a shapefile with the locations of reported lung cancer
incidences in southern Lancashire (Source: Bailey and Gatrell (1995)).
The data are available on S:\Faculty of Science & Engineering\Environmental &
Geographical Sciences\6F6Z2002\Point Pattern Analysis\lung_cancer_lancs
Copy the folder onto your own drive space or alternatively onto a USB flash drive.
Nearest neighbour analysis
1. Open up ArcGIS and connect to the lung_cancer_lancs data folder in your personal
drive space (or USB stick).
2. Add the lung_cancer_lancashire.shp file to your display.
3. You should now see a display like that shown in Figure 6 – you will see that it contains
point data that represent the incidence of lung cancer among the local population.
You may want to change the symbology properties of the data if the default symbol
and colour is not to your liking.
Figure 6. Lung Cancer data from Lancashire (Source: Bailey and Gatrell (1995))
4. Visually examine the lung cancer point dataset for Lancashire. Can you see any
pattern emerge? How might you describe this set of points – clustered, dispersed,
random?
5. Once you have decided upon how to describe this pattern visually the next step is to
see whether there is any statistical basis to this assumption. Here you will employ
the nearest neighbour index. Select ArcToolbox from the toolbar – this is the
little red tool box icon on the toolbar – you should now see a new menu display on
your screen next to Layers. Within ArcToolbox there are numerous different
modules and operations.
6. Find Spatial Statistics Tools and from its submenu select Analyzing Patterns.
Here you will find the option to perform the Average Nearest Neighbour
technique (Figure 7).
Figure 7. Average Nearest Neighbour statistic
7. Select lung_cancer_lancashire as the Input Feature Class. Check the Generate
Report box and click OK (accept all other defaults).
8. ArcGIS should now start processing your data and calculating its nearest neighbour index.
Don’t worry if this takes the computer a minute or two to complete. When it finishes,
open the results (from the main menu: Geoprocessing > Results). What is the value of the
Nearest Neighbour Ratio? Remember to compare it with unity: a ratio below 1 suggests
clustering, while a ratio above 1 suggests dispersion. What pattern do the lung cancer
events show? Double-click on the HTML Report File: Nearest Neighbor_Result.html to open
the graphical output of the results (Figure 7b). Make sure you keep a record of the results.
Figure 7b. Graphical output of NN results using ArcGIS
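As a rough illustration of what the nearest neighbour ratio measures, the sketch below computes it in plain Python: the observed mean distance from each point to its nearest neighbour, divided by the mean distance expected for a random pattern of the same density. The coordinates, the study area value and the function name are hypothetical and are not taken from the lung cancer dataset, and ArcGIS may differ in detail (for example, in how it derives the study area when none is supplied).

```python
import math


def nearest_neighbour_stats(points, area):
    """Return (observed mean NN distance, expected distance, ratio, z-score)."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are needed")

    # Observed: average distance from each point to its closest other point.
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n

    # Expected mean distance for a random pattern with density n / area.
    expected = 0.5 / math.sqrt(n / area)

    # Standard error commonly quoted for this statistic (an assumption here).
    std_error = 0.26136 / math.sqrt(n * n / area)
    z_score = (observed - expected) / std_error
    return observed, expected, observed / expected, z_score


# Hypothetical point sets inside a 10 x 10 study area (area = 100).
area = 100.0
clustered = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
grid = [(2.5, 2.5), (7.5, 2.5), (2.5, 7.5), (7.5, 7.5)]

clustered_ratio = nearest_neighbour_stats(clustered, area)[2]
grid_ratio = nearest_neighbour_stats(grid, area)[2]

print("clustered ratio:", round(clustered_ratio, 3))  # well below 1
print("dispersed ratio:", round(grid_ratio, 3))       # above 1
```

The tightly grouped points give a ratio far below 1, while the evenly spaced points give a ratio above 1, which is the comparison with unity that step 8 asks you to make.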
Kernel Density Estimation (KDE)
9. To estimate a surface that describes the density of the cancer incidences in
Lancashire using the KDE approach, go to the ArcToolbox > Spatial Analyst Tools >
Density menu and double-click on Kernel Density. Select lung_cancer_lancashire
as the Input Feature Class and choose an appropriate output name and location for
the raster that the KDE will create. Leave the rest of the defaults as they
are and click OK. When the KDE algorithm finishes, the raster will be
displayed automatically. You can modify the colour scheme as you see fit,
e.g. Figure 7c. To do so, double-click on the Kernel Density layer, click on the
Symbology tab and then click on the Classify button to modify the number of classes
and the method of classification. You can also use the Symbology of the lung cancer
locations layer to change their symbol from the default dots to x’s (Figure 7c), so
that they do not cover too large an area of the map. Finally, you can click on the 0
density value class in the table of contents to change its colour to transparent
(Figure 7d).
Figure 7c. Density of lung cancer occurrences in Lancashire estimated using the Kernel
Density Estimation tool (number of classes: 9, method: Quantile)
Figure 7d: Changing the colour of individual classes
Are the locations of lung cancer in Lancashire clustered? To assist in the discussion, click on
the little arrow next to the Add Data button in the top menu of ArcMap and select Add
Basemap (Figure 7e):
Figure 7e: Adding a basemap
Using the basemap, can you find out which urban areas, if any, are linked to these clusters? Use
the zoom in tool if you need to. What happens if you modify the search radius option in the
Kernel Density tool window? Try a larger and a smaller radius and visualise the resulting
rasters to compare.
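To see why the search radius matters, the following minimal sketch evaluates a kernel density estimate at single locations in plain Python. It assumes a quartic kernel, which is the form usually described for the ArcGIS Kernel Density tool; the point coordinates, the radii and the function name are hypothetical, and the real tool additionally builds a full raster and chooses a default radius for you.

```python
import math


def quartic_kde(points, x, y, radius):
    """Kernel density at (x, y) using a quartic kernel with the given search radius."""
    total = 0.0
    for px, py in points:
        d = math.hypot(x - px, y - py)
        if d < radius:
            # Points closer to (x, y) contribute more; points beyond the
            # search radius contribute nothing.
            total += (3.0 / math.pi) * (1.0 - (d / radius) ** 2) ** 2
    return total / radius ** 2


# Hypothetical events: a small cluster near (1, 1) and one isolated event.
points = [(1.0, 1.0), (1.2, 1.0), (1.0, 1.2), (5.0, 5.0)]

near_small = quartic_kde(points, 1.05, 1.05, radius=1.0)  # inside the cluster
far_small = quartic_kde(points, 3.0, 3.0, radius=1.0)     # between the groups
far_large = quartic_kde(points, 3.0, 3.0, radius=4.0)     # same place, larger radius

print("cluster, radius 1:", round(near_small, 3))
print("gap, radius 1:    ", round(far_small, 3))
print("gap, radius 4:    ", round(far_large, 3))
```

With the small radius the location between the two groups receives no density at all, whereas the larger radius spreads each event over a wider area and produces a smoother, flatter surface. This is the trade-off to look for when you compare your own rasters.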
K statistical analysis of the lung cancer data
10. Using the lung_cancer_lancashire.shp data select Multi-Distance Spatial Cluster
Analysis (Ripley’s K Function) from the Analyzing Patterns submenu of Spatial
Statistics Tools. You should be prompted with a dialogue box similar to that
shown in Figure 8.
Figure 8. Ripley’s K Function in ArcGIS
11. Select lung_cancer_lancashire as the Input Feature Class and then choose a suitable
name and location for the Output Table. For Compute Confidence Envelope select 99
Permutations. Check the Display Results Graphically box and click OK.
12. After several minutes of processing you should be presented with a dialogue box
which shows how the data are clustered (or dispersed) and how these patterns
change with spatial scale.
13. Are the locations of lung cancer in Lancashire clustered? And is there any variation
with spatial scale? Keep a copy of the output graph for inclusion in your submission.
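The idea behind the K function output can be sketched in plain Python as follows. The code computes the L(d) transformation of Ripley's K without any edge correction and builds a simple envelope from 99 simulated random patterns, which mirrors the 99 Permutations option in spirit only. The point coordinates, the 10 x 10 study area, the distances and all function names are hypothetical, and ArcGIS applies its own defaults for distance bands, study area and boundary correction.

```python
import math
import random


def l_function(points, area, distances):
    """Return L(d) for each distance d (no edge correction)."""
    n = len(points)
    values = []
    for d in distances:
        # Count ordered pairs of distinct points that lie within distance d.
        pairs = sum(
            1
            for i in range(n)
            for j in range(n)
            if i != j
            and math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1]) <= d
        )
        k = area * pairs / (n * (n - 1))
        values.append(math.sqrt(k / math.pi))  # equals about d for a random pattern
    return values


def random_envelope(n, width, height, distances, simulations=99, seed=0):
    """Lowest and highest L(d) over simulated random patterns of n points."""
    rng = random.Random(seed)
    lows = [float("inf")] * len(distances)
    highs = [float("-inf")] * len(distances)
    for _ in range(simulations):
        sim = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        for idx, value in enumerate(l_function(sim, width * height, distances)):
            lows[idx] = min(lows[idx], value)
            highs[idx] = max(highs[idx], value)
    return lows, highs


# Hypothetical pattern: two tight clusters of nine points in a 10 x 10 area.
offsets = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
points = [(2 + dx, 2 + dy) for dx, dy in offsets] + \
         [(8 + dx, 8 + dy) for dx, dy in offsets]

distances = [0.5, 1.0, 2.0]
observed = l_function(points, 100.0, distances)
lows, highs = random_envelope(len(points), 10.0, 10.0, distances)

for d, obs, lo, hi in zip(distances, observed, lows, highs):
    print(f"d={d}: observed L={obs:.2f}, envelope=({lo:.2f}, {hi:.2f})")
```

Where the observed L(d) lies above the upper envelope, the pattern is more clustered at that distance than the simulated random patterns; where it lies below the lower envelope, it is more dispersed. Reading the ArcGIS graph across increasing distances in the same way shows how the pattern changes with spatial scale.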
TASKS
Write up this practical in report format with (i) an introduction, (ii) a description of methods, (iii)
the visual presentation of any maps/result outputs, and (iv) your answers/written discussions to
any set questions below.
Include a map display for the Lancashire lung cancer data. Your map should be presented
separately and include a north arrow, a legend and your name (clearly labelled).
Write a brief commentary describing the pattern of lung cancer events (is any clustering or
other pattern present?) and link this discussion with the results of your nearest neighbour
analysis, kernel density estimation and Ripley’s K analysis. You should also include the output
graphical results and KDE map.
Make sure that your write up makes appropriate use of the supporting academic
literature.
References
Bailey, T.C. & Gatrell, A.C. (1995) Interactive Spatial Data Analysis. Harlow: Prentice Hall.
Boots, B. & Getis, A. (1988) Point Pattern Analysis. London: Sage.
Birkin, M., Clarke, G.P., Clarke, M. & Wilson, A.G. (1996) Intelligent GIS: Location Decisions
and Strategic Planning. Cambridge: Geoinformation.
Chainey, S. & Ratcliffe, J. (2005) GIS and Crime Mapping. London: Wiley.
Chang, K‐T. (2003) Introduction to Geographic Information Systems. Second Edition. Boston:
McGraw Hill.
Diggle, P.J. (2003) Statistical Analysis of Spatial Point Patterns. Second Edition. London:
Arnold.
Fotheringham, A.S., Brunsdon, C. & Charlton, M. (2000) Quantitative Geography:
Perspectives on Spatial Data Analysis. London: Sage.
Gatrell, A.C., Bailey, T.C., Diggle, P.J. & Rowlingson, B.S. (1996) Spatial point pattern
analysis and its application in geographical epidemiology. Trans Inst Br Geogr 21:256-274.
Heywood, I., Cornelius, S. & Carver, S. (2006) An Introduction to Geographical Information
Systems. Third Edition. Harlow: Prentice Hall.
Kitchin, R. & Tate, N.J. (2000) Conducting Research in Human Geography: Theory,
Methodology and Practice. Harlow: Prentice Hall.
Lee, J. & Wong, D.W.S. (2001) Statistical Analysis with ArcView GIS. New York: Wiley.
Levine, N. (2007) CrimeStat: A Spatial Statistics Program for the Analysis of Crime Incident
Locations (v3.1). Ned Levine & Associates, Houston, TX, and the National Institute of
Justice, Washington, DC.
Mehrer, M. & Westcott, K. (2006) GIS and Archaeological Site Location Modeling. CRC
Press.
O’Sullivan, D. & Unwin, D. (2003) Geographic Information Analysis. New Jersey:
Wiley.
Ripley, B.D. (1976) The second-order analysis of stationary point processes. Journal of
Applied Probability 13: 255-266.
Acknowledgements:
The data for this exercise have been created by other researchers and are included in Bailey
and Gatrell (1995).