GIS to Evaluate
Environmental Risk Factors
for Neural Tube Defects
by Kris Martinez
Term Project for
Geographic Information Systems
in Water Resources
Department of Civil Engineering
The University of Texas at Austin
May 2, 1997
In April 1991, three anencephalic babies were delivered
during a single 36-hour period at a hospital in Cameron county, Texas.
Anencephaly is a type of neural tube defect (NTD) that is universally fatal
because the baby is born with large portions of its brain and skull missing.
NTDs are birth defects that result from a failure of the neural tube to
close properly early in the development of an embryo (1).
NTDs also include the serious disorders spina bifida and encephalocele.
The Cameron county cluster was surprising considering that NTD rates in
the United States (U.S.) are usually cited as less than 10 per 10,000 births.
The NTD rate in Cameron county for women who conceived during 1990 and
1991 was 27.1 per 10,000 births (2). Shortly after
the incident, the Texas Department of Health (TDH) began a surveillance
and intervention project aimed at reducing the number of infants born with NTDs.
The main question that needs to be answered is why the Brownsville cluster
occurred. In more general terms, we must ask why the Lower Rio Grande
Valley of Texas had an NTD rate that was almost three times the national
average in 1991. This term report describes an innovative approach
that can be used in conjunction with traditional methods to examine the
role of environmental risk factors in the development of NTDs. This
approach is being used for the GIS component of the Texas Neural Tube Defect
Project (TNTDP) surveillance and intervention project begun in the summer
of 1993. The project addresses the hypothesis that women who have
had an NTD-affected pregnancy are more likely than women with normal pregnancies
to have conceived within a one-mile radius of sites that are potential
sources of environmental pollution. These sites include fields where
agricultural chemicals are applied and industries that must report their
emissions to state and federal agencies.
The base map developed for this project will serve as the backdrop for
the TDH investigation. In order to establish spatial relationships,
the base map must include both potential sources of pollution and the
cases and controls. Selected industries, Superfund sites, permitted
landfills, and wastewater dischargers were all considered. The locations
of these sources were determined and were represented as points on the
base map. Agricultural sources of pollution were taken into account
by including land use-land cover (LULC) coverages which provide a general
idea of where certain broad categories of crop types are grown. Finally,
a hypothetical point coverage of 10 case-women and 20 control-women was
created. The TNTDP defines case-women as women who have had an NTD-affected
pregnancy; control-women have not had an NTD-affected pregnancy.
The term project also describes a simple environmental risk analysis using
GIS. Buffers with a 1-mile radius around each of the imaginary case-
and control-women were generated. The number of subjects conceiving
within one mile of an environmental risk factor was determined by comparing
plotted buffer zones with the point coverages showing the locations of
the various sites. The proportion of case-women living close to a
certain type of pollution source was compared to the proportion of control-women
living near the same type of potential pollution source. A more thorough
evaluation of risk would involve rates of incidence, odds ratios and 95%
confidence intervals. These methods can be easily incorporated into
a GIS approach, but are not described in this report.
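As a rough illustration of the odds ratio and 95% confidence interval mentioned above, the following Python sketch uses the common log-based (Woolf) approximation. The counts are hypothetical and are not taken from the TNTDP; the function name is an assumption made for this example.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower, upper) using the Woolf log-based interval."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: 4 of 10 case-women and 5 of 20 control-women
# conceived within one mile of a given type of source.
odds, low, high = odds_ratio_ci(4, 6, 5, 15)
print(f"OR = {odds:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With samples this small the interval is very wide, which is one reason a larger study population is needed before drawing conclusions.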
The base map for this term project covers the two southernmost counties
in Texas, Cameron and Hidalgo. These two counties account for half
of the NTD occurrence along the border between Texas and Mexico. The
NTD rates for Cameron and Hidalgo counties are currently one and one-half
times the national average. Much like the border region joining California
and Mexico, population growth in the area is high. High growth rates
contribute to problems with crowded housing, less than adequate water and
sewage sanitation and poor prenatal care. Low socio-economic status
(SES) has repeatedly been shown to be associated with increased incidence
of NTDs.
Cameron and Hidalgo counties are predominantly agricultural
at this time. However, the passage of the North American Free Trade
Agreement (NAFTA) has prompted industry to begin introducing new facilities
into the area. There are environmental risks associated with both
agriculture and industry. Considering agriculture, an increased prevalence
of NTDs has been observed in areas with a high use of pesticides
(3) . Exposure to agricultural chemicals can occur
through contact with airborne contaminants or through the consumption of
vegetables and the ingestion of polluted drinking water. Rapid
industrialization, for its part, introduces greater contaminant levels from toxic
air and water emissions. An increased number of hazardous waste
sites can also be expected.
Together, Cameron and Hidalgo counties have over a
half-million acres of cultivated farmland. In 1995, 240 thousand
acres of cotton and 240 thousand acres of sorghum were produced in these
counties, making these the primary crops grown in the Rio Grande Valley
(4). Cotton requires a greater variety and amount
of pesticides compared to other crops. The most widely used pesticides
in Cameron and Hidalgo counties are azinphos-methyl, methyl parathion and
trifluralin. The Environmental Protection Agency (EPA) has found
that methyl parathion has adverse reproductive effects (5).
Because of the primarily agricultural nature of the area, and the predominance
of crops requiring extensive use of pesticides, a comparison of rural and
urban subjects is essential.
The primary objective of this term project was to develop a base map for
the TDH investigation of risk factors associated with NTDs. A secondary
objective was to begin looking at how GIS could be used in a statistical
analysis of the relationships between where a woman conceived and her proximity
to various sources of pollution. The base map establishes a spatial
context for these relationships. In order to allow others to use
and expand upon the information that was collected for this project, it
was necessary to develop the base map in a broadly applicable coordinate
system. For this reason, all of the collected data layers were projected
into the Texas Statewide Mapping System (TSMS). TSMS is a standard
coordinate system recommended by the Texas GIS Planning Council.
Another goal was to adequately explain each of the coverages and make them
easily accessible to others. A data dictionary describing each of
the data layers has been included in the Appendix
to this report. In the future, it will probably be necessary to document
all of this information in accordance with federal standards for metadata.
The base map created for this report was designed to show locational relationships
between the different elements of the TDH study. It includes point
coverages of different sources of environmental pollution such as industries
reporting under the Toxic Release Inventory (TRI), wastewater treatment
facilities reporting under the National Pollutant Discharge Elimination
System (NPDES), Superfund sites and permitted landfills. Land use
and land cover files were obtained to show the extent of rural and urban
areas in Cameron and Hidalgo counties. Various other features, including
city and county boundaries, streets and highways, and water bodies were
also included to better define the study region. Finally, Digital
Raster Graphic (DRG) files containing United States Geological Survey (USGS)
7.5-minute quads were obtained for selected areas. These files produce
geographically referenced images that can be used as paper-map backgrounds
for comparison with actual coverages.
This report will discuss three main components of the base map development.
In the initial stages of the project, several coverages were created to
define the area of study. These included city and county boundaries
and distinguishing features like rivers and major highways. Subsequently,
potential sources of pollution were mapped using the data layers described
above. Addresses were obtained for many of the TRI and NPDES sites.
A street database was geocoded making it possible to compare points located
on the map using latitudes and longitudes with points defined by addresses.
This can be used to ensure the accuracy of an industry location, for example.
Finally, a coverage was created pinpointing the locations of a hypothetical
group of case- and control-women at the time of conception. It is
emphasized that these subjects are imaginary and were created to show how
GIS can be used to quantify spatial relationships.
Defining the Area of Study
Beginning with a county coverage of the entire U.S., it is necessary to
select only Cameron and Hidalgo counties and create a separate coverage.
In order to do this, a query is performed in ArcView by clicking on the
query button. When the query box appears, select just the state of Texas by clicking
on [State_name] under fields, then the equals operator,
and then under values, click "Texas." The query can then be repeated
to select Cameron and Hidalgo counties. In order to do this, the
FIPS code for each county must be known. The FIPS code for the state
of Texas is '48' and the codes for Cameron and Hidalgo counties are '48061'
and '48215,' respectively. The result of this process is shown in
Figure 1.
Figure 1. Selecting the Study Area in ArcView
Once the study area is selected, it can be converted to a shapefile by
using the menu option Theme/Convert to Shapefile. Many of
the coverages for this term project were created using the query process.
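The same attribute selection can be sketched outside ArcView. The short Python example below filters a list of county records by the FIPS codes given above; the record layout and field names are assumptions made for illustration, not the structure of the actual county coverage.

```python
# FIPS codes for Cameron and Hidalgo counties, as given in the report.
STUDY_FIPS = {"48061", "48215"}


def select_study_counties(records, fips_codes=STUDY_FIPS):
    """Keep only the county records whose FIPS code is in the study area."""
    return [rec for rec in records if rec["fips"] in fips_codes]


# Illustrative attribute records (hypothetical layout).
counties = [
    {"name": "Cameron", "state_name": "Texas", "fips": "48061"},
    {"name": "Hidalgo", "state_name": "Texas", "fips": "48215"},
    {"name": "Travis", "state_name": "Texas", "fips": "48453"},
    {"name": "Willacy", "state_name": "Texas", "fips": "48489"},
]

study_area = select_study_counties(counties)
print([rec["name"] for rec in study_area])
```

Selecting on the five-digit code rather than the county name avoids matching counties in other states that share a name.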
Map projections are used to represent the curved surface of the earth on
a two-dimensional surface like a piece of paper. There are many different
types of projections. Consequently, it is important to use a common
projection that is applicable in most situations. The TNTDP will use
a coordinate system called the Texas Statewide Mapping System (TSMS).
Specific parameters for TSMS are provided below.
1st Standard Parallel: 27 25 0.0
2nd Standard Parallel: 34 55 0.0
Central Meridian: -100 0 0.0
Latitude of Origin: 31 10 0.0
False Northing: 1000000.0
False Easting: 1000000.0
In order to describe projection and some of the problems that can
be encountered, the process of obtaining a land use-land cover (LULC) file
for the project will be described. The import file for the Brownsville
area, lbr25097.e00, was downloaded from the EPA site earth1.epa.gov.
This file was imported (decompressed) into a coverage using the following
command:
Arc: import grid lbr25097 browns
Browns is the name of the output coverage. In order to project the
coverage into a different coordinate system, it is important to know what
the original projection was.
Arc: describe browns
Along with other information, the coverage description states that Browns
is in an Albers projection. The TSMS uses a Lambert projection.
It is important to note that the coverage cannot be transformed into a
Lambert projection in one step. It must first be defined in a base
coordinate system called Geographic. My original attempts at direct
conversion resulted in incorrect straight-line images when the coverage
was brought up in ArcView. Specific parameters for a Geographic coordinate
system are presented below.
Units: dd (decimal degrees)
After the coverage is converted into Geographic coordinates, it can be
successfully projected into TSMS using the project command.
An output coverage must be specified. In this case the coverage was
named lulc_cam, as in the land use-land cover data layer for Cameron county.
Arc: project cover browns lulc_cam
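To make the step from Geographic coordinates to TSMS more concrete, the following Python sketch applies the standard Lambert conformal conic forward equations with the TSMS parameters listed in this report. The GRS80 ellipsoid is an assumption, since the report does not name a datum, and the function name tsms_forward is hypothetical. It is an illustration of the arithmetic, not a replacement for the Arc/Info project command.

```python
import math

# Ellipsoid constants: GRS80 is ASSUMED here; the report does not state a datum.
SEMI_MAJOR = 6378137.0
FLATTENING = 1 / 298.257222101
ECC = math.sqrt(FLATTENING * (2 - FLATTENING))

# TSMS parameters as listed in the report.
PHI1 = math.radians(27 + 25 / 60)   # 1st standard parallel
PHI2 = math.radians(34 + 55 / 60)   # 2nd standard parallel
LAT0 = math.radians(31 + 10 / 60)   # latitude of origin
LON0 = math.radians(-100.0)         # central meridian
FALSE_EASTING = 1000000.0
FALSE_NORTHING = 1000000.0


def _m(phi):
    return math.cos(phi) / math.sqrt(1 - (ECC * math.sin(phi)) ** 2)


def _t(phi):
    es = ECC * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (ECC / 2)


# Cone constant and scaling factor derived from the two standard parallels.
CONE_N = (math.log(_m(PHI1)) - math.log(_m(PHI2))) / (
    math.log(_t(PHI1)) - math.log(_t(PHI2)))
CONE_F = _m(PHI1) / (CONE_N * _t(PHI1) ** CONE_N)


def _rho(phi):
    return SEMI_MAJOR * CONE_F * _t(phi) ** CONE_N


def tsms_forward(lat_deg, lon_deg):
    """Convert decimal degrees to TSMS easting and northing in meters."""
    phi = math.radians(lat_deg)
    theta = CONE_N * (math.radians(lon_deg) - LON0)
    rho = _rho(phi)
    x = FALSE_EASTING + rho * math.sin(theta)
    y = FALSE_NORTHING + _rho(LAT0) - rho * math.cos(theta)
    return x, y


# Approximate location of Brownsville, used only as an example point.
print(tsms_forward(25.9, -97.5))
```

A quick sanity check is that the projection origin (31 10 0.0 N, 100 0 0.0 W) maps to the false easting and false northing, and that points in the study area fall east of the central meridian and south of the origin.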
I combined the LULC data layer with the county coverage using a query process
to select out the features of LULC that were outside of the county boundaries.
I did a theme-on-theme selection in ArcView by choosing the menu item Theme/Select
by Theme. Using the LULC layer as the active theme, its features
were intersected with the boundaries of County by clicking New Set
as shown in Figure 2.
Figure 2. Intersecting Coverages
The final step in defining the study area was to add geographical reference
points to make it easier to compare the important relationships between
subjects and sources of pollution. Labels were added to identify
Cameron and Hidalgo counties and a number of larger cities and towns.
This allows the viewer to distinguish between a case in Harlingen and a
case in Brownsville, for instance. Labels are created in ArcView
by selecting a theme and choosing the menu item Theme/Auto Label.
If there are a lot of features associated with a coverage, it is possible
to label a selected few by zooming in on them and clicking the button
Label Only Features in View Extent. Using these procedures,
a map can be created to simplify a study region. A map showing some
important cities and towns in the TNTDP investigation is presented in Figure 3.
Figure 3. Selected Cities in Cameron and Hidalgo Counties
Point coverages were created for several different pollution sources.
In order to define a location on the base map, it was necessary to know
a latitude and a longitude for each point. The coverages for Superfund
sites and permitted landfills were downloaded directly from the Texas Natural
Resource Conservation Commission (TNRCC)
Regulated Sites web page. The EPA's Envirofacts
database provided addresses and Global Positioning System (GPS) coordinates
for the majority of the industries reporting under TRI. A general
procedure was followed to transform the raw data into a layer for the base map.
The first step is to create a table that can be loaded into ArcView.
ArcView is able to work with database files (*.dbf) and delimited text
files (*.txt) along with its own program-specific INFO files. Most
spreadsheet and database applications allow their files to be transformed
into these generic formats. Tables are imported by opening up a View
window and clicking on the Tables
button and then the Add button. This action places the table in the
ArcView environment but it does not create a theme (an ArcView coverage).
To do this, select the menu item Theme/Add Event Theme and specify
the name of the table and the fields containing the x- and y-coordinates.
This step allows data from the table to be used to create a new theme.
After the theme is created, it can be changed into a shapefile as previously described.
In order to project the theme, it must be turned into a coverage that Arc/Info
can process. The following commands perform this transformation.
Afterward, the coverage can be projected.
Arc: shapearc industry1 industry2
Arc: build industry2 points
Arc: addxy industry2
A map showing the location of industries reporting under TRI regulations
is presented in Figure 4.
Figure 4. Toxic Release Inventory
Sites in Cameron County
Geocoding can be used to create a geographically
referenced point coverage using addresses instead of latitudes and longitudes.
This is very useful when GPS coordinates are not available. ArcView
can geocode a location when a street address and a zip code (or city) are
specified. In order to map specific locations, a coverage or table
containing the links between streets and geographic coordinates is necessary.
Topologically Integrated Geographic Encoding and Referencing (TIGER)
files created by the Census Bureau can be used for this purpose.
Once the reference table or coverage is obtained, a geocoding index is
created in ArcView by selecting Theme/Properties and clicking on the Geocoding icon.
Latitudes and longitudes were found for all of the point sources in this
project. Addresses were also obtained for the TRI and NPDES sites.
When both types of information are available, addresses and GPS coordinates
can be compared for quality control purposes. For the TNTDP, it will
be important to have accurate locations for all of the point coverages
in order to determine the exact distances between case- and control-women
and risk factors. When the reference table is available, clicking on the
Locate Address button and specifying the street address and zip code will allow the program
to locate a point on the map. The location
defined by the address can then be compared to the location defined by
the latitude and longitude. Two different locations for the same
industrial facility are shown in Figure 5.
Figure 5. Comparing Two Point
Locations for a Single Industry
In Figure 5, either point location may be incorrect. However, ArcView
has the ability to evaluate how accurately an address location can be pinpointed.
When ArcView geocodes, it calculates a score based on the specified address.
A perfect match between an address and a geographically referenced location
gives a score of 100. If the address for the industry above had received
a perfect score, one could be reasonably confident that the GPS coordinates defining
the other point were inaccurate.
The reference line is provided as a gauge of the relative differences between
the points. If the address location was found to be correct, one
would need to either use that point location or evaluate whether one-tenth
of a mile is an acceptable level of error given the source and type of
potential pollution involved. Most of the time, you are looking at
a coverage consisting of many points. In this case, it becomes necessary
to look at relative amounts of error between the locations determined by
addresses and those determined by latitudes and longitudes. This
can be a time-consuming process.
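When many points must be checked, the comparison between address-based and coordinate-based locations can be automated. The Python sketch below computes the great-circle distance between the two locations of each site and flags those that differ by more than one-tenth of a mile. The site records, field names and spherical Earth radius are assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; a spherical approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def flag_mismatches(sites, tolerance_miles=0.1):
    """Return (site id, distance) for sites whose two locations disagree."""
    flagged = []
    for site in sites:
        distance = haversine_miles(*site["gps"], *site["geocoded"])
        if distance > tolerance_miles:
            flagged.append((site["id"], distance))
    return flagged


# Hypothetical sites: (latitude, longitude) from GPS and from geocoding.
sites = [
    {"id": "A", "gps": (25.9300, -97.4800), "geocoded": (25.9300, -97.4800)},
    {"id": "B", "gps": (25.9300, -97.4800), "geocoded": (25.9330, -97.4800)},
]

for site_id, miles in flag_mismatches(sites):
    print(site_id, round(miles, 3))
```

Flagged sites would still need to be reviewed by hand, since the calculation only shows that the two locations disagree, not which one is wrong.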
Outlining Spatial Relationships
Once the base map has been defined and the locations of the pollution sources
have been determined, risk factors can be spatially quantified by mapping
the locations of case- and control-women at the time of conception and
establishing buffer zones. Buffer zones define a circular area of
a specified radius surrounding a subject. The number of subjects
living within the predetermined buffer distance can then be calculated
for each category of pollution source. Using this methodology, the
proportion of case-women living close to a certain type of pollution source
can be compared to the proportion of control-women living near the same
type of pollution source.
For this term project, a point coverage of a hypothetical group of 10 case-women
and 20 control-women was created in ArcView. For the TNTDP, actual
GPS coordinates for the subjects are determined. A table containing
a unique identification number for each subject along with the latitude
and longitude of the location where they conceived can be imported into
ArcView and transformed into a point coverage using the procedure described
for mapping potential risk factors. Buffer zones of 1-mile, 1/2-mile,
and 1/4-mile have been established for the study. A buffer zone coverage
can be generated around the point locations of the hypothetical subjects
using the following command:
Arc: buffer subjects buffercov # # 1600 # point
In the above command, Subjects is the coverage to be buffered and Buffercov
is the name of the coverage to be created. The #-signs tell Arc/Info
to use default values for some options. The buffer distance of one
mile is specified as 1600 meters (a rounded value; one mile is about 1,609 meters)
because the coverage units are defined as meters in TSMS. Finally, the word point signifies
that the coverage to be buffered is a point coverage. If the coverage
type is not specified, the program will run properly but the coverage will
be incorrect when it is displayed in ArcView.
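The buffer test itself reduces to a distance comparison once all coverages share a projected coordinate system in meters. The following Python sketch counts how many subjects have at least one pollution source inside their buffer. The coordinates are hypothetical TSMS-style values, and the function names are assumptions for this example.

```python
import math

MILE_IN_METERS = 1609.344


def within_buffer(subject_xy, source_xy, radius_m=MILE_IN_METERS):
    """True if the source lies inside the circular buffer around the subject."""
    dx = subject_xy[0] - source_xy[0]
    dy = subject_xy[1] - source_xy[1]
    return math.hypot(dx, dy) <= radius_m


def count_exposed(subjects, sources, radius_m=MILE_IN_METERS):
    """Count subjects with at least one source inside their buffer."""
    return sum(
        1 for subject in subjects
        if any(within_buffer(subject, source, radius_m) for source in sources)
    )


# Hypothetical projected coordinates in meters (easting, northing).
subjects = [(1000000.0, 400000.0), (1005000.0, 400000.0), (1001000.0, 401000.0)]
sources = [(1000500.0, 400000.0)]

print(count_exposed(subjects, sources))                       # 1-mile buffer
print(count_exposed(subjects, sources, MILE_IN_METERS / 4))   # 1/4-mile buffer
```

Running the count for the 1-mile, 1/2-mile and 1/4-mile distances gives the numbers needed to compare case-women and control-women at each buffer size.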
As described earlier, GIS is used to evaluate spatial relationships for
two important issues: rural and urban trends for subject conception
locations and proximity to potential sources of pollution, including landfills,
Superfund sites, TRI sites, and NPDES sites. For the rural and urban
comparison, the LULC data layers were used to distinguish between different
broad categories of land use within Cameron and Hidalgo counties.
Point locations for the imaginary case-women (red) and control-women (blue)
can be compared to land use by looking at Figure 6.
Figure 6. Rural and Urban Trends
for Case- and Control-Women
Some trends can be inferred from just glancing at the figure.
The large amount of area identified in the map as agricultural (green)
is consistent with the fact that Cameron county has over 200-thousand acres
of cultivated farmland. However, the case- and control-women are
concentrated in the urban areas (yellow). Differences between proportions
of cases and controls living in rural or urban areas are not immediately
obvious. However, these relationships can be quantified and are discussed
in the Results section of this report.
Evaluating spatial relationships for specific categories of pollution sources
is accomplished in a similar manner. The point coverage for the source
is displayed in ArcView along with the buffer coverages for the subjects.
The proximity of case- and control-women to TRI sites in Cameron county
can be examined in Figure 7.
Figure 7. Case and Control
Proximity to Toxic Release Inventory Sites
The conclusion that can be immediately gathered from the figure is that
TRI sites occur predominantly in urban areas. This observation is
consistent with the previous statements made about the relationships between
increased industrialization and greater contaminant levels from toxic emissions.
Once again, differences between case and control proximity to TRI sites
are not visually apparent but can be easily quantified as described below.
The spatial relationships that are inferred from looking at subject proximities
to environmental risk factors are easily quantified. Once buffer
zones are plotted around cases and controls, the area within and outside
of the boundary becomes defined by distances. As an example, buffer
zones were used to compare the proportion of the imaginary subjects that
were living in rural areas at the time of conception with those living
in urban areas. A rural subject was defined as a woman who was surrounded
by at least one mile of agricultural land, as defined by LULC data.
All of the other subjects were considered to have conceived in an urban
area. The number of cases and controls meeting each type of criteria
was counted. This number was then divided by the total number of
cases or controls and multiplied by 100 to arrive at a percentage.
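The percentage calculation described above can be sketched in a few lines of Python. The subject list below is hypothetical and is not the data behind Table 1; the function name and the tuple layout are assumptions for this example.

```python
from collections import Counter


def percent_by_setting(subjects):
    """Percentage of each group (case or control) found in each setting.

    subjects is a list of (group, setting) tuples, for example
    ("case", "rural").
    """
    totals = Counter(group for group, _ in subjects)
    counts = Counter(subjects)
    return {
        (group, setting): 100.0 * n / totals[group]
        for (group, setting), n in counts.items()
    }


# Hypothetical sample: 10 case-women and 20 control-women.
subjects = (
    [("case", "rural")] * 2 + [("case", "urban")] * 8
    + [("control", "rural")] * 4 + [("control", "urban")] * 16
)

for key, pct in sorted(percent_by_setting(subjects).items()):
    print(key, pct)
```

Dividing by the size of each group, rather than by the total number of subjects, keeps the case and control percentages comparable even though the groups differ in size.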
The trends for the hypothetical sample of subjects are shown in Table 1.
Table 1. Urban and Agricultural
Given a sufficiently large sample population, it could be concluded
that the majority of cases and controls conceived in urban areas.
The conclusion could be reinforced by looking at census data. The
fact that there are no differences between the percentages of cases and
controls living in either type of area would suggest that risk factors
associated with urban areas were at the same level as those associated
with rural areas. This assessment could be further investigated by
looking at spatial relationships for specific types of pollution.
The procedure outlined above for rural and urban subject comparisons could
be used to examine risk factors associated with different types of pollution.
Looking at different categories of sources cannot provide any information
about specific contaminants, but it can provide a sense of which pathways
would merit further investigation. For example, if the percentage
of cases living near landfills was significantly higher than the percentage
for controls, one could consider the possibility that groundwater contamination
by leachates should be examined. A higher percentage of cases
near TRI sites would point in the direction of air emissions and suggest
that hard data for actual industrial releases be collected. Spatial
relationships between the hypothetical subjects and different types of
sources are presented in Table 2.
Table 2. Subject Comparison
for Specific Types of Potential Pollution Sites
At the present time, the rate of NTDs in Cameron and Hidalgo counties remains
at one and one-half times the national average. Members of the community
along the border between Texas and Mexico have voiced their concerns that
an environmental agent could be responsible for the elevated rate.
One of the ways in which environmental effects can be assessed is by looking
at spatial trends in conjunction with traditional methods, such as demographics.
GIS is a powerful and innovative tool for spatial analysis. Consequently,
it provides an opportunity for uncovering relationships that might not
have been discovered in the past. This term report serves as a brief
introduction to an important study that is now underway at the Texas Department
of Health. It is important in that it addresses a problem that is
common to the entire border region: a high prevalence of NTDs. In addition,
the approach that is being developed can be applied to any environmental
risk factor to determine whether or not it has actually caused a negative
impact on human health.
1. Sever, L. The Conundrum of Birth Defects Clusters. Health and Environment Digest,
2. Texas Department of Health. An Investigation of a Cluster of Neural Tube Defects in Cameron County, Texas. July 1, 1992.
3. Field B., Kerr C. Herbicide use and incidence of neural-tube defects. Lancet i:1341-1342,
4. Texas Agricultural Statistics Service. Texas Agricultural Statistics. 1995.
5. U.S. Environmental Protection Agency. 1994.