Using GIS to Evaluate
Environmental Risk Factors
for Neural Tube Defects

by Kris Martinez

Term Project for
Geographic Information Systems
in Water Resources
Department of Civil Engineering
The University of Texas at Austin

May 2, 1997






Introduction

In April 1991, three anencephalic babies were delivered during a single 36-hour period at a hospital in Cameron County, Texas.  Anencephaly is a type of neural tube defect (NTD) that is universally fatal because the baby is born with large portions of its brain and skull missing.  NTDs are birth defects that result from a failure of the neural tube to close properly early in the development of an embryo (1).  Other NTDs include the serious disorders spina bifida and encephalocele.  The Cameron County cluster was surprising given that NTD rates in the United States (U.S.) are usually cited as less than 10 per 10,000 births.  The NTD rate in Cameron County for women who conceived during 1990 and 1991 was 27.1 per 10,000 births (2).  Shortly after the incident, the Texas Department of Health (TDH) began a surveillance and intervention project aimed at reducing the number of infants born with NTDs.

The main question is why the Brownsville cluster occurred.  More generally, we must ask why the Lower Rio Grande Valley of Texas had an NTD rate almost three times the national average in 1991.  This term report describes an innovative approach that can be used in conjunction with traditional methods to examine the role of environmental risk factors in the development of NTDs.  The approach is being used for the GIS component of the Texas Neural Tube Defect Project (TNTDP), a surveillance and intervention project begun in the summer of 1993.  The project addresses the hypothesis that women who have had an NTD-affected pregnancy are more likely than women with normal pregnancies to have conceived within a one-mile radius of sites that are potential sources of environmental pollution.  These sites include fields where agricultural chemicals are applied and industries that must report their emissions to state and federal agencies.

The base map developed for this project will serve as the backdrop for the TDH investigation.  To establish spatial relationships, the base map must include both the potential sources of pollution and the cases and controls.  Selected industries, Superfund sites, permitted landfills, and wastewater dischargers were all considered.  The locations of these sources were determined and represented as points on the base map.  Agricultural sources of pollution were taken into account by including land use-land cover (LULC) coverages, which give a general idea of where certain broad categories of crops are grown.  Finally, a hypothetical point coverage of 10 case-women and 20 control-women was created.  The TNTDP defines case-women as women who have had an NTD-affected pregnancy; control-women have not had an NTD-affected pregnancy.

The term project also describes a simple environmental risk analysis using GIS.  Buffers with a 1-mile radius were generated around each of the imaginary case- and control-women.  The number of subjects who conceived within one mile of an environmental risk factor was determined by overlaying the buffer zones on the point coverages showing the locations of the various sites.  The proportion of case-women living close to a certain type of pollution source was then compared to the proportion of control-women living near the same type of source.  A more thorough evaluation of risk would involve incidence rates, odds ratios, and 95% confidence intervals.  These methods can easily be incorporated into a GIS approach but are not described in this report.
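Although a full risk evaluation is outside the scope of this report, the odds ratio mentioned above follows directly from the same buffer counts. The sketch below is a minimal illustration only, assuming a 2x2 table of cases and controls that are or are not within one mile of one type of source, and using Woolf's logit method for the 95% confidence interval. The function name and the counts in the usage line are hypothetical and are not results from this project.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval for a 2x2 table.

    a: cases near a source       b: cases not near a source
    c: controls near a source    d: controls not near a source
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        # The simple logit interval is undefined when any cell is zero.
        raise ValueError("zero cell: use a continuity correction or an exact method")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 4 of 10 cases and 5 of 20 controls within one mile.
print(odds_ratio_ci(4, 6, 5, 15))
```

With groups as small as 10 cases and 20 controls, the resulting interval is wide, so an odds ratio of 2 in such a sample would not by itself indicate an elevated risk.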

Background

The base map for this term project covers the two southernmost counties in Texas, Cameron and Hidalgo.  These two counties account for half of the NTD occurrences along the border between Texas and Mexico.  The NTD rates for Cameron and Hidalgo counties are currently one and one-half times the national average.  As in the border region between California and Mexico, population growth in the area is high.  High growth rates contribute to crowded housing, inadequate water and sewage sanitation, and poor prenatal care.  Low socio-economic status (SES) has repeatedly been shown to be associated with increased incidence of NTDs.

Cameron and Hidalgo counties are currently predominantly agricultural.  However, the passage of the North American Free Trade Agreement (NAFTA) has prompted industry to begin building new facilities in the area.  There are environmental risks associated with both agriculture and industry.  On the agricultural side, an increased prevalence of NTDs has been observed in areas with high pesticide use (3).  Exposure to agricultural chemicals can occur through contact with airborne contaminants, consumption of contaminated vegetables, or ingestion of polluted drinking water.  On the industrial side, rapid industrialization introduces greater contaminant levels from toxic air and water emissions, and an increased number of hazardous waste sites can also be expected.

Together, Cameron and Hidalgo counties have over a half-million acres of cultivated farmland.  In 1995, 240,000 acres of cotton and 240,000 acres of sorghum were grown in these counties, making these the primary crops of the Rio Grande Valley (4).  Cotton requires a greater variety and amount of pesticides than other crops.  The most widely used pesticides in Cameron and Hidalgo counties are azinphos-methyl, methyl parathion, and trifluralin.  The Environmental Protection Agency (EPA) has found that methyl parathion has adverse reproductive effects (5).  Because the area is primarily agricultural and its predominant crops require extensive use of pesticides, a comparison of rural and urban subjects is essential.

Objectives

The primary objective of this term project was to develop a base map for the TDH investigation of risk factors associated with NTDs.  A secondary objective was to begin looking at how GIS could be used in a statistical analysis of the relationships between where a woman conceived and her proximity to various sources of pollution.  The base map establishes a spatial context for these relationships.  In order to allow others to use and expand upon the information that was collected for this project, it was necessary to develop the base map in a broadly applicable coordinate system.  For this reason, all of the collected data layers were projected into the Texas Statewide Mapping System (TSMS).  TSMS is a standard coordinate system recommended by the Texas GIS Planning Council.  Another goal was to adequately explain each of the coverages and make them easily accessible to others.  A data dictionary describing each of the data layers has been included in the Appendix to this report.  In the future, it will probably be necessary to document all of this information in accordance with federal standards for metadata.

Procedure

The base map created for this report was designed to show locational relationships between the different elements of the TDH study.  It includes point coverages of different sources of environmental pollution, such as industries reporting under the Toxic Release Inventory (TRI), wastewater treatment facilities reporting under the National Pollutant Discharge Elimination System (NPDES), Superfund sites, and permitted landfills.  Land use and land cover files were obtained to show the extent of rural and urban areas in Cameron and Hidalgo counties.  Various other features, including city and county boundaries, streets and highways, and water bodies, were also included to better define the study region.  Finally, Digital Raster Graphic (DRG) files containing United States Geological Survey (USGS) 7.5-minute quads were obtained for selected areas.  These files are geographically referenced scanned images of the paper maps and can be displayed as backgrounds for comparison with the actual coverages.

This report discusses three main components of the base map development.  In the initial stages of the project, several coverages were created to define the area of study, including city and county boundaries and distinguishing features like rivers and major highways.  Next, potential sources of pollution were mapped using the data layers described above.  Addresses were obtained for many of the TRI and NPDES sites, and these addresses were geocoded against a street database.  This made it possible to compare points located on the map by latitude and longitude with points located by address, which can be used, for example, to check the accuracy of an industry location.  Finally, a coverage was created pinpointing the locations of a hypothetical group of case- and control-women at the time of conception.  It is emphasized that these subjects are imaginary and were created only to show how GIS can be used to quantify spatial relationships.

Defining the Area of Study

Beginning with a county coverage of the entire U.S., the first step is to select only Cameron and Hidalgo counties and create a separate coverage.  To do this, a query is performed in ArcView by clicking the hammer (Query Builder) icon.  When the query dialog appears, select the state of Texas by clicking [State_name] under Fields, then the = operator, and then "Texas" under Values.  The query can then be repeated to select Cameron and Hidalgo counties, which requires knowing the FIPS code of each county.  The FIPS code for the state of Texas is '48', and the codes for Cameron and Hidalgo counties are '48061' and '48215', respectively.  The result of this process is shown in Figure 1.

 

Figure 1.  Selecting the Study Area in ArcView 
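The two-step query can be mimicked outside ArcView as a filter on the county attribute table. This is a minimal illustration only: the record layout (fields state_name, fips, and name) is a hypothetical stand-in for the coverage's attribute table, and only the Cameron and Hidalgo codes come from the text; the other rows are placeholders.

```python
TEXAS_FIPS = "48"
STUDY_COUNTY_CODES = ("061", "215")  # Cameron, Hidalgo (three-digit county part)

# Hypothetical attribute records; rows other than Cameron and Hidalgo are placeholders.
counties = [
    {"state_name": "Texas", "fips": "48061", "name": "Cameron"},
    {"state_name": "Texas", "fips": "48215", "name": "Hidalgo"},
    {"state_name": "Texas", "fips": "48999", "name": "Placeholder TX county"},
    {"state_name": "Other", "fips": "99061", "name": "Placeholder county with code 061"},
]


def select_study_area(records):
    """Mimic the two queries: first the state, then the five-digit county FIPS codes."""
    in_texas = [r for r in records if r["state_name"] == "Texas"]
    wanted = {TEXAS_FIPS + code for code in STUDY_COUNTY_CODES}  # {"48061", "48215"}
    return [r for r in in_texas if r["fips"] in wanted]


print([r["name"] for r in select_study_area(counties)])  # ['Cameron', 'Hidalgo']
```

Because a five-digit FIPS code is unique nationwide, the second filter alone would suffice; the state filter simply mirrors the order of the ArcView queries. Matching on the three-digit county part alone, without the state filter, would wrongly pick up the last placeholder row.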

Once the study area is selected, it can be converted to a shapefile by using the menu option Theme/Convert to Shapefile.  Many of the coverages for this term project were created using the query process.

Map projections represent the curved surface of the earth on a flat surface such as a sheet of paper.  Because there are many different types of projections, it is important to adopt one common projection that is applicable in most situations.  The TNTDP will use a coordinate system called the Texas Statewide Mapping System (TSMS).  Specific parameters for TSMS are provided below.  To illustrate the projection process and some of the problems that can be encountered, the steps used to obtain a land use-land cover (LULC) file for the project are described here.  The Arc/Info export file for the Brownsville area, lbr25097.e00, was downloaded from the EPA site earth1.epa.gov.  This file was imported (decompressed) into a coverage using the following command.

Arc:  import cover lbr25097 browns 

Here, browns is the name of the output coverage.  Before the coverage can be projected into a different coordinate system, its original projection must be known.

Arc:  describe browns 

Along with other information, the coverage description states that browns is in an Albers projection, whereas TSMS uses a Lambert projection.  Note that the coverage could not be transformed into the Lambert projection in one step; it first had to be converted into the base coordinate system called Geographic (latitude and longitude).  My original attempts at direct conversion produced incorrect straight-line images when the coverage was brought up in ArcView.  Specific parameters for a Geographic coordinate system are presented below.  After the coverage is converted into Geographic coordinates, it can be successfully projected into TSMS using the project command.  An output coverage must be specified; in this case the coverage was named lulc_cam, for the land use-land cover data layer for Cameron County.

Arc:  project cover browns lulc_cam  
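The project command performs the conversion internally. To show what the Lambert conversion involves, here is a minimal sketch of the forward Lambert conformal conic equations in Snyder's ellipsoidal form. It assumes the input is already in Geographic coordinates (NAD83 latitude and longitude in decimal degrees) and that the TSMS parameters are the commonly published ones (GRS80 ellipsoid, standard parallels 27.5 and 35 degrees N, latitude of origin 18 degrees N, central meridian 100 degrees W, false easting and false northing of 1,000,000 m). Check these against the project's own parameter list. This is not the Arc/Info implementation, and it performs no datum shift.

```python
import math

# Assumed TSMS parameters (verify against the project's projection parameters).
A = 6378137.0                      # GRS80 semi-major axis, meters
INV_F = 298.257222101              # GRS80 inverse flattening
E = math.sqrt(2 / INV_F - (1 / INV_F) ** 2)  # first eccentricity
LAT1, LAT2 = 27.5, 35.0            # standard parallels, degrees
LAT0, LON0 = 18.0, -100.0          # latitude of origin, central meridian
FALSE_EASTING = FALSE_NORTHING = 1_000_000.0  # meters


def _m(phi):
    """Snyder's m: cos(phi) / sqrt(1 - e^2 sin^2(phi))."""
    return math.cos(phi) / math.sqrt(1 - (E * math.sin(phi)) ** 2)


def _t(phi):
    """Snyder's t for the ellipsoid."""
    es = E * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (E / 2)


# Constants of the cone, computed once.
_PHI1, _PHI2, _PHI0 = (math.radians(v) for v in (LAT1, LAT2, LAT0))
N = (math.log(_m(_PHI1)) - math.log(_m(_PHI2))) / (math.log(_t(_PHI1)) - math.log(_t(_PHI2)))
F = _m(_PHI1) / (N * _t(_PHI1) ** N)
RHO0 = A * F * _t(_PHI0) ** N


def tsms_forward(lat_deg, lon_deg):
    """Project geographic coordinates (degrees) to assumed TSMS x, y in meters."""
    rho = A * F * _t(math.radians(lat_deg)) ** N
    theta = N * math.radians(lon_deg - LON0)
    x = FALSE_EASTING + rho * math.sin(theta)
    y = FALSE_NORTHING + RHO0 - rho * math.cos(theta)
    return x, y


# Approximate Brownsville-area location, for illustration only.
print(tsms_forward(25.9, -97.5))
```

The projection origin maps exactly to the false easting and northing, which is a quick sanity check on any set of parameters.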

Next, I combined the LULC data layer with the county coverage, using a query process to exclude the LULC features that fall outside the county boundaries.  This was done with a theme-on-theme selection in ArcView, using the menu item Theme/Select by Theme.  With the LULC layer as the active theme, the LULC features that intersect the County theme were selected by clicking New Set, as shown in Figure 2.

 

Figure 2.  Intersecting Coverages in ArcView 
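The idea behind a theme-on-theme selection can be sketched with a point-in-polygon test. Note the substitution: ArcView's intersect option tests the full geometry of each feature, whereas this minimal sketch uses a simpler representative-point test (the mean of each polygon's vertices, checked with ray casting). The feature ids and coordinates are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) is inside the polygon given by its vertex list.

    The ring need not repeat its first vertex; points exactly on an edge may go either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def representative_point(ring):
    """Mean of the vertices: a simple stand-in for a label point (adequate for convex shapes)."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def select_within(features, boundary_ring):
    """Return ids of features whose representative point falls inside the boundary."""
    return [fid for fid, ring in features
            if point_in_polygon(*representative_point(ring), boundary_ring)]


# Hypothetical county outline and two LULC polygons.
county = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
features = [
    ("A", [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]),
    ("B", [(20.0, 20.0), (22.0, 20.0), (22.0, 22.0), (20.0, 22.0)]),
]
print(select_within(features, county))  # ['A']
```

Unlike an intersect selection, this test assigns a polygon that straddles the boundary wholly in or wholly out, depending on where its representative point falls.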

The final step in defining the study area was to add geographical reference points that make it easier to compare the important relationships between subjects and sources of pollution.  Labels were added to identify Cameron and Hidalgo counties and a number of the larger cities and towns.  These labels allow the viewer to distinguish between a case in Harlingen and a case in Brownsville, for instance.  Labels are created in ArcView by selecting a theme and choosing the menu item Theme/Auto Label.  If a coverage contains many features, a selected few can be labeled by zooming in on them and choosing the Label Only Features in View Extent option.  These procedures produce a simplified map of the study region.  A map showing some important cities and towns in the TNTDP investigation is presented in Figure 3.

 

Figure 3.  Selected cities in Cameron and Hidalgo Counties 

Mapping Potential Risk Factors

Point coverages were created for several different pollution sources.  To define a location on the base map, the latitude and longitude of each point were needed.  The coverages for Superfund sites and permitted landfills were downloaded directly from the Texas Natural Resource Conservation Commission (TNRCC) Regulated Sites web page.  The EPA's Envirofacts database provided addresses and Global Positioning System (GPS) coordinates for the majority of the industries reporting under TRI.  The following general procedure was used to transform the raw data into a layer for the base map.

The first step is to create a table that can be loaded into ArcView.  ArcView can work with dBASE files (*.dbf) and delimited text files (*.txt) as well as its own INFO files, and most spreadsheet and database applications can save their files in these generic formats.  Tables are added by clicking the Tables icon in the Project window and then the Add button.  This places the table in the ArcView environment, but it does not create a theme (an ArcView map layer).  To create one, select the menu item Theme/Add Event Theme and specify the name of the table and the fields containing the x- and y-coordinates.  ArcView then uses the data in the table to create a new theme, which can be converted into a shapefile as previously described.
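Add Event Theme essentially reads an x field and a y field from each row of a table and turns them into points. The sketch below illustrates that idea with Python's csv module; it is not ArcView's implementation, and the column names and facility rows are hypothetical. Longitude is the x-coordinate and latitude the y-coordinate, a common source of swapped points.

```python
import csv
import io


def read_event_table(text, x_field, y_field, delimiter=","):
    """Parse a delimited table into (attributes, (x, y)) point events.

    Rows with missing or non-numeric coordinates are skipped and counted,
    so they can be checked by hand rather than silently placed at (0, 0).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = {x_field, y_field} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"coordinate field(s) not found: {sorted(missing)}")
    events, skipped = [], 0
    for row in reader:
        try:
            xy = (float(row[x_field]), float(row[y_field]))
        except (TypeError, ValueError):
            skipped += 1
            continue
        events.append((row, xy))
    return events, skipped


# Hypothetical table: longitude is x, latitude is y.
sample = """facility_id,name,longitude,latitude
1,Facility A,-97.50,25.90
2,Facility B,,
"""
events, skipped = read_event_table(sample, "longitude", "latitude")
print(len(events), skipped)  # 1 1
```

Reporting skipped rows, instead of dropping them quietly, makes it easier to see which facilities still need coordinates before the theme is converted to a shapefile.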

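To project the theme, it must first be turned into a coverage that Arc/Info can process.  The following commands perform this transformation: shapearc converts the shapefile industry1 into the coverage industry2, build creates point topology and an attribute table for that coverage, and addxy adds the x- and y-coordinates of each point to the attribute table.  Afterward, the coverage can be projected.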

Arc:  shapearc industry1 industry2 

Arc:  build industry2 point 

Arc:  addxy industry2 

A map showing the location of industries reporting under TRI regulations is presented in Figure 4.

 

Figure 4.  Toxic Release Inventory Sites in Cameron County 

Geocoding

Geocoding can be used to create a geographically referenced point coverage from addresses instead of latitudes and longitudes, which is very useful when GPS coordinates are not available.  ArcView can geocode a location when a street address and a zip code (or city) are specified.  To map specific locations, a coverage or table that links streets to geographic coordinates is necessary.  Topographically Integrated Geographic Encoding and Referencing (TIGER) files created by the Census Bureau can be used for this purpose.  Once the reference table or coverage is obtained, a geocoding index is created in ArcView by selecting Theme/Properties and clicking the Geocoding icon.
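Street reference files such as TIGER assign an address range to each street segment, and a geocoder estimates a location by interpolating along the matching segment. The sketch below shows that interpolation under simplifying assumptions: the dictionary keys, street name, zip code, and coordinates are hypothetical, the side of the street and any offset from the centerline are ignored, and only exact name matches are accepted (ArcView also scores inexact matches).

```python
def interpolate_address(house_number, segment):
    """Estimate (x, y) for a house number by linear interpolation along one segment.

    segment is a dict with hypothetical keys:
      from_xy, to_xy      end points of the street segment (projected coordinates)
      from_addr, to_addr  address range assigned to the segment
    Returns None when the number lies outside the segment's range.
    """
    low, high = sorted((segment["from_addr"], segment["to_addr"]))
    if not low <= house_number <= high:
        return None
    span = segment["to_addr"] - segment["from_addr"]
    frac = 0.0 if span == 0 else (house_number - segment["from_addr"]) / span
    (x1, y1), (x2, y2) = segment["from_xy"], segment["to_xy"]
    return (x1 + frac * (x2 - x1), y1 + frac * (y2 - y1))


def geocode(house_number, street, zip_code, segments):
    """Return the first interpolated match on a segment with the same street and zip."""
    for seg in segments:
        if seg["street"] == street.upper() and seg["zip"] == zip_code:
            location = interpolate_address(house_number, seg)
            if location is not None:
                return location
    return None


# Hypothetical street segments; names, zip code, and coordinates are placeholders.
segments = [
    {"street": "MAIN ST", "zip": "00000", "from_addr": 100, "to_addr": 198,
     "from_xy": (0.0, 0.0), "to_xy": (100.0, 0.0)},
    {"street": "MAIN ST", "zip": "00000", "from_addr": 200, "to_addr": 298,
     "from_xy": (100.0, 0.0), "to_xy": (200.0, 0.0)},
]
print(geocode(149, "Main St", "00000", segments))  # (50.0, 0.0)
```

Because the location is interpolated rather than surveyed, an address match can be off by a large fraction of a block even when the score is perfect, which is one reason to compare it with GPS coordinates.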

Latitudes and longitudes were found for all of the point sources in this project, and addresses were also obtained for the TRI and NPDES sites.  When both types of information are available, addresses and GPS coordinates can be compared for quality control.  For the TNTDP, accurate locations for all of the point coverages will be important in order to determine the distances between case- and control-women and risk factors.  Once the reference table is available, clicking the Locate Address button and specifying the street address and zip code allows the program to locate a point on the map.  The location defined by the address can then be compared to the location defined by the latitude and longitude.  Two different locations for the same industrial facility are shown in Figure 5.

 

Figure 5.  Comparing Two Point Locations for a Single Industry 

In Figure 5, either point location may be incorrect.  However, ArcView can evaluate how accurately an address has been located: when it geocodes, it calculates a match score for the specified address, and a perfect match between the address and a geographically referenced location receives a score of 100.  If the address for the industry above had received a perfect score, one could be reasonably confident that the GPS coordinates defining the other point were inaccurate.

The reference line in Figure 5 is provided as a gauge of the distance between the points.  If the address location were found to be correct, one would need either to use that point location or to evaluate whether an error of one-tenth of a mile is acceptable given the source and type of potential pollution involved.  Usually, however, a coverage consists of many points, and it becomes necessary to look at the relative amounts of error between the locations determined by addresses and those determined by latitudes and longitudes.  This can be a time-consuming process.
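When many facilities have both a GPS location and an address location, the comparison can be automated by computing the distance between each pair and flagging pairs that exceed a tolerance. This sketch assumes both locations are available as latitude and longitude in decimal degrees and uses a spherical Earth, which is adequate for checking discrepancies on the order of a tenth of a mile. The site identifiers and coordinates are hypothetical, and the 0.1-mile tolerance is taken from the example above only as a default.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles (spherical approximation)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))


def flag_discrepancies(pairs, tolerance_mi=0.1):
    """Compare GPS and address locations for each site.

    pairs maps site_id -> ((gps_lat, gps_lon), (addr_lat, addr_lon)).
    Returns site_id -> (distance_in_miles, exceeds_tolerance), largest distance first.
    """
    report = {}
    for site, (gps, addr) in pairs.items():
        d = haversine_miles(gps[0], gps[1], addr[0], addr[1])
        report[site] = (d, d > tolerance_mi)
    return dict(sorted(report.items(), key=lambda item: item[1][0], reverse=True))


# Hypothetical sites: the second pair is about 0.7 mile apart.
pairs = {
    "site_1": ((25.9000, -97.5000), (25.9000, -97.5000)),
    "site_2": ((25.9000, -97.5000), (25.9100, -97.5000)),
}
for site, (miles, flagged) in flag_discrepancies(pairs).items():
    print(site, round(miles, 2), flagged)
```

Sorting by distance puts the worst discrepancies first, so the time-consuming manual review can start with the sites most likely to affect buffer counts.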

Outlining Spatial Relationships

Once the base map has been defined and the locations of the pollution sources have been determined, risk factors can be spatially quantified by mapping the locations of case- and control-women at the time of conception and establishing buffer zones.  Buffer zones define a circular area of a specified radius surrounding a subject.  The number of subjects living within the predetermined buffer distance can then be calculated for each category of pollution source.  Using this methodology, the proportion of case-women living close to a certain type of pollution source can be compared to the proportion of control-women living near the same type of pollution source.
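Because a circular buffer contains exactly the points within its radius, checking whether a source falls inside a subject's buffer is equivalent to checking the distance between the two points. The sketch below counts subjects that way in projected coordinates (meters, as in TSMS) and converts the counts to percentages for cases and controls. It illustrates the counting logic only, not the Arc/Info overlay itself, and the subjects and source are hypothetical.

```python
import math

ONE_MILE_M = 1609.344  # meters in one international mile


def near_any(subject_xy, sources_xy, radius_m):
    """True if any source lies within radius_m of the subject (inside her buffer)."""
    sx, sy = subject_xy
    return any(math.hypot(sx - x, sy - y) <= radius_m for x, y in sources_xy)


def percent_near(subjects, sources_xy, radius_m=ONE_MILE_M):
    """subjects: list of (subject_id, status, (x, y)) with status 'case' or 'control'.

    Returns {status: percentage of that group with a source inside the buffer}.
    """
    near, total = {}, {}
    for _subject_id, status, xy in subjects:
        total[status] = total.get(status, 0) + 1
        near[status] = near.get(status, 0) + near_any(xy, sources_xy, radius_m)
    return {status: 100.0 * near[status] / total[status] for status in total}


# Hypothetical data: one source at the origin, coordinates in meters.
sources = [(0.0, 0.0)]
subjects = [
    ("C1", "case", (100.0, 0.0)),
    ("C2", "case", (5000.0, 0.0)),
    ("K1", "control", (1000.0, 0.0)),
    ("K2", "control", (3000.0, 0.0)),
    ("K3", "control", (4000.0, 0.0)),
    ("K4", "control", (6000.0, 0.0)),
]
print(percent_near(subjects, sources))  # {'case': 50.0, 'control': 25.0}
```

Passing a different radius_m gives the 1/2-mile and 1/4-mile comparisons from the same data.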

For this term project, a point coverage of a hypothetical group of 10 case-women and 20 control-women was created in ArcView.  For the TNTDP, actual GPS coordinates are being determined for the subjects.  A table containing a unique identification number for each subject, along with the latitude and longitude of the location where she conceived, can be imported into ArcView and transformed into a point coverage using the procedure described for mapping potential risk factors.  Buffer zones with radii of 1 mile, 1/2 mile, and 1/4 mile have been established for the study.  A buffer zone coverage can be generated around the point locations of the hypothetical subjects using the following command:

Arc:  buffer subjects buffercov # # 1600 # point 

In the above command, subjects is the coverage to be buffered and buffercov is the name of the coverage to be created.  The # signs tell Arc/Info to use default values for the options in those positions.  The buffer distance of one mile is specified as 1600 meters because TSMS coverage units are meters; this is a slight rounding, since one mile is 1609.344 meters.  Finally, the word point signifies that the coverage to be buffered is a point coverage.  If the feature type is not specified, the command will still run, but the resulting coverage will be incorrect when it is displayed in ArcView.
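The buffer command represents each circle as a polygon. The sketch below illustrates that idea together with the three study radii converted from miles to meters. The vertex count is an arbitrary assumption, and Arc/Info's own approximation of the circle may differ. Using 1600 m instead of 1609.344 m shrinks the 1-mile radius by about 9 m, or roughly 0.6 percent.

```python
import math

METERS_PER_MILE = 1609.344  # exact length of the international mile

# Study radii from the text, converted to coverage units (meters).
RADII_M = {label: miles * METERS_PER_MILE
           for label, miles in (("1 mile", 1.0), ("1/2 mile", 0.5), ("1/4 mile", 0.25))}


def buffer_ring(x, y, radius_m, n_vertices=72):
    """Approximate a circular point buffer as a closed ring of (x, y) vertices."""
    ring = [(x + radius_m * math.cos(2 * math.pi * k / n_vertices),
             y + radius_m * math.sin(2 * math.pi * k / n_vertices))
            for k in range(n_vertices)]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


for label, r in RADII_M.items():
    print(f"{label}: {r:.1f} m")
print(f"1600 m understates one mile by {METERS_PER_MILE - 1600:.3f} m")
```

More vertices bring the polygon closer to a true circle; with 72 vertices the polygon's inscribed radius is within about 0.1 percent of the nominal radius.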

As described earlier, GIS is used to evaluate spatial relationships for two important issues:  rural and urban trends for subject conception locations and proximity to potential sources of pollution, including landfills, Superfund sites, TRI sites, and NPDES sites.  For the rural and urban comparison, the LULC data layers were used to distinguish between different broad categories of land use within Cameron and Hidalgo counties.  Point locations for the imaginary case-women (red) and control-women (blue) can be compared to land use by looking at Figure 6.

 

Figure 6.  Rural and Urban Trends for Case- and Control-Women 

Some trends can be inferred at a glance.  Much of the map is identified as agricultural (green), which is consistent with the fact that Cameron County has over 200,000 acres of cultivated farmland.  However, the case- and control-women are concentrated in the urban areas (yellow).  Differences between the proportions of cases and controls living in rural or urban areas are not immediately obvious, but these relationships can be quantified, as discussed in the Results section of this report.

Evaluating spatial relationships for specific categories of pollution sources is accomplished in a similar manner.  The point coverage for the source is displayed in ArcView along with the buffer coverages for the subjects.  The proximity of case- and control-women to TRI sites in Cameron County can be examined in Figure 7.

 

Figure 7.  Case and Control Proximity to Toxic Release Inventory Sites 

The conclusion that can be drawn immediately from the figure is that TRI sites occur predominantly in urban areas.  This observation is consistent with the earlier statements about the relationship between increased industrialization and greater contaminant levels from toxic emissions.  Once again, differences between case and control proximity to TRI sites are not visually apparent, but they can easily be quantified as described below.

Results

The spatial relationships suggested by subject proximity to environmental risk factors are easily quantified.  Once buffer zones are plotted around cases and controls, each feature on the map falls either inside or outside a given subject's buffer.  As an example, buffer zones were used to compare the proportion of the imaginary subjects who were living in rural areas at the time of conception with the proportion living in urban areas.  A rural subject was defined as a woman surrounded by at least one mile of agricultural land, as defined by the LULC data; all other subjects were considered to have conceived in an urban area.  The number of cases and controls meeting each criterion was counted, divided by the total number of cases or controls, and multiplied by 100 to give a percentage.  The trends for the hypothetical sample of subjects are shown in Table 1.

 

Table 1.  Urban and Agricultural Subject Comparison 
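The rural definition above (at least one mile of agricultural land all around the subject) can be approximated by sampling points inside the 1-mile buffer and requiring every sample to fall in an agricultural polygon. This is an illustrative approximation, not the overlay used for Table 1: the sampling density is an arbitrary assumption, a narrow non-agricultural feature lying between samples could be missed, and the polygon and subjects in the usage lines are hypothetical.

```python
import math

ONE_MILE_M = 1609.344


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) is inside the polygon given by its vertex list."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def is_rural(xy, ag_polygons, radius_m=ONE_MILE_M, n_rings=4, n_dirs=16):
    """Approximate rural test: every sample point in the buffer lies in agricultural land."""
    x0, y0 = xy
    samples = [(x0, y0)]
    for i in range(1, n_rings + 1):
        r = radius_m * i / n_rings  # concentric rings out to the buffer edge
        for k in range(n_dirs):
            a = 2 * math.pi * k / n_dirs
            samples.append((x0 + r * math.cos(a), y0 + r * math.sin(a)))
    return all(any(point_in_polygon(sx, sy, poly) for poly in ag_polygons)
               for sx, sy in samples)


def rural_urban_percentages(subjects, ag_polygons):
    """subjects: list of (status, (x, y)). Returns {status: (percent rural, percent urban)}."""
    rural, total = {}, {}
    for status, xy in subjects:
        total[status] = total.get(status, 0) + 1
        rural[status] = rural.get(status, 0) + is_rural(xy, ag_polygons)
    return {s: (100.0 * rural[s] / total[s], 100.0 * (total[s] - rural[s]) / total[s])
            for s in total}


# Hypothetical data in meters: one large agricultural square.
farmland = [[(-10000.0, -10000.0), (10000.0, -10000.0), (10000.0, 10000.0), (-10000.0, 10000.0)]]
subjects = [("case", (0.0, 0.0)), ("control", (0.0, 0.0)), ("control", (9000.0, 0.0))]
print(rural_urban_percentages(subjects, farmland))  # {'case': (100.0, 0.0), 'control': (50.0, 50.0)}
```

The subject at 9,000 m is classified as urban because part of her 1-mile buffer extends beyond the agricultural polygon.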

Given a sufficiently large sample, it could be concluded that the majority of cases and controls conceived in urban areas, and census data could be used to reinforce this conclusion.  The absence of differences between the percentages of cases and controls living in either type of area would suggest that risk factors associated with urban areas were at the same level as those associated with rural areas.  This assessment could be investigated further by looking at spatial relationships for specific types of pollution.

The procedure outlined above for rural and urban subject comparisons can also be used to examine risk factors associated with different types of pollution.  Looking at categories of sources cannot provide any information about specific contaminants, but it can indicate which exposure pathways merit further investigation.  For example, if the percentage of cases living near landfills were significantly higher than the percentage of controls, groundwater contamination by leachate might warrant examination.  Similarly, a higher percentage of cases than controls near TRI sites would point toward air emissions and suggest that hard data on actual industrial releases be collected.  Spatial relationships between the hypothetical subjects and different types of sources are presented in Table 2.

 

Table 2.  Subject Comparison for Specific Types of Potential Pollution Sites 
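Deciding whether a case percentage is significantly higher than the control percentage requires a statistical test. With groups as small as 10 cases and 20 controls, Fisher's exact test on the 2x2 table is one common choice; the report does not specify a test, so treat this as an assumed example. The sketch computes the two-sided p-value from the hypergeometric distribution, and the counts in the usage line are hypothetical, not the contents of Table 2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls.  Columns: near a source, not near a source.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical counts: 4 of 10 cases and 5 of 20 controls near landfills.
print(fisher_exact_two_sided(4, 6, 5, 15))
```

Because the margins are small, only large differences between case and control percentages are likely to reach significance, which is consistent with the caution above about drawing conclusions from a small sample.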

Conclusions

At present, the rate of NTDs in Cameron and Hidalgo counties remains one and one-half times the national average.  Members of the community along the border between Texas and Mexico have voiced concerns that an environmental agent could be responsible for the elevated rate.  One way to assess environmental effects is to examine spatial trends in conjunction with traditional methods, such as demographic analysis.  GIS is a powerful and innovative tool for spatial analysis, and it therefore provides an opportunity to uncover relationships that might not have been discovered in the past.  This term report serves as a brief introduction to an important study now underway at the Texas Department of Health.  The study is important because it addresses a problem common to the entire border region: a high prevalence of NTDs.  In addition, the approach being developed can be applied to any environmental risk factor to help determine whether it is associated with a negative impact on human health.

References

1. Sever, L. The Conundrum of Birth Defects Clusters. Health and Environment Digest, 1993.

2. Texas Department of Health. An Investigation of a Cluster of Neural Tube Defects in Cameron County, Texas. July 1, 1992.

3. Field, B., and Kerr, C. Herbicide use and incidence of neural-tube defects. Lancet i:1341-1342, 1979.

4. Texas Agricultural Statistics Service. Texas Agricultural Statistics. 1995.

5. U.S. Environmental Protection Agency. 1994.


Appendix-Data Dictionary



Please send any comments regarding this document to: k_martinez@mail.utexas.edu
