Geocoding Exercise:
Where in the World?!

CE 397 GIS in Water Resources
University of Texas at Austin

Prepared by: Carolyn Nobel


Table of Contents

  1. Centering in on Cameron County
  2. Adding Latitude and Longitude Data Points to the Map
  3. Producing A "Map on Demand"
  4. Preparing for Geocoding
  5. Address Matching & Troubleshooting
  6. Prettying the Picture

Goals of the Exercise

The purpose of this exercise is to become familiar with different procedures for producing maps of point locations, by mapping industries from either their latitude and longitude or their street addresses.

Computer and Data Requirements

This exercise uses ArcView 3.0.

Data needed:

These data are located in the subdirectory gishyd97/class/geocode/gisfiles on the GIS-Hydro 97 CD-ROM.

Procedure

1. Centering in on Cameron County

This part of the exercise cuts down a large data set into the smaller area of interest.

Get County Data
Open ArcView 3.0.
Go to the project window and Open a New View.
Rename it latlong by going to View/Properties

Add a new theme to the view by clicking the Add Theme button and adding the txcnty theme from your working directory. Make the theme active and check the box next to the theme to display it. You will see that this theme contains all of the counties in Texas. However, you only need to deal with Cameron County for this exercise.

Query for Cameron County
Perform a query on the txcnty.shp theme by going to Theme/Properties and clicking on the hammer icon.
From the query builder window, use the Fields, Values and operator buttons to select Cameron County, Texas as shown below.

Click OK twice to display the theme with only the selected features. You will see a small dot representing Cameron County, drawn to scale with the rest of the United States.
Select Theme/Convert to Shapefile..., name the new shapefile cameron.shp, save it in your working directory, and add it to the view. Now you have a theme comprising only the county of interest.
Make the cameron theme active and zoom to its extent by clicking the Zoom to Active Theme(s) button.
The small strip of land to the east is Padre Island!
Double-click the cameron.shp theme to bring up the Legend Editor and give the theme the color of your choice.
Remove the txcnty.shp theme from the view by making it active and going to Edit/Cut Theme.
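For readers who want to see the logic outside ArcView, the query above amounts to keeping only the records whose fields match the chosen values. The following is a minimal Python sketch, not ArcView's query engine; the records and the field names `CNTY_NM`, `STATE` and `FIPS` are hypothetical stand-ins for the actual txcnty attribute table. Matching on the state as well as the county name matters because county names repeat across states (there is also a Cameron County in Pennsylvania).

```python
# Hypothetical attribute records standing in for the county theme's table.
counties = [
    {"CNTY_NM": "Cameron", "STATE": "TX", "FIPS": "48061"},
    {"CNTY_NM": "Hidalgo", "STATE": "TX", "FIPS": "48215"},
    {"CNTY_NM": "Cameron", "STATE": "PA", "FIPS": "42023"},
]

def select_where(records, **criteria):
    """Return the records whose fields equal every given value (an AND query)."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

cameron = select_where(counties, CNTY_NM="Cameron", STATE="TX")
print([r["FIPS"] for r in cameron])  # ['48061']
```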

2. Adding Latitude and Longitude Data Points to the Map

Load data into ArcView as a table
Go to the Project window, highlight Table and click the Add button.
In the Add Table menu:
Set List Files of Type to Delimited Text (*.txt) and select splat.txt from your working directory.
The following table should be added to your project:

This data was acquired from the US EPA SPLAT (Spatial Preferred Locational Attribute Table) database. The information from this table includes the names of industries, their latitude and longitude, address and accuracy information. SPLAT is a local database for now, but eventually the data will be added to the Locational Reference Tables in Envirofacts (which you will use later in the exercise.)

Define Units
Tell ArcView the units of the x,y coordinates in the table: click the latlong view to make it active, so that the View menu appears at the top of the window.
Go to View/Properties.
In the View Properties menu, choose decimal degrees from the Map Units dropdown list.
Press OK.
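Setting the Map Units does not convert anything; it only tells ArcView how to interpret the coordinate values already in the table. If a source lists coordinates in degrees, minutes and seconds, they need converting to decimal degrees first. This minimal sketch assumes input given as separate degree, minute and second values plus a hemisphere letter (the exercise does not say what format SPLAT uses). Note that west longitudes, including all of Cameron County, come out negative.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South latitudes and west longitudes are returned as negative values,
    so every longitude in Cameron County, Texas is negative.
    """
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value

# Example: 97 degrees 30' 00" West
print(dms_to_decimal(97, 30, 0, "W"))  # -97.5
```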

Add Latitude & Longitude Points to the Theme
Go to View/Add Event Theme
In the Add Event Theme window that appears, choose splat.txt from the Table dropdown list.
ArcView reads the field names in the table you specify and chooses the most likely default names for the X and Y fields. Choose longitude from the X and latitude from the Y field dropdown lists.
Press OK.
Name the theme industries (lat/long)
A new theme is added to the view containing all the points defined in the table.

ArcView automatically maintains the relationship between a theme created in this way and the tabular data it is based on, so that changes to the tabular data are reflected in the map. If you did not want this relationship to be maintained (e.g., if you wanted to use this map when the connection to the database is not available), you could save the theme as a shapefile by making the theme active and choosing Theme/Convert to Shapefile, adding the new shapefile to your view as a theme, and deleting the original event theme. This is not necessary for this project.
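Add Event Theme essentially reads each row of the table and creates a point at (X, Y) = (longitude, latitude). The sketch below mimics that with Python's csv module. The sample rows, the comma delimiter and the column names are assumptions standing in for splat.txt, and skipping rows with blank coordinates is a choice made here, not documented ArcView behavior.

```python
import csv
import io

# Hypothetical stand-in for splat.txt: a comma-delimited table with a header row.
SAMPLE = """name,latitude,longitude
Plant A,25.95,-97.48
Plant B,26.19,-97.69
"""

def read_event_points(text, x_field="longitude", y_field="latitude"):
    """Turn each table row into an (x, y) point plus its attributes.

    X is longitude and Y is latitude, matching the Add Event Theme choice.
    Rows with blank coordinates are skipped because they cannot be mapped.
    """
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        if not row[x_field].strip() or not row[y_field].strip():
            continue
        x, y = float(row[x_field]), float(row[y_field])
        points.append(((x, y), row))
    return points

for (x, y), row in read_event_points(SAMPLE):
    print(row["name"], x, y)
```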

Add & Label Cameron County Cities
You can see that the industries generally fall in clusters. Not surprisingly, these industry clusters surround cities.
Add a new theme to the view by clicking the Add Theme button and selecting the txcities.shp file from your working directory.
If you make the txcities.shp theme active and zoom to its extent, you will see that this file contains cities for all of Texas. Again, however, you are interested only in the cities in Cameron County, so you can make a new theme with the smaller data set.
Make sure the txcities.shp theme is active.
Go to Theme/Select By Theme and select the features of the active theme that Intersect the selected features of cameron.shp.
Press New Set. This will highlight the cities in Cameron County.
Go to Theme/Convert to Shapefile..., name the new theme camcities, and add it to your view.
Double click on the camcities theme to bring up the legend editor.
Double-click the symbol to bring up the Palette window. Click the Marker Palette button to bring up the Marker Palette.
Choose the square to differentiate the cities from the industries. (If needed, also pick a color that will stand out from the other themes.)
Click Apply.
Remove the larger txcities.shp theme from your view by making it active and going to Edit/Delete Theme.
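For point features such as cities, selecting those that intersect a county polygon comes down to a point-in-polygon test. Below is a minimal sketch using the even-odd (ray casting) rule, one common way to do this; it is not necessarily how ArcView implements Select By Theme, and the square "county" and city coordinates are hypothetical. A county with several parts, like Cameron with its separate strip of Padre Island, would need each part tested, selecting the point if it falls inside any of them.

```python
def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test: is (x, y) inside the closed ring?

    ring is a list of (x, y) vertices; the last vertex connects back to the first.
    Points lying exactly on the boundary may be reported either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray running right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical square "county" and two city points.
county = [(0, 0), (10, 0), (10, 10), (0, 10)]
cities = {"Inside City": (5, 5), "Outside City": (15, 5)}
selected = [name for name, (x, y) in cities.items() if point_in_polygon(x, y, county)]
print(selected)  # ['Inside City']
```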

What cities are these? Label them and find out...
Make sure the Camcities theme is active.
Go to Theme/Auto Label...
Choose Label field: City_name, and check Allow overlapping labels
Click OK
The cities Harlingen, San Benito and Brownsville should appear in your view.
Go to Window/Show Symbol Window to change the font and color of the labels and/or move them if desired.

3. Producing a Map on Demand

The Environmental Protection Agency collects huge amounts of data, most of which is available to the public through the Freedom of Information Act. EPA Envirofacts http://www.epa.gov/enviro/html/ef_home.html is a relational database that integrates data extracted from EPA program systems:

Envirofacts Overview describes these systems in more detail.

These databases can be accessed to produce "Maps On Demand".

Applications include:

Site Info Application
Pick an industry from the splat.txt table in the project and note the latitude and longitude information.
Go to the National SITEINFO Request Form and submit a request for a map using whichever parameters interest you. Include your email address; shortly you will receive a message saying that your map is complete and giving the web address where you can find it.

This is an excellent example of how GIS is being used to put previously hard-to-acquire information at the public's fingertips. AWESOME!

4. Preparing for Geocoding

Geocoding is the process by which you add point locations defined by street address, or other address information, to your map. It's the computer's equivalent of pushing pins into a street map on your wall. When you geocode tabular data containing addresses, ArcView reads the addresses, finds where they are located on your map, and creates a new theme containing a point for each address it was able to find.
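Against a street reference theme, the "pushing pins" step typically works by address interpolation: once a street segment is matched, the house number is placed proportionally along the segment according to the segment's address range. The sketch below shows that arithmetic for one straight segment. The segment, its 400-498 range and the function name are hypothetical, and it ignores details a production geocoder handles, such as side of street (odd versus even numbers) and segments with several vertices.

```python
def interpolate_address(house_number, from_addr, to_addr, start, end):
    """Place a house number along a straight street segment by linear interpolation.

    from_addr/to_addr are the address range at the segment's start and end points;
    start and end are (x, y) coordinates. Returns None if the number is out of range.
    """
    low, high = sorted((from_addr, to_addr))
    if not low <= house_number <= high:
        return None
    if from_addr == to_addr:
        fraction = 0.5  # degenerate range: put the point mid-segment
    else:
        fraction = (house_number - from_addr) / (to_addr - from_addr)
    x = start[0] + fraction * (end[0] - start[0])
    y = start[1] + fraction * (end[1] - start[1])
    return (x, y)

# Hypothetical block: addresses 400-498 running from (0, 0) to (100, 0).
print(interpolate_address(468, 400, 498, (0.0, 0.0), (100.0, 0.0)))
```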

Add the Address Table
Load the table containing the address data:
Click on the Project window, highlight Table and press Add
Choose industry.txt from your working directory. (Make sure the file type listed is Delimited Text.)
This data was acquired from the Manufacturers & Processor Directory for Brownsville, Texas.

ArcView supports address formats including:

This address file is in the US street address with zone format, the most frequently used address format in the US. The street address is contained in one field. Another field contains the Zip code, which ArcView uses as the zone identifier to resolve situations where there is more than one street with the same name in a given area. A field containing the city name can be used as the zone identifier instead of the Zip code. The fields containing the street address and the zone identifier can have any name. Address data typically also has a field containing the state name or abbreviation. ArcView does not use this field, because the state is implied by an address's Zip code.

ArcView supports a wide range of abbreviations for terms such as 'Street' and 'Road'. If the data contains an unusual abbreviation or a misspelling, ArcView can bring this to your attention when you geocode the data. Street addresses can also be given as intersections, with the two street names separated by an ampersand (&). All geocoding is case-insensitive.
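The normalization described above can be sketched as follows: upper-case everything (which makes matching case-insensitive), standardize the street type, and treat an ampersand as marking an intersection. This is an illustration only; the abbreviation table is a tiny hypothetical sample rather than ArcView's list, and the function names are made up for this example.

```python
import re

# A small, hypothetical abbreviation table; ArcView's own list is much larger.
STREET_TYPES = {
    "STREET": "ST", "STR": "ST", "ST": "ST",
    "ROAD": "RD", "RD": "RD",
    "AVENUE": "AVE", "AV": "AVE", "AVE": "AVE",
    "BOULEVARD": "BLVD", "BLVD": "BLVD",
}

def normalize_street(text):
    """Upper-case, drop punctuation, and standardize a trailing street type."""
    words = re.sub(r"[^\w\s]", " ", text).upper().split()
    if words and words[-1] in STREET_TYPES:
        words[-1] = STREET_TYPES[words[-1]]
    return " ".join(words)

def parse_address(address):
    """Split an address into house number and street, or into two cross streets."""
    if "&" in address:
        first, second = address.split("&", 1)
        return {"intersection": (normalize_street(first), normalize_street(second))}
    match = re.match(r"\s*(\d+)\s+(.+)", address)
    if not match:
        return None  # no leading house number: not a street address we can parse
    return {"number": int(match.group(1)), "street": normalize_street(match.group(2))}

print(parse_address("468 Regel Road."))       # {'number': 468, 'street': 'REGEL RD'}
print(parse_address("Main St & Oak Avenue"))  # {'intersection': ('MAIN ST', 'OAK AVE')}
```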

Adding & Sizing A Reference Theme
You have to buy a suitable street map and hang it on your wall before you can start pushing pins into it. Similarly, before you start geocoding your data, you need to prepare the view that you want to add this data to by adding a reference theme.

The data table to be geocoded, industry.txt, contains street addresses, so the reference theme for this exercise will show the streets in the area of interest.

Open a new View, go to View/Properties and rename it Geocode.
Click the Add Theme button and add the camstrt.shp theme. This is the street data for Cameron County, clipped from the street data for all of eastern Texas taken from the ESRI U.S. Streets database. U.S. Streets contains interstate, U.S. and state highways, other major thoroughfares, and all local streets within the United States. The nationwide street database is based on the Geographic Data Technology (GDT) Dynamap/1000 street file, an improved version of Census TIGER data.
Eleven descriptive fields identify the entire network of over 30 million roads:

As you might imagine, these street files are big! The address file includes industries only in the Brownsville area, so you can cut down the street data even further to create a new theme which includes only the streets in that area.
Use the Zoom In tool to zoom into the greater Brownsville area (the lower portion of the county).

Make sure the camstrt.shp theme is active
Use the Select Feature tool from the toolbar to draw a box around the view. The streets in the view will be highlighted.
Go to Theme/Convert to Shapefile and name the new theme bstreets. Add the theme to your view, and turn off the camstrt theme.
Now your View contains only the Brownsville streets!
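Selecting with a box keeps the features that intersect the rectangle you draw. A common first step for that test is to compare bounding boxes, which is what this minimal sketch does. It can over-select a line whose bounding box overlaps the rectangle even though the line itself misses it, so a full implementation would follow up with an exact line-rectangle check. The street ids and coordinates are hypothetical.

```python
def bbox(points):
    """Bounding box (xmin, ymin, xmax, ymax) of a list of (x, y) vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def select_by_box(polylines, selection_box):
    """Return ids of polylines whose bounding boxes overlap the selection box."""
    return [pid for pid, pts in polylines.items()
            if boxes_overlap(bbox(pts), selection_box)]

# Hypothetical street segments keyed by id.
streets = {
    1: [(0, 0), (5, 5)],
    2: [(20, 20), (25, 22)],
    3: [(8, -3), (12, 3)],
}
print(select_by_box(streets, (0, 0, 10, 10)))  # [1, 3]
```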

Make A Geocoding Index
In order to use a reference theme in geocoding, it must have a geocoding index, an index that ArcView builds to speed up the process of finding addresses in the theme.
To determine if bstreets has a geocoding index:
Click on the theme bstreets to make it active.
Look at the Locate button. In this case the button is dimmed (not available), so you need to build a geocoding index for the theme... luckily this is easy!

Make sure the bstreets theme is active.
Goto Theme/Properties or Click the Theme Properties button
In the Theme Properties window, click the Geocoding icon to display the theme's geocoding properties. ArcView will automatically choose the address style US Streets with Zone. ArcView's built-in address styles specify which fields in a reference theme's attribute table are most likely to contain the address information required to build the geocoding index, which will include the files streets.ixs and streets.mxs.
Press OK. Press Yes when prompted if you want to build the geocoding index for the theme.
Now you have a geocoding index and the locate button should be active!
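ArcView's .ixs and .mxs files use their own format, but the idea behind a geocoding index can be sketched with a dictionary: look street segments up by name and zone instead of scanning every segment, and let the zone separate streets that share a name. The segment records, their keys and the Zip codes below are hypothetical.

```python
from collections import defaultdict

def build_index(segments):
    """Map each (street name, zip) pair to the ids of the segments carrying it.

    segments: iterable of dicts with hypothetical keys 'id', 'name', 'zip'.
    A lookup then touches only a handful of segments instead of scanning them all.
    """
    index = defaultdict(list)
    for seg in segments:
        index[(seg["name"].upper(), seg["zip"])].append(seg["id"])
    return index

segments = [
    {"id": 1, "name": "Palm Blvd", "zip": "78520"},
    {"id": 2, "name": "Palm Blvd", "zip": "78520"},
    {"id": 3, "name": "Palm Blvd", "zip": "78550"},  # same name, different zone
]
index = build_index(segments)
print(index[("PALM BLVD", "78520")])          # [1, 2]
print(index.get(("PALM BLVD", "78521"), []))  # []
```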

5. Address Matching & Troubleshooting

Locating a single address
If you want to find a single address on the map, you can just type it in.
Click on the reference theme (bstreets) to make it active.
Click the locate button
In the window that appears, type in the address you want to find. Use one of the addresses in the industry table.
ArcView will geocode the address and, if it can find it, mark its location on your map with a point.
To clear the view for the next step, select the point and go to Edit/Delete Graphic to remove it.

This application can be extended to services such as Internet yellow pages and location finders.
For example, GTE SuperPages http://yp.gte.net allows you to:

Check it out!

Geocode a Table of Addresses
Click the Geocode view to make it active, then go to View/Geocode Addresses.
The Reference Theme dropdown list shows the reference themes in the view that are ready for use in the geocoding process because they have a geocoding index. In this case, the selection is obviously bstreets.shp.
From the Address Table dropdown list, choose industry.txt, the table containing the addresses to be geocoded.

Choose the fields in the table which include the street address data:
Address Field: Address
Zone Field: Zip
Display Field: Name
The Geocoded Theme field specifies the name and location for the new theme that ArcView will create to contain the points representing the geocoded addresses. Put the theme in your working directory and name it industries.

You can match the addresses in either batch or interactive mode.
Press Batch Match to geocode all the addresses and display the results. (You will go back to fix the problems later.)

ArcView geocodes the addresses in the address table by matching them to the address data in the reference theme. For each address being geocoded, ArcView finds the most likely candidates in your reference theme. In this exercise, each candidate is a street segment in Brownsville along which ArcView could locate the address. For each candidate, ArcView calculates a score reflecting how well it matches the address you are geocoding. It then ranks the candidates by this match score and chooses the candidate with the highest score as the match for the address.

A perfect match yields a match score of 100. There are a number of reasons why scores are sometimes lower than 100. For example, there may be a mistake in the table being geocoded. A street name may be spelled incorrectly, the house number may be incorrect, the address might not specify whether it's a street or a road, or the address may not be in the area covered by the reference theme. It's also possible that the reference theme being used is incorrect or out of date. For example, a new street may have been built which has not yet been included in the street database.

A candidate with a score between 75 and 100 can generally be considered a good match. Scores below 75 indicate that ArcView has found a location on the map for the address, but this location might be incorrect. An address for which ArcView can find no candidate with a score above the minimum match score is considered to have no match and cannot be added to the map. (The minimum match score is set to 60 by default, but it can be changed, as you will soon find out!)
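The score thresholds above can be summarized in a small function. ArcView computes the match scores itself; this sketch only takes already-scored candidates (hypothetical values here), ranks them, and classifies the best one using the 75 and 60 cutoffs from the text. Treating a score exactly equal to the minimum as a match is an assumption, since the text says "above".

```python
def classify_candidates(candidates, min_match=60, good_match=75):
    """Rank (score, candidate) pairs and classify the best one.

    Cutoffs follow the text: 75-100 is a good match, scores from the minimum
    match score (60 by default; equality counts here, an assumption) up to 74
    are partial matches, and anything lower is no match.
    """
    ranked = sorted(candidates, key=lambda pair: pair[0], reverse=True)
    if not ranked or ranked[0][0] < min_match:
        return "no match", None
    score, best = ranked[0]
    status = "good match" if score >= good_match else "partial match"
    return status, best

print(classify_candidates([(64, "segment 9"), (82, "segment 17")]))  # ('good match', 'segment 17')
print(classify_candidates([(68, "segment 4")]))                      # ('partial match', 'segment 4')
print(classify_candidates([(41, "segment 2")]))                      # ('no match', None)
```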

Once ArcView has finished geocoding, the Re-Match Addresses Window displays how many addresses it could match.

At this point, if you press the Done button, ArcView will add the new theme containing the geocoded addresses to your map. Addresses for which no match could be found will not be added to the map.
However, you can try to match the addresses that could not be matched...

Interactive Re-Match
Make sure the Re-match dropdown list is set to No Match, and press Interactive Re-Match.
The Geocoding Editor will appear with the addresses that ArcView couldn't match.
For each address that you want to re-match, the Geocoding Editor normally lists all the likely candidates you could match the address to. In this case, no candidate is listed for the first address, 468 Regel Rd.
Since the person who typed in the file is a notoriously bad speller, a good guess is that the problem is a spelling mistake. To find out what the correct street might be, relax the spelling sensitivity.
Click the Preferences button.
In the Geocoding Preferences window that appears, set the Spelling Sensitivity slider to 60.

Press OK
Using the relaxed spelling sensitivity, ArcView now finds one candidate for the address. If you decide to accept it (go for it!), click the Match button.

Note: At this point, you have not proven conclusively that Regel is a misspelling of Regal; you have only determined that this is probable. It is also possible, for example, that there is a Regel Rd. in Brownsville that was not included in the reference theme's street database.

Press the Done button.
The geocoding results are shown again, updated to include the interactive matches.

Use the Re-match dropdown list to select Partial Match.
Press Interactive Re-Match to show, one by one, the addresses that partially matched along with their candidates. Take your best guess and press Match to match each address.
Press the Done button to get back to the Re-Match Addresses window.
Press Done to finish geocoding.

The theme containing the geocoded industry addresses is added to the map!
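The effect of relaxing the spelling sensitivity during interactive re-matching can be illustrated with a generic string-similarity measure. This sketch substitutes Python's `difflib.SequenceMatcher` for ArcView's own spelling comparison, which is not described in this exercise, so its scores will not reproduce ArcView's; the list of street names is hypothetical. It does show the mechanism: "Regel" finds nothing at a strict threshold, and finds "Regal" (plus a weaker candidate) once the threshold is lowered to 60.

```python
from difflib import SequenceMatcher

def spelling_candidates(name, street_names, sensitivity=80):
    """Return street names whose similarity to `name` (0-100) meets the threshold.

    Lowering `sensitivity` admits looser spellings, loosely mimicking the
    Spelling Sensitivity slider. The similarity measure is difflib's ratio,
    not ArcView's own algorithm.
    """
    results = []
    for candidate in street_names:
        score = 100 * SequenceMatcher(None, name.upper(), candidate.upper()).ratio()
        if score >= sensitivity:
            results.append((candidate, round(score)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

streets = ["Regal", "Rangel", "Elm"]
print(spelling_candidates("Regel", streets, sensitivity=90))  # []
print(spelling_candidates("Regel", streets, sensitivity=60))  # [('Regal', 80), ('Rangel', 73)]
```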

Rematching a Geocoded Theme
A theme created by geocoding can be re-matched at any time, for example if you get inspired and want to see whether you can match the remaining unmatched addresses by relaxing the geocoding preferences.
Click the geocoded theme industries to make it active.
Go to Theme/Re-match to bring up the Re-match Addresses window. Choose what you want to re-match from the Re-match dropdown list, and select Geocoding Preferences to change the sensitivity.

Note: Re-matching a geocoded theme does not involve the table that was originally geocoded to create the theme. If the original table data is updated and needs to be geocoded again, use View/Geocode Addresses to create a new theme.

6. Prettying the Picture

Street maps are not very practical without labels. In this section we will add road names and differentiate between the different types of roads.
Make the streets theme (bstreets) active.
Go to Theme/Table... to bring up the attribute table of the street file. You will find a long list of PolyLine shapes. If you scroll to the right, you will also find the Name and CFCC attributes, which will be used to label and classify the roads.

Road Type Legend
Double click the streets theme to bring up the Legend Editor window.
Press Load...
Select roads.avl from your working directory to load the road legend. If desired, use the Pen Palette to change the size and shape of roads.
Click Apply to get a new improved street map! Use the Zoom In button to get a closer look. (Choose an area where there are several industries.)

The legend file uses the CFCC (Census Feature Class Code) from the attribute table to assign different colors to different types of roads. A CFCC is a three-character code, one letter followed by two digits, that classifies a map feature. This street file contains only "A" (road) classifications.
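Because a CFCC is just a letter followed by two digits, the major-road test used later in this section can be written directly from the code ranges. The sketch below is illustrative; the function names are hypothetical, and the ranges A10-A18 and A20-A28 are taken from this exercise rather than from the full Census definitions.

```python
def decode_cfcc(code):
    """Split a CFCC such as 'A41' into its feature class letter and two-digit number."""
    code = code.strip().upper()
    if len(code) != 3 or not code[0].isalpha() or not code[1:].isdigit():
        raise ValueError(f"not a CFCC code: {code!r}")
    return code[0], int(code[1:])

def is_major_road(code):
    """True for Interstate (A10-A18) and US & State Highway (A20-A28) codes, per the text."""
    letter, number = decode_cfcc(code)
    return letter == "A" and (10 <= number <= 18 or 20 <= number <= 28)

codes = ["A11", "A21", "A41", "A25"]
print([c for c in codes if is_major_road(c)])  # ['A11', 'A21', 'A25']
```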

Road Names

Make sure the streets theme is active.
Go to Theme/Auto Label...
Choose Label field:Name and Check Allow Overlapping Labels
This process takes a while, so you may also want to choose Label Only Features in View Extent.
Depending on the size of the view you selected, you will get a pretty complicated picture. (The names in a different color are those that overlap.) You can use the Text Palette in the Symbol Window to change the size and font of the text, or you can zoom in further.

Selecting Road Type
To make a map covering a larger area, you could select only the major roads.

Make sure the streets theme is active
Open the Attributes of streets Table
Click on the hammer icon
Check the Update Values box and click the [CFCC] field. This will give you a list of Values options. Set the CFCC values equal to the values for Interstate (A10-A18) or US & State Highways (A20-A28). Be careful of the syntax! Check the example below:

Choose New Set. ArcView will find all of the roads in the categories you selected. (This takes a little while.) You will see them highlighted in your View.
Open the View and select Theme/Convert to Shapefile...
Name the new shapefile bigroads and add it to your view.
Uncheck streets, and check bigroads.
Load the roads.avl file in the legend editor.
Make the industries theme active and click the Zoom to Active Theme(s) button.
Now you should see only Brownsville's major roads!

Congratulations! You are now on your way to being a geocode "geek"!


These materials may be used for study, research, and education, but please credit the authors and the Center for Research in Water Resources, The University of Texas at Austin. All commercial rights reserved. Copyright 1997 Center for Research in Water Resources.