Mapping Environmental Data Stored in Microsoft Access

Andrew Romanek and David Maidment

The University of Texas at Austin

Goals of the Exercise
Computer and Data Requirements
Introduction - What is a Relational Database
Procedure

Acquire the Coverages and Database
Display the Coverages
Understanding Relationships
Creating Queries in Access
Setting up the ODBC Driver
Importing Data into ArcView

Joining Data with an Existing Coverage
Creating a New Point Coverage

Creating Contour Grids

Inverse Distance Weighting (IDW) Method
Spline Method

Preparing Data for a Risk Map

Using the Binomial Distribution

Goals of the Exercise

This exercise covers the following concepts:

Working with the relational database Microsoft Access.
Importing data from Access into ArcView.
Creating contour plots.
Calculating exceedance probabilities of specific chemical levels.

Upon completion of this exercise, you should be able to set up simple queries in Access, import data from a database into ArcView, join this data to an existing coverage, and create different surfaces based on concentration measurements. This exercise is intended to introduce you to the numerous possibilities of using Access and ArcView for display and analysis.

Computer and Data Requirements

To complete this exercise, you will need access to a computer with ArcView GIS 3.1 (or 3.0a) and Microsoft Access '97. Your computer must also have an Open DataBase Connectivity (ODBC) driver for Microsoft Access databases. This driver allows ArcView to locate tables and queries in Access, and thus retrieve any desired information. The existence of this driver can be checked by opening the control panel and double clicking on the ODBC icon. Select the ODBC drivers tab to see if you have an Access driver. More information on ODBC drivers and SQL connections is available from the ArcView help topics menu.

The Creating Contours step requires the Spatial Analyst extension.

Introduction - What is a relational database

A database is simply a collection of information that relates to a particular subject or purpose. The database is stored in one file, but this file can contain multiple tables, all of which relate to the particular subject but contain different data related to that subject. For instance, a database for a retail company could contain a table for customer information, another table for product information, and a final table for shipping orders. All three tables relate to the operations of the company, but they contain different data sets that are easier to manage and understand through the use of separate tables. Tables are the fundamental elements of a database, and all database operations are performed through actions on tables. Tables are essentially storage containers, and database programs such as Microsoft Access can perform various operations on these containers. Data in tables can be managed in the following ways with Access:

View, add, and update table data using online forms.
Find and retrieve just the data you want using queries.
Analyze or print data in a specific layout using reports.

In order to combine data from separate tables into one form, query, or report, relationships or links can be set between each table. In the example described above, a common field such as a customer id could exist in both the customer information table and in the shipping orders table. By joining this field between tables, data can be retrieved from each table and combined into another table (a query or report). For example, one could determine all the shipping orders in March for a specific customer. Queries and reports can also perform calculations on data. To extend the above example even further, one could create a report that would calculate the money spent by each customer for orders in March. To accomplish this task, Access (or an equivalent database utility) would determine all the orders and products bought for each customer in March from the shipping orders table and then calculate a sum of $'s spent using prices in the product information table. This figure displays the data management operations in Access:

(From Access help menu)

In this exercise, you will be using a sample environmental database for an oil refinery located in Marcus Hook, PA. This database is set up like any other relational database, but contains information specific to environmental data at this facility. This database contains information such as sampling point locations, groundwater levels, and chemical analytical results. We could determine, for instance, the location of a specific well, information about that well, the groundwater levels that have been observed at the well, and the chemical concentrations that have been measured at the well. Databases work in both directions, so we could also find all the locations at which a specific chemical of concern (COC) was measured. Pretty cool!

With an environmental database, it would be beneficial to not only create reports and queries, but also to view and analyze this information. We can query concentration measurement locations in Access, but it is difficult to visualize these locations from a table of data. That is where ArcView becomes a very useful tool because of both its display and data storage capabilities. Let us see how this works.

Procedure

Acquire the Coverages and Database

There are a total of nine files that you will need to complete this exercise. The files can be found on the LRC server at lrc/class/maidment/giswr/riskmap. The files also can be downloaded using this riskmap.zip file.

Here is the file breakdown:

bound.e00 - Marcus Hook Refinery property line.
monit.e00 - On-site monitoring wells.
sourcearea.e00 - Concentration source area.
usgsmap.tif - USGS map for Marcus Hook and the surrounding area.
usgsmap.tfw - Geographic reference of the USGS map (without this file, the image would not be in the correct projection).
database.mdb - Microsoft Access environmental database.
cocvalue.ave - Script to create a point coverage of concentration data.
gwelev.avl - Legend for the groundwater maps.
coc.avl - Legend for the concentration maps.

Some of these files are in export format, so you will need to use the ArcView Import71 command or the Arc/Info import command. These are the commands if you are using Arc/Info:

Arc: import cover bound.e00 boundary

Arc: import cover monit.e00 wells

Arc: import cover sourcearea.e00 sourcearea

Note that the file monit.e00 is imported as a new name, wells, and similarly, bound.e00 becomes boundary when imported. Usgsmap.tif is an image file. We will examine this further in the next section.

Display the Coverages

Start up ArcView. Open a new view and add the themes Boundary and Wells from you working directory. You should now see the refinery boundary line and the layout of wells across the site. In order to get a better idea of where this refinery is located and its general layout, let's add the USGS map. Add the theme usgsmap.tif to your view. Make sure you have image data source selected. This map has a raster data structure. If you zoom in closely, you will see that the map is a collection of grid cells. Each cell has a different color, and the size of these cells determines the pixel resolution for the image. Zoom out to obtain a clearer view.

This refinery lies along the Delaware River. Marcus Hook is located in Southeast Pennsylvania, about twenty miles South of Philadelphia. The data are given in State Plane Coordinates for the South Region of Pennsylvania. This area is highly industrialized, but there is a residential area across from the northern boundary of the refinery. Your view should like something like this:

The University of Texas is currently involved with a research project at this facility, and the project goal is to develop a program for corrective action that will protect human and ecological receptors both on and off the property. The primary steps in the risk assessment process are to identify sources, source areas, transport mechanisms, points of exposure, exposure routes, and receptors. Sources are the physical structures that could cause a chemical release such as a tank or pipeline. Source areas are the locations of the highest concentrations in environmental media, such as groundwater or soil. A transport mechanism is the combination of chemical, physical, and biological processes that move a chemical from the source to the point of exposure. The point of exposure is the location at which an individual may come in contact with a chemical. The exposure route is the manner in which a chemical comes in contact with an organism, and receptors are persons that are or may be affected by a release. In this exercise, we will focus on identifying source areas based on concentration measurements taken from the soil on-site. We will also look at groundwater level measurements to identify potential transport pathways.

Understanding Relationships

From the Start menu, choose Programs and Microsoft Access. Select Open an Existing Database and choose the database you obtained in the first step. The database window will appear with five different tables listed. Again, these tables contain separate types of data all related to the environmental conditions at the refinery. Click on the icon to see the relationships between these tables. This view shows the tables, the fields within each table, and the links between tables:

In order to understand these relationships, let us look at the Sample_Collection and Results tables. The Sample_Collection table is linked to the Results table through the common SAMP_ID field. Sample_Collection provides information related to each sample that has been collected, including the type of sample and location where it was obtained. The Results table displays all the concentration measurements for each sample. These concentrations are determined using specified analytical methods. View the data in any table by returning to the original "Database" window that you got when you opened the database and then double-clicking on any of the tables to see its contents. Click on each of the tables so that you can see what it contains.

More information on what each field refers to can be obtained from the Database Dictionary.

Creating Queries in Access

In this step, you will learn how to take information from different tables and combine it into one table using a query. We want to create a query that will determine the groundwater level measurements on a particular date. From the main Database window, click on the Queries tab and select New.

Access "Wizards" can be useful tools for designing queries, forms, and reports. However, we will use the design view to create our query, so select Design View and click OK.

You will be prompted for the tables and/or queries you want to obtain data from for your query. Add the Location, Groundwater_Levels, and Well tables and then close the window. You will then see a clipboard containing the tables you added and an empty query form. Notice that the relationships among tables are also shown on this clipboard. You can modify these relationships by double clicking on any of the links between tables. When creating queries, you need to show all tables in the relational chain even if you are not obtaining data from all of them. For instance, if you wanted to obtain the groundwater levels for a specific well and also list attributes for that well, you would need to show the Groundwater_Levels, Well, and Location tables even if you did not want to include any fields from the Location table.

From Groundwater_Levels, click on the LOC_ID field and drag it into the first empty column on the form. Also select GWL_DATE and GW_LEVEL from this table and add them to the query form. From the Well table, select TWC_ELEV, which is the top of the well casing elevation (also referred to as the top of the well riser elevation). The GW_LEVEL is the depth to groundwater as measured from the top of the well casing. By subtracting this depth from the elevation of the top of the well casing, the elevation of the water table is found, and from these data a map of water table elevations can be created. The elevations used here are referenced to the North American Vertical Datum of 1988 (NAVD88), which is the most accurate vertical datum presently available.

To calculate the elevations, we will create a new field using an equation. To do this, move the cursor to the next empty column field and type GW_ELEV:[TWC_ELEV]-[GW_LEVEL]. This expression creates a new column called GW_ELEV using the given mathematical expression.

Besides performing calculations, you can also set criteria so that the query returns only the type of data you desire. For this query, we want to limit the groundwater level measurements to a single date, so type #4/30/94# in the Criteria field for the GWL_DATE column. Also type Is Not Null in the Criteria field for the GW_LEVEL column so as to eliminate measurements with no values.

A final step that may not always be necessary but is good to do anyway, is to remove duplicated records from your selected set. To do this, click in the gray area within the Query Window (so that the cursor is not identifying a particular table or field in the query), and then click on the icon in the main tool bar. Switch Unique Values from No to Yes and then close the window. Your completed form should look like this:

Now click on to run your query. You should have 36 records. Click on the x and then say Yes to Save Changes? Call your query GW_Elevations. Be sure to include the underscore because ArcView is not able to read spaces, and thus, without the underscore in the table name, you will not be able to import your data into ArcView later in the exercise. If you don't get the correct number of records, close the query window, open a new query, and repeat the instructions carefully. Make sure that you've added all three tables that are needed. The Location table is needed to complete the relational linkage between the Groundwater_Level and Wells data even though none of the query fields are drawn from the Location table.

We are also interested in benzene concentrations at the refinery, so we will need to create another query to obtain this data. Again select New and choose Design View. Add the Location, Sample_Collection, and Results tables. Include the following fields in your query design:

From Sample Collection: LOC_ID, SAMP_ID, and MATRIX

From Location: EASTING and NORTHING

From Results: CAS, VALUE, DET_LIMIT, and UNIT

The order you put these in will not matter, but be sure your query contains these fields because they will be searched for later with an Avenue script. As criteria, type "SOIL" in the MATRIX column, "71-43-2" in the CAS column, and "UG/KG" in the UNIT column. Be sure to include the quote marks. Run the query and then save it as Benzene_Soil. You should have 177 records. The CAS field 71-43-2 is the identification number for benzene. Note that there are three tables involved in this query and two different key fields are used to connect records in these tables. The records with no data in the Value field had a measurement that was below the detection limit as specified in the DET_LIMIT field. The detection limit is the minimum concentration that can be measured in an analysis. There are 52 records with a value greater than the detection limit recorded. Note that detection limits vary depending on the analysis method and the actual constituents in a sample. Some samples with no detected concentration but high detection limits might have a value that is greater than that for a sample with a detected concentration but low detection limit. This problem is a data quality issue that will not be discussed here.

To be turned in: Write in words what has been done in the query you have just executed. Which fields are used to connect the tables? Which fields are used to select the required records? What is the query which has been executed? Which fields are picked up as ancillary information about the selected records? Choose one record from the resulting Benzene_Soil table and describe what information this record represents (utilize the database dictionary provided above).

Setting up the ODBC Driver

Before importing data into ArcView, we need to tell the computer where to find the data. Open your Control Panel and double click on the ODBC icon.

Under the User DSN tab: click Add, select the Microsoft Access Driver, and then click Finish. For Data Source Name, type EnvAccess. Under Database, click Select and choose the location path for database.mdb. Choose OK and your driver should now appear under User Data Sources and look something like this:

Click OK to leave.

Importing Data into ArcView

Joining Data with an Existing Coverage

You have now built your database and are ready to import data into ArcView. From your previously opened project, open the attributes table for Wells. Notice the Loc_Id field has identifiers that are the same as the LOC_ID field in the database. If you are going to import data from a database and connect it to an existing coverage, you need to have two fields with identical values so that a link can be established.

Make the project window active and select SQL Connect under Project. In the Connection box, you should be able to scroll down and choose EnvAccess. If not, the driver is not set up properly. Click on Connect and you will see all the tables and queries that exist in the database (8 in all).

Click once on GW_Elevations to see the fields in that query. Double click on LOC_ID, GWL_DATE, and GW_ELEV. This process selects the fields we want to import. Now double click on GW_Elevations to set the input source. Criteria expressions can be added in the Where field, but this task is much easier to accomplish in Access. For Output Table, type Elevations. Your form should look like this:

Click on Query and ArcView creates a table of data using the database.

Close the window and open the Attributes of Wells table. We now want to join the data from the query with the well coverage. Make the Elevations table active and click on LOC_ID. Now make the Attributes of Wells table active and click on Loc_id. Use the Table/Join command to include the data you imported in the well attributes. If you go back to your view and use the identify tool to click on a well, you will now see the two fields you imported. The water level was not measured at some of the monitoring wells on 4/30/94 so the GW_LEVEL field is blank for them. NOTE: Each time you open this project, ArcView will requery the database and rejoin the data. This feature is particularly useful if you make a lot of changes to the database. Neat!

Creating a New Point Coverage

In the previous step, we already had a coverage to which we could join data. However, this luxury might not always be available. In this step, we will create our own coverage using location coordinates from the database. We already determined the points and data we are interested in with our benzene concentration query, but we now want to import this data. A script has been created to do this for you. The script is very similar to the one used in the Building a Basemap exercise, but it includes a section to read the coordinates from the database (using SQL Connect) and additional fields in the output table for the data we want.

In the Project window, switch to the Script icon and select New. Load the text file cocvalue.ave and then compile the script. The script requires that the view to be the active document. You can use the Window menu to switch back and forth between the view and the script, but it is often easier to customize your view properties to eliminate the switching procedure. To do this, open your View and double click on any part of the blank gray area of the toolbar. You should see the customize box appear. Change the Category to Buttons. Scroll all the way to the right and click once on the help icon. Click Separator and then New to create a new icon, one space away from the help icon. Go to the table in the lower part of the window, and Double click on Click and select Script1 (Script1 has to be open in the project for this to be a valid choice). Then double click on Icon and choose an appropriate button. Your customize box should look like:

Do not hit Reset or Make Default here! Close the customize window by clicking on the x in the upper right corner and click on your new icon to run the script. When prompted to Enter the COC Query Name, type Benzene_Soil. For the output table name, type Benzene Soil Data. Finally, save the new shapefile as benzene to your working directory. What this script does is create a new point coverage in ArcView using the Easting and Northing values that were contained in the data extracted from MS Access in the Benzene_Soil query. An alternative method to using a script is to import the data using SQL Connect and then display the data with the Add Event Theme option available from the View menu. This alternative provides more flexibility when adding different types of queries.

Make the new theme visible and you will see that the benzene measurements are scattered across the site. Some areas contain many measurements and others very few. Open the attributes table and take a look at the data. The records with an empty value field are the "no detect" measurements.

To be turned in: Use the Statistics command for the Value field and report the mean, median, range, maximum, minimum, and standard deviation of the measurements. What do these values tell you about benzene concentrations measurements at the refinery?

You can find out more about how to statistically analyze the benzene data by looking at the Statistical Analysis of Environmental Data exercise for the Environmental Risk Assessment Course.

Creating Contour Grids

In a perfect world, we would know the groundwater level and concentration value at any location, and at any time. However, it would be an extremely time-consuming and expensive task to visit every location on the site and determine the groundwater level or concentration at that location. Additionally, this study would only give us values for one moment in time. How then can we track and evaluate information using a limited data set? One method of analysis that will estimate measurements at any location is interpolating a surface through the existing data points. The values of this surface will be equal to the measurements at the measurement locations, and will be estimated for all other locations. There will likely be errors in the estimates, but the surface can still be a useful tool for identifying areas of concern and determining where more measurements are needed. For this exercise, we will examine two different methods of estimation: the Inverse Distance Weighting (IDW) Method and Spline Method. Both methods can be run using the Spatial Analyst extension in ArcView.

Inverse Distance Weighting (IDW) Method

For the Inverse Distance Weighting Method, a fine mesh grid is laid over the study area and a value is interpolated for each grid cell using the inverse distance squared between the cell location and the measurement location as a weight for each measurement. The interpolator assumes that each point has a local influence that diminishes with distance. It weighs the points closer to the processing cell greater than those farther away. You can specify the number of points to interpolate from, or optionally, you can specify all the points within a radius to interpolate from. The power parameter in the IDW interpolation controls the significance of the surrounding points on the interpolated value. Higher powers result in less influence from more distant points. You can also specify barriers beyond which the surface will not interpolate for the input point. For instance, you could specify a barrier for a cliff in your study area.

Turn on the Spatial Analyst extension from the File menu (Project window active). Open your View and make the Wells theme active. Choose Interpolate Grid under Surface. For Output Grid Extent, choose Same as Boundary. The Output Grid Cell Size, Number of Rows, and Number of Columns will be set according to the extent of the boundary theme. However, you can change these values if you want a finer or larger grid cell size. You might want to choose a 25' cell size, for example. Remember that your grid will only be as accurate as your data measurements (i.e., a fine grid will not be any more accurate than a larger grid if the amount of data points is small). Accept the defaults for your grid parameters.

You will now see the Interpolate Surface dialog box. Choose GW_ELEV for the Z Value Field and accept the default surface parameters. If you don't see GW_ELEV as an available field at this point, it means that you did not do the database query correctly using the ODBC connection through EnvAccess. You should see a grid appear of groundwater elevation measurements for the refinery. You can use the legend editor to color your grid, and there are even a few elevation color schemes. You can also load the legend, gwelev.avl, that has been provided for you. The elevation values for your grid are the elevations above mean sea level, so water will flow from the higher values to the lower ones. If you want to see the input data values for the groundwater elevations, you can use the Theme/Auto-Label command with the Wells theme. Your grid should look something like the one provided below. To save your grid, choose Save Data Set under Theme. Call your grid gwidw. Use the theme properties to change the theme name to GW IDW Surface.

To be turned in: Which well has the highest recorded groundwater level? Use the measuring tool to determine the distance across the property to the boundary line near the Delaware River. What is the maximum gradient of groundwater flow on this site? All map units, vertical and horizontal are in feet.

Spline Method

Another method for interpolating surfaces is the Spline Method. The Spline Method uses a polynomial function to fit a surface which passes through all the data values and which is smoothed so that there are not as many peaks and pits as in the IDW Method. Conceptually, it is like bending a sheet of rubber to pass through the points, while minimizing the total curvature of the surface. This method can overshoot estimated values if there are large changes within a short horizontal distance, but the method is well suited for gently varying surfaces. You can specify two different kinds of surfaces. The Regularized method yields a smooth surface while the Tension method tunes the stiffness of the surface according to the character of the modeled phenomenon. The weight parameter for the Regularized method defines the weight of the third derivative of the surface in the curvature minimization expression, while the weight parameter for the Tension method defines the weight of tension. The number of points parameter identifies the number of points per region used for local approximation.

Follow the same procedure above, except this time choose Spline for the Interpolate Surface Method. Also choose Tension for Type with a weight of 10. The default value range will be somewhat larger for this grid, and you can use the gwelev.avl legend to allow for a better comparison with the IDW grid. Note that the grid is only accurate within the facility boundary area. If you click on a point in the Delaware River, you will notice it has a large elevation value. Its elevation should be zero (assuming the river is at mean sea level). Again check the layout below to compare your answer. Save your grid as gwspline and rename it GW Spline Surface.

Repeat this procedure for the concentration data (use Value field). For the Spline Method, use Tension for Type with a weight of 10000. Call your two grids cocidw and cocspline and rename them Benzene IDW Surface and Benzene Spline Surface. Load the legend coc.avl for both grids to obtain a better comparison. Here is what your four grids should look like:

We could also have generated contour lines instead of grids. The procedure is identical except that you choose Create Contours instead of Interpolate Surface from the Surface menu. Feel free to experiment with this option and with different weight parameters. You will notice that the weights can have a significant influence on the results. You can also overlay the contours created for a particular interpolated surface so as to more easily understand what the interpolated grid represents. If you already have an interpolated grid and wish to create the contours from that, just highlight the grid and then use Surface/Create Contours.

To be turned in: Briefly describe the benzene concentration grids. Which sample contains the highest benzene concentration? Which interpolation method do you think is better for mapping groundwater measurements and which interpolation method is better for mapping benzene concentrations? Present a copy of the map which you feel best presents the groundwater elevation data and another for the benzene concentration data.

Preparing Data for a Risk Map

Concentration maps for the entire site can give us a good idea of the range in concentration data. However, many of these data measurements come from different sources and some areas have considerably more measurements than others. For a more detailed analysis, let us focus our study on one particular area. Add the sourcearea coverage to your view. Use the legend editor to make the polygon transparent and select an appropriate color for the outline. This polygon encloses a former surface impoundment that has received excess sewer effluent, sand filter backwash, stormwater runoff, and tank farm wastes.

It turns out that we have a lot of benzene soil measurements from this area. We can query these measurements by making the benzene.shp theme active and selecting Select byTheme under Theme. Select features of the active theme that Intersect the selected features of Sourcearea and choose New Set. Open the attributes table of benzene.shp and use the promote icon to view these records. You should see 28 selected records. We now want to analyze this data using Microsoft Excel, so use the Export command under File. Choose dBase for the export format and save the table to your working directory as records.dbf. ArcView will only export the highlighted records to your dBase file. Start Excel and open the table (be sure to select dBase Files for Files of type:). You should see something very familiar to the table you had in ArcView.

Using the Binomial Distribution

For the source area we have selected, we want to determine the probability the any soil sample taken from this area will be above a threshold concentration, such as an EPA action level. There are only two possible outcomes for each soil sample we obtain: a value above the threshold or a value below the threshold. Problems of this type can be modeled by a Bernoulli sequence, which is based on the following assumptions:

Each trial has only two possible outcomes: occurrence or non-occurrence of an event.
The probability of occurrence of the event in each trial is constant.
The trials are statistically independent.

For our small area of analysis, we can assume that the final two assumptions are valid. However, this assumption would not be safe if we were looking at the entire facility. The probability of having a sample value above the threshold concentration in the Southwest corner of the refinery is not likely to be the same as the probability of having a sample value above the threshold concentration in the Northeast corner of the refinery.

If the soil samples are taken from an area containing an infinite number of samples, or are taken from a finite supply and replaced, the ratio of soil samples above a threshold to the total number of samples obtained will be described by a binomial distribution:

The probability of drawing s soil samples (violations) in n trials is equal to

where is the number of combinations of n trials that contain s successes.

The cumulative probability of s or more successes in n trials is given by the sum of all e(n, m, P_r) with m greater than or equal to s:

In our case, we do not know the ratio of measurements exceeding to measurements not exceeding the threshold. This ratio can be estimated by repeatedly making soil measurements and keeping track of the number exceedances and non-exceedances. Thus, the best estimate ratio of exceedances to non-exceedances is simply the ratio of these two outcomes using the obtained measurements. The expected accuracy of this estimate increases as more samples are taken.

From our data set, we can obtain the best estimate probability that any soil sample in the area will be in violation of a set threshold by dividing the number of violations by the total number of samples. You can use your Excel spreadsheet to calculate this probability. Highlight your records and then choose Data/Sort. Sort by Value in Descending order. Your first five results should look like this:

For illustration purposes in this exercise, let us use a threshold concentration of 15 ug/kg (this is actually much lower than EPA target levels for benzene in soil). The best estimate probability = (# of samples greater than 15 ug/kg)/(total # of samples). One of the benefits of this method is that it includes the no detect data. However, this method does not take into account the actual value measurements (i.e., for the given threshold, a sample with a value of 50,000 ug/kg is not distinguished from a sample with a value of 20 ug/kg). In this case, the total number of samples is equal to the number of records in the table, namely 28.

To be turned in: What is the best estimate probability for the source area data?

We now have a best estimate probability, but it will be more appropriate to express the probability estimate as a range of possible values with a degree of confidence that the true probability falls in that range. A typical statement of this type is "The probability of measuring a concentration above the target level in a single trial lies between 40% and 60% with a confidence level of 95%." As an appropriate screening level for our data, we would like to calculate the upper bound on our probability estimate with a confidence level of 95% (two-sided). This means that there will only be a 2.5% chance that the actual probability of violation will be greater than the upper limit we calculate.

To estimate an upper bound, we need to find the value of P_u(s) that will satisfy the following equation:

1 - E(n, n-s, 1-P_u(s)) = 1 - a/2 where a = 1 - confidence limit

This task can be accomplished using Excel with the Solver Add-In. Select Tools/Add-Ins and verify that Analysis ToolPak and Solver Add-In are checked. Solver is installed on the machines in the LRC, but if you are using your own version of Microsoft Office, you might have to add Solver using your setup CD. Move to the right side of your spreadsheet and edit it as follows: in cell J1, type 0; in cell K1, type =1-binomdist(n-s,n,1-J1,true)+binomdist(n-s,n,1-J1,false); in cell K2, type 0.025 (=a/2); and in cell L1, type =J1*100. Put the actual number of successes and trials in place of s and n (s = 4, so n-s = 24, and n = 28), respectively. You don't have to do this task in this location, but make sure you set up the references correctly. Binomdist is a built-in function of Excel that can be used for binomial distribution calculations. The first term is the number of successes. The second term is the number of trials. The third term is the probability, and the fourth terms tells Excel whether it should calculate the probability mass function (false) or the cumulative binomial (true). It should also be noted that Excel defines the cumulative binomial as the sum from x = 0 to x = s as opposed to the sum from x = s to x = n as described above. This is the reason for the strange looking formula you wrote on your spreadsheet.

Now we are ready to solve for the upper bound. Select Tools/Solver. We want to change the first cell until it solves the equation, and then place the result in the last cell. Thus, type $J$1 under By Changing Cells, and type $L$1 under Set Target Cell. In the Subject to the Constraints section, click Add and type Cell Reference: $K$1 >= $K$2, and then click OK. Make sure Equal To: is set to Max and then click Solve. Accept the solution and the value in cell L1 is your upper bound (%). Your completed Solver form should look like this:

To be turned in: What is the upper bound probability that a sample selected from this source area will contain at least 15 ug/kg of benzene?

Now that we have an upper bound, we could add this information to our Sourcearea attributes table in ArcView. If we had other source areas, we could follow a similar procedure and then use the legend editor to color the source areas according to their upper exceedance probability. This map would be a useful tool for displaying possible risks associated with each source area. OK, you are done!

Go back to the GIS classroom page