Segment 1434--Bastrop and Smithville

Site Analysis- Dissolved Oxygen

Bastrop- CRWN Site # 1428.0600

This site is located in the upstream part of river segment 1434. The LCRA monitors at this site six times a year and the CRWN has a volunteer site at the same location. For the following analysis a time span of approximately four years is used:

CRWN data- 9/28/93 to 12/16/97, n=104

LCRA data- 8/23/93 to 10/1/97, n=26

The following tables are the descriptive statistics for each site:

Note that there are four times as many CRWN data values as LCRA data values (104 versus 26). In both data sets the mean and median are very close to each other and the skewness is relatively small. The CRWN skewness is less than half the LCRA skewness, probably because of the larger number of data points. The range of the CRWN data is substantially larger than the LCRA range, which is likely also due to the larger number of CRWN measurements.
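
The summary statistics discussed above can be sketched in a few lines of Python. This is only an illustration: the readings are hypothetical, not the Bastrop data, and the skewness formula is assumed to be the adjusted sample form that Excel's SKEW function uses.

```python
import statistics

def describe(values):
    """Return the summary statistics compared in the text."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Adjusted sample skewness (assumed: the form used by Excel's SKEW)
    skew = (n / ((n - 1) * (n - 2))) * sum(((x - mean) / stdev) ** 3 for x in values)
    return {
        "n": n,
        "mean": mean,
        "median": median,
        "stdev": stdev,
        "skewness": skew,
        "range": max(values) - min(values),
    }

# Hypothetical DO readings in mg/l, for illustration only
sample_do = [7.9, 8.4, 8.8, 9.1, 9.3, 9.6, 10.2, 11.1]
summary = describe(sample_do)
```

A mean close to the median and a skewness near zero are the two signs of symmetry that the comparison above relies on.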

 

The descriptive statistics suggest that both sets of data are approximately normally distributed. The following are the cumulative frequency plots and histograms for the two data sets:

 

The cumulative frequency plots are close to linear in the mid-range, which is consistent with a normal distribution.
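
One way to put a number on this kind of linearity is to compare the sorted measurements with standard normal quantiles and compute R² for the straight-line fit. The sketch below is a probability-scale variant and is not necessarily how the plots in this report were made; the plotting position (i - 0.5)/n is an assumed convention.

```python
from statistics import NormalDist, fmean

def normal_plot_r2(values):
    """R^2 of sorted values against standard normal quantiles."""
    xs = sorted(values)
    n = len(xs)
    # Plotting position (i - 0.5) / n is one common convention, assumed here
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mz = fmean(xs), fmean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    # Squared correlation coefficient equals R^2 for a simple linear fit
    return sxz ** 2 / (sxx * szz)
```

Values of R² near 1 indicate that the data follow a normal distribution closely; a few extreme measurements pull the value down.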

The histogram is a good visual tool to determine the shape of the data's distribution. The histograms for the data are:

The histogram for the CRWN data has the appearance of a normal distribution, slightly skewed in the positive direction. The LCRA data appears to be slightly more skewed, but there is a difference of only three data points between the highest intervals.

t-test Difference in Means of Two Data Sets

It is important to know to what degree the data follow a normal distribution in order to have confidence in the results of the t-test and mean distribution analysis. These analyses use the mean and standard deviation, which are most meaningful for normally distributed data. The value of t indicates to what degree the means of the two data sets differ. Each unit of t, positive or negative, is one standard error. An absolute t value of 2 approximates the 95% confidence limit around the mean, so an absolute value of less than 2 indicates that there is no statistically significant difference between the means of the two data sets.

Since both data sets appear to be normally distributed, the t-test is used to determine whether the means of the two data sets are statistically different from each other. The t-statistic is determined by the following formula:

The t-test should work well with these data sets since they are normally distributed and the standard deviations are approximately the same. (An F-test could be done to determine whether the variances are different.) The t-test was done two ways: using Excel's built-in test and using the above formula in the spreadsheet. The results were as follows:
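
As a rough illustration of the spreadsheet calculation, the sketch below computes a two-sample t-statistic with a pooled variance. The pooled form is an assumption, chosen because the text notes that the standard deviations are approximately equal; the formula actually used in the spreadsheet may differ.

```python
from math import sqrt
from statistics import fmean, variance

def pooled_t(a, b):
    """Two-sample t-statistic assuming equal variances (pooled form)."""
    na, nb = len(a), len(b)
    # Pooled variance: the two sample variances weighted by degrees of freedom
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    # Standard error of the difference between the two means
    se = sqrt(sp2 * (1 / na + 1 / nb))
    return (fmean(a) - fmean(b)) / se
```

With this form, an absolute value below about 2 corresponds to the "no statistical difference" rule of thumb described above.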



The results using Excel and the formula were the same. The value of t = 1.108 indicates that the hypothesis that there is no difference between the two means cannot be rejected.

Error Statistics

A high degree of confidence in the measured water quality data is desirable. The data is used to determine whether water quality standards for a particular river segment are being met. Long-term trends, which can be subtle, need to be watched for. These uses of the data require accuracy and confidence. Assuming that volunteer data is reliable, how much does adding volunteer data to professional water quality monitoring data increase that confidence?

The above analysis indicates that the two data sets, professional and volunteer, are very similar. It would be statistically acceptable to combine the two data sets if they are collected at the same point on the river segment. The question remains as to whether there is a statistical advantage to having the additional data that the volunteer sites can provide. There are two parts to this question:

1) Is there an advantage to having the additional data added to professional data (at the same site)?

2) Is there an advantage to having additional volunteer sites at different locations on a river segment?

One way of answering these questions is to determine whether there is a substantial increase in confidence in the data with the addition of the volunteer data. This can be looked at in two ways:

1) The volunteer sites usually have more sampling points in a given time interval than professional sites. Over a given time interval, is there a substantial increase in confidence in the data with the larger number of measurements inherent in volunteer data?

2) When volunteer data is added to professional data (at the same site), is there a substantial increase in data confidence with this additional data?

One statistical method used to study this is the standard error of the mean. If the sampling were repeated many times, the resulting sample means would have their own mean (the "mean of the means") and a spread about that mean. The standard deviation of those means, s_x̄, is known as the standard error of the mean, and it provides an estimate of the reliability of the data mean.
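
In symbols, the usual estimate from a single sample of n measurements with sample standard deviation s is sketched below. The multiplier of 2 for the 95% interval follows the rule of thumb stated earlier for the t-statistic; the exact multiplier used in the spreadsheet is not stated and is an assumption here.

```latex
s_{\bar{x}} = \frac{s}{\sqrt{n}},
\qquad
\bar{x} \pm t_{0.975,\,n-1}\, s_{\bar{x}} \;\approx\; \bar{x} \pm 2\, s_{\bar{x}}
```

Because n appears under a square root, quadrupling the number of measurements only halves the standard error when s stays the same.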

The following is an analysis of the Bastrop data:

The above results indicate that in 95 out of 100 similar sets of measurements the mean would lie within the approximate range indicated in the last column. The range for the volunteer data around the mean for n=105 is 0.171, that is, the mean DO level with 95% confidence is 9.35 ± 0.085 mg/l. The range for the professional data around the mean for n=26 is 0.482, that is, the mean DO level with 95% confidence is 8.96 ± 0.241 mg/l.

The graphical representation of the above analysis is shown below:

Fourier Analysis

The Fourier series for each data set was determined by regression analysis, using the following formula:

Once the coefficients were determined, the Fourier series for the two data sets were plotted together to show visually how well their seasonal variations match:
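
A minimal sketch of fitting a seasonal Fourier series by least-squares regression follows. The model form (a constant plus cosine and sine terms with a one-year period), the number of harmonics, and all function names are assumptions for illustration, since the formula used in this report is not reproduced here; the series being fitted is synthetic.

```python
from math import cos, sin, pi

def design_row(t, harmonics, period):
    """Regression terms for day t: 1, cos(k w t), sin(k w t) for each harmonic k."""
    row = [1.0]
    for k in range(1, harmonics + 1):
        w = 2 * pi * k * t / period
        row += [cos(w), sin(w)]
    return row

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_fourier(days, values, harmonics=1, period=365.25):
    """Least-squares coefficients [a0, a1, b1, ...] via the normal equations."""
    rows = [design_row(t, harmonics, period) for t in days]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, t, period=365.25):
    """Evaluate the fitted series at day t."""
    harmonics = (len(coef) - 1) // 2
    return sum(c * v for c, v in zip(coef, design_row(t, harmonics, period)))

# Synthetic seasonal series sampled every 14 days, for illustration only
days = list(range(0, 730, 14))
values = [9.0 + 2.0 * cos(2 * pi * d / 365.25) + 0.5 * sin(2 * pi * d / 365.25)
          for d in days]
coef = fit_fourier(days, values)
```

Fitting each data set separately and evaluating predict over the same range of days gives two curves that can be plotted together for comparison.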

 

Smithville- CRWN Site # 1402.0505, LCRA Site # 1402.0505 -- Segment 1434

The CRWN site is a level 2 site monitored by a high school science class. There are considerable temporal gaps in the CRWN data, mainly in the months April through September. The CRWN and LCRA sites are approximately one mile apart.

This analysis uses data collected over a time span of approximately four years.

CRWN data- 9/9/93 to 3/20/97, n=39

LCRA data- 8/23/93 to 4/1/97, n=23

Descriptive Statistics

There are about 1.7 times as many CRWN data points as LCRA data points (39 versus 23). The LCRA data is closer to a normal distribution than the CRWN data: its mean and median differ by only 0.19 mg/l and its skewness is 0.47. The mean and median of the CRWN data differ by 0.52 mg/l, and its skewness is -1.03. The CRWN data also has a larger range, which seems to be typical of the data sets with a larger n value.

Cumulative Frequency Distribution

Both data sets show good linearity, indicating that the distributions are close to normal. The CRWN data has some low DO measurements (less than 5 mg/l), which lower the R² value of the linear trendline. None of the LCRA measurements were below 5.9 mg/l.

Histograms

The histograms show that the CRWN data is skewed in the negative direction because of the low DO measurements. This trend is not expected when the temporal nature of this data set is considered: the CRWN data lacks measurements from the summer months, when water temperatures are highest and the lowest DO levels would be expected.

t-test

Despite the differences between the two data sets indicated by the descriptive statistics, cumulative frequency plots and histograms, the absolute value of the t-stat is less than 2, indicating that the two data sets are not statistically different.

Standard Error of the Mean

The standard error of the mean analysis shows that the LCRA data has a narrower range around the mean with 95% confidence even though it has fewer data points. The 95% confidence intervals for the two data sets are very close, though: 0.55 mg/l for the CRWN data versus 0.45 mg/l for the LCRA data.

 

A statistical analysis will now be done on the two data sets combined. Combining them is reasonable because the absolute t-value for the two data sets is less than 2. The addition of the volunteer data to the professional data should provide more confidence in the measurements.

By combining the two data sets, the 95% confidence interval around the mean was reduced to a value that is less than that of either data set individually. The standard error of the mean analysis shows that, for these particular data sets, there is an improvement in confidence in the accuracy of the data.
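
The effect of pooling can be illustrated with a short sketch. The readings below are hypothetical, not the Smithville data, and the interval uses the approximate multiplier of 2 as an assumption. Note that a narrower combined interval is not guaranteed in general: if the two means differed noticeably, the combined standard deviation would grow and could offset the larger n.

```python
from math import sqrt
from statistics import stdev

def ci_half_width(values, multiplier=2.0):
    """Approximate 95% half-width around the mean: multiplier * s / sqrt(n)."""
    return multiplier * stdev(values) / sqrt(len(values))

# Hypothetical DO readings in mg/l, for illustration only
volunteer = [7.0, 8.0, 9.0, 10.0, 11.0, 8.5, 9.5, 9.0]
professional = [8.0, 9.0, 10.0, 9.0, 8.5, 9.5]
combined = volunteer + professional

widths = {
    "volunteer": ci_half_width(volunteer),
    "professional": ci_half_width(professional),
    "combined": ci_half_width(combined),
}
```

In this made-up case the professional set has the narrower interval despite having fewer points, and the combined set is narrower than either, which mirrors the pattern described above.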

The descriptive statistics for the combined data are:

The Fourier series plots for each data set:

