CE 397: Environmental Risk Assessment
Department of Civil Engineering
The University of Texas at Austin
Lecture Notes: Class #2 - Fundamentals of Probability and Statistics, David Maidment, Jan 22, 1998
Introduction and Review of Assignment #1
The basic goal of statistics is to reduce a large number of values to a small number of more meaningful numbers.
Statistics summarize the important characteristics of the data set.
Environmental risk assessment involves collecting sampling and monitoring data. Statistical analysis is used to characterize these data and to provide input values for the evaluation process.
Mean - the expected value of a random variable. It is a measure of the middle of the distribution of the data. The calculation of the mean involves all of the data.
Median - the middle value (50th percentile) in the ordered sequence of measured values. For highly skewed data sets the median can give a better representation than the mean of the middle of the data distribution. The median is not as comprehensive a measure of the data set as the mean.
Geometric Mean - the antilog of the mean of the logs of the data. This gives a better estimate than the arithmetic mean of the middle values of lognormally distributed data.
Standard Deviation and Variance - measures of the dispersion or spread of the data. The variance is the square of the standard deviation.
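Coefficient of skewness - a measure of the asymmetry of the distribution of the data. It is zero for a symmetric distribution and positive when the distribution has a long tail toward large values.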
Kurtosis - a measure of the flatness or peakedness of the data distribution, usually judged relative to the normal distribution.
Random variable - a variable described by a probability distribution.
Probability distribution - specifies the chance that an observation of the variable will fall within a specified range of values for the variable.
Relative frequency - the number of successes divided by the number of samples.
Probability of an event - the limit of the relative frequency of the event as the number of samples goes to infinity. In practice it is estimated from sampling data, which makes it an objective probability.
Subjective probability - an estimate of the chance of occurrence of an event that cannot be measured.
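The summary statistics defined above can be sketched in a few lines of Python. This is only an illustration, not the method used for Assignment #1: the function name summarize and the sample values are hypothetical, the values are assumed to be positive (required for the geometric mean), and skewness and kurtosis are computed from simple moment ratios, which is one of several conventions in use.

```python
import math
import statistics


def summarize(values):
    """Return common summary statistics for a list of positive numbers."""
    n = len(values)
    mean = statistics.mean(values)
    median = statistics.median(values)

    # Geometric mean: antilog of the mean of the logs (needs values > 0).
    geometric_mean = math.exp(statistics.mean([math.log(v) for v in values]))

    # Sample variance (divisor n - 1) and its square root.
    variance = statistics.variance(values)
    std_dev = math.sqrt(variance)

    # Central moments with divisor n, used for the moment-ratio
    # definitions of skewness and kurtosis (an assumed convention).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # about 3 for normally distributed data

    return {
        "mean": mean,
        "median": median,
        "geometric_mean": geometric_mean,
        "variance": variance,
        "std_dev": std_dev,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Hypothetical, positively skewed data (each value doubles the last).
sample = [1.0, 2.0, 4.0, 8.0, 16.0]
stats = summarize(sample)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

For this skewed sample the median and geometric mean coincide and sit well below the arithmetic mean, which is pulled upward by the largest value.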
Principles of Statistical Analysis
Statistical analysis of a data set begins with determining the parameters of the original data. Often the data are more easily understood if the logarithms of the data are analyzed. This is particularly true when the data set includes values over several orders of magnitude. Log transformation spreads out the small values and compresses the large ones, so the data are more evenly spaced. Many environmental data sets are better analyzed in log space.
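A small sketch of the effect of a log transformation, assuming base-10 logs and a made-up set of concentrations spanning several orders of magnitude (the values are hypothetical, not course data):

```python
import math
import statistics

# Hypothetical concentrations spanning about five orders of magnitude.
concentrations = [0.02, 0.5, 3.0, 40.0, 1200.0]

# Analyze the data in log space.
log_values = [math.log10(c) for c in concentrations]

raw_mean = statistics.mean(concentrations)
log_mean = statistics.mean(log_values)
geometric_mean = 10 ** log_mean  # back-transform the mean of the logs

# Gaps between consecutive ordered values, before and after the transform.
raw_gaps = [b - a for a, b in zip(concentrations, concentrations[1:])]
log_gaps = [b - a for a, b in zip(log_values, log_values[1:])]

print("arithmetic mean:", round(raw_mean, 2))
print("geometric mean: ", round(geometric_mean, 2))
print("raw gaps:", [round(g, 2) for g in raw_gaps])
print("log gaps:", [round(g, 2) for g in log_gaps])
```

In the raw data the largest gap is thousands of times the smallest, while in log space the gaps are all of similar size. The arithmetic mean is dominated by the single largest value, whereas the back-transformed mean of the logs stays near the middle of the data.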
Frequency histograms summarize all the data in the set, provided that the data are statistically independent and identically distributed. The data included in Assignment #1 are actually from mixed populations, so they are not likely to be statistically independent or identically distributed. Later in the course, we'll return to this subject and partition the data spatially into more homogeneous subsets.
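The standard error of the estimate of the mean is the standard deviation of the sample mean. For independent observations it is calculated as the sample standard deviation divided by the square root of the number of samples.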
Any function of a random variable (such as the mean) is itself a random variable.
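The idea that the sample mean is itself a random variable, with the standard error as its standard deviation, can be checked with a small simulation. This is a sketch under stated assumptions: the observations are independent draws from a normal distribution, and the population mean, standard deviation, sample size, number of trials, and seed are all arbitrary choices made for illustration.

```python
import math
import random
import statistics


def standard_error(values):
    """Estimate the standard error of the mean from one sample."""
    return statistics.stdev(values) / math.sqrt(len(values))


rng = random.Random(1998)  # fixed seed so the run is repeatable
true_mean, true_sd = 10.0, 2.0  # assumed population parameters
n, trials = 25, 2000

# Draw many independent samples and record the mean of each one.
sample_means = [
    statistics.mean([rng.gauss(true_mean, true_sd) for _ in range(n)])
    for _ in range(trials)
]

# The spread of those means should be close to sigma / sqrt(n).
spread_of_means = statistics.stdev(sample_means)
theoretical_se = true_sd / math.sqrt(n)

# In practice only one sample is available, so the standard error
# is estimated from that sample alone.
one_sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
estimated_se = standard_error(one_sample)

print("theoretical standard error:", round(theoretical_se, 3))
print("spread of simulated means: ", round(spread_of_means, 3))
print("estimate from one sample:  ", round(estimated_se, 3))
```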
The t-statistic is used to compare the means of two data sets to determine whether they differ from one another or cannot be distinguished from one another statistically.
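A minimal sketch of a two-sample t-statistic follows. The notes do not say which form of the test is intended, so this example assumes Welch's version, which does not require equal variances. The function name welch_t and the two data sets are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)

    # Squared standard error of each sample mean.
    se2_a = statistics.variance(a) / n_a
    se2_b = statistics.variance(b) / n_b

    # Difference in means scaled by its combined standard error.
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical measurements from two sites.
site_a = [1.0, 2.0, 3.0, 4.0, 5.0]
site_b = [2.0, 3.0, 4.0, 5.0, 6.0]

t_value, dof = welch_t(site_a, site_b)
print("t =", round(t_value, 3), "with", round(dof, 1), "degrees of freedom")
```

The magnitude of t is then compared with a critical value from the t-distribution for the computed degrees of freedom. A small magnitude means the two means cannot be distinguished statistically at the chosen significance level. The critical value lookup is left out here because the Python standard library does not provide the t-distribution.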