*CE 397: Environmental Risk Assessment*

Department of Civil Engineering

The University of Texas at Austin

**Lecture Notes: Class #2 - Fundamentals of Probability and Statistics, David Maidment, Jan 22, 1998**

**Readings:**

- Chapter 11, sections 11.1-11.3, from "Applied Hydrology" by Chow, V.T., Maidment, D.R., and Mays, L.W.
- Reference Sources on Environmental Statistics

**Introduction and Review of Assignment #1**

The basic goal of statistics is to reduce a large number of values to a small number of more meaningful numbers.

Statistics summarize the important characteristics of the data set.

Environmental risk assessment involves the collection of sampling and monitoring data. Statistical analysis is used to characterize these data and to provide input values for the evaluation process.

**Important Definitions**

**Mean** - the expected value of a random variable. It is a measure of the middle of the distribution of the data. The calculation of the mean involves all of the data.

**Median** - the middle value (50th percentile) in the ordered sequence of measured values. For highly skewed data sets the median can represent the middle of the data distribution better than the mean. The median is not as comprehensive a measure of the data set as the mean, because it depends only on the rank order of the values and not on their magnitudes.

**Geometric Mean** - the antilog of the mean of the logs of the data. This gives a better estimate than the arithmetic mean of the middle values of lognormally distributed data.
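As a rough illustration of these three measures, the sketch below uses Python's standard `statistics` module on a made-up, strongly skewed data set. The values and the name `concentrations` are assumptions for illustration only, not data from the assignment.

```python
import math
import statistics

# Hypothetical concentrations spanning several orders of magnitude.
concentrations = [0.1, 1.0, 10.0, 100.0, 1000.0]

arithmetic_mean = statistics.mean(concentrations)
median = statistics.median(concentrations)

# Geometric mean: the antilog of the mean of the logs of the data.
log_values = [math.log10(c) for c in concentrations]
geometric_mean = 10 ** statistics.mean(log_values)

print("arithmetic mean:", arithmetic_mean)
print("median:         ", median)
print("geometric mean: ", geometric_mean)
```

For data like these, the arithmetic mean is pulled toward the largest value, while the median and the geometric mean stay near the middle of the ordered values.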

**Standard Deviation and Variance** - measures of the dispersion or spread of the data.

**Coefficient of skewness** - a measure of the asymmetry of the distribution of the data. It is zero for a symmetric distribution.

**Kurtosis** - a measure of the peakedness or flatness of the data distribution.
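The dispersion and shape measures can be sketched from the central moments of the data. The function below is a minimal illustration under stated assumptions: it uses the simple moment-ratio forms of skewness and kurtosis, whereas textbooks often apply bias corrections for small samples. The function name and the two data sets are hypothetical.

```python
import math

def describe_spread_and_shape(data):
    """Return sample variance, standard deviation, skewness, and kurtosis."""
    n = len(data)
    mean = sum(data) / n
    # Central moments about the mean (dividing by n).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    variance = m2 * n / (n - 1)  # sample variance (divides by n - 1)
    return {
        "variance": variance,
        "std_dev": math.sqrt(variance),
        "skewness": m3 / m2 ** 1.5,  # zero for a symmetric data set
        "kurtosis": m4 / m2 ** 2,    # about 3 for normally distributed data
    }

# Hypothetical data sets: one symmetric, one with a long right tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 2.0, 2.0, 3.0, 20.0]

print(describe_spread_and_shape(symmetric))
print(describe_spread_and_shape(right_skewed))
```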

**Random variable** - a variable described by a probability distribution.

**Probability distribution** - specifies
the chance that an observation of the variable will fall within a specified
range of values for the variable.

**Relative frequency** - the number of successes (occurrences of the event) divided by the number of samples.

**Probability of an event** - the limit of the relative frequency of the event as the number of samples goes to infinity. In practice it is estimated from sampling data, which makes it an **objective probability**.
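The link between relative frequency and probability can be illustrated with a small simulation. This is only a sketch: the event probability of 0.3, the sample sizes, and the function name `relative_frequency` are all assumptions chosen for illustration.

```python
import random

def relative_frequency(n_samples, p_success, rng):
    """Number of successes divided by the number of samples."""
    successes = sum(1 for _ in range(n_samples) if rng.random() < p_success)
    return successes / n_samples

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n, 0.3, rng))
```

As the number of samples grows, the printed relative frequencies should settle close to the assumed probability of 0.3.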

**Subjective probability** - an estimate
of the chance of occurrence of an event that cannot be measured.

**Principles of Statistical Analysis**

Statistical analysis of a data set begins with determining the statistical parameters of the original data. Often the data are more easily understood if their logarithms are analyzed instead, particularly when the data set spans several orders of magnitude. Log transformation spreads out the small values and compresses the large ones, so the data are more evenly spaced. Many environmental data sets are better analyzed in log space.
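To see how a log transformation evens out the spacing, the sketch below compares the gaps between neighbouring values before and after taking base-10 logs. The measurements and the helper `gap_ratio` are hypothetical, chosen only to span several orders of magnitude.

```python
import math

# Hypothetical measurements spanning about six orders of magnitude.
values = [0.02, 0.5, 3.0, 40.0, 900.0, 12000.0]
log_values = [math.log10(v) for v in values]

def gap_ratio(sorted_values):
    """Ratio of the largest to the smallest gap between neighbouring values."""
    gaps = [b - a for a, b in zip(sorted_values, sorted_values[1:])]
    return max(gaps) / min(gaps)

print("raw gap ratio:", gap_ratio(values))
print("log gap ratio:", gap_ratio(log_values))
```

A gap ratio near 1 means the values are almost evenly spaced. For these numbers the raw ratio is in the tens of thousands, while the ratio in log space is below 2.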

Frequency histograms summarize all the data in the set, provided that the data are statistically independent and identically distributed. The data included in Assignment #1 are actually from mixed populations, so they are not likely to be statistically independent or identically distributed. Later in the course, we'll return to this subject and partition the data spatially into more homogeneous subsets.
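A frequency histogram in log space can be sketched by counting how many values fall in each decade. The example is illustrative only: the sample values and the function name `decade_histogram` are assumptions, and a real analysis might use narrower bins.

```python
import math
from collections import Counter

def decade_histogram(values):
    """Count positive values per decade bin [10^k, 10^(k+1))."""
    counts = Counter(math.floor(math.log10(v)) for v in values)
    return dict(sorted(counts.items()))

# Hypothetical positive measurements.
sample = [0.02, 0.05, 0.3, 0.7, 0.9, 2.0, 4.0, 15.0, 300.0]

for k, count in decade_histogram(sample).items():
    print(f"10^{k} to 10^{k + 1}: " + "#" * count)
```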

The standard error of the estimate of the mean is the standard deviation of the sample mean. For n independent observations with sample standard deviation s, it is s divided by the square root of n.
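A minimal sketch of the standard error calculation, assuming independent observations and using the usual formula of the sample standard deviation divided by the square root of the sample size. The measurements are made up for illustration.

```python
import math
import statistics

def standard_error_of_mean(data):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(data) / math.sqrt(len(data))

# Hypothetical measurements.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("mean:          ", statistics.mean(measurements))
print("standard error:", standard_error_of_mean(measurements))
```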

Any function of a random variable (such as the mean) is itself a random variable.
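One way to see that the mean is itself a random variable is to draw many samples from the same population and look at how the sample means vary. The simulation below is a sketch under assumed values: a normal population with mean 10 and standard deviation 2, samples of size 25, and 2000 repetitions.

```python
import random
import statistics

def simulate_sample_means(n, trials, mu=10.0, sigma=2.0, seed=0):
    """Draw `trials` samples of size n and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(trials)
    ]

sample_means = simulate_sample_means(n=25, trials=2000)
print("spread of sample means:", statistics.stdev(sample_means))
print("sigma / sqrt(n):       ", 2.0 / 25 ** 0.5)
```

The sample means differ from one sample to the next, and their spread should be close to the population standard deviation divided by the square root of the sample size.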

The t-statistic is used to compare the means of two data sets to determine whether they differ significantly or cannot be distinguished from one another statistically.
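A two-sample t-statistic can be sketched as the difference between the sample means divided by the standard error of that difference. The version below uses the unequal-variance (Welch) form, which is an assumption here and not necessarily the form used in class. The two data sets are hypothetical.

```python
import math
import statistics

def welch_t_statistic(sample_a, sample_b):
    """Difference in means divided by the standard error of the difference."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical data sets.
site_a = [1.0, 2.0, 3.0, 4.0, 5.0]
site_b = [2.0, 3.0, 4.0, 5.0, 6.0]

print("t =", welch_t_statistic(site_a, site_b))
```

The resulting value is then compared with a critical value from the t distribution at the chosen significance level. A value of small magnitude means the two data sets cannot be distinguished statistically.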
