Section 3 Probability and Distributions

Readings	Ott and Longnecker, pages 122-190 (Chapter 4).
Instructor Guidance	These are key concepts, so take your time with the readings and make sure you understand them before going on.
Probability
For this course we will be dealing with the classical frequentist definition of probability. Before you attempt to compute the probability of an event, make sure you understand exactly what is meant by an outcome and an event. The lecture notes below should help with this, since the book does not do a particularly good job here. The probability properties discussed in Section 4.3 are simply some set theory ideas transferred to probability. Once you get past the mathematical formulation, it is just common sense. The key is to remember that the probability of an event can never be greater than 1, nor can it be less than 0. Hence, as we compute the probability of events that are made up of the union or intersection of other, simpler events, we have to remember to add and subtract probabilities just as if we were adding or subtracting areas in Venn diagrams (page 131). Remember: P(A ∪ B) = P(A) + P(B) - P(A ∩ B). Conditional and marginal probabilities are important, as is Bayes' Formula (pages 132 and 136). The concept of independent and dependent events is also important, but we will return to it a couple more times in the course, so it is less important that it sticks now.
Random Variables
The distinction between qualitative and quantitative random variables, and between discrete and continuous quantitative random variables, has ramifications for many of the decisions you make about which basic statistics to use on a set of data and which statistical testing methodologies apply. Note that a random variable is a fancy name for anything that can be measured or observed whose value can change from individual to individual in the population. If it doesn't change in the population, then it is a constant. Some things we observe cannot be quantified, e.g. gender, color, or health status (sick or well, diseased or disease-free).
These are examples of qualitative variables, also referred to as categorical variables since they can be used to create categories of individuals. Some things have only a countable number of possible values, e.g. number of leaves on a stem or number of eggs in a nest; these we refer to as discrete quantitative variables. Amount measurements, such as weight, height, length, and time, are all examples of continuous quantitative variables. Here we can conceive of amounts having any possible value on the positive real line. Note that because of measurement device limitations, at some level every measurement is discrete: we may only be able to weigh things to the nearest 0.001 gram, hence there are only a finite number of possible weights. Still, it is more practical to deal with amount measurements as if they were truly from a continuous scale.
Distributions of Random Variables
The book and the lecture notes identify a number of "classical" probability distributions. These are mathematical representations of probability distributions that occur most often in real situations. There are a large number of other theoretical distribution forms that are not discussed here. Take a look at the additional references below. Pay special attention to the Binomial and Normal distributions. They are key distributions for the statistical methods discussed in this course. Other key distributions, like the t-, F- and Chi-square distributions, will be introduced as they are needed, but the method of determining probabilities from tabled values is similar for all distributions. The exercises below will force you to practice determining probabilities for events involving Binomial and Normal random variables.
Sampling Distributions
The chapter ends with a discussion of sampling distributions. Don't be deceived! This is probably the most important, and the most difficult to understand, concept in the book.
Once you get the concept of a sampling distribution down, you have this course licked. The thing to remember here is that any one set of data is just one realization of an infinite number of possible samples one could have obtained. Hence your sample mean is just one of an infinite number of possible sample means. This implies that the sample mean is a random variable when looked at this way. Since all random variables have a distribution, the sample mean has a distribution. The Central Limit Theorem tells us that, for a large enough sample size, the distribution of sample means will approximately follow a normal distribution, and it even tells us what the mean and standard deviation of that normal distribution will be (the population mean, and the population standard deviation divided by the square root of the sample size). This is a very powerful theory and is the foundation for much of the testing methodology covered in this course. The distribution of the sample mean is called its sampling distribution. If we follow the logic above for the sample mean, we discover that the sample standard deviation is also a random variable and hence it too has a sampling distribution. What about the sample median? The sample mode? The range? The interquartile range? Actually, any function of the data will have an associated sampling distribution. Sometimes we have theory (like the Central Limit Theorem for the mean) to help us determine that sampling distribution. Sometimes we don't, at which point we resort to simulations to get approximations to the distribution. The Optional Activities below will help you to understand this concept of a sampling distribution.
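To make the Probability ideas above concrete, here is a minimal sketch in Python. It is not from the book or the lecture notes; the die example and the names outcomes, A, B and prob are assumptions chosen only for illustration. It treats an event as a set of equally likely outcomes and checks the addition rule P(A ∪ B) = P(A) + P(B) - P(A ∩ B), then computes a conditional probability and applies Bayes' Formula.

```python
from fractions import Fraction

# Assumed sample space: the six equally likely outcomes of one roll of a fair die.
outcomes = {1, 2, 3, 4, 5, 6}

# An event is simply a set of outcomes (both events are hypothetical examples).
A = {2, 4, 6}  # "the roll is even"
B = {4, 5, 6}  # "the roll is greater than 3"


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))


# Addition rule: subtract the overlap so it is not counted twice.
p_union = prob(A | B)
p_addition_rule = prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_A_given_B = prob(A & B) / prob(B)

# Bayes' Formula: P(B | A) = P(A | B) * P(B) / P(A).
p_B_given_A = p_A_given_B * prob(B) / prob(A)

# A and B are independent only if P(A and B) equals P(A) * P(B).
independent = prob(A & B) == prob(A) * prob(B)

print("P(A or B) directly:", p_union)
print("P(A or B) by the addition rule:", p_addition_rule)
print("P(A | B):", p_A_given_B, " P(B | A):", p_B_given_A)
print("A and B independent?", independent)
```

Here the two ways of getting P(A ∪ B) agree, and because P(A ∩ B) is not equal to P(A) times P(B), these two events are dependent.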
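For the Binomial and Normal distributions discussed above, the same probabilities you look up in tables can also be computed directly. The following is a rough sketch using only the Python standard library; the helper names binomial_pmf and binomial_cdf and the numbers in both examples (10 eggs with hatching probability 0.3, weights with mean 50 and standard deviation 5) are made up for illustration and do not come from the exercises.

```python
from math import comb
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the individual probabilities."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# Hypothetical discrete example: 10 eggs in a nest, each hatching
# independently with probability 0.3.
p_exactly_3 = binomial_pmf(3, 10, 0.3)   # about 0.267
p_at_most_3 = binomial_cdf(3, 10, 0.3)   # about 0.650

# Hypothetical continuous example: weights that are Normal with
# mean 50 and standard deviation 5.
weights = NormalDist(mu=50, sigma=5)
p_below_55 = weights.cdf(55)                      # about 0.841
p_between = weights.cdf(55) - weights.cdf(45)     # about 0.683

print(f"P(exactly 3 hatch)  = {p_exactly_3:.4f}")
print(f"P(at most 3 hatch)  = {p_at_most_3:.4f}")
print(f"P(weight < 55)      = {p_below_55:.4f}")
print(f"P(45 < weight < 55) = {p_between:.4f}")
```

Note that for the continuous variable we always ask for the probability of an interval; the probability of any single exact value is zero.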
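The idea of a sampling distribution can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 2 (a deliberately skewed, non-normal shape), each sample has 40 observations, and we repeat the sampling 5000 times. None of these choices come from the book. It compares the simulated sampling distribution of the mean with what the Central Limit Theorem predicts, and also builds the sampling distribution of the median, for which we lean on simulation instead.

```python
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same numbers

# Assumed population: exponential with mean 2 (its standard deviation is also 2).
POP_MEAN = 2.0
POP_SD = 2.0
N = 40        # observations in each sample
REPS = 5000   # number of repeated samples


def draw_sample(n):
    """Draw one random sample of size n from the assumed population."""
    return [random.expovariate(1 / POP_MEAN) for _ in range(n)]


# Each sample gives one realization of the sample mean and the sample median.
samples = [draw_sample(N) for _ in range(REPS)]
sample_means = [statistics.fmean(s) for s in samples]
sample_medians = [statistics.median(s) for s in samples]

# Center and spread of the simulated sampling distribution of the mean.
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Central Limit Theorem prediction for the standard deviation of the sample mean.
clt_sd = POP_SD / N ** 0.5

# If the sample means are roughly normal, about 95% should fall
# within 1.96 standard deviations of the population mean.
within = sum(abs(m - POP_MEAN) <= 1.96 * clt_sd for m in sample_means) / REPS

print(f"mean of sample means: {mean_of_means:.3f} (theory: {POP_MEAN})")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {clt_sd:.3f})")
print(f"fraction within 1.96 sd of the population mean: {within:.3f}")
print(f"mean of sample medians: {statistics.fmean(sample_medians):.3f}")
print(f"sd of sample medians:   {statistics.stdev(sample_medians):.3f}")
```

Try changing N to something small, such as 5, and plotting a histogram of sample_means; the skewness of the population should show through more clearly, which is why the theorem is stated for large samples.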
PPT Lectures	Probability (PowerPoint (111kb), PDF (124kb), Notes); Distributions for Random Variables (PowerPoint, PDF, Notes); Sampling Distributions (PowerPoint, PDF, Notes)
Optional Activities	Understanding Distributions; Understanding Sampling Distributions
Exercises	To check your understanding of the readings and practice these concepts and methods, go to the Unit 1 Section 3 Exercises, do the exercises, then check your answers against the page provided. After this, continue on to the Unit 1 Test.
Other Information Sources	Statistical Distributions, 3rd Edition, Merran Evans, Nicholas Hastings, Brian Peacock, June 2000, paperback, 248 pages, ISBN 0-471-37124-6.
Continuous Univariate Distributions, Volume 1, 2nd Edition, Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1994.
Continuous Univariate Distributions, Volume 2, 2nd Edition, Norman L. Johnson, Samuel Kotz, N. Balakrishnan, 1995.
If you need more information than is available here or in the book, check out the Web Oriented Teaching Resources.