STA 6166 UNIT 1 Section 2
|Readings||Ott and Longnecker, pages 40-120 (Chapter 3).|
Graphics and Tables
As you read Chapter 3, think about data characteristics and how they drive the types of graphics and summaries we use. Most of you will already know about bar and pie charts. Some of you will have seen histograms before. Pay particular attention to the differences between a frequency histogram and a relative frequency histogram (page 49). Note how the frequentist definition of probability is closely linked to the concept of a relative frequency histogram. Stem-and-leaf displays seem simple enough, but they are surprisingly powerful for a number of tasks in statistical analysis.
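To make the link between the two histogram types concrete, here is a minimal Python sketch: a relative frequency histogram simply rescales each class count by the sample size n, so the shape is identical and only the vertical axis changes. The data values and bin edges below are made up for illustration.

```python
# Hypothetical data and class (bin) edges, for illustration only.
data = [2.1, 3.4, 3.7, 4.2, 4.8, 5.1, 5.5, 6.0, 6.3, 7.9]
edges = [2, 4, 6, 8]  # classes [2,4), [4,6), [6,8)

# Frequency histogram heights: the count in each class
freq = [sum(1 for x in data if lo <= x < hi)
        for lo, hi in zip(edges, edges[1:])]

# Relative frequency histogram heights: each count divided by n
n = len(data)
rel_freq = [f / n for f in freq]  # proportions; these sum to 1

print(freq)      # class counts
print(rel_freq)  # same shape, rescaled to proportions
```

Because the relative frequencies are proportions that sum to 1, they can be read as estimates of the probability that a random observation falls in each class, which is exactly the frequentist connection the text points out.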
You may already be familiar with scatterplots and the associated time-series plots. What is important here is to remain honest about scales. You have heard it before: "There are lies, damned lies, and statistics!" The book published years ago titled "How to Lie with Statistics" should really have been titled "How to Lie with Graphics". Breaking scales in inappropriate places, using very different scales, and, more recently, putting two-dimensional data into three-dimensional plots can all obscure the true message in the data.
Computation of Basic Statistics
You need to learn how to compute, by hand, the mean, median, and mode of simple sets of data, and memorize their major characteristics as listed on page 77. These three statistics, and especially the sample mean, are the foundation of most common statistical testing methodologies. Many of the statistical tests discussed in this course compare means ("parametric tests") or medians ("non-parametric tests"). Most of these test statistics assume the underlying populations will, when randomly sampled, generate data whose histogram displays unimodality.
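A small worked example may help when checking your hand calculations. The data values below are made up; the hand computations are verified against Python's standard `statistics` module.

```python
from collections import Counter
from statistics import mean, median, mode

data = [3, 5, 5, 6, 7, 8, 12]  # hypothetical sample, n = 7

# Mean: sum of the observations divided by n
m = sum(data) / len(data)            # (3+5+5+6+7+8+12)/7 = 46/7

# Median: the middle value of the sorted data (here the 4th of 7)
med = sorted(data)[len(data) // 2]

# Mode: the most frequently occurring value
mo = Counter(data).most_common(1)[0][0]

# The standard library agrees with the hand calculations
assert m == mean(data) and med == median(data) and mo == mode(data)
print(m, med, mo)
```

Note how the single large value 12 pulls the mean (about 6.57) above the median (6), a small illustration of the mean's sensitivity to extreme observations.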
It is not sufficient to describe the data for a variable by its mean, median, or mode alone. We typically use additional statistics to describe how the data are spread around the mean (or other measure of central tendency). The range is the simplest of these indices, but no single statistic can truly describe all of this variability. If the data are unimodal and have a "nice" distribution (i.e., a symmetric-looking histogram), then the variance (page 87) and associated standard deviation (page 88) have, as we will see later, some good properties that suggest these statistics should be used. For testing purposes we will spend some time learning how to compute and use percentiles from a distribution. Another range-like statistic, the interquartile range (IQR), has been shown to provide a "robust" measure of variability when the data are not completely "nice".
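The sample variance and IQR can be sketched in a few lines of Python. The data are made up for illustration, and the quartile computation below uses one common convention (median of the lower and upper halves); the textbook's percentile formula may give slightly different quartiles for the same data.

```python
import math

data = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical sample, n = 8
n = len(data)
xbar = sum(data) / n             # sample mean = 48/8 = 6

# Sample variance: sum of squared deviations from the mean, divided by n - 1
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)
s = math.sqrt(s2)                # sample standard deviation

# Quartiles by the "median of each half" convention (one of several in use)
xs = sorted(data)                # [3, 4, 5, 6, 6, 7, 8, 9]
q1 = (xs[1] + xs[2]) / 2         # median of the lower half
q3 = (xs[5] + xs[6]) / 2         # median of the upper half
iqr = q3 - q1

print(s2, s, iqr)
```

The divisor n - 1 (rather than n) is what makes this the *sample* variance; the reason for that choice is taken up later in the course.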
We define a "nice" distribution as one that is mound shaped and symmetric (unimodal and symmetric). Note the Empirical Rule on page 89. We will see this again later. Finally, the statistic called the coefficient of variation tells us something about the relative spread of the data. Statisticians rarely do anything with the CV, but it is an important statistic for descriptive purposes in many sciences. Some know it (or its reciprocal) as the "signal-to-noise ratio".
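Both ideas are easy to check numerically. In this hedged sketch (the data are made up), we compute the CV as the standard deviation expressed as a percentage of the mean, and count the proportion of observations within one standard deviation of the mean; for a "nice" distribution the Empirical Rule says that proportion should be roughly 68%, though a tiny sample like this will only be in the neighborhood.

```python
import math

data = [4, 8, 6, 5, 3, 7, 9, 6, 5, 7]  # hypothetical sample, n = 10
n = len(data)
xbar = sum(data) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

# Coefficient of variation: spread relative to the mean, as a percent
cv = s / xbar * 100

# Empirical Rule check: proportion of observations within 1 sd of the mean
within_1sd = sum(1 for x in data if abs(x - xbar) <= s) / n

print(f"CV = {cv:.1f}%, proportion within 1 sd = {within_1sd}")
```

Because the CV is unit-free, it lets you compare variability across variables measured on different scales, which is why it survives in so many applied fields.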
Review the section on Box-Plots. While originally designed by statisticians as a way of succinctly describing a distribution, they have found their way into a number of scientific publications. They are very useful in visually comparing distributions for one variable among multiple populations.
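The ingredients of a box plot are just the five-number summary plus an outlier rule. The sketch below (with made-up data) computes that summary and flags outliers with the usual 1.5 × IQR fences; as above, the quartile convention is one of several, so software packages may differ slightly.

```python
from statistics import median

data = [12, 15, 11, 19, 22, 14, 17, 30, 13, 16, 18]  # hypothetical sample
xs = sorted(data)
n = len(xs)

# Five-number summary: min, Q1, median, Q3, max
lo_half, hi_half = xs[: n // 2], xs[(n + 1) // 2:]
summary = {
    "min": xs[0],
    "Q1": median(lo_half),
    "median": median(xs),
    "Q3": median(hi_half),
    "max": xs[-1],
}

# Tukey's rule: points beyond 1.5 * IQR from the box are flagged as outliers
iqr = summary["Q3"] - summary["Q1"]
outliers = [x for x in xs
            if x < summary["Q1"] - 1.5 * iqr or x > summary["Q3"] + 1.5 * iqr]

print(summary, outliers)
```

Drawing one box plot per population, side by side on a common scale, is exactly the visual comparison of distributions the paragraph above describes.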
Memorize the seven key formulas on page 110. Know how to construct a histogram and a box plot.
|References||Just about any Statistics textbook will have a discussion of these topics.|
|Exercises||To check your understanding of the readings and to practice these concepts and methods, go to the Unit 1 Section 2 Exercises, do the exercises, then check your answers against the page provided. Following this, continue on to Unit 1 Section 3.|
|Additional Resources||If you need further information than is available here or in the book, check out the Web Oriented Teaching Resources.|