STA 6166 UNIT 4 Section 2

Section 2

Multiple Regression and the General Linear Model

Readings: Ott and Longnecker, Chapter 12, pages 617-704.
Instructor Guidance

In this section we attempt to extend the methods and concepts of simple linear regression to a multiple regression scenario. In multiple regression we are attempting to explain the variability and pattern in a single response variable through linear relationships with two or more explanatory or predictor variables. This is not as simple as it might seem.
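For reference, here is the general form of the model with k predictors, written in standard notation. The textbook's own symbols on page 620 may differ slightly.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon
```

Here $\beta_0$ is the intercept, $\beta_1, \ldots, \beta_k$ are the partial slopes, and the errors $\varepsilon$ are usually assumed to be independent with mean 0 and constant variance (and normally distributed when we carry out tests).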

The multiple regression model is defined on page 620. Note that the explanatory variables may be of two types. The first type is measured variables: random variables that are directly measured on the individual sample (or experimental) units. From these measured variables we can create a number of additional derived variables. Examples of derived variables are higher-order polynomial terms (e.g. if we have a measured variable $x_1$, we could add the derived terms $x_3 = x_1^2$ or $x_4 = x_1^3$) and cross-product terms (e.g. if we have measured variables $x_1$ and $x_2$, a derived term might be $x_5 = x_1 \cdot x_2$).
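As a small illustration of how derived variables are built from measured ones, the sketch below appends the polynomial and cross-product terms from the text to each row of measured data. The helper name `derived_terms` and the data values are hypothetical, chosen only for illustration.

```python
def derived_terms(x1: float, x2: float) -> dict:
    """Return the measured values plus the derived terms used as examples in the text.

    The names x3, x4, x5 follow the text's example; derived_terms is a hypothetical helper.
    """
    return {
        "x1": x1,
        "x2": x2,
        "x3": x1 ** 2,   # quadratic term in x1
        "x4": x1 ** 3,   # cubic term in x1
        "x5": x1 * x2,   # cross-product (interaction) term
    }


# Hypothetical measured data: one (x1, x2) pair per sampled unit.
measured = [(1.0, 2.0), (2.0, 0.5), (3.0, -1.0)]
rows = [derived_terms(x1, x2) for x1, x2 in measured]
for row in rows:
    print(row)
```

Even with squared, cubed, or cross-product terms in it, the model is still a *linear* regression model, because it remains linear in the coefficients $\beta_j$; the derived columns are simply additional predictors.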

The coding of treatment levels via dummy variables, discussed on page 623, is important. It shows that the predictor variables in a multiple regression may be defined on a continuous scale or on a discrete scale. Note also that the partial slope coefficient for a dummy variable measures the effect of the treatment level coded by that dummy variable on the response mean, relative to the baseline level (the level for which all of the dummy variables are zero). By broadening the definition of multiple linear regression models to include dummy variables we have created the general linear model.
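A minimal sketch of 0/1 dummy coding, assuming one level is chosen as the baseline and each remaining level gets its own indicator. The function name `dummy_code`, the `D_` column prefix, and the treatment labels are hypothetical; the textbook may choose a different baseline level.

```python
def dummy_code(levels: list, baseline: str):
    """Code each observation's treatment level as 0/1 dummy variables.

    One dummy per non-baseline level; the baseline level is coded as all zeros.
    Returns (column names, list of 0/1 rows).
    """
    others = sorted(set(levels) - {baseline})
    names = [f"D_{lvl}" for lvl in others]
    rows = [[1 if lvl == other else 0 for other in others] for lvl in levels]
    return names, rows


# Hypothetical treatment labels for five experimental units, baseline "A".
treatments = ["A", "B", "C", "A", "C"]
names, rows = dummy_code(treatments, baseline="A")
print(names)  # ['D_B', 'D_C']
for t, r in zip(treatments, rows):
    print(t, r)
```

With t treatment levels and an intercept in the model, only t - 1 dummies are used. Adding a dummy for the baseline as well would make the dummy columns sum to the intercept column, and the coefficients could not be estimated uniquely.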

Estimating the partial regression slope parameters for multiple regression follows the same basic approach (least squares) as for simple linear regression. Because there are many more coefficients to estimate, the computations are much more complex and require linear algebra. We skip the details and proceed to using computer packages to do this work for us. Up to now you have been able to do all of the computations by hand; from this point on, doing them by hand becomes very difficult and is unnecessary. We will assume the computer packages are making the computations correctly and concentrate on extracting the appropriate information from the computer output.
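For the curious, the sketch below shows the linear algebra the packages perform. The least squares estimates solve the normal equations $(X^\top X)\hat\beta = X^\top y$, where $X$ holds a column of 1s for the intercept followed by one column per predictor. This is a plain-Python illustration under that assumption, with hypothetical helper names; real statistical packages typically use more numerically stable methods such as a QR decomposition rather than forming $X^\top X$ directly.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for redundant predictors")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(X, y):
    """Return beta-hat by solving the normal equations (X'X) beta = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise),
# so the estimates should reproduce (1, 2, -3) up to rounding error.
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
X = [[1.0, a, b] for a, b in zip(x1, x2)]  # leading 1.0 is the intercept column
beta_hat = least_squares(X, y)
print([round(b, 6) for b in beta_hat])
```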

In Section 12.4 we extend the definition of the coefficient of determination to the multiple regression case. Note that in the simple regression case the term is reported as $r^2$, but in the multiple regression case it is reported as $R^2$. It is interpreted the same way: as the proportion of the variability in the response variable that is explained by the multiple regression. The second part of this section deals with an overall test for significance of the multiple regression (page 649) and a test of whether an individual partial slope is different from zero (page 654). With the overall test we can tell whether there is any relationship at all between the response and the set of predictors. With the individual test we can decide whether an explanatory variable might be dropped from the model: if its partial slope estimate is not significantly different from zero, that variable adds little to the explanation of the variability in the response beyond what the other predictors already provide.
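The quantities in this paragraph come from the analysis of variance table: $R^2 = SSR/SST$, the overall $F = (SSR/k)\,/\,(SSE/(n-k-1))$, and, for an individual slope, $t = \hat\beta_j / s_{\hat\beta_j}$ with $n-(k+1)$ degrees of freedom. The sketch below computes them from observed and fitted values, assuming a least squares fit that includes an intercept (so that $SSR = SST - SSE$). The function names are hypothetical, and the small check uses a single predictor only so that the fitted values can be verified by hand.

```python
def fit_summary(y, y_hat, k):
    """Return (R^2, overall F) for a least squares fit with k predictors plus an intercept.

    SST = total sum of squares about the mean, SSE = residual sum of squares,
    SSR = SST - SSE (valid for least squares fits that include an intercept).
    """
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ssr = sst - sse
    r2 = ssr / sst
    f = (ssr / k) / (sse / (n - k - 1))  # compare with F(k, n - k - 1)
    return r2, f


def t_statistic(beta_hat_j, se_j):
    """t for H0: beta_j = 0; compare with a t distribution on n - (k + 1) df."""
    return beta_hat_j / se_j


# Hand-checkable example: least squares line y-hat = -0.5 + 1.3x for x = 1, 2, 3, 4.
y = [1, 2, 3, 5]
y_hat = [0.8, 2.1, 3.4, 4.7]
r2, f = fit_summary(y, y_hat, k=1)
print(round(r2, 4), round(f, 2))  # R^2 = 8.45 / 8.75, F = 8.45 / (0.30 / 2)
```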

On page 658 we develop a test of whether a subset of the explanatory variables collectively contributes little to the explanation of the variability in the response. This test involves fitting two regression models, a complete (full) model and a reduced model that omits the subset, and then combining statistics from their analysis of variance tables into an F test.
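A minimal sketch of this full-versus-reduced comparison, assuming the complete model has k predictors, the reduced model keeps g of them, and both include an intercept. It is written in terms of the residual sums of squares; the textbook may phrase it with regression sums of squares instead, which gives the same value because both models share the same SST. The function name and the numbers in the usage line are hypothetical.

```python
def partial_f(sse_reduced, sse_complete, n, k, g):
    """F statistic for H0: the k - g predictors dropped from the complete model all have zero slopes.

    Complete model: k predictors; reduced model: g of them (g < k); both with an intercept.
    Compare the result with an F distribution on (k - g, n - k - 1) degrees of freedom.
    """
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - k - 1)
    return numerator / denominator


# Hypothetical numbers for illustration only: n = 30 observations,
# complete model with k = 5 predictors, reduced model keeping g = 3 of them.
f = partial_f(sse_reduced=120.0, sse_complete=90.0, n=30, k=5, g=3)
print(f)  # 4.0, to be compared with F(2, 24) critical values
```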

Finally, we look at prediction with the final fitted regression model and at assessing goodness of fit. We also show how dummy variables and derived variables can be used to test fairly complex hypotheses, such as whether two regression lines have the same slope.
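As a sketch of the equal-slopes idea, assume two groups, one measured predictor $x$, and a dummy $D$ that is 1 for group 2 and 0 for group 1. Adding the cross-product term $x \cdot D$ lets each group have its own intercept and slope (the notation here is illustrative and may differ from the textbook's):

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x + \beta_2 D + \beta_3 (x D) + \varepsilon,
  \qquad D = \begin{cases} 1 & \text{group 2} \\ 0 & \text{group 1} \end{cases} \\
\text{group 1: } E(y) &= \beta_0 + \beta_1 x \\
\text{group 2: } E(y) &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x \\
H_0 &: \beta_3 = 0 \quad \text{(the two lines have the same slope)}
\end{aligned}
```

The hypothesis can be tested with the t test for $\hat\beta_3$, or equivalently with the full-versus-reduced F test that drops the $x D$ term; when a single term is dropped, $F = t^2$.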

The section on logistic regression is not typically part of this course. All the same, logistic regression models are a very important class of regression models that have found wide use in many areas. Read it and enjoy.

PPT Lecture

Multiple Regression (PowerPoint) and (PDF)

Optional Activities: None
Exercises: To check your understanding of the readings and practice these concepts and methods, go to Unit 4 Section 2 Exercises, do the exercises, and then check your answers on the page provided. After this, continue on to Unit 4 Section 3.