STA 6166 Unit 4 Section 2 Exercises

You can choose to work some or all of the problems listed below. We recommend that you at least work the problems listed in your major area of interest. Answers to these exercises can be found here (Answers).

General Questions.

What is a multiple linear regression model? Write out the functional form for a multiple linear regression model having 4 explanatory variables.
In a multiple linear regression model, describe in words the interpretation of β₁, β₂, ..., βₖ (the partial slopes).
What are the four assumptions made when performing a multiple linear regression analysis?
What are "dummy variables"? Give an example of a dummy variable.
State the formula for the coefficient of determination, R². In a multiple regression model, if you get a high value of R², what can you conclude?
What are sequential sums of squares and what are they used for? Are sequential sums of squares independent of the particular order in which the independent variables enter the multiple regression model? Why is this important to know?
What are the steps in performing an F-test for the null hypothesis H₀: β₁ = β₂ = ... = βₖ = 0? How would you write the form of the F-test in terms of R²? What does this equation tell you?
What is the difference between the "complete model" and a "reduced model"? What are the null hypothesis, test statistic, and rejection region for the F-test of a subset of predictors?
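
As a companion to the complete-versus-reduced model question, the arithmetic of the partial F-test can be sketched in Python. This is only an illustration, not the course's prescribed software or method: the function name `partial_f` and the numbers passed to it are hypothetical, and the degrees of freedom are taken to be the error degrees of freedom (n minus the number of estimated coefficients) of each model.

```python
def partial_f(sse_reduced, sse_complete, df_reduced, df_complete):
    """F-statistic for testing a reduced model against a complete model.

    df_reduced and df_complete are the error degrees of freedom
    (n minus the number of estimated coefficients) of each model.
    """
    # Number of predictors dropped when going from complete to reduced.
    extra_terms = df_reduced - df_complete
    if extra_terms <= 0:
        raise ValueError("the reduced model must have fewer terms than the complete model")
    # Drop in SSE per dropped predictor, over the MSE of the complete model.
    numerator = (sse_reduced - sse_complete) / extra_terms
    denominator = sse_complete / df_complete
    return numerator / denominator


# Hypothetical numbers, chosen only to illustrate the arithmetic.
f_stat = partial_f(sse_reduced=100.0, sse_complete=60.0, df_reduced=13, df_complete=12)
print(f_stat)  # 8.0
```

The resulting statistic would be compared with the upper-α critical value of an F distribution with (df_reduced − df_complete, df_complete) degrees of freedom.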

For students in agriculture and environmental fields.

The data below relate the number of plant species (Y) on islands along the California coast to the area of the islands (X₁, square miles), maximum elevation (X₂, feet), and latitude (X₃, degrees North).



  Y      X1      X2     X3
205     134    3950   28.2
163      98    4600   29.0
420      96    2470   34.0
340      84    1560   34.0
392      75    2125   33.3
235      56    1965   32.9
120      22     910   33.2
190      14     830   34.0
 42     2.8     490   27.9
 40     1.0     635   33.4
 62     0.9     470   30.5
  4     0.2     130   37.7
 40    0.02      60   37.1
 39     2.5     660   28.3
 70     1.1     930   34.0

1) Fit all possible one-variable, two-variable, and three-variable linear regression models. List the final estimated equations with standard errors for the estimated partial slopes.

2) Using the information from the first task, extract the R² values, SSR, SSE, degrees of freedom, and F-test for each of the models to fill in the following table.

Model	SSR	DF1	SSE	DF2	F	R²
X1
X2
X3
X1:X2
X1:X3
X2:X3
X1:X2:X3

3) Using this table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X2 adds to the explanation of Y given that X1 is already in the model (i.e., compute the partial sum of squares for X2 given X1, then perform the test).

4) Using this same table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X1 adds to the explanation of Y given that X2 and X3 are already in the model (i.e., compute the partial sum of squares for X1 given X2 and X3, then perform the test).

5) Using the tests in 3 and 4 above, which of the following models would be your final best model?

i) X2 alone

ii) X2 + X1

iii) All three explanatory variables.
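
One way to produce the quantities needed for the table in this section is sketched below in Python with NumPy. This is an assumption on our part: the course may expect a different statistics package, and the helper name `fit_subset` is hypothetical. The sketch fits every subset of X1, X2, X3 by ordinary least squares on the island data and prints SSR, DF1, SSE, DF2, F, and R² in the same row order as the table.

```python
from itertools import combinations

import numpy as np

# Island data from the table in this section: species count Y, area X1,
# maximum elevation X2, latitude X3.
data = np.array([
    [205, 134, 3950, 28.2], [163, 98, 4600, 29.0], [420, 96, 2470, 34.0],
    [340, 84, 1560, 34.0], [392, 75, 2125, 33.3], [235, 56, 1965, 32.9],
    [120, 22, 910, 33.2], [190, 14, 830, 34.0], [42, 2.8, 490, 27.9],
    [40, 1.0, 635, 33.4], [62, 0.9, 470, 30.5], [4, 0.2, 130, 37.7],
    [40, 0.02, 60, 37.1], [39, 2.5, 660, 28.3], [70, 1.1, 930, 34.0],
])
y = data[:, 0]
predictors = {"X1": data[:, 1], "X2": data[:, 2], "X3": data[:, 3]}
n = len(y)
sst = float(np.sum((y - y.mean()) ** 2))  # total sum of squares


def fit_subset(names):
    """Least-squares fit of Y on an intercept plus the named predictors."""
    X = np.column_stack([np.ones(n)] + [predictors[name] for name in names])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    df1, df2 = len(names), n - len(names) - 1
    ssr = sst - sse
    f_stat = (ssr / df1) / (sse / df2)  # overall F-test for this model
    return {"model": ":".join(names), "SSR": ssr, "DF1": df1,
            "SSE": sse, "DF2": df2, "F": f_stat, "R2": ssr / sst}


# Subsets of size 1, 2, 3 in the same order as the rows of the table.
rows = [fit_subset(names)
        for size in (1, 2, 3)
        for names in combinations(["X1", "X2", "X3"], size)]

for row in rows:
    print(f"{row['model']:<9} SSR={row['SSR']:.1f} DF1={row['DF1']} "
          f"SSE={row['SSE']:.1f} DF2={row['DF2']} F={row['F']:.2f} R2={row['R2']:.3f}")
```

The partial sums of squares asked for in tasks 3 and 4 are differences between SSR values in different rows, for example SSR(X1:X2) minus SSR(X1) for X2 given X1.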

For students in engineering fields.

For students in toxicology and health science fields.

A hospital administrator wished to study the relationship between patient satisfaction (Y) and patient's age (X₁, in years), severity of illness (X₂, an index), and anxiety level (X₃, an index). She randomly selected 15 patients and collected the data presented below.



Y   X1  X2  X3
57  36  46  2.3  
66  40  48  2.2  
70  41  44  1.8
89  28  43  1.8
36  49  54  2.9
46  42  54  2.9
54  45  48  2.4
26  52  62  2.9  
77  29  50  2.1
89  29  48  2.4
67  43  53  2.4
47  38  55  2.2
51  34  51  2.3  
57  53  54  2.2
66  36  49  2.0

1) Fit all possible one-variable, two-variable, and three-variable linear regression models. List the final estimated equations with standard errors for the estimated partial slopes.

2) Using the information from the first task, extract the R² values, SSR, SSE, degrees of freedom, and F-test for each of the models to fill in the following table.

Model	SSR	DF1	SSE	DF2	F	R²
X1
X2
X3
X1:X2
X1:X3
X2:X3
X1:X2:X3

3) Using this table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X1 adds to the explanation of Y given that X2 is already in the model (i.e., compute the partial sum of squares for X1 given X2, then perform the test).

4) Using this same table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X3 adds to the explanation of Y given that X1 and X2 are already in the model (i.e., compute the partial sum of squares for X3 given X1 and X2, then perform the test).

5) Using the tests in 3 and 4 above, which of the following models would be your final best model?

i) X2 alone

ii) X2:X1

iii) All three explanatory variables (X1:X2:X3).
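
Task 1 in this section asks for standard errors of the estimated partial slopes. A minimal Python/NumPy sketch of that calculation for the three-variable model on the patient satisfaction data follows. It assumes the usual least-squares formulas (coefficient covariance estimated as MSE times the inverse of X'X) and is not necessarily how the course software reports them; the variable names are ours.

```python
import numpy as np

# Patient data from the table in this section: satisfaction Y, age X1,
# severity of illness X2, anxiety level X3.
data = np.array([
    [57, 36, 46, 2.3], [66, 40, 48, 2.2], [70, 41, 44, 1.8],
    [89, 28, 43, 1.8], [36, 49, 54, 2.9], [46, 42, 54, 2.9],
    [54, 45, 48, 2.4], [26, 52, 62, 2.9], [77, 29, 50, 2.1],
    [89, 29, 48, 2.4], [67, 43, 53, 2.4], [47, 38, 55, 2.2],
    [51, 34, 51, 2.3], [57, 53, 54, 2.2], [66, 36, 49, 2.0],
])
y = data[:, 0]
# Design matrix: a column of ones for the intercept, then X1, X2, X3.
X = np.column_stack([np.ones(len(y)), data[:, 1:]])

xtx_inv = np.linalg.inv(X.T @ X)
beta = xtx_inv @ X.T @ y                    # least-squares estimates
residuals = y - X @ beta
df_error = len(y) - X.shape[1]              # n minus number of coefficients
mse = float(residuals @ residuals) / df_error
std_errors = np.sqrt(mse * np.diag(xtx_inv))

for name, b, se in zip(["Intercept", "X1", "X2", "X3"], beta, std_errors):
    print(f"{name:<9} estimate={b:9.4f}  std.error={se:8.4f}")
```

The same steps apply to any of the one- or two-variable models by keeping only the relevant columns of the design matrix.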