STA 6166 UNIT 4 Section 2 Answers
The data below relate the number of plant species (Y) on islands along the California coast to the area of the islands (X1, square miles), maximum elevation (X2, feet), and latitude (X3, degrees North).
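Y | X1 | X2 | X3 |
205 | 134 | 3950 | 28.2 |
163 | 98 | 4600 | 29.0 |
420 | 96 | 2470 | 34.0 |
340 | 84 | 1560 | 34.0 |
392 | 75 | 2125 | 33.3 |
235 | 56 | 1965 | 32.9 |
120 | 22 | 910 | 33.2 |
190 | 14 | 830 | 34.0 |
42 | 2.8 | 490 | 27.9 |
40 | 1.0 | 635 | 33.4 |
62 | 0.9 | 470 | 30.5 |
4 | 0.2 | 130 | 37.7 |
40 | 0.02 | 60 | 37.1 |
39 | 2.5 | 660 | 28.3 |
70 | 1.1 | 930 | 34.0 |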
1) Fit all possible one-variable, two-variable, and three-variable linear regression models. List the final estimated equations with standard errors for the estimated partial slopes.
Model | intercept (stderr) | X1 (stderr) | X2 (stderr) | X3 (stderr) |
X1 | 69.43 (31.55) | 2.25 (0.53) | | |
X2 | 76.91 (45.87) | | 0.055 (0.023) | |
X3 | 76.88 (402.35) | | | 2.48 (12.32) |
X1:X2 | 103.65 (32.11) | 4.42 (1.11) | -0.08 (0.04) | |
X1:X3 | -355.27 (261.62) | 2.48 (0.52) | | 12.79 (7.83) |
X2:X3 | -479.53 (375.76) | | 0.07 (0.02) | 16.39 (10.99) |
X1:X2:X3 | -131.14 (291.19) | 4.11 (1.20) | -0.07 (0.04) | 6.86 (8.45) |
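The fits for Task 1 can be cross-checked outside the packages discussed in this answer key. The following is a minimal sketch in plain Python, not the course's method: it solves the normal equations by Gaussian elimination for every subset of predictors. The names `DATA`, `solve`, and `fit` are illustrative assumptions, and the sketch reports only coefficients and sums of squares, not standard errors.

```python
from itertools import combinations

# Island data from the problem statement: (Y, X1, X2, X3).
DATA = [
    (205, 134, 3950, 28.2), (163, 98, 4600, 29.0), (420, 96, 2470, 34.0),
    (340, 84, 1560, 34.0), (392, 75, 2125, 33.3), (235, 56, 1965, 32.9),
    (120, 22, 910, 33.2), (190, 14, 830, 34.0), (42, 2.8, 490, 27.9),
    (40, 1.0, 635, 33.4), (62, 0.9, 470, 30.5), (4, 0.2, 130, 37.7),
    (40, 0.02, 60, 37.1), (39, 2.5, 660, 28.3), (70, 1.1, 930, 34.0),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(cols):
    """Least-squares fit of Y on the chosen predictors (1 = X1, 2 = X2, 3 = X3)."""
    rows = [[1.0] + [d[c] for c in cols] for d in DATA]  # leading 1 = intercept
    y = [d[0] for d in DATA]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # [intercept, slope, ...] in the order of cols
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return {"beta": beta, "ssr": sst - sse, "sse": sse, "r2": 1 - sse / sst}


# Print coefficients and R2 for all seven models.
for k in (1, 2, 3):
    for cols in combinations((1, 2, 3), k):
        res = fit(cols)
        print(cols, [round(b, 3) for b in res["beta"]], round(res["r2"], 4))
```

Solving the normal equations directly is adequate for a data set this small; a statistical package would normally use a more numerically stable decomposition.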
2) Using the information from the first task, extract the R2 value, SSR, SSE, degrees of freedom, and F test for each of the models to fill in the following table.
Model | SSR | DF1 | SSE | DF2 | F | R2 |
X1 | 152981 | 1 | 110311 | 13 | 18.03 | 0.5810 |
X2 | 79175 | 1 | 184116 | 13 | 5.59 | 0.3007 |
X3 | 816.8 | 1 | 262475 | 13 | 0.04 | 0.0031 |
X1:X2 | 183701 | 2 | 79591 | 12 | 13.85 | 0.6977 |
X1:X3 | 173055 | 2 | 90236 | 12 | 11.51 | 0.6573 |
X2:X3 | 107954 | 2 | 155338 | 12 | 4.17 | 0.4100 |
X1:X2:X3 | 188195 | 3 | 75096 | 11 | 9.19 | 0.7148 |
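The F and R2 columns follow from the other four columns: F = (SSR/DF1)/(SSE/DF2) and R2 = SSR/(SSR + SSE). As a quick consistency check, the sketch below recomputes both from the table's rounded sums of squares. The names `TABLE`, `overall_f`, and `r_squared` are illustrative assumptions.

```python
# Rows copied from the Task 2 table:
# (model, SSR, DF1, SSE, DF2, published F, published R2)
TABLE = [
    ("X1", 152981, 1, 110311, 13, 18.03, 0.5810),
    ("X2", 79175, 1, 184116, 13, 5.59, 0.3007),
    ("X3", 816.8, 1, 262475, 13, 0.04, 0.0031),
    ("X1:X2", 183701, 2, 79591, 12, 13.85, 0.6977),
    ("X1:X3", 173055, 2, 90236, 12, 11.51, 0.6573),
    ("X2:X3", 107954, 2, 155338, 12, 4.17, 0.4100),
    ("X1:X2:X3", 188195, 3, 75096, 11, 9.19, 0.7148),
]


def overall_f(ssr, df1, sse, df2):
    """Overall F statistic: regression mean square over error mean square."""
    return (ssr / df1) / (sse / df2)


def r_squared(ssr, sse):
    """Coefficient of determination; SSR + SSE is the total sum of squares."""
    return ssr / (ssr + sse)


# Map each model name to its recomputed (F, R2) pair.
recomputed = {
    name: (overall_f(ssr, df1, sse, df2), r_squared(ssr, sse))
    for name, ssr, df1, sse, df2, _, _ in TABLE
}

for name, (f, r2) in recomputed.items():
    print(f"{name:9s} F = {f:6.2f}  R2 = {r2:.4f}")
```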
3) Using this table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X2 adds to the explanation of Y given that X1 is already in the model (i.e., compute the partial sum of squares for X2 given X1, then perform the test).
Sum of squares of X2 given X1 = 183701 - 152981 = 30719.8 (from the unrounded sums of squares; the rounded table values give 30720), dfr=1
Residual sum of squares (X1:X2 model) = 79591, dfe=12
F-statistic = Mean square X2|X1 / Mean square error (X1:X2 model) = (30719.8/1)/(79591/12) = 4.63
F-critical at ndf=1, ddf=12 = 4.75
Since 4.63 < 4.75, we do not reject the null hypothesis that the partial slope for X2 is zero given that X1 is already in the model (p-value 0.0524). From this we conclude that X2 does not add significantly to the explanation of variability in Y when X1 is already in the model. X3, the weakest single predictor, likewise does not add significantly to X1: the formal test gives F = (173055 - 152981)/(90236/12) = 2.67, which is also less than 4.75 (p-value 0.1282).
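The partial F-test above uses only four numbers from the Task 2 table, so it is easy to script. A minimal sketch follows, with `partial_f` as a hypothetical helper name. The critical value 4.75 is the one quoted in the text rather than computed, since the Python standard library has no F distribution.

```python
def partial_f(ssr_full, ssr_reduced, sse_full, dfe_full, df_added=1):
    """Partial F statistic for the predictors added to the reduced model."""
    extra_ss = ssr_full - ssr_reduced  # sum of squares due to the added terms
    return (extra_ss / df_added) / (sse_full / dfe_full)


F_CRIT_1_12 = 4.75  # F critical value at alpha = 0.05, ndf = 1, ddf = 12 (from the text)

# X2 given X1: full model X1:X2, reduced model X1.
f_x2_given_x1 = partial_f(183701, 152981, 79591, 12)

# X3 given X1: full model X1:X3, reduced model X1.
f_x3_given_x1 = partial_f(173055, 152981, 90236, 12)

for label, f in (("X2 | X1", f_x2_given_x1), ("X3 | X1", f_x3_given_x1)):
    decision = "reject H0" if f > F_CRIT_1_12 else "do not reject H0"
    print(f"{label}: F = {f:.2f} -> {decision}")
```

The same function handles Task 4 by passing the X1:X2:X3 model as the full model, the X2:X3 model as the reduced model, and 11 error degrees of freedom.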
4) Using this same table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X1 adds to the explanation of Y given that X2 and X3 are already in the model (i.e., compute the partial sum of squares for X1 given X2 and X3, then perform the test).
Sum of squares of X1 given X2:X3 = 188195 - 107954 = 80241, dfr=1
Residual sum of squares (X1:X2:X3 model) = 75096, dfe=11
F-statistic = Mean square X1|X2,X3 / Mean square error (X1:X2:X3 model) = (80241/1)/(75096/11) = 11.75
F-critical at ndf=1, ddf=11 = 4.84
Since 11.75 > 4.84, we reject the null hypothesis that the partial slope for X1 is zero given that X2 and X3 are already in the model (p-value 0.0056). From this we conclude that X1 does add significantly to the explanation of variability in Y when X2 and X3 are already in the model. Since neither X2 nor X3 adds significantly to a model that already contains X1, we conclude that only X1 seems important here.
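Written out as a single expression, the Task 4 test statistic is the extra sum of squares for X1 divided by the error mean square of the full X1:X2:X3 model, which has 15 - 4 = 11 error degrees of freedom. The notation SSR(·) and SSE(·) is introduced here for illustration; the numbers are those of the Task 2 table.

```latex
F = \frac{\bigl[\mathrm{SSR}(X_1, X_2, X_3) - \mathrm{SSR}(X_2, X_3)\bigr] / 1}
         {\mathrm{SSE}(X_1, X_2, X_3) / 11}
  = \frac{188195 - 107954}{75096 / 11}
  = \frac{80241}{6826.9}
  \approx 11.75
```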
5) Using the tests in Tasks 3 and 4 above, which of the following models would be your final best model?
i) X1 alone. This is the best model. Thus we would conclude that the number of plant species seems to be related to island area alone.
ii) X2 + X1
iii) All three explanatory variables.
The SAS program for this analysis is:
options ls=78 ps=49 nocenter nodate;

data patient;
  input Y X1 X2 X3;
  datalines;
205 134 3950 28.2
163 98 4600 29.0
420 96 2470 34.0
340 84 1560 34.0
392 75 2125 33.3
235 56 1965 32.9
120 22 910 33.2
190 14 830 34.0
42 2.8 490 27.9
40 1.0 635 33.4
62 0.9 470 30.5
4 0.2 130 37.7
40 0.02 60 37.1
39 2.5 660 28.3
70 1.1 930 34.0
;
run;

/* All one-, two-, and three-variable models (Tasks 1 and 2) */
proc reg data=patient;
  model y = x1;
  model y = x2;
  model y = x3;
  model y = x1 x2;
  model y = x1 x3;
  model y = x2 x3;
  model y = x1 x2 x3;
run;

/* PROC GLM prints Type I (sequential) and Type III (partial) sums of squares */
proc glm data=patient;
  model y = x1 x2 x3 / solution;
run;

proc glm data=patient;
  model y = x1 x2 / solution;
run;

proc glm data=patient;
  model y = x1 x3 / solution;
run;
In MINITAB you would use Stat > Regression > Regression to fit the various models. There you could also fit the X2:X1 model and select the third option under the Results button to get sequential sums of squares. The X2:X3 and X2:X1:X3 models can be fitted the same way to check their sequential sums of squares.
In SPSS, after entering the data you would use the Analyze > Regression > Linear option to fit each of the selected models. To get the answers to Tasks 3 and 4 above, you would have to run the separate models and fill in the table in Task 2.
In Excel, after entering the data you would use the Tools > Data Analysis > Regression option to fit each of the selected models. Again, to get the answers to Tasks 3 and 4 above, you would have to run the separate models and fill in the table in Task 2.