STA 6166 UNIT 4 Section 2 Answers

The data below relate the number of plant species (Y) on islands along the California coast to the area of the islands (X1, square miles), maximum elevation (X2, feet), and latitude (X3, degrees North).
Y     X1     X2     X3
205   134    3950   28.2
163   98     4600   29.0
420   96     2470   34.0
340   84     1560   34.0
392   75     2125   33.3
235   56     1965   32.9
120   22     910    33.2
190   14     830    34.0
42    2.8    490    27.9
40    1.0    635    33.4
62    0.9    470    30.5
4     0.2    130    37.7
40    0.02   60     37.1
39    2.5    660    28.3
70    1.1    930    34.0
1) Fit all possible one-variable, two-variable, and three-variable linear regression models. List the final estimated equations with standard errors for the estimated partial slopes.
Model      Intercept (SE)      X1 (SE)        X2 (SE)          X3 (SE)
X1         69.43 (31.55)       2.25 (0.53)    n/a              n/a
X2         76.91 (45.87)       n/a            0.055 (0.023)    n/a
X3         76.88 (402.35)      n/a            n/a              2.48 (12.32)
X1:X2      103.65 (32.11)      4.42 (1.11)    -0.08 (0.04)     n/a
X1:X3      -355.27 (261.62)    2.48 (0.52)    n/a              12.79 (7.83)
X2:X3      -479.53 (375.76)    n/a            0.07 (0.02)      16.39 (10.99)
X1:X2:X3   -131.14 (291.19)    4.11 (1.20)    -0.07 (0.04)     6.86 (8.45)
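As a cross-check on the fitted equations, the seven models can also be estimated directly from the normal equations. The following is a minimal sketch in plain Python, not the method used for the answers above (those came from SAS); the helper names `invert` and `fit` are hypothetical and exist only for this illustration.

```python
from math import sqrt

# Island data from the table above: (Y, X1, X2, X3).
DATA = [
    (205, 134, 3950, 28.2), (163, 98, 4600, 29.0), (420, 96, 2470, 34.0),
    (340, 84, 1560, 34.0), (392, 75, 2125, 33.3), (235, 56, 1965, 32.9),
    (120, 22, 910, 33.2), (190, 14, 830, 34.0), (42, 2.8, 490, 27.9),
    (40, 1.0, 635, 33.4), (62, 0.9, 470, 30.5), (4, 0.2, 130, 37.7),
    (40, 0.02, 60, 37.1), (39, 2.5, 660, 28.3), (70, 1.1, 930, 34.0),
]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(predictors):
    """Least-squares fit of Y on the chosen predictors (indices 1, 2, 3).

    Returns (coefficients, standard errors, SSE), intercept first.
    """
    rows = [[1.0] + [obs[j] for j in predictors] for obs in DATA]
    y = [obs[0] for obs in DATA]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    mse = sse / (len(y) - k)  # error mean square on n - k degrees of freedom
    se = [sqrt(mse * inv[i][i]) for i in range(k)]
    return beta, se, sse


MODELS = [[1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]
for predictors in MODELS:
    beta, se, sse = fit(predictors)
    names = ["intercept"] + [f"X{j}" for j in predictors]
    terms = ", ".join(f"{n} = {b:.3f} ({s:.3f})" for n, b, s in zip(names, beta, se))
    print(f"{terms}; SSE = {sse:.0f}")
```

Inverting X'X explicitly is acceptable for a four-column problem like this one; a statistics package would normally use a more stable decomposition.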
2) Using the information from the first task, extract the R^{2} values, SSR, SSE, degrees of freedom, and F test for each of the models to fill in the following table.
Model      SSR      DF1   SSE      DF2   F       R^{2}
X1         152981   1     110311   13    18.03   0.5810
X2         79175    1     184116   13    5.59    0.3007
X3         816.8    1     262475   13    0.04    0.0031
X1:X2      183701   2     79591    12    13.85   0.6977
X1:X3      173055   2     90236    12    11.51   0.6573
X2:X3      107954   2     155338   12    4.17    0.4100
X1:X2:X3   188195   3     75096    11    9.19    0.7148
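The F and R^{2} columns follow from the other four columns: F = (SSR/DF1)/(SSE/DF2) and R^{2} = SSR/(SSR + SSE). A short Python sketch that recomputes them from the tabulated values is given here; the name `f_and_r2` is hypothetical, chosen only for this example.

```python
# (SSR, DF1, SSE, DF2) for each model, copied from the table above.
TABLE = {
    "X1": (152981, 1, 110311, 13),
    "X2": (79175, 1, 184116, 13),
    "X3": (816.8, 1, 262475, 13),
    "X1:X2": (183701, 2, 79591, 12),
    "X1:X3": (173055, 2, 90236, 12),
    "X2:X3": (107954, 2, 155338, 12),
    "X1:X2:X3": (188195, 3, 75096, 11),
}


def f_and_r2(ssr, df1, sse, df2):
    """Overall F statistic and R^2 from the regression and error sums of squares."""
    f = (ssr / df1) / (sse / df2)  # regression mean square / error mean square
    r2 = ssr / (ssr + sse)         # SSR + SSE is the total sum of squares
    return f, r2


for model, row in TABLE.items():
    f, r2 = f_and_r2(*row)
    print(f"{model:10s} F = {f:6.2f}   R^2 = {r2:.4f}")
```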
3) Using this table, calculate the F statistic to test the hypothesis (at α = 0.05) that X2 adds to the explanation of Y given that X1 is already in the model (i.e., compute the partial sum of squares for X2 given X1, then perform the test).
Sum of squares for X2 given X1 = 183701 - 152981 = 30720 (30719.8 before the table values were rounded), dfr = 1
Residual sum of squares (X1:X2 model) = 79591, dfe = 12
F statistic = Mean square (X2 | X1) / Mean square error (X1:X2 model) = (30719.8/1)/(79591/12) = 4.63
F critical at ndf = 1, ddf = 12 is 4.75
Since 4.63 < 4.75, we do not reject the null hypothesis that the partial slope for X2 is zero given that X1 is already in the model (p-value 0.0524). From this we conclude that X2 does not add significantly to the explanation of variability in Y when X1 is already in the model. Note that X3 does not add to the explanation over and above X1 either: the corresponding F statistic is 2.67, which is also less than 4.75 (p-value 0.1282).
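The extra-sum-of-squares arithmetic above can be written as one small function. This is a sketch only: `partial_f` is a hypothetical helper name, and the critical value is the tabulated 4.75 quoted in the answer rather than something computed here.

```python
def partial_f(ssr_full, ssr_reduced, sse_full, dfe_full, extra_terms=1):
    """F statistic for the terms that turn the reduced model into the full model."""
    extra_ss = ssr_full - ssr_reduced  # partial (extra) sum of squares
    return (extra_ss / extra_terms) / (sse_full / dfe_full)


F_CRIT_1_12 = 4.75  # 5% critical value for F(1, 12), as quoted in the answer

# X2 given X1: full model X1:X2, reduced model X1.
f_x2_given_x1 = partial_f(183701, 152981, 79591, 12)
# X3 given X1: full model X1:X3, reduced model X1.
f_x3_given_x1 = partial_f(173055, 152981, 90236, 12)

for label, f in [("X2 | X1", f_x2_given_x1), ("X3 | X1", f_x3_given_x1)]:
    decision = "reject H0" if f > F_CRIT_1_12 else "do not reject H0"
    print(f"{label}: F = {f:.2f} -> {decision}")
```

Both statistics fall below 4.75, which matches the conclusion that neither X2 nor X3 adds significantly once X1 is in the model.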
4) Using this same table, calculate the F statistic to test the hypothesis (at α = 0.05) that X1 adds to the explanation of Y given that X2 and X3 are already in the model (i.e., compute the partial sum of squares for X1 given X2 and X3, then perform the test).
Sum of squares for X1 given X2:X3 = 188195 - 107954 = 80241, dfr = 1
Residual sum of squares (X1:X2:X3 model) = 75096, dfe = 11
F statistic = Mean square (X1 | X2, X3) / Mean square error (X1:X2:X3 model) = (80241/1)/(75096/11) = 11.75
F critical at ndf = 1, ddf = 11 is 4.84
Since 11.75 > 4.84, we reject the null hypothesis that the partial slope for X1 is zero given that X2 and X3 are already in the model (p-value 0.0056). From this we conclude that X1 does add significantly to the explanation of variability in Y when X2 and X3 are already in the model. Since neither X2 nor X3 adds significantly to the explanation over and above X1, we conclude that only X1 seems important here.
5) Using the tests in Tasks 3 and 4 above, which of the following models would be your final best model?
i) X1 alone. This is the best model: we conclude that the number of plant species seems to be related to island area alone.
ii) X2 + X1
iii) All three explanatory variables.
The SAS program for this analysis is:
options ls=78 ps=49 nocenter nodate;

/* Island data: Y = plant species, X1 = area, X2 = maximum elevation, X3 = latitude */
data patient;
  input Y X1 X2 X3;
  datalines;
205 134 3950 28.2
163 98 4600 29.0
420 96 2470 34.0
340 84 1560 34.0
392 75 2125 33.3
235 56 1965 32.9
120 22 910 33.2
190 14 830 34.0
42 2.8 490 27.9
40 1.0 635 33.4
62 0.9 470 30.5
4 0.2 130 37.7
40 0.02 60 37.1
39 2.5 660 28.3
70 1.1 930 34.0
;
run;

/* All one-, two-, and three-variable models (Tasks 1 and 2) */
proc reg data=patient;
  model y = x1;
  model y = x2;
  model y = x3;
  model y = x1 x2;
  model y = x1 x3;
  model y = x2 x3;
  model y = x1 x2 x3;
run;

/* PROC GLM reports Type I (sequential) and Type III (partial) sums of squares (Tasks 3 and 4) */
proc glm data=patient;
  model y = x1 x2 x3 / solution;
run;

proc glm data=patient;
  model y = x1 x2 / solution;
run;

proc glm data=patient;
  model y = x1 x3 / solution;
run;
In MINITAB you would use the Stat > Regression > Regression tab to fit the various models. Using this you could also fit the X2:X1 model and select the third option under the Results button to get sequential sums of squares. You could also fit the X2:X3 model to check its sequential sums of squares and the X2:X1:X3 model to check its sequential sums of squares.
In SPSS, after entering the data you would use the Analyze > Regression > Linear option to fit each of the selected models. To get the answers to Tasks 3 and 4 above, you would have to run the separate models and fill in the table in Task 2.
In Excel, after entering the data you would use the Tools > Data Analysis > Regression option to fit each of the selected models. Again, to get the answers to Tasks 3 and 4 above, you would have to run the separate models and fill in the table in Task 2.