STA 6166 UNIT 4 Section 2 Answers

# Agriculture and Environment

The data below relate the number of plant species (Y) on islands along the California coast to the area of the islands (X1, square miles), maximum elevation (X2, feet), and latitude (X3, degrees North).

```
Y       X1      X2       X3
205    134    3950     28.2
163     98    4600     29.0
420     96    2470     34.0
340     84    1560     34.0
392     75    2125     33.3
235     56    1965     32.9
120     22     910     33.2
190     14     830     34.0
42    2.8     490     27.9
40    1.0     635     33.4
62    0.9     470     30.5
4    0.2     130     37.7
40   0.02     60      37.1
39    2.5     660     28.3
70    1.1     930     34.0
```

1) Fit all possible one-variable, two-variable, and three-variable linear regression models. List the final estimated equations with standard errors for the estimated partial slopes.

| Model | Intercept (stderr) | X1 (stderr) | X2 (stderr) | X3 (stderr) |
|---|---|---|---|---|
| X1 | 69.43 (31.55) | 2.25 (0.53) | | |
| X2 | 76.91 (45.87) | | 0.055 (0.023) | |
| X3 | 76.88 (402.35) | | | 2.48 (12.32) |
| X1:X2 | 103.65 (32.11) | 4.42 (1.11) | -0.08 (0.04) | |
| X1:X3 | -355.27 (261.62) | 2.48 (0.52) | | 12.79 (7.83) |
| X2:X3 | -479.53 (375.76) | | 0.07 (0.02) | 16.39 (10.99) |
| X1:X2:X3 | -131.14 (291.19) | 4.11 (1.20) | -0.07 (0.04) | 6.86 (8.45) |
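As a rough cross-check of the first row of this table, the one-variable fit of Y on X1 can be reproduced from the closed-form least-squares formulas. The following is a minimal Python sketch and not part of the original SAS solution; the function name `simple_ols` is an illustrative choice.

```python
import math

# Data from the table above: species count (Y) and island area (X1).
y = [205, 163, 420, 340, 392, 235, 120, 190, 42, 40, 62, 4, 40, 39, 70]
x1 = [134, 98, 96, 84, 75, 56, 22, 14, 2.8, 1.0, 0.9, 0.2, 0.02, 2.5, 1.1]

def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares; return estimates and standard errors."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = ybar - b1 * xbar               # intercept
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                 # residual mean square
    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1 / n + xbar ** 2 / sxx))
    return {"b0": b0, "se_b0": se_b0, "b1": b1, "se_b1": se_b1, "sse": sse}

fit = simple_ols(x1, y)
print({name: round(value, 2) for name, value in fit.items()})
```

The printed intercept, slope, and standard errors should agree with the X1 row above to the precision shown there.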

2) Using the information from the first task, extract the SSR, SSE, degrees of freedom, F statistic, and R² value for each of the models to fill in the following table.

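| Model | SSR | DF1 | SSE | DF2 | F | R² |
|---|---|---|---|---|---|---|
| X1 | 152981 | 1 | 110311 | 13 | 18.03 | 0.5810 |
| X2 | 79175 | 1 | 184116 | 13 | 5.59 | 0.3007 |
| X3 | 816.8 | 1 | 262475 | 13 | 0.04 | 0.0031 |
| X1:X2 | 183701 | 2 | 79591 | 12 | 13.85 | 0.6977 |
| X1:X3 | 173055 | 2 | 90236 | 12 | 11.51 | 0.6573 |
| X2:X3 | 107954 | 2 | 155338 | 12 | 4.17 | 0.4100 |
| X1:X2:X3 | 188195 | 3 | 75096 | 11 | 9.19 | 0.7148 |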
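The F and R² columns follow directly from the sums of squares and degrees of freedom: F = (SSR/DF1)/(SSE/DF2) and R² = SSR/(SSR + SSE). A short Python sketch of that arithmetic, using the table values as given (the dictionary name `anova` and the helper `f_and_r2` are illustrative):

```python
# (SSR, DF1, SSE, DF2) for each model, copied from the table above.
anova = {
    "X1": (152981, 1, 110311, 13),
    "X2": (79175, 1, 184116, 13),
    "X3": (816.8, 1, 262475, 13),
    "X1:X2": (183701, 2, 79591, 12),
    "X1:X3": (173055, 2, 90236, 12),
    "X2:X3": (107954, 2, 155338, 12),
    "X1:X2:X3": (188195, 3, 75096, 11),
}

def f_and_r2(ssr, df1, sse, df2):
    """Overall F statistic and R-squared from the ANOVA decomposition."""
    f_stat = (ssr / df1) / (sse / df2)   # regression MS over error MS
    r2 = ssr / (ssr + sse)               # SSR as a share of the total SS
    return f_stat, r2

for model, row in anova.items():
    f_stat, r2 = f_and_r2(*row)
    print(f"{model:<9} F = {f_stat:6.2f}  R2 = {r2:.4f}")
```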

3) Using this table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X2 adds to the explanation of Y given that X1 is already in the model (i.e., compute the partial sum of squares for X2 given X1, then perform the test).

Sum of squares of X2 given X1 = 183701 - 152981 = 30720, dfr = 1

Residual sum of squares (X1:X2 model) = 79591, dfe = 12

F-statistic = Mean square X2|X1 / Mean square error (X1:X2 model) = (30720/1)/(79591/12) = 4.63

F-critical at ndf=1, ddf=12 = 4.75

Since 4.63 < 4.75, we do not reject the null hypothesis that the partial slope for X2 is zero given that X1 is already in the model (p-value is 0.0524). From this we conclude that X2 does not add significantly to the explanation of variability in Y when X1 is already in the model. The same test applied to X3 given X1 yields an F-statistic of 2.67, which is also less than 4.75 (p-value of 0.1282), so X3 does not add significantly over X1 either.
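The partial F computation for Task 3 can be sketched in Python as follows. The helper name `partial_f` is illustrative, and the critical value 4.75 is simply the one quoted above rather than computed.

```python
def partial_f(ssr_full, ssr_reduced, sse_full, df_added, dfe_full):
    """Partial F statistic for the terms added to the reduced model."""
    extra_ss = ssr_full - ssr_reduced            # SS explained by the added terms
    return (extra_ss / df_added) / (sse_full / dfe_full)

F_CRIT_1_12 = 4.75  # F critical value at alpha = 0.05, ndf = 1, ddf = 12 (from the text)

# X2 given X1: compare the X1:X2 model against the X1 model.
f_x2_given_x1 = partial_f(183701, 152981, 79591, 1, 12)
# X3 given X1: compare the X1:X3 model against the X1 model.
f_x3_given_x1 = partial_f(173055, 152981, 90236, 1, 12)

for label, f_stat in [("X2 | X1", f_x2_given_x1), ("X3 | X1", f_x3_given_x1)]:
    decision = "reject H0" if f_stat > F_CRIT_1_12 else "do not reject H0"
    print(f"{label}: F = {f_stat:.2f} -> {decision}")
```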

4) Using this same table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X1 adds to the explanation of Y given that X2 and X3 are already in the model (i.e., compute the partial sum of squares for X1 given X2 and X3, then perform the test).

Sum of squares of X1 given X2:X3 = 188195 - 107954 = 80241, dfr = 1

Residual sum of squares (X1:X2:X3 model) = 75096, dfe = 11

F-statistic = Mean square X1|X2,X3 / Mean square error (X1:X2:X3 model) = (80241/1)/(75096/11) = 11.75

F-critical at ndf=1, ddf=11 = 4.84

Since 11.75 > 4.84, we reject the null hypothesis that the partial slope for X1 is zero given that X2 and X3 are already in the model (p-value is 0.0056). From this we conclude that X1 does add significantly to the explanation of variability in Y when X2 and X3 are already in the model. Since neither X2 nor X3 adds significantly over and above X1 (Task 3), while X1 remains significant after X2 and X3 are included, we conclude that it is only X1 that seems important here.
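The p-values quoted for these tests can be approximated without statistical software. The sketch below relies on the fact that an F statistic with 1 numerator degree of freedom is the square of a t statistic, and integrates the t density numerically with Simpson's rule. This is an illustrative approach, not the method SAS uses, and the function names are assumptions of this sketch.

```python
import math

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def f_pvalue_1df(f_stat, dfe, steps=2000):
    """Upper-tail p-value of an F(1, dfe) statistic.

    Equals the two-sided t p-value at t = sqrt(F); the area under the
    t density from 0 to sqrt(F) is found with Simpson's rule (steps even).
    """
    upper = math.sqrt(f_stat)
    h = upper / steps
    total = t_density(0.0, dfe) + t_density(upper, dfe)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, dfe)
    area = total * h / 3
    return 1 - 2 * area

# Task 4: X1 given X2 and X3, tested against the full-model error (dfe = 11).
f_x1 = (188195 - 107954) / (75096 / 11)
p_x1 = f_pvalue_1df(f_x1, 11)
print(f"F = {f_x1:.2f}, p-value = {p_x1:.4f}")
```

The same function applied to the Task 3 statistic, `f_pvalue_1df(4.63, 12)`, should land close to the 0.0524 reported there.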

5) Using the tests in Tasks 3 and 4 above, which of the following models would be your final best model?

i) X1 alone. This is the best model. Thus we would conclude that the number of plant species seems to be related to island area alone.

ii) X2 + X1

iii) All three explanatory variables.

The SAS program for this analysis is:

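```
options ls=78 ps=49 nocenter nodate;

/* y = number of plant species, x1 = area, x2 = maximum elevation, x3 = latitude */
data patient;
input y x1 x2 x3;
datalines;
205    134    3950     28.2
163     98    4600     29.0
420     96    2470     34.0
340     84    1560     34.0
392     75    2125     33.3
235     56    1965     32.9
120     22     910     33.2
190     14     830     34.0
42    2.8     490     27.9
40    1.0     635     33.4
62    0.9     470     30.5
4    0.2     130     37.7
40   0.02     60      37.1
39    2.5     660     28.3
70    1.1     930     34.0
;
run;

/* Tasks 1 and 2: fit all one-, two-, and three-variable models */
proc reg data=patient;
model y = x1 ;
model y = x2 ;
model y = x3 ;
model y = x1 x2 ;
model y = x1 x3 ;
model y = x2 x3 ;
model y = x1 x2 x3 ;
run;
quit;

/* Task 4: the Type III SS for x1 is the SS of x1 given x2 and x3 */
proc glm data=patient;
model y = x1 x2 x3 / solution;
run;
quit;

/* Task 3: the Type I SS for x2 is the SS of x2 given x1 */
proc glm data=patient;
model y = x1 x2 / solution;
run;
quit;

/* Task 3: the Type I SS for x3 is the SS of x3 given x1 */
proc glm data=patient;
model y = x1 x3 / solution;
run;
quit;
```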

In MINITAB you would use Stat > Regression > Regression to fit the various models. There you could also fit the X1:X2 model and select the third option under the Results button to get sequential sums of squares. You could likewise fit the X2:X3 model and the X1:X2:X3 model to check their sequential sums of squares. Sequential sums of squares depend on the order in which the predictors are entered, so enter the variable being tested last.
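To see what the sequential sums of squares look like for this problem, they can be rebuilt from the SSR column of the Task 2 table: each predictor's sequential SS is the increase in SSR when it enters after the predictors before it. The Python sketch below is only an illustration of that bookkeeping (the names `ssr` and `sequential_ss` are hypothetical), not MINITAB output.

```python
# Regression sums of squares (SSR) from the Task 2 table, keyed by predictor set.
ssr = {
    frozenset(): 0,
    frozenset({"X1"}): 152981,
    frozenset({"X2"}): 79175,
    frozenset({"X3"}): 816.8,
    frozenset({"X1", "X2"}): 183701,
    frozenset({"X1", "X3"}): 173055,
    frozenset({"X2", "X3"}): 107954,
    frozenset({"X1", "X2", "X3"}): 188195,
}

def sequential_ss(order):
    """Sequential SS for each predictor, in the order the predictors are entered."""
    result = []
    entered = frozenset()
    for var in order:
        with_var = entered | {var}
        result.append((var, ssr[with_var] - ssr[entered]))
        entered = with_var
    return result

print(sequential_ss(["X1", "X2", "X3"]))  # X2 entry is SS(X2 | X1), as in Task 3
print(sequential_ss(["X2", "X3", "X1"]))  # X1 entry is SS(X1 | X2, X3), as in Task 4
```

Both orderings add up to the same full-model SSR of 188195; only the split among the predictors changes.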

In SPSS, after entering the data you would use the Analyze > Regression > Linear option to fit each of the selected models. To get the answers to Tasks 3 and 4 above, you would have to run the separate models and fill in the table in Task 2.

In Excel, after entering the data you would use the Tools > Data Analysis > Regression option to fit each of the selected models. Again, to get the answers to Tasks 3 and 4 above, you would have to run the separate models and fill in the table in Task 2.