Unit 4 Section 2 Answers

Agriculture and Environment

The data below relate the number of plant species (Y) on islands along the California coast to the area of the islands (X1, square miles), maximum elevation (X2, feet), and latitude (X3, degrees North).


Y       X1      X2       X3
205    134    3950     28.2
163     98    4600     29.0
420     96    2470     34.0
340     84    1560     34.0
392     75    2125     33.3
235     56    1965     32.9
120     22     910     33.2
190     14     830     34.0
 42    2.8     490     27.9
 40    1.0     635     33.4
 62    0.9     470     30.5
  4    0.2     130     37.7
 40   0.02      60     37.1
 39    2.5     660     28.3
 70    1.1     930     34.0

1) Fit all possible one-variable, two-variable, and three-variable linear regression models. List the final estimated equations with standard errors for the estimated partial slopes.

Model	intercept (stderr)	X1 (stderr)	X2 (stderr)	X3 (stderr)
X1	69.43 (31.55)	2.25 (0.53)
X2	76.91 (45.87)		0.055 (0.023)
X3	76.88 (402.35)			2.48 (12.32)
X1:X2	103.65 (32.11)	4.42 (1.11)	-0.08 (0.04)
X1:X3	-355.27 (261.62)	2.48 (0.52)		12.79 (7.83)
X2:X3	-479.53 (375.76)		0.07 (0.02)	16.39 (10.99)
X1:X2:X3	-131.14 (291.19)	4.11 (1.20)	-0.07 (0.04)	6.86 (8.45)
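As a cross-check on the table above, here is a minimal Python sketch (not part of the course's SAS workflow) that fits any subset of the predictors by ordinary least squares using only the standard library. The names `DATA`, `solve`, and `fit_ols` are illustrative, and solving the normal equations directly is chosen for brevity rather than numerical robustness.

```python
import math

# Rows are (Y, X1, X2, X3), copied from the data table above.
DATA = [
    (205, 134, 3950, 28.2), (163, 98, 4600, 29.0), (420, 96, 2470, 34.0),
    (340, 84, 1560, 34.0), (392, 75, 2125, 33.3), (235, 56, 1965, 32.9),
    (120, 22, 910, 33.2), (190, 14, 830, 34.0), (42, 2.8, 490, 27.9),
    (40, 1.0, 635, 33.4), (62, 0.9, 470, 30.5), (4, 0.2, 130, 37.7),
    (40, 0.02, 60, 37.1), (39, 2.5, 660, 28.3), (70, 1.1, 930, 34.0),
]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(cols):
    """Fit Y on the predictors listed in cols (1 = X1, 2 = X2, 3 = X3).

    Returns (coefficients, standard errors, SSE), intercept first.
    """
    y = [row[0] for row in DATA]
    x = [[1.0] + [row[c] for c in cols] for row in DATA]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = sse / (len(y) - k)
    # Standard error of coefficient i is sqrt(MSE * [(X'X)^-1]_ii).
    se = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append(math.sqrt(mse * solve(xtx, unit)[i]))
    return beta, se, sse

beta, se, sse = fit_ols([1])
print("X1 model:", [round(b, 2) for b in beta], [round(s, 2) for s in se])
```

Calling `fit_ols([1, 2])` or `fit_ols([1, 2, 3])` in the same way gives the multi-variable rows; small differences from the SAS output in the last digit would not be surprising given the simple solver.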

2) Using the information from the first task, extract the SSR, SSE, degrees of freedom, F test, and R² value for each of the models to fill in the following table.

Model	SSR	DF1	SSE	DF2	F	R²
X1	152981	1	110311	13	18.03	0.5810
X2	79175	1	184116	13	5.59	0.3007
X3	816.8	1	262475	13	0.04	0.0031
X1:X2	183701	2	79591	12	13.85	0.6977
X1:X3	173055	2	90236	12	11.51	0.6573
X2:X3	107954	2	155338	12	4.17	0.4100
X1:X2:X3	188195	3	75096	11	9.19	0.7148
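Each F and R² entry in the table follows from the sums of squares and degrees of freedom in the same row: F = (SSR/DF1)/(SSE/DF2) and R² = SSR/(SSR + SSE). The short Python sketch below repeats that arithmetic; the dictionary name `ANOVA` and the helper `overall_f_and_r2` are illustrative choices, not part of the original analysis.

```python
# (SSR, DF1, SSE, DF2) for each model, copied from the table above.
ANOVA = {
    "X1": (152981, 1, 110311, 13),
    "X2": (79175, 1, 184116, 13),
    "X3": (816.8, 1, 262475, 13),
    "X1:X2": (183701, 2, 79591, 12),
    "X1:X3": (173055, 2, 90236, 12),
    "X2:X3": (107954, 2, 155338, 12),
    "X1:X2:X3": (188195, 3, 75096, 11),
}

def overall_f_and_r2(ssr, df1, sse, df2):
    """Return the overall F statistic and R-squared for one model row."""
    f_stat = (ssr / df1) / (sse / df2)   # mean square regression / mean square error
    r2 = ssr / (ssr + sse)               # SSR as a share of the total sum of squares
    return f_stat, r2

for name, row in ANOVA.items():
    f_stat, r2 = overall_f_and_r2(*row)
    print(f"{name:9s} F = {f_stat:6.2f}  R2 = {r2:.4f}")
```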

3) Using this table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X2 adds to the explanation of Y given that X1 is already in the model (that is, compute the partial sum of squares for X2 given X1, then perform the test).

Sum of squares for X2 given X1 = 183701 - 152981 = 30720, dfr = 1

Residual sum of squares (X1:X2 model) = 79591, dfe = 12

F-statistic = Mean Square X2|X1 / Mean Square Error (X1:X2 model) = (30720/1)/(79591/12) = 4.63

F-critical at ndf = 1, ddf = 12 is 4.75

Since 4.63 < 4.75, we do not reject the null hypothesis that the partial slope for X2 is zero given that X1 is already in the model (p-value is 0.0524). From this we conclude that X2 does not add significantly to the explanation of variability in Y when X1 is already in the model. X3 adds even less than X2 once X1 is in the model: its partial sum of squares is 173055 - 152981 = 20074, and the formal test gives an F-statistic of 2.67, which is also less than 4.75 (p-value of 0.1282).
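The partial (extra sum of squares) F test used in this task can be written as a small function. This is a sketch that takes its inputs from the Task 2 table; the function name `partial_f` and its argument names are hypothetical, and 4.75 is the critical value quoted in the text.

```python
def partial_f(ssr_full, ssr_reduced, sse_full, df_error_full, n_added=1):
    """Extra-sum-of-squares F statistic for the terms added to the reduced model."""
    extra_ss = ssr_full - ssr_reduced
    return (extra_ss / n_added) / (sse_full / df_error_full)

F_CRIT_1_12 = 4.75  # F critical value for ndf = 1, ddf = 12 at alpha = 0.05 (from the text)

# X2 given X1: full model is X1:X2, reduced model is X1.
f_x2_given_x1 = partial_f(183701, 152981, 79591, 12)
# X3 given X1: full model is X1:X3, reduced model is X1.
f_x3_given_x1 = partial_f(173055, 152981, 90236, 12)

for label, f_stat in [("X2 | X1", f_x2_given_x1), ("X3 | X1", f_x3_given_x1)]:
    decision = "reject H0" if f_stat > F_CRIT_1_12 else "do not reject H0"
    print(f"{label}: F = {f_stat:.2f}, {decision}")
```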

4) Using this same table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X1 adds to the explanation of Y given that X2 and X3 are already in the model (that is, compute the partial sum of squares for X1 given X2 and X3, then perform the test).

Sum of squares for X1 given X2:X3 = 188195 - 107954 = 80241, dfr = 1

Residual sum of squares (X1:X2:X3 model) = 75096, dfe = 11

F-statistic = Mean Square X1|X2,X3 / Mean Square Error (X1:X2:X3 model) = (80241/1)/(75096/11) = 11.75

F-critical at ndf = 1, ddf = 11 is 4.84

Since 11.75 > 4.84, we reject the null hypothesis that the partial slope for X1 is zero given that X2 and X3 are already in the model (p-value is 0.0056). From this we conclude that X1 does add significantly to the explanation of variability in Y when X2 and X3 are already in the model. Since neither X2 nor X3 adds significantly over and above X1, we conclude that only X1 seems important here.
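The p-values quoted in Tasks 3 and 4 can be checked without statistical tables. With one numerator degree of freedom, an F statistic is the square of a t statistic on the error degrees of freedom, so P(F > f) equals the two-sided t probability beyond sqrt(f). The Python sketch below integrates the t density numerically with Simpson's rule; the function names and the choice of 2000 intervals are assumptions made for illustration, and a statistics package would normally be used instead.

```python
import math

def t_density(t, df):
    """Student t probability density with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def p_value_f1(f_stat, df_error, steps=2000):
    """Upper-tail probability P(F > f_stat) for an F(1, df_error) variable."""
    upper = math.sqrt(f_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df_error) + t_density(upper, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df_error)
    central = 2 * total * h / 3  # P(-upper < T < upper), using symmetry about 0
    return 1 - central

for label, f_stat, dfe in [("X2 | X1", 4.63, 12),
                           ("X3 | X1", 2.67, 12),
                           ("X1 | X2, X3", 11.75, 11)]:
    print(f"{label}: F = {f_stat}, p = {p_value_f1(f_stat, dfe):.4f}")
```

Because the inputs are the rounded F statistics, the results may differ from software output in the last printed digit.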

5) Using the tests in Tasks 3 and 4 above, which of the following models would be your final best model?

i) X1 alone. This is the best model. Thus we would conclude that the number of plant species seems to be related to island area alone.

ii) X2 + X1

iii) All three explanatory variables.
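One way to see why the model with X1 alone is preferred is to test each variable as the last one added to the three-variable model, using only the SSR and SSE values from the Task 2 table. The following Python sketch does this; the dictionary layout and names are illustrative, and 4.84 is the F(1, 11) critical value quoted in Task 4.

```python
# Regression sums of squares from the Task 2 table.
SSR = {"X1:X2": 183701, "X1:X3": 173055, "X2:X3": 107954, "X1:X2:X3": 188195}
SSE_FULL, DF_ERROR_FULL = 75096, 11
F_CRIT_1_11 = 4.84  # alpha = 0.05, ndf = 1, ddf = 11 (from Task 4)

# Reduced model obtained by dropping each variable from the full model.
REDUCED = {"X1": "X2:X3", "X2": "X1:X3", "X3": "X1:X2"}

mse_full = SSE_FULL / DF_ERROR_FULL
added_last_f = {
    var: (SSR["X1:X2:X3"] - SSR[reduced]) / mse_full
    for var, reduced in REDUCED.items()
}

for var, f_stat in added_last_f.items():
    verdict = "significant" if f_stat > F_CRIT_1_11 else "not significant"
    print(f"{var} added last: F = {f_stat:.2f} ({verdict})")
```

Only X1 exceeds the critical value when added last, which is consistent with choosing model i) once the Task 3 result for X2 given X1 is taken into account.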

The SAS program for this analysis is:

options ls=78 ps=49 nocenter nodate;
data patient;
input Y   X1  X2  X3;
datalines;
205    134    3950     28.2
163     98    4600     29.0
420     96    2470     34.0
340     84    1560     34.0
392     75    2125     33.3
235     56    1965     32.9
120     22     910     33.2
190     14     830     34.0
 42    2.8     490     27.9
 40    1.0     635     33.4
 62    0.9     470     30.5
  4    0.2     130     37.7
 40   0.02     60      37.1
 39    2.5     660     28.3
 70    1.1     930     34.0
;
run;
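* Fit all seven models; each MODEL statement prints its own ANOVA table
  and parameter estimates with standard errors;
proc reg data=patient;
   model y = x1;
   model y = x2;
   model y = x3;
   model y = x1 x2;
   model y = x1 x3;
   model y = x2 x3;
   model y = x1 x2 x3;
run;

* PROC GLM prints Type I (sequential) and Type III (partial) sums of
  squares, which supply the partial sums of squares for Tasks 3 and 4;
proc glm data=patient;
   model y = x1 x2 x3 / solution;
run;
proc glm data=patient;
   model y = x1 x2 / solution;
run;
proc glm data=patient;
   model y = x1 x3 / solution;
run;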

In MINITAB you would use the Stat > Regression > Regression tab to fit the various models. Using this you could also fit the X2:X1 model and select the third option under the Results button to get sequential sums of squares. You could also fit the X2:X3 model to check its sequential sums of squares and the X2:X1:X3 model to check its sequential sums of squares.

In SPSS, after entering the data you would use the Analyze > Regression > Linear option to fit each of the selected models. To get the answers to Tasks 3 and 4 above, you would have to run the separate models and fill in the table in Task 2.

In Excel, after entering the data you would use the Tools > Data Analysis > Regression option to fit each of the selected models. Again, to get the answers to Tasks 3 and 4 above, you would have to run the separate models and fill in the table in Task 2.