Unit 4 Section 2 Answers

Toxicology and Health

A hospital administrator wished to study the relationship between patient satisfaction (Y) and patient's age (X₁, in years), severity of illness (X₂, an index), and anxiety level (X₃, an index). She randomly selected 15 patients and collected the data presented below.



Y   X1  X2  X3
57  36  46  2.3  
66  40  48  2.2  
70  41  44  1.8
89  28  43  1.8
36  49  54  2.9
46  42  54  2.9
54  45  48  2.4
26  52  62  2.9  
77  29  50  2.1
89  29  48  2.4
67  43  53  2.4
47  38  55  2.2
51  34  51  2.3  
57  53  54  2.2
66  36  49  2.0

1) Fit all possible one-variable, two-variable, and three-variable linear regression models. List the final estimated equations with standard errors for the estimated partial slopes.

Model	intercept (stderr)	X1 (stderr)	X2 (stderr)	X3 (stderr)
X1	125.73 (16.68)	-1.660 (0.413)
X2	201.414 (32.621)		-2.797 (0.642)
X3	143.85 (22.505)			-36.202 (9.596)
X1:X2	189.472 (29.550)	-0.955 (0.453)	-1.813 (0.737)
X1:X3	156.698 (19.203)	-1.115 (0.419)		-22.679 (9.418)
X2:X3	195.130 (31.941)		-1.888 (0.913)	-17.133 (12.598)
X1:X2:X3	185.365 (29.188)	-0.876 (0.449)	-1.156 (0.904)	-13.888 (11.463)
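As a cross-check on the one-variable rows, the estimates can be reproduced from the closed-form least-squares formulas. The following is a minimal Python sketch, not part of the SAS program; the function name `simple_ols` and the dictionary keys are illustrative choices. It fits Y on X1 using the data listed above and returns the intercept, slope, their standard errors, SSR, and SSE.

```python
# Closed-form least squares for the one-variable model Y = b0 + b1*X.
# simple_ols is a hypothetical helper name used only for this sketch.
from math import sqrt

y  = [57, 66, 70, 89, 36, 46, 54, 26, 77, 89, 67, 47, 51, 57, 66]
x1 = [36, 40, 41, 28, 49, 42, 45, 52, 29, 29, 43, 38, 34, 53, 36]

def simple_ols(x, y):
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)                       # corrected SS of X
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # corrected cross-product
    sst = sum((yi - ybar) ** 2 for yi in y)                       # total SS of Y
    b1 = sxy / sxx                   # slope
    b0 = ybar - b1 * xbar            # intercept
    ssr = b1 * sxy                   # regression sum of squares
    sse = sst - ssr                  # residual sum of squares
    mse = sse / (n - 2)              # residual mean square, n - 2 df
    se_b1 = sqrt(mse / sxx)
    se_b0 = sqrt(mse * (1 / n + xbar ** 2 / sxx))
    return {"b0": b0, "se_b0": se_b0, "b1": b1, "se_b1": se_b1,
            "ssr": ssr, "sse": sse}

fit_x1 = simple_ols(x1, y)
print({k: round(v, 3) for k, v in fit_x1.items()})
```

Passing the X2 or X3 column in place of `x1` checks the other one-variable rows in the same way. For a one-variable model, the square of slope/stderr equals the model F statistic, which gives a further consistency check against the table in Task 2.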

2) Using the information from the first task, extract the R² value, SSR, SSE, degrees of freedom, and F test for each of the models to fill in the following table.

Model	SSR	DF1	SSE	DF2	F	R²
X1	2451.9	1	1971.8	13	16.16	0.554
X2	2626.2	1	1797.5	13	18.99	0.594
X3	2311.8	1	2111.8	13	14.23	0.523
X1:X2	3112.2	2	1311.5	12	14.24	0.704
X1:X3	3094.3	2	1329.4	12	13.97	0.700
X2:X3	2866.2	2	1557.5	12	11.04	0.647
X1:X2:X3	3266.6	3	1157.1	11	10.35	0.738
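The F and R² columns follow directly from the SSR, SSE, and degrees-of-freedom columns: F = (SSR/DF1)/(SSE/DF2) and R² = SSR/(SSR + SSE). A short Python sketch of that arithmetic is given here as a check; the names `table` and `f_and_r2` are illustrative. Because the inputs are already rounded to one decimal place, a last-digit difference from the tabulated values is possible.

```python
# (SSR, DF1, SSE, DF2) for each model, copied from the table above.
table = {
    "X1":       (2451.9, 1, 1971.8, 13),
    "X2":       (2626.2, 1, 1797.5, 13),
    "X3":       (2311.8, 1, 2111.8, 13),
    "X1:X2":    (3112.2, 2, 1311.5, 12),
    "X1:X3":    (3094.3, 2, 1329.4, 12),
    "X2:X3":    (2866.2, 2, 1557.5, 12),
    "X1:X2:X3": (3266.6, 3, 1157.1, 11),
}

def f_and_r2(ssr, df1, sse, df2):
    """Overall F statistic and R-squared from the ANOVA quantities."""
    f = (ssr / df1) / (sse / df2)   # mean square regression / mean square error
    r2 = ssr / (ssr + sse)          # SSR + SSE is the total sum of squares
    return f, r2

summary = {model: f_and_r2(*row) for model, row in table.items()}
for model, (f, r2) in summary.items():
    print(f"{model:10s} F = {f:6.2f}   R2 = {r2:.3f}")
```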

3) Using this table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X1 adds to the explanation of Y given that X2 is already in the model (i.e., compute the partial sum of squares for X1 given X2, then perform the test).

Sum of squares for X1 given X2 = SSR(X1:X2) - SSR(X2) = 3112.2 - 2626.2 = 486.0, dfr = 1

Residual sum of squares (X1:X2 model) = 1311.5, dfe = 12

F-statistic = Mean square X1|X2 / Mean square error (X1:X2 model) = (486.0/1)/(1311.5/12) = 4.45

F-critical at ndf = 1, ddf = 12, α = 0.05 is 4.75

Since 4.45 < 4.75, we do not reject the null hypothesis that the partial slope for X1 is zero given that X2 is already in the model. We conclude that X1 does not add significantly to the explanation of variability in Y when X2 is already in the model. Note that X3 adds even less than X1 does once X2 is in the model: its partial sum of squares is 2866.2 - 2626.2 = 240.0, which gives F = (240.0/1)/(1557.5/12) = 1.85, also less than 4.75.
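The partial F computation can be written as one small function. This is a sketch only: the name `partial_f` and its argument names are illustrative, and the critical value is the tabulated one quoted in this answer rather than something the code computes.

```python
def partial_f(ssr_full, ssr_reduced, sse_full, dfe_full, df_added=1):
    """Partial F for the term(s) added when going from the reduced to the full model."""
    extra_ss = ssr_full - ssr_reduced              # partial (extra) sum of squares
    return (extra_ss / df_added) / (sse_full / dfe_full)

F_CRIT_1_12 = 4.75  # alpha = 0.05, ndf = 1, ddf = 12 (value quoted in the text)

# X1 given X2: full model X1:X2, reduced model X2
f_x1_given_x2 = partial_f(3112.2, 2626.2, 1311.5, 12)
# X3 given X2: full model X2:X3, reduced model X2
f_x3_given_x2 = partial_f(2866.2, 2626.2, 1557.5, 12)

for label, f in [("X1|X2", f_x1_given_x2), ("X3|X2", f_x3_given_x2)]:
    verdict = "reject H0" if f > F_CRIT_1_12 else "do not reject H0"
    print(f"{label}: F = {f:.2f}, {verdict}")
```

The same function applies to any pair of nested models in the table; only the full-model SSE and its error degrees of freedom change.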

4) Using this same table, calculate the F-statistic to test the hypothesis (at α = 0.05) that X3 adds to the explanation of Y given that X1 and X2 are already in the model (i.e., compute the partial sum of squares for X3 given X1 and X2, then perform the test).

Sum of squares for X3 given X1 and X2 = SSR(X1:X2:X3) - SSR(X1:X2) = 3266.6 - 3112.2 = 154.4, dfr = 1

Residual sum of squares (X1:X2:X3 model) = 1157.1, dfe = 11

F-statistic = Mean square X3|X1:X2 / Mean square error (X1:X2:X3 model) = (154.4/1)/(1157.1/11) = 1.47

F-critical at ndf = 1, ddf = 11, α = 0.05 is 4.84

Since 1.47 < 4.84 we do not reject the null hypothesis that the partial slope for X3 is zero given X1 and X2 are already in the model. From this we conclude that X3 does not add significantly to the explanation of variability in Y when X1 and X2 are already in the model.

5) Using the tests in Tasks 3 and 4 above, which of the following models would be your final best model?

i) X2 alone. This is the best model. From this we would conclude that patient satisfaction seems to be related to illness severity alone.

ii) X2:X1. Not the best model, since X1 does not add significantly over and above X2.

iii) All three explanatory variables (X1:X2:X3). Not the best model, since X3 does not add significantly over and above the X1:X2 model. Note that we did not even have to do the test in Task 4, since neither the X2:X1 nor the X2:X3 model was significantly better than the X2-alone model.
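One way to mechanize this reasoning is a forward-selection loop driven by partial F tests. The answer above only reports the individual tests, so the sketch below is an illustration rather than the method used to produce it. The name `forward_select` is hypothetical, the SSR values come from the table in Task 2, and the critical values 4.75 and 4.84 are those quoted above. The value 4.67 for 13 error degrees of freedom is taken from a standard F table and does not appear in this answer.

```python
# SSR for every model, keyed by the set of variables it contains (Task 2 table).
ssr = {
    frozenset(["X1"]): 2451.9,
    frozenset(["X2"]): 2626.2,
    frozenset(["X3"]): 2311.8,
    frozenset(["X1", "X2"]): 3112.2,
    frozenset(["X1", "X3"]): 3094.3,
    frozenset(["X2", "X3"]): 2866.2,
    frozenset(["X1", "X2", "X3"]): 3266.6,
}
SST = 4423.7   # SSR + SSE, the same for every row of the table
N = 15         # number of patients
# alpha = 0.05 critical values for ndf = 1, keyed by error df.
# 4.67 is an assumption taken from a standard F table, not from the text.
F_CRIT = {13: 4.67, 12: 4.75, 11: 4.84}

def forward_select(variables):
    """Add one variable at a time while the best partial F exceeds F-critical."""
    chosen = frozenset()
    while len(chosen) < len(variables):
        best = None
        for v in variables - chosen:
            trial = chosen | {v}
            dfe = N - len(trial) - 1                   # error df of the trial model
            mse = (SST - ssr[trial]) / dfe             # error mean square of the trial model
            f = (ssr[trial] - ssr.get(chosen, 0.0)) / mse
            if best is None or f > best[1]:
                best = (v, f, dfe)
        v, f, dfe = best
        if f <= F_CRIT[dfe]:
            break                                      # nothing adds significantly
        chosen = chosen | {v}
    return sorted(chosen)

final_model = forward_select({"X1", "X2", "X3"})
print(final_model)
```

With these inputs the loop admits X2 first (F = 18.99), then stops because the best remaining candidate, X1 given X2, has F = 4.45 < 4.75.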

The SAS Program to perform this analysis follows:


options ls=78 ps=49 nocenter nodate;
data patient;
input Y   X1  X2  X3;
datalines;
57  36  46  2.3  
66  40  48  2.2  
70  41  44  1.8
89  28  43  1.8
36  49  54  2.9
46  42  54  2.9
54  45  48  2.4
26  52  62  2.9  
77  29  50  2.1
89  29  48  2.4
67  43  53  2.4
47  38  55  2.2
51  34  51  2.3  
57  53  54  2.2
66  36  49  2.0
;
run;
/* Run all possible models */
proc reg data=patient;
 model y = x1 ;
 model y = x2 ;
 model y = x3 ;
 model y = x1 x2 ;
 model y = x1 x3 ;
 model y = x2 x3 ;
 model y = x1 x2 x3 ;
 run;
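/* Run proc glm to examine sequential (Type I) and partial (Type III) sums of squares. */
/* x2 is listed first so that the Type I SS of the next variable is its SS given x2.   */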
proc glm data=patient;
   model y = x2 x1/ solution;
   run;
proc glm data=patient;
   model y = x2 x3/ solution;
   run;
proc glm data=patient;
   model y = x1 x2 x3 / solution;
   run;

In MINITAB you would use the Stat > Regression > Regression tab to fit the various models. Using this you could also fit the X2:X1 model and select the third option under the Results button to get sequential sums of squares. You could also fit the X2:X3 model to check its sequential sums of squares and the X2:X1:X3 model to check its sequential sums of squares.

In SPSS, after entering the data you would use the Analyze > Regression > Linear option to fit each of the selected models. To get the answers to Task 3 and 4 above, you would have to run the separate models and fill in the table in Task 2.

In Excel, after entering the data you would use the Tools > Data Analysis > Regression option to fit each of the selected models. Again, to get the answers to Task 3 and 4 above, you would have to run the separate models and fill in the table in Task 2.