STAT 5303 Assignment 7
Predicting Skeletal Spongiosa Volume From Simple Body Measurements
A primary goal in molecular radiotherapy is to optimize various treatment
parameters - radionuclide, carrier molecule (peptide, antibody,
pharmaceutical), administered activity, use of pre-therapy drugs (cold
antibody, amino acids, etc.) - in order to maximize tumor cell kill while minimizing toxicity in non-targeted tissues. Active bone marrow is one of the more radiosensitive tissues in the human body
and hence it is important to predict and possibly avoid myelotoxicity in
radionuclide therapies. Current procedures used to calculate marrow
dose generally require knowledge of the patient's total skeletal active
marrow mass, a value which at present cannot be directly
measured. Theoretically, the active marrow mass can
be calculated given knowledge of the Total Skeletal Spongiosa Volume (TSSV).
Your goal in this study is to build a multiple regression model to predict TSSV
based on simple skeletal measurements obtainable from a CT scan or radiograph.
In a recent study, 17 different body measurements were recorded from whole-body
CT images of 40 cadavers (20 male and 20 female), whose ages ranged from
roughly 40 to 80 years, as representative of cancer patients potentially treated with
radionuclide therapy. For each cadaver, TSSV was calculated by manual
segmentation, a laborious process involving cadaver dissection. A description of the variables appearing in the dataset along with abbreviated names is as follows.
- Sex: Sex (male=1, female=0).
- Age: Age (years).
- HT: Total body height measured on the CT scout images (cm).
- OC.W: Maximum width of the os coxae in the coronal plane (cm).
- OC.H: Average of the maximum heights of the left and
right side of the os coxae in the coronal plane (cm).
- Bi.B: Distance between the outermost portions of the
greater trochanters in the coronal plane (cm).
- ASH: Distance from the anterior sacral promontory to the apex
of the sacrum in the sagittal plane (cm).
- S.W: Maximum width of the sacrum in the transverse plane (cm).
- L5.T: Thickness of the fifth lumbar (L5) vertebra in the sagittal plane (cm).
- S1.B: Longest diameter of the S1 sacral plate in the transverse plane (cm).
- P: Average of the maximum perimeter of the left and right femoral heads in the coronal plane (cm).
- FD: Average of the Feret's diameter for the right and left femoral heads measured in the coronal plane (cm).
- Max.H: Maximum height of the femoral head in the coronal plane, calculated as the average of the left and right femoral heads (cm).
- Max.W: Maximum width of the femoral head in the coronal plane, calculated as the average of the left and right femoral heads (cm).
- HH: Distance between the outermost portions of the right and left proximal humeral heads in the CT scout image (cm).
- FH: Maximum height of the femoral bones in the CT scout image, calculated as the average of the left and right femoral bones (cm).
- TSSV: Total skeletal spongiosa volume (cubic cm).
Note that the TSSV measurement for the cadaver labeled CAD-26 was not recorded (the period denotes a missing observation). Using these data, build a multiple regression model to predict TSSV from the remaining variables, and summarize your findings. In your quest for a suitable model, you should:
- Plot the data in meaningful ways. Examine the need for transforming variables.
- Carry out model selection using the techniques discussed in class
(Stepwise, Cp, Adj. R^2, etc.), and decide on one (or a few) candidate model(s).
- Check if there are any violations of the regression assumptions,
including potential problems like influential observations and
multicollinearity.
- Based on all the preceding results, decide on a single model for
predicting TSSV, and report your findings.
- Using your chosen model, predict the TSSV value of CAD-26, giving an appropriate interval for your prediction, and comment on it. Is this person unusual in any way?
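As a rough illustration of the model selection and multicollinearity criteria named in the list above, the following Python sketch computes adjusted R^2, Mallows' Cp, and a variance inflation factor from quantities that any regression fit reports. The function names and all numbers in the demo are hypothetical and are not results from the TSSV data; the only value taken from the assignment is n = 39, the number of cadavers with a recorded TSSV. It is a minimal sketch of the standard formulas, not a prescribed workflow, and you should use whatever software was covered in class.

```python
# Illustrative helpers for comparing candidate regression models.
# All numeric inputs in the demo are made up; they are NOT results
# from the TSSV dataset.


def adjusted_r2(rss, tss, n, p):
    """Adjusted R^2 for a model with p predictors plus an intercept.

    rss: residual sum of squares, tss: total sum of squares,
    n: number of observations used in the fit.
    """
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))


def mallows_cp(rss_sub, rss_full, n, p_sub, p_full):
    """Mallows' Cp for a submodel with p_sub predictors.

    The error variance is estimated by the mean squared error of the
    full model (p_full predictors). Both models include an intercept,
    so the parameter counts are p_sub + 1 and p_full + 1.
    """
    mse_full = rss_full / (n - (p_full + 1))
    return rss_sub / mse_full - n + 2 * (p_sub + 1)


def vif(r2_j):
    """Variance inflation factor for predictor j.

    r2_j is the R^2 obtained by regressing predictor j on all the
    other predictors in the model.
    """
    return 1.0 / (1.0 - r2_j)


if __name__ == "__main__":
    n = 39            # 40 cadavers minus CAD-26, whose TSSV is missing
    tss = 100.0       # hypothetical total sum of squares
    rss_full = 18.0   # hypothetical RSS, full model with 16 predictors
    rss_sub = 20.0    # hypothetical RSS, candidate model with 3 predictors

    print("Adjusted R^2 (candidate):", adjusted_r2(rss_sub, tss, n, 3))
    print("Adjusted R^2 (full):     ", adjusted_r2(rss_full, tss, n, 16))
    print("Mallows' Cp (candidate): ", mallows_cp(rss_sub, rss_full, n, 3, 16))
    print("VIF for r2_j = 0.9:      ", vif(0.9))
```

By construction, Cp for the full model equals its number of parameters, so a useful submodel is one whose Cp is small and close to p_sub + 1. A VIF above roughly 10 is a commonly used rule of thumb for flagging multicollinearity, though the threshold is a convention rather than a formal test.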
Instructions:
- Typeset your results as a report, using the same
format, template, and general instructions as for previous Assignments, except this time your report should approach the quality of a publishable paper. Therefore, no raw computer output should be included in the main write-up (although you may add an appendix). The report should be organized as follows:
- Title.
- Abstract (a summary of the paper).
- Introduction.
- Statistical Methods.
- Results and Conclusions.
- References (optional).
- Appendix (optional).
This article gives further guidance on the art of communicating statistical results. And here is an example of a paper in the Journal of Agronomy to give an idea of what a good finished product might look like.
- The length of your report must NOT exceed 5 sheets of
paper (10 pages)! This means you will have to make some hard decisions about
which plots to include; typically, only the most compelling ones belong in the main write-up. (You may, however, add an appendix, and this can be as long as you wish.)