# STA 6166 Assignment 5

**Tortoise Species Diversity in the Galapagos Islands**

The Galapagos Islands off the coast of Ecuador provide an excellent
laboratory for studying factors that influence the development and
survival of different species. These data give the number of
species of tortoise and related geographic variables for 30 different islands. Counts are
given both for the total number of species and for the number of species that
occur only on that specific island (the endemics). The variables, from
left to right are:

- island name
- number of species
- number of endemics
- area (km^2)
- highest elevation (m)
- distance from nearest island (km)
- distance from Santa Cruz (km)
- area of adjacent island (km^2)

Using these data, answer the following two research questions.

#### Research Question 1 (10 points)

On close inspection of the data, you will notice that the number of
endemics on the island of Daphne Minor was not recorded (the
period denotes a missing observation). Noting the high correlation
between *number of species* and *number of endemics*, use a suitable
statistical procedure (such as simple linear regression) to
**predict** the number of endemics on Daphne Minor and give a
**95% confidence interval** for the prediction. Ignore the
remaining variables (the five geographic variables in the last five columns of the data) in this question.
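As an illustration of the Research Question 1 calculation, here is a minimal stdlib-only Python sketch. It assumes `x` holds *number of species* and `y` holds *number of endemics* for the islands where endemics were recorded, so Daphne Minor is left out of the fit. `x0` is Daphne Minor's species count. The function names `t_quantile` and `slr_predict` are hypothetical helpers, not part of any library. Strictly speaking, an interval for one island's count is a prediction interval, while a confidence interval describes the mean response at `x0`. The sketch returns both so the difference is visible.

```python
import math


def t_quantile(p, df):
    """Quantile of Student's t distribution (hypothetical helper): bisection
    on a Simpson's-rule approximation of the CDF, so no SciPy is needed."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    def cdf(t):
        # P(T <= t) = 0.5 + integral of the density from 0 to t
        n = 2000  # even number of Simpson intervals
        h = t / n
        s = pdf(0.0) + pdf(t)
        for i in range(1, n):
            s += (4 if i % 2 else 2) * pdf(i * h)
        return 0.5 + s * h / 3

    lo, hi = -50.0, 50.0
    for _ in range(60):  # bisection: CDF is increasing in t
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def slr_predict(x, y, x0, level=0.95):
    """Simple linear regression of y on x, evaluated at x0 (hypothetical helper)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # least-squares slope
    b0 = ybar - b1 * xbar       # least-squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error, df = n - 2
    yhat = b0 + b1 * x0
    t = t_quantile(1 - (1 - level) / 2, n - 2)
    se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)     # mean response
    se_new = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)  # new observation
    return {
        "fit": yhat,
        "ci_mean": (yhat - t * se_mean, yhat + t * se_mean),
        "pi_new": (yhat - t * se_new, yhat + t * se_new),
    }
```

The extra `1 +` under the square root in `se_new` accounts for the scatter of a single new island around the regression line. That term is why the prediction interval is always wider than the confidence interval for the mean. Before relying on either interval, check the residual plot, since the species and endemics counts may call for a transformation.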

#### Research Question 2 (20 points)

Use these data to build a multiple regression model to predict species
diversity, as measured by *number of species*, with the
five geographic variables as potential predictors. Ignore
*number of endemics* in this question. Report the
**equation of your fitted model** and summarize your findings.
In your quest for a suitable model, you should:

- Plot the data in meaningful ways.
- Examine the need for transforming variables.
- Carry out model selection using the techniques discussed in class
(Mallows' Cp, adjusted R^2, etc.), and decide on one (or a few) candidate model(s).
- Check if there are any violations of the regression assumptions,
including potential problems like influential observations and
multicollinearity.
- Based on all the preceding results, decide on a single model for
predicting diversity, and report your findings.
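The model-selection step above can be done as an all-subsets search. The idea is to fit every subset of the geographic predictors, then compute Mallows' Cp and adjusted R^2 for each one. Below is a minimal stdlib-only Python sketch under a few assumptions:

- `rows` holds one list of predictor values per island. These may already be transformed, for example logged areas.
- `y` holds *number of species*.
- `names` labels the predictor columns.
- All helper names are hypothetical.

The sketch uses two formulas:

- Cp = SSE_p / MSE_full - (n - 2p)
- adjusted R^2 = 1 - [SSE_p / (n - p)] / [SST / (n - 1)]

Here p counts the intercept, and MSE_full comes from the model with all predictors.

```python
from itertools import combinations


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_subset(rows, y, cols):
    """Least-squares fit of y on an intercept plus the columns in cols.

    Returns (coefficients, SSE); coefficients[0] is the intercept.
    """
    X = [[1.0] + [row[j] for j in cols] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations X'X b = X'y
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    return beta, sse


def all_subsets(rows, y, names):
    """Return (predictor names, Cp, adjusted R^2) for every non-empty subset,
    sorted by Cp."""
    n = len(y)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    full = list(range(len(names)))
    _, sse_full = fit_subset(rows, y, full)
    mse_full = sse_full / (n - len(full) - 1)  # MSE of the full model
    results = []
    for size in range(1, len(names) + 1):
        for cols in combinations(full, size):
            _, sse = fit_subset(rows, y, cols)
            p = size + 1  # number of parameters, including the intercept
            cp = sse / mse_full - (n - 2 * p)
            adj_r2 = 1 - (sse / (n - p)) / (sst / (n - 1))
            results.append(([names[j] for j in cols], cp, adj_r2))
    return sorted(results, key=lambda res: res[1])
```

By construction, the full model always has Cp = p. Subsets with Cp close to p and a high adjusted R^2 are natural candidates for the diagnostic checks listed above. Solving the normal equations directly is adequate for five predictors. A QR-based solver is numerically safer when predictors are on very different scales.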

**Instructions:**

- You can analyze a dataset of your own instead of the above, but you must discuss this with me and receive my approval beforehand. Your alternate data set must have a quantitative response and at least 5 potential predictor variables. Check the plausibility of a regression analysis with some scatterplots before approaching me.
- Typeset your results as a report, using the same
format, template, and general instructions as for previous assignments. Label each of the two questions clearly within each section of
your report. This article gives further guidance on the art of communicating statistical results.
- The total length of your report
**must NOT exceed 7 sheets of paper (14 pages)!** This means you will have to make some hard decisions about
which plots to include; typically, include only the most compelling ones.