# STA 6166 Assignment 5

Tortoise Species Diversity in the Galapagos Islands

The Galapagos Islands off the coast of Ecuador provide an excellent laboratory for studying factors that influence the development and survival of different species. These data give the number of species of tortoise and related geographic variables for 30 different islands. Counts are given both for the total number of species, and the number of species that occur only on that specific island (the endemics). The variables from left to right are:

1. island name
2. number of species
3. number of endemics
4. area (km^2)
5. highest elevation (m)
6. distance from nearest island (km)
7. distance from Santa Cruz (km)
8. area of adjacent island (km^2)

Using these data, answer the following two research questions.

#### Research Question 1 (10 points)

On close inspection of the data, you will notice that the number of endemics on the island of Daphne Minor was not recorded (the period denotes a missing observation). Noting the high correlation between number of species and number of endemics, and by using a suitable statistical procedure (such as simple linear regression), predict the number of endemics on Daphne Minor and give a 95% confidence interval for the prediction. Ignore the remaining variables (the geographic variables - last 5 columns of the data) in this question.
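As a rough illustration of the mechanics, the sketch below fits a simple linear regression by hand and computes an interval for a single new observation (often called a prediction interval). The function name `slr_prediction_interval`, the argument `t_crit`, and the five data points are hypothetical and are not taken from the Galapagos file. The critical value is passed in rather than computed, on the assumption that you look up the 0.975 quantile of the t distribution with n - 2 degrees of freedom in a table.

```python
import math


def slr_prediction_interval(x, y, x_new, t_crit):
    """Fit y = b0 + b1*x by least squares and return (fit, lower, upper)
    for one new observation at x_new.

    t_crit is the t quantile with n - 2 degrees of freedom
    (e.g. the 0.975 quantile for a 95% interval), supplied by the caller.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept

    # Residual standard error on n - 2 degrees of freedom
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x_new
    # The leading "1 +" accounts for the variability of a single new
    # observation; dropping it gives the interval for the mean response.
    se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    half_width = t_crit * se_pred
    return y_hat, y_hat - half_width, y_hat + half_width


# Made-up illustration data (NOT the assignment data): 5 points, so df = 3.
demo_species = [10, 20, 30, 40, 50]
demo_endemics = [3, 5, 8, 9, 13]
demo_fit, demo_lo, demo_hi = slr_prediction_interval(
    demo_species, demo_endemics, x_new=25, t_crit=3.182  # t(0.975, df=3)
)
print(demo_fit, demo_lo, demo_hi)
```

For the assignment itself, the rows with a recorded number of endemics would supply `x` and `y`, and the number of species on Daphne Minor would be `x_new`. Most statistical packages report this interval directly, so the hand computation is mainly a check on your understanding.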

#### Research Question 2 (20 points)

Use these data to build a multiple regression model to predict species diversity, as measured by number of species, with the five geographic variables as potential predictors. Ignore number of endemics in this question. Report the equation of your fitted model, and summarize your findings. In your quest for a suitable model, you should:
• Plot the data in meaningful ways.
• Examine the need for transforming variables.
• Carry out model selection using the techniques discussed in class (Cp, Adj. R^2, etc.), and decide on one (or a few) candidate model(s).
• Check if there are any violations of the regression assumptions, including potential problems like influential observations and multicollinearity.
• Based on all the preceding results, decide on a single model for predicting diversity, and report your findings.
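The model selection step can be sketched as an all-subsets search that reports adjusted R^2 and Mallows' Cp for each candidate model. This is a minimal illustration under stated assumptions, not the procedure required in class: the helper names (`solve`, `sse_of_fit`, `all_subsets`) and the small dataset are hypothetical, and Cp is computed as SSE_p / MSE_full - (n - 2p), where p counts the intercept. In practice a statistical package would do this for you.

```python
from itertools import combinations


def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def sse_of_fit(columns, y):
    """Residual sum of squares from regressing y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)
    return sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2
               for i in range(n))


def all_subsets(predictors, y):
    """Return (subset, adjusted R^2, Cp) for every non-empty predictor subset."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((v - y_bar) ** 2 for v in y)
    names = list(predictors)
    k = len(names)
    # MSE of the full model is the variance estimate used in Cp
    mse_full = sse_of_fit([predictors[m] for m in names], y) / (n - k - 1)
    results = []
    for size in range(1, k + 1):
        for subset in combinations(names, size):
            sse = sse_of_fit([predictors[m] for m in subset], y)
            p = size + 1  # parameters including the intercept
            adj_r2 = 1 - (sse / (n - p)) / (sst / (n - 1))
            cp = sse / mse_full - (n - 2 * p)
            results.append((subset, adj_r2, cp))
    return results


# Made-up illustration data (NOT the assignment data).
demo_predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 9, 2, 7, 4],
}
demo_y = [3.1, 4.2, 7.9, 8.1, 12.2, 11.8, 16.1, 15.7]
demo_results = all_subsets(demo_predictors, demo_y)
for subset, adj_r2, cp in sorted(demo_results, key=lambda r: r[2]):
    print(subset, round(adj_r2, 3), round(cp, 2))
```

Candidate models are typically those with Cp close to p and a high adjusted R^2. Note that the full model always has Cp equal to p by construction, so it cannot be judged on Cp alone. Any transformations you settle on should be applied to the columns before running a search like this.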

Instructions:

• You can analyze a dataset of your own instead of the above, but you must discuss this with me and receive my approval beforehand. Your alternate dataset must have a quantitative response and at least 5 potential predictor variables. Assess the plausibility of a regression analysis by making some scatterplots before approaching me.
• Typeset your results as a report, using the same format, template, and general instructions as for previous Assignments. Label each of the two questions clearly within each section of your report. This article gives further guidance on the art of communicating statistical results.
• The total length of your report must NOT exceed 7 sheets of paper (14 pages)! This means you will have to make some hard decisions as to which plots you should include; typically only extremely compelling ones.