STA 6166 Assignment 4

This assignment consists of two parts, each with its own dataset. Part I covers categorical data analysis. Part II is a simple linear regression analysis. Typeset your results as a report, using the same format, template, and general instructions as for previous assignments. Label each of the two parts clearly within each section of your report.

Part I: Survival Rates Aboard the Titanic

In 1912 the luxury liner Titanic, dubbed "unsinkable," struck an iceberg and sank on its maiden voyage across the North Atlantic. Some passengers got off the ship in lifeboats, but many died. Think of the Titanic disaster as an experiment on how the people of that time behaved when faced with death in a situation where only some could escape. The passengers can be regarded as a (hopefully random) sample from all socioeconomic strata of Western society. Here are the numbers on who lived and who died, cross-classified by sex and economic status. (The data are not fictitious, but a few passengers with unknown economic status have been omitted.)

                  Men                 Women
Status      Died   Survived     Died   Survived
Highest      111       61          6      126
Middle       150       22         13       90
Lowest       419       85        107      101
Total        680      168        126      317

Your task is to analyze these data using the methods of Unit 3, by providing specific answers to the following questions:

  1. Common lore has it that priority was given to women on the lifeboats. Compare the proportions of men and women who died. Is there statistical evidence that a higher proportion of men die in such situations?
  2. Disregarding gender, is there evidence of an association between survival and economic status? If so, indicate which cell(s) in the table contributed the most to this evidence.
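As a starting point for the computations behind both questions, here is a minimal sketch in plain Python. It assumes the pooled two-proportion z test and the Pearson chi-square test of independence are the intended Unit 3 methods, which this handout does not state explicitly. The function names `two_proportion_z` and `chi_square_independence` are illustrative, not part of any course template. The counts are copied from the table above.

```python
import math

# Counts from the table above: (died, survived) by sex and economic status.
men = {"Highest": (111, 61), "Middle": (150, 22), "Lowest": (419, 85)}
women = {"Highest": (6, 126), "Middle": (13, 90), "Lowest": (107, 101)}


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and one-sided p-value for H1: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal area
    return p1, p2, z, p_value


def chi_square_independence(table):
    """Pearson chi-square statistic, df, and per-cell contributions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = []
    for row, rt in zip(table, row_totals):
        cells = []
        for obs, ct in zip(row, col_totals):
            expected = rt * ct / n
            cells.append((obs - expected) ** 2 / expected)
        contributions.append(cells)
    stat = sum(sum(cells) for cells in contributions)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, contributions


# Question 1: proportion who died, men versus women.
men_died = sum(d for d, s in men.values())
men_n = sum(d + s for d, s in men.values())
women_died = sum(d for d, s in women.values())
women_n = sum(d + s for d, s in women.values())
p_men, p_women, z, p_value = two_proportion_z(men_died, men_n, women_died, women_n)

# Question 2: collapse over sex to a 3 x 2 table (status by died/survived).
statuses = ["Highest", "Middle", "Lowest"]
collapsed = [
    [men[k][0] + women[k][0], men[k][1] + women[k][1]] for k in statuses
]
stat, df, contributions = chi_square_independence(collapsed)

print(f"men: {p_men:.3f}, women: {p_women:.3f}, z = {z:.2f}, p = {p_value:.3g}")
print(f"chi-square = {stat:.2f} on {df} df")
for status, cells in zip(statuses, contributions):
    print(status, [round(c, 2) for c in cells])
```

The per-cell contributions printed at the end are the terms (observed - expected)^2 / expected, so the largest entries identify the cells that drive the chi-square statistic. Compare the statistic with a chi-square table at the reported degrees of freedom to obtain a p-value.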

Part II: Plant Species Diversity in the Galapagos Islands

The Galapagos Islands off the coast of Ecuador provide an excellent laboratory for studying factors that influence the development and survival of different species. These data give the number of plant species and related geographic variables for 30 different islands. Counts are given both for the total number of species and for the number of species that occur only on that specific island (the endemics). The variables from left to right are as follows.

  1. Island: island name
  2. S: number of species
  3. E: number of endemics
  4. A1: area (km^2)
  5. El: highest elevation (m)
  6. D1: distance from nearest island (km)
  7. D2: distance from Santa Cruz (km)
  8. A2: area of adjacent island (km^2)

On close inspection of the data, you will notice that the numbers of species and endemics on the island of Daphne Minor were not recorded (the periods denote missing observations). Build a simple linear regression model to predict the number of species (S) on Daphne Minor based on the single best geographic variable among A1, El, D1, D2, and A2, and state your reason for the choice of variable. Also give an appropriate 95% interval (confidence or prediction) for the prediction. (Ignore the endemics in this question.)
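The least-squares fit and the two kinds of interval can be sketched in plain Python as follows. This is an illustration under assumptions, not a prescribed solution: the function names `fit_simple_regression` and `interval_at` are hypothetical, the inputs `x` and `y` stand for whichever predictor and response columns you read from the data file (with the Daphne Minor row excluded from the fit), and `t_crit` is the t critical value with n - 2 degrees of freedom that you look up yourself.

```python
import math


def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x, with summary quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "n": n,
        "x_bar": x_bar,
        "sxx": sxx,
        "b0": b0,
        "b1": b1,
        "s": math.sqrt(sse / (n - 2)),  # residual standard error
        "r2": 1 - sse / syy,            # coefficient of determination
    }


def interval_at(fit, x_new, t_crit, individual=True):
    """Interval for the response at x_new.

    individual=True  -> prediction interval for one new observation
    individual=False -> confidence interval for the mean response
    """
    leverage = 1 / fit["n"] + (x_new - fit["x_bar"]) ** 2 / fit["sxx"]
    extra = 1.0 if individual else 0.0  # the "1 +" term for a new observation
    se = fit["s"] * math.sqrt(extra + leverage)
    y_hat = fit["b0"] + fit["b1"] * x_new
    return y_hat - t_crit * se, y_hat, y_hat + t_crit * se
```

Fitting each candidate predictor in turn and comparing the `r2` values is one common way to pick the single best variable, though residual plots deserve a look as well. The only difference between the two intervals is the extra 1 under the square root, which accounts for the variability of an individual new observation around the regression line.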