STAT 5371 -- Regression Analysis -- Spring 2021


Basic Information

Course instructor: Dr. Alex Trindade, 228 Mathematics & Statistics Building.
E-mail: alex.trindade"at"ttu.edu.
Course Meets: 11:00-12:20 TR, face-to-face in Math 017, and simultaneously on this Zoom link (Passcode: 833999).
Office Hours: TWR 2:00-3:00, via this Zoom link.

Required Book

  • Linear Models with R, by Julian Faraway, 2nd ed., 2014, CRC Press. ISBN-13: 978-1439887332. (I will abbreviate this book "LMR".)

    Useful Books

  • Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, by Julian Faraway, 2005, CRC Press.
  • Generalized Linear Models, by McCullagh and Nelder, 2nd ed., 1989, CRC Press.
  • An Introduction to Generalized Linear Models, by Dobson and Barnett, 3rd ed., 2008, CRC Press.

    Course Objectives and Syllabus

    The course will cover theory and methods for linear regression and generalized linear models (GLMs), with some coverage of nonlinear regression. A full treatment of the linear regression model is given, focusing on results from mathematical statistics that make use of matrix algebra. Computational methods and software will be used to analyze datasets, based on "canned routines" as well as a matrix language. Prerequisite: STAT 5329 (Math-Stat). List of topics: Note that Chapters 1-10 of LMR are essentially covered by the LM Course Notes, which we will follow for the first half of the course. The second half of the course will follow the GLM Course Notes. (See "Notes and Handouts" below.)

    Expected Student Learning Outcomes

    By the end of the course, students will be familiar with the theory (theorems and formulas) and practical aspects (ability to use a statistical package for implementation) of regression modeling. Given a dataset, they will be able to fit a suitable model by considering an appropriate subset of predictors using model selection techniques. This may include searching for appropriate transformations of the variables so as to linearize relationships, diagnosing collinearity, and assessing lack of fit and other potential problems with the model. They will be able to write down a matrix representation of the least squares solution that yields the parameter estimates and their standard errors under the normality assumption. They will be able to construct confidence intervals and test hypotheses about linear combinations of model parameters (contrasts), and be able to interpret and summarize their findings. They will be able to embed these methods and results in the framework of the generalized linear model, which allows extensions to a much larger class of linear models with a variety of response distributions and forms of the link function connecting the expected response to the linear predictor. Finally, they will become proficient in communicating these results in the form of a scientific paper.
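
    As a rough illustration of the matrix results referred to above, the standard formulas can be sketched as follows. The notation is an assumption made here for illustration and may differ from that of the course notes: y is the n x 1 response vector, X is the n x p design matrix (assumed to have full column rank), c is a fixed p x 1 vector defining the linear combination of interest, and g is the link function of a GLM.

    ```latex
    % Assumed notation (not taken from the syllabus): y is n x 1, X is n x p of full column rank.
    \begin{align*}
    y &= X\beta + \varepsilon, \qquad \varepsilon \sim N_n(0, \sigma^2 I_n) \\
    \hat\beta &= (X^\top X)^{-1} X^\top y, \qquad \operatorname{Var}(\hat\beta) = \sigma^2 (X^\top X)^{-1} \\
    \hat\sigma^2 &= \frac{(y - X\hat\beta)^\top (y - X\hat\beta)}{n - p} \\
    c^\top \hat\beta &\pm t_{n-p,\,1-\alpha/2}\; \hat\sigma \sqrt{c^\top (X^\top X)^{-1} c} \\
    g(\mu_i) &= x_i^\top \beta, \qquad \mu_i = E(Y_i)
    \end{align*}
    ```

    The fourth line is the usual 100(1 - alpha)% confidence interval for the linear combination c'beta under the normality assumption. The last line shows how a GLM generalizes the linear model: taking g to be the identity, with a normal response, recovers ordinary linear regression.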

    Methods of Assessing the Expected Learning Outcomes

    The expected learning outcomes for the course will be assessed through a mix of homework assignments (35%), a midterm test (25%), a data analysis project (10%), and a comprehensive final exam (30%). The traditional grading scale will be used: The test schedule is as follows:

    Homework Assignments

    There will be weekly Assignment Sets. All work is to be uploaded to Blackboard. No late submissions will be accepted.

    Data Analysis Project

    Search the web to find suitable datasets to analyze via linear regression or GLM. You may also have some ideas from your own research. However, the data must not already have been analyzed by someone else (as far as you know). Make sure there is a sufficiently large pool of predictors (at least 10), so that model selection will have to be used. Likewise, since we are focusing on classical methods, the sample size (n) must not be smaller than the number of predictors (p), and should in fact be much larger. Send me a 1-page proposal of what you intend to do, describing the data, its source, and your goal. Once I OK it, you can proceed. Project due date: Thursday April 15.

    Notes and Handouts

    R Demos from old course

    Software

    I will use R as the primary software tool. SAS is also recommended. Some assignments will require extensive use of a software package of your choice. While we will focus on the theory, the applied data modeling aspect is an important complement that greatly helps in understanding the methodology. For details on R see my statistical computing page, and especially the section on "Linear Models & GLMs".

    Policies

    Pandemic

