STAT 5371 -- Regression Analysis -- Spring 2023


Basic Information

Course instructor: Dr. Alex Trindade, 233 Mathematics & Statistics Building.
E-mail: alex.trindade"at"ttu.edu.
Course Meets: 11:00-12:20 TR, face-to-face in Math 115.
Office Hours: TWR 1:00-2:00, or by appointment.

Required Books

  • Linear Models with R, by Julian Faraway, 2nd ed., 2014, CRC Press. ISBN-13: 978-1439887332. (I will abbreviate this book "LMR".)
  • Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, by Julian Faraway, 2005, CRC Press.

    Useful Books

  • Generalized Linear Models, by McCullagh and Nelder, 2nd ed., 1989, CRC Press.
  • An Introduction to Generalized Linear Models, by Dobson and Barnett, 3rd ed., 2008, CRC Press.

    Course Objectives and Syllabus

    The course covers theory and methods for linear regression and generalized linear models (GLMs), with some coverage of nonlinear regression. A full treatment of the linear regression model is given, focusing on results from mathematical statistics that make use of matrix algebra. Computational methods and software will be used to analyze datasets, using both "canned routines" and a matrix language.

    Prerequisite: STAT 5329 (Math-Stat).

    List of topics: Note that Chapters 1-10 of LMR are essentially covered by the LM Course Notes, which we will follow for the first half of the course. The second half of the course will follow the GLM Course Notes. (See "Notes and Handouts" below.)

    Expected Student Learning Outcomes

    By the end of the course, students will be familiar with the theory (theorems and formulas) and practical aspects (ability to use a statistical package for implementation) of regression modeling. Given a dataset, they will be able to fit a suitable model by considering an appropriate subset of predictors using model selection techniques. This may include searching for appropriate transformations of the variables so as to linearize relationships, diagnosing collinearity, and assessing lack of fit and other potential problems with the model. They will be able to write down, in matrix form, the solution to the least squares criterion that yields the parameter estimates and their standard errors under the normality assumption. They will be able to construct confidence intervals and test hypotheses about linear combinations of model parameters (contrasts), and to interpret and summarize their findings. They will be able to embed these methods and results in the framework of the generalized linear model, which allows extensions to a much larger class of linear models with a variety of response distributions and forms of the link function connecting the expected response to the linear predictor. Finally, they will become proficient in communicating these results in the form of a scientific paper.
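
As a rough illustration of the matrix form of the least squares solution mentioned above, the sketch below computes beta_hat = (X'X)^{-1} X'y and the standard errors from sigma2_hat * (X'X)^{-1} for a model with an intercept and a single predictor. It is written in plain Python rather than R or SAS purely to stay self-contained; the function name `ols_simple` and the data are made up for illustration and are not taken from the course notes.

```python
# Minimal sketch: ordinary least squares via the normal equations,
# beta_hat = (X'X)^{-1} X'y, for the model y = b0 + b1*x + error.
# The function name and the data below are hypothetical.
import math


def ols_simple(x, y):
    """Return (beta_hat, std_errors, sigma2_hat) for y = b0 + b1*x + error."""
    n = len(x)
    # Entries of X'X for the design matrix X = [1, x]
    sx = sum(x)
    sxx = sum(v * v for v in x)
    # Entries of X'y
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    # Inverse of the 2x2 matrix X'X = [[n, sx], [sx, sxx]]
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    # beta_hat = (X'X)^{-1} X'y
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    # Unbiased estimate of the error variance: RSS / (n - p), with p = 2
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    # Standard errors: square roots of the diagonal of sigma2 * (X'X)^{-1}
    se = [math.sqrt(sigma2 * inv[0][0]), math.sqrt(sigma2 * inv[1][1])]
    return [b0, b1], se, sigma2


# Made-up data, roughly following y = 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta, se, sigma2 = ols_simple(x, y)
print("estimates:", beta)
print("standard errors:", se)
```

In practice a statistical package does this with a numerically stabler decomposition; explicitly inverting X'X is shown here only because it mirrors the textbook formula.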

    Methods of Assessing the Expected Learning Outcomes

    The expected learning outcomes for the course will be assessed through a mix of homework assignments (35%), a midterm test (25%), a data analysis project (10%), and a comprehensive final exam (30%). The traditional grading scale will be used.

    The test schedule is as follows:

    Homework Assignments

    There will be weekly Assignment Sets. All work is to be uploaded to Blackboard. No late submissions will be accepted.

    Data Analysis Project

    Search the web to find suitable datasets to analyze via linear regression or GLM. You may also have some ideas from your own research. However, the data must not already have been analyzed by someone else (as far as you know). Make sure there is a sufficiently large pool of predictors (at least 10), so that model selection will have to be used. Likewise, since we are focusing on classical methods, the sample size (n) must not be smaller than the number of predictors (p), and should in fact be much larger.

    Send me a 1-page proposal describing the data, its source, and your goal. Once I OK it, you can proceed. Project due date: Saturday, April 15. The project grade (out of 20) will be based on validity (10), quality (5), and clarity (5). Some ideas on finding suitable datasets.
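
The two numeric requirements above (at least 10 predictors, and n not smaller than p) can be checked mechanically before writing a proposal. The sketch below is only an illustration: the function name `check_project_dataset` and its messages are hypothetical, and it deliberately does not try to quantify "much larger", which remains a judgment call.

```python
# Sketch of the two numeric dataset requirements from the project description:
# at least 10 candidate predictors, and sample size n not smaller than p.
# The function name and messages are hypothetical, for illustration only.

def check_project_dataset(n, p, min_predictors=10):
    """Return a list of problems for a dataset with n rows and p predictors."""
    problems = []
    if p < min_predictors:
        problems.append(f"only {p} predictors; need at least {min_predictors}")
    if n < p:
        problems.append(f"sample size n={n} is smaller than p={p}")
    return problems


print(check_project_dataset(n=500, p=12))  # no problems reported
print(check_project_dataset(n=8, p=9))     # both requirements fail
```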

    Notes and Handouts

    R Demos from old course

    Software

    I will use R as the primary software tool. SAS is also recommended. Some assignments will require extensive use of a software package of your choice. While we will focus on the theory, the applied data modeling aspect is an important complement that greatly helps in understanding the methodology. For details on R see my statistical computing page, and especially the section on "Linear Models & GLMs".

    Policies

