STAT 5371 -- Regression Analysis -- Spring 2025
Basic Information
Course instructor: Dr. Alex Trindade, 233 Mathematics & Statistics Building.
E-mail: alex.trindade"at"ttu.edu.
Course Meets: 11:00-12:20 TR, face-to-face in Math 014.
Office Hours: TWR 1:00-2:00, or by appointment.
Required Books
Linear Models with R, by Julian Faraway, 2nd ed., 2014, CRC Press. ISBN-13: 978-1439887332. (I will abbreviate this book "LMR".)
Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, by Julian Faraway, 2nd ed., 2016, CRC Press. (I will abbreviate this book "ELMR".)
Useful Books
Generalized Linear Models, by McCullagh and Nelder, 2nd ed., 1989, CRC Press.
An Introduction to Generalized Linear Models, by Dobson and Barnett, 3rd ed., 2008, CRC Press.
Course Objectives and Syllabus
The course covers theory and methods for linear regression and generalized linear models (GLMs), with some coverage of nonlinear regression. A full treatment of the linear regression model is given, focusing on results from mathematical statistics that make use of matrix algebra. Computational methods and software will be used to analyze datasets, using "canned routines" as well as a matrix language. Prerequisite: STAT 5329 (Math-Stat). List of topics:
- Introduction (Ch 1 of LMR).
- Estimation (Ch 2 of LMR).
- Inference (Ch 3 of LMR).
- Prediction (Ch 4 of LMR).
- Explanation (Ch 5 of LMR).
- Diagnostics, problems with predictors and errors, transformations to correct (Chs 6-9 of LMR).
- Model selection (Ch 10 of LMR).
- Intro to nonlinear and nonparametric regression (LM Course Notes).
- Intro to Generalized Linear Models, esp. Logistic Regression (GLM Course Notes).
Note that Chs 1-10 of LMR are essentially covered by the LM Course Notes, which we will follow for the first half of the course. The 2nd half of the course will follow the GLM Course Notes. (See "Notes and Handouts" below.)
Expected Student Learning Outcomes
By the end of the course, students will be familiar with the theory (theorems and formulas) and practical aspects (ability to use a statistical package for implementation) of regression modeling. Given a dataset, they will be able to fit a suitable model by considering an appropriate subset of predictors using model selection techniques. This may include searching for appropriate transformations of the variables so as to linearize relationships, diagnosing collinearity, and assessing lack of fit and other potential problems with the model. They will be able to write down a matrix representation for the solution of the least squares criterion used to yield parameter estimates and their standard errors under the normality assumption. They will be able to construct confidence intervals and test hypotheses about linear combinations of model parameters (contrasts), and be able to interpret and summarize their findings. They will be able to embed these methods and results in the framework of the generalized linear model, which allows extensions to a much larger class of linear models with a variety of response distributions and forms of the link function connecting the expected response to the linear predictor. Finally, they will become proficient in communicating these results in the form of a scientific paper.
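For orientation, the matrix results referred to above can be summarized as follows. This is a sketch in standard textbook notation, which is an assumption here and not necessarily the notation of the course notes: X is the n x p design matrix (including the intercept column, assumed to have full column rank), c is a fixed vector of coefficients defining a linear combination, and g is the link function.

```latex
% Linear model with normal errors
y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n)

% Least squares estimator and its covariance matrix
\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad
\operatorname{Var}(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}

% Estimated error variance and coefficient standard errors
\hat{\sigma}^2 = \frac{(y - X\hat{\beta})^\top (y - X\hat{\beta})}{n - p}, \qquad
\operatorname{se}(\hat{\beta}_j) = \hat{\sigma} \sqrt{\left[(X^\top X)^{-1}\right]_{jj}}

% 100(1-\alpha)% confidence interval for a linear combination c^\top \beta
c^\top \hat{\beta} \;\pm\; t_{n-p,\,1-\alpha/2}\; \hat{\sigma} \sqrt{c^\top (X^\top X)^{-1} c}

% GLM: the link function g connects the expected response to the linear predictor
g(\mu_i) = x_i^\top \beta, \qquad \mu_i = E(y_i)

% Logistic regression uses the logit link
\log\frac{\mu_i}{1 - \mu_i} = x_i^\top \beta
```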
Methods of Assessing the Expected Learning Outcomes
The expected learning outcomes for the course will be assessed through a mix of homework assignments (35%), a midterm test (25%), a data analysis project (10%), and a comprehensive final exam (30%). The traditional grading scale will be used:
- A: 90-100%.
- B: 80-89%.
- C: 70-79%.
- D: 60-69%.
- F: 0-59%.
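As an illustration of how the weights and grading scale combine, here is a minimal Python sketch. The weights come from the paragraph above; the component names, the example scores, and the rule that letter cutoffs apply to the unrounded percentage are assumptions made for illustration, not stated course policy. Each component score is assumed to be expressed on a 0-100 scale (so a project score out of 20 would first be rescaled).

```python
# Component weights as listed in the syllabus; the key names are illustrative.
WEIGHTS = {"homework": 0.35, "midterm": 0.25, "project": 0.10, "final": 0.30}


def course_percentage(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a percentage to a letter grade.

    Assumption: cutoffs are applied to the unrounded percentage;
    the syllabus does not say how fractional percentages are handled.
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percentage >= cutoff:
            return letter
    return "F"


# Hypothetical scores for one student.
scores = {"homework": 92, "midterm": 78, "project": 85, "final": 81}
total = course_percentage(scores)
print(f"{total:.1f}% -> {letter_grade(total)}")
```

With these made-up scores the weighted total is 0.35(92) + 0.25(78) + 0.10(85) + 0.30(81) = 84.5, which falls in the B range.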
The test schedule is as follows:
- Midterm: Thursday March 13.
- Final Exam: TBA.
Homework Assignments
There will be weekly Assignment Sets. All work is to be uploaded to Blackboard. No late submissions will be accepted. Only a subset of the homework problems may be graded; if your submission omits the problem(s) chosen for grading, your grade will be zero. Start each problem on a new page.
- Set 0 (due Jan 17): Read Ch 1 of LMR and do Exercise 1.1. (Not graded.)
- The remaining Hwk Sets are on Blackboard. (Hwk 1 is due Fri Jan 24.)
Data Analysis Project
Search the web to find suitable datasets to analyze via linear regression or GLM. You may also have some ideas from your own research. However, the data must not already have been analyzed by someone else (as far as you know). Make sure there is a sufficiently large pool of predictors (at least 10), so that model selection will have to be used. Also, since we are focusing on classical methods, the sample size (n) must not be smaller than the number of predictors (p), and should in fact be much larger. Send me a 1-page proposal of what you intend to do, describing the data and its source and stating your goal. Once I OK it, you can proceed. Project due date: Saturday April 15. Project grade (out of 20) will be based on validity (10), quality (5), and clarity (5). Some ideas on finding suitable datasets.
Notes and Handouts
R Demos from old course
Software
I will use R as the primary software tool. SAS is also recommended. Some assignments will require extensive use of a software package of your choice. While we will focus on the theory, the applied data modeling aspect is an important complement that greatly helps in understanding the methodology. For details on R see my statistical computing page, and especially the section on "Linear Models & GLMs".
Policies
- Required Texas Tech Policies can be found here.
- Recommended Texas Tech Policies can be found here.
- Use of Generative AI Tools. The use of generative AI tools (such as ChatGPT) is not permitted in this course; therefore, any use of AI tools for work in this class may be considered a violation of Texas Tech's Academic Integrity policy and the Student Code of Conduct since the work is not your own. The use of unauthorized AI tools will result in referral to the Office of Student Conduct.
- Electronic Devices in Tests. In the spirit of keeping costs down, I will permit the usage of apps on smart devices (phones, tablets, laptops, etc.), but any kind of communication or accessing of the web via these devices is forbidden.
- Collaboration. My policies on this are as follows.
  - Homeworks: Discussion with peers regarding material/concepts covered in the course is permitted, and is encouraged since it usually leads to greater comprehension. However, each person must write up his/her own solution to a particular problem, and not simply have someone else do it for them.
  - Tests: Any form of collaboration on tests, including e-device communication or trying to see what the person next to you is writing, is strictly forbidden and will not be tolerated.