Final exam review

Prof. Maria Tackett

Apr 21, 2026

Announcements

Project report and organized GitHub repo - due Wednesday, April 22 at 11:59pm
Course and TA evaluations - open until April 25 at 11:59pm
Final exam: April 30, 9am - 12pm

Project

Any questions about the project?

Course evaluations

Please share your feedback about the course!

Course evaluations and TA evaluations are open now until April 25 at 11:59pm.

If the response rate is at least 80% on both course evaluations and TA evaluations, everyone in the class will receive 0.5 points (out of 50) added to their final exam grade.
Check email for links to course and TA evaluations.

Final exam

In-class only: April 30, 9am - 12pm
Cumulative exam covering content from the full semester
Format similar to Exam 01 and Exam 02
Official university documentation or dean’s excuse required to be excused from the exam
Can bring one page (front and back) of handwritten notes written directly on paper
- Turn in with the exam
- Point deductions for notes that don’t follow rules

Topics

Linear regression

Fitting and interpreting models
Model evaluation
Different types of predictors
Inference
Matrix representation of regression
Model conditions and diagnostics
Multicollinearity
Variable transformations

Maximum likelihood estimation
Properties of estimators
Model selection
Ridge regression

Logistic regression

Fitting and interpreting models
Predicted probabilities and classes
ROC curve and AUC
Inference
Model selection

Data science ethics

Tips for studying

Rework derivations from assignments and lecture notes
Review in-class activities and assignments, asking “why” as you review your process and reasoning
Focus on understanding, not memorization
Explain concepts and processes to others
Ask questions in office hours
Review prepare readings
Review lecture recordings (available until start of final exam)

Resources

Prepare readings
Lecture notes
Lecture recordings available until start of exam
HW and lab assignments (all solutions on Canvas)
Exam 01 and Exam 02 practice problems

Practice

Logistic regression response variable

\(Y\) is a binary response variable.

What is the distribution of \(Y\)?
What is the variance of \(y_i\), given the logistic regression model?
Why doesn’t constant variance hold for logistic regression?
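
As a small numerical sketch for the last question, assume each \(y_i\) follows a Bernoulli distribution with success probability \(\pi_i\), so that \(Var(y_i) = \pi_i(1 - \pi_i)\). The function name `bernoulli_variance` and the probabilities below are illustrative only.

```python
def bernoulli_variance(pi):
    """Variance of a Bernoulli(pi) response: pi * (1 - pi)."""
    return pi * (1 - pi)


# The variance depends on pi_i, so observations with different
# probabilities of success have different variances.
for pi in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"pi = {pi:.1f}   Var(y) = {bernoulli_variance(pi):.2f}")
```

The variance is largest at \(\pi_i = 0.5\) and shrinks toward 0 as \(\pi_i\) approaches 0 or 1.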

Logistic regression coefficients

Let \(\pi_i\) be the probability an individual is diagnosed with lung cancer. We fit the model

\[ \log\Big(\frac{\hat{\pi}_i}{1-\hat{\pi}_i}\Big) = \hat{\beta}_0 + \hat{\beta}_1 \text{birdYes} \]

term	estimate
(Intercept)	-1.386
birdYes	1.356

Explain what -1.386 tells us.
Explain what 1.356 tells us.
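
One way to check an interpretation is to move the estimates from the log-odds scale to the odds and probability scales. The sketch below assumes `birdYes` is an indicator variable with non-birdkeepers as the baseline; the variable names are illustrative.

```python
import math

b0 = -1.386  # intercept: log-odds of lung cancer for the baseline group
b1 = 1.356   # birdYes: difference in log-odds, birdkeepers vs. baseline

odds_baseline = math.exp(b0)                         # odds for the baseline group
prob_baseline = odds_baseline / (1 + odds_baseline)  # odds -> probability
odds_ratio = math.exp(b1)                            # multiplicative change in odds

print(f"baseline odds        = {odds_baseline:.3f}")
print(f"baseline probability = {prob_baseline:.3f}")
print(f"odds ratio           = {odds_ratio:.3f}")
```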

Logistic regression odds ratios

Birdkeeping	Cancer	No Cancer
No	16	64
Yes	33	34
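
The odds and odds ratio can be computed directly from the counts in the table. This is a minimal sketch, assuming the table summarizes the same data as the model with birdkeeping as the only predictor; variable names are illustrative.

```python
import math

# Counts from the table: (cancer, no cancer)
no_bird = (16, 64)
yes_bird = (33, 34)

odds_no = no_bird[0] / no_bird[1]     # odds of cancer, non-birdkeepers
odds_yes = yes_bird[0] / yes_bird[1]  # odds of cancer, birdkeepers
odds_ratio = odds_yes / odds_no

print(f"odds (No)      = {odds_no:.3f}")
print(f"odds (Yes)     = {odds_yes:.3f}")
print(f"odds ratio     = {odds_ratio:.3f}")
print(f"log odds (No)  = {math.log(odds_no):.3f}")
print(f"log odds ratio = {math.log(odds_ratio):.3f}")
```

The two log values can be compared with the intercept and `birdYes` estimates from the fitted model.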

Drop-in-deviance test

term	df.resid	deviance	df	G	p.value
cancer ~ socioeconomic + age + bird	143	172	NA	NA	NA
cancer ~ socioeconomic + age + bird + yrsmoke + cigsday	141	155	2	17	0

What is the distribution of the test statistic \(G\)?
How do we know the p-value will be very small?
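
The test statistic and p-value can be reproduced from the rounded values in the table. The sketch below assumes that, under the null hypothesis, \(G\) approximately follows a chi-square distribution with degrees of freedom equal to the number of added coefficients. It uses the closed-form upper tail that holds only for 2 degrees of freedom, which avoids needing a statistics library.

```python
import math

# Rounded values from the table above
dev_reduced, df_resid_reduced = 172, 143
dev_full, df_resid_full = 155, 141

G = dev_reduced - dev_full             # drop in deviance
df = df_resid_reduced - df_resid_full  # number of added coefficients

# For a chi-square distribution with exactly 2 degrees of freedom,
# P(X > g) = exp(-g / 2). This shortcut is not valid for other df.
if df != 2:
    raise ValueError("closed-form tail probability requires df == 2")
p_value = math.exp(-G / 2)

print(f"G = {G}, df = {df}, p-value = {p_value:.6f}")
```

A chi-square distribution with 2 degrees of freedom has mean 2, so a statistic of 17 is far in the upper tail.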

Unbiased estimator

What does it mean for an estimator to be unbiased?
How do you show an estimator is unbiased?
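
As one worked illustration, consider the least-squares estimator \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}\) in the model \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\). The sketch assumes \(\mathbf{X}\) is fixed with full column rank and \(E[\boldsymbol{\epsilon}] = \mathbf{0}\).

\[ \begin{aligned} E[\hat{\boldsymbol{\beta}}] &= E\big[(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}\big] \\ &= (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}E[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}] \\ &= (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{X}\boldsymbol{\beta} + (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}E[\boldsymbol{\epsilon}] \\ &= \boldsymbol{\beta} \end{aligned} \]

The general pattern is to take the expected value of the estimator and show it equals the parameter being estimated.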

Linear regression

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]

What assumptions on \(\boldsymbol{\epsilon}\) are needed to derive the least-squares estimator?
What assumptions on \(\boldsymbol{\epsilon}\) are needed to conduct inference for \(\boldsymbol{\beta}\)?
What assumptions on \(\boldsymbol{\epsilon}\) are needed to derive the MLE for \(\boldsymbol{\beta}\)?

Sum of squares

\[ \text{Show }\hspace{1mm} SSR = \mathbf{y}^\mathsf{T}\mathbf{y} - \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} \]
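
One possible route, assuming \(SSR\) denotes the sum of squared residuals and \(\hat{\boldsymbol{\beta}}\) satisfies the normal equations \(\mathbf{X}^\mathsf{T}\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^\mathsf{T}\mathbf{y}\):

\[ \begin{aligned} SSR &= (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^\mathsf{T}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \\ &= \mathbf{y}^\mathsf{T}\mathbf{y} - 2\hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} + \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{X}\hat{\boldsymbol{\beta}} \\ &= \mathbf{y}^\mathsf{T}\mathbf{y} - 2\hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} + \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} \\ &= \mathbf{y}^\mathsf{T}\mathbf{y} - \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} \end{aligned} \]

The second line uses the fact that \(\mathbf{y}^\mathsf{T}\mathbf{X}\hat{\boldsymbol{\beta}}\) is a scalar and therefore equals its transpose \(\hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y}\). The third line substitutes the normal equations.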

Types of predictors

Why might we use centered predictors in a model?
Why might we use standardized predictors in a model?
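
To make the two transformations concrete, here is a minimal sketch using made-up predictor values (not course data). Centering subtracts the mean; standardizing also divides by the sample standard deviation.

```python
import statistics

x = [2.0, 4.0, 6.0, 8.0]  # made-up predictor values for illustration

mean_x = statistics.mean(x)
sd_x = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)

# Centered predictor: mean 0, original units
centered = [xi - mean_x for xi in x]

# Standardized predictor: mean 0, standard deviation 1, unitless
standardized = [(xi - mean_x) / sd_x for xi in x]

print(centered)
print([round(z, 3) for z in standardized])
```

With a centered predictor, the intercept describes the response at the mean of the predictor rather than at 0. With standardized predictors, each slope describes a one-standard-deviation change, which puts predictors measured in different units on a common scale.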