Final exam review

Author

Prof. Maria Tackett

Published

Apr 21, 2026

Announcements

  • Project report and organized GitHub repo - due Wednesday, April 22 at 11:59pm

  • Course and TA evaluations - open until April 25 at 11:59pm

  • Final exam: April 30, 9am - 12pm

Project

Any questions about the project?

Course evaluations

Please share your feedback about the course!

Course evaluations and TA evaluations are open now until April 25 at 11:59pm.

  • If the response rate is at least 80% on both course evaluations and TA evaluations, everyone in the class will receive 0.5 points (out of 50) added to their final exam grade.

  • Check email for links to course and TA evaluations.

Final exam

  • In-class only: April 30, 9am - 12pm

  • Cumulative exam covering content from the full semester

  • Format similar to Exam 01 and Exam 02

  • Official university documentation or a dean’s excuse is required to be excused from the exam

  • Can bring one page (front and back) of handwritten notes written directly on paper

    • Turn in with the exam

    • Point deductions for notes that don’t follow the rules

Topics

Linear regression

  • Fitting and interpreting models
  • Model evaluation
  • Different types of predictors
  • Inference
  • Matrix representation of regression
  • Model conditions and diagnostics
  • Multicollinearity
  • Variable transformations
  • Maximum likelihood estimation
  • Properties of estimators
  • Model selection
  • Ridge regression

Logistic regression

  • Fitting and interpreting models
  • Predicted probabilities and classes
  • ROC curve and AUC
  • Inference
  • Model selection

Data science ethics

Tips for studying

  • Rework derivations from assignments and lecture notes

  • Review in-class activities and assignments, asking “why” as you review your process and reasoning

  • Focus on understanding, not memorization

  • Explain concepts and processes to others

  • Ask questions in office hours

  • Review prepare readings

  • Review lecture recordings (available until start of final exam)

Resources

  • Prepare readings

  • Lecture notes

  • Lecture recordings available until start of exam

  • HW and lab assignments (all solutions on Canvas)

  • Exam 01 and Exam 02 practice problems

Practice

Logistic regression response variable

\(Y\) is a binary response variable.

  • What is the distribution of \(Y\)?

  • What is the variance of \(y_i\), given the logistic regression model?

  • Why doesn’t constant variance hold for logistic regression?
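
One way to anchor these questions (a sketch, not an official answer key): the logistic regression model treats each response as a Bernoulli random variable,

\[ y_i \sim \text{Bernoulli}(\pi_i), \qquad E(y_i) = \pi_i, \qquad \text{Var}(y_i) = \pi_i(1 - \pi_i) \]

Because \(\pi_i\) depends on the predictors through the model, \(\text{Var}(y_i)\) changes from observation to observation, so constant variance cannot hold.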

Logistic regression coefficients

Let \(\pi_i\) be the probability an individual is diagnosed with lung cancer. We fit the model

\[ \log\Big(\frac{\hat{\pi}_i}{1-\hat{\pi}_i}\Big) = \hat{\beta}_0 + \hat{\beta}_1 \text{birdkeepingYes} \]

term          estimate
(Intercept)     -1.386
birdYes          1.356
  • Explain what -1.386 tells us.

  • Explain what 1.356 tells us.
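
To check these interpretations numerically, a minimal sketch in Python (coefficient values are from the table above; `exp()` converts log-odds to odds):

```python
import math

# Fitted coefficients from the table above
intercept = -1.386   # log-odds of lung cancer for non-birdkeepers
beta_bird = 1.356    # change in log-odds for birdkeepers vs. non-birdkeepers

# exp(intercept): odds of cancer for the baseline group (no birdkeeping)
baseline_odds = math.exp(intercept)

# exp(beta_bird): odds ratio comparing birdkeepers to non-birdkeepers
odds_ratio = math.exp(beta_bird)

print(round(baseline_odds, 2))  # ≈ 0.25
print(round(odds_ratio, 2))     # ≈ 3.88
```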

Logistic regression odds ratios

Birdkeeping   Cancer   No Cancer
No                16          64
Yes               33          34
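
The two-way table lets you compute the same quantities directly from the counts; a sketch of the check (counts from the table above):

```python
import math

# Counts from the two-way table
cancer_no_bird, no_cancer_no_bird = 16, 64
cancer_bird, no_cancer_bird = 33, 34

odds_no_bird = cancer_no_bird / no_cancer_no_bird   # 16/64 = 0.25
odds_bird = cancer_bird / no_cancer_bird            # 33/34 ≈ 0.971

# Odds ratio comparing birdkeepers to non-birdkeepers
odds_ratio = odds_bird / odds_no_bird               # ≈ 3.88

# The log odds ratio recovers the fitted slope from the model (≈ 1.356)
log_or = math.log(odds_ratio)
```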

Drop-in-deviance test

model                                                    df.resid  deviance  df   G  p.value
cancer ~ socioeconomic + age + bird                           143       172  NA  NA       NA
cancer ~ socioeconomic + age + bird + yrsmoke + cigsday       141       155   2  17        0
  • What is the distribution of the test statistic \(G\)?

  • How do we know the p-value will be very small?
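
A sketch of how the numbers in the table fit together: \(G\) and its degrees of freedom come from differencing the two rows, and for a chi-square distribution with 2 degrees of freedom the tail probability has the closed form \(e^{-G/2}\):

```python
import math

# Deviances and residual df from the table above
dev_reduced, dev_full = 172, 155
df_reduced, df_full = 143, 141

G = dev_reduced - dev_full   # drop in deviance: 17
df = df_reduced - df_full    # 2 (two added predictors)

# For a chi-square distribution with 2 df, P(X > g) = exp(-g/2)
p_value = math.exp(-G / 2)
print(p_value)  # ≈ 0.0002, reported as 0 in the table
```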

Unbiased estimator

  • What does it mean for an estimator to be unbiased?

  • How do you show an estimator is unbiased?
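
As a reminder of the definition (a sketch, using the least-squares estimator as the example): an estimator \(\hat{\theta}\) is unbiased for \(\theta\) if \(E(\hat{\theta}) = \theta\). For least squares, assuming \(E(\boldsymbol{\epsilon}) = \mathbf{0}\) so that \(E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}\),

\[ E(\hat{\boldsymbol{\beta}}) = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}E(\mathbf{y}) = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta} \]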

Linear regression

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]

  • What assumptions on \(\boldsymbol{\epsilon}\) are needed to derive the least-squares estimator?

  • What assumptions on \(\boldsymbol{\epsilon}\) are needed to conduct inference for \(\boldsymbol{\beta}\) ?

  • What assumptions on \(\boldsymbol{\epsilon}\) are needed to derive the MLE for \(\boldsymbol{\beta}\)?
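
A hedged summary to check answers against: deriving the least-squares estimator requires no distributional assumption at all (it is purely a minimization problem), while both classical inference for \(\boldsymbol{\beta}\) and the MLE derivation rely on

\[ \boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}) \]

i.e., independent, mean-zero, constant-variance, normally distributed errors; under normality, the MLE for \(\boldsymbol{\beta}\) coincides with the least-squares estimator.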

Sum of squares

\[ \text{Show }\hspace{1mm} SSR = \mathbf{y}^\mathsf{T}\mathbf{y} - \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} \]
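
One possible path (a sketch relying on the normal equations \(\mathbf{X}^\mathsf{T}\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^\mathsf{T}\mathbf{y}\), which make the last two terms combine):

\[ SSR = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^\mathsf{T}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{y}^\mathsf{T}\mathbf{y} - 2\hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} + \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{y}^\mathsf{T}\mathbf{y} - \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} \]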

Types of predictors

  • Why might we use centered predictors in a model?

  • Why might we use standardized predictors in a model?
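
A minimal sketch of the two transformations on hypothetical data (centering subtracts the mean, so the intercept becomes the predicted response at average predictor values; standardizing also divides by the standard deviation, so slopes are in standard-deviation units):

```python
import statistics

# Hypothetical predictor values
x = [12.0, 15.0, 9.0, 20.0, 14.0]

mean_x = statistics.mean(x)   # 14.0
sd_x = statistics.stdev(x)    # sample standard deviation

# Centered predictor: mean 0, original units preserved
x_centered = [xi - mean_x for xi in x]

# Standardized predictor: mean 0, standard deviation 1, unitless
x_std = [(xi - mean_x) / sd_x for xi in x]
```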

Questions?