Apr 21, 2026
Project report and organized GitHub repo - due Wednesday, April 22 at 11:59pm
Course and TA evaluations - open until April 25 at 11:59pm
Final exam: April 30, 9am - 12pm
Any questions about the project?
Please share your feedback about the course!
Course evaluations and TA evaluations are open now until April 25 at 11:59pm.
If the response rate is at least 80% on both course evaluations and TA evaluations, everyone in the class will receive 0.5 points (out of 50) added to their final exam grade.
Check email for links to course and TA evaluations.
In-class only: April 30, 9am - 12pm
Cumulative exam covering content from the full semester
Format similar to Exam 01 and Exam 02
Official university documentation or dean’s excuse required to excuse the exam
Can bring one page (front and back) of handwritten notes written directly on paper
Turn in with the exam
Point deductions for notes that don’t follow rules
Linear regression
Logistic regression
Data science ethics
Rework derivations from assignments and lecture notes
Review in-class activities and assignments, asking “why” as you review your process and reasoning
Focus on understanding, not memorization
Explain concepts / process to others
Ask questions in office hours
Review prepare readings
Review lecture recordings (available until start of final exam)
Prepare readings
Lecture notes
Lecture recordings available until start of exam
HW and lab assignments (all solutions on Canvas)
Exam 01 and Exam 02 practice problems
\(Y\) is a binary response variable.
What is the distribution of \(Y\)?
What is the variance of \(y_i\), given the logistic regression model?
Why doesn’t constant variance hold for logistic regression?
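One way to organize these three questions, as a sketch rather than a full answer: assume each \(y_i\) is an independent Bernoulli outcome with success probability \(\pi_i\), where \(\pi_i\) depends on the predictors through the logistic model. The mean and variance are then

\[
\begin{aligned}
y_i &\sim \text{Bernoulli}(\pi_i) \\
E(y_i) &= \pi_i \\
Var(y_i) &= \pi_i(1 - \pi_i)
\end{aligned}
\]

Because the variance is a function of \(\pi_i\), it changes whenever the predictors change \(\pi_i\), so it cannot be constant across observations.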
Let \(\pi_i\) be the probability an individual is diagnosed with lung cancer. We fit the model
\[ \log\Big(\frac{\hat{\pi}_i}{1-\hat{\pi}_i}\Big) = \hat{\beta}_0 + \hat{\beta}_1 \text{birdYes} \]
| term | estimate |
|---|---|
| (Intercept) | -1.386 |
| birdYes | 1.356 |
Explain what -1.386 tells us.
Explain what 1.356 tells us.
| Birdkeeping | Cancer | No Cancer |
|---|---|---|
| No | 16 | 64 |
| Yes | 33 | 34 |
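The counts above can be used to check the interpretation of both estimates. The sketch below assumes the model has birdkeeping as its only predictor, in which case the fitted log-odds equal the sample log-odds in each group. The variable names are illustrative, not from the course materials.

```python
import math

# Counts from the two-way table: (cancer, no cancer) by birdkeeping status.
counts = {"No": (16, 64), "Yes": (33, 34)}


def odds(cancer, no_cancer):
    """Odds of a lung cancer diagnosis within one group."""
    return cancer / no_cancer


odds_no = odds(*counts["No"])    # 16 / 64
odds_yes = odds(*counts["Yes"])  # 33 / 34

# Intercept: log-odds of lung cancer for the baseline group (non-birdkeepers).
intercept = math.log(odds_no)

# Slope: log of the odds ratio comparing birdkeepers to non-birdkeepers.
slope = math.log(odds_yes / odds_no)

print(round(intercept, 3), round(slope, 3))
print("odds ratio:", round(math.exp(slope), 2))
```

Rounded to three decimals, the two printed values match the estimates -1.386 and 1.356 in the model output. Exponentiating the slope gives the odds ratio, about 3.88.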
| term | df.resid | deviance | df | G | p.value |
|---|---|---|---|---|---|
| cancer ~ socioeconomic + age + bird | 143 | 172 | NA | NA | NA |
| cancer ~ socioeconomic + age + bird + yrsmoke + cigsday | 141 | 155 | 2 | 17 | 0 |
What is the distribution of the test statistic \(G\)?
How do we know the p-value will be very small?
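The numbers in the table can be reproduced by hand. The sketch below assumes \(G\) is the drop in deviance between the two nested models and that it is compared to a chi-square distribution with degrees of freedom equal to the number of added terms. With 2 degrees of freedom the upper-tail probability has the closed form \(e^{-x/2}\), so no statistics library is needed. The deviances are the rounded values from the table, so the result is approximate.

```python
import math

# Rounded values from the drop-in-deviance table.
deviance_reduced = 172   # cancer ~ socioeconomic + age + bird
deviance_full = 155      # ... + yrsmoke + cigsday
df = 143 - 141           # difference in residual degrees of freedom

# Test statistic: drop in deviance from the reduced to the full model.
G = deviance_reduced - deviance_full


def chi2_upper_tail_df2(x):
    """P(X > x) for a chi-square random variable with exactly 2 df."""
    return math.exp(-x / 2)


p_value = chi2_upper_tail_df2(G)
print(G, df, p_value)
```

A chi-square distribution with 2 degrees of freedom has mean 2, so a statistic of 17 sits far in the upper tail. The computed p-value is about 0.0002, which is displayed as 0 in the table after rounding.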
What does it mean for an estimator to be unbiased?
How do you show an estimator is unbiased?
\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]
What assumptions on \(\boldsymbol{\epsilon}\) are needed to derive the least-squares estimator?
What assumptions on \(\boldsymbol{\epsilon}\) are needed to conduct inference for \(\boldsymbol{\beta}\) ?
What assumptions on \(\boldsymbol{\epsilon}\) are needed to derive the MLE for \(\boldsymbol{\beta}\)?
\[ \text{Show }\hspace{1mm} SSR = \mathbf{y}^\mathsf{T}\mathbf{y} - \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} \]
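A sketch of one route to this result, assuming \(SSR\) denotes the sum of squared residuals and \(\hat{\boldsymbol{\beta}}\) is the least-squares estimator, so that it satisfies the normal equations \(\mathbf{X}^\mathsf{T}\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^\mathsf{T}\mathbf{y}\):

\[
\begin{aligned}
SSR &= (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^\mathsf{T}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \\
&= \mathbf{y}^\mathsf{T}\mathbf{y} - 2\hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} + \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{X}\hat{\boldsymbol{\beta}} \\
&= \mathbf{y}^\mathsf{T}\mathbf{y} - 2\hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} + \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} \\
&= \mathbf{y}^\mathsf{T}\mathbf{y} - \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y}
\end{aligned}
\]

The second line uses the fact that \(\mathbf{y}^\mathsf{T}\mathbf{X}\hat{\boldsymbol{\beta}}\) is a scalar and therefore equals its transpose \(\hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y}\). The third line substitutes the normal equations into the last term.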
Why might we use centered predictors in a model?
Why might we use standardized predictors in a model?
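A small numerical illustration may help with both questions. The sketch below fits a simple linear regression three ways: with the raw predictor, with the centered predictor, and with the standardized predictor. The data are made up for illustration, and `simple_ols` is a hypothetical helper, not a function from the course.

```python
from statistics import mean, stdev


def simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data: the predictor never takes values near 0.
x = [50, 55, 60, 65, 70]
y = [10, 12, 15, 19, 24]

# Centered: subtract the mean. Standardized: also divide by the sample SD.
x_centered = [xi - mean(x) for xi in x]
x_standardized = [xc / stdev(x) for xc in x_centered]

raw_fit = simple_ols(x, y)
centered_fit = simple_ols(x_centered, y)
standardized_fit = simple_ols(x_standardized, y)

for label, (b0, b1) in [("raw", raw_fit),
                        ("centered", centered_fit),
                        ("standardized", standardized_fit)]:
    print(f"{label:>12}: intercept = {b0:.3f}, slope = {b1:.3f}")
```

With the raw predictor, the intercept describes the expected response at \(x = 0\), which is far outside the data. After centering, the slope is unchanged and the intercept becomes the expected response at the mean of \(x\). After standardizing, the slope is the expected change in the response per one standard deviation increase in \(x\), which puts predictors measured in different units on a comparable scale.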