Final exam review
Announcements
Project report and organized GitHub repo - due Wednesday, April 22 at 11:59pm
Course and TA evaluations - open until April 25 at 11:59pm
Final exam: April 30, 9am - 12pm
Project
Any questions about the project?
Course evaluations
Please share your feedback about the course!
Course evaluations and TA evaluations are open now until April 25 at 11:59pm.
If the response rate is at least 80% on both course evaluations and TA evaluations, everyone in the class will receive 0.5 points (out of 50) added to their final exam grade.
Check email for links to course and TA evaluations.
Final exam
In-class only: April 30, 9am - 12pm
Cumulative exam covering content from the full semester
Format similar to Exam 01 and Exam 02
Official university documentation or a dean's excuse is required to be excused from the exam
Can bring one page (front and back) of handwritten notes written directly on paper
Turn in with the exam
Point deductions for notes that don't follow the rules
Topics
Linear regression
- Fitting and interpreting models
- Model evaluation
- Different types of predictors
- Inference
- Matrix representation of regression
- Model conditions and diagnostics
- Multicollinearity
- Variable transformations
- Maximum likelihood estimation
- Properties of estimators
- Model selection
- Ridge regression
Logistic regression
- Fitting and interpreting models
- Predicted probabilities and classes
- ROC curve and AUC
- Inference
- Model selection
Data science ethics
Tips for studying
Rework derivations from assignments and lecture notes
Review in-class activities and assignments, asking “why” as you review your process and reasoning
Focus on understanding, not memorization
Explain concepts / process to others
Ask questions in office hours
Review prepare readings
Review lecture recordings (available until start of final exam)
Resources
Prepare readings
Lecture notes
Lecture recordings available until start of exam
HW and lab assignments (all solutions on Canvas)
Exam 01 and Exam 02 practice problems
Practice
Logistic regression response variable
\(Y\) is a binary response variable.
What is the distribution of \(Y\)?
What is the variance of \(y_i\), given the logistic regression model?
Why doesn’t constant variance hold for logistic regression?
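A sketch of the standard results for a Bernoulli response, which these three questions build on:

\[
y_i \sim \text{Bernoulli}(\pi_i), \qquad E[y_i] = \pi_i, \qquad \text{Var}(y_i) = \pi_i(1 - \pi_i)
\]

Because \(\pi_i\) depends on the predictors through the logistic model, the variance \(\pi_i(1 - \pi_i)\) differs across observations, so constant variance cannot hold.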
Logistic regression coefficients
Let \(\pi_i\) be the probability an individual is diagnosed with lung cancer. We fit the model
\[ \log\Big(\frac{\hat{\pi}_i}{1-\hat{\pi}_i}\Big) = \hat{\beta}_0 + \hat{\beta}_1 \text{birdkeepingYes} \]
| term | estimate |
|---|---|
| (Intercept) | -1.386 |
| birdYes | 1.356 |
Explain what -1.386 tells us.
Explain what 1.356 tells us.
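A quick numeric check of the two interpretations (a Python sketch; the coefficient values are taken from the fitted model above):

```python
import math

b0 = -1.386  # intercept: log-odds of lung cancer for non-birdkeepers
b1 = 1.356   # slope: change in log-odds for birdkeepers vs. non-birdkeepers

odds_no_bird = math.exp(b0)  # odds of cancer for non-birdkeepers
odds_ratio = math.exp(b1)    # multiplicative change in odds for birdkeepers

print(round(odds_no_bird, 3))  # 0.25  -> odds of 1 to 4
print(round(odds_ratio, 3))    # 3.881 -> birdkeepers' odds are about 3.9x higher
```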
Logistic regression odds ratios
| Birdkeeping | Cancer | No Cancer |
|---|---|---|
| No | 16 | 64 |
| Yes | 33 | 34 |
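With a single binary predictor, the fitted logistic model reproduces the sample odds exactly, so the coefficients above can be recovered directly from this 2x2 table (a Python sketch):

```python
import math

# 2x2 table from the slide: rows are birdkeeping (No / Yes)
cancer_no, healthy_no = 16, 64    # non-birdkeepers
cancer_yes, healthy_yes = 33, 34  # birdkeepers

odds_no = cancer_no / healthy_no     # 16/64 = 0.25, matches exp(-1.386)
odds_yes = cancer_yes / healthy_yes  # 33/34, about 0.971

odds_ratio = odds_yes / odds_no      # about 3.88
log_or = math.log(odds_ratio)        # about 1.356, matches the fitted slope
```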
Drop-in-deviance test
| term | df.resid | deviance | df | G | p.value |
|---|---|---|---|---|---|
| cancer ~ socioeconomic + age + bird | 143 | 172 | NA | NA | NA |
| cancer ~ socioeconomic + age + bird + yrsmoke + cigsday | 141 | 155 | 2 | 17 | 0 |
What is the distribution of the test statistic \(G\)?
How do we know the p-value will be very small?
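For the df = 2 case in the table, the p-value can even be computed by hand: a chi-square distribution with 2 degrees of freedom is an Exponential distribution with mean 2, so its survival function is \(e^{-G/2}\) (a Python sketch of that special case):

```python
import math

G, df = 17, 2  # test statistic and df from the drop-in-deviance output above

# Chi-square with df = 2 is Exponential(mean = 2), so P(X > G) = exp(-G/2).
p_value = math.exp(-G / 2)
print(p_value)  # about 0.0002, far below 0.05, which rounds to 0 in the table
```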
Unbiased estimator
What does it mean for an estimator to be unbiased?
How do you show an estimator is unbiased?
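An estimator \(\hat{\theta}\) is unbiased for \(\theta\) if \(E[\hat{\theta}] = \theta\); to show it, take the expectation of the estimator and simplify. As a worked example, the least-squares estimator is unbiased under \(E[\boldsymbol{\epsilon}] = \mathbf{0}\):

\[
E[\hat{\boldsymbol{\beta}}] = E[(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}]
= (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{X}\boldsymbol{\beta} + (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}E[\boldsymbol{\epsilon}]
= \boldsymbol{\beta}
\]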
Linear regression
\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]
What assumptions on \(\boldsymbol{\epsilon}\) are needed to derive the least-squares estimator?
What assumptions on \(\boldsymbol{\epsilon}\) are needed to conduct inference for \(\boldsymbol{\beta}\)?
What assumptions on \(\boldsymbol{\epsilon}\) are needed to derive the MLE for \(\boldsymbol{\beta}\)?
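One way to organize these three questions (a sketch of the standard answers, worth re-deriving yourself):

- Deriving least squares: pure minimization of \(\|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2\), so no distributional assumptions on \(\boldsymbol{\epsilon}\) are needed (only that \(\mathbf{X}^\mathsf{T}\mathbf{X}\) is invertible)
- Inference for \(\boldsymbol{\beta}\): \(\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})\), i.e., independent, mean-zero, constant-variance, normal errors
- MLE: normality \(\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})\), which makes the log-likelihood a decreasing function of \(\|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2\), so the MLE coincides with least squares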
Sum of squares
\[ \text{Show }\hspace{1mm} SSR = \mathbf{y}^\mathsf{T}\mathbf{y} - \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} \]
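A sketch of the key steps: expand the quadratic, then apply the normal equations \(\mathbf{X}^\mathsf{T}\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^\mathsf{T}\mathbf{y}\) to the last term.

\[
SSR = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^\mathsf{T}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})
= \mathbf{y}^\mathsf{T}\mathbf{y} - 2\hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y} + \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{X}\hat{\boldsymbol{\beta}}
= \mathbf{y}^\mathsf{T}\mathbf{y} - \hat{\boldsymbol{\beta}}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y}
\]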
Types of predictors
Why might we use centered predictors in a model?
Why might we use standardized predictors in a model?
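A toy sketch of the two transformations (made-up values), showing the properties that drive the interpretation: centering makes the intercept the mean response at the average predictor value, and standardizing additionally puts predictors on a common scale so coefficients are comparable.

```python
import statistics

x = [2.0, 4.0, 6.0, 8.0]  # hypothetical predictor values

mean_x = statistics.mean(x)  # 5.0
sd_x = statistics.stdev(x)   # sample standard deviation

centered = [xi - mean_x for xi in x]                # mean 0, same scale
standardized = [(xi - mean_x) / sd_x for xi in x]   # mean 0, SD 1

print(centered)  # [-3.0, -1.0, 1.0, 3.0]
print(statistics.mean(standardized))  # 0.0
```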