AE 03: Inference
NCAA Football Expenditures
Go to the course GitHub organization and locate your ae-03 repo to get started.
Set up
Run the code below to load the required packages and data.
library(tidyverse)
library(tidymodels)
library(knitr)
football <- read_csv("data/ncaa-football-exp.csv")
Data: NCAA football expenditures
Today’s data set comes from Equity in Athletics Data Analysis and includes information about sports expenditures and revenues for colleges and universities in the United States. This data set was featured in a March 2022 Tidy Tuesday.
We will focus on football expenditures in the 2019-2020 season for institutions in NCAA Division I FBS. The following variables are used in this analysis:
- total_exp_m: Total expenditures on football in the 2019-2020 academic year (in millions USD)
- enrollment_th: Total student enrollment in the 2019-2020 academic year (in thousands)
- type: Institution type (Public or Private)
Regression model
exp_fit <- lm(total_exp_m ~ enrollment_th + type, data = football)
tidy(exp_fit) |>
  kable(digits = 3)

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 19.332 | 2.984 | 6.478 | 0 |
| enrollment_th | 0.780 | 0.110 | 7.074 | 0 |
| typePublic | -13.226 | 3.153 | -4.195 | 0 |
Part 1: Distribution of \(\hat{\boldsymbol{\beta}}\)
Our ultimate goal is to use statistical inference to draw conclusions about the relationship between student enrollment and spending on football programs, after accounting for the institution type.
Exercise 1
We will use the vector of responses \(\mathbf{y}\) and the design matrix \(\mathbf{X}\) to compute the values needed for inference.
Obtain \(\mathbf{y}\) and \(\mathbf{X}\) from the football data frame. What are their dimensions?
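As a minimal sketch of this step, the code below builds a response vector and a design matrix with base R. It assumes the built-in mtcars data as a stand-in for football (mpg in the role of total_exp_m, wt in the role of enrollment_th, and am in the role of type); the object name cars is hypothetical, and the same pattern should carry over to the football data frame.

```r
# Stand-in data: mtcars ships with base R; it is NOT the football data.
# mpg plays the role of total_exp_m, wt of enrollment_th, am of type.
cars <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))

# Response vector y, stored as an n x 1 matrix
y <- matrix(cars$mpg, ncol = 1)

# Design matrix X: a column of 1s for the intercept, the quantitative
# predictor, and one indicator column for the two-level factor
X <- model.matrix(mpg ~ wt + am, data = cars)

dim(y)  # 32 rows, 1 column
dim(X)  # 32 rows, 3 columns
```

model.matrix() adds the intercept column and the indicator coding automatically, so the number of columns of X is one more than the number of predictor terms.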
Exercise 2
Use \(\mathbf{y}\) and \(\mathbf{X}\) to compute the estimated coefficients \(\hat{\boldsymbol{\beta}}\).
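The least-squares estimate is \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}\). The sketch below evaluates it directly, again assuming mtcars as a stand-in for the football data (the names cars and fit are hypothetical), and checks the result against lm().

```r
# Stand-in data (NOT the football data): mpg ~ wt + am from mtcars
cars <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))
y <- matrix(cars$mpg, ncol = 1)
X <- model.matrix(mpg ~ wt + am, data = cars)

# beta_hat = (X^T X)^{-1} X^T y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat

# Sanity check against the coefficients reported by lm()
fit <- lm(mpg ~ wt + am, data = cars)
all.equal(as.numeric(beta_hat), unname(coef(fit)))
```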
Exercise 3
Compute the residuals using the matrix/vector notation. You do not need to print the residuals.
Use the residuals to compute \(\hat{\sigma}^2_{\epsilon}\).
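With residuals \(\mathbf{e} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}\), the usual estimate is \(\hat{\sigma}^2_{\epsilon} = \mathbf{e}^\mathsf{T}\mathbf{e} / (n - p - 1)\), where \(p\) is the number of predictor columns excluding the intercept. The following sketch assumes mtcars as a stand-in for the football data and compares the result with the residual standard error from lm().

```r
# Stand-in data (NOT the football data): mpg ~ wt + am from mtcars
cars <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))
y <- matrix(cars$mpg, ncol = 1)
X <- model.matrix(mpg ~ wt + am, data = cars)
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# Residuals: e = y - X beta_hat
e <- y - X %*% beta_hat

# n observations, p predictors (columns of X minus the intercept)
n <- nrow(X)
p <- ncol(X) - 1

# sigma2_hat = e^T e / (n - p - 1)
sigma2_hat <- as.numeric(t(e) %*% e) / (n - p - 1)

# Sanity check: the square root should match lm()'s residual standard error
fit <- lm(mpg ~ wt + am, data = cars)
all.equal(sqrt(sigma2_hat), summary(fit)$sigma)
```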
Exercise 4
Compute \(Var(\hat{\boldsymbol{\beta}})\). What are its dimensions?
Explain how the standard error for \(\hat{\beta}_j\), the estimated coefficient of enrollment_th, is computed from \(Var(\hat{\boldsymbol{\beta}})\).
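The estimated variance matrix is \(\hat{\sigma}^2_{\epsilon}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\), and the standard errors are the square roots of its diagonal entries. A minimal sketch, assuming mtcars as a stand-in for the football data, with wt in the role of enrollment_th:

```r
# Stand-in data (NOT the football data): mpg ~ wt + am from mtcars
cars <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))
y <- matrix(cars$mpg, ncol = 1)
X <- model.matrix(mpg ~ wt + am, data = cars)

XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y
e        <- y - X %*% beta_hat
sigma2_hat <- as.numeric(t(e) %*% e) / (nrow(X) - ncol(X))  # n - p - 1

# Var(beta_hat) = sigma2_hat * (X^T X)^{-1}, a (p + 1) x (p + 1) matrix
var_beta <- sigma2_hat * XtX_inv
dim(var_beta)

# Standard errors: square roots of the diagonal entries
se <- sqrt(diag(var_beta))
se["wt"]  # standard error for the coefficient of wt

# Sanity check against the variance matrix reported by lm()
fit <- lm(mpg ~ wt + am, data = cars)
all.equal(var_beta, vcov(fit))
```

The off-diagonal entries are the estimated covariances between pairs of coefficients; they are not needed for a single standard error.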
Part 2: Hypothesis test
We want to conduct a hypothesis test to determine if there is a linear relationship between enrollment and football expenditures, after accounting for institution type.
Exercise 5
State the null and alternative hypotheses in words and using mathematical notation.
Exercise 6
Compute the test statistic for this hypothesis test.
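For a test of \(\beta_j = 0\), the test statistic is the estimate divided by its standard error, \(t = \hat{\beta}_j / SE(\hat{\beta}_j)\). The sketch below assumes mtcars as a stand-in for the football data, with wt in the role of enrollment_th, and checks the value against summary().

```r
# Stand-in data (NOT the football data): mpg ~ wt + am from mtcars
cars <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))
y <- matrix(cars$mpg, ncol = 1)
X <- model.matrix(mpg ~ wt + am, data = cars)

XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y
e        <- y - X %*% beta_hat
sigma2_hat <- as.numeric(t(e) %*% e) / (nrow(X) - ncol(X))  # n - p - 1
se <- sqrt(diag(sigma2_hat * XtX_inv))

# Test statistic for H0: beta_wt = 0
t_stat <- beta_hat["wt", 1] / se["wt"]
t_stat

# Sanity check against the t value reported by summary()
fit <- lm(mpg ~ wt + am, data = cars)
all.equal(unname(t_stat), summary(fit)$coefficients["wt", "t value"])
```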
Exercise 7
Now we will compute a p-value to help make our final conclusion.
State the distribution used to calculate the p-value. Be specific.
Fill in the code below to calculate the p-value. Remove #| eval: false once you’ve filled in the code.
2 * pt(test-statistic, df, lower.tail = FALSE)
Exercise 8
State the conclusion in the context of the data. Use a threshold of \(\alpha = 0.05\).
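The p-value calculation and the decision rule can be sketched on stand-in data. The code assumes mtcars in place of the football data (wt in the role of enrollment_th), a \(t\) distribution with \(n - p - 1\) degrees of freedom, and the hypothetical names t_stat, df_resid, and alpha.

```r
# Stand-in data (NOT the football data): mpg ~ wt + am from mtcars
cars <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))
fit  <- lm(mpg ~ wt + am, data = cars)

# Test statistic and residual degrees of freedom (n - p - 1)
t_stat   <- summary(fit)$coefficients["wt", "t value"]
df_resid <- df.residual(fit)

# Two-sided p-value; abs() keeps the formula valid when t_stat is negative
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)
p_value

# Decision at the alpha = 0.05 threshold
alpha <- 0.05
p_value < alpha  # TRUE means reject the null hypothesis
```

Without abs(), a negative test statistic would produce a value greater than 1, which is not a valid p-value.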
Wrapping up
Once you’ve completed the AE:
- Render the document to produce the PDF with all of your work from today’s class.
- Push all your work to your AE repo on GitHub. You’re done! 🎉