Distribution of coefficients
February 05, 2026
HW 02 due Thursday, February 12 at 11:59pm
Exam 01 practice problems + lecture recordings posted on menu of course website
Today’s data come from Equity in Athletics Data Analysis and include information about sports expenditures and revenues for colleges and universities in the United States. This data set was featured in a March 2022 Tidy Tuesday.
We will focus on 2019-2020 season expenditures on football for institutions in the NCAA Division I FBS. The following variables are used in this analysis:
total_exp_m: Total expenditures on football in the 2019-2020 academic year (in millions USD)
enrollment_th: Total student enrollment in the 2019-2020 academic year (in thousands)
type: Institution type (Public or Private)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 19.332 | 2.984 | 6.478 | 0 |
| enrollment_th | 0.780 | 0.110 | 7.074 | 0 |
| typePublic | -13.226 | 3.153 | -4.195 | 0 |
The slope 0.780 describes the exact linear relationship between football expenditures and enrollment for these 127 institutions in 2019-2020, holding institution type constant.
What if we want to say something about the relationship between these variables for all colleges and universities with football programs and across different years?
Statistical inference provides methods and tools so we can use the single observed sample to make valid statements (inferences) about the population it comes from
For our inferences to be valid, the sample should be representative (ideally random) of the population we’re interested in

Our objective is to infer properties of a population using data from an observational (or experimental) study
Pre data collection: Before collecting the data, the data are unknown and random. \(\hat{\boldsymbol{\beta}}\), which is a function of the data, is also unknown and random.
Post data collection: After collecting the data, \(\hat{\boldsymbol{\beta}}\) is fixed and known.
In all cases: The true population parameter, \(\boldsymbol{\beta}\) is fixed but unknown.
Question pre data collection: Is the probability distribution of \(\hat{\boldsymbol{\beta}}\) a meaningful representation of the population? (We will slowly answer this question across the next few classes)
\[\begin{aligned} \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \hspace{8mm} \boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2_{\epsilon}\mathbf{I}) \end{aligned} \]
such that the errors are independent and normally distributed.
\[ \mathbf{y}|\mathbf{X} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma_\epsilon^2\mathbf{I}) \]
Image source: Introduction to the Practice of Statistics (5th ed)
Last time we showed \(E(\mathbf{y}|\mathbf{X}) = \mathbf{X}\boldsymbol{\beta}\) and \(Var(\mathbf{y}|\mathbf{X}) = \sigma^2_{\epsilon}\mathbf{I}\)
Linear transformation of normal random variables
Suppose \(\mathbf{z}\) is a multivariate normal random variable. Then, \(\mathbf{Az} + \mathbf{b}\) is (multivariate) normal for \(\mathbf{A}\) a constant matrix and \(\mathbf{b}\) a constant vector.
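We won’t prove this fact here, but we can check its moment implications numerically. A minimal sketch with made-up numbers: if \(\mathbf{z} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})\), then draws of \(\mathbf{Az} + \mathbf{b}\) should have mean close to \(\mathbf{A}\boldsymbol{\mu} + \mathbf{b}\) and covariance close to \(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^\mathsf{T}\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean, covariance, and transformation (for illustration only)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
b = np.array([5.0, -1.0])

# Many draws of z, transformed to w = Az + b (one draw per row)
z = rng.multivariate_normal(mu, Sigma, size=200_000)
w = z @ A.T + b

print(w.mean(axis=0))           # close to A @ mu + b
print(np.cov(w, rowvar=False))  # close to A @ Sigma @ A.T
```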
Show that the distribution of \(\mathbf{y}|\mathbf{X}\) is normal.
\[ \mathbf{y}|\mathbf{X} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma_\epsilon^2\mathbf{I}) \]
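One way to sketch the argument, applying the linear transformation fact with \(\mathbf{A} = \mathbf{I}\) and \(\mathbf{b} = \mathbf{X}\boldsymbol{\beta}\) (a constant given \(\mathbf{X}\)):

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} = \mathbf{I}\boldsymbol{\epsilon} + \mathbf{X}\boldsymbol{\beta} \]

Since \(\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2_\epsilon\mathbf{I})\) is multivariate normal, \(\mathbf{y}|\mathbf{X}\) is multivariate normal, with \(E(\mathbf{y}|\mathbf{X}) = \mathbf{X}\boldsymbol{\beta} + E(\boldsymbol{\epsilon}) = \mathbf{X}\boldsymbol{\beta}\) and \(Var(\mathbf{y}|\mathbf{X}) = Var(\boldsymbol{\epsilon}) = \sigma^2_\epsilon\mathbf{I}\).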

Note
We will assume these hold for now and show how to check the assumptions after Exam 01.
Once we fit the model, we can use the residuals to estimate \(\sigma_{\epsilon}^2\)
The estimated value \(\hat{\sigma}^2_{\epsilon}\) is needed for inference on the coefficients
\[ \hat{\sigma}^2_\epsilon = \frac{SSR}{n - p - 1} = \frac{\mathbf{e}^\mathsf{T}\mathbf{e}}{n-p-1} \]
\[ \hat{\sigma}_\epsilon = \sqrt{\frac{SSR}{n - p - 1}} = \sqrt{\frac{\mathbf{e}^\mathsf{T}\mathbf{e}}{n - p - 1}} \]
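As a sketch of the computation (on simulated data, not the athletics data), \(\hat{\sigma}^2_\epsilon\) falls out of the residual vector directly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated illustration: n observations, p = 2 predictors plus an intercept
n, p = 127, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([19.0, 0.8, -13.0])  # made-up "true" coefficients
sigma = 5.0                          # true error SD
y = X @ beta + rng.normal(scale=sigma, size=n)

# Least-squares fit, then residuals e = y - X beta_hat
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

# sigma_hat^2 = e'e / (n - p - 1); its square root estimates sigma
sigma2_hat = (e @ e) / (n - p - 1)
sigma_hat = np.sqrt(sigma2_hat)
print(sigma_hat)  # close to the true sigma of 5.0
```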
We often want to conduct inference on individual model coefficients
Hypothesis test: Is there evidence of a linear relationship between the response and \(x_j\)? \((\beta_j \neq 0 ? )\)
Confidence interval: What is a plausible range of values \(\beta_j\) can take?
But first we need to understand the distribution of \(\hat{\beta}_j\)
A sampling distribution is the probability distribution of a statistic computed from many repeated random samples of size \(n\) drawn from a population.
The sampling distribution of \(\hat{\boldsymbol{\beta}}\) is the probability distribution of the estimated coefficients, formed by repeatedly taking samples of size \(n\) and fitting the regression model to compute \(\hat{\boldsymbol{\beta}}\)
\[ \hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, \sigma^2_\epsilon(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}) \]
The estimated coefficients \(\hat{\boldsymbol{\beta}}\) are normally distributed with
\[ E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta} \hspace{10mm} Var(\hat{\boldsymbol{\beta}}) = \sigma^2_{\epsilon}(\boldsymbol{X}^\mathsf{T}\boldsymbol{X})^{-1} \]
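A simulation sketch of this result, with a hypothetical design and coefficients: draw many response vectors from the model, refit each time, and compare the empirical mean and covariance of \(\hat{\boldsymbol{\beta}}\) to \(\boldsymbol{\beta}\) and \(\sigma^2_\epsilon(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\):

```python
import numpy as np

rng = np.random.default_rng(42)

# Fixed (hypothetical) design with an intercept, true beta, and error SD
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([2.0, -1.5])
sigma = 2.0

# Theoretical sampling covariance: sigma^2 (X'X)^{-1}
cov_theory = sigma**2 * np.linalg.inv(X.T @ X)

# Draw `reps` datasets y = X beta + eps and fit each by least squares;
# each row of beta_hats is one realization of beta_hat
reps = 20_000
Y = X @ beta + rng.normal(scale=sigma, size=(reps, n))
beta_hats = Y @ np.linalg.pinv(X).T

print(beta_hats.mean(axis=0))           # empirical mean, close to beta
print(np.cov(beta_hats, rowvar=False))  # empirical covariance, close to cov_theory
```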
Show
\(E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}\)
\(Var(\hat{\boldsymbol{\beta}}) = \sigma^2_{\epsilon}(\boldsymbol{X}^\mathsf{T}\boldsymbol{X})^{-1}\)
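A sketch of both derivations, using \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}\) together with \(E(\mathbf{y}|\mathbf{X}) = \mathbf{X}\boldsymbol{\beta}\) and \(Var(\mathbf{y}|\mathbf{X}) = \sigma^2_\epsilon\mathbf{I}\):

\[\begin{aligned} E(\hat{\boldsymbol{\beta}}) &= (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T} E(\mathbf{y}|\mathbf{X}) = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta} \\ Var(\hat{\boldsymbol{\beta}}) &= (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\, Var(\mathbf{y}|\mathbf{X})\, \mathbf{X}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1} = \sigma^2_\epsilon(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{X}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1} = \sigma^2_\epsilon(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1} \end{aligned}\]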
We will show that \(\hat{\boldsymbol{\beta}}\) is normally distributed in the homework.
\[ \hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, \sigma^2_\epsilon(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}) \]
Let \(\mathbf{C} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\). Then, for each coefficient \(\hat{\beta}_j\),
\(E(\hat{\beta}_j) = \beta_j\), the \(j^{th}\) element of \(\boldsymbol{\beta}\)
\(Var(\hat{\beta}_j) = \sigma^2_{\epsilon}C_{jj}\), where \(C_{jj}\) is the \(j^{th}\) diagonal element of \(\mathbf{C}\)
\(Cov(\hat{\beta}_i, \hat{\beta}_j) = \sigma^2_{\epsilon}C_{ij}\), where \(C_{ij}\) is the \((i, j)\) element of \(\mathbf{C}\)
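In code, these quantities come straight from the diagonal and off-diagonal entries of \(\sigma^2_\epsilon\mathbf{C}\). A sketch on a hypothetical design (the value of \(\hat{\sigma}^2_\epsilon\) here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical design: intercept, a continuous predictor, a binary indicator
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.binomial(1, 0.5, size=n)])
sigma2_hat = 4.0  # stand-in for e'e / (n - p - 1)

C = np.linalg.inv(X.T @ X)          # C = (X'X)^{-1}
var_beta = sigma2_hat * np.diag(C)  # Var(beta_hat_j) = sigma^2 C_jj
se_beta = np.sqrt(var_beta)         # standard errors, like the std.error column above
cov_12 = sigma2_hat * C[1, 2]       # Cov(beta_hat_1, beta_hat_2) = sigma^2 C_12
```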