
Question

Please answer question (b)

Calculus Perspective of Normal Equations

3. In the lecture, we discussed a geometric argument to get the least squares estimator. Based on the properties of orthogonality, we can obtain the normal equations below:

$$X^\top (Y - X\hat{\theta}) = 0.$$

We can rearrange the equation to solve for $\hat{\theta}$ when $X$ is full column rank:

$$\hat{\theta} = (X^\top X)^{-1} X^\top Y.$$

Here, we are using $X$ to denote the design matrix:

$$X = \begin{bmatrix}
1 & x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\
1 & x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n,1} & x_{n,2} & \cdots & x_{n,p}
\end{bmatrix}
= \begin{bmatrix} \mathbb{1} & \vec{x}_1 & \vec{x}_2 & \cdots & \vec{x}_p \end{bmatrix},$$

where $\mathbb{1}$ is the vector of all 1s of length $n$ and $\vec{x}_j$ is the $n$-vector $[x_{1,j}, \ldots, x_{n,j}]^\top$, i.e., the $j$th feature vector.

To build intuition for these equations and relate them to the SLR estimating equations, we will derive them algebraically using calculus.
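Before turning to the subparts, a quick numerical sanity check of the closed form may help (this is not part of the original problem; the synthetic data and variable names below are illustrative assumptions, using NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# Design matrix: a column of 1s followed by p feature columns.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
true_theta = np.array([2.0, -1.0, 0.5, 3.0])  # assumed ground truth
Y = X @ true_theta + rng.normal(scale=0.1, size=n)

# Closed-form solution from the normal equations.
# (Solving the linear system is preferred over forming the explicit inverse.)
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least squares routine.
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(theta_hat, theta_lstsq))  # True
```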
(a) Show that finding the optimal estimator $\hat{\theta}$ by solving the normal equations is equivalent to requiring that the residual vector $e = Y - X\hat{\theta}$ should average to zero, and that $e$ should be orthogonal to $\vec{x}_j$ for every $j$. That is, show that the matrix form of the normal equation can be written as:

$$\sum_{i=1}^{n} e_i = 0$$

and

$$\vec{x}_j^\top e = \sum_{i=1}^{n} x_{i,j}\, e_i = 0$$

for all $j = 1, \ldots, p$. (Hint: Expand the normal equation above and perform the matrix multiplication for the first few terms. Can you find a pattern?)
(b) Remember that the (empirical) MSE for multiple linear regression is

$$\text{MSE}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \theta_0 - \theta_1 x_{i,1} - \cdots - \theta_p x_{i,p} \right)^2.$$

Use calculus to show that any $\hat{\theta} = [\hat{\theta}_0, \hat{\theta}_1, \ldots, \hat{\theta}_p]$ that minimizes the MSE must solve the normal equations.

(Hint: Recall that, at a minimum of MSE, the partial derivatives of MSE with respect to every $\theta_j$ must all be zero. Find these partial derivatives and compare them to your answer in Q3a.)
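As a hedged illustration of the hint, the sketch below numerically checks that the standard OLS gradient of the MSE, $-\frac{2}{n} X^\top (Y - X\theta)$, matches a finite-difference approximation of the partial derivatives, and that it vanishes at the normal-equation solution. The data is synthetic and this is an independent check, not a claimed part of the original solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=n)

def mse(theta):
    return np.mean((Y - X @ theta) ** 2)

def grad_mse(theta):
    # Analytic gradient of the MSE: -(2/n) X^T (Y - X theta).
    return -(2.0 / n) * X.T @ (Y - X @ theta)

theta = rng.normal(size=p + 1)  # an arbitrary test point

# Central finite-difference approximation of each partial derivative.
h = 1e-6
fd = np.array([
    (mse(theta + h * np.eye(p + 1)[j]) - mse(theta - h * np.eye(p + 1)[j])) / (2 * h)
    for j in range(p + 1)
])
print(np.allclose(fd, grad_mse(theta), atol=1e-5))  # True

# At the normal-equation solution the gradient vanishes,
# which is exactly the claim in part (b).
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(grad_mse(theta_hat), 0.0))  # True
```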
Remark: Together, the two subparts above show that the geometric perspective is equivalent to the calculus approach of taking the derivative and setting it to zero for OLS. This is a desirable property of a linear model with L2 loss, and it generally does not hold for other models and loss types. We hope these exercises clear up some of the mystery around the geometric derivation!