PS2
University of Illinois, Chicago
Course 575, Mathematics
Feb 20, 2024
Q1 Linear Regression Basic
22 Points
Q1.1
5 Points
Linear Regression is a supervised machine learning model/algorithm for predicting a continuous output variable.
True
False
Q1.2
5 Points
Which of the following offsets does linear regression use for its least-squares line fitting? You may assume that the horizontal axis is the independent variable and the vertical axis is the dependent variable.
Vertical offset
Perpendicular offset
Both, depending on the situation
None of the above
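As an aside, a minimal numpy sketch (with made-up data) illustrating that ordinary least squares fits the line by minimizing squared vertical offsets, i.e., residuals measured along the dependent-variable axis:

```python
import numpy as np

# Made-up data: x is the independent variable, y the dependent one.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# np.polyfit with deg=1 performs ordinary least squares: it minimizes
# the sum of squared vertical offsets y - (a*x + b), not perpendicular ones.
a, b = np.polyfit(x, y, deg=1)
y_hat = a * x + b

residuals = y - y_hat  # vertical offsets, measured parallel to the y-axis
print("slope:", a, "intercept:", b)
print("sum of squared vertical offsets:", np.sum(residuals**2))
```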
Q1.3
7 Points
Which of the following evaluation metrics can properly be used to evaluate a model that predicts a continuous output variable? (Select all that apply; no partial credit for selecting fewer or more items.)
Absolute Error $|y - \hat{y}|$
Squared Error $(y - \hat{y})^2$
Cubic Error $(y - \hat{y})^3$
0-1 Error $\mathbf{1}\{y \neq \hat{y}\}$

Q1.4
5 Points
Given the pictures of vertical and perpendicular offsets, choose the correct description of the residual.
Measured by vertical offsets. Higher is better.
Measured by vertical offsets. Lower is better.
Measured by perpendicular offsets. Higher is better.
Measured by perpendicular offsets. Lower is better.
None of the above.
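As an aside, a small sketch (toy values) computing each of the error expressions listed in Q1.3 with numpy:

```python
import numpy as np

# Toy ground truth and predictions, for illustration only.
y = np.array([2.0, 3.5, 5.0])
y_hat = np.array([2.5, 3.0, 5.0])

abs_err = np.abs(y - y_hat)           # Absolute Error |y - y_hat|
sq_err = (y - y_hat) ** 2             # Squared Error (y - y_hat)^2
cube_err = (y - y_hat) ** 3           # Cubic Error (y - y_hat)^3 (sign-sensitive)
zero_one = (y != y_hat).astype(int)   # 0-1 Error: indicator of an exact mismatch

print(abs_err, sq_err, cube_err, zero_one)
```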
Q2 Multivariate Calculus Basic
30 Points
Q2.1
10 Points
For a real vector $x = (x_1, x_2, x_3, x_4) \in \mathbb{R}^4$, a multivariate function is defined as $h(x) = 3x_1 + 8.5x_2 - x_3 + 5x_4$. Evaluate the gradient $\nabla h(x)$ with respect to $x$. (Free Response)

$\nabla h(x) = (3, 8.5, -1, 5)$

Q2.2
5 Points
Is the result $\nabla h(x)$ from Q2.1 a scalar or a vector?

a scalar
a vector
None of the above

Q2.3
10 Points
Say $h(x_1, x_2) = (1 + 2x_1 + 3x_2^2)^2$. Find the tangent $\frac{\partial h}{\partial x_2}$ at the point $(-2, 2)$. (This is autograded; please fill in only the final result number!)

-216
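As a sanity check on Q2.1, a finite-difference sketch (arbitrary evaluation point) should reproduce the analytic gradient $(3, 8.5, -1, 5)$:

```python
import numpy as np

def h(x):
    # h(x) = 3*x1 + 8.5*x2 - x3 + 5*x4, as defined in Q2.1
    return 3*x[0] + 8.5*x[1] - x[2] + 5*x[3]

x0 = np.array([1.0, -2.0, 0.5, 3.0])  # arbitrary point; the gradient is constant
eps = 1e-6

# Central finite differences approximate each partial derivative.
grad = np.array([(h(x0 + eps*e) - h(x0 - eps*e)) / (2*eps) for e in np.eye(4)])
print(grad)  # ~ [ 3.   8.5 -1.   5. ]
```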
Q2.4
5 Points
Say the answer value evaluated for Q2.3 is $s$. What would be its interpretation?

A tiny change in $x_1$ by $\epsilon$ (with $x_2$ fixed) will change $h$ by $s$.
A tiny change in $x_2$ by $\epsilon$ (with $x_1$ fixed) will change $h$ by $s$.
A tiny change in $x_1$ by $\epsilon$ (with $x_2$ fixed) will change $h$ by $\epsilon s$.
A tiny change in $x_2$ by $\epsilon$ (with $x_1$ fixed) will change $h$ by $\epsilon s$.
None of the above
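The option wording comes from the first-order Taylor approximation $h(x_1, x_2 + \epsilon) \approx h(x_1, x_2) + \epsilon\,\frac{\partial h}{\partial x_2}$; a quick numeric sketch (arbitrary point and an assumed small $\epsilon$) makes the scaling concrete:

```python
def h(x1, x2):
    return (1 + 2*x1 + 3*x2**2) ** 2  # the function from Q2.3

x1, x2, eps = 0.5, 1.0, 1e-5          # arbitrary point and tiny step

# s estimates the partial derivative dh/dx2 by central differences.
s = (h(x1, x2 + eps) - h(x1, x2 - eps)) / (2 * eps)

# Perturbing x2 by eps (x1 fixed) changes h by approximately eps * s.
delta_h = h(x1, x2 + eps) - h(x1, x2)
print(delta_h, eps * s)  # the two values agree to first order
```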
Q3 Linear Regression
48 Points
In class, we derived linear regression and various learning algorithms based on gradient descent. In addition to the least-squares objective, we also learned its probabilistic perspective, where each observation is assumed to carry Gaussian noise (i.e., the noise of each example is an independent and identically distributed sample from a normal distribution). In this problem, you will work with the following regression model, which includes one feature $x_1$ that enters both linearly and quadratically, and another feature $x_2$ that enters linearly:

$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \epsilon, \quad \text{where } \epsilon \sim \mathcal{N}(0, \sigma^2).$$

Q3.1
5 Points
The above equation says that linear regression assumes $y$ is also a random variable, due to the uncertainty introduced by the noise term $\epsilon$. What distribution would the random output variable $y$ follow?
uniform distribution
poisson distribution
normal distribution
Bernoulli distribution
multinomial distribution
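As an aside, a small simulation sketch (hypothetical $\theta$, $\sigma$, and inputs) shows how repeated draws of $y$ at a fixed input scatter around the deterministic part of the model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters and one fixed input point, for illustration only.
theta = np.array([1.0, 2.0, -0.5, 0.3])   # theta_0 .. theta_3
sigma = 0.8
x1, x2 = 1.5, -1.0

mean = theta[0] + theta[1]*x1 + theta[2]*x2 + theta[3]*x1**2
# epsilon ~ N(0, sigma^2), so each draw is y = mean + epsilon.
y = mean + rng.normal(0.0, sigma, size=10_000)

print(mean, y.mean(), y.std())  # sample mean ~ mean, sample std ~ sigma
```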
Q3.2
5 Points
Which of the following equations corresponds to the mean of the distribution that $y$ follows (i.e., the expectation $\mathbb{E}[y \mid x_1, x_2]$)?

$\theta_0$
$\theta_0 + \theta_2 x_2$
$\theta_0 + \theta_1 x_1 + \theta_3 x_1^2$
$\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2$
$\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1$
$\theta_0 + \theta_1 x_1 + \theta_2 x_2^2 + \theta_3 x_1$

Q3.3
6 Points
You are provided with training observations $D = \{(x_1^{(i)}, x_2^{(i)}, y^{(i)}) \mid 1 \le i \le m\}$. Derive the conditional log-likelihood that will later be maximized to make $D$ most likely.
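For reference, a sketch of the standard Gaussian-likelihood computation, writing $\mu^{(i)} = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 (x_1^{(i)})^2$ for the mean of the $i$-th observation (the exact form expected by the grader may differ):

$$\ell(\theta) = \sum_{i=1}^{m} \log p\!\left(y^{(i)} \mid x_1^{(i)}, x_2^{(i)}; \theta\right) = -\frac{m}{2}\log\!\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{m}\left(y^{(i)} - \mu^{(i)}\right)^2$$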
Q3.4
6 Points
If you omit all the constants that do not involve our parameters $\theta$, what objective function $J(\theta_0, \theta_1, \theta_2, \theta_3)$ will you use to perform Maximum Likelihood Estimation?
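Since the problem statement mentions gradient-descent learning algorithms, here is a minimal sketch (synthetic data, hypothetical true parameters and learning rate) that fits the Q3 model by descending on the squared-error objective left after dropping the likelihood's constant terms:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training data drawn from the Q3 model (hypothetical parameters).
m = 200
x1 = rng.uniform(-2, 2, m)
x2 = rng.uniform(-2, 2, m)
true_theta = np.array([1.0, 2.0, -0.5, 0.3])
y = (true_theta[0] + true_theta[1]*x1 + true_theta[2]*x2
     + true_theta[3]*x1**2 + rng.normal(0, 0.1, m))

# Design matrix with columns [1, x1, x2, x1^2], matching theta_0 .. theta_3.
X = np.column_stack([np.ones(m), x1, x2, x1**2])

theta = np.zeros(4)
lr = 0.01  # hypothetical learning rate
for _ in range(5000):
    residual = X @ theta - y
    grad = 2 * X.T @ residual / m   # gradient of the squared error, scaled by 1/m
    theta -= lr * grad

print(theta)  # should approach [1.0, 2.0, -0.5, 0.3]
```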