AMATH 383 Fall 2023
Problem Set 5
Last Problem Set!!
Due: Monday 11/20 at 11:59pm
Note: Submit electronically to Gradescope. Please show all work.
1. (30 points) Hessians and linear compositions.
Recall that the Hessian of a function is a (generally symmetric) matrix of second partial derivatives:
$$\nabla^2 f(\beta) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial\beta_1\,\partial\beta_1} & \cdots & \dfrac{\partial^2 f}{\partial\beta_1\,\partial\beta_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial\beta_n\,\partial\beta_1} & \cdots & \dfrac{\partial^2 f}{\partial\beta_n\,\partial\beta_n} \end{pmatrix}$$
Recall our gradient chain rule for compositions: if $g(\beta) = f(X\beta - y)$, then
$$\nabla g(\beta) = X^T \nabla f(X\beta - y).$$
This rule also extends nicely to Hessian computation:
$$\nabla^2 g(\beta) = X^T \nabla^2 f(X\beta - y)\, X.$$
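For example, with $f(z) = \tfrac{1}{2}\|z\|_2^2$ (so $\nabla f(z) = z$ and $\nabla^2 f(z) = I$), the composition $g(\beta) = \tfrac{1}{2}\|X\beta - y\|_2^2$ has $\nabla g(\beta) = X^T(X\beta - y)$ and $\nabla^2 g(\beta) = X^T I X = X^T X$, which is exactly the pattern used in part (a) below.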
• (a) Compute the Hessian of $f(\beta) = \tfrac{1}{2}\|X\beta - y\|_2^2 + \tfrac{\lambda}{2}\|\beta\|^2$.

$$\nabla f(\beta) = X^T \nabla\!\left(\tfrac{1}{2}\|X\beta - y\|^2\right) + \lambda\beta = X^T(X\beta - y) + \lambda\beta = X^T X\beta - X^T y + \lambda\beta$$
$$\nabla^2 f(\beta) = X^T X + \lambda I$$
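As a quick numerical sanity check of this Hessian (a minimal NumPy sketch; the X, y, λ, and test point below are made-up data, not part of the assignment), compare $X^T X + \lambda I$ against central finite differences of the gradient derived above:

import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 20, 5, 0.3
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

grad = lambda beta: X.T @ (X @ beta - y) + lam * beta   # gradient derived above
H_closed = X.T @ X + lam * np.eye(n)                    # Hessian derived above

beta0, eps = rng.standard_normal(n), 1e-6
H_fd = np.column_stack([(grad(beta0 + eps * e) - grad(beta0 - eps * e)) / (2 * eps)
                        for e in np.eye(n)])
print(np.allclose(H_closed, H_fd, atol=1e-5))           # expect True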
• (b) Compute the Hessian of $f(\beta) = \sum_{i=1}^{m} \exp(x_i^T\beta) - y_i x_i^T\beta$.

$$\nabla f(\beta) = \sum_{i=1}^{m} x_i \exp(x_i^T\beta) - y_i x_i$$
$$\nabla^2 f(\beta) = \sum_{i=1}^{m} x_i \exp(x_i^T\beta)\, x_i^T$$
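In matrix form, if $X$ is the matrix whose $i$-th row is $x_i^T$, this is $\nabla^2 f(\beta) = X^T \operatorname{diag}(\exp(X\beta))\, X$, the expression used for the Poisson Hessian in the coding notebook later in this document. A minimal NumPy sketch (with made-up X and β) confirming the sum of outer products equals the matrix form:

import numpy as np

rng = np.random.default_rng(1)
m, n = 15, 4
X = 0.3 * rng.standard_normal((m, n))
beta = rng.standard_normal(n)

w = np.exp(X @ beta)                                          # exp(x_i^T beta) for each i
H_sum = sum(w[i] * np.outer(X[i], X[i]) for i in range(m))    # sum_i x_i exp(x_i^T beta) x_i^T
H_mat = X.T @ np.diag(w) @ X                                  # X^T diag(exp(X beta)) X
print(np.allclose(H_sum, H_mat))                              # expect True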
• (c) Compute the Hessian of $f(\beta) = \sum_{i=1}^{m} \log(1 + \exp(x_i^T\beta)) - y_i x_i^T\beta$.

Let $h(z) = \log(1 + \exp(z))$ where $z = x_i^T\beta$. Then
$$h'(z) = \frac{\exp(z)}{1 + \exp(z)}, \qquad h''(z) = \frac{\exp(z)}{(\exp(z) + 1)^2}.$$
Then
$$\nabla f(\beta) = \sum_{i=1}^{m} \frac{\exp(x_i^T\beta)}{1 + \exp(x_i^T\beta)}\, x_i - y_i x_i$$
$$\nabla^2 f(\beta) = \sum_{i=1}^{m} x_i \left(\frac{\exp(x_i^T\beta)}{\bigl(1 + \exp(x_i^T\beta)\bigr)^2}\right) x_i^T$$
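Writing $\sigma(z) = \exp(z)/(1+\exp(z))$ for the sigmoid, the weight above satisfies $\exp(z)/(1+\exp(z))^2 = \sigma(z)(1-\sigma(z))$, so in matrix form $\nabla^2 f(\beta) = X^T \operatorname{diag}\bigl(\sigma(X\beta)(1-\sigma(X\beta))\bigr) X$, which is positive semidefinite. A minimal NumPy sketch (made-up X and β) checking the identity:

import numpy as np

rng = np.random.default_rng(2)
m, n = 15, 4
X = 0.3 * rng.standard_normal((m, n))
beta = rng.standard_normal(n)

z = X @ beta
sig = 1.0 / (1.0 + np.exp(-z))                      # sigmoid sigma(z)
w = np.exp(z) / (1.0 + np.exp(z))**2                # weight from the derivation above
print(np.allclose(w, sig * (1.0 - sig)))            # expect True

H = X.T @ np.diag(w) @ X                            # logistic Hessian (no ridge term)
print(np.all(np.linalg.eigvalsh(H) >= -1e-12))      # PSD: eigenvalues >= 0 up to round-off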
• (d) Compute the Hessian of $f(\beta) = \sum_{i=1}^{m} \log\bigl(1 + (x_i^T\beta)^2/\nu\bigr)$.

Let $h(z) = \log(1 + z^2/\nu)$ where $z = x_i^T\beta$. Then
$$h'(z) = \frac{2z}{z^2 + \nu}, \qquad h''(z) = -\frac{2(z^2 - \nu)}{(z^2 + \nu)^2}.$$
Then
$$\nabla^2 f(\beta) = \sum_{i=1}^{m} -x_i \left(\frac{2\bigl((x_i^T\beta)^2 - \nu\bigr)}{\bigl((x_i^T\beta)^2 + \nu\bigr)^2}\right) x_i^T$$
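Note that $h''(z)$ is negative whenever $z^2 > \nu$, so unlike parts (a)-(c) this Hessian need not be positive semidefinite, and Newton's method on this objective generally needs a line search or a modified Hessian. A tiny check of the sign change (plain NumPy, with ν = 1 chosen just for illustration):

import numpy as np

nu = 1.0
h2 = lambda z: -2.0 * (z**2 - nu) / (z**2 + nu)**2   # h''(z) from the derivation above
print(h2(0.5) > 0, h2(2.0) < 0)                      # curvature flips sign at |z| = sqrt(nu)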
2. (20 points) Show that Newton’s method finds the solution to any regularized linear least squares problem in one step.

The minimum of $f(\beta)$ needs to be found. Consider the regularized linear least squares problem
$$f(\beta) = \|X\beta - y\|_2^2 + \frac{\lambda}{2}\|\beta\|^2$$
for given $X$, $y$, and $\lambda > 0$. Then
$$\nabla f(\beta) = 2X^T X\beta - 2X^T y + \lambda\beta, \qquad \nabla^2 f(\beta) = 2X^T X + \lambda I.$$
A single Newton step from any starting point $\beta_0$ gives
$$\beta_1 = \beta_0 - (2X^T X + \lambda I)^{-1}\bigl(2X^T X\beta_0 - 2X^T y + \lambda\beta_0\bigr)$$
$$\beta_1 = \beta_0 - (2X^T X + \lambda I)^{-1}\bigl((2X^T X + \lambda I)\beta_0 - 2X^T y\bigr)$$
$$\Rightarrow\ \beta_1 = \beta_0 - \beta_0 + (2X^T X + \lambda I)^{-1}\, 2X^T y = (2X^T X + \lambda I)^{-1}\, 2X^T y.$$
To find the true solution, $\nabla f(\beta)$ should be set to zero, so
$$2X^T X\beta - 2X^T y + \lambda\beta = 0 \;\Rightarrow\; (2X^T X + \lambda I)\beta = 2X^T y \;\Rightarrow\; \beta = (2X^T X + \lambda I)^{-1}\, 2X^T y = \beta_1,$$
so Newton's method reaches the minimizer in a single step, regardless of the starting point $\beta_0$.
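A small NumPy check of this one-step property (a sketch with made-up X, y, λ, and starting point, using the gradient and Hessian written above):

import numpy as np
from numpy.linalg import solve

rng = np.random.default_rng(3)
m, n, lam = 30, 6, 0.5
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

grad = lambda b: 2 * X.T @ (X @ b - y) + lam * b    # gradient of ||Xb - y||^2 + (lam/2)||b||^2
H = 2 * X.T @ X + lam * np.eye(n)                   # constant Hessian

beta0 = rng.standard_normal(n)                      # arbitrary starting point
beta1 = beta0 - solve(H, grad(beta0))               # one full Newton step
beta_star = solve(H, 2 * X.T @ y)                   # stationary point of f
print(np.allclose(beta1, beta_star))                # expect True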
3. (30 points) Recall that gradient descent to minimize $f(\beta)$ takes the form
$$\beta_{k+1} = \beta_k - \alpha_k \nabla f(\beta_k),$$
while Newton’s method takes the form
$$\beta_{k+1} = \beta_k - \alpha_k H_k^{-1} \nabla f(\beta_k),$$
where $H_k$ can be the Hessian $H_k = \nabla^2 f(\beta_k)$ or a suitable approximation, and $\alpha_k$ is a suitable step that guarantees descent of function values.

Using the starter code for your homework, finish implementing these two methods (look for the TODO items in the Python code). Make sure hw1_supp.py is in the same directory as your main Python file.
4.
(50 points) Implement binomial and Poisson regression methods
and run the cells in the coding notebook that produce the plots. Please
save your final notebook as a pdf that includes the plots and printouts
that are automatically generated by the notebook.
To make things a bit easier, you can go ahead and always use the line
search with the option
use_line_search=True
in the call, instead of
solving for an upper bound in case of logistic regression. I’ll discuss
that part in class.
AMATH 515 Homework 1
Due Date: 01/27/2021
Homework Instruction: Please fill in the gaps in this template where commented as TODO. When submitting, make sure your file is still named 515Hw1_Coding.ipynb -- that's what Gradescope will be looking for. There is no need to submit hw1_supp.py, it will already be on the server when you submit your work. You'll have 10 attempts to pass the tests.
# require numpy module
import numpy as np
from numpy.linalg import norm
from numpy.linalg import solve
import matplotlib.pyplot as plt

# import supplementary functions for Homework 1
import sys
sys.path.insert(0, './')
from hw1_supp import *

# You can change this seed if you want to test that your program works with different seeds,
# but please set it back to 123 when you submit your work.
seed = 123
Gradient Descent Solver
Recall the gradient descent algorithm we learned in class and complete the gradient descent
solver.
def optimizeWithGD(x0, func, grad, step_size, tol=1e-6, max_iter=1000, use_line_search=False):
    """
    Optimize with Gradient Descent

    input
    -----
    x0 : array_like
        Starting point for the solver.
    func : function
        Takes x and returns the function value.
    grad : function
        Takes x and returns the gradient of "func".
    step_size : float or None
        If it is a float number and `use_line_search=False`, it will be used as the step size.
        Otherwise, line search will be used.
    tol : float, optional
        Gradient tolerance for terminating the solver.
    max_iter : int, optional
        Maximum number of iterations for terminating the solver.
    use_line_search : bool, optional
        When it is True a line search will be used, otherwise `step_size` has to be provided.

    output
    ------
    x : array_like
        Final solution
    obj_his : array_like
        Objective function's values convergence history
    err_his : array_like
        Norm of gradient convergence history
    exit_flag : int
        0, norm of gradient below `tol`
        1, exceed maximum number of iteration
        2, line search fail
        3, other
    """
    # safeguard
    if not use_line_search and step_size is None:
        print('Please specify the step_size or use the line search.')
        return x0, np.array([]), np.array([]), 3
    #
    # initial step
    x = np.copy(x0)
    g = grad(x)
    #
    obj = func(x)
    err = norm(g)
    #
    obj_his = np.zeros(max_iter + 1)
    err_his = np.zeros(max_iter + 1)
    #
    obj_his[0] = obj
    err_his[0] = err
    #
    # start iterations
    iter_count = 0
    while err >= tol:
        if use_line_search:
            step_size = lineSearch(x, g, g, func)
            #
            # if line search fails, step_size will be None
            if step_size is None:
                print('Gradient descent line search fail.')
                return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 2
        #
        # gradient descent step
        #####
        # TODO: with given step_size, complete gradient descent step
        x = x - step_size * g
        #####
        #
        # update function and gradient
        g = grad(x)
        #
        obj = func(x)
        err = norm(g)
        #
        iter_count += 1
        obj_his[iter_count] = obj
        err_his[iter_count] = err
        #
        # check if exceed maximum number of iterations
        if iter_count >= max_iter:
            print('Gradient descent reach maximum number of iteration.')
            return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 1
    #
    return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 0
Newton's Solver
Recall Newton's method we learned in class and complete the Newton solver.
def optimizeWithNT(x0, func, grad, hess, tol=1e-6, max_iter=100):
    """
    Optimize with Newton's Method

    input
    -----
    x0 : array_like
        Starting point for the solver.
    func : function
        Takes x and returns the function value.
    grad : function
        Takes x and returns the gradient.
    hess : function
        Takes x and returns the Hessian matrix.
    tol : float, optional
        Gradient tolerance for terminating the solver.
    max_iter : int, optional
        Maximum number of iterations for terminating the solver.

    output
    ------
    x : array_like
        Final solution
    obj_his : array_like
        Objective function value convergence history
    err_his : array_like
        Norm of gradient convergence history
    exit_flag : int
        0, norm of gradient below `tol`
        1, exceed maximum number of iteration
        2, others
    """
    # initial step
    x = np.copy(x0)
    g = grad(x)
    H = hess(x)
    #
    obj = func(x)
    err = norm(g)
    #
    obj_his = np.zeros(max_iter + 1)
    err_his = np.zeros(max_iter + 1)
    #
    obj_his[0] = obj
    err_his[0] = err
    #
    # start iterations
    iter_count = 0
    while err >= tol:
        # Newton step
        #####
        # TODO: complete Newton step
        # x = ? hint: using solve function
        x = x - solve(H, g)
        #####
        #
        # update function, gradient and Hessian
        g = grad(x)
        H = hess(x)
        #
        obj = func(x)
        err = norm(g)
        #
        iter_count += 1
        obj_his[iter_count] = obj
        err_his[iter_count] = err
        #
        # check if we exceeded maximum number of iterations
        if iter_count >= max_iter:
            print("Newton's method reached the maximum number of iterations.")
            return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 1
    #
    return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 0
Simple Test
Consider a simple test problem
$$f(x) = \frac{1}{2}\|x - b\|_2^2.$$
We can easily calculate the gradient and Hessian of $f$:
$$\nabla f(x) = x - b, \qquad \nabla^2 f(x) = I,$$
where $I$ is an identity matrix, and we know that $x = b$ is the solution.
We will use this function as a simple test to check whether gradient descent and Newton's method are implemented right.
# create b
b = np.array([1.0, 2.0, 3.0])

# define test function
def test_func(x):
    return 0.5 * sum((x - b)**2)

# define test gradient
def test_grad(x):
    return x - b

# define test Hessian
def test_hess(x):
    return np.eye(b.size)
# test gradient descent
x0_gd = np.zeros(b.size)
#
# NOTE: the trailing arguments of this call were cut off in the source printout;
# test_grad and a step size of 1.0 are assumed here (any step in (0, 2) works for this problem).
x_gd, obj_his_gd, err_his_gd, exit_flag_gd = optimizeWithGD(x0_gd, test_func, test_grad, 1.0)

# check if the solution is correct
err_gd = norm(x_gd - b)
#
if err_gd < 1e-6:
    print('Gradient Descent: OK')
else:
    print('Gradient Descent: Err')
Gradient Descent: OK
# test Newton's method
x0_nt = np.zeros(b.size)
#
# NOTE: the trailing arguments were cut off in the source printout; test_grad and test_hess
# are the required gradient and Hessian arguments of optimizeWithNT.
x_nt, obj_his_nt, err_his_nt, exit_flag_nt = optimizeWithNT(x0_nt, test_func, test_grad, test_hess)

# check if the solution is correct
err_nt = norm(x_nt - b)
#
if err_nt < 1e-6:
    print('Newton: OK')
else:
    print('Newton: Err')
Newton: OK
Logistic Regression
Recall the logistic regression model
$$f(x) = \sum_{i=1}^{m} \Bigl[\log\bigl(1 + \exp(a_i^T x)\bigr) - b_i\, a_i^T x\Bigr] + \frac{\lambda}{2}\|x\|_2^2,$$
where $a_i^T$ are the rows of the data matrix and $b_i$ the labels.
Calculate the gradient and Hessian of this function and define them below using the given data.
# fix a random seed
np.random.seed(seed)

# set dimensions and create some random data
m_lgt = 500
n_lgt = 50
A_lgt = 0.3 * np.random.randn(m_lgt, n_lgt)
x_lgt = np.random.randn(n_lgt)
b_lgt = sampleLGT(x_lgt, A_lgt)
lam_lgt = 0.1
Define functions
# implement logistic function, gradient and Hessian
def lgt_func(x):
    #####
    # TODO: complete the function
    #####
    a = np.sum(np.log(1 + np.exp(A_lgt@x))) - b_lgt@A_lgt@x
    return a + lam_lgt * x@x / 2
#
def lgt_grad(x):
    #####
    # TODO: complete the gradient
    #####
    return A_lgt.T @ (np.exp(A_lgt@x) / (1 + np.exp(A_lgt@x))) - A_lgt.T@b_lgt + lam_lgt * x
#
def lgt_hess(x):
    #####
    # TODO: complete the Hessian
    #####
    a = np.diag(np.exp(A_lgt@x) / (1 + np.exp(A_lgt@x))**2)
    return A_lgt.T @ a @ A_lgt + lam_lgt * np.eye(n_lgt)

#==GRADED==#
# No need to change anything in this cell.
x_test = 3 * np.ones(n_lgt)
# We test the values of these functions at a sample point (3, 3, ..., 3)
lgt_func_test = lgt_func(x_test)
lgt_grad_test = lgt_grad(x_test)
lgt_hess_test = lgt_hess(x_test)

# uniform step size for gradient descent or should we use line search?
#####
# TODO: if there is a uniform step size, set step_size_lgt = ...; otherwise set use_line_search=True
#####
step_size_lgt = 1 / 100
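Before running the solvers, an optional (ungraded) finite-difference check of lgt_grad and lgt_hess can catch sign or scaling mistakes early; this sketch assumes the cells above defining A_lgt, b_lgt, lam_lgt, and n_lgt have been run:

# Optional self-check: compare lgt_grad to finite differences of lgt_func,
# and lgt_hess to finite differences of lgt_grad, at a random point.
x_chk = np.random.randn(n_lgt)
eps = 1e-6
g_fd = np.array([(lgt_func(x_chk + eps * e) - lgt_func(x_chk - eps * e)) / (2 * eps)
                 for e in np.eye(n_lgt)])
H_fd = np.column_stack([(lgt_grad(x_chk + eps * e) - lgt_grad(x_chk - eps * e)) / (2 * eps)
                        for e in np.eye(n_lgt)])
print('grad check:', np.allclose(g_fd, lgt_grad(x_chk), atol=1e-4))
print('hess check:', np.allclose(H_fd, lgt_hess(x_chk), atol=1e-4))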
Apply gradient descent solver
x0_lgt_gd = np.zeros(n_lgt)

#==GRADED==#
# No need to change anything in this cell.
x_lgt_gd, obj_his_lgt_gd, err_his_lgt_gd, exit_flag_lgt_gd = \
    optimizeWithGD(x0_lgt_gd, lgt_func, lgt_grad, step_size_lgt)

# plot result
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(obj_his_lgt_gd)
ax[0].set_title('function value')
ax[1].semilogy(err_his_lgt_gd)
ax[1].set_title('norm of the gradient')
fig.suptitle('Gradient Descent on Logistic Regression')
Text(0.5, 0.98, 'Gradient Descent on Logistic Regression')
Apply Newton's solver
x0_lgt_nt = np.zeros(n_lgt)

#==GRADED==#
# No need to change anything in this cell.
x_lgt_nt, obj_his_lgt_nt, err_his_lgt_nt, exit_flag_lgt_nt = \
    optimizeWithNT(x0_lgt_nt, lgt_func, lgt_grad, lgt_hess)

# plot result
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(obj_his_lgt_nt)
ax[0].set_title('function value')
ax[1].semilogy(err_his_lgt_nt)
ax[1].set_title('norm of the gradient')
fig.suptitle('Newton\'s Method on Logistic Regression')
Text(0.5, 0.98, "Newton's Method on Logistic Regression")
Poisson Regression
Recall the Poisson regression model
$$f(x) = \sum_{i=1}^{m} \Bigl[\exp(a_i^T x) - b_i\, a_i^T x\Bigr] + \frac{\lambda}{2}\|x\|_2^2,$$
where $a_i^T$ are the rows of the data matrix and $b_i$ the counts.
Calculate the gradient and Hessian of this function and define them below using the given data.
# fix a random seed
np.random.seed(seed)

# set the dimensions and create some data
m_psn = 500
n_psn = 50
A_psn = 0.3 * np.random.randn(m_psn, n_psn)
x_psn = np.random.randn(n_psn)
b_psn = samplePSN(x_psn, A_psn)
lam_psn = 0.1
Define functions
# define function, gradient and Hessian
def psn_func(x):
    #####
    # TODO: complete the function
    #####
    a = np.sum(np.exp(A_psn@x))
    return a - b_psn.T@A_psn@x + lam_psn * x@x / 2
#
def psn_grad(x):
    #####
    # TODO: complete the gradient
    #####
    a = A_psn.T@np.exp(A_psn@x)
    return a - A_psn.T@b_psn + lam_psn * x
#
def psn_hess(x):
    #####
    # TODO: complete the Hessian
    #####
    return (A_psn.T * np.exp(A_psn@x)) @ A_psn + lam_psn * np.eye(n_psn)

#==GRADED==#
# No need to change anything in this cell.
x_test = 2 * np.ones(n_psn)
psn_func_test = psn_func(x_test)
psn_grad_test = psn_grad(x_test)
psn_hess_test = psn_hess(x_test)

# uniform step size for gradient descent or should we use line search?
#####
# TODO: if there is a uniform step size, set step_size_psn = ...; otherwise set use_line_search=True
#####
# use_line_search=True (line search is passed to the solver call below instead of a fixed step size)
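A note on psn_hess above: A_psn.T * np.exp(A_psn@x) relies on NumPy broadcasting to scale each column of A_psn.T (i.e. each row of A_psn) by its weight, so it equals A_psn.T @ np.diag(w) without ever forming the 500×500 diagonal matrix. A quick optional check, assuming the cells above have been run:

# Optional check of the broadcasting identity used in psn_hess:
# for w of length m_psn, A_psn.T * w scales column i of A_psn.T by w[i], i.e. A_psn.T @ diag(w).
w = np.exp(A_psn @ np.random.randn(n_psn))
print(np.allclose(A_psn.T * w, A_psn.T @ np.diag(w)))   # expect True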
Apply gradient descent solver
x0_psn_gd = np.zeros(n_lgt)

#==GRADED==#
# No need to change anything in this cell.
x_psn_gd, obj_his_psn_gd, err_his_psn_gd, exit_flag_psn_gd = \
    optimizeWithGD(x0_psn_gd, psn_func, psn_grad, None, use_line_search=True)

# plot result
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(obj_his_psn_gd)
ax[0].set_title('function value')
ax[1].semilogy(err_his_psn_gd)
ax[1].set_title('norm of the gradient')
fig.suptitle('Gradient Descent on Poisson Regression')
Text(0.5, 0.98, 'Gradient Descent on Poisson Regression')
Apply Newton's solver
x0_psn_nt = np.zeros(n_lgt)

#==GRADED==#
# No need to change anything in this cell.
x_psn_nt, obj_his_psn_nt, err_his_psn_nt, exit_flag_psn_nt = \
    optimizeWithNT(x0_psn_nt, psn_func, psn_grad, psn_hess)

# plot result
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(obj_his_psn_nt)
ax[0].set_title('function value')
ax[1].semilogy(err_his_psn_nt)
ax[1].set_title('norm of the gradient')
fig.suptitle('Newton\'s Method on Poisson Regression')
Text(0.5, 0.98, "Newton's Method on Poisson Regression")