AMATH 383 Fall 2023 Problem Set 5 (Last Problem Set!!)
Due: Monday 11/20 at 11:59pm
Note: Submit electronically to Gradescope. Please show all work.

1. (30 points) Hessians and linear compositions. Recall that the Hessian of a function is a (generally symmetric) matrix of second partial derivatives:

$$\nabla^2 f(\beta) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial \beta_1 \partial \beta_1} & \cdots & \dfrac{\partial^2 f}{\partial \beta_1 \partial \beta_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial \beta_n \partial \beta_1} & \cdots & \dfrac{\partial^2 f}{\partial \beta_n \partial \beta_n} \end{bmatrix}$$

Recall our gradient chain rule for compositions: if $g(\beta) = f(X\beta - y)$, then $\nabla g(\beta) = X^T \nabla f(X\beta - y)$. This rule also extends nicely to Hessian computation: $\nabla^2 g(\beta) = X^T \nabla^2 f(X\beta - y)\, X$.

(a) Compute the Hessian of $f(\beta) = \frac{1}{2}\|X\beta - y\|_2^2 + \frac{\lambda}{2}\|\beta\|^2$.

$$\nabla f(\beta) = X^T(X\beta - y) + \lambda \beta = X^T X \beta - X^T y + \lambda \beta$$
$$\nabla^2 f(\beta) = X^T X + \lambda I$$
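As a quick numerical sanity check of the part (a) formulas (an aside, not part of the required write-up), the closed-form gradient and Hessian can be compared against central finite differences on small made-up data; a minimal sketch in NumPy:

# Sanity check for part (a): compare the closed-form gradient and Hessian of
# f(beta) = 0.5*||X beta - y||^2 + (lam/2)*||beta||^2 with finite differences.
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 8, 4, 0.5
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)
beta = rng.standard_normal(n)

def f(b):
    return 0.5 * np.sum((X @ b - y) ** 2) + 0.5 * lam * b @ b

def g(b):
    return X.T @ (X @ b - y) + lam * b      # closed form: X^T(X b - y) + lam*b

hess = X.T @ X + lam * np.eye(n)            # closed form: X^T X + lam*I

eps = 1e-6
fd_grad = np.array([(f(beta + eps * e) - f(beta - eps * e)) / (2 * eps) for e in np.eye(n)])
fd_hess = np.array([(g(beta + eps * e) - g(beta - eps * e)) / (2 * eps) for e in np.eye(n)]).T
print(np.allclose(g(beta), fd_grad, atol=1e-4))  # expect True
print(np.allclose(hess, fd_hess, atol=1e-4))     # expect True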
(b) Compute the Hessian of
$$f(\beta) = \sum_{i=1}^{m} \exp(x_i^T \beta) - y_i x_i^T \beta.$$

$$\nabla f(\beta) = \sum_{i=1}^{m} x_i \exp(x_i^T \beta) - y_i x_i$$
$$\nabla^2 f(\beta) = \sum_{i=1}^{m} x_i \exp(x_i^T \beta)\, x_i^T$$

(c) Compute the Hessian of
$$f(\beta) = \sum_{i=1}^{m} \log\!\left(1 + \exp(x_i^T \beta)\right) - y_i x_i^T \beta.$$

Let $f(z) = \log(1 + \exp(z))$ where $z = x_i^T \beta$. Then
$$f'(z) = \frac{\exp(z)}{1 + \exp(z)}, \qquad f''(z) = \frac{\exp(z)}{(\exp(z) + 1)^2}.$$
Then
$$\nabla f(\beta) = \sum_{i=1}^{m} \frac{\exp(x_i^T \beta)}{1 + \exp(x_i^T \beta)}\, x_i - y_i x_i$$
$$\nabla^2 f(\beta) = \sum_{i=1}^{m} x_i \left( \frac{\exp(x_i^T \beta)}{(1 + \exp(x_i^T \beta))^2} \right) x_i^T$$

(d) Compute the Hessian of
$$f(\beta) = \sum_{i=1}^{m} \log\!\left(1 + \frac{(x_i^T \beta)^2}{\nu}\right).$$

Let $f(z) = \log(1 + z^2/\nu)$ where $z = x_i^T \beta$. Then
$$f'(z) = \frac{2z}{z^2 + \nu}, \qquad f''(z) = \frac{2(\nu - z^2)}{(z^2 + \nu)^2}.$$
Then
$$\nabla^2 f(\beta) = \sum_{i=1}^{m} x_i \left( \frac{2\left(\nu - (x_i^T \beta)^2\right)}{\left((x_i^T \beta)^2 + \nu\right)^2} \right) x_i^T$$

2. (20 points) Show that Newton's method finds the solution to any regularized linear least squares problem in one step.

The minimum of $f(\beta)$ needs to be found. Consider the regularized linear least squares problem
$$f(\beta) = \|X\beta - y\|_2^2 + \frac{\lambda}{2}\|\beta\|^2$$
for some $X$, $y$, and $\lambda$. Then
$$\nabla f(\beta) = 2X^T X \beta - 2X^T y + \lambda \beta, \qquad \nabla^2 f(\beta) = 2X^T X + \lambda I.$$
A single Newton step from $\beta_0$ gives
\begin{align*}
\beta_1 &= \beta_0 - (2X^T X + \lambda I)^{-1}\left(2X^T X \beta_0 - 2X^T y + \lambda \beta_0\right) \\
        &= \beta_0 - (2X^T X + \lambda I)^{-1}\left((2X^T X + \lambda I)\beta_0 - 2X^T y\right) \\
        &= \beta_0 - \beta_0 + (2X^T X + \lambda I)^{-1}\, 2X^T y \\
        &= (2X^T X + \lambda I)^{-1}\, 2X^T y.
\end{align*}
To find the true solution, set $\nabla f(\beta) = 0$:
$$2X^T X \beta - 2X^T y + \lambda \beta = 0 \;\Longrightarrow\; (2X^T X + \lambda I)\beta = 2X^T y \;\Longrightarrow\; \beta = (2X^T X + \lambda I)^{-1}\, 2X^T y = \beta_1.$$
So a single Newton step lands exactly on the minimizer.
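A small numerical illustration of the one-step property (an aside using made-up data, not part of the required write-up): build a random regularized least squares problem and check that one Newton step from an arbitrary starting point already equals the closed-form minimizer.

# One Newton step on f(beta) = ||X beta - y||^2 + (lam/2)*||beta||^2 hits the minimizer.
import numpy as np

rng = np.random.default_rng(1)
m, n, lam = 20, 5, 0.3
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

def grad(b):
    return 2 * X.T @ (X @ b - y) + lam * b

H = 2 * X.T @ X + lam * np.eye(n)                 # Hessian, constant in beta
beta0 = rng.standard_normal(n)                    # arbitrary starting point
beta1 = beta0 - np.linalg.solve(H, grad(beta0))   # single Newton step
beta_star = np.linalg.solve(H, 2 * X.T @ y)       # solution of grad(beta) = 0
print(np.allclose(beta1, beta_star))              # expect True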
3. (30 points) Recall that gradient descent to minimize $f(\beta)$ takes the form
$$\beta_{k+1} = \beta_k - \alpha_k \nabla f(\beta_k),$$
while Newton's method takes the form
$$\beta_{k+1} = \beta_k - \alpha_k H_k^{-1} \nabla f(\beta_k),$$
where $H_k$ can be the Hessian $H_k = \nabla^2 f(\beta_k)$ or a suitable approximation, and $\alpha_k$ is a suitable step that guarantees descent of the function values. Using the starter code for your homework, finish implementing these two methods (look for the TODO items in the Python code). Make sure hw1_supp.py is in the same directory as your main Python file.

4. (50 points) Implement the binomial and Poisson regression methods and run the cells in the coding notebook that produce the plots. Please save your final notebook as a PDF that includes the plots and printouts that are automatically generated by the notebook. To make things a bit easier, you can always use the line search with the option use_line_search=True in the call, instead of solving for an upper bound in the case of logistic regression. I'll discuss that part in class.
AMATH 515 Homework 1
Due Date: 01/27/2021

Homework Instruction: Please fill in the gaps in this template where commented as TODO. When submitting, make sure your file is still named 515Hw1_Coding.ipynb -- that's what Gradescope will be looking for. There is no need to submit hw1_supp.py, it will already be on the server when you submit your work. You'll have 10 attempts to pass the tests.

# require numpy module
import numpy as np
from numpy.linalg import norm
from numpy.linalg import solve
import matplotlib.pyplot as plt

# import supplementary functions for Homework 1
import sys
sys.path.insert(0, './')
from hw1_supp import *

# You can change this seed if you want to test that your program works with different data,
# but please set it back to 123 when you submit your work.
seed = 123

Gradient Descent Solver

Recall the gradient descent algorithm we learned in class and complete the gradient descent solver.

def optimizeWithGD(x0, func, grad, step_size, tol=1e-6, max_iter=1000, use_line_search=False):
    """
    Optimize with Gradient Descent

    input
    -----
    x0 : array_like
        Starting point for the solver.
    func : function
        Takes x and returns the function value.
    grad : function
        Takes x and returns the gradient of "func".
    step_size : float or None
        If it is a float number and `use_line_search=False`, it will be used as the step size.
        Otherwise, line search will be used.
    tol : float, optional
        Gradient tolerance for terminating the solver.
    max_iter : int, optional
        Maximum number of iterations for terminating the solver.
    use_line_search : bool, optional
        When it is True a line search will be used, otherwise `step_size` has to be given.

    output
    ------
    x : array_like
        Final solution
    obj_his : array_like
        Objective function's values convergence history
    err_his : array_like
        Norm of gradient convergence history
    exit_flag : int
        0, norm of gradient below `tol`
        1, exceeded maximum number of iterations
        2, line search failed
        3, other
    """
    # safeguard
    if not use_line_search and step_size is None:
        print('Please specify the step_size or use the line search.')
        return x0, np.array([]), np.array([]), 3

    # initial step
    x = np.copy(x0)
    g = grad(x)
    obj = func(x)
    err = norm(g)

    obj_his = np.zeros(max_iter + 1)
    err_his = np.zeros(max_iter + 1)
    obj_his[0] = obj
    err_his[0] = err

    # start iterations
    iter_count = 0
    while err >= tol:
        if use_line_search:
            step_size = lineSearch(x, g, g, func)
            # if line search fails, step_size will be None
            if step_size is None:
                print('Gradient descent line search fail.')
                return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 2

        # gradient descent step
        #####
        # TODO: with given step_size, complete gradient descent step
        x = x - step_size * g
        #####

        # update function and gradient
        g = grad(x)
        obj = func(x)
        err = norm(g)

        iter_count += 1
        obj_his[iter_count] = obj
        err_his[iter_count] = err

        # check if we exceeded the maximum number of iterations
        if iter_count >= max_iter:
            print('Gradient descent reached maximum number of iterations.')
            return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 1

    return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 0

Newton's Solver

Recall Newton's method we learned in class and complete the Newton solver.

def optimizeWithNT(x0, func, grad, hess, tol=1e-6, max_iter=100):
    """
    Optimize with Newton's Method

    input
    -----
    x0 : array_like
        Starting point for the solver.
    func : function
        Takes x and returns the function value.
    grad : function
        Takes x and returns the gradient.
    hess : function
        Takes x and returns the Hessian matrix.
    tol : float, optional
        Gradient tolerance for terminating the solver.
    max_iter : int, optional
        Maximum number of iterations for terminating the solver.

    output
    ------
    x : array_like
        Final solution
    obj_his : array_like
        Objective function value convergence history
    err_his : array_like
        Norm of gradient convergence history
    exit_flag : int
        0, norm of gradient below `tol`
        1, exceeded maximum number of iterations
        2, others
    """
    # initial step
    x = np.copy(x0)
    g = grad(x)
    H = hess(x)
    obj = func(x)
    err = norm(g)

    obj_his = np.zeros(max_iter + 1)
    err_his = np.zeros(max_iter + 1)
    obj_his[0] = obj
    err_his[0] = err

    # start iterations
    iter_count = 0
    while err >= tol:
        # Newton step
        #####
        # TODO: complete Newton step
        # x = ? hint: using solve function
        x = x - solve(H, g)
        #####

        # update function, gradient and Hessian
        g = grad(x)
        H = hess(x)
        obj = func(x)
        err = norm(g)

        iter_count += 1
        obj_his[iter_count] = obj
        err_his[iter_count] = err

        # check if we exceeded the maximum number of iterations
        if iter_count >= max_iter:
            print("Newton's method reached maximum number of iterations.")
            return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 1

    return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 0

Simple Test

Consider a simple test problem
$$f(x) = \frac{1}{2}\|x - b\|^2.$$
We can easily calculate the gradient and Hessian of $f$: $\nabla f(x) = x - b$ and $\nabla^2 f(x) = I$, where $I$ is an identity matrix, and we know that $x^\star = b$ is the solution. We will use this function as a simple test to check whether gradient descent and Newton's method are implemented correctly.
# create b
b = np.array([1.0, 2.0, 3.0])

# define test function
def test_func(x):
    return 0.5 * sum((x - b) ** 2)

# define test gradient
def test_grad(x):
    return x - b

# define test Hessian
def test_hess(x):
    return np.eye(b.size)

# test gradient descent
x0_gd = np.zeros(b.size)
# a unit step is a natural choice here since the Hessian is the identity
x_gd, obj_his_gd, err_his_gd, exit_flag_gd = optimizeWithGD(x0_gd, test_func, test_grad, 1.0)

# check if the solution is correct
err_gd = norm(x_gd - b)
if err_gd < 1e-6:
    print('Gradient Descent: OK')
else:
    print('Gradient Descent: Err')

Gradient Descent: OK

# test Newton's method
x0_nt = np.zeros(b.size)
x_nt, obj_his_nt, err_his_nt, exit_flag_nt = optimizeWithNT(x0_nt, test_func, test_grad, test_hess)

# check if the solution is correct
err_nt = norm(x_nt - b)
if err_nt < 1e-6:
    print('Newton: OK')
else:
    print('Newton: Err')

Newton: OK

Logistic Regression

Recall the logistic regression model
$$f(x) = \sum_{i=1}^{m} \log\!\left(1 + \exp(a_i^T x)\right) - b^T A x + \frac{\lambda}{2}\|x\|^2,$$
where $a_i^T$ are the rows of the data matrix $A$ and $b$ holds the labels. Calculate the gradient and Hessian of this function and define them below using the given data.
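For reference (this matrix form is not spelled out in the notebook text, but it follows from Problem 1(c) plus the ridge term), the gradient and Hessian implemented in the next cell can be written as
$$\nabla f(x) = A^T\!\left(\frac{e^{Ax}}{1 + e^{Ax}}\right) - A^T b + \lambda x, \qquad \nabla^2 f(x) = A^T \operatorname{diag}\!\left(\frac{e^{Ax}}{(1 + e^{Ax})^2}\right) A + \lambda I,$$
with the exponential and division applied elementwise.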
# fix a random seed
np.random.seed(seed)

# set dimensions and create some random data
m_lgt = 500
n_lgt = 50
A_lgt = 0.3 * np.random.randn(m_lgt, n_lgt)
x_lgt = np.random.randn(n_lgt)
b_lgt = sampleLGT(x_lgt, A_lgt)
lam_lgt = 0.1

Define functions

# implement logistic function, gradient and Hessian
def lgt_func(x):
    #####
    # TODO: complete the function
    #####
    a = np.sum(np.log(1 + np.exp(A_lgt @ x))) - b_lgt @ A_lgt @ x
    return a + lam_lgt * x @ x / 2

def lgt_grad(x):
    #####
    # TODO: complete the gradient
    #####
    return A_lgt.T @ (np.exp(A_lgt @ x) / (1 + np.exp(A_lgt @ x))) - A_lgt.T @ b_lgt + lam_lgt * x

def lgt_hess(x):
    #####
    # TODO: complete the Hessian
    #####
    a = np.diag(np.exp(A_lgt @ x) / (1 + np.exp(A_lgt @ x)) ** 2)
    return A_lgt.T @ a @ A_lgt + lam_lgt * np.eye(n_lgt)

#==GRADED==#
# No need to change anything in this cell.
x_test = 3 * np.ones(n_lgt)
# We test the values of these functions at a sample point (3, 3, ..., 3)
lgt_func_test = lgt_func(x_test)
lgt_grad_test = lgt_grad(x_test)
lgt_hess_test = lgt_hess(x_test)

# uniform step size for gradient descent or should we use line search?
#####
# TODO: if there is a uniform step size, set step_size_lgt = ...
# otherwise set use_line_search=True
#####
step_size_lgt = 1 / 100
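The choice step_size_lgt = 1/100 above can be sanity-checked against a Lipschitz bound (an illustrative aside, not part of the graded cells): the logistic term's second derivative is at most 1/4, so the Hessian never exceeds $(\|A\|_2^2/4 + \lambda)I$, and any step no larger than the reciprocal of that bound guarantees descent. A minimal sketch using the data defined above:

# Illustrative aside: bound the largest Hessian eigenvalue of the logistic objective.
# Since e^z/(1+e^z)^2 <= 1/4, we have hess(x) <= (||A||_2^2 / 4 + lam) * I for all x,
# so 1 / (||A||_2^2 / 4 + lam) is a safe uniform step size.
L_bound = np.linalg.norm(A_lgt, 2) ** 2 / 4 + lam_lgt
safe_step = 1.0 / L_bound
print(L_bound, safe_step)   # compare with the choice step_size_lgt = 1/100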
Apply gradient descent solver

x0_lgt_gd = np.zeros(n_lgt)

#==GRADED==#
# No need to change anything in this cell.
x_lgt_gd, obj_his_lgt_gd, err_his_lgt_gd, exit_flag_lgt_gd = \
    optimizeWithGD(x0_lgt_gd, lgt_func, lgt_grad, step_size_lgt)

# plot result
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(obj_his_lgt_gd)
ax[0].set_title('function value')
ax[1].semilogy(err_his_lgt_gd)
ax[1].set_title('norm of the gradient')
fig.suptitle('Gradient Descent on Logistic Regression')

[Figure: Gradient Descent on Logistic Regression -- function value and norm of the gradient vs. iteration]

Apply Newton's solver

x0_lgt_nt = np.zeros(n_lgt)

#==GRADED==#
# No need to change anything in this cell.
x_lgt_nt, obj_his_lgt_nt, err_his_lgt_nt, exit_flag_lgt_nt = \
    optimizeWithNT(x0_lgt_nt, lgt_func, lgt_grad, lgt_hess)

# plot result
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(obj_his_lgt_nt)
ax[0].set_title('function value')
ax[1].semilogy(err_his_lgt_nt)
ax[1].set_title('norm of the gradient')
fig.suptitle("Newton's Method on Logistic Regression")

[Figure: Newton's Method on Logistic Regression -- function value and norm of the gradient vs. iteration]
Poisson Regression

Recall the Poisson regression model
$$f(x) = \sum_{i=1}^{m} \exp(a_i^T x) - b^T A x + \frac{\lambda}{2}\|x\|^2,$$
where $a_i^T$ are the rows of the data matrix $A$ and $b$ holds the counts. Calculate the gradient and Hessian of this function and define them below using the given data.
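For reference (again not spelled out in the notebook text; it follows from Problem 1(b) plus the ridge term), the gradient and Hessian implemented in the next cell can be written as
$$\nabla f(x) = A^T e^{Ax} - A^T b + \lambda x, \qquad \nabla^2 f(x) = A^T \operatorname{diag}\!\left(e^{Ax}\right) A + \lambda I,$$
with the exponential applied elementwise.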
# fix a random seed
np.random.seed(seed)

# set the dimensions and create some data
m_psn = 500
n_psn = 50
A_psn = 0.3 * np.random.randn(m_psn, n_psn)
x_psn = np.random.randn(n_psn)
b_psn = samplePSN(x_psn, A_psn)
lam_psn = 0.1

Define functions

# define function, gradient and Hessian
def psn_func(x):
    #####
    # TODO: complete the function
    #####
    a = np.sum(np.exp(A_psn @ x))
    return a - b_psn.T @ A_psn @ x + lam_psn * x @ x / 2

def psn_grad(x):
    #####
    # TODO: complete the gradient
    #####
    a = A_psn.T @ np.exp(A_psn @ x)
    return a - A_psn.T @ b_psn + lam_psn * x

def psn_hess(x):
    #####
    # TODO: complete the Hessian
    #####
    # broadcasting: A.T * w scales column j of A.T by w[j], so this is A^T diag(w) A
    return (A_psn.T * np.exp(A_psn @ x)) @ A_psn + lam_psn * np.eye(n_psn)

#==GRADED==#
# No need to change anything in this cell.
x_test = 2 * np.ones(n_psn)
psn_func_test = psn_func(x_test)
psn_grad_test = psn_grad(x_test)
psn_hess_test = psn_hess(x_test)

# uniform step size for gradient descent or should we use line search?
#####
# TODO: if there is a uniform step size, set step_size_psn = ...
# otherwise set use_line_search=True
#####
# no uniform step size here; use_line_search=True is passed to the solver below
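The psn_hess implementation above relies on NumPy broadcasting instead of building the m-by-m diagonal matrix explicitly; a small illustrative check (with a made-up matrix M and weights w, not part of the assignment) that (M.T * w) @ M equals M.T @ diag(w) @ M:

# Broadcasting check: (M.T * w) @ M == M.T @ np.diag(w) @ M
M = np.random.randn(6, 3)        # small made-up data matrix
w = np.exp(np.random.randn(6))   # positive weights, analogous to exp(A_psn @ x)
lhs = (M.T * w) @ M              # scales column j of M.T by w[j], then multiplies by M
rhs = M.T @ np.diag(w) @ M
print(np.allclose(lhs, rhs))     # expect True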
Apply gradient descent solver

x0_psn_gd = np.zeros(n_psn)

#==GRADED==#
# No need to change anything in this cell.
x_psn_gd, obj_his_psn_gd, err_his_psn_gd, exit_flag_psn_gd = \
    optimizeWithGD(x0_psn_gd, psn_func, psn_grad, None, use_line_search=True)

# plot result
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(obj_his_psn_gd)
ax[0].set_title('function value')
ax[1].semilogy(err_his_psn_gd)
ax[1].set_title('norm of the gradient')
fig.suptitle('Gradient Descent on Poisson Regression')

[Figure: Gradient Descent on Poisson Regression -- function value and norm of the gradient vs. iteration]

Apply Newton's solver

x0_psn_nt = np.zeros(n_psn)

#==GRADED==#
# No need to change anything in this cell.
x_psn_nt, obj_his_psn_nt, err_his_psn_nt, exit_flag_psn_nt = \
    optimizeWithNT(x0_psn_nt, psn_func, psn_grad, psn_hess)

# plot result
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(obj_his_psn_nt)
ax[0].set_title('function value')
ax[1].semilogy(err_his_psn_nt)
ax[1].set_title('norm of the gradient')
fig.suptitle("Newton's Method on Poisson Regression")

[Figure: Newton's Method on Poisson Regression -- function value and norm of the gradient vs. iteration]