CSCI 1300 Lecture Package - Gradient Descent
Gradient Descent

Date:

In this final lecture within our Optimization Module, we will be looking at a very powerful technique called Gradient Descent. This method is an iterative optimization tool that computer scientists use quite commonly (you'll be using it next week in your tutorial!). Buckle up, because this lecture is going to be a ride!

At the end of this lecture, you will be able to answer the following learning objectives:

Learning Objectives: What is gradient descent, and how can we use it to find a line of best fit for a data set?

Key Topics
The following are the key topics you will need to become familiar with: Learning Rate; Training Set; Weight; Loss and Cost Functions; and Gradient Descent.

Important Notes:
Cue Questions

While you Wait: In Iterative Optimization, how did we iterate through the optimizing process? How did we update x_i?

Some important terms:
The learning rate is
A training set is
Weights are
A Loss Function is
A Cost Function is

Summary
Cue Questions

Gradient Descent is an approach that iteratively finds the minimum of an objective function by using partial derivatives with respect to its weights (in this course, it will mainly be minimizing a cost function).

Steps for Gradient Descent:
1. Initialize the weights of the objective function. This can be any initialization, but typically we initialize everything to be 0.
2. Calculate the prediction of the objective function using your training set and your initialized weights.
3. Calculate the Loss between your prediction and what you expected from your training set.
4. Perform Iterative Optimization on each of the weights using partial differentiation.
5. Repeat for each point in your training set.
6. Repeat for each iteration.

Strategy: A good strategy for these questions is to complete a table like the following:

j | (x, y)     | Pred. | Loss | Dw_0 | Dw_1 | ... | New w_0 | New w_1 | ...
1 | (x_1, y_1) |       |      |      |      |     |         |         |
1 | (x_2, y_2) |       |      |      |      |     |         |         |
... 
2 | (x_1, y_1) |       |      |      |      |     |         |         |
2 | (x_2, y_2) |       |      |      |      |     |         |         |
...

where j is the iteration, (x_i, y_i) is the i-th point in the training set, Pred. is the prediction using your objective function and current weights, Loss is your expected value minus your prediction, Dw_k is the partial derivative of your Cost function with respect to w_k, and New w_k is the updated weight (using the Iterative Optimization process). Note that the notation Dw_k is written that way because it matches the Maple file. If you would prefer to use the notation ∂C/∂w_k, that is perfectly acceptable.

Summary
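To make the steps above concrete, here is a loose Maple sketch of gradient descent for the straight-line model y = w_1 x + w_0. This is not the course's Maple file: the procedure name GradientDescentLinear, the 600-iteration cap, and the squared-error cost C = (1/2)*Loss^2 (which gives Dw_1 = -Loss*X and Dw_0 = -Loss) are assumptions made here for illustration, so check them against your own tutorial file.

GradientDescentLinear := proc(TestList, eta)
  local w1, w0, i, j, X, Y, Prediction, Loss, Dw1, Dw0:
  w1 := 0;                               # initialize weight w1 to 0
  w0 := 0;                               # initialize weight w0 to 0
  for j from 1 to 600 do                 # assumed cap on the number of iterations
    for i from 1 to nops(TestList) do    # one pass over every point in the training set
      X := TestList[i][1];               # x value of the i-th training point
      Y := TestList[i][2];               # y value of the i-th training point
      Prediction := w1*X + w0;           # prediction of the straight-line model
      Loss := Y - Prediction;            # expected value minus prediction
      Dw1 := -Loss*X;                    # assumed partial of the cost (1/2)*Loss^2 w.r.t. w1
      Dw0 := -Loss;                      # assumed partial of the cost (1/2)*Loss^2 w.r.t. w0
      w1 := w1 - Dw1*eta;                # update w1 by -eta*Dw1
      w0 := w0 - Dw0*eta;                # update w0 by -eta*Dw0
    od:                                  # end loop over training points
    if abs(Dw1) < 10^(-8) and abs(Dw0) < 10^(-8) then
      break;                             # stop early once the partials are close to 0
    fi:
  od:                                    # end loop over iterations
  return [w1, w0];                       # return the final weights
end:                                     # end of the procedure

For example, GradientDescentLinear([[1, 2], [2, 7], [3, 20]], 0.1); would run this sketch on the training set from the self-challenge question below, although working one iteration by hand in the table is the point of the exercise.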
Cue Questions

Self-Challenge Questions:
1. Calculate the loss, prediction, and any partial derivatives for the gradient descent method for the function y = w_0 e^(w_1 x).
2. Use gradient descent with the information above to compute one iteration on the training set A = {(1, 2), (2, 7), (3, 20)} with η = 0.1. You can leave your answers in exact form. You can use the table below to guide you:

j | (x, y)  | Pred. | Loss | Dw_0 | Dw_1 | New w_0 | New w_1
1 | (1, 2)  |       |      |      |      |         |
1 | (2, 7)  |       |      |      |      |         |
1 | (3, 20) |       |      |      |      |         |

Summary
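If you want to check your setup for question 1, one possible arrangement is sketched below. It assumes the squared-error cost C = (1/2)·Loss² with Loss defined as expected minus predicted, matching the table strategy above; if your tutorial defines the cost differently, your partials may differ by a constant factor or a sign.

\hat{y} = w_0 e^{w_1 x}, \qquad L = y - \hat{y}, \qquad C = \tfrac{1}{2}L^2,
\qquad \frac{\partial C}{\partial w_0} = -L\, e^{w_1 x}, \qquad
\frac{\partial C}{\partial w_1} = -L\, w_0\, x\, e^{w_1 x}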
Cue Questions

Self-Challenge Questions Continued:
3. Use your final prediction to predict the output of x = 4.

Summary
Gradient Descent Worksheet

Date:

Hard Answer Practice Problems

1. Using Gradient Descent, calculate the first two iterations of w_0 and w_1 for the training set {(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)} with η = 0.01, and graph your function f(x) = w_1 x + w_0 against g(x) = x^2.
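Once you have your two iterations, the graph can be produced in Maple with a plot call along these lines. The weight values shown are placeholders for illustration, not the answer:

w1 := 0.5:   # placeholder; substitute the w1 you computed
w0 := 0.3:   # placeholder; substitute the w0 you computed
plot([w1*x + w0, x^2], x = 0 .. 6, legend = ["f(x) = w1*x + w0", "g(x) = x^2"]);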
2. Adapt the Gradient Descent algorithm to a quadratic polynomial y = w_2 x^2 + w_1 x + w_0 by filling in the blanks (shown as ______) in the Maple procedure below.

GradientDescentQuad := proc(TestList, eta)
  local w1, w0, ______, i, j, X, Y, Prediction, Loss, Dw1, Dw0, ______ :
  w1 := 0;                               # initializes weight w1 to 0
  w0 := 0;                               # initializes weight w0 to 0
  for j from 1 to 600 do                 # number of times we iterate through the gradient descent algorithm
    for i from 1 to nops(TestList) do    # minimizing the distance for each x_i
      X := TestList[i][1];               # sets the X value
      Y := TestList[i][2];               # sets the Y value
      Prediction := ______ :             # calculates the prediction from the model
      Loss := Y - Prediction;            # calculates the difference between what we expect and what we predicted
      Dw1 := ______ ;                    # setting the partial derivative with respect to w1
      Dw0 := ______ ;                    # setting the partial derivative with respect to w0
      w1 := w1 - Dw1*eta:                # updates w1 to w1 - eta*∂f/∂w1
      w0 := w0 - Dw0*eta;                # updates w0 to w0 - eta*∂f/∂w0
    od:                                  # end loop
    if abs(Dw1) < 10^(-8) and abs(Dw0) < 10^(-8) and ______ then
      # terminates the gradient descent loop if the partial derivatives are close to 0
      break;
    fi:                                  # end if
  od:                                    # end loop
  return [ ______, w1, w0];              # returns the values of the weights
end:                                     # end of the procedure
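As a hint to verify against your own derivation (not the official solution), under the same squared-error cost assumption used in the linear sketch earlier, C = (1/2)*Loss^2 with Loss = Y - Prediction, the quantities behind the blanks would take roughly this form:

\text{Prediction} = w_2 X^2 + w_1 X + w_0, \qquad
\frac{\partial C}{\partial w_2} = -\text{Loss}\cdot X^2, \quad
\frac{\partial C}{\partial w_1} = -\text{Loss}\cdot X, \quad
\frac{\partial C}{\partial w_0} = -\text{Loss}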
3. Calculate the first two iterations of w_0, w_1, and w_2 for the training set {(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)} with η = 0.01 using the GradientDescentQuad function, and graph your function f(x) = w_2 x^2 + w_1 x + w_0 against g(x) = x^2.