University of Toronto
Department of Electrical and Computer Engineering
ECE 1508S2: Applied Deep Learning
A. Bereyhi - Winter 2024

Assignment 2: Feedforward Neural Networks
DATE: Feb 2, 2024    DUE: Feb 16, 2024

PREFACE

This is the second series of assignments for the course Special Topics in Communications: Applied Deep Learning. The exercises are aimed at reviewing the topics of Chapter 2, i.e., Feedforward Neural Networks. Below, you can find information regarding the contents of these exercises, as well as instructions on how to submit them.

GENERAL INFORMATION

The assignments are given in two sections. The first section contains written questions that you can answer in words or by derivation. The questions are consistent with the material of Chapter 2, and you do not need any further resources to answer them. The second section includes the programming assignments. For these assignments, you need to use the package torch in Python. For those who are beginners, an introduction has been given and some useful online resources have been cited.

In case a question is unclear or there are any flaws, please contact us over Piazza. Also, if any particular assumption is required to solve a problem, feel free to make that assumption and state it in your solution.

The total mark of the assignments is 100 points, with the written questions having the following mark distribution:

• Question 1: 10 points
• Question 2: 5 points
• Question 3: 10 points
• Question 4: 5 points

The mark distribution of the programming assignments is further as follows: 50 points for the first assignment and 20 points for the second assignment. Therefore, the total mark of the written questions adds up to 30 points and the total mark of the programming assignments adds up to 70 points.

HOW TO SUBMIT

Please submit the answers to the written exercises as a PDF or image file. They do not need to be machine-typed; you can submit a photo of your handwritten solutions. For the programming tasks, it is strongly suggested to use the Python notebook Assgn_2.ipynb that is available on Quercus, and you can use it for submission. Note that most of the code for the programming assignments is already given in the Python notebook and you are only asked to complete the code in the indicated lines. Nevertheless, it is not mandatory to use this file and you can use any other file format for your submission. Regardless of what format or template you choose, your submission for the programming assignments should be included in a single file.[1]

[1] A zip file of multiple executable files is also accepted.

The deadline for your submission is February 16, 2024 at 11:59 PM.

• You can delay up to three days, i.e., until February 19, 2024 at 11:59 PM. After this extended deadline no submission is accepted.
• In case of a delay, you lose one of your two penalty-free delays. After two penalty-free delays, each day of delay deducts 10% of the assignment mark.

Please submit your assignment only through Quercus, and not by email.

1 WRITTEN EXERCISES

QUESTION 1: FORWARD AND BACKWARD PASS

In this exercise, we try forward and backward propagation for the simple feedforward neural network (FNN) we had in the first series of assignments. This FNN is shown in Figure 1.1. In this FNN, we have used the soft-ReLU function for activation in the hidden layer. This means that f(·) in Figure 1.1 is

    f(z) = log(1 + e^z)

with log taken in natural base. The output layer is further activated via the sigmoid function, i.e.,

    σ(z) = 1 / (1 + e^{-z}).

For training of this FNN, we use the cross-entropy function as the loss function. We are given the data-point x = [1, 1]^T
whose true label is v = 1. We intend to perform one forward and backward pass by hand. To this end, assume that all weights and biases are initiated with the value 0.1, i.e., all entries of W_1^{(0)} and w_2^{(0)} are 0.1, where W_1 is the matrix containing all weights and biases of the hidden layer and w_2 is the vector containing all weights and biases of the output layer.

[Figure 1.1: Fully-connected FNN with two-dimensional input x = [x_1, x_2]^T; the two hidden neurons are activated by f and the output neuron by σ, producing the output y.]

1. Determine all variables calculated in the forward pass. You have to explain the order of your calculation using the forward-propagation algorithm.
2. Determine the gradient of the loss with respect to all the weights and biases at the given initial values via backpropagation. Note: You must use the backpropagation algorithm.
3. Assume we are doing sample-level training. Calculate the updated weights and biases for the next iteration of gradient descent, i.e., W_1^{(1)} and w_2^{(1)}.

QUESTION 2: FORWARD-PROPAGATION REVISITED

Consider a fully-connected feedforward neural network (FNN) with L hidden layers. The input data-point x to this FNN has N entries, i.e., x ∈ R^N. Hidden layer ℓ, for ℓ ∈ {1, ..., L}, has W_ℓ neurons, all activated with activation function f_ℓ(·): R → R, and the output layer contains W_{L+1} neurons with activation function f_{L+1}(·): R → R. For this network, we derived the forward-propagation algorithm in the lecture as given in Algorithm 1.

Algorithm 1 ForwardProp(): Standard Form Derived in Lecture
 1: Initiate with y_0 = x
 2: for ℓ = 0, ..., L do
 3:     Add y_ℓ[0] = 1 and determine z_{ℓ+1} = W_{ℓ+1} y_ℓ    # forward affine
 4:     Determine y_{ℓ+1} = f_{ℓ+1}(z_{ℓ+1})                  # forward activation
 5: end for
 6: for ℓ = 1, ..., L+1 do
 7:     Return y_ℓ and z_ℓ
 8: end for

In this algorithm, matrix W_{ℓ+1} ∈ R^{W_{ℓ+1} × (W_ℓ + 1)} contains all the weights and biases of the neurons in layer ℓ+1, where we define the input layer to be layer 0 with W_0 = N nodes, i.e., the input entries, and the output layer to be layer L+1. In this exercise, we intend to derive an alternative form of forward propagation in which the weights and biases are represented as separate components.[2]

[2] This means that we do not want to use the dummy node 1 in each layer as we did in the lecture.

For ℓ ∈ {0, ..., L}, let W̃_{ℓ+1} ∈ R^{W_{ℓ+1} × W_ℓ} be a matrix whose entry in row j and column i denotes the weight of neuron j in layer ℓ+1 for its i-th input. Moreover, let b_{ℓ+1} ∈ R^{W_{ℓ+1}}
be the vector of biases in layer ℓ+1, whose entry j denotes the bias of neuron j in layer ℓ+1.

1. Write the affine transform of layer ℓ+1 in terms of the weight matrix W̃_{ℓ+1} and the bias vector b_{ℓ+1}.
2. Re-write the forward-propagation algorithm in terms of W̃_{ℓ+1} and b_{ℓ+1}. For the sake of simplicity, an incomplete version of the algorithm is given below in Algorithm 2: you should only complete the blank lines.
   Hint: Note that this alternative form should not contain W_ℓ anymore.

Algorithm 2 ForwardProp(): Alternative Form
 1: ----------                                   # complete
 2: for ℓ = 0, ..., L do
 3:     ----------                               # complete
 4:     Determine y_{ℓ+1} = f_{ℓ+1}(z_{ℓ+1})     # forward activation
 5: end for
 6: for ℓ = 1, ..., L+1 do
 7:     Return y_ℓ and z_ℓ
 8: end for

3. Explain the relation between the matrix W_ℓ in Algorithm 1 and the matrix W̃_ℓ and vector b_ℓ in Algorithm 2.

QUESTION 3: CHAIN-RULE FOR AFFINE OPERATION

Assume that the scalar R̂ ∈ R is a function of the vector y ∈ R^K, i.e., R̂ = L(y) for some L(·): R^K → R. We have already calculated the gradient of R̂ with respect to y, i.e., we have the vector

    ∇_y R̂ = [∂R̂/∂y_1, ..., ∂R̂/∂y_K]^T.

We further know that y is an affine function of an input vector z ∈ R^N, i.e.,

    y = Az + b

for some matrix A ∈ R^{K×N} and b ∈ R^K. We want to calculate the gradient of R̂ with respect to any of these three components, i.e., z, A and b, from ∇_y R̂.

1. First assume that A and b are given. We intend to calculate the gradient of R̂ with respect to z, i.e., ∇_z R̂. The computation graph for this problem is shown in Figure 1.2. Determine ∇_z R̂ in terms of ∇_y R̂.
   Hint: You need to present the result compactly as a matrix-vector multiplication.

[Figure 1.2: Computation graph for Case 1, where we aim to calculate ∇_z R̂: z passes through the affine map Az + b to give y, which passes through L to give R̂; ∇_y R̂ flows backward from L.]
2. Now assume another case in which z and b are given, and we intend to calculate the gradient of R̂ with respect to A, i.e., ∇_A R̂. The computation graph for this problem is shown in Figure 1.3. Determine ∇_A R̂ in terms of ∇_y R̂.
   Hint: You need to present the result compactly as a vector-vector multiplication.

[Figure 1.3: Computation graph for Case 2, where we aim to calculate ∇_A R̂: A enters the affine map Az + b to give y, which passes through L to give R̂; ∇_y R̂ flows backward from L.]

3. As the last case, assume that A and z are given. We now intend to calculate the gradient of R̂ with respect to b, i.e., ∇_b R̂. The computation graph for this problem is shown in Figure 1.4. Determine ∇_b R̂ in terms of ∇_y R̂.
   Hint: You need to present the result compactly as a vector.

[Figure 1.4: Computation graph for Case 3, where we aim to calculate ∇_b R̂: b enters the affine map Az + b to give y, which passes through L to give R̂; ∇_y R̂ flows backward from L.]

QUESTION 4: BACKPROPAGATION REVISITED

We now extend the alternative representation of Question 2 to the backward pass, using the results of Question 3. Recall the backpropagation algorithm derived in the lecture: it is given in Algorithm 3. In this algorithm, ∇_{y_ℓ} R̂ represents the gradient of the loss R̂ determined for data-point (x, v), i.e., if y_{L+1} denotes the output of the FNN for input point x, then R̂ = L(y_{L+1}, v). Furthermore, ḟ_ℓ(·): R → R denotes the derivative of activation f_ℓ(·) with respect to its argument, and ⊙ is the entry-wise product.

Algorithm 3 BackProp(): Standard Form Derived in Lecture
 1: Initiate with ∇_{y_{L+1}} R̂ = ∇L(y_{L+1}, v) and ∇_{z_{L+1}} R̂ = ∇_{y_{L+1}} R̂ ⊙ ḟ_{L+1}(z_{L+1})
 2: for ℓ = L, ..., 1 do
 3:     Determine ∇_{y_ℓ} R̂ = W_{ℓ+1}^T ∇_{z_{ℓ+1}} R̂ and drop ∇_{y_ℓ} R̂[0]    # backward affine
 4:     Determine ∇_{z_ℓ} R̂ = ḟ_ℓ(z_ℓ) ⊙ ∇_{y_ℓ} R̂                              # backward activation
 5: end for
 6: for ℓ = 1, ..., L+1 do
 7:     Return ∇_{W_ℓ} R̂ = ∇_{z_ℓ} R̂ · y_{ℓ-1}^T
 8: end for

1. Using the results of Question 3, complete the alternative form of the backpropagation algorithm given below in Algorithm 4. This alternative form should only contain the matrices W̃_{ℓ+1} and vectors b_{ℓ+1}, as defined in Question 2.
Algorithm 4 BackProp(): Alternative Form
 1: Initiate with ∇_{y_{L+1}} R̂ = ∇L(y_{L+1}, v) and ∇_{z_{L+1}} R̂ = ∇_{y_{L+1}} R̂ ⊙ ḟ_{L+1}(z_{L+1})
 2: for ℓ = L, ..., 1 do
 3:     ----------                                    # complete
 4:     Determine ∇_{z_ℓ} R̂ = ḟ_ℓ(z_ℓ) ⊙ ∇_{y_ℓ} R̂   # backward activation
 5: end for
 6: for ℓ = 1, ..., L+1 do
 7:     ----------                                    # complete
 8: end for

2. Explain the relation between ∇_{W_ℓ} R̂ in Algorithm 3 and the gradients that are returned in line 7 of Algorithm 4.

2 PROGRAMMING EXERCISES

Throughout the programming tasks we use the library torch to implement forward and backward propagation through a three-layer FNN. We also use torchvision to access the MNIST dataset. In case you are a beginner, you may find the following description useful to get started with these packages.

USING TENSORS IN PYTORCH

To use any library in Python, you need to install it first. This can be done directly through the terminal (command line) using pip, which is an inline installer of Python packages. Below is an example of installing PyTorch:

    pip install torch torchvision torchaudio

Depending on your operating system, you can find the exact installation command at pytorch.org/get-started/locally. Once the packages are installed, you can import them using the command import:

    import torch

and you can access modules and functions by calling torch: for instance, in this assignment, we use the random generator. We can generate a 3 × 2 × 2 uniform random tensor by

    torch.rand(3, 2, 2)

This gives a random tensor, for example

    >> tensor([[[0.3428, 0.4368],
                [0.5732, 0.4344]],
               [[0.7477, 0.8229],
                [0.8687, 0.4596]],
               [[0.9962, 0.0207],
                [0.4515, 0.8986]]])

You can learn more at docs.python.org/3/tutorial/modules#packages.
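Beyond random generation, a few elementary tensor operations are enough for the tasks below: checking a tensor's shape, indexing its entries, and reshaping it (reshaping comes back in Section 2.2 when flattening MNIST images). The following is a minimal sketch using the same 3 × 2 × 2 tensor shape as above; the variable names are illustrative only.

    import torch

    t = torch.rand(3, 2, 2)        # a random tensor as generated above
    print(t.shape)                 # torch.Size([3, 2, 2])
    print(t[0])                    # the first 2 x 2 slice
    print(t[0, 1, 1])              # a single entry
    print(t.reshape(3, 4))         # the same 12 entries viewed as a 3 x 4 matrix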
PREFACE: SPECIFYING THE FNN AND LOSS

We are going to work with the three-layer FNN studied in the lecture. Wherever needed, we consider the following specifications: we have a three-layer fully-connected FNN whose input is an image from the MNIST dataset. This means that the input dimension is N = 28 × 28 = 784. Both hidden layers have 128 neurons, i.e., W_1 = W_2 = 128. All hidden neurons are activated by the ReLU function. In MNIST, we have C = 10 classes. We hence use a softmax-activated neuron at the output layer: it gets the 128 outputs of the second hidden layer and returns a 10-dimensional vector.

We further determine the loss via the cross-entropy function: assume the image belongs to class v ∈ {1, ..., 10} and let y_3 ∈ R^10 be the output of our FNN. The loss is then calculated as

    R̂ = -log y_3[v]

where y_3[v] is entry v of vector y_3.

2.1 FORWARD AND BACKWARD PROPAGATION FROM SCRATCH

In this assignment, we intend to implement the forward and backpropagation algorithms for the specified three-layer FNN.

TASK 1: IMPLEMENTING HIDDEN LAYERS

Now that we know the exact architecture of the FNN and the loss function, we intend to implement forward and backpropagation. To this end, we need to first define our model. We do this by first defining the hidden layers.

1. Start your code by writing a class called hidden(). This class gets the input and output dimensions of the hidden layer and performs the forward and backward passes. In this class, define the input and output dimensions as attributes. Also, initiate the matrix of weights via a matrix given as input. You can make this class by completing the following code:

    class hidden():
        def __init__(self, input_size, output_size, W):
            # define the attributes of this class
            self.input_size = # complete
            self.output_size = # complete
            # initiate the matrix of weights
            self.weights = W

Next, we implement the forward propagation from the input to the output of the hidden layer. This includes two passes, i.e., one affine transform and one activation. We can implement the affine transform by multiplying the extended input (input + dummy entry 1) by the weight matrix of the hidden layer. For activation, we need to pass every entry through ReLU, i.e., make it zero if negative and leave it unchanged otherwise.
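For reference, the two passes described above can be written with plain torch operations. The sketch below is illustrative only and sits outside the hidden() template; the names W, x, z and y, and the weight-matrix layout with the bias column multiplying the dummy entry at index 0, are assumptions consistent with the extended-input convention from the lecture.

    import torch

    input_size, output_size = 784, 128
    W = torch.rand(output_size, input_size + 1)    # weights and biases; column 0 multiplies the dummy 1

    x = torch.rand(input_size + 1)                 # extended input, dummy entry 1 assumed at index 0
    z = torch.matmul(W, x)                         # affine transform, shape (output_size,)
    y = torch.clamp(z, min=0)                      # ReLU: zero if negative, unchanged otherwise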
2. Add the function forward to the class hidden(). This function gets an input vector. This input is the output of the previous layer, to which a dummy entry 1 has been added at index 0. The function returns both the affine transform and the output of the hidden layer after ReLU activation. It also adds a dummy entry 1 at index 0 of its output. You can write this function by completing the following code.

    class hidden():
        def __init__(self, input_size, output_size, W):
            # we implemented in previous part
            pass

        def forward(self, x):
            '''
            x is input to the hidden layer
            x is of size input_size + 1
            '''
            self.z = # complete
            self.y = # complete
            # self.y should be of size output_size + 1
            return

We finally implement the backward propagation through the hidden layer. To understand what we are going to do, let's denote the input to the hidden layer by x, the affine transform by z and the output by y. We assume that we have the gradient of the loss with respect to the output, i.e., we already have ∇_y R̂. Note that this gradient does not contain the derivative with respect to the dummy entry. We should now calculate ∇_z R̂ and ∇_x R̂.

3. Add the function backward to the class hidden(). This function gets an input vector. This input is the gradient with respect to the layer's output, given by the next layer in the backward pass. This vector has output_size entries. The function returns both the gradient with respect to the affine transform, i.e., ∇_z R̂, and the gradient with respect to the input of the hidden layer, i.e., ∇_x R̂. It also drops the first dummy entry of ∇_x R̂. You can write this function by completing the following code.

    class hidden():
        def __init__(self, input_size, output_size, W):
            # we implemented in previous part
            pass

        def forward(self, x):
            # we implemented in previous part
            pass

        def backward(self, g_y):
            '''
            g_y is gradient w.r.t. output
            g_y is of size output_size
            '''
            self.g_z = # complete
            self.g_x = # complete
            # self.g_x should be of size input_size
            return

TASK 2: IMPLEMENTING OUTPUT LAYER

In this task, we implement the output layer, which is a softmax-activated vector neuron. Recall that for a K-dimensional input x ∈ R^K and C classes, this layer first calculates an affine function as

    z = W [1; x]

for some matrix W ∈ R^{C × (K+1)}, where [1; x] ∈ R^{K+1} denotes the input x with a dummy entry 1 prepended. It then passes z ∈ R^C through the softmax activation to determine the output y ∈ R^C. Entry i of y is given by

    y[i] = e^{z[i]} / Σ_{j=1}^{C} e^{z[j]}                    (2.1)

where z[i] is entry i of vector z.

1. Write a new class outLayer(). This class gets the input and output dimensions of the output layer, i.e., K and C in the above example, and performs the forward and backward passes. In this class, define the input and output dimensions as attributes. Also, initiate the matrix of weights via a matrix given as input. You can make this class by completing the following code:

    class outLayer():
        def __init__(self, input_size, output_size, W):
            # define the attributes of this class
            self.input_size = # complete
            self.output_size = # complete
            # initiate the matrix of weights
            self.weights = W

We implement the forward propagation from the input to the output of the output layer. This includes two passes, i.e., one affine transform and one softmax activation. We can implement the affine transform by multiplying the extended input (input + dummy entry 1) by the weight matrix. For activation, we need to pass the output of the affine transform through the softmax function defined in (2.1).

2. Add the function forward to the class outLayer(). This function gets an input vector. This input is the output of the last hidden layer, to which a dummy entry 1 has been added at index 0. The function returns both the affine transform and the output of the softmax activation. You can write this function by completing the following code.
    class outLayer():
        def __init__(self, input_size, output_size, W):
            # we implemented in previous part
            pass

        def forward(self, x):
            '''
            x is output of last hidden layer
            x is of size input_size + 1
            '''
            self.z = # complete
            self.y = # complete
            # size of self.y should be # of classes
            return

We next implement the backward propagation through the softmax-activated neuron. To understand what we are going to do, let's again denote the input to the output layer by x, the affine transform by z and the output of the softmax function by y. We should calculate ∇_y R̂ using the definition of the cross-entropy loss, and ∇_z R̂ and ∇_x R̂ via the backward pass.

3. Add the function loss to the class outLayer(). This function gets the true label as input and determines the cross-entropy loss, as well as its gradient with respect to the output layer's output. You can write this function by completing the following code.

    class outLayer():
        def __init__(self, input_size, output_size, W):
            # we implemented in previous part
            pass

        def forward(self, x):
            # we implemented in previous part
            pass

        def loss(self, v):
            '''
            v is the true label {0, 1, ..., 9}
            '''
            self.loss = # complete
            # self.loss is cross-entropy between self.y and v
            self.g_y = # complete
            # self.g_y is the gradient of loss w.r.t. output
            # self.g_y is of size output_size
            return

4. Add the function backward to the class outLayer(). This function returns both the gradient with respect to the affine transform, i.e., ∇_z R̂, and the gradient with
respect to the input to the output layer, i.e., ∇_x R̂. It also drops the first dummy entry of ∇_x R̂. You can write this function by completing the following code.

    class outLayer():
        def __init__(self, input_size, output_size, W):
            # we implemented in previous part
            pass

        def forward(self, x):
            # we implemented in previous part
            pass

        def loss(self, v):
            # we implemented in previous part
            pass

        def backward(self):
            self.g_z = # complete
            self.g_x = # complete
            # size of self.g_x should be input_size
            # no input needed as loss generated self.g_y
            return

TASK 3: COMPLETING FNN IMPLEMENTATION

Since we have all the layers implemented, we can now implement our specified three-layer FNN with its complete forward and backward pass.

1. Write a new class myFNN(). The attributes of this class are the widths of our FNN and its initially chosen weights, i.e., W_ℓ^{(0)} for ℓ = 1, 2, 3. Set these initial weights to be matrices whose entries are randomly chosen from the interval [-1, 1] (one way to generate such matrices is sketched right after the code template below). You can write this class by completing the following code:

    class myFNN():
        def __init__(self):
            # define the attributes of this class
            self.input_size = 784
            weights_1 = # complete
            self.hidden_size_1 = 128
            self.hidden1 = hidden(784, 128, weights_1)
            weights_2 = # complete
            self.hidden_size_2 = 128
            self.hidden2 = hidden(128, 128, weights_2)
            weights_3 = # complete
            self.num_classes = 10
            self.outLayer = outLayer(128, 10, weights_3)
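As an aside, uniform entries on [-1, 1) can be obtained by rescaling torch.rand, which samples from [0, 1). The sketch below is only one possible way to do this; the helper name random_weights and the shape (output_size, input_size + 1), which assumes a bias column for the dummy entry, are illustrative assumptions rather than part of the template.

    import torch

    def random_weights(output_size, input_size):
        # entries uniform on [-1, 1); one extra column assumed for the bias / dummy entry 1
        return 2 * torch.rand(output_size, input_size + 1) - 1

    W1 = random_weights(128, 784)   # e.g., a candidate weight matrix for the first hidden layer
    print(W1.shape, W1.min().item(), W1.max().item())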
2. Write the function forward for this class. This function gets a data-point x with its label v and implements the forward pass through the three-layer FNN. You can write this function by completing the following code:

    class myFNN():
        def __init__(self):
            # we implemented in previous part
            pass

        def forward(self, x, v):
            # add dummy 1 to x at index 0
            x = # complete
            # forward pass through hidden layer 1
            self.hidden1.forward(x)
            # forward pass through hidden layer 2
            self.hidden2.forward(self.hidden1.y)
            # forward pass through output layer
            # complete
            # compute loss
            # complete <loss>
            return

3. Write the function backward for this class that implements the backward pass through the FNN. You can write this function by completing the following code:

    class myFNN():
        def __init__(self):
            # we implemented in previous part
            pass

        def forward(self, x, v):
            # we implemented in previous part
            pass

        def backward(self):
            # backward pass through output layer
            self.outLayer.backward()
            # backward pass through hidden layer 2
            self.hidden2.backward(self.outLayer.g_x)
            # backward pass through hidden layer 1
            # complete
            # Now, compute gradients w.r.t. weights
            self.grad_1 = # complete  <gradient for layer 1>
            self.grad_2 = # complete  <gradient for layer 2>
            self.grad_3 = # complete  <gradient for output layer>
            return

The class myFNN() now implements the forward and backward pass. We can instantiate it to get the three-layer FNN with some random weights. We can then give an input
along with its true label to compute the forward pass, and then apply the backward pass to get the gradients. This can be readily done with just a few lines of code:

    # define the model
    model = myFNN()
    # for input data-point x and label v pass forward
    model.forward(x, v)
    # then we pass backward
    model.backward()
    # now we have the sample gradients
    print(model.grad_1)
    print(model.grad_2)
    print(model.grad_3)

We next need to read some data-points from MNIST.

2.2 MNIST DATASET

In this short assignment, we learn how to load MNIST data-points as a torch.Tensor. As mentioned in the lecture, you can read MNIST from the module torchvision.datasets: let's import this module, and also the module torchvision.transforms, which helps us convert MNIST data-points to torch.Tensor. While importing, we give them short names:

    import torchvision.datasets as DS
    import torchvision.transforms as transform

TASK 1: LOADING A DATA-POINT

We can now readily load MNIST as

    mnist = DS.MNIST('./data', train=True, transform=transform.ToTensor(), download=True)

In the above code, we indicate that the dataset is saved in the folder 'data' inside our current directory. We indicate that we load the training dataset. We apply the transform .ToTensor() to load the data-points as torch.Tensor, and finally we let the dataset be downloaded. The object mnist is a collection of 60,000 tuples: the first entry of each tuple is a torch.Tensor whose entries are the pixels of the image and the second entry is the label.

1. Use the command len() to check the length of the object mnist.
2. Read the first tuple in mnist and specify its pixel tensor and label.
3. Use the method .reshape() to reshape the pixel tensor into a form that can be given to the three-layer FNN implemented in the previous assignment.
4. Call the reshaped pixel tensor x and the label v and run the code below to check the myFNN() implementation.
    # define the model
    model = myFNN()
    # for input data-point x and label v pass forward
    model.forward(x, v)
    # then we pass backward
    model.backward()

TASK 2: MAKING MINI-BATCHES

PyTorch has modules that can be used to divide a dataset into mini-batches (a sketch of this built-in route is given at the end of this task for comparison). But we want to do this ourselves in this task. You can imagine how easy it is: we only need to make a loop.

1. Write the function myBatcher that gets batch_size as input and returns a list of mini-batches of size batch_size. You can write this function by completing the following code:

    def myBatcher(batch_size):
        # initiate with empty list of mini-batches
        batch_list = []
        # compute the number of mini-batches
        num_batches = # complete
        for j in range(num_batches):
            # initiate with tensors of all zeros
            batch_x = torch.zeros(batch_size, 784)
            batch_v = torch.zeros(batch_size)
            for i in range(batch_size):
                # read pixel batch entry
                batch_x[i] = # complete
                # read label batch entry
                batch_v[i] = # complete
            # put pixel and label batch in a tuple
            batch = (batch_x, batch_v)
            # append this mini-batch to the list
            batch_list.append(batch)
        return batch_list

2. Run the function myBatcher with batch_size = 100 and print the labels of the first and third mini-batches.
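For comparison only (not required for the task), the built-in route mentioned above would use torch.utils.data.DataLoader. A minimal sketch, assuming the mnist object defined in Task 1, is:

    from torch.utils.data import DataLoader

    # mini-batches of 100 images with their labels; each batch_x has shape (100, 1, 28, 28)
    loader = DataLoader(mnist, batch_size=100, shuffle=False)

    for batch_x, batch_v in loader:
        print(batch_v)   # labels of the current mini-batch
        break

The hand-written myBatcher above reproduces this behaviour for flattened 784-dimensional inputs and without shuffling.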