Assignment 4
Abhinay Reddy, 20009174

Discovering Knowledge in Data: An Introduction to Data Mining, Daniel T. Larose, John Wiley (2004), Chapter 7, page 146, #7, 8, and 10.

The example is the same as the one in the Lecture 8 slides. Note that the learning rate used here is η = 0.1, although there may be a typo in the textbook/lecture slides giving the learning rate as 0.01. The neural network is a layered, feed-forward, fully connected network of nodes; the network structure, the data inputs, and the initial weight values are given in Table 7.1 on page 132 of the text. The first record has inputs x_1 = 0.4, x_2 = 0.2, x_3 = 0.7 and target output 0.8.

1. Adjust the weights W_0B, W_1B, W_2B, and W_3B from the example of back-propagation in the text (p. 137).

Answer: The back-propagation algorithm starts from (initially random) weights and reduces the error by gradient descent, propagating the error back through the network and adjusting each weight by

    ΔW_ij = η δ_j x_ij

where η is the learning rate, x_ij is the i-th input to node j, and δ_j is the error responsibility of node j. The error responsibility is obtained from the partial derivative of the sigmoid function with respect to net_j:

    δ_j = output_j (1 - output_j)(actual_j - output_j)          for an output node,
    δ_j = output_j (1 - output_j) Σ_downstream W_jk δ_k          for a hidden node,

where Σ W_jk δ_k is the weighted sum of the error responsibilities of the nodes downstream of node j.
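The node outputs 0.7892, 0.8176, and 0.8750 used in the calculations below come from the textbook's forward pass. As a quick check, here is a minimal Python sketch of that forward pass, assuming the Table 7.1 inputs and initial weights quoted above (the variable names are mine, not the textbook's):

    import math

    def sigmoid(net):
        # logistic activation used in the textbook example
        return 1.0 / (1.0 + math.exp(-net))

    # first training record (x_0 = 1 is the constant input)
    x = [1.0, 0.4, 0.2, 0.7]
    actual = 0.8

    # initial weights from Table 7.1
    w_A = [0.5, 0.6, 0.8, 0.6]          # W_0A, W_1A, W_2A, W_3A
    w_B = [0.7, 0.9, 0.8, 0.4]          # W_0B, W_1B, W_2B, W_3B
    w_0Z, w_AZ, w_BZ = 0.5, 0.9, 0.9    # weights into output node Z

    out_A = sigmoid(sum(w * xi for w, xi in zip(w_A, x)))   # f(1.32)  ~ 0.7892
    out_B = sigmoid(sum(w * xi for w, xi in zip(w_B, x)))   # f(1.50)  ~ 0.8176
    out_Z = sigmoid(w_0Z + w_AZ * out_A + w_BZ * out_B)     # f(1.946) ~ 0.8750

    print(out_A, out_B, out_Z, actual - out_Z)   # prediction error ~ -0.075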
1) W_0Z,new = W_0Z,current + ΔW_0Z.

The error responsibility δ_Z for node Z, Z being an output node:

    δ_Z = output_Z (1 - output_Z)(actual_Z - output_Z) = 0.875(1 - 0.875)(0.8 - 0.875) = -0.0082

Adjusting the constant weight W_0Z using the back-propagation rule:

    ΔW_0Z = η δ_Z (1) = 0.1(-0.0082)(1) = -0.00082
    W_0Z,new = W_0Z,current + ΔW_0Z = 0.5 - 0.00082 = 0.49918

2) W_BZ,new = W_BZ,current + ΔW_BZ.

The error responsibility δ_B for node B. B is a hidden node and Z is its only downstream node:

    δ_B = output_B (1 - output_B) W_BZ δ_Z = 0.8176(1 - 0.8176)(0.9)(-0.0082) = -0.0011

Adjusting the weight W_BZ using the back-propagation rule (the input carried by this connection is output_B):

    ΔW_BZ = η δ_Z output_B = 0.1(-0.0082)(0.8176) = -0.00067
    W_BZ,new = W_BZ,current + ΔW_BZ = 0.9 - 0.00067 = 0.89933

We move upstream to the connections used as inputs to node B:

    ΔW_1B = η δ_B x_1 = 0.1(-0.0011)(0.4) = -0.000044;   W_1B,new = 0.9 - 0.000044 = 0.899956
    ΔW_2B = η δ_B x_2 = 0.1(-0.0011)(0.2) = -0.000022;   W_2B,new = 0.8 - 0.000022 = 0.799978
    ΔW_3B = η δ_B x_3 = 0.1(-0.0011)(0.7) = -0.000077;   W_3B,new = 0.4 - 0.000077 = 0.399923
    ΔW_0B = η δ_B x_0 = 0.1(-0.0011)(1) = -0.00011;     W_0B,new = 0.7 - 0.00011 = 0.69989
3) Similarly, for node A (Z is again the only downstream node):

    δ_A = output_A (1 - output_A) W_AZ δ_Z = 0.7892(1 - 0.7892)(0.9)(-0.0082) = -0.00123

    ΔW_AZ = η δ_Z output_A = 0.1(-0.0082)(0.7892) = -0.000647;   W_AZ,new = 0.9 - 0.000647 = 0.899353

We move upstream to the connections used as inputs to node A:

    ΔW_1A = η δ_A x_1 = 0.1(-0.00123)(0.4) = -0.0000492;   W_1A,new = 0.6 - 0.0000492 = 0.5999508
    ΔW_2A = η δ_A x_2 = 0.1(-0.00123)(0.2) = -0.0000246;   W_2A,new = 0.8 - 0.0000246 = 0.7999754
    ΔW_3A = η δ_A x_3 = 0.1(-0.00123)(0.7) = -0.0000861;   W_3A,new = 0.6 - 0.0000861 = 0.5999139
    ΔW_0A = η δ_A x_0 = 0.1(-0.00123)(1) = -0.000123;     W_0A,new = 0.5 - 0.000123 = 0.499877

A code verification of this single back-propagation step is sketched below.
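Here is a short sketch (my own, not from the text) that repeats the back-propagation step in code, starting from the rounded node outputs, so the hand-computed weight updates can be checked:

    eta = 0.1
    x = [1.0, 0.4, 0.2, 0.7]                      # x_0 = 1 plus the three inputs
    actual = 0.8
    out_A, out_B, out_Z = 0.7892, 0.8176, 0.875   # node outputs from the forward pass

    w_A = [0.5, 0.6, 0.8, 0.6]                    # current W_0A, W_1A, W_2A, W_3A
    w_B = [0.7, 0.9, 0.8, 0.4]                    # current W_0B, W_1B, W_2B, W_3B
    w_0Z, w_AZ, w_BZ = 0.5, 0.9, 0.9              # current weights into output node Z

    # error responsibilities
    delta_Z = out_Z * (1 - out_Z) * (actual - out_Z)   # ~ -0.0082
    delta_A = out_A * (1 - out_A) * w_AZ * delta_Z     # ~ -0.00123
    delta_B = out_B * (1 - out_B) * w_BZ * delta_Z     # ~ -0.0011

    # gradient-descent updates: Delta W = eta * delta * input to the connection
    w_0Z += eta * delta_Z * 1.0        # -> ~0.49918
    w_AZ += eta * delta_Z * out_A      # -> ~0.899353
    w_BZ += eta * delta_Z * out_B      # -> ~0.89933
    w_A = [w + eta * delta_A * xi for w, xi in zip(w_A, x)]   # -> ~[0.499877, 0.59995, 0.79998, 0.59991]
    w_B = [w + eta * delta_B * xi for w, xi in zip(w_B, x)]   # -> ~[0.69989, 0.899956, 0.799978, 0.399923]

The results agree with the hand calculation up to rounding of the δ terms.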
2. Refer to the previous problem. Show that the adjusted weights result in a smaller prediction error.

Answer: The nonlinear sigmoid function is y = 1/(1 + e^(-x)). With the original weights, the node outputs are:

    f(net_A) = 0.7892,   f(net_B) = 0.8176,   f(net_Z) = 0.8750

Repeating the forward pass with the adjusted weights:

    net_A' = Σ W_iA x_iA = W_0A(1) + W_1A x_1 + W_2A x_2 + W_3A x_3
           = 0.499877 + 0.5999508(0.4) + 0.7999754(0.2) + 0.5999139(0.7) = 1.31979
    f(net_A') = 1/(1 + e^(-1.31979)) ≈ 0.7891

    net_B' = Σ W_iB x_iB = W_0B(1) + W_1B x_1 + W_2B x_2 + W_3B x_3
           = 0.69989 + 0.899956(0.4) + 0.799978(0.2) + 0.399923(0.7) = 1.49981
    f(net_B') = 1/(1 + e^(-1.49981)) ≈ 0.8175

    net_Z' = W_0Z(1) + W_AZ f(net_A') + W_BZ f(net_B')
           = 0.49918 + 0.899353(0.7891) + 0.89933(0.8175)
           = 0.49918 + 0.70968 + 0.73520 = 1.94406
    f(net_Z') = 1/(1 + e^(-1.94406)) ≈ 0.8748

The target output is 0.8, so the prediction error falls from |0.8 - 0.8750| = 0.0750 with the original weights to |0.8 - 0.8748| ≈ 0.0748 with the adjusted weights. The error is smaller, as expected after one gradient-descent step with η = 0.1.

3. Describe the benefits and drawbacks of using large or small values for the learning rate.

Answer: The learning rate is a hyperparameter that controls how much the model weights are changed in response to the estimated error each time they are updated. Both large and small values have benefits and drawbacks.

For a high learning rate:
1. The model learns quickly, but may arrive at a sub-optimal final set of weights.
2. If the learning rate is too large, the weight updates are large, so the loss can fluctuate across training epochs and the training may even show divergent behavior.

For a low learning rate:
1. The model can settle into a near-optimal set of weights, but training takes significantly longer because each update changes the weights only slightly.
2. If the learning rate is too small, training may progress so slowly that it effectively never converges within a practical number of epochs, or it may get stuck in a sub-optimal solution.
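As a small illustration (my own sketch, not part of the assignment), gradient descent on a one-dimensional quadratic loss shows both failure modes: a very small learning rate crawls toward the minimum, while a too-large one overshoots and diverges.

    # Illustration: minimize loss(w) = (w - 2)**2, whose gradient is 2*(w - 2).
    def descend(eta, steps=20, w=0.0):
        for _ in range(steps):
            w -= eta * 2 * (w - 2)      # w_new = w - eta * gradient
        return w

    print(descend(0.01))   # tiny eta: after 20 steps w is still far from the minimum at 2
    print(descend(0.4))    # moderate eta: w is essentially at the minimum
    print(descend(1.1))    # too-large eta: the iterates oscillate and blow up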