Assignment 4
Abhinay Reddy, 20009174

Discovering Knowledge in Data: An Introduction to Data Mining, Daniel T. Larose, John Wiley (2004), Chapter 7, page 146, #7, 8, and 10.

The example is the same as the one in the Lecture 8 slides. Note that the learning rate used here is η = 0.1, although there may be a typo in the textbook/lecture slides giving the learning rate as 0.01. The neural network is a layered, feed-forward, fully connected network of nodes; the network structure, the data inputs, and the initial weight values are given in Table 7.1 on page 132 of the text. The first record has inputs x_1 = 0.4, x_2 = 0.2, x_3 = 0.7 and target output 0.8.

1. Adjust the weights W_0B, W_1B, W_2B, and W_3B from the example of back-propagation in the text (p. 137).

Answer: The back-propagation algorithm starts from (initially random) weights and reduces the error by gradient descent, propagating the error back through the network and adjusting each weight by

    ΔW_ij = η δ_j x_ij

where η is the learning rate, x_ij is the i-th input to node j, and δ_j is the error responsibility of node j. The error responsibility is obtained from the partial derivative of the sigmoid function with respect to net_j:

    δ_j = output_j (1 - output_j)(actual_j - output_j)          for an output node,
    δ_j = output_j (1 - output_j) Σ_downstream W_jk δ_k          for a hidden node,

where Σ W_jk δ_k is the weighted sum of the error responsibilities of the nodes downstream of node j.
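The node outputs 0.7892, 0.8176, and 0.8750 used in the calculations below come from the textbook's forward pass. As a quick check, here is a minimal Python sketch of that forward pass, assuming the Table 7.1 inputs and initial weights quoted above (the variable names are mine, not the textbook's):

    import math

    def sigmoid(net):
        # logistic activation used in the textbook example
        return 1.0 / (1.0 + math.exp(-net))

    # first training record (x_0 = 1 is the constant input)
    x = [1.0, 0.4, 0.2, 0.7]
    actual = 0.8

    # initial weights from Table 7.1
    w_A = [0.5, 0.6, 0.8, 0.6]          # W_0A, W_1A, W_2A, W_3A
    w_B = [0.7, 0.9, 0.8, 0.4]          # W_0B, W_1B, W_2B, W_3B
    w_0Z, w_AZ, w_BZ = 0.5, 0.9, 0.9    # weights into output node Z

    out_A = sigmoid(sum(w * xi for w, xi in zip(w_A, x)))   # f(1.32)  ~ 0.7892
    out_B = sigmoid(sum(w * xi for w, xi in zip(w_B, x)))   # f(1.50)  ~ 0.8176
    out_Z = sigmoid(w_0Z + w_AZ * out_A + w_BZ * out_B)     # f(1.946) ~ 0.8750

    print(out_A, out_B, out_Z, actual - out_Z)   # prediction error ~ -0.075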
1) W_0Z,new = W_0Z,current + ΔW_0Z.

The error responsibility δ_Z for node Z, Z being an output node:

    δ_Z = output_Z (1 - output_Z)(actual_Z - output_Z) = 0.875(1 - 0.875)(0.8 - 0.875) = -0.0082

Adjusting the constant weight W_0Z using the back-propagation rule:

    ΔW_0Z = η δ_Z (1) = 0.1(-0.0082)(1) = -0.00082
    W_0Z,new = W_0Z,current + ΔW_0Z = 0.5 - 0.00082 = 0.49918

2) W_BZ,new = W_BZ,current + ΔW_BZ.

The error responsibility δ_B for node B. B is a hidden node and Z is its only downstream node:

    δ_B = output_B (1 - output_B) W_BZ δ_Z = 0.8176(1 - 0.8176)(0.9)(-0.0082) = -0.0011

Adjusting the weight W_BZ using the back-propagation rule (the input carried by this connection is output_B):

    ΔW_BZ = η δ_Z output_B = 0.1(-0.0082)(0.8176) = -0.00067
    W_BZ,new = W_BZ,current + ΔW_BZ = 0.9 - 0.00067 = 0.89933

We move upstream to the connections used as inputs to node B:

    ΔW_1B = η δ_B x_1 = 0.1(-0.0011)(0.4) = -0.000044;   W_1B,new = 0.9 - 0.000044 = 0.899956
    ΔW_2B = η δ_B x_2 = 0.1(-0.0011)(0.2) = -0.000022;   W_2B,new = 0.8 - 0.000022 = 0.799978
    ΔW_3B = η δ_B x_3 = 0.1(-0.0011)(0.7) = -0.000077;   W_3B,new = 0.4 - 0.000077 = 0.399923
    ΔW_0B = η δ_B x_0 = 0.1(-0.0011)(1) = -0.00011;     W_0B,new = 0.7 - 0.00011 = 0.69989
3) Similarly, for node A (Z is again the only downstream node):

    δ_A = output_A (1 - output_A) W_AZ δ_Z = 0.7892(1 - 0.7892)(0.9)(-0.0082) = -0.00123

    ΔW_AZ = η δ_Z output_A = 0.1(-0.0082)(0.7892) = -0.000647;   W_AZ,new = 0.9 - 0.000647 = 0.899353

We move upstream to the connections used as inputs to node A:

    ΔW_1A = η δ_A x_1 = 0.1(-0.00123)(0.4) = -0.0000492;   W_1A,new = 0.6 - 0.0000492 = 0.5999508
    ΔW_2A = η δ_A x_2 = 0.1(-0.00123)(0.2) = -0.0000246;   W_2A,new = 0.8 - 0.0000246 = 0.7999754
    ΔW_3A = η δ_A x_3 = 0.1(-0.00123)(0.7) = -0.0000861;   W_3A,new = 0.6 - 0.0000861 = 0.5999139
    ΔW_0A = η δ_A x_0 = 0.1(-0.00123)(1) = -0.000123;     W_0A,new = 0.5 - 0.000123 = 0.499877

A code verification of this single back-propagation step is sketched below.
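Here is a short sketch (my own, not from the text) that repeats the back-propagation step in code, starting from the rounded node outputs, so the hand-computed weight updates can be checked:

    eta = 0.1
    x = [1.0, 0.4, 0.2, 0.7]                      # x_0 = 1 plus the three inputs
    actual = 0.8
    out_A, out_B, out_Z = 0.7892, 0.8176, 0.875   # node outputs from the forward pass

    w_A = [0.5, 0.6, 0.8, 0.6]                    # current W_0A, W_1A, W_2A, W_3A
    w_B = [0.7, 0.9, 0.8, 0.4]                    # current W_0B, W_1B, W_2B, W_3B
    w_0Z, w_AZ, w_BZ = 0.5, 0.9, 0.9              # current weights into output node Z

    # error responsibilities
    delta_Z = out_Z * (1 - out_Z) * (actual - out_Z)   # ~ -0.0082
    delta_A = out_A * (1 - out_A) * w_AZ * delta_Z     # ~ -0.00123
    delta_B = out_B * (1 - out_B) * w_BZ * delta_Z     # ~ -0.0011

    # gradient-descent updates: Delta W = eta * delta * input to the connection
    w_0Z += eta * delta_Z * 1.0        # -> ~0.49918
    w_AZ += eta * delta_Z * out_A      # -> ~0.899353
    w_BZ += eta * delta_Z * out_B      # -> ~0.89933
    w_A = [w + eta * delta_A * xi for w, xi in zip(w_A, x)]   # -> ~[0.499877, 0.59995, 0.79998, 0.59991]
    w_B = [w + eta * delta_B * xi for w, xi in zip(w_B, x)]   # -> ~[0.69989, 0.899956, 0.799978, 0.399923]

The results agree with the hand calculation up to rounding of the δ terms.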
2. Refer to the previous problem. Show that the adjusted weights result in a smaller prediction error.

Answer: The nonlinear sigmoid function is y = 1/(1 + e^(-x)). With the original weights, the node outputs are:

    f(net_A) = 0.7892,   f(net_B) = 0.8176,   f(net_Z) = 0.8750

Repeating the forward pass with the adjusted weights:

    net_A' = Σ W_iA x_iA = W_0A(1) + W_1A x_1 + W_2A x_2 + W_3A x_3
           = 0.499877 + 0.5999508(0.4) + 0.7999754(0.2) + 0.5999139(0.7) = 1.31979
    f(net_A') = 1/(1 + e^(-1.31979)) ≈ 0.7891

    net_B' = Σ W_iB x_iB = W_0B(1) + W_1B x_1 + W_2B x_2 + W_3B x_3
           = 0.69989 + 0.899956(0.4) + 0.799978(0.2) + 0.399923(0.7) = 1.49981
    f(net_B') = 1/(1 + e^(-1.49981)) ≈ 0.8175

    net_Z' = W_0Z(1) + W_AZ f(net_A') + W_BZ f(net_B')
           = 0.49918 + 0.899353(0.7891) + 0.89933(0.8175)
           = 0.49918 + 0.70968 + 0.73520 = 1.94406
    f(net_Z') = 1/(1 + e^(-1.94406)) ≈ 0.8748

The target output is 0.8, so the prediction error falls from |0.8 - 0.8750| = 0.0750 with the original weights to |0.8 - 0.8748| ≈ 0.0748 with the adjusted weights. The error is smaller, as expected after one gradient-descent step with η = 0.1.

3. Describe the benefits and drawbacks of using large or small values for the learning rate.

Answer: The learning rate is a hyperparameter that controls how much the model weights are changed in response to the estimated error each time they are updated. Both large and small values have benefits and drawbacks.

For a high learning rate:
1. The model learns quickly, but may arrive at a sub-optimal final set of weights.
2. If the learning rate is too large, the weight updates are large, so the loss can fluctuate across training epochs and the training may even show divergent behavior.

For a low learning rate:
1. The model can settle into a near-optimal set of weights, but training takes significantly longer because each update changes the weights only slightly.
2. If the learning rate is too small, training may progress so slowly that it effectively never converges within a practical number of epochs, or it may get stuck in a sub-optimal solution.
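As a small illustration (my own sketch, not part of the assignment), gradient descent on a one-dimensional quadratic loss shows both failure modes: a very small learning rate crawls toward the minimum, while a too-large one overshoots and diverges.

    # Illustration: minimize loss(w) = (w - 2)**2, whose gradient is 2*(w - 2).
    def descend(eta, steps=20, w=0.0):
        for _ in range(steps):
            w -= eta * 2 * (w - 2)      # w_new = w - eta * gradient
        return w

    print(descend(0.01))   # tiny eta: after 20 steps w is still far from the minimum at 2
    print(descend(0.4))    # moderate eta: w is essentially at the minimum
    print(descend(1.1))    # too-large eta: the iterates oscillate and blow up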