Neuron Layers SC4001 – Tutorial 3
GD for Softmax layer

Given training set (X, K)
Set learning rate α
Initialize W and b
Iterate until convergence:
    U = XW + b
    f(u)_k = e^{u_k} / Σ_{k'} e^{u_{k'}}
    ∇_U J = -(K - f(U))
    W ← W - α X^T ∇_U J
    b ← b - α (∇_U J)^T 1
Labels for classes: class A → 1, class B → 2, class C → 3.

The data matrix, target vector and one-hot target matrix (the preview shows only part of the 18-pattern training set):

X = [  0  4          d = [ 1        K = [ 1 0 0
      -1  3                1              1 0 0
       2  3                1              1 0 0
      -2  2                2              0 1 0
       0  2                2              0 1 0
       1  2                1              1 0 0
      -1  2                2              0 1 0
       4 -1 ]              3 ]            0 0 1 ]
Softmax layer with inputs x1, x2 (plus a bias input +1) and output neurons u1, u2, u3:

f(u)_k = e^{u_k} / Σ_{k'} e^{u_{k'}},    f(u)_k = P(y = k | x),    y = argmax_k f(u)_k

Learning rate α = 0.05. Initialize weights and biases:

W = [ 0.88  0.08 -0.34        and    b = [ 0.0  0.0  0.0 ]
      0.68 -0.39 -0.19 ]
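This update rule can be reproduced with a short NumPy sketch. The patterns and one-hot targets below are the ones visible in the preview above (the full tutorial set has 18 patterns), the initial W, b and α are as given, and the array names and the fixed epoch count standing in for "until convergence" are illustrative assumptions.

import numpy as np

# training patterns and one-hot targets visible in the preview above
X = np.array([[ 0, 4], [-1, 3], [ 2, 3], [-2, 2],
              [ 0, 2], [ 1, 2], [-1, 2], [ 4, -1]], dtype=float)
K = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0],
              [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)

alpha = 0.05
W = np.array([[0.88,  0.08, -0.34],
              [0.68, -0.39, -0.19]])
b = np.zeros(3)

for epoch in range(2000):                         # fixed epoch count stands in for "until convergence"
    U = X @ W + b                                 # synaptic inputs, one row per pattern
    expU = np.exp(U)
    fU = expU / expU.sum(axis=1, keepdims=True)   # softmax outputs f(U)
    gradU = -(K - fU)                             # gradient of the cross-entropy cost w.r.t. U
    W -= alpha * X.T @ gradU                      # W <- W - alpha * X^T grad_U J
    b -= alpha * gradU.sum(axis=0)                # b <- b - alpha * (grad_U J)^T 1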
Epoch 1:

U = XW + b  (with X, W and b as given above; b is broadcast to every row)

  = [  2.72 -1.54 -0.75
       1.17 -1.23 -0.23
       3.8  -1.0  -1.23
      -0.39 -0.93  0.30
        ⋮
       2.82  0.71 -1.16 ]
Applying the softmax f(u)_k = e^{u_k} / Σ_{k'} e^{u_{k'}} to each row of U:

f(U) = [ e^{2.72}/(e^{2.72}+e^{-1.54}+e^{-0.75})   e^{-1.54}/(e^{2.72}+e^{-1.54}+e^{-0.75})   e^{-0.75}/(e^{2.72}+e^{-1.54}+e^{-0.75})
         e^{1.17}/(e^{1.17}+e^{-1.23}+e^{-0.23})   e^{-1.23}/(e^{1.17}+e^{-1.23}+e^{-0.23})   e^{-0.23}/(e^{1.17}+e^{-1.23}+e^{-0.23})
          ⋮
         e^{2.82}/(e^{2.82}+e^{0.71}+e^{-1.16})    e^{0.71}/(e^{2.82}+e^{0.71}+e^{-1.16})     e^{-1.16}/(e^{2.82}+e^{0.71}+e^{-1.16}) ]

     = [ 0.96  0.01  0.03
         0.75  0.07  0.18
          ⋮
         0.88  0.11  0.02 ]
y = argmax_k f(u)_k applied row by row:

y = argmax [ 0.96  0.01  0.03          [ 1
             0.75  0.07  0.18            1
             0.99  0.01  0.01            1
             0.28  0.16  0.56     =      3
              ⋮                          ⋮
             0.88  0.11  0.02 ]          1 ]

With targets d = (1, 1, 1, 2, ..., 3), summing over all 18 training patterns (not just the rows shown):

Classification error = Σ_p 1(y_p ≠ d_p) = 14

Entropy = -Σ_p log f(u_p)_{d_p} = -log 0.96 - log 0.75 - log 0.99 - log 0.16 - ⋯ - log 0.02 = 34.36
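These two quantities can be computed in a couple of lines from the softmax outputs. The sketch below continues from the NumPy sketch above (fU and K as defined there), so on the preview's subset of patterns it will not reproduce the values 14 and 34.36, which are computed over all 18 training patterns.

# epoch-1 metrics from the softmax outputs fU and one-hot targets K
d = np.argmax(K, axis=1) + 1                              # integer labels 1..3 from the one-hot targets
y_pred = np.argmax(fU, axis=1) + 1                        # predicted labels
class_err = np.sum(y_pred != d)                           # classification error = sum of 1(y != d)
entropy = -np.sum(np.log(fU[np.arange(len(d)), d - 1]))   # entropy cost = -sum of log f(u)_d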
∇_U J = -(K - f(U))

      = -( [ 1 0 0        [ 0.96 0.01 0.03 )       [ -0.04  0.01  0.03
             1 0 0    -     0.75 0.07 0.18    =      -0.25  0.07  0.18
              ⋮              ⋮                         ⋮
             0 0 1 ]        0.88 0.11 0.02 ] )        0.88  0.11 -0.98 ]

W ← W - α X^T ∇_U J
  = [ 0.88  0.08 -0.34     - 0.05 [ 0 -1 ⋯  4     ∇_U J    = [ 0.28 -0.54  0.89
      0.68 -0.39 -0.19 ]            4  3 ⋯ -1 ]                0.54 -0.12 -0.31 ]

b ← b - α (∇_U J)^T 1
  = [ 0.0  0.0  0.0 ] - 0.05 (∇_U J)^T [ 1 1 ⋯ 1 ]^T = [ -0.32  0.27  0.06 ]
At convergence:

W = [ -0.15 -3.41  4.18        and    b = [ -7.82  5.81  2.02 ]
       5.27 -1.02 -4.15 ]

Entropy = 0.58, classification error = 0.
The full training set contains 18 patterns:
(-1, 2), (0, 4), (-1, 3), (0, 2), (3, 0), (-2, -1), (4, 1), (1, 2), (2, -1),
(2, 3), (2, 1), (-2, 0), (-3, -1), (1, 0), (-1, 1), (4, -1), (-3, 1), (-2, 2)

At convergence, the softmax outputs f(U) and the targets d for the training patterns are:

f(U) = [ 1.0   0.0   0.0         d = [ 1
         0.88  0.12  0.0               1
         1.0   0.0   0.0               1
         0.0   1.0   0.0               2
         0.26  0.74  0.0               2
         0.89  0.1   0.0               1
         0.01  0.99  0.0               2
         0.0   1.0   0.0               2
         0.0   1.0   0.0               2
         0.0   0.0   1.0               3
         0.0   0.0   1.0               3
         0.0   1.0   0.0               2
         0.0   0.02  0.98              3
         0.0   0.0   1.0               3
         0.0   1.0   0.0               2
         0.0   1.0   0.0               2
         0.0   0.0   1.0               3
         0.0   0.0   1.0 ]             3 ]

The probability that each input pattern belongs to its target class (shown in red in the original slides) is the entry of f(U) selected by d in each row.
At convergence:

W = [ -0.15 -3.41  4.18        and    b = [ -7.82  5.81  2.02 ]
       5.27 -1.02 -4.15 ]

Synaptic inputs at the softmax layer for an input x = (x1, x2):

Neuron of class A: u_A = w_A^T x + b_A = -0.15 x1 + 5.27 x2 - 7.82
Neuron of class B: u_B = w_B^T x + b_B = -3.41 x1 - 1.02 x2 + 5.81
Neuron of class C: u_C = w_C^T x + b_C =  4.18 x1 - 4.15 x2 + 2.02

Decision boundaries:

Between class A and class B (u_A = u_B):
    -0.15 x1 + 5.27 x2 - 7.82 = -3.41 x1 - 1.02 x2 + 5.81
    3.25 x1 + 6.29 x2 - 13.63 = 0

Similarly, between class B and class C (u_B = u_C):
    -7.59 x1 + 3.13 x2 + 3.79 = 0

and between class A and class C (u_C = u_A):
    4.33 x1 - 9.42 x2 + 9.84 = 0
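These coefficients follow from differencing the columns of the converged W and the entries of b. A small sketch (array and function names are illustrative) reproduces them up to rounding of the two-decimal weights; the slide's 3.25 presumably comes from the unrounded weights.

import numpy as np

W = np.array([[-0.15, -3.41,  4.18],
              [ 5.27, -1.02, -4.15]])   # columns correspond to classes A, B, C
b = np.array([-7.82, 5.81, 2.02])

def boundary(i, j):
    # coefficients (a1, a2, c) of the line a1*x1 + a2*x2 + c = 0 where u_i = u_j
    return np.append(W[:, i] - W[:, j], b[i] - b[j])

print(boundary(0, 1))   # class A vs class B: approx [ 3.26   6.29 -13.63]
print(boundary(1, 2))   # class B vs class C: approx [-7.59   3.13   3.79]
print(boundary(2, 0))   # class C vs class A: approx [ 4.33  -9.42   9.84]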
[Figure: decision boundaries learnt by the softmax layer, showing the class A / class B, class A / class C, and class B / class C boundaries.]
Loading the Linnerud dataset with scikit-learn; X holds the exercise inputs and y the physiological targets:

from sklearn.datasets import load_linnerud

X, y = load_linnerud(return_X_y=True)
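The later slides use X_train, y_train, X_test and y_test, but the split itself is not visible in this preview. A minimal sketch of how it might be produced (the 0.25 test fraction and random_state are assumptions):

from sklearn.datasets import load_linnerud
from sklearn.model_selection import train_test_split

X, y = load_linnerud(return_X_y=True)
# hypothetical split; the ratio actually used in the tutorial is not shown in the preview
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)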
# preprocess input and output data
from sklearn import preprocessing

# standard (Gaussian) scaling for inputs
X_scaler = preprocessing.StandardScaler().fit(X_train)
X_scaled = X_scaler.transform(X_train)

print(X_scaler.mean_)   # [  9.06666667 161.6         77.8       ]
print(X_scaler.var_)    # [  28.46222222 3795.17333333 2861.62666667]

Normalizing the input variables so that x' = (x - μ) / σ is approximately N(0, 1) distributed.
Scaling the output variables so that y' lies in [0, 1]:

# linear scaling to [0, 1] for outputs
y_scaler = preprocessing.MinMaxScaler().fit(y_train)
y_scaled = y_scaler.transform(y_train)

print(y_scaler.scale_)  # [0.00917431 0.06666667 0.03571429]
print(y_scaler.min_)    # [-1.26605505 -2.06666667 -1.64285714]

y' = y / (y_max - y_min) - y_min / (y_max - y_min) = (y - y_min) / (y_max - y_min)
GD for a perceptron layer with direct gradients

A perceptron layer with inputs x1, x2, x3 (plus a bias input +1) and outputs y1, y2, y3.

Given a training dataset (X, D)
Set learning parameter α
Initialize W and b
Repeat until convergence:
    U = XW + b
    Y = f(U)
    ∇_U J = -(D - Y) ⊙ f'(U)
    W ← W - α X^T ∇_U J
    b ← b - α (∇_U J)^T 1
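A NumPy sketch of this direct-gradient update for a sigmoid layer trained with the MSE cost, where f'(U) = f(U)(1 - f(U)); the function name, random initialisation and epoch count are illustrative assumptions.

import numpy as np

def train_perceptron_layer(X, D, alpha=0.001, epochs=10000):
    # gradient descent on a sigmoid layer Y = f(XW + b) with cost J = 0.5 * ||D - Y||^2
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(X.shape[1], D.shape[1]))
    b = np.zeros(D.shape[1])
    for _ in range(epochs):
        U = X @ W + b
        Y = 1.0 / (1.0 + np.exp(-U))        # sigmoid activation f(U)
        gradU = -(D - Y) * Y * (1.0 - Y)    # grad_U J = -(D - Y) * f'(U)
        W -= alpha * X.T @ gradU            # W <- W - alpha * X^T grad_U J
        b -= alpha * gradU.sum(axis=0)      # b <- b - alpha * (grad_U J)^T 1
    return W, b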
Implementing a perceptron layer using PyTorch libraries:

from torch import nn   # the torch nn API provides methods for building custom neural networks

nn.Module     – base class for all neural network modules; your models should also subclass this class.
nn.Linear     – applies a linear transformation u = W^T x + b.
nn.Sigmoid    – applies the sigmoid activation; together with nn.Linear it implements a perceptron layer.
nn.Sequential – a sequential container; modules are added to it in the order they are passed in the
                constructor. The value a Sequential provides over manually calling a sequence of modules
                is that it allows treating the whole container as a single module.

Implementing the MSE loss and the SGD optimizer:

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
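For example, the perceptron layer defined as a class on the next slide can equally be built as a single Sequential container and called as one module:

import torch
from torch import nn

# a linear map followed by a sigmoid, equivalent to the PerceptronLayer class below
model = nn.Sequential(nn.Linear(3, 3), nn.Sigmoid())
print(model(torch.zeros(1, 3)))   # one forward pass through the whole container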
GD for a perceptron layer with the PyTorch nn.Module class:

# Perceptron layer class
class PerceptronLayer(nn.Module):
    def __init__(self, no_inputs, no_outputs):
        super().__init__()
        self.perceptron_layer = nn.Sequential(
            nn.Linear(no_inputs, no_outputs),
            nn.Sigmoid()
        )

    def forward(self, x):
        logits = self.perceptron_layer(x)
        return logits

# create an instance of the layer
no_inputs, no_outputs = 3, 3
model = PerceptronLayer(no_inputs, no_outputs)
GD for a perceptron layer with PyTorch autograd:

no_epochs, lr = 50000, 0.001
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

for epoch in range(no_epochs):
    # compute prediction and loss
    pred = model(torch.tensor(X_scaled, dtype=torch.float))
    loss = loss_fn(pred, torch.tensor(y_scaled, dtype=torch.float))
    # backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Direct gradients: time for one weight update = 0.074 ms; number of epochs = 10,000
Autograd:         time for one weight update = 0.114 ms; number of epochs = 50,000
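The per-update times above can be obtained by timing the training loop; a rough sketch of one way to measure it for the autograd version (the timing method actually used in the tutorial is not shown), reusing model, loss_fn, optimizer, no_epochs and the scaled data from the previous slide:

import time
import torch

start = time.perf_counter()
for epoch in range(no_epochs):
    pred = model(torch.tensor(X_scaled, dtype=torch.float))
    loss = loss_fn(pred, torch.tensor(y_scaled, dtype=torch.float))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
elapsed = time.perf_counter() - start
print('time per weight update: %.3f ms' % (1000 * elapsed / no_epochs))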
# scaling the testing inputs
X_scaled = X_scaler.transform(X_test)
y_pred = model(torch.tensor(X_scaled, dtype=torch.float))

# undoing the scaling of the predicted outputs
y_scaled = y_scaler.inverse_transform(y_pred.detach().numpy())

# computing the mean squared error
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_scaled, y_test, squared=True, multioutput='raw_values')

# computing R^2
from sklearn.metrics import r2_score
r2 = r2_score(y_scaled, y_test, multioutput='raw_values')
Direct gradients: MSE = [407.06558087, 1.21179368, 22.40310961]
                  R²  = [-15.16134924, 0.53783585, -11.22642492]
Autograd:         MSE = [356.95227178, 1.61016719, 21.11763412]
                  R²  = [-3.18467906, 0.06741378, -11.90990362]

Exercise can best predict the waist: the second target (Waist) is the only one with a positive R².
R²: coefficient of determination

Let ŷ_i be the predicted value of data point y_i and n be the number of data points.

The residual sum of squares:  SS_res = Σ_{i=1}^{n} (ŷ_i - y_i)²

With the mean of the data ȳ = (1/n) Σ_{i=1}^{n} y_i, the total sum of squares (proportional to the variance):  SS_tot = Σ_{i=1}^{n} (ȳ - y_i)²

R² = 1 - SS_res / SS_tot

In the best case the modelled values exactly match the observed values, so SS_res = 0 and R² = 1. A baseline model that always predicts ȳ has R² = 0. Models whose predictions are worse than this baseline have negative R².
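A quick numerical check of the definition against scikit-learn (the values below are an arbitrary illustration):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.4, 8.7])

ss_res = np.sum((y_pred - y_true) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true.mean() - y_true) ** 2)   # total sum of squares
print(1 - ss_res / ss_tot)                       # 0.985
print(r2_score(y_true, y_pred))                  # same value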
Softmax classification of the Iris dataset: a softmax layer with inputs x1, x2, x3, x4 (plus a bias input +1) and three output neurons (1, 2, 3).

Inputs (four features): sepal length, sepal width, petal length, petal width.
Outputs (three labels): Setosa, Versicolour, Virginica.
90 data points for training and 60 data points for testing.
# datasets import
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
iris.data -= np.mean(iris.data, axis=0)   # mean-correct the data

# dataset split for train and test
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    test_size=0.2, random_state=2)
from torch import nn

class SoftmaxLayer(nn.Module):
    def __init__(self, no_inputs, no_outputs):
        super().__init__()
        self.softmax_layer = nn.Sequential(
            nn.Linear(no_inputs, no_outputs),   # applies a linear transformation u = W^T x + b
            nn.Softmax(dim=1)                   # implements softmax; each row sums up to 1.0
        )

    def forward(self, x):
        logits = self.softmax_layer(x)
        return logits
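The preview does not show how the Iris model, loss and optimizer are created before the mini-batch loops below; a minimal sketch under the assumption that a cross-entropy loss and plain SGD (with the learning factor 0.1 stated on the next slide) are used:

import torch

no_inputs, no_outputs = 4, 3                 # four Iris features, three classes
model = SoftmaxLayer(no_inputs, no_outputs)

# assumption: cross-entropy loss on integer class labels; nn.CrossEntropyLoss applies
# log-softmax internally, so it is normally fed raw logits rather than Softmax outputs
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)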
Mini-batch gradient descent

Batch size = 16, learning factor = 0.1.

The torch Dataset and DataLoader utilities provide convenient functions for mini-batch learning:

from torch.utils.data import Dataset
from torch.utils.data import DataLoader

We will create a subclass of the Dataset class and use DataLoaders to implement mini-batch gradient descent.
# create a Dataset class
# a custom Dataset class must implement three functions: __init__, __len__, and __getitem__
class MyDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float)
        self.y = torch.tensor(y)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# create Dataset objects for the train and test data
train_data = MyDataset(x_train, y_train)
test_data = MyDataset(x_test, y_test)

# create DataLoader objects for the train and test data
batch_size = 16
train_dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=True)
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    train_loss, correct = 0, 0

    for batch, (X, y) in enumerate(dataloader):
        # compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        optimizer.zero_grad()   # reset accumulated gradients
        loss.backward()         # compute gradients
        optimizer.step()        # execute one step of SGD

        train_loss += loss.item()
        correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    train_loss /= size
    correct /= size
    return train_loss, correct
def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(dim=1) == y).type(torch.float).sum().item()

    test_loss /= size
    correct /= size
    return test_loss, correct
train_loss_, train_acc_, test_loss_, test_acc_ = [], [], [], []

for epoch in range(no_epochs):
    train_loss, train_acc = train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loss, test_acc = test_loop(test_dataloader, model, loss_fn)
    train_loss_.append(train_loss)
    train_acc_.append(train_acc)
    test_loss_.append(test_loss)
    test_acc_.append(test_acc)
[Figure: learning curves at batch size = 16.]
At convergence (1000 epochs):

weight = [[-0.875575    1.9825934  -4.016873   -1.6209128 ]
          [ 0.70730793 -1.0039392  -0.2982792  -2.5409048 ]
          [-0.3355393  -0.79086286  3.4620962   3.9241226 ]]
bias   = [-0.6561739  3.990793  -2.9169736]

train_loss = 0.039542, train_acc = 0.966667
test_loss  = 0.040103, test_acc  = 0.950000
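Values like these can be read off the trained model's linear layer; a small sketch using the SoftmaxLayer class defined above (the nn.Linear module sits at index 0 of its Sequential container):

linear = model.softmax_layer[0]
print('weight =', linear.weight.detach().numpy())
print('bias   =', linear.bias.detach().numpy())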
[Figure: accuracies and time to update the weights against batch size, with the elbow point marked.]