Neural Network Week 2:
2.1 – SGD with Momentum
Q No:
1
Correct Answer
Marks: 1/1
A convex function cannot have more than one dimension?
True
False
You Selected
A convex function can be defined in 2, 3, or more than 3 dimensions.
Q No:
2
Correct Answer
Marks: 1/1
Does momentum accelerate the gradient descent algorithm?
True
You Selected
False
It accelerates the gradient descent algorithm by considering the exponentially weighted
average of the gradients.
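As an illustration, here is a minimal NumPy sketch of the standard momentum update (the function and variable names are illustrative assumptions, not from the course material):
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    # The velocity is an exponentially weighted average of past gradients,
    # which smooths the updates and accelerates descent along consistent directions.
    velocity = beta * velocity + (1 - beta) * grad
    w = w - lr * velocity
    return w, velocity

# Example usage on a single parameter whose loss is w**2 (gradient 2*w)
w, v = 5.0, 0.0
for _ in range(3):
    w, v = sgd_momentum_step(w, grad=2 * w, velocity=v)
print(w, v)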
Q No:
3
Correct Answer
Marks: 1/1
Which of the following is true about a convex function?
It has multiple local minimums.
Local minimum and global minimum are different.
Local minimum and global minimum are the same.
You Selected
It has multiple global minimums.
2.2 – Other variants of Gradient Descent
Q No:
1
Correct Answer
Marks: 1/1
Which of the following are variants of gradient descent optimizers?
Adagrad
RMSprop
Adam
All the above
You Selected
SGD with momentum, Adagrad, RMSprop, and Adam are the different variants of optimizers.
Q No:
2
Correct Answer
Marks: 1/1
Does the learning rate in Adagrad decrease dynamically as the algorithm proceeds?
True
You Selected
False
The learning rate in Adagrad decreases at every iteration during the training process, and the learning rate is dimension-specific (each parameter gets its own effective learning rate).
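As an illustration, here is a minimal NumPy sketch of the standard Adagrad update (names and values are illustrative assumptions):
import numpy as np

def adagrad_step(w, grad, grad_sq_sum, lr=0.1, eps=1e-8):
    # The squared gradients accumulate per dimension, so the effective
    # learning rate of each dimension shrinks as training proceeds.
    grad_sq_sum = grad_sq_sum + grad ** 2
    w = w - lr * grad / (np.sqrt(grad_sq_sum) + eps)
    return w, grad_sq_sum

# Example: two parameters with very different gradient magnitudes
w = np.array([1.0, 1.0])
acc = np.zeros(2)
for _ in range(5):
    grad = np.array([10.0, 0.1])   # steep dimension vs. shallow dimension
    w, acc = adagrad_step(w, grad, acc)
print(w)   # the steep dimension's steps shrink faster than the shallow one's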
2.3 – Weight initialization and its techniques
Q No:
1
Correct Answer
Marks: 1/1
Which of the following weight initialization techniques is used for the ReLU activation?
Xavier initialization
HE initialization
You Selected
Both A and B
None of these
2.4 – Regularization
Q No:
1
Correct Answer
Marks: 1/1
Which of the following is another name for L2 Regularization?
Lasso
Ridge
You Selected
Both A and B
L2-Regularization is also known as Ridge and L1 regularization as Lasso.
Q No:
2
Correct Answer
Marks: 1/1
Is the Data Augmentation technique used on images?
True
You Selected
False
Data Augmentation is a regularization technique used on images that increases the amount of data by adding slightly modified copies of already existing data.
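As an illustration, here is a minimal Keras sketch of image augmentation (assuming a recent TensorFlow/Keras version; the augmentation parameters and dummy batch are illustrative assumptions):
import tensorflow as tf
from tensorflow import keras

# Augmentation pipeline: each training image gets a slightly modified copy
data_augmentation = keras.Sequential([
    keras.layers.RandomFlip('horizontal'),   # mirror images left-right
    keras.layers.RandomRotation(0.1),        # rotate by up to 10% of a full turn
    keras.layers.RandomZoom(0.1),            # zoom in/out by up to 10%
])

# Example usage on a dummy batch of 8 RGB images
images = tf.random.uniform((8, 64, 64, 3))
augmented = data_augmentation(images, training=True)
print(augmented.shape)   # (8, 64, 64, 3)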
2.5 – Dropout
Q No:
1
Correct Answer
Marks: 1/1
Which of the following is true about the Dropout technique?
I. Dropout is a regularization technique that reduces overfitting.
II. Dropout randomly drops the neurons according to the dropout ratio in the network during the training period.
III. Dropout is the same as the fully connected layer.
Only statement I is correct.
Both I and III statements are correct.
All three statements are correct.
Both I and II statements are correct.
You Selected
Dropout is a regularization technique used to prevent the model from overfitting. Dropout randomly drops neurons in the hidden layers according to the given dropout ratio (the default ratio is 0.5).
Q No:
2
Correct Answer
Marks: 1/1
Can the Dropout technique be implemented in the Output layer?
True
False
You Selected
No, dropout cannot be applied to the output layer, since the output layer produces the probabilities (for classification) or the numerical values (for regression), and dropout would randomly drop neurons from that layer.
2.6 – Batch Normalization
Q No:
1
Correct Answer
Marks: 1/1
Which of the following is true about Batch Normalization?
It normalizes all the input before sending it to the next layer.
You Selected
It is one of the Backpropagation techniques used in Neural Networks
It gives the mean and standard deviation of the weights
Batch Normalization is the same as the Dropout technique.
Batch Normalization is a regularization technique applied either before or after the activation function in a neuron; it normalizes all the inputs before sending them to the next layer.
Q No:
2
Correct Answer
Marks: 1/1
Can the Batch Normalization technique be implemented in the Output layer?
True
False
You Selected
No, Batch Normalization can't be applied to the output layer since it would normalize the values, and the output layer must produce probabilities or numerical values for classification and regression problems respectively.
2.7 – Types of Neural Networks
Which of the following are the different types of Neural Networks?
RNN
CNN
Feed Forward Neural Network
All the above
You Selected
Q No:
2
Correct Answer
Marks: 1/1
For an image recognition problem, which of the following architectures of the neural network is best suited to solve the problem?
Perceptron
Convolutional Neural Network
You Selected
Multi-Layer Perceptron
LSTM
The convolutional neural network architecture is best suited for solving image recognition problems.
Practice Quiz
Attempt #1
Feb 07 at 9:45 PM
Marks: 8
Q No:
1
Correct Answer
Marks: 1/1
Import the dataset and answer the following question.
The distribution of the 'Age' attribute looks slightly right-skewed.
True
You Selected
False
# Importing the required libraries
import pandas as pd
import seaborn as sns

data = pd.read_csv('new_preprocessed_data.csv')

# Plotting the distribution plot of the Age variable
sns.displot(data=data, x='age', kde=True)

# Finding the skewness of the Age variable
data.age.skew()
Q No:
2
Correct Answer
Marks: 2/2
Build a Neural Network Model on the dataset by following the below steps:
Store the Independent and Dependent features in X and y
Use train_test split to split the data (80% for training and 20% for testing)
Convert the target feature into a NumPy array using Keras to_categorical function
Use the below-mentioned parameters:
1. The number of neurons in the first and second layers is 64 and 32 respectively.
2. Use Dropout of ratio 0.2 after the second layer.
3. Use ReLU as the activation function in the input and hidden layers and Adam as the optimizer with a learning rate of 1e-3.
4. Build the model on 20 epochs with validation_split=0.2.
What is the accuracy of the model on the training data?
Note:
- Do not use stratify sampling and Callbacks.
- The given dataset is scaled, so please don't scale the data again.
>30 and <50
>51 and <70
You Selected
>70 and <85
>90
# Required imports
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import optimizers, losses
from tensorflow.keras.utils import to_categorical

# Splitting independent and dependent variables
X = data.drop('loan_status', axis=1)
Y = data[['loan_status']]

# Train-test split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

# Target encoding
y_train = to_categorical(y_train, 3)
y_test_cat = to_categorical(y_test, 3)

# Building the model
model = keras.Sequential()
# Input layer
model.add(Dense(64, activation='relu', input_shape=(11,)))
# Second layer
model.add(Dense(32, activation='relu'))
# Adding dropout after the second layer
model.add(Dropout(0.2))
# Output layer
model.add(Dense(3, activation='softmax'))

# Defining the optimizer
adam = optimizers.Adam(learning_rate=1e-3)

# Compiling the model
model.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])

# Fitting the model
history = model.fit(X_train, y_train, epochs=20, validation_split=0.2, verbose=2)
Q No:
3
Correct Answer
Marks: 2/2
Build a model on the train data using the below hyperparameters and find the f1-score of the test data for the 0th class.
The number of neurons in the first, second, third, and fourth layers should be 256, 124, 64, and 32 respectively.
Use BatchNormalization after the second layer.
Use ReLU as the activation function in the input and hidden layers and RMSprop as the optimizer with 1e-3 as the learning rate.
Build the model on 50 epochs with validation_split=0.2 and batch_size=128.
Note
Do not use stratify sampling and Callbacks.
The given dataset is scaled, so please don't scale the data again.
0.51 - 0.55
0.71 - 0.80
0.60 - 0.70
You Selected
0.35 - 0.50
# Additional import for Batch Normalization
from tensorflow.keras.layers import BatchNormalization

# Defining the model
model_1 = keras.Sequential()
# Adding the first (input) layer
model_1.add(Dense(256, activation='relu', input_shape=(11,)))
# Adding the second layer
model_1.add(Dense(124, activation='relu'))
# Adding Batch Normalization after the second layer
model_1.add(BatchNormalization())
# Adding the third layer
model_1.add(Dense(64, activation='relu'))
# Adding the fourth layer
model_1.add(Dense(32, activation='relu'))
# Adding the output layer
model_1.add(Dense(3, activation='softmax'))

# Defining the optimizer
rmsprop = optimizers.RMSprop(learning_rate=1e-3)

# Compiling the model
model_1.compile(loss=losses.categorical_crossentropy, optimizer=rmsprop, metrics=['accuracy'])

# Fitting the model
history_1 = model_1.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=128, verbose=2)

# Predicting on the test data
y_pred = model_1.predict(X_test)

# Applying argmax to get the predicted class labels
y_pred_final = []
for i in y_pred:
    y_pred_final.append(np.argmax(i))

# Printing the classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred_final))
Q No:
4
Correct Answer
Marks: 1.50/1.50
Which of the following shows the correct order of implementation of Batch Normalization in the model?
model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3,activation='softmax')
You Selected
model = keras.Sequential()
keras.layers.BatchNormalization(),
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.Dense(6, activation='relu'),
keras.layers.Dense(3,activation='softmax')
model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3,activation='softmax'),
keras.layers.BatchNormalization()
model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3,activation='softmax'),
keras.layers.BatchNormalization(0.5)
BatchNormalization is used after the dense (hidden) layers, and it should not be used after the output layer.
Q No:
5
Marks: 0/2
Build a model on the train data using the below hyperparameters and find the precision of the test data for the 0th class.
The number of neurons in the first, second, third, and fourth layers should be 128, 64, 64, and 32 respectively.
Use a Dropout of ratio 0.3 after the second layer and BatchNormalization after the third layer.
Use ReLU as the activation function in the input and hidden layers.
Use Adam as the optimizer with 1e-3 as the learning rate.
Build the model on 100 epochs with validation_split=0.2 and batch_size=128.
Note
Do not use stratify sampling and Callbacks.
The given dataset is scaled, so please don't scale the data again.
0.20 - 0.40
0.41 - 0.70
Correct Option
0.71 - 0.80
You Selected
>0.80
# Required imports
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization

# Defining the model
model_2 = keras.Sequential()
# First (input) layer
model_2.add(Dense(128, activation='relu', input_shape=(11,)))
# Second layer
model_2.add(Dense(64, activation='relu'))
# Dropout of ratio 0.3 after the second layer
model_2.add(Dropout(0.3))
# Third layer
model_2.add(Dense(64, activation='relu'))
# Batch Normalization after the third layer
model_2.add(BatchNormalization())
# Fourth layer
model_2.add(Dense(32, activation='relu'))
# Output layer
model_2.add(Dense(3, activation='softmax'))

# Defining the optimizer
adam = optimizers.Adam(learning_rate=1e-3)

# Compiling the model
model_2.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])

# Fitting the model
history_2 = model_2.fit(X_train, y_train, validation_split=0.2, epochs=100, batch_size=128, verbose=2)

# Predicting on the test data
y_pred_2 = model_2.predict(X_test)

# Applying argmax to get the predicted class labels
y_pred_final_2 = []
for i in y_pred_2:
    y_pred_final_2.append(np.argmax(i))

# Printing the classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred_final_2))
Q No:
6
Correct Answer
Marks: 1.50/1.50
For the following lines of code used to build a Neural Network model, choose the correct pairing -
model = keras.Sequential()
model.add(Dense(32, activation='relu', input_shape=(11,)))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(3, activation='softmax'))
RMSprop = optimizers.RMSprop()
A. model.add(Dense(10, activation='softmax'))    i. Adding dropout to the model
B. model.add(Dropout(0.5))                       ii. Adding Batch Normalization to the model
C. model.add(Dense(64, activation='relu'))       iii. Adding the output layer to the model
D. model.add(BatchNormalization())               iv. Adding the hidden layer to the model
A-iii,B-i,C-iv,D-ii
You Selected
A-iv,B-i,C-ii,D-iii
A-iii,B-iv,C-ii,D-i
A-ii,B-iii,C-iv,D-i
model.add(Dense(10, activation='softmax')) - Adding the output layer to the model
model.add(Dropout(0.5)) - Adding dropout to the model
model.add(Dense(64, activation='relu')) - Adding the hidden layer to the model
model.add(BatchNormalization()) - Adding Batch Normalization to the model
Main quiz:
Attempt #1
Feb 07 at 10:38 PM
Marks: 12
Q No:
1
Correct Answer
Marks: 1/1
What happens if the learning rate in the Gradient Descent algorithm is set too high?
A) The convergence may be slower
B) The algorithm may get stuck in local minima
C) The algorithm may oscillate and fail to converge
All A, B and C
Both A and C
Only A
Only C
You Selected
Setting the learning rate in Gradient Descent too high can lead to oscillations in the optimization process: the algorithm overshoots the minimum, so the parameter updates bounce back and forth around the optimal solution, and it may fail to converge.
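As an illustration, here is a minimal sketch on the simple loss f(w) = w**2 (an illustrative example, not from the quiz data):
def gradient_descent(lr, steps=10, w0=1.0):
    # Plain gradient descent on f(w) = w**2, whose gradient is 2*w
    w = w0
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

# A small learning rate converges towards the minimum at w = 0
print(gradient_descent(lr=0.1))   # about 0.107 after 10 steps
# A learning rate that is too high overshoots: the iterates oscillate in sign and grow
print(gradient_descent(lr=1.1))   # magnitude increases every step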
Q No:
2
Incorrect Answer
Marks: 0/1
What happens if all the weights in the neural network are initialized to zero?
The neural network with zero weight initialization will perform better than random initialization
The neural network will be stuck in the local minima
You Selected
The neural network will train properly without any problem
The neural network will not learn anything new
Correct Option
If the weights in the neural network are initialized to zero, the input to the second layer will be the same for all nodes. All the neurons will then follow the same gradient, always doing the same thing as one another and never learning any new features.
Q No:
3
Correct Answer
Marks: 2/2
Which of the following statements are true?
I. L1 and L2 regularization are the techniques to reduce the model complexity and prevent overfitting.
II. L1 regularization not only helps in reducing the overfitting but can help in feature selection also.
III. L2 regularization shrinks the coefficient, and it helps to reduce the model complexity and multi-collinearity.
All the Statements (I, II, III)
You Selected
Only Statement I
Both Statement I and II
Both Statement II and III
I. Both L1 and L2 regularization techniques are used to reduce model complexity and prevent overfitting. They add a penalty term to the loss function, discouraging the model from assigning excessively high weights to features.
II. L1 regularization, also known as Lasso regularization, not only assists in reducing overfitting but can also aid in feature selection. It tends to drive some feature weights
to exactly zero, effectively excluding those features from the model. This property can be helpful for identifying the most important features.
III. L2 regularization, also known as Ridge regularization, shrinks the coefficients towards zero without making them exactly zero. It helps in reducing model complexity and also mitigates multicollinearity, a situation where predictor variables are highly correlated, which can lead to unstable coefficient estimates.
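As an illustration, here is a minimal Keras sketch of adding these penalties to dense layers (the penalty factor 0.01 and layer sizes are illustrative assumptions):
from tensorflow import keras
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense

model_reg = keras.Sequential([
    # L1 (Lasso) penalty: encourages sparse weights, which helps feature selection
    Dense(64, activation='relu', input_shape=(11,),
          kernel_regularizer=regularizers.l1(0.01)),
    # L2 (Ridge) penalty: shrinks weights towards zero without making them exactly zero
    Dense(32, activation='relu',
          kernel_regularizer=regularizers.l2(0.01)),
    Dense(3, activation='softmax'),
])
model_reg.compile(optimizer='adam', loss='categorical_crossentropy')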
Q No:
4
Correct Answer
Marks: 2/2
Which of the following statements are true about Adam Optimizer?
I. Adam is a combination of RMSprop and SGD with momentum.
II. Adam stands for Adaptive Moment Estimation.
III. Adam uses the moving average of the gradient to avoid noise.
Only Statement II
Both Statement I and III
Both Statement I and II
All the Statements (I, II, III)
You Selected
Adam stands for Adaptive Moment Estimation and is a combination of RMSprop and SGD with momentum: from RMSprop it inherits the ability to scale the learning rate efficiently per parameter, and from momentum it uses an exponentially weighted average of the gradients to avoid noise.
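As an illustration, here is a minimal NumPy sketch of one Adam update (standard formulation with bias correction; names and values are illustrative assumptions):
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # m is the momentum-style moving average of gradients,
    # v is the RMSprop-style moving average of squared gradients.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Example usage on a single parameter whose loss is w**2 (gradient 2*w)
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 4):
    w, m, v = adam_step(w, grad=2 * w, m=m, v=v, t=t)
print(w)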
Q No:
5
Incorrect Answer
Marks: 0/2
Which of the following statements are true about the dropout during the testing process?
I. All the Neurons will be turned on during the testing process.
II. All the weights associated with the neurons are scaled by the dropout ratio used in the training process.
Both I and II statements
Correct Option
Only Statement II
None of the above statements
Only Statement I
You Selected
During the testing process, all the neurons will be active, and the weights associated with those neurons get scaled with the dropout ratio given during training.
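As an illustration, here is a minimal NumPy sketch of this idea (classic dropout with test-time scaling by the keep probability; note that Keras actually implements inverted dropout, which scales activations during training instead):
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.5                      # dropout ratio 0.5 -> half the neurons are kept
activations = rng.random(8)          # activations of one hidden layer

# Training: randomly turn off neurons according to the dropout ratio
mask = rng.random(8) < keep_prob
train_output = activations * mask

# Testing: all neurons stay on, and their outputs are scaled by the keep probability
test_output = activations * keep_prob

print(train_output)
print(test_output)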
Q No:
6
Correct Answer
Marks: 1/1
Which of the following are the penalty terms for L1 and L2 regularization respectively?
Both L1 and L2 regularization have the same penalty term, i.e., the absolute sum of the coefficients
The square of the magnitude of the coefficients and the sum of the coefficients
The absolute sum of the coefficients and the square root of the magnitude of the coefficients
The absolute sum of the coefficients and the square of the magnitude of the coefficients
You Selected
The loss with L1 regularization: L(y, ŷ) + λ × (the absolute sum of the coefficients)
The loss with L2 regularization: L(y, ŷ) + λ × (the square of the magnitude of the coefficients)
Q No:
7
Correct Answer
Marks: 1/1
Which of the following statements is true about Batch Normalization during the testing process?
I. It uses population statistics to standardize the data.
II. BatchNormalization is used before the input layer.
Both Statement I and II
Only Statement I
You Selected
Only Statement II
None of the above statements
I. Batch Normalization uses population statistics (mean and variance) computed during the training process to standardize the data during testing. This ensures that the testing
process remains consistent with the normalization applied during training and helps improve the stability and performance of the model.
II. BatchNormalization is typically used within the hidden layers of a neural network, not before the input layer. It's applied to the outputs of hidden layers to normalize the activations and accelerate training. It's not common to use BatchNormalization directly before the input layer.
Q No:
8
Correct Answer
Marks: 2/2
Which of the following statements are true about RMSprop?
I. RMSprop is also known as Root Mean Square propagation.
II. RMSprop is similar to Adagrad, but additionally it uses the exponential moving average for scaling the learning rate
III. RMSprop is the same as SGD with momentum.
Both Statement II and III
Both Statement I and II
You Selected
Only Statement I
All the Statements (I, II, III)
RMSprop is also called Root Mean Square propagation and is an improved version of Adagrad that avoids the aggressively decaying learning rate by taking an exponential moving average of the squared gradients instead of their cumulative sum.
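As an illustration, here is a minimal NumPy sketch of one RMSprop update (standard formulation; names and values are illustrative assumptions):
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=1e-3, rho=0.9, eps=1e-8):
    # An exponential moving average of squared gradients (instead of
    # Adagrad's cumulative sum) scales the learning rate per dimension.
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

# Example usage on a single parameter whose loss is w**2 (gradient 2*w)
w, s = 5.0, 0.0
for _ in range(3):
    w, s = rmsprop_step(w, grad=2 * w, avg_sq=s)
print(w)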
Q No:
9
Correct Answer
Marks: 2/2
Which of the following shows the correct order of implementation of Batch Normalization in the model?
model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3,activation='softmax'),
keras.layers.BatchNormalization(0.5)
model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3,activation='softmax'),
keras.layers.BatchNormalization()
model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3,activation='softmax')
You Selected
model = keras.Sequential()
keras.layers.BatchNormalization(),
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.Dense(6, activation='relu'),
keras.layers.Dense(3,activation='softmax')
BatchNormalization is used after the dense (hidden) layers, not before the input layer, and it should not be used after the output layer.
Q No:
10
Correct Answer
Marks: 1/1
Which of the following activation functions are used with Xavier initialization?
A. Sigmoid
B. TanH
C. ReLU
Only C
Both B and C
Both A and B
You Selected
All A, B and C
Sigmoid and TanH can be used with Xavier initialization
ReLU and LeakyReLU can be used with HE initialization
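As an illustration, here is a minimal Keras sketch pairing each initializer with its usual activation (layer sizes are illustrative assumptions):
from tensorflow import keras
from tensorflow.keras.layers import Dense

model_init = keras.Sequential([
    # Xavier (Glorot) initialization pairs well with sigmoid/tanh activations
    Dense(64, activation='tanh', input_shape=(11,),
          kernel_initializer='glorot_uniform'),
    # He initialization pairs well with ReLU/LeakyReLU activations
    Dense(32, activation='relu',
          kernel_initializer='he_normal'),
    Dense(3, activation='softmax'),
])
model_init.summary()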