Neural Network Week 2

2.1 – SGD with Momentum

Q No: 1 Correct Answer Marks: 1/1
A convex function cannot have more than one dimension?
- True
- False (You Selected)
Explanation: A convex function can be 2-dimensional, 3-dimensional, or have more than three dimensions.

Q No: 2 Correct Answer Marks: 1/1
Does momentum accelerate the gradient descent algorithm?
- True (You Selected)
- False
Explanation: Momentum accelerates the gradient descent algorithm by using an exponentially weighted average of the gradients.

Q No: 3 Correct Answer Marks: 1/1
Which of the following is true about a convex function?
- It has multiple local minima.
- The local minimum and the global minimum are different.
- The local minimum and the global minimum are the same. (You Selected)
- It has multiple global minima.
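As a quick illustration of the exponentially weighted average mentioned in the answers above, here is a minimal NumPy sketch of one common formulation of SGD with momentum; the function, variable names, and toy loss are illustrative and not part of the course material.

import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    # Exponentially weighted average of the gradients
    velocity = beta * velocity + (1 - beta) * grad
    # Move the parameters along the smoothed gradient instead of the raw one
    w = w - lr * velocity
    return w, velocity

# Example usage on a toy quadratic loss f(w) = w^2, whose gradient is 2w
w, velocity = np.array(5.0), np.array(0.0)
for _ in range(100):
    w, velocity = sgd_momentum_step(w, 2 * w, velocity)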
2.2 – Other Variants of Gradient Descent

Q No: 1 Correct Answer Marks: 1/1
Which of the following are the different variants of optimizers?
- Adagrad
- RMSprop
- Adam
- All of the above (You Selected)
Explanation: SGD with momentum, Adagrad, RMSprop, and Adam are the different variants of gradient descent optimizers.

Q No: 2 Correct Answer Marks: 1/1
Does the learning rate in Adagrad dynamically decrease as the algorithm proceeds?
- True (You Selected)
- False
Explanation: The learning rate in Adagrad decreases at every iteration during the training process, and the learning rate is dimension-specific.

2.3 – Weight Initialization and Its Techniques

Q No: 1 Correct Answer Marks: 1/1
Which of the following weight initialization techniques is used for the ReLU activation?
- Xavier initialization
- He initialization (You Selected)
- Both A and B
- None of these

2.4 – Regularization

Q No: 1 Correct Answer Marks: 1/1
Which of the following is another name for L2 regularization?
- Lasso
- Ridge (You Selected)
- Both A and B
Explanation: L2 regularization is also known as Ridge, and L1 regularization is known as Lasso.

Q No: 2 Correct Answer Marks: 1/1
Is the Data Augmentation technique used on images?
- True (You Selected)
- False
Explanation: Data Augmentation is a regularization technique used on images that increases the amount of data by adding slightly modified copies of already existing data.
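As a rough illustration of the data augmentation idea above, the following Keras sketch (assuming TensorFlow 2.6 or later; the specific layers and input shape are illustrative assumptions) adds randomly modified copies of each training image on the fly:

from tensorflow import keras

# Augmentation layers create slightly modified copies of each image during training only
data_augmentation = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),
    keras.layers.RandomZoom(0.1),
])

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),   # illustrative image shape
    data_augmentation,                # inactive at inference time
    keras.layers.Conv2D(16, 3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax'),
])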
2.5 – Dropout

Q No: 1 Correct Answer Marks: 1/1
Which of the following is true about the Dropout technique?
I. Dropout is a regularization technique that reduces overfitting.
II. Dropout randomly drops neurons according to the dropout ratio in the network during the training period.
III. Dropout is the same as a fully connected layer.
- Only statement I is correct.
- Both statements I and III are correct.
- All three statements are correct.
- Both statements I and II are correct. (You Selected)
Explanation: Dropout is a regularization technique used to prevent the model from overfitting. It drops neurons in the hidden layers according to the given dropout ratio; the default ratio is 0.5.

Q No: 2 Correct Answer Marks: 1/1
Can the Dropout technique be implemented in the output layer?
- True
- False (You Selected)
Explanation: No. Dropout cannot be applied to the output layer, since the output layer produces the probabilities or numerical values for classification and regression problems respectively, and dropout would randomly drop neurons in that layer.

2.6 – Batch Normalization

Q No: 1 Correct Answer Marks: 1/1
Which of the following is true about Batch Normalization?
- It normalizes all the input before sending it to the next layer. (You Selected)
- It is one of the backpropagation techniques used in neural networks.
- It gives the mean and standard deviation of the weights.
- Batch Normalization is the same as the Dropout technique.
Explanation: Batch Normalization is a regularization technique applied before or after the activation function in a neuron; it normalizes all the input before sending it to the next layer.

Q No: 2 Correct Answer Marks: 1/1
Can the Batch Normalization technique be implemented in the output layer?
- True
- False (You Selected)
Explanation: No. Batch Normalization cannot be applied to the output layer, since it would normalize the values, and the output layer must output probabilities or numerical values for classification and regression problems respectively.
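A minimal Keras sketch, with illustrative layer sizes, of where Dropout and BatchNormalization are typically placed, and why neither appears after the output layer:

from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization

model = keras.Sequential([
    keras.Input(shape=(11,)),
    Dense(64, activation='relu'),
    Dropout(0.5),                     # randomly drops hidden neurons during training
    Dense(32, activation='relu'),
    BatchNormalization(),             # normalizes activations before the next layer
    Dense(3, activation='softmax'),   # output layer: no Dropout or BatchNormalization after it
])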
2.7 – Types of Neural Networks

Q No: 1
Which of the following are the different types of Neural Networks?
- RNN
- CNN
- Feed Forward Neural Network
- All of the above (You Selected)

Q No: 2 Correct Answer Marks: 1/1
For an image recognition problem, which of the following neural network architectures is best suited to solve the problem?
- Perceptron
- Convolutional Neural Network (You Selected)
- Multi-Layer Perceptron
- LSTM
Explanation: The convolutional neural network architecture is the best choice for solving image recognition problems.

Practice Quiz
Attempt #1, Feb 07 at 9:45 PM, Marks: 8

Q No: 1 Correct Answer Marks: 1/1
Import the dataset and answer the following question: the distribution of the 'Age' attribute looks slightly right-skewed.
- True (You Selected)
- False

import pandas as pd
import seaborn as sns

data = pd.read_csv('new_preprocessed_data.csv')

# Plotting the distribution plot
sns.displot(data=data, x='age', kde=True)

# Finding the skewness of the Age variable; a positive value indicates a right-skewed distribution
data.age.skew()
Q No: 2 Correct Answer Marks: 2/2
Build a Neural Network model on the dataset by following the steps below:
- Store the independent and dependent features in X and y.
- Use train_test_split to split the data (80% for training and 20% for testing).
- Convert the target feature into a NumPy array using the Keras to_categorical function.
- Use the parameters mentioned below:
  1. The number of neurons in the first and second layers is 64 and 32 respectively.
  2. Use Dropout with a ratio of 0.2 after the second layer.
  3. Use ReLU as the activation function in the input and hidden layers, and Adam as the optimizer with a learning rate of 1e-3.
  4. Build the model with 20 epochs and validation_split=0.2.
What is the accuracy of the model on the training data?
Note: Do not use stratified sampling or callbacks. The given dataset is scaled, so please don't scale the data again.
- >30 and <50
- >51 and <70 (You Selected)
- >70 and <85
- >90

# Splitting independent and dependent variables
X = data.drop('loan_status', axis=1)
Y = data[['loan_status']]

# Train/test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

# Target encoding
from tensorflow.keras.utils import to_categorical
y_train = to_categorical(y_train, 3)
y_test_cat = to_categorical(y_test, 3)

# Building the model
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import optimizers, losses
from tensorflow.keras.layers import Dense, Dropout

model = keras.Sequential()
# Input layer
model.add(Dense(64, activation='relu', input_shape=(11,)))
# Second layer
model.add(Dense(32, activation='relu'))
# Adding dropout
model.add(Dropout(0.2))
# Output layer
model.add(Dense(3, activation='softmax'))

# Defining the optimizer
adam = optimizers.Adam(learning_rate=1e-3)

# Compiling the model
model.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])
# Fitting the model
history = model.fit(X_train, y_train, epochs=20, validation_split=0.2, verbose=2)
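Assuming the History object from the fit above is stored in history, as in the snippet, the training accuracy this question asks about can be read back along these lines:

# Accuracy on the training data for the final epoch
train_acc = history.history['accuracy'][-1]
print(f"Final training accuracy: {train_acc:.3f}")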
Q No: 3 Correct Answer Marks: 2/2
Build a model on the train data using the hyperparameters below and find the f1-score of the test data for the 0th class.
- The number of neurons in the first, second, third, and fourth layers should be 256, 124, 64, and 32 respectively.
- Use BatchNormalization after the second layer.
- Use ReLU as the activation function in the input and hidden layers, and RMSprop as the optimizer with a learning rate of 1e-3.
- Build the model with 50 epochs, validation_split=0.2, and batch_size=128.
Note: Do not use stratified sampling or callbacks. The given dataset is scaled, so please don't scale the data again.
- 0.51 - 0.55
- 0.71 - 0.80
- 0.60 - 0.70 (You Selected)
- 0.35 - 0.50

# Defining the model
import numpy as np
from tensorflow.keras.layers import BatchNormalization

model_1 = keras.Sequential()
# Adding the first (input) layer
model_1.add(Dense(256, activation='relu', input_shape=(11,)))
# Adding the second layer
model_1.add(Dense(124, activation='relu'))
# Adding BatchNormalization
model_1.add(BatchNormalization())
# Adding the third layer
model_1.add(Dense(64, activation='relu'))
# Adding the fourth layer
model_1.add(Dense(32, activation='relu'))
# Adding the output layer
model_1.add(Dense(3, activation='softmax'))

# Defining the optimizer
rmsprop = optimizers.RMSprop(learning_rate=1e-3)

# Compiling the model
model_1.compile(loss=losses.categorical_crossentropy, optimizer=rmsprop, metrics=['accuracy'])

# Fitting the model
history_1 = model_1.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=128, verbose=2)

# Predicting on test
y_pred = model_1.predict(X_test)

# Applying argmax to turn class probabilities into class labels
y_pred_final = []
for i in y_pred:
    y_pred_final.append(np.argmax(i))

# Printing the classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred_final))

Q No: 4 Correct Answer Marks: 1.50/1.50
Which of the following shows the correct implementation of Batch Normalization in the model?

model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3, activation='softmax')
(You Selected)

model = keras.Sequential()
keras.layers.BatchNormalization(),
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.Dense(6, activation='relu'),
keras.layers.Dense(3, activation='softmax')

model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3, activation='softmax'),
keras.layers.BatchNormalization()

model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3, activation='softmax'),
keras.layers.BatchNormalization(0.5)

Explanation: BatchNormalization is used only after the hidden dense layers and should not be used after the output layer.
Q No: 5 Marks: 0/2
Build a model on the train data using the hyperparameters below and find the precision of the test data for the 0th class.
- The number of neurons in the first, second, third, and fourth layers should be 128, 64, 64, and 32 respectively.
- Use Dropout with a ratio of 0.3 after the second layer and BatchNormalization after the third layer.
- Use ReLU as the activation function in the input and hidden layers, and Adam as the optimizer with a learning rate of 1e-3.
- Build the model with 100 epochs, validation_split=0.2, and batch_size=128.
Note: Do not use stratified sampling or callbacks. The given dataset is scaled, so please don't scale the data again.
- 0.20 - 0.40
- 0.41 - 0.70 (Correct Option)
- 0.71 - 0.80 (You Selected)
- >0.80

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization

model_2 = keras.Sequential()
model_2.add(Dense(128, activation='relu', input_shape=(11,)))
model_2.add(Dense(64, activation='relu'))
model_2.add(Dropout(0.3))
model_2.add(Dense(64, activation='relu'))
model_2.add(BatchNormalization())
model_2.add(Dense(32, activation='relu'))
model_2.add(Dense(3, activation='softmax'))

# Defining the optimizer
adam = optimizers.Adam(learning_rate=1e-3)

# Compiling the model
model_2.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])

# Fitting the model
history_2 = model_2.fit(X_train, y_train, validation_split=0.2, epochs=100, batch_size=128, verbose=2)

# Predicting on test
y_pred_2 = model_2.predict(X_test)

# Applying argmax to turn class probabilities into class labels
y_pred_final_2 = []
for i in y_pred_2:
    y_pred_final_2.append(np.argmax(i))

# Printing the classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred_final_2))
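Because Q No: 3 and Q No: 5 ask for the f1-score and precision of the 0th class specifically, one way to extract those values programmatically rather than reading the printed report is scikit-learn's output_dict option. This sketch reuses the variable names from the snippet above; the dictionary key '0' assumes integer class labels.

from sklearn.metrics import classification_report

# output_dict=True returns a nested dict keyed by the string form of each class label
report = classification_report(y_test, y_pred_final_2, output_dict=True)
print("Class 0 precision:", report['0']['precision'])
print("Class 0 f1-score:", report['0']['f1-score'])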
Q No: 6 Correct Answer Marks: 1.50/1.50
For the following lines of code used to build a Neural Network model, choose the correct pairing:

model = keras.Sequential()
model.add(Dense(32, activation='relu', input_shape=(11,)))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(3, activation='softmax'))
RMSprop = optimizers.RMSprop()

A. model.add(Dense(10, activation='softmax'))    i. Adding dropout to the model
B. model.add(Dropout(0.5))                       ii. Adding Batch Normalization to the model
C. model.add(Dense(64, activation='relu'))       iii. Adding the output layer to the model
D. model.add(BatchNormalization())               iv. Adding the hidden layer to the model

- A-iii, B-i, C-iv, D-ii (You Selected)
- A-iv, B-i, C-ii, D-iii
- A-iii, B-iv, C-ii, D-i
- A-ii, B-iii, C-iv, D-i

Explanation:
model.add(Dense(10, activation='softmax')) - Adding the output layer to the model
model.add(Dropout(0.5)) - Adding dropout to the model
model.add(Dense(64, activation='relu')) - Adding the hidden layer to the model
model.add(BatchNormalization()) - Adding Batch Normalization to the model

Main Quiz
Attempt #1, Feb 07 at 10:38 PM, Marks: 12

Q No: 1 Correct Answer Marks: 1/1
What happens if the learning rate in the Gradient Descent algorithm is set too high?
A) The convergence may be slower
B) The algorithm may get stuck in local minima
C) The algorithm may oscillate and fail to converge
- All A, B and C
- Both A and C
- Only A
- Only C (You Selected)
Explanation: Setting the learning rate in Gradient Descent too high can lead to oscillations in the optimization process. The algorithm may overshoot the minimum, making the parameter updates bounce back and forth around the optimal solution. This oscillation prevents convergence and can cause the algorithm to fail to converge.
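A tiny Python sketch of the oscillation described above, using plain gradient descent on the toy function f(x) = x^2; the step counts and learning rates are arbitrary illustrations:

def gradient_descent(x0, lr, steps=5):
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x - lr * 2 * x   # gradient of f(x) = x^2 is 2x
        trajectory.append(x)
    return trajectory

print(gradient_descent(1.0, lr=0.1))   # converges smoothly toward 0
print(gradient_descent(1.0, lr=1.1))   # overshoots and oscillates with growing amplitude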
Q No: 2 Incorrect Answer Marks: 0/1
What happens if all the weights in the neural network are initialized to zero?
- The neural network with zero weight initialization will perform better than random initialization.
- The neural network will be stuck in a local minimum. (You Selected)
- The neural network will train properly without any problem.
- The neural network will not learn anything new. (Correct Option)
Explanation: If the weights in the neural network are initialized to zero, then the input to the second layer will be the same for all nodes. All the neurons will then follow the same gradient, always end up doing the same thing as one another, and end up learning no new features.

Q No: 3 Correct Answer Marks: 2/2
Which of the following statements are true?
I. L1 and L2 regularization are techniques to reduce model complexity and prevent overfitting.
II. L1 regularization not only helps in reducing overfitting but can also help in feature selection.
III. L2 regularization shrinks the coefficients, and it helps to reduce model complexity and multicollinearity.
- All the statements (I, II, III) (You Selected)
- Only Statement I
- Both Statement I and II
- Both Statement II and III
Explanation:
I. Both L1 and L2 regularization techniques are used to reduce model complexity and prevent overfitting. They add a penalty term to the loss function, discouraging the model from assigning excessively high weights to features.
II. L1 regularization, also known as Lasso regularization, not only assists in reducing overfitting but can also aid in feature selection. It tends to drive some feature weights to exactly zero, effectively excluding those features from the model. This property can be helpful for identifying the most important features.
III. L2 regularization, also known as Ridge regularization, shrinks the coefficients towards zero without making them exactly zero. It helps in reducing model complexity and also mitigates multicollinearity, a situation where predictor variables are highly correlated, which can lead to unstable coefficient estimates.

Q No: 4 Correct Answer Marks: 2/2
Which of the following statements are true about the Adam optimizer?
I. Adam is a combination of RMSprop and SGD with momentum.
II. Adam stands for Adaptive Moment Estimation.
III. Adam uses the moving average of the gradient to avoid noise.
- Only Statement II
- Both Statement I and III
- Both Statement I and II
- All the statements (I, II, III) (You Selected)
Explanation: Adam stands for Adaptive Moment Estimation and is a combination of RMSprop and SGD with momentum. From RMSprop it inherits the ability to scale the learning rate efficiently, and from momentum it uses an exponentially weighted average of the gradients to avoid noise.
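For reference, a compact sketch of the standard Adam update described in the Q No: 4 explanation above; the function and variable names are illustrative:

import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Momentum-style exponentially weighted average of the gradients
    m = beta1 * m + (1 - beta1) * grad
    # RMSprop-style exponentially weighted average of the squared gradients
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the early steps (t is the 1-based step count)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-dimension scaled parameter update
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v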
Q No: 5 Incorrect Answer Marks: 0/2
Which of the following statements are true about dropout during the testing process?
I. All the neurons will be turned on during the testing process.
II. All the weights associated with the neurons are scaled with the dropout ratio used in the training process.
- Both I and II statements (Correct Option)
- Only Statement II
- None of the above statements
- Only Statement I (You Selected)
Explanation: During the testing process, all the neurons will be active, and the weights associated with those neurons get scaled with the dropout ratio given during training.

Q No: 6 Correct Answer Marks: 1/1
Which of the following are the penalty terms for L1 and L2 regularization respectively?
- Both L1 and L2 regularization have the same penalty term, i.e. the absolute sum of the coefficients.
- The square of the magnitude of the coefficients and the sum of the coefficients.
- The absolute sum of the coefficients and the square root of the magnitude of the coefficients.
- The absolute sum of the coefficients and the square of the magnitude of the coefficients. (You Selected)
Explanation:
The loss with L1 regularization: L(y, yhat) + lambda * (the absolute sum of the coefficients)
The loss with L2 regularization: L(y, yhat) + lambda * (the square of the magnitude of the coefficients)

Q No: 7 Correct Answer Marks: 1/1
Which of the following statements is true about Batch Normalization during the testing process?
I. It uses population statistics to standardize the data.
II. BatchNormalization is used before the input layer.
- Both Statement I and II
- Only Statement I (You Selected)
- Only Statement II
- None of the above statements
Explanation:
I. Batch Normalization uses population statistics (mean and variance) computed during the training process to standardize the data during testing. This ensures that the testing process remains consistent with the normalization applied during training and helps improve the stability and performance of the model.
II. BatchNormalization is typically used within the hidden layers of a neural network, not before the input layer. It is applied to the outputs of hidden layers to normalize the activations and accelerate training. It is not common to use BatchNormalization directly before the input layer.
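To illustrate statement I, here is a rough sketch of how a Batch Normalization layer standardizes data at test time using the stored population (moving) statistics gathered during training; all names are illustrative:

import numpy as np

def batchnorm_inference(x, moving_mean, moving_var, gamma, beta, eps=1e-5):
    # At test time the stored population statistics are used, not the current batch statistics
    x_hat = (x - moving_mean) / np.sqrt(moving_var + eps)
    # The learned scale (gamma) and shift (beta) are then applied
    return gamma * x_hat + beta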
Q No: 8 Correct Answer Marks: 2/2
Which of the following statements are true about RMSprop?
I. RMSprop is also known as Root Mean Square Propagation.
II. RMSprop is similar to Adagrad, but additionally it uses an exponential moving average for scaling the learning rate.
III. RMSprop is the same as SGD with momentum.
- Both Statement II and III
- Both Statement I and II (You Selected)
- Only Statement I
- All the statements (I, II, III)
Explanation: RMSprop is also called Root Mean Square Propagation and is an improved version of Adagrad that aims to soften Adagrad's aggressive learning rate decay by taking the exponential average of the squared gradients instead of their cumulative sum.
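A minimal sketch contrasting the Adagrad and RMSprop accumulators referred to in the explanation (cumulative sum of squared gradients versus their exponential moving average); the function names and defaults are illustrative:

import numpy as np

def adagrad_step(w, grad, cache, lr=0.01, eps=1e-8):
    # Cumulative sum: the cache only grows, so the effective step keeps shrinking
    cache = cache + grad ** 2
    return w - lr * grad / (np.sqrt(cache) + eps), cache

def rmsprop_step(w, grad, cache, lr=0.01, rho=0.9, eps=1e-8):
    # Exponential moving average: old squared gradients decay, so the step does not vanish
    cache = rho * cache + (1 - rho) * grad ** 2
    return w - lr * grad / (np.sqrt(cache) + eps), cache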
Q No: 9 Correct Answer Marks: 2/2
Which of the following shows the correct implementation of Batch Normalization in the model?

model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3, activation='softmax'),
keras.layers.BatchNormalization(0.5)

model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3, activation='softmax'),
keras.layers.BatchNormalization()

model = keras.Sequential()
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(6, activation='relu'),
keras.layers.BatchNormalization(),
keras.layers.Dense(3, activation='softmax')
(You Selected)

model = keras.Sequential()
keras.layers.BatchNormalization(),
keras.layers.Dense(11, input_shape=(11,), activation='relu'),
keras.layers.Dense(6, activation='relu'),
keras.layers.Dense(3, activation='softmax')

Explanation: BatchNormalization is used only after the hidden dense layers, not before the input layer, and it should not be used after the output layer.

Q No: 10 Correct Answer Marks: 1/1
Which of the following activation functions are used with Xavier initialization?
A. Sigmoid
B. TanH
C. ReLU
- Only C
- Both B and C
- Both A and B (You Selected)
- All A, B and C
Explanation: Sigmoid and TanH can be used with Xavier initialization; ReLU and LeakyReLU can be used with He initialization.
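For reference, a small Keras sketch pairing the initializations with the activations as described above; the layer sizes and the choice of the he_normal and glorot_uniform variants are illustrative:

from tensorflow import keras
from tensorflow.keras.layers import Dense

model = keras.Sequential([
    keras.Input(shape=(11,)),
    # He initialization is typically paired with ReLU / LeakyReLU activations
    Dense(64, activation='relu', kernel_initializer='he_normal'),
    # Xavier (Glorot) initialization is typically paired with sigmoid / tanh activations
    Dense(32, activation='tanh', kernel_initializer='glorot_uniform'),
    Dense(3, activation='softmax'),
])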