Problem Set-3
Swetha lenkala
2023-11-03

```r
#1a)
library(rpart.plot)
## Loading required package: rpart
library(rattle)
## Loading required package: tibble
## Loading required package: bitops
## Rattle: A free graphical interface for data science with R.
## Version 5.5.1 Copyright (c) 2006-2021 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
library(palmerpenguins)
## Warning: package 'palmerpenguins' was built under R version 4.3.2
library(caret)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Loading required package: lattice

data("penguins")
penguins <- na.omit(penguins)

set.seed(123)  # For reproducibility
train_ind <- createDataPartition(penguins$species, p = 0.75, list = FALSE)
train_data <- penguins[train_ind, ]
test_data <- penguins[-train_ind, ]
```

```r
#1b)
library(rpart)
library(rpart.plot)
library(RColorBrewer)
library(rattle)
tree <- rpart(species ~ island, data = train_data)
fancyRpartPlot(tree)
```
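Before reading the answer off the tree plot, the island–species relationship can be verified directly; a minimal sketch cross-tabulating the training split from 1a:

```r
# Cross-tabulate island against species in the training data
table(train_data$island, train_data$species)
```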
The island where only Adelie penguins are found is Torgersen; Gentoo penguins are found only on Biscoe.

```r
#1c)
tree <- rpart(species ~ flipper_length_mm + bill_depth_mm, data = train_data,
              control = rpart.control(cp = 0.00005))
fancyRpartPlot(tree)
```
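With cp set this low the tree is grown very deep. To see how cross-validated error varies with the complexity parameter, rpart's built-in CV table can be inspected; a minimal sketch using the tree just fit:

```r
printcp(tree)  # CP table: tree size vs. cross-validated error (xerror)
plotcp(tree)   # visual aid for picking a cp value
```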
```r
# Predict species on the test data
predictions <- predict(tree, test_data, type = "class")

# Calculate the classification rate on the test data
correct_predictions <- sum(predictions == test_data$species)
total_predictions <- nrow(test_data)
classification_rate <- correct_predictions / total_predictions
cat("Classification Rate on Test Data:", classification_rate, "\n")
## Classification Rate on Test Data: 0.7560976
```

```r
#1d)
tree <- rpart(species ~ flipper_length_mm + bill_depth_mm, data = train_data,
              control = rpart.control(cp = 0.05))
fancyRpartPlot(tree)
```
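As an aside, caret (already loaded in 1a) can report per-class accuracy alongside the overall rate; a minimal sketch for the cp = 0.05 tree:

```r
# Confusion matrix gives per-class accuracy, sensitivity, etc.
confusionMatrix(predict(tree, test_data, type = "class"), test_data$species)
```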
```r
predictions <- predict(tree, test_data, type = "class")

# Calculate the classification rate on the test data
correct_predictions <- sum(predictions == test_data$species)  # number of correct predictions made by the decision tree on the test data
total_predictions <- nrow(test_data)                          # total number of predictions made on the test data
classification_rate <- correct_predictions / total_predictions
cat("Classification Rate on Test Data:", classification_rate, "\n")
## Classification Rate on Test Data: 0.7926829
```

When the classification rate improves from about 0.76 to 0.79 as cp is raised from 0.00005 to 0.05, it reflects a shift in the bias-variance tradeoff. Note that a larger cp prunes the tree more aggressively, so the 1d model is simpler, not more complex, than the 1c model. The shift can be explained as follows:

Lower Variance: the very deep tree in 1c was sensitive to noise and quirks of the training data, tailoring itself to that sample (overfitting), which hurt its performance on unseen data. Pruning reduces this sensitivity, so the simpler tree generalizes better, which is exactly what the higher test classification rate shows.

Potentially Higher Bias: a pruned tree can represent fewer patterns and may underfit if cp is raised too far. Here, whatever bias the pruning introduces is evidently outweighed by the reduction in variance.

```r
#1e)
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
##     margin
## The following object is masked from 'package:rattle':
##
##     importance

# Fit a random forest with 1000 trees
rf_model <- randomForest(species ~ flipper_length_mm + bill_depth_mm,
                         data = train_data, ntree = 1000)

# Predict species on the test data
rf_predictions <- predict(rf_model, test_data)

# Calculate the classification rate on the test data
rf_correct_predictions <- sum(rf_predictions == test_data$species)
total_predictions <- nrow(test_data)
rf_classification_rate <- rf_correct_predictions / total_predictions
cat("Classification Rate with Random Forest (1000 trees):", rf_classification_rate, "\n")
## Classification Rate with Random Forest (1000 trees): 0.7560976
```

```r
#1f)
rf_model_3_trees <- randomForest(species ~ flipper_length_mm + bill_depth_mm,
                                 data = train_data, ntree = 3)

# Predict species on the test data
rf_predictions_3_trees <- predict(rf_model_3_trees, test_data)

# Calculate the classification rate on the test data
rf_correct_predictions_3_trees <- sum(rf_predictions_3_trees == test_data$species)
total_predictions <- nrow(test_data)
rf_classification_rate_3_trees <- rf_correct_predictions_3_trees / total_predictions
cat("Classification Rate with Random Forest (3 trees):", rf_classification_rate_3_trees, "\n")
## Classification Rate with Random Forest (3 trees): 0.7560976
```
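The 3-tree forest happens to match the 1000-tree forest here, but that need not hold in general. A hedged sketch of sweeping over forest sizes (the grid of ntree values is an arbitrary illustrative choice):

```r
# Test classification rate as a function of the number of trees
ntrees <- c(1, 3, 10, 50, 100, 500, 1000)
rates <- sapply(ntrees, function(nt) {
  m <- randomForest(species ~ flipper_length_mm + bill_depth_mm,
                    data = train_data, ntree = nt)
  mean(predict(m, test_data) == test_data$species)
})
plot(ntrees, rates, type = "b", log = "x",
     xlab = "Number of trees", ylab = "Test classification rate")
```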
```r
#2a)
set.seed(123)
#torch_manual_seed(123)
library(torch)
library(luz)          # high-level interface for torch
library(torchvision)  # for datasets and image transformation
library(caret)        # for data partition
library(kernlab)
##
## Attaching package: 'kernlab'
## The following object is masked from 'package:ggplot2':
##
##     alpha
library(caret)
library(MASS)

data(Boston)
head(Boston)
##      crim zn indus chas   nox    rm  age    dis rad tax ptratio  black lstat
## 1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98
## 2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14
## 3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03
## 4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94
## 5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90  5.33
## 6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12  5.21
##   medv
## 1 24.0
## 2 21.6
## 3 34.7
## 4 33.4
## 5 36.2
## 6 28.7
```
```r
x <- model.matrix(medv ~ . - 1, data = Boston)
y <- Boston$medv

train_ind <- createDataPartition(y, p = .70, list = FALSE)
x_train <- x[train_ind, ]
y_train <- y[train_ind]
x_test <- x[-train_ind, ]
y_test <- y[-train_ind]

modnn <- nn_module(
  initialize = function(input_size) {
    self$hidden1 <- nn_linear(input_size, 5)
    self$activation <- nn_relu()
    self$dropout <- nn_dropout(.5)
    self$output <- nn_linear(5, 1)
  },
  forward = function(x) {
    self$output(self$dropout(self$activation(self$hidden1(x))))
  }
)
print(modnn(ncol(x)))
## An `nn_module` containing 76 parameters.
##
## ── Modules ─────────────────────────────────────────────────────────────────
## • hidden1: <nn_linear> #70 parameters
## • activation: <nn_relu> #0 parameters
## • dropout: <nn_dropout> #0 parameters
## • output: <nn_linear> #6 parameters

modnn <- set_hparams(
  setup(modnn,
        loss = nn_mse_loss(),      # define the loss function
        optimizer = optim_rmsprop, # define an optimizer (we won't go into much detail here)
        metrics = list()),         # lets the user monitor performance metrics during training; default is loss
  input_size = ncol(x))
```
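An aside before fitting: neural networks usually train much better on standardized inputs, and the unscaled Boston columns (e.g. tax in the hundreds) are a plausible contributor to the large test MSEs below. A hedged sketch of standardizing with training-set statistics only; it is not applied to the results that follow:

```r
# Standardize predictors using training-set means and standard deviations
x_train_sc <- scale(x_train)
x_test_sc  <- scale(x_test,
                    center = attr(x_train_sc, "scaled:center"),
                    scale  = attr(x_train_sc, "scaled:scale"))
```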
```r
fit_nn <- fit(modnn,
              data = list(x_train, matrix(y_train, ncol = 1)),  # supply train and test as lists
              valid_data = list(x_test, matrix(y_test, ncol = 1)),
              epochs = 50)
plot(fit_nn)

predict(fit_nn, newdata = x_test)
## torch_tensor
##  16.0324
##  16.0030
##  15.9244
##   8.6540
##  11.5908
##  14.6036
##  14.0801
##  12.6837
##  10.0350
##  13.4058
##  14.7955
##  18.5211
##  19.1808
##  13.8007
##  12.2759
##  13.6380
##  14.3542
##  16.6603
##  13.9160
##  12.6796
##  13.5556
##  13.1031
##  12.7596
##  12.1836
##  16.3503
##  15.3950
##  15.8121
##  15.1284
##  15.3448
##  13.7744
## ... [the output was truncated (use n=-1 to disable)]
## [ CPUFloatType{150,1} ]

# MSE on test is
mean((as.numeric(predict(fit_nn, x_test)) - y_test)^2)
## [1] 164.2791
```

```r
#2b)
modnn1 <- nn_module(
  initialize = function(input_size) {
    self$hidden1 <- nn_linear(input_size, 10)
    self$activation <- nn_relu()
    self$dropout <- nn_dropout(.5)
    self$output <- nn_linear(10, 1)
  },
  forward = function(x) {
    self$output(self$dropout(self$activation(self$hidden1(x))))
  }
)
print(modnn1(ncol(x)))
## An `nn_module` containing 151 parameters.
##
## ── Modules ─────────────────────────────────────────────────────────────────
## • hidden1: <nn_linear> #140 parameters
## • activation: <nn_relu> #0 parameters
## • dropout: <nn_dropout> #0 parameters
## • output: <nn_linear> #11 parameters

modnn1 <- set_hparams(
  setup(modnn1,
        loss = nn_mse_loss(),      # define the loss function
        optimizer = optim_rmsprop, # define an optimizer
        metrics = list()),         # lets the user monitor performance metrics during training; default is loss
  input_size = ncol(x))
```
```r
fit_nn1 <- fit(modnn1,
               data = list(x_train, matrix(y_train, ncol = 1)),  # supply train and test as lists
               valid_data = list(x_test, matrix(y_test, ncol = 1)),
               epochs = 50)
mean((as.numeric(predict(fit_nn1, x_test)) - y_test)^2)
## [1] 93.64908
```

Yes, the test MSE improves, dropping from 164.28 to 93.65.

```r
#2c)
modnn2 <- nn_module(
  initialize = function(input_size) {
    self$hidden1 <- nn_linear(input_size, 10)
    self$activation <- nn_sigmoid()
    self$dropout <- nn_dropout(.5)
    self$output <- nn_linear(10, 1)
  },
  forward = function(x) {
    self$output(self$dropout(self$activation(self$hidden1(x))))
  }
)
print(modnn2(ncol(x)))
## An `nn_module` containing 151 parameters.
##
## ── Modules ─────────────────────────────────────────────────────────────────
## • hidden1: <nn_linear> #140 parameters
## • activation: <nn_sigmoid> #0 parameters
## • dropout: <nn_dropout> #0 parameters
## • output: <nn_linear> #11 parameters

modnn2 <- set_hparams(
  setup(modnn2,
        loss = nn_mse_loss(),      # define the loss function
        optimizer = optim_rmsprop, # define an optimizer
        metrics = list()),         # lets the user monitor performance metrics during training; default is loss
  input_size = ncol(x))

fit_nn2 <- fit(modnn2,
               data = list(x_train, matrix(y_train, ncol = 1)),  # supply train and test as lists
               valid_data = list(x_test, matrix(y_test, ncol = 1)),
               epochs = 50)
mean((as.numeric(predict(fit_nn2, x_test)) - y_test)^2)
## [1] 111.5691
```

Yes, the test MSE still improves on the 2a baseline, dropping from 164.28 to 111.57, although the sigmoid network does worse than the ReLU network of the same width in 2b (93.65).

```r
#2d)
modnn3 <- nn_module(
  initialize = function(input_size) {
    self$hidden1 <- nn_linear(input_size, 5)
    self$activation <- nn_relu()
    self$dropout <- nn_dropout(.5)
    self$output <- nn_linear(5, 1)
  },
  forward = function(x) {
    self$output(self$dropout(self$activation(self$hidden1(x))))
  }
)
print(modnn3(ncol(x)))
## An `nn_module` containing 76 parameters.
##
## ── Modules ─────────────────────────────────────────────────────────────────
## • hidden1: <nn_linear> #70 parameters
## • activation: <nn_relu> #0 parameters
## • dropout: <nn_dropout> #0 parameters
## • output: <nn_linear> #6 parameters

modnn3 <- set_hparams(
  setup(modnn3,
        loss = nn_mse_loss(),      # define the loss function
        optimizer = optim_rmsprop, # define an optimizer (we won't go into much detail here)
        metrics = list()),         # lets the user monitor performance metrics during training; default is loss
  input_size = ncol(x))
```
```r
fit_nn3 <- fit(modnn3,
               data = list(x_train, matrix(y_train, ncol = 1)),  # supply train and test as lists
               valid_data = list(x_test, matrix(y_test, ncol = 1)),
               epochs = 500)
plot(fit_nn3)

predict(fit_nn3, newdata = x_test)
## torch_tensor
##  28.3811
##  26.7713
##  26.1201
##  19.2314
##  19.2314
##  19.8532
##  19.2314
##  19.2314
##  19.2314
##  19.2314
##  20.3112
##  28.6382
##  32.7379
##  19.2802
##  19.2314
##  19.2314
##  19.2314
##  24.2737
##  19.4768
##  19.2314
##  20.2424
##  19.5323
##  19.2314
##  19.2314
##  23.7937
##  24.5917
##  27.4984
##  23.1331
##  19.2568
##  19.2314
## ... [the output was truncated (use n=-1 to disable)]
## [ CPUFloatType{150,1} ]

mean((as.numeric(predict(fit_nn3, x_test)) - y_test)^2)
## [1] 44.58728
```
The sigmoid activation is strongly non-linear and saturates at its extremes, which can make the model harder to train and more prone to overfitting, so its test-error curve tends to be more U-shaped. The ReLU activation is piecewise linear and does not saturate on the positive side, which makes the model less prone to this effect, so its test-error curve tends to be flatter. Also, as the number of epochs increases, the model has more time to learn the training data; this can eventually lead to overfitting, with test error starting to rise once the optimal model complexity is passed. Here, training the 2a architecture for 500 epochs instead of 50 reduces the test MSE to 44.59.
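For reference, the four test MSEs reported above, collected in one place (a quick sketch; the values are copied from the outputs):

```r
# Summary of the test MSEs from parts 2a-2d
data.frame(
  model = c("2a: 5 ReLU units, 50 epochs",
            "2b: 10 ReLU units, 50 epochs",
            "2c: 10 sigmoid units, 50 epochs",
            "2d: 5 ReLU units, 500 epochs"),
  test_mse = c(164.28, 93.65, 111.57, 44.59)
)
```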
```r
rm(list = ls())

# Another way
# Load necessary libraries
library(torch)
library(caret)
library(luz)
library(torchvision)
library(kernlab)

#3a)
# Set seed for reproducibility
set.seed(123)
torch_manual_seed(123)

# Generate 1000 realizations of X1, X2, and epsilon
X1 <- rnorm(1000, mean = 5, sd = 4)
X2 <- rnorm(1000, mean = 5, sd = 1)
epsilon <- rnorm(1000, mean = 0, sd = 10)
y <- 10 + 5 * X1 + 2 * X1^2 + epsilon

# Create a data frame with the variables
data_df <- data.frame(X1, X2, y)

# Create a 50-50 training-testing split
train_ind <- createDataPartition(data_df$y, p = 0.5, list = FALSE)
x_train <- as.matrix(data_df[train_ind, 1:2])
y_train <- as.matrix(data_df[train_ind, 3])
x_test <- as.matrix(data_df[-train_ind, 1:2])
y_test <- as.matrix(data_df[-train_ind, 3])

# Define the neural network architecture
modnn1 <- nn_module(
  initialize = function(input_size) {
    self$hidden1 <- nn_linear(input_size, 5)
    self$activation1 <- nn_relu()
    self$dropout1 <- nn_dropout(0.2)
    self$output <- nn_linear(5, 1)
  },
  forward = function(x) {
    self$output(self$dropout1(self$activation1(self$hidden1(x))))
  }
)

# Set up the network
modnn1 <- set_hparams(
  setup(
    modnn1,
    loss = nn_mse_loss(),
    optimizer = optim_rmsprop,
    metrics = list()
  ),
  input_size = ncol(x_train)
)

# Train the neural network
fit_nn1 <- fit(modnn1,
               data = list(x_train, y_train),
               valid_data = list(x_test, y_test),
               epochs = 20)
mean_1 <- mean((as.numeric(predict(fit_nn1, x_test)) - y_test)^2)
mean_1
## [1] 3165.944
```
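Since y is a known quadratic in X1 with Var(epsilon) = 100, a plain linear model with a squared term gives a useful reference point for this MSE. A hedged sketch (lm_fit and lm_mse are illustrative names; the baseline's test MSE should sit near the irreducible error of about 100):

```r
# Quadratic linear-model baseline on the same train/test split
lm_fit <- lm(y ~ X1 + I(X1^2), data = data_df[train_ind, ])
lm_mse <- mean((predict(lm_fit, data_df[-train_ind, ]) - y_test)^2)
lm_mse  # expected to land near Var(epsilon) = 100, far below 3165.94
```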
```r
#3b)
predictions <- as.array(predict(fit_nn1, x_test))

# Create a scatter plot of the neural network's estimates of y
plot(x_test[, 1], predictions,
     main = "Neural Network Predictions",
     xlab = "X1", ylab = "Predicted y", col = "blue")

# Plot the true function for reference
true_function <- 10 + 5 * x_test[, 1] + 2 * (x_test[, 1]^2)
points(x_test[, 1], true_function, col = "red", pch = 20)
legend("topleft", legend = c("Neural Network", "True Function"),
       col = c("blue", "red"), pch = c(1, 20))
```

```r
#3c)
# Define the modified neural network architecture
modnn2 <- nn_module(
  initialize = function(input_size) {
    self$hidden1 <- nn_linear(input_size, 10)
    self$activation1 <- nn_relu()
    self$hidden2 <- nn_linear(10, 5)
    self$activation2 <- nn_relu()
    self$dropout1 <- nn_dropout(0.2)
    self$output <- nn_linear(5, 1)
  },
  forward = function(x) {
    self$output(self$dropout1(self$activation2(self$hidden2(self$activation1(self$hidden1(x))))))
  }
)
```
```r
# Set up the network
modnn2 <- set_hparams(
  setup(
    modnn2,
    loss = nn_mse_loss(),
    optimizer = optim_rmsprop,
    metrics = list()
  ),
  input_size = ncol(x_train)
)

# Train the neural network
fit_nn2 <- fit(modnn2,
               data = list(x_train, y_train),
               valid_data = list(x_test, y_test),
               epochs = 20)

# NOTE: the original code predicted with fit_nn1 here, which is why the reported
# mean_2 exactly equals mean_1; the corrected call below uses fit_nn2, so
# re-running it will give a different value than the one shown.
mean_2 <- mean((as.numeric(predict(fit_nn2, x_test)) - y_test)^2)
mean_2
## [1] 3165.944  (value produced by the original fit_nn1 call; see note above)
```

```r
#3d)
predictions <- as.array(predict(fit_nn2, x_test))

# Create a scatter plot of the neural network's estimates of y
plot(x_test[, 1], predictions,
     main = "Neural Network Predictions",
     xlab = "X1", ylab = "Predicted y", col = "blue")

# Plot the true function for reference
true_function <- 10 + 5 * x_test[, 1] + 2 * (x_test[, 1]^2)
points(x_test[, 1], true_function, col = "red", pch = 20)
legend("topleft", legend = c("Neural Network", "True Function"),
       col = c("blue", "red"), pch = c(1, 20))
```
The first MSE value (mean_1) is for the neural network with one hidden layer and the second (mean_2) is for the network with two hidden layers. The two values come out identical only because mean_2 was originally computed from fit_nn1 rather than fit_nn2 (see the note in the code above), so the reported number does not actually measure the two-layer model. The comparison should be re-run with the corrected call before concluding whether the extra layer helps; with only 20 epochs and dropout, it is plausible that the deeper network would not improve the fit much.
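A hedged sketch of the corrected comparison, re-fitting both setups with a fixed torch seed (the 100-epoch count is an arbitrary illustrative choice, not from the original assignment):

```r
# Re-fit both architectures from scratch and compare test MSEs directly
torch_manual_seed(123)
fit_a <- fit(modnn1, data = list(x_train, y_train),
             valid_data = list(x_test, y_test), epochs = 100)
fit_b <- fit(modnn2, data = list(x_train, y_train),
             valid_data = list(x_test, y_test), epochs = 100)
c(one_layer  = mean((as.numeric(predict(fit_a, x_test)) - y_test)^2),
  two_layers = mean((as.numeric(predict(fit_b, x_test)) - y_test)^2))
```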