Machine Learning Assignment 1
Name: Tanmay Rathi

1) Problem 1

Converted the MATLAB starter code to Python:

import numpy as np
import matplotlib.pyplot as plt

def polyreg(x, y, D, xT=None, yT=None, plot_enabled=True):
    """
    Finds a D-1 order polynomial fit to the data

    Args:
        x: vector of input scalars for training
        y: vector of output scalars for training
        D: the order plus one of the polynomial being fit
        xT: vector of input scalars for testing (optional)
        yT: vector of output scalars for testing (optional)

    Returns:
        err: average squared loss on training
        model: vector of polynomial parameter coefficients
        errT: average squared loss on testing (optional)
    """
    # Build the power matrix with columns x**D, x**(D-1), ..., x**1
    xx = np.zeros((len(x), D))
    for i in range(D):
        xx[:, i] = x ** (D - i)
    # Least-squares fit via the pseudo-inverse
    model = np.linalg.pinv(xx).dot(y)
    err = (1 / (2 * len(x))) * np.sum((y - xx.dot(model)) ** 2)
    if xT is not None and yT is not None:
        xxT = np.zeros((len(xT), D))
        for i in range(D):
            xxT[:, i] = xT ** (D - i)
        errT = (1 / (2 * len(xT))) * np.sum((yT - xxT.dot(model)) ** 2)
    else:
        errT = None
    if plot_enabled:
        q = np.arange(min(x), max(x), (max(x) - min(x)) / 300)
        qq = np.zeros((len(q), D))
        for i in range(D):
            qq[:, i] = q ** (D - i)
        plt.plot(x, y, 'X')
        # plt.hold(True)
        # plt.gca().set_prop_cycle(None)
        if xT is not None and yT is not None:
            plt.plot(xT, yT, 'co')
        plt.plot(q, qq.dot(model), 'r')
        plt.show()
    return err, model, errT

# Example usage (no plot):
x = 3 * (np.random.rand(50) - 0.5)
y = x ** 3 - x + np.random.rand(len(x))
err, model, errT = polyreg(x, y, 4, plot_enabled=False)

# Example usage (with plot):
x = 3 * (np.random.rand(50) - 0.5)
y = x ** 3 - x + np.random.rand(len(x))
err, model, errT = polyreg(x, y, 4)

My Code Starts Here

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
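As an aside (my addition, not part of the assignment code): the power-matrix loops in polyreg can be built without an explicit loop. A minimal sketch, assuming the same column convention x**D, ..., x**1 used above:

import numpy as np

def power_matrix(x, D):
    # np.vander(x, D) has columns x**(D-1), ..., x**0; multiplying every
    # row by x shifts each power up by one, matching polyreg's columns.
    return x[:, None] * np.vander(x, D)

# Quick check against the loop version:
x = np.linspace(-1, 1, 5)
xx = np.zeros((len(x), 3))
for i in range(3):
    xx[:, i] = x ** (3 - i)
assert np.allclose(power_matrix(x, 3), xx)

Similarly, np.linalg.lstsq(xx, y, rcond=None)[0] would return the same least-squares coefficients as np.linalg.pinv(xx).dot(y), and is usually the numerically preferred call.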
Loading the dataset from the .mat file using loadmat:

from scipy.io import loadmat
import pandas as pd

data1 = loadmat('/content/drive/MyDrive/Assignments/Machine Learning/Homework1/problem1.mat')
x1_data = data1['x'].squeeze()
y1_data = data1['y'].squeeze()

print('Type of x1:', type(x1_data))
print('Shape of x1:', x1_data.shape)
print('Type of y1:', type(y1_data))
print('Shape of y1:', y1_data.shape)

Type of x1: <class 'numpy.ndarray'>
Shape of x1: (500,)
Type of y1: <class 'numpy.ndarray'>
Shape of y1: (500,)

Using the standard sklearn library to split the dataset into training and testing sets, so the model can be validated later:

from sklearn.model_selection import train_test_split

x1_train, x1_test, y1_train, y1_test = train_test_split(x1_data, y1_data, test_size=0.33, random_state=44)

print('Type of x1_train:', type(x1_train))
print('Shape of x1_train:', x1_train.shape)
print('Type of x1_test:', type(x1_test))
print('Shape of x1_test:', x1_test.shape)

Type of x1_train: <class 'numpy.ndarray'>
Shape of x1_train: (335,)
Type of x1_test: <class 'numpy.ndarray'>
Shape of x1_test: (165,)

print('Type of y1_train:', type(y1_train))
print('Shape of y1_train:', y1_train.shape)
print('Type of y1_test:', type(y1_test))
print('Shape of y1_test:', y1_test.shape)

Type of y1_train: <class 'numpy.ndarray'>
Shape of y1_train: (335,)
Type of y1_test: <class 'numpy.ndarray'>
Shape of y1_test: (165,)

Plotting the hypothesis with the degree of the polynomial set to 4:
err_1, model_1, errT_1 = polyreg(x1_train, y1_train, 4, x1_test, y1_test)

This method gives the best value of D based on the testing error (errT): the best d (degree of polynomial) is the degree for which the test error is minimum.

def getBestD(x, y, degree=5, plot_enabled=False):
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=44)
    train_err = []
    test_err = []
    min_test_err = float('inf')
    bestD = None
    for d in range(degree):
        err, model, errT = polyreg(x_train, y_train, d, x_test, y_test, plot_enabled)
        train_err.append(err)
        test_err.append(errT)
        if errT < min_test_err:
            min_test_err = errT
            bestD = d
    return bestD, train_err, test_err

These plots show the hypothesis and the fit for values of d (degree) from 0 to 9:

bestD, train_err, test_err = getBestD(x1_data, y1_data, 10, True)
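As a side note (my addition), the bookkeeping inside getBestD is equivalent to taking an argmin over the collected test errors:

import numpy as np

# Same result as the min-tracking in getBestD: ties break toward the
# lower degree because argmin returns the first minimum.
bestD = int(np.argmin(test_err))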
From the plots above, once the model is sufficiently complex (a high enough degree), it fits the data well. The Error vs. Degree graph helps visualize this and pick a reasonable value of the degree:

plt.plot(range(10), train_err, label='Training Error', color='b', marker='o')
plt.plot(range(10), test_err, label='Testing Error', color='g', marker='x')
plt.xlabel('Degree (Order of Polynomial)')
plt.ylabel('Error (MSE)')
plt.legend()
plt.show()
Let us check the best value of d based on the error:

bestD, train_err, test_err = getBestD(x1_data, y1_data, 10, False)
bestD

7

Ans 1: Degree 7 seems reasonable.

Let us plot the hypothesis and see the fit for values of d around 7:

err_1, model_1, errT_1 = polyreg(x1_train, y1_train, 7, x1_test, y1_test, True)
err_1, model_1, errT_1 = polyreg(x1_train, y1_train, 30, x1_test, y1_test, True)
err_1, model_1, errT_1 = polyreg(x1_train, y1_train, 50, x1_test, y1_test, True)
Let me create a method to check the degree vs. error for higher values of d:

def plotDError(x, y, d=20):
    bestD, train_err, test_err = getBestD(x, y, d, False)
    plt.figure(figsize=(12, 5))
    plt.plot(range(d), train_err, label='Training Error', color='b', marker='o')
    plt.plot(range(d), test_err, label='Testing Error', color='g', marker='x')
    plt.xlabel('Degree (Order of Polynomial)')
    plt.ylabel('Error (MSE)')
    plt.legend()
    plt.show()
    return bestD

For an extremely high value of d (up to 100), let's check the graph:

min50 = plotDError(x1_data, y1_data, 100)
For higher values of d (around 30 and up), as the model gets more complex, the fit degrades on both the training and the testing (new) data. We clearly don't need such a complex model: the data has few features, and such a high degree is overkill.

bestD, train_err, test_err = getBestD(x1_data, y1_data, 10, False)

plt.figure(figsize=(12, 5))
plt.plot(range(5, 9), train_err[5:9], label='Training Error', color='b', marker='o')
plt.plot(range(5, 9), test_err[5:9], label='Testing Error', color='g', marker='x')
plt.xlabel('Degree (Order of Polynomial)')
plt.ylabel('Error (MSE)')
plt.legend()
plt.show()
I think 7 is a reasonable and empirically evident best choice for the degree: the test error is minimal at 7 in the graph above and increases again for higher values of d.

test_err

[2.2605326103477378e+23, 1.1992557056135127e+23, 1.1974814142831374e+23, 1.8533468895113963e+22, 1.8637171387103841e+22, 1.512890252039879e+21, 1.515472147841272e+21, 1.3243572740033636e+21, 1.3292206337282204e+21, 1.3253831846689917e+21]

for i in range(5, 9):
    print(f'For i={i}, Loss = {test_err[i]}')

For i=5, Loss = 1.512890252039879e+21
For i=6, Loss = 1.515472147841272e+21
For i=7, Loss = 1.3243572740033636e+21
For i=8, Loss = 1.3292206337282204e+21

min50 = plotDError(x1_data, y1_data, 50)
min50

7
2) Problem 2

data2 = loadmat('/content/drive/MyDrive/Assignments/Machine Learning/Homework1/problem2.mat')
x2_data = data2['x']
y2_data = data2['y']

This method returns the model, the training error, and the testing error (if testing data is provided). It applies regularization to avoid large values of theta (the parameters) on data with multiple features.

def multivariateRegWithRegularization(x, y, l=0, xT=None, yT=None):
    """
    Finds a best fit to the multivariate dataset with regularization

    Args:
        x: matrix of input scalars for training
        y: vector of output scalars for training
        l: hyperparameter being used for regularization
        xT: matrix of input scalars for testing (optional)
        yT: vector of output scalars for testing (optional)

    Returns:
        err: average squared loss on training
        model: vector of multivariate parameter coefficients
        errT: average squared loss on testing (optional)
    """
    # m -> number of training examples
    m_train = x.shape[0]
    # n -> number of features
    n = x.shape[1]
    # Regularized normal equation
    model = np.linalg.inv(x.T.dot(x) + (l * np.identity(x.shape[-1]))).dot(x.T).dot(y)
    norm_model = np.linalg.norm(model)
    err = (1 / (2 * m_train)) * (np.sum(np.power(y - x.dot(model), 2)) + l * norm_model * norm_model)
    if xT is not None and yT is not None:
        errT = (1 / (2 * xT.shape[0])) * (np.sum(np.power(yT - xT.dot(model), 2)) + l * norm_model * norm_model)
    else:
        errT = None
    return err, model, errT

Instead of using the train/test split, I use k-fold cross-validation, since the problem statement mentioned 2 folds.
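For reference (my addition), the closed form implemented above is exactly the minimizer of the regularized squared loss that err computes:

J(\theta) = \frac{1}{2m}\left(\lVert y - X\theta \rVert^2 + \lambda \lVert \theta \rVert^2\right), \qquad \theta^* = (X^\top X + \lambda I)^{-1} X^\top y

Numerically, the same theta can be obtained without forming the explicit inverse; a minimal sketch, assuming the same x, y, l as in the function body:

# Solve the linear system (X^T X + l*I) theta = X^T y directly;
# behavior-equivalent to the np.linalg.inv(...) line above.
model = np.linalg.solve(x.T.dot(x) + l * np.identity(x.shape[-1]), x.T.dot(y))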
from sklearn.model_selection import KFold

kf = KFold(n_splits=2, shuffle=True, random_state=44)

This method gives the best value of lambda (the regularization constant) for the given data:

def getBestLambda(x, y, l=10, n_folds=2):
    train_err = []
    test_err = []
    min_test_err = float('inf')
    bestLambda = None
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=44)
    for a in range(l + 1):
        lambda_train_err = []
        lambda_test_err = []
        for train_index, test_index in kf.split(x):
            x_train, x_test = x[train_index], x[test_index]
            y_train, y_test = y[train_index], y[test_index]
            err, model, errT = multivariateRegWithRegularization(x_train, y_train, a, x_test, y_test)
            lambda_train_err.append(err)
            lambda_test_err.append(errT)
        avg_train_err = np.mean(lambda_train_err)
        avg_test_err = np.mean(lambda_test_err)
        train_err.append(avg_train_err)
        test_err.append(avg_test_err)
        if avg_test_err < min_test_err:
            min_test_err = avg_test_err
            bestLambda = a
    return bestLambda, train_err, test_err

bestLambda, train_err, test_err = getBestLambda(x2_data, y2_data, 1000)
bestLambda

1000

Let us plot lambda (the regularization constant) vs. the total error (regularization + MSE) to get an idea of what's going on:
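As a sanity check (my addition, hypothetical and not part of the submitted assignment), a similar sweep could be run with scikit-learn. Note that sklearn's Ridge scales its penalty differently from the (1/2m) convention above, so the selected value need not match exactly:

from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# alpha plays the role of lambda; fit_intercept=False matches the
# normal-equation code above, which has no separate bias term.
search = GridSearchCV(
    Ridge(fit_intercept=False),
    {'alpha': list(range(1, 1001, 50))},
    scoring='neg_mean_squared_error',
    cv=KFold(n_splits=2, shuffle=True, random_state=44),
)
# search.fit(x2_data, y2_data.ravel()); search.best_params_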
def plotLambdaError(x, y, l=20):
    bestLambda, train_err, test_err = getBestLambda(x, y, l)
    plt.figure(figsize=(12, 5))
    plt.plot(range(l + 1), train_err, label='Training Error', color='b', marker='o')
    plt.plot(range(l + 1), test_err, label='Testing Error', color='g', marker='x')
    plt.xlabel('Lambda (Hyperparameter for Regularization)')
    plt.ylabel('Error (MSE) + Reg Error')
    plt.legend()
    plt.show()
    return bestLambda, train_err, test_err

bestAlpha1000, train_err, test_err = plotLambdaError(x2_data, y2_data, 1000)

With a higher value of lambda, the model fits the training data less closely, so the training error increases as lambda grows. At the same time, the testing error decreases, because the model generalizes better to data outside its training set.

Although the empirically best value of lambda is the highest value tried, 1000 (where the testing error is lowest), I think a tradeoff between training and testing error makes more sense. So my answer for the best value is somewhere between 600 and 700, where the change in error is very small and the curve plateaus.

bestAlpha1000

1000

3) Problem 3

Added the derivations as PDFs.
Machine Learning Assignment 1, Problem 4
Name: Tanmay Rathi

import numpy as np
import matplotlib.pyplot as plt
import math

Loading the data from dataset4:

from scipy.io import loadmat

data = loadmat('/content/dataset4.mat')
X = data['X'].squeeze()
y = data['Y'].squeeze()

The column of X at index 2 is all 1's (I'm assuming it is for the bias term, w0). Rearranging the columns so the 1's vector is the first column of X, for consistency with the convention given in the class notes:

X = X[:, [2, 0, 1]]

This method calculates the activation (the logistic / squashing function) for a given value of z:

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

The compute_gradient method calculates dj_dw (the gradient), which is used to update the parameters w in a given iteration of gradient descent. It also returns the error (the empirical risk) and the binary classification error.

def compute_gradient(X, y, w):
    f_wb = sigmoid(np.dot(X, w))
    # epsilon keeps the logs finite when f_wb is exactly 0 or 1
    epsilon = 1e-5
    err = -np.mean(y * np.log(f_wb + epsilon) + (1 - y) * np.log(1 - f_wb + epsilon))
    predictions = (f_wb >= 0.5).astype(int)
    classification_err = np.mean(predictions != y)
    dj_dw = np.dot(X.T, (f_wb - y)) / X.shape[0]
    return dj_dw, err, classification_err

The gradient_descent method returns the optimized parameters w, obtained by repeatedly updating w with the gradient from compute_gradient() multiplied by the learning rate (alpha). It also returns the cost history and the classification-error history over the iterations (num_iters). The tolerance is a small cutoff value used to stop gradient descent early, indicating convergence.
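For reference (my addition), compute_gradient implements the cross-entropy loss and its gradient; up to the epsilon smoothing, in LaTeX:

J(w) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log \sigma(x_i^\top w) + (1 - y_i)\log\left(1 - \sigma(x_i^\top w)\right)\right], \qquad \nabla_w J = \frac{1}{m} X^\top\left(\sigma(Xw) - y\right)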
def gradient_descent(X, y, w, gradient_function, alpha, num_iters, tolerance=0.0001, show_stats=False):
    J_history = []
    classification_error_history = []
    for i in range(num_iters):
        dj_dw, error, classification_error = gradient_function(X, y, w)
        w -= alpha * dj_dw
        J_history.append(error)
        classification_error_history.append(classification_error)
        # Stop early once the change in cost between iterations falls below the tolerance
        if i > 0 and abs(J_history[i - 1] - J_history[i]) < tolerance:
            break
        if show_stats and (i % math.ceil(num_iters / 10) == 0 or i == (num_iters - 1)):
            print(f"Iteration {i:7}: Cost {float(J_history[-1]):8.2f}, Classification Error {classification_error:.6f}")
    return w, J_history, classification_error_history

Some plotting functions:

def plot_data(X, y, pos_label="y=1", neg_label="y=0"):
    positive = y == 1
    negative = y == 0
    plt.plot(X[positive, 0], X[positive, 1], 'k+', label=pos_label)
    plt.plot(X[negative, 0], X[negative, 1], 'bo', label=neg_label)
    plt.legend(loc='upper right')

The derivation of the y values for the decision boundary is given in the handwritten scanned pages uploaded above.

def plot_decision_boundary(w, X, y):
    plot_data(X[:, 1:], y)
    x_values = [min(X[:, 1]), max(X[:, 1])]
    y_values = [(-w[0] - w[1] * x) / w[2] for x in x_values]
    plt.plot(x_values, y_values, 'g-', label="Decision Boundary")
    plt.legend(loc='upper right')

Testing the algorithm with different values of the hyperparameters and initializations:

np.random.seed(1)
initial_w = 0.01 * (np.random.rand(X.shape[1]) - 0.5)
iterations = 10000
alpha = 0.001
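In short (my addition, consistent with the code above): each step applies the update rule w \leftarrow w - \alpha \nabla_w J(w), and the decision boundary is where \sigma(w^\top x) = 0.5, i.e. w_0 + w_1 x_1 + w_2 x_2 = 0. plot_decision_boundary solves this for the second feature:

x_2 = \frac{-(w_0 + w_1 x_1)}{w_2}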
tolerance = 0.0001
w, J_history, CE_history = gradient_descent(X, y, initial_w, compute_gradient, alpha, iterations, tolerance, True)
plot_decision_boundary(w, X, y)

Iteration 0: Cost 0.69, Classification Error 0.815000

Because the tolerance was too big, gradient descent stopped almost immediately. The error is still very high and the decision boundary is far from ideal; a smaller tolerance is needed for the algorithm to run to convergence.

# np.random.seed(1)
# initial_w = 0.01 * (np.random.rand(X.shape[1]) - 0.5)
# iterations = 10000
# alpha = 0.001
tolerance = 0.000001
w, J_history, CE_history = gradient_descent(X, y, initial_w, compute_gradient, alpha, iterations, tolerance, True)
plot_decision_boundary(w, X, y)
Iteration    0: Cost 0.69, Classification Error 0.800000
Iteration 1000: Cost 0.65, Classification Error 0.120000
Iteration 2000: Cost 0.62, Classification Error 0.125000
Iteration 3000: Cost 0.59, Classification Error 0.125000
Iteration 4000: Cost 0.57, Classification Error 0.125000
Iteration 5000: Cost 0.55, Classification Error 0.125000
Iteration 6000: Cost 0.53, Classification Error 0.120000
Iteration 7000: Cost 0.52, Classification Error 0.120000
Iteration 8000: Cost 0.51, Classification Error 0.120000
Iteration 9000: Cost 0.50, Classification Error 0.120000
Iteration 9999: Cost 0.49, Classification Error 0.120000

The model is underfitting, and the decision boundary does not fit the data. This is evident from the classification error, which plateaus after some iterations. The rate of change of the loss is too low, so I increase the learning rate alpha.

alpha = 0.01
initial_w = np.zeros(X.shape[1])
tolerance = 0.00000001
w, J_history, CE_history = gradient_descent(X, y, initial_w, compute_gradient, alpha, iterations, tolerance, True)
plot_decision_boundary(w, X, y)
Iteration    0: Cost 0.69, Classification Error 0.500000
Iteration 1000: Cost 0.49, Classification Error 0.120000
Iteration 2000: Cost 0.43, Classification Error 0.120000
Iteration 3000: Cost 0.40, Classification Error 0.120000
Iteration 4000: Cost 0.39, Classification Error 0.125000
Iteration 5000: Cost 0.37, Classification Error 0.125000
Iteration 6000: Cost 0.36, Classification Error 0.125000
Iteration 7000: Cost 0.35, Classification Error 0.135000
Iteration 8000: Cost 0.34, Classification Error 0.140000
Iteration 9000: Cost 0.34, Classification Error 0.140000
Iteration 9999: Cost 0.33, Classification Error 0.130000

For this value of alpha and tolerance, more iterations are needed for the model to converge and for the classification error to drop. So the number of iterations is increased, giving gradient descent enough small steps to reach an optimized value of w.

alpha = 0.01
initial_w = np.zeros(X.shape[1])
iterations = 1000000
w, J_history, CE_history = gradient_descent(X, y, initial_w, compute_gradient, alpha, iterations, tolerance, True)
plot_decision_boundary(w, X, y)
Iteration      0: Cost 0.69, Classification Error 0.500000
Iteration 100000: Cost 0.17, Classification Error 0.040000
Iteration 200000: Cost 0.13, Classification Error 0.020000
Iteration 300000: Cost 0.11, Classification Error 0.015000
Iteration 400000: Cost 0.09, Classification Error 0.010000
Iteration 500000: Cost 0.08, Classification Error 0.010000
Iteration 600000: Cost 0.08, Classification Error 0.010000
Iteration 700000: Cost 0.07, Classification Error 0.010000
Iteration 800000: Cost 0.07, Classification Error 0.010000
Iteration 900000: Cost 0.06, Classification Error 0.010000
Iteration 999999: Cost 0.06, Classification Error 0.010000

With a higher number of iterations, the model seems to fit well. Let me try starting with random values for the parameters w:

alpha = 0.01
initial_w = 0.01 * (np.random.rand(X.shape[1]) - 0.5)
iterations = 1000000
w, J_history, CE_history = gradient_descent(X, y, initial_w, compute_gradient, alpha, iterations, tolerance, True)
plot_decision_boundary(w, X, y)

Iteration      0: Cost 0.69, Classification Error 0.505000
Iteration 100000: Cost 0.17, Classification Error 0.040000
Iteration 200000: Cost 0.13, Classification Error 0.020000
Iteration 300000: Cost 0.11, Classification Error 0.015000
Iteration 400000: Cost 0.09, Classification Error 0.010000
Iteration 500000: Cost 0.08, Classification Error 0.010000
Iteration 600000: Cost 0.08, Classification Error 0.010000
Iteration 700000: Cost 0.07, Classification Error 0.010000
Iteration 800000: Cost 0.07, Classification Error 0.010000
Iteration 900000: Cost 0.06, Classification Error 0.010000
Iteration 999999: Cost 0.06, Classification Error 0.010000

w

array([-14.46400209,  35.8759321 ,  20.4542973 ])

NAME: TANMAY ANIL RATHI
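For completeness (my addition), the learned weights can be turned into hard predictions the same way compute_gradient does internally; a minimal sketch using the X, y, w already in scope:

# Threshold the sigmoid outputs at 0.5, as in compute_gradient.
preds = (sigmoid(X.dot(w)) >= 0.5).astype(int)
train_accuracy = np.mean(preds == y)
# Should be about 0.99, matching the final classification error of 0.01 above.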