HW2p_uub0hQq

School: Arizona State University
Course: 391L
Subject: Computer Science
Date: Feb 20, 2024
Pages: 8

February 6, 2024

[429]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
import time
from joblib import Parallel, delayed

1 Problem 1

1.1 Dataset Generation

Write a function to generate a training set of size $m$:
- randomly generate a weight vector $w^* \in \mathbb{R}^{10}$ and normalize it to unit length
- generate a training set $\{(x_i, y_i)\}$ of size $m$
- $x_i$: a random vector in $\mathbb{R}^{10}$ drawn from $\mathcal{N}(0, I)$
- $y_i \in \{0, +1\}$ with $P[y_i = +1] = \sigma(w^* \cdot x_i)$ and $P[y_i = 0] = 1 - \sigma(w^* \cdot x_i)$

[430]:
def get_w_star(vector_size):
    # draw a standard-normal vector and scale it to unit Euclidean length
    normal_dist_vector = np.random.normal(0, 1, vector_size)
    euclidean_norm = np.sqrt(np.sum(np.square(normal_dist_vector)))
    normalized_vector = normal_dist_vector / euclidean_norm
    return normalized_vector

def sigmoid_function(z):
    return 1 / (1 + np.exp(-z))

def euclidean_distance(v1, v2):
    return np.sqrt(np.sum(np.square(v1 - v2)))

[431]:
def generate_data(m):
    w_star = get_w_star(10)
    X = np.random.normal(0, 1, (m, 10))
    probabilities = sigmoid_function(np.dot(X, w_star))
    # y_i = 1 exactly when a Uniform(0, 1) draw falls at or below sigma(w* . x_i)
    uniform_draw = np.random.uniform(0, 1, m)
    Y = (uniform_draw <= probabilities).astype(int)
    return w_star, X, Y  # returns the true w as well as X, Y data
# generates consistent training data for each repetition of each training size m;
# all algorithms estimate the same true weight vector in each repetition/m pair,
# standardizing performance comparisons
def generate_datasets_for_repetitions_of_m(training_sizes, repetitions):
    datasets = {m: [generate_data(m) for _ in range(repetitions)] for m in training_sizes}
    return datasets

[432]:
# runs a single repetition of a given algorithm (used to parallelize repetitions);
# returns the distance and duration of the repetition
def run_single_repetition(algorithm, dataset):
    w_star, X, Y = dataset
    start = time.time()
    w_prime = algorithm(X, Y)
    distance = euclidean_distance(w_star, w_prime)
    duration = time.time() - start  # times training of w' and the distance calculation
    return distance, duration

# runs the passed algorithm, parallelizing the repetitions for a given m;
# returns the average distance and duration over all repetitions for that training size
def run_algorithm(algorithm, datasets, m):
    results = Parallel(n_jobs=-1)(
        delayed(run_single_repetition)(algorithm, dataset) for dataset in datasets[m]
    )
    distances, times = zip(*results)
    return np.mean(distances), np.mean(times)

[433]:
# experiment parameters
training_sizes = [50, 100, 150, 200, 250]
repetitions = 10

# consistent datasets
datasets = generate_datasets_for_repetitions_of_m(training_sizes, repetitions)

# results dict: key = regression method,
# value = list of (training size, avg distance, avg time per repetition)
size_distance_time = {}

# total time per experiment
total_time = {}
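generate_datasets_for_repetitions_of_m draws from NumPy's global random state without a seed, so the distance table changes from run to run. A minimal sketch, assuming reproducible datasets are wanted: the hypothetical generate_datasets_seeded returns the same {m: [(w_star, X, Y), ...]} structure, but derives an independent Generator for every (m, repetition) pair from one seed via np.random.SeedSequence.spawn. Labels are drawn with rng.binomial(1, p), which samples the same Bernoulli($\sigma(w^* \cdot x_i)$) distribution as the uniform-threshold comparison in generate_data.

```python
import numpy as np

def generate_data_seeded(m, rng, d=10):
    # hypothetical helper: same model as generate_data, every draw taken from `rng`
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)          # unit-length true weight vector
    X = rng.standard_normal((m, d))           # rows x_i ~ N(0, I)
    p = 1.0 / (1.0 + np.exp(-(X @ w_star)))   # P[y_i = 1] = sigma(w* . x_i)
    Y = rng.binomial(1, p)                    # one Bernoulli(p_i) label per row
    return w_star, X, Y

def generate_datasets_seeded(training_sizes, repetitions, seed=0):
    # hypothetical helper: one independent child stream per (m, repetition) pair,
    # all derived deterministically from `seed`
    children = np.random.SeedSequence(seed).spawn(len(training_sizes) * repetitions)
    rngs = iter([np.random.default_rng(c) for c in children])
    return {m: [generate_data_seeded(m, next(rngs)) for _ in range(repetitions)]
            for m in training_sizes}

seeded_datasets = generate_datasets_seeded([50, 100, 150, 200, 250], repetitions=10, seed=2024)
```

Because the return value has the same shape as the original dictionary, it can be passed to run_algorithm unchanged.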
1.2 Algorithm 1: logistic regression

The goal is to learn $w^*$. Algorithm 1 is logistic regression (you may use the built-in method LogisticRegression for this; use max_iter=1000).

[434]:
def logistic_regression(X, Y):
    log_regression = linear_model.LogisticRegression(solver='lbfgs', max_iter=1000)
    log_regression.fit(X, Y)
    w_prime = log_regression.coef_[0]  # learned weight vector (intercept excluded)
    return w_prime

[435]:
experiment_start = time.time()
for m in training_sizes:
    d, t = run_algorithm(logistic_regression, datasets, m)
    size_distance_time.setdefault("log_regression", []).append((m, d, t))
experiment_end = time.time()
experiment_duration = experiment_end - experiment_start
total_time["log_regression"] = experiment_duration

1.3 Algorithm 2: gradient descent with square loss

Define the square loss as

$$L_i(w^{(t)}) = \frac{1}{2}\left(\sigma(w^{(t)} \cdot x_i) - y_i\right)^2$$

Algorithm 2 is gradient descent with respect to the square loss (code this up yourself: run for 1000 iterations with step size eta = 0.01).

[436]:
def calculate_gd_gradient(inputs, labels, weights):
    # per-point gradient (sigma - y) * sigma * (1 - sigma) * x_i, averaged over all points
    sigmoid_output = sigmoid_function(np.dot(inputs, weights))
    sigmoid_derivative = sigmoid_output * (1 - sigmoid_output)
    errors = sigmoid_output - labels
    gradient = np.dot(inputs.T, errors * sigmoid_derivative) / len(labels)
    return gradient

def gradient_descent(X, Y, step_size=.01, iterations=1000):
    w_prime = np.zeros(X.shape[1])
    for _ in range(iterations):
        gradient = calculate_gd_gradient(X, Y, w_prime)
        w_prime -= step_size * gradient
    return w_prime
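By the chain rule, with $\sigma'(z) = \sigma(z)\left(1 - \sigma(z)\right)$, the gradient of the square loss at one point is

$$\nabla_w L_i(w) = \left(\sigma(w \cdot x_i) - y_i\right)\sigma(w \cdot x_i)\left(1 - \sigma(w \cdot x_i)\right)x_i,$$

which is the quantity calculate_gd_gradient averages over the $m$ points. A finite-difference check is a cheap way to confirm a hand-coded gradient; the sketch below (function names are illustrative, data is random) compares that formula against central differences of the mean square loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_square_loss(X, Y, w):
    # (1/m) * sum_i 0.5 * (sigma(w . x_i) - y_i)^2
    return 0.5 * np.mean((sigmoid(X @ w) - Y) ** 2)

def analytic_gradient(X, Y, w):
    # same formula as calculate_gd_gradient
    s = sigmoid(X @ w)
    return X.T @ ((s - Y) * s * (1 - s)) / len(Y)

def numerical_gradient(X, Y, w, h=1e-6):
    # central differences, one coordinate at a time
    grad = np.zeros_like(w)
    for j in range(len(w)):
        e = np.zeros_like(w)
        e[j] = h
        grad[j] = (mean_square_loss(X, Y, w + e) - mean_square_loss(X, Y, w - e)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
Y = rng.integers(0, 2, size=50)
w = rng.standard_normal(10)
max_err = np.max(np.abs(analytic_gradient(X, Y, w) - numerical_gradient(X, Y, w)))
print(f"max abs difference: {max_err:.2e}")
```

A difference many orders of magnitude below the gradient entries indicates the analytic formula is right; a sign or factor error would show up as a difference comparable to the gradient itself.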
[437]:
experiment_start = time.time()
for m in training_sizes:
    d, t = run_algorithm(gradient_descent, datasets, m)
    size_distance_time.setdefault('gradient_descent', []).append((m, d, t))
experiment_end = time.time()
experiment_duration = experiment_end - experiment_start
total_time["gradient_descent"] = experiment_duration

1.4 Algorithm 3: stochastic gradient descent with square loss

Similar to gradient descent, except that each iteration uses the gradient at a single, randomly chosen training point.

[438]:
def calculate_sgd_gradient(x, label, weights):
    # x is a (1, d) slice; returns the (d,) square-loss gradient at that single point
    sigmoid_output = sigmoid_function(np.dot(x, weights))
    sigmoid_derivative = sigmoid_output * (1 - sigmoid_output)
    errors = sigmoid_output - label
    gradient = np.dot(x.T, errors * sigmoid_derivative)
    return gradient

def stochastic_gradient_descent(X, Y, step_size=.01, iterations=1000):
    w_prime = np.zeros(X.shape[1])
    for _ in range(iterations):
        i = np.random.randint(0, len(X))  # pick one training point uniformly at random
        x_i = X[i:i + 1]
        y_i = Y[i]
        gradient = calculate_sgd_gradient(x_i, y_i, w_prime)
        w_prime -= step_size * gradient
    return w_prime

[439]:
experiment_start = time.time()
for m in training_sizes:
    d, t = run_algorithm(stochastic_gradient_descent, datasets, m)
    size_distance_time.setdefault('stochastic_gradient_descent', []).append((m, d, t))
experiment_end = time.time()
experiment_duration = experiment_end - experiment_start
total_time["stochastic_gradient_descent"] = experiment_duration
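The SGD step above passes the (1, d) slice X[i:i+1], so the margin is a length-1 array and the transpose-and-dot maps the (d, 1) and (1,) shapes back to a (d,) gradient. Indexing with X[i] instead gives a 1-D row, a scalar margin, and a gradient that is just a scalar times $x_i$. The sketch below uses hypothetical function names and a seeded Generator in place of np.random.randint, so both versions visit the same indices, and checks that the two forms produce the same weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_slice(X, Y, rng, step_size=0.01, iterations=1000):
    # mirrors stochastic_gradient_descent: x_i is a (1, d) slice
    w = np.zeros(X.shape[1])
    for _ in range(iterations):
        i = rng.integers(len(X))
        x_i = X[i:i + 1]
        s = sigmoid(np.dot(x_i, w))                       # shape (1,)
        w -= step_size * np.dot(x_i.T, (s - Y[i]) * s * (1 - s))
    return w

def sgd_row(X, Y, rng, step_size=0.01, iterations=1000):
    # same update with a 1-D row: the margin and its coefficient are scalars
    w = np.zeros(X.shape[1])
    for _ in range(iterations):
        i = rng.integers(len(X))
        s = sigmoid(X[i] @ w)
        w -= step_size * (s - Y[i]) * s * (1 - s) * X[i]
    return w

data_rng = np.random.default_rng(0)
X = data_rng.standard_normal((100, 10))
Y = data_rng.integers(0, 2, size=100)
w_a = sgd_slice(X, Y, np.random.default_rng(1))
w_b = sgd_row(X, Y, np.random.default_rng(1))
print(np.allclose(w_a, w_b))
```

The two differ only in floating-point evaluation order, which is why the comparison uses np.allclose rather than exact equality.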
1.5 Evaluation

Measure the error $\|w^* - \hat{w}\|_2$ for each method at different sample sizes. For any fixed value of $m$, choose many different $w^*$'s and average the values $\|w^* - \hat{w}\|_2$ for Algorithms 1, 2, and 3. Plot the results for each algorithm as $m$ grows (use $m = 50, 100, 150, 200, 250$). Also record, for each algorithm, the time taken to run the overall experiment.

[440]:
algorithms = list(size_distance_time.keys())
fig, axs = plt.subplots(1, 2, figsize=(12, 6))

for algorithm in algorithms:
    m, d, t = zip(*size_distance_time[algorithm])
    axs[0].plot(m, d, label=algorithm.replace('_', ' ').title())
    axs[1].plot(m, t, label=algorithm.replace('_', ' ').title())

axs[0].set_xlabel("Training Size (m)")
axs[0].set_ylabel("Average Euclidean Distance (||w*-w'||)")
axs[0].set_title('Comparison of Algorithm Performance (distance vs size)')
axs[0].grid(True)
axs[0].legend()  # a bare plt.legend() would label only the last (right-hand) panel

axs[1].set_xlabel("Training Size (m)")
axs[1].set_ylabel("Average Time for Training (seconds per repetition)")
axs[1].set_title('Comparison of Algorithm Performance (time vs size)')
axs[1].grid(True)
axs[1].legend()

plt.tight_layout()
plt.show()

print("Average Distance and Duration for Each Algorithm and Training Size:")
for algorithm in algorithms:
    print(f"\n{algorithm.replace('_', ' ').title()}:")
    for m, d, t in size_distance_time[algorithm]:
        print(f"Training Size {m}: Distance {d:.4f}, {t:.4f} seconds.")

print("\nTotal Training Time for Each Algorithm (all Repetitions (running in parallel) of all Training Sizes)")
for algorithm in algorithms:
    print(f"{algorithm.replace('_', ' ').title()}: {total_time[algorithm]:.4f} seconds.")
Average Distance and Duration for Each Algorithm and Training Size:

Log Regression:
Training Size 50: Distance 0.9229, 0.0029 seconds.
Training Size 100: Distance 0.8648, 0.0027 seconds.
Training Size 150: Distance 0.6204, 0.0015 seconds.
Training Size 200: Distance 0.5869, 0.0016 seconds.
Training Size 250: Distance 0.4129, 0.0015 seconds.

Gradient Descent:
Training Size 50: Distance 0.7107, 0.0176 seconds.
Training Size 100: Distance 0.6777, 0.0184 seconds.
Training Size 150: Distance 0.6552, 0.0228 seconds.
Training Size 200: Distance 0.6629, 0.0247 seconds.
Training Size 250: Distance 0.6372, 0.0303 seconds.

Stochastic Gradient Descent:
Training Size 50: Distance 0.7213, 0.0234 seconds.
Training Size 100: Distance 0.6862, 0.0239 seconds.
Training Size 150: Distance 0.6641, 0.0219 seconds.
Training Size 200: Distance 0.6736, 0.0197 seconds.
Training Size 250: Distance 0.6294, 0.0188 seconds.

Total Training Time for Each Algorithm (all Repetitions (running in parallel) of all Training Sizes)
Log Regression: 0.0954 seconds.
Gradient Descent: 0.2533 seconds.
Stochastic Gradient Descent: 0.2300 seconds.
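The per-repetition durations above are in the low milliseconds (about 0.0015 to 0.03 seconds), and the totals are measured around the whole run_algorithm loop, so they also include joblib's scheduling overhead, not just training. time.time() reports wall-clock time, which can jump if the system clock is adjusted; time.perf_counter() is the standard-library clock documented as having the highest available resolution for measuring short durations. A minimal sketch of a hypothetical timing helper built on it:

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper: uses time.perf_counter() instead of time.time()
    to measure a short interval.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

result, elapsed = timed(sum, range(1_000_000))
print(result, f"{elapsed:.6f} s")
```

Inside run_single_repetition this would read `w_prime, duration = timed(algorithm, X, Y)`; unlike the original, that times training only and leaves the distance calculation out.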
2 Problem 2

[445]:
from sklearn import datasets, ensemble, tree  # note: this rebinds `datasets`, which held the Problem 1 data
from sklearn.model_selection import cross_val_score

[446]:
cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target

For each depth in $1, \dots, 5$, instantiate an AdaBoost classifier with the base learner set to be a decision tree of that depth (set n_estimators=10 and learning_rate=1), and then record the 10-fold cross-validated error on the entire breast cancer data set. Plot the resulting curve of accuracy against base classifier depth. Use 101 as your random state for both the base learner and the AdaBoost classifier every time.

[447]:
RANDOM_STATE = 101
n_estimators = 10
learning_rate = 1
max_depth = [i for i in range(1, 6)]
depth_cv_accuracies = []

for depth in max_depth:
    DecisionTree_clf = tree.DecisionTreeClassifier(max_depth=depth, random_state=RANDOM_STATE)
    # the `estimator` keyword requires scikit-learn >= 1.2 (older versions used `base_estimator`)
    AdaBoost_clf = ensemble.AdaBoostClassifier(estimator=DecisionTree_clf,
                                               n_estimators=n_estimators,
                                               learning_rate=learning_rate,
                                               random_state=RANDOM_STATE)
    # mean accuracy over the 10 folds; the 10-fold CV error is 1 - cv_accuracy
    cv_accuracy = cross_val_score(AdaBoost_clf, X, y, cv=10).mean()
    depth_cv_accuracies.append((depth, cv_accuracy))

[448]:
depths, accuracies = zip(*depth_cv_accuracies)
plt.plot(depths, accuracies)
plt.xlabel("Max Depth of Decision Tree")
plt.ylabel("10-Fold CV Accuracy of Decision Tree with AdaBoost")
plt.title("Accuracy with AdaBoost vs Depth of Decision Tree")
plt.xticks(max_depth)
plt.grid(True)
plt.show()