Project 2 - Binary Classification Comparative Methods

For this project we're going to attempt a binary classification of a dataset using multiple methods and compare results.

Our goals for this project will be to introduce you to several of the most common classification techniques, how to perform them and tweak parameters to optimize outcomes, how to produce and interpret results, and how to compare performance. You will be asked to analyze your findings and provide explanations for observed performance.
DEFINITIONS

Binary Classification: In this case a complex dataset has an added 'target' label with one of two options. Your learning algorithm will try to assign one of these labels to the data.

Supervised Learning: This data is fully supervised, which means it's been fully labeled and we can trust the veracity of the labeling.
Submission Details
Project is due May 17th at 12:00 pm (Wednesday Noon). To submit the project, please save the
notebook as a pdf file and submit the assignment via Gradescope. In addition, make sure that all
figures are legible and su
ff
iciently large. For best pdf results, we recommend downloading Latex
and print the notebook using Latex.
Loading Essentials and Helper Functions
In [ ]:
#Here are a set of libraries we imported to complete this assignment.
#Feel free to use these or equivalent libraries for your implementation
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # this is used to plot the graph
import matplotlib
import os
import time

#Sklearn classes
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, KFold
from sklearn import metrics
from sklearn.svm import SVC #SVM classifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
import sklearn.metrics.cluster as smc
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder, Normalizer, MinMaxScaler
from sklearn.compose import ColumnTransformer, make_column_transformer
from matplotlib import pyplot
import itertools

%matplotlib inline

#Sets random seed
import random
random.seed(42)
In [ ]:
# Helper function allowing you to export a graph
def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)
In [ ]:
# Helper function that allows you to draw nicely formatted confusion matrices
def draw_confusion_matrix(y, yhat, classes):
    '''
    Draws a confusion matrix for the given target and predictions
    Adapted from scikit-learn and discussion example.
    '''
    plt.cla()
    plt.clf()
    matrix = confusion_matrix(y, yhat)
    plt.imshow(matrix, interpolation='nearest', cmap=plt.cm.YlOrBr)
    plt.title("Confusion Matrix")
    plt.colorbar()
    num_classes = len(classes)
    plt.xticks(np.arange(num_classes), classes, rotation=90)
    plt.yticks(np.arange(num_classes), classes)
    fmt = 'd'
    thresh = matrix.max() / 2.
    for i, j in itertools.product(range(matrix.shape[0]), range(matrix.shape[1])):
        plt.text(j, i, format(matrix[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if matrix[i, j] > thresh else "black")
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()
    plt.show()
In [ ]:
def heatmap(data, row_labels, col_labels, figsize=(20, 12), cmap="YlGn",
            cbar_kw={}, cbarlabel="", valfmt="{x:.2f}",
            textcolors=("black", "white"), threshold=None):
    """
    Create a heatmap from a numpy array and two lists of labels. Taken from matplotlib example.

    Parameters
    ----------
    data
        A 2D numpy array of shape (M, N).
    row_labels
        A list or array of length M with the labels for the rows.
    col_labels
        A list or array of length N with the labels for the columns.
    cmap
        A string that specifies the colormap to use. Look at matplotlib docs for information.
        Optional.
    cbar_kw
        A dictionary with arguments to `matplotlib.Figure.colorbar`. Optional.
    cbarlabel
        The label for the colorbar. Optional.
    valfmt
        The format of the annotations inside the heatmap. This should either
        use the string format method, e.g. "${x:.2f}", or be a
        `matplotlib.ticker.Formatter`. Optional.
    textcolors
        A pair of colors. The first is used for values below a threshold,
        the second for those above. Optional.
    threshold
        Value in data units according to which the colors from textcolors are
        applied. If None (the default) uses the middle of the colormap as
        the separation point. Optional.
    """
    plt.figure(figsize=figsize)
    ax = plt.gca()

    # Plot the heatmap
    im = ax.imshow(data, cmap=cmap)

    # Create colorbar
    cbar = ax.figure.colorbar(im, ax=ax, **cbar_kw)
    cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")

    # Show all ticks and label them with the respective list entries.
    ax.set_xticks(np.arange(data.shape[1]), labels=col_labels)
    ax.set_yticks(np.arange(data.shape[0]), labels=row_labels)

    # Let the horizontal axes labeling appear on top.
    ax.tick_params(top=True, bottom=False, labeltop=True, labelbottom=False)

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=-30, ha="right", rotation_mode="anchor")

    # Turn spines off and create white grid.
    ax.spines[:].set_visible(False)
    ax.set_xticks(np.arange(data.shape[1] + 1) - .5, minor=True)
    ax.set_yticks(np.arange(data.shape[0] + 1) - .5, minor=True)
    ax.grid(which="minor", color="w", linestyle='-', linewidth=3)
    ax.tick_params(which="minor", bottom=False, left=False)

    # Normalize the threshold to the images color range.
    if threshold is not None:
        threshold = im.norm(threshold)
    else:
        threshold = im.norm(data.max()) / 2.

    # Set default alignment to center, but allow it to be
    # overwritten by textkw.
    kw = dict(horizontalalignment="center", verticalalignment="center")

    # Get the formatter in case a string is supplied
    if isinstance(valfmt, str):
        valfmt = matplotlib.ticker.StrMethodFormatter(valfmt)

    # Loop over the data and create a `Text` for each "pixel".
    # Change the text's color depending on the data.
    texts = []
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            kw.update(color=textcolors[int(im.norm(data[i, j]) > threshold)])
            text = im.axes.text(j, i, valfmt(data[i, j], None), **kw)
            texts.append(text)
In [ ]:
def make_meshgrid(x, y, h=0.02):
    """Create a mesh of points to plot in

    Parameters
    ----------
    x: data to base x-axis meshgrid on
    y: data to base y-axis meshgrid on
    h: stepsize for meshgrid, optional

    Returns
    -------
    xx, yy : ndarray
    """
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy

def plot_contours(clf, xx, yy, **params):
    """Plot the decision boundaries for a classifier.

    Parameters
    ----------
    clf: a classifier
    xx: meshgrid ndarray
    yy: meshgrid ndarray
    params: dictionary of params to pass to contourf, optional
    """
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = plt.contourf(xx, yy, Z, **params)
    return out

def draw_contour(x, y, clf, class_labels=["Negative", "Positive"]):
    """
    Draws a contour line for the predictor

    Assumption that x has only two features. This function only plots the first two columns.
    """
    X0, X1 = x[:, 0], x[:, 1]
    xx0, xx1 = make_meshgrid(X0, X1)
    plt.figure(figsize=(10, 6))
    plot_contours(clf, xx0, xx1, cmap="PiYG", alpha=0.8)
    scatter = plt.scatter(X0, X1, c=y, cmap="PiYG", s=30, edgecolors="k")
    plt.legend(handles=scatter.legend_elements()[0], labels=class_labels)
    plt.xlim(xx0.min(), xx0.max())
    plt.ylim(xx1.min(), xx1.max())
Example Project

In this part, we will go over how to perform a binary classification task using a variety of models. We will provide examples of how to train and evaluate these models.

Dataset Description

Healthcare is an important industry that uses machine learning to aid doctors in diagnosing many different kinds of illnesses and diseases. For this example project, we will be using the Breast Cancer Wisconsin Dataset to determine whether a mass found in a body is benign or malignant.

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
Feature Information:
Column 1: ID number
Column 2: Diagnosis (M = malignant, B = benign)
Ten real-valued features are computed for each cell nucleus:
1. radius (mean of distances from center to points on the perimeter)
2. texture (standard deviation of gray-scale values)
3. perimeter
4. area
5. smoothness (local variation in radius lengths)
6. compactness (perimeter^2 / area - 1.0)
7. concavity (severity of concave portions of the contour)
8. concave points (number of concave portions of the contour)
9. symmetry
10. fractal dimension ("coastline approximation" - 1)
Due to the statistical nature of the test, we are not able to get exact measurements of the previous values.
Instead, the dataset contains the mean and standard error of the real-valued features.
Columns 3-12 present the mean of the measured values
Columns 13-22 present the standard error of the measured values
Load and Analyze the dataset
In [ ]:
#Load Data
data = pd.read_csv('datasets/breast_cancer_data.csv')
Always look at your dataset after loading it. Use information from .describe and .info to learn more about the dataset.

In [ ]:
data.head(5)

Out[ ]:
         id  diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_mean  ...
0    842302          M        17.99         10.38          122.80     1001.0          0.11840  ...
1    842517          M        20.57         17.77          132.90     1326.0          0.08474  ...
2  84300903          M        19.69         21.25          130.00     1203.0          0.10960  ...
3  84348301          M        11.42         20.38           77.58      386.1          0.14250  ...
4  84358402          M        20.29         14.34          135.10     1297.0          0.10030  ...

5 rows × 22 columns

In [ ]:
data.describe()

Out[ ]:
                 id  radius_mean  texture_mean  perimeter_mean    area_mean  smoothness_mean  ...
count  5.690000e+02   569.000000    569.000000      569.000000   569.000000       569.000000  ...
mean   3.037183e+07    14.127292     19.289649       91.969033   654.889104         0.096360  ...
std    1.250206e+08     3.524049      4.301036       24.298981   351.914129         0.014064  ...
min    8.670000e+03     6.981000      9.710000       43.790000   143.500000         0.052630  ...
25%    8.692180e+05    11.700000     16.170000       75.170000   420.300000         0.086370  ...
50%    9.060240e+05    13.370000     18.840000       86.240000   551.100000         0.095870  ...
75%    8.813129e+06    15.780000     21.800000      104.100000   782.700000         0.105300  ...
max    9.113205e+08    28.110000     39.280000      188.500000  2501.000000         0.163400  ...

8 rows × 21 columns

In [ ]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 22 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   id                      569 non-null    int64
 1   diagnosis               569 non-null    object
 2   radius_mean             569 non-null    float64
 3   texture_mean            569 non-null    float64
 4   perimeter_mean          569 non-null    float64
 5   area_mean               569 non-null    float64
 6   smoothness_mean         569 non-null    float64
 7   compactness_mean        569 non-null    float64
 8   concavity_mean          569 non-null    float64
 9   concave points_mean     569 non-null    float64
 10  symmetry_mean           569 non-null    float64
 11  fractal_dimension_mean  569 non-null    float64
 12  radius_se               569 non-null    float64
 13  texture_se              569 non-null    float64
 14  perimeter_se            569 non-null    float64
 15  area_se                 569 non-null    float64
 16  smoothness_se           569 non-null    float64
 17  compactness_se          569 non-null    float64
 18  concavity_se            569 non-null    float64
 19  concave points_se       569 non-null    float64
 20  symmetry_se             569 non-null    float64
 21  fractal_dimension_se    569 non-null    float64
dtypes: float64(20), int64(1), object(1)
memory usage: 97.9+ KB
While .info shows that every column has 569 non-null values out of 569 entries, it is good to explicitly check for nulls.
In [ ]:
data.isnull().sum()

Out[ ]:
id                        0
diagnosis                 0
radius_mean               0
texture_mean              0
perimeter_mean            0
area_mean                 0
smoothness_mean           0
compactness_mean          0
concavity_mean            0
concave points_mean       0
symmetry_mean             0
fractal_dimension_mean    0
radius_se                 0
texture_se                0
perimeter_se              0
area_se                   0
smoothness_se             0
compactness_se            0
concavity_se              0
concave points_se         0
symmetry_se               0
fractal_dimension_se      0
dtype: int64

Awesome! No need for imputation!

While we are looking at the dataset, we shall remove the "id" column.

In [ ]:
data = data.drop(["id"], axis=1)

Looking at the target labels

For this project, we wish to classify the diagnosis column.

In [ ]:
data["diagnosis"]

Out[ ]:
0      M
1      M
2      M
3      M
4      M
      ..
564    M
565    M
566    M
567    M
568    B
Name: diagnosis, Length: 569, dtype: object

We need to transform this column into a numerical column so that we may use it in our models. To do this, we will employ the LabelEncoder to automatically transform all the target labels.

In [ ]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data['diagnosis'] = le.fit_transform(data['diagnosis'])
print(le.classes_)

['B' 'M']

In [ ]:
data['diagnosis']

Out[ ]:
0      1
1      1
2      1
3      1
4      1
      ..
564    1
565    1
566    1
567    1
568    0
Name: diagnosis, Length: 569, dtype: int64

Let's look at a histogram of the full dataset.

It's always good to get a global view of your datasets by looking at their histograms. You might see some interesting trends.

In [ ]:
data.hist(figsize=(20, 15))

Out[ ]:
array([[<Axes: title={'center': 'diagnosis'}>,
        <Axes: title={'center': 'radius_mean'}>,
        <Axes: title={'center': 'texture_mean'}>,
        <Axes: title={'center': 'perimeter_mean'}>,
        <Axes: title={'center': 'area_mean'}>],
       [<Axes: title={'center': 'smoothness_mean'}>,
        <Axes: title={'center': 'compactness_mean'}>,
        <Axes: title={'center': 'concavity_mean'}>,
        <Axes: title={'center': 'concave points_mean'}>,
        <Axes: title={'center': 'symmetry_mean'}>],
       [<Axes: title={'center': 'fractal_dimension_mean'}>,
        <Axes: title={'center': 'radius_se'}>,
        <Axes: title={'center': 'texture_se'}>,
        <Axes: title={'center': 'perimeter_se'}>,
        <Axes: title={'center': 'area_se'}>],
       [<Axes: title={'center': 'smoothness_se'}>,
        <Axes: title={'center': 'compactness_se'}>,
        <Axes: title={'center': 'concavity_se'}>,
        <Axes: title={'center': 'concave points_se'}>,
        <Axes: title={'center': 'symmetry_se'}>],
       [<Axes: title={'center': 'fractal_dimension_se'}>, <Axes: >,
        <Axes: >, <Axes: >, <Axes: >]], dtype=object)
From the histograms, we can see some interesting trends. Possible observations:

Many of the _se columns are heavily skewed towards low values and have large tails.
Many of the _mean columns look more Gaussian in shape.
There is a large disparity between the ranges of certain features. For example, radius_mean runs from about 7 to 28 while smoothness_mean lies in the range [0.05, 0.15]. This indicates we will have to normalize or standardize the features if the models are sensitive to feature scale.
Looking at the correlation matrix to get an idea about which features are important

In [ ]:
correlations = data.corr()
columns = list(data)

#Creates the heatmap
heatmap(correlations.values, columns, columns, figsize=(20, 12), cmap="hsv")

In [ ]:
#Let's specifically look at the correlations of our target feature
correlations["diagnosis"].sort_values(ascending=False)

Out[ ]:
diagnosis 1.000000
concave points_mean 0.776614
perimeter_mean 0.742636
radius_mean 0.730029
area_mean 0.708984
concavity_mean 0.696360
compactness_mean 0.596534
radius_se 0.567134
perimeter_se 0.556141
area_se 0.548236
texture_mean 0.415185
concave points_se 0.408042
smoothness_mean 0.358560
symmetry_mean 0.330499
compactness_se 0.292999
concavity_se 0.253730
fractal_dimension_se 0.077972
symmetry_se -0.006522
texture_se -0.008303
fractal_dimension_mean -0.012838
smoothness_se -0.067016
Name: diagnosis, dtype: float64
We can see that there is a lot of correlation between the features and the target label. Thus, we can expect to learn something from the data.

When doing classification, check if classes are heavily imbalanced.

It is important that the dataset does not prefer one class over any others. Otherwise, it may bias the model to not learn the minority classes well.

Let's use a histogram and count the number of elements in each class.

In [ ]:
data['diagnosis'].hist(bins=2, figsize=(5, 5))
data['diagnosis'].value_counts()

Out[ ]:
0    357
1    212
Name: diagnosis, dtype: int64
There is a bit of an imbalance which is something to keep in mind if we find that our models do not perform
well on the minority classes. For our purposes, this imbalance is not big enough to be an issue so we will
not perform balancing techniques for this dataset.
Since the dataset is small though, we want to be careful when making training and testing splits to ensure
that there is enough of each class for both splits. We will show how to perform this shortly.
Setting up the data

Before starting any model training, we have to split up the target labels from our features.

In [ ]:
y = data["diagnosis"]
x = data.drop(["diagnosis"], axis=1)

Now, we also split the data into training and testing data. To ensure that there is not an imbalance of classes in the training and testing set, we will use the stratify parameter in train_test_split to perform stratified sampling on the data (recall from lecture how stratified sampling is performed).

Note that we named the input feature data raw to indicate that there has been no pre-processing on it, such as standardization. Shortly, we will show the effect that pre-processing has on the performance of the model.

In [ ]:
train_raw, test_raw, target, target_test = train_test_split(x, y, test_size=0.2, stratify=y)

Let us quickly test that the splits are somewhat balanced.

In [ ]:
#Training classes
target.hist(bins=2, figsize=(5, 5))
target.value_counts()

Out[ ]:
0    285
1    170
Name: diagnosis, dtype: int64

In [ ]:
#Testing classes
target_test.hist(bins=2, figsize=(5, 5))
target_test.value_counts()

Out[ ]:
0    72
1    42
Name: diagnosis, dtype: int64
We can see that the class balance is about the same as before the split. In fact, if a classifier just guessed class 0 every time, it would have an accuracy of $100 \cdot \frac{72}{72+42} = 63.15\%$. We can consider this the baseline accuracy to compare against.
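As a quick sanity check, this baseline can be computed directly. Here is a minimal sketch using scikit-learn's DummyClassifier (assuming the train_raw/test_raw split defined above):

In [ ]:
from sklearn.dummy import DummyClassifier

#Majority-class baseline: always predicts the most frequent class in the training set (0 = benign)
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(train_raw, target)
print("Baseline accuracy:", baseline.score(test_raw, target_test)) #~72/114 = 0.6315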
Models for Classification: KNN

For our first model, we will use KNN classification. This is a model we have seen many times throughout the course and it will be interesting to see how well it performs.
Simple KNN classification with K = 3

Let us try KNN on the raw data with simply 3 nearest neighbors. We use the sklearn metrics library to calculate the measures of interest. In this case, we focus on accuracy.

In [ ]:
# k-Nearest Neighbors algorithm
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(train_raw, target)
predicted = knn.predict(test_raw)

In [ ]:
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))

Accuracy:    0.877193

We can see that there is already a huge improvement in accuracy in comparison to the baseline of 63.15%. Let's see the effect that standardizing the input features has on the KNN performance.

Effect of pre-processing on KNN

In [ ]:
#Since all features are real-valued, we only have one pipeline
pipeline = Pipeline([('scaler', StandardScaler())])

#Transform raw data
train = pipeline.fit_transform(train_raw)
test = pipeline.transform(test_raw) #Note that there is no fit call

In [ ]:
# k-Nearest Neighbors algorithm
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(train, target)
testing_result = knn.predict(test)
predicted = knn.predict(test)

In [ ]:
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))

Accuracy:    0.921053

We can see that with pre-processing we were able to get a much better classification accuracy.

Here we only used StandardScaler. Let's see if other pre-processing techniques could also have worked. As such, let's look at MinMaxScaler and Normalizer:

In [ ]:
preprocessors = [StandardScaler(), MinMaxScaler(), Normalizer()]
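Before comparing them, it helps to recall what each transform actually does (standard scikit-learn semantics, stated here for reference): StandardScaler standardizes each feature as $z = \frac{x - \mu}{\sigma}$; MinMaxScaler rescales each feature to $[0, 1]$ via $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$; Normalizer instead rescales each sample (row) to unit norm, $x' = \frac{x}{\|x\|_2}$, which discards each sample's overall magnitude.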
In [ ]:
for pre in preprocessors:
    pipeline = Pipeline([('preprocessor', pre)])

    #Transform raw data
    train = pipeline.fit_transform(train_raw)
    test = pipeline.transform(test_raw) #Note that there is no fit call

    # k-Nearest Neighbors algorithm
    knn = KNeighborsClassifier(n_neighbors=7)
    knn.fit(train, target)
    testing_result = knn.predict(test)
    predicted = knn.predict(test)
    print(pre)
    print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))

StandardScaler()
Accuracy:    0.903509
MinMaxScaler()
Accuracy:    0.912281
Normalizer()
Accuracy:    0.885965

We can see that MinMaxScaler performed about as well as StandardScaler, while Normalizer did not improve the model as much.

Visualizing decision boundaries for KNN

It's always nice to see the decision boundaries a model decides upon. Let's see how the decision boundary changes as a function of k when only using the two features most correlated with the target labels: concave points_mean and perimeter_mean.
In [ ]:
#Extract first two features and use the StandardScaler
train_2 = StandardScaler().fit_transform(train_raw[['concave points_mean', 'perimeter_mean']])

k_r = [1, 3, 5, 7]
for k in k_r:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(train_2, target)
    draw_contour(train_2, target, knn, class_labels=['Benign', 'Malignant'])
    plt.title(f"K = {k}")
We can see that as k gets larger, the decision boundary gets smoother.
Models for Classification: Logistic Regression

While KNN is a very powerful model, it does come with a few issues:

It requires storing the full training dataset.
Prediction is done by comparing a new sample with all samples in the training set, which is time-consuming.

These issues arise because KNN is a non-parametric model, which means that it does not summarize the data into a finite set of parameters.

Let us now look at Logistic Regression, which is an example of a parametric model.
Simple Logistic Regression

First, let us see how logistic regression performs without any regularization.

In [ ]:
log_reg = LogisticRegression(penalty="l2", max_iter=1000, solver="lbfgs", C=10**30)
#C is chosen to be high to remove regularization
#We could have chosen penalty = "none" since lbfgs supports it but this option is not po...
log_reg.fit(train_raw, target)
testing_result = log_reg.predict(test_raw)
predicted = log_reg.predict(test_raw)
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))

Accuracy:    0.964912

/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:458: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

We can see that Logistic Regression is actually performing much better than any of the KNN models we tried. We can also see the parameters that the model learned.

In [ ]:
#Parameters for each feature
log_reg.coef_

Out[ ]:
array([[ 9.49657288e+00,  5.29884067e-01, -1.70890677e+00,
         1.80971480e-02,  4.25364740e+01,  2.49869264e+01,
         7.58303604e+01,  6.71108523e+01,  2.97123432e+01,
         1.51697880e-01, -3.38079100e+01, -2.88171687e+00,
         8.24165745e-01,  4.32658867e-01, -2.05081068e+00,
        -4.98690890e+01, -6.82370605e+01, -7.81039978e+00,
        -1.70256855e+01, -1.11729757e+01]])

In [ ]:
#Intercept term
log_reg.intercept_

Out[ ]:
array([-17.051348])

In [ ]:
print("Number of Features in data:", train_raw.shape[1])
print("Number of Parameters:", len(log_reg.coef_[0]))

Number of Features in data: 20
Number of Parameters: 20

Since we are using Logistic Regression, where we model the log odds with a linear function, it makes sense that we have a parameter/coefficient for each input feature.
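Concretely, logistic regression models the log odds as a linear function, $\log\frac{P(y=1\mid x)}{1 - P(y=1\mid x)} = w^T x + b$, so with 20 input features the model learns a 20-entry weight vector $w$ (log_reg.coef_) plus a single intercept $b$ (log_reg.intercept_).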
Parameters for Logistic Regression

In Sci-kit Learn, the following are just some of the parameters we can pass into Logistic Regression:

penalty: {'l1', 'l2', 'elasticnet', 'none'}, default="l2"
    Specifies the type of regularization to use. Not all penalties work for each solver.
C: positive float, default=1
    Inverse of the regularization strength. You can treat C as $\frac{1}{\lambda}$ as shown in lecture. Thus, as C gets smaller, the regularization strength increases.
solver: {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default='lbfgs'
    Algorithm to use in the optimization problem. Each algorithm solves logistic regression using different iterative methods that are based on the gradient. Read the sci-kit learn documentation for more information.
max_iter: int, default=100
    Maximum number of iterations taken for the solvers to converge.
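For the "l2" penalty, scikit-learn's documented objective makes the role of C explicit: it minimizes $\frac{1}{2}\|w\|_2^2 + C\sum_i \log\left(1 + e^{-y_i(w^T x_i + b)}\right)$, so a smaller C weights the regularizer more heavily relative to the data-fit term.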
Each parameter has a different effect on the model. Let's look at how the choice of max_iter affects the model performance on the raw data and the standardized dataset.

In [ ]:
#Since all features are real-valued, we only have one pipeline
preprocesser = Pipeline([('scaler', StandardScaler())])

#Transform raw data
train = preprocesser.fit_transform(train_raw)
test = preprocesser.transform(test_raw) #Note that there is no fit call

In [ ]:
log_reg = LogisticRegression(penalty="l2", max_iter=1000, solver="lbfgs", C=0.01)

#Train raw is the data before preprocessing
log_reg.fit(train_raw, target)
predicted = log_reg.predict(test_raw)
print("%-12s %f" % ('Raw Data Accuracy:', metrics.accuracy_score(target_test, predicted)))

#Train is the data after preprocessing (using StandardScaler)
log_reg.fit(train, target)
predicted = log_reg.predict(test)
print("%-12s %f" % ('Preprocessed Data Accuracy:', metrics.accuracy_score(target_test, predicted)))

Raw Data Accuracy: 0.938596
Preprocessed Data Accuracy: 0.947368

We see that the accuracies are pretty close to each other. Let's see what happens when we decrease the max_iter.
In [ ]:
log_reg = LogisticRegression(penalty="l2", max_iter=70, solver="lbfgs", C=0.01)

#Train raw is the data before preprocessing
log_reg.fit(train_raw, target)
predicted = log_reg.predict(test_raw)
print("%-12s %f" % ('Raw Data Accuracy:', metrics.accuracy_score(target_test, predicted)))

Raw Data Accuracy: 0.921053

/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:458: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Oops! The model did not converge. It seems that the scale of the features strongly affects the convergence speed of the iterative algorithm. As suggested, we can fix this issue by increasing max_iter, re-scaling the data, or using a different solver.

In [ ]:
#Train is the data after preprocessing (using StandardScaler)
log_reg.fit(train, target)
predicted = log_reg.predict(test)
print("%-12s %f" % ('Preprocessed Data Accuracy:', metrics.accuracy_score(target_test, predicted)))

Preprocessed Data Accuracy: 0.947368

Cross Validation for Logistic Regression

Let us do a little experiment using cross validation to see how each term affects the logistic regression. We will perform this example on the standardized data.
In [ ]:
#You may even do Cross validation for classification
from sklearn.model_selection import GridSearchCV

#Note that this is a list of dicts
#Each dict describes the combination of parameters to check
parameters = [
    {"penalty": ["l2"],
     "C": [0.01, 1, 100],
     "solver": ["lbfgs", "liblinear"]}, #These solvers support penalty = "l2"
    {"penalty": ["none"],
     "C": [1], #Specified to prevent error message
     "solver": ["lbfgs", "newton-cg"]}, #These solvers support penalty = "none"
]

#instantiate model
#Implementing cross validation
k = 3
kf = KFold(n_splits=k, random_state=None)
log_reg = LogisticRegression(penalty="none", max_iter=1000, solver="lbfgs") #will change parameters during CV
grid = GridSearchCV(log_reg, parameters, cv=kf, scoring="accuracy")
grid.fit(train, target)
/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:1173: FutureWarning: `penalty='none'` has been deprecated in 1.2 and will be removed in 1.4. To keep the past behaviour, set `penalty=None`.
  warnings.warn(
/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:458: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:1173: FutureWarning: `penalty='none'` has been deprecated in 1.2 and will be removed in 1.4. To keep the past behaviour, set `penalty=None`.
  warnings.warn(
/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:1173: FutureWarning: `penalty='none'` has been deprecated in 1.2 and will be removed in 1.4. To keep the past behaviour, set `penalty=None`.
  warnings.warn(
/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:1173: FutureWarning: `penalty='none'` has been deprecated in 1.2 and will be removed in 1.4. To keep the past behaviour, set `penalty=None`.
  warnings.warn(
/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/scipy/optimize/_linesearch.py:457: LineSearchWarning: The line search algorithm did not converge
  warn('The line search algorithm did not converge', LineSearchWarning)
/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/scipy/optimize/_linesearch.py:306: LineSearchWarning: The line search algorithm did not converge
  warn('The line search algorithm did not converge', LineSearchWarning)
/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/sklearn/utils/optimize.py:210: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
  warnings.warn(
/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:1173: FutureWarning: `penalty='none'` has been deprecated in 1.2 and will be removed in 1.4. To keep the past behaviour, set `penalty=None`.
  warnings.warn(
/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:1173: FutureWarning: `penalty='none'` has been deprecated in 1.2 and will be removed in 1.4. To keep the past behaviour, set `penalty=None`.
  warnings.warn(
Out[ ]:
GridSearchCV
  estimator: LogisticRegression

In [ ]:
#Put results into Dataframe
res = pd.DataFrame(grid.cv_results_)
res
Out[ ]:
   mean_fit_time  std_fit_time  mean_score_time  std_score_time param_C param_penalty param_solver                                               params
0       0.001854      0.000687         0.000315        0.000036    0.01            l2        lbfgs      {'C': 0.01, 'penalty': 'l2', 'solver': 'lbfgs'}
1       0.000816      0.000111         0.000239        0.000015    0.01            l2    liblinear  {'C': 0.01, 'penalty': 'l2', 'solver': 'liblinear'}
2       0.002188      0.000165         0.000252        0.000028       1            l2        lbfgs         {'C': 1, 'penalty': 'l2', 'solver': 'lbfgs'}
3       0.000956      0.000034         0.000232        0.000019       1            l2    liblinear     {'C': 1, 'penalty': 'l2', 'solver': 'liblinear'}
4       0.007632      0.000695         0.000230        0.000003     100            l2        lbfgs       {'C': 100, 'penalty': 'l2', 'solver': 'lbfgs'}
5       0.001523      0.000059         0.000219        0.000009     100            l2    liblinear   {'C': 100, 'penalty': 'l2', 'solver': 'liblinear'}
6       0.025101      0.019992         0.000318        0.000037       1          none        lbfgs       {'C': 1, 'penalty': 'none', 'solver': 'lbfgs'}
7       2.844657      4.010496         0.000411        0.000169       1          none    newton-cg   {'C': 1, 'penalty': 'none', 'solver': 'newton-cg'}

In [ ]:
#Extract the columns that specify the score and the parameters for each row
res[["rank_test_score", "param_C", "param_penalty", "param_solver", "mean_test_score"]]

Out[ ]:
   rank_test_score param_C param_penalty param_solver  mean_test_score
0                7    0.01            l2        lbfgs         0.916463
1                4    0.01            l2    liblinear         0.934051
2                2       1            l2        lbfgs         0.956024
3                1       1            l2    liblinear         0.958232
4                6     100            l2        lbfgs         0.934007
5                3     100            l2    liblinear         0.936200
6                5       1          none        lbfgs         0.934007
7                8       1          none    newton-cg         0.820001
We can see that the choice of these parameters can strongly affect the performance of the classifier. Let's check the performance of the best parameters on the test set.

In [ ]:
predicted = grid.predict(test)
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))

Accuracy:    0.938596

Note that this test accuracy is not as good as some of the other logistic regression examples we've shown.

Speedtest between KNN and Logistic Regression

Let's see how long KNN and Logistic Regression take to perform training and testing.

In [ ]:
scaler = StandardScaler()
train = scaler.fit_transform(train_raw)
test = scaler.transform(test_raw) #use transform (not fit_transform) on the test set

log_reg = LogisticRegression(penalty="none", max_iter=1000)
knn = KNeighborsClassifier(n_neighbors=3)

t0 = time.time()
knn.fit(train, target)
t1 = time.time()
print("KNN Training Time : ", t1 - t0)

t0 = time.time()
log_reg.fit(train, target)
t1 = time.time()
print("Logistic Regression Training Time : ", t1 - t0)

/Users/kunalpatil/anaconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:1173: FutureWarning: `penalty='none'` has been deprecated in 1.2 and will be removed in 1.4. To keep the past behaviour, set `penalty=None`.
  warnings.warn(
KNN Training Time :  0.00044083595275878906
Logistic Regression Training Time :  0.023360013961791992

In [ ]:
t0 = time.time()
knn.predict(test)
t1 = time.time()
print("KNN Testing Time : ", t1 - t0)

t0 = time.time()
log_reg.predict(test)
t1 = time.time()
print("Logistic Regression Testing Time : ", t1 - t0)

KNN Testing Time :  0.01816701889038086
Logistic Regression Testing Time :  0.00018095970153808594

This simple test shows that Logistic Regression is slower than KNN at training time but much faster at testing time.

Visualizing decision boundaries for Logistic Regression

Now, let's look at the decision boundary produced by Logistic Regression. Same as for KNN, we use the two features most correlated with the target labels: concave points_mean and perimeter_mean. This way, we can visualize the 2D decision boundary.
In [ ]:
#Extract first two features and use the StandardScaler
train_2 = StandardScaler().fit_transform(train_raw[['concave points_mean', 'perimeter_mean']])

Cs = [0.001, 0.1, 1000]
for C in Cs:
    log_reg = LogisticRegression(penalty="l2", max_iter=1000, solver="lbfgs", C=C)
    log_reg.fit(train_2, target)
    draw_contour(train_2, target, log_reg, class_labels=['Benign', 'Malignant'])
    plt.title(f"C = {C}")
We can see that as the regularization strength changes, the decision boundary moves as well. Additionally, we can clearly see that the decision boundary is a line since this is a linear model.
Models for Classification: SVM
We now discuss another type of linear classification model known as Support Vector Machines (SVM). Where Logistic Regression was motivated by probability theory, SVM is motivated by geometric arguments. Specifically, SVM finds a separating hyperplane that maximizes the margin (i.e., the distance to each class). The hyperplane is used to classify points by designating every sample on one side of the hyperplane as the positive class and every sample on the other side as the negative class.

The hyperplane is determined by a few sample points known as support vectors that uniquely characterize the hyperplane.

[Figure: an SVM separating hyperplane with its margin and support vectors]

Note that it may not always be possible to find a hyperplane that completely separates the classes. Thus, we use what is known as Soft-Margin SVM, which aims to maximize the margin while minimizing the penalty for samples that fall on the wrong side.

All Sci-kit learn implementations of SVM that we use are soft-margin SVM.
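In the standard formulation, soft-margin SVM solves $\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_i \xi_i$ subject to $y_i(w^T x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$, where each slack variable $\xi_i$ measures how far sample $i$ falls on the wrong side of its margin and C trades off margin width against violations.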
Simple SVM classification

In [ ]:
svm = SVC()
svm.fit(train, target)
predicted = svm.predict(test)
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))

Accuracy:    0.921053

Parameters for SVM

In Sci-kit Learn, the following are just some of the parameters we can pass into SVC:

C: positive float, default=1
    Inverse of the regularization strength. You can treat C as $\frac{1}{\lambda}$ as shown in lecture. Thus, as C gets smaller, the regularization strength increases. SVC only uses the L2 regularization.
kernel: {'linear', 'poly', 'rbf', 'sigmoid'}, default='rbf'
    Specifies the kernel type to be used in the algorithm. A kernel specifies a mapping into a higher dimension space to allow for non-linear decision boundaries.
degree: int, default=3
    Degree of the polynomial kernel function ('poly'). Ignored by all other kernels.
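As a concrete example, the default 'rbf' kernel computes $K(x, x') = \exp(-\gamma\|x - x'\|^2)$, which implicitly maps the data into an infinite-dimensional space and therefore allows highly non-linear decision boundaries.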
Visualizing decision boundaries for SVM

Now, let's look at the decision boundary produced by SVM with different kernels. Same as for KNN and Logistic Regression, we use the two features most correlated with the target labels: concave points_mean and perimeter_mean. This way, we can visualize the 2D decision boundary.
In [ ]:
#Extract first two features and use the StandardScaler
train_2 = StandardScaler().fit_transform(train_raw[['concave points_mean', 'perimeter_mean']])

kernel = ['linear', 'poly', 'rbf', 'sigmoid']
for ker in kernel:
    svm = SVC(kernel=ker) #will change parameters during CV
    svm.fit(train_2, target)
    draw_contour(train_2, target, svm, class_labels=['Benign', 'Malignant'])
    plt.title(f"Kernel = {ker}")
We can see that the decision boundary is not always linear because we are using non-linear kernels.
Important Measures for Classifications
Now that we have gone over a few models for binary classification, let's explore the different ways we can
measure the performance of these models.
Here are just some of the most important measures of interest. We use the convention to refer to the class
labeled as $1$ as the positive class.
Accuracy: The percentage of predictions that are correct. Use metrics.accuracy_score.

Precision: $\frac{\text{Number of labels correctly classified as positive}}{\text{Number of labels classified as positive}}$. Percentage of predictions that are correctly positive among all the predictions that were classified as positive. Use metrics.precision_score.

Recall: $\frac{\text{Number of labels correctly classified as positive}}{\text{Number of labels where the true class is positive}}$. Percentage of predictions that are correctly positive among all the labels where the true class is positive. Also known as the probability of detecting when a class is positive. Use metrics.recall_score.

F1 Score: Harmonic mean of the precision and recall. Highest value is $1$ when both precision and recall are $1$, i.e. perfect. Lowest value is $0$ when either precision or recall is zero. Provides an aggregate score to analyze both precision and recall. Use metrics.f1_score.

We can calculate these measures by using a confusion matrix as well.
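In terms of the confusion-matrix counts (true positives TP, true negatives TN, false positives FP, false negatives FN): $\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$, $\text{Precision} = \frac{TP}{TP+FP}$, $\text{Recall} = \frac{TP}{TP+FN}$, and $\text{F1} = 2\cdot\frac{\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}$.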
In [ ]:
#Example classifier
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(train_raw, target)
predicted = log_reg.predict(test_raw)

In [ ]:
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))
print("%-12s %f" % ('Precision:', metrics.precision_score(target_test, predicted, labels=None, pos_label=1)))
print("%-12s %f" % ('Recall:', metrics.recall_score(target_test, predicted, labels=None, pos_label=1)))
print("%-12s %f" % ('F1 Score:', metrics.f1_score(target_test, predicted, labels=None, pos_label=1)))
print("Confusion Matrix: \n", metrics.confusion_matrix(target_test, predicted))

#Draws confusion matrix
draw_confusion_matrix(target_test, predicted, ['Benign', 'Malignant'])

Accuracy:    0.947368
Precision:   0.909091
Recall:      0.952381
F1 Score:    0.930233
Confusion Matrix: 
 [[68  4]
 [ 2 40]]
TODO: Using classification methods to classify heart disease

Now that you have seen some examples of the classifiers that Sci-kit learn has to offer, let's try to apply them to a new dataset.

Background: The Dataset

For this exercise we will be using a subset of the UCI Heart Disease dataset, leveraging the fourteen most commonly used attributes. All identifying information about the patient has been scrubbed. You will be asked to classify whether a patient is suffering from heart disease based on a host of potential medical factors.
The dataset includes 14 columns. The information provided by each column is as follows:

age: Age in years
sex: (1 = male; 0 = female)
cp: Chest pain type (0 = asymptomatic; 1 = atypical angina; 2 = non-anginal pain; 3 = typical angina)
trestbps: Resting blood pressure (in mm Hg on admission to the hospital)
chol: Cholesterol in mg/dl
fbs: Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
restecg: Resting electrocardiographic results (0 = showing probable or definite left ventricular hypertrophy by Estes' criteria; 1 = normal; 2 = having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV))
thalach: Maximum heart rate achieved
exang: Exercise induced angina (1 = yes; 0 = no)
oldpeak: Depression induced by exercise relative to rest
slope: The slope of the peak exercise ST segment (0 = downsloping; 1 = flat; 2 = upsloping)
ca: Number of major vessels (0-3) colored by fluoroscopy
thal: 1 = normal; 2 = fixed defect; 7 = reversable defect
sick: Indicates the presence of heart disease (True = Disease; False = No disease)
[25 pts] Part 1. Load the Data and Analyze
Let's first load our dataset so we'll be able to work with it. (correct the relative path if your notebook is in a
different directory than the csv file.)
[5 pts] Looking at the data
Now that our data is loaded, let's take a closer look at the dataset we're working with. Use the head method,
the describe method, and the info method to display some of the rows so we can visualize the types of data
fields we'll be working with.
In [ ]:
data = pd.read_csv('datasets/heartdisease.csv')

In [ ]:
data.head()

Out[ ]:
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal   sick
0   63    1   3       145   233    1        0      150      0      2.3      0   0     1  False
1   37    1   2       130   250    0        1      187      0      3.5      0   0     2  False
2   41    0   1       130   204    0        0      172      0      1.4      2   0     2  False
3   56    1   1       120   236    0        1      178      0      0.8      2   0     2  False
4   57    0   0       120   354    0        1      163      1      0.6      2   0     2  False

In [ ]:
data.describe()

Out[ ]:
              age         sex          cp    trestbps        chol         fbs     restecg     thalach  ...
count  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000  ...
mean    54.366337    0.683168    0.966997  131.623762  246.264026    0.148515    0.528053  149.646865  ...
std      9.082101    0.466011    1.032052   17.538143   51.830751    0.356198    0.525860   22.905161  ...
min     29.000000    0.000000    0.000000   94.000000  126.000000    0.000000    0.000000   71.000000  ...
25%     47.500000    0.000000    0.000000  120.000000  211.000000    0.000000    0.000000  133.500000  ...
50%     55.000000    1.000000    1.000000  130.000000  240.000000    0.000000    1.000000  153.000000  ...
75%     61.000000    1.000000    2.000000  140.000000  274.500000    0.000000    1.000000  166.000000  ...
max     77.000000    1.000000    3.000000  200.000000  564.000000    1.000000    2.000000  202.000000  ...

In [ ]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   age       303 non-null    int64
 1   sex       303 non-null    int64
 2   cp        303 non-null    int64
 3   trestbps  303 non-null    int64
 4   chol      303 non-null    int64
 5   fbs       303 non-null    int64
 6   restecg   303 non-null    int64
 7   thalach   303 non-null    int64
 8   exang     303 non-null    int64
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64
 11  ca        303 non-null    int64
 12  thal      303 non-null    int64
 13  sick      303 non-null    bool
dtypes: bool(1), float64(1), int64(12)
memory usage: 31.2 KB
Sometimes data will be stored in different formats (e.g., string, date, boolean), but many learning methods work strictly on numeric inputs. Additionally, some numerical features can represent categorical features which need to be pre-processed. Are there any columns that need to be transformed, and why?

All the columns in our dataframe are numeric (either int or float); however, our target variable 'sick' is a boolean and may need to be modified. Additionally, several of the numerical features represent categorical features which may need to be pre-processed/encoded, including sex, fbs, restecg, cp, thal, and slope.
Determine if we're dealing with any null values. If so, report on which columns.

In [ ]:
data.isnull().sum()

Out[ ]:
age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
sick        0
dtype: int64

There are no null values in any column.
[5 pts] Transform target label into numerical value
Before we begin our analysis, we need to fix the field(s) that will be problematic. Specifically, convert our
boolean "sick" variable into a binary numeric target variable (values of either '0' or '1') using the label
encoder from scikit-learn
, place this new array into a new column of the DataFrame named "target", and
then drop the original "sick" column from the dataframe. Afterward, use .head to print the first 5 rows
In [ ]:
data['target'] = le.fit_transform(data['sick'])
data = data.drop('sick', axis=1)
data.head()

Out[ ]:
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
0   63    1   3       145   233    1        0      150      0      2.3      0   0     1       0
1   37    1   2       130   250    0        1      187      0      3.5      0   0     2       0
2   41    0   1       130   204    0        0      172      0      1.4      2   0     2       0
3   56    1   1       120   236    0        1      178      0      0.8      2   0     2       0
4   57    0   0       120   354    0        1      163      1      0.6      2   0     2       0

[5 pts] Plotting histogram of data

Now that we have a feel for the data-types for each of the variables, plot histograms of each field.

In [ ]:
data.hist(figsize=(20, 15))

Out[ ]:
array([[<Axes: title={'center': 'age'}>, <Axes: title={'center': 'sex'}>,
        <Axes: title={'center': 'cp'}>,
        <Axes: title={'center': 'trestbps'}>],
       [<Axes: title={'center': 'chol'}>,
        <Axes: title={'center': 'fbs'}>,
        <Axes: title={'center': 'restecg'}>,
        <Axes: title={'center': 'thalach'}>],
       [<Axes: title={'center': 'exang'}>,
        <Axes: title={'center': 'oldpeak'}>,
        <Axes: title={'center': 'slope'}>,
        <Axes: title={'center': 'ca'}>],
       [<Axes: title={'center': 'thal'}>,
        <Axes: title={'center': 'target'}>, <Axes: >, <Axes: >]],
      dtype=object)
[5 pts] Looking at class balance

We also want to make sure we are dealing with a balanced dataset. In this case, we want to confirm whether or not we have an equitable number of sick and healthy individuals to ensure that our classifier will have a sufficiently balanced dataset to adequately classify the two. Plot a histogram specifically of the sick target, and conduct a count of the number of sick and healthy individuals and report on the results:

In [ ]:
data['target'].hist(bins=2, figsize=(5, 5))
data['target'].value_counts()

Out[ ]:
0    165
1    138
Name: target, dtype: int64
There are about 30 more healthy (0) targets than there are sick (1), but overall the data set is well-balanced.

Balanced datasets are important to ensure that classifiers train adequately and don't overfit; however, arbitrary balancing of a dataset might introduce its own issues.

Discuss some of the problems that might arise by artificially balancing a dataset.

If we artificially balance a data set, we may reduce the accuracy of our model. Specifically, if we remove training points corresponding to the majority class, we are training our model on a smaller sample and thus it may not generalize well. On the other hand, if we artificially insert data, our guesses of the correct labels may be noisy or inaccurate and thus will reduce the accuracy of our model.
[5 pts] Looking at Data Correlation

Now that we have our dataframe prepared let's start analyzing our data. For this next question let's look at the correlations of our variables to our target value. First, use the heatmap function to plot the correlations of the data.

In [ ]:
correlations = data.corr()
columns = list(data)
heatmap(correlations.values, columns, columns, figsize=(20, 12), cmap="hsv")
Next, show the correlation to the "target" feature only and sort them in descending order.

In [ ]:
correlations["target"].sort_values(ascending=False)

Out[ ]:
target      1.000000
exang       0.436757
oldpeak     0.430696
ca          0.391724
thal        0.344029
sex         0.280937
age         0.225439
trestbps    0.144931
chol        0.085239
fbs         0.028046
restecg    -0.137230
slope      -0.345877
thalach    -0.421741
cp         -0.433798
Name: target, dtype: float64

From the heatmap values and the description of the features, why do you think some variables correlate more highly than others? (This question is just to get you thinking and there is no perfect answer since we have no medical background.)

Some variables, such as exercise induced angina, may tell us more about whether a patient has heart disease than other factors, such as cholesterol. There is probably some science behind why some of these features are more related and thus have a higher coefficient than others.
[25 pts] Part 2. Prepare the Data and run a KNN Model
Before running our various learning methods, we need to do some additional prep to finalize our data.
Specifically you'll have to cut the classification target from the data that will be used to classify, and then
you'll have to divide the dataset into training and testing cohorts.
Specifically, we're going to ask you to prepare 2 batches of data. The first batch will simply be the raw
numeric data that hasn't gone through any additional pre-processing. The second batch will be data that you
will pipeline using pre-processing methods. We will then feed both of these datasets into a classifier to
showcase just how important this step can be!
[2 pts] Separate target labels from data
Save the label column as a separate array and then make a new dataframe without the target.
[5 pts] Balanced Train Test Split
Now, create your 'Raw' unprocessed training data by dividing your dataframe into training and testing
cohorts, with your training cohort consisting of 60% of your total dataframe. To ensure that the train and test
sets have balanced classes, use the stratify parameter of train_test_split. Output the resulting shapes of
your training and testing samples to confirm that your split was successful. Additionally, output the class
counts for the training and testing cohorts to confirm that there is no artificial class imbalance.
Note: Use random_state=0 to ensure that the same train/test split happens every time for ease of grading.
[5 pts] KNN on raw data
Now, let's try a classification model on this data. We'll first use KNN since it is the one we are most familiar
with.
One thing we noted in class was that because KNN relies on Euclidean distance, it is highly sensitive to the
relative magnitude of different features. Let's see that in action! Implement a K-Nearest Neighbor algorithm
on our data and report the results. For this initial implementation, simply use the default settings. Refer to
the KNN Documentation
for details on implementation. Report on the test accuracy of the resulting
model and print out the confusion matrix.
Recall that accuracy can be calculated easily using metrics.accuracy_score and that we have a helper
function to draw the confusion matrix.
In [ ]:
y = data['target']
x = data.drop('target', axis=1)
In [ ]:
# stratify on y keeps the class balance in both cohorts;
# random_state=0 as specified in the instructions above
train_raw, test_raw, target, target_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=0)
Accuracy:    0.606557
Confusion Matrix:
[[24  9]
 [15 13]]
[5 pts] KNN on preprocessed data
Now let's implement a pipeline to preprocess the data. For the pipeline, use StandardScaler on the
numerical features and one-hot encoding on the categorical features. For reference on how to make a
pipeline, please look at project 1.
For reference, the categorical features are ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'ca', 'thal'].
Now use the pipeline to transform the data and then apply the same KNN classifier with this new
training/testing data. Report the test accuracy. Discuss the implications of the different results you
are obtaining.
Note: Remember to use fit_transform on the training data and transform on the testing data.
Accuracy: 0.836066
The accuracy improved significantly, jumping from roughly 60% on the raw data to roughly 84% on the preprocessed data.
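To see why scaling mattered so much, here is a minimal sketch (assuming the train_raw split from above) comparing the spread of the numeric features before and after standardization. On this dataset chol is typically in the hundreds while oldpeak is only a few units, so unscaled Euclidean distances are dominated by the large-scale features:
# Raw feature spreads: large-scale features dominate Euclidean distance.
print(train_raw[['trestbps', 'chol', 'thalach', 'oldpeak']].std())

# After StandardScaler, every numeric feature has (roughly) unit variance,
# so each contributes comparably to the distance computation.
scaled = StandardScaler().fit_transform(
    train_raw[['trestbps', 'chol', 'thalach', 'oldpeak']])
print(scaled.std(axis=0))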
[8 pts] KNN Parameter optimization for n_neighbors
The KNN algorithm includes an n_neighbors attribute that specifies how many neighbors to consult when
making a prediction. (The default value is 5, which is what your previous model used.) Let's now try n
values of 1, 2, 3, 5, 7, 9, 10, 20, and 50. Run your model for each value and report the test accuracy for
each. (Hint: leverage Python's ability to loop through the array and generate results without needing
to manually code each iteration.)
In [ ]:
# k-Nearest Neighbors algorithm on the raw (unscaled) data
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(train_raw, target)
predicted = knn.predict(test_raw)
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))
print("Confusion Matrix: \n", metrics.confusion_matrix(target_test, predicted))
In [ ]:
features_num
=
[
'trestbps'
,
'chol'
,
'thalach'
,
'oldpeak'
]
features_cat
=
[
'sex'
,
'cp'
,
'fbs'
,
'restecg'
,
'exang'
,
'slope'
,
'ca'
,
'thal'
]
pipeline
=
ColumnTransformer
(
[(
"num"
,
StandardScaler
(),
features_num
),
(
"cat"
,
OneHotEncoder
(),
features_cat
)
]
)
In [ ]:
train
=
pipeline
.
fit_transform
(
train_raw
)
test
=
pipeline
.
transform
(
test_raw
)
knn
=
KNeighborsClassifier
(
n_neighbors
=
3
)
knn
.
fit
(
train
,
target
)
predicted
=
knn
.
predict
(
test
)
print
(
"
%-12s
%f
"
%
(
'Accuracy:'
,
metrics
.
accuracy_score
(
target_test
,
predicted
)))
Accuracy for k=1: 0.8032786885245902
Accuracy for k=2: 0.7377049180327869
Accuracy for k=3: 0.8360655737704918
Accuracy for k=5: 0.8032786885245902
Accuracy for k=7: 0.7704918032786885
Accuracy for k=9: 0.7868852459016393
Accuracy for k=10: 0.7540983606557377
Accuracy for k=20: 0.7704918032786885
Accuracy for k=50: 0.7540983606557377
Comment on which value of n the KNN model performed best with. Did the model perform strictly
better or strictly worse as the value of n increased?
The value k=3 performed the best, with an accuracy of 83.6%. The accuracy neither strictly increased nor
strictly decreased with increasing k; it went up and down several times.
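Plotting the accuracies reported above (values copied, rounded, from the loop output) makes the non-monotonic behavior easy to see:
# Accuracy vs. k, using the numbers printed by the loop above.
ks = [1, 2, 3, 5, 7, 9, 10, 20, 50]
accs = [0.8033, 0.7377, 0.8361, 0.8033, 0.7705,
        0.7869, 0.7541, 0.7705, 0.7541]
plt.plot(ks, accs, marker='o')
plt.xlabel('n_neighbors (k)')
plt.ylabel('Test accuracy')
plt.title('KNN test accuracy vs. k')
plt.show()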
So we have a model that seems to work well. But let's see if we can do better! To do so we'll employ
Logistic Regression and SVM to improve upon the model and compare the results.
For the rest of the project, you will only be using the transformed data and not the raw data. DO NOT
USE THE RAW DATA ANYMORE.
[20 pts] Part 3. Additional Learning Methods: Logistic
Regression
Let's now try Logistic Regression. Recall that Logistic regression is a statistical model that in its basic form
uses a logistic function to model a binary dependent variable.
[5 pts] Run the default Logistic Regression
Implement a Logistic Regression classifier. Review the Logistic Regression Documentation for how to
implement the model. Use the default settings. Report on the test accuracy and print out the confusion
matrix.
In [ ]:
k_r = [1, 2, 3, 5, 7, 9, 10, 20, 50]
for k in k_r:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(train, target)
    predicted = knn.predict(test)
    # sep='' reconstructed from the truncated line; it matches the
    # "Accuracy for k=1: ..." output format shown above
    print('Accuracy for k=', k, ': ',
          metrics.accuracy_score(target_test, predicted), sep='')
In [ ]:
log_reg = LogisticRegression()
log_reg.fit(train, target)
predicted = log_reg.predict(test)
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))
print("Confusion Matrix: \n", metrics.confusion_matrix(target_test, predicted))
Accuracy:    0.852459
Confusion Matrix:
[[30  3]
 [ 6 22]]
[5 pts] Compare Logistic Regression and KNN
In your own words, describe the key differences between Logistic Regression and KNN. When would you
use one over the other?
Logistic regression leverages the sigmoid function and probability to make predictions on test data. KNN,
on the other hand, uses the distance to the closest points in the training data to make predictions. In KNN
there is no real training phase or loss function, since there are no parameters to learn. Fitting a logistic
regression therefore takes much longer than "fitting" KNN, but making test predictions is much faster.
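A rough way to check this fit/predict asymmetry is to time both models on our transformed data (a sketch only; on a dataset this small the absolute numbers are tiny and noisy, but the pattern becomes pronounced on larger data):
# Illustrative timing of fit vs. predict for the two models.
for name, model in [('KNN', KNeighborsClassifier()),
                    ('LogReg', LogisticRegression(max_iter=1000))]:
    t0 = time.time()
    model.fit(train, target)
    t1 = time.time()
    model.predict(test)
    t2 = time.time()
    print("%-8s fit: %.4fs  predict: %.4fs" % (name, t1 - t0, t2 - t1))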
[5 pts] Tweaking the Logistic Regression
What are some parameters we can change that will affect the performance of Logistic Regression?
We can change parameters such as C, the inverse of the regularization strength; max_iter, the
maximum number of iterations the solver will take to converge; penalty, the type of norm used
for regularization; and solver, the algorithm used to solve the optimization problem for the
parameters.
Implement Logistic Regression with solver= 'liblinear', max_iter= 1000, penalty = 'l2', and C=1.
Report on the test accuracy and print out the confusion matrix.
Accuracy:    0.852459
Confusion Matrix:
[[30  3]
 [ 6 22]]
Now, Implement Logistic Regression with solver= 'liblinear', max_iter= 1000, penalty = 'l2', and
C=0.0001. Report on the test accuracy and print out the confusion matrix.
Accuracy:    0.754098
Confusion Matrix:
[[31  2]
 [13 15]]
In [ ]:
log_reg = LogisticRegression(solver='liblinear', max_iter=1000, penalty='l2', C=1)
log_reg.fit(train, target)
predicted = log_reg.predict(test)
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))
print("Confusion Matrix: \n", metrics.confusion_matrix(target_test, predicted))
In [ ]:
# C=0.0001 completed from the truncated line, per the instructions above
log_reg = LogisticRegression(solver='liblinear', max_iter=1000, penalty='l2', C=0.0001)
log_reg.fit(train, target)
predicted = log_reg.predict(test)
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))
print("Confusion Matrix: \n", metrics.confusion_matrix(target_test, predicted))
Did the accuracy drop or improve? Why?
The accuracy dropped. This is because a low value of C corresponds to very strong regularization, so the
parameters were forced to be very small and the model likely underfitted the data.
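We can verify this shrinkage directly by comparing coefficient magnitudes at the two C values (a quick sketch using the fitted models' coef_ attribute):
# Mean absolute coefficient size under weak (C=1) vs. strong (C=0.0001)
# regularization; strong regularization forces the weights toward zero.
for C in [1, 0.0001]:
    m = LogisticRegression(solver='liblinear', max_iter=1000,
                           penalty='l2', C=C).fit(train, target)
    print("C=%g  mean |coef| = %.4f" % (C, np.abs(m.coef_).mean()))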
[5 pts] Trying out different penalties
Now, Implement Logistic Regression with solver= 'liblinear', max_iter= 1000, penalty = 'l1', and C=1.
Report on the test accuracy and print out the confusion matrix.
Accuracy:    0.868852
Confusion Matrix:
[[31  2]
 [ 6 22]]
Describe what the purpose of a penalty term is and how the change from L2 to L1 affected the
model.
The purpose of a penalty term is to regularize the model parameters and ensure that the model does not
overfit the data. The difference is that L2 penalizes the squared magnitude of the weights, shrinking large
parameter values heavily but rarely to exactly zero, whereas L1 penalizes the absolute value and tends to
drive weak weights exactly to zero, yielding a sparser model. Using L1 regularization in
this case actually increased the accuracy of the model slightly.
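A quick sketch makes the sparsity difference concrete: counting exactly-zero coefficients under each penalty (L1 typically zeroes some out; L2 typically does not):
# Count exact zeros in the learned weight vector under each penalty.
for pen in ['l1', 'l2']:
    m = LogisticRegression(solver='liblinear', max_iter=1000,
                           penalty=pen, C=1).fit(train, target)
    zeros = int(np.sum(m.coef_ == 0))
    print("penalty=%s  zero coefficients: %d of %d" % (pen, zeros, m.coef_.size))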
[20 pts] Part 4. Additional Learning Methods: SVM (Support
Vector Machine)
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane.
In other words, given labeled training data (supervised learning), the algorithm outputs an optimal
hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line dividing the
plane into two parts, each corresponding to one of the two classes.
Recall that scikit-learn uses a soft-margin SVM to account for datasets that are not separable.
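The softness of the margin is controlled by the C parameter: smaller C tolerates more margin violations. A minimal sketch (using our transformed training data from Part 2) shows how the support-vector count typically grows as C shrinks:
# Smaller C -> softer margin -> usually more support vectors.
for C in [0.01, 1, 100]:
    m = SVC(kernel='linear', C=C).fit(train, target)
    print("C=%g  support vectors per class: %s" % (C, m.n_support_))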
[5 pts] Run default SVM classifier
Implement a Support Vector Machine classifier on your pipelined data. Review the SVM Documentation
for
how to implement a model. For this implementation you can simply use the default settings. Report on the
test accuracy and print out the confusion matrix.
In [ ]:
log_reg = LogisticRegression(solver='liblinear', max_iter=1000, penalty='l1', C=1)
log_reg.fit(train, target)
predicted = log_reg.predict(test)
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))
print("Confusion Matrix: \n", metrics.confusion_matrix(target_test, predicted))
In [ ]:
svm = SVC()
svm.fit(train, target)
predicted = svm.predict(test)
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))
print("Confusion Matrix: \n", metrics.confusion_matrix(target_test, predicted))
Accuracy:    0.803279
Confusion Matrix:
[[28  5]
 [ 7 21]]
Print out the number of support vectors that SVC has determined. Look at the documentation for
how to get this.
[69 68]
You may find that there are quite a few support vectors. This is due in part to the small number of samples
in the training set and the choice of kernel.
[5 pts] Use a Linear SVM
Rerun your SVM, but now modify your model parameter kernel to equal 'linear'. Report on the test
accuracy and print out the confusion matrix. Also, print out the number of support vectors.
Accuracy:    0.852459
Confusion Matrix:
[[30  3]
 [ 6 22]]
[44 46]
You will notice that the number of support vectors has decreased significantly.
[5 pts] Compare default SVM and Linear SVM
Explain what the new results you've achieved mean. Read the documentation to understand what
you've changed about your model and explain why changing that input parameter might impact the results
in the manner you've observed.
By default, the kernel is 'rbf', or radial basis function, which allows the decision boundary to be non-linear.
However, our data is actually better suited to a linear separator. When we use a linear kernel, the decision
boundary becomes linear, which fits the data better, resulting in higher accuracy and fewer
support vectors.
[5 pts] Compare SVM and Logistic Regression
Both logistic regression and linear SVM are trying to classify data points using a linear decision boundary
but achieve it in different ways. In your own words, explain the difference between the ways that Logistic
Regression and Linear SVM find the boundary?
The loss functions for logistic regression and SVM are based on different principles, so the two algorithms
find the boundary differently. Logistic regression models the probability that a data point is positively
classified using the sigmoid function, and is penalized when its predicted probabilities disagree with the true labels.
In [ ]:
print(svm.n_support_)
In [ ]:
svm = SVC(kernel='linear')
svm.fit(train, target)
predicted = svm.predict(test)
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))
print("Confusion Matrix: \n", metrics.confusion_matrix(target_test, predicted))
print(svm.n_support_)
The loss function for SVM, on the other hand, is based on geometry and aims to maximize the margin between
the separator and the data; points that fall too close to the separator, or on the wrong side of it, are penalized.
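The two objectives can be compared directly through their loss curves. As a sketch, plotting the logistic loss and the hinge loss as functions of the signed margin m = y * f(x) shows that logistic loss penalizes every point a little, even confidently correct ones, while hinge loss is exactly zero once a point clears the margin (m >= 1):
# Logistic loss vs. hinge loss as a function of the signed margin.
margins = np.linspace(-2, 3, 200)
logistic_loss = np.log(1 + np.exp(-margins))
hinge_loss = np.maximum(0, 1 - margins)
plt.plot(margins, logistic_loss, label='logistic loss')
plt.plot(margins, hinge_loss, label='hinge loss')
plt.xlabel('margin  y * f(x)')
plt.ylabel('loss')
plt.legend()
plt.show()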
[10 pts] Part 5: Cross Validation and Model Selection
You've sampled a number of different classification techniques and have seen their performance on the
dataset. Before we draw any conclusions on which model is best, we want to ensure that our results are not
an artifact of the particular random train-test split of the data. To do so, we will
conduct a K-Fold Cross-Validation with GridSearch to determine which model performs best and assess its
performance on the test set.
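As a small sketch of the idea (before the full grid search below), the per-fold scores for a single model show how much one split can mislead; cross-validation averages this variation out:
# Per-fold accuracies for one model under 3-fold cross-validation.
scores = cross_val_score(LogisticRegression(solver='liblinear'),
                         train, target, cv=3, scoring='accuracy')
print("fold accuracies:", scores, " mean: %.4f" % scores.mean())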
[10 pts] Model Selection
Run a GridSearchCV with 3-Fold Cross Validation. You will be running each classification model with
different parameters.
KNN:
n_neighbors = [1,3,5,7]
metric = ["euclidean","manhattan"] #Different Distance functions
Logistic Regression:
penalty = ["l1","l2"]
solver = ["liblinear"]
C = [0.0001,0.1,10]
SVM:
kernel = ["linear","rbf"]
C = [0.0001,0.1,10]
Make sure to train and test your model on the transformed data and not on the raw data.
After using GridSearchCV, put the results into a pandas Dataframe and print out the whole table.
In [ ]:
parametersKNN = [{"n_neighbors": [1, 3, 5, 7],
                  "metric": ["euclidean", "manhattan"]}]
parametersLR = [{"penalty": ["l1", "l2"],
                 "solver": ["liblinear"],
                 "C": [0.0001, 0.1, 10]}]
parametersSVM = [{"kernel": ['linear', 'rbf'],
                  "C": [0.0001, 0.1, 10]}]  # note: the key must be the string "C"
   param_n_neighbors param_metric  mean_test_score
7                  7    manhattan         0.830658
3                  7    euclidean         0.830556
2                  5    euclidean         0.822479
6                  5    manhattan         0.814198
5                  3    manhattan         0.805916
1                  3    euclidean         0.785185
4                  1    manhattan         0.777058
0                  1    euclidean         0.772942
  param_penalty param_solver param_C  mean_test_score
3            l2    liblinear     0.1         0.834877
5            l2    liblinear      10         0.826749
4            l1    liblinear      10         0.826698
1            l2    liblinear  0.0001         0.805813
2            l1    liblinear     0.1         0.789249
0            l1    liblinear  0.0001         0.545473
k = 3
kf = KFold(n_splits=k, random_state=None)

KNN = KNeighborsClassifier()
gridKNN = GridSearchCV(KNN, parametersKNN, cv=kf, scoring='accuracy')
gridKNN.fit(train, target)
# ascending=False completed from the truncated line; the table above is sorted descending
resKNN = pd.DataFrame(gridKNN.cv_results_).sort_values(by=["mean_test_score"],
                                                       ascending=False)
resKNN[["param_n_neighbors", "param_metric", "mean_test_score"]]
Out[ ]:
In [ ]:
parametersLR = [{"penalty": ["l1", "l2"],
                 "solver": ["liblinear"],
                 "C": [0.0001, 0.1, 10]}]
LR = LogisticRegression()
gridLR = GridSearchCV(LR, parametersLR, cv=kf, scoring='accuracy')
gridLR.fit(train, target)
resLR = pd.DataFrame(gridLR.cv_results_).sort_values(by=["mean_test_score"],
                                                     ascending=False)
resLR[["param_penalty", "param_solver", "param_C", "mean_test_score"]]
Out[ ]:
In [ ]:
parametersSVM = [{"kernel": ['linear', 'rbf'],
                  "C": [0.0001, 0.1, 10]}]
SVM = SVC()
gridSVM = GridSearchCV(SVM, parametersSVM, cv=kf, scoring='accuracy')
gridSVM.fit(train, target)
  param_C param_kernel  mean_test_score
2     0.1       linear         0.830761
3     0.1          rbf         0.814095
4      10       linear         0.801749
5      10          rbf         0.764506
0  0.0001       linear         0.545473
1  0.0001          rbf         0.545473
What was the best model and what was its score?
The best model was logistic regression with the following parameters: penalty='l2', C=0.1, and
solver='liblinear'. This model had a mean_test_score of 0.834877.
Using the best model you have, report the test accuracy and print out the confusion matrix
Accuracy:    0.836066
Confusion Matrix:
[[30  3]
 [ 7 21]]
# note: resLR is reused here to hold the SVM grid-search results
resLR = pd.DataFrame(gridSVM.cv_results_).sort_values(by=["mean_test_score"],
                                                      ascending=False)
resLR[["param_C", "param_kernel", "mean_test_score"]]
Out[ ]:
In [ ]:
best_model = LogisticRegression(penalty="l2", C=0.1, solver="liblinear")
best_model.fit(train, target)
predicted = best_model.predict(test)
print("%-12s %f" % ('Accuracy:', metrics.accuracy_score(target_test, predicted)))
print("Confusion Matrix: \n", metrics.confusion_matrix(target_test, predicted))