lab12

April 14, 2023

[1]: # Initialize Otter
     import otter
     grader = otter.Notebook("lab12.ipynb")

0.0.1 Content Warning

This lab includes discussion about cancer. If you feel uncomfortable with this topic, please contact your GSI or the instructors, or reach out via the Spring 2023 extenuating circumstances form.

1 Lab 12: Logistic Regression

In this lab, we will manually construct the logistic regression model and minimize cross-entropy loss using scipy.minimize. This structure mirrors the linear regression labs from earlier in the semester and lets us dive deep into how logistic regression works. We also introduce the sklearn.linear_model.LogisticRegression module that you would use in practice, and we explore performance metrics for classification.

1.0.1 Due Date

The on-time deadline is Tuesday, April 18th, 11:59 PM PT. Please read the syllabus for the grace period policy. No late submissions beyond the grace period will be accepted.

1.0.2 Collaboration Policy

Data science is a collaborative activity. While you may talk with others about this assignment, we ask that you write your solutions individually. If you discuss the assignment with others, please include their names in the cell below.

Collaborators: list names here

[2]: # Run this cell to set up your notebook
     import numpy as np
     import pandas as pd
     import sklearn
     import sklearn.datasets
     import matplotlib.pyplot as plt
     import seaborn as sns
     import plotly.offline as py
     import plotly.graph_objs as go
     import plotly.figure_factory as ff

     %matplotlib inline
     sns.set()
     sns.set_context("talk")

1.0.3 Lab Walk-Through

In addition to the lab notebook, we have also released a prerecorded walk-through video of the lab. We encourage you to reference this video as you work through the lab. Run the cell below to display the video.

Note: The walkthrough video is recorded from Spring 2022.

[3]: from IPython.display import YouTubeVideo
     YouTubeVideo("75hj59nas-M")

1.1 Data Loading

We will explore a breast cancer dataset from the University of Wisconsin (source). This dataset can be loaded using the sklearn.datasets.load_breast_cancer() method.

[4]: # Run this cell to load the data, no further action is needed.
     data = sklearn.datasets.load_breast_cancer()
     # Data is a dictionary.
     print(data.keys())
     print(data.DESCR)

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569
    :Number of Attributes: 30 numeric, predictive attributes and the class
    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        worst/largest values) of these features were computed for each image,
        resulting in 30 features. For instance, field 0 is Mean Radius, field
        10 is Radius SE, field 20 is Worst Radius.

        - class:
                - WDBC-Malignant
                - WDBC-Benign

    :Summary Statistics:

    ===================================== ====== ======
                                           Min    Max
    ===================================== ====== ======
    radius (mean):                        6.981  28.11
    texture (mean):                       9.71   39.28
    perimeter (mean):                     43.79  188.5
    area (mean):                          143.5  2501.0
    smoothness (mean):                    0.053  0.163
    compactness (mean):                   0.019  0.345
    concavity (mean):                     0.0    0.427
    concave points (mean):                0.0    0.201
    symmetry (mean):                      0.106  0.304
    fractal dimension (mean):             0.05   0.097
    radius (standard error):              0.112  2.873
    texture (standard error):             0.36   4.885
    perimeter (standard error):           0.757  21.98
    area (standard error):                6.802  542.2
    smoothness (standard error):          0.002  0.031
    compactness (standard error):         0.002  0.135
    concavity (standard error):           0.0    0.396
    concave points (standard error):      0.0    0.053
    symmetry (standard error):            0.008  0.079
    fractal dimension (standard error):   0.001  0.03
    radius (worst):                       7.93   36.04
    texture (worst):                      12.02  49.54
    perimeter (worst):                    50.41  251.2
    area (worst):                         185.2  4254.0
    smoothness (worst):                   0.071  0.223
    compactness (worst):                  0.027  1.058
    concavity (worst):                    0.0    1.252
    concave points (worst):               0.0    0.291
    symmetry (worst):                     0.156  0.664
    fractal dimension (worst):            0.055  0.208
    ===================================== ====== ======

    :Missing Attribute Values: None
    :Class Distribution: 212 - Malignant, 357 - Benign
    :Creator: Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian
    :Donor: Nick Street
    :Date: November, 1995

This is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets.
https://goo.gl/U2Uwz2

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

Separating plane described above was obtained using Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree Construction Via Linear Programming." Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992], a classification method which uses linear programming to construct a decision tree. Relevant features were selected using an exhaustive search in the space of 1-4 features and 1-3 separating planes.

The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

.. topic:: References

   - W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
   - O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995.
   - W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 163-171.

Since the data format is a dictionary, we do some preprocessing to create a pandas.DataFrame.

[5]: # Run this cell to see the first five rows of the data, no further action is needed.
     df = pd.DataFrame(data.data, columns=data.feature_names)
     df.head()

[5]:    mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
     0        17.99         10.38          122.80     1001.0          0.11840
     1        20.57         17.77          132.90     1326.0          0.08474
     2        19.69         21.25          130.00     1203.0          0.10960
     3        11.42         20.38           77.58      386.1          0.14250
     4        20.29         14.34          135.10     1297.0          0.10030

        mean compactness  mean concavity  mean concave points  mean symmetry  \
     0           0.27760          0.3001              0.14710         0.2419
     1           0.07864          0.0869              0.07017         0.1812
     2           0.15990          0.1974              0.12790         0.2069
     3           0.28390          0.2414              0.10520         0.2597
     4           0.13280          0.1980              0.10430         0.1809

        mean fractal dimension  …  worst radius  worst texture  worst perimeter  \
     0                 0.07871  …         25.38          17.33           184.60
     1                 0.05667  …         24.99          23.41           158.80
     2                 0.05999  …         23.57          25.53           152.50
     3                 0.09744  …         14.91          26.50            98.87
     4                 0.05883  …         22.54          16.67           152.20

        worst area  worst smoothness  worst compactness  worst concavity  \
     0      2019.0            0.1622             0.6656           0.7119
     1      1956.0            0.1238             0.1866           0.2416
     2      1709.0            0.1444             0.4245           0.4504
     3       567.7            0.2098             0.8663           0.6869
     4      1575.0            0.1374             0.2050           0.4000

        worst concave points  worst symmetry  worst fractal dimension
     0                0.2654          0.4601                  0.11890
     1                0.1860          0.2750                  0.08902
     2                0.2430          0.3613                  0.08758
     3                0.2575          0.6638                  0.17300
     4                0.1625          0.2364                  0.07678

     [5 rows x 30 columns]
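As a quick optional sanity check (not part of the lab), we can confirm the label encoding and the class balance reported in the dataset description above. This sketch only uses the data objects already defined; the expected counts (212 malignant, 357 benign) come from the DESCR printout:

    # Sanity check: sklearn encodes target 0 as malignant and 1 as benign.
    print(data.target_names)
    # Count each class; per DESCR we expect 212 malignant (0) and 357 benign (1).
    print(pd.Series(data.target).value_counts())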
The prediction task for this data is to predict whether a tumor is benign or malignant (a binary decision) given characteristics of that tumor. As a classic machine learning dataset, the prediction task is captured by the field data.target. To put the data back in its original context, we will create a new column called "malignant" which will be 1 if the tumor is malignant and 0 if it is benign (reversing the definition of target).

In this lab, we will fit a simple classification model to predict breast cancer from the cell nuclei of a breast mass. For simplicity, we will work with only one feature: the mean radius, which corresponds to the size of the tumor. Our output (i.e., response) is the malignant column.

[6]: # Run this cell to define X and Y, no further action is needed.
     # Target data_dict['target'] = 0 is malignant, 1 is benign
     df['malignant'] = (data.target == 0).astype(int)

     # Define our features/design matrix X
     X = df[["mean radius"]]
     Y = df['malignant']

Before we go further, we will split our dataset into training and testing data. This lets us explore the prediction power of our trained classifier on both seen and unseen data.

[7]: # Run this cell to create a 75-25 train-test split, no further action needed.
     from sklearn.model_selection import train_test_split

     X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=42)

     print(f"Training Data Size: {len(X_train)}")
     print(f"Test Data Size: {len(X_test)}")

Training Data Size: 426
Test Data Size: 143

2 Part 1: Defining the Model

In these first two parts, you will manually build a logistic regression classifier. Recall that the logistic regression model is written as follows:

$$f_{\theta}(x) = \sigma(x^T \theta)$$

where $f_{\theta}(x) = P(Y = 1 | x)$ is the probability that our observation $x$ belongs to class 1, and $\sigma$ is the sigmoid activation function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

If we have a single feature, then $x$ is a scalar and our model has parameters $\theta = [\theta_0, \theta_1]^T$ as follows:

$$f_{\theta}(x) = \sigma(\theta_0 + \theta_1 x)$$
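Before moving on to the matrix form, here is a minimal numeric sketch (not part of the lab) showing that the sigmoid squashes any real-valued input into a probability in (0, 1), with $\sigma(0) = 0.5$:

    import numpy as np

    sigmoid = lambda z: 1 / (1 + np.exp(-z))
    # Large negative inputs map near 0; large positive inputs map near 1.
    print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
    # approximately [4.5e-05, 0.269, 0.5, 0.731, 0.99995]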
Therefore, just like OLS, if we have $n$ datapoints and $p$ features, we can construct the design matrix

$$\mathbb{X} \in \mathbb{R}^{n \times (p + 1)}$$

with an all-ones column. Run the below cell to construct X_intercept_train. The syntax should look familiar:

[8]: # Run this cell to add the bias column, no further action needed.
     def add_bias_column(X):
         return np.hstack([np.ones((len(X), 1)), X])

     X_intercept_train = add_bias_column(X_train)
     X_intercept_train.shape

[8]: (426, 2)

2.0.1 Question 1a

Using the above definition for $\mathbb{X}$, we can also construct a matrix representation of our logistic regression model, just like we did for OLS. Noting that $\theta = [\theta_0, \theta_1, \dots, \theta_p]^T$, the vector $\hat{Y}$ is:

$$\hat{Y} = \sigma(\mathbb{X} \theta)$$

Then the $i$-th element of $\hat{Y}$ is the probability that the $i$-th observation belongs to class 1, given that the feature vector is the $i$-th row of design matrix $\mathbb{X}$ and the parameter vector is $\theta$.

Below, implement the lr_model function to evaluate this expression. To matrix-multiply two numpy arrays, use @ or np.dot. In case you're interested, the matmul documentation contrasts the two methods.

[9]: def sigmoid(z):
         """
         The sigmoid function, defined for you.
         """
         return 1 / (1 + np.exp(-z))

     def lr_model(theta, X):
         """
         Return the logistic regression model as defined above.
         You should not need to use a for loop; use @ or np.dot.

         Args:
             theta: The model parameters. Dimension (p+1,).
             X: The design matrix. Dimension (n, p+1).

         Return:
             Probabilities that Y = 1 for each datapoint. Dimension (n,).
         """
         return sigmoid(X.dot(theta)) # SOLUTION

[10]: grader.check("q1a")

[10]: q1a results: All test cases passed!

2.0.2 Question 1b: Compute Empirical Risk

Now let's try to analyze the cross-entropy loss from logistic regression. Suppose for a single observation, we predict probability $\hat{y}$ that the true response $y$ is in class 1 (otherwise the prediction is 0 with probability $1 - \hat{y}$). The cross-entropy loss is:

$$-\left( y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right)$$

For the logistic regression model, the empirical risk is therefore defined as the average cross-entropy loss across all $n$ datapoints:

$$R(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \left( y_i \log(\sigma(x_i^T \theta)) + (1 - y_i) \log(1 - \sigma(x_i^T \theta)) \right)$$

where $y_i$ is the $i$-th response in our dataset, $\theta$ are the parameters of our model, $x_i$ is the $i$-th row of our design matrix $\mathbb{X}$, and $\sigma(x_i^T \theta)$ is the probability that the response is 1 given input $x_i$.

Note: In this class, when performing linear algebra operations, we interpret both rows and columns as column vectors. So if we wish to calculate the dot product between column vector $x$ and a vector $\theta$, we write $x^T \theta$.

Below, implement the function lr_avg_loss that computes empirical risk over the dataset. Feel free to use functions defined in the previous part.

[11]: def lr_avg_loss(theta, X, Y):
          '''
          Compute the average cross entropy loss using X, Y, and theta.
          You should not need to use a for loop.

          Args:
              theta: The model parameters. Dimension (p+1,)
              X: The design matrix. Dimension (n, p+1).
              Y: The label. Dimension (n,).

          Return:
              The average cross entropy loss.
          '''
          # BEGIN SOLUTION
          prob_1s = sigmoid(X.dot(theta)) # or lr_model(theta, X)
          loss = -np.mean((Y * np.log(prob_1s)) + ((1 - Y) * np.log(1 - prob_1s)))
          # END SOLUTION
          return loss # SOLUTION

[12]: grader.check("q1b")

[12]: q1b results: All test cases passed!
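As an optional cross-check (not part of the lab), a correct lr_avg_loss should agree with sklearn.metrics.log_loss, which computes the same average cross-entropy, at any trial value of $\theta$; the trial parameter vector below is arbitrary:

    from sklearn.metrics import log_loss

    # Evaluate both implementations at an arbitrary trial parameter vector.
    trial_theta = np.array([-10.0, 0.7])
    probs = sigmoid(X_intercept_train.dot(trial_theta))
    print(lr_avg_loss(trial_theta, X_intercept_train, Y_train))
    print(log_loss(Y_train, probs))  # should match up to floating-point error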
Below is a plot showing the average training cross-entropy loss for various values of $\theta_0$ and $\theta_1$ (respectively the x and y axes in the plot).

[13]: # Run this cell to create the plotly visualization, no further action needed.
      with np.errstate(invalid='ignore', divide='ignore'):
          uvalues = np.linspace(-8, 8, 70)
          vvalues = np.linspace(-5, 5, 70)
          (u, v) = np.meshgrid(uvalues, vvalues)
          thetas = np.vstack((u.flatten(), v.flatten()))
          lr_avg_loss_values = np.array([lr_avg_loss(t, X_intercept_train, Y_train) for t in thetas.T])
          lr_loss_surface = go.Surface(name="Logistic Regression Loss",
              x=u, y=v,
              z=np.reshape(lr_avg_loss_values, (len(uvalues), len(vvalues))),
              contours=dict(z=dict(show=True, color="gray", project=dict(z=True)))
          )
          fig = go.Figure(data=[lr_loss_surface])
          fig.update_layout(
              scene=dict(
                  xaxis_title='theta_0',
                  yaxis_title='theta_1',
                  zaxis_title='Loss'),
              width=700,
              margin=dict(r=20, l=10, b=10, t=10))
          py.iplot(fig)

2.0.3 Question 1c

Describe one interesting observation about the loss plot above.

Type your answer here, replacing this text.

SOLUTION: One remark that can be made is that this plot shows that there are multiple points that minimize the loss. Therefore, there is not necessarily a unique optimizer for the cross-entropy loss function.

3 Part 2: Fit and Predict

3.0.1 [Tutorial] scipy.optimize.minimize

The next two cells call the minimize function from scipy on the lr_avg_loss function you defined in the previous part. We pass the training data in through args (documentation) to find the theta_hat that minimizes the average cross-entropy loss over the training set.

[14]: # Run this cell to minimize lr_avg_loss using scipy, no further action needed.
      from scipy.optimize import minimize

      min_result = minimize(lr_avg_loss,
                            x0=np.zeros(X_intercept_train.shape[1]),
                            args=(X_intercept_train, Y_train))
      min_result

[14]:       fun: 0.3123767645009187
       hess_inv: array([[747.98712729, -52.13268913],
              [-52.13268913,   3.68380729]])
            jac: array([-4.13507223e-07, -7.34627247e-06])
        message: 'Optimization terminated successfully.'
           nfev: 57
            nit: 16
           njev: 19
         status: 0
        success: True
              x: array([-13.87178638,   0.93723916])
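As an aside, a standalone toy sketch (separate from the lab) may clarify the roles of x0 (the initial guess) and args (extra fixed arguments forwarded to the objective): here we minimize a shifted parabola whose center is supplied through args:

    from scipy.optimize import minimize
    import numpy as np

    def parabola(theta, center):
        # Objective: (theta - center)^2, minimized at theta = center.
        return (theta[0] - center) ** 2

    toy_result = minimize(parabola, x0=np.zeros(1), args=(3.0,))
    print(toy_result.x)  # approximately [3.]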
[15]: # Run this cell to print `theta_hat`, no further action needed.
      theta_hat = min_result['x']
      theta_hat

[15]: array([-13.87178638,   0.93723916])

Because our design matrix leads with a column of all ones, theta_hat has two elements: $\hat{\theta}_0$ is the estimate of the intercept/bias term, and $\hat{\theta}_1$ is the estimate of the slope of our single feature.

3.0.2 Recap:

• For logistic regression with parameter $\theta$, $P(Y = 1 | x) = \sigma(x^T \theta)$, where $\sigma$ is the sigmoid function and $x$ is a feature vector. Therefore $\sigma(x^T \theta)$ is the Bernoulli probability that the response is 1 given the feature is $x$. Otherwise the response is 0 with probability $P(Y = 0 | x) = 1 - \sigma(x^T \theta)$.
• The $\hat{\theta}$ that minimizes average cross-entropy loss of our training data also maximizes the likelihood of observing the training data according to the logistic regression model (check out lecture for more details).

The main takeaway is that logistic regression models probabilities of classifying datapoints as 1 or 0. Next, we use this takeaway to implement model predictions.

3.1 Question 2

Using the theta_hat estimate above, we can construct a decision rule for classifying a datapoint with observation $x$. Let $P(Y = 1 | x) = \sigma(x^T \hat{\theta})$:

$$\text{classify}(x) = \begin{cases} 1, & \text{if } P(Y = 1 | x) \geq 0.5 \\ 0, & \text{if } P(Y = 1 | x) < 0.5 \end{cases}$$

This decision rule has a decision threshold $T = 0.5$. This threshold means that we treat the classes 0 and 1 "equally." Lower thresholds mean that we are more likely to predict 1, whereas higher thresholds mean that we are more likely to predict 0.
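To make the effect of the threshold concrete, here is a toy illustration (made-up probabilities, not lab data) showing that lowering the threshold turns more predictions into 1s:

    import numpy as np

    probs = np.array([0.2, 0.4, 0.6, 0.8])  # hypothetical P(Y = 1 | x) values
    for T in [0.3, 0.5, 0.7]:
        # Predict 1 whenever the modeled probability meets the threshold.
        print(T, (probs >= T).astype(int))
    # T = 0.3 -> [0 1 1 1]; T = 0.5 -> [0 0 1 1]; T = 0.7 -> [0 0 0 1]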
Implement the lr_predict function below, which returns a vector of predictions according to the logistic regression model. The function takes a design matrix of observations X, parameter estimate theta, and decision threshold threshold with default value 0.5.

[16]: def lr_predict(theta, X, threshold=0.5):
          '''
          Classification using a logistic regression model
          with a given decision rule threshold.

          Args:
              theta: The model parameters. Dimension (p+1,)
              X: The design matrix. Dimension (n, p+1).
              threshold: decision rule threshold for predicting class 1.

          Return:
              A vector of predictions.
          '''
          return (lr_model(theta, X) >= threshold).astype(int) # SOLUTION

      # Do not modify below this line.
      Y_train_pred = lr_predict(theta_hat, X_intercept_train)
      Y_train_pred

[16]: array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0,
             0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0,
             0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
             0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,
             0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0,
             0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
             0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1,
             0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0,
             1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1,
             0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1,
             1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1,
             1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1,
             1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0,
             0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0,
             1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0,
             1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
             0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0,
             0, 1, 0, 0, 0, 0, 0, 0])

[17]: grader.check("q2")

[17]: q2 results: All test cases passed!

3.1.1 [Tutorial] Linearly separable data

How do these predicted classifications compare to the true responses $Y$?

Run the below cell to visualize our predicted responses, the true responses, and the probabilities we used to make predictions. We use sns.stripplot, which introduces some jitter to avoid overplotting.
"Y_pred" : Y_train_pred, "correct" : (Y_train == Y_train_pred)}) sns . stripplot(data = plot_df, x = "X" , y = "Y" , orient = 'h' , alpha =0.5 , hue = "correct" ) 13 plt . xlabel( 'mean radius, $x$' ) plt . ylabel( '$y$' ) plt . yticks(ticks = [ 0 , 1 ], labels = [ '0: \n benign' , '1: \n malignant' ]) plt . title( "Predictions for decision threshold T = 0.5" ) plt . show() Because we are using a decision threshold �� = 0.5, we predict 1 for all �� where ��( ⃗�� �� ��) ≥ 0.5, which happens when: 1 1 + �� − ⃗�� �� �� = 1 2 → �� − ⃗�� �� �� = 1 → ⃗�� �� = 0 �� . For the single mean radius feature, we can use algebra to solve for the boundary to be approximately �� ≈ 14.8. We can see this by substituting for �� = ̂ in the equation �� above: ⃗⃗�� �� ̂ = 0 �� [1 �� ] [ ̂ �� 0 ̂ �� 1 ] = 0 From the minimize function, we found that theta_hat is array([-13.87178638, 0.93723916]). Plugging for ̂ : �� −13.87178638 + 0.93723916�� = 0�� ≈ 14.8
In other words, the model will always predict 0 (benign) if the mean radius feature is less than 14.8, and 1 (malignant) otherwise. However, in our training data there are datapoints with large mean radii that are benign, and vice versa. Our data is not linearly separable by a vertical line.

The above visualization is useful when we have just one feature. In practice, however, we use other performance metrics to diagnose our model performance. Next, we will explore several such metrics: accuracy, precision, recall, and confusion matrices.

4 Part 3: Quantifying Performance

4.0.1 [Tutorial] sklearn's LogisticRegression

Instead of using the model structure we built manually in the previous questions, we will instead use sklearn's LogisticRegression model, which operates similarly to the sklearn OLS, Ridge, and LASSO models.

Let's first fit a logistic regression model to the training data. Some notes:
• Like with linear models, the fit_intercept argument specifies whether the model includes an intercept term. We therefore pass in the original matrix X_train (defined at the beginning of the notebook, without intercept term) in the call to lr.fit().
• sklearn fits an L2-regularized logistic regression model by default; see the documentation for more details. The penalty argument specifies the regularization penalty term.

[19]: # Run this cell to fit a sklearn LogisticRegression model, no further action needed.
      from sklearn.linear_model import LogisticRegression

      lr = LogisticRegression(fit_intercept=True, penalty='l2')
      lr.fit(X_train, Y_train)
      lr.intercept_, lr.coef_

[19]: (array([-13.75289919]), array([[0.92881284]]))

Note that because we are now fitting a regularized logistic regression model, the estimated coefficients above deviate slightly from our numerical findings in Question 1.

Like with linear models, we can call lr.predict(X_train) to classify our training data with our fitted model.

[20]: # Run this cell to make predictions, no further action needed.
      lr.predict(X_train)

[20]: array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0,
             0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0,
             0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
             0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,
             0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0,
             0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
             0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1,
             0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0,
             1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1,
             0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1,
             1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1,
             1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1,
             1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0,
             0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0,
             1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0,
             1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
             0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0,
             0, 1, 0, 0, 0, 0, 0, 0])

Note that for a binary classification task, the sklearn model uses an unadjustable decision threshold of 0.5. If you're interested in manually adjusting this threshold, check out the documentation for lr.predict_proba().
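As a sketch of that idea (not required for the lab): lr.predict_proba returns one column of probabilities per class, so thresholding column 1 (the probability of class 1) reproduces predict at 0.5 and allows other thresholds:

    # Column 1 holds P(Y = 1 | x) for each row of X_train.
    probs_class1 = lr.predict_proba(X_train)[:, 1]
    custom_preds = (probs_class1 >= 0.5).astype(int)
    # At threshold 0.5 this should agree with lr.predict(X_train).
    print((custom_preds == lr.predict(X_train)).all())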
4.0.2 Question 3a: Accuracy

Fill in the code below to compute the training and testing accuracy, defined as:

$$\text{Training Accuracy} = \frac{1}{n_{\text{train\_set}}} \sum_{i \in \text{train\_set}} \mathbb{1}(y_i == \hat{y}_i)$$

$$\text{Testing Accuracy} = \frac{1}{n_{\text{test\_set}}} \sum_{i \in \text{test\_set}} \mathbb{1}(y_i == \hat{y}_i)$$

where for the $i$-th observation in the respective dataset, $\hat{y}_i$ is the predicted response (class 0 or 1) and $y_i$ the true response. $\mathbb{1}(y_i == \hat{y}_i)$ is an indicator function which is 1 if $y_i = \hat{y}_i$ and 0 otherwise.

[21]: train_accuracy = sum(lr.predict(X_train) == Y_train) / len(Y_train) # SOLUTION
      test_accuracy = sum(lr.predict(X_test) == Y_test) / len(Y_test) # SOLUTION

      print(f"Train accuracy: {train_accuracy:.4f}")
      print(f"Test accuracy: {test_accuracy:.4f}")

Train accuracy: 0.8709
Test accuracy: 0.9091

[22]: grader.check("q3a")

[22]: q3a results: All test cases passed!
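As an optional cross-check (not part of the lab), the same numbers fall out of np.mean over the elementwise comparison, or from sklearn.metrics.accuracy_score:

    from sklearn.metrics import accuracy_score

    # Both should match the train_accuracy computed above.
    print(np.mean(lr.predict(X_train) == Y_train))
    print(accuracy_score(Y_train, lr.predict(X_train)))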
4.0.3 Question 3b: Precision and Recall

It seems we can get a very high test accuracy. What about precision and recall?

• Precision (also called positive predictive value) is the fraction of true positives among the total number of data points predicted as positive.
• Recall (also known as sensitivity) is the fraction of true positives among the total number of data points with positive labels.

Precision measures the ability of our classifier to not predict negative samples as positive (i.e., avoid false positives), while recall is the ability of the classifier to find all the positive samples (i.e., avoid false negatives).

(A graphical illustration of precision and recall, modified slightly from Wikipedia, appears here in the original notebook.)

Mathematically, precision and recall are defined as:

$$\text{Precision} = \frac{n_{\text{true positives}}}{n_{\text{true positives}} + n_{\text{false positives}}} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{n_{\text{true positives}}}{n_{\text{true positives}} + n_{\text{false negatives}}} = \frac{TP}{TP + FN}$$

Use the formulas above to compute the precision and recall for the test set using the lr model trained with sklearn.

[23]: Y_test_pred = lr.predict(X_test) # SOLUTION
      precision = sum((Y_test_pred == Y_test) & (Y_test_pred == 1)) / sum(Y_test_pred) # SOLUTION
      recall = sum((Y_test_pred == Y_test) & (Y_test_pred == 1)) / sum(Y_test) # SOLUTION

      print(f'precision = {precision:.4f}')
      print(f'recall = {recall:.4f}')

precision = 0.9184
recall = 0.8333

[24]: grader.check("q3b")

[24]: q3b results: All test cases passed!
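These hand-computed values can be cross-checked against sklearn's built-in metrics (an optional sketch, not part of the lab):

    from sklearn.metrics import precision_score, recall_score

    # Should print approximately the same 0.9184 and 0.8333 computed above.
    print(precision_score(Y_test, Y_test_pred))
    print(recall_score(Y_test, Y_test_pred))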
Our precision is fairly high, while our recall is a bit lower.

Consider the following plots, which display the distribution of the response variable $Y$ in the training and test sets. Recall the class labels are 0: benign, 1: malignant.

[25]: fig, axes = plt.subplots(1, 2)
      sns.countplot(x=Y_train, ax=axes[0]);
      sns.countplot(x=Y_test, ax=axes[1]);
      axes[0].set_title('Train')
      axes[1].set_title('Test')
      plt.tight_layout();
4.0.4 Question 3c

Based on the above distribution, what might explain the observed difference between our precision and recall metrics?

Type your answer here, replacing this text.

SOLUTION: We obtain a good precision score: most of the cancer records that we label as positive are indeed positive. The recall score is not as good: our classifier has difficulty selecting all of the true positive cancer records. We observe a significant class imbalance in the data, which might affect the performance of our classifier.

4.0.5 [Tutorial] Confusion Matrices

To understand the link between precision and recall, it's useful to create a confusion matrix of our predictions. Luckily, sklearn.metrics provides us with such a function! The confusion_matrix function (documentation) categorizes counts of datapoints based on whether their true and predicted values match.

For the 143-datapoint test dataset:

[26]: # Run this cell to define the confusion matrix, no further action needed.
      from sklearn.metrics import confusion_matrix
      Y_test_pred = lr.predict(X_test)
      cnf_matrix = confusion_matrix(Y_test, Y_test_pred)
      cnf_matrix

[26]: array([[85,  4],
             [ 9, 45]])

We've implemented the following function to better visualize these four counts against the true and predicted categories:

[27]: # Run this cell to plot the confusion matrix, no further action needed.
      def plot_confusion_matrix(cm, classes, title='Confusion matrix', cmap=plt.cm.Blues):
          """
          This function prints and plots the confusion matrix.
          """
          import itertools
          plt.imshow(cm, interpolation='nearest', cmap=cmap)
          plt.title(title)
          plt.colorbar()
          tick_marks = np.arange(len(classes))
          plt.xticks(tick_marks, classes, rotation=45)
          plt.yticks(tick_marks, classes)
          plt.grid(False)

          thresh = cm.max() / 2.
          for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
              plt.text(j, i, np.round(cm[i, j], 2),
                       horizontalalignment="center",
                       color="white" if cm[i, j] > thresh else "black")

          plt.tight_layout()
          plt.ylabel('True label')
          plt.xlabel('Predicted label')

      class_names = ['False', 'True']

      plot_confusion_matrix(cnf_matrix, classes=class_names,
                            title='Confusion matrix, without normalization')
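Reading off this matrix (a quick aside, not part of the lab): row 0 holds the true negatives (85) and false positives (4), and row 1 holds the false negatives (9) and true positives (45). Precision and recall can be recovered directly from these four counts:

    # Unpack the 2x2 confusion matrix: rows are true labels, columns are predictions.
    tn, fp, fn, tp = cnf_matrix.ravel()
    print(tp / (tp + fp))  # precision, approximately 0.9184
    print(tp / (tp + fn))  # recall, approximately 0.8333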
4.0.6 Question 3d: Normalized Confusion Matrix

To better interpret these counts, assign cnf_matrix_norm to a confusion matrix normalized by the count of each true label category. In other words, build a 2-D NumPy array constructed by normalizing cnf_matrix by the count of datapoints in each row. For example, the top-left quadrant of cnf_matrix_norm should represent the proportion of true negatives over the total number of datapoints with negative labels.

Hint: In array broadcasting, you may encounter issues dividing 2-D NumPy arrays by 1-D NumPy arrays.
• Check out the keepdims parameter in np.sum (documentation) to preserve the dimensions of cnf_matrix after using np.sum.
• Alternatively, add the dimension back using np.newaxis (documentation).

[28]: cnf_matrix_norm = cnf_matrix / cnf_matrix.sum(axis=1)[:, np.newaxis] # SOLUTION

      # Do not modify below this line.
      plot_confusion_matrix(cnf_matrix_norm, classes=class_names,
                            title='Normalized confusion matrix')
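The keepdims route mentioned in the hint is equivalent (a sketch, not part of the solution): summing with keepdims=True keeps the row sums as a (2, 1) column, so the division broadcasts row-wise without needing np.newaxis:

    # Row sums retain shape (2, 1), so broadcasting divides each row by its total.
    row_totals = cnf_matrix.sum(axis=1, keepdims=True)
    print(cnf_matrix / row_totals)  # should match cnf_matrix_norm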
[29]: grader.check("q3d")

[29]: q3d results: All test cases passed!

Compare the normalized confusion matrix to the values you computed for precision and recall earlier:

[30]: # Run this cell to see precision and recall again, no further action needed.
      print(f'precision = {precision:.4f}')
      print(f'recall = {recall:.4f}')

precision = 0.9184
recall = 0.8333

Based on the definitions of precision and recall, why does only recall appear in the normalized confusion matrix? Why doesn't precision appear? (No answer required for this part; just something to think about.)

4.1 Congratulations!

You are finished with Lab 12!

4.2 Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all
images/graphs appear in the output. The cell below will generate a zip file for you to submit. Please save before exporting!

[31]: # Save your notebook first, then run this cell to export your submission.
      grader.export(pdf=False, run_tests=True)

Running your submission against local test cases…

Your submission received the following results when run against available test cases:

    q1a results: All test cases passed!
    q1b results: All test cases passed!
    q2 results: All test cases passed!
    q3a results: All test cases passed!
    q3b results: All test cases passed!
    q3d results: All test cases passed!

<IPython.core.display.HTML object>