lab12

April 14, 2023

[1]: # Initialize Otter
     import otter
     grader = otter.Notebook("lab12.ipynb")

0.0.1 Content Warning

This lab includes discussion about cancer. If you feel uncomfortable with this topic, please contact your GSI or the instructors, or reach out via the Spring 2023 extenuating circumstances form.
1 Lab 12: Logistic Regression

In this lab, we will manually construct the logistic regression model and minimize cross-entropy loss using scipy.minimize. This structure mirrors the linear regression labs from earlier in the semester and lets us dive deep into how logistic regression works. We also introduce the sklearn.linear_model.LogisticRegression module that you would use in practice, and we explore performance metrics for classification.

1.0.1 Due Date

The on-time deadline is Tuesday, April 18th, 11:59 PM PT. Please read the syllabus for the grace period policy. No late submissions beyond the grace period will be accepted.

1.0.2 Collaboration Policy

Data science is a collaborative activity. While you may talk with others about this assignment, we ask that you write your solutions individually. If you discuss the assignment with others, please include their names in the cell below.

Collaborators: list names here

[2]: # Run this cell to set up your notebook
     import numpy as np
     import pandas as pd
     import sklearn
     import sklearn.datasets
     import matplotlib.pyplot as plt
     import seaborn as sns
     import plotly.offline as py
     import plotly.graph_objs as go
     import plotly.figure_factory as ff
     %matplotlib inline

     sns.set()
     sns.set_context("talk")

1.0.3 Lab Walk-Through

In addition to the lab notebook, we have also released a prerecorded walk-through video of the lab. We encourage you to reference this video as you work through the lab. Run the cell below to display the video.

Note: The walkthrough video is recorded from Spring 2022.

[3]: from IPython.display import YouTubeVideo
     YouTubeVideo("75hj59nas-M")

[3]:
1.1 Data Loading

We will explore a breast cancer dataset from the University of Wisconsin (source). This dataset can be loaded using the sklearn.datasets.load_breast_cancer() method.

[4]: # Run this cell to load the data, no further action is needed.
     data = sklearn.datasets.load_breast_cancer()

     # Data is a dictionary.
     print(data.keys())
     print(data.DESCR)
dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569
    :Number of Attributes: 30 numeric, predictive attributes and the class
    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        worst/largest values) of these features were computed for each image,
        resulting in 30 features. For instance, field 0 is Mean Radius, field
        10 is Radius SE, field 20 is Worst Radius.

        - class:
                - WDBC-Malignant
                - WDBC-Benign

    :Summary Statistics:

    ===================================== ====== ======
                                           Min    Max
    ===================================== ====== ======
    radius (mean):                        6.981   28.11
    texture (mean):                       9.71    39.28
    perimeter (mean):                     43.79   188.5
    area (mean):                          143.5   2501.0
    smoothness (mean):                    0.053   0.163
    compactness (mean):                   0.019   0.345
    concavity (mean):                     0.0     0.427
    concave points (mean):                0.0     0.201
    symmetry (mean):                      0.106   0.304
    fractal dimension (mean):             0.05    0.097
    radius (standard error):              0.112   2.873
    texture (standard error):             0.36    4.885
    perimeter (standard error):           0.757   21.98
    area (standard error):                6.802   542.2
    smoothness (standard error):          0.002   0.031
    compactness (standard error):         0.002   0.135
    concavity (standard error):           0.0     0.396
    concave points (standard error):      0.0     0.053
    symmetry (standard error):            0.008   0.079
    fractal dimension (standard error):   0.001   0.03
    radius (worst):                       7.93    36.04
    texture (worst):                      12.02   49.54
    perimeter (worst):                    50.41   251.2
    area (worst):                         185.2   4254.0
    smoothness (worst):                   0.071   0.223
    compactness (worst):                  0.027   1.058
    concavity (worst):                    0.0     1.252
    concave points (worst):               0.0     0.291
    symmetry (worst):                     0.156   0.664
    fractal dimension (worst):            0.055   0.208
    ===================================== ====== ======

    :Missing Attribute Values: None
    :Class Distribution: 212 - Malignant, 357 - Benign
    :Creator: Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian
    :Donor: Nick Street
    :Date: November, 1995

This is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets. https://goo.gl/U2Uwz2

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

Separating plane described above was obtained using Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree Construction Via Linear Programming." Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992], a classification method which uses linear programming to construct a decision tree. Relevant features were selected using an exhaustive search in the space of 1-4 features and 1-3 separating planes.

The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

.. topic:: References

   - W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
   - O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995.
   - W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 163-171.

Since the data format is a dictionary, we do some preprocessing to create a pandas.DataFrame.

[5]: # Run this cell to see the first five rows of the data, no further action is needed.
     df = pd.DataFrame(data.data, columns=data.feature_names)
     df.head()

[5]:    mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
     0        17.99         10.38          122.80     1001.0          0.11840
     1        20.57         17.77          132.90     1326.0          0.08474
     2        19.69         21.25          130.00     1203.0          0.10960
     3        11.42         20.38           77.58      386.1          0.14250
     4        20.29         14.34          135.10     1297.0          0.10030

        mean compactness  mean concavity  mean concave points  mean symmetry  \
     0           0.27760          0.3001              0.14710         0.2419
     1           0.07864          0.0869              0.07017         0.1812
     2           0.15990          0.1974              0.12790         0.2069
     3           0.28390          0.2414              0.10520         0.2597
     4           0.13280          0.1980              0.10430         0.1809

        mean fractal dimension  …  worst radius  worst texture  worst perimeter  \
     0                 0.07871  …         25.38          17.33           184.60
     1                 0.05667  …         24.99          23.41           158.80
     2                 0.05999  …         23.57          25.53           152.50
     3                 0.09744  …         14.91          26.50            98.87
     4                 0.05883  …         22.54          16.67           152.20

        worst area  worst smoothness  worst compactness  worst concavity  \
     0      2019.0            0.1622             0.6656           0.7119
     1      1956.0            0.1238             0.1866           0.2416
     2      1709.0            0.1444             0.4245           0.4504
     3       567.7            0.2098             0.8663           0.6869
     4      1575.0            0.1374             0.2050           0.4000

        worst concave points  worst symmetry  worst fractal dimension
     0                0.2654          0.4601                  0.11890
     1                0.1860          0.2750                  0.08902
     2                0.2430          0.3613                  0.08758
     3                0.2575          0.6638                  0.17300
     4                0.1625          0.2364                  0.07678

     [5 rows x 30 columns]
The prediction task for this data is to predict whether a tumor is benign or malignant (a binary decision) given characteristics of that tumor. As a classic machine learning dataset, the prediction task is captured by the field data.target. To put the data back in its original context, we will create a new column called "malignant" which will be 1 if the tumor is malignant and 0 if it is benign (reversing the definition of target).

In this lab, we will fit a simple classification model to predict breast cancer from the cell nuclei of a breast mass. For simplicity, we will work with only one feature: the mean radius, which corresponds to the size of the tumor. Our output (i.e., response) is the malignant column.

[6]: # Run this cell to define X and Y, no further action is needed.
     # Target data_dict['target'] = 0 is malignant 1 is benign
     df['malignant'] = (data.target == 0).astype(int)

     # Define our features/design matrix X
     X = df[["mean radius"]]
     Y = df['malignant']

Before we go further, we will split our dataset into training and testing data. This lets us explore the prediction power of our trained classifier on both seen and unseen data.

[7]: # Run this cell to create a 75-25 train-test split, no further action needed.
     from sklearn.model_selection import train_test_split

     X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=42)

     print(f"Training Data Size: {len(X_train)}")
     print(f"Test Data Size: {len(X_test)}")

Training Data Size: 426
Test Data Size: 143

2 Part 1: Defining the Model

In these first two parts, you will manually build a logistic regression classifier. Recall that the Logistic Regression model is written as follows:

$$P_\theta(Y = 1 | x) = \sigma(x^T \theta)$$

where $P_\theta(Y = 1 | x)$ is the probability that our observation belongs to class 1, and $\sigma$ is the sigmoid activation function:

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$

If we have a single feature, then $x$ is a scalar and our model has parameters $\theta = [\theta_0, \theta_1]$ as follows:
$$P_\theta(Y = 1 | x) = \sigma(\theta_0 + \theta_1 x)$$

Therefore, just like OLS, if we have $n$ datapoints and $p$ features, then we can construct the design matrix $\mathbb{X} \in \mathbb{R}^{n \times (p + 1)}$ with an all-ones column. Run the below cell to construct X_intercept_train. The syntax should look familiar:

[8]: # Run this cell to add the bias column, no further action needed.
     def add_bias_column(X):
         return np.hstack([np.ones((len(X), 1)), X])

     X_intercept_train = add_bias_column(X_train)
     X_intercept_train.shape

[8]: (426, 2)

2.0.1 Question 1a

Using the above definition for $\sigma$, we can also construct a matrix representation of our Logistic Regression model, just like we did for OLS. Noting that $\theta = [\theta_0, \theta_1, \dots, \theta_p]$, the vector $\hat{\mathbb{Y}}$ is:

$$\hat{\mathbb{Y}} = \sigma(\mathbb{X} \theta)$$

Then the $i$-th element of $\hat{\mathbb{Y}}$ is the probability that the $i$-th observation belongs to class 1, given the feature vector is the $i$-th row of design matrix $\mathbb{X}$ and the parameter vector is $\theta$.

Below, implement the lr_model function to evaluate this expression. To matrix-multiply two numpy arrays, use @ or np.dot. In case you're interested, the matmul documentation contrasts the two methods.

[9]: def sigmoid(z):
         """
         The sigmoid function, defined for you.
         """
         return 1 / (1 + np.exp(-z))

     def lr_model(theta, X):
         """
         Return the logistic regression model as defined above.
         You should not need to use a for loop; use @ or np.dot.

         Args:
             theta: The model parameters. Dimension (p+1,).
             X: The design matrix. Dimension (n, p+1).

         Return:
             Probabilities that Y = 1 for each datapoint. Dimension (n,).
         """
         return sigmoid(X.dot(theta)) # SOLUTION

[10]: grader.check("q1a")
[10]: q1a results: All test cases passed!

2.0.2 Question 1b: Compute Empirical Risk

Now let's try to analyze the cross-entropy loss from logistic regression. Suppose for a single observation, we predict probability $p$ that the true response $y$ is in class 1 (otherwise the prediction is 0 with probability $1 - p$). The cross-entropy loss is:

$$-\left(y \log(p) + (1 - y)\log(1 - p)\right)$$

For the logistic regression model, the empirical risk is therefore defined as the average cross-entropy loss across all $n$ datapoints:

$$R(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \left(y_i \log(\sigma(x_i^T \theta)) + (1 - y_i)\log(1 - \sigma(x_i^T \theta))\right)$$

where $y_i$ is the $i$-th response in our dataset, $\theta$ are the parameters of our model, $x_i$ is the $i$-th row of our design matrix $\mathbb{X}$, and $\sigma(x_i^T \theta)$ is the probability that the response is 1 given input $x_i$.

Note: In this class, when performing linear algebra operations, we interpret both rows and columns as column vectors. So if we wish to calculate the dot product between column vector $x_i$ and a vector $\theta$, we write $x_i^T \theta$.
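For example, if the true label is $y = 1$ and the model assigns probability $p = 0.9$, the cross-entropy loss for that observation is $-\log(0.9) \approx 0.105$, whereas assigning $p = 0.1$ would incur a loss of $-\log(0.1) \approx 2.303$: confident but wrong predictions are penalized far more heavily than confident correct ones.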
Below, implement the function lr_avg_loss that computes the average cross-entropy loss (i.e., the empirical risk) over the dataset. Feel free to use functions defined in the previous part.

[11]: def lr_avg_loss(theta, X, Y):
          '''
          Compute the average cross entropy loss using X, Y, and theta.
          You should not need to use a for loop.

          Args:
              theta: The model parameters. Dimension (p+1,)
              X: The design matrix. Dimension (n, p+1).
              Y: The label. Dimension (n,).

          Return:
              The average cross entropy loss.
          '''
          # BEGIN SOLUTION
          prob_1s = sigmoid(X.dot(theta)) # or lr_model(theta, X)
          loss = -np.mean((Y * np.log(prob_1s)) + ((1 - Y) * np.log(1 - prob_1s)))
          # END SOLUTION
          return loss # SOLUTION

[12]: grader.check("q1b")
[12]: q1b results: All test cases passed!

Below is a plot showing the average training cross-entropy loss for various values of $\theta_0$ and $\theta_1$ (respectively the x and y axis in the plot).

[13]: # Run this cell to create the plotly visualization, no further action needed.
      with np.errstate(invalid='ignore', divide='ignore'):
          uvalues = np.linspace(-8, 8, 70)
          vvalues = np.linspace(-5, 5, 70)
          (u, v) = np.meshgrid(uvalues, vvalues)
          thetas = np.vstack((u.flatten(), v.flatten()))
          lr_avg_loss_values = np.array([lr_avg_loss(t, X_intercept_train, Y_train) for t in thetas.T])

          lr_loss_surface = go.Surface(name="Logistic Regression Loss",
              x=u, y=v, z=np.reshape(lr_avg_loss_values, (len(uvalues), len(vvalues))),
              contours=dict(z=dict(show=True, color="gray", project=dict(z=True)))
          )
          fig = go.Figure(data=[lr_loss_surface])
          fig.update_layout(
              scene=dict(
                  xaxis_title='theta_0',
                  yaxis_title='theta_1',
                  zaxis_title='Loss'),
              width=700,
              margin=dict(r=20, l=10, b=10, t=10))
          py.iplot(fig)
2.0.3 Question 1c

Describe one interesting observation about the loss plot above.

Type your answer here, replacing this text.

SOLUTION: One remark that can be made is that this plot shows that there are multiple points that minimize the loss. Therefore, there is not necessarily a unique optimizer for the cross-entropy loss function.

3 Part 2: Fit and Predict

3.0.1 [Tutorial] scipy.optimize.minimize

The next two cells call the minimize function from scipy on the lr_avg_loss function you defined in the previous part. We pass in the training data to args (documentation) to find the theta_hat that minimizes the average cross-entropy loss over the training set.

[14]: # Run this cell to minimize lr_avg_loss using scipy, no further action needed.
      from scipy.optimize import minimize

      min_result = minimize(lr_avg_loss,
                            x0=np.zeros(X_intercept_train.shape[1]),
                            args=(X_intercept_train, Y_train))
      min_result

[14]:       fun: 0.3123767645009187
       hess_inv: array([[747.98712729, -52.13268913],
                        [-52.13268913,   3.68380729]])
            jac: array([-4.13507223e-07, -7.34627247e-06])
        message: 'Optimization terminated successfully.'
           nfev: 57
            nit: 16
           njev: 19
         status: 0
        success: True
              x: array([-13.87178638,   0.93723916])

[15]: # Run this cell to print `theta_hat`, no further action needed.
      theta_hat = min_result['x']
      theta_hat

[15]: array([-13.87178638,   0.93723916])

Because our design matrix leads with a column of all ones, theta_hat has two elements: $\hat{\theta}_0$ is the estimate of the intercept/bias term, and $\hat{\theta}_1$ is the estimate of the slope of our single feature.

3.0.2 Recap:

• For logistic regression with parameter $\theta$, $P(Y = 1 | x) = \sigma(x^T \theta)$, where $\sigma$ is the sigmoid function and $x$ is a feature vector. Therefore $\sigma(x^T \theta)$ is the Bernoulli probability that the response is 1 given the feature is $x$. Otherwise the response is 0 with probability $P(Y = 0 | x) = 1 - \sigma(x^T \theta)$.
• The $\hat{\theta}$ that minimizes average cross-entropy loss of our training data also maximizes the likelihood of observing the training data according to the logistic regression model (check out lecture for more details).

The main takeaway is that logistic regression models probabilities of classifying datapoints as 1 or 0. Next, we use this takeaway to implement model predictions.

3.1 Question 2

Using the theta_hat estimate above, we can construct a decision rule for classifying a datapoint with observation $x$. Let $\hat{p} = P(Y = 1 | x) = \sigma(x^T \hat{\theta})$:

$$\text{classify}(x) = \begin{cases} 1, & \text{if } \hat{p} \geq 0.5 \\ 0, & \text{if } \hat{p} < 0.5 \end{cases}$$

This decision rule has a decision threshold $T = 0.5$. This threshold means that we treat the classes 0 and 1 "equally." Lower thresholds mean that we are more likely to predict 1, whereas higher thresholds mean that we are more likely to predict 0.

Implement the lr_predict function below, which returns a vector of predictions according to the logistic regression model. The function takes a design matrix of observations X, parameter estimate theta, and decision threshold threshold with default value 0.5.

[16]: def lr_predict(theta, X, threshold=0.5):
          '''
          Classification using a logistic regression model
          with a given decision rule threshold.

          Args:
              theta: The model parameters. Dimension (p+1,)
              X: The design matrix. Dimension (n, p+1).
              threshold: decision rule threshold for predicting class 1.

          Return:
              A vector of predictions.
          '''
          return (lr_model(theta, X) >= threshold).astype(int
) # SOLUTION # Do not modify below this line. Y_train_pred = lr_predict(theta_hat, X_intercept_train) Y_train_pred [16]: array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]) [17]: grader
.
check(
"q2"
) [17]: q2 results: All test cases passed! 3.1.1 [Tutorial] Linearly separable data How do these predicted classifications compare to the true responses ? ��
Run the below cell to visualize our predicted responses, the true responses, and the probabili ties
we used to make predictions. We use sns.stripplot which introduces some jitter to avoid overplotting.
[18]: # Run this cell to generate the visualization, no further action needed. plot_df = pd
.
DataFrame({
"X"
: np
.
squeeze(X_train), "Y"
: Y_train,
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
"Y_pred"
: Y_train_pred, "correct"
: (Y_train == Y_train_pred)}) sns
.
stripplot(data
=
plot_df, x
=
"X"
, y
=
"Y"
, orient
=
'h'
, alpha
=0.5
, hue
=
"correct"
) 13
plt
.
xlabel(
'mean radius, $x$'
) plt
.
ylabel(
'$y$'
) plt
.
yticks(ticks
=
[
0
, 1
], labels
=
[
'0:
\n
benign'
, '1:
\n
malignant'
]) plt
.
title(
"Predictions for decision threshold T = 0.5"
) plt
.
show() Because we are using a decision threshold �� = 0.5, we predict 1 for all �� where ��( ⃗��
��
��) ≥ 0.5, which happens when:
1 1 + ��
− ⃗��
��
��
=
1
2
→ ��
− ⃗��
��
��
= 1 → ⃗��
��
= 0 ��
. For the single mean radius feature, we can use algebra to solve for the boundary to be approximately �� ≈ 14.8. We can see this by substituting for �� = ̂ in the equation ��
above: ⃗⃗��
��
̂ = 0
��
[1 ��
]
[
̂ ��
0
̂ ��
1
] = 0 From the minimize function, we found that theta_hat is array([-13.87178638, 0.93723916]). Plugging for ̂ : ��
−13.87178638 + 0.93723916�� = 0�� ≈ 14.8
14
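You can also check this boundary numerically instead of solving by hand; a minimal sketch, assuming the theta_hat array from the minimize call above is still in scope:

      # Quick check of the decision boundary implied by theta_hat:
      # solve theta_0 + theta_1 * x = 0 for x.
      boundary = -theta_hat[0] / theta_hat[1]
      print(boundary)   # approximately 14.8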
In other words, the model will always predict 0 (benign) if the mean radius feature is less than 14.8, and 1 (malignant) otherwise. However, in our training data there are datapoints with large mean radii that are benign, and vice versa. Our data is not linearly separable by a vertical line.

The above visualization is useful when we have just one feature. In practice, however, we use other performance metrics to diagnose our model performance. Next, we will explore several such metrics:
accuracy, precision, recall, and confusion matrices.

4 Part 3: Quantifying Performance

4.0.1 [Tutorial] sklearn's LogisticRegression

Instead of using the model structure we built manually in the previous questions, we will instead use sklearn's LogisticRegression model, which operates similarly to the sklearn OLS, Ridge, and LASSO models.

Let's first fit a logistic regression model to the training data. Some notes:
* Like with linear models, the fit_intercept argument specifies if the model includes an intercept term. We therefore pass in the original matrix X_train (defined at the beginning of the notebook, without an intercept term) in the call to lr.fit().
* sklearn fits an l2-regularized logistic regression model by default; see the documentation for more details. The penalty argument specifies the regularization penalty term.

[19]: # Run this cell to fit a sklearn LogisticRegression model, no further action needed.
      from sklearn.linear_model import LogisticRegression

      lr = LogisticRegression(
          fit_intercept=True,
          penalty='l2'
      )
      lr.fit(X_train, Y_train)
      lr.intercept_, lr.coef_

[19]: (array([-13.75289919]), array([[0.92881284]]))

Note that because we are now fitting a regularized logistic regression model, the estimated coefficients above deviate slightly from our numerical findings in Question 1.

Like with linear models, we can call lr.predict(X_train) to classify our training data with our fitted model.

[20]: # Run this cell to make prediction, no further action needed.
      lr.
predict(X_train) [20]: array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 15
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0,
0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0,
1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0,
1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0,
1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0,
1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1,
1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1,
0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0,
1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0,
0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0,
0, 1, 0, 0, 0, 0, 0, 0])

Note that for a binary classification task, the sklearn model uses an unadjustable decision rule of 0.5. If you're interested in manually adjusting this threshold, check out the documentation for lr.predict_proba().
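As a minimal sketch (assuming the fitted lr model from above), you could apply a different threshold yourself on top of lr.predict_proba(), whose second column holds the estimated probability of class 1:

      # Manually apply a custom decision threshold (e.g., 0.3) to the
      # probabilities returned by the fitted sklearn model `lr`.
      probs_class1 = lr.predict_proba(X_test)[:, 1]      # P(Y = 1 | x) for each test point
      custom_preds = (probs_class1 >= 0.3).astype(int)   # more likely to predict 1 than T = 0.5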
4.0.2 Question 3a: Accuracy

Fill in the code below to compute the training and testing accuracy, defined as:

$$\text{Training Accuracy} = \frac{1}{n_{\text{train\_set}}} \sum_{i \in \text{train\_set}} \mathbb{1}_{y_i == \hat{y}_i}$$

$$\text{Testing Accuracy} = \frac{1}{n_{\text{test\_set}}} \sum_{i \in \text{test\_set}} \mathbb{1}_{y_i == \hat{y}_i}$$

where for the $i$-th observation in the respective dataset, $\hat{y}_i$ is the predicted response (class 0 or 1) and $y_i$ is the true response. $\mathbb{1}_{y_i == \hat{y}_i}$ is an indicator function which is 1 if $y_i = \hat{y}_i$ and 0 otherwise.

[21]: train_accuracy = sum(lr.predict(X_train) == Y_train) / len(Y_train) # SOLUTION
      test_accuracy = sum(lr.predict(X_test) == Y_test) / len(Y_test) # SOLUTION

      print(f"Train accuracy: {train_accuracy:.4f}")
      print(f"Test accuracy: {test_accuracy:.4f}")

Train accuracy: 0.8709
Test accuracy: 0.9091

[22]: grader.check("q3a")
[22]: q3a results: All test cases passed!
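A quick way to cross-check these numbers is a sketch using the fitted lr model above; sklearn estimators expose a score method that reports mean accuracy on the given data:

      # Sanity check: lr.score returns mean accuracy, so these should match
      # the manually computed values above.
      print(lr.score(X_train, Y_train))   # expected to be close to 0.8709
      print(lr.score(X_test, Y_test))     # expected to be close to 0.9091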
4.0.3 Question 3b: Precision and Recall

It seems we can get a very high test accuracy. What about precision and recall?
- Precision (also called positive predictive value) is the fraction of true positives among the total number of data points predicted as positive.
- Recall (also known as sensitivity) is the fraction of true positives among the total number of data points with positive labels.

Precision measures the ability of our classifier to not predict negative samples as positive (i.e., avoid false positives), while recall is the ability of the classifier to find all the positive samples (i.e., avoid false negatives).

Below is a graphical illustration of precision and recall, modified slightly from Wikipedia:

Mathematically, Precision and Recall are defined as:

$$\text{Precision} = \frac{n_{\text{true\_positives}}}{n_{\text{true\_positives}} + n_{\text{false\_positives}}} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{n_{\text{true\_positives}}}{n_{\text{true\_positives}} + n_{\text{false\_negatives}}} = \frac{TP}{TP + FN}$$

Use the formulas above to compute the precision and recall for the test set using the lr model trained using sklearn.
[23]: Y_test_pred = lr.predict(X_test) # SOLUTION
      precision = sum((Y_test_pred == Y_test) & (Y_test_pred == 1)) / sum(Y_test_pred) # SOLUTION
      recall = sum((Y_test_pred == Y_test) & (Y_test_pred == 1)) / sum(Y_test) # SOLUTION

      print(f'precision = {precision:.4f}')
      print(f'recall = {recall:.4f}')

precision = 0.9184
recall = 0.8333
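These values can also be cross-checked against sklearn's built-in metrics; a sketch assuming Y_test and Y_test_pred from the cell above (precision_score and recall_score treat label 1 as the positive class by default):

      # Cross-check the manual computation with sklearn.metrics.
      from sklearn.metrics import precision_score, recall_score
      print(precision_score(Y_test, Y_test_pred))   # should be close to 0.9184
      print(recall_score(Y_test, Y_test_pred))      # should be close to 0.8333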
[24]: grader.check("q3b")

[24]: q3b results: All test cases passed!

Our precision is fairly high, while our recall is a bit lower. Consider the following plots, which display the distribution of the response variable $Y$ in the training and test sets. Recall class labels are 0: benign, 1: malignant.

[25]: fig, axes = plt.subplots(1, 2)
      sns.countplot(x=Y_train, ax=axes[0]);
      sns.countplot(x=Y_test, ax=axes[1]);

      axes[0].set_title('Train')
      axes[1].set_title('Test')
      plt.tight_layout();
4.0.4 Question 3c

Based on the above distribution, what might explain the observed difference between our precision and recall metrics?

Type your answer here, replacing this text.

SOLUTION: We obtain a good precision score: most of the cancer records that we label as positive are indeed positive. The recall score is not as good: our classifier has difficulty selecting all the true positive cancer records. We observe a significant class imbalance in the data, which might affect the performance of our classifier.

4.0.5 [Tutorial] Confusion Matrices

To understand the link between precision and recall, it's useful to create a confusion matrix of our predictions. Luckily, sklearn.metrics provides us with such a function! The confusion_matrix function (documentation) categorizes counts of datapoints based on whether their true and predicted values match.

For the 143-datapoint test dataset:

[26]: # Run this cell to define confusion matrix, no further action needed.
      from sklearn.metrics import confusion_matrix
      Y_test_pred = lr.predict(X_test)
      cnf_matrix = confusion_matrix(Y_test, Y_test_pred)
      cnf_matrix

[26]: array([[85,  4],
             [ 9, 45]])
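Reading this with sklearn's convention (rows are true labels, columns are predicted labels, ordered [0, 1]), the counts line up with the metrics computed earlier: the classifier made $4 + 45 = 49$ positive predictions, of which 45 were correct, giving precision $45/49 \approx 0.918$; there are $9 + 45 = 54$ truly malignant test points, of which 45 were found, giving recall $45/54 \approx 0.833$.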
We've implemented the following function to better visualize these four counts against the true and predicted categories:

[27]: # Run this cell to plot confusion matrix, no further action needed.
      def plot_confusion_matrix(cm, classes, title='Confusion matrix', cmap=plt.cm.Blues):
          """
          This function prints and plots the confusion matrix.
          """
          import itertools
          plt.imshow(cm, interpolation='nearest', cmap=cmap)
          plt.title(title)
          plt.colorbar()
          tick_marks = np.arange(len(classes))
          plt.xticks(tick_marks, classes, rotation=45)
          plt.yticks(tick_marks, classes)
          plt.grid(False)

          thresh = cm.max() / 2.
          for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
              plt.text(j, i, np.round(cm[i, j], 2),
                       horizontalalignment="center",
                       color="white" if cm[i, j] > thresh else "black")

          plt.tight_layout()
          plt.ylabel('True label')
          plt.xlabel('Predicted label')

      class_names = ['False', 'True']
      plot_confusion_matrix(cnf_matrix, classes=class_names,
                            title='Confusion matrix, without normalization')
4.0.6 Question 3d: Normalized Confusion Matrix

To better interpret these counts, assign cnf_matrix_norm to a confusion matrix normalized by the count of each true label category. In other words, build a 2-D NumPy array constructed by normalizing cnf_matrix by the count of datapoints in each row. For example, the top-left quadrant of cnf_matrix_norm should represent the proportion of true negatives over the total number of datapoints with negative labels.

Hint: In array broadcasting, you may encounter issues dividing 2-D NumPy arrays by 1-D NumPy arrays.
* Check out the keepdims parameter in np.sum (documentation) to preserve the dimensions of cnf_matrix after using np.sum.
* Alternatively, add the dimension back using np.newaxis (documentation).

[28]: cnf_matrix_norm = cnf_matrix / cnf_matrix.sum(axis=1)[:, np.newaxis] # SOLUTION

      # Do not modify below this line.
      plot_confusion_matrix(cnf_matrix_norm, classes=class_names,
                            title='Normalized confusion matrix')
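Depending on your sklearn version, the same matrix can be obtained directly from confusion_matrix; the sketch below assumes a version recent enough to support the normalize parameter, so treat it as an optional cross-check rather than part of the required solution:

      # Optional cross-check: normalize over the true (row) labels directly.
      cnf_check = confusion_matrix(Y_test, Y_test_pred, normalize='true')
      print(cnf_check)   # should match cnf_matrix_norm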
[29]: grader.check("q3d")

[29]: q3d results: All test cases passed!

Compare the normalized confusion matrix to the values you computed for precision and recall earlier:

[30]: # Run this cell to see precision and recall again, no further action needed.
      print(f'precision = {precision:.4f}')
      print(f'recall = {recall:.4f}')

precision = 0.9184
recall = 0.8333

Based on the definitions of precision and recall, why does only recall appear in the normalized confusion matrix? Why doesn't precision appear? (No answer required for this part; just something to think about.)

4.1 Congratulations!

You are finished with Lab 12!

4.2 Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all
images/graphs appear in the output. The cell below will generate a zip file for you to submit. Please save before exporting!

[31]: # Save your notebook first, then run this cell to export your submission.
      grader.export(pdf=False, run_tests=True)

Running your submission against local test cases…

Your submission received the following results when run against available test cases:

q1a results: All test cases passed!
q1b results: All test cases passed!
q2 results: All test cases passed!
q3a results: All test cases passed!
q3b results: All test cases passed!
q3d results: All test cases passed!

<IPython.core.display.HTML object>