ISyE 521: HOMEWORK 2
HITESH NARRA
PROBLEM 1
Predicting Life Expectancy in the United States during the 1970s:
In [1]:

#1
import pandas as pd
from sklearn.model_selection import KFold, GridSearchCV, train_test_split
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
import time

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path
num_observations = len(data)
print(f"Number of observations: {num_observations}")

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost']]
y = data['LifeExp']

start_time = time.time()

# Define different random seeds
random_seeds = [1, 5, 10, 20]  # Add more random seeds as needed

for seed in random_seeds:
    # Initialize models
    lasso_model = Lasso()
    cart_model = DecisionTreeRegressor()

    # KFold with 10 folds and specific random seed
    kf = KFold(n_splits=10, shuffle=True, random_state=seed)

    # Initialize parameters for GridSearchCV
    param_grid_lasso = {'alpha': [0.001, 0.01, 0.1, 1, 10]}
    # Note: min_samples_leaf=0 is not a valid value and triggers the FitFailedWarning in the output below
    param_grid_cart = {'min_samples_leaf': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

    # Lists to store R-squared scores
    r2_scores_lasso = []
    r2_scores_cart = []

    # Perform 10-fold cross-validation and hyperparameter tuning
    for train_index, test_index in kf.split(X):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]

        # GridSearchCV for Lasso
        grid_search_lasso = GridSearchCV(lasso_model, param_grid_lasso, scoring='r2', cv=10)
        grid_search_lasso.fit(X_train, y_train)
        best_lasso_model = grid_search_lasso.best_estimator_
        y_pred_lasso = best_lasso_model.predict(X_test)
        r2_lasso = r2_score(y_test, y_pred_lasso)
        r2_scores_lasso.append(r2_lasso)

        # GridSearchCV for CART
        # cv=10 inner folds (the warnings below report 110 = 11 x 10 fits per search)
        grid_search_cart = GridSearchCV(cart_model, param_grid_cart, scoring='r2', cv=10)
        grid_search_cart.fit(X_train, y_train)
        best_cart_model = grid_search_cart.best_estimator_
        y_pred_cart = best_cart_model.predict(X_test)
        r2_cart = r2_score(y_test, y_pred_cart)
        r2_scores_cart.append(r2_cart)

    # Calculate average R-squared values for each seed
    avg_r2_lasso = sum(r2_scores_lasso) / len(r2_scores_lasso)
    avg_r2_cart = sum(r2_scores_cart) / len(r2_scores_cart)

    print(f"Random Seed: {seed}")
    print(f"Average R-squared for Lasso model: {avg_r2_lasso}")
    print(f"Average R-squared for CART model: {avg_r2_cart}")
    if avg_r2_lasso > avg_r2_cart:
        print("Lasso model performed better.")
    else:
        print("CART model performed better.")
    print("\n")

end_time = time.time()

# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")

Number of observations: 50
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
    super().fit(
  File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
    raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
  warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [nan 0.11850502 0.14137533 0.1510754 0.25586429 0.17071131 0.20143935 0.18175009 0.32927866 0.15656313 0.27729491]
  warnings.warn(
(This FitFailedWarning / UserWarning pair is emitted for every CART grid search, once per outer fold and per random seed, because min_samples_leaf=0 is not a valid value; the remaining near-identical repetitions are omitted here.)
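The warning text itself points to the remedy: min_samples_leaf=0 is rejected by DecisionTreeRegressor, so the CART grid should start at 1 (or use a float in (0, 0.5]). Below is a minimal sketch of a corrected search, reusing the imports and the X_train / y_train variables from the cell above; it is an illustration, not the code that produced the output shown here, and error_score='raise' is only useful while debugging.

# Hypothetical corrected grid: drop the invalid 0 so that no fits fail
corrected_param_grid_cart = {'min_samples_leaf': list(range(1, 11))}

grid_search_cart = GridSearchCV(
    DecisionTreeRegressor(),
    corrected_param_grid_cart,
    scoring='r2',
    cv=10,
    error_score='raise',  # surface any remaining failure immediately instead of scoring it as nan
)
grid_search_cart.fit(X_train, y_train)  # runs without FitFailedWarning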
Random Seed: 1
Average R-squared for Lasso model: 0.35409020698197735
Average R-squared for CART model: -0.4428182844778349
Lasso model performed better.
Random Seed: 5
Average R-squared for Lasso model: 0.5353473586774904
Average R-squared for CART model: 0.035296666576434266
Lasso model performed better.
Random Seed: 10
Average R-squared for Lasso model: 0.48717654642561004
Average R-squared for CART model: 0.23732190092360916
Lasso model performed better.
11/21/23, 8:54 AM
Homework2_HiteshNarra
localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false
18/42
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_sco
re='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida
tion.py", line 680, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
super().fit(
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.174480
7 -0.16757201 -0.07193141 0.08567782 0.11368842
0.28691442 0.16746376 0.16615409 0.25864321 0.10676024]
warnings.warn(
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_sco
re='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida
tion.py", line 680, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
super().fit(
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.066245
56 0.04028023 -0.15949167 0.12948758 0.17243477
0.33311131 0.42443106 0.3447734 0.24507438 0.20510001]
warnings.warn(
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_sco
re='raise'.
Below are more details about the failures:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
11/21/23, 8:54 AM
Homework2_HiteshNarra
localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false
19/42
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida
tion.py", line 680, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
super().fit(
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.322802
76 -0.30516402 -0.13508709 -0.1229082 0.06204285
0.25690527 0.26689191 0.24835133 0.20651144 0.09308653]
warnings.warn(
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_sco
re='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida
tion.py", line 680, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
super().fit(
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan 0.2127085 0.36840731 0.36405493 0.30770387 0.33903419
0.39055685 0.30960863 0.37696061 0.29749957 0.31600338]
warnings.warn(
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_sco
re='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida
tion.py", line 680, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
11/21/23, 8:54 AM
Homework2_HiteshNarra
localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false
20/42
super().fit(
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan 0.226894
52 0.01094429 0.19052935 0.03871542 0.04865815
0.17018525 0.06434408 -0.06222963 -0.0650863 -0.02458248]
warnings.warn(
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
Every failed fit raises the same error:
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite (the score array contains nan for the min_samples_leaf=0 setting and several negative R-squared values for the remaining settings).
[The same FitFailedWarning / UserWarning pair is printed again for each cross-validation fold.]
Random Seed: 20
Average R-squared for Lasso model: 0.37485289437528585
Average R-squared for CART model: -0.2060279431490824
Lasso model performed better.
Time taken to run the models: 17.32 seconds
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:372: FitFailedWarning: 10 fits failed out of a total of 110 (ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0).
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite.
The code can be executed, but it produces a lot of FitFailedWarnings, and some of the R-squared values are negative, which is concerning.
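The warning itself points at the cause: 0 is not a valid value for min_samples_leaf. A minimal sketch of the fix (an editorial illustration, not part of the graded cell; the cv=3 value here is illustrative):

# The FitFailedWarning comes from min_samples_leaf=0, which scikit-learn rejects
# (it must be >= 1, or a fraction in (0, 0.5]). Starting the grid at 1 removes the
# failed fits and the nan scores.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

param_grid_cart = {'min_samples_leaf': list(range(1, 11))}  # 1..10, no invalid 0
grid_search_cart = GridSearchCV(
    DecisionTreeRegressor(),
    param_grid_cart,
    scoring='r2',
    cv=3,                  # illustrative fold count
    error_score='raise',   # optional: surface any remaining failure instead of recording nan
)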
In [2]:
#2
import pandas as pd
from sklearn.model_selection import KFold, GridSearchCV, train_test_split
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
import time

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost']]  # line cut off in the source; closing brackets reconstructed
y = data['LifeExp']

start_time = time.time()

# Define different random seeds
random_seeds = [1, 5, 10, 20]  # Add more random seeds as needed

for seed in random_seeds:
    # Initialize models
    lasso_model = Lasso()
    cart_model = DecisionTreeRegressor()

    # KFold with 3 folds and specific random seed
    kf = KFold(n_splits=3, shuffle=True, random_state=seed)

    # Initialize parameters for GridSearchCV
    param_grid_lasso = {'alpha': [0.001, 0.01, 0.1, 1, 10]}
    param_grid_cart = {'min_samples_leaf': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

    # Lists to store R-squared scores
    r2_scores_lasso = []
    r2_scores_cart = []

    # Perform 3-fold cross-validation and hyperparameter tuning
    for train_index, test_index in kf.split(X):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]

        # GridSearchCV for Lasso with 3 folds
        # (the cv argument is cut off in the source; 3 folds assumed from the comments and the warning counts)
        grid_search_lasso = GridSearchCV(lasso_model, param_grid_lasso, scoring='r2', cv=3)
        grid_search_lasso.fit(X_train, y_train)
        best_lasso_model = grid_search_lasso.best_estimator_
        y_pred_lasso = best_lasso_model.predict(X_test)
        r2_lasso = r2_score(y_test, y_pred_lasso)
        r2_scores_lasso.append(r2_lasso)

        # GridSearchCV for CART with 3 folds
        grid_search_cart = GridSearchCV(cart_model, param_grid_cart, scoring='r2', cv=3)
        grid_search_cart.fit(X_train, y_train)
        best_cart_model = grid_search_cart.best_estimator_
        y_pred_cart = best_cart_model.predict(X_test)
        r2_cart = r2_score(y_test, y_pred_cart)
        r2_scores_cart.append(r2_cart)

    # Calculate average R-squared values for each seed
    avg_r2_lasso = sum(r2_scores_lasso) / len(r2_scores_lasso)
    avg_r2_cart = sum(r2_scores_cart) / len(r2_scores_cart)

    print(f"Random Seed: {seed}")
    print(f"Average R-squared for Lasso model with 3-folds: {avg_r2_lasso}")
    print(f"Average R-squared for CART model with 3-folds: {avg_r2_cart}")
    if avg_r2_lasso > avg_r2_cart:
        print("Lasso model performed better.")
    else:
        print("CART model performed better.")
    print("\n")

end_time = time.time()

# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:372: FitFailedWarning: 3 fits failed out of a total of 33.
The score on these train-test partitions for these parameters will be set to nan.
Every failed fit raises the same error:
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite (nan for the min_samples_leaf=0 setting).
[This FitFailedWarning / UserWarning pair is repeated for each of the three folds of this seed.]
Random Seed: 1
Average R-squared for Lasso model with 3-folds: 0.1232949323087044
Average R-squared for CART model with 3-folds: 0.2868703955754948
CART model performed better.
[Same FitFailedWarning (3 fits failed out of a total of 33; min_samples_leaf=0 is invalid) and non-finite-score UserWarning repeated for each fold of this seed.]
Random Seed: 5
Average R-squared for Lasso model with 3-folds: 0.37631903017897866
Average R-squared for CART model with 3-folds: 0.4258692913692093
CART model performed better.
[Same FitFailedWarning (3 fits failed out of a total of 33; min_samples_leaf=0 is invalid) and non-finite-score UserWarning repeated for each fold of this seed.]
Random Seed: 10
Average R-squared for Lasso model with 3-folds: 0.6591611495213835
Average R-squared for CART model with 3-folds: 0.2530810403790312
Lasso model performed better.
[Same FitFailedWarning (3 fits failed out of a total of 33; min_samples_leaf=0 is invalid) and non-finite-score UserWarning repeated for each fold of this seed.]
Random Seed: 20
Average R-squared for Lasso model with 3-folds: 0.5427829072114675
Average R-squared for CART model with 3-folds: 0.23113029441591829
Lasso model performed better.
Time taken to run the models: 1.64 seconds
[Same FitFailedWarning (3 fits failed out of a total of 33; min_samples_leaf=0 is invalid) and non-finite-score UserWarning printed once more for the final fold.]
The change does help: the code not only executes, but we can also see the output for this block. Reducing the number of folds from 10 to 3 affects the model evaluation in several ways. With only 50 observations, each 3-fold test set holds roughly 17 states instead of 5, so the per-fold R-squared estimates are much less noisy and the averages across folds are more consistent and stable. The trade-off is that each model is now trained on about two-thirds of the data rather than nine-tenths. Fewer folds also mean fewer model fits, making the whole process faster and more efficient.
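As a quick illustration of the fold-size point (an editorial sketch, not part of the graded notebook), the snippet below prints how many of the 50 observations land in each test fold for 10 versus 3 splits:

# Compare test-fold sizes for 10-fold vs 3-fold cross-validation on 50 observations.
import numpy as np
from sklearn.model_selection import KFold

X_dummy = np.zeros((50, 1))  # stand-in for the 50 rows of StateData
for n_splits in (10, 3):
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=1)
    test_sizes = [len(test_idx) for _, test_idx in kf.split(X_dummy)]
    print(n_splits, "folds -> test-fold sizes:", test_sizes)
# 10 folds give 5 observations per test fold; 3 folds give 16-17 per test fold.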
In [3]:
#3.A
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
import time

start_time = time.time()

# Calculate the elapsed time
# (note: end_time here is left over from the previous cell, so this print reports a
# meaningless negative duration -- the "-0.98 seconds" shown in the output below)
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost']]
y = data['LifeExp']

# Initialize models
lasso_model = Lasso()
cart_model = DecisionTreeRegressor()

# Initialize lists to store R-squared values for each model
r2_scores_lasso = []
r2_scores_cart = []

# Perform repeated cross-validation with 25 repetitions
num_repetitions = 25
for repetition in range(num_repetitions):
    # KFold with 3 folds and a repetition-specific random state
    kf = KFold(n_splits=3, shuffle=True, random_state=repetition)

    # Calculate R-squared scores for Lasso using cross-validation
    lasso_scores = cross_val_score(lasso_model, X, y, scoring='r2', cv=kf)
    r2_scores_lasso.extend(lasso_scores)

    # Calculate R-squared scores for CART using cross-validation
    cart_scores = cross_val_score(cart_model, X, y, scoring='r2', cv=kf)
    r2_scores_cart.extend(cart_scores)

# Combine the R-squared scores into a dictionary for boxplot creation
r2_scores = {'Lasso': r2_scores_lasso, 'CART': r2_scores_cart}

# Create boxplots to show the distribution of R-squared values for each model
plt.figure(figsize=(8, 6))
plt.boxplot(r2_scores.values())
plt.xticks([1, 2], r2_scores.keys())
plt.title('Distribution of R-squared values for Lasso and CART models')
plt.ylabel('R-squared')
plt.grid(True)
plt.show()

# Determine which model performed best
avg_r2_lasso = np.mean(r2_scores_lasso)
avg_r2_cart = np.mean(r2_scores_cart)

print(f"Average R-squared for Lasso model: {avg_r2_lasso}")
print(f"Average R-squared for CART model: {avg_r2_cart}")
if avg_r2_lasso > avg_r2_cart:
    print("Lasso model performed better.")
else:
    print("CART model performed better.")

end_time = time.time()

# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")
Average R-squared for Lasso model: 0.23623783291173964
Average R-squared for CART model: 0.09385622489104001
Lasso model performed better.
Time taken to run the models: -0.98 seconds
3.B The results are not very concerning when compared with the outputs above. The CART model's lower average R-squared relative to Lasso suggests that it struggles to capture the relationships within the data, potentially indicating limited predictive power.
In [4]:
#4.A
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold, cross_val_score, GridSearchCV
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.metrics import r2_score
import time

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost']]
y = data['LifeExp']

start_time = time.time()

# Initialize models
lasso_model = Lasso()
cart_model = DecisionTreeRegressor()
random_forest_model = RandomForestRegressor()
adaboost_model = AdaBoostRegressor()

# Define hyperparameters for Random Forest and AdaBoost
param_grid_random_forest = {'n_estimators': [10, 100, 250, 500, 1000]}
param_grid_adaboost = {'learning_rate': [0.001, 0.01, 0.1, 1]}

# Initialize lists to store R-squared values for each model
r2_scores_lasso = []
r2_scores_cart = []
r2_scores_random_forest = []
r2_scores_adaboost = []

# Perform repeated cross-validation with 25 repetitions
num_repetitions = 25
for repetition in range(num_repetitions):
    # KFold with 3 folds and a repetition-specific random state
    kf = KFold(n_splits=3, shuffle=True, random_state=repetition)

    # Calculate R-squared scores for Lasso using cross-validation
    lasso_scores = cross_val_score(lasso_model, X, y, scoring='r2', cv=kf)
    r2_scores_lasso.extend(lasso_scores)

    # Calculate R-squared scores for CART using cross-validation
    cart_scores = cross_val_score(cart_model, X, y, scoring='r2', cv=kf)
    r2_scores_cart.extend(cart_scores)

    # GridSearchCV for Random Forest
    # (the scoring/cv arguments are cut off in the source; 'r2' scoring with the same KFold assumed)
    grid_search_rf = GridSearchCV(random_forest_model, param_grid_random_forest, scoring='r2', cv=kf)
    grid_search_rf.fit(X, y)
    best_rf_model = grid_search_rf.best_estimator_
    rf_scores = cross_val_score(best_rf_model, X, y, scoring='r2', cv=kf)
    r2_scores_random_forest.extend(rf_scores)

    # GridSearchCV for AdaBoost
    grid_search_adaboost = GridSearchCV(adaboost_model, param_grid_adaboost, scoring='r2', cv=kf)
    grid_search_adaboost.fit(X, y)
    best_adaboost_model = grid_search_adaboost.best_estimator_
    adaboost_scores = cross_val_score(best_adaboost_model, X, y, scoring='r2', cv=kf)
    r2_scores_adaboost.extend(adaboost_scores)

# Combine the R-squared scores into a dictionary for boxplot creation
r2_scores = {
    'Lasso': r2_scores_lasso,
    'CART': r2_scores_cart,
    'Random Forest': r2_scores_random_forest,
    'AdaBoost': r2_scores_adaboost
}

# Create boxplots to show the distribution of R-squared values for all models
plt.figure(figsize=(10, 6))
plt.boxplot(r2_scores.values())
plt.xticks(range(1, len(r2_scores) + 1), r2_scores.keys())
plt.title('Distribution of R-squared values for all models')
plt.ylabel('R-squared')
plt.grid(True)
plt.show()

# Determine which model performed best
avg_r2_scores = {model: np.mean(scores) for model, scores in r2_scores.items()}
best_model = max(avg_r2_scores, key=avg_r2_scores.get)

print("Average R-squared values:")
for model, avg_r2 in avg_r2_scores.items():
    print(f"{model}: {avg_r2}")

print(f"\n{best_model} performed the best.")

#part 5
end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")
Average R-squared values:
Lasso: 0.23623783291173964
CART: 0.08872684581633686
Random Forest: 0.4610205421399896
AdaBoost: 0.40720293047646283
Random Forest performed the best.
Time taken to run the models: 179.20 seconds
4.B
The differences in performance across models could be a concern if we need the best possible predictions for this problem. Each model makes different assumptions, and those differences affect how well it captures the relationships between the features and the target. Possible reasons for the gap include model complexity, how performance is evaluated, the hyperparameter choices, and underfitting or overfitting.
5. The data set has only 50 observations in total, so the experiments run relatively quickly, except for the one that tunes and evaluates all four models, which took somewhere from 3-7 minutes (about 179 seconds in the run above).
PROBLEM 3
Ensemble Methods
1.Differences between Bagging and Boosting:
a. Training Method:
Bagging (Bootstrap Aggregating): It involves training multiple individual models
independently on different subsets of the dataset by using bootstrap sampling. These
models then vote to make a collective prediction.
Boosting: It trains multiple weak learners sequentially, where each subsequent learner
focuses more on the samples that the previous ones misclassified. It aims to improve
upon the weaknesses of earlier models.
b. Weighting of Models:
Bagging: All models in bagging are typically given equal weight or importance when
making predictions.
Boosting: Boosting assigns weights to data points, where it emphasizes the misclassified
points, allowing subsequent models to concentrate more on correcting these mistakes.
c. Model Complexity:
Bagging: Each model in bagging is usually trained independently, with no direct
influence on other models. They can be diverse, leading to ensemble diversity.
Boosting: Models in boosting are trained sequentially, and each new model focuses on
improving areas where previous models made mistakes. Boosting tends to produce a
sequence of models where later models try to correct errors made by earlier ones.
2.Impact of Boosting's Sequential Training on Practitioners:
Boosting's sequential nature means that each model in the sequence is dependent on the
previous one. As a practitioner:
Adjusting hyperparameters or diagnosing issues in boosting might be more complex due to
the interdependence between models. It's crucial to monitor and control the number of
iterations or weak learners (to prevent overfitting) and tune learning rates effectively for
optimal performance.
3.Overfitting Concerns in Boosting vs. Bagging:
Boosting focuses on sequentially minimizing errors, potentially leading to overfitting if the
boosting process continues for too many iterations. It adapts its models to correct previous
mistakes, which might start fitting the noise in the data. Bagging, on the other hand,
constructs diverse models by using different subsets of the data, reducing variance without
overfitting as each model is trained independently. Hence, it's less prone to overfitting than
boosting.
Stacking vs. Boosting vs. Bagging:
Bagging: It constructs multiple independent models by training them on random subsets of
the data. Each model contributes equally to the final prediction by voting or averaging.
Boosting: Boosting sequentially builds models, with each subsequent model trying to correct
the errors made by the previous ones. It focuses on the difficult instances and adapts its
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
11/21/23, 8:54 AM
Homework2_HiteshNarra
localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false
36/42
models based on their performance on these instances.
4.Stacking (Stacked Generalization):
Stacking involves training a meta-learner that
combines the predictions of multiple base learners. Instead of a simple averaging or voting
mechanism, stacking learns how to best combine the predictions of diverse base models. It
uses the predictions of base models as features to train a higher-level model, aiming to
make more accurate predictions.
In simpler terms, while bagging and boosting create several models and combine their
predictions differently, stacking takes the predictions of these models and feeds them into
another model (meta-learner) to make the final prediction. Stacking learns how to best use
the predictions of the individual models to improve overall performance.
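As a minimal sketch of how the three ideas line up in scikit-learn (editorial illustration only; the estimator choices and hyperparameter values below are not from the graded notebook):

# Bagging, boosting, and stacking side by side, using regressors that fit the
# StateData task from Problem 1; values are illustrative.
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor, StackingRegressor
from sklearn.linear_model import Lasso, RidgeCV
from sklearn.tree import DecisionTreeRegressor

# Bagging: many trees fit independently on bootstrap samples, predictions averaged
bagging = BaggingRegressor(n_estimators=100)  # default base learner is a decision tree

# Boosting: learners fit sequentially, each concentrating on the previous ones' errors
boosting = AdaBoostRegressor(n_estimators=100, learning_rate=0.1)

# Stacking: a meta-learner (here RidgeCV) learns how to combine the base models' predictions
stacking = StackingRegressor(
    estimators=[('lasso', Lasso()), ('cart', DecisionTreeRegressor())],
    final_estimator=RidgeCV(),
)
# Each of these can then be evaluated exactly like the models above, e.g. with cross_val_score.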
PROBLEM 4
Support vector machines
Support Vector Machines (SVMs): 1.Difference between Soft and Hard Margin SVM:
Hard Margin SVM: It aims to find the maximum margin hyperplane that perfectly separates
the data points of different classes without allowing any misclassifications (0 training errors).
However, it might not be possible or practical when dealing with noisy or overlapping data.
For instance, in a dataset with outliers or non-linearly separable classes, a hard margin SVM
might fail to find a feasible decision boundary.
Soft Margin SVM: In contrast, a soft margin SVM allows for a margin that may have some
misclassifications or violations, known as slack variables. It tolerates a certain amount of
errors or misclassifications to find a broader margin that better generalizes to unseen data.
For example, when dealing with noisy data or when perfect separation is not feasible, a soft
margin SVM can be preferred as it provides a trade-off between margin width and errors,
improving generalization.
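In scikit-learn this trade-off is controlled by the regularization parameter C, as in the sketch below (editorial illustration; the toy data and C values are assumptions, not part of the submitted answer):

# A very large C approximates a hard margin (violations heavily penalized);
# a small C gives a softer margin that tolerates more violations.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X_toy, y_toy = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)
hardish_margin = SVC(kernel='linear', C=1e6).fit(X_toy, y_toy)  # near-hard margin
soft_margin = SVC(kernel='linear', C=0.01).fit(X_toy, y_toy)    # soft margin
# The softer margin keeps more points inside the margin, so it has more support vectors.
print(len(hardish_margin.support_), len(soft_margin.support_))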
2.Kernel Trick and its Purpose:
The kernel trick enables SVMs to handle non-linearly separable data by implicitly mapping
the input data into a higher-dimensional space where it becomes linearly separable.
Conceptually, imagine a 2D dataset that cannot be separated with a straight line (linear
boundary). The kernel trick allows transforming this data into a higher-dimensional space
(e.g., 3D or higher) where it becomes separable by a hyperplane. This transformation is done
efficiently without explicitly calculating the new higher-dimensional feature space, thus
avoiding high computational costs.
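A small sketch of that idea (editorial illustration with a toy dataset; not part of the submitted answer): on concentric circles, which no straight line can separate, an RBF kernel succeeds where a linear kernel fails.

# Compare a linear and an RBF kernel on data that is not linearly separable.
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_toy, y_toy = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
for kernel in ('linear', 'rbf'):
    acc = cross_val_score(SVC(kernel=kernel), X_toy, y_toy, cv=5).mean()
    print(kernel, round(acc, 3))
# The linear kernel stays near chance, while the RBF kernel is close to perfect, because
# the kernel implicitly maps the circles into a space where they become separable.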
3.Solving the Dual Formulation of the SVM Optimization Problem:
The dual formulation of the SVM optimization problem is often solved because of
computational efficiency and the kernel trick's convenience.
The dual formulation allows expressing the optimization problem in terms of dot products
between pairs of data points. This formulation enables the use of kernels, allowing SVMs to
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
11/21/23, 8:54 AM
Homework2_HiteshNarra
localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false
37/42
efficiently operate in high-dimensional spaces without explicitly transforming the data.
Additionally, solving the dual formulation often results in a simpler problem with better
computational properties, making it more tractable for optimization algorithms compared to
the primal formulation.
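For reference, the standard soft-margin dual problem these remarks refer to can be written (in standard notation; this formula is an editorial addition rather than part of the submitted answer) as

$$\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,K(x_i,x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C \;\; \forall i, \qquad \sum_{i=1}^{n}\alpha_i y_i = 0,$$

where the data enter only through the kernel values $K(x_i, x_j)$, which is exactly why the kernel trick fits the dual so naturally.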
In academic terms, SVMs, with their hard and soft margin concepts, utilize the kernel trick to
handle non-linear data by implicitly projecting it into a higher-dimensional space. Solving
the dual formulation of the optimization problem enables efficient computations by using
kernels and facilitates working in higher-dimensional spaces without explicitly transforming
the data, which is beneficial for handling complex data structures
PROBLEM 2
Predicting invasive species
In [5]:
#1
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Load the dataset
data = pd.read_csv('SpeciesData.csv')

# Separate features and target variable
X = data.drop(columns=['Target'])
y = data['Target']

# Scale features to [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42)  # random_state value cut off in the source; 42 assumed

# Define hyperparameters for logistic regression
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10]}

# Initialize logistic regression with Lasso penalty
log_reg = LogisticRegression(penalty='l1', solver='liblinear')

# Use GridSearchCV to find the best hyperparameters
grid_search = GridSearchCV(log_reg, param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Get the best value for C
best_C = grid_search.best_params_['C']

# Train logistic regression model with the best C value
best_log_reg = LogisticRegression(penalty='l1', solver='liblinear', C=best_C)
best_log_reg.fit(X_train, y_train)

# Predict probabilities for the test set
y_pred_prob = best_log_reg.predict_proba(X_test)[:, 1]

# Calculate test set AUC
test_auc = roc_auc_score(y_test, y_pred_prob)

# Calculate training set AUC
y_pred_prob_train = best_log_reg.predict_proba(X_train)[:, 1]
train_auc = roc_auc_score(y_train, y_pred_prob_train)

print(f"Best value for C: {best_C}")
print(f"Test set AUC: {test_auc}")
print(f"Training set AUC: {train_auc}")
Best value for C: 1
Test set AUC: 0.8270916804513728
Training set AUC: 0.8373540654491209
1.C Based on these observations, the small difference between the training and test set AUC
and the high AUC values themselves indicate that overfitting might not be a significant
concern with this model. The model seems to have achieved good generalization to unseen
data, considering the relatively comparable performance between the training and test sets.
In [6]:
#2
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
import time
import matplotlib.pyplot as plt

start_time = time.time()

# Load the dataset
data = pd.read_csv('SpeciesData.csv')

# Separate features and target variable
X = data.drop(columns=['Target'])
y = data['Target']

# Scale features to [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42)  # random_state value cut off in the source; 42 assumed

# Define hyperparameters for random forest
param_grid = {'n_estimators': [10, 100, 1000, 5000, 10000]}

# Initialize random forest classifier
rf = RandomForestClassifier()

# Use GridSearchCV to find the best number of trees
grid_search = GridSearchCV(rf, param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Get the best number of trees
best_n_estimators = grid_search.best_params_['n_estimators']

# Train random forest with the best number of trees
best_rf = RandomForestClassifier(n_estimators=best_n_estimators)
best_rf.fit(X_train, y_train)

# Predict probabilities for the test set
y_pred_prob = best_rf.predict_proba(X_test)[:, 1]

# Calculate test set AUC
test_auc = roc_auc_score(y_test, y_pred_prob)

# Calculate training set AUC
y_pred_prob_train = best_rf.predict_proba(X_train)[:, 1]
train_auc = roc_auc_score(y_train, y_pred_prob_train)

print(f"Best number of trees: {best_n_estimators}")
print(f"Test set AUC: {test_auc}")
print(f"Training set AUC: {train_auc}")

end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")

# Plotting AUC against the number of trees
# (the marker argument is cut off in the source; note also that GridSearchCV used its
# default scoring here, so mean_test_score is accuracy rather than AUC despite the label)
plt.figure(figsize=(8, 6))
plt.plot(param_grid['n_estimators'], grid_search.cv_results_['mean_test_score'], marker='o')
plt.title('Random Forest: Number of Trees vs. Mean Test AUC')
plt.xlabel('Number of Trees')
plt.ylabel('Mean Test AUC')
plt.grid(True)
plt.show()
Best number of trees: 100
Test set AUC: 0.9178747034865737
Training set AUC: 1.0
Time taken to run the models: 484.64 seconds
2.b
You'll likely notice that initially, as the number of trees increases, the model's performance improves. However, at a certain point (here around 5000 trees), adding more trees results in only marginal improvements in performance, which indicates the point of diminishing returns: the gain becomes negligible despite increasing the number of trees.
2.d
Random forests can be prone to overfitting, especially if they're built with a large number of trees or if the individual trees are deep. They are capable of capturing complex relationships in the data, which can lead to overfitting when the model is overly complex for the given dataset. Reducing the number of trees, limiting tree depth (pruning), adjusting other hyperparameters such as the minimum samples per leaf, or applying regularization techniques could help overcome overfitting.
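A minimal sketch of that kind of constrained forest (editorial illustration; the hyperparameter values are assumptions, not tuned for this dataset):

# A random forest with its complexity deliberately limited, so individual trees
# cannot memorize the training data as easily.
from sklearn.ensemble import RandomForestClassifier

constrained_rf = RandomForestClassifier(
    n_estimators=200,     # enough trees to stabilize, without thousands
    max_depth=5,          # cap the tree depth
    min_samples_leaf=5,   # require several samples per leaf
    random_state=42,
)
# constrained_rf.fit(X_train, y_train) would then replace the unconstrained best_rf above.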
In [7]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score
import time

start_time = time.time()

# Load the dataset
data = pd.read_csv('SpeciesData.csv')

# Separate features and target variable
X = data.drop(columns=['Target'])
y = data['Target']

# Scale features to [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42)  # random_state value cut off in the source; 42 assumed

# Define hyperparameters for SVM with different kernels
param_grid = {'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}

# Initialize SVM classifier
svm = SVC(probability=True)

# Use GridSearchCV to find the best kernel
grid_search = GridSearchCV(svm, param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Get the best kernel
best_kernel = grid_search.best_params_['kernel']

# Train SVM with the best kernel
best_svm = SVC(kernel=best_kernel, probability=True)
best_svm.fit(X_train, y_train)

# Predict probabilities for the test set
y_pred_prob = best_svm.predict_proba(X_test)[:, 1]

# Calculate test set AUC
test_auc = roc_auc_score(y_test, y_pred_prob)

# Calculate training set AUC
y_pred_prob_train = best_svm.predict_proba(X_train)[:, 1]
train_auc = roc_auc_score(y_train, y_pred_prob_train)

print(f"Best kernel: {best_kernel}")
print(f"Test set AUC: {test_auc}")
print(f"Training set AUC: {train_auc}")

end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")
Best kernel: poly
Test set AUC: 0.848156332188832
Training set AUC: 0.8690520987641532
Time taken to run the models: 97.97 seconds
3.c
Given the small difference between the training and test set AUC and the relatively high AUC
values themselves, it appears that overfitting might not be a significant concern with this
model. The model seems to have reasonably good generalization to unseen data despite
being trained on the training set.
3.d
Whether the Random Forest's performance can be beaten depends on various factors, including the dataset characteristics and the nature of the problem. With sufficient effort in hyperparameter tuning, feature engineering, and optimization, it is conceivable that an SVM could outperform the Random Forest; the relative superiority of each model will vary with the specific task requirements and data characteristics.
4
Best Performing Model: The Random Forest outperformed the SVM with a higher Test AUC
(0.918 vs. 0.848).
Training and Testing Time: The Random Forest took longer to train and test (484.64 seconds)
compared to the SVM (97.97 seconds).
Performance vs. Computational Time: The Random Forest's slightly better performance
might justify its longer computational time if the higher Test AUC is critical. However,
considering the substantial time difference and if a marginally lower performance is
acceptable, the SVM's quicker computation might be preferred.
Ultimately, the choice between models depends on the balance between model
performance and computational efficiency, considering the specific needs and constraints of
the problem at hand.
In [ ]: