Homework2_HiteshNarra

ISyE 521: HOMEWORK 2
HITESH NARRA

PROBLEM 1

Predicting Life Expectancy in the United States during the 1970s:

In [1]:

#1
import pandas as pd
from sklearn.model_selection import KFold, GridSearchCV, train_test_split
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
import time

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path
num_observations = len(data)
print(f"Number of observations: {num_observations}")

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost']]
y = data['LifeExp']

start_time = time.time()

# Define different random seeds
random_seeds = [1, 5, 10, 20]  # Add more random seeds as needed

for seed in random_seeds:
    # Initialize models
    lasso_model = Lasso()
    cart_model = DecisionTreeRegressor()

    # KFold with 10 folds and specific random seed
    kf = KFold(n_splits=10, shuffle=True, random_state=seed)

    # Initialize parameters for GridSearchCV
    param_grid_lasso = {'alpha': [0.001, 0.01, 0.1, 1, 10]}
    param_grid_cart = {'min_samples_leaf': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

    # Lists to store R-squared scores
    r2_scores_lasso = []
    r2_scores_cart = []

    # Perform 10-fold cross-validation and hyperparameter tuning
    for train_index, test_index in kf.split(X):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]

        # GridSearchCV for Lasso (the cv argument was truncated in the export; cv=10 is assumed here)
        grid_search_lasso = GridSearchCV(lasso_model, param_grid_lasso, scoring='r2', cv=10)
        grid_search_lasso.fit(X_train, y_train)
        best_lasso_model = grid_search_lasso.best_estimator_
        y_pred_lasso = best_lasso_model.predict(X_test)
        r2_lasso = r2_score(y_test, y_pred_lasso)
        r2_scores_lasso.append(r2_lasso)

        # GridSearchCV for CART (the cv argument was truncated in the export; cv=10 matches
        # the "110 total fits" reported in the warnings below)
        grid_search_cart = GridSearchCV(cart_model, param_grid_cart, scoring='r2', cv=10)
        grid_search_cart.fit(X_train, y_train)
        best_cart_model = grid_search_cart.best_estimator_
        y_pred_cart = best_cart_model.predict(X_test)
        r2_cart = r2_score(y_test, y_pred_cart)
        r2_scores_cart.append(r2_cart)

    # Calculate average R-squared values for each seed
    avg_r2_lasso = sum(r2_scores_lasso) / len(r2_scores_lasso)
    avg_r2_cart = sum(r2_scores_cart) / len(r2_scores_cart)

    print(f"Random Seed: {seed}")
    print(f"Average R-squared for Lasso model: {avg_r2_lasso}")
    print(f"Average R-squared for CART model: {avg_r2_cart}")
    if avg_r2_lasso > avg_r2_cart:
        print("Lasso model performed better.")
    else:
        print("CART model performed better.")
    print("\n")

end_time = time.time()

# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")

Number of observations: 50

C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:372: FitFailedWarning:
10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
    super().fit(
  File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
    raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
  warnings.warn(some_fits_failed_message, FitFailedWarning)

C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite:
[       nan 0.11850502 0.14137533 0.1510754  0.25586429 0.17071131
 0.20143935 0.18175009 0.32927866 0.15656313 0.27729491]
  warnings.warn(

(The same FitFailedWarning and non-finite-score UserWarning are emitted by every CART grid search, once per outer fold and per random seed, because the grid includes min_samples_leaf=0; only the first occurrence is reproduced here.)

Random Seed: 1
Average R-squared for Lasso model: 0.35409020698197735
Average R-squared for CART model: -0.4428182844778349
Lasso model performed better.
Random Seed: 5
Average R-squared for Lasso model: 0.5353473586774904
Average R-squared for CART model: 0.035296666576434266
Lasso model performed better.
Random Seed: 10
Average R-squared for Lasso model: 0.48717654642561004
Average R-squared for CART model: 0.23732190092360916
Lasso model performed better.
11/21/23, 8:54 AM Homework2_HiteshNarra localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false 18/42 C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py: 372: FitFailedWarning: 10 fits failed out of a total of 110. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_sco re='raise'. Below are more details about the failures: -------------------------------------------------------------------------------- 10 fits failed with the following error: Traceback (most recent call last): File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida tion.py", line 680, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit super().fit( File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit raise ValueError( ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0 warnings.warn(some_fits_failed_message, FitFailedWarning) C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.174480 7 -0.16757201 -0.07193141 0.08567782 0.11368842 0.28691442 0.16746376 0.16615409 0.25864321 0.10676024] warnings.warn( C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py: 372: FitFailedWarning: 10 fits failed out of a total of 110. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_sco re='raise'. Below are more details about the failures: -------------------------------------------------------------------------------- 10 fits failed with the following error: Traceback (most recent call last): File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida tion.py", line 680, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit super().fit( File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit raise ValueError( ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0 warnings.warn(some_fits_failed_message, FitFailedWarning) C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.066245 56 0.04028023 -0.15949167 0.12948758 0.17243477 0.33311131 0.42443106 0.3447734 0.24507438 0.20510001] warnings.warn( C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py: 372: FitFailedWarning: 10 fits failed out of a total of 110. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_sco re='raise'. Below are more details about the failures:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
11/21/23, 8:54 AM Homework2_HiteshNarra localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false 19/42 -------------------------------------------------------------------------------- 10 fits failed with the following error: Traceback (most recent call last): File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida tion.py", line 680, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit super().fit( File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit raise ValueError( ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0 warnings.warn(some_fits_failed_message, FitFailedWarning) C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.322802 76 -0.30516402 -0.13508709 -0.1229082 0.06204285 0.25690527 0.26689191 0.24835133 0.20651144 0.09308653] warnings.warn( C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py: 372: FitFailedWarning: 10 fits failed out of a total of 110. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_sco re='raise'. Below are more details about the failures: -------------------------------------------------------------------------------- 10 fits failed with the following error: Traceback (most recent call last): File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida tion.py", line 680, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit super().fit( File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit raise ValueError( ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0 warnings.warn(some_fits_failed_message, FitFailedWarning) C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan 0.2127085 0.36840731 0.36405493 0.30770387 0.33903419 0.39055685 0.30960863 0.37696061 0.29749957 0.31600338] warnings.warn( C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py: 372: FitFailedWarning: 10 fits failed out of a total of 110. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_sco re='raise'. Below are more details about the failures: -------------------------------------------------------------------------------- 10 fits failed with the following error: Traceback (most recent call last): File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida tion.py", line 680, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
11/21/23, 8:54 AM Homework2_HiteshNarra localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false 20/42 super().fit( File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit raise ValueError( ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0 warnings.warn(some_fits_failed_message, FitFailedWarning) C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan 0.226894 52 0.01094429 0.19052935 0.03871542 0.04865815 0.17018525 0.06434408 -0.06222963 -0.0650863 -0.02458248] warnings.warn( C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py: 372: FitFailedWarning: 10 fits failed out of a total of 110. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_sco re='raise'. Below are more details about the failures: -------------------------------------------------------------------------------- 10 fits failed with the following error: Traceback (most recent call last): File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida tion.py", line 680, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit super().fit( File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit raise ValueError( ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0 warnings.warn(some_fits_failed_message, FitFailedWarning) C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.773409 77 -0.92924952 -0.63728764 -0.13197011 -0.12002242 -0.21169843 -0.20493272 -0.11209458 0.15960674 0.18366843] warnings.warn( C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py: 372: FitFailedWarning: 10 fits failed out of a total of 110. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_sco re='raise'. Below are more details about the failures: -------------------------------------------------------------------------------- 10 fits failed with the following error: Traceback (most recent call last): File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida tion.py", line 680, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit super().fit( File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit raise ValueError( ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0 warnings.warn(some_fits_failed_message, FitFailedWarning) C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969:
11/21/23, 8:54 AM Homework2_HiteshNarra localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false 21/42 UserWarning: One or more of the test scores are non-finite: [ nan 0.135401 93 0.00219956 0.16465138 0.16035616 0.07547488 0.06999168 0.01276818 -0.04655049 -0.08865489 -0.08139331] warnings.warn( C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py: 372: FitFailedWarning: 10 fits failed out of a total of 110. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_sco re='raise'. Below are more details about the failures: -------------------------------------------------------------------------------- 10 fits failed with the following error: Traceback (most recent call last): File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida tion.py", line 680, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit super().fit( File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit raise ValueError( ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0 warnings.warn(some_fits_failed_message, FitFailedWarning) C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.346300 02 -0.13724715 -0.04099233 0.12727124 0.18470916 0.25631961 0.21204006 0.22141211 0.06423162 0.105773 ] warnings.warn( C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py: 372: FitFailedWarning: 10 fits failed out of a total of 110. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_sco re='raise'. Below are more details about the failures: -------------------------------------------------------------------------------- 10 fits failed with the following error: Traceback (most recent call last): File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida tion.py", line 680, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit super().fit( File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit raise ValueError( ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0 warnings.warn(some_fits_failed_message, FitFailedWarning) C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.240890 75 -0.62942703 -0.05513127 -0.24240473 -0.2299039 -0.06252845 -0.08854852 0.06615957 -0.08558443 -0.09376053] warnings.warn(
Random Seed: 20
Average R-squared for Lasso model: 0.37485289437528585
Average R-squared for CART model: -0.2060279431490824
Lasso model performed better.

Time taken to run the models: 17.32 seconds

The code can be executed, but it produces a large number of FitFailedWarnings (the min_samples_leaf value of 0 in the CART grid is invalid), and some of the R-squared values are negative, which is concerning.
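A minimal sketch of a fix for the warnings, assuming the rest of the grid-search setup stays exactly as above: drop the invalid 0 from the CART grid (values below 1 are only accepted as fractions in (0, 0.5]), so every fit succeeds and no scores are set to nan. The negative R-squared values are a separate issue and are discussed below.

# Hypothetical corrected grid for DecisionTreeRegressor; everything else unchanged.
param_grid_cart = {'min_samples_leaf': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}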
In [2]:

#2
import pandas as pd
from sklearn.model_selection import KFold, GridSearchCV, train_test_split
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
import time

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost',
          'Area']]  # the last column is cut off in the export; 'Area' is assumed here
y = data['LifeExp']

start_time = time.time()

# Define different random seeds
random_seeds = [1, 5, 10, 20]  # Add more random seeds as needed

for seed in random_seeds:
    # Initialize models
    lasso_model = Lasso()
    cart_model = DecisionTreeRegressor()

    # KFold with 3 folds and specific random seed
    kf = KFold(n_splits=3, shuffle=True, random_state=seed)

    # Initialize parameters for GridSearchCV
    param_grid_lasso = {'alpha': [0.001, 0.01, 0.1, 1, 10]}
    param_grid_cart = {'min_samples_leaf': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

    # Lists to store R-squared scores
    r2_scores_lasso = []
    r2_scores_cart = []

    # Perform 3-fold cross-validation and hyperparameter tuning
    for train_index, test_index in kf.split(X):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]

        # GridSearchCV for Lasso with 3 folds
        grid_search_lasso = GridSearchCV(lasso_model, param_grid_lasso,
                                         scoring='r2', cv=3)  # cv value truncated in the export; 3 matches the "33 fits" in the warnings
        grid_search_lasso.fit(X_train, y_train)
        best_lasso_model = grid_search_lasso.best_estimator_
        y_pred_lasso = best_lasso_model.predict(X_test)
        r2_lasso = r2_score(y_test, y_pred_lasso)
        r2_scores_lasso.append(r2_lasso)

        # GridSearchCV for CART with 3 folds
        grid_search_cart = GridSearchCV(cart_model, param_grid_cart,
                                        scoring='r2', cv=3)
        grid_search_cart.fit(X_train, y_train)
        best_cart_model = grid_search_cart.best_estimator_
        y_pred_cart = best_cart_model.predict(X_test)
        r2_cart = r2_score(y_test, y_pred_cart)
        r2_scores_cart.append(r2_cart)

    # Calculate average R-squared values for each seed
    avg_r2_lasso = sum(r2_scores_lasso) / len(r2_scores_lasso)
    avg_r2_cart = sum(r2_scores_cart) / len(r2_scores_cart)

    print(f"Random Seed: {seed}")
    print(f"Average R-squared for Lasso model with 3-folds: {avg_r2_lasso}")
    print(f"Average R-squared for CART model with 3-folds: {avg_r2_cart}")
    if avg_r2_lasso > avg_r2_cart:
        print("Lasso model performed better.")
    else:
        print("CART model performed better.")
    print("\n")

end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")
[For the 3-fold runs the notebook again prints, for every seed, a FitFailedWarning ("3 fits failed out of a total of 33" because min_samples_leaf=0 is invalid) and a UserWarning listing the mean test R-squared values for the remaining grid points. Only the per-seed results are reproduced below.]
Random Seed: 1
Average R-squared for Lasso model with 3-folds: 0.1232949323087044
Average R-squared for CART model with 3-folds: 0.2868703955754948
CART model performed better.
Random Seed: 5
Average R-squared for Lasso model with 3-folds: 0.37631903017897866
Average R-squared for CART model with 3-folds: 0.4258692913692093
CART model performed better.
Random Seed: 10
Average R-squared for Lasso model with 3-folds: 0.6591611495213835
Average R-squared for CART model with 3-folds: 0.2530810403790312
Lasso model performed better.

Random Seed: 20
Average R-squared for Lasso model with 3-folds: 0.5427829072114675
Average R-squared for CART model with 3-folds: 0.23113029441591829
Lasso model performed better.

Time taken to run the models: 1.64 seconds
Reducing the number of folds does help: the code still executes, and the output for this block is fully visible. Going from 10-fold to 3-fold cross-validation affects the evaluation in several ways. With only 50 observations, each held-out fold in 3-fold cross-validation contains about 17 states instead of 5, so the R-squared computed on each test fold is far less noisy and the performance estimates are more consistent and stable. The training portion of each split is somewhat smaller (roughly 33 observations instead of 45), but with a dataset this small the more reliable evaluation tends to matter more.
*Fewer folds also mean fewer model fits, making the process faster and more efficient.
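To make the fold-size point concrete, here is a small sketch (assuming only that the dataset has 50 rows, as reported above; the dummy array and seed are illustration choices) that prints the train/test sizes KFold produces for 10 versus 3 splits:

import numpy as np
from sklearn.model_selection import KFold

X_dummy = np.zeros((50, 1))  # stand-in for the 50-state feature matrix
for n_splits in (10, 3):
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=1)
    sizes = [(len(train), len(test)) for train, test in kf.split(X_dummy)]
    print(f"{n_splits} folds -> first two (train, test) sizes: {sizes[:2]}")
# 10 folds leave only 5 observations per test fold, so a single fold's R-squared
# is very noisy; 3 folds give test folds of 16-17 observations instead.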
In [3]:

#3.A
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
import time

start_time = time.time()
# Calculate the elapsed time
# Note: end_time here is left over from the previous cell, so this prints a
# meaningless negative duration (the "-0.98 seconds" seen in the output).
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost',
          'Area']]  # the last column is cut off in the export; 'Area' is assumed here
y = data['LifeExp']

# Initialize models
lasso_model = Lasso()
cart_model = DecisionTreeRegressor()

# Initialize lists to store R-squared values for each model
r2_scores_lasso = []
r2_scores_cart = []

# Perform repeated cross-validation with 25 repetitions
num_repetitions = 25
for repetition in range(num_repetitions):
    # KFold with 3 folds and a repetition-specific random state
    kf = KFold(n_splits=3, shuffle=True, random_state=repetition)

    # Calculate R-squared scores for Lasso using cross-validation
    lasso_scores = cross_val_score(lasso_model, X, y, scoring='r2', cv=kf)
    r2_scores_lasso.extend(lasso_scores)

    # Calculate R-squared scores for CART using cross-validation
    cart_scores = cross_val_score(cart_model, X, y, scoring='r2', cv=kf)
    r2_scores_cart.extend(cart_scores)

# Combine the R-squared scores into a dictionary for boxplot creation
r2_scores = {'Lasso': r2_scores_lasso, 'CART': r2_scores_cart}

# Create boxplots to show the distribution of R-squared values for each model
plt.figure(figsize=(8, 6))
plt.boxplot(r2_scores.values())
plt.xticks([1, 2], r2_scores.keys())
plt.title('Distribution of R-squared values for Lasso and CART models')
plt.ylabel('R-squared')
plt.grid(True)
plt.show()

# Determine which model performed best
avg_r2_lasso = np.mean(r2_scores_lasso)
avg_r2_cart = np.mean(r2_scores_cart)

print(f"Average R-squared for Lasso model: {avg_r2_lasso}")
print(f"Average R-squared for CART model: {avg_r2_cart}")

if avg_r2_lasso > avg_r2_cart:
    print("Lasso model performed better.")
else:
    print("CART model performed better.")

end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")
Average R-squared for Lasso model: 0.23623783291173964
Average R-squared for CART model: 0.09385622489104001
Lasso model performed better.
Time taken to run the models: -0.98 seconds

3.B Compared with the outputs above, these results are not especially concerning. The CART model's lower average R-squared relative to Lasso suggests that it struggles to capture the relationships within this data, indicating limited predictive power for this problem.

In [4]:

#4.A
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold, cross_val_score, GridSearchCV
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.metrics import r2_score
import time

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost',
          'Area']]  # the last column is cut off in the export; 'Area' is assumed here
y = data['LifeExp']

start_time = time.time()

# Initialize models
lasso_model = Lasso()
cart_model = DecisionTreeRegressor()
random_forest_model = RandomForestRegressor()
adaboost_model = AdaBoostRegressor()
# Define hyperparameters for Random Forest and AdaBoost
param_grid_random_forest = {'n_estimators': [10, 100, 250, 500, 1000]}
param_grid_adaboost = {'learning_rate': [0.001, 0.01, 0.1, 1]}

# Initialize lists to store R-squared values for each model
r2_scores_lasso = []
r2_scores_cart = []
r2_scores_random_forest = []
r2_scores_adaboost = []

# Perform repeated cross-validation with 25 repetitions
num_repetitions = 25
for repetition in range(num_repetitions):
    # KFold with 3 folds and a repetition-specific random state
    kf = KFold(n_splits=3, shuffle=True, random_state=repetition)

    # Calculate R-squared scores for Lasso using cross-validation
    lasso_scores = cross_val_score(lasso_model, X, y, scoring='r2', cv=kf)
    r2_scores_lasso.extend(lasso_scores)

    # Calculate R-squared scores for CART using cross-validation
    cart_scores = cross_val_score(cart_model, X, y, scoring='r2', cv=kf)
    r2_scores_cart.extend(cart_scores)

    # GridSearchCV for Random Forest
    grid_search_rf = GridSearchCV(random_forest_model, param_grid_random_forest,
                                  scoring='r2', cv=3)  # cv value truncated in the export; 3 assumed
    grid_search_rf.fit(X, y)
    best_rf_model = grid_search_rf.best_estimator_
    rf_scores = cross_val_score(best_rf_model, X, y, scoring='r2', cv=kf)
    r2_scores_random_forest.extend(rf_scores)

    # GridSearchCV for AdaBoost
    grid_search_adaboost = GridSearchCV(adaboost_model, param_grid_adaboost,
                                        scoring='r2', cv=3)  # cv value truncated in the export; 3 assumed
    grid_search_adaboost.fit(X, y)
    best_adaboost_model = grid_search_adaboost.best_estimator_
    adaboost_scores = cross_val_score(best_adaboost_model, X, y, scoring='r2', cv=kf)
    r2_scores_adaboost.extend(adaboost_scores)

# Combine the R-squared scores into a dictionary for boxplot creation
r2_scores = {'Lasso': r2_scores_lasso,
             'CART': r2_scores_cart,
             'Random Forest': r2_scores_random_forest,
             'AdaBoost': r2_scores_adaboost}

# Create boxplots to show the distribution of R-squared values for all models
plt.figure(figsize=(10, 6))
plt.boxplot(r2_scores.values())
plt.xticks(range(1, len(r2_scores) + 1), r2_scores.keys())
plt.title('Distribution of R-squared values for all models')
plt.ylabel('R-squared')
plt.grid(True)
plt.show()

# Determine which model performed best
avg_r2_scores = {model: np.mean(scores) for model, scores in r2_scores.items()}
best_model = max(avg_r2_scores, key=avg_r2_scores.get)

print("Average R-squared values:")
for model, avg_r2 in avg_r2_scores.items():
    print(f"{model}: {avg_r2}")
print(f"\n{best_model} performed the best.")

#part 5
end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")

Average R-squared values:
Lasso: 0.23623783291173964
CART: 0.08872684581633686
Random Forest: 0.4610205421399896
AdaBoost: 0.40720293047646283

Random Forest performed the best.
Time taken to run the models: 179.20 seconds

4.B The differences in how the models perform could worry us if we really need stronger predictions for this problem. Each model captures the connections between the features and the target in its own way, and the differences may stem from model complexity, hyperparameter choices, how performance is evaluated, or underfitting and overfitting.

5. The data had 50 observations in total, so the time taken to run the experiments is relatively short, except for the experiment that fits all four models, which took on the order of a few minutes (179.20 seconds in the run shown).

PROBLEM 3 Ensemble Methods
1. Differences between Bagging and Boosting:

a. Training Method:
Bagging (Bootstrap Aggregating): It involves training multiple individual models independently on different subsets of the dataset obtained by bootstrap sampling. These models then vote (or are averaged) to make a collective prediction.
Boosting: It trains multiple weak learners sequentially, where each subsequent learner focuses more on the samples that the previous ones misclassified. It aims to improve upon the weaknesses of earlier models.

b. Weighting of Models:
Bagging: All models in bagging are typically given equal weight or importance when making predictions.
Boosting: Boosting assigns weights to data points, emphasizing the misclassified points so that subsequent models concentrate on correcting these mistakes; many boosting algorithms also weight the models themselves when combining them.

c. Model Complexity:
Bagging: Each model in bagging is trained independently, with no direct influence on other models. They can be diverse, leading to ensemble diversity.
Boosting: Models in boosting are trained sequentially, and each new model focuses on improving areas where previous models made mistakes. Boosting tends to produce a sequence of models where later models try to correct errors made by earlier ones.

2. Impact of Boosting's Sequential Training on Practitioners:
Boosting's sequential nature means that each model in the sequence depends on the previous one. As a practitioner:
Adjusting hyperparameters or diagnosing issues in boosting can be more complex due to the interdependence between models.
It's crucial to monitor and control the number of iterations or weak learners (to prevent overfitting) and to tune learning rates effectively for good performance.

3. Overfitting Concerns in Boosting vs. Bagging:
Boosting focuses on sequentially minimizing errors, potentially leading to overfitting if the boosting process continues for too many iterations. It adapts its models to correct previous mistakes, which might start fitting the noise in the data.
Bagging, on the other hand, constructs diverse models by using different subsets of the data, reducing variance without overfitting, since each model is trained independently. Hence, it's less prone to overfitting than boosting.

Stacking vs. Boosting vs. Bagging:
Bagging: It constructs multiple independent models by training them on random subsets of the data. Each model contributes equally to the final prediction by voting or averaging.
Boosting: Boosting sequentially builds models, with each subsequent model trying to correct the errors made by the previous ones. It focuses on the difficult instances and adapts its models based on their performance on these instances.
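To make the contrast concrete, here is a minimal sketch that fits one bagged and one boosted tree ensemble on the same data. It assumes the StateData.csv file and feature columns used in Problem 1 (the last feature column is uncertain in the export, so only the visible ones are listed), and the 100-estimator and 0.1 learning-rate settings are arbitrary illustration choices; both scikit-learn classes default to decision-tree base learners.

import pandas as pd
from sklearn.ensemble import BaggingRegressor, AdaBoostRegressor
from sklearn.model_selection import KFold, cross_val_score

data = pd.read_csv('StateData.csv')
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost']]
y = data['LifeExp']

# Bagging: 100 trees fit independently on bootstrap resamples; predictions are averaged.
bagged = BaggingRegressor(n_estimators=100, random_state=1)

# Boosting: trees fit sequentially; each new tree focuses on the examples the
# current ensemble predicts poorly, and the trees are combined with weights.
boosted = AdaBoostRegressor(n_estimators=100, learning_rate=0.1, random_state=1)

kf = KFold(n_splits=3, shuffle=True, random_state=1)
for name, model in [('Bagging', bagged), ('AdaBoost', boosted)]:
    scores = cross_val_score(model, X, y, scoring='r2', cv=kf)
    print(f"{name}: mean R-squared = {scores.mean():.3f}")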
4. Stacking (Stacked Generalization):
Stacking involves training a meta-learner that combines the predictions of multiple base learners. Instead of a simple averaging or voting mechanism, stacking learns how to best combine the predictions of diverse base models. It uses the predictions of the base models as features to train a higher-level model, aiming to make more accurate predictions.
In simpler terms, while bagging and boosting create several models and combine their predictions differently, stacking takes the predictions of these models and feeds them into another model (the meta-learner) to make the final prediction. Stacking learns how to best use the predictions of the individual models to improve overall performance.

PROBLEM 4 Support vector machines

Support Vector Machines (SVMs):

1. Difference between Soft and Hard Margin SVM:
Hard Margin SVM: It aims to find the maximum-margin hyperplane that perfectly separates the data points of different classes without allowing any misclassifications (0 training errors). However, it might not be possible or practical when dealing with noisy or overlapping data. For instance, in a dataset with outliers or non-linearly separable classes, a hard margin SVM might fail to find a feasible decision boundary.
Soft Margin SVM: In contrast, a soft margin SVM allows for a margin with some misclassifications or violations, captured by slack variables. It tolerates a certain amount of error to find a wider margin that generalizes better to unseen data. For example, when dealing with noisy data or when perfect separation is not feasible, a soft margin SVM is preferred because it provides a trade-off between margin width and errors, improving generalization.

2. Kernel Trick and its Purpose:
The kernel trick enables SVMs to handle non-linearly separable data by implicitly mapping the input data into a higher-dimensional space where it becomes linearly separable. Conceptually, imagine a 2D dataset that cannot be separated with a straight line (linear boundary). The kernel trick allows transforming this data into a higher-dimensional space (e.g., 3D or higher) where it becomes separable by a hyperplane. This transformation is done efficiently without explicitly calculating the new higher-dimensional feature space, thus avoiding high computational costs.

3. Solving the Dual Formulation of the SVM Optimization Problem:
The dual formulation of the SVM optimization problem is often solved because of computational efficiency and the kernel trick's convenience. The dual formulation allows expressing the optimization problem in terms of dot products between pairs of data points. This formulation enables the use of kernels, allowing SVMs to efficiently operate in high-dimensional spaces without explicitly transforming the data. Additionally, solving the dual formulation often results in a simpler problem with better computational properties, making it more tractable for optimization algorithms compared to the primal formulation.
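As a small illustration of these ideas, the sketch below uses a synthetic toy dataset (scikit-learn's make_circles, not the assignment data) and fits soft-margin SVMs with different C values and kernels; the particular C values are arbitrary choices. SVC solves the dual problem internally, so switching kernels is just a parameter change.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: impossible to separate with a straight line in 2-D.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for kernel in ('linear', 'rbf'):
    for C in (0.1, 100):
        # Large C approximates a hard margin (few violations tolerated);
        # small C gives a softer margin that accepts some misclassifications.
        clf = SVC(kernel=kernel, C=C).fit(X_train, y_train)
        print(f"kernel={kernel}, C={C}: test accuracy = {clf.score(X_test, y_test):.2f}")
# The RBF kernel implicitly maps the circles into a space where they become
# linearly separable, so it should clearly outperform the linear kernel here.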
In academic terms, SVMs, with their hard and soft margin concepts, utilize the kernel trick to handle non-linear data by implicitly projecting it into a higher-dimensional space. Solving the dual formulation of the optimization problem enables efficient computation through kernels and allows working in higher-dimensional spaces without explicitly transforming the data, which is beneficial for handling complex data structures.

PROBLEM 2 Predicting invasive species

In [5]:

#1
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Load the dataset
data = pd.read_csv('SpeciesData.csv')

# Separate features and target variable
X = data.drop(columns=['Target'])
y = data['Target']

# Scale features to [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=1)  # random_state value truncated in the export; 1 used as a placeholder

# Define hyperparameters for logistic regression
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10]}

# Initialize logistic regression with Lasso penalty
log_reg = LogisticRegression(penalty='l1', solver='liblinear')

# Use GridSearchCV to find the best hyperparameters
grid_search = GridSearchCV(log_reg, param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Get the best value for C
best_C = grid_search.best_params_['C']

# Train logistic regression model with the best C value
best_log_reg = LogisticRegression(penalty='l1', solver='liblinear', C=best_C)
best_log_reg.fit(X_train, y_train)

# Predict probabilities for the test set
y_pred_prob = best_log_reg.predict_proba(X_test)[:, 1]

# Calculate test set AUC
test_auc = roc_auc_score(y_test, y_pred_prob)
# Calculate training set AUC
y_pred_prob_train = best_log_reg.predict_proba(X_train)[:, 1]
train_auc = roc_auc_score(y_train, y_pred_prob_train)

print(f"Best value for C: {best_C}")
print(f"Test set AUC: {test_auc}")
print(f"Training set AUC: {train_auc}")

Best value for C: 1
Test set AUC: 0.8270916804513728
Training set AUC: 0.8373540654491209

1.C Based on these observations, the small difference between the training and test set AUC, together with the high AUC values themselves, indicates that overfitting is probably not a significant concern with this model. The model appears to generalize well to unseen data, given the comparable performance on the training and test sets.

In [6]:

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
import time
import matplotlib.pyplot as plt

start_time = time.time()

# Load the dataset
data = pd.read_csv('SpeciesData.csv')

# Separate features and target variable
X = data.drop(columns=['Target'])
y = data['Target']

# Scale features to [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=1)  # random_state value truncated in the export; 1 used as a placeholder

# Define hyperparameters for random forest
param_grid = {'n_estimators': [10, 100, 1000, 5000, 10000]}

# Initialize random forest classifier
rf = RandomForestClassifier()

# Use GridSearchCV to find the best number of trees
grid_search = GridSearchCV(rf, param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Get the best number of trees
best_n_estimators = grid_search.best_params_['n_estimators']

# Train random forest with the best number of trees
best_rf = RandomForestClassifier(n_estimators=best_n_estimators)
best_rf.fit(X_train, y_train)

# Predict probabilities for the test set
y_pred_prob = best_rf.predict_proba(X_test)[:, 1]

# Calculate test set AUC
test_auc = roc_auc_score(y_test, y_pred_prob)

# Calculate training set AUC
y_pred_prob_train = best_rf.predict_proba(X_train)[:, 1]
train_auc = roc_auc_score(y_train, y_pred_prob_train)

print(f"Best number of trees: {best_n_estimators}")
print(f"Test set AUC: {test_auc}")
print(f"Training set AUC: {train_auc}")

end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")

# Plotting AUC against the number of trees
plt.figure(figsize=(8, 6))
plt.plot(param_grid['n_estimators'], grid_search.cv_results_['mean_test_score'], marker='o')
plt.title('Random Forest: Number of Trees vs. Mean Test AUC')
plt.xlabel('Number of Trees')
plt.ylabel('Mean Test AUC')
plt.grid(True)
plt.show()

Best number of trees: 100
Test set AUC: 0.9178747034865737
Training set AUC: 1.0
Time taken to run the models: 484.64 seconds

2.b
You'll likely notice that initially, as the number of trees increases, the model's performance improves. At a certain point (here around 5000 trees), however, adding more trees yields only marginal improvements, which indicates the point of diminishing returns: the performance gain becomes negligible despite the extra trees.

2.d Random Forests can overfit, particularly when the individual trees are grown very deep: they can capture very complex relationships in the data, which leads to overfitting when the model is overly complex for the given dataset (the training AUC of 1.0 above points in that direction). Adding more trees mainly adds computation rather than overfitting, so the more effective remedies act on the trees themselves.
-> Limiting tree depth, increasing the minimum samples per leaf, or otherwise regularizing the individual trees could help, alongside keeping the number of trees near the point of diminishing returns.
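As a hedged illustration of those knobs, the sketch below compares the default forest with a more constrained one on the same SpeciesData split used above (the random_state here and the max_depth/min_samples_leaf values are arbitrary illustration choices, not tuned ones), to see whether the train/test AUC gap narrows:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

data = pd.read_csv('SpeciesData.csv')
X = data.drop(columns=['Target'])
y = data['Target']
X_scaled = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=1)

# Default forest: fully grown trees, the setting that produced the training AUC of 1.0.
default_rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Constrained forest: shallower trees and larger leaves act as regularization.
constrained_rf = RandomForestClassifier(
    n_estimators=100, max_depth=5, min_samples_leaf=10, random_state=0)

for name, model in [('default', default_rf), ('constrained', constrained_rf)]:
    model.fit(X_train, y_train)
    train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: train AUC = {train_auc:.3f}, test AUC = {test_auc:.3f}")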
In [7]:

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score
import time

start_time = time.time()

# Load the dataset
data = pd.read_csv('SpeciesData.csv')

# Separate features and target variable
X = data.drop(columns=['Target'])
y = data['Target']

# Scale features to [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=1)  # random_state value truncated in the export; 1 used as a placeholder

# Define hyperparameters for SVM with different kernels
param_grid = {'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}

# Initialize SVM classifier
svm = SVC(probability=True)

# Use GridSearchCV to find the best kernel
grid_search = GridSearchCV(svm, param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Get the best kernel
best_kernel = grid_search.best_params_['kernel']

# Train SVM with the best kernel
best_svm = SVC(kernel=best_kernel, probability=True)
best_svm.fit(X_train, y_train)

# Predict probabilities for the test set
y_pred_prob = best_svm.predict_proba(X_test)[:, 1]

# Calculate test set AUC
test_auc = roc_auc_score(y_test, y_pred_prob)

# Calculate training set AUC
y_pred_prob_train = best_svm.predict_proba(X_train)[:, 1]
train_auc = roc_auc_score(y_train, y_pred_prob_train)

print(f"Best kernel: {best_kernel}")
print(f"Test set AUC: {test_auc}")
print(f"Training set AUC: {train_auc}")

end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")

Best kernel: poly
Test set AUC: 0.848156332188832
Training set AUC: 0.8690520987641532
Time taken to run the models: 97.97 seconds

3.c Given the small difference between the training and test set AUC and the relatively high AUC values themselves, overfitting does not appear to be a significant concern with this model. The model generalizes reasonably well to unseen data despite being fit only on the training set.

3.d Beating the Random Forest's performance depends on various factors, including the dataset characteristics and the nature of the problem. With sufficient effort in hyperparameter tuning (for example, tuning C and the kernel parameters), feature engineering, and optimization, it is conceivable that an SVM could outperform the Random Forest.
-> The relative superiority of each model can vary with the specific task requirements and data characteristics.

4 Best Performing Model: The Random Forest outperformed the SVM with a higher test AUC (0.918 vs. 0.848).
Training and Testing Time: The Random Forest took longer to train and test (484.64 seconds) compared to the SVM (97.97 seconds).
Performance vs. Computational Time: The Random Forest's better performance might justify its longer computational time if the higher test AUC is critical. However, given the substantial time difference, if a marginally lower performance is acceptable, the SVM's quicker computation might be preferred.
Ultimately, the choice between models depends on the balance between model performance and computational efficiency, considering the specific needs and constraints of the problem at hand.