ISyE 521: HOMEWORK 2
HITESH NARRA
PROBLEM 1
Predicting Life Expectancy in the United States during the 1970s:
In [1]:

#1
import pandas as pd
from sklearn.model_selection import KFold, GridSearchCV, train_test_split
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
import time

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path
num_observations = len(data)
print(f"Number of observations: {num_observations}")

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost']]
y = data['LifeExp']

start_time = time.time()

# Define different random seeds
random_seeds = [1, 5, 10, 20]  # Add more random seeds as needed

for seed in random_seeds:
    # Initialize models
    lasso_model = Lasso()
    cart_model = DecisionTreeRegressor()

    # KFold with 10 folds and specific random seed
    kf = KFold(n_splits=10, shuffle=True, random_state=seed)

    # Initialize parameters for GridSearchCV
    param_grid_lasso = {'alpha': [0.001, 0.01, 0.1, 1, 10]}
    # Note: min_samples_leaf=0 is not a valid value and triggers the FitFailedWarning in the output below
    param_grid_cart = {'min_samples_leaf': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

    # Lists to store R-squared scores
    r2_scores_lasso = []
    r2_scores_cart = []

    # Perform 10-fold cross-validation and hyperparameter tuning
    for train_index, test_index in kf.split(X):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]

        # GridSearchCV for Lasso
        grid_search_lasso = GridSearchCV(lasso_model, param_grid_lasso, scoring='r2', cv=10)
        grid_search_lasso.fit(X_train, y_train)
        best_lasso_model = grid_search_lasso.best_estimator_
        y_pred_lasso = best_lasso_model.predict(X_test)
        r2_lasso = r2_score(y_test, y_pred_lasso)
        r2_scores_lasso.append(r2_lasso)

        # GridSearchCV for CART
        # cv=10 inner folds (the warnings below report 110 = 11 x 10 fits per search)
        grid_search_cart = GridSearchCV(cart_model, param_grid_cart, scoring='r2', cv=10)
        grid_search_cart.fit(X_train, y_train)
        best_cart_model = grid_search_cart.best_estimator_
        y_pred_cart = best_cart_model.predict(X_test)
        r2_cart = r2_score(y_test, y_pred_cart)
        r2_scores_cart.append(r2_cart)

    # Calculate average R-squared values for each seed
    avg_r2_lasso = sum(r2_scores_lasso) / len(r2_scores_lasso)
    avg_r2_cart = sum(r2_scores_cart) / len(r2_scores_cart)

    print(f"Random Seed: {seed}")
    print(f"Average R-squared for Lasso model: {avg_r2_lasso}")
    print(f"Average R-squared for CART model: {avg_r2_cart}")
    if avg_r2_lasso > avg_r2_cart:
        print("Lasso model performed better.")
    else:
        print("CART model performed better.")
    print("\n")

end_time = time.time()

# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")

Number of observations: 50
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
    super().fit(
  File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
    raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
  warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [nan 0.11850502 0.14137533 0.1510754 0.25586429 0.17071131 0.20143935 0.18175009 0.32927866 0.15656313 0.27729491]
  warnings.warn(
(This FitFailedWarning / UserWarning pair is emitted for every CART grid search, once per outer fold and per random seed, because min_samples_leaf=0 is not a valid value; the remaining near-identical repetitions are omitted here.)
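The warning text itself points to the remedy: min_samples_leaf=0 is rejected by DecisionTreeRegressor, so the CART grid should start at 1 (or use a float in (0, 0.5]). Below is a minimal sketch of a corrected search, reusing the imports and the X_train / y_train variables from the cell above; it is an illustration, not the code that produced the output shown here, and error_score='raise' is only useful while debugging.

# Hypothetical corrected grid: drop the invalid 0 so that no fits fail
corrected_param_grid_cart = {'min_samples_leaf': list(range(1, 11))}

grid_search_cart = GridSearchCV(
    DecisionTreeRegressor(),
    corrected_param_grid_cart,
    scoring='r2',
    cv=10,
    error_score='raise',  # surface any remaining failure immediately instead of scoring it as nan
)
grid_search_cart.fit(X_train, y_train)  # runs without FitFailedWarning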
Random Seed: 1
Average R-squared for Lasso model: 0.35409020698197735
Average R-squared for CART model: -0.4428182844778349
Lasso model performed better.
Random Seed: 5
Average R-squared for Lasso model: 0.5353473586774904
Average R-squared for CART model: 0.035296666576434266
Lasso model performed better.
Random Seed: 10
Average R-squared for Lasso model: 0.48717654642561004
Average R-squared for CART model: 0.23732190092360916
Lasso model performed better.
11/21/23, 8:54 AM
Homework2_HiteshNarra
localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false
18/42
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_sco
re='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida
tion.py", line 680, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
super().fit(
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.174480
7 -0.16757201 -0.07193141 0.08567782 0.11368842
0.28691442 0.16746376 0.16615409 0.25864321 0.10676024]
warnings.warn(
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_sco
re='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida
tion.py", line 680, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
super().fit(
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.066245
56 0.04028023 -0.15949167 0.12948758 0.17243477
0.33311131 0.42443106 0.3447734 0.24507438 0.20510001]
warnings.warn(
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_sco
re='raise'.
Below are more details about the failures:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
11/21/23, 8:54 AM
Homework2_HiteshNarra
localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false
19/42
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida
tion.py", line 680, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
super().fit(
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan -0.322802
76 -0.30516402 -0.13508709 -0.1229082 0.06204285
0.25690527 0.26689191 0.24835133 0.20651144 0.09308653]
warnings.warn(
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_sco
re='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida
tion.py", line 680, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
super().fit(
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan 0.2127085 0.36840731 0.36405493 0.30770387 0.33903419
0.39055685 0.30960863 0.37696061 0.29749957 0.31600338]
warnings.warn(
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_sco
re='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_valida
tion.py", line 680, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1315, in fit
11/21/23, 8:54 AM
Homework2_HiteshNarra
localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false
20/42
super().fit(
File "C:\Users\narra\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 235, in fit
raise ValueError(
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
warnings.warn(some_fits_failed_message, FitFailedWarning)
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite: [ nan 0.226894
52 0.01094429 0.19052935 0.03871542 0.04865815
0.17018525 0.06434408 -0.06222963 -0.0650863 -0.02458248]
warnings.warn(
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:
372: FitFailedWarning: 10 fits failed out of a total of 110.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
Every failed fit raises the same error:
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite (the score array contains nan for the min_samples_leaf=0 setting and several negative R-squared values for the remaining settings).
[The same FitFailedWarning / UserWarning pair is printed again for each cross-validation fold.]
Random Seed: 20
Average R-squared for Lasso model: 0.37485289437528585
Average R-squared for CART model: -0.2060279431490824
Lasso model performed better.
Time taken to run the models: 17.32 seconds
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:372: FitFailedWarning: 10 fits failed out of a total of 110 (ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0).
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite.
The code can be executed, but it produces a lot of FitFailedWarnings, and some of the R-squared values are negative, which is concerning.
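The warning itself points at the cause: 0 is not a valid value for min_samples_leaf. A minimal sketch of the fix (an editorial illustration, not part of the graded cell; the cv=3 value here is illustrative):

# The FitFailedWarning comes from min_samples_leaf=0, which scikit-learn rejects
# (it must be >= 1, or a fraction in (0, 0.5]). Starting the grid at 1 removes the
# failed fits and the nan scores.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

param_grid_cart = {'min_samples_leaf': list(range(1, 11))}  # 1..10, no invalid 0
grid_search_cart = GridSearchCV(
    DecisionTreeRegressor(),
    param_grid_cart,
    scoring='r2',
    cv=3,                  # illustrative fold count
    error_score='raise',   # optional: surface any remaining failure instead of recording nan
)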
In [2]:
#2
import pandas as pd
from sklearn.model_selection import KFold, GridSearchCV, train_test_split
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
import time

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost']]  # line cut off in the source; closing brackets reconstructed
y = data['LifeExp']

start_time = time.time()

# Define different random seeds
random_seeds = [1, 5, 10, 20]  # Add more random seeds as needed

for seed in random_seeds:
    # Initialize models
    lasso_model = Lasso()
    cart_model = DecisionTreeRegressor()

    # KFold with 3 folds and specific random seed
    kf = KFold(n_splits=3, shuffle=True, random_state=seed)

    # Initialize parameters for GridSearchCV
    param_grid_lasso = {'alpha': [0.001, 0.01, 0.1, 1, 10]}
    param_grid_cart = {'min_samples_leaf': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

    # Lists to store R-squared scores
    r2_scores_lasso = []
    r2_scores_cart = []

    # Perform 3-fold cross-validation and hyperparameter tuning
    for train_index, test_index in kf.split(X):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]

        # GridSearchCV for Lasso with 3 folds
        # (the cv argument is cut off in the source; 3 folds assumed from the comments and the warning counts)
        grid_search_lasso = GridSearchCV(lasso_model, param_grid_lasso, scoring='r2', cv=3)
        grid_search_lasso.fit(X_train, y_train)
        best_lasso_model = grid_search_lasso.best_estimator_
        y_pred_lasso = best_lasso_model.predict(X_test)
        r2_lasso = r2_score(y_test, y_pred_lasso)
        r2_scores_lasso.append(r2_lasso)

        # GridSearchCV for CART with 3 folds
        grid_search_cart = GridSearchCV(cart_model, param_grid_cart, scoring='r2', cv=3)
        grid_search_cart.fit(X_train, y_train)
        best_cart_model = grid_search_cart.best_estimator_
        y_pred_cart = best_cart_model.predict(X_test)
        r2_cart = r2_score(y_test, y_pred_cart)
        r2_scores_cart.append(r2_cart)

    # Calculate average R-squared values for each seed
    avg_r2_lasso = sum(r2_scores_lasso) / len(r2_scores_lasso)
    avg_r2_cart = sum(r2_scores_cart) / len(r2_scores_cart)

    print(f"Random Seed: {seed}")
    print(f"Average R-squared for Lasso model with 3-folds: {avg_r2_lasso}")
    print(f"Average R-squared for CART model with 3-folds: {avg_r2_cart}")
    if avg_r2_lasso > avg_r2_cart:
        print("Lasso model performed better.")
    else:
        print("CART model performed better.")
    print("\n")

end_time = time.time()

# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:372: FitFailedWarning: 3 fits failed out of a total of 33.
The score on these train-test partitions for these parameters will be set to nan.
Every failed fit raises the same error:
ValueError: min_samples_leaf must be at least 1 or in (0, 0.5], got 0
C:\Users\narra\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite (nan for the min_samples_leaf=0 setting).
[This FitFailedWarning / UserWarning pair is repeated for each of the three folds of this seed.]
Random Seed: 1
Average R-squared for Lasso model with 3-folds: 0.1232949323087044
Average R-squared for CART model with 3-folds: 0.2868703955754948
CART model performed better.
[Same FitFailedWarning (3 fits failed out of a total of 33; min_samples_leaf=0 is invalid) and non-finite-score UserWarning repeated for each fold of this seed.]
Random Seed: 5
Average R-squared for Lasso model with 3-folds: 0.37631903017897866
Average R-squared for CART model with 3-folds: 0.4258692913692093
CART model performed better.
[Same FitFailedWarning (3 fits failed out of a total of 33; min_samples_leaf=0 is invalid) and non-finite-score UserWarning repeated for each fold of this seed.]
Random Seed: 10
Average R-squared for Lasso model with 3-folds: 0.6591611495213835
Average R-squared for CART model with 3-folds: 0.2530810403790312
Lasso model performed better.
[Same FitFailedWarning (3 fits failed out of a total of 33; min_samples_leaf=0 is invalid) and non-finite-score UserWarning repeated for each fold of this seed.]
Random Seed: 20
Average R-squared for Lasso model with 3-folds: 0.5427829072114675
Average R-squared for CART model with 3-folds: 0.23113029441591829
Lasso model performed better.
Time taken to run the models: 1.64 seconds
[Same FitFailedWarning (3 fits failed out of a total of 33; min_samples_leaf=0 is invalid) and non-finite-score UserWarning printed once more for the final fold.]
The change does help: the code not only executes, but we can also see the output for this block. Reducing the number of folds from 10 to 3 affects the model evaluation in several ways. With only 50 observations, each 3-fold test set holds roughly 17 states instead of 5, so the per-fold R-squared estimates are much less noisy and the averages across folds are more consistent and stable. The trade-off is that each model is now trained on about two-thirds of the data rather than nine-tenths. Fewer folds also mean fewer model fits, making the whole process faster and more efficient.
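As a quick illustration of the fold-size point (an editorial sketch, not part of the graded notebook), the snippet below prints how many of the 50 observations land in each test fold for 10 versus 3 splits:

# Compare test-fold sizes for 10-fold vs 3-fold cross-validation on 50 observations.
import numpy as np
from sklearn.model_selection import KFold

X_dummy = np.zeros((50, 1))  # stand-in for the 50 rows of StateData
for n_splits in (10, 3):
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=1)
    test_sizes = [len(test_idx) for _, test_idx in kf.split(X_dummy)]
    print(n_splits, "folds -> test-fold sizes:", test_sizes)
# 10 folds give 5 observations per test fold; 3 folds give 16-17 per test fold.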
In [3]:
#3.A
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
import time

start_time = time.time()

# Calculate the elapsed time
# (note: end_time here is left over from the previous cell, so this print reports a
# meaningless negative duration -- the "-0.98 seconds" shown in the output below)
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost']]
y = data['LifeExp']

# Initialize models
lasso_model = Lasso()
cart_model = DecisionTreeRegressor()

# Initialize lists to store R-squared values for each model
r2_scores_lasso = []
r2_scores_cart = []

# Perform repeated cross-validation with 25 repetitions
num_repetitions = 25
for repetition in range(num_repetitions):
    # KFold with 3 folds and a repetition-specific random state
    kf = KFold(n_splits=3, shuffle=True, random_state=repetition)

    # Calculate R-squared scores for Lasso using cross-validation
    lasso_scores = cross_val_score(lasso_model, X, y, scoring='r2', cv=kf)
    r2_scores_lasso.extend(lasso_scores)

    # Calculate R-squared scores for CART using cross-validation
    cart_scores = cross_val_score(cart_model, X, y, scoring='r2', cv=kf)
    r2_scores_cart.extend(cart_scores)

# Combine the R-squared scores into a dictionary for boxplot creation
r2_scores = {'Lasso': r2_scores_lasso, 'CART': r2_scores_cart}

# Create boxplots to show the distribution of R-squared values for each model
plt.figure(figsize=(8, 6))
plt.boxplot(r2_scores.values())
plt.xticks([1, 2], r2_scores.keys())
plt.title('Distribution of R-squared values for Lasso and CART models')
plt.ylabel('R-squared')
plt.grid(True)
plt.show()

# Determine which model performed best
avg_r2_lasso = np.mean(r2_scores_lasso)
avg_r2_cart = np.mean(r2_scores_cart)

print(f"Average R-squared for Lasso model: {avg_r2_lasso}")
print(f"Average R-squared for CART model: {avg_r2_cart}")
if avg_r2_lasso > avg_r2_cart:
    print("Lasso model performed better.")
else:
    print("CART model performed better.")

end_time = time.time()

# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")
Average R-squared for Lasso model: 0.23623783291173964
Average R-squared for CART model: 0.09385622489104001
Lasso model performed better.
Time taken to run the models: -0.98 seconds
3.B The results are not very concerning when compared with the outputs above. The CART model's lower average R-squared relative to Lasso suggests that it struggles to capture the relationships within the data, potentially indicating limited predictive power.
In [4]:
#4.A
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold, cross_val_score, GridSearchCV
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.metrics import r2_score
import time

# Load the dataset
data = pd.read_csv('StateData.csv')  # Replace 'StateData.csv' with your file path

# Define features (X) and target variable (y)
X = data[['Population', 'Income', 'Illiteracy', 'Murder', 'HighSchoolGrad', 'Frost']]
y = data['LifeExp']

start_time = time.time()

# Initialize models
lasso_model = Lasso()
cart_model = DecisionTreeRegressor()
random_forest_model = RandomForestRegressor()
adaboost_model = AdaBoostRegressor()

# Define hyperparameters for Random Forest and AdaBoost
param_grid_random_forest = {'n_estimators': [10, 100, 250, 500, 1000]}
param_grid_adaboost = {'learning_rate': [0.001, 0.01, 0.1, 1]}

# Initialize lists to store R-squared values for each model
r2_scores_lasso = []
r2_scores_cart = []
r2_scores_random_forest = []
r2_scores_adaboost = []

# Perform repeated cross-validation with 25 repetitions
num_repetitions = 25
for repetition in range(num_repetitions):
    # KFold with 3 folds and a repetition-specific random state
    kf = KFold(n_splits=3, shuffle=True, random_state=repetition)

    # Calculate R-squared scores for Lasso using cross-validation
    lasso_scores = cross_val_score(lasso_model, X, y, scoring='r2', cv=kf)
    r2_scores_lasso.extend(lasso_scores)

    # Calculate R-squared scores for CART using cross-validation
    cart_scores = cross_val_score(cart_model, X, y, scoring='r2', cv=kf)
    r2_scores_cart.extend(cart_scores)

    # GridSearchCV for Random Forest
    # (the scoring/cv arguments are cut off in the source; 'r2' scoring with the same KFold assumed)
    grid_search_rf = GridSearchCV(random_forest_model, param_grid_random_forest, scoring='r2', cv=kf)
    grid_search_rf.fit(X, y)
    best_rf_model = grid_search_rf.best_estimator_
    rf_scores = cross_val_score(best_rf_model, X, y, scoring='r2', cv=kf)
    r2_scores_random_forest.extend(rf_scores)

    # GridSearchCV for AdaBoost
    grid_search_adaboost = GridSearchCV(adaboost_model, param_grid_adaboost, scoring='r2', cv=kf)
    grid_search_adaboost.fit(X, y)
    best_adaboost_model = grid_search_adaboost.best_estimator_
    adaboost_scores = cross_val_score(best_adaboost_model, X, y, scoring='r2', cv=kf)
    r2_scores_adaboost.extend(adaboost_scores)

# Combine the R-squared scores into a dictionary for boxplot creation
r2_scores = {
    'Lasso': r2_scores_lasso,
    'CART': r2_scores_cart,
    'Random Forest': r2_scores_random_forest,
    'AdaBoost': r2_scores_adaboost
}

# Create boxplots to show the distribution of R-squared values for all models
plt.figure(figsize=(10, 6))
plt.boxplot(r2_scores.values())
plt.xticks(range(1, len(r2_scores) + 1), r2_scores.keys())
plt.title('Distribution of R-squared values for all models')
plt.ylabel('R-squared')
plt.grid(True)
plt.show()

# Determine which model performed best
avg_r2_scores = {model: np.mean(scores) for model, scores in r2_scores.items()}
best_model = max(avg_r2_scores, key=avg_r2_scores.get)

print("Average R-squared values:")
for model, avg_r2 in avg_r2_scores.items():
    print(f"{model}: {avg_r2}")

print(f"\n{best_model} performed the best.")

#part 5
end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")
Average R-squared values:
Lasso: 0.23623783291173964
CART: 0.08872684581633686
Random Forest: 0.4610205421399896
AdaBoost: 0.40720293047646283
Random Forest performed the best.
Time taken to run the models: 179.20 seconds
4.B
The differences in performance across models could be a concern if we need the best possible predictions for this problem. Each model makes different assumptions, and those differences affect how well it captures the relationships between the features and the target. Possible reasons for the gap include model complexity, how performance is evaluated, the hyperparameter choices, and underfitting or overfitting.
5. The data set has only 50 observations in total, so the experiments run relatively quickly, except for the one that tunes and evaluates all four models, which took somewhere from 3-7 minutes (about 179 seconds in the run above).
PROBLEM 3
Ensemble Methods
1.Differences between Bagging and Boosting:
a. Training Method:
Bagging (Bootstrap Aggregating): It involves training multiple individual models
independently on different subsets of the dataset by using bootstrap sampling. These
models then vote to make a collective prediction.
Boosting: It trains multiple weak learners sequentially, where each subsequent learner
focuses more on the samples that the previous ones misclassified. It aims to improve
upon the weaknesses of earlier models.
b. Weighting of Models:
Bagging: All models in bagging are typically given equal weight or importance when
making predictions.
Boosting: Boosting assigns weights to data points, where it emphasizes the misclassified
points, allowing subsequent models to concentrate more on correcting these mistakes.
c. Model Complexity:
Bagging: Each model in bagging is usually trained independently, with no direct
influence on other models. They can be diverse, leading to ensemble diversity.
Boosting: Models in boosting are trained sequentially, and each new model focuses on
improving areas where previous models made mistakes. Boosting tends to produce a
sequence of models where later models try to correct errors made by earlier ones.
2.Impact of Boosting's Sequential Training on Practitioners:
Boosting's sequential nature means that each model in the sequence is dependent on the
previous one. As a practitioner:
Adjusting hyperparameters or diagnosing issues in boosting might be more complex due to
the interdependence between models. It's crucial to monitor and control the number of
iterations or weak learners (to prevent overfitting) and tune learning rates effectively for
optimal performance.
3.Overfitting Concerns in Boosting vs. Bagging:
Boosting focuses on sequentially minimizing errors, potentially leading to overfitting if the
boosting process continues for too many iterations. It adapts its models to correct previous
mistakes, which might start fitting the noise in the data. Bagging, on the other hand,
constructs diverse models by using different subsets of the data, reducing variance without
overfitting as each model is trained independently. Hence, it's less prone to overfitting than
boosting.
Stacking vs. Boosting vs. Bagging:
Bagging: It constructs multiple independent models by training them on random subsets of
the data. Each model contributes equally to the final prediction by voting or averaging.
Boosting: Boosting sequentially builds models, with each subsequent model trying to correct
the errors made by the previous ones. It focuses on the difficult instances and adapts its
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
11/21/23, 8:54 AM
Homework2_HiteshNarra
localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false
36/42
models based on their performance on these instances.
4.Stacking (Stacked Generalization):
Stacking involves training a meta-learner that
combines the predictions of multiple base learners. Instead of a simple averaging or voting
mechanism, stacking learns how to best combine the predictions of diverse base models. It
uses the predictions of base models as features to train a higher-level model, aiming to
make more accurate predictions.
In simpler terms, while bagging and boosting create several models and combine their
predictions differently, stacking takes the predictions of these models and feeds them into
another model (meta-learner) to make the final prediction. Stacking learns how to best use
the predictions of the individual models to improve overall performance.
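As a minimal sketch of how the three ideas line up in scikit-learn (editorial illustration only; the estimator choices and hyperparameter values below are not from the graded notebook):

# Bagging, boosting, and stacking side by side, using regressors that fit the
# StateData task from Problem 1; values are illustrative.
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor, StackingRegressor
from sklearn.linear_model import Lasso, RidgeCV
from sklearn.tree import DecisionTreeRegressor

# Bagging: many trees fit independently on bootstrap samples, predictions averaged
bagging = BaggingRegressor(n_estimators=100)  # default base learner is a decision tree

# Boosting: learners fit sequentially, each concentrating on the previous ones' errors
boosting = AdaBoostRegressor(n_estimators=100, learning_rate=0.1)

# Stacking: a meta-learner (here RidgeCV) learns how to combine the base models' predictions
stacking = StackingRegressor(
    estimators=[('lasso', Lasso()), ('cart', DecisionTreeRegressor())],
    final_estimator=RidgeCV(),
)
# Each of these can then be evaluated exactly like the models above, e.g. with cross_val_score.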
PROBLEM 4
Support vector machines
Support Vector Machines (SVMs): 1.Difference between Soft and Hard Margin SVM:
Hard Margin SVM: It aims to find the maximum margin hyperplane that perfectly separates
the data points of different classes without allowing any misclassifications (0 training errors).
However, it might not be possible or practical when dealing with noisy or overlapping data.
For instance, in a dataset with outliers or non-linearly separable classes, a hard margin SVM
might fail to find a feasible decision boundary.
Soft Margin SVM: In contrast, a soft margin SVM allows for a margin that may have some
misclassifications or violations, known as slack variables. It tolerates a certain amount of
errors or misclassifications to find a broader margin that better generalizes to unseen data.
For example, when dealing with noisy data or when perfect separation is not feasible, a soft
margin SVM can be preferred as it provides a trade-off between margin width and errors,
improving generalization.
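In scikit-learn this trade-off is controlled by the regularization parameter C, as in the sketch below (editorial illustration; the toy data and C values are assumptions, not part of the submitted answer):

# A very large C approximates a hard margin (violations heavily penalized);
# a small C gives a softer margin that tolerates more violations.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X_toy, y_toy = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)
hardish_margin = SVC(kernel='linear', C=1e6).fit(X_toy, y_toy)  # near-hard margin
soft_margin = SVC(kernel='linear', C=0.01).fit(X_toy, y_toy)    # soft margin
# The softer margin keeps more points inside the margin, so it has more support vectors.
print(len(hardish_margin.support_), len(soft_margin.support_))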
2.Kernel Trick and its Purpose:
The kernel trick enables SVMs to handle non-linearly separable data by implicitly mapping
the input data into a higher-dimensional space where it becomes linearly separable.
Conceptually, imagine a 2D dataset that cannot be separated with a straight line (linear
boundary). The kernel trick allows transforming this data into a higher-dimensional space
(e.g., 3D or higher) where it becomes separable by a hyperplane. This transformation is done
efficiently without explicitly calculating the new higher-dimensional feature space, thus
avoiding high computational costs.
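A small sketch of that idea (editorial illustration with a toy dataset; not part of the submitted answer): on concentric circles, which no straight line can separate, an RBF kernel succeeds where a linear kernel fails.

# Compare a linear and an RBF kernel on data that is not linearly separable.
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_toy, y_toy = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
for kernel in ('linear', 'rbf'):
    acc = cross_val_score(SVC(kernel=kernel), X_toy, y_toy, cv=5).mean()
    print(kernel, round(acc, 3))
# The linear kernel stays near chance, while the RBF kernel is close to perfect, because
# the kernel implicitly maps the circles into a space where they become separable.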
3.Solving the Dual Formulation of the SVM Optimization Problem:
The dual formulation of the SVM optimization problem is often solved because of
computational efficiency and the kernel trick's convenience.
The dual formulation allows expressing the optimization problem in terms of dot products
between pairs of data points. This formulation enables the use of kernels, allowing SVMs to
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
11/21/23, 8:54 AM
Homework2_HiteshNarra
localhost:8888/nbconvert/html/Desktop/Homework2_HiteshNarra.ipynb?download=false
37/42
efficiently operate in high-dimensional spaces without explicitly transforming the data.
Additionally, solving the dual formulation often results in a simpler problem with better
computational properties, making it more tractable for optimization algorithms compared to
the primal formulation.
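For reference, the standard soft-margin dual problem these remarks refer to can be written (in standard notation; this formula is an editorial addition rather than part of the submitted answer) as

$$\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,K(x_i,x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C \;\; \forall i, \qquad \sum_{i=1}^{n}\alpha_i y_i = 0,$$

where the data enter only through the kernel values $K(x_i, x_j)$, which is exactly why the kernel trick fits the dual so naturally.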
In academic terms, SVMs, with their hard and soft margin concepts, utilize the kernel trick to
handle non-linear data by implicitly projecting it into a higher-dimensional space. Solving
the dual formulation of the optimization problem enables efficient computations by using
kernels and facilitates working in higher-dimensional spaces without explicitly transforming
the data, which is beneficial for handling complex data structures
PROBLEM 2
Predicting invasive species
In [5]:
#1
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Load the dataset
data = pd.read_csv('SpeciesData.csv')

# Separate features and target variable
X = data.drop(columns=['Target'])
y = data['Target']

# Scale features to [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42)  # random_state value cut off in the source; 42 assumed

# Define hyperparameters for logistic regression
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10]}

# Initialize logistic regression with Lasso penalty
log_reg = LogisticRegression(penalty='l1', solver='liblinear')

# Use GridSearchCV to find the best hyperparameters
grid_search = GridSearchCV(log_reg, param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Get the best value for C
best_C = grid_search.best_params_['C']

# Train logistic regression model with the best C value
best_log_reg = LogisticRegression(penalty='l1', solver='liblinear', C=best_C)
best_log_reg.fit(X_train, y_train)

# Predict probabilities for the test set
y_pred_prob = best_log_reg.predict_proba(X_test)[:, 1]

# Calculate test set AUC
test_auc = roc_auc_score(y_test, y_pred_prob)

# Calculate training set AUC
y_pred_prob_train = best_log_reg.predict_proba(X_train)[:, 1]
train_auc = roc_auc_score(y_train, y_pred_prob_train)

print(f"Best value for C: {best_C}")
print(f"Test set AUC: {test_auc}")
print(f"Training set AUC: {train_auc}")
Best value for C: 1
Test set AUC: 0.8270916804513728
Training set AUC: 0.8373540654491209
1.C Based on these observations, the small difference between the training and test set AUC
and the high AUC values themselves indicate that overfitting might not be a significant
concern with this model. The model seems to have achieved good generalization to unseen
data, considering the relatively comparable performance between the training and test sets.
In [6]:
#2
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
import time
import matplotlib.pyplot as plt

start_time = time.time()

# Load the dataset
data = pd.read_csv('SpeciesData.csv')

# Separate features and target variable
X = data.drop(columns=['Target'])
y = data['Target']

# Scale features to [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42)  # random_state value cut off in the source; 42 assumed

# Define hyperparameters for random forest
param_grid = {'n_estimators': [10, 100, 1000, 5000, 10000]}

# Initialize random forest classifier
rf = RandomForestClassifier()

# Use GridSearchCV to find the best number of trees
grid_search = GridSearchCV(rf, param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Get the best number of trees
best_n_estimators = grid_search.best_params_['n_estimators']

# Train random forest with the best number of trees
best_rf = RandomForestClassifier(n_estimators=best_n_estimators)
best_rf.fit(X_train, y_train)

# Predict probabilities for the test set
y_pred_prob = best_rf.predict_proba(X_test)[:, 1]

# Calculate test set AUC
test_auc = roc_auc_score(y_test, y_pred_prob)

# Calculate training set AUC
y_pred_prob_train = best_rf.predict_proba(X_train)[:, 1]
train_auc = roc_auc_score(y_train, y_pred_prob_train)

print(f"Best number of trees: {best_n_estimators}")
print(f"Test set AUC: {test_auc}")
print(f"Training set AUC: {train_auc}")

end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")

# Plotting AUC against the number of trees
# (the marker argument is cut off in the source; note also that GridSearchCV used its
# default scoring here, so mean_test_score is accuracy rather than AUC despite the label)
plt.figure(figsize=(8, 6))
plt.plot(param_grid['n_estimators'], grid_search.cv_results_['mean_test_score'], marker='o')
plt.title('Random Forest: Number of Trees vs. Mean Test AUC')
plt.xlabel('Number of Trees')
plt.ylabel('Mean Test AUC')
plt.grid(True)
plt.show()
Best number of trees: 100
Test set AUC: 0.9178747034865737
Training set AUC: 1.0
Time taken to run the models: 484.64 seconds
2.b
You'll likely notice that initially, as the number of trees increases, the model's performance improves. However, at a certain point (here around 5000 trees), adding more trees results in only marginal improvements in performance, which indicates the point of diminishing returns: the gain becomes negligible despite increasing the number of trees.
2.d
Random forests can be prone to overfitting, especially if they're built with a large number of trees or if the individual trees are deep. They are capable of capturing complex relationships in the data, which can lead to overfitting when the model is overly complex for the given dataset. Reducing the number of trees, limiting tree depth (pruning), adjusting other hyperparameters such as the minimum samples per leaf, or applying regularization techniques could help overcome overfitting.
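A minimal sketch of that kind of constrained forest (editorial illustration; the hyperparameter values are assumptions, not tuned for this dataset):

# A random forest with its complexity deliberately limited, so individual trees
# cannot memorize the training data as easily.
from sklearn.ensemble import RandomForestClassifier

constrained_rf = RandomForestClassifier(
    n_estimators=200,     # enough trees to stabilize, without thousands
    max_depth=5,          # cap the tree depth
    min_samples_leaf=5,   # require several samples per leaf
    random_state=42,
)
# constrained_rf.fit(X_train, y_train) would then replace the unconstrained best_rf above.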
In [7]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score
import time

start_time = time.time()

# Load the dataset
data = pd.read_csv('SpeciesData.csv')

# Separate features and target variable
X = data.drop(columns=['Target'])
y = data['Target']

# Scale features to [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42)  # random_state value cut off in the source; 42 assumed

# Define hyperparameters for SVM with different kernels
param_grid = {'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}

# Initialize SVM classifier
svm = SVC(probability=True)

# Use GridSearchCV to find the best kernel
grid_search = GridSearchCV(svm, param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Get the best kernel
best_kernel = grid_search.best_params_['kernel']

# Train SVM with the best kernel
best_svm = SVC(kernel=best_kernel, probability=True)
best_svm.fit(X_train, y_train)

# Predict probabilities for the test set
y_pred_prob = best_svm.predict_proba(X_test)[:, 1]

# Calculate test set AUC
test_auc = roc_auc_score(y_test, y_pred_prob)

# Calculate training set AUC
y_pred_prob_train = best_svm.predict_proba(X_train)[:, 1]
train_auc = roc_auc_score(y_train, y_pred_prob_train)

print(f"Best kernel: {best_kernel}")
print(f"Test set AUC: {test_auc}")
print(f"Training set AUC: {train_auc}")

end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to run the models: {elapsed_time:.2f} seconds")
Best kernel: poly
Test set AUC: 0.848156332188832
Training set AUC: 0.8690520987641532
Time taken to run the models: 97.97 seconds
3.c
Given the small difference between the training and test set AUC and the relatively high AUC
values themselves, it appears that overfitting might not be a significant concern with this
model. The model seems to have reasonably good generalization to unseen data despite
being trained on the training set.
3.d
Whether the Random Forest's performance can be beaten depends on various factors, including the dataset characteristics and the nature of the problem. With sufficient effort in hyperparameter tuning, feature engineering, and optimization, it is conceivable that an SVM could outperform the Random Forest; the relative superiority of each model will vary with the specific task requirements and data characteristics.
4
Best Performing Model: The Random Forest outperformed the SVM with a higher Test AUC
(0.918 vs. 0.848).
Training and Testing Time: The Random Forest took longer to train and test (484.64 seconds)
compared to the SVM (97.97 seconds).
Performance vs. Computational Time: The Random Forest's slightly better performance
might justify its longer computational time if the higher Test AUC is critical. However,
considering the substantial time difference and if a marginally lower performance is
acceptable, the SVM's quicker computation might be preferred.
Ultimately, the choice between models depends on the balance between model
performance and computational efficiency, considering the specific needs and constraints of
the problem at hand.
In [ ]: