Question 5: Linear Regression with SKlearn

• Build a simple linear regression model to predict tip from total_bill using scikit-learn.
• Calculate and interpret the R-squared value and MSE for both the training and testing sets.
• Generate a residual plot (using the testing dataset: y_test - y_pred) and analyze its distribution.
• Interpret the slope and intercept of the regression line.
• Split the dataset into 70% for model training and 30% for testing the model.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Code goes here

Training Set:
R-squared: 0.4555559076450443
MSE: 1.1627372520616328

Testing Set:
R-squared: 0.4291782688312412
MSE: 0.7524510349809156

Intercept: 0.8769576391532712
Slope: 0.10889370921420224

[Figure: Residual Plot on the Training Sample]
[Figure: Residual Plot on the Testing Sample]

Question 6: Multiple Linear Regression Model

• Build a multiple linear regression model to predict tip using total_bill, size, sex, smoker, and day.
• Use the same split as above (70% for training and 30% for testing).
• Evaluate the model's performance using the same metrics as in the simple linear regression (R^2 & MSE).
• Compare the performance of the simple and multiple linear regression models.

[ ] # Code goes here

   total_bill   tip  size  sex_Female  smoker_No  day_Fri  day_Sat  day_Sun  time_Dinner
0       16.99  1.01     2           1          1        0        0        1            1
1       10.34  1.66     3           0          1        0        0        1            1
2       21.01  3.50     3           0          1        0        0        1            1

Model metrics (MSE & R^2) on Training Dataset:
R-squared: 0.49252348904344123
MSE: 1.0837877609860398

Model metrics (MSE & R^2) on Testing Dataset:
R-squared: 0.2930966744126695
MSE: 0.9318323215911046

Question 7: Regularization

• Apply StandardScaler on the datasets.
• Check which alpha parameter is the optimal value among [0.1, 1, 10, 100, 200].
• Apply Ridge Linear Regression with the best alpha value.
• What is your conclusion?
[ ] from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import Ridge
    # Code goes here
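
The split, fit, and evaluate workflow that Questions 5 and 6 ask for could be sketched roughly as follows. This is only an illustration, not the expected solution: the data here is synthetic (generated with NumPy as a stand-in for the tips dataset), so the numbers it prints will not match the outputs shown above, and `random_state=42` and the helper name `report` are arbitrary choices made for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data, NOT the real tips dataset.
rng = np.random.default_rng(0)
total_bill = rng.uniform(5.0, 50.0, size=200)
tip = 0.9 + 0.1 * total_bill + rng.normal(0.0, 1.0, size=200)

X = total_bill.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = tip

# 70% training / 30% testing; random_state is an arbitrary seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression().fit(X_train, y_train)


def report(name, X_part, y_part):
    """Print and return R-squared and MSE for one split (hypothetical helper)."""
    y_hat = model.predict(X_part)
    r2 = r2_score(y_part, y_hat)
    mse = mean_squared_error(y_part, y_hat)
    print(f"{name} Set: R-squared: {r2:.4f}  MSE: {mse:.4f}")
    return r2, mse


train_r2, train_mse = report("Training", X_train, y_train)
test_r2, test_mse = report("Testing", X_test, y_test)

# Residuals on the testing set, as the question defines them: y_test - y_pred.
residuals = y_test - model.predict(X_test)

intercept = model.intercept_  # predicted tip when total_bill is 0
slope = model.coef_[0]        # change in predicted tip per unit of total_bill
print(f"Intercept: {intercept:.4f}  Slope: {slope:.4f}")
```

The `residuals` array can then be plotted against the predictions (for example with a scatter plot) to check whether the errors are centered on zero with roughly constant spread. For the multiple regression in Question 6, the same steps apply with a wider `X`, where categorical columns are first one-hot encoded, as in the table shown above.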

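One possible way to approach Question 7 is sketched below. The question does not say how "optimal" should be judged, so this sketch assumes 5-fold cross-validated MSE on the training set as the selection criterion. It also assumes synthetic stand-in data rather than the tips dataset, and it wraps `StandardScaler` and `Ridge` in a pipeline so the scaler is fitted on training folds only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data with 5 features on different scales,
# NOT the real tips dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([10.0, 1.0, 0.5, 0.5, 0.5])
true_coef = np.array([0.1, 0.2, 0.0, 0.0, 0.0])
y = 1.0 + X @ true_coef + rng.normal(0.0, 1.0, size=200)

# Same 70/30 split as in the earlier questions; the seed is arbitrary.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

alphas = [0.1, 1, 10, 100, 200]  # candidate values listed in the question

# make_pipeline names each step after its lowercased class name,
# so the Ridge penalty is addressed as "ridge__alpha".
pipeline = make_pipeline(StandardScaler(), Ridge())
search = GridSearchCV(
    pipeline,
    param_grid={"ridge__alpha": alphas},
    scoring="neg_mean_squared_error",  # ASSUMPTION: choose alpha by CV MSE
    cv=5,
)
search.fit(X_train, y_train)

best_alpha = search.best_params_["ridge__alpha"]
print(f"Best alpha: {best_alpha}")

# GridSearchCV refits the best pipeline on the full training set by default,
# so it can be used directly for prediction on the held-out test set.
y_pred = search.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
test_mse = mean_squared_error(y_test, y_pred)
print(f"Testing Set: R-squared: {test_r2:.4f}  MSE: {test_mse:.4f}")
```

Comparing these test metrics with those of the unregularized multiple regression is one way to frame the conclusion the question asks for. If Ridge narrows the gap between training and testing R-squared, that suggests the plain model was overfitting.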