
Question

I need help with this machine learning assignment. I don't know which code to use for the following tasks:

  • Build a simple linear regression model to predict tip from total_bill using scikit-learn.
  • Calculate and interpret the R-squared value, and MSE for both the training and testing sets.
  • Generate a residual plot (using the testing dataset: y_test - y_pred) and analyze its distribution.
  • Interpret the slope and intercept of the regression line.
  • Split the dataset into 70% for model training and 30% for testing the model.


  • Build a multiple linear regression model to predict tip using total_bill, size, sex, smoker, and day.
  • Use the same split as above (70% for training and 30% for testing).
  • Evaluate the model's performance using the same metrics as in the simple linear regression (R^2 & MSE).
  • Compare the performance of the simple and multiple linear regression models.

 

Question 5: Linear Regression with SKlearn
  • Build a simple linear regression model to predict tip from total_bill using scikit-learn.
  • Calculate and interpret the R-squared value, and MSE for both the training and testing sets.
  • Generate a residual plot (using the testing dataset: y_test - y_pred) and analyze its distribution.
  • Interpret the slope and intercept of the regression line.
  • Split the dataset into 70% for model training and 30% for testing the model.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Code goes here
Training Set:
R-squared: 0.4555559076450443
MSE: 1.1627372520616328
Testing Set:
R-squared: 0.4291782688312412
MSE: 0.7524510349809156
Intercept: 0.8769576391532712
Slope: 0.10889370921420224
[Figure: Residual Plot on the Training Sample]
[Figure: Residual Plot on the Testing Sample]
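
The Question 5 steps can be sketched roughly as follows. This is only one possible approach, not the expected solution: the function names `fit_simple_regression` and `plot_residuals` are made up for illustration, `df` is assumed to be a DataFrame with numeric `total_bill` and `tip` columns, and the `random_state` value is a guess, so the numbers it prints will not necessarily match the output shown above.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_simple_regression(df, random_state=0):
    """Fit tip ~ total_bill on a 70/30 split and collect the metrics.

    Assumption: `df` has numeric `total_bill` and `tip` columns.
    Assumption: `random_state` is arbitrary; the original seed is unknown.
    """
    X = df[["total_bill"]]  # scikit-learn expects a 2-D feature matrix
    y = df["tip"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=random_state
    )

    model = LinearRegression().fit(X_train, y_train)
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)

    return {
        "intercept": float(model.intercept_),
        "slope": float(model.coef_[0]),
        "train_r2": r2_score(y_train, pred_train),
        "train_mse": mean_squared_error(y_train, pred_train),
        "test_r2": r2_score(y_test, pred_test),
        "test_mse": mean_squared_error(y_test, pred_test),
        "test_predictions": pred_test,
        # Residuals as defined in the question: y_test - y_pred
        "test_residuals": (y_test - pred_test).to_numpy(),
    }


def plot_residuals(predictions, residuals, title):
    """Scatter residuals against predictions with a zero reference line."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(predictions, residuals, alpha=0.6)
    ax.axhline(0, color="red", linestyle="--")
    ax.set_xlabel("Predicted tip")
    ax.set_ylabel("Residual (y_test - y_pred)")
    ax.set_title(title)
    return fig
```

Reading the output shown above: the slope of about 0.109 means each additional dollar on the bill is associated with roughly 11 cents more tip, and the intercept of about 0.877 is the predicted tip for a bill of zero. For a 20-dollar bill the line predicts 0.877 + 0.1089 × 20 ≈ 3.05. The R-squared values (0.456 training, 0.429 testing) say the bill alone explains a bit under half of the variation in tips. In the residual plot, points scattered evenly around zero with no funnel or curve would support the linear model; a spread that widens with the predicted tip would suggest non-constant variance.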
Question 6: Multiple Linear Regression Model
  • Build a multiple linear regression model to predict tip using total_bill, size, sex, smoker, and day.
  • Use the same split as above (70% for training and 30% for testing).
  • Evaluate the model's performance using the same metrics as in the simple linear regression (R^2 & MSE).
  • Compare the performance of the simple and multiple linear regression models.
# Code goes here
   total_bill   tip  size  sex_Female  smoker_No  day_Fri  day_Sat  day_Sun  time_Dinner
0       16.99  1.01     2           1          1        0        0        1            1
1       10.34  1.66     3           0          1        0        0        1            1
2       21.01  3.50     3           0          1        0        0        1            1
Model metrics (MSE & R^2) on Training Dataset:
R-squared: 0.49252348904344123
MSE: 1.0837877609860398
Model metrics (MSE & R^2) on Testing Dataset:
R-squared: 0.2930966744126695
MSE: 0.9318323215911046
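
One way to approach Question 6 is sketched below, under stated assumptions. The helper names `encode_features` and `fit_multiple_regression` are hypothetical, `df` is assumed to hold the raw columns (`total_bill`, `size`, `sex`, `smoker`, `day`, `tip`), and the `random_state` is arbitrary. The feature list follows the question text; the encoded table above also contains `time_Dinner`, so the original notebook may have included `time` as well.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Features named in the question; adding "time" here is an open choice.
FEATURES = ("total_bill", "size", "sex", "smoker", "day")


def encode_features(df, features=FEATURES):
    """One-hot encode categorical columns, dropping one level per column.

    Dropping a level avoids redundant dummy columns; which level is
    dropped depends on the category order in `df`.
    """
    return pd.get_dummies(df[list(features)], drop_first=True, dtype=int)


def fit_multiple_regression(df, features=FEATURES, random_state=0):
    """Fit tip on the encoded features using a 70/30 split.

    Assumption: reusing the same `random_state` as the simple model
    reproduces "the same split as above".
    """
    X = encode_features(df, features)
    y = df["tip"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=random_state
    )

    model = LinearRegression().fit(X_train, y_train)
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)

    return {
        "n_features": X.shape[1],
        "coefficients": dict(zip(X.columns, model.coef_)),
        "train_r2": r2_score(y_train, pred_train),
        "train_mse": mean_squared_error(y_train, pred_train),
        "test_r2": r2_score(y_test, pred_test),
        "test_mse": mean_squared_error(y_test, pred_test),
    }
```

For the comparison, the outputs shown on this page already tell most of the story. Training R-squared rises from about 0.456 to 0.493 with the extra predictors, which is expected because adding predictors to an ordinary least squares fit cannot lower the training R-squared. On the testing set, though, R-squared falls from about 0.429 to 0.293 and MSE rises from about 0.752 to 0.932, which suggests the additional variables are mostly fitting noise in the training data rather than improving predictions.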
Question 7: Regularization
  • Apply StandardScaler on the datasets.
  • Check which alpha parameter is the optimal value among [0.1, 1, 10, 100, 200].
  • Apply Ridge Linear Regression with the best alpha value.
  • What is your conclusion?
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
# Code goes here
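
A possible sketch for Question 7 follows. The question does not say how the "optimal" alpha should be checked, so this assumes 5-fold cross-validation on the training set with mean squared error as the criterion; comparing test-set scores per alpha directly would be a different (and more optimistic) choice. The function name `fit_ridge` is hypothetical, and the inputs are assumed to be the already encoded train/test arrays from Question 6.

```python
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

ALPHAS = [0.1, 1, 10, 100, 200]  # candidate values from the question


def fit_ridge(X_train, X_test, y_train, y_test, alphas=ALPHAS, cv=5):
    """Scale features, pick alpha by cross-validation, then evaluate.

    Assumption: cross-validated MSE on the training data decides
    which alpha is "optimal"; `cv=5` is an arbitrary choice.
    """
    # Putting the scaler inside the pipeline means it is re-fit on each
    # training fold, so validation folds do not leak into the scaling.
    pipeline = Pipeline([("scale", StandardScaler()), ("ridge", Ridge())])
    search = GridSearchCV(
        pipeline,
        {"ridge__alpha": list(alphas)},
        cv=cv,
        scoring="neg_mean_squared_error",
    )
    search.fit(X_train, y_train)

    best = search.best_estimator_  # refit on the full training set
    results = search.cv_results_
    return {
        "best_alpha": search.best_params_["ridge__alpha"],
        # Mean cross-validated MSE for each candidate alpha
        "cv_mse_by_alpha": {
            params["ridge__alpha"]: float(-score)
            for params, score in zip(results["params"], results["mean_test_score"])
        },
        "train_r2": r2_score(y_train, best.predict(X_train)),
        "train_mse": mean_squared_error(y_train, best.predict(X_train)),
        "test_r2": r2_score(y_test, best.predict(X_test)),
        "test_mse": mean_squared_error(y_test, best.predict(X_test)),
    }
```

The conclusion then depends on what the run reports. If the selected alpha is one of the larger values and the testing R-squared and MSE improve on the Question 6 numbers, that points to the unregularized multiple regression overfitting. If the smallest alpha wins and the metrics barely move, regularization adds little for this dataset.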