Calculate RMSE on the validation set. What is it equal to? Provide the answer, rounded to five decimal places (e.g. 12.3456789 -> 12.34568). Notice the speed of the algorithm.
Ans:-
# Shared imports for the tasks below
import numpy as np
import pandas as pd
from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
Q- Now let's train our first model - XGBoost. A link to the documentation: https://xgboost.readthedocs.io/en/latest/
We will use the Scikit-Learn wrapper interface for XGBoost (and the same logic applies to the LightGBM and CatBoost models that follow). Since this is a regression task, we will use XGBRegressor. Read about the parameters of XGBRegressor: https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBRegressor
The main list of XGBoost parameters: https://xgboost.readthedocs.io/en/latest/parameter.html Look through this list to understand which parameters the library provides.
Take XGBRegressor with MSE objective (objective='reg:squarederror'), 200 trees (n_estimators=200), learning_rate=0.01, max_depth=5, random_state=13 and all other default parameter values. Train it on the train set (fit function).
q5: Calculate Root Mean Squared Error (RMSE) on the validation set. What is it equal to? Provide the answer, rounded to five decimal places (e.g. 12.3456789 -> 12.34568).
Ans:-
import numpy as np
from xgboost import XGBRegressor
# XGBRegressor, not XGBClassifier -- this is a regression task
xgb1 = XGBRegressor(objective='reg:squarederror', n_estimators=200,
                    learning_rate=0.01, max_depth=5, random_state=13)
xgb1.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_val, xgb1.predict(X_val)))
print(round(rmse, 5))
Q-
In task 5, we decided to build 200 trees in our model. However, it is hard to tell whether that was a good choice - maybe it is too many? Maybe 150 is a better number? Or 100? Or is 50 enough?
During training, it is possible to stop growing the ensemble once the validation error stops decreasing. Using the same XGBoost model, call the fit function (to train it) with eval_set=[(X_val, y_val)] (to evaluate the boosting model after each new tree) and early_stopping_rounds=50 (all other parameters at their default values). early_stopping_rounds means that if the validation metric does not improve for 50 consecutive iterations, training stops.
q6: Calculate RMSE on the validation set. What is it equal to? Provide the answer, rounded to five decimal places (e.g. 12.3456789 -> 12.34568).
Ans:- # code here
Q- Notes on parameter tuning: https://xgboost.readthedocs.io/en/latest/tutorials/param_tuning.html
Here, we tune some parameters of the XGBoost model. Take XGBRegressor with:
- objective='reg:squarederror'
- n_estimators=5000
- learning_rate=0.001
- max_depth=4
- gamma=1
- subsample=0.5
- random_state=13
- all other default parameter values
Train it in the same manner as in task 6, but with early_stopping_rounds=500.
q7: Calculate RMSE on the validation set. What is it equal to? Provide the answer, rounded to five decimal places (e.g. 12.3456789 -> 12.34568).
Notice the speed of the algorithm.
Ans:- # code here