Calculate RMSE on the validation set. What is it equal to? Provide the answer, rounded to five decimal places (e.g. 12.3456789 -> 12.34568). Notice the speed of the algorithm.
Ans:-
# Shared imports for the tasks below
import numpy as np
import pandas as pd
from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
Q- Now let's train our first model - XGBoost. A link to the documentation: https://xgboost.readthedocs.io/en/latest/
We will use the Scikit-Learn wrapper interface for XGBoost (and the same logic applies to the LightGBM and CatBoost models that follow). Since this is a regression task, we will use XGBRegressor. Read about the parameters of XGBRegressor: https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBRegressor
The main list of XGBoost parameters: https://xgboost.readthedocs.io/en/latest/parameter.html Look through this list to understand which parameters the library provides.
Take XGBRegressor with MSE objective (objective='reg:squarederror'), 200 trees (n_estimators=200), learning_rate=0.01, max_depth=5, random_state=13 and all other default parameter values. Train it on the train set (fit function).
q5: Calculate Root Mean Squared Error (RMSE) on the validation set. What is it equal to? Provide the answer, rounded to five decimal places (e.g. 12.3456789 -> 12.34568).
Ans:-
import numpy as np
from xgboost import XGBRegressor
# XGBRegressor, not XGBClassifier -- this is a regression task
xgb1 = XGBRegressor(objective='reg:squarederror', n_estimators=200,
                    learning_rate=0.01, max_depth=5, random_state=13)
xgb1.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_val, xgb1.predict(X_val)))
print(round(rmse, 5))
Q-
In task 5, we decided to build 200 trees in our model. However, it is hard to tell whether that was a good choice - maybe it is too many? Maybe 150 is a better number? Or 100? Or is 50 enough?
During training, it is possible to stop growing the ensemble once the validation error stops decreasing. Using the same XGBoost model, call the fit function (to train it) with eval_set=[(X_val, y_val)] (to evaluate the boosting model after each new tree) and early_stopping_rounds=50 (all other parameters at their default values). early_stopping_rounds means that if the validation metric does not improve for 50 consecutive iterations, training stops.
q6: Calculate RMSE on the validation set. What is it equal to? Provide the answer, rounded to five decimal places (e.g. 12.3456789 -> 12.34568).
Ans:- # code here
Q- Notes on parameter tuning: https://xgboost.readthedocs.io/en/latest/tutorials/param_tuning.html
Here, we tune some parameters of the XGBoost model. Take XGBRegressor with:
- objective='reg:squarederror'
- n_estimators=5000
- learning_rate=0.001
- max_depth=4
- gamma=1
- subsample=0.5
- random_state=13
- all other default parameter values
Train it in the same manner as in task 6, but with early_stopping_rounds=500.
q7: Calculate RMSE on the validation set. What is it equal to? Provide the answer, rounded to five decimal places (e.g. 12.3456789 -> 12.34568).
Notice the speed of the algorithm.
Ans:- # code here