Note the logarithmic scale of the y-axis. When plotting logarithmically, the relation seems to be quite linear and so should be relatively easy to predict, apart from some bumps.
We will make a forecast for the years after 2000 using the historical data up to that point, with the date as our only feature. We will compare two simple models: a DecisionTreeRegressor and LinearRegression.
We rescale the prices using a logarithm, so that the relationship is relatively linear. This doesn't make a difference for the DecisionTreeRegressor, but it makes a big difference for LinearRegression (we will discuss this in more depth in Chapter 4). After training the models and making predictions, we apply the exponential map to undo the logarithm transform. We make predictions on the whole dataset for visualization purposes here, but for a quantitative evaluation we would only consider the test dataset:
In[66]:
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# use historical data to forecast prices after the year 2000
data_train = ram_prices[ram_prices.date < 2000]
data_test = ram_prices[ram_prices.date >= 2000]

# predict prices based on date
# (convert to a NumPy array first; recent pandas versions do not support
# multidimensional indexing such as [:, np.newaxis] directly on a Series)
X_train = data_train.date.to_numpy()[:, np.newaxis]
# we use a log-transform to get a simpler relationship of data to target
y_train = np.log(data_train.price)

tree = DecisionTreeRegressor().fit(X_train, y_train)
linear_reg = LinearRegression().fit(X_train, y_train)

# predict on all data
X_all = ram_prices.date.to_numpy()[:, np.newaxis]

pred_tree = tree.predict(X_all)
pred_lr = linear_reg.predict(X_all)

# undo log-transform
price_tree = np.exp(pred_tree)
price_lr = np.exp(pred_lr)
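To see why the log-transform matters for LinearRegression, here is a minimal sketch on synthetic data. The exponentially decaying series below is an assumption made purely for illustration; it is not the RAM price dataset, and the decay rate and noise level are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for a price series (NOT the real RAM prices):
# exponential decay with multiplicative noise.
rng = np.random.RandomState(0)
years = np.linspace(1960, 2000, 200)
noise = np.exp(rng.normal(scale=0.3, size=years.shape))
price = 1e8 * np.exp(-0.5 * (years - 1960)) * noise

X = years[:, np.newaxis]

# Fit a line to the raw prices and another to the log-transformed prices.
r2_raw = LinearRegression().fit(X, price).score(X, price)
log_model = LinearRegression().fit(X, np.log(price))
r2_log = log_model.score(X, np.log(price))

# Undo the log-transform to get predictions back in price units.
price_pred = np.exp(log_model.predict(X))

print("R^2 on raw prices: {:.2f}".format(r2_raw))
print("R^2 on log prices: {:.2f}".format(r2_log))
```

A straight line fits the log-transformed target far better than the raw one, and applying np.exp to the predictions always yields positive prices.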
Figure 2-32, created by the following code, compares the predictions of the decision tree and the linear regression model with the ground truth:
In[67]:
plt.semilogy(data_train.date, data_train.price, label="Training data")
plt.semilogy(data_test.date, data_test.price, label="Test data")
plt.semilogy(ram_prices.date, price_tree, label="Tree prediction")
plt.semilogy(ram_prices.date, price_lr, label="Linear prediction")
plt.legend()
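The quantitative evaluation mentioned earlier, restricted to the test years, could look roughly like the following sketch. It again uses a synthetic log-price series as a stand-in (an assumption, not the real data), so the printed scores say nothing about the actual RAM prices; the split rule at the year 2000 mirrors the one in the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the log of a price series (NOT the real RAM prices).
rng = np.random.RandomState(0)
years = np.linspace(1960, 2010, 250)
log_price = 18 - 0.5 * (years - 1960) + rng.normal(scale=0.3, size=years.shape)

# Same split rule as in the text: train before 2000, test from 2000 on.
train = years < 2000
X_train, y_train = years[train][:, np.newaxis], log_price[train]
X_test, y_test = years[~train][:, np.newaxis], log_price[~train]

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
linear_reg = LinearRegression().fit(X_train, y_train)

# Quantitative evaluation uses only the held-out years.
tree_r2 = tree.score(X_test, y_test)
linear_r2 = linear_reg.score(X_test, y_test)
print("Tree R^2 on test data: {:.2f}".format(tree_r2))
print("Linear R^2 on test data: {:.2f}".format(linear_r2))

# Count how many different values the tree predicts past the training range.
n_distinct = np.unique(tree.predict(X_test)).size
print("Distinct tree predictions on test data: {}".format(n_distinct))
```

On this synthetic series the tree predicts a single constant for every date beyond the training range, because all those dates fall into the same leaf, while the linear model continues the trend.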