Question

Below is an example that uses the diabetes dataset:

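import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn import linear_model

# Load the dataset and keep only the feature at index 2 (bmi) as a column of shape (n, 1)
d = load_diabetes()
d_X = d.data[:, np.newaxis, 2]

# Hold out the last 20 observations for testing; train on the rest
dx_train = d_X[:-20]
dy_train = d.target[:-20]
dx_test = d_X[-20:]
dy_test = d.target[-20:]

# Fit ordinary least squares on the training split
lr = linear_model.LinearRegression()
lr.fit(dx_train, dy_train)

# Mean squared error and R^2 score on the test split
mse = np.mean((lr.predict(dx_test) - dy_test) ** 2)
lr_score = lr.score(dx_test, dy_test)

print(lr.coef_)
print(mse)
print(lr_score)

# Test points with the fitted line drawn in red
plt.scatter(dx_test, dy_test)
plt.plot(dx_test, lr.predict(dx_test), c='r')
plt.show()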
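
The indexing d.data[:, np.newaxis, 2] in the example above keeps a trailing dimension of length 1, which is what the task below asks you to remove with squeeze(). A minimal sketch of that shape change, using a small made-up array (the name data and its values are hypothetical stand-ins for d.data):

```python
import numpy as np

# Hypothetical stand-in for d.data: 5 samples, 4 features.
data = np.arange(20, dtype=float).reshape(5, 4)

# Same indexing as d_X above: pick feature 2 and keep it as a column.
col = data[:, np.newaxis, 2]
print(col.shape)   # 2-D: one row per sample, one column

# squeeze() drops the length-1 axis, leaving a plain 1-D array.
flat = col.squeeze()
print(flat.shape)
```

With the 2-D column, an expression such as x * y against the 1-D target would broadcast to an n-by-n array, which is one likely reason the task asks for the squeeze.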

• Create a Python file called linear Regression.py.
• In this task you will use the diabetes dataset mentioned above to perform
linear regression to find the best fit line through the data.
• Reserve the last 20 observations for testing and use the rest for training
your model.
• Instead of using linear_model.LinearRegression() from sklearn, write a
function and make use of numpy to calculate the gradient and the
y-intercept of the best fit line, which has equation y = mx + b. The
equations below describe how both the gradient and the y-intercept can
be calculated from the training data and labels. Note: when you calculate
the gradient, you will need to reshape the x array to remove an extra
dimension of 1 from its shape (it has this because the dataset was formatted
for use with the sklearn functions, which require this extra dimension). You
can easily do this by applying squeeze() to the x array when you pass it
as an argument to the function. Hint: if the line doesn't look like it fits the
data well, there is a bug in your code.
o m = (µ(x) * µ(y) − µ(x * y)) / ((µ(x))² − µ(x²))
o b = µ(y) − m * µ(x)
where µ is the mean function.
• Use these values to produce a figure with the following:
o Scatter plot of training data colored red.
o Scatter plot of testing data colored green.
o Line graph for the best-fit line colored blue.
o Legend.
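
One possible way to approach the task above is sketched here. It is not the page's expert solution; the names best_fit_line and plot_diabetes_fit, and the legend labels, are assumptions made for illustration. The first function applies the two mean-based formulas directly with numpy, and the second draws the requested figure.

```python
import numpy as np


def best_fit_line(x, y):
    """Return (m, b) for y = m*x + b using the mean-based formulas above."""
    # squeeze() removes the extra length-1 dimension that d_X carries.
    x = np.asarray(x, dtype=float).squeeze()
    y = np.asarray(y, dtype=float).squeeze()
    m = (x.mean() * y.mean() - (x * y).mean()) / (x.mean() ** 2 - (x ** 2).mean())
    b = y.mean() - m * x.mean()
    return m, b


def plot_diabetes_fit():
    """Fit on all but the last 20 observations and draw the figure."""
    # Imported here so best_fit_line stays usable with numpy alone.
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_diabetes

    d = load_diabetes()
    x = d.data[:, np.newaxis, 2]
    x_train, y_train = x[:-20], d.target[:-20]
    x_test, y_test = x[-20:], d.target[-20:]

    m, b = best_fit_line(x_train, y_train)

    plt.scatter(x_train, y_train, c="red", label="Training data")
    plt.scatter(x_test, y_test, c="green", label="Testing data")
    # Two end points are enough to draw a straight line.
    xs = np.array([x.min(), x.max()])
    plt.plot(xs, m * xs + b, c="blue", label="Best-fit line")
    plt.legend()
    plt.show()
    return m, b


# Sanity check on points that lie exactly on y = 2x + 1.
x_demo = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (4, 1), like d_X
y_demo = 2 * x_demo.squeeze() + 1
m_demo, b_demo = best_fit_line(x_demo, y_demo)
print(m_demo, b_demo)
```

Calling plot_diabetes_fit() produces the figure and returns the fitted gradient and intercept. The gradient should be close to the lr.coef_ value printed by the sklearn example, since both are fitted on the same training split.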