Week 9 Sklearn Fish
School: St. John's University
Course: 243
Subject: Statistics
Date: Feb 20, 2024
Type: pptx
Pages: 7
Uploaded by DukeMeerkatPerson605

SciKit-Learn Linear Regression
Linear Regression
Linear regression is a supervised machine learning algorithm:
  • A target variable is modeled on one or more independent variables.
  • The model can be univariate or multivariate.
This lab demonstrates linear regression using Sklearn.
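A minimal sketch of univariate linear regression in scikit-learn, using toy data rather than the Fish dataset (the numbers below are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy univariate data: y is roughly 2*x + 1 plus noise
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # sklearn expects a 2D feature array
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # slope near 2, intercept near 1
```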
Import Libraries
We're going to not just model our data, but also display it.
Step 1: Import the following libraries:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
Load and Explore Data
Use the Fish dataset, choosing the same two columns you used for the previous lab.
Step 2: Load the data using pandas. It's always a good idea to run df.head() after loading, and at least df.describe(). Make sure the data looks as expected.
Step 3: Plot a scatter plot. Seaborn has lmplot, which can display a scatter plot and draw a regression line. Use the parameters ci=None and line_kws={"color": "red"}: this removes the confidence interval and draws a red fit line. Compare it to the Statsmodels line. Is it close?
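Steps 2 and 3 might look like the sketch below. The file name Fish.csv and the columns Length1/Weight are assumptions (the slide's later example uses those two columns); the inline DataFrame is a tiny synthetic stand-in so the sketch runs on its own:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# In the lab you would load the real file, e.g. df = pd.read_csv("Fish.csv")
# (file name assumed). Here we use a small synthetic stand-in:
df = pd.DataFrame({
    "Length1": [23.2, 24.0, 23.9, 26.3, 26.5, 26.8],
    "Weight":  [242.0, 290.0, 340.0, 363.0, 430.0, 450.0],
})
print(df.head())      # sanity-check the first rows
print(df.describe())  # summary statistics

# Scatter plot with a red regression line and no confidence band
sns.lmplot(data=df, x="Length1", y="Weight", ci=None, line_kws={"color": "red"})
plt.show()
```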
Generate the Variables and Fit the Model
Step 4: Sklearn works with arrays, so you'll need to convert your X and Y variables into numpy arrays. Sklearn expects X as a 2D array of shape (n_samples, 1), and there are many ways to get there; try reshape(-1, 1).
Step 5: We've imported the train_test_split module, which performs a split for training purposes. The data is split into:
  • a training dataset, used to fit the model
  • a testing dataset, used to check accuracy
The code here is a little tough; it's in the notes section of this slide deck. Run it and note the score: that's how accurate your X variable is at predicting your Y.
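The notes-section code isn't reproduced in this extract; the following is a sketch of steps 4 and 5 under the assumptions that the columns are Length1 and Weight and that the split proportions are a free choice (the synthetic values below are invented for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the two Fish columns (column names assumed)
df = pd.DataFrame({
    "Length1": [23.2, 24.0, 23.9, 26.3, 26.5, 26.8, 27.6, 28.5, 29.0, 29.7],
    "Weight":  [242.0, 290.0, 340.0, 363.0, 430.0, 450.0, 500.0, 390.0, 450.0, 500.0],
})

# Step 4: sklearn expects 2D arrays, so reshape the 1D columns
X = np.array(df["Length1"]).reshape(-1, 1)
y = np.array(df["Weight"]).reshape(-1, 1)

# Step 5: hold out part of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

regr = LinearRegression()
regr.fit(X_train, y_train)
print(regr.score(X_test, y_test))  # R^2: how well Length1 predicts Weight
```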
Visualize the Model
Step 6: Sklearn uses regr.predict to compute the values for the predicted line, which should be the best fit. Your line may look very similar to the OLS line you generated above; this will depend on the variables you chose. I chose Length1 and Weight: my accuracy was 81%, and my line was nearly straight, so it looks almost the same as the Statsmodels model.

y_pred = regr.predict(X_test)
plt.scatter(X_test, y_test, color='b')
plt.plot(X_test, y_pred, color='r')
plt.show()
Metrics
We already know our accuracy; it's returned by regr.score. Similar to the way Statsmodels had a model fit summary, we can also generate some model statistics with Sklearn. Run the code below to generate them. The RMSE, or root mean squared error, is the most important: think of it as the average error of the model, and the smaller the better. What was the RMSE of your model?

from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(y_true=y_test, y_pred=y_pred)
# squared=True returns the MSE; squared=False returns the RMSE (default=True)
mse = mean_squared_error(y_true=y_test, y_pred=y_pred)
rmse = mean_squared_error(y_true=y_test, y_pred=y_pred, squared=False)

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)