Machine Learning and Feature Selection (Nov 27) Class(1)

School: Boston University
Course: BA222
Subject: Computer Science
Date: Jan 9, 2024
Type: py
Pages: 3
Uploaded by MinisterWolverine2792
#!/usr/bin/env python
# coding: utf-8

# In[ ]:

# Created on Wed Nov 9 10:27:58 2022
# This document was created by Dr. N. Orkun Baycik and cannot be shared with anyone without permission
# @author: nobaycik

# In[ ]:

import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Read and view data
Companies = pd.read_excel("Companies2018.xlsx")

# Get the variable names:
print(Companies.columns)

# In[ ]:

# Use all quantitative variables as predictors for now.
# We could certainly use categorical ones too; we just need to add dummy
# variables as usual.
# The response variable is "Gross Profit".
y = Companies["Gross Profit"]
X = Companies[['SG&A Expense', 'Operating Expenses', 'Operating Income',
               'Interest Expense', 'Earnings before Tax', 'Income Tax Expense',
               'Net Income', 'Preferred Dividends', 'EPS', 'EPS Diluted',
               'Weighted Average Shs Out', 'Weighted Average Shs Out (Dil)',
               'Dividend per Share', 'Gross Margin', 'EBITDA Margin', 'EBIT Margin',
               'Profit Margin', 'Free Cash Flow margin', 'EBITDA', 'EBIT',
               'Consolidated Income', 'Earnings Before Tax Margin', 'Net Profit Margin',
               'Cash and cash equivalents', 'Short-term investments',
               'Cash and short-term investments', 'Receivables', 'Inventories',
               'Total current assets', 'Property, Plant & Equipment Net',
               'Goodwill and Intangible Assets', 'Long-term investments', 'Tax assets']]
# These are all things we have already learned and done many times!

# In[ ]:

# In ML algorithms, we usually split the data into two sets: training and test.
# The training data is used to search for patterns in the data set, and the
# test data is used to test the fitted model.

# Split the dataset into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# splits the entire data frame into two pieces

# In[ ]:

# Fit a Lasso model.
# (This cell's code is cut off in the preview; the alpha below is a placeholder
# tuning choice, not a value from the original notebook.)
model = Lasso(alpha=1.0)
model.fit(X_train, y_train)

# In[ ]:

# Identify variables with zero coefficients.
# Sometimes we don't get exactly zero, so we may choose to focus on
# coefficients that are close to 0 by setting a threshold.

# In[ ]:

# Let's filter the features in the training set based on the non-zero
# coefficients obtained from the Lasso model. We only want the features that
# have non-zero coefficients, effectively performing feature selection.
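The comments above note that categorical predictors could be included by adding dummy variables as usual. A minimal sketch of that step with `pd.get_dummies` — the `Sector` column and its values are invented for illustration and are not part of the Companies2018 dataset:

```python
import pandas as pd

# Hypothetical frame standing in for Companies; 'Sector' is an invented
# categorical column used only to illustrate dummy encoding.
df = pd.DataFrame({
    "Net Income": [10.0, 20.0, 15.0],
    "Sector": ["Tech", "Retail", "Tech"],
})

# drop_first=True drops one category to avoid perfect collinearity with the
# intercept (the "dummy-variable trap").
dummies = pd.get_dummies(df["Sector"], prefix="Sector", drop_first=True)
X_with_dummies = pd.concat([df[["Net Income"]], dummies], axis=1)
print(X_with_dummies.columns.tolist())  # ['Net Income', 'Sector_Tech']
```

The resulting columns can then be passed to `train_test_split` and the Lasso exactly like the quantitative predictors.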
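The preview cuts off the cells that fit the Lasso and filter the features by coefficient, so here is a self-contained sketch of that selection step on synthetic data. The feature names, the `alpha` value, and the threshold are all assumptions for illustration, not values from the original notebook; with the real data you would use `X_train` and `y_train` from the split above:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Synthetic stand-in for X_train / y_train; only f1 and f3 drive the response.
rng = np.random.default_rng(1)
X_train = pd.DataFrame(rng.normal(size=(100, 5)),
                       columns=["f1", "f2", "f3", "f4", "f5"])
y_train = 3.0 * X_train["f1"] - 2.0 * X_train["f3"] + rng.normal(scale=0.1, size=100)

# alpha is a placeholder tuning choice.
model = Lasso(alpha=0.1)
model.fit(X_train, y_train)

# Lasso rarely needs this leniency on real data, but since coefficients may not
# be exactly zero, keep only features whose coefficient exceeds a small threshold.
threshold = 1e-6
selected = X_train.columns[np.abs(model.coef_) > threshold].tolist()
print(selected)

# Training set restricted to the Lasso-selected features.
X_train_selected = X_train[selected]
```

On this synthetic example the irrelevant features are shrunk to zero and `selected` keeps the truly informative columns, which is exactly the feature-selection behavior the comments describe.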
# In[ ]:

# Lasso gives you back regression coefficients for the variables it includes.
# It often makes sense to then take the variables it included and just run a
# regular regression on them. There are two benefits of this; one is that you
# get a regular OLS object in Python, which you already know how to work with
# and interpret.
# So, in short: run Lasso to pick your variables, then run regular linear
# regression on those variables.

# In[ ]:

# Fit OLS using post-Lasso.
# (In a full post-Lasso workflow, X_train here would be restricted to the
# Lasso-selected subset of features from the filtering step above.)
Xfull = sm.add_constant(X_train)
postlasso = sm.OLS(y_train, Xfull).fit()

# Get the post-Lasso coefficients
postlasso_coefs = postlasso.params
postlasso.summary()
# postlasso.summary() is a summary of the OLS regression model that was fitted
# on the training data after applying Lasso regularization.

# Compare the coefficients (model is the Lasso fitted earlier).
print("Lasso Coefficients:", model.coef_)
print("Post-Lasso Coefficients:", postlasso_coefs)

# In[ ]:
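The split discussion earlier says the test data is for testing the model, but the preview never shows that step. A self-contained sketch of scoring a fitted Lasso on the held-out split — the synthetic data and `alpha` are assumptions; with the real data you would call `model.score(X_test, y_test)` on the split created above:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Companies features and response.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Same split pattern as the notebook: 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# alpha is a placeholder tuning choice.
model = Lasso(alpha=0.05).fit(X_train, y_train)

# score() returns R^2 computed on observations the model never saw during fitting.
r2 = model.score(X_test, y_test)
print(round(r2, 3))
```

A large gap between training and test R^2 would be a sign of overfitting, which is precisely what holding out a test set is meant to detect.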