Machine Learning and Feature Selection (Nov 27) Class(1)

School: Boston University
Course: BA222
Subject: Computer Science
Date: Jan 9, 2024
Type: py
Pages: 3
Uploaded by MinisterWolverine2792
#!/usr/bin/env python
# coding: utf-8

# In[ ]:

# Created on Wed Nov 9 10:27:58 2022
# This document was created by Dr. N. Orkun Baycik and cannot be shared with anyone without permission
# @author: nobaycik

# In[ ]:

import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Read and view data
Companies = pd.read_excel("Companies2018.xlsx")

# Get the variable names:
print(Companies.columns)

# In[ ]:

# Use all quantitative variables as predictors for now.
# We could certainly use categorical ones too; we just need to add dummy
# variables as usual.
# The response variable is "Gross Profit".
y = Companies["Gross Profit"]
X = Companies[['SG&A Expense', 'Operating Expenses', 'Operating Income',
               'Interest Expense', 'Earnings before Tax', 'Income Tax Expense',
               'Net Income', 'Preferred Dividends', 'EPS', 'EPS Diluted',
               'Weighted Average Shs Out', 'Weighted Average Shs Out (Dil)',
               'Dividend per Share', 'Gross Margin', 'EBITDA Margin', 'EBIT Margin',
               'Profit Margin', 'Free Cash Flow margin', 'EBITDA', 'EBIT',
               'Consolidated Income', 'Earnings Before Tax Margin', 'Net Profit Margin',
               'Cash and cash equivalents', 'Short-term investments',
               'Cash and short-term investments', 'Receivables', 'Inventories',
               'Total current assets', 'Property, Plant & Equipment Net',
               'Goodwill and Intangible Assets', 'Long-term investments', 'Tax assets']]
# These are all things we have already learned and done many times!

# In[ ]:

# In ML algorithms, we usually split the data into two sets: training and test.
# The training data is used to search for patterns in the data set, and the
# test data is used to test the fitted model.

# Split the dataset into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# splits the entire data frame into two pieces

# In[ ]:

# Fit a Lasso model.
# (This cell's code is cut off in the preview; the alpha below is a placeholder
# tuning choice, not a value from the original notebook.)
model = Lasso(alpha=1.0)
model.fit(X_train, y_train)

# In[ ]:

# Identify variables with zero coefficients.
# Sometimes we don't get exactly zero, so we may choose to focus on
# coefficients that are close to 0 by setting a threshold.

# In[ ]:

# Let's filter the features in the training set based on the non-zero
# coefficients obtained from the Lasso model. We only want the features that
# have non-zero coefficients, effectively performing feature selection.
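The comments above note that categorical predictors could be included by adding dummy variables as usual. A minimal sketch of that step with `pd.get_dummies` — the `Sector` column and its values are invented for illustration and are not part of the Companies2018 dataset:

```python
import pandas as pd

# Hypothetical frame standing in for Companies; 'Sector' is an invented
# categorical column used only to illustrate dummy encoding.
df = pd.DataFrame({
    "Net Income": [10.0, 20.0, 15.0],
    "Sector": ["Tech", "Retail", "Tech"],
})

# drop_first=True drops one category to avoid perfect collinearity with the
# intercept (the "dummy-variable trap").
dummies = pd.get_dummies(df["Sector"], prefix="Sector", drop_first=True)
X_with_dummies = pd.concat([df[["Net Income"]], dummies], axis=1)
print(X_with_dummies.columns.tolist())  # ['Net Income', 'Sector_Tech']
```

The resulting columns can then be passed to `train_test_split` and the Lasso exactly like the quantitative predictors.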
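The preview cuts off the cells that fit the Lasso and filter the features by coefficient, so here is a self-contained sketch of that selection step on synthetic data. The feature names, the `alpha` value, and the threshold are all assumptions for illustration, not values from the original notebook; with the real data you would use `X_train` and `y_train` from the split above:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Synthetic stand-in for X_train / y_train; only f1 and f3 drive the response.
rng = np.random.default_rng(1)
X_train = pd.DataFrame(rng.normal(size=(100, 5)),
                       columns=["f1", "f2", "f3", "f4", "f5"])
y_train = 3.0 * X_train["f1"] - 2.0 * X_train["f3"] + rng.normal(scale=0.1, size=100)

# alpha is a placeholder tuning choice.
model = Lasso(alpha=0.1)
model.fit(X_train, y_train)

# Lasso rarely needs this leniency on real data, but since coefficients may not
# be exactly zero, keep only features whose coefficient exceeds a small threshold.
threshold = 1e-6
selected = X_train.columns[np.abs(model.coef_) > threshold].tolist()
print(selected)

# Training set restricted to the Lasso-selected features.
X_train_selected = X_train[selected]
```

On this synthetic example the irrelevant features are shrunk to zero and `selected` keeps the truly informative columns, which is exactly the feature-selection behavior the comments describe.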
# In[ ]:

# Lasso gives you back regression coefficients for the variables it includes.
# It often makes sense to then take the variables it included and just run a
# regular regression on them. There are two benefits of this; one is that you
# get a regular OLS object in Python, which you already know how to work with
# and interpret.
# So, in short: run Lasso to pick your variables, then run regular linear
# regression on those variables.

# In[ ]:

# Fit OLS using post-Lasso.
# (In a full post-Lasso workflow, X_train here would be restricted to the
# Lasso-selected subset of features from the filtering step above.)
Xfull = sm.add_constant(X_train)
postlasso = sm.OLS(y_train, Xfull).fit()

# Get the post-Lasso coefficients
postlasso_coefs = postlasso.params
postlasso.summary()
# postlasso.summary() is a summary of the OLS regression model that was fitted
# on the training data after applying Lasso regularization.

# Compare the coefficients (model is the Lasso fitted earlier).
print("Lasso Coefficients:", model.coef_)
print("Post-Lasso Coefficients:", postlasso_coefs)

# In[ ]:
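The split discussion earlier says the test data is for testing the model, but the preview never shows that step. A self-contained sketch of scoring a fitted Lasso on the held-out split — the synthetic data and `alpha` are assumptions; with the real data you would call `model.score(X_test, y_test)` on the split created above:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Companies features and response.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Same split pattern as the notebook: 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# alpha is a placeholder tuning choice.
model = Lasso(alpha=0.05).fit(X_train, y_train)

# score() returns R^2 computed on observations the model never saw during fitting.
r2 = model.score(X_test, y_test)
print(round(r2, 3))
```

A large gap between training and test R^2 would be a sign of overfitting, which is precisely what holding out a test set is meant to detect.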