D208_Task2_PA

School

Western Governors University

Course

D208

Subject

Statistics

Date

Feb 20, 2024

Uploaded by MagistrateAntelope3113

D208 - Predictive Modeling Logistic Regression Modeling

Table of Contents

Part I: Research Question
  A. Describe Purpose of Analysis
    1. Summarize one research question
    2. Define Goals of Analysis
Part II: Method Justification
  B. Describe Multiple Logistic Regression Methods
    1. Summarize four assumptions of a logistic regression model
    2. Describe two benefits of using Python in support of analysis
    3. Explain why logistic regression is an appropriate technique to use based on question in Part I
Part III: Data Prep
  C. Summarize the data prep process
    1. Describe Data cleaning goals – See attached .ipynb File
    2. Describe dependent and all independent variables
    3. Generate univariate and bivariate visualizations of the distributions – independent and dependent variables, include dependent variable in bivariate visualization
    4. Describe data transformation goals that align with your research question and the steps used to transform the data to achieve goals, include annotated code
    5. Provide Prepared set as a CSV file
Part IV: Model Comparison & Analysis
  D. Compare initial and reduced linear regression model
    1. Initial multiple linear regression model with all variables from part C2
    2. Justify statistically based feature selection
    3. Provide reduced linear regression model
  E
    1. Model Evaluation Metric explanation
    2. Confusion Matrix & Accuracy Calculation
    3. Attached code
  F. Summary
    1. Discuss results
    2. Recommend course of action
  G
  H
  I

Part I: Research Question

A. Describe Purpose of Analysis

1. Summarize one research question

What factors contribute to Churn?

2. Define Goals of Analysis

The objective of my analysis is to gain insight into which customer factors correlate with whether or not a customer churns.

Part II: Method Justification

B. Describe Multiple Logistic Regression Methods

1. Summarize four assumptions of a logistic regression model

Assumptions for this model include the following. Observations are independent: the outcome of one observation should not influence the outcome of another. There is little or no multicollinearity: the independent variables should not correlate highly with each other. A goodness-of-fit test should be used to evaluate how well the model fits the data. The relationship between the independent variables and the log-odds of the outcome should be linear.

2. Describe two benefits of using Python in support of analysis

I used Python in a Jupyter notebook to complete this analysis. Python is beneficial for many reasons, of which I will list two. The first benefit is that Python offers multiple data visualization libraries that I can use to visualize my logistic regression models. The second benefit is its rich ecosystem of libraries, in which many of the required calculations are already implemented, saving time in the analysis phase. Together, these mean that I can calculate and visualize my data with ease using Python.

3. Explain why logistic regression is an appropriate technique to use based on question in Part I

Our target variable, Churn, is a binary categorical field, and logistic regression models the probability of a binary outcome as a function of the independent variables. It is therefore an appropriate technique for answering my question in Part I. We will test the independent variables to determine the effect each has on the target variable. The effect could be positive, negative, or none.

Part III: Data Prep

C. Summarize the data prep process

1. Describe Data cleaning goals – See attached .ipynb File

While becoming familiar with the data, I used .describe(), box plots, and .isnull() sums to identify areas of the data that needed to be cleaned. The goal is a data set that is suitable for performing a logistic regression analysis.
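
The profiling step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached notebook: the small DataFrame stands in for the churn data set, the column names Tenure and MonthlyCharge are assumptions, and the profile helper is hypothetical. In place of drawing box plots, it counts values outside the 1.5 × IQR fences, which is the rule a standard box plot uses to mark outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the churn data set; column names are assumptions.
df = pd.DataFrame({
    "Churn": ["Yes", "No", "No", "Yes", "No", "No"],
    "Tenure": [1.5, 60.2, 45.0, 3.1, np.nan, 70.4],
    "MonthlyCharge": [172.5, 150.0, 160.3, 900.0, 155.1, 149.9],
})


def profile(frame):
    """Return summary statistics, null counts, and IQR outlier counts."""
    # Summary statistics for the numeric columns.
    summary = frame.describe()
    # Number of missing values in every column.
    null_counts = frame.isnull().sum()
    # Values beyond 1.5 * IQR from the quartiles, as a box plot would flag.
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    outlier_counts = outside.sum()
    return summary, null_counts, outlier_counts


summary, null_counts, outlier_counts = profile(df)
print(summary)
print(null_counts)
print(outlier_counts)
```

Columns with nonzero null or outlier counts are the candidates for cleaning. How to treat them (imputing, capping, or dropping) depends on the variable and is not decided by this sketch.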
