Assignment 4_MKTG 746_Group5

Big Data and Predictive Analysis
Assignment 4 (Lab 2, Part 2): Predictive Modeling Using Regression - SAS Miner
Submitted by
REGRESSION EXERCISE

1. Predictive Modeling Using Regression

a. Return to the Chapter 3 Organics diagram in My Project. Use the StatExplore tool on the ORGANICS data source.
   1) A StatExplore node is connected to the ORGANICS node.
   2) The StatExplore node results are generated (a rough Python equivalent of this exploration is sketched after step b below).
b. In order to prepare for regression, missing values are imputed. Why do you think we should impute? The Regression node uses only complete cases, so any observation with a missing input value would be dropped from model fitting. Imputing fills in those values and keeps every observation in the model, avoiding the loss of information and potential bias that discarding incomplete cases would cause.
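The exploration itself is done with the StatExplore node inside SAS Miner; the following is only a rough, hypothetical Python equivalent, assuming the ORGANICS table has been exported to a file named organics.csv with its usual column names.

```python
# Rough stand-in for the StatExplore report: missing-value percentages and
# summary statistics per input. "organics.csv" is a hypothetical export of
# the ORGANICS data source.
import pandas as pd

organics = pd.read_csv("organics.csv")

# Share of missing values per variable (StatExplore's missing counts)
missing_pct = (organics.isna().mean() * 100).sort_values(ascending=False)
print(missing_pct.round(1))

# Summary statistics for the interval inputs
print(organics.describe().T[["mean", "std", "min", "max"]])
```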
c. What changed after imputing? The missing values are filled in: unknown class values become U and unknown interval values become the overall mean, each imputed input gets an IMP_ replacement, and an M_ indicator flags which observations were imputed, so no cases are lost to missing data.
d. Add an Impute node from the Modify tab into the diagram and connect it to the Data Partition node. Set the node to impute U for unknown class variable values and the overall mean for unknown interval variable values. Create imputation indicators for all imputed inputs (a pandas sketch of this scheme appears after step e below).
e. Add a Regression node to the diagram and connect it to the Impute node.
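The Impute node handles this inside SAS Miner; the block below is only a minimal pandas sketch of the same strategy (U for unknown class values, the overall mean for unknown interval values, plus M_ indicator flags). The file name and the column lists are assumptions based on the usual ORGANICS variable names.

```python
# Minimal pandas sketch of the Impute node settings in step d.
import pandas as pd

organics = pd.read_csv("organics.csv")   # hypothetical export of the ORGANICS data

class_inputs = ["DemGender", "DemClusterGroup", "DemReg", "DemTVReg", "PromClass"]
interval_inputs = ["DemAffl", "DemAge", "PromSpend", "PromTime"]

imputed = organics.copy()
for col in class_inputs + interval_inputs:
    # Imputation indicator: 1 where the original value was missing.
    imputed["M_" + col] = organics[col].isna().astype(int)

# Unknown class values -> the constant "U"; unknown interval values -> overall mean.
imputed[class_inputs] = imputed[class_inputs].fillna("U")
imputed[interval_inputs] = imputed[interval_inputs].fillna(imputed[interval_inputs].mean())

print(imputed.filter(like="M_").sum())   # number of imputed values per input
```

SAS Miner additionally renames the filled inputs with an IMP_ prefix (IMP_DemAffl, IMP_DemAge, and so on); here the columns are simply overwritten for brevity.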
f. Choose Stepwise as the selection model and Validation Error as the selection criterion (a simplified Python sketch of this selection scheme follows below).
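As a rough illustration of "stepwise selection with validation error as the criterion", the sketch below runs a simple forward-selection loop that adds, one input at a time, whichever input most reduces the validation average squared error (ASE) of a logistic regression. This approximates, but is not identical to, the SAS Miner stepwise algorithm, which uses significance-based entry and stay tests and then chooses the step with the best validation error. The file name, column names, and the 55/45 random split are assumptions standing in for the Impute and Data Partition nodes.

```python
# Forward selection driven by validation ASE (a simplified stand-in for
# Stepwise selection with Validation Error as the criterion).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

organics = pd.read_csv("organics_imputed.csv")   # hypothetical output of the Impute node
y = organics["TargetBuy"]

# One-hot encode the class input so everything is numeric.
X = pd.get_dummies(
    organics[["IMP_DemAffl", "IMP_DemAge", "IMP_DemGender", "IMP_PromSpend", "IMP_PromTime"]],
    drop_first=True,
)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.45, random_state=42)

def validation_ase(cols):
    """Fit on the training partition; return the average squared error on validation."""
    model = LogisticRegression(max_iter=1000).fit(X_train[cols], y_train)
    p = model.predict_proba(X_valid[cols])[:, 1]
    return ((y_valid - p) ** 2).mean()

selected, remaining = [], list(X.columns)
best_ase = ((y_valid - y_train.mean()) ** 2).mean()   # intercept-only baseline
while remaining:
    scores = {c: validation_ase(selected + [c]) for c in remaining}
    candidate, ase = min(scores.items(), key=lambda kv: kv[1])
    if ase >= best_ase:          # stop when no remaining input improves validation ASE
        break
    selected.append(candidate)
    remaining.remove(candidate)
    best_ase = ase

print("Selected inputs:", selected)
print("Validation ASE:", round(best_ase, 6))
```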
h. Run the Regression node and view the results. Maximize the Effect Plot.
i. Which variables are included in the final model? Which variables are important in this model? What is the validation ASE? The stepwise procedure retains IMP_DemAffl, IMP_DemAge, and IMP_DemGender; their effects are interpreted through the odds ratios in step k below, and the validation ASE is reported in the Fit Statistics window (step l).
   1) Go to line 664 in the Output window.
j) The odds ratios indicate the effect that each input has on the logit score.
k) Interpret the odds ratio estimates:
   1. For IMP_DemAffl, the odds ratio of 1.283 suggests that each one-unit increase in affluence grade raises the odds of purchasing organic products by approximately 28%.
   2. For IMP_DemAge, the odds ratio of 0.947 implies that each additional year of age lowers the odds of buying organic products by roughly 5%, all else being equal.
   3. For IMP_DemGender, the odds ratio of 6.967 for females versus unknown means that women have almost seven times the odds of purchasing organic products compared to customers whose gender is unknown. Likewise, the odds ratio of 2.899 for males versus unknown means that men have almost three times the odds of the unknown group.
   4. Comparing the two, females have roughly 2.4 times (6.967 / 2.899) the odds of purchasing compared to males.
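As a hypothetical illustration (Python/statsmodels rather than the SAS Miner output), the odds ratios are simply the exponentiated logistic regression coefficients, and the arithmetic used in the interpretations above works as shown below. The file and column names are assumptions.

```python
# How the odds ratio estimates are obtained and read: exponentiate the logistic
# regression coefficients. "organics_imputed.csv" is a hypothetical export of the
# Impute node's output; IMP_DemGender is assumed to contain F, M and U (unknown).
import numpy as np
import pandas as pd
import statsmodels.api as sm

organics = pd.read_csv("organics_imputed.csv")

gender = pd.get_dummies(organics["IMP_DemGender"], prefix="IMP_DemGender")
gender = gender.drop(columns="IMP_DemGender_U")       # "unknown" is the reference level
X = pd.concat([organics[["IMP_DemAffl", "IMP_DemAge"]], gender], axis=1).astype(float)
X = sm.add_constant(X)

fit = sm.Logit(organics["TargetBuy"], X).fit(disp=0)
print(np.exp(fit.params).round(3))                    # odds ratio per one-unit change

# Reading the reported values:
#   IMP_DemAffl: exp(beta) = 1.283 -> (1.283 - 1) * 100 ~ 28% higher odds per grade of affluence
#   IMP_DemAge:  exp(beta) = 0.947 -> about 5% lower odds per additional year of age
#   Females vs. males: 6.967 / 2.899 ~ 2.4 times the odds (both ratios are vs. the unknown group)
```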
l) The validation ASE is given in the Fit Statistics window.
PART 2

a. In preparation for regression, are any transformations of the data warranted? Why or why not? Yes. Skewed inputs and outliers can distort the regression estimates, so before fitting the model we identify the inputs that are highly skewed and consider transforming them.
   i. Open the Variables window of the Regression node. Select the imputed interval inputs.
   ii. Select Explore. The Explore window appears.
b. Both Card Tenure and Affluence Grade have moderately skewed distributions. Applying a log transformation to these inputs might improve the model fit.
c. Disconnect the Impute node from the Data Partition node.
d. Add a Transform Variables node from the Modify tab to the diagram and connect it to the Data Partition node.
e. Connect the Transform Variables node to the Impute node.
f. Apply a log transformation to the DemAffl and PromTime inputs.
   i. Open the Variables window of the Transform Variables node.
   ii. Select Method Log for the DemAffl and PromTime inputs. Select OK to close the Variables window.
g. Run the Transform Variables node. Explore the exported training data. Did the transformations result in less skewed distributions? Yes, both inputs now have noticeably less skewed distributions (a Python sketch of this check follows the notes below).
   i. The easiest way to explore the created inputs is to open the Variables window in the subsequent Impute node. Make sure that you update the Impute node before opening its Variables window.
   ii. With the LOG_DemAffl and LOG_PromTime inputs selected, select Explore. The distributions are nicely symmetric.
h. Rerun the Regression node. Do the selected variables change? How about the validation ASE? The selected variables changed, and the validation ASE changed to 0.137535.
i. Go to line 664 of the Output window. Apparently, the log transformation actually increased the validation ASE slightly.
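The transformation itself is done by the Transform Variables node; the block below is only a sketch of the same idea in Python, comparing skewness before and after the log. An offset of 1 is added so zero values do not break the logarithm; this offset is an assumption and may differ from the node's exact adjustment.

```python
# Sketch of the Transform Variables step: log-transform the skewed inputs and
# compare skewness before and after. "organics.csv" is a hypothetical export.
import numpy as np
import pandas as pd

organics = pd.read_csv("organics.csv")

for col in ["DemAffl", "PromTime"]:
    logged = np.log(organics[col] + 1)   # corresponds to LOG_DemAffl / LOG_PromTime
    print(f"{col}: skewness {organics[col].skew():.2f} -> {logged.skew():.2f}")
    organics["LOG_" + col] = logged
```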
j. Create a full second-degree polynomial model. How does the validation average squared error for the polynomial model compare to the original model? There is a slight reduction in the average squared error for the polynomial regression compared to the original model (a Python sketch of this comparison follows the sub-steps below).
   i. Add another Regression node to the diagram and rename it Polynomial Regression.
   ii. Make the indicated changes to the Polynomial Regression Properties panel and run the node.
   iii. Go to line 1598 of the results Output window.
   iv. The Polynomial Regression node adds the additional polynomial and interaction terms.
   v. Examine the Fit Statistics window.
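As a rough stand-in for the Polynomial Regression node's polynomial-terms and two-factor-interaction properties, the sketch below adds squared terms and two-way interactions with scikit-learn's PolynomialFeatures, then compares the validation ASE of the two fits. The file name, column names, and the random split are assumptions.

```python
# Main-effects logistic regression vs. a full second-degree polynomial model,
# compared on validation average squared error (ASE).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

organics = pd.read_csv("organics_imputed.csv")   # hypothetical output of the Impute node
y = organics["TargetBuy"]
X = pd.get_dummies(organics[["IMP_DemAffl", "IMP_DemAge", "IMP_DemGender"]], drop_first=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.45, random_state=42)

def ase(model, X_v):
    """Average squared error of the predicted probabilities on the validation partition."""
    p = model.predict_proba(X_v)[:, 1]
    return ((y_valid - p) ** 2).mean()

main = LogisticRegression(max_iter=1000).fit(X_train, y_train)

poly = PolynomialFeatures(degree=2, include_bias=False)   # squares + two-way interactions
Xp_train, Xp_valid = poly.fit_transform(X_train), poly.transform(X_valid)
quad = LogisticRegression(max_iter=1000).fit(Xp_train, y_train)

print("Validation ASE, main effects:", round(ase(main, X_valid), 6))
print("Validation ASE, polynomial:  ", round(ase(quad, Xp_valid), 6))
```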
k) In your own words, describe what you did in this assignment and why each of these steps was necessary. Also, how would you describe the IVs that have an impact on the DV?

Using SAS Miner's Regression tools, we performed predictive modelling. The process involved preparing the data, exploring it, transforming it, and then fitting and comparing regression models. The steps and the reasoning behind them are broken down as follows:

1. StatExplore and Data Imputation: We first used the StatExplore tool on the ORGANICS data source to identify missing values. Imputation was required because filling in the missing values (typically the overall mean for interval inputs and a standard category for class inputs) keeps every observation in the model and prevents the bias that discarding incomplete cases would introduce.

2. Establishing the Regression Model: After identifying the missing data, we added an Impute node and set it to replace unknown interval values with the overall mean and unknown class values with U, with indicators for the imputed inputs. This guaranteed a complete set of observations for modelling and a consistent way to score new cases.

3. Model Building and Analysis: To determine which inputs were most predictive, we used a Regression node with stepwise selection and validation error as the selection criterion. The inputs chosen were DemGender, DemAge, and DemAffl. The validation ASE (Average Squared Error) was computed to evaluate the model's fit.
4. Interpreting the Odds Ratios: The odds ratios from the output show the influence of each independent variable (IV) on the dependent variable (DV), the likelihood of buying organic products. For example, the odds increased by around 28% for every one-unit increase in DemAffl.

5. Data Transformation: Outliers and skew in the inputs can affect the regression results. Applying log transformations to skewed inputs such as Card Tenure and Affluence Grade produced less skewed distributions and can improve the model fit.

6. Refining the Model: We disconnected the Impute node, inserted a Transform Variables node between the Data Partition and Impute nodes, and reran the regression to see how the selected variables and the validation ASE changed. The fact that the ASE can sometimes rise after a transformation shows that a transformation does not always improve the model.

7. Polynomial Regression: To compare the validation ASE of a richer model with that of the original model, we constructed a full second-degree polynomial model and found a small decrease in error.

Each stage in this procedure helped refine the data and the model so that the predictive analysis was as accurate and reliable as possible. The inputs that affect the DV, namely DemAffl, DemAge, and DemGender (the IVs), were examined carefully to determine how they influence the likelihood of buying organic goods. All of these steps were important for developing an accurate predictive model that can be used to make reasonable inferences from the evaluation.