
Assignment 4_MKTG 746_Group5.docx

School: York University
Course: 746
Subject: Statistics
Date: Apr 3, 2024
Type: docx
Pages: 15
Uploaded by LieutenantSteel8402

Big Data and Predictive Analysis Assignment 4 (Lab 2 Part 2) Predictive Modeling Using Regression-SAS Miner Submitted by

REGRESSION EXERCISE
1. Predictive Modeling Using Regression
a. Return to the Chapter 3 Organics diagram in My Project. Use the StatExplore tool on the ORGANICS data source.
1) The first StatExplore node is connected to the ORGANICS node.
2) The StatExplore node results are generated.
b. In order to prepare for regression, should missing values be imputed? Why do you think we should impute?

c. What changed after imputing?
d. Add an Impute node from the Modify tab into the diagram and connect it to the Data Partition node. Set the node to impute U for unknown class variable values and the overall mean for unknown interval variable values. Create imputation indicators for all imputed inputs.
e. Add a Regression node to the diagram and connect it to the Impute node. Choose stepwise as the selection model and the validation error as the selection criterion.
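The imputation described in step d can be sketched outside SAS Miner as follows. This is a minimal illustration, not the Impute node's actual implementation: the function name `impute`, the `IMP_` and `M_` prefixes on the output columns, and the three sample rows are assumptions made for the example.

```python
from statistics import mean

def impute(rows, interval_cols, class_cols):
    """Fill missing values (None) and add a 0/1 indicator per imputed input."""
    # Mean of the non-missing values for each interval column.
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in interval_cols}
    out = []
    for r in rows:
        new = {}
        for c in interval_cols + class_cols:
            missing = r[c] is None
            # Interval inputs get the column mean, class inputs get the level "U".
            fill = means[c] if c in interval_cols else "U"
            new["IMP_" + c] = fill if missing else r[c]
            new["M_" + c] = 1 if missing else 0  # imputation indicator
        out.append(new)
    return out

# Made-up rows, for illustration only.
rows = [
    {"DemAge": 40, "DemGender": "F"},
    {"DemAge": None, "DemGender": "M"},
    {"DemAge": 60, "DemGender": None},
]
imputed = impute(rows, ["DemAge"], ["DemGender"])
```

In a partitioned workflow the means would normally be computed on the training partition only and then reused for the validation data, so that the validation error stays an honest estimate.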


f. Choose stepwise as the selection model and the validation error as the selection criterion. g.

h. Run the Regression node and view the results. Maximize the Effect Plot.
i. Which variables are included in the final model? Which variables are important in this model? What is the validation ASE?

i) Go to line 664 in the Output window.
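For reference, the validation ASE asked about in step i can be computed by hand. The sketch below assumes ASE is the mean squared difference between the 0/1 target and the predicted probability over the validation cases. The function name and the four sample cases are made up for illustration and are not taken from the ORGANICS results.

```python
def average_squared_error(targets, probabilities):
    """Mean of (target - predicted probability)^2 over all cases."""
    if len(targets) != len(probabilities):
        raise ValueError("targets and probabilities must have the same length")
    squared = [(y - p) ** 2 for y, p in zip(targets, probabilities)]
    return sum(squared) / len(squared)

# Hypothetical validation cases: 1 = bought organics, 0 = did not.
targets = [1, 0, 0, 1]
probabilities = [0.8, 0.3, 0.1, 0.6]
ase = average_squared_error(targets, probabilities)
```

A lower value means the predicted probabilities sit closer to the observed outcomes, which is why the assignment compares models on this number.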


j) The odds ratios indicate the effect that each input has on the logit score.
k) Interpret the odds ratio estimates:
1. For IMP_DemAffl, the odds ratio of 1.283 suggests that for every one-unit increase in affluence grade, the odds of purchasing organic products increase by approximately 28%, assuming all else remains equal.
2. For IMP_DemAge, the odds ratio of 0.947 implies that for each additional year of age, the odds of buying organic products decrease by roughly 5%, assuming all else remains equal.
3. For IMP_DemGender, the odds ratio of 6.967 for females versus unknown suggests that women have almost seven times the odds of purchasing organic products compared with those whose gender is unknown. Likewise, the odds ratio of 2.899 for males versus unknown suggests that men have nearly three times the odds of buying organic products compared with those whose gender is unknown.
4. Taken together (6.967 / 2.899 ≈ 2.4), this indicates that females have about 2.4 times the odds of purchasing compared with males.
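The arithmetic behind these interpretations can be checked with a short sketch. The odds ratio values are the ones quoted in the text; the dictionary keys and helper names are assumptions chosen for the example. Dividing the two gender odds ratios is valid here because both use the same reference level (unknown).

```python
import math

# Odds ratio estimates as quoted in the text.
odds_ratios = {
    "IMP_DemAffl": 1.283,
    "IMP_DemAge": 0.947,
    "IMP_DemGender F vs U": 6.967,
    "IMP_DemGender M vs U": 2.899,
}

def pct_change_in_odds(odds_ratio):
    """Percent change in the odds for a one-unit increase in the input."""
    return (odds_ratio - 1.0) * 100.0

def logit_coefficient(odds_ratio):
    """The regression coefficient on the logit scale is the log of the odds ratio."""
    return math.log(odds_ratio)

# Female versus male: both ratios share the reference level U, so they divide.
female_vs_male = (
    odds_ratios["IMP_DemGender F vs U"] / odds_ratios["IMP_DemGender M vs U"]
)

for name, value in odds_ratios.items():
    print(f"{name}: {pct_change_in_odds(value):+.1f}% change in odds")
print(f"F vs M odds ratio: {female_vs_male:.2f}")
```

Note that these are statements about odds, not probabilities: an odds ratio of 2.4 does not mean the purchase probability is 2.4 times higher.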

l) The validation ASE is given in the Fit Statistics window.

PART 2

a. In preparation for regression, are any transformations of the data warranted? Why or why not?
Yes. Outliers and highly skewed inputs can distort a regression model's results, so before fitting the model we identify the inputs that are highly skewed.
i. Open the Variables window of the Regression node. Select the imputed interval inputs.
ii. Select Explore. The Explore window appears.
b. Both Card Tenure and Affluence Grade have moderately skewed distributions. Applying a log transformation to these inputs might improve the model fit.
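The effect of a log transformation on skewness can be illustrated with a small sketch. The tenure values below are invented, not taken from the ORGANICS data, and the transformation is assumed to be log(x + 1) so that zero values stay defined. The `skewness` helper uses the population formula (third central moment over the cubed standard deviation).

```python
import math

def skewness(values):
    """Population skewness: third central moment / standard deviation cubed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up right-skewed values standing in for an input such as PromTime.
prom_time = [1, 2, 2, 3, 3, 4, 5, 8, 15, 30]
log_prom_time = [math.log(v + 1) for v in prom_time]

print(f"raw skewness: {skewness(prom_time):.2f}")
print(f"log skewness: {skewness(log_prom_time):.2f}")
```

A less skewed input is not guaranteed to lower the validation error; it only removes one possible source of distortion.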


c. Disconnect the Impute node from the Data Partition node.
d. Add a Transform Variables node from the Modify tab to the diagram and connect it to the Data Partition node.
e. Connect the Transform Variables node to the Impute node.
f. Apply a log transformation to the DemAffl and PromTime inputs.
i. Open the Variables window of the Transform Variables node.
ii. Select Method → Log for the DemAffl and PromTime inputs. Select OK to close the Variables window.
g. Run the Transform Variables node. Explore the exported training data. Did the transformations result in less skewed distributions? Yes, they resulted in less skewed distributions.
i. The easiest way to explore the created inputs is to open the Variables window in the subsequent Impute node. Make sure that you update the Impute node before opening its Variables window.
ii. With the LOG_DemAffl and LOG_PromTime inputs selected, select Explore. The distributions are nicely symmetric.
h. Rerun the Regression node. Do the selected variables change? How about the validation ASE? The selected variables changed, and the validation ASE changed to 0.137535.

i. Go to line 664 of the Output window.

i. Apparently the log transformation actually increased the validation ASE slightly.
j. Create a full second-degree polynomial model. How does the validation average squared error for the polynomial model compare to the original model? The polynomial regression shows a slight reduction in average squared error compared with the original model.
i. Add another Regression node to the diagram and rename it Polynomial Regression.


ii. Make the indicated changes to the Polynomial Regression Properties panel and run the node.
iii. Go to line 1598 of the results output window.
iv. The polynomial regression node adds additional interaction terms.
v. Examine the Fit Statistics window.
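To make concrete what a full second-degree polynomial model adds, the sketch below lists the candidate terms for a set of interval inputs: the main effects, each squared term, and every pairwise product. It assumes interval inputs only, and the function name and the choice of three inputs are illustrative; it does not reproduce the exact term list that the Polynomial Regression node generates.

```python
from itertools import combinations_with_replacement

def second_degree_terms(inputs):
    """Main effects plus every square and pairwise product of the inputs."""
    main_effects = list(inputs)
    # Pairs with replacement give both squares (A*A) and interactions (A*B).
    second_order = [
        "*".join(pair) for pair in combinations_with_replacement(inputs, 2)
    ]
    return main_effects + second_order

terms = second_degree_terms(["DemAffl", "DemAge", "PromTime"])
for term in terms:
    print(term)
```

With k interval inputs this yields k main effects and k(k + 1)/2 second-order terms, which is why stepwise selection matters more as the candidate list grows.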

k) In your own words, describe what you did in this assignment and why you had to do each of these steps. Also, how would you describe the IVs that have an impact on the DV?
We performed predictive modeling using SAS Miner's Regression node. The process involved preparing the data, exploring it, transforming it, and then running the regression analysis. The steps and the reasoning behind them are as follows:
1. StatExplore and Data Imputation: We first used the StatExplore tool to examine the ORGANICS data set, which allowed us to find missing values. Imputation was required because a regression model cannot use a case that has a missing input value. Filling missing values with the mean for interval variables and a fixed category (U) for class variables kept every case in the analysis and avoided the bias that dropping incomplete cases could introduce.
2. Establishing the Regression Model: We added an Impute node and set it to impute the overall mean for interval inputs and U for unknown class inputs, with imputation indicators for all imputed inputs. This gave us a complete data set for modeling and a way to score new cases that have missing values.
3. Model Building and Analysis: To determine which inputs were most useful, we used stepwise selection in the Regression node with validation error as the selection criterion. The inputs selected were DemGender, DemAge, and DemAffl. To evaluate the model's fit, we examined the validation ASE (Average Squared Error).

4. How to Interpret the Odds Ratios: The odds ratios in the output show the effect of each independent variable (IV) on the odds of our dependent variable (DV), the purchase of organic products. For example, the odds increased by around 28% for every one-unit increase in DemAffl (IV).
5. Data Transformation: Outliers and skewness in the inputs can affect regression results. We applied log transformations to skewed inputs such as Card Tenure and Affluence Grade to produce less skewed distributions, with the aim of improving model fit.
6. Refining the Model: We disconnected the Impute node from the Data Partition node, inserted a Transform Variables node, and reran the regression to see how the validation ASE and the selected variables changed. The validation ASE rose slightly after the transformation, which shows that a transformation does not always improve the model.
7. Polynomial Regression: To compare the validation ASE against the original model, we built a full second-degree polynomial model and found a small decrease in error.
Each stage in this procedure improved the data or the model so that the predictive analysis would be as precise and trustworthy as possible. The IVs found to affect the DV (DemAffl, DemAge, and DemGender) were examined carefully to determine how they changed the odds of buying organic products. All of these steps were important for developing an accurate predictive model that can support reasonable inferences.
