Lab 2 Part 2 W21 Regression_GladysVillafuerte_301264680.docx

School: Centennial College
Course: MISC
Subject: Computer Science
Date: Dec 6, 2023
Type: docx
Pages: 17

Big Data and Predictive Analysis
Assignment 4 (Lab 2 Part 2)
Predictive Modeling Using Regression - SAS Miner

Submitted to: Prof. David Parent
Submitted by: Gladys Anne Villafuerte - 301264680, Abimbola Babasola - 301249147

REGRESSION EXERCISE

1. Predictive Modeling Using Regression

a. Return to the Chapter 3 Organics diagram in My Project. Use the StatExplore tool on the ORGANICS data source.
   1) The first StatExplore node is connected to the ORGANICS node.
   2) The StatExplore node results are generated.

b. In order to prepare for regression, should missing values be imputed? Why do you think we should impute?
   Go to line 37 in the Output window: several of the class inputs have missing values.
   Go to line 65 of the Output window: most of the interval inputs also have missing values.

c. What changed after imputing?
Type your answer here: Yes. Imputation substitutes a value for each missing value, so cases with missing inputs can still be used to fit the regression. This is necessary to avoid obtaining a biased model.

d. Add an Impute node from the Modify tab into the diagram and connect it to the Data Partition node. Set the node to impute U for unknown class variable values and the overall mean for unknown interval variable values. Create imputation indicators for all imputed inputs.
Type your answer here: Imputing data before the Decision Tree node is unnecessary, because Decision Trees have their own methods for handling missing values. The Regression node has no such mechanism, so the Impute node is placed ahead of it.
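The imputation rule in step d can be sketched in plain Python. This is only an illustration of the rule (mean for interval inputs, U for class inputs, plus a 0/1 indicator), not what the Impute node runs internally. The function name `impute` and the three toy rows are hypothetical; the IMP_ and M_ prefixes follow the variable names that appear in the regression output.

```python
from statistics import mean

def impute(rows, interval_cols, class_cols):
    """Return new rows with IMP_<col> values and M_<col> indicators.

    A missing value is represented by None (an assumption of this sketch).
    """
    # Means are computed from the non-missing values only.
    means = {c: mean(r[c] for r in rows if r[c] is not None)
             for c in interval_cols}
    out = []
    for r in rows:
        new = {}
        for c in interval_cols + class_cols:
            missing = r[c] is None
            # Interval inputs get the mean, class inputs get the level "U".
            fill = means[c] if c in interval_cols else "U"
            new["IMP_" + c] = fill if missing else r[c]
            # Indicator: 1 if the value was imputed, 0 if it was observed.
            new["M_" + c] = 1 if missing else 0
        out.append(new)
    return out

# Hypothetical toy data, not taken from the ORGANICS data set.
rows = [
    {"DemAge": 40, "DemGender": "F"},
    {"DemAge": None, "DemGender": "M"},
    {"DemAge": 60, "DemGender": None},
]
imputed = impute(rows, ["DemAge"], ["DemGender"])
```

In the toy data the second row receives the mean age of the two observed rows and M_DemAge = 1, and the third row receives gender U and M_DemGender = 1.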

e. Add a Regression node to the diagram and connect it to the Impute node.

f. Choose stepwise as the selection model and the validation error as the selection criterion.

g. Run the Regression node and view the results. Maximize the Effect Plot.

h. Which variables are included in the final model? Which variables are important in this model? What is the validation ASE?
Type your answer here:
   Variables included: DemAffl, DemGender, DemAge
   Important variables: DemGender, DemAffl
   Validation ASE: 0.137156
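For reference, the average squared error (ASE) reported above can be sketched as the mean squared difference between the 0/1 target and the predicted probability. That definition is an assumption of this sketch, and the function name and the four toy cases are hypothetical, not values from the lab.

```python
def average_squared_error(targets, predicted_probs):
    """Mean of (target - predicted probability)^2 over all cases."""
    if not targets or len(targets) != len(predicted_probs):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - p) ** 2 for y, p in zip(targets, predicted_probs)]
    return sum(squared) / len(squared)

# Hypothetical validation cases: actual 0/1 target and model probability.
targets = [1, 0, 0, 1]
probs = [0.8, 0.2, 0.4, 0.5]
ase = average_squared_error(targets, probs)
```

A lower ASE on the validation data means the predicted probabilities sit closer to the observed outcomes, which is why it serves as the selection criterion here.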

i. Go to line 664 in the Output window.

j. The odds ratios indicate the effect that each input has on the logit score.

k. Interpret the odds ratio estimates.

l. The validation ASE is given in the Fit Statistics window.

Type your answer here: The estimates are odds ratios, which compare the odds of the event between two groups or, for an interval input, across a one-unit increase. An odds ratio greater than 1 means the odds are higher for the first group than for the second; a value below 1 means they are lower. The interpretations below each hold the other inputs fixed.
- IMP_DemAffl: For every one-unit increase in DemAffl, the odds of the event increase by 28.3%.
- IMP_DemAge: For every one-unit increase in DemAge, the odds of the event decrease by 5.3%.
- IMP_DemGender F vs U: The odds of the event for females are almost 7 times the odds for individuals with an unknown gender.
- IMP_DemGender M vs U: The odds of the event for males are almost 3 times the odds for individuals with an unknown gender.
- M_DemAffl 0 vs 1: M_DemAffl is the imputation indicator for DemAffl (1 = value was missing and imputed). The odds of the event are 29.2% lower for cases with an observed DemAffl value (0) than for cases with an imputed value (1).
- M_DemAge 0 vs 1: The odds of the event are 20.4% lower for cases with an observed DemAge value (0) than for cases with an imputed value (1).
- M_DemGender 0 vs 1: The odds of the event are 31.5% lower for cases with an observed DemGender value (0) than for cases with an imputed value (1).
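The arithmetic behind these readings can be sketched as follows. An odds ratio is the exponential of a logistic regression coefficient, and the percent change in odds is (odds ratio - 1) x 100. The two example values, 1.283 and 0.947, are back-computed from the percentages quoted in the answer (28.3% and 5.3%); they are not copied from the Output window, and the function names are hypothetical.

```python
import math

def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)

def percent_change_in_odds(odds_ratio_estimate):
    """Percent change in the odds for a one-unit increase (or group contrast)."""
    return (odds_ratio_estimate - 1.0) * 100.0

# Assumed odds ratios, back-computed from the percentages in the answer.
affl_change = percent_change_in_odds(1.283)   # about +28.3 (odds increase)
age_change = percent_change_in_odds(0.947)    # about -5.3 (odds decrease)

# A coefficient of 0 gives an odds ratio of 1, i.e. no effect on the odds.
no_effect = odds_ratio(0.0)
```

Reading the sign this way makes it harder to mix up which group is the reference: an estimate labelled "0 vs 1" describes the odds for level 0 relative to level 1.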

PART 2

a. In preparation for regression, are any transformations of the data warranted? Why or why not?
Answer: Yes. Outliers and highly skewed inputs can have a significant impact on the accuracy of a regression model, so transforming the inputs that have high skewness can improve the overall performance of the model.
   i. Open the Variables window of the Regression node. Select the imputed interval inputs.
   ii. Select Explore. The Explore window appears.

b. Both Card Tenure (PromTime) and Affluence Grade (DemAffl) have moderately skewed distributions. Applying a log transformation to these inputs might improve the model fit.

c. Disconnect the Impute node from the Data Partition node.

d. Add a Transform Variables node from the Modify tab to the diagram and connect it to the Data Partition node.

e. Connect the Transform Variables node to the Impute node.

f. Apply a log transformation to the DemAffl and PromTime inputs.
   i. Open the Variables window of the Transform Variables node.
   ii. Select Method → Log for the DemAffl and PromTime inputs. Select OK to close the Variables window.

g. Run the Transform Variables node. Explore the exported training data. Did the transformations result in less skewed distributions?
Answer: Yes, the transformation leads to distributions that are less skewed.
   i. The easiest way to explore the created inputs is to open the Variables window in the subsequent Impute node. Make sure that you update the Impute node before opening its Variables window.
   ii. With the LOG_DemAffl and LOG_PromTime inputs selected, select Explore.
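The effect checked in step g can be illustrated with a small sketch that compares skewness before and after a log transformation. The use of log(x + 1), the population skewness formula, and the toy tenure values are all assumptions made for illustration; they are not taken from the Transform Variables node or the ORGANICS data.

```python
import math

def skewness(values):
    """Population skewness: third central moment / (second central moment)^1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def log_transform(values):
    """log(x + 1), so that a value of 0 stays defined (assumed offset)."""
    return [math.log(v + 1) for v in values]

# Hypothetical right-skewed values standing in for a tenure-like input.
tenure = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]
raw_skew = skewness(tenure)
log_skew = skewness(log_transform(tenure))
```

For right-skewed data like this, `log_skew` comes out smaller than `raw_skew`, because the log compresses the long upper tail more than the lower values.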

The distributions are nicely symmetric.

h. Rerun the Regression node. Do the selected variables change? How about the validation ASE?
Answer: The validation ASE changed from 0.137156 to 0.138204.
   i. Go to line 664 of the Output window.
The log transformation actually increased the validation ASE slightly.

i. Create a full second-degree polynomial model. How does the validation average squared error for the polynomial model compare to the original model?
Answer: The extra terms decrease the validation ASE slightly.
   i. Add another Regression node to the diagram and rename it Polynomial Regression.

   ii. Make the indicated changes to the Polynomial Regression Properties panel and run the node.
   iii. Go to line 1598 of the results output window.

   iv. The Polynomial Regression node adds interaction terms to the model.
   v. Examine the Fit Statistics window.
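To make "full second-degree polynomial" concrete, the sketch below lists the candidate terms for a set of interval inputs: the linear terms, the squares, and every two-way product. The function name is hypothetical, and the three input names are used only as an example; the terms the node actually keeps depend on its settings and on the stepwise selection.

```python
from itertools import combinations_with_replacement

def second_degree_terms(inputs):
    """Linear terms plus all squares and two-way products of the inputs."""
    linear = list(inputs)
    # combinations_with_replacement yields (a, a) squares and (a, b) products,
    # each unordered pair exactly once.
    quadratic = [a + "*" + b
                 for a, b in combinations_with_replacement(inputs, 2)]
    return linear + quadratic

terms = second_degree_terms(["DemAffl", "DemAge", "PromTime"])
```

With three inputs this gives 3 linear terms, 3 squares, and 3 cross-products, so the candidate set grows quickly as inputs are added. That growth is one reason to judge the polynomial model on validation ASE rather than training fit.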

k. In your own words, describe what you did in this assignment and why you had to do each of these steps. Also, how would you describe the IVs that have an impact on the DV?

Type your answer here: Transformation leads to a distribution that is less skewed.

Typically, when completing an assignment, the first step is to carefully read and understand the task requirements. This allows the person completing the assignment to determine what needs to be done and what resources they may need. The next step is to research and gather information relevant to the assignment. This may involve reading books, articles, or other materials, conducting experiments or surveys, or analyzing data.

Once the necessary information has been gathered, the person completing the assignment should organize it and develop a plan for how to present it. This may involve creating an outline or rough draft, developing visual aids such as charts or graphs, or creating a presentation or report. The final step is to review and edit the assignment to ensure that it is complete, accurate, and well written. This may involve checking for errors in spelling or grammar, ensuring that all sources are properly cited, and making sure that the presentation is clear and effective.

The independent variables (IVs) that impact the dependent variable (DV) can vary depending on the specific assignment or research question being studied. Generally, IVs are the factors that are manipulated or controlled in an experiment, while the DV is the variable that is measured or observed as a result of the IVs.
