Group Assignment - Lab 2 Part 1 Decision Tree

School: York University
Course: 746
Subject: Computer Science
Date: Apr 3, 2024
Type: docx
Pages: 12
Uploaded by LieutenantSteel8402
Assignment 3: Decision Tree, SAS Miner
Submitted by: Harsh Dani - 301367879, Akshan Tuteja, Raehle, Ashish Miglani, Japkirat Khurana - 301367976
1. A supermarket is offering a new line of organic products. The supermarket's management wants to determine which customers are likely to purchase these products. The supermarket has a customer loyalty program. As an initial buyer incentive plan, the supermarket provided coupons for the organic products to all of the loyalty program participants and collected data that includes whether these customers purchased any of the organic products.

The ORGANICS data set contains 13 variables and over 22,000 observations. The variables in the data set are shown below with the appropriate roles and levels:

Name             Model Role  Measurement Level  Description
ID               ID          Nominal            Customer loyalty identification number
DemAffl          Input       Interval           Affluence grade on a scale from 1 to 30
DemAge           Input       Interval           Age, in years
DemCluster       Rejected    Nominal            Type of residential neighborhood
DemClusterGroup  Input       Nominal            Neighborhood group
DemGender        Input       Nominal            M = male, F = female, U = unknown
DemRegion        Input       Nominal            Geographic region
DemTVReg         Input       Nominal            Television region
PromClass        Input       Nominal            Loyalty status: tin, silver, gold, or platinum
PromSpend        Input       Interval           Total amount spent
PromTime         Input       Interval           Time as loyalty card member
TargetBuy        Target      Binary             Organics purchased? 1 = Yes, 0 = No
TargetAmt        Rejected    Interval           Number of organic products purchased

Although two target variables are listed, these exercises concentrate on the binary variable TargetBuy.

a. Create a new diagram named Organics.
b. Define the data set AAEM.ORGANICS as a data source for the project.
   1) Set the model roles for the analysis variables as shown above.
   2) Examine the distribution of the target variable. What is the proportion of individuals who purchased organic products?
   3) The variable DemClusterGroup contains collapsed levels of the variable DemCluster. Presume that, based on previous experience, you believe that DemClusterGroup is sufficient for this type of modeling effort. Set the model role for DemCluster to Rejected.
   4) As noted above, only TargetBuy will be used for this analysis and should have a role of Target. Can TargetAmt be used as an input for a model used to predict TargetBuy? Why or why not? Finish the Organics data source definition.
c. Add the AAEM.ORGANICS data source to the Organics diagram workspace. Add a Data Partition node to the diagram and connect it to the Data Source node. Assign 50% of the data for training and 50% for validation.
d. Add a Decision Tree node to the workspace and connect it to the Data Partition node.
e. Create a decision tree model autonomously. Use average square error as the model assessment statistic.
   1) How many leaves are in the optimal tree? Which variable was used for the first split?
   2) What were the competing splits for this first split?
   3) Add a second Decision Tree node to the diagram and connect it to the Data Partition node.
   4) In the Properties panel of the new Decision Tree node, change the maximum number of branches from a node to 3 to enable three-way splits.
   5) Create a decision tree model. Use average square error as the model assessment statistic.
   6) How many leaves are in the optimal tree?
f. Based on average square error, which of the decision tree models appears to be better?
g. Based on your analysis write a report to the supermarket manager how to determine the potential customers for the new line of organic products. Specify the variables (and their various levels) in relation to customer's decision to purchase these products. Do you think the loyalty coupons worked?

Answers:
a. Create a new diagram named Organics.
b. Define the data set AAEM.ORGANICS as a data source for the project.
   1) Set the model roles for the analysis variables as shown.
   2) Examine the distribution of the target variable. What is the proportion of individuals who purchased organic products?
   3) The variable DemClusterGroup contains collapsed levels of the variable DemCluster. Presume that, based on previous experience, you believe that DemClusterGroup is sufficient for this type of modeling effort. Set the model role for DemCluster to Rejected.
   4) As noted above, only TargetBuy will be used for this analysis and should have a role of Target. Can TargetAmt be used as an input for a model used to predict TargetBuy? Why or why not?
      No. TargetBuy is the outcome this analysis predicts, and TargetAmt is only recorded as a result of a purchase, so it becomes known only once TargetBuy is known. Using TargetAmt as an input would therefore leak the outcome into the model and bias it, since non-purchasers have no organic purchases to count.
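The point about TargetAmt can be illustrated with a minimal Python sketch. The records below are made up for illustration and are not taken from the ORGANICS data set, the helper name `predict_from_amount` is hypothetical, and the sketch assumes that TargetAmt is 0 whenever TargetBuy is 0. Under that assumption, a rule that looks only at TargetAmt reproduces TargetBuy exactly, which is why it cannot serve as a fair input.

```python
# Hypothetical records, not taken from the ORGANICS data set.
# Assumption: TargetAmt is 0 whenever TargetBuy is 0.
records = [
    {"DemAge": 34, "TargetBuy": 1, "TargetAmt": 2},
    {"DemAge": 61, "TargetBuy": 0, "TargetAmt": 0},
    {"DemAge": 45, "TargetBuy": 1, "TargetAmt": 1},
    {"DemAge": 58, "TargetBuy": 0, "TargetAmt": 0},
]


def predict_from_amount(record):
    """Predict a purchase using only the leaked variable TargetAmt."""
    return 1 if record["TargetAmt"] > 0 else 0


# Share of records where the leaked rule matches the real outcome.
matches = sum(predict_from_amount(r) == r["TargetBuy"] for r in records)
accuracy = matches / len(records)
print(accuracy)
```

A model given this variable would look perfect on historical data but would be useless for new customers, because the amount purchased is not known before the purchase decision.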
c. Add the AAEM.ORGANICS data source to the Organics diagram workspace. Add a Data Partition node to the diagram and connect it to the Data Source node. Assign 50% of the data for training and 50% for validation.
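For readers without access to SAS Miner, the 50/50 partition can be sketched in plain Python. This is only an approximation of what a Data Partition node does: the function name `partition`, the seed, and the choice to stratify on the target are assumptions made for this example, and the rows are synthetic.

```python
import random


def partition(rows, target_key, train_fraction=0.5, seed=12345):
    """Split rows into training and validation sets, stratified by the target."""
    rng = random.Random(seed)
    # Group rows by target level so both sets keep a similar purchase rate.
    by_level = {}
    for row in rows:
        by_level.setdefault(row[target_key], []).append(row)
    train, validate = [], []
    for level_rows in by_level.values():
        shuffled = level_rows[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        validate.extend(shuffled[cut:])
    return train, validate


# Synthetic rows: every fourth customer is marked as a buyer.
rows = [{"ID": i, "TargetBuy": 1 if i % 4 == 0 else 0} for i in range(200)]
train, validate = partition(rows, "TargetBuy")
print(len(train), len(validate))
```

Stratifying keeps the share of buyers the same in both halves, so the validation set gives a fair check of a tree grown on the training set.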
d. Add a Decision Tree node to the workspace and connect it to the Data Partition node.
e. Create a decision tree model autonomously. Use average square error as the model assessment statistic.
   1) How many leaves are in the optimal tree? Which variable was used for the first split?
   2) What were the competing splits for this first split?
   3) Add a second Decision Tree node to the diagram and connect it to the Data Partition node.
   4) In the Properties panel of the new Decision Tree node, change the maximum number of branches from a node to 3 to enable three-way splits.
   5) Create a decision tree model. Use average square error as the model assessment statistic.
   6) How many leaves are in the optimal tree?
f. Based on average square error, which of the decision tree models appears to be better?
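The comparison in part f can be sketched in Python. Average square error is taken here to be the mean squared difference between the actual 0/1 outcome and the predicted purchase probability, which is an assumption about how the statistic is computed for a binary target. The outcomes and probabilities below are hypothetical and do not come from the assignment's models.

```python
def average_squared_error(actual, predicted):
    """Mean of (actual outcome - predicted probability)^2 over all cases."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical validation outcomes and predicted purchase probabilities.
actual = [1, 0, 0, 1, 0]
tree_two_way = [0.6, 0.2, 0.3, 0.5, 0.1]
tree_three_way = [0.7, 0.2, 0.2, 0.6, 0.1]

ase_two = average_squared_error(actual, tree_two_way)
ase_three = average_squared_error(actual, tree_three_way)

# The model with the lower validation error is preferred.
better = "three-way" if ase_three < ase_two else "two-way"
print(better)
```

The error should be computed on the validation data for both trees, so that a tree is not rewarded simply for memorising the training data.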
g. Based on the analysis, write a report to the supermarket manager on how to determine the potential customers for the new line of organic products. In the report, specify the variables (and their various levels) in relation to the customer's decision to purchase these products, and explain how the loyalty coupons would work. You need to state which decision tree you are using as the final decision model and specify how pragmatic the model is. Make sure that your report has an introduction, a discussion of your analysis, a conclusion and at least 3 strategies based on the analysis. I am expecting a report that a non-data analyst would understand as well. This report holds the most points in this assignment.

Introduction:
This report analyses loyalty programme information and the initial coupon distribution to estimate how likely customers are to buy our new organic goods. A Decision Tree model built with the SAS Miner tool was used for the analysis, which considered several spending and demographic variables.

Analysis Discussion:
Factors Influencing the Decision to Buy:
1. Affluence (DemAffl): A measure of a customer's wealth, graded from 1 to 30.
2. Age (DemAge): The customer's age in years, and the variable used for the decision tree's first split.
3. Neighborhood (DemClusterGroup): The grouped type of residential neighborhood.
4. Gender (DemGender): F (female), M (male), or U (unknown).
5. Region (DemRegion): The customer's geographic region.
6. TV Region (DemTVReg): The customer's television region.
7. Loyalty Status (PromClass): Status categories: tin, silver, gold, or platinum.
8. Total Spend (PromSpend): The total amount the customer has spent.
9. Membership Time (PromTime): The length of time the customer has been a loyalty card member.

The efficiency of loyalty vouchers:
All participants of the loyalty programme received coupons, and sales of organic goods were monitored. Following the distribution of coupons, 24.77% of participants bought organic products.
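For readers who want to see how a purchase rate like this is calculated, a minimal Python sketch follows. The list of outcomes is hypothetical and much smaller than the real data, and `target_distribution` is a helper name chosen for this example.

```python
from collections import Counter


def target_distribution(values):
    """Return the share of cases at each level of the target variable."""
    counts = Counter(values)
    total = len(values)
    return {level: count / total for level, count in counts.items()}


# Hypothetical TargetBuy values: 1 = bought organics, 0 = did not.
target_buy = [1, 0, 0, 0, 1, 0, 0, 0]
dist = target_distribution(target_buy)
print(f"{dist[1]:.2%}")
```

Applied to the full TargetBuy column, the same calculation gives the proportion of loyalty members who purchased organic products.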
Decision Tree Model:
The first decision tree model used age for its initial split, and its optimal tree contained 29 leaves. A second decision tree model that allowed three-way splits was also assessed. It produced 33 leaves and a marginally lower average square error, suggesting slightly more accurate predictions of consumer purchasing patterns.
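The difference between a two-way and a three-way split can be sketched in Python. Everything in this example is hypothetical: the ages, the outcomes, the cut points (45, and 40 with 50) and the helper name `split_ase` are chosen only for illustration and are not the thresholds or results found by SAS Miner. Each branch predicts its own purchase rate, and the split is scored by average square error.

```python
def split_ase(ages, bought, cut_points):
    """Average square error when each branch predicts its own purchase rate."""
    branches = [[] for _ in range(len(cut_points) + 1)]
    for age, outcome in zip(ages, bought):
        # Branch index = number of cut points the age reaches or exceeds.
        index = sum(age >= cut for cut in cut_points)
        branches[index].append(outcome)
    squared_error = 0.0
    for branch in branches:
        if branch:
            rate = sum(branch) / len(branch)
            squared_error += sum((y - rate) ** 2 for y in branch)
    return squared_error / len(bought)


# Hypothetical customers: age and whether they bought organics.
ages = [22, 28, 35, 41, 47, 53, 60, 68]
bought = [1, 1, 1, 0, 1, 0, 0, 0]

two_way = split_ase(ages, bought, [45])        # under 45, 45 and over
three_way = split_ase(ages, bought, [40, 50])  # under 40, 40 to 49, 50 and over
print(two_way, three_way)
```

In this made-up data the three-way split fits better, but extra branches tend to fit the data they were grown on more closely, which is why the two trees in this report are compared on validation data.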
Conclusion:
The results of the analysis indicate that the chance of buying organic products is strongly influenced by specific demographics, loyalty status, and behavioural characteristics. As a first consumer promotion, the loyalty vouchers performed effectively, with roughly 25% of loyalty programme members buying organic goods.

Strategies:
1. Tailored Advertising Campaigns: Focus advertising campaigns on the groups found to be more inclined to buy organic goods, particularly the relevant age and affluence groups.
2. Enhanced Loyalty Programmes: To increase purchase frequency, give extra benefits for buying organic goods to customers in higher affluence levels and to those who have been loyalty programme members for longer.
3. Product Selection and Location: Increase the variety of organic items and improve in-store positioning for the age groups that the analysis shows buy at higher rates.

The decision tree model employed in this analysis is pragmatic because it uses readily available customer information, offers useful insights, and can be updated when fresh data is obtained. The decision tree model with three-way splits was ultimately selected as the best option because of its improved fit and lower average square error.