Power Assignment 8 - Group 2 (5)

School

Simon Fraser University

Course

462

Subject

Industrial Engineering

Date

Feb 20, 2024

Uploaded by cdcb2901

1) Classification tree: Determine the explanatory variables that best predict the “class” (e.g., low, med, high) of sales.
a) Include a screenshot of your classification tree workflow.
b) Include a screenshot of your best classification tree (only what you can fit on one screen). Walk the reader through the meaning of the first split.
The first split is on the “LTD ASP” variable, the average “life to date” sale price of the game. It is the first branch in this classification tree because it has the best score under the splitting heuristic used by KNIME (the Gini index). In other words, the model found that, within the dataset, the class of LTD Sales differs markedly depending on the game's LTD ASP.
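As a rough illustration of how a Gini-based heuristic scores a candidate split, the sketch below computes the Gini impurity of a parent node and the size-weighted impurity of its two children. The class labels and counts are made-up toy data, not values from the assignment dataset, and the helper names `gini` and `split_gini` are hypothetical; KNIME's internal implementation may differ in detail.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Hypothetical toy labels (NOT the assignment data).
left = ["low"] * 6 + ["med"] * 3 + ["high"] * 1    # e.g. rows below a threshold
right = ["low"] * 1 + ["med"] * 3 + ["high"] * 6   # e.g. rows above a threshold
parent = left + right

# The larger the drop in impurity, the better the split scores.
gain = gini(parent) - split_gini(left, right)
print(f"parent={gini(parent):.3f} split={split_gini(left, right):.3f} gain={gain:.3f}")
```

A tree learner of this kind evaluates many candidate variables and thresholds and keeps the one with the largest impurity reduction at each node.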
To interpret the split: if the LTD ASP is lower than $29,166,700, the probability that LTD Sales falls in the “high” class [8.091, 293.025] is 10.9%. If the LTD ASP is higher than $29,166,700, that probability rises to 45.7%.
c) How accurate is your classification model overall? Include your confusion matrix.
The classification model has an overall accuracy of 61.431%.
d) What is its “recall” (or sensitivity) with respect to the highest revenue class? That is, how good is your model at correctly predicting good games?
True Positives (TP) = 119 (games in the highest revenue class that were correctly predicted as such)
False Negatives (FN) = 54 (games in the highest revenue class that were incorrectly predicted as another class)
Recall = TP / (TP + FN) = 119 / (119 + 54) = 119 / 173 ≈ 0.6879, or 68.79%
So the model has a recall of approximately 68.79% for the highest revenue class [8.091, 293.025]. It correctly identifies roughly two out of three games that actually belong to this class, which makes it relatively good at predicting good games.
2) Regression tree: Use a similar process to create a regression tree.
a) Include a screenshot of your regression tree workflow and result.
Result:
b) What is the R² of your regression tree?
The R² of this regression tree is 33.4%.
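For reference, R² is commonly defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below applies that standard definition to made-up numbers; the function name `r_squared` and the data are hypothetical, and it is an assumption that the value reported above was computed in exactly this way.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy values (NOT the assignment data).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.3f}")
```

Under this definition, an R² of 33.4% would mean the tree's predictions account for about a third of the variance in observed LTD Sales.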
c) Plot predicted versus observed values.
X-axis: predicted LTD Sales
Y-axis: observed LTD Sales
d) Walk the reader through the meaning of the first split.
The first split is on the “TotalPriorSalesBrand” variable. If TotalPriorSalesBrand is less than $64,266,100, the mean LTD Sales is $9,479,400; 722 training examples (rows of data) fall in this node. If TotalPriorSalesBrand is more than $64,266,100, the mean LTD Sales is $72,671,100; 31 training examples fall in this node.
e) Which tool do you prefer for this task: classification trees or regression trees? Why?
We prefer regression trees for this task because the data, including the target LTD Sales, consists primarily of continuous values. Binning the target into classes is not practical, especially for the high revenue class, whose bin spans a very wide range, from about 8 million to 293 million dollars.
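The first regression split described in 2d can be read as a simple rule that predicts the node mean for each branch. The sketch below encodes only the numbers quoted in that answer; the function name `predict_first_split` is hypothetical, the handling of a value exactly equal to the threshold is an assumption, and treating the two nodes as a complete partition of the training rows (722 + 31 = 753) is also an assumption.

```python
# Values quoted in the answer to 2d (dollars and row counts).
THRESHOLD = 64_266_100                    # split point on TotalPriorSalesBrand
LEFT_MEAN, LEFT_N = 9_479_400, 722        # node below the threshold
RIGHT_MEAN, RIGHT_N = 72_671_100, 31      # node above the threshold

def predict_first_split(total_prior_sales_brand):
    """Predict LTD Sales using only the first split (a depth-1 tree).

    Assumption: a value exactly at the threshold goes to the right node.
    """
    if total_prior_sales_brand < THRESHOLD:
        return LEFT_MEAN
    return RIGHT_MEAN

# Size-weighted mean of the two nodes, i.e. the implied mean at the root,
# assuming the two nodes together cover all training rows.
root_mean = (LEFT_MEAN * LEFT_N + RIGHT_MEAN * RIGHT_N) / (LEFT_N + RIGHT_N)

print(predict_first_split(10_000_000))   # falls in the low-prior-sales node
print(predict_first_split(100_000_000))  # falls in the high-prior-sales node
print(round(root_mean))
```

Because each node predicts a dollar amount rather than a class label, the rule keeps the distinction between, say, a 10 million and a 200 million dollar game that a single “high” bin would hide.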