BDAT1007_Assignment 3_Group 2

School

Georgian College

Course

1007

Subject

Industrial Engineering

Date

Feb 20, 2024

BDAT 1007: SOCIAL DATA MINING TECHNIQUES
Assignment 3 – Group Work
March 24, 2023
Elysse Joy Angelica Pascual, Iman Shokri, Haben Iyob, Suryadevsinh Zala
Georgian College

Group 2
Topic: Technology
Data: Technology - Kaggle
Dataset Link: https://

Part A – Decision Tree Analysis - Iman Shokri

Attributes (Display Size, Graphic Card, Original Price, OS, Star Rating)

The first step was to load the input data with the retrieve operator, followed by the select attributes operator to keep only the required attributes. The set role operator was then used to mark the target variable. Next, a cross-validation operator was added to split the data into training and testing sets for the rest of the process. The cross-validation operator opens a split screen in which the training and testing operators are defined. The decision tree operator was given a maximum depth of 5. The apply model operator then applied the trained model, and a performance operator, the last one added, evaluated the results.
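The workflow above (split into folds, train a depth-limited tree, apply it, measure performance) can be sketched in plain Python. This is only an illustration under stated assumptions, not the group's actual process: the tree is a small hand-written Gini-based classifier rather than the tool's decision tree operator, the data is synthetic, and the two numeric features and the "high"/"low" label rule are hypothetical stand-ins for the real attributes.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=5):
    """Grow the tree recursively; a leaf is simply the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def cross_validate(rows, labels, k=5, max_depth=5, seed=0):
    """k-fold cross-validation: train on k-1 folds, test on the held-out fold."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tree = build_tree([rows[i] for i in train], [labels[i] for i in train],
                          max_depth=max_depth)
        hits = sum(predict(tree, rows[i]) == labels[i] for i in fold)
        accuracies.append(hits / len(fold))
    return accuracies

# Synthetic, hypothetical data: (display size, original price) per laptop.
rng = random.Random(1)
rows = [(rng.uniform(13.0, 17.0), rng.uniform(20000, 90000)) for _ in range(60)]
# Hypothetical target: a made-up rule standing in for a binned star rating.
labels = ["high" if price > 50000 else "low" for _, price in rows]

scores = cross_validate(rows, labels, k=5, max_depth=5)
print("fold accuracies:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

Averaging the per-fold accuracies gives a single performance figure that does not depend on one particular train/test split, which is the main reason for using cross-validation here.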

Part B – Logistic Regression Analysis - Iman Shokri

First, a retrieve operator reads the input data, followed by a select attributes operator that keeps the dependent (target) variable and the independent variables. The third operator, numerical to binomial, converts the target variable from a numerical value to a binomial (two-class) value. The set role operator then marks the target variable, and a logistic regression operator takes the training dataset and generates the prediction model. Next, the apply model operator expects two inputs: the model output by the logistic regression operator and the example set output by that same operator. Finally, a performance operator evaluates the model.
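A rough plain-Python sketch of the same sequence of steps is shown below. It is an illustration under assumptions, not the operators themselves: the logistic regression is fitted with simple batch gradient descent, the feature values and ratings are made up, and the threshold of 4.0 used to convert the numeric target to two classes is hypothetical. As in the description, the model is applied to the same example set it was trained on, so the resulting accuracy is a training-set figure.

```python
import math

def to_binomial(values, threshold):
    """Convert a numeric target to 0/1: 1 when the value exceeds the threshold."""
    return [1 if v > threshold else 0 for v in values]

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def apply_model(model, rows, cutoff=0.5):
    """Predict class 1 when the estimated probability reaches the cutoff."""
    w, b = model
    return [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= cutoff else 0
            for x in rows]

# Hypothetical, already-scaled features (e.g. display size and original price).
rows = [(-1.0, -1.2), (-0.5, -0.8), (0.0, -0.5), (-1.5, -0.3),
        (0.5, 0.4), (1.0, 0.9), (0.2, 1.1), (1.5, 0.6)]
# Hypothetical numeric target (e.g. a star rating).
ratings = [3.1, 3.4, 3.6, 3.8, 4.2, 4.5, 4.3, 4.7]

labels = to_binomial(ratings, threshold=4.0)   # numerical to binomial
model = fit_logistic(rows, labels)             # logistic regression
predictions = apply_model(model, rows)         # apply model
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print("labels:     ", labels)
print("predictions:", predictions)
print("training accuracy:", accuracy)          # performance
```

Scoring a separate held-out set instead of the training rows would give a more realistic estimate of how the model behaves on new data.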

Part A – Decision Tree Analysis - Haben Iyob

Part B – Logistic Regression Analysis - Haben Iyob

Part A – Decision Tree Analysis - Suryadevsinh Zala

Part B – Logistic Regression Analysis - Suryadevsinh Zala

Part A – Decision Tree Analysis - Elysse Joy Angelica Pascual

Part B – Logistic Regression Analysis - Elysse Joy Angelica Pascual

PART C: When to adopt Logistic Regression Analysis Technique – Group Work

Binary classification: Logistic regression is often used for binary classification tasks where the dependent variable has two possible values, such as 0 or 1. For example, you might use logistic regression to predict whether a customer will churn based on their demographic information and transaction history.

Multiclass classification: Logistic regression can also be used for multiclass classification tasks where the dependent variable has more than two possible values. In this case, you would use a variant of logistic regression called multinomial logistic regression.

Linear decision boundary: Logistic regression assumes a linear decision boundary between the classes, which means that it works well when the classes can be separated by a straight line or plane. If the decision boundary is more complex, you might consider using a more advanced technique such as decision trees or support vector machines.

Large datasets: Logistic regression can handle large datasets efficiently, making it a good choice when you have a lot of data to work with.

When it comes to visualization, there are several charts and diagrams that you can use to understand and interpret the results of logistic regression:

Confusion matrix: A confusion matrix is a table that shows the number of true positives, false positives, true negatives, and false negatives for a given set of predictions. From this matrix, you can calculate metrics such as accuracy, precision, recall, and F1 score, which can help you evaluate the performance of the logistic regression model.

ROC curve: An ROC (receiver operating characteristic) curve is a plot of true positive rate (TPR) against false positive rate (FPR) for different classification thresholds. The ROC curve helps you evaluate the tradeoff between sensitivity (true positive rate) and specificity (true negative rate) for different thresholds.

Precision-recall curve: A precision-recall curve is a plot of precision (positive predictive value) against recall (true positive rate) for different classification thresholds. The precision-recall curve helps you evaluate the tradeoff between precision and recall for different thresholds.

Scatter plot: A scatter plot can be used to visualize the relationship between two independent variables and the dependent variable. For example, you might use a scatter plot to see how customer age and income relate to the likelihood of churn.

Histogram: A histogram can be used to visualize the distribution of a single variable. For example, you might use a histogram to see how transaction amount is distributed among customers who churned and those who did not.

By using these visualization techniques, you can gain insights into the relationships between variables, evaluate the performance of the logistic regression model, and identify areas for improvement.
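The confusion matrix metrics and the ROC points described above can be computed by hand, as in the following sketch. The true labels and predicted probabilities are invented for illustration, and the helper names are hypothetical; the resulting (FPR, TPR) pairs could be passed to any plotting library to draw the curve.

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 derived from the four counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_points(y_true, scores):
    """(FPR, TPR) pairs, sweeping the threshold from strictest to loosest."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # a threshold above every score predicts no positives
    for thr in sorted(set(scores), reverse=True):
        pred = [1 if s >= thr else 0 for s in scores]
        tp, fp, tn, fn = confusion_matrix(y_true, pred)
        points.append((fp / neg, tp / pos))
    return points

# Invented example: true classes and predicted probabilities of class 1.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.55, 0.2]

# Confusion matrix and metrics at a 0.5 classification threshold.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
matrix = confusion_matrix(y_true, y_pred)
report = metrics(*matrix)
roc = roc_points(y_true, scores)

print("TP, FP, TN, FN:", matrix)
print(report)
print("ROC points:", roc)
```

A precision-recall curve can be built the same way by recording precision and recall, instead of FPR and TPR, at each threshold.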
