BDAT 1007: SOCIAL DATA MINING TECHNIQUES
Assignment 3 – Group Work
March 24, 2023
Elysse Joy Angelica Pascual, Iman Shokri, Haben Iyob, Suryadevsinh Zala
Georgian College
Group 2 Topic: Technology
Data: Technology - Kaggle
Dataset Link: https://
Part A – Decision Tree Analysis - Iman Shokri
Attributes (Display Size, Graphic Card, Original Price, OS, Star Rating)
The first step was to load the input data with the retrieve operator, followed by the select attributes
operator to keep the required attributes. The set role operator was then used to set the target
variable. Next, a cross-validation operator was added to split the data into training and testing sets.
The cross-validation operator opens into a split view in which the training and testing operators are
defined. The maximum depth of the decision tree operator was set to 5. The apply model operator then
applied the trained model to the testing data, and the performance operator, the last one added,
evaluated the model's performance.
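The splitting done by the cross-validation operator can be illustrated outside RapidMiner. The following is a minimal Python sketch, not the process used in this assignment: the function name k_fold_indices, the fold count of 5, and the 12-row toy dataset are all illustrative assumptions. It shows only how the rows are partitioned so that every row is used for testing exactly once. This sketch uses contiguous folds without shuffling, whereas RapidMiner's operator may shuffle or stratify rows depending on its sampling setting.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical dataset of 12 rows split into 5 folds.
folds = list(k_fold_indices(n_rows=12, k=5))
for fold_number, (train, test) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train on {len(train)} rows, test on rows {test}")
```

In each fold, a model (here, a decision tree with maximum depth 5) would be trained on the training rows and evaluated on the test rows, and the per-fold performance values would then be averaged.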
Part B – Logistic Regression Analysis - Iman Shokri
First, a retrieve operator reads the input data, followed by a select attributes operator that selects
the dependent (target) variable and the independent variables. The third operator, numerical to
binomial, converts the target variable from numerical to binomial. The set role operator then sets the
target variable, and a logistic regression operator takes the training dataset and generates the
prediction model. The next operator, apply model, expects two inputs: the model from the output of the
logistic regression operator, and the example set passed through that same operator. Finally, a
performance operator evaluates the performance of the model.
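The steps above (convert the target to two classes, fit the model, apply it, measure performance) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the RapidMiner process itself: the toy feature values, the numeric ratings, the cut-off of 4.0, and the helper names to_binomial, fit_logistic, and predict are all hypothetical, and the fitting method (batch gradient descent on one feature) is chosen for brevity rather than taken from the source.

```python
import math


def to_binomial(value, threshold):
    """Map a numeric target to 1 (at or above the threshold) or 0 (below it)."""
    return 1 if value >= threshold else 0


def sigmoid(z):
    """Logistic function: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight and bias for one feature by batch gradient descent on log-loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


def predict(weight, bias, x):
    """Apply the fitted model: class 1 when the probability is at least 0.5."""
    return 1 if sigmoid(weight * x + bias) >= 0.5 else 0


# Hypothetical toy data: one standardized feature and a numeric rating.
features = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ratings = [3.1, 3.4, 3.6, 3.9, 4.2, 4.4, 4.6, 4.8]

labels = [to_binomial(r, threshold=4.0) for r in ratings]      # numerical to binomial
weight, bias = fit_logistic(features, labels)                  # logistic regression
predictions = [predict(weight, bias, x) for x in features]     # apply model
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)  # performance
print(f"accuracy on the toy data: {accuracy:.2f}")
```

Note that this sketch, like the process described, applies the model to the same rows it was trained on, so the reported accuracy is likely to be optimistic compared with accuracy on unseen data.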
Part A – Decision Tree Analysis - Haben Iyob
Part B – Logistic Regression Analysis - Haben Iyob
Part A – Decision Tree Analysis - Suryadevsinh Zala
Part B – Logistic Regression Analysis - Suryadevsinh Zala
Part A – Decision Tree Analysis - Elysse Joy Angelica Pascual
Part B – Logistic Regression Analysis - Elysse Joy Angelica Pascual
PART C: When to adopt Logistic Regression Analysis Technique – Group Work
Binary classification: Logistic regression is often used for binary classification tasks where the
dependent variable has two possible values, such as 0 or 1. For example, you might use logistic
regression to predict whether a customer will churn based on their demographic information and
transaction history.
Multiclass classification: Logistic regression can also be used for multiclass classification tasks
where the dependent variable has more than two possible values. In this case, you would use a
variant of logistic regression called multinomial logistic regression.
Linear decision boundary: Logistic regression assumes a linear decision boundary between the
classes, which means that it works well when the classes can be separated by a straight line,
plane, or (with more than three variables) a hyperplane. If the decision boundary is more complex,
you might consider a more flexible technique such as decision trees or support vector machines.
Large datasets: Logistic regression is computationally inexpensive to train and to apply, so it
remains a practical choice when you have a lot of data to work with.
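The binary, multiclass, and linear-boundary points above can be made concrete with a short Python sketch. The coefficients and class scores below are hypothetical values chosen for illustration, not results from this assignment. The sketch shows the logistic (sigmoid) function used in the binary case, the softmax function used in multinomial logistic regression, and the fact that the decision boundary sits where the linear score equals zero.

```python
import math


def sigmoid(z):
    """Binary logistic function: probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Multinomial generalization: one probability per class, summing to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Binary case: with one feature, the decision boundary is the point where
# weight * x + bias = 0, which is where the predicted probability is 0.5.
weight, bias = 2.0, -1.0  # hypothetical fitted coefficients
boundary = -bias / weight
print(boundary, sigmoid(weight * boundary + bias))

# Multiclass case: hypothetical linear scores for three classes.
probabilities = softmax([1.0, 2.0, 0.5])
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)

# Softmax over the two scores [z, 0] reproduces the binary sigmoid of z.
print(abs(softmax([0.7, 0.0])[0] - sigmoid(0.7)) < 1e-12)
```

Because the score is a linear function of the inputs, the set of points where it equals zero is a point, line, or plane depending on the number of variables, which is why the boundary is described as linear.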
When it comes to visualization, there are several charts and diagrams that you can use to
understand and interpret the results of logistic regression:
Confusion matrix: A confusion matrix is a table that shows the number of true positives, false
positives, true negatives, and false negatives for a given set of predictions. From this matrix, you
can calculate metrics such as accuracy, precision, recall, and F1 score, which can help you
evaluate the performance of the logistic regression model.
ROC curve: An ROC (receiver operating characteristic) curve is a plot of true positive rate
(TPR) against false positive rate (FPR) for different classification thresholds. Because FPR
equals 1 minus specificity (the true negative rate), the ROC curve helps you evaluate the
tradeoff between sensitivity (true positive rate) and specificity for different thresholds.
Precision-recall curve: A precision-recall curve is a plot of precision (positive predictive value)
against recall (true positive rate) for different classification thresholds. The precision-recall curve
helps you evaluate the tradeoff between precision and recall for different thresholds.
Scatter plot: A scatter plot can be used to visualize the relationship between two independent
variables and the dependent variable, with each point colored by its class. For example, you
might plot customer age against income and color the points by whether the customer churned.
Histogram: A histogram can be used to visualize the distribution of a single variable. For
example, you might use a histogram to see how transaction amount is distributed among
customers who churned and those who did not.
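The confusion matrix, ROC curve, and precision-recall curve described above are all derived from the same four counts. The following minimal Python sketch shows the arithmetic; the labels and predicted probabilities are hypothetical, and the helper names confusion_counts, safe_ratio, and metrics_at are illustrative rather than part of any library.

```python
def confusion_counts(labels, predictions):
    """Return (tp, fp, tn, fn) for binary labels coded as 0 and 1."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Return 0.0 instead of failing when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def metrics_at(labels, scores, threshold):
    """Classify at the given threshold and compute the standard metrics."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(labels, predictions)
    precision = safe_ratio(tp, tp + fp)
    recall = safe_ratio(tp, tp + fn)  # true positive rate (sensitivity)
    return {
        "accuracy": safe_ratio(tp + tn, len(labels)),
        "precision": precision,
        "recall": recall,
        "f1": safe_ratio(2 * precision * recall, precision + recall),
        "fpr": safe_ratio(fp, fp + tn),  # false positive rate = 1 - specificity
    }


# Hypothetical true labels and predicted probabilities of the positive class.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

for threshold in (0.25, 0.5, 0.75):
    print(threshold, metrics_at(labels, scores, threshold))
```

Each threshold yields one (fpr, recall) point on the ROC curve and one (recall, precision) point on the precision-recall curve; sweeping the threshold over all distinct scores traces out the full curves.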
By using these visualization techniques, you can gain insights into the relationships between
variables, evaluate the performance of the logistic regression model, and identify areas for
improvement.