Credit Card Fraud Detection
Student’s Name:
Student’s ID:
Date:
University Name:
Abstract
Over the past few years, the frequency of credit card fraud across the globe has been increasing day by day due to emerging technologies. An incident of credit card fraud can directly lead to huge financial and reputational losses. This research on the topic "Credit Card Fraud Detection" focuses on understanding credit card fraud and providing insights into how credit card issuers can continue to improve their security measures to prevent fraudulent use in an efficient way.
Different factors are associated with the topic, such as the most effective machine learning algorithms for fraud detection, the most predictive features of a credit card transaction, ethical and effective fraud detection strategies, and the limitations of machine learning algorithms in credit card fraud detection. In this study, a credit card fraud detection dataset is selected. The selected dataset is imbalanced, containing 492 fraudulent transactions out of a total of 284,807 transactions; the positive class accounts for 0.172% of all transactions.
This study aimed to gather all the relevant information with the help of the secondary data collection method, in which a range of secondary sources has been utilized. The paper also delivers a recommendation section, which plays a major role in helping the study identify effective and innovative strategies associated with the selected topic. The logistic regression, random forest, KNN, SVC and decision tree models achieve significant precision, recall and F1 scores. Based on the requirements, demands and accuracy of the models, logistic regression has been declared the best algorithm for the classification task. For this study, the Gaussian NB model is proposed; the proposed model achieves significant accuracy as well as recall compared to the other models.
Table of Contents
Chapter 1: Introduction
1.1 Research Background
1.2 Research Rationale
1.3 Research Problem
1.4 Research Aim and Objectives
1.5 Research Question
1.6 Research Scope
Chapter 2: Literature Review
2.1 Overview
2.2 Theoretical Framework
2.2.1 Effective Machine Learning Algorithms for Fraud Detection
2.2.2 Most Predictive Features of a Credit Card Transaction
2.2.3 Limitations of Machine Learning Algorithms in Credit Card Fraud Detection
2.2.4 Ethical and Effective Fraud Detection Strategies
2.3 Conceptual Framework
2.4 Literature Gap
2.6 Analysis of the Problem
2.7 Summary
Chapter 3: Research Methodology
3.1 Overview
3.2 Research Philosophy
3.3 Research Approach
3.4 Research Design
3.5 Research Strategy
3.6 Data Collection Method
3.7 Data Analysis Method
3.8 Ethical Considerations
3.9 Research Limitations
3.10 Summary
Chapter 4: Artifact Design and Implementation
4.1 Design
4.2 Evaluation Metric
4.3 Data Splitting
4.4 Implementation
4.5 Result
4.6 Proposed Algorithm
Chapter 5: Critical Evaluation
Chapter 6: Conclusion and Recommendation
6.1 Conclusion
6.2 Recommendation
6.3 Future Scope
References
Appendix
List of Figures
Figure.1: Credit Card Fraud Detection Using Random Forest
Figure.2: Conceptual Framework of "Credit Card Fraud Detection"
Figure.3: Confusion Matrix
Figure.4: Importing Dataset
Figure.5: Data Distribution of Amount and Time
Figure.6: Correlation Matrix
Figure.7: Negative Correlation of Variables
Figure.8: Data Distribution of Variables
Figure.9: Result of Logistic Regression
Figure.10: Result of SVC
Figure.11: Result of Random Forest
Figure.12: Result of Decision Tree
Figure.13: Result of KNN
Figure.14: Comparison of Models with Proposed Model
Figure.15: Result of GaussianNB
Chapter 1: Introduction
1.1 Research Background
Credit card fraud is a particular kind of financial fraud in which an illegal transaction is generated with the help of a stolen or fake credit card. It is a widespread problem that has been afflicting the financial industry for many years. Fraudulent transactions lead to major financial losses for both businesses and customers and erode trust in the whole financial system. Over the past few years, there has been a major increase in the number of credit card transactions, and as a result, the number of fraudulent transactions has also risen significantly. This has led to growing interest in credit card fraud detection, which focuses on developing effective ways of identifying and preventing fraudulent transactions (Ileberi, Sun and Wang, 2022). With the development of different kinds of machine learning algorithms, researchers have started to explore their effectiveness in detecting credit card fraud. Machine learning algorithms are generally trained on historical data in order to recognize patterns and generate predictions about new transactions. The most commonly used machine learning algorithms for credit card fraud detection are neural networks, decision trees, and support vector machines.
1.2 Research Rationale
The idea of detecting credit card fraud by utilizing the effectiveness of machine learning
algorithms can be considered critical in the process of making sure of the financial security of
both businesses and individuals. Fraudulent credit card transactions can directly lead to major
financial losses, which can specifically have long-term effects on the credit card holders as well
as the credit card issuer. In addition, fraud can also be considered very effective in damaging the
reputation and business value of financial institutions and trimming down the confidence of the
customers in electronic payment systems (Bin Sulaiman, Schetinin and Sant, 2022). On the other
side, the development of an effective credit card fraud detection system has major economic implications. By stopping fraudulent transactions, businesses and financial institutions can directly reduce their losses and boost profits. In addition, reducing the frequency of credit card fraud can play a vital role in
improving the confidence of customers in the system of electronic payment, resulting in
amplified acceptance of cashless transactions.
1.3 Research Problem
Credit card fraud can be considered a major issue in the whole financial industry, which plays a
major role in leading to considerable financial losses to both the customers and the merchants.
One of the most concerning and significant challenges associated with credit card fraud detection
is the steady development of new and increasingly sophisticated fraudulent techniques by criminals. Fraudsters across the world use sophisticated techniques to cover their activities, which makes it more difficult for traditional fraud detection techniques to keep up (Alharbi et al., 2022). Another major challenge in credit card fraud detection is the high rate of false positives generated by fraud detection algorithms. False positives occur when a legitimate transaction is flagged as fraudulent, leading to frustration and problems for consumers. False positives can also directly lead to lost revenue for the merchant, as consumers may abandon their purchases if their transactions are declined. On the other hand, detecting credit card fraud using machine learning algorithms can also result in different kinds of issues. Machine learning algorithms require a vast amount of data in order to detect credit card fraud accurately (Bin Sulaiman, Schetinin and Sant, 2022). It can be complicated and difficult for organizations to deliver adequate, high-quality data to the machine learning model, which can directly lead to inaccurate outcomes.
1.4 Research Aim and Objectives
The aim of this research is to contribute to the understanding of credit card fraud and to provide
insights into how credit card issuers can continue to improve their security measures to prevent
fraudulent use. Based on this specific aim, the research paper has also formed some effectual and
successful research objectives, which will help the study to develop an effective theory of the
research topic.
To identify the most effective machine learning algorithms for fraud detection.
To determine the most predictive features of a credit card transaction.
To develop ethical and effective fraud detection strategies.
To explore the limitations of “machine learning algorithms in credit card fraud detection”.
1.5 Research Question
The study has also addressed some research questions, which will help the study address all its
research objectives in a more efficient and successful way. The research questions are:
How can the research identify the most effective machine learning algorithms for fraud detection?
How can the most predictive features of a credit card transaction be determined?
How can ethical and effective fraud detection strategies be developed?
How can this research explore the limitations of machine learning algorithms in credit card fraud detection?
1.6 Research Scope
The purpose of this particular research is to build up a successful system of credit card fraud
detection, which is capable of identifying and preventing fraudulent transactions. This study will
play a major role in identifying accurate machine-learning algorithms in the process of detecting
credit card fraud. The research will also focus on determining the most extrapolative features of
the credit card transaction (Saheed, Baba and Raji, 2022). This study will also help in identifying
different limitations of credit card fraud detection using machine learning algorithms, which can
be considered very effective in helping the financial institution to build up effective strategies.
Chapter 2: Literature Review
2.1 Overview
According to Ileberi, Sun and Wang, (2022) credit card fraud can be considered an important
concern for both financial institutions and consumers. Different kinds of fraudulent activities can
directly lead to significant legal disputes, financial losses, and reputational damage. Credit card
companies focus on employing a variety of methods in order to detect and prevent different kinds
of fraud. These methods specifically include machine learning algorithms, real-time monitoring
of card transactions for suspicious patterns, recognizing irregular spending behaviors of the
customers, and fixing limits on card usage in order to prevent unusual purchases. Another
significant and common technique is to utilize the effectiveness of data analytics for identifying
potential fraud patterns. On the other side, Abdulghani, Uçan and Alheeti, (2021) stated that
credit card organizations also provide guidance and education to their customers in order to help
them guard their financial and personal information. By the process of employing all these
strategies, credit card companies can better defend their consumers from different kinds of fraud
while reducing the chances of experiencing reputational damage and financial losses. This section of the literature review focuses on discussing different factors associated with the topic "credit card fraud detection using Machine Learning" through multiple existing theories.

2.2 Theoretical Framework
2.2.1 Effective Machine Learning Algorithms for Fraud Detection
Fraud detection can be considered one of the most critical applications of machine learning.
Based on the theory by Naveen and Diwan, (2020), machine learning algorithms are increasingly being utilized for detecting fraud because they are capable of analyzing huge amounts of data accurately and quickly, making them suitable for detecting any kind of fraudulent behavior. With the help of machine learning algorithms, it is possible to identify patterns in data that may be indicative of fraud, such as
anomalies in the behavior of the users and unusual transactions. On the other side, machine
learning algorithms can also be utilized in order to generate predictive models that are able to
identify and sense the probability of fraud occurring according to historical data. Based on the
viewpoint of Dang et al., (2021), real-time monitoring is another major area that is significantly
focused on by machine learning algorithms in the period of detecting fraud. The usefulness of
machine learning algorithms can be utilized in the process of monitoring transactions in real time
and flagging any doubtful activity as it takes place. Machine learning algorithms can be trained
to decrease or eliminate the number of false positives, which can directly help in minimizing the
number of legitimate transactions that are incorrectly flagged as deceitful. Machine learning
algorithms focus on continuously learning from new data, which performs a major role in
allowing them to improve and adapt over time. There are many effective algorithms that are widely utilized in the process of detecting fraud. Some of these machine learning algorithms are Logistic Regression, Decision Trees, Random Forests, Support Vector Machines, and Neural Networks.
Decision Trees
Saheed, Baba and Raji, (2022) stated that decision trees can be considered one of the most
popular machine learning algorithms for fraud detection as they are very easy to interpret and
understand. Decision trees work by splitting the data into smaller subsets according to a variety of features and generating a prediction based on the final subset. Every subset corresponds to a node in the tree, and the final subsets are known as the leaves of the tree. The decision tree algorithm works by selecting the predictor variable that best splits the data at every node. The splitting criterion can be based on a range of measures, such as entropy, information gain, or the Gini index. Based on the thesis by Ileberi, Sun and Wang, (2022), the main aim is to generate partitions that are as pure as possible, indicating that they hold
mostly one class of the response variable. After constructing the tree, it can generally be utilized
in order to create predictions on new data by following the path from the root to the appropriate leaf node. The prediction at the leaf node is the majority class of the training data that falls inside that partition.
The decision tree is popular for having quite a few beneficial aspects in fraud detection.
According to the statement of Abdulghani, Uçan and Alheeti, (2021), one of the major
advantages is that it is easy to visualize and interpret, making it easy to appreciate the decision-
making procedure of the model. This can also be considered very useful for explaining the model
to non-technical stakeholders or for recognizing particular features that are significant for
detecting fraud. Another major advantage of considering and implementing a decision tree is that
it is capable of handling both continuous and categorical predictor variables, and can
automatically handle and detect the missing data. This makes them a powerful and flexible
algorithm for fraud detection. Ahmad et al., (2023) said that using decision trees for fraud detection also has some major limitations, such as overfitting and being unable to handle changes. Decision trees can be utilized in credit card fraud
detection in order to successfully identify doubtful transactions based on several features such as
location, transaction amount, and time of day. By considering the idea of using decision trees in
the credit card fraud detection process, financial institutions can accurately and quickly identify
different kinds of fraudulent transactions and take suitable action in order to prevent losses.
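To make the discussion concrete, the following is a minimal sketch of how a decision tree classifier might be applied to transaction features of the kind described above (amount, time of day, location flag). It uses scikit-learn's DecisionTreeClassifier with the Gini criterion; the feature names and the tiny table of transactions are hypothetical placeholders, not the dataset used in this study.

```python
# Minimal sketch: a decision tree that flags suspicious transactions from a few
# illustrative features. The data and feature names are hypothetical placeholders.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "amount":           [12.5, 900.0, 25.0, 4300.0, 60.0, 15.0, 2500.0, 33.0],
    "hour_of_day":      [14,   3,     10,   2,      18,   9,    4,      12],
    "foreign_location": [0,    1,     0,    1,      0,    0,    1,      0],
    "is_fraud":         [0,    1,     0,    1,      0,    0,    1,      0],
})

X, y = data.drop(columns="is_fraud"), data["is_fraud"]

# Gini impurity is the default split criterion; a shallow depth keeps the tree interpretable.
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0).fit(X, y)

# The learned rules can be printed and explained to non-technical stakeholders.
print(export_text(tree, feature_names=list(X.columns)))
print(tree.predict(pd.DataFrame([{"amount": 3100.0, "hour_of_day": 3, "foreign_location": 1}])))
```

Printing the learned rules in this way is one concrete illustration of the interpretability advantage discussed above.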
Random Forest
According to the statement of Zioviris, Kolomvatsos and Stamoulis, (2022), Random forests are
one specific kind of ensemble learning method, which can be utilized in fraud detection. Random
forests are generally created by the process of combining numerous decision trees, all trained on
a random subset of the predictor variables and a random subset of the data. The final prediction is typically generated by aggregating the predictions of all the trees. The idea of considering random
forest in fraud detection has a bunch of beneficial aspects. Random forests are capable of
handling both continuous and categorical predictor variables, as well as can automatically handle
missing data. This plays a major role in making them powerful and flexible algorithms for
detecting fraudulent activities. Based on the viewpoint of Ahmad et al., (2023), Random forests
can be considered very effective in reducing the chances of experiencing overfitting, which can
be a concerning problem with decision trees. By the process of constructing numerous trees and
aggregating their forecasts, random forests are able to reduce the variance of the model and improve its generalization performance.
Figure.1: Credit Card Fraud Detection Using Random Forest
(Source: Shukur and Kurnaz, 2019)
One significant limitation of using random forests in fraud detection is that they can be comparatively more expensive to train than decision trees, particularly when the size of the data and the number of trees are large. Dang et al., (2021) identified that in the
case of credit card fraud detection, random forests can be utilized in order to sense or identify
suspicious or doubtful transactions according to the features such as the location, transaction
amount, and time of day. The random forest can also implement extra features such as credit
score or the transaction history of the customer.
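The sketch below illustrates, under the same caveats as the previous example, how a random forest aggregates many randomized trees and how class weighting can be combined with a cross-validated recall estimate; the synthetic features, labels, and parameter values are assumptions for illustration only.

```python
# Minimal sketch: a random forest aggregating many randomized decision trees.
# The synthetic data below stands in for transaction features and is illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                      # placeholder transaction features
y = (rng.random(1000) < 0.02).astype(int)           # ~2% "fraud" labels, illustrative only

# Each of the 200 trees is trained on a bootstrap sample and a random feature subset;
# predictions are aggregated by majority vote, which reduces variance versus a single tree.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                class_weight="balanced", random_state=0)

print(cross_val_score(forest, X, y, cv=5, scoring="recall").mean())
```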
Logistic Regression
According to the thesis of Ileberi, Sun and Wang, (2022), logistic regression can be considered
one of the most significant and widely used machine learning algorithms in the process of fraud
detection. Logistic regression is one kind of statistical method utilized for binary classification
problems, where the response variable is categorical with only two different outcomes, typically represented as 1 or 0. The logistic regression model aims to estimate the probability of an event occurring based on one or more predictor variables. The model uses the logistic function to transform the linear predictor into a probability. The logistic function is an S-shaped curve that
generally ranges between 0 and 1 and is described as follows:
p = 1 / (1 + e^(-z))
where z is the linear predictor, p is the probability of the positive outcome, and e is the mathematical constant approximately equal to 2.718.
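A minimal numerical sketch of this logistic function is shown below; the coefficient values and scaled feature values are invented purely to illustrate how a linear predictor z is mapped to a fraud probability.

```python
# Minimal sketch of the logistic function p = 1 / (1 + e^(-z)) described above.
# The coefficients and feature values are made up for illustration only.
import numpy as np

def logistic(z):
    """Map a linear predictor z to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted coefficients: intercept plus weights for two standardized features.
intercept, w_amount, w_hour = -4.0, 1.8, -0.6
amount_scaled, hour_scaled = 2.5, -1.0          # one transaction, standardized features

z = intercept + w_amount * amount_scaled + w_hour * hour_scaled
print(round(logistic(z), 4))   # estimated probability that the transaction is fraudulent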
Saheed, Baba and Raji, (2022) stated that the logistic regression model estimates the values of the regression coefficients that maximize the likelihood of the observed data. This is generally performed using maximum likelihood estimation, which finds the values of the regression coefficients that maximize the likelihood function. Shukur and Kurnaz, (2019) said that the
effectiveness of logistic regression can generally be utilized for an extensive range of binary
classification problems, such as disease diagnosis, spam filtering, and fraud detection. One of the
beneficial aspects of considering logistic regression is that it is an interpretable model that can be
simply understood by non-experts. Moreover, logistic regression is also capable of handling both
continuous and categorical predictor variables.

Support Vector Machine
According to the viewpoint of Zioviris, Kolomvatsos and Stamoulis, (2022), SVM or support
vector machine can be considered a powerful algorithm, which can be utilized for both multi-
class and binary classification. A support vector machine works by finding the optimal boundary between two classes, and it is helpful for fraud detection as it is capable of handling complex data. SVMs work by finding the
best hyperplane that helps in separating the data into multiple classes. In the case of fraud
detection, SVMs can be successfully trained on historical data in order to learn the patterns of
deceitful behavior and then utilized to classify new transactions as either legitimate or fraudulent.
Esenogho et al., (2022) stated that SVMs are capable of handling high-dimensional data, which
can be helpful in the procedure of fraud detection where there are often a lot of different features
that need to be considered. SVMs are able to handle both non-linear and linear decision
boundaries, making them more versatile and able to capture complicated patterns in the data.
Different kinds of techniques such as kernel functions and regularization are utilized by SVMs in
order to prevent the issue of overfitting and develop the generalization performance of the
model.
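The following sketch shows one plausible way to set up an SVM for such a task with scikit-learn: an RBF-kernel SVC inside a scaling pipeline, since SVMs are sensitive to feature scale. The synthetic data and parameter values are illustrative assumptions, not the configuration used in this study.

```python
# Minimal sketch: an SVM with an RBF kernel inside a scaling pipeline.
# The data is synthetic and purely illustrative of a non-linear decision boundary.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 2.5).astype(int)      # a non-linear "fraud" boundary

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

# The RBF kernel lets the SVM capture non-linear boundaries; C and gamma control
# regularization and kernel width, which helps limit overfitting.
svm = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced"))
svm.fit(X_tr, y_tr)
print(classification_report(y_te, svm.predict(X_te), zero_division=0))
```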
2.2.2 Most Predictive Features of a Credit Card Transaction
Based on the statement of Al-Shabi, (2019), credit card transactions can be considered very
effective in the process of generating a major amount of data that can generally be utilized in
order to identify different kinds of fraudulent activity. The capability of analyzing and predicting
the characteristics of credit card transactions can play a major role in the process of helping
financial institutions in the process of detecting fraud and generate better choices in terms of risk
management. Some of the most predictive features of credit card transactions are mentioned in
the following. The amount of transactions can be considered one of the most critical features
associated with credit card transactions. It performs a major role in the process of representing
the amount of money invested by a cardholder, and it can have a major impact in delivering
insights into their habits of spending. In addition, the transaction amount can be used to identify
outliers or suspicious transactions, which may indicate fraud. It specifically focuses on
representing the amount of money spent by a cardholder, and it can directly deliver insights into
their spending habits. Moreover, the transaction amount can be used in order to recognize
suspicious transactions and outliers, which may point out fraud. On the other side, Rtayli and
Enneya, (2020) stated that the merchant category code (MCC) is a four-digit code assigned to merchants that categorizes the kind of business they generally operate. The date
and time of the transaction can be considered one of the significant characteristics of credit card
transactions that help in the process of detecting credit card fraud. The date and time of a
transaction can be considered very effective in the process of delivering insights into the
spending habits of the cardholder, such as whether they formulate purchases during particular
times of the week or day. It can also be helpful for financial institutions in addressing fraud by flagging unusual or suspicious transaction dates or times. Based on the thesis of Bin Sulaiman, Schetinin and Sant,
(2022), the location of the cardholder can deliver insights into their spending habits, such as whether the user makes purchases abroad or in their home country. On the other side, it
can also have a significant impact in helping financial institutions in the process of detecting
fraud by identifying suspicious locations, such as the locations that are popular for their high
fraud levels. The location of the merchant can be considered the other major characteristic of
credit card transactions that can be used by banks and other financial services in the process of
detecting credit card fraud. The location of the merchant can also be considered very effective in
delivering insights about consumers' card-usage habits, such as whether they make purchases at national or local retailers. It can be considered very effective in the
process of helping financial institutions to detect fraud by the method of recognizing suspicious
or unusual merchant locations. Gupta, Lohani and Manchanda, (2021) stated the type of credit
card utilized by the user in the transaction process helps in achieving insights into the
cardholder's spending habits and creditworthiness. For instance, the use of a platinum card may indicate that the cardholder generally has a high credit score and spends more money compared to someone with a regular card. The type of credit card can also play a
major role in helping financial institutions in sensing or detecting fraud by identifying suspicious
or unusual card types. On the other side, the currency of the transaction can also have a major
impact in helping the financial institution to identify if the purchase has been made in foreign
currency or their home currency. Based on the thesis by Saheed, Baba and Raji, (2022), insights
into creditworthiness and account balance can be delivered by the status of the transaction, which
can be considered a major characteristic of credit card transactions. Transaction frequency also
helps financial firms in the process of avoiding the chance of experiencing fraudulent credit card
activities.
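As a brief illustration, the sketch below derives a few of the predictive features discussed above (hour of day, day of week, deviation from the card's typical amount, and a foreign-location flag) from a raw transaction log. The column names and sample rows are hypothetical and will differ from any real issuer's schema.

```python
# Minimal sketch: deriving a few of the predictive features discussed above from a
# raw transaction log. Column names (card_id, amount, timestamp, country, home_country)
# are hypothetical assumptions, not a real issuer's schema.
import pandas as pd

tx = pd.DataFrame({
    "card_id":      ["A", "A", "A", "B", "B"],
    "amount":       [20.0, 35.0, 950.0, 60.0, 58.0],
    "timestamp":    pd.to_datetime(["2023-01-05 14:10", "2023-01-06 09:30",
                                    "2023-01-06 03:15", "2023-01-05 18:00",
                                    "2023-01-07 19:45"]),
    "country":      ["US", "US", "RO", "US", "US"],
    "home_country": ["US", "US", "US", "US", "US"],
})

# Time-of-day and day-of-week capture when the cardholder usually spends.
tx["hour"] = tx["timestamp"].dt.hour
tx["day_of_week"] = tx["timestamp"].dt.dayofweek

# Deviation of the amount from the card's typical spend highlights outliers.
card_mean = tx.groupby("card_id")["amount"].transform("mean")
tx["amount_ratio"] = tx["amount"] / card_mean

# A simple flag for purchases made outside the cardholder's home country.
tx["is_foreign"] = (tx["country"] != tx["home_country"]).astype(int)

print(tx[["card_id", "amount_ratio", "hour", "day_of_week", "is_foreign"]])
```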
2.2.3 Limitations of Machine Learning Algorithms in Credit Card Fraud Detection
According to the statement of Alharbi et al., (2022) credit card fraud can be considered as one of
the significant problems in the whole financial industry, leading to billions of dollars in losses
every year. Traditional fraud detection methodologies rely on rule-based systems, which have limited effectiveness in detecting new and sophisticated kinds of fraud. Different kinds of machine learning algorithms have shown promise in detecting fraudulent transactions with high accuracy. However, there are still quite a few limitations to their use in credit card fraud detection, which are addressed in the
following.
Data Quality and Availability
According to the thesis by Bin Sulaiman, Schetinin and Sant, (2022), data availability and
quality can be considered one of the most concerning and significant limitations associated with
the idea of considering machine learning algorithms in fraud detection. The efficiency and
effectiveness of machine learning algorithms specifically depend significantly on the quantity
and quality of the available data. In the case of credit card fraud detection, data is frequently
imbalanced and scarce, with fraudulent transactions being a small percentage of the entire data. This imbalance can directly result in bias in the machine learning algorithms, where they turn out to be better at detecting normal transactions than fraudulent ones. In addition, fraudsters are continuously evolving their tools and techniques, which means that algorithms trained on historical data may not be successful in detecting new kinds of fraud.
Thus, the data utilized in order to train the algorithms must be continuously updated and
validated to make sure that they stay effective and successful.
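A minimal sketch of the kind of routine data-quality check implied here is given below; the quality_report helper and the assumption of a "Class" label column (mirroring the common Kaggle credit-card dataset layout) are illustrative, not part of the study's artifact.

```python
# Minimal sketch: basic data-quality checks that might be run before each retraining
# cycle. The DataFrame layout and the "Class" label column are assumptions.
import pandas as pd

def quality_report(df: pd.DataFrame, label_col: str = "Class") -> dict:
    """Summarize missing values, duplicates and class balance for a transaction table."""
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "fraud_rate": float(df[label_col].mean()) if label_col in df else None,
    }

df = pd.DataFrame({"Amount": [10.0, 10.0, None], "Class": [0, 0, 1]})
df = pd.concat([df, df.iloc[[0]]])          # an artificial duplicate row for illustration
print(quality_report(df))
```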
Interpretability
Based on the viewpoint of Esenogho et al., (2022), interpretability can be considered the other
major and concerning issue associated with the idea of using machine learning algorithms in
fraud detection. Many of the state-of-the-art algorithms, including deep learning models and
neural networks, are black boxes, which makes it hard to understand how they arrived at their decisions. This lack of transparency can make it difficult for fraud analysts to explain why a specific transaction was marked as fraudulent. It can also make it difficult to recognize and correct any biases or errors in the algorithm.
Time and cost
According to Alharbi et al., (2022), the idea of implementing machine learning algorithms for
the process of credit card fraud detection can be addressed as time-consuming and costly.
Training and building the machine learning models specifically require adequate resources and
expertise, and the data utilized need to be constantly validated, updated, and cleaned. On the
other side, the algorithms need to be continuously updated and monitored in order to make
sure that they remain successful and efficient against evolving and new kinds of fraud. The time
and cost involved in the process of updating and maintaining the machine learning algorithms
can be prohibitively high for some businesses.
Overfitting
Overfitting can be considered one of the most concerning and major issues associated with
machine learning algorithms in credit card fraud detection. Based on the viewpoint of Arya and
Sastry, (2020), overfitting generally takes place when the machine learning algorithm becomes
very complicated, fitting very closely to training data. This can directly lead to the algorithm
being excessively sensitive to the noise in data, which can play a significant role in resulting in
false positives. The issue of overfitting can be difficult to notice, as the algorithm may come out
to be working well on the training data. On the other hand, in the period of testing on new data,
the algorithm may execute poorly, resulting in an inaccurate outcome.
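One quick way to surface this behavior is to compare the training score with a cross-validated score, as in the sketch below; the synthetic data, whose labels carry no real signal, is an assumption chosen only to make the overfitting gap visible.

```python
# Minimal sketch: comparing training score against cross-validated score is one quick
# way to surface the overfitting described above. The synthetic data is illustrative.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))
y = (rng.random(500) < 0.05).astype(int)     # ~5% positives, labels unrelated to X

# An unconstrained tree can memorize noise: near-perfect training fit,
# but no real signal under cross-validation.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print("train accuracy:", tree.score(X, y))
print("cv accuracy   :", cross_val_score(tree, X, y, cv=5).mean())
```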
Imbalanced Data
According to the statement of Bin Sulaiman, Schetinin and Sant, (2022), credit card fraud is a comparatively rare event, with only a small percentage of transactions being fraudulent. This creates an imbalanced dataset, which may directly result in imprecise outcomes. Machine learning algorithms may be biased towards the majority class, leading to a high rate of false negatives. This means that the machine learning algorithm may fail to detect fraudulent transactions, causing huge financial losses for the card issuer and the cardholder.
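The sketch below shows two common mitigations on synthetic data: class weighting and random undersampling of the majority class. The data, the roughly 1% fraud rate, and the choice of logistic regression are assumptions for illustration only.

```python
# Minimal sketch: two common ways to handle the class imbalance described above,
# shown on synthetic data. Class weighting re-weights errors on the rare class;
# random undersampling shrinks the majority class before training.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(2000, 3)), columns=["f1", "f2", "f3"])
df["Class"] = (rng.random(2000) < 0.01).astype(int)      # ~1% fraud, illustrative only

X, y = df[["f1", "f2", "f3"]], df["Class"]

# Option 1: penalize mistakes on the minority class more heavily.
weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# Option 2: undersample legitimate transactions to match the fraud count.
fraud = df[df["Class"] == 1]
legit = df[df["Class"] == 0].sample(n=len(fraud), random_state=0)
balanced = pd.concat([fraud, legit])
undersampled = LogisticRegression(max_iter=1000).fit(balanced[["f1", "f2", "f3"]],
                                                     balanced["Class"])
print(weighted.classes_, undersampled.classes_)
```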
Concept Drift
Based on the viewpoints of Jain, Agrawal and Kumar, (2020), concept drift specifically takes
place when the underlying distribution of the data alters over time. In the case of detecting credit
card fraud, this can generally happen when the fraudsters change their hacking tactics or when
new types of fraud emerge. If the selected machine learning algorithm is not properly trained on
new data, it may not be capable of detecting such changes, resulting in false negatives. On the other
side, concept drift can also be considered very challenging to notice, as it may take place slowly
over time.
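A simple, hypothetical way to watch for drift is to track a detection metric such as recall over successive time windows and raise an alert when it drops, as sketched below; the window sizes, the shifting synthetic fraud pattern, and the 0.5 alert threshold are arbitrary illustrations, not a recommended policy.

```python
# Minimal sketch: monitoring recall over successive (synthetic) monthly batches and
# flagging a drop that might indicate concept drift. All values here are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(4)
X_old = rng.normal(size=(500, 2))
y_old = (X_old[:, 0] > 1.5).astype(int)              # "old" fraud pattern tied to feature 0
model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_old, y_old)

for month in range(3):
    X_new = rng.normal(size=(200, 2))
    # From month 1 onward the synthetic fraud pattern shifts to a different feature.
    y_new = (X_new[:, 0] > 1.5).astype(int) if month == 0 else (X_new[:, 1] > 1.5).astype(int)
    rec = recall_score(y_new, model.predict(X_new), zero_division=0)
    print(f"month {month}: recall={rec:.2f}", "ALERT: possible drift" if rec < 0.5 else "")
```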
Adversarial Attacks
Based on the thesis by Shirgave et al., (2019), fraudsters may try to bypass the machine learning
algorithms by the process of manipulating the information or generating fraudulent transactions
that are specifically structured in order to avoid detection. This is known as an adversarial attack. Adversarial attacks can be very hard to detect, as the
fraudsters may utilize different techniques in order to avoid detection. In addition, all these
attacks can generally be utilized for manipulating the algorithm, resulting in false negatives or
false positives.
2.2.4 Ethical and Effective Fraud Detection Strategies
The idea of utilizing machine learning for fraud detection specifically comes with some specific
ethical considerations. For instance, false positives can directly result in innocent customers
experiencing their transactions being rejected, while false negatives can specifically lead to fraud
going unnoticed and leading to major financial loss. Therefore, it can be considered very
essential to discover the correct balance between ethical considerations and accuracy. Some
Ethical and Effective credit card Fraud Detection Strategies using machine learning algorithms
are addressed in the following. It can be considered very important to select the ideal machine
learning algorithms in order to accomplish a desired and successful outcome. The machine
learning algorithms are required to be selected based on the circumstances, data type, and
objectives. Based on the thesis by Shirgave et al., (2019), the idea of analyzing a range of data
sources, such as location data, transaction history, and behavioral patterns, can directly play a
major role in the process of detecting fraudulent activities. By the process of combining different
kinds of data sources, machine learning algorithms are more capable of precisely identifying
fraudulent behavior. On the other side, Hussein et al., (2021) stated the concept of real-time
monitoring of credit card transactions is also capable of rapidly recognizing doubtful activity and
preventing fraudulent charges from being accepted. Real-time monitoring also allows banks and credit card organizations to take rapid action against suspected fraud. Moreover, ethical guidelines need to be developed in order to make sure that fraud detection strategies are used ethically and in compliance with appropriate regulations and laws. Guidelines need to be developed with input from ethical and legal experts and must consider data protection and customer privacy. According to the
statement of Jain, Agrawal and Kumar, (2020), fraud detection strategies should typically focus
on the process of reducing or minimizing false positives and making sure that legitimate
transactions are not rejected. Providing transparency is the other major strategy in the process of
detecting credit card fraud. Customers need to be provided with transparent and clear information about different fraud detection strategies and how these strategies are utilized in
order to protect their financial interests. The customers should also be up to date about how their
personal data is secured and utilized. Delivering fraud education can be considered one of the most effective strategies for banks and financial organizations in the process of preventing credit card fraud. Based on the article by Zioviris, Kolomvatsos,
and Stamoulis, (2022), the idea of educating customers about fraud detection and prevention can
help them protect themselves from different kinds of fraud and identify deceitful activity.
Educational resources can generally include online videos, tutorials, and articles on how to guard
their confidential and important information and keep away from common fraud scams.
2.3 Conceptual Framework
Figure.2: Conceptual Framework of “Credit Card Fraud Detection”
(Source: By Author)
This conceptual framework focuses on showcasing different kinds of machine learning
algorithms such as decision trees, SVM, Logistic regression, and random forest, which can be
utilized by the bank and other financial institutions in the process of detecting suspicious
activities on credit card transactions. This conceptual framework also represents the limitations of machine learning-based credit card fraud detection, which can help firms identify the ideal strategies to solve and overcome those limitations.

2.4 Literature Gap
This particular section of the literature gap generally aims to identify all the areas associated with
the research topic, which have not been discussed in this research paper. In the case of this
particular research paper, only the utilization of machine learning algorithms in the process of
detecting credit card fraud has been discussed. Different kinds of deep learning algorithms can
also be considered very effective and beneficial in helping banks and other financial institutions
in the process of detecting credit card fraud, which has not been mentioned and discussed in this
paper. On the other side, the study also hasn’t focused on discussing and analyzing the step-by-
step process followed and maintained by the financial services in detecting credit card frauds
using the effectiveness of machine learning algorithms. The approach of discussing the other
methods that can be utilized to detect credit card fraud could perform a significant role in
improving the quality and effectiveness of the study in an efficient and successful way. All these gaps prevent this study from being more precise on the chosen research topic.
2.6 Analysis of the Problem
Credit card fraud is considered to be a huge problem in today’s financial world. It leads to a considerable amount of financial loss for companies as well as users. Financial transactions have shifted towards online platforms, which makes the detection of fraudulent transactions necessary. Machine learning is one of the emerging methods that provides an effective solution for handling fraudulent credit card transactions. Moreover, a machine learning model provides substantial benefits that address several challenges in terms of the security of online transactions. A key advantage of using a machine learning model is its ability to process large amounts of data. Traditional fraud detection systems cannot detect fraud when fraudsters use new and innovative techniques that bypass the rules of credit card fraud detection. A machine learning model learns from new data and detects new and complex patterns of credit card fraud. Anomaly detection as well as supervised learning can determine normal
spending behavior, which helps to detect suspicious transactions. This method also has some drawbacks, such as false positives. The machine learning model learns from past data, which helps it classify fraudulent transactions based on past behavior. This can cause legitimate transactions to be declined, which can lead to customer dissatisfaction (Shirgave et al., 2019).
The most significant challenge associated with credit card fraud is the ongoing
development of different kinds of new and advanced techniques of fraudulence that are hard to
detect. Credit card fraud offenders use multiple techniques using high end technology to commit
these fraudulent activities. All around the globe, attackers use sophisticated methods to commit
such crimes which are hard to detect using simple technology. The traditional method of
detection thus results in failure while facing such innovative fraudulent tactics used by offenders.
Another major challenge regarding the fraud detection is the emergence of high levels of false
positives that is often generated by the detection algorithms. The false positives often reflect on
the system when a legitimate transaction is flagged as fraudulent which leads to loss of trust of
the customers on the detection method. This leads to frustration and distrust among the
consumers of the credit card. These false positives may also lead to loss of revenue for the
company or the merchants. Hence, machine learning technique is an effective method to deal
with the credit card fraud detection method which results in less errors and yields higher
accuracy as compared to the traditional methods. Machine learning algorithms utilize huge amounts of data and detect fraud occurrences in an effective manner. However, organizations may find it hard to obtain accurate results from the machine learning process, which can lead to inaccurate outcomes (Zioviris, Kolomvatsos and Stamoulis, 2022). Machine learning is one of the
powerful approaches which provide significant protection against the fraud transactions of credit
card. This method adopts new and innovative methods based on the large amount of data of real-
time.
One of the major challenges faced by credit card fraud detection methods is the problem of data availability, as consumer data or information related to credit card transactions is often private. Unclassified data is another major challenge faced by fraud detection systems, as not all fraudulent transactions or attempts are successfully caught by the system or reported.
Scammers and various adversary groups often use adaptive techniques against the machine
learning models and algorithms resulting in disruption of the model and failure of the fraud
detection mechanism. Another challenge that is often faced by the ML system is the huge amount
of data that needs to be processed on a regular basis. The model does not always keep up with the speed and accuracy needed to detect the huge number of fake transactions taking place (Abdulghani, Uçan and Alheeti, 2021). Data imbalance is another major problem.
The Machine learning model finds it challenging to detect the exact number of fraud transactions
happening throughout from the entire set of transaction data available. The machine learning
model must be fast enough to detect the suspicious activities or anomalies in transactions in a
faster way and hence its overall efficiency needs to be increased. The privacy of the customer
often comes at stake since the transaction data needs to be available for the model to utilize it to
find the fraudulent activities. Hence, protection of user privacy is another major challenge often
faced by the ML model during the credit card fraud detection. The model must rely on a much
more trustworthy source to cross check the data obtained for the model training process. The
complex algorithms used in the model is tough to decode but is also prone to hacking or attacks
by scammers. Thus, the security of the model must be ensured by the developers (Shirgave et al., 2019). The model must be adaptable to new changes so that attackers find it difficult to scam the same model every time.

2.7 Summary
This section of the literature review focused on discussing different kinds of factors associated
with the topic of “credit card fraud detection”. The study has identified and discussed all the
machine learning algorithms that can help in detecting different kinds of credit card fraud and
eliminate the chances of facing huge financial loss. The machine learning algorithms that can
help credit card users and credit card issuers to prevent the chances of accruing credit card fraud
are logistic regression, decision trees, support vector machines, and random forests. This study
also focused on analyzing the role of all these algorithms in credit card fraud detection, which
perform a vital role in helping the study identify the appropriate algorithm to use in credit card
fraud detection. Most predictive features of a credit card transaction have also been addressed
and discussed in this study. Different predictive features of a credit card transaction are the time
and date of the transaction, the amount of the transaction, the location of the users, the currency
of the transaction, the frequency of the transaction, and many more. All these features play a major role in indicating to the organization whether any kind of credit card
fraud has taken place. There are different kinds of limitations associated with the process of
detecting credit card fraud by using machine learning algorithms, which perform a major role in
preventing the organization to be more capable of resisting the situation of experiencing different
kinds of credit card fraudulent activities. This section also focused on discussing different kinds
of effective and beneficial strategies that help banks and other financial institutions in avoiding
the chances of facing huge financial losses due to credit card fraud.
Chapter 3: Research Methodology
3.1 Overview
The research methodology section tends to discuss the way in which the research paper would be
carried out so that the aim, objectives, and research question of the paper can be addressed
successfully (
Melnikovas, 2018). Through the Saunders Research Onion, the various decisions
that are considered to be important for developing a research methodology for the research paper
are described (
Orth and Maçada, 2021). In this manner, the scholar can ensure the reliability and
validity of the outcomes of the research.
3.2 Research Philosophy
In research, there are basically 4 types of philosophies that are mainly used, which are
pragmatism, positivism, realism, and interpretivism (
Žukauskas, Vveinhardt and Andriukaitienė,
2018). In the following research, the scholar had chosen to use the interpretivism research
philosophy so that the important elements of the study that would be identified through the study
could be interpreted successfully. In this manner, the research scholar would be able to integrate
the interest of humans across the outcome of the study. The use of the respective philosophy thus
would be helpful in focusing on the meaning so that the different aspects of the research problem
can be reflected (
Williamson, 2021).
3.3 Research Approach
In academic research studies, the concept of research approach is regarded as the general plan
and procedure that helps in conducting the research study. The three main research approaches
that are mainly used while conducting research are the deductive approach, inductive approach,
and abductive approach (
Snyder, 2019). In the following research, considering the nature of the
study, the deductive approach is used. The deductive research approach focuses on the testing of
theory as well as the hypothesis with the help of empirical data (
Gupta and Gupta, 2022). The
hypothesis of this study concerns the detection of credit card fraud. This approach helps to test the hypothesis on the collected credit card transactions using multiple algorithms (Tamminen and Poucher, 2020).
3.4 Research Design
The research design acts as a research methods and techniques framework that helps the research
scholar to carry out the study successfully. The different types of research designs that are mainly
used in a research study are descriptive, experimental, correlational, diagnostic, and explanatory
research design (
Pandey and Pandey, 2021). In this research, the scholar intended to use the
experimental research design so that the cause and effect of credit card fraud detection can be
established. In this manner, the scholar would be able to identify and analyze the impact of the
independent variable over the dependent variable (Patel and Patel, 2019).

3.5 Research Strategy
The research strategy tends to state the overall plan for the conduction of the research study. By
determining the research strategy, the important aspects of the research, such as planning,
executing, and monitoring the overall research, can be done in a successful manner (
Kumar,
2018). For the following research, the scholar intended to use the quantitative strategy so that
numerical data could be collected from different sources and an emphasis on objective
measurements could be focused (
Snyder, 2019).
3.6 Data Collection Method
In the following research, the scholar had chosen to collect the data from secondary sources. From a secondary source, a dataset would be collected for credit card fraud detection. The dataset would be collected from the Kaggle website (kaggle.com, 2023). The selected dataset provides real-world relevance, as credit card fraud is one of the critical concerns for financial institutions. This dataset is relatively imbalanced compared to the datasets used in other studies of credit card fraud detection: it has a small number of fraudulent cases compared to the overall data, whereas in other studies the legitimate and fraudulent transactions are more balanced. The selected dataset is therefore challenging for traditional algorithms, and it is selected to provide another solution for credit card fraud detection.

3.7 Data Analysis Method
It is important to identify an appropriate tool for analyzing the credit card fraud-related data so that the analysis can be improved further. For the analysis of the data collected from the Kaggle website, various machine learning algorithms are used. The machine learning algorithms chosen for the analysis
are logistic regression, support vector classifier, random forest classifier, decision tree, k-nearest neighbours (KNN), and the Gaussian Naïve Bayes classifier (Kumar, 2018). Apart from this, the data are analyzed with the help of statistical methods such as descriptive statistics and a correlation matrix, which help to determine the relationships between, and provide insight into, the variables relevant to credit card fraud detection.
3.8 Ethical Considerations
Ethical considerations are a set of principles that should be followed by the research scholar while conducting the research so that the study is guided in the correct direction; they help to maintain the validity and integrity of the research paper (Mohajan, 2018). The ethical considerations followed in this research are: destroying the data after it has been used so that it cannot be misused in any manner, avoiding any plagiarism, protecting the rights of participants, causing no physical or psychological harm to any person, and checking the authenticity of the data before using it in the research to avoid any falsification of data (Mukherjee, 2019).
3.9 Research Limitations
It is not possible for the research scholar to include every factor associated with the research in a single paper. The research limitations help in identifying the areas that have not been addressed in the study (Williamson, 2021). This research does not include any qualitative data, which leaves a gap in the theoretical aspect of the study. In addition, the paper does not use a primary data collection method, such as an interview or survey, so the study lacks real-time data. These are the two major limitations of the methodology.
3.10 Summary
This section discussed all the aspects that are important to carry out the research in the correct direction, such as the philosophy, approach, design, strategy, and data collection methods. By following these step-by-step choices, the scholar is able to address the aim and objectives of the paper and thus answer the research questions.
Chapter 4: Artifact Design and Implementation
4.1 Design
The design of the machine learning model follows a step-by-step method. Data visualization is a critical step of the design because it provides important information about the selected dataset. The first step is understanding the problem: before building the model, it is essential to understand its purpose, which helps in selecting appropriate algorithms. For this study, the purpose of the machine learning model is to classify credit card fraud with the help of different algorithms. In this step, the independent and dependent variables are determined, which helps to identify the most important variables for the classification. In order to build the credit card fraud detection model, an understanding of the industry's operations also needs to be developed so that a better model can be produced (Guo et al., 2019).
4.1.1 Experimental Setup
For the development of the machine learning model, the Google Colab platform is used. This platform allows Python code to be written in any web browser and can be accessed remotely. For this model, Python 3.10 is used, and libraries can be added or removed within Google Colab. For this experiment, a GPU with 6 GB of VRAM and 16 GB of RAM are used, which provides an efficient experimental environment.
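As an illustration, a minimal sketch of the kind of environment check that can be run at the top of the Colab notebook is shown below; the exact versions printed will depend on the runtime, and the TensorFlow GPU check is optional (TensorFlow is imported in the appendix code but not required for the scikit-learn models).
import sys
import sklearn
import pandas as pd
import tensorflow as tf
print("Python version:", sys.version.split()[0])          # expected to be a 3.10.x release
print("scikit-learn:", sklearn.__version__)
print("pandas:", pd.__version__)
print("GPUs visible:", tf.config.list_physical_devices("GPU"))  # confirms the GPU runtime is active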
4.1.2 Selected Dataset
It is essential for organizations to determine whether a credit card transaction is fraudulent so that consumers are not charged for items they did not purchase. The selected dataset contains transactions made by European credit card holders. It covers two days of transactions and includes 492 fraudulent transactions out of 284,807 transactions in total. The selected dataset is unbalanced, with the positive (fraud) class accounting for only 0.17% of all transactions. Most of the variables are the outcome of a PCA transformation, and the dataset does not provide background information about them: features V1 to V28 are PCA components, while the two remaining features, Time and Amount, are not PCA-transformed. Time contains the time of the transaction and Amount is the transaction amount. The Class feature of the dataset is the response variable, taking the values 1 (fraud) and 0 (legitimate). Because the classes are imbalanced, performance on this data should be measured with the help of the AUC curve and the confusion matrix (kaggle.com, 2023).
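As a brief illustration of this imbalance, the following sketch counts the two classes with pandas once the Kaggle CSV has been loaded; the figures in the comments reflect the dataset statistics quoted in this section, and the file path is only an assumption for a local copy.
import pandas as pd
df = pd.read_csv('creditcard.csv')                      # illustrative path to the downloaded Kaggle file
print(df.shape)                                         # (284807, 31): Time, V1-V28, Amount, Class
print(df['Class'].value_counts())                       # 0 = legitimate, 1 = fraud (492 rows)
print(df['Class'].value_counts(normalize=True) * 100)   # fraud share is roughly 0.17%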
4.1.3 Used Algorithms
For this machine learning model, logistic regression, support vector classification, random forest classification, decision tree, and KNN are used.
Logistic Regression
Logistic regression is a statistical method used for classification and prediction. It estimates the probability of an event occurring based on the independent variables. The outcome of this method is a probability, so the dependent variable is bounded between 0 and 1. The model applies the logit transformation, the logarithm of the odds, i.e. the probability of success divided by the probability of failure: logit(p) = ln(p / (1 − p)). Equivalently, the predicted probability is obtained from the linear predictor z through the sigmoid function p = 1 / (1 + exp(−z)) (Yang and Shami, 2020).
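A minimal scikit-learn sketch of how such a logistic regression classifier can be fitted and used to output class probabilities is given below; the names X_train, y_train and X_test are placeholders for the prepared features and labels rather than variables defined in this section.
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression(max_iter=1000)    # fits the sigmoid model by maximising the likelihood
log_reg.fit(X_train, y_train)                  # X_train, y_train: prepared feature matrix and labels
probs = log_reg.predict_proba(X_test)[:, 1]    # estimated probability of the fraud class
labels = (probs >= 0.5).astype(int)            # thresholding the sigmoid output at 0.5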
Support Vector Machine
Support vector classification is a supervised algorithm used for regression and classification. Its objective is to determine the hyperplane in N-dimensional space that best separates the data points; the dimension of the hyperplane depends on the number of features. The kernel is the function of the SVM that takes low-dimensional inputs and transforms them into a higher-dimensional space, converting non-separable problems into separable ones, which is essential for non-linear problems (Sarker, 2021). A primary advantage of this method is that it is memory-efficient, as it only uses a subset of the training set (the support vectors) in the decision function.
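The kernel idea described above can be sketched with scikit-learn's SVC as follows; the RBF kernel and the regularisation value C=1.0 are illustrative defaults rather than the tuned settings of this study.
from sklearn.svm import SVC
svc = SVC(kernel='rbf', C=1.0, gamma='scale')   # RBF kernel implicitly maps inputs to a higher-dimensional space
svc.fit(X_train, y_train)
y_pred = svc.predict(X_test)                    # class labels predicted from the learned decision boundary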
Random Forest
Random forest is a supervised machine learning algorithm that is widely used for classification as well as regression. The model contains several decision trees, and the number of trees contributes to the robustness of the model; its accuracy and problem-solving capability depend on this number. The random forest builds its decision trees on several random subsets of the dataset, which helps to improve the predictive accuracy on the selected dataset.
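A short sketch of a random forest built from a chosen number of trees is shown below; n_estimators=100 is an assumed value used only for illustration, not the configuration reported in the results.
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100, random_state=42)  # 100 trees, each fitted on a bootstrap sample
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)   # final label is the majority vote across the individual trees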
Decision Tree
The decision tree is a supervised method that is preferred for solving classification problems. It is a tree-structured classifier in which each internal node represents a feature of the dataset, the branches represent decision rules, and each leaf node represents an outcome. The performance of this method depends on the features of the dataset (Ray, 2019).
Proposed Model: Gaussian NB
GaussianNB is a supervised algorithm and a special type of Naïve Bayes algorithm. It is used when the features have continuous values and all of them are assumed to follow a Gaussian distribution. The method is based on Bayes' theorem and assumes that all features are independent of each other.
KNN
The KNN method is a non-parametric, supervised learning classifier. Predictions and classifications are made based on the grouping of individual data points, and the class label is assigned according to the majority of the neighbouring data (Ho et al., 2021).
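A minimal sketch of the proposed Gaussian Naïve Bayes model, together with the KNN classifier described above, is given below; the number of neighbours (5) is an illustrative default rather than a tuned value.
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
gnb = GaussianNB()                         # assumes each feature is Gaussian within each class
gnb.fit(X_train, y_train)
gnb_pred = gnb.predict(X_test)
knn = KNeighborsClassifier(n_neighbors=5)  # label assigned by majority vote of the 5 nearest points
knn.fit(X_train, y_train)
knn_pred = knn.predict(X_test)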
4.2 Evaluation Metric
Evaluating the performance of a machine learning model is one of the essential steps in building an effective ML model. To analyze the performance and quality of a model, different metrics, known as performance or evaluation metrics, are used. These metrics help to understand how well the model performs on the provided dataset, and by using them the performance of the model can be improved through tuning the hyperparameters. Every machine learning model aims to generalize to new data, and the performance metrics measure how well the model generalizes on the dataset. The performance of each model in this study is evaluated with the help of accuracy, F1 score, recall, and precision (Saheed, Baba and Raji, 2022).
1. Accuracy measures the ratio of correctly classified instances to the total number of instances. This metric is most informative when the costs of false positives and false negatives are similar (Alharbi et al., 2022).
2. Precision is the ratio of correctly predicted positive instances to the total number of predicted positive instances. A higher precision value indicates a low rate of false positives (Bin Sulaiman, Schetinin and Sant, 2022).
3. Recall is the ratio of correctly predicted positive instances to the total number of actual positive instances in the dataset. A high recall value means that most of the actual positive instances are detected (Saheed, Baba and Raji, 2022).
4. The F1-score is the harmonic mean of precision and recall, and it provides a balanced view of the performance of the model (Alharbi et al., 2022).
The confusion matrix, also known as the error matrix, is a table that provides a visualization of the performance of a specific algorithm. Each row of the matrix represents the actual class, while each column represents the instances in the predicted class. A true positive is a positive prediction that is correct, and a true negative is a negative prediction that is correct. A false positive is a type-1 error, a positive prediction that is wrong, while a false negative is a type-2 error, a negative prediction that is wrong. Figure 3 shows the confusion matrix with the actual and predicted values (Bin Sulaiman, Schetinin and Sant, 2022).
Figure.3: Confusion Matrix
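These metrics can be computed directly with scikit-learn, as in the short sketch below; y_test and y_pred stand for the true and predicted labels of any one of the fitted models, not for variables defined in this chapter.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))   # rows: actual class, columns: predicted class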
4.3 Data Splitting
Data splitting is another crucial part of a machine learning project that affects the overall performance of the model. The data is divided into subsets for learning and for validation of the ML model. The training set is used to train the developed models and to estimate their parameters so that the performance of the different models can be compared, while the test data is used to check whether the final model produces correct outcomes on unseen observations. For the splitting, random sampling is used; this protects the data modelling process from bias towards particular characteristics of the data. The dataset is split into a training dataset and a test dataset: the training data is fed to the model in order to update its parameters and, after the training phase, the test set is used to measure how the model handles new observations. The training dataset contains 80% of the data, whereas the test dataset contains the remaining 20%. For the splitting, random_state = 42 is used, which ensures reproducibility by fixing the random seed at 42 (Naveen and Diwan, 2020).
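The 80/20 split with random_state=42 described above corresponds to a call of the following form; stratify=y is included here as an assumption, to keep the class ratio in both subsets in line with the handling of imbalance discussed later.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,       # 20% of the rows are held out for testing
    random_state=42,     # fixed seed for reproducibility
    stratify=y           # preserve the fraud/non-fraud ratio in both subsets (assumed)
)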
4.4 Implementation
4.4.1 Data Importing
Importing the dataset is one of the first steps of the machine learning workflow. The dataset is imported with the pandas library, and the head() function shows the first few rows of the data frame into which it is loaded.
Figure.4: Importing Dataset
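A minimal version of the import step shown in Figure 4 is sketched below; the file path matches the one used in the appendix code.
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/creditcard/creditcard.csv')  # path used in the appendix code
print(df.head())   # first five rows of the loaded data frame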
4.4.2 Data Visualization
Data visualization is a critical part of the machine learning workflow: it represents the variables graphically using charts, graphs, heatmaps, and similar plots. This process helps to extract insights from the dataset, which is essential for feature selection, and it also reveals dependencies between the variables, which helps to identify the target variable of the model (Sarker, 2021).
Figure.5: Data Distribution of Amount and Time
The graph above shows the distribution of the time and amount of the credit card transactions. The transaction amount has a high density in the 0–5000 range, where the density exceeds 0.0015, while time has its highest density in the ranges 2000–8000 and 12000–16000, with a peak density value of around 1. Based on the graph, these variables contain some outliers.
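The distributions in Figure 5 can be reproduced with a sketch of the following form; seaborn's histplot is used here as a non-deprecated alternative to the distplot calls in the appendix, and the colours are arbitrary.
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots(1, 2, figsize=(18, 4))
sns.histplot(df['Amount'], kde=True, ax=ax[0], color='r')   # transaction amount distribution
sns.histplot(df['Time'], kde=True, ax=ax[1], color='b')     # transaction time distribution
ax[0].set_title('Distribution of Transaction Amount')
ax[1].set_title('Distribution of Transaction Time')
plt.show()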
Figure.6: Correlation Matrix
The figure above shows the correlation between the variables. Amount has a negative relationship with V1 (−0.228) and negative connections with V2, V3 and V5 (−0.53, −0.21 and −0.38 respectively). V4 has a positive relationship with Amount (0.99), and V6 has a positive connection with Amount (0.21). Overall, the variables are related to each other in both positive and negative ways.
Figure.7: Negative Correlation of variables
The graph above shows the negative correlations of the variables with the class. V17 has a negative correlation of about −0.5 and V14 a negative correlation stronger than −0.5; similarly, V12 has a negative correlation of about −0.5 and V10 one stronger than −0.5. Thus, all of these variables are negatively related to the class.
Figure.8: Data Distribution of Variables
From the graph above, V14 has its main distribution between −20 and −5, with a peak density of more than 0.10. For V12, the distribution lies between −20 and 0 with a peak of more than 0.08, and V10 ranges from −25 to 5 with a peak of 0.12. Thus, these variables are approximately normally distributed.
4.4.3 Handling Imbalanced Dataset
The selected dataset is imbalanced: 0.17% of the values are fraud and 99.83% are non-fraud. With such an imbalance, the results of the model can become biased. To address this issue, StratifiedShuffleSplit is used. This function manages the imbalance by ensuring that the class distribution of the original dataset is maintained in both the train and test splits. Shuffling is part of this function: the data is randomised before splitting, so the split is not influenced by any inherent ordering of the rows. This helps to remove the bias that under-sampling alone would introduce, and both the train and the test set reflect the original class distribution (Uçan and Alheeti, 2021).
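A minimal sketch of how StratifiedShuffleSplit can produce a train/test split that preserves the original class proportions is shown below; a single 80/20 split is assumed here for illustration, whereas the appendix code uses a five-fold StratifiedKFold for the same purpose.
from sklearn.model_selection import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_idx, test_idx in sss.split(X, y):
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
# Both subsets now contain roughly 0.17% fraud cases, mirroring the full dataset.
print(y_train.mean(), y_test.mean())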
4.5 Result

Name                  Precision   Recall   Accuracy   F1
Logistic Regression   0.956       0.936    0.947      0.946
SVC                   0.944       0.914    0.931      0.929
Random Forest         0.92        0.95     0.93       0.93
Decision Tree         0.94        0.91     0.93       0.92

Table.1: Model Validation
Based on the table above, logistic regression and random forest have the highest precision, recall and F1 scores. Based on the requirements and the accuracy of the models, logistic regression is the best algorithm for this classification task.
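Table 1 can be generated programmatically by looping over the fitted models, as in the hedged sketch below; the variables log_reg, svc and rf are assumed to have been trained as in the earlier sketches, and dt stands for a fitted DecisionTreeClassifier that is not shown here.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
models = {'Logistic Regression': log_reg, 'SVC': svc,
          'Random Forest': rf, 'Decision Tree': dt}   # dt: a fitted DecisionTreeClassifier (assumed)
for name, model in models.items():
    pred = model.predict(X_test)
    print(name,
          round(precision_score(y_test, pred), 3),
          round(recall_score(y_test, pred), 3),
          round(accuracy_score(y_test, pred), 3),
          round(f1_score(y_test, pred), 3))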
Logistic Regression
The testing of the model is done with the help of accuracy and the confusion matrix, which together describe the overall performance of the model. Figure 9 shows the performance of logistic regression. The accuracy of the model is about 95% and its precision is 0.956, so more than 95% of its positive predictions are correct. The recall value shows that 93.6% of the actual positive instances are identified. From the confusion matrix, the true negative value is 92 and the false positive value is 4, while the false negative value is 6 and the true positive value is 87.
Figure.9: Result of Logistic Regression
Support Vector Classifier
Figure 10 shows the overall performance of the support vector machine. The accuracy of this model is 0.97, indicating that it classifies 97% of the predictions correctly. The recall value of the SVC is 0.97 for class 0 and 0.91 for class 1, while the F1 score is 0.94 for class 0 and 0.94 for class 1. Thus, the model correctly identifies 91% of the positive instances and 97% of the negative instances. Based on the confusion matrix, the model has 93 true negatives and 3 false positives, with 8 false negatives and 85 true positives.
Figure.10: Result of SVC
Random Forest Classifier
From Figure 11, the accuracy of the algorithm is 0.9365, which shows that the model classifies more than 93% of the data correctly. The precision of the random forest is 0.93 for class 0 and 0.95 for class 1, the recall is 0.95 for the majority class and 0.92 for the minority class, and the F1-score is 0.94 for class 0 and 0.93 for class 1. From the confusion matrix, the algorithm has 93 true negatives and 3 false positives, with 8 false negatives and 85 true positives.
Figure.11: Result of Random Forest
Decision Tree
Based on Figure 12, the accuracy of this model is 0.9312, indicating that it classifies more than 93% of the data correctly. The precision of the model is 0.92 for class 0 and 0.94 for class 1, while the recall is 0.95 for class 0 and 0.91 for class 1. Based on the F1 score, the value for class 0 is 0.93 and for class 1 it is 0.93. From the confusion matrix, there are 91 true negatives and 5 false positives, with 7 false negatives and 86 true positives.
Figure.12: Result of Decision Tree
KNN
Based on Figure 13, the accuracy of the model is 94.17%, which is significantly high; the model predicts more than 94% of the values correctly. From the precision score, class 0 has a value of 0.92 and class 1 a value of 0.94. The recall of this model is 0.95 for class 0 and 0.91 for class 1, and the F1 score is 0.93 for both classes. From the confusion matrix, the false positive value is 3 with a true negative value of 93, while the true positive value is 85 with a false negative value of 8.
Figure.13: Result of KNN
4.6 Proposed Algorithm
Figure 14 below shows the performance of all of the algorithms along with the proposed algorithm, Gaussian NB. The recall value of this algorithm is significantly high, which means it identifies positive instances more accurately than the other models used in this study: Gaussian NB has a recall of 0.96, the highest value among the compared models. Moreover, logistic regression has the highest accuracy, followed by Gaussian NB, and LR also has the highest precision. The decision tree and the random forest have the lowest accuracy compared to the other models used in this project.
Figure.14: Comparison of Models with Proposed Model (Precision, Recall, Accuracy and F1 for Logistic Regression, SVC, Random Forest, Decision Tree and the proposed Gaussian NB)
Gaussian NB
From Figure 15, the accuracy of the model is 94.17%, which is significantly high. The precision of the model is 0.99 for class 0 and 0.99 for class 1, the recall is 0.99 for class 0 and 0.89 for class 1, and the F1 score is 0.95 for class 0 and 0.94 for class 1. Thus, the model correctly identifies nearly 90% of the positive instances and 99% of the negative instances, so its performance is significantly high compared to the other models used in this project. From the confusion matrix, the true negative value is 93 with a false positive value of 3, while the true positive value is 85 with a false negative value of 8.
Figure.15: Result of GaussianNB
Chapter 5: Critical Evaluation
From the literature review it can be stated that different kinds of machine learning algorithms can play a major role in helping financial institutions to sense different kinds of fraudulent credit card activities, which in turn helps banks and other financial services to maintain their financial stability and reputation. The machine learning algorithms that have a major role in fraud detection include decision trees, random forests, and support vector machines (Esenogho et al., 2022). The support vector machine predicts credit card fraud by generating a decision boundary that separates non-fraudulent and fraudulent transactions according to their features; it classifies new transactions by measuring which side of the boundary they fall on, and therefore plays a significant role in identifying potential fraud cases with high accuracy. The decision tree predicts credit card fraud by structuring a hierarchy of decision nodes based on transaction features: it uses a sequence of if-else conditions to steer through the tree and assign a non-fraud or fraud label to every transaction, and the path followed by a transaction directly determines the final prediction, which makes decision trees effective for credit card fraud detection (Zioviris, Kolomvatsos, and Stamoulis, 2022). Random forest, on the other hand, approaches credit card fraud detection by combining several decision trees. It generates a collection of trees, each trained on a random subset of the data drawn with replacement. All trees separately predict non-fraud or fraud for a transaction, and the final prediction is typically determined through majority voting or averaging across all the trees.
Based on the analysis section, it can be stated that among the different machine learning algorithms, logistic regression can be considered the most effective in detecting a variety of credit card frauds: its accuracy in predicting credit card fraud on this dataset is almost 95%, higher than that of the SVM, the decision tree and the other algorithms. Ensemble methods such as random forest, by contrast, combine numerous decision trees, which helps to eliminate or reduce overfitting and to capture an extensive range of patterns. They can also play a vital role in
leveraging the diversity of the trees in order to mitigate individual tree errors and biases, leading to more accurate and robust predictions by aggregating the votes or averages across the ensemble; on this particular dataset, however, logistic regression still achieved the best overall accuracy and precision.
After discussing the literature review and the analysis section, it can be stated that LR is the most effective machine learning algorithm here, with the highest accuracy level compared to the other machine learning algorithms. Thus, financial institutions can make use of LR in order to detect different kinds of credit card fraud in an efficient way.
According to the literature review, there are different kinds of predictive features of credit card transactions that can play a vital role in helping machine learning algorithms detect credit card fraud more effectively and quickly. The total amount of the transaction is one of the most significant features: it can be used to identify suspicious transactions or outliers, which may point to fraud. The date and time of the transaction is another major predictive feature; if the time and date of a transaction are unusual, it can directly point to credit card fraud, helping cardholders and card issuers to avoid experiencing different kinds of credit card fraud (Rtayli and Enneya, 2020). The location of the cardholder, in turn, can deliver insights into their spending habits, which is very effective and beneficial in helping financial institutions detect fraudulent activities associated with credit card transactions. The currency of the transaction is also a significant predictive feature: it can help the cardholder and the card issuer identify the area of the transaction, i.e. whether the transaction originated within the home country or outside it (Bin Sulaiman, Schetinin, and Sant, 2022). The frequency of transactions is another vital aspect that can be considered effective in sensing the likelihood of fraud in credit card transactions.
Based on the analysis section, it can be observed that most banks and financial institutions focus on detecting credit card fraud using two of the most common features: the transaction amount and the transaction frequency. By measuring transaction frequency, cardholders and card issuers are able to identify whether any fraudulent activity is taking place. Machine learning
algorithms can learn the normal frequency with which a credit card is used, which plays a significant role in identifying any abnormal or unusual transaction in an efficient and successful way.
After analyzing the literature review chapter and the analysis section, it can be concluded that banks and other financial institutions should consider the most predictive features of a credit card transaction when predicting credit card fraud.
Based on the literature review, there are different limitations associated with machine learning algorithms in the process of detecting credit card fraud. Some of the major limitations are data quality and availability, interpretability, time and cost, overfitting, imbalanced data, concept drift, and adversarial attacks. All of these limitations can significantly damage the ability of a machine learning algorithm to detect credit card fraud (Shirgave et al., 2019). It is very important to deliver high-quality and sufficient data to the chosen machine learning algorithms in order to accomplish the desired outcomes; delivering high-quality data can sometimes be challenging, which can directly lead to inappropriate and poor results. In addition, the entire process of using machine learning models to detect credit card fraud can be very costly and time-consuming: selecting, training, building and maintaining a machine learning model requires considerable expertise and resources, and the gathered data needs to be regularly validated, updated and cleaned, all of which can be very difficult to perform.
According to the analysis section, it is very important to reduce the limitations of machine learning algorithms in detecting credit card fraud in order to obtain more reliable and successful outcomes (Hussein et al., 2021). This section also described how these different limitations can damage the ability of a machine learning model to make accurate predictions about credit card fraud. After discussing the literature review section and the discussion chapter, it can finally be stated that
different kinds of limitations associated with the machine learning algorithms in the process of
predicting credit card fraud have played a major role in encouraging financial institutions to incorporate multiple effective strategies in order to obtain the ideal and expected predictions.
According to the literature review, it can finally be stated that there are multiple ethical and effective fraud detection strategies that help machine learning to identify credit card fraud more efficiently. The major strategies for credit card fraud detection are selecting the ideal machine learning algorithm, analyzing multiple data sources, monitoring in real time, paying attention to false positives, and delivering fraud education to the users and employees within organizations.
Based on the analysis section, it can be asserted that employees and users need to be aware of all the precautions that can help them avoid being affected by different kinds of credit card fraud.
After discussing the literature review and the analysis, it can be stated that incorporating multiple strategies to prevent credit card fraud can help financial institutions avoid the risk of facing huge financial and reputational losses.
Chapter 6: Conclusion and Recommendation
6.1 Conclusion
With the growth of technology, fraudulent activities associated with credit card transactions are increasing day by day. Credit card fraud can directly lead to reputational and financial damage, so it is very important for credit card users to avoid facing different kinds of credit card fraud. Different kinds of machine learning algorithms can play a significant role in predicting fraudulent activities associated with credit card transactions, which can play a major role in helping the financial
institution to avoid credit card fraud. This particular study on the topic “Credit Card Fraud
Detection” has specifically focused on contributing to the understanding of credit card fraud and
providing insights into how credit card issuers can continue to improve their security measures to
prevent fraudulent use. The introduction section has played a major role in structuring effective
research objectives, which played a significant role in the process of helping the whole study to
discuss the research topic in a more efficient and precise way. The study has discussed different
kinds of machine learning algorithms that can be utilized in the process of detecting a variety of
credit card frauds. The algorithms that have been discussed in this study are random forest,
decision tree, support vector machine, and more. It is very important to select the ideal machine
learning model according to the situation in order to achieve the expected prediction. All these
three machine learning algorithms follow their own way in the process of predicting different
kinds of fraud. On the other side, the study has also focused on addressing the most predictive
features of credit card transactions, which can play a significant role in the process of forecasting
the chances of credit card fraud. A range of limitations of machine learning algorithms in the
process of credit card fraud detection has also been discussed in this research paper. These
limitations may play a major role in the process of affecting the machine learning algorithms’
ability to detect credit card fraud. Some of these limitations are data availability and quality,
interpretability, cost and time, overfitting, imbalanced data, concept drift, and adversarial attack.
Credit card users and credit card issuers should successfully implement different kinds of
strategies in order to avoid the chances of experiencing different kinds of fraudulent activities
related to credit card transactions. The section on the literature gap helped the study to identify
the significant areas that have not been discussed in this specific research paper. The interpretivism research philosophy is utilized in this paper in order to address the research objectives in a more effective and precise way. The secondary data collection method is used in this study, and the secondary sources utilized in developing this research paper are newspapers, journals, reports, magazines, books, and websites. The quantitative data gathered through the secondary data collection method is analyzed with the help of machine learning algorithms and statistical methods, as described in the methodology. Illegal transactions made using credit cards constitute credit card fraud; these fraudulent actions must be dealt with by proper monitoring of transactions and by using an effective fraud detection system based on machine learning. Various kinds of machine learning processes and related algorithms have helped companies as well as users to detect fraud and safeguard their interests. More extensive research needs to be done in this respect to improve the efficiency of machine learning systems in credit card fraud detection.
6.2 Recommendation
One of the most efficient approaches to credit card fraud detection is the implementation of machine learning algorithms. By leveraging historical transaction data, such algorithms can be trained to recognize anomalies and patterns associated with deceptive activities. Different kinds of features, such as time, location, transaction amount, and user behaviour, can be utilized to train ML models that precisely identify possible fraud. Moreover, incorporating anomaly detection techniques and real-time monitoring can improve the system's capability to identify and prevent fraudulent transactions, providing a strong defence against credit card fraud.
6.3 Future Scope
Besides machine learning algorithms, there are different kinds of advanced and innovative technologies that can play a major role in detecting credit card fraud. Future studies on the topic of credit card fraud detection can focus on such innovative technologies beyond the machine learning models discussed here. Future research could also use both primary and secondary data collection methods in order to improve the quality and reliability of the research.
References
Abdulghani, A.Q., Uçan, O.N. and Alheeti, K.M.A., 2021, December. Credit card fraud detection using XGBoost algorithm. In 2021 14th International Conference on Developments in eSystems Engineering (DeSE) (pp. 487-492). IEEE.
Ahmad, H., Kasasbeh, B., Aldabaybah, B. and Rawashdeh, E., 2023. Class balancing framework for credit card fraud detection based on clustering and similarity-based selection (SBS). International Journal of Information Technology, 15(1), pp.325-333.
Alharbi, A., Alshammari, M., Okon, O.D., Alabrah, A., Rauf, H.T., Alyami, H. and Meraj, T., 2022. A novel text2IMG mechanism of credit card fraud detection: a deep learning approach. Electronics, 11(5), p.756.
Al-Shabi, M.A., 2019. Credit card fraud detection using autoencoder model in unbalanced datasets. Journal of Advances in Mathematics and Computer Science, 33(5), pp.1-16.
Arya, M. and Sastry G, H., 2020. DEAL–‘Deep Ensemble ALgorithm’ framework for credit card fraud detection in real-time data stream with Google TensorFlow. Smart Science, 8(2), pp.71-83.
Baier, L., Jöhren, F. and Seebacher, S., 2019, June. Challenges in the Deployment and Operation of Machine Learning in Practice. In ECIS (Vol. 1).
Bhatt, U., Xiang, A., Sharma, S., Weller, A., Taly, A., Jia, Y., Ghosh, J., Puri, R., Moura, J.M. and Eckersley, P., 2020, January. Explainable machine learning in deployment. In Proceedings of the 2020 conference on fairness, accountability, and transparency (pp. 648-657).
Bin Sulaiman, R., Schetinin, V. and Sant, P., 2022. Review of Machine Learning Approach on Credit Card Fraud Detection. Human-Centric Intelligent Systems, 2(1-2), pp.55-68.
Dang, T.K., Tran, T.C., Tuan, L.M. and Tiep, M.V., 2021. Machine Learning Based on Resampling Approaches and Deep Reinforcement Learning for Credit Card Fraud Detection Systems. Applied Sciences, 11(21), p.10004.
Elshawi, R., Maher, M. and Sakr, S., 2019. Automated machine learning: State-of-the-art and open challenges. arXiv preprint arXiv:1906.02287.
Esenogho, E., Mienye, I.D., Swart, T.G., Aruleba, K. and Obaido, G., 2022. A neural network ensemble with feature engineering for improved credit card fraud detection. IEEE Access, 10, pp.16400-16407.
Gevorkyan, M.N., Demidova, A.V., Demidova, T.S. and Sobolev, A.A., 2019. Review and comparative analysis of machine learning libraries for machine learning. Discrete and Continuous Models and Applied Computational Science, 27(4), pp.305-315.
Guo, Q., Chen, S., Xie, X., Ma, L., Hu, Q., Liu, H., Liu, Y., Zhao, J. and Li, X., 2019, November. An empirical study towards characterizing deep learning development and deployment across different frameworks and platforms. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) (pp. 810-822). IEEE.
Gupta, A. and Gupta, N., 2022. Research methodology. SBPD Publications.
Gupta, A., Lohani, M.C. and Manchanda, M., 2021. Financial fraud detection using naive bayes algorithm in highly imbalance data set. Journal of Discrete Mathematical Sciences and Cryptography, 24(5), pp.1559-1572.
Ho, W.K., Tang, B.S. and Wong, S.W., 2021. Predicting property prices with machine learning algorithms. Journal of Property Research, 38(1), pp.48-70.
Hussein, A.S., Khairy, R.S., Najeeb, S.M.M. and ALRikabi, H.T., 2021. Credit Card Fraud Detection Using Fuzzy Rough Nearest Neighbor and Sequential Minimal Optimization with Logistic Regression. International Journal of Interactive Mobile Technologies, 15(5).
Ileberi, E., Sun, Y. and Wang, Z., 2022. A machine learning based credit card fraud detection using the GA algorithm for feature selection. Journal of Big Data, 9(1), pp.1-17.
Jain, V., Agrawal, M. and Kumar, A., 2020, June. Performance analysis of machine learning algorithms in credit cards fraud detection. In 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)(ICRITO) (pp. 86-88). IEEE.
kaggle.com, 2023. Credit Card Fraud Detection. Anonymized credit card transactions labeled as fraudulent or genuine [Online]. Available at: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud. [Accessed on: 19/5/2023]
Kumar, R., 2018. Research methodology: A step-by-step guide for beginners. Sage.
Melnikovas, A., 2018. Towards an explicit research methodology: Adapting research onion model for futures studies. Journal of futures Studies, 23(2), pp.29-44.
Min, Q., Lu, Y., Liu, Z., Su, C. and Wang, B., 2019. Machine learning based digital twin framework for production optimization in petrochemical industry. International Journal of Information Management, 49, pp.502-519.
Mohajan, H.K., 2018. Qualitative research methodology in social sciences and related subjects. Journal of economic development, environment and people, 7(1), pp.23-48.
Mukherjee, S.P., 2019. A guide to research methodology: An overview of research problems, tasks and methods.
Naveen, P. and Diwan, B., 2020, October. Relative Analysis of ML Algorithm QDA, LR and SVM for Credit Card Fraud Detection Dataset. In 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC) (pp. 976-981). IEEE.
Orth, C.D.O. and Maçada, A.C.G., 2021. Corporate fraud and relationships: a systematic literature review in the light of research onion. Journal of Financial Crime, 28(3), pp.741-764.
Pandey, P. and Pandey, M.M., 2021. Research methodology tools and techniques. Bridge Center.
Patel, M. and Patel, N., 2019. Exploring Research Methodology. International Journal of Research and Review, 6(3), pp.48-55.
Ray, S., 2019, February. A quick review of machine learning algorithms. In 2019 International conference on machine learning, big data, cloud and parallel computing (COMITCon) (pp. 35-39). IEEE.
Rtayli, N. and Enneya, N., 2020. Enhanced credit card fraud detection based on SVM-recursive feature elimination and hyper-parameters optimization. Journal of Information Security and Applications, 55, p.102596.
Saheed, Y.K., Baba, U.A. and Raji, M.A., 2022. Big Data Analytics for Credit Card Fraud Detection Using Supervised Machine Learning Models. In Big Data Analytics in the Insurance Market (pp. 31-56). Emerald Publishing Limited.
Sarker, I.H., 2021. Machine learning: Algorithms, real-world applications and research directions. SN computer science, 2(3), p.160.
Shirgave, S., Awati, C., More, R. and Patil, S., 2019. A review on credit card fraud detection using machine learning. International Journal of Scientific & technology research, 8(10), pp.1217-1220.
Shukur, H.A. and Kurnaz, S., 2019. Credit card fraud detection using machine learning methodology. International Journal of Computer Science and Mobile Computing, 8(3), pp.257-260.
Silaparasetty, N. and Silaparasetty, N., 2020. The tensorflow machine learning library. Machine Learning Concepts with Python and the Jupyter Notebook Environment: Using Tensorflow 2.0, pp.149-171.
Snyder, H., 2019. Literature review as a research methodology: An overview and guidelines. Journal of business research, 104, pp.333-339.
Suresh, H. and Guttag, J., 2021. A framework for understanding sources of harm throughout the machine learning life cycle. In Equity and access in algorithms, mechanisms, and optimization (pp. 1-9).
Tamminen, K.A. and Poucher, Z.A., 2020. Research philosophies. In The Routledge international encyclopedia of sport and exercise psychology (pp. 535-549). Routledge.
Williamson, T., 2021. The philosophy of philosophy. John Wiley & Sons.
Yang, L. and Shami, A., 2020. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing, 415, pp.295-316.
Zioviris, G., Kolomvatsos, K. and Stamoulis, G., 2022. Credit card fraud detection using a deep learning multistage model. The Journal of Supercomputing, 78(12), pp.14571-14596.
Žukauskas, P., Vveinhardt, J. and Andriukaitienė, R., 2018. Philosophy and paradigm of scientific research. Management culture and corporate social responsibility, 121, p.139.
Appendix
Python Code
from google.colab import drive
drive.mount('/content/drive')
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA, TruncatedSVD
import matplotlib.patches as mpatches
import time
# Classifier Libraries
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
import collections
# Other Libraries
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from imblearn.pipeline import make_pipeline as imbalanced_make_pipeline
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NearMiss
from imblearn.metrics import classification_report_imbalanced
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, accuracy_score, classification_report
from collections import Counter
from sklearn.model_selection import KFold, StratifiedKFold
import warnings
warnings.filterwarnings("ignore")
df = pd.read_csv('/content/drive/MyDrive/creditcard/creditcard.csv')
df.head()
df.describe()
df.isnull().sum().max()
print('No Frauds', round(df['Class'].value_counts()[0]/len(df) * 100,2), '% of the dataset')
print('Frauds', round(df['Class'].value_counts()[1]/len(df) * 100,2), '% of the dataset')
fig, ax = plt.subplots(1, 2, figsize=(18,4))
amount_val = df['Amount'].values
time_val = df['Time'].values
sns.distplot(amount_val, ax=ax[0], color='r')
ax[0].set_title('Distribution of Transaction Amount', fontsize=14)
ax[0].set_xlim([min(amount_val), max(amount_val)])
sns.distplot(time_val, ax=ax[1], color='b')
ax[1].set_title('Distribution of Transaction Time', fontsize=14)
ax[1].set_xlim([min(time_val), max(time_val)])
plt.show()
from sklearn.preprocessing import StandardScaler, RobustScaler
# RobustScaler is less prone to outliers.
std_scaler = StandardScaler()
rob_scaler = RobustScaler()
df['scaled_amount'] = rob_scaler.fit_transform(df['Amount'].values.reshape(-1,1))
df['scaled_time'] = rob_scaler.fit_transform(df['Time'].values.reshape(-1,1))
df.drop(['Time','Amount'], axis=1, inplace=True)
scaled_amount = df['scaled_amount']
scaled_time = df['scaled_time']
df.drop(['scaled_amount', 'scaled_time'], axis=1, inplace=True)
df.insert(0, 'scaled_amount', scaled_amount)
df.insert(1, 'scaled_time', scaled_time)
# Amount and Time are Scaled!
df.head()
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedShuffleSplit
print('No Frauds', round(df['Class'].value_counts()[0]/len(df) * 100,2), '% of the dataset')
print('Frauds', round(df['Class'].value_counts()[1]/len(df) * 100,2), '% of the dataset')
X = df.drop('Class', axis=1)
y = df['Class']
sss = StratifiedKFold(n_splits=5, random_state=None, shuffle=False)
for train_index, test_index in sss.split(X, y):
print("Train:", train_index, "Test:", test_index)
original_Xtrain, original_Xtest = X.iloc[train_index], X.iloc[test_index]
original_ytrain, original_ytest = y.iloc[train_index], y.iloc[test_index]
# Turn into an array
original_Xtrain = original_Xtrain.values
original_Xtest = original_Xtest.values
original_ytrain = original_ytrain.values
original_ytest = original_ytest.values
# See if both the train and test label distribution are similarly distributed
train_unique_label, train_counts_label = np.unique(original_ytrain, return_counts=True)
test_unique_label, test_counts_label = np.unique(original_ytest, return_counts=True)
print('-' * 100)
print('Label Distributions: \n')
print(train_counts_label/ len(original_ytrain))
print(test_counts_label/ len(original_ytest))
df = df.sample(frac=1)
# amount of fraud classes 492 rows.
fraud_df = df.loc[df['Class'] == 1]
non_fraud_df = df.loc[df['Class'] == 0][:492]
normal_distributed_df = pd.concat([fraud_df, non_fraud_df])
# Shuffle dataframe rows
new_df = normal_distributed_df.sample(frac=1, random_state=42)
new_df.head()
new_df_corr = df.corr()
new_df_corr
masking = np.triu(new_df_corr)
plt.figure(figsize = (25, 15))
plt.title('Correlation Matrix')
sns.heatmap(new_df_corr, cmap = 'viridis', annot = True, mask = masking, linecolor = 'white',
linewidths = 0.5, fmt = '.3f')
new_df.corr()["Class"]
f, axes = plt.subplots(ncols=4, figsize=(20,4))
# Negative Correlations with our Class (the lower the feature value, the more likely it is a fraud transaction)
sns.boxplot(x="Class", y="V17", data=new_df, ax=axes[0])
axes[0].set_title('V17 vs Class Negative Correlation')
sns.boxplot(x="Class", y="V14", data=new_df, ax=axes[1])
axes[1].set_title('V14 vs Class Negative Correlation')
sns.boxplot(x="Class", y="V12", data=new_df, ax=axes[2])
axes[2].set_title('V12 vs Class Negative Correlation')
sns.boxplot(x="Class", y="V10", data=new_df, ax=axes[3])
axes[3].set_title('V10 vs Class Negative Correlation')
plt.show()
from scipy.stats import norm
f, (ax1, ax2, ax3) = plt.subplots(1,3, figsize=(20, 6))
v14_fraud_dist = new_df['V14'].loc[new_df['Class'] == 1].values
sns.distplot(v14_fraud_dist,ax=ax1, fit=norm, color='#FB8861')
ax1.set_title('V14 Distribution \n (Fraud Transactions)', fontsize=14)
v12_fraud_dist = new_df['V12'].loc[new_df['Class'] == 1].values
sns.distplot(v12_fraud_dist,ax=ax2, fit=norm, color='#56F9BB')
ax2.set_title('V12 Distribution \n (Fraud Transactions)', fontsize=14)
v10_fraud_dist = new_df['V10'].loc[new_df['Class'] == 1].values
sns.distplot(v10_fraud_dist,ax=ax3, fit=norm, color='#C5B3F9')
ax3.set_title('V10 Distribution \n (Fraud Transactions)', fontsize=14)
plt.show()
plt.figure(figsize = (10, 8))
plt.pie(df['Class'].value_counts(),labels=['No Fraud','Fraud'], autopct='%1.1f%%', explode =
(0.0, 0.1),startangle=50 ,colors = ['yellow','red'], shadow = True)
plt.legend(title = "Class", loc = 'lower right')
plt.show()
v14_fraud = new_df['V14'].loc[new_df['Class'] == 1].values
q25, q75 = np.percentile(v14_fraud, 25), np.percentile(v14_fraud, 75)
print('Quartile 25: {} | Quartile 75: {}'.format(q25, q75))
v14_iqr = q75 - q25
print('iqr: {}'.format(v14_iqr))
v14_cut_off = v14_iqr * 1.5
v14_lower, v14_upper = q25 - v14_cut_off, q75 + v14_cut_off
print('Cut Off: {}'.format(v14_cut_off))
print('V14 Lower: {}'.format(v14_lower))
print('V14 Upper: {}'.format(v14_upper))
outliers = [x for x in v14_fraud if x < v14_lower or x > v14_upper]
print('Feature V14 Outliers for Fraud Cases: {}'.format(len(outliers)))
print('V14 outliers: {}'.format(outliers))
new_df = new_df.drop(new_df[(new_df['V14'] > v14_upper) | (new_df['V14'] < v14_lower)].index)
print('----' * 44)
# -----> V12 removing outliers from fraud transactions
v12_fraud = new_df['V12'].loc[new_df['Class'] == 1].values
q25, q75 = np.percentile(v12_fraud, 25), np.percentile(v12_fraud, 75)
v12_iqr = q75 - q25
v12_cut_off = v12_iqr * 1.5
v12_lower, v12_upper = q25 - v12_cut_off, q75 + v12_cut_off
print('V12 Lower: {}'.format(v12_lower))
print('V12 Upper: {}'.format(v12_upper))
outliers = [x for x in v12_fraud if x < v12_lower or x > v12_upper]
print('V12 outliers: {}'.format(outliers))
print('Feature V12 Outliers for Fraud Cases: {}'.format(len(outliers)))
new_df = new_df.drop(new_df[(new_df['V12'] > v12_upper) | (new_df['V12'] < v12_lower)].index)
print('Number of Instances after outliers removal: {}'.format(len(new_df)))
print('----' * 44)
# Removing outliers V10 Feature
v10_fraud = new_df['V10'].loc[new_df['Class'] == 1].values
q25, q75 = np.percentile(v10_fraud, 25), np.percentile(v10_fraud, 75)
v10_iqr = q75 - q25
v10_cut_off = v10_iqr * 1.5
v10_lower, v10_upper = q25 - v10_cut_off, q75 + v10_cut_off
print('V10 Lower: {}'.format(v10_lower))
print('V10 Upper: {}'.format(v10_upper))
outliers = [x for x in v10_fraud if x < v10_lower or x > v10_upper]
print('V10 outliers: {}'.format(outliers))
print('Feature V10 Outliers for Fraud Cases: {}'.format(len(outliers)))
new_df = new_df.drop(new_df[(new_df['V10'] > v10_upper) | (new_df['V10'] < v10_lower)].index)
print('Number of Instances after outliers removal: {}'.format(len(new_df)))
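The same IQR rule (lower bound = Q25 - 1.5*IQR, upper bound = Q75 + 1.5*IQR) is applied three times above; a small helper such as the following could replace the repetition (a refactoring sketch only; remove_iqr_outliers is not part of the original code):
# Refactoring sketch (hypothetical helper, not used above): IQR-based outlier removal
# for a given feature, with the cut-offs computed on the fraud class only.
def remove_iqr_outliers(frame, feature, k=1.5):
    values = frame[feature].loc[frame['Class'] == 1].values
    q25, q75 = np.percentile(values, 25), np.percentile(values, 75)
    cut_off = (q75 - q25) * k
    lower, upper = q25 - cut_off, q75 + cut_off
    return frame.drop(frame[(frame[feature] > upper) | (frame[feature] < lower)].index)
# Example usage:
# for feature in ['V14', 'V12', 'V10']:
#     new_df = remove_iqr_outliers(new_df, feature)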
f,(ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20,6))
colors = ['#B3F9C5', '#f9c5b3']
# Boxplots with outliers removed
# Feature V14
sns.boxplot(x="Class", y="V14", data=new_df,ax=ax1, palette=colors)
ax1.set_title("V14 Feature \n Reduction of outliers", fontsize=14)
ax1.annotate('Fewer extreme \n outliers', xy=(0.98, -17.5), xytext=(0, -12), arrowprops=dict(facecolor='black'), fontsize=14)
# Feature 12
sns.boxplot(x="Class", y="V12", data=new_df, ax=ax2, palette=colors)
ax2.set_title("V12 Feature \n Reduction of outliers", fontsize=14)
ax2.annotate('Fewer extreme \n outliers', xy=(0.98, -17.3), xytext=(0, -12), arrowprops=dict(facecolor='black'), fontsize=14)
# Feature V10
sns.boxplot(x="Class", y="V10", data=new_df, ax=ax3, palette=colors)
ax3.set_title("V10 Feature \n Reduction of outliers", fontsize=14)
ax3.annotate('Fewer extreme \n outliers', xy=(0.95, -16.5), xytext=(0, -12), arrowprops=dict(facecolor='black'), fontsize=14)
plt.show()
X = new_df.drop('Class', axis=1)
y = new_df['Class']
# T-SNE Implementation
t0 = time.time()
X_reduced_tsne = TSNE(n_components=2, random_state=42).fit_transform(X.values)
t1 = time.time()
print("T-SNE took {:.2} s".format(t1 - t0))
# PCA Implementation
t0 = time.time()
X_reduced_pca = PCA(n_components=2, random_state=42).fit_transform(X.values)
t1 = time.time()
print("PCA took {:.2} s".format(t1 - t0))
# TruncatedSVD
t0 = time.time()
X_reduced_svd = TruncatedSVD(n_components=2, algorithm='randomized', random_state=42).fit_transform(X.values)
t1 = time.time()
print("Truncated SVD took {:.2} s".format(t1 - t0))
X = new_df.drop('Class', axis=1)
y = new_df['Class']
from sklearn.model_selection import train_test_split
# This split is applied to the undersampled dataframe (new_df)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
def perform(y_pred):
    # Print the headline metrics, the confusion matrix and the full classification report
    print("Precision : ", precision_score(y_test, y_pred))
    print("Recall : ", recall_score(y_test, y_pred))
    print("Accuracy : ", accuracy_score(y_test, y_pred))
    print("F1 Score : ", f1_score(y_test, y_pred))
    print('')
    print(confusion_matrix(y_test, y_pred), '\n')
    print(classification_report(y_test, y_pred))
    cm = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix(y_test, y_pred), display_labels=['No Fraud', 'Fraud'])
    cm.plot()
model_lr = LogisticRegression()
model_lr.fit(x_train, y_train)
from sklearn.metrics import (precision_score, recall_score, accuracy_score, f1_score,
                             confusion_matrix, ConfusionMatrixDisplay, classification_report,
                             PrecisionRecallDisplay, RocCurveDisplay)
y_pred_lr = model_lr.predict(x_test)
perform(y_pred_lr)
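Since x_test comes from the 50/50 undersampled frame, the scores above can overstate real-world performance; an additional check on the imbalanced hold-out arrays created near the start of this section gives a more realistic picture (an added sketch; original_Xtest and original_ytest are the arrays defined earlier):
# Added sketch: evaluate the undersample-trained model on the original, imbalanced
# hold-out split for a more realistic estimate of precision and recall.
y_pred_original = model_lr.predict(original_Xtest)
print('Recall on imbalanced test set   :', recall_score(original_ytest, y_pred_original))
print('Precision on imbalanced test set:', precision_score(original_ytest, y_pred_original))
print(confusion_matrix(original_ytest, y_pred_original))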
model_svc = SVC()
model_svc.fit(x_train, y_train)
y_pred_svc = model_svc.predict(x_test)
perform(y_pred_svc)
model_rf = RandomForestClassifier()
model_rf.fit(x_train, y_train)
y_pred_rf = model_rf.predict(x_test)
perform(y_pred_rf)
model_dt = DecisionTreeClassifier()
model_dt.fit(x_train, y_train)
y_pred_dt = model_dt.predict(x_test)
perform(y_pred_dt)
model_nb = GaussianNB()
model_nb.fit(x_train, y_train)
y_pred_nb = model_nb.predict(x_test)
perform(y_pred_nb)
model_knn = KNeighborsClassifier(n_neighbors=3)
model_knn.fit(x_train, y_train)
y_pred_knn = model_knn.predict(x_test)
perform(y_pred_knn)
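To make the model comparison easier to reproduce, the six fitted classifiers can also be evaluated in a loop and their headline metrics collected into one table (a sketch assuming all of the models above have been fitted; the results dataframe is not part of the original code):
# Summary sketch (added): collect headline metrics for the fitted models in one table
models = {
    'Logistic Regression': model_lr,
    'SVC': model_svc,
    'Random Forest': model_rf,
    'Decision Tree': model_dt,
    'Gaussian NB': model_nb,
    'KNN': model_knn,
}
rows = []
for name, model in models.items():
    pred = model.predict(x_test)
    rows.append({
        'Model': name,
        'Precision': precision_score(y_test, pred),
        'Recall': recall_score(y_test, pred),
        'F1': f1_score(y_test, pred),
        'Accuracy': accuracy_score(y_test, pred),
    })
results = pd.DataFrame(rows).set_index('Model')
print(results)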
fig, ax = plt.subplots()
plt.title('Precision-Recall Curve')
PrecisionRecallDisplay.from_predictions(y_test, y_pred_nb, name='Gaussian Naive Bayes', ax=ax, color='red')
PrecisionRecallDisplay.from_predictions(y_test, y_pred_knn, name='KNeighborsClassifier', ax=ax, color='pink')
PrecisionRecallDisplay.from_predictions(y_test, y_pred_lr, name='Logistic Regression', ax=ax, color='blue')
PrecisionRecallDisplay.from_predictions(y_test, y_pred_dt, name='Decision Tree', ax=ax, color='brown')
PrecisionRecallDisplay.from_predictions(y_test, y_pred_rf, name='Random Forest', ax=ax, color='yellow')
PrecisionRecallDisplay.from_predictions(y_test, y_pred_svc, name='SVC', ax=ax, color='orange')
plt.legend(loc = 'best', fontsize = '6.8')
fig, ax = plt.subplots()
plt.title('ROC Curve')
RocCurveDisplay.from_predictions(y_test, y_pred_nb, name='Gaussian Naive Bayes', ax=ax, color='red')
RocCurveDisplay.from_predictions(y_test, y_pred_knn, name='KNeighborsClassifier', ax=ax, color='pink')
RocCurveDisplay.from_predictions(y_test, y_pred_lr, name='Logistic Regression', ax=ax, color='blue')
RocCurveDisplay.from_predictions(y_test, y_pred_dt, name='Decision Tree', ax=ax, color='brown')
RocCurveDisplay.from_predictions(y_test, y_pred_rf, name='Random Forest', ax=ax, color='yellow')
RocCurveDisplay.from_predictions(y_test, y_pred_svc, name='SVC', ax=ax, color='orange')
plt.plot([0, 1], [0, 1], linestyle = "--", color = 'black')
plt.legend(loc = 'best', fontsize = '6.8')
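Because the curves above are drawn from hard 0/1 predictions, each classifier contributes only a single operating point; passing predicted probabilities (or decision scores) traces the full curves and lets ROC-AUC be compared directly (an illustrative sketch assuming the fitted models above):
# Illustrative sketch (added): score-based ROC-AUC, which uses the full ranking of
# predictions rather than the hard class labels used in the plots above.
from sklearn.metrics import roc_auc_score
print('Logistic Regression AUC:', roc_auc_score(y_test, model_lr.predict_proba(x_test)[:, 1]))
print('Random Forest AUC      :', roc_auc_score(y_test, model_rf.predict_proba(x_test)[:, 1]))
print('Gaussian NB AUC        :', roc_auc_score(y_test, model_nb.predict_proba(x_test)[:, 1]))
print('KNN AUC                :', roc_auc_score(y_test, model_knn.predict_proba(x_test)[:, 1]))
print('Decision Tree AUC      :', roc_auc_score(y_test, model_dt.predict_proba(x_test)[:, 1]))
# SVC() is fitted without probability=True, so its decision scores are used instead
print('SVC AUC                :', roc_auc_score(y_test, model_svc.decision_function(x_test)))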