Application of Deep Learning for Fraud Detection in E-payment System
Fraud detection in online shopping systems is an active research topic. Fraud investigators, banking systems, and electronic payment systems such as PayPal need efficient and sophisticated fraud detection systems to counter fraud activities that change rapidly. According to a 2017 CyberSource report, fraud loss by order channel, that is, the percentage of fraud loss, was 74 percent in web stores and 49 percent in mobile channels. Based on this information, the task is to identify anomalies in patterns of fraud behavior that have changed relative to the past. A good fraud detection system should identify fraudulent transactions accurately and should do so in real time. Fraud detection can be divided into two groups: anomaly detection and misuse detection. Anomaly detection systems are trained on normal transactions and use techniques to detect novel frauds. Conversely, a misuse detection system is trained on historical transactions in the database that are labeled as normal or fraudulent. Misuse detection therefore entails supervised learning, whereas anomaly detection entails unsupervised learning.
Electronic fraud is increasing with the expansion of modern technology and global communication. This increase in fraudulent transactions results in substantial losses to businesses, and fraud detection has therefore become an important issue. Fraud detection can be seen as the problem of classifying transactions as legitimate or fraudulent. Existing fraud detection techniques have been implemented with a number of methods, such as data mining, statistics, and artificial intelligence. In general, fraud detection is a prediction problem whose objective is to maximize correct predictions while keeping incorrect predictions at an acceptable level of cost. Recent studies have shown that data mining using Artificial Intelligence (AI) techniques achieves better performance than traditional statistical methods for building prediction models [2, 12]. AI techniques, particularly rule-based expert systems, case-based reasoning systems, and machine learning (ML) techniques such as neural networks, have been used to support such analysis and classification problems. The major difference between traditional statistical methods and machine learning methods is that in statistical methods researchers usually impose a structure on the model and construct it by estimating parameters to fit the data or observations, while machine learning techniques learn the structure of the model from the data. As a result, the models used in statistical methods are relatively simple, easy to interpret, and tend to under-fit the data, while models obtained with machine learning methods are usually very complicated, hard to explain, and tend to over-fit the data.
Under-fitting and over-fitting reflect the trade-off between the explanatory power and the parsimony of a model, where explanatory power leads to high prediction accuracy and parsimony usually assures the generalizability and interpretability of the model.
DEEP LEARNING TECHNIQUE FOR FRAUD DETECTION IN E-PAYMENT
Deep learning is a state-of-the-art technology that has recently attracted considerable attention in IT circles. In principle, a deep learning model is an artificial neural network (ANN) that has many hidden layers. Conversely, non-deep feed-forward neural networks have only a single hidden layer. The given picture shows the comparison between non-deep learning and deep learning with hidden layers. We have now introduced AI, ML, and deep learning (DL). If these three terms are metaphorically equated with the human body, they compare as follows: artificial intelligence is like the body as a whole, which has the traits of intelligence, reasoning, communication, emotion, and feeling. ML is like one system that acts in the body, for example the visual system. Finally, deep learning is comparable to the visual signaling mechanism, which consists of a number of cells, such as those of the retina, that act as receptors and translate light signals into nerve signals. Deep learning is a generic term for multilayer neural networks. Many algorithms build on deep learning, such as the autoencoder (AE) and the deep convolutional network, and these are often compared with shallow methods such as the support vector machine. One difficulty in selecting an algorithm is that the developer must understand both the real problem and what each deep learning algorithm does. Three deep learning algorithms that perform unsupervised learning are the restricted Boltzmann machine (RBM), the AE, and the sparse coding model. Unsupervised learning automatically extracts meaningful features from the data, leverages the availability of unlabeled data, and adds a data-dependent regularization for training. In this study, we use an AE for credit card fraud detection. An AE is trained so that its output equals its input, passing the data through a hidden layer that has more or fewer units than the input layer.
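The autoencoder idea can be illustrated with a deliberately tiny sketch. It is not the model used in this study: it assumes two made-up input features, a single linear hidden unit, and plain batch gradient descent, and every name in it (train_autoencoder, reconstruction_error, the toy data, the threshold rule) is hypothetical. The network is trained to reproduce normal transactions only, so a transaction that does not follow the learned pattern is reconstructed poorly and can be flagged.

```python
# Minimal linear autoencoder: 2 inputs -> 1 hidden unit -> 2 outputs.
# Assumptions (not from the paper): linear units, batch gradient descent,
# toy two-feature data in which normal transactions satisfy x2 = 2 * x1.

def encode(x, enc):
    """Compress a 2-feature input into one hidden value."""
    return enc[0] * x[0] + enc[1] * x[1]

def reconstruct(x, enc, dec):
    """Expand the hidden value back into 2 output features."""
    h = encode(x, enc)
    return [dec[0] * h, dec[1] * h]

def reconstruction_error(x, enc, dec):
    """Squared distance between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(reconstruct(x, enc, dec), x))

def train_autoencoder(data, epochs=500, lr=0.05):
    """Minimise the mean squared reconstruction error on the training data."""
    enc, dec = [0.1, 0.1], [0.1, 0.1]
    n = len(data)
    for _ in range(epochs):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            h = encode(x, enc)
            err = [dec[j] * h - x[j] for j in range(2)]   # output minus input
            back = err[0] * dec[0] + err[1] * dec[1]      # error sent back to h
            for j in range(2):
                g_dec[j] += 2.0 * err[j] * h / n
                g_enc[j] += 2.0 * back * x[j] / n
        enc = [enc[j] - lr * g_enc[j] for j in range(2)]
        dec = [dec[j] - lr * g_dec[j] for j in range(2)]
    return enc, dec

# Train on normal transactions only (unsupervised: no fraud labels needed).
normal = [(i / 10.0, 2.0 * i / 10.0) for i in range(-10, 11)]
enc, dec = train_autoencoder(normal)

# Hypothetical decision rule: flag anything reconstructed much worse than
# the worst normal training example.
threshold = 10.0 * max(reconstruction_error(x, enc, dec) for x in normal) + 1e-6
suspicious = (1.0, -2.0)   # breaks the pattern seen during training
is_anomaly = reconstruction_error(suspicious, enc, dec) > threshold
```

A practical detector would use many more features, non-linear hidden layers, and a threshold chosen on held-out data; the sketch only shows why a large reconstruction error signals an unusual transaction.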
Fraud in e-payment system
With the development of the Internet in the 1990s, the subsequent evolution of electronic payment has given rise to a dynamic business environment in which transactions take place without face-to-face interaction. The International Telecommunication Union reported that the Internet is quickly becoming the first stop for people deciding whether to buy services and products online, and that the number of Internet users reached 2.3 billion in 2011. The increase in the volume of transactions has given rise to numerous electronic payment (e-payment) systems. Studies (Manning 1998, Wortington 2000) agree that electronic payment transactions, such as automatic teller machines (ATM), credit and debit cards, direct deposit, and direct payment, have been in use for quite some years. The Innopay Online payment report (2011) found massive growth in the market for digital goods, which has given rise to numerous payment systems that make paying over the Internet easier for consumers. E-payment ensures smooth, secure, and efficient transactions in e-business. However, as e-payment methods have expanded, fraud has inevitably kept pace. As a result, consumers face a number of risks to their personal information and have second thoughts about giving their credit account information over the Internet (Centeno 2002). The Consumer Sentinel Network (CSN) (2012) found that more than 7 million consumer complaints were received from 2007 to 2011 and that 990,242 of these complaints were related to fraud. Consumers reported paying over $1.5 billion in those fraud complaints. CyberSource (2012) found that merchants reported losing an average of 1.0% of total online revenue to fraud. It is also reported that the fraud rate of international orders is 2% higher than that of domestic orders. For merchants, managing e-fraud remains a major and growing cost.
Fraud in e-payment transactions is a global problem. Fraudsters operate in all countries and industries. The purpose of this research is to study fraud in e-payment transactions in order to point out prospective threats and the countermeasures required to reduce fraud. This paper discusses e-fraud (electronic fraud) and the different types of fraud in e-payment transactions.
OVERVIEW OF E-PAYMENT FRAUDS
Graycar & Smith (2002) define fraud as an “act or instance of deception, an artifice by which the right or interest of another is injured, a dishonest trick or stratagem.” Bergmen (2005) defines e-fraud as “a deception deliberately practiced to secure unfair or unlawful gain where some part of the communication between the victim and the fraudster is via a computer network and/or some action of the victim and/or the fraudster is performed on the computer network.” The USA Department of Justice (DOJ) defines e-fraud as “a fraud scheme that uses one or more components of the Internet – such as chat rooms, e-mails, message boards, or web sites – to present fraudulent solicitations to prospective victims, to conduct fraudulent transactions, or to transmit the proceeds of fraud to financial institutions or to others connected with the scheme.” With more and more people using the Internet, e-fraud is becoming common because the Internet allows fraudsters to hide their real identity and remain anonymous. As the Internet increases business opportunities, criminals develop more sophisticated and effective ways to scam online. The Commission of European Committee (2008) report summarized the fraud problem by saying “Fraud against means of payment (payment fraud) remains a threat to the success of the internal market for payments. Payment fraud affects the consumer confidence in non-cash means of payment and ultimately the real economy.” Organizations find that fraud in e-payment transactions is increasing year after year. The Association for Financial Professionals (AFP) (2012) reported that the percentage of organizations subject to attempted and/or actual payments fraud increased from 2004 to 2009, while 2010 and 2011 showed a decline in attempted and actual payments fraud.
The report also showed that larger organizations are targets of payments fraud more often than smaller ones: 81% of the organizations with annual revenue over $1 billion were victims of payments fraud in 2011, compared with 55% of organizations with less than $1 billion in revenue. It was also observed in 2011 that the larger organizations experienced a decrease in fraud while the smaller organizations continued to experience an increase in fraud activity. According to CyberSource (2012), the loss of online revenue to fraud decreased in 2011, and merchants reported a 33% decrease in the orders lost to fraud, which was due to effective fraud measures. In North America the total revenue loss was approximately $3.4 billion, an increase of around $700 million over 2010. This was due to an increase in the rejection rate since 2009. The merchants reported that 2.8% of orders were rejected on suspicion of payment fraud. International orders were also found to be riskier than domestic orders: merchants reported that the international order fraud rate is three times the domestic rate.
TYPES OF E-FRAUDS
In order to assess the risks of payment fraud and combat it, its many facets must be understood. E-payment fraud takes many forms, and there is no exact number or fixed list of types. Frauds are classified as online frauds and offline frauds. Online fraud occurs when a fraudster poses as a legitimate company to obtain sensitive personal information and illegally conducts transactions on existing accounts. Phishing and spoofing are examples of online fraud. Online fraud also occurs when a fraudster steals personal information such as a credit card number, bank account number, or other identification and uses it repeatedly to open new accounts or conduct transactions in the real individual's or company's name. Offline fraud includes credit card fraud, phone solicitations, print fraud, check scams, and mail fraud. The U.S. Department of Justice (DOJ) has divided computer fraud into three categories: 1) crimes in which computer hardware, peripherals, and software are the target of a crime, wherein the fraudster obtains these objects illegally; 2) crimes in which the computer is the immediate subject of a crime, that is, the attack is on a computer or a system and the damage caused is its destruction or disruption; and 3) crimes in which computers and related systems are the means or “instrument” by which ordinary crimes are committed, such as theft of identities, data, or money, or the distribution of child pornography. There are different types of e-fraud, and each attacks in a slightly different way. Fraud can occur in a number of ways, as listed below.
Account Hacking: Hacking includes gaining illegal entry into a person's personal computer (PC) system. Fraudsters use compromised customer credentials to hijack the origination system and use it in the lawful account holder's name. Corporations are also targeted, and attacks aimed at them are on the rise.
Identity theft: Identity theft/fraud refers to a crime in which a fraudster illegally obtains and uses another person's personal information in some way that involves deception or fraud to gain something of value. Identity theft/fraud is the most serious crime for the person whose information is stolen as well as for the financial institution.
Phishing: Phishing is a well-known technique for obtaining confidential information from a user by posing as a trusted authority. Phishing is an attempt by a fraudster to “fish” for your banking details through emails with attachments or hyperlinks. The e-mail appears to be sent from a legitimate organization in order to trick people into revealing sensitive information. On clicking the attachment or the hyperlink, the computer system gets infected with malware. During the next online transaction the malware activates and steals private and personal financial information, including credit card numbers and PINs, which the fraudster uses to steal money from the account. Malware, or “malicious software”, includes computer viruses, worms, Trojan horses, spyware, and other malicious software.
Spoofing or Website cloning: This is the act of creating a hoax web site, that is, duplicating a website for criminal use. The fraudsters use a legitimate company's name, logos, graphics, and even code. This usually takes the form of known chat rooms or trade sites where people innocently give out personal information to criminals or make a fake purchase of a product that does not exist.
Internet Gambling (Virtual casinos): The Internet has made certain types of gambling possible. A person in India or China can participate from home in an Internet poker game hosted in the Caribbean. According to CERT-LEXSI (2006), as cited by McAfee (2009), there were around 15,000 active online gambling sites in 2006, of which 1,766 operated under license. Although some online casinos operate in an honest manner, the potential for fraud connected with casinos and bookmaking operations is far greater. Online gambling establishments appear and disappear with regularity, collecting from losers and not paying winners, without any fear of being apprehended and prosecuted.
ACH Frauds: Automated Clearing House (ACH) fraud is basically information fraud. With the increase in ACH transactions for corporate payments, there is a corresponding increase in ACH fraud. The fraudsters illegitimately access the account information and routing number to steal funds directly from accounts. Government payments, payroll, and other online payments face these frauds. In 2011, 17% of the organizations that were victims of fraud suffered financial loss (AFP 2012).
Check frauds: Check fraud continues to be a threat to financial security. Electronic check fraud can be easily committed; the fraudster needs only a scanner, a printer, and desktop publishing software. The most common forms of check fraud include altering checks, forging endorsements, counterfeiting checks, and creating remote checks. According to the AFP Report (2011), 14% of the organizations that were victims suffered financial loss due to check fraud.
Lottery frauds: The victim receives scam emails announcing a win of a substantial amount of money in a lottery draw. When the receiver replies, the sender asks for bank account details and other personal information so that the money can be transferred. These emails are fake and may ask for a handling fee, leading to the loss of money and of personal information that may be used in other frauds.
Nigerian advance fee fraud (419 fraud): This e-fraud is the most popular and lucrative fraud and is named after “419”, the section of Nigerian law that covers it. The hoax often arrives as a bulk mailing or as an email that appears to come from a family member, asking the recipient to enter into business and have money transferred in return for a huge commission. Once contact is established, the fraudsters request money in advance, for example to open a bank account or to pay some fee, which leads to trouble and expense.
MEASURES FOR FRAUD PREVENTION AND DETECTION
With the increase in e-commerce sales, merchants face the challenge of reducing fraud in e-payment transactions. E-fraud starts with the diversion of personal information. A poorly protected computer, a trash or recycling bin, an email message, or a chat on the Internet can expose a person to fraud. For merchants, the majority of fraud loss is due to consumers' claims of fraudulent account use and/or subsequent information from additional orders placed by the fraudster. Fraud has become a persistent threat to merchants in e-payment transactions in e-business. It is impossible to eliminate the chance of fraud entirely, but timely measures can reduce it. The merchant and the financial institution take the necessary measures to combat fraud effectively. Fraud prevention involves taking measures to stop fraud from occurring; when fraud prevention fails, the merchant takes steps to detect the fraud quickly and stop it as soon as possible. Fraud prevention and detection involve planning, detecting, and avoiding risk. Fraud can be controlled by monitoring Internet threats, understanding the customer, and implementing security measures. Different techniques are required because there are different types of fraud in e-payment transactions.
Fraud Detection Tools
Fraud detection tools are used to assess the probability of fraud in payment transactions. CyberSource (2012) shows that 56% of the merchants surveyed use an automated screening system. Every merchant doing e-business should be aware that fraud cannot be totally eliminated but can be controlled with protective measures. Some of these measures counter internal threats and some stop external threats. Some are relatively inexpensive, while others involve large amounts of money. Anti-fraud tools are required to detect fraud accurately and in time, to automate processes when required, and to adapt to changing patterns of fraud and customer behavior. Some of the anti-fraud tools include:
Universal Payment Identification Code (UPIC): A UPIC is a unique account identifier, issued by a financial institution, that was developed by the Electronic Payment Network (EPN). It allows merchants doing e-business to receive e-payments without disclosing confidential banking information.
ACH Block (Automated Clearing House): This places a “block” that prevents ACH activity on a merchant account that is not authorized for ACH transactions. The merchant can receive alerts from the bank about ACH transactions that do not meet predefined conditions and then decide whether to accept or decline each transaction. This enables the merchant to stop e-fraud before it happens.
Fraud Detection Software/tools: An organization doing e-business should install fraud detection software or tools that can detect fraud and reduce fraud rates. The software reports fraud results, and the merchant can then decide whether to accept, reject, or review the transaction. Fraud detection tools are grouped into several categories: validation services, proprietary data, purchase device tracing, and multi-merchant data. Some of the tools are AVS (Address Verification Service), CVC (Card Verification Code), and Risk Management Modules or Fraud Screens. According to CyberSource (2012), 56% of the merchants surveyed made use of these tools.
IP Address Locator: It provides the merchant with data on the user's location and displays its origin on a map, giving the approximate city and state. It also calculates the distance between the billing address of the online buyer and the actual location of the person entering the order. This is not foolproof, because the visitor may be using a proxy; however, merchants can apply authentication measures to transactions in which the distance is large and decide which transactions to review and which to allow. There should also be a check on whether any users are using anonymous proxy servers to hide their IP address, which can be done by obtaining a list of anonymous proxy servers.
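The distance check described for the IP address locator can be sketched with the haversine great-circle formula. This is only an illustration: the coordinates are assumed to come from some geolocation lookup that is not shown, and the function names and the 500 km review threshold are hypothetical choices, not values from any of the cited reports.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def needs_review(billing, ip_location, threshold_km=500.0):
    """Flag an order when the billing address and the IP location are far apart.

    threshold_km is a hypothetical cut-off; a merchant would tune it.
    """
    return haversine_km(*billing, *ip_location) > threshold_km

# Billing address near New York, order entered from an IP located near London.
new_york = (40.7, -74.0)
london = (51.5, -0.1)
flagged = needs_review(new_york, london)
```

Because IP geolocation is approximate and proxies can hide the true origin, a large distance is better treated as a reason to review or authenticate the order than as proof of fraud.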
Nature and significance of deep learning
Machine-learning technology powers many aspects of modern society: from web searches to content filtering on social networks to recommendations on e-commerce websites, and it is increasingly present in consumer products such as cameras and smartphones. Machine-learning systems are used to identify objects in images, transcribe speech into text, match news items, posts or products with users’ interests, and select relevant results of search. Increasingly, these applications make use of a class of techniques called deep learning. Conventional machine-learning techniques were limited in their ability to process natural data in their raw form. For decades, constructing a pattern-recognition or machine-learning system required careful engineering and considerable domain expertise to design a feature extractor that transformed the raw data (such as the pixel values of an image) into a suitable internal representation or feature vector from which the learning subsystem, often a classifier, could detect or classify patterns in the input. Representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification. Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions can be learned. For classification tasks, higher layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations. 
An image, for example, comes in the form of an array of pixel values, and the learned features in the first layer of representation typically represent the presence or absence of edges at particular orientations and locations in the image. The second layer typically detects motifs by spotting particular arrangements of edges, regardless of small variations in the edge positions. The third layer may assemble motifs into larger combinations that correspond to parts of familiar objects, and subsequent layers would detect objects as combinations of these parts. The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure. Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years. It has turned out to be very good at discovering intricate structures in high-dimensional data and is therefore applicable to many domains of science, business and government. In addition to beating records in image recognition and speech recognition, it has beaten other machine-learning techniques at predicting the activity of potential drug molecules, analysing particle accelerator data, reconstructing brain circuits, and predicting the effects of mutations in non-coding DNA on gene expression and disease. Perhaps more surprisingly, deep learning has produced extremely promising results for various tasks in natural language understanding, particularly topic classification, sentiment analysis, question answering and language translation. We think that deep learning will have many more successes in the near future because it requires very little engineering by hand, so it can easily take advantage of increases in the amount of available computation and data.
New learning algorithms and architectures that are currently being developed for deep neural networks will only accelerate this progress.
The future of deep learning
Unsupervised learning had a catalytic effect in reviving interest in deep learning, but has since been overshadowed by the successes of purely supervised learning. Although we have not focused on it in this Review, we expect unsupervised learning to become far more important in the longer term. Human and animal learning is largely unsupervised: we discover the structure of the world by observing it, not by being told the name of every object. Human vision is an active process that sequentially samples the optic array in an intelligent, task-specific way using a small, high-resolution fovea with a large, low-resolution surround. We expect much of the future progress in vision to come from systems that are trained end-to-end and combine ConvNets with RNNs that use reinforcement learning to decide where to look. Systems combining deep learning and reinforcement learning are in their infancy, but they already outperform passive vision systems at classification tasks and produce impressive results in learning to play many different video games. Natural language understanding is another area in which deep learning is poised to make a large impact over the next few years. We expect systems that use RNNs to understand sentences or whole documents will become much better when they learn strategies for selectively attending to one part at a time. Ultimately, major progress in artificial intelligence will come about through systems that combine representation learning with complex reasoning. Although deep learning and simple reasoning have been used for speech and handwriting recognition for a long time, new paradigms are needed to replace rule-based manipulation of symbolic expressions by operations on large vectors.
FRAUD DETECTION TECHNIQUES
Researchers have developed two general categories of detection techniques: misuse detection and anomaly detection. In misuse detection, well-known fraudulent transactions are encoded into patterns, which are then matched against new transactions to identify the fraudulent ones. In anomaly detection, the normal behavior of user and system activities is first summarized into normal profiles, which are then used as yardsticks, so that run-time activities that deviate significantly from the user profiles are considered probable fraudulent transactions. In this section we briefly describe some current techniques used in detecting credit card fraud and mention their advantages and disadvantages:
- Neural Networks: Neural Networks (NN) are Artificial Intelligence (AI) techniques that model biological learning systems. They are networks of many simple processors, or units, that are connected and process numeric values. A network is structured as a directed graph with many nodes (processing elements) and arcs (interconnections) between them. Each node computes a weighted sum of its inputs and generates an output. This output then becomes an input to other nodes in the network. The process continues until one or more outputs are generated. NN have been used as a powerful data mining technique in industry for classification, clustering, generalization, and forecasting (prediction). For example, they can be used to distinguish legal from fraudulent transactions, detect Internet fraud on an e-commerce site, or predict which transactions may be fraudulent. In addition, detecting credit card fraud is one of the first and most successful applications of NN.
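The node computation described above (a weighted sum of inputs that produces an output, which then feeds other nodes) can be sketched as follows. The sigmoid activation, the layer sizes, and all weights are illustrative assumptions; in practice the weights would be learned from labeled transactions rather than written by hand.

```python
import math

def neuron(inputs, weights, bias):
    """One node: weighted sum of the inputs, squashed into (0, 1) by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, layers):
    """Feed the inputs through each layer; a layer is a list of (weights, bias) pairs."""
    activations = list(inputs)
    for layer in layers:
        activations = [neuron(activations, w, b) for w, b in layer]
    return activations

# Hypothetical network: 2 input features -> 2 hidden nodes -> 1 output node.
layers = [
    [([0.8, -0.4], 0.1), ([-0.3, 0.9], 0.0)],   # hidden layer
    [([1.2, -0.7], -0.2)],                      # output layer
]
fraud_score = forward([0.5, 1.5], layers)[0]    # a value strictly between 0 and 1
```

The output of the last node can be read as a score and compared with a cut-off to label a transaction as legal or fraudulent.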
- Rule Induction: Rule induction (RI) creates a decision tree (DT) or a set of decision rules from training examples with a known classification. A DT is a predictive modeling technique used in classification, clustering, and prediction tasks. A DT is defined as a tree in which the root and each internal node are labeled with a question about an independent variable. The arcs from each node represent the possible answers to the associated question. Each leaf node represents a prediction (dependent variable) of a solution to the problem under consideration. The knowledge represented in a DT can be extracted and represented as IF-THEN rules, with one rule created for each path from the root to a leaf node. DT classification is a two-step process: the first step is induction, which constructs a DT from training data, and the second step applies the DT to new instances or records of the data to determine their class.
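The extraction of one IF-THEN rule per root-to-leaf path can be sketched as below. The tree is written by hand for illustration (a real one would be induced from training data), and its questions, labels, and the nested-dictionary representation are hypothetical.

```python
# A hand-written decision tree: internal nodes ask a yes/no question,
# leaves (plain strings) hold the predicted class.
tree = {
    "question": "amount > 1000",
    "yes": {
        "question": "foreign_ip",
        "yes": "fraud",
        "no": "legitimate",
    },
    "no": "legitimate",
}

def extract_rules(node, conditions=()):
    """Return one IF-THEN rule for every path from the root to a leaf."""
    if isinstance(node, str):                      # reached a leaf
        condition = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {condition} THEN {node}"]
    question = node["question"]
    return (
        extract_rules(node["yes"], conditions + (question,))
        + extract_rules(node["no"], conditions + (f"NOT ({question})",))
    )

rules = extract_rules(tree)
```

For this tree the function yields three rules, one per leaf, which matches the one-rule-per-path description above.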
- Expert Systems: Rules can be generated from information obtained from a human expert or group of experts and stored in a rule-based system as IF-THEN rules. If the information is stored in a knowledge base (KB), the system is called a knowledge-based system (KBS) or an expert system (ES). The rules in the ES are used to perform inference on data in order to reach an appropriate conclusion. ES provide powerful and flexible solutions to many application problems. In the financial area, they can be used for applications such as financial analysis and fraud detection. A suspicious activity or transaction can be detected from deviations from “normal” spending patterns through the use of an ES.
- Case-based reasoning (CBR): The basic idea of CBR is to adapt solutions that were used to solve previous problems and use them to solve new problems. In CBR, descriptions of the past experience of human specialists, represented as cases, are stored in a database for later retrieval when the user encounters a new case with similar parameters. These cases can be used for classification purposes. Given a new problem, a CBR system tries to find a matching case. Several algorithms are used with this approach, but the nearest neighbor matching algorithm is the most common. In this algorithm the training data is the model: when a new case or instance is presented, the algorithm looks at all the data to find the subset of cases most similar to it and uses them to predict the outcome.
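Nearest neighbor matching can be sketched in a few lines. The stored cases, the two features (amount and number of transactions in the last hour), the Euclidean distance, and k = 3 are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(cases, query, k=3):
    """Label a new case by majority vote among its k most similar stored cases."""
    ranked = sorted(cases, key=lambda case: math.dist(case[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical case base: ((amount, transactions_last_hour), label).
cases = [
    ((20, 1), "legitimate"),
    ((35, 2), "legitimate"),
    ((50, 1), "legitimate"),
    ((900, 14), "fraud"),
    ((1000, 17), "fraud"),
    ((1200, 20), "fraud"),
]

prediction = knn_predict(cases, (950, 15))
```

The features here are left unscaled, so the amount dominates the distance; a practical system would normalize the features before comparing cases.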
- Genetic algorithms (GAs): GAs are search procedures based on evolutionary computing methods. Given a population of potential problem solutions (individuals), evolutionary computing expands this population with new and potentially better solutions. In data mining, GAs may be used for clustering, prediction, and even association rules. These techniques can be used to find the fittest models, from a set of models, to represent the data.
- Inductive logic programming (ILP): ILP uses first-order predicate logic to define a concept from a set of positive and negative examples. The resulting logic program is then used to classify new examples. In this approach to classification, complex relationships among components or attributes can be easily expressed, which improves the expressive power of the model. Domain knowledge can be easily represented in an ILP system, which improves the effectiveness of the system. The model expressed in predicate logic is also easy to understand.
- Regression: Regression is a statistical technique generally used to predict future values from past values by fitting a set of points to a curve. Regression assumes that the target data fit some known type of function (e.g., linear, logistic, etc.). In banking systems, linear regression can be used to build a classification model of two classes, and this model can then be used to approve or reject a new loan application, or to classify a transaction as fraudulent or non-fraudulent.
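Using linear regression as a two-class classifier, as mentioned above, can be sketched by fitting a least-squares line to 0/1 labels and thresholding the fitted value. The single feature, the toy data, and the 0.5 cut-off are assumptions for illustration; logistic regression is usually the more natural choice for class probabilities.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def classify(x, slope, intercept, cutoff=0.5):
    """Return 1 (fraudulent) when the fitted value reaches the cut-off, else 0."""
    return 1 if slope * x + intercept >= cutoff else 0

# Hypothetical training data: a risk feature and 0/1 labels (1 = fraudulent).
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
slope, intercept = fit_line(xs, ys)

label_low = classify(2, slope, intercept)    # low-risk value
label_high = classify(5, slope, intercept)   # high-risk value
```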
- Summary of the advantages and disadvantages of the techniques
Several data mining tools for fraud detection are widely used in different organizations, which raises the question of which are the most effective in terms of money, accuracy, and time for a given application.
A. Factors that affect the performance of data mining techniques
Several factors affect the performance and accuracy of a data mining technique. Understanding these factors is useful in evaluating and selecting an appropriate technique for an application. Some of these factors are described below:
– Noise in data: data sets often contain noise in the form of inaccuracies and inconsistencies in the data. For example, inadequate data validation procedures may allow the user to enter incorrect data.
– Missing data: attributes required for analysis may not be available. Missing data may cause problems in the training phase and in the classification process. Therefore, the ability of the technique to handle this problem is an important factor.
– Measuring performance: the performance of a classification algorithm is usually examined by evaluating the accuracy of the classification. The accuracy of a data mining technique strongly influences its effectiveness, and higher predictive accuracy on actual data is a desirable feature. Classification accuracy is usually calculated as the percentage of records placed in the correct class. This ignores the cost that may be associated with assigning a record to the wrong class, which should also be taken into account. With two classes, there are four possible outcomes. Given a class C and a record r to be classified, the four outcomes are:
True positive (TP): r predicted to be in C and is actually in it.
False positive (FP): r predicted to be in C and is not actually in it (False alarm).
True negative (TN): r not predicted to be in C and is not actually in it.
False negative (FN): r not predicted to be in C but is actually in it.
TP and TN represent correct actions, whereas FP and FN represent incorrect actions. The performance of a classifier can be determined by associating a cost with each of these outcomes, so maximizing TP and minimizing FP and FN are desirable characteristics of a data mining application. Some studies [1, 8] show that overall predictive accuracy is inappropriate as the single measure of predictive performance in fraud detection. For example, if 1% of the transactions are fraudulent (a usual probability of fraud), then a model that always predicts “legitimate” will be 99% accurate while catching none of the fraudulent transactions. Ideally, a model would predict 100% of that 1% of fraudulent transactions while producing no false alarms (i.e., predicting no legitimate transactions to be fraudulent). In addition, a model that predicts less than 100% of the fraudulent transactions can be misleading when judged by its detection rate alone. For example, a model that predicts 90% of the fraudulent transactions may correctly predict the lowest-cost transactions while being entirely wrong about the 10% most expensive frauds. Therefore, a cost model is needed to best judge the success of a fraud detection technique.
– Scalability: data mining applications usually use very large data sets. Loading these data sets into RAM may slow the processing and the running of the algorithm. In addition, the network bandwidth of a system may affect the processing. So the scalability of the data mining technique becomes an important issue.
– Different data types: business databases or data sets contain data of various types (numeric, ordinal, nominal, etc.). If a data mining technique can handle different data types, it will be more useful for business data mining.
– Explanation capability: a prediction result is more likely to be accepted by a business manager if it is explainable in business terms. Understanding how the model is built, and involving the user in data loading and in adjusting the algorithm parameters, will increase the accuracy and performance of the system. Therefore, the ability to explain the results is an important factor.
– Ease of integration: data mining applications usually work with other information systems (IS) such as DSS or DBMS. Therefore, ease of integration with other information systems is a desirable characteristic of a data mining application.
– Ease of operation: A technique that is easy to understand, easy to build and that requires fewer preprocessing activities is more useful to an end user.
– Skewed distribution: fraud detection data is usually highly skewed or imbalanced, so the ability of the data mining model to handle this problem is a desirable characteristic.
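The accuracy problem described under “Measuring performance” can be illustrated with a short sketch. It counts the four outcomes for two models on 1,000 transactions, of which 1% are fraudulent, and compares overall accuracy with a simple cost model. The per-outcome costs (`COST_FN`, `COST_FP`) and the behavior of the second model are hypothetical values chosen for illustration, not figures from the cited studies.

```python
def confusion_counts(actual, predicted, positive="fraud"):
    """Count TP, FP, TN, FN with respect to the class `positive`."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # false alarm
        elif a == positive:
            fn += 1  # missed fraud
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of records placed in the correct class."""
    return (tp + tn) / (tp + fp + tn + fn)


def tp_rate(tp, fn):
    """Fraction of actual frauds that were caught."""
    return tp / (tp + fn) if tp + fn else 0.0


def fp_rate(fp, tn):
    """Fraction of legitimate transactions flagged as fraud."""
    return fp / (fp + tn) if fp + tn else 0.0


# Hypothetical costs per error (assumptions, not taken from the source).
COST_FN = 100.0  # cost of one missed fraud
COST_FP = 5.0    # cost of one false alarm


def total_cost(fp, fn):
    """Simple cost model: weighted sum of the two kinds of error."""
    return fp * COST_FP + fn * COST_FN


# 1,000 transactions, 1% fraudulent.
actual = ["fraud"] * 10 + ["legitimate"] * 990

# Model A always predicts "legitimate".
always_legitimate = ["legitimate"] * 1000
# Model B (hypothetical) catches 9 of 10 frauds and raises 30 false alarms.
detector = ["fraud"] * 9 + ["legitimate"] + ["fraud"] * 30 + ["legitimate"] * 960

for name, predicted in [("always-legitimate", always_legitimate), ("detector", detector)]:
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    print(name, accuracy(tp, fp, tn, fn), tp_rate(tp, fn),
          round(fp_rate(fp, tn), 4), total_cost(fp, fn))
```

Under these assumed costs, the always-legitimate model has the higher accuracy (0.99 against 0.969) but the higher cost (1000 against 250), which is why accuracy alone is a poor basis for comparing fraud detection models.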
Several studies have evaluated data mining techniques for credit card fraud detection, each according to some of the factors above.
One study tested several machine learning algorithms (ID3, CART, BAYES, and RIPPER) and meta-learning strategies on real-world credit card data (one million transactions) to select the best classifier. It shows that a skewed class distribution can be a major factor in classifier performance and that the true positive and false positive rates are the critical evaluation metrics for the accuracy of the models built. The study reported that a 50%/50% distribution of fraud/non-fraud training data generates classifiers with the highest true positive rate and a low false positive rate. The best classifier in the study was a BAYES meta-classifier, followed by CART and RIPPER. All three were trained on a 50%/50% fraud/non-fraud distribution, and each attained a true positive rate of approximately 80% and a false positive rate of less than 17% for the base classifiers and 13% for the meta-classifier. ID3 ranked last, with a true positive rate of 76% and a false positive rate of 23%.
Another study compared a Bayesian Belief Network (BBN) using the STAGE algorithm with an Artificial Neural Network (ANN) using the BP algorithm for credit card fraud detection. The results show that BBNs were more accurate (in some cases catching 8% more fraudulent transactions) and much faster to train than ANNs, but that ANNs are much faster than BBNs when applied to new instances.
A further study employed five inductive learning programs, Bayes (with a Bayesian learning algorithm), C4.5, ID3, CART, and Ripper (a rule induction algorithm), together with meta-learning methods, to compute accurate classification models for detecting electronic credit card fraud. It used three metrics: the overall accuracy, the TP − FP spread, and a cost model. The results show that meta-classifiers outperform all base classifiers, in some cases by a significant margin. They also show that the most accurate classifiers are not necessarily the most cost-effective. For example, Bayesian base classifiers are less accurate than the Ripper and C4.5 base classifiers, but they are the best under the cost model. The results further show that partitioning the large data set into smaller subsets improves the cost savings, and that classification performance was sensitive to changes in the data sets.
An expert system model for credit card fraud detection has also been presented. The model’s rule base was constructed from the input of many fraud experts within a bank. Its performance was measured by classification accuracy and by the cost of misclassification, under the assumption that the cost of one fraud is approximately equal to the cost of disturbing twenty good customers. The expert model achieved 89.68% accuracy overall and classified 80.45% of the fraud class correctly. Compared with three other fraud detection models on the same data sets, it outperformed all of them.
A comparative study to select a suitable data mining tool or product for fraud detection evaluated 40 data mining tools. The evaluation had three stages, and only the top five tools went on to the third stage. That stage was an extensive evaluation covering client-server compliance, automation capabilities, breadth of algorithms implemented, ease of use, and overall accuracy. The results show that decision trees and neural networks allowed the best cross comparison and proved to be better than the other models. Decision trees were better than neural networks at reducing false alarms and specifying misclassification costs. In addition, the pruning options for the trees were better developed than the stopping rules for the networks, so the risk of overfitting was lower.
A comparative study of three machine learning methods (ANNs, CBR, and RI) and one statistical method, least squares regression (LSR), for building prediction systems compared them in terms of accuracy, explanatory value, and configurability. The results show that all approaches are sensitive to changes in the training data set and may not cope well with heterogeneity. ANN was the best in accuracy but the worst in explanatory value and configurability. CBR and LSR were the best in explanatory value and configurability, and good in accuracy. RI was good in explanatory value but last in accuracy and configurability.
Several studies [10, 1, 8, 3] show that hybrid classification techniques can outperform a single classification technique by a significant margin. In a hybrid architecture, two or more classification techniques are integrated or combined to obtain a cooperative effect in which the strength of one technique compensates for the weakness of another. One study uses NN, GA, and RI to mine classification rules from a database. This approach combines the robustness and search ability of GA with the high predictive accuracy of NN and the interpretability of rules to create a data mining system that outperforms systems based on a single technique. Other studies [10, 1] show that combining classifiers computed by different machine learning algorithms produces a meta-classifier that has the best overall performance. One study combined the BP, naïve Bayesian (NB), and C4.5 algorithms to build a meta-classifier that improves cost savings in credit card fraud detection. The results show that this approach performs better than the base classifiers used in the combination and outperforms the technique commonly used in industry (BP). They also show that partitioning the data set and using multiple algorithms achieves higher cost savings.
A combination of rule-based and case-based systems can offer a first check against known cases before undertaking rule-based reasoning and its associated search cost, and it can record search-based results as cases for future use, which avoids duplicating costly searches. In addition, it might be advantageous to use multiple algorithms to classify the same problem: confidence in the results increases when the predictions of two models are identical, and a flag can be raised when the two models disagree.
H(igh), M(edium), and L(ow) indicate the expected ranking for each of them.