Busted! Engineers Revolutionize Fraud Detection with Machine Learning
In the U.S., credit card fraud costs $5 billion annually, identity theft adds $16.4 billion, and Medicare fraud drains $60 billion each year.
Fraud is widespread in the United States and increasingly driven by technology. For example, 93% of credit card fraud now involves remote account access, not physical theft. In 2023, fraud losses surpassed $10 billion for the first time. The financial toll is staggering: credit card fraud costs $5 billion annually, affecting 60% of U.S. cardholders, while identity theft resulted in $16.4 billion in losses in 2021. Medicare fraud costs $60 billion each year, and government losses range from $233 billion to $521 billion annually, with improper payments totaling $2.7 trillion since 2003.
Machine learning plays a critical role in fraud detection by identifying patterns and anomalies in real time. It analyzes large datasets to learn what normal behavior looks like and flags significant deviations, such as unusual transactions or account access. However, fraud detection is challenging because fraud cases are much rarer than normal ones, and the data is often messy or unlabeled.
To address these challenges, researchers from the College of Engineering and Computer Science at Florida Atlantic University have developed a novel method for generating binary class labels in highly imbalanced datasets, offering a promising solution for fraud detection in industries like health care and finance. This approach works without relying on labeled data, a key advantage in sectors where privacy concerns and the cost of labeling are significant obstacles.
The team tested their method on two real-world, large-scale datasets with severe class imbalance, in which fraud accounts for less than 0.2% of instances: European credit card transactions (more than 280,000 from September 2013) and Medicare Part D claims (more than 5 million from 2013 to 2019). Both datasets are labeled as fraudulent or genuine. Because fraud cases are so heavily outnumbered by non-fraud cases, the datasets pose a realistic challenge for testing fraud detection methods.
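To see why an imbalance of this size matters, consider the short calculation below. It is only an illustration: the transaction count and fraud rate are round upper-bound assumptions based on the figures quoted above, not the exact counts from the study.

```python
# Illustrative numbers only: the article gives "more than 280,000" transactions
# and a fraud share of "less than 0.2%", so these are round assumptions.
n_transactions = 280_000
fraud_rate = 0.002  # 0.2%, the upper bound quoted in the article

max_fraud_cases = round(n_transactions * fraud_rate)
genuine_cases = n_transactions - max_fraud_cases

# A trivial "detector" that labels every transaction as genuine:
# it is right about every genuine case and wrong about every fraud case.
accuracy = genuine_cases / n_transactions
recall = 0 / max_fraud_cases  # it catches no fraud at all

print(f"fraud cases (at most): {max_fraud_cases}")
print(f"accuracy of the always-genuine baseline: {accuracy:.1%}")
print(f"recall on fraud: {recall:.0%}")
```

A detector that never flags anything would still score about 99.8% accuracy under these assumptions, which is why accuracy alone says little on data like this.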
Results of the study, published in the Journal of Big Data, show that this new labeling method effectively addresses the challenge of labeling severely imbalanced data in an unsupervised framework. In addition, unlike traditional methods, this approach evaluates the newly generated fraud and non-fraud labels directly, without relying on a supervised classifier.
“The use of machine learning in fraud detection brings many advantages,” said Taghi Khoshgoftaar, Ph.D., senior author and Motorola Professor in the FAU Department of Electrical Engineering and Computer Science. “Machine learning algorithms can label data much faster than human annotation, significantly improving efficiency. Our method represents a major advancement in fraud detection, especially in highly imbalanced datasets. It reduces the workload by minimizing cases that require further inspection, which is crucial in sectors like Medicare and credit card fraud, where fast data processing is vital to prevent financial losses and enhance operational efficiency.”
The study shows the new method outperformed the widely used Isolation Forest algorithm, providing a more efficient way to identify fraud while minimizing the need for further investigation. This confirms the method’s ability to generate reliable binary class labels for fraud detection, even in challenging datasets. It offers a scalable way to detect fraud without relying on labeled data, which is costly and time-consuming to produce because it requires significant manual input from experts, especially for large datasets.
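For readers unfamiliar with the baseline, the sketch below shows how Isolation Forest is typically run with scikit-learn. It uses synthetic stand-in data rather than the study's datasets, and the `contamination` value of 0.2% is an assumption chosen to mirror the imbalance described above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not the study's datasets):
# 5,000 "genuine" points near the origin plus 10 far-away "fraud" points.
genuine = rng.normal(loc=0.0, scale=1.0, size=(5000, 2))
fraud = rng.normal(loc=8.0, scale=0.5, size=(10, 2))
X = np.vstack([genuine, fraud])
y_true = np.r_[np.zeros(len(genuine), dtype=int), np.ones(len(fraud), dtype=int)]

# contamination is the expected share of anomalies; 0.2% is assumed here.
model = IsolationForest(n_estimators=100, contamination=0.002, random_state=0)
pred = model.fit_predict(X)        # -1 = anomaly, 1 = normal
y_pred = (pred == -1).astype(int)  # 1 = flagged as possible fraud

flagged = int(y_pred.sum())
caught = int((y_pred & y_true).sum())
print(f"flagged {flagged} of {len(X)} instances; {caught} of them are true fraud")
```

Every flagged instance is a case someone may need to inspect, so a baseline like this is judged both by how much fraud it catches and by how many genuine cases it flags along the way.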
“Our method generates labels for both fraud or positive and non-fraud or negative instances, which are then refined to minimize the number of fraud labels,” said Mary Anne Walauskis, first author and a Ph.D. candidate in the FAU Department of Electrical Engineering and Computer Science. “By applying our method, we minimize false positives, or in other words, genuine instances marked as fraud, which is key to improving fraud detection. This approach ensures that only the most confidently identified fraud cases are retained, enhancing accuracy and reducing unnecessary alarms, making fraud detection more efficient.”
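The effect of refining labels on false positives can be illustrated with a toy example. The labels below are made up for demonstration, and the helper `confusion_counts` is a hypothetical name, not part of the researchers' code.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, where 1 = fraud, 0 = genuine."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # genuine marked as fraud
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


# Made-up ground truth and two sets of generated labels.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
broad_labels = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]    # many instances flagged
refined_labels = [0, 1, 0, 0, 0, 0, 0, 1, 1, 0]  # only the most confident kept

for name, labels in [("broad", broad_labels), ("refined", refined_labels)]:
    tp, fp, fn, tn = confusion_counts(y_true, labels)
    precision = tp / (tp + fp)
    print(f"{name}: false positives={fp}, precision={precision:.2f}")
```

In this toy case, refinement cuts the false positives from three to one while keeping both true fraud cases, so fewer genuine instances would be sent for further inspection.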
The method combines two strategies: an ensemble of three unsupervised learning techniques (EUM), implemented with the scikit-learn library, and a percentile-gradient approach (PGM). The goal is to minimize false positives by focusing on the most confidently identified fraud cases. This is achieved by refining the labels and reducing errors in both the EUM and the PGM.
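The general idea of combining several unsupervised detectors and keeping only the cases they agree on can be sketched as follows. This is not the researchers' implementation: the article does not name the three techniques, so the detectors chosen here are assumptions, and the fixed 99th-percentile cutoff is a simple stand-in for the percentile-gradient procedure, whose details are not given above.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)

# Synthetic stand-in data (an assumption): 5,000 genuine points, 10 outliers.
genuine = rng.normal(0.0, 1.0, size=(5000, 2))
fraud = rng.uniform(6.0, 10.0, size=(10, 2))
X = np.vstack([genuine, fraud])

# Three detectors chosen for illustration only. Each score is oriented so
# that HIGHER means MORE anomalous.
iso = IsolationForest(random_state=0).fit(X)
lof = LocalOutlierFactor(n_neighbors=20).fit(X)
env = EllipticEnvelope(random_state=0).fit(X)
scores = {
    "isolation_forest": -iso.score_samples(X),
    "local_outlier_factor": -lof.negative_outlier_factor_,
    "elliptic_envelope": env.mahalanobis(X),
}

PERCENTILE = 99.0  # hypothetical fixed cutoff, not the paper's PGM
votes = np.zeros(len(X), dtype=int)
for name, s in scores.items():
    flagged = s > np.percentile(s, PERCENTILE)
    votes += flagged
    print(f"{name}: flagged {int(flagged.sum())} instances")

# Keep only instances that every detector placed above its cutoff.
confident_fraud = votes == len(scores)
print(f"confident fraud labels: {int(confident_fraud.sum())}")
```

Requiring unanimous agreement is one simple way to trade coverage for confidence: each detector flags about 1% of the data on its own, but the intersection is smaller and less likely to contain genuine instances.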
The refined labels create a subset of confident labels that are highly likely to be accurate. These labels are then used to create confidence intervals and finalize the labeling, requiring minimal domain knowledge to select the number of positive instances.
“This innovative approach holds great promise for industries plagued by fraud, offering a more accessible and effective way to identify fraudulent activity and safeguard both financial and health care systems,” said Stella Batalama, Ph.D., dean of the College of Engineering and Computer Science. “Fraud’s impact goes beyond financial losses, including emotional distress, reputational damage and reduced trust in organizations. Health care fraud, in particular, undermines care quality and cost, while identity theft can cause severe stress. Addressing fraud is key to mitigating its broad societal impact.”
Looking ahead, the research team plans to enhance the method by automating the determination of the optimal number of positive instances, further improving efficiency and scalability for large-scale applications.
The current journal article, “Unsupervised Label Generation for Severely Imbalanced Fraud Data,” is an updated version of the researchers’ previous work, “Confident Labels: A Novel Approach to New Class Labeling and Evaluation on Highly Imbalanced Data.” The original paper was presented and published at the IEEE 36th International Conference on Tools with Artificial Intelligence (ICTAI) in November 2024, where it won the Best Student Paper Award. ICTAI, with an acceptance rate of about 25% from more than 400 submissions, is a prestigious conference.
-FAU-