Abstract

Financial fraud, particularly credit card fraud, continues to pose substantial challenges to financial institutions because of its increasing frequency and its impact on consumer trust. Traditional rule-based methods have provided foundational defenses, but their limited scalability and adaptability have accelerated the adoption of machine learning (ML) techniques. Concurrently, Benford's Law, a statistical principle widely used in forensic accounting, has proven effective at detecting anomalies in naturally occurring numerical datasets. This study explores a hybrid fraud detection approach that integrates Benford's Law with supervised ML algorithms, including Logistic Regression, Random Forest, and k-Nearest Neighbors. Using the publicly available European credit card fraud dataset from Kaggle, the research applies Benford's Law to transaction amounts and incorporates Benford-derived features into the ML models. The findings show that fraudulent transactions deviate significantly from Benford's expected digit distributions, supporting its use as a pre-screening tool. Among the evaluated models, Random Forest consistently outperformed the others on recall, F1-score, and AUC-ROC, particularly when enhanced with Benford-derived features. Integrating Benford's Law improved anomaly detection accuracy and interpretability, addressing a gap in the existing literature. This research demonstrates the practical benefits of combining statistical digit analysis with machine learning to build more robust, accurate, and explainable fraud detection systems.
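To make the Benford step concrete, here is a minimal standard-library Python sketch, not the study's implementation. It assumes the first-digit form of Benford's Law, P(d) = log10(1 + 1/d) for d = 1, ..., 9, and measures conformity of a set of amounts with two common statistics: Pearson's chi-square (8 degrees of freedom, 5% critical value of about 15.51) and the mean absolute deviation (MAD) between observed and expected digit proportions. The abstract does not say which tests or features the study used, so the function names and the two per-transaction features in `transaction_features` (leading digit and its Benford probability) are hypothetical illustrations.

```python
import math
import random
from collections import Counter
from typing import Dict, Iterable, Optional

# Expected first-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d).
BENFORD: Dict[int, float] = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount: float) -> Optional[int]:
    """Leading non-zero digit of |amount|; None for zero, NaN, or infinity."""
    x = abs(amount)
    if x == 0 or not math.isfinite(x):
        return None
    # repr() gives the shortest round-trip string (e.g. '149.62', '0.0045', '1e-05');
    # its first non-zero digit is the leading significant digit.
    for ch in repr(x):
        if ch in "123456789":
            return int(ch)
    return None


def digit_counts(amounts: Iterable[float]) -> Counter:
    """Count leading digits 1-9, skipping amounts that have no leading digit."""
    return Counter(d for d in map(first_digit, amounts) if d is not None)


def chi_square(amounts: Iterable[float]) -> float:
    """Pearson chi-square of observed leading digits vs. Benford (8 degrees of freedom)."""
    counts = digit_counts(amounts)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no usable amounts")
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def mad(amounts: Iterable[float]) -> float:
    """Mean absolute deviation between observed and Benford digit proportions."""
    counts = digit_counts(amounts)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no usable amounts")
    return sum(abs(counts[d] / n - p) for d, p in BENFORD.items()) / 9


def transaction_features(amount: float) -> Dict[str, float]:
    """Hypothetical per-transaction Benford features to append to a model's inputs."""
    d = first_digit(amount)
    p = BENFORD[d] if d is not None else 0.0
    return {"leading_digit": float(d or 0), "benford_prob": p}


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic, illustrative data only (not the Kaggle dataset):
    # log-uniform amounts spanning four decades follow Benford's Law,
    # while uniform amounts on [1, 1000) have roughly uniform leading digits.
    benford_like = [10 ** rng.uniform(0, 4) for _ in range(5000)]
    uniform = [rng.uniform(1, 1000) for _ in range(5000)]
    for name, sample in [("log-uniform", benford_like), ("uniform", uniform)]:
        print(f"{name}: chi2={chi_square(sample):.1f}, MAD={mad(sample):.4f}")
    print(transaction_features(149.62))
```

On these synthetic samples, the uniform amounts should deviate sharply from Benford's expectations, while the log-uniform amounts should typically stay close to them. In a fraud pipeline, one plausible design (an assumption, not the paper's stated method) is to append the per-transaction features to each model's inputs and to compute the group-level chi-square or MAD over a batch of transactions as a pre-screening signal.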

Document Type

Paper

Disciplines

Artificial Intelligence and Robotics | Banking and Finance Law | Cybersecurity | Information Security | Statistical Models | Theory and Algorithms

DOI

10.25776/71s5-jt20

Publication Date

4-15-2025

Leveraging Benford’s Law and Machine Learning for Financial Fraud Detection