As a full-stack developer, I thrive on tackling new challenges and bringing ideas to life. I’m always excited to take on projects that push the boundaries of innovation and collaborate with like-minded, creative individuals.

Phone Number

+27 84 866 2418

Email

motaungleon@gmail.com

LinkedIn

Leon Motaung

Address

12 Vermeer Street, Bellville, Cape Town, 7530

Chapter 3-Predictive Analytics with Ensemble Learning/

Started: 2025-11-28

Python, Seaborn, NumPy
Project progress: 100%

About this project

Online Payments Fraud Detection with Machine Learning

Author: Leon Motaung

Technologies: Python, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, Plotly

Project Overview

This project focuses on detecting fraudulent online payment transactions using machine learning. The dataset contains 284,808 transactions. Each transaction has 30 input features (the 28 anonymized features V1–V28, plus Time and Amount) and a Class label (fraud = 1, normal = 0). The dataset is highly imbalanced, with far fewer fraudulent transactions than normal ones.
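The layout described above can be checked with a short pandas helper. This is a minimal sketch, assuming the data is already loaded into a DataFrame whose columns are named `Time`, `V1`–`V28`, `Amount`, and `Class`; the helper `summarize_dataset` is hypothetical, not part of `app.py`.

```python
import pandas as pd

# Assumed schema: Time, V1..V28, Amount as inputs, plus the Class label
FEATURES = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Check the expected columns and report how imbalanced the Class label is."""
    missing = [c for c in FEATURES + ["Class"] if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    counts = df["Class"].value_counts()
    n_fraud = int(counts.get(1, 0))
    return {
        "rows": len(df),
        "features": len(FEATURES),   # 30 input features
        "fraud": n_fraud,
        "normal": int(counts.get(0, 0)),
        "fraud_rate": n_fraud / len(df),
    }


# Tiny synthetic frame for illustration; real data would come from pd.read_csv
demo = pd.DataFrame({c: [0.0] * 10 for c in FEATURES})
demo["Class"] = [0] * 9 + [1]
print(summarize_dataset(demo))
```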

Process & Steps Taken

  1. Data Cleaning and Preprocessing: Handled missing values, corrected data types, scaled the Amount and Time features, removed outliers using the interquartile range (IQR) rule, and added a log-transformed scaled_amount feature. The cleaned dataset was saved as creditcard_final.csv.
  2. Data Visualization: Used boxplots, histograms, correlation heatmaps, and scatterplots to explore feature distributions, detect anomalies, and understand the class imbalance.
  3. Baseline Models: Trained Logistic Regression, Decision Tree, and K-Nearest Neighbors (KNN) classifiers. Because of the class imbalance, evaluation focused on Recall and F1-score rather than accuracy.
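Step 1 can be sketched with pandas, NumPy, and scikit-learn. This is an illustration under stated assumptions, not the project's actual `app.py`: it assumes the IQR filter is applied to `Amount` only with the conventional 1.5 × IQR fences, that `scaled_amount` means a standardized `log1p(Amount)`, and that `StandardScaler` is the scaler used; the function name `preprocess` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of step 1: clean, IQR-filter, log-transform, and scale (assumed columns)."""
    out = df.dropna().copy()                          # drop rows with missing values
    out["Class"] = out["Class"].astype(int)           # enforce an integer fraud label
    out[["Time", "Amount"]] = out[["Time", "Amount"]].astype(float)

    # Assumed IQR rule on Amount: keep rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    q1, q3 = out["Amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out = out[out["Amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)].copy()

    # Assumed meaning of scaled_amount: log1p of the raw Amount, then standardized
    out["scaled_amount"] = StandardScaler().fit_transform(np.log1p(out[["Amount"]])).ravel()

    # Standardize the original Time and Amount columns (zero mean, unit variance)
    out[["Time", "Amount"]] = StandardScaler().fit_transform(out[["Time", "Amount"]])
    return out
```

Fitting the scalers on the full dataset is fine for exploration; for model evaluation, fitting them on the training split only avoids leaking test-set statistics into the model.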

Baseline Model Evaluation

| Model | Accuracy | Precision | Recall | F1-score | Notes |
|---|---|---|---|---|---|
| Logistic Regression | 0.9768 | 0.0577 | 0.8929 | 0.1083 | Simple, interpretable baseline |
| Decision Tree | 0.9993 | 0.8478 | 0.6964 | 0.7647 | Visualize feature importance |
| K-Nearest Neighbors | 0.9995 | 0.9744 | 0.6786 | 0.8000 | Works well for small datasets |
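The comparison in the table can be reproduced in outline with scikit-learn. This sketch uses a synthetic imbalanced dataset from `make_classification` as a stand-in for `creditcard_final.csv`, so its numbers will not match the table; the stratified 80/20 split, the model hyperparameters, and the helper name `evaluate_baselines` are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def evaluate_baselines(X, y, random_state: int = 42) -> pd.DataFrame:
    """Fit three baseline classifiers and score them on a stratified hold-out set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    }
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rows.append({
            "Model": name,
            "Accuracy": accuracy_score(y_test, pred),
            "Precision": precision_score(y_test, pred, zero_division=0),
            "Recall": recall_score(y_test, pred, zero_division=0),
            "F1-score": f1_score(y_test, pred, zero_division=0),
        })
    return pd.DataFrame(rows).set_index("Model").round(4)


# Synthetic stand-in: roughly 1% positive (fraud) labels among 5,000 samples
X, y = make_classification(n_samples=5000, n_features=30, weights=[0.99], random_state=0)
print(evaluate_baselines(X, y))
```

Logistic Regression and KNN are sensitive to feature scale, which is one reason the scaling in step 1 matters. Replacing `X, y` with the feature columns and `Class` label from the cleaned dataset would approximate the real experiment.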

Findings & Insights

  • Class Imbalance Matters: Accuracy alone is misleading. All three baselines exceed 97% accuracy, yet Logistic Regression's precision is below 6%, so Recall and F1-score are more meaningful for fraud detection.
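  • Decision Trees and KNN performed best overall: Both reached F1-scores of roughly 0.76–0.80 with high precision, and Decision Trees also provide feature importance insight. Logistic Regression caught the most fraud (Recall 0.8929), but most of its fraud alerts were false alarms (Precision 0.0577).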
  • Feature Relationships: Correlation heatmaps revealed subtle patterns useful for feature engineering.
  • Visualization is Key: Scatterplots and boxplots helped detect anomalies and better understand distributions.
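The accuracy trap is easy to demonstrate with hypothetical counts (not taken from this project): on imbalanced data, a classifier that labels every transaction as normal scores very high accuracy while catching no fraud at all. The function below, a hypothetical helper, computes the four metrics from confusion-matrix counts using their standard definitions.

```python
def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of alerts that are fraud
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of fraud that is caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set: 10,000 transactions, 20 of them fraudulent.
# An "always normal" model flags nothing, so all 20 frauds are missed.
print(metrics_from_counts(tp=0, fp=0, fn=20, tn=9980))
# A model that flags 200 transactions and catches 18 of the 20 frauds.
print(metrics_from_counts(tp=18, fp=182, fn=2, tn=9798))
```

The first call reports 99.8% accuracy with zero recall. The second drops to 98.16% accuracy but catches 90% of the fraud at a precision of only 9%, the same kind of trade-off visible in the Logistic Regression row.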

Next Steps

  • Handle class imbalance with SMOTE or undersampling.
  • Train advanced models: XGBoost, LightGBM.
  • Evaluate models using Precision, Recall, F1-score, ROC-AUC.
  • Deploy the model as a Flask app or Streamlit dashboard for real-time fraud detection.
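The first and third next steps can be prototyped together. This sketch swaps in plain random undersampling of the majority class (not SMOTE) and adds ROC-AUC computed from predicted probabilities. It resamples only the training split, so the test set keeps the real class ratio; the synthetic data, the 1:1 target ratio, and the helper name `undersample` are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split


def undersample(X: np.ndarray, y: np.ndarray, random_state: int = 0):
    """Randomly drop majority-class (label 0) rows until both classes are the same size."""
    rng = np.random.default_rng(random_state)
    fraud_idx = np.flatnonzero(y == 1)
    normal_idx = np.flatnonzero(y == 0)
    kept_normal = rng.choice(normal_idx, size=len(fraud_idx), replace=False)
    idx = rng.permutation(np.concatenate([fraud_idx, kept_normal]))
    return X[idx], y[idx]


# Synthetic stand-in for the cleaned dataset (about 1% positive labels)
X, y = make_classification(n_samples=20000, n_features=30, weights=[0.99], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Resample the training split only; the test split keeps the original imbalance
X_bal, y_bal = undersample(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)

scores = model.predict_proba(X_test)[:, 1]   # predicted probability of the fraud class
print(classification_report(y_test, model.predict(X_test), digits=4))
print("ROC-AUC:", round(roc_auc_score(y_test, scores), 4))
```

To try SMOTE instead, the separate imbalanced-learn package provides `SMOTE(random_state=42).fit_resample(X_train, y_train)`, which can replace the `undersample` call; the same ROC-AUC evaluation applies unchanged.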

Project Structure

  • app.py – Main fraud detection script
  • draw.py – Visualization scripts
  • creditcard_final.csv – Cleaned dataset
  • Images – Boxplots, scatterplots, heatmaps, charts

This project gave me hands-on experience in data preprocessing, visualization, and baseline model evaluation. It reinforced the importance of appropriate evaluation metrics for imbalanced datasets and prepared me to tackle more advanced predictive analytics tasks.