Classification and Regression Using Supervised Learning
Started: 2025-11-28
About this project
Python Data Preprocessing & Classification Techniques
This repository demonstrates data preprocessing techniques and basic classifiers using
scikit-learn. It includes examples with visualizations to help understand how classifiers separate different classes.
Techniques Covered
Data Preprocessing
- Mean Removal (Standardization): Centers each feature at mean 0 and scales it to standard deviation 1.
- Scaling (Min-Max Scaling): Rescales features to a fixed range (0 to 1).
- Normalization (L1/L2): Scales each sample (row) so that its L1 or L2 norm equals 1.
- Binarization: Converts numeric data into 0/1 using a threshold.
- Label Encoding: Converts categorical labels into numeric form.
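The five techniques above can be sketched with `sklearn.preprocessing`. This is a minimal illustration, not necessarily the code used in this repository: the sample array, the binarization threshold of 2.1, and the color labels are made-up values chosen only to show each transform.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical sample data: 4 samples (rows), 3 features (columns)
X = np.array([[5.1, -2.9, 3.3],
              [-1.2, 7.8, -6.1],
              [3.9, 0.4, 2.1],
              [7.3, -9.9, -4.5]])

# Mean removal: each column ends up with mean 0 and standard deviation 1
X_standardized = preprocessing.scale(X)

# Min-max scaling: each column is rescaled to the range [0, 1]
X_minmax = preprocessing.MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# Normalization: each row is scaled to unit L1 or unit L2 norm
X_l1 = preprocessing.normalize(X, norm="l1")
X_l2 = preprocessing.normalize(X, norm="l2")

# Binarization: values strictly above the threshold become 1, the rest 0
X_binary = preprocessing.Binarizer(threshold=2.1).fit_transform(X)

# Label encoding: classes are sorted, then mapped to 0, 1, 2, ...
labels = ["red", "black", "red", "green", "black"]
encoder = preprocessing.LabelEncoder()
encoded = encoder.fit_transform(labels)
decoded = encoder.inverse_transform(encoded)
```

Standardization and min-max scaling work per feature (column), while normalization works per sample (row), so they are not interchangeable.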
Classification
- Naïve Bayes Classifier: Uses Bayes' theorem and assumes features are conditionally independent given the class.
Example decision boundary:
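A decision boundary like this can be reproduced with a short sketch. It assumes Gaussian Naïve Bayes (`GaussianNB`) and synthetic data from `make_blobs`; the cluster centers, sample count, and grid step are illustrative choices, not values taken from this repository.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical 2-D data: three well-separated clusters, one per class
X, y = make_blobs(n_samples=300, centers=[[-5, -5], [0, 5], [5, -5]],
                  cluster_std=1.0, random_state=0)

# Hold out a quarter of the samples to estimate accuracy
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = GaussianNB()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Predict the class at every point of a grid covering the data;
# the regions of constant prediction are the decision regions
x_min, x_max = X[:, 0].min() - 1.0, X[:, 0].max() + 1.0
y_min, y_max = X[:, 1].min() - 1.0, X[:, 1].max() + 1.0
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))
grid_pred = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
```

Passing `xx`, `yy`, and `grid_pred` to Matplotlib's `contourf` and scatter-plotting `X` on top draws the colored regions and the boundaries between them.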
- Logistic Regression Classifier: Predicts class probabilities using a logistic function.
Example decision boundary:
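As a rough sketch of how the logistic function produces class probabilities, the example below fits scikit-learn's `LogisticRegression` on synthetic two-class data and checks that the predicted probability equals the logistic (sigmoid) function applied to the model's linear score. The data and the regularization setting `C=1.0` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Hypothetical 2-D data with two classes
X, y = make_blobs(n_samples=200, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)

clf = LogisticRegression(C=1.0)
clf.fit(X, y)

# Column 0 holds P(class 0), column 1 holds P(class 1); each row sums to 1
proba = clf.predict_proba(X)

# For two classes, P(class 1) is the sigmoid of the linear score w.x + b
scores = clf.decision_function(X)
sigmoid = 1.0 / (1.0 + np.exp(-scores))

accuracy = clf.score(X, y)
```

The decision boundary is the line where the linear score is 0, which is where both classes have probability 0.5.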
Confusion Matrix
Tabulates predicted labels against true labels, showing which classes a classification model confuses and how often.
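A small example with `sklearn.metrics.confusion_matrix` shows how the table is read. The label lists here are invented for illustration; in practice `y_pred` would come from a fitted classifier.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for a 3-class problem
y_true = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 1, 2, 2, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1 0]
#  [0 3 1]
#  [1 0 2]]

# Correct predictions lie on the diagonal, so accuracy is trace / total
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.7
```

Off-diagonal entries locate the errors: for instance, the 1 in row 0, column 1 means one class-0 sample was predicted as class 1.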
Getting Started
Prerequisites
- Python 3.x
- NumPy
- scikit-learn
- Matplotlib (for visualizations)
Installation
- Clone the repository (optional):
git clone https://github.com/LeonMotaung/AI-Engineer.git
cd AI-Engineer
- Install required packages:
pip install -r requirements.txt
This project provides a hands-on introduction to preprocessing, basic classifiers, and visual evaluation metrics for supervised learning tasks.