kspeiris/Auto_ml_trainner

AutoML Trainer

A no-code machine learning web application for uploading spreadsheet data, training multiple models, comparing their performance, and generating predictions through both a web UI and a JSON API.

✨ Features

  • 📤 Upload CSV/XLSX/XLS datasets with instant data-quality analysis
  • 🧹 Automatic preprocessing (missing values, duplicates, type handling)
  • 🤖 Multi-model training for both classification and regression
  • 🎯 Smart model recommendations with comparison charts
  • 💾 Download trained .joblib models and cleaned datasets
  • 🔮 Single-record and batch-file prediction workflows
  • 📚 Built-in OpenAPI + Swagger docs for prediction endpoints
  • 🧾 Prediction history tracking and clear/reset actions
  • 🧼 Artifact cleanup endpoint + helper script

📦 Tech Stack

  • Backend: Flask
  • ML: scikit-learn, NumPy, pandas
  • Visualization: Matplotlib, Chart.js (CDN)
  • API Docs: Swagger UI (CDN)
  • Optional async deps included: Celery + Redis

🧠 Supported Models

Classification

  • Random Forest
  • Gradient Boosting
  • Logistic Regression
  • Support Vector Machine (SVM)
  • K-Nearest Neighbors (KNN)
  • Decision Tree

Regression

  • Random Forest Regressor
  • Gradient Boosting Regressor
  • Linear Regression
  • Support Vector Regressor (SVR)
  • Decision Tree Regressor
  • Elastic Net
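The two model families above map directly onto scikit-learn estimators. A hypothetical registry (the identifiers and default hyperparameters shown here are illustrative; the app's actual mapping in app.py may differ) could look like:

```python
# Hypothetical registry mapping model names to scikit-learn estimators.
from sklearn.ensemble import (GradientBoostingClassifier, GradientBoostingRegressor,
                              RandomForestClassifier, RandomForestRegressor)
from sklearn.linear_model import ElasticNet, LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, SVR
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

CLASSIFIERS = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(probability=True),
    "knn": KNeighborsClassifier(),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}

REGRESSORS = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
    "linear_regression": LinearRegression(),
    "svr": SVR(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "elastic_net": ElasticNet(),
}
```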

๐Ÿ—๏ธ Architecture

flowchart TD
    A[Browser UI\nJinja + CSS + JS] --> B[Flask App\napp.py]
    B --> C[Upload + Quality Analysis\n/api/data-quality]
    B --> D[Preprocessing + Model Selection]
    D --> E[Training Pipeline\nscikit-learn + RandomizedSearchCV]
    E --> F[(models/*.joblib)]
    D --> G[(cleaned_data/*)]

    F --> H[Prediction UI\n/predict/<model>]
    F --> I[Prediction API\n/api/predict/<model>]
    H --> J[(prediction_history/*.jsonl)]
    I --> J

    B --> K[Model Insights\n/api/model-insights/<model>]
    B --> L[OpenAPI Spec\n/api/openapi.json]

โš™๏ธ Model Training Pipeline

Data preparation

  • Upload accepts csv, xlsx, xls
  • Data-quality report computes missing values, duplicates, data types, and per-column stats
  • Cleaning removes duplicate rows and rows with missing target values
  • Problem type is auto-detected from target dtype/distribution (classification vs regression)
  • Training is capped by MAX_TRAIN_ROWS to avoid oversized jobs
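The dtype/distribution auto-detection might follow a heuristic along these lines (a hypothetical sketch; the app's actual thresholds are not documented here):

```python
import pandas as pd

def detect_problem_type(target: pd.Series, max_classes: int = 20) -> str:
    """Hypothetical heuristic: non-numeric targets, or integer-valued numeric
    targets with few distinct values, are treated as classification;
    everything else as regression."""
    if not pd.api.types.is_numeric_dtype(target):
        return "classification"
    if target.nunique() <= max_classes and (target.dropna() % 1 == 0).all():
        return "classification"
    return "regression"

print(detect_problem_type(pd.Series(["cat", "dog", "cat"])))   # classification
print(detect_problem_type(pd.Series([1.7, 2.3, 0.9, 4.1])))   # regression
```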

Feature preprocessing pipeline

  • Numeric features: median imputation
  • Scale-sensitive models (svm, svr, knn, logistic_regression, linear_regression, elastic_net) get an additional StandardScaler step on numeric features
  • Categorical features: most-frequent imputation + OneHotEncoder(handle_unknown="ignore")
  • Combined with ColumnTransformer and wrapped in a single scikit-learn Pipeline
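The steps above can be sketched with scikit-learn directly (a minimal illustration under the assumptions listed; the app's actual construction in app.py may differ in detail):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Models whose numeric features get a StandardScaler, per the list above.
SCALED_MODELS = {"svm", "svr", "knn", "logistic_regression",
                 "linear_regression", "elastic_net"}

def build_pipeline(estimator, model_name, numeric_cols, categorical_cols):
    """Sketch of the described preprocessing: median imputation (plus optional
    scaling) for numerics, most-frequent imputation + one-hot for categoricals,
    combined with ColumnTransformer and wrapped in a single Pipeline."""
    numeric_steps = [("impute", SimpleImputer(strategy="median"))]
    if model_name in SCALED_MODELS:
        numeric_steps.append(("scale", StandardScaler()))
    categorical_steps = [
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
    preprocessor = ColumnTransformer([
        ("num", Pipeline(numeric_steps), numeric_cols),
        ("cat", Pipeline(categorical_steps), categorical_cols),
    ])
    return Pipeline([("preprocess", preprocessor), ("model", estimator)])

# Toy usage on data with missing values in both column types
X = pd.DataFrame({"age": [25, 32, None, 47], "city": ["A", "B", "A", np.nan]})
y = [0, 1, 0, 1]
pipe = build_pipeline(RandomForestClassifier(random_state=0),
                      "random_forest", ["age"], ["city"])
pipe.fit(X, y)
print(len(pipe.predict(X)))  # 4
```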

Model training and selection

  • Train/validation split is automatic
  • Classification uses stratified split when feasible and class-aware handling
  • Optional hyperparameter tuning uses RandomizedSearchCV (ENABLE_HYPERPARAM_TUNING=True)
  • Cross-validation folds/iterations are bounded by app config:
      • HYPERPARAM_SEARCH_MAX_ITER
      • HYPERPARAM_SEARCH_CV_MAX_FOLDS
  • Best estimator is persisted as .joblib
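A hedged sketch of how bounded tuning with RandomizedSearchCV might be wired up, assuming the two config knobs above cap the iteration count and fold count (the `tune` helper and its capping logic are illustrative, not the app's exact code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Illustrative values mirroring the config knobs described above.
HYPERPARAM_SEARCH_MAX_ITER = 12
HYPERPARAM_SEARCH_CV_MAX_FOLDS = 5

def tune(estimator, param_distributions, X, y):
    """Bounded search: folds are capped by config and by the smallest class
    count; iterations by the config limit and the size of the search space."""
    counts = np.bincount(y)
    folds = max(2, min(HYPERPARAM_SEARCH_CV_MAX_FOLDS, counts[counts > 0].min()))
    n_candidates = int(np.prod([len(v) for v in param_distributions.values()]))
    search = RandomizedSearchCV(
        estimator,
        param_distributions,
        n_iter=min(HYPERPARAM_SEARCH_MAX_ITER, n_candidates),
        cv=folds,
        random_state=42,
    )
    search.fit(X, y)
    return search.best_estimator_

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.array([0, 1] * 30)
params = {"n_estimators": [10, 25, 50], "max_depth": [2, 4, None]}
best = tune(RandomForestClassifier(random_state=0), params, X, y)
```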

Metrics and artifacts

  • Classification metrics: accuracy, precision, recall, F1
  • Regression metrics: R², MAE, RMSE
  • Saved model payload includes:
      • preprocessing + estimator pipeline
      • feature columns and feature dtypes
      • inferred feature value hints
      • target classes/encoder metadata (classification)
      • training metrics and configuration
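The metric names above map directly onto sklearn.metrics, and the payload can be persisted with joblib. The dictionary keys below are illustrative, not the app's exact schema:

```python
import joblib
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score, r2_score,
                             recall_score)

def classification_metrics(y_true, y_pred):
    # Weighted averaging is one reasonable choice for multi-class targets.
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="weighted", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="weighted", zero_division=0),
    }

def regression_metrics(y_true, y_pred):
    return {
        "r2": r2_score(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }

# Hypothetical saved-payload shape; the app's real keys may differ.
payload = {
    "pipeline": None,  # would hold the fitted preprocessing + estimator Pipeline
    "feature_columns": ["age", "city"],
    "feature_dtypes": {"age": "float64", "city": "object"},
    "target_classes": [0, 1],
    "metrics": classification_metrics([0, 1, 1, 0], [0, 1, 0, 0]),
}
joblib.dump(payload, "model_demo.joblib")
print(round(payload["metrics"]["accuracy"], 2))  # 0.75
```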

Prediction pipeline

  • UI and API inference both load the same saved .joblib pipeline
  • Input validation is dtype-aware and checks required feature columns
  • Batch prediction endpoint supports uploaded files
  • Prediction history is stored in prediction_history/*.jsonl
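The shared load-validate-predict-log path could be sketched as follows (key names like `feature_columns` and `pipeline` are illustrative; the app's payload schema may differ):

```python
import json
from pathlib import Path

import joblib
import pandas as pd

def predict_one(model_path, record, history_dir="prediction_history"):
    """Load the saved payload, validate required feature columns,
    predict, and append a JSONL history entry."""
    payload = joblib.load(model_path)
    missing = [c for c in payload["feature_columns"] if c not in record]
    if missing:
        raise ValueError(f"missing feature columns: {missing}")
    row = pd.DataFrame([record], columns=payload["feature_columns"])
    prediction = payload["pipeline"].predict(row)[0]

    history = Path(history_dir)
    history.mkdir(exist_ok=True)
    entry = {"input": record, "prediction": str(prediction)}
    with open(history / (Path(model_path).stem + ".jsonl"), "a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return prediction

# Toy round-trip: train, save, predict
from sklearn.tree import DecisionTreeClassifier
X = pd.DataFrame({"age": [20, 40, 60, 80]})
clf = DecisionTreeClassifier().fit(X, [0, 0, 1, 1])
joblib.dump({"feature_columns": ["age"], "pipeline": clf}, "demo.joblib")
print(predict_one("demo.joblib", {"age": 75}))  # 1
```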

๐Ÿ“ Project Structure

ML-web/
|-- app.py
|-- requirements.txt
|-- scripts/
|   `-- cleanup_artifacts.py
|-- templates/
|   |-- base.html
|   |-- index.html
|   |-- upload.html
|   |-- train.html
|   |-- select_models.html
|   |-- results_comparison.html
|   |-- models.html
|   |-- predict.html
|   `-- api_docs.html
|-- static/
|   |-- css/style.css
|   |-- js/script.js
|   `-- images/
|-- uploads/
|-- cleaned_data/
|-- models/
`-- prediction_history/

🚀 Quick Start

1. Clone and enter project

git clone <your-repo-url>
cd ML-web

2. Create virtual environment

python -m venv .venv

Windows (PowerShell):

.\.venv\Scripts\Activate.ps1

macOS/Linux:

source .venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Run the app

python app.py

Open: http://127.0.0.1:5000

🔄 End-to-End Workflow

  1. Upload a dataset on /upload
  2. Inspect quality report and choose target column
  3. Select recommended models
  4. Train and compare metrics
  5. Download best model and cleaned data
  6. Run predictions from /predict/<model_filename> or API

🔌 Key Endpoints

  • GET / - Landing page
  • GET|POST /upload - Upload dataset + training configuration
  • POST /select-models - Model recommendation page
  • POST /train-models - Train selected models
  • GET /models - Saved model library
  • GET|POST /predict/<filename> - Interactive prediction workspace
  • POST /api/predict/<filename> - JSON prediction API
  • POST /predict-file/<filename> - Batch predictions from uploaded file
  • GET /api/model-insights/<filename> - Model metrics + feature importance
  • GET /api/openapi.json - OpenAPI document
  • GET /api/docs - Swagger UI docs
  • GET /api/predict-history/<filename> - Prediction history
  • POST /api/predict-history/<filename>/clear - Clear history
  • POST /admin/cleanup-artifacts - Cleanup generated artifacts
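As an illustration, a single-record call to the JSON prediction API might look like the following, with the app running locally. The model filename and feature names here are hypothetical; the actual request schema is defined by the saved model's feature columns (see /api/docs):

```shell
curl -X POST http://127.0.0.1:5000/api/predict/my_model.joblib \
  -H "Content-Type: application/json" \
  -d '{"age": 42, "city": "A"}'
```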

๐Ÿ–ผ๏ธ Screenshots

image image imageimageimageimageimageimageimageimageimageimageimageimageimageimageimage

โš™๏ธ Configuration Notes

Default runtime settings in app.py include:

  • MAX_TRAIN_ROWS = 50000
  • ENABLE_HYPERPARAM_TUNING = True
  • HYPERPARAM_SEARCH_MAX_ITER = 12
  • HYPERPARAM_SEARCH_CV_MAX_FOLDS = 5
  • Auto-cleanup enabled after training with retention limits

Optional env var:

  • CLEANUP_TOKEN for securing /admin/cleanup-artifacts

🧼 Cleanup Utility

Dry run:

python scripts/cleanup_artifacts.py

Apply deletion:

python scripts/cleanup_artifacts.py --apply

Include uploads cleanup:

python scripts/cleanup_artifacts.py --include-uploads --apply
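The dry-run/apply behavior could be implemented roughly as below. This is a hypothetical sketch of what scripts/cleanup_artifacts.py does, not its actual code; the directory names follow the project structure above:

```python
import os
import tempfile
from pathlib import Path

# Directories treated as generated artifacts; uploads/ only on request.
ARTIFACT_DIRS = ["models", "cleaned_data", "prediction_history"]

def cleanup(root=".", apply=False, include_uploads=False):
    """List artifact files; delete them only when apply=True (dry run otherwise)."""
    dirs = ARTIFACT_DIRS + (["uploads"] if include_uploads else [])
    affected = []
    for d in dirs:
        for path in Path(root, d).glob("*"):
            if path.is_file():
                affected.append(str(path))
                if apply:
                    path.unlink()
    return affected

# Toy round-trip in a temporary directory
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "models"))
open(os.path.join(root, "models", "old.joblib"), "w").close()
print(len(cleanup(root=root)))              # 1 (dry run: nothing deleted)
print(len(cleanup(root=root, apply=True)))  # 1 (now deleted)
print(len(cleanup(root=root)))              # 0
```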

📌 Notes

  • Generated artifacts (uploads/, models/, cleaned_data/) are git-ignored.
  • The app currently runs with debug=True in app.py; disable it before deploying to production.
