kspeiris/Auto_ml_trainner

AutoML Trainer

A no-code machine learning web application for uploading spreadsheet data, training multiple models, comparing their performance, and generating predictions through both a web UI and a JSON API.

✨ Features

  • 📤 Upload CSV/XLSX/XLS datasets with instant data-quality analysis
  • 🧹 Automatic preprocessing (missing values, duplicates, type handling)
  • 🤖 Multi-model training for both classification and regression
  • 🎯 Smart model recommendations with comparison charts
  • 💾 Download trained .joblib models and cleaned datasets
  • 🔮 Single-record and batch-file prediction workflows
  • 📚 Built-in OpenAPI + Swagger docs for prediction endpoints
  • 🧾 Prediction history tracking and clear/reset actions
  • 🧼 Artifact cleanup endpoint + helper script

📦 Tech Stack

  • Backend: Flask
  • ML: scikit-learn, NumPy, pandas
  • Visualization: Matplotlib, Chart.js (CDN)
  • API Docs: Swagger UI (CDN)
  • Optional async deps included: Celery + Redis

🧠 Supported Models

Classification

  • Random Forest
  • Gradient Boosting
  • Logistic Regression
  • Support Vector Machine (SVM)
  • K-Nearest Neighbors (KNN)
  • Decision Tree

Regression

  • Random Forest Regressor
  • Gradient Boosting Regressor
  • Linear Regression
  • Support Vector Regressor (SVR)
  • Decision Tree Regressor
  • Elastic Net
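The two model families above map directly onto scikit-learn estimators. A hypothetical registry (the identifiers and default hyperparameters shown here are illustrative; the app's actual mapping in app.py may differ) could look like:

```python
# Hypothetical registry mapping model names to scikit-learn estimators.
from sklearn.ensemble import (GradientBoostingClassifier, GradientBoostingRegressor,
                              RandomForestClassifier, RandomForestRegressor)
from sklearn.linear_model import ElasticNet, LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, SVR
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

CLASSIFIERS = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(probability=True),
    "knn": KNeighborsClassifier(),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}

REGRESSORS = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
    "linear_regression": LinearRegression(),
    "svr": SVR(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "elastic_net": ElasticNet(),
}
```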

๐Ÿ—๏ธ Architecture

flowchart TD
    A[Browser UI\nJinja + CSS + JS] --> B[Flask App\napp.py]
    B --> C[Upload + Quality Analysis\n/api/data-quality]
    B --> D[Preprocessing + Model Selection]
    D --> E[Training Pipeline\nscikit-learn + RandomizedSearchCV]
    E --> F[(models/*.joblib)]
    D --> G[(cleaned_data/*)]

    F --> H[Prediction UI\n/predict/<model>]
    F --> I[Prediction API\n/api/predict/<model>]
    H --> J[(prediction_history/*.jsonl)]
    I --> J

    B --> K[Model Insights\n/api/model-insights/<model>]
    B --> L[OpenAPI Spec\n/api/openapi.json]

โš™๏ธ Model Training Pipeline

Data preparation

  • Upload accepts csv, xlsx, xls
  • Data-quality report computes missing values, duplicates, data types, and per-column stats
  • Cleaning removes duplicate rows and rows with missing target values
  • Problem type is auto-detected from target dtype/distribution (classification vs regression)
  • Training is capped by MAX_TRAIN_ROWS to avoid oversized jobs
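The dtype/distribution auto-detection might follow a heuristic along these lines (a hypothetical sketch; the app's actual thresholds are not documented here):

```python
import pandas as pd

def detect_problem_type(target: pd.Series, max_classes: int = 20) -> str:
    """Hypothetical heuristic: non-numeric targets, or integer-valued numeric
    targets with few distinct values, are treated as classification;
    everything else as regression."""
    if not pd.api.types.is_numeric_dtype(target):
        return "classification"
    if target.nunique() <= max_classes and (target.dropna() % 1 == 0).all():
        return "classification"
    return "regression"

print(detect_problem_type(pd.Series(["cat", "dog", "cat"])))   # classification
print(detect_problem_type(pd.Series([1.7, 2.3, 0.9, 4.1])))   # regression
```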

Feature preprocessing pipeline

  • Numeric features: median imputation
  • Scale-sensitive models (svm, svr, knn, logistic_regression, linear_regression, elastic_net) get an additional StandardScaler step on numeric features
  • Categorical features: most-frequent imputation + OneHotEncoder(handle_unknown="ignore")
  • Combined with ColumnTransformer and wrapped in a single scikit-learn Pipeline
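The steps above can be sketched with scikit-learn directly (a minimal illustration under the assumptions listed; the app's actual construction in app.py may differ in detail):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Models whose numeric features get a StandardScaler, per the list above.
SCALED_MODELS = {"svm", "svr", "knn", "logistic_regression",
                 "linear_regression", "elastic_net"}

def build_pipeline(estimator, model_name, numeric_cols, categorical_cols):
    """Sketch of the described preprocessing: median imputation (plus optional
    scaling) for numerics, most-frequent imputation + one-hot for categoricals,
    combined with ColumnTransformer and wrapped in a single Pipeline."""
    numeric_steps = [("impute", SimpleImputer(strategy="median"))]
    if model_name in SCALED_MODELS:
        numeric_steps.append(("scale", StandardScaler()))
    categorical_steps = [
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
    preprocessor = ColumnTransformer([
        ("num", Pipeline(numeric_steps), numeric_cols),
        ("cat", Pipeline(categorical_steps), categorical_cols),
    ])
    return Pipeline([("preprocess", preprocessor), ("model", estimator)])

# Toy usage on data with missing values in both column types
X = pd.DataFrame({"age": [25, 32, None, 47], "city": ["A", "B", "A", np.nan]})
y = [0, 1, 0, 1]
pipe = build_pipeline(RandomForestClassifier(random_state=0),
                      "random_forest", ["age"], ["city"])
pipe.fit(X, y)
print(len(pipe.predict(X)))  # 4
```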

Model training and selection

  • Train/validation split is automatic
  • Classification uses stratified split when feasible and class-aware handling
  • Optional hyperparameter tuning uses RandomizedSearchCV (ENABLE_HYPERPARAM_TUNING=True)
  • Cross-validation folds/iterations are bounded by app config:
      • HYPERPARAM_SEARCH_MAX_ITER
      • HYPERPARAM_SEARCH_CV_MAX_FOLDS
  • Best estimator is persisted as .joblib
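A hedged sketch of how bounded tuning with RandomizedSearchCV might be wired up, assuming the two config knobs above cap the iteration count and fold count (the `tune` helper and its capping logic are illustrative, not the app's exact code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Illustrative values mirroring the config knobs described above.
HYPERPARAM_SEARCH_MAX_ITER = 12
HYPERPARAM_SEARCH_CV_MAX_FOLDS = 5

def tune(estimator, param_distributions, X, y):
    """Bounded search: folds are capped by config and by the smallest class
    count; iterations by the config limit and the size of the search space."""
    counts = np.bincount(y)
    folds = max(2, min(HYPERPARAM_SEARCH_CV_MAX_FOLDS, counts[counts > 0].min()))
    n_candidates = int(np.prod([len(v) for v in param_distributions.values()]))
    search = RandomizedSearchCV(
        estimator,
        param_distributions,
        n_iter=min(HYPERPARAM_SEARCH_MAX_ITER, n_candidates),
        cv=folds,
        random_state=42,
    )
    search.fit(X, y)
    return search.best_estimator_

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.array([0, 1] * 30)
params = {"n_estimators": [10, 25, 50], "max_depth": [2, 4, None]}
best = tune(RandomForestClassifier(random_state=0), params, X, y)
```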

Metrics and artifacts

  • Classification metrics: accuracy, precision, recall, F1
  • Regression metrics: R², MAE, RMSE
  • Saved model payload includes:
      • preprocessing + estimator pipeline
      • feature columns and feature dtypes
      • inferred feature value hints
      • target classes/encoder metadata (classification)
      • training metrics and configuration
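The metric names above map directly onto sklearn.metrics, and the payload can be persisted with joblib. The dictionary keys below are illustrative, not the app's exact schema:

```python
import joblib
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score, r2_score,
                             recall_score)

def classification_metrics(y_true, y_pred):
    # Weighted averaging is one reasonable choice for multi-class targets.
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="weighted", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="weighted", zero_division=0),
    }

def regression_metrics(y_true, y_pred):
    return {
        "r2": r2_score(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }

# Hypothetical saved-payload shape; the app's real keys may differ.
payload = {
    "pipeline": None,  # would hold the fitted preprocessing + estimator Pipeline
    "feature_columns": ["age", "city"],
    "feature_dtypes": {"age": "float64", "city": "object"},
    "target_classes": [0, 1],
    "metrics": classification_metrics([0, 1, 1, 0], [0, 1, 0, 0]),
}
joblib.dump(payload, "model_demo.joblib")
print(round(payload["metrics"]["accuracy"], 2))  # 0.75
```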

Prediction pipeline

  • UI and API inference both load the same saved .joblib pipeline
  • Input validation is dtype-aware and checks required feature columns
  • Batch prediction endpoint supports uploaded files
  • Prediction history is stored in prediction_history/*.jsonl
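The shared load-validate-predict-log path could be sketched as follows (key names like `feature_columns` and `pipeline` are illustrative; the app's payload schema may differ):

```python
import json
from pathlib import Path

import joblib
import pandas as pd

def predict_one(model_path, record, history_dir="prediction_history"):
    """Load the saved payload, validate required feature columns,
    predict, and append a JSONL history entry."""
    payload = joblib.load(model_path)
    missing = [c for c in payload["feature_columns"] if c not in record]
    if missing:
        raise ValueError(f"missing feature columns: {missing}")
    row = pd.DataFrame([record], columns=payload["feature_columns"])
    prediction = payload["pipeline"].predict(row)[0]

    history = Path(history_dir)
    history.mkdir(exist_ok=True)
    entry = {"input": record, "prediction": str(prediction)}
    with open(history / (Path(model_path).stem + ".jsonl"), "a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return prediction

# Toy round-trip: train, save, predict
from sklearn.tree import DecisionTreeClassifier
X = pd.DataFrame({"age": [20, 40, 60, 80]})
clf = DecisionTreeClassifier().fit(X, [0, 0, 1, 1])
joblib.dump({"feature_columns": ["age"], "pipeline": clf}, "demo.joblib")
print(predict_one("demo.joblib", {"age": 75}))  # 1
```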

๐Ÿ“ Project Structure

ML-web/
|-- app.py
|-- requirements.txt
|-- scripts/
|   `-- cleanup_artifacts.py
|-- templates/
|   |-- base.html
|   |-- index.html
|   |-- upload.html
|   |-- train.html
|   |-- select_models.html
|   |-- results_comparison.html
|   |-- models.html
|   |-- predict.html
|   `-- api_docs.html
|-- static/
|   |-- css/style.css
|   |-- js/script.js
|   `-- images/
|-- uploads/
|-- cleaned_data/
|-- models/
`-- prediction_history/

🚀 Quick Start

1. Clone and enter project

git clone <your-repo-url>
cd ML-web

2. Create virtual environment

python -m venv .venv

Windows (PowerShell):

.\.venv\Scripts\Activate.ps1

macOS/Linux:

source .venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Run the app

python app.py

Open: http://127.0.0.1:5000

🔄 End-to-End Workflow

  1. Upload a dataset on /upload
  2. Inspect quality report and choose target column
  3. Select recommended models
  4. Train and compare metrics
  5. Download best model and cleaned data
  6. Run predictions from /predict/<model_filename> or API

🔌 Key Endpoints

  • GET / - Landing page
  • GET|POST /upload - Upload dataset + training configuration
  • POST /select-models - Model recommendation page
  • POST /train-models - Train selected models
  • GET /models - Saved model library
  • GET|POST /predict/<filename> - Interactive prediction workspace
  • POST /api/predict/<filename> - JSON prediction API
  • POST /predict-file/<filename> - Batch predictions from uploaded file
  • GET /api/model-insights/<filename> - Model metrics + feature importance
  • GET /api/openapi.json - OpenAPI document
  • GET /api/docs - Swagger UI docs
  • GET /api/predict-history/<filename> - Prediction history
  • POST /api/predict-history/<filename>/clear - Clear history
  • POST /admin/cleanup-artifacts - Cleanup generated artifacts
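As an illustration, a single-record call to the JSON prediction API might look like the following, with the app running locally. The model filename and feature names here are hypothetical; the actual request schema is defined by the saved model's feature columns (see /api/docs):

```shell
curl -X POST http://127.0.0.1:5000/api/predict/my_model.joblib \
  -H "Content-Type: application/json" \
  -d '{"age": 42, "city": "A"}'
```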

๐Ÿ–ผ๏ธ Screenshots

image image imageimageimageimageimageimageimageimageimageimageimageimageimageimageimage

โš™๏ธ Configuration Notes

Default runtime settings in app.py include:

  • MAX_TRAIN_ROWS = 50000
  • ENABLE_HYPERPARAM_TUNING = True
  • HYPERPARAM_SEARCH_MAX_ITER = 12
  • HYPERPARAM_SEARCH_CV_MAX_FOLDS = 5
  • Auto-cleanup enabled after training with retention limits

Optional env var:

  • CLEANUP_TOKEN for securing /admin/cleanup-artifacts

🧼 Cleanup Utility

Dry run:

python scripts/cleanup_artifacts.py

Apply deletion:

python scripts/cleanup_artifacts.py --apply

Include uploads cleanup:

python scripts/cleanup_artifacts.py --include-uploads --apply
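The dry-run/apply behavior could be implemented roughly as below. This is a hypothetical sketch of what scripts/cleanup_artifacts.py does, not its actual code; the directory names follow the project structure above:

```python
import os
import tempfile
from pathlib import Path

# Directories treated as generated artifacts; uploads/ only on request.
ARTIFACT_DIRS = ["models", "cleaned_data", "prediction_history"]

def cleanup(root=".", apply=False, include_uploads=False):
    """List artifact files; delete them only when apply=True (dry run otherwise)."""
    dirs = ARTIFACT_DIRS + (["uploads"] if include_uploads else [])
    affected = []
    for d in dirs:
        for path in Path(root, d).glob("*"):
            if path.is_file():
                affected.append(str(path))
                if apply:
                    path.unlink()
    return affected

# Toy round-trip in a temporary directory
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "models"))
open(os.path.join(root, "models", "old.joblib"), "w").close()
print(len(cleanup(root=root)))              # 1 (dry run: nothing deleted)
print(len(cleanup(root=root, apply=True)))  # 1 (now deleted)
print(len(cleanup(root=root)))              # 0
```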

📌 Notes

  • Generated artifacts (uploads/, models/, cleaned_data/) are git-ignored.
  • The app currently runs with debug=True in app.py; disable it before deploying to production.
