A no-code machine learning web application to upload spreadsheet data, train multiple models, compare performance, and generate predictions through UI and API.
- Upload CSV/XLSX/XLS datasets with instant data-quality analysis
- Automatic preprocessing (missing values, duplicates, type handling)
- Multi-model training for both classification and regression
- Smart model recommendations with comparison charts
- Download trained `.joblib` models and cleaned datasets
- Single-record and batch-file prediction workflows
- Built-in OpenAPI + Swagger docs for prediction endpoints
- Prediction history tracking and clear/reset actions
- Artifact cleanup endpoint + helper script
- Backend: Flask
- ML: scikit-learn, NumPy, pandas
- Visualization: Matplotlib, Chart.js (CDN)
- API Docs: Swagger UI (CDN)
- Optional async deps included: Celery + Redis
- Random Forest
- Gradient Boosting
- Logistic Regression
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
- Decision Tree
- Random Forest Regressor
- Gradient Boosting Regressor
- Linear Regression
- Support Vector Regressor (SVR)
- Decision Tree Regressor
- Elastic Net
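The model names above map naturally onto scikit-learn estimators. A minimal sketch of such a registry, keyed by problem type and model identifier (the registry structure, identifiers, and hyperparameters here are illustrative assumptions, not the app's actual code):

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestClassifier
from sklearn.linear_model import ElasticNet, LinearRegression, LogisticRegression
from sklearn.svm import SVC, SVR

# Hypothetical registry keyed by the identifiers a UI might use.
MODEL_REGISTRY = {
    "classification": {
        "random_forest": RandomForestClassifier(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm": SVC(probability=True),
    },
    "regression": {
        "gradient_boosting": GradientBoostingRegressor(),
        "linear_regression": LinearRegression(),
        "svr": SVR(),
        "elastic_net": ElasticNet(),
    },
}

def get_estimator(problem_type: str, name: str):
    """Look up an estimator instance for the given problem type and model name."""
    return MODEL_REGISTRY[problem_type][name]
```

Keeping model selection in one table like this makes it easy to add or remove algorithms without touching the training code.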
```mermaid
flowchart TD
    A[Browser UI\nJinja + CSS + JS] --> B[Flask App\napp.py]
    B --> C[Upload + Quality Analysis\n/api/data-quality]
    B --> D[Preprocessing + Model Selection]
    D --> E[Training Pipeline\nscikit-learn + RandomizedSearchCV]
    E --> F[(models/*.joblib)]
    D --> G[(cleaned_data/*)]
    F --> H[Prediction UI\n/predict/<model>]
    F --> I[Prediction API\n/api/predict/<model>]
    H --> J[(prediction_history/*.jsonl)]
    I --> J
    B --> K[Model Insights\n/api/model-insights/<model>]
    B --> L[OpenAPI Spec\n/api/openapi.json]
```
- Upload accepts `csv`, `xlsx`, and `xls` files
- Data-quality report computes missing values, duplicates, data types, and per-column stats
- Cleaning removes duplicate rows and rows with missing target values
- Problem type is auto-detected from the target dtype/distribution (`classification` vs `regression`)
- Training is capped by `MAX_TRAIN_ROWS` to avoid oversized jobs
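The problem-type auto-detection can be approximated as follows; the function name and the class-count threshold are illustrative assumptions, not the app's exact logic:

```python
import pandas as pd

def detect_problem_type(target: pd.Series, max_classes: int = 20) -> str:
    """Guess classification vs. regression from the target column.

    Non-numeric targets are treated as classification; numeric targets
    with only a few distinct values are also treated as classification.
    """
    if not pd.api.types.is_numeric_dtype(target):
        return "classification"
    if target.nunique() <= max_classes:
        return "classification"
    return "regression"
```

For example, a `["yes", "no", ...]` target would be detected as classification, while a continuous price column would be detected as regression.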
- Numeric features: median imputation
- Scaled numeric models (`svm`, `svr`, `knn`, `logistic_regression`, `linear_regression`, `elastic_net`) add `StandardScaler`
- Categorical features: most-frequent imputation + `OneHotEncoder(handle_unknown="ignore")`
- Combined with `ColumnTransformer` and wrapped in a single scikit-learn `Pipeline`
- Train/validation split is automatic
- Classification uses a stratified split when feasible, plus class-aware handling
- Optional hyperparameter tuning uses `RandomizedSearchCV` (`ENABLE_HYPERPARAM_TUNING=True`)
- Cross-validation folds/iterations are bounded by app config: `HYPERPARAM_SEARCH_MAX_ITER`, `HYPERPARAM_SEARCH_CV_MAX_FOLDS`
- The best estimator is persisted as `.joblib`
- Classification metrics: accuracy, precision, recall, F1
- Regression metrics: R², MAE, RMSE
- Saved model payload includes:
- preprocessing + estimator pipeline
- feature columns and feature dtypes
- inferred feature value hints
- target classes/encoder metadata (classification)
- training metrics and configuration
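A sketch of what such a payload might look like when persisted with `joblib`; the dictionary keys mirror the metadata listed above but are assumptions, and a bare estimator stands in for the full pipeline:

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression

# Toy fitted estimator standing in for the full preprocessing pipeline.
X_train, y_train = [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1]
model = LogisticRegression().fit(X_train, y_train)

# Hypothetical payload bundling the model with inference-time metadata.
payload = {
    "pipeline": model,
    "feature_columns": ["x"],
    "feature_dtypes": {"x": "float64"},
    "target_classes": [0, 1],
    "metrics": {"accuracy": model.score(X_train, y_train)},
}

path = os.path.join(tempfile.mkdtemp(), "example_model.joblib")
joblib.dump(payload, path)

loaded = joblib.load(path)
```

Persisting metadata alongside the estimator means the loader never has to guess column order, dtypes, or class labels.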
- UI and API inference both load the same saved `.joblib` pipeline
- Input validation is dtype-aware and checks required feature columns
- Batch prediction endpoint supports uploaded files
- Prediction history is stored in `prediction_history/*.jsonl`
```
ML-web/
|-- app.py
|-- requirements.txt
|-- scripts/
|   `-- cleanup_artifacts.py
|-- templates/
|   |-- base.html
|   |-- index.html
|   |-- upload.html
|   |-- train.html
|   |-- select_models.html
|   |-- results_comparison.html
|   |-- models.html
|   |-- predict.html
|   `-- api_docs.html
|-- static/
|   |-- css/style.css
|   |-- js/script.js
|   `-- images/
|-- uploads/
|-- cleaned_data/
|-- models/
`-- prediction_history/
```
```shell
git clone <your-repo-url>
cd ML-web
python -m venv .venv
```

Windows (PowerShell):

```shell
.\.venv\Scripts\Activate.ps1
```

macOS/Linux:

```shell
source .venv/bin/activate
```

Then install dependencies and run the app:

```shell
pip install -r requirements.txt
python app.py
```

Open: http://127.0.0.1:5000
- Upload a dataset on `/upload`
- Inspect the quality report and choose a target column
- Select recommended models
- Train and compare metrics
- Download the best model and cleaned data
- Run predictions from `/predict/<model_filename>` or the API
- `GET /` - Landing page
- `GET|POST /upload` - Upload dataset + training configuration
- `POST /select-models` - Model recommendation page
- `POST /train-models` - Train selected models
- `GET /models` - Saved model library
- `GET|POST /predict/<filename>` - Interactive prediction workspace
- `POST /api/predict/<filename>` - JSON prediction API
- `POST /predict-file/<filename>` - Batch predictions from uploaded file
- `GET /api/model-insights/<filename>` - Model metrics + feature importance
- `GET /api/openapi.json` - OpenAPI document
- `GET /api/docs` - Swagger UI docs
- `GET /api/predict-history/<filename>` - Prediction history
- `POST /api/predict-history/<filename>/clear` - Clear history
- `POST /admin/cleanup-artifacts` - Cleanup generated artifacts
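The JSON prediction endpoint can be called from any HTTP client. A stdlib-only sketch; the flat feature-dict payload shape and the example model filename are assumptions (check `/api/docs` for the actual schema):

```python
import json
from urllib.request import Request, urlopen

def build_predict_request(base_url: str, model_filename: str, features: dict) -> Request:
    """Build a POST request for the single-record prediction API."""
    return Request(
        f"{base_url}/api/predict/{model_filename}",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(base_url: str, model_filename: str, features: dict) -> dict:
    """Send the request and return the parsed JSON response."""
    with urlopen(build_predict_request(base_url, model_filename, features)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (requires the app running locally; names are made up):
# result = predict("http://127.0.0.1:5000", "random_forest.joblib",
#                  {"age": 35, "city": "Oslo"})
```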
Default runtime settings in `app.py` include:

- `MAX_TRAIN_ROWS = 50000`
- `ENABLE_HYPERPARAM_TUNING = True`
- `HYPERPARAM_SEARCH_MAX_ITER = 12`
- `HYPERPARAM_SEARCH_CV_MAX_FOLDS = 5`
- Auto-cleanup enabled after training, with retention limits

Optional env var: `CLEANUP_TOKEN` for securing `/admin/cleanup-artifacts`
Dry run:

```shell
python scripts/cleanup_artifacts.py
```

Apply deletion:

```shell
python scripts/cleanup_artifacts.py --apply
```

Include uploads cleanup:

```shell
python scripts/cleanup_artifacts.py --include-uploads --apply
```

- Generated artifacts (`uploads/`, `models/`, `cleaned_data/`) are git-ignored.
- The app currently runs with `debug=True` in `app.py`; disable this for production.