Predicting student academic performance using 7 ML models on 30+ features — from linear regression to deep neural networks

SolyZak/StudentPerformanceAI

Student Performance Prediction with Machine Learning

Predicting student academic performance using 7 ML models on 30+ environmental and personal features

Python scikit-learn TensorFlow Pandas


Overview

This project is a machine learning analysis that predicts student grades from environmental, demographic, and lifestyle factors. It compares 7 regression models (from linear regression to deep neural networks) and uses K-Means clustering for outlier detection. The data comes from the Kaggle Student Performance dataset (395 students, 32 features).

Research Questions

  • Can ML predict student performance based on environmental and personal factors?
  • Do parental education levels significantly affect student performance?
  • How do personal habits and health impact academic grades?

Models Implemented

Traditional ML

Model              Description
Linear Regression  Baseline regression model
Ridge Regression   L2-regularized linear model
Decision Tree      Non-linear tree-based regression
SVR                Support Vector Regression (linear kernel)

Deep Learning

Model                    Description
Basic Neural Network     2-layer Dense with ReLU
Improved Neural Network  3-layer with BatchNormalization
Advanced Neural Network  3-layer with BatchNormalization + Dropout

Unsupervised

Model               Description
K-Means Clustering  Outlier detection and student segmentation
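
One way distance-based outlier detection with K-Means might work is sketched below: fit the centroids, then flag any point whose distance to its nearest centroid exceeds a threshold. The toy 2-D points, `k=2`, the first-k-points initialization, and the helper names `kmeans` and `flag_outliers` are assumptions for illustration only; the notebook may well use scikit-learn's `KMeans` instead. The threshold 1.956 is the value quoted in the Methodology section, applied here to made-up data.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain K-Means; the first k points serve as initial centroids (an assumption)."""
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids

def flag_outliers(points, centroids, threshold):
    """Return True for each point farther than `threshold` from every centroid."""
    return [min(math.dist(p, c) for c in centroids) > threshold for p in points]

# Toy data (hypothetical): two tight groups plus one distant point.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 0.2), (0.2, 0.0),
          (1.0, 0.8), (0.8, 1.0), (3.0, 3.0)]
centroids = kmeans(points, k=2)
flags = flag_outliers(points, centroids, threshold=1.956)
```

In this toy run only the last point, `(3.0, 3.0)`, is flagged. On the real dataset the distances would be computed in the normalized feature space rather than on raw 2-D points.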

Dataset

Source: Kaggle - Student Performance Data
Size: 395 students, 32 features

Key Feature Categories:

  • Demographics — Age, sex, family size, parental status
  • Education — Mother's/Father's education level, school support, study time
  • Lifestyle — Free time, going out, alcohol consumption, health status
  • Academic — Past failures, absences, travel time

Target: Average grade (mean of G1, G2, G3 grading periods)
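
As a minimal sketch of how this target could be derived, the snippet below averages the three grading periods for each student. The inline sample rows, the comma delimiter, and the `average_grade` helper are illustrative assumptions, not the notebook's actual code; only the column names G1, G2, and G3 come from the description above.

```python
import csv
import io

# Hypothetical three-row sample; the real file is student_data.csv,
# and its delimiter is assumed here to be a comma.
SAMPLE_CSV = """G1,G2,G3
5,6,6
15,14,15
10,12,11
"""

def average_grade(row):
    """Return the mean of the three grading periods for one student."""
    return (int(row["G1"]) + int(row["G2"]) + int(row["G3"])) / 3

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
targets = [average_grade(r) for r in rows]
```

Averaging the three periods gives one continuous target per student, which suits the regression models listed above.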

Methodology

  1. Preprocessing — Categorical to numerical conversion, MinMax normalization
  2. EDA — Distribution analysis, correlation heatmap, feature-target scatter plots
  3. Outlier Detection — K-Means clustering with distance-based threshold (1.956)
  4. Model Training — 80/20 train-test split across all 7 models
  5. Evaluation — MAE, MSE, RMSE, R² Score, Explained Variance
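
The preprocessing, splitting, and evaluation steps above can be sketched in plain Python as follows. This is an assumption-laden illustration, not the notebook's code: the helper names, the seed, and the sample values are hypothetical, and the notebook presumably uses scikit-learn equivalents such as `MinMaxScaler`, `train_test_split`, and the functions in `sklearn.metrics`. The metric formulas follow the standard definitions.

```python
import math
import random

def min_max_scale(column):
    """Rescale a numeric column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def split_80_20(rows, seed=42):
    """Shuffle row indices and hold out 20% of the rows for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    test_idx = set(indices[:round(len(rows) * 0.2)])
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, R^2, and explained variance."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    return {
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1 - (mse * n) / ss_tot,
        "explained_variance": 1 - var_err / (ss_tot / n),
    }

scaled_age = min_max_scale([15, 18, 22])            # hypothetical ages
train, test = split_80_20(list(range(395)))          # 395 students
metrics = regression_metrics([10, 12, 14, 16], [11, 12, 13, 17])
```

R² and explained variance differ only when the prediction errors have a non-zero mean, which is why reporting both can reveal a systematic bias in a model.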

Visualizations

  • Correlation matrix heatmap
  • Target distribution bar chart
  • 30-feature scatter plot grid
  • K-Means cluster visualization (PCA)
  • Per-model prediction plots with training/validation loss curves

Project Structure

StudentPerformanceAI/
├── StudentPerformanceAI.ipynb                    # Main analysis notebook
├── StudentPerformance_Report_SolimanZakaria.pdf  # Full report
├── Model Plots/                                  # Model performance visualizations
├── Plots/                                        # EDA visualizations
└── Student Performance Dataset/
    └── student_data.csv                          # Source dataset

Getting Started

git clone https://github.com/SolyZak/StudentPerformanceAI.git
cd StudentPerformanceAI
pip install pandas numpy scikit-learn tensorflow matplotlib seaborn
jupyter notebook StudentPerformanceAI.ipynb

License

This project is for educational purposes.
