
AlexGeorgallis/python-data-mining-algorithms


Python Data Mining & Applied Algorithms

This repository contains Python implementations of advanced data mining algorithms, machine learning pipelines, and high-performance recommender systems.

Core Implementations & Performance

  • Recommender Systems (02_recommender_systems_svd.ipynb): Engineered scalable user-based collaborative filtering and Singular Value Decomposition (SVD) models.
  • Performance Optimization: Achieved sub-5-second execution times on datasets of 100k+ rows by using sparse matrices (scipy.sparse.csr_matrix) and vectorized NumPy operations in place of iterative Python loops.
  • Machine Learning & NLP (03_clustering_nlp_pipelines.ipynb): Implemented classification pipelines that use TF-IDF and pre-trained Word2Vec models for NLP feature extraction. Built clustering models (K-means, agglomerative) and evaluated them with silhouette coefficients.
  • Data Engineering (01_data_preprocessing_eda.ipynb): Cleaned and transformed raw, high-dimensional data using Pandas for downstream machine learning tasks.
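The vectorized user-based collaborative filtering described above can be sketched as follows. This is a minimal illustration, not the notebook's code: the function name `predict_user_cf`, cosine similarity, the top-k neighbourhood, and the convention that 0 means "unrated" are all assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix


def predict_user_cf(ratings: csr_matrix, user: int, k: int = 2) -> np.ndarray:
    """Score every item for `user` from its k most cosine-similar users.

    Hypothetical helper. ratings: users x items sparse matrix, 0 = "not rated".
    """
    # Row L2 norms, computed without densifying the matrix.
    norms = np.sqrt(np.asarray(ratings.multiply(ratings).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0  # guard against users with no ratings
    # Cosine similarity of `user` to all users via one sparse product.
    dots = (ratings @ ratings[user].T).toarray().ravel()
    sims = dots / (norms * norms[user])
    sims[user] = -np.inf  # never pick the user as their own neighbour
    neighbours = np.argsort(sims)[::-1][:k]
    weights = sims[neighbours]
    # Similarity-weighted mean of neighbour ratings, normalised per item by the
    # weights of the neighbours who actually rated that item.
    neigh = ratings[neighbours]
    numer = np.asarray(neigh.T @ weights).ravel()
    rated = (neigh != 0).astype(float)
    denom = np.asarray(rated.T @ np.abs(weights)).ravel()
    return np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)
```

Every step is a sparse matrix product or a NumPy array operation, so no Python loop runs over users or items.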
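A truncated SVD recommender can be sketched with `scipy.sparse.linalg.svds`, which factorizes the sparse matrix directly. The helper names `svd_factors` and `predict_user` are hypothetical, unrated entries are simply treated as zeros, and any mean-centering or bias terms the notebook may use are omitted.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds


def svd_factors(ratings: csr_matrix, k: int):
    """Rank-k truncated SVD of a sparse ratings matrix (k < min(ratings.shape))."""
    u, s, vt = svds(ratings, k=k)
    # Sort the singular triplets in descending order explicitly rather than
    # relying on the order svds returns them in.
    order = np.argsort(s)[::-1]
    user_factors = u[:, order] * s[order]  # U @ diag(s) via broadcasting
    return user_factors, vt[order]


def predict_user(user_factors: np.ndarray, item_factors: np.ndarray, user: int) -> np.ndarray:
    """Reconstructed scores for one user: row `user` of U diag(s) Vt."""
    return user_factors[user] @ item_factors
```

Predicting one user at a time avoids materializing the dense users x items reconstruction.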
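One way to express a TF-IDF classification pipeline is a scikit-learn `Pipeline`. The README does not name the classifier, so `LogisticRegression` here is an assumption, as is the hypothetical `build_text_classifier` helper; the Word2Vec features are not shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_text_classifier():
    """Hypothetical helper: TF-IDF features feeding a linear classifier."""
    return make_pipeline(
        # Unigrams and bigrams; sublinear_tf dampens very frequent terms.
        TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
        LogisticRegression(max_iter=1000),
    )
```

Because the vectorizer lives inside the pipeline, it is fitted only on training data, which keeps test-set vocabulary and IDF weights from leaking into training.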
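Choosing the number of K-means clusters by silhouette coefficient might look like this. The hypothetical helper `best_k_by_silhouette`, the TF-IDF input and the cosine metric are illustrative assumptions; agglomerative clustering (`sklearn.cluster.AgglomerativeClustering`) could be scored the same way on a dense feature matrix.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(texts, k_values, seed=0):
    """Cluster TF-IDF vectors with K-means and pick k by mean silhouette score."""
    X = TfidfVectorizer().fit_transform(texts)  # sparse CSR, L2-normalised rows
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        # Cosine distance suits TF-IDF vectors; the score lies in [-1, 1].
        scores[k] = silhouette_score(X, labels, metric="cosine")
    best = max(scores, key=scores.get)
    return best, scores
```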
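A typical Pandas cleaning step of the kind described above is sketched below. The hypothetical `preprocess` helper, median imputation and one-hot encoding are assumptions about what "cleaned and transformed" involves, not a description of the notebook.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, impute missing values, and one-hot encode categoricals."""
    out = df.drop_duplicates().copy()
    num_cols = out.select_dtypes(include="number").columns
    cat_cols = out.select_dtypes(exclude="number").columns
    # Median imputation is robust to outliers in numeric columns.
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # Give missing categories an explicit level instead of dropping rows.
    out[cat_cols] = out[cat_cols].fillna("missing")
    return pd.get_dummies(out, columns=list(cat_cols), dtype=float)
```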

Technology Stack

  • Language: Python
  • Libraries: Pandas, NumPy, SciPy, Scikit-learn, Matplotlib

About

Python implementations of scalable recommender systems, ML pipelines, and high-performance clustering algorithms using sparse matrices.
