This repository contains Python implementations of advanced data mining algorithms, machine learning pipelines, and high-performance recommender systems.
- Recommender Systems (`02_recommender_systems_svd.ipynb`): Engineered scalable user-based collaborative filtering and Singular Value Decomposition (SVD) models.
  - Performance Optimization: Achieved sub-5-second execution times on datasets of 100k+ rows by using sparse matrices (`scipy.sparse.csr_matrix`) and vectorized NumPy operations in place of iterative Python loops.
- Machine Learning & NLP (`03_clustering_nlp_pipelines.ipynb`): Implemented classification pipelines that use TF-IDF and pre-trained Word2Vec models for NLP feature extraction. Designed clustering models (K-means, Agglomerative) and assessed them with silhouette coefficients.
- Data Engineering (`01_data_preprocessing_eda.ipynb`): Cleaned and transformed raw, high-dimensional data using Pandas for downstream machine learning tasks.
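To give a feel for the loop-free approach described for the recommender notebook, here is a minimal sketch of user-based collaborative filtering on a `scipy.sparse.csr_matrix`. It is not taken from the notebook: the function name `user_cf_scores`, the toy ratings, and the choice of cosine similarity are all assumptions made for illustration, and it assumes an unrated item is simply absent from the sparse matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags


def user_cf_scores(ratings: csr_matrix) -> np.ndarray:
    """Return a dense (n_users, n_items) array of similarity-weighted mean ratings.

    Hypothetical helper for illustration; not the notebook's implementation.
    """
    ratings = csr_matrix(ratings, dtype=np.float64)

    # Scale each user's row to unit length so a dot product is a cosine similarity.
    row_norms = np.sqrt(np.asarray(ratings.multiply(ratings).sum(axis=1)).ravel())
    row_norms[row_norms == 0] = 1.0  # users with no ratings keep an all-zero row
    normalised = diags(1.0 / row_norms) @ ratings

    # User-user cosine similarity; dense here, which assumes a modest user count.
    sim = (normalised @ normalised.T).toarray()
    np.fill_diagonal(sim, 0.0)  # a user should not vote for themselves

    # 1.0 wherever a rating is stored, so the denominator counts only actual raters.
    rated = ratings.copy()
    rated.data[:] = 1.0

    # sim is symmetric, so (R.T @ sim).T equals sim @ R while keeping R sparse.
    weighted_sum = (ratings.T @ sim).T
    weight_total = (rated.T @ np.abs(sim)).T

    # Leave a score of 0 where no similar user has rated the item.
    scores = np.zeros_like(weighted_sum)
    np.divide(weighted_sum, weight_total, out=scores, where=weight_total > 0)
    return scores


# Toy data (made up): 3 users x 3 items, 0 means "not rated".
toy_ratings = csr_matrix(np.array([[5, 4, 0],
                                   [5, 4, 2],
                                   [1, 0, 5]], dtype=float))
toy_scores = user_cf_scores(toy_ratings)
print(toy_scores.round(2))
```

Every step is a matrix product or an element-wise operation, so no Python-level loop over users or items is needed. The dense user-user similarity matrix is the main memory cost in this sketch; for a very large user base it would need to stay sparse or be computed in blocks.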
- Language: Python
- Libraries: Pandas, NumPy, SciPy, Scikit-learn, Matplotlib
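
As a rough illustration of how silhouette coefficients can guide the clustering work in `03_clustering_nlp_pipelines.ipynb`, the sketch below picks the number of K-means clusters with the highest mean silhouette score. The helper `best_k_by_silhouette`, the candidate range, and the synthetic 2-D points are assumptions for this example only; the notebook may select or evaluate its models differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(features, candidate_ks, seed=0):
    """Fit K-means for each candidate k and return (best_k, {k: mean silhouette}).

    Hypothetical helper for illustration; not the notebook's implementation.
    """
    scores = {}
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(features)
        # Mean silhouette lies in [-1, 1]; higher means tighter, better-separated clusters.
        scores[k] = silhouette_score(features, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data (made up): three well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centres])

best_k, silhouette_by_k = best_k_by_silhouette(points, range(2, 6))
print(best_k, {k: round(v, 3) for k, v in silhouette_by_k.items()})
```

The same loop would work with `sklearn.cluster.AgglomerativeClustering` in place of `KMeans`, since both expose `fit_predict`; note that `AgglomerativeClustering` takes no `random_state` argument.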