[SYSTEMDS-2850] Generalized parameter server Autoencoder #2434
Open
AdityaPandey2612 wants to merge 13 commits into apache:main from
SystemDS: Parameter Server Autoencoder with variable hidden layers (Already rebased onto the main branch of SystemDS)
Overview
This repository contains a comprehensive implementation and experimental evaluation of distributed autoencoder training using Apache SystemDS. The project implements a generalized symmetric autoencoder in DML (Declarative Machine Learning) and provides a complete infrastructure for automated testing, validation, and performance analysis of parameter server-based distributed training.
Key Contributions
1. Generalized Autoencoder Implementation (DML)
autoencoder_2layer.dml (867 lines): Core implementation supporting both DEFAULTSERVER (single-node) and PARAMSERVER (distributed) training modes
autoencoderGeneralized.dml (130 lines): Wrapper enabling arbitrary encoder depths with symmetric decoder mirroring
autoGradientCheck.dml (95 lines): Finite-difference gradient verification for correctness validation
2. Automated Testing Suite
JUnit Integration Tests
The implementation includes comprehensive JUnit tests integrated with the SystemDS test framework:
BuiltinAutoencoderGeneralizedTest.java
Complete test suite covering multiple architectural configurations:
Test Coverage:
BuiltinAutoencoderGeneralizedBasicTest.java
Basic sanity test for quick validation:
Key Features:
Experimental/Correctness/Testing Infrastructure
Automated Experiment Runner:
run_sysds_experiments.py: Python-based automation framework for executing experiment sweeps
Configuration Files (8 YAML configs):
e16_default.yaml: DEFAULTSERVER baseline experiments
e16_ps_w2.yaml: 2-worker parameter server configurations
e16_ps_w4.yaml: 4-worker parameter server with K-parameter sweep
epoch_curve.yaml: Convergence analysis over training epochs
epoch_curve_sbp.yaml: SBP staleness parameter exploration
gradient_check.yaml: Gradient verification test suite
stress_suite.yaml: Comprehensive stress testing (30+ configurations)
epoch_curve_fast.yaml: Quick validation experiments
Experimental Results Summary
Correctness Validation
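To make the idea behind the finite-difference gradient verification concrete, here is a minimal Python sketch of a central-difference check. It is not the code in autoGradientCheck.dml: the toy two-weight linear autoencoder, the function names, and the step size `eps` are all assumptions chosen for illustration.

```python
import math

def numerical_gradient(loss, params, eps=1e-5):
    """Central-difference estimate of d(loss)/d(params[i]) for every i."""
    grad = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grad.append((loss(plus) - loss(minus)) / (2 * eps))
    return grad

def relative_error(a, b):
    """||a - b|| / (||a|| + ||b||), a scale-free measure of disagreement."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    den = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return num / den if den > 0 else 0.0

# Hypothetical toy model: x_hat = w_dec * (w_enc * x), squared-error loss.
XS = (0.5, -1.0, 2.0)

def loss(params):
    w_enc, w_dec = params
    return 0.5 * sum((w_dec * w_enc * x - x) ** 2 for x in XS)

def analytic_gradient(params):
    """Hand-derived gradient of the toy loss above."""
    w_enc, w_dec = params
    g_enc = sum((w_dec * w_enc * x - x) * w_dec * x for x in XS)
    g_dec = sum((w_dec * w_enc * x - x) * w_enc * x for x in XS)
    return [g_enc, g_dec]

params = [0.3, -0.7]
err = relative_error(analytic_gradient(params), numerical_gradient(loss, params))
print(f"relative error: {err:.3e}")
```

A small relative error suggests that the analytic backward pass matches the loss; a large one usually points to a bug in one of the two.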
Convergence Performance
Best Configuration: SBP(K=2, W=4, ModelAvg=True)
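For readers unfamiliar with the ModelAvg setting, the sketch below assumes it means averaging the workers' local models element-wise rather than aggregating gradients. That reading, the flat-list model representation, and the function name are assumptions for illustration, not the PR's DML code.

```python
def average_models(worker_models):
    """Element-wise mean of per-worker parameter vectors (flat lists of floats)."""
    if not worker_models:
        raise ValueError("need at least one worker model")
    size = len(worker_models[0])
    if any(len(m) != size for m in worker_models):
        raise ValueError("all worker models must have the same length")
    n = len(worker_models)
    return [sum(m[i] for m in worker_models) / n for i in range(size)]

# Four hypothetical workers (W=4), each holding a two-parameter local model.
workers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
averaged = average_models(workers)
print(averaged)
```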
Key Findings:
Key Insights
Repository Structure
Quick Start
Prerequisites
pyyaml, pandas, matplotlib, seaborn, numpy
Installation
Generate Data
Run Experiments
Single configuration:
Full experiment suite:
Manual execution (for debugging):
Analyze Results
Visualizations
The analysis pipeline generates 11 publication-ready figures:
Experimental Details
Model Architecture
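The symmetric decoder mirroring described for autoencoderGeneralized.dml can be pictured with a short Python sketch that derives the full layer-size sequence from the encoder sizes alone. The function names and the example dimensions (784, 256, 64, 16) are hypothetical and are not taken from the PR.

```python
def symmetric_layer_sizes(input_dim, encoder_dims):
    """Return all layer sizes: input, encoder layers, mirrored decoder layers."""
    if not encoder_dims:
        raise ValueError("need at least one encoder layer")
    if input_dim <= 0 or any(d <= 0 for d in encoder_dims):
        raise ValueError("all dimensions must be positive")
    # The decoder repeats the encoder sizes in reverse (skipping the bottleneck)
    # and ends at the input dimension so the output can reconstruct the input.
    decoder_dims = list(reversed(encoder_dims[:-1])) + [input_dim]
    return [input_dim] + list(encoder_dims) + decoder_dims

def weight_shapes(sizes):
    """(rows, cols) of the weight matrix between each pair of adjacent layers."""
    return [(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]

sizes = symmetric_layer_sizes(784, [256, 64, 16])
shapes = weight_shapes(sizes)
print(sizes)
print(shapes)
```

With this scheme an encoder of depth d always yields 2d weight matrices, so the caller only has to specify the encoder half of the network.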
Training Configuration
Synchronization Strategies Evaluated
SBP Parameter K:
Performance Metrics
Convergence Quality
Runtime Performance
Statistical Analysis
Technical Highlights
DML Implementation Features
paramserv() API usage
Infrastructure Features
Documentation
Complete Technical Report
The repository includes a comprehensive 38-page technical report (report_comprehensive.pdf) covering:
Key Sections
Research Context
This work was completed as part of the Large-Scale Data Engineering course at Technische Universität Berlin. The project demonstrates:
Future Work
Algorithmic Extensions
Infrastructure Enhancements
Experimental Extensions
Contributing
Author
Aditya Pandey
Technische Universität Berlin
For full details on the project, the experiments, and the documentation, see the PDF report below:
report_comprehensive.pdf