Kael Valen Banner

Mehmet Arda Hakbilen (Kael Valen)

ML Architecture Researcher · Non-Transformer Models · Systematic Learning


Typing SVG


Research badge Focus badge Profile views counter


Researching non-transformer architectures through systematic implementation.
Not just reading papers — building Mamba, RWKV, Flash Attention from scratch.


Email LinkedIn GitHub


GitHub Analytics & Activity

Tracking systematic learning through daily commits


Kael Valen's Metrics

Snake Animation


Research & Core Focus

Beyond Transformer: Alternative Sequence Architectures

Questioning the assumption that transformers are the only viable solution

  • State-Space Models → Mamba & RWKV implementations

    • Linear-time inference vs quadratic attention complexity
    • Selective state mechanisms for efficient memory
    • Comparing trade-offs: speed vs expressiveness
  • Hybrid Architectures → RNN + Attention combinations

    • Exploring best of both worlds: recurrence + selectivity
    • Custom memory systems for long-context tasks
    • Implementation-first approach to understanding
  • Flash Attention v2 → From-scratch CUDA optimization

    • Understanding memory-efficient attention at kernel level
    • Production inference optimization
    • 10x speedup through proper memory access patterns
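To make the linear-time claim concrete, here is a minimal diagonal state-space recurrence in NumPy. This is a toy sketch of my own, not Mamba itself: the input-dependent `B` and `C` stand in for the selective mechanism, and each step does constant work, so the whole scan is O(L·N) rather than the O(L²) of full attention.

```python
import numpy as np

def selective_scan(x, A, B, C):
    """Toy diagonal state-space recurrence (illustrative, not Mamba).

    x: (L,)    scalar input sequence
    A: (N,)    per-channel decay, 0 <= A <= 1
    B: (L, N)  input-dependent input projection (the "selective" part)
    C: (L, N)  input-dependent output projection
    Returns y: (L,). Each step is O(N) work, so the scan is linear in L.
    """
    h = np.zeros(A.shape[0])       # hidden state, one value per channel
    y = np.empty(len(x))
    for t, xt in enumerate(x):
        h = A * h + B[t] * xt      # constant work per step
        y[t] = C[t] @ h            # readout through the t-th projection
    return y
```

With `A = 0` the state forgets everything and the output depends only on the current input; with `A = 1` the state accumulates the full history. Real selective SSMs sit in between, with `A` (and its discretization) also input-dependent.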

Featured Projects

Beyond Transformer

Open-source alternative architecture research

Implementation-first approach to understanding non-transformer models. Building Mamba, RWKV, and hybrid systems from scratch to compare trade-offs.

PyTorch JAX CUDA ONNX

Current Phase:
• Flash Attention v2 from scratch
• Mamba state-space model analysis
• RWKV architecture comparison

SentinelFS

Distributed P2P File Sync (Archived)

Autonomous peer-to-peer synchronization with ML-based anomaly detection, delta-sync algorithms, and genetic topology remeshing for fault tolerance.

C++17 Threading P2P ML

Key Features:
• ML anomaly detection pipeline
• Self-healing network topology
• Zero-copy delta sync protocol
• Byzantine fault tolerance
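The delta-sync idea above rests on a weak rolling checksum, which makes scanning a file for matching blocks cheap. A generic rsync-style sketch (my own illustration; SentinelFS's actual protocol may differ):

```python
def rolling_checksums(data, block):
    """Adler-style weak rolling checksum over every `block`-byte window.

    Sliding the window costs O(1) instead of O(block), which is what
    makes rsync-style delta detection practical: the receiver hashes its
    blocks once, then the sender slides this checksum over its copy to
    find blocks it does not need to resend.
    """
    a = sum(data[:block])                               # sum of window bytes
    b = sum((block - i) * data[i] for i in range(block))  # position-weighted sum
    out = [(a, b)]
    for i in range(len(data) - block):
        a += data[i + block] - data[i]  # drop left byte, add right byte
        b += a - block * data[i]        # update weighted sum from new a
        out.append((a, b))
    return out
```

In a full protocol, windows whose weak checksum matches a known block are confirmed with a strong hash before a "copy" instruction replaces the literal bytes.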

Research Blog (Planned)

Implementation Notes & Architecture Analysis

Documenting the learning journey: from-scratch implementations, architecture comparisons, and trade-off analysis. Focus on practical insights over theory.


Markdown GitHub Pages Technical Writing

Planned Topics:
• Mamba vs Transformer trade-offs
• Flash Attention internals
• CUDA kernel optimization
• JAX vs PyTorch for research
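The "Flash Attention internals" topic largely comes down to the online-softmax rescaling trick. A single-query NumPy sketch of that trick (my own illustration; the real kernel adds query tiling, SRAM-aware blocking, and the backward pass, which is where the speedup comes from):

```python
import numpy as np

def attention_reference(q, K, V):
    """Standard softmax attention for one query: materializes all scores."""
    s = K @ q
    p = np.exp(s - s.max())
    return (p / p.sum()) @ V

def attention_online(q, K, V, block=4):
    """Streaming attention over key/value blocks via online softmax."""
    m = -np.inf                     # running max of scores seen so far
    l = 0.0                         # running softmax normalizer
    acc = np.zeros(V.shape[1])      # unnormalized weighted sum of values
    for i in range(0, len(K), block):
        s = K[i:i + block] @ q
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)   # rescale old stats to the new max
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[i:i + block]
        m = m_new
    return acc / l
```

Because earlier statistics are rescaled by `exp(m - m_new)` whenever a larger score appears, the blocks can be processed in one pass without ever materializing the full score matrix, which is the memory saving Flash Attention exploits.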


Tech Stack & Tools

ML Frameworks
Research Tools
Core Languages
Infrastructure
Web Stack (I'm not happy about this, ngl)

Collaboration & Contact

Open to research collaboration & technical discussions
Interested in alternative architectures, efficient inference, or systematic ML learning?

What I'm looking for:
• Co-researchers on non-transformer architectures
• Code review & implementation feedback
• Trade-off discussions: speed vs accuracy vs memory

What I'm not interested in:
❌ Wrapper apps without novel architecture
❌ "Just use ChatGPT API" projects
❌ Hype-driven development


"Implementation over theory. Trade-offs over hype. Systematic learning over vibe coding."


Star the repos if you like them. Building in private — launching into space soon.

Pinned repositories

  1. latch-lang (Public): Latch Language. Rust, 1 star.

  2. synthe (Public): Python.

  3. beyond_transformer (Public): Parallel Unified Linear State Engine (PULSE). Exploring a new AI paradigm beyond Transformers. Includes literature review, proposed architectures, and experiment plans. Python, 1 star.