- Mumbai (UTC +05:30)
- in/shlok-l-50180120b
- @shlok_fx
- https://leetcode.com/u/Shlok_Fx/
Pinned
-
Mini-Attention (Public)
FP16 Flash Attention 2 from scratch in CUDA C++, achieving 96% of cuDNN performance on SM120 (RTX 5090)
CUDA · 5 stars
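A minimal sketch of the online-softmax recurrence that a from-scratch Flash Attention kernel is organized around, using FP32 and one thread per query row for readability. This is not the Mini-Attention code: a real FA2-style kernel stages K/V tiles through shared memory, splits work across warps and the head dimension, and issues FP16 tensor-core MMAs; all sizes and names below are illustrative assumptions.

```cuda
// Single-pass attention via online softmax: each thread owns one query row and
// streams over all keys, keeping a running max (m), a running softmax
// denominator (l), and an output accumulator (acc) that is rescaled whenever
// the running max grows. This avoids materializing the N x N score matrix.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

__global__ void attention_online_softmax(const float* Q, const float* K,
                                         const float* V, float* O,
                                         int N, int d) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= N) return;

    const float scale = rsqrtf((float)d);
    float m = -INFINITY;  // running max of the scores seen so far
    float l = 0.0f;       // running softmax denominator
    float acc[128];       // output accumulator; head dim assumed <= 128 here
    for (int j = 0; j < d; ++j) acc[j] = 0.0f;

    for (int k = 0; k < N; ++k) {
        float s = 0.0f;   // score = (q . k) / sqrt(d)
        for (int j = 0; j < d; ++j) s += Q[row * d + j] * K[k * d + j];
        s *= scale;

        // Online-softmax update: fold the new score in, rescaling old state
        // by exp(m_old - m_new) so everything stays numerically stable.
        float m_new = fmaxf(m, s);
        float corr  = expf(m - m_new);
        float p     = expf(s - m_new);
        l = l * corr + p;
        for (int j = 0; j < d; ++j)
            acc[j] = acc[j] * corr + p * V[k * d + j];
        m = m_new;
    }
    for (int j = 0; j < d; ++j) O[row * d + j] = acc[j] / l;
}

int main() {
    const int N = 64, d = 64;
    size_t bytes = (size_t)N * d * sizeof(float);
    float *Q, *K, *V, *O;
    cudaMallocManaged(&Q, bytes); cudaMallocManaged(&K, bytes);
    cudaMallocManaged(&V, bytes); cudaMallocManaged(&O, bytes);
    for (int i = 0; i < N * d; ++i) {
        Q[i] = 0.01f * (i % 97); K[i] = 0.01f * (i % 89); V[i] = 0.01f * (i % 83);
    }
    attention_online_softmax<<<(N + 127) / 128, 128>>>(Q, K, V, O, N, d);
    cudaDeviceSynchronize();
    printf("O[0][0] = %f\n", O[0]);  // smoke test
    cudaFree(Q); cudaFree(K); cudaFree(V); cudaFree(O);
    return 0;
}
```

Tiling this same recurrence over K/V blocks rather than single keys is what turns it into Flash Attention proper.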
-
100-days-cuda (Public)
This repository documents my 100-day journey of learning and writing CUDA kernels.
-
SageAttention (Public, forked from thu-ml/SageAttention)
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention achieving a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
CUDA
-
flex-block-attn (Public, forked from Tencent-Hunyuan/flex-block-attn)
An efficient block-sparse attention computation library.
Jupyter Notebook
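For the block-sparse idea, a rough sketch: the same online-softmax loop as above, but driven by a per-(query block, key block) mask so entire key blocks are skipped outright. The mask layout, block size, and kernel below are assumptions for illustration, not the flex-block-attn API.

```cuda
// Block-sparse attention sketch: identical online-softmax recurrence, but a
// boolean block mask lets the kernel skip whole key blocks, so masked blocks
// cost no score computation or memory traffic at all.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

__global__ void block_sparse_attention(const float* Q, const float* K,
                                       const float* V, const char* block_mask,
                                       float* O, int N, int d, int B) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= N) return;

    int qb  = row / B;             // query-block index of this row
    int nkb = (N + B - 1) / B;     // number of key blocks
    const float scale = rsqrtf((float)d);
    float m = -INFINITY, l = 0.0f;
    float acc[128];                // head dim assumed <= 128 here
    for (int j = 0; j < d; ++j) acc[j] = 0.0f;

    for (int kb = 0; kb < nkb; ++kb) {
        if (!block_mask[qb * nkb + kb]) continue;  // the sparsity win: skip the block
        int k0 = kb * B, k1 = min(k0 + B, N);
        for (int k = k0; k < k1; ++k) {
            float s = 0.0f;
            for (int j = 0; j < d; ++j) s += Q[row * d + j] * K[k * d + j];
            s *= scale;
            float m_new = fmaxf(m, s);
            float corr = expf(m - m_new), p = expf(s - m_new);
            l = l * corr + p;
            for (int j = 0; j < d; ++j) acc[j] = acc[j] * corr + p * V[k * d + j];
            m = m_new;
        }
    }
    // Guard against rows whose key blocks are all masked out.
    for (int j = 0; j < d; ++j) O[row * d + j] = (l > 0.0f) ? acc[j] / l : 0.0f;
}

int main() {
    const int N = 64, d = 64, B = 16, nkb = N / B;
    size_t bytes = (size_t)N * d * sizeof(float);
    float *Q, *K, *V, *O; char *mask;
    cudaMallocManaged(&Q, bytes); cudaMallocManaged(&K, bytes);
    cudaMallocManaged(&V, bytes); cudaMallocManaged(&O, bytes);
    cudaMallocManaged(&mask, (size_t)nkb * nkb);
    for (int i = 0; i < N * d; ++i) {
        Q[i] = 0.01f * (i % 97); K[i] = 0.01f * (i % 89); V[i] = 0.01f * (i % 83);
    }
    // Example sliding-window pattern: attend to your own and the previous key block.
    for (int qb = 0; qb < nkb; ++qb)
        for (int kb = 0; kb < nkb; ++kb)
            mask[qb * nkb + kb] = (kb == qb || kb + 1 == qb);
    block_sparse_attention<<<(N + 127) / 128, 128>>>(Q, K, V, mask, O, N, d, B);
    cudaDeviceSynchronize();
    printf("O[0][0] = %f\n", O[0]);
    return 0;
}
```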
-
sglang (Public, forked from sgl-project/sglang)
SGLang is a high-performance serving framework for large language models and multimodal models.
Python
-
ThunderKittens (Public, forked from HazyResearch/ThunderKittens)
Tile primitives for speedy kernels.
CUDA