🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies.
Slicing a PyTorch Tensor Into Parallel Shards
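A minimal sketch of the core operation, assuming a hypothetical four-rank setup; `torch.chunk` splits a weight matrix into row shards, and all shapes here are illustrative rather than taken from the repository:

```python
# Minimal sketch: split a weight matrix into per-rank shards with torch.chunk.
# world_size and the tensor shape are illustrative assumptions.
import torch

world_size = 4                     # hypothetical number of parallel ranks
weight = torch.randn(1024, 4096)   # full weight (out_features x in_features)

# torch.chunk returns views split along dim 0 into (nearly) equal shards.
shards = torch.chunk(weight, world_size, dim=0)

for rank, shard in enumerate(shards):
    print(f"rank {rank}: shard shape {tuple(shard.shape)}")  # (256, 4096) each
```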
Decentralized LLM fine-tuning and inference with offloading
Large-scale 4D-parallelism pre-training for 🤗 transformers Mixture-of-Experts models *(still a work in progress)*
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
JORA: JAX Tensor-Parallel LoRA Library (ACL 2024)
A distributed training framework for large language models powered by Lightning.
Fast and easy distributed model training examples.
Tensor Parallelism with JAX + Shard Map
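As a rough illustration of the pattern, here is a minimal `shard_map` matmul with a single all-reduce; the mesh axis name `"tp"` and all shapes are assumptions, and the device count must divide the sharded dimension:

```python
# Sketch of a tensor-parallel matmul with jax.experimental.shard_map.
# The "tp" axis name and the shapes are illustrative assumptions.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

mesh = Mesh(np.array(jax.devices()), axis_names=("tp",))

def matmul_tp(x, w):
    # Each device multiplies its slice of the contracting dimension,
    # then partial products are summed across the "tp" axis.
    return jax.lax.psum(x @ w, axis_name="tp")

tp_matmul = shard_map(
    matmul_tp,
    mesh=mesh,
    in_specs=(P(None, "tp"), P("tp", None)),  # x column-sharded, w row-sharded
    out_specs=P(None, None),                  # replicated after the psum
)

x = jnp.ones((8, 512))
w = jnp.ones((512, 256))
print(tp_matmul(x, w).shape)  # (8, 256), identical on every device
```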
GPU Memory Calculator for LLM Training - Calculate GPU memory requirements for training Large Language Models with support for multiple training engines including PyTorch DDP, DeepSpeed ZeRO, Megatron-LM, and FSDP.
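For intuition, the common rule of thumb such calculators build on is roughly 16 bytes per parameter for mixed-precision Adam, before activations; a minimal sketch of that standard estimate, not this tool's exact model:

```python
# Rough rule-of-thumb estimate for mixed-precision Adam training memory,
# ignoring activations; not this calculator's exact formula.
def training_memory_gb(n_params: float) -> float:
    bytes_per_param = (
        2 +  # bf16/fp16 weights
        2 +  # bf16/fp16 gradients
        4 +  # fp32 master weights
        4 +  # Adam first moment (fp32)
        4    # Adam second moment (fp32)
    )
    return n_params * bytes_per_param / 1024**3

print(f"~{training_memory_gb(7e9):.0f} GB for a 7B model")  # ~104 GB
```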
A reference implementation of matrix-multiplication algorithms for ML on UPMEM PIM, a processing-in-memory platform
vLLM - High-throughput, memory-efficient LLM inference engine with PagedAttention, continuous batching, CUDA/HIP optimization, quantization (GPTQ/AWQ/INT4/INT8/FP8), tensor/pipeline parallelism, OpenAI-compatible API, multi-GPU/TPU/Neuron support, prefix caching, and multi-LoRA capabilities
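A minimal offline-inference sketch with vLLM's Python API; the model name is illustrative, and `tensor_parallel_size=2` assumes two visible GPUs:

```python
# Minimal vLLM offline inference; the model name and tensor_parallel_size
# are illustrative (tensor_parallel_size=2 assumes two GPUs are available).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", tensor_parallel_size=2)
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Tensor parallelism splits"], params)
print(outputs[0].outputs[0].text)
```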
Communication-efficient Tensor Parallelism for GPT-2
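The communication-saving trick such projects typically exploit is the Megatron-style MLP split, where the first linear is column-sharded and the second row-sharded, so each block needs only one all-reduce; a sketch assuming an initialized `torch.distributed` process group and hypothetical pre-sharded weights:

```python
# Sketch of a Megatron-style tensor-parallel MLP block: one all-reduce per
# block. Assumes dist.init_process_group has run and weights are pre-sharded;
# this illustrates the general technique, not this repository's exact code.
import torch
import torch.distributed as dist

def parallel_mlp(x, w1_shard, w2_shard):
    # w1_shard: (hidden, ffn // world_size) column slice of the first linear
    # w2_shard: (ffn // world_size, hidden) row slice of the second linear
    h = torch.nn.functional.gelu(x @ w1_shard)  # purely local compute
    y = h @ w2_shard                            # partial sums on each rank
    dist.all_reduce(y, op=dist.ReduceOp.SUM)    # the block's single all-reduce
    return y
```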
This repository focuses on distributed and parallel computing with PyTorch, covering model parallelism, data parallelism, and advanced optimization techniques. It provides resources for scaling AI training and inference efficiently across multiple devices.
Repo for AMX + FAST
Training Qwen3 to solve Wordle using SFT and GRPO