GitHub - rlops/rlix: GPU time-sharing for concurrent LLM RL

A control plane for concurrent LLM RL on shared GPUs

In agentic RL, long-horizon rollouts are increasingly long-tailed: a small number of stragglers dominate wall-clock time while many GPUs sit underutilized.

RLix addresses this by time-sharing GPUs across concurrent RL training jobs, expanding rollout workers onto temporarily idle capacity and shrinking them when that capacity is needed elsewhere. RLix does not change pipeline-level training semantics: each recipe retains its original behavior, whether on-policy or off-policy under staleness bounds, while delivering much higher GPU utilization.

RLix builds on Partial Overlapping scheduling from Alibaba/ROLL and extends it with a distributed control plane for coordinating multiple independent training jobs on a shared GPU cluster.

RLix is an AI-native project, with AI deeply involved across design, planning, implementation, testing, and code review, alongside human oversight. Correctness, code quality, and maintainability remain first-class concerns.

Features

Recipe-Transparent Scheduling: Training logic stays fully decoupled from GPU scheduling, so each pipeline can be developed in isolation.
Two-Level GPU Sharing: GPUs are shared both across pipelines through elastic expand/shrink and within a pipeline through multi-LoRA adapters on a shared base model.
Demand-Driven Rollout Scaling: Rollout workers expand onto idle GPU capacity and shrink based on heartbeat-reported demand.
Efficient Memory Management: Model weights are cached on the trainer CPU and synced on demand only to resumed rollout workers; when workers shrink, inference weights are dropped to minimize memory footprint.

Installation

git clone https://github.com/rlops/rlix.git
cd rlix
pip install -e .

Quick Start

The example below shows a minimal control-plane setup for registering and running a pipeline under RLix.

import ray
import rlix
from rlix.pipeline import PipelineCoordinator
from rlix.protocol.types import COORDINATOR_ACTOR_NAME_PREFIX

# Pipeline-specific configuration object
my_config = ...

# 1. Initialize the RLix control plane
orchestrator = rlix.init(create_if_missing=True)

# 2. Allocate a pipeline ID
pipeline_id = ray.get(orchestrator.allocate_pipeline_id.remote("ft"))

# 3. Register the pipeline's GPU topology
ray.get(
    orchestrator.register_pipeline.remote(
        pipeline_id=pipeline_id,
        ray_namespace=f"pipeline_{pipeline_id}_NS",
        cluster_tp_configs={"actor_train": 8, "actor_infer": 8},
        cluster_device_mappings={
            "actor_train": [0, 1, 2, 3, 4, 5, 6, 7],
            "actor_infer": [0, 1, 2, 3, 4, 5, 6, 7],
        },
    )
)

# 4. Admit the pipeline before GPU allocation
ray.get(orchestrator.admit_pipeline.remote(pipeline_id=pipeline_id))

# 5. Create the pipeline coordinator
CoordinatorActor = ray.remote(PipelineCoordinator)
coordinator = CoordinatorActor.options(
    name=f"{COORDINATOR_ACTOR_NAME_PREFIX}{pipeline_id}",
    namespace=f"pipeline_{pipeline_id}_NS",
).remote(pipeline_id=pipeline_id, pipeline_config=my_config)

# 6. Create and run the pipeline
pipeline_actor = ray.get(
    coordinator.create_pipeline_actor.remote(pipeline_config=my_config)
)
ray.get(pipeline_actor.run.remote())

See examples/ for complete multi-pipeline examples and full configuration options.

Pipeline Types

RLix currently supports two built-in pipeline types:

Full Finetune Pipeline (`RollFullFinetunePipeline`)

Full-parameter training with elastic GPU expand/shrink. Each job trains all model weights while yielding idle GPUs to other jobs.

Multi-LoRA Pipeline (`RollMultiLoraPipeline`)

Concurrent training of multiple LoRA adapters on a shared base model, with an isolated optimizer for each adapter. Jobs share the base model in GPU memory while keeping adapter weights and optimizer states fully independent.

Beyond these built-in options, RLix supports custom pipelines and integrations that follow the RLix control-plane protocol.

Architecture

RLix separates scheduling from per-pipeline training logic: a shared control plane coordinates multiple independent jobs, while rollout stages elastically consume residual GPU capacity.

┌───────────────────────────────────────────────────────────┐
│                     RLix Control Plane                    │
├──────────────────┬──────────────────┬─────────────────────┤
│   Orchestrator   │    Scheduler     │  Resource Manager   │
│ (lifecycle mgmt) │ (priority +      │   (GPU topology)    │
│                  │ rollout preempt) │                     │
└────────┬─────────┴────────┬─────────┴─────────┬───────────┘
         │                  │                   │
    ┌────▼─────┐       ┌────▼─────┐        ┌────▼─────┐
    │FullFine- │       │Multi-LoRA│        │Custom /  │
    │tune Job 1│       │  Job 2   │        │External  │
    │          │       │          │        │  Job N   │
    └────┬─────┘       └────┬─────┘        └────┬─────┘
         │                  │                   │
    ┌────▼──────────────────▼───────────────────▼────┐
    │               Shared GPU Capacity              │
    │   [GPU 0] [GPU 1] [GPU 2] [GPU 3] ... [GPU N]  │
    └────────────────────────────────────────────────┘

Scheduling Policy

The scheduler assigns GPUs in priority order, with lower values indicating higher priority. All stages except rollout are non-preemptable: once they acquire GPUs, they keep them until completion. Rollout (6) is the lowest-priority stage and is always preemptable, using only the capacity remaining after all higher-priority stages are satisfied. When multiple jobs roll out concurrently, the remaining GPUs are divided proportionally to each job’s outstanding rollout demand, subject to placement constraints.

0 Initialization: Model loading; must complete before scheduling begins.
1 Actor Training: Policy gradient update.
2 Critic Training: Value function update.
3 Old-Policy Log Probs: Log-probability computation under the previous policy.
4 Reference-Model Log Probs: Log-probability computation under the reference model.
5 Value Compute: Value estimation for advantage calculation.
6 Rollout: Trajectory sampling; always preemptable.

Name		Name	Last commit message	Last commit date
Latest commit History 147 Commits
assets		assets
examples		examples
rlix		rlix
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A control plane for concurrent LLM RL on shared GPUs

Features

Installation

Quick Start

Pipeline Types

Full Finetune Pipeline (`RollFullFinetunePipeline`)

Multi-LoRA Pipeline (`RollMultiLoraPipeline`)

Architecture

Scheduling Policy

About

Uh oh!

Releases

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

A control plane for concurrent LLM RL on shared GPUs

Features

Installation

Quick Start

Pipeline Types

Full Finetune Pipeline (RollFullFinetunePipeline)

Multi-LoRA Pipeline (RollMultiLoraPipeline)

Architecture

Scheduling Policy

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Contributors

Uh oh!

Languages

Full Finetune Pipeline (`RollFullFinetunePipeline`)

Multi-LoRA Pipeline (`RollMultiLoraPipeline`)