In agentic RL, long-horizon rollouts are increasingly long-tailed: a small number of stragglers dominate wall-clock time while many GPUs sit underutilized.
RLix addresses this by time-sharing GPUs across concurrent RL training jobs, expanding rollout workers onto temporarily idle capacity and shrinking them when that capacity is needed elsewhere. RLix does not change pipeline-level training semantics: each recipe retains its original behavior, whether on-policy or off-policy under staleness bounds, while delivering much higher GPU utilization.
RLix builds on Partial Overlapping scheduling from Alibaba/ROLL and extends it with a distributed control plane for coordinating multiple independent training jobs on a shared GPU cluster.
RLix is an AI-native project, with AI deeply involved across design, planning, implementation, testing, and code review, alongside human oversight. Correctness, code quality, and maintainability remain first-class concerns.
- Recipe-Transparent Scheduling: Training logic stays fully decoupled from GPU scheduling, so each pipeline can be developed in isolation.
- Two-Level GPU Sharing: GPUs are shared both across pipelines through elastic expand/shrink and within a pipeline through multi-LoRA adapters on a shared base model.
- Demand-Driven Rollout Scaling: Rollout workers expand onto idle GPU capacity and shrink based on heartbeat-reported demand.
- Efficient Memory Management: Model weights are cached on the trainer CPU and synced on demand only to resumed rollout workers; when workers shrink, inference weights are dropped to minimize memory footprint.
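
The memory-management behavior in the last bullet can be pictured with a toy model. The sketch below is illustrative only: `WeightCache`, `RolloutWorker`, and `shrink` are hypothetical names rather than RLix APIs, plain dicts stand in for tensors, and how still-active workers receive new weights is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RolloutWorker:
    """Hypothetical rollout worker; `weights` is None while the worker is shrunk."""
    worker_id: int
    weights: Optional[dict] = None
    version: int = -1


@dataclass
class WeightCache:
    """Hypothetical trainer-side CPU cache holding the latest policy weights."""
    weights: dict = field(default_factory=dict)
    version: int = 0

    def publish(self, new_weights: dict) -> None:
        # After a training step, keep one up-to-date copy on the trainer CPU.
        self.weights = dict(new_weights)
        self.version += 1

    def sync(self, worker: RolloutWorker) -> bool:
        # Copy weights only if the worker lacks them or holds a stale version.
        if worker.weights is not None and worker.version == self.version:
            return False
        worker.weights = dict(self.weights)
        worker.version = self.version
        return True


def shrink(worker: RolloutWorker) -> None:
    # Drop inference weights so the freed memory can serve another job.
    worker.weights = None
    worker.version = -1


cache = WeightCache()
cache.publish({"layer0": [0.1, 0.2]})
workers = [RolloutWorker(0), RolloutWorker(1)]

synced_first = [cache.sync(w) for w in workers]  # both workers expand and load weights
shrink(workers[1])                               # worker 1 yields its GPU
cache.publish({"layer0": [0.3, 0.4]})            # training continues meanwhile
synced_resume = cache.sync(workers[1])           # on resume, worker 1 pulls the latest copy
```

Keeping the authoritative copy on the trainer side means a shrunk worker holds no weights at all, and the cost of a transfer is paid only when a worker actually resumes.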
```bash
git clone https://github.com/rlops/rlix.git
cd rlix
pip install -e .
```

The example below shows a minimal control-plane setup for registering and running a pipeline under RLix.
```python
import ray
import rlix
from rlix.pipeline import PipelineCoordinator
from rlix.protocol.types import COORDINATOR_ACTOR_NAME_PREFIX

# Pipeline-specific configuration object
my_config = ...

# 1. Initialize the RLix control plane
orchestrator = rlix.init(create_if_missing=True)

# 2. Allocate a pipeline ID
pipeline_id = ray.get(orchestrator.allocate_pipeline_id.remote("ft"))

# 3. Register the pipeline's GPU topology
ray.get(
    orchestrator.register_pipeline.remote(
        pipeline_id=pipeline_id,
        ray_namespace=f"pipeline_{pipeline_id}_NS",
        cluster_tp_configs={"actor_train": 8, "actor_infer": 8},
        cluster_device_mappings={
            "actor_train": [0, 1, 2, 3, 4, 5, 6, 7],
            "actor_infer": [0, 1, 2, 3, 4, 5, 6, 7],
        },
    )
)

# 4. Admit the pipeline before GPU allocation
ray.get(orchestrator.admit_pipeline.remote(pipeline_id=pipeline_id))

# 5. Create the pipeline coordinator
CoordinatorActor = ray.remote(PipelineCoordinator)
coordinator = CoordinatorActor.options(
    name=f"{COORDINATOR_ACTOR_NAME_PREFIX}{pipeline_id}",
    namespace=f"pipeline_{pipeline_id}_NS",
).remote(pipeline_id=pipeline_id, pipeline_config=my_config)

# 6. Create and run the pipeline
pipeline_actor = ray.get(
    coordinator.create_pipeline_actor.remote(pipeline_config=my_config)
)
ray.get(pipeline_actor.run.remote())
```

See examples/ for complete multi-pipeline examples and full configuration options.
RLix currently supports two built-in pipeline types:
- Full Fine-Tuning: Full-parameter training with elastic GPU expand/shrink. Each job trains all model weights while yielding idle GPUs to other jobs.
- Multi-LoRA: Concurrent training of multiple LoRA adapters on a shared base model, with an isolated optimizer for each adapter. Jobs share the base model in GPU memory while keeping adapter weights and optimizer states fully independent.
Beyond these built-in options, RLix supports custom pipelines and integrations that follow the RLix control-plane protocol.
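
To make "shared base model, isolated optimizer" concrete, here is a small dependency-free sketch. It is not RLix code: `BASE_W`, `Adapter`, `forward`, and `train_step` are hypothetical names, the adapter is a rank-1 LoRA-style update on a 2x2 matrix, and SGD with momentum stands in for whatever optimizer a real recipe uses.

```python
from dataclasses import dataclass, field

# Frozen base weight shared by every adapter (2x2 for readability).
BASE_W = [[1.0, 0.0], [0.0, 1.0]]


@dataclass
class Adapter:
    """Rank-1 LoRA-style update b * a^T plus its private momentum buffers."""
    a: list
    b: list
    mom_a: list = field(default_factory=lambda: [0.0, 0.0])
    mom_b: list = field(default_factory=lambda: [0.0, 0.0])


def forward(adapter, x):
    # y = (W + b a^T) x = W x + b * (a . x)
    ax = sum(ai * xi for ai, xi in zip(adapter.a, x))
    return [sum(w * xi for w, xi in zip(row, x)) + bi * ax
            for row, bi in zip(BASE_W, adapter.b)]


def train_step(adapter, x, target, lr=0.1, beta=0.9):
    # Squared-error loss; gradients flow only into this adapter's a and b.
    err = [yi - ti for yi, ti in zip(forward(adapter, x), target)]
    ax = sum(ai * xi for ai, xi in zip(adapter.a, x))
    be = sum(bi * ei for bi, ei in zip(adapter.b, err))
    grad_b = [ei * ax for ei in err]
    grad_a = [be * xi for xi in x]
    # SGD with momentum, using buffers owned by this adapter alone.
    adapter.mom_a = [beta * m + g for m, g in zip(adapter.mom_a, grad_a)]
    adapter.mom_b = [beta * m + g for m, g in zip(adapter.mom_b, grad_b)]
    adapter.a = [p - lr * m for p, m in zip(adapter.a, adapter.mom_a)]
    adapter.b = [p - lr * m for p, m in zip(adapter.b, adapter.mom_b)]


adapters = {
    "job_a": Adapter(a=[0.5, 0.5], b=[0.0, 0.0]),
    "job_b": Adapter(a=[0.5, 0.5], b=[0.0, 0.0]),
}

# Only job_a takes a step; job_b and the base weights must stay untouched.
train_step(adapters["job_a"], x=[1.0, 2.0], target=[2.0, 2.0])
```

Because each adapter owns both its parameters and its optimizer buffers, a step for one job cannot perturb another job's state, while the base matrix is only ever read.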
RLix separates scheduling from per-pipeline training logic: a shared control plane coordinates multiple independent jobs, while rollout stages elastically consume residual GPU capacity.
```text
┌───────────────────────────────────────────────────────────┐
│                    RLix Control Plane                     │
├──────────────────┬──────────────────┬─────────────────────┤
│   Orchestrator   │    Scheduler     │  Resource Manager   │
│ (lifecycle mgmt) │ (priority +      │   (GPU topology)    │
│                  │ rollout preempt) │                     │
└────────┬─────────┴────────┬─────────┴─────────┬───────────┘
         │                  │                   │
    ┌────▼─────┐       ┌────▼─────┐        ┌────▼─────┐
    │FullFine- │       │Multi-LoRA│        │Custom /  │
    │tune Job 1│       │  Job 2   │        │External  │
    │          │       │          │        │  Job N   │
    └────┬─────┘       └────┬─────┘        └────┬─────┘
         │                  │                   │
    ┌────▼──────────────────▼───────────────────▼────┐
    │              Shared GPU Capacity               │
    │  [GPU 0] [GPU 1] [GPU 2] [GPU 3] ... [GPU N]   │
    └────────────────────────────────────────────────┘
```
The scheduler assigns GPUs in priority order, with lower values indicating higher priority. All stages except rollout are non-preemptable: once they acquire GPUs, they keep them until completion. Rollout (6) is the lowest-priority stage and is always preemptable, using only the capacity remaining after all higher-priority stages are satisfied. When multiple jobs roll out concurrently, the remaining GPUs are divided proportionally to each job’s outstanding rollout demand, subject to placement constraints.
- 0 Initialization: Model loading; must complete before scheduling begins.
- 1 Actor Training: Policy gradient update.
- 2 Critic Training: Value function update.
- 3 Old-Policy Log Probs: Log-probability computation under the previous policy.
- 4 Reference-Model Log Probs: Log-probability computation under the reference model.
- 5 Value Compute: Value estimation for advantage calculation.
- 6 Rollout: Trajectory sampling; always preemptable.
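
The priority rules above can be sketched as a stateless allocation over one scheduling snapshot. This is a simplified illustration, not the RLix scheduler: `Stage`, `allocate`, and the job names are hypothetical, placement constraints and tensor-parallel group sizes are ignored, non-rollout requests that do not fit are simply skipped, and the largest-remainder rounding used to get integer shares is an assumption.

```python
from enum import IntEnum


class Stage(IntEnum):
    INITIALIZATION = 0
    ACTOR_TRAINING = 1
    CRITIC_TRAINING = 2
    OLD_POLICY_LOG_PROBS = 3
    REFERENCE_LOG_PROBS = 4
    VALUE_COMPUTE = 5
    ROLLOUT = 6


def allocate(total_gpus, fixed_requests, rollout_demand):
    """Grant non-rollout requests in priority order, then split the remainder.

    fixed_requests: list of (job, Stage, gpus_needed) for non-preemptable stages.
    rollout_demand: dict mapping job -> outstanding rollout demand in GPUs.
    """
    free = total_gpus
    granted = {}
    # Lower Stage value means higher priority; ties keep submission order.
    for job, stage, need in sorted(fixed_requests, key=lambda r: r[1]):
        if need <= free:
            granted[(job, stage)] = need
            free -= need

    # Rollout only sees what is left, shared in proportion to demand.
    total_demand = sum(rollout_demand.values())
    budget = min(free, total_demand)
    shares, remainders = {}, {}
    if total_demand > 0:
        for job, demand in rollout_demand.items():
            shares[job], remainders[job] = divmod(budget * demand, total_demand)
        leftover = budget - sum(shares.values())
        # Largest-remainder rounding so the integer shares sum to the budget.
        for job in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
            shares[job] += 1
    return granted, shares


# 16 GPUs: job1 trains on 8, job2 and job3 compete for the other 8.
granted, shares = allocate(
    16,
    [("job1", Stage.ACTOR_TRAINING, 8)],
    {"job2": 6, "job3": 3},
)

# A new value-compute request shrinks the capacity left for rollout.
granted2, shares2 = allocate(
    16,
    [("job3", Stage.VALUE_COMPUTE, 4), ("job1", Stage.ACTOR_TRAINING, 8)],
    {"job2": 6},
)
```

Recomputing rollout shares from the leftover capacity on every snapshot is what makes rollout implicitly preemptable here: any newly granted higher-priority stage reduces the budget that rollout workers can occupy.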