fix: align RL forward-backward loss #207

sbassam · 2026-02-10T06:14:22Z

Note

Medium Risk
Changes the public RL API contract and request/response schemas (notably forward-backward loss fields), which can break existing clients and requires coordinated backend/client updates.

Overview
Adds a new async RL sampling API: POST /rl/training-sessions/{session_id}:sample plus GET /rl/training-sessions/{session_id}/operations/sample/{operation_id}, with new request/response schemas for prompts, sampling parameters, and generated sequences with logprobs.
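For reviewers who want to picture the client side of the async flow, here is a minimal polling sketch. Only the two paths come from this PR. The base URL, the `id` and `status` response fields, the terminal status values, and the injected `transport` callable are all hypothetical placeholders, not the actual schema.

```python
import time
from typing import Any, Callable, Dict, Optional

# ASSUMPTION: placeholder host; only the paths below come from the PR description.
BASE_URL = "https://api.example.invalid"

# ASSUMPTION: hypothetical terminal status values; the real enum lives in openapi.yaml.
TERMINAL_STATUSES = {"completed", "failed"}

# A transport takes (method, url, json_body) and returns the decoded JSON response.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def sample_url(session_id: str) -> str:
    """URL for POST /rl/training-sessions/{session_id}:sample."""
    return f"{BASE_URL}/rl/training-sessions/{session_id}:sample"


def operation_url(session_id: str, operation_id: str) -> str:
    """URL for GET /rl/training-sessions/{session_id}/operations/sample/{operation_id}."""
    return f"{BASE_URL}/rl/training-sessions/{session_id}/operations/sample/{operation_id}"


def sample_and_wait(
    transport: Transport,
    session_id: str,
    request_body: Dict[str, Any],
    poll_interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a sample operation, then poll it until it reaches a terminal status."""
    started = transport("POST", sample_url(session_id), request_body)
    operation_id = started["id"]  # ASSUMPTION: hypothetical field name
    for _ in range(max_polls):
        operation = transport("GET", operation_url(session_id, operation_id), None)
        if operation.get("status") in TERMINAL_STATUSES:  # ASSUMPTION: hypothetical field
            return operation
        sleep(poll_interval)
    raise TimeoutError(f"sample operation {operation_id} did not finish in time")
```

Passing the transport in keeps the sketch independent of any HTTP library and makes the polling loop easy to exercise with a fake.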

Refactors RL forward-backward loss inputs in the OpenAPI spec by replacing loss_fn/loss_fn_inputs with a structured loss/loss_inputs model (RL.LossConfig, RL.LossType, and GRPO-specific inputs like advantages/logprobs), fixes token datatype to integer, and adds metrics to RL.ForwardBackwardResult for loss-specific reporting.
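Because the rename can break existing clients, a small client-side check may help during migration. The sketch below uses only the field names mentioned in this description (`loss_fn`, `loss_fn_inputs`, `loss`, `loss_inputs`, `advantages`, `logprobs`). The nesting of the payload, the `"grpo"` wire value, the `type` key, and the equal-length check are assumptions for illustration and should be confirmed against openapi.yaml.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List

# Field names the PR description says were replaced.
LEGACY_FIELDS = ("loss_fn", "loss_fn_inputs")


class LossType(str, Enum):
    GRPO = "grpo"  # ASSUMPTION: hypothetical wire value


@dataclass
class GRPOLossInputs:
    advantages: List[float]
    logprobs: List[float]


def find_legacy_fields(body: Dict[str, Any]) -> List[str]:
    """Return any pre-refactor loss fields still present in a request body."""
    return [name for name in LEGACY_FIELDS if name in body]


def build_loss_fields(loss_type: LossType, inputs: GRPOLossInputs) -> Dict[str, Any]:
    """Build the loss part of a forward-backward request (assumed shape)."""
    # ASSUMPTION: advantages and logprobs are aligned one-to-one.
    if len(inputs.advantages) != len(inputs.logprobs):
        raise ValueError("advantages and logprobs must have the same length")
    return {
        "loss": {"type": loss_type.value},  # ASSUMPTION: hypothetical key layout
        "loss_inputs": {
            "advantages": list(inputs.advantages),
            "logprobs": list(inputs.logprobs),
        },
    }
```

Running `find_legacy_fields` over outgoing request bodies in a test suite is one cheap way to catch call sites that still send the old fields.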

Written by Cursor Bugbot for commit dfc5517. This will update automatically on new commits.

github-actions · 2026-02-10T06:14:48Z

✱ Stainless preview builds

This PR will update the togetherai SDKs with the following commit message.

fix: align RL forward-backward loss

⚠️ togetherai-typescript

There was a regression in your SDK.
generate ⚠️ → build ✅ → lint ✅ → test ✅
npm install https://pkg.stainless.com/s/togetherai-typescript/6df3f308bd1947723d5a844afc3bcc9d72898b45/dist.tar.gz

⚠️ togetherai-openapi

There was a regression in your SDK.
generate ⚠️

⚠️ togetherai-python

There was a regression in your SDK.
generate ⚠️ → build ✅ → lint ✅ → test ⏳
pip install https://pkg.stainless.com/s/togetherai-python/19bc82354c663ec5af52bf502b5f7ac09ce163ca/together-2.1.0-py3-none-any.whl

⚠️ togetherai-go

There was a regression in your SDK.
generate ⚠️ → lint ✅ → test ✅
go get github.com/stainless-sdks/togetherai-go@2b4aeaff6c421bc0b894e1bb54aa6d36a2af9684

⚠️ togetherai-terraform

There was a regression in your SDK.
generate ⚠️ → lint ✅ → test ✅

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-02-10 23:35:15 UTC

openapi.yaml

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

cursor · 2026-02-10T23:26:43Z

openapi.yaml

+            - -1.2
+            - -0.3
+          items:
+            type: integer


Logprobs items type is integer instead of number

High Severity

The logprobs field in RL.SampleSequence declares its items as type: integer, but log probabilities are floating-point values. The examples right above (-0.5, -1.2, -0.3) confirm these are floats. This will cause validation errors or silent truncation when clients or servers handle actual logprob values. The type needs to be number, consistent with how RL.LossLogprobs defines its items elsewhere in this same diff.
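To make the failure mode concrete, here is a small sketch of how the example values fare under each declared item type. The helper names are hypothetical, and the type check is a deliberately simplified stand-in for a real schema validator: it treats `integer` as a Python `int` only and ignores `format`, ranges, and other keywords.

```python
from typing import Any, List


def matches_schema_type(value: Any, schema_type: str) -> bool:
    """Simplified check of a value against an OpenAPI primitive type."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python but is not a JSON number.
        return False
    if schema_type == "integer":
        return isinstance(value, int)
    if schema_type == "number":
        return isinstance(value, (int, float))
    raise ValueError(f"unsupported schema type: {schema_type}")


def invalid_items(values: List[Any], item_type: str) -> List[Any]:
    """Return the array items that do not match the declared item type."""
    return [value for value in values if not matches_schema_type(value, item_type)]


# The example logprobs from the spec.
logprobs = [-0.5, -1.2, -0.3]

rejected_as_integer = invalid_items(logprobs, "integer")  # every example value is rejected
rejected_as_number = invalid_items(logprobs, "number")    # nothing is rejected
```

With `items: {type: integer}` every example value is rejected, while `type: number` accepts them all, which is why the spec's own examples cannot validate as written.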

fix: align RL forward-backward loss

de99e9d

sbassam requested review from blainekasten and khaykingleb February 10, 2026 06:14

cursor bot reviewed Feb 10, 2026

openapi.yaml (resolved)

blainekasten reviewed Feb 10, 2026

openapi.yaml (resolved)

add sampling

2c15da5

cursor bot reviewed Feb 10, 2026

openapi.yaml (resolved)

openapi.yaml (outdated, resolved)

blainekasten added 2 commits February 10, 2026 17:19

some fixes

84596d2

fixes

dfc5517

blainekasten approved these changes Feb 10, 2026

blainekasten merged commit eafdaa6 into main Feb 10, 2026
5 of 6 checks passed

cursor bot reviewed Feb 10, 2026