@sbassam (Contributor) commented Feb 10, 2026

Note

Medium Risk
Changes the public RL API contract and request/response schemas (notably forward-backward loss fields), which can break existing clients and requires coordinated backend/client updates.

Overview
Adds a new async RL sampling API: POST /rl/training-sessions/{session_id}:sample plus GET /rl/training-sessions/{session_id}/operations/sample/{operation_id}, with new request/response schemas for prompts, sampling parameters, and generated sequences with logprobs.
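The submit-then-poll flow described above could be driven from a client roughly as follows. This is a minimal sketch, not the SDK's implementation: the two paths come from the overview, but the `id` and `status` response fields, the terminal status values, and the `post`/`get` helper callables are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict


def sample_path(session_id: str) -> str:
    # Path taken from the PR overview (POST starts a sample operation).
    return f"/rl/training-sessions/{session_id}:sample"


def sample_operation_path(session_id: str, operation_id: str) -> str:
    # Path taken from the PR overview (GET fetches the operation state).
    return f"/rl/training-sessions/{session_id}/operations/sample/{operation_id}"


def run_sample(
    session_id: str,
    body: Dict[str, Any],
    post: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    get: Callable[[str], Dict[str, Any]],
    poll_interval: float = 1.0,
    max_polls: int = 60,
) -> Dict[str, Any]:
    """Start a sample operation, then poll until it reaches a terminal state.

    `post` and `get` are caller-supplied HTTP helpers (hypothetical), so the
    sketch stays independent of any particular HTTP library.
    """
    operation = post(sample_path(session_id), body)
    operation_id = operation["id"]  # ASSUMPTION: field name is not in the PR text
    for _ in range(max_polls):
        result = get(sample_operation_path(session_id, operation_id))
        # ASSUMPTION: "status" field and its terminal values are illustrative only
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(poll_interval)
    raise TimeoutError(f"sample operation {operation_id} did not finish")
```

Passing the HTTP helpers in as arguments keeps the polling logic testable without a live backend; a real client would wrap its own authenticated session in `post` and `get`.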

Refactors the RL forward-backward loss inputs in the OpenAPI spec: loss_fn/loss_fn_inputs are replaced with a structured loss/loss_inputs model (RL.LossConfig, RL.LossType, and GRPO-specific inputs such as advantages and logprobs). Also fixes the token datatype to integer and adds metrics to RL.ForwardBackwardResult for loss-specific reporting.
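Because the rename is a breaking change, existing clients may want a quick check for request bodies that still use the old field names. The sketch below assumes the four names are top-level keys of the forward-backward request body. The PR text does not confirm that, and the helper name is hypothetical.

```python
from typing import Any, Dict, List

# Old field name -> new field name, as described in the PR overview.
LEGACY_TO_NEW = {"loss_fn": "loss", "loss_fn_inputs": "loss_inputs"}


def legacy_loss_fields(payload: Dict[str, Any]) -> List[str]:
    """Return the legacy loss field names still present in a request body.

    ASSUMPTION: the fields sit at the top level of the payload.
    """
    return [old for old in LEGACY_TO_NEW if old in payload]
```

An empty result means the payload already uses the new names. Mapping the old values onto the new structure is left out, since the shape of RL.LossConfig is not shown here.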

Written by Cursor Bugbot for commit dfc5517. This will update automatically on new commits.

github-actions bot commented Feb 10, 2026

✱ Stainless preview builds

This PR will update the togetherai SDKs with the following commit message.

fix: align RL forward-backward loss
⚠️ togetherai-typescript

There was a regression in your SDK.
generate ⚠️ | build ✅ | lint ✅ | test ✅

npm install https://pkg.stainless.com/s/togetherai-typescript/6df3f308bd1947723d5a844afc3bcc9d72898b45/dist.tar.gz

⚠️ togetherai-openapi

There was a regression in your SDK.
generate ⚠️

⚠️ togetherai-python

There was a regression in your SDK.
generate ⚠️ | build ✅ | lint ✅ | test ⏳

pip install https://pkg.stainless.com/s/togetherai-python/19bc82354c663ec5af52bf502b5f7ac09ce163ca/together-2.1.0-py3-none-any.whl

⚠️ togetherai-go

There was a regression in your SDK.
generate ⚠️ | lint ✅ | test ✅

go get github.com/stainless-sdks/togetherai-go@2b4aeaff6c421bc0b894e1bb54aa6d36a2af9684

⚠️ togetherai-terraform

There was a regression in your SDK.
generate ⚠️ | lint ✅ | test ✅


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-02-10 23:35:15 UTC

@blainekasten merged commit eafdaa6 into main on Feb 10, 2026
5 of 6 checks passed

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

  - -1.2
  - -0.3
items:
  type: integer

Logprobs items type is integer instead of number

High Severity

The logprobs field in RL.SampleSequence declares its items as type: integer, but log probabilities are floating-point values. The examples right above (-0.5, -1.2, -0.3) confirm these are floats. This will cause validation errors or silent truncation when clients or servers handle actual logprob values. The type needs to be number, consistent with how RL.LossLogprobs defines its items elsewhere in this same diff.
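To make the failure mode concrete, here is a small sketch of how the example values fare under each declared type. The `matches_json_type` helper is a hypothetical, simplified stand-in for a schema validator's `type` check, not the validator used by the backend or the SDKs.

```python
def matches_json_type(value, json_type: str) -> bool:
    """Simplified JSON-Schema-style check for "integer" and "number"."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but it is not a JSON number.
        return False
    if json_type == "number":
        return isinstance(value, (int, float))
    if json_type == "integer":
        return isinstance(value, int) or (
            isinstance(value, float) and value.is_integer()
        )
    raise ValueError(f"unsupported type: {json_type}")


logprobs = [-0.5, -1.2, -0.3]  # example values cited in the review comment

# Under `type: integer`, every example value is rejected.
rejected_as_integer = [x for x in logprobs if not matches_json_type(x, "integer")]

# Under `type: number`, every example value is accepted.
accepted_as_number = [x for x in logprobs if matches_json_type(x, "number")]

# A client that coerces instead of validating silently loses the fraction:
# int() truncates toward zero.
truncated = [int(x) for x in logprobs]
```

With these inputs, `rejected_as_integer` contains all three values, `accepted_as_number` contains all three values, and `truncated` is `[0, -1, 0]`, which is the silent data loss the comment warns about.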
