-
Notifications
You must be signed in to change notification settings - Fork 15
fix: align RL forward-backward loss #207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✱ Stainless preview buildsThis PR will update the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| - -1.2 | ||
| - -0.3 | ||
| items: | ||
| type: integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logprobs items type is integer instead of number
High Severity
The logprobs field in RL.SampleSequence declares its items as type: integer, but log probabilities are floating-point values. The examples right above (-0.5, -1.2, -0.3) confirm these are floats. This will cause validation errors or silent truncation when clients or servers handle actual logprob values. The type needs to be number, consistent with how RL.LossLogprobs defines its items elsewhere in this same diff.


Note
Medium Risk
Changes the public RL API contract and request/response schemas (notably forward-backward loss fields), which can break existing clients and requires coordinated backend/client updates.
Overview
Adds a new async RL sampling API:
POST /rl/training-sessions/{session_id}:sampleplusGET /rl/training-sessions/{session_id}/operations/sample/{operation_id}, with new request/response schemas for prompts, sampling parameters, and generated sequences with logprobs.Refactors RL forward-backward loss inputs in the OpenAPI spec by replacing
loss_fn/loss_fn_inputswith a structuredloss/loss_inputsmodel (RL.LossConfig,RL.LossType, and GRPO-specific inputs like advantages/logprobs), fixes token datatype to integer, and addsmetricstoRL.ForwardBackwardResultfor loss-specific reporting.Written by Cursor Bugbot for commit dfc5517. This will update automatically on new commits. Configure here.