Constraint failure reward override by alexmanle · Pull Request #865 · NVIDIA/cloudai

alexmanle · 2026-04-09T18:13:22Z

Summary

Adds an agent config flag, constraint_reward_override, that overrides the default -1.0 reward returned when a constraint check fails.

Example TOML Usage

[agent_config]
constraint_reward_override = 0.01

AIConfigurator Example

To induce a constraint failure, a "dummy" constraint was added.
Default behavior (no override):

[INFO] Running step 3 (of 4) with action {'disagg.p_tp': 2, 'disagg.p_pp': 1, 'disagg.p_dp': 1, 'disagg.p_workers': 8, 'disagg.d_tp': 1, 'disagg.d_pp': 1, 'disagg.d_dp': 1, 'disagg.d_bs': 32, 'disagg.d_workers': 16}
[INFO] Constraint check failed. Skipping step.
[INFO] Step 3: Observation: [-1.0], Reward: -1.0000

With override applied:

[INFO] Running step 3 (of 4) with action {'disagg.p_tp': 2, 'disagg.p_pp': 1, 'disagg.p_dp': 1, 'disagg.p_workers': 8, 'disagg.d_tp': 1, 'disagg.d_pp': 1, 'disagg.d_dp': 1, 'disagg.d_bs': 32, 'disagg.d_workers': 16}
[INFO] Constraint check failed. Skipping step.
[INFO] Step 3: Observation: [-1.0], Reward: 0.0100

Test Plan

Added constraint failure test.

uv run python3 -m pytest tests/test_cloudaigym.py::test_constraint_failure -v

This checks that the reward is -1.0 when no override is given.
When a custom override (-2.5) is set, the returned reward is -2.5.

tests/test_cloudaigym.py::test_constraint_failure[default_penalty] PASSED [ 50%]
tests/test_cloudaigym.py::test_constraint_failure[custom_penalty] PASSED
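
The two cases above can be illustrated with a small, self-contained sketch. It is not the test added in this PR: `AlwaysFailingEnv` is a hypothetical stub whose constraint check always fails, the `done` and `info` values it returns are placeholders, and the cases run in a plain loop so the example works without pytest.

```python
from typing import Any, Dict, Optional, Tuple


class AlwaysFailingEnv:
    """Hypothetical stub: every action fails the constraint check."""

    def step(
        self, action: Any, constraint_check_reward: float = -1.0
    ) -> Tuple[list, float, bool, dict]:
        # Observation [-1.0] mirrors the log output in the PR description.
        # The done flag and info dict are placeholders (assumptions).
        return [-1.0], constraint_check_reward, False, {}


# Case name -> (override passed to step, expected reward).
CASES: Dict[str, Tuple[Optional[float], float]] = {
    "default_penalty": (None, -1.0),
    "custom_penalty": (-2.5, -2.5),
}


def run_case(override: Optional[float]) -> float:
    """Return the reward for one constraint-failing step."""
    env = AlwaysFailingEnv()
    action = {"disagg.p_tp": 2}
    if override is None:
        _, reward, _, _ = env.step(action)
    else:
        _, reward, _, _ = env.step(action, constraint_check_reward=override)
    return reward


results = {name: run_case(override) for name, (override, _) in CASES.items()}
for name, (_, expected) in CASES.items():
    assert results[name] == expected, name
print(results)
```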

coderabbitai · 2026-04-09T18:13:38Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 34da728c-3157-4f33-9e32-b75e4ff06fbe

📥 Commits

Reviewing files that changed from the base of the PR and between b13da3d and d047457.

📒 Files selected for processing (1)

src/cloudai/configurator/base_gym.py

📝 Walkthrough

Added a constraint-reward override across the config, env, and handler layers: BaseAgentConfig.constraint_reward_override was added; the BaseGym.step and CloudAIGymEnv.step signatures accept constraint_check_reward; and the DSE job handler forwards the override into env.step() only when it differs from -1.0.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration: `src/cloudai/configurator/base_agent.py` | Added public field `constraint_reward_override: float = -1.0` to agent config. |
| Gym API: `src/cloudai/configurator/base_gym.py`, `src/cloudai/configurator/cloudai_gym.py` | Updated `BaseGym.step` signature to accept `constraint_check_reward`; updated `CloudAIGymEnv.step` to accept `constraint_check_reward: float = -1.0` and return that value on constraint-check failure instead of a fixed `-1.0` reward. |
| Handler Logic: `src/cloudai/cli/handlers.py` | `handle_dse_job` now conditionally forwards `agent_config.constraint_reward_override` into `env.step(action, ...)` when value != `-1.0`; otherwise calls `env.step(action)` with the original signature. |
| Tests: `tests/test_cloudaigym.py` | Added parametrized `test_constraint_failure` to assert observation, done, info, and reward behavior for default and overridden constraint-check rewards. |
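
The conditional forwarding described for the handler can be sketched roughly as follows. This is an illustration under assumptions, not the code in `handlers.py`: `RecordingEnv` and `step_with_override` are hypothetical names, and passing the override as a keyword argument is a guess, since the summary only says it is forwarded into `env.step(action, ...)`.

```python
from typing import Any, Dict, List, Tuple

DEFAULT_CONSTRAINT_REWARD = -1.0  # default penalty named in this PR


class RecordingEnv:
    """Hypothetical env that only records how step() was called."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Tuple[Any, ...], Dict[str, Any]]] = []

    def step(self, *args: Any, **kwargs: Any) -> None:
        self.calls.append((args, kwargs))


def step_with_override(
    env: RecordingEnv, action: Any, constraint_reward_override: float
) -> None:
    """Forward the override only when it differs from the default."""
    if constraint_reward_override != DEFAULT_CONSTRAINT_REWARD:
        # Keyword passing is an assumption made for this sketch.
        env.step(action, constraint_check_reward=constraint_reward_override)
    else:
        # Unchanged call shape, so envs without the new parameter still work.
        env.step(action)


env = RecordingEnv()
step_with_override(env, {"x": 1}, DEFAULT_CONSTRAINT_REWARD)
step_with_override(env, {"x": 1}, 0.01)
print(env.calls)
```

Keeping the one-argument call for the default case means an environment that has not adopted the new parameter is only affected when a user sets an override.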

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble code paths, soft and spry,
A tiny override hops by—oh my!
Minus one nests as the default tune,
Swap the number and the step sings soon.
✨

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The pull request title directly and clearly summarizes the main change: adding the ability to override the constraint failure reward penalty. |
| Description check | ✅ Passed | The pull request description is well-related to the changeset, providing a clear summary, TOML usage example, demonstration of behavior, and test plan details. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 104-111: Update the abstract BaseGym.step signature to match
CloudAIGymEnv.step by adding the constraint_check_reward: float = -1.0 parameter
and keeping the same return type and typing (Tuple[list, float, bool, dict]);
modify the BaseGym.step method declaration and its docstring to include and
document constraint_check_reward so subclasses satisfy the contract and callers
(e.g., handlers using this arg) remain type-correct—look for the BaseGym class
and its step method and change the signature from step(self, action: Any) ->
Tuple[...] to step(self, action: Any, constraint_check_reward: float = -1.0) ->
Tuple[...].


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e9050d73-b71f-4dc8-ac8d-fb1b5ee1558c

📥 Commits

Reviewing files that changed from the base of the PR and between 57f4a4f and 4f32272.

📒 Files selected for processing (4)

src/cloudai/cli/handlers.py
src/cloudai/configurator/base_agent.py
src/cloudai/configurator/cloudai_gym.py
tests/test_cloudaigym.py

src/cloudai/configurator/cloudai_gym.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cloudai/configurator/base_gym.py`:
- Line 70: The abstract method BaseGym.step currently requires
constraint_check_reward with no default while CloudAIGymEnv.step defines
constraint_check_reward: float = -1.0 and call sites sometimes call
env.step(action) — update the BaseGym.step signature to provide the same default
(e.g., constraint_check_reward: float = -1.0) so the abstract contract matches
concrete CloudAIGymEnv.step and existing call sites; adjust any type hints or
docstrings referencing BaseGym.step to reflect the default as well.
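
The mismatch this comment describes can be checked mechanically. Below is a minimal sketch using the standard `inspect` module. The three classes are hypothetical simplifications of the signatures quoted in the comment, not the cloudai classes, and `defaults_match` is a helper invented for this illustration.

```python
import inspect
from abc import ABC, abstractmethod
from typing import Any, Tuple

StepResult = Tuple[list, float, bool, dict]


class BaseGymBeforeFix(ABC):
    """Hypothetical base where the new parameter has no default."""

    @abstractmethod
    def step(self, action: Any, constraint_check_reward: float) -> StepResult:
        """Run one step."""


class BaseGymAfterFix(ABC):
    """Hypothetical base with the same default as the concrete env."""

    @abstractmethod
    def step(
        self, action: Any, constraint_check_reward: float = -1.0
    ) -> StepResult:
        """Run one step."""


class EnvSketch(BaseGymAfterFix):
    """Hypothetical concrete env; return values are placeholders."""

    def step(
        self, action: Any, constraint_check_reward: float = -1.0
    ) -> StepResult:
        return [-1.0], constraint_check_reward, False, {}


def defaults_match(base: type, impl: type, method: str, param: str) -> bool:
    """Compare the default of one parameter across two classes."""
    base_param = inspect.signature(getattr(base, method)).parameters[param]
    impl_param = inspect.signature(getattr(impl, method)).parameters[param]
    # A parameter without a default reports inspect.Parameter.empty.
    return base_param.default == impl_param.default


before = defaults_match(BaseGymBeforeFix, EnvSketch, "step", "constraint_check_reward")
after = defaults_match(BaseGymAfterFix, EnvSketch, "step", "constraint_check_reward")
print(before, after)  # False True
```

With the default present on the abstract method, a caller typed against the base class can write `env.step(action)` without a type checker flagging a missing argument.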


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3015b63c-5872-4715-999c-5d1d9634aafa

📥 Commits

Reviewing files that changed from the base of the PR and between 4f32272 and b13da3d.

📒 Files selected for processing (2)

src/cloudai/configurator/base_gym.py
tests/test_cloudaigym.py

src/cloudai/configurator/base_gym.py

srivatsankrishnan

Looks good. Thanks Alex for this.

srivatsankrishnan · 2026-04-09T19:35:40Z

@podkidyshev can you review this as well?

podkidyshev · 2026-04-10T10:58:01Z

Lemme know when merge

alexmanle added 2 commits April 9, 2026 11:01

add constraint reward failure override flag

5f8a183

add constraint failure tests

4f32272

alexmanle requested review from jeffnvidia, podkidyshev and srivatsankrishnan as code owners April 9, 2026 18:13

coderabbitai bot reviewed Apr 9, 2026

src/cloudai/configurator/cloudai_gym.py (resolved)

alexmanle added 2 commits April 9, 2026 11:29

update base class abstract method

4f2cf70

fix linting

b13da3d

coderabbitai bot reviewed Apr 9, 2026

src/cloudai/configurator/base_gym.py (outdated, resolved)

alexmanle added 2 commits April 9, 2026 11:36

fix copyright

aa21713

fix base class

d047457

srivatsankrishnan approved these changes Apr 9, 2026

podkidyshev approved these changes Apr 10, 2026

podkidyshev merged commit 0442ec9 into NVIDIA:main Apr 13, 2026
4 checks passed

podkidyshev merged 6 commits into NVIDIA:main from alexmanle:constraint_reward_override

3 participants
