Skip to content

Constraint failure reward override#865

Merged
podkidyshev merged 6 commits intoNVIDIA:mainfrom
alexmanle:constraint_reward_override
Apr 13, 2026
Merged

Constraint failure reward override#865
podkidyshev merged 6 commits intoNVIDIA:mainfrom
alexmanle:constraint_reward_override

Conversation

@alexmanle
Copy link
Copy Markdown
Contributor

Summary

Adds an agent config flag to override the default -1.0 reward.

Example TOML Usage

[agent_config]
constraint_reward_override = 0.01

AIConfigurator Example

To induce a constraint failure, a "dummy" constraint was added.
Default behavior (no override):

[INFO] Running step 3 (of 4) with action {'disagg.p_tp': 2, 'disagg.p_pp': 1, 'disagg.p_dp': 1, 'disagg.p_workers': 8, 'disagg.d_tp': 1, 'disagg.d_pp': 1, 'disagg.d_dp': 1, 'disagg.d_bs': 32, 'disagg.d_workers': 16}
[INFO] Constraint check failed. Skipping step.
[INFO] Step 3: Observation: [-1.0], Reward: -1.0000

With override applied:

[INFO] Running step 3 (of 4) with action {'disagg.p_tp': 2, 'disagg.p_pp': 1, 'disagg.p_dp': 1, 'disagg.p_workers': 8, 'disagg.d_tp': 1, 'disagg.d_pp': 1, 'disagg.d_dp': 1, 'disagg.d_bs': 32, 'disagg.d_workers': 16}
[INFO] Constraint check failed. Skipping step.
[INFO] Step 3: Observation: [-1.0], Reward: 0.0100

Test Plan

Added constraint failure test.

uv run python3 -m pytest tests/test_cloudaigym.py::test_constraint_failure -v

This checks that when no override is given the default reward = -1.0.
When a custom override (-2.5) is set, the returned reward = -2.5

tests/test_cloudaigym.py::test_constraint_failure[default_penalty] PASSED                                                                                                                                                                                                                           [ 50%]
tests/test_cloudaigym.py::test_constraint_failure[custom_penalty] PASSED 

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 34da728c-3157-4f33-9e32-b75e4ff06fbe

📥 Commits

Reviewing files that changed from the base of the PR and between b13da3d and d047457.

📒 Files selected for processing (1)
  • src/cloudai/configurator/base_gym.py

📝 Walkthrough

Walkthrough

Added a constraint-reward override across config, env, and handler layers: BaseAgentConfig.constraint_reward_override added; BaseGym.step and CloudAIGymEnv.step signatures accept constraint_check_reward; DSE job handler conditionally forwards the override into env.step() when not -1.0.

Changes

Cohort / File(s) Summary
Configuration
src/cloudai/configurator/base_agent.py
Added public field constraint_reward_override: float = -1.0 to agent config.
Gym API
src/cloudai/configurator/base_gym.py, src/cloudai/configurator/cloudai_gym.py
Updated BaseGym.step signature to accept constraint_check_reward; updated CloudAIGymEnv.step to accept constraint_check_reward: float = -1.0 and return that value on constraint-check failure instead of a fixed -1.0 reward.
Handler Logic
src/cloudai/cli/handlers.py
handle_dse_job now conditionally forwards agent_config.constraint_reward_override into env.step(action, ...) when value != -1.0; otherwise calls env.step(action) with the original signature.
Tests
tests/test_cloudaigym.py
Added parametrized test_constraint_failure to assert observation, done, info, and reward behavior for default and overridden constraint-check rewards.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble code paths, soft and spry,
A tiny override hops by—oh my!
Minus one nests as the default tune,
Swap the number and the step sings soon.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title directly and clearly summarizes the main change: adding the ability to override the constraint failure reward penalty.
Description check ✅ Passed The pull request description is well-related to the changeset, providing a clear summary, TOML usage example, demonstration of behavior, and test plan details.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 104-111: Update the abstract BaseGym.step signature to match
CloudAIGymEnv.step by adding the constraint_check_reward: float = -1.0 parameter
and keeping the same return type and typing (Tuple[list, float, bool, dict]);
modify the BaseGym.step method declaration and its docstring to include and
document constraint_check_reward so subclasses satisfy the contract and callers
(e.g., handlers using this arg) remain type-correct—look for the BaseGym class
and its step method and change the signature from step(self, action: Any) ->
Tuple[...] to step(self, action: Any, constraint_check_reward: float = -1.0) ->
Tuple[...].
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e9050d73-b71f-4dc8-ac8d-fb1b5ee1558c

📥 Commits

Reviewing files that changed from the base of the PR and between 57f4a4f and 4f32272.

📒 Files selected for processing (4)
  • src/cloudai/cli/handlers.py
  • src/cloudai/configurator/base_agent.py
  • src/cloudai/configurator/cloudai_gym.py
  • tests/test_cloudaigym.py

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cloudai/configurator/base_gym.py`:
- Line 70: The abstract method BaseGym.step currently requires
constraint_check_reward with no default while CloudAIGymEnv.step defines
constraint_check_reward: float = -1.0 and call sites sometimes call
env.step(action) — update the BaseGym.step signature to provide the same default
(e.g., constraint_check_reward: float = -1.0) so the abstract contract matches
concrete CloudAIGymEnv.step and existing call sites; adjust any type hints or
docstrings referencing BaseGym.step to reflect the default as well.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3015b63c-5872-4715-999c-5d1d9634aafa

📥 Commits

Reviewing files that changed from the base of the PR and between 4f32272 and b13da3d.

📒 Files selected for processing (2)
  • src/cloudai/configurator/base_gym.py
  • tests/test_cloudaigym.py

Copy link
Copy Markdown
Contributor

@srivatsankrishnan srivatsankrishnan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks Alex for this.

@srivatsankrishnan
Copy link
Copy Markdown
Contributor

@podkidyshev can you review this as well?

@podkidyshev
Copy link
Copy Markdown
Contributor

Lemme know when merge

@podkidyshev podkidyshev merged commit 0442ec9 into NVIDIA:main Apr 13, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants