diff --git a/.agents/skills/sdk-ci-triage/SKILL.md b/.agents/skills/sdk-ci-triage/SKILL.md
index ddb2f270..b1b57953 100644
--- a/.agents/skills/sdk-ci-triage/SKILL.md
+++ b/.agents/skills/sdk-ci-triage/SKILL.md
@@ -34,13 +34,14 @@ Read when relevant:
 
 - `lint`: pre-commit diff-based checks
 - `ensure-pinned-actions`: workflow hygiene
+- `static_checks`: Ubuntu-only Python matrix for `pylint` and `test_types`
 - `smoke`: install/import matrix across Python and OS
 - `nox`: provider and core test matrix, sharded through `py/scripts/nox-matrix.py`
 - `adk-py`: reusable workflow for ADK coverage
 - `langchain-py`: reusable workflow for LangChain coverage
 - `upload-wheel`: build wheel sanity check
 
-The most common failure source is the `nox` matrix job.
+The most common failure source is still the `nox` matrix job, but `pylint` and `test_types` failures now surface through `static_checks`, not through `nox`.
 
 ## Standard Workflow
 
@@ -48,15 +49,17 @@ The most common failure source is the `nox` matrix job.
 2. Inspect the failing job logs with `gh`.
 3. Determine which workflow branch failed:
    - `lint`
+   - `static_checks`
    - `smoke`
    - `nox`
    - reusable workflow (`adk-py`, `langchain-py`)
    - `upload-wheel`
 4. For `nox` failures, map the matrix job to the exact nox session and pinned provider version from the logs.
-5. Reproduce the narrowest failing command locally.
-6. Fix the bug.
-7. Re-run the narrowest failing command first.
-8. Expand only if shared code changed.
+5. For `static_checks` failures, identify whether `pylint` or `test_types` failed under the reported Python version.
+6. Reproduce the narrowest failing command locally.
+7. Fix the bug.
+8. Re-run the narrowest failing command first.
+9. Expand only if shared code changed.
 
 Do not start by running the whole suite locally unless the failure genuinely spans many sessions.
 
@@ -94,25 +97,29 @@ gh api repos/braintrustdata/braintrust-sdk-python/actions/jobs//logs
 Job names look like this:
 
 ```text
-nox (3.10, ubuntu-latest, 0)
+nox (3.10, ubuntu-24.04, 0)
 ```
 
 That means:
 
 - Python `3.10`
-- OS `ubuntu-latest`
+- OS `ubuntu-24.04`
 - shard `0` out of 4
 
 The workflow runs:
 
 ```bash
-mise exec python@ -- python ./py/scripts/nox-matrix.py 4
+mise exec python@ -- python ./py/scripts/nox-matrix.py 4 \
+  --exclude-session pylint \
+  --exclude-session test_types
 ```
 
 Use a dry run first to see which sessions belong to the shard:
 
 ```bash
-mise exec python@3.10 -- python ./py/scripts/nox-matrix.py 0 4 --dry-run
+mise exec python@3.10 -- python ./py/scripts/nox-matrix.py 0 4 --dry-run \
+  --exclude-session pylint \
+  --exclude-session test_types
 ```
 
 Then inspect the failing logs to find the exact session name, for example:
@@ -161,6 +168,23 @@ make lint
 make pylint
 ```
 
+### `static_checks`
+
+The `static_checks` job is an Ubuntu-only Python matrix that runs `pylint` and `test_types` together for each configured Python version.
+
+Local equivalents:
+
+```bash
+mise exec python@3.10 -- nox -f ./py/noxfile.py -s pylint test_types
+```
+
+If only one of the two sessions failed in CI, narrow locally to that specific session:
+
+```bash
+mise exec python@3.10 -- nox -f ./py/noxfile.py -s pylint
+mise exec python@3.10 -- nox -f ./py/noxfile.py -s test_types
+```
+
 ### `smoke`
 
 The smoke job validates install + import across OS and Python versions.
@@ -276,7 +300,9 @@ Preferred progression:
 
 ```bash
 # 1. Inspect the failing shard
-mise exec python@3.10 -- python ./py/scripts/nox-matrix.py 0 4 --dry-run
+mise exec python@3.10 -- python ./py/scripts/nox-matrix.py 0 4 --dry-run \
+  --exclude-session pylint \
+  --exclude-session test_types
 
 # 2. Reproduce the exact session
 cd py
@@ -299,7 +325,7 @@ When answering a CI-triage question, report:
 Good example structure:
 
 ```text
-The failing job is `nox (3.10, ubuntu-latest, 0)`.
+The failing job is `nox (3.10, ubuntu-24.04, 0)`.
 Within that shard, the failing session is `test_google_genai(1.30.0)`.
 The root cause is that the tests import a symbol that does not exist in google-genai 1.30.0, even though it exists in newer versions.
 You can reproduce it locally with `cd py && nox -s "test_google_genai(1.30.0)"`.
@@ -311,6 +337,7 @@ The fix is to gate the behavior for older versions or stop assuming the newer AP
 Avoid these common mistakes:
 
 - guessing the session from the provider name without checking `py/noxfile.py`
+- forgetting that CI excludes `pylint` and `test_types` from the sharded `nox` job
 - reproducing with `latest` when CI failed on an older pinned version
 - running from repo root when the real SDK command belongs in `py/`
 - fixing the symptom in tests without understanding the provider-version contract
diff --git a/.github/workflows/adk-py-test.yaml b/.github/workflows/adk-py-test.yaml
index adbfa294..85a4d832 100644
--- a/.github/workflows/adk-py-test.yaml
+++ b/.github/workflows/adk-py-test.yaml
@@ -9,7 +9,7 @@ on:
 
 jobs:
   test:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
     timeout-minutes: 15
 
     steps:
diff --git a/.github/workflows/checks.yaml b/.github/workflows/checks.yaml
index 4d4b1812..0f373e77 100644
--- a/.github/workflows/checks.yaml
+++ b/.github/workflows/checks.yaml
@@ -10,7 +10,7 @@ permissions:
 
 jobs:
   lint:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
     timeout-minutes: 10
     steps:
       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
@@ -26,13 +26,31 @@ jobs:
           mise exec -- pre-commit run --from-ref origin/${{ github.base_ref || 'main' }} --to-ref HEAD
 
   ensure-pinned-actions:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
     timeout-minutes: 5
     steps:
       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
       - name: Ensure SHA pinned actions
         uses: zgosalvez/github-actions-ensure-sha-pinned-actions@70c4af2ed5282c51ba40566d026d6647852ffa3e # v5.0.1
 
+  static_checks:
+    runs-on: ubuntu-24.04
+    timeout-minutes: 20
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+      - name: Setup Python environment
+        uses: ./.github/actions/setup-python-env
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Run pylint and type tests
+        shell: bash
+        run: |
+          mise exec python@${{ matrix.python-version }} -- nox -f ./py/noxfile.py -s pylint test_types
+
   smoke:
     runs-on: ${{ matrix.os }}
     timeout-minutes: 20
@@ -41,7 +59,7 @@ jobs:
       fail-fast: false
       matrix:
         python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
-        os: [ubuntu-latest, windows-latest]
+        os: [ubuntu-24.04, windows-2025]
 
     steps:
       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
@@ -66,7 +84,7 @@ jobs:
       fail-fast: false
       matrix:
         python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
-        os: [ubuntu-latest, windows-latest]
+        os: [ubuntu-24.04, windows-2025]
         shard: [0, 1, 2, 3]
 
     steps:
@@ -78,7 +96,9 @@ jobs:
       - name: Run nox tests (shard ${{ matrix.shard }}/4)
         shell: bash
         run: |
-          mise exec python@${{ matrix.python-version }} -- python ./py/scripts/nox-matrix.py ${{ matrix.shard }} 4
+          mise exec python@${{ matrix.python-version }} -- python ./py/scripts/nox-matrix.py ${{ matrix.shard }} 4 \
+            --exclude-session pylint \
+            --exclude-session test_types
 
   adk-py:
     uses: ./.github/workflows/adk-py-test.yaml
@@ -90,7 +110,7 @@ jobs:
     needs:
       - smoke
       - nox
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
     timeout-minutes: 10
     steps:
       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
@@ -114,12 +134,13 @@ jobs:
     needs:
       - lint
       - ensure-pinned-actions
+      - static_checks
       - smoke
       - nox
       - adk-py
       - langchain-py
       - upload-wheel
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
     timeout-minutes: 5
     if: always()
     steps:
@@ -138,12 +159,13 @@ jobs:
           }
 
           check_result "lint" "${{ needs.lint.result }}"
-          check_result "ensure-pinned-actions" "${{ needs.ensure-pinned-actions.result }}"
+          check_result "ensure-pinned-actions" "${{ needs['ensure-pinned-actions'].result }}"
+          check_result "static_checks" "${{ needs.static_checks.result }}"
           check_result "smoke" "${{ needs.smoke.result }}"
           check_result "nox" "${{ needs.nox.result }}"
-          check_result "adk-py" "${{ needs.adk-py.result }}"
-          check_result "langchain-py" "${{ needs.langchain-py.result }}"
-          check_result "upload-wheel" "${{ needs.upload-wheel.result }}"
+          check_result "adk-py" "${{ needs['adk-py'].result }}"
+          check_result "langchain-py" "${{ needs['langchain-py'].result }}"
+          check_result "upload-wheel" "${{ needs['upload-wheel'].result }}"
 
           if [ "$FAILED" -ne 0 ]; then
             echo "One or more required checks failed"
diff --git a/.github/workflows/langchain-py-test.yaml b/.github/workflows/langchain-py-test.yaml
index c49495f6..e53f1342 100644
--- a/.github/workflows/langchain-py-test.yaml
+++ b/.github/workflows/langchain-py-test.yaml
@@ -5,7 +5,7 @@ on:
 
 jobs:
   test:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
     timeout-minutes: 15
 
     steps:
diff --git a/AGENTS.md b/AGENTS.md
index 6da9e3a1..ca3b2a72 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -15,7 +15,8 @@ Use this file as the default playbook for work in this repository.
 2. **Use `mise` as the source of truth for tools and environment.**
 
 3. **Do not guess test commands or version coverage.**
-   - `py/noxfile.py` is the source of truth for nox session names, provider/version matrices, and CI coverage.
+   - `py/noxfile.py` is the source of truth for nox session names, provider/version matrices, and local reproduction commands.
+   - `.github/workflows/checks.yaml` is the source of truth for which sessions run in CI, on which Python versions, and outside vs. inside the nox shard matrix.
    - For provider and integration work, also check `py/src/braintrust/integrations/versioning.py`.
 
 4. **Keep changes narrow and validate with the smallest relevant test first.**
@@ -116,7 +117,7 @@ Do not guess:
 - supported provider versions
 - which tests a provider session runs
 
-Check `py/noxfile.py` and reproduce with the exact local session CI uses.
+Check `py/noxfile.py` and `.github/workflows/checks.yaml`, then reproduce with the exact local session CI uses.
 
 ### Run the smallest relevant test first
 
@@ -143,6 +144,8 @@ Before changing provider/integration behavior:
 
 - `test_core` runs without optional vendor packages.
 - `test_types` runs pyright, mypy, and pytest on `py/src/braintrust/type_tests/`.
+- CI runs `pylint` and `test_types` via the dedicated `static_checks` workflow job on Ubuntu across the configured Python matrix, not inside the sharded `nox` job.
+- The sharded `nox` workflow excludes `pylint` and `test_types`; use `py/scripts/nox-matrix.py --exclude-session ...` when reproducing shard membership locally.
 - wrapper coverage is split across dedicated nox sessions by provider/version.
 - `test-wheel` is a wheel sanity check and requires a built wheel first.
 
diff --git a/py/scripts/nox-matrix.py b/py/scripts/nox-matrix.py
index 4c9ff0c3..11460abf 100644
--- a/py/scripts/nox-matrix.py
+++ b/py/scripts/nox-matrix.py
@@ -6,7 +6,7 @@
 by weight descending and greedily assigns each to the lightest shard.
 
 Usage:
-    python nox-matrix.py [--dry-run]
+    python nox-matrix.py [--dry-run] [--exclude-session ...]
 """
 
 import argparse
@@ -80,6 +80,12 @@ def main() -> None:
     parser.add_argument("shard_index", type=int, help="Zero-based shard index")
     parser.add_argument("num_shards", type=int, help="Total number of shards")
     parser.add_argument("--dry-run", action="store_true", help="Print assignment without running nox")
+    parser.add_argument(
+        "--exclude-session",
+        action="append",
+        default=[],
+        help="Exclude a nox session from shard assignment. May be passed multiple times.",
+    )
     parser.add_argument(
         "--output-durations",
         type=Path,
@@ -108,6 +114,8 @@ def main() -> None:
     weights_file = root_dir / "py" / "scripts" / "session-weights.json"
 
     all_sessions = get_nox_sessions(noxfile)
+    excluded_sessions = set(args.exclude_session)
+    all_sessions = [session for session in all_sessions if session not in excluded_sessions]
     weights, default_weight = load_weights(weights_file)
     shard_assignments = assign_shards(all_sessions, args.num_shards, weights, default_weight)
 
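
Review note: the last two hunks filter excluded sessions before shard assignment, so `pylint` and `test_types` never contribute weight to any shard. The sketch below illustrates that ordering under stated assumptions. `parse_excluded` and `assign_shards_sketch` are hypothetical stand-ins written for this note, not the script's real `assign_shards`; the session names and weights are made up; the greedy rule follows the module docstring ("by weight descending and greedily assigns each to the lightest shard").

```python
import argparse


def parse_excluded(argv):
    """Collect repeated --exclude-session flags into a set (same action="append" pattern as the diff)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--exclude-session", action="append", default=[])
    args = parser.parse_args(argv)
    return set(args.exclude_session)


def assign_shards_sketch(sessions, num_shards, weights, default_weight):
    """Hypothetical greedy assignment: heaviest session first, each onto the currently lightest shard."""
    shards = [[] for _ in range(num_shards)]
    loads = [0.0] * num_shards
    ordered = sorted(sessions, key=lambda s: weights.get(s, default_weight), reverse=True)
    for session in ordered:
        lightest = loads.index(min(loads))  # ties resolve to the lowest shard index
        shards[lightest].append(session)
        loads[lightest] += weights.get(session, default_weight)
    return shards


# Illustrative inputs only; real names come from py/noxfile.py and session-weights.json.
sessions = ["test_core", "pylint", "test_types", "test_openai", "test_anthropic"]
weights = {"test_core": 30, "test_openai": 50, "test_anthropic": 40}

excluded = parse_excluded(["--exclude-session", "pylint", "--exclude-session", "test_types"])
remaining = [s for s in sessions if s not in excluded]  # filter first, then shard
shards = assign_shards_sketch(remaining, 2, weights, default_weight=10)
print(shards)
```

Filtering before assignment matters for local reproduction: running the dry run without the same `--exclude-session` flags would place the two static-check sessions into shards and could shift where the other sessions land.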