[SYNPY-1764] Add Trivy container vulnerability scanning #1346

Open
BryanFauble wants to merge 1 commit into add-claude-md from synpy-1764-trivy-scanning

Conversation

@BryanFauble

Summary

  • Add Trivy vulnerability scanning to gate Docker image publication on GHCR, following Sage's Container Vulnerability Scanning guidelines
  • Restructure both release and develop Docker jobs from single build+push steps into a build → scan → push pattern — images are only pushed if no Critical/High unfixed vulnerabilities are found
  • Add daily periodic scan of the latest published image with auto-remediation (patch version bump + rebuild) when new vulnerabilities are detected

New workflow files

| File | Purpose |
| --- | --- |
| trivy.yml | Reusable Trivy scanning workflow: scans a tar or remote image, uploads SARIF to the GitHub Security tab |
| docker_build.yml | Reusable build/scan/push workflow for periodic rebuilds |
| trivy_periodic_scan.yml | Daily rescan of the latest published image with auto-remediation |

Key Trivy settings

  • ignore-unfixed: true — only actionable vulnerabilities
  • severity: CRITICAL,HIGH — skip Medium/Low
  • exit-code: 1 — fail builds on findings
  • SARIF upload to GitHub Security tab for triage
  • Alternate Trivy DB repos (public.ecr.aws) to avoid rate limits
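
As a rough illustration of how the settings above might look on a single scan step, here is a minimal sketch. The action version is the one that appears in this PR's diff; the image reference, the SARIF file name, and the Java DB mirror are assumptions for illustration and are not taken from the PR.

```yaml
# Sketch only: a Trivy scan step carrying the settings listed above.
- name: Run Trivy vulnerability scanner
  id: trivy
  uses: aquasecurity/trivy-action@0.32.0  # version as shown in this PR's diff
  with:
    image-ref: ghcr.io/example-org/example-image:latest  # hypothetical image
    format: sarif
    output: trivy-results.sarif  # hypothetical file name
    ignore-unfixed: true         # report only vulnerabilities that have a fix
    severity: CRITICAL,HIGH      # skip Medium/Low
    exit-code: '1'               # fail the step when findings remain
  env:
    # Pull the vulnerability DB from the public ECR mirror to avoid rate limits.
    TRIVY_DB_REPOSITORY: public.ecr.aws/aquasecurity/trivy-db:2
    # Assumption: the Java DB is mirrored the same way.
    TRIVY_JAVA_DB_REPOSITORY: public.ecr.aws/aquasecurity/trivy-java-db:1
```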

Test plan

  • Push to develop branch and verify the build → Trivy scan → push flow completes successfully
  • Verify SARIF results appear in the repo's Security tab (Code Scanning)
  • Verify Docker image is pushed to GHCR only after Trivy scan passes
  • Manually trigger trivy_periodic_scan.yml via workflow_dispatch and verify it scans the latest published image
  • Create a pre-release to verify the release Docker flow works end-to-end

🤖 Generated with Claude Code

Add Trivy scanning to gate Docker image publication on GHCR. Both release
and develop Docker jobs now follow a build→scan→push pattern where images
are only pushed if no Critical/High unfixed vulnerabilities are found.

New workflows:
- trivy.yml: reusable Trivy scanning workflow with SARIF upload to GitHub Security tab
- docker_build.yml: reusable build/scan/push workflow for image rebuilds
- trivy_periodic_scan.yml: daily rescan of latest published image with auto-remediation
@BryanFauble BryanFauble requested a review from a team as a code owner March 23, 2026 22:20
@BryanFauble BryanFauble changed the base branch from develop to add-claude-md March 23, 2026 22:21

@BryanFauble BryanFauble left a comment

Pre-review: documentation comments on complex areas to help reviewers.

Note: These comments were generated with AI assistance to help reviewers understand complex areas.

# containerize the package and upload to the GHCR upon new release (whether pre-release or not)
ghcr-build-and-push-on-release:
# Step 1: Build the Docker image and save as tar for scanning
ghcr-build-on-release:

This is the core architectural change — the old single ghcr-build-and-push-on-release job has been split into a 3-job pipeline that gates image publication on a Trivy vulnerability scan.

graph LR
    A[ghcr-build-on-release] -->|tar artifact| B[trivy-scan-release]
    B -->|pass| C[ghcr-push-on-release]
    B -->|fail: CRITICAL/HIGH found| D[Build stops — image NOT pushed]

One thing worth noting: the push job (ghcr-push-on-release) rebuilds the image from source rather than loading the tar artifact. This is because docker/build-push-action with load: true (used in the build job) is incompatible with cache-to: type=registry — they require different buildx drivers. The rebuild should be near-instant thanks to cache-from, and this lets us keep populating the registry build cache.

The tag computation moved from inline if: conditionals on two separate build steps into a single set-tags step that outputs the tag string, which the push job reads via needs.ghcr-build-on-release.outputs.image-tags.
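
The three-job shape described above could look roughly like the sketch below. Job names, the set-tags step, the image-tags output, and the trivy.yml inputs come from this PR; the image name, local tag, artifact name, cache ref, and tag logic are placeholders, and the real workflow may differ in its details.

```yaml
# Sketch only: build -> scan -> push, with publication gated on the scan.
jobs:
  ghcr-build-on-release:
    runs-on: ubuntu-latest
    outputs:
      image-tags: ${{ steps.set-tags.outputs.tags }}
    steps:
      - uses: actions/checkout@v4
      - name: Compute image tags once
        id: set-tags
        env:
          REF_NAME: ${{ github.ref_name }}
        # Placeholder tag logic; the PR computes release vs. pre-release tags here.
        run: echo "tags=ghcr.io/example-org/example-image:${REF_NAME}" >> "$GITHUB_OUTPUT"
      - uses: docker/setup-buildx-action@v3
      - name: Build image into the local Docker daemon
        uses: docker/build-push-action@v6
        with:
          context: .
          load: true
          tags: local/scan-target:latest  # hypothetical local tag
      - name: Save image as tar for the scan job
        run: docker save local/scan-target:latest -o image.tar
      - uses: actions/upload-artifact@v4
        with:
          name: image-tar  # hypothetical artifact name
          path: image.tar

  trivy-scan-release:
    needs: ghcr-build-on-release
    permissions:
      contents: read
      security-events: write  # needed for the SARIF upload
    uses: ./.github/workflows/trivy.yml
    with:
      SOURCE_TYPE: tar
      TARFILE_NAME: image.tar
      EXIT_CODE: 1

  ghcr-push-on-release:
    # Runs only if the scan job succeeded.
    needs: [ghcr-build-on-release, trivy-scan-release]
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Rebuild from cache and push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ needs.ghcr-build-on-release.outputs.image-tags }}
          cache-from: type=registry,ref=ghcr.io/example-org/example-image:buildcache
          cache-to: type=registry,ref=ghcr.io/example-org/example-image:buildcache,mode=max
```

Because the push job lists the scan job in needs without an if: override, a failed scan leaves it skipped, which is what keeps a vulnerable image from being published.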

Note: This comment was drafted with AI assistance and reviewed by me for accuracy.

@@ -0,0 +1,91 @@
---

This is the central reusable scanning workflow — called from both build.yml (pre-push scan) and trivy_periodic_scan.yml (post-publish rescan). It supports two modes:

| Mode | SOURCE_TYPE | How it gets the image | Used by |
| --- | --- | --- | --- |
| Pre-push | tar | Downloads artifact from calling workflow, loads via docker load | build.yml, docker_build.yml |
| Post-publish | image | Trivy pulls directly from GHCR | trivy_periodic_scan.yml |

The EXIT_CODE input controls whether findings fail the workflow (1) or just report (0). Both build.yml and the periodic scan use 1 so vulnerabilities are blocking.
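
A minimal sketch of how the two modes and the EXIT_CODE input might be declared in a reusable workflow follows. SOURCE_TYPE, TARFILE_NAME, and EXIT_CODE appear in this PR; IMAGE_NAME, the artifact name, and the SARIF file name are hypothetical, and the real trivy.yml may declare its inputs differently.

```yaml
# Sketch only: reusable scan workflow supporting 'tar' and 'image' modes.
on:
  workflow_call:
    inputs:
      SOURCE_TYPE:
        description: "'tar' to scan a build artifact, 'image' to scan a published image"
        required: true
        type: string
      IMAGE_NAME:  # hypothetical input: the image reference to scan
        required: true
        type: string
      TARFILE_NAME:
        required: false
        type: string
      EXIT_CODE:
        description: "1 fails the workflow on findings, 0 only reports"
        required: false
        type: number
        default: 1

jobs:
  trivy:
    runs-on: ubuntu-latest
    steps:
      - name: Download tar artifact
        if: ${{ inputs.SOURCE_TYPE == 'tar' }}
        id: tar-download
        uses: actions/download-artifact@v4
        with:
          name: image-tar  # hypothetical artifact name
      - name: Load image into the local Docker daemon
        if: ${{ inputs.SOURCE_TYPE == 'tar' }}
        run: docker load -i ${{ steps.tar-download.outputs.download-path }}/${{ inputs.TARFILE_NAME }}
      - name: Run Trivy vulnerability scanner
        id: trivy
        uses: aquasecurity/trivy-action@0.32.0  # version as shown in this PR's diff
        with:
          # In 'tar' mode the image is already loaded locally;
          # in 'image' mode Trivy pulls it from the registry.
          image-ref: ${{ inputs.IMAGE_NAME }}
          format: sarif
          output: trivy-results.sarif  # hypothetical file name
          exit-code: ${{ inputs.EXIT_CODE }}
```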

The alternate Trivy DB repos (public.ecr.aws/aquasecurity/trivy-db:2) are important — the default GitHub-hosted DB gets rate-limited due to high download volume across the ecosystem.

SARIF results are uploaded even when Trivy finds vulnerabilities (the success() || steps.trivy.conclusion == 'failure' condition), so findings always land in the Security tab for triage regardless of whether the build passes.
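
For reference, the upload condition described above might sit on a step like the following sketch. The if: expression is the one quoted in this comment; the step id trivy and the SARIF file name are assumptions. The job also needs the security-events: write permission for the upload to succeed.

```yaml
# Sketch only: upload SARIF whether the scan passed or failed on findings.
- name: Upload Trivy results to the GitHub Security tab
  # success() covers a clean scan; the second clause covers a scan step
  # that failed because exit-code 1 was triggered by findings.
  if: ${{ success() || steps.trivy.conclusion == 'failure' }}
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: trivy-results.sarif  # hypothetical file name
```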

Note: This comment was drafted with AI assistance and reviewed by me for accuracy.

@@ -0,0 +1,89 @@
---

This workflow rescans the latest published image daily to catch newly disclosed CVEs. The flow has a multi-job conditional chain that's worth understanding:

graph TD
    A[get-image-reference] -->|"latest tag from mathieudutour/github-tag-action"| B[periodic-scan]
    B -->|clean| C[Done — no action needed]
    B -->|"trivy_conclusion == 'failure'"| D[bump-tag]
    D -->|"new patch version tag"| E[update-image]
    E -->|"calls docker_build.yml"| F[Rebuild + Trivy scan + push]

The !cancelled() condition on bump-tag and update-image is important — without it, these jobs would be skipped when the scan fails (since needs.periodic-scan would have a failure result, and GitHub Actions skips downstream jobs by default on failure). The !cancelled() override lets them run, and the trivy_conclusion == 'failure' check ensures they only run when there are actual findings.
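
The conditional chain could be sketched as below. Job names, the trivy_conclusion check, and the tag action come from this PR; the action version, its inputs, the new_tag output wiring, the image name, and the extra bump-tag result check are assumptions. The sketch also assumes trivy.yml exposes trivy_conclusion as a workflow_call output.

```yaml
# Sketch only: remediation jobs that run when the scan failed on findings.
jobs:
  bump-tag:
    needs: periodic-scan
    # !cancelled() lets the job be evaluated even though periodic-scan failed;
    # the second clause limits it to runs where Trivy reported findings.
    if: ${{ !cancelled() && needs.periodic-scan.outputs.trivy_conclusion == 'failure' }}
    runs-on: ubuntu-latest
    permissions:
      contents: write  # needed to push the new tag
    outputs:
      new_tag: ${{ steps.tag.outputs.new_tag }}
    steps:
      - uses: actions/checkout@v4
      - name: Create the next patch tag
        id: tag
        uses: mathieudutour/github-tag-action@v6.2  # version is an assumption
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          default_bump: patch

  update-image:
    needs: [periodic-scan, bump-tag]
    # Assumption: also require that bump-tag succeeded, so the rebuild
    # never runs with an empty tag.
    if: ${{ !cancelled() && needs.periodic-scan.outputs.trivy_conclusion == 'failure' && needs.bump-tag.result == 'success' }}
    uses: ./.github/workflows/docker_build.yml
    with:
      IMAGE_REFERENCES: ghcr.io/example-org/example-image:${{ needs.bump-tag.outputs.new_tag }}
```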

The rebuild via docker_build.yml includes its own Trivy scan, so the patched image is only published if the rebuild actually remediates the vulnerabilities.

Note: This comment was drafted with AI assistance and reviewed by me for accuracy.

@@ -0,0 +1,103 @@
---

This reusable workflow exists specifically for the periodic scan's auto-remediation path. It's a self-contained build-scan-push pipeline that the trivy_periodic_scan.yml workflow calls when it needs to rebuild an image after finding new vulnerabilities.

Unlike the build.yml push jobs (which rebuild from cache to maintain cache-to capability), this workflow uses the tar artifact approach end-to-end: the push-image job downloads the tar, loads it via docker load, then tags and pushes. This is fine here because there's no need to update the registry build cache during periodic remediation rebuilds.

The IMAGE_REFERENCES input is comma-separated, and the push step iterates over each tag — this lets the periodic scan push both a specific version tag and a major.minor tag in one go (e.g., 1.2.3 and 1.2).
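
A push step that iterates over the comma-separated input might look like this sketch. IMAGE_REFERENCES is the input named above; the local image tag is a placeholder, and the step assumes the image has already been loaded with docker load and that the runner is logged in to GHCR.

```yaml
# Sketch only: tag and push the loaded image under every requested reference.
- name: Tag and push each image reference
  shell: bash
  env:
    IMAGE_REFERENCES: ${{ inputs.IMAGE_REFERENCES }}
    LOADED_IMAGE: local/scan-target:latest  # hypothetical local tag
  run: |
    # Split the comma-separated list into an array.
    IFS=',' read -ra refs <<< "$IMAGE_REFERENCES"
    for ref in "${refs[@]}"; do
      ref="${ref// /}"  # drop any spaces around the commas
      docker tag "$LOADED_IMAGE" "$ref"
      docker push "$ref"
    done
```

Passing the input through env rather than interpolating it into the script keeps the shell from interpreting unexpected characters in the tag list.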

Note: This comment was drafted with AI assistance and reviewed by me for accuracy.


@linglp linglp left a comment


See the recent news about the Trivy GitHub Actions compromise: https://socket.dev/blog/trivy-under-attack-again-github-actions-compromise
Also, should we merge this branch into develop rather than add-claude-md?

run: docker load -i ${{ steps.tar-download.outputs.download-path }}/${{ inputs.TARFILE_NAME }}

- name: Run Trivy vulnerability scanner for any major issues
uses: aquasecurity/trivy-action@0.32.0

Because of the Trivy scan attack (https://sagebionetworks.jira.com/browse/SMR-703), we should also update the Trivy version, as was done in this PR: https://github.com/Sage-Bionetworks/sage-monorepo/pull/3951/changes
