[SYNPY-1764] Add Trivy container vulnerability scanning #1346
BryanFauble wants to merge 1 commit into add-claude-md
Add Trivy scanning to gate Docker image publication on GHCR. Both the release and develop Docker jobs now follow a build→scan→push pattern: an image is pushed only if Trivy finds no CRITICAL or HIGH vulnerabilities that have a fix available (unfixed vulnerabilities are ignored).

New workflows:
- `trivy.yml`: reusable Trivy scanning workflow with SARIF upload to the GitHub Security tab
- `docker_build.yml`: reusable build/scan/push workflow for image rebuilds
- `trivy_periodic_scan.yml`: daily rescan of the latest published image with auto-remediation
BryanFauble left a comment
Pre-review: documentation comments on complex areas to help reviewers.
Note: These comments were generated with AI assistance to help reviewers understand complex areas.
```diff
 # containerize the package and upload to the GHCR upon new release (whether pre-release or not)
-ghcr-build-and-push-on-release:
+# Step 1: Build the Docker image and save as tar for scanning
+ghcr-build-on-release:
```
This is the core architectural change — the old single ghcr-build-and-push-on-release job has been split into a 3-job pipeline that gates image publication on a Trivy vulnerability scan.
```mermaid
graph LR
    A[ghcr-build-on-release] -->|tar artifact| B[trivy-scan-release]
    B -->|pass| C[ghcr-push-on-release]
    B -->|fail: CRITICAL/HIGH found| D[Build stops — image NOT pushed]
```
One thing worth noting: the push job (ghcr-push-on-release) rebuilds the image from source rather than loading the tar artifact. This is because docker/build-push-action with load: true (used in the build job) is incompatible with cache-to: type=registry — they require different buildx drivers. The rebuild should be near-instant thanks to cache-from, and this lets us keep populating the registry build cache.
The tag computation moved from inline if: conditionals on two separate build steps into a single set-tags step that outputs the tag string, which the push job reads via needs.ghcr-build-on-release.outputs.image-tags.
Note: This comment was drafted with AI assistance and reviewed by me for accuracy.
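To make the job wiring concrete, here is a minimal sketch of how three release jobs like these could be connected. It is not the PR's `build.yml`: the image name `ghcr.io/example-org/example-image`, the tag scheme inside `set-tags`, the artifact name, the local tag `local/image:scan`, and the `ARTIFACT_NAME`/`IMAGE_REF` inputs passed to `trivy.yml` are hypothetical. It shows the `image-tags` output flowing from the build job to the push job through `needs`, and the push job rebuilding with `cache-from`/`cache-to` instead of loading the tar.

```yaml
# Hypothetical sketch of the build -> scan -> push release pipeline (not the PR's build.yml).
name: Release image

on:
  release:
    types: [published]   # fires for pre-releases and full releases

env:
  IMAGE: ghcr.io/example-org/example-image   # hypothetical image name

jobs:
  # Step 1: build the image, save it as a tar for scanning, and compute the tags
  ghcr-build-on-release:
    runs-on: ubuntu-latest
    outputs:
      image-tags: ${{ steps.set-tags.outputs.tags }}
    steps:
      - uses: actions/checkout@v4
      - name: Compute image tags
        id: set-tags
        env:
          VERSION: ${{ github.event.release.tag_name }}
          PRERELEASE: ${{ github.event.release.prerelease }}
        run: |
          # Assumed scheme: always tag the version, add "latest" only for full releases
          TAGS="$IMAGE:$VERSION"
          if [ "$PRERELEASE" != "true" ]; then
            TAGS="$TAGS,$IMAGE:latest"
          fi
          echo "tags=$TAGS" >> "$GITHUB_OUTPUT"
      - name: Build into the local Docker daemon
        uses: docker/build-push-action@v6
        with:
          context: .
          load: true
          tags: local/image:scan
      - run: docker save -o image.tar local/image:scan
      - uses: actions/upload-artifact@v4
        with:
          name: release-image
          path: image.tar

  # Step 2: scan the tar; a CRITICAL/HIGH finding fails this job
  trivy-scan-release:
    needs: ghcr-build-on-release
    uses: ./.github/workflows/trivy.yml
    permissions:
      contents: read
      security-events: write   # lets trivy.yml upload SARIF
    with:
      SOURCE_TYPE: tar
      ARTIFACT_NAME: release-image
      TARFILE_NAME: image.tar
      IMAGE_REF: local/image:scan
      EXIT_CODE: 1

  # Step 3: only reached if the scan job succeeded
  ghcr-push-on-release:
    needs: [ghcr-build-on-release, trivy-scan-release]
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      # docker-container builder, which supports exporting cache to a registry
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Rebuild (mostly from cache) and push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ needs.ghcr-build-on-release.outputs.image-tags }}
          cache-from: type=registry,ref=${{ env.IMAGE }}:buildcache
          cache-to: type=registry,ref=${{ env.IMAGE }}:buildcache,mode=max
```

Because `ghcr-push-on-release` lists `trivy-scan-release` under `needs`, GitHub Actions skips it when the scan job fails; that default behavior is the gate, so no explicit `if:` is needed on the push job.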
```diff
@@ -0,0 +1,91 @@
+---
```
This is the central reusable scanning workflow — called from both build.yml (pre-push scan) and trivy_periodic_scan.yml (post-publish rescan). It supports two modes:
| Mode | `SOURCE_TYPE` | How it gets the image | Used by |
|---|---|---|---|
| Pre-push | `tar` | Downloads the artifact from the calling workflow and loads it via `docker load` | `build.yml`, `docker_build.yml` |
| Post-publish | `image` | Trivy pulls directly from GHCR | `trivy_periodic_scan.yml` |
The EXIT_CODE input controls whether findings fail the workflow (1) or just report (0). Both build.yml and the periodic scan use 1 so vulnerabilities are blocking.
The alternate Trivy DB repos (public.ecr.aws/aquasecurity/trivy-db:2) are important — the default GitHub-hosted DB gets rate-limited due to high download volume across the ecosystem.
SARIF results are uploaded even when Trivy finds vulnerabilities (the success() || steps.trivy.conclusion == 'failure' condition), so findings always land in the Security tab for triage regardless of whether the build passes.
Note: This comment was drafted with AI assistance and reviewed by me for accuracy.
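One way a reusable workflow could implement both modes, the DB mirror, and the always-upload SARIF behavior is sketched below. It is not the PR's `trivy.yml`: `SOURCE_TYPE`, `EXIT_CODE`, `TARFILE_NAME`, the `tar-download` and `trivy` step ids, the `trivy_conclusion` name, the ECR mirror, and the upload condition come from the PR; `ARTIFACT_NAME`, `IMAGE_REF`, the `scan` job name, and the SARIF file name are hypothetical, and the action version is left as a placeholder.

```yaml
# Hypothetical sketch of a reusable Trivy workflow (not the PR's trivy.yml).
name: Trivy scan

on:
  workflow_call:
    inputs:
      SOURCE_TYPE:        # "tar" (pre-push) or "image" (post-publish)
        type: string
        required: true
      IMAGE_REF:          # hypothetical: tag stored in the tar, or the GHCR reference
        type: string
        required: true
      ARTIFACT_NAME:      # hypothetical: artifact holding the tar in "tar" mode
        type: string
        default: ""
      TARFILE_NAME:
        type: string
        default: ""
      EXIT_CODE:          # 1 = findings fail the job, 0 = report only
        type: number
        default: 1
    outputs:
      trivy_conclusion:
        value: ${{ jobs.scan.outputs.trivy_conclusion }}

jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # required for SARIF upload; the caller must grant it too
    outputs:
      trivy_conclusion: ${{ steps.trivy.conclusion }}
    steps:
      # Pre-push mode: fetch the tar from the calling run and load it into Docker
      - name: Download image tar
        id: tar-download
        if: inputs.SOURCE_TYPE == 'tar'
        uses: actions/download-artifact@v4
        with:
          name: ${{ inputs.ARTIFACT_NAME }}
      - name: Load image
        if: inputs.SOURCE_TYPE == 'tar'
        run: docker load -i ${{ steps.tar-download.outputs.download-path }}/${{ inputs.TARFILE_NAME }}

      # Post-publish mode needs no extra step: Trivy pulls IMAGE_REF from GHCR itself
      - name: Run Trivy vulnerability scanner
        id: trivy
        uses: aquasecurity/trivy-action@VERSION   # placeholder: pin a vetted release
        env:
          # Mirror on public ECR, avoiding rate limits on the default DB location
          TRIVY_DB_REPOSITORY: public.ecr.aws/aquasecurity/trivy-db:2
        with:
          image-ref: ${{ inputs.IMAGE_REF }}
          ignore-unfixed: true
          severity: CRITICAL,HIGH
          exit-code: ${{ inputs.EXIT_CODE }}
          format: sarif
          output: trivy-results.sarif
          # Keep the severity filter for SARIF output (see note below)
          limit-severities-for-sarif: true

      # Runs when the scan passed, and also when Trivy failed because it found vulnerabilities
      - name: Upload SARIF to the Security tab
        if: success() || steps.trivy.conclusion == 'failure'
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif
```

A design note on the sketch: the trivy-action README states that SARIF output includes all severities regardless of `severity` unless `limit-severities-for-sarif: true` is set, so it is worth confirming how a `format: sarif` scan step behaves before relying on it as a CRITICAL/HIGH gate.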
```diff
@@ -0,0 +1,89 @@
+---
```
This workflow rescans the latest published image daily to catch newly disclosed CVEs. The flow has a multi-job conditional chain that's worth understanding:
```mermaid
graph TD
    A[get-image-reference] -->|"latest tag from mathieudutour/github-tag-action"| B[periodic-scan]
    B -->|clean| C[Done — no action needed]
    B -->|"trivy_conclusion == 'failure'"| D[bump-tag]
    D -->|"new patch version tag"| E[update-image]
    E -->|"calls docker_build.yml"| F[Rebuild + Trivy scan + push]
```
The !cancelled() condition on bump-tag and update-image is important — without it, these jobs would be skipped when the scan fails (since needs.periodic-scan would have a failure result, and GitHub Actions skips downstream jobs by default on failure). The !cancelled() override lets them run, and the trivy_conclusion == 'failure' check ensures they only run when there are actual findings.
The rebuild via docker_build.yml includes its own Trivy scan, so the patched image is only published if the rebuild actually remediates the vulnerabilities.
Note: This comment was drafted with AI assistance and reviewed by me for accuracy.
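To make the `!cancelled()` interplay concrete, here is a minimal sketch of how such a job chain could be written. It is not the PR's `trivy_periodic_scan.yml`: the job names, `trivy_conclusion`, `mathieudutour/github-tag-action`, and the calls to `trivy.yml` and `docker_build.yml` come from the PR, while the cron time, action versions, image name, the `v`-prefix handling, input names, and the extra `needs.bump-tag.result` check are assumptions. It also assumes `trivy.yml` exposes the scan step's conclusion as a `trivy_conclusion` workflow output.

```yaml
# Hypothetical sketch of the periodic rescan chain (not the PR's trivy_periodic_scan.yml).
name: Trivy periodic scan

on:
  schedule:
    - cron: "0 6 * * *"   # assumed time; once a day
  workflow_dispatch:

jobs:
  get-image-reference:
    runs-on: ubuntu-latest
    outputs:
      tag: ${{ steps.ref.outputs.tag }}
    steps:
      - uses: actions/checkout@v4
      # dry_run computes the next tag without pushing it; previous_tag is the latest existing tag
      - id: latest
        uses: mathieudutour/github-tag-action@v6.2
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          dry_run: true
      - id: ref
        env:
          TAG: ${{ steps.latest.outputs.previous_tag }}
        # Assumed: image tags are the git tags without the "v" prefix
        run: |
          echo "tag=${TAG#v}" >> "$GITHUB_OUTPUT"

  periodic-scan:
    needs: get-image-reference
    uses: ./.github/workflows/trivy.yml
    permissions:
      contents: read
      security-events: write
    with:
      SOURCE_TYPE: image
      IMAGE_REF: ghcr.io/example-org/example-image:${{ needs.get-image-reference.outputs.tag }}
      EXIT_CODE: 1

  bump-tag:
    needs: periodic-scan
    # periodic-scan fails when Trivy reports findings, which would normally skip this job;
    # !cancelled() overrides that, and the output check limits it to runs with findings
    if: ${{ !cancelled() && needs.periodic-scan.outputs.trivy_conclusion == 'failure' }}
    runs-on: ubuntu-latest
    permissions:
      contents: write   # to push the new tag
    outputs:
      image-references: ${{ steps.refs.outputs.refs }}
    steps:
      - uses: actions/checkout@v4
      - id: bump
        uses: mathieudutour/github-tag-action@v6.2
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          default_bump: patch
      - id: refs
        env:
          VERSION: ${{ steps.bump.outputs.new_version }}
          IMAGE: ghcr.io/example-org/example-image
        # Full version plus major.minor, e.g. 1.2.4 -> "...:1.2.4,...:1.2"
        run: |
          echo "refs=$IMAGE:$VERSION,$IMAGE:${VERSION%.*}" >> "$GITHUB_OUTPUT"

  update-image:
    needs: [periodic-scan, bump-tag]
    # Same override, plus a guard so a failed tag bump does not trigger a rebuild
    if: ${{ !cancelled() && needs.periodic-scan.outputs.trivy_conclusion == 'failure' && needs.bump-tag.result == 'success' }}
    uses: ./.github/workflows/docker_build.yml
    permissions:
      contents: read
      packages: write
      security-events: write
    with:
      IMAGE_REFERENCES: ${{ needs.bump-tag.outputs.image-references }}
```

The `needs.bump-tag.result == 'success'` guard is a choice of this sketch: since `!cancelled()` also lets a job run after an upstream failure, the guard keeps `update-image` from attempting a rebuild when the tag bump itself failed.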
```diff
@@ -0,0 +1,103 @@
+---
```
This reusable workflow exists specifically for the periodic scan's auto-remediation path. It's a self-contained build-scan-push pipeline that the trivy_periodic_scan.yml workflow calls when it needs to rebuild an image after finding new vulnerabilities.
Unlike the build.yml push jobs (which rebuild from source with cache-from so they can keep writing the registry build cache via cache-to), this workflow uses the tar artifact approach end-to-end: the push-image job downloads the tar, loads it via docker load, then tags and pushes. This is fine here because there's no need to update the registry build cache during periodic remediation rebuilds.
The IMAGE_REFERENCES input is comma-separated, and the push step iterates over each tag — this lets the periodic scan push both a specific version tag and a major.minor tag in one go (e.g., 1.2.3 and 1.2).
Note: This comment was drafted with AI assistance and reviewed by me for accuracy.
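A minimal sketch of a tar-based push job with a comma-separated tag loop, for reviewers who want to see the mechanics. It is not the PR's `docker_build.yml`: `IMAGE_REFERENCES`, `TARFILE_NAME`, and the `tar-download`/`docker load` pattern come from the PR, while the job names, the artifact name, and the local tag stored in the tar are hypothetical. The list is passed to the script through an environment variable rather than interpolated with `${{ }}`, so the shell never evaluates the input's contents as code.

```yaml
# Hypothetical sketch of the push job in a docker_build.yml-style reusable workflow.
on:
  workflow_call:
    inputs:
      IMAGE_REFERENCES:   # comma-separated, e.g. "ghcr.io/org/img:1.2.3,ghcr.io/org/img:1.2"
        type: string
        required: true
      TARFILE_NAME:
        type: string
        default: image.tar

jobs:
  # build-image and trivy-scan jobs omitted; push-image only runs if both succeed
  push-image:
    needs: [build-image, trivy-scan]
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Download the scanned tar
        id: tar-download
        uses: actions/download-artifact@v4
        with:
          name: image-tar   # hypothetical artifact name
      - name: Load image
        run: docker load -i ${{ steps.tar-download.outputs.download-path }}/${{ inputs.TARFILE_NAME }}
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Tag and push every reference
        env:
          IMAGE_REFERENCES: ${{ inputs.IMAGE_REFERENCES }}
          LOCAL_IMAGE: local/image:scan   # hypothetical tag the build job gave the image
        run: |
          # Split on commas, then tag and push each reference in turn
          IFS=',' read -ra REFS <<< "$IMAGE_REFERENCES"
          for ref in "${REFS[@]}"; do
            ref="${ref//[[:space:]]/}"   # drop stray whitespace around entries
            docker tag "$LOCAL_IMAGE" "$ref"
            docker push "$ref"
          done
```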
See the news about the Trivy GitHub Actions compromise: https://socket.dev/blog/trivy-under-attack-again-github-actions-compromise
Also, should we merge this branch into develop rather than add-claude-md?
```yaml
  run: docker load -i ${{ steps.tar-download.outputs.download-path }}/${{ inputs.TARFILE_NAME }}

- name: Run Trivy vulnerability scanner for any major issues
  uses: aquasecurity/trivy-action@0.32.0
```
Due to the Trivy scan attack (https://sagebionetworks.jira.com/browse/SMR-703), we should also update the Trivy version, as done in this PR: https://github.com/Sage-Bionetworks/sage-monorepo/pull/3951/changes
Summary
New workflow files
- `trivy.yml`
- `docker_build.yml`
- `trivy_periodic_scan.yml`

Key Trivy settings
- `ignore-unfixed: true` (only actionable vulnerabilities)
- `severity: CRITICAL,HIGH` (skip Medium/Low)
- `exit-code: 1` (fail builds on findings)
- Alternate Trivy DB repositories (`public.ecr.aws`) to avoid rate limits

Test plan
- Trigger `trivy_periodic_scan.yml` via `workflow_dispatch` and verify it scans the latest published image

🤖 Generated with Claude Code