
Add Kubernetes manifests and CI/CD workflows for deployment#347

Open
t0mdavid-m wants to merge 8 commits into main from claude/kubernetes-migration-plan-KQJwD

Conversation


t0mdavid-m (Member) commented Mar 5, 2026

Summary

This PR introduces comprehensive Kubernetes manifests and GitHub Actions workflows to enable containerized deployment of the OpenMS Streamlit application. It establishes the infrastructure-as-code foundation for running the application in Kubernetes clusters with automated validation and integration testing.

Key Changes

Kubernetes Manifests (k8s/)

  • Base manifests (k8s/base/):

    • namespace.yaml: Creates dedicated openms namespace
    • configmap.yaml: Centralized configuration for Streamlit settings, analytics, and queue settings
    • redis.yaml: Redis deployment and service for job queue backend
    • streamlit-deployment.yaml: Main Streamlit application with 2 replicas, health probes, and resource limits
    • streamlit-service.yaml: ClusterIP service exposing Streamlit on port 8501
    • rq-worker-deployment.yaml: RQ worker for background job processing
    • cleanup-cronjob.yaml: Daily CronJob for workspace cleanup at 3 AM UTC
    • workspace-pvc.yaml: 100Gi PersistentVolumeClaim for shared workspace storage
    • ingress.yaml: NGINX ingress with WebSocket support, session affinity, and streaming optimizations
  • Overlay (k8s/overlays/template-app/):

    • kustomization.yaml: Kustomize configuration with namePrefix, image patching, and ingress host customization
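An overlay along these lines could look roughly as follows. This is a sketch, not the PR's exact file: the namePrefix and patch shape are illustrative, while the image name ghcr.io/openms/streamlit-template, the newTag: main value, and the example host are taken from the review comments below.

```yaml
# k8s/overlays/template-app/kustomization.yaml (illustrative sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base          # pull in all base manifests
namePrefix: template-app-
images:
  - name: openms-streamlit
    newName: ghcr.io/openms/streamlit-template
    newTag: main        # reviewers suggest pinning this to a SHA or digest
patches:
  - target:
      kind: Ingress
      name: streamlit
    patch: |-
      - op: replace
        path: /spec/rules/0/host
        value: streamlit.openms.example.de
```

Rendering the overlay with `kustomize build k8s/overlays/template-app` produces the fully patched manifests for one app instance.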

GitHub Actions Workflows

  • k8s-manifests-ci.yml:

    • Validates Kubernetes manifests using kubeconform against v1.28.0
    • Builds and validates Kustomize overlays
    • Integration testing: Creates kind cluster, deploys application, verifies Redis connectivity and deployment readiness
  • build-and-push-image.yml:

    • Builds and pushes Docker image to GHCR on main branch and version tags
    • Generates semantic versioning and SHA-based image tags
    • Supports manual workflow dispatch

Configuration Updates

  • .streamlit/config.toml: Removed enableCORS and enableXsrfProtection flags (now handled by ingress)
  • clean-up-workspaces.py: Made workspace directory configurable via WORKSPACES_DIR environment variable for Kubernetes deployment
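In the manifests, this variable is injected through the container's env block; a minimal sketch of the wiring (the value matches the script's default fallback):

```yaml
env:
  - name: WORKSPACES_DIR
    value: "/workspaces-streamlit-template"
```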

Notable Implementation Details

  • High Availability: Streamlit deployment configured with 2 replicas and session affinity via NGINX ingress cookies
  • Resource Management: Defined requests (2 CPU, 4Gi memory) and limits (8 CPU, 32Gi memory) for compute-intensive workloads
  • Health Checks: Readiness and liveness probes on Streamlit using /_stcore/health endpoint
  • WebSocket Support: Ingress configured with extended timeouts and disabled buffering for Streamlit's real-time features
  • Persistent Storage: Shared workspace storage mounted across Streamlit and RQ worker pods
  • CI/CD Integration: Automated validation and integration testing ensures manifest correctness before deployment
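The ingress behaviors listed above map onto standard NGINX Ingress Controller annotations. A sketch of the relevant metadata — the annotation keys are the controller's documented ones, but the specific values here are illustrative, not necessarily the PR's:

```yaml
metadata:
  annotations:
    # Cookie-based sticky sessions keep a browser pinned to one replica
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "streamlit-affinity"
    # Long timeouts and disabled buffering for Streamlit's WebSocket stream
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
```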

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

Summary by CodeRabbit

  • New Features

    • Kubernetes deployment for the Streamlit app: Redis, ClusterIP service, Ingress, PVC-backed workspaces, scheduled daily cleanup CronJob, and a templated overlay for environment-specific hosts.
    • Centralized runtime settings provided via a ConfigMap.
  • Chores

    • Add CI workflows for building/publishing container images and for manifest validation + cluster integration testing.
    • Make workspace directory configurable and adjust Streamlit server CORS/XSRF-related settings.

claude added 2 commits March 5, 2026 19:58
Decompose the monolithic Docker container into Kubernetes workloads:
- Streamlit Deployment with health probes and session affinity
- Redis Deployment + Service for job queue
- RQ Worker Deployment for background workflows
- CronJob for workspace cleanup
- Ingress with WebSocket support and cookie-based sticky sessions
- Shared PVC (ReadWriteMany) for workspace data
- ConfigMap for runtime configuration (replaces build-time settings)
- Kustomize base + template-app overlay for multi-app deployment

Code changes:
- Remove unsafe enableCORS=false and enableXsrfProtection=false from config.toml
- Make workspace path configurable via WORKSPACES_DIR env var in clean-up-workspaces.py

CI/CD:
- Add build-and-push-image.yml to push Docker images to ghcr.io
- Add k8s-manifests-ci.yml for manifest validation and kind integration tests

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
kustomization.yaml is a Kustomize config file, not a standard K8s resource,
so kubeconform has no schema for it. Exclude it via -ignore-filename-pattern.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
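The validation step described in this commit could take roughly the following shape in the workflow. This is a sketch: -ignore-filename-pattern is the flag the commit names, while the exact path, step name, and remaining flags are assumptions.

```yaml
- name: Validate Kubernetes manifests
  run: |
    kubeconform -strict -summary \
      -kubernetes-version 1.28.0 \
      -ignore-filename-pattern 'kustomization.yaml' \
      k8s/base/
```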

coderabbitai bot commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds GitHub Actions workflows for building/pushing container images and validating/integration-testing Kubernetes manifests; introduces k8s base/overlay resources (Namespace, ConfigMap, PVC, Deployments, Service, Ingress, Redis, CronJob); makes workspace path configurable and removes two Streamlit server settings.

Changes

Cohort / File(s): Summary

  • GitHub Actions Workflows (.github/workflows/build-and-push-image.yml, .github/workflows/k8s-manifests-ci.yml): New CI: image build/push to ghcr.io (metadata, login, build/push) and manifest CI with kubeconform/kustomize validation plus kind-based integration tests that build/load images, deploy, and verify Redis/app readiness.
  • Kustomize & Overlays (k8s/base/kustomization.yaml, k8s/overlays/template-app/kustomization.yaml): Adds a base kustomization listing resources and a template-app overlay with namePrefix, commonLabels, image override, and ingress host patch.
  • Core Kubernetes Resources (k8s/base/namespace.yaml, k8s/base/configmap.yaml, k8s/base/workspace-pvc.yaml): Adds Namespace openms, ConfigMap streamlit-config (settings.json), and PVC workspaces-pvc (500Gi, storageClass cinder-csi).
  • Application Deployments & Services (k8s/base/streamlit-deployment.yaml, k8s/base/streamlit-service.yaml, k8s/base/rq-worker-deployment.yaml): Adds the Streamlit Deployment (2 replicas, probes, volumes, resources), a ClusterIP Service on 8501, and an RQ worker Deployment (single replica) referencing Redis and mounting config/workspaces.
  • State & Infrastructure (k8s/base/redis.yaml, k8s/base/ingress.yaml, k8s/base/cleanup-cronjob.yaml): Adds a Redis Deployment+Service with probes, an nginx Ingress with detailed annotations, and a daily CronJob that mounts the workspace PVC to run clean-up-workspaces.py.
  • Local script & Streamlit config (clean-up-workspaces.py, .streamlit/config.toml): clean-up-workspaces.py makes the workspaces dir configurable via WORKSPACES_DIR (default /workspaces-streamlit-template); .streamlit/config.toml removes the enableCORS and enableXsrfProtection lines.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant GH as "GitHub Actions"
  participant Repo as "Repository"
  participant Registry as "GHCR"
  participant Kind as "kind Cluster"
  participant K8s as "Kubernetes (in-kind)"
  participant Redis as "Redis Pod"

  GH->>Repo: checkout code
  GH->>Registry: docker/metadata-action (tags/labels)
  GH->>Registry: docker/login (ghcr)
  GH->>Registry: docker/build-push-action (build & push image)
  Registry-->>GH: image pushed

  GH->>Repo: run k8s-manifests-ci (validate-manifests)
  Repo->>Repo: kustomize build
  Repo->>Repo: kubeconform validation

  GH->>Kind: create cluster
  GH->>Kind: load built image
  GH->>K8s: apply kustomize output (imagePullPolicy=Never)
  K8s->>Redis: start Redis pod
  Redis-->>K8s: readiness OK
  K8s->>K8s: deploy streamlit & rq-worker
  K8s-->>GH: report app & Redis readiness


Poem

🐰 I munched on YAML by moonlit light,
I stitched CI, pods, and image flight,
Redis hummed, ingress sang along,
Cron cleaned paths where workspaces throng,
Hooray — the cluster sleeps soundly tonight!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)
  • Description Check: Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The title accurately summarizes the primary change: adding Kubernetes manifests and CI/CD workflows for deployment, which aligns with the substantial changeset across the k8s/ directories and the new GitHub Actions workflows.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate, so the docstring coverage check was skipped.


The integration-test job now uses a matrix with Dockerfile_simple and
Dockerfile. Each matrix entry checks if its Dockerfile exists before
running — all steps are guarded with an `if` condition so they skip
gracefully when a Dockerfile is absent. This allows downstream forks
that only have one Dockerfile to pass CI without errors.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
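The matrix-plus-guard pattern described in that commit can be sketched roughly as follows. This is a job skeleton, not the exact workflow: the `steps.check.outputs.exists` condition mirrors the guard visible in the review excerpt below, while the step names and build command are illustrative.

```yaml
strategy:
  matrix:
    dockerfile: [Dockerfile_simple, Dockerfile]
steps:
  - uses: actions/checkout@v4
  - name: Check whether this Dockerfile exists
    id: check
    run: |
      if [ -f "${{ matrix.dockerfile }}" ]; then
        echo "exists=true" >> "$GITHUB_OUTPUT"
      else
        echo "exists=false" >> "$GITHUB_OUTPUT"
      fi
  # Every subsequent step is skipped gracefully when the file is absent
  - name: Build image
    if: steps.check.outputs.exists == 'true'
    run: docker build -f "${{ matrix.dockerfile }}" -t template-app:ci .
```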

coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (3)
k8s/overlays/template-app/kustomization.yaml (1)

14-15: Prefer immutable image pinning over main.

Using a mutable tag makes rollbacks and provenance harder; use commit SHA or digest in the overlay/environment patch.

🔧 Example adjustment
-    newTag: main
+    newTag: sha-<commit-sha>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/overlays/template-app/kustomization.yaml` around lines 14 - 15, The
overlay sets newTag: main which is mutable; update kustomization.yaml to pin the
image immutably by replacing newTag: main with a stable identifier (preferably
the commit SHA or image digest) for ghcr.io/openms/streamlit-template so
deployments are reproducible; locate the newTag entry in the kustomization.yaml
and substitute the mutable tag with either the specific SHA tag (e.g.,
v1.2.3-or-<commit-sha>) or the image digest (sha256:...) for
ghcr.io/openms/streamlit-template, and ensure any CI that updates this overlay
writes the immutable value automatically.
k8s/base/configmap.yaml (1)

12-15: The ConfigMap queue_settings are not consumed at runtime and should be removed or connected to the application configuration.

The timeout (7200) and result_ttl (86400) parameters are hardcoded in src/workflow/QueueManager.py (lines 105-112) and used when calling queue.enqueue() in src/workflow/WorkflowManager.py (line 63). The ConfigMap values are never read from configuration, so changing them has no effect on the application. Either connect the ConfigMap to load these settings dynamically or remove the ConfigMap entries to avoid confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/configmap.yaml` around lines 12 - 15, The ConfigMap "queue_settings"
keys are not used at runtime—replace the hardcoded timeout and result_ttl in
src/workflow/QueueManager.py (where QueueManager constructs defaults) by loading
those values from the application configuration (the ConfigMap) and expose them
via the QueueManager instance, and update src/workflow/WorkflowManager.py to
read the values from QueueManager before calling queue.enqueue() (instead of
using the literals currently passed to queue.enqueue()); alternatively, if you
prefer removal, delete the "queue_settings" entries from the ConfigMap to avoid
confusion and keep the hardcoded defaults as-is. Ensure you reference the
QueueManager class and the call site queue.enqueue in WorkflowManager so the
runtime value source is consistent.
.github/workflows/k8s-manifests-ci.yml (1)

80-81: Pin ingress-nginx manifest to stable release tag controller-v1.14.3.

Replace main with controller-v1.14.3 on line 80 to ensure reproducible CI runs:

Suggested change
- kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.14.3/deploy/static/provider/kind/deploy.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/k8s-manifests-ci.yml around lines 80 - 81, The
ingress-nginx manifest URL used in the kubectl apply command should be pinned to
a stable release tag for reproducible CI: update the URL in the kubectl apply
line that currently references
"https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml"
to use the release tag "controller-v1.14.3" instead of "main" (i.e., replace
"main" with "controller-v1.14.3" in that kubectl apply invocation so the
workflow installs the pinned controller-v1.14.3 manifest).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/build-and-push-image.yml:
- Around line 48-49: Replace the insecure use of build-args GITHUB_TOKEN in the
GitHub Actions workflow and Dockerfiles: stop passing GITHUB_TOKEN via
build-args (symbol: build-args: GITHUB_TOKEN) and instead configure BuildKit
secret mounts in the workflow (use --secret id=github_token) and update
Dockerfile/Dockerfile_simple to remove ARG GITHUB_TOKEN and ENV GH_TOKEN and use
a RUN step that reads /run/secrets/github_token (use
--mount=type=secret,id=github_token) to populate GH_TOKEN at build time and
conditionally run gh release download only when the secret is present; ensure
the workflow supplies the repo secret as the BuildKit secret rather than a
build-arg.

In @.github/workflows/k8s-manifests-ci.yml:
- Line 103: The CI step "Verify all deployments are available" currently appends
"|| true" to the kubectl wait command ("kubectl wait --for=condition=available
deployment -l app=template-app --timeout=120s || true"), which masks failures;
remove the "|| true" so the step fails on non-zero exit and the workflow
correctly fails when deployments do not become available.

In `@clean-up-workspaces.py`:
- Line 9: The code sets workspaces_directory from WORKSPACES_DIR without
validation; add checks before any recursive deletion: ensure the env var is
non-empty, resolve the Path (workspaces_directory.resolve()), verify it exists
and is a directory, and confirm it is within an allowed base/whitelist (e.g.,
startswith or is_relative_to a known safe base like
"/workspaces-streamlit-template") and not a top-level/root/home path; if any
check fails, log an error and abort the delete operation that uses
workspaces_directory.

In `@k8s/base/cleanup-cronjob.yaml`:
- Around line 18-45: The cleanup job runs as root and sources miniforge from
/root, so add pod- and container-level securityContext to enforce non-root
execution: set spec.securityContext with runAsNonRoot: true and fsGroup (e.g.,
1000) and add container.securityContext with runAsUser set to a non-root uid,
readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, and
capabilities.drop: ["ALL"] on the container named "cleanup" (image
openms-streamlit) so the job cannot escalate privileges; also update the
image/Dockerfile to create and switch to a non-root user and relocate Miniforge
out of /root (adjust the container command/args to source the new path) so
runAsNonRoot succeeds. Ensure the WORKSPACES_DIR and volumeMount remain writable
by the chosen fsGroup/runAsUser.

In `@k8s/base/configmap.yaml`:
- Line 11: The ConfigMap key "workspaces_dir" is set to a relative path ("..")
which can escape the container workspace; update the "workspaces_dir" value in
k8s/base/configmap.yaml to the exact absolute path used as the PVC mount in the
Pod/Deployment (i.e., the same mountPath configured on the container), ensuring
the key matches the mounted PVC path so the app uses the intended in-container
workspace.

In `@k8s/base/ingress.yaml`:
- Around line 20-32: Ingress currently exposes host streamlit.openms.example.de
without TLS; update the Ingress spec (ingressClassName: nginx, host:
streamlit.openms.example.de, backend service: name streamlit, port number 8501)
to add a tls section listing the host and a secretName (e.g., streamlit-tls) and
ensure that the corresponding TLS secret exists in the cluster (or is created
via cert-manager); optionally add any nginx ingress annotations required for
TLS/redirects, but the minimal fix is adding spec.tls with hosts:
[streamlit.openms.example.de] and secretName: <your-tls-secret>.
- Around line 16-17: Replace the unlimited body-size annotation by setting a
bounded value: change the nginx.ingress.kubernetes.io/proxy-body-size annotation
(currently "0") to a reasonable maximum such as "100m" (or another value
appropriate for expected file sizes) and update the comment that references file
upload limits; ensure the ingress uses the new annotated value so NGINX enforces
a finite upload limit to mitigate DoS/resource exhaustion risks.

In `@k8s/base/redis.yaml`:
- Around line 16-38: Add a pod and container securityContext to the Redis
manifest to enforce non-root execution and drop privileges: set pod-level
podSecurityContext (e.g., runAsNonRoot: true, runAsUser: 1000) and in the redis
container's securityContext set runAsNonRoot: true, runAsUser: 1000,
allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, and capabilities:
drop: ["ALL"]; also add writable emptyDir volumes and mounts for /data and /tmp
(volume names referenced from spec.volumes and mounted into the redis container)
so Redis can start with a read-only root filesystem; ensure the container name
"redis" and readinessProbe/livenessProbe blocks remain unchanged.

In `@k8s/base/rq-worker-deployment.yaml`:
- Around line 16-49: The rq-worker container and pod run without security
hardening; update the Pod spec to set a pod-level seccompProfile: RuntimeDefault
and update the rq-worker container spec (container name "rq-worker") to include
allowPrivilegeEscalation: false and capabilities with drop: ["ALL"]; optionally
evaluate readOnlyRootFilesystem: true on the "rq-worker" container only after
verifying the image (conda at /root/miniforge3) operates correctly without
writes.

In `@k8s/base/streamlit-deployment.yaml`:
- Around line 19-20: Update the deployment so image updates are reliably pulled:
change the imagePullPolicy from IfNotPresent to Always for the container that
uses image: openms-streamlit (referencing the imagePullPolicy field in the
Streamlit Deployment), or alternatively make the image tag immutable by
replacing newTag: main with a specific version/sha in the overlay so deployments
are deterministic; implement one of these two fixes and ensure imagePullPolicy
and newTag values are consistent across kustomization overlays and the streamlit
deployment manifest.
- Around line 16-63: The Streamlit container lacks security hardening: update
the container spec for the container named "streamlit" to add a securityContext
that drops all Linux capabilities (capabilities.drop: ["ALL"]) and sets a
seccompProfile (type: RuntimeDefault) to reduce attack surface; also consider
adding a pod-level securityContext (runAsNonRoot: true and runAsUser if
appropriate) to the Deployment spec and ensure these changes are applied
alongside existing volume mounts/readinessProbe/livenessProbe so Streamlit still
has writable runtime dirs (keep current workspaces and config mounts).

In `@k8s/base/workspace-pvc.yaml`:
- Around line 6-10: The PVC currently requests ReadWriteMany (accessModes:
ReadWriteMany) with resources.requests.storage: 100Gi but lacks a
storageClassName, which can prevent binding on clusters whose default class is
RWO-only; add a storageClassName set to an RWX-capable class (or alternatively
add per-environment overlay patches that inject storageClassName) so the PVC
binds correctly — update the PVC manifest where accessModes and resources are
defined to include storageClassName with the appropriate RWX class name.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e1a9caca-9f14-4c35-9abd-1a0fc0006988

📥 Commits

Reviewing files that changed from the base of the PR and between c65d503 and aa28e91.

📒 Files selected for processing (15)
  • .github/workflows/build-and-push-image.yml
  • .github/workflows/k8s-manifests-ci.yml
  • .streamlit/config.toml
  • clean-up-workspaces.py
  • k8s/base/cleanup-cronjob.yaml
  • k8s/base/configmap.yaml
  • k8s/base/ingress.yaml
  • k8s/base/kustomization.yaml
  • k8s/base/namespace.yaml
  • k8s/base/redis.yaml
  • k8s/base/rq-worker-deployment.yaml
  • k8s/base/streamlit-deployment.yaml
  • k8s/base/streamlit-service.yaml
  • k8s/base/workspace-pvc.yaml
  • k8s/overlays/template-app/kustomization.yaml
💤 Files with no reviewable changes (1)
  • .streamlit/config.toml

Comment on lines +48 to +49
          build-args: |
            GITHUB_TOKEN=${{ secrets.GITHUB_TOKEN }}

⚠️ Potential issue | 🟠 Major


Use Docker BuildKit secret mounts instead of build-args for GITHUB_TOKEN.

Lines 48-49 pass the credential through build-args, which embeds it in image layers and cache. Both Dockerfile_simple (line 15) and Dockerfile (line 15) convert the argument to an environment variable that persists in the final image.

Since the token is only needed during build time, use --mount=type=secret with BuildKit to prevent the credential from being written to any image layers.

🔐 Proposed workflow change
       - name: Build and push Docker image
         uses: docker/build-push-action@v5
         with:
           context: .
           file: Dockerfile_simple
           push: true
           tags: ${{ steps.meta.outputs.tags }}
           labels: ${{ steps.meta.outputs.labels }}
-          build-args: |
-            GITHUB_TOKEN=${{ secrets.GITHUB_TOKEN }}
+          secrets: |
+            github_token=${{ secrets.GITHUB_TOKEN }}

Update the Dockerfile to use the secret mount instead of ARG/ENV:

- ARG GITHUB_TOKEN
- ENV GH_TOKEN=${GITHUB_TOKEN}
+ RUN --mount=type=secret,id=github_token \
+   GH_TOKEN=$(cat /run/secrets/github_token) && \
+   if [ -n "$GH_TOKEN" ]; then \
+     gh release download -R ${GITHUB_USER}/${GITHUB_REPO} -p "OpenMS-App.zip" -D /app; \
+   fi

      - name: Verify all deployments are available
        if: steps.check.outputs.exists == 'true'
        run: |
          kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s || true

⚠️ Potential issue | 🔴 Critical


Remove || true to prevent masking deployment failures in tests.

Line 103 uses || true, which swallows kubectl wait failures and allows the integration test to pass even when deployments never become available. This contradicts other similar verification steps in the workflow (lines 81, 93) that properly fail when conditions aren't met. The step is named "Verify all deployments are available," but the current implementation only reports status without enforcing it.

Proposed fix
-          kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s || true
+          kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s


 # Define the workspaces directory
-workspaces_directory = Path("/workspaces-streamlit-template")
+workspaces_directory = Path(os.environ.get("WORKSPACES_DIR", "/workspaces-streamlit-template"))

⚠️ Potential issue | 🟠 Major

Validate WORKSPACES_DIR before using it for recursive deletion.

If WORKSPACES_DIR is empty/invalid, this can target unintended directories and delete data.

🛡️ Proposed safety guard
-workspaces_directory = Path(os.environ.get("WORKSPACES_DIR", "/workspaces-streamlit-template"))
+raw_workspaces_dir = os.environ.get("WORKSPACES_DIR", "/workspaces-streamlit-template").strip()
+if not raw_workspaces_dir:
+    raise ValueError("WORKSPACES_DIR cannot be empty")
+workspaces_directory = Path(raw_workspaces_dir).resolve()
+if not workspaces_directory.exists() or not workspaces_directory.is_dir():
+    raise FileNotFoundError(f"WORKSPACES_DIR is invalid: {workspaces_directory}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clean-up-workspaces.py` at line 9, The code sets workspaces_directory from
WORKSPACES_DIR without validation; add checks before any recursive deletion:
ensure the env var is non-empty, resolve the Path
(workspaces_directory.resolve()), verify it exists and is a directory, and
confirm it is within an allowed base/whitelist (e.g., startswith or
is_relative_to a known safe base like "/workspaces-streamlit-template") and not
a top-level/root/home path; if any check fails, log an error and abort the
delete operation that uses workspaces_directory.
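
A minimal runnable sketch of such a guard (names are illustrative; `Path.is_relative_to` requires Python 3.9+):

```python
import os
from pathlib import Path

# Illustrative guard, assuming the allowed base is the PVC mount path
# used in the manifests ("/workspaces-streamlit-template").
ALLOWED_BASE = Path("/workspaces-streamlit-template")

def resolve_workspaces_dir() -> Path:
    """Resolve WORKSPACES_DIR and refuse paths outside the allowed base."""
    raw = os.environ.get("WORKSPACES_DIR", str(ALLOWED_BASE)).strip()
    if not raw:
        raise ValueError("WORKSPACES_DIR cannot be empty")
    path = Path(raw).resolve()
    # Rejects "/", "/root", relative escapes via "..", and anything
    # else that does not live under the allowed base.
    if not path.is_relative_to(ALLOWED_BASE):
        raise ValueError(f"WORKSPACES_DIR {path} is outside {ALLOWED_BASE}")
    return path
```

Calling this once at startup, before any deletion logic runs, turns a misconfigured environment variable into an immediate abort rather than a recursive delete of the wrong directory.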

Comment on lines +18 to +45
    spec:
      restartPolicy: OnFailure
      containers:
        - name: cleanup
          image: openms-streamlit
          imagePullPolicy: IfNotPresent
          command: ["/bin/bash", "-c"]
          args:
            - |
              source /root/miniforge3/bin/activate streamlit-env
              exec python clean-up-workspaces.py
          env:
            - name: WORKSPACES_DIR
              value: "/workspaces-streamlit-template"
          volumeMounts:
            - name: workspaces
              mountPath: /workspaces-streamlit-template
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
      volumes:
        - name: workspaces
          persistentVolumeClaim:
            claimName: workspaces-pvc

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check current security context in cleanup-cronjob.yaml
echo "=== Current state in cleanup-cronjob.yaml ==="
rg -n 'securityContext|allowPrivilegeEscalation|readOnlyRootFilesystem|runAsNonRoot|seccompProfile' k8s/base/cleanup-cronjob.yaml -C2

# Find and examine Dockerfile
echo ""
echo "=== Searching for Dockerfile ==="
fd -HI '^Dockerfile' | head -20

# Check Dockerfile content for USER and relevant directives
echo ""
echo "=== Dockerfile USER/WORKDIR settings ==="
fd -HI '^Dockerfile' | xargs -r head -50

Repository: OpenMS/streamlit-template

Length of output: 5906


Harden pod/container security context for the cleanup job.

Lines 18-45 currently run with default privileges. The Dockerfile defaults to root user and installs miniforge to /root/miniforge3, leaving avoidable privilege-escalation and runtime-hardening gaps.

🛡️ Proposed hardening patch
         spec:
+          securityContext:
+            seccompProfile:
+              type: RuntimeDefault
           restartPolicy: OnFailure
           containers:
             - name: cleanup
               image: openms-streamlit
               imagePullPolicy: IfNotPresent
+              securityContext:
+                allowPrivilegeEscalation: false
+                capabilities:
+                  drop: ["ALL"]
+                readOnlyRootFilesystem: true
               command: ["/bin/bash", "-c"]

To migrate to runAsNonRoot, the Dockerfile must switch from root user and relocate miniforge away from /root/miniforge3.

🧰 Tools
🪛 Checkov (3.2.334)

[medium] 1-45: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-45: Minimize the admission of root containers

(CKV_K8S_23)

🪛 Trivy (0.69.1)

[error] 21-41: Root file system is not read-only

Container 'cleanup' of CronJob 'workspace-cleanup' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[error] 21-41: Default security context configured

container workspace-cleanup in default namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)


[error] 18-45: Default security context configured

cronjob workspace-cleanup in default namespace is using the default security context, which allows root privileges

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/cleanup-cronjob.yaml` around lines 18 - 45, The cleanup job runs as
root and sources miniforge from /root, so add pod- and container-level
securityContext to enforce non-root execution: set spec.securityContext with
runAsNonRoot: true and fsGroup (e.g., 1000) and add container.securityContext
with runAsUser set to a non-root uid, readOnlyRootFilesystem: true,
allowPrivilegeEscalation: false, and capabilities.drop: ["ALL"] on the container
named "cleanup" (image openms-streamlit) so the job cannot escalate privileges;
also update the image/Dockerfile to create and switch to a non-root user and
relocate Miniforge out of /root (adjust the container command/args to source the
new path) so runAsNonRoot succeeds. Ensure the WORKSPACES_DIR and volumeMount
remain writable by the chosen fsGroup/runAsUser.

      "app-name": "OpenMS WebApp Template",
      "online_deployment": true,
      "enable_workspaces": true,
      "workspaces_dir": "..",

⚠️ Potential issue | 🟠 Major

workspaces_dir should match the mounted PVC path in k8s.

Using ".." is risky in-container and can resolve outside the intended workspace mount.

🔧 Proposed fix
-      "workspaces_dir": "..",
+      "workspaces_dir": "/workspaces-streamlit-template",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/configmap.yaml` at line 11, The ConfigMap key "workspaces_dir" is
set to a relative path ("..") which can escape the container workspace; update
the "workspaces_dir" value in k8s/base/configmap.yaml to the exact absolute path
used as the PVC mount in the Pod/Deployment (i.e., the same mountPath configured
on the container), ensuring the key matches the mounted PVC path so the app uses
the intended in-container workspace.

Comment on lines +16 to +38
spec:
containers:
- name: redis
image: redis:7-alpine
ports:
- containerPort: 6379
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "256Mi"
cpu: "250m"
readinessProbe:
exec:
command: ["redis-cli", "ping"]
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
exec:
command: ["redis-cli", "ping"]
initialDelaySeconds: 15
periodSeconds: 20

⚠️ Potential issue | 🟠 Major

❓ Verification inconclusive

Script executed:

cat k8s/base/redis.yaml

Repository: OpenMS/streamlit-template

Exit code: 0

stdout:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  labels:
    component: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      component: redis
  template:
    metadata:
      labels:
        component: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
          resources:
            requests:
              memory: "64Mi"
              cpu: "50m"
            limits:
              memory: "256Mi"
              cpu: "250m"
          readinessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 15
            periodSeconds: 20
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    component: redis
spec:
  type: ClusterIP
  ports:
    - port: 6379
      targetPort: 6379
  selector:
    component: redis

🌐 Web query:

redis:7-alpine readOnlyRootFilesystem kubernetes security context compatibility

💡 Result:

Yes—redis:7-alpine can run with readOnlyRootFilesystem: true in Kubernetes, as long as you provide explicit writable mounts for the paths Redis needs to write to.

What must be writable

  • /data: the official Redis image declares persistence under VOLUME /data (RDB/AOF, etc.). With a read-only root FS, /data must be backed by a PVC or emptyDir. [1]
  • A temp directory (commonly /tmp): many hardened setups mount an emptyDir at /tmp to avoid “read-only file system” errors when software tries to create temp files. (Not Redis-specific, but commonly required in practice when enabling readOnlyRootFilesystem.) [2]

SecurityContext that’s typically compatible

The Redis official image is designed to drop privileges to a non-root redis user by default, and the Alpine variant uses a dedicated redis user (commonly UID 999). [1], [3]
So a restricted container security context usually works, provided your volume permissions align:

securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
volumeMounts:
  - name: data
    mountPath: /data
  - name: tmp
    mountPath: /tmp
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: redis-data
  - name: tmp
    emptyDir: {}

Common gotchas

  • If you override runAsUser, ensure it matches the image’s Redis UID (or ensure /data is writable via fsGroup/permissions), otherwise Redis may fail to write to /data. [3]
  • If you configure Redis to write anywhere else (custom dir, unix socket path, log file), that path must be on a writable volume; otherwise it will fail under a read-only root filesystem. [2]

Sources: Kubernetes security context docs (for readOnlyRootFilesystem) [2]; Redis official image docs (persisted data under /data, privilege dropping) [1]; Redis docker-library image user details (Alpine redis user/UID behavior) [3].


Script executed:

# Check if there are any Redis configuration files or persistence setup in the repo
find . -type f -name "redis*" -o -name "*redis*" | head -20

Repository: OpenMS/streamlit-template

Exit code: 0

stdout:

./k8s/base/redis.yaml

Harden Redis pod/container security context.

The deployment currently runs with default security settings. Please add pod and container security contexts to enforce non-root execution, drop capabilities, and disable privilege escalation.

Note: If enabling readOnlyRootFilesystem: true, ensure both /data and /tmp are mounted as writable volumes (typically emptyDir). Redis will fail to start without writable access to /tmp.

🔒 Corrected hardening patch
 spec:
   replicas: 1
   selector:
     matchLabels:
       component: redis
   template:
     metadata:
       labels:
         component: redis
     spec:
+      securityContext:
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
       containers:
         - name: redis
           image: redis:7-alpine
           ports:
             - containerPort: 6379
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
+            readOnlyRootFilesystem: true
+          volumeMounts:
+            - name: redis-data
+              mountPath: /data
+            - name: tmp
+              mountPath: /tmp
           resources:
             requests:
               memory: "64Mi"
               cpu: "50m"
             limits:
               memory: "256Mi"
               cpu: "250m"
           readinessProbe:
             exec:
               command: ["redis-cli", "ping"]
             initialDelaySeconds: 5
             periodSeconds: 10
           livenessProbe:
             exec:
               command: ["redis-cli", "ping"]
             initialDelaySeconds: 15
             periodSeconds: 20
+      volumes:
+        - name: redis-data
+          emptyDir: {}
+        - name: tmp
+          emptyDir: {}
🧰 Tools
🪛 Trivy (0.69.1)

[error] 18-38: Root file system is not read-only

Container 'redis' of Deployment 'redis' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[error] 18-38: Default security context configured

container redis in default namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)


[error] 16-38: Default security context configured

deployment redis in default namespace is using the default security context, which allows root privileges

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/redis.yaml` around lines 16 - 38, Add a pod and container
securityContext to the Redis manifest to enforce non-root execution and drop
privileges: set pod-level podSecurityContext (e.g., runAsNonRoot: true,
runAsUser: 1000) and in the redis container's securityContext set runAsNonRoot:
true, runAsUser: 1000, allowPrivilegeEscalation: false, readOnlyRootFilesystem:
true, and capabilities: drop: ["ALL"]; also add writable emptyDir volumes and
mounts for /data and /tmp (volume names referenced from spec.volumes and mounted
into the redis container) so Redis can start with a read-only root filesystem;
ensure the container name "redis" and readinessProbe/livenessProbe blocks remain
unchanged.

Comment on lines +16 to +49
spec:
containers:
- name: rq-worker
image: openms-streamlit
imagePullPolicy: IfNotPresent
command: ["/bin/bash", "-c"]
args:
- |
source /root/miniforge3/bin/activate streamlit-env
exec rq worker openms-workflows --url $REDIS_URL
env:
- name: REDIS_URL
value: "redis://redis:6379/0"
volumeMounts:
- name: workspaces
mountPath: /workspaces-streamlit-template
- name: config
mountPath: /app/settings.json
subPath: settings.json
readOnly: true
resources:
requests:
memory: "4Gi"
cpu: "2"
limits:
memory: "32Gi"
cpu: "8"
volumes:
- name: workspaces
persistentVolumeClaim:
claimName: workspaces-pvc
- name: config
configMap:
name: streamlit-config

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and read the actual deployment file
find . -type f -name "rq-worker-deployment.yaml" | head -5

Repository: OpenMS/streamlit-template

Length of output: 105


🏁 Script executed:

# Also check the overall structure of the k8s directory
ls -la k8s/base/ 2>/dev/null || find . -type d -name "base" | grep k8s

Repository: OpenMS/streamlit-template

Length of output: 858


🏁 Script executed:

# Read the actual rq-worker-deployment.yaml file
cat -n k8s/base/rq-worker-deployment.yaml

Repository: OpenMS/streamlit-template

Length of output: 1678


🏁 Script executed:

# Check if there's a Dockerfile to understand image dependencies and write requirements
find . -type f -name "Dockerfile" -o -name "Dockerfile.*" | head -10

Repository: OpenMS/streamlit-template

Length of output: 81


🏁 Script executed:

# Check other deployment/pod manifests to see if security contexts are used elsewhere
cat -n k8s/base/streamlit-deployment.yaml

Repository: OpenMS/streamlit-template

Length of output: 2176


🏁 Script executed:

# Check the cleanup-cronjob.yaml mentioned in the scratchpad to see if it has similar issues
cat -n k8s/base/cleanup-cronjob.yaml

Repository: OpenMS/streamlit-template

Length of output: 1670


🏁 Script executed:

# Read the Dockerfile to understand image structure and capabilities
cat -n Dockerfile

Repository: OpenMS/streamlit-template

Length of output: 12981


rq-worker container missing security hardening controls.

Lines 16-49 run with default security context, weakening least-privilege guarantees. Add pod-level seccomp policy and container-level restrictions:

  • seccompProfile: RuntimeDefault on pod spec
  • allowPrivilegeEscalation: false, capabilities: drop: ["ALL"] on container spec

Note: readOnlyRootFilesystem: true requires validating the image can function without write access to the root filesystem; it may conflict with conda environment activation from /root/miniforge3 or temporary file requirements during job execution. Test carefully before applying.

🧰 Tools
🪛 Checkov (3.2.334)

[medium] 1-49: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-49: Minimize the admission of root containers

(CKV_K8S_23)

🪛 Trivy (0.69.1)

[error] 18-42: Root file system is not read-only

Container 'rq-worker' of Deployment 'rq-worker' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[error] 18-42: Default security context configured

container rq-worker in default namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)


[error] 16-49: Default security context configured

deployment rq-worker in default namespace is using the default security context, which allows root privileges

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/rq-worker-deployment.yaml` around lines 16 - 49, The rq-worker
container and pod run without security hardening; update the Pod spec to set a
pod-level seccompProfile: RuntimeDefault and update the rq-worker container spec
(container name "rq-worker") to include allowPrivilegeEscalation: false and
capabilities with drop: ["ALL"]; optionally evaluate readOnlyRootFilesystem:
true on the "rq-worker" container only after verifying the image (conda at
/root/miniforge3) operates correctly without writes.

Comment on lines +16 to +63
spec:
containers:
- name: streamlit
image: openms-streamlit
imagePullPolicy: IfNotPresent
command: ["/bin/bash", "-c"]
args:
- |
source /root/miniforge3/bin/activate streamlit-env
exec streamlit run app.py --server.address 0.0.0.0
ports:
- containerPort: 8501
env:
- name: REDIS_URL
value: "redis://redis:6379/0"
volumeMounts:
- name: workspaces
mountPath: /workspaces-streamlit-template
- name: config
mountPath: /app/settings.json
subPath: settings.json
readOnly: true
readinessProbe:
httpGet:
path: /_stcore/health
port: 8501
initialDelaySeconds: 10
periodSeconds: 10
livenessProbe:
httpGet:
path: /_stcore/health
port: 8501
initialDelaySeconds: 30
periodSeconds: 30
resources:
requests:
memory: "4Gi"
cpu: "2"
limits:
memory: "32Gi"
cpu: "8"
volumes:
- name: workspaces
persistentVolumeClaim:
claimName: workspaces-pvc
- name: config
configMap:
name: streamlit-config

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the streamlit deployment file
fd -t f "streamlit-deployment.yaml"

Repository: OpenMS/streamlit-template

Length of output: 103


🏁 Script executed:

# Let's also check the kustomization file referenced in the comment
fd -t f "kustomization.yaml" --path "*overlays*"

Repository: OpenMS/streamlit-template

Length of output: 302


🏁 Script executed:

# Read the streamlit deployment file to verify security context
cat -n k8s/base/streamlit-deployment.yaml

Repository: OpenMS/streamlit-template

Length of output: 2176


🏁 Script executed:

# Find kustomization.yaml files correctly
fd -t f "kustomization.yaml"

Repository: OpenMS/streamlit-template

Length of output: 141


🏁 Script executed:

# Read the overlay kustomization file to verify mutable tag claim
cat -n k8s/overlays/template-app/kustomization.yaml

Repository: OpenMS/streamlit-template

Length of output: 661


🏁 Script executed:

# Let's also check if there are any security policies or other hardening measures in the k8s directory
find k8s -type f -name "*.yaml" | head -20

Repository: OpenMS/streamlit-template

Length of output: 391


🏁 Script executed:

# Check if there's any documentation or comments in deployment about filesystem requirements
grep -r "readOnly\|writeable\|cache" k8s/ --include="*.yaml"

Repository: OpenMS/streamlit-template

Length of output: 196


🏁 Script executed:

# Check the base kustomization to see if there are any other security settings
cat -n k8s/base/kustomization.yaml

Repository: OpenMS/streamlit-template

Length of output: 440


Streamlit pod/container are missing baseline security controls.

Lines 16-63 lack securityContext hardening. Add seccomp profile and drop all capabilities to reduce attack surface.

🛡️ Proposed hardening patch
     spec:
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       containers:
         - name: streamlit
           image: openms-streamlit
           imagePullPolicy: IfNotPresent
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
           command: ["/bin/bash", "-c"]

Note: readOnlyRootFilesystem: true was considered but is not feasible here—Streamlit requires writable filesystem for cache and temp directories. If read-only enforcement is required, it would need explicit emptyDir mounts for runtime directories.

🧰 Tools
🪛 Checkov (3.2.334)

[medium] 1-63: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-63: Minimize the admission of root containers

(CKV_K8S_23)

🪛 Trivy (0.69.1)

[error] 18-56: Root file system is not read-only

Container 'streamlit' of Deployment 'streamlit' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[error] 18-56: Default security context configured

container streamlit in default namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)


[error] 16-63: Default security context configured

deployment streamlit in default namespace is using the default security context, which allows root privileges

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/streamlit-deployment.yaml` around lines 16 - 63, The Streamlit
container lacks security hardening: update the container spec for the container
named "streamlit" to add a securityContext that drops all Linux capabilities
(capabilities.drop: ["ALL"]) and sets a seccompProfile (type: RuntimeDefault) to
reduce attack surface; also consider adding a pod-level securityContext
(runAsNonRoot: true and runAsUser if appropriate) to the Deployment spec and
ensure these changes are applied alongside existing volume
mounts/readinessProbe/livenessProbe so Streamlit still has writable runtime dirs
(keep current workspaces and config mounts).

Comment on lines +19 to +20
          image: openms-streamlit
          imagePullPolicy: IfNotPresent

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's check if these files exist
echo "=== Checking file existence ==="
ls -la k8s/base/streamlit-deployment.yaml 2>/dev/null || echo "Not found: k8s/base/streamlit-deployment.yaml"
ls -la k8s/overlays/template-app/kustomization.yaml 2>/dev/null || echo "Not found: k8s/overlays/template-app/kustomization.yaml"

# Check the directory structure
echo "=== Directory structure ==="
find k8s -type f -name "*.yaml" -o -name "*.yml" 2>/dev/null | head -20

Repository: OpenMS/streamlit-template

Length of output: 628


🏁 Script executed:

#!/bin/bash
# Examine the streamlit deployment file
echo "=== Content of k8s/base/streamlit-deployment.yaml ==="
cat -n k8s/base/streamlit-deployment.yaml 2>/dev/null || echo "File not found"

echo ""
echo "=== Content of k8s/overlays/template-app/kustomization.yaml ==="
cat -n k8s/overlays/template-app/kustomization.yaml 2>/dev/null || echo "File not found"

Repository: OpenMS/streamlit-template

Length of output: 2888


Use imagePullPolicy: Always or switch to immutable image tags.

Line 20 combined with newTag: main in k8s/overlays/template-app/kustomization.yaml creates a risky pattern: IfNotPresent prevents pulling newer versions of the main tag, leaving stale images running and making rollouts non-deterministic.

Recommended fix
-          imagePullPolicy: IfNotPresent
+          imagePullPolicy: Always

Alternatively, replace mutable main tag with a specific version (semver or SHA) in the overlay for fully deterministic deployments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/streamlit-deployment.yaml` around lines 19 - 20, Update the
deployment so image updates are reliably pulled: change the imagePullPolicy from
IfNotPresent to Always for the container that uses image: openms-streamlit
(referencing the imagePullPolicy field in the Streamlit Deployment), or
alternatively make the image tag immutable by replacing newTag: main with a
specific version/sha in the overlay so deployments are deterministic; implement
one of these two fixes and ensure imagePullPolicy and newTag values are
consistent across kustomization overlays and the streamlit deployment manifest.
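
As a sketch of the deterministic alternative, the overlay's `images` transformer could pin an immutable, CI-generated tag instead of `main` (the registry path and tag below are hypothetical placeholders):

```yaml
images:
  - name: openms-streamlit
    newName: ghcr.io/openms/streamlit-template  # hypothetical registry path
    newTag: sha-1a2b3c4                         # immutable per-commit tag from CI
```

With an immutable tag, `imagePullPolicy: IfNotPresent` becomes safe, since the cached image and the tagged image can never diverge.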

Comment on lines +6 to +10
accessModes:
- ReadWriteMany
resources:
requests:
storage: 100Gi

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the actual PVC file
cat -n k8s/base/workspace-pvc.yaml

Repository: OpenMS/streamlit-template

Length of output: 303


🏁 Script executed:

# Check for overlays directory structure
find k8s -type d -name "overlays" | head -20

Repository: OpenMS/streamlit-template

Length of output: 81


🏁 Script executed:

# Search for storageClassName references and RWX configuration
rg -n --type=yaml 'storageClassName|ReadWriteMany' k8s/ -A 2 -B 2

Repository: OpenMS/streamlit-template

Length of output: 286


🏁 Script executed:

# Look for any patches or kustomization files that might modify the PVC
find k8s -name "kustomization.yaml" -o -name "kustomization.yml" | xargs grep -l "workspace-pvc" 2>/dev/null

Repository: OpenMS/streamlit-template

Length of output: 96


🏁 Script executed:

# Check the kustomization.yaml in base
cat -n k8s/base/kustomization.yaml

Repository: OpenMS/streamlit-template

Length of output: 440


🏁 Script executed:

# List overlay structure
ls -la k8s/overlays/

Repository: OpenMS/streamlit-template

Length of output: 235


🏁 Script executed:

# Check for patches related to workspace-pvc in overlays
find k8s/overlays -type f \( -name "*.yaml" -o -name "*.yml" \) | xargs grep -l "storageClassName\|workspace-pvc" 2>/dev/null

Repository: OpenMS/streamlit-template

Length of output: 53


🏁 Script executed:

# Check for any pvc-related patches in overlays
find k8s/overlays -type f -name "*pvc*" -o -name "*patch*" | head -20

Repository: OpenMS/streamlit-template

Length of output: 51


🏁 Script executed:

# Check the overlay's kustomization.yaml
cat -n k8s/overlays/template-app/kustomization.yaml

Repository: OpenMS/streamlit-template

Length of output: 661


🏁 Script executed:

# Check if there are any patch files in the overlay
ls -la k8s/overlays/template-app/

Repository: OpenMS/streamlit-template

Length of output: 244


🏁 Script executed:

# Search specifically for any YAML files in overlay that might patch the PVC
find k8s/overlays -type f -name "*.yaml" -o -name "*.yml" | xargs cat

Repository: OpenMS/streamlit-template

Length of output: 493


🏁 Script executed:

# Check if there's any documentation about storage class requirements
find . -name "README*" -o -name "INSTALL*" -o -name "*.md" | xargs grep -l "storage\|RWX\|PVC" 2>/dev/null | head -5

Repository: OpenMS/streamlit-template

Length of output: 89


🏁 Script executed:

# Check the user guide for any storage/PVC requirements
cat -n ./docs/user_guide.md | grep -A 10 -B 5 -i "storage\|pvc\|rwx\|persistent"

Repository: OpenMS/streamlit-template

Length of output: 1159


🏁 Script executed:

# Also check if there are other overlays or environments we should examine
find k8s/overlays -type d -mindepth 1 | sort

Repository: OpenMS/streamlit-template

Length of output: 94


Specify an RWX-capable storageClassName for portability.

ReadWriteMany requires explicit storage class configuration. Without it, the PVC may fail to bind on clusters whose default class is RWO-only or unavailable. Currently, neither the base configuration nor the overlay provides a storageClassName.

🔧 Proposed fix
 spec:
+  storageClassName: <rwx-storage-class>
   accessModes:
     - ReadWriteMany

Either define the storageClassName in the base PVC or add an overlay patch per environment with the appropriate RWX-capable class name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/workspace-pvc.yaml` around lines 6 - 10, The PVC currently requests
ReadWriteMany (accessModes: ReadWriteMany) with resources.requests.storage:
100Gi but lacks a storageClassName, which can prevent binding on clusters whose
default class is RWO-only; add a storageClassName set to an RWX-capable class
(or alternatively add per-environment overlay patches that inject
storageClassName) so the PVC binds correctly — update the PVC manifest where
accessModes and resources are defined to include storageClassName with the
appropriate RWX class name.
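
A per-environment overlay patch injecting the class could look like this sketch (the class name is a placeholder; use whatever RWX-capable class the target cluster actually provides):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspaces-pvc
spec:
  storageClassName: rwx-storage-class  # placeholder, e.g. an NFS or CephFS class
```

Referencing this file under `patches:` in the overlay's kustomization.yaml keeps the base manifest cluster-agnostic while making each deployment explicit about its storage backend.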

- Switch workspace PVC from ReadWriteMany to ReadWriteOnce with
  cinder-csi storage class (required by de.NBI KKP cluster)
- Increase PVC storage to 500Gi
- Add namespace: openms to kustomization.yaml
- Reduce pod resource requests (1Gi/500m) and limits (8Gi/4 CPU)
  so all workspace-mounting pods fit on a single node

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
k8s/base/rq-worker-deployment.yaml (1)

16-49: ⚠️ Potential issue | 🟠 Major

Add baseline pod/container security hardening.

Line 16 and Line 18 still use the default security context, so the worker keeps default privileges and capability set. Please add pod-level seccomp and container-level privilege restrictions here.

Suggested patch
     spec:
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       containers:
         - name: rq-worker
           image: openms-streamlit
           imagePullPolicy: IfNotPresent
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
           command: ["/bin/bash", "-c"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/rq-worker-deployment.yaml` around lines 16 - 49, Add pod- and
container-level security contexts: under the Pod spec set a
podSecurityContext/seccompProfile (e.g., type: RuntimeDefault) to enforce
seccomp, and in the container entry for name: rq-worker add a securityContext
that enforces runAsNonRoot: true, allowPrivilegeEscalation: false, privileged:
false, readOnlyRootFilesystem: true and drop all capabilities
(capabilities.drop: ["ALL"]); ensure these keys are added alongside the existing
spec.containers and not inside args so the rq-worker container is restricted at
runtime.
k8s/base/streamlit-deployment.yaml (2)

16-63: ⚠️ Potential issue | 🟠 Major

Add baseline pod/container security hardening.

Line 16 and Line 18 still run with the default security context. That leaves the pod with a broader privilege surface than necessary.

Suggested patch
     spec:
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       containers:
         - name: streamlit
           image: openms-streamlit
           imagePullPolicy: IfNotPresent
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
           command: ["/bin/bash", "-c"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/streamlit-deployment.yaml` around lines 16 - 63, The Pod spec for
the Streamlit container (container name "streamlit" in the Deployment spec)
lacks a securityContext; add a podSecurityContext and container securityContext
to harden privileges: set podSecurityContext.fsGroup and runAsNonRoot/runAsUser,
and on the container set securityContext.runAsUser (e.g., 1000), runAsGroup,
runAsNonRoot: true, allowPrivilegeEscalation: false, readOnlyRootFilesystem:
true, drop all capabilities, and add a seccompProfile (RuntimeDefault) and
readOnlyRootFilesystem where appropriate; ensure volume mounts (volumeMounts:
workspaces/config) remain writable if needed by adjusting fsGroup rather than
granting root.

19-20: ⚠️ Potential issue | 🟠 Major

Avoid IfNotPresent for an untagged or mutable app image.

Line 20 makes rollouts non-deterministic. openms-streamlit is untagged here, so direct applies resolve to the implicit :latest tag, and if the overlay keeps using a mutable tag like main, new pods can keep running an older cached build. k8s/base/rq-worker-deployment.yaml Line 20 has the same problem, so both deployments should be aligned.

Suggested patch if you keep mutable tags
-          imagePullPolicy: IfNotPresent
+          imagePullPolicy: Always
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/streamlit-deployment.yaml` around lines 19 - 20, The deployment uses
an untagged/mutable image ("image: openms-streamlit") but keeps imagePullPolicy:
IfNotPresent, which makes rollouts non-deterministic; update the Streamlit
Deployment (and align rq-worker-deployment.yaml similarly) to either pin the
image to an immutable tag or set imagePullPolicy to Always for the current
mutable/untagged image, i.e., change the image reference to a fixed tag
(preferred) or change imagePullPolicy from IfNotPresent to Always on the
Deployment spec that contains "image: openms-streamlit" so new pods always pull
the latest build.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@k8s/base/rq-worker-deployment.yaml`:
- Around line 16-49: Add pod- and container-level security contexts: under the
Pod spec set a podSecurityContext/seccompProfile (e.g., type: RuntimeDefault) to
enforce seccomp, and in the container entry for name: rq-worker add a
securityContext that enforces runAsNonRoot: true, allowPrivilegeEscalation:
false, privileged: false, readOnlyRootFilesystem: true and drop all capabilities
(capabilities.drop: ["ALL"]); ensure these keys are added alongside the existing
spec.containers and not inside args so the rq-worker container is restricted at
runtime.

In `@k8s/base/streamlit-deployment.yaml`:
- Around line 16-63: The Pod spec for the Streamlit container (container name
"streamlit" in the Deployment spec) lacks a securityContext; add a
podSecurityContext and container securityContext to harden privileges: set
podSecurityContext.fsGroup and runAsNonRoot/runAsUser, and on the container set
securityContext.runAsUser (e.g., 1000), runAsGroup, runAsNonRoot: true,
allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, drop all
capabilities, and add a seccompProfile (RuntimeDefault) and
readOnlyRootFilesystem where appropriate; ensure volume mounts (volumeMounts:
workspaces/config) remain writable if needed by adjusting fsGroup rather than
granting root.
- Around line 19-20: The deployment uses an untagged/mutable image ("image:
openms-streamlit") but keeps imagePullPolicy: IfNotPresent, which makes rollouts
non-deterministic; update the Streamlit Deployment (and align
rq-worker-deployment.yaml similarly) to either pin the image to an immutable tag
or set imagePullPolicy to Always for the current mutable/untagged image, i.e.,
change the image reference to a fixed tag (preferred) or change imagePullPolicy
from IfNotPresent to Always on the Deployment spec that contains "image:
openms-streamlit" so new pods always pull the latest build.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6454e3b0-21c3-4f57-b2cd-f57c0c9d6c1d

📥 Commits

Reviewing files that changed from the base of the PR and between aa28e91 and 9aa798b.

📒 Files selected for processing (4)
  • k8s/base/kustomization.yaml
  • k8s/base/rq-worker-deployment.yaml
  • k8s/base/streamlit-deployment.yaml
  • k8s/base/workspace-pvc.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • k8s/base/workspace-pvc.yaml
  • k8s/base/kustomization.yaml

The workspaces PVC uses ReadWriteOnce (Cinder CSI block storage) which
requires all pods mounting it to run on the same node. Without explicit
affinity rules, the scheduler was failing silently, leaving pods in
Pending state with no events.

Adds a `volume-group: workspaces` label and podAffinity with
requiredDuringSchedulingIgnoredDuringExecution to streamlit deployment,
rq-worker deployment, and cleanup cronjob. This ensures the scheduler
explicitly co-locates all workspace-consuming pods on the same node.
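
The co-location described above could look roughly like the following in each pod template (a sketch based on the commit description; the topologyKey and exact field placement are assumptions, not copied from the manifests):

      template:
        metadata:
          labels:
            volume-group: workspaces
        spec:
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchLabels:
                      volume-group: workspaces
                  topologyKey: kubernetes.io/hostname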

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
k8s/base/cleanup-cronjob.yaml (1)

19-39: ⚠️ Potential issue | 🟠 Major

Add an explicit security context and stop depending on /root for the runtime.

Lines 19-39 still run with the default pod/container privileges, and the command on Lines 35-39 ties the job to /root/miniforge3. That leaves the CronJob running as root by default today, and it also means a later runAsNonRoot change will fail unless the image is updated too.

🛡️ Manifest-side hardening
         spec:
+          securityContext:
+            seccompProfile:
+              type: RuntimeDefault
           restartPolicy: OnFailure
           affinity:
             podAffinity:
               requiredDuringSchedulingIgnoredDuringExecution:
@@
           containers:
             - name: cleanup
               image: openms-streamlit
               imagePullPolicy: IfNotPresent
+              securityContext:
+                allowPrivilegeEscalation: false
+                capabilities:
+                  drop: ["ALL"]
+                readOnlyRootFilesystem: true
               command: ["/bin/bash", "-c"]

After that, the image should be updated so the environment is no longer sourced from /root/miniforge3, then runAsNonRoot/runAsUser can be enforced here as well.
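
One way the image could be restructured (a hypothetical sketch; the base image, environment prefix, and UID are assumptions, not taken from the actual Dockerfile):

    # Install the conda environment in a non-root-owned prefix and drop root,
    # so the manifest can later enforce runAsNonRoot/runAsUser.
    FROM condaforge/miniforge3
    RUN useradd --uid 1000 --create-home app \
        && conda create -y -p /opt/envs/streamlit-env python=3.11 \
        && chown -R app:app /opt/envs
    USER app
    ENV PATH="/opt/envs/streamlit-env/bin:${PATH}"

With the environment on PATH, the CronJob command would no longer need to source anything under /root/miniforge3.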

#!/bin/bash
set -euo pipefail

echo "=== cleanup-cronjob securityContext ==="
sed -n '1,120p' k8s/base/cleanup-cronjob.yaml | nl -ba | sed -n '19,60p'

echo
echo "=== Dockerfile USER / miniforge references ==="
fd -HI '^Dockerfile$' -x sh -c '
  echo "--- {} ---"
  rg -n "^(FROM|USER|WORKDIR|ENV|RUN)|miniforge|conda" "{}" -C1 || true
'

Expected result: the manifest shows no securityContext, and the image build should confirm whether it still relies on /root/miniforge3 or lacks a non-root USER.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/cleanup-cronjob.yaml` around lines 19 - 39, The CronJob manifest is
missing a securityContext and still sources /root/miniforge3 in the container
args; add a pod-level securityContext and container.securityContext with
runAsNonRoot: true and runAsUser set to a non-root UID (e.g., 1000) and set
fsGroup as needed, and update the cleanup container spec (containers -> name:
cleanup, image, command/args) to stop sourcing /root/miniforge3 (remove the
"source /root/miniforge3/bin/activate streamlit-env" line and rely on the image
PATH or non-root home), then ensure the image is rebuilt/updated so that the
runtime does not depend on /root/miniforge3 allowing runAsUser/runAsNonRoot to
be enforced.
k8s/base/rq-worker-deployment.yaml (1)

17-31: ⚠️ Potential issue | 🟠 Major

Add baseline pod/container security context.

This Deployment still runs with the default security context. Add a pod-level seccompProfile: RuntimeDefault and container-level allowPrivilegeEscalation: false plus capabilities.drop: ["ALL"]; validate readOnlyRootFilesystem separately because the Miniforge/RQ process may need writes.

Proposed patch
     spec:
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       affinity:
         podAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
@@
       containers:
         - name: rq-worker
           image: openms-streamlit
           imagePullPolicy: IfNotPresent
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
           command: ["/bin/bash", "-c"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/rq-worker-deployment.yaml` around lines 17 - 31, Add a pod-level
seccompProfile and container-level privilege restrictions: under the Deployment
spec's pod template (near the existing spec/podAffinity block) add a
podSecurityContext with seccompProfile.type set to RuntimeDefault, and in the
rq-worker container spec (container name "rq-worker") set
securityContext.allowPrivilegeEscalation to false and
securityContext.capabilities.drop to ["ALL"]; also evaluate and, if compatible,
set securityContext.readOnlyRootFilesystem for the container (leave it unset if
the Miniforge/RQ process requires writes).
k8s/base/streamlit-deployment.yaml (1)

17-31: ⚠️ Potential issue | 🟠 Major

Add baseline pod/container security context.

This pod/container pair still uses the default security context. Please add pod-level seccompProfile: RuntimeDefault and container-level allowPrivilegeEscalation: false with capabilities.drop: ["ALL"]; only add readOnlyRootFilesystem after confirming Streamlit has writable runtime dirs elsewhere.

Proposed patch
     spec:
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       affinity:
         podAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
@@
       containers:
         - name: streamlit
           image: openms-streamlit
           imagePullPolicy: IfNotPresent
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
           command: ["/bin/bash", "-c"]
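
If readOnlyRootFilesystem is enabled later, Streamlit's writable runtime directories can be redirected to emptyDir volumes. A sketch (the mount paths are assumptions about what Streamlit needs at runtime, not taken from the manifest):

              securityContext:
                readOnlyRootFilesystem: true
              volumeMounts:
                - name: tmp
                  mountPath: /tmp
                - name: streamlit-home
                  mountPath: /home/app/.streamlit
          volumes:
            - name: tmp
              emptyDir: {}
            - name: streamlit-home
              emptyDir: {}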
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/streamlit-deployment.yaml` around lines 17 - 31, Add a pod-level
securityContext with seccompProfile.type set to RuntimeDefault under the same
spec that contains podAffinity, and in the containers list add a securityContext
for the streamlit container (name: streamlit) that sets
allowPrivilegeEscalation: false and capabilities.drop: ["ALL"]; do not add
readOnlyRootFilesystem yet until writable runtime dirs are confirmed. Locate the
pod spec around podAffinity and the containers entry for the streamlit container
and insert these securityContext fields accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@k8s/base/streamlit-deployment.yaml`:
- Line 8: The deployment currently sets replicas: 2 while the podAffinity +
shared workspace PVC pins pods to the same node, so change the configuration to
reflect true HA: either reduce replicas to 1 (update replicas: 2 → replicas: 1
in the Streamlit Deployment), or migrate the workspace PVC to
RWX/external/object storage and then remove or loosen the same-node podAffinity
(look for podAffinity/podAntiAffinity and the shared workspace PVC reference) so
multiple replicas can land on different nodes; ensure any change updates related
sections noted around lines 18-27 and 68-71 consistently.

---

Duplicate comments:
In `@k8s/base/cleanup-cronjob.yaml`:
- Around line 19-39: The CronJob manifest is missing a securityContext and still
sources /root/miniforge3 in the container args; add a pod-level securityContext
and container.securityContext with runAsNonRoot: true and runAsUser set to a
non-root UID (e.g., 1000) and set fsGroup as needed, and update the cleanup
container spec (containers -> name: cleanup, image, command/args) to stop
sourcing /root/miniforge3 (remove the "source /root/miniforge3/bin/activate
streamlit-env" line and rely on the image PATH or non-root home), then ensure
the image is rebuilt/updated so that the runtime does not depend on
/root/miniforge3 allowing runAsUser/runAsNonRoot to be enforced.

In `@k8s/base/rq-worker-deployment.yaml`:
- Around line 17-31: Add a pod-level seccompProfile and container-level
privilege restrictions: under the Deployment spec's pod template (near the
existing spec/podAffinity block) add a podSecurityContext with
seccompProfile.type set to RuntimeDefault, and in the rq-worker container spec
(container name "rq-worker") set securityContext.allowPrivilegeEscalation to
false and securityContext.capabilities.drop to ["ALL"]; also evaluate and, if
compatible, set securityContext.readOnlyRootFilesystem for the container (leave
it unset if the Miniforge/RQ process requires writes).

In `@k8s/base/streamlit-deployment.yaml`:
- Around line 17-31: Add a pod-level securityContext with seccompProfile.type
set to RuntimeDefault under the same spec that contains podAffinity, and in the
containers list add a securityContext for the streamlit container (name:
streamlit) that sets allowPrivilegeEscalation: false and capabilities.drop:
["ALL"]; do not add readOnlyRootFilesystem yet until writable runtime dirs are
confirmed. Locate the pod spec around podAffinity and the containers entry for
the streamlit container and insert these securityContext fields accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 37843ee5-fcbc-4f54-963a-c742b8c9f889

📥 Commits

Reviewing files that changed from the base of the PR and between 9aa798b and 0d90cf7.

📒 Files selected for processing (3)
  • k8s/base/cleanup-cronjob.yaml
  • k8s/base/rq-worker-deployment.yaml
  • k8s/base/streamlit-deployment.yaml

  labels:
    component: streamlit
spec:
  replicas: 2

⚠️ Potential issue | 🟠 Major

replicas: 2 does not provide node-level HA with this storage/scheduling model.

The required same-node pod affinity plus the shared workspace PVC means both Streamlit replicas are effectively pinned to one node. A node drain/failure still drops the whole frontend, so this replica count gives a misleading HA story unless the workspace storage moves to RWX or external/object storage.

Also applies to: 18-27, 68-71
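
If multi-node HA is the goal, the workspace claim would need an RWX-capable backend. A sketch of what that change could look like (the claim name is taken from the commit message; the StorageClass name is a placeholder and not available in every cluster):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: workspaces
    spec:
      accessModes:
        - ReadWriteMany   # was ReadWriteOnce (Cinder block storage)
      storageClassName: csi-cephfs   # placeholder: any RWX-capable class
      resources:
        requests:
          storage: 100Gi

With RWX storage in place, the same-node podAffinity could then be removed so the two replicas schedule on different nodes.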

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/streamlit-deployment.yaml` at line 8, The deployment currently sets
replicas: 2 while the podAffinity + shared workspace PVC pins pods to the same
node, so change the configuration to reflect true HA: either reduce replicas to
1 (update replicas: 2 → replicas: 1 in the Streamlit Deployment), or migrate the
workspace PVC to RWX/external/object storage and then remove or loosen the
same-node podAffinity (look for podAffinity/podAntiAffinity and the shared
workspace PVC reference) so multiple replicas can land on different nodes;
ensure any change updates related sections noted around lines 18-27 and 68-71
consistently.

claude added 2 commits March 11, 2026 17:46
The controller pod being Ready doesn't guarantee the admission webhook
service is accepting connections. Add a polling loop that waits for the
webhook endpoint to have an IP assigned before applying the Ingress
resource, preventing "connection refused" errors during kustomize apply.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
.github/workflows/k8s-manifests-ci.yml (1)

113-113: ⚠️ Potential issue | 🔴 Critical

Remove || true to prevent masking deployment failures.

The || true suffix swallows kubectl wait failures, allowing the integration test to pass even when deployments never become available. This contradicts the purpose of the verification step.

🐛 Proposed fix
-          kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s || true
+          kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/k8s-manifests-ci.yml at line 113, The CI step currently
appends "|| true" to the kubectl wait command (kubectl wait
--for=condition=available deployment -l app=template-app --timeout=120s ||
true), which masks deployment failures; remove the "|| true" suffix so that
kubectl wait exits with a non-zero status on failure and causes the job to fail,
ensuring deployment availability errors are detected by the workflow.
🧹 Nitpick comments (2)
.github/workflows/k8s-manifests-ci.yml (2)

17-20: Pin kubeconform to a specific version for reproducible CI builds.

Using /latest/ in the download URL means the workflow could break unexpectedly if a new kubeconform release introduces breaking changes or bugs.

♻️ Proposed fix to pin the version
      - name: Install kubeconform
        run: |
-          curl -sSL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz | tar xz
+          KUBECONFORM_VERSION="v0.6.7"
+          curl -sSL "https://github.com/yannh/kubeconform/releases/download/${KUBECONFORM_VERSION}/kubeconform-linux-amd64.tar.gz" | tar xz
          sudo mv kubeconform /usr/local/bin/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/k8s-manifests-ci.yml around lines 17 - 20, The CI step
that installs kubeconform (the "Install kubeconform" run block) currently
downloads from the /latest/ URL which is not reproducible; change it to pin a
specific release by defining and using a KUBECONFORM_VERSION variable (or
hardcode a version string) and update the curl URL to reference that version
(e.g., use the release tarball URL containing the version tag instead of
/latest/), and ensure the move (sudo mv kubeconform /usr/local/bin/) remains the
same so the pinned binary is installed.

80-80: Pin ingress-nginx to a stable versioned release instead of main branch.

Referencing the main branch can introduce instability if upstream makes breaking changes. Use the latest stable release, controller-v1.15.0, instead.

♻️ Proposed fix
-          kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
+          kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.15.0/deploy/static/provider/kind/deploy.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/k8s-manifests-ci.yml at line 80, Update the kubectl apply
invocation that currently points at the ingress-nginx main branch to a fixed,
stable release URL: replace the URL
"https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml"
with the tagged release URL for controller-v1.15.0 (e.g.,
"https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.15.0/deploy/static/provider/kind/deploy.yaml")
so the workflow always installs the specified stable ingress-nginx release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/k8s-manifests-ci.yml:
- Around line 77-91: The ingress webhook readiness loop in the "Install nginx
ingress controller" step can silently time out; update that step so the script
tracks whether the webhook became ready (e.g., set a READY flag inside the
for-loop when the endpoint check succeeds) and after the loop check the flag
and, if not set, print a clear error message and exit non-zero (exit 1) to fail
the workflow rather than proceeding.

---

Duplicate comments:
In @.github/workflows/k8s-manifests-ci.yml:
- Line 113: The CI step currently appends "|| true" to the kubectl wait command
(kubectl wait --for=condition=available deployment -l app=template-app
--timeout=120s || true), which masks deployment failures; remove the "|| true"
suffix so that kubectl wait exits with a non-zero status on failure and causes
the job to fail, ensuring deployment availability errors are detected by the
workflow.

---

Nitpick comments:
In @.github/workflows/k8s-manifests-ci.yml:
- Around line 17-20: The CI step that installs kubeconform (the "Install
kubeconform" run block) currently downloads from the /latest/ URL which is not
reproducible; change it to pin a specific release by defining and using a
KUBECONFORM_VERSION variable (or hardcode a version string) and update the curl
URL to reference that version (e.g., use the release tarball URL containing the
version tag instead of /latest/), and ensure the move (sudo mv kubeconform
/usr/local/bin/) remains the same so the pinned binary is installed.
- Line 80: Update the kubectl apply invocation that currently points at the
ingress-nginx main branch to a fixed, stable release URL: replace the URL
"https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml"
with the tagged release URL for controller-v1.15.0 (e.g.,
"https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.15.0/deploy/static/provider/kind/deploy.yaml")
so the workflow always installs the specified stable ingress-nginx release.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 95cfa9b5-b2f5-49bc-b1f6-80489bb73742

📥 Commits

Reviewing files that changed from the base of the PR and between 0d90cf7 and ed3075b.

📒 Files selected for processing (1)
  • .github/workflows/k8s-manifests-ci.yml

Comment on lines +77 to +91
      - name: Install nginx ingress controller
        if: steps.check.outputs.exists == 'true'
        run: |
          kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
          kubectl wait --namespace ingress-nginx --for=condition=ready pod --selector=app.kubernetes.io/component=controller --timeout=90s
          # Wait for the admission webhook to be ready to accept connections
          echo "Waiting for ingress-nginx admission webhook..."
          for i in $(seq 1 30); do
            if kubectl get endpoints -n ingress-nginx ingress-nginx-controller-admission -o jsonpath='{.subsets[0].addresses[0].ip}' 2>/dev/null | grep -q .; then
              echo "Webhook endpoint ready"
              break
            fi
            echo "  attempt $i/30 - waiting..."
            sleep 2
          done

⚠️ Potential issue | 🟡 Minor

Ingress webhook readiness loop silently continues on timeout.

If the webhook endpoint never becomes ready within 30 attempts, the loop exits without an error, allowing the workflow to proceed. This can cause cryptic connection refused errors later during kubectl apply.

🐛 Proposed fix to fail explicitly on timeout
          # Wait for the admission webhook to be ready to accept connections
          echo "Waiting for ingress-nginx admission webhook..."
          for i in $(seq 1 30); do
            if kubectl get endpoints -n ingress-nginx ingress-nginx-controller-admission -o jsonpath='{.subsets[0].addresses[0].ip}' 2>/dev/null | grep -q .; then
              echo "Webhook endpoint ready"
              break
            fi
            echo "  attempt $i/30 - waiting..."
            sleep 2
+           if [ "$i" -eq 30 ]; then
+             echo "ERROR: Webhook endpoint did not become ready in time"
+             exit 1
+           fi
          done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/k8s-manifests-ci.yml around lines 77 - 91, The ingress
webhook readiness loop in the "Install nginx ingress controller" step can
silently time out; update that step so the script tracks whether the webhook
became ready (e.g., set a READY flag inside the for-loop when the endpoint check
succeeds) and after the loop check the flag and, if not set, print a clear error
message and exit non-zero (exit 1) to fail the workflow rather than proceeding.

The kustomize overlay deploys into the openms namespace, but the
verification steps (Redis wait, Redis ping, deployment checks) were
querying the default namespace, causing "no matching resources found".

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (2)
.github/workflows/k8s-manifests-ci.yml (2)

77-91: ⚠️ Potential issue | 🟡 Minor

Ingress webhook readiness loop silently continues on timeout.

If the webhook endpoint never becomes ready within 30 attempts, the loop exits without an error, allowing the workflow to proceed. This can cause cryptic connection refused errors later during kubectl apply.

🐛 Proposed fix to fail explicitly on timeout
          # Wait for the admission webhook to be ready to accept connections
          echo "Waiting for ingress-nginx admission webhook..."
+         WEBHOOK_READY=false
          for i in $(seq 1 30); do
            if kubectl get endpoints -n ingress-nginx ingress-nginx-controller-admission -o jsonpath='{.subsets[0].addresses[0].ip}' 2>/dev/null | grep -q .; then
              echo "Webhook endpoint ready"
+             WEBHOOK_READY=true
              break
            fi
            echo "  attempt $i/30 - waiting..."
            sleep 2
          done
+         if [ "$WEBHOOK_READY" != "true" ]; then
+           echo "ERROR: Webhook endpoint did not become ready in time"
+           exit 1
+         fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/k8s-manifests-ci.yml around lines 77 - 91, The readiness
loop in the "Install nginx ingress controller" step that checks the endpoints
for ingress-nginx-controller-admission can silently time out; update the loop to
detect timeout and fail the job explicitly (e.g., set a flag like webhook_ready
when the grep succeeds and after the for-loop check that flag and call exit 1 or
fail the step with a descriptive error message if not ready) so
the workflow does not continue when the admission webhook never becomes
available.

110-115: ⚠️ Potential issue | 🔴 Critical

Remove || true to prevent masking deployment failures in tests.

Line 113 uses || true, which swallows kubectl wait failures and allows the integration test to pass even when deployments never become available. The step is named "Verify all deployments are available," but the current implementation only reports status without enforcing it.

🐛 Proposed fix
-          kubectl wait -n openms --for=condition=available deployment -l app=template-app --timeout=120s || true
+          kubectl wait -n openms --for=condition=available deployment -l app=template-app --timeout=120s
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/k8s-manifests-ci.yml around lines 110 - 115, The CI step
named "Verify all deployments are available" is masking failures by appending
"|| true" to the kubectl wait command; remove the "|| true" so that a non-zero
exit from "kubectl wait -n openms --for=condition=available deployment -l
app=template-app --timeout=120s" fails the job, ensuring deployment
unavailability causes the workflow to fail and surface the error (leave the
subsequent kubectl get pods/services lines unchanged so they still run when the
wait succeeds).
🧹 Nitpick comments (1)
.github/workflows/k8s-manifests-ci.yml (1)

17-20: Pin kubeconform to a specific version for reproducible builds.

Using /releases/latest/ makes CI non-deterministic. A breaking change in kubeconform could cause unexpected failures unrelated to your changes. Pin to the latest stable version.

♻️ Proposed fix to pin version
      - name: Install kubeconform
        run: |
-          curl -sSL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz | tar xz
+          KUBECONFORM_VERSION="v0.7.0"
+          curl -sSL "https://github.com/yannh/kubeconform/releases/download/${KUBECONFORM_VERSION}/kubeconform-linux-amd64.tar.gz" | tar xz
          sudo mv kubeconform /usr/local/bin/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/k8s-manifests-ci.yml around lines 17 - 20, The CI step
named "Install kubeconform" currently downloads from /releases/latest/, making
builds non-deterministic; change it to pin kubeconform to a specific release tag
by introducing a fixed version variable (e.g., KUBECONFORM_VERSION) and using
that tag in the download URL instead of "latest", update the curl URL used in
that step and the related move operation to use the pinned version so the
workflow always installs the exact kubeconform release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @.github/workflows/k8s-manifests-ci.yml:
- Around line 77-91: The readiness loop in the "Install nginx ingress
controller" step that checks the endpoints for
ingress-nginx-controller-admission can silently time out; update the loop to
detect timeout and fail the job explicitly (e.g., set a flag like webhook_ready
when the grep succeeds and after the for-loop check that flag and call exit 1 or
fail the step with a descriptive error message if not ready) so
the workflow does not continue when the admission webhook never becomes
available.
- Around line 110-115: The CI step named "Verify all deployments are available"
is masking failures by appending "|| true" to the kubectl wait command; remove
the "|| true" so that a non-zero exit from "kubectl wait -n openms
--for=condition=available deployment -l app=template-app --timeout=120s" fails
the job, ensuring deployment unavailability causes the workflow to fail and
surface the error (leave the subsequent kubectl get pods/services lines
unchanged so they still run when the wait succeeds).

---

Nitpick comments:
In @.github/workflows/k8s-manifests-ci.yml:
- Around line 17-20: The CI step named "Install kubeconform" currently downloads
from /releases/latest/, making builds non-deterministic; change it to pin
kubeconform to a specific release tag by introducing a fixed version variable
(e.g., KUBECONFORM_VERSION) and using that tag in the download URL instead of
"latest", update the curl URL used in that step and the related move operation
to use the pinned version so the workflow always installs the exact kubeconform
release.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 200a528f-223c-4953-8563-e417d1703f4f

📥 Commits

Reviewing files that changed from the base of the PR and between ed3075b and 6036d14.

📒 Files selected for processing (1)
  • .github/workflows/k8s-manifests-ci.yml
