Add Kubernetes manifests and CI/CD workflows for deployment #347
t0mdavid-m wants to merge 8 commits into `main` from
Conversation
Decompose the monolithic Docker container into Kubernetes workloads:

- Streamlit Deployment with health probes and session affinity
- Redis Deployment + Service for job queue
- RQ Worker Deployment for background workflows
- CronJob for workspace cleanup
- Ingress with WebSocket support and cookie-based sticky sessions
- Shared PVC (ReadWriteMany) for workspace data
- ConfigMap for runtime configuration (replaces build-time settings)
- Kustomize base + template-app overlay for multi-app deployment

Code changes:

- Remove unsafe `enableCORS=false` and `enableXsrfProtection=false` from config.toml
- Make workspace path configurable via `WORKSPACES_DIR` env var in clean-up-workspaces.py

CI/CD:

- Add build-and-push-image.yml to push Docker images to ghcr.io
- Add k8s-manifests-ci.yml for manifest validation and kind integration tests

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
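As a rough sketch of the overlay layout described above (the paths and app name follow this PR's `k8s/base` + `k8s/overlays/template-app` structure; the exact field values are illustrative, not the merged manifests):

```yaml
# k8s/overlays/template-app/kustomization.yaml (illustrative sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: template-app

resources:
  - ../../base

images:
  - name: openms-streamlit
    newName: ghcr.io/openms/streamlit-template
    newTag: main
```

Each additional app would get its own overlay directory reusing the same base while overriding namespace, image, and ingress host.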
kustomization.yaml is a Kustomize config file, not a standard K8s resource, so kubeconform has no schema for it. Exclude it via -ignore-filename-pattern. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
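A hedged sketch of what that validation step could look like (`-strict`, `-summary`, and `-ignore-filename-pattern` are real kubeconform flags; the step name and exact invocation are illustrative):

```yaml
# Illustrative CI step: validate manifests while skipping Kustomize's own config file
- name: Validate manifests with kubeconform
  run: |
    kubeconform -strict -summary \
      -ignore-filename-pattern 'kustomization.yaml' \
      k8s/
```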
Note: Reviews paused. It looks like this branch is under active development; to avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the CodeRabbit settings.
📝 Walkthrough

Adds GitHub Actions workflows for building/pushing container images and validating/integration-testing Kubernetes manifests; introduces k8s base/overlay resources (Namespace, ConfigMap, PVC, Deployments, Service, Ingress, Redis, CronJob); makes the workspace path configurable and removes two Streamlit server settings.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant GH as "GitHub Actions"
    participant Repo as "Repository"
    participant Registry as "GHCR"
    participant Kind as "kind Cluster"
    participant K8s as "Kubernetes (in-kind)"
    participant Redis as "Redis Pod"
    GH->>Repo: checkout code
    GH->>Registry: docker/metadata-action (tags/labels)
    GH->>Registry: docker/login (ghcr)
    GH->>Registry: docker/build-push-action (build & push image)
    Registry-->>GH: image pushed
    GH->>Repo: run k8s-manifests-ci (validate-manifests)
    Repo->>Repo: kustomize build
    Repo->>Repo: kubeconform validation
    GH->>Kind: create cluster
    GH->>Kind: load built image
    GH->>K8s: apply kustomize output (imagePullPolicy=Never)
    K8s->>Redis: start Redis pod
    Redis-->>K8s: readiness OK
    K8s->>K8s: deploy streamlit & rq-worker
    K8s-->>GH: report app & Redis readiness
```
🚥 Pre-merge checks: ✅ 3 of 3 passed
The integration-test job now uses a matrix with Dockerfile_simple and Dockerfile. Each matrix entry checks if its Dockerfile exists before running — all steps are guarded with an `if` condition so they skip gracefully when a Dockerfile is absent. This allows downstream forks that only have one Dockerfile to pass CI without errors. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
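A sketch of that matrix-plus-guard pattern (job and step names are illustrative; the `steps.check.outputs.exists` expression matches the guard shown later in this review, and `$GITHUB_OUTPUT` is the standard GitHub Actions output mechanism):

```yaml
jobs:
  integration-test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        dockerfile: [Dockerfile_simple, Dockerfile]
    steps:
      - uses: actions/checkout@v4
      - name: Check Dockerfile exists
        id: check
        run: |
          if [ -f "${{ matrix.dockerfile }}" ]; then
            echo "exists=true" >> "$GITHUB_OUTPUT"
          else
            echo "exists=false" >> "$GITHUB_OUTPUT"
          fi
      - name: Build image
        if: steps.check.outputs.exists == 'true'
        run: docker build -f "${{ matrix.dockerfile }}" -t app:test .
```

Every subsequent step carries the same `if:` guard, so a fork missing one Dockerfile skips that matrix entry cleanly instead of failing.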
Actionable comments posted: 12
🧹 Nitpick comments (3)
k8s/overlays/template-app/kustomization.yaml (1)

14-15: Prefer immutable image pinning over `main`.

Using a mutable tag makes rollbacks and provenance harder; use a commit SHA or digest in the overlay/environment patch.

🔧 Example adjustment

```diff
- newTag: main
+ newTag: sha-<commit-sha>
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `k8s/overlays/template-app/kustomization.yaml` around lines 14-15: the overlay sets `newTag: main`, which is mutable; pin the image immutably by replacing `newTag: main` with a stable identifier (preferably the commit SHA or image digest) for ghcr.io/openms/streamlit-template so deployments are reproducible. Locate the newTag entry in kustomization.yaml and substitute the mutable tag with either a specific SHA tag (e.g., `sha-<commit-sha>`) or the image digest (`sha256:...`), and ensure any CI that updates this overlay writes the immutable value automatically.

k8s/base/configmap.yaml (1)

12-15: The ConfigMap queue_settings are not consumed at runtime and should be removed or connected to the application configuration.

The `timeout` (7200) and `result_ttl` (86400) parameters are hardcoded in src/workflow/QueueManager.py (lines 105-112) and used when calling `queue.enqueue()` in src/workflow/WorkflowManager.py (line 63). The ConfigMap values are never read from configuration, so changing them has no effect on the application. Either connect the ConfigMap to load these settings dynamically or remove the ConfigMap entries to avoid confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `k8s/base/configmap.yaml` around lines 12-15: the ConfigMap "queue_settings" keys are not used at runtime. Either replace the hardcoded timeout and result_ttl in src/workflow/QueueManager.py by loading those values from the application configuration (the ConfigMap), expose them via the QueueManager instance, and update src/workflow/WorkflowManager.py to read them from QueueManager before calling `queue.enqueue()`; or, if you prefer removal, delete the "queue_settings" entries from the ConfigMap to avoid confusion and keep the hardcoded defaults as-is.

.github/workflows/k8s-manifests-ci.yml (1)

80-81: Pin the ingress-nginx manifest to the stable release tag `controller-v1.14.3`.

Replace `main` with `controller-v1.14.3` on line 80 to ensure reproducible CI runs:

Suggested change

```diff
- kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
+ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.14.3/deploy/static/provider/kind/deploy.yaml
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In .github/workflows/k8s-manifests-ci.yml around lines 80-81: the ingress-nginx manifest URL in the kubectl apply command should be pinned to a stable release tag for reproducible CI; replace "main" with "controller-v1.14.3" in that kubectl apply invocation so the workflow installs the pinned controller-v1.14.3 manifest.
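One way to automate writing the immutable tag in CI (an illustrative sketch; `kustomize edit set image` is a real kustomize subcommand, while the step wiring and `sha-` tag convention are assumptions):

```yaml
# Hypothetical CI step: pin the overlay image to the current commit SHA
- name: Pin overlay image to commit SHA
  run: |
    cd k8s/overlays/template-app
    kustomize edit set image \
      ghcr.io/openms/streamlit-template:sha-${GITHUB_SHA::7}
```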
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/build-and-push-image.yml:
- Around line 48-49: Replace the insecure use of build-args GITHUB_TOKEN in the
GitHub Actions workflow and Dockerfiles: stop passing GITHUB_TOKEN via
build-args (symbol: build-args: GITHUB_TOKEN) and instead configure BuildKit
secret mounts in the workflow (use --secret id=github_token) and update
Dockerfile/Dockerfile_simple to remove ARG GITHUB_TOKEN and ENV GH_TOKEN and use
a RUN step that reads /run/secrets/github_token (use
--mount=type=secret,id=github_token) to populate GH_TOKEN at build time and
conditionally run gh release download only when the secret is present; ensure
the workflow supplies the repo secret as the BuildKit secret rather than a
build-arg.
In @.github/workflows/k8s-manifests-ci.yml:
- Line 103: The CI step "Verify all deployments are available" currently appends
"|| true" to the kubectl wait command ("kubectl wait --for=condition=available
deployment -l app=template-app --timeout=120s || true"), which masks failures;
remove the "|| true" so the step fails on non-zero exit and the workflow
correctly fails when deployments do not become available.
In `@clean-up-workspaces.py`:
- Line 9: The code sets workspaces_directory from WORKSPACES_DIR without
validation; add checks before any recursive deletion: ensure the env var is
non-empty, resolve the Path (workspaces_directory.resolve()), verify it exists
and is a directory, and confirm it is within an allowed base/whitelist (e.g.,
startswith or is_relative_to a known safe base like
"/workspaces-streamlit-template") and not a top-level/root/home path; if any
check fails, log an error and abort the delete operation that uses
workspaces_directory.
In `@k8s/base/cleanup-cronjob.yaml`:
- Around line 18-45: The cleanup job runs as root and sources miniforge from
/root, so add pod- and container-level securityContext to enforce non-root
execution: set spec.securityContext with runAsNonRoot: true and fsGroup (e.g.,
1000) and add container.securityContext with runAsUser set to a non-root uid,
readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, and
capabilities.drop: ["ALL"] on the container named "cleanup" (image
openms-streamlit) so the job cannot escalate privileges; also update the
image/Dockerfile to create and switch to a non-root user and relocate Miniforge
out of /root (adjust the container command/args to source the new path) so
runAsNonRoot succeeds. Ensure the WORKSPACES_DIR and volumeMount remain writable
by the chosen fsGroup/runAsUser.
In `@k8s/base/configmap.yaml`:
- Line 11: The ConfigMap key "workspaces_dir" is set to a relative path ("..")
which can escape the container workspace; update the "workspaces_dir" value in
k8s/base/configmap.yaml to the exact absolute path used as the PVC mount in the
Pod/Deployment (i.e., the same mountPath configured on the container), ensuring
the key matches the mounted PVC path so the app uses the intended in-container
workspace.
In `@k8s/base/ingress.yaml`:
- Around line 20-32: Ingress currently exposes host streamlit.openms.example.de
without TLS; update the Ingress spec (ingressClassName: nginx, host:
streamlit.openms.example.de, backend service: name streamlit, port number 8501)
to add a tls section listing the host and a secretName (e.g., streamlit-tls) and
ensure that the corresponding TLS secret exists in the cluster (or is created
via cert-manager); optionally add any nginx ingress annotations required for
TLS/redirects, but the minimal fix is adding spec.tls with hosts:
[streamlit.openms.example.de] and secretName: <your-tls-secret>.
- Around line 16-17: Replace the unlimited body-size annotation by setting a
bounded value: change the nginx.ingress.kubernetes.io/proxy-body-size annotation
(currently "0") to a reasonable maximum such as "100m" (or another value
appropriate for expected file sizes) and update the comment that references file
upload limits; ensure the ingress uses the new annotated value so NGINX enforces
a finite upload limit to mitigate DoS/resource exhaustion risks.
In `@k8s/base/redis.yaml`:
- Around line 16-38: Add a pod and container securityContext to the Redis
manifest to enforce non-root execution and drop privileges: set pod-level
podSecurityContext (e.g., runAsNonRoot: true, runAsUser: 1000) and in the redis
container's securityContext set runAsNonRoot: true, runAsUser: 1000,
allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, and capabilities:
drop: ["ALL"]; also add writable emptyDir volumes and mounts for /data and /tmp
(volume names referenced from spec.volumes and mounted into the redis container)
so Redis can start with a read-only root filesystem; ensure the container name
"redis" and readinessProbe/livenessProbe blocks remain unchanged.
In `@k8s/base/rq-worker-deployment.yaml`:
- Around line 16-49: The rq-worker container and pod run without security
hardening; update the Pod spec to set a pod-level seccompProfile: RuntimeDefault
and update the rq-worker container spec (container name "rq-worker") to include
allowPrivilegeEscalation: false and capabilities with drop: ["ALL"]; optionally
evaluate readOnlyRootFilesystem: true on the "rq-worker" container only after
verifying the image (conda at /root/miniforge3) operates correctly without
writes.
In `@k8s/base/streamlit-deployment.yaml`:
- Around line 19-20: Update the deployment so image updates are reliably pulled:
change the imagePullPolicy from IfNotPresent to Always for the container that
uses image: openms-streamlit (referencing the imagePullPolicy field in the
Streamlit Deployment), or alternatively make the image tag immutable by
replacing newTag: main with a specific version/sha in the overlay so deployments
are deterministic; implement one of these two fixes and ensure imagePullPolicy
and newTag values are consistent across kustomization overlays and the streamlit
deployment manifest.
- Around line 16-63: The Streamlit container lacks security hardening: update
the container spec for the container named "streamlit" to add a securityContext
that drops all Linux capabilities (capabilities.drop: ["ALL"]) and sets a
seccompProfile (type: RuntimeDefault) to reduce attack surface; also consider
adding a pod-level securityContext (runAsNonRoot: true and runAsUser if
appropriate) to the Deployment spec and ensure these changes are applied
alongside existing volume mounts/readinessProbe/livenessProbe so Streamlit still
has writable runtime dirs (keep current workspaces and config mounts).
In `@k8s/base/workspace-pvc.yaml`:
- Around line 6-10: The PVC currently requests ReadWriteMany (accessModes:
ReadWriteMany) with resources.requests.storage: 100Gi but lacks a
storageClassName, which can prevent binding on clusters whose default class is
RWO-only; add a storageClassName set to an RWX-capable class (or alternatively
add per-environment overlay patches that inject storageClassName) so the PVC
binds correctly — update the PVC manifest where accessModes and resources are
defined to include storageClassName with the appropriate RWX class name.
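The two ingress findings above (adding `spec.tls` and bounding `proxy-body-size`) can be sketched together; the host, secret name, and `100m` limit follow the review's own examples and are illustrative, not values from the PR:

```yaml
# Illustrative Ingress fragment combining the TLS and upload-limit fixes
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: streamlit
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"  # bounded, not "0"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - streamlit.openms.example.de
      secretName: streamlit-tls  # must exist, e.g. issued by cert-manager
  rules:
    - host: streamlit.openms.example.de
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: streamlit
                port:
                  number: 8501
```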
---
Nitpick comments:
In @.github/workflows/k8s-manifests-ci.yml:
- Around line 80-81: The ingress-nginx manifest URL used in the kubectl apply
command should be pinned to a stable release tag for reproducible CI: update the
URL in the kubectl apply line that currently references
"https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml"
to use the release tag "controller-v1.14.3" instead of "main" (i.e., replace
"main" with "controller-v1.14.3" in that kubectl apply invocation so the
workflow installs the pinned controller-v1.14.3 manifest).
In `@k8s/base/configmap.yaml`:
- Around line 12-15: The ConfigMap "queue_settings" keys are not used at
runtime—replace the hardcoded timeout and result_ttl in
src/workflow/QueueManager.py (where QueueManager constructs defaults) by loading
those values from the application configuration (the ConfigMap) and expose them
via the QueueManager instance, and update src/workflow/WorkflowManager.py to
read the values from QueueManager before calling queue.enqueue() (instead of
using the literals currently passed to queue.enqueue()); alternatively, if you
prefer removal, delete the "queue_settings" entries from the ConfigMap to avoid
confusion and keep the hardcoded defaults as-is. Ensure you reference the
QueueManager class and the call site queue.enqueue in WorkflowManager so the
runtime value source is consistent.
In `@k8s/overlays/template-app/kustomization.yaml`:
- Around line 14-15: The overlay sets newTag: main which is mutable; update
kustomization.yaml to pin the image immutably by replacing newTag: main with a
stable identifier (preferably the commit SHA or image digest) for
ghcr.io/openms/streamlit-template so deployments are reproducible; locate the
newTag entry in the kustomization.yaml and substitute the mutable tag with
either the specific SHA tag (e.g., v1.2.3-or-<commit-sha>) or the image digest
(sha256:...) for ghcr.io/openms/streamlit-template, and ensure any CI that
updates this overlay writes the immutable value automatically.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e1a9caca-9f14-4c35-9abd-1a0fc0006988
📒 Files selected for processing (15)
- .github/workflows/build-and-push-image.yml
- .github/workflows/k8s-manifests-ci.yml
- .streamlit/config.toml
- clean-up-workspaces.py
- k8s/base/cleanup-cronjob.yaml
- k8s/base/configmap.yaml
- k8s/base/ingress.yaml
- k8s/base/kustomization.yaml
- k8s/base/namespace.yaml
- k8s/base/redis.yaml
- k8s/base/rq-worker-deployment.yaml
- k8s/base/streamlit-deployment.yaml
- k8s/base/streamlit-service.yaml
- k8s/base/workspace-pvc.yaml
- k8s/overlays/template-app/kustomization.yaml
💤 Files with no reviewable changes (1)
- .streamlit/config.toml
```yaml
      build-args: |
        GITHUB_TOKEN=${{ secrets.GITHUB_TOKEN }}
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Find and examine Dockerfiles and their usage of GITHUB_TOKEN
fd -HI '^Dockerfile' | head -20
```

Repository: OpenMS/streamlit-template
Length of output: 97

🏁 Script executed:

```shell
# Check for GITHUB_TOKEN references in Dockerfiles
rg -n 'ARG\s+GITHUB_TOKEN|ENV\s+GITHUB_TOKEN|--mount=type=secret|GITHUB_TOKEN' --type dockerfile
```

Repository: OpenMS/streamlit-template
Length of output: 103

🏁 Script executed:

```shell
# Check the workflow file for build-args and secrets usage
rg -n 'build-args:|secrets:' .github/workflows/build-and-push-image.yml -A 3 -B 1
```

Repository: OpenMS/streamlit-template
Length of output: 205

🏁 Script executed:

```shell
# Get context around lines 48-49 in the workflow file
cat -n .github/workflows/build-and-push-image.yml | sed -n '40,60p'
```

Repository: OpenMS/streamlit-template
Length of output: 487

🏁 Script executed:

```shell
# Check Dockerfile_simple
cat -n Dockerfile_simple
```

Repository: OpenMS/streamlit-template
Length of output: 8542

🏁 Script executed:

```shell
# Check regular Dockerfile
cat -n Dockerfile
```

Repository: OpenMS/streamlit-template
Length of output: 12981

🏁 Script executed:

```shell
# Search for GITHUB_TOKEN references in all Dockerfiles
rg 'GITHUB_TOKEN' -n
```

Repository: OpenMS/streamlit-template
Length of output: 901
Use Docker BuildKit secret mounts instead of build-args for GITHUB_TOKEN.
Lines 48-49 pass the credential through build-args, which embeds it in image layers and cache. Both Dockerfile_simple (line 15) and Dockerfile (line 15) convert the argument to an environment variable that persists in the final image.
Since the token is only needed during build time, use --mount=type=secret with BuildKit to prevent the credential from being written to any image layers.
🔐 Proposed workflow change
```diff
  - name: Build and push Docker image
    uses: docker/build-push-action@v5
    with:
      context: .
      file: Dockerfile_simple
      push: true
      tags: ${{ steps.meta.outputs.tags }}
      labels: ${{ steps.meta.outputs.labels }}
-     build-args: |
-       GITHUB_TOKEN=${{ secrets.GITHUB_TOKEN }}
+     secrets: |
+       github_token=${{ secrets.GITHUB_TOKEN }}
```

Update the Dockerfile to use the secret mount instead of ARG/ENV:

```diff
-ARG GITHUB_TOKEN
-ENV GH_TOKEN=${GITHUB_TOKEN}
+RUN --mount=type=secret,id=github_token \
+    GH_TOKEN=$(cat /run/secrets/github_token) && \
+    if [ -n "$GH_TOKEN" ]; then \
+        gh release download -R ${GITHUB_USER}/${GITHUB_REPO} -p "OpenMS-App.zip" -D /app; \
+    fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/build-and-push-image.yml around lines 48 - 49, Replace the
insecure use of build-args GITHUB_TOKEN in the GitHub Actions workflow and
Dockerfiles: stop passing GITHUB_TOKEN via build-args (symbol: build-args:
GITHUB_TOKEN) and instead configure BuildKit secret mounts in the workflow (use
--secret id=github_token) and update Dockerfile/Dockerfile_simple to remove ARG
GITHUB_TOKEN and ENV GH_TOKEN and use a RUN step that reads
/run/secrets/github_token (use --mount=type=secret,id=github_token) to populate
GH_TOKEN at build time and conditionally run gh release download only when the
secret is present; ensure the workflow supplies the repo secret as the BuildKit
secret rather than a build-arg.
```yaml
      - name: Verify all deployments are available
        if: steps.check.outputs.exists == 'true'
        run: |
          kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s || true
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, locate the workflow file and check its structure
fd "k8s-manifests-ci.yml" .github/workflows/
```

Repository: OpenMS/streamlit-template
Length of output: 107

🏁 Script executed:

```shell
# Check the specific line and surrounding context
sed -n '95,110p' .github/workflows/k8s-manifests-ci.yml | cat -n
```

Repository: OpenMS/streamlit-template
Length of output: 694

🏁 Script executed:

```shell
# Look for all instances of kubectl wait and || true patterns in workflows
rg -n "kubectl wait" .github/workflows/
```

Repository: OpenMS/streamlit-template
Length of output: 536
Remove || true to prevent masking deployment failures in tests.
Line 103 uses || true, which swallows kubectl wait failures and allows the integration test to pass even when deployments never become available. This contradicts other similar verification steps in the workflow (lines 81, 93) that properly fail when conditions aren't met. The step is named "Verify all deployments are available," but the current implementation only reports status without enforcing it.
Proposed fix
```diff
- kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s || true
+ kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```yaml
          kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/k8s-manifests-ci.yml at line 103, The CI step "Verify all
deployments are available" currently appends "|| true" to the kubectl wait
command ("kubectl wait --for=condition=available deployment -l app=template-app
--timeout=120s || true"), which masks failures; remove the "|| true" so the step
fails on non-zero exit and the workflow correctly fails when deployments do not
become available.
```diff
 # Define the workspaces directory
-workspaces_directory = Path("/workspaces-streamlit-template")
+workspaces_directory = Path(os.environ.get("WORKSPACES_DIR", "/workspaces-streamlit-template"))
```
Validate WORKSPACES_DIR before using it for recursive deletion.
If WORKSPACES_DIR is empty/invalid, this can target unintended directories and delete data.
🛡️ Proposed safety guard
```diff
-workspaces_directory = Path(os.environ.get("WORKSPACES_DIR", "/workspaces-streamlit-template"))
+raw_workspaces_dir = os.environ.get("WORKSPACES_DIR", "/workspaces-streamlit-template").strip()
+if not raw_workspaces_dir:
+    raise ValueError("WORKSPACES_DIR cannot be empty")
+workspaces_directory = Path(raw_workspaces_dir).resolve()
+if not workspaces_directory.exists() or not workspaces_directory.is_dir():
+    raise FileNotFoundError(f"WORKSPACES_DIR is invalid: {workspaces_directory}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@clean-up-workspaces.py` at line 9, The code sets workspaces_directory from
WORKSPACES_DIR without validation; add checks before any recursive deletion:
ensure the env var is non-empty, resolve the Path
(workspaces_directory.resolve()), verify it exists and is a directory, and
confirm it is within an allowed base/whitelist (e.g., startswith or
is_relative_to a known safe base like "/workspaces-streamlit-template") and not
a top-level/root/home path; if any check fails, log an error and abort the
delete operation that uses workspaces_directory.
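A runnable sketch of the whitelist check this prompt describes (the function name and the `allowed_base` parameter are hypothetical; `Path.is_relative_to` requires Python 3.9+):

```python
from pathlib import Path


def resolve_workspaces_dir(env_value, allowed_base="/workspaces-streamlit-template"):
    """Validate a candidate workspaces dir before any recursive deletion.

    Raises ValueError when the candidate is empty, escapes the allowed
    base, or targets the filesystem root / home directory.
    """
    raw = (env_value or "").strip()
    if not raw:
        raise ValueError("WORKSPACES_DIR must be non-empty")
    candidate = Path(raw).resolve()
    base = Path(allowed_base).resolve()
    # Reject anything outside the allowed base (covers ".." escapes after resolve()).
    if not candidate.is_relative_to(base):
        raise ValueError(f"{candidate} is outside allowed base {base}")
    # Belt-and-braces: never operate on the filesystem root or the home directory.
    if candidate == Path("/") or candidate == Path.home():
        raise ValueError("refusing to operate on root or home directory")
    return candidate
```

The cleanup script would call this once at startup and abort on the exception rather than proceeding to delete anything.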
```yaml
    spec:
      restartPolicy: OnFailure
      containers:
        - name: cleanup
          image: openms-streamlit
          imagePullPolicy: IfNotPresent
          command: ["/bin/bash", "-c"]
          args:
            - |
              source /root/miniforge3/bin/activate streamlit-env
              exec python clean-up-workspaces.py
          env:
            - name: WORKSPACES_DIR
              value: "/workspaces-streamlit-template"
          volumeMounts:
            - name: workspaces
              mountPath: /workspaces-streamlit-template
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
      volumes:
        - name: workspaces
          persistentVolumeClaim:
            claimName: workspaces-pvc
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check current security context in cleanup-cronjob.yaml
echo "=== Current state in cleanup-cronjob.yaml ==="
rg -n 'securityContext|allowPrivilegeEscalation|readOnlyRootFilesystem|runAsNonRoot|seccompProfile' k8s/base/cleanup-cronjob.yaml -C2

# Find and examine Dockerfile
echo ""
echo "=== Searching for Dockerfile ==="
fd -HI '^Dockerfile' | head -20

# Check Dockerfile content for USER and relevant directives
echo ""
echo "=== Dockerfile USER/WORKDIR settings ==="
fd -HI '^Dockerfile' | xargs -r head -50
```

Repository: OpenMS/streamlit-template
Length of output: 5906
Harden pod/container security context for the cleanup job.
Lines 18-45 currently run with default privileges. The Dockerfile defaults to root user and installs miniforge to /root/miniforge3, leaving avoidable privilege-escalation and runtime-hardening gaps.
🛡️ Proposed hardening patch
```diff
     spec:
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       restartPolicy: OnFailure
       containers:
         - name: cleanup
           image: openms-streamlit
           imagePullPolicy: IfNotPresent
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
+            readOnlyRootFilesystem: true
           command: ["/bin/bash", "-c"]
```

To migrate to runAsNonRoot, the Dockerfile must switch from the root user and relocate miniforge away from /root/miniforge3.
🧰 Tools
🪛 Checkov (3.2.334)
[medium] 1-45: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-45: Minimize the admission of root containers
(CKV_K8S_23)
🪛 Trivy (0.69.1)
[error] 21-41: Root file system is not read-only
Container 'cleanup' of CronJob 'workspace-cleanup' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 21-41: Default security context configured
container workspace-cleanup in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 18-45: Default security context configured
cronjob workspace-cleanup in default namespace is using the default security context, which allows root privileges
Rule: KSV-0118
(IaC/Kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@k8s/base/cleanup-cronjob.yaml` around lines 18 - 45, The cleanup job runs as
root and sources miniforge from /root, so add pod- and container-level
securityContext to enforce non-root execution: set spec.securityContext with
runAsNonRoot: true and fsGroup (e.g., 1000) and add container.securityContext
with runAsUser set to a non-root uid, readOnlyRootFilesystem: true,
allowPrivilegeEscalation: false, and capabilities.drop: ["ALL"] on the container
named "cleanup" (image openms-streamlit) so the job cannot escalate privileges;
also update the image/Dockerfile to create and switch to a non-root user and
relocate Miniforge out of /root (adjust the container command/args to source the
new path) so runAsNonRoot succeeds. Ensure the WORKSPACES_DIR and volumeMount
remain writable by the chosen fsGroup/runAsUser.
```json
  "app-name": "OpenMS WebApp Template",
  "online_deployment": true,
  "enable_workspaces": true,
  "workspaces_dir": "..",
```
workspaces_dir should match the mounted PVC path in k8s.
Using ".." is risky in-container and can resolve outside the intended workspace mount.
🔧 Proposed fix
```diff
-  "workspaces_dir": "..",
+  "workspaces_dir": "/workspaces-streamlit-template",
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```json
  "workspaces_dir": "/workspaces-streamlit-template",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@k8s/base/configmap.yaml` at line 11, The ConfigMap key "workspaces_dir" is
set to a relative path ("..") which can escape the container workspace; update
the "workspaces_dir" value in k8s/base/configmap.yaml to the exact absolute path
used as the PVC mount in the Pod/Deployment (i.e., the same mountPath configured
on the container), ensuring the key matches the mounted PVC path so the app uses
the intended in-container workspace.
```yaml
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
          resources:
            requests:
              memory: "64Mi"
              cpu: "50m"
            limits:
              memory: "256Mi"
              cpu: "250m"
          readinessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 15
            periodSeconds: 20
```
❓ Verification inconclusive
🏁 Script executed:

```shell
cat k8s/base/redis.yaml
```

Repository: OpenMS/streamlit-template
Exit code: 0
stdout:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  labels:
    component: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      component: redis
  template:
    metadata:
      labels:
        component: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
          resources:
            requests:
              memory: "64Mi"
              cpu: "50m"
            limits:
              memory: "256Mi"
              cpu: "250m"
          readinessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 15
            periodSeconds: 20
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    component: redis
spec:
  type: ClusterIP
  ports:
    - port: 6379
      targetPort: 6379
  selector:
    component: redis
```
🌐 Web query: `redis:7-alpine readOnlyRootFilesystem kubernetes security context compatibility`
Yes—redis:7-alpine can run with readOnlyRootFilesystem: true in Kubernetes, as long as you provide explicit writable mounts for the paths Redis needs to write to.
What must be writable

- `/data`: the official Redis image declares persistence under `VOLUME /data` (RDB/AOF, etc.). With a read-only root FS, `/data` must be backed by a PVC or `emptyDir`. [1]
- A temp directory (commonly `/tmp`): many hardened setups mount an `emptyDir` at `/tmp` to avoid "read-only file system" errors when software tries to create temp files. (Not Redis-specific, but commonly required in practice when enabling `readOnlyRootFilesystem`.) [2]
SecurityContext that’s typically compatible
The Redis official image is designed to drop privileges to a non-root redis user by default, and the Alpine variant uses a dedicated redis user (commonly UID 999). [1], [3]
So a restricted container security context usually works, provided your volume permissions align:
securityContext:
runAsNonRoot: true
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
seccompProfile:
type: RuntimeDefault
volumeMounts:
- name: data
mountPath: /data
- name: tmp
mountPath: /tmp
volumes:
- name: data
persistentVolumeClaim:
claimName: redis-data
- name: tmp
    emptyDir: {}
Common gotchas
- If you override runAsUser, ensure it matches the image's Redis UID (or ensure /data is writable via fsGroup/permissions), otherwise Redis may fail to write to /data. [3]
- If you configure Redis to write anywhere else (custom dir, unix socket path, log file), that path must be on a writable volume; otherwise it will fail under a read-only root filesystem. [2]
Sources: Kubernetes security context docs (for readOnlyRootFilesystem) [2]; Redis official image docs (persisted data under /data, privilege dropping) [1]; Redis docker-library image user details (Alpine redis user/UID behavior) [3].
Script executed:
# Check if there are any Redis configuration files or persistence setup in the repo
find . -type f -name "redis*" -o -name "*redis*" | head -20Repository: OpenMS/streamlit-template
Repository: OpenMS/streamlit-template
Exit code: 0
stdout:
./k8s/base/redis.yaml
Harden Redis pod/container security context.
The deployment currently runs with default security settings. Please add pod and container security contexts to enforce non-root execution, drop capabilities, and disable privilege escalation.
Note: If enabling readOnlyRootFilesystem: true, ensure both /data and /tmp are mounted as writable volumes (typically emptyDir). Redis will fail to start without writable access to /tmp.
🔒 Corrected hardening patch
spec:
replicas: 1
selector:
matchLabels:
component: redis
template:
metadata:
labels:
component: redis
spec:
+ securityContext:
+ runAsNonRoot: true
+ seccompProfile:
+ type: RuntimeDefault
containers:
- name: redis
image: redis:7-alpine
ports:
- containerPort: 6379
+ securityContext:
+ allowPrivilegeEscalation: false
+ capabilities:
+ drop: ["ALL"]
+ readOnlyRootFilesystem: true
+ volumeMounts:
+ - name: redis-data
+ mountPath: /data
+ - name: tmp
+ mountPath: /tmp
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "256Mi"
cpu: "250m"
readinessProbe:
exec:
command: ["redis-cli", "ping"]
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
exec:
command: ["redis-cli", "ping"]
initialDelaySeconds: 15
periodSeconds: 20
+ volumes:
+ - name: redis-data
+ emptyDir: {}
+ - name: tmp
+ emptyDir: {}
🧰 Tools
🪛 Trivy (0.69.1)
[error] 18-38: Root file system is not read-only
Container 'redis' of Deployment 'redis' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 18-38: Default security context configured
container redis in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 16-38: Default security context configured
deployment redis in default namespace is using the default security context, which allows root privileges
Rule: KSV-0118
(IaC/Kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@k8s/base/redis.yaml` around lines 16 - 38, Add a pod and container
securityContext to the Redis manifest to enforce non-root execution and drop
privileges: set pod-level podSecurityContext (e.g., runAsNonRoot: true,
runAsUser: 1000) and in the redis container's securityContext set runAsNonRoot:
true, runAsUser: 1000, allowPrivilegeEscalation: false, readOnlyRootFilesystem:
true, and capabilities: drop: ["ALL"]; also add writable emptyDir volumes and
mounts for /data and /tmp (volume names referenced from spec.volumes and mounted
into the redis container) so Redis can start with a read-only root filesystem;
ensure the container name "redis" and readinessProbe/livenessProbe blocks remain
unchanged.
| spec: | ||
| containers: | ||
| - name: rq-worker | ||
| image: openms-streamlit | ||
| imagePullPolicy: IfNotPresent | ||
| command: ["/bin/bash", "-c"] | ||
| args: | ||
| - | | ||
| source /root/miniforge3/bin/activate streamlit-env | ||
| exec rq worker openms-workflows --url $REDIS_URL | ||
| env: | ||
| - name: REDIS_URL | ||
| value: "redis://redis:6379/0" | ||
| volumeMounts: | ||
| - name: workspaces | ||
| mountPath: /workspaces-streamlit-template | ||
| - name: config | ||
| mountPath: /app/settings.json | ||
| subPath: settings.json | ||
| readOnly: true | ||
| resources: | ||
| requests: | ||
| memory: "4Gi" | ||
| cpu: "2" | ||
| limits: | ||
| memory: "32Gi" | ||
| cpu: "8" | ||
| volumes: | ||
| - name: workspaces | ||
| persistentVolumeClaim: | ||
| claimName: workspaces-pvc | ||
| - name: config | ||
| configMap: | ||
| name: streamlit-config |
🧩 Analysis chain
🏁 Script executed:
# First, locate and read the actual deployment file
find . -type f -name "rq-worker-deployment.yaml" | head -5Repository: OpenMS/streamlit-template
Length of output: 105
🏁 Script executed:
# Also check the overall structure of the k8s directory
ls -la k8s/base/ 2>/dev/null || find . -type d -name "base" | grep k8s
Repository: OpenMS/streamlit-template
Length of output: 858
🏁 Script executed:
# Read the actual rq-worker-deployment.yaml file
cat -n k8s/base/rq-worker-deployment.yaml
Repository: OpenMS/streamlit-template
Length of output: 1678
🏁 Script executed:
# Check if there's a Dockerfile to understand image dependencies and write requirements
find . -type f -name "Dockerfile" -o -name "Dockerfile.*" | head -10Repository: OpenMS/streamlit-template
Length of output: 81
🏁 Script executed:
# Check other deployment/pod manifests to see if security contexts are used elsewhere
cat -n k8s/base/streamlit-deployment.yaml
Repository: OpenMS/streamlit-template
Length of output: 2176
🏁 Script executed:
# Check the cleanup-cronjob.yaml mentioned in the scratchpad to see if it has similar issues
cat -n k8s/base/cleanup-cronjob.yaml
Repository: OpenMS/streamlit-template
Length of output: 1670
🏁 Script executed:
# Read the Dockerfile to understand image structure and capabilities
cat -n Dockerfile
Repository: OpenMS/streamlit-template
Length of output: 12981
rq-worker container missing security hardening controls.
Lines 16-49 run with default security context, weakening least-privilege guarantees. Add pod-level seccomp policy and container-level restrictions:
- seccompProfile: RuntimeDefault on pod spec
- allowPrivilegeEscalation: false, capabilities: drop: ["ALL"] on container spec
Note: readOnlyRootFilesystem: true requires validating the image can function without write access to the root filesystem; it may conflict with conda environment activation from /root/miniforge3 or temporary file requirements during job execution. Test carefully before applying.
🧰 Tools
🪛 Checkov (3.2.334)
[medium] 1-49: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-49: Minimize the admission of root containers
(CKV_K8S_23)
🪛 Trivy (0.69.1)
[error] 18-42: Root file system is not read-only
Container 'rq-worker' of Deployment 'rq-worker' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 18-42: Default security context configured
container rq-worker in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 16-49: Default security context configured
deployment rq-worker in default namespace is using the default security context, which allows root privileges
Rule: KSV-0118
(IaC/Kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@k8s/base/rq-worker-deployment.yaml` around lines 16 - 49, The rq-worker
container and pod run without security hardening; update the Pod spec to set a
pod-level seccompProfile: RuntimeDefault and update the rq-worker container spec
(container name "rq-worker") to include allowPrivilegeEscalation: false and
capabilities with drop: ["ALL"]; optionally evaluate readOnlyRootFilesystem:
true on the "rq-worker" container only after verifying the image (conda at
/root/miniforge3) operates correctly without writes.
| spec: | ||
| containers: | ||
| - name: streamlit | ||
| image: openms-streamlit | ||
| imagePullPolicy: IfNotPresent | ||
| command: ["/bin/bash", "-c"] | ||
| args: | ||
| - | | ||
| source /root/miniforge3/bin/activate streamlit-env | ||
| exec streamlit run app.py --server.address 0.0.0.0 | ||
| ports: | ||
| - containerPort: 8501 | ||
| env: | ||
| - name: REDIS_URL | ||
| value: "redis://redis:6379/0" | ||
| volumeMounts: | ||
| - name: workspaces | ||
| mountPath: /workspaces-streamlit-template | ||
| - name: config | ||
| mountPath: /app/settings.json | ||
| subPath: settings.json | ||
| readOnly: true | ||
| readinessProbe: | ||
| httpGet: | ||
| path: /_stcore/health | ||
| port: 8501 | ||
| initialDelaySeconds: 10 | ||
| periodSeconds: 10 | ||
| livenessProbe: | ||
| httpGet: | ||
| path: /_stcore/health | ||
| port: 8501 | ||
| initialDelaySeconds: 30 | ||
| periodSeconds: 30 | ||
| resources: | ||
| requests: | ||
| memory: "4Gi" | ||
| cpu: "2" | ||
| limits: | ||
| memory: "32Gi" | ||
| cpu: "8" | ||
| volumes: | ||
| - name: workspaces | ||
| persistentVolumeClaim: | ||
| claimName: workspaces-pvc | ||
| - name: config | ||
| configMap: | ||
| name: streamlit-config |
🧩 Analysis chain
🏁 Script executed:
# First, let's find and examine the streamlit deployment file
fd -t f "streamlit-deployment.yaml"Repository: OpenMS/streamlit-template
Length of output: 103
🏁 Script executed:
# Let's also check the kustomization file referenced in the comment
fd -t f "kustomization.yaml" --path "*overlays*"Repository: OpenMS/streamlit-template
Length of output: 302
🏁 Script executed:
# Read the streamlit deployment file to verify security context
cat -n k8s/base/streamlit-deployment.yaml
Repository: OpenMS/streamlit-template
Length of output: 2176
🏁 Script executed:
# Find kustomization.yaml files correctly
fd -t f "kustomization.yaml"Repository: OpenMS/streamlit-template
Length of output: 141
🏁 Script executed:
# Read the overlay kustomization file to verify mutable tag claim
cat -n k8s/overlays/template-app/kustomization.yaml
Repository: OpenMS/streamlit-template
Length of output: 661
🏁 Script executed:
# Let's also check if there are any security policies or other hardening measures in the k8s directory
find k8s -type f -name "*.yaml" | head -20Repository: OpenMS/streamlit-template
Length of output: 391
🏁 Script executed:
# Check if there's any documentation or comments in deployment about filesystem requirements
grep -r "readOnly\|writeable\|cache" k8s/ --include="*.yaml"Repository: OpenMS/streamlit-template
Length of output: 196
🏁 Script executed:
# Check the base kustomization to see if there are any other security settings
cat -n k8s/base/kustomization.yaml
Repository: OpenMS/streamlit-template
Length of output: 440
Streamlit pod/container are missing baseline security controls.
Lines 16-63 lack securityContext hardening. Add seccomp profile and drop all capabilities to reduce attack surface.
🛡️ Proposed hardening patch
spec:
+ securityContext:
+ seccompProfile:
+ type: RuntimeDefault
containers:
- name: streamlit
image: openms-streamlit
imagePullPolicy: IfNotPresent
+ securityContext:
+ allowPrivilegeEscalation: false
+ capabilities:
+ drop: ["ALL"]
command: ["/bin/bash", "-c"]Note: readOnlyRootFilesystem: true was considered but is not feasible here—Streamlit requires writable filesystem for cache and temp directories. If read-only enforcement is required, it would need explicit emptyDir mounts for runtime directories.
🧰 Tools
🪛 Checkov (3.2.334)
[medium] 1-63: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-63: Minimize the admission of root containers
(CKV_K8S_23)
🪛 Trivy (0.69.1)
[error] 18-56: Root file system is not read-only
Container 'streamlit' of Deployment 'streamlit' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 18-56: Default security context configured
container streamlit in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 16-63: Default security context configured
deployment streamlit in default namespace is using the default security context, which allows root privileges
Rule: KSV-0118
(IaC/Kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@k8s/base/streamlit-deployment.yaml` around lines 16 - 63, The Streamlit
container lacks security hardening: update the container spec for the container
named "streamlit" to add a securityContext that drops all Linux capabilities
(capabilities.drop: ["ALL"]) and sets a seccompProfile (type: RuntimeDefault) to
reduce attack surface; also consider adding a pod-level securityContext
(runAsNonRoot: true and runAsUser if appropriate) to the Deployment spec and
ensure these changes are applied alongside existing volume
mounts/readinessProbe/livenessProbe so Streamlit still has writable runtime dirs
(keep current workspaces and config mounts).
| image: openms-streamlit | ||
| imagePullPolicy: IfNotPresent |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's check if these files exist
echo "=== Checking file existence ==="
ls -la k8s/base/streamlit-deployment.yaml 2>/dev/null || echo "Not found: k8s/base/streamlit-deployment.yaml"
ls -la k8s/overlays/template-app/kustomization.yaml 2>/dev/null || echo "Not found: k8s/overlays/template-app/kustomization.yaml"
# Check the directory structure
echo "=== Directory structure ==="
find k8s -type f -name "*.yaml" -o -name "*.yml" 2>/dev/null | head -20Repository: OpenMS/streamlit-template
Length of output: 628
🏁 Script executed:
#!/bin/bash
# Examine the streamlit deployment file
echo "=== Content of k8s/base/streamlit-deployment.yaml ==="
cat -n k8s/base/streamlit-deployment.yaml 2>/dev/null || echo "File not found"
echo ""
echo "=== Content of k8s/overlays/template-app/kustomization.yaml ==="
cat -n k8s/overlays/template-app/kustomization.yaml 2>/dev/null || echo "File not found"
Repository: OpenMS/streamlit-template
Length of output: 2888
Use imagePullPolicy: Always or switch to immutable image tags.
Line 20 combined with newTag: main in k8s/overlays/template-app/kustomization.yaml creates a risky pattern: IfNotPresent prevents pulling newer versions of the main tag, leaving stale images running and making rollouts non-deterministic.
Recommended fix
- imagePullPolicy: IfNotPresent
+ imagePullPolicy: Always
Alternatively, replace mutable main tag with a specific version (semver or SHA) in the overlay for fully deterministic deployments.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| image: openms-streamlit | |
| imagePullPolicy: IfNotPresent | |
| image: openms-streamlit | |
| imagePullPolicy: Always |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@k8s/base/streamlit-deployment.yaml` around lines 19 - 20, Update the
deployment so image updates are reliably pulled: change the imagePullPolicy from
IfNotPresent to Always for the container that uses image: openms-streamlit
(referencing the imagePullPolicy field in the Streamlit Deployment), or
alternatively make the image tag immutable by replacing newTag: main with a
specific version/sha in the overlay so deployments are deterministic; implement
one of these two fixes and ensure imagePullPolicy and newTag values are
consistent across kustomization overlays and the streamlit deployment manifest.
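For the immutable-tag alternative, the overlay could pin the image by digest instead of a tag. A hypothetical sketch (the registry path and digest are illustrative placeholders, not values from this PR):

```yaml
# k8s/overlays/template-app/kustomization.yaml (illustrative values)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: openms-streamlit
    newName: ghcr.io/openms/streamlit-template
    # A digest reference is immutable, so every rollout is deterministic;
    # obtain the digest from the registry after each build.
    digest: sha256:<digest-of-the-built-image>
```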
k8s/base/workspace-pvc.yaml
Outdated
| accessModes: | ||
| - ReadWriteMany | ||
| resources: | ||
| requests: | ||
| storage: 100Gi |
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the actual PVC file
cat -n k8s/base/workspace-pvc.yaml
Repository: OpenMS/streamlit-template
Length of output: 303
🏁 Script executed:
# Check for overlays directory structure
find k8s -type d -name "overlays" | head -20Repository: OpenMS/streamlit-template
Length of output: 81
🏁 Script executed:
# Search for storageClassName references and RWX configuration
rg -n --type=yaml 'storageClassName|ReadWriteMany' k8s/ -A 2 -B 2
Repository: OpenMS/streamlit-template
Length of output: 286
🏁 Script executed:
# Look for any patches or kustomization files that might modify the PVC
find k8s -name "kustomization.yaml" -o -name "kustomization.yml" | xargs grep -l "workspace-pvc" 2>/dev/nullRepository: OpenMS/streamlit-template
Length of output: 96
🏁 Script executed:
# Check the kustomization.yaml in base
cat -n k8s/base/kustomization.yaml
Repository: OpenMS/streamlit-template
Length of output: 440
🏁 Script executed:
# List overlay structure
ls -la k8s/overlays/
Repository: OpenMS/streamlit-template
Length of output: 235
🏁 Script executed:
# Check for patches related to workspace-pvc in overlays
find k8s/overlays -type f \( -name "*.yaml" -o -name "*.yml" \) | xargs grep -l "storageClassName\|workspace-pvc" 2>/dev/nullRepository: OpenMS/streamlit-template
Length of output: 53
🏁 Script executed:
# Check for any pvc-related patches in overlays
find k8s/overlays -type f -name "*pvc*" -o -name "*patch*" | head -20Repository: OpenMS/streamlit-template
Length of output: 51
🏁 Script executed:
# Check the overlay's kustomization.yaml
cat -n k8s/overlays/template-app/kustomization.yaml
Repository: OpenMS/streamlit-template
Length of output: 661
🏁 Script executed:
# Check if there are any patch files in the overlay
ls -la k8s/overlays/template-app/
Repository: OpenMS/streamlit-template
Length of output: 244
🏁 Script executed:
# Search specifically for any YAML files in overlay that might patch the PVC
find k8s/overlays -type f -name "*.yaml" -o -name "*.yml" | xargs catRepository: OpenMS/streamlit-template
Length of output: 493
🏁 Script executed:
# Check if there's any documentation about storage class requirements
find . -name "README*" -o -name "INSTALL*" -o -name "*.md" | xargs grep -l "storage\|RWX\|PVC" 2>/dev/null | head -5Repository: OpenMS/streamlit-template
Length of output: 89
🏁 Script executed:
# Check the user guide for any storage/PVC requirements
cat -n ./docs/user_guide.md | grep -A 10 -B 5 -i "storage\|pvc\|rwx\|persistent"
Repository: OpenMS/streamlit-template
Length of output: 1159
🏁 Script executed:
# Also check if there are other overlays or environments we should examine
find k8s/overlays -type d -mindepth 1 | sort
Repository: OpenMS/streamlit-template
Length of output: 94
Specify an RWX-capable storageClassName for portability.
ReadWriteMany requires explicit storage class configuration. Without it, the PVC may fail to bind on clusters whose default class is RWO-only or unavailable. Currently, neither the base configuration nor the overlay provides a storageClassName.
🔧 Proposed fix
spec:
+ storageClassName: <rwx-storage-class>
accessModes:
  - ReadWriteMany
Either define the storageClassName in the base PVC or add an overlay patch per environment with the appropriate RWX-capable class name.
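The overlay-patch variant could look like the following sketch, with a placeholder class name standing in for whatever RWX-capable class the target cluster provides:

```yaml
# Added to an environment overlay's kustomization.yaml (class name is a
# placeholder; substitute the cluster's RWX-capable StorageClass).
patches:
  - target:
      kind: PersistentVolumeClaim
      name: workspaces-pvc
    patch: |-
      - op: add
        path: /spec/storageClassName
        value: <rwx-storage-class>
```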
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@k8s/base/workspace-pvc.yaml` around lines 6 - 10, The PVC currently requests
ReadWriteMany (accessModes: ReadWriteMany) with resources.requests.storage:
100Gi but lacks a storageClassName, which can prevent binding on clusters whose
default class is RWO-only; add a storageClassName set to an RWX-capable class
(or alternatively add per-environment overlay patches that inject
storageClassName) so the PVC binds correctly — update the PVC manifest where
accessModes and resources are defined to include storageClassName with the
appropriate RWX class name.
- Switch workspace PVC from ReadWriteMany to ReadWriteOnce with cinder-csi storage class (required by de.NBI KKP cluster)
- Increase PVC storage to 500Gi
- Add namespace: openms to kustomization.yaml
- Reduce pod resource requests (1Gi/500m) and limits (8Gi/4 CPU) so all workspace-mounting pods fit on a single node
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
♻️ Duplicate comments (3)
k8s/base/rq-worker-deployment.yaml (1)
16-49: ⚠️ Potential issue | 🟠 Major
Add baseline pod/container security hardening.
Line 16 and Line 18 still use the default security context, so the worker keeps default privileges and capability set. Please add pod-level seccomp and container-level privilege restrictions here.
Suggested patch
  spec:
+   securityContext:
+     seccompProfile:
+       type: RuntimeDefault
    containers:
      - name: rq-worker
        image: openms-streamlit
        imagePullPolicy: IfNotPresent
+       securityContext:
+         allowPrivilegeEscalation: false
+         capabilities:
+           drop: ["ALL"]
        command: ["/bin/bash", "-c"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@k8s/base/rq-worker-deployment.yaml` around lines 16 - 49, Add pod- and container-level security contexts: under the Pod spec set a podSecurityContext/seccompProfile (e.g., type: RuntimeDefault) to enforce seccomp, and in the container entry for name: rq-worker add a securityContext that enforces runAsNonRoot: true, allowPrivilegeEscalation: false, privileged: false, readOnlyRootFilesystem: true and drop all capabilities (capabilities.drop: ["ALL"]); ensure these keys are added alongside the existing spec.containers and not inside args so the rq-worker container is restricted at runtime.
k8s/base/streamlit-deployment.yaml (2)
16-63: ⚠️ Potential issue | 🟠 Major
Add baseline pod/container security hardening.
Line 16 and Line 18 still run with the default security context. That leaves the pod with a broader privilege surface than necessary.
Suggested patch
  spec:
+   securityContext:
+     seccompProfile:
+       type: RuntimeDefault
    containers:
      - name: streamlit
        image: openms-streamlit
        imagePullPolicy: IfNotPresent
+       securityContext:
+         allowPrivilegeEscalation: false
+         capabilities:
+           drop: ["ALL"]
        command: ["/bin/bash", "-c"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@k8s/base/streamlit-deployment.yaml` around lines 16 - 63, The Pod spec for the Streamlit container (container name "streamlit" in the Deployment spec) lacks a securityContext; add a podSecurityContext and container securityContext to harden privileges: set podSecurityContext.fsGroup and runAsNonRoot/runAsUser, and on the container set securityContext.runAsUser (e.g., 1000), runAsGroup, runAsNonRoot: true, allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, drop all capabilities, and add a seccompProfile (RuntimeDefault) and readOnlyRootFilesystem where appropriate; ensure volume mounts (volumeMounts: workspaces/config) remain writable if needed by adjusting fsGroup rather than granting root.
19-20: ⚠️ Potential issue | 🟠 Major
Avoid IfNotPresent for an untagged or mutable app image.
Line 20 makes rollouts non-deterministic. openms-streamlit is untagged here, so direct applies resolve to latest, and if the overlay keeps using a mutable tag like main, new pods can keep running an older cached build. k8s/base/rq-worker-deployment.yaml line 20 has the same problem, so both deployments should be aligned.
Suggested patch if you keep mutable tags
- imagePullPolicy: IfNotPresent
+ imagePullPolicy: Always
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@k8s/base/streamlit-deployment.yaml` around lines 19 - 20, The deployment uses an untagged/mutable image ("image: openms-streamlit") but keeps imagePullPolicy: IfNotPresent, which makes rollouts non-deterministic; update the Streamlit Deployment (and align rq-worker-deployment.yaml similarly) to either pin the image to an immutable tag or set imagePullPolicy to Always for the current mutable/untagged image, i.e., change the image reference to a fixed tag (preferred) or change imagePullPolicy from IfNotPresent to Always on the Deployment spec that contains "image: openms-streamlit" so new pods always pull the latest build.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@k8s/base/rq-worker-deployment.yaml`:
- Around line 16-49: Add pod- and container-level security contexts: under the
Pod spec set a podSecurityContext/seccompProfile (e.g., type: RuntimeDefault) to
enforce seccomp, and in the container entry for name: rq-worker add a
securityContext that enforces runAsNonRoot: true, allowPrivilegeEscalation:
false, privileged: false, readOnlyRootFilesystem: true and drop all capabilities
(capabilities.drop: ["ALL"]); ensure these keys are added alongside the existing
spec.containers and not inside args so the rq-worker container is restricted at
runtime.
In `@k8s/base/streamlit-deployment.yaml`:
- Around line 16-63: The Pod spec for the Streamlit container (container name
"streamlit" in the Deployment spec) lacks a securityContext; add a
podSecurityContext and container securityContext to harden privileges: set
podSecurityContext.fsGroup and runAsNonRoot/runAsUser, and on the container set
securityContext.runAsUser (e.g., 1000), runAsGroup, runAsNonRoot: true,
allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, drop all
capabilities, and add a seccompProfile (RuntimeDefault) and
readOnlyRootFilesystem where appropriate; ensure volume mounts (volumeMounts:
workspaces/config) remain writable if needed by adjusting fsGroup rather than
granting root.
- Around line 19-20: The deployment uses an untagged/mutable image ("image:
openms-streamlit") but keeps imagePullPolicy: IfNotPresent, which makes rollouts
non-deterministic; update the Streamlit Deployment (and align
rq-worker-deployment.yaml similarly) to either pin the image to an immutable tag
or set imagePullPolicy to Always for the current mutable/untagged image, i.e.,
change the image reference to a fixed tag (preferred) or change imagePullPolicy
from IfNotPresent to Always on the Deployment spec that contains "image:
openms-streamlit" so new pods always pull the latest build.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6454e3b0-21c3-4f57-b2cd-f57c0c9d6c1d
📒 Files selected for processing (4)
- k8s/base/kustomization.yaml
- k8s/base/rq-worker-deployment.yaml
- k8s/base/streamlit-deployment.yaml
- k8s/base/workspace-pvc.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- k8s/base/workspace-pvc.yaml
- k8s/base/kustomization.yaml
The workspaces PVC uses ReadWriteOnce (Cinder CSI block storage) which requires all pods mounting it to run on the same node. Without explicit affinity rules, the scheduler was failing silently, leaving pods in Pending state with no events. Adds a `volume-group: workspaces` label and podAffinity with requiredDuringSchedulingIgnoredDuringExecution to streamlit deployment, rq-worker deployment, and cleanup cronjob. This ensures the scheduler explicitly co-locates all workspace-consuming pods on the same node. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
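The co-location rule described in the commit can be sketched as follows; the volume-group: workspaces label comes from the commit message, while the hostname topology key is the standard Kubernetes node-level key:

```yaml
# Pod template fragment shared by streamlit, rq-worker, and the cleanup
# CronJob: each pod carries the label and requires scheduling onto the
# node that hosts the other pods with the same label.
metadata:
  labels:
    volume-group: workspaces
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              volume-group: workspaces
          topologyKey: kubernetes.io/hostname
```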
Actionable comments posted: 1
♻️ Duplicate comments (3)
k8s/base/cleanup-cronjob.yaml (1)
19-39: ⚠️ Potential issue | 🟠 Major
Add an explicit security context and stop depending on /root for the runtime.
Lines 19-39 still run with the default pod/container privileges, and the command on Lines 35-39 ties the job to /root/miniforge3. That leaves the CronJob running as root by default today, and it also means a later runAsNonRoot change will fail unless the image is updated too.
🛡️ Manifest-side hardening
  spec:
+   securityContext:
+     seccompProfile:
+       type: RuntimeDefault
    restartPolicy: OnFailure
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
@@
    containers:
      - name: cleanup
        image: openms-streamlit
        imagePullPolicy: IfNotPresent
+       securityContext:
+         allowPrivilegeEscalation: false
+         capabilities:
+           drop: ["ALL"]
+         readOnlyRootFilesystem: true
        command: ["/bin/bash", "-c"]
After that, the image should be updated so the environment is no longer sourced from /root/miniforge3, then runAsNonRoot/runAsUser can be enforced here as well.
#!/bin/bash
set -euo pipefail
echo "=== cleanup-cronjob securityContext ==="
sed -n '1,120p' k8s/base/cleanup-cronjob.yaml | nl -ba | sed -n '19,60p'
echo
echo "=== Dockerfile USER / miniforge references ==="
fd -HI '^Dockerfile$' -x sh -c '
  echo "--- {} ---"
  rg -n "^(FROM|USER|WORKDIR|ENV|RUN)|miniforge|conda" "{}" -C1 || true
'
Expected result: the manifest shows no securityContext, and the image build should confirm whether it still relies on /root/miniforge3 or lacks a non-root USER.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@k8s/base/cleanup-cronjob.yaml` around lines 19 - 39, The CronJob manifest is missing a securityContext and still sources /root/miniforge3 in the container args; add a pod-level securityContext and container.securityContext with runAsNonRoot: true and runAsUser set to a non-root UID (e.g., 1000) and set fsGroup as needed, and update the cleanup container spec (containers -> name: cleanup, image, command/args) to stop sourcing /root/miniforge3 (remove the "source /root/miniforge3/bin/activate streamlit-env" line and rely on the image PATH or non-root home), then ensure the image is rebuilt/updated so that the runtime does not depend on /root/miniforge3 allowing runAsUser/runAsNonRoot to be enforced.
k8s/base/rq-worker-deployment.yaml (1)
17-31: ⚠️ Potential issue | 🟠 Major
Add baseline pod/container security context.
This Deployment still runs with the default security context. Add a pod-level seccompProfile: RuntimeDefault and container-level allowPrivilegeEscalation: false plus capabilities.drop: ["ALL"]; validate readOnlyRootFilesystem separately because the Miniforge/RQ process may need writes.
Proposed patch
  spec:
+   securityContext:
+     seccompProfile:
+       type: RuntimeDefault
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
@@
    containers:
      - name: rq-worker
        image: openms-streamlit
        imagePullPolicy: IfNotPresent
+       securityContext:
+         allowPrivilegeEscalation: false
+         capabilities:
+           drop: ["ALL"]
        command: ["/bin/bash", "-c"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@k8s/base/rq-worker-deployment.yaml` around lines 17 - 31, Add a pod-level seccompProfile and container-level privilege restrictions: under the Deployment spec's pod template (near the existing spec/podAffinity block) add a podSecurityContext with seccompProfile.type set to RuntimeDefault, and in the rq-worker container spec (container name "rq-worker") set securityContext.allowPrivilegeEscalation to false and securityContext.capabilities.drop to ["ALL"]; also evaluate and, if compatible, set securityContext.readOnlyRootFilesystem for the container (leave it unset if the Miniforge/RQ process requires writes).
k8s/base/streamlit-deployment.yaml (1)
17-31: ⚠️ Potential issue | 🟠 Major
Add baseline pod/container security context.
This pod/container pair still uses the default security context. Please add pod-level seccompProfile: RuntimeDefault and container-level allowPrivilegeEscalation: false with capabilities.drop: ["ALL"]; only add readOnlyRootFilesystem after confirming Streamlit has writable runtime dirs elsewhere.
Proposed patch
  spec:
+   securityContext:
+     seccompProfile:
+       type: RuntimeDefault
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
@@
    containers:
      - name: streamlit
        image: openms-streamlit
        imagePullPolicy: IfNotPresent
+       securityContext:
+         allowPrivilegeEscalation: false
+         capabilities:
+           drop: ["ALL"]
        command: ["/bin/bash", "-c"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@k8s/base/streamlit-deployment.yaml` around lines 17 - 31, Add a pod-level securityContext with seccompProfile.type set to RuntimeDefault under the same spec that contains podAffinity, and in the containers list add a securityContext for the streamlit container (name: streamlit) that sets allowPrivilegeEscalation: false and capabilities.drop: ["ALL"]; do not add readOnlyRootFilesystem yet until writable runtime dirs are confirmed. Locate the pod spec around podAffinity and the containers entry for the streamlit container and insert these securityContext fields accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@k8s/base/streamlit-deployment.yaml`:
- Line 8: The deployment currently sets replicas: 2 while the podAffinity +
shared workspace PVC pins pods to the same node, so change the configuration to
reflect true HA: either reduce replicas to 1 (update replicas: 2 → replicas: 1
in the Streamlit Deployment), or migrate the workspace PVC to
RWX/external/object storage and then remove or loosen the same-node podAffinity
(look for podAffinity/podAntiAffinity and the shared workspace PVC reference) so
multiple replicas can land on different nodes; ensure any change updates related
sections noted around lines 18-27 and 68-71 consistently.
---
Duplicate comments:
In `@k8s/base/cleanup-cronjob.yaml`:
- Around line 19-39: The CronJob manifest is missing a securityContext and still
sources /root/miniforge3 in the container args; add a pod-level securityContext
and container.securityContext with runAsNonRoot: true and runAsUser set to a
non-root UID (e.g., 1000) and set fsGroup as needed, and update the cleanup
container spec (containers -> name: cleanup, image, command/args) to stop
sourcing /root/miniforge3 (remove the "source /root/miniforge3/bin/activate
streamlit-env" line and rely on the image PATH or non-root home), then ensure
the image is rebuilt/updated so that the runtime does not depend on
/root/miniforge3 allowing runAsUser/runAsNonRoot to be enforced.
In `@k8s/base/rq-worker-deployment.yaml`:
- Around line 17-31: Add a pod-level seccompProfile and container-level
privilege restrictions: under the Deployment spec's pod template (near the
existing spec/podAffinity block) add a podSecurityContext with
seccompProfile.type set to RuntimeDefault, and in the rq-worker container spec
(container name "rq-worker") set securityContext.allowPrivilegeEscalation to
false and securityContext.capabilities.drop to ["ALL"]; also evaluate and, if
compatible, set securityContext.readOnlyRootFilesystem for the container (leave
it unset if the Miniforge/RQ process requires writes).
In `@k8s/base/streamlit-deployment.yaml`:
- Around line 17-31: Add a pod-level securityContext with seccompProfile.type
set to RuntimeDefault under the same spec that contains podAffinity, and in the
containers list add a securityContext for the streamlit container (name:
streamlit) that sets allowPrivilegeEscalation: false and capabilities.drop:
["ALL"]; do not add readOnlyRootFilesystem yet until writable runtime dirs are
confirmed. Locate the pod spec around podAffinity and the containers entry for
the streamlit container and insert these securityContext fields accordingly.
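The cleanup-cronjob hardening described in the duplicate comment above (runAsNonRoot, a non-root UID, and dropping the `/root/miniforge3` dependency) can be sketched roughly as follows. This is an illustration only, not the PR's actual manifest: the UID/GID values, image reference, and command are placeholders.

```yaml
# Sketch only: pod- and container-level security context for the cleanup CronJob.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: workspace-cleanup
spec:
  schedule: "0 3 * * *"           # daily at 3 AM UTC, as in the PR description
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          securityContext:         # pod-level: enforce non-root execution
            runAsNonRoot: true
            runAsUser: 1000        # placeholder non-root UID
            fsGroup: 1000          # placeholder group for PVC write access
          containers:
            - name: cleanup
              image: openms-streamlit            # placeholder image reference
              command: ["python", "clean-up-workspaces.py"]
              securityContext:
                allowPrivilegeEscalation: false
                capabilities:
                  drop: ["ALL"]
```

As the comment notes, this only works once the image stops depending on sourcing `/root/miniforge3`, since a non-root UID cannot read a root-owned home directory.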
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 37843ee5-fcbc-4f54-963a-c742b8c9f889
📒 Files selected for processing (3)
- k8s/base/cleanup-cronjob.yaml
- k8s/base/rq-worker-deployment.yaml
- k8s/base/streamlit-deployment.yaml
```yaml
  labels:
    component: streamlit
spec:
  replicas: 2
```
replicas: 2 does not provide node-level HA with this storage/scheduling model.
The required same-node pod affinity plus the shared workspace PVC means both Streamlit replicas are effectively pinned to one node. A node drain/failure still drops the whole frontend, so this replica count gives a misleading HA story unless the workspace storage moves to RWX or external/object storage.
Also applies to: 18-27, 68-71
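For comparison, a workspace claim that would let replicas land on different nodes needs an RWX access mode backed by a suitable provisioner. A hedged sketch — the `storageClassName` is a placeholder for whatever RWX-capable class the cluster actually provides:

```yaml
# Sketch only: an RWX claim that removes the same-node constraint.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-pvc
spec:
  accessModes:
    - ReadWriteMany              # RWX: mountable from pods on multiple nodes
  storageClassName: nfs-client   # placeholder; requires an RWX-capable provisioner
  resources:
    requests:
      storage: 100Gi
```

With RWX storage in place, the same-node podAffinity could be dropped (or inverted into podAntiAffinity), so that `replicas: 2` actually survives a node failure.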
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@k8s/base/streamlit-deployment.yaml` at line 8, The deployment currently sets
replicas: 2 while the podAffinity + shared workspace PVC pins pods to the same
node, so change the configuration to reflect true HA: either reduce replicas to
1 (update replicas: 2 → replicas: 1 in the Streamlit Deployment), or migrate the
workspace PVC to RWX/external/object storage and then remove or loosen the
same-node podAffinity (look for podAffinity/podAntiAffinity and the shared
workspace PVC reference) so multiple replicas can land on different nodes;
ensure any change updates related sections noted around lines 18-27 and 68-71
consistently.
The controller pod being Ready doesn't guarantee the admission webhook service is accepting connections. Add a polling loop that waits for the webhook endpoint to have an IP assigned before applying the Ingress resource, preventing "connection refused" errors during kustomize apply. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/k8s-manifests-ci.yml (1)
113-113: ⚠️ Potential issue | 🔴 Critical — Remove `|| true` to prevent masking deployment failures.

The `|| true` suffix swallows `kubectl wait` failures, allowing the integration test to pass even when deployments never become available. This contradicts the purpose of the verification step.

🐛 Proposed fix

```diff
-          kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s || true
+          kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/k8s-manifests-ci.yml at line 113, The CI step currently appends "|| true" to the kubectl wait command (kubectl wait --for=condition=available deployment -l app=template-app --timeout=120s || true), which masks deployment failures; remove the "|| true" suffix so that kubectl wait exits with a non-zero status on failure and causes the job to fail, ensuring deployment availability errors are detected by the workflow.
🧹 Nitpick comments (2)
.github/workflows/k8s-manifests-ci.yml (2)
17-20: Pin kubeconform to a specific version for reproducible CI builds.

Using `/latest/` in the download URL means the workflow could break unexpectedly if a new kubeconform release introduces breaking changes or bugs.

♻️ Proposed fix to pin the version

```diff
       - name: Install kubeconform
         run: |
-          curl -sSL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz | tar xz
+          KUBECONFORM_VERSION="v0.6.7"
+          curl -sSL "https://github.com/yannh/kubeconform/releases/download/${KUBECONFORM_VERSION}/kubeconform-linux-amd64.tar.gz" | tar xz
           sudo mv kubeconform /usr/local/bin/
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/k8s-manifests-ci.yml around lines 17 - 20, The CI step that installs kubeconform (the "Install kubeconform" run block) currently downloads from the /latest/ URL which is not reproducible; change it to pin a specific release by defining and using a KUBECONFORM_VERSION variable (or hardcode a version string) and update the curl URL to reference that version (e.g., use the release tarball URL containing the version tag instead of /latest/), and ensure the move (sudo mv kubeconform /usr/local/bin/) remains the same so the pinned binary is installed.
80-80: Pin ingress-nginx to a stable versioned release instead of the `main` branch.

Referencing the `main` branch can introduce instability if upstream makes breaking changes. Use the latest stable release, `controller-v1.15.0`, instead.

♻️ Proposed fix

```diff
-          kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
+          kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.15.0/deploy/static/provider/kind/deploy.yaml
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/k8s-manifests-ci.yml at line 80, Update the kubectl apply invocation that currently points at the ingress-nginx main branch to a fixed, stable release URL: replace the URL "https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml" with the tagged release URL for controller-v1.15.0 (e.g., "https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.15.0/deploy/static/provider/kind/deploy.yaml") so the workflow always installs the specified stable ingress-nginx release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/k8s-manifests-ci.yml:
- Around line 77-91: The ingress webhook readiness loop in the "Install nginx
ingress controller" step can silently time out; update that step so the script
tracks whether the webhook became ready (e.g., set a READY flag inside the
for-loop when the endpoint check succeeds) and after the loop check the flag
and, if not set, print a clear error message and exit non-zero (exit 1) to fail
the workflow rather than proceeding.
---
Duplicate comments:
In @.github/workflows/k8s-manifests-ci.yml:
- Line 113: The CI step currently appends "|| true" to the kubectl wait command
(kubectl wait --for=condition=available deployment -l app=template-app
--timeout=120s || true), which masks deployment failures; remove the "|| true"
suffix so that kubectl wait exits with a non-zero status on failure and causes
the job to fail, ensuring deployment availability errors are detected by the
workflow.
---
Nitpick comments:
In @.github/workflows/k8s-manifests-ci.yml:
- Around line 17-20: The CI step that installs kubeconform (the "Install
kubeconform" run block) currently downloads from the /latest/ URL which is not
reproducible; change it to pin a specific release by defining and using a
KUBECONFORM_VERSION variable (or hardcode a version string) and update the curl
URL to reference that version (e.g., use the release tarball URL containing the
version tag instead of /latest/), and ensure the move (sudo mv kubeconform
/usr/local/bin/) remains the same so the pinned binary is installed.
- Line 80: Update the kubectl apply invocation that currently points at the
ingress-nginx main branch to a fixed, stable release URL: replace the URL
"https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml"
with the tagged release URL for controller-v1.15.0 (e.g.,
"https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.15.0/deploy/static/provider/kind/deploy.yaml")
so the workflow always installs the specified stable ingress-nginx release.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 95cfa9b5-b2f5-49bc-b1f6-80489bb73742
📒 Files selected for processing (1)
.github/workflows/k8s-manifests-ci.yml
```yaml
      - name: Install nginx ingress controller
        if: steps.check.outputs.exists == 'true'
        run: |
          kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
          kubectl wait --namespace ingress-nginx --for=condition=ready pod --selector=app.kubernetes.io/component=controller --timeout=90s
          # Wait for the admission webhook to be ready to accept connections
          echo "Waiting for ingress-nginx admission webhook..."
          for i in $(seq 1 30); do
            if kubectl get endpoints -n ingress-nginx ingress-nginx-controller-admission -o jsonpath='{.subsets[0].addresses[0].ip}' 2>/dev/null | grep -q .; then
              echo "Webhook endpoint ready"
              break
            fi
            echo "  attempt $i/30 - waiting..."
            sleep 2
          done
```
Ingress webhook readiness loop silently continues on timeout.
If the webhook endpoint never becomes ready within 30 attempts, the loop exits without an error, allowing the workflow to proceed. This can cause cryptic connection refused errors later during kubectl apply.
🐛 Proposed fix to fail explicitly on timeout
```diff
           # Wait for the admission webhook to be ready to accept connections
           echo "Waiting for ingress-nginx admission webhook..."
           for i in $(seq 1 30); do
             if kubectl get endpoints -n ingress-nginx ingress-nginx-controller-admission -o jsonpath='{.subsets[0].addresses[0].ip}' 2>/dev/null | grep -q .; then
               echo "Webhook endpoint ready"
               break
             fi
             echo "  attempt $i/30 - waiting..."
             sleep 2
+            if [ "$i" -eq 30 ]; then
+              echo "ERROR: Webhook endpoint did not become ready in time"
+              exit 1
+            fi
           done
```
done📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
      - name: Install nginx ingress controller
        if: steps.check.outputs.exists == 'true'
        run: |
          kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
          kubectl wait --namespace ingress-nginx --for=condition=ready pod --selector=app.kubernetes.io/component=controller --timeout=90s
          # Wait for the admission webhook to be ready to accept connections
          echo "Waiting for ingress-nginx admission webhook..."
          for i in $(seq 1 30); do
            if kubectl get endpoints -n ingress-nginx ingress-nginx-controller-admission -o jsonpath='{.subsets[0].addresses[0].ip}' 2>/dev/null | grep -q .; then
              echo "Webhook endpoint ready"
              break
            fi
            echo "  attempt $i/30 - waiting..."
            sleep 2
            if [ "$i" -eq 30 ]; then
              echo "ERROR: Webhook endpoint did not become ready in time"
              exit 1
            fi
          done
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/k8s-manifests-ci.yml around lines 77 - 91, The ingress
webhook readiness loop in the "Install nginx ingress controller" step can
silently time out; update that step so the script tracks whether the webhook
became ready (e.g., set a READY flag inside the for-loop when the endpoint check
succeeds) and after the loop check the flag and, if not set, print a clear error
message and exit non-zero (exit 1) to fail the workflow rather than proceeding.
The kustomize overlay deploys into the openms namespace, but the verification steps (Redis wait, Redis ping, deployment checks) were querying the default namespace, causing "no matching resources found". https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
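The namespace wiring described in this commit message typically comes from the overlay's kustomization file. A minimal sketch, under the assumption that the overlay sets the `namespace` field (the `namePrefix` value is illustrative, not taken from the PR):

```yaml
# Sketch only: k8s/overlays/template-app/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: openms           # every rendered resource lands in this namespace
namePrefix: template-app-   # illustrative prefix
resources:
  - ../../base
```

Because of the `namespace: openms` field, every verification command in the workflow has to pass `-n openms` explicitly; a plain `kubectl get` or `kubectl wait` queries the default namespace and finds nothing.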
♻️ Duplicate comments (2)
.github/workflows/k8s-manifests-ci.yml (2)
77-91: ⚠️ Potential issue | 🟡 Minor — Ingress webhook readiness loop silently continues on timeout.

If the webhook endpoint never becomes ready within 30 attempts, the loop exits without an error, allowing the workflow to proceed. This can cause cryptic `connection refused` errors later during `kubectl apply`.

🐛 Proposed fix to fail explicitly on timeout

```diff
           # Wait for the admission webhook to be ready to accept connections
           echo "Waiting for ingress-nginx admission webhook..."
+          WEBHOOK_READY=false
           for i in $(seq 1 30); do
             if kubectl get endpoints -n ingress-nginx ingress-nginx-controller-admission -o jsonpath='{.subsets[0].addresses[0].ip}' 2>/dev/null | grep -q .; then
               echo "Webhook endpoint ready"
+              WEBHOOK_READY=true
               break
             fi
             echo "  attempt $i/30 - waiting..."
             sleep 2
           done
+          if [ "$WEBHOOK_READY" != "true" ]; then
+            echo "ERROR: Webhook endpoint did not become ready in time"
+            exit 1
+          fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/k8s-manifests-ci.yml around lines 77 - 91, The readiness loop in the "Install nginx ingress controller" step that checks the endpoints for ingress-nginx-controller-admission can silently time out; update the loop to detect timeout and fail the job explicitly (e.g., set a flag like webhook_ready when the grep succeeds and after the for-loop check that flag and call exit 1 or fail the step with a descriptive processLogger-style message if not ready) so the workflow does not continue when the admission webhook never becomes available.
110-115: ⚠️ Potential issue | 🔴 Critical — Remove `|| true` to prevent masking deployment failures in tests.

Line 113 uses `|| true`, which swallows `kubectl wait` failures and allows the integration test to pass even when deployments never become available. The step is named "Verify all deployments are available," but the current implementation only reports status without enforcing it.

🐛 Proposed fix

```diff
-          kubectl wait -n openms --for=condition=available deployment -l app=template-app --timeout=120s || true
+          kubectl wait -n openms --for=condition=available deployment -l app=template-app --timeout=120s
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/k8s-manifests-ci.yml around lines 110 - 115, The CI step named "Verify all deployments are available" is masking failures by appending "|| true" to the kubectl wait command; remove the "|| true" so that a non-zero exit from "kubectl wait -n openms --for=condition=available deployment -l app=template-app --timeout=120s" fails the job, ensuring deployment unavailability causes the workflow to fail and surface the error (leave the subsequent kubectl get pods/services lines unchanged so they still run when the wait succeeds).
🧹 Nitpick comments (1)
.github/workflows/k8s-manifests-ci.yml (1)
17-20: Pin kubeconform to a specific version for reproducible builds.

Using `/releases/latest/` makes CI non-deterministic. A breaking change in kubeconform could cause unexpected failures unrelated to your changes. Pin to the latest stable version.

♻️ Proposed fix to pin version

```diff
       - name: Install kubeconform
         run: |
-          curl -sSL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz | tar xz
+          KUBECONFORM_VERSION="v0.7.0"
+          curl -sSL "https://github.com/yannh/kubeconform/releases/download/${KUBECONFORM_VERSION}/kubeconform-linux-amd64.tar.gz" | tar xz
           sudo mv kubeconform /usr/local/bin/
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/k8s-manifests-ci.yml around lines 17 - 20, The CI step named "Install kubeconform" currently downloads from /releases/latest/, making builds non-deterministic; change it to pin kubeconform to a specific release tag by introducing a fixed version variable (e.g., KUBECONFORM_VERSION) and using that tag in the download URL instead of "latest", update the curl URL used in that step and the related move operation to use the pinned version so the workflow always installs the exact kubeconform release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.github/workflows/k8s-manifests-ci.yml:
- Around line 77-91: The readiness loop in the "Install nginx ingress
controller" step that checks the endpoints for
ingress-nginx-controller-admission can silently time out; update the loop to
detect timeout and fail the job explicitly (e.g., set a flag like webhook_ready
when the grep succeeds and after the for-loop check that flag and call exit 1 or
fail the step with a descriptive processLogger-style message if not ready) so
the workflow does not continue when the admission webhook never becomes
available.
- Around line 110-115: The CI step named "Verify all deployments are available"
is masking failures by appending "|| true" to the kubectl wait command; remove
the "|| true" so that a non-zero exit from "kubectl wait -n openms
--for=condition=available deployment -l app=template-app --timeout=120s" fails
the job, ensuring deployment unavailability causes the workflow to fail and
surface the error (leave the subsequent kubectl get pods/services lines
unchanged so they still run when the wait succeeds).
---
Nitpick comments:
In @.github/workflows/k8s-manifests-ci.yml:
- Around line 17-20: The CI step named "Install kubeconform" currently downloads
from /releases/latest/, making builds non-deterministic; change it to pin
kubeconform to a specific release tag by introducing a fixed version variable
(e.g., KUBECONFORM_VERSION) and using that tag in the download URL instead of
"latest", update the curl URL used in that step and the related move operation
to use the pinned version so the workflow always installs the exact kubeconform
release.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 200a528f-223c-4953-8563-e417d1703f4f
📒 Files selected for processing (1)
.github/workflows/k8s-manifests-ci.yml
Summary
This PR introduces comprehensive Kubernetes manifests and GitHub Actions workflows to enable containerized deployment of the OpenMS Streamlit application. It establishes the infrastructure-as-code foundation for running the application in Kubernetes clusters with automated validation and integration testing.
Key Changes
Kubernetes Manifests (k8s/)
Base manifests (k8s/base/):

- namespace.yaml: Creates dedicated openms namespace
- configmap.yaml: Centralized configuration for Streamlit settings, analytics, and queue settings
- redis.yaml: Redis deployment and service for job queue backend
- streamlit-deployment.yaml: Main Streamlit application with 2 replicas, health probes, and resource limits
- streamlit-service.yaml: ClusterIP service exposing Streamlit on port 8501
- rq-worker-deployment.yaml: RQ worker for background job processing
- cleanup-cronjob.yaml: Daily CronJob for workspace cleanup at 3 AM UTC
- workspace-pvc.yaml: 100Gi PersistentVolumeClaim for shared workspace storage
- ingress.yaml: NGINX ingress with WebSocket support, session affinity, and streaming optimizations

Overlay (k8s/overlays/template-app/):

- kustomization.yaml: Kustomize configuration with namePrefix, image patching, and ingress host customization

GitHub Actions Workflows
- k8s-manifests-ci.yml: manifest validation and kind integration tests
- build-and-push-image.yml: builds and pushes Docker images to ghcr.io

Configuration Updates
- .streamlit/config.toml: Removed enableCORS and enableXsrfProtection flags (now handled by ingress)
- clean-up-workspaces.py: Made workspace directory configurable via WORKSPACES_DIR environment variable for Kubernetes deployment

Notable Implementation Details
- Health probes use the /_stcore/health endpoint

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
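For context on the ingress features listed above, cookie-based session affinity and long-lived WebSocket connections on ingress-nginx are usually configured through annotations along these lines. This is a sketch: the host, cookie name, timeouts, and service name are illustrative, not necessarily the values this PR uses.

```yaml
# Sketch only: ingress-nginx annotations for sticky sessions and WebSockets.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: streamlit
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"            # sticky sessions
    nginx.ingress.kubernetes.io/session-cookie-name: "route"  # illustrative name
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"    # keep WebSockets open
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com            # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: streamlit        # illustrative service name
                port:
                  number: 8501
```

Sticky sessions matter for Streamlit in particular because each browser session's state lives in the pod that served its first request.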
Summary by CodeRabbit
New Features
Chores