
fix(grafana): Resolve issue with add-grafana make command & add new K8s dashboard #1018

Open
rh-rahulshetty wants to merge 3 commits into ambient-code:main from rh-rahulshetty:feat-update-grafana

Conversation

@rh-rahulshetty (Contributor)

Summary

  • Moved Grafana resources (grafana.yaml, dashboards, RBAC) into the with-grafana overlay
  • Fixed Grafana Prometheus auth: init container injects SA token into datasource config at pod startup
  • Fixed PVC mount path (/var/lib/grafana instead of /var/lib/grafana/data) so grafana.db persists across restarts
  • Removed GF_SECURITY_ADMIN_PASSWORD env var that was resetting password on every restart
  • Added K8s infrastructure dashboards (cluster, nodes, namespace, pods)
  • PVC is managed separately from kustomize so make clean-observability preserves Grafana data
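The apply/clean split described above can be sketched as Makefile recipes. This is a hedged illustration of the described behavior only; the recipe bodies are assumed, not copied from the PR:

```make
# Sketch (assumed): PVC applied outside kustomize so cleanup preserves Grafana data
add-grafana:
	kubectl apply -f components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml
	kubectl apply -k components/manifests/observability/overlays/with-grafana/

clean-observability:
	kubectl delete -k components/manifests/observability/overlays/with-grafana/ --ignore-not-found
	@echo "PVC grafana-storage preserved; delete with: kubectl delete pvc grafana-storage -n ambient-code"
```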

Dashboards

Dashboard | Description (screenshots omitted)
--- | ---
K8s Cluster Monitoring | Cluster-level CPU, memory, network
K8s Nodes | Node-level resource usage
K8s Namespace | Namespace-level resource usage
K8s Pods | Pod-level resource usage

Test plan

  • make add-grafana deploys without errors
  • make clean-observability removes stack but preserves PVC
  • Grafana password persists across pod restarts
  • All dashboards load and show data from Prometheus

Note: These charts are taken from community dashboards like https://github.com/dotdc/grafana-dashboards-kubernetes, so some data might be missing depending on the cluster setup. I'll be addressing them in another change request.

@coderabbitai (Contributor)

coderabbitai bot commented Mar 25, 2026

Walkthrough

Restructures observability manifests: moves core resources into a base kustomize, removes the monolithic Grafana manifest, adds Grafana overlay files (RBAC, deployment patch, PVC, datasources, dashboards), introduces dashboard provisioning via kustomize ConfigMaps, and updates Makefile/README to apply and preserve the Grafana PVC.

Changes

Cohort / File(s) — Summary

- Makefile & Docs (Makefile, components/manifests/observability/README.md): make add-grafana now applies grafana-pvc.yaml explicitly before the Grafana overlay; clean-observability help text updated to note PVC preservation and prints a manual PVC-delete command. README updated to document the new workflow and automated dashboard provisioning.
- Base observability kustomize (components/manifests/observability/base/kustomization.yaml, components/manifests/observability/kustomization.yaml): Added base/kustomization.yaml declaring otel-collector.yaml and servicemonitor.yaml in namespace ambient-code; the root kustomization now references base/ instead of individual files.
- Grafana overlay: core manifests (components/manifests/observability/overlays/with-grafana/grafana.yaml, components/manifests/observability/overlays/with-grafana/grafana-rbac.yaml): Introduced separate Grafana ConfigMaps (datasources, dashboards provider), Deployment, and Service; added a ServiceAccount and ClusterRoleBinding for Grafana RBAC in ambient-code.
- Grafana overlay: patches & init logic (components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml, components/manifests/observability/overlays/with-grafana/grafana-datasource-patch.yaml): Added a deployment patch with initContainers: inject-token (substitutes the token into datasources) and collect-dashboards (copies dashboard JSONs into the runtime dir). The datasource patch removed inline RBAC, replaced the token placeholder with INJECT_TOKEN_HERE, and added jsonData.httpMethod: POST.
- Grafana storage (components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml): Added a standalone 5Gi PVC grafana-storage; the Makefile now applies this PVC before the overlay, and clean-observability preserves it (documenting the manual deletion command).
- Dashboard provisioning (components/manifests/observability/overlays/with-grafana/kustomization.yaml, .../dashboards/README.md, .../dashboards/*.json): Updated the overlay kustomization to generate ConfigMaps from dashboard JSONs with disableNameSuffixHash: true; added guidance for adding dashboards, two new dashboards (k8s-cluster-monitoring.json, k8s-pods.json), and a README describing the provisioning steps.
- Removed monolith (components/manifests/observability/grafana.yaml, deleted): Deleted the single-file Grafana manifest; resources redistributed into overlay files and patches.
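The inject-token / collect-dashboards flow described for the deployment patch can be simulated outside the cluster. This is a minimal sketch with stand-in file contents and a stand-in token; the real init containers read the mounted ServiceAccount token and the overlay's ConfigMaps:

```shell
# Simulate the two init-container steps with temp dirs (sample data, not the real manifests)
base=$(mktemp -d); prov=$(mktemp -d); dash=$(mktemp -d)
printf 'secureJsonData:\n  httpHeaderValue1: "Bearer INJECT_TOKEN_HERE"\n' > "$base/datasources.yaml"
printf '{"title": "k8s-pods"}\n' > "$base/k8s-pods.json"
TOKEN="header.payload.signature"   # stand-in for the mounted ServiceAccount JWT
# inject-token: render the datasource template with the live token
sed "s|INJECT_TOKEN_HERE|${TOKEN}|" "$base/datasources.yaml" > "$prov/datasources.yaml"
# collect-dashboards: copy dashboard JSONs into the runtime dir
cp "$base"/*.json "$dash/"
grep "Bearer header.payload.signature" "$prov/datasources.yaml"
ls "$dash"
```

At pod startup the same two steps run before the Grafana container, so Grafana only ever sees a rendered datasource file and a populated dashboards directory.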

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dev as Developer (make / kubectl)
    participant K8s as Kubernetes API
    participant PVC as PersistentVolumeClaim
    participant Init as Pod InitContainers
    participant Grafana as Grafana main container
    participant CM as ConfigMaps (dashboards, datasources)
    Dev->>K8s: kubectl apply -f grafana-pvc.yaml
    K8s->>PVC: provision grafana-storage
    Dev->>K8s: kubectl apply -k overlays/with-grafana/
    K8s->>K8s: create ConfigMaps (dashboards, datasources), RBAC, Deployment, Service
    K8s->>Init: start pod initContainers
    Init->>CM: read datasource template (contains INJECT_TOKEN_HERE)
    Init->>Init: substitute token from ServiceAccount token
    Init->>Grafana: write rendered datasources & copy dashboards into /var/lib/grafana/dashboards (emptyDir)
    Grafana->>PVC: mount grafana-storage
    Grafana->>CM: read dashboards provider config from ConfigMap (file provider)
    Grafana->>Prometheus: query metrics (via configured datasource)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3 passed

- Title check — ✅ Passed: The title accurately identifies the primary changes: fixing the add-grafana make command and adding new Kubernetes dashboards, matching the core objectives of the PR.
- Description check — ✅ Passed: The description comprehensively covers the changeset, detailing resource reorganization, authentication fixes, PVC management, password persistence improvements, and new dashboard additions with supporting test evidence.
- Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage; skipping the check.


@ambient-code (Contributor)

ambient-code bot commented Mar 26, 2026

Review Queue Status

Check | Status | Detail
--- | --- | ---
CI | FAIL | test-local-dev-simulation
Conflicts | pass | —
Reviews | pass | —

Action needed: Fix failing CI checks

Auto-generated by Review Queue workflow. Updated when PR changes.

@jeremyeder (Contributor)

Hi @rh-rahulshetty this is awesome - can you rebase? Also has failing e2e. Does this depend on any other PRs or can it merge now?

@coderabbitai bot left a comment (Contributor)

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml`:
- Line 11: The init container image currently uses an unpinned tag
"registry.access.redhat.com/ubi9/ubi-minimal:latest" which breaks
reproducibility; change it to a fixed, immutable reference (either a specific
version tag or an image digest) by replacing
"registry.access.redhat.com/ubi9/ubi-minimal:latest" in the grafana deployment
patch with a pinned tag or digest (for example a known stable UBI9 tag or sha256
digest) so the init container image is deterministic across builds; update all
occurrences (including the other instance noted at the same file) of this image
string.
- Around line 10-16: The current container "inject-token" injects the
ServiceAccount token into /etc/grafana-provisioning/datasources.yaml using sed
with default delimiter, which can break if the token contains "/" or "&"; update
the injection in the inject-token container (the command/args that build TOKEN
and run sed) to use a safer substitution method (e.g., use a different sed
delimiter such as | or @, or write the token into the target file via a
here-doc/perl/awk that performs literal replacement) when replacing
INJECT_TOKEN_HERE in /etc/grafana-base/datasources.yaml and ensure the output is
written to /etc/grafana-provisioning/datasources.yaml.

In `@components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml`:
- Around line 1-9: The PersistentVolumeClaim resource (kind:
PersistentVolumeClaim, metadata.name: grafana-storage) lacks an explicit
namespace; add a metadata.namespace: ambient-code entry to the manifest so the
PVC is created in the expected ambient-code namespace when applied directly with
kubectl.

In `@components/manifests/observability/overlays/with-grafana/grafana.yaml`:
- Around line 47-72: Add livenessProbe and readinessProbe to the grafana
container spec (container name "grafana") so Kubernetes can manage pod health;
configure both to use an httpGet on path /api/health (or /api/healthz) against
port named "http" (3000) with sensible timing (e.g., initialDelaySeconds ~10-30,
periodSeconds ~10-15, timeoutSeconds ~1-5, failureThreshold ~3) and ensure
readinessProbe returns success before traffic is routed. Update the manifest
under the container "grafana" to include these probe blocks and tune values to
match startup characteristics.
- Around line 46-72: Add a pod- and container-level securityContext to harden
the Grafana deployment: under spec (pod) add a securityContext with fsGroup and
runAsNonRoot and runAsUser values, and under the grafana container (name:
grafana) add a securityContext specifying readOnlyRootFilesystem: true,
allowPrivilegeEscalation: false, and runAsNonRoot/runAsUser to enforce non-root
execution; update volumeMounts or storage as needed so read-only root filesystem
and the chosen fsGroup still allow Grafana to write to /var/lib/grafana and
/etc/grafana/provisioning.
📥 Commits

Reviewing files that changed from the base of the PR and between 2bf8d2f and f6da902.

📒 Files selected for processing (19)
  • Makefile
  • components/manifests/observability/README.md
  • components/manifests/observability/base/kustomization.yaml
  • components/manifests/observability/base/otel-collector.yaml
  • components/manifests/observability/base/servicemonitor.yaml
  • components/manifests/observability/grafana.yaml
  • components/manifests/observability/kustomization.yaml
  • components/manifests/observability/overlays/with-grafana/dashboards/README.md
  • components/manifests/observability/overlays/with-grafana/dashboards/ambient-operator-dashboard.json
  • components/manifests/observability/overlays/with-grafana/dashboards/k8s-cluster-monitoring.json
  • components/manifests/observability/overlays/with-grafana/dashboards/k8s-namespace.json
  • components/manifests/observability/overlays/with-grafana/dashboards/k8s-nodes.json
  • components/manifests/observability/overlays/with-grafana/dashboards/k8s-pods.json
  • components/manifests/observability/overlays/with-grafana/grafana-datasource-patch.yaml
  • components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml
  • components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml
  • components/manifests/observability/overlays/with-grafana/grafana-rbac.yaml
  • components/manifests/observability/overlays/with-grafana/grafana.yaml
  • components/manifests/observability/overlays/with-grafana/kustomization.yaml
💤 Files with no reviewable changes (1)
  • components/manifests/observability/grafana.yaml

Comment on lines +10 to +16:

```yaml
      - name: inject-token
        image: registry.access.redhat.com/ubi9/ubi-minimal:latest
        command: ["sh", "-c"]
        args:
        - |
          TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
          sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
```

⚠️ Potential issue | 🟡 Minor

Token injection via sed may break with special characters.

If the ServiceAccount token contains / or & characters, the sed substitution will fail or produce incorrect output. Consider using a different delimiter or a safer substitution method.

🔧 Safer token injection

```diff
         args:
         - |
           TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
-          sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
+          # Use awk to avoid sed delimiter issues with tokens
+          awk -v token="$TOKEN" '{gsub(/INJECT_TOKEN_HERE/, token)}1' /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
```

Alternatively, use a different sed delimiter that's unlikely to appear in JWT tokens:

```diff
-          sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
+          sed "s|INJECT_TOKEN_HERE|${TOKEN}|" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
```
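Both suggestions can be sanity-checked locally. One caveat worth adding (my observation, not from the review): sed treats an unescaped `&` in the replacement as "the matched text" regardless of delimiter, and awk's gsub() expands `&` the same way, so a fully literal substitution needs index()/substr(). In practice Kubernetes ServiceAccount tokens are JWTs (base64url segments joined by dots), which contain neither `/` nor `&`, so the risk is mostly theoretical:

```shell
# Demo with a deliberately sed-hostile sample token
TOKEN='abc/def&ghi'
printf 'token: INJECT_TOKEN_HERE\n' > /tmp/base.yaml
# '|' delimiter fixes '/', but '&' still expands to the matched text:
sed "s|INJECT_TOKEN_HERE|${TOKEN}|" /tmp/base.yaml
# -> token: abc/defINJECT_TOKEN_HEREghi
# Literal replacement via index()/substr() (no regex metacharacters involved):
awk -v token="$TOKEN" '{
  i = index($0, "INJECT_TOKEN_HERE")          # 17-char placeholder
  if (i) $0 = substr($0, 1, i-1) token substr($0, i+17)
  print
}' /tmp/base.yaml
# -> token: abc/def&ghi
```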
🧰 Tools — 🪛 Trivy (0.69.3), findings for container 'inject-token' of Deployment 'grafana' (lines 10-21):

- [warning] KSV-0001: should set 'securityContext.allowPrivilegeEscalation' to false
- [info] KSV-0003: should add 'ALL' to 'securityContext.capabilities.drop'
- [info] KSV-0004: should set 'securityContext.capabilities.drop'
- [info] KSV-0011: should set 'resources.limits.cpu'
- [warning] KSV-0012: should set 'securityContext.runAsNonRoot' to true
- [warning] KSV-0013: image tag ":latest" used; should specify an image tag
- [error] KSV-0014: should set 'securityContext.readOnlyRootFilesystem' to true
- [info] KSV-0015: should set 'resources.requests.cpu'
- [info] KSV-0016: should set 'resources.requests.memory'
- [info] KSV-0018: should set 'resources.limits.memory'
- [info] KSV-0020: should set 'securityContext.runAsUser' > 10000
- [info] KSV-0021: should set 'securityContext.runAsGroup' > 10000
- [info] KSV-0030: should set 'securityContext.seccompProfile.type' to 'RuntimeDefault'
- [warning] KSV-0104: should specify a seccomp profile
- [info] KSV-0106: capabilities should drop all (only NET_BIND_SERVICE allowed)
- [error] KSV-0118: container is using the default security context
- [warning] KSV-0125: image from an untrusted registry

```yaml
      serviceAccountName: grafana
      initContainers:
      - name: inject-token
        image: registry.access.redhat.com/ubi9/ubi-minimal:latest
```

🧹 Nitpick | 🔵 Trivial

Pin init container image tags for reproducibility.

Using :latest tag for ubi9/ubi-minimal makes builds non-reproducible and could introduce unexpected changes.

📌 Suggested fix

```diff
       - name: inject-token
-        image: registry.access.redhat.com/ubi9/ubi-minimal:latest
+        image: registry.access.redhat.com/ubi9/ubi-minimal:9.4
       - name: collect-dashboards
-        image: registry.access.redhat.com/ubi9/ubi-minimal:latest
+        image: registry.access.redhat.com/ubi9/ubi-minimal:9.4
```

Also applies to: 23-23
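If a digest is preferred over a version tag, it can be resolved once and pasted into the manifest. A sketch, assuming registry network access, that the 9.4 tag exists, and that skopeo and jq are available:

```shell
skopeo inspect docker://registry.access.redhat.com/ubi9/ubi-minimal:9.4 | jq -r '.Digest'
# then pin in the manifest:
#   image: registry.access.redhat.com/ubi9/ubi-minimal@sha256:<digest>
```

A digest is fully immutable, while a version tag remains human-readable; either satisfies the reproducibility concern raised above.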


Comment on lines +1 to +9:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-storage
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 5Gi
```

⚠️ Potential issue | 🟡 Minor

Add explicit namespace to avoid deployment issues.

The PVC is applied directly via kubectl apply -f (not through Kustomize), so it won't inherit the namespace from the kustomization. Without an explicit namespace, the PVC will be created in the current kubectl context's namespace, which may not be ambient-code.

🔧 Proposed fix

```diff
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
   name: grafana-storage
+  namespace: ambient-code
 spec:
   accessModes: [ReadWriteOnce]
   resources:
     requests:
       storage: 5Gi
```
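An alternative is to pin the namespace at the call site instead of in the manifest; a hedged sketch of the Makefile-side variant (assumes the ambient-code namespace already exists):

```shell
kubectl apply -n ambient-code -f components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml
```

Setting metadata.namespace in the file is the more robust choice, since it also protects anyone applying the file directly without the Makefile.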
🧰 Tools — 🪛 Trivy (0.69.3), finding for lines 5-9:

- [info] KSV-0039 (limit range usage): a LimitRange policy with default requests and limits for each container should be configured


Comment on lines +46 to +72:

```yaml
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:11.4.0
        ports:
        - containerPort: 3000
          name: http
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_USERS_ALLOW_SIGN_UP
          value: "false"
        volumeMounts:
        - name: datasources
          mountPath: /etc/grafana/provisioning/datasources
        - name: dashboards-provider
          mountPath: /etc/grafana/provisioning/dashboards/dashboards.yaml
          subPath: dashboards.yaml
        - name: storage
          mountPath: /var/lib/grafana
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
```

🧹 Nitpick | 🔵 Trivial

Add security context to harden the Grafana container.

The deployment lacks a securityContext for the pod and container. For production workloads, consider adding:

🛡️ Suggested security hardening

```diff
     spec:
+      securityContext:
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
       containers:
       - name: grafana
         image: grafana/grafana:11.4.0
+        securityContext:
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop:
+              - ALL
+          readOnlyRootFilesystem: false  # Grafana needs write access
         ports:
         - containerPort: 3000
```
🧰 Tools — 🪛 Trivy (0.69.3), findings for container 'grafana' of Deployment 'grafana' (lines 46-70):

- [warning] KSV-0001: should set 'securityContext.allowPrivilegeEscalation' to false
- [info] KSV-0003: should add 'ALL' to 'securityContext.capabilities.drop'
- [info] KSV-0004: should set 'securityContext.capabilities.drop'
- [warning] KSV-0012: should set 'securityContext.runAsNonRoot' to true
- [error] KSV-0014: should set 'securityContext.readOnlyRootFilesystem' to true
- [info] KSV-0020: should set 'securityContext.runAsUser' > 10000
- [info] KSV-0021: should set 'securityContext.runAsGroup' > 10000
- [info] KSV-0030: should set 'securityContext.seccompProfile.type' to 'RuntimeDefault'
- [warning] KSV-0104: should specify a seccomp profile
- [info] KSV-0106: capabilities should drop all (only NET_BIND_SERVICE allowed)
- [error] KSV-0118: container is using the default security context
- [warning] KSV-0125: image from an untrusted registry


Comment on lines +47 to +72:

```yaml
      containers:
      - name: grafana
        image: grafana/grafana:11.4.0
        ports:
        - containerPort: 3000
          name: http
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_USERS_ALLOW_SIGN_UP
          value: "false"
        volumeMounts:
        - name: datasources
          mountPath: /etc/grafana/provisioning/datasources
        - name: dashboards-provider
          mountPath: /etc/grafana/provisioning/dashboards/dashboards.yaml
          subPath: dashboards.yaml
        - name: storage
          mountPath: /var/lib/grafana
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
```

🧹 Nitpick | 🔵 Trivial

Consider adding liveness and readiness probes.

The Grafana container lacks health probes, which can impact pod lifecycle management and rolling updates.

💡 Suggested probe configuration

```diff
       - name: grafana
         image: grafana/grafana:11.4.0
         ports:
         - containerPort: 3000
           name: http
+        livenessProbe:
+          httpGet:
+            path: /api/health
+            port: 3000
+          initialDelaySeconds: 30
+          periodSeconds: 10
+        readinessProbe:
+          httpGet:
+            path: /api/health
+            port: 3000
+          initialDelaySeconds: 10
+          periodSeconds: 5
```

@rh-rahulshetty (Contributor, Author)

Hi @jeremyeder,
I've rebased onto the latest commit. This PR doesn't depend on any other PRs; I developed and tested this branch independently. Not sure why the e2e tests were skipped here :/

@ambient-code ambient-code bot added this to the Review Queue milestone Mar 27, 2026
Signed-off-by: Rahul Shetty <rashetty@redhat.com>
Signed-off-by: Rahul Shetty <rashetty@redhat.com>
Signed-off-by: Rahul Shetty <rashetty@redhat.com>
@coderabbitai bot left a comment (Contributor)

Actionable comments posted: 4

♻️ Duplicate comments (1)
components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml (1)

3-5: ⚠️ Potential issue | 🟠 Major

Set an explicit namespace on the PVC manifest.

Because this file is applied directly, the claim can be created in the wrong namespace when kube context differs, causing Grafana to miss the expected claim.

🔧 Proposed fix

```diff
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
   name: grafana-storage
+  namespace: ambient-code
 spec:
   accessModes: [ReadWriteOnce]
   resources:
     requests:
       storage: 5Gi
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml`
around lines 3 - 5, The PVC manifest lacks an explicit namespace so it can be
created in the wrong context; update the grafana-pvc.yaml by adding
metadata.namespace with the Grafana release namespace (i.e. the namespace where
the Grafana Deployment/Helm release runs) alongside metadata.name:
grafana-storage so the PersistentVolumeClaim is always created in the correct
namespace; ensure this namespace matches the Grafana deployment/service account
namespace referenced in your overlays.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9e7e669a-5ca5-4276-a70f-6d5894619be1

📥 Commits

Reviewing files that changed from the base of the PR and between f6da902 and d6d861f.

📒 Files selected for processing (19)
  • Makefile
  • components/manifests/observability/README.md
  • components/manifests/observability/base/kustomization.yaml
  • components/manifests/observability/base/otel-collector.yaml
  • components/manifests/observability/base/servicemonitor.yaml
  • components/manifests/observability/grafana.yaml
  • components/manifests/observability/kustomization.yaml
  • components/manifests/observability/overlays/with-grafana/dashboards/README.md
  • components/manifests/observability/overlays/with-grafana/dashboards/ambient-operator-dashboard.json
  • components/manifests/observability/overlays/with-grafana/dashboards/k8s-cluster-monitoring.json
  • components/manifests/observability/overlays/with-grafana/dashboards/k8s-namespace.json
  • components/manifests/observability/overlays/with-grafana/dashboards/k8s-nodes.json
  • components/manifests/observability/overlays/with-grafana/dashboards/k8s-pods.json
  • components/manifests/observability/overlays/with-grafana/grafana-datasource-patch.yaml
  • components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml
  • components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml
  • components/manifests/observability/overlays/with-grafana/grafana-rbac.yaml
  • components/manifests/observability/overlays/with-grafana/grafana.yaml
  • components/manifests/observability/overlays/with-grafana/kustomization.yaml
💤 Files with no reviewable changes (1)
  • components/manifests/observability/grafana.yaml

In `components/manifests/observability/overlays/with-grafana/dashboards/k8s-pods.json`, comment on line 2149:

          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_pod_container_status_waiting_reason{reason=~\"ErrImagePull|ImagePullBackOff|CrashLoopBackOff\"} == 1",

⚠️ Potential issue | 🟠 Major

Respect the selected cluster and namespace in these tables.

Lines 2149 and 2310 omit $cluster and $namespace, so this dashboard can show incident rows from unrelated namespaces or even other clusters after the user narrows the view. That makes both tables misleading in a shared Prometheus.

Suggested query fix
-          "expr": "kube_pod_container_status_waiting_reason{reason=~\"ErrImagePull|ImagePullBackOff|CrashLoopBackOff\"} == 1",
+          "expr": "kube_pod_container_status_waiting_reason{cluster=\"$cluster\", namespace=\"$namespace\", reason=~\"ErrImagePull|ImagePullBackOff|CrashLoopBackOff\"} == 1",
-          "expr": "kube_pod_status_scheduled{condition=\"false\"} == 1",
+          "expr": "kube_pod_status_scheduled{cluster=\"$cluster\", namespace=\"$namespace\", condition=\"false\"} == 1",

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 2310-2310

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@components/manifests/observability/overlays/with-grafana/dashboards/k8s-pods.json`
at line 2149, The PromQL expressions like
kube_pod_container_status_waiting_reason{reason=~"ErrImagePull|ImagePullBackOff|CrashLoopBackOff"}
must include the dashboard variables for cluster and namespace so results
respect the selected scope; update that selector to include cluster=~"$cluster"
and namespace=~"$namespace" (e.g.
kube_pod_container_status_waiting_reason{cluster=~"$cluster",
namespace=~"$namespace",
reason=~"ErrImagePull|ImagePullBackOff|CrashLoopBackOff"}) and apply the same
change to the other similar query at the second occurrence so both tables filter
by the selected cluster and namespace.

Comment on lines +15 to +16
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n "components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml"

Repository: ambient-code/platform

Length of output: 3308


🌐 Web query:

Kubernetes projected ServiceAccount token rotation behavior

💡 Result:

Kubernetes projected ServiceAccount tokens are short-lived JWTs obtained via the TokenRequest API and mounted into pods as projected volumes. The kubelet manages their lifecycle, automatically requesting new tokens and updating the mounted files. Key rotation behavior:

- Tokens have a configurable expirationSeconds (default 1 hour, minimum 10 minutes, max limited by --service-account-max-token-expiration).
- The kubelet proactively refreshes the token once it is older than 80% of its TTL or older than 24 hours.
- The token file is updated atomically; applications must reload the file to use the new token (e.g., periodically or on file change).
- Tokens are bound to the pod and invalidated on pod deletion (within a ~60s grace period).
- By default (since v1.22), the ServiceAccount admission controller injects a projected volume (kube-api-access-*) with a token for API access.

Example pod spec:

    volumes:
    - name: token
      projected:
        sources:
        - serviceAccountToken:
            expirationSeconds: 3600
            audience: api

Applications should read the token from the mount path (e.g., /var/run/secrets/kubernetes.io/serviceaccount/token) on a schedule (e.g., every 5 min) or watch for file changes; Kubernetes client libraries handle this automatically. Legacy non-expiring secret-based tokens are deprecated and cleaned up in modern clusters (v1.24+). This is stable since Kubernetes v1.20 (TokenRequestProjection GA).


🌐 Web query:

OpenShift token rotation init container lifecycle

💡 Result:

No specific official documentation or authoritative source describes a standard "OpenShift token rotation init container lifecycle." Searches across official Red Hat OpenShift docs, OKD docs, Kubernetes docs, GitHub issues, and bugzillas yielded no direct matches for this exact concept. Key related findings:

- Bound service account (SA) tokens in OpenShift (docs from 4.8 to 4.20) are automatically rotated by the kubelet when older than 80% of expirationSeconds or 24 hours. Apps must reload the token file. No init container is involved; rotation happens during main container runtime via projected volumes.
- Init containers run sequentially to completion before main containers start, sharing pod resources but with separate images. They are ideal for setup (e.g., waiting for services, cloning repos) but not for ongoing rotation, as they don't run continuously.
- In the Cluster API (CAPI) operator (TechPreview), a controller rotates token secrets every 30 minutes if older (Bug 2089254, fixed in 4.11). No mention of init containers in the operator pod spec.
- Custom patterns exist: token refreshers (e.g., kube-token-refresher, a cap-token-refresh cronjob), or External Secrets Operator for external secret sync/rotation, but none use init containers specifically for OpenShift SA token rotation.

Conclusion: token rotation typically uses bound projected volumes (automatic, kubelet-managed) or operators/controllers. Init containers are unsuitable for a rotation lifecycle due to their one-time execution; custom init-container implementations for token fetch/rotation are possible but non-standard, undocumented officially, and would require pod restarts to refresh, defeating the purpose of rotation.


Init-container token snapshot will become stale after automatic token rotation.

Lines 15–16 read the ServiceAccount token once during init-container execution and snapshot it into datasources.yaml. On current Kubernetes/OpenShift releases, this token is automatically rotated by the kubelet (default ~1 hour), but the snapshot is never updated. Once the token expires, Grafana's Prometheus datasource authentication fails. Restarting the Grafana container does not help because init containers only run at pod creation time; the pod must be recreated to obtain a fresh token. Use a token-refresh mechanism (e.g., a sidecar or operator) instead of an init-only snapshot.
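A minimal sketch of such a token-refresh sidecar loop, assuming the same template/target paths and INJECT_TOKEN_HERE placeholder as the init container in this PR (the refresh script itself is illustrative, not part of the change):

```shell
#!/bin/sh
# Paths default to the ones used by the init container; override via env.
TOKEN_FILE="${TOKEN_FILE:-/var/run/secrets/kubernetes.io/serviceaccount/token}"
TEMPLATE="${TEMPLATE:-/etc/grafana-base/datasources.yaml}"
TARGET="${TARGET:-/etc/grafana-provisioning/datasources.yaml}"

render() {
  # JWTs are base64url-encoded (no "/"), so "/" is a safe sed delimiter here.
  TOKEN=$(cat "$TOKEN_FILE")
  sed "s/INJECT_TOKEN_HERE/${TOKEN}/" "$TEMPLATE" > "${TARGET}.tmp"
  # Atomic replace so Grafana never reads a half-written file.
  mv "${TARGET}.tmp" "$TARGET"
}

refresh_if_stale() {
  # Re-render only when the rendered file no longer contains the live token.
  if ! grep -qF "$(cat "$TOKEN_FILE")" "$TARGET" 2>/dev/null; then
    render
  fi
}

# In a sidecar this would run forever, e.g.:
#   while true; do refresh_if_stale; sleep 60; done
```

Because the kubelet rewrites the mounted token file in place, polling plus a containment check is enough; no inotify support is required in the image.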

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml`
around lines 15 - 16, The init-container currently snapshots the ServiceAccount
token via the sed call that reads
/var/run/secrets/kubernetes.io/serviceaccount/token and writes
/etc/grafana-provisioning/datasources.yaml (the sed
"s/INJECT_TOKEN_HERE/${TOKEN}/" step), which becomes stale after kubelet token
rotation; remove that init-only token snapshot and instead add a small sidecar
(token-refresh) that watches the live token file
(/var/run/secrets/kubernetes.io/serviceaccount/token) and atomically rewrites
/etc/grafana-provisioning/datasources.yaml from the template at
/etc/grafana-base/datasources.yaml whenever the token changes (or use an
in-place updater loop with sleep/watch+cmp), ensuring the sidecar has the same
shared volume mounts as the Grafana container; alternatively, mount a projected
serviceAccountToken volume and modify Grafana provisioning to read the token
file at runtime if supported—locate the current sed invocation in the
initContainer and replace it with the sidecar implementation that updates the
same target path.

Comment on lines +53 to +57
env:
- name: GF_SECURITY_ADMIN_USER
value: admin
- name: GF_USERS_ALLOW_SIGN_UP
value: "false"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "grafana.yaml" -o -name "grafana-deployment-patch.yaml" | head -20

Repository: ambient-code/platform

Length of output: 225


🏁 Script executed:

cat -n components/manifests/observability/overlays/with-grafana/grafana.yaml | head -100

Repository: ambient-code/platform

Length of output: 2767


🏁 Script executed:

cat -n components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml

Repository: ambient-code/platform

Length of output: 3308


🏁 Script executed:

# Search for any Secret references in the overlays directory
rg -i "secret" components/manifests/observability/overlays/with-grafana/ --context 3

Repository: ambient-code/platform

Length of output: 2044


🌐 Web query:

Grafana admin password environment variable GF_SECURITY_ADMIN_PASSWORD default behavior

💡 Result:

The environment variable GF_SECURITY_ADMIN_PASSWORD overrides the [security] admin_password setting in Grafana's configuration. By default, without this variable (or the ini equivalent), the admin password is "admin". However, it is only applied on first run, during initial database initialization when the default admin user is created. On subsequent runs, especially with persistent data storage (e.g., Docker volumes), the password is already set in the database and cannot be overridden by the environment variable; it must be changed via the UI, CLI, or API. To use it effectively:

- Ensure a fresh data directory (empty /var/lib/grafana).
- Set it via Docker: docker run -e "GF_SECURITY_ADMIN_PASSWORD=yourpassword" grafana/grafana
- Note that defaults.ini explicitly sets admin_password = admin.

Sources confirm consistent behavior across versions, with issues arising from persistence.


🏁 Script executed:

find components/manifests/observability/overlays/with-grafana -name "kustomization.yaml" -o -name "kustomization.yml"

Repository: ambient-code/platform

Length of output: 140


🏁 Script executed:

find components/manifests/observability/overlays/with-grafana -type f -name "*.yaml" | xargs basename -a | sort | uniq

Repository: ambient-code/platform

Length of output: 191


🏁 Script executed:

cat -n components/manifests/observability/overlays/with-grafana/kustomization.yaml

Repository: ambient-code/platform

Length of output: 1285


Configure admin password from a Secret instead of relying on Grafana's built-in default.

Lines 54-57 set the admin username to admin but do not configure the admin password. With this Deployment configuration and persistent storage at /var/lib/grafana, a fresh Grafana instance will bootstrap with Grafana's default admin/admin credential, creating a known security vulnerability on first deploy. Add GF_SECURITY_ADMIN_PASSWORD sourced from a Kubernetes Secret instead of allowing the default to persist.
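One possible wiring, sketched with illustrative names (grafana-admin-secret and the admin-password key are assumptions, not part of this PR):

```yaml
# Hypothetical Secret; in practice generate the value out of band, e.g.
#   kubectl -n ambient-code create secret generic grafana-admin-secret \
#     --from-literal=admin-password="$(openssl rand -base64 24)"
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin-secret
  namespace: ambient-code
type: Opaque
stringData:
  admin-password: change-me
---
# Excerpt of the Grafana container env (fragment, not a full Deployment):
# env:
# - name: GF_SECURITY_ADMIN_USER
#   value: admin
# - name: GF_SECURITY_ADMIN_PASSWORD
#   valueFrom:
#     secretKeyRef:
#       name: grafana-admin-secret
#       key: admin-password
```

Note that because /var/lib/grafana is persistent, the env var only takes effect on first initialization; after that the password lives in grafana.db.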

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/manifests/observability/overlays/with-grafana/grafana.yaml` around
lines 53 - 57, The Deployment currently sets GF_SECURITY_ADMIN_USER and
GF_USERS_ALLOW_SIGN_UP but leaves the admin password at Grafana's default; add
an env var GF_SECURITY_ADMIN_PASSWORD that sources its value from a Kubernetes
Secret (e.g., secret name like grafana-admin-secret, key like admin-password)
using valueFrom.secretKeyRef in the container spec so the password is not
hardcoded or defaulted; ensure the Secret is created separately and referenced
by the env entry alongside GF_SECURITY_ADMIN_USER and GF_USERS_ALLOW_SIGN_UP.

Comment on lines +36 to +37
generatorOptions:
disableNameSuffixHash: true

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the target kustomization.yaml file
find . -path "*components/manifests/observability/overlays/with-grafana/kustomization.yaml" -type f

Repository: ambient-code/platform

Length of output: 142


🏁 Script executed:

# Check the structure of the observability directory
find ./components/manifests/observability -type f -name "*.yaml" | head -20

Repository: ambient-code/platform

Length of output: 786


🏁 Script executed:

# Examine the kustomization.yaml file content
if [ -f "components/manifests/observability/overlays/with-grafana/kustomization.yaml" ]; then
  cat -n "components/manifests/observability/overlays/with-grafana/kustomization.yaml"
fi

Repository: ambient-code/platform

Length of output: 1285


🏁 Script executed:

# Search for init-container references and ConfigMap copying logic
rg -i "init.*container|copy.*configmap|dashboard" --type yaml -A 3 -B 3 ./components/manifests/observability/

Repository: ambient-code/platform

Length of output: 14768


🏁 Script executed:

# Look for Deployments that might reference the grafana overlay
rg "kind.*Deployment" --type yaml -A 10 ./components/manifests/observability/

Repository: ambient-code/platform

Length of output: 3094


Dashboard ConfigMap updates won't trigger pod rollouts with disableNameSuffixHash: true.

Since the init-container copies dashboards from ConfigMaps at pod startup and those ConfigMaps have stable names, content changes won't propagate to existing pods. The Deployment spec remains unchanged when the ConfigMap content is updated, so no new pods are created to fetch the updated dashboard JSON files.

To ensure dashboard updates roll out automatically, remove the hash suffix disablement so ConfigMap names change with content. Alternatively, implement an explicit rollout trigger (such as a pod restart annotation or webhook) whenever dashboard ConfigMaps are updated.

🔧 Proposed fix
-generatorOptions:
-  disableNameSuffixHash: true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
generatorOptions:
disableNameSuffixHash: true
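The alternative rollout trigger mentioned above could be sketched as a strategic-merge patch (hypothetical: the annotation key and how the checksum is computed are assumptions, not part of this PR):

```yaml
# Hypothetical Deployment patch: pin a checksum of the dashboard ConfigMaps
# onto the pod template, so any dashboard content change alters the template
# and forces a rollout. The value would be computed by deploy tooling, e.g.
# a Makefile step hashing the dashboard JSON files before kustomize build.
spec:
  template:
    metadata:
      annotations:
        checksum/config-dashboards: "<sha256 of dashboard ConfigMap contents>"
```

This keeps stable ConfigMap names (useful when the PVC and other resources reference them) while still propagating dashboard updates to running pods.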
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/manifests/observability/overlays/with-grafana/kustomization.yaml`
around lines 36 - 37, The kustomization currently sets
generatorOptions.disableNameSuffixHash: true which prevents ConfigMap names from
changing and thus prevents the init-container that copies dashboards from
triggering pod rollouts; to fix, remove or set
generatorOptions.disableNameSuffixHash to false so kustomize appends content
hashes to generated ConfigMap names (causing Deployment changes), or
alternatively add an explicit rollout trigger by updating the Deployment pod
template (the init-container pod spec) to include an annotation like a checksum
of the dashboard ConfigMap(s) (e.g., checksum/config-dashboards) that is
recalculated when the ConfigMap content changes so pods are recreated and the
init-container picks up updated dashboards.
