fix(grafana): Resolve issue with add-grafana make command & add new K8s dashboard #1018
rh-rahulshetty wants to merge 3 commits into ambient-code:main from
Conversation
Walkthrough
Restructures observability manifests: moves core resources into a base kustomize, removes the monolithic Grafana manifest, adds Grafana overlay files (RBAC, deployment patch, PVC, datasources, dashboards), introduces dashboard provisioning via kustomize ConfigMaps, and updates the Makefile/README to apply and preserve the Grafana PVC.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Review Queue Status
Action needed: Fix failing CI checks
Hi @rh-rahulshetty this is awesome - can you rebase? Also has failing e2e. Does this depend on any other PRs or can it merge now?
Force-pushed b226ef6 to f6da902
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml`:
- Line 11: The init container image currently uses an unpinned tag
"registry.access.redhat.com/ubi9/ubi-minimal:latest" which breaks
reproducibility; change it to a fixed, immutable reference (either a specific
version tag or an image digest) by replacing
"registry.access.redhat.com/ubi9/ubi-minimal:latest" in the grafana deployment
patch with a pinned tag or digest (for example a known stable UBI9 tag or sha256
digest) so the init container image is deterministic across builds; update all
occurrences (including the other instance noted at the same file) of this image
string.
- Around line 10-16: The current container "inject-token" injects the
ServiceAccount token into /etc/grafana-provisioning/datasources.yaml using sed
with default delimiter, which can break if the token contains "/" or "&"; update
the injection in the inject-token container (the command/args that build TOKEN
and run sed) to use a safer substitution method (e.g., use a different sed
delimiter such as | or @, or write the token into the target file via a
here-doc/perl/awk that performs literal replacement) when replacing
INJECT_TOKEN_HERE in /etc/grafana-base/datasources.yaml and ensure the output is
written to /etc/grafana-provisioning/datasources.yaml.
In `@components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml`:
- Around line 1-9: The PersistentVolumeClaim resource (kind:
PersistentVolumeClaim, metadata.name: grafana-storage) lacks an explicit
namespace; add a metadata.namespace: ambient-code entry to the manifest so the
PVC is created in the expected ambient-code namespace when applied directly with
kubectl.
In `@components/manifests/observability/overlays/with-grafana/grafana.yaml`:
- Around line 47-72: Add livenessProbe and readinessProbe to the grafana
container spec (container name "grafana") so Kubernetes can manage pod health;
configure both to use an httpGet on path /api/health (or /api/healthz) against
port named "http" (3000) with sensible timing (e.g., initialDelaySeconds ~10-30,
periodSeconds ~10-15, timeoutSeconds ~1-5, failureThreshold ~3) and ensure
readinessProbe returns success before traffic is routed. Update the manifest
under the container "grafana" to include these probe blocks and tune values to
match startup characteristics.
- Around line 46-72: Add a pod- and container-level securityContext to harden
the Grafana deployment: under spec (pod) add a securityContext with fsGroup and
runAsNonRoot and runAsUser values, and under the grafana container (name:
grafana) add a securityContext specifying readOnlyRootFilesystem: true,
allowPrivilegeEscalation: false, and runAsNonRoot/runAsUser to enforce non-root
execution; update volumeMounts or storage as needed so read-only root filesystem
and the chosen fsGroup still allow Grafana to write to /var/lib/grafana and
/etc/grafana/provisioning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e6b10a79-c556-4aad-90a7-e82e0f0c8435
📒 Files selected for processing (19)
- Makefile
- components/manifests/observability/README.md
- components/manifests/observability/base/kustomization.yaml
- components/manifests/observability/base/otel-collector.yaml
- components/manifests/observability/base/servicemonitor.yaml
- components/manifests/observability/grafana.yaml
- components/manifests/observability/kustomization.yaml
- components/manifests/observability/overlays/with-grafana/dashboards/README.md
- components/manifests/observability/overlays/with-grafana/dashboards/ambient-operator-dashboard.json
- components/manifests/observability/overlays/with-grafana/dashboards/k8s-cluster-monitoring.json
- components/manifests/observability/overlays/with-grafana/dashboards/k8s-namespace.json
- components/manifests/observability/overlays/with-grafana/dashboards/k8s-nodes.json
- components/manifests/observability/overlays/with-grafana/dashboards/k8s-pods.json
- components/manifests/observability/overlays/with-grafana/grafana-datasource-patch.yaml
- components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml
- components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml
- components/manifests/observability/overlays/with-grafana/grafana-rbac.yaml
- components/manifests/observability/overlays/with-grafana/grafana.yaml
- components/manifests/observability/overlays/with-grafana/kustomization.yaml
💤 Files with no reviewable changes (1)
- components/manifests/observability/grafana.yaml
```yaml
- name: inject-token
  image: registry.access.redhat.com/ubi9/ubi-minimal:latest
  command: ["sh", "-c"]
  args:
    - |
      TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
      sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
```
Token injection via sed may break with special characters.
If the ServiceAccount token contains / or & characters, the sed substitution will fail or produce incorrect output. Consider using a different delimiter or a safer substitution method.
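To make the failure mode concrete, here is a small standalone sketch. The template path placeholder mirrors the patch; the token value is a made-up stand-in deliberately chosen to contain `/` (real projected ServiceAccount tokens are base64url-encoded JWTs, but tokens from other sources can carry such characters):

```shell
#!/bin/sh
# Stand-in token containing '/' — not a real ServiceAccount token.
TOKEN='eyJhbGciOiJSUzI1NiJ9.payload/data.sig'

printf 'token: INJECT_TOKEN_HERE\n' > /tmp/datasources-template.yaml

# sed with the default '/' delimiter misparses the replacement as
# extra delimiters plus bogus flags and exits with an error:
if ! sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /tmp/datasources-template.yaml >/dev/null 2>&1; then
  echo "sed with '/' delimiter failed"
fi

# awk receives the token as a variable, so no delimiter can collide:
awk -v token="$TOKEN" '{gsub(/INJECT_TOKEN_HERE/, token)}1' \
  /tmp/datasources-template.yaml > /tmp/datasources.yaml
cat /tmp/datasources.yaml
```

Note that awk's `gsub` still treats `&` in the replacement specially, so a fully literal substitution would need that escaped as well.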
🔧 Safer token injection
```diff
   args:
     - |
       TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
-      sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
+      # Use awk to avoid sed delimiter issues with tokens
+      awk -v token="$TOKEN" '{gsub(/INJECT_TOKEN_HERE/, token)}1' /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
```
Alternatively, use a different sed delimiter that's unlikely to appear in JWT tokens:
```diff
-      sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
+      sed "s|INJECT_TOKEN_HERE|${TOKEN}|" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
```
🧰 Tools
🪛 Trivy (0.69.3)
[warning] 10-21: Can elevate its own privileges
Container 'inject-token' of Deployment 'grafana' should set 'securityContext.allowPrivilegeEscalation' to false
Rule: KSV-0001
(IaC/Kubernetes)
[info] 10-21: Default capabilities: some containers do not drop all
Container 'inject-token' of Deployment 'grafana' should add 'ALL' to 'securityContext.capabilities.drop'
Rule: KSV-0003
(IaC/Kubernetes)
[info] 10-21: Default capabilities: some containers do not drop any
Container 'inject-token' of 'deployment' 'grafana' in 'default' namespace should set securityContext.capabilities.drop
Rule: KSV-0004
(IaC/Kubernetes)
[info] 10-21: CPU not limited
Container 'inject-token' of Deployment 'grafana' should set 'resources.limits.cpu'
Rule: KSV-0011
(IaC/Kubernetes)
[warning] 10-21: Runs as root user
Container 'inject-token' of Deployment 'grafana' should set 'securityContext.runAsNonRoot' to true
Rule: KSV-0012
(IaC/Kubernetes)
[warning] 10-21: Image tag ":latest" used
Container 'inject-token' of Deployment 'grafana' should specify an image tag
Rule: KSV-0013
(IaC/Kubernetes)
[error] 10-21: Root file system is not read-only
Container 'inject-token' of Deployment 'grafana' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[info] 10-21: CPU requests not specified
Container 'inject-token' of Deployment 'grafana' should set 'resources.requests.cpu'
Rule: KSV-0015
(IaC/Kubernetes)
[info] 10-21: Memory requests not specified
Container 'inject-token' of Deployment 'grafana' should set 'resources.requests.memory'
Rule: KSV-0016
(IaC/Kubernetes)
[info] 10-21: Memory not limited
Container 'inject-token' of Deployment 'grafana' should set 'resources.limits.memory'
Rule: KSV-0018
(IaC/Kubernetes)
[info] 10-21: Runs with UID <= 10000
Container 'inject-token' of Deployment 'grafana' should set 'securityContext.runAsUser' > 10000
Rule: KSV-0020
(IaC/Kubernetes)
[info] 10-21: Runs with GID <= 10000
Container 'inject-token' of Deployment 'grafana' should set 'securityContext.runAsGroup' > 10000
Rule: KSV-0021
(IaC/Kubernetes)
[info] 10-21: Runtime/Default Seccomp profile not set
Either Pod or Container should set 'securityContext.seccompProfile.type' to 'RuntimeDefault'
Rule: KSV-0030
(IaC/Kubernetes)
[warning] 10-21: Seccomp policies disabled
container "inject-token" of deployment "grafana" in "default" namespace should specify a seccomp profile
Rule: KSV-0104
(IaC/Kubernetes)
[info] 10-21: Container capabilities must only include NET_BIND_SERVICE
container should drop all
Rule: KSV-0106
(IaC/Kubernetes)
[error] 10-21: Default security context configured
container grafana in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[warning] 10-21: Restrict container images to trusted registries
Container inject-token in deployment grafana (namespace: default) uses an image from an untrusted registry.
Rule: KSV-0125
(IaC/Kubernetes)
```yaml
      serviceAccountName: grafana
      initContainers:
        - name: inject-token
          image: registry.access.redhat.com/ubi9/ubi-minimal:latest
```
🧹 Nitpick | 🔵 Trivial
Pin init container image tags for reproducibility.
Using :latest tag for ubi9/ubi-minimal makes builds non-reproducible and could introduce unexpected changes.
📌 Suggested fix
```diff
   - name: inject-token
-    image: registry.access.redhat.com/ubi9/ubi-minimal:latest
+    image: registry.access.redhat.com/ubi9/ubi-minimal:9.4
   - name: collect-dashboards
-    image: registry.access.redhat.com/ubi9/ubi-minimal:latest
+    image: registry.access.redhat.com/ubi9/ubi-minimal:9.4
```
Also applies to: 23-23
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-storage
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 5Gi
```
Add explicit namespace to avoid deployment issues.
The PVC is applied directly via kubectl apply -f (not through Kustomize), so it won't inherit the namespace from the kustomization. Without an explicit namespace, the PVC will be created in the current kubectl context's namespace, which may not be ambient-code.
🔧 Proposed fix
```diff
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
   name: grafana-storage
+  namespace: ambient-code
 spec:
   accessModes: [ReadWriteOnce]
   resources:
     requests:
       storage: 5Gi
```
storage: 5Gi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| apiVersion: v1 | |
| kind: PersistentVolumeClaim | |
| metadata: | |
| name: grafana-storage | |
| spec: | |
| accessModes: [ReadWriteOnce] | |
| resources: | |
| requests: | |
| storage: 5Gi | |
| apiVersion: v1 | |
| kind: PersistentVolumeClaim | |
| metadata: | |
| name: grafana-storage | |
| namespace: ambient-code | |
| spec: | |
| accessModes: [ReadWriteOnce] | |
| resources: | |
| requests: | |
| storage: 5Gi |
🧰 Tools
🪛 Trivy (0.69.3)
[info] 5-9: limit range usage
A LimitRange policy with a default requests and limits for each container should be configured
Rule: KSV-0039
(IaC/Kubernetes)
```yaml
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:11.4.0
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: GF_SECURITY_ADMIN_USER
              value: admin
            - name: GF_USERS_ALLOW_SIGN_UP
              value: "false"
          volumeMounts:
            - name: datasources
              mountPath: /etc/grafana/provisioning/datasources
            - name: dashboards-provider
              mountPath: /etc/grafana/provisioning/dashboards/dashboards.yaml
              subPath: dashboards.yaml
            - name: storage
              mountPath: /var/lib/grafana
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
```
🧹 Nitpick | 🔵 Trivial
Add security context to harden the Grafana container.
The deployment lacks a securityContext for the pod and container. For production workloads, consider adding:
🛡️ Suggested security hardening
```diff
     spec:
+      securityContext:
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
       containers:
         - name: grafana
           image: grafana/grafana:11.4.0
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop:
+                - ALL
+            readOnlyRootFilesystem: false # Grafana needs write access
           ports:
             - containerPort: 3000
```
- containerPort: 3000📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| spec: | |
| containers: | |
| - name: grafana | |
| image: grafana/grafana:11.4.0 | |
| ports: | |
| - containerPort: 3000 | |
| name: http | |
| env: | |
| - name: GF_SECURITY_ADMIN_USER | |
| value: admin | |
| - name: GF_USERS_ALLOW_SIGN_UP | |
| value: "false" | |
| volumeMounts: | |
| - name: datasources | |
| mountPath: /etc/grafana/provisioning/datasources | |
| - name: dashboards-provider | |
| mountPath: /etc/grafana/provisioning/dashboards/dashboards.yaml | |
| subPath: dashboards.yaml | |
| - name: storage | |
| mountPath: /var/lib/grafana | |
| resources: | |
| requests: | |
| cpu: 100m | |
| memory: 128Mi | |
| limits: | |
| cpu: 500m | |
| memory: 512Mi | |
| spec: | |
| securityContext: | |
| runAsNonRoot: true | |
| seccompProfile: | |
| type: RuntimeDefault | |
| containers: | |
| - name: grafana | |
| image: grafana/grafana:11.4.0 | |
| securityContext: | |
| allowPrivilegeEscalation: false | |
| capabilities: | |
| drop: | |
| - ALL | |
| readOnlyRootFilesystem: false # Grafana needs write access | |
| ports: | |
| - containerPort: 3000 | |
| name: http | |
| env: | |
| - name: GF_SECURITY_ADMIN_USER | |
| value: admin | |
| - name: GF_USERS_ALLOW_SIGN_UP | |
| value: "false" | |
| volumeMounts: | |
| - name: datasources | |
| mountPath: /etc/grafana/provisioning/datasources | |
| - name: dashboards-provider | |
| mountPath: /etc/grafana/provisioning/dashboards/dashboards.yaml | |
| subPath: dashboards.yaml | |
| - name: storage | |
| mountPath: /var/lib/grafana | |
| resources: | |
| requests: | |
| cpu: 100m | |
| memory: 128Mi | |
| limits: | |
| cpu: 500m | |
| memory: 512Mi |
🧰 Tools
🪛 Trivy (0.69.3)
[warning] 46-70: Can elevate its own privileges
Container 'grafana' of Deployment 'grafana' should set 'securityContext.allowPrivilegeEscalation' to false
Rule: KSV-0001
(IaC/Kubernetes)
[info] 46-70: Default capabilities: some containers do not drop all
Container 'grafana' of Deployment 'grafana' should add 'ALL' to 'securityContext.capabilities.drop'
Rule: KSV-0003
(IaC/Kubernetes)
[info] 46-70: Default capabilities: some containers do not drop any
Container 'grafana' of 'deployment' 'grafana' in 'default' namespace should set securityContext.capabilities.drop
Rule: KSV-0004
(IaC/Kubernetes)
[warning] 46-70: Runs as root user
Container 'grafana' of Deployment 'grafana' should set 'securityContext.runAsNonRoot' to true
Rule: KSV-0012
(IaC/Kubernetes)
[error] 46-70: Root file system is not read-only
Container 'grafana' of Deployment 'grafana' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[info] 46-70: Runs with UID <= 10000
Container 'grafana' of Deployment 'grafana' should set 'securityContext.runAsUser' > 10000
Rule: KSV-0020
(IaC/Kubernetes)
[info] 46-70: Runs with GID <= 10000
Container 'grafana' of Deployment 'grafana' should set 'securityContext.runAsGroup' > 10000
Rule: KSV-0021
(IaC/Kubernetes)
[info] 46-70: Runtime/Default Seccomp profile not set
Either Pod or Container should set 'securityContext.seccompProfile.type' to 'RuntimeDefault'
Rule: KSV-0030
(IaC/Kubernetes)
[warning] 46-70: Seccomp policies disabled
container "grafana" of deployment "grafana" in "default" namespace should specify a seccomp profile
Rule: KSV-0104
(IaC/Kubernetes)
[info] 46-70: Container capabilities must only include NET_BIND_SERVICE
container should drop all
Rule: KSV-0106
(IaC/Kubernetes)
[error] 46-70: Default security context configured
container grafana in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[warning] 46-70: Restrict container images to trusted registries
Container grafana in deployment grafana (namespace: default) uses an image from an untrusted registry.
Rule: KSV-0125
(IaC/Kubernetes)
```yaml
      containers:
        - name: grafana
          image: grafana/grafana:11.4.0
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: GF_SECURITY_ADMIN_USER
              value: admin
            - name: GF_USERS_ALLOW_SIGN_UP
              value: "false"
          volumeMounts:
            - name: datasources
              mountPath: /etc/grafana/provisioning/datasources
            - name: dashboards-provider
              mountPath: /etc/grafana/provisioning/dashboards/dashboards.yaml
              subPath: dashboards.yaml
            - name: storage
              mountPath: /var/lib/grafana
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
```
🧹 Nitpick | 🔵 Trivial
Consider adding liveness and readiness probes.
The Grafana container lacks health probes, which can impact pod lifecycle management and rolling updates.
💡 Suggested probe configuration
```diff
   - name: grafana
     image: grafana/grafana:11.4.0
     ports:
       - containerPort: 3000
         name: http
+    livenessProbe:
+      httpGet:
+        path: /api/health
+        port: 3000
+      initialDelaySeconds: 30
+      periodSeconds: 10
+    readinessProbe:
+      httpGet:
+        path: /api/health
+        port: 3000
+      initialDelaySeconds: 10
+      periodSeconds: 5
```
+ periodSeconds: 5🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/observability/overlays/with-grafana/grafana.yaml` around
lines 47 - 72, Add livenessProbe and readinessProbe to the grafana container
spec (container name "grafana") so Kubernetes can manage pod health; configure
both to use an httpGet on path /api/health (or /api/healthz) against port named
"http" (3000) with sensible timing (e.g., initialDelaySeconds ~10-30,
periodSeconds ~10-15, timeoutSeconds ~1-5, failureThreshold ~3) and ensure
readinessProbe returns success before traffic is routed. Update the manifest
under the container "grafana" to include these probe blocks and tune values to
match startup characteristics.
Hi @jeremyeder,
Signed-off-by: Rahul Shetty <rashetty@redhat.com>
Signed-off-by: Rahul Shetty <rashetty@redhat.com>
Signed-off-by: Rahul Shetty <rashetty@redhat.com>
Force-pushed f6da902 to d6d861f
Actionable comments posted: 4
♻️ Duplicate comments (1)
components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml (1)
3-5: ⚠️ Potential issue | 🟠 Major
Set an explicit namespace on the PVC manifest.
Because this file is applied directly, the claim can be created in the wrong namespace when kube context differs, causing Grafana to miss the expected claim.
🔧 Proposed fix
```diff
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
   name: grafana-storage
+  namespace: ambient-code
 spec:
   accessModes: [ReadWriteOnce]
   resources:
     requests:
       storage: 5Gi
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@components/manifests/observability/overlays/with-grafana/dashboards/k8s-pods.json`:
- Line 2149: The PromQL expressions like
kube_pod_container_status_waiting_reason{reason=~"ErrImagePull|ImagePullBackOff|CrashLoopBackOff"}
must include the dashboard variables for cluster and namespace so results
respect the selected scope; update that selector to include cluster=~"$cluster"
and namespace=~"$namespace" (e.g.
kube_pod_container_status_waiting_reason{cluster=~"$cluster",
namespace=~"$namespace",
reason=~"ErrImagePull|ImagePullBackOff|CrashLoopBackOff"}) and apply the same
change to the other similar query at the second occurrence so both tables filter
by the selected cluster and namespace.
In
`@components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml`:
- Around line 15-16: The init-container currently snapshots the ServiceAccount
token via the sed call that reads
/var/run/secrets/kubernetes.io/serviceaccount/token and writes
/etc/grafana-provisioning/datasources.yaml (the sed
"s/INJECT_TOKEN_HERE/${TOKEN}/" step), which becomes stale after kubelet token
rotation; remove that init-only token snapshot and instead add a small sidecar
(token-refresh) that watches the live token file
(/var/run/secrets/kubernetes.io/serviceaccount/token) and atomically rewrites
/etc/grafana-provisioning/datasources.yaml from the template at
/etc/grafana-base/datasources.yaml whenever the token changes (or use an
in-place updater loop with sleep/watch+cmp), ensuring the sidecar has the same
shared volume mounts as the Grafana container; alternatively, mount a projected
serviceAccountToken volume and modify Grafana provisioning to read the token
file at runtime if supported—locate the current sed invocation in the
initContainer and replace it with the sidecar implementation that updates the
same target path.
In `@components/manifests/observability/overlays/with-grafana/grafana.yaml`:
- Around line 53-57: The Deployment currently sets GF_SECURITY_ADMIN_USER and
GF_USERS_ALLOW_SIGN_UP but leaves the admin password at Grafana's default; add
an env var GF_SECURITY_ADMIN_PASSWORD that sources its value from a Kubernetes
Secret (e.g., secret name like grafana-admin-secret, key like admin-password)
using valueFrom.secretKeyRef in the container spec so the password is not
hardcoded or defaulted; ensure the Secret is created separately and referenced
by the env entry alongside GF_SECURITY_ADMIN_USER and GF_USERS_ALLOW_SIGN_UP.
In `@components/manifests/observability/overlays/with-grafana/kustomization.yaml`:
- Around line 36-37: The kustomization currently sets
generatorOptions.disableNameSuffixHash: true which prevents ConfigMap names from
changing and thus prevents the init-container that copies dashboards from
triggering pod rollouts; to fix, remove or set
generatorOptions.disableNameSuffixHash to false so kustomize appends content
hashes to generated ConfigMap names (causing Deployment changes), or
alternatively add an explicit rollout trigger by updating the Deployment pod
template (the init-container pod spec) to include an annotation like a checksum
of the dashboard ConfigMap(s) (e.g., checksum/config-dashboards) that is
recalculated when the ConfigMap content changes so pods are recreated and the
init-container picks up updated dashboards.
---
Duplicate comments:
In `@components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml`:
- Around line 3-5: The PVC manifest lacks an explicit namespace so it can be
created in the wrong context; update the grafana-pvc.yaml by adding
metadata.namespace with the Grafana release namespace (i.e. the namespace where
the Grafana Deployment/Helm release runs) alongside metadata.name:
grafana-storage so the PersistentVolumeClaim is always created in the correct
namespace; ensure this namespace matches the Grafana deployment/service account
namespace referenced in your overlays.
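The kustomization point above (re-enabling the name-suffix hash so dashboard changes roll the pods) can be sketched as a kustomization fragment. This is illustrative only: the generator name and file list are assumptions, not taken from the PR.

```yaml
# Hypothetical sketch: without disableNameSuffixHash, kustomize appends a
# content hash to the generated ConfigMap name (e.g. grafana-dashboards-7h2k9f8b4t)
# and rewrites the Deployment's volume reference to match, so any change to a
# dashboard file produces a new pod template and triggers a rollout.
configMapGenerator:
  - name: grafana-dashboards        # illustrative name
    files:
      - dashboards/k8s-pods.json
# generatorOptions:
#   disableNameSuffixHash: true     # <- this is what suppresses the rollout
```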
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9e7e669a-5ca5-4276-a70f-6d5894619be1
```json
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_pod_container_status_waiting_reason{reason=~\"ErrImagePull|ImagePullBackOff|CrashLoopBackOff\"} == 1",
```
Respect the selected cluster and namespace in these tables.
Lines 2149 and 2310 omit $cluster and $namespace, so this dashboard can show incident rows from unrelated namespaces or even other clusters after the user narrows the view. That makes both tables misleading in a shared Prometheus.
Suggested query fix
```diff
- "expr": "kube_pod_container_status_waiting_reason{reason=~\"ErrImagePull|ImagePullBackOff|CrashLoopBackOff\"} == 1",
+ "expr": "kube_pod_container_status_waiting_reason{cluster=\"$cluster\", namespace=\"$namespace\", reason=~\"ErrImagePull|ImagePullBackOff|CrashLoopBackOff\"} == 1",
- "expr": "kube_pod_status_scheduled{condition=\"false\"} == 1",
+ "expr": "kube_pod_status_scheduled{cluster=\"$cluster\", namespace=\"$namespace\", condition=\"false\"} == 1",
```
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
Also applies to: 2310-2310
```shell
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
```
🧩 Analysis chain
🏁 Script executed:
cat -n "components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml"

Repository: ambient-code/platform
Length of output: 3308
🌐 Web query:
Kubernetes projected ServiceAccount token rotation behavior
💡 Result:
Kubernetes projected ServiceAccount tokens are short-lived JWTs obtained via the TokenRequest API and mounted into pods as projected volumes. The kubelet manages their lifecycle, automatically requesting new tokens and updating the mounted files. Key rotation behavior:

- Tokens have a configurable expirationSeconds (default 1 hour, minimum 10 minutes, max limited by --service-account-max-token-expiration).
- The kubelet proactively refreshes the token if it is older than 80% of its TTL or older than 24 hours.
- The token file is updated atomically; applications must reload the file to use the new token (e.g., periodically or on file change).
- Tokens are bound to the pod: invalidated on pod deletion (within a ~60s grace period).
- By default (since v1.22), the ServiceAccount admission controller injects a projected volume (kube-api-access-*) with a token for API access.

Example pod spec:

    volumes:
    - name: token
      projected:
        sources:
        - serviceAccountToken:
            expirationSeconds: 3600
            audience: api

Applications should read the token from the mount path (e.g., /var/run/secrets/kubernetes.io/serviceaccount/token) on a schedule (e.g., every 5 minutes) or watch for file changes, as Kubernetes clients handle this automatically. Legacy non-expiring secret-based tokens are deprecated and cleaned up in modern clusters (v1.24+). This has been stable since Kubernetes v1.20 (TokenRequestProjection GA).
Citations:
- 1: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
- 2: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/
- 3: http://k8s.io/docs/reference/access-authn-authz/service-accounts-admin/
- 4: http://k8s.io/docs/concepts/storage/projected-volumes/
🌐 Web query:
OpenShift token rotation init container lifecycle
💡 Result:
No specific official documentation or authoritative source describes a standard "OpenShift token rotation init container lifecycle." Searches across official Red Hat OpenShift docs, OKD docs, Kubernetes docs, GitHub issues, and bugzillas yielded no direct matches for this exact concept. Key related findings:

- Bound service account (SA) tokens in OpenShift (docs from 4.8 to 4.20) are automatically rotated by the kubelet when older than 80% of expirationSeconds or 24 hours. Apps must reload the token file. No init container involvement; rotation happens during main container runtime via projected volumes.
- Init containers run sequentially to completion before main containers start, sharing pod resources but with separate images. They are ideal for setup (e.g., waiting for services, cloning repos) but not ongoing rotation, as they don't run continuously.
- In the Cluster API (CAPI) operator (TechPreview), a controller rotates token secrets every 30 minutes if older (Bug 2089254, fixed in 4.11). No mention of init containers in the operator pod spec.
- Custom patterns exist: token refreshers (e.g., kube-token-refresher, cap-token-refresh cronjob) and External Secrets Operator for external secret sync/rotation, but none use init containers specifically for OpenShift SA token rotation.

Conclusion: Token rotation typically uses bound projected volumes (automatic, kubelet-managed) or operators/controllers. Init containers are unsuitable for a rotation lifecycle due to their one-time execution nature. Custom init container implementations for token fetch/rotation are possible but non-standard, undocumented officially, and would require pod restarts for refresh (defeating the purpose of rotation).
Citations:
- 1: https://docs.okd.io/4.16/nodes/containers/nodes-containers-init.html
- 2: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
- 3: https://docs.openshift.com/container-platform/4.8/authentication/bound-service-account-tokens.html
- 4: https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/authentication_and_authorization/bound-service-account-tokens
- 5: https://bugzilla.redhat.com/show_bug.cgi?id=2089254
- 6: https://linkedin.com/company/init-6-inc
- 7: https://docs.openshift.com/container-platform/4.20/authentication/understanding-and-creating-service-accounts.html
- 8: https://docs.okd.io/4.20/authentication/configuring-internal-oauth.html
- 9: https://github.com/openshift/cluster-capi-operator
Init-container token snapshot will become stale after automatic token rotation.
Lines 15–16 read the ServiceAccount token once during init-container execution and snapshot it into datasources.yaml. On current Kubernetes/OpenShift releases, this token is automatically rotated by the kubelet (default ~1 hour), but the snapshot is never updated. Once the token expires, Grafana's Prometheus datasource authentication fails. Restarting the Grafana container does not help because init containers only run at pod creation time; the pod must be recreated to obtain a fresh token. Use a token-refresh mechanism (e.g., a sidecar or operator) instead of an init-only snapshot.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml`
around lines 15 - 16, The init-container currently snapshots the ServiceAccount
token via the sed call that reads
/var/run/secrets/kubernetes.io/serviceaccount/token and writes
/etc/grafana-provisioning/datasources.yaml (the sed
"s/INJECT_TOKEN_HERE/${TOKEN}/" step), which becomes stale after kubelet token
rotation; remove that init-only token snapshot and instead add a small sidecar
(token-refresh) that watches the live token file
(/var/run/secrets/kubernetes.io/serviceaccount/token) and atomically rewrites
/etc/grafana-provisioning/datasources.yaml from the template at
/etc/grafana-base/datasources.yaml whenever the token changes (or use an
in-place updater loop with sleep/watch+cmp), ensuring the sidecar has the same
shared volume mounts as the Grafana container; alternatively, mount a projected
serviceAccountToken volume and modify Grafana provisioning to read the token
file at runtime if supported—locate the current sed invocation in the
initContainer and replace it with the sidecar implementation that updates the
same target path.
```yaml
env:
- name: GF_SECURITY_ADMIN_USER
  value: admin
- name: GF_USERS_ALLOW_SIGN_UP
  value: "false"
```
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "grafana.yaml" -o -name "grafana-deployment-patch.yaml" | head -20

Repository: ambient-code/platform
Length of output: 225
🏁 Script executed:
cat -n components/manifests/observability/overlays/with-grafana/grafana.yaml | head -100

Repository: ambient-code/platform
Length of output: 2767
🏁 Script executed:
cat -n components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml

Repository: ambient-code/platform
Length of output: 3308
🏁 Script executed:
# Search for any Secret references in the overlays directory
rg -i "secret" components/manifests/observability/overlays/with-grafana/ --context 3

Repository: ambient-code/platform
Length of output: 2044
🌐 Web query:
Grafana admin password environment variable GF_SECURITY_ADMIN_PASSWORD default behavior
💡 Result:
The environment variable GF_SECURITY_ADMIN_PASSWORD overrides the [security] admin_password setting in Grafana's configuration. By default, without this variable set (or the ini equivalent), the admin password is "admin". However, it is only applied on first run, during initial database initialization when the default admin user is created. On subsequent runs, especially with persistent data storage (e.g., Docker volumes), the password is already set in the database and cannot be overridden by the environment variable; it must be changed via the UI, CLI, or API. To use it effectively:

- Ensure a fresh data directory (empty /var/lib/grafana).
- Set via Docker: docker run -e "GF_SECURITY_ADMIN_PASSWORD=yourpassword" grafana/grafana
- defaults.ini explicitly sets admin_password = admin.

Sources confirm consistent behavior across versions, with issues arising from persistence.
Citations:
- 1: https://grafana.com/docs/grafana/v4.4/installation/configuration/
- 2: https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/
- 3: https://grafana.com/docs/grafana/v8.4/administration/configure-docker/
- 4: https://raw.githubusercontent.com/grafana/grafana/main/conf/defaults.ini
- 5: https://www.grafana.com/docs/grafana/v12.4/setup-grafana/configure-docker/
- 6: GF_SECURITY_ADMIN_PASSWORD issue grafana/grafana-docker#37
- 7: https://grafana.com/docs/grafana/v6.4/installation/docker/
- 8: [Bug] Security config is not affected by ENV variables in Docker container grafana/grafana#4825
🏁 Script executed:
find components/manifests/observability/overlays/with-grafana -name "kustomization.yaml" -o -name "kustomization.yml"

Repository: ambient-code/platform
Length of output: 140
🏁 Script executed:
find components/manifests/observability/overlays/with-grafana -type f -name "*.yaml" | xargs basename -a | sort | uniq

Repository: ambient-code/platform
Length of output: 191
🏁 Script executed:
cat -n components/manifests/observability/overlays/with-grafana/kustomization.yaml

Repository: ambient-code/platform
Length of output: 1285
Configure admin password from a Secret instead of relying on Grafana's built-in default.
Lines 54-57 set the admin username to admin but do not configure the admin password. With this Deployment configuration and persistent storage at /var/lib/grafana, a fresh Grafana instance will bootstrap with Grafana's default admin/admin credential, creating a known security vulnerability on first deploy. Add GF_SECURITY_ADMIN_PASSWORD sourced from a Kubernetes Secret instead of allowing the default to persist.
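A minimal sketch of that env wiring, assuming the Secret is created out-of-band (the Secret name `grafana-admin-credentials` and key `admin-password` are placeholders, not names from this PR):

```yaml
env:
  - name: GF_SECURITY_ADMIN_USER
    value: admin
  - name: GF_SECURITY_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: grafana-admin-credentials   # placeholder; create separately, e.g. via `oc create secret generic`
        key: admin-password
  - name: GF_USERS_ALLOW_SIGN_UP
    value: "false"
```

Note that with persistent storage Grafana applies this value only on first initialization of /var/lib/grafana, so the Secret should exist before the initial deploy; changing it afterwards requires resetting the password via the UI, CLI, or API.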
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/observability/overlays/with-grafana/grafana.yaml` around
lines 53 - 57, The Deployment currently sets GF_SECURITY_ADMIN_USER and
GF_USERS_ALLOW_SIGN_UP but leaves the admin password at Grafana's default; add
an env var GF_SECURITY_ADMIN_PASSWORD that sources its value from a Kubernetes
Secret (e.g., secret name like grafana-admin-secret, key like admin-password)
using valueFrom.secretKeyRef in the container spec so the password is not
hardcoded or defaulted; ensure the Secret is created separately and referenced
by the env entry alongside GF_SECURITY_ADMIN_USER and GF_USERS_ALLOW_SIGN_UP.
```yaml
generatorOptions:
  disableNameSuffixHash: true
```
🧩 Analysis chain
🏁 Script executed:
# First, find and examine the target kustomization.yaml file
find . -path "*components/manifests/observability/overlays/with-grafana/kustomization.yaml" -type f

Repository: ambient-code/platform
Length of output: 142
🏁 Script executed:
# Check the structure of the observability directory
find ./components/manifests/observability -type f -name "*.yaml" | head -20

Repository: ambient-code/platform
Length of output: 786
🏁 Script executed:
# Examine the kustomization.yaml file content
if [ -f "components/manifests/observability/overlays/with-grafana/kustomization.yaml" ]; then
  cat -n "components/manifests/observability/overlays/with-grafana/kustomization.yaml"
fi

Repository: ambient-code/platform
Length of output: 1285
🏁 Script executed:
# Search for init-container references and ConfigMap copying logic
rg -i "init.*container|copy.*configmap|dashboard" --type yaml -A 3 -B 3 ./components/manifests/observability/

Repository: ambient-code/platform
Length of output: 14768
🏁 Script executed:
# Look for Deployments that might reference the grafana overlay
rg "kind.*Deployment" --type yaml -A 10 ./components/manifests/observability/

Repository: ambient-code/platform
Length of output: 3094
Dashboard ConfigMap updates won't trigger pod rollouts with disableNameSuffixHash: true.
Since the init-container copies dashboards from ConfigMaps at pod startup and those ConfigMaps have stable names, content changes won't propagate to existing pods. The Deployment spec remains unchanged when the ConfigMap content is updated, so no new pods are created to fetch the updated dashboard JSON files.
To ensure dashboard updates roll out automatically, remove the hash suffix disablement so ConfigMap names change with content. Alternatively, implement an explicit rollout trigger (such as a pod restart annotation or webhook) whenever dashboard ConfigMaps are updated.
🔧 Proposed fix

```diff
-generatorOptions:
-  disableNameSuffixHash: true
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

The suggestion removes these lines:

```yaml
generatorOptions:
  disableNameSuffixHash: true
```
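If the stable ConfigMap names are kept, the rollout-trigger alternative mentioned above can be sketched as a pod-template annotation whose value changes with the dashboard content (the annotation key `checksum/dashboards` and the checksum mechanism are illustrative, not something this PR provides):

```yaml
spec:
  template:
    metadata:
      annotations:
        # Recomputed by build tooling, e.g. sha256 over the dashboard JSON files;
        # any change to this value forces the Deployment to roll new pods.
        checksum/dashboards: "<sha256 of the rendered dashboard ConfigMaps>"
```

The hash-suffix approach is usually simpler because kustomize both renames the ConfigMaps and rewrites the references in the Deployment, so the rollout happens without extra tooling.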
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/observability/overlays/with-grafana/kustomization.yaml`
around lines 36 - 37, The kustomization currently sets
generatorOptions.disableNameSuffixHash: true which prevents ConfigMap names from
changing and thus prevents the init-container that copies dashboards from
triggering pod rollouts; to fix, remove or set
generatorOptions.disableNameSuffixHash to false so kustomize appends content
hashes to generated ConfigMap names (causing Deployment changes), or
alternatively add an explicit rollout trigger by updating the Deployment pod
template (the init-container pod spec) to include an annotation like a checksum
of the dashboard ConfigMap(s) (e.g., checksum/config-dashboards) that is
recalculated when the ConfigMap content changes so pods are recreated and the
init-container picks up updated dashboards.
Summary

- Moves Grafana resources (grafana.yaml, dashboards, RBAC) into the `with-grafana` overlay
- Mounts the PVC at `/var/lib/grafana` (instead of `/var/lib/grafana/data`) so `grafana.db` persists across restarts
- Removes the `GF_SECURITY_ADMIN_PASSWORD` env var that was resetting the password on every restart
- `make clean-observability` preserves Grafana data
Test plan
make add-grafanadeploys without errorsmake clean-observabilityremoves stack but preserves PVC