fix(grafana): Resolve issue with add-grafana make command & add new K8s dashboard #1018
rh-rahulshetty wants to merge 3 commits into ambient-code:main from
Conversation
Walkthrough
Restructures observability manifests: moves core resources into a base kustomize, removes the monolithic Grafana manifest, adds Grafana overlay files (RBAC, deployment patch, PVC, datasources, dashboards), introduces dashboard provisioning via kustomize ConfigMaps, and updates the Makefile/README to apply and preserve the Grafana PVC.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Review Queue Status
Action needed: Fix failing CI checks
Hi @rh-rahulshetty this is awesome - can you rebase? Also has failing e2e. Does this depend on any other PRs or can it merge now?
Force-pushed b226ef6 to f6da902
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml`:
- Line 11: The init container image currently uses an unpinned tag
"registry.access.redhat.com/ubi9/ubi-minimal:latest" which breaks
reproducibility; change it to a fixed, immutable reference (either a specific
version tag or an image digest) by replacing
"registry.access.redhat.com/ubi9/ubi-minimal:latest" in the grafana deployment
patch with a pinned tag or digest (for example a known stable UBI9 tag or sha256
digest) so the init container image is deterministic across builds; update all
occurrences (including the other instance noted at the same file) of this image
string.
- Around line 10-16: The current container "inject-token" injects the
ServiceAccount token into /etc/grafana-provisioning/datasources.yaml using sed
with default delimiter, which can break if the token contains "/" or "&"; update
the injection in the inject-token container (the command/args that build TOKEN
and run sed) to use a safer substitution method (e.g., use a different sed
delimiter such as | or @, or write the token into the target file via a
here-doc/perl/awk that performs literal replacement) when replacing
INJECT_TOKEN_HERE in /etc/grafana-base/datasources.yaml and ensure the output is
written to /etc/grafana-provisioning/datasources.yaml.
In `@components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml`:
- Around line 1-9: The PersistentVolumeClaim resource (kind:
PersistentVolumeClaim, metadata.name: grafana-storage) lacks an explicit
namespace; add a metadata.namespace: ambient-code entry to the manifest so the
PVC is created in the expected ambient-code namespace when applied directly with
kubectl.
In `@components/manifests/observability/overlays/with-grafana/grafana.yaml`:
- Around line 47-72: Add livenessProbe and readinessProbe to the grafana
container spec (container name "grafana") so Kubernetes can manage pod health;
configure both to use an httpGet on path /api/health (or /api/healthz) against
port named "http" (3000) with sensible timing (e.g., initialDelaySeconds ~10-30,
periodSeconds ~10-15, timeoutSeconds ~1-5, failureThreshold ~3) and ensure
readinessProbe returns success before traffic is routed. Update the manifest
under the container "grafana" to include these probe blocks and tune values to
match startup characteristics.
- Around line 46-72: Add a pod- and container-level securityContext to harden
the Grafana deployment: under spec (pod) add a securityContext with fsGroup and
runAsNonRoot and runAsUser values, and under the grafana container (name:
grafana) add a securityContext specifying readOnlyRootFilesystem: true,
allowPrivilegeEscalation: false, and runAsNonRoot/runAsUser to enforce non-root
execution; update volumeMounts or storage as needed so read-only root filesystem
and the chosen fsGroup still allow Grafana to write to /var/lib/grafana and
/etc/grafana/provisioning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e6b10a79-c556-4aad-90a7-e82e0f0c8435
📒 Files selected for processing (19)
- Makefile
- components/manifests/observability/README.md
- components/manifests/observability/base/kustomization.yaml
- components/manifests/observability/base/otel-collector.yaml
- components/manifests/observability/base/servicemonitor.yaml
- components/manifests/observability/grafana.yaml
- components/manifests/observability/kustomization.yaml
- components/manifests/observability/overlays/with-grafana/dashboards/README.md
- components/manifests/observability/overlays/with-grafana/dashboards/ambient-operator-dashboard.json
- components/manifests/observability/overlays/with-grafana/dashboards/k8s-cluster-monitoring.json
- components/manifests/observability/overlays/with-grafana/dashboards/k8s-namespace.json
- components/manifests/observability/overlays/with-grafana/dashboards/k8s-nodes.json
- components/manifests/observability/overlays/with-grafana/dashboards/k8s-pods.json
- components/manifests/observability/overlays/with-grafana/grafana-datasource-patch.yaml
- components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml
- components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml
- components/manifests/observability/overlays/with-grafana/grafana-rbac.yaml
- components/manifests/observability/overlays/with-grafana/grafana.yaml
- components/manifests/observability/overlays/with-grafana/kustomization.yaml
💤 Files with no reviewable changes (1)
- components/manifests/observability/grafana.yaml
```yaml
- name: inject-token
  image: registry.access.redhat.com/ubi9/ubi-minimal:latest
  command: ["sh", "-c"]
  args:
    - |
      TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
      sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
```
Token injection via sed may break with special characters.
If the ServiceAccount token contains / or & characters, the sed substitution will fail or produce incorrect output. Consider using a different delimiter or a safer substitution method.
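To make the failure mode concrete, here is a small standalone sketch. The template path placeholder mirrors the patch; the token value is a made-up stand-in deliberately chosen to contain `/` (real projected ServiceAccount tokens are base64url-encoded JWTs, but tokens from other sources can carry such characters):

```shell
#!/bin/sh
# Stand-in token containing '/' — not a real ServiceAccount token.
TOKEN='eyJhbGciOiJSUzI1NiJ9.payload/data.sig'

printf 'token: INJECT_TOKEN_HERE\n' > /tmp/datasources-template.yaml

# sed with the default '/' delimiter misparses the replacement as
# extra delimiters plus bogus flags and exits with an error:
if ! sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /tmp/datasources-template.yaml >/dev/null 2>&1; then
  echo "sed with '/' delimiter failed"
fi

# awk receives the token as a variable, so no delimiter can collide:
awk -v token="$TOKEN" '{gsub(/INJECT_TOKEN_HERE/, token)}1' \
  /tmp/datasources-template.yaml > /tmp/datasources.yaml
cat /tmp/datasources.yaml
```

Note that awk's `gsub` still treats `&` in the replacement specially, so a fully literal substitution would need that escaped as well.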
🔧 Safer token injection
```diff
   args:
     - |
       TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
-      sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
+      # Use awk to avoid sed delimiter issues with tokens
+      awk -v token="$TOKEN" '{gsub(/INJECT_TOKEN_HERE/, token)}1' /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
```
Alternatively, use a different sed delimiter that's unlikely to appear in JWT tokens:
```diff
-      sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
+      sed "s|INJECT_TOKEN_HERE|${TOKEN}|" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
```
🧰 Tools
🪛 Trivy (0.69.3)
[warning] 10-21: Can elevate its own privileges
Container 'inject-token' of Deployment 'grafana' should set 'securityContext.allowPrivilegeEscalation' to false
Rule: KSV-0001
(IaC/Kubernetes)
[info] 10-21: Default capabilities: some containers do not drop all
Container 'inject-token' of Deployment 'grafana' should add 'ALL' to 'securityContext.capabilities.drop'
Rule: KSV-0003
(IaC/Kubernetes)
[info] 10-21: Default capabilities: some containers do not drop any
Container 'inject-token' of 'deployment' 'grafana' in 'default' namespace should set securityContext.capabilities.drop
Rule: KSV-0004
(IaC/Kubernetes)
[info] 10-21: CPU not limited
Container 'inject-token' of Deployment 'grafana' should set 'resources.limits.cpu'
Rule: KSV-0011
(IaC/Kubernetes)
[warning] 10-21: Runs as root user
Container 'inject-token' of Deployment 'grafana' should set 'securityContext.runAsNonRoot' to true
Rule: KSV-0012
(IaC/Kubernetes)
[warning] 10-21: Image tag ":latest" used
Container 'inject-token' of Deployment 'grafana' should specify an image tag
Rule: KSV-0013
(IaC/Kubernetes)
[error] 10-21: Root file system is not read-only
Container 'inject-token' of Deployment 'grafana' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[info] 10-21: CPU requests not specified
Container 'inject-token' of Deployment 'grafana' should set 'resources.requests.cpu'
Rule: KSV-0015
(IaC/Kubernetes)
[info] 10-21: Memory requests not specified
Container 'inject-token' of Deployment 'grafana' should set 'resources.requests.memory'
Rule: KSV-0016
(IaC/Kubernetes)
[info] 10-21: Memory not limited
Container 'inject-token' of Deployment 'grafana' should set 'resources.limits.memory'
Rule: KSV-0018
(IaC/Kubernetes)
[info] 10-21: Runs with UID <= 10000
Container 'inject-token' of Deployment 'grafana' should set 'securityContext.runAsUser' > 10000
Rule: KSV-0020
(IaC/Kubernetes)
[info] 10-21: Runs with GID <= 10000
Container 'inject-token' of Deployment 'grafana' should set 'securityContext.runAsGroup' > 10000
Rule: KSV-0021
(IaC/Kubernetes)
[info] 10-21: Runtime/Default Seccomp profile not set
Either Pod or Container should set 'securityContext.seccompProfile.type' to 'RuntimeDefault'
Rule: KSV-0030
(IaC/Kubernetes)
[warning] 10-21: Seccomp policies disabled
container "inject-token" of deployment "grafana" in "default" namespace should specify a seccomp profile
Rule: KSV-0104
(IaC/Kubernetes)
[info] 10-21: Container capabilities must only include NET_BIND_SERVICE
container should drop all
Rule: KSV-0106
(IaC/Kubernetes)
[error] 10-21: Default security context configured
container grafana in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[warning] 10-21: Restrict container images to trusted registries
Container inject-token in deployment grafana (namespace: default) uses an image from an untrusted registry.
Rule: KSV-0125
(IaC/Kubernetes)
```yaml
      serviceAccountName: grafana
      initContainers:
        - name: inject-token
          image: registry.access.redhat.com/ubi9/ubi-minimal:latest
```
🧹 Nitpick | 🔵 Trivial
Pin init container image tags for reproducibility.
Using :latest tag for ubi9/ubi-minimal makes builds non-reproducible and could introduce unexpected changes.
📌 Suggested fix
```diff
   - name: inject-token
-    image: registry.access.redhat.com/ubi9/ubi-minimal:latest
+    image: registry.access.redhat.com/ubi9/ubi-minimal:9.4
   - name: collect-dashboards
-    image: registry.access.redhat.com/ubi9/ubi-minimal:latest
+    image: registry.access.redhat.com/ubi9/ubi-minimal:9.4
```
Also applies to: 23-23
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-storage
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 5Gi
```
Add explicit namespace to avoid deployment issues.
The PVC is applied directly via kubectl apply -f (not through Kustomize), so it won't inherit the namespace from the kustomization. Without an explicit namespace, the PVC will be created in the current kubectl context's namespace, which may not be ambient-code.
🔧 Proposed fix
```diff
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
   name: grafana-storage
+  namespace: ambient-code
 spec:
   accessModes: [ReadWriteOnce]
   resources:
     requests:
       storage: 5Gi
```
storage: 5Gi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| apiVersion: v1 | |
| kind: PersistentVolumeClaim | |
| metadata: | |
| name: grafana-storage | |
| spec: | |
| accessModes: [ReadWriteOnce] | |
| resources: | |
| requests: | |
| storage: 5Gi | |
| apiVersion: v1 | |
| kind: PersistentVolumeClaim | |
| metadata: | |
| name: grafana-storage | |
| namespace: ambient-code | |
| spec: | |
| accessModes: [ReadWriteOnce] | |
| resources: | |
| requests: | |
| storage: 5Gi |
🧰 Tools
🪛 Trivy (0.69.3)
[info] 5-9: limit range usage
A LimitRange policy with a default requests and limits for each container should be configured
Rule: KSV-0039
(IaC/Kubernetes)
```yaml
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:11.4.0
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: GF_SECURITY_ADMIN_USER
              value: admin
            - name: GF_USERS_ALLOW_SIGN_UP
              value: "false"
          volumeMounts:
            - name: datasources
              mountPath: /etc/grafana/provisioning/datasources
            - name: dashboards-provider
              mountPath: /etc/grafana/provisioning/dashboards/dashboards.yaml
              subPath: dashboards.yaml
            - name: storage
              mountPath: /var/lib/grafana
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
```
🧹 Nitpick | 🔵 Trivial
Add security context to harden the Grafana container.
The deployment lacks a securityContext for the pod and container. For production workloads, consider adding:
🛡️ Suggested security hardening
```diff
     spec:
+      securityContext:
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
       containers:
         - name: grafana
           image: grafana/grafana:11.4.0
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop:
+                - ALL
+            readOnlyRootFilesystem: false # Grafana needs write access
           ports:
             - containerPort: 3000
```
- containerPort: 3000📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| spec: | |
| containers: | |
| - name: grafana | |
| image: grafana/grafana:11.4.0 | |
| ports: | |
| - containerPort: 3000 | |
| name: http | |
| env: | |
| - name: GF_SECURITY_ADMIN_USER | |
| value: admin | |
| - name: GF_USERS_ALLOW_SIGN_UP | |
| value: "false" | |
| volumeMounts: | |
| - name: datasources | |
| mountPath: /etc/grafana/provisioning/datasources | |
| - name: dashboards-provider | |
| mountPath: /etc/grafana/provisioning/dashboards/dashboards.yaml | |
| subPath: dashboards.yaml | |
| - name: storage | |
| mountPath: /var/lib/grafana | |
| resources: | |
| requests: | |
| cpu: 100m | |
| memory: 128Mi | |
| limits: | |
| cpu: 500m | |
| memory: 512Mi | |
| spec: | |
| securityContext: | |
| runAsNonRoot: true | |
| seccompProfile: | |
| type: RuntimeDefault | |
| containers: | |
| - name: grafana | |
| image: grafana/grafana:11.4.0 | |
| securityContext: | |
| allowPrivilegeEscalation: false | |
| capabilities: | |
| drop: | |
| - ALL | |
| readOnlyRootFilesystem: false # Grafana needs write access | |
| ports: | |
| - containerPort: 3000 | |
| name: http | |
| env: | |
| - name: GF_SECURITY_ADMIN_USER | |
| value: admin | |
| - name: GF_USERS_ALLOW_SIGN_UP | |
| value: "false" | |
| volumeMounts: | |
| - name: datasources | |
| mountPath: /etc/grafana/provisioning/datasources | |
| - name: dashboards-provider | |
| mountPath: /etc/grafana/provisioning/dashboards/dashboards.yaml | |
| subPath: dashboards.yaml | |
| - name: storage | |
| mountPath: /var/lib/grafana | |
| resources: | |
| requests: | |
| cpu: 100m | |
| memory: 128Mi | |
| limits: | |
| cpu: 500m | |
| memory: 512Mi |
🧰 Tools
🪛 Trivy (0.69.3)
[warning] 46-70: Can elevate its own privileges
Container 'grafana' of Deployment 'grafana' should set 'securityContext.allowPrivilegeEscalation' to false
Rule: KSV-0001
(IaC/Kubernetes)
[info] 46-70: Default capabilities: some containers do not drop all
Container 'grafana' of Deployment 'grafana' should add 'ALL' to 'securityContext.capabilities.drop'
Rule: KSV-0003
(IaC/Kubernetes)
[info] 46-70: Default capabilities: some containers do not drop any
Container 'grafana' of 'deployment' 'grafana' in 'default' namespace should set securityContext.capabilities.drop
Rule: KSV-0004
(IaC/Kubernetes)
[warning] 46-70: Runs as root user
Container 'grafana' of Deployment 'grafana' should set 'securityContext.runAsNonRoot' to true
Rule: KSV-0012
(IaC/Kubernetes)
[error] 46-70: Root file system is not read-only
Container 'grafana' of Deployment 'grafana' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[info] 46-70: Runs with UID <= 10000
Container 'grafana' of Deployment 'grafana' should set 'securityContext.runAsUser' > 10000
Rule: KSV-0020
(IaC/Kubernetes)
[info] 46-70: Runs with GID <= 10000
Container 'grafana' of Deployment 'grafana' should set 'securityContext.runAsGroup' > 10000
Rule: KSV-0021
(IaC/Kubernetes)
[info] 46-70: Runtime/Default Seccomp profile not set
Either Pod or Container should set 'securityContext.seccompProfile.type' to 'RuntimeDefault'
Rule: KSV-0030
(IaC/Kubernetes)
[warning] 46-70: Seccomp policies disabled
container "grafana" of deployment "grafana" in "default" namespace should specify a seccomp profile
Rule: KSV-0104
(IaC/Kubernetes)
[info] 46-70: Container capabilities must only include NET_BIND_SERVICE
container should drop all
Rule: KSV-0106
(IaC/Kubernetes)
[error] 46-70: Default security context configured
container grafana in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[warning] 46-70: Restrict container images to trusted registries
Container grafana in deployment grafana (namespace: default) uses an image from an untrusted registry.
Rule: KSV-0125
(IaC/Kubernetes)
```yaml
      containers:
        - name: grafana
          image: grafana/grafana:11.4.0
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: GF_SECURITY_ADMIN_USER
              value: admin
            - name: GF_USERS_ALLOW_SIGN_UP
              value: "false"
          volumeMounts:
            - name: datasources
              mountPath: /etc/grafana/provisioning/datasources
            - name: dashboards-provider
              mountPath: /etc/grafana/provisioning/dashboards/dashboards.yaml
              subPath: dashboards.yaml
            - name: storage
              mountPath: /var/lib/grafana
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
```
🧹 Nitpick | 🔵 Trivial
Consider adding liveness and readiness probes.
The Grafana container lacks health probes, which can impact pod lifecycle management and rolling updates.
💡 Suggested probe configuration
```diff
   - name: grafana
     image: grafana/grafana:11.4.0
     ports:
       - containerPort: 3000
         name: http
+    livenessProbe:
+      httpGet:
+        path: /api/health
+        port: 3000
+      initialDelaySeconds: 30
+      periodSeconds: 10
+    readinessProbe:
+      httpGet:
+        path: /api/health
+        port: 3000
+      initialDelaySeconds: 10
+      periodSeconds: 5
```
+ periodSeconds: 5🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/observability/overlays/with-grafana/grafana.yaml` around
lines 47 - 72, Add livenessProbe and readinessProbe to the grafana container
spec (container name "grafana") so Kubernetes can manage pod health; configure
both to use an httpGet on path /api/health (or /api/healthz) against port named
"http" (3000) with sensible timing (e.g., initialDelaySeconds ~10-30,
periodSeconds ~10-15, timeoutSeconds ~1-5, failureThreshold ~3) and ensure
readinessProbe returns success before traffic is routed. Update the manifest
under the container "grafana" to include these probe blocks and tune values to
match startup characteristics.
Hi @jeremyeder,
Signed-off-by: Rahul Shetty <rashetty@redhat.com>
Signed-off-by: Rahul Shetty <rashetty@redhat.com>
Signed-off-by: Rahul Shetty <rashetty@redhat.com>
Force-pushed f6da902 to d6d861f
Actionable comments posted: 4
♻️ Duplicate comments (1)
components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml (1)
3-5: ⚠️ Potential issue | 🟠 Major
Set an explicit namespace on the PVC manifest.
Because this file is applied directly, the claim can be created in the wrong namespace when kube context differs, causing Grafana to miss the expected claim.
🔧 Proposed fix
```diff
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
   name: grafana-storage
+  namespace: ambient-code
 spec:
   accessModes: [ReadWriteOnce]
   resources:
     requests:
       storage: 5Gi
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@components/manifests/observability/overlays/with-grafana/dashboards/k8s-pods.json`:
- Line 2149: The PromQL expressions like
kube_pod_container_status_waiting_reason{reason=~"ErrImagePull|ImagePullBackOff|CrashLoopBackOff"}
must include the dashboard variables for cluster and namespace so results
respect the selected scope; update that selector to include cluster=~"$cluster"
and namespace=~"$namespace" (e.g.
kube_pod_container_status_waiting_reason{cluster=~"$cluster",
namespace=~"$namespace",
reason=~"ErrImagePull|ImagePullBackOff|CrashLoopBackOff"}) and apply the same
change to the other similar query at the second occurrence so both tables filter
by the selected cluster and namespace.
In
`@components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml`:
- Around line 15-16: The init-container currently snapshots the ServiceAccount
token via the sed call that reads
/var/run/secrets/kubernetes.io/serviceaccount/token and writes
/etc/grafana-provisioning/datasources.yaml (the sed
"s/INJECT_TOKEN_HERE/${TOKEN}/" step), which becomes stale after kubelet token
rotation; remove that init-only token snapshot and instead add a small sidecar
(token-refresh) that watches the live token file
(/var/run/secrets/kubernetes.io/serviceaccount/token) and atomically rewrites
/etc/grafana-provisioning/datasources.yaml from the template at
/etc/grafana-base/datasources.yaml whenever the token changes (or use an
in-place updater loop with sleep/watch+cmp), ensuring the sidecar has the same
shared volume mounts as the Grafana container; alternatively, mount a projected
serviceAccountToken volume and modify Grafana provisioning to read the token
file at runtime if supported—locate the current sed invocation in the
initContainer and replace it with the sidecar implementation that updates the
same target path.
In `@components/manifests/observability/overlays/with-grafana/grafana.yaml`:
- Around line 53-57: The Deployment currently sets GF_SECURITY_ADMIN_USER and
GF_USERS_ALLOW_SIGN_UP but leaves the admin password at Grafana's default; add
an env var GF_SECURITY_ADMIN_PASSWORD that sources its value from a Kubernetes
Secret (e.g., secret name like grafana-admin-secret, key like admin-password)
using valueFrom.secretKeyRef in the container spec so the password is not
hardcoded or defaulted; ensure the Secret is created separately and referenced
by the env entry alongside GF_SECURITY_ADMIN_USER and GF_USERS_ALLOW_SIGN_UP.
In `@components/manifests/observability/overlays/with-grafana/kustomization.yaml`:
- Around line 36-37: The kustomization currently sets
generatorOptions.disableNameSuffixHash: true which prevents ConfigMap names from
changing and thus prevents the init-container that copies dashboards from
triggering pod rollouts; to fix, remove or set
generatorOptions.disableNameSuffixHash to false so kustomize appends content
hashes to generated ConfigMap names (causing Deployment changes), or
alternatively add an explicit rollout trigger by updating the Deployment pod
template (the init-container pod spec) to include an annotation like a checksum
of the dashboard ConfigMap(s) (e.g., checksum/config-dashboards) that is
recalculated when the ConfigMap content changes so pods are recreated and the
init-container picks up updated dashboards.
---
Duplicate comments:
In `@components/manifests/observability/overlays/with-grafana/grafana-pvc.yaml`:
- Around line 3-5: The PVC manifest lacks an explicit namespace so it can be
created in the wrong context; update the grafana-pvc.yaml by adding
metadata.namespace with the Grafana release namespace (i.e. the namespace where
the Grafana Deployment/Helm release runs) alongside metadata.name:
grafana-storage so the PersistentVolumeClaim is always created in the correct
namespace; ensure this namespace matches the Grafana deployment/service account
namespace referenced in your overlays.
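The kustomization point above (re-enabling the name-suffix hash so dashboard changes roll the pods) can be sketched as a kustomization fragment. This is illustrative only: the generator name and file list are assumptions, not taken from the PR.

```yaml
# Hypothetical sketch: without disableNameSuffixHash, kustomize appends a
# content hash to the generated ConfigMap name (e.g. grafana-dashboards-7h2k9f8b4t)
# and rewrites the Deployment's volume reference to match, so any change to a
# dashboard file produces a new pod template and triggers a rollout.
configMapGenerator:
  - name: grafana-dashboards        # illustrative name
    files:
      - dashboards/k8s-pods.json
# generatorOptions:
#   disableNameSuffixHash: true     # <- this is what suppresses the rollout
```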
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9e7e669a-5ca5-4276-a70f-6d5894619be1
```json
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_pod_container_status_waiting_reason{reason=~\"ErrImagePull|ImagePullBackOff|CrashLoopBackOff\"} == 1",
```
Respect the selected cluster and namespace in these tables.
Lines 2149 and 2310 omit $cluster and $namespace, so this dashboard can show incident rows from unrelated namespaces or even other clusters after the user narrows the view. That makes both tables misleading in a shared Prometheus.
Suggested query fix
```diff
- "expr": "kube_pod_container_status_waiting_reason{reason=~\"ErrImagePull|ImagePullBackOff|CrashLoopBackOff\"} == 1",
+ "expr": "kube_pod_container_status_waiting_reason{cluster=\"$cluster\", namespace=\"$namespace\", reason=~\"ErrImagePull|ImagePullBackOff|CrashLoopBackOff\"} == 1",
- "expr": "kube_pod_status_scheduled{condition=\"false\"} == 1",
+ "expr": "kube_pod_status_scheduled{cluster=\"$cluster\", namespace=\"$namespace\", condition=\"false\"} == 1",
```
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
Also applies to: 2310-2310
```shell
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
sed "s/INJECT_TOKEN_HERE/${TOKEN}/" /etc/grafana-base/datasources.yaml > /etc/grafana-provisioning/datasources.yaml
```
🧩 Analysis chain
🏁 Script executed:
cat -n "components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml"

Repository: ambient-code/platform
Length of output: 3308
🌐 Web query:
Kubernetes projected ServiceAccount token rotation behavior
💡 Result:
Kubernetes projected ServiceAccount tokens are short-lived JWTs obtained via the TokenRequest API and mounted into pods as projected volumes. The kubelet manages their lifecycle, automatically requesting new tokens and updating the mounted files. Key rotation behavior:

- Tokens have a configurable expirationSeconds (default 1 hour, minimum 10 minutes, max limited by --service-account-max-token-expiration).
- The kubelet proactively refreshes the token if it is older than 80% of its TTL or older than 24 hours.
- The token file is updated atomically; applications must reload the file to use the new token (e.g., periodically or on file change).
- Tokens are bound to the pod: invalidated on pod deletion (within a ~60s grace period).
- By default (since v1.22), the ServiceAccount admission controller injects a projected volume (kube-api-access-*) with a token for API access.

Example pod spec:

    volumes:
    - name: token
      projected:
        sources:
        - serviceAccountToken:
            expirationSeconds: 3600
            audience: api

Applications should read the token from the mount path (e.g., /var/run/secrets/kubernetes.io/serviceaccount/token) on a schedule (e.g., every 5 minutes) or watch for file changes, as Kubernetes clients handle this automatically. Legacy non-expiring secret-based tokens are deprecated and cleaned up in modern clusters (v1.24+). This has been stable since Kubernetes v1.20 (TokenRequestProjection GA).
Citations:
- 1: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
- 2: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/
- 3: http://k8s.io/docs/reference/access-authn-authz/service-accounts-admin/
- 4: http://k8s.io/docs/concepts/storage/projected-volumes/
🌐 Web query:
OpenShift token rotation init container lifecycle
💡 Result:
No specific official documentation or authoritative source describes a standard "OpenShift token rotation init container lifecycle." Searches across official Red Hat OpenShift docs, OKD docs, Kubernetes docs, GitHub issues, and bugzillas yielded no direct matches for this exact concept. Key related findings:

- Bound service account (SA) tokens in OpenShift (docs from 4.8 to 4.20) are automatically rotated by the kubelet when older than 80% of expirationSeconds or 24 hours. Apps must reload the token file. No init container involvement; rotation happens during main container runtime via projected volumes.
- Init containers run sequentially to completion before main containers start, sharing pod resources but with separate images. They are ideal for setup (e.g., waiting for services, cloning repos) but not ongoing rotation, as they don't run continuously.
- In the Cluster API (CAPI) operator (TechPreview), a controller rotates token secrets every 30 minutes if older (Bug 2089254, fixed in 4.11). No mention of init containers in the operator pod spec.
- Custom patterns exist: token refreshers (e.g., kube-token-refresher, cap-token-refresh cronjob) and External Secrets Operator for external secret sync/rotation, but none use init containers specifically for OpenShift SA token rotation.

Conclusion: Token rotation typically uses bound projected volumes (automatic, kubelet-managed) or operators/controllers. Init containers are unsuitable for a rotation lifecycle due to their one-time execution nature. Custom init container implementations for token fetch/rotation are possible but non-standard, undocumented officially, and would require pod restarts for refresh (defeating the purpose of rotation).
Citations:
- 1: https://docs.okd.io/4.16/nodes/containers/nodes-containers-init.html
- 2: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
- 3: https://docs.openshift.com/container-platform/4.8/authentication/bound-service-account-tokens.html
- 4: https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/authentication_and_authorization/bound-service-account-tokens
- 5: https://bugzilla.redhat.com/show_bug.cgi?id=2089254
- 6: https://linkedin.com/company/init-6-inc
- 7: https://docs.openshift.com/container-platform/4.20/authentication/understanding-and-creating-service-accounts.html
- 8: https://docs.okd.io/4.20/authentication/configuring-internal-oauth.html
- 9: https://github.com/openshift/cluster-capi-operator
Init-container token snapshot will become stale after automatic token rotation.
Lines 15–16 read the ServiceAccount token once during init-container execution and snapshot it into datasources.yaml. On current Kubernetes/OpenShift releases, this token is automatically rotated by the kubelet (default ~1 hour), but the snapshot is never updated. Once the token expires, Grafana's Prometheus datasource authentication fails. Restarting the Grafana container does not help because init containers only run at pod creation time; the pod must be recreated to obtain a fresh token. Use a token-refresh mechanism (e.g., a sidecar or operator) instead of an init-only snapshot.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml`
around lines 15 - 16, The init-container currently snapshots the ServiceAccount
token via the sed call that reads
/var/run/secrets/kubernetes.io/serviceaccount/token and writes
/etc/grafana-provisioning/datasources.yaml (the sed
"s/INJECT_TOKEN_HERE/${TOKEN}/" step), which becomes stale after kubelet token
rotation; remove that init-only token snapshot and instead add a small sidecar
(token-refresh) that watches the live token file
(/var/run/secrets/kubernetes.io/serviceaccount/token) and atomically rewrites
/etc/grafana-provisioning/datasources.yaml from the template at
/etc/grafana-base/datasources.yaml whenever the token changes (or use an
in-place updater loop with sleep/watch+cmp), ensuring the sidecar has the same
shared volume mounts as the Grafana container; alternatively, mount a projected
serviceAccountToken volume and modify Grafana provisioning to read the token
file at runtime if supported—locate the current sed invocation in the
initContainer and replace it with the sidecar implementation that updates the
same target path.
```yaml
env:
- name: GF_SECURITY_ADMIN_USER
  value: admin
- name: GF_USERS_ALLOW_SIGN_UP
  value: "false"
```
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "grafana.yaml" -o -name "grafana-deployment-patch.yaml" | head -20

Repository: ambient-code/platform
Length of output: 225
🏁 Script executed:
cat -n components/manifests/observability/overlays/with-grafana/grafana.yaml | head -100

Repository: ambient-code/platform
Length of output: 2767
🏁 Script executed:
cat -n components/manifests/observability/overlays/with-grafana/grafana-deployment-patch.yaml

Repository: ambient-code/platform
Length of output: 3308
🏁 Script executed:
# Search for any Secret references in the overlays directory
rg -i "secret" components/manifests/observability/overlays/with-grafana/ --context 3

Repository: ambient-code/platform
Length of output: 2044
🌐 Web query:
Grafana admin password environment variable GF_SECURITY_ADMIN_PASSWORD default behavior
💡 Result:
The environment variable GF_SECURITY_ADMIN_PASSWORD overrides the [security] admin_password setting in Grafana's configuration. By default, without this variable set (or the ini equivalent), the admin password is "admin". However, it is only applied on first run, during initial database initialization when the default admin user is created. On subsequent runs, especially with persistent data storage (e.g., Docker volumes), the password is already set in the database and cannot be overridden by the environment variable; it must be changed via the UI, CLI, or API. To use it effectively:

- Ensure a fresh data directory (empty /var/lib/grafana).
- Set via Docker: docker run -e "GF_SECURITY_ADMIN_PASSWORD=yourpassword" grafana/grafana
- defaults.ini explicitly sets admin_password = admin.

Sources confirm consistent behavior across versions, with issues arising from persistence.
Citations:
- 1: https://grafana.com/docs/grafana/v4.4/installation/configuration/
- 2: https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/
- 3: https://grafana.com/docs/grafana/v8.4/administration/configure-docker/
- 4: https://raw.githubusercontent.com/grafana/grafana/main/conf/defaults.ini
- 5: https://www.grafana.com/docs/grafana/v12.4/setup-grafana/configure-docker/
- 6: GF_SECURITY_ADMIN_PASSWORD issue grafana/grafana-docker#37
- 7: https://grafana.com/docs/grafana/v6.4/installation/docker/
- 8: [Bug] Security config is not affected by ENV variables in Docker container grafana/grafana#4825
🏁 Script executed:
find components/manifests/observability/overlays/with-grafana -name "kustomization.yaml" -o -name "kustomization.yml"

Repository: ambient-code/platform
Length of output: 140
🏁 Script executed:
find components/manifests/observability/overlays/with-grafana -type f -name "*.yaml" | xargs basename -a | sort | uniq

Repository: ambient-code/platform
Length of output: 191
🏁 Script executed:
cat -n components/manifests/observability/overlays/with-grafana/kustomization.yaml

Repository: ambient-code/platform
Length of output: 1285
Configure admin password from a Secret instead of relying on Grafana's built-in default.
Lines 54-57 set the admin username to admin but do not configure the admin password. With this Deployment configuration and persistent storage at /var/lib/grafana, a fresh Grafana instance will bootstrap with Grafana's default admin/admin credential, creating a known security vulnerability on first deploy. Add GF_SECURITY_ADMIN_PASSWORD sourced from a Kubernetes Secret instead of allowing the default to persist.
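A minimal sketch of that env wiring, assuming the Secret is created out-of-band (the Secret name `grafana-admin-credentials` and key `admin-password` are placeholders, not names from this PR):

```yaml
env:
  - name: GF_SECURITY_ADMIN_USER
    value: admin
  - name: GF_SECURITY_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: grafana-admin-credentials   # placeholder; create separately, e.g. via `oc create secret generic`
        key: admin-password
  - name: GF_USERS_ALLOW_SIGN_UP
    value: "false"
```

Note that with persistent storage Grafana applies this value only on first initialization of /var/lib/grafana, so the Secret should exist before the initial deploy; changing it afterwards requires resetting the password via the UI, CLI, or API.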
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/observability/overlays/with-grafana/grafana.yaml` around
lines 53 - 57, The Deployment currently sets GF_SECURITY_ADMIN_USER and
GF_USERS_ALLOW_SIGN_UP but leaves the admin password at Grafana's default; add
an env var GF_SECURITY_ADMIN_PASSWORD that sources its value from a Kubernetes
Secret (e.g., secret name like grafana-admin-secret, key like admin-password)
using valueFrom.secretKeyRef in the container spec so the password is not
hardcoded or defaulted; ensure the Secret is created separately and referenced
by the env entry alongside GF_SECURITY_ADMIN_USER and GF_USERS_ALLOW_SIGN_UP.
```yaml
generatorOptions:
  disableNameSuffixHash: true
```
🧩 Analysis chain
🏁 Script executed:
# First, find and examine the target kustomization.yaml file
find . -path "*components/manifests/observability/overlays/with-grafana/kustomization.yaml" -type f

Repository: ambient-code/platform
Length of output: 142
🏁 Script executed:
# Check the structure of the observability directory
find ./components/manifests/observability -type f -name "*.yaml" | head -20

Repository: ambient-code/platform
Length of output: 786
🏁 Script executed:
# Examine the kustomization.yaml file content
if [ -f "components/manifests/observability/overlays/with-grafana/kustomization.yaml" ]; then
  cat -n "components/manifests/observability/overlays/with-grafana/kustomization.yaml"
fi

Repository: ambient-code/platform
Length of output: 1285
🏁 Script executed:
# Search for init-container references and ConfigMap copying logic
rg -i "init.*container|copy.*configmap|dashboard" --type yaml -A 3 -B 3 ./components/manifests/observability/

Repository: ambient-code/platform
Length of output: 14768
🏁 Script executed:
# Look for Deployments that might reference the grafana overlay
rg "kind.*Deployment" --type yaml -A 10 ./components/manifests/observability/

Repository: ambient-code/platform
Length of output: 3094
Dashboard ConfigMap updates won't trigger pod rollouts with disableNameSuffixHash: true.
Since the init-container copies dashboards from ConfigMaps at pod startup and those ConfigMaps have stable names, content changes won't propagate to existing pods. The Deployment spec remains unchanged when the ConfigMap content is updated, so no new pods are created to fetch the updated dashboard JSON files.
To ensure dashboard updates roll out automatically, remove the hash suffix disablement so ConfigMap names change with content. Alternatively, implement an explicit rollout trigger (such as a pod restart annotation or webhook) whenever dashboard ConfigMaps are updated.
🔧 Proposed fix

```diff
-generatorOptions:
-  disableNameSuffixHash: true
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

The suggestion removes these lines:

```yaml
generatorOptions:
  disableNameSuffixHash: true
```
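If the stable ConfigMap names are kept, the rollout-trigger alternative mentioned above can be sketched as a pod-template annotation whose value changes with the dashboard content (the annotation key `checksum/dashboards` and the checksum mechanism are illustrative, not something this PR provides):

```yaml
spec:
  template:
    metadata:
      annotations:
        # Recomputed by build tooling, e.g. sha256 over the dashboard JSON files;
        # any change to this value forces the Deployment to roll new pods.
        checksum/dashboards: "<sha256 of the rendered dashboard ConfigMaps>"
```

The hash-suffix approach is usually simpler because kustomize both renames the ConfigMaps and rewrites the references in the Deployment, so the rollout happens without extra tooling.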
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/manifests/observability/overlays/with-grafana/kustomization.yaml`
around lines 36 - 37, The kustomization currently sets
generatorOptions.disableNameSuffixHash: true which prevents ConfigMap names from
changing and thus prevents the init-container that copies dashboards from
triggering pod rollouts; to fix, remove or set
generatorOptions.disableNameSuffixHash to false so kustomize appends content
hashes to generated ConfigMap names (causing Deployment changes), or
alternatively add an explicit rollout trigger by updating the Deployment pod
template (the init-container pod spec) to include an annotation like a checksum
of the dashboard ConfigMap(s) (e.g., checksum/config-dashboards) that is
recalculated when the ConfigMap content changes so pods are recreated and the
init-container picks up updated dashboards.
Summary

- Moves Grafana resources (grafana.yaml, dashboards, RBAC) into the `with-grafana` overlay
- Mounts the PVC at `/var/lib/grafana` (instead of `/var/lib/grafana/data`) so `grafana.db` persists across restarts
- Removes the `GF_SECURITY_ADMIN_PASSWORD` env var that was resetting the password on every restart
- `make clean-observability` preserves Grafana data
Test plan
make add-grafanadeploys without errorsmake clean-observabilityremoves stack but preserves PVC