K8s namespace not ready #552

@mohammedmuflih6-cpu

Agent Diagnostic

No agent available: the NemoClaw gateway failed to start, so agent setup could not run and no skills could be loaded.
The gateway failure itself is the issue being reported.

Description

The OpenShell gateway fails to start on WSL2 with Docker Desktop.
Expected: the gateway starts successfully and the 'openshell' namespace is created.
Actual: startup times out waiting for namespace 'openshell' to exist.

Reproduction Steps

  1. Install NemoClaw on Windows 11 under WSL2 (Ubuntu) with Docker Desktop
  2. Run: curl -fsSL https://www.nvidia.com/nemoclaw.sh | sudo bash
  3. The gateway fails during onboarding with the error below
  4. Reset and retry: openshell gateway destroy --name nemoclaw && openshell gateway start --name nemoclaw
  5. The same error occurs on every attempt (a manual check of the cluster state is sketched after these steps)
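
The flannel, k3s, and remotedialer lines in the logs below suggest the gateway provisions a k3s cluster inside a Docker container. If so, the failing namespace check can be reproduced by hand; a minimal sketch, assuming the container's name contains "nemoclaw" and its image ships kubectl (both are guesses, not confirmed NemoClaw behavior):

  # Find the gateway's container; the name filter is an assumption.
  docker ps --filter "name=nemoclaw"

  # Inspect cluster state from inside the (assumed) k3s container.
  docker exec <container> kubectl get nodes
  docker exec <container> kubectl get ns        # is 'openshell' ever created?
  docker exec <container> kubectl get pods -A   # are system pods stuck or pending?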

Environment

OS: Windows 11, WSL2 Ubuntu 24
Docker: 29.2.1
OpenShell: 0.0.14
NemoClaw: 0.1.0
Node.js: v22.22.1
RAM: ~5.6GB available to Docker
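
The pull timings in the logs below (CoreDNS spent ~75 s pulling) hint at resource pressure, and ~5.6GB is tight for a k3s cluster plus workloads under Docker Desktop. If memory turns out to be a factor, the WSL2 VM ceiling that Docker Desktop inherits can be raised via the Windows-side .wslconfig; a sketch run from inside WSL, with an illustrative size, not a confirmed fix:

  # Hypothetical: append a memory limit to %UserProfile%\.wslconfig.
  # Replace <user> with the Windows username; 8GB is an example value.
  printf '[wsl2]\nmemory=8GB\n' >> /mnt/c/Users/<user>/.wslconfig
  # Apply by restarting the WSL VM from a Windows shell: wsl --shutdown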

Logs

Error:   × K8s namespace not ready
  ╰─▶ timed out waiting for namespace 'openshell' to exist: Error from server (NotFound): namespaces "openshell" not found

      container logs:
        I0323 19:14:54.082629      95 iptables.go:212] Changing default FORWARD chain policy to ACCEPT
        I0323 19:14:54.108257      95 iptables.go:358] bootstrap done
        time="2026-03-23T19:14:54Z" level=info msg="Wrote flannel subnet file to /run/flannel/subnet.env"
        time="2026-03-23T19:14:54Z" level=info msg="Running flannel backend."
        I0323 19:14:54.124237      95 vxlan_network.go:68] watching for new subnet leases
        I0323 19:14:54.124279      95 vxlan_network.go:115] starting vxlan device watcher
        I0323 19:14:54.124864      95 iptables.go:358] bootstrap done
        time="2026-03-23T19:14:55Z" level=info msg="Starting network policy controller version v2.6.3-k3s1, built on 2026-03-04T22:29:48Z, go1.25.7"
        I0323 19:14:55.153686      95 network_policy_controller.go:164] Starting network policy controller
        I0323 19:14:55.238143      95 network_policy_controller.go:179] Starting network policy controller full sync goroutine
        time="2026-03-23T19:14:55Z" level=info msg="Started tunnel to 172.18.0.2:6443"
        time="2026-03-23T19:14:55Z" level=info msg="Stopped tunnel to 127.0.0.1:6443"
        time="2026-03-23T19:14:55Z" level=info msg="Connecting to proxy" url="wss://172.18.0.2:6443/v1-k3s/connect"
        time="2026-03-23T19:14:55Z" level=info msg="Proxy done" err="context canceled" url="wss://127.0.0.1:6443/v1-k3s/connect"
        time="2026-03-23T19:14:55Z" level=info msg="error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF"
        time="2026-03-23T19:14:55Z" level=info msg="Handling backend connection request [fd891fc6b0cd]"
        time="2026-03-23T19:14:55Z" level=info msg="Connected to proxy" url="wss://172.18.0.2:6443/v1-k3s/connect"
        time="2026-03-23T19:14:55Z" level=info msg="Remotedialer connected to proxy" url="wss://172.18.0.2:6443/v1-k3s/connect"
        E0323 19:15:21.848355      95 resource_quota_controller.go:460] "Error during resource discovery" err="unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: stale GroupVersion discovery: metrics.k8s.io/v1beta1"
        I0323 19:15:21.891818      95 garbagecollector.go:792] "failed to discover some groups" groups="map[\"metrics.k8s.io/v1beta1\":\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\"]"
        E0323 19:15:43.168368      95 handler_proxy.go:143] error resolving kube-system/metrics-server: no endpoints available for service "metrics-server"
        I0323 19:15:51.128839      95 pod_startup_latency_tracker.go:108] "Observed pod startup duration" pod="agent-sandbox-system/agent-sandbox-controller-0" podStartSLOduration=26.392285921 podStartE2EDuration="1m2.128822219s" podCreationTimestamp="2026-03-23 19:14:49 +0000 UTC" firstStartedPulling="2026-03-23 19:15:08.916140434 +0000 UTC m=+31.547774233" lastFinishedPulling="2026-03-23 19:15:50.0588984 +0000 UTC m=+67.284310531" observedRunningTime="2026-03-23 19:15:51.128746023 +0000 UTC m=+68.354158154" watchObservedRunningTime="2026-03-23 19:15:51.128822219 +0000 UTC m=+68.354234349"
        E0323 19:15:54.584887      95 resource_quota_controller.go:460] "Error during resource discovery" err="unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: stale GroupVersion discovery: metrics.k8s.io/v1beta1"
        I0323 19:15:54.632588      95 garbagecollector.go:792] "failed to discover some groups" groups="map[\"metrics.k8s.io/v1beta1\":\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\"]"
        W0323 19:15:56.050813      95 handler_proxy.go:99] no RequestInfo found in the context
        E0323 19:15:56.050927      95 controller.go:113] "Unhandled Error" err="loading OpenAPI spec for \"v1beta1.metrics.k8s.io\" failed with: Error, could not get list of group versions for APIService"
        I0323 19:15:56.050944      95 controller.go:126] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
        W0323 19:15:56.051992      95 handler_proxy.go:99] no RequestInfo found in the context
        E0323 19:15:56.052121      95 controller.go:102] "Unhandled Error" err=<
          loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
          , Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
        >
        I0323 19:15:56.052134      95 controller.go:109] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
        I0323 19:16:25.655634      95 pod_startup_latency_tracker.go:108] "Observed pod startup duration" pod="kube-system/coredns-7566b5ff58-tx8h7" podStartSLOduration=28.907583805 podStartE2EDuration="1m36.655419772s" podCreationTimestamp="2026-03-23 19:14:49 +0000 UTC" firstStartedPulling="2026-03-23 19:15:08.863062269 +0000 UTC m=+31.494696068" lastFinishedPulling="2026-03-23 19:16:24.473339812 +0000 UTC m=+99.242532035" observedRunningTime="2026-03-23 19:16:25.655324087 +0000 UTC m=+100.424516310" watchObservedRunningTime="2026-03-23 19:16:25.655419772 +0000 UTC m=+100.424611995"
        E0323 19:16:27.045192      95 resource_quota_controller.go:460] "Error during resource discovery" err="unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: stale GroupVersion discovery: metrics.k8s.io/v1beta1"
        I0323 19:16:27.096992      95 garbagecollector.go:792] "failed to discover some groups" groups="map[\"metrics.k8s.io/v1beta1\":\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\"]"
        I0323 19:16:34.678303      95 pod_startup_latency_tracker.go:108] "Observed pod startup duration" pod="kube-system/local-path-provisioner-6bc6568469-hk6tl" podStartSLOduration=28.146023392 podStartE2EDuration="1m45.678286787s" podCreationTimestamp="2026-03-23 19:14:49 +0000 UTC" firstStartedPulling="2026-03-23 19:15:08.899546198 +0000 UTC m=+31.531180006" lastFinishedPulling="2026-03-23 19:16:34.294251179 +0000 UTC m=+109.063443401" observedRunningTime="2026-03-23 19:16:34.678064102 +0000 UTC m=+109.447256334" watchObservedRunningTime="2026-03-23 19:16:34.678286787 +0000 UTC m=+109.447479010"
        E0323 19:16:48.356722      95 handler_proxy.go:143] error resolving kube-system/metrics-server: no endpoints available for service "metrics-server"
        E0323 19:16:59.564625      95 resource_quota_controller.go:460] "Error during resource discovery" err="unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: stale GroupVersion discovery: metrics.k8s.io/v1beta1"
        I0323 19:16:59.622480      95 garbagecollector.go:792] "failed to discover some groups" groups="map[\"metrics.k8s.io/v1beta1\":\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\"]"

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why
