Skip to content

feat(public-api): complete session API surface for mcp-acp integration#807

Open
adalton wants to merge 42 commits intomainfrom
andalton/mcp-update
Open

feat(public-api): complete session API surface for mcp-acp integration#807
adalton wants to merge 42 commits intomainfrom
andalton/mcp-update

Conversation

@adalton
Copy link
Copy Markdown
Contributor

@adalton adalton commented Mar 4, 2026

Summary

  • Add PATCH, logs, transcript, and metrics endpoints to the public API so mcp-acp can operate end-to-end using API keys (no code changes needed in mcp-acp itself)
  • Enrich SessionResponse DTO with displayName, repos, labels, and timeout fields
  • Add backend endpoints for pod logs (streaming, 10MB cap) and session metrics (CR status + usage annotations)
  • Fix label write/read symmetry: labels sent as annotations on create, reserved prefix validation on write, nil-annotation deletion in PatchSession
  • Fix setup-vertex-kind.sh to restart backend-api after ConfigMap patch

Test plan

  • Public-API unit tests pass (37 tests including new validation, PATCH routing, reserved prefix rejection, empty-string label, sub-resource proxy tests)
  • Backend unit tests pass (363 tests including 8 new metrics/logs handler tests)
  • E2E validated on local kind cluster (24/24 test steps + 2 post-review revalidations)
  • 3-iteration code review completed (all critical, design, and suggestion items addressed)
  • CI pipeline passes

🤖 Generated with Claude Code

Add PATCH, logs, transcript, and metrics endpoints to the public API so
mcp-acp can operate end-to-end with API keys. Enrich SessionResponse DTO
with displayName, repos, labels, and timeout fields.

Backend changes:
- Add GET /api/.../logs (streaming with 10MB cap, pod log via K8s API)
- Add GET /api/.../metrics (session CR status + usage annotations)
- Fix PatchSession to delete annotations set to nil instead of storing
  empty strings

Public API changes:
- Add PATCH /v1/sessions/:id with smart routing (start/stop/update/labels)
- Add GET /v1/sessions/:id/{logs,transcript,metrics} proxy handlers
- Forward labelSelector query params in ListSessions
- Forward displayName in CreateSession
- Validate label keys against reserved internal prefixes on write
- Fix labels-to-annotations mapping on create path for write/read symmetry
- Label PATCH returns full session DTO (follow-up GET for consistency)

Other:
- Fix setup-vertex-kind.sh to restart backend-api after ConfigMap patch
- Add IMAGE_PUBLIC_API substitution in e2e deploy script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@adalton adalton self-assigned this Mar 4, 2026
@adalton
Copy link
Copy Markdown
Contributor Author

adalton commented Mar 4, 2026

MCP Public API Update — E2E Test Plan

Result: 24/24 tests passed

Bugs found and fixed during testing:

  1. transformSession read labels from metadata.labels instead of
    metadata.annotations (sessions.go:268)
  2. Backend PatchSession set removed annotations to empty string instead of
    deleting them (sessions.go:991)
  3. setup-vertex-kind.sh only restarted the operator after patching
    operator-config ConfigMap, but not backend-api — leaving the backend
    with stale CLAUDE_CODE_USE_VERTEX=0, causing session creation to fail with
    "ANTHROPIC_API_KEY not configured"

Initial Setup

1. Verify port-forward to public API

$ curl -s http://localhost:18081/health
{"status":"ok"}
  • Health endpoint returns {"status":"ok"}

2. Create service account with RBAC for testing

$ kubectl -n ambient-code create serviceaccount e2e-test-sa
serviceaccount/e2e-test-sa created

$ kubectl -n ambient-code create role e2e-test-role \
  --verb=get,list,create,update,patch,delete \
  --resource=agenticsessions.vteam.ambient-code
role.rbac.authorization.k8s.io/e2e-test-role created

$ kubectl -n ambient-code create rolebinding e2e-test-binding \
  --role=e2e-test-role --serviceaccount=ambient-code:e2e-test-sa
rolebinding.rbac.authorization.k8s.io/e2e-test-binding created

$ kubectl -n ambient-code create role e2e-test-extras \
  --verb=get,list --resource=secrets,pods,pods/log,events
role.rbac.authorization.k8s.io/e2e-test-extras created

$ kubectl -n ambient-code create rolebinding e2e-test-extras-binding \
  --role=e2e-test-extras --serviceaccount=ambient-code:e2e-test-sa
rolebinding.rbac.authorization.k8s.io/e2e-test-extras-binding created
  • ServiceAccount created
  • Role with agenticsessions CRUD permissions created
  • RoleBinding created

3. Generate bearer token

$ TOKEN=$(kubectl -n ambient-code create token e2e-test-sa --duration=1h)
$ echo "Token length: ${#TOKEN}"
Token length: 954
  • Token is non-empty

4. Ensure backend-api has Vertex AI config

The setup-vertex-kind.sh script patches the operator-config ConfigMap and
restarts the operator, but does not restart backend-api. The backend reads
CLAUDE_CODE_USE_VERTEX from that same ConfigMap on startup, so it must be
restarted after the Vertex setup to skip the ANTHROPIC_API_KEY check.

$ kubectl -n ambient-code exec deploy/backend-api -- env | grep CLAUDE_CODE_USE_VERTEX
CLAUDE_CODE_USE_VERTEX=0

$ kubectl -n ambient-code rollout restart deployment backend-api
deployment.apps/backend-api restarted

$ kubectl -n ambient-code rollout status deployment backend-api --timeout=60s
Waiting for deployment "backend-api" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "backend-api" rollout to finish: 1 old replicas are pending termination...
deployment "backend-api" successfully rolled out

$ kubectl -n ambient-code exec deploy/backend-api -- env | grep CLAUDE_CODE_USE_VERTEX
CLAUDE_CODE_USE_VERTEX=1
  • backend-api has CLAUDE_CODE_USE_VERTEX=1

5. Verify token works with public API

$ curl -s http://localhost:18081/v1/sessions \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"
{"items":[],"total":0}
  • Returns 200 with {"items":[],"total":0} (no sessions yet)

Session CRUD — Enriched Fields

6. Create a session with displayName, model, and task

$ curl -s http://localhost:18081/v1/sessions \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "echo hello world",
    "displayName": "E2E Test Session",
    "model": "claude-sonnet-4-20250514"
  }'
{"id":"session-1772647475","message":"Session created"}
  • Returns 201
  • Response contains id field
  • Response contains message field

7. Get the created session — verify enriched DTO

$ curl -s http://localhost:18081/v1/sessions/session-1772647475 \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"
{"id":"session-1772647475","status":"pending","task":"echo hello world",
 "model":"claude-sonnet-4-20250514","displayName":"E2E Test Session",
 "createdAt":"2026-03-04T18:04:35Z"}
  • Returns 200
  • Response contains id matching created session
  • Response contains task field (mapped from initialPrompt)
  • Response contains displayName field
  • Response contains model field
  • timeout omitted (correct: no timeout was set; field uses omitempty)
  • Response contains status field
  • Response contains createdAt field

8. List sessions — verify enriched fields in list

$ curl -s http://localhost:18081/v1/sessions \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"
{"items":[{"id":"session-1772647475","status":"running","task":"echo hello world",
 "model":"claude-sonnet-4-20250514","displayName":"E2E Test Session",
 "createdAt":"2026-03-04T18:04:35Z"}],"total":1}
  • Returns 200
  • items array contains at least 1 session
  • Session in list has displayName field

PATCH — Smart Routing

9. PATCH stop session (stopped: true)

$ curl -s http://localhost:18081/v1/sessions/session-1772647475 \
  -X PATCH \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code" \
  -H "Content-Type: application/json" \
  -d '{"stopped": true}'
{"id":"session-1772647475","status":"running","task":"echo hello world",
 "model":"claude-sonnet-4-20250514","displayName":"E2E Test Session",
 "createdAt":"2026-03-04T18:04:35Z"}
  • Returns 200
  • Response is a valid session object (has id, status)

10. PATCH start session (stopped: false)

$ curl -s http://localhost:18081/v1/sessions/session-1772647475 \
  -X PATCH \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code" \
  -H "Content-Type: application/json" \
  -d '{"stopped": false}'
{"id":"session-1772647475","status":"Stopping","task":"echo hello world",
 "model":"claude-sonnet-4-20250514","displayName":"E2E Test Session",
 "createdAt":"2026-03-04T18:04:35Z"}
  • Returns 200
  • Response is a valid session object

11. PATCH update displayName and timeout

$ curl -s http://localhost:18081/v1/sessions/session-1772647475 \
  -X PATCH \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code" \
  -H "Content-Type: application/json" \
  -d '{"displayName": "Updated E2E Session", "timeout": 600}'
{"id":"session-1772647475","status":"Stopping","task":"echo hello world",
 "model":"claude-sonnet-4-20250514","displayName":"Updated E2E Session",
 "createdAt":"2026-03-04T18:04:35Z"}
  • Returns 200
  • Response reflects updated displayName

12. PATCH add labels

$ curl -s http://localhost:18081/v1/sessions/session-1772647475 \
  -X PATCH \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code" \
  -H "Content-Type: application/json" \
  -d '{"labels": {"env": "test", "team": "platform"}}'
{"message":"Labels updated"}
  • Returns 200
  • Response contains success message

13. Verify labels appear on session

$ curl -s http://localhost:18081/v1/sessions/session-1772647475 \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"
{"id":"session-1772647475","status":"Stopped","task":"echo hello world",
 "model":"claude-sonnet-4-20250514","displayName":"Updated E2E Session",
 "createdAt":"2026-03-04T18:04:35Z","completedAt":"2026-03-04T18:33:52Z",
 "labels":{"env":"test","team":"platform"}}

NOTE: Initial attempt failed — transformSession was reading metadata.labels
instead of metadata.annotations. Fixed in sessions.go:268, rebuilt, and
redeployed. Labels now appear correctly.

  • GET session shows labels with the added key/value

14. PATCH remove labels

$ curl -s http://localhost:18081/v1/sessions/session-1772647475 \
  -X PATCH \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code" \
  -H "Content-Type: application/json" \
  -d '{"removeLabels": ["env"]}'
{"message":"Labels updated"}
  • Returns 200

15. Verify labels removed

$ curl -s http://localhost:18081/v1/sessions/session-1772647475 \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"
{"id":"session-1772647475","status":"Stopped","task":"echo hello world",
 "model":"claude-sonnet-4-20250514","displayName":"Updated E2E Session",
 "createdAt":"2026-03-04T18:04:35Z","completedAt":"2026-03-04T18:33:52Z",
 "labels":{"team":"platform"}}

NOTE: Initial attempt showed "env":"" — the backend PatchSession was setting
anns[k] = nil which stored an empty string. Fixed backend to delete(anns, k)
when value is nil. Also added s != "" guard to filterUserLabels as defense.
Rebuilt and redeployed both backend-api and public-api.

  • GET session no longer shows removed label key

PATCH — Validation

16. PATCH with empty body — rejected

$ curl -s http://localhost:18081/v1/sessions/session-1772647475 \
  -X PATCH \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code" \
  -H "Content-Type: application/json" \
  -d '{}'
{"error":"Request body must contain at least one of: stopped, displayName, timeout, labels, removeLabels"}
  • Returns 400
  • Error message indicates at least one field required

17. PATCH with mixed categories — rejected

$ curl -s http://localhost:18081/v1/sessions/session-1772647475 \
  -X PATCH \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code" \
  -H "Content-Type: application/json" \
  -d '{"stopped": true, "labels": {"env": "test"}}'
{"error":"Cannot mix stopped, spec updates (displayName/timeout), and label changes in the same request"}
  • Returns 400
  • Error message indicates cannot mix categories

Sub-resource Endpoints

18. GET session logs

$ curl -s -w "\nHTTP_STATUS:%{http_code}\nCONTENT_TYPE:%{content_type}" \
  http://localhost:18081/v1/sessions/session-1772647475/logs \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"

HTTP_STATUS:200
CONTENT_TYPE:text/plain; charset=utf-8

Body is empty (session pod completed/cleaned up) but endpoint responds correctly.

  • Returns 200
  • Content-Type is text/plain

19. GET session logs with query params

$ curl -s -w "\nHTTP_STATUS:%{http_code}" \
  "http://localhost:18081/v1/sessions/session-1772647475/logs?tailLines=100&container=runner" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"

HTTP_STATUS:200
  • Returns 200
  • Query params (tailLines, container) accepted without error

20. GET session transcript

$ curl -s -w "\nHTTP_STATUS:%{http_code}" \
  http://localhost:18081/v1/sessions/session-1772647475/transcript \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"
{"sessionId":"session-1772647475","projectName":"ambient-code",
 "exportDate":"2026-03-04T18:52:01Z","aguiEvents":[...]}
HTTP_STATUS:200

Full transcript returned with sessionId, projectName, exportDate, and aguiEvents array.

  • Returns 200
  • Response is JSON

21. GET session metrics

$ curl -s http://localhost:18081/v1/sessions/session-1772647475/metrics \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"
{"completionTime":"2026-03-04T18:33:52Z","durationSeconds":1757,
 "phase":"Stopped","sessionId":"session-1772647475",
 "startTime":"2026-03-04T18:04:35Z"}
  • Returns 200
  • Response contains sessionId
  • Response contains phase

Query Parameter Forwarding

22. List sessions with labelSelector

$ curl -s http://localhost:18081/v1/sessions?labelSelector=team%3Dplatform \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"
{"items":[{"id":"session-1772647475","status":"Stopped","task":"echo hello world",
 "model":"claude-sonnet-4-20250514","displayName":"Updated E2E Session",
 "createdAt":"2026-03-04T18:04:35Z","completedAt":"2026-03-04T18:33:52Z",
 "labels":{"team":"platform"}}],"total":1}
  • Returns 200
  • Request accepted (no error about unknown query param)

Cleanup

23. Delete the test session

$ curl -s -w "\nHTTP_STATUS:%{http_code}" \
  http://localhost:18081/v1/sessions/session-1772647475 \
  -X DELETE \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"

HTTP_STATUS:204
  • Returns 204

24. Verify session deleted

$ curl -s http://localhost:18081/v1/sessions/session-1772647475 \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ambient-Project: ambient-code"
{"error":"Session not found"}
HTTP_STATUS:404
  • Returns 404 or list is empty

Post-Review Revalidation

After addressing code review feedback, two behavioral changes required revalidation.
Rebuilt public-api + backend-api, pushed images, and rolled out both deployments.

R1. Label PATCH now returns full session DTO

Previously returned {"message":"Labels updated"}. After Item 4 fix (follow-up GET),
now returns a full SessionResponse.

$ curl -s .../v1/sessions/session-1772660361 -X PATCH \
  -d '{"labels": {"env": "test", "team": "platform"}}'
{"id":"session-1772660361","status":"failed","task":"revalidation test",
 "model":"sonnet","displayName":"Revalidation Session",
 "createdAt":"2026-03-04T21:39:21Z",
 "labels":{"env":"test","team":"platform"}}

$ curl -s .../v1/sessions/session-1772660361 -X PATCH \
  -d '{"removeLabels": ["env"]}'
{"id":"session-1772660361","status":"failed","task":"revalidation test",
 "model":"sonnet","displayName":"Revalidation Session",
 "createdAt":"2026-03-04T21:39:21Z",
 "labels":{"team":"platform"}}
  • Add labels returns full session DTO with labels included
  • Remove labels returns full session DTO with label removed

R2. Labels set at create time now visible on GET

Previously CreateSession sent labels as backendReq["labels"] (K8s labels),
but the read path extracted from metadata.annotations. After Item 8/14 fix,
labels are sent as backendReq["annotations"] for write/read symmetry.

$ curl -s .../v1/sessions -X POST \
  -d '{"task": "label-at-create test", "labels": {"origin": "api", "priority": "high"}}'
{"id":"session-1772660385","message":"Session created"}

$ curl -s .../v1/sessions/session-1772660385
{"id":"session-1772660385","status":"failed","task":"label-at-create test",
 "model":"sonnet","createdAt":"2026-03-04T21:39:45Z",
 "labels":{"origin":"api","priority":"high"}}
  • Labels set at creation time appear on GET (previously silently dropped)

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

…y comments

- Use unstructured.NestedString/NestedFloat64 in sessions_metrics.go
  instead of direct type assertions
- Replace raw JSON bind error with generic "Invalid request body"
  in PatchSession
- Add safety comments explaining why raw query forwarding is safe
  in ListSessions, GetSessionTranscript, and GetSessionLogs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@adalton
Copy link
Copy Markdown
Contributor Author

adalton commented Mar 4, 2026

Review Response

Thanks for the thorough review. Here's what I addressed and what I'm pushing back on:

Fixed (e29b679)

Item 1 — Unstructured helpers in sessions_metrics.go: Switched to unstructured.NestedString / unstructured.NestedFloat64 for all field access. Fair point — the helpers are cleaner for nested path access.

Item 5 — Raw JSON bind error: Replaced err.Error() with generic "Invalid request body" in PatchSession.

Item 6 — Query string forwarding comments: Added safety comments on all three RawQuery forwarding sites explaining why it's safe (backend ignores unknown params, RBAC enforced via user token).

Not changing

Item 2 — Log injection sanitization: No existing handler in sessions.go sanitizes sessionName before logging — there are 30+ instances of c.Param("sessionName") passed directly to log.Printf. Fixing it in just the 2 new handlers while leaving the other 30+ unsanitized creates a false sense of coverage. This should be addressed globally (e.g., via middleware) if it's a real concern, not piecemeal in this PR.

Item 3 — isValidK8sName validation: The function isValidK8sName does not exist in the handlers package. No existing handler validates sessionName format before K8s API calls. The K8s API returns a clear error for malformed names. Adding bespoke validation in just these 2 handlers while the other 30+ handlers don't do it would be inconsistent. Same as item 2 — this is a global concern, not scoped to this PR.

Item 4 — k8sDyn naming: The reviewer claims reqK8s, reqDyn is the canonical convention, but _, k8sDyn := GetK8sClientsForRequest(c) appears 9 times in sessions.go. My code matches the dominant existing pattern. The k8sDyn == nil check is also consistent — multiple existing handlers check only the dynamic client when they don't need the typed client.

@github-actions

This comment has been minimized.

…ments

- Add kubectl.kubernetes.io/ and meta.kubernetes.io/ to
  internalLabelPrefixes to prevent leaking last-applied-configuration
- Align sessions_metrics.go auth check to use k8sClt for consistency
  with sessions_logs.go
- Refine query forwarding comments to note backend validates params

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@adalton
Copy link
Copy Markdown
Contributor Author

adalton commented Mar 5, 2026

Review Response (iteration 2)

Fixed (44f9dc9)

Item 1 — Auth check consistency: Changed sessions_metrics.go to use k8sClt, k8sDyn := ... / if k8sClt == nil to match sessions_logs.go. The reviewer is still wrong that reqK8s is the codebase convention (it's not — k8sDyn appears 9 times in sessions.go), but having my own two new files be inconsistent with each other was a fair catch.

Item 5 — kubectl.kubernetes.io/ prefix filter: Added kubectl.kubernetes.io/ and meta.kubernetes.io/ to internalLabelPrefixes. Good catch — last-applied-configuration can contain the full resource manifest.

Item 6 — Comment wording: Updated to "Forwarded verbatim to backend; backend enforces its own validation and RBAC via the user's token."

Not changing

Item 2 — Labels silently dropped on CreateSession: This was already fixed and E2E validated. The backend CreateAgenticSessionRequest struct has an Annotations map[string]string field (types/session.go:65), and CreateSession at line 666–671 reads req.Annotations and writes it to metadata["annotations"]. The public API sends backendReq["annotations"] = req.Labels, which maps correctly. See testing-results.md section R2 for the round-trip E2E validation showing labels set at create time are readable on GET.

Item 3 — Hardcoded -runner pod suffix: A constant in this file doesn't help — the operator is a separate Go module and can't import it. The comment documents the cross-module coupling, which is the appropriate mechanism. The operator has used this convention since the project's inception.

Item 4 — Fallback response in patchSessionLabels: This is a defensive fallback for when the label PATCH succeeds but the follow-up GET fails milliseconds later against the same backend with the same auth token — an extremely unlikely scenario. A partial SessionResponse{ID: sessionID} with all other fields empty isn't meaningfully better for typed clients than {"message": "Labels updated"}. The current approach at least communicates what happened. The 200 status code correctly reflects that the mutation succeeded.

Item 7 — Usage annotations as strings: By design. K8s annotations are always stringstring. Parsing them to numbers at the API layer introduces error handling for data the backend stored as strings. This is a documentation concern.

@github-actions

This comment has been minimized.

@jeremyeder
Copy link
Copy Markdown
Contributor

still needs review to be addressed

@adalton
Copy link
Copy Markdown
Contributor Author

adalton commented Mar 9, 2026

Review Response (iteration 3)

Fixed (b73fded): Sanitized sessionName in both sessions_logs.go and sessions_metrics.go using the existing SanitizeForLog() helper from helpers.go, which strips all control characters (newlines, carriage returns, null bytes) via regex. Applied at point of use before any log statements or pod name construction.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 9, 2026

Claude Code Review

Summary

PR #807 completes the public API session surface (PATCH, logs, transcript, metrics endpoints), enriches the SessionResponse DTO with displayName, repos, labels, and timeout, and fixes label write/read symmetry across the stack. All new backend handlers correctly use GetK8sClientsForRequest, SanitizeForLog is applied to user-supplied path params, type-safe unstructured access is used in sessions_metrics.go, and label injection is prevented via validateLabelKeys. Overall this is in good shape with one memory-efficiency concern and a couple of minor items noted below.

Issues by Severity

Blocker Issues

None.

Critical Issues

None.

Major Issues

1. GetSessionLogs in public-api buffers response instead of streaming

  • File: components/public-api/handlers/sessions_sub.go (GetSessionLogs function)
  • Problem: The public-api proxy calls io.ReadAll(resp.Body) then writes the full buffer to the client. The backend sessions_logs.go correctly uses io.Copy + io.LimitReader(logStream, 10MB) to stream. The public-api layer undoes that efficiency by materialising up to 10 MB per request into heap memory. Under concurrent load: N concurrent log fetches = N x 10 MB worst-case heap allocations.
  • Standard violated: Performance review axis — unnecessary buffering that negates the backend streaming design.
  • Suggested fix: Replace the io.ReadAll + c.Data pattern with streaming after the status check. The backend already caps the source at 10 MB so no additional io.LimitReader is needed in the proxy:
c.Header("Content-Type", "text/plain; charset=utf-8")
c.Status(http.StatusOK)
if _, err := io.Copy(c.Writer, resp.Body); err != nil {
    log.Printf("GetSessionLogs: error streaming backend response for %s: %v", sessionID, err)
}

Minor Issues

1. Backend GetSessionLogs returns HTTP 200 with empty body when pod does not exist

  • File: components/backend/handlers/sessions_logs.go
  • Problem: When the pod has not been created yet (or was already cleaned up), the handler returns 200 OK with an empty body. This is documented in a comment but conflates "no pod" with "pod has no log output", making it impossible for callers to distinguish "session is pending" from "session has no logs". A 204 No Content would be semantically cleaner without breaking callers.

2. patchSessionLabels makes two backend round-trips per request

  • File: components/public-api/handlers/sessions_patch.go (patchSessionLabels function)
  • Problem: The label PATCH path issues a PATCH to the backend then a follow-up GET to return a full session DTO — 2 backend round-trips per label mutation. The pattern is intentional and the fallback is well-handled, but it doubles backend load for label operations compared to start/stop. Worth documenting as a known trade-off.

3. SanitizeForLog used as both log-safety guard and K8s lookup key

  • File: components/backend/handlers/sessions_logs.go, sessions_metrics.go
  • Problem: sessionName is sanitised with SanitizeForLog (strips \n/\r) and then used both in log messages and as the K8s resource name in .Get(ctx, sessionName, ...) and pod name construction. K8s names cannot contain newlines so stripping is a no-op for the lookup — but a brief comment would clarify the dual use is intentional and safe.

Positive Highlights

  • Security-first label handling: validateLabelKeys correctly blocks reserved prefixes (ambient-code.io/, vteam.ambient-code/, app.kubernetes.io/, etc.) on both create and patch paths, and filterUserLabels strips them on read. Test coverage (TestValidateLabelKeys, TestFilterUserLabels, TestE2E_PatchSession_ReservedLabelPrefix) is thorough.

  • Correct K8s client usage throughout: Every new backend handler uses GetK8sClientsForRequest(c) with a nil check and 401 abort per k8s-client-usage.md and security-standards.md. No service-account fallback anywhere.

  • Type-safe unstructured access: sessions_metrics.go exclusively uses unstructured.NestedString and unstructured.NestedFloat64, fully compliant with backend-development.md.

  • Backend log streaming with size cap: io.LimitReader(logStream, 10MB) is a well-considered OOM defence for runaway log sources.

  • Backwards-compatible prompt field extraction: The transformSession fallback (initialPrompt then prompt) ensures sessions created before this PR continue to round-trip correctly.

  • Clean PatchSession routing: The category-count approach cleanly rejects mixed-category requests with clear error messages, and the test suite covers all edge cases (empty body, mixed categories, stop, start, update, labels, reserved prefix, invalid ID).

Recommendations

  1. (Major) Fix GetSessionLogs in sessions_sub.go to stream via io.Copy rather than buffering with io.ReadAll. One-line change that aligns the proxy with the backend streaming design.

  2. (Minor) Consider returning 204 No Content from the backend GetSessionLogs when the pod does not exist, so callers can distinguish "session pending" from "session with empty logs".

  3. (Minor) Add a short inline comment in sessions_logs.go and sessions_metrics.go explaining that using the SanitizeForLog-processed name as the K8s lookup key is safe because K8s resource names cannot contain newline characters.


Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@adalton
Copy link
Copy Markdown
Contributor Author

adalton commented Mar 9, 2026

Review Response (iteration 3)

Fixed (11813e0)

Major — Public-API GetSessionLogs buffering: Replaced io.ReadAll + c.Data with io.Copy streaming directly to the client. Error responses (non-200) are still buffered since they're small bodies that need status code forwarding. This aligns the proxy layer with the backend's streaming design and eliminates the up-to-10 MB per-request heap allocation under concurrent load.

Minor 3 — SanitizeForLog dual-use comment: Added inline comments in both sessions_logs.go and sessions_metrics.go clarifying that the sanitized value is safe to reuse as a K8s lookup key since K8s resource names cannot contain control characters.

Not changing

Minor 1 — 200 vs 204 for missing pod: Changing the HTTP status code is a behavioral change that could break existing callers (MCP clients, CLI). The current 200-with-empty-body is documented and semantically valid — "here are the logs, there are none yet." A 204 would also require callers to handle an additional status code for a non-error case.

Minor 2 — Two round-trips for label PATCH: The reviewer acknowledges this is intentional. The PATCH endpoint doesn't return a full session DTO, so the follow-up GET is necessary to give callers a consistent SessionResponse. This is a known trade-off, not a bug.

@github-actions

This comment has been minimized.

…d session ID to label fallback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 9, 2026

Claude Code Review

Summary

PR 807 adds PATCH, logs, transcript, and metrics endpoints to the public-API, enriching SessionResponse DTO and fixing label write/read symmetry. User-token auth is correctly enforced, log-injection prevention is applied consistently, reserved-prefix validation guards annotation writes, and test coverage is thorough. No blocker or critical issues were found.

Issues by Severity

BLOCKER: None.

CRITICAL: None.

MAJOR ISSUES

  1. sessions_logs.go - Pod 404 silently swallowed
    File: components/backend/handlers/sessions_logs.go:140-143
    The handler accesses pod logs without first verifying the session CR exists. If the session never existed, callers receive the same empty 200 as a valid session with no logs yet. A session name matching an unrelated pod in the same namespace could return that pods logs (RBAC-constrained but still unexpected).
    Suggested fix: fetch the session CR first (as GetSessionMetrics does) and return 404 if missing, then proceed to pod log retrieval.

  2. sessions_patch.go - Missing sessionID in backend error log messages
    File: components/public-api/handlers/sessions_patch.go:978, 1027, 1094
    Error messages like Backend request failed for start/stop omit the session ID, unlike sessions_sub.go which consistently includes sessionID in every error log. Deviates from the error-handling.md pattern (log errors with context - project, session name).

  3. sessions_metrics.go - Misleading durationSeconds when completionTime is unparseable
    File: components/backend/handlers/sessions_metrics.go:235-245
    On RFC3339 parse failure, the code falls back to end = time.Now() while still emitting the malformed completionTime string. This produces a wrong timestamp paired with a growing durationSeconds, making a supposedly completed session appear still-running.
    Suggested fix: skip both durationSeconds and completionTime if time.Parse fails.

MINOR ISSUES

  1. IsForbidden not logged in new backend handlers
    Files: sessions_logs.go:145-148, sessions_metrics.go:211-214
    Neither handler logs the forbidden access attempt. The pattern in error-handling.md and security-standards.md is to log authorization failures for security auditing. A single log.Printf with project/session context would suffice.

  2. sessions_patch.go - Follow-up GET after PATCH has an eventual-consistency window
    File: components/public-api/handlers/sessions_patch.go:1113-1136
    After a successful label PATCH, a second GET is issued to return the full session DTO. The resource could be modified or deleted between the two calls. The current fallback is safe, but this race window is worth documenting in the code comment or API contract.

POSITIVE HIGHLIGHTS

  • GetK8sClientsForRequest used correctly with nil guard in all new backend handlers, consistent with k8s-client-usage.md
  • SanitizeForLog applied to every user-supplied session name before log.Printf, with clear comment explaining dual-use
  • unstructured.NestedString, NestedFloat64, and NestedMap used exclusively - no direct type assertions anywhere in new code
  • validateLabelKeys called on both create and patch paths for labels and removeLabels, blocking writes to system annotation namespaces
  • 10 MB log cap via io.LimitReader prevents OOM from high-output runners
  • Annotation nil-deletion fix correctly implements sparse-patch semantics required for label removal
  • Strong test coverage: 8 new Ginkgo backend tests and 15+ public-api unit tests covering auth, validation, and error propagation

RECOMMENDATIONS

  1. (Major) Verify session CR existence before pod-log access in GetSessionLogs to avoid silent 200 for unknown sessions.
  2. (Major) Add sessionID to error log messages in patchSessionStartStop, patchSessionUpdate, and patchSessionLabels.
  3. (Major) Omit durationSeconds and completionTime when time.Parse fails in GetSessionMetrics.
  4. (Minor) Log errors.IsForbidden cases in new backend handlers for security auditing.
  5. (Minor) Document the follow-up GET race window in patchSessionLabels.

adalton and others added 2 commits March 9, 2026 14:10
…stamp, add audit logging

- GetSessionLogs now checks session CR exists before pod log access (returns 404 for unknown sessions)
- Skip durationSeconds/completionTime when completionTime fails RFC3339 parse
- Log IsForbidden attempts in both backend handlers for security auditing
- Add sessionID context to all patch error log messages
- Document follow-up GET race window in patchSessionLabels

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@adalton adalton enabled auto-merge (squash) March 16, 2026 17:13
@ambient-code ambient-code bot modified the milestone: Review Queue Mar 17, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/backend/handlers/sessions.go`:
- Around line 1121-1125: The current annotation assignment (anns[k] = v) accepts
arbitrary JSON types and empty strings which cause Kubernetes API errors; update
the session PATCH handler to validate each annotation value before assignment:
ensure v is a string type, treat empty-string ("") as nil/delete (call
delete(anns, k)), and if v is a non-string return a 400 Bad Request with a clear
validation error; keep using the same variables (anns, k, v) and ensure the
validated string value is assigned to anns[k] only after these checks so the
subsequent Kubernetes Update call never receives non-string annotation values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ce53b79-ede4-4d98-ad13-5899bf591a2d

📥 Commits

Reviewing files that changed from the base of the PR and between 55f1123 and 0fdcc01.

📒 Files selected for processing (1)
  • components/backend/handlers/sessions.go

Comment on lines +1121 to +1125
if v == nil {
delete(anns, k)
} else {
anns[k] = v
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify current PatchSession behavior writes arbitrary annotation value types and lacks type validation.

rg -n -C4 'if annsPatch, ok := metaPatch\["annotations"\]\.\(map\[string\]interface\{\}\); ok \{|for k, v := range annsPatch|anns\[k\] = v|v == nil' components/backend/handlers/sessions.go

# Check whether tests cover invalid annotation value types for PatchSession.
rg -n -C3 --type go 'PatchSession|annotations|null|Invalid request body|must be string'

Repository: ambient-code/platform

Length of output: 50378


🏁 Script executed:

# Check what happens after annotations are set - look for error handling
rg -A10 'SetNestedMap.*annotations' components/backend/handlers/sessions.go

# Search for any validation of annotation values in PatchSession or related code
rg -B5 -A5 'annotation.*string|string.*annotation' components/backend/handlers/sessions.go | head -50

# Check the types to confirm Annotations is defined as map[string]string
rg -A3 'type.*Session.*struct' components/backend/types/session.go

Repository: ambient-code/platform

Length of output: 4168


🏁 Script executed:

# Look for tests that might be patching annotations with non-string values
rg -n 'annotations.*:.*[0-9]|annotations.*:.*true|annotations.*:.*false|annotations.*:.*{' components/public-api/handlers/sessions_patch_test.go components/backend/handlers/sessions_test.go

# Check if there's any type assertion or validation before annotation assignment
rg -B3 -A3 'for k, v := range annsPatch' components/backend/handlers/sessions.go

Repository: ambient-code/platform

Length of output: 465


🏁 Script executed:

# Verify Kubernetes annotation constraints - check if there's validation elsewhere
rg -n 'annotation.*string|string.*annotation|map\[string\]string' components/backend/handlers/sessions.go | head -20

# Check what unstructured.SetNestedMap does with non-string values in comments or tests
rg -i 'SetNestedMap.*annotation|annotation.*SetNestedMap' --type go

Repository: ambient-code/platform

Length of output: 1098


🏁 Script executed:

# Check if errors from SetNestedMap are actually ignored
rg -B5 'unstructured\.SetNestedMap.*annotations' components/backend/handlers/sessions.go | grep -A5 'SetNestedMap'

# Look at error handling around the Update call to see if annotation errors propagate
rg -B2 -A8 'k8sDyn\.Resource.*Update.*item' components/backend/handlers/sessions.go | head -40

# Search for how the backend type Annotations (map[string]string) is used vs actual annotation storage
rg -n 'Annotations.*map\[string\]string' components/backend/types/session.go

# Check if there are any tests for patching with invalid annotation types
rg -A20 'TestE2E_PatchSession|func.*Patch.*Session' components/public-api/handlers/sessions_patch_test.go | grep -i 'annotation\|invalid\|string\|number\|bool'

Repository: ambient-code/platform

Length of output: 3318


🏁 Script executed:

# Look at the session DTO in public-api to see type consistency
rg -B2 -A2 'Annotations' components/public-api/types/dto.go

# Check if validation happens when binding the PATCH request
rg -B10 -A5 'ShouldBindJSON.*patch\|patch.*ShouldBindJSON' components/backend/handlers/sessions.go

# See if empty string annotations are tested anywhere
rg -n '"".*annotation|annotation.*""' components/public-api/handlers/sessions_patch_test.go components/backend/handlers/sessions_test.go

Repository: ambient-code/platform

Length of output: 47


🏁 Script executed:

# Check Kubernetes unstructured documentation and usage to understand what happens with non-string values
rg -C3 'unstructured\.' components/backend/handlers/sessions.go | head -60

Repository: ambient-code/platform

Length of output: 2077


🏁 Script executed:

# Check if there are any integration tests or actual test data with non-string annotation values
rg -n 'annotations.*:' components/public-api/handlers/sessions_patch_test.go -A2 -B2

# Specifically look for what annotation values are being tested
rg '"annotations"' components/public-api/handlers/sessions_patch_test.go -A5 -B2

Repository: ambient-code/platform

Length of output: 1655


🏁 Script executed:

# Check what the backend returns in PatchSession response - verify if it successfully reflects non-string values
rg -A3 'PatchSession.*200\|StatusOK.*annotation' components/public-api/handlers/sessions_patch_test.go

# Look at the full PatchSession test to understand what values are tested
sed -n '150,222p' components/public-api/handlers/sessions_patch_test.go

Repository: ambient-code/platform

Length of output: 2638


🏁 Script executed:

# Check if there's any documentation or handling of what types annotations should accept
rg -i 'kubernetes.*annotation|annotation.*kubernetes|string.*only|must be string' --type go | head -20

Repository: ambient-code/platform

Length of output: 741


🏁 Script executed:

# See if the backend handler has any imports that might indicate type validation
rg -n 'import\|^import' components/backend/handlers/sessions.go | head -10

Repository: ambient-code/platform

Length of output: 47


Validate annotation values before assignment to avoid 500s on invalid PATCH payloads.

At line 1124, anns[k] = v accepts arbitrary JSON types without validation. Kubernetes annotations must be strings; non-string values (number, boolean, object, array) are rejected by the Kubernetes API during the subsequent Update call and currently surface as 500 errors instead of 400. Additionally, empty-string annotation values are not handled and persist rather than being deleted, contrary to Kubernetes semantics.

💡 Suggested fix
 			for k, v := range annsPatch {
-				if v == nil {
-					delete(anns, k)
-				} else {
-					anns[k] = v
-				}
+				switch vv := v.(type) {
+				case nil:
+					delete(anns, k)
+				case string:
+					// Treat empty string as delete to avoid retaining "removed" annotations.
+					if strings.TrimSpace(vv) == "" {
+						delete(anns, k)
+						continue
+					}
+					anns[k] = vv
+				default:
+					c.JSON(http.StatusBadRequest, gin.H{
+						"error": fmt.Sprintf("Invalid annotation value for key %q: must be string or null", k),
+					})
+					return
+				}
 			}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if v == nil {
delete(anns, k)
} else {
anns[k] = v
}
for k, v := range annsPatch {
switch vv := v.(type) {
case nil:
delete(anns, k)
case string:
// Treat empty string as delete to avoid retaining "removed" annotations.
if strings.TrimSpace(vv) == "" {
delete(anns, k)
continue
}
anns[k] = vv
default:
c.JSON(http.StatusBadRequest, gin.H{
"error": fmt.Sprintf("Invalid annotation value for key %q: must be string or null", k),
})
return
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/handlers/sessions.go` around lines 1121 - 1125, The
current annotation assignment (anns[k] = v) accepts arbitrary JSON types and
empty strings which cause Kubernetes API errors; update the session PATCH
handler to validate each annotation value before assignment: ensure v is a
string type, treat empty-string ("") as nil/delete (call delete(anns, k)), and
if v is a non-string return a 400 Bad Request with a clear validation error;
keep using the same variables (anns, k, v) and ensure the validated string value
is assigned to anns[k] only after these checks so the subsequent Kubernetes
Update call never receives non-string annotation values.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 77fe084. Added a type switch to validate annotation values before assignment:

  • nil → deletes the annotation key (existing behavior)
  • string → assigns the value
  • anything else → returns 400 Bad Request with a descriptive error

This prevents non-string JSON types (numbers, booleans, objects, arrays) from reaching the Kubernetes API and causing 500 errors.

Not adopted: the suggestion to treat empty strings as deletes. Empty-string annotations are valid in Kubernetes (kubectl annotate ... key="" is allowed), and silently converting them to deletes could break callers who intentionally set empty values. If a caller wants to remove an annotation, they should send null.

adalton and others added 7 commits March 18, 2026 11:59
…00 errors

Kubernetes annotations must be map[string]string, but the PATCH handler
accepted arbitrary JSON types (numbers, booleans, objects, arrays) and
passed them through to the K8s API, which rejected them with a 500. Use
a type switch to validate each value: allow string and nil (delete),
reject all other types with a 400 Bad Request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ambient-code ambient-code bot removed this from the Review Queue milestone Mar 24, 2026
@ambient-code ambient-code bot deleted a comment from Gkrumbach07 Mar 24, 2026
@ambient-code
Copy link
Copy Markdown
Contributor

ambient-code bot commented Mar 24, 2026

Review Queue Status

Check Status Detail
CI FAIL End-to-End Tests

Action needed: Fix failing CI checks

Auto-generated by Review Queue workflow. Updated when PR changes.

@ambient-code ambient-code bot added this to the Review Queue milestone Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants