
perf: add SSAR cache, event log tail reads, and agent status cache#1026

Open
syntaxsdev wants to merge 5 commits into ambient-code:main from syntaxsdev:fix/perf-bottlenecks

Conversation


@syntaxsdev syntaxsdev commented Mar 25, 2026

Addresses three performance bottlenecks identified through benchmarking:

  • SSAR cache (30s TTL): caches K8s permission checks per user/project
    to avoid a SelfSubjectAccessReview API call on every request. Before (sustained load):

    | Concurrent Users | p50   | Throughput |
    |------------------|-------|------------|
    | 10               | 1ms   | 4,380/s    |
    | 50               | 499ms | 101/s      |
    | 100              | 999ms | 100/s      |

    After: cached requests skip the K8s API call entirely.

  • Event log tail reads: reconnect replay now reads only the last 2MB
    instead of scanning the entire file. Measured before/after:

    | Events | File Size | Full Scan | Tail Read | Speedup |
    |--------|-----------|-----------|-----------|---------|
    | 100K   | 15 MB     | 936ms     | 122ms     | 7.7x    |
    | 1M     | 151 MB    | 9,716ms   | 129ms     | 75x     |


coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Introduces two in-process TTL caches (SSAR decision cache and session agent-status cache), integrates SSAR cache into request middleware to short-circuit authorization, bounds event replay by tail-reading large JSONL files, and adds benchmarks/tests for both caches and agent-status derivation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **SSAR cache implementation**<br>`components/backend/handlers/ssar_cache.go` | New thread-safe TTL cache for SelfSubjectAccessReview decisions keyed by hashed token+namespace/verb/group/resource; lazy expiration, capacity eviction, and a `CacheDisabled` exported flag controlled by `DISABLE_SSAR_CACHE`. |
| **Middleware integration**<br>`components/backend/handlers/middleware.go` | Middleware now checks `globalSSARCache` before creating SSARs: returns a cached deny (403) immediately, performs an SSAR on miss, stores `res.Status.Allowed` in the cache, and preserves existing error handling for SSAR failures. |
| **Agent-status cache & enrichment**<br>`components/backend/handlers/sessions.go` | Adds `agentStatusCache` (in-memory TTL keyed by session name, 5s TTL) with a background sweeper; `enrichAgentStatus` reads the cached AgentStatus for Running sessions, derives and caches on miss, and only applies non-empty derived values. |
| **Event replay / agui store**<br>`components/backend/websocket/agui_store.go` | Bounds reconnect replay by tail-reading up to `replayMaxTailBytes`: opens the file, seeks to the tail offset, skips the partial first line, parses the remaining JSONL lines (skipping malformed lines), and falls back to a full read on certain failures. |
| **Benchmarks & correctness tests (agent-status)**<br>`components/backend/handlers/agent_status_bench_test.go` | Adds benchmarks and correctness tests that synthesize event logs and exercise cached vs uncached enrichment, concurrent access, and list-page scenarios; overrides `DeriveAgentStatusFromEvents` in benchmarks for controlled workloads. |
| **Benchmarks & correctness tests (SSAR cache)**<br>`components/backend/handlers/ssar_cache_bench_test.go` | Adds benchmarks and a correctness test for `ssarCache` covering hit/miss/parallel scenarios and a key-cost benchmark (including SHA256-based key computation). |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client (HTTP)
    participant Middleware as Middleware (handlers)
    participant Cache as SSAR Cache
    participant K8s as Kubernetes API (SSAR)
    Client->>Middleware: request (token, namespace, verb, resource)
    Middleware->>Cache: check(key)
    alt cache hit (allowed)
        Cache-->>Middleware: allowed=true
        Middleware-->>Client: proceed
    else cache hit (denied)
        Cache-->>Middleware: allowed=false
        Middleware-->>Client: 403 Forbidden
    else cache miss
        Cache-->>Middleware: miss
        Middleware->>K8s: create SelfSubjectAccessReview
        K8s-->>Middleware: response (allowed/denied)
        Middleware->>Cache: store(key, allowed)
        alt allowed
            Middleware-->>Client: proceed
        else denied
            Middleware-->>Client: 403 Forbidden
        end
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the three main performance improvements in the changeset: SSAR cache, event log tail reads, and agent status cache. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing context on the three performance bottlenecks being addressed, with specific benchmark metrics. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 85.00%, which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/backend/handlers/sessions.go (1)

403-427: ⚠️ Potential issue | 🟠 Major

Include the namespace in the agent-status cache key.

metadata.name is only namespace-scoped, but this map is process-global. Two projects with the same session name will alias the same entry and can receive each other's cached agentStatus.

🔧 Suggested direction

```diff
 name, _ := session.Metadata["name"].(string)
+namespace, _ := session.Metadata["namespace"].(string)
 if name == "" {
 	return
 }
+cacheKey := namespace + "/" + name

 // Check cache
 agentStatusCache.RLock()
-if entry, ok := agentStatusCache.entries[name]; ok && time.Now().Before(entry.expiresAt) {
+if entry, ok := agentStatusCache.entries[cacheKey]; ok && time.Now().Before(entry.expiresAt) {
 	agentStatusCache.RUnlock()
 	if entry.status != "" {
 		session.Status.AgentStatus = types.StringPtr(entry.status)
 	}
 	return
 }
 agentStatusCache.RUnlock()
@@
 agentStatusCache.Lock()
-agentStatusCache.entries[name] = agentStatusCacheEntry{
+agentStatusCache.entries[cacheKey] = agentStatusCacheEntry{
 	status:    derived,
 	expiresAt: time.Now().Add(agentStatusCacheTTL),
 }
 agentStatusCache.Unlock()
```

As per coding guidelines: Focus on major issues impacting performance, readability, maintainability, and security. Avoid nitpicks and avoid verbosity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/handlers/sessions.go` around lines 403 - 427, The cache
currently keys agentStatusCache.entries by the session name
(session.Metadata["name"]) which is namespace-scoped and causes cross-project
collisions; change the key to include the namespace (e.g., build key :=
fmt.Sprintf("%s/%s", namespace, name) where namespace :=
session.Metadata["namespace"].(string) or fallback to ""), then use that
composite key everywhere the map is accessed (both the RLock read:
agentStatusCache.entries[...] and the Lock write that sets
agentStatusCache.entries[...]) and when deriving/storing the entry (keep using
DeriveAgentStatusFromEvents(name) if it expects just name, but store under the
composite key). Ensure any checks that read entry.status and set
session.Status.AgentStatus use the composite key as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/backend/handlers/middleware.go`:
- Around line 318-325: The cached-allowed branch (globalSSARCache.check
returning cachedAllowed==true) is short-circuiting SSAR validation and lets
expired/revoked tokens pass; update the middleware handling around
cacheKey/cachedAllowed to still validate the incoming token (e.g., call the same
SSAR/introspect flow used for cache misses or check a stored token expiry/active
flag) before allowing access. Specifically, modify the logic in middleware.go
where globalSSARCache.check(cacheKey) is used: if found && cachedAllowed then
perform a token validity check (or verify stored expiry/nonce) and only proceed
to skip the full SSAR call when that token check succeeds; alternatively, change
the cache payload to include token expiry/active state and verify it here before
returning allowed. Ensure you reference and reuse the existing SSAR/token
validation functions used elsewhere so handlers (e.g., sessions proxy checks)
still rely on non-nil clients but only after token validity is confirmed.

In `@components/backend/handlers/sessions.go`:
- Around line 381-384: The TTL cache defined by agentStatusCache (with entries
map[string]agentStatusCacheEntry) never evicts expired entries, which leaks
memory; add an expiration field to agentStatusCacheEntry if not present and
implement a periodic sweeper goroutine (use time.Ticker) that takes the cache's
RWMutex, iterates entries, and deletes those whose expiry has passed, and ensure
the sweeper can be stopped on shutdown; alternatively ensure setters/readers
(functions that touch agentStatusCache) update/remove entries atomically when
sessions complete or are deleted so stale keys are removed.

In `@components/backend/handlers/ssar_cache.go`:
- Around line 43-58: In ssarCache.check, avoid deleting a potentially refreshed
entry by re-checking the map under the write lock: when
time.Now().After(entry.expiresAt) is true, acquire c.mu.Lock(), reload entry :=
c.entries[key] (and ok), verify that ok and that entry.expiresAt is still
expired before deleting; if the reloaded entry is present and not expired, use
its allowed value and return (don’t delete it). This ensures the concurrent
store that refreshed the key between RUnlock and Lock doesn’t get removed.

In `@components/backend/websocket/agui_store.go`:
- Around line 269-276: Add a clear truncation notice when tail-reading large
logs: after computing offset := fileSize - replayMaxTailBytes and successfully
seeking, compute and log how many bytes (and optionally approximate events by
counting newline-delimited records skipped) were omitted for sessionID so
operators see "skipped X bytes (~Y events)" along with sessionID and
replayMaxTailBytes; if counting events is expensive, at minimum log the skipped
byte count and include this behavior in the function comment above the tail-read
block referencing replayMaxTailBytes, readJSONLFile, and the f.Seek fallback so
callers/operators know active runs may miss early events like RUN_STARTED.
- Around line 282-297: The tail-scan loop currently ignores scanner errors and
JSON unmarshal failures; after the for scanner.Scan() loop check scanner.Err()
and propagate it (return it or wrap and return) so I/O errors in tail mode are
not silenced, and change the json.Unmarshal handling to log malformed JSON lines
(use the same log.Printf("Export: Skipping malformed JSON line: %v", err) style
as readJSONLFile) instead of silently dropping them; update the function's
return path/signature if needed so the events slice and any scanner error are
returned to the caller (refer to the scanner variable, the json.Unmarshal call,
the events slice and the existing readJSONLFile behavior).
- Around line 263-267: The small-file fast path opens a file into variable f and
then calls readJSONLFile(path) which re-opens the same path; fix by reusing the
existing open handle f instead of double-opening: either add/rename a helper
like readJSONLFromReader(io.Reader) and call it with f when fileSize <=
replayMaxTailBytes, or close f immediately and then call readJSONLFile(path);
update the branch that checks replayMaxTailBytes to use f (or close it) and
remove the second open to avoid the redundant open on the same path.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f8d9901a-3860-4a62-af54-0d17e5e61ca3

📥 Commits

Reviewing files that changed from the base of the PR and between 05c33d3 and 3c90592.

📒 Files selected for processing (4)
  • components/backend/handlers/middleware.go
  • components/backend/handlers/sessions.go
  • components/backend/handlers/ssar_cache.go
  • components/backend/websocket/agui_store.go

Comment on lines +318 to +325
```go
if cachedAllowed, found := globalSSARCache.check(cacheKey); found {
	if !cachedAllowed {
		c.JSON(http.StatusForbidden, gin.H{"error": "Unauthorized to access project"})
		c.Abort()
		return
	}
	// Cache hit — allowed, skip SSAR call
} else {
```
⚠️ Potential issue | 🟠 Major

Cached Allowed=true decisions are extending token validity.

On a cache hit this path makes no API server call, and the production client setup in Lines 91-104 only constructs clients. Handlers such as components/backend/handlers/sessions.go, Lines 1848-1854 and Lines 2534-2540, only check for non-nil clients before proxying, so an expired or revoked token can still pass middleware and reach protected paths until the cache entry expires.

As per coding guidelines: Focus on major issues impacting performance, readability, maintainability, and security. Avoid nitpicks and avoid verbosity.

Also applies to: 348-349

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/handlers/middleware.go` around lines 318 - 325, The
cached-allowed branch (globalSSARCache.check returning cachedAllowed==true) is
short-circuiting SSAR validation and lets expired/revoked tokens pass; update
the middleware handling around cacheKey/cachedAllowed to still validate the
incoming token (e.g., call the same SSAR/introspect flow used for cache misses
or check a stored token expiry/active flag) before allowing access.
Specifically, modify the logic in middleware.go where
globalSSARCache.check(cacheKey) is used: if found && cachedAllowed then perform
a token validity check (or verify stored expiry/nonce) and only proceed to skip
the full SSAR call when that token check succeeds; alternatively, change the
cache payload to include token expiry/active state and verify it here before
returning allowed. Ensure you reference and reuse the existing SSAR/token
validation functions used elsewhere so handlers (e.g., sessions proxy checks)
still rely on non-nil clients but only after token validity is confirmed.


The 30s TTL is an intentional trade-off — every auth cache (including K8s's own token cache) has this window. A 30s delay on revocation is acceptable for the substantial reduction in API server load. The alternative is a full SSAR call on every single request, which was the performance bottleneck this PR addresses.

Comment on lines +269 to +276
```go
// Large file — seek to tail to bound reconnect latency.
log.Printf("AGUI Store: large event log for %s (%.1f MB), reading tail only", sessionID, float64(fileSize)/(1024*1024))
offset := fileSize - replayMaxTailBytes
if _, err := f.Seek(offset, 0); err != nil {
	log.Printf("AGUI Store: seek failed for %s: %v, falling back to full read", sessionID, err)
	events, _ := readJSONLFile(path)
	return events
}
```

🧹 Nitpick | 🔵 Trivial

Consider documenting the truncation impact on active runs.

For large active runs, clients receive only tail events and may miss critical early events (e.g., RUN_STARTED, initial state). The SSE handler in agui_proxy.go sends these directly to clients for state reconstruction without indicating truncation.

Finished runs should be compacted and small, so this primarily affects active runs with very large logs. Consider either:

  1. Adding a log message noting how many events were skipped, or
  2. Documenting this trade-off explicitly in the function comment

This helps operators understand reconnect behavior when debugging client state issues.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/websocket/agui_store.go` around lines 269 - 276, Add a
clear truncation notice when tail-reading large logs: after computing offset :=
fileSize - replayMaxTailBytes and successfully seeking, compute and log how many
bytes (and optionally approximate events by counting newline-delimited records
skipped) were omitted for sessionID so operators see "skipped X bytes (~Y
events)" along with sessionID and replayMaxTailBytes; if counting events is
expensive, at minimum log the skipped byte count and include this behavior in
the function comment above the tail-read block referencing replayMaxTailBytes,
readJSONLFile, and the f.Seek fallback so callers/operators know active runs may
miss early events like RUN_STARTED.
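One minimal way to surface the truncation, sketched with hypothetical names (`truncationNotice` is not in the PR; only `replayMaxTailBytes` follows its naming):

```go
package main

import "fmt"

const replayMaxTailBytes = 2 * 1024 * 1024 // 2MB tail window, as in the PR

// truncationNotice returns a log-ready message stating how many bytes a
// tail read skips, or "" when the file fits in the window. Counting the
// skipped events would require scanning the head, so bytes are reported.
func truncationNotice(sessionID string, fileSize int64) string {
	if fileSize <= replayMaxTailBytes {
		return ""
	}
	skipped := fileSize - replayMaxTailBytes
	return fmt.Sprintf("AGUI Store: tail read for %s skipped %d bytes; early events (e.g. RUN_STARTED) may be absent", sessionID, skipped)
}

func main() {
	fmt.Println(truncationNotice("sess-1", 3*1024*1024))
	fmt.Println(truncationNotice("sess-2", 1024) == "")
}
```

An operator debugging client-state issues can then correlate "skipped N bytes" with reconnects that are missing early events.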


The code already logs when tail-only reads happen (line 269). Finished runs are compacted to snapshot-only events and will be small. Active runs with truly enormous logs are an edge case — the truncation is a bounded-latency trade-off, not data loss.

Comment on lines +282 to 297
```go
for scanner.Scan() {
	line := scanner.Bytes()
	if len(line) == 0 {
		continue
	}
	// Skip the first line — likely partial from the seek
	if first {
		first = false
		continue
	}
	var evt map[string]interface{}
	if err := json.Unmarshal(line, &evt); err == nil {
		events = append(events, evt)
	}
}
return events
```

⚠️ Potential issue | 🟠 Major

Scanner errors are silently ignored in tail-scan mode.

The loop exits without checking scanner.Err(). If the scanner encounters an I/O error mid-read, the function returns partial data without any indication of failure. The existing readJSONLFile properly returns scanner.Err() to propagate such issues.

Also, JSON unmarshal failures are silently ignored here, whereas readJSONLFile logs them with log.Printf("Export: Skipping malformed JSON line: %v", err). This behavioral divergence makes debugging production issues harder.

Proposed fix to add error handling

```diff
 	for scanner.Scan() {
 		line := scanner.Bytes()
 		if len(line) == 0 {
 			continue
 		}
 		// Skip the first line — likely partial from the seek
 		if first {
 			first = false
 			continue
 		}
 		var evt map[string]interface{}
-		if err := json.Unmarshal(line, &evt); err == nil {
-			events = append(events, evt)
-		}
+		if err := json.Unmarshal(line, &evt); err != nil {
+			log.Printf("AGUI Store: skipping malformed JSON line in tail read: %v", err)
+			continue
+		}
+		events = append(events, evt)
 	}
+	if err := scanner.Err(); err != nil {
+		log.Printf("AGUI Store: scanner error during tail read for %s: %v", sessionID, err)
+	}
 	return events
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/websocket/agui_store.go` around lines 282 - 297, The
tail-scan loop currently ignores scanner errors and JSON unmarshal failures;
after the for scanner.Scan() loop check scanner.Err() and propagate it (return
it or wrap and return) so I/O errors in tail mode are not silenced, and change
the json.Unmarshal handling to log malformed JSON lines (use the same
log.Printf("Export: Skipping malformed JSON line: %v", err) style as
readJSONLFile) instead of silently dropping them; update the function's return
path/signature if needed so the events slice and any scanner error are returned
to the caller (refer to the scanner variable, the json.Unmarshal call, the
events slice and the existing readJSONLFile behavior).


Already addressed — scanner.Err() is checked and logged at lines 312-314, and JSON unmarshal failures are logged at line 307. Both error paths are handled.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/backend/handlers/agent_status_bench_test.go`:
- Around line 165-182: The benchmark uses the same session name
("bench-session") for all entries so agentStatusCache (keyed by session name)
gets populated on the first call and the remaining enrichAgentStatus calls hit
the cache; change the synthetic sessions in the sessions slice so each
AgenticSession.Metadata["name"] is unique (e.g., append the index to
"bench-session") and ensure any stub log lookup/mapping used by
enrichAgentStatus is populated for each distinct name (or alternatively change
the test's stub to map each synthetic name to a corresponding log file) so the
uncached benchmark truly measures cold-cache behavior for each session.
- Around line 143-149: The benchmark shares pointers to the same
types.AgenticSession values across b.RunParallel workers causing data races
because enrichAgentStatus mutates session.Status.AgentStatus; fix it by ensuring
each worker operates on a unique session copy — inside the b.RunParallel
closure, before calling enrichAgentStatus, copy the session value from the
sessions slice into a local variable (e.g., value copy of *sessions[i%20]) and
pass a pointer to that local copy to enrichAgentStatus so no two goroutines
mutate the same underlying struct.

In `@components/backend/handlers/ssar_cache.go`:
- Around line 21-22: The comment for ssarCacheMaxSize incorrectly says "LRU
eviction" while the current implementation performs arbitrary/random eviction by
iterating a Go map; update the comment to state "random/unspecified eviction"
(or "map-iteration-based eviction") to match behavior, and if you prefer true
LRU replace the eviction logic with an LRU structure (e.g., implement
container/list + map or use hashicorp/golang-lru) in the cache code paths that
perform eviction (the block that iterates the map around lines referenced in the
review); ensure any references to "LRU" in comments or docstrings for
ssarCacheMaxSize or the eviction routine are changed accordingly.
- Around line 74-95: checkAccess() currently always calls the Kubernetes
SelfSubjectAccessReview API, bypassing the shared globalSSARCache; update
checkAccess to first build a deterministic cache key (e.g., user identifiers +
verb + resource + namespace) and check globalSSARCache.get(key) and use that
result if present, only performing the API call on a cache miss and then storing
the boolean result with globalSSARCache.store(key, allowed). Locate
checkAccess() in agui_proxy.go and use the existing ssarCache methods (and
globalSSARCache) to ensure thread-safe reads/writes and preserve the existing
30s TTL behavior; do not change the cache eviction logic in ssar_cache.store.
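The value-copy fix for the parallel-benchmark data race noted above can be illustrated outside the testing framework. This is a hypothetical sketch: `Session` and `enrich` stand in for the real `types.AgenticSession` and `enrichAgentStatus`.

```go
package main

import (
	"fmt"
	"sync"
)

type Session struct {
	Name   string
	Status string
}

// enrich mutates the session it is given — which is exactly why sharing
// pointers to the same structs across b.RunParallel workers races.
func enrich(s *Session) { s.Status = "Running" }

func main() {
	shared := []Session{{Name: "a"}, {Name: "b"}}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s := shared[i%len(shared)] // value copy per goroutine: no shared mutation
			enrich(&s)
		}(i)
	}
	wg.Wait()

	// The shared slice is untouched because each goroutine enriched a copy.
	fmt.Println(shared[0].Status == "", shared[1].Status == "")
}
```

In the benchmark closure, the same pattern applies: copy `*sessions[i%20]` into a local variable before calling the enrichment function, so `go test -race` stays quiet.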

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 40b96d78-b6b5-498e-8f5f-d8b59c5e5c49

📥 Commits

Reviewing files that changed from the base of the PR and between 3c90592 and 87cc472.

📒 Files selected for processing (3)
  • components/backend/handlers/agent_status_bench_test.go
  • components/backend/handlers/ssar_cache.go
  • components/backend/handlers/ssar_cache_bench_test.go


coderabbitai bot commented Mar 25, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{"name":"HttpError","status":500,"request":{"method":"PATCH","url":"https://api.github.com/repos/ambient-code/platform/issues/comments/4128103810","headers":{"accept":"application/vnd.github.v3+json","user-agent":"octokit.js/0.0.0-development octokit-core.js/7.0.6 Node.js/24","authorization":"token [REDACTED]","content-type":"application/json; charset=utf-8"},"body":{"body":"<!-- This is an auto-generated comment: summarize by coderabbit.ai -->\n<!-- walkthrough_start -->\n\n## Walkthrough\n\nAdds two in-process TTL caches (SSAR decision cache and agent-status cache), updates middleware to consult the SSAR cache before making SelfSubjectAccessReview calls, bounds event replay by tail-reading large JSONL files, and adds benchmarks/tests for both caches and agent-status derivation.\n\n## Changes\n\n|Cohort / File(s)|Summary|\n|---|---|\n|**SSAR cache & middleware** <br> `components/backend/handlers/ssar_cache.go`, `components/backend/handlers/middleware.go`|New thread-safe TTL cache (`globalSSARCache`) for SelfSubjectAccessReview results keyed by a hashed token+params; middleware checks cache first, returns cached deny (403) immediately, performs SSAR on miss, then stores result. Adds exported `handlers.CacheDisabled` (env `DISABLE_SSAR_CACHE`).|\n|**Session agent-status cache & enrichment** <br> `components/backend/handlers/sessions.go`, `components/backend/handlers/agent_status_bench_test.go`|Adds `agentStatusCache` (in-memory TTL keyed by session name, 5s TTL) and logic in `enrichAgentStatus` to use cached `AgentStatus` for Running sessions; on miss derives status, stores it, and only sets non-empty derived values. 
Includes benchmarks and correctness tests for cache behavior and concurrency.|\n|**Event replay / agui store** <br> `components/backend/websocket/agui_store.go`|`loadEvents` now tail-reads `agui-events.jsonl` when file > `replayMaxTailBytes`: seeks to tail, skips partial first line, scans remaining lines, and ignores malformed JSON lines during tail mode; falls back to full read for small files or on seek/stat errors.|\n|**SSAR cache benchmarks & tests** <br> `components/backend/handlers/ssar_cache_bench_test.go`|Adds benchmarks and correctness tests for the SSAR cache covering hit/miss/parallel scenarios and a key-cost benchmark (includes SHA256 key computation).|\n\n## Sequence Diagram(s)\n\nmermaid\nsequenceDiagram\n    participant Client as Client (HTTP)\n    participant Middleware as Middleware (handlers)\n    participant Cache as SSAR Cache\n    participant K8s as Kubernetes API (SSAR)\n    Client->>Middleware: request with token + action\n    Middleware->>Cache: check(key)\n    alt cache hit (allowed)\n        Cache-->>Middleware: allowed=true\n        Middleware-->>Client: proceed\n    else cache hit (denied)\n        Cache-->>Middleware: allowed=false\n        Middleware-->>Client: 403 Forbidden\n    else cache miss\n        Cache-->>Middleware: miss\n        Middleware->>K8s: create SelfSubjectAccessReview\n        K8s-->>Middleware: response (allowed/denied)\n        Middleware->>Cache: store(key, allowed)\n        alt allowed\n            Middleware-->>Client: proceed\n        else denied\n            Middleware-->>Client: 403 Forbidden\n        end\n    end\n\n## Estimated code review effort\n\n🎯 4 (Complex) | ⏱️ ~45 minutes\n\n<!-- walkthrough_end -->\n\n\n<!-- pre_merge_checks_walkthrough_start -->\n\n<details>\n<summary>🚥 Pre-merge checks | ✅ 3</summary>\n\n<details>\n<summary>✅ Passed checks (3 passed)</summary>\n\n|     Check name     | Status   | Explanation                                                                                           
                                            |\n| :----------------: | :------- | :------------------------------------------------------------------------------------------------------------------------------------------------ |\n|     Title check    | ✅ Passed | The title accurately summarizes all three main changes: SSAR cache, event log tail reads, and agent status cache.                                 |\n|  Description check | ✅ Passed | The description relates directly to the changeset, providing performance context and benchmarking data for two of the three optimization changes. |\n| Docstring Coverage | ✅ Passed | Docstring coverage is 85.00% which is sufficient. The required threshold is 80.00%.                                                               |\n\n</details>\n\n<sub>✏️ Tip: You can configure your own custom pre-merge checks in the settings.</sub>\n\n</details>\n\n<!-- pre_merge_checks_walkthrough_end -->\n\n<!-- finishing_touch_checkbox_start -->\n\n<details>\n<summary>✨ Finishing Touches</summary>\n\n<details>\n<summary>🧪 Generate unit tests (beta)</summary>\n\n- [ ] <!-- {\"checkboxId\": \"f47ac10b-58cc-4372-a567-0e02b2c3d479\", \"radioGroupId\": \"utg-output-choice-group-unknown_comment_id\"} -->   Create PR with unit tests\n\n</details>\n\n</details>\n\n<!-- finishing_touch_checkbox_end -->\n\n<!-- tips_start -->\n\n---\n\nThanks for using [CodeRabbit](https://coderabbit.ai?utm_source=oss&utm_medium=github&utm_campaign=ambient-code/platform&utm_content=1026)! It's free for OSS, and your support helps us grow. 

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
components/backend/handlers/ssar_cache.go (1)

21-22: ⚠️ Potential issue | 🟠 Major

Eviction strategy can introduce latency spikes on hot auth path.

At Line 90, capacity eviction deletes ~50% of entries while holding the global write lock. This is a large blocking section for both check() and store() under load, causing latency cliffs right when the cache is hottest. Also, Line 21 still describes this as LRU, but the current behavior is map-iteration-based eviction.

As per coding guidelines: Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Also applies to: 89-98

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/handlers/ssar_cache.go` around lines 21 - 22, The current
eviction in the SSAR cache (ssarCacheMaxSize) iterates the entire map and
deletes ~50% of entries while holding the global write lock inside
store()/check(), causing blocking and latency spikes; change the strategy to
avoid long-held locks by either implementing a proper LRU (use a doubly-linked
list + map and update on access inside short critical sections) or perform
eviction outside the hot write path (mark keys to evict, release the lock, then
do deletions or have a background goroutine drain an eviction queue); ensure
check() and store() only take the mutex briefly to update/access the map and
list and move eviction work to a non-blocking path.
components/backend/websocket/agui_store.go (1)

277-301: ⚠️ Potential issue | 🟠 Major

Don’t treat scanner failures as a successful replay.

Both scan paths only log scanner.Err() and still return whatever was parsed so far. That turns a read failure into a truncated replay, so reconnecting clients can rebuild from incomplete state without any signal to the caller. Please push the scanner loop behind a helper that returns ([]map[string]interface{}, error) so the small-file and tail paths can fail or fall back consistently.

Also applies to: 304-324

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/websocket/agui_store.go` around lines 277 - 301, Extract
the scanner loop into a helper like parseEventsFromScanner(sc *bufio.Scanner)
([]map[string]interface{}, error) that performs the first-line skip, JSON
unmarshal, appends events, and returns scanner.Err() if any scan error occurred
(instead of just logging and returning partial results); replace the inline
scanner code in both the small-file and tail paths with calls to this helper and
have the callers handle/log the returned error and decide whether to fail or
fallback so scanner failures are not treated as successful replays.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/backend/handlers/ssar_cache.go`:
- Around line 85-88: The store method on ssarCache is acquiring the mutex and
writing entries even when caching is disabled; update ssarCache.store to check
the global CacheDisabled flag (or the cache's enabled state) and return
immediately before acquiring c.mu (i.e., do an early-return if CacheDisabled) so
no lock or map write occurs when disabled; apply the same early-return pattern
to the similar methods referenced around lines 101-104 to avoid unnecessary
mutex/map overhead when globalSSARCache.store is called unconditionally by
middleware.

In `@components/backend/websocket/agui_store.go`:
- Around line 217-221: replayMaxTailBytes being a fixed 2MB can cause valid
JSONL records (up to scannerMaxLineSize) to be clipped; change the tail window
to be derived from scannerMaxLineSize (e.g., tailWindow :=
max(replayMaxTailBytes, scannerMaxLineSize)) so the window always can contain
the largest record. Also update the replay logic around the seek/skip (the code
using offset and the first boolean) to detect whether the seek landed mid-record
and only discard the first non-empty line when the seek actually started inside
a record (not when offset is on a record boundary or on the newline). Locate and
modify the constants and the replay loop that reference replayMaxTailBytes,
scannerMaxLineSize, offset and first (including the similar logic at the other
occurrence around the 270–289 region).

---

Duplicate comments:
In `@components/backend/handlers/ssar_cache.go`:
- Around line 21-22: The current eviction in the SSAR cache (ssarCacheMaxSize)
iterates the entire map and deletes ~50% of entries while holding the global
write lock inside store()/check(), causing blocking and latency spikes; change
the strategy to avoid long-held locks by either implementing a proper LRU (use a
doubly-linked list + map and update on access inside short critical sections) or
perform eviction outside the hot write path (mark keys to evict, release the
lock, then do deletions or have a background goroutine drain an eviction queue);
ensure check() and store() only take the mutex briefly to update/access the map
and list and move eviction work to a non-blocking path.

In `@components/backend/websocket/agui_store.go`:
- Around line 277-301: Extract the scanner loop into a helper like
parseEventsFromScanner(sc *bufio.Scanner) ([]map[string]interface{}, error) that
performs the first-line skip, JSON unmarshal, appends events, and returns
scanner.Err() if any scan error occurred (instead of just logging and returning
partial results); replace the inline scanner code in both the small-file and
tail paths with calls to this helper and have the callers handle/log the
returned error and decide whether to fail or fallback so scanner failures are
not treated as successful replays.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7010aab6-376c-465b-a26f-4947a66c4039

📥 Commits

Reviewing files that changed from the base of the PR and between 87cc472 and 21d78e9.

📒 Files selected for processing (3)
  • components/backend/handlers/sessions.go
  • components/backend/handlers/ssar_cache.go
  • components/backend/websocket/agui_store.go

@ambient-code ambient-code bot added this to the Review Queue milestone Mar 26, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
components/backend/handlers/ssar_cache.go (1)

21-22: ⚠️ Potential issue | 🟠 Major

Evicting half the map under a write lock can stall hot-path checks at capacity.

At capacity, this path performs thousands of deletes while holding c.mu.Lock(), blocking concurrent readers/writers and creating latency spikes. Also, the “LRU eviction” comment does not match behavior.

🔧 Minimal mitigation
-// ssarCacheMaxSize is the maximum number of entries before LRU eviction.
+// ssarCacheMaxSize is the maximum number of entries before map-iteration eviction.
 const ssarCacheMaxSize = 10000

 // Evict if at capacity (simple: clear half the entries)
 if len(c.entries) >= ssarCacheMaxSize {
-	count := 0
-	for k := range c.entries {
-		delete(c.entries, k)
-		count++
-		if count >= ssarCacheMaxSize/2 {
-			break
-		}
-	}
+	for k := range c.entries {
+		delete(c.entries, k) // remove one to keep lock hold bounded
+		break
+	}
 }

As per coding guidelines: Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Also applies to: 92-101

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/handlers/ssar_cache.go` around lines 21 - 22, The current
capacity-handling deletes thousands of map entries while holding c.mu.Lock()
(refer to ssarCacheMaxSize and the eviction code around c.mu.Lock()/lines
~92-101), which stalls hot-path readers/writers and isn't true LRU; change the
design to avoid bulk deletes under the write lock by (a) checking capacity under
a read lock and immediately returning or marking the cache as over-capacity, (b)
performing eviction asynchronously (spawn a single background goroutine or push
keys to an eviction queue) so deletions happen outside the hot-path lock, or (c)
implement a true incremental LRU using a concurrent-safe list/queue that evicts
a few items per mutation instead of half the map at once; also update the
comment to accurately describe the eviction strategy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/backend/handlers/ssar_cache.go`:
- Around line 44-47: The cache key in ssarCacheKey currently uses only the first
8 bytes of the SHA-256 digest, which creates a small collision domain and can
lead to cross-principal authorization reuse; change ssarCacheKey to include the
full SHA-256 digest (use the entire h value instead of h[:8]) when formatting
the key so the key is generated from the full 256-bit hash combined with
namespace, verb, group, and resource.

---

Duplicate comments:
In `@components/backend/handlers/ssar_cache.go`:
- Around line 21-22: The current capacity-handling deletes thousands of map
entries while holding c.mu.Lock() (refer to ssarCacheMaxSize and the eviction
code around c.mu.Lock()/lines ~92-101), which stalls hot-path readers/writers
and isn't true LRU; change the design to avoid bulk deletes under the write lock
by (a) checking capacity under a read lock and immediately returning or marking
the cache as over-capacity, (b) performing eviction asynchronously (spawn a
single background goroutine or push keys to an eviction queue) so deletions
happen outside the hot-path lock, or (c) implement a true incremental LRU using
a concurrent-safe list/queue that evicts a few items per mutation instead of
half the map at once; also update the comment to accurately describe the
eviction strategy.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9a44e235-c459-4c34-b289-b46425c3458b

📥 Commits

Reviewing files that changed from the base of the PR and between 21d78e9 and 8b4a28a.

📒 Files selected for processing (2)
  • components/backend/handlers/ssar_cache.go
  • components/backend/websocket/agui_store.go

- Fix misleading "LRU eviction" comment to "random eviction" (Go map
  iteration order is undefined)
- Fix data race in BenchmarkEnrichAgentStatus_Concurrent: each
  RunParallel goroutine now gets its own session struct
- Fix BenchmarkEnrichAgentStatus_ListPage/uncached: clear cache before
  each session so every call is a true cache miss

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/backend/handlers/agent_status_bench_test.go`:
- Around line 109-115: The benchmark is warming agentStatusCache only before
ResetTimer(), but the TTL can expire during long runs, skewing cached metrics;
update each benchmark (enrichAgentStatus calls in BenchmarkEnrichAgentStatus,
BenchmarkEnrichAgentStatus_Concurrent,
BenchmarkEnrichAgentStatus_ListPage/cached) to "pin" the cache entry after
warming by explicitly extending its TTL (e.g. re-set or update the entry via the
same cache API used in sessions.go such as agentStatusCache.Set or a
touch/update method on the session) immediately before b.ResetTimer(), ensuring
the cached entry cannot expire during the timed loop.
- Around line 275-282: Replace the plain callCount increment with an atomic
counter and assert its value after the goroutines complete: change callCount to
an int32 (used by the test stub for DeriveAgentStatusFromEvents), increment it
with atomic.AddInt32(&callCount, 1) inside the stub, and after wg.Wait() add an
assertion that the counter equals 1 to verify the cache prevented extra
derivations; keep the defer that restores DeriveAgentStatusFromEvents and
reference the same stub and wg.Wait() code paths when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cf70201e-237e-446a-b53a-afe7c848800a

📥 Commits

Reviewing files that changed from the base of the PR and between 8b4a28a and 7234a57.

📒 Files selected for processing (2)
  • components/backend/handlers/agent_status_bench_test.go
  • components/backend/handlers/ssar_cache.go

Comment on lines +109 to +115
// Prime the cache with one call
enrichAgentStatus(session)

b.ResetTimer()
for i := 0; i < b.N; i++ {
enrichAgentStatus(session)
}
Contributor


⚠️ Potential issue | 🟠 Major

Pin the cache entry past the timed section.

agentStatusCache is TTL-based in components/backend/handlers/sessions.go, but these benchmarks only warm it once before ResetTimer(). On longer -benchtime runs, the entry can expire mid-loop and the reported “cached” numbers start including file-derivation work again.

Suggested change
 // Prime the cache with one call
 enrichAgentStatus(session)
+agentStatusCache.Lock()
+if entry, ok := agentStatusCache.entries[sessionName]; ok {
+	entry.expiresAt = time.Now().Add(time.Hour)
+	agentStatusCache.entries[sessionName] = entry
+}
+agentStatusCache.Unlock()

 b.ResetTimer()

Apply the same pattern to BenchmarkEnrichAgentStatus_Concurrent and BenchmarkEnrichAgentStatus_ListPage/cached.

Also applies to: 139-152, 189-199

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/handlers/agent_status_bench_test.go` around lines 109 -
115, The benchmark is warming agentStatusCache only before ResetTimer(), but the
TTL can expire during long runs, skewing cached metrics; update each benchmark
(enrichAgentStatus calls in BenchmarkEnrichAgentStatus,
BenchmarkEnrichAgentStatus_Concurrent,
BenchmarkEnrichAgentStatus_ListPage/cached) to "pin" the cache entry after
warming by explicitly extending its TTL (e.g. re-set or update the entry via the
same cache API used in sessions.go such as agentStatusCache.Set or a
touch/update method on the session) immediately before b.ResetTimer(), ensuring
the cached entry cannot expire during the timed loop.

Comment on lines +275 to +282
callCount := 0
origDerive := DeriveAgentStatusFromEvents
defer func() { DeriveAgentStatusFromEvents = origDerive }()

DeriveAgentStatusFromEvents = func(name string) string {
callCount++
return "working"
}
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "agent_status_bench_test.go" -type f

Repository: ambient-code/platform

Length of output: 121


🏁 Script executed:

cat -n ./components/backend/handlers/agent_status_bench_test.go | sed -n '270,290p'

Repository: ambient-code/platform

Length of output: 771


🏁 Script executed:

cat -n ./components/backend/handlers/agent_status_bench_test.go | sed -n '315,340p'

Repository: ambient-code/platform

Length of output: 659


🏁 Script executed:

cat -n ./components/backend/handlers/agent_status_bench_test.go | sed -n '290,320p'

Repository: ambient-code/platform

Length of output: 1226


🏁 Script executed:

cat -n ./components/backend/handlers/agent_status_bench_test.go | sed -n '334,345p'

Repository: ambient-code/platform

Length of output: 91


🏁 Script executed:

cat -n ./components/backend/handlers/agent_status_bench_test.go | sed -n '330,350p'

Repository: ambient-code/platform

Length of output: 157


🏁 Script executed:

grep -n "callCount" ./components/backend/handlers/agent_status_bench_test.go

Repository: ambient-code/platform

Length of output: 366


🏁 Script executed:

cat -n ./components/backend/handlers/agent_status_bench_test.go | sed -n '1,30p'

Repository: ambient-code/platform

Length of output: 907


Make the concurrency check race-free and assert the outcome.

The stub increments a plain callCount (line 280) that all 50 concurrent goroutines can reach without synchronization. The concurrent test section (lines 321-334) never checks that callCount remained at 1 after wg.Wait(). If the cache regresses and multiple workers fall through to derivation, this test races on its own bookkeeping and still passes.

Use atomic.AddInt32(&callCount, 1) instead of callCount++, and add an assertion after line 334 to verify the cache prevented extra derivations under concurrent load.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/backend/handlers/agent_status_bench_test.go` around lines 275 -
282, Replace the plain callCount increment with an atomic counter and assert its
value after the goroutines complete: change callCount to an int32 (used by the
test stub for DeriveAgentStatusFromEvents), increment it with
atomic.AddInt32(&callCount, 1) inside the stub, and after wg.Wait() add an
assertion that the counter equals 1 to verify the cache prevented extra
derivations; keep the defer that restores DeriveAgentStatusFromEvents and
reference the same stub and wg.Wait() code paths when making the change.

@ambient-code ambient-code bot removed this from the Review Queue milestone Mar 27, 2026
@ambient-code
Contributor

ambient-code bot commented Mar 27, 2026

Review Queue Status

| Check | Status | Detail |
| :--- | :--- | :--- |
| CI | pass | --- |
| Conflicts | pass | --- |
| Reviews | FAIL | CodeRabbit Major: Cache entry not pinned during benchmark |

Action needed: Address CodeRabbit Major issue on cache entry pinning

Auto-generated by Review Queue workflow. Updated when PR changes.

@ambient-code ambient-code bot added this to the Review Queue milestone Mar 30, 2026
