Skip to content

Feat/add-arcjet-ratelimits#31

Merged
DIodide merged 7 commits intostagingfrom
feat/add-arcjet-ratelimits
Apr 8, 2026
Merged

Feat/add-arcjet-ratelimits#31
DIodide merged 7 commits intostagingfrom
feat/add-arcjet-ratelimits

Conversation

@DIodide
Copy link
Copy Markdown
Owner

@DIodide DIodide commented Apr 8, 2026

  • Introduced Arcjet dependencies in package.json and bun.lock.
  • Added environment variable ARCJET_KEY to .env.example.
  • Implemented rate limiting checks using Arcjet in chat-related functions.
  • Created components for displaying usage statistics and integrated them into the chat interface.
  • Updated backend to track usage budgets and record usage data.
  • Added necessary API endpoints for budget checking and usage recording.

DIodide added 2 commits April 8, 2026 14:07
- Introduced Arcjet dependencies in package.json and bun.lock.
- Added environment variable ARCJET_KEY to .env.example.
- Implemented rate limiting checks using Arcjet in chat-related functions.
- Created components for displaying usage statistics and integrated them into the chat interface.
- Updated backend to track usage budgets and record usage data.
- Added necessary API endpoints for budget checking and usage recording.
@DIodide DIodide self-assigned this Apr 8, 2026
Copy link
Copy Markdown
Owner Author

@DIodide DIodide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@claude
Copy link
Copy Markdown

claude bot commented Apr 8, 2026

Claude finished @DIodide's task in 2m 15s —— View job


PR Review: Feat/add-arcjet-ratelimits

  • Gather context and read CLAUDE.md
  • Review changed files
  • Provide comprehensive feedback

Good overall architecture — two-layer rate limiting (Arcjet for per-request rate, Convex/FastAPI for cost budgets) is well-reasoned, and the fail-open approach throughout is appropriate for a service like this. A few issues worth addressing before merge.


Critical

ARCJET_KEY is required, making graceful degradation unreachable

apps/web/src/env.ts:8 declares ARCJET_KEY: z.string().min(1) without .optional(). This means the app refuses to start without the key — but chat-ratelimit.ts:72-76 has a catch block that is supposed to gracefully degrade if Arcjet fails. That catch block is dead code: the crash happens at startup before any request reaches it. Every developer who hasn't set up Arcjet will be blocked from running locally.

// env.ts — make the key optional
ARCJET_KEY: z.string().min(1).optional(),

Then in arcjet.ts, guard the client initialization and export aj as null | typeof aj, or check env.ARCJET_KEY before calling aj.protect() in chat-ratelimit.ts. Fix this →


High

Race condition in upsertBudget (packages/convex-backend/convex/usage.ts:128-227)

The read-then-patch pattern is not atomic. Under concurrent requests for the same user (e.g., parallel tool iterations all completing at once), two mutations can both read the existing doc, compute new totals independently, and one will silently overwrite the other's increment. Convex does retry on OCC conflicts, so this will often self-heal, but under sustained load it creates contention.

Consider using ctx.db.patch with a delta-based approach, or serializing per-user budget updates via a queue. At minimum, document that OCC retries are the safety net here.

getBody throws in checkChatRateLimit (apps/web/src/lib/chat-ratelimit.ts:42-45)

getBody: async () => {
    throw new Error("Not implemented");
},

This is confusing — it looks like broken code. If Arcjet internally calls getBody, this will throw inside aj.protect() and trigger the catch block, silently allowing all requests. Verify whether Arcjet calls getBody for your rule set (token bucket + shield). If not, a no-op (async () => null) would be clearer and safer. Fix this →


Medium

upsertBudget loses Convex type safety (usage.ts:129)

async function upsertBudget(ctx: { db: any }, ...)

Using db: any bypasses Convex's generated type-safe query builders. The correct type is MutationCtx from "convex/server" (or Pick<MutationCtx, "db">). This function is called from recordUsage which is a proper internalMutation, so the runtime context is already correctly typed — just thread it through. Fix this →

Duplicated week-number logic in Python and TypeScript

packages/fastapi/app/services/usage.py:29-32 uses now.isocalendar() while packages/convex-backend/convex/usage.ts:460-466 uses a manual Thursday-based calculation. Both implement ISO 8601 week numbers and should agree, but if they ever drift (timezone edge cases, year boundaries) week-key mismatches will silently cause the Python-written records to be invisible to the TypeScript queries. A simple assert-style test covering the week boundary would make this robust.

default_daily_cost_limit / default_weekly_cost_limit in config.py are unused (packages/fastapi/app/config.py:39-41)

The FastAPI config defines these settings but they're never read. The actual defaults are hardcoded in usage.ts (DEFAULT_DAILY_COST_LIMIT = 2.0). Either wire the FastAPI config values into the Convex checkBudget call args (passing the limits), or remove them from config.py to avoid confusion.

UsageDisplay has no error state (apps/web/src/components/usage-display.tsx:143-183)

When the Convex query errors (network failure, auth expiry), usage is undefined and the skeleton renders indefinitely. Add an error check:

const { data: usage, error } = useQuery(...)
if (error) return <p className="text-xs text-destructive">Could not load usage.</p>

Low / Nits

getUserUsageDetailed is defined but never called (usage.ts:310-350)
The query exists but nothing imports or uses it. Either wire it up to the "detailed view" toggle mentioned in the comment, or remove it to keep the surface clean.

Arcjet log: console in production (apps/web/src/lib/arcjet.ts:21)
log: console forwards all Arcjet internal log messages to stdout. This will add noise in production. Pass a filtered logger or use log: { error: console.error } at minimum.

Hardcoded sdkVersion: "1.3.1" (arcjet.ts:14)
This will silently go stale when the Arcjet package is updated. Pull it from the installed package version, or at minimum document it as a value to keep in sync with package.json.


What's well done

  • The two-layer design (Arcjet for request rate, Convex+FastAPI for cost budget) is clean and the concerns are properly separated.
  • getUserUsage intentionally never exposes raw dollar costs to the client — only percentages. Good.
  • Usage is recorded even on client disconnect (the drain-and-record pattern in chat.py:647-682).
  • The UsageBadgeUsageDialog UX is lightweight and non-intrusive.
  • Proper use of internalQuery/internalMutation for privileged budget operations.

- Added error handling in the UsageDisplay component to notify users when usage data fails to load.
- Updated Arcjet client logging to suppress debug and info logs while retaining error logging.
- Refactored chat rate limit checks for improved readability and graceful degradation.
- Cleaned up unused code in the usage module and clarified comments regarding cost limits and concurrency handling.
@DIodide
Copy link
Copy Markdown
Owner Author

DIodide commented Apr 8, 2026

@claude

@claude
Copy link
Copy Markdown

claude bot commented Apr 8, 2026

Claude finished @DIodide's task in 2m 20s —— View job


Review Fixes Applied

  • Read relevant files
  • Fix Critical: Make ARCJET_KEY optional and guard Arcjet initialization
  • Commit and push

What was changed (commit db4fdaa)

apps/web/src/env.ts
ARCJET_KEY is now .optional() — the app starts without the key, making local development possible without an Arcjet account.

apps/web/src/lib/arcjet.ts
aj is only initialized when env.ARCJET_KEY is present; otherwise it exports null. This makes the graceful-degradation path reachable.

apps/web/src/lib/chat-ratelimit.ts

  • Early return { allowed: true } when aj is null (no key configured)
  • Wrapped aj.protect() in a try/catch that fails open — Arcjet unavailability no longer blocks chat

Already resolved in this branch (no changes needed)

The following issues from the previous review were already addressed in the branch before this commit:

  • upsertBudget already uses Pick<MutationCtx, "db"> — not any
  • UsageDisplay already has an error state
  • arcjet.ts already uses a filtered logger (not bare console)
  • getBody already returns "" (not throws)
  • default_daily_cost_limit / default_weekly_cost_limit not present in config.py
  • getUserUsageDetailed not present in usage.ts
    ·

github-actions bot and others added 2 commits April 8, 2026 18:41
- env.ts: ARCJET_KEY is now optional (z.string().min(1).optional())
- arcjet.ts: only initialize the Arcjet client when the key is present; export null otherwise
- chat-ratelimit.ts: skip Arcjet check and allow all requests when aj is null; wrap protect() in try-catch to fail open on errors

Co-authored-by: Ibraheem Amin <DIodide@users.noreply.github.com>
@DIodide
Copy link
Copy Markdown
Owner Author

DIodide commented Apr 8, 2026

@claude review these changes (don't fix anything)

@claude
Copy link
Copy Markdown

claude bot commented Apr 8, 2026

Claude finished @DIodide's task in 3m 25s —— View job


PR Review: Feat/add-arcjet-ratelimits

  • Gather context and read CLAUDE.md
  • Review changed files via diff
  • Provide comprehensive feedback

Good overall architecture — the two-layer design (Arcjet for per-request rate, Convex+FastAPI for cost budget) is clean, separation of concerns is appropriate, and the fail-open philosophy is consistent throughout the Python layer. Several issues worth addressing, plus one regression introduced by the revert.


Critical

ARCJET_KEY fix was reverted — regression (apps/web/src/env.ts:8)

Commit 4a179c1 reverts db4fdaa, putting ARCJET_KEY: z.string().min(1) (required) back. Any developer without an Arcjet account cannot start the app locally — the env validation throws before any request is processed. If the revert was intentional and Arcjet is now a required service, that's a conscious choice, but it needs to be documented (e.g. in the README/setup guide) and every developer needs an account. The .env.example lists ARCJET_KEY= with no value, which will still fail validation. Fix this →


High

No error handling around aj.protect() in checkChatRateLimit (apps/web/src/lib/chat-ratelimit.ts:35-56)

The aj.protect() call has no try/catch. If Arcjet is temporarily unreachable (timeout, outage), the server function throws. In use-chat-stream.ts:122-130, the call site has no error handling either:

const rateCheck = await checkChatRateLimit({ data: { userId: user.id } });

A network error here will propagate as an uncaught exception inside the streaming callback, likely silently breaking the send handler. The Python layer correctly fails open on all Convex errors — the frontend Arcjet check should do the same. Fix this →

getConversationUsage exposes raw dollar costs (packages/convex-backend/convex/usage.ts:~302-330)

getUserUsage is careful to return only percentages and never raw costs — a deliberate privacy/UX decision called out in the PR description. But getConversationUsage is a public query (not internalQuery) and returns { totalTokens, totalCost, messageCount } directly to the frontend. The ownership check (convo.userId !== identity.subject) prevents cross-user access, but the raw cost is still exposed to the user themselves, inconsistently with the rest of the design. Either make it internalQuery or return cost-less data. Fix this →


Medium

adminSetLimits limits don't persist across periods (usage.ts:~344-420)

The function patches (or creates) a budget document for the current period only. Next week's budget doc is created fresh when the first usage is recorded, using DEFAULT_WEEKLY_COST_LIMIT. A per-user limit set via adminSetLimits today will be silently discarded when the period rolls over. If per-user limits are meant to be durable overrides, they need a separate user-config table (e.g. userLimits with userId, dailyLimit, weeklyLimit) that upsertBudget reads as the default instead of DEFAULT_DAILY_COST_LIMIT.

Per-model/harness breakdown is daily-only, unlabeled (usage.ts:~275-290, usage-display.tsx)

getUserUsage builds perModelPct and perHarnessPct from daily?.perModelUsage only. The UI renders these under "By Model" and "By Harness" with no indication that it reflects today's usage, not the week. A user who used Claude Sonnet heavily earlier in the week but Haiku today will see a misleading model breakdown.

Budget gate fires after SSE 200 response begins (packages/fastapi/app/routes/chat.py:~348-365)

The budget check is inside event_generator(), which runs after FastAPI has already sent the 200 OK and started streaming. When the budget is exceeded, the client gets a 200 with an SSE error event, not a 429. This means HTTP-layer monitoring, retries, and error aggregation tools won't detect budget blocks. Consider moving the check before return EventSourceResponse(event_generator()). Fix this →

retryAfter type cast is fragile (chat-ratelimit.ts:52)

retryAfter = result.reason.reset as number;

The Arcjet ArcjetRateLimitReason.reset field type varies across SDK versions — it can be a Unix timestamp (seconds), a duration in seconds, or a Date. The as number cast bypasses type checking. If reset is a Date, this will stringify to NaN in the error message. Verify the type from the installed version's d.ts.


Low / Nits

Hardcoded sdkVersion: "1.3.1" (apps/web/src/lib/arcjet.ts:14)
Still present after the previous review flagged it. Will silently go stale when the Arcjet package is updated. Pull it from the package at runtime or document it as a value to keep in sync.

UsageBadge returns null on both loading and error (usage-display.tsx:~200)
Benign, but on a network error the badge silently disappears rather than showing a degraded state. useQuery returns { data, error } — a simple if (error) return <span className="text-xs text-destructive">–</span> would make errors visible.

ARCJET_KEY= in .env.example will fail env validation (apps/web/.env.example)
Since ARCJET_KEY is required, an empty value (ARCJET_KEY=) will fail z.string().min(1). The comment says "copy from Arcjet dashboard" but no placeholder value is shown. Consider using ARCJET_KEY=ajkey_... or annotating that the app won't start without a real value.


What's well done

  • Fail-open is correctly applied throughout the Python layer (check_user_budget, record_usage).
  • getUserUsage returns only percentages, never raw dollar amounts — good privacy hygiene.
  • Usage is recorded even on client disconnect via the drain-and-record pattern in chat.py.
  • upsertBudget is correctly typed with Pick<MutationCtx, "db"> and includes the OCC retry note.
  • The UsageBadgeUsageDialog UX is lightweight and non-intrusive.
  • record_usage in Python correctly skips recording when no cost data is present.
  • Filtered logger in arcjet.ts (only error: console.error) — no noise from debug/info Arcjet internals.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@DIodide DIodide merged commit 5fc2ec1 into staging Apr 8, 2026
1 check passed
@DIodide DIodide deleted the feat/add-arcjet-ratelimits branch April 8, 2026 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant