This document traces the exact path a user prompt takes from the Codebuff CLI through the SDK, agent runtime, server, and back.
┌─────────┐ ┌─────────┐ ┌───────────────┐ ┌────────────────┐ ┌──────────┐
│ CLI │───▶│ SDK │───▶│ Agent Runtime │───▶│ Codebuff Server│───▶│ LLM API │
│ (TUI) │◀───│ run.ts │◀───│ loopAgentSteps│◀───│ /v1/chat/... │◀───│(OR/OAI/..)│
└─────────┘ └─────────┘ └───────────────┘ └────────────────┘ └──────────┘
Files: cli/src/hooks/use-send-message.ts, cli/src/hooks/helpers/send-message.ts
- User types a prompt and hits Enter.
- `prepareUserMessage()` processes the input:
  - Collects pending bash context (terminal output since the last prompt)
  - Processes image and text attachments
  - Creates a user message in the chat UI
- `setupStreamingContext()` initializes:
  - An `AbortController` (for user cancellation via Escape)
  - A timer (tracks elapsed time)
  - A batched message updater (efficiently updates the UI)
- The CLI calls `client.run()` from the SDK.
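The streaming-context setup can be sketched as follows. This is a minimal illustration, not the real CLI code: `StreamingContext`, `createStreamingContext`, and the batching shape are assumed names for the pieces described above.

```typescript
// Illustrative sketch of a streaming context like setupStreamingContext():
// an AbortController for Escape, a start timestamp for the elapsed-time
// display, and a batch buffer so UI updates are applied in groups.
interface StreamingContext {
  controller: AbortController;        // lets Escape cancel the in-flight run
  startedAt: number;                  // for the elapsed-time display
  pushChunk: (chunk: string) => void; // queues a chunk for the next UI update
  flush: () => string[];              // drains the pending batch
}

function createStreamingContext(): StreamingContext {
  const pending: string[] = [];
  return {
    controller: new AbortController(),
    startedAt: Date.now(),
    pushChunk: (chunk) => { pending.push(chunk); },
    flush: () => pending.splice(0, pending.length),
  };
}

const ctx = createStreamingContext();
ctx.pushChunk('Hello, ');
ctx.pushChunk('world');
console.log(ctx.flush().join('')); // batched chunks applied in one UI update
```

Batching matters here because a single response can arrive as hundreds of small chunks; re-rendering the TUI per chunk would be wasteful.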
File: sdk/src/run.ts
- `run()` → `runOnce()` is called with the prompt, agent ID, cost mode, and session state.
- Session state is initialized (fresh) or restored (from `previousRun`).
- User identity is verified via `getUserInfoFromApiKey()` (calls the web API).
- Tool handlers are registered — these execute locally on the user's machine:
  - `write_file`, `str_replace`, `apply_patch` → file edits
  - `run_terminal_command` → shell commands
  - `code_search`, `glob`, `list_directory` → file search
  - `read_files` → file reading
  - Custom tool definitions and MCP tools
- Action handlers are registered to process server responses:
  - `response-chunk` → streams text to the CLI
  - `subagent-response-chunk` → streams subagent output
  - `prompt-response` → final result (resolves the promise)
  - `prompt-error` → error result
- `callMainPrompt()` is called (fire-and-forget, with a `.catch()` handler).
- The function returns a promise that resolves when `prompt-response` or an error arrives.
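The "fire-and-forget plus promise" pattern above can be sketched like this. The `Action` union and `runOncePromise` are simplified stand-ins; the real SDK's action types and resolution logic are richer.

```typescript
// Sketch: turn action callbacks into a promise that resolves on
// prompt-response and rejects on prompt-error, while streaming chunks.
type Action =
  | { type: 'response-chunk'; chunk: string }
  | { type: 'prompt-response'; output: string }
  | { type: 'prompt-error'; message: string };

function runOncePromise(onChunk: (c: string) => void): {
  dispatch: (a: Action) => void;
  done: Promise<string>;
} {
  let resolve!: (v: string) => void;
  let reject!: (e: Error) => void;
  const done = new Promise<string>((res, rej) => { resolve = res; reject = rej; });
  const dispatch = (a: Action) => {
    if (a.type === 'response-chunk') onChunk(a.chunk);        // stream to the UI
    else if (a.type === 'prompt-response') resolve(a.output); // final result
    else reject(new Error(a.message));                        // error result
  };
  return { dispatch, done };
}
```

The caller awaits `done` while the action handlers keep feeding the UI, which is why `callMainPrompt()` itself can be fire-and-forget.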
File: packages/agent-runtime/src/main-prompt.ts
- `callMainPrompt()` resets credits to 0 (the server controls cost tracking).
- Assembles local agent templates from the project's `.agents/` directory.
- Sends a `response-chunk` `start` event to the CLI.
- `mainPrompt()` determines the agent type based on cost mode:
  - `free` → `base-free`
  - `normal` → `base`
  - `max` → `base-max`
  - `ask` → `ask`
  - `experimental` → `base2`
  - Fallback (default) → `base2`
  - Or a custom agent ID
- Calls `loopAgentSteps()` with the agent template, prompt, and session state.
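The cost-mode mapping can be written down directly; the mapping values come from the list above, while the function name `pickAgentType` is illustrative.

```typescript
// Sketch of the cost-mode → agent-type mapping described above.
type CostMode = 'free' | 'normal' | 'max' | 'ask' | 'experimental';

function pickAgentType(mode: string, customAgentId?: string): string {
  if (customAgentId) return customAgentId; // a custom agent ID takes precedence
  const byMode: Record<CostMode, string> = {
    free: 'base-free',
    normal: 'base',
    max: 'base-max',
    ask: 'ask',
    experimental: 'base2',
  };
  return byMode[mode as CostMode] ?? 'base2'; // fallback (default)
}

console.log(pickAgentType('free'));          // → 'base-free'
console.log(pickAgentType('max', 'my-bot')); // → 'my-bot'
```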
File: packages/agent-runtime/src/run-agent-step.ts
- `loopAgentSteps()` starts an agent run (recorded in the database).
- Builds the system prompt, tool definitions, and initial messages.
- Enters the main loop:

  ```
  while (true) {
    // 1. Run programmatic step (if agent has handleSteps)
    // 2. Check if turn should end
    // 3. Call runAgentStep() for LLM inference
    // 4. Process tool calls and responses
  }
  ```

- Each `runAgentStep()` call:
  - Checks the context token count via the `/api/v1/token-count` endpoint
  - Calls `getAgentStreamFromTemplate()` → `promptAiSdkStream()`
  - `processStream()` iterates over the AI SDK stream, handling text chunks and tool calls
  - Tool calls are sent back to the SDK via `requestToolCall`, executed locally, and results fed back
- The loop continues until the agent signals completion (no more tool calls, or the `task_completed` tool).
- Sends a `response-chunk` `finish` event, then a `prompt-response` action with the final session state and output.
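The loop's termination logic can be sketched with a stubbed LLM step. `StepResult` and this `loopAgentSteps` signature are simplified assumptions, not the real runtime API.

```typescript
// Sketch of the loopAgentSteps() control flow with a stubbed runAgentStep.
// The loop ends when the agent signals completion: no tool calls, or the
// task_completed tool was used.
type StepResult = { toolCalls: string[]; taskCompleted: boolean };

function loopAgentSteps(runStep: () => StepResult, maxSteps = 10): number {
  let steps = 0;
  while (steps < maxSteps) {
    const result = runStep(); // stands in for one LLM inference step
    steps++;
    if (result.taskCompleted || result.toolCalls.length === 0) break;
    // (real runtime: execute the tool calls locally, feed results back)
  }
  return steps;
}

// Stub agent: two tool-calling steps, then a final step with no tool calls.
const script: StepResult[] = [
  { toolCalls: ['read_files'], taskCompleted: false },
  { toolCalls: ['str_replace'], taskCompleted: false },
  { toolCalls: [], taskCompleted: false },
];
console.log(loopAgentSteps(() => script.shift()!)); // → 3
```

The `maxSteps` guard is a safety net in this sketch; the real loop also ends the turn via its own checks.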
Files: sdk/src/impl/llm.ts, sdk/src/impl/model-provider.ts
`promptAiSdkStream()` selects the model provider:
- Claude OAuth — If the user has connected their Claude subscription and the model is a Claude model, requests go directly to `api.anthropic.com` using the user's OAuth token. Zero cost to the user's Codebuff credits.
- ChatGPT OAuth — If the user has connected their ChatGPT subscription and the model is an OpenAI model, requests go to the ChatGPT backend API.
- Codebuff Backend (default) — Requests go to `POST /api/v1/chat/completions` on the Codebuff web server, which routes to the appropriate LLM provider.
For OAuth providers, rate limit errors trigger automatic fallback to the Codebuff backend (unless in free mode).
The AI SDK's `streamText()` function handles the actual HTTP call, streaming, and retry logic.
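The selection order can be sketched as a simple cascade. The model checks and return values here are illustrative assumptions, not the real `model-provider.ts` logic.

```typescript
// Sketch of the provider-selection order: OAuth providers first (when the
// subscription is connected and the model matches), Codebuff backend otherwise.
interface OAuthState { claude: boolean; chatgpt: boolean }

function pickProvider(model: string, oauth: OAuthState): string {
  const isClaude = model.startsWith('claude-'); // simplified model check
  const isOpenAi = model.startsWith('gpt-');    // simplified model check
  if (oauth.claude && isClaude) return 'claude-oauth';   // api.anthropic.com, zero credits
  if (oauth.chatgpt && isOpenAi) return 'chatgpt-oauth'; // ChatGPT backend API
  return 'codebuff-backend';                             // POST /api/v1/chat/completions
}
```

On an OAuth rate-limit error, the real implementation re-runs with the Codebuff backend (unless in free mode), per the fallback rule above.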
File: web/src/app/api/v1/chat/completions/_post.ts
The server processes the request through several validation gates:
- Parse request body — Returns 400 if invalid JSON.
- Authenticate — Extracts the API key from the `Authorization` header. Returns 401 if missing/invalid.
- Check ban status — Returns 403 `account_suspended` if the user is banned.
- Free mode country check — For free mode requests, checks the user's IP against allowed countries. Returns 403 `free_mode_unavailable` if not allowed.
- Validate agent run — Checks that the `run_id` exists and is in `running` status. Returns 400 if invalid.
- Subscription block grant — For subscribers, ensures a billing block is active. Returns 429 `rate_limit_exceeded` if the limit is hit and fallback is disabled.
- Credit check — Returns 402 if the user has no remaining credits (and it is not a free mode request).
- Route to LLM provider — Based on the model, routes to:
  - Fireworks AI (for supported models)
  - OpenAI direct (for OpenAI models)
  - OpenRouter (default, for all other models)
- Return response — Streaming requests return an SSE stream (`text/event-stream`). Non-streaming requests return JSON.
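The gate chain is essentially a short-circuiting pipeline. The sketch below assumes a flattened request context and omits the subscription/429 gate for brevity; the shapes are illustrative, not the handler's real types.

```typescript
// Sketch of the validation-gate chain: each gate either passes (null) or
// short-circuits with an HTTP status code, in the order listed above.
interface RequestCtx {
  validJson: boolean;
  apiKeyValid: boolean;
  banned: boolean;
  freeMode: boolean;
  countryAllowed: boolean;
  runValid: boolean;
  credits: number;
}

type Gate = (ctx: RequestCtx) => number | null; // status code, or null to pass

const gates: Gate[] = [
  (c) => (c.validJson ? null : 400),                     // parse request body
  (c) => (c.apiKeyValid ? null : 401),                   // authenticate
  (c) => (c.banned ? 403 : null),                        // account_suspended
  (c) => (c.freeMode && !c.countryAllowed ? 403 : null), // free_mode_unavailable
  (c) => (c.runValid ? null : 400),                      // validate agent run
  (c) => (c.credits > 0 || c.freeMode ? null : 402),     // credit check
];

function checkRequest(ctx: RequestCtx): number {
  for (const gate of gates) {
    const status = gate(ctx);
    if (status !== null) return status;
  }
  return 200; // all gates passed → route to the LLM provider
}
```

Ordering matters: authentication must precede the ban and credit checks, since those depend on knowing who the user is.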
- The LLM provider streams tokens back to the server.
- The server forwards the SSE stream to the AI SDK client.
- `promptAiSdkStream()` yields chunks from the AI SDK's `fullStream`:
  - `text-delta` → text content
  - `tool-call` → tool invocation
  - `error` → error handling (OAuth fallback, retries, etc.)
- `processStream()` in agent-runtime handles each chunk:
  - Text chunks → `sendAction({ type: 'response-chunk', chunk })` → SDK → CLI UI
  - Tool calls → `requestToolCall()` → SDK executes locally → result fed back into the stream
- When the agent loop finishes, `callMainPrompt` sends:
  - A `response-chunk` `finish` event (with the total cost)
  - A `prompt-response` action (with the final session state and output)
- The SDK's `handlePromptResponse()` validates the output against `AgentOutputSchema` and resolves the promise.
- The CLI's `handleRunCompletion()` processes the result:
  - Checks for known error types (out of credits, free mode unavailable)
  - Updates the UI with the completion time and credit cost
  - Marks the message as complete
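The validate-then-resolve step can be sketched without the real schema library. `AgentOutput`, its fields, and `validateAgentOutput` are assumptions standing in for `AgentOutputSchema` (which the SDK defines elsewhere).

```typescript
// Sketch of handlePromptResponse()'s validation step: reject malformed
// output before resolving the run's promise. The real SDK validates against
// AgentOutputSchema; this manual check only illustrates the shape of the step.
interface AgentOutput { type: string; value: string }

function validateAgentOutput(raw: unknown): AgentOutput {
  if (
    typeof raw === 'object' && raw !== null &&
    typeof (raw as Record<string, unknown>).type === 'string' &&
    typeof (raw as Record<string, unknown>).value === 'string'
  ) {
    return raw as AgentOutput;
  }
  throw new Error('output does not match schema'); // becomes a run error
}
```

Validating at this boundary means a malformed server response surfaces as a clear error instead of propagating bad state into the CLI.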
Tool calls execute locally on the user's machine, not on the server:
LLM Response (tool_call) Agent Runtime processes stream
│ │
▼ ▼
processStream() ─── requestToolCall ──▶ SDK run.ts
│ │
│ handleToolCall()
│ │
│ Executes locally
│ (file edit, terminal, search)
│ │
◀─────── tool result ───────────────┘
│
Feeds result back into next LLM call
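The round trip in the diagram can be sketched as follows. `ToolCall`, the stubbed `handleToolCall`, and this `requestToolCall` signature are simplified stand-ins for the real runtime/SDK boundary.

```typescript
// Sketch of the local tool-call round trip: the runtime requests a tool
// call, the SDK executes it on the user's machine, and the result is
// returned to be fed into the next LLM call.
type ToolCall = { name: string; input: Record<string, unknown> };

async function handleToolCall(call: ToolCall): Promise<string> {
  switch (call.name) {
    case 'read_files':
      // The real SDK reads from disk; this stub just echoes the request.
      return `contents of ${call.input.path}`;
    default:
      return `unsupported tool: ${call.name}`;
  }
}

async function requestToolCall(call: ToolCall): Promise<string> {
  // In the real flow this crosses the runtime → SDK boundary via an action;
  // the result is then appended to the message history for the next step.
  return handleToolCall(call);
}
```

Because execution stays on the user's machine, file contents and terminal output never need to be uploaded ahead of time; only tool results requested by the model cross the wire.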
Session state persists across prompts within a conversation:
- `sessionState.mainAgentState.messageHistory` — Full conversation history
- `sessionState.fileContext` — Project files, knowledge files, custom tools
- The CLI stores the `RunState` from each run and passes it as `previousRun` to the next `client.run()` call.
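Chaining runs via `previousRun` can be sketched like this; the `RunState` shape and `nextRunState` helper are simplified assumptions, not the SDK's real types.

```typescript
// Sketch: each run starts from the previous run's state (or fresh), so the
// message history accumulates across prompts without mutating earlier runs.
interface RunState {
  sessionState: { messageHistory: string[] };
}

function nextRunState(prompt: string, previousRun?: RunState): RunState {
  const history = previousRun
    ? [...previousRun.sessionState.messageHistory] // copy, don't mutate
    : [];                                          // fresh conversation
  history.push(prompt);
  return { sessionState: { messageHistory: history } };
}

const first = nextRunState('fix the bug');
const second = nextRunState('now add a test', first);
console.log(second.sessionState.messageHistory.length); // → 2
```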
When the user presses Escape:
- The CLI aborts the `AbortController`.
- The `abort` signal propagates through the SDK → agent runtime → AI SDK.
- `loopAgentSteps` catches the `AbortError` and marks the run as `cancelled`.
- The CLI's abort handler shows an interruption notice and marks the message complete.
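The cancellation path can be sketched with a loop that checks a shared signal; `runUntilAborted` and its `onStep` hook are illustrative, not the runtime's API.

```typescript
// Sketch: one AbortController's signal is checked between agent steps, so a
// single Escape press cancels the whole run wherever it currently is.
function runUntilAborted(
  signal: AbortSignal,
  maxSteps: number,
  onStep?: (i: number) => void,
): string {
  for (let step = 0; step < maxSteps; step++) {
    if (signal.aborted) return 'cancelled'; // loopAgentSteps marks the run cancelled
    onStep?.(step); // one agent step; Escape may fire while it runs
  }
  return 'completed';
}

const controller = new AbortController();
const result = runUntilAborted(controller.signal, 5, (i) => {
  if (i === 1) controller.abort(); // simulate the user pressing Escape mid-run
});
console.log(result); // → 'cancelled'
```

Passing one `AbortSignal` down the whole stack is what lets the CLI, SDK, and AI SDK all observe the same cancellation without extra coordination.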