openrouter: add image, video, music, and speech generation providers#1
Caution: Review failed
Pull request was closed or merged during review.

📝 Walkthrough
Adds OpenRouter image, video, music, and speech generation providers plus SSE streaming and config utilities; registers providers in the plugin and adds corresponding tests. Providers resolve API keys, build HTTP requests to OpenRouter, handle streaming or polled responses, and map outputs to GeneratedAsset results.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Provider as OpenRouter Provider
participant Auth as Auth Runtime
participant HTTP as HTTP Runtime
participant OpenRouter as OpenRouter API
Client->>Provider: generate request (image/video/music/speech)
Provider->>Auth: resolveApiKeyForProvider(...)
Auth-->>Provider: apiKey or empty
alt apiKey missing
Provider-->>Client: Error "API key missing"
else apiKey present
Provider->>HTTP: resolveProviderHttpRequestConfig(...)
HTTP-->>Provider: baseUrl, headers, dispatcher
Provider->>OpenRouter: POST /chat/completions or /videos (stream:true for audio)
alt Streaming (music/speech)
loop SSE chunks
OpenRouter-->>Provider: data: { delta.audio.data, delta.audio.transcript }
Provider->>Provider: collectStreamedAudio → buffers + transcript
end
Provider->>Client: return GeneratedMusic/Audio asset
else Job-based (video)
OpenRouter-->>Provider: job id / polling_url
Provider->>OpenRouter: poll /videos/{id}/status
alt completed
Provider->>OpenRouter: download (unsigned or authenticated)
OpenRouter-->>Provider: binary video data
Provider->>Client: return GeneratedVideo asset
else failed
Provider-->>Client: Error (generation failed)
end
else Image response
OpenRouter-->>Provider: choices[0].message.images (data URLs)
Provider->>Provider: decode data URLs → image buffer(s)
Provider->>Client: return GeneratedImage asset
end
Provider->>HTTP: response.body.cancel() / cleanup
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Register OpenRouter as an image, video, and music generation provider using the existing plugin-sdk media generation contracts.
- Image generation via POST /chat/completions with modalities ["image", "text"]
- Video generation via POST /videos with async job polling
- Music generation via POST /chat/completions with streaming audio output
- Contract test and unit tests for all three providers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use "audio" instead of "music" for ProviderRequestCapability
- Use mockResolvedValueOnce with empty string for API key tests to avoid polluting subsequent test mock state
- Preserve whitespace in streamed transcript chunks by reading raw string instead of normalizeOptionalString
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Decode each streamed base64 audio chunk individually before concatenating raw buffers to avoid corruption from padding chars
- Derive image fileName extension from actual MIME type instead of hardcoding .png
- Strip Authorization header from unsigned video download URLs to avoid leaking API key to third-party CDN hosts
- Extract shared resolveConfiguredBaseUrl to openrouter-config.ts to eliminate duplication across all three provider files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
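The first fix in the commit above (decoding each base64 chunk on its own before concatenating) might look roughly like the following. This is a minimal sketch assuming Node's `Buffer`; the helper name `decodeAudioChunks` is hypothetical and not taken from the PR.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper (not the PR's actual code): each streamed chunk is a
// complete base64 string that may carry its own "=" padding, so it is decoded
// separately and only the raw bytes are concatenated.
function decodeAudioChunks(base64Chunks: readonly string[]): Buffer {
  const decoded = base64Chunks.map((chunk) => Buffer.from(chunk, "base64"));
  return Buffer.concat(decoded);
}
```

Concatenating bytes rather than base64 text means padding in one chunk cannot affect how the next chunk is decoded.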
- Validate that the video polling_url shares the same origin as the configured base URL before sending authenticated requests to it; fall back to constructing the URL from the video ID
- Declare supportsLyrics, supportsInstrumental, supportsFormat, and supportedFormats in music generation capabilities so the runtime normalizer preserves user-provided overrides
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
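The same-origin check on polling_url can be sketched as below, using the standard `URL` class. The function name `resolvePollUrl` is hypothetical, and the fallback path `/videos/{id}/status` is an assumption taken from the sequence diagram rather than from the provider source.

```typescript
// Hypothetical sketch: only trust a server-supplied polling URL when it shares
// the origin of the configured base URL; otherwise rebuild it from the job id.
function resolvePollUrl(baseUrl: string, videoId: string, pollingUrl?: string): string {
  // Assumed fallback path; the real provider may build this differently.
  const fallback = `${baseUrl.replace(/\/+$/, "")}/videos/${encodeURIComponent(videoId)}/status`;
  if (!pollingUrl) return fallback;
  try {
    const candidate = new URL(pollingUrl);
    const sameOrigin = candidate.origin === new URL(baseUrl).origin;
    return sameOrigin ? candidate.toString() : fallback;
  } catch {
    // Unparsable URL: never send the Authorization header to it.
    return fallback;
  }
}
```

Comparing `origin` covers scheme, host, and port together, so a look-alike host or a downgrade to plain HTTP both fall back to the constructed URL.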
Split the audio modality into two providers matching OpenClaw's capability boundaries:
- MusicGenerationProvider: Google Lyria 3 Clip/Pro models for music. No voice parameter; supports lyrics, instrumental, format overrides.
- SpeechProvider: OpenAI GPT Audio/Mini/4o-Audio models for voice synthesis. Supports voice selection, format (mp3/opus/wav), and voice-note target.
Extract shared SSE audio stream collector to streaming-audio.ts so both providers reuse the same chunked base64 decode logic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add providerOverrides support so runtime directive overrides for model and voice flow through to the OpenRouter request
- Add tests for: providerConfig model/voice/format overrides, providerOverrides model/voice at runtime, empty stream error, WAV format via config, complete voice list assertion
- Match the test coverage depth of image/video/music providers and the reference OpenAI speech provider
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Video: release submit response before poll+download loop to avoid holding the connection open for the entire generation duration
- Image: reject inputImages with an error when edit mode is disabled
- Speech: add resolveConfig to normalize raw provider config at startup, matching the pattern from OpenAI/ElevenLabs/MiniMax
- Speech: add parseDirectiveToken so @voice:nova and @model:... TTS directives work when this provider is selected
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Switch music and speech providers from fetchWithTimeout to fetchWithTimeoutGuarded so SSRF/private-network checks and dispatcher/proxy routing are enforced
- Stop consuming SSE stream after [DONE] sentinel to prevent hanging when upstream keeps the connection open
- Flush any remaining buffer content after the read loop exits to avoid dropping the final frame if it lacks a trailing newline
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
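The two SSE fixes in the commit above (stopping at the [DONE] sentinel and flushing a final frame that lacks a trailing newline) can be illustrated with a simplified collector. This is a sketch under stated assumptions, not the contents of streaming-audio.ts: the function name `collectAudioFromSse` is hypothetical, the payload shape `choices[0].delta.audio` is inferred from the sequence diagram, and the input is an array of byte chunks rather than a live stream reader so the example stays synchronous.

```typescript
import { Buffer } from "node:buffer";

interface CollectedAudio {
  audio: Buffer;
  transcript: string;
}

// Hypothetical sketch of an SSE audio collector (not the PR's actual code).
function collectAudioFromSse(chunks: readonly Uint8Array[]): CollectedAudio {
  const decoder = new TextDecoder();
  const audioParts: Buffer[] = [];
  const transcriptParts: string[] = [];
  let buffer = "";
  let reachedDone = false;

  // Parses one SSE line; returns true when the [DONE] sentinel is seen.
  const handleLine = (line: string): boolean => {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) return false;
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") return true;
    try {
      // Assumed payload shape: choices[0].delta.audio.{data, transcript}
      const audio = JSON.parse(payload)?.choices?.[0]?.delta?.audio;
      if (typeof audio?.data === "string") audioParts.push(Buffer.from(audio.data, "base64"));
      // Transcript text is kept verbatim so whitespace between chunks survives.
      if (typeof audio?.transcript === "string") transcriptParts.push(audio.transcript);
    } catch {
      // Malformed frame: skip it rather than abort the whole stream.
    }
    return false;
  };

  for (const chunk of chunks) {
    buffer += decoder.decode(chunk, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the incomplete tail for the next chunk
    for (const line of lines) {
      if (handleLine(line)) {
        reachedDone = true;
        break;
      }
    }
    if (reachedDone) break; // stop reading once [DONE] arrives
  }

  if (!reachedDone) {
    buffer += decoder.decode(); // flush any buffered partial multibyte sequence
    if (buffer) handleLine(buffer); // final frame without a trailing newline
  }
  return { audio: Buffer.concat(audioParts), transcript: transcriptParts.join("") };
}
```

The sketch also flushes the `TextDecoder` at end of input, which corresponds to a later review finding on this PR rather than to this commit.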
- Use fetchWithTimeoutGuarded for video poll and authenticated download calls so SSRF/dispatcher policy is enforced throughout the entire video generation workflow, not just the submit step
- Keep raw fetchWithTimeout for unsigned CDN download URLs since they are third-party hosts outside the policy scope
- Derive modalities from model: image-only models (Flux, Sourceful) use ["image"], dual-output models (Gemini) use ["image", "text"]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Invert the modalities logic: default to ["image"] (universally supported) and only use ["image", "text"] for known dual-output model families (Google, OpenAI). This is safer since an incorrect dual-output request fails on image-only models, but image-only works everywhere.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
We only extract message.images from the response and never use the text output, so there is no reason to request dual-output. Using modalities: ["image"] works for all models and eliminates hardcoded model-prefix lists that would go stale as OpenRouter adds providers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix speech test: destructured name was fetchWithTimeoutMock but
factory created fetchWithTimeoutGuardedMock (ReferenceError)
- Fix video test: poll and authenticated download now use
fetchWithTimeoutGuarded; update mocks to return { response, release }
- Keep raw fetchWithTimeout mock for unsigned CDN downloads
- Add test for authenticated download path (no unsigned URL)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Import PinnedDispatcherPolicy from openclaw/plugin-sdk/infra-runtime instead of typing dispatcherPolicy as unknown
- Use Uint8Array for Response body in tests (Buffer is not BodyInit)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4c70326 to b5304e3
Actionable comments posted: 3
🧹 Nitpick comments (2)
extensions/openrouter/speech-provider.ts (1)
119-127: Inconsistent handling of invalid directive values.
For an invalid voice (line 116), the function returns `handled: true` with a warning. For an invalid model (line 125), it returns `handled: false` without a warning. This means invalid models silently fall through to other providers, while invalid voices are explicitly rejected.
Consider aligning the behavior: either both should warn and mark as handled, or document why models should be treated differently (e.g., to allow delegation to other providers).
Option: Align model handling with voice handling
  case "model":
  case "openrouter_model":
    if (!ctx.policy.allowModelId) {
      return { handled: true };
    }
    if (!isValidModel(ctx.value)) {
-     return { handled: false };
+     return { handled: true, warnings: [`invalid OpenRouter model "${ctx.value}"`] };
    }
    return { handled: true, overrides: { model: ctx.value } };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@extensions/openrouter/speech-provider.ts` around lines 119 - 127, The model directive handling (cases "model" and "openrouter_model") should mirror the voice handling: when ctx.policy.allowModelId is false return handled:true, and when isValidModel(ctx.value) is false emit a warning via the same logger used for voice (e.g., ctx.logger.warn or the module's logger) and return { handled: true } so invalid models are explicitly rejected instead of silently falling through; update the code in the switch case for "model"/"openrouter_model" to log a clear warning mentioning the invalid model value and then return handled:true, just like the voice branch that uses isValidVoice.
extensions/openrouter/video-generation-provider.ts (1)
218-230: Unused variable `allowPrivateNetwork`.
The `allowPrivateNetwork` variable is destructured from `resolveProviderHttpRequestConfig` but never used. It's passed to `postJsonRequest` on line 253, but consider whether it should also be applied to the guarded fetch calls in polling and download functions.
Option: Remove unused destructuring or verify consistency
If not needed:
- const { baseUrl, allowPrivateNetwork, headers, dispatcherPolicy } =
+ const { baseUrl, headers, dispatcherPolicy } =
    resolveProviderHttpRequestConfig({
Otherwise, verify `allowPrivateNetwork` is correctly applied to all fetch operations where applicable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@extensions/openrouter/video-generation-provider.ts` around lines 218 - 230, The destructured allowPrivateNetwork from resolveProviderHttpRequestConfig is unused after extraction; either remove it from the destructuring or ensure it is applied consistently to all guarded HTTP calls—specifically pass allowPrivateNetwork into the fetch wrappers used by the polling and download functions (where fetchFn is referenced) and into postJsonRequest if not already; update the destructuring in the block that calls resolveProviderHttpRequestConfig and the polling/download call sites so all network calls consistently honor allowPrivateNetwork or remove the variable if it's unnecessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@extensions/openrouter/image-generation-provider.ts`:
- Around line 60-64: The resolveFileExtension function currently maps unknown
MIME types to "png"; change it to extract and sanitize the subtype from the MIME
string as the fallback so returned filenames reflect the actual MIME (e.g.,
"image/avif" -> "avif"). Keep the explicit mappings for "jpeg"->"jpg",
"webp"->"webp", "gif"->"gif", but if none match, parse mimeType by splitting on
"/" (and dropping any parameters after ";"), validate the subtype is a safe
token (alphanumeric + dashes/underscores), and return that subtype; if
validation fails, fallback to a safe default like "bin". Apply this same logic
wherever the extension resolution is duplicated (the other occurrence referenced
around lines 165-168) to ensure consistent filenames.
In `@extensions/openrouter/music-generation-provider.ts`:
- Around line 24-33: The buildMusicPrompt function currently mixes mutually
exclusive instructions; update buildMusicPrompt(req: MusicGenerationRequest) to
validate and reject requests where req.instrumental === true and
normalizeOptionalString(req.lyrics) returns a non-empty string by throwing a
clear error (e.g., "Cannot specify lyrics for an instrumental track") instead of
appending a Lyrics block, so the function enforces the mutual exclusivity and
prevents conflicting prompt generation.
In `@extensions/openrouter/streaming-audio.ts`:
- Around line 75-93: The incremental TextDecoder usage leaves incomplete
multibyte sequences buffered because decoder.decode() is always called with {
stream: true }; after the read loop (and before the final
parseSseLine(buffer,...)) call decoder.decode() with no options to flush any
remaining bytes into buffer, append that result to buffer, and then call
parseSseLine(buffer, audioBuffers, transcriptParts) as currently done; update
the logic around the read loop where buffer, decoder, reachedDone, parseSseLine,
audioBuffers and transcriptParts are used so the final decoder.decode() flush
occurs and the last SSE line is not truncated.
---
Nitpick comments:
In `@extensions/openrouter/speech-provider.ts`:
- Around line 119-127: The model directive handling (cases "model" and
"openrouter_model") should mirror the voice handling: when
ctx.policy.allowModelId is false return handled:true, and when
isValidModel(ctx.value) is false emit a warning via the same logger used for
voice (e.g., ctx.logger.warn or the module's logger) and return { handled: true
} so invalid models are explicitly rejected instead of silently falling through;
update the code in the switch case for "model"/"openrouter_model" to log a clear
warning mentioning the invalid model value and then return handled:true, just
like the voice branch that uses isValidVoice.
In `@extensions/openrouter/video-generation-provider.ts`:
- Around line 218-230: The destructured allowPrivateNetwork from
resolveProviderHttpRequestConfig is unused after extraction; either remove it
from the destructuring or ensure it is applied consistently to all guarded HTTP
calls—specifically pass allowPrivateNetwork into the fetch wrappers used by the
polling and download functions (where fetchFn is referenced) and into
postJsonRequest if not already; update the destructuring in the block that calls
resolveProviderHttpRequestConfig and the polling/download call sites so all
network calls consistently honor allowPrivateNetwork or remove the variable if
it's unnecessary.
📒 Files selected for processing (13)
extensions/openrouter/image-generation-provider.test.ts
extensions/openrouter/image-generation-provider.ts
extensions/openrouter/index.ts
extensions/openrouter/music-generation-provider.test.ts
extensions/openrouter/music-generation-provider.ts
extensions/openrouter/openclaw.plugin.json
extensions/openrouter/openrouter-config.ts
extensions/openrouter/plugin-registration.contract.test.ts
extensions/openrouter/speech-provider.test.ts
extensions/openrouter/speech-provider.ts
extensions/openrouter/streaming-audio.ts
extensions/openrouter/video-generation-provider.test.ts
extensions/openrouter/video-generation-provider.ts
- Derive image file extension from MIME subtype for unknown formats instead of defaulting everything to .png
- Reject contradictory instrumental + lyrics in music prompt builder
- Flush TextDecoder at EOF to avoid losing final multibyte sequences
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
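The first fix in the commit above follows the reviewer's description: keep the explicit jpeg mapping, otherwise fall back to a sanitized MIME subtype, and use "bin" when the subtype is not a safe token. A minimal sketch of that rule is below; it reuses the name `resolveFileExtension` from the review comment, but the body is an illustrative reconstruction rather than the code that was merged.

```typescript
// Illustrative reconstruction of the extension rule described in the review.
function resolveFileExtension(mimeType: string): string {
  // Drop parameters such as "; charset=..." and normalize case.
  const [essence = ""] = mimeType.split(";");
  const [, subtype = ""] = essence.trim().toLowerCase().split("/");
  if (subtype === "jpeg") return "jpg";
  // Only accept alphanumerics, dashes, and underscores in a file name.
  return /^[a-z0-9_-]+$/.test(subtype) ? subtype : "bin";
}
```

Under this rule a subtype such as "svg+xml" is rejected because of the "+" and maps to "bin"; whether the real provider special-cases it is not stated in the review.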
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@extensions/openrouter/image-generation-provider.ts`:
- Around line 174-182: The code currently accepts any data URL via
extractBase64FromDataUrl and creates a GeneratedImageAsset even for non-image
MIME types; update the logic in the block that handles the parsed result (the
code returning { buffer, mimeType, fileName }) to validate parsed.mimeType
starts with "image/" (or otherwise matches allowed image MIME types) and return
null for anything else, ensuring resolveFileExtension is only called for image
MIME types; reference extractBase64FromDataUrl and the object creation that
builds fileName to locate where to add this check.
In `@extensions/openrouter/music-generation-provider.ts`:
- Around line 73-83: The transport option passed to
resolveProviderHttpRequestConfig in the music-generation-provider.ts call is
incorrectly set to "http"; update the object literal argument to use "https"
instead so resolveProviderHttpRequestConfig receives transport: "https" (locate
the call to resolveProviderHttpRequestConfig that includes baseUrl, headers,
dispatcherPolicy and change the transport field).
📒 Files selected for processing (4)
extensions/openrouter/image-generation-provider.ts
extensions/openrouter/music-generation-provider.test.ts
extensions/openrouter/music-generation-provider.ts
extensions/openrouter/streaming-audio.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- extensions/openrouter/streaming-audio.ts
const { baseUrl, headers, dispatcherPolicy } = resolveProviderHttpRequestConfig({
  baseUrl: resolveConfiguredBaseUrl(req.cfg),
  defaultBaseUrl: OPENROUTER_BASE_URL,
  allowPrivateNetwork: false,
  defaultHeaders: {
    Authorization: `Bearer ${auth.apiKey}`,
  },
  provider: "openrouter",
  capability: "audio",
  transport: "http",
});
Transport parameter should be "https".
The transport parameter is set to "http", but OpenRouter's API endpoint uses HTTPS. This mismatch could affect security policy enforcement or dispatcher behavior in resolveProviderHttpRequestConfig.
🔧 Suggested fix
  provider: "openrouter",
  capability: "audio",
- transport: "http",
+ transport: "https",
});
📝 Committable suggestion
const { baseUrl, headers, dispatcherPolicy } = resolveProviderHttpRequestConfig({
  baseUrl: resolveConfiguredBaseUrl(req.cfg),
  defaultBaseUrl: OPENROUTER_BASE_URL,
  allowPrivateNetwork: false,
  defaultHeaders: {
    Authorization: `Bearer ${auth.apiKey}`,
  },
  provider: "openrouter",
  capability: "audio",
  transport: "https",
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@extensions/openrouter/music-generation-provider.ts` around lines 73 - 83, The
transport option passed to resolveProviderHttpRequestConfig in the
music-generation-provider.ts call is incorrectly set to "http"; update the
object literal argument to use "https" instead so
resolveProviderHttpRequestConfig receives transport: "https" (locate the call to
resolveProviderHttpRequestConfig that includes baseUrl, headers,
dispatcherPolicy and change the transport field).
Skip data URLs with non-image MIME types (e.g. text/html) instead of emitting them as GeneratedImageAsset. Normalize MIME to lowercase before the check.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
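The check described in the commit above can be sketched as follows, assuming base64-encoded data URLs and Node's `Buffer`. The name `decodeImageDataUrl` and the `DecodedImage` shape are hypothetical; the PR's own helper is `extractBase64FromDataUrl`, whose exact signature is not shown in this conversation.

```typescript
import { Buffer } from "node:buffer";

interface DecodedImage {
  buffer: Buffer;
  mimeType: string;
}

// Hypothetical sketch: parse a base64 data URL and return null unless the
// lowercased MIME type is an image type.
function decodeImageDataUrl(dataUrl: string): DecodedImage | null {
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+/=]+)$/.exec(dataUrl.trim());
  if (!match) return null;
  const [, rawMime = "", payload = ""] = match;
  const mimeType = rawMime.trim().toLowerCase(); // normalize before the check
  if (!mimeType.startsWith("image/")) return null; // e.g. text/html is skipped
  return { buffer: Buffer.from(payload, "base64"), mimeType };
}
```

Returning null instead of throwing lets a caller skip one bad entry in `message.images` and still return the remaining images.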
Summary
AI-assisted
Built with Claude Code. All code reviewed against existing extension patterns (OpenAI, Google, Comfy, ElevenLabs).
🤖 Generated with Claude Code