[v0.3.31] Release Note: Omni-Modal Media Pipeline, Hybrid 1-Token Rollback and Enhanced Logging #80
JamePeng
announced in
Announcements
Omni-Modal Media Pipeline, Hybrid 1-Token Rollback and Enhanced Logging
Release v0.3.31 introduces structural updates to the multi-modal processing pipeline, addresses a specific caching behavior in hybrid models, and improves how underlying C++ backend errors are surfaced to the Python layer.
Here is a detailed breakdown of the changes in this version.
1. Omni-Modal Media Pipeline
The media parsing and loading pipeline in `MTMDChatHandler` has been rewritten to handle both vision and audio inputs within a unified architecture.

- The `_init_mtmd_context` method now actively probes the C++ backend for `ctx_v` (vision) and `ctx_a` (audio) encoders. This provides proactive validation of the model's capabilities before media processing begins.
- Replaced `get_image_urls` and `split_text_on_image_urls` with `_get_media_items`. This parses `image_url`, `input_audio`, and `audio_url` while strictly maintaining the chronological order of user prompts and enforcing OpenAI format specifications.
- A unified `load_media` dispatcher has been introduced. It includes a new `detect_audio_format` method that mimics `llama.cpp`'s C++ magic-byte sniffing (RIFF/WAVE, ID3/MPEG, fLaC) to prevent backend crashes caused by unsupported or corrupted audio formats.
- The `ThreadPoolExecutor` in `_process_mtmd_prompt` has been updated to concurrently fetch and decode both image and audio payloads into unified `mtmd_bitmap` structures.
2. Hybrid Model 1-Token Rollbacks (N-1 Checkpointing)

This release addresses an issue where generating responses with hybrid or recurrent models (such as RNN-based architectures) could result in empty outputs or state desyncs when the prompt cache matched 100%.
When a prompt matches the cache entirely (e.g., when a user regenerates a response with the same prompt but a different seed), the engine attempts a "1-token rollback" to refresh the sampling logits. Because hybrid models cannot arbitrarily truncate their internal states like standard Transformers, rolling back one token without a dedicated snapshot caused the state machine to fail.
The engine now forces an N-1 state snapshot during the prompt prefilling phase for hybrid models. This ensures the engine can safely perform a 1-token rollback to refresh logits upon 100% cache matches, preventing desyncs without requiring a full re-evaluation of the prompt.
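The checkpoint-and-rollback flow can be illustrated with a toy model. All names here (`HybridEngine`, `prefill`, `rollback_one`) are hypothetical, and the token list stands in for the opaque recurrent state that the real engine snapshots inside the C++ backend.

```python
# Toy sketch of N-1 checkpointing for recurrent state (names are illustrative).
from copy import deepcopy

class HybridEngine:
    def __init__(self) -> None:
        self.state: list[int] = []        # stand-in for opaque recurrent state
        self._checkpoint: list[int] | None = None

    def prefill(self, tokens: list[int]) -> None:
        for i, tok in enumerate(tokens):
            if i == len(tokens) - 1:
                # Snapshot *before* consuming the final token (the N-1 point):
                # recurrent states cannot be truncated after the fact.
                self._checkpoint = deepcopy(self.state)
            self.state.append(tok)

    def rollback_one(self, last_token: int) -> None:
        # On a 100% prompt-cache hit, restore the N-1 checkpoint and re-feed
        # the final token to refresh sampling logits without a full re-eval.
        assert self._checkpoint is not None, "prefill() must run first"
        self.state = deepcopy(self._checkpoint)
        self.state.append(last_token)
```

The key design point is that the snapshot is taken eagerly during prefill, so the rollback path never needs to truncate a state that cannot be truncated.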
3. Exposing Critical C++ Errors
We have removed the OS-level log suppression (`suppress_stdout_stderr`) around critical C++ backend calls, specifically within `_init_mtmd_context`, `_create_bitmap_from_bytes`, and `close`.

Previously, when `verbose=False`, this file descriptor redirection was inadvertently swallowing fatal C++ backend errors, such as `stb_image` decoding failures, corrupted `.mmproj` model weights, or CUDA out-of-memory aborts. This resulted in silent crashes that were difficult to debug.

The framework now relies entirely on the native C-API `llama_log_callback` to route logs to Python. This ensures that critical decoding and hardware exceptions remain visible in the console, while standard processing logs can still be filtered by the Python logging module.
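The Python side of such log routing can be sketched as below. The level numbering in `_LEVELS` and the function name are assumptions for illustration; the real binding registers a C callback with the llama.cpp log API rather than calling a plain Python function directly.

```python
# Rough sketch of routing backend log lines into Python's logging module.
import logging

logger = logging.getLogger("llama-cpp")

# Assumed mapping from backend log levels to Python logging levels.
_LEVELS = {0: logging.DEBUG, 1: logging.INFO, 2: logging.WARNING, 3: logging.ERROR}

def on_backend_log(level: int, text: str) -> None:
    # Unknown levels are escalated to ERROR so fatal messages are never
    # dropped; routine output can still be filtered by the logging config.
    logger.log(_LEVELS.get(level, logging.ERROR), text.rstrip("\n"))
```

Because filtering happens in the Python `logging` layer instead of at the file-descriptor level, fatal backend messages stay visible even when verbose output is disabled.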
4. Upstream Synchronization

Updated the `llama.cpp` backend to `ggml-org/llama.cpp` commit [f5ddcd1696eca5069dc7915f4d4c03c9a709afea](ggml-org/llama.cpp@f5ddcd1).