
LLaVA image freeze after third inference #90

@KLL535

Description:

When using the LLaVA multimodal chat handler Llava15ChatHandler or Llava16ChatHandler, the model stops updating images starting from the third request. After this point, all responses reference the image from the second request, regardless of what new image or prompt is provided. This issue does not occur with Qwen-VL models using the same setup.

Environment:

llama-cpp-python version: 0.3.32 (latest commit)
llama.cpp version: latest commit
Model: llama-joycaption-beta-one-hf-llava-q8_0.gguf + llama-joycaption-beta-one-llava-mmproj-model-f16.gguf
https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main
GPU: NVIDIA RTX 5080 (16 GB VRAM)
OS: Windows 10
Python: 3.13.2

Steps to Reproduce

Start a chat session with a LLaVA model
Send a message with Image A + text prompt → ✅ Model correctly describes Image A
Send a message with Image B + same text prompt → ✅ Model correctly describes Image B
Send a message with Image C + same text prompt → ❌ Model describes Image B
Send a message with Image D + different text prompt → ❌ Model still describes Image B
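The sequence above can be sketched as a minimal repro script. This is a sketch under assumptions: the model and mmproj file paths, the image URLs, and the prompt text are placeholders for the reporter's actual inputs; the message format follows the OpenAI-style multimodal content layout that llama-cpp-python's chat handlers accept.

```python
def build_messages(image_url: str, prompt: str) -> list:
    """One user turn pairing an image with a text prompt (OpenAI-style content parts)."""
    return [{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": prompt},
    ]}]

def main():
    # Imports kept local so the helper above can be used without the library installed.
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    chat_handler = Llava15ChatHandler(
        clip_model_path="llama-joycaption-beta-one-llava-mmproj-model-f16.gguf")
    llm = Llama(
        model_path="llama-joycaption-beta-one-hf-llava-q8_0.gguf",
        chat_handler=chat_handler,
        n_ctx=4096,
    )
    # Hypothetical image paths; from the third call onward the reply
    # describes image B instead of the image actually sent.
    for url in ["file:///images/a.png", "file:///images/b.png",
                "file:///images/c.png", "file:///images/d.png"]:
        resp = llm.create_chat_completion(
            messages=build_messages(url, "Describe this image."))
        print(resp["choices"][0]["message"]["content"])

if __name__ == "__main__":
    main()
```

Each call here is an independent one-turn completion, so no prior image should be carried over between requests.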

Recovery:

❌ Changing the text prompt does not help
❌ Sending more requests does not help
✅ Only a model reload resolves the issue

Request

Please investigate why LLaVA image caching behaves differently from Qwen-VL. I can provide additional logs if needed to help diagnose this issue.

Metadata

Assignees: no one assigned
Labels: bug (Something isn't working)
Projects: no projects
Milestone: no milestone
Relationships: none yet
Development: no branches or pull requests