Description:
When using the LLaVA multimodal chat handler (Llava15ChatHandler or Llava16ChatHandler), the model stops picking up new images from the third request onward. From that point, every response references the image from the second request, regardless of what new image or prompt is provided. This issue does not occur with Qwen-VL models under the same setup.
Environment:
llama-cpp-python version: 0.3.32 (latest commit)
llama.cpp version: latest commit
Model: llama-joycaption-beta-one-hf-llava-q8_0.gguf + llama-joycaption-beta-one-llava-mmproj-model-f16.gguf
https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main
GPU: NVIDIA RTX 5080 (16 GB VRAM)
OS: Windows 10
Python: 3.13.2
Steps to Reproduce
Start a chat session with a LLaVA model
Send a message with Image A + text prompt → ✅ Model correctly describes Image A
Send a message with Image B + same text prompt → ✅ Model correctly describes Image B
Send a message with Image C + same text prompt → ❌ Model describes Image B
Send a message with Image D + different text prompt → ❌ Model still describes Image B
Recovery:
❌ Changing the text prompt does not help
❌ Sending more requests does not help
✅ Only a model reload resolves the issue
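The reproduction steps above can be sketched roughly as follows. This is a hypothetical minimal script, not the original test code: the image paths, prompt text, and `n_ctx` value are placeholders, and the model filenames are taken from the Environment section above. It assumes llama-cpp-python's documented `Llava15ChatHandler` API and the OpenAI-style content-parts message format.

```python
def build_message(image_url: str, prompt: str) -> list:
    """Build a single-turn multimodal message in the OpenAI content-parts format."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]

if __name__ == "__main__":
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    chat_handler = Llava15ChatHandler(
        clip_model_path="llama-joycaption-beta-one-llava-mmproj-model-f16.gguf"
    )
    llm = Llama(
        model_path="llama-joycaption-beta-one-hf-llava-q8_0.gguf",
        chat_handler=chat_handler,
        n_ctx=4096,  # placeholder context size
    )

    # Requests 1 and 2 describe the correct image; from request 3 onward the
    # response keeps describing image B until the model is reloaded.
    for url in (
        "file:///tmp/image_a.png",  # placeholder paths
        "file:///tmp/image_b.png",
        "file:///tmp/image_c.png",
        "file:///tmp/image_d.png",
    ):
        out = llm.create_chat_completion(
            messages=build_message(url, "Describe this image.")
        )
        print(url, "->", out["choices"][0]["message"]["content"][:80])
```

Each request here is a fresh single-turn conversation, so the handler should re-embed the new image every time rather than reuse a cached one.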
Request
Please investigate why LLaVA image caching behaves differently from Qwen-VL.
I can provide additional logs if needed to help diagnose this issue.