
LLaVA image freeze after third inference #90

@KLL535

Description:

When using the LLaVA multimodal chat handler Llava15ChatHandler or Llava16ChatHandler, the model stops updating images starting from the third request. After this point, all responses reference the image from the second request, regardless of what new image or prompt is provided. This issue does not occur with Qwen-VL models using the same setup.

Environment:

llama-cpp-python version: 0.3.32 (latest commit)
llama.cpp version: latest commit
Model: llama-joycaption-beta-one-hf-llava-q8_0.gguf + llama-joycaption-beta-one-llava-mmproj-model-f16.gguf
https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main
GPU: NVIDIA RTX 5080 (16 GB VRAM)
OS: Windows 10
Python: 3.13.2

Steps to Reproduce

Start a chat session with a LLaVA model
Send a message with Image A + text prompt → ✅ Model correctly describes Image A
Send a message with Image B + same text prompt → ✅ Model correctly describes Image B
Send a message with Image C + same text prompt → ❌ Model describes Image B
Send a message with Image D + different text prompt → ❌ Model still describes Image B
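The sequence above can be sketched as a minimal repro script. This is a sketch under assumptions: the model and mmproj file paths, the image URLs, and the prompt text are placeholders for the reporter's actual inputs; the message format follows the OpenAI-style multimodal content layout that llama-cpp-python's chat handlers accept.

```python
def build_messages(image_url: str, prompt: str) -> list:
    """One user turn pairing an image with a text prompt (OpenAI-style content parts)."""
    return [{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": prompt},
    ]}]

def main():
    # Imports kept local so the helper above can be used without the library installed.
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    chat_handler = Llava15ChatHandler(
        clip_model_path="llama-joycaption-beta-one-llava-mmproj-model-f16.gguf")
    llm = Llama(
        model_path="llama-joycaption-beta-one-hf-llava-q8_0.gguf",
        chat_handler=chat_handler,
        n_ctx=4096,
    )
    # Hypothetical image paths; from the third call onward the reply
    # describes image B instead of the image actually sent.
    for url in ["file:///images/a.png", "file:///images/b.png",
                "file:///images/c.png", "file:///images/d.png"]:
        resp = llm.create_chat_completion(
            messages=build_messages(url, "Describe this image."))
        print(resp["choices"][0]["message"]["content"])

if __name__ == "__main__":
    main()
```

Each call here is an independent one-turn completion, so no prior image should be carried over between requests.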

Recovery:

❌ Changing the text prompt does not help
❌ Sending more requests does not help
✅ Only a model reload resolves the issue

Request

Please investigate why LLaVA image caching behaves differently from Qwen-VL. I can provide additional logs if needed to help diagnose this issue.

Metadata

Assignees: no one assigned
Labels: bug (Something isn't working)
Projects: no projects
Milestone: no milestone
Relationships: none yet
Development: no branches or pull requests