Description
BREAKING CHANGE: Refactor Ollama lifecycle management.
Current behavior: Ollama starts on team deploy (model_provider=ollama), stops when last such team stops (ref_count → 0).
New behavior: Ollama starts the first time it is needed (RAG upload or Ollama team deploy) and never stops automatically.
Changes:
- Remove ollamaMu mutex from server.go
- Remove ref_count increment in deployTeamAsync
- Remove ref_count decrement + StopOllama call in StopTeam
- Keep EnsureOllama, ConnectToNetwork, PullModel, WarmUp in deploy flow
- Keep DisconnectFromNetwork in StopTeam
- Update GetOllamaStatus handler (no more ref_count)
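A minimal sketch of the intended lifecycle, for discussion only. It assumes EnsureOllama is idempotent; the Runtime interface, fakeRuntime, Lifecycle type, UploadRAG, and Status are hypothetical names invented for this illustration and are not taken from server.go. PullModel and WarmUp are omitted for brevity. The lock inside Lifecycle is local to the sketch: this issue does not say how the real EnsureOllama guards against concurrent starts.

```go
package main

import (
	"fmt"
	"sync"
)

// Runtime is a hypothetical stand-in for whatever manages the Ollama container.
type Runtime interface {
	Start() error
	ConnectToNetwork(team string) error
	DisconnectFromNetwork(team string) error
}

// fakeRuntime records calls so the sketch runs without a container engine.
type fakeRuntime struct {
	starts   int
	networks map[string]bool
}

func newFakeRuntime() *fakeRuntime {
	return &fakeRuntime{networks: map[string]bool{}}
}

func (f *fakeRuntime) Start() error { f.starts++; return nil }

func (f *fakeRuntime) ConnectToNetwork(team string) error {
	f.networks[team] = true
	return nil
}

func (f *fakeRuntime) DisconnectFromNetwork(team string) error {
	delete(f.networks, team)
	return nil
}

// Lifecycle holds "start once, never stop" state. There is no ref_count.
type Lifecycle struct {
	rt      Runtime
	mu      sync.Mutex // sketch-local guard, not the removed ollamaMu
	running bool
}

// EnsureOllama starts Ollama on the first successful call and is a no-op afterwards.
func (l *Lifecycle) EnsureOllama() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.running {
		return nil
	}
	if err := l.rt.Start(); err != nil {
		return err // not marked running, so a later call retries
	}
	l.running = true
	return nil
}

// UploadRAG (hypothetical) needs embeddings, so it ensures Ollama first.
func (l *Lifecycle) UploadRAG() error { return l.EnsureOllama() }

// DeployTeam ensures Ollama and attaches it to the team network.
func (l *Lifecycle) DeployTeam(team string) error {
	if err := l.EnsureOllama(); err != nil {
		return err
	}
	return l.rt.ConnectToNetwork(team)
}

// StopTeam only detaches the network; it never stops Ollama.
func (l *Lifecycle) StopTeam(team string) error {
	return l.rt.DisconnectFromNetwork(team)
}

// Status (hypothetical) reports running state only, with no ref_count field.
func (l *Lifecycle) Status() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.running
}

func main() {
	rt := newFakeRuntime()
	l := &Lifecycle{rt: rt}

	_ = l.UploadRAG()          // first use starts Ollama
	_ = l.DeployTeam("team-a") // already running, only connects the network
	_ = l.StopTeam("team-a")   // last Ollama team stops, Ollama keeps running

	fmt.Println("running:", l.Status(), "starts:", rt.starts)
}
```

In this sketch, stopping the last Ollama team leaves Ollama running, so a later RAG query can still reach it for embeddings.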
Why
RAG needs Ollama for embeddings at upload time AND query time. The old ref counting would stop Ollama when the last Ollama team stops, breaking RAG. Idle Ollama uses ~50MB RAM — negligible.
Acceptance Criteria