Sub-query-level semantic caching for LLM APIs: a 3-tier hybrid engine with FAISS vector search. Measured 87.5% cache hit rate and 71.8% cost savings over 100 real API calls.
react python caching cost-optimization faiss fastapi vector-search llm semantic-cache intent-decomposition
Updated Mar 4, 2026 · Python