
feat: add health-supplement-search ability#214

Open
megz2020 wants to merge 13 commits into openhome-dev:dev from megz2020:feat/health-supplement-search

Conversation

@megz2020 (Contributor)

Adds a new voice-driven health supplement search ability that lets users ask about a health concern and get personalized supplement recommendations from a curated database of 100 real iHerb products.
Demo: health supplement search (Loom recording)

Semantic vector search over 100 curated supplement products (names, brands, ratings, reviews, effects)
Supports Qdrant Cloud (free 1 GB tier, recommended) and Weaviate Cloud (14-day sandbox) as vector backends
Falls back to Serper web search when a product isn't found in the local database
Multi-turn conversation: ask for product details, re-rank by rating, or search a new concern
STT-resilient: handles garbled voice input via LLM intent classification and a guess-and-confirm flow
Passes local validator (validate_ability.py) with zero errors
How It Works
1. User speaks a health concern ("find me something for joint pain")
2. Query is embedded via Jina AI (jina-embeddings-v3, 1024 dims) and searched against the Qdrant collection
3. If cosine distance < 0.70 → return curated results; otherwise fall back to Serper
4. Results are summarized by the OpenHome LLM into a natural voice response
5. User can ask for details on a specific product, re-rank by rating, or search something new
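The flow above can be sketched roughly as follows. The function name, the hit format, and the injected callables are illustrative, not the ability's actual API:

```python
# Rough sketch of the search flow with hypothetical names; the real
# ability wires in Jina AI embeddings, a Qdrant client, and Serper.
DISTANCE_THRESHOLD = 0.70  # cosine distance cutoff for "found in curated DB"

def search_supplements(query, embed_fn, vector_search_fn, web_search_fn,
                       threshold=DISTANCE_THRESHOLD):
    """Embed the query, search the vector DB, fall back to web search."""
    vector = embed_fn(query)         # e.g. jina-embeddings-v3, 1024 dims
    hits = vector_search_fn(vector)  # assumed shape: [(distance, payload), ...]
    curated = [payload for dist, payload in hits if dist < threshold]
    if curated:
        return {"source": "curated", "results": curated}
    return {"source": "web", "results": web_search_fn(query)}  # Serper fallback
```

Injecting the embedding, vector, and web clients keeps the threshold/fallback decision testable without live services.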
STT Resilience
Voice recognition often garbles health queries. This ability handles it in two layers:

LLM intent check: all inputs of 3+ words go through an LLM to judge health intent, even if no keyword matched
Guess and confirm: short or ambiguous inputs trigger a guess ("Did you mean joint pain?") — a "yes" confirms and searches
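A minimal sketch of the guess-and-confirm layer; the state dict, the AFFIRMATIVES set, and the guess_fn callable are assumptions for illustration, not the ability's real names:

```python
# Hypothetical guess-and-confirm flow for short/ambiguous STT input.
AFFIRMATIVES = {"yes", "yeah", "yep", "sure"}

def handle_short_input(text, guess_fn, state):
    if state.get("pending_guess") and text.lower().strip() in AFFIRMATIVES:
        # "yes" confirms the previous guess and triggers the search
        return ("search", state.pop("pending_guess"))
    guess = guess_fn(text)            # LLM guesses e.g. "joint pain"
    state["pending_guess"] = guess
    return ("confirm", f"Did you mean {guess}?")
```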
Setup Required
The ability needs a pre-loaded vector database. Full setup instructions and scripts are in the companion branch: feat/health-supplement-search-setup

megz2020 added 11 commits March 15, 2026 03:40
Voice-driven semantic search over 100 curated supplement products.
Supports Weaviate (built-in Snowflake Arctic embeddings) and Qdrant
(Jina AI embeddings) via a config flag. Falls back to Serper web
search when a supplement is not found in the local DB.
- Remove unused json import from main.py
- Replace CONFIG_FILE/load_config with top-level constant block
- Update README to document constants-based setup (not JSON file)
- Fix setup branch link in README (root, not subfolder path)
Architecture:
- Add per-provider threshold note to DISTANCE_THRESHOLD config comment
- Extract trigger text in call() to pre-fill first search turn
- Initialize _last_results/_last_source/_trigger_text in call() (not class-level)

Code quality:
- Remove LLM fallback from _wants_exit; expand EXIT_WORDS with phrase set
- Add ordering comment above rerank/detail checks
- Add _strip_llm_fences + ordinal word fallback to _wants_detail int parse
- Wrap _log/_err in try/except matching local-event-explorer pattern
- Add isinstance(reviews, list) guard in _detail_response
- Add payload guard in _wants_detail list comprehension

Performance:
- Wrap all text_to_text_response calls in asyncio.to_thread (non-blocking)
- Make _summarize_curated, _summarize_web, _detail_response async
- Expand _DETAIL_TRIGGERS: add affirmative follow-ups (yes, give me, show me,
  the first/second/third) so 'Yes. Give me' correctly routes to detail mode
- Add clarification response when detail intent detected but product not resolved
  (e.g. STT garble like 'the restaurant') instead of falling through to search
- Tighten _summarize_curated prompt: explicitly forbid inferring benefits not
  listed in the data to prevent LLM hallucination (e.g. 'cancer treatment')
- Add _is_health_query() guard: keyword-first check then short LLM fallback
  rejects off-topic inputs before triggering a vector DB search
- Add thank you/thanks/cheers to EXIT_WORDS (covers 'Thank you, Snowby' garble)
- Add short-input LLM fallback in _wants_exit for inputs <=5 words that pass
  keyword check — catches STT garbles of goodbye that keyword matching misses
- Add _just_showed_detail flag: blocks _DETAIL_TRIGGERS from re-matching on
  the turn immediately after detail was shown, preventing the double-detail loop
- Strip HTML tags from reviews before passing to _detail_response using
  _strip_html() — source data contains raw <span className=...> tags that
  garble the review text and cause LLM to paraphrase instead of quoting
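The asyncio.to_thread change in the Performance group can be illustrated like this; the blocking stand-in below replaces the real text_to_text_response:

```python
# Sketch of the non-blocking wrapper; text_to_text_response is a
# synchronous stand-in for the real OpenHome LLM call.
import asyncio

def text_to_text_response(prompt):   # blocking LLM call (stand-in)
    return f"summary of: {prompt}"

async def summarize_curated(results):
    prompt = ", ".join(r["name"] for r in results)
    # Run the blocking call in a worker thread so the event loop stays free
    return await asyncio.to_thread(text_to_text_response, prompt)
```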
…ealth_query

Previously the LLM fallback only ran for inputs <=6 words, so long off-topic
queries like "What is the result between Liverpool and Tottenham today?" bypassed
the guard and triggered a supplement search. Now LLM is called for all inputs
that don't match health keywords.
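A sketch of the fixed guard described here, with illustrative names: keyword fast path first, then an LLM fallback that now runs regardless of input length:

```python
# Hypothetical _is_health_query-style guard; keyword set and prompt
# wording are illustrative.
HEALTH_KEYWORDS = {"pain", "sleep", "joint", "energy", "stress", "immune"}

def is_health_query(text, llm):
    words = set(text.lower().split())
    if HEALTH_KEYWORDS & words:
        return True                  # fast path: obvious health terms
    # LLM fallback runs for ALL lengths, so long off-topic queries
    # (e.g. football scores) no longer bypass the guard
    answer = llm(f"Is this about a health concern? '{text}' Reply YES or NO.")
    return answer.strip().upper() == "YES"
```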
- Remove implementation-detail and narrative comments; keep only "why" comments
- Update README: Qdrant as primary provider, STT resilience section, run_io_loop listed
- Apply ruff format (no logic changes)
…apping

- Declare _trigger_text and _just_showed_detail as class attributes
  to match OpenHome convention (alongside _last_results / _last_source)
- Remove awkward multi-line parens ruff introduced around pending_guess
  and confirmed_search inline comments
@megz2020 megz2020 requested a review from a team as a code owner March 15, 2026 22:03

github-actions bot commented Mar 15, 2026

✅ Community PR Path Check — Passed

All changed files are inside the community/ folder. Looks good!


github-actions bot commented Mar 15, 2026

🔀 Branch Merge Check

PR direction: feat/health-supplement-search → dev

✅ Passed: feat/health-supplement-search → dev is a valid merge direction


github-actions bot commented Mar 15, 2026

✅ Ability Validation Passed

📋 Validating: community/health-supplement-search
  ✅ All checks passed!

@github-actions github-actions bot added the community-ability Community-contributed ability label Mar 15, 2026

github-actions bot commented Mar 15, 2026

🔍 Lint Results

__init__.py — Empty as expected

Files linted: community/health-supplement-search/main.py

✅ Flake8 — Passed

✅ All checks passed!

@uzair401 (Contributor) left a comment

Hey @megz2020, ran this through the voice naturalness audit. LLM-based intent routing is correctly used throughout and the STT resilience design is well thought out. A few issues to address:

1. Hardcoded string matching

  • The guess confirmation tuple is missing coverage for common spoken affirmatives. Add: "absolutely", "go ahead", "do it", "sounds good", "for sure", "yup".
  • _wants_rerank uses hardcoded substring matching ("best rated", "highest rated") which will miss paraphrases like "which one has the best reviews", "most popular", "sort by rating". Replace with an LLM classifier:
```python
result = self.capability_worker.text_to_text_response(
    f"The user said: '{user_input}'. Are they asking to sort results by rating? "
    "Reply ONLY with: RATING_HIGH, RATING_LOW, or NO."
).strip().upper()
```
  • "more", "ok", and "okay" in _DETAIL_TRIGGERS will produce false positives on inputs like "no more", "one more search", "ok thanks". Remove them — the LLM path in _wants_detail already handles intent resolution correctly without these.

2. LLM classifier prompts missing few-shot examples

  • Both _wants_exit classifier prompts provide no examples, which reduces reliability on STT-garbled farewell phrases. Add inline examples: "'bye', 'that's all', 'im done', 'cheers' = YES. Reply YES or NO only."
  • _guess_health_intent provides no garbled input examples for the LLM to calibrate against. Add: "Examples: 'join te pin' → 'joint pain', 'sleep iz shoes' → 'sleep issues'."
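Putting the few-shot suggestion into practice, an exit classifier might look like the sketch below; the prompt wording borrows the review's examples, and the llm callable is a hypothetical stand-in:

```python
# Hypothetical LLM-based exit classifier with inline few-shot examples.
EXIT_PROMPT = (
    "Does the user want to end the conversation? "
    "Examples: 'bye', 'that's all', 'im done', 'cheers' = YES. "
    "Reply YES or NO only. User said: '{text}'"
)

def wants_exit(text, llm):
    # Normalize the reply so "yes", " YES\n", etc. all count
    return llm(EXIT_PROMPT.format(text=text)).strip().upper() == "YES"
```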

3. EXIT_WORDS coverage gap

The set is missing several common spoken closing phrases. Add: "i'm good", "all set", "i'm all set", "that's enough", "nothing else".

4. LLM output formatting constraints incomplete

_summarize_curated, _summarize_web, and _detail_response instruct the LLM to avoid markdown but do not explicitly prohibit bullet points or numbered lists. A response like "1. Product A 2. Product B" will pass the markdown check but produce broken TTS output. Add to all three prompts: "Plain spoken English only. No bullet points, no numbered lists, no formatting of any kind."
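One way to enforce this is a shared voice-safe suffix appended to every summarization prompt; the constant and builder below are illustrative, not the ability's code:

```python
# Hypothetical shared suffix for _summarize_curated / _summarize_web /
# _detail_response prompts, per the review suggestion.
VOICE_SAFE = ("Plain spoken English only. No bullet points, no numbered "
              "lists, no formatting of any kind.")

def build_summary_prompt(products):
    names = ", ".join(p["name"] for p in products)
    return (f"Summarize these supplements for a voice reply "
            f"in 2-3 sentences: {names}. {VOICE_SAFE}")
```

Centralizing the constraint in one constant keeps the three prompts from drifting apart.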

5. Response length violations

  • The opening speak() string is 46 words, exceeding the 30-word ceiling. Refactor to:

    "Welcome to Health Supplement Search — informational only, not medical advice. What health concern can I help you with?" (17 words)

  • The setup error string references main.py and README — both are meaningless in a voice context. Refactor to: "Health Supplement Search isn't configured yet. Please add your API keys and re-upload the ability."
  • _summarize_curated instructs the LLM to respond in 3-4 sentences. Per voice delivery guidelines, result delivery should be capped at 2-3 sentences max. Update accordingly.

No menu-driven flow issues found. Please push fixes and we'll take another look!

- Add _normalize_query() to clean garbled STT before vector search
- Replace hardcoded _wants_rerank with LLM classifier
- Replace keyword-based _wants_exit with fully LLM-based classifier
- Add few-shot examples to exit/intent/guess prompts
- Expand affirmatives; remove false-positive detail triggers
- Cap LLM responses to 30-40 words for voice-appropriate length
- Bump DISTANCE_THRESHOLD to 0.85 for Weaviate compatibility
- Replace magic numbers with named constants