feat: semantic icon search with VLM descriptions and embeddings #117
Draft
mmacpherson wants to merge 4 commits into main
Add embedding-based search so users can find Lucide icons by natural
language queries ("payment", "hard work", "owl") instead of exact name
matching.
Architecture:
- Gemini 2.5 Flash Lite generates rich text descriptions from rendered
icon PNGs + Lucide metadata (tags, categories) at build time
- nomic-embed-text-v1.5-Q computes embeddings with asymmetric
search_query/search_document prefixes
- Descriptions saved as JSONL (durable source of truth), embeddings
stored in a separate SQLite search DB
- Search DB auto-downloaded on first use, cached locally
- Lucide repo auto-cloned for metadata during description generation
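The asymmetric-prefix flow above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for nomic-embed-text-v1.5, and the descriptions are invented placeholders for the VLM-generated JSONL records. Only the `search_query: ` / `search_document: ` prefixes come from the PR. In the toy embedder the prefixes have no useful effect; the sketch only shows where each prefix is applied.

```python
from collections import Counter
import math

# Prefixes named in the PR, prepended to the text before embedding.
QUERY_PREFIX = "search_query: "
DOCUMENT_PREFIX = "search_document: "


def toy_embed(text):
    """Stand-in embedder (unit-length bag of words), NOT the nomic model."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {token: c / norm for token, c in counts.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


# Hypothetical descriptions standing in for the VLM-generated records.
DESCRIPTIONS = {
    "credit-card": "rectangular card used for payment in a shop or online",
    "owl": "owl bird with large round eyes",
    "hammer": "hand tool for hard work such as driving nails",
}

# Build time: embed every description with the document prefix.
INDEX = {
    name: toy_embed(DOCUMENT_PREFIX + text)
    for name, text in DESCRIPTIONS.items()
}


def search(query, limit=3):
    """Query time: embed with the query prefix, rank by cosine similarity."""
    q = toy_embed(QUERY_PREFIX + query)
    scored = [(name, cosine(q, vec)) for name, vec in INDEX.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


print(search("payment"))
```

With a real embedding model the ranking would reflect semantic similarity rather than shared words, which is what lets a query like "ennui" match icons whose descriptions never contain that word.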
New modules:
- search.py: public API (search_icons, search_available, SearchResult)
- build_search.py: VLM + embedding build pipeline with JSONL intermediate
- build_clusters.py: HDBSCAN clustering + Gemini Flash theme naming
- cli.py: unified `lucide` CLI with subcommands (db, describe,
build-search, search, cluster, version)
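One way a unified CLI with these subcommands could be wired up is sketched below with `argparse`. The subcommand names are taken from the list above; the use of `argparse`, the `query` positional, and the `--limit` option are assumptions for illustration and may not match `cli.py`.

```python
import argparse

# Subcommand names as listed in the PR description.
SUBCOMMANDS = ["db", "describe", "build-search", "search", "cluster", "version"]


def build_parser():
    """Build a `lucide` parser with one sub-parser per subcommand."""
    parser = argparse.ArgumentParser(prog="lucide")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in SUBCOMMANDS:
        sub = subparsers.add_parser(name)
        if name == "search":
            # Hypothetical arguments; the real CLI may differ.
            sub.add_argument("query")
            sub.add_argument("--limit", type=int, default=10)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```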
Main DB now includes relational metadata:
- icon_tags (12,619 rows), icon_categories (3,309), icon_aliases (248)
- All indexed for fast lookup
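A sketch of what indexed relational metadata tables might look like in SQLite. The table names come from the PR; the column names, index names, the `icons` parent table, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: only the three icon_* table names are from the PR.
    CREATE TABLE icons (name TEXT PRIMARY KEY);
    CREATE TABLE icon_tags (
        icon_name TEXT NOT NULL REFERENCES icons(name),
        tag TEXT NOT NULL
    );
    CREATE TABLE icon_categories (
        icon_name TEXT NOT NULL REFERENCES icons(name),
        category TEXT NOT NULL
    );
    CREATE TABLE icon_aliases (
        icon_name TEXT NOT NULL REFERENCES icons(name),
        alias TEXT NOT NULL
    );
    -- Indexes on the lookup columns keep tag/category/alias queries fast.
    CREATE INDEX idx_icon_tags_tag ON icon_tags(tag);
    CREATE INDEX idx_icon_categories_category ON icon_categories(category);
    CREATE INDEX idx_icon_aliases_alias ON icon_aliases(alias);
    """
)

# Illustrative rows only; not the actual Lucide metadata.
conn.executemany(
    "INSERT INTO icons VALUES (?)",
    [("credit-card",), ("banknote",), ("bird",)],
)
conn.executemany(
    "INSERT INTO icon_tags VALUES (?, ?)",
    [("credit-card", "money"), ("banknote", "money"), ("bird", "animal")],
)


def icons_with_tag(connection, tag):
    """Return icon names carrying the given tag, sorted alphabetically."""
    rows = connection.execute(
        "SELECT icon_name FROM icon_tags WHERE tag = ? ORDER BY icon_name",
        (tag,),
    ).fetchall()
    return [row[0] for row in rows]


print(icons_with_tag(conn, "money"))
```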
CLI search with inline icon rendering via Kitty graphics protocol
(Ghostty, kitty, WezTerm) with white background for visibility.
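For readers unfamiliar with the Kitty graphics protocol, the sketch below builds the escape sequence that transmits and displays a PNG inline. The function name `kitty_png_escape` is hypothetical, and the white-background compositing mentioned above is not shown, since the PR does not say how it is done.

```python
import base64


def kitty_png_escape(png_bytes, chunk_size=4096):
    """Encode PNG bytes as Kitty graphics protocol escape sequence(s).

    The base64 payload is split into chunks of at most 4096 characters.
    The first chunk carries the control keys (a=T: transmit and display,
    f=100: PNG data); every chunk sets m=1 if more follow, m=0 if last.
    """
    payload = base64.standard_b64encode(png_bytes).decode("ascii")
    chunks = [
        payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)
    ] or [""]
    parts = []
    for index, chunk in enumerate(chunks):
        more = 1 if index < len(chunks) - 1 else 0
        control = f"a=T,f=100,m={more}" if index == 0 else f"m={more}"
        # APC start (ESC _ G), control data, ';', payload, string terminator.
        parts.append(f"\x1b_G{control};{chunk}\x1b\\")
    return "".join(parts)
```

Writing the returned string to stdout in a terminal that supports the protocol draws the image at the cursor position; other terminals will typically ignore or misprint it, so a real CLI would want to detect support first.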
Optional extra: pip install python-lucide[search]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-built artifacts for semantic search:
- gemini-icon-descriptions.jsonl: VLM descriptions for all 1,703 icons (Gemini 2.5 Flash Lite, prompt hash 30eb8a53d63e)
- lucide-search.db: embeddings (nomic-embed-text-v1.5-Q, 768d) + descriptions + 88 HDBSCAN cluster assignments with Gemini-named themes
- lucide-icons.db: rebuilt with relational metadata tables (icon_tags, icon_categories, icon_aliases)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dev scripts and artifacts for exploring the icon embedding space:
- UMAP + HDBSCAN clustering discovers 88 semantic themes
- Gemini Flash names clusters from icon names alone
- Interactive Plotly HTML visualizations (category map + cluster map)
- Quality test script for validating search results
- beads issue tracker initialized with follow-up items
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Test `lucide search` with a mock search DB
- Test `lucide search` with nonexistent DB returns error
- Verify cluster data is loaded into search DB by build_search_db
- 62 tests total, all passing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Adds embedding-based semantic search so users can find Lucide icons by natural language queries instead of exact name matching.
- search_icons("payment") → dollar-sign, banknote, receipt, credit-card...
- search_icons("a cozy cabin in the woods") → tent-tree, armchair, tree-deciduous...
- search_icons("ennui") → annoyed, meh, frown...
How it works
- Embeddings use asymmetric search_query:/search_document: prefixes
What's included
- search.py: Public API (search_icons(), search_available(), SearchResult)
- build_search.py: VLM + embedding pipeline with JSONL as durable intermediate
- build_clusters.py: HDBSCAN discovers 88 semantic themes, Gemini Flash names them
- lucide CLI: Unified subcommands (db, describe, build-search, search, cluster, version)
- icon_tags (12,619), icon_categories (3,309), icon_aliases (248) in main DB
Install & try
The optional extra keeps the base package lightweight: pip install python-lucide[search]
The search DB (~8MB) downloads on first use and is cached in ~/.cache/python-lucide/.
Test plan
🤖 Generated with Claude Code