KISS: rules + ML + optional LLM, offline-ready by default. Unified title emoji cleanup, always-on deduplication, and HTML/Markdown/JSON output.
- Rules first, ML/semantic assist, optional LLM integration (auto-fallback on failure)
- Unified title cleaning to avoid stacked emoji prefixes
- Always-on deduplication for stable cross-browser export merging
- Output classification limited to two levels for cleaner results
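The title cleaning and deduplication described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the function names (`strip_leading_emoji`, `normalize_url`, `dedupe`) are hypothetical, and it assumes that duplicates are detected by normalized URL and that emoji prefixes are mostly Unicode "Symbol, other" characters.

```python
import unicodedata
from urllib.parse import urlsplit, urlunsplit

# Zero-width joiner and variation selector often appear inside emoji sequences.
_JOINERS = {"\u200d", "\ufe0f"}


def strip_leading_emoji(title: str) -> str:
    """Drop leading emoji/whitespace so prefixes never stack up (hypothetical helper)."""
    i = 0
    while i < len(title):
        ch = title[i]
        if ch.isspace() or ch in _JOINERS or unicodedata.category(ch) == "So":
            i += 1
        else:
            break
    cleaned = title[i:]
    # Keep the original title if it consisted only of emoji.
    return cleaned if cleaned else title


def normalize_url(url: str) -> str:
    """Assumed dedup key: lowercase scheme/host, no trailing slash, no fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe(bookmarks):
    """Keep the first (title, url) pair seen for each normalized URL."""
    seen = set()
    result = []
    for title, url in bookmarks:
        key = normalize_url(url)
        if key in seen:
            continue
        seen.add(key)
        result.append((strip_leading_emoji(title), url))
    return result
```

Keeping the first occurrence makes the result independent of which browser export contributed the later copies, which is one plausible way to get stable merges.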
python -m pip install --user pipx
python -m pipx ensurepath
pipx install .
Two commands are available after installation:
- cleanbook: command-line processing (equivalent to python main.py)
- cleanbook-wizard: interactive wizard experience
cleanbook -i examples/demo_bookmarks.html -o output
cleanbook -i "tests/input/*.html" --train
cleanbook-wizard
Common flags: --workers (parallel processing), --train (train the ML model), --no-ml (disable ML), --health-check (reachability check).
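To show how the flags listed above might combine, here is a rough argparse sketch. It is an assumption about how such a CLI could be declared, not the contents of the project's cli.py: the `build_parser` name, the integer type for --workers, and the `dest` names are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in this README."""
    parser = argparse.ArgumentParser(prog="cleanbook")
    parser.add_argument("-i", dest="input", help="input file or quoted glob pattern")
    parser.add_argument("-o", dest="output", help="output directory")
    # Assumed to take a worker count; the real flag may behave differently.
    parser.add_argument("--workers", type=int, default=1)
    parser.add_argument("--train", action="store_true", help="train the ML model")
    parser.add_argument("--no-ml", action="store_true", help="disable ML")
    parser.add_argument("--health-check", action="store_true", help="reachability check")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```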
Edit config.json to enable the optional LLM:
"llm": {
  "enable": true,
  "provider": "openai",
  "base_url": "https://api.openai.com",
  "model": "gpt-4o-mini",
  "api_key_env": "OPENAI_API_KEY"
}
Set the environment variable (PowerShell):
$env:OPENAI_API_KEY = "your_api_key"
Classification falls back to offline mode when the key is unset or the API call fails.
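The fallback behavior can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `llm_available` and `classify` are hypothetical names, and `llm_call` and `offline_call` stand in for whatever the real LLM and offline classifiers are. Only the config keys (`llm.enable`, `llm.api_key_env`) come from the example above.

```python
import os


def llm_available(config: dict, env=os.environ) -> bool:
    """True only if the LLM is enabled and the configured key variable is set."""
    llm = config.get("llm", {})
    if not llm.get("enable"):
        return False
    key_var = llm.get("api_key_env", "")
    return bool(key_var and env.get(key_var))


def classify(url: str, config: dict, llm_call, offline_call, env=os.environ) -> str:
    """Try the LLM first when available; fall back to offline on any failure."""
    if llm_available(config, env):
        try:
            return llm_call(url)
        except Exception:
            # Network errors, bad responses, etc. all fall through to offline.
            pass
    return offline_call(url)
```

Passing `env` explicitly keeps the check easy to exercise without touching the real process environment.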
When organizer.enable is on, a second LLM pass clusters, sorts, and summarizes the categories after classification.
.
├─ src/
│ ├─ cleanbook/ # Unified CLI wrapper
│ │ └─ cli.py
│ ├─ ai_classifier.py # Rules + ML + semantic + user profile + LLM
│ ├─ enhanced_classifier.py
│ ├─ enhanced_clean_tidy.py
│ ├─ bookmark_processor.py
│ ├─ emoji_cleaner.py # Title emoji cleaning
│ └─ ...
├─ models/ # Models & cache
├─ examples/
├─ docs/
├─ config.json
├─ main.py # Top-level entry
├─ pyproject.toml # Packaging & CLI entry points
└─ changelog/
- Local/Team: pipx install . for isolated global commands
- Open Source: GitHub Release with example data; optionally publish to PyPI
- Windows standalone: Optional PyInstaller single-file EXE
MIT — see LICENSE.