Add scripts for scraping more models#2
Add scripts for scraping more models#2yuqiannemo wants to merge 1 commit intoXtra-Computing:mainfrom
Conversation
There was a problem hiding this comment.
Review
Thanks for adding the model scraping scripts! A few items to address before merging.
1. Output path — write to configs/
The pipeline stores model lists in configs/ following a {source}_llm_list naming convention:
| Existing file | Purpose |
|---|---|
configs/llm_list.txt |
Generic model list (HuggingFace) |
configs/OpenRouter_llm_list.txt |
OpenRouter model list |
configs/llm_metadata.json |
Model metadata |
The new scripts should follow the same convention. Suggested output paths:
closed_source_models.py→configs/openrouter_llm_list.jsonlopen_source_models.py→configs/huggingface_llm_list.jsonl
Use a path relative to the script's location so it works regardless of the working directory:
from pathlib import Path
ROOT = Path(__file__).resolve().parent.parent
output_path = ROOT / "configs" / "openrouter_llm_list.jsonl"2. Add if __name__ == "__main__" guards
Both scripts execute API calls and I/O at module level. This means they cannot be imported without side effects (e.g., for testing or reuse). Please wrap the execution logic:
def main():
# ... existing logic ...
if __name__ == "__main__":
main()3. Add error handling for HTTP requests
closed_source_models.py lines 5–6 — the requests.get() call has no timeout, no status code check, and no exception handling. Suggested fix:
response = requests.get(url, timeout=30)
response.raise_for_status()4. Declare dependencies
requests and huggingface_hub are not currently declared in the project's dependencies. Either:
- Add them as optional dependencies in
pyproject.toml(e.g., under[project.optional-dependencies]), or - Add a comment at the top of each script noting the required packages.
5. modality field is always "unknown"
closed_source_models.py line 43 — architecture.get("modality") does not exist in the OpenRouter API schema. The script already reads input_modalities and output_modalities; consider constructing a modality string from those, e.g.:
input_mods = architecture.get("input_modalities", [])
output_mods = architecture.get("output_modalities", [])
"modality": f"{'|'.join(input_mods)} -> {'|'.join(output_mods)}"Summary of requested changes
| # | Item | Priority |
|---|---|---|
| 1 | Write output to configs/ with consistent naming (openrouter_llm_list.jsonl, huggingface_llm_list.jsonl) |
Required |
| 2 | Add if __name__ == "__main__" guards |
Required |
| 3 | Add timeout and raise_for_status() to HTTP request |
Required |
| 4 | Declare requests / huggingface_hub dependencies |
Required |
| 5 | Fix dead modality field |
Minor |
|
Noted that both scripts should be merged into |
No description provided.