Add scripts for scraping more models by yuqiannemo · Pull Request #2 · Xtra-Computing/LLM-DNA

yuqiannemo · 2026-03-18T13:02:59Z

No description provided.

JerryLife

Overall looks good — a few items to address before merging:

JerryLife

Review

Thanks for adding the model scraping scripts! A few items to address before merging.

1. Output path — write to `configs/`

The pipeline stores model lists in configs/ following a {source}_llm_list naming convention:

Existing file	Purpose
`configs/llm_list.txt`	Generic model list (HuggingFace)
`configs/OpenRouter_llm_list.txt`	OpenRouter model list
`configs/llm_metadata.json`	Model metadata

The new scripts should follow the same convention. Suggested output paths:

closed_source_models.py → configs/openrouter_llm_list.jsonl
open_source_models.py → configs/huggingface_llm_list.jsonl

Use a path relative to the script's location so it works regardless of the working directory:

from pathlib import Path

ROOT = Path(__file__).resolve().parent.parent
output_path = ROOT / "configs" / "openrouter_llm_list.jsonl"

2. Add `if name == "main"` guards

Both scripts execute API calls and I/O at module level. This means they cannot be imported without side effects (e.g., for testing or reuse). Please wrap the execution logic:

def main():
    # ... existing logic ...

if __name__ == "__main__":
    main()

3. Add error handling for HTTP requests

closed_source_models.py lines 5–6 — the requests.get() call has no timeout, no status code check, and no exception handling. Suggested fix:

response = requests.get(url, timeout=30)
response.raise_for_status()

4. Declare dependencies

requests and huggingface_hub are not currently declared in the project's dependencies. Either:

Add them as optional dependencies in pyproject.toml (e.g., under [project.optional-dependencies]), or
Add a comment at the top of each script noting the required packages.

5. `modality` field is always `"unknown"`

closed_source_models.py line 43 — architecture.get("modality") does not exist in the OpenRouter API schema. The script already reads input_modalities and output_modalities; consider constructing a modality string from those, e.g.:

input_mods = architecture.get("input_modalities", [])
output_mods = architecture.get("output_modalities", [])
"modality": f"{'|'.join(input_mods)} -> {'|'.join(output_mods)}"

Summary of requested changes

#	Item	Priority
1	Write output to `configs/` with consistent naming (`openrouter_llm_list.jsonl`, `huggingface_llm_list.jsonl`)	Required
2	Add `if __name__ == "__main__"` guards	Required
3	Add timeout and `raise_for_status()` to HTTP request	Required
4	Declare `requests` / `huggingface_hub` dependencies	Required
5	Fix dead `modality` field	Minor

JerryLife · 2026-03-19T06:43:05Z

Noted that both scripts should be merged into dev branch. Not the main branch.

Add scripts for scraping more models

70ac3ae

JerryLife reviewed Mar 19, 2026

View reviewed changes

JerryLife self-assigned this Mar 19, 2026

JerryLife added the enhancement New feature or request label Mar 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add scripts for scraping more models#2

Add scripts for scraping more models#2
yuqiannemo wants to merge 1 commit intoXtra-Computing:mainfrom
yuqiannemo:dna_dev/more_models

yuqiannemo commented Mar 18, 2026

Uh oh!

JerryLife left a comment •

edited

Loading

Uh oh!

JerryLife left a comment •

edited

Loading

Uh oh!

JerryLife commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yuqiannemo commented Mar 18, 2026

Uh oh!

JerryLife left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JerryLife left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Review

1. Output path — write to configs/

2. Add if __name__ == "__main__" guards

3. Add error handling for HTTP requests

4. Declare dependencies

5. modality field is always "unknown"

Summary of requested changes

Uh oh!

JerryLife commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JerryLife left a comment •

edited

Loading

JerryLife left a comment •

edited

Loading

1. Output path — write to `configs/`

2. Add `if name == "main"` guards

5. `modality` field is always `"unknown"`