Add Qwen2.5-VL-7B-Instruct full vision-language contrib model#110
Open
jimburtoft wants to merge 2 commits into aws-neuron:main from
Conversation
The existing Qwen2.5-VL-3B and VL-32B contrib src/ directories had issues:

- VL-3B: missing source files (mrope.py, config_qwen2vl.py), used 1D RoPE instead of M-RoPE, no vision encoder, 67% token match
- VL-32B: used 1D RoPE instead of M-RoPE, no vision encoder, 0% token match

The unified VL-7B implementation already supports all sizes (3B, 7B, 32B, 72B) via config-driven parameterization. Validated: 3B at 104.3 tok/s, 72B at 44.3 tok/s. Replace the broken src/test with redirect READMEs that provide size-specific TP guidance and point to the unified implementation.
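To illustrate why a 1D-RoPE implementation diverges from M-RoPE on vision tokens (but not on pure text), here is a minimal NumPy sketch of sectioned M-RoPE angles. The section sizes [16, 24, 24] come from the model description in this PR; the function name and structure are illustrative, not code from the contribution itself.

```python
import numpy as np

# Sketch of M-RoPE (multimodal rotary position embedding): the rotary
# frequency pairs are split into sections [16, 24, 24] (temporal, height,
# width), and each section takes its rotation angle from a different
# coordinate of a 3-D position id.
def mrope_angles(pos_thw, sections=(16, 24, 24), theta=10000.0):
    """pos_thw: (t, h, w) position ids for one token.
    Returns one rotation angle per frequency pair (sum(sections) total)."""
    dim = 2 * sum(sections)  # full rotary dim; each pair shares one angle
    inv_freq = 1.0 / theta ** (np.arange(0, dim, 2) / dim)
    angles = np.empty(sum(sections))
    start = 0
    for coord, n in zip(pos_thw, sections):
        angles[start:start + n] = coord * inv_freq[start:start + n]
        start += n
    return angles

# For text tokens t == h == w, so M-RoPE collapses to ordinary 1-D RoPE;
# for image patches the three coordinates differ and the angles diverge.
text = mrope_angles((5, 5, 5))
rope_1d = 5 * (1.0 / 10000.0 ** (np.arange(0, 128, 2) / 128))
assert np.allclose(text, rope_1d)
```

This is consistent with the symptoms above: a 1D-RoPE stand-in can still score a partial token match on text-heavy prompts while being wrong for multimodal positions.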
Contributor
Author
I've added a commit that cleans up the existing VL-3B and VL-32B directories.

What changed:

Why: All Qwen2.5-VL sizes share an identical architecture -- the only differences are numeric config parameters (hidden_size, num_layers, etc.) that are read from the HuggingFace config.

If you'd prefer to keep this cleanup as a separate PR, I'm happy to split it out. Just let me know.
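To make the config-driven claim concrete, here is a hypothetical sketch of how one implementation can cover every size. The numeric values below follow the publicly documented Qwen2.5 configs (and match the 28Q/4KV heads stated for 7B elsewhere in this PR), but they are my own summary table, not a file from this contribution.

```python
# Per-size hyperparameters as read from each checkpoint's HuggingFace
# config.json; one model implementation consumes any of these unchanged.
QWEN25_VL_CONFIGS = {
    "3B":  dict(hidden_size=2048, num_hidden_layers=36,
                num_attention_heads=16, num_key_value_heads=2),
    "7B":  dict(hidden_size=3584, num_hidden_layers=28,
                num_attention_heads=28, num_key_value_heads=4),
    "32B": dict(hidden_size=5120, num_hidden_layers=64,
                num_attention_heads=40, num_key_value_heads=8),
    "72B": dict(hidden_size=8192, num_hidden_layers=80,
                num_attention_heads=64, num_key_value_heads=8),
}

def gqa_group_size(cfg):
    # Number of query heads sharing each KV head under GQA.
    return cfg["num_attention_heads"] // cfg["num_key_value_heads"]

for size, cfg in QWEN25_VL_CONFIGS.items():
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    print(size, "head_dim:", head_dim, "gqa:", gqa_group_size(cfg))
```

Every size resolves to the same head_dim (128) and the same layer structure, which is why only the config numbers, not the code, differ between variants.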
Note: The below template includes items meant for model contributions only. For other contributions such as bug fixes, features, etc., only fill out the relevant portions of the form.
Description
Full vision-language implementation of Qwen2.5-VL-7B-Instruct on NxD Inference. Unlike the existing Qwen2.5-VL contrib entries (3B, 32B), which only support the text backbone, this implementation provides complete vision-language inference, including the vision encoder with windowed attention.
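The vision encoder mixes windowed and full attention across its blocks (28 windowed + 4 global, per the architecture notes below). A minimal sketch of that schedule, with illustrative block indexes that are assumptions rather than values read from this PR's source:

```python
# Hybrid attention schedule for the ViT encoder: most blocks attend only
# within fixed spatial windows; a few evenly spaced blocks use full
# (global) attention over all patches. Indexes here are illustrative.
NUM_BLOCKS = 32
GLOBAL_BLOCKS = {7, 15, 23, 31}  # assumed full-attention block positions

def attention_kind(block_idx: int) -> str:
    return "global" if block_idx in GLOBAL_BLOCKS else "windowed"

schedule = [attention_kind(i) for i in range(NUM_BLOCKS)]
assert schedule.count("windowed") == 28 and schedule.count("global") == 4
```

Windowed attention keeps per-block cost roughly linear in the number of patches, while the periodic global blocks let information mix across the whole image.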
NxDI has built-in support for qwen2_vl and qwen3_vl, but skipped the qwen2_5_vl generation entirely.

Key highlights:
Model Information
Model Name: Qwen2.5-VL-7B-Instruct
Model Architecture: Vision-Language model with ViT vision encoder + decoder-only transformer text backbone. GQA (28Q/4KV heads), M-RoPE [16,24,24], SwiGLU MLP. Vision encoder uses hybrid windowed (28 layers) + global (4 layers) attention with RMSNorm and Gated SwiGLU MLP.
Purpose: Vision-language inference (image understanding, image-to-text generation)
Checklist
Please ensure your PR includes the following items. Refer to the contrib/CONTRIBUTING.md for detailed guidelines.
Required Components
Accuracy Test (ex. test/integration/test_model.py)
README.md with the following sections:
Source Code (src/)

Optional Components

test/unit/ directory

Folder Structure
Confirm your contribution follows this structure:
Testing
How did you test this change?
All 7 integration tests were run on trn2.3xlarge (TP=4, LNC=2) with Neuron SDK 2.28. The 72B model was tested on trn2.48xlarge (TP=32). Tests include:
logit_validation() from neuronx_distributed_inference.experimental.core.accuracy.logit_validation -- all 8 tokens matched, max K5 error 0.0070 (threshold 0.01)

Test Results:
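The logit check above compares the Neuron outputs against a reference, token by token, over the top-K logits. This is a generic NumPy sketch of that kind of check; it is not the actual logit_validation API from neuronx_distributed_inference, whose signature may differ.

```python
import numpy as np

def topk_logit_error(ref_logits, test_logits, k=5):
    """Max absolute error over each token's reference top-k logits."""
    errs = []
    for ref, test in zip(ref_logits, test_logits):
        topk = np.argsort(ref)[-k:]  # reference top-k token ids
        errs.append(float(np.max(np.abs(ref[topk] - test[topk]))))
    return max(errs)

# Toy data: 8 generated tokens over a vocab of 32, with small perturbation
# standing in for numerical differences between CPU and Neuron execution.
rng = np.random.default_rng(0)
ref = rng.normal(size=(8, 32))
test = ref + rng.normal(scale=1e-3, size=ref.shape)
assert topk_logit_error(ref, test, k=5) < 0.01  # within the 0.01 threshold
```

A reported "max K5 error 0.0070 (threshold 0.01)" corresponds to this metric: the worst top-5 logit deviation across all validated positions.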
Compatibility
Tested with:
Additional Information
Performance (TP=4, trn2.3xlarge, optimized config)
Multi-size validation
NKI Kernel Compatibility (7B text decoder)
Known Limitations
fix/qwen3-vl-batch-size-gt1-v2

Related Issues
vLLM Integration
Validated on both vllm-neuron 0.4.1 and 0.5.0 (6/6 API tests passed on each). Patch scripts included (patch_vllm_qwen25vl.py for 0.4.1, patch_vllm_050_qwen25vl.py for 0.5.0). Patches add Qwen2.5-VL to 4 files: constants.py, model_loader.py, model_runner.py, and NxDI constants.py.

For vLLM integration details, see: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/onboarding-models.html#nxdi-onboarding-models-vllm
By submitting this PR, I confirm that: