Add DINOv3 vision foundation models (ViT + ConvNeXt, 21M-6.7B)#116
Open

jimburtoft wants to merge 3 commits into aws-neuron:main

Conversation
Onboards Meta DINOv3 ViT and ConvNeXt backbones (21M-6.7B params) to Neuron. Two compilation paths: `torch_neuronx.trace()` for models up to 840M, and `neuronx-distributed` TP=4 for ViT-7B (the first encoder-only vision TP on Neuron). Accuracy: ViT cosine similarity 1.000000, ConvNeXt 0.999989. Performance: peak DP=4 throughput of 722.8 img/s (ViT-S); ViT-7B TP=4 at 38.8 img/s with 25.77 ms latency.
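The size-based routing between the two compilation paths can be sketched as below. This is an illustrative sketch, not the PR's actual code: `pick_compile_path`, the 840M threshold constant, and the wiring are hypothetical names, and the `--auto-cast` flags are the ones called out later under Key Design Decisions.

```python
# Illustrative sketch of the two compilation paths (names are hypothetical).
# The --auto-cast flags match the Key Design Decisions section of this PR.
COMPILER_ARGS = ["--auto-cast=matmult", "--auto-cast-type=bf16"]
TRACE_PARAM_LIMIT = 840_000_000  # models above this size go through tensor parallelism

def pick_compile_path(num_params: int) -> str:
    """Route a model to trace() or tensor-parallel ModelBuilder by parameter count."""
    if num_params <= TRACE_PARAM_LIMIT:
        return "torch_neuronx.trace"
    return "neuronx-distributed TP=4"

print(pick_compile_path(21_000_000))     # ViT-S  → torch_neuronx.trace
print(pick_compile_path(6_700_000_000))  # ViT-7B → neuronx-distributed TP=4
```

On a Neuron instance, the first path would hand `COMPILER_ARGS` to `torch_neuronx.trace(model, example_inputs, compiler_args=...)`; the routing itself is plain Python and runs anywhere.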
With `pretrained=False`, each call to `load_dinov3_model` returns different random weights. `compile_and_cache` must therefore receive the same CPU model used for the accuracy comparison rather than loading a new one internally.
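The pitfall above can be reproduced with a toy stand-in: the functions here are minimal fakes (random lists instead of real models), not the PR's implementations, but they show why reloading inside the compile helper breaks the accuracy comparison.

```python
import random

def load_dinov3_model(pretrained: bool = False):
    """Toy stand-in for the real loader: with pretrained=False every call
    returns freshly initialized (random) weights."""
    rng = random.Random()
    return [rng.gauss(0, 1) for _ in range(8)]  # toy "weights"

def compile_and_cache(cpu_model):
    """Correct pattern: receive the already-loaded CPU model instead of
    calling load_dinov3_model() again internally."""
    return list(cpu_model)  # toy "compiled" copy sharing the same weights

cpu_model = load_dinov3_model(pretrained=False)
neuron_model = compile_and_cache(cpu_model)
assert neuron_model == cpu_model  # same weights, so the accuracy check is meaningful

# Buggy pattern: a second load yields different random weights, so any
# CPU-vs-Neuron comparison would be comparing two unrelated models.
other = load_dinov3_model(pretrained=False)
assert other != cpu_model
```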
Description
NxDI contrib implementation for Meta DINOv3 self-supervised vision foundation models. Supports 7 model variants from 21M to 6.7B parameters across two architectures (ViT and ConvNeXt), using `torch_neuronx.trace()` for models up to 840M and tensor parallelism via `neuronx-distributed` ModelBuilder for ViT-7B (6.7B).

Key highlights:

- `torch_neuronx.trace()` for standard models, `neuronx-distributed` TP for the 6.7B model

Model Information
Model Name: DINOv3 (ViT-S/B/L/H+, ConvNeXt-T/B, ViT-7B)
Model Architecture: Encoder-only vision transformer / ConvNeXt backbone
Purpose: Dense feature extraction (self-supervised vision embeddings)
HuggingFace / Source: https://github.com/facebookresearch/dinov3
License: DINOv3 License (not Apache/MIT -- review before redistribution)
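As a back-of-envelope check on the ViT variants above, the encoder's sequence length follows from the patch grid. The 224x224 input and 16x16 patch size here are standard DINO-family defaults assumed for illustration (they are not stated in this PR, and DINOv3 additionally prepends register tokens, which this sketch ignores).

```python
# Assumed defaults for illustration: 224x224 input, 16x16 patches.
image_size, patch_size = 224, 16
patches_per_side = image_size // patch_size   # 14 patches along each side
num_patches = patches_per_side ** 2           # 196 patch tokens
seq_len = num_patches + 1                     # plus one [CLS] token
print(num_patches, seq_len)  # → 196 197
```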
Checklist
Required Components
- Accuracy Test (`test/integration/test_model.py`)
- README.md with the following sections (`dinov3` package)
- Source Code (`src/`)
  - `modeling_dinov3.py` (801 lines): model loading, trace compilation, TP ViT-7B definition, accuracy validation, benchmarking utilities

Optional Components
Folder Structure
Testing
Instance: trn2.3xlarge (ap-southeast-4), SDK 2.28, DLAMI 20260227
Test command:
```shell
source /opt/aws_neuronx_venv_pytorch_inference_vllm_0_13/bin/activate
git clone https://github.com/facebookresearch/dinov3.git /mnt/models/dinov3
python -m pytest contrib/models/DINOv3/test/integration/test_model.py -v
```

Test Results: 15/15 PASSED (129 seconds)
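For reference, the cosine-similarity metric behind the accuracy figures reported in this PR (ViT 1.000000, ConvNeXt 0.999989) can be sketched in plain Python; this is the standard definition, not the PR's exact validation code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# A Neuron output identical to the CPU reference scores 1.000000.
v = [0.3, -1.2, 0.7, 2.0]
print(f"{cosine_similarity(v, v):.6f}")  # → 1.000000
```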
Standalone validation:
Compatibility
Tested with:
Additional Information
Benchmark Results (trn2.3xlarge, LNC=2, DP=4)
GPU Comparison (A10G g5.xlarge vs trn2.3xlarge)
Neuron excels on ViT (transformer ops are well optimized), while the GPU excels on ConvNeXt (conv ops are heavily optimized on CUDA).
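As a sanity check on the ViT-7B numbers reported in the summary (38.8 img/s at 25.77 ms), throughput and latency are reciprocal when one image is processed per step, which this check assumes:

```python
# Assumes one image per inference step, so throughput = 1000 / latency_ms.
latency_ms = 25.77                  # reported ViT-7B TP=4 latency
throughput = 1000.0 / latency_ms    # images per second
print(round(throughput, 1))  # → 38.8, matching the reported figure
```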
Key Design Decisions
- `torch_neuronx.trace()` for models up to 840M, `neuronx-distributed` TP for ViT-7B
- `--auto-cast=matmult` is critical: 50-60% speedup for FP32 models with bf16 matmult autocast
- `pretrained=False` for tests: random weights for architecture validation (avoids large downloads in CI)

Known Limitations
By submitting this PR, I confirm that: