Add LeVo 2 (SongGeneration v2) contrib model #108
Open
jimburtoft wants to merge 2 commits into aws-neuron:main from
Three-stage text-to-music pipeline (LeLM AR + GPT2 diffusion + VAE) supporting v2-medium (2.83B) and v2-large (5.12B) via LeVo2Config. On-device KV cache via ModelBuilder, configurable batch size (B=1..N), GPT2 traced with --auto-cast none for fp32 diffusion accuracy. Validated on trn2.3xlarge (SDK 2.28): GPT2 cosine_sim=1.000, VAE cosine_sim=1.000, SNR=47.9dB, E2E 5s audio in 22.1s.
…ntainer
- Replace cosine similarity tests with neuron_allclose() from torch_neuronx.testing.validation (with torch.allclose fallback)
- Add Parameters field to README per contrib template requirements
- Set maintainer to @jimburtoft
- Remove duplicate GPT2 standalone test section
- Clean up unused F import
Description
LeVo 2 (SongGeneration v2) is a three-stage text-to-music pipeline that generates stereo 48kHz music with vocals from lyrics and text descriptions. This contribution adds Neuron support for both v2-medium (2.83B) and v2-large (5.12B) variants on Trainium2.
The pipeline compiles:
- ModelBuilder with on-device KV cache (torch.scatter in HBM)
- torch_neuronx.trace() with --auto-cast none (fp32 required for Euler solver accuracy)
- torch_neuronx.trace() with --auto-cast matmult
Model Information
Model Name: LeVo 2 (SongGeneration v2)
Model Architecture: Multi-stage pipeline: Dual-Llama autoregressive LM (primary + secondary with delayed codebook pattern) + GPT2-RoPE CFM diffusion backbone (16L) + Stable Audio VAE decoder
Purpose: Text-to-music generation (lyrics + text description → stereo 48kHz audio with vocals)
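The fp32 requirement for the diffusion stage comes from how a fixed-step Euler solver carries error forward: each step's velocity is computed from a state that already contains earlier rounding error. The following is a toy sketch of that mechanism only, in plain Python. The 2x2 matrix `A`, the starting point `x0`, the step count, and the helper names are all hypothetical; it does not use the GPT2 backbone or the Neuron compiler, and it is not expected to reproduce the accuracy figures quoted in this PR. Reduced precision is imitated by rounding the operands of each multiply to bfloat16.

```python
import math
import struct


def to_bf16(x):
    """Round a float to bfloat16 precision (round-to-nearest-even), returned as a float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add the rounding bias, then clear the 16 low bits that bfloat16 does not store.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Hypothetical linear velocity field dx/dt = A x (a stand-in, not the real model).
A = [[0.9, -1.3], [1.3, 0.9]]


def velocity(x, cast):
    """Matrix-vector product A @ x with every operand passed through `cast` first."""
    return [sum(cast(a) * cast(xi) for a, xi in zip(row, x)) for row in A]


def euler(x0, steps, cast):
    """Integrate from t=0 to t=1 with explicit Euler; return the state after each step."""
    h = 1.0 / steps
    x = list(x0)
    trajectory = []
    for _ in range(steps):
        v = velocity(x, cast)
        x = [xi + h * vi for xi, vi in zip(x, v)]
        trajectory.append(x)
    return trajectory


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


x0 = [1.0, 0.5]
full = euler(x0, steps=50, cast=float)    # operands left at full precision
low = euler(x0, steps=50, cast=to_bf16)   # operands rounded to bfloat16

# Per-step gap between the two trajectories; print it to see how it evolves.
errors = [max(abs(a - b) for a, b in zip(f, l)) for f, l in zip(full, low)]
similarity = cosine_similarity(full[-1], low[-1])
```

Printing `errors` shows how the gap between the two runs develops step by step. In a real pipeline the velocity field is a deep network rather than a fixed matrix, so the size of the effect can differ greatly from this toy.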
Checklist
Please ensure your PR includes the following items. Refer to the contrib/CONTRIBUTING.md for detailed guidelines.
Required Components
- Accuracy Test (ex. test/integration/test_model.py): neuron_allclose() from torch_neuronx.testing.validation for GPT2 and VAE accuracy comparison against CPU reference (atol=1e-3, rtol=1e-2)
- README.md with the following sections:
- Source Code (src/)
  - src/modeling_levo2.py (1947 lines): Unified pipeline class with LeVo2Config dataclass, v2_medium() / v2_large() factory methods, compile/save/load/warmup/generate/generate_timed API
  - src/__init__.py: Exports LeVo2Neuron, LeVo2Config
Optional Components
- test/unit/__init__.py present (placeholder for future unit tests)
Folder Structure
Confirm your contribution follows this structure:
Testing
How did you test this change?
All tests run on a trn2.3xlarge instance (LNC=2, 4 NeuronCores) with Neuron SDK 2.28 (DLAMI 20260227). The standalone test runner compiles all 4 pipeline stages from scratch (~20 min) and runs all accuracy + E2E + performance tests.
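The accuracy tests compare Neuron outputs against a CPU reference with atol=1e-3 and rtol=1e-2. As a minimal sketch, assuming the comparison uses the element-wise criterion that torch.allclose documents, |actual - expected| <= atol + rtol * |expected| (the exact behaviour of neuron_allclose() is not shown in this PR), the check can be written in plain Python. The helper names and sample values are made up for illustration and are not results from this PR.

```python
import math

ATOL = 1e-3  # absolute tolerance quoted in the checklist
RTOL = 1e-2  # relative tolerance quoted in the checklist


def within_tolerance(actual, expected, atol=ATOL, rtol=RTOL):
    """True if every pair satisfies |a - e| <= atol + rtol * |e|.

    Unlike torch.allclose, this sketch rejects any non-finite value outright.
    """
    if len(actual) != len(expected):
        return False
    return all(
        math.isfinite(a) and math.isfinite(e) and abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


def worst_violation(actual, expected, atol=ATOL, rtol=RTOL):
    """Largest amount by which any element exceeds its allowed error (<= 0 means pass)."""
    return max(abs(a - e) - (atol + rtol * abs(e)) for a, e in zip(actual, expected))


cpu_reference = [0.5, -1.25, 2.0, 0.0]            # made-up CPU outputs
neuron_output = [0.5004, -1.2450, 2.015, 0.0009]  # made-up device outputs

passed = within_tolerance(neuron_output, cpu_reference)
margin = worst_violation(neuron_output, cpu_reference)
```

Because the relative term scales with the reference value, elements near zero are effectively held to the absolute tolerance alone, while larger activations are allowed a proportionally larger error.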
Test Results:
Compatibility
Tested with:
Additional Information
- GPT2 is traced with --auto-cast none (fp32). The Euler ODE solver amplifies per-step rounding errors exponentially; using --auto-cast matmult causes cosine similarity to drop to 0.64 vs CPU, producing garbled audio.
- torch.scatter in register_buffer keeps the KV cache in Neuron HBM without PCIe round-trips during the 1000+ step AR loop.
Related Issues
N/A
vLLM Integration
By submitting this PR, I confirm that: