
Add LeVo 2 (SongGeneration v2) contrib model #108

Open
jimburtoft wants to merge 2 commits into aws-neuron:main from jimburtoft:contrib/levo2-songgeneration

Conversation


@jimburtoft jimburtoft commented Apr 6, 2026

Description

LeVo 2 (SongGeneration v2) is a three-stage text-to-music pipeline that generates stereo 48kHz music with vocals from lyrics and text descriptions. This contribution adds Neuron support for both v2-medium (2.83B) and v2-large (5.12B) variants on Trainium2.

The pipeline compiles:

  • LeLM (dual-Llama AR): via ModelBuilder with on-device KV cache (torch.scatter in HBM)
  • GPT2-RoPE diffusion backbone: via torch_neuronx.trace() with --auto-cast none (fp32 required for Euler solver accuracy)
  • Stable Audio VAE decoder: via torch_neuronx.trace() with --auto-cast matmult
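The fp32 requirement for the diffusion backbone can be illustrated with a toy Euler integration in plain PyTorch (a standalone sketch, not the model's actual solver or velocity field): per-step rounding in reduced precision drifts away from the fp32 trajectory as steps accumulate.

```python
import torch

def euler_integrate(x0, vfield, steps, dtype):
    # Fixed-step Euler: x <- x + dt * v(x), with state kept in `dtype`
    x = x0.to(dtype)
    dt = torch.tensor(1.0 / steps, dtype=dtype)
    for _ in range(steps):
        x = x + dt * vfield(x.float()).to(dtype)
    return x.float()

torch.manual_seed(0)
x0 = torch.randn(256)
v = lambda x: -0.5 * x  # toy linear velocity field (assumption)

ref = euler_integrate(x0, v, 100, torch.float32)
bf16 = euler_integrate(x0, v, 100, torch.bfloat16)
err = (ref - bf16).abs().max().item()
print(f"max fp32-vs-bf16 drift after 100 Euler steps: {err:.4f}")
```

Even on this benign linear field the reduced-precision trajectory measurably diverges; a 16-layer diffusion backbone over many solver steps compounds this, which is consistent with the `--auto-cast none` requirement noted below.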

Model Information

Model Name: LeVo 2 (SongGeneration v2)

Model Architecture: Multi-stage pipeline: Dual-Llama autoregressive LM (primary + secondary with delayed codebook pattern) + GPT2-RoPE CFM diffusion backbone (16L) + Stable Audio VAE decoder

Purpose: Text-to-music generation (lyrics + text description → stereo 48kHz audio with vocals)
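The delayed codebook pattern mentioned above can be sketched in plain PyTorch (a MusicGen-style delay; the function name and pad handling are illustrative assumptions, not the PR's implementation): codebook k is shifted right by k frames so the AR model predicts it conditioned on earlier codebooks of earlier frames.

```python
import torch

def apply_delay_pattern(codes: torch.Tensor, pad_id: int) -> torch.Tensor:
    """Shift codebook k right by k steps.
    codes: (num_codebooks, T) -> (num_codebooks, T + K - 1), padded with pad_id."""
    K, T = codes.shape
    out = torch.full((K, T + K - 1), pad_id, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

codes = torch.arange(8).reshape(2, 4)  # 2 codebooks, 4 frames
delayed = apply_delay_pattern(codes, pad_id=-1)
print(delayed)  # row 0 unshifted, row 1 shifted right by one frame
```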

Checklist

Please ensure your PR includes the following items. Refer to the contrib/CONTRIBUTING.md for detailed guidelines.

Required Components

  • Accuracy Test (ex. test/integration/test_model.py)

    • 6 test classes: TestCompilation (7 tests), TestGPT2Accuracy (2), TestVAEAccuracy (3), TestE2EGeneration (4), TestPerformance (2)
    • Uses neuron_allclose() from torch_neuronx.testing.validation for GPT2 and VAE accuracy comparison against CPU reference (atol=1e-3, rtol=1e-2)
    • Compiles all 4 pipeline stages on Neuron and runs full E2E generation
  • README.md with the following sections:

    • Usage Example: Complete code example for both v2-medium and v2-large (compile, load prompts, generate, save WAV)
    • Compatibility Matrix: trn2.3xlarge with SDK 2.28 validated for both variants
    • Example Checkpoints: Links to lglg666/SongGeneration-v2-medium, v2-large, and Runtime on HuggingFace
    • Testing Instructions: Full pytest and standalone runner commands with all required env vars
  • Source Code (src/)

    • src/modeling_levo2.py (1947 lines): Unified pipeline class with LeVo2Config dataclass, v2_medium()/v2_large() factory methods, compile/save/load/warmup/generate/generate_timed API
    • src/__init__.py: Exports LeVo2Neuron, LeVo2Config
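The accuracy-comparison pattern described above (neuron_allclose with the tolerances atol=1e-3, rtol=1e-2) can be sketched with a fallback to torch.allclose when torch_neuronx is unavailable; the helper name is hypothetical, and the real checks live in test/integration/test_model.py.

```python
import torch

def assert_close(neuron_out, cpu_ref, atol=1e-3, rtol=1e-2):
    """Prefer neuron_allclose when torch_neuronx is installed;
    fall back to torch.allclose otherwise (tolerances from this PR)."""
    try:
        from torch_neuronx.testing.validation import neuron_allclose
        ok = neuron_allclose(neuron_out, cpu_ref, atol=atol, rtol=rtol)
    except ImportError:
        ok = torch.allclose(neuron_out, cpu_ref, atol=atol, rtol=rtol)
    return bool(ok)

ref = torch.randn(4, 16)
assert assert_close(ref + 1e-5, ref)     # within atol
assert not assert_close(ref + 1.0, ref)  # clearly out of tolerance
```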

Optional Components

  • Unit Tests (CPU or Neuron-based)
    • test/unit/__init__.py present (placeholder for future unit tests)

Folder Structure

Confirm your contribution follows this structure:

/contrib/models/LeVo-2-SongGeneration/
  README.md
  /src
    __init__.py
    modeling_levo2.py
  /test
    __init__.py
    /unit
      __init__.py
    /integration
      __init__.py
      test_model.py

Testing

How did you test this change?

All tests run on a trn2.3xlarge instance (LNC=2, 4 NeuronCores) with Neuron SDK 2.28 (DLAMI 20260227). The standalone test runner compiles all 4 pipeline stages from scratch (~20 min) and runs all accuracy + E2E + performance tests.

Test Results:

[1/5] Building pipeline...
  PASS: Pipeline compiled and loaded

[2/5] Testing GPT2 accuracy...
  GPT2 neuron_allclose: True PASS

[3/5] Testing VAE accuracy...
  VAE neuron_allclose: True PASS
  VAE SNR: 47.9 dB PASS

[4/5] Testing E2E generation...
  Audio shape: torch.Size([1, 2, 240000])
  Audio range: [-1.4735, 1.3328]
  Audio std: 0.164790
  Audio RMS: 2780
  PASS: Audio is valid

[5/5] Performance results...
  LeLM: 21.7s (1327 steps, 16.4 ms/step)
  Diffusion: 0.281s
  VAE: 0.071s
  Total: 22.1s
  RTF: 4.42x

Compatibility

Tested with:

  • Neuron SDK Version(s): 2.28 (neuronx-cc 2.22, neuronx-distributed 0.16)
  • Instance Type(s): trn2.3xlarge (LNC=2)
  • PyTorch Version: 2.9.0
  • Python Version: 3.12

Additional Information

  • GPT2 diffusion backbone must use --auto-cast none (fp32). The Euler ODE solver amplifies per-step rounding errors exponentially — using --auto-cast matmult causes cosine similarity to drop to 0.64 vs CPU, producing garbled audio.
  • On-device KV cache via torch.scatter in register_buffer keeps the cache in Neuron HBM without PCIe round-trips during the 1000+ step AR loop.
  • Prefill optimization: 952-token conditioning prefix processed in one NEFF call before token-by-token decode.
  • Batch size is configurable (B=1..N with CFG doubling). B=2 delivers 1.44x throughput improvement per NeuronCore.
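The on-device KV cache update pattern (a preallocated register_buffer written with torch.scatter) can be sketched in plain PyTorch; class and method names here are illustrative, not the actual code in src/modeling_levo2.py.

```python
import torch

class KVCache(torch.nn.Module):
    """Preallocated KV cache updated with torch.scatter, so the buffers
    can stay resident in device HBM across autoregressive steps."""
    def __init__(self, max_len: int, n_heads: int, head_dim: int):
        super().__init__()
        self.register_buffer("k", torch.zeros(max_len, n_heads, head_dim))
        self.register_buffer("v", torch.zeros(max_len, n_heads, head_dim))

    def update(self, pos: torch.Tensor, k_new: torch.Tensor, v_new: torch.Tensor):
        # pos: (S,) write positions; k_new/v_new: (S, n_heads, head_dim)
        idx = pos.view(-1, 1, 1).expand_as(k_new)
        self.k = self.k.scatter(0, idx, k_new)
        self.v = self.v.scatter(0, idx, v_new)
        return self.k, self.v

cache = KVCache(max_len=8, n_heads=2, head_dim=4)
k_new = torch.ones(1, 2, 4)
cache.update(torch.tensor([3]), k_new, 2 * k_new)
print(cache.k[3, 0, 0].item(), cache.v[3, 0, 0].item())  # 1.0 2.0
```

Because the scatter is a functional update on a registered buffer, the cache write compiles into the traced graph and avoids host round-trips inside the decode loop.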

Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included

Three-stage text-to-music pipeline (LeLM AR + GPT2 diffusion + VAE)
supporting v2-medium (2.83B) and v2-large (5.12B) via LeVo2Config.

On-device KV cache via ModelBuilder, configurable batch size (B=1..N),
GPT2 traced with --auto-cast none for fp32 diffusion accuracy.

Validated on trn2.3xlarge (SDK 2.28): GPT2 cosine_sim=1.000,
VAE cosine_sim=1.000, SNR=47.9dB, E2E 5s audio in 22.1s.
@jimburtoft jimburtoft marked this pull request as draft April 6, 2026 20:02
…ntainer

- Replace cosine similarity tests with neuron_allclose() from
  torch_neuronx.testing.validation (with torch.allclose fallback)
- Add Parameters field to README per contrib template requirements
- Set maintainer to @jimburtoft
- Remove duplicate GPT2 standalone test section
- Clean up unused F import
@jimburtoft jimburtoft marked this pull request as ready for review April 6, 2026 20:17

@sdeeptan-aws sdeeptan-aws left a comment


LGTM

