
Add LeVo 2 (SongGeneration v2) contrib model #108

Open
jimburtoft wants to merge 2 commits into aws-neuron:main from jimburtoft:contrib/levo2-songgeneration

Conversation


@jimburtoft jimburtoft commented Apr 6, 2026

Description

LeVo 2 (SongGeneration v2) is a three-stage text-to-music pipeline that generates stereo 48kHz music with vocals from lyrics and text descriptions. This contribution adds Neuron support for both v2-medium (2.83B) and v2-large (5.12B) variants on Trainium2.

The pipeline compiles:

  • LeLM (dual-Llama AR): via ModelBuilder with on-device KV cache (torch.scatter in HBM)
  • GPT2-RoPE diffusion backbone: via torch_neuronx.trace() with --auto-cast none (fp32 required for Euler solver accuracy)
  • Stable Audio VAE decoder: via torch_neuronx.trace() with --auto-cast matmult
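The fp32 requirement for the diffusion backbone can be illustrated with a toy Euler integration in plain PyTorch (a standalone sketch, not the model's actual solver or velocity field): per-step rounding in reduced precision drifts away from the fp32 trajectory as steps accumulate.

```python
import torch

def euler_integrate(x0, vfield, steps, dtype):
    # Fixed-step Euler: x <- x + dt * v(x), with state kept in `dtype`
    x = x0.to(dtype)
    dt = torch.tensor(1.0 / steps, dtype=dtype)
    for _ in range(steps):
        x = x + dt * vfield(x.float()).to(dtype)
    return x.float()

torch.manual_seed(0)
x0 = torch.randn(256)
v = lambda x: -0.5 * x  # toy linear velocity field (assumption)

ref = euler_integrate(x0, v, 100, torch.float32)
bf16 = euler_integrate(x0, v, 100, torch.bfloat16)
err = (ref - bf16).abs().max().item()
print(f"max fp32-vs-bf16 drift after 100 Euler steps: {err:.4f}")
```

Even on this benign linear field the reduced-precision trajectory measurably diverges; a 16-layer diffusion backbone over many solver steps compounds this, which is consistent with the `--auto-cast none` requirement noted below.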

Model Information

Model Name: LeVo 2 (SongGeneration v2)

Model Architecture: Multi-stage pipeline: Dual-Llama autoregressive LM (primary + secondary with delayed codebook pattern) + GPT2-RoPE CFM diffusion backbone (16L) + Stable Audio VAE decoder

Purpose: Text-to-music generation (lyrics + text description → stereo 48kHz audio with vocals)
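The delayed codebook pattern mentioned above can be sketched in plain PyTorch (a MusicGen-style delay; the function name and pad handling are illustrative assumptions, not the PR's implementation): codebook k is shifted right by k frames so the AR model predicts it conditioned on earlier codebooks of earlier frames.

```python
import torch

def apply_delay_pattern(codes: torch.Tensor, pad_id: int) -> torch.Tensor:
    """Shift codebook k right by k steps.
    codes: (num_codebooks, T) -> (num_codebooks, T + K - 1), padded with pad_id."""
    K, T = codes.shape
    out = torch.full((K, T + K - 1), pad_id, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

codes = torch.arange(8).reshape(2, 4)  # 2 codebooks, 4 frames
delayed = apply_delay_pattern(codes, pad_id=-1)
print(delayed)  # row 0 unshifted, row 1 shifted right by one frame
```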

Checklist

Please ensure your PR includes the following items. Refer to the contrib/CONTRIBUTING.md for detailed guidelines.

Required Components

  • Accuracy Test (ex. test/integration/test_model.py)

    • 6 test classes: TestCompilation (7 tests), TestGPT2Accuracy (2), TestVAEAccuracy (3), TestE2EGeneration (4), TestPerformance (2)
    • Uses neuron_allclose() from torch_neuronx.testing.validation for GPT2 and VAE accuracy comparison against CPU reference (atol=1e-3, rtol=1e-2)
    • Compiles all 4 pipeline stages on Neuron and runs full E2E generation
  • README.md with the following sections:

    • Usage Example: Complete code example for both v2-medium and v2-large (compile, load prompts, generate, save WAV)
    • Compatibility Matrix: trn2.3xlarge with SDK 2.28 validated for both variants
    • Example Checkpoints: Links to lglg666/SongGeneration-v2-medium, v2-large, and Runtime on HuggingFace
    • Testing Instructions: Full pytest and standalone runner commands with all required env vars
  • Source Code (src/)

    • src/modeling_levo2.py (1947 lines): Unified pipeline class with LeVo2Config dataclass, v2_medium()/v2_large() factory methods, compile/save/load/warmup/generate/generate_timed API
    • src/__init__.py: Exports LeVo2Neuron, LeVo2Config
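The accuracy-comparison pattern described above (neuron_allclose with the tolerances atol=1e-3, rtol=1e-2) can be sketched with a fallback to torch.allclose when torch_neuronx is unavailable; the helper name is hypothetical, and the real checks live in test/integration/test_model.py.

```python
import torch

def assert_close(neuron_out, cpu_ref, atol=1e-3, rtol=1e-2):
    """Prefer neuron_allclose when torch_neuronx is installed;
    fall back to torch.allclose otherwise (tolerances from this PR)."""
    try:
        from torch_neuronx.testing.validation import neuron_allclose
        ok = neuron_allclose(neuron_out, cpu_ref, atol=atol, rtol=rtol)
    except ImportError:
        ok = torch.allclose(neuron_out, cpu_ref, atol=atol, rtol=rtol)
    return bool(ok)

ref = torch.randn(4, 16)
assert assert_close(ref + 1e-5, ref)     # within atol
assert not assert_close(ref + 1.0, ref)  # clearly out of tolerance
```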

Optional Components

  • Unit Tests (CPU or Neuron-based)
    • test/unit/__init__.py present (placeholder for future unit tests)

Folder Structure

Confirm your contribution follows this structure:

/contrib/models/LeVo-2-SongGeneration/
  README.md
  /src
    __init__.py
    modeling_levo2.py
  /test
    __init__.py
    /unit
      __init__.py
    /integration
      __init__.py
      test_model.py

Testing

How did you test this change?

All tests run on a trn2.3xlarge instance (LNC=2, 4 NeuronCores) with Neuron SDK 2.28 (DLAMI 20260227). The standalone test runner compiles all 4 pipeline stages from scratch (~20 min) and runs all accuracy + E2E + performance tests.

Test Results:

[1/5] Building pipeline...
  PASS: Pipeline compiled and loaded

[2/5] Testing GPT2 accuracy...
  GPT2 neuron_allclose: True PASS

[3/5] Testing VAE accuracy...
  VAE neuron_allclose: True PASS
  VAE SNR: 47.9 dB PASS

[4/5] Testing E2E generation...
  Audio shape: torch.Size([1, 2, 240000])
  Audio range: [-1.4735, 1.3328]
  Audio std: 0.164790
  Audio RMS: 2780
  PASS: Audio is valid

[5/5] Performance results...
  LeLM: 21.7s (1327 steps, 16.4 ms/step)
  Diffusion: 0.281s
  VAE: 0.071s
  Total: 22.1s
  RTF: 4.42x

Compatibility

Tested with:

  • Neuron SDK Version(s): 2.28 (neuronx-cc 2.22, neuronx-distributed 0.16)
  • Instance Type(s): trn2.3xlarge (LNC=2)
  • PyTorch Version: 2.9.0
  • Python Version: 3.12

Additional Information

  • GPT2 diffusion backbone must use --auto-cast none (fp32). The Euler ODE solver amplifies per-step rounding errors exponentially — using --auto-cast matmult causes cosine similarity to drop to 0.64 vs CPU, producing garbled audio.
  • On-device KV cache via torch.scatter in register_buffer keeps the cache in Neuron HBM without PCIe round-trips during the 1000+ step AR loop.
  • Prefill optimization: 952-token conditioning prefix processed in one NEFF call before token-by-token decode.
  • Batch size is configurable (B=1..N with CFG doubling). B=2 delivers 1.44x throughput improvement per NeuronCore.
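The on-device KV cache update pattern (a preallocated register_buffer written with torch.scatter) can be sketched in plain PyTorch; class and method names here are illustrative, not the actual code in src/modeling_levo2.py.

```python
import torch

class KVCache(torch.nn.Module):
    """Preallocated KV cache updated with torch.scatter, so the buffers
    can stay resident in device HBM across autoregressive steps."""
    def __init__(self, max_len: int, n_heads: int, head_dim: int):
        super().__init__()
        self.register_buffer("k", torch.zeros(max_len, n_heads, head_dim))
        self.register_buffer("v", torch.zeros(max_len, n_heads, head_dim))

    def update(self, pos: torch.Tensor, k_new: torch.Tensor, v_new: torch.Tensor):
        # pos: (S,) write positions; k_new/v_new: (S, n_heads, head_dim)
        idx = pos.view(-1, 1, 1).expand_as(k_new)
        self.k = self.k.scatter(0, idx, k_new)
        self.v = self.v.scatter(0, idx, v_new)
        return self.k, self.v

cache = KVCache(max_len=8, n_heads=2, head_dim=4)
k_new = torch.ones(1, 2, 4)
cache.update(torch.tensor([3]), k_new, 2 * k_new)
print(cache.k[3, 0, 0].item(), cache.v[3, 0, 0].item())  # 1.0 2.0
```

Because the scatter is a functional update on a registered buffer, the cache write compiles into the traced graph and avoids host round-trips inside the decode loop.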

Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included

Three-stage text-to-music pipeline (LeLM AR + GPT2 diffusion + VAE)
supporting v2-medium (2.83B) and v2-large (5.12B) via LeVo2Config.

On-device KV cache via ModelBuilder, configurable batch size (B=1..N),
GPT2 traced with --auto-cast none for fp32 diffusion accuracy.

Validated on trn2.3xlarge (SDK 2.28): GPT2 cosine_sim=1.000,
VAE cosine_sim=1.000, SNR=47.9dB, E2E 5s audio in 22.1s.
@jimburtoft jimburtoft marked this pull request as draft April 6, 2026 20:02
…ntainer

- Replace cosine similarity tests with neuron_allclose() from
  torch_neuronx.testing.validation (with torch.allclose fallback)
- Add Parameters field to README per contrib template requirements
- Set maintainer to @jimburtoft
- Remove duplicate GPT2 standalone test section
- Clean up unused F import
@jimburtoft jimburtoft marked this pull request as ready for review April 6, 2026 20:17

@sdeeptan-aws sdeeptan-aws left a comment


LGTM

