
Add SongPrep-7B contrib model#118

Open
jimburtoft wants to merge 1 commit into aws-neuron:main from jimburtoft:contrib/songprep-7b

Conversation

@jimburtoft
Contributor

Note: the template below is meant for model contributions only. For other contributions, such as bug fixes or features, fill out only the relevant portions of the form.

Description

Two-stage pipeline for song structure parsing and lyrics transcription with timestamps. SongPrep-7B converts audio waveforms into structured lyrics with section labels ([verse], [chorus], etc.) and timestamps using a MuCodec audio encoder (329.5M params, FP32 Wav2Vec2-Conformer + RVQ) followed by a Qwen2 7B decoder (BF16).

Key implementation details:

  • MuCodec encoder uses a split pipeline: CPU MelSTFT preprocessing (torch.stft is not traceable because of its overlapping window strides) + the Neuron Conformer+RVQ backbone compiled via torch_neuronx.trace() with --auto-cast=matmult
  • Qwen2 decoder compiled via NxD Inference with on_device_sampling_config=None (extended vocabulary of 168,040 tokens exceeds on-device sampling NKI kernel limit)
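The split can be sketched roughly as follows. The parameter values, function names, and the commented trace call are illustrative assumptions, not the actual constants or code in modeling_songprep.py:

```python
import torch

N_FFT, HOP = 1024, 256  # assumed values; the real ones live in modeling_songprep.py

def mel_stft_cpu(waveform: torch.Tensor) -> torch.Tensor:
    """CPU-side STFT stage: torch.stft's overlapping window strides
    prevent Neuron tracing, so this part runs eagerly on the host."""
    window = torch.hann_window(N_FFT)
    spec = torch.stft(waveform, N_FFT, hop_length=HOP,
                      window=window, return_complex=True)
    return spec.abs()  # (n_fft // 2 + 1, n_frames) magnitude spectrogram

# The Conformer+RVQ backbone is then traced once with a fixed example
# input and handed to the Neuron compiler, along the lines of:
#   neuron_encoder = torch_neuronx.trace(
#       conformer_rvq, example_spectrogram,
#       compiler_args=["--auto-cast=matmult"],  # cast matmuls to BF16, keep the rest FP32
#   )
```

At run time the host spectrogram is fed to the traced backbone, giving the CPU-preprocessing + Neuron-backbone pattern described above.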

Model Information

Model Name: SongPrep-7B

Model Architecture: MuCodec audio encoder (Wav2Vec2-Conformer + 1-RVQ) + Qwen2 decoder (GQA, RoPE, SiLU)

Purpose: Audio-to-text: song structure parsing and lyrics transcription with timestamps

Checklist

Please ensure your PR includes the following items. Refer to contrib/CONTRIBUTING.md for detailed guidelines.

Required Components

  • Accuracy Test (ex. test/integration/test_model.py)

    • MuCodec encoder: codec token match rate (Neuron vs CPU)
    • Qwen2 decoder: token-level match with greedy decoding
    • End-to-end pipeline: structural validity and timing
    • Tests run on trn2.3xlarge
  • README.md with the following sections:

    • Usage Example: Step-by-step trace, compile, and run pipeline
    • Compatibility Matrix: trn2.3xlarge validated with SDK 2.27
    • Example Checkpoints: tencent/SongPrep-7B on HuggingFace
    • Testing Instructions: pytest command with environment variables
  • Source Code (src/)

    • modeling_songprep.py: MuCodec tracing, Qwen2 NxDI config, SongPrepPipeline class
    • Follows contrib folder hierarchy
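The end-to-end structural-validity check listed above could look roughly like this minimal sketch; the tag set, timestamp format, and regexes are illustrative assumptions, not the PR's actual test code:

```python
import re

# Assumed formats: section labels like [verse]/[chorus] plus
# [mm:ss.s]-style timestamps, per the PR description.
SECTION_TAG = re.compile(r"\[(intro|verse|chorus|bridge|outro)\]")
TIMESTAMP = re.compile(r"\[\d{2}:\d{2}(?:\.\d+)?\]")

def is_structurally_valid(output: str) -> bool:
    """A decoded transcript passes if it contains at least one
    section label and at least one timestamp."""
    return bool(SECTION_TAG.search(output)) and bool(TIMESTAMP.search(output))
```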

Optional Components

  • Unit Tests (CPU or Neuron-based)
    • Not included (unit/ directory created but empty)

Folder Structure

Confirm your contribution follows this structure:

/contrib/models/SongPrep-7B/
  README.md
  /src
    __init__.py
    modeling_songprep.py
  /test
    __init__.py
    /unit
      __init__.py
    /integration
      __init__.py
      test_model.py

Testing

How did you test this change?

All tests were run on a trn2.3xlarge instance (LNC=2, 4 logical cores) in sa-east-1 with Neuron SDK 2.27 and the Deep Learning AMI Neuron (Ubuntu 24.04).

Test Results:

  • MuCodec encoder: 96.8% codec token match (Neuron vs CPU, 250 tokens from 10s audio)
  • Qwen2 decoder: 100% token match (first 200 tokens, greedy decoding, Neuron vs CPU BF16)
  • MuCodec latency: 89-244ms for 10-60s audio (112-246x realtime)
  • Qwen2 throughput: 21-26 tok/s
  • End-to-end: structurally valid output with section tags and timestamps
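The two headline metrics above reduce to simple ratios; this sketch shows how they are computed (the helper names are ours, not the test suite's):

```python
def token_match_rate(neuron_tokens, cpu_tokens):
    """Fraction of positions where the Neuron output equals the CPU reference."""
    assert len(neuron_tokens) == len(cpu_tokens)
    return sum(a == b for a, b in zip(neuron_tokens, cpu_tokens)) / len(cpu_tokens)

def realtime_factor(audio_seconds, latency_ms):
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / (latency_ms / 1000.0)

# 242 of 250 matching codec tokens -> 0.968 (the 96.8% above);
# 10 s of audio in 89 ms -> ~112x realtime, 60 s in 244 ms -> ~246x.
```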

Compatibility

Tested with:

  • Neuron SDK Version(s): 2.27
  • Instance Type(s): trn2.3xlarge
  • PyTorch Version: 2.9
  • Python Version: 3.12

Additional Information

This is the first contrib model to use torch_neuronx.trace() for a component (MuCodec encoder) alongside NxD Inference for the decoder. The split pipeline pattern (CPU preprocessing + Neuron backbone) may be useful for other audio/speech models with non-traceable preprocessing stages.

Known limitation: the SongPrep source repository must be cloned separately for MuCodec model definitions (not packaged as a pip-installable library).

Related Issues

None

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

vLLM integration is blocked by the extended vocabulary exceeding the on-device sampling NKI kernel limit. NxD Inference direct mode is used instead.
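Under NxD Inference direct mode, disabling on-device sampling is a single config choice. A sketch, assuming the current NxDI config module path and parameter names (treat both as assumptions; the actual config lives in modeling_songprep.py):

```python
import torch
# Assumed import path per recent NxD Inference releases.
from neuronx_distributed_inference.models.config import NeuronConfig

# With the 168,040-token extended vocabulary past the on-device sampling
# NKI kernel limit, leaving on_device_sampling_config unset falls back
# to host-side sampling.
neuron_config = NeuronConfig(
    torch_dtype=torch.bfloat16,      # decoder runs in BF16
    on_device_sampling_config=None,  # sample on the host instead of the NKI kernel
)
```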


By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included

