
Contrib: Solar Open 100B (upstage/Solar-Open-100B) on Trainium2#107

Open
jimburtoft wants to merge 2 commits into aws-neuron:main from jimburtoft:contrib/solar-open-100b

Conversation

@jimburtoft
Contributor

Adds NeuronX Distributed Inference support for upstage/Solar-Open-100B, a 102.6B-parameter Mixture-of-Experts model (128 routed + 1 shared expert per layer, top-8 sigmoid routing). The implementation includes custom weight loading for the HuggingFace safetensors format, a corrected YaRN RoPE implementation, and an optimized attention NKI kernel configuration that delivers a 34% TKG improvement over baseline.

Model Information

Model Name: Solar Open 100B
Model Architecture: Decoder-only MoE transformer (48 layers, 128+1 experts/layer, GQA 64Q/8KV, hidden_size=4096)
Purpose: Text generation (102.6B total params, 12B active per token)
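The routing scheme described above (sigmoid scoring over 128 routed experts, keep the top 8, plus one always-active shared expert) can be sketched as follows. Shapes and the per-token weight normalization are illustrative assumptions based on the description, not the PR's actual implementation:

```python
import torch

def route_tokens(hidden, router_w, num_top=8):
    """Sigmoid top-k routing sketch: hidden [T, H], router_w [H, E]."""
    scores = torch.sigmoid(hidden @ router_w)            # [T, E] per-expert affinity
    top_vals, top_idx = torch.topk(scores, num_top, -1)  # keep top-8 experts per token
    # Normalize the kept weights so they sum to 1 per token (a common
    # convention; the real model may normalize differently).
    weights = top_vals / top_vals.sum(-1, keepdim=True)
    return top_idx, weights

T, H, E = 4, 4096, 128
idx, w = route_tokens(torch.randn(T, H), torch.randn(H, E) * 0.02)
# Each token activates 8 of 128 routed experts; the shared expert's output
# is added unconditionally on top of the routed mixture.
```

This is why only ~12B of the 102.6B parameters are active per token: 8 routed experts plus the shared expert, out of 129 per layer.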

Checklist

Required Components

  • Accuracy Test (test/integration/test_model.py)
    • 4 integration tests: smoke test, logit accuracy (16 tokens, teacher forcing), CTE performance, TKG performance
    • Uses check_accuracy_logits_v2 with multi-tiered tolerances (top-5/50/1000/all)
    • Zero token divergence vs CPU HuggingFace reference
  • README.md with the following sections:
    • Usage Example: Complete code example with kernel-optimized configuration
    • Compatibility Matrix: Tested on trn2.48xlarge with SDK 2.28
    • Example Checkpoints: Link to HuggingFace model
    • Testing Instructions: Full instructions including CPU reference logit generation
  • Source Code (src/)
    • modeling_solar_open.py (1095 lines): Full inference implementation based on GPT-OSS/DeepSeek-V3 architecture
    • Custom weight loading for HF per-expert safetensors to fused format
    • Corrected YaRN RoPE implementation (ramp boundaries + interpolation formula)
    • GLU activation fix (glu_type="glu" not "swiglu")
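The per-expert-to-fused weight conversion called out above amounts to stacking each expert's gate and up projections into a single [E, H, 2*I] tensor. A hedged sketch; the checkpoint key names (`mlp.experts.{e}.gate_proj.weight`, etc.) are illustrative, not necessarily the model's actual layout:

```python
import torch

def fuse_expert_weights(state_dict, num_experts, hidden, inter):
    """Stack per-expert gate/up projections into a fused [E, H, 2*I] tensor."""
    fused = torch.empty(num_experts, hidden, 2 * inter)
    for e in range(num_experts):
        # HF linear layers store weights as [out, in] = [I, H];
        # transpose to [H, I] before concatenating along the last dim.
        gate = state_dict[f"mlp.experts.{e}.gate_proj.weight"].T
        up = state_dict[f"mlp.experts.{e}.up_proj.weight"].T
        fused[e] = torch.cat([gate, up], dim=-1)  # [H, 2*I]
    return fused
```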

Optional Components

  • Unit Tests (not included in this submission)

Folder Structure

/contrib/models/Solar-Open-100B/
  README.md
  /src
    __init__.py
    modeling_solar_open.py
  /test
    __init__.py
    /unit
      __init__.py
    /integration
      __init__.py
      test_model.py

Testing

How did you test this change?
Tested on trn2.48xlarge (us-east-2) with SDK 2.28 (torch-neuronx 2.9.0.2.12, NxDI 0.8.16251). All 4 tests pass with optimized kernel configuration (fused_qkv=True, qkv_kernel_enabled=True, qkv_nki_kernel_enabled=True).
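The settings behind these results can be collected in one place like the fragment below. The flag names come from this PR; how they map onto NxDI's config objects in a given SDK release is an assumption, so consult the PR's README for the exact usage:

```python
# Configuration used for the reported numbers (flag names from the PR;
# the exact NeuronConfig plumbing may differ across NxDI versions).
neuron_kwargs = {
    "tp_degree": 64,             # trn2.48xlarge
    "torch_dtype": "bfloat16",
    "seq_len": 4096,             # default raised from 128 in this PR
    "fused_qkv": True,
    "qkv_kernel_enabled": True,
    "qkv_nki_kernel_enabled": True,
}
```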
Test Results:

================================================================================
Solar Open 100B Integration Tests
================================================================================
1. Smoke Test...
PASS: Smoke test - Model loaded successfully
2. Logit Accuracy Test (CPU reference vs Neuron)...
  Reference logits shape: torch.Size([16, 1, 196608])
  Prompt: 'The capital of France is'
PASS: Logit accuracy validated (16 tokens)
3. CTE Performance Test...
  CTE latency: 1565.2 ms (avg of 5 runs)
PASS: CTE latency 1565.2 ms
4. TKG Performance Test...
  TKG latency: 12.0 ms (83.4 tok/s)
PASS: TKG latency 12.0 ms (83.4 tok/s)
================================================================================
All tests passed!
================================================================================

Compatibility

Tested with:

  • Neuron SDK Version(s): 2.28 (neuronxcc 2.23.6484)
  • Instance Type(s): trn2.48xlarge (tp=64)
  • PyTorch Version: 2.9.0
  • Python Version: 3.12

Additional Information

Performance Highlights

  • TKG: 11.83 ms / 84.5 tok/s (with attention NKI kernels)
  • CTE: 1,565 ms at seq_len=4096
  • 34% TKG improvement from attention NKI kernels vs baseline
  • Maximum seq_len: 32,768; Maximum batch: 4 at seq_len=4096
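The tok/s figures follow directly from the per-token decode (TKG) latency; a quick sanity check of the arithmetic:

```python
def tokens_per_second(tkg_latency_ms):
    # Single-stream decode throughput: one token per TKG step.
    return 1000.0 / tkg_latency_ms

print(round(tokens_per_second(11.83), 1))  # 84.5, matching the NKI-kernel figure
print(round(tokens_per_second(12.0), 1))   # 83.3; the test log's 83.4 comes from
                                           # the unrounded measured latency
```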

Known Limitations

  • MoE NKI kernels cannot be used (intermediate_size/tp = 1280/64 = 20, requires % 128 == 0)
  • CPU reference logits require transformers >= 5.0 (separate venv from NxDI)
  • seq_len > 32768 and batch > 4 at seq_len=4096 fail with serialization error
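The first limitation is a simple divisibility constraint on the per-core expert intermediate size; a sketch, with the multiple-of-128 requirement taken from the limitation as stated:

```python
def moe_kernel_supported(intermediate_size, tp_degree, multiple=128):
    """MoE NKI kernels need the sharded intermediate size to be a multiple of 128."""
    per_core = intermediate_size // tp_degree
    return per_core % multiple == 0

# Solar Open 100B at TP=64: 1280 / 64 = 20, not a multiple of 128
print(moe_kernel_supported(1280, 64))   # False
# By contrast, an intermediate size of 16384 at TP=64 gives 256, which qualifies
print(moe_kernel_supported(16384, 64))  # True
```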

Issues Fixed During Onboarding (5 total)

  1. hidden_act override: Config incorrectly defaulted to "sigmoid" instead of "silu"
  2. HF weight format: Per-expert safetensors needed conversion to fused [E, H, 2*I] format
  3. YaRN RoPE: Inverted ramp boundaries + wrong interpolation formula
  4. glu_type mismatch: Solar Open requires "glu" not "swiglu" (despite SiLU activation)
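For context on fix 3: YaRN blends position-interpolated and original (extrapolated) inverse frequencies using a linear ramp between two boundary indices, so swapping the boundaries or inverting the blend gives exactly the kind of bug described. A sketch following the widely used formulation (as in HuggingFace Transformers), not necessarily this PR's exact code:

```python
import torch

def linear_ramp(low, high, n):
    """0 below `low`, 1 above `high`, linear in between (boundary order matters)."""
    if low == high:
        high += 1e-3  # avoid division by zero
    ramp = (torch.arange(n, dtype=torch.float32) - low) / (high - low)
    return torch.clamp(ramp, 0.0, 1.0)

def yarn_inv_freq(base, dim, scale, low, high):
    freqs = base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    inv_extrap = 1.0 / freqs            # original RoPE frequencies
    inv_interp = 1.0 / (scale * freqs)  # position-interpolated frequencies
    mask = 1.0 - linear_ramp(low, high, dim // 2)
    # High-frequency dims (index < low) keep extrapolation; low-frequency
    # dims (index > high) use interpolation; the ramp blends in between.
    return inv_interp * (1.0 - mask) + inv_extrap * mask
```

With the boundaries inverted (`high < low`) the ramp runs backwards, which silently degrades long-context accuracy rather than failing loudly.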

Related Issues

N/A - New model contribution

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included

Commits

1. NeuronX Distributed Inference implementation of upstage/Solar-Open-100B,
   a 102.6B MoE model (128 routed + 1 shared expert, top-8 sigmoid routing).

   - TP=64 on trn2.48xlarge, BF16
   - Logit validation passes (check_accuracy_logits_v2, 16 tokens)
   - CTE: 341.7 ms, TKG: 10.2 ms (98 tok/s)
   - 5 architecture issues fixed during onboarding (hidden_act, weight format,
     YaRN RoPE, glu_type)

2. …G improvement)

   Enable fused_qkv, qkv_kernel_enabled, and qkv_nki_kernel_enabled for
   84.5 tok/s TKG (up from 55.8 baseline). Update README with kernel
   benchmark results, seq_len/batch sweep tables, and revised performance
   metrics. Increase default seq_len from 128 to 4096. All 4 integration
   tests pass on trn2.48xlarge with the new config.
