Add Gemma4-E2B contrib model (text decoder + VLM) #115
Draft
jimburtoft wants to merge 1 commit into aws-neuron:main
NxDI implementation of google/gemma-4-E2B (2.3B effective params) with:
- Text decoder: PLE, KV cache sharing, heterogeneous SWA/global attention
- Vision encoder and VLM wrapper (compilation blocked by NCC_ITEN404)
- NxDI 0.7/0.8 compatibility helpers
- Integration tests (text-only)
Summary
google/gemma-4-E2B (2.3B effective params)

Model Architecture
Gemma4-E2B is a decoder-only transformer with several features that are new to NxDI, including PLE, KV cache sharing, and heterogeneous SWA/global attention.
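
The heterogeneous SWA/global attention listed in the commit description can be illustrated with a small mask sketch. This is not the code from this PR: the function names, the `layer_types` pattern, and the window size below are hypothetical and chosen only to show how a sliding-window layer and a global layer differ in which positions a token may attend to.

```python
from typing import List

Mask = List[List[bool]]


def causal_mask(seq_len: int) -> Mask:
    """Global attention: position i may attend to every position j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len: int, window: int) -> Mask:
    """SWA: position i may attend only to the last `window` positions,
    itself included, and never to future positions."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def layer_masks(layer_types: List[str], seq_len: int, window: int) -> List[Mask]:
    """Build one mask per layer from a per-layer type list.

    `layer_types` is a hypothetical config field; the real model config
    may name and encode this differently.
    """
    masks = []
    for kind in layer_types:
        if kind == "sliding":
            masks.append(sliding_window_mask(seq_len, window))
        elif kind == "global":
            masks.append(causal_mask(seq_len))
        else:
            raise ValueError(f"unknown layer type: {kind}")
    return masks


if __name__ == "__main__":
    # Assumed pattern and window size, for illustration only.
    for kind, mask in zip(["sliding", "global"], layer_masks(["sliding", "global"], 4, 2)):
        print(kind, [int(x) for x in mask[-1]])
```

The last row of each mask shows the difference: with a window of 2 the final token sees only itself and its predecessor in a sliding layer, while a global layer exposes the full prefix.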
Text Decoder Results (TP=1, batch=1, trn2.3xlarge)
Known Limitation
VLM (text + vision) compilation fails with NCC_ITEN404 in neuronx-cc 2.23. The error occurs in the TensorInitialization tensorizer pass when compiling the context encoding NEFF with vision inputs. Text-only inference is unaffected. The VLM code is included and architecturally complete, and is ready to enable once the compiler issue is resolved.

Files
- modeling_gemma4_e2b.py
- modeling_gemma4_e2b_vlm.py
- modeling_gemma4_vision.py
- ndxi_patch.py
- test_model.py

Testing
# On a trn2.3xlarge with model weights at /mnt/models/gemma-4-E2B/
pytest contrib/models/gemma-4-E2B/test/integration/test_model.py --capture=tee-sys
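
Because the integration test depends on weights being present at a fixed path, a small guard can make the suite skip cleanly on machines without them. The sketch below is an assumption about how that could be done, not something included in this PR: the `GEMMA4_E2B_PATH` environment variable and the `weights_available` helper are hypothetical names.

```python
import os

# Default taken from the test instructions above; the environment variable
# name is a hypothetical override, not something the PR defines.
MODEL_PATH = os.environ.get("GEMMA4_E2B_PATH", "/mnt/models/gemma-4-E2B/")


def weights_available(path: str) -> bool:
    """Return True if `path` is an existing, non-empty directory."""
    if not os.path.isdir(path):
        return False
    with os.scandir(path) as entries:
        # A single entry is enough to treat the directory as populated.
        return any(True for _ in entries)


if __name__ == "__main__":
    print(f"{MODEL_PATH}: weights available = {weights_available(MODEL_PATH)}")
```

In a test module this could be wired in with `pytest.mark.skipif(not weights_available(MODEL_PATH), reason="model weights not found")`, so the run reports a skip rather than a failure when the weights are missing.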