
[Feature] Add grammar bundle generation API for PPL language features#5162

Open
mengweieric wants to merge 21 commits into opensearch-project:main from mengweieric:feature/grammar-bundle

@mengweieric (Collaborator) commented Feb 20, 2026

Description

Implements the backend grammar metadata API for PPL autocomplete support.

This endpoint serves a versioned grammar bundle containing the ANTLR metadata that downstream consumers (for example, OpenSearch Dashboards) need to reconstruct a functional PPL lexer and parser at runtime with the antlr4ng interpreter APIs. This enables full client-side parsing and autocomplete with no per-keystroke server calls, while the backend grammar remains the source of truth.

What the bundle contains:

  • Serialized lexer and parser ATNs via ATNSerializer.serialize().toArray(), compatible with antlr4ng ATNDeserializer.deserialize()
  • Token vocabulary (literal/symbolic names), lexer/parser rule names, channel names, and mode names for interpreter reconstruction
  • tokenDictionary, ignoredTokens, and rulesToVisit for autocomplete behavior
  • grammarHash (a SHA-256 hash over the ATNs, lexer/parser rule names, literal/symbolic vocabulary, and ANTLR version) for client-side change detection
  • bundleVersion and antlrVersion for compatibility validation
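A grammarHash like the one described above could be computed with the JDK's MessageDigest roughly as follows. This is a minimal sketch, assuming SHA-256 (as the bot walkthrough states) and feeding every serialized ATN value in as a full 32-bit int. GrammarHashSketch and computeGrammarHash are hypothetical names, and the field order, length prefixes, and the -1 marker for null names are illustrative assumptions, not the PR's actual encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/**
 * Hypothetical helper (not the PR's code): SHA-256 over ATNs, rule names, vocabulary and
 * ANTLR version. Field order, length prefixes and the null marker are assumptions.
 */
final class GrammarHashSketch {

  private GrammarHashSketch() {}

  static String computeGrammarHash(
      int[] lexerAtn,
      int[] parserAtn,
      List<String> lexerRuleNames,
      List<String> parserRuleNames,
      List<String> literalNames,
      List<String> symbolicNames,
      String antlrVersion) {
    MessageDigest digest;
    try {
      digest = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is not available", e);
    }
    updateInts(digest, lexerAtn);
    updateInts(digest, parserAtn);
    updateStrings(digest, lexerRuleNames);
    updateStrings(digest, parserRuleNames);
    updateStrings(digest, literalNames);
    updateStrings(digest, symbolicNames);
    updateString(digest, antlrVersion);

    // Render the 32-byte digest as 64 lowercase hex characters.
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  // Each value contributes all 4 bytes, so entries above 0xFFFF are never truncated.
  private static void updateInts(MessageDigest digest, int[] values) {
    ByteBuffer buf = ByteBuffer.allocate(4 * (values.length + 1));
    buf.putInt(values.length); // length prefix keeps section boundaries unambiguous
    for (int v : values) {
      buf.putInt(v);
    }
    digest.update(buf.array());
  }

  private static void updateStrings(MessageDigest digest, List<String> values) {
    digest.update(ByteBuffer.allocate(4).putInt(values.size()).array());
    for (String s : values) {
      updateString(digest, s);
    }
  }

  // literalNames/symbolicNames may contain nulls; -1 marks a null entry (assumption).
  private static void updateString(MessageDigest digest, String s) {
    if (s == null) {
      digest.update(ByteBuffer.allocate(4).putInt(-1).array());
      return;
    }
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
    digest.update(bytes);
  }
}
```

Writing each ATN value as four bytes matters because ATNSerializer.serialize() yields int values; hashing only the low 16 bits (for example, by casting each value to char) would make 0x00010041 and 0x00000041 indistinguishable.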

Backend behavior:

  • Synchronized lazy initialization (the bundle is built on the first request and cached for the lifetime of the node)
  • Deterministic output (the same plugin JAR produces the same bundle shape and content)
  • Endpoint marked @ExperimentalApi
  • The grammar endpoint now performs authorization through the PPL transport action before serving bundle data, consistent with the existing PPL permission model
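The synchronized lazy initialization above (which the bot walkthrough describes as double-checked locking) could look roughly like the following generic sketch. LazyBundleCache is a hypothetical name; in the PR the cached value would be the GrammarBundle, and invalidate() stands in for the invalidateCache test hook.

```java
import java.util.function.Supplier;

/** Hypothetical node-lifetime cache built on first use; not the PR's actual class. */
final class LazyBundleCache<T> {
  private final Supplier<T> builder;
  // volatile is required for double-checked locking to be safe under the Java memory model.
  private volatile T cached;

  LazyBundleCache(Supplier<T> builder) {
    this.builder = builder;
  }

  T get() {
    T result = cached; // first check: no lock on the hot path once built
    if (result == null) {
      synchronized (this) {
        result = cached; // second check: another thread may have built it already
        if (result == null) {
          result = builder.get(); // builds at most once until invalidate() is called
          cached = result;
        }
      }
    }
    return result;
  }

  /** Drops the cached value so the next get() rebuilds it (for example, from tests). */
  synchronized void invalidate() {
    cached = null;
  }
}
```

Without the volatile modifier, another thread could see a non-null reference before the bundle's construction is visible to it; with it, only the first request (or the first request after invalidation) pays the build cost.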

Also included:

  • REST API spec registration for ppl.grammar + YAML REST response-shape test
  • Unit tests for grammar REST handler (including authorization-failure path)
  • Security integration tests for grammar endpoint access with/without PPL permission
  • THIRD-PARTY updated to reflect ANTLR 4.13.2

Related Issues

Resolves #5218

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

coderabbitai bot (Contributor) commented Feb 20, 2026

Walkthrough

Adds a new GET endpoint /_plugins/_ppl/_grammar that serves a cached, serialized ANTLR grammar bundle; introduces GrammarBundle and PPLGrammarBundleBuilder, a RestPPLGrammarAction handler with tests, unit tests for the builder, and updates ANTLR third‑party metadata to 4.13.2.

Changes

  • Third‑party notices (THIRD-PARTY): Updated ANTLR version from 4.7.1 to 4.13.2 and copyright year range from 2012-2017 to 2012-2024 in license headers.
  • Plugin registration (plugin/src/main/java/org/opensearch/sql/plugin/SQLPlugin.java): Imported and registered RestPPLGrammarAction to expose the new PPL grammar endpoint.
  • REST handler (plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLGrammarAction.java): New REST handler serving GET /_plugins/_ppl/_grammar with lazy thread‑safe caching (double‑checked locking), JSON serialization via XContentBuilder, error handling (500), and test hooks (buildBundle, invalidateCache).
  • REST handler tests (plugin/src/test/java/org/opensearch/sql/plugin/rest/RestPPLGrammarActionTest.java): Unit tests for route/name, successful response structure, caching behavior, cache invalidation, and error path (500). Includes MockRestChannel to capture responses.
  • Grammar data model (ppl/src/main/java/org/opensearch/sql/ppl/autocomplete/GrammarBundle.java): New immutable Lombok @Value/@Builder container for serialized ANTLR grammar data (bundleVersion, antlrVersion, grammarHash, lexer/parser ATNs, rule names, channels, modes, vocabulary).
  • Bundle builder (ppl/src/main/java/org/opensearch/sql/ppl/autocomplete/PPLGrammarBundleBuilder.java): New builder that instantiates the ANTLR lexer/parser, serializes ATNs, extracts token/rule/channel/mode names, computes a SHA‑256 grammar hash, and constructs the GrammarBundle.
  • Builder tests (ppl/src/test/java/org/opensearch/sql/ppl/autocomplete/PPLGrammarBundleBuilderTest.java): Tests validating bundle contents, deterministic builds, hash format, start rule resolution, and non‑empty ATNs/names.
  • Manifest (pom.xml): Minor manifest changes related to dependency/metadata updates (ANTLR bump).

Sequence Diagram

sequenceDiagram
    participant Client
    participant Handler as RestPPLGrammarAction
    participant Cache
    participant Builder as PPLGrammarBundleBuilder
    participant Serializer as XContentBuilder

    Client->>Handler: GET /_plugins/_ppl/_grammar
    Handler->>Cache: check cached bundle
    alt cache hit
        Cache-->>Handler: return Bundle
    else cache miss
        Handler->>Builder: buildBundle()
        Builder->>Builder: inspect lexer/parser, serialize ATNs, compute hash
        Builder-->>Handler: Bundle
        Handler->>Cache: store Bundle
    end
    Handler->>Serializer: serialize Bundle to JSON
    Serializer-->>Handler: JSON payload
    Handler-->>Client: HTTP 200 + JSON

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

enhancement

Suggested reviewers

  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • penghuo
  • GumpacG
  • Swiddis
  • dai-chen
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 6.25%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The title directly describes the main feature being added: a grammar bundle generation API for PPL language autocomplete support, which aligns with the primary changeset focus.
  • Description check: ✅ Passed. The pull request description clearly outlines the implementation of a backend grammar metadata API for PPL autocomplete support, detailing bundle contents, backend behavior, and related updates.

mengweieric changed the title from "Feature/grammar bundle" to "[Feature] Add grammar bundle generation for PPL autocomplete features" on Feb 20, 2026
mengweieric force-pushed the feature/grammar-bundle branch from d288392 to eabe8ec on February 20, 2026 23:00
mengweieric added the PPL (Piped processing language) and feature labels on Feb 20, 2026
mengweieric changed the title from "[Feature] Add grammar bundle generation for PPL autocomplete features" to "[Feature] Add grammar bundle generation API for PPL language features" on Feb 20, 2026
mengweieric force-pushed the feature/grammar-bundle branch 3 times, most recently from 3f36846 to c838750 on February 23, 2026 06:49
mengweieric marked this pull request as ready for review on February 23, 2026 07:02
github-actions bot (Contributor) commented Mar 3, 2026

Persistent review updated to latest commit da140ef

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…y, and tests

- Hash full 32-bit ints in grammarHash to avoid collisions with ANTLR 4.13.2 ATN serialization
- Use RuntimeMetaData.getRuntimeVersion() instead of unreliable JAR manifest lookup
- Make GrammarBundle immutable with @Value instead of @Data
- Update THIRD-PARTY to reflect ANTLR 4.13.2
- Harden tests with JSON parsing and add antlrVersion assertion

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Assert ATN serialization version 4 for both lexer and parser to enforce
  antlr4ng compatibility contract
- Resolve startRuleIndex by looking up "root" rule name instead of hardcoding 0
- Fix MockRestChannel.bytesOutput() to return real BytesStreamOutput
- Document nullable elements in literalNames/symbolicNames Javadoc
- Rename test methods to follow the testXxx() convention used in the ppl and plugin modules

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
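The two checks in the commit above (asserting serialization version 4 for both ATNs, and resolving the start rule by the name "root" instead of hardcoding index 0) can be sketched as small helpers. BundleChecksSketch, requireAtnVersion, and resolveStartRuleIndex are hypothetical names, and the sketch assumes the version number is the first element of the serialized ATN, as in ANTLR 4.10+ serialization.

```java
import java.util.Arrays;

/** Hypothetical validation helpers for the checks described in the commit; not the PR's code. */
final class BundleChecksSketch {
  // Assumption: serialized ATN version 4, stored as the first element of the int array.
  static final int EXPECTED_ATN_VERSION = 4;

  private BundleChecksSketch() {}

  /** Fails fast if an ATN was serialized with a version antlr4ng may not deserialize. */
  static void requireAtnVersion(String label, int[] serializedAtn) {
    if (serializedAtn.length == 0 || serializedAtn[0] != EXPECTED_ATN_VERSION) {
      throw new IllegalStateException(label + " ATN has an unexpected serialization version");
    }
  }

  /** Looks up the start rule by name instead of assuming it is rule 0. */
  static int resolveStartRuleIndex(String[] ruleNames, String startRule) {
    int index = Arrays.asList(ruleNames).indexOf(startRule);
    if (index < 0) {
      throw new IllegalStateException("Start rule '" + startRule + "' not found");
    }
    return index;
  }
}
```

Resolving by name means a grammar change that reorders rules fails loudly at bundle-build time instead of silently pointing clients at the wrong start rule.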
Consistent with buildBundle(), which is also @VisibleForTesting protected.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
mengweieric force-pushed the feature/grammar-bundle branch from da140ef to 090171a on March 19, 2026 23:33
mengweieric requested a review from ahkcs as a code owner on March 19, 2026 23:33
github-actions bot (Contributor)

Persistent review updated to latest commit 090171a

mengweieric force-pushed the feature/grammar-bundle branch from 090171a to fbd8982 on March 19, 2026 23:34
github-actions bot (Contributor)

Persistent review updated to latest commit fbd8982

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
mengweieric force-pushed the feature/grammar-bundle branch from fbd8982 to 7cac6ab on March 20, 2026 00:29
github-actions bot (Contributor)

Persistent review updated to latest commit 7cac6ab

Labels

feature PPL Piped processing language

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add PPL Grammar Bundle API for Backend-Driven Autocomplete

5 participants