feat(derivation): batch verification, L1 reorg detection, and rollback-based recovery #907
Open: curryxbo wants to merge 10 commits into main from feat/derivation-batch-verification-reorg-detection
8e559f6 feat(derivation): add batch verification and L1 reorg detection
0c48f5c fix(derivation): address code review feedback
f9a2578 fix(derivation): address round-2 review feedback
675e8fb fix(derivation): prevent derivation height advancing when L1 block re…
fb3e1ce fix(derivation): address round-3 review — halt on mismatch, optimize …
3c960cf feat(derivation): add halted gauge metric for production alerting
ca81ed7 fix(derivation): guard against nil lastHeader panic on empty batch, f…
182b8d1 fix(derivation): add nil check for lastHeader on re-derive path, docu…
46edfa1 docs(derivation): add language tags to fenced code blocks (MD040)
a45705c refactor(derivation): split new logic into verify.go and reorg.go
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,163 @@ | ||
| # Derivation Refactor: Batch Verification & L1 Reorg Detection | ||
|
|
||
| ## Background | ||
|
|
||
| The derivation module is the core component that syncs L2 state from L1 batch data. Previously it only ran on validator nodes and used a challenge mechanism when state mismatches were detected. This refactor makes two fundamental changes: | ||
|
|
||
| 1. **L1 batch data is the source of truth** — when local L2 blocks don't match L1 batch data, roll back and re-derive from L1 instead of issuing a challenge. | ||
| 2. **Support `latest` mode** for fetching L1 batches (instead of only `finalized`), with L1 reorg detection to handle the reduced confirmation window. | ||
|
|
||
| ## Design Principles | ||
|
|
||
| - **L2 rollback is only triggered by batch data mismatch**, never by L1 reorg alone. | ||
| - L1 reorg → clean up DB → re-derive from reorg point → batch comparison decides if L2 needs rollback. | ||
| - Most L1 reorgs just re-include the same batch tx in a different block — L2 stays valid. | ||
| - **Derivation can run as a verification thread** — when blocks already exist locally (e.g. produced by sequencer), derivation compares them against L1 batch data instead of skipping. | ||
|
|
||
| ## What Changed | ||
|
|
||
| ### Removed | ||
|
|
||
| | Item | Reason | | ||
| |------|--------| | ||
| | `validator` field in `Derivation` struct | Challenge mechanism removed | | ||
| | `validator.Validator` parameter in `NewDerivationClient()` | No longer needed | | ||
| | `ChallengeState` / `ChallengeEnable` logic in `derivationBlock()` | Replaced by rollback + re-derive | | ||
| | `validator` import in `node/cmd/node/main.go` | No longer referenced | | ||
|
|
||
| ### Added — L1 Reorg Detection | ||
|
|
||
| When `confirmations` is not `finalized` (i.e. using `latest` or `safe`), each derivation loop checks recent L1 blocks for hash changes before processing new batches. | ||
|
|
||
| **New DB layer** (`node/db/`): | ||
|
|
||
| - `DerivationL1Block` struct — stores `{Number, Hash}` per L1 block | ||
| - `WriteDerivationL1Block` / `ReadDerivationL1Block` / `ReadDerivationL1BlockRange` / `DeleteDerivationL1BlocksFrom` | ||
| - DB key prefix: `derivL1Block` + uint64 big-endian height | ||
|
|
||
| **New config** (`node/derivation/config.go`): | ||
|
|
||
| - `ReorgCheckDepth uint64` — how many recent L1 blocks to verify each loop (default: 64) | ||
| - CLI flag: `--derivation.reorgCheckDepth` / env `MORPH_NODE_DERIVATION_REORG_CHECK_DEPTH` | ||
|
|
||
| **New methods** (`node/derivation/derivation.go`): | ||
|
|
||
| | Method | Purpose | | ||
| |--------|---------| | ||
| | `detectReorg(ctx)` | Iterates recent L1 block hashes from DB, compares against current L1 chain. Returns the height where a mismatch is found, or nil. | | ||
| | `handleL1Reorg(height)` | Cleans DB records from the reorg point and resets `latestDerivationL1Height`. Does NOT roll back L2; the next derivation loop re-fetches batches and the normal comparison logic decides. | | ||
| | `recordL1Blocks(ctx, from, to)` | After each derivation round, records L1 block hashes for the processed range. | | ||
|
|
||
| **Flow**: | ||
|
|
||
| ```text | ||
| derivationBlock() loop start | ||
| │ | ||
| ├─ [if not finalized] detectReorg() | ||
| │ ├─ no reorg → continue | ||
| │ └─ reorg at height X → handleL1Reorg(X) | ||
| │ ├─ DeleteDerivationL1BlocksFrom(X) | ||
| │ ├─ WriteLatestDerivationL1Height(X-1) | ||
| │ └─ return (next loop re-processes from X) | ||
| │ | ||
| ├─ fetch CommitBatch logs from L1 | ||
| ├─ process each batch → derive() + verifyBatchRoots() | ||
| ├─ recordL1Blocks(start, end) | ||
| └─ WriteLatestDerivationL1Height(end) | ||
| ``` | ||
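
The reorg branch of the flow above can be sketched against an in-memory store. Everything here is illustrative: `l1Store`, the string hashes, and the `chainHash` lookup are hypothetical stand-ins for the node DB and the L1 client, and returning the lowest mismatching height is an assumption about how `detectReorg` picks the reorg point.

```go
package main

import "fmt"

// l1Store is a hypothetical in-memory stand-in for the node DB.
type l1Store struct {
	hashes       map[uint64]string // L1 height -> recorded block hash
	latestHeight uint64            // plays the role of latestDerivationL1Height
}

// detectReorg compares the recorded hashes of the last `depth` heights
// against the current chain and returns the lowest mismatch, or nil.
func detectReorg(s *l1Store, chainHash func(uint64) string, depth uint64) *uint64 {
	from := uint64(0)
	if s.latestHeight >= depth {
		from = s.latestHeight - depth + 1
	}
	for h := from; h <= s.latestHeight; h++ {
		recorded, ok := s.hashes[h]
		if !ok {
			continue // nothing recorded at this height
		}
		if recorded != chainHash(h) {
			height := h
			return &height
		}
	}
	return nil
}

// handleL1Reorg drops records from the reorg point onward and rewinds the
// derivation height to height-1. It deliberately does not touch L2.
func handleL1Reorg(s *l1Store, height uint64) {
	for h := range s.hashes {
		if h >= height {
			delete(s.hashes, h)
		}
	}
	if height > 0 {
		s.latestHeight = height - 1
	} else {
		s.latestHeight = 0
	}
}

func main() {
	s := &l1Store{
		hashes:       map[uint64]string{100: "0xa", 101: "0xb", 102: "0xc"},
		latestHeight: 102,
	}
	// The canonical chain now has different blocks at 101 and 102.
	chain := map[uint64]string{100: "0xa", 101: "0xb2", 102: "0xc2"}
	lookup := func(h uint64) string { return chain[h] }

	if at := detectReorg(s, lookup, 64); at != nil {
		handleL1Reorg(s, *at)
		fmt.Println("reorg at", *at, "next loop resumes after", s.latestHeight)
	}
}
```

This is also a cheap way to exercise the "modify a saved L1 block hash" test scenario without a live L1 node.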
|
|
||
| ### Added — Batch Data Verification | ||
|
|
||
| When `derive()` encounters an L2 block that already exists locally, it now **compares** the block against the L1 batch data instead of blindly skipping it. | ||
|
|
||
| **New methods**: | ||
|
|
||
| | Method | Purpose | | ||
| |--------|---------| | ||
| | `verifyBlockContext(localHeader, blockData)` | Compares timestamp, gasLimit, and baseFee between the local L2 block header and the batch block context, and checks batch-internal tx count consistency. | | ||
| | `verifyBatchRoots(batchInfo, lastHeader)` | Compares stateRoot and withdrawalRoot between L1 batch and last derived L2 block. Extracted from the old inline logic. | | ||
| | `rollbackLocalChain(targetBlockNumber)` | **TODO stub** — will call geth `SetHead` API to rewind L2 chain. | | ||
|
|
||
| **`derive()` new flow for each block in batch**: | ||
|
|
||
| ```text | ||
| block.Number <= latestBlockNumber? | ||
| ├─ YES (block exists) | ||
| │ ├─ verifyBlockContext() passes → skip, continue | ||
| │ └─ verifyBlockContext() fails | ||
| │ ├─ IncBlockMismatchCount() | ||
| │ ├─ rollbackLocalChain(block.Number - 1) | ||
| │ └─ fall through to NewSafeL2Block (re-execute) | ||
| │ | ||
| └─ NO (new block) | ||
| └─ NewSafeL2Block (execute normally) | ||
| ``` | ||
|
|
||
| **`derivationBlock()` batch-level verification**: | ||
|
|
||
| ```text | ||
| After derive(batchInfo) completes: | ||
| │ | ||
| ├─ verifyBatchRoots() passes → normal | ||
| └─ verifyBatchRoots() fails | ||
| ├─ IncRollbackCount() | ||
| ├─ rollbackLocalChain(firstBlockNumber - 1) | ||
| ├─ re-derive(batchInfo) | ||
| ├─ verifyBatchRoots() again | ||
| │ ├─ passes → recovered | ||
| │ └─ fails → CRITICAL error, stop (manual intervention needed) | ||
| ``` | ||
|
|
||
| ### Added — Metrics | ||
|
|
||
| | Metric | Type | Description | | ||
| |--------|------|-------------| | ||
| | `morphnode_derivation_l1_reorg_detected_total` | Counter | L1 reorg detection count | | ||
| | `morphnode_derivation_l2_rollback_total` | Counter | L2 rollbacks triggered by batch mismatch | | ||
| | `morphnode_derivation_block_mismatch_total` | Counter | Block-level context mismatches | | ||
| | `morphnode_derivation_halted` | Gauge | Set to 1 when derivation halts due to unrecoverable batch mismatch (alert on this) | | ||
|
|
||
| ## Modified Files | ||
|
|
||
| | File | Changes | | ||
| |------|---------| | ||
| | `node/derivation/derivation.go` | Core refactor: removed validator/challenge, added reorg detection, batch verification, rollback flow | | ||
| | `node/derivation/database.go` | Extended `Reader`/`Writer` interfaces for L1 block hash tracking | | ||
| | `node/derivation/config.go` | Added `ReorgCheckDepth` config field | | ||
| | `node/derivation/metrics.go` | Added 3 new counter metrics and the `morphnode_derivation_halted` gauge | | ||
| | `node/db/keys.go` | Added `derivationL1BlockPrefix` and `DerivationL1BlockKey()` | | ||
| | `node/db/store.go` | Added `DerivationL1Block` struct and 4 CRUD methods | | ||
| | `node/flags/flags.go` | Added `DerivationReorgCheckDepth` CLI flag | | ||
| | `node/cmd/node/main.go` | Removed `validator` dependency from `NewDerivationClient` call | | ||
|
|
||
| ## TODO (follow-up work) | ||
|
|
||
| ### `rollbackLocalChain()` — geth SetHead integration | ||
|
|
||
| Currently a stub that returns an error. A batch mismatch is detected and logged, but the | ||
| actual L2 chain rollback cannot proceed until the following steps are implemented: | ||
|
|
||
| 1. Expose `SetL2Head(number uint64)` in `go-ethereum/eth/catalyst/l2_api.go` | ||
| 2. Add `SetHead` method to `go-ethereum/ethclient/authclient` | ||
| 3. Add `SetHead` method to `node/types/retryable_client.go` | ||
| 4. Call `d.l2Client.SetHead(d.ctx, targetBlockNumber)` in `rollbackLocalChain()` | ||
|
|
||
| Note: geth already has `BlockChain.SetHead(head uint64) error` — we just need to expose it through the engine API chain. | ||
|
|
||
| ### Transaction-level verification | ||
|
|
||
| `verifyBlockContext` currently checks timestamp, gasLimit, baseFee, and batch-internal tx count | ||
| consistency. Full transaction hash comparison against local blocks requires `BlockByNumber` RPC | ||
| on `RetryableClient`, which is not yet exposed. State root verification in `verifyBatchRoots` | ||
| covers transaction execution correctness as an indirect check. | ||
|
|
||
| ### Concurrency safety | ||
|
|
||
| When running as a verification thread alongside a sequencer, concurrent access between block production and rollback needs locking. This will be handled separately. | ||
|
|
||
| ## How to Test | ||
|
|
||
| 1. **Existing behavior preserved**: Set `--derivation.confirmations` to finalized (default) — reorg detection and L1 block hash recording are both skipped, batch verification still runs. | ||
| 2. **Latest mode**: Set `--derivation.confirmations` to `-2` (latest) — reorg detection activates, L1 block hashes are tracked. | ||
| 3. **Reorg detection**: Simulate by modifying a saved L1 block hash in DB — next loop should detect and clean up. | ||
| 4. **Batch verification**: When an existing L2 block matches L1 batch data, it logs "block verified" and skips. When mismatched, it logs the error and returns (rollback stub returns error, preventing silent continuation). |
`DeleteDerivationL1BlocksFrom` may leave orphaned records if gaps exist.
The loop breaks at the first missing key (line 206-207), assuming contiguous data. However, per context snippet 2, `recordL1Blocks` is only called in non-finalized mode. If the node switches between finalized and non-finalized modes, gaps can form in the recorded L1 blocks, leaving records after the gap undeleted.
Consider whether this is acceptable or if you need a bounded scan (e.g., up to `height + reorgCheckDepth`) to handle potential gaps.
The orphaned records are harmless in practice:
- `handleL1Reorg(150)` resets `latestDerivationL1Height` to 149, so `detectReorg` only scans `[checkFrom, 149]`; the orphans at 301-400 are never read.
- Once `recordL1Blocks` reaches 301-400, `WriteDerivationL1Block` overwrites them via `db.Put`.
Adding a `maxLookahead` parameter to handle this edge case introduces interface complexity that outweighs the benefit. No change needed.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@curryxbo, that's a clear and well-reasoned explanation. The combination of bounded scan scope in `detectReorg`, idempotent overwrite semantics from `db.Put` in `recordL1Blocks`, and the negligible interim storage cost all confirm that the gap-stop behavior is safe. Agreed: no change needed.