590d528 to
988a52e
Compare
Pull request overview
This PR moves tunnel provisioning from the doublezero CLI into an onchain-driven reconciler loop inside the doublezerod daemon, adds runtime enable/disable controls, and updates status/reporting + tests/fixtures accordingly.
Changes:
- Add an onchain reconciler (polling + provisioning/removal) to `doublezerod`, with persisted enable/disable state and new `/enable`, `/disable`, `/v2/status` endpoints.
- Update CLI flows (`connect`, `status`, plus new `enable`/`disable`) to interact with the reconciler rather than directly provisioning.
- Add caching onchain fetcher + adjust tests/e2e fixtures to account for reconciler state and updated output.
Reviewed changes
Copilot reviewed 66 out of 66 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| rfcs/rfc17-client-onchain-reconciler.md | RFC documenting reconciler design, state, endpoints, and rollout plan |
| CHANGELOG.md | Changelog entry describing reconciler + CLI enable/disable |
| e2e/user_ban_test.go | Treat missing tunnel interface as successful route withdrawal |
| e2e/multicast_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/internal/fixtures/diff.go | Make CLI-table diff parsing robust to non-table preamble lines |
| e2e/ibrl_with_allocated_ip_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/ibrl_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/ibrl_enable_disable_test.go | New E2E test for connect → disable → enable → restart persistence lifecycle |
| e2e/fixtures/multicast/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/multicast/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/multicast/doublezero_status_connected_subscriber.tmpl | Add “Reconciler” column to connected multicast subscriber fixture |
| e2e/fixtures/multicast/doublezero_status_connected_publisher.tmpl | Add “Reconciler” column to connected multicast publisher fixture |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_connected.tmpl | Add “Reconciler” column to connected allocated-IP fixture |
| e2e/fixtures/ibrl/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/ibrl/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/ibrl/doublezero_status_connected.tmpl | Add “Reconciler” column to connected IBRL fixture |
| client/doublezerod/cmd/doublezerod/main.go | Add flags for client IP, reconciler poll interval, and state dir; pass into runtime |
| client/doublezerod/internal/runtime/run.go | Wire reconciler + state migration + caching fetcher; update routes handler wiring; latency now uses fetcher |
| client/doublezerod/internal/runtime/run_test.go | Update runtime tests to new Run() signature; remove statefile recovery assertions |
| client/doublezerod/internal/runtime/clientip.go | New client IP auto-discovery (explicit → interfaces → ifconfig.me) |
| client/doublezerod/internal/runtime/clientip_test.go | Unit tests for public IP classification and explicit IP behavior |
| client/doublezerod/internal/reconciler/reconciler.go | New reconciler loop + enable/disable endpoints + /v2/status |
| client/doublezerod/internal/reconciler/state.go | Persisted reconciler enable/disable state + migration from old doublezerod.json |
| client/doublezerod/internal/reconciler/state_test.go | Unit tests for state load/write + migration behavior |
| client/doublezerod/internal/reconciler/metrics.go | Prometheus metrics for reconciler polls/provisions/removals/matched users |
| client/doublezerod/internal/onchain/fetcher.go | New TTL-based caching fetcher shared by reconciler and latency subsystem |
| client/doublezerod/internal/onchain/fetcher_test.go | Unit tests for caching behavior (TTL, stale-on-error, concurrency) |
| client/doublezerod/internal/manager/manager.go | Remove DB/statefile dependency; add ResolveTunnelSrc + GetProvisionedServices |
| client/doublezerod/internal/manager/http_test.go | Update manager HTTP tests for removed DB + new GetProvisionedServices |
| client/doublezerod/internal/manager/db.go | Remove old on-disk state DB implementation |
| client/doublezerod/internal/manager/db_test.go | Remove tests for deleted DB/statefile system |
| client/doublezerod/internal/manager/fixtures/doublezerod.*.json | Remove fixtures used exclusively for deleted DB/statefile behavior |
| client/doublezerod/internal/services/base.go | Remove DBReaderWriter interface (services now keep ProvisionRequest in memory) |
| client/doublezerod/internal/services/services_test.go | Update service creation tests after DB removal |
| client/doublezerod/internal/services/ibrl.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/services/edgefiltering.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/services/multicast.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/latency/smartcontract.go | Remove old direct smartcontract fetcher module (replaced by fetcher integration) |
| client/doublezerod/internal/latency/manager.go | Support injected Fetcher; adjust SmartContractFunc signature and fetch path |
| client/doublezerod/internal/latency/manager_test.go | Update tests for new smartcontract func signature and removed program ID options |
| client/doublezerod/internal/api/routes.go | Replace DBReader with ServiceStateReader (provisioned services now in manager) |
| client/doublezerod/internal/api/routes_test.go | Update routes tests to use ServiceStateReader mock |
| client/doublezero/src/servicecontroller.rs | Add v2 status + enable/disable calls; remove CLI provisioning/remove/resolve-route APIs |
| client/doublezero/src/routes.rs | Remove resolve_route command surface; keep routes retrieval |
| client/doublezero/src/main.rs | Allow enable/disable commands without version warning gate |
| client/doublezero/src/cli/command.rs | Add doublezero enable / doublezero disable commands |
| client/doublezero/src/command/mod.rs | Register enable/disable subcommands |
| client/doublezero/src/command/enable.rs | New enable command implementation + unit tests |
| client/doublezero/src/command/disable.rs | New disable command implementation + unit tests |
| client/doublezero/src/command/status.rs | Use /v2/status; surface reconciler state in output/table; synthesize disconnected row when empty |
| client/doublezero/src/command/connect.rs | Switch connect flow to best-effort enable reconciler + poll daemon status for provisioning |
| client/doublezero/src/command/disconnect.rs | Stop calling daemon /remove; poll daemon for deprovision completion after onchain user deletion |
packethog
requested changes
Feb 19, 2026
5706744 to
0d5d872
Compare
martinsander00
approved these changes
Feb 25, 2026
e9f4982 to
1df4033
Compare
Add a reconciliation loop to doublezerod that polls the DZ Ledger and automatically provisions/removes tunnels when users are activated or deactivated onchain. This replaces the CLI-driven /provision flow and the JSON state file crash recovery mechanism.
Key changes:
- Reconciler loop on NetlinkManager polls onchain state every 10s
- Persistent enabled/disabled state with migration from old state file
- Client IP auto-discovery via kernel default route with external fallback
- Caching onchain fetcher wrapping the serviceability SDK
- Drift detection re-provisions services when onchain state changes
- Enable/disable HTTP endpoints and v2 status endpoint
- Remove db.go state file, Recover() path, and DBReaderWriter interface
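As a rough illustration of the loop described in this commit, the sketch below separates the decision (compare onchain state with local tunnel state) from the ticker that drives it. The `User`, `Action`, `decide`, and `run` names are hypothetical and not taken from the PR; treating a disabled reconciler as "no tunnel wanted" is also an assumption.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// User is a hypothetical, simplified view of an onchain user record.
type User struct {
	ClientIP  string
	Activated bool
}

// Action is what a single reconcile pass decides to do.
type Action string

const (
	ActionNone      Action = "none"
	ActionProvision Action = "provision"
	ActionRemove    Action = "remove"
)

// decide compares the desired state (an activated onchain user matching this
// client IP, with the reconciler enabled) against the local tunnel state.
// Assumption: a disabled reconciler wants no tunnel.
func decide(enabled bool, users []User, clientIP string, provisioned bool) Action {
	want := false
	if enabled {
		for _, u := range users {
			if u.ClientIP == clientIP && u.Activated {
				want = true
				break
			}
		}
	}
	switch {
	case want && !provisioned:
		return ActionProvision
	case !want && provisioned:
		return ActionRemove
	default:
		return ActionNone
	}
}

// run calls step on every tick until ctx is cancelled.
func run(ctx context.Context, interval time.Duration, step func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			step()
		}
	}
}

func main() {
	users := []User{{ClientIP: "203.0.113.7", Activated: true}}
	provisioned := false

	// A short interval and timeout keep the demo quick; the commit uses 10s.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	run(ctx, 10*time.Millisecond, func() {
		switch decide(true, users, "203.0.113.7", provisioned) {
		case ActionProvision:
			provisioned = true
			fmt.Println("provisioning tunnel")
		case ActionRemove:
			provisioned = false
			fmt.Println("removing tunnel")
		}
	})
	fmt.Println("provisioned:", provisioned)
}
```

Keeping `decide` free of side effects makes the provisioning and removal rules unit-testable without netlink or RPC access.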
Adapt the doublezero CLI to work with the daemon's onchain reconciler instead of directly driving tunnel provisioning.
- Add enable/disable commands to control the reconciler
- Simplify connect: create onchain user, enable reconciler, poll daemon status until tunnel appears (drops ~400 lines of provisioning logic)
- Simplify disconnect: delete onchain user and let reconciler tear down
- Rewrite status to use daemon v2 endpoint with reconciler state
- Deprecate --client-ip on CLI in favor of daemon flag
- Read client IP from daemon v2 status for onchain user creation
- Add TestE2E_IBRL_EnableDisable covering enable/disable lifecycle
- Update e2e fixtures for reconciler-driven status output
- Update existing e2e tests for reconciler flow (pass client-ip to daemon)
- Update CHANGELOG, DEVELOPMENT docs
Move onchain enrichment (current_device, metro, tenant, lowest_latency_device) from the CLI into the daemon's GET /v2/status endpoint. The daemon already has the data cached via its shared CachingFetcher and LatencyManager, so this eliminates 4 redundant RPC calls from the CLI on every status invocation.
Add Reprovision() that holds the mutex across both Remove and Provision, preventing Status() from observing a nil-service window during re-provisioning. This fixes the e2e multicast subscriber status test that intermittently saw "disconnected" when the reconciler was re-provisioning the service.
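The race this commit fixes can be shown with a stripped-down manager. The `manager` and `service` types and the `removeLocked`/`provisionLocked` helpers below are hypothetical simplifications; only the idea of holding one mutex across both steps comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// service is a placeholder for a provisioned tunnel service.
type service struct{ name string }

type manager struct {
	mu  sync.Mutex
	svc *service
}

// The *Locked helpers assume the caller already holds m.mu.
func (m *manager) removeLocked()               { m.svc = nil }
func (m *manager) provisionLocked(name string) { m.svc = &service{name: name} }

// Reprovision performs remove and provision under a single lock acquisition,
// so Status can never run between the two steps and see a nil service.
func (m *manager) Reprovision(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.removeLocked()
	m.provisionLocked(name)
}

// Status reports the current service name, or "disconnected" if none exists.
func (m *manager) Status() string {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.svc == nil {
		return "disconnected"
	}
	return m.svc.name
}

func main() {
	m := &manager{}
	m.Reprovision("ibrl")

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			m.Reprovision("multicast")
		}
	}()

	// Count how often a concurrent reader sees the nil-service window.
	gaps := 0
	for i := 0; i < 1000; i++ {
		if m.Status() == "disconnected" {
			gaps++
		}
	}
	wg.Wait()
	fmt.Println("disconnected observations:", gaps)
}
```

If remove and provision each took and released the lock separately, a reader could acquire it in between and report "disconnected", which matches the intermittent e2e failure described above.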
2b06b18 to
408e4c5
Compare
When only multicast group memberships change onchain, the reconciler now applies an incremental update (add/remove routes, restart PIM/heartbeat) instead of tearing down and re-provisioning the entire tunnel and BGP session. Falls back to full reprovision on failure or role transitions.
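One way to picture the incremental path is a set difference over group memberships, with a role change forcing the full reprovision. The `membership` type, the role strings, and the `plan` and `diffGroups` functions are hypothetical; the fallback on a failed incremental update mentioned in the commit is not modeled here.

```go
package main

import (
	"fmt"
	"sort"
)

// membership is a hypothetical view of a user's multicast state.
type membership struct {
	Role   string   // e.g. "publisher" or "subscriber"
	Groups []string // multicast group addresses
}

// diffGroups returns the groups present only in next (add) and only in
// cur (remove), each sorted for deterministic output.
func diffGroups(cur, next []string) (add, remove []string) {
	curSet := make(map[string]bool, len(cur))
	for _, g := range cur {
		curSet[g] = true
	}
	nextSet := make(map[string]bool, len(next))
	for _, g := range next {
		nextSet[g] = true
	}
	for g := range nextSet {
		if !curSet[g] {
			add = append(add, g)
		}
	}
	for g := range curSet {
		if !nextSet[g] {
			remove = append(remove, g)
		}
	}
	sort.Strings(add)
	sort.Strings(remove)
	return add, remove
}

// plan reports whether an incremental update is possible and, if so,
// which group routes to add and remove.
func plan(cur, next membership) (incremental bool, add, remove []string) {
	if cur.Role != next.Role {
		return false, nil, nil // role transition: full reprovision
	}
	add, remove = diffGroups(cur.Groups, next.Groups)
	return true, add, remove
}

func main() {
	cur := membership{Role: "subscriber", Groups: []string{"239.0.0.1", "239.0.0.2"}}
	next := membership{Role: "subscriber", Groups: []string{"239.0.0.2", "239.0.0.3"}}

	ok, add, remove := plan(cur, next)
	fmt.Println("incremental:", ok, "add:", add, "remove:", remove)

	next.Role = "publisher"
	ok, _, _ = plan(cur, next)
	fmt.Println("incremental after role change:", ok)
}
```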
56b5742 to
2597dfc
Compare
Summary of Changes
- Add an onchain reconciler to the daemon (`doublezerod`) that polls onchain state and automatically provisions/tears down tunnels, replacing the CLI-driven provisioning model
- Move onchain enrichment into the daemon's `GET /v2/status` endpoint, eliminating 4 redundant RPC calls per status invocation
- Add `enable`/`disable` CLI subcommands and daemon endpoints to control the reconciler, with persistent state across daemon restarts
- Refactor `connect` to simply enable the reconciler and wait for the daemon to provision, rather than performing provisioning directly
- Replace the local state file (`doublezerod.json`) with onchain-driven state; add automatic migration from legacy state files
Diff Breakdown
~1900 lines of core logic changes (net +235 after removing legacy DB/provisioning code), supported by ~2500 lines of new tests and ~200 lines of docs/RFC.
Key files
- `client/doublezerod/internal/manager/manager.go`: onchain reconciler loop, tunnel provisioning from onchain state, enable/disable control, tunnel-src resolution
- `client/doublezero/src/command/status.rs`: simplified to read enriched fields from daemon v2/status instead of making SDK RPC calls
- `client/doublezero/src/command/connect.rs`: refactored to enable reconciler and poll for daemon-driven provisioning instead of provisioning directly
- `client/doublezerod/internal/manager/http.go`: v2/status enrichment logic, enable/disable endpoints, V2ServiceStatus with device/metro/tenant/latency
- `client/doublezero/src/command/disconnect.rs`: refactored to disable reconciler instead of sending remove request
- `client/doublezero/src/servicecontroller.rs`: added V2ServiceStatus struct, enable/disable endpoints, removed legacy provisioning
- `client/doublezerod/internal/onchain/fetcher.go`: CachingFetcher with 5s TTL and singleflight dedup for shared onchain data access
- `client/doublezerod/internal/manager/state.go`: persistent enabled/disabled state with migration from legacy doublezerod.json
Testing Verification
- `GOOS=linux go vet` (netlink syscalls require Linux for full test run)
- `TestE2E_IBRL_EnableDisable` covering reconciler enable/disable flow