fix: subnet bootstrapping #1545

Open
phutchins wants to merge 6 commits into main from feature/subnet-bootstrapping
@phutchins (Contributor) commented Mar 11, 2026

Note

Medium Risk
Touches the subnet initialization/bootstrapping flow and remote execution paths (SSH/sudo, config generation, genesis creation), so mistakes could prevent validators from starting or misconfigure networking, but changes are contained to scripting tooling.

Overview
Adds a bootstrap command to provision fresh remote validator hosts (deps + repo clone/build) and updates docs to guide a bootstrap-first workflow.

Reworks init to support --resume, separates local-vs-remote filesystem concerns (local ~/.ipc as source of truth, then copies configs/genesis to remotes), and makes node startup more reliable by generating a per-node start script with required resolver/subnet env vars.

Improves operability and troubleshooting: new diagnose command, check --wait, better libp2p/peer addressing via internal_ip, more robust SSH helpers/keepalives and non-login execution (exec_on_host_simple), safer config edits (temp scripts to avoid quoting), dashboard UX tweaks, and a new resolver troubleshooting guide.

Written by Cursor Bugbot for commit 6c0e662. This will update automatically on new commits.

…figuration updates

- Added a new bootstrap command to install dependencies (Rust, Foundry, Node.js) on fresh validator hosts.
- Updated initialization process to support resuming from previous failures.
- Modified subnet configuration with new validator IPs and registry addresses.
- Improved health check and execution commands for better reliability.
- Enhanced documentation to reflect new bootstrap steps and usage instructions.
…rd metrics

- Introduced a new troubleshooting document for diagnosing issues with the IPLD Resolver not listening on port 26654.
- Enhanced the dashboard script to initialize additional metrics for better monitoring, including block production rates and finality tracking.
- Updated health check scripts to ensure proper environment variable handling and improve logging for resolver-related configurations.
- Introduced a new function `ssh_exec_long` to handle long-running commands with streaming output, preventing SSH timeouts during builds.
- Updated the `update_validator_binaries` function to utilize the new long-running command execution, improving build process logging and error handling.
@phutchins phutchins requested a review from a team as a code owner March 11, 2026 14:03
@phutchins phutchins changed the title from "Feature/subnet bootstrapping" to "fix: subnet bootstrapping" Mar 11, 2026
…d script

- Updated the calculation of `blocks_per_min` to accurately reflect block production rates based on time differences.
- Adjusted timestamp formatting logic to ensure proper handling of time zone indicators.
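
A rate "based on time differences" can be sketched as below. This is only an illustration under stated assumptions: `calc_blocks_per_min` is a hypothetical helper, not the function in the dashboard script, and it assumes two samples of block height paired with Unix timestamps in whole seconds.

```bash
# Hypothetical helper, not taken from the PR: derive blocks per minute from
# two (height, epoch-seconds) samples instead of assuming a fixed interval.
calc_blocks_per_min() {
    local prev_height="$1" prev_time="$2" cur_height="$3" cur_time="$4"
    local height_delta=$(( cur_height - prev_height ))
    local time_delta=$(( cur_time - prev_time ))

    # Guard against a zero or negative interval (first sample, clock skew)
    # and against a height that went backwards (node restart or resync).
    if [ "$time_delta" -le 0 ] || [ "$height_delta" -lt 0 ]; then
        echo 0
        return 0
    fi

    # Integer arithmetic: scale to 60 seconds before dividing.
    echo $(( height_delta * 60 / time_delta ))
}
```

For example, `calc_blocks_per_min 100 1000 130 1030` covers 30 blocks in 30 seconds and prints `60`.
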
for arg in "$@"; do
case $arg in
--wait=*) wait_seconds="${arg#*=}" ;;
--wait) shift; wait_seconds="${1:-30}" ;;

--wait value not parsed in for loop

Medium Severity

In cmd_check, the --wait VALUE form (space-separated) calls shift inside a for arg in "$@" loop. shift modifies $@ but has no effect on the loop's already-captured iteration list. As a result, wait_seconds="${1:-30}" reads $1 — the original first argument (e.g., "--wait") — instead of the intended value (e.g., 45). Since "--wait" is not a number, the [ "$wait_seconds" -gt 0 ] check silently fails and no sleep occurs. The suggested usage ./ipc-manager check --wait 45 (advertised in the error output) will never work correctly.
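
One possible way to address this is to parse with a `while` loop that reads `$1` and calls `shift` itself, so the space-separated value is actually consumed. The sketch below is not the PR's code: `parse_wait_seconds` is a hypothetical helper, and the 30-second default is taken from the snippet under review.

```bash
# Hypothetical helper, not from the PR: prints the parsed wait time in seconds.
parse_wait_seconds() {
    local wait_seconds=0
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --wait=*)
                wait_seconds="${1#*=}"
                ;;
            --wait)
                # Take the next argument as the value only if it is numeric;
                # otherwise keep it for normal parsing and use the default.
                case "${2:-}" in
                    ''|*[!0-9]*) wait_seconds=30 ;;
                    *) wait_seconds="$2"; shift ;;
                esac
                ;;
        esac
        shift
    done
    echo "$wait_seconds"
}
```

With this shape, `parse_wait_seconds --wait 45` prints `45`, `parse_wait_seconds --wait=10` prints `10`, and a bare `--wait` falls back to `30`.
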

- Modified the `ipc-subnet-config.yml` to reflect new registry and gateway addresses for the parent subnet.
- Enhanced the `config.sh` script to retrieve parent addresses from a unified source, ensuring backward compatibility with existing configurations.
- Updated YAML config synchronization logic to maintain consistency between subnet and ipc_cli.parent sections.
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

METRICS[peers]=0
METRICS[mempool_size]=0
METRICS[mempool_bytes]=0
METRICS[mempool_max]=5000

Dashboard mempool_max initialization prevents config reading

Low Severity

METRICS[mempool_max] is now initialized to 5000 in initialize_dashboard, which causes the conditional check [ -z "${METRICS[mempool_max]:-}" ] in fetch_metrics to always evaluate to false. The actual CometBFT mempool config value is never read from the node. Previously this key was uninitialized, so the first fetch_metrics call would read the real value. The dashboard now always shows 5000 as capacity regardless of actual config, causing incorrect mempool percentage calculations.
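
A possible shape for a fix, sketched under assumptions: keep `METRICS[mempool_max]` empty at initialization so the `-z` check still triggers, and apply the 5000 fallback only when computing the percentage. `read_mempool_max_from_node` and `mempool_percent` are hypothetical names, and the `FAKE_NODE_MEMPOOL_MAX` variable merely stands in for whatever the real script reads from the node.

```bash
declare -A METRICS

initialize_dashboard() {
    METRICS[mempool_size]=0
    # Left empty on purpose: empty means "not yet read from the node".
    METRICS[mempool_max]=""
}

# Hypothetical stand-in for reading the mempool size limit from the node.
read_mempool_max_from_node() {
    echo "${FAKE_NODE_MEMPOOL_MAX:-}"
}

fetch_metrics() {
    if [ -z "${METRICS[mempool_max]:-}" ]; then
        local configured
        configured="$(read_mempool_max_from_node)"
        # Cache only a value that was actually read, so a failed read is retried.
        if [ -n "$configured" ]; then
            METRICS[mempool_max]="$configured"
        fi
    fi
}

mempool_percent() {
    local size="${METRICS[mempool_size]:-0}"
    local max="${METRICS[mempool_max]:-}"
    # Display-only fallback; it is never written back into METRICS.
    if ! [ "${max:-0}" -gt 0 ] 2>/dev/null; then
        max=5000
    fi
    echo $(( size * 100 / max ))
}
```

The design choice here is to separate "value not known yet" from "default for display", so a later successful read can still replace the fallback.
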

- local peer_ip=$(get_config_value "validators[$peer_idx].ip")
- if echo "$static_addrs" | grep -q "/ip4/$peer_ip/tcp/$libp2p_port"; then
+ local peer_ip=$(get_peer_ip "$peer_idx")
+ if echo "$static_addrs" | grep -q "/ip4/$peer_ip/tcp/$v_resolver_port"; then

Info command checks wrong port for peer static_addresses

Low Severity

In cmd_info, the static_addresses peer check uses $v_resolver_port (the current validator's resolver port) to verify peer entries. But static_addresses contains each peer's own resolver port, not the current validator's port. In local mode where each validator has a different port offset, this check always fails, producing misleading diagnostic output. The peer's port via get_resolver_port_for_validator "$peer_idx" is needed instead.
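
The suggested change could look roughly like the sketch below, which resolves the port per peer before matching. The two helper stubs only imitate `get_peer_ip` and `get_resolver_port_for_validator` so the example runs on its own; their IP scheme and port offsets are invented for illustration, and `peer_in_static_addrs` is a hypothetical wrapper, not a function from the PR.

```bash
# Stubs for illustration only; the real helpers live in the PR's scripts and
# the addresses and port offsets below are made up.
get_peer_ip() { echo "10.0.0.$(( $1 + 1 ))"; }
get_resolver_port_for_validator() { echo $(( 26654 + $1 * 100 )); }

# Hypothetical wrapper: succeeds if static_addrs lists the given peer under
# that peer's own resolver port rather than the current validator's port.
peer_in_static_addrs() {
    local static_addrs="$1" peer_idx="$2"
    local peer_ip peer_port
    peer_ip="$(get_peer_ip "$peer_idx")"
    peer_port="$(get_resolver_port_for_validator "$peer_idx")"
    # -F matches the multiaddr literally, so dots in the IP are not wildcards.
    printf '%s\n' "$static_addrs" | grep -qF "/ip4/$peer_ip/tcp/$peer_port"
}
```

In local mode, where each validator has a different port offset, this matches a peer entry that a comparison against `$v_resolver_port` would miss.
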
