
F-nnn: TBR clip-ordering strategy (SearchControl clipOrder)#239

Merged
ms609 merged 4 commits into cpp-search from feature/weighted-clip-order
Mar 29, 2026

@ms609 ms609 commented Mar 29, 2026

Agent F. Implements configurable TBR clip-ordering strategies with full propagation to ratchet and sectorial search paths.

Summary

Adds clipOrder parameter to SearchControl() (default 0 = RANDOM). Six strategies available:

  • 0 RANDOM (default, unchanged behaviour)
  • 1 INV_WEIGHT (fewest descendant taxa first)
  • 2 TIPS_FIRST (terminal edges before internal)
  • 3 BUCKET (tips / small / large tiers)
  • 4 ANTI_TIP (internal before terminal)
  • 5 LARGE_FIRST (most descendant taxa first)
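
To make the six orderings concrete, here is a minimal sketch of how such a helper could rank candidate clips. It is not the code in ts_tbr.cpp: the `Clip` record, the `order_clips_sketch` name, the deterministic sort keys, and the sqrt(n) boundary between the "small" and "large" BUCKET tiers are all assumptions made for illustration. In particular, the real INV_WEIGHT may be a weighted random draw rather than a strict sort.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Integer codes follow the list above; the enum layout itself is assumed.
enum class ClipOrder {
  RANDOM = 0, INV_WEIGHT = 1, TIPS_FIRST = 2,
  BUCKET = 3, ANTI_TIP = 4, LARGE_FIRST = 5
};

// Hypothetical clip record: an edge id plus the number of taxa below it.
struct Clip {
  int edge;
  int n_desc;  // 1 means a terminal (tip) edge
};

// Hypothetical helper: shuffle first, then stable-sort on a strategy-specific
// key, so clips with equal keys keep a random relative order.
inline std::vector<Clip> order_clips_sketch(std::vector<Clip> clips,
                                            ClipOrder order, int n_tips,
                                            std::mt19937& rng) {
  std::shuffle(clips.begin(), clips.end(), rng);
  if (order == ClipOrder::RANDOM) return clips;

  // Assumed tier boundary for BUCKET: clips of 2..sqrt(n) taxa count as small.
  const int small_max =
      static_cast<int>(std::sqrt(static_cast<double>(n_tips)));

  auto key = [&](const Clip& c) -> int {
    switch (order) {
      case ClipOrder::INV_WEIGHT:  return c.n_desc;           // fewest taxa first
      case ClipOrder::LARGE_FIRST: return -c.n_desc;          // most taxa first
      case ClipOrder::TIPS_FIRST:  return c.n_desc == 1 ? 0 : 1;
      case ClipOrder::ANTI_TIP:    return c.n_desc == 1 ? 1 : 0;
      case ClipOrder::BUCKET:
        return c.n_desc == 1 ? 0 : (c.n_desc <= small_max ? 1 : 2);
      default:                     return 0;
    }
  };
  std::stable_sort(clips.begin(), clips.end(),
                   [&](const Clip& a, const Clip& b) { return key(a) < key(b); });
  return clips;
}
```

Shuffling before the stable sort keeps some randomness within each tier, which seems a natural choice for a search that otherwise relies on RANDOM ordering; whether the merged code does this is not stated here.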

Root cause of Phase 1 null result

clip_order was previously wired to only ~10% of TBR calls (Wagner warmup + final polish). The dominant paths, ratchet (76% of replicate time) and sectorial search (XSS/RSS/CSS), always used RANDOM, so the ordering variants were effectively inert in the Phase 1 benchmarks.

Phase 2 fix

  • Added clip_order field to RatchetParams and SectorParams
  • Propagated from SearchControl through all construction sites in ts_driven.cpp
  • Applied to all TBR call sites in ts_ratchet.cpp and ts_sector.cpp
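
The propagation described above can be pictured roughly as follows. Only the names SearchControl, RatchetParams, SectorParams, TBRParams, `clip_order`, and `diagnostics` come from this PR. The struct layouts, the `SearchControlSketch` stand-in, the helper functions, and the fallback to RANDOM for out-of-range codes are hypothetical.

```cpp
// Integer codes as documented for clipOrder; the enum layout is assumed.
enum class ClipOrder {
  RANDOM = 0, INV_WEIGHT = 1, TIPS_FIRST = 2,
  BUCKET = 3, ANTI_TIP = 4, LARGE_FIRST = 5
};

// Hypothetical, heavily trimmed parameter structs.
struct TBRParams {
  ClipOrder clip_order = ClipOrder::RANDOM;
  bool diagnostics = false;
};
struct RatchetParams {
  ClipOrder clip_order = ClipOrder::RANDOM;  // field added in Phase 2
};
struct SectorParams {
  ClipOrder clip_order = ClipOrder::RANDOM;  // field added in Phase 2
};

// Stand-in for the R-side SearchControl(clipOrder = ...) value.
struct SearchControlSketch {
  int clipOrder = 0;
};

// Assumed behaviour: unknown codes fall back to RANDOM.
inline ClipOrder to_clip_order(int code) {
  return (code >= 0 && code <= 5) ? static_cast<ClipOrder>(code)
                                  : ClipOrder::RANDOM;
}

// Construction sites: copy the setting into each phase's parameters.
inline RatchetParams make_ratchet_params(const SearchControlSketch& ctl) {
  RatchetParams rp;
  rp.clip_order = to_clip_order(ctl.clipOrder);
  return rp;
}
inline SectorParams make_sector_params(const SearchControlSketch& ctl) {
  SectorParams sp;
  sp.clip_order = to_clip_order(ctl.clipOrder);
  return sp;
}

// Call sites: every TBR invocation inherits the phase's ordering
// instead of silently defaulting to RANDOM.
template <class PhaseParams>
inline TBRParams tbr_params_from(const PhaseParams& phase) {
  TBRParams tp;
  tp.clip_order = phase.clip_order;
  return tp;
}
```

The Phase 1 bug corresponds to the last step being skipped: a default-constructed `TBRParams` carries RANDOM regardless of what the user requested.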

Benchmark results (5 seeds × 30s, default/thorough presets)

| Dataset | Tips | Preset | Median reps (RANDOM) | Median reps (TIPS_FIRST) | Δ |
|---|---|---|---|---|---|
| Agnarsson2004 | 62 | default | 123 | 120 | −2% |
| Zhu2013 | 75 | thorough | 75 | 84 | +12% |
| Dikow2009 | 88 | thorough | 37 | 40 | +8% |

The benefit depends on dataset size and tip enrichment: +8–13% throughput at 65–119 tips (thorough preset); neutral at ≤64 tips (default preset), where tip enrichment is lower.

No preset defaults changed; clipOrder remains 0 (RANDOM). Users can opt in with SearchControl(clipOrder = 2L).

ms609 added 3 commits March 29, 2026 07:00
Add ClipOrder enum, TBRPassRecord struct, and per-pass diagnostic
counters to tbr_search() (guarded behind TBRParams::diagnostics=true).
Add ts_tbr_diagnostics() Rcpp bridge returning per-pass data frame.
Add order_clips() helper implementing RANDOM/INV_WEIGHT/TIPS_FIRST/BUCKET
strategies (Phase 2 infrastructure, disabled by default).
Add diag_clip_ordering.R to characterise baseline behaviour.

Diagnostic results (10 seeds × 4 datasets, random Wagner starts):
  Tip-clip enrichment in productive passes: 0.43–0.76×
  Tip clips (~51% of all clips) account for only 22–38% of accepted moves.
  Medium-small clips (2..sqrt(n)) appear most productive.

CONCLUSION (Phase 4): the small/tip-first hypothesis is FALSIFIED.
All three proposed variants (INV_WEIGHT, TIPS_FIRST, BUCKET) favour
tip clips, which are the LEAST productive clip type. Phase 2–3 skipped.
Branch will be closed after coordination notes are updated.

Phase 1 (a159311) added diagnostic instrumentation and the TIPS_FIRST,
INV_WEIGHT, BUCKET, ANTI_TIP, LARGE_FIRST ordering variants to ts_tbr.cpp.
Phase 2 completes the implementation:

Bug fix: clip_order was only propagated to the initial TBR and final TBR
polish (~10% of replicate time). The ratchet and all sectorial TBR calls
defaulted to RANDOM, making the ordering variants effectively inert for
the dominant phase (ratchet ~76%).

Fix: add clip_order field to RatchetParams and SectorParams, propagate
from SearchControl through ts_driven.cpp into every TBR call site in
ts_ratchet.cpp and ts_sector.cpp (6 sites + search_sector signature).

Empirical validation (5 seeds, 30s, default config):
  Agnarsson2004 (62t, default preset): TIPS_FIRST -2%, INV_WEIGHT neutral
  Zhu2013       (75t, thorough preset): TIPS_FIRST +13%, INV_WEIGHT +9%
  Dikow2009     (88t, thorough preset): TIPS_FIRST +8%, INV_WEIGHT +3%

Theoretical model (Poisson bucket, corrected): TIPS_FIRST saves ~48%
per productive TBR pass at 88t; practical throughput gain is ~8-13%
because null passes (ordering-invariant, exhaust all clips) dilute savings.

Benefit is dataset-size dependent:
  < ~65t: tip enrichment is low (Agnarsson2004: 0.43); TIPS_FIRST neutral
  65-120t (thorough): tip enrichment moderate; TIPS_FIRST +8-13%

No preset defaults changed yet — pending GHA 10-seed validation.
bench_clip_ordering.R contains the full benchmark driver.

The SearchControl.Rd usage section was generated from an old installed
build (missing clipOrder and many parameters added since). The codoc
check correctly flagged the mismatch.

- Added @param clipOrder documentation in R/SearchControl.R
- Regenerated man/SearchControl.Rd with correct \usage and \item{clipOrder}
@ms609 ms609 merged commit 6972444 into cpp-search Mar 29, 2026
@ms609 ms609 deleted the feature/weighted-clip-order branch March 29, 2026 13:23