
CID mode segfaults during TBR after cpp-search merge #228

@ms609


Summary

After merging cpp-search into feature/cid-consensus (commit 7cce43652), CID consensus search segfaults (exit code 139) during the TBR phase of driven_search(). Reproducible with as few as 2 input trees and 12 tips.

Reproducer

library(TreeSearch)
library(TreeTools)
data(inapplicable.phyData)
dat <- inapplicable.phyData[["Vinther2008"]]
trees <- MaximizeParsimony(dat, maxReplicates = 2, maxSeconds = 3, verbosity = 0)
CIDConsensus(trees, maxReplicates = 1, maxSeconds = 5, verbosity = 2, nThreads = 1)
# Output:
#   Replicate 1/1
#     wag_rand+NNI tree score: -10.261 [0 ms]
#   [exit code 139]

Root cause (partial)

Two issues were found and fixed in commit fc19747d7:

  1. RcppExports.cpp truncation: Git auto-merge inserted ts_wagner_bias_bench and ts_test_strategy_tracker inside the ts_cid_consensus function body (between Rcpp::wrap() and the missing return/END_RCPP/}). Fixed by restoring the closing lines.

  2. NNI warmup segfault: nni_first = true (new default from cpp-search) caused nni_search() to run on CID datasets. nni_search() calls score_tree() which for CID returns the CID score without populating Fitch state arrays, then uses fitch_incremental_downpass() on uninitialised prelim/local_cost arrays. Fixed by adding && (ds.scoring_mode != ScoringMode::CID) to the nni_wagner guard in ts_driven.cpp.
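The guard from fix 2 can be sketched in isolation as follows. This is a minimal illustration, not the code in ts_driven.cpp: the `DataSet` struct, the `FITCH` enumerator, and the helper name `should_run_nni_warmup` are hypothetical stand-ins, and only `scoring_mode` and `ScoringMode::CID` come from the description above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real TreeSearch types; only scoring_mode
// and ScoringMode::CID are named in this issue.
enum class ScoringMode { FITCH, CID };

struct DataSet {
  ScoringMode scoring_mode;
};

// Returns true when the NNI warm-up may run after Wagner construction.
// nni_search() relies on Fitch state arrays that CID scoring never fills,
// so CID datasets skip it regardless of the nni_first default.
bool should_run_nni_warmup(bool nni_first, const DataSet& ds) {
  return nni_first && (ds.scoring_mode != ScoringMode::CID);
}
```

With this shape, a CID dataset never reaches nni_search() even when nni_first is true, while non-CID datasets keep the new default behaviour.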

Remaining crash

After both fixes, the segfault persists. The crash occurs after Wagner tree construction succeeds (a score is printed) but before TBR reports its score. The TBR entry point calls full_rescore(tree, ds) which does:

  1. tree.reset_states(ds): clears arrays, reloads tip states
  2. score_tree(tree, ds): for CID, calls cid_score(), which calls compute_splits_cid()
  3. fitch_score(tree, ds): populates Fitch state arrays for MRP screening

The crash is in one of these three steps. cid_score() accesses cd.cand_tip_bits, cd.cand_buf, and cd.lap_scratch, all allocated by prepare_cid_data(). The MRP DataSet from build_mrp_dataset() is structurally valid (Wagner + NNI succeed on it).
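One way to narrow the fault to a single step is to bracket each call with a marker that is flushed immediately, so the last line printed survives the segfault. The sketch below is self-contained and uses hypothetical names (`StepTrace`, `traced_full_rescore`). The three lambdas stand in for the real calls, and inside the package REprintf() from R's C API would probably replace std::fprintf(stderr, ...).

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Records each marker and writes it to stderr, flushing immediately so the
// last line is visible even if the next call crashes the process.
struct StepTrace {
  std::vector<std::string> log;

  void mark(const std::string& msg) {
    log.push_back(msg);
    std::fprintf(stderr, "[full_rescore] %s\n", msg.c_str());
    std::fflush(stderr);
  }

  // Runs one step between an "enter" and a "leave" marker.
  template <typename Fn>
  void run(const char* name, Fn&& step) {
    mark(std::string("enter ") + name);
    step();
    mark(std::string("leave ") + name);
  }
};

// Hypothetical wrapper mirroring the three calls in full_rescore(); the
// callables would wrap tree.reset_states(ds), score_tree(tree, ds) and
// fitch_score(tree, ds) respectively.
template <typename Reset, typename Score, typename Fitch>
void traced_full_rescore(StepTrace& trace, Reset&& reset, Score&& score,
                         Fitch&& fitch) {
  trace.run("reset_states", reset);
  trace.run("score_tree", score);
  trace.run("fitch_score", fitch);
}
```

An "enter" line with no matching "leave" line identifies the call that faulted.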

Investigation notes

  • The CID tests passed on the previous merge base (f97824eaa), so this is a regression from the cpp-search merge.
  • Only 2 cpp-search commits touched ts_tbr.cpp since the last merge: ae5431fa9 (removed dead function) and 62658709d (constraint post-hoc validation) — neither should affect CID.
  • Attempted to add Rprintf debug prints, but on Windows sed/awk wrote literal newlines instead of \n inside the C string literals, and Rscript crashed because an R session already had the TreeSearch DLL loaded. A debug build with working prints would pinpoint the exact failing line.
  • The file has CRLF line endings (Windows worktree) vs LF on the branch — this caused diff to show the entire file as changed but is cosmetic.

Suggested next steps

  • Build a debug version with Rprintf markers around reset_states, cid_score, and fitch_score inside full_rescore() (in ts_tbr.cpp line 36) to pinpoint which call segfaults.
  • Check whether compute_splits_cid() is accessing out-of-bounds memory (it uses tree.postorder, tree.left[ni], tree.right[ni] — all should be valid post-Wagner).
  • Check lap_scratch sizing: ensure(cd.max_splits) sizes for max input-tree splits, but the candidate tree may produce more splits than any input tree.
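The sizing concern in the last bullet can be illustrated with a small sketch. It assumes that splits are counted as non-trivial bipartitions of an unrooted tree, so a fully resolved candidate on n tips has at most n - 3 of them. If compute_splits_cid() counts rooted clades or includes trivial splits, the bound would differ. `LapScratch`, `max_binary_splits`, and `scratch_dim` are hypothetical names, and the real lap_scratch layout is not shown in this issue.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Upper bound on non-trivial splits of an unrooted binary tree with n_tips
// leaves: n_tips - 3, or 0 when there are fewer than four tips.
std::size_t max_binary_splits(std::size_t n_tips) {
  return n_tips > 3 ? n_tips - 3 : 0;
}

// Hypothetical scratch buffer holding a square cost matrix.
struct LapScratch {
  std::vector<double> cost;  // row-major dim x dim matrix
  std::size_t dim = 0;

  // Grows the buffer if needed; never shrinks it.
  void ensure(std::size_t n) {
    if (n > dim) {
      cost.assign(n * n, 0.0);
      dim = n;
    }
  }
};

// Dimension covering both the largest input tree and any binary candidate.
std::size_t scratch_dim(std::size_t max_input_splits, std::size_t n_tips) {
  return std::max(max_input_splits, max_binary_splits(n_tips));
}
```

If the input trees are less resolved than the candidate, sizing by cd.max_splits alone would fall short of this bound, which is the situation the bullet above asks to rule out.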
