Skip to content

Avoid large merge backlog to block the indexer#25

Open
rdettai-sk wants to merge 2 commits intosekoiafrom
slow-merge-list
Open

Avoid large merge backlog to block the indexer#25
rdettai-sk wants to merge 2 commits intosekoiafrom
slow-merge-list

Conversation

@rdettai-sk
Copy link
Copy Markdown
Collaborator

Description

Currently, when the indexer starts it is assigned all the ingest v1 pipelines even though DISABLE_INGEST_V1 is set to true. This triggers a huge scan for merge oportunities that happened to be very slow on the indexer node that accumulated a huge backlog of merges. This PR de-activates completely the assignation of ingest v1 pipelines, which should avoid this very slow metastore call.

It also adds some logs to help troubleshoot further issues with the indexing service.

Copilot AI review requested due to automatic review settings April 8, 2026 15:21
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent indexers from being assigned ingest v1 pipelines when QW_DISABLE_INGEST_V1 is enabled, avoiding expensive metastore work that can create a large merge backlog. It also adds instrumentation to help identify slow IndexingService actor handlers.

Changes:

  • Skip scheduling ingest v1 (SourceParams::IngestApi) sources when disable_ingest_v1() is true.
  • Add a SlowHandlerGuard and wire it into several IndexingService handlers to emit periodic “slow handler” warnings.
  • Improve indexing task map formatting by omitting entries with empty task lists.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
quickwit/quickwit-indexing/src/actors/indexing_service.rs Adds SlowHandlerGuard and instantiates it in multiple actor handlers to detect slow operations.
quickwit/quickwit-control-plane/src/indexing_scheduler/mod.rs Skips scheduling ingest v1 sources when disabled; tweaks debug-formatting to hide empty task lists.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1028 to +1035
pub struct SlowHandlerGuard {
_cancel_tx: oneshot::Sender<()>,
}

impl SlowHandlerGuard {
pub fn new(handler_name: &'static str) -> Self {
let (cancel_tx, mut cancel_rx) = oneshot::channel::<()>();
let start = Instant::now();
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SlowHandlerGuard uses oneshot::Sender / oneshot::channel, but oneshot is not imported anywhere in this file. This will not compile as-is; add an appropriate import (e.g. from tokio::sync) or fully-qualify the path at the usage sites.

Copilot uses AI. Check for mistakes.
Comment on lines +1033 to +1045
pub fn new(handler_name: &'static str) -> Self {
let (cancel_tx, mut cancel_rx) = oneshot::channel::<()>();
let start = Instant::now();
tokio::spawn(async move {
loop {
tokio::select! {
_ = tokio::time::sleep(Duration::from_secs(5)) => {
warn!(handler=handler_name, elapsed_secs=start.elapsed().as_secs(), "slow indexing service handler");
}
_ = &mut cancel_rx => { break; }
}
}
});
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SlowHandlerGuard::new spawns a new Tokio task on every handler invocation (including periodic supervise_loop). Even when the handler is fast, this creates avoidable task-scheduling overhead and can amplify load under high message volume. Consider implementing the slow-handler warning without spawning (e.g., wrap the handler future in a tokio::select! loop with an interval/sleep tick and log while the future is still pending).

Copilot uses AI. Check for mistakes.
Comment on lines 222 to +227
SourceParams::IngestApi => {
// TODO ingest v1 is scheduled differently
if disable_ingest_v1() {
// Existing indexes might still have the _ingest-api-source
continue;
}
// Note: ingest v1 is scheduled differently
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new branch makes get_sources_to_schedule behavior depend on QW_DISABLE_INGEST_V1. There is an existing unit test (test_get_sources_to_schedule) that includes an IngestApi source and asserts a fixed number of scheduled sources; with this change it will need to account for the disabled-ingest-v1 case (otherwise it can fail in test runs where QW_DISABLE_INGEST_V1=true). Consider updating/adding assertions so the test covers both outcomes (scheduled when enabled, skipped when disabled).

Copilot uses AI. Check for mistakes.
@rdettai-sk rdettai-sk requested a review from Darkheir April 8, 2026 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants