Skip to content

[phase-31 4/4] PostgreSQL metastore — migration + compaction columns#6245

Merged
g-talbot merged 42 commits intomainfrom
gtt/phase-31-pg-metastore
Apr 8, 2026
Merged

[phase-31 4/4] PostgreSQL metastore — migration + compaction columns#6245
g-talbot merged 42 commits intomainfrom
gtt/phase-31-pg-metastore

Conversation

@g-talbot
Copy link
Copy Markdown
Contributor

Summary

Add compaction metadata to the PostgreSQL metastore with migration 27 and updated SQL queries (Phase 31 Metadata Foundation, PR 4 of 4).

Stacks on gtt/phase-31-writer-wiring (PR #6244).

What's included

Migration 27 (27_add-compaction-metadata.{up,down}.sql):

  • ALTER TABLE metrics_splits: adds window_start, window_duration_secs, sort_fields, num_merge_ops, row_keys, zonemap_regexes
  • Partial index idx_metrics_splits_compaction_scope on (index_uid, sort_fields, window_start) WHERE split_state = 'Published'

stage_metrics_splits:

  • INSERT extended from 15 to 21 bind parameters ($16-$21 for compaction columns)
  • ON CONFLICT SET clause updates all 6 compaction columns
  • zonemap_regexes cast from text to JSONB in SELECT

list_metrics_splits:

  • PgMetricsSplit construction includes compaction fields (defaults until migration runs; authoritative data from JSON blob via to_metadata())

Bug fixes (pre-existing on upstream-10b-parquet-actors):

  • Added missing StageMetricsSplitsRequestExt import
  • Fixed index_id vs index_uid type mismatches in publish, mark, and delete functions
  • Fixed IndexUid binding to sqlx (use .to_string())
  • Fixed ListMetricsSplitsResponseExt::try_from_splits trait disambiguation

Verification

  • cargo build -p quickwit-metastore --features quickwit-metastore/postgres
  • Clippy: dependent on upstream-10b-parquet-actors test compilation (tests not gated)

Test plan

  • Compilation with postgres feature succeeds
  • Integration tests require Docker (Postgres) — run with make test-metrics-e2e
  • Migration SQL is syntactically valid and reversible

🤖 Generated with Claude Code

@g-talbot g-talbot force-pushed the gtt/phase-31-pg-metastore branch from c8836d6 to eccf657 Compare March 31, 2026 20:41
@g-talbot g-talbot force-pushed the gtt/phase-31-writer-wiring branch 2 times, most recently from 3bbfb71 to 95c3596 Compare March 31, 2026 20:55
@g-talbot g-talbot force-pushed the gtt/phase-31-pg-metastore branch 2 times, most recently from 6eeaecd to f2113e5 Compare March 31, 2026 21:03
@g-talbot g-talbot force-pushed the gtt/phase-31-writer-wiring branch 2 times, most recently from 179ccd2 to ed6d687 Compare March 31, 2026 21:08
@g-talbot g-talbot force-pushed the gtt/phase-31-pg-metastore branch from f2113e5 to 0d561b4 Compare March 31, 2026 21:08
@g-talbot g-talbot force-pushed the gtt/phase-31-writer-wiring branch from ed6d687 to a4d0d36 Compare March 31, 2026 21:26
@g-talbot g-talbot force-pushed the gtt/phase-31-pg-metastore branch from 0d561b4 to 8ba9201 Compare March 31, 2026 21:26
@g-talbot g-talbot force-pushed the gtt/phase-31-pg-metastore branch from 8ba9201 to 0761d11 Compare March 31, 2026 21:29
@g-talbot g-talbot force-pushed the gtt/phase-31-writer-wiring branch from a4d0d36 to f05d4e7 Compare March 31, 2026 21:31
@g-talbot g-talbot force-pushed the gtt/phase-31-writer-wiring branch from df6e699 to 76b703a Compare April 1, 2026 20:50
@g-talbot g-talbot force-pushed the gtt/phase-31-pg-metastore branch 2 times, most recently from c23e666 to ee0c5a4 Compare April 1, 2026 21:01
@g-talbot g-talbot force-pushed the gtt/phase-31-pg-metastore branch from ee0c5a4 to 605708e Compare April 1, 2026 21:14
g-talbot and others added 2 commits April 6, 2026 09:48
- Migration 27: add maturity_timestamp, delete_opstamp, node_id columns
  and publish_timestamp trigger to match the splits table (Paul's review)
- ListMetricsSplitsQuery: adopt FilterRange<i64> for time_range (matching
  log-side pattern), single time_range field for both read and compaction
  paths, add node_id/delete_opstamp/update_timestamp/create_timestamp/
  mature filters to close gaps with ListSplitsQuery
- Use SplitState enum instead of stringly-typed Vec<String> for split_states
- StoredMetricsSplit: add create_timestamp, node_id, delete_opstamp,
  maturity_timestamp so file-backed metastore can filter on them locally
- File-backed filter: use FilterRange::overlaps_with() for time range and
  window intersection, apply all new filters matching log-side predicate
- Postgres: intersection semantics for window queries, FilterRange-based
  SQL generation for all range filters
- Fix InsertableMetricsSplit.window_duration_secs from Option<i32> to i32
- Rename two-letter variables (ws, sf, dt) throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mattmkim
Copy link
Copy Markdown
Contributor

mattmkim commented Apr 6, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f7905198f7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

/// Update timestamp (Unix epoch seconds).
pub update_timestamp: i64,
/// Create timestamp (Unix epoch seconds).
pub create_timestamp: i64,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Add default for new create_timestamp field

StoredMetricsSplit now requires create_timestamp during deserialization, but previously persisted file-backed index JSON does not contain this key. When a node upgrades and reloads an index with existing metrics_splits, serde will fail to decode those entries, preventing metastore state from loading. Marking this field with a serde default (like the other newly added fields) is needed for backward-compatible reads.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

g-talbot and others added 16 commits April 6, 2026 21:11
Co-authored-by: Matthew Kim <matthew.kim@datadoghq.com>
Co-authored-by: Matthew Kim <matthew.kim@datadoghq.com>
Resolve merge conflicts by taking main's versions of otel_metrics.rs
and arrow_metrics.rs (the PR didn't modify these files — conflicts
came from the base branch divergence). Kept PR's table_config module
export in quickwit-parquet-engine/src/lib.rs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-existing splits were serialized before the parquet_file field was
added, so their JSON doesn't contain it. Adding #[serde(default)]
makes deserialization fall back to empty string for old splits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the commit timeout fires and the accumulator contains only
zero-column batches, union_fields is empty and concat_batches fails
with "must either specify a row count or at least one column".
Now flush_internal treats empty union_fields the same as empty
pending_batches — resets state and returns None.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Resolve Cargo.lock/Cargo.toml merge conflicts
- P1 (sort column lookup): Already addressed by sort fields tag_ prefix
  fix — sort field names now match Parquet column names
- P2 (window_start at epoch 0): Remove time_range.start_secs > 0 guard
  so window_start is computed for all batches when window_duration > 0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Base automatically changed from gtt/phase-31-writer-wiring to main April 8, 2026 14:17
- Resolve writer.rs conflict: keep META-07 self-describing roundtrip test
- P1 (create_timestamp serde): Add #[serde(default)] to
  StoredMetricsSplit.create_timestamp for backward-compatible reads
  of pre-existing file-backed index JSON
- P1 (compaction window overlap): No change needed — Bound::Included
  vs Bound::Excluded already handles half-open interval semantics
  correctly, and the edge case (zero duration) is impossible
- fields.rs: No change — Matt noted it resolves with wide schema rebase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@g-talbot g-talbot merged commit 469c606 into main Apr 8, 2026
8 checks passed
@g-talbot g-talbot deleted the gtt/phase-31-pg-metastore branch April 8, 2026 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants