
MergeTreeData: force Wide part format when the table has deprecated Object columns #1415

Closed
mkmkme wants to merge 2 commits into antalya-25.8 from bugfix/1412

Conversation

@mkmkme
Collaborator

@mkmkme mkmkme commented Feb 17, 2026

fixes #1412

Here's the analysis of #1412 by Claude:

Root cause: The deprecated Object('json') type converts JSON data to nested Tuple structures before storage. When parts have different JSON schemas, reading subcolumns from Compact parts that lack per-substream marks fails: the compact reader cannot correctly deserialize the nested type serialization, so tuple elements are deserialized with mismatched sizes.

Recommended Fix: Force wide parts for tables with deprecated Object columns

In MergeTreeData::choosePartFormat, detect tables with deprecated Object columns and always choose Wide format. Wide parts work correctly with all ClickHouse versions and handle complex nested types properly.
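
A minimal sketch of the idea, not the actual patch. PartType, ColumnSketch, TableSketch, hasDeprecatedObjectColumns and choosePartFormatSketch are hypothetical stand-ins, and the size-based fallback is a simplified assumption about how the thresholds are applied:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins for illustration only; these are NOT the real ClickHouse types.
enum class PartType { Wide = 0, Compact = 1 };

struct ColumnSketch
{
    std::string name;
    bool is_deprecated_object = false; // true for columns declared as Object('json')
};

struct TableSketch
{
    std::vector<ColumnSketch> columns;
    std::size_t min_bytes_for_wide_part = 0; // assumed size threshold
    std::size_t min_rows_for_wide_part = 0;  // assumed row threshold
};

// Hypothetical helper: does any column use the deprecated Object type?
bool hasDeprecatedObjectColumns(const TableSketch & table)
{
    for (const auto & column : table.columns)
        if (column.is_deprecated_object)
            return true;
    return false;
}

PartType choosePartFormatSketch(const TableSketch & table, std::size_t bytes, std::size_t rows)
{
    // The override described above: never pick Compact for such tables.
    if (hasDeprecatedObjectColumns(table))
        return PartType::Wide;

    // Simplified size-based choice (assumption): small parts become Compact.
    if (bytes < table.min_bytes_for_wide_part || rows < table.min_rows_for_wide_part)
        return PartType::Compact;

    return PartType::Wide;
}
```

With these stand-ins, a small insert into a table without Object columns still yields Compact, while the same insert into a table with an Object('json') column yields Wide.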

This is clean and targeted because:

  • Only affects tables using the deprecated Object('json') type (narrow scope)
  • Because the enum is ordered Wide=0 < Compact=1, std::min in the merge logic picks Wide whenever choosePartFormat returns it, so even existing Compact parts get rewritten to Wide during natural merges
  • No changes needed to the complex compact reader deserialization code
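
The enum-ordering argument in the list above can be illustrated with a small sketch. PartType and mergedPartTypeSketch are hypothetical names, and which two values the real merge code passes to std::min is an assumption here; the point is only that Wide wins whenever it is one of the operands:

```cpp
#include <algorithm>

// Stand-in enum mirroring the ordering quoted above (Wide=0 < Compact=1).
enum class PartType { Wide = 0, Compact = 1 };

// Hypothetical helper: combine the format returned by choosePartFormat with
// another candidate format. std::min uses the built-in operator< on the
// scoped enum, which compares the underlying values, so Wide (0) always wins.
PartType mergedPartTypeSketch(PartType from_choose_part_format, PartType other_candidate)
{
    return std::min(from_choose_part_format, other_candidate);
}
```

Under this assumption, once choosePartFormat returns Wide for a table, the merged part is Wide regardless of the other candidate, which is why no separate migration of existing Compact parts is needed.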

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

...

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

…bject columns

fixes #1412

Here's the analysis of #1412 by Claude:

     Root cause: The old Object('json') type converts JSON data to nested Tuple structures before storage. When parts have different JSON schemas, reading subcolumns from compact parts without per-substream marks fails — the compact
     reader's deserialization can't properly handle the complex nested type serialization, leading to tuple elements being deserialized with mismatched sizes.

     Crash: Logical error: 'Unexpected size of tuple element 1: 0. Expected size: 1' in SerializationTuple::deserializeBinaryBulkWithMultipleStreams

     Recommended Fix: Force wide parts for tables with deprecated Object columns

     In MergeTreeData::choosePartFormat, detect tables with deprecated Object columns and always choose Wide format. Wide parts work correctly with all ClickHouse versions and handle complex nested types properly.

     This is clean and targeted because:
     - Only affects tables using the deprecated Object('json') type (narrow scope)
     - The enum ordering (Wide=0 < Compact=1) means std::min in merge logic will pick Wide when choosePartFormat returns it, so even existing compact parts get rewritten to Wide during natural merges
     - No changes needed to the complex compact reader deserialization code
@github-actions

github-actions bot commented Feb 17, 2026

Workflow [PR], commit [adff9a6]

@mkmkme
Collaborator Author

mkmkme commented Feb 17, 2026

Wrong target branch, will recreate a PR

@mkmkme mkmkme closed this Feb 17, 2026

Development

Successfully merging this pull request may close these issues.

Regression in 25.8.16: Crash when reading Object(JSON) from Compact MergeTree parts

1 participant