MergeTreeData: force Wide part format when the table has deprecated Object columns #1415
Closed
mkmkme wants to merge 2 commits into antalya-25.8
fixes #1412

Here's the analysis of #1412 by Claude:

Root cause: The old `Object('json')` type converts JSON data to nested Tuple structures before storage. When parts have different JSON schemas, reading subcolumns from compact parts without per-substream marks fails: the compact reader's deserialization can't properly handle the complex nested type serialization, leading to tuple elements being deserialized with mismatched sizes.

Crash: Logical error: 'Unexpected size of tuple element 1: 0. Expected size: 1' in `SerializationTuple::deserializeBinaryBulkWithMultipleStreams`

Recommended Fix: Force wide parts for tables with deprecated Object columns

In `MergeTreeData::choosePartFormat`, detect tables with deprecated Object columns and always choose Wide format. Wide parts work correctly with all ClickHouse versions and handle complex nested types properly.

This is clean and targeted because:
- Only affects tables using the deprecated `Object('json')` type (narrow scope)
- The enum ordering (`Wide=0 < Compact=1`) means `std::min` in merge logic will pick Wide when `choosePartFormat` returns it, so even existing compact parts get rewritten to Wide during natural merges
- No changes needed to the complex compact reader deserialization code
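The selection rule described above can be sketched in isolation. This is only an illustration under stated assumptions, not the code from this PR: `PartFormat`, `ColumnSketch`, `choosePartFormatSketch`, `mergedPartFormatSketch`, and the size threshold are hypothetical stand-ins, and the merge step is modeled as a plain `std::min` over the chosen format and a source part's format, as the description suggests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the part format enum.
// The ordering mirrors the PR description: Wide = 0 < Compact = 1.
enum class PartFormat { Wide = 0, Compact = 1 };

// Hypothetical column descriptor; the real code inspects data types instead.
struct ColumnSketch
{
    std::string name;
    bool is_deprecated_object; // true for the old Object('json') type
};

// Assumed threshold below which a part would normally be written as Compact.
constexpr std::size_t assumed_min_bytes_for_wide_part = 10 * 1024 * 1024;

bool hasDeprecatedObjectColumn(const std::vector<ColumnSketch> & columns)
{
    return std::any_of(columns.begin(), columns.end(),
        [](const ColumnSketch & column) { return column.is_deprecated_object; });
}

// Force Wide when any deprecated Object column is present;
// otherwise fall back to a size-based choice.
PartFormat choosePartFormatSketch(const std::vector<ColumnSketch> & columns, std::size_t bytes)
{
    if (hasDeprecatedObjectColumn(columns))
        return PartFormat::Wide;
    return bytes < assumed_min_bytes_for_wide_part ? PartFormat::Compact : PartFormat::Wide;
}

// Assumed model of the merge step: the smaller enum value wins,
// so a forced Wide choice overrides a Compact source part.
PartFormat mergedPartFormatSketch(PartFormat chosen, PartFormat source)
{
    return std::min(chosen, source);
}

// Small usage example: a tiny part in a table with an Object column.
void exampleUsage()
{
    std::vector<ColumnSketch> columns = {{"id", false}, {"payload", true}};
    PartFormat chosen = choosePartFormatSketch(columns, 1024);
    assert(chosen == PartFormat::Wide);
    assert(mergedPartFormatSketch(chosen, PartFormat::Compact) == PartFormat::Wide);
}
```

In this model a table without Object columns keeps the usual size-based behavior, which is the narrow scope the description argues for.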
Wrong target branch, will recreate a PR.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
...
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: