Use replica-level dataVolumeClaimTemplate and logVolumeClaimTemplate when the shard's field is not set. #1861
Conversation
In the case where replicas override data/logVolumeClaimTemplates but the shard does not set these, the overrides were being ignored. This change ensures replica settings are always preferred as they provide more fine-grained control points. Includes unit tests for the new behavior.
sunsingerus left a comment
Thanks for the detailed write-up and for including unit tests — the test structure is solid.
However, after tracing the full normalization flow, I believe this PR may not address the scenario described, and the code change could introduce unintended side effects.
The described use case doesn't match the code change
In your YAML, you have replicas within a shard:

```yaml
layout:
  shards:
    - replicas:
        - name: 0-3
          templates:
            dataVolumeClaimTemplate: some-different-template
```

This populates `Layout.Shards[0].Hosts[3]` (the `replicas` JSON field inside `ChiShard` maps to `Hosts []*Host`). Crucially, the top-level `Layout.Replicas` array remains empty.
The `FillShardsReplicasExplicitlySpecified()` method checks `len(cluster.Layout.Replicas)`, the top-level array. Since it's empty, `ReplicasExplicitlySpecified = false`. This means neither `isReplicaExplicitlySpecified()` nor `isShardToBeUsedToInheritSettingsFrom()` changes from the current behavior, so the PR's code changes have no effect on this scenario.
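A minimal sketch of why the check misses this case, using simplified stand-in types (these are illustrative, not the operator's actual type definitions):

```go
package main

import "fmt"

// Host is a stand-in for the operator's host type.
type Host struct{ Name string }

// Shard is a stand-in: the `replicas:` YAML field inside a shard
// unmarshals into Hosts.
type Shard struct {
	Hosts []*Host
}

// Layout is a stand-in: Replicas holds only the top-level `replicas:`
// array, which is empty in the spec above.
type Layout struct {
	Shards   []*Shard
	Replicas []*Host
}

// replicasExplicitlySpecified mirrors the described check: it inspects
// only the top-level Replicas array, never the per-shard hosts.
func replicasExplicitlySpecified(l *Layout) bool {
	return len(l.Replicas) > 0
}

func main() {
	l := &Layout{
		Shards: []*Shard{{Hosts: []*Host{{Name: "0-3"}}}},
		// Layout.Replicas deliberately left empty, as in the YAML above.
	}
	fmt.Println(replicasExplicitlySpecified(l)) // prints: false
}
```

So a spec that only sets replicas inside a shard never trips this flag.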
Per-host templates should already survive normalization
Tracing the normalization flow for your use case:
1. `createHostsField()` migrates hosts from `shard.Hosts[]` (including your replica `0-3` with its `dataVolumeClaimTemplate`) into `HostsField` via `host.MergeFrom()`, which uses `MergeTypeFillEmptyValues`
2. `normalizeShardHosts()` re-populates `shard.Hosts` from `HostsField`; templates survive
3. `normalizeHostStage2()` calls `host.InheritTemplatesFrom(shard)` with `MergeTypeFillEmptyValues`, which only fills empty values; your host's explicitly set template should NOT be overwritten
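The fill-empty merge semantics relied on in steps 1 and 3 can be sketched as follows, with hypothetical simplified types (not the operator's actual implementation):

```go
package main

import "fmt"

// Templates is a stand-in holding a single template field.
type Templates struct {
	DataVolumeClaimTemplate string
}

// fillEmptyFrom mimics MergeTypeFillEmptyValues for one field: the
// value is copied from `from` only when the receiver's field is empty.
func (t *Templates) fillEmptyFrom(from *Templates) {
	if t.DataVolumeClaimTemplate == "" {
		t.DataVolumeClaimTemplate = from.DataVolumeClaimTemplate
	}
}

func main() {
	host := &Templates{DataVolumeClaimTemplate: "some-different-template"}
	shard := &Templates{DataVolumeClaimTemplate: "default-template"}

	// The host's value is already set, so inheriting from the shard
	// must not overwrite it.
	host.fillEmptyFrom(shard)
	fmt.Println(host.DataVolumeClaimTemplate) // prints: some-different-template
}
```

Under these semantics, an explicitly set per-replica template survives inheritance from the shard.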
If the per-replica template is being lost, the root cause is likely elsewhere, perhaps during template application in `hostApplyHostTemplateSpecifiedOrDefault()` or during the `InheritSettingsFrom` chain. Could you provide the operator version where you observed this issue? It would help narrow down the root cause.
Additional concerns
- **Breaking change:** When both `Layout.Shards` and `Layout.Replicas` are populated (the rare case this PR does affect), shard currently wins. Changing to replica-wins could break existing users.
- **`type_status_test.go` changes are unnecessary:** The `0.26.0` branch already uses `types.CopyStatusOptions`. These changes were needed for `0.25.6` but are now stale/conflicting.
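The precedence change flagged in the first concern can be sketched like this (the helper names are hypothetical, not the operator's actual functions):

```go
package main

import "fmt"

// resolveShardWins models the current behaviour: the shard-level
// template takes precedence when both levels set one.
func resolveShardWins(shardTpl, replicaTpl string) string {
	if shardTpl != "" {
		return shardTpl
	}
	return replicaTpl
}

// resolveReplicaWins models the behaviour after this PR: the
// replica-level template always wins, as the more fine-grained
// control point, falling back to the shard's value when unset.
func resolveReplicaWins(shardTpl, replicaTpl string) string {
	if replicaTpl != "" {
		return replicaTpl
	}
	return shardTpl
}

func main() {
	fmt.Println(resolveShardWins("shard-tpl", "replica-tpl"))   // prints: shard-tpl
	fmt.Println(resolveReplicaWins("shard-tpl", "replica-tpl")) // prints: replica-tpl
}
```

Any user whose spec sets both levels and relies on the shard value would see different volume claims after the change.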
Recommendation
If you can reproduce the issue on the current 0.26.0 branch, please share:
- Exact operator version
- Full CHI YAML
- The generated StatefulSet showing the wrong volume claim
This will help us find the real root cause, which may be in a different part of the normalization pipeline.
In the case where replicas override `dataVolumeClaimTemplates`/`logVolumeClaimTemplates` but the shard does not set them, the replica-level overrides are ignored. This fix always prefers the replica settings, as they are more fine-grained control points.

I found this while attempting something a little out of the box: leveraging the operator to migrate a ClickHouse cluster from an old storage class to a new one via the replica-level template overrides.

So I'm going from this type of spec, which uses the defaults for each shard and replica. And I'd like to migrate to this, which should have the effect of (re)creating the 3 existing replicas using their existing PVs, and adding a new replica to the shard which uses the new storage class via the `logVolumeClaimTemplate` and `dataVolumeClaimTemplate` fields.

What happens, though, is that the new replica `0-4` is created using the default data volume claim instead of the one specified in the replica-level field of the spec.

While trying to get this to work, I noticed that the replica-level `dataVolumeClaimTemplate` field is ignored if it is not also set in the shard. This PR removes the shard-level check in favour of the replica-level field, which allows finer-grained control of the `logVolumeClaimTemplates` and `dataVolumeClaimTemplates`.

NOTE: I believe this to be a bug fix, since replica-level
`logVolumeClaimTemplate` and `dataVolumeClaimTemplate` configurations are ignored when both shard and replica were set, but it will change behaviour for users:

Thanks for taking the time to contribute to
`clickhouse-operator`! Please read carefully the instructions on how to make a Pull Request. This will help maintainers a lot in adopting your Pull Request.
Important items to consider before making a Pull Request
Please check the items your PR complies with:
- PR is made into the `next-release` branch, not into the `master` branch [1]. More info

--

[1] If you feel your PR does not affect any Go code or any testable functionality (for example, the PR contains docs only or supplementary materials), the PR can be made into the `master` branch, but it has to be confirmed by a project maintainer.