
Use replica-level dataVolumeClaimTemplate and logVolumeClaimTemplate when the shard's field is not set. #1861

Open
kschoche wants to merge 1 commit into Altinity:0.26.0 from kschoche:kyle.schochenmaier/use-replica-overrides-over-shards-for-volumes

Conversation


@kschoche kschoche commented Nov 19, 2025

In the case where replicas override dataVolumeClaimTemplates/logVolumeClaimTemplates but the shard does not set them, the replica-level overrides are ignored. This fix always prefers the replica settings, since they are the more fine-grained control point.

I found this while attempting something a little out of the box: using the operator to migrate a ClickHouse cluster from an old storage class to a new one via the replica-level template overrides.

So I'm going from this type of spec, which uses the defaults for each shard and replica:

spec:
  configuration:
    clusters:
    - layout:
        replicasCount: 3
        shardsCount: 1
    defaults:
      templates:
        dataVolumeClaimTemplate: original-template

And I'd like to migrate to this, which should have the effect of (re)creating the 3 existing replicas using their existing PVs, and adding a new replica to the shard that uses the new storage class via the logVolumeClaimTemplate + dataVolumeClaimTemplate fields.

spec:
  configuration:
    clusters:
    - layout:
        shards:
        - replicas:
          - name: 0-0
          - name: 0-1
          - name: 0-2
          - name: 0-3
            templates:
              dataVolumeClaimTemplate: some-different-template
              logVolumeClaimTemplate: some-different-template
    defaults:
      templates:
        dataVolumeClaimTemplate: original-template

What happens, though, is that the new replica 0-3 is created using the default dataVolumeClaimTemplate instead of the one specified in the replica-level field of the spec.

While trying to get this to work, I noticed that the replica-level dataVolumeClaimTemplate field is ignored if it is not also set in the shard. This PR removes the shard-level check in favour of the replica-level field, which allows finer-grained control of the logVolumeClaimTemplates and dataVolumeClaimTemplates.

NOTE: I believe this to be a bug fix, since replica-level logVolumeClaimTemplate + dataVolumeClaimTemplate configurations are ignored when both the shard and the replica set them, but it will change behaviour for users:

  - Old behavior: When both shard-level and replica-level volume claim templates (or other settings) were specified, shard-level settings took precedence
  - New behavior: When both are specified, replica-level settings now take precedence (see the sketch after this list)
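
To make the intended precedence concrete, here is a minimal, hypothetical Go sketch. The type and function names are illustrative only, not the operator's actual API and not this PR's actual diff; it just shows replica-over-shard-over-default resolution for a single field:

package main

import "fmt"

// TemplatesList is a hypothetical stand-in for the operator's per-level
// template references; only the field discussed in this PR is modelled.
type TemplatesList struct {
	DataVolumeClaimTemplate string
}

// pickDataVolumeClaimTemplate sketches the intended precedence: the replica-level
// value wins whenever it is set, the shard-level value is the fallback, and the
// cluster-wide default applies last. (Previously shard and replica were reversed.)
func pickDataVolumeClaimTemplate(replica, shard, defaults TemplatesList) string {
	if replica.DataVolumeClaimTemplate != "" {
		return replica.DataVolumeClaimTemplate
	}
	if shard.DataVolumeClaimTemplate != "" {
		return shard.DataVolumeClaimTemplate
	}
	return defaults.DataVolumeClaimTemplate
}

func main() {
	defaults := TemplatesList{DataVolumeClaimTemplate: "original-template"}
	replica := TemplatesList{DataVolumeClaimTemplate: "some-different-template"}
	// With the replica-level value set, it wins even though the shard sets nothing.
	fmt.Println(pickDataVolumeClaimTemplate(replica, TemplatesList{}, defaults)) // some-different-template
}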

Thanks for taking the time to contribute to clickhouse-operator!

Please, read carefully instructions on how to make a Pull Request.

This will help a lot for maintainers to adopt your Pull Request.

Important items to consider before making a Pull Request

Please check items PR complies to:

  • All commits in the PR are squashed. More info
  • The PR is made into a dedicated next-release branch, not into the master branch¹. More info
  • The PR is signed. More info

--

¹ If you feel your PR does not affect any Go code or any testable functionality (for example, the PR contains docs only or supplementary materials), the PR can be made into the master branch, but this has to be confirmed by a project maintainer.

@kschoche kschoche changed the base branch from master to 0.25.6 November 19, 2025 19:24
In the case where replicas override data/logVolumeClaimTemplates but
the shard does not set these, the overrides were being ignored. This
change ensures replica settings are always preferred as they provide
more fine-grained control points.

Includes unit tests for the new behavior.
@kschoche kschoche force-pushed the kyle.schochenmaier/use-replica-overrides-over-shards-for-volumes branch from 3dd86b4 to 1bb2e3e January 7, 2026 17:02
@kschoche kschoche marked this pull request as ready for review January 7, 2026 17:29
@sunsingerus sunsingerus changed the base branch from 0.25.6 to 0.26.0 February 17, 2026 15:49
Collaborator

@sunsingerus sunsingerus left a comment


Thanks for the detailed write-up and for including unit tests — the test structure is solid.

However, after tracing the full normalization flow, I believe this PR may not address the scenario described, and the code change could introduce unintended side effects.

The described use case doesn't match the code change

In your YAML, you have replicas within a shard:

layout:
  shards:
  - replicas:
    - name: 0-3
      templates:
        dataVolumeClaimTemplate: some-different-template

This populates Layout.Shards[0].Hosts[3] (the replicas JSON field inside ChiShard maps to Hosts []*Host). Crucially, the top-level Layout.Replicas array remains empty.

The FillShardsReplicasExplicitlySpecified() method checks len(cluster.Layout.Replicas) — the top-level array. Since it's empty, ReplicasExplicitlySpecified = false. This means neither isReplicaExplicitlySpecified() nor isShardToBeUsedToInheritSettingsFrom() changes from the current behavior — the PR's code changes have no effect on this scenario.
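
To illustrate why that check never fires here, a simplified Go sketch follows. The struct definitions are illustrative and much reduced compared to the operator's real types; only the "replicas inside a shard map to Hosts" mapping and the top-level-array check described above are modelled:

package main

import "fmt"

// Simplified, illustrative types; the operator's real definitions are richer.
type Host struct {
	Name string `json:"name"`
}

type ChiShard struct {
	// Inside a shard spec, the YAML/JSON key "replicas" unmarshals into Hosts.
	Hosts []*Host `json:"replicas,omitempty"`
}

type ChiClusterLayout struct {
	Shards   []*ChiShard `json:"shards,omitempty"`
	Replicas []*Host     `json:"replicas,omitempty"` // the top-level replicas array
}

// replicasExplicitlySpecified mirrors the check described above: it inspects only
// the top-level Replicas array, so replicas nested inside shards do not count.
func replicasExplicitlySpecified(layout *ChiClusterLayout) bool {
	return len(layout.Replicas) > 0
}

func main() {
	// The PR author's spec: replicas defined inside a shard, no top-level replicas.
	layout := &ChiClusterLayout{
		Shards: []*ChiShard{
			{Hosts: []*Host{{Name: "0-0"}, {Name: "0-1"}, {Name: "0-2"}, {Name: "0-3"}}},
		},
	}
	fmt.Println(replicasExplicitlySpecified(layout)) // false, so the changed code path is never reached
}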

Per-host templates should already survive normalization

Tracing the normalization flow for your use case:

  1. createHostsField() — migrates hosts from shard.Hosts[] (including your replica 0-3 with its dataVolumeClaimTemplate) into HostsField via host.MergeFrom(), which uses MergeTypeFillEmptyValues
  2. normalizeShardHosts() — re-populates shard.Hosts from HostsField — templates survive
  3. normalizeHostStage2() — calls host.InheritTemplatesFrom(shard) with MergeTypeFillEmptyValues, which only fills empty values — your host's explicitly set template should NOT be overwritten

If the per-replica template is being lost, the root cause is likely elsewhere — perhaps during template application in hostApplyHostTemplateSpecifiedOrDefault() or during the InheritSettingsFrom chain. Could you provide the operator version where you observed this issue? It would help narrow down the root cause.
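
For reference, here is a minimal sketch of the fill-empty merge semantics described in step 3, using hypothetical types rather than the operator's generic merge code. It shows why an explicitly set per-host template should survive inheritance from the shard:

package main

import "fmt"

// Illustrative only: a fill-empty merge in the spirit of MergeTypeFillEmptyValues,
// restricted to the two fields under discussion.
type Templates struct {
	DataVolumeClaimTemplate string
	LogVolumeClaimTemplate  string
}

// fillEmptyFrom copies a value from the parent only when the child's field is empty,
// so an explicitly set per-host template is never overwritten.
func (t *Templates) fillEmptyFrom(parent Templates) {
	if t.DataVolumeClaimTemplate == "" {
		t.DataVolumeClaimTemplate = parent.DataVolumeClaimTemplate
	}
	if t.LogVolumeClaimTemplate == "" {
		t.LogVolumeClaimTemplate = parent.LogVolumeClaimTemplate
	}
}

func main() {
	host := Templates{DataVolumeClaimTemplate: "some-different-template"}
	shard := Templates{DataVolumeClaimTemplate: "original-template", LogVolumeClaimTemplate: "original-template"}
	host.fillEmptyFrom(shard)
	// The host's explicit data template survives; only the unset log template is inherited.
	fmt.Printf("%+v\n", host)
}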

Additional concerns

  1. Breaking change: When both Layout.Shards and Layout.Replicas are populated (the rare case this PR does affect), shard currently wins. Changing to replica-wins could break existing users.

  2. type_status_test.go changes are unnecessary: The 0.26.0 branch already uses types.CopyStatusOptions. These changes were needed for 0.25.6 but are now stale/conflicting.

Recommendation

If you can reproduce the issue on the current 0.26.0 branch, please share:

  • Exact operator version
  • Full CHI YAML
  • The generated StatefulSet showing the wrong volume claim

This will help us find the real root cause, which may be in a different part of the normalization pipeline.

@sunsingerus sunsingerus added the waiting reply (Altinity is waiting for a reply) label Feb 17, 2026