Skip to content

Added minimum Manager property#6209

Merged
dlmarion merged 4 commits intoapache:mainfrom
dlmarion:multi-manager-wait
Mar 12, 2026
Merged

Added minimum Manager property#6209
dlmarion merged 4 commits intoapache:mainfrom
dlmarion:multi-manager-wait

Conversation

@dlmarion
Copy link
Contributor

Added properties and code to the Manager to wait for a minimum number of Managers to start and get the
assistant manager lock before continuing and acquiring the primary lock.

Added properties and code to the Manager to wait for
a minimum number of Managers to start and get the
assistant manager lock before continuing and acquiring
the primary lock.
@dlmarion dlmarion added this to the 4.0.0 milestone Mar 12, 2026
@dlmarion dlmarion requested a review from keith-turner March 12, 2026 15:53
@dlmarion dlmarion self-assigned this Mar 12, 2026
Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reuse of the existing code was nice. Would be good to have test for this, looked at the existing tserver props and it does not seem like that has test though. Maybe both can be tested in a follow on. Does seems like some test use the existing tserver prop, but do not think they are actually testing that prop. Was this manually tested? If so did you look at the logs? The log changes seems ok to me, but sometimes it easier to see bugs in the logging when looking at the actual logs.

@dlmarion
Copy link
Contributor Author

I'd like to merge #6208 first, then update this PR and see if I can use this in the MultipleManagerIT.

@dlmarion dlmarion linked an issue Mar 12, 2026 that may be closed by this pull request
@dlmarion dlmarion merged commit c01b349 into apache:main Mar 12, 2026
9 checks passed

// Wait a few seconds for processes to start and
// for ServerContext to be created
Thread.sleep(10_000);
Copy link
Contributor

@keith-turner keith-turner Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sleep needed w/ the wait for on the next line? Will the wait for error w/o the sleep? If so maybe could make the wait for code just catch errors and log at debug instead of sleeping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sleep is needed to allow Mini some time to start up and set the ServerContext. The call to getCluster().getServerContext() will return null. I could probably add a wait for it to be non-null and remove this sleep.

Comment on lines +42 to +43
String zAsstMgrPath = Constants.ZMANAGER_ASSISTANT_LOCK + "/" + ResourceGroupId.DEFAULT;
return getCluster().getServerContext().getZooSession().getChildren(zAsstMgrPath, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use the server paths code and if it did that could specify to only return entries holding a lock. That will give a stronger check than just seeing the dir in ZK.

// Set this lower so that locks timeout faster
cfg.setProperty(Property.INSTANCE_ZK_TIMEOUT, "5s");
cfg.setProperty(Property.MANAGER_STARTUP_MANAGER_AVAIL_MIN_COUNT, "2");
cfg.setProperty(Property.MANAGER_STARTUP_MANAGER_AVAIL_MAX_WAIT, "10s");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want to set this higher as it could interfere w/ the test if things are running slow.

Wait.waitFor(() -> getAssistantManagers().size() == 2);

// The Primary Manager lock should now be acquired yet
assertNotNull(getCluster().getServerContext().getServerPaths().getManager(true));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want this in a waitFor. The primary manager may not have picked up the lock yet.

@dlmarion
Copy link
Contributor Author

@keith-turner - sorry, I didn't realize you were going to look at this again since you already approved. I will address your comments above, and fix in a different PR if needed.

@keith-turner
Copy link
Contributor

No problem, it was not merged when I started looking at it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wait for a minimum set of Managers before delegating management task

2 participants