Conversation
Added properties and code to the Manager to wait for a minimum number of Managers to start and get the assistant manager lock before continuing and acquiring the primary lock.
keith-turner
left a comment
There was a problem hiding this comment.
The reuse of the existing code was nice. Would be good to have test for this, looked at the existing tserver props and it does not seem like that has test though. Maybe both can be tested in a follow on. Does seems like some test use the existing tserver prop, but do not think they are actually testing that prop. Was this manually tested? If so did you look at the logs? The log changes seems ok to me, but sometimes it easier to see bugs in the logging when looking at the actual logs.
|
I'd like to merge #6208 first, then update this PR and see if I can use this in the MultipleManagerIT. |
|
|
||
| // Wait a few seconds for processes to start and | ||
| // for ServerContext to be created | ||
| Thread.sleep(10_000); |
There was a problem hiding this comment.
Is this sleep needed w/ the wait for on the next line? Will the wait for error w/o the sleep? If so maybe could make the wait for code just catch errors and log at debug instead of sleeping.
There was a problem hiding this comment.
The sleep is needed to allow Mini some time to start up and set the ServerContext. The call to getCluster().getServerContext() will return null. I could probably add a wait for it to be non-null and remove this sleep.
| String zAsstMgrPath = Constants.ZMANAGER_ASSISTANT_LOCK + "/" + ResourceGroupId.DEFAULT; | ||
| return getCluster().getServerContext().getZooSession().getChildren(zAsstMgrPath, null); |
There was a problem hiding this comment.
This could use the server paths code and if it did that could specify to only return entries holding a lock. That will give a stronger check than just seeing the dir in ZK.
| // Set this lower so that locks timeout faster | ||
| cfg.setProperty(Property.INSTANCE_ZK_TIMEOUT, "5s"); | ||
| cfg.setProperty(Property.MANAGER_STARTUP_MANAGER_AVAIL_MIN_COUNT, "2"); | ||
| cfg.setProperty(Property.MANAGER_STARTUP_MANAGER_AVAIL_MAX_WAIT, "10s"); |
There was a problem hiding this comment.
May want to set this higher as it could interfere w/ the test if things are running slow.
| Wait.waitFor(() -> getAssistantManagers().size() == 2); | ||
|
|
||
| // The Primary Manager lock should now be acquired yet | ||
| assertNotNull(getCluster().getServerContext().getServerPaths().getManager(true)); |
There was a problem hiding this comment.
Might want this in a waitFor. The primary manager may not have picked up the lock yet.
|
@keith-turner - sorry, I didn't realize you were going to look at this again since you already approved. I will address your comments above, and fix in a different PR if needed. |
|
No problem, it was not merged when I started looking at it. |
Added properties and code to the Manager to wait for a minimum number of Managers to start and get the
assistant manager lock before continuing and acquiring the primary lock.