fix assertion failure in distributed barrier#6871
fix assertion failure in distributed barrier#6871the-ivii wants to merge 6 commits intoTheHPXProject:masterfrom
Conversation
|
Can one of the admins verify this patch? |
6048f0b to
5af888e
Compare
| init_args.desc_cmdline = description; | ||
|
|
||
| HPX_TEST_EQ(hpx::init(argc, argv, init_args), 0); | ||
| hpx::distributed::barrier::synchronize(); |
There was a problem hiding this comment.
Only a few HPX facilities can be used when the runtime is not active (like here). The distributed barrier is not one of them.
Did you see actual code that is doing this? Or is this an artificial construct meant to test your change to synchronize() above?
There was a problem hiding this comment.
this is a test designed to simulate a race condition we saw during teardown in distributed runs. It is not meant to encourage post runtime usage, but just to ensure that if a sync call overlaps with cleanup, it returns gracefully instead of triggering a hard assertion failure.
There was a problem hiding this comment.
Wouldn't your solution just hide the problem?
Also, I'm glad you now mention under what circumstances you saw the original issue. If it happens during system teardown, then it's definitely not a problem in the initialization (as I initially assumed) but a possible sequencing error while the runtime exits (possibly caused by the sequencing of global destructors being undefined).
IIRC, the barrier in this case is used to make sure that all localities have finished executing.
I'd appreciate it if you could provide a stack backtrace of the original assert you were seeing. That may allow us to see which object is destroyed too late.
|
@the-ivii Could you please rebase your branch onto master (and resolve the conflicts)? |
|
@the-ivii What are your plans regarding this PR? Will you have the time to work on it? |
a42a94b to
191ad62
Compare
| HPX_TEST_EQ(hpx::init(argc, argv, init_args), 0); | ||
|
|
||
| return hpx::util::report_errors(); | ||
| } |
There was a problem hiding this comment.
Does this test reproduce the problem for you (without your change)?
Signed-off-by: the-ivii <divyanshitalreja12@gmail.com>
Signed-off-by: the-ivii <divyanshitalreja12@gmail.com>
c663e38 to
bafd7c6
Compare
|
The issue reported by the clang-format CI should be fixed. Also, the MacOS builders fail your new test. Please have an eye on the CIs yourself. |
f7fbcf8 to
26b73d6
Compare
|
@the-ivii there is still the clang-format issue, also your changes cause compilations errors. Please have a look. |
Signed-off-by: the-ivii <divyanshitalreja12@gmail.com>
Signed-off-by: the-ivii <divyanshitalreja12@gmail.com>
26b73d6 to
a807991
Compare
Signed-off-by: the-ivii <divyanshitalreja12@gmail.com>
Signed-off-by: the-ivii <divyanshitalreja12@gmail.com>
cb127a5 to
74d11aa
Compare
|
@the-ivii ping? |
Fixes #6430
Proposed Changes
Any background context you want to provide?
Checklist
Not all points below apply to all pull requests.