performance: Optimize shared_mutex and fix C++20 modular build errors #7007
arpittkhandelwal wants to merge 1 commit into TheHPXProject:master from
Conversation
Force-pushed from f982021 to 72858ce
[Performance test report: HPX Performance Comparison]
Please keep the module-related changes separate (you could apply those to the PR that you have already open). Also, please have a look at the compilation errors reported (e.g., https://cdash.rostam.cct.lsu.edu/viewBuildError.php?buildid=42049)
Force-pushed from 72858ce to c16cfa8
I've cleaned the branch to remove unrelated modularization changes and fixed the benchmark compilation error and formatting. It should be ready for review now!
@arpittkhandelwal Please rebase onto master to fix the reported problems.
Force-pushed from c16cfa8 to 9ffd2f2
I have pushed the rebased branch.
```cpp
while (!s.data.exclusive && !s.data.exclusive_waiting_blocked)
{
    auto s1 = s;
    ++s.data.shared_count;
```
Since this is in a loop now, would the increment result in wrong counter results if the loop is executed more than once?
You're absolutely right. In the previous version, s would have retained the incremented shared_count and tag if set_state failed, leading to an incorrect cumulative total on the next iteration.
I've fixed this in the latest version by ensuring s is reset to the current atomic state (using the value returned in s1 from the failed CAS) at the end of each loop iteration:
```cpp
while (!s.data.exclusive && !s.data.exclusive_waiting_blocked)
{
    auto s1 = s;
    ++s.data.shared_count;
    if (set_state(s1, s))
    {
        return;
    }
    s.value = s1.value;    // Reset to the latest atomic state from CAS failure
}
```

This ensures that every retry starts with the most up-to-date counter values. I've also verified this fix with the shared_mutex unit tests and performance benchmarks.
In order to restore s you still need s = s1;. Otherwise, the counter will keep increasing.
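As a side note, the `std::atomic` compare-exchange family already implements this reset: on failure it writes the currently stored value back into the expected argument. Here is a minimal single-threaded sketch of the retry pattern under discussion, with hypothetical names (this is not HPX's actual `set_state` or state layout):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

std::atomic<std::uint32_t> state{0};

// One lock_shared-style retry loop: increment a copy of the snapshot and
// try to publish it with a CAS. compare_exchange_weak refreshes 'snapshot'
// with the currently stored value on failure, which plays the role of the
// "s = s1;" reset discussed above -- each retry starts from fresh state
// and the counter advances by exactly one per successful call.
void increment_with_cas()
{
    std::uint32_t snapshot = state.load(std::memory_order_acquire);
    for (;;)
    {
        std::uint32_t desired = snapshot + 1;    // ++shared_count on a copy
        if (state.compare_exchange_weak(snapshot, desired,
                std::memory_order_acq_rel, std::memory_order_acquire))
            return;
        // 'snapshot' now holds the latest stored value; loop and retry.
    }
}
```

Without that refresh, each failed iteration would compound the increment into a stale snapshot, which is exactly the drift the reviewer points out.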
[Resolved review thread on libs/core/synchronization/include/hpx/synchronization/shared_mutex.hpp (outdated)]
@arpittkhandelwal Are you still interested in working on this PR?
Force-pushed from 9ffd2f2 to af1f93a
Yes sir, updated.
[Resolved review thread on libs/core/synchronization/include/hpx/synchronization/shared_mutex.hpp (outdated)]
Force-pushed from 4ea0f5f to 714ce34
The code under review:

```cpp
if (data_->try_lock_shared())
    return;

auto data = data_;
data->lock_shared();
```

Should this be:

```cpp
auto data = data_;
if (data_->try_lock_shared())
    return;
data->lock_shared();
```

?

The same applies to one more spot below.
I have reverted this change as requested.
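For context, the ordering the reviewer suggests matters for lifetime: copying the reference-counted pointer before any locking pins the data block for the whole call, including the fast path. A rough sketch of that idea, with `std::shared_ptr` standing in for HPX's `intrusive_ptr` and all names illustrative:

```cpp
#include <cassert>
#include <memory>
#include <shared_mutex>

// 'mutex_data' stands in for hpx::detail::shared_mutex_data; the real type
// is intrusive_ptr-managed, but shared_ptr shows the same lifetime idea.
struct mutex_data
{
    std::shared_mutex mtx;
};

struct shared_mutex_wrapper
{
    std::shared_ptr<mutex_data> data_ = std::make_shared<mutex_data>();

    void lock_shared()
    {
        // Pin the data first, then try the fast path. Either branch now
        // operates on an object that is guaranteed to stay alive, and the
        // pointer is copied exactly once per call.
        auto data = data_;
        if (data->mtx.try_lock_shared())
            return;
        data->mtx.lock_shared();
    }

    void unlock_shared()
    {
        data_->mtx.unlock_shared();
    }
};
```

Taking the local copy after the `try_lock_shared` fast path would save one refcount bump on the uncontended path, which appears to be what the reverted change attempted, at the cost of the lifetime guarantee above.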
[Resolved review thread on libs/core/synchronization/include/hpx/synchronization/shared_mutex.hpp (outdated)]
Force-pushed from 8cebde5 to 608dc81
…ctural state management
Force-pushed from 608dc81 to 17510da
```cpp
{
    while (true)
    {
        auto s = state.load(std::memory_order_acquire);
```
This initial assignment can now happen before the loop starts.
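Hoisting the load is natural once the loop is driven by a compare-exchange, because a failed exchange reloads the expected value itself. A hedged sketch of that shape (the state layout and the bit being set are placeholders, not HPX's actual code):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

std::atomic<std::uint64_t> state{0};

// Load once, before the loop; a failed compare_exchange_weak refreshes 's'
// with the current stored value, so no per-iteration load is needed.
void set_flag_bit()
{
    auto s = state.load(std::memory_order_acquire);
    while (!state.compare_exchange_weak(s, s | 1u,
        std::memory_order_acq_rel, std::memory_order_acquire))
    {
        // 's' was updated by the failed exchange; just retry.
    }
}
```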
```
@@ -140,6 +140,27 @@ namespace hpx::detail {
        return true;
    }
```
This function is now (conceptually) different from try_unlock_shared_fast. Why is that the case? The latter resets s = s1; before restarting the loop. The former does not.
@arpittkhandelwal What's your plan with regard to moving this forward?
Codacy: Not up to standards ⛔🔴

Issues

| Category | Results |
|---|---|
| ErrorProne | 9 medium |

🟢 Metrics

| Metric | Results |
|---|---|
| Complexity | 5 |
| Duplication | 0 |

TIP: This summary will be updated as you push new changes.
Hi @hkaiser sir, I'll take another look at this PR, address the issues, and update it shortly.
This PR introduces performance optimizations for hpx::shared_mutex and resolves build issues encountered with C++20 modularity.

Key Changes:

- lock_shared: Added a fast path to hpx::detail::shared_mutex_data::lock_shared that attempts to acquire a shared lock using an atomic increment before falling back to the internal spinlock. This significantly reduces serialization in read-heavy scenarios, such as AGAS cache lookups.
- Refactored the hpx::detail::shared_mutex wrapper class to avoid redundant atomic increment/decrement operations of the internal intrusive_ptr on every call.
- Corrected the placement of HPX_CXX_EXPORT in components_base_fwd.hpp and component_type.hpp to ensure compatibility with C++20 modular builds.
- Added tests/performance/local/shared_mutex_overhead.cpp to quantify the overhead and contention of shared_mutex.

Performance Impact:

Benchmark results on a 4-thread reader-intensive workload (1,000,000 iterations per thread):

These optimizations will directly benefit high-concurrency read paths in HPX, particularly in the AGAS subsystem.
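As a rough illustration of the described fast path, the following sketch packs the lock state into one atomic word, admits readers with a single CAS, and falls back to a lock-protected slow path only when a writer is active. The bit layout, names, and fallback here are assumptions for illustration, not HPX's actual implementation:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Sketch: readers try to bump a packed reader count with one CAS and only
// fall back to a lock-protected slow path when the exclusive bit is set.
class fast_shared_mutex_sketch
{
    static constexpr std::uint64_t exclusive_bit = 1ull << 63;
    std::atomic<std::uint64_t> state_{0};
    std::mutex slow_path_;    // stands in for the internal spinlock

public:
    bool try_lock_shared()
    {
        auto s = state_.load(std::memory_order_acquire);
        while ((s & exclusive_bit) == 0)
        {
            if (state_.compare_exchange_weak(s, s + 1,
                    std::memory_order_acquire, std::memory_order_relaxed))
                return true;    // reader admitted without the spinlock
        }
        return false;    // writer active: caller must take the slow path
    }

    void lock_shared()
    {
        if (try_lock_shared())
            return;
        // Slow path: serialize through the lock and wait the writer out.
        std::lock_guard<std::mutex> l(slow_path_);
        while (!try_lock_shared()) { /* spin; a real impl would yield */ }
    }

    void unlock_shared()
    {
        state_.fetch_sub(1, std::memory_order_release);
    }

    std::uint64_t reader_count() const
    {
        return state_.load(std::memory_order_acquire) & ~exclusive_bit;
    }
};
```

The benefit on read-heavy workloads comes from the uncontended path touching only a single cache line with one CAS, instead of acquiring and releasing a spinlock around the counter update.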