Skip to content

performance: Optimize shared_mutex and fix C++20 modular build errors#7007

Open
arpittkhandelwal wants to merge 1 commit intoTheHPXProject:masterfrom
arpittkhandelwal:optimize-shared-mutex
Open

performance: Optimize shared_mutex and fix C++20 modular build errors#7007
arpittkhandelwal wants to merge 1 commit intoTheHPXProject:masterfrom
arpittkhandelwal:optimize-shared-mutex

Conversation

@arpittkhandelwal
Copy link
Copy Markdown
Contributor

This PR introduces performance optimizations for hpx::shared_mutex and resolves build issues encountered with C++20 modularity.

Key Changes:

  1. Lock-free fast path for lock_shared:
    Added a fast-path to hpx::detail::shared_mutex_data::lock_shared that attempts to acquire a shared lock using an atomic increment before falling back to the internal spinlock. This significantly reduces serialization in read-heavy scenarios, such as AGAS cache lookups.
  2. Reduced atomic refcounting:
    Refactored the hpx::detail::shared_mutex wrapper class to avoid redundant atomic increment/decrement operations of the internal intrusive_ptr on every call.
  3. C++20 modular build fixes:
    Corrected the placement of HPX_CXX_EXPORT in components_base_fwd.hpp and component_type.hpp to ensure compatibility with C++20 modular builds.
  4. New benchmark:
    Added tests/performance/local/shared_mutex_overhead.cpp to quantify the overhead and contention of shared_mutex.

Performance Impact:

Benchmark results on a 4-thread reader-intensive workload (1,000,000 iterations per thread):

  • Baseline: 0.573879s
  • Optimized: 0.275067s
  • Improvement: ~52% reduction in overhead.

These optimizations will directly benefit high-concurrency read paths in HPX, particularly in the AGAS subsystem.

@arpittkhandelwal arpittkhandelwal force-pushed the optimize-shared-mutex branch 2 times, most recently from f982021 to 72858ce Compare March 13, 2026 11:25
@StellarBot
Copy link
Copy Markdown

Performance test report

HPX Performance

Comparison

BENCHMARKFORK_JOIN_EXECUTORPARALLEL_EXECUTORSCHEDULER_EXECUTOR
For Each(=)(=)---

Info

PropertyBeforeAfter
HPX Commit0eeca863606f90
HPX Datetime2026-03-09T14:08:29+00:002026-03-13T11:25:36+00:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Datetime2026-03-09T09:15:24.034803-05:002026-03-13T08:31:44.436445-05:00
Clusternamerostamrostam

Comparison

BENCHMARKNO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch(=)

Info

PropertyBeforeAfter
HPX Commit0eeca863606f90
HPX Datetime2026-03-09T14:08:29+00:002026-03-13T11:25:36+00:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Datetime2026-03-09T09:17:15.638328-05:002026-03-13T08:33:38.603173-05:00
Clusternamerostamrostam

Comparison

BENCHMARKFORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATORPARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATORSCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add(=)=---
Stream Benchmark - Scale(=)(=)---
Stream Benchmark - Triad(=)(=)---
Stream Benchmark - Copy(=)-----

Info

PropertyBeforeAfter
HPX Commitba89f5d3606f90
HPX Datetime2026-03-09T18:50:37+00:002026-03-13T11:25:36+00:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Datetime2026-03-09T17:49:10.837937-05:002026-03-13T08:34:13.207751-05:00
Clusternamerostamrostam

Explanation of Symbols

SymbolMEANING
=No performance change (confidence interval within ±1%)
(=)Probably no performance change (confidence interval within ±2%)
(+)/(-)Very small performance improvement/degradation (≤1%)
+/-Small performance improvement/degradation (≤5%)
++/--Large performance improvement/degradation (≤10%)
+++/---Very large performance improvement/degradation (>10%)
?Probably no change, but quite large uncertainty (confidence interval with ±5%)
??Unclear result, very large uncertainty (±10%)
???Something unexpected…

@codacy-production
Copy link
Copy Markdown

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
Report missing for 89914d31 55.56%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (89914d3) Report Missing Report Missing Report Missing
Head commit (72858ce) 196360 31968 16.28%

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#7007) 18 10 55.56%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Footnotes

  1. Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.

@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Mar 13, 2026

Please keep the module-related changes separate (you could apply those to the PR that you have already open). Also, please have a look at the compilation errors reported (e.g., https://cdash.rostam.cct.lsu.edu/viewBuildError.php?buildid=42049)

@arpittkhandelwal
Copy link
Copy Markdown
Contributor Author

Please keep the module-related changes separate (you could apply those to the PR that you have oalready open). Also, please have a look at the compilation errors reported (e.g., https://cdash.rostam.cct.lsu.edu/viewBuildError.php?buildid=42049)

I've cleaned the branch to remove unrelated modularization changes and fixed the benchmark compilation error and formatting. It should be ready for review now!

@StellarBot
Copy link
Copy Markdown

Performance test report

HPX Performance

Comparison

BENCHMARKFORK_JOIN_EXECUTORPARALLEL_EXECUTORSCHEDULER_EXECUTOR
For Each(=)(=)---

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T14:08:29+00:002026-03-14T02:18:28+00:00
HPX Commit0eeca8635750db
Datetime2026-03-09T09:15:24.034803-05:002026-03-13T21:25:45.841818-05:00
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Clusternamerostamrostam
Envfile

Comparison

BENCHMARKNO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch(=)

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T14:08:29+00:002026-03-14T02:18:28+00:00
HPX Commit0eeca8635750db
Datetime2026-03-09T09:17:15.638328-05:002026-03-13T21:27:40.440850-05:00
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Clusternamerostamrostam
Envfile

Comparison

BENCHMARKFORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATORPARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATORSCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add(=)=---
Stream Benchmark - Scale(=)=---
Stream Benchmark - Triad(=)+---
Stream Benchmark - Copy(=)?---

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T18:50:37+00:002026-03-14T02:18:28+00:00
HPX Commitba89f5d35750db
Datetime2026-03-09T17:49:10.837937-05:002026-03-13T21:28:14.628910-05:00
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Clusternamerostamrostam
Envfile

Explanation of Symbols

SymbolMEANING
=No performance change (confidence interval within ±1%)
(=)Probably no performance change (confidence interval within ±2%)
(+)/(-)Very small performance improvement/degradation (≤1%)
+/-Small performance improvement/degradation (≤5%)
++/--Large performance improvement/degradation (≤10%)
+++/---Very large performance improvement/degradation (>10%)
?Probably no change, but quite large uncertainty (confidence interval with ±5%)
??Unclear result, very large uncertainty (±10%)
???Something unexpected…

@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Mar 15, 2026

@arpittkhandelwal Please rebase onto master to fix the reported problems.

@arpittkhandelwal
Copy link
Copy Markdown
Contributor Author

@arpittkhandelwal Please rebase onto master to fix the reported problems.

I have pushed the rebased
New Benchmark Results (after rebase):
Threads: 4
Iterations: 1,000,000
Total Time: 0.31s (Original baseline was ~0.57s)
The performance improvement remains significant (~45% reduction in overhead). The PR is now clean and ready for review!

while (!s.data.exclusive && !s.data.exclusive_waiting_blocked)
{
auto s1 = s;
++s.data.shared_count;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is in a loop now, would the increment result in wrong counter results if the loop is executed more than once?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right. In the previous version, s would have retained the incremented shared_count and tag if set_state failed, leading to an incorrect cumulative total on the next iteration.

I've fixed this in the latest version by ensuring s is reset to the current atomic state (using the value returned in s1 from the failed CAS) at the end of each loop iteration:

            while (!s.data.exclusive && !s.data.exclusive_waiting_blocked)
            {
                auto s1 = s;
                ++s.data.shared_count;
                if (set_state(s1, s))
                {
                    return;
                }
                s.value = s1.value; // Reset to the latest atomic state from CAS failure
            }

This ensures that every retry starts with the most up-to-date counter values. I've also verified this fix with the shared_mutex unit tests and performance benchmarks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to restore s you still need to s = s1;. Otherwise, the counter will keep increasing.

@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Mar 22, 2026

@arpittkhandelwal Are you still interested in working on this PR?

@arpittkhandelwal
Copy link
Copy Markdown
Contributor Author

@arpittkhandelwal Are you still interested in working on this PR?

Yes sir updated

Comment on lines 557 to 561
if (data_->try_lock_shared())
return;

auto data = data_;
data->lock_shared();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be:

Suggested change
if (data_->try_lock_shared())
return;
auto data = data_;
data->lock_shared();
auto data = data_;
if (data_->try_lock_shared())
return;
data->lock_shared();

?

The same applies to one more spot below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reverted this change as requested.

@StellarBot
Copy link
Copy Markdown

Performance test report

HPX Performance

Comparison

BENCHMARKFORK_JOIN_EXECUTORPARALLEL_EXECUTORSCHEDULER_EXECUTOR
For Each(=)(=)---

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T14:08:29+00:002026-03-23T05:15:39+00:00
HPX Commit0eeca862a31433
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Envfile
Clusternamerostamrostam
Datetime2026-03-09T09:15:24.034803-05:002026-03-23T10:58:09.100311-05:00
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8

Comparison

BENCHMARKNO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch+++

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T14:08:29+00:002026-03-23T05:15:39+00:00
HPX Commit0eeca862a31433
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Envfile
Clusternamerostamrostam
Datetime2026-03-09T09:17:15.638328-05:002026-03-23T10:59:47.126245-05:00
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8

Comparison

BENCHMARKFORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATORPARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATORSCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add(=)(=)---
Stream Benchmark - Scale(=)-----
Stream Benchmark - Triad(=)----
Stream Benchmark - Copy(=)+++---

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T18:50:37+00:002026-03-23T05:15:39+00:00
HPX Commitba89f5d2a31433
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Envfile
Clusternamerostamrostam
Datetime2026-03-09T17:49:10.837937-05:002026-03-23T11:00:21.250440-05:00
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8

Explanation of Symbols

SymbolMEANING
=No performance change (confidence interval within ±1%)
(=)Probably no performance change (confidence interval within ±2%)
(+)/(-)Very small performance improvement/degradation (≤1%)
+/-Small performance improvement/degradation (≤5%)
++/--Large performance improvement/degradation (≤10%)
+++/---Very large performance improvement/degradation (>10%)
?Probably no change, but quite large uncertainty (confidence interval with ±5%)
??Unclear result, very large uncertainty (±10%)
???Something unexpected…

{
while (true)
{
auto s = state.load(std::memory_order_acquire);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This initial assignment can now happen before the loop starts.

@@ -140,6 +140,27 @@ namespace hpx::detail {
return true;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is now (conceptually) different from try_unlock_shared_fast. Why is that the case? The latter resets s = s1; before restarting the loop. The former does not.

@StellarBot
Copy link
Copy Markdown

Performance test report

HPX Performance

Comparison

BENCHMARKFORK_JOIN_EXECUTORPARALLEL_EXECUTORSCHEDULER_EXECUTOR
For Each(=)(=)---

Info

PropertyBeforeAfter
HPX Commit0eeca86c25568a
HPX Datetime2026-03-09T14:08:29+00:002026-03-24T18:09:41+00:00
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Clusternamerostamrostam
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Datetime2026-03-09T09:15:24.034803-05:002026-03-24T19:02:22.234435-05:00

Comparison

BENCHMARKNO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch+++

Info

PropertyBeforeAfter
HPX Commit0eeca86c25568a
HPX Datetime2026-03-09T14:08:29+00:002026-03-24T18:09:41+00:00
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Clusternamerostamrostam
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Datetime2026-03-09T09:17:15.638328-05:002026-03-24T19:04:00.477914-05:00

Comparison

BENCHMARKFORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATORPARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATORSCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add(=)(=)---
Stream Benchmark - Scale=-----
Stream Benchmark - Triad(=)----
Stream Benchmark - Copy(=)+++---

Info

PropertyBeforeAfter
HPX Commitba89f5dc25568a
HPX Datetime2026-03-09T18:50:37+00:002026-03-24T18:09:41+00:00
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Clusternamerostamrostam
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Datetime2026-03-09T17:49:10.837937-05:002026-03-24T19:04:34.428289-05:00

Explanation of Symbols

SymbolMEANING
=No performance change (confidence interval within ±1%)
(=)Probably no performance change (confidence interval within ±2%)
(+)/(-)Very small performance improvement/degradation (≤1%)
+/-Small performance improvement/degradation (≤5%)
++/--Large performance improvement/degradation (≤10%)
+++/---Very large performance improvement/degradation (>10%)
?Probably no change, but quite large uncertainty (confidence interval with ±5%)
??Unclear result, very large uncertainty (±10%)
???Something unexpected…

@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Apr 11, 2026

@arpittkhandelwal What's your plan with regard to moving this forward?

@codacy-production
Copy link
Copy Markdown

Not up to standards ⛔

🔴 Issues 9 medium

Alerts:
⚠ 9 issues (≤ 0 issues of at least minor severity)

Results:
9 new issues

Category Results
ErrorProne 9 medium

View in Codacy

🟢 Metrics 5 complexity · 0 duplication

Metric Results
Complexity 5
Duplication 0

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

@arpittkhandelwal
Copy link
Copy Markdown
Contributor Author

Hi @hkaiser sir , I’ll take another look at this PR, address the issues, and update it shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants