[SPGO] Use std::hash instead of MD5 to avoid run time regression in llvm-profgen #180581

HighW4y2H3ll · 2026-02-09T18:29:05Z

#66164 changed the hashing in SampleContextFrame from std::hash to MD5 in a very hot function in llvm-profgen (ContextTrieNode::getOrCreateChildContext()). This causes a run time regression of over 2x when running llvm-profgen with the CSSPGO preinliner enabled, since the MD5 computation is about triple the cost of the Murmur hash in the std library. A run time comparison of llvm-profgen follows:

$ time llvm-profgen -binary $BINARY --perfscript $SAMPLES --populate-profile-symbol-list --show-density --output=XXX

# MD5 hash
real    105m31.644s
user    104m51.334s
sys     0m35.033s

# std::hash
real    46m0.340s
user    45m17.998s
sys     0m38.420s

I can confirm that this patch recovers the run time regression in llvm-profgen, and performance testing on our internal services is neutral.
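
As a quick check of the "over 2x" figure, the wall-clock ("real") times above can be converted to seconds and divided. This is only a sketch of the arithmetic; the helper names `toSeconds` and `slowdown` are made up for illustration and are not part of llvm-profgen.

```cpp
// Convert a `time`-style "minutes + seconds" reading into seconds.
constexpr double toSeconds(double Minutes, double Seconds) {
  return Minutes * 60.0 + Seconds;
}

// "real" figures copied from the comparison above.
constexpr double MD5Real = toSeconds(105, 31.644);   // MD5 hash run
constexpr double StdHashReal = toSeconds(46, 0.340); // std::hash run

// Slowdown of the MD5 run relative to the std::hash run (about 2.29).
constexpr double slowdown() { return MD5Real / StdHashReal; }
```

The ratio comes out at about 2.29, which is consistent with the regression described above.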

llvmbot · 2026-02-09T18:29:41Z

@llvm/pr-subscribers-pgo

Author: None (HighW4y2H3ll)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/180581.diff

1 Files Affected:

(modified) llvm/include/llvm/ProfileData/SampleProf.h (+3-1)

diff --git a/llvm/include/llvm/ProfileData/SampleProf.h b/llvm/include/llvm/ProfileData/SampleProf.h
index b75dffaff19f7..8766ab23ac1da 100644
--- a/llvm/include/llvm/ProfileData/SampleProf.h
+++ b/llvm/include/llvm/ProfileData/SampleProf.h
@@ -522,7 +522,9 @@ struct SampleContextFrame {
   }
 
   uint64_t getHashCode() const {
-    uint64_t NameHash = Func.getHashCode();
+    // Context frame hash is heavily used in llvm-profgen context-sensitive
+    // pre-inliner. Use a lightweight hashing here to avoid speed regression.
+    uint64_t NameHash = std::hash<std::string>{}(Func.str());
     uint64_t LocId = Location.getHashCode();
     return NameHash + (LocId << 5) + LocId;
   }

github-actions · 2026-02-09T19:12:22Z

🐧 Linux x64 Test Results

189867 tests passed
5060 tests skipped

✅ The build succeeded and all tests passed.

apolloww · 2026-02-09T19:42:53Z

llvm/include/llvm/ProfileData/SampleProf.h

-    uint64_t NameHash = Func.getHashCode();
+    // Context frame hash is heavily used in llvm-profgen context-sensitive
+    // pre-inliner. Use a lightweight hashing here to avoid speed regression.
+    uint64_t NameHash = std::hash<std::string>{}(Func.str());


I think we only need to recompute hash when the FunctionId is a string.

ah, good catch.. updated! thx
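
The suggestion above can be illustrated with a minimal sketch. It assumes an identifier that holds either a symbol name or an already-computed 64-bit hash; `FuncIdSketch`, `nameHash` and `frameHash` are hypothetical stand-ins written for this example, and `LocId` is a plain integer standing in for `Location.getHashCode()`. This is not the code from the follow-up commit, which is not shown in this thread.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a function identifier that holds either a
// symbol name or an already-computed 64-bit hash of that name.
class FuncIdSketch {
  std::string Name;
  uint64_t Hash = 0;
  bool HasName = false;

public:
  explicit FuncIdSketch(std::string N) : Name(std::move(N)), HasName(true) {}
  explicit FuncIdSketch(uint64_t H) : Hash(H) {}

  bool isString() const { return HasName; }
  const std::string &name() const { return Name; }
  uint64_t storedHash() const { return Hash; }
};

// Hash the name with std::hash only when a name is present; otherwise
// reuse the stored hash instead of recomputing anything.
inline uint64_t nameHash(const FuncIdSketch &F) {
  if (F.isString())
    return std::hash<std::string>{}(F.name());
  return F.storedHash();
}

// Same combining step as in the diff: NameHash + (LocId << 5) + LocId,
// which equals NameHash + 33 * LocId in wrapping 64-bit arithmetic.
inline uint64_t frameHash(const FuncIdSketch &F, uint64_t LocId) {
  return nameHash(F) + (LocId << 5) + LocId;
}
```

With this shape, an identifier that already carries a hash skips the string hashing entirely, so the extra work is paid only for identifiers that are still names.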

apolloww

LGTM

[SPGO] Use std::hash instead of MD5 to avoid run time regression

e46e4cc

HighW4y2H3ll requested review from MatzeB, WenleiHe, apolloww and huangjd February 9, 2026 18:29

llvmbot added the PGO Profile Guided Optimizations label Feb 9, 2026

HighW4y2H3ll changed the title ~~[SPGO] Use std::hash instead of MD5 to avoid run time regression~~ [SPGO] Use std::hash instead of MD5 to avoid run time regression in llvm-profgen Feb 9, 2026

apolloww reviewed Feb 9, 2026

Don't recompute hash if FunctionId is MD5 already

f68533c

apolloww approved these changes Feb 9, 2026

HighW4y2H3ll merged commit 37c3241 into llvm:main Feb 9, 2026
10 checks passed
