improve count accumulator performance by avoiding allocating new hash every time #1070
markyang-toast wants to merge 3 commits into block:main
Conversation
Force-pushed from 8b4777b to f5dbeb5
myronmarston left a comment:
Left some inline comments about this. Regarding the benchmark, I recommend using benchmark-ips for future benchmarks. Here's what that looks like for this case:
#!/usr/bin/env ruby
# frozen_string_literal: true

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "benchmark-ips"
end

require "benchmark/ips"

# Compares `hash.merge("type" => "nested")` (allocates a new Hash each time)
# vs `hash["type"] ||= "nested"` (mutates in place, no allocation).
#
# This is the optimization applied in CountAccumulator#merge_list_counts_into.

small = {"a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5, "type" => "existing"}
big = (1..100).each_with_object({}) { |i, h| h["k#{i}"] = i }.merge("type" => "existing")

puts "=== Small hash (#{small.size} keys) ==="
puts

Benchmark.ips do |x|
  x.config(time: 5, warmup: 2)

  x.report("||= assign") do
    small["type"] ||= "nested"
  end

  x.report("merge") do
    small.merge("type" => "nested")
  end

  x.compare!
end

puts
puts "=== Big hash (#{big.size} keys) ==="
puts

Benchmark.ips do |x|
  x.config(time: 5, warmup: 2)

  x.report("||= assign") do
    big["type"] ||= "nested"
  end

  x.report("merge") do
    big.merge("type" => "nested")
  end

  x.compare!
end

And here are the results I got:
=== Small hash (6 keys) ===
ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin25]
Warming up --------------------------------------
||= assign 3.423M i/100ms
merge 658.384k i/100ms
Calculating -------------------------------------
||= assign 33.737M (± 4.9%) i/s (29.64 ns/i) - 171.166M in 5.090397s
merge 6.327M (± 4.4%) i/s (158.05 ns/i) - 31.602M in 5.004810s
Comparison:
||= assign: 33736984.1 i/s
merge: 6327001.0 i/s - 5.33x slower
=== Big hash (101 keys) ===
ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin25]
Warming up --------------------------------------
||= assign 3.015M i/100ms
merge 265.689k i/100ms
Calculating -------------------------------------
||= assign 29.874M (± 2.2%) i/s (33.47 ns/i) - 150.743M in 5.048738s
merge 2.567M (±10.4%) i/s (389.54 ns/i) - 12.753M in 5.031967s
Comparison:
||= assign: 29873750.0 i/s
merge: 2567100.2 i/s - 11.64x slower
Also, feel free to commit benchmarks; that's what the benchmarks directory is for!
Given that this method is called once per ingested event (and not called in any sort of tight loop), I'm not terribly concerned with the slower performance of merge.
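For context, those per-call numbers translate to a small absolute cost at one call per event. A quick sanity check in plain Ruby (the two per-iteration times are taken from the big-hash benchmark output above; the event count is an illustrative assumption):

```ruby
# Back-of-the-envelope cost of merge vs ||= at one call per ingested event,
# using the per-iteration times reported for the big hash above.

merge_ns  = 389.54  # ns per merge call (from the benchmark output)
assign_ns = 33.47   # ns per ||= call
events    = 1_000_000 # illustrative event count

# Total extra time merge costs over ||=, converted from ns to ms.
extra_ms = events * (merge_ns - assign_ns) / 1_000_000.0

puts format("merge costs an extra %.1f ms per %d events", extra_ms, events)
```

That is roughly a third of a second spread across a million ingested events, which is why the slower per-call time of merge is not a practical concern here.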
# @implements CountAccumulator
def self.merge_list_counts_into(params, mapping:, list_counts_field_paths_for_source:)
  # Assign nested type to the property if not yet assigned
  mapping["type"] ||= "nested"
I'm concerned that this mutates the mapping argument that's passed in from the caller. I generally favor a functional "avoid mutation if possible" style of code, but am willing to relax that when the mutation is localized and easy to reason about (e.g. if mapping originated within this method, the mutation wouldn't be a problem). But I tend to draw the line when mutation crosses a caller boundary like this, particularly when it's not the point of the method. I've seen too many bugs in my career that have stemmed from this kind of thing.
If you can provide a benchmark showing a noticeable, significant difference in real-world benchmarks, I'll consider this, but a microbenchmark like the one you've provided is IMO not sufficient to outweigh the downsides of changing this method to have a hidden side effect. I've seen plenty of microbenchmarks that don't pan out into any real gains in practice.
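To illustrate the hidden-side-effect concern, here is a minimal sketch (the method names are hypothetical, not the actual CountAccumulator API): with the ||= version, the caller's own hash quietly gains a "type" key.

```ruby
# Mutating version: ||= writes into the caller's hash.
def apply_nested_type!(mapping)
  mapping["type"] ||= "nested"
  mapping
end

# Pure version: merge allocates a fresh hash; the caller's is untouched.
def apply_nested_type(mapping)
  mapping.merge("type" => "nested")
end

callers_hash = {"properties" => {}}
apply_nested_type!(callers_hash)
callers_hash["type"]        # => "nested" -- leaked back into the caller

callers_hash = {"properties" => {}}
result = apply_nested_type(callers_hash)
callers_hash.key?("type")   # => false -- caller unaffected
result["type"]              # => "nested"
```

The bug class this invites: a caller reuses its hash elsewhere, not expecting a "type" key to have appeared, and the failure shows up far from this method.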
markyang-toast replied:
I also have doubts here, so I'm still testing it out.

One important thing for us is actually memory allocation, so I will also get a benchmark on RAM.

.merge() will allocate a new Hash every time, while the ||= operation mutates mapping in place. This improves CPU usage (since it's quicker) and memory allocation (since it's not allocating new memory).

Here is some performance comparison:
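One way to measure the allocation difference is a sketch like the following (not the project's benchmark harness; it uses GC.stat's total_allocated_objects counter, so absolute numbers will vary by Ruby version):

```ruby
# Rough allocation comparison: ||= on an existing key vs merge.
# GC.stat(:total_allocated_objects) is a monotonically increasing counter
# of all objects ever allocated, so the delta around a loop approximates
# the loop's allocations.

def allocations
  GC.stat(:total_allocated_objects)
end

mapping = {"a" => 1, "type" => "existing"}
n = 100_000

GC.disable # keep GC from running mid-measurement

before = allocations
n.times { mapping["type"] ||= "nested" } # key exists, so nothing is assigned
assign_allocs = allocations - before

before = allocations
n.times { mapping.merge("type" => "nested") } # new arg hash + new result hash each call
merge_allocs = allocations - before

GC.enable

puts "||= allocations:   #{assign_allocs}"
puts "merge allocations: #{merge_allocs}"
```

Expect merge to allocate at least two objects per call (the argument hash literal and the merged result), while ||= on an already-populated key allocates essentially nothing.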