
improve count accumulator performance by avoiding allocating new hash every time #1070

Draft
markyang-toast wants to merge 3 commits into block:main from markyang-toast:improve-count-accumulator-performance

Conversation

markyang-toast commented Mar 10, 2026

.merge allocates a new Hash on every call, while the simple ||= assignment mutates mapping in place. This reduces both CPU time (the operation is quicker) and memory pressure (no new Hash is allocated per call).
Here is a performance comparison:

•	small hash (~5 keys): ||= ~ 63 ns vs merge ~ 392 ns → ~6.2× slower
•	big hash (~100 keys): ||= ~ 56 ns vs merge ~ 1101 ns → ~19.5× slower
# frozen_string_literal: true
require "objspace"

def bench(label, n)
  GC.start
  before = GC.stat(:total_allocated_objects)
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  after = GC.stat(:total_allocated_objects)

  elapsed = t1 - t0
  puts "%-35s %8.1f ns/iter  %5.2f alloc/iter" % [
    label,
    elapsed * 1e9 / n,
    (after - before).to_f / n
  ]
end

small = { "a"=>1, "b"=>2, "c"=>3, "d"=>4, "e"=>5 }.merge("type"=>"existing")
big   = (1..100).each_with_object({}) { |i,h| h["k#{i}"]=i }.merge("type"=>"existing")

n_small = 5_000_000
n_big   = 1_000_000

bench("small ||= (type present)", n_small) { h=small; i=0; while i<n_small; h["type"] ||= "nested"; i+=1; end }
bench("small merge (type present)", n_small) { h=small; x=nil; i=0; while i<n_small; x=h.merge("type"=>"nested"); i+=1; end; x }

bench("big ||= (type present)", n_big) { h=big; i=0; while i<n_big; h["type"] ||= "nested"; i+=1; end }
bench("big merge (type present)", n_big) { h=big; x=nil; i=0; while i<n_big; x=h.merge("type"=>"nested"); i+=1; end; x }


CLAassistant commented Mar 10, 2026

CLA assistant check
All committers have signed the CLA.

markyang-toast force-pushed the improve-count-accumulator-performance branch from 8b4777b to f5dbeb5 on March 10, 2026 15:59
markyang-toast marked this pull request as draft March 10, 2026 16:09
myronmarston (Collaborator) left a comment


Left some inline comments about this. Regarding the benchmark, I recommend using benchmark-ips for future benchmarks. Here's what that looks like for this case:

#!/usr/bin/env ruby
# frozen_string_literal: true

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "benchmark-ips"
end

require "benchmark/ips"

# Compares `hash.merge("type" => "nested")` (allocates new Hash each time)
# vs `hash["type"] ||= "nested"` (mutates in place, no allocation).
#
# This is the optimization applied in CountAccumulator#merge_list_counts_into.

small = {"a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5, "type" => "existing"}
big = (1..100).each_with_object({}) { |i, h| h["k#{i}"] = i }.merge("type" => "existing")

puts "=== Small hash (#{small.size} keys) ==="
puts

Benchmark.ips do |x|
  x.config(time: 5, warmup: 2)

  x.report("||= assign") do
    small["type"] ||= "nested"
  end

  x.report("merge") do
    small.merge("type" => "nested")
  end

  x.compare!
end

puts
puts "=== Big hash (#{big.size} keys) ==="
puts

Benchmark.ips do |x|
  x.config(time: 5, warmup: 2)

  x.report("||= assign") do
    big["type"] ||= "nested"
  end

  x.report("merge") do
    big.merge("type" => "nested")
  end

  x.compare!
end

And here are the results I got:

=== Small hash (6 keys) ===

ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin25]
Warming up --------------------------------------
          ||= assign     3.423M i/100ms
               merge   658.384k i/100ms
Calculating -------------------------------------
          ||= assign     33.737M (± 4.9%) i/s   (29.64 ns/i) -    171.166M in   5.090397s
               merge      6.327M (± 4.4%) i/s  (158.05 ns/i) -     31.602M in   5.004810s

Comparison:
          ||= assign: 33736984.1 i/s
               merge:  6327001.0 i/s - 5.33x  slower


=== Big hash (101 keys) ===

ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin25]
Warming up --------------------------------------
          ||= assign     3.015M i/100ms
               merge   265.689k i/100ms
Calculating -------------------------------------
          ||= assign     29.874M (± 2.2%) i/s   (33.47 ns/i) -    150.743M in   5.048738s
               merge      2.567M (±10.4%) i/s  (389.54 ns/i) -     12.753M in   5.031967s

Comparison:
          ||= assign: 29873750.0 i/s
               merge:  2567100.2 i/s - 11.64x  slower

Also, feel free to commit benchmarks; that's what the benchmarks directory is for!

Given that this method is called once per ingested event (and not called in any sort of tight loop), I'm not terribly concerned with the slower performance of merge.

# @implements CountAccumulator
def self.merge_list_counts_into(params, mapping:, list_counts_field_paths_for_source:)
# Assign nested type to the property if not yet assigned
mapping["type"] ||= "nested"
Collaborator

I'm concerned that this mutates the mapping argument that's passed in from the caller. I generally favor a functional "avoid mutation if possible" style of code, but am willing to relax that when the mutation is localized and easy to reason about (e.g. if mapping originated within this method, the mutation wouldn't be a problem). But I tend to draw the line when mutation crosses a caller boundary like this, particularly when it's not the point of the method. I've seen too many bugs in my career that have stemmed from this kind of thing.

If you can provide a benchmark showing a noticeable/significant difference in real-world benchmarks, I'll consider this, but a microbenchmark like you've provided is IMO not sufficient to outweigh the downsides of changing this method to have a hidden side effect. I've seen plenty of microbenchmarks that don't pan out into any real gains in practice.
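The caller-boundary concern above can be sketched in a few lines (the method names here are hypothetical, purely for illustration): the mutating version silently changes a hash the caller still holds, while the merge version leaves it untouched.

```ruby
# Mutating version: modifies the caller's hash in place.
def tag_type!(mapping)
  mapping["type"] ||= "nested"
end

# Non-mutating version: returns a new hash; the caller's copy is unchanged.
def tag_type(mapping)
  mapping.merge("type" => "nested")
end

original = { "properties" => {} }
tag_type!(original)
puts original.key?("type")  # the caller's hash was silently modified

fresh = { "properties" => {} }
tagged = tag_type(fresh)
puts fresh.key?("type")     # caller's hash unchanged
puts tagged.key?("type")    # new hash carries the change
```

This is the essence of the trade-off being discussed: the in-place form is faster precisely because it shares and mutates the caller's state.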

Author

I also have doubts here, so I'm still testing it out.
One important thing for us is actually memory allocation, so I will also benchmark RAM usage.
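For the allocation measurement, one stdlib-only approach (a sketch; exact counts vary by Ruby version and VM optimizations) is to diff the monotonic GC.stat(:total_allocated_objects) counter around a loop:

```ruby
# Count average object allocations per iteration of the given block.
# total_allocated_objects is monotonic, so GC runs during the loop
# don't skew the count.
def allocations_per_iter(n)
  GC.start
  before = GC.stat(:total_allocated_objects)
  n.times { yield }
  (GC.stat(:total_allocated_objects) - before).to_f / n
end

hash = { "a" => 1, "type" => "existing" }

merge_allocs  = allocations_per_iter(100_000) { hash.merge("type" => "nested") }
assign_allocs = allocations_per_iter(100_000) { hash["type"] ||= "nested" }

puts "merge: ~%.2f allocs/iter, ||=: ~%.2f allocs/iter" % [merge_allocs, assign_allocs]
```

Expect merge to allocate at least one new Hash per call (plus the literal argument hash), while ||= with the key already present short-circuits and allocates essentially nothing. The benchmark-memory and memory_profiler gems give more detailed per-site reports if that granularity is needed.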


3 participants