Benchmark It: Optimization Decisions Based on Real Data and Not Vibes


The performance of our code is something that we should all care about deeply as developers. We should always be striving to make our code run as quickly and as efficiently as we can, within reason. But that raises the question: how can we objectively measure how fast our code is?

Rather than choosing between implementations based purely on vibes, we can use benchmarking to get empirical data and make good decisions. It is especially useful when we are optimizing code paths, comparing different approaches, and evaluating third-party libraries.

We can’t rely on our instincts and vibes to determine how fast our code is; we need data. Ruby has a library for that.

Getting Started

We are going to be using the Benchmark library, which is a part of the Ruby standard library. We start by requiring it.

require 'benchmark'

The first thing we are going to do is measure how long it will take to create an array of one million elements.

require 'benchmark'

time = Benchmark.measure do
  array = Array.new(1_000_000) { |i| i }
end

puts time

We will get an output that looks like this:

0.024869   0.000951   0.025820 (  0.025847)

These numbers represent, from left to right:

  • User CPU time
  • System CPU time
  • Total CPU time (user CPU time + system CPU time)
  • Real elapsed time (shown in parentheses)

The real elapsed time is what we are usually most interested in, because it represents what users actually experience. User CPU time is the time the CPU spends executing our code in user mode: the Ruby we have written, the standard library, the code in any gems we are using, and the Ruby interpreter itself. System CPU time is the time the operating system spends doing lower-level work on our code's behalf, such as file operations and memory allocation and management.
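If we want to work with these numbers directly rather than just printing them, the object returned by Benchmark.measure is a Benchmark::Tms, which exposes the same four values as utime, stime, total, and real. There is also Benchmark.realtime, which returns just the real elapsed time as a Float. A minimal sketch:

require 'benchmark'

time = Benchmark.measure { Array.new(1_000_000) { |i| i } }

puts time.utime  # user CPU time
puts time.stime  # system CPU time
puts time.total  # user + system CPU time
puts time.real   # real elapsed time

# If all we care about is the wall-clock time, Benchmark.realtime
# returns it directly as a Float.
puts Benchmark.realtime { Array.new(1_000_000) { |i| i } }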

I Want Prettier!

Admittedly, that output is not that easy to read, so we can take things to the next level. We can use Benchmark.bm to generate a much more readable report and to compare algorithmic approaches against each other.

This code will do the same thing - create an array of one million elements - but it will do so using three different approaches and then give us a report that allows us to determine which approach is best. I want to note that the 10 being passed to the #bm method specifies the width of the label column. You can set it to whatever you like; we are just using 10 because that suits our purposes here.

require 'benchmark'

n = 1_000_000

Benchmark.bm(10) do |x|
  x.report("Array.new:") { Array.new(n) { |i| i } }
  x.report("Range:") { (0...n).to_a }
  x.report("upto:") { 
    array = []
    0.upto(n-1) { |i| array << i }
    array
  }
end

Running that, we get output similar to this:

                 user     system      total        real
Array.new:   0.024696   0.000604   0.025300 (  0.025357)
Range:       0.011129   0.001368   0.012497 (  0.012498)
upto:        0.027432   0.001566   0.028998 (  0.029064)

Just with a glance at this, we can tell that the Range approach is the fastest.


Warm It Up

An important thing to be aware of is that when we benchmark in Ruby, the first run of a bit of code will occasionally perform differently than subsequent runs. To address this, we have the very creatively named #bmbm method. It runs your code twice: the first pass is considered a “rehearsal” in order to warm things up, and the second pass does it for realsies. It provides data for both so you can determine whether the warm-up had some sort of impact.

Let’s look at an example. We are going to create a large array, and do some stuff to it:

require 'benchmark'

array = (1..1_000_000).to_a
sorted = array.dup
reversed = array.reverse

Benchmark.bmbm(7) do |x|
  x.report("sort:") { array.dup.sort }        # sort an already-ordered copy
  x.report("sort!:") { sorted.dup.sort! }     # in-place sort of an ordered copy
  x.report("reverse:") { reversed.dup.sort }  # sort a reverse-ordered copy
end

Running this, we get this result:

Rehearsal --------------------------------------------
sort:      0.005534   0.001237   0.006771 (  0.006823)
sort!:     0.005565   0.001304   0.006869 (  0.006881)
reverse:   0.027210   0.001318   0.028528 (  0.028594)
----------------------------------- total: 0.042168sec

               user     system      total        real
sort:      0.005511   0.001054   0.006565 (  0.006577)
sort!:     0.005554   0.001061   0.006615 (  0.006617)
reverse:   0.026982   0.001266   0.028248 (  0.028336)

The output also tells you how long the rehearsal took in total. Comparing the two sets of numbers, we can see that everything ran just slightly faster the second time around.
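To make the value of the rehearsal more concrete, here is a hypothetical sketch where the code under test memoizes an expensive lookup table. The expensive_lookup method and LOOKUP constant are just illustrations, not part of the Benchmark library: the first run pays the setup cost, and bmbm's rehearsal absorbs it so the reported run reflects steady-state performance.

require 'benchmark'

LOOKUP = {}

# Hypothetical helper: builds a table of squares the first time it is
# called for a given size, then reuses it on every later call.
def expensive_lookup(n)
  LOOKUP[n] ||= (1..n).map { |i| i * i }
  LOOKUP[n].sum
end

Benchmark.bmbm(10) do |x|
  x.report("lookup:") { expensive_lookup(500_000) }
end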

But Does it Scale?

We can use this library to create a report to show us if our code will scale well.

This bit of code runs the operation we are testing four times with different input sizes to see how its performance changes with the amount of data it has to deal with.

require 'benchmark'

sizes = [50_000, 100_000, 500_000, 1_000_000]

Benchmark.benchmark(Benchmark::CAPTION, 15, Benchmark::FORMAT, ">total:", ">avg:") do |x|
  results = sizes.map do |size|
    x.report("Array(#{size}):") do
      Array.new(size) { |i| i }
    end
  end

  # Any Benchmark::Tms values returned from the block are printed using the
  # extra labels given above (">total:" and ">avg:").
  sum = results.inject(:+)

  [sum, sum / results.size]
end

These are the results generated:

                      user     system      total        real
Array(50000):     0.001398   0.000038   0.001436 (  0.001462)
Array(100000):    0.002823   0.000114   0.002937 (  0.002938)
Array(500000):    0.014158   0.000335   0.014493 (  0.014605)
Array(1000000):   0.028183   0.000675   0.028858 (  0.028969)
>total:           0.046562   0.001162   0.047724 (  0.047974)
>avg:             0.011640   0.000291   0.011931 (  0.011994)
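One rough way to read these numbers is to divide the real time by the input size: if the time per element stays roughly constant as the size grows, the operation scales linearly. A minimal sketch using Benchmark.realtime could print that ratio:

require 'benchmark'

sizes = [50_000, 100_000, 500_000, 1_000_000]

sizes.each do |size|
  real = Benchmark.realtime { Array.new(size) { |i| i } }
  # Time per element; roughly constant values suggest linear scaling.
  puts format("%-18s %.2e seconds per element", "Array(#{size}):", real / size)
end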

Conclusion

You should be benchmarking your code, and though there are other tools for it, Ruby gives you one in its standard library. With this knowledge, we can make sure that we are making optimization decisions based on real data and not vibes, and we can understand the performance implications of various algorithmic approaches.

However, remember that performance is only one dimension of good code. Readability and maintainability are generally much more important. The fastest code is not always the best choice.

This blog was written by Turing instructor, Mike Dao.

Be sure to follow us on Instagram, X, and LinkedIn - @Turing_School