The (obvious) little benchmark: slice access efficiency

Goal

Comparing the efficiency of slice access using index and element in time and memory.

(edit) Accessing slice by index means accesing slice item using square bracket (i.e. slice[i]), while slice by element means accessing slice item using return value from range (i.e. index, item := range slice).

This is a respond to @dfc comment in this thread.

Environment

  • Go version devel +ca47157 Thu Dec 31 00:20:54 2015 +0000 linux/amd64

Result

In this little benchmark we will sum slice of 100000 integer, one will access the slice element using index and another access the element using range.

Slice by Index

Source for sum slice of integer by index: https://gist.github.com/shuLhan/1d02353c777a1d154c81

Memory profiling output:

804.95kB of 804.95kB total (  100%)
Dropped 4 nodes (cum <= 4.02kB)
      flat  flat%   sum%        ■■■   ■■■%
     784kB 97.40% 97.40%      784kB 97.40%  main.SumByIndex
   20.95kB  2.60%   100%    20.95kB  2.60%  runtime.malg
         0     0%   100%      784kB 97.40%  main.main
         0     0%   100%     8.38kB  1.04%  runtime.allocm
...

CPU profiling output:

24.15s of 25.78s total (93.68%)
Dropped 118 nodes (cum <= 0.13s)
      flat  flat%   sum%        ■■■   ■■■%
    12.97s 50.31% 50.31%     17.92s 69.51%  main.SumByIndex
     4.26s 16.52% 66.83%      4.26s 16.52%  runtime.memclr
     1.25s  4.85% 71.68%      1.36s  5.28%  runtime.scanblock
     0.88s  3.41% 75.10%      1.01s  3.92%  runtime.scanobject
...

Slice by Element

Source for sum slice of integer by element: https://gist.github.com/shuLhan/a8e55d5845e4e9d22bb1

Mem profile output:

ms 0 % go tool pprof -text sumbyelm mem.pprof
1576.38kB of 1576.38kB total (  100%)
Dropped 10 nodes (cum <= 7.88kB)
      flat  flat%   sum%        ■■■   ■■■%
    1568kB 99.47% 99.47%     1568kB 99.47%  main.SumByElm
    8.38kB  0.53%   100%     8.38kB  0.53%  runtime.malg
         0     0%   100%     1568kB 99.47%  main.main
...

CPU profile output:

ms 0 % go tool pprof -text sumbyelm cpu.pprof
22.69s of 24.46s total (92.76%)
Dropped 122 nodes (cum <= 0.12s)
      flat  flat%   sum%        ■■■   ■■■%
    11.30s 46.20% 46.20%     16.55s 67.66%  main.SumByElm
     4.54s 18.56% 64.76%      4.54s 18.56%  runtime.memclr
     1.29s  5.27% 70.03%      1.46s  5.97%  runtime.scanblock
     0.97s  3.97% 74.00%      1.08s  4.42%  runtime.scanobject
...

Conclusion

  • Accessing slice by index faster by ~8-9% margin than accessing using element.
  • Accessing slice by element required 2x memory than accessing by index.
  • There is very little or no difference when using for with three clause syntax (x := 0; x < n; x++) than with range.

Well, even if this benchmark is look obvious, I hope I can sleep better now :smile: .

1 Like

Running a whole problem, then using the profiler to observe the whole program is a very coarse grained hammer. You should write this as a benchmark.

1 Like

Also, you’re using an old version of that package I recommend pkg/profile as a replacement.

Also pay attention to the last line of the README

Enabling more than one profile at once may cause your results to be less reliable as profiling itself is not without overhead.

1 Like

I write it as benchmark first [1], but then I can’t figure out how to compare the memory usage so I split it and use the profiling.

Here is the output using benchmark,

ms 2 % go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkSumByIndex10000-8         50000             24779 ns/op
BenchmarkSumByElm10000-8          100000             23206 ns/op
BenchmarkSumByIndex1000000-8        1000           2163662 ns/op
BenchmarkSumByElm1000000-8          1000           1977207 ns/op
ok      _/home/ms/Unduhan/sandbox/go/benchmark  8.618s

[1] https://gist.github.com/shuLhan/116e8e526ea20af81f54

You want b.ReportAllocs

At a guess your benchmark is being dominated by the large make at the top of each benchmark run. Move that out of the function under test.

The link that you gave is 404, not found. I search on godoc, did you mean this: profile package - github.com/pkg/profile - Go Packages ?

Yup, that’s what I get for typing it from memory.

DISREGARD my first post. I run the test again using pkg/profile as recommended above, the result showed that both use the same memory but the slice-by-index access take a little bit longer (still by ~8% margin) than slice-by-element. I am surprised, because I assume it would be otherwise.

The source for each benchmark still in the same links.

Slice by Index

Mem profile output,

ms 0 % go tool pprof -text sumbyindex mem.pprof
788.52kB of 788.52kB total (  100%)
Dropped 4 nodes (cum <= 3.94kB)
      flat  flat%   sum%        ■■■   ■■■%
     784kB 99.43% 99.43%      784kB 99.43%  main.SumByIndex
    4.52kB  0.57%   100%     4.52kB  0.57%  runtime.allocm
         0     0%   100%      784kB 99.43%  main.main
...

CPU profile output,

ms 0 % go tool pprof -text sumbyindex cpu.pprof
24.24s of 25.82s total (93.88%)
Dropped 112 nodes (cum <= 0.13s)
      flat  flat%   sum%        ■■■   ■■■%
    12.56s 48.64% 48.64%     18.53s 71.77%  main.SumByIndex
     5.19s 20.10% 68.75%      5.19s 20.10%  runtime.memclr
     0.97s  3.76% 72.50%      1.10s  4.26%  runtime.scanblock
     0.94s  3.64% 76.14%      0.94s  3.64%  runtime.futex
...

Slice by Element

Mem profile output,

ms 0 % go tool pprof -text sumbyelm mem.pprof
788.52kB of 788.52kB total (  100%)
Dropped 6 nodes (cum <= 3.94kB)
      flat  flat%   sum%        ■■■   ■■■%
     784kB 99.43% 99.43%      784kB 99.43%  main.SumByElm
    4.52kB  0.57%   100%     4.52kB  0.57%  runtime.allocm
         0     0%   100%      784kB 99.43%  main.main
...

CPU profile output,

ms 0 % go tool pprof -text sumbyelm cpu.pprof
22.76s of 24.36s total (93.43%)
Dropped 106 nodes (cum <= 0.12s)
      flat  flat%   sum%        ■■■   ■■■%
    11.23s 46.10% 46.10%     16.76s 68.80%  main.SumByElm
     4.53s 18.60% 64.70%      4.53s 18.60%  runtime.memclr
     1.27s  5.21% 69.91%      1.36s  5.58%  runtime.scanblock
     0.93s  3.82% 73.73%      0.93s  3.82%  runtime.futex
...

Thanks @dfc.

I don’t know what you mean why you say -by element or -by index, we don’t use those terms.

I recommend you write this as a standard go benchmark, then you get profiling for free.

Also, you haven’t tested while looping over a slice of a larger structure, you’re just doing ints.

What I mean by element is k, v := range slice, where v is an element of slice, while by index is slice[index].

Benchmark file.

That would cause a cache miss on program and make noisy in benchmark.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.