Why does a func defined with generics perform so badly?

I read somewhere that it might be better to use a function with a generic type parameter (instead of an interface parameter) so that Go can avoid dynamic dispatch. But when I test that assumption with these benchmarks:

package main

import (
	"testing"
)

type Closeable interface {
	Close()
}

type MyCloseable struct{}

func (m *MyCloseable) Close() {}

func closeWithInterface(c Closeable) {
	c.Close()
}

func closeWithGeneric[T Closeable](c T) {
	c.Close()
}

func closeWithConcrete(c *MyCloseable) {
	c.Close()
}

func BenchmarkCloseWithInterface(b *testing.B) {
	c := &MyCloseable{}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		closeWithInterface(c)
	}
}

func BenchmarkCloseWithGeneric(b *testing.B) {
	c := &MyCloseable{}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		closeWithGeneric(c)
	}
}

func BenchmarkCloseWithConcrete(b *testing.B) {
	c := &MyCloseable{}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		closeWithConcrete(c)
	}
}

I get this result:

Running tool: /opt/homebrew/bin/go test -test.fullpath=true -benchmem -run=^$ -bench ^(BenchmarkCloseWithInterface|BenchmarkCloseWithGeneric|BenchmarkCloseWithConcrete)

goos: darwin
goarch: arm64
pkg: ####
cpu: Apple M3 Max
BenchmarkCloseWithInterface-14    	1000000000	         0.2552 ns/op	       0 B/op	       0 allocs/op
BenchmarkCloseWithGeneric-14      	1000000000	         0.7835 ns/op	       0 B/op	       0 allocs/op
BenchmarkCloseWithConcrete-14     	1000000000	         0.2490 ns/op	       0 B/op	       0 allocs/op
PASS

I assumed the benchmark ranking (fastest first) would be Concrete > Generics > Interface,
but it is actually Concrete > Interface > Generics.

Why is that?

Go version = 1.25

The calls are not equivalent. All three funcs should accept a pointer, probably. How does the benchmark look then?

Well, on my machine I have generics > interface > concrete

test-generic % go test -bench=. -run=^$ -benchmem -benchtime=1000000000x ./...                                                                         
goos: darwin
goarch: arm64
pkg: test-generic
cpu: Apple M1 Pro
BenchmarkCloseWithInterface-10          1000000000               2.055 ns/op           0 B/op          0 allocs/op
BenchmarkCloseWithGeneric-10            1000000000               2.043 ns/op           0 B/op          0 allocs/op
BenchmarkCloseWithConcrete-10           1000000000               2.156 ns/op           0 B/op          0 allocs/op
PASS
ok      test-generic    6.748s

And with hyperfine:

Benchmark 1: go test -bench=^BenchmarkCloseWithInterface$ -run=^$ -benchmem -benchtime=1000000000x
  Time (mean ± σ):      2.465 s ±  0.045 s    [User: 2.218 s, System: 0.216 s]
  Range (min … max):    2.405 s …  2.632 s    20 runs
 
Benchmark 2: go test -bench=^BenchmarkCloseWithGeneric$ -run=^$ -benchmem -benchtime=1000000000x
  Time (mean ± σ):      2.460 s ±  0.008 s    [User: 2.228 s, System: 0.218 s]
  Range (min … max):    2.446 s …  2.473 s    20 runs
 
Benchmark 3: go test -bench=^BenchmarkCloseWithConcrete$ -run=^$ -benchmem -benchtime=1000000000x
  Time (mean ± σ):      2.582 s ±  0.011 s    [User: 2.347 s, System: 0.221 s]
  Range (min … max):    2.562 s …  2.607 s    20 runs
 
Summary
  go test -bench=^BenchmarkCloseWithGeneric$ -run=^$ -benchmem -benchtime=1000000000x ran
    1.00 ± 0.02 times faster than go test -bench=^BenchmarkCloseWithInterface$ -run=^$ -benchmem -benchtime=1000000000x
    1.05 ± 0.01 times faster than go test -bench=^BenchmarkCloseWithConcrete$ -run=^$ -benchmem -benchtime=1000000000x

Generics show marginally better performance here: on par with the interface version and about 5% ahead of the concrete one.

"Well, on my machine I have generics > interface > concrete"

Which Go version are you using?

% go version
go version go1.25.5 darwin/arm64

My CPU is an M3 Max. Could that be the problem?

Running the test on Fedora 42, generics look much worse than the alternatives:

$ cat /etc/os-release  | head -5
NAME="Fedora Linux"
VERSION="42 (Adams)"
RELEASE_TYPE=stable
ID=fedora
VERSION_ID=42
$ go version
go version go1.24.11 linux/amd64

$ go test -test.fullpath=true -benchmem -run='^Test' -bench='.'
goos: linux
goarch: amd64
pkg: generics
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkCloseWithInterface-4           1000000000               0.2608 ns/op          0 B/op          0 allocs/op
BenchmarkCloseWithGeneric-4             833901868                1.324 ns/op           0 B/op          0 allocs/op
BenchmarkCloseWithConcrete-4            1000000000               0.2583 ns/op          0 B/op          0 allocs/op
PASS
ok      generics        1.833s

Interesting. Here are my results:

~/P/g/genericbench $ go test -bench=. -benchmem -benchtime=1000000000x ./...
goos: darwin
goarch: arm64
pkg: deantest/genericbench
cpu: Apple M4
BenchmarkCloseWithInterface-10          1000000000               0.2463 ns/op          0 B/op          0 allocs/op
BenchmarkCloseWithGeneric-10            1000000000               0.6762 ns/op          0 B/op          0 allocs/op
BenchmarkCloseWithConcrete-10           1000000000               0.2256 ns/op          0 B/op          0 allocs/op
PASS
ok      deantest/genericbench   1.295s
~/P/g/genericbench $ go version
go version go1.25.5 darwin/arm64

I'm also on Apple silicon and running the same version of Go as you. Strange that we are getting such different results!

That said, this is probably a non-issue. First off: this benchmark is so contrived, I think it’s likely the compiler is optimizing something away in the interface/concrete versions because it knows the code isn’t doing anything. Benchmarks are hard to get right because compilers are super clever. I would try benchmarking real code that actually does something and see what happens.

Second: I want to put into perspective how fast 0.6 nanoseconds is. A nanosecond is a billionth of a second. Whatever overhead you are potentially incurring here will almost certainly not be your bottleneck in the real world and will be immaterial.

Anyway, a note on compiler optimizations:

Before concluding I wanted to highlight that to be completely accurate, any benchmark should be careful to avoid compiler optimisations eliminating the function under test and artificially lowering the run time of the benchmark.

What happens if you refactor your code to use b.Loop?
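For reference, here is a minimal sketch of what that refactor might look like for the generic benchmark from the question. It assumes Go 1.24 or later (where b.Loop was added); the name BenchmarkCloseWithGenericLoop is made up for illustration and is not from the original post.

```go
package main

import "testing"

type Closeable interface {
	Close()
}

type MyCloseable struct{}

func (m *MyCloseable) Close() {}

func closeWithGeneric[T Closeable](c T) {
	c.Close()
}

// BenchmarkCloseWithGenericLoop (hypothetical name) is the generic benchmark
// from the question with the b.N loop replaced by b.Loop.
func BenchmarkCloseWithGenericLoop(b *testing.B) {
	c := &MyCloseable{}
	b.ReportAllocs()
	// No explicit b.ResetTimer here: b.Loop resets the timer the first time
	// it is called, so the setup above is not counted.
	for b.Loop() {
		closeWithGeneric(c)
	}
}
```

The interface and concrete benchmarks would change the same way: only the loop header differs, and the ResetTimer call becomes unnecessary.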

I think b.Loop fixes it:

goos: darwin
goarch: arm64
pkg: github.com/kjsingh/jedi/src
cpu: Apple M3 Max
BenchmarkCloseWithInterface-14          605218370                1.988 ns/op           0 B/op          0 allocs/op
BenchmarkCloseWithGeneric-14            616801940                1.941 ns/op           0 B/op          0 allocs/op
BenchmarkCloseWithConcrete-14           621060686                1.974 ns/op           0 B/op          0 allocs/op

PS: the ns/op went up, though.

Right, because the compiler is no longer optimizing out your function call.

Yep, I forgot to mention: I switched to b.Loop for my runs.

From the b.Loop docs:

// Within the body of a "for b.Loop() { ... }" loop, arguments to and
// results from function calls within the loop are kept alive, preventing
// the compiler from fully optimizing away the loop body. Currently, this is
// implemented by disabling inlining of functions called in a b.Loop loop.
// This applies only to calls syntactically between the curly braces of the loop,
// and the loop condition must be written exactly as "b.Loop()". Optimizations
// are performed as usual in any functions called by the loop.

It looks like, without b.Loop, function calls inside the b.N loop can get inlined by the compiler. That would explain why the concrete version won in the first run: the functions are so lightweight that, once inlined, the loop body could be optimized away almost entirely.
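One way to probe the inlining explanation while keeping the old b.N loop is to forbid inlining of the function under test. The sketch below does that with the //go:noinline compiler directive. The names closeWithConcreteNoInline and BenchmarkCloseWithConcreteNoInline are made up for illustration, and nobody in this thread ran this variant, so treat any numbers it gives as an experiment rather than a confirmed result.

```go
package main

import "testing"

type MyCloseable struct{}

func (m *MyCloseable) Close() {}

// closeWithConcreteNoInline (hypothetical name) is closeWithConcrete with
// inlining disabled. The directive applies to this function only, so the
// call to it stays in the benchmark loop even though Close is empty.
//
//go:noinline
func closeWithConcreteNoInline(c *MyCloseable) {
	c.Close()
}

// BenchmarkCloseWithConcreteNoInline uses the classic b.N loop, as in the
// original question, but calls the non-inlinable wrapper.
func BenchmarkCloseWithConcreteNoInline(b *testing.B) {
	c := &MyCloseable{}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		closeWithConcreteNoInline(c)
	}
}
```

To see what the compiler actually decided for the original functions, running go test with -gcflags=-m prints its inlining decisions.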
