I read somewhere that it might be better to use a function with a generic type parameter (instead of an interface parameter) so that Go can avoid dynamic dispatch. But when I test that assumption with benchmarks:
package main

import (
	"testing"
)

type Closeable interface {
	Close()
}

type MyCloseable struct{}

func (m *MyCloseable) Close() {}

func closeWithInterface(c Closeable) {
	c.Close()
}

func closeWithGeneric[T Closeable](c T) {
	c.Close()
}

func closeWithConcrete(c *MyCloseable) {
	c.Close()
}

func BenchmarkCloseWithInterface(b *testing.B) {
	c := &MyCloseable{}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		closeWithInterface(c)
	}
}

func BenchmarkCloseWithGeneric(b *testing.B) {
	c := &MyCloseable{}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		closeWithGeneric(c)
	}
}

func BenchmarkCloseWithConcrete(b *testing.B) {
	c := &MyCloseable{}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		closeWithConcrete(c)
	}
}
Well, on my machine I get generics > interface > concrete (fastest to slowest):
test-generic % go test -bench=. -run=^$ -benchmem -benchtime=1000000000x ./...
goos: darwin
goarch: arm64
pkg: test-generic
cpu: Apple M1 Pro
BenchmarkCloseWithInterface-10 1000000000 2.055 ns/op 0 B/op 0 allocs/op
BenchmarkCloseWithGeneric-10 1000000000 2.043 ns/op 0 B/op 0 allocs/op
BenchmarkCloseWithConcrete-10 1000000000 2.156 ns/op 0 B/op 0 allocs/op
PASS
ok test-generic 6.748s
And with hyperfine:
Benchmark 1: go test -bench=^BenchmarkCloseWithInterface$ -run=^$ -benchmem -benchtime=1000000000x
Time (mean ± σ): 2.465 s ± 0.045 s [User: 2.218 s, System: 0.216 s]
Range (min … max): 2.405 s … 2.632 s 20 runs
Benchmark 2: go test -bench=^BenchmarkCloseWithGeneric$ -run=^$ -benchmem -benchtime=1000000000x
Time (mean ± σ): 2.460 s ± 0.008 s [User: 2.228 s, System: 0.218 s]
Range (min … max): 2.446 s … 2.473 s 20 runs
Benchmark 3: go test -bench=^BenchmarkCloseWithConcrete$ -run=^$ -benchmem -benchtime=1000000000x
Time (mean ± σ): 2.582 s ± 0.011 s [User: 2.347 s, System: 0.221 s]
Range (min … max): 2.562 s … 2.607 s 20 runs
Summary
go test -bench=^BenchmarkCloseWithGeneric$ -run=^$ -benchmem -benchtime=1000000000x ran
1.00 ± 0.02 times faster than go test -bench=^BenchmarkCloseWithInterface$ -run=^$ -benchmem -benchtime=1000000000x
1.05 ± 0.01 times faster than go test -bench=^BenchmarkCloseWithConcrete$ -run=^$ -benchmem -benchtime=1000000000x
~/P/g/genericbench $ go test -bench=. -benchmem -benchtime=1000000000x ./...
goos: darwin
goarch: arm64
pkg: deantest/genericbench
cpu: Apple M4
BenchmarkCloseWithInterface-10 1000000000 0.2463 ns/op 0 B/op 0 allocs/op
BenchmarkCloseWithGeneric-10 1000000000 0.6762 ns/op 0 B/op 0 allocs/op
BenchmarkCloseWithConcrete-10 1000000000 0.2256 ns/op 0 B/op 0 allocs/op
PASS
ok deantest/genericbench 1.295s
~/P/g/genericbench $ go version
go version go1.25.5 darwin/arm64
I'm also on Apple silicon and using the same version of Go you are. Strange that we're getting such different results!
That said, this is probably a non-issue. First off: this benchmark is so contrived that I think the compiler is likely optimizing something away in the interface/concrete versions, because it knows the code isn't doing anything. Benchmarks are hard to get right because compilers are super clever. I would try benchmarking real code that actually does something and see what happens.
Second: I want to put into perspective how fast 0.6 nanoseconds is. A nanosecond is a billionth of a second. Whatever overhead you are potentially incurring here will almost certainly not be your bottleneck in the real world and will be immaterial.
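On the first point, one way to make the benchmark harder to optimize away is to give Close an observable side effect and check it afterwards. A minimal sketch, reusing the Closeable interface from your snippet (CountingCloseable and the benchmark name are hypothetical, just for illustration):

type CountingCloseable struct {
	closed int
}

// Close now does a tiny amount of real work, so the compiler can't
// discard the call entirely.
func (m *CountingCloseable) Close() { m.closed++ }

func BenchmarkCloseInterfaceWithWork(b *testing.B) {
	c := &CountingCloseable{}
	var cl Closeable = c // force the call through the interface
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		cl.Close()
	}
	// Checking the counter afterwards keeps the side effect observable.
	if c.closed != b.N {
		b.Fatalf("expected %d closes, got %d", b.N, c.closed)
	}
}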
Anyway, a note on compiler optimizations:
Before concluding, I wanted to highlight that, to be completely accurate, any benchmark needs to be careful that compiler optimisations don't eliminate the function under test and artificially lower the benchmark's run time.
What happens if you refactor your code to use b.Loop?
// Within the body of a "for b.Loop() { ... }" loop, arguments to and
// results from function calls within the loop are kept alive, preventing
// the compiler from fully optimizing away the loop body. Currently, this is
// implemented by disabling inlining of functions called in a b.Loop loop.
// This applies only to calls syntactically between the curly braces of the loop,
// and the loop condition must be written exactly as "b.Loop()". Optimizations
// are performed as usual in any functions called by the loop.
It looks like, when we are not using b.Loop, the function calls inside the b.N loop can get inlined by the compiler. That would explain why the concrete version came out fastest in the M4 run: the Close call is so lightweight that, once inlined, it can be optimized away almost entirely.
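For reference, here's a sketch of the generic benchmark rewritten to use b.Loop (available since Go 1.24); the other two benchmarks would change in the same way:

func BenchmarkCloseWithGeneric(b *testing.B) {
	c := &MyCloseable{}
	b.ReportAllocs()
	// b.Loop resets the timer on its first call, so b.ResetTimer is no
	// longer needed, and calls inside the loop body are kept from being
	// inlined, as described in the doc comment quoted above.
	for b.Loop() {
		closeWithGeneric(c)
	}
}

It would be interesting to see whether the three variants still diverge once the calls can no longer be inlined away.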