I’m working on a graphics-related project and noticed that my Go code is not as fast as I expected.
After some investigation, my educated guess was that passing my Vec3 by value was the cause, so I allocated all my Vec3 values (defined as [3]float32) on the heap and worked with *Vec3 instead. That made my code about 2x faster.
The next day I decided to inspect the generated assembly, and that’s when I noticed that the compiler passes the [3]float32 version of Vec3 on the stack. By accident I then converted Vec3 to a struct { X, Y, Z float32 } and found that the Go compiler no longer passes it on the stack (it is probably passing it in registers), and that the struct version is as fast as the pointer-to-array version, if not faster.
My question is: why does the Go compiler treat the array version differently? Couldn’t it treat the array like the struct version and pass it in registers?
Here’s my benchmark code:
package main

import (
	"testing"
)

// ArrVec3 is the array-backed vector, passed by value.
type ArrVec3 [3]float32

func (v ArrVec3) Sub(u ArrVec3) (res ArrVec3) {
	res[0] = v[0] - u[0]
	res[1] = v[1] - u[1]
	res[2] = v[2] - u[2]
	return
}

// Mul scales v by t.
func (v ArrVec3) Mul(t float32) (res ArrVec3) {
	res[0] = v[0] * t
	res[1] = v[1] * t
	res[2] = v[2] * t
	return
}

func (v ArrVec3) Dot(u ArrVec3) float32 {
	return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]
}

// Vec3 is the struct-backed vector, passed by value.
type Vec3 struct {
	X, Y, Z float32
}

func (v Vec3) Sub(u Vec3) (res Vec3) {
	res.X = v.X - u.X
	res.Y = v.Y - u.Y
	res.Z = v.Z - u.Z
	return
}

// Mul scales v by t.
func (v Vec3) Mul(t float32) (res Vec3) {
	res.X = v.X * t
	res.Y = v.Y * t
	res.Z = v.Z * t
	return
}

func (v Vec3) Dot(u Vec3) float32 {
	return u.X*v.X + u.Y*v.Y + u.Z*v.Z
}

func BenchmarkArrVec3(b *testing.B) {
	v := ArrVec3{1, 2, 3}
	n := ArrVec3{0, 1, 0}
	for i := 0; i < b.N; i++ {
		v.Sub(n.Mul(v.Dot(n) * 2))
	}
}

func BenchmarkVec3(b *testing.B) {
	v := Vec3{1, 2, 3}
	n := Vec3{0, 1, 0}
	for i := 0; i < b.N; i++ {
		v.Sub(n.Mul(v.Dot(n) * 2))
	}
}
Here’s the godbolt link: Compiler Explorer
Sorry if I made any mistakes reading the assembly; this is my first time reading Go’s assembly.
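The pointer-to-array variant mentioned at the top is not part of the benchmark code, so here is a minimal sketch of roughly what it could look like. The type name PtrVec3 and the helper reflectPtr are hypothetical names used only for illustration, and the methods assume each operation returns a freshly allocated result; whether those results really end up on the heap depends on inlining and escape analysis.

```go
package main

// PtrVec3 is a hypothetical name for the array-backed vector
// that is only ever used through a pointer.
type PtrVec3 [3]float32

// Sub returns a pointer to a new vector holding v - u.
func (v *PtrVec3) Sub(u *PtrVec3) *PtrVec3 {
	return &PtrVec3{v[0] - u[0], v[1] - u[1], v[2] - u[2]}
}

// Mul returns a pointer to a new vector holding v scaled by t.
func (v *PtrVec3) Mul(t float32) *PtrVec3 {
	return &PtrVec3{v[0] * t, v[1] * t, v[2] * t}
}

// Dot returns the dot product of v and u.
func (v *PtrVec3) Dot(u *PtrVec3) float32 {
	return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]
}

// reflectPtr (hypothetical helper) evaluates the same expression
// as the benchmark loops: v - n*(2 * v.n).
func reflectPtr(v, n *PtrVec3) *PtrVec3 {
	return v.Sub(n.Mul(v.Dot(n) * 2))
}
```

A third benchmark could call reflectPtr(v, n) in its loop with v := &PtrVec3{1, 2, 3} and n := &PtrVec3{0, 1, 0}, so that all three variants evaluate the same expression.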