Vec3 as [3]float32 is not as fast as a struct

MoustaphaSaad · March 27, 2023, 7:50pm

I’m working on a graphics-related project and noticed that my Go code is not as fast as it should be.

After some investigation, my educated guess was that passing my Vec3 by value was the cause of the problem, so I allocated all my Vec3 values (defined as [3]float32) on the heap and worked with *Vec3 instead. That made my code about 2x faster.

The next day I inspected the generated assembly and noticed that my code was passing the [3]float32 Vec3 on the stack. By accident, I then converted Vec3 to a struct { X, Y, Z float32 } and found that the Go compiler no longer passes it on the stack (it is probably passed in registers), and that the struct version is as fast as the pointer-to-array version, if not faster.

My question is: why does the Go compiler treat the array version differently? Couldn’t the compiler treat it like the struct version and pass it in registers?

Here’s my benchmark code:

package main

import (
	"testing"
)

type ArrVec3 [3]float32

func (v ArrVec3) Sub(u ArrVec3) (res ArrVec3) {
	res[0] = v[0] - u[0]
	res[1] = v[1] - u[1]
	res[2] = v[2] - u[2]
	return
}

func (v ArrVec3) Mul(t float32) (res ArrVec3) {
	res[0] = v[0] * t
	res[1] = v[1] * t
	res[2] = v[2] * t
	return
}

func (v ArrVec3) Dot(u ArrVec3) float32 {
	return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]
}

type Vec3 struct {
	X, Y, Z float32
}

func (v Vec3) Sub(u Vec3) (res Vec3) {
	res.X = v.X - u.X
	res.Y = v.Y - u.Y
	res.Z = v.Z - u.Z
	return
}

func (v Vec3) Mul(t float32) (res Vec3) {
	res.X = v.X * t
	res.Y = v.Y * t
	res.Z = v.Z * t
	return
}

func (v Vec3) Dot(u Vec3) float32 {
	return u.X*v.X + u.Y*v.Y + u.Z*v.Z
}

func BenchmarkArrVec3(b *testing.B) {
	v := ArrVec3{1, 2, 3}
	n := ArrVec3{0, 1, 0}
	for i := 0; i < b.N; i++ {
		v.Sub(n.Mul(v.Dot(n) * 2))
	}
}


func BenchmarkVec3(b *testing.B) {
	v := Vec3{1, 2, 3}
	n := Vec3{0, 1, 0}
	for i := 0; i < b.N; i++ {
		v.Sub(n.Mul(v.Dot(n) * 2))
	}
}

Here’s the godbolt link: Compiler Explorer
Sorry if I made any mistakes reading the assembly; this is my first time reading Go’s assembly.
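
A side note on the benchmark itself: the loops above discard the result, and the methods are small enough to be inlined, so the compiler may remove or fold much of the work, and the calling convention may not be what ends up being measured. Below is a minimal sketch of one way to keep a real call in the loop. The names reflectStruct, reflectArr, sinkVec3 and sinkArr are hypothetical, the //go:noinline directive and package-level sinks are my assumptions about how to force the call and keep the result live, and Mul multiplies in both versions here. No timings are claimed.

```go
package main

import (
	"fmt"
	"testing"
)

// Vec3 and ArrVec3 mirror the two representations from the post.
type Vec3 struct{ X, Y, Z float32 }
type ArrVec3 [3]float32

func (v Vec3) Sub(u Vec3) Vec3    { return Vec3{v.X - u.X, v.Y - u.Y, v.Z - u.Z} }
func (v Vec3) Mul(t float32) Vec3 { return Vec3{v.X * t, v.Y * t, v.Z * t} }
func (v Vec3) Dot(u Vec3) float32 { return u.X*v.X + u.Y*v.Y + u.Z*v.Z }

func (v ArrVec3) Sub(u ArrVec3) ArrVec3 { return ArrVec3{v[0] - u[0], v[1] - u[1], v[2] - u[2]} }
func (v ArrVec3) Mul(t float32) ArrVec3 { return ArrVec3{v[0] * t, v[1] * t, v[2] * t} }
func (v ArrVec3) Dot(u ArrVec3) float32 { return u[0]*v[0] + u[1]*v[1] + u[2]*v[2] }

// The noinline directive keeps an actual call (with argument and result
// passing) inside the benchmark loop. Hypothetical helpers, not from the post.

//go:noinline
func reflectStruct(v, n Vec3) Vec3 { return v.Sub(n.Mul(v.Dot(n) * 2)) }

//go:noinline
func reflectArr(v, n ArrVec3) ArrVec3 { return v.Sub(n.Mul(v.Dot(n) * 2)) }

// Package-level sinks so the results are stored somewhere observable.
var (
	sinkVec3 Vec3
	sinkArr  ArrVec3
)

func benchVec3(b *testing.B) {
	v, n := Vec3{1, 2, 3}, Vec3{0, 1, 0}
	for i := 0; i < b.N; i++ {
		sinkVec3 = reflectStruct(v, n)
	}
}

func benchArrVec3(b *testing.B) {
	v, n := ArrVec3{1, 2, 3}, ArrVec3{0, 1, 0}
	for i := 0; i < b.N; i++ {
		sinkArr = reflectArr(v, n)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println("struct:", testing.Benchmark(benchVec3))
	fmt.Println("array: ", testing.Benchmark(benchArrVec3))
}
```

In a regular _test.go file the same loop bodies can go into BenchmarkVec3 and BenchmarkArrVec3 directly and be run with go test -bench=.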

skillian · March 27, 2023, 7:57pm

It probably could, but based on my understanding of the Go internal ABI specification, it looks like the Go team chose not to bother. Specifically:

Non-trivial arrays are always passed on the stack because indexing into an array typically requires a computed offset, which generally isn’t possible with registers. Arrays in general are rare in function signatures (only 0.7% of functions in the Go 1.15 standard library and 0.2% in kubelet). We considered allowing array fields to be passed on the stack while the rest of an argument’s fields are passed in registers, but this creates the same problems as other large structs if the callee takes the address of an argument, and would benefit <0.1% of functions in kubelet (and even these very little).

I’m not sure what they mean by “non-trivial,” as float32 arrays seem pretty “trivial” to me, but it does seem that going with structs may be the best solution here.
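
For what it’s worth, as far as I can tell from the register-assignment rules in the same ABI document, “trivial” refers to arrays of length 0 or 1; any longer array makes the whole value fall back to the stack, regardless of element type. Here is a minimal sketch, under that reading, of keeping the struct in function signatures while still allowing index-style access where it is needed. At and Array are hypothetical helper names, not something from the post or the standard library.

```go
package main

import "fmt"

// Vec3 is the struct form, which is eligible for register passing
// under the register-based internal ABI.
type Vec3 struct{ X, Y, Z float32 }

// At returns component i without going through an array type.
// Hypothetical helper for code that wants v[i]-style access.
func (v Vec3) At(i int) float32 {
	switch i {
	case 0:
		return v.X
	case 1:
		return v.Y
	case 2:
		return v.Z
	}
	panic("Vec3.At: index out of range")
}

// Array converts to the array form, e.g. for an API that expects
// [3]float32. The array only appears at this boundary; an array
// result is itself subject to the stack rule unless the call is inlined.
func (v Vec3) Array() [3]float32 { return [3]float32{v.X, v.Y, v.Z} }

func main() {
	v := Vec3{1, 2, 3}

	// Loop over the components by index using the struct form.
	var sum float32
	for i := 0; i < 3; i++ {
		sum += v.At(i)
	}
	fmt.Println(sum, v.Array())
}
```

Whether At compiles down to something as cheap as direct array indexing depends on inlining, so it is worth checking the generated assembly (for example with go build -gcflags=-S) before relying on it in a hot path.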
