Using LEA instead of ADD on amd64

Hi, all,

This is sort of a question, but not one I expect to have a definite answer. Just a curiosity:

I wanted to see the assembly that gets generated for closures, so I wrote this:

package main

import (
        "fmt"
)

func main() {
        var x int
        f := func() int {
                x++
                return x - 1
        }
        times(5, &x, f)
}

func times(n int, p *int, f func() int) {
        for i := 0; i < n; i++ {
                fmt.Println("*p:", *p)
                fmt.Println("f():", f())
                fmt.Println("*p:", *p)
                fmt.Println()
        }
}
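The closure captures x by reference, so f and the pointer p passed to times refer to the same variable. The sketch below isolates that behavior. The helper name `newCounter` is hypothetical and not part of the original program. Note that f returns the value x had *before* the increment, so the compiled closure needs both the old and the new value of x at the same time.

```go
package main

import "fmt"

// newCounter is a hypothetical helper that builds the same closure as f
// in the program above and also returns &x, so the caller can watch the
// closure and the pointer operate on one shared variable.
func newCounter() (func() int, *int) {
	var x int
	f := func() int {
		x++          // increment the captured variable
		return x - 1 // return the value x had before the increment
	}
	return f, &x
}

func main() {
	f, p := newCounter()
	for i := 0; i < 3; i++ {
		before := *p
		got := f()
		fmt.Println(before, got, *p) // prints "0 0 1", then "1 1 2", then "2 2 3"
	}
}
```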

The assembly for the closure, which the compiler names main.func1, is:

"".main.func1 STEXT nosplit size=20 args=0x8 locals=0x0
	0x0000 00000 (./main.go:9)	TEXT	"".main.func1(SB), NOSPLIT|NEEDCTXT|ABIInternal, $0-8
	0x0000 00000 (./main.go:9)	PCDATA	$0, $-2
	0x0000 00000 (./main.go:9)	PCDATA	$1, $-2
	0x0000 00000 (./main.go:9)	FUNCDATA	$0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (./main.go:9)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (./main.go:9)	FUNCDATA	$2, gclocals·db688afbc90e26183a53c9ad23b80c29(SB)
	0x0000 00000 (./main.go:9)	PCDATA	$0, $2
	0x0000 00000 (./main.go:9)	PCDATA	$1, $0
	0x0000 00000 (./main.go:9)	MOVQ	8(DX), AX
	0x0004 00004 (./main.go:10)	MOVQ	(AX), CX
	0x0007 00007 (./main.go:10)	LEAQ	1(CX), DX
	0x000b 00011 (./main.go:10)	PCDATA	$0, $0
	0x000b 00011 (./main.go:10)	MOVQ	DX, (AX)
	0x000e 00014 (./main.go:11)	MOVQ	CX, "".~r0+8(SP)
	0x0013 00019 (./main.go:11)	RET
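The listing above can be read as roughly the following Go, one statement per instruction. This is a model, not compiler output: `closureCtx` and `func1` are hypothetical names, and the closure context that DX points to on entry is modeled as a code pointer at offset 0 followed by the pointer to the captured x at offset 8, which is what `MOVQ 8(DX), AX` followed by `MOVQ (AX), CX` suggests.

```go
package main

import "fmt"

// closureCtx is a hypothetical model of the closure object DX points to
// on entry to main.func1.
type closureCtx struct {
	fn uintptr // offset 0: code pointer (never used in this model)
	x  *int    // offset 8: pointer to the captured variable x
}

// func1 mirrors "".main.func1; on entry DX holds ctx.
func func1(ctx *closureCtx) int {
	ax := ctx.x  // MOVQ 8(DX), AX : load &x from the closure context
	cx := *ax    // MOVQ (AX), CX  : load the current value of x
	dx := cx + 1 // LEAQ 1(CX), DX : x+1 goes to DX; CX still holds the old x
	*ax = dx     // MOVQ DX, (AX)  : store x+1 back into x
	return cx    // MOVQ CX, ~r0+8(SP); RET : return the old x (x-1 after the increment)
}

func main() {
	var x int
	ctx := &closureCtx{x: &x} // fn left as zero; this model never calls through it
	r := func1(ctx)
	fmt.Println(r, x) // 0 1
	r = func1(ctx)
	fmt.Println(r, x) // 1 2
}
```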

What really surprises me is the LEAQ 1(CX), DX at offset 0x0007. I’m not very good with assembly, but I think it adds 1 to the value in CX and stores the result in DX.
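For reference, LEA computes an addressing-mode expression, base + index*scale + displacement, and writes the result to a register without reading memory or changing the flags. Unlike two-operand ADD, which overwrites one of its inputs, the destination can differ from every input. LEAQ 1(CX), DX is the simple case with base = CX and displacement = 1. The sketch below models that arithmetic in Go; `lea` is a hypothetical helper, not a real API.

```go
package main

import "fmt"

// lea is a hypothetical helper modeling the value x86-64 LEA writes to its
// destination: base + index*scale + disp. In a real instruction scale must
// be 1, 2, 4, or 8, no memory is accessed, and the flags are untouched.
func lea(base, index, scale, disp int64) int64 {
	return base + index*scale + disp
}

func main() {
	cx := int64(41)
	// LEAQ 1(CX), DX: base=CX, no index, displacement 1.
	dx := lea(cx, 0, 1, 1)
	fmt.Println(cx, dx) // 41 42: CX is unchanged, DX holds CX+1

	// A form one ADD cannot express: LEAQ 8(CX)(AX*4), DX in Go assembler syntax.
	ax := int64(3)
	fmt.Println(lea(cx, ax, 4, 8)) // 61
}
```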

After some googling, it seems it was common in the old Pentium days to use LEA instead of ADD, because LEA could be performed earlier in the pipeline, around instruction decoding, whereas ADD (always? usually?) has to be executed on the ALU.

Is that all there is to it? Just a micro-optimization the Go compiler performs to make incrementing potentially faster? Or is it unrelated to the increment, and I’ve missed something else? Does anyone know where I can find more information on instruction selection for modern processors?