Improving performance of a function

Hi,

I have a function and wish to improve its performance and memory allocation if possible. It was allocating 23 before and I did my best to bring it down to 8 now but I still think the function still can be improved. I would appreciate if I can get help with that please.

Thanks

package logger

import (
	"sort"
	"strconv"
)

type Field map[string]interface{}

func log(line string, fields Field) string {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		if val, ok := fields[k].(string); ok {
			line += `,"` + k + `":"` + val + `"`
			continue
		}
		if val, ok := fields[k].(int); ok {
			line += `,"` + k + `":` + strconv.Itoa(val)
			continue
		}
		if val, ok := fields[k].(float64); ok {
			line += `,"` + k + `":` + string(strconv.AppendFloat([]byte(``), val, 'f', -1, 64))
			continue
		}
		if val, ok := fields[k].(bool); ok {
			line += `,"` + k + `":` + strconv.FormatBool(val)
			continue
		}
	}

	return line
}

package logger

import "testing"

func Benchmark_log(b *testing.B) {
	for i := 0; i < b.N; i++ {
		log("hello", Field{"str": "val" , "int": 1, "flo": 1234567890.01234567,"bol": true})
	}
}
$ go test -v -bench=. -benchmem ./logger/
goos: darwin
goarch: amd64
pkg: loog/logger
cpu: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
Benchmark_log
Benchmark_log-4   	  924964	      1246 ns/op	     328 B/op	       8 allocs/op
PASS
ok  	loog/logger	2.420s

Hi!

  1. you know the number of keys so you should avoid using a slice, preferring an array instead, so you can avoid the several append ops.
  2. You have to use a strings.Builder(initialized with the line content) in order to avoid many allocations concatenating line again and again

They already use a preallocated slice (3rd argument to make)

Will probably make things even worse, as it grows by the regular rules of slices and can not be properly made with a bigger capacity estimate.

Though indeed impleminting something similar that actually had a way to create an “empty” builder with a preguesses capacity might help with getting allocations down.

  1. you do not need the keys slice, remove that part, simply iterate on the map.
  2. use a type switch, not a series of (if, continue)
  3. you know the size of the final result: Simply loop twice through the map, first to compute the size of the final result. Then you allocate a string builder, and loop a second time through the map to build the result
  4. return the string.Builder .String()

@Metalymph

  1. It is not possible. The array length must be a constant so [len(fields)]string is illegal :woman_shrugging:
  2. Two less allocations :tada:

@telo_tade

  1. I have to keys sorted :-1:
  2. Barely noticeable difference, nearly nothing at all, just a cosmetic one :man_shrugging:

If you need the keys sorted, then you need the slice. Regarding the second point, it is not only a cosmetic difference, try it and you will likely have less allocation. At least you would be making use of the in built support for this scenario.

I was able to get it down to a single allocation for the log string, but with your current API of using a map[string]interface{}, I can’t think of an efficient allocation-free way to “sort” the map. Every iteration of a map can come out in a different order, so I used some nested, repetitive loops to keep track of the last key that was logged and keep skipping already-logged keys. I think the algorithm is n^2, so I do not recommend it.

If you can change your Field implementation to something like this:

type Field []struct{
    Key string
    Value interface{}
}

Then, because it’s a slice, it can be sorted inside of log without any additional allocations. This would change your call to log from this:

log("hello", Field{"str": "val", "int": 1, "flo": 1234567890.01234567, "bol": true})

to this:

log("hello", Field{{"str", "val"}, {"int", 1}, {"flo", 1234567890.01234567}, {"bol", true}})
Changed API, faster
// You can edit this code!
// Click here and start typing.
package main

import (
	"sort"
	"strconv"
	"strings"
	"unsafe"
)

const useUnsafe = false

type Field []struct {
	Key   string
	Value interface{}
}

func _() interface {
	sort.Interface
} {
	return (*Field)(nil)
}

func (fs *Field) Less(i, j int) bool {
	return strings.Compare((*fs)[i].Key, (*fs)[j].Key) < 0
}

func (fs *Field) Swap(i, j int) {
	(*fs)[i], (*fs)[j] = (*fs)[j], (*fs)[i]
}

func (fs *Field) Len() int { return len(*fs) }

func (fs *Field) estimateStringLength() (estimate int) {
	for _, kvp := range *fs {
		// include space for leading ," and trailing "
		estimate += len(`,""`)
		estimate += len(kvp.Key)
		switch val := kvp.Value.(type) {
		case string:
			estimate += len(val) + len(`""`)
		case int:
			estimate += 20 // length of math.MinInt64
		case float64:
			// I don't actually know how to find out the
			// maximum number of characters are needed to
			// represent a float value.  If you know that,
			// use that here instead.
			estimate += 24
		case bool:
			estimate += len("false")
		}
	}
	return
}

var unsafeFieldsSortTypePointer = func() unsafe.Pointer {
	var si sort.Interface = (*Field)(nil)
	return (*(*[2]unsafe.Pointer)(unsafe.Pointer(&si)))[0]
}()

func log(line string, fields Field) string {
	sb := make([]byte, len(line), len(line)+fields.estimateStringLength())
	sb = append(sb, line...)
	if useUnsafe {
		// sort.Sort should not hold a reference, so we're
		// going to build a sort.Interface manually that won't
		// escape.
		//
		// This might break if Go ever starts relocating its
		// allocations. In that case, just get rid of this
		// if useUnsafe block.
		var si sort.Interface
		siPtrs := (*struct {
			p unsafe.Pointer
			u uintptr
		})(unsafe.Pointer(&si))
		siPtrs.p = unsafeFieldsSortTypePointer
		siPtrs.u = uintptr(unsafe.Pointer(&fields))
		sort.Sort(si)
	} else {
		sort.Sort(&fields)
	}
	for _, kvp := range fields {
		sb = append(sb, `,"`...)
		// Note that you're not escaping any quotes
		// that might be inside of the field key:
		sb = append(sb, kvp.Key...)
		sb = append(sb, `":`...)
		switch x := kvp.Value.(type) {
		case string:
			sb = append(sb, '"')
			// Note that you're not escaping any quotes
			// that might be inside of the field value:
			sb = append(sb, x...)
			sb = append(sb, '"')
		case int:
			sb = strconv.AppendInt(sb, int64(x), 10)
		case float64:
			sb = strconv.AppendFloat(sb, x, 'f', -1, 64)
		case bool:
			sb = strconv.AppendBool(sb, x)
		}
	}
	if useUnsafe {
		return *((*string)(unsafe.Pointer(&sb)))
	}
	return string(sb)
}

Output:

Running tool: C:\Program Files\Go\bin\go.exe test -benchmem -run=^$ -bench ^Benchmark_log$ forum.golangbridge.org/improving-performance-of-a-function_30627 -v -cover

goos: windows
goarch: amd64
pkg: forum.golangbridge.org/improving-performance-of-a-function_30627
cpu: Intel(R) Core(TM) i9-9900 CPU @ 3.10GHz
Benchmark_log
Benchmark_log-16
 3858253	       311.4 ns/op	      96 B/op	       1 allocs/op
PASS
coverage: 94.6% of statements
ok  	forum.golangbridge.org/improving-performance-of-a-function_30627	1.718s
Same API, 3x slower
// You can edit this code!
// Click here and start typing.
package main

import (
	"strconv"
	"strings"
	"unsafe"
)

const useUnsafe = true

type Field map[string]interface{}

func (fs Field) estimateStringLength() (estimate int) {
	for k, v := range fs {
		// include space for leading ," and trailing "
		estimate += len(`,""`)
		estimate += len(k)
		switch val := v.(type) {
		case string:
			estimate += len(val) + len(`""`)
		case int:
			estimate += 20 // length of math.MinInt64
		case float64:
			// I don't actually know how to find out the
			// maximum number of characters are needed to
			// represent a float value.  If you know that,
			// use that here instead.
			estimate += 24
		case bool:
			estimate += len("false")
		}
	}
	return
}

func log(line string, fields Field) string {
	sb := make([]byte, 0, len(line)+fields.estimateStringLength())
	sb = append(sb, line...)
	lastKey := ""
	for i := 0; i < len(fields); i++ {
		nextKey, nextVal := "", interface{}(nil)
		for k, v := range fields {
			if i > 0 && strings.Compare(k, lastKey) <= 0 {
				continue
			}
			nextKey, nextVal = k, v
			break
		}
		for k, v := range fields {
			if i > 0 && strings.Compare(k, lastKey) <= 0 {
				continue
			}
			if strings.Compare(k, nextKey) < 0 {
				nextKey, nextVal = k, v
			}
		}
		lastKey = nextKey
		sb = append(sb, `,"`...)
		// Note that you're not escaping any quotes
		// that might be inside of the field key:
		sb = append(sb, nextKey...)
		sb = append(sb, `":`...)
		switch x := nextVal.(type) {
		case string:
			sb = append(sb, '"')
			// Note that you're not escaping any quotes
			// that might be inside of the field value:
			sb = append(sb, x...)
			sb = append(sb, '"')
		case int:
			sb = strconv.AppendInt(sb, int64(x), 10)
		case float64:
			sb = strconv.AppendFloat(sb, x, 'f', -1, 64)
		case bool:
			sb = strconv.AppendBool(sb, x)
		}
	}
	if useUnsafe {
		return *((*string)(unsafe.Pointer(&sb)))
	}
	return string(sb)
}

Output:

Running tool: C:\Program Files\Go\bin\go.exe test -benchmem -run=^$ -bench ^Benchmark_log$ forum.golangbridge.org/improving-performance-of-a-function_30627 -v -cover

goos: windows
goarch: amd64
pkg: forum.golangbridge.org/improving-performance-of-a-function_30627
cpu: Intel(R) Core(TM) i9-9900 CPU @ 3.10GHz
Benchmark_log
Benchmark_log-16
 1357574	       915.1 ns/op	      96 B/op	       1 allocs/op
PASS
coverage: 97.4% of statements
ok  	forum.golangbridge.org/improving-performance-of-a-function_30627	2.314s

@skillian Nice idea. Although, it makes me wonder: clearly logging is not a performance bottleneck, does it make sense then to change the rest of the application (how data is passed to logger) just so that it allows for better logging algorithm ?

Also, it is great to be able to optimize things, but it is also valuable to be able to estimate that something is already pretty close to any possible optimal state.

I don’t think that logging is inherently not a performance bottleneck. Sometimes there’s a tricky bug in production that you cannot figure out how to reproduce and you need to gradually introduce more and more logging in specific locations until you find it. It could be important to reduce allocations if this logging is happening in a loop.

At the same time, I don’t think it’s right to assume that logging is a performance bottleneck, so if I were writing this log function, I would write it like this:

func log(line string, fields Field) string {
    bs, _ := json.Marshal(fields)
    return fmt.Sprint(line, string(bs))
}

And then only if logging has been proven to be a bottleneck would I consider optimizing it. Also, I cannot be sure that optimizing purely for the number of allocations is important unless I’ve measured that as well!

Exactly. I wouldn’t use this code in production unless I had to. The optimizations and use of unsafe are hard to read and reason about. There needs to be a real reason to do this.

I just did it because I thought it was an interesting question and the solution I found demonstrates both that it is possible and “at great cost!” A sort of “pyrrhic victory?” :slightly_smiling_face:

@skillian Thank you for your very valuable input. Much appreciated.

Exactly. I wouldn’t use this code in production unless I had to. The optimizations and use of unsafe are hard to read and reason about. There needs to be a real reason to do this.

Which code did you mean you wouldn’t use in prod? Mine or one of yours?

I’ve never seen code like below. What does it do please?

func _() interface {
	sort.Interface
} {
	return (*Field)(nil)
}

Mine. Using the unsafe package the way I am in my example is fragile. My first example depends on the implementation details of interfaces and both the first and second depends on the implementation details of slices and strings. If those datatypes’ layouts ever change, this code will still compile but will not do what it’s supposed to do at runtime. Hopefully it will just crash the process immediately with a panic that takes you right to this code, but it might write somewhere in memory that isn’t detected and do who knows what. The compiler performs escape analysis to provide memory safety. My code breaks that traceability.

In hindsight, I regret writing it. I was so preoccupied with whether or not I could, I didn’t stop to think if I should. :smile:

It’s essentially a compile-time check that the *Field type implements sort.Interface. It declares a function with a “blank” name (so it cannot be called and therefore can be eliminated during compilation) that returns an interface { sort.Interface } (i.e. just sort.Interface). If I wanted *Field to implement other interfaces, I could add them along with sort.Interface like this:

func _() interface {
	json.Marshaler
	io.Writer
	sort.Interface
} {
	return (*Field)(nil)
}

Another common way to write that is like this:

var _ interface {
	sort.Interface
} = (*Field)(nil)
1 Like

I’ve just realised that running benchmark against a function in different PCs produce different allocs/op :thinking: One PC says 2 allocations but another 10!

Is there a way to control this or not really? I run my tests like this $ go test -v -bench=. -benchmem ./logger

For example I slightly modified the code as per @skillian 's code like below.

type Field struct {
	Key string
	Val interface{}
}

func (f Field) MarshalJSON() ([]byte, error) {
	return json.Marshal(map[string]interface{}{f.Key: f.Val})
}

func log(fields []Field) string {
	log, _ := json.Marshal(fields)
	return string(log)
}
package logger

import "testing"

func Benchmark_log(b *testing.B) {
	for i := 0; i < b.N; i++ {
		log([]Field{{"str", "val"}})
	}
}
  1. Are they both running the same version of Go and are they running on the same platform (e.g. amd64 vs. arm vs. 386, etc.)?
  2. What is your motivation for reducing the allocations in this log function? I’m thinking:
  • Whatever you’re passing the returned string to is probably going to allocate some memory to write the message out to a file or to a database, etc., or it will need to make IO calls which will dwarf the time of the allocations.
  • Your tests are logging constants but in your real code, you’re probably logging actual values. Any values which didn’t already escape to the heap will need to be “boxed” into interface{}s which will result in additional allocations.
  • The point of a language like Go is to not have to bother messing around with this kind of micro-optimization until you really need to. If allocations are a real problem, you might need to do whatever you’re doing in Rust, C++, or C, but I’d be curious to see what you’re doing that makes this necessary.
1 Like
  1. arm64 go19.2 (2 alloc) vs amd64 go19.2 (10 alloc)
  2. I am reinventing the logger package for my apps because I either don’t like interfaces of some or total alloc they use. Also I am introducing some specific configurations to it.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.