Why does json.Marshal use append rather than copy?

func Marshal(v any) ([]byte, error) {
	e := newEncodeState()
	defer encodeStatePool.Put(e)

	err := e.marshal(v, encOpts{escapeHTML: true})
	if err != nil {
		return nil, err
	}
	buf := append([]byte(nil), e.Bytes()...)

	return buf, nil
}

In Go, the json.Marshal function uses append rather than copy when serializing data into JSON format for a specific reason. Let’s understand the rationale behind this choice.

When serializing data into JSON, the json.Marshal function needs to construct the byte slice that represents the JSON-encoded data. The size of this byte slice can vary depending on the complexity and size of the data being serialized. Since the size is not known in advance, append is used instead of copy for several reasons:

  1. Dynamic resizing: append allows for dynamic resizing of the underlying byte slice as needed. It automatically manages the capacity of the slice, ensuring that it can accommodate the serialized JSON data. append increases the capacity of the slice when necessary, avoiding unnecessary memory allocations and copying.
  2. Efficiency: Using append avoids unnecessary copying of data. If copy were used, it would require copying the serialized data to a new byte slice with a larger capacity whenever the current capacity is exceeded. This additional copying operation would incur unnecessary overhead and degrade performance.
  3. Flexibility: append enables flexibility in handling different data structures and sizes. It allows the json.Marshal function to efficiently handle various types and sizes of data without making assumptions about their specific sizes or structures.

Why not use buf := e.Bytes() directly instead of buf := append([]byte(nil), e.Bytes()...)?

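The underlying memory is reused: e goes back to encodeStatePool (see the deferred Put), so its buffer will be overwritten by a later call. A copy must therefore be returned to the caller.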
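
To illustrate why the copy is needed, here is a minimal sketch. It uses a plain bytes.Buffer as a stand-in for the pooled encodeState (an assumption made for brevity; reuseDemo is a hypothetical helper, not part of encoding/json). The slice returned by Bytes() shares the buffer's backing array, so its contents are only valid until the buffer is modified again.

```go
package main

import (
	"bytes"
	"fmt"
)

// reuseDemo writes one document into a buffer, takes an aliased slice and an
// independent copy, then reuses the buffer the way a pooled encoder would.
func reuseDemo() (aliased, copied []byte) {
	var buf bytes.Buffer // stand-in for the pooled encodeState (assumption)
	buf.WriteString(`{"a":1}`)

	aliased = buf.Bytes()                        // shares buf's backing array
	copied = append([]byte(nil), buf.Bytes()...) // independent copy, as in Marshal

	// Simulate the next Marshal call reusing the same buffer.
	buf.Reset()
	buf.WriteString(`{"b":2}`)
	return aliased, copied
}

func main() {
	aliased, copied := reuseDemo()
	fmt.Printf("aliased: %s\n", aliased) // may now show the second document
	fmt.Printf("copied:  %s\n", copied)  // still the first document
}
```

The copied slice keeps the first document no matter what happens to the buffer afterwards, which is the property Marshal needs for its return value.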

I’ve measured the performance of both approaches. The advantage of append is that it takes one less line of code.

Hi,

I don’t understand how the stated reasons apply to this case:

buf := append([]byte(nil), e.Bytes()...)

This will create a brand new slice with a known fixed size. There is no resizing, since a new slice (and underlying array) is allocated every time. The size of e.Bytes() is known at allocation time. Since the underlying array in e is reused after return, all data needs to be copied, so no optimization is possible. And since it is always a slice of bytes, no flexibility is needed: it is always the same type of data, a byte slice with a known length.

Can you show your measurement?
Older benchmarks seem to indicate copy was faster than append, but I don’t know how the current compiler handles these scenarios.

In principle the compiler should recognize both and produce the same optimal machine-instructions for both cases, I think.

Just a guess, but semantically:

buf := make([]byte, len(e.Bytes()))
_ = copy(buf, e.Bytes())

This has to zero out buf during the call to make, and then copy overwrites it. Appending doesn’t need to zero out anything and just overwrites elements. I would think the compiler would be able to optimize out that zeroing step, but perhaps it doesn’t. Or perhaps it didn’t at the time that code in Marshal was written.
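
For reference, the two ways of duplicating a byte slice discussed here can be put side by side as below. This is only a sketch: cloneAppend and cloneCopy are hypothetical helper names, and the snippet says nothing about which one is faster. It does show one small semantic difference: for an empty input the append form returns a nil slice, while make plus copy returns an empty non-nil slice.

```go
package main

import (
	"bytes"
	"fmt"
)

// cloneAppend duplicates src with the idiom used in json.Marshal.
func cloneAppend(src []byte) []byte {
	return append([]byte(nil), src...)
}

// cloneCopy duplicates src with an explicit allocation followed by copy.
func cloneCopy(src []byte) []byte {
	dst := make([]byte, len(src))
	copy(dst, src)
	return dst
}

func main() {
	src := []byte(`{"k":"v"}`)
	a, c := cloneAppend(src), cloneCopy(src)

	src[0] = '[' // mutate the source; neither clone should change

	fmt.Println(bytes.Equal(a, c), string(a))
	fmt.Println(cloneAppend(nil) == nil, cloneCopy(nil) == nil)
}
```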


https://play.golang.com/p/G8WNP1kYNi4

goos: windows
goarch: amd64
pkg: hello/bench
cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkAppend
BenchmarkAppend-12       1131752               954.6 ns/op
BenchmarkCopy
BenchmarkCopy-12         1281948               966.5 ns/op
PASS

Process finished with the exit code 0

I tried windows-amd64 and linux-amd64; there, copy and append are about the same speed. On darwin-arm64, however, copy is faster.
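
For anyone who wants to repeat the comparison on their own platform, a rough sketch follows. It is not the code from the playground link above. The 4096-byte payload and the helper names benchAppend and benchCopy are assumptions chosen for illustration, and testing.Benchmark is used so that it runs as an ordinary program without go test.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the compiler from discarding the cloned slices.
var sink []byte

// benchAppend measures cloning src with append onto a nil slice.
func benchAppend(src []byte) func(*testing.B) {
	return func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = append([]byte(nil), src...)
		}
	}
}

// benchCopy measures cloning src with make followed by copy.
func benchCopy(src []byte) func(*testing.B) {
	return func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			dst := make([]byte, len(src))
			copy(dst, src)
			sink = dst
		}
	}
}

func main() {
	src := make([]byte, 4096) // hypothetical payload size
	fmt.Println("append:", testing.Benchmark(benchAppend(src)))
	fmt.Println("copy:  ", testing.Benchmark(benchCopy(src)))
}
```

Results will depend on the Go version, the payload size, and the platform, so it is worth trying several sizes before drawing conclusions.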
