```go
func Marshal(v any) ([]byte, error) {
	e := newEncodeState()
	defer encodeStatePool.Put(e)

	err := e.marshal(v, encOpts{escapeHTML: true})
	if err != nil {
		return nil, err
	}
	buf := append([]byte(nil), e.Bytes()...)
	return buf, nil
}
```
In Go, the `json.Marshal` function uses `append` rather than `copy` when serializing data into JSON for a specific reason. Let's look at the rationale behind this choice.

When serializing data into JSON, `json.Marshal` needs to construct the byte slice that holds the JSON-encoded output. The size of this slice varies with the complexity and size of the data being serialized. Since the size is not known in advance, `append` is used instead of `copy` for several reasons:
- Dynamic resizing: `append` allows dynamic resizing of the underlying byte slice as needed. It automatically manages the capacity of the slice, ensuring that it can accommodate the serialized JSON data, and it increases the capacity only when necessary, avoiding needless memory allocations and copying.
- Efficiency: using `append` avoids unnecessary copying of data. If `copy` were used, the serialized data would have to be copied to a new byte slice with a larger capacity whenever the current capacity was exceeded. That additional copying would incur unnecessary overhead and degrade performance.
- Flexibility: `append` enables flexibility in handling different data structures and sizes. It allows `json.Marshal` to efficiently handle values of various types and sizes without making assumptions about their specific sizes or structures.
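To illustrate the dynamic-resizing point, here is a minimal sketch (my own example, not code from the encoder) of building output incrementally with `append`, where the total size isn't known up front:

```go
package main

import "fmt"

func main() {
	// An encoder emits chunks whose combined size isn't known in advance;
	// append grows the backing array automatically as elements are added.
	var buf []byte
	for _, chunk := range []string{`{"name":`, `"gopher"`, `}`} {
		buf = append(buf, chunk...)
	}
	fmt.Println(string(buf)) // {"name":"gopher"}
}
```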
Why not directly use `buf := e.Bytes()` instead of `buf := append([]byte(nil), e.Bytes()...)`?
The underlying memory is reused: the encode state is returned to a pool, so a copy must be handed to the outside world.
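To make the reuse hazard concrete, here is a hedged sketch (the pool and function names are mine, not the real encoder's) showing why returning the pooled buffer's bytes directly is unsafe, while the `append([]byte(nil), ...)` copy is safe:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool stands in for encodeStatePool (hypothetical, for illustration).
var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// encodeUnsafe returns the pooled buffer's bytes without copying.
// The result aliases pooled memory and may be clobbered by a later call.
func encodeUnsafe(s string) []byte {
	b := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(b)
	b.Reset()
	b.WriteString(s)
	return b.Bytes()
}

// encodeSafe copies, like Marshal's append([]byte(nil), e.Bytes()...).
func encodeSafe(s string) []byte {
	b := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(b)
	b.Reset()
	b.WriteString(s)
	return append([]byte(nil), b.Bytes()...)
}

func main() {
	first := encodeUnsafe(`"one"`)
	_ = encodeUnsafe(`"two"`)  // likely reuses the same pooled buffer
	fmt.Println(string(first)) // may print "two": first was overwritten

	safe := encodeSafe(`"one"`)
	_ = encodeSafe(`"two"`)
	fmt.Println(string(safe)) // always "one": safe owns its own array
}
```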
I’ve measured the performance of both approaches; `append` just lets you write one less line of code.
Hi,
I don’t understand how the stated reasons apply to this case:

`buf := append([]byte(nil), e.Bytes()...)`

This creates a brand-new slice with a known, fixed size. There is no resizing, since a new slice (and underlying array) is allocated every time, and the size of `e.Bytes()` is known at allocation time. Since the underlying array in `e` is reused after return, all data needs to be copied, so no optimization is possible. And since it is always the same type of data, a byte slice with a known length, no flexibility is needed either.
Can you show your measurements?

Older benchmarks seem to indicate that `copy` was faster than `append`, but I don’t know how the current compiler handles these scenarios. In principle the compiler should recognize both patterns and produce the same optimal machine instructions for both cases, I think.
Just a guess, but semantically:

```go
buf := make([]byte, len(e.Bytes()))
copy(buf, e.Bytes())
```

has to zero out `buf` during the call to `make`, and then `copy` overwrites it. Appending doesn’t need to zero anything; it just writes the elements. I would think the compiler would be able to optimize out that zeroing step, but perhaps it doesn’t. Or perhaps it didn’t at the time the code in `Marshal` was written.
https://play.golang.com/p/G8WNP1kYNi4
```
goos: windows
goarch: amd64
pkg: hello/bench
cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkAppend
BenchmarkAppend-12    1131752    954.6 ns/op
BenchmarkCopy
BenchmarkCopy-12      1281948    966.5 ns/op
PASS

Process finished with the exit code 0
```
I tried windows-amd64 and linux-amd64; `copy` and `append` are about the same speed. But on the darwin-arm64 platform, `copy` is faster.