Why does json.Marshal use append rather than copy?

caster · July 12, 2023, 11:22am

func Marshal(v any) ([]byte, error) {
	e := newEncodeState()
	defer encodeStatePool.Put(e)

	err := e.marshal(v, encOpts{escapeHTML: true})
	if err != nil {
		return nil, err
	}
	buf := append([]byte(nil), e.Bytes()...)

	return buf, nil
}

bluefire · July 14, 2023, 10:18am

In Go, the json.Marshal function uses append rather than copy when serializing data into JSON format for a specific reason. Let’s understand the rationale behind this choice.

When serializing data into JSON, the json.Marshal function needs to construct the byte slice that represents the JSON-encoded data. The size of this byte slice can vary depending on the complexity and size of the data being serialized. Since the size is not known in advance, append is used instead of copy for several reasons:

Dynamic resizing: append allows for dynamic resizing of the underlying byte slice as needed. It automatically manages the capacity of the slice, ensuring that it can accommodate the serialized JSON data. append increases the capacity of the slice when necessary, avoiding unnecessary memory allocations and copying.
Efficiency: Using append avoids unnecessary copying of data. If copy were used, it would require copying the serialized data to a new byte slice with a larger capacity whenever the current capacity is exceeded. This additional copying operation would incur unnecessary overhead and degrade performance.
Flexibility: append enables flexibility in handling different data structures and sizes. It allows the json.Marshal function to efficiently handle various types and sizes of data without making assumptions about their specific sizes or structures.

adctc · July 17, 2023, 12:48pm

Why not directly use buf := e.Bytes() instead of buf := append([]byte(nil), e.Bytes()...)?

caster · July 20, 2023, 2:55am

The underlying memory is reused: the encodeState goes back into encodeStatePool when Marshal returns, so a copy must be returned to the caller.
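To illustrate why returning e.Bytes() directly would be unsafe, here is a minimal sketch. It uses a plain bytes.Buffer as a stand-in for the pooled encodeState (that substitution and the name demo are assumptions made for illustration, not the encoding/json internals):

```go
package main

import (
	"bytes"
	"fmt"
)

// demo is a hypothetical helper: it writes one payload, takes both an
// aliased slice and a copied slice, then reuses the buffer for a second
// payload, the way a pooled encoder would on the next call.
func demo() (aliased, copied []byte) {
	var buf bytes.Buffer // stands in for the pooled encodeState

	buf.WriteString(`{"a":1}`)
	aliased = buf.Bytes()                        // shares the buffer's backing array
	copied = append([]byte(nil), buf.Bytes()...) // independent copy

	// Simulate the next Marshal call reusing the same memory.
	buf.Reset()
	buf.WriteString(`{"b":2}`)
	return aliased, copied
}

func main() {
	aliased, copied := demo()
	fmt.Printf("aliased: %s\n", aliased) // may no longer hold the first payload
	fmt.Printf("copied:  %s\n", copied)  // still the first payload
}
```

The copied slice keeps its contents no matter what happens to the buffer afterwards, while the aliased slice is only valid until the buffer is next modified.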

caster · July 28, 2023, 1:30am

I’ve measured the performance of both approaches. The advantage of append is that it takes one less line of code.

falco467 · July 28, 2023, 12:35pm

Hi,

I don’t understand how the stated reasons apply to this case:

buf := append([]byte(nil), e.Bytes()...)

This will create a brand new slice with a known fixed size. There is no resizing, since a new slice (and underlying array) is allocated every time. The size of e.Bytes() is known at allocation time. Since the underlying array in e is reused after return, all data needs to be copied, so no optimization is possible. And no flexibility is needed, since it is always the same type of data: a byte slice with a known length.

falco467 · July 28, 2023, 12:39pm

Can you show your measurement?
Older benchmarks seem to indicate copy was faster than append, but I don’t know how the current compiler handles these scenarios.

In principle the compiler should recognize both and produce the same optimal machine-instructions for both cases, I think.

skillian · July 28, 2023, 2:09pm

Just a guess, but semantically:

buf := make([]byte, len(e.Bytes()))
_ = copy(buf, e.Bytes())

has to zero out buf during the call to make, and then copy overwrites it. Appending doesn’t need to zero out anything and just overwrites elements. I would think the compiler would be able to optimize out that zeroing step, but perhaps it doesn’t. Or perhaps it didn’t at the time that code in Marshal was written.

caster · July 29, 2023, 1:06pm

https://play.golang.com/p/G8WNP1kYNi4

goos: windows
goarch: amd64
pkg: hello/bench
cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkAppend
BenchmarkAppend-12       1131752               954.6 ns/op
BenchmarkCopy
BenchmarkCopy-12         1281948               966.5 ns/op
PASS


caster · July 29, 2023, 1:08pm

I tried windows-amd64 and linux-amd64; there, copy and append are about the same speed. On the darwin-arm64 platform, however, copy is faster.
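For anyone who wants to repeat the comparison on their own platform, a rough sketch of such a benchmark follows. This is not the code behind the playground link above; the 4096-byte payload and the names benchAppend, benchCopy and sink are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable so the compiler cannot discard the copies.
var sink []byte

// benchAppend measures the append-to-nil idiom used in Marshal.
func benchAppend(b *testing.B, src []byte) {
	for i := 0; i < b.N; i++ {
		sink = append([]byte(nil), src...)
	}
}

// benchCopy measures the make-then-copy idiom.
func benchCopy(b *testing.B, src []byte) {
	for i := 0; i < b.N; i++ {
		dst := make([]byte, len(src))
		copy(dst, src)
		sink = dst
	}
}

func main() {
	src := make([]byte, 4096) // payload size is an arbitrary assumption

	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println("append:", testing.Benchmark(func(b *testing.B) { benchAppend(b, src) }))
	fmt.Println("copy:  ", testing.Benchmark(func(b *testing.B) { benchCopy(b, src) }))
}
```

Results will vary with payload size, Go version and platform, so small differences like the ones reported in this thread should be read with care.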
