[Data structures] should I return the contained elements or copies ? Go forces me to return a copy!

JOhn_Stuart · August 30, 2020, 11:29am

Hello fellow gophers,

I am implementing my tree data structure. Inside it I will put elements of any type that implements my Value interface.

I have a function that returns the largest element in my tree:

func (t *tree) Largest() *Value

and inside this function I first compute the value of the largest element:
largest := current

Then I have the last line that says:
return &largest

I iterate through this tree, find its largest element, and make a copy of it then return a pointer to that copy. The syntax largest := makes a copy of that actual element. Returning its address is returning the address of the copy.

This has one advantage: if I put complex values in my data structure, what I get out are copies and modifying the copies makes sure the original stays the same - thus I get less bugs.

But it also has a disadvantage: for big data, if I am to store lots of elements and retrieve each of them, I will use double the memory size, as each retrieve allocates a copy.

What is the idiomatic approach here ? should I stick to returning the pointer of a copy, or try to return the actual original element that was inserted ?

In production: how doI documment that this function returns a copy and this function returns the actual, original object - so that I remember its similar to immutable objects - is there a best practice here ? Is anybody documenting this detail about their return values ?

skillian · August 30, 2020, 3:02pm

If Value is an interface, then a variable (or struct field) of type Value consists of two pointers: a pointer to the type and a pointer to the boxed value. Because the Value is already a pointer to the wrapped value, you should not return a pointer to the Value. Just return by value.

Regardless of if you return *Value or Value, the value boxed inside of the Value interface will either be a pointer or a value and if it’s a pointer, then consumers of your API will still have the ability to mutate values in the tree.

Like I said before, if Value is an interface, then it doesn’t matter if you use *Value or Value. The boxed value will still not be copied.

If Value is an interface, then I think it’s safe to say to always pass it by value. If you’re doing some sort of code generation to get pseudo-generics, then it’s a toss up. It really depends on each scenario where the tree is going to be used. Sometimes you’ll want copies for stability reasons like you mentioned, but other times, perhaps you just want read-only access to some nested fields or something in the tree so it would be wasteful to create copies of large structs if you just wanted to pull a string or something out of each.

Note also that trees, just like linked lists are very slow on modern hardware. Every pointer dereference in the tree that references memory that isn’t in the cache will stall the processor for hundreds of cycles. If the next node needs to be dereferenced and that memory isn’t in the cache, there’s another few hundred cycles, etc… Most of the time, it’s faster to perform a linear search through an array-like data structure unless you have hundreds of thousands of elements that are each hundreds or thousands of bytes in size. If you have a simple slice, then users know to either:

for _, v := range vs {
    // work with copy
}

or

for i := range vs {
    v := &vs[i]
    // work with pointer
}

depending on if they want a copy or not.

system · November 28, 2020, 3:11pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.