Interface, reflect, uintptr & garbage collection

Hello,

I have a question about memory manipulation and garbage collection.

I wrote a tree algorithm for indexing millions of entries. Every entry of a given tree contains the same kind of data, but different trees can hold different data types.

I store the data as interface{}.

An interface value is 16 bytes; the documentation says it holds 8 bytes for the data type and 8 bytes for a pointer to the original data.

Since each of my trees stores only one type of data, I want to store the type only once in the root, so each node contains only the pointer to the data.

(Note: 34 million nodes with 8 bytes less saves ~260MB of RAM.)

I wrote a POC which seems to work. The limitation is that it only handles pointers to my data.

When initializing the tree, the caller gives the pointed-to type (if I store *string, I give string). Each node is then inserted with a pointer to this type.

type Root struct {
    ...
    Type reflect.Type
}

type Node struct {
    ...
    Data uintptr
}

func NewTree(valuetype interface{}) *Root {
    return &Root{
        ...,
        Type: reflect.ValueOf(valuetype).Type(),
    }
}

func (r *Root) Insert(..., data interface{}) *Node {
    return &Node{
        Data: reflect.ValueOf(data).Pointer(),
    }
}

func (r *Root) Interface(n *Node) interface{} {
    return reflect.NewAt(r.Type, unsafe.Pointer(n.Data)).Interface()
}

This code works, but I’m not sure the value pointed to by the uintptr is protected from garbage collection. Does someone have an idea?


From the unsafe package documentation (unsafe - pkg.go.dev), emphasis added:

A uintptr is an integer, not a reference. Converting a Pointer to a uintptr creates an integer value with no pointer semantics. Even if a uintptr holds the address of some object, the garbage collector will not update that uintptr’s value if the object moves, nor will that uintptr keep the object from being reclaimed.

Thanks for the doc pointer. I’m sad to use MB of RAM to store the same data. I will find another way to compact memory.

You could use unsafe.Pointer instead of uintptr and I think it’d work. unsafe.Pointer is treated as a pointer in the runtime and will prevent the data from being reclaimed.
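A minimal sketch of that suggestion (the Node layout and helper names here are illustrative, not the original tree’s API): with unsafe.Pointer, the field is still one word, but the runtime sees it as a real pointer, so the pointee stays reachable.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Node stores the data as unsafe.Pointer instead of uintptr:
// same size, but the GC keeps the pointee alive.
type Node struct {
	Data unsafe.Pointer
}

func insert(s *string) *Node {
	return &Node{Data: unsafe.Pointer(s)}
}

func value(n *Node) string {
	// The caller knows the stored type, so a direct conversion
	// replaces the reflect.NewAt call.
	return *(*string)(n.Data)
}

func main() {
	s := "hello"
	n := insert(&s)
	fmt.Println(value(n)) // hello
}
```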

Don’t forget that Go 1.18 with its generics should be out in Feb 2022, so you should be able to do away with some of the pointer-chasing then!

Even if you manage to save memory, adding tons of pointers in your code will cause every GC cycle to consume more CPU.

Otherwise, respect to you for trying to optimize. I hope you manage to find a suitable optimization, and please share it.

What type of data is it, anyway? Is it a custom struct?

Thanks. I reached my goal, so I’ll try that later.

It’s a fun activity!

It’s a homemade radix tree with 5M nodes × 3 trees. With intermediary nodes, I have a total of 34M nodes. Each leaf is a struct with 2 int32 and 7 strings. Strings are 20 ASCII chars max, and in most cases they are empty.

I already found some ways to save memory, and I reached my goal of less than 8GB.

I save a lot of memory by serializing this struct into a byte array with a specific quick binary serializer.

→ The original struct’s minimum size is 2 × 4 + 7 × 16 = 120 bytes; the most frequent case is one int32 (< 10000) and a 20-char string, so a mean of 140 bytes. After serialization, the same struct takes 23 bytes. 140 − 23 = 117 bytes saved, × 5M = 585MB saved.
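A sketch of what such a quick binary serializer could look like (the entry layout and function names are my assumptions, not the actual project code): varints for the two ints and length-prefixed strings, so an empty string costs a single byte.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// entry mirrors the leaf described above: 2 int32 and 7 short strings.
type entry struct {
	a, b int32
	s    [7]string
}

// serialize packs the entry: varint-encoded ints, then for each string
// one length byte followed by its bytes (strings are at most 20 ASCII chars).
func serialize(e *entry) []byte {
	buf := make([]byte, 0, 64)
	buf = binary.AppendVarint(buf, int64(e.a))
	buf = binary.AppendVarint(buf, int64(e.b))
	for _, s := range e.s {
		buf = append(buf, byte(len(s)))
		buf = append(buf, s...)
	}
	return buf
}

func main() {
	e := entry{a: 9999, s: [7]string{"owner-name-20-chars."}}
	b := serialize(&e)
	fmt.Println(len(b)) // far below the ~140-byte in-memory footprint
}
```

A matching deserializer reads the varints back with binary.Varint and slices the strings out by their length bytes.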

I save a lot of memory by using 32-bit references in place of pointers in each tree node (parent, left child, right child). I use a list of preallocated arrays of 2^16 entries each, so my 32-bit reference uses the 16 MSB for the index in the list and the 16 LSB for the index within the 2^16-entry array. I added a pool for nodes, and it works! Also, I replaced []byte with string: a []byte header is at least 24 bytes, a string header is at least 16 bytes.

→ The original node is 80 bytes; I reached 56 bytes, so a gain of 24 bytes × 34M = 816MB saved.
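The 32-bit reference scheme described above can be sketched like this (the names and node fields are illustrative, not the library’s actual API):

```go
package main

import "fmt"

const chunkSize = 1 << 16 // each preallocated array holds 2^16 nodes

// node uses 32-bit references (parent, left, right) instead of
// three 64-bit pointers.
type node struct {
	parent, left, right uint32
	key                 uint32
}

// pool is the list of preallocated arrays; a reference packs the
// array index in the 16 MSB and the slot index in the 16 LSB.
var (
	pool [][]node
	used = chunkSize // slots used in the last array; "full" forces a new one
)

// alloc returns a 32-bit reference to a fresh node.
func alloc() uint32 {
	if used == chunkSize {
		pool = append(pool, make([]node, chunkSize))
		used = 0
	}
	ref := uint32(len(pool)-1)<<16 | uint32(used)
	used++
	return ref
}

// deref converts a 32-bit reference back into a real pointer.
func deref(ref uint32) *node {
	return &pool[ref>>16][ref&0xFFFF]
}

func main() {
	r := alloc()
	deref(r).key = 42
	fmt.Println(deref(r).key) // 42
}
```

Since the backing arrays are reachable through the pool, the GC never needs to trace pointers inside the nodes themselves, which also reduces GC scan work.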

A total of 1.4GB saved.

A few more memory optimisations, like sorting big amounts of data using temporary files. It is slower, but it saves a lot of memory (the biggest sorted list is 1M entries).

I also fixed some alignment problems, gaining a lot of bytes in the structs.

What is the radix of your tree? And what is the practical problem you are trying to solve with it? If you can share: why does every leaf contain 7 strings, and what is the data layout for nodes and edges? And your index list, what is it for; is it memoization for node values?

The idea looks great, it is a very powerful and versatile data structure, but you need to implement it by taking into account the practical realities of any given programming language.

Where do you intend to run it? Is it a real-time app like an API-exposing service (which is why you want to serialize with the least memory usage), or more of a job run manually?

For example, you talk a lot about saving memory; please bear in mind it comes at the cost of runtime CPU, and based on practical costs it may be cheaper to deliberately use more memory if that allows you to save CPU.

[]byte is at least 24bytes, string is at least 16bytes

The overhead of a slice header (pointer, len, cap) is one word more than the overhead of a string header (pointer, len), that is true, but you are trying to squeeze memory out of Go implementation internals, which should only be done after you are sure you have optimized your structure in the best possible way.

Also, in the code from your first post, would it not make sense to have only 2 options for valueType, string or []byte, to avoid using reflect?

It’s a kind of tree which returns the longest matched prefix. It is useful for dealing with networks and IP addresses. It answers the question: in what network is my IP address? (My tree contains 192.168.0.0/16 and 192.168.55.0/24. In what network is 192.168.55.6/32? A lookup returns 192.168.55.0/24, which is the most specific match.)

I need these lookup:

  • in what network is my network
  • I want to browse the children networks of a specific network
  • I want to look up the parent network of an existing node
  • … maybe some others …
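The longest-match lookup described here can be sketched with a plain binary trie over address bits, a simplified stand-in for the real radix tree (all names are illustrative):

```go
package main

import "fmt"

// trieNode is one bit-level of a binary trie over IPv4 addresses.
type trieNode struct {
	child [2]*trieNode
	val   string // non-empty means a network ends at this depth
}

type trie struct{ root trieNode }

// insert stores val for the network addr/bits.
func (t *trie) insert(addr uint32, bits int, val string) {
	n := &t.root
	for i := 0; i < bits; i++ {
		b := (addr >> (31 - i)) & 1
		if n.child[b] == nil {
			n.child[b] = &trieNode{}
		}
		n = n.child[b]
	}
	n.val = val
}

// lookup walks addr bit by bit and returns the value of the
// longest (most specific) matching prefix.
func (t *trie) lookup(addr uint32) string {
	n, best := &t.root, ""
	for i := 0; n != nil; i++ {
		if n.val != "" {
			best = n.val
		}
		if i == 32 {
			break
		}
		n = n.child[(addr>>(31-i))&1]
	}
	return best
}

func main() {
	var tr trie
	tr.insert(0xC0A80000, 16, "192.168.0.0/16")  // 192.168.0.0/16
	tr.insert(0xC0A83700, 24, "192.168.55.0/24") // 192.168.55.0/24
	fmt.Println(tr.lookup(0xC0A83706)) // 192.168.55.6 -> 192.168.55.0/24
}
```

Parent and children lookups fall out of the same structure: the parent of a node is its nearest ancestor with a value, and the children are the valued nodes in its subtree.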

There are some properties for each network: the owner, the AS number (int32), the country code, the datacenter name, and a general category encoded as int32 …

I have about 40 data sources extracted from the internet. Each source gives a qualification for a network (the geo-location, the AS number, the datacenter, the kind of activity, …). I need to aggregate all of these networks into a final DB. The problem is to merge cases like the following into a contiguous series of networks. Contiguous networks, because the final user of this database is allowed only one “longest match” lookup, which must return all the data.

  • 192.168.0.0/16 => geo=FR
  • 192.168.0.0/24 => geo=US

the result as contiguous networks is:

  • 192.168.0.0/24 => geo=US
  • 192.168.1.0/24 => geo=FR
  • 192.168.2.0/23 => geo=FR
  • 192.168.4.0/22 => geo=FR
  • 192.168.8.0/21 => geo=FR
  • 192.168.16.0/20 => geo=FR
  • 192.168.32.0/19 => geo=FR
  • 192.168.64.0/18 => geo=FR
  • 192.168.128.0/17 => geo=FR
The merge algorithms have been written for a few months now, and they require an efficient index for manipulating millions of networks.

This node does not serve live requests. The live-request node is written in C and is very fast, with optimized memory usage. I simply want to save memory because I like this activity :slight_smile: It is also a good way to understand Go’s underlying mechanisms and memory management.

Absolutely, I use CPU to compress memory after/before each lookup/insert. It is acceptable because:

  • this process is not live
  • my memory serializer is very fast, so the overhead is not significant

Also, my system to compress 64-bit pointers into 32-bit references requires an offset computation for each node access, like this:

  • node_pointer = &pool[column].rows[line]
  • reference = ((node_pointer - rows_base_pointer) / sizeof(node)) | (node.pool_index << 16)

These computations are very fast, and the time added to lookups is not significant. The bad thing is all the storage needed for pool_index. While writing these lines, I remember that struct sizes are aligned on 64 bits / 8 bytes, so I could compress the reference by removing the 3 last bits, which are always 0 :slight_smile: Or use these 3 LSB to store some other data.

I also spend CPU compressing data on disk, but I save time by writing less data to disk :slight_smile:

Agreed. I hope I have already optimized the structs (each one contains the minimum data, and the members are ordered to optimize alignment). But I’m not a code superhero, so maybe I’m wrong.

I explored this way. The problem is that I wrote the radix tree as a library used by other components. If I remove the interface{} (reflect & co.), I break compatibility.

Maybe I can implement two ways of using the radix tree, but that is hard to implement.

Maybe each node should be accessed through interface-defined functions, but I’m afraid the many function calls would add too much time.

A way similar to the C-style solution is to define the node like this:

package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

type node struct {
	i int // dummy value for test
}

type data struct {
	n node   // the node - always the first entry
	d string // the data
}

func main() {
	var d *data
	var n *node

	// setup
	d = &data{
		n: node{
			i: 45,
		},
		d: "data",
	}
	n = &d.n

	// gets d from n
	d = reflect.NewAt(reflect.TypeOf(*d), unsafe.Pointer(n)).Interface().(*data)

	// display result
	fmt.Printf("%#v\n", d)
}

It works; it displays:

&main.data{n:main.node{i:45}, d:"data"}

With this code, the caller knows the kind of node and uses this formula to recover the data it wants. Also, the intermediate nodes are smaller because they no longer keep a pointer to the data.

But I’m absolutely not sure about the GC behavior :slight_smile: And this way, I lose the replacement of 64-bit pointers by 32-bit references.

Hi @Thierry_Fournier ,

The data structure, and the energy you put into optimizing it, are both very impressive. Congratulations for that. Also, if your library is public, please post a link, as others may also be curious.

Regarding the radix tree: since it is used for IPs, its height will never be large, and it will be very wide.

My first reaction, after seeing your answer, is to ask myself whether a radix tree is really what you want. Keep in mind that a) it will be very wide on the first levels, and not too deep, and b) your data has a very rigid format, namely it is an IP.

A general radix tree does not make use of the extra information that you have (a and b). This is my instinctive reaction: can you optimize your radix tree to make use of that extra information?

  • for a: for every child of the root, you could have a different app running on a different machine that only manages data for that child. This way you scale, and you make your tree one level smaller.
  • an optimized version of a: your data is a few million/hundreds of millions of IPs. Their distribution is likely not random; based on that distribution, store the first levels of your data structure in a radix tree of its own, with the leaves being pointers/calls to other apps.
  • as for b: could you use math rules wherever you can, to remove some of the nodes?

I am not sure I get why you are serializing your data structure. Is this your way of ensuring persistence? If so, could you adapt your tree to no longer store everything in memory, but use a DB in combination with the rules from b?

I can’t export the whole project because it’s part of our job, but I can export the radix-tree code (it is the fun part of the project).

The code was not designed to be exported, so I added a little bit of doc and an example.

Yes, its depth is bounded by the key length in bits, so for 32-bit IPv4 the depth is at most 32 levels (because the radix tree can be summarised as a binary tree).

Maybe a faster storage method exists to answer my requirements, but this one is acceptable in terms of processing time.

In addition, my data also contains times in 64-bit format. I need to keep the times sorted to know, at any moment, the next time I have to do something.

Nice way to grow the processing. If the data were homogeneously distributed, I could specialise nodes and use the network to send data to the right computer.

So, today it is not interesting: I do the job on one machine in 60s. The process eats 5.7 million qualified networks and merges the result into two databases of 3.7 million networks and 500k networks. 60s is a little bit long, but to save memory I dump some inputs to disk.

The program has been running for a few hours (it is a continuous merge: refresh the sources and merge again) and it eats 7GB of RAM. It seems I reached my goal, which is less than 8GB of RAM.

I serialize to save memory. My data takes about ~140 bytes as a generic struct and becomes 24 bytes as a serialized struct (note this part is not in the GitHub code). 140 − 24 = 116 bytes saved, so ~580MB of RAM.

A database is too slow to perform a fast merge of the data. Each network entering the merge system (5.7 million in total) generates 30 lookups in 30 trees, followed by a number of insertions/deletions, between 1 and 4 (depending on the merge result). All the lookups must be radix longest-prefix lookups (I think PostgreSQL supports this), so about 5.7M × 30 / 60s = 2.8M SELECT/s; I’m not sure a classic database reaches this rate (maybe with prepared statements and a lot of optimizations).

In other respects, a database is another component in the company’s assets, and it needs supervision, maintenance, high availability, … With my solution, I just put two computers performing the same merge on the same data in parallel, and high availability is ensured.

If you’re curious, enjoy reading the radix-tree code :wink: I’m not so proud of the result, but I reached the goal.

Another optimization: radix_binary.go contains the most frequently called functions, so I want them inlined to avoid a call. Go decides inlining by itself; I can’t force it (go build -gcflags=-m reports the compiler’s inlining decisions). So I tried to reduce the code generated by these functions, but I reached a limit; I cannot reduce the code any further.
