How to create read-only arbitrary Golang data structures?

simonhf · August 26, 2021, 7:26pm

Assuming there is no way to do this currently, another question might be:

How to allocate and use Golang data structures outside of Golang’s own heap / memory management?

What I’d like to do somehow:

Normally Golang data structures get allocated on the stack or on the heap. But what if I wanted to create and manage my own heap so when Golang would allocate a data structure on its heap, I can somehow divert it to allocate on my heap instead?

After allocation on my heap, then the data structures can be written to, and finally the memory can be marked as read-only, assuming my heap was originally allocated as an operating system memory map.

Presumably at this point the read-only data structures can be read by arbitrary Golang functions, but never written to, which would cause a segmentation fault because the memory is read-only.

Later, after the read-only arbitrary data structure is no longer needed, because the backing heap is under my control, I’d need to manage it myself, probably un-mapping the memory map.

Why would I like to do this?

I’d like to have immutable arbitrary data structures which are guaranteed never to change, even by arbitrary memory corruption. This is potentially useful for blockchain applications where data structures are created once and never change.

How would operating system memory maps help?

A memory map can be marked as read-only memory, so that writing even accidentally is not possible.
A single memory map can be grown efficiently without re-copying the data. For example, if an already large memory map cannot grow in place (because other memory is already allocated on either side of it), the OS can remap its pages to a different address range where there is room to grow, which can be much cheaper than copying the memory elsewhere.

Or is there another way to achieve the same outcome?

For example, let’s say I have an arbitrary data structure created on the Golang heap. Is there a way to make a copy of it into a memory map under my control? All pointers would have to be updated… somehow… how? Presumably, after copying, I can ignore / garbage collect the original data structure and be left with the read-only copy?

Other suggestions?

simonhf · August 27, 2021, 2:46am

I have experimented below with a quick and dirty deep copy of a simple Golang data structure into a memory map so that it becomes self-referencing:

$ cat unsafe-demo.go
package main

/*
$ go mod init unsafe-demo
$ go get github.com/edsrzf/mmap-go
$ go get github.com/jinzhu/copier
$ go run unsafe-demo.go
*/

import (
	"fmt"
	"unsafe"

	"github.com/edsrzf/mmap-go"
	"github.com/jinzhu/copier"
)

type A struct {
	A int8
	B string
	C float32
}

type B struct {
	D int8
	E string
	F float32
}

func main() {
	a := A{A: 1, B: `foo`, C: 1.234567}
	//b := B(a) cannot convert a (type A) to type B
	b := *(*B)(unsafe.Pointer(&a))

	a.A = 2 // only changes a.A and not b.D because b is a copy
	fmt.Printf("- &a=%p a.A=%d a.B=%s a.C=%f\n", &a, a.A, a.B, a.C)
	fmt.Printf("- &b=%p b.D=%d b.E=%s b.F=%f\n", &b, b.D, b.E, b.F)

	memory, err := mmap.MapRegion(nil, 4096, mmap.RDWR, mmap.ANON, 0)
	if err != nil {
		panic(err)
	}
	memory[0] = 123
	fmt.Printf("- &memory=%p memory=%p memory[0]=%d\n", &memory, memory, memory[0])

	c := (*B)(unsafe.Pointer(&memory[0])) // c is now a pointer to type B struct in the memory map!
	c.D = 111 // changes mmap byte[0] which is memory[0] AND c.D
	fmt.Printf("- &c=%p c.D=%d c.E=%s c.F=%f // c=%p memory[0]=%d\n", &c, c.D, c.E, c.F, c, memory[0])

	b.D = -1 // changes b.D to -1
	copier.Copy(c, b) // deep copy b to c
	fmt.Printf("- &c=%p c.D=%d c.E=%s c.F=%f // c=%p memory[0]=%d\n", &c, c.D, c.E, c.F, c, memory[0])

	dumpByteSlice(memory[:64])
	fmt.Printf("- &c.E=%p &c.F=%p\n", &c.E, &c.F)

	sizeOfC := unsafe.Sizeof(*c)
	fmt.Printf("- unsafe.Sizeof(c)=%d\n", sizeOfC)
	fakeString := (*[3]byte)(unsafe.Pointer(&memory[sizeOfC]))
	copy(fakeString[:], b.E) // copy string 'foo' to fake string in mmap
	fakeString[0] = 'z'
	p := (*uint64)(unsafe.Pointer(&memory[8]))
//	fakeStringPtr := 0x0102030405060708
	fakeStringPtr := (uintptr(unsafe.Pointer(&memory[sizeOfC])))
	fakeStringU64 := uint64(fakeStringPtr)
	*p = fakeStringU64 // store pointer to our fake Golang string
	dumpByteSlice(memory[:64])
	fmt.Printf("- c=%+v\n", c)
}

func dumpByteSlice(b []byte) {
	var a [16]byte
	n := (len(b) + 15) &^ 15
	for i := 0; i < n; i++ {
		if i%16 == 0 {
			fmt.Printf("%4d", i)
		}
		if i%8 == 0 {
			fmt.Print(" ")
		}
		if i < len(b) {
			fmt.Printf(" %02X", b[i])
		} else {
			fmt.Print("   ")
		}
		if i >= len(b) {
			a[i%16] = ' '
		} else if b[i] < 32 || b[i] > 126 {
			a[i%16] = '.'
		} else {
			a[i%16] = b[i]
		}
		if i%16 == 15 {
			fmt.Printf("  %s\n", string(a[:]))
		}
	}
}

The output shows that the cloned, self-referencing data structure in the mmap seems to be usable from Golang:

$ go run unsafe-demo.go
- &a=0xc000136000 a.A=2 a.B=foo a.C=1.234567
- &b=0xc000136020 b.D=1 b.E=foo b.F=1.234567
- &memory=0xc000128018 memory=0x12e8000 memory[0]=123
- &c=0xc000130020 c.D=111 c.E= c.F=0.000000 // c=0x12e8000 memory[0]=111
- &c=0xc000130020 c.D=-1 c.E=foo c.F=1.234567 // c=0x12e8000 memory[0]=255
   0  FF 00 00 00 00 00 00 00  47 16 10 01 00 00 00 00  ........G.......
  16  03 00 00 00 00 00 00 00  4B 06 9E 3F 00 00 00 00  ........K..?....
  32  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
  48  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
- &c.E=0x12e8008 &c.F=0x12e8018
- unsafe.Sizeof(c)=32
   0  FF 00 00 00 00 00 00 00  20 80 2E 01 00 00 00 00  ........ .......
  16  03 00 00 00 00 00 00 00  4B 06 9E 3F 00 00 00 00  ........K..?....
  32  7A 6F 6F 00 00 00 00 00  00 00 00 00 00 00 00 00  zoo.............
  48  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
- c=&{D:-1 E:zoo F:1.234567}

Is it worth continuing down this path, or are there going to be difficult or impossible to resolve issues?

skillian · August 28, 2021, 2:34pm

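I do not yet understand your use case. One of the reasons you gave was:

I’d like to have immutable arbitrary data structures which are guaranteed never to change, even by arbitrary memory corruption.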

But I don’t understand what you mean by that; what’s “arbitrary memory corruption?” If you have a hardware error in your memory or there’s some strong electromagnetic disturbance, that could potentially corrupt your memory regardless of the memory page safety.

I am not familiar with blockchain, but in my own (perhaps not applicable) experience, I want read only data structures so that I can catch attempted state modifications at compile time.

Given your example data structures:

type A struct {
   A int8
   B string
   C float32
}

type B struct {
   D int8
   E string
   F float32
}

Could you not refactor them to something like this: (Go Playground link)

This gets you compile-time safety at the cost of “clunkier” syntax. Even though it’s not as easy to work with as plain-ol’ structs and slices, the code communicates that “something weird is going on here where you cannot treat these data structures as simple structs and slices.”
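
The linked playground code is not reproduced in this thread, so the following is only a guess at the kind of refactor meant: the data sits behind an interface that has getters and no setters. The names `ReadOnlyA`, `dataA` and `NewA` are hypothetical.

```go
package main

import "fmt"

// ReadOnlyA exposes the fields of the original struct A through getters
// only, so callers holding this interface have no way to assign to them.
type ReadOnlyA interface {
	A() int8
	B() string
	C() float32
}

// dataA is the concrete representation. Its fields are unexported, so code
// in other packages cannot touch them even with a type assertion.
type dataA struct {
	a int8
	b string
	c float32
}

func (d *dataA) A() int8    { return d.a }
func (d *dataA) B() string  { return d.b }
func (d *dataA) C() float32 { return d.c }

// NewA builds the value once and hands back only the read-only view.
func NewA(a int8, b string, c float32) ReadOnlyA {
	return &dataA{a: a, b: b, c: c}
}

func main() {
	ro := NewA(1, "foo", 1.234567)
	fmt.Println(ro.A(), ro.B(), ro.C())
	// ro.a = 2 // does not compile: ReadOnlyA has no field a
}
```

The protection is purely compile-time. Reflection, `unsafe`, or code inside the defining package can still change the fields.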

The read-only memory map is interesting and could work as long as you never put any pointers into your structs that reference Go memory, which may or may not include strings. As far as I am aware, the Go garbage collector does not yet move objects and update pointers to the new locations, but that is a definite possibility in the future. If you put a pointer to some Go memory allocation into your read-only memory, Go might move the object being pointed to, and then either your pointer now points to uninitialized memory, or the Go garbage collector tries to update your pointer in the read-only memory and you get a segmentation fault even though none of your code attempted to write to the struct.

As long as you’re dealing with scalar values or manually building strings (and/or any other pointed-to types in your data structures) in the read-only memory, that might be fine. But without any compile-time safety checks, the only benefit I can see is that you get a segmentation fault and crash whenever you mutate memory, instead of that write being allowed and silently/undetectably corrupting the state of your program.

Zyl · September 1, 2021, 2:50pm

What kind of changes are you trying to protect the memory from? Normal usage of the language? Programming errors? Reflective access? Processes with debug privileges? Hardware failure? Cosmic rays? The FBI? Need more info here.

simonhf · September 7, 2021, 7:26pm

Thanks for the response @skillian!

But I don’t understand what you mean by that; what’s “arbitrary memory corruption?” If you have a hardware error in your memory or there’s some strong electromagnetic disturbance, that could potentially corrupt your memory regardless of the memory page safety.

In this case, I just meant that if some other buggy code (e.g. a C library wrapped as a Golang package) very infrequently and/or randomly writes a byte into memory, and that memory is read-only, then the corruption would get caught immediately.

Could you not refactor them to something like this: … This gets you compile-time safety at the cost of “clunkier” syntax.

Thanks for the suggestion!

Yeah, the problem is that the structures in reality are much larger and can be hierarchical. The current usage pattern is:

Many functions read and write to a larger struct until a certain point in time, AKA the construction phase.
At that point in time the larger structure becomes fixed / frozen / unchangeable.
After being frozen, the same functions should work as before to navigate the structures, and no more updating should happen.

It’s guaranteeing the last part “and no more updating should happen” which is the tricky bit. This is currently done in the code by taking a cryptographic hash of the serialized data structure. In theory the hash only has to be taken once and uniquely identifies the frozen structure. But in reality the hash gets calculated multiple times to ensure that the frozen data structure has not changed somehow, e.g. by buggy code or due to memory corruption or whatever.
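
As an illustration of the hash-and-recheck approach just described, here is a minimal sketch. The thread does not say which serialization or hash function is really used, so `encoding/json` and SHA-256 are assumptions, and `Record`, `fingerprint` and `unchanged` are made-up names.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Record stands in for the much larger, hierarchical structure.
type Record struct {
	ID   int
	Tags map[string]string
}

// fingerprint serializes v and hashes the resulting bytes.
// json.Marshal emits map keys in sorted order, so the bytes are stable.
func fingerprint(v any) ([32]byte, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return [32]byte{}, err
	}
	return sha256.Sum256(data), nil
}

// unchanged re-hashes v and compares it with the hash taken at freeze time.
func unchanged(v any, frozen [32]byte) (bool, error) {
	now, err := fingerprint(v)
	if err != nil {
		return false, err
	}
	return now == frozen, nil
}

func main() {
	r := Record{ID: 1, Tags: map[string]string{"k": "v"}}
	frozen, err := fingerprint(r) // taken once, when the structure is frozen
	if err != nil {
		panic(err)
	}
	ok, _ := unchanged(r, frozen)
	fmt.Println("before mutation:", ok)

	r.Tags["k"] = "corrupted" // simulate a stray write after freezing
	ok, _ = unchanged(r, frozen)
	fmt.Println("after mutation:", ok)
}
```

Every recheck costs a full serialization plus a hash of the whole structure, which is the overhead a read-only mapping would avoid.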

The read-only memory map is interesting and could work as long as you never put any pointers into your structs that reference Go memory, which may or may not include strings.

Thanks, that’s what I was thinking too. I have a hacky PoC which copies structs, slices, and strings into my own memory map. In this case there are pointers, but they point to memory I have created myself (not Golang) in the memory map. So the data structure in the memory map DOES contain pointers, but all of them point to things inside the memory map, and all those things have been created by me.
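
The PoC itself is not shown in the thread. As a rough sketch of the idea for strings only, a bump allocator can copy the bytes into a region it owns and return a string header that points at the copy. This assumes Go 1.20 or newer for `unsafe.String` and `unsafe.StringData`. The `arena` type and its methods are hypothetical, and a plain byte slice stands in for the memory map to keep the sketch portable.

```go
package main

import (
	"fmt"
	"unsafe"
)

// arena is a bump allocator over a caller-supplied byte region.
type arena struct {
	buf []byte
	off int
}

// copyString copies s into the arena and returns a string whose data
// pointer refers to the arena copy. It reports false if s does not fit.
func (a *arena) copyString(s string) (string, bool) {
	if len(s) == 0 {
		return "", true
	}
	if len(s) > len(a.buf)-a.off {
		return "", false
	}
	dst := a.buf[a.off : a.off+len(s)]
	copy(dst, s)
	a.off += len(s)
	return unsafe.String(&dst[0], len(dst)), true
}

// contains reports whether the bytes of s lie inside the arena.
func (a *arena) contains(s string) bool {
	if len(s) == 0 || len(a.buf) == 0 {
		return false
	}
	p := uintptr(unsafe.Pointer(unsafe.StringData(s)))
	base := uintptr(unsafe.Pointer(&a.buf[0]))
	return p >= base && p+uintptr(len(s)) <= base+uintptr(len(a.buf))
}

func main() {
	// In the thread's setting buf would be the mmap'd region instead.
	ar := &arena{buf: make([]byte, 4096)}
	orig := "foo"
	cp, ok := ar.copyString(orig)
	fmt.Println(cp, ok, ar.contains(cp), ar.contains(orig))
}
```

The bytes behind the returned string must not be modified afterwards, since Go assumes strings are immutable. Freezing the region after construction fits that requirement.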

This seems to work, BUT… I wanted to implement the same thing for Golang maps, and now the problems start. If the arbitrary struct contains a pointer to a map then I’d like to recreate the map in the memory map too. However, the multiple data structures for the map, and the logic to navigate them, are all hidden away in Golang’s own runtime package. Because those structs are not visible outside the runtime package, I currently have no way to recreate / copy a Golang managed map inside my own memory map.

The only idea I have so far – and likely difficult to implement – is as follows:

For a given version of Golang on a given platform, my package xyz:
Grab the runtime source files and copy them into my package xyz.
Automatically change the copied source files so that their package is xyz instead of runtime.
Assuming I can get it to compile, I now have a ‘copy’ of runtime built into my package.
That also means I can see all the hidden map internal data structures from my package.
In theory I can automatically change the map.go copy so that when it allocates its structures, they do not go on the heap or stack, but go in my memory map.

As a side functionality, the code will also be able to dump not only structures but also all the internal Golang memory associated with those structures. For example, packages like spew [1] already dump arbitrarily complex Golang data structures, but only in a relatively high-level format. Yes, spew dumps pointer values, but it doesn’t dump the memory and pointers associated with the internal Golang data structures, such as struct internal padding, slices, strings, and finally the more complicated map internal structs.

So my plan is:

Step 1: Use the copy runtime technique to get access to the internal map structs.
Step 2: Finish off my ‘dumper’ which dumps all memory associated with an arbitrary user struct.
Step 3: Use the runtime copy to recreate maps at an arbitrary memory location, and use the step 2 dumper to check that all memory associated with the cloned data structure is in the memory map of my choosing.

Lots of problems to solve here, but in theory, after I recreate an arbitrary map at a data location of my choosing (e.g. in my memory map), I will be able to access that map with the ‘real’ runtime map functions and it’ll just work. Writing to it will fail, either because the memory map is read-only or because the runtime tries to update internal structures that Golang does not manage.

What do you think? And is there an alternative way?

[1] GitHub - davecgh/go-spew: Implements a deep pretty printer for Go data structures to aid in debugging

simonhf · September 7, 2021, 7:28pm

@Zyl thanks for the response! For an answer, please see my response to @skillian.

simonhf · September 14, 2021, 4:16pm

Update: So I tried the copy runtime package idea, and changing the package name to effectively try and make a working copy of the runtime package where I can see all the private structs etc. There were about 500 files which was a lot of overkill just to get hold of the map structs. However, the copied package ended up not compiling. Why? I ran into this issue: [1]. Seems that the runtime package cannot compile on its own and needs a sort of Golang compile pre-script run first. I decided not to go further down this rabbit hole for the time being.

Instead I tried cherry picking the consts, structs, and funcs necessary from runtime to see the private map structs and iterate through them. This has proven more successful: I didn’t run into the pre-script issue, or need any cgo or assembler files, etc. The next step is to actually use the cherry picked and copied runtime source code.
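
The cherry-picked code is not shown in the thread. As a very small taste of what peeking at the private map structs involves, the sketch below reads only the first word of the runtime's map header, which the runtime sources document as the element count used by `len()`. This relies on unexported layout that can change between Go versions, and it assumes a 64-bit platform. `mapCount` is a hypothetical helper, not the author's code.

```go
package main

import (
	"fmt"
	"unsafe"
)

// mapCount returns the element count stored in the runtime's map header.
// ASSUMPTION: a map value is a single pointer to a header struct whose
// first word is the live element count. This is an internal detail of the
// runtime and is not covered by any compatibility promise.
func mapCount(m map[string]int) int {
	hdr := *(*unsafe.Pointer)(unsafe.Pointer(&m)) // pointer to the hidden header
	if hdr == nil {
		return 0 // nil map: there is no header at all
	}
	return *(*int)(hdr)
}

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3}
	fmt.Println("header count:", mapCount(m), "len:", len(m))

	delete(m, "a")
	fmt.Println("header count:", mapCount(m), "len:", len(m))
}
```

Walking the buckets or groups behind the header needs the full struct definitions for the specific Go version, which is where copying them out of the runtime sources comes in.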

[1] go - Why "undefined: StackGuardMultiplierDefault" error? - Stack Overflow