Zero-Copy Communication

Hello there,

We have multiple Docker containers running isolated Go binaries, and currently they communicate with each other over ZMQ IPC (most likely over UNIX domain sockets). The throughput is good (we reach several Gbit/s), but this is still not a zero-copy approach and involves quite a bit of overhead.

My question is: how to make those isolated Go services communicate in a zero-copy way to avoid any overhead?

There was an idea to send the signaling over the domain sockets and the data through a shared memory region, but somehow I was not able to find good examples of shared memory usage in Go. Maybe there is a more elegant solution we don’t yet know about?

I think the elegant solution here would be to run your “isolated go binaries” in a single process and communicate with function calls and/or channels.

Assuming good separation between ZMQ and the core of your programs, you could replace ZMQ (perhaps with a build flag) with an IPC layer that calls the other program cores directly.

Since your current deployment does not strictly require the system to be distributed (the “isolated go binaries” communicate over unix sockets, so they all run on a single machine and are not actually distributed, though they could be), a parallel system would be sufficient. Given an IPC layer compatible with the ZMQ interface, this deployment option could be selected at compile time.

Unfortunately this is not possible: we are talking about an ecosystem of individual binaries/services using one standard communication protocol (RPC, basically), which are isolated inside their Docker containers for security reasons.

Yes, those containers run on a single machine, but they must remain separate. I thought of shared memory being an option because it basically gives two separate binaries access to a shared region of memory they can use directly, without buffer copying, but I still don’t know how to actually implement it.

In my opinion, any shared memory solution would be fundamentally incompatible with isolation for security reasons. If they share memory, there is no isolation. Communicating by sharing memory is also contrary to recommended Go design. In other words, this sounds like a bad idea to me. :slight_smile:

I’ll try to illustrate the whole situation in a little more detail:

We keep different services separated in their own Docker containers for the following reasons:

  • Services generate data.
  • Services have no access to data of other services by default.
  • Services are developed by third parties.
  • Services are installed by the user as plugins so containerization is an ideal solution here.
  • There are preinstalled root services and installable user services.

Services communicate with each other over a well-defined IPC protocol (a standardized proprietary RPC). User services aren’t allowed to talk to either user or root services directly, so they have to pass their messages through our IPC RPC proxy, which verifies each message and either passes it on to its destination service or discards it.

The problem is just that the current ZMQ IPC implementation is far from ideal in terms of latency and throughput. Unix domain sockets pass those messages through the kernel, transferring gigabytes of information through kernel buffers and copying it over and over again, causing unnecessary system load and latency. That’s why I’m looking for a zero-copy solution here, to improve performance and lower system load by an order of magnitude.

That is not true, for the following reasons:

  • Processes share only certain, isolated regions of memory, not their entire address space (that wouldn’t be possible anyway). Processes write only selected, shared information into those regions; all other information maintained by a process is not exposed in any way.
  • Shared memory instances are backed by files (at least on Linux), and files have regular permission sets: if process A provides certain information to process B via shared memory, processes C and D won’t be able to read it because of file permissions.

Please correct me if I’m wrong, but as far as I know there aren’t any security issues with correctly applied shared memory.

Even though “communicating by sharing memory” is indeed contrary to the recommended Go design, you should take into account that we are talking about IPC (inter-process communication), not about the scope of a single binary, where those recommendations would actually apply. Obviously we can’t use channels for IPC. So again: those design recommendations don’t apply to IPC; they apply to writing Go programs.

So far the question still remains unanswered.

This may be an “agree to disagree” situation. Obviously I understand that shared memory does not mean sharing the entire process address space. However I’m still skeptical towards reading memory under the control of an untrusted process. With an IPC protocol, messages can be verified to conform to the expected structure and limits, and then will not change under your feet. Working directly out of memory which is controlled by an attacker and can change at any time (no copies, right?) sounds like a security nightmare to me.

As you yourself noted, you can mmap a file. I would expect mmapping a file in a volume shared by multiple containers to work?


Totally agree on that, but couldn’t we make the sender somehow give up its rights on the shared memory object before we consider the message sent? In that case, service A creates a message M (a structured object in shared memory), gives up its rights on M to the intermediate proxy P (a root service), and asks it to forward the message to service B. The proxy verifies that M is accessible only to P and passes the rights on to the receiver B.

In this case we create the message only once and only alter the access permissions, avoiding lots of buffer copies, and even more of them when a message is forwarded multiple times.

Would such an architecture be possible to implement, and if yes, then how?

As far as I know, memory-mapped files are basically the contents of on-disk files mapped into RAM so you can work with them more efficiently. I didn’t really understand how you’d apply this to the current problem; can you explain your suggestion in a little more detail, please?

You can memory-map a file from a tmpfs such as /dev/shm or /run: use unix.Mmap to get a []byte of shared memory, then cast the slice to your own data structures. I have done this to inter-operate with legacy applications. You still need some way to synchronize access to the shared memory to avoid reading partial writes.

Sounds like it could work, but I’m still concerned about the security issue Jakob mentioned when the contents of the message change after the message was sent.

Can the sender irreversibly give away its rights on the memory-mapped file to another service? If yes, how could we do that?

I think normal file permissions should be good enough. If you have attackers such that you can’t trust your file system, then there is very little you can trust. If you have bugs, then you have bugs, and more code means more bugs. There is no point in merely emulating the security of another communication mode. Granular security means more overhead; that is a trade-off you have to make.

That said, you should build an abstraction layer over your shared memory so your user code does not manipulate the bytes directly; perhaps that layer can assert a process-id ownership check that defends against certain concurrency bugs. This is not foolproof and does not prevent all attacks.

There may be a granular file permission system that you can use, or you could change the ownership of the shared memory file on the fly, but these are outside my area of expertise.

A zero-copy API has to hand out a byte slice to be written into, and I know of no easy way to forcefully take that away.
If you really insist on taking that byte slice away after filling it, you could use syscall.Mprotect:

Get(n int) ([]byte, error) // returns a byte slice of at least n*4096 bytes (slices must be aligned to page boundaries)
Put([]byte) // gives the byte slice back; this calls mprotect on the slice, so no further modifications are allowed

This will slow things down a little bit, as those mprotect calls are syscalls.
The same can be repeated on the reader side:

Next() ([]byte, error) // returns the next block of the message; it is mprotect-ed to be read-only, and this call panics if the previous slice hasn’t been Released yet
Release([]byte) // releases the slice; this mprotects the slice to be inaccessible (if the slice is not Released, the next Next call panics)

Seems a little bit awkward…
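For concreteness, a single-process sketch of the writer-side Get/Put idea using syscall.Mmap and syscall.Mprotect (Unix-only; the pool type and its bump-allocator layout are illustrative, not part of any proposed API):

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// pool hands out page-aligned slices from one anonymous mapping.
// Real IPC would map a tmpfs-backed file instead of MAP_ANON memory.
type pool struct {
	mem []byte
	off int // next free byte, always a multiple of the page size
}

func newPool(pages int) (*pool, error) {
	mem, err := syscall.Mmap(-1, 0, pages*os.Getpagesize(),
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, err
	}
	return &pool{mem: mem}, nil
}

// Get returns a page-aligned slice of at least n bytes, rounded up to
// whole pages so that Mprotect can operate on it.
func (p *pool) Get(n int) ([]byte, error) {
	ps := os.Getpagesize()
	size := (n + ps - 1) / ps * ps
	if p.off+size > len(p.mem) {
		return nil, fmt.Errorf("pool exhausted")
	}
	b := p.mem[p.off : p.off+size]
	p.off += size
	return b, nil
}

// Put revokes write access: after this, any write to b faults, while
// reads (e.g. by the consuming side) keep working.
func (p *pool) Put(b []byte) error {
	return syscall.Mprotect(b, syscall.PROT_READ)
}

func main() {
	p, err := newPool(4)
	if err != nil {
		panic(err)
	}
	b, _ := p.Get(100)
	copy(b, []byte("msg"))
	if err := p.Put(b); err != nil {
		panic(err)
	}
	fmt.Println(string(b[:3])) // msg; writing to b now would fault
}
```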
