Why are goroutines better than traditional pthreads?

Ignoring differences in the default stack size/memory, why are goroutines any better than pthreads? Goroutines still result in high amounts of context switches. If I’m running a fasthttp server that does nothing but respond with a 200 status code, I’m seeing ~250,000 ctxsw/s (measured with vmstat) when I reach ~200,000 requests per second (8 cores, GOMAXPROCS=8). I see people talk about goroutine context switches being less expensive, but I haven’t seen any data on that - has that been proven to be true?

tl;dr - Memory aside (RAM is cheap), are goroutines better than concurrency models in other languages that map onto pthreads? Why?


Goroutines are cheaper to create and do not carry any OS overhead.


From the perspective of a programmer, the concept of goroutines is much easier to understand than the concept of threads. If you follow the advice

Do not communicate by sharing memory; instead, share memory by communicating.

you can avoid a whole class of problems that bite you with threads.


Goroutines are cheaper to create

That’s true

and do not carry any OS overhead.

But is that true? If I run a program with just two goroutines, one sending on a channel that the other receives from in an endless loop, I observe hundreds of thousands of OS context switches, each taking about two microseconds on average. That doesn’t differ much from the context switch times I measure between threads in Java.

I am aware that the general consensus is that Goroutines don’t carry much, if any, OS overhead, and I want to believe that, but that differs from the data I’m observing in my testing.

Context switches do not happen in the OS; they happen on the CPU.

You can’t do anything to avoid them entirely.

Goroutines know how to communicate with each other, and the scheduler knows it too, so it can optimise scheduling based on that knowledge.

If you were doing this via shared-memory polling between OS threads or processes, that knowledge would not be available: the communication would be opaque to the scheduler, which would have to give every thread a timeslice every now and then, only for the thread to hand it straight back (or, if programmed badly, busy-wait).


Goroutines know how to communicate with each other, and the scheduler knows it too, so it can optimise scheduling based on that knowledge.

To me, the appeal of goroutines seems to be convenience/memory, and not necessarily an improvement in performance, as @lutzhorn said. And I agree that - with channels and other paradigms offered with Go concurrency - goroutines are very intuitive and easy to grasp.

In my fasthttp vs Vert.x example, when I see (slightly) more context switches from fasthttp and net/http than from Vert.x/Firenio/Undertow/Wizzardo handling the same throughput and doing the same work, and when context switches are arguably the main point cited when comparing pthreads to goroutines, are goroutines really reducing system CPU time compared to pthreads?

As a disclaimer, I prefer goroutines significantly over traditional pthread-based concurrency models. They’re easy to understand and use, and at ~2 KB per goroutine (compared to a ~64 KB minimum per thread in Java), it’s hard to argue that goroutines aren’t objectively better than pthreads.

But from a performance perspective, ignoring RAM constraints, are goroutines better when internally they use similarly syscall-heavy paradigms to event-loop-based models?