I have spent the day implementing an efficient PubSub system. I had implemented one before using channels, and I wanted to benchmark that against sync.Cond. I put together a quick and dirty test as a gist. My confusion starts when I change GOMAXPROCS to see how it would perform on my age-old Raspberry Pi. Here are the results:
mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=8 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
BenchmarkPubSubPrimitiveChannelsMultiple-8 10000 165419 ns/op 92 B/op 2 allocs/op
BenchmarkPubSubWaitGroupMultiple-8 10000 204685 ns/op 53 B/op 2 allocs/op
PASS
ok sibte.so/rascore 3.749s
mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=4 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
BenchmarkPubSubPrimitiveChannelsMultiple-4 20000 101704 ns/op 60 B/op 2 allocs/op
BenchmarkPubSubWaitGroupMultiple-4 10000 204039 ns/op 52 B/op 2 allocs/op
PASS
ok sibte.so/rascore 5.087s
mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=2 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
BenchmarkPubSubPrimitiveChannelsMultiple-2 30000 51255 ns/op 54 B/op 2 allocs/op
BenchmarkPubSubWaitGroupMultiple-2 20000 60871 ns/op 43 B/op 2 allocs/op
PASS
ok sibte.so/rascore 4.022s
mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=1 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
BenchmarkPubSubPrimitiveChannelsMultiple 20000 79534 ns/op 61 B/op 2 allocs/op
BenchmarkPubSubWaitGroupMultiple 100000 19066 ns/op 40 B/op 2 allocs/op
PASS
ok sibte.so/rascore 4.502s
I ran this multiple times and the results are consistent. I am using Go 1.8 on Linux x64 with 8GB RAM. I have multiple questions:
Why do channels perform worse than sync.Cond in the single-core results? Context switching is the same; if anything, it should perform worse.
As I increase the max procs, the sync.Cond throughput goes down, which might be explainable, but what is up with channels? The iteration count goes from 20k to 30k to 20k to 10k. I have an i5 with 4 cores, so it should have peaked at 4 procs (P.S. I tried 3 as well and it is consistent).
I still suspect that I am making some kind of mistake in my code. Any ideas?
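For readers without the gist at hand, here is a minimal sketch of the two styles being compared. It is not the benchmarked code itself: the names (chanPubSub, condPubSub, runChan, runCond) are hypothetical, and it assumes a single publisher, a single message, and a buffer of one per subscriber.

```go
package main

import (
	"fmt"
	"sync"
)

// chanPubSub (hypothetical) fans a message out over one channel per subscriber.
type chanPubSub struct {
	mu   sync.RWMutex
	subs []chan int
}

func (p *chanPubSub) Subscribe() <-chan int {
	ch := make(chan int, 1) // assumed buffer of one
	p.mu.Lock()
	p.subs = append(p.subs, ch)
	p.mu.Unlock()
	return ch
}

func (p *chanPubSub) Publish(v int) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	for _, ch := range p.subs {
		ch <- v // one channel send per subscriber
	}
}

// condPubSub (hypothetical) stores the latest message and wakes all waiters at once.
type condPubSub struct {
	mu   sync.Mutex
	cond *sync.Cond
	seq  int // incremented on every publish
	val  int
}

func newCondPubSub() *condPubSub {
	p := &condPubSub{}
	p.cond = sync.NewCond(&p.mu)
	return p
}

func (p *condPubSub) Publish(v int) {
	p.mu.Lock()
	p.val = v
	p.seq++
	p.mu.Unlock()
	p.cond.Broadcast() // a single call wakes every waiting subscriber
}

// Wait blocks until a message newer than lastSeq has been published.
func (p *condPubSub) Wait(lastSeq int) (val, seq int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for p.seq == lastSeq { // loop guards against missed or spurious wakeups
		p.cond.Wait()
	}
	return p.val, p.seq
}

// runChan publishes v to n channel subscribers and sums what they receive.
func runChan(n, v int) int {
	p := &chanPubSub{}
	chans := make([]<-chan int, n)
	for i := range chans {
		chans[i] = p.Subscribe()
	}
	p.Publish(v)
	sum := 0
	for _, ch := range chans {
		sum += <-ch
	}
	return sum
}

// runCond publishes v to n sync.Cond waiters and sums what they receive.
func runCond(n, v int) int {
	p := newCondPubSub()
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i], _ = p.Wait(0)
		}(i)
	}
	p.Publish(v)
	wg.Wait()
	sum := 0
	for _, r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println("channels:", runChan(4, 7))
	fmt.Println("cond:", runCond(4, 7))
}
```

The structural difference this is meant to show: the channel version does one send per subscriber on every publish, while the sync.Cond version does one Broadcast and lets each subscriber read the shared value under the mutex.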