I was hoping we could get some pointers or suggestions for solving a performance problem we have been having.
We have a server that starts a goroutine to handle every request. Each request handler (goroutine) issues another request over a UDP socket to an 'upstream' server on the local host, which is usually very fast. The read on the UDP socket seems to take much longer than expected: for example, the upstream server takes 300 microseconds to return a response, but the read on the socket takes as much as 10 milliseconds. After looking at pprof output and ruling out several other causes, we suspect the delay is the time it takes for the goroutine to be scheduled again after it blocks on the read.
Here is the output from 'GODEBUG=scheddetail=1,schedtrace=1000':
P0: status=1 schedtick=85975 syscalltick=205078 m=15 runqsize=16 gfreecnt=42
P1: status=1 schedtick=88090 syscalltick=195469 m=11 runqsize=0 gfreecnt=55
P2: status=1 schedtick=78348 syscalltick=193827 m=8 runqsize=0 gfreecnt=49
P3: status=2 schedtick=81384 syscalltick=197765 m=-1 runqsize=0 gfreecnt=38
P4: status=2 schedtick=77978 syscalltick=199689 m=-1 runqsize=39 gfreecnt=11
P5: status=1 schedtick=85777 syscalltick=201910 m=4 runqsize=0 gfreecnt=58
P6: status=1 schedtick=100552 syscalltick=201247 m=2 runqsize=1 gfreecnt=56
P7: status=1 schedtick=90465 syscalltick=201659 m=16 runqsize=0 gfreecnt=48
I am curious about the imbalance in runqsize between the different processors. Could this be causing the problem? If so, how can we fix or avoid it?
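To put a number on the imbalance across successive trace dumps, a small sketch like the following could pull runqsize out of each P line. The function name runqSizes is hypothetical, and the regular expression assumes only the line shape shown in the output above, not any documented format.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// pLine matches lines such as "P4: status=2 ... runqsize=39 gfreecnt=11".
// The pattern is an assumption based on the trace lines quoted in this post.
var pLine = regexp.MustCompile(`^\s*P(\d+): .*\brunqsize=(\d+)`)

// runqSizes returns a map from P id to its local run queue length.
// Lines that do not look like P lines are skipped.
func runqSizes(trace string) map[int]int {
	sizes := make(map[int]int)
	sc := bufio.NewScanner(strings.NewReader(trace))
	for sc.Scan() {
		m := pLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		p, _ := strconv.Atoi(m[1]) // digits only, so Atoi cannot fail on syntax
		q, _ := strconv.Atoi(m[2])
		sizes[p] = q
	}
	return sizes
}

func main() {
	// Three lines copied from the trace above.
	trace := `P0: status=1 schedtick=85975 syscalltick=205078 m=15 runqsize=16 gfreecnt=42
P1: status=1 schedtick=88090 syscalltick=195469 m=11 runqsize=0 gfreecnt=55
P4: status=2 schedtick=77978 syscalltick=199689 m=-1 runqsize=39 gfreecnt=11`

	sizes := runqSizes(trace)
	total, max := 0, 0
	for _, q := range sizes {
		total += q
		if q > max {
			max = q
		}
	}
	fmt.Printf("Ps=%d total queued=%d longest queue=%d\n", len(sizes), total, max)
}
```

Comparing the longest queue against the total over several dumps would show whether the same Ps stay overloaded or whether the imbalance is transient.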