Help understanding call graph and optimizing code

I have been working on a server and it seems really inefficient. I ran a profiler, here’s the pdf it produced:


So it seems like what’s most expnsive is when I am writing to the websockets…but I don’t understand what “bufio.(*Writer).flush” or “syscall.Syscall” are.

Does anyone know what I can do to fix these functions taking up so much processing power? Or is writing to websockets just expensive in general?

Any help is appreciated!:slight_smile:

The profiler measures time, not CPU usage. Sometimes they are the same, if your code is doing computation, but in this case they are not as your code is waiting the network.

It’s also worth noting that profiling is not reliable on osx unless you are running el Capitan or later. Profiling in the other BSDs is also suspect.

Also, try running your benchnark for longer, several minutes at least as there are only 100 profiling event per second.

I saw that somewhere and updated to El Capitan! It wasn’t working at all before but seems to work fine now. Thanks for the heads up:) I’ll also run it a bit longer to see what’s up but it definitely seems like writing to a websocket is taking a long time.

Or whatever is reading the other end of the web socket is taking a long time.

Do you know if it’s possible to implement a non blocking write?

The net package already implements non blocking writes.

http://dave.cheney.net/2015/08/08/performance-without-the-event-loop

oh shoot my bad, but then why would it matter how long it takes for the other end to read?

The goroutine doing the write might be blocked if the other end is slow to process the data. The Go runtime will assign that goroutines’ thread to some other work if available, the profiler does not show this activity.

I just let it run for a longer period of time and it looks like both read and write are taking forever.


I’m not really sure how to fix that though, especially with the write because it’s not like the code is waiting for it to be done. It just calls write and then it should continue with the for loop

Maybe I’m looking at the wrong data here though. The issue I’m having is that the % of cpu usage goes up a ton as more players connect to the server…but this data could be completely unrelated right?

22% of the time is spent in syscall.Syscall waiting for a read operation to complete. The next biggest consumer is a lot of spinning inside the scheduler probably running a goroutine for a brief instant then having to go back to the scheduler for more work.

Let’s step back a bit. Why do think this application is slow ? It looks like it’s spending most of its time waiting for someone to send it data, possibly rather slowly or inefficiently.

There is a goroutine called getData that blocks until data is sent to it, could that be it? If so what would the alternative be to waiting until data is sent?

Can you please post your source. Thanks.

Yeah I will, sorry I didn’t earlier. It’s just a ton of code, I thought it would be annoying to post it
https://bitbucket.org/Diericx/mmoserver4.0/src

There is a goroutine called getData that blocks until data is sent to it, could that be it?

Yes

If so what would the alternative be to waiting until data is sent?

I guess it depends on what your program does. What are you trying to benchmark ?

What do you mean what am I trying to benchmark?

And it’s a server for a multiplayer game, so it’s really just relaying information to multiple WebSockets. The only issue is that I want it to be able to support a TON of players, and at this point it’s starting to lag on my macbook at around 10 people and a few enemies, which is pretty bad haha

The data you showed suggested your program is spending all its time waiting on network IO.

I suggest starting with some higher level metrics. How much CPU does top say your server program is using ? Turn on gc debugging, http://talks.godoc.org/github.com/davecheney/presentations/writing-high-performance-go.slide#25, is the output continious, or does it settle down after an initial spike – you could have a problem with allocations (although the profile suggests you dont).

Is this a client problem, does your client software need to get updates from all it’s peers before doing some action ? Does adding more clients cause the amount of traffic to grow exponentially ?

1 Like

I’ll make a little graph of the estimations of CPU usage. Sorry about the formatting, I couldn’t get an actual table to look nice:

(num of people, cpu %)
(0, 2%)
(1, 3.2%)
(2, 4.5%)
(3, 5.5%)
(4, 6.5%)
(5, 7.8%)
(6, 8.6%)
(6 with 1 Enemy, 30%)

So there is definitely an issue with enemies. When I was collecting data for the profiler I did make sure I spawned an enemy and it didn’t seem like any other function was being expensive.

I tried printing the garbage collection and here’s what I got…I’m not too sure what it means though:(

gc 1 @0.007s 4%: 0.085+0.67+0.18 ms clock, 0.17+1.0/1.1/0.48+0.36 ms cpu, 4->4->1 MB, 5 MB goal, 8 P gc 2 @0.010s 6%: 0.023+1.0+0.16 ms clock, 0.070+1.1/1.5/1.3+0.48 ms cpu, 4->4->1 MB, 5 MB goal, 8 P gc 3 @0.013s 8%: 0.040+0.91+0.16 ms clock, 0.12+1.2/1.6/1.6+0.50 ms cpu, 4->4->1 MB, 5 MB goal, 8 P gc 4 @0.016s 10%: 0.054+1.6+0.22 ms clock, 0.21+1.8/2.9/3.7+0.90 ms cpu, 4->4->2 MB, 5 MB goal, 8 P gc 5 @0.019s 11%: 0.022+0.98+0.15 ms clock, 0.090+1.4/1.8/2.3+0.62 ms cpu, 4->4->2 MB, 5 MB goal, 8 P gc 6 @0.021s 12%: 0.020+0.98+0.13 ms clock, 0.080+1.2/1.9/3.0+0.55 ms cpu, 5->5->3 MB, 6 MB goal, 8 P gc 7 @0.026s 12%: 0.050+1.3+0.16 ms clock, 0.20+1.7/2.4/2.8+0.65 ms cpu, 6->6->3 MB, 7 MB goal, 8 P gc 8 @0.029s 13%: 0.14+1.6+0.17 ms clock, 0.59+1.6/2.5/3.1+0.70 ms cpu, 7->7->3 MB, 8 MB goal, 8 P gc 9 @0.033s 13%: 0.020+1.2+0.20 ms clock, 0.083+1.4/2.3/3.3+0.82 ms cpu, 6->7->5 MB, 7 MB goal, 8 P gc 10 @0.036s 14%: 0.24+1.1+0.085 ms clock, 0.98+1.2/2.0/3.3+0.34 ms cpu, 9->9->6 MB, 10 MB goal, 8 P gc 11 @0.040s 14%: 0.034+1.3+0.091 ms clock, 0.13+1.4/2.2/2.9+0.36 ms cpu, 11->11->6 MB, 12 MB goal, 8 P gc 12 @0.042s 15%: 0.13+1.4+0.27 ms clock, 0.54+1.4/2.7/3.0+1.1 ms cpu, 13->13->8 MB, 14 MB goal, 8 P gc 13 @0.045s 15%: 0.72+1.2+0.16 ms clock, 2.8+1.5/2.2/3.2+0.65 ms cpu, 16->17->12 MB, 17 MB goal, 8 P gc 14 @0.048s 16%: 0.13+2.0+0.12 ms clock, 0.55+2.0/3.5/3.7+0.50 ms cpu, 24->25->17 MB, 25 MB goal, 8 P gc 15 @0.053s 16%: 0.28+1.3+0.30 ms clock, 1.1+1.8/2.2/2.2+1.2 ms cpu, 31->32->22 MB, 34 MB goal, 8 P gc 16 @0.058s 16%: 0.19+1.6+0.14 ms clock, 0.78+1.8/2.2/2.5+0.59 ms cpu, 44->44->31 MB, 45 MB goal, 8 P gc 17 @0.070s 14%: 0.033+0.95+0.24 ms clock, 0.13+2.9/1.8/0+0.97 ms cpu, 57->57->31 MB, 62 MB goal, 8 P gc 18 @0.096s 12%: 0.096+2.3+0.10 ms clock, 0.77+6.8/3.8/0.048+0.86 ms cpu, 60->61->31 MB, 62 MB goal, 8 P

As for the client receiving data, it will just wait for data to come in from the server and update it’s game state according to the data it receives.
var sock = new WebSocket(ip); //73.158.248.23 sock.onmessage = function(m) {Server.receiveData(m, game)}
For sending data, it is constantly sending data every 15 milliseconds.
window.setInterval(function(){ game.sendData(); }, 15);

Honestly, the packets that the server sends are sometimes pretty large (I think). I’ll put an example packet below:

{"Objects":[{"Id":"887ab535-4d16-4b7a-ba5f-85089e4fbbeb","Username":"","Parent":"","Child":"","Height":0,"Type":"Npc","Tag":"EyeEnemy","Health":100,"HealthCap":100,"Power":100,"MaxPower":0,"X":743.8494262695312,"Y":748.7734375,"Rot":0.19684016704559326,"Items":null},{"Id":"ffade326-dff9-4fd5-ae26-0477e75e3e05","Username":"","Parent":"","Child":"","Height":0,"Type":"Npc","Tag":"EyeEnemy","Health":100,"HealthCap":100,"Power":100,"MaxPower":0,"X":743.4473876953125,"Y":750.5331420898438,"Rot":-0.08118445426225662,"Items":null},{"Id":"33d44406-8709-4e89-a6e1-482a3da4a320","Username":"","Parent":"","Child":"","Height":0,"Type":"Npc","Tag":"EyeEnemy","Health":100,"HealthCap":100,"Power":100,"MaxPower":0,"X":760.0582885742188,"Y":763.01904296875,"Rot":-2.2285945415496826,"Items":null},{"Id":"3f97337e-5b43-40c1-a9d8-119a842ea002","Username":"","Parent":"","Child":"","Height":0,"Type":"Npc","Tag":"EyeEnemy","Health":100,"HealthCap":100,"Power":100,"MaxPower":0,"X":756.9351196289062,"Y":768.0557250976562,"Rot":-1.9375170469284058,"Items":null},{"Id":"ae562dff-f3bb-4a54-8aa5-49fb5e466d33","Username":"","Parent":"","Child":"","Height":0,"Type":"Npc","Tag":"EyeEnemy","Health":100,"HealthCap":100,"Power":100,"MaxPower":0,"X":755.141357421875,"Y":778.2151489257812,"Rot":-1.7510385513305664,"Items":null},{"Id":"1ecc1629-7369-4406-89f2-0aca9ad3886a","Username":"","Parent":"","Child":"","Height":0,"Type":"Npc","Tag":"EyeBoss","Health":500,"HealthCap":500,"Power":100,"MaxPower":0,"X":750,"Y":750,"Rot":0,"Items":null},{"Id":"f178503a-05b7-4e38-98da-9a0e5ad777aa","Username":"","Parent":"","Child":"","Height":0,"Type":"Npc","Tag":"EyeEnemy","Health":100,"HealthCap":100,"Power":100,"MaxPower":0,"X":767.244140625,"Y":775.5899047851562,"Rot":-2.1637659072875977,"Items":null},{"Id":"916b895d-0b42-4fa8-ad05-cc4d98370c2a","Username":"","Parent":"","Child":"","Height":0,"Type":"Npc","Tag":"DickEnemy","Health":500,"HealthCap":500,"Power":100,"MaxPower":0,"X":-700,"Y":-700,"Rot":0,"Items":null},{"Id":"d1220ac7-ea44-4d2b-9954-003574af0642","Username":"","Parent":"","Child":"","Height":0,"Type":"GameObject","Tag":"Laser MKII","Health":100,"HealthCap":100,"Power":100,"MaxPower":0,"X":-50,"Y":-50,"Rot":0,"Items":null}],"Cp":{"Id":"7629ab7d-8cab-4c96-ad4a-5d1116f8fc46","Username":"Johnsy","Parent":"","Child":"","Height":0,"Score":0,"Health":100,"HealthCap":100,"Power":200,"MaxPower":200,"X":0,"Y":0,"Rot":-0.05031720548868179,"MouseAngle":0,"Items":["","","","","","",""],"Targets":[{"Distance":70,"Angle":-2.356194490192345}]},"Leaderboard":[{"Username":"Johnsy","Score":0},{"Username":"","Score":0},{"Username":"","Score":0},{"Username":"","Score":0},{"Username":"","Score":0}]}

It’s around 2500 bytes. It seems large to me but maybe I’m just underestimating what I’m able to send over sockets?

gc stats look fine, cpu stats look fine.

Honestly, I dont sending data in json over websockets every 15 milliseconds is going to fly. Your profile shows your program spends all of it’s time waiting for data from the network, and the client probably spends most of its time generating that data and pushing it up to the server.

JSON data is very expensive to parse and generates a lot of temporary objects which will put pressure on the garbage collector.

I think if you want this to be performant you need to move to a binary encoding and probably UDP, or at least raw TCP for your client <> server communications.

Yeah that makes sense, I was using MessagePack a while back but used Json so that debugging (looking at the packets) would be easier. I’ll switch back to MessagePack and send it over as a byte array. Does that sound good?

I’m using WebSockets because I really wanted this game to be browser based and JavaScript only supports WebSockets :frowning: Do you think it would make that much of a difference in the server’s performance to switch to TCP/UDP? Because as it is the messages are being sent fast enough and there isn’t much lag it’s just that the server is being very expensive.

It must be possible to use WebSockets because a game called Realm of the Mad God was created using flash and it supports up to 50 people in the same area without too much lag.

Is there any way to view CPU usage instead of time in the profiler?