Yeah, I have a lot of work ahead of me optimization-wise, but I didn't think I had to do it immediately.
What I was saying before was this same game ran flawlessly in the past with more players than it's seeing now.
The server only went live again a few days ago and immediately this world.cpu issue is a problem. Of course I figured it was something I did, but these past few days' worth of investigation, and conversations I've had with two people who've encountered the same problem, lead me to believe it's not the game. Both of those individuals were only able to tame the problem by downgrading the BYOND version they used.
Another point I was making: I've watched BYOND grow for a long time now, but with that progress, issues that were seemingly non-existent or hardly noticeable for the average client in older versions become exaggerated as newer builds are released. Those who have reported these issues -- Zhxi being a notable mention -- never saw any fruit because said issues are hardly reproducible (since you have to be playing an active game), so nothing is done.
This is why I'm going with the desperate attempt of downgrading. I don't have much choice at this point, as I've been constantly rebooting for the last few days to calm the issue temporarily, but that is frustrating for me and players.
Anyway, you can check out a profile here: https://pastebin.com/mPUFUUe5
Even though I know these things need optimization, I'm 99% sure they are not the direct cause of the problem.
Note: The profile above is even after most code was changed to use garbage collection. That's over a span of about 2 hours (which is the maximum amount of time needed before CPU begins overflowing).
In response to Lummox JR
|
|
What I'm saying is, a single profile isn't enough to help. You need multiple profiles, and to keep track of roughly how many players are on when each one is made. Get maybe a minute's worth of sample time to see what's running, and that should make for good comparisons.
Armed with that data, it should be possible not only to identify problem areas in your own code, but also perhaps to see if maybe anything in BYOND could be exacerbating matters on the backend. The latter is tricky because not everything that might happen on the backend would show up in the profile, but it's possible there could be an echo of something useful. Overall though, I really don't see how it's possible that server performance would have downgraded at all in 512. If anything you should be seeing an improvement. The only things that changed appreciably under the hood would result in slight speedups, but nothing should be slowing down. |
Well I'll report back tomorrow with those profiles and whether or not downgrading has a positive impact in the long-run.
|
Here are profiles over the span of about a minute of activity:
50 players: https://pastebin.com/MEj2yngs
68 players: https://pastebin.com/KAJnGQ4g
72 players: https://pastebin.com/N2gYDpsB
75 players: https://pastebin.com/bKSe6cAJ
The last two profiles had a turf war going on, with at least 10 vs. 10 battling on a map. The code for that is in war.dm.
I have not downgraded yet. I figured before I try that, I'll try changing the map_format back to tiled. One of the conversations I had indicated that the problem was associated with map rendering. I'm already seeing improvements on a local server: CPU no longer jumps around between 0-2 with just me and AI active; it sits at a firm 0. We'll see tomorrow though. |
if you are in DD's admin list (world.SetConfig("APP/admin", ckey, "role=admin")) you can type byond://?debug=status into the input/command/verb bar and this will list all active connections and (more importantly) print out the sleep/spawn queue
edit, also: id:2197174 |
In response to FKI
|
|
If you use a lot of big icons, it's conceivable that was having some impact with the visual bounds code. However that really shouldn't come up a lot unless those big icons are getting updated a lot, which for turfs at least shouldn't be the case. Also I don't think that would be likely to get appreciably worse with more players, unless most of the big objects are created through their activity.
|
In response to Lummox JR
|
|
Yeah, there are a good amount of skills larger than world.icon_size, which are used frequently during gameplay.
|
In response to FKI
|
|
Here's something I noticed: If you're not using tiled mode, then it looks like all of the code having to do with the Grabbed var is useless. If you were to get rid of that, or wrap it up in an #ifdef or #ifndef block, you could get rid of atom/movable/Del() with it and save a crapload of time.
And in fact, your code isn't calling Expand() anywhere. The only place it could get called is commented out. Right now your code is overriding atom/movable/Del() for literally no reason, and that's the biggest regular CPU user in all of your profiles. |
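To show what wrapping the Grabbed code in a preprocessor block might look like, here is a minimal sketch. USE_GRABBED is a made-up flag name and the body of Del() is elided, since the real cleanup code isn't quoted in this thread; treat it as an outline under those assumptions, not the actual source.

```dm
// USE_GRABBED is a hypothetical flag name. Uncomment the define only in
// builds that still need the Grabbed bookkeeping.
// #define USE_GRABBED

#ifdef USE_GRABBED
atom/movable
	var/Grabbed

	Del()
		// ... the existing Grabbed cleanup would stay here ...
		..()
#endif
```

With the define left commented out, the compiler never sees the Del() override at all, so deletion falls through to the built-in behavior. Any other code that reads or writes Grabbed would need the same #ifdef around it, or it won't compile.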
In response to FKI
|
|
It would take a lot of convincing for me to believe the visible bounds code could be impacting your server performance in a serious way. Certainly a downgrade to 511 would not solve that, either. Multi-tiled would only increase the number of objs on the screen, and doesn't really get around the need to use visual bounds if you have a high pixel offset anyway.
|
So I found something that could be the cause thanks to that tip from MrStonedOne. *crosses fingers*
Also I'll remove that extra Del() override. |
In response to FKI
|
|
FKI wrote:
So I found something that could be the cause thanks to that tip from MrStonedOne. *crosses fingers*
What was the thing you found? |
likely something sleeping in an endless loop using cpu. profiles don't show procs until they finish (unless the profile was started before the proc), so a proc that sleeps in an endless loop would never finish and would never show up on a profile.
|
@Lummox JR: There was an issue with NPCs continuously calling npcregeneration() and npcDie(). That's fixed now, but there is no change in server performance.
How much is a lot for the "Run Time" metric using the command MrStonedOne mentioned? There aren't a lot of pending procs, but the Run Time is above 5000s. |
1) Statbar_Refresh() is an ongoing loop instead of only updating on demand, which you have a todo note for. Ideally, what you want is something you can call on demand that also has a bit of "debounce" in it so multiple calls on the same tick won't have it update multiple times.
2) You also have a very, very weird piece of code in Statbar_Refresh() that ensures even logging-out clients won't stop this proc from running:
That seems like really bad juju to me. While I don't think this is contributing anything to your woes, because it should never have an impact unless a player is logged in only a very short time, and even then it would only result in a useless proc called every 5 seconds to do nothing, it has no business being there.
3) Speaking of debounce, checkwinset() would benefit from that.
4) Also not a real factor, but in regeneration.dm you have a couple spawn() statements right next to each other with the exact same timing. The statements really should be part of the same block, so you're only resuming a spawned proc once instead of twice.
5) Quite a few routines do a sleep(N) before an empty spawn(). Effect on performance is minimal, but don't do that; do spawn(N) instead. You're just adding to proc call overhead for no reason.
6) DoCooldown() in Skill.dm looks like it's being done for all skills, and you have a todo note about improving performance by switching to another system. Seems like a good idea to do so.
7) WorldLoop_Status() is calling itself, without using spawn(). The set waitfor=0 line probably removes the possibility of infinite recursion, but for sanity's sake you should always, always use a spawn in a case like this.
8) You have like a million world/New() overrides defined, and it looks like Login() has something similar. This is not a good idea. Put these procs together under a single proc instead of relying on ..() to chain everything.
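For points 4, 5 and 7, the spawn patterns could look roughly like this. It's only a sketch: StatusLoop(), RegenTick(), DelayedRegen(), the two Restore procs and every delay value are placeholders made up for illustration, not names or numbers from the actual source.

```dm
// All names and delays below are hypothetical placeholders.
mob/proc/RestoreHealth()
	return

mob/proc/RestoreEnergy()
	return

// Point 4: two spawns with identical timing merged into one block,
// so only one spawned proc is resumed instead of two.
mob/proc/RegenTick()
	spawn(10)
		RestoreHealth()
		RestoreEnergy()

// Point 5: put the delay on the spawn itself instead of sleep(N)
// followed by an empty spawn(). Unlike sleep(N), this does not hold up
// the caller, so it assumes nothing after the spawn relies on the delay.
mob/proc/DelayedRegen()
	spawn(10) RegenTick()

// Point 7: a repeating loop that re-queues itself through spawn()
// rather than calling itself directly, so the call stack never grows.
proc/StatusLoop()
	set waitfor = 0
	// ... per-iteration work goes here ...
	spawn(50) StatusLoop()
```

One caveat on the merged block in RegenTick(): if the first call sleeps, the second one waits for it, which two separate spawns would not do. If that ordering matters, keep them separate.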
Now most of this is nitpicky stuff and I basically gave up after a while because I was getting into the weeds; there might be a nasty loop in here somewhere that isn't showing itself, or at least isn't showing up within my attention span. A lot of the above is just issues of form, although a couple of those things--debouncing some procs and replacing the skill cooldown system--will probably have a bigger impact overall.
However I think it would be a really good idea for you to get detailed profiling results for the cases when you have 60 players, 90, etc., and especially when the CPU starts to run away on you. You said the profiler didn't show anything suspect, but I don't know if you've done that kind of comparison yet--and even if you have, extra eyes on the problem would not hurt. If nothing else it would be good to have a handle on which procs are using up those CPU cycles and see if those particular procs can be improved upon.
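As for the debounce mentioned in points 1 and 3, a rough sketch of the idea follows. It assumes the refresh lives on mob; statbar_queued, QueueStatbarRefresh() and RefreshStatbarNow() are invented names, with RefreshStatbarNow() standing in for a one-shot version of Statbar_Refresh() that has no loop of its own.

```dm
// Hypothetical names throughout; none of these come from the actual source.
mob
	// Set while a refresh is already queued, so repeat requests are ignored.
	var/tmp/statbar_queued = 0

	proc/QueueStatbarRefresh()
		// A refresh is already pending; additional calls collapse into it.
		if(statbar_queued) return
		statbar_queued = 1
		spawn(0)
			statbar_queued = 0
			// The player may have logged out before the spawned code runs.
			if(client) RefreshStatbarNow()

	// Stand-in for the one-shot update body (no loop, no sleep).
	proc/RefreshStatbarNow()
		return
```

Anything that changes a displayed stat would call QueueStatbarRefresh() instead of refreshing directly. However many times it gets called before the spawned code runs, the update itself only happens once, and nothing runs at all while no stats are changing.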