Descriptive Problem Summary:
Unexplained high CPU and random stuttering lag under the chunked map send system with DD and DS. (This bug report may be distinct from the webclient stuttering bug report.)
This thread is for goonstation and /tg/station, along with any other BYOND game, to coordinate efforts in tracking down this issue.
Let's see if we can get some work done to nail down the exact version where it starts. I'd assume 510.1328, as that added the new chunking system, but it could be 1329, 1330, or 1331, as 1331 and 1332 are the only confirmed buggy versions and 1327 is the only confirmed working version.
After that, let's try to find things that make it worse or better (besides more or fewer clients) to help Lummox narrow down the code.
Things like trying to find a correlation with movable atoms moving around or new atoms being created, or hints that something is broadcasting updates to more clients than it should (for example, finding out how having everybody look at the same part of the map changes it compared to different parts). You get the idea.
ID:2057211
Mar 22 2016, 6:44 am
Resolved
If you wanted, you could load up a debug DS client and join Bagil, our low-pop server, and get an idea of what's going on, even if it just has debugging output for how many turfs it's getting updates for and whatnot.
I'm gonna keep it on 1332 for the time being; the stuttering is noticeable, but not noticeable enough to be an issue. Bagil: byond://game.tgstation13.org:2337 For comparison, Sybil runs on the same box (but in its own VM) and is on 510.1327: byond://game.tgstation13.org:1337 These are the live servers, so the rules apply, but if you observe as a ghost (a good idea too, as it has no movement throttling of any kind that could interfere), then the only applicable rule is not talking about the current round in out-of-character (OOC) chat.
Alright, so during the pre-game lobby, where all players are locked to viewing a certain part of the map (unless they opt out of the next round and enter observe mode as a ghost), if I ghost and move around, there isn't enough stutter to rule out that it's network related (i.e., a really low amount).
However, I noticed the stutter grow slightly when another player went to observe and started moving around, and reduce when I sent them back to the lobby.
I have something interesting to report on this thread.
On my test world I set up a scenario where every single tile has one obj on it, on a 100x100 map. There is noticeable stutter at predictable intervals as I move from one corner to another. I did some profiling, and tried to get numbers at an intra-function level in SendMaps() to explain the stuttering when a large number of objs is present.
The results I got surprised me: the routine that updates a client's personal chunk is not the problem--or at least, it's not a problem anymore as of some changes I made earlier today. I believe, therefore, that the issue is coming up when two internal Bag classes (these use red-black binary trees) that handle obj and mob changes are being updated with the new objs in the newly visible chunks. Whereas in the past the routines for movables looped through every tile, now they loop through every movable in the client's personal chunk (which spans multiple map chunks). I believe it's the deluge of new objs entering the tree all at once that's causing this problem.
As a result of this information, I think I know the way forward: a partial step backward. Namely, I think I need to change the movable-filling routines to traverse tiles again, and then to go through the personal chunk only to look at out-of-bounds objects. If this works as I think it will, then my next change would be to modify the map chunk system so movables are only recorded in chunks when out of bounds, which should improve efficiency in a few other places. So the result would be a hybrid of the old setup and the new.
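To make the hybrid idea above concrete, here is a rough sketch in Python. It is not BYOND's actual code: the class MapIndex, its fields, and the chunk size of 8 tiles are all made up for illustration. It only shows the shape of the approach, which is to walk the visible tiles for ordinary movables and to consult the chunks only for movables flagged as out of bounds.

```python
from collections import defaultdict


class MapIndex:
    """Hypothetical index, not BYOND internals.

    Every movable is recorded on its tile; only out-of-bounds movables
    are additionally recorded on the map chunk that contains them.
    """

    def __init__(self, chunk_size=8):  # chunk size is an assumed value
        self.chunk_size = chunk_size
        self.by_tile = defaultdict(list)
        self.oob_by_chunk = defaultdict(list)

    def add(self, name, x, y, out_of_bounds=False):
        self.by_tile[(x, y)].append(name)
        if out_of_bounds:
            chunk = (x // self.chunk_size, y // self.chunk_size)
            self.oob_by_chunk[chunk].append(name)

    def visible(self, x0, y0, x1, y1, personal_chunks):
        """Collect movables for a view rectangle plus nearby chunks."""
        found = set()
        # Step 1: traverse tiles in view, as the old routines did.
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                found.update(self.by_tile.get((x, y), ()))
        # Step 2: look at chunks only for out-of-bounds movables.
        for chunk in personal_chunks:
            found.update(self.oob_by_chunk.get(chunk, ()))
        return found


index = MapIndex()
index.add("crate", 2, 2)
index.add("big_sign", 9, 2, out_of_bounds=True)  # outside the view below
seen = index.visible(0, 0, 7, 7, personal_chunks=[(0, 0), (1, 0)])
```

In this toy setup the chunk lists stay short because most movables never enter them, which is the efficiency the post is hoping for.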
Lummox JR resolved issue with message:
Maps with a large number of objs could see mild stuttering as clients moved into range of new chunks.
In response to Somepotato
Somepotato wrote:
Will it be possible for us to get a beta build tonight?
I'm still kinda in the middle of some webclient stuff and not ready to release yet. Tomorrow for sure.
In response to Dunc
Dunc wrote:
Fuckin lummox you're the best you know that :+1:
In response to Dunc
Dunc wrote:
Fuckin lummox you're the best you know that
In response to Dunc
Dunc wrote:
Fuckin MrStonedOne you're the best too
Really? I ran tests with an obj on every tile and the problems caused by chunks coming into view completely vanished. The numbers showed marked improvement.
So I was trying to nail down a replicable pattern in 1332 for local testing, using multiple test clients and FPS rates SS13 should never run at, so I can make sure it's gone in 1333, and I have some interesting findings:
The server-side lag/stuttering seems to be gone, world.cpu is much lower, etc. However, there might be something still remaining that seems to be client side. It also seems that it might be verb related, as it was worse for me as an admin than for one of my guest connections who wasn't an admin. When I made my logged-in connection deadmin (losing a lot of set src = type in view() verbs), it got even worse until I reconnected, suggesting a similar issue to the right-click lag bug, where losing visibility of a verb causes additional overhead. It also affected the framerate of infinitely looped animations, furthering my theory that it's client side. Overall, however, I'm considering this a YUGE! success. I'll update the servers tonight and let you know once we've had enough time to tell.
Edit: The client-side stuttering seems to only affect mobs or clients (not sure yet) who have lost manually added verb(s), so the players won't see it and admins won't most of the time unless they deadmin to play, and they can use the .reconnect verb we added to the file menu to quickly reset it. It also goes away once they get those verbs back.
Edit2: Confirmed that the initial problem is gone; this other lag might need its own thread.
One thing I don't understand is how verb availability could be a big problem here, unless there's some really bad behavior on the part of the info control. But if there's any way you can build up a test case that I can look at, that could help. The right-click thing has always been a problem to replicate.
Yep, so there is definitely something odd going on here. I had to make a world that randomly generated objects (I used my test sight project):
https://tgstation13.org/msoshit/verb%20check.zip Just click "gimmie client verbs" to add 30 verbs to your client, and "they took r verbs" to remove them. Move around before and after and you will see the issue. Re-add the verbs and it goes back to normal. Reconnect after removing them and it goes back to normal. You can tweak the amount of spawned objects by editing turf's New() in sight check.dm to start at a lower prob() or decrease it more. The random pixel_x/y is just to make it easier to track your movements around, and I don't know that it's required to cause the issue. Ignore the command tab. You can even right-click before and after to see the right-click lag issue too. It works when hosted and connected to localhost.
Quick question: With the random pixel_x/y, if you add atom/appearance_flags = TILE_BOUND does the issue vanish?
In Ishuri's bug, all turfs are being marked as "interesting" because he used 96x96 icons and a -32,-32 offset, in a world with a 32x32 tile size. This was a bug, because that should be considered "in bounds"; the chunk system thought the turfs were starting at -64,-64 instead. But he had this same stutter any time he crossed over a boundary where new chunks became visible.
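For anyone following the numbers here, the arithmetic can be sketched in Python. The helper tile_span is made up for illustration and is not part of BYOND; it only works out which neighbouring tiles an icon covers along one axis. The exact rule the engine uses to decide "in bounds" is not stated in this thread, so the sketch only contrasts the real offset with the mistaken one.

```python
import math


def tile_span(icon_size, pixel_offset, tile_size=32):
    """Return (first, last) tile offsets covered along one axis,
    relative to the atom's own tile (0 means the atom's tile)."""
    first = math.floor(pixel_offset / tile_size)
    last = math.ceil((pixel_offset + icon_size) / tile_size) - 1
    return first, last


# 96x96 icon drawn at -32: one tile either side of the turf's own tile.
actual = tile_span(96, -32)    # (-1, 1)
# What the chunk system wrongly assumed: the icon starting at -64.
mistaken = tile_span(96, -64)  # (-2, 0)
```

With the real offset the icon is centred on its turf and reaches one tile in each direction; with the mistaken offset it appears to reach two tiles to one side.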
I believe the culprit is the routine that updates the client's personal chunk, in that apparently it's not behaving all that nicely when trying to merge two large sorted lists. This makes some degree of sense, because the list insertion is binary, but when merging two lists there are shortcuts that can be taken.
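The point about shortcuts when merging can be illustrated with a small Python sketch. This is not the engine's code; both function names are made up. The first function inserts each new item with a binary search, which is roughly what repeated binary insertion looks like; the second walks both sorted lists once, which is the shortcut available when both inputs are already sorted.

```python
import bisect


def merge_by_insertion(base, incoming):
    """Insert items one at a time using binary search.

    Each search is O(log n), but each insert may also shift elements,
    so large batches get expensive.
    """
    result = list(base)
    for item in incoming:
        bisect.insort(result, item)
    return result


def merge_linear(base, incoming):
    """Two-pointer merge of two already-sorted lists in one pass."""
    result = []
    i = j = 0
    while i < len(base) and j < len(incoming):
        if base[i] <= incoming[j]:
            result.append(base[i])
            i += 1
        else:
            result.append(incoming[j])
            j += 1
    # One list is exhausted; the remainder of the other is already sorted.
    result.extend(base[i:])
    result.extend(incoming[j:])
    return result


old_ids = [1, 4, 7, 10]
new_ids = [2, 4, 8, 11]
merged = merge_linear(old_ids, new_ids)
```

Both functions produce the same sorted result; the difference is only in how much work is done per incoming item when the incoming list is large.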