Descriptive Problem Summary:
So /vg/station was super nice and ported over their fancy lighting that is based off of goonstation's lighting (you've seen that before lummox, its the one with the rainbow color overlay) Here is the diff for that change: https://github.com/tgstation/tgstation/pull/23087/files
Cue crashes every 3 hours from a memory leak that survives world.Reboot()
So I did the fancy task of debugging it, and confirmed that if i just have it track everything normally lighting wise, but never actually assign the computed color matrix to the lighting object, the mem leak goes away.
So that should be simple to reproduce right? Wrong!
I tried just a thing to assign random numbers to color matrices in the same sort of setup, (including blendmode and plane master stuff) and it didn't leak
LUCKLY! I was able to port the entire lighting system to a test project along with a bunch of our helper frameworks (this involved a lot of blanking procs that relied on types that didn't exist and what not.
(I erred on the side of including shit rather then trying to figure out what i could safely not include, so don't assume that just because its checked it means that file is needed, i'm sure some of them aren't needed and some of them are only needed because of procs that aren't even used calling other procs that aren't even used)
https://tgstation13.org/msoshit/vglightleakfound.zip
Once you join the server with a client (this was only tested in DD<->DS setups, but might happen in a DS only setup) the game will spawn a bunch of light emitters and move them around randomly. world.sleep_offline should be enabled so you can just leave to pause it
sight_check.dm has the staging code
defines/lighting.dm has most of the lighting related defines
lighting/ has all of the lighting code. The actual lighting object that has the color matrix is in lighting_overlay.dm (its not a "overlay" in the byond sense)
You can comment out the assignment of the color matrix and the leak won't happen.
I did some due diligence attempting to verify this was not coder error (leaving something laying around), but once i noticed this was hanging around after the world was rebooted and once i noticed that commenting out the color matrix assignment stopped the leaking, I figured it was ensured that a byond bug is involved in some form or another.
Tested in 510 and 511.1375
Related /tg/ page: https://github.com/tgstation/tgstation/pull/24692
ID:2220974
Mar 5 2017, 12:05 pm (Edited on Mar 5 2017, 4:20 pm)
|
|||||||||||||
Resolved
| |||||||||||||
Thanks for your work on this. I'll take a look at the test project tomorrow and get to the bottom of it.
|
For clarity, where is the spot where the color matrix assignment happens that I should comment out to avoid the leak? Is it lighting_overlay.dm:81?
|
Odd, because I'm seeing the exact same memory leak if I comment out the code on lines 81-87.
I was able to confirm the code as-is produces the leak. That much is verified. But the leak is the same if I comment out that part. Or does the leak proper not happen till much later, and all I'm seeing is a build-up to a normal operating level? (If so it will be a lot harder to check.) What I see either way in the code right now is that DS starts gaining memory quickly early on, and then it seems to keep increasing steadily. |
I don't think thats a leak you are seeing.
the map is (i think) 128,128,2. the first minute of memory usage spiking you see is the turf and lighting datums filling up their changed var list as the light objects move around, and is normal. you could reduce the map size to confirm, after uploading that i discovered the same annoyance and did so, 75,75,1 works just as well. |
What I see either way in the code right now is that DS starts gaining memory quickly early on Just to clarify, The leak I'm reporting is in DD, i'm not sure if you are testing DS or just running it in standalone, but either way I think there is an ancillary smaller leak in DS (or just it hanging on to appearances) so it might be better to keep the two seperate to avoid conflating the two. |
Here is some entertainment for all the /tg/ bros f5ing on this thread right now:
|
It looks like the way the AppearanceColorMatrix class was handled was a little subpar, so I'm making some changes that should handle it better.
|
so, any hint why the memory leak doesn't happen if i replace lum_* values in lighting_corner.dm:111 with rand()?
#if LIGHTING_SOFT_THRESHOLD != 0 |
In response to MrStonedOne
|
|
Not directly, except that I suspect the problem had more to do with what happened to temporary AppearanceColorMatrix structs when there was a match than when there was no match.
I wasn't able to tease out exactly where the leak was happening, so much as a rough idea why. Testing has shown my fix solved the problem. |
Lummox JR resolved issue with message:
A memory leak was found for color matrices in certain cases. |
That also gives me an idea for a temp fix then, so basically we just need to avoid appearance matches?
+(rand(1,999)/100000) |
That would likely do the trick if I'm correct in my guess. Of course it means you won't be recycling appearances efficiently, so you'd want to nix that fix once the new build is out.
I am still seeing some memory buildup as appearances are being kept on the client side, although the client is doing its job of reporting unused appearances more frequently, so there should be a point (I'm not sure where) that it will hit equilibrium. |
color
to a list that contains the values (cache_r/g/b) calculated on the corner datums. It doesn't matter how they get into the list, or if you do further math on them, or even if you round() them, something is wrong with the numbers that breaks howcolor
matrices are handled. Numbers from rand(), however, are fine.While experimenting with minor tweaks, I changed this line in the corner code:
to this:
and the memory leakage got really bad. That change should increase how many color matrix entries are very nearly 1 instead of exactly 1, so there must be something wrong with the way certain floats are getting handled.
I should note that I also changed
sleep_offline
to be 0, and you don't even need a client connected for the memory to leak.