Hey there, I am the current host and system admin for tgstation13, one of the more popular SS13 codebases. We have had this issue for a while, since about late last year (2014).
Basically, a random for loop will runtime with "out of resources", followed by what seems like every global list (global scope, i.e. world level, not global vars as in static vars attached to a class object) triggering a "bad list" runtime on access. Rebooting the world, either via code (admins have a verb to do it) or the world->reboot menu option, will trigger world/Reboot(), but otherwise fails to actually restart the world. (Hitting Stop in DD works fine.)
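(For clarity, "via code" just means an admin verb that ends up calling world.Reboot(); the real verb has permission checks and logging, but it is roughly this shape:)

// Rough sketch of the admin reboot verb; names here are illustrative,
// the point is only that it goes through world.Reboot().
/client/verb/admin_reboot()
    set name = "Reboot World"
    set category = "Admin"
    world.Reboot()    // same world/Reboot() path as the Dream Daemon menu option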
The server has 64GB of memory, and the hypervisor VM that BYOND runs in has exclusive access to 8GB of it. When this triggers, DD is not using any abnormal amount of memory, and is still at its average of about 700MB. (We have also had this trigger while using only 600MB on the low-pop server.)
We are currently running 507.1284, but this error has been seen on 1282 and I think 1279 (I can't remember exactly what version we ran before I took over).
Here are the memory stats after this has happened:
Prototypes:
obj: 1306644 (7982)
mob: 1308788 (134)
proc: 8411592 (17922)
str: 4267297 (85786)
appearance: 6051897 (8923)
id array: 8395740 (34880)
map: 78031920 (255,255,7)
objects:
mobs: 222480 (202)
objs: 31409744 (122566)
datums: 6002816 (49362)
lists: 25153408 (465187)
Here is what they look like directly after world/New() has finished:
Prototypes:
obj: 1306644 (7982)
mob: 1308788 (134)
proc: 8411592 (17922)
str: 4246606 (85786)
appearance: 5908006 (8307)
id array: 8402164 (34880)
map: 76949104 (255,255,7)
objects:
mobs: 60576 (66)
objs: 31151568 (122566)
datums: 5116576 (43725)
lists: 37901992 (962898)
(Note: both of these are from a testing server on my home computer (Windows 7 64-bit, 1284), where I ran the server for a little over a day with me connected but not doing anything, in order to get this to trigger. I have no idea how much time had passed with the world in this state, as I wasn't checking up on it much and had basically forgotten about it.)
The following is a list of permalinks to time-frozen code of the for loops that have triggered this runtime in the past few months, taken from runtime logs on the production servers. (The first two are from the last few days, after we upgraded to 1284. This is just a short list, as I got tired of digging through runtime logs and generating time-frozen links.)
https://github.com/tgstation/-tg-station/blob/a029a49392b4708b76f9f4f709d8b707475c04c7/code/LINDA/LINDA_system.dm#L85
https://github.com/tgstation/-tg-station/blob/01a8aa662a9f00f5795247087ce4b7e53a0e4035/code/game/communications.dm#L209
https://github.com/tgstation/-tg-station/blob/01a8aa662a9f00f5795247087ce4b7e53a0e4035/code/game/turfs/turf.dm#L83 (happened here 4 times, but this proc gets called a LOT)
https://github.com/tgstation/-tg-station/blob/29609457f57a8cefd244da5b36e43181399ed002/code/controllers/master_controller.dm#L85
https://github.com/tgstation/-tg-station/blob/9038fb15af6ea1e93729f4d6877cf3a7083d1c05/code/modules/mob/mob_movement.dm#L311
(note, the highlighted lines are the lines given by the runtime error)
To shorten our search, I'm going to focus in depth on that last one, mob_movement.dm line 311, because it's the simplest proc in terms of how the for loop uses other variables.
for(var/atom/A in orange(1, get_turf(src)))
get_turf, for reference:
/proc/get_turf(atom/movable/AM)
    if(istype(AM))
        return locate(/turf) in AM.locs
    else if(isturf(AM))
        return AM
So this is a simple loop in a mob proc that only touches a BYOND-generated list (orange()) of the atoms within 1 tile of the mob.
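(For context, the proc around that loop is shaped roughly like this; the proc name and body below are made up for illustration, only the for() line and get_turf() are the real code:)

// Hypothetical sketch of a mob proc built around that loop; only the for() line
// matches the real mob_movement.dm code, the rest is illustrative.
/mob/proc/find_adjacent_dense_atom()
    for(var/atom/A in orange(1, get_turf(src)))    // orange() builds a fresh list of atoms within 1 tile
        if(A.density)
            return A
    return null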
This runtime triggered on Basil (our lower-population server) back on March 28. Shortly after stopping and starting DD, it triggered again two separate times in turf/Enter() (above) until I actually force-closed DD and the BYOND pager.
We had been holding off on reporting this because of the bug report about list corruption that was fixed in 1284; it wasn't until it happened twice (the first two links) after upgrading that I started to investigate it.
I had assumed it might be bad RAM until it happened on my home computer.
One assumption you're making here is that BYOND can use 8 GB of memory; it can't. It's a 32-bit application. Still, 700 MB should be nowhere near a limit.
Although the list corruption really shouldn't happen when the server runs out of memory (in that sense this is a bug), it's hard to predict what will happen in that case. The "out of resources" message, however, is the same as "out of memory"; this particular message only happens when a list is copied.
Aside from running out of memory, the only other possibility I can think of is heap corruption. Inability to allocate more memory seems much more likely, though. The striking thing for me is that while you start off with an unholy number of lists, at the time of the crash you have about half that.
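(For reference on the "only happens when a list is copied" point above, here are some generic DM operations that build or copy a list; this is just an illustration, not code from the tg codebase:)

// Generic examples of DM operations that allocate or copy a list (illustrative only):
/proc/list_copy_examples(turf/T)
    var/list/source = list("a", "b", "c")
    var/list/copy1 = source.Copy()         // explicit copy
    var/list/copy2 = source + list("d")    // list arithmetic returns a new list
    var/list/around = orange(1, T)         // orange() returns a newly built list each call
    return list(copy1, copy2, around)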