Descriptive Problem Summary:
We put 1432 on one of the servers, it worked fine
The next morning I woke up and was told that everything was fucked.
Players were going to random mobs on log in, editing variables with admin tools would edit the same variables on other objects instead.
1
2
ID:2380006
Jun 28 2018, 9:15 pm
|
|||||||||||||
Resolved
| |||||||||||||
Jun 28 2018, 9:16 pm
|
|
lol this isn't a very useful report, stop bloating up the bug queue
|
That's odd. Any chance it's related to id:2379947?
I need something to go on of course to get to the bottom of this. |
as far as i know, we don't use ./: chaining it violates the repo's policy against using operators that don't compile-time type check
|
I'll try to elaborate a bit on what I know but I did not see this first-hand.
We generally do not allow chained operations like list[1].thing or proc().thing. I'm sure there are a fair number of places where that actually happens, but i don't think it's related. New players who connected were logged into types other than /mob/dead/new_player, despite this code in /mob/Login() log_access("Mob Login: [key_name(src)] was assigned to a [type]") resulting in ACCESS: Mob Login: Jordie0608/(Jordie0608) was assigned to a /mob/dead/new_player In actuality, Jordie here was logged into a /mob/living/simple_animal/hostile/hivebot/strong, or at least his Topic() calls came from a hivebot. An admin also reported that mob tags resolved wrong, effectively "mob_40" locate()'d into a mob with tag "mob_495" |
I'm not entirely sure the chained list/proc thing wasn't an issue though, especially if problems only started after a certain point. I can't see any reason for a mob to login to the wrong type unless something got changed inappropriately.
|
The fix for the other issue is in 512.1433, so please retest with that build.
You will need to recompile. If this doesn't fix the issue I'll need some more concrete data to work with. |
Well now it's reproducible!
https://travis-ci.org/tgstation/tgstation/jobs/ 398482646#L1042 Here is the same code on 1427 with no runtimes https://travis-ci.org/tgstation/tgstation/jobs/ 397638628#L954 This is actually seeming similar to 1431's bug, as those runtimes weren't a thing in 1432 |
Yep, its all broken down, can confirm. Began with lots of list errors, but I'm sure you can see the runtimes yourself.
|
Um, do you have that code somewhere else where I can look at it? That site barfs unless I enable cookies for it.
|
It seems to be TG code, I think this will work with most SS13 codebases, the Travis is just the errors... Which basically comes down to lots and lots of null.proc() errors. Bad index calls for me, nullified lists.
(So I *think* you can just use your version of TG, or one of their newer ones and reproduce it all the same) |
This is correct, I had made a post that said just as much, but the website went down while posting it, and i didn't feel like retyping it
|
Here's a pastebin version of that log
https://file.house/wgZG.txt It should be reproducible by running DD with this commandline DreamDaemon.exe -trusted -params "test-run" with or without LOWMEMORYMODE And here the exact revision of the tree that's running https://github.com/tgstation/tgstation/archive/ 9acba9cde801ec523cdcb1404c8ecaaa6f5bea34.zip |
I won't run tests in trusted mode but I think that log and that .zip should be helpful to me. I'll see what I can find.
|
I'm working on this, but boy is it a slog. From what I'm seeing right now in the debugger (low-memory mode), it looks like the air subsystem is being initialized before the atom subsystem and that's what's causing the problem. OR, the atom subsystem is somehow skipping over the turf. The first turf this happens to (to get the null.proc error on open.dm:157) is 0x1000300. I'm not seeing its air var being set to anything, which from what I see the atom subsystem is responsible for, before the atmos subsystem takes over. Hence the problem: the air var is expected not to be null, but it is null.
|
The subsystems are initialized in code/controllers/master.dm, line 80 onwards:
var/list/subsytem_types = subtypesof(/datum/controller/subsystem) cmp_subsystem_init is in /code/__HELPERS/cmp.dm, it provides a comparison function to sort the subsystems by their initialization order: /proc/cmp_subsystem_init(datum/controller/subsystem/a, datum/controller/subsystem/b) The subsystems are defined under the files in code/controllers, but each of them only has init_order = SOME_DEFINE. The actual values are in code/__DEFINES/subsystems.dm, line 51 onwards. I'm guessing the bug involves initial() on variables containing typepaths somehow, a pattern we use in a few places to avoid instantiating an object just to get a constant value from it. |
Well, I ran a test on initial() with a regular object earlier and confirmed that much at least seemed to be working correctly. I don't see any reason that type paths in initial() would have changed, though. However it definitely appears the subsystems are initializing out of order.
|
I'm trying to drill down on this issue further, right now i've confirmed that sortTim() runs and runs the
if(L && L.len >= 2)
e: /datum/sortInstance/proc/timSort() runs, which runs /datum/sortInstance/proc/binarySort(), but never runs
if(call(cmp)(fetchElement(L,mid), pivot) > 0)
for(,start < hi, ++start)
e2: the for loop never runs. ## WARNING: binarysort called, start is 65, hi is 65, cmp is /proc/cmp_subsystem_init e3: this call:
sortTim(subsytem_types, /proc/cmp_subsystem_init)
runs this code: /proc/sortTim(list/L, cmp=/proc/cmp_numeric_asc, associative, fromIndex=1, toIndex=0) which gives this warning: ## WARNING: sortTim here, fromIndex is , toIndex is so the default arguments got nulled? Going to run this with 1427 to see what the difference is |
1
2