Descriptive Problem Summary:
DreamDaemon sometimes hangs at 100% processing power and stops working.
We've been encountering this problem for several months with the BS12 source, but until now we haven't been able to reproduce it. A recent commit changed that: the bug now predictably happens at startup!
Numbered Steps to Reproduce Problem:
1. Download tooreasonable.org/public/baystation12.zip to a Linux box
2. Compile and run
Expected Results:
Server finishes startup and contacts BYOND central, regardless of CPU load
Actual Results:
Server hangs forever and never finishes startup
Does the problem occur:
Tested on two Linux boxes (one 64-bit, the other 32-bit), and on Windows (by Murrawhip)
Workarounds:
None.
ID:926437
Aug 12 2012, 3:20 am (Edited on Aug 13 2012, 1:33 am)
|
Aug 12 2012, 11:25 am
|
So what was in the recent commit that predictably causes the lockup at start?
|
There were several commits. Narrowing it down could be interesting, but I don't have the time right now. For what it's worth, I profiled DreamDaemon with cachegrind to see where it spends its time: http://i.imgur.com/dqlJs.png
|
https://github.com/Baystation12/Baystation12/commit/5e2f4948d039c583d9958161cea1e919143836d4
We've narrowed it down to this commit and are now checking whether anything in it could cause the hang.
|
Replacing (code/ZAS/Functions.dm)
if(istype(O) && !(O in open || O in closed || O in doors) && O.ZCanPass(T))
with
if(istype(O) && !(O in open) && !(O in doors) && !(O in closed) && O.ZCanPass(T))
prevented the lockup for me.
|
Yep. Apparently it causes an infinite loop. I'm going to ask the coder who wrote it where exactly the loop happens.
|
I did another test where I made the game wait 5 minutes before running its startup code. The result is that DreamDaemon connects to BYOND central and works fine, but the moment the startup code is triggered, it hangs forever.
|
In response to Murrawhip
|
Murrawhip wrote:
> Replacing (code/ZAS/Functions.dm)
> if(istype(O) && !(O in open || O in closed || O in doors) && O.ZCanPass(T))
> with
> if(istype(O) && !(O in open) && !(O in doors) && !(O in closed) && O.ZCanPass(T))
> prevented the lockup for me.
Semantically, !A && !B && !C is equivalent to !(A || B || C). However, I'd be concerned with the operator precedence of 'in'. Surprisingly, the reference entry on Operators lists precedence, but not for 'in'. Try
!((O in open) || (O in doors) || (O in closed))
|
'in' is lowest precedence, so the statement
!(O in open || O in closed || O in doors)
reads as:
!(O in (open || (O in (closed || (O in doors)))))
which is kind of a nonsense statement, being type-wrangled I think and so always evaluating to false. I am a little surprised you'd get no runtime about O in [not a list], but maybe it does just coerce it.
|
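For reference, the fully parenthesized check could be sketched as below. This is a minimal sketch, not code from the Baystation12 source: the proc name is_untracked is hypothetical, and the three lists are passed as arguments only so the snippet stands on its own.

```dm
// Hypothetical helper, not taken from the Baystation12 source.
// The three lists are arguments only to keep the sketch self-contained.
/proc/is_untracked(atom/O, list/open, list/closed, list/doors)
	// Each 'in' test has its own parentheses, so '||' combines three
	// true/false results instead of becoming part of one 'in' operand.
	return !((O in open) || (O in closed) || (O in doors))
```

Written this way, the result no longer depends on where 'in' sits in the precedence table.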
In response to Stephen001
|
Stephen001 wrote:
> !(O in (open || (O in (closed || (O in doors)))))
> Which is kind of a nonsense statement, being type-wrangled I think and so always evaluating to false. I am a little surprised you'd get no runtime about O in [not a list], but maybe it does just coerce it.
After testing this a little, it seems the following will run:
world << "[src in TRUE]"
Basically, it seems like 'in' makes no guarantee about its RHS actually being a list, and it will run regardless of what you give it. This means you can do:
world << "[src in someProc()]"
which you can't otherwise do in DM, but it also means that if the RHS is *not* a list, it's just going to return 0.
|
I've moved this to Developer Help because it was not a BYOND bug; it was a bug in the code.
I'm not sure why SkyMarshal thought that replacing that if() statement was going to help. If he didn't recognize the need for parentheses around the in clauses, then the statement should have looked equivalent. Replacing a statement with an equivalent is never going to change anything for the better.
To answer questions about why the in operator works on non-lists, I believe that's intentional. Using "item in list" when the list is null is a great way to avoid having to initialize lists you don't need.
I should also mention for performance that doors |= T is not the quickest way to add T to the list while checking for uniqueness. A better way, I believe, is an associative list. doors[T]=null should be just as effective, and because it's a binary lookup it should be quicker to discover if T is in the list or not.
|
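The associative-list suggestion might look something like the sketch below. The proc name track_door is hypothetical, and doors and T simply mirror the names used in this thread. It assumes doors has already been initialized as a list, and it has not been profiled against the |= version.

```dm
// Hypothetical demo proc; 'doors' and 'T' mirror the names used in the thread.
// Assumes 'doors' is already an initialized list.
/proc/track_door(list/doors, turf/T)
	// Approach from the original code: append T only if not already present.
	// doors |= T

	// Approach suggested in the post: use T as a key with no associated value.
	// Assigning to the same key again should not create a second entry.
	doors[T] = null

	// Membership is still tested with 'in'.
	return (T in doors)
```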
Erm.. what? Did you read the OP? Whether the specific DM code works or not is unrelated.
The issue is that BYOND just hangs completely and becomes unresponsive (which is equivalent to a *crash*), without so much as a message in the logs about what happened. To be even more explicit: our server has crashed 2-3 times per day for months now, and we cannot fix these crashes because there's no indication of why they happen. I have presented a case here that *reproduces* the problem behind the crashes, which should make it possible for you, who unlike us have the source for DreamDaemon, to actually fix the related bug and/or make it crash gracefully with an error message.
|
An infinite loop makes Dream Daemon unresponsive; that is not a bug, just a simple fact of how the interpreter works. And an infinite loop is not truly equivalent to a crash; it has some similar results in that the server stops working.
In most infinite loops, world.loop_checks should kick in before the problem becomes an issue. Was that disabled? Is something else preventing it from catching, like breaking the proc up into chunks? Were any sleep() calls added to deal with processing getting out of control (a wise precaution if doing a giant loop)?
Again, to be clear, the infinite loop tying up the CPU isn't a bug; that's what infinite loops do. There is a feature request to move the Windows UI to a separate thread, though that's not a simple matter to do. (This is why a sleep() call every once in a while, like after 1000 iterations or so, is a good idea.) Even in that case, you'd still have the CPU tied up by that loop and it'd still be killing your gameplay, and there's absolutely nothing we could do about that. The only hope would be finding some way to provide better diagnostic tools to tell where your loop is occurring.
|
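As a rough illustration of the periodic sleep() advice, a long-running loop could yield every 1000 iterations along these lines. The proc names process_item and process_all are hypothetical placeholders, not part of the BS12 source, and the interval of 1000 is just the figure mentioned in the post.

```dm
// Hypothetical placeholder for the real per-item work.
/proc/process_item(item)
	return

// Hypothetical long-running loop that yields periodically.
/proc/process_all(list/items)
	var/count = 0
	for(var/item in items)
		process_item(item)
		count++
		// Every 1000 iterations, pause briefly so the server can
		// handle other work before the loop continues.
		if(count % 1000 == 0)
			sleep(1)	// sleep() takes its delay in tenths of a second
```

This does not fix a loop that never terminates; it only keeps the server from being starved while the loop runs.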