DreamDaemon.exe server crashes on startup, but not every startup.
The sequence of events when the crash happens:
1) Start server. DreamDaemon UI becomes unresponsive (normal behavior) while it loads the world etc.
2) Wait until it becomes responsive again and then connect.
3) More game init code starts running once someone has connected.
4) At some point it stops executing code prematurely, but the DreamDaemon UI becomes responsive again. At this point, if the client disconnects, DreamDaemon.exe crashes instantly.
I have tried varying which subsystems in our code start up, adding logging statements, etc., but unfortunately I have not been able to pinpoint exactly where it crashes. Given that the DreamDaemon process crashes, I can't be sure any logging I do actually gets flushed anyway.
If the game successfully starts up all the way, I don't have any problems after that point; it works perfectly.
I can't find any pattern that influences whether a given startup will crash.
The project depends on 511 features, so compiling on 510 isn't feasible. To be specific, the crash has been observed on 511.1380, 511.1381 and 511.1382.
Dump Summary (version 511.1380)
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(fa0.2b8c): Access violation - code c0000005 (first/second chance not available)
*** ERROR: Symbol file could not be found. Defaulted to export symbols for KERNELBASE.dll -
eax=00000000 ebx=00000000 ecx=00000000 edx=13ab5d00 esi=00000003 edi=00000003
eip=7718718c esp=00ffcf00 ebp=00ffd090 iopl=0 nv up ei pl nz ac pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000216
ntdll!NtWaitForMultipleObjects+0xc:
7718718c c21400 ret 14h
*** ERROR: Symbol file could not be found. Defaulted to export symbols for byondcore.dll -
eax=00000000 ebx=00ffc320 ecx=00000000 edx=13ab5d00 esi=00000000 edi=125acb10
eip=016ec452 esp=00ffdcf0 ebp=00ffdcfc iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202
byondcore!LocalDB::HubToJS+0x7d62:
016ec452 8b7028 mov esi,dword ptr [eax+28h] ds:002b:00000028=????????
^ Extra character error in '.ecxr.'
Call stack:
00ffdcfc 016eb97c byondcore!LocalDB::HubToJS+0x7d62
00ffdd1c 016d8fec byondcore!LocalDB::HubToJS+0x728c
00ffdda0 017334d2 byondcore!DungServer::ThreadNetMsg+0x2313c
00ffdde0 0174a77d byondcore!DMTextPrinter::ValidateResourceFile+0x2762
00ffe240 01726c5b byondcore!DMTextPrinter::ValidateResourceFile+0x19a0d
00ffe25c 017bde28 byondcore!DungPager::InstallProgress_PIO+0x8c2b
00ffe284 017bc895 byondcore!ByondHttpServerLink::WriteBuffer+0x2618
00ffe294 017ce8c1 byondcore!ByondHttpServerLink::WriteBuffer+0x1085
00ffe2bc 00a87445 byondcore!SocketLib::Event_io+0x1f1
00ffe2cc 0f5d540a dreamdaemon+0x17445
00ffe39c 0f5d50ca mfc120!CWnd::OnWndMsg+0x31d
....
Full dump available upon request.
Specifically, this is crashing right at the beginning of the routine, where it checks the proc data's associated proc info, and that proc info pointer happens to be null. That should not be possible. Other parts of the code have sanity checks for this, although ScanProcMem() doesn't.
That sanity check is easy to add and it makes sense for me to do so, but far more troubling is that this value is becoming null in the first place, which shouldn't be possible. The null proc pointer explains why, in the crashy cases, your code simply stops running prematurely: other sanity checks are catching this and execution doesn't happen.
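For anyone following along, here is a rough sketch of the kind of guard I mean. It is not the actual byondcore code: the names ProcData, ProcInfo and scan_proc_mem, and the 0x28 bytes of padding, are hypothetical stand-ins chosen only to mirror the dump, where eax is 0 and the faulting read is at [eax+28h] (a field access through a null pointer).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: the real structure layout is not public.
// The padding is assumed so that the field lands at offset 0x28,
// matching "mov esi, dword ptr [eax+28h]" in the dump.
struct ProcInfo {
    std::uint8_t padding[0x28];
    std::uint32_t field;
};

// Hypothetical stand-in for the proc data that points at its proc info.
struct ProcData {
    const ProcInfo* info;  // null in the crashing case
};

// Reads the field only when both pointers are valid.
// Returns false instead of dereferencing a null pointer.
bool scan_proc_mem(const ProcData* data, std::uint32_t* out) {
    if (data == nullptr || data->info == nullptr) {
        return false;  // the sanity check: bail out rather than crash
    }
    *out = data->info->field;
    return true;
}
```

With a guard like this the routine would return early instead of faulting, but it would only hide the symptom. It does nothing to explain why the pointer is null.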
I'll keep looking into this to see if I can uncover the reason for the null pointer. This is a real head-scratcher so I'm not sure if I'll be able to figure it out without running the code.