Aug 6 2013, 9:45 pm
We'll try to get a fix out this week.
499.1202 has been released. The primary change here is a fix to the issues found in the valgrind log. Users who have been having issues should retest with the new version.
The issues all stem from a data structure being written or read after deletion when the connection was closed unexpectedly. The main place this showed up was in the map routines, which did some (but very little) checking for this; I've now addressed that. The other place was hub communication. I suspect there could be other lingering cases like this that I may have missed (most likely in completely different areas, though), to say nothing of completely unrelated causes. My main concern was addressing the known cases of heap corruption.
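The failure described above, where a per-connection structure is read or written after it has been freed, is commonly guarded against by never holding raw pointers to such objects and resolving them through a checked handle instead. A minimal sketch in Python, assuming a hypothetical `HandleTable` (this is not BYOND's actual internals): each handle carries a generation number, and removing an entry bumps that number, so a stale handle held by map or hub code resolves to `None` instead of touching a reused slot.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class _Slot:
    generation: int = 0
    value: Any = None  # None marks a free slot, so stored values must not be None


class HandleTable:
    """Hypothetical sketch: objects are referenced by (index, generation)
    handles, so a handle that outlives its object is detected, not used."""

    def __init__(self) -> None:
        self._slots: list[_Slot] = []
        self._free: list[int] = []

    def add(self, value: Any) -> tuple[int, int]:
        # Reuse a freed slot if possible; its generation was bumped on removal.
        if self._free:
            index = self._free.pop()
        else:
            index = len(self._slots)
            self._slots.append(_Slot())
        slot = self._slots[index]
        slot.value = value
        return (index, slot.generation)

    def remove(self, handle: tuple[int, int]) -> None:
        slot = self._live_slot(handle)
        if slot is None:
            return  # already removed; a second removal is a harmless no-op
        slot.value = None
        slot.generation += 1  # invalidates every outstanding handle to this slot
        self._free.append(handle[0])

    def get(self, handle: tuple[int, int]) -> Optional[Any]:
        slot = self._live_slot(handle)
        return None if slot is None else slot.value

    def _live_slot(self, handle: tuple[int, int]) -> Optional[_Slot]:
        index, generation = handle
        if not 0 <= index < len(self._slots):
            return None
        slot = self._slots[index]
        if slot.generation != generation or slot.value is None:
            return None
        return slot
```

A caller such as a map-sending routine would check `table.get(handle)` for `None` and skip the work for a client that has already gone away. In C or C++ the same check keeps a dangling pointer from being dereferenced; Python's garbage collector already rules out true use-after-free, so this sketch only demonstrates the bookkeeping.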
We'll test it out shortly and keep you updated. Our longest uptime without a freeze or crash on 1193 is 34 hours (the average is 7 hours), so we're aiming for 48 hours without a corruption-related crash or freeze to confirm whether it's working.
(We're using 1193 because it's the version that gave us the longest runs; 1197 and 1201 tend to crash within minutes.)
Froze on 499.1202.
(gdb) attach 19361
Well, that was quick.
GDB
sigusr2
We reverted to 1193, since this one is just unusable right now (same as 1201). That would mean that either the errors we caught in valgrind are still here, or something else is going on. It would be great if that infinite loop could somehow be caught and handled. I don't have the code, so I can't really guess what the structure is, but if it's something relatively contained, a few checks sprinkled around to detect corruption could help (a loop counter with a limit in the millions of strings, or in the gigabytes if it's a per-byte loop, would probably do the trick). Since the memory is evidently still allocated (the application ends up in an infinite loop rather than a segfault), knowing at which point it ends up corrupted would probably help you a lot. I'm worried that the corruption comes from something players do on the server, so valgrind has very little chance of catching it, since the game is unplayable while valgrind is running.
Hell, I'd rather it asserted or segfaulted when that infinite loop happens than just stay frozen (at least with a proper crash, an automatic restart can happen right away).
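The loop-counter idea above can be sketched as a traversal that refuses to run forever. This is a hypothetical illustration in Python, not DreamDaemon's string code: the `Node` chain and the `MAX_STEPS` cap are assumptions, and the cap only needs to sit far above any legitimate length. In a C or C++ engine the equivalent would typically call `abort()` so the process dies and can be restarted; here an exception stands in for that.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical cap: far larger than any legitimate chain should ever be.
MAX_STEPS = 10_000_000


@dataclass
class Node:
    data: str
    next: Optional[Node] = None


class CorruptionError(RuntimeError):
    """Raised when a traversal exceeds its step cap, which is treated as a
    sign that the structure has been corrupted into a cycle."""


def walk(head: Optional[Node], max_steps: int = MAX_STEPS) -> list[str]:
    """Collect every node's data, failing loudly instead of spinning forever."""
    out: list[str] = []
    node = head
    steps = 0
    while node is not None:
        steps += 1
        if steps > max_steps:
            # Crash instead of hanging, so a supervisor can restart the process.
            raise CorruptionError(f"traversal exceeded {max_steps} steps")
        out.append(node.data)
        node = node.next
    return out
```

The cost is one counter increment and comparison per step, which is why checks like this are cheap enough to leave enabled in release builds.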
Right now, I'm stuck using a "stillalive" file that the game touches every now and then, and killing the process if it hasn't been touched for x minutes. But it's not exactly as reliable.
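The "stillalive" watchdog described above can be sketched as two small pieces: a heartbeat the game calls periodically, and an external poller that kills the server when the heartbeat goes stale. This is a minimal Python sketch; the function names, file path, and timings are hypothetical, and it assumes a Unix host where `os.kill` with `signal.SIGKILL` is available.

```python
import os
import signal
import time
from pathlib import Path
from typing import Optional


def heartbeat(path: Path) -> None:
    """Called periodically by the game: create the file or bump its mtime."""
    path.touch(exist_ok=True)


def is_stale(path: Path, max_age: float, now: Optional[float] = None) -> bool:
    """True if the heartbeat file is missing or older than max_age seconds."""
    if now is None:
        now = time.time()
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        return True
    return now - mtime > max_age


def watch(pid: int, path: Path, max_age: float, poll: float = 30.0) -> None:
    """Poll the heartbeat and kill the server once it stops updating."""
    while True:
        time.sleep(poll)
        if is_stale(path, max_age):
            # SIGKILL cannot be caught or ignored by the target process.
            os.kill(pid, signal.SIGKILL)
            return
```

Because a missing file counts as stale, the game should write its first heartbeat at startup, before the watcher's first poll. `SIGKILL` is used because it works even if the server handles `SIGTERM` by requesting a graceful shutdown that a stuck main loop would never get around to.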
In response to Jey123456
I wonder why yours manages to catch SIGUSR2 - mine doesn't catch anything while frozen.
Well rats, that was disappointing. I think I'll need more valgrind info to go on, then, because the issues your log brought up should all be taken care of in this build.
Valgrind is not really an option. The game is unplayable under it, and the problem is most definitely player-induced. I can't tell exactly which action causes it, since it doesn't freeze right away, but I've had the server empty for days in a row without a freeze or crash.
So is it clear that the issue occurred between 1193 and 1194, or can that be isolated?
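If intermediate builds can be obtained, the question above can be answered with a binary search rather than by testing every build. A minimal Python sketch, assuming a hypothetical `is_bad(build)` check (for example, "freezes within some fixed number of hours under normal load", since 1193 itself is reported to freeze eventually) and assuming that once a build is bad, every later build is bad too:

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[int], is_bad: Callable[[int], bool]) -> int:
    """Binary search for the first build where is_bad() returns True.

    Assumes builds are sorted, builds[0] is known good, builds[-1] is known
    bad, and badness is monotonic (no good build after a bad one).
    """
    if len(builds) < 2:
        raise ValueError("need at least one known-good and one known-bad build")
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]
```

With ten candidate builds this needs three or four test runs instead of ten, which matters when each run can take many hours. If the monotonic assumption does not hold (for example, a build that is "bad" only by chance in a short run), the result can point at the wrong build.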
In response to Tom
The freeze mentioned in my original post occurred in a 498 daemon compiled by 498 dreammaker, after the release of 499. I had never experienced a freeze until roughly the 499 release.
I can't be sure whether it was coincidentally some update I made to the game at the time, though.
No, 1193 also has the problem, and so does 1197, just not to the same extreme.
1193 will often work fine for quite a few hours in a row, whereas 1197 generally lasts only 1-2 hours at most, and I haven't yet seen 1201/1202 get past 30 minutes in my tests. But all of them freeze at exactly the same location in the debugger (the infinite string loop).
I'm unsure why 1197 only lasts 1-2 hours for you. Most of BYONDPanel's servers are on 1197, especially all the new ones, with the exception of Eternia on 498.1158 (I think). We haven't had any complaints so far about freezing or any other issues, and some clients, including a new one, have had over 40 players.
Each server runs Ubuntu 12.10 32-bit.
Believe me, I'm as unsure as you are. 1193 is the one that has given us some form of stability (though even then, it freezes now and then). 1197 was definitely worse, and 1201-1202 are simply unusable.
I still don't get how such a dramatic difference could be seen between versions of 499. The server code was touched very little throughout that process. Hub connection code was changed a bit, but the one case of possible heap corruption we identified there has been taken care of. I'll look over the hub code some more though to see if there's anything there that could remotely be an issue. That's pretty much the only place I can expect to find significant changes after 1193, since most of the beta changes affected the pager only.
Obviously the core problem predates 499, since the original report was for 498; the sources of possible corruption I've already addressed would fit that description. It could be that there are new sources of corruption in newer 499 builds, but I think a more likely explanation is that something is merely exacerbating the existing issue. I don't know what else could be screwing up the heap, though, which is why logs from a tool like valgrind are so critical. Valgrind can catch heap corruption as it happens, rather than afterward, when it's too late.
If this helps at all, my friend was attempting to connect but got "Connection failed". The logs clearly showed that he had disconnected and reconnected within a second, somewhere between immediately and half a second before the server crashed.
(This was before the new build, but we are still crashing, and this is what happens most of the time.) PS: The server's SIGUSR2 output shows that it got stuck in mob/Stat, but not on the mob that connected and disconnected.
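One plausible way a SIGUSR2 report can name a DM proc such as mob/Stat is for the interpreter to record which proc it is currently executing and have the signal handler print that record. The following Python sketch illustrates the idea under that assumption; it is not how DreamDaemon is known to implement it, and `current_proc`, `enter_proc`, and `install` are hypothetical names.

```python
import signal
import sys

# Hypothetical: the interpreter updates this every time it enters a proc.
current_proc = "<idle>"


def enter_proc(name: str) -> None:
    """Record the proc the interpreter is about to execute."""
    global current_proc
    current_proc = name


def report_current_proc(signum, frame) -> None:
    """Signal handler: report only what the interpreter last recorded."""
    sys.stderr.write(f"SIGUSR2: currently executing {current_proc}\n")
    sys.stderr.flush()


def install() -> None:
    """Register the handler (Unix only; must be called from the main thread)."""
    signal.signal(signal.SIGUSR2, report_current_proc)


if __name__ == "__main__":
    install()
```

In CPython, Python-level signal handlers run in the main thread only between bytecode instructions, so a hang inside native code produces no report at all. Whether DreamDaemon has a comparable limitation is unknown, but something like it would be one possible explanation for a frozen server that does not respond to SIGUSR2.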
Maybe there's another issue related to the map-sending problem, then. I'm looking into ways that I can mitigate any issues like this across the board.
Sun Aug 18 00:24:38 2013
World opened on network port 1213.
Welcome BYOND! (4.0 Public Version 499.1197)
The BYOND hub reports that port 1213 is reachable.
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:160)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:161)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:168)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:169)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:170)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:171)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:174)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:175)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:266)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:269)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:270)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/08/16/kai7max.html (current directory is /home/dmb/wano0132)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/08/16/kai7max.html (current directory is /home/dmb/wano0132)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/08/18/kai7max.html (current directory is /home/dmb/wano0132)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/08/18/kai7max.html (current directory is /home/dmb/wano0132)
BUG: Bad ref (6:40461) in DecRefCount(DM summon.dm:33)
BUG: Bad ref (6:40461) in DecRefCount(DM summon.dm:33)
BUG: Bad ref (6:40461) in DecRefCount(DM savefiles.dm:311)
BUG: Unexpected hub certificate (65535)
BUG: Unexpected certificate (6)
BUG: Failed to decode message 54,5
BUG: Network connection for Odensity shutting down due to read error. (2,1)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/08/18/theshadowone.html (current directory is /home/dmb/wano0132)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/07/24/lokus.html (current directory is /home/dmb/wano0132)
Mon Aug 19 20:38:04 2013
World opened on network port 1213.
Welcome BYOND! (4.0 Public Version 499.1197)
The BYOND hub reports that port 1213 is reachable.
BUG: Crashing due to an illegal operation!
Backtrace for BYOND 499.1197 on Linux:
Generated at Tue Aug 20 00:24:03 2013
DreamDaemon [0x8048000, 0x0], [0x8048000, 0x804a8ce]
libbyond.so 0x2fa610, 0x2fa631
[0x60b000, 0x60b600], [0x60b000, 0x60b600]
libbyond.so 0x2fa610, 0x2fa631
libbyond.so 0x25f520, 0x25f958
libbyond.so 0x262d80, 0x262de1
libbyond.so 0x262f90, 0x26306f
libbyond.so [0x60c000, 0x0], 0x29235b
libbyond.so [0x60c000, 0x0], 0x2c5f6a
libbyond.so [0x60c000, 0x0], 0x2b4c16
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so [0x60c000, 0x0], 0x2c6a52
libbyond.so [0x60c000, 0x0], 0x2b122c
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so [0x60c000, 0x0], 0x2c68ec
libbyond.so [0x60c000, 0x0], 0x2b122c
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so 0x2c7730, 0x2c77fc
libbyond.so 0x261140, 0x261596
libbyond.so 0x2a7400, 0x2a7981
libbyond.so [0x60c000, 0x0], 0x2c7506
libbyond.so [0x60c000, 0x0], 0x2b6481
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so [0x60c000, 0x0], 0x2c68ec
libbyond.so [0x60c000, 0x0], 0x2b122c
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so [0x60c000, 0x0], 0x2c6a52
libbyond.so [0x60c000, 0x0], 0x2b122c
libbyond.so 0x2c5360, 0x2c546b
libbyond.so [0x60c000, 0x0], 0x2ce972
libbyond.so [0x60c000, 0x0], 0x2b1132
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so 0x2c7730, 0x2c77fc
libbyond.so [0x60c000, 0x0], 0x2c90e9
libbyond.so [0x60c000, 0x0], 0x2cea31
libbyond.so [0x60c000, 0x0], 0x2b1132
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so 0x2c7730, 0x2c77fc
libbyond.so 0x273b60, 0x273c73
libbyond.so [0x60c000, 0x0], 0x2751da
libbyond.so [0x60c000, 0x0], 0x2883ca
libbyond.so 0x289450, 0x2894cf
libbyond.so 0x2d5e60, 0x2d5ef9