Jun 30 2013, 2:14 pm
|
|
It'd be great if this were resolved somehow. I can't actually keep hosting the game from my own computer, haha.
|
In response to Kaiochao
|
|
Kaiochao wrote:
It'd be great if this were resolved somehow. I can't actually keep hosting the game from my own computer, haha.
Agreed .. I have a lot of customers complaining and putting up with the issue .. as it's out of my hands really, I can only suggest a few things such as downgrading BYOND on the server... |
I don't really see what would be different in Linux vs. in Windows; it seems like the crash/freeze would be likely to happen equally in both. At the moment I'm still stumped.
|
I'm not sure why, but Eternia's server has gone from crashing daily to not crashing at all for a week or so.
|
In response to Writing A New One
|
|
Mine hasn't frozen since June 25... It was freezing pretty much daily before then. I haven't changed any configuration/version. Very odd.
|
Yeah, my project is the same... I believe the version remained the same as well (ATHK could confirm this; I'm not particularly sure).
But that seems illogical and kind of... impossible? If nothing has changed with the versioning, then the crashing should persist. |
I started the server up on Linux ~4 hours ago and it froze some time between then and now. It's definitely still happening.
I guess I'm gonna have to keep hosting it on Windows until something happens. |
In response to Kaiochao
|
|
If you attach gdb, what do you get? I'm curious if you'll have anything different that might point to a possible cause. Also, do you get those refcount 5:xxx errors before this occurs, or do those not happen in your particular project? That could be a clue, though I'm still not sure of a possible cause.
If you do get the refcount 5:xxx errors, I'd like to know what skin procs you're calling, and where. In particular you should avoid calling skin procs in client/New(); the skin is not initialized at that point. |
In response to Lummox JR
|
|
I dunno what a gdb is. I haven't gotten any of the usual old refcount errors, but these look new:
BUG: Bad ref (2:80252) in IncRefCount [client]
I spot about 9 chunks of them throughout today's log. There are no errors at the bottom of the log, though, where the server would've been frozen at 100% CPU. |
You may need to install gdb with whichever package manager you're using.
Then just get the pid of your frozen DreamDaemon instance and: gdb |
In response to Murrawhip
|
|
Murrawhip wrote:
Mine hasn't frozen since June 25... It was freezing pretty much daily before then. I haven't changed any configuration/version. Very odd.
^Scratch that. Froze just now. |
Have any of you upgraded to the new version of BYOND (499.1197)?
Does it fix the issue? |
Alrighty, I'm hosting Hazordhu II on 499.1193, on CentOS 6.4 amd64.
Just caught it in a tight loop as these guys describe, with it hooked up to gdb. The backtrace is:
#0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:202
The instruction read-out for the current frame is:
207 jbe L(more16byteseq)
(Obviously, I have no debug symbols for BYOND.)
I'll leave it hooked up and hanging for now, in case you want to just SSH in and have a look, Lummox/Tom. Similarly, I can quite happily install a build with debug symbols and basically walk the entire thing through with you/myself and diagnose what's up. |
I would like to add that I am also experiencing this issue (a freeze with an apparent infinite loop inside 0x0046951c in ProtoStrCompSigned(ProtoStr*, ProtoStr*) ()).
Here's the data gathered from the SIGUSR2:
Caught SIGUSR2, printing diagnostics:
Server port: 2506
Server visibility: invisible
Server reachable by players: yes
Fri Jul 5 03:02:12 2013
proc name: Stat (/mob/Stat)
source file: mob.dm,701
usr: Someone (/mob/living/silicon/robot)
src: Someone (/mob/living/silicon/robot)
call stack:
Someone (/mob/living/silicon/robot): Stat()
Someone (/mob/living/silicon/robot): Stat()
DreamDaemon [0x8048000, 0x0], [0x8048000, 0x804a8ce]
libc.so.6 [0x813000, 0x0], 0x117ee9
[0x121000, 0x121600], [0x121000, 0x121600]
libc.so.6 [0x813000, 0x0], 0x117ee9
libbyond.so 0x33f5d0, 0x33f5ec
libbyond.so [0x122000, 0x0], 0x24db67
libbyond.so [0x122000, 0x0], 0x2b25cd
libbyond.so 0x2c5360, 0x2c546b
libbyond.so [0x122000, 0x0], 0x2ce972
libbyond.so [0x122000, 0x0], 0x2b1132
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so 0x2c7730, 0x2c77fc
libbyond.so [0x122000, 0x0], 0x2cc49e
libbyond.so 0x2c54e0, 0x2c568d
libbyond.so 0x2c7730, 0x2c77fc
libbyond.so [0x122000, 0x0], 0x272204
libbyond.so 0x284020, 0x284506
libbyond.so [0x122000, 0x0], 0x28b65a
libbyond.so 0x35a990, 0x35ab07
libbyond.so 0x32ba80, 0x32bcea
DreamDaemon [0x8048000, 0x0], [0x8048000, 0x804a3ee]
libc.so.6 0x19000, 0x190f3 (__libc_start_main)
server mem usage:
Prototypes:
obj: 849164 (6216)
mob: 851372 (138)
proc: 8102304 (14260)
str: 4764794 (86181)
appearance: 7262282 (14866)
id array: 8141556 (27491)
map: 1268608 (240,240,6)
objects:
mobs: 218976 (147)
objs: 12617164 (49362)
datums: 5364016 (52448)
lists: 16033348 (286412)
The proc in question when the freeze happens is somewhat random; this time it was mob.dm:701, which is stat(null,"CPU:\t[world.cpu]") in our case. But there is always one thing in common with all of them: it always happens on a line with a string operation. |
Can I get a debug build of 499.1197? I went through some debugging with Lummox recently, and we found that during a string append, a counter in a loop suddenly got a huge value, which was causing the lock-up. Problem was, walking the execution with gdb without debug symbols (especially with the StringEditor class members on the heap) proved rather error-prone, and eventually I broke the stack frame.
|
Incidentally, I upgraded to 499.1197, compiled Chatters on that version, hosted on it, and managed to get hangs pretty quickly (15 minutes or so, with 10 clients?).
#0 0x00478346 in StringEditor::Insert(char const*, int) () from /srv/byond/499.1197/lib/libbyond.so
Full source code here: https://github.com/Stephen001/Chatters
The one improvement we saw that could be made in this specific case was to use memmove() in Insert() instead of the loop. Obviously, if the buffer in the StringEditor was knackered (which I broke the stack frame trying to look at, impressively), then that couldn't fix the bug. |
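For anyone following along, here is a rough sketch of what "use memmove() instead of the loop" could look like for a generic insert into a fixed buffer. This is NOT BYOND's StringEditor; the edit_buf struct and edit_buf_insert() function are hypothetical names made up for illustration, and the real class layout is unknown without debug symbols. The only point is that the tail shift becomes a single length-checked memmove() call, so a nonsensical count gets rejected up front instead of driving a byte-by-byte loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string editing buffer (an assumption,
 * not the layout of BYOND's StringEditor). */
typedef struct {
    char  *data; /* caller-owned storage */
    size_t len;  /* bytes currently in use */
    size_t cap;  /* total bytes available in data */
} edit_buf;

/* Insert n bytes from src at offset pos.
 * Returns 0 on success, -1 if the arguments are out of range
 * or the result would not fit in the buffer. */
int edit_buf_insert(edit_buf *b, size_t pos, const char *src, size_t n)
{
    if (b == NULL || b->data == NULL || src == NULL)
        return -1;
    if (b->len > b->cap || pos > b->len)
        return -1;
    if (n > b->cap - b->len)
        return -1; /* a huge or corrupted count is rejected here */

    /* Shift the tail right in one call; memmove handles the overlap
     * between the source and destination ranges. */
    memmove(b->data + pos + n, b->data + pos, b->len - pos);

    /* Copy the new bytes into the gap that was just opened. */
    memcpy(b->data + pos, src, n);
    b->len += n;
    return 0;
}
```

As said above, this would not repair a buffer or counter that is already corrupted by something else; it only changes how a bad value fails.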
It's probably worth noting that due to my side-by-side installs and scripts, I can basically test any combination of compiler and runtime you want me to.
|
You should try stepping through calls/returns and see where it's actually hanging.
e: oh wait, you already did that. Not sure how you're managing to break the frame; it shouldn't be too bad to step. It's probably a minor buffer overflow breaking a value on the stack, or something more trivial. |