In response to A.T.H.K
WANO's last logs showed consistent failures in image() when it was being fed an /icon datum (or its internal icon var). It's not clear why that was happening, but the failures occur so consistently within image() that it can't be a coincidence. Doohl made some changes, but it appears I missed one of the spots in the code that does this, and he either hasn't had a chance to retest yet or hasn't gotten back to me since then. I also recommended that he add some debugging code to flag any other parts of the code that do this.
ATHK, I ran a trace and found your crash was somewhere in our PNG reading routine, but I haven't been able to narrow it down further yet; it's a difficult routine to analyze. If the game allows image uploads, it might help to disable them. It's also possible that the problem isn't in the PNG read itself but in something else causing memory corruption. It's very hard to say yet.
I did discover an issue in 500+ with animate() that caused memory leaks and some resulting strange crashes. In one project I saw consistent crashes in a routine that sets the length of a list, even though the actual cause was a memory leak. This only affects projects using animate(), but I thought I'd mention it here since it's relevant to Eternia at least. I feel confident the mysterious string table corruption was taken care of late in the 499 series when we changed the way we compile Linux builds, and the issue being seen now could well be different. I'm not even certain your crash and Eternia's are the same issue. For any games using animate(), I would recommend retrying on 502.1219, which fixes the list leak.
We've updated to the latest build and are all still experiencing crashes. We've also implemented all the changes Lummox suggested. Still crashing daily.
Ready for further instructions.
Backtrace for BYOND 502.1219 on Linux:
Thanks for the update. I ran a trace and confirmed the crash is still happening inside image(). More specifically, it's happening in a routine that accesses an internal object used by the /icon datum. In your code, however, it always fails inside an image() call.
If you have made all the appropriate changes, then in theory image() shouldn't be called with /icon datums at all anymore, but clearly that's not the case, which means one of us missed one. As I mentioned to you and Doohl, the way to verify this for sure is to use a macro before each image() call whose first argument isn't a constant (that is, null or a single-quoted icon file). I strongly recommend this:
#define IMAGECHECK(i) if(i && !isfile(i)) {world.log << "Icon datum in image()"; i = fcopy_rsc(i)}
Put that in a file with the rest of your defines. Add a call to IMAGECHECK() before each image() call whose first argument is a var, like so: IMAGECHECK(hood_icon). This will let us narrow down exactly which part of the code is having the issue. The fact that this crashes so regularly in the same spot means something the code does beforehand is causing the crash, and that's what I need to find.
Your IMAGECHECK() should probably also output the file and line too, shouldn't it?
In response to Nadrew
Ah yes, so it should. [__FILE__]:[__LINE__] should be added to the log message in the macro.
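For reference, here is a sketch of the earlier macro with the call site added as suggested. It assumes, as this reply implies, that __FILE__ and __LINE__ expand where IMAGECHECK() is used rather than where it is defined. The apply_hood() proc is hypothetical; only hood_icon comes from the thread.

```dm
// IMAGECHECK() with the call site included in the log message.
// If i is non-null and not a cache file (an /icon datum or its internal object),
// log where it happened, then swap it for a cache file via fcopy_rsc().
#define IMAGECHECK(i) if(i && !isfile(i)) {world.log << "Icon datum in image() - [__FILE__]:[__LINE__]"; i = fcopy_rsc(i)}

// Hypothetical example proc; hood_icon may arrive as an /icon datum built at runtime.
mob/proc/apply_hood(icon/hood_icon)
	// The check goes immediately before image() whenever the first argument is a var.
	IMAGECHECK(hood_icon)
	// From here on hood_icon is null or a cache file, never an /icon datum.
	overlays += image(hood_icon)
```

Each hit then shows up in the log in the form "Icon datum in image() - path/to/file.dm:123", which points straight at the call that needs fixing.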
Apologies for the slow response. The error catching has been implemented and we'll update you whenever a crash next occurs.
Here are the results of your IMAGECHECK():
Icon datum in image() - code/spells/class/special/illusion.dm:176
And here are the files:
illusion.dm:176
for(var/Over in m.overlays)
combat.dm:117
for(var/Over in P.overlays)
Everything within a player's overlays is an /image type. If it's not, it should cause a runtime error, which it doesn't.
The items in the overlays and underlays lists are not /image types; they're Appearances. An Appearance is a special internal type used by DM to keep track of all the visual aspects of an icon. Anything that gets added to overlays or underlays is converted to an Appearance.
I think those outputs are red herrings. My IMAGECHECK() macro is fairly simple, and I didn't design it to check for Appearances. The problem only happens when the icon argument is a datum or the datum's internal object. Wherever that's happening, the code just before it is probably what's triggering the bug. Since you're looping through the overlays list, the objects you're using are already Appearances, so their icons are actual cache files, not datums; I'm therefore confident these are not the approximate locations of the bug. If you change Over to Over:icon as the first argument in image() and as the argument to IMAGECHECK(), these cases will go away. That only applies here, though, not to all your other image() calls.
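A minimal sketch of that change, assuming the original IMAGECHECK() macro. The proc copy_overlays_from() and the local over_icon are hypothetical stand-ins for the loops at illusion.dm:176 and combat.dm:117, whose full code wasn't posted.

```dm
// Original macro from earlier in the thread.
#define IMAGECHECK(i) if(i && !isfile(i)) {world.log << "Icon datum in image()"; i = fcopy_rsc(i)}

// Hypothetical stand-in for the overlay loops. Entries in overlays are Appearances,
// so pass Over:icon (already a cache file) instead of Over itself; IMAGECHECK()
// then only fires for genuine /icon datums.
mob/proc/copy_overlays_from(mob/m)
	for(var/Over in m.overlays)
		var/over_icon = Over:icon
		IMAGECHECK(over_icon)
		// image() args are (icon, loc, icon_state, ...); icon_state is passed along
		// explicitly because only the icon file is taken from Over here.
		overlays += image(over_icon, null, Over:icon_state)
```

Because image(Over:icon) starts from just the icon file, any other appearance vars the original code relied on (layer, for instance) would need to be copied the same way icon_state is here.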
I'm not completely sure about this, but I used to have these crashes too. My previous lead dev, Jey123456, posted the issue.
However, after reinstalling our server and downloading a fresh copy of our code, I haven't had a crash in over 2 weeks. I really don't think these crashes are related to BYOND anymore. Instead, look through your code, or, if you can afford the resource cost, log every proc call; the last one logged before a crash should point you to the culprit.
In response to Lummox JR
Lummox JR wrote:
Wherever that's happening, the code just before it is probably what's triggering the bug.
Are you saying you want us to post this code? We're not sure what to make of your reply or what you want us to do next.
var/tmpoverlays = new/list()
In response to Writing A New One
No, what I mean is the spots in the code you found are not actually the places this issue is happening. My IMAGECHECK() macro is just catching them because it's not really smart enough to look for an Appearance; it just assumes anything that flunks the isfile() test is a datum or the internal object.
If you change your code so you pass Over:icon to image() instead of Over, you should get the same functionality, but this won't trip up IMAGECHECK(). Obviously that only applies to these specific places where you're passing an overlay or underlay. My thinking is that once you take care of those false positives, assuming no others crop up, you should eventually find where the issue is occurring and we can go from there.
I will call image() with Over:image, but this will just remove all instances of IMAGECHECK() catching an error. There's nowhere else so far where the error's occurred, and it's been crashing daily or bi-daily on average still.
In response to Doohl
Doohl wrote:
I will call image() with Over:image, but this will just remove all instances of IMAGECHECK() catching an error. There's nowhere else so far where the error's occurred, and it's been crashing daily or bi-daily on average still.
That'd be Over:icon, actually. An Appearance has no image var.
Assuming you didn't miss any spot where an IMAGECHECK() should go, your results suggest that 1) it's crashing before the output to world.log is properly saved to the file, and 2) it's crashing the very first time the routine is called. Since this is Linux, I'd suggest dropping the -logself option and world.log, and instead of logging to a file, sending all output to a file from the Linux command line with >. In theory, the regular output buffer should auto-flush, and you'd then get proper output that could catch the missing image check.
Another, rather radical, option is to make IMAGECHECK() print an error and then return from the proc without ever creating the image if it catches a datum being used. Since I'm convinced the code before this point is what's causing the crash, I doubt this will truly prevent a crash, but I suspect you'll be more likely to get usable output that can track down the culprit routine.
Please try this in the latest 503.1222. When it crashes, it should provide some new info about the DM procs in use. You may also want to try disabling the map-threads if that turns out to be causing new crashes.
You can compile your game in pre-500 versions if you want to ensure that old clients can log in.
Backtrace for BYOND 503.1222 on Linux:
Thu Dec 26 19:07:41 2013
Thu Dec 26 19:14:44 2013
http://www.byond.com/forum/?post=1264881
That issue went on for months, and this one seems to be the same in that it crashes all the time.
501.1217 has the issue, below.
It was also reported on 500.1214.
500.1209 seems to work OK (I haven't had any complaints), and in the 4 series, 499.1197 was the best (only a few reported crashes).
Apart from that, I can go through my support tickets and look for more reports of these crashes; each one has a log file that looks very similar to the error above.