ID:1958683
 
Resolved
The savefile.Flush() proc has been added, to make sure any pending writes to a savefile are done immediately.
Applies to:DM Language
BYOND Version:509.1305
Operating System:Windows 10 Pro 64-bit
Web Browser:Chrome 45.0.2454.101
Status: Resolved (510.1320)

This issue has been resolved.
Descriptive Problem Summary:
It looks like there's a synchronisation bug with savefiles.

Numbered Steps to Reproduce Problem:
1. We're creating a savefile object using new/savefile("file.sav").
2. We enter a for loop which loops over a group of turfs (previously obtained with the block() proc).
3. As this is going to be CPU-intensive we periodically call sleep() so the server remains responsive.
4. At some point the variable that had a reference to the savefile object is reset to null.
5. Proc crashes because it attempts to set savefile.cd to something, but since the variable is now null it can't do this.

Code Snippet (if applicable) to Reproduce Problem:
mob/Login()
spawn test()

spawn
var/savefile/F = new("savefile.sav")
F.Lock()
sleep(20)
world << "F = [F]" // F = <null>

proc/test()
var/savefile/F = new("savefile.sav")

sleep(10)

fdel("savefile.sav") // ignores the lock and dereferences the savefile


Expected Results:
The savefile remains open and available.
fdel should return false because the file is locked. (The developer should wrap this in a try/catch statement.)

Actual Results:
The savefile variable becomes null mid-loop.
The file is deleted and the savefile dereferenced, leading to errors elsewhere.

Attempting to reacquire the savefile from within the loop (by checking if the savefile object is null and then recreating it with new/savefile("file.sav")) causes the proc to hang indefinitely. The server occassionally "hiccups" as if some intensive process is running in the background. Have not been able to reproduce this after my fixes.

Does the problem occur:
Every time? Or how often?
I've got a reproducible case, but I haven't been able to reproduce it in its base form.
In other games?
So far it's only occurred in my project. Reproduced for all games.
In other user accounts?
Untested.
On other computers?
Untested.

When does the problem NOT occur?
The problem does not occur if the proc never sleeps. This isn't ideal in my case as I have to perform CPU-intensive operations which I'd rather push to the background.
The problem only occurs when fdel is used on a savefile, so there's a viable workaround by checking if you have a savefile open.

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.)
Unknown.

Workarounds:
Don't use sleep() when working with savefiles.
Manually check if you have "locked" a savefile when using fdel.
@LummoxJR: It took 3 attempts before I actually attached the files before sending the e-mail, but it's sent. Can you confirm you've received the files?
Dropbox is your best friend :X
An additional bug:

I forgot to call del(savefile) after writing to it (because the naming suggests I'd be deleting my data rather than closing the savefile object) but when you reopen the savefile later it doesn't flush the remaining bytes to the file. This leads to a corrupted savefile.

It's poor practice, so it probably shouldn't have a high priority, but if you're looking into this then you may already be dealing with the code that deals with flushing data / closing the file.
I'm not seeing any indication that you sent me a file, so I have no idea where you sent it. You can send me files at [email protected], but typically most people just put files up in storage space and send me a link in a private message.

As a general rule I would say mixing sleep() with savefiles is a terrible idea. If you must sleep(), use Lock(). But the reference disappearing prematurely is clearly a bad thing. The question is: is it a bug?

One thing I would suggest is looking at your code to see if you're calling del() on savefile vars, ever. If you are--don't! You never ever need to call del() on a savefile refrence; you should always let the garbage collector handle it. That's going to be the first thing I check in your source, because it's the most obvious way for this issue to come up and if that happens to be the case, it's not a bug. The savefile reference nulling out shouldn't even be possible without a del() call, so I'm almost certain there is no bug in play.

Also, if del() is being called, it's an indication that you're running "concurrent" operations on the savefile (letting one proc sleep while another takes it over), and the only way that can possibly work well is if you make sure to save the savefile datum's cd var before you sleep and restore it afterward. Even then I'm not certain it can work properly; you're better off always waiting for the first routine to finish before doing the second.
In response to Lummox JR
Lummox JR wrote:
I'm not seeing any indication that you sent me a file, so I have no idea where you sent it. You can send me files at [email protected], but typically most people just put files up in storage space and send me a link in a private message.

Hm, looks like it didn't arrive. I'll send you a link to it later by pager.

As a general rule I would say mixing sleep() with savefiles is a terrible idea. If you must sleep(), use Lock(). But the reference disappearing prematurely is clearly a bad thing. The question is: is it a bug?

I've tried using Lock() but it doesn't work. Reference is still set to null mid-loop.

IMO it's a bug because it's behavior that can't be reasonably controlled by the developer without constantly opening and closing files.

One thing I would suggest is looking at your code to see if you're calling del() on savefile vars, ever. If you are--don't! You never ever need to call del() on a savefile refrence; you should always let the garbage collector handle it.

I disagree with not using del on the savefile. As a developer I should always close my file streams properly. Letting the garbage collector is a big no-no in my book: how can I be sure that the file is written properly if I'm not closing my stream? Or is BYOND supposed to handle this for me in the background?

Anyway, without the use of del(savefile) the savefile is not written properly. Subsequent attempts to read it fail as the remaining bytes are not written to the file. Only del(savefile) or savefile.ExportText seem to write the file properly.

I'm doing the following operation:

1. Create savefile.
2. In a loop, populate the savefile.
3. Set the savefile reference to null after the loop (I've tried removing this line, but it makes no difference)
4. Use Dantom.zipfile to add the savefile to a .zip file. (There's a lot of redundant data in the zip file such as type paths, and adding it to a zip file compresses the file significantly.)

Maybe writing the file to a zip file is the problem, in that BYOND doesn't detect that the file is being copied and as a result doesn't close the file (as the savefile object likely hasn't been garbage collected at this point).

That's going to be the first thing I check in your source, because it's the most obvious way for this issue to come up and if that happens to be the case, it's not a bug.

The source-code I tried to send you did not have a del call anywhere in it (which I later added to fix my bug with writing to the file).

The savefile reference nulling out shouldn't even be possible without a del() call, so I'm almost certain there is no bug in play.

Regardless of the use of del() or nulling the variable manually, I'm doing this after the loop so that code shouldn't even hit until the proc is finished. IMO this is undesired behavior.

Also, if del() is being called, it's an indication that you're running "concurrent" operations on the savefile (letting one proc sleep while another takes it over), and the only way that can possibly work well is if you make sure to save the savefile datum's cd var before you sleep and restore it afterward. Even then I'm not certain it can work properly; you're better off always waiting for the first routine to finish before doing the second.

There are no concurrent operations running in my last iteration, but the bug still occurs.
In response to NullQuery
NullQuery wrote:
I've tried using Lock() but it doesn't work. Reference is still set to null mid-loop.

Something has to be setting it to null; that something must be a del().

I disagree with not using del on the savefile. As a developer I should always close my file streams properly. Letting the garbage collector is a big no-no in my book: how can I be sure that the file is written properly if I'm not closing my stream? Or is BYOND supposed to handle this for me in the background?

Think about what del() does: It looks for all matching references and makes them null. Not only is that an extensive search across all vars in use, all lists, etc., but it means that if you have that reference in another routine, it's destroyed. That's exactly what's happening to you.

Anyway, without the use of del(savefile) the savefile is not written properly. Subsequent attempts to read it fail as the remaining bytes are not written to the file. Only del(savefile) or savefile.ExportText seem to write the file properly.

I'm doing the following operation:

1. Create savefile.
2. In a loop, populate the savefile.
3. Set the savefile reference to null after the loop (I've tried removing this line, but it makes no difference)
4. Use Dantom.zipfile to add the savefile to a .zip file. (There's a lot of redundant data in the zip file such as type paths, and adding it to a zip file compresses the file significantly.)

Maybe writing the file to a zip file is the problem, in that BYOND doesn't detect that the file is being copied and as a result doesn't close the file (as the savefile object likely hasn't been garbage collected at this point).

That would definitely be the problem. The savefile hasn't been GCed. Maybe I can look into adding Flush() or something for the savefile datum. But the best thing you can do in this situation is let the savefile be closed, and then add it to the zip.

I still think you have to have some concurrent access going on, but I'll take a look.

The savefile reference nulling out shouldn't even be possible without a del() call, so I'm almost certain there is no bug in play.

Regardless of the use of del() or nulling the variable manually, I'm doing this after the loop so that code shouldn't even hit until the proc is finished. IMO this is undesired behavior.

My theory is that a different proc is calling del(). It's about the only thing that makes sense. But anyway I can take a look at the code and run it through the debugger.
Alright, I've narrowed it down. The bug is clearly on my end now (thankfully), except it's not del that causes it but rather fdel.

If a proc has a reference to a savefile and sleeps while another proc calls fdel to delete the physical file on disk the savefile is dereferenced. This occurs even if the savefile was locked with the Lock() proc.

The issue with having to manually call del to write the savefile to disk remains.

There are outstanding 2 issues with this which could be looked at:

1. Adding a Flush() or Close() proc on the /savefile object so it can be closed without calling del().

2. Documenting the behavior that calling fdel on an open savefile will dereference any variables pointing to it, even if the savefile was locked by using savefile.Lock().
I've updated my OP with a code example of the 2nd issue.

Expected results:
fdel should return false because the file is locked. (The developer should wrap this in a try/catch statement.)

Actual results:
The file is deleted and the savefile dereferenced, leading to errors elsewhere.
I'll take a look at your updated code. The original code never made it to my email sadly; I did find a couple of messages inexplicably in my spam folder, but neither had an attachment.
I think I see the problem, although it's not clear that it's a bug. fdel() ignores the lock because the lock is specific to the process, not to the proc or the savefile datum. The behavior you're seeing therefore looks to be correct.

To put it simply, savefiles do not expect to be deleted during their reads/writes, but BYOND does not prevent you from calling fdel() to force the issue. Nor am I convinced it should. Because sleeping during a save is a highly unusual operation, I would suggest taking the approach of creating a datum to assist with savefile operations.

var/list/savefilesInUse = new

proc/OpenSavefile(fname, dodel)
while(savefilesInUse[fname])
sleep(1)
if(dodel) fdel(fname)
return new/savefileMutexfname)

savefileMutex
var/fname
var/savefile/savefile

New(fname)
src.fname = fname
savefilesInUse[fname] = 1
savefile = new(fname)

Del()
savefile = null
savefilesInUse -= fname
..()

You'd then replace the savefile opens like so:

var/savefileMutex/SM = OpenSavefile("savefile.sav") // add a 1 argument to fdel
if(!SM) return
var/savefile/S = SM.savefile

This would of course be a little easier if /savefile were a true datum, but since it's not, this should suffice.
Could this behavior be added to the documentation of /savefile and fdel()?

The other thing I'm lacking is a proper savefile.Flush(). Right now I have to use del(savefile) to force the remaining bytes to be written to file, because I can't wait for the garbage collector to clean up the file (need to write it to a zip file and then delete it immediately).
savefile.Flush() looks doable. I'll put it on a list for 510.

In the meantime, the code says there's a workaround: Try creating a new savefile datum temporarily, and immediately drop the reference. You may need it to point to a file, and if so it can be the same file.
Alright, thanks.
Lummox JR resolved issue with message:
The savefile.Flush() proc has been added, to make sure any pending writes to a savefile are done immediately.