ID:148231
 
I'm quite happy with the upcoming XML library (currently in the queue to be released on BYONDscape), but there is one problem I need to deal with:

It leaks like a sieve.

An XML document is parsed, and each element is turned into an object, so you get a tree (object graph) of objects, with parents having references to their children, and children references to their parents.

These references mean that objects never naturally get garbage collected, since someone always has a reference to them.

Is there some design approach around this, or am I forced to tell users they must always manually delete their XML trees?
Whenever I have to deal with vigorously referenced objects like that, the only way I know of to force garbage collection is to auto-delete an object on a timer. There might be some way you could remove some references but leave others intact, so for example a tree's references might be set to propagate downward but not upward--even temporarily, for the purposes of aiding garbage collection.

I deduced that a single-linked list won't have this problem if the head of the list is no longer referenced by anything, but a double-linked list will be impossible to remove. Unfortunately a single-linked list is harder to work with, and it only barely improves the garbage collection woes.

Ultimately this could be cleared up by BYOND incorporating the concept of internal references, where a reference needn't count for anything. A new special form of var might work for that, such as var/ref. (It'd also need a tmp counterpart, or at least to work with var/ref/tmp or var/tmp/ref.)

Lummox JR
In response to Lummox JR
Lummox JR wrote:
Whenever I have to deal with vigorously referenced objects like that, the only way I know of to force garbage collection is to auto-delete an object on a timer.

Yeah but the user may have a reference to it, for an arbitrary amount of time...this would be doable if I could get my hands on the reference count, though. Then the library could track all existing XML trees, and on a timer it could check for those where the root had no more references than it had children, and delete it.

Unfortunately a single-linked list is harder to work with, and it only barely improves the garbage collection woes.

Yeah you can't seriously navigate without knowing your parent.


Ultimately this could be cleared up by BYOND incorporating the concept of internal references, where a reference needn't count for anything. A new special form of var might work for that, such as var/ref. (It'd also need a tmp counterpart, or at least to work with var/ref/tmp or var/tmp/ref.)

Now that's an interesting idea, and certainly more elegant than my proposed solution above (exposing reference counts probably only leads to evil temptations anyway).

Hmm, I wonder if this is what is meant by "weak references"; I've seen the concept bandied about in Perl, and perhaps that's why the Perl XML library I use doesn't have this problem...
In response to Deadron
Deadron wrote:
Ultimately this could be cleared up by BYOND incorporating the concept of internal references, where a reference needn't count for anything. A new special form of var might work for that, such as var/ref. (It'd also need a tmp counterpart, or at least to work with var/ref/tmp or var/tmp/ref.)

Now that's an interesting idea, and certainly more elegant than my proposed solution above (exposing reference counts probably only leads to evil temptations anyway).

Hmm, I wonder if this is what is meant by "weak references"; I've seen the concept bandied about in Perl, and perhaps that's why the Perl XML library I use doesn't have this problem...

Likely enough it is; I can't think of a more logical explanation for the term. It only makes sense that some references would be used only to track an item while it exists, but not necessarily keep it from being deleted. BYOND greatly simplifies the problem in that it can null out bad references, too, so in theory this could be an easy feature creep.

It strikes me that there's a real danger of related objects being deleted however if all relational references are weak. For example if you have a link to a branch of the tree, but no other working references, you still may need the whole tree. Therefore the weak references within the tree actually have to prevent deletion so long as any arbitrarily-connected object that uses them has references from an outside source.

The solution to this might be that weak references would have to be used in very very careful circumstances. For example, your tree might actually need to use hard downward/sibling references, but use weak upward references. The tree would then only last as long as you maintained a reference to its top node. Any time you'd want to hold onto a branch of the tree and use it later, you'd have to hold onto the top as well or risk "pruning" it (so that only the branch and its children remain accessible). So I imagine you could have something like this:
tree
var/item
var/tree/firstchild
var/tree/sibling
var/ref/tree/parent

var/tree/top // this "fixes" the branch and tree in place if set

// seize a branch and force the tree to stay in memory
proc/Grasp()
if(!parent) top=null
else
top=parent
while(top.parent) top=top.parent

// the tree can go away if all branches are out of use
proc/LetGo()
top=null

proc/LetGoAll()
LetGo()
if(firstchild) firstchild.LetGoAll()
if(sibling) sibling.LetGoAll()
That's the best I can come up with at the moment.

The way I see it, the real key to understanding this is seeing the tree as a big ball of yarn tied to the rest of the program via external references. It doesn't really matter which point in that ball of yarn is referenced; the others will be needed too. I can't really think of any good way of tracking external references so that the whole structure is aware of them.

I suppose it'd be possible, by vastly complicating the garbage collector, to propagate through weak references and ultimately decide whether to let a mass of items go. (Then all the tree's references would be weak, as I'll explain in a bit.) Imagine that objects "hang" from each other, where ultimately one of them must bear the weight of itself and those below it via an external reference. Objects they hang from may in turn hang from others, and so on. Obviously this can't be done for the entire system so it makes sense only to treat weak references this way. Iterate through a list of objects, starting with all objects that have both hard and weak external references. For every weak reference from this object to another object that may or may not be safe to delete, mark the referenced object safe from deletion and recurse.

Internally it'd look something like this:
proc/WeakReferenceGarbageCollect()
// assume all datums at this point have at least 1 external reference of some kind
var/list/foundation=new
var/list/hanging=new
var/datum/D
for(D)
if(D.hard_refcount)
if(D.weak_references.len) foundation+=D
else hanging+=D
for(var/i=1,i<=foundation.len,++i)
D=foundation[i]
for(var/datum/A in D.weak_references)
foundation+=A
hanging-=A
for(D in hanging)
del(D)
Though that's rather ugly.

Lummox JR
In response to Lummox JR
Lummox JR wrote:
The solution to this might be that weak references would have to be used in very very careful circumstances. For example, your tree might actually need to use hard downward/sibling references, but use weak upward references. The tree would then only last as long as you maintained a reference to its top node.

This seems like a good approach; and given the probably limited uses of weak references (pretty much libraries like this), it makes sense to require users to follow policies like this rather than slow down garbage collection by trying to figure out the hard problem of whether this is part of a tree with no external references.

The short-term 64k question: Is there a way to implement this in DM, so the XML Library could do it before any future BYOND changes?

At the moment I don't think so, since I have no access to the reference count, and therefore no way to know whether the root node is being held or not.
In response to Deadron
Deadron wrote:
The short-term 64k question: Is there a way to implement this in DM, so the XML Library could do it before any future BYOND changes?

At the moment I don't think so, since I have no access to the reference count, and therefore no way to know whether the root node is being held or not.

I can't think of any such way, no. The changes involved would necessarily be very internal.

Lummox JR