ID:1549601
 
Resolved
When objs and mobs were deleted, their deletion could take longer than it should if any images were in use in the world, even if the obj/mob itself had no references left.
BYOND Version:506
Operating System:Windows 7 Ultimate 64-bit
Web Browser:Firefox 28.0
Applies to:Dream Seeker
Status: Resolved (506.1244)

This issue has been resolved.
Descriptive Problem Summary:

Object initialization in BYOND depends upon a series of reference numbers. When a reference number is freed, it may not be contiguous. Therefore, in order to keep track of these reference IDs that have been freed, ether a search for an available reference must be performed, or a collection of discarded references must be maintained. (more likely).

The result of this, is that creating games with dynamic maps, or repeated creation and deletion of objects results in progressively longer object initialization times. This would single-handedly account for Space Station 13's progressively worsening performance, and limits the viability of things like particle systems.


The testing environment:

The following test environment will prove without a doubt that the creation of a large number of objects, and their subsequent deletion will result in permanently impacted performance.

The following test environment will also prove without a doubt that the issue is not due alone to the number of objects in existence, but rather the number of dead references. Ironically, performance is less hindered by having a lot of objects in your world, than by removing a lot of objects from your world. By a more than significant margin.

var
list/prefill[1000000]
objscreated = 0
runtime = 0
lastobj = null
avobjs = 0

mob
Stat()
stat("count:",num2text(objscreated,10))
stat("runtime:","[runtime / 10] seconds")
stat("lastobj:",lastobj)
stat("average:","[avobjs] per second")

client
verb
pre_fill()
world << "Object buffer being filled..."
sleep(world.tick_lag)
var/obj/o
var/start = world.timeofday
objscreated = 0
runtime = 0
lastobj = null
for(var/count=1;count<=1000000;count++)
o = new()
global.prefill[count] = o
global.lastobj = "\ref[o]"
global.runtime = world.timeofday - start
global.objscreated++
if(global.runtime>0)
avobjs = objscreated / (runtime / 10)
world << "Object buffer filled."

clear_fill()
world << "Object buffer will be cleared..."
sleep(world.tick_lag)
var/l[1000000]
global.prefill = l
world << "Object buffer cleared."

massobj()
world << "Object mass reference/dereference begin..."
sleep(world.tick_lag)
var/obj/o
var/start = world.timeofday
objscreated = 0
runtime = 0
lastobj = null
for(var/count=0;count<1000000;count++)
o = new()
global.lastobj = "\ref[o]"
global.runtime = world.timeofday - start
global.objscreated++
if(global.runtime>0)
avobjs = objscreated / (runtime / 10)
world << "Object mass reference/dereference complete."
massmob()
world << "Mob mass reference/dereference begin..."
sleep(world.tick_lag)
var/mob/o
var/start = world.timeofday
objscreated = 0
runtime = 0
lastobj = null
for(var/count=0;count<1000000;count++)
o = new()
global.lastobj = "\ref[o]"
global.runtime = world.timeofday - start
global.objscreated++
if(global.runtime>0)
avobjs = objscreated / (runtime / 10)
world << "Mob mass reference/dereference complete."
Case 1: Initial mass testing:

Sequence of verbs:

1) massobj
2+) massobj
last) massmob

You will note that in this testing sequence, 1 million objects are created in sequence, and not allowed to be cleared.

The first massobj test results on my machine in 1 million objects being created in approximately 21 seconds. This is an approximate object creation rate of 47,000 objects per second.

Every subsequent massobj test will result in 1 million objects being created in approximately 780 seconds, despite the fact that none of the objects from the previous test are preserved. You will also notice that the objects do not reclaim recently-dereferenced ids, but instead consume reference ids that were abandoned prior to the current tick. This behavior results in less than 1400 objects being created per second thereafter until the world is rebooted.

You will also see that the first massmob test you run will result in 1 million mobs being created and dereferenced in roughly 21 seconds again, back up to 47,000 objects per second.

However, every subsequent massmob test you run will similarly climb just as the massobj test did.

Interpretation:

This indicates that finding the reference id for the individual object is to blame. Mobs and Objs use separate means of tracking references, and therefore, also demonstrates that there is little effect on initialization of objects relating to the number of objects in the world as a whole. It also indicates that garbage collection itself as a process cannot be the cause of the increased CPU usage.

Beta 506.1243

1) massobj: 1M @ 3.4 seconds, 294,118/sec, lastobj: 0x2000001 (mem stable at 15 megs)
2) massobj: 1M @ 3.4 seconds, 294,118/sec, lastobj: 0x2000000 (mem stable at 15 megs)
3) massmob: 1M @ 3.5 seconds, 285,714/sec, lastobj: 0x3000002 (mem stable at 15 megs)

Test 1 results in expected behavior with added bonus of massive speed improvements on the order of 500-600% faster than before.
Case 2: Increasing the object count

sequence of verbs:

1) pre fill
2) massobj
3+) massobj
4) massmob

In this test scenario, you will note that the list of 1 million objects is filled prior to mass-testing.

The pre-fill takes on average, 25 seconds, which is our standard rate of 47,000 objects per second on my test machine.

The very next iteration of massobj results in 77.4 seconds on average, which is a rate of 12919 objects per second.

The third iteration results in again, around 780 seconds, or our <1400 objects per second.

Our final test of massmobs results in 25 seconds again, back down to our 47,000 objects per second rate.

Interpretation:

This indicates that while we do lose some not insignificant speed from having a large number of objects in the world, it is not the biggest impact on performance.

Beta 506.1243

1) prefill: 1M @ 3.4 seconds, 285,714/sec, lastobj: 0x20F4240 (mem 15megs -> 118megs)
2) massobj: 1M @ 3.7 seconds, 270,270/sec, lastobj: 0x20F4242 (mem stable at 118megs)
3) massobj: 1M @ 3.6 seconds, 277,778/sec, lastobj: 0x20F4241 (mem stable at 118megs)
4) massmob: 1M @ 3.7 seconds, 270,270/sec, lastobj: 0x3000002 (mem stable at 118megs)
Case 3: Reversing the creep

Sequence of verbs:

1) massobj
2) pre fill
3) massobj
4) massobj

In this test scenario, you will see that the first massobj results in our standard 25 seconds @ 47,000 objects per second.

Running a pre fill after the initial massobj test took an average of 600 seconds, which was barely an improvement at 16000 objects per second.

Running the massobj test again results in a final result of approximately 98 seconds. This is a rate of 13,000 objects per second.

Running the massobj test for a third time reverts us back to 732 seconds, back down to about 1400 objects per second.


And as an added bonus, you'll notice that running clearfill and then re-running massobj, will give us times of 734.7 seconds once again, or about 1400 objects per second.

As always, running massmob indicates that mobs are not affected by the object reference count, nor the number of objects in the world at all.

Interpretation:

This test proves definitively that object deletion isn't the culprit. Instead, it shows that binding an reference ids for nearly 75% of the CPU usage involved in managing object instances.

It also proves that the issue is in binding ids alone, as the problem reverses itself the moment the excess freed IDs are removed from the equation by re-binding them.

Beta 506.1243:

1) massobj: 1M @ 3.5 seconds, 285,714/sec, lastobj: 0x2000001 (mem stable at 15 megs)
2) prefill: 1M @ 3.5 seconds, 285,714/sec, lastobj: 0x2000001 (mem 15megs -> 118 megs)
3) massobj: 1M @ 3.4 seconds, 294,118/sec, lastobj: 0x20F4242 (mem stable at 118megs)
4) massobj: 1M @ 3.7 seconds, 270,270/sec, lastobj: 0x20F4241 (mem stable at 118megs)
I've taken the liberty of creating a few new tests to prove that manual object deletion should frankly, never be done. Ever, in any situation by any developer.

var
obj/linkobj/rootlink

obj/linkobj
var
obj/nextobj
obj/lastobj
client
verb
linkobj()
world << "Object mass link creation begin..."
sleep(world.tick_lag)
var/obj/linkobj/o
var/obj/linkobj/lo = new()
global.rootlink = lo
var/start = world.timeofday
objscreated = 1
runtime = 0
lastobj = "\ref[lo]"
for(var/count=1;count<1000000;count++)
o = new()
o.lastobj = lo
lo.nextobj = o
global.lastobj = "\ref[o]"
global.runtime = world.timeofday - start
global.objscreated++
if(global.runtime>0)
avobjs = objscreated / (runtime / 10)
lo = o
world << "Object mass link creation complete."

unlinkobj()
world << "Object mass link breaking begin..."
sleep(world.tick_lag)
var/obj/linkobj/o = rootlink
var/obj/linkobj/no = rootlink.nextobj

var/start = world.timeofday
objscreated = 0
runtime = 0
lastobj = null
for(var/count=0;count<1000000;count++)
global.lastobj = "\ref[o]"
del o
global.runtime = world.timeofday - start
global.objscreated++
if(global.runtime>0)
avobjs = objscreated / (runtime / 10)
o = no
no = o.nextobj

world << "Object mass link breaking complete."


This test will demonstrate that the deletion of objects with references is reasonably fast enough.

Order:

1) pre-fill
2) linkobj
3) unlinkobj

Prefill runs at the expected 20.2 seconds, that's over 49,000 objects per second.

The subsequent linkobj runs at around 56.4 seconds, which is over 17,000 objects per second.

Unlinkobj ran for 30 minutes before I gave up on it. I then modified unlinkobj to simply run through and force garbage collection to occur. Without generating the first million objects, forcibly deleting the objects took over half an hour before abandoning it. Meanwhile, simply unlinking the objects and allowing them to be garbage collected took only 3.4 seconds.

Interpretation:

Automatic garbage collection is the only answer. If you are using the del keyword, you are going to murder your performance. Upon reaching the millions of objects, deleting even a single object can begin to take a very, very long time. Simply removing references to the object when it is no longer required is far better, and will not result in significant CPU consumption, whilst still releasing the object for reuse.

Beta 504.1243

1) prefill: 1M @ 3.5 seconds, 285,714/sec, lastobj: 0x20F4240 (mem 15megs -> 118megs)
2) linkobj: 1M @ 4.2 seconds, 238,095/sec, lastobj: 0x21E8480 (mem 118 megs -> 284 megs)
3) unlinkobj: 1M @ 4.5 seconds... unimportant test. Manual deletion was not improved; nor need it be.
Lastly, I'd like to discuss how to avoid these issues, for the most part. In some sense, these speed issues are unavoidable. However, if you merely need to mass-create a large number of objects, which will not be preserved, you can preserve 95% of the normal speed, as opposed to permanently crippling it.

The trick, is to preserve a set number of objects in the world where possible, as well as to ensure you never delete more than a few thousand objects per frame.

var
list/prefill[1000000]
obj/linkobj/rootlink
objscreated = 0
runtime = 0
lastobj = null
avobjs = 0

mob
Stat()
stat("count:",num2text(objscreated,10))
stat("runtime:","[runtime / 10] seconds")
stat("lastobj:",lastobj)
stat("average:","[avobjs] per second")

obj/linkobj
var
obj/nextobj
obj/lastobj

client
verb
pre_fill()
world << "Object buffer being filled..."
sleep(world.tick_lag)
var/obj/o
var/start = world.timeofday
objscreated = 0
runtime = 0
lastobj = null
for(var/count=1;count<=1000000;count++)
o = new()
global.prefill[count] = o
global.lastobj = "\ref[o]"
global.runtime = world.timeofday - start
global.objscreated++
if(global.runtime>0)
avobjs = objscreated / (runtime / 10)
world << "Object buffer filled."

clear_fill()
world << "Object buffer will be cleared..."
sleep(world.tick_lag)
var/start = world.timeofday
objscreated = 0
runtime = 0
lastobj = null
for(var/count=1;count<=1000000;count++)
global.prefill[count] = null
global.runtime = world.timeofday - start
global.objscreated++
if(global.runtime>0)
avobjs = objscreated / (runtime / 10)
world << "Object buffer cleared."

massobj()
world << "Object mass reference/dereference begin..."
sleep(world.tick_lag)
var/obj/o
var/start = world.timeofday
objscreated = 0
runtime = 0
lastobj = null
for(var/count=0;count<1000000;count++)
o = new()
global.lastobj = "\ref[o]"
global.runtime = world.timeofday - start
global.objscreated++
if(global.runtime>0)
avobjs = objscreated / (runtime / 10)
sleep(-1)
world << "Object mass reference/dereference complete."

linkobj()
world << "Object mass link creation begin..."
sleep(world.tick_lag)
var/obj/linkobj/o
var/obj/linkobj/lo = new()
global.rootlink = lo
var/start = world.timeofday
objscreated = 1
runtime = 0
lastobj = "\ref[lo]"
for(var/count=1;count<1000000;count++)
o = new()
o.lastobj = lo
lo.nextobj = o
global.lastobj = "\ref[o]"
global.runtime = world.timeofday - start
global.objscreated++
if(global.runtime>0)
avobjs = objscreated / (runtime / 10)
lo = o
world << "Object mass link creation complete."

unlinkobj()
world << "Object mass link breaking begin..."
sleep(world.tick_lag)
var/obj/linkobj/o = rootlink
var/obj/linkobj/no = rootlink.nextobj

rootlink = null

var/start = world.timeofday
objscreated = 0
runtime = 0
lastobj = null
for(var/count=0;count<1000000;count++)
global.lastobj = "\ref[o]"
o.lastobj = null
no = o.nextobj
o.nextobj = null
global.runtime = world.timeofday - start
global.objscreated++
if(global.runtime>0)
avobjs = objscreated / (runtime / 10)
o = no

world << "Object mass link breaking complete."

massmob()
world << "Mob mass reference/dereference begin..."
sleep(world.tick_lag)
var/mob/o
var/start = world.timeofday
objscreated = 0
runtime = 0
lastobj = null
for(var/count=0;count<1000000;count++)
o = new()
global.lastobj = "\ref[o]"
global.runtime = world.timeofday - start
global.objscreated++
if(global.runtime>0)
avobjs = objscreated / (runtime / 10)
sleep(-1)
world << "Mob mass reference/dereference complete."


You will notice that unless we preserve and then subsequently remove millions of objects, the non-preserving functions will only cause around 10,000 objects to be deleted before allowing garbage collection.

This results in the first million objects created but not preserved repeating the same freed reference ids. This improves the speed of many of the subsequent creation methods back to around 38,000 to 42,000 objects per second.

Conclusion:

BYOND's reference ids may have been increased, but they are in dire need of severe improvement, and I think merely addressing the reference ids will result in significant speed improvements in quite a large number of games, as well as result in the likelihood of map saving and loading becoming increasingly viable in a multiplayer scenario without it requiring undue expertise to eliminate the potential for permanently destroying your performance.

Beta 506.1243

This approach no longer offers any improvement.
I can agree that deletions are one of the bigger slowdowns, seeing Del() procs on top most of the time. (While this can be our own issues, mass deletions of simple items still takes quite some time.)
In response to Laser50
Laser50 wrote:
I can agree that deletions are one of the bigger slowdowns, seeing Del() procs on top most of the time. (While this can be our own issues, mass deletions of simple items still takes quite some time.)

Oh, no, believe me you are wrong there. You'll see when my test scenarios are finished. The answer is actually much simpler than that.
I can imagine this might be a cause of the reason why we have to reboot Eternia every day or two when nothing is seemingly using much CPU but the game is running like crap and world.cpu is at something crazy like 150%-200%.

The game does have quite a lot of object creation and deletion happening and the problem does seem to be some kind of leak that gets progressively worse over time, this progressive performance loss might be the problem, or at least contributing.
You can check whether this is the case by initializing an object and then checking its \ref[obj] number. If it's in the upper millions... Which I'm sure it is due to the HUD alone, that would definitely be the cause of your problems.
I have a few other things on my plate this weekend for the game, maybe I can toss in a few debugging commands to check this when the game gets sluggish.
Like I said, it gets better. The test I'm working on shows some really interesting results... Results that further demonstrate my point that this isn't an unavoidable progressive problem, but rather, one that's completely exascerbated by poor practices.

I don't want anyone to pretend that this is a case of BYOND's failings alone, but rather, BYOND simply never being designed to handle millions of objects crossed with people not understanding that programming isn't just throwing crap on a wall and seeing what sticks. You have to work *with* your engine.
Alright, that's it for me on this subject. I feel like I've made a clear case here, and it's obvious that long-running or dynamic games in DM have what appears to be a runaway performance drain going on, which is completely due to the fact that BYOND's reference id system was only intended to handle 2-byte object indices, and as such, was never intended to scale to millions of objects.
Quick point that I didn't make here, in order to fix the reference counting issue, a distinction is going to have to be made for objects that are created, and then garbage collected in the same frame. If an object is garbage collected and it hasn't been communicated to clients yet (should happen at the end of the frame), pop the reference back on the top of the stack, which will prevent the massive swelling of the reference ids from instant object/datum creation/deletion.
I'm not sure why this merited a new thread, inasmuch as there's already a thread regarding object creation and deletion where this was being discussed. But as I mentioned in that thread, your picture of how the garbage collector works isn't entirely correct. However the O(n) loop responsible for finding an available ID is problematic in an environment like this one, and I agree having some kind of stack or queue for freed IDs would be better.
But as I mentioned in that thread, your picture of how the garbage collector works isn't entirely correct.

I felt that this merited a new thread because the information in the last thread wasn't entirely accurate to the root of the problem.

I'll take that, for sure. Seeing as I have very little functional understanding of how it works on the inside for certain. All of my understanding is based on how I'd think to implement it, as well as quite a lot of experimentation that has helped me find strange behavior that couldn't be explained by my thoughts on how it was implemented.

However, you have to admit, my discovery that creating and then subsequently dereferencing objects in the same tick does not result in the object's reference being freed for reuse until the next frame. So in some way, the process of finding an empty ID isn't taking care of things in a logical manner.
I've made some major changes to address these issues in 506.1243 for most object types. Hopefully this will result in a speed improvement for some games. I suspect that the degree of improvement will depend greatly on the game's "action quotient", so some like SS13 might see a bigger boost than a typical RPG, but it's really hard to be sure where to expect the most impact.
You know the devil's in the details and the details always tickle my fancy. Care to elucidate on the major changes themselves, and the strategies you employed to solve this issue?

Of course, at your leisure.
For the following types, I employed a stack (each) to track available IDs:

obj
mob
area
image
datum
list
idarray (used internally by lists like overlays)
string

This changes the ID choice from O(N) to O(1). I expect this to actually show the biggest improvement in string handling, as those churn much more frequently. Appearances, which also may churn in some environments, already use a totally different system that's pretty well optimized.

An improvement previously applied to ID arrays and strings has been added to datums as well, which is that the struct for them isn't freed but kept in a linked list of available structs. I did not do this for the other types yet; it's probably a minor enough change for the datums that I doubt it will amount to much there.

Objs, mobs, and images no longer use the old system we had to try and keep them around for a few ticks. (Ostensibly they would stick around at least 4 ticks. In actuality they would linger 1-4 ticks.)

I also improved some of the code used to track the extents of big-icon atoms and images, to avoid a lot of malloc/realloc/free in the map sending routines. Might make a slim difference in games that have some big icons, but it was more about simplification than optimization, as I expect the impact to be low.

In the process of this I also had to track down a couple of bugs with memory allocation that turned out to be unrelated to these changes (but I feared they were, hence the hunt)--the grid control in particular had issues with images.

In this process I also noticed that associative lists might benefit from the optimization that saves structs instead of freeing them right away, so that's a possible future target. Each item in the associative list is a struct in a binary tree. SotS II's pathfinding and AI uses frequently changing associative lists quite a lot.
Good... God, Lummox... You've absolutely outdone yourself.

Results in my test environment have been updated to demonstrate at least the object creation end of things.

I'm currently tracking a 600% improvement over the last incremental beta. This isn't a small fix like you implied, it's absolutely major.

Bravo, sir. Hats off to you.
Page: 1 2 3