ID:1988878
 
Resolved
Experimental: Var access has been changed internally to use sorted arrays instead of linked lists, in the hopes of improving speed for some intensive calculations.
Applies to: Dream Daemon
Status: Resolved (511.1348)

This issue has been resolved.
I'm making a thread for this from the off-topic discussion in ID:1955485 as a reminder for 510.

BYOND stores datum variables in two places:

Once, on the type, as an unsorted array holding the compile-time values (this is what initial() uses); and again, as a linked list on the actual internal datum holding only the variables CHANGED from initial().

When accessing a variable, BYOND first has to search the changed vars, but because they're in a linked list, it has to traverse the list node by node to find the right one. (This is slower than looping through every item of a normal array, because it's harder for the CPU cache to know what part of memory you're going to access next; linked-list nodes are spread all throughout memory.)

If it doesn't find it there, it checks the initial() list for the type, then the parent type, and so on up the chain.
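
To make that concrete, here's a minimal C++ sketch of the lookup order just described. Every name here (ChangedVar, TypeProto, lookup_var) is invented for illustration; the real engine internals aren't public:

```cpp
struct Value { int type; int data; };

// Hypothetical node in the per-datum linked list of changed vars.
struct ChangedVar {
    int name_id;        // interned ID of the var name
    Value value;
    ChangedVar* next;   // nodes scattered across the heap -> cache misses
};

// Hypothetical compile-time var table attached to each type.
struct TypeProto {
    int var_count;
    const int* var_name_ids;     // unsorted, compile-time order
    const Value* initial_values; // what initial() reads
    const TypeProto* parent;
};

// The two-tier lookup: changed list first, then walk up the type chain.
const Value* lookup_var(const ChangedVar* changed, const TypeProto* type, int name_id) {
    for (const ChangedVar* n = changed; n; n = n->next)  // pointer chasing
        if (n->name_id == name_id) return &n->value;
    for (const TypeProto* t = type; t; t = t->parent)    // initial() fallback
        for (int i = 0; i < t->var_count; ++i)
            if (t->var_name_ids[i] == name_id) return &t->initial_values[i];
    return nullptr;  // undefined var
}
```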

There are three avenues for optimization there.

One, you can resolve the type and index of variables at compile time for the initial() list and store it in the .dmb attached to each var access where the var is typed (and fall back to the old lookup if there is a type mismatch at runtime).

Two, you can store the list of changed variables in an array rather than a linked list, so the CPU cache can do its job and accesses get faster.

Three, you can combine both of the above by storing the complete var list on the datum, rather than just the changed list, as an array, and having the compiler resolve which index each variable access lands on in that internal array and store that in the .dmb (and fall back to an array search if there is a type mismatch at runtime).
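
A hedged sketch of what option three might look like, assuming the compiler can emit a (type, slot) pair for each var access it can resolve; every name here is hypothetical:

```cpp
#include <cstddef>
#include <vector>

struct Value { int type = 0; int data = 0; };

// Hypothetical type record: ancestors' vars are merged in at compile
// time, so every var of every ancestor has a fixed slot.
struct TypeProto {
    int type_id;
    std::vector<int> var_name_ids;  // slot -> interned var name
    std::vector<Value> defaults;    // slot -> compile-time initial value
};

struct Datum {
    const TypeProto* type;
    std::vector<Value> vars;  // complete copy: more memory, no search
    explicit Datum(const TypeProto* t) : type(t), vars(t->defaults) {}  // the extra New() cost
};

// Fallback for when the compile-time type guess was wrong at runtime.
Value search_by_name(const Datum& d, int name_id) {
    for (std::size_t i = 0; i < d.type->var_name_ids.size(); ++i)
        if (d.type->var_name_ids[i] == name_id) return d.vars[i];
    return Value{};  // undefined var
}

// The .dmb would carry (expected_type_id, slot) for each resolvable access.
Value read_var(const Datum& d, int expected_type_id, int slot, int name_id) {
    if (d.type->type_id == expected_type_id)
        return d.vars[slot];            // fast path: direct index, no lookup at all
    return search_by_name(d, name_id);  // type mismatch: array search fallback
}
```

The fast path is a single array index; only a mismatched type pays for a search.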

I would REALLY love to see the third option, even if it's behind a #define flag of some kind.

It would increase memory usage and slightly increase New() cost, but massively decrease variable access overhead.

That being said, options 1 and 2 would also work nicely.

The approach I plan to take is twofold:

First, your second option is on my radar. Replacing the linked list with an array has a few potential downsides on write, but on the whole I think it'll make reads faster.

The other thing I want to do is optimize the proto-object lookup by not only sorting its var list, but including its parent type's vars automatically (the ones not overridden) rather than making it search up the chain. (I'd also need to take out the code that moves up the chain, of course.) Since type vars don't change at runtime, that's perfectly doable.
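
Roughly what that flattened, pre-sorted proto-object table could look like (names invented for the example): one sorted array per type, with the parent's non-overridden vars already merged in, so a lookup is a single binary search with no chain walking:

```cpp
#include <algorithm>
#include <vector>

struct Value { int type = 0; int data = 0; };

// Hypothetical flattened proto: built once, then read-only, which is
// safe because type vars never change at runtime.
struct FlatProto {
    std::vector<int> name_ids;    // sorted; includes inherited, non-overridden vars
    std::vector<Value> initials;  // parallel to name_ids
};

const Value* initial_lookup(const FlatProto& p, int name_id) {
    auto it = std::lower_bound(p.name_ids.begin(), p.name_ids.end(), name_id);
    if (it == p.name_ids.end() || *it != name_id) return nullptr;
    return &p.initials[it - p.name_ids.begin()];  // O(log n), no parent walk
}
```
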
EDIT: I took way too long to write this; never mind all of it. Leaving it here for the sake of leaving it.

MrStonedOne wrote:
If it doesn't find it there, it checks the initial() list for the type, then the parent type, and so on up the chain.

LummoxJR wrote:
[...] After the changed vars are looked up, it then starts searching up the tree. (That can be improved too, IMO, by "flattening" the search when the world is loaded.)

Note that I'm assuming here that none of your three paths for optimization were meant to address this. I'd be curious to see some kind of statistics on how far up the type tree, on average, parent lookups of procs/vars have to go to find the latest defined initial value, and whether this part could alternatively be improved with some kind of internal vtable.

The pro being that it's probably easier/cleaner to do this at compile time, when there's not potentially a lot of other init to be done before a world is usable.

Depending on how it's used on the VM side, the overhead may not be bad at all. Only do a lookup on the vtable if the search of the type-local/instance-local lists failed? Worst-case performance may be not so great, but for some use cases you'd see an improvement on average.

Have dirty vars write back out to the vtable, and have var access always do a lookup against the vtable before attempting any lookup of the type-local/instance-local lists? That may not be too bad. Obviously I don't know anything about how the VM stores vars itself, but perhaps record (alongside the var's info, if possible, for some delicious locality there) whether or not this specific var has already been written to the instance's vtable. You've got an extra byte of overhead for that record, but it could save you from having to needlessly update the vtable. My guess would be that this approach would, on average, benefit most games of decent scale.
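
One possible concrete reading of that dirty write-back idea, with every structure and name invented for the example:

```cpp
#include <cstdint>
#include <vector>

struct Value { int type = 0; int data = 0; };

// Hypothetical per-instance vtable: one slot per var the type knows about.
struct VarSlot {
    Value value;
    std::uint8_t written = 0;  // the "extra byte": written back yet?
};

struct InstanceVtable {
    std::vector<VarSlot> slots;  // indexed by a per-type slot number
};

// On write: update the slot and remember we did, so repeat writes stay cheap.
void write_var(InstanceVtable& vt, int slot, const Value& v) {
    vt.slots[slot].value = v;
    vt.slots[slot].written = 1;
}

// On read: check the vtable first; only fall back to the slower
// type-local/instance-local list search if nothing was written here.
Value read_var(const InstanceVtable& vt, int slot, const Value& fallback) {
    if (vt.slots[slot].written) return vt.slots[slot].value;
    return fallback;  // stand-in for the old list lookup
}
```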

Word vomit, I'll come back later when I'm not vomiting word.
My plan for optimizing the type tree lookup is to do that at world init. The number of types would never be high enough to make that costly. Internally, type var lists are ID arrays, which is kind of unfortunate in another way because it's yet another layer of abstraction; I may or may not change that as well, but at the very least I think a sorted list that includes the parent type's vars is worth doing.
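
As an illustration of that world-init pass, a sketch that flattens one type by starting from its parent's already-flattened table (all names are hypothetical, and this assumes parents are processed before children):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Value { int type = 0; int data = 0; };

struct FlatProto {
    std::vector<int> name_ids;    // kept sorted throughout the build
    std::vector<Value> initials;  // parallel to name_ids
};

// Flatten one type: copy the parent's merged list, then apply this type's
// own (name, initial) pairs, overriding or inserting as needed.
// Run over every type at world init, parents before children.
FlatProto flatten(const FlatProto* parent,
                  const std::vector<std::pair<int, Value>>& own_vars) {
    FlatProto out = parent ? *parent : FlatProto{};
    for (const auto& [name_id, init] : own_vars) {
        auto it = std::lower_bound(out.name_ids.begin(), out.name_ids.end(), name_id);
        std::size_t i = it - out.name_ids.begin();
        if (it != out.name_ids.end() && *it == name_id) {
            out.initials[i] = init;  // override the parent's initial
        } else {
            out.name_ids.insert(it, name_id);  // new var on this type
            out.initials.insert(out.initials.begin() + i, init);
        }
    }
    return out;
}
```
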
Lummox JR wrote:
Replacing the linked list with an array has a few potential downsides on write, but on the whole I think it'll make reads faster.

Odds are you are already planning to do this, but you could just have it pre-initialized with the larger of n items or n% of the number of vars.

Then grow by that same metric.
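
As a sketch, that sizing rule might look like the following, with N and PCT as placeholder tuning values (the cap comes from a suggestion later in this thread):

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical sizing rule: grow by the larger of a fixed count (N) or a
// percentage (PCT) of the type's total var count, so small objects don't
// over-allocate and big ones don't reallocate constantly.
constexpr std::size_t N = 4;
constexpr std::size_t PCT = 25;

// Initial capacity is next_capacity(0, total_vars); growth reuses the rule.
std::size_t next_capacity(std::size_t current, std::size_t total_vars) {
    std::size_t step = std::max(N, total_vars * PCT / 100);
    // Cap so slack + used size never exceeds vars.len, per the later post.
    return std::min(current + step, total_vars);
}
```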

That being said, what are the chances of seeing this in 510? I'm really looking forward to doing comparisons.

(in fact, if you want you can always pager me with test builds to play with =) )

edit: It just occurred to me that this would likely also improve the del overhead.
var access overhead, a visual: http://pastebin.com/pP39pLDj
Very interesting data. It shows that the overhead is non-negligible over a very high number of iterations.

One thing I'm curious about is what would happen if the current linked list were changed to a direct array, with no other changes to the logic. It seems likely that CPU caching would have a small but not insignificant impact on that.

A binary lookup on a direct array would be even better, although I do think there's potential for some games to perform worse with that change. The minor overhead of the binary search might negate the benefits of avoiding a simple loop, especially when accessing the first var in the list (where the simple loop is fastest).
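
For illustration, the two lookups being weighed against each other, over a hypothetical sorted array of changed vars:

```cpp
#include <algorithm>
#include <vector>

struct ChangedVar { int name_id; int value; };  // kept sorted by name_id

// Simple loop: unbeatable when the hit is near the front of a short array.
const ChangedVar* linear_find(const std::vector<ChangedVar>& a, int name_id) {
    for (const ChangedVar& v : a)
        if (v.name_id == name_id) return &v;
    return nullptr;
}

// Binary search: O(log n), but with setup/branch overhead that can lose
// to the loop above on very small lists or first-element hits.
const ChangedVar* binary_find(const std::vector<ChangedVar>& a, int name_id) {
    auto it = std::lower_bound(a.begin(), a.end(), name_id,
        [](const ChangedVar& v, int id) { return v.name_id < id; });
    if (it == a.end() || it->name_id != name_id) return nullptr;
    return &*it;
}
```
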
I could see the overhead in sorting them... but that would happen at most vars.len times per object...

it's extremely unlikely that a project would be modifying vars more often than reading them in a performance-dependent setting...

Edit: You could wait to sort until the first read after a write, and use some dirty marking to cut down on that overhead.

(in fact, if you did that with strings and string table searching you'd cut down the overhead there as well.)
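
A sketch of that sort-on-first-read-after-a-write idea; the structure and names are invented for the example:

```cpp
#include <algorithm>
#include <vector>

struct ChangedVar { int name_id; int value; };

struct ChangedVars {
    std::vector<ChangedVar> items;
    bool dirty = false;  // set on append, cleared by the next read's sort

    // Writes just overwrite or append and mark dirty - no sort cost here.
    void set(int name_id, int value) {
        for (ChangedVar& v : items)
            if (v.name_id == name_id) { v.value = value; return; }
        items.push_back({name_id, value});
        dirty = true;
    }

    // The first read after a write pays for one sort; later reads binary-search.
    const ChangedVar* get(int name_id) {
        if (dirty) {
            std::sort(items.begin(), items.end(),
                [](const ChangedVar& a, const ChangedVar& b) { return a.name_id < b.name_id; });
            dirty = false;
        }
        auto it = std::lower_bound(items.begin(), items.end(), name_id,
            [](const ChangedVar& v, int id) { return v.name_id < id; });
        return (it != items.end() && it->name_id == name_id) ? &*it : nullptr;
    }
};
```
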
So Tobba was pointing out that change 2 would also give a non-trivial reduction in the memory usage of datum/atom vars, by dropping the pointer to the next var and the pointer to the actual contents of the var (assuming you use an array of vars rather than an array of pointers to vars).

This might pave the way toward doing change 3 (maybe as a compile-time define).
The next pointer would be a small savings, it's true. I think there'd also be a small savings in fragmentation issues, although that balances out with the fact that new var allocations would require the array to be reallocated, and if there's slack size (I'd likely do that) to avoid that it could end up being a little more memory. I'm honestly not sure how it'll shake out, but I do think the CPU cache is worth aggressively going after regardless.
We have objects that have 15 to 20 changed vars, and lots of them, so I do think the slack-size overhead wouldn't cut too much into the memory savings...

(as long as you make it short-circuit so an object never ends up with slack + real size higher than vars.len)
As an update: last time I talked with Lummox, he said he would be getting to this a few releases into 510, once he has the bugs of 510 worked out and can focus exclusively on this without distractions, ensuring it gets done right and everything is changed over.
I always wondered why we used a flexible list of changed variables under the hood.

My experimentation with writing objects into a VM I was building for research showed that the best solution was one quite similar to what C++ uses, in that classes contain the data that informs the structure, and instances themselves are just a region of memory containing only the mutable properties.

DM's design, I always felt, was overly flexible for no significant benefit, because DM doesn't take adequate advantage of a flexible object structure the way JavaScript or Lua does (I know, Lua doesn't have objects, it's tables and metatables, but same thing really).
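
As a toy version of that fixed-layout approach (everything here is illustrative, not BYOND's actual internals): the class is pure metadata, and an instance is nothing but its mutable fields at known offsets:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct FieldInfo { const char* name; std::size_t offset; };

// The "class" holds the structure once; instances never repeat it.
struct ClassInfo {
    std::size_t instance_size;  // total bytes per instance
    const FieldInfo* fields;
    std::size_t field_count;
};

// An instance is only mutable state; the caller frees it with std::free().
void* new_instance(const ClassInfo& c) {
    return std::calloc(1, c.instance_size);
}

// Reading a field is just pointer + offset - no var list, no search.
int read_int_field(const void* instance, const ClassInfo& c, std::size_t field) {
    int out;
    std::memcpy(&out, static_cast<const char*>(instance) + c.fields[field].offset, sizeof out);
    return out;
}
```
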
I ran the tests: if we got rid of the changed-vars model, SS13 for example would go from using 600MB to 2.1GB (just under the 32-bit single-process memory usage limit)

and the moment BYOND did any of its internal maintenance, everything would crash as it went over.

I tested by making a test proc that looped through everything in world and, for each object, looped through every var in .vars, setting each to itself (sans blacklisted read-only vars like vars, type, or the appearance vars).
The last thing BYOND devs need is to have to start doing the math that adding vars equals xyz*12 extra memory for that object and all child objects.

You do want people to use OOP, right? And not start relying on : and try/catch because they add vars on as late in the object tree as they can to avoid memory overhead.
@StonedOne: This sounds like a having/eating cake situation. The necessary sacrifice for memory savings is multiple lookups, causing a CPU hit. The sacrifice for speed is consistent in-memory data alignment, causing a memory hit.

MrStonedOne wrote:
they add vars on as late in the object tree as they can to avoid memory overhead.

This is gonna give me Forest Whitaker eye. Jamming variables in at too low a level and not properly taking advantage of polymorphism is in itself the root of this complaint.

There are situations where an istype() is necessary in DM, but over-reliance on it is a symptom of avoiding proc call overhead and working around DM's lack of inlining.

I can understand your point, though, given how far up the creek you are in terms of project size.
I for one welcome increased memory usage if it means cutting down a good chunk of cpu usage.
In response to Clusterfack
Clusterfack wrote:
I for one welcome increased memory usage if it means cutting down a good chunk of cpu usage.

I'd agree with that to an extent but...

MrStonedOne wrote:
I ran the tests: if we got rid of the changed-vars model, SS13 for example would go from using 600MB to 2.1GB (just under the 32-bit single-process memory usage limit)

I don't agree with pushing BYOND's biggest breadmaker into an even more unstable situation.
SS13 is BYOND's biggest game, but I'm not sure about being the biggest breadmaker. Much of the SS13 community are not Members, and while they drive more ad revenue I'd much prefer to be getting Memberships and subscription percentages. (SS13 isn't a game that can use BYOND's subscription system directly, but it'd be nice if some more Member perks were built in to the various builds.)
In response to Lummox JR
Lummox JR wrote:
SS13 is BYOND's biggest game, but I'm not sure about being the biggest breadmaker. Much of the SS13 community are not Members, and while they drive more ad revenue I'd much prefer to be getting Memberships and subscription percentages. (SS13 isn't a game that can use BYOND's subscription system directly, but it'd be nice if some more Member perks were built in to the various builds.)

At /tg/station13 I recently (about two months ago) redid how we do orbit()s, and was able to add unique shapes to ghost orbits for members.

But things like that and more save slots are all we feel comfortable adding as benefits for members. We don't want members to have an advantage; it's just plain not fair.