Access list's associated values by index and in forloops

BYOND Forums

Announcements · BYOND Help · Bug Reports · Feature Requests · Beta Testers · Beta Bugs · Developer Help · Design Philosophy · Demos & Libraries · Tutorials & Snippets · Art & Sound · Classified Ads · Game Updates · Contests & Events · Linux Talk · On Topic · Off Topic

ID:2078529

May 2 2016, 5:09 am (Edited on May 2 2016, 5:36 am)

MrStonedOne

Applies to:

DM Language

Status:

Open

Issue hasn't been assigned a status value.

Basically, I want a way to have a list be associated, while still allowing for that sweet sweet O(1) accessing of the values by index when the keys aren't relevant, mainly for forlooping thru the list.

Right now, if I want to have O(n log₂(n)) accessing by name, I need to make the list use keys, but when i know i want to loop thru all items, I have to loop thru the keys, then access the item by key, making what could be O(n) become O(n^(n log₂(n))) ie: needlessly slow.

This could be done a few ways (all requiring 511):

a var giving a new list of just values. (list.contents?)

Some special syntax for for loops to assign the value, not the key, to the provided var

Some special syntax for for loops to assign a the key,value to the provided vars. edit: (for (var/key => var/value in list) or for (var/key, var/value in list) or even for ((var/key, var/value) in list) if there is ambiguity concerns in for types. (thanks oranges)

Some special syntax for [] for value accessing (<> maybe?)

(I might have fucked up the bigO notation, but you get the general idea)

May 2 2016, 5:20 am
Optimumtact	why not a syntax like for key => value in listvar similar to php or the more pythonic for key, value in listvar that way you have the key and value available if you want to do anything with either

May 2 2016, 6:25 am
Ter13	for key, value in listvar This has my vote.

May 2 2016, 7:16 am In response to Ter13
CrimsonVision	Ter13 wrote: for key, value in listvar This has my vote.

May 2 2016, 10:01 am

Lummox JR

In theory, a dual-var for loop might be feasible. It'd require altering the parser and the code that compiles the list instructions, and of course a new list-handling instruction, but the concept is sound. (One down side: The current list code makes a temporary copy of the list, but copying the associative part wouldn't be so simple. Associations might be easier to handle as live lookups. Otherwise, we're looking at a bigger proc data structure and more proc call overhead.)

Sad to say you won't avoid any O(n log n) lookups with this method, though. The list items are stored separately from the associative list portion, which is a self-balancing binary tree. The list is traversed in order, so the items would still have to be handled in order.

The only other way I can see to handle this is to loop through the associative data directly, and ignore key order. Which would be consistent with other languages that implement these things as hash tables, I guess. However, this too would require making a copy of the associative portion of the array, or else here any changes to the array while traversing would be catastrophic.

OR, maybe when the list is copied the first time around I suppose I could try to make a double-length array based on traversing the associative tree (again ignoring key order) as soon as the for() is initiated. That might be the very cleanest solution, although it means doing this for a long list might suck; unlike a simple list operation that can do a quick copy on a user list, this would have to do a tree traversal right at the beginning.

May 2 2016, 10:14 am

MrStonedOne

I did mention a copy on write like way of handing this in another thread. http://www.byond.com/forum/?post=2028736

It could massively lower the overhead.

The basic idea is to store with the list, a pointer to the pointer any for loop is using to access this list. then you either copy on write, and assign that pointer to the copy, or copy when nestedly looped unless you want to have the pointer to a pointer be a vector of pointers to pointers.

This lets you snapshot the list, but only overhead when it's needed.

You raised some concern with special lists, but this could be handled using c++ objects and overloading to change the functionality for those special lists.

May 2 2016, 10:40 am

Lummox JR

Given how list looping works in DD, I think copy-on-write has a lot of challenges. For one thing, your concept only had one pointer, but any number of waiting procs might be waiting on the same list, so it seems like either I'd have to handle multiple such pointers or else do a copy as soon as a second loop sprung up for the same list. (And this is with leaving special lists out of the equation entirely; that'd have to be dealt with.)

It's probably doable, just not a hassle I'm particularly keen on jumping into. It's the sort of thing where I would expect lots of weird issues to appear over the course of several betas till it got corrected. Although list optimizations are so notoriously nasty, 506 "stable" has a pretty bad bug in it that wasn't fixed till 507.

The key,value list traversal though may be possible with a pre-copy. I'll pop it on my look-at list for 511.

May 2 2016, 10:47 am
Nadrew	Instead of trying to muck with the guts of for(), which already supports multiple formats so it might not be as tricky as I'm imagining, why not add a foreach() like most other languages come packed with these days? foreach(index as value in list)

May 2 2016, 10:49 am

MrStonedOne

the idea is you copy it, and hand that copy to the for loop, not the other way around.

this seems like a scary concept, i know, changing the pointer a forloop is looping thru mid loop, but its doable since you aren't working with threads so the window of change is rather small (the part where the for loop runs the game code with the item) and it happens after the loop has gotten the info it needs from the pointer and list.

you don't have to care about the item still being from the old list since if its a value type it's already copied and if it's a reference type it could change anyways and that's expected behavior.

or else do a copy as soon as a second loop sprung up for the same list.

As I already said, ya, you either do a vector of pointers to pointers, or just copy on inner loops, something that is rare enough in performance dependant game code anyways since O(n^n) is already bad and we know this.

Those cases will be extremely rare where it matters, that having them not have the optimization won't be a meaningful issue.

May 2 2016, 10:59 am

In response to Nadrew

Lummox JR

Nadrew wrote:

Instead of trying to muck with the guts of for(), which already supports multiple formats so it might not be as tricky as I'm imagining, why not add a foreach() like most other languages come packed with these days?
> foreach(index as value in list)
>

foreach() would confuse users IMO, and basically wouldn't be any easier in terms of implementation. I have a rough idea of how I might approach the for(key,value in list) case, and the main difficulty I think is the parser side of things.

May 2 2016, 11:06 am

Nadrew

I don't know how it would confuse people, as it's used in quite a few mainstream languages these days (PHP, C#, etc...), my syntax was off when I suggested it though, it could be improved to sync up with the way other languages lay it out better.
It's actually more of an alias for the for() usage that's being requested here, so perhaps it could be snuck in at the same time to provide some extra comfort to those used to using foreach().

I'm all for this suggestion one way or another though, I asked about how hard it would be back in the day and got the standard "compiler scary, baaaad" response, but that was early on and I don't think anyone wanted to go anywhere near that monster at the time.

May 2 2016, 11:12 am

In response to MrStonedOne

Lummox JR

MrStonedOne wrote:

the idea is you copy it, and hand that copy to the for loop, not the other way around.

this seems like a scary concept, i know, changing the pointer a forloop is looping thru mid loop, but its doable since you aren't working with threads so the window of change is rather small (the part where the for loop runs the game code with the item) and it happens after the loop has gotten the info it needs from the pointer and list.

you don't have to care about the item still being from the old list since if its a value type it's already copied and if it's a reference type it could change anyways and that's expected behavior.

I'm not too worried about changing the loop pointers mid-loop; that part seems reasonable. I'm more worried about catching all the possible write cases.

I think for copy-on-write to work I'd also need a custom list mode (for loops have a "list type" variable) to say that it was working with a raw list instead of an internal copy. Of course--and this is the bad part--that format would have to grab the list pointers each time.

Here's what I mean, as far as the internals go: The proc data struct has a bunch of info on the list it's iterating:

// this is only a rough overview
int flist_type;
Value *flist;   // copied list; proc data struct owns these values and the pointer
int flistS;     // list item count
int flistI;     // iterator index
ItemList *flistL;   // for linked lists (don't ask)
Value fulist;   // for raw access; proc owns this value

To avoid complicating the routine that frees proc data, I don't want to require it to check the type to see if it needs to free flist and decref its items (which it does now), or if it can skip them. So my concept for a copy-on-write would be a special list type that would only impact the iteration process. That changes the iteration from this:

value = pd->flist[pd->flistI++]

to:

UserList *list = GetUserList(pd->fulist);
value = list ? list->item[pd->flistI++] : NULL_VALUE;

(Also the length lookup to see if flistI is too high already has to factor in there somewhere.) In a long list, I have my suspicions that the extra step of looking up the list on each iteration (even though it's an O(1) lookup, of course) might be a problem. Or maybe not. It's very hard to gauge how CPU usage would be impacted. I certainly suspect the allocation and copy is worse for most use cases, so there's that.

Of course there's also the infinitesimal check for copy cases on list writes, which could potentially add up.