Most of the changes focus on improving performance. There are a few ways this is done:
1. Minimizing the impact of idle mobs. Movement loops are called for mobs that aren't moving, but some things can be done to make the loops take less CPU time for idle mobs.
2. General performance tweaks to reduce the CPU time needed for the most expensive procs: set_flags(), pixel_move(), and set_pos().
3. Support for a 2D-only mode. By default the library gives you 3D movement and collision detection, but not everyone will need or use this. Even if your game doesn't have jumping, the collision detection still operates on 3D bounding boxes, which takes extra CPU time.
The update adds a new file, _flags.dm, which contains some compiler flags you can set to change the behavior of the library. To enable the 2D mode, you add the line #define TWO_DIMENSIONAL and the library, when compiled, will use a different version of pixel_move that's 2D only. In fact, setting this flag disables the definition of everything related to 3D movement (pz, pdepth, vel_z, etc.).
The update also includes a new demo (benchmarks\demo-1) which creates a number of mobs that move in random directions and bounce off walls. You can easily edit the code to change the number of mobs that are spawned. This demo is good for comparing the performance with and without the new compile-time flags.
Performance Improvements
Here's what the mob's default movement loop looks like. The indentation is used to show which procs are called from which other ones (ex: set_pos is called from pixel_move, pixel_move is called from movement):
movement()
    set_flags()
    gravity()
    action()
    set_state()
    pixel_move()
        set_pos()
The biggest CPU users there are set_flags, pixel_move, and set_pos.
set_flags()
I was able to improve performance quite a bit by changing it from using oview(2,src) to nearby(). I'm not sure why it was using oview(); there may have been a reason for it. Here's the performance difference:
using oview(2,src), 377 μs per call
using nearby(), 123 μs per call
Note: 1 second = 1000 milliseconds (ms), 1 ms = 1000 microseconds (μs)
Note: All timing measurements were done while running the demos on a 2.01 GHz CPU.
set_flags is one of the most expensive procs in the library (even after this improvement). The good news is that calling set_flags can often be avoided. Generally, mobs that can't jump or fall (ex: most projectiles) don't need to call set_flags. The bad news is that because set_flags is often not needed, improving its performance will often have no effect.
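As a rough illustration of skipping set_flags, here is a minimal sketch of a projectile that overrides the default movement loop listed earlier. The type name mob/projectile is hypothetical, leaving out gravity() is my assumption for a mob that can't fall, and passing vel_x, vel_y, and vel_z to pixel_move() assumes those vars exist in the default 3D build. Check the library's own movement() before copying this.

```dm
// Hypothetical projectile type; not part of the library.
mob/projectile
    // Override the default movement loop and leave out set_flags().
    // gravity() is also left out here on the assumption that a
    // projectile never falls.
    movement()
        action()
        set_state()
        // Assumption: vel_x and vel_y exist alongside vel_z (3D build).
        pixel_move(vel_x, vel_y, vel_z)
```

Going by the timings above, a mob using a loop like this would skip the 123 μs that set_flags costs on every call.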
set_pos()
I was surprised that set_pos had become a CPU hog. I found that this code was the reason why:
var/list/_bottom = below(1)
for(var/atom/a in _bottom)
    if(can_bump(a))
        if(a in bottom)
            bottom[a] += 1
            a.stepping_on(src, bottom[a])
        else
            bottom[a] = 1
            a.stepped_on(src)
for(var/atom/a in bottom)
    if(!(a in _bottom))
        bottom -= a
        a.stepped_off(src)
This is the code that keeps track of what turfs the mob is standing on and calls the stepped_on, stepped_off, and stepping_on events.
I changed it to this:
if(moved)
    // the way this is, set_pos should take about 50% of the
    // time that pixel_move takes.
    if(on_ground || was_on_ground)
        was_on_ground = on_ground
        var/list/_bottom = below(1)
        for(var/atom/a in _bottom)
            if(can_bump(a))
                if(a in bottom)
                    bottom[a] += 1
                    a.stepping_on(src, bottom[a])
                else
                    bottom[a] = 1
                    a.stepped_on(src)
        for(var/atom/a in bottom)
            if(!(a in _bottom))
                bottom -= a
                a.stepped_off(src)
else
    for(var/atom/a in bottom)
        bottom[a] += 1
        a.stepping_on(src, bottom[a])
This improves performance in two ways:
1. The full check only runs when the mob moves. If your game has a lot of idle mobs this'll make a huge difference.
2. The full check only runs when the mob is on the ground (that's a requirement to be stepping on something). This also means that mobs that don't call set_flags (e.g. projectiles) won't have on_ground set, so they'll never run it.
I ran the demo-1 benchmark with the two different versions of the code I've shown here:
The unoptimized one (the first one): set_pos() took 119 μs per call
The optimized one (the second one): set_pos() took 22 μs per call
Unfortunately that's not as impressive as it looks. Most of the mobs in this benchmark don't use set_flags, so they never execute the code that checks for stepping on atoms.
pixel_move()
pixel_move is hard to optimize because it's always essential: every mob that moves has to call it. The bright side is that because every mob calls it, improvements to pixel_move have a bigger impact.
If you call pixel_move(0,0,0), which means that you're asking it to move the mob by zero pixels, the proc will exit earlier and avoid doing some unnecessary work.
I also added a compile-time flag called TWO_DIMENSIONAL. When TWO_DIMENSIONAL is enabled, collision detection is treated as purely 2D. By default the library gives you 3D movement and collision detection - players can move around the x-y plane and jump. Some games won't use this (ex: Zelda 1), so the time spent checking the third dimension is all wasted. By setting the TWO_DIMENSIONAL flag, the pixel_move proc uses some alternative logic* to avoid this unnecessary processing.
* Because TWO_DIMENSIONAL is a compile-time flag, the library gets compiled with the alternate logic. There isn't an if() statement in the code that checks TWO_DIMENSIONAL every time it runs. Instead, there are #ifdef TWO_DIMENSIONAL compiler directives that change what code gets compiled.
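To make the compile-time distinction concrete, here is a small sketch of how an #ifdef picks one branch before the code is ever run. The proc describe_mode() is hypothetical and only exists for this example. The library's real #ifdef blocks wrap its movement code instead.

```dm
// Hypothetical proc, not part of the library. Only one of the two
// return statements ends up in the compiled game.
mob/proc/describe_mode()
#ifdef TWO_DIMENSIONAL
    // Compiled only when TWO_DIMENSIONAL has been #define'd.
    return "2D only"
#else
    // Compiled only when TWO_DIMENSIONAL has not been defined.
    return "3D"
#endif
```

Since the unused branch is removed by the preprocessor, there is no per-call cost for checking the flag.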
Obviously, if you do use the 3D movement you'll see no performance gain. But, for games that can use the TWO_DIMENSIONAL flag, here are some numbers for how long pixel_move took based on running the demo-1 benchmark:
Without TWO_DIMENSIONAL, 155 μs per call
With TWO_DIMENSIONAL, 71 μs per call
With the TWO_DIMENSIONAL flag enabled, it's very likely that you don't need to call set_flags for any mob. The library only uses it to set the on_ground var which is used for jumping. If you don't use the on_left, on_right, on_top, and on_bottom vars then you don't need to call set_flags. You can put #define NO_FLAGS to disable the flags vars/procs (atom.flags, atom.flags_left, atom.flags_right, etc., mob.on_ground, mob.on_left, etc., mob.set_flags()).
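For reference, this is roughly what enabling both options might look like. I'm assuming the defines go in _flags.dm, since that is the file described as holding the compiler flags. The actual contents and layout of that file are not shown here.

```dm
// _flags.dm (assumed location for these lines)

// Compile the 2D-only version of pixel_move and drop the 3D vars.
#define TWO_DIMENSIONAL

// Drop the flags vars/procs, including mob.set_flags().
#define NO_FLAGS
```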
Conclusion
// before
movement()        // 592 (592 = 377 + 10 + 30 + 20 + 155)
    set_flags()   // 377
    gravity()     // 10
    action()      // 30
    set_state()   // 20
    pixel_move()  // 155

// after
movement()        // 338
    set_flags()   // 123
    gravity()     // 10
    action()      // 30
    set_state()   // 20
    pixel_move()  // 155

// after (with TWO_DIMENSIONAL defined)
movement()        // 244
    set_flags()   // 123
    action()      // 30
    set_state()   // 20
    pixel_move()  // 71

// after (with TWO_DIMENSIONAL and NO_FLAGS defined)
movement()        // 121
    action()      // 30
    set_state()   // 20
    pixel_move()  // 71
The first block shows what the performance used to be, in μs per call (the values for action, gravity, and set_state are made up, but none of them changed, so that overhead is constant). The second block shows the new performance. The third and fourth blocks show how much further performance improves when you can make use of the TWO_DIMENSIONAL and NO_FLAGS compiler options.
If we use the numbers from the last block (121 μs for each mob's movement loop), here's what the CPU usage will be at different framerates with different numbers of mobs:
fps   mobs   CPU
40    25     12%
40    50     24%
40    75     36%
40    100    48%
30    25     9%
30    50     18%
30    75     27%
30    100    36%
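The CPU figures come from multiplying the per-mob cost by the number of mobs and the number of ticks per second. Here is a minimal sketch of that arithmetic. The proc and verb names are hypothetical, and it assumes every mob's movement loop runs once per tick at a constant 121 μs.

```dm
// Hypothetical helper: percentage of one second spent in movement loops.
// fps * mobs * us_per_loop is microseconds of work per second.
// Dividing by 1,000,000 gives a fraction; times 100 gives a percentage,
// so the combined divisor is 10,000.
proc/cpu_percent(fps, mobs, us_per_loop = 121)
    return fps * mobs * us_per_loop / 10000

// Hypothetical verb that prints one line per fps/mobs combination.
mob/verb/show_cpu_table()
    for(var/fps in list(40, 30))
        for(var/mobs in list(25, 50, 75, 100))
            world << "[fps] fps, [mobs] mobs: [round(cpu_percent(fps, mobs), 1)]%"
```

For example, 40 fps with 25 mobs gives 40 * 25 * 121 = 121,000 μs per second, or 12.1%, which rounds to the 12% in the table.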