ID:149640
 
I'm trying to squeeze out every bit of extra processing power I can for a new project of mine. I've optimized this proc as much as I could, but there are a few people who frequent the forum who might have new insight to shave a couple milliseconds off.

The proc I'm trying to get more out of:
        ring(px,py,radius,sd_color/color)
// Draw a ring centered at px,py, with specified radius in specified color

//calculate x and y offsets for the first 45 degrees of the circle
// the maximum dy will be radius*sqrt(2)/2, or 0.707*radius
var/sqrtwo = round(radius * 0.707)

for(var/dy = 0 to sqrtwo) // for every dy from 0 to the max

// lookup the precalculated dx for this dy and radius
var/dx = dxlookup[dy+1][radius]

// use these offsets to plot pixels in each quadrant
plot(px + dx, py + dy, color)
plot(px - dx, py + dy, color)
plot(px + dx, py - dy, color)
plot(px - dx, py - dy, color)


/* if dx = dy, it's a 45 degree angle and we
don't need to plot the complimentary angle */

if(dx == dy) continue

// swap dx and dy to plot the complimentary angle in each quadrant
plot(px + dy, py + dx, color)
plot(px - dy, py + dx, color)
plot(px + dy, py - dx, color)
plot(px - dy, py - dx, color)


This is just support code to set up the dxlookup table when the world boots:
world
New()
..()
for(var/dy = 0 to 22)
for(var/radius = 1 to 32)
var/dx = round(radius * cos(arcsin(dy/radius)))
dxlookup[dy+1][radius] = dx

var
list
dxlookup[23][32] // stores the dx for any given dy and radius


Plot() is already pretty efficient with the following profiling imformation, so I'm not worrying about it at the moment.
Proc                    Self CPU Time   Total CPU Time  Real Time       Calls
/proc/plot (total)      3.380           5.270           6.760           53085
/proc/plot (average)    0.000           0.000           0.000           53872

I can see you running into a few problems on this, but one place you can squeeze a little improvement right now is in the loop construction:
for(var/dy = 0 to sqrtwo) // for every dy from 0 to the max

Later in the loop you check if dx==dy. However, this is also a perfect place to stop. In fact if dx<=dy, you're done; in the dx<dy case, it'll draw 4 redundant pixels and then quit; this should occur only with about half the possible radii, so altogether it's not bad. (Putting in an if() to check for this special case would probably be counterproductive.) Therefore you can change the loop like this:
for(var/dy = 0 to radius)
... // draw +dx,+dy
if(dx<=dy) break
... // draw +dy,+dx (complementary)

You can eliminate sqrtwo altogether this way, shaving off another small piece of time.
The reason the <= check can still be put in the middle is that the first time dx<dy, dx=dy-1. With the angle range you're working with, dx only decrements by a single pixel each time. Therefore, dx==dy is a 45-degree angle exactly, but dx==dy-1 means you've gone just past it, the previous value being old_dx==old_dy+1.

Incidentally, you can speed your dxlookup code too. Right now you're using this:
for(var/dy = 0 to 22)
for(var/radius = 1 to 32)
var/dx = round(radius * cos(arcsin(dy/radius)))
dxlookup[dy+1][radius] = dx

Well, there are a few things you can do: The most obvious in this loop is to break after dx<dy (but make sure you assign it first); you have to rearrange the loop order to do that, with radius going first. I'd do the whole thing this way:
for(var/radius = 1 to 32)
var/dx=radius
var/over=radius
for(var/dy = 0 to radius)
dxlookup[dy+1][radius] = dx
if(dx<=dy) break
over-=dy+dy+1
if(over<0)
--dx
over+=dx+dx+1

Look confusing? It is.

The over var you see is actually a running value equal to radius*(radius+1)-dx*dx-dy*dy. The radius*(radius+1) value is the closest integer to (radius+0.5)**2, so it's basically wiggle room. I subtract from over to represent dy going up by 1 (dy*2+1 == (dy+1)**2-dy**2), and add to over to represent dx going down. I would think this would be faster than doing a cosine and a round on every value of dy. Notice there are also no actual multiplications involved. This is possible because it takes advantage of the fact that it's in a loop, so the values don't need to be recalculated from scratch each time.

The fact that DM's numbers are all in floating point might hurt the speedup a little, though, because I would think floating point addition wouldn't be quite as fast as integer addition. It should, however, still beat multiplication and cosines.

Lummox JR
In response to Lummox JR
Lummox, did you just sit down and figure this stuff out, take a class that pointed you in the right direction to solve these types of problems, or do you have a book that gives pointers on this type of thing?

As I learn more about programming, take more classes, and get more experience I'm guessing solving these types of problems will become easier (not easy, just easier) and a little more intuitive. I was just wondering where you got started in this area.
In response to English
English wrote:
Lummox, did you just sit down and figure this stuff out, take a class that pointed you in the right direction to solve these types of problems, or do you have a book that gives pointers on this type of thing?

Actually I've done some playing around with integer math square root calculations before, so it gave me a good footing for this. I thought it'd be fun to try to figure out the most optimized way of doing it.

Back in the 80s I saw an article in Compute! magazine on how to draw a circle quickly using machine language code on the Atari platform. The code was there but I didn't learn how to translate that to assembly until much later, and I never did end up seeing the assembler code--I figure it would've been interesting. The article spoke about how the symmetry of a circle can speed a lot up, but then it just gave the code and never went into specifics about how it did that. I think today when I wrote that loop I got a much better idea.

As I learn more about programming, take more classes, and get more experience I'm guessing solving these types of problems will become easier (not easy, just easier) and a little more intuitive. I was just wondering where you got started in this area.

Well the above code came from knowing something about optimization, programmatical and mathematical. I'd hardly call my dxlookup look intuitive, though. :)

Lummox JR
In response to Lummox JR
Thank you, Lummox JR. I knew if anyone could remove some delay from it, you could. I was thinking even as I posted the code that I could change that continue to a break, but I don't know if I would ever have thought to eliminate sqrtwo altogether.

Lummox JR wrote:
Incidentally, you can speed your dxlookup code too.

I'm not so concerned with the lookup generation, since it is run only once. That is a very interesting routine for generating the look up table though. That ring proc is a key factor in one of my projects though, and is being called up to 40 times a tick just in the single player version of the game. Multiplayer could concievably add 20 more times per player.

Thank you again. :)