I've been running some tests, and the two approaches really don't seem to be as different as I had thought. Is bit shifting somehow so CPU-intensive that it makes other factors, such as testing, adding, and looping, practically irrelevant?
These are two algorithms for finding the loc and pixel offsets that put an object at a given set of coordinates (ix and iy). They produce the same output:
Etest1()
    var
        ix
        iy
        lx
        ly
        px
        py
    for(var/count = 0; count < 100000; count++)
        ix = rand(0,2000)
        iy = rand(0,2000)
        lx = ix>>5
        ly = iy>>5
        px = ix - (lx<<5) - 16
        py = iy - (ly<<5) - 16
Etest2()
    var
        ix
        iy
        lx
        ly
        px
        py
    for(var/count = 0; count < 100000; count++)
        ix = rand(0,2000)
        iy = rand(0,2000)
        lx = round(ix/32)
        ly = round(iy/32)
        px = ix%32 - 16
        py = iy%32 - 16
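For reference, the two routines can be sketched in Python (illustrative only; BYOND's single-argument round() floors its argument, which for these non-negative values matches Python's // operator):

```python
import random

def coords_shift(ix, iy):
    # Bit-shift version: tile index via >>5, pixel offset via <<5 (32px tiles).
    lx, ly = ix >> 5, iy >> 5
    px, py = ix - (lx << 5) - 16, iy - (ly << 5) - 16
    return lx, ly, px, py

def coords_div(ix, iy):
    # Division/modulus version of the same computation.
    lx, ly = ix // 32, iy // 32
    px, py = ix % 32 - 16, iy % 32 - 16
    return lx, ly, px, py

# Both routines agree on every coordinate in the tested range.
for _ in range(1000):
    ix, iy = random.randint(0, 2000), random.randint(0, 2000)
    assert coords_shift(ix, iy) == coords_div(ix, iy)
```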
Etest1() time - .250
Etest2() time - .266
The loops, with just the random numbers, take .093 to execute, so these numbers are a bit more relevant:
Etest1() time = .157
Etest2() time = .173
So Etest1() takes about 90% as long to execute as Etest2().
That is a difference, but since division requires shifting, adding, and testing in multiple steps, I expected the gap to be considerably larger.
The bit-shifting version has two extra subtraction operations, but the two divisions, two modulus operations, and two calls to round() should take significantly longer. There must be some concept or idea that I'm missing or misinterpreting.
I guess it comes down to this: is it really worth coming up with bit-twiddling algorithms (which are not as conceptually straightforward) for such a small gain?
ID:153784
Jan 27 2003, 10:31 am
English wrote:
> Etest1() time = .157

Probably a major factor is the overhead of executing any statement at all in BYOND, plus the overhead of converting a floating-point number to an int for bit shifting, then converting it back. BYOND basically treats all numbers as single-precision floating point, so it has to do a conversion to handle bit shifting.

> The bit shifting version had two extra subtraction operations but the two divisions, two modulus operations, and two calls to round should take significantly longer. There must be some concept or idea that I'm missing or misinterpreting.

Question: Are the times you quoted based on taking out the random number part of the loop (the .093), or did you not take those out? If you didn't account for the random numbers, then the first proc takes 80% as long, not 90%, and the difference is more acute.

But it's the above problem I mentioned that's at the heart of keeping the times so close. Overhead is the killer. Dividing by 32 also isn't quite as bad as dividing by other numbers, I would guess, since this works out to a floating multiplication of n×2^-5; essentially all the coprocessor has to handle is changing the exponent, because floating point uses base 2. (In fact it could be that BYOND internally treats right-shifting as something like a division plus a round(), without converting to an integer first.) This suggests that if BYOND added some new bytecode to handle specific number types, it could be much faster at bit twiddling.

> I guess it comes down to this, is it really worth coming up with bit-twiddling algorithms (which are not as conceptually straightforward) for such a small gain?

Considering this is a 10% savings (if not 20%) for a routine you can expect to be called fairly often, I'd say yes. However, it couldn't hurt to profile all such routines vs. their counterparts so you know for sure that you're saving time, because the overhead does appear to be significant here.
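The point about base-2 floats can be checked directly. In Python (whose floats are doubles, used here only for illustration), math.frexp splits a float into its mantissa and exponent, and dividing by 32 = 2^5 leaves the mantissa untouched:

```python
import math

n = 1234.0
m1, e1 = math.frexp(n)           # n = m1 * 2**e1, with 0.5 <= m1 < 1
m2, e2 = math.frexp(n / 32.0)    # dividing by 2**5 is exact in binary
assert m1 == m2                  # the mantissa is unchanged
assert e2 == e1 - 5              # only the exponent drops, by exactly 5
```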
If overhead for bit twiddling ever goes down, it'll be preferable. For now I'd suggest mixing and matching to find the best possible time.

One thing I can suggest is that it's probably possible to significantly reduce the time of Etest2(), since of the division, the modulo, and the round(), one of the three is unnecessary. If you know the modulus, you can subtract it from the dividend before dividing. A few algorithms you can test that could be faster:

    px = (ix&31) - 16

Lummox JR
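The suggested mask works because 32 is a power of two: for a non-negative integer, x & 31 keeps exactly the low five bits, which is the same value as x % 32. A quick Python check over the coordinate range used above (0..2000, so no negative-value caveats apply):

```python
for x in range(2001):
    # Masking off the low 5 bits equals taking the remainder mod 2**5...
    assert (x & 31) == x % 32
    # ...so the masked pixel-offset formula matches the modulus one.
    assert (x & 31) - 16 == x % 32 - 16
```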
Lummox JR wrote:

> Probably a major factor is the overhead of executing any statement at all in BYOND, plus the overhead of converting a floating-point number to an int for bit shifting, then converting it back. BYOND basically treats all numbers as single-precision floating point, so it has to do a conversion to handle bit shifting.

I profiled the loops with everything but the random number generation commented out. The .093 was the time for 100,000 iterations of that version of the loop. The .266 and .251 were total times with everything included.

> But it's the above problem I mentioned that's at the heart of keeping the times so close. Overhead is the killer. Dividing by 32 also isn't quite as bad as dividing by other numbers, I would guess, since this works out to a floating multiplication of n×2^-5; essentially all the coprocessor has to handle is changing the exponent, because floating point uses base 2. (In fact it could be that BYOND internalizes right-shifting to act something like a division in a round() proc, without doing a conversion to an integer first.)

I guess I should have paid more attention when my Assembly teacher talked about how computers handle floating point; I should probably go read that section of the book.

> Considering this is a 10% savings (if not 20%) for a routine you can expect to be called fairly often, I'd say yes. However it couldn't hurt to profile all such routines vs. their counterparts so you know for sure that you're saving time, because the overhead does appear to be significant here.

I've been debating with myself whether to have one of these be a larger part of movement (and not just initial placement), because of some data redundancies I'd need for a second method I'm considering. I'll look into your suggestions to see which is best; it could make a difference in what approach I take. Thanks for the advice :)
Lummox JR wrote:
> One thing I can suggest is that it's probably possible to significantly reduce the time of Etest2(), since of the division and modulo and round(), one of the three is unnecessary. If you know the modulus, you can subtract it from the dividend before dividing.
>
> Lummox JR

Unfortunately, that's not true. When BYOND does modulus division, it first converts numbers into an integer form. So 12.999%12 is still 0. You could force integer conversion by exploiting this, though. Just do x%x+1, and you'll get a nice rounded-down version of x. But, like most tricks like this, I prefer not to trust them.
It is true for integers, and that's what we're dealing with.
Here's a semi-proof. Let n, q, d, and r be integers such that:

    n = q*d + r, where 0 <= r < d

and where n is the dividend, q is the quotient, d is the divisor, and r is the remainder. Using modulus (n%d) you can get the remainder r, and define another integer m:

    m = n - r
      = q*d
    m/d = q

Thus, dividing the integer m by the divisor d gives the integer quotient q. No rounding necessary.
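The identity in the semi-proof is easy to check exhaustively over a small range (Python sketch; // is Python's integer division):

```python
for n in range(500):
    for d in (3, 7, 32):
        r = n % d                # the remainder from the modulus
        m = n - r                # subtract it from the dividend first
        assert m % d == 0        # m is an exact multiple of d...
        assert m // d == n // d  # ...so m/d is the quotient, no rounding needed
```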
Garthor wrote:
> Unfortunately, that's not true. When BYOND does modulus division, it first converts numbers into an integer form. So 12.999%12 is still 0. You could force integer conversion by exploiting this, though. Just do x%x+1, and you'll get a nice rounded-down version of x. But, like most tricks like this, I prefer not to trust them.

x%x+1 will only give you 1, because % has a higher precedence than +. You need x%(x+1) for that trick to work.

Lummox JR
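The precedence point can be demonstrated with integers in Python, where %, like in DM, binds more tightly than + (this only shows the parsing; BYOND's integer conversion inside % is its own quirk and is not reproduced here):

```python
x = 13
assert x % x + 1 == 1    # parsed as (x % x) + 1, i.e. 0 + 1
assert x % (x + 1) == x  # the parenthesized form leaves an integer x unchanged
```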
Not really, since Dreamseeker operates in a multitasking, MFC-based environment. I don't think saving the CPU a few cycles will make a big difference.
/Gazoot