c - more efficient way of flooring doubles to get array indices -


i have double x, , double y. need turn int boxnum, defined (floored) index in (x,y) falls in width x height grid squares of size box_size. coordinates in excess of width wrapped around; ditto height.

i using:

( (((int)(x))/box_size)%width+ width*((((int)(y))/box_size)%height) ) 

this statement eats 20% of execution time, , gets worse (on order of 40-50%) if make safe negative coordinates:

( (( ((int)(x)) /box_size)%width+width)%width     +width*(( (((int)(y)) /box_size)%height+height)%height) ) 

i'm considering converting application fixed-point, avoid this, can bitmask out part want instead of having horrible conversion.

is there better way of doing double->int conversion of kind? worth ensure 0<x<width*box_size , 0<y<height*box_size can drop 2 remainder operations? (doing difficult enough not worth while benchmark, unless significant improvement)

edit: after appropriate chastisement in comments, more details:

x , y coordinates of set of (as many 10^6) particles. using algorithm requires me to, @ every time step, simple sums across particles within box. thus, loop across particles, calculate box particle in, , use array index adding box. particles move far enough past locations no indication of future ones. unordered, means can make no assumptions this.

width, height, , box_size technically free, long width , height multiples of box_size. in practice specified compile time, , integers box_size=1. have run width=height=4 width=height=512, , while i'm on square power of 2 (because why not?), width=37;height=193 should work without issues.

this calculation unavoidable performed once per particle per timestep; in current implementation performed twice. tried caching value avoid recalculation, end benchmark performed worse, went calculating twice.

a basic test run 10 particles/box * 100 width * 100 height* 10000 steps = 1 billion particle*timesteps runs in shade on minute.

these coordinates on order of "regular numbers" (1-1000), i'm near kind of bound on double.

the problem code (int) cast causes floating-point unit's rounding mode change ieee754 default round nearest c standard round toward zero or 'truncate' defined in standard.

see gcc docs here more information on ieee754 rounding modes.

on modern pipelined processor, whole pipeline must flushed when rounding mode changed, causing massive slowdown pipleline emptied on every (int) cast. when doing in loop slowdowns experiencing typical.

there interesting article on issue erik de castro lopo (author of libsndfile , secret rabbit code). in audio conversion routines floating point rounding performance critical, , provides interesting set of solutions problem using posix lrintf() call, x86 assembly non-posix platforms.

the article can found here.

the short answer use c99/posix lrintf() function, or use inline assembly perform integer truncation without changing floating-point rounding mode.


Comments

Popular posts from this blog

linux - xterm copying to CLIPBOARD using copy-selection causes automatic updating of CLIPBOARD upon mouse selection -

qt - Errors in generated MOC files for QT5 from cmake -