One of my workmates had some code with a lot of floating-point clamps in it the other day, so I wrote this little branch-free version using the PS3’s floating-point select intrinsic:
float Clamp(float x, float lower, float upper)
{
    float t = __fsels(x-lower, x, lower);
    return __fsels(t-upper, upper, t);
}
__fsels basically does this:
float __fsels(float x, float a, float b)
{
    return (x >= 0.0f) ? a : b;
}
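For anyone without a PS3 toolchain to hand, here is a rough portable sketch of the same idea. It assumes the select behaves exactly as described above; `fsels_emulated`, `clamp_select`, `clamp_branch` and `check_clamps_agree` are made-up names for illustration, and `clamp_branch` is just my guess at what a "standard" clamp looks like, not the code that was actually timed.

```c
#include <assert.h>

/* Hypothetical portable stand-in for __fsels, based only on the
   description in the post: returns a when x >= 0, otherwise b. */
static float fsels_emulated(float x, float a, float b)
{
    return (x >= 0.0f) ? a : b;
}

/* Same structure as Clamp() above: two selects, no explicit if. */
static float clamp_select(float x, float lower, float upper)
{
    float t = fsels_emulated(x - lower, x, lower); /* t = max(x, lower) */
    return fsels_emulated(t - upper, upper, t);    /* min(t, upper)     */
}

/* An assumed "standard" branching clamp, used only as a reference. */
static float clamp_branch(float x, float lower, float upper)
{
    if (x < lower) return lower;
    if (x > upper) return upper;
    return x;
}

/* Checks that both versions agree on a few samples around [0, 1]. */
static void check_clamps_agree(void)
{
    const float samples[] = { -2.0f, 0.0f, 0.25f, 1.0f, 3.5f };
    for (int i = 0; i < 5; ++i)
        assert(clamp_select(samples[i], 0.0f, 1.0f) ==
               clamp_branch(samples[i], 0.0f, 1.0f));
}
```

Note that the emulation uses a ternary, which a compiler may well turn back into a branch, so this only checks the logic and says nothing about speed.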
I measured it to be 8% faster than a standard implementation. That’s not a whole lot, but it was quite fun to write. The SPUs have quite general selection functionality, which is more useful. There’s some stuff about it here:
http://realtimecollisiondetection.net/blog/?p=90
(Not sure about this free WordPress code formatting, I may have to move it to my own host soon)
3 Comments
On the PPU there is no real fsels instruction. It’s just fsel with some casts to float done for you for convenience. This MAY (and I say may because I haven’t tested it out yet) be faster if you make t a double and change your first __fsels to an __fsel. That way you can avoid one double-to-float cast and another float-to-double cast.
I haven’t looked at the generated assembly, but if you are calling this in a loop it may also help to do a bit of “pipelining”, i.e. do the conversions/casts for iteration n+1 before you do the clamp for iteration n.
Anyway, it’s one thing to look into, and you can tell really easily if it helps (or makes it slower!)
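A minimal sketch of the double-intermediate idea, untested on real hardware. `fsel_emulated` is a hypothetical portable stand-in that assumes __fsel has the same semantics as __fsels but on doubles; on the PPU you would swap in the real intrinsics. `clamp_select_double` is likewise just an illustrative name.

```c
/* Hypothetical portable stand-in for __fsel (assumed semantics:
   returns a when x >= 0, otherwise b, all in double precision). */
static double fsel_emulated(double x, double a, double b)
{
    return (x >= 0.0) ? a : b;
}

/* Keeps the intermediate t as a double so the only narrowing
   conversion back to float happens once, on the final result. */
static float clamp_select_double(float x, float lower, float upper)
{
    double t = fsel_emulated((double)x - lower, x, lower);
    return (float)fsel_emulated(t - upper, upper, t);
}
```

Whether this actually saves anything depends on what the compiler generates, so it is worth checking the disassembly and timing it both ways.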
Thanks, I should have realised that looking at the disassembly.
I did some more timing today and am less convinced it’s always faster than a standard implementation, although as you suggested converting some of the parameters to doubles did seem to help.
I’ll post some more detailed stats when I get a chance.