Why is POPC so slow?

I need an ffs function for 64-bit arguments. Because the
library ffs() only takes int arguments, I wrote a 3-line
ffs64() function that uses popc, following the classical
example in the SPARC V9 Architecture Manual and various
other textbooks:
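    neg   %o0, %o1         ! %o1 = -x
    xnor  %o0, %o1, %o0    ! %o0 = ~(x ^ -x): ones up to and including the lowest set bit
    retl
    popc  %o0, %o0         ! delay slot: count those bits, giving the 1-based index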
(Zero inputs are filtered out before the function is called.)
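For readers who do not read SPARC assembly, the same neg/xnor/popc idea can be sketched in portable C. This is only an illustration: the names ffs64_popc and popcount64 are made up here, and the software bit-counting loop stands in for the hardware popc instruction, so it says nothing about the timings discussed in this post.

```c
#include <assert.h>
#include <stdint.h>

/* Software population count; a stand-in for the popc instruction
   (hypothetical helper, not part of the original code). */
static int popcount64(uint64_t v)
{
    int n = 0;
    while (v != 0) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* 1-based index of the lowest set bit of x; the caller must pass x != 0. */
int ffs64_popc(uint64_t x)
{
    assert(x != 0);
    uint64_t neg  = (uint64_t)0 - x;   /* neg:  two's complement negation */
    uint64_t mask = ~(x ^ neg);        /* xnor: ones at bit 0 through the lowest set bit */
    return popcount64(mask);           /* popc: number of ones = 1-based bit index */
}
```

For x == 0 the mask is all ones and the result would be 64, the same value as for an input with only bit 63 set, which is presumably why zero has to be filtered out beforehand.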
Much to my surprise, this takes 1800 ns to 2000 ns per call
on a 1.2 GHz UltraSPARC III, depending on the number of
bits set, compared to 50 ns to 125 ns for the naive C
implementation (return ffs(low) if low != 0, otherwise
return ffs(high) + 32).
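Spelled out, the naive implementation described above might look roughly like the following. This is a sketch rather than the exact code that was timed: the name ffs64_naive is made up, ffs() is taken from the POSIX header <strings.h>, and int is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>   /* ffs() (POSIX) */

/* 1-based index of the lowest set bit of x; the caller must pass x != 0.
   Assumes a 32-bit int, as on the platform discussed here. */
int ffs64_naive(uint64_t x)
{
    assert(x != 0);
    uint32_t low  = (uint32_t)x;           /* bits 0..31  */
    uint32_t high = (uint32_t)(x >> 32);   /* bits 32..63 */

    if (low != 0)
        return ffs((int)low);
    return ffs((int)high) + 32;
}
```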
Digging a little deeper into the matter, I timed POPC by
itself, and sure enough it accounts for practically all
the time.
Why is POPC so slow?
dk