On 4/22/15 5:25 PM, David Miller wrote:
> From: David Ahern
> Date: Wed, 22 Apr 2015 17:19:23 -0600
>
>> Only thing left in my queue is optimized versions of the ffs / fls
>> families, but that patch is v9b specific, not M7.
>
> Something faster than the popc thing in arch/sparc/lib/ffs.S?

Hmmm... I saw that, but it wasn't clear to me 1) how it got inserted and
2) what the overhead of a function call is versus an inline. Anyway, what
I have is the same 3 instructions, as an inline. But really the __ffs
was just along for the ride; the focus was on __fls.

>
> Are you thinking of using "lzcnt"? I wasn't impressed with the
> performance of that instruction last time I played around with it.

A comparison of what I hacked together is attached (columns too wide for
inline). Data is from a T4-2. It shows lzcnt to be better for __fls, fls
and fls64.

>
>> I'd like to put some attention on precise mode for perf counters; it
>> just has not bubbled to the top.
>
> That plus the backtrace deadlock thing we're discussing in another
> thread, that bug is irritating because your pagefault_disable() change
> should "just work".
>

Oh, yes. Forgot about that one. I spent too many hours trying to figure
out why processes get killed with a SIGBUS. I added an option to the perf
tool to skip userspace chains until I can get back to it.
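
For readers following along, the two approaches mentioned above (a
population count for __ffs, a leading-zero count for __fls and friends)
can be sketched in portable C. This is only an illustration, not the
attached patch and not the code in arch/sparc/lib/ffs.S: the sketch_*
names are hypothetical, and the GCC/Clang builtins stand in for the popc
and lzcnt instructions. Whether the compiler actually emits those
instructions depends on the target and flags.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; the names are not from
 * the kernel or from the patch discussed in this thread.
 */

/* 0-based index of the lowest set bit. Result is undefined for x == 0. */
static inline unsigned int sketch_ffs_popc(uint64_t x)
{
	/*
	 * (x & -x) isolates the lowest set bit; subtracting 1 turns it
	 * into a mask of all bits below it, so counting the ones in that
	 * mask gives the bit index.
	 */
	uint64_t below = (x & (0 - x)) - 1;

	return (unsigned int)__builtin_popcountll(below);
}

/* 0-based index of the highest set bit. Result is undefined for x == 0. */
static inline unsigned int sketch_fls_lzcnt(uint64_t x)
{
	/* A 64-bit value with n leading zeros has its top bit at 63 - n. */
	return 63u - (unsigned int)__builtin_clzll(x);
}

/* 1-based index of the highest set bit, with 0 for x == 0. */
static inline unsigned int sketch_fls64(uint64_t x)
{
	/* __builtin_clzll(0) is undefined, so handle zero explicitly. */
	return x ? 64u - (unsigned int)__builtin_clzll(x) : 0u;
}
```

For example, sketch_ffs_popc(0x8) and sketch_fls_lzcnt(0x8) both give 3,
while sketch_fls64(0x8) gives 4 because it counts from 1.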