On 2013-04-02 07:41, Alexander Graf wrote:
>> On 2013-04-01 23:34, Alexander Graf wrote:
>>> Is this faster than a load/store with std/ldbrx?
>>
>> Hmm.  Almost certainly not.  And since we've got stack space
>> allocated for function calls, we've got scratch space to do it in.
>>
>> Probably similar for bswap32 too, eh?
>
> Depends - memory load/store doesn't come for free and bswap32 is quite short.
>
>>
>> I'll do a tiny bit o benchmarking for power7.
>
> Cool, thanks a bunch :)

Heh.  "Almost certainly not" indeed.  Unless I've made some silly mistake,
going through memory stalls badly.  No store buffer forwarding on power7?

With the following test case, time reports:

f1    2.967s
f2    8.930s
f3    7.071s
f4    7.166s

And note that f4 is a normal store/load pair, trying to determine what
the store buffer forwarding delay might be.


r~
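
For readers without the test case to hand, here is a rough, portable sketch
of the kind of comparison being discussed: a register-only bswap64 against
one that bounces the value through a stack slot.  This is not the test case
the numbers above came from, and it does not use std/ldbrx; the names
bswap64_reg, bswap64_mem and time_loop are made up for illustration, and the
volatile slot is only a stand-in for the store/load round trip.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Register-only byte swap: three shift/mask stages, no memory traffic. */
uint64_t bswap64_reg(uint64_t x)
{
    x = ((x & 0x00ff00ff00ff00ffULL) << 8)  | ((x >> 8)  & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
}

/* Byte swap through memory: one 64-bit store to a stack slot, then
 * byte loads in reverse order.  Hypothetical stand-in for a store
 * followed by a byte-reversed load; volatile keeps the compiler from
 * eliding the round trip. */
uint64_t bswap64_mem(uint64_t x)
{
    volatile uint64_t slot = x;
    const volatile unsigned char *p = (const volatile unsigned char *)&slot;
    unsigned char out[8];
    uint64_t r;
    int i;

    for (i = 0; i < 8; i++) {
        out[i] = p[7 - i];      /* reversing bytes in memory == bswap */
    }
    memcpy(&r, out, sizeof(r));
    return r;
}

/* Time a dependent chain of calls so each swap waits on the previous
 * result.  The final value is written to *sink so the loop is not
 * optimized away.  Returns CPU seconds as measured by clock(). */
double time_loop(uint64_t (*fn)(uint64_t), long iters, uint64_t *sink)
{
    uint64_t v = 0x0102030405060708ULL;
    clock_t t0, t1;
    long i;

    t0 = clock();
    for (i = 0; i < iters; i++) {
        v = fn(v) + 1;
    }
    t1 = clock();
    *sink = v;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling time_loop() once per variant with the same iteration count gives a
crude comparison.  How large the gap is will depend heavily on the host and
the compiler flags, so treat any figures from this sketch as indicative only.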