Philippe Gerum wrote:
> Here is likely why we have different levels of accuracy and performance,
> firstly my version is bluntly based on the khz freq, secondly it
> calculates the other way around, i.e. ns2tsc, so that tsc are kept in
> the inner code, but more efficiently converted from ns counts passed to
> the outer interface:
>
> static unsigned long ns2cyc_scale;
> #define NS2CYC_SCALE_FACTOR 10 /* 2^10, carefully chosen */
>
> static inline void set_ns2cyc_scale(unsigned long cpu_khz)
> {
>         ns2cyc_scale = (cpu_khz << NS2CYC_SCALE_FACTOR) / 1000000;
> }
>
> static inline unsigned long long ns_2_cycles(unsigned long long ns)
> {
>         return ns * ns2cyc_scale >> NS2CYC_SCALE_FACTOR;
> }

Your version performs ~50% better than mine (outperforming the original
version by a factor of 7 on a 1 GHz box, vs. 4.8). I think you compared
non-optimised code, didn't you? Without -O2, I see 15 times better
performance.

[Gilles' variant still refuses to get benchmarked.]

Jan
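
For anyone following the arithmetic: the quoted conversion is a fixed-point multiply. The scale is cpu_khz / 10^6 (cycles per nanosecond) pre-multiplied by 2^10, and the final shift removes that factor again. Below is a minimal standalone sketch of the same idea. The fixed-width uint64_t types and the 2.4 GHz figures in the comments are illustrative assumptions, not part of the quoted code.

```c
#include <stdint.h>

/* Same scale factor as in the quoted code: 10 fractional bits (2^10). */
#define NS2CYC_SCALE_FACTOR 10

/* Assumption: widened to 64 bits here so the shift in set_ns2cyc_scale()
 * cannot overflow on 32-bit targets; the quoted code uses unsigned long. */
static uint64_t ns2cyc_scale;

/* Precompute cycles-per-ns as a fixed-point value.
 * cpu_khz / 1e6 is cycles per ns; the left shift keeps 10 fractional bits.
 * The integer division truncates, which is the source of the rounding error. */
static inline void set_ns2cyc_scale(uint64_t cpu_khz)
{
        ns2cyc_scale = (cpu_khz << NS2CYC_SCALE_FACTOR) / 1000000;
}

/* Convert nanoseconds to CPU cycles: one multiply and one shift,
 * no division on the hot path. */
static inline uint64_t ns_2_cycles(uint64_t ns)
{
        return (ns * ns2cyc_scale) >> NS2CYC_SCALE_FACTOR;
}

/*
 * Worked figures (illustrative):
 *   cpu_khz = 1000000 (1 GHz):   scale = 1024, so 1000 ns -> 1000 cycles.
 *   cpu_khz = 2400000 (2.4 GHz): scale = 2457 (2457.6 truncated),
 *                                so 1000 ns -> 2399 cycles (exact: 2400).
 */
```

At 1 GHz the scale is exactly 1024 and the conversion is lossless. At other frequencies the truncated scale gives a small relative error (here about 0.03%), which is the accuracy traded for speed. Note also that ns * ns2cyc_scale can overflow 64 bits for very large ns values.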