Philippe Gerum wrote:
> Here is likely why we have different levels of accuracy and performance.
> Firstly, my version is bluntly based on the khz freq; secondly, it
> calculates the other way around, i.e. ns2tsc, so that tsc values are kept
> in the inner code, but more efficiently converted from ns counts passed to
> the outer interface:
> 
> static unsigned long ns2cyc_scale;
> #define NS2CYC_SCALE_FACTOR 10 /* 2^10, carefully chosen */
> 
> static inline void set_ns2cyc_scale(unsigned long cpu_khz)
> {
>     ns2cyc_scale = (cpu_khz << NS2CYC_SCALE_FACTOR) / 1000000;
> }
> 
> static inline unsigned long long ns_2_cycles(unsigned long long ns)
> {
>     return ns * ns2cyc_scale >> NS2CYC_SCALE_FACTOR;
> }

Your version performs ~50% better than mine (outperforming the original
version by a factor of 7 on a 1 GHz box, vs. 4.8 for mine). I think you
compared non-optimised code, didn't you? Without -O2, I see a 15-fold
improvement.
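
For illustration, here is a minimal standalone sketch of the scaled
conversion quoted above, next to a plain 64-bit division for comparison.
The names ns2cyc_scale_for(), ns_to_cycles() and ns_to_cycles_div() are
hypothetical, and I use uint64_t throughout as an assumption to keep the
sketch independent of the width of unsigned long:

```c
#include <stdint.h>

#define NS2CYC_SCALE_FACTOR 10 /* same 2^10 fixed-point shift as quoted above */

/* Hypothetical helper: cycles per ns, expressed in units of 1/1024. */
static inline uint64_t ns2cyc_scale_for(uint64_t cpu_khz)
{
    return (cpu_khz << NS2CYC_SCALE_FACTOR) / 1000000;
}

/* Fast path: one multiplication and one shift, no division. */
static inline uint64_t ns_to_cycles(uint64_t ns, uint64_t scale)
{
    return (ns * scale) >> NS2CYC_SCALE_FACTOR;
}

/* Reference path: cycles = ns * kHz / 10^6, using a 64-bit division. */
static inline uint64_t ns_to_cycles_div(uint64_t ns, uint64_t cpu_khz)
{
    return ns * cpu_khz / 1000000;
}
```

With cpu_khz = 1000000 (1 GHz) the scale is exactly 1024, so 1000 ns map
to 1000 cycles. With cpu_khz = 2400000 the scale truncates to 2457, and
1000 ns yield 2399 cycles instead of 2400, which shows the accuracy given
up for avoiding the division.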

[Gilles' variant still refuses to get benchmarked.]

Jan