On 29/11/2013 19:36, Peter Zijlstra wrote:
> Hi all,
>
> This series is supposed to optimize the kernel/sched/clock.c and x86
> sched_clock() implementations.
>
> So far its only been boot tested. So no clue if it really makes the thing
> faster, but it does remove the need to disable IRQs.
>
> I'm hoping Eliezer will test this with his benchmark where he could measure a
> performance regression between using sched_clock() and local_clock().

So I tested and retested, but I'm not sure I understand the results.

The numbers I previously reported were with turbo boost enabled. Since turbo
boost changes the CPU frequency depending on how hot it is, it has a
complicated interaction with busy polling. In general you see better numbers,
but it's harder to tell what's going on.

With turbo boost disabled in the BIOS, to try to get more linear behavior, I see:

3.13.0-rc2 (no patches)
82.0 KRR/s

3.13.0-rc2 with busy poll using local_clock()
80.2 KRR/s

Note that there is a big variance between cores: on the SMT sibling of the
core that has the packets steered to it I see 81.8 KRR/s. (In the other tests
this core is slightly lower than the one that accepts the packets; I'm not
sure I can explain this.)

local_clock() + sched_clock patches
80.6 KRR/s

sched_clock patches (busy poll using sched_clock())
80.6 KRR/s

Maybe I'm doing something wrong?

Perf clearly affects the netperf results, but the delta is only a few percent,
so the numbers might still be good. On the other hand, I'm seeing repeated
warnings that the perf NMI handler took too long to run, and I need to reboot
to get perf to run again.

Attached are the perf outputs.

If you can think of any other interesting tests, or anything I'm doing wrong,
I'm open to suggestions.

Thanks,
Eliezer