public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/6] x86-64: Micro-optimize vclock_gettime
@ 2011-03-28 15:06 Andy Lutomirski
  2011-03-28 15:06 ` [PATCH 1/6] x86-64: Optimize vread_tsc's barriers Andy Lutomirski
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Andy Lutomirski @ 2011-03-28 15:06 UTC
  To: x86; +Cc: linux-kernel, John Stultz, Thomas Gleixner, Andy Lutomirski

This series speeds up vclock_gettime(CLOCK_MONOTONIC) by almost 30%
(tested on Sandy Bridge).  The patches are ordered in roughly decreasing
order of improvement.

These are meant for 2.6.40, but if anyone wants to take some of them
for 2.6.39 I won't object.

The changes and timings (fastest of 20 trials of 100M iters on Sandy
Bridge) are:

Unpatched:

CLOCK_MONOTONIC: 22.09ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.65ns
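
For anyone who wants to reproduce numbers of this kind, here is a
minimal sketch of a "fastest of N trials" harness.  It is not the
program used for the timings above: best_ns_per_call and now_ns are
hypothetical names, and it assumes the clock is reached through libc's
clock_gettime(), which normally dispatches to the vDSO on x86-64.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds; used only to time the loop. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: run `trials` timing loops of `iters` calls each
 * and return the lowest average cost per call, in nanoseconds.
 */
static double best_ns_per_call(clockid_t clk, int trials, long iters)
{
	double best = -1.0;

	for (int t = 0; t < trials; t++) {
		struct timespec ts;
		uint64_t start = now_ns();

		for (long i = 0; i < iters; i++)
			clock_gettime(clk, &ts);

		double per_call = (double)(now_ns() - start) / (double)iters;

		if (best < 0.0 || per_call < best)
			best = per_call;
	}
	return best;
}
```

Calling best_ns_per_call(CLOCK_MONOTONIC, 20, 100000000L) would mirror
the trial and iteration counts quoted above.  Taking the minimum rather
than the mean filters out trials disturbed by interrupts or scheduling.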

x86-64: Optimize vread_tsc's barriers

This replaces lfence;rdtsc;lfence with a faster sequence with similar
ordering guarantees.

CLOCK_MONOTONIC: 18.28ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.98ns
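
For reference, the baseline sequence named above (lfence;rdtsc;lfence)
looks roughly like this as GCC inline asm.  This is only a sketch of
the sequence being replaced, not of the faster replacement;
rdtsc_fenced is a hypothetical name, and the non-x86 branch is a
placeholder so that the sketch compiles elsewhere.

```c
#include <stdint.h>

#if defined(__x86_64__)
/*
 * Fenced TSC read: the first lfence keeps rdtsc from executing before
 * earlier loads complete, the second keeps later loads from starting
 * before rdtsc has executed.
 */
static inline uint64_t rdtsc_fenced(void)
{
	uint32_t lo, hi;

	__asm__ volatile ("lfence\n\t"
			  "rdtsc\n\t"
			  "lfence"
			  : "=a" (lo), "=d" (hi)
			  :
			  : "memory");
	return ((uint64_t)hi << 32) | lo;
}
#else
/* Placeholder for non-x86 builds; NOT a real cycle counter. */
static inline uint64_t rdtsc_fenced(void)
{
	static uint64_t fake;

	return ++fake;
}
#endif
```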

x86-64: Don't generate cmov in vread_tsc

GCC likes to generate a cmov on a branch that's almost completely
predictable.  Force it to generate a real branch instead.

CLOCK_MONOTONIC: 16.30ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.95ns
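
One way to get a real branch instead of a cmov is sketched below.
Putting an empty asm statement on the rare path tends to stop GCC from
if-converting the two returns; whether the patch uses exactly this
trick is an assumption here, and clamp_to_last is a hypothetical name.

```c
#include <stdint.h>

/*
 * Return `now`, unless it is behind `last`, in which case return
 * `last`.  The comparison is expected to be true almost always.
 */
static inline uint64_t clamp_to_last(uint64_t now, uint64_t last)
{
	if (__builtin_expect(now >= last, 1))
		return now;

	/*
	 * An empty asm gives the slow path a side effect the compiler
	 * cannot fold into a conditional move, so it keeps the branch.
	 */
	__asm__ volatile ("");
	return last;
}
```

A well-predicted branch lets the CPU speculate past the comparison,
whereas a cmov makes the result depend on both inputs being ready.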

x86-64: Put vsyscall_gtod_data at a fixed virtual address

Because vsyscall_gtod_data's address isn't known until load time, the
code contains unnecessary address calculations.  Hardcode it.  This is
a nice speedup for the _COARSE variants as well.

CLOCK_MONOTONIC: 16.12ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 5.31ns

x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0

vset_normalize_timespec was more general than necessary.  Open-code
the appropriate normalization loops.  This is a big win for
CLOCK_MONOTONIC_COARSE.

CLOCK_MONOTONIC: 16.09ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 4.49ns
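
To illustrate the idea, here is a sketch contrasting a general
normalization with one that assumes nsec >= 0.  The struct and function
names are hypothetical and this is not the code from the patch; it only
shows why the specialized loop has less work to do.

```c
#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for a timespec. */
struct simple_ts {
	long long sec;
	long long nsec;
};

/* General form: accepts nsec on either side of [0, NSEC_PER_SEC). */
static struct simple_ts normalize_general(struct simple_ts ts)
{
	while (ts.nsec >= NSEC_PER_SEC) {
		ts.nsec -= NSEC_PER_SEC;
		ts.sec++;
	}
	while (ts.nsec < 0) {
		ts.nsec += NSEC_PER_SEC;
		ts.sec--;
	}
	return ts;
}

/* Specialized form: the caller guarantees nsec >= 0 on entry. */
static struct simple_ts normalize_nonneg(struct simple_ts ts)
{
	while (ts.nsec >= NSEC_PER_SEC) {
		ts.nsec -= NSEC_PER_SEC;
		ts.sec++;
	}
	return ts;
}
```

For any input with nsec >= 0 the two functions return the same result;
the second simply drops the loop that can never run.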

x86-64: Omit frame pointers on vread_tsc

This is a bit silly and needs work for gcc < 4.4 (if we even care),
but, rather surprisingly, it's 0.3ns faster.  I guess that the CPU's
stack frame optimizations aren't quite as good as I thought.

CLOCK_MONOTONIC: 15.79ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 4.50ns
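
For illustration, one per-function way to drop the frame pointer is
GCC's optimize attribute, which only exists in gcc >= 4.4.  That this
is the mechanism the patch uses is an assumption; NO_FRAME_POINTER and
scale_cycles are hypothetical names.

```c
#include <stdint.h>

/*
 * Per-function override of -fno-omit-frame-pointer.  Compilers other
 * than GCC do not implement this attribute, so the macro expands to
 * nothing there and the function is built with the global flags.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_FRAME_POINTER __attribute__((optimize("omit-frame-pointer")))
#else
#define NO_FRAME_POINTER
#endif

/* Hypothetical small hot function that does not need a frame. */
NO_FRAME_POINTER
static uint64_t scale_cycles(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

The GCC documentation describes this attribute as intended mainly for
debugging, so building the file with different flags may be the more
robust route.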

x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO

We're building the vDSO with flags that were meant for kernel code and
that disable optimizations.  Override them, except for
-fno-omit-frame-pointer, which is kept because omitting frame pointers
might make userspace debugging harder.

CLOCK_MONOTONIC: 15.66ns
CLOCK_REALTIME_COARSE: 3.44ns
CLOCK_MONOTONIC_COARSE: 4.23ns


Andy Lutomirski (6):
  x86-64: Optimize vread_tsc's barriers
  x86-64: Don't generate cmov in vread_tsc
  x86-64: Put vsyscall_gtod_data at a fixed virtual address
  x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
  x86-64: Omit frame pointers on vread_tsc
  x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO

 arch/x86/kernel/tsc.c          |   48 ++++++++++++++++++++++++++++++++-------
 arch/x86/kernel/vmlinux.lds.S  |   13 +++++-----
 arch/x86/vdso/Makefile         |   15 +++++++++++-
 arch/x86/vdso/vclock_gettime.c |   40 ++++++++++++++++++---------------
 arch/x86/vdso/vextern.h        |    9 ++++++-
 5 files changed, 90 insertions(+), 35 deletions(-)

-- 
1.7.4


Thread overview: 22+ messages
2011-03-28 15:06 [PATCH 0/6] x86-64: Micro-optimize vclock_gettime Andy Lutomirski
2011-03-28 15:06 ` [PATCH 1/6] x86-64: Optimize vread_tsc's barriers Andy Lutomirski
2011-03-29  6:18   ` Ingo Molnar
2011-03-28 15:06 ` [PATCH 2/6] x86-64: Don't generate cmov in vread_tsc Andy Lutomirski
2011-03-29  6:15   ` Ingo Molnar
2011-03-29 11:52     ` Andrew Lutomirski
2011-03-28 15:06 ` [PATCH 3/6] x86-64: Put vsyscall_gtod_data at a fixed virtual address Andy Lutomirski
2011-03-28 17:49   ` Thomas Gleixner
2011-03-28 18:09     ` Andrew Lutomirski
2011-03-28 21:35     ` Andrew Lutomirski
2011-03-28 23:13       ` Thomas Gleixner
2011-03-28 15:06 ` [PATCH 4/6] x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0 Andy Lutomirski
2011-03-29  6:21   ` Ingo Molnar
2011-03-29 11:54     ` Andrew Lutomirski
2011-03-28 15:06 ` [PATCH 5/6] x86-64: Omit frame pointers on vread_tsc Andy Lutomirski
2011-03-29  6:24   ` Ingo Molnar
2011-03-28 15:06 ` [PATCH 6/6] x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO Andy Lutomirski
2011-03-29  6:27 ` [PATCH 0/6] x86-64: Micro-optimize vclock_gettime Ingo Molnar
2011-04-06 18:20 ` Andi Kleen
2011-04-06 20:10   ` Andrew Lutomirski
2011-04-06 20:14     ` Andi Kleen
2011-04-06 20:49       ` Andrew Lutomirski
