public inbox for linux-kernel@vger.kernel.org
* [PATCH v3 0/6] Micro-optimize vclock_gettime
@ 2011-05-10 14:15 Andy Lutomirski
From: Andy Lutomirski @ 2011-05-10 14:15 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ingo Molnar, Andi Kleen, Linus Torvalds,
	Nick Piggin, David S. Miller, Eric Dumazet, Peter Zijlstra,
	Thomas Gleixner, Andy Lutomirski

This series speeds up vclock_gettime(CLOCK_MONOTONIC) by almost
30% (tested on Sandy Bridge).  These patches are intended for
2.6.40, and if I'm feeling really ambitious I'll try to shave a few
more ns off for 2.6.41.  (There are lots more optimization
opportunities in there.)

x86-64: Clean up vdso/kernel shared variables

Because vsyscall_gtod_data's address isn't known until load time,
the code contains unnecessary address calculations.  The code is
also rather complicated.  Clean it up and use addresses that are
known at compile time.
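A minimal user-space sketch of the idea, not the patch's code. The shared variables live at fixed offsets in one page-sized object whose address is known at link time, so an access compiles to a single load. The old style, for contrast, first loads a pointer that was only filled in at load time. All names here (`struct vvar_page`, `VVAR`, `gtod_ptr_fixed_up_at_load`, the field layout) are hypothetical. A static object stands in for the fixed virtual address the kernel would use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified timekeeping data. */
struct gtod_data {
	uint64_t cycle_last;
	uint32_t mult;
	uint32_t shift;
};

/* Hypothetical shared page with a fixed layout, so the kernel and the
 * separately linked vDSO agree on where each variable lives. */
struct vvar_page {
	int vgetcpu_mode;              /* offset 0 */
	unsigned char pad[60];
	struct gtod_data gtod_data;    /* offset 64 */
};
_Static_assert(offsetof(struct vvar_page, gtod_data) == 64,
	       "shared layout must not move");

/* Stand-in for the kernel's fixed virtual address: a static object,
 * whose address is likewise a link-time constant. */
static struct vvar_page vvar;
#define VVAR(name) (vvar.name)

/* Old style, for contrast: the data is reached through a pointer that
 * is only filled in at load time. */
static struct gtod_data *gtod_ptr_fixed_up_at_load;

static uint64_t read_cycle_last_new(void)
{
	return VVAR(gtod_data).cycle_last;   /* address folded into the load */
}

static uint64_t read_cycle_last_old(void)
{
	return gtod_ptr_fixed_up_at_load->cycle_last;  /* load pointer, then data */
}
```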

x86-64: Remove unnecessary barrier in vread_tsc

A fair amount of testing on lots of machines has failed to find a
single example in which the barrier *after* rdtsc is needed. So
remove it.  (The manuals give no real justification for it, and
rdtsc has no dependencies so there's no sensible reason for a CPU to
delay it.)
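This sketch shows what remains after this change. It assumes GCC or Clang on x86-64 and uses the `_mm_lfence()` and `__rdtsc()` intrinsics as the barrier and the counter read. The kernel uses its own helpers, and `read_cycles` is a hypothetical name. The non-x86 fallback exists only so the example compiles elsewhere.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime in the fallback */
#endif
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h>   /* _mm_lfence */
#include <x86intrin.h>   /* __rdtsc */

/* Barrier *before* rdtsc only: it keeps the counter read from being
 * executed ahead of earlier loads. No barrier follows it, since
 * testing found no case where one was needed. */
static inline uint64_t read_cycles(void)
{
	_mm_lfence();
	return __rdtsc();
}
#else
#include <time.h>

/* Fallback so the sketch builds on other architectures; this is a
 * monotonic clock read, not a TSC read. */
static inline uint64_t read_cycles(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif
```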

x86-64: Don't generate cmov in vread_tsc

GCC likes to generate a cmov on a branch that's almost completely
predictable.  Force it to generate a real branch instead.
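Here is a sketch of the pattern, assuming GCC or Clang. The fast path is marked likely with `__builtin_expect`. On the rare path, an empty `asm volatile` statement keeps the compiler from folding the two returns into a cmov; this is a common GCC idiom. `clamp_cycles` and its parameters are hypothetical names. `cycle_last` stands for the counter value recorded at the last timekeeping update.

```c
#include <stdint.h>

/* Return the current cycle count, but never less than cycle_last. */
static uint64_t clamp_cycles(uint64_t now, uint64_t cycle_last)
{
	/* Almost always true: the counter has normally moved forward. */
	if (__builtin_expect(now >= cycle_last, 1))
		return now;

	/* Emits no instructions, but discourages GCC from merging the
	 * two returns into a cmov, so a real branch is generated. */
	__asm__ volatile ("");
	return cycle_last;
}
```

A cmov makes the result depend on both operands. A correctly predicted branch lets execution continue without waiting for the comparison to resolve.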

x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0

vset_normalize_timespec was more general than necessary.  Open-code
the appropriate normalization loops.  This is a big win for
CLOCK_MONOTONIC_COARSE.
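A sketch of the simpler normalization follows. It assumes, as the subject line states, that every nanosecond component involved is nonnegative. `struct ts_pair` and `add_nonneg` are hypothetical names. In the vDSO the inputs would be the wall time and the wall-to-monotonic offset, whose seconds part may be negative while its nanoseconds part stays in [0, NSEC_PER_SEC).

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

struct ts_pair {            /* hypothetical timespec stand-in */
	int64_t sec;
	long nsec;
};

/* Add two times whose nsec fields are nonnegative. Because the sum can
 * never go negative, only the ">= NSEC_PER_SEC" direction needs fixing.
 * The loop handles any nonnegative total; with both inputs already
 * normalized it runs at most once. */
static struct ts_pair add_nonneg(struct ts_pair a, struct ts_pair b)
{
	struct ts_pair r = { a.sec + b.sec, a.nsec + b.nsec };

	while (r.nsec >= NSEC_PER_SEC) {
		r.nsec -= NSEC_PER_SEC;
		++r.sec;
	}
	return r;
}
```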

x86-64: Move vread_tsc into a new file with sensible options

This way vread_tsc doesn't have a frame pointer, which saves about
0.3ns.  I guess that the CPU's stack frame optimizations aren't quite
as good as I thought.

x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO

We're building the vDSO with kernel CFLAGS that add -pg and disable
optimizations for reasons that only apply to kernel code.  Override
them, but keep -fno-omit-frame-pointer, since omitting the frame
pointer might make userspace debugging harder.
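The following is a small illustration of what -foptimize-sibling-calls can buy, using hypothetical names: a dispatcher whose every path ends in `return f(...)`. With the option enabled, GCC may emit each such tail call as a jump that reuses the caller's frame, instead of a call followed by a return. Whether it actually does depends on inlining and the other flags in effect.

```c
/* Hypothetical per-clock readers standing in for vDSO helpers. */
static int read_realtime(long *out)  { *out = 1; return 0; }
static int read_monotonic(long *out) { *out = 2; return 0; }
static int read_fallback(int clock, long *out) { *out = -clock; return -1; }

/* Every path ends in a tail call. With -foptimize-sibling-calls, GCC may
 * turn each into a jmp that reuses this function's stack frame rather
 * than a call followed by a ret. */
static int sketch_clock_gettime(int clock, long *out)
{
	switch (clock) {
	case 0:
		return read_realtime(out);
	case 1:
		return read_monotonic(out);
	default:
		return read_fallback(clock, out);
	}
}
```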

Changes from v2:
 - Just remove the second barrier instead of hacking it.  Tests
   still pass.

Changes from v1:
 - Redo the vsyscall_gtod_data address patch to make the code
   cleaner instead of uglier and to make it work for all the
   vsyscall variables.
 - Improve the comments for clarity and formatting.
 - Fix up the changelog for the nsec < 0 tweak (the normalization
   code can't be inline because the two callers are different).
 - Move vread_tsc into its own file, removing a GCC version
   dependence and making it more maintainable.

Andy Lutomirski (6):
  x86-64: Clean up vdso/kernel shared variables
  x86-64: Remove unnecessary barrier in vread_tsc
  x86-64: Don't generate cmov in vread_tsc
  x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
  x86-64: Move vread_tsc into a new file with sensible options
  x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO

 arch/x86/include/asm/tsc.h      |    4 +++
 arch/x86/include/asm/vdso.h     |   14 ----------
 arch/x86/include/asm/vgtod.h    |    2 -
 arch/x86/include/asm/vsyscall.h |   12 +-------
 arch/x86/include/asm/vvar.h     |   52 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/Makefile        |    8 +++--
 arch/x86/kernel/time.c          |    2 +-
 arch/x86/kernel/tsc.c           |   19 --------------
 arch/x86/kernel/vmlinux.lds.S   |   34 ++++++++-----------------
 arch/x86/kernel/vread_tsc_64.c  |   36 +++++++++++++++++++++++++++
 arch/x86/kernel/vsyscall_64.c   |   46 +++++++++++++++-------------------
 arch/x86/vdso/Makefile          |   17 +++++++++++-
 arch/x86/vdso/vclock_gettime.c  |   43 +++++++++++++++++---------------
 arch/x86/vdso/vdso.lds.S        |    7 -----
 arch/x86/vdso/vextern.h         |   16 ------------
 arch/x86/vdso/vgetcpu.c         |    3 +-
 arch/x86/vdso/vma.c             |   27 --------------------
 arch/x86/vdso/vvar.c            |   12 ---------
 18 files changed, 170 insertions(+), 184 deletions(-)
 create mode 100644 arch/x86/include/asm/vvar.h
 create mode 100644 arch/x86/kernel/vread_tsc_64.c
 delete mode 100644 arch/x86/vdso/vextern.h
 delete mode 100644 arch/x86/vdso/vvar.c

-- 
1.7.5.1


Thread overview: 10+ messages
2011-05-10 14:15 [PATCH v3 0/6] Micro-optimize vclock_gettime Andy Lutomirski
2011-05-10 14:15 ` [PATCH v3 1/6] x86-64: Clean up vdso/kernel shared variables Andy Lutomirski
2011-05-10 14:48   ` Borislav Petkov
2011-05-12 11:16     ` Andrew Lutomirski
2011-05-10 14:15 ` [PATCH v3 2/6] x86-64: Remove unnecessary barrier in vread_tsc Andy Lutomirski
2011-05-10 14:15 ` [PATCH v3 3/6] x86-64: Don't generate cmov " Andy Lutomirski
2011-05-10 14:15 ` [PATCH v3 4/6] x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0 Andy Lutomirski
2011-05-10 14:15 ` [PATCH v3 5/6] x86-64: Move vread_tsc into a new file with sensible options Andy Lutomirski
2011-05-10 14:36   ` Peter Zijlstra
2011-05-10 14:15 ` [PATCH v3 6/6] x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO Andy Lutomirski
