From: Andy Lutomirski <luto@MIT.EDU>
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, John Stultz <johnstul@us.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andy Lutomirski <luto@MIT.EDU>
Subject: [PATCH 0/6] x86-64: Micro-optimize vclock_gettime
Date: Mon, 28 Mar 2011 11:06:40 -0400
Message-ID: <cover.1301324270.git.luto@mit.edu>

This series speeds up vclock_gettime(CLOCK_MONOTONIC) by almost 30%
(tested on Sandy Bridge).  The patches are ordered in roughly decreasing
order of improvement.

These are meant for 2.6.40, but if anyone wants to take some of them
for 2.6.39 I won't object.

The changes and timings (fastest of 20 trials of 100M iters on Sandy
Bridge) are:

Unpatched:

CLOCK_MONOTONIC: 22.09ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.65ns
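
For readers who want to reproduce this kind of measurement, a minimal
sketch of a "fastest of N trials" loop follows.  It is not the harness
used for the numbers in this series; best_ns_per_call() and elapsed_ns()
are hypothetical names, and the only real API assumed is POSIX
clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Nanoseconds elapsed from *a to *b. */
static double elapsed_ns(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) * 1e9 +
	       (double)(b->tv_nsec - a->tv_nsec);
}

/*
 * Hypothetical helper: time `iters` back-to-back clock_gettime(clk)
 * calls, repeat that `trials` times, and return the lowest average
 * cost per call in nanoseconds.  Taking the minimum filters out
 * trials disturbed by interrupts or scheduling.
 */
static double best_ns_per_call(clockid_t clk, int trials, long iters)
{
	double best = -1.0;

	for (int t = 0; t < trials; t++) {
		struct timespec start, end, scratch;

		clock_gettime(CLOCK_MONOTONIC, &start);
		for (long i = 0; i < iters; i++)
			clock_gettime(clk, &scratch);
		clock_gettime(CLOCK_MONOTONIC, &end);

		double per_call = elapsed_ns(&start, &end) / (double)iters;
		if (best < 0.0 || per_call < best)
			best = per_call;
	}
	return best;
}
```

Calling best_ns_per_call(CLOCK_MONOTONIC, 20, 100000000L) would mirror
the trial and iteration counts quoted above; absolute results depend on
the CPU, the kernel, and the libc in use.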

x86-64: Optimize vread_tsc's barriers

This replaces lfence;rdtsc;lfence with a faster sequence with similar
ordering guarantees.

CLOCK_MONOTONIC: 18.28ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.98ns

x86-64: Don't generate cmov in vread_tsc

GCC likes to generate a cmov on a branch that's almost completely
predictable.  Force it to generate a real branch instead.

CLOCK_MONOTONIC: 16.30ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.95ns
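
One way to discourage the cmov, shown here only as an illustrative
sketch: mark the common path as likely and put an empty asm statement
on the rare path so the two returns are not trivially merged.  The
function name clamp_to_last() and the "never return less than the last
seen value" comparison are assumptions made for the example, and
whether GCC really emits a branch depends on its version and flags, so
check the disassembly.

```c
#include <stdint.h>

/* Same idea as the kernel's likely(): hint that x is usually true. */
#define likely(x) __builtin_expect(!!(x), 1)

/*
 * Hypothetical example: return `now`, but never a value below `last`.
 * The comparison is almost always true, so a predicted branch is
 * cheaper than a cmov that always waits for both inputs.
 */
static uint64_t clamp_to_last(uint64_t now, uint64_t last)
{
	if (likely(now >= last))
		return now;

	/*
	 * Empty asm statement: it emits no instructions, but GCC cannot
	 * fold it into a conditional move, so the rare path stays a
	 * separate basic block.
	 */
	__asm__ volatile ("");
	return last;
}
```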

x86-64: Put vsyscall_gtod_data at a fixed virtual address

Because vsyscall_gtod_data's address isn't known until load time, the
code contains unnecessary address calculations.  Hardcode it.  This is
a nice speedup for the _COARSE variants as well.

CLOCK_MONOTONIC: 16.12ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 5.31ns

x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0

vset_normalize_timespec was more general than necessary.  Open-code
the appropriate normalization loops.  This is a big win for
CLOCK_MONOTONIC_COARSE.

CLOCK_MONOTONIC: 16.09ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 4.49ns
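
As a rough sketch of what an open-coded normalization loop looks like
when nsec is known to be non-negative (this is not the code from the
patch; normalize_nonneg() is a hypothetical name): only the overflow
direction needs handling, so the nsec < 0 loop disappears.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical helper: store sec/nsec into *ts with tv_nsec brought
 * into [0, NSEC_PER_SEC).  The caller guarantees nsec >= 0, so no
 * loop for negative values is needed.
 */
static void normalize_nonneg(struct timespec *ts, int64_t sec, int64_t nsec)
{
	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		++sec;
	}
	ts->tv_sec = (time_t)sec;
	ts->tv_nsec = (long)nsec;
}
```

In the common case nsec exceeds one second at most once or twice, so
the loop runs a handful of iterations and avoids a division.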

x86-64: Omit frame pointers on vread_tsc

This is a bit silly and needs work for gcc < 4.4 (if we even care),
but, rather surprisingly, it's 0.3ns faster.  I guess that the CPU's
stack frame optimizations aren't quite as good as I thought.

CLOCK_MONOTONIC: 15.79ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 4.50ns

x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO

We're building the vDSO with compiler options that were meant for
kernel code and that disable some optimizations.  Override them, except
for -fno-omit-frame-pointer, since omitting frame pointers might make
userspace debugging harder.

CLOCK_MONOTONIC: 15.66ns
CLOCK_REALTIME_COARSE: 3.44ns
CLOCK_MONOTONIC_COARSE: 4.23ns


Andy Lutomirski (6):
  x86-64: Optimize vread_tsc's barriers
  x86-64: Don't generate cmov in vread_tsc
  x86-64: Put vsyscall_gtod_data at a fixed virtual address
  x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
  x86-64: Omit frame pointers on vread_tsc
  x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO

 arch/x86/kernel/tsc.c          |   48 ++++++++++++++++++++++++++++++++-------
 arch/x86/kernel/vmlinux.lds.S  |   13 +++++-----
 arch/x86/vdso/Makefile         |   15 +++++++++++-
 arch/x86/vdso/vclock_gettime.c |   40 ++++++++++++++++++---------------
 arch/x86/vdso/vextern.h        |    9 ++++++-
 5 files changed, 90 insertions(+), 35 deletions(-)

-- 
1.7.4



Thread overview: 22+ messages
2011-03-28 15:06 Andy Lutomirski [this message]
2011-03-28 15:06 ` [PATCH 1/6] x86-64: Optimize vread_tsc's barriers Andy Lutomirski
2011-03-29  6:18   ` Ingo Molnar
2011-03-28 15:06 ` [PATCH 2/6] x86-64: Don't generate cmov in vread_tsc Andy Lutomirski
2011-03-29  6:15   ` Ingo Molnar
2011-03-29 11:52     ` Andrew Lutomirski
2011-03-28 15:06 ` [PATCH 3/6] x86-64: Put vsyscall_gtod_data at a fixed virtual address Andy Lutomirski
2011-03-28 17:49   ` Thomas Gleixner
2011-03-28 18:09     ` Andrew Lutomirski
2011-03-28 21:35     ` Andrew Lutomirski
2011-03-28 23:13       ` Thomas Gleixner
2011-03-28 15:06 ` [PATCH 4/6] x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0 Andy Lutomirski
2011-03-29  6:21   ` Ingo Molnar
2011-03-29 11:54     ` Andrew Lutomirski
2011-03-28 15:06 ` [PATCH 5/6] x86-64: Omit frame pointers on vread_tsc Andy Lutomirski
2011-03-29  6:24   ` Ingo Molnar
2011-03-28 15:06 ` [PATCH 6/6] x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO Andy Lutomirski
2011-03-29  6:27 ` [PATCH 0/6] x86-64: Micro-optimize vclock_gettime Ingo Molnar
2011-04-06 18:20 ` Andi Kleen
2011-04-06 20:10   ` Andrew Lutomirski
2011-04-06 20:14     ` Andi Kleen
2011-04-06 20:49       ` Andrew Lutomirski
