From: Andy Lutomirski <luto@MIT.EDU>
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, John Stultz <johnstul@us.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Andy Lutomirski <luto@MIT.EDU>
Subject: [PATCH 0/6] x86-64: Micro-optimize vclock_gettime
Date: Mon, 28 Mar 2011 11:06:40 -0400
Message-ID: <cover.1301324270.git.luto@mit.edu>
This series speeds up vclock_gettime(CLOCK_MONOTONIC) by almost 30%
(tested on Sandy Bridge). The patches are ordered in roughly decreasing
order of improvement.
These are meant for 2.6.40, but if anyone wants to take some of them
for 2.6.39 I won't object.
The changes and timings (fastest of 20 trials of 100M iters on Sandy
Bridge) are:
Unpatched:
CLOCK_MONOTONIC: 22.09ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.65ns
x86-64: Optimize vread_tsc's barriers
This replaces lfence;rdtsc;lfence with a faster sequence with similar
ordering guarantees.
CLOCK_MONOTONIC: 18.28ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.98ns
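For reference, the baseline lfence;rdtsc;lfence ordering can be sketched
in userspace C as below. This is only an illustration of the sequence
being replaced; the faster replacement is not spelled out in this cover
letter and is not shown here. The name read_tsc_fenced and the use of
compiler intrinsics are assumptions made for the sketch, not the kernel's
vread_tsc code.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__)
#include <x86intrin.h>

/* Hypothetical helper: fence, read the TSC, fence again. */
static inline uint64_t read_tsc_fenced(void)
{
    _mm_lfence();            /* rdtsc does not execute before earlier instructions complete */
    uint64_t t = __rdtsc();
    _mm_lfence();            /* later instructions do not start before rdtsc completes */
    return t;
}
#else
/* Non-x86 fallback so the sketch still compiles; this is not a TSC read. */
static inline uint64_t read_tsc_fenced(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif
```

Each lfence serializes instruction dispatch, so the pair is the expensive
part of the read.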
x86-64: Don't generate cmov in vread_tsc
GCC likes to generate a cmov on a branch that's almost completely
predictable. Force it to generate a real branch instead.
CLOCK_MONOTONIC: 16.30ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.95ns
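One way to nudge GCC toward a real branch is an empty asm statement on
the unlikely path. The sketch below assumes that approach; the cover
letter does not say which technique the patch uses. The function name
clamp_to_last is hypothetical, and the effect on code generation is
compiler-dependent rather than guaranteed.

```c
#include <stdint.h>

/* Hypothetical helper: return now, unless it is behind last. */
static uint64_t clamp_to_last(uint64_t now, uint64_t last)
{
    /* Hint that the common case is now >= last. */
    if (__builtin_expect(now >= last, 1))
        return now;

    /*
     * Empty asm statement on the rare path. Assumption: with GCC or
     * Clang this tends to keep the compiler from folding both paths
     * into a cmov, but nothing in the language guarantees it.
     */
    __asm__ volatile ("");
    return last;
}
```

Checking the result with objdump -d is the only reliable way to confirm
whether a branch or a cmov was emitted.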
x86-64: Put vsyscall_gtod_data at a fixed virtual address
Because vsyscall_gtod_data's address isn't known until load time, the
code contains unnecessary address calculations. Hardcode it. This is
a nice speedup for the _COARSE variants as well.
CLOCK_MONOTONIC: 16.12ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 5.31ns
x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
vset_normalize_timespec was more general than necessary. Open-code
the appropriate normalization loops. This is a big win for
CLOCK_MONOTONIC_COARSE.
CLOCK_MONOTONIC: 16.09ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 4.49ns
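When nsec is known to be non-negative, normalization only has to carry
whole seconds upward. A minimal sketch of such an open-coded loop
follows; struct ts_sketch and normalize_nonneg are hypothetical names
for illustration, not the identifiers used in the patch.

```c
#include <stdint.h>

#define NSEC_PER_SEC INT64_C(1000000000)

/* Hypothetical stand-in for a timespec with 64-bit fields. */
struct ts_sketch {
    int64_t sec;
    int64_t nsec;
};

/* Carry whole seconds out of nsec. Assumes nsec >= 0 on entry. */
static void normalize_nonneg(struct ts_sketch *ts)
{
    while (ts->nsec >= NSEC_PER_SEC) {
        ts->nsec -= NSEC_PER_SEC;
        ts->sec++;
    }
}
```

Dropping the nsec < 0 case removes a second loop and its compare from
the hot path.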
x86-64: Omit frame pointers on vread_tsc
This is a bit silly and needs work for gcc < 4.4 (if we even care),
but, rather surprisingly, it's 0.3ns faster. I guess that the CPU's
stack frame optimizations aren't quite as good as I thought.
CLOCK_MONOTONIC: 15.79ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 4.50ns
x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO
We're building the vDSO with flags meant for kernel code that disable
some optimizations. Override them, except for -fno-omit-frame-pointer:
omitting frame pointers might make userspace debugging harder.
CLOCK_MONOTONIC: 15.66ns
CLOCK_REALTIME_COARSE: 3.44ns
CLOCK_MONOTONIC_COARSE: 4.23ns
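The "fastest of N trials" method behind the numbers above can be
sketched roughly as follows. This is not the harness used for the
measurements; now_ns and best_ns_per_call are hypothetical names, and
the sketch calls clock_gettime through libc rather than calling the
vDSO entry point directly.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Best per-call cost in ns over `trials` runs of `iters` calls each. */
static double best_ns_per_call(clockid_t clk, int trials, long iters)
{
    double best = -1.0;
    struct timespec sink;

    for (int t = 0; t < trials; t++) {
        uint64_t start = now_ns();
        for (long i = 0; i < iters; i++)
            clock_gettime(clk, &sink);
        uint64_t end = now_ns();

        double per_call = (double)(end - start) / (double)iters;
        if (best < 0.0 || per_call < best)
            best = per_call;    /* keep the fastest trial */
    }
    return best;
}
```

Calling best_ns_per_call(CLOCK_MONOTONIC, 20, 100000000L) would mirror
the 20 trials of 100M iterations quoted above. Taking the minimum rather
than the mean filters out trials disturbed by interrupts or scheduling.
The headline figure follows from the first and last CLOCK_MONOTONIC
rows: (22.09 - 15.66) / 22.09 is about 29%.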
Andy Lutomirski (6):
x86-64: Optimize vread_tsc's barriers
x86-64: Don't generate cmov in vread_tsc
x86-64: Put vsyscall_gtod_data at a fixed virtual address
x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
x86-64: Omit frame pointers on vread_tsc
x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO
arch/x86/kernel/tsc.c | 48 ++++++++++++++++++++++++++++++++-------
arch/x86/kernel/vmlinux.lds.S | 13 +++++-----
arch/x86/vdso/Makefile | 15 +++++++++++-
arch/x86/vdso/vclock_gettime.c | 40 ++++++++++++++++++---------------
arch/x86/vdso/vextern.h | 9 ++++++-
5 files changed, 90 insertions(+), 35 deletions(-)
--
1.7.4
Thread overview: 22+ messages
2011-03-28 15:06 Andy Lutomirski [this message]
2011-03-28 15:06 ` [PATCH 1/6] x86-64: Optimize vread_tsc's barriers Andy Lutomirski
2011-03-29 6:18 ` Ingo Molnar
2011-03-28 15:06 ` [PATCH 2/6] x86-64: Don't generate cmov in vread_tsc Andy Lutomirski
2011-03-29 6:15 ` Ingo Molnar
2011-03-29 11:52 ` Andrew Lutomirski
2011-03-28 15:06 ` [PATCH 3/6] x86-64: Put vsyscall_gtod_data at a fixed virtual address Andy Lutomirski
2011-03-28 17:49 ` Thomas Gleixner
2011-03-28 18:09 ` Andrew Lutomirski
2011-03-28 21:35 ` Andrew Lutomirski
2011-03-28 23:13 ` Thomas Gleixner
2011-03-28 15:06 ` [PATCH 4/6] x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0 Andy Lutomirski
2011-03-29 6:21 ` Ingo Molnar
2011-03-29 11:54 ` Andrew Lutomirski
2011-03-28 15:06 ` [PATCH 5/6] x86-64: Omit frame pointers on vread_tsc Andy Lutomirski
2011-03-29 6:24 ` Ingo Molnar
2011-03-28 15:06 ` [PATCH 6/6] x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO Andy Lutomirski
2011-03-29 6:27 ` [PATCH 0/6] x86-64: Micro-optimize vclock_gettime Ingo Molnar
2011-04-06 18:20 ` Andi Kleen
2011-04-06 20:10 ` Andrew Lutomirski
2011-04-06 20:14 ` Andi Kleen
2011-04-06 20:49 ` Andrew Lutomirski