From: Andy Lutomirski <luto@MIT.EDU>
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, John Stultz <johnstul@us.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Andy Lutomirski <luto@MIT.EDU>
Subject: [PATCH 0/6] x86-64: Micro-optimize vclock_gettime
Date: Mon, 28 Mar 2011 11:06:40 -0400
Message-ID: <cover.1301324270.git.luto@mit.edu>
This series speeds up vclock_gettime(CLOCK_MONOTONIC) by almost 30%
(tested on Sandy Bridge). The patches are ordered roughly by decreasing
improvement.
These are meant for 2.6.40, but if anyone wants to take some of them
for 2.6.39 I won't object.
The changes and timings (fastest of 20 trials of 100M iters on Sandy
Bridge) are:
Unpatched:
CLOCK_MONOTONIC: 22.09ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.65ns
x86-64: Optimize vread_tsc's barriers
This replaces lfence;rdtsc;lfence with a faster sequence with similar
ordering guarantees.
CLOCK_MONOTONIC: 18.28ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.98ns
x86-64: Don't generate cmov in vread_tsc
GCC likes to generate a cmov on a branch that's almost completely
predictable. Force it to generate a real branch instead.
CLOCK_MONOTONIC: 16.30ns
CLOCK_REALTIME_COARSE: 4.23ns
CLOCK_MONOTONIC_COARSE: 5.95ns
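For illustration, here is a minimal userspace sketch of one way to discourage GCC from turning a highly predictable branch into a cmov. The function name clamp_to_last and the local likely() macro are assumptions made for this example, and the empty asm statement is a hint to the compiler, not a guarantee about the generated code.

```c
#include <stdint.h>

/* Local stand-in for the kernel's likely() annotation (assumption). */
#define likely(x) __builtin_expect(!!(x), 1)

/*
 * Return `now` unless it is behind `last`, in which case return `last`.
 * clamp_to_last is a hypothetical name used only for this sketch.
 */
static uint64_t clamp_to_last(uint64_t now, uint64_t last)
{
	if (likely(now >= last))
		return now;

	/*
	 * An empty volatile asm cannot be removed or duplicated by the
	 * compiler, which makes it harder for GCC to fold both return
	 * paths into a single conditional move.
	 */
	__asm__ volatile("");
	return last;
}
```

Whether a branch or a cmov is actually emitted depends on the compiler version and flags, so the disassembly is worth checking.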
x86-64: Put vsyscall_gtod_data at a fixed virtual address
Because vsyscall_gtod_data's address isn't known until load time, the
code contains unnecessary address calculations. Hardcode it. This is
a nice speedup for the _COARSE variants as well.
CLOCK_MONOTONIC: 16.12ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 5.31ns
x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
vset_normalize_timespec was more general than necessary. Open-code
the appropriate normalization loops. This is a big win for
CLOCK_MONOTONIC_COARSE.
CLOCK_MONOTONIC: 16.09ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 4.49ns
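As a rough sketch of the idea (not the code from the patch): when the nanosecond sum can never be negative, normalization only has to handle overflow past one second. The helper name add_and_normalize and its signature are assumptions for this example.

```c
#include <time.h>

/* Nanoseconds per second; named locally to keep the sketch self-contained. */
static const long nsec_per_sec = 1000000000L;

/*
 * Add (sec, nsec) to *ts and normalize the result.
 * Precondition (assumed): ts->tv_nsec + nsec >= 0, so no loop is needed
 * for the negative direction. add_and_normalize is a hypothetical helper.
 */
static void add_and_normalize(struct timespec *ts, long sec, long nsec)
{
	ts->tv_sec += sec;
	ts->tv_nsec += nsec;

	/* Only the upward carry can happen under the precondition. */
	while (ts->tv_nsec >= nsec_per_sec) {
		ts->tv_nsec -= nsec_per_sec;
		ts->tv_sec++;
	}
}
```

A subtraction loop is used in place of a division because the carry is expected to run at most once or twice.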
x86-64: Omit frame pointers on vread_tsc
This is a bit silly and needs work for gcc < 4.4 (if we even care),
but, rather surprisingly, it's 0.3ns faster. I guess that the CPU's
stack frame optimizations aren't quite as good as I thought.
CLOCK_MONOTONIC: 15.79ns
CLOCK_REALTIME_COARSE: 3.70ns
CLOCK_MONOTONIC_COARSE: 4.50ns
x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO
We're building the vDSO with optimizations disabled that were meant
for kernel code. Override those settings, except for
-fno-omit-frame-pointer: dropping it might make userspace debugging
harder.
CLOCK_MONOTONIC: 15.66ns
CLOCK_REALTIME_COARSE: 3.44ns
CLOCK_MONOTONIC_COARSE: 4.23ns
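The "fastest of N trials" timing method used for the numbers above can be approximated from userspace with a sketch like the following. This is not the harness that produced those numbers; the function names best_ns_per_call and ts_diff_ns are assumptions for this example, and results will vary with CPU, kernel, and libc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for CLOCK_*_COARSE on glibc */
#endif
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed from *a to *b. */
static int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL
		+ (b->tv_nsec - a->tv_nsec);
}

/*
 * Time `iters` back-to-back clock_gettime(clk) calls, repeat that for
 * `trials` trials, and return the lowest average cost per call in ns.
 * Taking the minimum filters out trials disturbed by interrupts or
 * scheduling noise.
 */
static double best_ns_per_call(clockid_t clk, int trials, long iters)
{
	double best = -1.0;

	for (int t = 0; t < trials; t++) {
		struct timespec start, end, scratch;

		clock_gettime(CLOCK_MONOTONIC, &start);
		for (long i = 0; i < iters; i++)
			clock_gettime(clk, &scratch);
		clock_gettime(CLOCK_MONOTONIC, &end);

		double ns = (double)ts_diff_ns(&start, &end) / (double)iters;
		if (best < 0.0 || ns < best)
			best = ns;
	}
	return best;
}
```

Calling best_ns_per_call(CLOCK_MONOTONIC, 20, 100000000L) would mirror the trial and iteration counts quoted above; smaller counts are enough for a quick check.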
Andy Lutomirski (6):
x86-64: Optimize vread_tsc's barriers
x86-64: Don't generate cmov in vread_tsc
x86-64: Put vsyscall_gtod_data at a fixed virtual address
x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
x86-64: Omit frame pointers on vread_tsc
x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO
arch/x86/kernel/tsc.c | 48 ++++++++++++++++++++++++++++++++-------
arch/x86/kernel/vmlinux.lds.S | 13 +++++-----
arch/x86/vdso/Makefile | 15 +++++++++++-
arch/x86/vdso/vclock_gettime.c | 40 ++++++++++++++++++---------------
arch/x86/vdso/vextern.h | 9 ++++++-
5 files changed, 90 insertions(+), 35 deletions(-)
--
1.7.4
Thread overview: 22+ messages
2011-03-28 15:06 Andy Lutomirski [this message]
2011-03-28 15:06 ` [PATCH 1/6] x86-64: Optimize vread_tsc's barriers Andy Lutomirski
2011-03-29 6:18 ` Ingo Molnar
2011-03-28 15:06 ` [PATCH 2/6] x86-64: Don't generate cmov in vread_tsc Andy Lutomirski
2011-03-29 6:15 ` Ingo Molnar
2011-03-29 11:52 ` Andrew Lutomirski
2011-03-28 15:06 ` [PATCH 3/6] x86-64: Put vsyscall_gtod_data at a fixed virtual address Andy Lutomirski
2011-03-28 17:49 ` Thomas Gleixner
2011-03-28 18:09 ` Andrew Lutomirski
2011-03-28 21:35 ` Andrew Lutomirski
2011-03-28 23:13 ` Thomas Gleixner
2011-03-28 15:06 ` [PATCH 4/6] x86-64: vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0 Andy Lutomirski
2011-03-29 6:21 ` Ingo Molnar
2011-03-29 11:54 ` Andrew Lutomirski
2011-03-28 15:06 ` [PATCH 5/6] x86-64: Omit frame pointers on vread_tsc Andy Lutomirski
2011-03-29 6:24 ` Ingo Molnar
2011-03-28 15:06 ` [PATCH 6/6] x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO Andy Lutomirski
2011-03-29 6:27 ` [PATCH 0/6] x86-64: Micro-optimize vclock_gettime Ingo Molnar
2011-04-06 18:20 ` Andi Kleen
2011-04-06 20:10 ` Andrew Lutomirski
2011-04-06 20:14 ` Andi Kleen
2011-04-06 20:49 ` Andrew Lutomirski