From: "Michael S. Tsirkin" <mst@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, virtualization@lists.linux-foundation.org,
qemu-devel@nongnu.org
Subject: [PATCH v6] x86: use lock+addl for smp_mb()
Date: Fri, 27 Oct 2017 19:14:31 +0300
Message-ID: <1509118355-4890-1-git-send-email-mst@redhat.com>
mfence appears to be way slower than a locked instruction, so let's use
lock+addl for smp_mb() unconditionally, as we always did on old 32-bit.
Results:
perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0
Before:
0.922565990 seconds time elapsed ( +- 1.15% )
After:
0.578667024 seconds time elapsed ( +- 1.21% )
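To put the two elapsed times in relative terms, the arithmetic can be
sketched as below. The helper names (speedup, reduction_pct) are made up
for illustration and are not part of the patch or of ringtest.

```c
/*
 * Hypothetical helpers, not part of the patch: turn two elapsed
 * times (in seconds) into relative figures.
 */

/* How many times faster the "after" run is than the "before" run. */
double speedup(double before_s, double after_s)
{
	return before_s / after_s;
}

/* Relative reduction in elapsed time, in percent of the "before" run. */
double reduction_pct(double before_s, double after_s)
{
	return 100.0 * (before_s - after_s) / before_s;
}
```

With the figures above, speedup(0.922565990, 0.578667024) comes out at
roughly 1.59 and reduction_pct() at roughly 37%. The +- 1.15% and
+- 1.21% run-to-run variation applies on top of that.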
Just poking at SP would be the most natural, but if we then read the
value from SP, we get a false dependency which will slow us down.
This was noted in this article:
http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
And is easy to reproduce by sticking a barrier in a small non-inline
function.
So let's use a negative offset - which avoids this problem since we
build with the red zone disabled.
For userspace, use an address just below the redzone.
The one difference between lock+addl and mfence is that lock+addl does
not affect clflush. Previous patches converted all users of clflush to
call mb(), so changing smp_mb() won't affect them.
Update mb/rmb/wmb on 32-bit to use the negative offset, too, for
consistency.
As a follow-up, it might be worth considering switching users
of clflush to another API (e.g. clflush_mb?) - we will
then be able to convert mb to smp_mb.
Also arguably, gcc should switch to use lock+add for __sync_synchronize.
This might be worth pursuing separately.
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/x86/include/asm/barrier.h | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
Changes from v5:
- ringtest update
- document mb() interaction with clflush
- add micro-benchmark results
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index bfb28ca..3c6ba1e 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -11,11 +11,11 @@
*/
#ifdef CONFIG_X86_32
-#define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \
+#define mb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "mfence", \
X86_FEATURE_XMM2) ::: "memory", "cc")
-#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence", \
+#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "lfence", \
X86_FEATURE_XMM2) ::: "memory", "cc")
-#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "sfence", \
+#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "sfence", \
X86_FEATURE_XMM2) ::: "memory", "cc")
#else
#define mb() asm volatile("mfence":::"memory")
@@ -30,7 +30,11 @@
#endif
#define dma_wmb() barrier()
-#define __smp_mb() mb()
+#ifdef CONFIG_X86_32
+#define __smp_mb() asm volatile("lock; addl $0,-4(%%esp)" ::: "memory", "cc")
+#else
+#define __smp_mb() asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")
+#endif
#define __smp_rmb() dma_rmb()
#define __smp_wmb() barrier()
#define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
diff --git a/tools/virtio/ringtest/main.h b/tools/virtio/ringtest/main.h
index 90b0133..5706e07 100644
--- a/tools/virtio/ringtest/main.h
+++ b/tools/virtio/ringtest/main.h
@@ -110,11 +110,15 @@ static inline void busy_wait(void)
barrier();
}
+#if defined(__x86_64__) || defined(__i386__)
+#define smp_mb() asm volatile("lock; addl $0,-128(%%rsp)" ::: "memory", "cc")
+#else
/*
* Not using __ATOMIC_SEQ_CST since gcc docs say they are only synchronized
* with other __ATOMIC_SEQ_CST calls.
*/
#define smp_mb() __sync_synchronize()
+#endif
/*
* This abuses the atomic builtins for thread fences, and
--
MST