* [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release()
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
@ 2016-01-10 14:16 ` Michael S. Tsirkin
2016-01-12 16:28 ` Paul E. McKenney
2016-01-10 14:16 ` [PATCH v3 02/41] asm-generic: guard smp_store_release/load_acquire Michael S. Tsirkin
` (39 subsequent siblings)
40 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:16 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mips, linux-ia64, linux-sh, Peter Zijlstra,
Benjamin Herrenschmidt, Heiko Carstens, virtualization,
Paul Mackerras, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Davidlohr Bueso, Russell King - ARM Linux,
Arnd Bergmann, Davidlohr Bueso, Michael Ellerman, x86,
Christian Borntraeger, Linus Torvalds, xen-devel, Ingo Molnar,
Paul E . McKenney, linux-xtensa
From: Davidlohr Bueso <dave@stgolabs.net>
With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/ia64/include/asm/barrier.h | 2 +-
arch/powerpc/include/asm/barrier.h | 2 +-
arch/s390/include/asm/barrier.h | 2 +-
include/asm-generic/barrier.h | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index df896a1..209c4b8 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -77,7 +77,7 @@ do { \
___p1; \
})
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
/*
* The group barrier in front of the rsm & ssm are necessary to ensure
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index 0eca6ef..a7af5fb 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -34,7 +34,7 @@
#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define wmb() __asm__ __volatile__ ("sync" : : : "memory")
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
#ifdef __SUBARCH_HAS_LWSYNC
# define SMPWMB LWSYNC
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index d68e11e..7ffd0b1 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -36,7 +36,7 @@
#define smp_mb__before_atomic() smp_mb()
#define smp_mb__after_atomic() smp_mb()
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
#define smp_store_release(p, v) \
do { \
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index b42afad..0f45f93 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -93,7 +93,7 @@
#endif /* CONFIG_SMP */
#ifndef smp_store_mb
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
#endif
#ifndef smp_mb__before_atomic
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* Re: [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release()
2016-01-10 14:16 ` [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release() Michael S. Tsirkin
@ 2016-01-12 16:28 ` Paul E. McKenney
2016-01-12 18:40 ` Michael S. Tsirkin
0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-12 16:28 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, Peter Zijlstra, Arnd Bergmann, linux-arch,
Andrew Cooper, Russell King - ARM Linux, virtualization,
Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
On Sun, Jan 10, 2016 at 04:16:32PM +0200, Michael S. Tsirkin wrote:
> From: Davidlohr Bueso <dave@stgolabs.net>
>
> With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
> it was made clear that the context of this call (and thus set_mb)
> is strictly for CPU ordering, as opposed to IO. As such all archs
> should use the smp variant of mb(), respecting the semantics and
> saving a mandatory barrier on UP.
>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: <linux-arch@vger.kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: dave@stgolabs.net
> Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Aside from a need for s/lcoking/locking/ in the subject line:
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
> arch/ia64/include/asm/barrier.h | 2 +-
> arch/powerpc/include/asm/barrier.h | 2 +-
> arch/s390/include/asm/barrier.h | 2 +-
> include/asm-generic/barrier.h | 2 +-
> 4 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
> index df896a1..209c4b8 100644
> --- a/arch/ia64/include/asm/barrier.h
> +++ b/arch/ia64/include/asm/barrier.h
> @@ -77,7 +77,7 @@ do { \
> ___p1; \
> })
>
> -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
> +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
>
> /*
> * The group barrier in front of the rsm & ssm are necessary to ensure
> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> index 0eca6ef..a7af5fb 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -34,7 +34,7 @@
> #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
>
> -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
> +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
>
> #ifdef __SUBARCH_HAS_LWSYNC
> # define SMPWMB LWSYNC
> diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
> index d68e11e..7ffd0b1 100644
> --- a/arch/s390/include/asm/barrier.h
> +++ b/arch/s390/include/asm/barrier.h
> @@ -36,7 +36,7 @@
> #define smp_mb__before_atomic() smp_mb()
> #define smp_mb__after_atomic() smp_mb()
>
> -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
> +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
>
> #define smp_store_release(p, v) \
> do { \
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index b42afad..0f45f93 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -93,7 +93,7 @@
> #endif /* CONFIG_SMP */
>
> #ifndef smp_store_mb
> -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
> +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> #endif
>
> #ifndef smp_mb__before_atomic
> --
> MST
>
* Re: [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release()
2016-01-12 16:28 ` Paul E. McKenney
@ 2016-01-12 18:40 ` Michael S. Tsirkin
0 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-12 18:40 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-kernel, Peter Zijlstra, Arnd Bergmann, linux-arch,
Andrew Cooper, Russell King - ARM Linux, virtualization,
Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
On Tue, Jan 12, 2016 at 08:28:44AM -0800, Paul E. McKenney wrote:
> On Sun, Jan 10, 2016 at 04:16:32PM +0200, Michael S. Tsirkin wrote:
> > From: Davidlohr Bueso <dave@stgolabs.net>
> >
> > With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
> > it was made clear that the context of this call (and thus set_mb)
> > is strictly for CPU ordering, as opposed to IO. As such all archs
> > should use the smp variant of mb(), respecting the semantics and
> > saving a mandatory barrier on UP.
> >
> > Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: <linux-arch@vger.kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: dave@stgolabs.net
> > Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
>
> Aside from a need for s/lcoking/locking/ in the subject line:
>
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Thanks!
Though Ingo already put this in tip tree like this,
and I need a copy in my tree to avoid breaking bisect,
so I will probably keep it exactly the same to avoid confusion.
> > ---
> > arch/ia64/include/asm/barrier.h | 2 +-
> > arch/powerpc/include/asm/barrier.h | 2 +-
> > arch/s390/include/asm/barrier.h | 2 +-
> > include/asm-generic/barrier.h | 2 +-
> > 4 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
> > index df896a1..209c4b8 100644
> > --- a/arch/ia64/include/asm/barrier.h
> > +++ b/arch/ia64/include/asm/barrier.h
> > @@ -77,7 +77,7 @@ do { \
> > ___p1; \
> > })
> >
> > -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> >
> > /*
> > * The group barrier in front of the rsm & ssm are necessary to ensure
> > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> > index 0eca6ef..a7af5fb 100644
> > --- a/arch/powerpc/include/asm/barrier.h
> > +++ b/arch/powerpc/include/asm/barrier.h
> > @@ -34,7 +34,7 @@
> > #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> > #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> >
> > -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> >
> > #ifdef __SUBARCH_HAS_LWSYNC
> > # define SMPWMB LWSYNC
> > diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
> > index d68e11e..7ffd0b1 100644
> > --- a/arch/s390/include/asm/barrier.h
> > +++ b/arch/s390/include/asm/barrier.h
> > @@ -36,7 +36,7 @@
> > #define smp_mb__before_atomic() smp_mb()
> > #define smp_mb__after_atomic() smp_mb()
> >
> > -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> >
> > #define smp_store_release(p, v) \
> > do { \
> > diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> > index b42afad..0f45f93 100644
> > --- a/include/asm-generic/barrier.h
> > +++ b/include/asm-generic/barrier.h
> > @@ -93,7 +93,7 @@
> > #endif /* CONFIG_SMP */
> >
> > #ifndef smp_store_mb
> > -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> > #endif
> >
> > #ifndef smp_mb__before_atomic
> > --
> > MST
> >
* [PATCH v3 02/41] asm-generic: guard smp_store_release/load_acquire
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
2016-01-10 14:16 ` [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release() Michael S. Tsirkin
@ 2016-01-10 14:16 ` Michael S. Tsirkin
2016-01-10 14:16 ` [PATCH v3 03/41] ia64: rename nop->iosapic_nop Michael S. Tsirkin
` (38 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:16 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
Allow architectures to override smp_store_release
and smp_load_acquire by guarding the defines
in asm-generic/barrier.h with ifndef directives.
This is in preparation for reusing asm-generic/barrier.h
on architectures which have their own definition
of these macros.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
include/asm-generic/barrier.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 0f45f93..987b2e0 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -104,13 +104,16 @@
#define smp_mb__after_atomic() smp_mb()
#endif
+#ifndef smp_store_release
#define smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
smp_mb(); \
WRITE_ONCE(*p, v); \
} while (0)
+#endif
+#ifndef smp_load_acquire
#define smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = READ_ONCE(*p); \
@@ -118,6 +121,7 @@ do { \
smp_mb(); \
___p1; \
})
+#endif
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */
--
MST
* [PATCH v3 03/41] ia64: rename nop->iosapic_nop
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
2016-01-10 14:16 ` [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release() Michael S. Tsirkin
2016-01-10 14:16 ` [PATCH v3 02/41] asm-generic: guard smp_store_release/load_acquire Michael S. Tsirkin
@ 2016-01-10 14:16 ` Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 04/41] ia64: reuse asm-generic/barrier.h Michael S. Tsirkin
` (37 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:16 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Tony Luck <tony.luck>
asm-generic/barrier.h defines a nop() macro.
To be able to use this header on ia64, we must not
name local functions or variables nop().
There is one instance where this clashes on ia64:
rename that function to iosapic_nop to avoid the conflict.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/ia64/kernel/iosapic.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/ia64/kernel/iosapic.c b/arch/ia64/kernel/iosapic.c
index d2fae05..90fde5b 100644
--- a/arch/ia64/kernel/iosapic.c
+++ b/arch/ia64/kernel/iosapic.c
@@ -256,7 +256,7 @@ set_rte (unsigned int gsi, unsigned int irq, unsigned int dest, int mask)
}
static void
-nop (struct irq_data *data)
+iosapic_nop (struct irq_data *data)
{
/* do nothing... */
}
@@ -415,7 +415,7 @@ iosapic_unmask_level_irq (struct irq_data *data)
#define iosapic_shutdown_level_irq mask_irq
#define iosapic_enable_level_irq unmask_irq
#define iosapic_disable_level_irq mask_irq
-#define iosapic_ack_level_irq nop
+#define iosapic_ack_level_irq iosapic_nop
static struct irq_chip irq_type_iosapic_level = {
.name = "IO-SAPIC-level",
@@ -453,7 +453,7 @@ iosapic_ack_edge_irq (struct irq_data *data)
}
#define iosapic_enable_edge_irq unmask_irq
-#define iosapic_disable_edge_irq nop
+#define iosapic_disable_edge_irq iosapic_nop
static struct irq_chip irq_type_iosapic_edge = {
.name = "IO-SAPIC-edge",
--
MST
* [PATCH v3 04/41] ia64: reuse asm-generic/barrier.h
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (2 preceding siblings ...)
2016-01-10 14:16 ` [PATCH v3 03/41] ia64: rename nop->iosapic_nop Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 06/41] s390: " Michael S. Tsirkin
` (36 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mips, linux-ia64, linux-sh, Peter Zijlstra, virtualization,
H. Peter Anvin, sparclinux, Ingo Molnar, linux-arch, linux-s390,
Davidlohr Bueso, Russell King - ARM Linux, Arnd Bergmann, x86,
xen-devel, Ingo Molnar, linux-xtensa, user-mode-linux-devel,
Stefano Stabellini, adi-buildroot-devel, Thomas Gleixner,
linux-metag, linux-arm-kernel, Tony Luck, Andrew Cooper,
Fenghua Yu <fengh>
On ia64 smp_rmb, smp_wmb, read_barrier_depends, smp_read_barrier_depends
and smp_store_mb() match the asm-generic variants exactly. Drop the
local definitions and pull in asm-generic/barrier.h instead.
This is in preparation for refactoring this code area.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/ia64/include/asm/barrier.h | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index 209c4b8..2f93348 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -48,12 +48,6 @@
# define smp_mb() barrier()
#endif
-#define smp_rmb() smp_mb()
-#define smp_wmb() smp_mb()
-
-#define read_barrier_depends() do { } while (0)
-#define smp_read_barrier_depends() do { } while (0)
-
#define smp_mb__before_atomic() barrier()
#define smp_mb__after_atomic() barrier()
@@ -77,12 +71,12 @@ do { \
___p1; \
})
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
/*
* The group barrier in front of the rsm & ssm are necessary to ensure
* that none of the previous instructions in the same group are
* affected by the rsm/ssm.
*/
+#include <asm-generic/barrier.h>
+
#endif /* _ASM_IA64_BARRIER_H */
--
MST
* [PATCH v3 06/41] s390: reuse asm-generic/barrier.h
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (3 preceding siblings ...)
2016-01-10 14:17 ` [PATCH v3 04/41] ia64: reuse asm-generic/barrier.h Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 07/41] sparc: " Michael S. Tsirkin
` (35 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Martin
On s390 read_barrier_depends, smp_read_barrier_depends,
smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.
This is in preparation for refactoring this code area.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/s390/include/asm/barrier.h | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 7ffd0b1..c358c31 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -30,14 +30,6 @@
#define smp_rmb() rmb()
#define smp_wmb() wmb()
-#define read_barrier_depends() do { } while (0)
-#define smp_read_barrier_depends() do { } while (0)
-
-#define smp_mb__before_atomic() smp_mb()
-#define smp_mb__after_atomic() smp_mb()
-
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
#define smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
@@ -53,4 +45,6 @@ do { \
___p1; \
})
+#include <asm-generic/barrier.h>
+
#endif /* __ASM_BARRIER_H */
--
MST
* [PATCH v3 07/41] sparc: reuse asm-generic/barrier.h
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (4 preceding siblings ...)
2016-01-10 14:17 ` [PATCH v3 06/41] s390: " Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 08/41] arm: " Michael S. Tsirkin
` (34 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Ingo Molnar
On 64-bit sparc, dma_rmb, dma_wmb, smp_store_mb, smp_mb, smp_rmb,
smp_wmb, read_barrier_depends and smp_read_barrier_depends match the
asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.
nop uses __asm__ __volatile__ but is otherwise identical to
the generic version, so drop that as well.
This is in preparation for refactoring this code area.
Note: nop() was in processor.h and not in barrier.h as on other
architectures. Nothing seems to depend on it being there though.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
---
arch/sparc/include/asm/barrier_32.h | 1 -
arch/sparc/include/asm/barrier_64.h | 21 ++-------------------
arch/sparc/include/asm/processor.h | 3 ---
3 files changed, 2 insertions(+), 23 deletions(-)
diff --git a/arch/sparc/include/asm/barrier_32.h b/arch/sparc/include/asm/barrier_32.h
index ae69eda..8059130 100644
--- a/arch/sparc/include/asm/barrier_32.h
+++ b/arch/sparc/include/asm/barrier_32.h
@@ -1,7 +1,6 @@
#ifndef __SPARC_BARRIER_H
#define __SPARC_BARRIER_H
-#include <asm/processor.h> /* for nop() */
#include <asm-generic/barrier.h>
#endif /* !(__SPARC_BARRIER_H) */
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 14a9286..26c3f72 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -37,25 +37,6 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \
#define rmb() __asm__ __volatile__("":::"memory")
#define wmb() __asm__ __volatile__("":::"memory")
-#define dma_rmb() rmb()
-#define dma_wmb() wmb()
-
-#define smp_store_mb(__var, __value) \
- do { WRITE_ONCE(__var, __value); membar_safe("#StoreLoad"); } while(0)
-
-#ifdef CONFIG_SMP
-#define smp_mb() mb()
-#define smp_rmb() rmb()
-#define smp_wmb() wmb()
-#else
-#define smp_mb() __asm__ __volatile__("":::"memory")
-#define smp_rmb() __asm__ __volatile__("":::"memory")
-#define smp_wmb() __asm__ __volatile__("":::"memory")
-#endif
-
-#define read_barrier_depends() do { } while (0)
-#define smp_read_barrier_depends() do { } while (0)
-
#define smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
@@ -74,4 +55,6 @@ do { \
#define smp_mb__before_atomic() barrier()
#define smp_mb__after_atomic() barrier()
+#include <asm-generic/barrier.h>
+
#endif /* !(__SPARC64_BARRIER_H) */
diff --git a/arch/sparc/include/asm/processor.h b/arch/sparc/include/asm/processor.h
index 2fe99e6..9da9646 100644
--- a/arch/sparc/include/asm/processor.h
+++ b/arch/sparc/include/asm/processor.h
@@ -5,7 +5,4 @@
#else
#include <asm/processor_32.h>
#endif
-
-#define nop() __asm__ __volatile__ ("nop")
-
#endif
--
MST
* [PATCH v3 08/41] arm: reuse asm-generic/barrier.h
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (5 preceding siblings ...)
2016-01-10 14:17 ` [PATCH v3 07/41] sparc: " Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 09/41] arm64: " Michael S. Tsirkin
` (33 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Russell King <rmk+k>
On arm smp_store_mb, read_barrier_depends, smp_read_barrier_depends,
smp_store_release, smp_load_acquire, smp_mb__before_atomic and
smp_mb__after_atomic match the asm-generic variants exactly. Drop the
local definitions and pull in asm-generic/barrier.h instead.
This is in preparation for refactoring this code area.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
arch/arm/include/asm/barrier.h | 23 +----------------------
1 file changed, 1 insertion(+), 22 deletions(-)
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 3ff5642..31152e8 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -70,28 +70,7 @@ extern void arm_heavy_mb(void);
#define smp_wmb() dmb(ishst)
#endif
-#define smp_store_release(p, v) \
-do { \
- compiletime_assert_atomic_type(*p); \
- smp_mb(); \
- WRITE_ONCE(*p, v); \
-} while (0)
-
-#define smp_load_acquire(p) \
-({ \
- typeof(*p) ___p1 = READ_ONCE(*p); \
- compiletime_assert_atomic_type(*p); \
- smp_mb(); \
- ___p1; \
-})
-
-#define read_barrier_depends() do { } while(0)
-#define smp_read_barrier_depends() do { } while(0)
-
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
-#define smp_mb__before_atomic() smp_mb()
-#define smp_mb__after_atomic() smp_mb()
+#include <asm-generic/barrier.h>
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_BARRIER_H */
--
MST
* [PATCH v3 09/41] arm64: reuse asm-generic/barrier.h
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (6 preceding siblings ...)
2016-01-10 14:17 ` [PATCH v3 08/41] arm: " Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 10/41] metag: " Michael S. Tsirkin
` (32 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Catalin Marinas <ca>
On arm64 nop, read_barrier_depends, smp_read_barrier_depends,
smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.
This is in preparation for refactoring this code area.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/arm64/include/asm/barrier.h | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 9622eb4..91a43f4 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -91,14 +91,7 @@ do { \
__u.__val; \
})
-#define read_barrier_depends() do { } while(0)
-#define smp_read_barrier_depends() do { } while(0)
-
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-#define nop() asm volatile("nop");
-
-#define smp_mb__before_atomic() smp_mb()
-#define smp_mb__after_atomic() smp_mb()
+#include <asm-generic/barrier.h>
#endif /* __ASSEMBLY__ */
--
MST
* [PATCH v3 10/41] metag: reuse asm-generic/barrier.h
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (7 preceding siblings ...)
2016-01-10 14:17 ` [PATCH v3 09/41] arm64: " Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
2016-01-10 14:18 ` [PATCH v3 11/41] mips: " Michael S. Tsirkin
` (31 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, James Hogan <james.
On metag dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
smp_read_barrier_depends, smp_store_release and smp_load_acquire match
the asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.
This is in preparation for refactoring this code area.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/metag/include/asm/barrier.h | 25 ++-----------------------
1 file changed, 2 insertions(+), 23 deletions(-)
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index 172b7e5..b5b778b 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -44,9 +44,6 @@ static inline void wr_fence(void)
#define rmb() barrier()
#define wmb() mb()
-#define dma_rmb() rmb()
-#define dma_wmb() wmb()
-
#ifndef CONFIG_SMP
#define fence() do { } while (0)
#define smp_mb() barrier()
@@ -81,27 +78,9 @@ static inline void fence(void)
#endif
#endif
-#define read_barrier_depends() do { } while (0)
-#define smp_read_barrier_depends() do { } while (0)
-
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
-#define smp_store_release(p, v) \
-do { \
- compiletime_assert_atomic_type(*p); \
- smp_mb(); \
- WRITE_ONCE(*p, v); \
-} while (0)
-
-#define smp_load_acquire(p) \
-({ \
- typeof(*p) ___p1 = READ_ONCE(*p); \
- compiletime_assert_atomic_type(*p); \
- smp_mb(); \
- ___p1; \
-})
-
#define smp_mb__before_atomic() barrier()
#define smp_mb__after_atomic() barrier()
+#include <asm-generic/barrier.h>
+
#endif /* _ASM_METAG_BARRIER_H */
--
MST
* [PATCH v3 11/41] mips: reuse asm-generic/barrier.h
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (8 preceding siblings ...)
2016-01-10 14:17 ` [PATCH v3 10/41] metag: " Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
2016-01-12 1:14 ` [v3,11/41] " Leonid Yegoshin
2016-01-10 14:18 ` [PATCH v3 12/41] x86/um: " Michael S. Tsirkin
` (30 subsequent siblings)
40 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Ralf Baechle
On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
smp_read_barrier_depends, smp_store_release and smp_load_acquire match
the asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.
This is in preparation for refactoring this code area.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/mips/include/asm/barrier.h | 25 ++-----------------------
1 file changed, 2 insertions(+), 23 deletions(-)
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 752e0b8..3eac4b9 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -10,9 +10,6 @@
#include <asm/addrspace.h>
-#define read_barrier_depends() do { } while(0)
-#define smp_read_barrier_depends() do { } while(0)
-
#ifdef CONFIG_CPU_HAS_SYNC
#define __sync() \
__asm__ __volatile__( \
@@ -87,8 +84,6 @@
#define wmb() fast_wmb()
#define rmb() fast_rmb()
-#define dma_wmb() fast_wmb()
-#define dma_rmb() fast_rmb()
#if defined(CONFIG_WEAK_ORDERING) && defined(CONFIG_SMP)
# ifdef CONFIG_CPU_CAVIUM_OCTEON
@@ -112,9 +107,6 @@
#define __WEAK_LLSC_MB " \n"
#endif
-#define smp_store_mb(var, value) \
- do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
#define smp_llsc_mb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
#ifdef CONFIG_CPU_CAVIUM_OCTEON
@@ -129,22 +121,9 @@
#define nudge_writes() mb()
#endif
-#define smp_store_release(p, v) \
-do { \
- compiletime_assert_atomic_type(*p); \
- smp_mb(); \
- WRITE_ONCE(*p, v); \
-} while (0)
-
-#define smp_load_acquire(p) \
-({ \
- typeof(*p) ___p1 = READ_ONCE(*p); \
- compiletime_assert_atomic_type(*p); \
- smp_mb(); \
- ___p1; \
-})
-
#define smp_mb__before_atomic() smp_mb__before_llsc()
#define smp_mb__after_atomic() smp_llsc_mb()
+#include <asm-generic/barrier.h>
+
#endif /* __ASM_BARRIER_H */
--
MST
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-10 14:18 ` [PATCH v3 11/41] mips: " Michael S. Tsirkin
@ 2016-01-12 1:14 ` Leonid Yegoshin
[not found] ` <56945366.2090504-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
2016-01-12 9:27 ` Peter Zijlstra
0 siblings, 2 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-12 1:14 UTC (permalink / raw)
To: Michael S. Tsirkin, linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
> smp_read_barrier_depends, smp_store_release and smp_load_acquire match
> the asm-generic variants exactly. Drop the local definitions and pull in
> asm-generic/barrier.h instead.
>
This statement doesn't fit the MIPS barrier variations. Moreover, there is
a reason to make them even more specific, at least for
smp_store_release and smp_load_acquire; look into
http://patchwork.linux-mips.org/patch/10506/
- Leonid.
[parent not found: <56945366.2090504-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>]
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
[not found] ` <56945366.2090504-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
@ 2016-01-12 8:43 ` Michael S. Tsirkin
2016-01-12 9:51 ` Peter Zijlstra
0 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-12 8:43 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: linux-kernel, Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:
> On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> >On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
> >smp_read_barrier_depends, smp_store_release and smp_load_acquire match
> >the asm-generic variants exactly. Drop the local definitions and pull in
> >asm-generic/barrier.h instead.
> >
> This statement doesn't fit the MIPS barrier variations. Moreover, there is a
> reason to make them even more specific, at least for smp_store_release and
> smp_load_acquire; look into
>
> http://patchwork.linux-mips.org/patch/10506/
>
> - Leonid.
Fine, but it matches what the current code is doing. Since that
MIPS_LIGHTWEIGHT_SYNC patch didn't go into linux-next yet, do
you see a problem reworking it on top of this patchset?
--
MST
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-12 8:43 ` Michael S. Tsirkin
@ 2016-01-12 9:51 ` Peter Zijlstra
0 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12 9:51 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-mips, linux-ia64, linux-sh, virtualization, H. Peter Anvin,
sparclinux, Ingo Molnar, linux-arch, linux-s390,
Russell King - ARM Linux, Arnd Bergmann, x86, xen-devel,
Ingo Molnar, linux-xtensa, user-mode-linux-devel,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
Thomas Gleixner, linux-metag, linux-arm-kernel, Andrew Cooper,
linux-kernel, Ralf Baechle
On Tue, Jan 12, 2016 at 10:43:36AM +0200, Michael S. Tsirkin wrote:
> On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:
> > On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> > >On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
> > >smp_read_barrier_depends, smp_store_release and smp_load_acquire match
> > >the asm-generic variants exactly. Drop the local definitions and pull in
> > >asm-generic/barrier.h instead.
> > >
> > This statement doesn't fit the MIPS barrier variations. Moreover, there is a
> > reason to make them even more specific, at least for smp_store_release and
> > smp_load_acquire; look into
> >
> > http://patchwork.linux-mips.org/patch/10506/
> >
> > - Leonid.
>
> Fine, but it matches what the current code is doing. Since that
> MIPS_LIGHTWEIGHT_SYNC patch didn't go into linux-next yet, do
> you see a problem reworking it on top of this patchset?
That patch is a complete doorstop atm. It needs a lot more work before
it can go anywhere. Don't worry about it.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-12 1:14 ` [v3,11/41] " Leonid Yegoshin
[not found] ` <56945366.2090504-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
@ 2016-01-12 9:27 ` Peter Zijlstra
2016-01-12 10:25 ` Peter Zijlstra
1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12 9:27 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, will.deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux, Arnd Bergmann,
linux-sh, x86, xen-devel, Ingo Molnar, linux-xtensa, james.hogan,
user-mode-linux-devel, Stefano Stabellini, adi-buildroot-devel,
ddaney.cavm, Thomas Gleixner, linux-metag, linux-arm-kernel,
Andrew Cooper, linux-kernel
On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:
> This statement doesn't fit the MIPS barrier variations. Moreover, there is a
> reason to make them even more specific, at least for smp_store_release and
> smp_load_acquire; look into
>
> http://patchwork.linux-mips.org/patch/10506/
Dude, that's one horrible patch.
1) you do not make such things selectable; either the hardware needs
them or it doesn't. If it does you _must_ use them, however unlikely.
2) the changelog _completely_ fails to explain the sync 0x11 and sync
0x12 semantics nor does it provide a publicly accessible link to
documentation that does.
3) it really should have explained what you did with
smp_llsc_mb/smp_mb__before_llsc() in _detail_.
And I agree that ideally it should be split into parts.
Seriously, this is _NOT_ OK.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-12 9:27 ` Peter Zijlstra
@ 2016-01-12 10:25 ` Peter Zijlstra
2016-01-12 10:40 ` Peter Zijlstra
0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12 10:25 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, will.deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux, Arnd Bergmann,
linux-sh, x86, xen-devel, Ingo Molnar, linux-xtensa, james.hogan,
user-mode-linux-devel, Stefano Stabellini, adi-buildroot-devel,
ddaney.cavm, Thomas Gleixner, linux-metag, linux-arm-kernel,
Andrew Cooper, linux-kernel
On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> 0x12 semantics nor does it provide a publicly accessible link to
> documentation that does.
Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/
> 3) it really should have explained what you did with
> smp_llsc_mb/smp_mb__before_llsc() in _detail_.
And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
are _NOT_ transitive and therefore cannot be used to implement the
smp_mb__{before,after} stuff.
That is, in MIPS speak, those SYNC types are Ordering Barriers, not
Completion Barriers. They need not be globally performed.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-12 10:25 ` Peter Zijlstra
@ 2016-01-12 10:40 ` Peter Zijlstra
2016-01-12 11:41 ` Will Deacon
0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12 10:40 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, will.deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux, Arnd Bergmann,
linux-sh, Michael Ellerman, x86, xen-devel, Ingo Molnar,
Paul McKenney, linux-xtensa, james.hogan, user-mode-linux-devel,
Stefano Stabellini, adi-buildroot-devel, ddaney.cavm,
Thomas Gleixner, linux-metag, linux-arm-kernel
On Tue, Jan 12, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> > 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> > 0x12 semantics nor does it provide a publicly accessible link to
> > documentation that does.
>
> Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/
>
> > 3) it really should have explained what you did with
> > smp_llsc_mb/smp_mb__before_llsc() in _detail_.
>
> And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> are _NOT_ transitive and therefore cannot be used to implement the
> smp_mb__{before,after} stuff.
>
> That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> Completion Barriers. They need not be globally performed.
Which if true; and I know Will has some questions here; would also mean
that you 'cannot' use the ACQUIRE/RELEASE barriers for your locks as was
recently suggested by David Daney.
That is, currently all architectures -- with exception of PPC -- have
RCsc locks, but using these non-transitive things will get you RCpc
locks.
So yes, MIPS can go RCpc for its locks and share the burden of pain with
PPC, but that needs to be a very conscious decision.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-12 10:40 ` Peter Zijlstra
@ 2016-01-12 11:41 ` Will Deacon
2016-01-12 20:45 ` Leonid Yegoshin
0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-12 11:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, virtualization,
H. Peter Anvin, sparclinux, Ingo Molnar, linux-arch, linux-s390,
Russell King - ARM Linux, user-mode-linux-devel, linux-sh,
Michael Ellerman, x86, xen-devel, Ingo Molnar, Paul McKenney,
linux-xtensa, james.hogan, Arnd Bergmann, Stefano Stabellini,
adi-buildroot-devel, Leonid Yegoshin, ddaney.cavm,
Thomas Gleixner, linux-metag
On Tue, Jan 12, 2016 at 11:40:12AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 12, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> > > 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> > > 0x12 semantics nor does it provide a publicly accessible link to
> > > documentation that does.
> >
> > Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/
> >
> > > 3) it really should have explained what you did with
> > > smp_llsc_mb/smp_mb__before_llsc() in _detail_.
> >
> > And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> > are _NOT_ transitive and therefore cannot be used to implement the
> > smp_mb__{before,after} stuff.
> >
> > That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> > Completion Barriers. They need not be globally performed.
>
> Which if true; and I know Will has some questions here; would also mean
> that you 'cannot' use the ACQUIRE/RELEASE barriers for your locks as was
> recently suggested by David Daney.
The issue I have with the SYNC description in the text above is that it
describes the single CPU (program order) and the dual-CPU (confusingly
named global order) cases, but then doesn't generalise any further. That
means we can't sensibly reason about transitivity properties when a third
agent is involved. For example, the WRC+sync+addr test:
P0:
Wx = 1
P1:
Rx = 1
SYNC
Wy = 1
P2:
Ry = 1
<address dep>
Rx = 0
I can't find anything to forbid that, given the text. The main problem
is having the SYNC on P1 affect the write by P0.
> That is, currently all architectures -- with exception of PPC -- have
> RCsc locks, but using these non-transitive things will get you RCpc
> locks.
>
> So yes, MIPS can go RCpc for its locks and share the burden of pain with
> > PPC, but that needs to be a very conscious decision.
I think it's much worse than RCpc, given my interpretation of the wording.
Will
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-12 11:41 ` Will Deacon
@ 2016-01-12 20:45 ` Leonid Yegoshin
2016-01-12 21:40 ` Peter Zijlstra
2016-01-13 10:45 ` Will Deacon
0 siblings, 2 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-12 20:45 UTC (permalink / raw)
To: Will Deacon, Peter Zijlstra
Cc: Michael S. Tsirkin, linux-kernel, Arnd Bergmann, linux-arch,
Andrew Cooper, Russell King - ARM Linux, virtualization,
Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa
(I try to answer multiple mails in one.)
First of all, it seems like some generic notes should be given here:
1. The generic MIPS "SYNC" (aka "SYNC 0") instruction is very heavy on
some CPUs. On those CPUs it basically kills the pipeline in each CPU, can do
a special memory/IO bus transaction (similar to "fence") and hold the
system until all R/W is completed. It is like the Big Kernel Lock, but worse.
So the move to the smp_* kind of barriers is needed to improve performance,
especially on the newest CPUs with long pipelines.
2. The MIPS Arch document may be misleading because the words "ordering" and
"completion" mean something different than in Linux; the SYNC instruction
description is written for HW engineers. I wrote about that in a separate
patch of the same patchset -
http://patchwork.linux-mips.org/patch/10505/ "MIPS: R6: Use lightweight
SYNC instruction in smp_* memory barriers":
> These instructions were specifically designed to work for the smp_*() sort of
> memory barriers in MIPS R2/R3/R5 and R6.
>
> Unfortunately, their description is very cryptic and is done in a HW engineering
> style which prevents SW from using them.
3. I bothered the MIPS Arch team for a long time until I completely understood
that MIPS SYNC_WMB, SYNC_MB, SYNC_RMB, SYNC_RELEASE and SYNC_ACQUIRE do
exactly what is required by Documentation/memory-barriers.txt
In Peter Zijlstra's mail:
> 1) you do not make such things selectable; either the hardware needs
> them or it doesn't. If it does you _must_ use them, however unlikely.
It is selectable only for MIPS R2, not MIPS R6. The reason is that most
MIPS R2 CPUs have a short pipeline and that SYNC is just a waste of CPU
resources, especially taking into account that "lightweight syncs" are
converted to a heavy "SYNC 0" on many of those CPUs. However, the latest
MIPS/Imagination CPUs have a pipeline long enough to hit a problem - the
absence of SYNC at LL/SC inside atomics, barriers, etc.
> And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> are _NOT_ transitive and therefore cannot be used to implement the
> smp_mb__{before,after} stuff.
>
> That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> Completion Barriers.
Please see above, point 2.
> That is, currently all architectures -- with exception of PPC -- have
> RCsc locks, but using these non-transitive things will get you RCpc
> locks.
>
> So yes, MIPS can go RCpc for its locks and share the burden of pain with
> PPC, but that needs to be a very conscious decision.
I don't understand that - I tried hard but I can't find any words like
"RCsc" or "RCpc" in the Documentation/ directory. A web search goes nowhere, of course.
In Will Deacon mail:
> The issue I have with the SYNC description in the text above is that it
> describes the single CPU (program order) and the dual-CPU (confusingly
> named global order) cases, but then doesn't generalise any further. That
> means we can't sensibly reason about transitivity properties when a third
> agent is involved. For example, the WRC+sync+addr test:
>
>
> P0:
> Wx = 1
>
> P1:
> Rx = 1
> SYNC
> Wy = 1
>
> P2:
> Ry = 1
> <address dep>
> Rx = 0
>
>
> I can't find anything to forbid that, given the text. The main problem
> is having the SYNC on P1 affect the write by P0.
As I understand that test, the visibility of P0: W[x] = 1 is identical
to P1 and P2 here. If P1 got X before the SYNC and writes to Y after the SYNC,
then instruction source-register dependency tracking in P2 prevents a
speculative load of X before P2 obtains Y from the same place as P0/P1
and calculates the address of X. If some load of X in P2 happens before
the address-dependency calculation, its result is discarded.
Yes, you can't find that in the MIPS SYNC instruction description; it is
more likely in the CM (Coherence Manager) area. I just pointed our arch team
member responsible for documents at it, and he will think about how to explain that.
- Leonid.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-12 20:45 ` Leonid Yegoshin
@ 2016-01-12 21:40 ` Peter Zijlstra
2016-01-13 0:21 ` Leonid Yegoshin
2016-01-13 10:45 ` Will Deacon
1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12 21:40 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: Will Deacon, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
linux-arch, Andrew Cooper, Russell King - ARM Linux,
virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
linux-metag, linux-mips, x86, user-mode-linux-devel,
adi-buildroot-devel, linux-sh, linux-xtensa
On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
> (I try to answer multiple mails in one.)
>
> First of all, it seems like some generic notes should be given here:
>
> 1. The generic MIPS "SYNC" (aka "SYNC 0") instruction is very heavy on some
> CPUs. On those CPUs it basically kills the pipeline in each CPU, can do a
> special memory/IO bus transaction (similar to "fence") and hold the system
> until all R/W is completed. It is like the Big Kernel Lock, but worse. So the
> move to the smp_* kind of barriers is needed to improve performance, especially
> on the newest CPUs with long pipelines.
The MIPS SYNC isn't any worse than the PPC SYNC, x86 MFENCE or arm DSB
SY, yes they're heavy, so what.
> 2. The MIPS Arch document may be misleading because the words "ordering" and
> "completion" mean something different than in Linux; the SYNC instruction
> description is written for HW engineers. I wrote about that in a separate
> patch of the same patchset - http://patchwork.linux-mips.org/patch/10505/
> "MIPS: R6: Use lightweight SYNC instruction in smp_* memory barriers":
Did you actually say anything here?
> >These instructions were specifically designed to work for the smp_*() sort of
> >memory barriers in MIPS R2/R3/R5 and R6.
> >
> >Unfortunately, their description is very cryptic and is done in a HW engineering
> >style which prevents SW from using them.
>
> 3. I bothered the MIPS Arch team for a long time until I completely understood
> that MIPS SYNC_WMB, SYNC_MB, SYNC_RMB, SYNC_RELEASE and SYNC_ACQUIRE do
> exactly what is required by Documentation/memory-barriers.txt
Ha! and you think that document covers all the really fun details?
In particular we're very much all 'confused' about the various notions
of transitivity and what barriers imply how much of it.
> In Peter Zijlstra's mail:
>
> >1) you do not make such things selectable; either the hardware needs
> >them or it doesn't. If it does you _must_ use them, however unlikely.
> It is selectable only for MIPS R2, not MIPS R6. The reason is that most
> MIPS R2 CPUs have a short pipeline and that SYNC is just a waste of CPU
> resources, especially taking into account that "lightweight syncs" are
> converted to a heavy "SYNC 0" on many of those CPUs. However, the latest
> MIPS/Imagination CPUs have a pipeline long enough to hit a problem - the absence
> of SYNC at LL/SC inside atomics, barriers, etc.
What ?! Are you saying that because R2 has short pipelines its unlikely
to hit the reordering issues and we can omit barriers?
> >And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> >are _NOT_ transitive and therefore cannot be used to implement the
> >smp_mb__{before,after} stuff.
> >
> >That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> >Completion Barriers.
>
> Please see above, point 2.
That did not in fact enlighten things. Are they transitive/multi-copy
atomic or not?
(and here Will will go into great detail on the differences between the
two and make our collective brains explode :-)
> >That is, currently all architectures -- with exception of PPC -- have
> >RCsc locks, but using these non-transitive things will get you RCpc
> >locks.
> >
> >So yes, MIPS can go RCpc for its locks and share the burden of pain with
> >PPC, but that needs to be a very conscious decision.
>
> I don't understand that - I tried hard but I can't find any words like
> "RCsc" or "RCpc" in the Documentation/ directory. A web search goes nowhere, of course.
From: lkml.kernel.org/r/20150828153921.GF19282@twins.programming.kicks-ass.net
Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
not.
Currently PowerPC is the only arch that (can, and) does RCpc and gives a
weaker RELEASE + ACQUIRE. Only the CPU who did the ACQUIRE is guaranteed
to see the stores of the CPU which did the RELEASE in order.
As it stands, RCU is the only _known_ codebase where this matters, but
we did in fact write code for a fair number of years 'assuming' RELEASE
+ ACQUIRE was a full barrier, so who knows what else is out there.
RCsc - release consistency sequential consistency
RCpc - release consistency processor consistency
https://en.wikipedia.org/wiki/Processor_consistency
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-12 21:40 ` Peter Zijlstra
@ 2016-01-13 0:21 ` Leonid Yegoshin
0 siblings, 0 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-13 0:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Will Deacon, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
linux-arch, Andrew Cooper, Russell King - ARM Linux,
virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
linux-metag, linux-mips, x86, user-mode-linux-devel,
adi-buildroot-devel, linux-sh
On 01/12/2016 01:40 PM, Peter Zijlstra wrote:
>
>> It is selectable only for MIPS R2, not MIPS R6. The reason is that most
>> MIPS R2 CPUs have a short pipeline and that SYNC is just a waste of CPU
>> resources, especially taking into account that "lightweight syncs" are
>> converted to a heavy "SYNC 0" on many of those CPUs. However, the latest
>> MIPS/Imagination CPUs have a pipeline long enough to hit a problem - the absence
>> of SYNC at LL/SC inside atomics, barriers, etc.
> What ?! Are you saying that because R2 has short pipelines it's unlikely
> to hit the reordering issues and we can omit barriers?
It was my guess to explain why barriers were not included originally.
You can check with Ralf; he knows more about the MIPS Linux code of that time.
I have bothered with this for more than 2 years, and I am just trying to solve
that issue - in recent CPUs the load after an LL/SC synchronization loop
can get ahead of the SC for sure; it was tested.
>
>>> And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
>>> are _NOT_ transitive and therefore cannot be used to implement the
>>> smp_mb__{before,after} stuff.
>>>
>>> That is, in MIPS speak, those SYNC types are Ordering Barriers, not
>>> Completion Barriers.
>> Please see above, point 2.
> That did not in fact enlighten things. Are they transitive/multi-copy
> atomic or not?
Peter Zijlstra recently wrote: "In particular we're very much all
'confused' about the various notions of transitivity". I am actually
confused too and need some examples here.
>
> (and here Will will go into great detail on the differences between the
> two and make our collective brains explode :-)
>
>>> That is, currently all architectures -- with exception of PPC -- have
>>> RCsc locks, but using these non-transitive things will get you RCpc
>>> locks.
>>>
>>> So yes, MIPS can go RCpc for its locks and share the burden of pain with
>>> PPC, but that needs to be a very conscious decision.
>> I don't understand that - I tried hard but I can't find any words like
>> "RCsc" or "RCpc" in the Documentation/ directory. A web search goes nowhere, of course.
> From: lkml.kernel.org/r/20150828153921.GF19282@twins.programming.kicks-ass.net
>
> Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
> ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
> not.
MIPS Arch starting from R2 requires that. If some CPU can't, it should
execute a full "SYNC 0" instead, which is a full memory barrier.
>
> Currently PowerPC is the only arch that (can, and) does RCpc and gives a
> weaker RELEASE + ACQUIRE. Only the CPU who did the ACQUIRE is guaranteed
> to see the stores of the CPU which did the RELEASE in order.
Yes, it was a goal for SYNC_ACQUIRE and SYNC_RELEASE.
Caveats:
- "Full memory barrier" on MIPS means a full barrier for any device
in the coherent domain. In MIPS Tech/Imagination Tech MIPS-based CPUs that is
"for any device connected to CM or IOCU + directly connected memory".
- It does not apply to instruction fetch. However, I-Cache flushes
and SYNCI are consistent with that. There are also hazard barrier
instructions to clear the CPU pipeline to some extent, to help with this
limitation.
- Leonid.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-12 20:45 ` Leonid Yegoshin
2016-01-12 21:40 ` Peter Zijlstra
@ 2016-01-13 10:45 ` Will Deacon
2016-01-13 19:02 ` Leonid Yegoshin
2016-01-13 22:26 ` Leonid Yegoshin
1 sibling, 2 replies; 153+ messages in thread
From: Will Deacon @ 2016-01-13 10:45 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
linux-arch, Andrew Cooper, Russell King - ARM Linux,
virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
linux-metag, linux-mips, x86, user-mode-linux-devel,
adi-buildroot-devel, linux-sh
On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
> >The issue I have with the SYNC description in the text above is that it
> >describes the single CPU (program order) and the dual-CPU (confusingly
> >named global order) cases, but then doesn't generalise any further. That
> >means we can't sensibly reason about transitivity properties when a third
> >agent is involved. For example, the WRC+sync+addr test:
> >
> >
> >P0:
> >Wx = 1
> >
> >P1:
> >Rx = 1
> >SYNC
> >Wy = 1
> >
> >P2:
> >Ry = 1
> ><address dep>
> >Rx = 0
> >
> >
> >I can't find anything to forbid that, given the text. The main problem
> >is having the SYNC on P1 affect the write by P0.
>
> As I understand that test, the visibility of P0: W[x] = 1 is identical to P1
> and P2 here. If P1 got X before the SYNC and writes to Y after the SYNC, then
> instruction source-register dependency tracking in P2 prevents a speculative
> load of X before P2 obtains Y from the same place as P0/P1 and calculates the
> address of X. If some load of X in P2 happens before the address-dependency
> calculation, its result is discarded.
I don't think the address dependency is enough on its own. By that
reasoning, the following variant (WRC+addr+addr) would work too:
P0:
Wx = 1
P1:
Rx = 1
<address dep>
Wy = 1
P2:
Ry = 1
<address dep>
Rx = 0
So are you saying that this is also forbidden?
Imagine that P0 and P1 are two threads that share a store buffer. What
then?
> Yes, you can't find that in the MIPS SYNC instruction description; it is
> more likely in the CM (Coherence Manager) area. I just pinged our arch team
> member responsible for documents and he will think about how to explain that.
I tried grepping the linked documents for "coherence manager" but couldn't
find anything. Is the description you refer to available anywhere?
Will
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-13 10:45 ` Will Deacon
@ 2016-01-13 19:02 ` Leonid Yegoshin
2016-01-13 20:48 ` Peter Zijlstra
2016-01-13 22:26 ` Leonid Yegoshin
1 sibling, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-13 19:02 UTC (permalink / raw)
To: Will Deacon
Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
linux-arch, Andrew Cooper, Russell King - ARM Linux,
virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
linux-metag, linux-mips, x86, user-mode-linux-devel,
adi-buildroot-devel, linux-sh
On 01/13/2016 02:45 AM, Will Deacon wrote:
> On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
>>
> I don't think the address dependency is enough on its own. By that
> reasoning, the following variant (WRC+addr+addr) would work too:
>
>
> P0:
> Wx = 1
>
> P1:
> Rx = 1
> <address dep>
> Wy = 1
>
> P2:
> Ry = 1
> <address dep>
> Rx = 0
>
>
> So are you saying that this is also forbidden?
> Imagine that P0 and P1 are two threads that share a store buffer. What
> then?
>
I asked the HW team about it, but I have a question - does it have any
relationship with replacing the MIPS SYNC with lightweight SYNCs (SYNC_WMB
etc.)? Either you use a barrier or you do not, and I just voice an
intention to use a more efficient instruction instead of a blunt hammer
(the SYNC instruction). If you don't use any barrier here then it is a
different issue.
Maybe it makes sense to return to the original issue?
- Leonid
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-13 19:02 ` Leonid Yegoshin
@ 2016-01-13 20:48 ` Peter Zijlstra
2016-01-13 20:58 ` Leonid Yegoshin
0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-13 20:48 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, Paul McKenney, linux-xtensa, james.hogan,
Arnd Bergmann, Stefano Stabellini, adi-buildroot-devel,
ddaney.cavm, Thomas Gleixner, linux-metag
On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:
> I asked the HW team about it, but I have a question - does it have any
> relationship with replacing the MIPS SYNC with lightweight SYNCs (SYNC_WMB etc.)?
Of course. If you cannot explain the semantics of the primitives you
introduce, how can we judge the patch?
This barrier business is hard enough as it is, but magic unexplained
hardware makes it impossible.
Rest assured, you (MIPS) aren't the first (nor likely the last) to go
through all this. We've had these discussions (and to a certain extent
are still having them) for x86, PPC, Alpha, ARM, etc.
And every time new barrier instructions get introduced, we had better
have a full and comprehensive explanation to go along with them.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-13 20:48 ` Peter Zijlstra
@ 2016-01-13 20:58 ` Leonid Yegoshin
[not found] ` <5696BA6E.4070508-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-13 20:58 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, Paul McKenney, linux-xtensa, james.hogan,
Arnd Bergmann, Stefano Stabellini, adi-buildroot-devel,
ddaney.cavm, Thomas Gleixner, linux-metag
On 01/13/2016 12:48 PM, Peter Zijlstra wrote:
> On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:
>
>> I asked the HW team about it, but I have a question - does it have any
>> relationship with replacing the MIPS SYNC with lightweight SYNCs (SYNC_WMB etc.)?
> Of course. If you cannot explain the semantics of the primitives you
> introduce, how can we judge the patch.
>
>
You missed the point - it is a question about the replacement of SYNC with
lightweight primitives. It is NOT a question about multithreaded system
behavior without any SYNC. The answer to Will's latest question lies
in a different area.
- Leonid.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-13 10:45 ` Will Deacon
2016-01-13 19:02 ` Leonid Yegoshin
@ 2016-01-13 22:26 ` Leonid Yegoshin
2016-01-14 9:24 ` Michael S. Tsirkin
[not found] ` <5696CF08.8080700-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
1 sibling, 2 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-13 22:26 UTC (permalink / raw)
To: Will Deacon
Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
linux-arch, Andrew Cooper, Russell King - ARM Linux,
virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
linux-metag, linux-mips, x86, user-mode-linux-devel,
adi-buildroot-devel, linux-sh
On 01/13/2016 02:45 AM, Will Deacon wrote:
>>
> I don't think the address dependency is enough on its own. By that
> reasoning, the following variant (WRC+addr+addr) would work too:
>
>
> P0:
> Wx = 1
>
> P1:
> Rx = 1
> <address dep>
> Wy = 1
>
> P2:
> Ry = 1
> <address dep>
> Rx = 0
>
>
> So are you saying that this is also forbidden?
> Imagine that P0 and P1 are two threads that share a store buffer. What
> then?
OK, I collected answers, and here they are:
In MIPS R6 this test passes OK, I mean - P2: Rx = 1 if Ry is read
as 1. By design.
However, it is unclear what happens in the MIPS R2 1004K.
Moreover, there are voices against guaranteeing that it will hold in
the future, and those voices point me to the Documentation/memory-barriers.txt
section "DATA DEPENDENCY BARRIERS", whose examples require SYNC_RMB
between loading an address/index and using it to load data based on
that address or index for shared data (look at the CPU 2 pseudo-code):
> To deal with this, a data dependency barrier or better must be inserted
> between the address load and the data load:
>
> CPU 1                  CPU 2
> ===============        ===============
> { A = 1, B = 2, C = 3, P = &A, Q = &C }
> B = 4;
> <write barrier>
> WRITE_ONCE(P, &B);
> Q = READ_ONCE(P);
> <data dependency barrier> <-----------
> SYNC_RMB is here
> D = *Q;
...
> Another example of where data dependency barriers might be required is
> where a
> number is read from memory and then used to calculate the index for an
> array
> access:
>
> CPU 1                  CPU 2
> ===============        ===============
> { M[0] = 1, M[1] = 2, M[3] = 3, P = 0, Q = 3 }
> M[1] = 4;
> <write barrier>
> WRITE_ONCE(P, 1);
> Q = READ_ONCE(P);
> <data dependency barrier> <------------
> SYNC_RMB is here
> D = M[Q];
Those voices say that there is a legitimate reason to relax the HW here
for performance, since SYNC_RMB is needed anyway to work with this
sequence of shared data.
And all that is off-topic here, in my mind. I just want to be sure
that this patchset still allows the use of specific lightweight SYNCs
on MIPS vs. the bold and heavy generalized "SYNC 0" in any case.
- Leonid.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-13 22:26 ` Leonid Yegoshin
@ 2016-01-14 9:24 ` Michael S. Tsirkin
[not found] ` <5696CF08.8080700-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
1 sibling, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-14 9:24 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: Will Deacon, Peter Zijlstra, linux-kernel, Arnd Bergmann,
linux-arch, Andrew Cooper, Russell King - ARM Linux,
virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
linux-metag, linux-mips, x86, user-mode-linux-devel,
adi-buildroot-devel, linux-sh, linux-xtensa
On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:
> And all that is out-of-topic here in my mind. I just want to be sure that
> this patchset still provides a use of a specific lightweight SYNCs on MIPS
> vs bold and heavy generalized "SYNC 0" in any case.
>
> - Leonid.
Of course it does. All this patchset does is rename smp_mb/rmb/wmb
to __smp_mb()/__smp_rmb()/__smp_wmb()
and then asm-generic does #define smp_mb __smp_mb
or #define smp_mb barrier depending on CONFIG_SMP.
Why is that needed? So we can implement
[PATCH v3 28/41] asm-generic: implement virt_xxx memory barriers
--
MST
[parent not found: <5696CF08.8080700-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>]
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
[not found] ` <5696CF08.8080700-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
@ 2016-01-14 12:14 ` Will Deacon
2016-01-14 19:28 ` Leonid Yegoshin
0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-14 12:14 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: Peter Zijlstra, Michael S. Tsirkin,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Arnd Bergmann,
linux-arch-u79uwXL29TY76Z2rM5mHXA, Andrew Cooper,
Russell King - ARM Linux,
virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Joe Perches, David Miller, linux-ia64-u79uwXL29TY76Z2rM5mHXA,
linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
linux-s390-u79uwXL29TY76Z2rM5mHXA,
sparclinux-u79uwXL29TY76Z2rM5mHXA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-metag-u79uwXL29TY76Z2rM5mHXA,
linux-mips-6z/3iImG2C8G8FEW9MqTrA, x86-DgEjT+Ai2ygdnm+yROfE0A,
user-mode-linux-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
adi-buildroot-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
linux-sh-u79uwXL29TY76Z2rM5mHXA
On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:
> On 01/13/2016 02:45 AM, Will Deacon wrote:
> >>
> >I don't think the address dependency is enough on its own. By that
> >reasoning, the following variant (WRC+addr+addr) would work too:
> >
> >
> >P0:
> >Wx = 1
> >
> >P1:
> >Rx = 1
> ><address dep>
> >Wy = 1
> >
> >P2:
> >Ry = 1
> ><address dep>
> >Rx = 0
> >
> >
> >So are you saying that this is also forbidden?
> >Imagine that P0 and P1 are two threads that share a store buffer. What
> >then?
>
> OK, I collected answers and it is:
>
> In MIPS R6 this test passes OK, I mean - P2: Rx = 1 if Ry is read as 1.
> By design.
>
> However, it is unclear what happens in the MIPS R2 1004K.
How can it be unclear? If, for example, the outcome is permitted on that
CPU, then your original reasoning for the WRC+sync+addr doesn't apply
there and SYNC is not transitive. That's what I'm trying to get to the
bottom of.
Does the MIPS kernel target a particular CPU at compile time?
> Moreover, there are voices against guarantee that it will be in future
> and that voices point me to Documentation/memory-barriers.txt section "DATA
> DEPENDENCY BARRIERS" examples which require SYNC_RMB between loading
> address/index and using that for loading data based on that address or index
> for shared data (look on CPU2 pseudo-code):
> >To deal with this, a data dependency barrier or better must be inserted
> >between the address load and the data load:
> >
> > CPU 1                  CPU 2
> > ===============        ===============
> > { A = 1, B = 2, C = 3, P = &A, Q = &C }
> > B = 4;
> > <write barrier>
> > WRITE_ONCE(P, &B);
> > Q = READ_ONCE(P);
> > <data dependency barrier> <-----------
> >SYNC_RMB is here
> > D = *Q;
> ...
> >Another example of where data dependency barriers might be required is
> >where a
> >number is read from memory and then used to calculate the index for an
> >array
> >access:
> >
> > CPU 1                  CPU 2
> > ===============        ===============
> > { M[0] = 1, M[1] = 2, M[3] = 3, P = 0, Q = 3 }
> > M[1] = 4;
> > <write barrier>
> > WRITE_ONCE(P, 1);
> > Q = READ_ONCE(P);
> > <data dependency barrier> <------------
> >SYNC_RMB is here
> > D = M[Q];
>
> That voices say that there is a legitimate reason to relax HW here for
> performance if SYNC_RMB is needed anyway to work with this sequence of
> shared data.
Are you saying that MIPS needs to implement [smp_]read_barrier_depends?
> And all that is out-of-topic here in my mind. I just want to be sure that
> this patchset still provides a use of a specific lightweight SYNCs on MIPS
> vs bold and heavy generalized "SYNC 0" in any case.
We may be hijacking the thread slightly, but there are much bigger
issues at play here if you want to start using lightweight barriers to
implement relaxed kernel primitives such as smp_load_acquire and
smp_store_release.
Will
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-14 12:14 ` Will Deacon
@ 2016-01-14 19:28 ` Leonid Yegoshin
2016-01-14 20:34 ` Paul E. McKenney
0 siblings, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 19:28 UTC (permalink / raw)
To: Will Deacon
Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
linux-arch, Andrew Cooper, Russell King - ARM Linux,
virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
linux-metag, linux-mips, x86, user-mode-linux-devel,
adi-buildroot-devel, linux-sh
On 01/14/2016 04:14 AM, Will Deacon wrote:
> On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:
>
>> Moreover, there are voices against guarantee that it will be in future
>> and that voices point me to Documentation/memory-barriers.txt section "DATA
>> DEPENDENCY BARRIERS" examples which require SYNC_RMB between loading
>> address/index and using that for loading data based on that address or index
>> for shared data (look on CPU2 pseudo-code):
>>> To deal with this, a data dependency barrier or better must be inserted
>>> between the address load and the data load:
>>>
>>> CPU 1                  CPU 2
>>> ===============        ===============
>>> { A = 1, B = 2, C = 3, P = &A, Q = &C }
>>> B = 4;
>>> <write barrier>
>>> WRITE_ONCE(P, &B);
>>> Q = READ_ONCE(P);
>>> <data dependency barrier> <-----------
>>> SYNC_RMB is here
>>> D = *Q;
>> ...
>>> Another example of where data dependency barriers might be required is
>>> where a
>>> number is read from memory and then used to calculate the index for an
>>> array
>>> access:
>>>
>>> CPU 1                  CPU 2
>>> ===============        ===============
>>> { M[0] = 1, M[1] = 2, M[3] = 3, P = 0, Q = 3 }
>>> M[1] = 4;
>>> <write barrier>
>>> WRITE_ONCE(P, 1);
>>> Q = READ_ONCE(P);
>>> <data dependency barrier> <------------
>>> SYNC_RMB is here
>>> D = M[Q];
>> That voices say that there is a legitimate reason to relax HW here for
>> performance if SYNC_RMB is needed anyway to work with this sequence of
>> shared data.
> Are you saying that MIPS needs to implement [smp_]read_barrier_depends?
It is not me, it is Documentation/memory-barriers.txt from the kernel sources.
The HW team can't work from voice statements; it has to work from written
documents. If that is what is written (see above the lines which I marked
with "SYNC_RMB"), then anybody should use it, no matter how many
CPUs/threads are in play. These examples explicitly require inserting a
"data dependency barrier" between reading a shared pointer/index and
using it to fetch shared data. So, your WRC+addr+addr test is a
violation of that recommendation.
- Leonid.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-14 19:28 ` Leonid Yegoshin
@ 2016-01-14 20:34 ` Paul E. McKenney
2016-01-14 21:01 ` Leonid Yegoshin
0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 20:34 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
Will Deacon, virtualization, H. Peter Anvin, sparclinux,
Ingo Molnar, linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, ddaney.cavm,
Thomas Gleixner, linux-metag
On Thu, Jan 14, 2016 at 11:28:18AM -0800, Leonid Yegoshin wrote:
> On 01/14/2016 04:14 AM, Will Deacon wrote:
> >On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:
> >
> >> Moreover, there are voices against guarantee that it will be in future
> >>and that voices point me to Documentation/memory-barriers.txt section "DATA
> >>DEPENDENCY BARRIERS" examples which require SYNC_RMB between loading
> >>address/index and using that for loading data based on that address or index
> >>for shared data (look on CPU2 pseudo-code):
> >>>To deal with this, a data dependency barrier or better must be inserted
> >>>between the address load and the data load:
> >>>
> >>> CPU 1                  CPU 2
> >>> ===============        ===============
> >>> { A = 1, B = 2, C = 3, P = &A, Q = &C }
> >>> B = 4;
> >>> <write barrier>
> >>> WRITE_ONCE(P, &B);
> >>> Q = READ_ONCE(P);
> >>> <data dependency barrier> <-----------
> >>>SYNC_RMB is here
> >>> D = *Q;
> >>...
> >>>Another example of where data dependency barriers might be required is
> >>>where a
> >>>number is read from memory and then used to calculate the index for an
> >>>array
> >>>access:
> >>>
> >>> CPU 1                  CPU 2
> >>> ===============        ===============
> >>> { M[0] = 1, M[1] = 2, M[3] = 3, P = 0, Q = 3 }
> >>> M[1] = 4;
> >>> <write barrier>
> >>> WRITE_ONCE(P, 1);
> >>> Q = READ_ONCE(P);
> >>> <data dependency barrier> <------------
> >>>SYNC_RMB is here
> >>> D = M[Q];
> >>That voices say that there is a legitimate reason to relax HW here for
> >>performance if SYNC_RMB is needed anyway to work with this sequence of
> >>shared data.
> >Are you saying that MIPS needs to implement [smp_]read_barrier_depends?
>
> It is not me, it is Documentation/memory-barriers.txt from kernel sources.
>
> HW team can't work on voice statements, it should do a work on
> written documents. If that is written (see above the lines which I
> marked by "SYNC_RMB") then anybody should use it and never mind how
> many CPUs/Threads are in play. This examples explicitly requires to
> insert "data dependency barrier" between reading a shared
> pointer/index and using it to fetch a shared data. So, your
> WRC+addr+addr test is a violation of that recommendation.
Perhaps Documentation/memory-barriers.txt needs additional clarification.
It would not be the first time.
If your CPU implicitly maintains ordering based on address and
data dependencies, then you don't need any instructions for
<data dependency barrier>.
The WRC+addr+addr is OK because data dependencies are not required to be
transitive, in other words, they are not required to flow from one CPU to
another without the help of an explicit memory barrier. Transitivity is
instead supplied by smp_mb() and by smp_store_release()-smp_load_acquire()
chains. Here is the Linux kernel code for WRC+addr+addr, give or take
(and no, I have no idea why anyone would want to write code like this):
struct foo {
struct foo **a;
};
struct foo b;
struct foo c;
struct foo d;
struct foo e;
struct foo f = { &d };
struct foo g = { &e };
struct foo *x = &b;
void cpu0(void)
{
WRITE_ONCE(x, &f);
}
void cpu1(void)
{
struct foo *p;
p = lockless_dereference(x);
WRITE_ONCE(p->a, &x);
}
void cpu2(void)
{
r1 = lockless_dereference(f.a);
WRITE_ONCE(*r1, &c);
}
It is legal to end the run with x=&f and r1=&x. To prevent this outcome,
we do the following:
struct foo {
struct foo **a;
};
struct foo b;
struct foo c;
struct foo d;
struct foo e;
struct foo f = { &d };
struct foo g = { &e };
struct foo *x = &b;
void cpu0(void)
{
WRITE_ONCE(x, &f);
}
void cpu1(void)
{
struct foo *p;
p = lockless_dereference(x);
smp_store_release(&p->a, &x); /* Additional ordering. */
}
void cpu2(void)
{
r1 = lockless_dereference(f.a);
WRITE_ONCE(*r1, &c);
}
And I still don't know why anyone would need this sort of code. ;-)
Alternatively, we pull cpu2() into cpu1():
struct foo {
struct foo **a;
};
struct foo b;
struct foo c;
struct foo d;
struct foo e;
struct foo f = { &d };
struct foo g = { &e };
struct foo *x = &b;
void cpu0(void)
{
WRITE_ONCE(x, &f);
}
void cpu1(void)
{
struct foo *p;
p = lockless_dereference(x);
WRITE_ONCE(p->a, &x);
r1 = lockless_dereference(f.a);
WRITE_ONCE(*r1, &c);
}
The ordering is now enforced by being within a single thread. In fact,
the second lockless_dereference() can be READ_ONCE().
So, does MIPS maintain ordering within a given CPU based on address and
data dependencies? If so, you don't need to emit memory-barrier instructions
for read_barrier_depends().
Thanx, Paul
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-14 20:34 ` Paul E. McKenney
@ 2016-01-14 21:01 ` Leonid Yegoshin
2016-01-14 21:29 ` Paul E. McKenney
0 siblings, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 21:01 UTC (permalink / raw)
To: paulmck
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
Will Deacon, virtualization, H. Peter Anvin, sparclinux,
Ingo Molnar, linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, ddaney.cavm,
Thomas Gleixner, linux-metag
I need some time to understand your test examples. However,
On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
>
>
> The WRC+addr+addr is OK because data dependencies are not required to be
> transitive, in other words, they are not required to flow from one CPU to
> another without the help of an explicit memory barrier.
I don't see any reliable way to fit WRC+addr+addr into the "DATA DEPENDENCY
BARRIERS" section's recommendation to have a data dependency barrier between
the read of a shared pointer/index and the read of the shared data based on
that pointer. If you have these two reads, the rest of the scenario doesn't
matter; you should put the dependency barrier in the code anyway. If you
don't do it in the WRC+addr+addr scenario, then years later it can easily be
changed into a different scenario which fits one of the scenarios in the
"DATA DEPENDENCY BARRIERS" section and fails.
> Transitivity is
Peter Zijlstra recently wrote: "In particular we're very much all
'confused' about the various notions of transitivity". I am confused
too, so please use a simpler way to explain your words. Sorry,
but we need a common ground first.
- Leonid.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-14 21:01 ` Leonid Yegoshin
@ 2016-01-14 21:29 ` Paul E. McKenney
2016-01-14 21:36 ` Leonid Yegoshin
2016-01-15 8:55 ` Peter Zijlstra
0 siblings, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 21:29 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
Will Deacon, virtualization, H. Peter Anvin, sparclinux,
Ingo Molnar, linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, ddaney.cavm,
Thomas Gleixner, linux-metag
On Thu, Jan 14, 2016 at 01:01:05PM -0800, Leonid Yegoshin wrote:
> I need some time to understand your test examples. However,
Understood.
> On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> >
> >
> >The WRC+addr+addr is OK because data dependencies are not required to be
> >transitive, in other words, they are not required to flow from one CPU to
> >another without the help of an explicit memory barrier.
>
> I don't see any reliable way to fit WRC+addr+addr into "DATA
> DEPENDENCY BARRIERS" section recommendation to have data dependency
> barrier between read of a shared pointer/index and read the shared
> data based on that pointer. If you have this two reads, it doesn't
> matter the rest of scenario, you should put the dependency barrier
> in code anyway. If you don't do it in WRC+addr+addr scenario then
> after years it can be easily changed to different scenario which
> fits some of scenario in "DATA DEPENDENCY BARRIERS" section and
> fails.
The trick is that lockless_dereference() contains an
smp_read_barrier_depends():
#define lockless_dereference(p) \
({ \
typeof(p) _________p1 = READ_ONCE(p); \
smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
(_________p1); \
})
Or am I missing your point?
> > Transitivity is
>
> Peter Zijlstra recently wrote: "In particular we're very much all
> 'confused' about the various notions of transitivity". I am confused
> too, so - please use some more simple way to explain your words.
> Sorry, but we need a common ground first.
OK, how about an example? (Z6.3 in the ppcmem naming scheme.)
int x, y, z;
void cpu0(void)
{
WRITE_ONCE(x, 1);
smp_wmb();
WRITE_ONCE(y, 1);
}
void cpu1(void)
{
WRITE_ONCE(y, 2);
smp_wmb();
WRITE_ONCE(z, 1);
}
void cpu2(void)
{
r1 = READ_ONCE(z);
smp_rmb();
r2 = READ_ONCE(x);
}
If smp_rmb() and smp_wmb() provided transitive ordering, then cpu2()
would see cpu0()'s ordering. But they do not, so the ordering is
visible at best to the adjacent CPU. This means that the final value
of y can be 2, while at the same time r1=1 && r2=0.
Now the full barrier, smp_mb(), does provide transitive ordering,
so if the three barriers in the above example are replaced with
smp_mb() the y=2 && r1=1 && r2=0 outcome will be prohibited.
So smp_mb() provides transitivity, as do pairs of smp_store_release()
and smp_load_acquire(), as do RCU grace periods. The exact interactions
between transitive and non-transitive ordering are a work in progress.
That said, if a series of transitive segments ends in a write, which
connects to a single non-transitive segment starting with a read,
you should be good. And in fact in the example above, you can replace
the smp_wmb()s with smp_mb() and leave the smp_rmb() and still
prohibit the "cyclic" outcome.
If you want a more formal definition, I must refer you back to the
ppcmem and herd references.
Does that help?
Thanx, Paul
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-14 21:29 ` Paul E. McKenney
@ 2016-01-14 21:36 ` Leonid Yegoshin
2016-01-14 22:55 ` Paul E. McKenney
2016-01-15 8:55 ` Peter Zijlstra
1 sibling, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 21:36 UTC (permalink / raw)
To: paulmck
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
Will Deacon, virtualization, H. Peter Anvin, sparclinux,
Ingo Molnar, linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, ddaney.cavm,
Thomas Gleixner, linux-metag
On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
>
>> On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
>>>
>>> The WRC+addr+addr is OK because data dependencies are not required to be
>>> transitive, in other words, they are not required to flow from one CPU to
>>> another without the help of an explicit memory barrier.
>> I don't see any reliable way to fit WRC+addr+addr into "DATA
>> DEPENDENCY BARRIERS" section recommendation to have data dependency
>> barrier between read of a shared pointer/index and read the shared
>> data based on that pointer. If you have this two reads, it doesn't
>> matter the rest of scenario, you should put the dependency barrier
>> in code anyway. If you don't do it in WRC+addr+addr scenario then
>> after years it can be easily changed to different scenario which
>> fits some of scenario in "DATA DEPENDENCY BARRIERS" section and
>> fails.
> The trick is that lockless_dereference() contains an
> smp_read_barrier_depends():
>
> #define lockless_dereference(p) \
> ({ \
> typeof(p) _________p1 = READ_ONCE(p); \
> smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> (_________p1); \
> })
>
> Or am I missing your point?
WRC+addr+addr has no barrier at all. lockless_dereference() has a barrier.
I don't see the common point between this and that in your answer, sorry.
- Leonid.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-14 21:36 ` Leonid Yegoshin
@ 2016-01-14 22:55 ` Paul E. McKenney
2016-01-14 23:33 ` Leonid Yegoshin
2016-01-15 10:24 ` Will Deacon
0 siblings, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 22:55 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
Will Deacon, virtualization, H. Peter Anvin, sparclinux,
Ingo Molnar, linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, ddaney.cavm,
Thomas Gleixner, linux-metag
On Thu, Jan 14, 2016 at 01:36:50PM -0800, Leonid Yegoshin wrote:
> On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
> >
> >>On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> >>>
> >>>The WRC+addr+addr is OK because data dependencies are not required to be
> >>>transitive, in other words, they are not required to flow from one CPU to
> >>>another without the help of an explicit memory barrier.
> >>I don't see any reliable way to fit WRC+addr+addr into "DATA
> >>DEPENDENCY BARRIERS" section recommendation to have data dependency
> >>barrier between read of a shared pointer/index and read the shared
> >>data based on that pointer. If you have this two reads, it doesn't
> >>matter the rest of scenario, you should put the dependency barrier
> >>in code anyway. If you don't do it in WRC+addr+addr scenario then
> >>after years it can be easily changed to different scenario which
> >>fits some of scenario in "DATA DEPENDENCY BARRIERS" section and
> >>fails.
> >The trick is that lockless_dereference() contains an
> >smp_read_barrier_depends():
> >
> >#define lockless_dereference(p) \
> >({ \
> > typeof(p) _________p1 = READ_ONCE(p); \
> > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > (_________p1); \
> >})
> >
> >Or am I missing your point?
>
> WRC+addr+addr has no any barrier. lockless_dereference() has a
> barrier. I don't see a common points between this and that in your
> answer, sorry.
Me, I am wondering what WRC+addr+addr has to do with anything at all.
<Going back through earlier email>
OK, so it looks like Will was asking not about WRC+addr+addr, but instead
about WRC+sync+addr. This would drop an smp_mb() into cpu2() in my
earlier example, which needs to provide ordering.
I am guessing that the manual's "Older instructions which must be globally
performed when the SYNC instruction completes" provides the equivalent
of ARM/Power A-cumulativity, which can be thought of as transitivity
backwards in time. This leads me to believe that your smp_mb() needs
to use SYNC rather than SYNC_MB, as was the subject of earlier spirited
discussion in this thread.
Suppose you have something like this:
void cpu0(void)
{
WRITE_ONCE(a, 1);
SYNC_MB();
r0 = READ_ONCE(b);
}
void cpu1(void)
{
WRITE_ONCE(b, 1);
SYNC_MB();
r1 = READ_ONCE(c);
}
void cpu2(void)
{
WRITE_ONCE(c, 1);
SYNC_MB();
r2 = READ_ONCE(d);
}
void cpu3(void)
{
WRITE_ONCE(d, 1);
SYNC_MB();
r3 = READ_ONCE(a);
}
Does your hardware guarantee that it is not possible for all of r0,
r1, r2, and r3 to be equal to zero at the end of the test, assuming
that a, b, c, and d are all initially zero, and the four functions
above run concurrently? There are many similar litmus tests for other
combinations of reads and writes, but this is perhaps the nastiest from
a hardware viewpoint. Does SYNC_MB() provide sufficient ordering for
this sort of situation?
Another (more academic) case is this one, with x and y initially zero:
void cpu0(void)
{
WRITE_ONCE(x, 1);
}
void cpu1(void)
{
WRITE_ONCE(y, 1);
}
void cpu2(void)
{
r1 = READ_ONCE(x);
SYNC_MB();
r2 = READ_ONCE(y);
}
void cpu3(void)
{
r3 = READ_ONCE(y);
SYNC_MB();
r4 = READ_ONCE(x);
}
Does SYNC_MB() prohibit r1 = 1 && r2 = 0 && r3 = 1 && r4 = 0?
Now, I don't know of any specific use cases for this pattern, but it
is greatly beloved of some of the old-school concurrency community,
so it is likely to crop up at some point, despite my best efforts. :-/
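For the record, this second pattern is the classic IRIW (Independent Reads of Independent Writes) test. A ppcmem-style litmus version (register assignments are illustrative, not from the original mail) might look like:

```
PPC IRIW+sync+sync
""
{
0:r1=1; 0:r2=x;
1:r1=1; 1:r2=y;
2:r2=x; 2:r3=y;
3:r2=x; 3:r3=y;
}
 P0           | P1           | P2           | P3           ;
 stw r1,0(r2) | stw r1,0(r2) | lwz r1,0(r2) | lwz r1,0(r3) ;
              |              | sync         | sync         ;
              |              | lwz r4,0(r3) | lwz r4,0(r2) ;
exists
(2:r1=1 /\ 2:r4=0 /\ 3:r1=1 /\ 3:r4=0)
```

The "exists" clause names the outcome asked about above; a barrier claiming full multi-copy atomicity must leave it unreachable.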
Thanx, Paul
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-14 22:55 ` Paul E. McKenney
@ 2016-01-14 23:33 ` Leonid Yegoshin
2016-01-15 0:47 ` Paul E. McKenney
2016-01-15 10:24 ` Will Deacon
1 sibling, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 23:33 UTC (permalink / raw)
To: paulmck
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
Will Deacon, virtualization, H. Peter Anvin, sparclinux,
Ingo Molnar, linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, ddaney.cavm,
Thomas Gleixner, linux-metag
On 01/14/2016 02:55 PM, Paul E. McKenney wrote:
> OK, so it looks like Will was asking not about WRC+addr+addr, but instead
> about WRC+sync+addr.
(He actually asked twice about this and that too but skip this)
> I am guessing that the manual's "Older instructions which must be globally
> performed when the SYNC instruction completes" provides the equivalent
> of ARM/Power A-cumulativity, which can be thought of as transitivity
> backwards in time. This leads me to believe that your smp_mb() needs
> to use SYNC rather than SYNC_MB, as was the subject of earlier spirited
> discussion in this thread.
Don't be fooled here by the words "ordered" and "completed" - these are
HW design terms, and the text is actually written poorly.
Just assume that SYNC_MB is absolutely the same as SYNC for any CPU and
coherent device (aside from performance). The difference can be in
non-coherent devices, because SYNC actually tries to make a barrier for
them too. In some SoCs it is just the same, because there is no need to
barrier a non-coherent device (device register accesses are usually
strictly ordered... if there is no bridge in between).
>
> Suppose you have something like this:
> ...
> Does your hardware guarantee that it is not possible for all of r0,
> r1, r2, and r3 to be equal to zero at the end of the test, assuming
> that a, b, c, and d are all initially zero, and the four functions
> above run concurrently?
It is assumed to be so from Arch point of view. HW bugs are possible, of
course.
> Another (more academic) case is this one, with x and y initially zero:
>
> ...
> Does SYNC_MB() prohibit r1 = 1 && r2 = 0 && r3 = 1 && r4 = 0?
It is assumed to be so from Arch point of view. HW bugs are possible, of
course.
Note: I am not sure about ANY past MIPS R2 CPU, because that stuff was
implemented some time ago but nobody wired it up in the Linux kernel (it
was used by some vendor for a non-Linux system). For that reason my
patch for lightweight SYNCs has an option - implement it, or fall back
to a generic SYNC. It is possible that some vendor did it in a different
way, but nobody knows or has tested it. But as a minimum, SYNC must be
implemented in spinlocks/atomics/bitops: on the recent P5600 it is
proven that a read can pass a write in atomics.
MIPS R6 is a different story: I verified lightweight SYNCs from the
beginning, and it also should use SYNCs.
- Leonid.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-14 23:33 ` Leonid Yegoshin
@ 2016-01-15 0:47 ` Paul E. McKenney
2016-01-15 1:07 ` Leonid Yegoshin
[not found] ` <20160115004753.GN3818-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
0 siblings, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 0:47 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
Will Deacon, virtualization, H. Peter Anvin, sparclinux,
Ingo Molnar, linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, ddaney.cavm,
Thomas Gleixner, linux-metag
On Thu, Jan 14, 2016 at 03:33:40PM -0800, Leonid Yegoshin wrote:
> On 01/14/2016 02:55 PM, Paul E. McKenney wrote:
> >OK, so it looks like Will was asking not about WRC+addr+addr, but instead
> >about WRC+sync+addr.
> (He actually asked twice about this and that too but skip this)
Fair enough! ;-)
> >I am guessing that the manual's "Older instructions which must be globally
> >performed when the SYNC instruction completes" provides the equivalent
> >of ARM/Power A-cumulativity, which can be thought of as transitivity
> >backwards in time. This leads me to believe that your smp_mb() needs
> >to use SYNC rather than SYNC_MB, as was the subject of earlier spirited
> >discussion in this thread.
>
> Don't be fooled here by words "ordered" and "completed" - it is HW
> design items and actually written poorly.
> Just assume that SYNC_MB is absolutely the same as SYNC for any CPU
> and coherent device (besides performance). The difference can be in
> non-coherent devices because SYNC actually tries to make a barrier
> for them too. In some SoCs it is just the same because there is no
> need to barrier a non-coherent device (device register access
> usually strictly ordered... if there is no bridge in between).
So smp_mb() can be SYNC_MB. However, mb() needs to be SYNC for MMIO
purposes, correct?
> >Suppose you have something like this:
> >...
> >Does your hardware guarantee that it is not possible for all of r0,
> >r1, r2, and r3 to be equal to zero at the end of the test, assuming
> >that a, b, c, and d are all initially zero, and the four functions
> >above run concurrently?
>
> It is assumed to be so from Arch point of view. HW bugs are
> possible, of course.
Indeed!
> >Another (more academic) case is this one, with x and y initially zero:
> >
> >...
> >Does SYNC_MB() prohibit r1 = 1 && r2 = 0 && r3 = 1 && r4 = 0?
>
> It is assumed to be so from Arch point of view. HW bugs are
> possible, of course.
Looks to me like smp_mb() can be SYNC_MB, then.
> Note: I am not sure about ANY past MIPS R2 CPU because that stuff is
> implemented some time but nobody made it in Linux kernel (it was
> used by some vendor for non-Linux system). For that reason my patch
> for lightweight SYNCs has an option - implement it or implement a
> generic SYNC. It is possible that some vendor did it in different
> way but nobody knows or test it. But as a minimum - SYNC must be
> implemented in spinlocks/atomics/bitops, in recent P5600 it is
> proven that read can pass write in atomics.
>
> MIPS R6 is a different story, I verified lightweight SYNCs from the
> beginning and it also should use SYNCs.
So you need to build a different kernel for some types of MIPS systems?
Or do you do boot-time rewriting, like a number of other arches do?
Thanx, Paul
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 0:47 ` Paul E. McKenney
@ 2016-01-15 1:07 ` Leonid Yegoshin
2016-01-27 11:26 ` Maciej W. Rozycki
[not found] ` <20160115004753.GN3818-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
1 sibling, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-15 1:07 UTC (permalink / raw)
To: paulmck
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
Will Deacon, virtualization, H. Peter Anvin, sparclinux,
Ingo Molnar, linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, ddaney.cavm,
Thomas Gleixner, linux-metag
On 01/14/2016 04:47 PM, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 03:33:40PM -0800, Leonid Yegoshin wrote:
>> Don't be fooled here by words "ordered" and "completed" - it is HW
>> design items and actually written poorly.
>> Just assume that SYNC_MB is absolutely the same as SYNC for any CPU
>> and coherent device (besides performance). The difference can be in
>> non-coherent devices because SYNC actually tries to make a barrier
>> for them too. In some SoCs it is just the same because there is no
>> need to barrier a non-coherent device (device register access
>> usually strictly ordered... if there is no bridge in between).
> So smp_mb() can be SYNC_MB. However, mb() needs to be SYNC for MMIO
> purposes, correct?
Absolutely. For MIPS R2 which is not Octeon.
>> Note: I am not sure about ANY past MIPS R2 CPU because that stuff is
>> implemented some time but nobody made it in Linux kernel (it was
>> used by some vendor for non-Linux system). For that reason my patch
>> for lightweight SYNCs has an option - implement it or implement a
>> generic SYNC. It is possible that some vendor did it in different
>> way but nobody knows or test it. But as a minimum - SYNC must be
>> implemented in spinlocks/atomics/bitops, in recent P5600 it is
>> proven that read can pass write in atomics.
>>
>> MIPS R6 is a different story, I verified lightweight SYNCs from the
>> beginning and it also should use SYNCs.
> So you need to build a different kernel for some types of MIPS systems?
> Or do you do boot-time rewriting, like a number of other arches do?
I don't know. I would like to have responses. Ralf asked Maciej about
old systems and that went nowhere. Even with rewriting, I don't know
what to do: no lightweight SYNC, or no SYNC at all - yes, it is still
possible that SYNC on some systems is too heavy or even harmful; nobody
has tested that.
- Leonid.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 1:07 ` Leonid Yegoshin
@ 2016-01-27 11:26 ` Maciej W. Rozycki
2016-01-28 0:58 ` Leonid Yegoshin
[not found] ` <56A9656D.3080707@imgtec.com>
0 siblings, 2 replies; 153+ messages in thread
From: Maciej W. Rozycki @ 2016-01-27 11:26 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: paulmck, Will Deacon, Peter Zijlstra, Michael S. Tsirkin,
linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel
On Fri, 15 Jan 2016, Leonid Yegoshin wrote:
> > So you need to build a different kernel for some types of MIPS systems?
> > Or do you do boot-time rewriting, like a number of other arches do?
>
> I don't know. I would like to have responses. Ralf asked Maciej about old
> systems and that came nowhere. Even rewrite - don't know what to do with that:
> no lightweight SYNC or no SYNC at all - yes, it is still possible that SYNC on
> some systems can be too heavy or even harmful, nobody tested that.
I don't recall being asked; mind that I might not get to messages I have
not been cc-ed in a timely manner and I may miss some altogether. With
the amount of mailing list traffic that passes by me my scanner may fail
to trigger. Sorry if this causes anybody trouble, but such is life.
Coincidentally, I have just posted some notes on SYNC in a different
thread, see <http://lkml.iu.edu/hypermail/linux/kernel/1601.3/03080.html>.
There's a reference to an older message of mine there too. I hope this
answers your questions.
Maciej
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-27 11:26 ` Maciej W. Rozycki
@ 2016-01-28 0:58 ` Leonid Yegoshin
[not found] ` <56A9656D.3080707@imgtec.com>
1 sibling, 0 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-28 0:58 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
Will Deacon, virtualization, H. Peter Anvin, sparclinux,
Ingo Molnar, linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, paulmck, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, ddaney.cavm,
Thomas Gleixner, linux-me
On 01/27/2016 03:26 AM, Maciej W. Rozycki wrote:
> On Fri, 15 Jan 2016, Leonid Yegoshin wrote:
>
>>> So you need to build a different kernel for some types of MIPS systems?
>>> Or do you do boot-time rewriting, like a number of other arches do?
>> I don't know. I would like to have responses. Ralf asked Maciej about old
>> systems and that came nowhere. Even rewrite - don't know what to do with that:
>> no lightweight SYNC or no SYNC at all - yes, it is still possible that SYNC on
>> some systems can be too heavy or even harmful, nobody tested that.
> I don't recall being asked; mind that I might not get to messages I have
> not been cc-ed in a timely manner and I may miss some altogether. With
> the amount of mailing list traffic that passes by me my scanner may fail
> to trigger. Sorry if this causes anybody trouble, but such is life.
>
> Coincidentally, I have just posted some notes on SYNC in a different
> thread, see <http://lkml.iu.edu/hypermail/linux/kernel/1601.3/03080.html>.
> There's a reference to an older message of mine there too. I hope this
> answers your questions.
>
> Maciej
In http://patchwork.linux-mips.org/patch/10505/ the very last message
exchange is:
Maciej,
do you have an R4000 / R4600 / R5000 / R7000 / SiByte system at hand to
test this?
...
Ralf
Maciej W. Rozycki- June 5, 2015, 9:18 p.m.
On Fri, 5 Jun 2015, Ralf Baechle wrote:
> do you have an R4000 / R4600 / R5000 / R7000 / SiByte system at hand to
> test this?
I should be able to check R4400 (that is virtually the same as R4000)
next week or so. As to SiByte -- not before next month I'm afraid. I
don't have access to any of the other processors you named. You may
want to find a better person if you want to accept this change soon.
Maciej
... and that stops forever...
- Leonid.
[parent not found: <56A9656D.3080707@imgtec.com>]
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
[not found] ` <56A9656D.3080707@imgtec.com>
@ 2016-01-29 13:38 ` Maciej W. Rozycki
0 siblings, 0 replies; 153+ messages in thread
From: Maciej W. Rozycki @ 2016-01-29 13:38 UTC (permalink / raw)
To: Leonid Yegoshin
Cc: paulmck, Will Deacon, Peter Zijlstra, Michael S. Tsirkin,
linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel
On Thu, 28 Jan 2016, Leonid Yegoshin wrote:
> In http://patchwork.linux-mips.org/patch/10505/ the very last mesg exchange
> is:
[...]
> ... and that stops forever...
Thanks for the reminder -- last June was very hectic, I travelled a lot
and I lost the discussion from my radar. Apologies for that. I replied
in that thread now with my results. I hope this helps.
Maciej
[parent not found: <20160115004753.GN3818-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
[not found] ` <20160115004753.GN3818-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2016-01-27 10:40 ` Ralf Baechle
2016-01-27 12:09 ` Maciej W. Rozycki
0 siblings, 1 reply; 153+ messages in thread
From: Ralf Baechle @ 2016-01-27 10:40 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Leonid Yegoshin, Will Deacon, Peter Zijlstra, Michael S. Tsirkin,
linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel
On Thu, Jan 14, 2016 at 04:47:53PM -0800, Paul E. McKenney wrote:
> So you need to build a different kernel for some types of MIPS systems?
Yes. We can't really do without. Classic MIPS code is not relocatable
without the complexity of PIC code as used by ELF DSOs - and their
performance penalty. Plus we have a number of architecture revisions
over the decades, big and little endian, 32 and 64 bit as the major
stumbling blocks. There are, however, groups of similar systems that
can share kernel binaries.
> Or do you do boot-time rewriting, like a number of other arches do?
We don't rewrite the code (as in the .text of the vmlinux binary), but
we do runtime code generation for a few highly performance-sensitive
areas of the kernel code, such as copy_page() or the TLB exception
handlers. This allows more flexibility than just inserting templates
into the kernel code. Downside - it means we have some of the
complexity of as and ld in the kernel.
Ralf
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-27 10:40 ` Ralf Baechle
@ 2016-01-27 12:09 ` Maciej W. Rozycki
0 siblings, 0 replies; 153+ messages in thread
From: Maciej W. Rozycki @ 2016-01-27 12:09 UTC (permalink / raw)
To: Ralf Baechle
Cc: Matt Redfearn, Paul E. McKenney, Leonid Yegoshin, Will Deacon,
Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
linux-arch, Andrew Cooper, Russell King - ARM Linux,
virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
linuxppc-dev
On Wed, 27 Jan 2016, Ralf Baechle wrote:
> > So you need to build a different kernel for some types of MIPS systems?
>
> > Yes. We can't really do without. Classic MIPS code is not relocatable
> > without the complexity of PIC code as used by ELF DSOs - and their
> > performance penalty. Plus we have a number of architecture revisions
> > over the decades, big and little endian, 32 and 64 bit as the major
> > stumbling blocks. There are, however, groups of similar systems that
> > can share kernel binaries.
Matt (cc-ed) has recently posted patches to add support for a relocatable
kernel, implemented without the usual overhead of PIC code. It works by
retaining relocations in a fully-linked binary and then simply replaying
the work the static linker does when assigning addresses, as the image
loaded is copied to its intended destination at an early bootstrap stage.
See:
<http://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=1449137297-30464-1-git-send-email-matt.redfearn%40imgtec.com>
for details.
I think this framework can be reused by carefully choosing instructions
used in early bootstrap code, up to the relocation stage, so that it is
runnable anywhere (not the same as PIC!) like early ld.so initialisation
and then loading the whole attached image starting from an address where
RAM does exist on target hardware.
Endianness is a different matter, obviously we can't build a single image
for both, although for distributions' sake an approach similar to one used
with bi-endian firmware (for hardware which has an easy way to switch the
endianness, e.g. a physical jumper or a configuration bit stored in flash
memory; not to be confused with the reverse user endianness mode) might be
feasible, by glueing two kernel images together and then selecting the
right one early in bootstrap, perhaps again reusing Matt's framework.
I'm not sure if this is worth the effort though, I suspect the usage level
of this feature would be minimal.
All in all I think making a generic MIPS kernel just might be feasible,
but with the diversity of options available the effort required would be
enormous. NetBSD for example I believe supports building a kernel that
correctly runs on both R3000 (MIPS I, 32-bit) and R4000 (MIPS III, 64-bit)
DEC hardware (as did DEC Ultrix, the vendor OS for these systems). These
processors are different enough from each other that you cannot use the
same code for cache, memory and exception management in an OS kernel --
backward compatibility is only provided for user software. That proves
the concept, however in a very limited way only, not even covering SMP,
and their R4000 kernel does not support 64-bit userland I believe. They
still have completely separate ports for other MIPS hardware, such as for
Broadcom SiByte SB-1 (MIPS64r1) processors.
Maciej
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-14 22:55 ` Paul E. McKenney
2016-01-14 23:33 ` Leonid Yegoshin
@ 2016-01-15 10:24 ` Will Deacon
2016-01-15 17:54 ` Paul E. McKenney
1 sibling, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-15 10:24 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Thu, Jan 14, 2016 at 02:55:10PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 01:36:50PM -0800, Leonid Yegoshin wrote:
> > On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
> > >
> > >>On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> > >>>
> > >>>The WRC+addr+addr is OK because data dependencies are not required to be
> > >>>transitive, in other words, they are not required to flow from one CPU to
> > >>>another without the help of an explicit memory barrier.
> > >>I don't see any reliable way to fit WRC+addr+addr into "DATA
> > >>DEPENDENCY BARRIERS" section recommendation to have data dependency
> > >>barrier between read of a shared pointer/index and read the shared
> > >>data based on that pointer. If you have this two reads, it doesn't
> > >>matter the rest of scenario, you should put the dependency barrier
> > >>in code anyway. If you don't do it in WRC+addr+addr scenario then
> > >>after years it can be easily changed to different scenario which
> > >>fits some of scenario in "DATA DEPENDENCY BARRIERS" section and
> > >>fails.
> > >The trick is that lockless_dereference() contains an
> > >smp_read_barrier_depends():
> > >
> > >#define lockless_dereference(p) \
> > >({ \
> > > typeof(p) _________p1 = READ_ONCE(p); \
> > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > (_________p1); \
> > >})
> > >
> > >Or am I missing your point?
> >
> > WRC+addr+addr has no any barrier. lockless_dereference() has a
> > barrier. I don't see a common points between this and that in your
> > answer, sorry.
>
> Me, I am wondering what WRC+addr+addr has to do with anything at all.
See my earlier reply [1] (but also, your WRC Linux example looks more
like a variant on WWC and I couldn't really follow it).
> <Going back through earlier email>
>
> OK, so it looks like Will was asking not about WRC+addr+addr, but instead
> about WRC+sync+addr. This would drop an smp_mb() into cpu2() in my
> earlier example, which needs to provide ordering.
>
> I am guessing that the manual's "Older instructions which must be globally
> performed when the SYNC instruction completes" provides the equivalent
> of ARM/Power A-cumulativity, which can be thought of as transitivity
> backwards in time.
I couldn't make that leap. In particular, the manual's "Detailed
Description" sections explicitly refer to program-order:
Every synchronizable specified memory instruction (loads or stores or
both) that occurs in the instruction stream before the SYNC
instruction must reach a stage in the load/store datapath after which
no instruction re-ordering is possible before any synchronizable
specified memory instruction which occurs after the SYNC instruction
in the instruction stream reaches the same stage in the load/store
datapath.
Will
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/399765.html
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 10:24 ` Will Deacon
@ 2016-01-15 17:54 ` Paul E. McKenney
2016-01-15 19:28 ` Paul E. McKenney
0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 17:54 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> On Thu, Jan 14, 2016 at 02:55:10PM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 14, 2016 at 01:36:50PM -0800, Leonid Yegoshin wrote:
> > > On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
> > > >
> > > >>On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> > > >>>
> > > >>>The WRC+addr+addr is OK because data dependencies are not required to be
> > > >>>transitive, in other words, they are not required to flow from one CPU to
> > > >>>another without the help of an explicit memory barrier.
> > > >>I don't see any reliable way to fit WRC+addr+addr into "DATA
> > > >>DEPENDENCY BARRIERS" section recommendation to have data dependency
> > > >>barrier between read of a shared pointer/index and read the shared
> > > >>data based on that pointer. If you have this two reads, it doesn't
> > > >>matter the rest of scenario, you should put the dependency barrier
> > > >>in code anyway. If you don't do it in WRC+addr+addr scenario then
> > > >>after years it can be easily changed to different scenario which
> > > >>fits some of scenario in "DATA DEPENDENCY BARRIERS" section and
> > > >>fails.
> > > >The trick is that lockless_dereference() contains an
> > > >smp_read_barrier_depends():
> > > >
> > > >#define lockless_dereference(p) \
> > > >({ \
> > > > typeof(p) _________p1 = READ_ONCE(p); \
> > > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > > (_________p1); \
> > > >})
> > > >
> > > >Or am I missing your point?
> > >
> > > WRC+addr+addr has no any barrier. lockless_dereference() has a
> > > barrier. I don't see a common points between this and that in your
> > > answer, sorry.
> >
> > Me, I am wondering what WRC+addr+addr has to do with anything at all.
>
> See my earlier reply [1] (but also, your WRC Linux example looks more
> like a variant on WWC and I couldn't really follow it).
I will revisit my WRC Linux example. And yes, creating litmus tests
that use non-fake dependencies is still a bit of an undertaking. :-/
I am sure that it will seem more natural with time and experience...
> > <Going back through earlier email>
> >
> > OK, so it looks like Will was asking not about WRC+addr+addr, but instead
> > about WRC+sync+addr. This would drop an smp_mb() into cpu2() in my
> > earlier example, which needs to provide ordering.
> >
> > I am guessing that the manual's "Older instructions which must be globally
> > performed when the SYNC instruction completes" provides the equivalent
> > of ARM/Power A-cumulativity, which can be thought of as transitivity
> > backwards in time.
>
> I couldn't make that leap. In particular, the manual's "Detailed
> Description" sections explicitly refer to program-order:
>
> Every synchronizable specified memory instruction (loads or stores or
> both) that occurs in the instruction stream before the SYNC
> instruction must reach a stage in the load/store datapath after which
> no instruction re-ordering is possible before any synchronizable
> specified memory instruction which occurs after the SYNC instruction
> in the instruction stream reaches the same stage in the load/store
> datapath.
>
> Will
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/399765.html
All good points. I think we all agree that the MIPS documentation could
use significant help. And given that I work for the company that produced
the analogous documentation for PowerPC, that is saying something. ;-)
We simply can't know if MIPS's memory ordering is sufficient for the
Linux kernel given its current implementation of the ordering primitives
and its current documentation.
I feel a bit better than I did earlier due to Leonid's response to my
earlier litmus-test examples. But I do recommend some serious stress
testing of MIPS on a good set of litmus tests. Much nicer finding issues
that way than as random irreproducible strange behavior!
Thanx, Paul
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 17:54 ` Paul E. McKenney
@ 2016-01-15 19:28 ` Paul E. McKenney
2016-01-25 14:41 ` Will Deacon
0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 19:28 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Fri, Jan 15, 2016 at 09:54:01AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> > On Thu, Jan 14, 2016 at 02:55:10PM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 14, 2016 at 01:36:50PM -0800, Leonid Yegoshin wrote:
> > > > On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
> > > > >
> > > > >>On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> > > > >>>
> > > > >>>The WRC+addr+addr is OK because data dependencies are not required to be
> > > > >>>transitive, in other words, they are not required to flow from one CPU to
> > > > >>>another without the help of an explicit memory barrier.
> > > > >>I don't see any reliable way to fit WRC+addr+addr into the "DATA
> > > > >>DEPENDENCY BARRIERS" section's recommendation to have a data
> > > > >>dependency barrier between the read of a shared pointer/index and
> > > > >>the read of the shared data based on that pointer. If you have
> > > > >>these two reads, the rest of the scenario doesn't matter: you
> > > > >>should put the dependency barrier in the code anyway. If you don't
> > > > >>do it in the WRC+addr+addr scenario, then years later it can easily
> > > > >>be changed into a different scenario which matches one of the
> > > > >>scenarios in the "DATA DEPENDENCY BARRIERS" section and fails.
> > > > >The trick is that lockless_dereference() contains an
> > > > >smp_read_barrier_depends():
> > > > >
> > > > >#define lockless_dereference(p) \
> > > > >({ \
> > > > > typeof(p) _________p1 = READ_ONCE(p); \
> > > > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > > > (_________p1); \
> > > > >})
> > > > >
> > > > >Or am I missing your point?
> > > >
> > > > WRC+addr+addr has no barrier at all. lockless_dereference() has a
> > > > barrier. I don't see the common point between this and that in your
> > > > answer, sorry.
> > >
> > > Me, I am wondering what WRC+addr+addr has to do with anything at all.
> >
> > See my earlier reply [1] (but also, your WRC Linux example looks more
> > like a variant on WWC and I couldn't really follow it).
>
> I will revisit my WRC Linux example. And yes, creating litmus tests
> that use non-fake dependencies is still a bit of an undertaking. :-/
> I am sure that it will seem more natural with time and experience...
Hmmm... You are quite right, I did do WWC. I need to change cpu2()'s
last access from a store to a load to get WRC. Plus the levels of
indirection definitely didn't match up, did they?
struct foo {
struct foo *next;
};
struct foo a;
struct foo b;
struct foo c = { &a };
struct foo d = { &b };
struct foo x = { &c };
struct foo y = { &d };
struct foo *r1, *r2, *r3;
void cpu0(void)
{
WRITE_ONCE(x.next, &y);
}
void cpu1(void)
{
r1 = lockless_dereference(x.next);
WRITE_ONCE(r1->next, &x);
}
void cpu2(void)
{
r2 = lockless_dereference(y.next);
r3 = READ_ONCE(r2->next);
}
In this case, it is legal to end the run with:
r1 = &y && r2 = &x && r3 = &c
Please see below for a ppcmem litmus test.
So, did I get it right this time? ;-)
Thanx, Paul
PS. And yes, working through this does help me understand the
benefits of fake dependencies. Why do you ask? ;-)
------------------------------------------------------------------------
PPC WRCnf+addrs
""
{
0:r2=x; 0:r3=y;
1:r2=x; 1:r3=y;
2:r2=x; 2:r3=y;
c=a; d=b; x=c; y=d;
}
P0 | P1 | P2 ;
stw r3,0(r2) | lwz r8,0(r2) | lwz r8,0(r3) ;
| stw r2,0(r3) | lwz r9,0(r8) ;
exists
(1:r8=y /\ 2:r8=x /\ 2:r9=c)
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 19:28 ` Paul E. McKenney
@ 2016-01-25 14:41 ` Will Deacon
2016-01-26 1:06 ` Paul E. McKenney
0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-25 14:41 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel
On Fri, Jan 15, 2016 at 11:28:45AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 09:54:01AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> > > See my earlier reply [1] (but also, your WRC Linux example looks more
> > > like a variant on WWC and I couldn't really follow it).
> >
> > I will revisit my WRC Linux example. And yes, creating litmus tests
> > that use non-fake dependencies is still a bit of an undertaking. :-/
> > I am sure that it will seem more natural with time and experience...
>
> Hmmm... You are quite right, I did do WWC. I need to change cpu2()'s
> last access from a store to a load to get WRC. Plus the levels of
> indirection definitely didn't match up, did they?
Nope, it was pretty baffling!
> struct foo {
> struct foo *next;
> };
> struct foo a;
> struct foo b;
> struct foo c = { &a };
> struct foo d = { &b };
> struct foo x = { &c };
> struct foo y = { &d };
> struct foo *r1, *r2, *r3;
>
> void cpu0(void)
> {
> WRITE_ONCE(x.next, &y);
> }
>
> void cpu1(void)
> {
> r1 = lockless_dereference(x.next);
> WRITE_ONCE(r1->next, &x);
> }
>
> void cpu2(void)
> {
> r2 = lockless_dereference(y.next);
> r3 = READ_ONCE(r2->next);
> }
>
> In this case, it is legal to end the run with:
>
> r1 = &y && r2 = &x && r3 = &c
>
> Please see below for a ppcmem litmus test.
>
> So, did I get it right this time? ;-)
The code above looks correct to me (in that it matches WRC+addrs),
but your litmus test:
> PPC WRCnf+addrs
> ""
> {
> 0:r2=x; 0:r3=y;
> 1:r2=x; 1:r3=y;
> 2:r2=x; 2:r3=y;
> c=a; d=b; x=c; y=d;
> }
> P0 | P1 | P2 ;
> stw r3,0(r2) | lwz r8,0(r2) | lwz r8,0(r3) ;
> | stw r2,0(r3) | lwz r9,0(r8) ;
> exists
> (1:r8=y /\ 2:r8=x /\ 2:r9=c)
Seems to be missing the address dependency on P1.
Will
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-25 14:41 ` Will Deacon
@ 2016-01-26 1:06 ` Paul E. McKenney
2016-01-26 12:10 ` Will Deacon
0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 1:06 UTC (permalink / raw)
To: Will Deacon
Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel
On Mon, Jan 25, 2016 at 02:41:34PM +0000, Will Deacon wrote:
> On Fri, Jan 15, 2016 at 11:28:45AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 09:54:01AM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> > > > See my earlier reply [1] (but also, your WRC Linux example looks more
> > > > like a variant on WWC and I couldn't really follow it).
> > >
> > > I will revisit my WRC Linux example. And yes, creating litmus tests
> > > that use non-fake dependencies is still a bit of an undertaking. :-/
> > > I am sure that it will seem more natural with time and experience...
> >
> > Hmmm... You are quite right, I did do WWC. I need to change cpu2()'s
> > last access from a store to a load to get WRC. Plus the levels of
> > indirection definitely didn't match up, did they?
>
> Nope, it was pretty baffling!
"It is a service that I provide." ;-)
> > struct foo {
> > struct foo *next;
> > };
> > struct foo a;
> > struct foo b;
> > struct foo c = { &a };
> > struct foo d = { &b };
> > struct foo x = { &c };
> > struct foo y = { &d };
> > struct foo *r1, *r2, *r3;
> >
> > void cpu0(void)
> > {
> > WRITE_ONCE(x.next, &y);
> > }
> >
> > void cpu1(void)
> > {
> > r1 = lockless_dereference(x.next);
> > WRITE_ONCE(r1->next, &x);
> > }
> >
> > void cpu2(void)
> > {
> > r2 = lockless_dereference(y.next);
> > r3 = READ_ONCE(r2->next);
> > }
> >
> > In this case, it is legal to end the run with:
> >
> > r1 = &y && r2 = &x && r3 = &c
> >
> > Please see below for a ppcmem litmus test.
> >
> > So, did I get it right this time? ;-)
>
> The code above looks correct to me (in that it matches WRC+addrs),
> but your litmus test:
>
> > PPC WRCnf+addrs
> > ""
> > {
> > 0:r2=x; 0:r3=y;
> > 1:r2=x; 1:r3=y;
> > 2:r2=x; 2:r3=y;
> > c=a; d=b; x=c; y=d;
> > }
> > P0 | P1 | P2 ;
> > stw r3,0(r2) | lwz r8,0(r2) | lwz r8,0(r3) ;
> > | stw r2,0(r3) | lwz r9,0(r8) ;
> > exists
> > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
>
> Seems to be missing the address dependency on P1.
You are quite correct! How about the following?
As before, both herd and ppcmem say that the cycle is allowed, as
expected, given non-transitive ordering. To prohibit the cycle, P1
needs a suitable memory-barrier instruction.
Thanx, Paul
------------------------------------------------------------------------
PPC WRCnf+addrs
""
{
0:r2=x; 0:r3=y;
1:r2=x; 1:r3=y;
2:r2=x; 2:r3=y;
c=a; d=b; x=c; y=d;
}
P0 | P1 | P2 ;
stw r3,0(r2) | lwz r8,0(r2) | lwz r8,0(r3) ;
| stw r2,0(r8) | lwz r9,0(r8) ;
exists
(1:r8=y /\ 2:r8=x /\ 2:r9=c)
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-26 1:06 ` Paul E. McKenney
@ 2016-01-26 12:10 ` Will Deacon
2016-01-26 23:37 ` Paul E. McKenney
0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-26 12:10 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel
On Mon, Jan 25, 2016 at 05:06:46PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 25, 2016 at 02:41:34PM +0000, Will Deacon wrote:
> > On Fri, Jan 15, 2016 at 11:28:45AM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 15, 2016 at 09:54:01AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> > > > > See my earlier reply [1] (but also, your WRC Linux example looks more
> > > > > like a variant on WWC and I couldn't really follow it).
> > > >
> > > > I will revisit my WRC Linux example. And yes, creating litmus tests
> > > > that use non-fake dependencies is still a bit of an undertaking. :-/
> > > > I am sure that it will seem more natural with time and experience...
> > >
> > > Hmmm... You are quite right, I did do WWC. I need to change cpu2()'s
> > > last access from a store to a load to get WRC. Plus the levels of
> > > indirection definitely didn't match up, did they?
> >
> > Nope, it was pretty baffling!
>
> "It is a service that I provide." ;-)
>
> > > struct foo {
> > > struct foo *next;
> > > };
> > > struct foo a;
> > > struct foo b;
> > > struct foo c = { &a };
> > > struct foo d = { &b };
> > > struct foo x = { &c };
> > > struct foo y = { &d };
> > > struct foo *r1, *r2, *r3;
> > >
> > > void cpu0(void)
> > > {
> > > WRITE_ONCE(x.next, &y);
> > > }
> > >
> > > void cpu1(void)
> > > {
> > > r1 = lockless_dereference(x.next);
> > > WRITE_ONCE(r1->next, &x);
> > > }
> > >
> > > void cpu2(void)
> > > {
> > > r2 = lockless_dereference(y.next);
> > > r3 = READ_ONCE(r2->next);
> > > }
> > >
> > > In this case, it is legal to end the run with:
> > >
> > > r1 = &y && r2 = &x && r3 = &c
> > >
> > > Please see below for a ppcmem litmus test.
> > >
> > > So, did I get it right this time? ;-)
> >
> > The code above looks correct to me (in that it matches WRC+addrs),
> > but your litmus test:
> >
> > > PPC WRCnf+addrs
> > > ""
> > > {
> > > 0:r2=x; 0:r3=y;
> > > 1:r2=x; 1:r3=y;
> > > 2:r2=x; 2:r3=y;
> > > c=a; d=b; x=c; y=d;
> > > }
> > > P0 | P1 | P2 ;
> > > stw r3,0(r2) | lwz r8,0(r2) | lwz r8,0(r3) ;
> > > | stw r2,0(r3) | lwz r9,0(r8) ;
> > > exists
> > > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
> >
> > Seems to be missing the address dependency on P1.
>
> You are quite correct! How about the following?
I think that's it!
> As before, both herd and ppcmem say that the cycle is allowed, as
> expected, given non-transitive ordering. To prohibit the cycle, P1
> needs a suitable memory-barrier instruction.
>
> ------------------------------------------------------------------------
>
> PPC WRCnf+addrs
> ""
> {
> 0:r2=x; 0:r3=y;
> 1:r2=x; 1:r3=y;
> 2:r2=x; 2:r3=y;
> c=a; d=b; x=c; y=d;
> }
> P0 | P1 | P2 ;
> stw r3,0(r2) | lwz r8,0(r2) | lwz r8,0(r3) ;
> | stw r2,0(r8) | lwz r9,0(r8) ;
> exists
> (1:r8=y /\ 2:r8=x /\ 2:r9=c)
Agreed.
Will
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-26 12:10 ` Will Deacon
@ 2016-01-26 23:37 ` Paul E. McKenney
2016-01-27 10:23 ` Will Deacon
0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 23:37 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Tue, Jan 26, 2016 at 12:10:10PM +0000, Will Deacon wrote:
> On Mon, Jan 25, 2016 at 05:06:46PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 25, 2016 at 02:41:34PM +0000, Will Deacon wrote:
> > > On Fri, Jan 15, 2016 at 11:28:45AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 15, 2016 at 09:54:01AM -0800, Paul E. McKenney wrote:
> > > > > On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> > > > > > See my earlier reply [1] (but also, your WRC Linux example looks more
> > > > > > like a variant on WWC and I couldn't really follow it).
> > > > >
> > > > > I will revisit my WRC Linux example. And yes, creating litmus tests
> > > > > that use non-fake dependencies is still a bit of an undertaking. :-/
> > > > > I am sure that it will seem more natural with time and experience...
> > > >
> > > > Hmmm... You are quite right, I did do WWC. I need to change cpu2()'s
> > > > last access from a store to a load to get WRC. Plus the levels of
> > > > indirection definitely didn't match up, did they?
> > >
> > > Nope, it was pretty baffling!
> >
> > "It is a service that I provide." ;-)
> >
> > > > struct foo {
> > > > struct foo *next;
> > > > };
> > > > struct foo a;
> > > > struct foo b;
> > > > struct foo c = { &a };
> > > > struct foo d = { &b };
> > > > struct foo x = { &c };
> > > > struct foo y = { &d };
> > > > struct foo *r1, *r2, *r3;
> > > >
> > > > void cpu0(void)
> > > > {
> > > > WRITE_ONCE(x.next, &y);
> > > > }
> > > >
> > > > void cpu1(void)
> > > > {
> > > > r1 = lockless_dereference(x.next);
> > > > WRITE_ONCE(r1->next, &x);
> > > > }
> > > >
> > > > void cpu2(void)
> > > > {
> > > > r2 = lockless_dereference(y.next);
> > > > r3 = READ_ONCE(r2->next);
> > > > }
> > > >
> > > > In this case, it is legal to end the run with:
> > > >
> > > > r1 = &y && r2 = &x && r3 = &c
> > > >
> > > > Please see below for a ppcmem litmus test.
> > > >
> > > > So, did I get it right this time? ;-)
> > >
> > > The code above looks correct to me (in that it matches WRC+addrs),
> > > but your litmus test:
> > >
> > > > PPC WRCnf+addrs
> > > > ""
> > > > {
> > > > 0:r2=x; 0:r3=y;
> > > > 1:r2=x; 1:r3=y;
> > > > 2:r2=x; 2:r3=y;
> > > > c=a; d=b; x=c; y=d;
> > > > }
> > > > P0 | P1 | P2 ;
> > > > stw r3,0(r2) | lwz r8,0(r2) | lwz r8,0(r3) ;
> > > > | stw r2,0(r3) | lwz r9,0(r8) ;
> > > > exists
> > > > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
> > >
> > > Seems to be missing the address dependency on P1.
> >
> > You are quite correct! How about the following?
>
> I think that's it!
>
> > As before, both herd and ppcmem say that the cycle is allowed, as
> > expected, given non-transitive ordering. To prohibit the cycle, P1
> > needs a suitable memory-barrier instruction.
> >
> > ------------------------------------------------------------------------
> >
> > PPC WRCnf+addrs
> > ""
> > {
> > 0:r2=x; 0:r3=y;
> > 1:r2=x; 1:r3=y;
> > 2:r2=x; 2:r3=y;
> > c=a; d=b; x=c; y=d;
> > }
> > P0 | P1 | P2 ;
> > stw r3,0(r2) | lwz r8,0(r2) | lwz r8,0(r3) ;
> > | stw r2,0(r8) | lwz r9,0(r8) ;
> > exists
> > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
>
> Agreed.
OK, thank you! Would you agree that it would be good to replace the
current xor-based fake-dependency litmus tests with tests having real
dependencies?
Thanx, Paul
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-26 23:37 ` Paul E. McKenney
@ 2016-01-27 10:23 ` Will Deacon
0 siblings, 0 replies; 153+ messages in thread
From: Will Deacon @ 2016-01-27 10:23 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel
On Tue, Jan 26, 2016 at 03:37:33PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 26, 2016 at 12:10:10PM +0000, Will Deacon wrote:
> > On Mon, Jan 25, 2016 at 05:06:46PM -0800, Paul E. McKenney wrote:
> > > PPC WRCnf+addrs
> > > ""
> > > {
> > > 0:r2=x; 0:r3=y;
> > > 1:r2=x; 1:r3=y;
> > > 2:r2=x; 2:r3=y;
> > > c=a; d=b; x=c; y=d;
> > > }
> > > P0 | P1 | P2 ;
> > > stw r3,0(r2) | lwz r8,0(r2) | lwz r8,0(r3) ;
> > > | stw r2,0(r8) | lwz r9,0(r8) ;
> > > exists
> > > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
> >
> > Agreed.
>
> OK, thank you! Would you agree that it would be good to replace the
> current xor-based fake-dependency litmus tests with tests having real
> dependencies?
Yes, because it would look a lot more like real (kernel) code.
Will
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-14 21:29 ` Paul E. McKenney
2016-01-14 21:36 ` Leonid Yegoshin
@ 2016-01-15 8:55 ` Peter Zijlstra
2016-01-15 9:13 ` Peter Zijlstra
2016-01-15 17:39 ` Paul E. McKenney
1 sibling, 2 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-15 8:55 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel
On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> So smp_mb() provides transitivity, as do pairs of smp_store_release()
> and smp_load_acquire(),
But they provide different grades of transitivity, which is where all
the confusion lies.
smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
Whereas the RCpc release+acquire is weakly so, only the two cpus
involved in the handover will agree on the order.
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 8:55 ` Peter Zijlstra
@ 2016-01-15 9:13 ` Peter Zijlstra
2016-01-15 17:46 ` Paul E. McKenney
2016-01-15 17:39 ` Paul E. McKenney
1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-15 9:13 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > and smp_load_acquire(),
>
> But they provide different grades of transitivity, which is where all
> the confusion lies.
>
> smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
>
> Whereas the RCpc release+acquire is weakly so, only the two cpus
> involved in the handover will agree on the order.
And the stuff we're confused about is how best to express the difference
and guarantees of these two forms of transitivity and how exactly they
interact.
And smp_load_acquire()/smp_store_release() are RCpc because of TSO archs
and PPC. The atomic*_{acquire,release}() are RCpc because of PPC, and
LOCK,UNLOCK are similarly RCpc because of PPC.
Now we'd like PPC to stick a SYNC in either LOCK or UNLOCK so that at
least the locks are RCsc again. They resist for performance reasons, but
waver because they don't want to be the ones finding all the nasty bugs,
since they would be the only ones hitting them.
Now, the thing I worry about, and still have not had an answer to, is
whether weakly ordered MIPS will end up being RCsc or RCpc for its locks
if they get implemented with SYNC_ACQUIRE and SYNC_RELEASE instead of the
current SYNC.
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 9:13 ` Peter Zijlstra
@ 2016-01-15 17:46 ` Paul E. McKenney
2016-01-15 21:27 ` Peter Zijlstra
0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 17:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > > and smp_load_acquire(),
> >
> > But they provide different grades of transitivity, which is where all
> > the confusion lies.
> >
> > smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
> >
> > Whereas the RCpc release+acquire is weakly so, only the two cpus
> > involved in the handover will agree on the order.
>
> And the stuff we're confused about is how best to express the difference
> and guarantees of these two forms of transitivity and how exactly they
> interact.
Hoping my memory-barrier.txt patch helps here...
> And smp_load_acquire()/smp_store_release() are RCpc because TSO archs
> and PPC. the atomic*_{acquire,release}() are RCpc because PPC and
> LOCK,UNLOCK are similarly RCpc because of PPC.
>
> Now we'd like PPC to stick a SYNC in either LOCK or UNLOCK so at least
> the locks are RCsc again, but they resist for performance reasons but
> waver because they don't want to be the ones finding all the nasty bugs
> because they're the only one.
I believe that the relevant proverb said something about starving to
death between two bales of hay... ;-)
> Now the thing I worry about, and still have not had an answer to is if
> weakly ordered MIPS will end up being RCsc or RCpc for their locks if
> they get implemented with SYNC_ACQUIRE and SYNC_RELEASE instead of the
> current SYNC.
It would be good to have better clarity on this, no two ways about it.
Thanx, Paul
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 17:46 ` Paul E. McKenney
@ 2016-01-15 21:27 ` Peter Zijlstra
2016-01-15 21:58 ` Paul E. McKenney
0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-15 21:27 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> > And the stuff we're confused about is how best to express the difference
> > and guarantees of these two forms of transitivity and how exactly they
> > interact.
>
> Hoping my memory-barrier.txt patch helps here...
Yes, that seems a good start. But yesterday you raised the 'fun' point
of two globally ordered sequences connected by a single local link.
And I think I'm still confused on LWSYNC (in the smp_wmb case) when one
of the stores loses a conflict, and if that scenario matters. If it
does, we should inspect the same case for other barriers.
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 21:27 ` Peter Zijlstra
@ 2016-01-15 21:58 ` Paul E. McKenney
2016-01-25 16:42 ` Will Deacon
0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 21:58 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel
On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
>
> > > And the stuff we're confused about is how best to express the difference
> > > and guarantees of these two forms of transitivity and how exactly they
> > > interact.
> >
> > Hoping my memory-barrier.txt patch helps here...
>
> Yes, that seems a good start. But yesterday you raised the 'fun' point
> of two globally ordered sequences connected by a single local link.
The conclusion that I am slowly coming to is that litmus tests should
not be thought of as linear chains, but rather as cycles. If you think
of it as a cycle, then it doesn't matter where the local link is, just
how many of them and how they are connected.
But I will admit that there are some rather strange litmus tests that
challenge this cycle-centric view, for example, the one shown below.
It turns out that herd and ppcmem disagree on the outcome. (The Power
architects side with ppcmem.)
> And I think I'm still confused on LWSYNC (in the smp_wmb case) when one
> of the stores loses a conflict, and if that scenario matters. If it
> does, we should inspect the same case for other barriers.
Indeed. I am still working on how these should be described. My
current thought is to be quite conservative about what ordering is
actually respected; however, the current task is formalizing how
RCU plays with the rest of the memory model.
Thanx, Paul
------------------------------------------------------------------------
PPC Overlapping Group-B sets version 4
""
(* When the Group-B sets from two different barriers involve instructions in
the same thread, within that thread one set must contain the other.
P0 P1 P2
Rx=1 Wy=1 Wz=2
dep. lwsync lwsync
Ry=0 Wz=1 Wx=1
Rz=1
assert(!(z=2))
Forbidden by ppcmem, allowed by herd.
*)
{
0:r1=x; 0:r2=y; 0:r3=z;
1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
}
P0 | P1 | P2 ;
lwz r6,0(r1) | stw r4,0(r2) | stw r5,0(r3) ;
xor r7,r6,r6 | lwsync | lwsync ;
lwzx r7,r7,r2 | stw r4,0(r3) | stw r4,0(r1) ;
lwz r8,0(r3) | | ;
exists
(z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 21:58 ` Paul E. McKenney
@ 2016-01-25 16:42 ` Will Deacon
2016-01-26 6:03 ` Paul E. McKenney
0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-25 16:42 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> > On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> >
> > > > And the stuff we're confused about is how best to express the difference
> > > > and guarantees of these two forms of transitivity and how exactly they
> > > > interact.
> > >
> > > Hoping my memory-barrier.txt patch helps here...
> >
> > Yes, that seems a good start. But yesterday you raised the 'fun' point
> > of two globally ordered sequences connected by a single local link.
>
> The conclusion that I am slowly coming to is that litmus tests should
> not be thought of as linear chains, but rather as cycles. If you think
> of it as a cycle, then it doesn't matter where the local link is, just
> how many of them and how they are connected.
Do you have some examples of this? I'm struggling to make it work in my
mind, or are you talking specifically in the context of the kernel
memory model?
> But I will admit that there are some rather strange litmus tests that
> challenge this cycle-centric view, for example, the one shown below.
> It turns out that herd and ppcmem disagree on the outcome. (The Power
> architects side with ppcmem.)
>
> > And I think I'm still confused on LWSYNC (in the smp_wmb case) when one
> > of the stores loses a conflict, and if that scenario matters. If it
> > does, we should inspect the same case for other barriers.
>
> Indeed. I am still working on how these should be described. My
> current thought is to be quite conservative on what ordering is
> actually respected, however, the current task is formalizing how
> RCU plays with the rest of the memory model.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> PPC Overlapping Group-B sets version 4
> ""
> (* When the Group-B sets from two different barriers involve instructions in
> the same thread, within that thread one set must contain the other.
>
> P0 P1 P2
> Rx=1 Wy=1 Wz=2
> dep. lwsync lwsync
> Ry=0 Wz=1 Wx=1
> Rz=1
>
> assert(!(z=2))
>
> Forbidden by ppcmem, allowed by herd.
> *)
> {
> 0:r1=x; 0:r2=y; 0:r3=z;
> 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> }
> P0 | P1 | P2 ;
> lwz r6,0(r1) | stw r4,0(r2) | stw r5,0(r3) ;
> xor r7,r6,r6 | lwsync | lwsync ;
> lwzx r7,r7,r2 | stw r4,0(r3) | stw r4,0(r1) ;
> lwz r8,0(r3) | | ;
>
> exists
> (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
That really hurts. Assuming that the "assert(!(z=2))" is actually there
to constrain the coherence order of z to be {0->1->2}, then I think that
this test is forbidden on arm using dmb instead of lwsync. That said, I
also don't think the Rz=1 in P0 changes anything.
The double negatives don't help here! (it is forbidden to guarantee that
z is not always 2).
Will
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-25 16:42 ` Will Deacon
@ 2016-01-26 6:03 ` Paul E. McKenney
2016-01-26 10:19 ` Peter Zijlstra
2016-01-26 12:16 ` Will Deacon
0 siblings, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 6:03 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> > >
> > > > > And the stuff we're confused about is how best to express the difference
> > > > > and guarantees of these two forms of transitivity and how exactly they
> > > > > interact.
> > > >
> > > > Hoping my memory-barrier.txt patch helps here...
> > >
> > > Yes, that seems a good start. But yesterday you raised the 'fun' point
> > > of two globally ordered sequences connected by a single local link.
> >
> > The conclusion that I am slowly coming to is that litmus tests should
> > not be thought of as linear chains, but rather as cycles. If you think
> > of it as a cycle, then it doesn't matter where the local link is, just
> > how many of them and how they are connected.
>
> Do you have some examples of this? I'm struggling to make it work in my
> mind, or are you talking specifically in the context of the kernel
> memory model?
Now that you mention it, maybe it would be best to keep the transitive
and non-transitive separate for the time being anyway. Just because it
might be possible to deal with does not necessarily mean that we should
be encouraging it. ;-)
> > But I will admit that there are some rather strange litmus tests that
> > challenge this cycle-centric view, for example, the one shown below.
> > It turns out that herd and ppcmem disagree on the outcome. (The Power
> > architects side with ppcmem.)
> >
> > > And I think I'm still confused on LWSYNC (in the smp_wmb case) when one
> > > of the stores loses a conflict, and if that scenario matters. If it
> > > does, we should inspect the same case for other barriers.
> >
> > Indeed. I am still working on how these should be described. My
> > current thought is to be quite conservative on what ordering is
> > actually respected, however, the current task is formalizing how
> > RCU plays with the rest of the memory model.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > PPC Overlapping Group-B sets version 4
> > ""
> > (* When the Group-B sets from two different barriers involve instructions in
> > the same thread, within that thread one set must contain the other.
> >
> > P0 P1 P2
> > Rx=1 Wy=1 Wz=2
> > dep. lwsync lwsync
> > Ry=0 Wz=1 Wx=1
> > Rz=1
> >
> > assert(!(z=2))
> >
> > Forbidden by ppcmem, allowed by herd.
> > *)
> > {
> > 0:r1=x; 0:r2=y; 0:r3=z;
> > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > }
> > P0 | P1 | P2 ;
> > lwz r6,0(r1) | stw r4,0(r2) | stw r5,0(r3) ;
> > xor r7,r6,r6 | lwsync | lwsync ;
> > lwzx r7,r7,r2 | stw r4,0(r3) | stw r4,0(r1) ;
> > lwz r8,0(r3) | | ;
> >
> > exists
> > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
>
> That really hurts. Assuming that the "assert(!(z=2))" is actually there
> to constrain the coherence order of z to be {0->1->2}, then I think that
> this test is forbidden on arm using dmb instead of lwsync. That said, I
> also don't think the Rz=1 in P0 changes anything.
What about the smp_wmb() variant of dmb that orders only stores?
> The double negatives don't help here! (it is forbidden to guarantee that
> z is not always 2).
Yes, this is a weird one, and I don't know of any use of it.
Thanx, Paul
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-26 6:03 ` Paul E. McKenney
@ 2016-01-26 10:19 ` Peter Zijlstra
2016-01-26 20:13 ` Paul E. McKenney
2016-01-26 12:16 ` Will Deacon
1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-26 10:19 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Will Deacon, Leonid Yegoshin, Michael S. Tsirkin, linux-kernel,
Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel
On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> > > > Yes, that seems a good start. But yesterday you raised the 'fun' point
> > > > of two globally ordered sequences connected by a single local link.
> > >
> > > The conclusion that I am slowly coming to is that litmus tests should
> > > not be thought of as linear chains, but rather as cycles. If you think
> > > of it as a cycle, then it doesn't matter where the local link is, just
> > > how many of them and how they are connected.
> >
> > Do you have some examples of this? I'm struggling to make it work in my
> > mind, or are you talking specifically in the context of the kernel
> > memory model?
>
> Now that you mention it, maybe it would be best to keep the transitive
> and non-transitive separate for the time being anyway. Just because it
> might be possible to deal with does not necessarily mean that we should
> be encouraging it. ;-)
So isn't smp_mb__after_unlock_lock() exactly such a scenario? And would
not someone trying to implement RCsc locks using locally transitive
RELEASE/ACQUIRE operations need exactly this stuff?
That is, I am afraid we need to cover the mix of local and global
transitive operations at least in overview.
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-26 10:19 ` Peter Zijlstra
@ 2016-01-26 20:13 ` Paul E. McKenney
2016-01-27 8:39 ` Peter Zijlstra
0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 20:13 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Tue, Jan 26, 2016 at 11:19:27AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
>
> > > > > Yes, that seems a good start. But yesterday you raised the 'fun' point
> > > > > of two globally ordered sequences connected by a single local link.
> > > >
> > > > The conclusion that I am slowly coming to is that litmus tests should
> > > > not be thought of as linear chains, but rather as cycles. If you think
> > > > of it as a cycle, then it doesn't matter where the local link is, just
> > > > how many of them and how they are connected.
> > >
> > > Do you have some examples of this? I'm struggling to make it work in my
> > > mind, or are you talking specifically in the context of the kernel
> > > memory model?
> >
> > Now that you mention it, maybe it would be best to keep the transitive
> > and non-transitive separate for the time being anyway. Just because it
> > might be possible to deal with does not necessarily mean that we should
> > be encouraging it. ;-)
>
> So isn't smp_mb__after_unlock_lock() exactly such a scenario? And would
> not someone trying to implement RCsc locks using locally transitive
> RELEASE/ACQUIRE operations need exactly this stuff?
>
> That is, I am afraid we need to cover the mix of local and global
> transitive operations at least in overview.
True, but we haven't gotten to locking yet. That said, I would argue
that smp_mb__after_unlock_lock() upgrades locks to transitive, and
thus would not be an exception to the "no combining transitive and
non-transitive steps in cycles" rule.
Thanx, Paul
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-26 20:13 ` Paul E. McKenney
@ 2016-01-27 8:39 ` Peter Zijlstra
0 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-27 8:39 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Will Deacon, Leonid Yegoshin, Michael S. Tsirkin, linux-kernel,
Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel
On Tue, Jan 26, 2016 at 12:13:39PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 26, 2016 at 11:19:27AM +0100, Peter Zijlstra wrote:
> > So isn't smp_mb__after_unlock_lock() exactly such a scenario? And would
> > not someone trying to implement RCsc locks using locally transitive
> > RELEASE/ACQUIRE operations need exactly this stuff?
> >
> > That is, I am afraid we need to cover the mix of local and global
> > transitive operations at least in overview.
>
> True, but we haven't gotten to locking yet.
The mythical smp_mb__after_release_acquire() then ;-)
(and yes, I know you're going to say we don't have that)
> That said, I would argue
> that smp_mb__after_unlock_lock() upgrades locks to transitive, and
> thus would not be an exception to the "no combining transitive and
> non-transitive steps in cycles" rule.
But But But ;-) It does that exactly by combining. I suspect this is
(partly) the source of your SC chains with one PC link example.
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-26 6:03 ` Paul E. McKenney
2016-01-26 10:19 ` Peter Zijlstra
@ 2016-01-26 12:16 ` Will Deacon
2016-01-26 14:35 ` Boqun Feng
[not found] ` <20160126121608.GE21553-5wv7dgnIgG8@public.gmane.org>
1 sibling, 2 replies; 153+ messages in thread
From: Will Deacon @ 2016-01-26 12:16 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > PPC Overlapping Group-B sets version 4
> > > ""
> > > (* When the Group-B sets from two different barriers involve instructions in
> > > the same thread, within that thread one set must contain the other.
> > >
> > > P0 P1 P2
> > > Rx=1 Wy=1 Wz=2
> > > dep. lwsync lwsync
> > > Ry=0 Wz=1 Wx=1
> > > Rz=1
> > >
> > > assert(!(z=2))
> > >
> > > Forbidden by ppcmem, allowed by herd.
> > > *)
> > > {
> > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > }
> > > P0 | P1 | P2 ;
> > > lwz r6,0(r1) | stw r4,0(r2) | stw r5,0(r3) ;
> > > xor r7,r6,r6 | lwsync | lwsync ;
> > > lwzx r7,r7,r2 | stw r4,0(r3) | stw r4,0(r1) ;
> > > lwz r8,0(r3) | | ;
> > >
> > > exists
> > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> >
> > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > to constrain the coherence order of z to be {0->1->2}, then I think that
> > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > also don't think the Rz=1 in P0 changes anything.
>
> What about the smp_wmb() variant of dmb that orders only stores?
Tricky, but I think it still works out if the coherence order of z is as
I described above. The line of reasoning is weird though -- I ended up
considering the two cases where P0 reads z before and after it reads x
and what that means for the read of y.
Will
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-26 12:16 ` Will Deacon
@ 2016-01-26 14:35 ` Boqun Feng
[not found] ` <20160126121608.GE21553-5wv7dgnIgG8@public.gmane.org>
1 sibling, 0 replies; 153+ messages in thread
From: Boqun Feng @ 2016-01-26 14:35 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, Paul E. McKenney, linux-xtensa, james.hogan,
Arnd Bergmann, Stefano Stabellini, adi-buildroot-devel,
Leonid Yegoshin, ddaney.cavm, Thomas
Hi Will,
On Tue, Jan 26, 2016 at 12:16:09PM +0000, Will Deacon wrote:
> On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > PPC Overlapping Group-B sets version 4
> > > > ""
> > > > (* When the Group-B sets from two different barriers involve instructions in
> > > > the same thread, within that thread one set must contain the other.
> > > >
> > > > P0 P1 P2
> > > > Rx=1 Wy=1 Wz=2
> > > > dep. lwsync lwsync
> > > > Ry=0 Wz=1 Wx=1
> > > > Rz=1
> > > >
> > > > assert(!(z=2))
> > > >
> > > > Forbidden by ppcmem, allowed by herd.
> > > > *)
> > > > {
> > > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > > }
> > > > P0 | P1 | P2 ;
> > > > lwz r6,0(r1) | stw r4,0(r2) | stw r5,0(r3) ;
> > > > xor r7,r6,r6 | lwsync | lwsync ;
> > > > lwzx r7,r7,r2 | stw r4,0(r3) | stw r4,0(r1) ;
> > > > lwz r8,0(r3) | | ;
> > > >
> > > > exists
> > > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > >
> > > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > > to constrain the coherence order of z to be {0->1->2}, then I think that
> > > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > > also don't think the Rz=1 in P0 changes anything.
> >
> > What about the smp_wmb() variant of dmb that orders only stores?
>
> Tricky, but I think it still works out if the coherence order of z is as
> I described above. The line of reasoning is weird though -- I ended up
> considering the two cases where P0 reads z before and after it reads x
^^^^^^^^^^^^^^^
Because of the fact that two reads on the same processors can't be
executed simultaneously? I feel like this is exactly something herd
missed.
> and what that means for the read of y.
>
And the reasoning on PPC is similar, so looks like the read of z on P0
is a necessary condition for the exists clause to be forbidden.
Regards,
Boqun
> Will
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
[not found] ` <20160126121608.GE21553-5wv7dgnIgG8@public.gmane.org>
@ 2016-01-26 19:58 ` Paul E. McKenney
2016-01-27 10:25 ` Will Deacon
0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 19:58 UTC (permalink / raw)
To: Will Deacon
Cc: Peter Zijlstra, Leonid Yegoshin, Michael S. Tsirkin, linux-kernel,
Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel
On Tue, Jan 26, 2016 at 12:16:09PM +0000, Will Deacon wrote:
> On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > PPC Overlapping Group-B sets version 4
> > > > ""
> > > > (* When the Group-B sets from two different barriers involve instructions in
> > > > the same thread, within that thread one set must contain the other.
> > > >
> > > > P0 P1 P2
> > > > Rx=1 Wy=1 Wz=2
> > > > dep. lwsync lwsync
> > > > Ry=0 Wz=1 Wx=1
> > > > Rz=1
> > > >
> > > > assert(!(z=2))
> > > >
> > > > Forbidden by ppcmem, allowed by herd.
> > > > *)
> > > > {
> > > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > > }
> > > > P0 | P1 | P2 ;
> > > > lwz r6,0(r1) | stw r4,0(r2) | stw r5,0(r3) ;
> > > > xor r7,r6,r6 | lwsync | lwsync ;
> > > > lwzx r7,r7,r2 | stw r4,0(r3) | stw r4,0(r1) ;
> > > > lwz r8,0(r3) | | ;
> > > >
> > > > exists
> > > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > >
> > > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > > to constrain the coherence order of z to be {0->1->2}, then I think that
> > > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > > also don't think the Rz=1 in P0 changes anything.
> >
> > What about the smp_wmb() variant of dmb that orders only stores?
>
> Tricky, but I think it still works out if the coherence order of z is as
> I described above. The line of reasoning is weird though -- I ended up
> considering the two cases where P0 reads z before and after it reads x
> and what that means for the read of y.
By "works out" you mean that ARM prohibits the outcome?
BTW, I never have seen a real-world use for this case. At the moment
it is mostly a cautionary tale about memory-model corner cases and
tools.
Thanx, Paul
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-26 19:58 ` Paul E. McKenney
@ 2016-01-27 10:25 ` Will Deacon
2016-01-27 23:32 ` Paul E. McKenney
0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-27 10:25 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Tue, Jan 26, 2016 at 11:58:20AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 26, 2016 at 12:16:09PM +0000, Will Deacon wrote:
> > On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > > PPC Overlapping Group-B sets version 4
> > > > > ""
> > > > > (* When the Group-B sets from two different barriers involve instructions in
> > > > > the same thread, within that thread one set must contain the other.
> > > > >
> > > > > P0 P1 P2
> > > > > Rx=1 Wy=1 Wz=2
> > > > > dep. lwsync lwsync
> > > > > Ry=0 Wz=1 Wx=1
> > > > > Rz=1
> > > > >
> > > > > assert(!(z=2))
> > > > >
> > > > > Forbidden by ppcmem, allowed by herd.
> > > > > *)
> > > > > {
> > > > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > > > }
> > > > > P0 | P1 | P2 ;
> > > > > lwz r6,0(r1) | stw r4,0(r2) | stw r5,0(r3) ;
> > > > > xor r7,r6,r6 | lwsync | lwsync ;
> > > > > lwzx r7,r7,r2 | stw r4,0(r3) | stw r4,0(r1) ;
> > > > > lwz r8,0(r3) | | ;
> > > > >
> > > > > exists
> > > > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > > >
> > > > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > > > to constrain the coherence order of z to be {0->1->2}, then I think that
> > > > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > > > also don't think the Rz=1 in P0 changes anything.
> > >
> > > What about the smp_wmb() variant of dmb that orders only stores?
> >
> > Tricky, but I think it still works out if the coherence order of z is as
> > I described above. The line of reasoning is weird though -- I ended up
> > considering the two cases where P0 reads z before and after it reads x
> > and what that means for the read of y.
>
> By "works out" you mean that ARM prohibits the outcome?
Yes, that's my understanding.
Will
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-27 10:25 ` Will Deacon
@ 2016-01-27 23:32 ` Paul E. McKenney
0 siblings, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-27 23:32 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Wed, Jan 27, 2016 at 10:25:46AM +0000, Will Deacon wrote:
> On Tue, Jan 26, 2016 at 11:58:20AM -0800, Paul E. McKenney wrote:
> > On Tue, Jan 26, 2016 at 12:16:09PM +0000, Will Deacon wrote:
> > > On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > > > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > > > PPC Overlapping Group-B sets version 4
> > > > > > ""
> > > > > > (* When the Group-B sets from two different barriers involve instructions in
> > > > > > the same thread, within that thread one set must contain the other.
> > > > > >
> > > > > > P0 P1 P2
> > > > > > Rx=1 Wy=1 Wz=2
> > > > > > dep. lwsync lwsync
> > > > > > Ry=0 Wz=1 Wx=1
> > > > > > Rz=1
> > > > > >
> > > > > > assert(!(z=2))
> > > > > >
> > > > > > Forbidden by ppcmem, allowed by herd.
> > > > > > *)
> > > > > > {
> > > > > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > > > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > > > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > > > > }
> > > > > > P0 | P1 | P2 ;
> > > > > > lwz r6,0(r1) | stw r4,0(r2) | stw r5,0(r3) ;
> > > > > > xor r7,r6,r6 | lwsync | lwsync ;
> > > > > > lwzx r7,r7,r2 | stw r4,0(r3) | stw r4,0(r1) ;
> > > > > > lwz r8,0(r3) | | ;
> > > > > >
> > > > > > exists
> > > > > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > > > >
> > > > > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > > > > to constrain the coherence order of z to be {0->1->2}, then I think that
> > > > > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > > > > also don't think the Rz=1 in P0 changes anything.
> > > >
> > > > What about the smp_wmb() variant of dmb that orders only stores?
> > >
> > > Tricky, but I think it still works out if the coherence order of z is as
> > > I described above. The line of reasoning is weird though -- I ended up
> > > considering the two cases where P0 reads z before and after it reads x
> > > and what that means for the read of y.
> >
> > By "works out" you mean that ARM prohibits the outcome?
>
> Yes, that's my understanding.
Very good, we have agreement between the two architectures, then. ;-)
Thanx, Paul
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 8:55 ` Peter Zijlstra
2016-01-15 9:13 ` Peter Zijlstra
@ 2016-01-15 17:39 ` Paul E. McKenney
2016-01-15 21:29 ` Peter Zijlstra
2016-01-25 18:02 ` Will Deacon
1 sibling, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 17:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > and smp_load_acquire(),
>
> But they provide different grades of transitivity, which is where all
> the confusion lays.
>
> smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
>
> Whereas the RCpc release+acquire is weakly so, only the two cpus
> involved in the handover will agree on the order.
Good point!
Using grace periods in place of smp_mb() also provides strong/global
transitivity, but also insanely high latencies. ;-)
The patch below updates Documentation/memory-barriers.txt to define
local vs. global transitivity. The corresponding ppcmem litmus test
is included below as well.
Should we start putting litmus tests for the various examples
somewhere, perhaps in a litmus-tests directory within each participating
architecture? I have a pile of powerpc-related litmus tests on my laptop,
but they probably aren't doing all that much good there.
Thanx, Paul
------------------------------------------------------------------------
PPC local-transitive
""
{
0:r1=1; 0:r2=u; 0:r3=v; 0:r4=x; 0:r5=y; 0:r6=z;
1:r1=1; 1:r2=u; 1:r3=v; 1:r4=x; 1:r5=y; 1:r6=z;
2:r1=1; 2:r2=u; 2:r3=v; 2:r4=x; 2:r5=y; 2:r6=z;
3:r1=1; 3:r2=u; 3:r3=v; 3:r4=x; 3:r5=y; 3:r6=z;
}
P0 | P1 | P2 | P3 ;
lwz r9,0(r4) | lwz r9,0(r5) | lwz r9,0(r6) | stw r1,0(r3) ;
lwsync | lwsync | lwsync | sync ;
stw r1,0(r2) | lwz r8,0(r3) | stw r1,0(r4) | lwz r9,0(r2) ;
lwsync | lwz r7,0(r2) | | ;
stw r1,0(r5) | lwsync | | ;
| stw r1,0(r6) | | ;
exists
(* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r8=0 /\ 3:r9=0) *)
(* (0:r9=1 /\ 1:r9=1 /\ 2:r9=1) *)
(* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0) *)
(0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0)
------------------------------------------------------------------------
commit 2cb4e83a1b5c89c8e39b8a64bd89269d05913e41
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Fri Jan 15 09:30:42 2016 -0800
documentation: Distinguish between local and global transitivity
The introduction of smp_load_acquire() and smp_store_release() had
the side effect of introducing a weaker notion of transitivity:
The transitivity of full smp_mb() barriers is global, but that
of smp_store_release()/smp_load_acquire() chains is local. This
commit therefore introduces the notion of local transitivity and
gives an example.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index c66ba46d8079..d8109ed99342 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
General barriers are therefore required to ensure that all CPUs agree
on the combined order of CPU 1's and CPU 2's accesses.
-To reiterate, if your code requires transitivity, use general barriers
-throughout.
+General barriers provide "global transitivity", so that all CPUs will
+agree on the order of operations. In contrast, a chain of release-acquire
+pairs provides only "local transitivity", so that only those CPUs on
+the chain are guaranteed to agree on the combined order of the accesses.
+For example, switching to C code in deference to Herman Hollerith:
+
+ int u, v, x, y, z;
+
+ void cpu0(void)
+ {
+ r0 = smp_load_acquire(&x);
+ WRITE_ONCE(u, 1);
+ smp_store_release(&y, 1);
+ }
+
+ void cpu1(void)
+ {
+ r1 = smp_load_acquire(&y);
+ r4 = READ_ONCE(v);
+ r5 = READ_ONCE(u);
+ smp_store_release(&z, 1);
+ }
+
+ void cpu2(void)
+ {
+ r2 = smp_load_acquire(&z);
+ smp_store_release(&x, 1);
+ }
+
+ void cpu3(void)
+ {
+ WRITE_ONCE(v, 1);
+ smp_mb();
+ r3 = READ_ONCE(u);
+ }
+
+Because cpu0(), cpu1(), and cpu2() participate in a local transitive
+chain of smp_store_release()/smp_load_acquire() pairs, the following
+outcome is prohibited:
+
+ r0 = 1 && r1 = 1 && r2 = 1
+
+Furthermore, because of the release-acquire relationship between cpu0()
+and cpu1(), cpu1() must see cpu0()'s writes, so that the following
+outcome is prohibited:
+
+ r1 = 1 && r5 = 0
+
+However, the transitivity of release-acquire is local to the participating
+CPUs and does not apply to cpu3(). Therefore, the following outcome
+is possible:
+
+ r0 = 0 && r1 = 1 && r2 = 1 && r3 = 0 && r4 = 0
+
+Although cpu0(), cpu1(), and cpu2() will see their respective reads and
+writes in order, CPUs not involved in the release-acquire chain might
+well disagree on the order. This disagreement stems from the fact that
+the weak memory-barrier instructions used to implement smp_load_acquire()
+and smp_store_release() are not required to order prior stores against
+subsequent loads in all cases. This means that cpu3() can see cpu0()'s
+store to u as happening -after- cpu1()'s load from v, even though
+both cpu0() and cpu1() agree that these two operations occurred in the
+intended order.
+
+However, please keep in mind that smp_load_acquire() is not magic.
+In particular, it simply reads from its argument with ordering. It does
+-not- ensure that any particular value will be read. Therefore, the
+following outcome is possible:
+
+ r0 = 0 && r1 = 0 && r2 = 0 && r5 = 0
+
+Note that this outcome can happen even on a mythical sequentially
+consistent system where nothing is ever reordered.
+
+To reiterate, if your code requires global transitivity, use general
+barriers throughout.
============
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 17:39 ` Paul E. McKenney
@ 2016-01-15 21:29 ` Peter Zijlstra
2016-01-15 22:01 ` Paul E. McKenney
2016-01-25 18:02 ` Will Deacon
1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-15 21:29 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Fri, Jan 15, 2016 at 09:39:12AM -0800, Paul E. McKenney wrote:
> Should we start putting litmus tests for the various examples
> somewhere, perhaps in a litmus-tests directory within each participating
> architecture? I have a pile of powerpc-related litmus tests on my laptop,
> but they probably aren't doing all that much good there.
Yeah, or a version of them in C that we can 'compile'?
>
> commit 2cb4e83a1b5c89c8e39b8a64bd89269d05913e41
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date: Fri Jan 15 09:30:42 2016 -0800
>
> documentation: Distinguish between local and global transitivity
>
> The introduction of smp_load_acquire() and smp_store_release() had
> the side effect of introducing a weaker notion of transitivity:
> The transitivity of full smp_mb() barriers is global, but that
> of smp_store_release()/smp_load_acquire() chains is local. This
> commit therefore introduces the notion of local transitivity and
> gives an example.
>
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> Reported-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
I think it fails to mention smp_mb__after_release_acquire(), although I
suspect we didn't actually introduce the primitive yet, which raises the
point, do we want to?
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 21:29 ` Peter Zijlstra
@ 2016-01-15 22:01 ` Paul E. McKenney
0 siblings, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 22:01 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Fri, Jan 15, 2016 at 10:29:12PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 15, 2016 at 09:39:12AM -0800, Paul E. McKenney wrote:
> > Should we start putting litmus tests for the various examples
> > somewhere, perhaps in a litmus-tests directory within each participating
> > architecture? I have a pile of powerpc-related litmus tests on my laptop,
> > but they probably aren't doing all that much good there.
>
> Yeah, or a version of them in C that we can 'compile'?
That would be good as well. I am guessing that architecture-specific
litmus tests will also be needed, but you are right that
architecture-independent versions are higher priority.
> > commit 2cb4e83a1b5c89c8e39b8a64bd89269d05913e41
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date: Fri Jan 15 09:30:42 2016 -0800
> >
> > documentation: Distinguish between local and global transitivity
> >
> > The introduction of smp_load_acquire() and smp_store_release() had
> > the side effect of introducing a weaker notion of transitivity:
> > The transitivity of full smp_mb() barriers is global, but that
> > of smp_store_release()/smp_load_acquire() chains is local. This
> > commit therefore introduces the notion of local transitivity and
> > gives an example.
> >
> > Reported-by: Peter Zijlstra <peterz@infradead.org>
> > Reported-by: Will Deacon <will.deacon@arm.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> I think it fails to mention smp_mb__after_release_acquire(), although I
> suspect we didn't actually introduce the primitive yet, which raises the
> point, do we want to?
Well, it is not in v4.4. I believe that we need good use cases before
we add it.
Thanx, Paul
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-15 17:39 ` Paul E. McKenney
2016-01-15 21:29 ` Peter Zijlstra
@ 2016-01-25 18:02 ` Will Deacon
2016-01-26 6:12 ` Paul E. McKenney
1 sibling, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-25 18:02 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Peter Zijlstra, Leonid Yegoshin, Michael S. Tsirkin, linux-kernel,
Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel
Hi Paul,
On Fri, Jan 15, 2016 at 09:39:12AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > > and smp_load_acquire(),
> >
> > But they provide different grades of transitivity, which is where all
> > the confusion lies.
> >
> > smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
> >
> > Whereas the RCpc release+acquire is weakly so, only the two cpus
> > involved in the handover will agree on the order.
>
> Good point!
>
> Using grace periods in place of smp_mb() also provides strong/global
> transitivity, but also insanely high latencies. ;-)
>
> The patch below updates Documentation/memory-barriers.txt to define
> local vs. global transitivity. The corresponding ppcmem litmus test
> is included below as well.
>
> Should we start putting litmus tests for the various examples
> somewhere, perhaps in a litmus-tests directory within each participating
> architecture? I have a pile of powerpc-related litmus tests on my laptop,
> but they probably aren't doing all that much good there.
I too would like to have the litmus tests in the kernel so that we can
refer to them from memory-barriers.txt. Ideally they wouldn't be targeted
to a particular arch, however.
> PPC local-transitive
> ""
> {
> 0:r1=1; 0:r2=u; 0:r3=v; 0:r4=x; 0:r5=y; 0:r6=z;
> 1:r1=1; 1:r2=u; 1:r3=v; 1:r4=x; 1:r5=y; 1:r6=z;
> 2:r1=1; 2:r2=u; 2:r3=v; 2:r4=x; 2:r5=y; 2:r6=z;
> 3:r1=1; 3:r2=u; 3:r3=v; 3:r4=x; 3:r5=y; 3:r6=z;
> }
> P0 | P1 | P2 | P3 ;
> lwz r9,0(r4) | lwz r9,0(r5) | lwz r9,0(r6) | stw r1,0(r3) ;
> lwsync | lwsync | lwsync | sync ;
> stw r1,0(r2) | lwz r8,0(r3) | stw r1,0(r7) | lwz r9,0(r2) ;
> lwsync | lwz r7,0(r2) | | ;
> stw r1,0(r5) | lwsync | | ;
> | stw r1,0(r6) | | ;
> exists
> (* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r8=0 /\ 3:r9=0) *)
> (* (0:r9=1 /\ 1:r9=1 /\ 2:r9=1) *)
> (* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0) *)
> (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0)
i.e. we should rewrite this using READ_ONCE/WRITE_ONCE and smp_mb() etc.
> ------------------------------------------------------------------------
>
> commit 2cb4e83a1b5c89c8e39b8a64bd89269d05913e41
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date: Fri Jan 15 09:30:42 2016 -0800
>
> documentation: Distinguish between local and global transitivity
>
> The introduction of smp_load_acquire() and smp_store_release() had
> the side effect of introducing a weaker notion of transitivity:
> The transitivity of full smp_mb() barriers is global, but that
> of smp_store_release()/smp_load_acquire() chains is local. This
> commit therefore introduces the notion of local transitivity and
> gives an example.
>
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> Reported-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index c66ba46d8079..d8109ed99342 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
> General barriers are therefore required to ensure that all CPUs agree
> on the combined order of CPU 1's and CPU 2's accesses.
>
> -To reiterate, if your code requires transitivity, use general barriers
> -throughout.
> +General barriers provide "global transitivity", so that all CPUs will
> +agree on the order of operations. In contrast, a chain of release-acquire
> +pairs provides only "local transitivity", so that only those CPUs on
> +the chain are guaranteed to agree on the combined order of the accesses.
Thanks for having a go at this. I tried defining something axiomatically,
but got stuck pretty quickly. In my scheme, I used "data-directed
transitivity" instead of "local transitivity", since the latter seems to
be a bit of a misnomer.
> +For example, switching to C code in deference to Herman Hollerith:
> +
> + int u, v, x, y, z;
> +
> + void cpu0(void)
> + {
> + r0 = smp_load_acquire(&x);
> + WRITE_ONCE(u, 1);
> + smp_store_release(&y, 1);
> + }
> +
> + void cpu1(void)
> + {
> + r1 = smp_load_acquire(&y);
> + r4 = READ_ONCE(v);
> + r5 = READ_ONCE(u);
> + smp_store_release(&z, 1);
> + }
> +
> + void cpu2(void)
> + {
> + r2 = smp_load_acquire(&z);
> + smp_store_release(&x, 1);
> + }
> +
> + void cpu3(void)
> + {
> + WRITE_ONCE(v, 1);
> + smp_mb();
> + r3 = READ_ONCE(u);
> + }
> +
> +Because cpu0(), cpu1(), and cpu2() participate in a local transitive
> +chain of smp_store_release()/smp_load_acquire() pairs, the following
> +outcome is prohibited:
> +
> + r0 = 1 && r1 = 1 && r2 = 1
> +
> +Furthermore, because of the release-acquire relationship between cpu0()
> +and cpu1(), cpu1() must see cpu0()'s writes, so that the following
> +outcome is prohibited:
> +
> + r1 = 1 && r5 = 0
> +
> +However, the transitivity of release-acquire is local to the participating
> +CPUs and does not apply to cpu3(). Therefore, the following outcome
> +is possible:
> +
> + r0 = 0 && r1 = 1 && r2 = 1 && r3 = 0 && r4 = 0
I think you should be completely explicit and include r5 = 1 here, too.
Also -- where would you add the smp_mb__after_release_acquire to fix
(i.e. forbid) this? Immediately after cpu1()'s read of y?
> +Although cpu0(), cpu1(), and cpu2() will see their respective reads and
> +writes in order, CPUs not involved in the release-acquire chain might
> +well disagree on the order. This disagreement stems from the fact that
> +the weak memory-barrier instructions used to implement smp_load_acquire()
> +and smp_store_release() are not required to order prior stores against
> +subsequent loads in all cases. This means that cpu3() can see cpu0()'s
> +store to u as happening -after- cpu1()'s load from v, even though
> +both cpu0() and cpu1() agree that these two operations occurred in the
> +intended order.
> +
> +However, please keep in mind that smp_load_acquire() is not magic.
> +In particular, it simply reads from its argument with ordering. It does
> +-not- ensure that any particular value will be read. Therefore, the
> +following outcome is possible:
> +
> + r0 = 0 && r1 = 0 && r2 = 0 && r5 = 0
> +
> +Note that this outcome can happen even on a mythical sequentially
> +consistent system where nothing is ever reordered.
I'm not sure this last bit is strictly necessary. If somebody thinks that
acquire/release involve some sort of implicit synchronisation, I think
they may have bigger problems with memory-barriers.txt.
Will
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-25 18:02 ` Will Deacon
@ 2016-01-26 6:12 ` Paul E. McKenney
2016-01-26 10:15 ` Peter Zijlstra
0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 6:12 UTC (permalink / raw)
To: Will Deacon
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Peter Zijlstra,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Mon, Jan 25, 2016 at 06:02:34PM +0000, Will Deacon wrote:
> Hi Paul,
>
> On Fri, Jan 15, 2016 at 09:39:12AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> > > On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > > > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > > > and smp_load_acquire(),
> > >
> > > But they provide different grades of transitivity, which is where all
> > > the confusion lies.
> > >
> > > smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
> > >
> > > Whereas the RCpc release+acquire is weakly so, only the two cpus
> > > involved in the handover will agree on the order.
> >
> > Good point!
> >
> > Using grace periods in place of smp_mb() also provides strong/global
> > transitivity, but also insanely high latencies. ;-)
> >
> > The patch below updates Documentation/memory-barriers.txt to define
> > local vs. global transitivity. The corresponding ppcmem litmus test
> > is included below as well.
> >
> > Should we start putting litmus tests for the various examples
> > somewhere, perhaps in a litmus-tests directory within each participating
> > architecture? I have a pile of powerpc-related litmus tests on my laptop,
> > but they probably aren't doing all that much good there.
>
> I too would like to have the litmus tests in the kernel so that we can
> refer to them from memory-barriers.txt. Ideally they wouldn't be targeted
> to a particular arch, however.
Agreed. Working on it...
> > PPC local-transitive
> > ""
> > {
> > 0:r1=1; 0:r2=u; 0:r3=v; 0:r4=x; 0:r5=y; 0:r6=z;
> > 1:r1=1; 1:r2=u; 1:r3=v; 1:r4=x; 1:r5=y; 1:r6=z;
> > 2:r1=1; 2:r2=u; 2:r3=v; 2:r4=x; 2:r5=y; 2:r6=z;
> > 3:r1=1; 3:r2=u; 3:r3=v; 3:r4=x; 3:r5=y; 3:r6=z;
> > }
> > P0 | P1 | P2 | P3 ;
> > lwz r9,0(r4) | lwz r9,0(r5) | lwz r9,0(r6) | stw r1,0(r3) ;
> > lwsync | lwsync | lwsync | sync ;
> > stw r1,0(r2) | lwz r8,0(r3) | stw r1,0(r7) | lwz r9,0(r2) ;
> > lwsync | lwz r7,0(r2) | | ;
> > stw r1,0(r5) | lwsync | | ;
> > | stw r1,0(r6) | | ;
> > exists
> > (* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r8=0 /\ 3:r9=0) *)
> > (* (0:r9=1 /\ 1:r9=1 /\ 2:r9=1) *)
> > (* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0) *)
> > (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0)
>
> i.e. we should rewrite this using READ_ONCE/WRITE_ONCE and smp_mb() etc.
Yep!
> > ------------------------------------------------------------------------
> >
> > commit 2cb4e83a1b5c89c8e39b8a64bd89269d05913e41
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date: Fri Jan 15 09:30:42 2016 -0800
> >
> > documentation: Distinguish between local and global transitivity
> >
> > The introduction of smp_load_acquire() and smp_store_release() had
> > the side effect of introducing a weaker notion of transitivity:
> > The transitivity of full smp_mb() barriers is global, but that
> > of smp_store_release()/smp_load_acquire() chains is local. This
> > commit therefore introduces the notion of local transitivity and
> > gives an example.
> >
> > Reported-by: Peter Zijlstra <peterz@infradead.org>
> > Reported-by: Will Deacon <will.deacon@arm.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >
> > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > index c66ba46d8079..d8109ed99342 100644
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
> > General barriers are therefore required to ensure that all CPUs agree
> > on the combined order of CPU 1's and CPU 2's accesses.
> >
> > -To reiterate, if your code requires transitivity, use general barriers
> > -throughout.
> > +General barriers provide "global transitivity", so that all CPUs will
> > +agree on the order of operations. In contrast, a chain of release-acquire
> > +pairs provides only "local transitivity", so that only those CPUs on
> > +the chain are guaranteed to agree on the combined order of the accesses.
>
> Thanks for having a go at this. I tried defining something axiomatically,
> but got stuck pretty quickly. In my scheme, I used "data-directed
> transitivity" instead of "local transitivity", since the latter seems to
> be a bit of a misnomer.
I figured that "local" meant local to the CPUs participating in the
release-acquire chain. As opposed to smp_mb() chains where the ordering
is "global" as in visible to all CPUs, whether on the chain or not.
Does that help?
> > +For example, switching to C code in deference to Herman Hollerith:
> > +
> > + int u, v, x, y, z;
> > +
> > + void cpu0(void)
> > + {
> > + r0 = smp_load_acquire(&x);
> > + WRITE_ONCE(u, 1);
> > + smp_store_release(&y, 1);
> > + }
> > +
> > + void cpu1(void)
> > + {
> > + r1 = smp_load_acquire(&y);
> > + r4 = READ_ONCE(v);
> > + r5 = READ_ONCE(u);
> > + smp_store_release(&z, 1);
> > + }
> > +
> > + void cpu2(void)
> > + {
> > + r2 = smp_load_acquire(&z);
> > + smp_store_release(&x, 1);
> > + }
> > +
> > + void cpu3(void)
> > + {
> > + WRITE_ONCE(v, 1);
> > + smp_mb();
> > + r3 = READ_ONCE(u);
> > + }
> > +
> > +Because cpu0(), cpu1(), and cpu2() participate in a local transitive
> > +chain of smp_store_release()/smp_load_acquire() pairs, the following
> > +outcome is prohibited:
> > +
> > + r0 = 1 && r1 = 1 && r2 = 1
> > +
> > +Furthermore, because of the release-acquire relationship between cpu0()
> > +and cpu1(), cpu1() must see cpu0()'s writes, so that the following
> > +outcome is prohibited:
> > +
> > + r1 = 1 && r5 = 0
> > +
> > +However, the transitivity of release-acquire is local to the participating
> > +CPUs and does not apply to cpu3(). Therefore, the following outcome
> > +is possible:
> > +
> > + r0 = 0 && r1 = 1 && r2 = 1 && r3 = 0 && r4 = 0
>
> I think you should be completely explicit and include r5 = 1 here, too.
Good point -- I added this as an additional outcome:
r0 = 0 && r1 = 1 && r2 = 1 && r3 = 0 && r4 = 0 && r5 = 1
> Also -- where would you add the smp_mb__after_release_acquire to fix
> (i.e. forbid) this? Immediately after cpu1()'s read of y?
That sounds plausible, but we would first have to agree on exactly
what smp_mb__after_release_acquire() did. ;-)
> > +Although cpu0(), cpu1(), and cpu2() will see their respective reads and
> > +writes in order, CPUs not involved in the release-acquire chain might
> > +well disagree on the order. This disagreement stems from the fact that
> > +the weak memory-barrier instructions used to implement smp_load_acquire()
> > +and smp_store_release() are not required to order prior stores against
> > +subsequent loads in all cases. This means that cpu3() can see cpu0()'s
> > +store to u as happening -after- cpu1()'s load from v, even though
> > +both cpu0() and cpu1() agree that these two operations occurred in the
> > +intended order.
> > +
> > +However, please keep in mind that smp_load_acquire() is not magic.
> > +In particular, it simply reads from its argument with ordering. It does
> > +-not- ensure that any particular value will be read. Therefore, the
> > +following outcome is possible:
> > +
> > + r0 = 0 && r1 = 0 && r2 = 0 && r5 = 0
> > +
> > +Note that this outcome can happen even on a mythical sequentially
> > +consistent system where nothing is ever reordered.
>
> I'm not sure this last bit is strictly necessary. If somebody thinks that
> acquire/release involve some sort of implicit synchronisation, I think
> they may have bigger problems with memory-barriers.txt.
Agreed. But unless I add text like this occasionally, such people could
easily read through much of memory-barriers.txt and think that they did
in fact understand it. So I have to occasionally trip an assertion in
their brain. Or try to... :-/
Thanx, Paul
^ permalink raw reply [flat|nested] 153+ messages in thread
* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
2016-01-26 6:12 ` Paul E. McKenney
@ 2016-01-26 10:15 ` Peter Zijlstra
0 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-26 10:15 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
virtualization, H. Peter Anvin, sparclinux, Ingo Molnar,
linux-arch, linux-s390, Russell King - ARM Linux,
user-mode-linux-devel, linux-sh, Michael Ellerman, x86, xen-devel,
Ingo Molnar, linux-xtensa, james.hogan, Arnd Bergmann,
Stefano Stabellini, adi-buildroot-devel, Leonid Yegoshin,
ddaney.cavm, Thomas Gleixner, linux-metag
On Mon, Jan 25, 2016 at 10:12:11PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 25, 2016 at 06:02:34PM +0000, Will Deacon wrote:
> > Thanks for having a go at this. I tried defining something axiomatically,
> > but got stuck pretty quickly. In my scheme, I used "data-directed
> > transitivity" instead of "local transitivity", since the latter seems to
> > be a bit of a misnomer.
>
> I figured that "local" meant local to the CPUs participating in the
> release-acquire chain. As opposed to smp_mb() chains where the ordering
> is "global" as in visible to all CPUs, whether on the chain or not.
> Does that help?
That is in fact how I read and understood it.
^ permalink raw reply [flat|nested] 153+ messages in thread
* [PATCH v3 12/41] x86/um: reuse asm-generic/barrier.h
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (9 preceding siblings ...)
2016-01-10 14:18 ` [PATCH v3 11/41] mips: " Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
2016-01-10 14:18 ` [PATCH v3 13/41] x86: " Michael S. Tsirkin
` (29 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Richard
On x86/um CONFIG_SMP is never defined. As a result, several macros
match the asm-generic variant exactly. Drop the local definitions and
pull in asm-generic/barrier.h instead.
This is in preparation for refactoring this code area.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Richard Weinberger <richard@nod.at>
---
arch/x86/um/asm/barrier.h | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
index 755481f..174781a 100644
--- a/arch/x86/um/asm/barrier.h
+++ b/arch/x86/um/asm/barrier.h
@@ -36,13 +36,6 @@
#endif /* CONFIG_X86_PPRO_FENCE */
#define dma_wmb() barrier()
-#define smp_mb() barrier()
-#define smp_rmb() barrier()
-#define smp_wmb() barrier()
-
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0)
-
-#define read_barrier_depends() do { } while (0)
-#define smp_read_barrier_depends() do { } while (0)
+#include <asm-generic/barrier.h>
#endif
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 13/41] x86: reuse asm-generic/barrier.h
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (10 preceding siblings ...)
2016-01-10 14:18 ` [PATCH v3 12/41] x86/um: " Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
2016-01-12 14:10 ` Thomas Gleixner
2016-01-10 14:18 ` [PATCH v3 14/41] asm-generic: add __smp_xxx wrappers Michael S. Tsirkin
` (28 subsequent siblings)
40 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Ingo Molnar
As on most architectures, on x86 read_barrier_depends and
smp_read_barrier_depends are empty. Drop the local definitions and pull
the generic ones from asm-generic/barrier.h instead: they are identical.
This is in preparation for refactoring this code area.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/x86/include/asm/barrier.h | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 0681d25..cc4c2a7 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -43,9 +43,6 @@
#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0)
#endif /* SMP */
-#define read_barrier_depends() do { } while (0)
-#define smp_read_barrier_depends() do { } while (0)
-
#if defined(CONFIG_X86_PPRO_FENCE)
/*
@@ -91,4 +88,6 @@ do { \
#define smp_mb__before_atomic() barrier()
#define smp_mb__after_atomic() barrier()
+#include <asm-generic/barrier.h>
+
#endif /* _ASM_X86_BARRIER_H */
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* Re: [PATCH v3 13/41] x86: reuse asm-generic/barrier.h
2016-01-10 14:18 ` [PATCH v3 13/41] x86: " Michael S. Tsirkin
@ 2016-01-12 14:10 ` Thomas Gleixner
0 siblings, 0 replies; 153+ messages in thread
From: Thomas Gleixner @ 2016-01-12 14:10 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, Peter Zijlstra, Arnd Bergmann, linux-arch,
Andrew Cooper, Russell King - ARM Linux, virtualization,
Stefano Stabellini, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh
On Sun, 10 Jan 2016, Michael S. Tsirkin wrote:
> As on most architectures, on x86 read_barrier_depends and
> smp_read_barrier_depends are empty. Drop the local definitions and pull
> the generic ones from asm-generic/barrier.h instead: they are identical.
>
> This is in preparation for refactoring this code area.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
^ permalink raw reply [flat|nested] 153+ messages in thread
* [PATCH v3 14/41] asm-generic: add __smp_xxx wrappers
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (11 preceding siblings ...)
2016-01-10 14:18 ` [PATCH v3 13/41] x86: " Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
2016-01-10 14:18 ` [PATCH v3 15/41] powerpc: define __smp_xxx Michael S. Tsirkin
` (27 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
On !SMP, most architectures define their
barriers as compiler barriers.
On SMP, most need an actual barrier.
Make it possible to remove the code duplication for
!SMP by defining low-level __smp_xxx barriers
which do not depend on the value of SMP, then
use them from asm-generic conditionally.
Besides reducing code duplication, these low level APIs will also be
useful for virtualization, where a barrier is sometimes needed even if
!SMP since we might be talking to another kernel on the same SMP system.
Both virtio and Xen drivers will benefit.
The smp_xxx variants should use __smp_xxx ones or barrier() depending on
SMP, identically for all architectures.
We keep ifndef guards around them for now - once/if all
architectures are converted to use the generic
code, we'll be able to remove these.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
include/asm-generic/barrier.h | 91 ++++++++++++++++++++++++++++++++++++++-----
1 file changed, 82 insertions(+), 9 deletions(-)
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 987b2e0..8752964 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -54,22 +54,38 @@
#define read_barrier_depends() do { } while (0)
#endif
+#ifndef __smp_mb
+#define __smp_mb() mb()
+#endif
+
+#ifndef __smp_rmb
+#define __smp_rmb() rmb()
+#endif
+
+#ifndef __smp_wmb
+#define __smp_wmb() wmb()
+#endif
+
+#ifndef __smp_read_barrier_depends
+#define __smp_read_barrier_depends() read_barrier_depends()
+#endif
+
#ifdef CONFIG_SMP
#ifndef smp_mb
-#define smp_mb() mb()
+#define smp_mb() __smp_mb()
#endif
#ifndef smp_rmb
-#define smp_rmb() rmb()
+#define smp_rmb() __smp_rmb()
#endif
#ifndef smp_wmb
-#define smp_wmb() wmb()
+#define smp_wmb() __smp_wmb()
#endif
#ifndef smp_read_barrier_depends
-#define smp_read_barrier_depends() read_barrier_depends()
+#define smp_read_barrier_depends() __smp_read_barrier_depends()
#endif
#else /* !CONFIG_SMP */
@@ -92,23 +108,78 @@
#endif /* CONFIG_SMP */
+#ifndef __smp_store_mb
+#define __smp_store_mb(var, value) do { WRITE_ONCE(var, value); __smp_mb(); } while (0)
+#endif
+
+#ifndef __smp_mb__before_atomic
+#define __smp_mb__before_atomic() __smp_mb()
+#endif
+
+#ifndef __smp_mb__after_atomic
+#define __smp_mb__after_atomic() __smp_mb()
+#endif
+
+#ifndef __smp_store_release
+#define __smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ __smp_mb(); \
+ WRITE_ONCE(*p, v); \
+} while (0)
+#endif
+
+#ifndef __smp_load_acquire
+#define __smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = READ_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ __smp_mb(); \
+ ___p1; \
+})
+#endif
+
+#ifdef CONFIG_SMP
+
+#ifndef smp_store_mb
+#define smp_store_mb(var, value) __smp_store_mb(var, value)
+#endif
+
+#ifndef smp_mb__before_atomic
+#define smp_mb__before_atomic() __smp_mb__before_atomic()
+#endif
+
+#ifndef smp_mb__after_atomic
+#define smp_mb__after_atomic() __smp_mb__after_atomic()
+#endif
+
+#ifndef smp_store_release
+#define smp_store_release(p, v) __smp_store_release(p, v)
+#endif
+
+#ifndef smp_load_acquire
+#define smp_load_acquire(p) __smp_load_acquire(p)
+#endif
+
+#else /* !CONFIG_SMP */
+
#ifndef smp_store_mb
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0)
#endif
#ifndef smp_mb__before_atomic
-#define smp_mb__before_atomic() smp_mb()
+#define smp_mb__before_atomic() barrier()
#endif
#ifndef smp_mb__after_atomic
-#define smp_mb__after_atomic() smp_mb()
+#define smp_mb__after_atomic() barrier()
#endif
#ifndef smp_store_release
#define smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
- smp_mb(); \
+ barrier(); \
WRITE_ONCE(*p, v); \
} while (0)
#endif
@@ -118,10 +189,12 @@ do { \
({ \
typeof(*p) ___p1 = READ_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
- smp_mb(); \
+ barrier(); \
___p1; \
})
#endif
+#endif
+
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 15/41] powerpc: define __smp_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (12 preceding siblings ...)
2016-01-10 14:18 ` [PATCH v3 14/41] asm-generic: add __smp_xxx wrappers Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
2016-01-10 14:18 ` [PATCH v3 16/41] arm64: " Michael S. Tsirkin
` (26 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Boqun Feng <boqun.fe>
This defines __smp_xxx barriers for powerpc
for use by virtualization.
smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
This reduces the amount of arch-specific boiler-plate code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
---
arch/powerpc/include/asm/barrier.h | 24 ++++++++----------------
1 file changed, 8 insertions(+), 16 deletions(-)
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index 980ad0c..c0deafc 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -44,19 +44,11 @@
#define dma_rmb() __lwsync()
#define dma_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
-#ifdef CONFIG_SMP
-#define smp_lwsync() __lwsync()
+#define __smp_lwsync() __lwsync()
-#define smp_mb() mb()
-#define smp_rmb() __lwsync()
-#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
-#else
-#define smp_lwsync() barrier()
-
-#define smp_mb() barrier()
-#define smp_rmb() barrier()
-#define smp_wmb() barrier()
-#endif /* CONFIG_SMP */
+#define __smp_mb() mb()
+#define __smp_rmb() __lwsync()
+#define __smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
/*
* This is a barrier which prevents following instructions from being
@@ -67,18 +59,18 @@
#define data_barrier(x) \
asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory");
-#define smp_store_release(p, v) \
+#define __smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
- smp_lwsync(); \
+ __smp_lwsync(); \
WRITE_ONCE(*p, v); \
} while (0)
-#define smp_load_acquire(p) \
+#define __smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = READ_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
- smp_lwsync(); \
+ __smp_lwsync(); \
___p1; \
})
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 16/41] arm64: define __smp_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (13 preceding siblings ...)
2016-01-10 14:18 ` [PATCH v3 15/41] powerpc: define __smp_xxx Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
2016-01-10 14:18 ` [PATCH v3 17/41] arm: " Michael S. Tsirkin
` (25 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Catalin Marinas <ca>
This defines __smp_xxx barriers for arm64,
for use by virtualization.
smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
Note: arm64 does not support !SMP config,
so smp_xxx and __smp_xxx are always equivalent.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/arm64/include/asm/barrier.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 91a43f4..dae5c49 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -35,11 +35,11 @@
#define dma_rmb() dmb(oshld)
#define dma_wmb() dmb(oshst)
-#define smp_mb() dmb(ish)
-#define smp_rmb() dmb(ishld)
-#define smp_wmb() dmb(ishst)
+#define __smp_mb() dmb(ish)
+#define __smp_rmb() dmb(ishld)
+#define __smp_wmb() dmb(ishst)
-#define smp_store_release(p, v) \
+#define __smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
switch (sizeof(*p)) { \
@@ -62,7 +62,7 @@ do { \
} \
} while (0)
-#define smp_load_acquire(p) \
+#define __smp_load_acquire(p) \
({ \
union { typeof(*p) __val; char __c[1]; } __u; \
compiletime_assert_atomic_type(*p); \
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 17/41] arm: define __smp_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (14 preceding siblings ...)
2016-01-10 14:18 ` [PATCH v3 16/41] arm64: " Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 18/41] blackfin: " Michael S. Tsirkin
` (24 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Russell King <rmk+k>
This defines __smp_xxx barriers for arm,
for use by virtualization.
smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
This reduces the amount of arch-specific boiler-plate code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
arch/arm/include/asm/barrier.h | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 31152e8..112cc1a 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -60,15 +60,9 @@ extern void arm_heavy_mb(void);
#define dma_wmb() barrier()
#endif
-#ifndef CONFIG_SMP
-#define smp_mb() barrier()
-#define smp_rmb() barrier()
-#define smp_wmb() barrier()
-#else
-#define smp_mb() dmb(ish)
-#define smp_rmb() smp_mb()
-#define smp_wmb() dmb(ishst)
-#endif
+#define __smp_mb() dmb(ish)
+#define __smp_rmb() __smp_mb()
+#define __smp_wmb() dmb(ishst)
#include <asm-generic/barrier.h>
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 18/41] blackfin: define __smp_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (15 preceding siblings ...)
2016-01-10 14:18 ` [PATCH v3 17/41] arm: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 19/41] ia64: " Michael S. Tsirkin
` (23 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Steven Miao <realmz6>
This defines __smp_xxx barriers for blackfin,
for use by virtualization.
smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/blackfin/include/asm/barrier.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/blackfin/include/asm/barrier.h b/arch/blackfin/include/asm/barrier.h
index dfb66fe..7cca51c 100644
--- a/arch/blackfin/include/asm/barrier.h
+++ b/arch/blackfin/include/asm/barrier.h
@@ -78,8 +78,8 @@
#endif /* !CONFIG_SMP */
-#define smp_mb__before_atomic() barrier()
-#define smp_mb__after_atomic() barrier()
+#define __smp_mb__before_atomic() barrier()
+#define __smp_mb__after_atomic() barrier()
#include <asm-generic/barrier.h>
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 19/41] ia64: define __smp_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (16 preceding siblings ...)
2016-01-10 14:19 ` [PATCH v3 18/41] blackfin: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 20/41] metag: " Michael S. Tsirkin
` (22 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Tony Luck <tony.luck>
This defines __smp_xxx barriers for ia64,
for use by virtualization.
smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
This reduces the amount of arch-specific boiler-plate code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/ia64/include/asm/barrier.h | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index 2f93348..588f161 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -42,28 +42,24 @@
#define dma_rmb() mb()
#define dma_wmb() mb()
-#ifdef CONFIG_SMP
-# define smp_mb() mb()
-#else
-# define smp_mb() barrier()
-#endif
+# define __smp_mb() mb()
-#define smp_mb__before_atomic() barrier()
-#define smp_mb__after_atomic() barrier()
+#define __smp_mb__before_atomic() barrier()
+#define __smp_mb__after_atomic() barrier()
/*
* IA64 GCC turns volatile stores into st.rel and volatile loads into ld.acq no
* need for asm trickery!
*/
-#define smp_store_release(p, v) \
+#define __smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
barrier(); \
WRITE_ONCE(*p, v); \
} while (0)
-#define smp_load_acquire(p) \
+#define __smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = READ_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 20/41] metag: define __smp_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (17 preceding siblings ...)
2016-01-10 14:19 ` [PATCH v3 19/41] ia64: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 21/41] mips: " Michael S. Tsirkin
` (21 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, James Hogan <james.
This defines __smp_xxx barriers for metag,
for use by virtualization.
smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
Note: as __smp_xxx macros should not depend on CONFIG_SMP, they cannot
use the existing fence() macro, since that is defined differently for
SMP and !SMP. For this reason, this patch introduces a wrapper,
metag_fence(), that doesn't depend on CONFIG_SMP.
fence() is then defined in terms of it, depending on CONFIG_SMP.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/metag/include/asm/barrier.h | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index b5b778b..84880c9 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -44,13 +44,6 @@ static inline void wr_fence(void)
#define rmb() barrier()
#define wmb() mb()
-#ifndef CONFIG_SMP
-#define fence() do { } while (0)
-#define smp_mb() barrier()
-#define smp_rmb() barrier()
-#define smp_wmb() barrier()
-#else
-
#ifdef CONFIG_METAG_SMP_WRITE_REORDERING
/*
* Write to the atomic memory unlock system event register (command 0). This is
@@ -60,26 +53,31 @@ static inline void wr_fence(void)
* incoherence). It is therefore ineffective if used after and on the same
* thread as a write.
*/
-static inline void fence(void)
+static inline void metag_fence(void)
{
volatile int *flushptr = (volatile int *) LINSYSEVENT_WR_ATOMIC_UNLOCK;
barrier();
*flushptr = 0;
barrier();
}
-#define smp_mb() fence()
-#define smp_rmb() fence()
-#define smp_wmb() barrier()
+#define __smp_mb() metag_fence()
+#define __smp_rmb() metag_fence()
+#define __smp_wmb() barrier()
#else
-#define fence() do { } while (0)
-#define smp_mb() barrier()
-#define smp_rmb() barrier()
-#define smp_wmb() barrier()
+#define metag_fence() do { } while (0)
+#define __smp_mb() barrier()
+#define __smp_rmb() barrier()
+#define __smp_wmb() barrier()
#endif
+
+#ifdef CONFIG_SMP
+#define fence() metag_fence()
+#else
+#define fence() do { } while (0)
#endif
-#define smp_mb__before_atomic() barrier()
-#define smp_mb__after_atomic() barrier()
+#define __smp_mb__before_atomic() barrier()
+#define __smp_mb__after_atomic() barrier()
#include <asm-generic/barrier.h>
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 21/41] mips: define __smp_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (18 preceding siblings ...)
2016-01-10 14:19 ` [PATCH v3 20/41] metag: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 22/41] s390: " Michael S. Tsirkin
` (20 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Ralf Baechle
This defines __smp_xxx barriers for mips,
for use by virtualization.
smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
Note: the only exception is smp_mb__before_llsc, which is mips-specific.
We define both the __smp_mb__before_llsc variant (for use in
asm/barriers.h) and smp_mb__before_llsc (for use elsewhere on this
architecture).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/mips/include/asm/barrier.h | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 3eac4b9..d296633 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -85,20 +85,20 @@
#define wmb() fast_wmb()
#define rmb() fast_rmb()
-#if defined(CONFIG_WEAK_ORDERING) && defined(CONFIG_SMP)
+#if defined(CONFIG_WEAK_ORDERING)
# ifdef CONFIG_CPU_CAVIUM_OCTEON
-# define smp_mb() __sync()
-# define smp_rmb() barrier()
-# define smp_wmb() __syncw()
+# define __smp_mb() __sync()
+# define __smp_rmb() barrier()
+# define __smp_wmb() __syncw()
# else
-# define smp_mb() __asm__ __volatile__("sync" : : :"memory")
-# define smp_rmb() __asm__ __volatile__("sync" : : :"memory")
-# define smp_wmb() __asm__ __volatile__("sync" : : :"memory")
+# define __smp_mb() __asm__ __volatile__("sync" : : :"memory")
+# define __smp_rmb() __asm__ __volatile__("sync" : : :"memory")
+# define __smp_wmb() __asm__ __volatile__("sync" : : :"memory")
# endif
#else
-#define smp_mb() barrier()
-#define smp_rmb() barrier()
-#define smp_wmb() barrier()
+#define __smp_mb() barrier()
+#define __smp_rmb() barrier()
+#define __smp_wmb() barrier()
#endif
#if defined(CONFIG_WEAK_REORDERING_BEYOND_LLSC) && defined(CONFIG_SMP)
@@ -111,6 +111,7 @@
#ifdef CONFIG_CPU_CAVIUM_OCTEON
#define smp_mb__before_llsc() smp_wmb()
+#define __smp_mb__before_llsc() __smp_wmb()
/* Cause previous writes to become visible on all CPUs as soon as possible */
#define nudge_writes() __asm__ __volatile__(".set push\n\t" \
".set arch=octeon\n\t" \
@@ -118,11 +119,12 @@
".set pop" : : : "memory")
#else
#define smp_mb__before_llsc() smp_llsc_mb()
+#define __smp_mb__before_llsc() smp_llsc_mb()
#define nudge_writes() mb()
#endif
-#define smp_mb__before_atomic() smp_mb__before_llsc()
-#define smp_mb__after_atomic() smp_llsc_mb()
+#define __smp_mb__before_atomic() __smp_mb__before_llsc()
+#define __smp_mb__after_atomic() smp_llsc_mb()
#include <asm-generic/barrier.h>
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 22/41] s390: define __smp_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (19 preceding siblings ...)
2016-01-10 14:19 ` [PATCH v3 21/41] mips: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 23/41] sh: define __smp_xxx, fix smp_store_mb for !SMP Michael S. Tsirkin
` (19 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Martin
This defines __smp_xxx barriers for s390,
for use by virtualization.
Some smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
Note: smp_mb, smp_rmb and smp_wmb are defined as full barriers
unconditionally on this architecture.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
arch/s390/include/asm/barrier.h | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index c358c31..fbd25b2 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -26,18 +26,21 @@
#define wmb() barrier()
#define dma_rmb() mb()
#define dma_wmb() mb()
-#define smp_mb() mb()
-#define smp_rmb() rmb()
-#define smp_wmb() wmb()
-
-#define smp_store_release(p, v) \
+#define __smp_mb() mb()
+#define __smp_rmb() rmb()
+#define __smp_wmb() wmb()
+#define smp_mb() __smp_mb()
+#define smp_rmb() __smp_rmb()
+#define smp_wmb() __smp_wmb()
+
+#define __smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
barrier(); \
WRITE_ONCE(*p, v); \
} while (0)
-#define smp_load_acquire(p) \
+#define __smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = READ_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 23/41] sh: define __smp_xxx, fix smp_store_mb for !SMP
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (20 preceding siblings ...)
2016-01-10 14:19 ` [PATCH v3 22/41] s390: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 24/41] sparc: define __smp_xxx Michael S. Tsirkin
` (18 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Ingo Molnar
The sh variant of smp_store_mb() calls xchg() even on !SMP, which is
stronger than what both the name and the documentation imply.
Define __smp_store_mb() instead: code in asm-generic/barrier.h
will then define smp_store_mb() correctly, depending on
CONFIG_SMP.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/sh/include/asm/barrier.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/sh/include/asm/barrier.h b/arch/sh/include/asm/barrier.h
index bf91037..f887c64 100644
--- a/arch/sh/include/asm/barrier.h
+++ b/arch/sh/include/asm/barrier.h
@@ -32,7 +32,8 @@
#define ctrl_barrier() __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop")
#endif
-#define smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
+#define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
+#define smp_store_mb(var, value) __smp_store_mb(var, value)
#include <asm-generic/barrier.h>
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 24/41] sparc: define __smp_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (21 preceding siblings ...)
2016-01-10 14:19 ` [PATCH v3 23/41] sh: define __smp_xxx, fix smp_store_mb for !SMP Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
[not found] ` <1452426622-4471-1-git-send-email-mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
` (17 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Ingo Molnar
This defines __smp_xxx barriers for sparc,
for use by virtualization.
smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
---
arch/sparc/include/asm/barrier_64.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 26c3f72..c9f6ee6 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -37,14 +37,14 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \
#define rmb() __asm__ __volatile__("":::"memory")
#define wmb() __asm__ __volatile__("":::"memory")
-#define smp_store_release(p, v) \
+#define __smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
barrier(); \
WRITE_ONCE(*p, v); \
} while (0)
-#define smp_load_acquire(p) \
+#define __smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = READ_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
@@ -52,8 +52,8 @@ do { \
___p1; \
})
-#define smp_mb__before_atomic() barrier()
-#define smp_mb__after_atomic() barrier()
+#define __smp_mb__before_atomic() barrier()
+#define __smp_mb__after_atomic() barrier()
#include <asm-generic/barrier.h>
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
[parent not found: <1452426622-4471-1-git-send-email-mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* [PATCH v3 05/41] powerpc: reuse asm-generic/barrier.h
[not found] ` <1452426622-4471-1-git-send-email-mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-01-10 14:17 ` Michael S. Tsirkin
2016-01-12 16:31 ` Paul E. McKenney
2016-01-10 14:20 ` [PATCH v3 25/41] tile: define __smp_xxx Michael S. Tsirkin
1 sibling, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch-u79uwXL29TY76Z2rM5mHXA,
Andrew Cooper, Russell King - ARM Linux,
virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Joe Perches, David Miller, linux-ia64-u79uwXL29TY76Z2rM5mHXA,
linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
linux-s390-u79uwXL29TY76Z2rM5mHXA,
sparclinux-u79uwXL29TY76Z2rM5mHXA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-metag-u79uwXL29TY76Z2rM5mHXA,
linux-mips-6z/3iImG2C8G8FEW9MqTrA, x86-DgEjT+Ai2ygdnm+yROfE0A,
user-mode-linux-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
adi-buildroot-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
linux-sh-u79uwXL29TY76Z2rM5mHXA,
linux-xtensa-PjhNF2WwrV/0Sa2dR60CXw,
xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b, Benjamin
On powerpc, read_barrier_depends(), smp_read_barrier_depends(),
smp_store_mb(), smp_mb__before_atomic() and smp_mb__after_atomic() match the
asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.
This is in preparation for refactoring this code area.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/powerpc/include/asm/barrier.h | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index a7af5fb..980ad0c 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -34,8 +34,6 @@
#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define wmb() __asm__ __volatile__ ("sync" : : : "memory")
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
#ifdef __SUBARCH_HAS_LWSYNC
# define SMPWMB LWSYNC
#else
@@ -60,9 +58,6 @@
#define smp_wmb() barrier()
#endif /* CONFIG_SMP */
-#define read_barrier_depends() do { } while (0)
-#define smp_read_barrier_depends() do { } while (0)
-
/*
* This is a barrier which prevents following instructions from being
* started until the value of the argument x is known. For example, if
@@ -87,8 +82,8 @@ do { \
___p1; \
})
-#define smp_mb__before_atomic() smp_mb()
-#define smp_mb__after_atomic() smp_mb()
#define smp_mb__before_spinlock() smp_mb()
+#include <asm-generic/barrier.h>
+
#endif /* _ASM_POWERPC_BARRIER_H */
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* Re: [PATCH v3 05/41] powerpc: reuse asm-generic/barrier.h
2016-01-10 14:17 ` [PATCH v3 05/41] powerpc: reuse asm-generic/barrier.h Michael S. Tsirkin
@ 2016-01-12 16:31 ` Paul E. McKenney
0 siblings, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-12 16:31 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-mips, linux-ia64, linux-sh, Peter Zijlstra,
Benjamin Herrenschmidt, virtualization, Paul Mackerras,
H. Peter Anvin, sparclinux, Ingo Molnar, linux-arch, linux-s390,
Davidlohr Bueso, Russell King - ARM Linux, Arnd Bergmann,
Michael Ellerman, x86, xen-devel, Ingo Molnar, linux-xtensa,
user-mode-linux-devel, Stefano Stabellini, adi-buildroot-devel,
Thomas Gleixner, linux-metag
On Sun, Jan 10, 2016 at 04:17:09PM +0200, Michael S. Tsirkin wrote:
> On powerpc read_barrier_depends, smp_read_barrier_depends
> smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
> asm-generic variants exactly. Drop the local definitions and pull in
> asm-generic/barrier.h instead.
>
> This is in preparation to refactoring this code area.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
Looks sane to me.
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
> arch/powerpc/include/asm/barrier.h | 9 ++-------
> 1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> index a7af5fb..980ad0c 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -34,8 +34,6 @@
> #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
>
> -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> -
> #ifdef __SUBARCH_HAS_LWSYNC
> # define SMPWMB LWSYNC
> #else
> @@ -60,9 +58,6 @@
> #define smp_wmb() barrier()
> #endif /* CONFIG_SMP */
>
> -#define read_barrier_depends() do { } while (0)
> -#define smp_read_barrier_depends() do { } while (0)
> -
> /*
> * This is a barrier which prevents following instructions from being
> * started until the value of the argument x is known. For example, if
> @@ -87,8 +82,8 @@ do { \
> ___p1; \
> })
>
> -#define smp_mb__before_atomic() smp_mb()
> -#define smp_mb__after_atomic() smp_mb()
> #define smp_mb__before_spinlock() smp_mb()
>
> +#include <asm-generic/barrier.h>
> +
> #endif /* _ASM_POWERPC_BARRIER_H */
> --
> MST
>
^ permalink raw reply [flat|nested] 153+ messages in thread
* [PATCH v3 25/41] tile: define __smp_xxx
[not found] ` <1452426622-4471-1-git-send-email-mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-01-10 14:17 ` [PATCH v3 05/41] powerpc: reuse asm-generic/barrier.h Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
1 sibling, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch-u79uwXL29TY76Z2rM5mHXA,
Andrew Cooper, Russell King - ARM Linux,
virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Joe Perches, David Miller, linux-ia64-u79uwXL29TY76Z2rM5mHXA,
linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
linux-s390-u79uwXL29TY76Z2rM5mHXA,
sparclinux-u79uwXL29TY76Z2rM5mHXA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-metag-u79uwXL29TY76Z2rM5mHXA,
linux-mips-6z/3iImG2C8G8FEW9MqTrA, x86-DgEjT+Ai2ygdnm+yROfE0A,
user-mode-linux-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
adi-buildroot-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
linux-sh-u79uwXL29TY76Z2rM5mHXA,
linux-xtensa-PjhNF2WwrV/0Sa2dR60CXw,
xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b,
Chris Metcalf <cmetc>
This defines __smp_xxx barriers for tile,
for use by virtualization.
Some smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
Note: for 32-bit, keep smp_mb__after_atomic around, since it's faster
than the generic implementation.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/tile/include/asm/barrier.h | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/tile/include/asm/barrier.h b/arch/tile/include/asm/barrier.h
index 96a42ae..d552228 100644
--- a/arch/tile/include/asm/barrier.h
+++ b/arch/tile/include/asm/barrier.h
@@ -79,11 +79,12 @@ mb_incoherent(void)
* But after the word is updated, the routine issues an "mf" before returning,
* and since it's a function call, we don't even need a compiler barrier.
*/
-#define smp_mb__before_atomic() smp_mb()
-#define smp_mb__after_atomic() do { } while (0)
+#define __smp_mb__before_atomic() __smp_mb()
+#define __smp_mb__after_atomic() do { } while (0)
+#define smp_mb__after_atomic() __smp_mb__after_atomic()
#else /* 64 bit */
-#define smp_mb__before_atomic() smp_mb()
-#define smp_mb__after_atomic() smp_mb()
+#define __smp_mb__before_atomic() __smp_mb()
+#define __smp_mb__after_atomic() __smp_mb()
#endif
#include <asm-generic/barrier.h>
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 26/41] xtensa: define __smp_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (23 preceding siblings ...)
[not found] ` <1452426622-4471-1-git-send-email-mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-01-10 14:20 ` Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 27/41] x86: " Michael S. Tsirkin
` (15 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
This defines __smp_xxx barriers for xtensa,
for use by virtualization.
smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/xtensa/include/asm/barrier.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/xtensa/include/asm/barrier.h b/arch/xtensa/include/asm/barrier.h
index 5b88774..956596e 100644
--- a/arch/xtensa/include/asm/barrier.h
+++ b/arch/xtensa/include/asm/barrier.h
@@ -13,8 +13,8 @@
#define rmb() barrier()
#define wmb() mb()
-#define smp_mb__before_atomic() barrier()
-#define smp_mb__after_atomic() barrier()
+#define __smp_mb__before_atomic() barrier()
+#define __smp_mb__after_atomic() barrier()
#include <asm-generic/barrier.h>
--
MST
* [PATCH v3 27/41] x86: define __smp_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (24 preceding siblings ...)
2016-01-10 14:20 ` [PATCH v3 26/41] xtensa: " Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
2016-01-12 14:11 ` Thomas Gleixner
2016-01-10 14:20 ` [PATCH v3 28/41] asm-generic: implement virt_xxx memory barriers Michael S. Tsirkin
` (14 subsequent siblings)
40 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Ingo Molnar
This defines __smp_xxx barriers for x86,
for use by virtualization.
smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
arch/x86/include/asm/barrier.h | 31 ++++++++++++-------------------
1 file changed, 12 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index cc4c2a7..a584e1c 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -31,17 +31,10 @@
#endif
#define dma_wmb() barrier()
-#ifdef CONFIG_SMP
-#define smp_mb() mb()
-#define smp_rmb() dma_rmb()
-#define smp_wmb() barrier()
-#define smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
-#else /* !SMP */
-#define smp_mb() barrier()
-#define smp_rmb() barrier()
-#define smp_wmb() barrier()
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0)
-#endif /* SMP */
+#define __smp_mb() mb()
+#define __smp_rmb() dma_rmb()
+#define __smp_wmb() barrier()
+#define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
#if defined(CONFIG_X86_PPRO_FENCE)
@@ -50,31 +43,31 @@
* model and we should fall back to full barriers.
*/
-#define smp_store_release(p, v) \
+#define __smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
- smp_mb(); \
+ __smp_mb(); \
WRITE_ONCE(*p, v); \
} while (0)
-#define smp_load_acquire(p) \
+#define __smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = READ_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
- smp_mb(); \
+ __smp_mb(); \
___p1; \
})
#else /* regular x86 TSO memory ordering */
-#define smp_store_release(p, v) \
+#define __smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
barrier(); \
WRITE_ONCE(*p, v); \
} while (0)
-#define smp_load_acquire(p) \
+#define __smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = READ_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
@@ -85,8 +78,8 @@ do { \
#endif
/* Atomic operations are already serializing on x86 */
-#define smp_mb__before_atomic() barrier()
-#define smp_mb__after_atomic() barrier()
+#define __smp_mb__before_atomic() barrier()
+#define __smp_mb__after_atomic() barrier()
#include <asm-generic/barrier.h>
--
MST
* Re: [PATCH v3 27/41] x86: define __smp_xxx
2016-01-10 14:20 ` [PATCH v3 27/41] x86: " Michael S. Tsirkin
@ 2016-01-12 14:11 ` Thomas Gleixner
0 siblings, 0 replies; 153+ messages in thread
From: Thomas Gleixner @ 2016-01-12 14:11 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, Peter Zijlstra, Arnd Bergmann, linux-arch,
Andrew Cooper, Russell King - ARM Linux, virtualization,
Stefano Stabellini, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Ingo Molnar
On Sun, 10 Jan 2016, Michael S. Tsirkin wrote:
> This defines __smp_xxx barriers for x86,
> for use by virtualization.
>
> smp_xxx barriers are removed as they are
> defined correctly by asm-generic/barrier.h
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* [PATCH v3 28/41] asm-generic: implement virt_xxx memory barriers
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (25 preceding siblings ...)
2016-01-10 14:20 ` [PATCH v3 27/41] x86: " Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 29/41] Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb" Michael S. Tsirkin
` (13 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Jonathan Corbet <cor>
Guests running within virtual machines might be affected by SMP effects even if
the guest itself is compiled without SMP support. This is an artifact of
interfacing with an SMP host while running an UP kernel. Using mandatory
barriers for this use-case would be possible but is often suboptimal.
In particular, virtio uses a bunch of confusing ifdefs to work around
this, while xen just uses the mandatory barriers.
To better handle this case, low-level virt_mb() etc macros are made available.
These are implemented trivially using the low-level __smp_xxx macros;
the purpose of these wrappers is to annotate those specific cases.
These have the same effect as smp_mb() etc when SMP is enabled, but generate
identical code for SMP and non-SMP systems. For example, virtual machine guests
should use virt_mb() rather than smp_mb() when synchronizing against a
(possibly SMP) host.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
include/asm-generic/barrier.h | 11 +++++++++++
Documentation/memory-barriers.txt | 28 +++++++++++++++++++++++-----
2 files changed, 34 insertions(+), 5 deletions(-)
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 8752964..1cceca14 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -196,5 +196,16 @@ do { \
#endif
+/* Barriers for virtual machine guests when talking to an SMP host */
+#define virt_mb() __smp_mb()
+#define virt_rmb() __smp_rmb()
+#define virt_wmb() __smp_wmb()
+#define virt_read_barrier_depends() __smp_read_barrier_depends()
+#define virt_store_mb(var, value) __smp_store_mb(var, value)
+#define virt_mb__before_atomic() __smp_mb__before_atomic()
+#define virt_mb__after_atomic() __smp_mb__after_atomic()
+#define virt_store_release(p, v) __smp_store_release(p, v)
+#define virt_load_acquire(p) __smp_load_acquire(p)
+
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index aef9487..8f4a93a 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1655,17 +1655,18 @@ macro is a good place to start looking.
SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
systems because it is assumed that a CPU will appear to be self-consistent,
and will order overlapping accesses correctly with respect to itself.
+However, see the subsection on "Virtual Machine Guests" below.
[!] Note that SMP memory barriers _must_ be used to control the ordering of
references to shared memory on SMP systems, though the use of locking instead
is sufficient.
Mandatory barriers should not be used to control SMP effects, since mandatory
-barriers unnecessarily impose overhead on UP systems. They may, however, be
-used to control MMIO effects on accesses through relaxed memory I/O windows.
-These are required even on non-SMP systems as they affect the order in which
-memory operations appear to a device by prohibiting both the compiler and the
-CPU from reordering them.
+barriers impose unnecessary overhead on both SMP and UP systems. They may,
+however, be used to control MMIO effects on accesses through relaxed memory I/O
+windows. These barriers are required even on non-SMP systems as they affect
+the order in which memory operations appear to a device by prohibiting both the
+compiler and the CPU from reordering them.
There are some more advanced barrier functions:
@@ -2948,6 +2949,23 @@ The Alpha defines the Linux kernel's memory barrier model.
See the subsection on "Cache Coherency" above.
+VIRTUAL MACHINE GUESTS
+----------------------
+
+Guests running within virtual machines might be affected by SMP effects even if
+the guest itself is compiled without SMP support. This is an artifact of
+interfacing with an SMP host while running an UP kernel. Using mandatory
+barriers for this use-case would be possible but is often suboptimal.
+
+To handle this case optimally, low-level virt_mb() etc macros are available.
+These have the same effect as smp_mb() etc when SMP is enabled, but generate
+identical code for SMP and non-SMP systems. For example, virtual machine guests
+should use virt_mb() rather than smp_mb() when synchronizing against a
+(possibly SMP) host.
+
+These are equivalent to smp_mb() etc counterparts in all other respects,
+in particular, they do not control MMIO effects: to control
+MMIO effects, use mandatory barriers.
======
EXAMPLE USES
--
MST
* [PATCH v3 29/41] Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb"
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (26 preceding siblings ...)
2016-01-10 14:20 ` [PATCH v3 28/41] asm-generic: implement virt_xxx memory barriers Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 30/41] virtio_ring: update weak barriers to use virt_xxx Michael S. Tsirkin
` (12 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
This reverts commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9.
While that commit optimizes !CONFIG_SMP, it mixes
up DMA and SMP concepts, making the code hard
to figure out.
A better way to optimize this is with the new __smp_XXX
barriers.
As a first step, go back to full rmb/wmb barriers
for !SMP.
We switch to __smp_XXX barriers in the next patch.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
include/linux/virtio_ring.h | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 8e50888..67e06fe 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -21,20 +21,19 @@
* actually quite cheap.
*/
+#ifdef CONFIG_SMP
static inline void virtio_mb(bool weak_barriers)
{
-#ifdef CONFIG_SMP
if (weak_barriers)
smp_mb();
else
-#endif
mb();
}
static inline void virtio_rmb(bool weak_barriers)
{
if (weak_barriers)
- dma_rmb();
+ smp_rmb();
else
rmb();
}
@@ -42,10 +41,26 @@ static inline void virtio_rmb(bool weak_barriers)
static inline void virtio_wmb(bool weak_barriers)
{
if (weak_barriers)
- dma_wmb();
+ smp_wmb();
else
wmb();
}
+#else
+static inline void virtio_mb(bool weak_barriers)
+{
+ mb();
+}
+
+static inline void virtio_rmb(bool weak_barriers)
+{
+ rmb();
+}
+
+static inline void virtio_wmb(bool weak_barriers)
+{
+ wmb();
+}
+#endif
struct virtio_device;
struct virtqueue;
--
MST
* [PATCH v3 30/41] virtio_ring: update weak barriers to use virt_xxx
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (27 preceding siblings ...)
2016-01-10 14:20 ` [PATCH v3 29/41] Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb" Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 31/41] sh: support 1 and 2 byte xchg Michael S. Tsirkin
` (11 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
virtio ring uses smp_wmb on SMP and wmb on !SMP,
the reason for the latter being that it might be
talking to another kernel on the same SMP machine.
This is exactly what virt_xxx barriers do,
so switch to these instead of homegrown ifdef hacks.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
include/linux/virtio_ring.h | 25 ++++---------------------
1 file changed, 4 insertions(+), 21 deletions(-)
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 67e06fe..f3fa55b 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -12,7 +12,7 @@
* anyone care?
*
* For virtio_pci on SMP, we don't need to order with respect to MMIO
- * accesses through relaxed memory I/O windows, so smp_mb() et al are
+ * accesses through relaxed memory I/O windows, so virt_mb() et al are
* sufficient.
*
* For using virtio to talk to real devices (eg. other heterogeneous
@@ -21,11 +21,10 @@
* actually quite cheap.
*/
-#ifdef CONFIG_SMP
static inline void virtio_mb(bool weak_barriers)
{
if (weak_barriers)
- smp_mb();
+ virt_mb();
else
mb();
}
@@ -33,7 +32,7 @@ static inline void virtio_mb(bool weak_barriers)
static inline void virtio_rmb(bool weak_barriers)
{
if (weak_barriers)
- smp_rmb();
+ virt_rmb();
else
rmb();
}
@@ -41,26 +40,10 @@ static inline void virtio_rmb(bool weak_barriers)
static inline void virtio_wmb(bool weak_barriers)
{
if (weak_barriers)
- smp_wmb();
+ virt_wmb();
else
wmb();
}
-#else
-static inline void virtio_mb(bool weak_barriers)
-{
- mb();
-}
-
-static inline void virtio_rmb(bool weak_barriers)
-{
- rmb();
-}
-
-static inline void virtio_wmb(bool weak_barriers)
-{
- wmb();
-}
-#endif
struct virtio_device;
struct virtqueue;
--
MST
* [PATCH v3 31/41] sh: support 1 and 2 byte xchg
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (28 preceding siblings ...)
2016-01-10 14:20 ` [PATCH v3 30/41] virtio_ring: update weak barriers to use virt_xxx Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 32/41] sh: move xchg_cmpxchg to a header by itself Michael S. Tsirkin
` (10 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
This completes the xchg implementation for the sh architecture. Note: the
llsc variant is tricky since it only supports 4-byte atomics; the existing
implementation of 1-byte xchg is wrong: we need to do a 4-byte cmpxchg
and retry if any other bytes changed meanwhile.
Write this in C for clarity.
Suggested-by: Rich Felker <dalias@libc.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/sh/include/asm/cmpxchg-grb.h | 22 +++++++++++++++
arch/sh/include/asm/cmpxchg-irq.h | 11 ++++++++
arch/sh/include/asm/cmpxchg-llsc.h | 58 +++++++++++++++++++++++---------------
arch/sh/include/asm/cmpxchg.h | 3 ++
4 files changed, 72 insertions(+), 22 deletions(-)
diff --git a/arch/sh/include/asm/cmpxchg-grb.h b/arch/sh/include/asm/cmpxchg-grb.h
index f848dec..2ed557b 100644
--- a/arch/sh/include/asm/cmpxchg-grb.h
+++ b/arch/sh/include/asm/cmpxchg-grb.h
@@ -23,6 +23,28 @@ static inline unsigned long xchg_u32(volatile u32 *m, unsigned long val)
return retval;
}
+static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
+{
+ unsigned long retval;
+
+ __asm__ __volatile__ (
+ " .align 2 \n\t"
+ " mova 1f, r0 \n\t" /* r0 = end point */
+ " mov r15, r1 \n\t" /* r1 = saved sp */
+ " mov #-6, r15 \n\t" /* LOGIN */
+ " mov.w @%1, %0 \n\t" /* load old value */
+ " extu.w %0, %0 \n\t" /* extend as unsigned */
+ " mov.w %2, @%1 \n\t" /* store new value */
+ "1: mov r1, r15 \n\t" /* LOGOUT */
+ : "=&r" (retval),
+ "+r" (m),
+ "+r" (val) /* inhibit r15 overloading */
+ :
+ : "memory" , "r0", "r1");
+
+ return retval;
+}
+
static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
{
unsigned long retval;
diff --git a/arch/sh/include/asm/cmpxchg-irq.h b/arch/sh/include/asm/cmpxchg-irq.h
index bd11f63..f888772 100644
--- a/arch/sh/include/asm/cmpxchg-irq.h
+++ b/arch/sh/include/asm/cmpxchg-irq.h
@@ -14,6 +14,17 @@ static inline unsigned long xchg_u32(volatile u32 *m, unsigned long val)
return retval;
}
+static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
+{
+ unsigned long flags, retval;
+
+ local_irq_save(flags);
+ retval = *m;
+ *m = val;
+ local_irq_restore(flags);
+ return retval;
+}
+
static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
{
unsigned long flags, retval;
diff --git a/arch/sh/include/asm/cmpxchg-llsc.h b/arch/sh/include/asm/cmpxchg-llsc.h
index 4713666..e754794 100644
--- a/arch/sh/include/asm/cmpxchg-llsc.h
+++ b/arch/sh/include/asm/cmpxchg-llsc.h
@@ -1,6 +1,9 @@
#ifndef __ASM_SH_CMPXCHG_LLSC_H
#define __ASM_SH_CMPXCHG_LLSC_H
+#include <linux/bitops.h>
+#include <asm/byteorder.h>
+
static inline unsigned long xchg_u32(volatile u32 *m, unsigned long val)
{
unsigned long retval;
@@ -22,29 +25,8 @@ static inline unsigned long xchg_u32(volatile u32 *m, unsigned long val)
return retval;
}
-static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
-{
- unsigned long retval;
- unsigned long tmp;
-
- __asm__ __volatile__ (
- "1: \n\t"
- "movli.l @%2, %0 ! xchg_u8 \n\t"
- "mov %0, %1 \n\t"
- "mov %3, %0 \n\t"
- "movco.l %0, @%2 \n\t"
- "bf 1b \n\t"
- "synco \n\t"
- : "=&z"(tmp), "=&r" (retval)
- : "r" (m), "r" (val & 0xff)
- : "t", "memory"
- );
-
- return retval;
-}
-
static inline unsigned long
-__cmpxchg_u32(volatile int *m, unsigned long old, unsigned long new)
+__cmpxchg_u32(volatile u32 *m, unsigned long old, unsigned long new)
{
unsigned long retval;
unsigned long tmp;
@@ -68,4 +50,36 @@ __cmpxchg_u32(volatile int *m, unsigned long old, unsigned long new)
return retval;
}
+static inline u32 __xchg_cmpxchg(volatile void *ptr, u32 x, int size)
+{
+ int off = (unsigned long)ptr % sizeof(u32);
+ volatile u32 *p = ptr - off;
+#ifdef __BIG_ENDIAN
+ int bitoff = (sizeof(u32) - 1 - off) * BITS_PER_BYTE;
+#else
+ int bitoff = off * BITS_PER_BYTE;
+#endif
+ u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;
+ u32 oldv, newv;
+ u32 ret;
+
+ do {
+ oldv = READ_ONCE(*p);
+ ret = (oldv & bitmask) >> bitoff;
+ newv = (oldv & ~bitmask) | (x << bitoff);
+ } while (__cmpxchg_u32(p, oldv, newv) != oldv);
+
+ return ret;
+}
+
+static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
+{
+ return __xchg_cmpxchg(m, val, sizeof *m);
+}
+
+static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
+{
+ return __xchg_cmpxchg(m, val, sizeof *m);
+}
+
#endif /* __ASM_SH_CMPXCHG_LLSC_H */
diff --git a/arch/sh/include/asm/cmpxchg.h b/arch/sh/include/asm/cmpxchg.h
index 85c97b18..5225916 100644
--- a/arch/sh/include/asm/cmpxchg.h
+++ b/arch/sh/include/asm/cmpxchg.h
@@ -27,6 +27,9 @@ extern void __xchg_called_with_bad_pointer(void);
case 4: \
__xchg__res = xchg_u32(__xchg_ptr, x); \
break; \
+ case 2: \
+ __xchg__res = xchg_u16(__xchg_ptr, x); \
+ break; \
case 1: \
__xchg__res = xchg_u8(__xchg_ptr, x); \
break; \
--
MST
* [PATCH v3 32/41] sh: move xchg_cmpxchg to a header by itself
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (29 preceding siblings ...)
2016-01-10 14:20 ` [PATCH v3 31/41] sh: support 1 and 2 byte xchg Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 33/41] virtio_ring: use virt_store_mb Michael S. Tsirkin
` (9 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
Looks like future sh variants will support a 4-byte cas which will be
used to implement 1 and 2 byte xchg.
This is exactly what we do for llsc now; move the portable part of the
code into a separate header so it's easy to reuse.
Suggested-by: Rich Felker <dalias@libc.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/sh/include/asm/cmpxchg-llsc.h | 35 +-------------------------
arch/sh/include/asm/cmpxchg-xchg.h | 51 ++++++++++++++++++++++++++++++++++++++
2 files changed, 52 insertions(+), 34 deletions(-)
create mode 100644 arch/sh/include/asm/cmpxchg-xchg.h
diff --git a/arch/sh/include/asm/cmpxchg-llsc.h b/arch/sh/include/asm/cmpxchg-llsc.h
index e754794..fcfd322 100644
--- a/arch/sh/include/asm/cmpxchg-llsc.h
+++ b/arch/sh/include/asm/cmpxchg-llsc.h
@@ -1,9 +1,6 @@
#ifndef __ASM_SH_CMPXCHG_LLSC_H
#define __ASM_SH_CMPXCHG_LLSC_H
-#include <linux/bitops.h>
-#include <asm/byteorder.h>
-
static inline unsigned long xchg_u32(volatile u32 *m, unsigned long val)
{
unsigned long retval;
@@ -50,36 +47,6 @@ __cmpxchg_u32(volatile u32 *m, unsigned long old, unsigned long new)
return retval;
}
-static inline u32 __xchg_cmpxchg(volatile void *ptr, u32 x, int size)
-{
- int off = (unsigned long)ptr % sizeof(u32);
- volatile u32 *p = ptr - off;
-#ifdef __BIG_ENDIAN
- int bitoff = (sizeof(u32) - 1 - off) * BITS_PER_BYTE;
-#else
- int bitoff = off * BITS_PER_BYTE;
-#endif
- u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;
- u32 oldv, newv;
- u32 ret;
-
- do {
- oldv = READ_ONCE(*p);
- ret = (oldv & bitmask) >> bitoff;
- newv = (oldv & ~bitmask) | (x << bitoff);
- } while (__cmpxchg_u32(p, oldv, newv) != oldv);
-
- return ret;
-}
-
-static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
-{
- return __xchg_cmpxchg(m, val, sizeof *m);
-}
-
-static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
-{
- return __xchg_cmpxchg(m, val, sizeof *m);
-}
+#include <asm/cmpxchg-xchg.h>
#endif /* __ASM_SH_CMPXCHG_LLSC_H */
diff --git a/arch/sh/include/asm/cmpxchg-xchg.h b/arch/sh/include/asm/cmpxchg-xchg.h
new file mode 100644
index 0000000..7219719
--- /dev/null
+++ b/arch/sh/include/asm/cmpxchg-xchg.h
@@ -0,0 +1,51 @@
+#ifndef __ASM_SH_CMPXCHG_XCHG_H
+#define __ASM_SH_CMPXCHG_XCHG_H
+
+/*
+ * Copyright (C) 2016 Red Hat, Inc.
+ * Author: Michael S. Tsirkin <mst@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * file "COPYING" in the main directory of this archive for more details.
+ */
+#include <linux/bitops.h>
+#include <asm/byteorder.h>
+
+/*
+ * Portable implementations of 1 and 2 byte xchg using a 4 byte cmpxchg.
+ * Note: this header isn't self-contained: before including it, __cmpxchg_u32
+ * must be defined first.
+ */
+static inline u32 __xchg_cmpxchg(volatile void *ptr, u32 x, int size)
+{
+ int off = (unsigned long)ptr % sizeof(u32);
+ volatile u32 *p = ptr - off;
+#ifdef __BIG_ENDIAN
+ int bitoff = (sizeof(u32) - 1 - off) * BITS_PER_BYTE;
+#else
+ int bitoff = off * BITS_PER_BYTE;
+#endif
+ u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;
+ u32 oldv, newv;
+ u32 ret;
+
+ do {
+ oldv = READ_ONCE(*p);
+ ret = (oldv & bitmask) >> bitoff;
+ newv = (oldv & ~bitmask) | (x << bitoff);
+ } while (__cmpxchg_u32(p, oldv, newv) != oldv);
+
+ return ret;
+}
+
+static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
+{
+ return __xchg_cmpxchg(m, val, sizeof *m);
+}
+
+static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
+{
+ return __xchg_cmpxchg(m, val, sizeof *m);
+}
+
+#endif /* __ASM_SH_CMPXCHG_XCHG_H */
--
MST
* [PATCH v3 33/41] virtio_ring: use virt_store_mb
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (30 preceding siblings ...)
2016-01-10 14:20 ` [PATCH v3 32/41] sh: move xchg_cmpxchg to a header by itself Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 34/41] checkpatch.pl: add missing memory barriers Michael S. Tsirkin
` (8 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
We need a full barrier after writing out the event index; using
virt_store_mb there seems better than open-coding. As usual, we need a
wrapper to account for strong barriers.
It's tempting to use this in vhost as well, for that, we'll
need a variant of smp_store_mb that works on __user pointers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
include/linux/virtio_ring.h | 11 +++++++++++
drivers/virtio/virtio_ring.c | 15 +++++++++------
2 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index f3fa55b..a156e2b 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -45,6 +45,17 @@ static inline void virtio_wmb(bool weak_barriers)
wmb();
}
+static inline void virtio_store_mb(bool weak_barriers,
+ __virtio16 *p, __virtio16 v)
+{
+ if (weak_barriers) {
+ virt_store_mb(*p, v);
+ } else {
+ WRITE_ONCE(*p, v);
+ mb();
+ }
+}
+
struct virtio_device;
struct virtqueue;
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ee663c4..e12e385 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -517,10 +517,10 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
/* If we expect an interrupt for the next entry, tell host
* by writing event index and flush out the write before
* the read in the next get_buf call. */
- if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
- vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx);
- virtio_mb(vq->weak_barriers);
- }
+ if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
+ virtio_store_mb(vq->weak_barriers,
+ &vring_used_event(&vq->vring),
+ cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
#ifdef DEBUG
vq->last_add_time_valid = false;
@@ -653,8 +653,11 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
}
/* TODO: tune this threshold */
bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
- vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs);
- virtio_mb(vq->weak_barriers);
+
+ virtio_store_mb(vq->weak_barriers,
+ &vring_used_event(&vq->vring),
+ cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
+
if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
END_USE(vq);
return false;
--
MST
* [PATCH v3 34/41] checkpatch.pl: add missing memory barriers
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (31 preceding siblings ...)
2016-01-10 14:21 ` [PATCH v3 33/41] virtio_ring: use virt_store_mb Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 35/41] checkpatch: check for __smp outside barrier.h Michael S. Tsirkin
` (7 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
SMP-only barriers were missing in checkpatch.pl
Refactor code slightly to make adding more variants easier.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
scripts/checkpatch.pl | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 2b3c228..97b8b62 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5116,7 +5116,25 @@ sub process {
}
}
# check for memory barriers without a comment.
- if ($line =~ /\b(mb|rmb|wmb|read_barrier_depends|smp_mb|smp_rmb|smp_wmb|smp_read_barrier_depends)\(/) {
+
+ my $barriers = qr{
+ mb|
+ rmb|
+ wmb|
+ read_barrier_depends
+ }x;
+ my $smp_barriers = qr{
+ store_release|
+ load_acquire|
+ store_mb|
+ ($barriers)
+ }x;
+ my $all_barriers = qr{
+ $barriers|
+ smp_($smp_barriers)
+ }x;
+
+ if ($line =~ /\b($all_barriers)\s*\(/) {
if (!ctx_has_comment($first_line, $linenr)) {
WARN("MEMORY_BARRIER",
"memory barrier without comment\n" . $herecurr);
--
MST
* [PATCH v3 35/41] checkpatch: check for __smp outside barrier.h
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (32 preceding siblings ...)
2016-01-10 14:21 ` [PATCH v3 34/41] checkpatch.pl: add missing memory barriers Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 36/41] checkpatch: add virt barriers Michael S. Tsirkin
` (6 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
Introduction of __smp barriers cleans up a bunch of duplicate code, but
it gives people an additional handle onto a "new" set of barriers - just
because they're prefixed with __* unfortunately doesn't stop anyone from
using them (as has happened with other arch code before.)
Add a checkpatch test so it will trigger a warning.
Reported-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
scripts/checkpatch.pl | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 97b8b62..a96adcb 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5141,6 +5141,16 @@ sub process {
}
}
+ my $underscore_smp_barriers = qr{__smp_($smp_barriers)}x;
+
+ if ($realfile !~ m@^include/asm-generic/@ &&
+ $realfile !~ m@/barrier\.h$@ &&
+ $line =~ m/\b($underscore_smp_barriers)\s*\(/ &&
+ $line !~ m/^.\s*\#\s*define\s+($underscore_smp_barriers)\s*\(/) {
+ WARN("MEMORY_BARRIER",
+ "__smp memory barriers shouldn't be used outside barrier.h and asm-generic\n" . $herecurr);
+ }
+
# check for waitqueue_active without a comment.
if ($line =~ /\bwaitqueue_active\s*\(/) {
if (!ctx_has_comment($first_line, $linenr)) {
--
MST
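The new check combines a path filter with the line match: it fires only outside barrier.h and asm-generic, and it exempts the line that #defines the barrier itself. A Python sketch of that logic (an illustrative translation of the Perl, with hypothetical file paths):

```python
import re

# $smp_barriers as it reads after patch 34 ($barriers folded in).
smp_barriers = r"store_release|load_acquire|store_mb|mb|rmb|wmb|read_barrier_depends"

# Mirror of $underscore_smp_barriers and the two tests applied to $line.
# checkpatch lines carry a one-character diff marker ('+', '-', ' '),
# which is what the leading '.' in the define pattern skips over.
use_re = re.compile(rf"\b(__smp_({smp_barriers}))\s*\(")
define_re = re.compile(rf"^.\s*\#\s*define\s+(__smp_({smp_barriers}))\s*\(")

def warns(realfile, line):
    """Return True if checkpatch would warn about this line."""
    return (not realfile.startswith("include/asm-generic/")
            and not realfile.endswith("/barrier.h")
            and use_re.search(line) is not None
            and define_re.match(line) is None)

print(warns("drivers/foo.c", "+\t__smp_mb();"))                    # warns
print(warns("arch/x86/include/asm/barrier.h", "+\t__smp_mb();"))   # exempt path
print(warns("include/asm-generic/barrier.h", "+\t__smp_mb();"))    # exempt path
```

The define exemption matters because a header outside barrier.h could still legitimately appear in a diff hunk that defines a wrapper; only bare uses are flagged.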
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 36/41] checkpatch: add virt barriers
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (33 preceding siblings ...)
2016-01-10 14:21 ` [PATCH v3 35/41] checkpatch: check for __smp outside barrier.h Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 37/41] xenbus: use virt_xxx barriers Michael S. Tsirkin
` (5 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
Add virt_ barriers to the list of barriers checked for
the presence of a comment.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
scripts/checkpatch.pl | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index a96adcb..5ca272b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5131,7 +5131,8 @@ sub process {
}x;
my $all_barriers = qr{
$barriers|
- smp_($smp_barriers)
+ smp_($smp_barriers)|
+ virt_($smp_barriers)
}x;
if ($line =~ /\b($all_barriers)\s*\(/) {
--
MST
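With this one-line change, the virt_ variants are caught by the same comment check as the smp_ ones. A quick Python rendering of the extended $all_barriers (illustrative only, sample lines made up):

```python
import re

# $smp_barriers with $barriers folded in, as in the previous patches.
smp_barriers = r"store_release|load_acquire|store_mb|mb|rmb|wmb|read_barrier_depends"

# $all_barriers after this patch: the virt_ prefix joins smp_.
all_barriers = rf"mb|rmb|wmb|read_barrier_depends|smp_({smp_barriers})|virt_({smp_barriers})"
pattern = re.compile(rf"\b({all_barriers})\s*\(")

print(bool(pattern.search("virt_mb();")))            # now matched
print(bool(pattern.search("virt_store_mb(x, 1);")))  # now matched
print(bool(pattern.search("foo();")))                # still not matched
```

This is what lets checkpatch demand a comment on the virt_mb()/virt_wmb()/virt_rmb() calls introduced by the xenbus, xen/io and xen/events patches later in the series.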
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 37/41] xenbus: use virt_xxx barriers
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (34 preceding siblings ...)
2016-01-10 14:21 ` [PATCH v3 36/41] checkpatch: add virt barriers Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 38/41] xen/io: " Michael S. Tsirkin
` (4 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, David Vrabel <david>
drivers/xen/xenbus/xenbus_comms.c uses
full memory barriers to communicate with the other side.
For guests compiled with CONFIG_SMP, smp_wmb and smp_mb
would be sufficient, so mb() and wmb() here are only needed if
a non-SMP guest runs on an SMP host.
Switch to virt_xxx barriers which serve this exact purpose.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
---
drivers/xen/xenbus/xenbus_comms.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c
index fdb0f33..ecdecce 100644
--- a/drivers/xen/xenbus/xenbus_comms.c
+++ b/drivers/xen/xenbus/xenbus_comms.c
@@ -123,14 +123,14 @@ int xb_write(const void *data, unsigned len)
avail = len;
/* Must write data /after/ reading the consumer index. */
- mb();
+ virt_mb();
memcpy(dst, data, avail);
data += avail;
len -= avail;
/* Other side must not see new producer until data is there. */
- wmb();
+ virt_wmb();
intf->req_prod += avail;
/* Implies mb(): other side will see the updated producer. */
@@ -180,14 +180,14 @@ int xb_read(void *data, unsigned len)
avail = len;
/* Must read data /after/ reading the producer index. */
- rmb();
+ virt_rmb();
memcpy(data, src, avail);
data += avail;
len -= avail;
/* Other side must not see free space until we've copied out */
- mb();
+ virt_mb();
intf->rsp_cons += avail;
pr_debug("Finished read of %i bytes (%i to go)\n", avail, len);
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 38/41] xen/io: use virt_xxx barriers
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (35 preceding siblings ...)
2016-01-10 14:21 ` [PATCH v3 37/41] xenbus: use virt_xxx barriers Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 39/41] xen/events: " Michael S. Tsirkin
` (3 subsequent siblings)
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, David Vrabel <david>
include/xen/interface/io/ring.h uses
full memory barriers to communicate with the other side.
For guests compiled with CONFIG_SMP, smp_wmb and smp_mb
would be sufficient, so mb() and wmb() here are only needed if
a non-SMP guest runs on an SMP host.
Switch to virt_xxx barriers which serve this exact purpose.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
---
include/xen/interface/io/ring.h | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h
index 7dc685b..21f4fbd 100644
--- a/include/xen/interface/io/ring.h
+++ b/include/xen/interface/io/ring.h
@@ -208,12 +208,12 @@ struct __name##_back_ring { \
#define RING_PUSH_REQUESTS(_r) do { \
- wmb(); /* back sees requests /before/ updated producer index */ \
+ virt_wmb(); /* back sees requests /before/ updated producer index */ \
(_r)->sring->req_prod = (_r)->req_prod_pvt; \
} while (0)
#define RING_PUSH_RESPONSES(_r) do { \
- wmb(); /* front sees responses /before/ updated producer index */ \
+ virt_wmb(); /* front sees responses /before/ updated producer index */ \
(_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \
} while (0)
@@ -250,9 +250,9 @@ struct __name##_back_ring { \
#define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \
RING_IDX __old = (_r)->sring->req_prod; \
RING_IDX __new = (_r)->req_prod_pvt; \
- wmb(); /* back sees requests /before/ updated producer index */ \
+ virt_wmb(); /* back sees requests /before/ updated producer index */ \
(_r)->sring->req_prod = __new; \
- mb(); /* back sees new requests /before/ we check req_event */ \
+ virt_mb(); /* back sees new requests /before/ we check req_event */ \
(_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \
(RING_IDX)(__new - __old)); \
} while (0)
@@ -260,9 +260,9 @@ struct __name##_back_ring { \
#define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \
RING_IDX __old = (_r)->sring->rsp_prod; \
RING_IDX __new = (_r)->rsp_prod_pvt; \
- wmb(); /* front sees responses /before/ updated producer index */ \
+ virt_wmb(); /* front sees responses /before/ updated producer index */ \
(_r)->sring->rsp_prod = __new; \
- mb(); /* front sees new responses /before/ we check rsp_event */ \
+ virt_mb(); /* front sees new responses /before/ we check rsp_event */ \
(_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \
(RING_IDX)(__new - __old)); \
} while (0)
@@ -271,7 +271,7 @@ struct __name##_back_ring { \
(_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
if (_work_to_do) break; \
(_r)->sring->req_event = (_r)->req_cons + 1; \
- mb(); \
+ virt_mb(); \
(_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
} while (0)
@@ -279,7 +279,7 @@ struct __name##_back_ring { \
(_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
if (_work_to_do) break; \
(_r)->sring->rsp_event = (_r)->rsp_cons + 1; \
- mb(); \
+ virt_mb(); \
(_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
} while (0)
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 39/41] xen/events: use virt_xxx barriers
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (36 preceding siblings ...)
2016-01-10 14:21 ` [PATCH v3 38/41] xen/io: " Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
2016-01-11 11:12 ` David Vrabel
2016-01-10 14:22 ` [PATCH v3 40/41] s390: use generic memory barriers Michael S. Tsirkin
` (2 subsequent siblings)
40 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, David Vrabel <david>
drivers/xen/events/events_fifo.c uses rmb() to communicate with the
other side.
For guests compiled with CONFIG_SMP, smp_rmb would be sufficient, so
rmb() here is only needed if a non-SMP guest runs on an SMP host.
Switch to the virt_rmb barrier which serves this exact purpose.
Pull in asm/barrier.h here to make sure the file is self-contained.
Suggested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/xen/events/events_fifo.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index 96a1b8d..eff2b88 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -41,6 +41,7 @@
#include <linux/percpu.h>
#include <linux/cpu.h>
+#include <asm/barrier.h>
#include <asm/sync_bitops.h>
#include <asm/xen/hypercall.h>
#include <asm/xen/hypervisor.h>
@@ -296,7 +297,7 @@ static void consume_one_event(unsigned cpu,
* control block.
*/
if (head == 0) {
- rmb(); /* Ensure word is up-to-date before reading head. */
+ virt_rmb(); /* Ensure word is up-to-date before reading head. */
head = control_block->head[priority];
}
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* Re: [PATCH v3 39/41] xen/events: use virt_xxx barriers
2016-01-10 14:21 ` [PATCH v3 39/41] xen/events: " Michael S. Tsirkin
@ 2016-01-11 11:12 ` David Vrabel
0 siblings, 0 replies; 153+ messages in thread
From: David Vrabel @ 2016-01-11 11:12 UTC (permalink / raw)
To: Michael S. Tsirkin, linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
On 10/01/16 14:21, Michael S. Tsirkin wrote:
> drivers/xen/events/events_fifo.c uses rmb() to communicate with the
> other side.
>
> For guests compiled with CONFIG_SMP, smp_rmb would be sufficient, so
> rmb() here is only needed if a non-SMP guest runs on an SMP host.
>
> Switch to the virt_rmb barrier which serves this exact purpose.
>
> Pull in asm/barrier.h here to make sure the file is self-contained.
>
> Suggested-by: David Vrabel <david.vrabel@citrix.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
David
^ permalink raw reply [flat|nested] 153+ messages in thread
* [PATCH v3 40/41] s390: use generic memory barriers
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (37 preceding siblings ...)
2016-01-10 14:21 ` [PATCH v3 39/41] xen/events: " Michael S. Tsirkin
@ 2016-01-10 14:22 ` Michael S. Tsirkin
2016-01-10 14:22 ` [PATCH v3 41/41] s390: more efficient smp barriers Michael S. Tsirkin
2016-01-12 12:50 ` [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Peter Zijlstra
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:22 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Martin
The s390 kernel is SMP to 99.99%; we just didn't bother with a
non-SMP variant of the memory barriers. If the generic header
is used we'd get the non-SMP version for free. It will save a
small amount of text space for CONFIG_SMP=n.
Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/s390/include/asm/barrier.h | 3 ---
1 file changed, 3 deletions(-)
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index fbd25b2..4d26fa4 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -29,9 +29,6 @@
#define __smp_mb() mb()
#define __smp_rmb() rmb()
#define __smp_wmb() wmb()
-#define smp_mb() __smp_mb()
-#define smp_rmb() __smp_rmb()
-#define smp_wmb() __smp_wmb()
#define __smp_store_release(p, v) \
do { \
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* [PATCH v3 41/41] s390: more efficient smp barriers
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (38 preceding siblings ...)
2016-01-10 14:22 ` [PATCH v3 40/41] s390: use generic memory barriers Michael S. Tsirkin
@ 2016-01-10 14:22 ` Michael S. Tsirkin
2016-01-12 12:50 ` [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Peter Zijlstra
40 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:22 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel, Martin
As per: lkml.kernel.org/r/20150921112252.3c2937e1@mschwide
atomics imply a barrier on s390, so s390 should change
smp_mb__before_atomic and smp_mb__after_atomic to barrier() instead of
smp_mb() and hence should not use the generic versions.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/s390/include/asm/barrier.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 4d26fa4..5c8db3c 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -45,6 +45,9 @@ do { \
___p1; \
})
+#define __smp_mb__before_atomic() barrier()
+#define __smp_mb__after_atomic() barrier()
+
#include <asm-generic/barrier.h>
#endif /* __ASM_BARRIER_H */
--
MST
^ permalink raw reply related [flat|nested] 153+ messages in thread
* Re: [PATCH v3 00/41] arch: barrier cleanup + barriers for virt
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
` (39 preceding siblings ...)
2016-01-10 14:22 ` [PATCH v3 41/41] s390: more efficient smp barriers Michael S. Tsirkin
@ 2016-01-12 12:50 ` Peter Zijlstra
40 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12 12:50 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
Russell King - ARM Linux, virtualization, Stefano Stabellini,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
linux-arm-kernel, linux-metag, linux-mips, x86,
user-mode-linux-devel, adi-buildroot-devel, linux-sh,
linux-xtensa, xen-devel
On Sun, Jan 10, 2016 at 04:16:22PM +0200, Michael S. Tsirkin wrote:
> I parked this in vhost tree for now, though the inclusion of patch 1 from tip
> creates a merge conflict - but one that is trivial to resolve.
>
> So I intend to just merge it all through my tree, including the
> duplicate patch, and assume conflict will be resolved.
>
> I would really appreciate some feedback on arch bits (especially the x86 bits),
> and acks for merging this through the vhost tree.
Thanks for doing this, looks good to me.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
^ permalink raw reply [flat|nested] 153+ messages in thread