[RFC PATCH] ARM: Change the mandatory barriers implementation

* [RFC PATCH] ARM: Change the mandatory barriers implementation
@ 2010-02-03 16:15 ` Catalin Marinas
From: Catalin Marinas @ 2010-02-03 16:15 UTC
  To: linux-kernel, linux-arm-kernel
  Cc: Tony Lindgren, Larry Bassel, Daniel Walker, Russell King

  (I cc'ed LKML as well just in case I got the wrong semantics of the
  mandatory barriers)

The mandatory barriers (mb, rmb, wmb) are used even on uniprocessor
systems for things like ordering Normal Non-cacheable memory accesses
with DMA transfers (started via Device memory writes). The current
implementation uses dmb() for mb() and friends but this is not
sufficient. The DMB only ensures the ordering of accesses with regard to
a single observer accessing the same memory. If a DMA transfer is
started by a write to Device memory, the data to be transferred may not
reach the main memory (even if mapped as Normal Non-cacheable) before
the device receives the notification to begin the transfer. The only
barrier that would help in this situation is DSB, which would completely
drain the write buffers.

The patch also adds support for platform-defined barriers that can be
defined in mach/barriers.h. This is required by at least two platforms -
MSM and RealView (possibly OMAP as well). On RealView with an outer
cache (PL310 for example) stores to Normal Non-cacheable memory are
buffered by the outer cache but the DSB doesn't go as far as this. A
separate L2x0 sync command is required (a store to Strongly Ordered
memory would do as well, similar to the MSM requirements and maybe
faster).

Note that the SMP barriers are not affected as they only deal with
ordering in Normal memory. There is however a situation with the use of
IPIs. A DMB is not enough to ensure that a write to Normal memory is
strictly ordered with respect to the IPI generation (and interrupt
handling). A solution is for the users of smp_call_function() to use a
mandatory barrier.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Daniel Walker <dwalker@codeaurora.org>
Cc: Larry Bassel <lbassel@quicinc.com>
Cc: Tony Lindgren <tony@atomide.com>
---
 arch/arm/include/asm/system.h |   18 ++++++++----------
 arch/arm/mm/Kconfig           |    6 ++++++
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index 058e7e9..477861d 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -138,14 +138,12 @@ extern unsigned int user_debug;
 #define dmb() __asm__ __volatile__ ("" : : : "memory")
 #endif

-#if __LINUX_ARM_ARCH__ >= 7 || defined(CONFIG_SMP)
-#define mb()		dmb()
-#define rmb()		dmb()
-#define wmb()		dmb()
+#ifdef CONFIG_ARCH_HAS_BARRIERS
+#include <mach/barriers.h>
 #else
-#define mb()	do { if (arch_is_coherent()) dmb(); else barrier(); } while (0)
-#define rmb()	do { if (arch_is_coherent()) dmb(); else barrier(); } while (0)
-#define wmb()	do { if (arch_is_coherent()) dmb(); else barrier(); } while (0)
+#define mb()		dsb()
+#define rmb()		dmb()
+#define wmb()		dsb()
 #endif

 #ifndef CONFIG_SMP
@@ -153,9 +151,9 @@ extern unsigned int user_debug;
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
 #else
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
+#define smp_mb()	dmb()
+#define smp_rmb()	dmb()
+#define smp_wmb()	dmb()
 #endif

 #define read_barrier_depends()		do { } while(0)
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index f62beb7..f67f2c4 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -802,3 +802,9 @@ config ARM_L1_CACHE_SHIFT
 	int
 	default 6 if ARCH_OMAP3 || ARCH_S5PC1XX
 	default 5
+
+config ARCH_HAS_BARRIERS
+	bool
+	help
+	  This option allows the use of custom mandatory barriers
+	  included via the mach/barriers.h file.

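For illustration only, here is a minimal sketch of what a platform's
mach/barriers.h could look like when the outer cache also has to be
synced, as described for RealView above. It is not part of the patch:
plat_outer_sync() is a hypothetical hook, and dsb()/dmb() are replaced by
counting stubs so that the fragment builds outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch builds outside the kernel; the real dsb()/dmb()
 * are the ARM barrier macros from asm/system.h. */
static unsigned int dsb_calls, dmb_calls, outer_sync_calls;
static inline void dsb(void) { dsb_calls++; }
static inline void dmb(void) { dmb_calls++; }

/* Hypothetical platform hook: drain the stores buffered by the outer cache,
 * e.g. an L2x0 sync command or a store to Strongly Ordered memory. */
static inline void plat_outer_sync(void) { outer_sync_calls++; }

/* What a platform's mach/barriers.h might define (an assumption, not taken
 * from the patch). */
#define mb()	do { dsb(); plat_outer_sync(); } while (0)
#define rmb()	dmb()
#define wmb()	do { dsb(); plat_outer_sync(); } while (0)

/* Example caller: publish a descriptor word before a device is told about it. */
static inline void publish_desc(volatile uint32_t *desc, uint32_t val)
{
	assert(desc != NULL);
	*desc = val;
	wmb();	/* DSB first, then the platform's outer-cache sync */
}
```

The DSB is issued before the outer-cache sync so that the store has left
the CPU's own write buffer by the time the outer cache is asked to drain.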
* Re: [RFC PATCH] ARM: Change the mandatory barriers implementation
  2010-02-03 16:15 ` Catalin Marinas
@ 2010-02-04  0:21   ` Abhijeet Dharmapurikar
From: Abhijeet Dharmapurikar @ 2010-02-04  0:21 UTC
  To: Catalin Marinas
  Cc: linux-kernel, linux-arm-kernel, Tony Lindgren, Larry Bassel,
	Daniel Walker, Russell King, linux-arm-msm

> The mandatory barriers (mb, rmb, wmb) are used even on uniprocessor
> systems for things like ordering Normal Non-cacheable memory accesses
> with DMA transfer (via Device memory writes). The current implementation
> uses dmb() for mb() and friends but this is not sufficient. The DMB only
> ensures the ordering of accesses with regards to a single observer
> accessing the same memory. If a DMA transfer is started by a write to
> Device memory, the data to be transfered may not reach the main memory
> (even if mapped as Normal Non-cacheable) before the device receives the
> notification to begin the transfer. The only barrier that would help in
> this situation is DSB which would completely drain the write buffers.

On ARMv7, DMB guarantees that all accesses prior to the DMB are observed
by an observer if that observer sees any accesses _after_ the DMB. In
this case, since the DMA engine observes a write to itself (it is being
written to and hence must observe the write), it should also see the
writes to the buffers. A dmb() after the writes to the buffer and before
the write to the DMA engine should suffice.

Moreover, an mb() could be in places where accesses to ARM's Device type
memory need ordering and are 1kb apart. Such usages of mb() would result
in a dsb() and could cause performance problems.

Since you mention the write buffers, this probably applies only to ARMv6.
Correct me if I am wrong, but I think that dmb on ARMv6 should suffice too.

* RE: [RFC PATCH] ARM: Change the mandatory barriers implementation
  2010-02-04  0:21   ` Abhijeet Dharmapurikar
@ 2010-02-04  5:15     ` Shilimkar, Santosh
From: Shilimkar, Santosh @ 2010-02-04  5:15 UTC
  To: Abhijeet Dharmapurikar, Catalin Marinas
  Cc: Daniel Walker, Russell King, Tony Lindgren,
	linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Larry Bassel, linux-arm-kernel@lists.infradead.org

> -----Original Message-----
> From: linux-arm-kernel-bounces at lists.infradead.org [mailto:linux-arm-kernel-
> bounces at lists.infradead.org] On Behalf Of Abhijeet Dharmapurikar
> Sent: Thursday, February 04, 2010 5:52 AM
> To: Catalin Marinas
> Cc: Daniel Walker; Russell King; Tony Lindgren; linux-arm-msm at vger.kernel.org; linux-
> kernel at vger.kernel.org; Larry Bassel; linux-arm-kernel at lists.infradead.org
> Subject: Re: [RFC PATCH] ARM: Change the mandatory barriers implementation
> 
> > The mandatory barriers (mb, rmb, wmb) are used even on uniprocessor
> > systems for things like ordering Normal Non-cacheable memory accesses
> > with DMA transfer (via Device memory writes). The current implementation
> > uses dmb() for mb() and friends but this is not sufficient. The DMB only
> > ensures the ordering of accesses with regards to a single observer
> > accessing the same memory. If a DMA transfer is started by a write to
> > Device memory, the data to be transfered may not reach the main memory
> > (even if mapped as Normal Non-cacheable) before the device receives the
> > notification to begin the transfer. The only barrier that would help in
> > this situation is DSB which would completely drain the write buffers.
> 
> On ARMv7, DMB guarantees that all accesses prior to DMB are observed by
> an observer if that observer sees any accesses _after_ the DMB. In this
> case, since DMA engine observes a write to itself( It is being written
> to and hence must observe the write) it should also see the writes to
> the buffers. A dmb() after the writes to buffer and before write to DMA
> engine should suffice.

This may be true if the DMA engine is an observer in the cluster. Without
coherence hardware, the DMA engine won't see the correct data unless the
write buffer is drained. So a DSB is necessary in the DMA case at least.

Regards,
Santosh

* Re: [RFC PATCH] ARM: Change the mandatory barriers implementation
  2010-02-04  0:21   ` Abhijeet Dharmapurikar
@ 2010-02-04 13:15     ` Catalin Marinas
From: Catalin Marinas @ 2010-02-04 13:15 UTC
  To: Abhijeet Dharmapurikar
  Cc: linux-kernel, linux-arm-kernel, Tony Lindgren, Larry Bassel,
	Daniel Walker, Russell King, linux-arm-msm

On Thu, 2010-02-04 at 00:21 +0000, Abhijeet Dharmapurikar wrote:
> > The mandatory barriers (mb, rmb, wmb) are used even on uniprocessor
> > systems for things like ordering Normal Non-cacheable memory accesses
> > with DMA transfer (via Device memory writes). The current implementation
> > uses dmb() for mb() and friends but this is not sufficient. The DMB only
> > ensures the ordering of accesses with regards to a single observer
> > accessing the same memory. If a DMA transfer is started by a write to
> > Device memory, the data to be transfered may not reach the main memory
> > (even if mapped as Normal Non-cacheable) before the device receives the
> > notification to begin the transfer. The only barrier that would help in
> > this situation is DSB which would completely drain the write buffers.
> 
> On ARMv7, DMB guarantees that all accesses prior to DMB are observed by
> an observer if that observer sees any accesses _after_ the DMB. In this
> case, since DMA engine observes a write to itself( It is being written
> to and hence must observe the write) it should also see the writes to
> the buffers. A dmb() after the writes to buffer and before write to DMA
> engine should suffice.

I asked our processor architect for a clarification on the wording of
the DMB definition, but the "all accesses" part most likely refers to
accesses to the same peripheral or memory block (but not together).
Intuitively, you can have a hardware configuration as below:

     CPU   Device
      |     | |
      +-----+ |      (1)
      |       |
    Buffer    |
      |       |
      +---+---+      (2)
          |
         RAM

The peripheral register write and memory write go on different paths. A
DMB may ensure the ordering at level (1) but there could be delays
before a write reaches the RAM and the peripheral would get the DMA
start notification before that. Only DSB would ensure the draining of
the buffer.
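
The diagram above can be turned into a small toy model. This is only a
sketch under simplifying assumptions, not a description of any real
implementation: stores to Normal memory sit in a software "write buffer"
until drained, the device reads RAM directly, and all toy_* names are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS	16
#define WBUF_SLOTS	8

struct toy_system {
	uint32_t ram[RAM_WORDS];	/* what the device can read */
	struct { unsigned int idx; uint32_t val; } wbuf[WBUF_SLOTS];
	unsigned int pending;		/* stores still sitting in the buffer */
	uint32_t dma_seen;		/* data the device fetched when started */
};

/* Push every buffered store out to RAM, oldest first. */
static inline void toy_drain(struct toy_system *s)
{
	for (unsigned int i = 0; i < s->pending; i++)
		s->ram[s->wbuf[i].idx] = s->wbuf[i].val;
	s->pending = 0;
}

/* CPU store to Normal memory: lands in the write buffer, not in RAM. */
static inline void toy_store_normal(struct toy_system *s, unsigned int idx,
				    uint32_t val)
{
	assert(idx < RAM_WORDS);
	if (s->pending == WBUF_SLOTS)
		toy_drain(s);		/* buffer full: forced write-back */
	s->wbuf[s->pending].idx = idx;
	s->wbuf[s->pending].val = val;
	s->pending++;
}

/* In this model DMB orders accesses at level (1) only; the buffer is untouched. */
static inline void toy_dmb(struct toy_system *s) { (void)s; }

/* DSB completes only once earlier stores have reached RAM. */
static inline void toy_dsb(struct toy_system *s) { toy_drain(s); }

/* Device register write: takes the direct path, and the device reads RAM at once. */
static inline void toy_start_dma(struct toy_system *s, unsigned int idx)
{
	assert(idx < RAM_WORDS);
	s->dma_seen = s->ram[idx];
}
```

In this model, toy_store_normal() followed by toy_dmb() and
toy_start_dma() lets the device fetch the old RAM contents, while using
toy_dsb() in place of toy_dmb() makes it fetch the new value.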

> Moreover an mb() could be in places where accesses to ARM's Device type
> memory need ordering and are 1kb apart. Such usages of mb() would result
> in a dsb() and could cause performance problems.

Note that accesses to Device memory are ordered relative to each other
without any barrier. If you have weakly ordered I/O (not the ARM case),
there's mmiowb() for this.

If you need ordering between accesses to Normal memory and Device
memory, a DSB is needed, hence the definition of mb() to be a DSB (some
processors like Cortex-A8 implement DMB so that it drains the write
buffer but this is not always the case on other implementations).

Of course, there are situations when you only need ordering of Normal
memory accesses without any peripheral access and a DMB would be fine in
this situation. But so far Linux uses mb() for both situations, hence
I'm taking the less optimal approach for Normal-Normal ordering.

> Since you mention the write buffers this probably applies only to ARMv6.
> Correct me here, I think that dmb on ARMv6 should suffice too.

I can't guarantee. It depends on the processor implementation
(ARM11MPCore may have a different behaviour). Linux on ARM should pretty
much be architecturally generic.

-- 
Catalin

* Re: [RFC PATCH] ARM: Change the mandatory barriers implementation
  2010-02-04 13:15     ` Catalin Marinas
@ 2010-02-04 16:39       ` Tony Lindgren
From: Tony Lindgren @ 2010-02-04 16:39 UTC
  To: Catalin Marinas
  Cc: Abhijeet Dharmapurikar, linux-kernel, linux-arm-kernel,
	Larry Bassel, Daniel Walker, Russell King, linux-arm-msm

* Catalin Marinas <catalin.marinas@arm.com> [100204 05:13]:
> On Thu, 2010-02-04 at 00:21 +0000, Abhijeet Dharmapurikar wrote:
> > > The mandatory barriers (mb, rmb, wmb) are used even on uniprocessor
> > > systems for things like ordering Normal Non-cacheable memory accesses
> > > with DMA transfer (via Device memory writes). The current implementation
> > > uses dmb() for mb() and friends but this is not sufficient. The DMB only
> > > ensures the ordering of accesses with regards to a single observer
> > > accessing the same memory. If a DMA transfer is started by a write to
> > > Device memory, the data to be transfered may not reach the main memory
> > > (even if mapped as Normal Non-cacheable) before the device receives the
> > > notification to begin the transfer. The only barrier that would help in
> > > this situation is DSB which would completely drain the write buffers.
> > 
> > On ARMv7, DMB guarantees that all accesses prior to DMB are observed by
> > an observer if that observer sees any accesses _after_ the DMB. In this
> > case, since DMA engine observes a write to itself( It is being written
> > to and hence must observe the write) it should also see the writes to
> > the buffers. A dmb() after the writes to buffer and before write to DMA
> > engine should suffice.
> 
> I asked our processor architect for a clarification on the wording of
> the DMB definition but the "all accesses" part most likely refer to
> accesses to the same peripheral or memory block (but not together).
> Intuitively, you can have a hardware configuration as below:
> 
>      CPU   Device
>       |     | |
>       +-----+ |      (1)
>       |       |
>     Buffer    |
>       |       |
>       +---+---+      (2)
>           |
>          RAM
> 
> The peripheral register write and memory write go on different paths. A
> DMB may ensure the ordering at level (1) but there could be delays
> before a write reaches the RAM and the peripheral would get the DMA
> start notification before that. Only DSB would ensure the draining of
> the buffer.

Additionally, if there's an external bus at (1), the ARM ordering may
not guarantee that the bus access is done.

For example, on omap L3/L4 buses we need to do a readback from the same
device on the bus to ensure the write got to the device. Otherwise things
like spurious interrupts can happen as the device has not yet acked the
interrupt while ARM thinks the handler is done.
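
A rough sketch of the readback idea, again as a toy model and not OMAP
code: the bus accepts a posted write immediately, and a later read from
the same device cannot return until that write has landed. struct toy_bus
and the toy_* helpers are hypothetical names used only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy interconnect with posted writes: a write is accepted at once but
 * reaches the device only later. */
struct toy_bus {
	uint32_t dev_reg;	/* register value as the device sees it */
	uint32_t posted_val;	/* write still in flight on the bus */
	int posted;
};

/* Posted write: returns before the device has seen the value. */
static inline void toy_writel(struct toy_bus *b, uint32_t val)
{
	assert(b != NULL);
	b->posted_val = val;
	b->posted = 1;
}

/* A read from the same device completes only after earlier writes to it
 * have landed. */
static inline uint32_t toy_readl(struct toy_bus *b)
{
	assert(b != NULL);
	if (b->posted) {
		b->dev_reg = b->posted_val;
		b->posted = 0;
	}
	return b->dev_reg;
}

/* Ack an interrupt and read the register back, so that the ack has reached
 * the device by the time this returns. */
static inline void toy_ack_irq(struct toy_bus *b, uint32_t ack)
{
	toy_writel(b, ack);
	(void)toy_readl(b);
}
```

Without the readback, the handler in this model could return while the
ack is still in flight, which corresponds to the spurious interrupt case
described above.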
 
> > Moreover an mb() could be in places where accesses to ARM's Device type
> > memory need ordering and are 1kb apart. Such usages of mb() would result
> > in a dsb() and could cause performance problems.
> 
> Note that accesses to Device memory are ordered relative to each-other
> without any barrier. If you have weakly ordered I/O (not the ARM case),
> there's mmiowb() for this.
> 
> If you need ordering between accesses to Normal memory and Device
> memory, a DSB is needed, hence the definition of mb() to be a DSB (some
> processors like Cortex-A8 implement DMB so that it drains the write
> buffer but this is not always the case on other implementations).
> 
> Of course, there are situations when you only need ordering of Normal
> memory accesses without any peripheral access and a DMB would be fine in
> this situation. But so far Linux uses mb() for both situations, hence
> I'm taking the less optimal approach for Normal-Normal ordering.

Yeah. The device accesses are ordered relative to each other, but since
the device writes may directly affect ARM (for example IRQ status),
a readback from the device is the only way to guarantee ordering for
an external bus.

In most cases only the ordering of instructions matters and there
the barriers work just fine. Just FYI, in case others are experiencing
similar issues.
 
> > Since you mention the write buffers this probably applies only to ARMv6.
> > Correct me here, I think that dmb on ARMv6 should suffice too.
> 
> I can't guarantee. It depends on the processor implementation
> (ARM11MPCore may have a different behaviour). Linux on ARM should pretty
> much be architecturally generic.

Also, my experience is that what I describe above was rare on v6 omaps
(or some different problem), but was happening often on v7 omaps.

Regards,

Tony

* [RFC PATCH] ARM: Change the mandatory barriers implementation
  2010-02-04 13:15     ` Catalin Marinas
@ 2010-02-05 11:04       ` Catalin Marinas
  -1 siblings, 0 replies; 14+ messages in thread
From: Catalin Marinas @ 2010-02-05 11:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-02-04 at 13:15 +0000, Catalin Marinas wrote:
> On Thu, 2010-02-04 at 00:21 +0000, Abhijeet Dharmapurikar wrote:
> > > The mandatory barriers (mb, rmb, wmb) are used even on uniprocessor
> > > systems for things like ordering Normal Non-cacheable memory accesses
> > > with DMA transfer (via Device memory writes). The current implementation
> > > uses dmb() for mb() and friends but this is not sufficient. The DMB only
> > > ensures the ordering of accesses with regards to a single observer
> > > accessing the same memory. If a DMA transfer is started by a write to
> > > Device memory, the data to be transferred may not reach the main memory
> > > (even if mapped as Normal Non-cacheable) before the device receives the
> > > notification to begin the transfer. The only barrier that would help in
> > > this situation is DSB which would completely drain the write buffers.
> >
> > On ARMv7, DMB guarantees that all accesses prior to DMB are observed by
> > an observer if that observer sees any accesses _after_ the DMB. In this
> > case, since DMA engine observes a write to itself (it is being written
> > to and hence must observe the write) it should also see the writes to
> > the buffers. A dmb() after the writes to buffer and before write to DMA
> > engine should suffice.
> 
> I asked our processor architect for a clarification on the wording of
> the DMB definition but the "all accesses" part most likely refers to
> accesses to the same peripheral or memory block (but not together).

I got some clarification and there is nothing wrong with the definition
of the DMB. The catch here is that "observe" (as per the ARM ARM) is
defined only for master accesses. The DMA engine in this case above does
not "observe" the write to itself as this is a slave access to one of
its memory-mapped ports.

So, the code below:

	STR	[Normal non-cacheable]
	DMB
	STR	[Device]

puts the first store in Group A (according to the DMB definition) and
the second store in Group B but since the DMA device does not "observe"
Group B in this case, there is no requirement for the ordering between
the observability of the store to normal memory and the observability of
the side-effects of the store to the DMA device.
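
For illustration, the same sequence with the DMB replaced by a DSB might
look like the sketch below. The names sketch_dsb() and sketch_start_dma()
are hypothetical and this is not the patch itself. The inline asm is only
used when building for ARMv7; elsewhere a generic full fence stands in so
the snippet still compiles on a development host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DSB on ARMv7 builds; a generic full fence elsewhere (host stand-in). */
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
#define sketch_dsb()	__asm__ __volatile__("dsb" : : : "memory")
#else
#define sketch_dsb()	__sync_synchronize()
#endif

/* Hypothetical helper: fill a DMA buffer, then ring the device doorbell. */
static void sketch_start_dma(volatile uint32_t *buf, uint32_t data,
			     volatile uint32_t *doorbell)
{
	assert(buf != NULL && doorbell != NULL);
	buf[0] = data;		/* STR [Normal non-cacheable] */
	sketch_dsb();		/* DSB: wait for the store to complete */
	*doorbell = 1;		/* STR [Device]: tell the engine to start */
}
```

As noted in the patch description, platforms with an outer cache may need
an additional platform-specific sync on top of the DSB.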

-- 
Catalin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH] ARM: Change the mandatory barriers implementation
  2010-02-05 11:04       ` Catalin Marinas
@ 2010-02-08 19:17         ` Abhijeet Dharmapurikar
  -1 siblings, 0 replies; 14+ messages in thread
From: Abhijeet Dharmapurikar @ 2010-02-08 19:17 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin Marinas wrote:
> On Thu, 2010-02-04 at 13:15 +0000, Catalin Marinas wrote:
>> On Thu, 2010-02-04 at 00:21 +0000, Abhijeet Dharmapurikar wrote:
>>>> The mandatory barriers (mb, rmb, wmb) are used even on uniprocessor
>>>> systems for things like ordering Normal Non-cacheable memory accesses
>>>> with DMA transfer (via Device memory writes). The current implementation
>>>> uses dmb() for mb() and friends but this is not sufficient. The DMB only
>>>> ensures the ordering of accesses with regards to a single observer
>>>> accessing the same memory. If a DMA transfer is started by a write to
>>>> Device memory, the data to be transferred may not reach the main memory
>>>> (even if mapped as Normal Non-cacheable) before the device receives the
>>>> notification to begin the transfer. The only barrier that would help in
>>>> this situation is DSB which would completely drain the write buffers.
>>> On ARMv7, DMB guarantees that all accesses prior to DMB are observed by
>>> an observer if that observer sees any accesses _after_ the DMB. In this
>>> case, since DMA engine observes a write to itself (it is being written
>>> to and hence must observe the write) it should also see the writes to
>>> the buffers. A dmb() after the writes to buffer and before write to DMA
>>> engine should suffice.
>> I asked our processor architect for a clarification on the wording of
>> the DMB definition but the "all accesses" part most likely refers to
>> accesses to the same peripheral or memory block (but not together).
> 
> I got some clarification and there is nothing wrong with the definition
> of the DMB. The catch here is that "observe" (as per the ARM ARM) is
> defined only for master accesses. The DMA engine in this case above does
> not "observe" the write to itself as this is a slave access to one of
> its memory-mapped ports.
> 
> So, the code below:
> 
> 	STR	[Normal non-cacheable]
> 	DMB
> 	STR	[Device]
> 
> puts the first store in Group A (according to the DMB definition) and
> the second store in Group B but since the DMA device does not "observe"
> Group B in this case, there is no requirement for the ordering between
> the observability of the store to normal memory and the observability of
> the side-effects of the store to the DMA device.

I agree now. DSB would be the right thing to do. Moreover, the option to
have a barrier.h for each platform gives platforms a chance to fine-tune
these barriers further.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-02-08 19:17 UTC | newest]

Thread overview: 14+ messages
2010-02-03 16:15 [RFC PATCH] ARM: Change the mandatory barriers implementation Catalin Marinas
2010-02-03 16:15 ` Catalin Marinas
2010-02-04  0:21 ` Abhijeet Dharmapurikar
2010-02-04  0:21   ` Abhijeet Dharmapurikar
2010-02-04  5:15   ` Shilimkar, Santosh
2010-02-04  5:15     ` Shilimkar, Santosh
2010-02-04 13:15   ` Catalin Marinas
2010-02-04 13:15     ` Catalin Marinas
2010-02-04 16:39     ` Tony Lindgren
2010-02-04 16:39       ` Tony Lindgren
2010-02-05 11:04     ` Catalin Marinas
2010-02-05 11:04       ` Catalin Marinas
2010-02-08 19:17       ` Abhijeet Dharmapurikar
2010-02-08 19:17         ` Abhijeet Dharmapurikar
