linux-arm-kernel.lists.infradead.org archive mirror
* [RFC PATCH 0/3] Ordered I/O accessors
@ 2010-07-05 13:20 Catalin Marinas
  2010-07-05 13:20 ` [RFC PATCH 1/3] ARM: Introduce *_relaxed() " Catalin Marinas
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Catalin Marinas @ 2010-07-05 13:20 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Starting with Linux 2.6.33, coherent DMA buffers are mapped as Normal
Non-cacheable (rather than Strongly Ordered) to comply with the ARM
architecture requirements (mainly on ARMv7). The implication is that
Normal Non-cacheable memory accesses are no longer ordered with Device
memory accesses (note that ARMv7 does not guarantee such ordering even
when Strongly Ordered memory is used for the coherent DMA buffers,
though it worked in practice).

This change introduced problems in drivers that use dma_alloc_coherent()
but do not issue barriers (e.g. wmb()) to ensure that writes to the
coherent DMA buffer are visible to the device (drained to RAM) before a
DMA transfer is started.
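
For illustration, here is a minimal sketch of the affected pattern (the
device, register offset and descriptor layout are hypothetical), showing
the barrier that such drivers were missing:

#include <linux/io.h>
#include <linux/types.h>

/* Hypothetical descriptor living in a dma_alloc_coherent() buffer */
struct foo_desc {
	u32 addr;
	u32 len;
	u32 flags;
};

#define FOO_DOORBELL	0x10	/* hypothetical doorbell register offset */

static void foo_start_dma(struct foo_desc *desc, u32 buf, u32 len,
			  void __iomem *regs)
{
	/* writes to the coherent (Normal Non-cacheable) buffer */
	desc->addr  = buf;
	desc->len   = len;
	desc->flags = 1;		/* mark the descriptor valid */

	/*
	 * Without this wmb() the descriptor writes may still be in the
	 * CPU or L2 write buffer when the doorbell write (Device memory)
	 * reaches the device.
	 */
	wmb();

	writel(1, regs + FOO_DOORBELL);	/* kick the DMA engine */
}

With this series applied, the wmb() above is implied by writel() itself,
so such drivers work again without modification.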

Discussions on LKML (see links in the commit log for the last patch)
suggested that the I/O accessors must be ordered with coherent buffer
accesses by adding the necessary barriers to read*/write*() accessors.
The alternative is to add the barriers to the drivers themselves, though
many of them were only tested on x86 and do not show any issues there.

Note that this series only adds barriers to deal with the common case of
DMA coherent buffers. Other, non-standard use-cases must add the correct
barriers in the driver.
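
As an example of such a use-case (all names hypothetical): if the device
is not kicked via write*() at all but instead polls an ownership flag in
another part of the coherent buffer, none of the barriers added by this
series are executed and the driver still needs its own wmb():

#include <linux/io.h>
#include <linux/types.h>

#define FOO_OWNER_DEVICE	1	/* hypothetical ownership value */

/* Both the descriptor and the ownership flag are in coherent memory */
static void foo_post_descriptor(u32 *desc, u32 *owner_flag)
{
	desc[0] = 0x1000;		/* fill in the descriptor */
	desc[1] = 64;

	/* order the descriptor writes before handing it to the device */
	wmb();

	*owner_flag = FOO_OWNER_DEVICE;	/* polled by the DMA engine */
}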

I did some simple "dd oflag=direct" tests on a CF card (using the PATA
platform driver) and the time for writing 10MB increased from 6.6s to
6.7s, which I don't consider significant. Drivers can be optimised to use
the *_relaxed() accessors (though the other architectures would need to
support them as well).
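
As a sketch of what such an optimisation could look like (register
offsets are hypothetical): accesses to the same device are ordered with
respect to each other anyway, so only the access that must be ordered
against the coherent buffer needs the full accessor:

#include <linux/io.h>
#include <linux/types.h>

/* hypothetical register offsets */
#define FOO_SRC		0x00
#define FOO_DST		0x04
#define FOO_LEN		0x08
#define FOO_START	0x0c

static void foo_program_channel(void __iomem *regs, u32 src, u32 dst,
				u32 len)
{
	/*
	 * Device memory accesses to the same peripheral are ordered, so
	 * no barriers are needed between these.
	 */
	writel_relaxed(src, regs + FOO_SRC);
	writel_relaxed(dst, regs + FOO_DST);
	writel_relaxed(len, regs + FOO_LEN);

	/*
	 * The full writel() implies the wmb() that orders any previous
	 * coherent buffer writes before the transfer is started.
	 */
	writel(1, regs + FOO_START);
}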

Please report if you see any severe performance impact with these
patches.

Thanks.


Catalin Marinas (3):
      ARM: Introduce *_relaxed() I/O accessors
      ARM: Convert L2x0 to use the IO relaxed operations for cache sync
      ARM: Add barriers to the I/O accessors if ARM_DMA_MEM_BUFFERABLE


 arch/arm/include/asm/io.h |   39 +++++++++++++++++++++++++++------------
 arch/arm/mm/cache-l2x0.c  |    4 ++--
 2 files changed, 29 insertions(+), 14 deletions(-)

-- 
Catalin


* [RFC PATCH 1/3] ARM: Introduce *_relaxed() I/O accessors
  2010-07-05 13:20 [RFC PATCH 0/3] Ordered I/O accessors Catalin Marinas
@ 2010-07-05 13:20 ` Catalin Marinas
  2010-07-05 13:20 ` [RFC PATCH 2/3] ARM: Convert L2x0 to use the IO relaxed operations for cache sync Catalin Marinas
  2010-07-05 13:20 ` [RFC PATCH 3/3] ARM: Add barriers to the I/O accessors if ARM_DMA_MEM_BUFFERABLE Catalin Marinas
  2 siblings, 0 replies; 4+ messages in thread
From: Catalin Marinas @ 2010-07-05 13:20 UTC (permalink / raw)
  To: linux-arm-kernel

This patch introduces read*_relaxed()/write*_relaxed() as the main I/O
accessors (when __mem_pci is defined). The standard read*()/write*()
macros are now defined in terms of the relaxed accessors.

This patch is in preparation for a subsequent patch which adds barriers
to the I/O accessors.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/io.h |   29 +++++++++++++++++------------
 1 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index c980156..97fb9aa 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -179,25 +179,30 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
  * IO port primitives for more information.
  */
 #ifdef __mem_pci
-#define readb(c) ({ __u8  __v = __raw_readb(__mem_pci(c)); __v; })
-#define readw(c) ({ __u16 __v = le16_to_cpu((__force __le16) \
+#define readb_relaxed(c) ({ u8  __v = __raw_readb(__mem_pci(c)); __v; })
+#define readw_relaxed(c) ({ u16 __v = le16_to_cpu((__force __le16) \
 					__raw_readw(__mem_pci(c))); __v; })
-#define readl(c) ({ __u32 __v = le32_to_cpu((__force __le32) \
+#define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32) \
 					__raw_readl(__mem_pci(c))); __v; })
-#define readb_relaxed(addr) readb(addr)
-#define readw_relaxed(addr) readw(addr)
-#define readl_relaxed(addr) readl(addr)
+
+#define writeb_relaxed(v,c)	__raw_writeb(v,__mem_pci(c))
+#define writew_relaxed(v,c)	__raw_writew((__force u16) \
+					cpu_to_le16(v),__mem_pci(c))
+#define writel_relaxed(v,c)	__raw_writel((__force u32) \
+					cpu_to_le32(v),__mem_pci(c))
+
+#define readb(c)		readb_relaxed(c)
+#define readw(c)		readw_relaxed(c)
+#define readl(c)		readl_relaxed(c)
+
+#define writeb(v,c)		writeb_relaxed(v,c)
+#define writew(v,c)		writew_relaxed(v,c)
+#define writel(v,c)		writel_relaxed(v,c)
 
 #define readsb(p,d,l)		__raw_readsb(__mem_pci(p),d,l)
 #define readsw(p,d,l)		__raw_readsw(__mem_pci(p),d,l)
 #define readsl(p,d,l)		__raw_readsl(__mem_pci(p),d,l)
 
-#define writeb(v,c)		__raw_writeb(v,__mem_pci(c))
-#define writew(v,c)		__raw_writew((__force __u16) \
-					cpu_to_le16(v),__mem_pci(c))
-#define writel(v,c)		__raw_writel((__force __u32) \
-					cpu_to_le32(v),__mem_pci(c))
-
 #define writesb(p,d,l)		__raw_writesb(__mem_pci(p),d,l)
 #define writesw(p,d,l)		__raw_writesw(__mem_pci(p),d,l)
 #define writesl(p,d,l)		__raw_writesl(__mem_pci(p),d,l)


* [RFC PATCH 2/3] ARM: Convert L2x0 to use the IO relaxed operations for cache sync
  2010-07-05 13:20 [RFC PATCH 0/3] Ordered I/O accessors Catalin Marinas
  2010-07-05 13:20 ` [RFC PATCH 1/3] ARM: Introduce *_relaxed() " Catalin Marinas
@ 2010-07-05 13:20 ` Catalin Marinas
  2010-07-05 13:20 ` [RFC PATCH 3/3] ARM: Add barriers to the I/O accessors if ARM_DMA_MEM_BUFFERABLE Catalin Marinas
  2 siblings, 0 replies; 4+ messages in thread
From: Catalin Marinas @ 2010-07-05 13:20 UTC (permalink / raw)
  To: linux-arm-kernel

This patch is in preparation for a subsequent patch which adds barriers
to the I/O accessors. Since the mandatory barriers may do an L2 cache
sync, this patch avoids a recursive call into l2x0_cache_sync() via the
write*() accessors and wmb().
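
To illustrate the cycle (simplified, not the actual implementation): with
the barriers from the last patch, writel() expands to roughly { wmb();
__raw_writel(); }, and on L2x0 systems the wmb() may end up back in
cache_sync(), so the sync routine itself has to use the relaxed
accessors:

#include <linux/io.h>

/*
 * Illustrative only.  The recursion that the relaxed accessors avoid:
 *
 *	writel() -> wmb() -> L2 cache sync -> cache_sync()
 *		 -> writel() -> wmb() -> ...
 *
 * The relaxed accessors carry no implicit barrier; ordering is provided
 * by whatever barrier triggered the sync in the first place.
 */
static inline void example_cache_sync(void __iomem *sync_reg)
{
	writel_relaxed(0, sync_reg);		/* trigger the L2 sync */
	while (readl_relaxed(sync_reg) & 1)	/* wait for completion */
		;
}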

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/mm/cache-l2x0.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 9819869..086d7e2 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -32,14 +32,14 @@ static uint32_t l2x0_way_mask;	/* Bitmask of active ways */
 static inline void cache_wait(void __iomem *reg, unsigned long mask)
 {
 	/* wait for the operation to complete */
-	while (readl(reg) & mask)
+	while (readl_relaxed(reg) & mask)
 		;
 }
 
 static inline void cache_sync(void)
 {
 	void __iomem *base = l2x0_base;
-	writel(0, base + L2X0_CACHE_SYNC);
+	writel_relaxed(0, base + L2X0_CACHE_SYNC);
 	cache_wait(base + L2X0_CACHE_SYNC, 1);
 }
 


* [RFC PATCH 3/3] ARM: Add barriers to the I/O accessors if ARM_DMA_MEM_BUFFERABLE
  2010-07-05 13:20 [RFC PATCH 0/3] Ordered I/O accessors Catalin Marinas
  2010-07-05 13:20 ` [RFC PATCH 1/3] ARM: Introduce *_relaxed() " Catalin Marinas
  2010-07-05 13:20 ` [RFC PATCH 2/3] ARM: Convert L2x0 to use the IO relaxed operations for cache sync Catalin Marinas
@ 2010-07-05 13:20 ` Catalin Marinas
  2 siblings, 0 replies; 4+ messages in thread
From: Catalin Marinas @ 2010-07-05 13:20 UTC (permalink / raw)
  To: linux-arm-kernel

When the coherent DMA buffers are mapped as Normal Non-cacheable
(ARM_DMA_MEM_BUFFERABLE enabled), buffer accesses are no longer ordered
with Device memory accesses, causing failures in device drivers that do
not use the mandatory memory barriers before starting a DMA transfer.
LKML discussions led to the conclusion that such barriers have to be
added to the I/O accessors:

http://thread.gmane.org/gmane.linux.kernel/683509/focus=686153
http://thread.gmane.org/gmane.linux.ide/46414
http://thread.gmane.org/gmane.linux.kernel.cross-arch/5250

This patch adds a wmb() barrier to the write*() I/O accessors to handle
situations where Normal Non-cacheable writes are still in the processor
(or L2 cache controller) write buffer when a DMA transfer command is
issued. For the read*() accessors, an rmb() is added after the I/O access
to avoid speculative loads from the coherent buffer while the driver
polls for a DMA transfer-ready bit.
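
As an illustration of the read side (the register offset and status bit
are hypothetical): without the rmb(), the load from the coherent buffer
below could be speculated ahead of the status read that reports the
transfer as complete:

#include <linux/io.h>
#include <linux/types.h>

#define FOO_STATUS	0x14	/* hypothetical status register offset */
#define FOO_DMA_DONE	0x1	/* hypothetical "transfer complete" bit */

static u32 foo_wait_and_read(void __iomem *regs, const u32 *coherent_buf)
{
	/* poll the device until it reports the DMA transfer as complete */
	while (!(readl(regs + FOO_STATUS) & FOO_DMA_DONE))
		;

	/*
	 * readl() now implies an rmb() after the access, so the load from
	 * the coherent buffer cannot be speculated ahead of the status
	 * check and sees the data written by the device.
	 */
	return coherent_buf[0];
}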

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/io.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 97fb9aa..8f3edef 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -191,6 +191,15 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
 #define writel_relaxed(v,c)	__raw_writel((__force u32) \
 					cpu_to_le32(v),__mem_pci(c))
 
+#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
+#define readb(c)		({ u8  __v = readb_relaxed(c); rmb(); __v; })
+#define readw(c)		({ u16 __v = readw_relaxed(c); rmb(); __v; })
+#define readl(c)		({ u32 __v = readl_relaxed(c); rmb(); __v; })
+
+#define writeb(v,c)		do { wmb(); writeb_relaxed(v,c); } while (0)
+#define writew(v,c)		do { wmb(); writew_relaxed(v,c); } while (0)
+#define writel(v,c)		do { wmb(); writel_relaxed(v,c); } while (0)
+#else
 #define readb(c)		readb_relaxed(c)
 #define readw(c)		readw_relaxed(c)
 #define readl(c)		readl_relaxed(c)
@@ -198,6 +207,7 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
 #define writeb(v,c)		writeb_relaxed(v,c)
 #define writew(v,c)		writew_relaxed(v,c)
 #define writel(v,c)		writel_relaxed(v,c)
+#endif
 
 #define readsb(p,d,l)		__raw_readsb(__mem_pci(p),d,l)
 #define readsw(p,d,l)		__raw_readsw(__mem_pci(p),d,l)

