linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/8] Various patches for comments and upstream
@ 2010-05-04 16:44 Catalin Marinas
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

I've posted most of these patches before (some of them repeatedly over a
long period, e.g. the domains removal) and I would like some comments on
them. The previous discussion threads seem to have died without reaching
a clear conclusion.

A few of these patches are trivial fixes that could go upstream during
the upcoming merge window. The others, like the domains removal, I would
like to push to linux-next if there are no (further :)) comments.

Thanks.


Catalin Marinas (8):
      ARM: Improve the L2 cache performance when PL310 is used
      ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
      ARM: Align machine_desc.phys_io to a 1MB section
      ARM: Remove the domain switching on ARMv6k/v7 CPUs
      ARM: Fix the __arm_ioremap_caller() definition in nommu.c
      ARM: Implement copy_to_user_page() for noMMU
      ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
      ARM: Implement phys_mem_access_prot() to avoid attributes aliasing


 arch/arm/include/asm/assembler.h  |   13 +++--
 arch/arm/include/asm/cacheflush.h |    4 ++
 arch/arm/include/asm/domain.h     |   31 ++++++++++++
 arch/arm/include/asm/futex.h      |    9 ++--
 arch/arm/include/asm/pgtable.h    |    3 +
 arch/arm/include/asm/tlbflush.h   |   29 +++++++++++-
 arch/arm/include/asm/uaccess.h    |   16 +++---
 arch/arm/kernel/entry-armv.S      |    4 +-
 arch/arm/kernel/head.S            |    2 +
 arch/arm/kernel/setup.c           |    2 -
 arch/arm/kernel/traps.c           |   17 +++++++
 arch/arm/lib/getuser.S            |   13 +++--
 arch/arm/lib/putuser.S            |   29 ++++++------
 arch/arm/lib/uaccess.S            |   83 +++++++++++++++++----------------
 arch/arm/mm/Kconfig               |   15 ++++++
 arch/arm/mm/cache-l2x0.c          |   74 +++++++++++++++++++++---------
 arch/arm/mm/cache-v6.S            |   17 +++++--
 arch/arm/mm/cache-v7.S            |    4 ++
 arch/arm/mm/init.c                |   14 ++----
 arch/arm/mm/mmu.c                 |   14 ++++++
 arch/arm/mm/nommu.c               |   13 ++++-
 arch/arm/mm/proc-macros.S         |    7 +++
 arch/arm/mm/proc-v7.S             |    5 +-
 arch/arm/mm/tlb-v7.S              |    8 +++
 drivers/net/smsc911x.c            |   92 +++++++++++++++++++++----------------
 mm/bootmem.c                      |    2 +
 mm/page_alloc.c                   |    1 
 27 files changed, 355 insertions(+), 166 deletions(-)

-- 
Catalin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 1/8] ARM: Improve the L2 cache performance when PL310 is used
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

With this L2 cache controller, the cache maintenance operations by PA
and the sync operation are atomic and do not require a "wait" loop or
spinlocks. This patch conditionally defines the cache_wait() function
and the locking primitives (rather than duplicating the functions or the
whole file).

Since L2x0 cache controllers do not work with ARMv7 CPUs, the patch
automatically enables CACHE_PL310 when CPU_V7 is defined.
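The PL310/L2x0 split in the diff below boils down to whether a range has
to be chopped into blocks so the spinlock (and IRQ-off time) can be
bounded. A rough userspace sketch of the two block_end() flavours (not
kernel code; the names are copied from the patch):

```c
#include <assert.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))

/* PL310 flavour: per-line maintenance ops are atomic and no lock is
 * held, so the whole range is processed as a single block */
static unsigned long block_end_pl310(unsigned long start, unsigned long end)
{
	(void)start;
	return end;
}

/* L2x0 flavour: the time spent with the lock held is bounded by
 * splitting the range into 4096-byte blocks, as in the original code */
static unsigned long block_end_l2x0(unsigned long start, unsigned long end)
{
	return start + MIN(end - start, 4096UL);
}
```

With the L2x0 variant, a large flush drops and re-takes the lock every
4096 bytes; with PL310 the inner loop simply runs to the end of the
range.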

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/mm/Kconfig      |    7 ++++
 arch/arm/mm/cache-l2x0.c |   73 ++++++++++++++++++++++++++++++++--------------
 2 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 5bd7c89..5df74c1 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -767,6 +767,13 @@ config CACHE_L2X0
 	help
 	  This option enables the L2x0 PrimeCell.
 
+config CACHE_PL310
+	bool
+	depends on CACHE_L2X0
+	default y if CPU_V7
+	help
+	  This option enables support for the PL310 cache controller.
+
 config CACHE_TAUROS2
 	bool "Enable the Tauros2 L2 cache controller"
 	depends on ARCH_DOVE
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 21ad68b..87bd0a0 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -26,15 +26,42 @@
 #define CACHE_LINE_SIZE		32
 
 static void __iomem *l2x0_base;
-static DEFINE_SPINLOCK(l2x0_lock);
 
-static inline void cache_wait(void __iomem *reg, unsigned long mask)
+static inline void cache_wait_always(void __iomem *reg, unsigned long mask)
 {
 	/* wait for the operation to complete */
 	while (readl(reg) & mask)
 		;
 }
 
+#ifdef CONFIG_CACHE_PL310
+
+static inline void cache_wait(void __iomem *reg, unsigned long mask)
+{
+	/* cache operations are atomic */
+}
+
+#define _l2x0_lock(lock, flags)		((void)(flags))
+#define _l2x0_unlock(lock, flags)	((void)(flags))
+
+#define block_end(start, end)		(end)
+
+#define L2CC_TYPE			"PL310/L2C-310"
+
+#else	/* !CONFIG_CACHE_PL310 */
+
+#define cache_wait			cache_wait_always
+
+static DEFINE_SPINLOCK(l2x0_lock);
+#define _l2x0_lock(lock, flags)		spin_lock_irqsave(lock, flags)
+#define _l2x0_unlock(lock, flags)	spin_unlock_irqrestore(lock, flags)
+
+#define block_end(start, end)		((start) + min((end) - (start), 4096UL))
+
+#define L2CC_TYPE			"L2x0"
+
+#endif	/* CONFIG_CACHE_PL310 */
+
 static inline void cache_sync(void)
 {
 	void __iomem *base = l2x0_base;
@@ -97,9 +124,9 @@ static void l2x0_cache_sync(void)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&l2x0_lock, flags);
+	_l2x0_lock(&l2x0_lock, flags);
 	cache_sync();
-	spin_unlock_irqrestore(&l2x0_lock, flags);
+	_l2x0_unlock(&l2x0_lock, flags);
 }
 
 static inline void l2x0_inv_all(void)
@@ -107,11 +134,11 @@ static inline void l2x0_inv_all(void)
 	unsigned long flags;
 
 	/* invalidate all ways */
-	spin_lock_irqsave(&l2x0_lock, flags);
+	_l2x0_lock(&l2x0_lock, flags);
 	writel(0xff, l2x0_base + L2X0_INV_WAY);
-	cache_wait(l2x0_base + L2X0_INV_WAY, 0xff);
+	cache_wait_always(l2x0_base + L2X0_INV_WAY, 0xff);
 	cache_sync();
-	spin_unlock_irqrestore(&l2x0_lock, flags);
+	_l2x0_unlock(&l2x0_lock, flags);
 }
 
 static void l2x0_inv_range(unsigned long start, unsigned long end)
@@ -119,7 +146,7 @@ static void l2x0_inv_range(unsigned long start, unsigned long end)
 	void __iomem *base = l2x0_base;
 	unsigned long flags;
 
-	spin_lock_irqsave(&l2x0_lock, flags);
+	_l2x0_lock(&l2x0_lock, flags);
 	if (start & (CACHE_LINE_SIZE - 1)) {
 		start &= ~(CACHE_LINE_SIZE - 1);
 		debug_writel(0x03);
@@ -136,7 +163,7 @@ static void l2x0_inv_range(unsigned long start, unsigned long end)
 	}
 
 	while (start < end) {
-		unsigned long blk_end = start + min(end - start, 4096UL);
+		unsigned long blk_end = block_end(start, end);
 
 		while (start < blk_end) {
 			l2x0_inv_line(start);
@@ -144,13 +171,13 @@ static void l2x0_inv_range(unsigned long start, unsigned long end)
 		}
 
 		if (blk_end < end) {
-			spin_unlock_irqrestore(&l2x0_lock, flags);
-			spin_lock_irqsave(&l2x0_lock, flags);
+			_l2x0_unlock(&l2x0_lock, flags);
+			_l2x0_lock(&l2x0_lock, flags);
 		}
 	}
 	cache_wait(base + L2X0_INV_LINE_PA, 1);
 	cache_sync();
-	spin_unlock_irqrestore(&l2x0_lock, flags);
+	_l2x0_unlock(&l2x0_lock, flags);
 }
 
 static void l2x0_clean_range(unsigned long start, unsigned long end)
@@ -158,10 +185,10 @@ static void l2x0_clean_range(unsigned long start, unsigned long end)
 	void __iomem *base = l2x0_base;
 	unsigned long flags;
 
-	spin_lock_irqsave(&l2x0_lock, flags);
+	_l2x0_lock(&l2x0_lock, flags);
 	start &= ~(CACHE_LINE_SIZE - 1);
 	while (start < end) {
-		unsigned long blk_end = start + min(end - start, 4096UL);
+		unsigned long blk_end = block_end(start, end);
 
 		while (start < blk_end) {
 			l2x0_clean_line(start);
@@ -169,13 +196,13 @@ static void l2x0_clean_range(unsigned long start, unsigned long end)
 		}
 
 		if (blk_end < end) {
-			spin_unlock_irqrestore(&l2x0_lock, flags);
-			spin_lock_irqsave(&l2x0_lock, flags);
+			_l2x0_unlock(&l2x0_lock, flags);
+			_l2x0_lock(&l2x0_lock, flags);
 		}
 	}
 	cache_wait(base + L2X0_CLEAN_LINE_PA, 1);
 	cache_sync();
-	spin_unlock_irqrestore(&l2x0_lock, flags);
+	_l2x0_unlock(&l2x0_lock, flags);
 }
 
 static void l2x0_flush_range(unsigned long start, unsigned long end)
@@ -183,10 +210,10 @@ static void l2x0_flush_range(unsigned long start, unsigned long end)
 	void __iomem *base = l2x0_base;
 	unsigned long flags;
 
-	spin_lock_irqsave(&l2x0_lock, flags);
+	_l2x0_lock(&l2x0_lock, flags);
 	start &= ~(CACHE_LINE_SIZE - 1);
 	while (start < end) {
-		unsigned long blk_end = start + min(end - start, 4096UL);
+		unsigned long blk_end = block_end(start, end);
 
 		debug_writel(0x03);
 		while (start < blk_end) {
@@ -196,13 +223,13 @@ static void l2x0_flush_range(unsigned long start, unsigned long end)
 		debug_writel(0x00);
 
 		if (blk_end < end) {
-			spin_unlock_irqrestore(&l2x0_lock, flags);
-			spin_lock_irqsave(&l2x0_lock, flags);
+			_l2x0_unlock(&l2x0_lock, flags);
+			_l2x0_lock(&l2x0_lock, flags);
 		}
 	}
 	cache_wait(base + L2X0_CLEAN_INV_LINE_PA, 1);
 	cache_sync();
-	spin_unlock_irqrestore(&l2x0_lock, flags);
+	_l2x0_unlock(&l2x0_lock, flags);
 }
 
 void __init l2x0_init(void __iomem *base, __u32 aux_val, __u32 aux_mask)
@@ -236,5 +263,5 @@ void __init l2x0_init(void __iomem *base, __u32 aux_val, __u32 aux_mask)
 	outer_cache.flush_range = l2x0_flush_range;
 	outer_cache.sync = l2x0_cache_sync;
 
-	printk(KERN_INFO "L2X0 cache controller enabled\n");
+	pr_info(L2CC_TYPE " cache controller enabled\n");
 }


* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

The Snoop Control Unit on the ARM11MPCore hardware does not detect cache
maintenance operations, so the dma_cache_maint*() functions may leave
stale cache entries on the other CPUs. The solution implemented in this
patch performs a Read or Write For Ownership in the ARMv6 DMA cache
maintenance functions. These LDR/STR instructions change the cache line
state to Shared or Exclusive on the local CPU so that the subsequent
cache maintenance operation has the desired effect.
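For reference, the direction dispatch that v6_dma_map_area ends up with
(invalidate for FROM_DEVICE, clean for TO_DEVICE, clean+invalidate
otherwise) can be sketched in C; the enum values below mirror the
kernel's dma_data_direction constants but are redeclared here so the
sketch stands alone:

```c
#include <assert.h>
#include <string.h>

/* Values as in the kernel's enum dma_data_direction (redeclared) */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
};

/* Which maintenance op v6_dma_map_area branches to after this patch */
static const char *v6_map_op(enum dma_data_direction dir)
{
	if (dir == DMA_FROM_DEVICE)
		return "inv";	/* device will write: discard stale lines */
	if (dir == DMA_TO_DEVICE)
		return "clean";	/* device will read: push dirty lines out */
	return "flush";		/* bidirectional: clean + invalidate */
}
```

The unmap side becomes a no-op because all the maintenance is now done
at map time.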

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/mm/cache-v6.S |   17 +++++++++++++----
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 9d89c67..e46ecd8 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -211,6 +211,9 @@ v6_dma_inv_range:
 	mcrne	p15, 0, r1, c7, c15, 1		@ clean & invalidate unified line
 #endif
 1:
+#ifdef CONFIG_SMP
+	str	r0, [r0]			@ write for ownership
+#endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c6, 1		@ invalidate D line
 #else
@@ -231,6 +234,9 @@ v6_dma_inv_range:
 v6_dma_clean_range:
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
+#ifdef CONFIG_SMP
+	ldr	r2, [r0]			@ read for ownership
+#endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
 #else
@@ -251,6 +257,10 @@ v6_dma_clean_range:
 ENTRY(v6_dma_flush_range)
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
+#ifdef CONFIG_SMP
+	ldr	r2, [r0]			@ read for ownership
+	str	r2, [r0]			@ write for ownership
+#endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c14, 1		@ clean & invalidate D line
 #else
@@ -273,7 +283,9 @@ ENTRY(v6_dma_map_area)
 	add	r1, r1, r0
 	teq	r2, #DMA_FROM_DEVICE
 	beq	v6_dma_inv_range
-	b	v6_dma_clean_range
+	teq	r2, #DMA_TO_DEVICE
+	beq	v6_dma_clean_range
+	b	v6_dma_flush_range
 ENDPROC(v6_dma_map_area)
 
 /*
@@ -283,9 +295,6 @@ ENDPROC(v6_dma_map_area)
  *	- dir	- DMA direction
  */
 ENTRY(v6_dma_unmap_area)
-	add	r1, r1, r0
-	teq	r2, #DMA_TO_DEVICE
-	bne	v6_dma_inv_range
 	mov	pc, lr
 ENDPROC(v6_dma_unmap_area)
 


* [PATCH 3/8] ARM: Align machine_desc.phys_io to a 1MB section
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

Platforms like RealView don't pass a section-aligned pointer via the
machine_desc structure. This patch aligns the pointer in the
__create_page_tables function. Reported by Tony Thompson.
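The two mov instructions added below just round the address down to a
1MB boundary. The same operation in C (a sketch, assuming 32-bit
physical addresses):

```c
#include <assert.h>
#include <stdint.h>

/* C equivalent of the "lsr #20; lsl #20" pair added to head.S: clear
 * the low 20 bits so the address falls on a 1MB first-level section
 * boundary */
static uint32_t section_align(uint32_t phys_io)
{
	return (phys_io >> 20) << 20;
}
```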

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/kernel/head.S |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index eb62bf9..82ea924 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -302,6 +302,8 @@ __create_page_tables:
 	movhi	r3, #0x0800
 	add	r6, r0, r3
 	ldr	r3, [r8, #MACHINFO_PHYSIO]
+	mov	r3, r3, lsr #20			@ 1MB-aligned address
+	mov	r3, r3, lsl #20
 	orr	r3, r3, r7
 1:	str	r3, [r0], #4
 	add	r3, r3, #1 << 20


* [PATCH 4/8] ARM: Remove the domain switching on ARMv6k/v7 CPUs
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

This patch removes the domain switching functionality via the set_fs and
__switch_to functions on cores that have a TLS register.

Currently, the ioremap and vmalloc areas share the same level 1 page
tables and therefore have the same domain (DOMAIN_KERNEL). When the
kernel domain is modified from Client to Manager (via the __set_fs or in
the __switch_to function), the XN (eXecute Never) bit is overridden and
newer CPUs can speculatively prefetch the ioremap'ed memory.

Linux performs the kernel domain switching to allow the user access
functions (copy_to/from_user, get/put_user etc.) to access kernel
memory. In order for these functions to work with the kernel domain set
to Client, the patch replaces the LDRT/STRT and related instructions
with their LDR/STR counterparts.
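The T() helper introduced in asm/domain.h for the inline-assembly case
relies on preprocessor stringizing plus string-literal concatenation. A
standalone illustration; both variants are given distinct names here,
whereas the real header picks one based on CONFIG_CPU_USE_DOMAINS:

```c
#include <assert.h>
#include <string.h>

/* Mirrors the C (inline assembly) flavour of T() in asm/domain.h:
 * with domains in use the unprivileged "t" variant of the mnemonic is
 * emitted, otherwise the plain instruction is used. */
#define T_DOMAINS(instr)	#instr "t"	/* CONFIG_CPU_USE_DOMAINS=y */
#define T_NO_DOMAINS(instr)	#instr		/* CONFIG_CPU_USE_DOMAINS=n */
```

So a macro like `"1:	" T(ldr) "	%1,[%2]\n"` expands to either
`ldrt` or `ldr` at preprocessing time, with no runtime cost. The
assembly flavour does the same with token pasting (`instr ## t`).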

The access rights of user pages are also modified to kernel read-only
rather than read/write so that the copy-on-write mechanism still works.
CPU_USE_DOMAINS is only disabled if HAS_TLS_REG is defined, since
without a hardware TLS register the TLS value has to be written to the
high vectors page on every context switch, which is no longer possible
once the kernel domain is set to Client.

The user addresses passed to the kernel are checked by the access_ok()
function so that they do not point to the kernel space.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/assembler.h |   13 +++---
 arch/arm/include/asm/domain.h    |   31 +++++++++++++-
 arch/arm/include/asm/futex.h     |    9 ++--
 arch/arm/include/asm/uaccess.h   |   16 ++++---
 arch/arm/kernel/entry-armv.S     |    4 +-
 arch/arm/kernel/traps.c          |   17 ++++++++
 arch/arm/lib/getuser.S           |   13 +++---
 arch/arm/lib/putuser.S           |   29 +++++++------
 arch/arm/lib/uaccess.S           |   83 +++++++++++++++++++-------------------
 arch/arm/mm/Kconfig              |    8 ++++
 arch/arm/mm/proc-macros.S        |    7 +++
 arch/arm/mm/proc-v7.S            |    5 +-
 12 files changed, 150 insertions(+), 85 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 00f46d9..cc84083 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -18,6 +18,7 @@
 #endif
 
 #include <asm/ptrace.h>
+#include <asm/domain.h>
 
 /*
  * Endian independent macros for shifting bytes within registers.
@@ -183,12 +184,12 @@
  */
 #ifdef CONFIG_THUMB2_KERNEL
 
-	.macro	usraccoff, instr, reg, ptr, inc, off, cond, abort
+	.macro	usraccoff, instr, reg, ptr, inc, off, cond, abort, t=T()
 9999:
 	.if	\inc == 1
-	\instr\cond\()bt \reg, [\ptr, #\off]
+	\instr\cond\()b\()\t\().w \reg, [\ptr, #\off]
 	.elseif	\inc == 4
-	\instr\cond\()t \reg, [\ptr, #\off]
+	\instr\cond\()\t\().w \reg, [\ptr, #\off]
 	.else
 	.error	"Unsupported inc macro argument"
 	.endif
@@ -223,13 +224,13 @@
 
 #else	/* !CONFIG_THUMB2_KERNEL */
 
-	.macro	usracc, instr, reg, ptr, inc, cond, rept, abort
+	.macro	usracc, instr, reg, ptr, inc, cond, rept, abort, t=T()
 	.rept	\rept
 9999:
 	.if	\inc == 1
-	\instr\cond\()bt \reg, [\ptr], #\inc
+	\instr\cond\()b\()\t \reg, [\ptr], #\inc
 	.elseif	\inc == 4
-	\instr\cond\()t \reg, [\ptr], #\inc
+	\instr\cond\()\t \reg, [\ptr], #\inc
 	.else
 	.error	"Unsupported inc macro argument"
 	.endif
diff --git a/arch/arm/include/asm/domain.h b/arch/arm/include/asm/domain.h
index cc7ef40..af18cea 100644
--- a/arch/arm/include/asm/domain.h
+++ b/arch/arm/include/asm/domain.h
@@ -45,13 +45,17 @@
  */
 #define DOMAIN_NOACCESS	0
 #define DOMAIN_CLIENT	1
+#ifdef CONFIG_CPU_USE_DOMAINS
 #define DOMAIN_MANAGER	3
+#else
+#define DOMAIN_MANAGER	1
+#endif
 
 #define domain_val(dom,type)	((type) << (2*(dom)))
 
 #ifndef __ASSEMBLY__
 
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
 #define set_domain(x)					\
 	do {						\
 	__asm__ __volatile__(				\
@@ -74,5 +78,28 @@
 #define modify_domain(dom,type)	do { } while (0)
 #endif
 
+/*
+ * Generate the T (user) versions of the LDR/STR and related
+ * instructions (inline assembly)
+ */
+#ifdef CONFIG_CPU_USE_DOMAINS
+#define T(instr)	#instr "t"
+#else
+#define T(instr)	#instr
 #endif
-#endif /* !__ASSEMBLY__ */
+
+#else /* __ASSEMBLY__ */
+
+/*
+ * Generate the T (user) versions of the LDR/STR and related
+ * instructions
+ */
+#ifdef CONFIG_CPU_USE_DOMAINS
+#define T(instr)	instr ## t
+#else
+#define T(instr)	instr
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* !__ASM_PROC_DOMAIN_H */
diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index bfcc159..8d868bd 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -13,12 +13,13 @@
 #include <linux/preempt.h>
 #include <linux/uaccess.h>
 #include <asm/errno.h>
+#include <asm/domain.h>
 
 #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg)	\
 	__asm__ __volatile__(					\
-	"1:	ldrt	%1, [%2]\n"				\
+	"1:	" T(ldr) "	%1, [%2]\n"			\
 	"	" insn "\n"					\
-	"2:	strt	%0, [%2]\n"				\
+	"2:	" T(str) "	%0, [%2]\n"			\
 	"	mov	%0, #0\n"				\
 	"3:\n"							\
 	"	.section __ex_table,\"a\"\n"			\
@@ -97,10 +98,10 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
 	pagefault_disable();	/* implies preempt_disable() */
 
 	__asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
-	"1:	ldrt	%0, [%3]\n"
+	"1:	" T(ldr) "	%0, [%3]\n"
 	"	teq	%0, %1\n"
 	"	it	eq	@ explicit IT needed for the 2b label\n"
-	"2:	streqt	%2, [%3]\n"
+	"2:	" T(streq) "	%2, [%3]\n"
 	"3:\n"
 	"	.section __ex_table,\"a\"\n"
 	"	.align	3\n"
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 1d6bd40..e4d0905 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -227,7 +227,7 @@ do {									\
 
 #define __get_user_asm_byte(x,addr,err)				\
 	__asm__ __volatile__(					\
-	"1:	ldrbt	%1,[%2]\n"				\
+	"1:	" T(ldrb) "	%1,[%2],#0\n"			\
 	"2:\n"							\
 	"	.section .fixup,\"ax\"\n"			\
 	"	.align	2\n"					\
@@ -263,7 +263,7 @@ do {									\
 
 #define __get_user_asm_word(x,addr,err)				\
 	__asm__ __volatile__(					\
-	"1:	ldrt	%1,[%2]\n"				\
+	"1:	" T(ldr) "	%1,[%2],#0\n"			\
 	"2:\n"							\
 	"	.section .fixup,\"ax\"\n"			\
 	"	.align	2\n"					\
@@ -308,7 +308,7 @@ do {									\
 
 #define __put_user_asm_byte(x,__pu_addr,err)			\
 	__asm__ __volatile__(					\
-	"1:	strbt	%1,[%2]\n"				\
+	"1:	" T(strb) "	%1,[%2],#0\n"			\
 	"2:\n"							\
 	"	.section .fixup,\"ax\"\n"			\
 	"	.align	2\n"					\
@@ -341,7 +341,7 @@ do {									\
 
 #define __put_user_asm_word(x,__pu_addr,err)			\
 	__asm__ __volatile__(					\
-	"1:	strt	%1,[%2]\n"				\
+	"1:	" T(str) "	%1,[%2],#0\n"			\
 	"2:\n"							\
 	"	.section .fixup,\"ax\"\n"			\
 	"	.align	2\n"					\
@@ -366,10 +366,10 @@ do {									\
 
 #define __put_user_asm_dword(x,__pu_addr,err)			\
 	__asm__ __volatile__(					\
- ARM(	"1:	strt	" __reg_oper1 ", [%1], #4\n"	)	\
- ARM(	"2:	strt	" __reg_oper0 ", [%1]\n"	)	\
- THUMB(	"1:	strt	" __reg_oper1 ", [%1]\n"	)	\
- THUMB(	"2:	strt	" __reg_oper0 ", [%1, #4]\n"	)	\
+ ARM(	"1:	" T(str) "	" __reg_oper1 ", [%1], #4\n"	)	\
+ ARM(	"2:	" T(str) "	" __reg_oper0 ", [%1]\n"	)	\
+ THUMB(	"1:	" T(str) "	" __reg_oper1 ", [%1]\n"	)	\
+ THUMB(	"2:	" T(str) "	" __reg_oper0 ", [%1, #4]\n"	)	\
 	"3:\n"							\
 	"	.section .fixup,\"ax\"\n"			\
 	"	.align	2\n"					\
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 6c5cf36..694f7ab 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -736,7 +736,7 @@ ENTRY(__switch_to)
  THUMB(	stmia	ip!, {r4 - sl, fp}	   )	@ Store most regs on stack
  THUMB(	str	sp, [ip], #4		   )
  THUMB(	str	lr, [ip], #4		   )
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
 	ldr	r6, [r2, #TI_CPU_DOMAIN]
 #endif
 #if defined(CONFIG_HAS_TLS_REG)
@@ -745,7 +745,7 @@ ENTRY(__switch_to)
 	mov	r4, #0xffff0fff
 	str	r3, [r4, #-15]			@ TLS val at 0xffff0ff0
 #endif
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
 	mcr	p15, 0, r6, c3, c0, 0		@ Set domain register
 #endif
 	mov	r5, r0
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1621e53..6571e19 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -30,6 +30,7 @@
 #include <asm/unistd.h>
 #include <asm/traps.h>
 #include <asm/unwind.h>
+#include <asm/tlbflush.h>
 
 #include "ptrace.h"
 #include "signal.h"
@@ -750,6 +751,16 @@ void __init early_trap_init(void)
 	extern char __vectors_start[], __vectors_end[];
 	extern char __kuser_helper_start[], __kuser_helper_end[];
 	int kuser_sz = __kuser_helper_end - __kuser_helper_start;
+#if !defined(CONFIG_CPU_USE_DOMAINS) && defined(CONFIG_MMU)
+	pgd_t *pgd = pgd_offset_k(vectors);
+	pmd_t *pmd = pmd_offset(pgd, vectors);
+	pte_t *pte = pte_offset_kernel(pmd, vectors);
+	pte_t entry = *pte;
+
+	/* allow writing to the vectors page */
+	set_pte_ext(pte, pte_mkwrite(entry), 0);
+	local_flush_tlb_kernel_page(vectors);
+#endif
 
 	/*
 	 * Copy the vectors, stubs and kuser helpers (in entry-armv.S)
@@ -769,6 +780,12 @@ void __init early_trap_init(void)
 	memcpy((void *)KERN_RESTART_CODE, syscall_restart_code,
 	       sizeof(syscall_restart_code));
 
+#if !defined(CONFIG_CPU_USE_DOMAINS) && defined(CONFIG_MMU)
+	/* restore the vectors page permissions */
+	set_pte_ext(pte, entry, 0);
+	local_flush_tlb_kernel_page(vectors);
+#endif
+
 	flush_icache_range(vectors, vectors + PAGE_SIZE);
 	modify_domain(DOMAIN_USER, DOMAIN_CLIENT);
 }
diff --git a/arch/arm/lib/getuser.S b/arch/arm/lib/getuser.S
index a1814d9..acc966b 100644
--- a/arch/arm/lib/getuser.S
+++ b/arch/arm/lib/getuser.S
@@ -28,20 +28,21 @@
  */
 #include <linux/linkage.h>
 #include <asm/errno.h>
+#include <asm/domain.h>
 
 ENTRY(__get_user_1)
-1:	ldrbt	r2, [r0]
+1:	T(ldrb)	r2, [r0]
 	mov	r0, #0
 	mov	pc, lr
 ENDPROC(__get_user_1)
 
 ENTRY(__get_user_2)
 #ifdef CONFIG_THUMB2_KERNEL
-2:	ldrbt	r2, [r0]
-3:	ldrbt	r3, [r0, #1]
+2:	T(ldrb)	r2, [r0]
+3:	T(ldrb)	r3, [r0, #1]
 #else
-2:	ldrbt	r2, [r0], #1
-3:	ldrbt	r3, [r0]
+2:	T(ldrb)	r2, [r0], #1
+3:	T(ldrb)	r3, [r0]
 #endif
 #ifndef __ARMEB__
 	orr	r2, r2, r3, lsl #8
@@ -53,7 +54,7 @@ ENTRY(__get_user_2)
 ENDPROC(__get_user_2)
 
 ENTRY(__get_user_4)
-4:	ldrt	r2, [r0]
+4:	T(ldr)	r2, [r0]
 	mov	r0, #0
 	mov	pc, lr
 ENDPROC(__get_user_4)
diff --git a/arch/arm/lib/putuser.S b/arch/arm/lib/putuser.S
index 02fedbf..95b3fe8 100644
--- a/arch/arm/lib/putuser.S
+++ b/arch/arm/lib/putuser.S
@@ -28,9 +28,10 @@
  */
 #include <linux/linkage.h>
 #include <asm/errno.h>
+#include <asm/domain.h>
 
 ENTRY(__put_user_1)
-1:	strbt	r2, [r0]
+1:	T(strb)	r2, [r0]
 	mov	r0, #0
 	mov	pc, lr
 ENDPROC(__put_user_1)
@@ -39,19 +40,19 @@ ENTRY(__put_user_2)
 	mov	ip, r2, lsr #8
 #ifdef CONFIG_THUMB2_KERNEL
 #ifndef __ARMEB__
-2:	strbt	r2, [r0]
-3:	strbt	ip, [r0, #1]
+2:	T(strb)	r2, [r0]
+3:	T(strb)	ip, [r0, #1]
 #else
-2:	strbt	ip, [r0]
-3:	strbt	r2, [r0, #1]
+2:	T(strb)	ip, [r0]
+3:	T(strb)	r2, [r0, #1]
 #endif
 #else	/* !CONFIG_THUMB2_KERNEL */
 #ifndef __ARMEB__
-2:	strbt	r2, [r0], #1
-3:	strbt	ip, [r0]
+2:	T(strb)	r2, [r0], #1
+3:	T(strb)	ip, [r0]
 #else
-2:	strbt	ip, [r0], #1
-3:	strbt	r2, [r0]
+2:	T(strb)	ip, [r0], #1
+3:	T(strb)	r2, [r0]
 #endif
 #endif	/* CONFIG_THUMB2_KERNEL */
 	mov	r0, #0
@@ -59,18 +60,18 @@ ENTRY(__put_user_2)
 ENDPROC(__put_user_2)
 
 ENTRY(__put_user_4)
-4:	strt	r2, [r0]
+4:	T(str)	r2, [r0]
 	mov	r0, #0
 	mov	pc, lr
 ENDPROC(__put_user_4)
 
 ENTRY(__put_user_8)
 #ifdef CONFIG_THUMB2_KERNEL
-5:	strt	r2, [r0]
-6:	strt	r3, [r0, #4]
+5:	T(str)	r2, [r0]
+6:	T(str)	r3, [r0, #4]
 #else
-5:	strt	r2, [r0], #4
-6:	strt	r3, [r0]
+5:	T(str)	r2, [r0], #4
+6:	T(str)	r3, [r0]
 #endif
 	mov	r0, #0
 	mov	pc, lr
diff --git a/arch/arm/lib/uaccess.S b/arch/arm/lib/uaccess.S
index ffdd274..e47cdfd 100644
--- a/arch/arm/lib/uaccess.S
+++ b/arch/arm/lib/uaccess.S
@@ -14,6 +14,7 @@
 #include <linux/linkage.h>
 #include <asm/assembler.h>
 #include <asm/errno.h>
+#include <asm/domain.h>
 
 		.text
 
@@ -31,11 +32,11 @@
 		rsb	ip, ip, #4
 		cmp	ip, #2
 		ldrb	r3, [r1], #1
-USER(		strbt	r3, [r0], #1)			@ May fault
+USER(		T(strb)	r3, [r0], #1)			@ May fault
 		ldrgeb	r3, [r1], #1
-USER(		strgebt	r3, [r0], #1)			@ May fault
+USER(		T(strgeb) r3, [r0], #1)			@ May fault
 		ldrgtb	r3, [r1], #1
-USER(		strgtbt	r3, [r0], #1)			@ May fault
+USER(		T(strgtb) r3, [r0], #1)			@ May fault
 		sub	r2, r2, ip
 		b	.Lc2u_dest_aligned
 
@@ -58,7 +59,7 @@ ENTRY(__copy_to_user)
 		addmi	ip, r2, #4
 		bmi	.Lc2u_0nowords
 		ldr	r3, [r1], #4
-USER(		strt	r3, [r0], #4)			@ May fault
+USER(		T(str)	r3, [r0], #4)			@ May fault
 		mov	ip, r0, lsl #32 - PAGE_SHIFT	@ On each page, use a ld/st??t instruction
 		rsb	ip, ip, #0
 		movs	ip, ip, lsr #32 - PAGE_SHIFT
@@ -87,18 +88,18 @@ USER(		strt	r3, [r0], #4)			@ May fault
 		stmneia	r0!, {r3 - r4}			@ Shouldnt fault
 		tst	ip, #4
 		ldrne	r3, [r1], #4
-		strnet	r3, [r0], #4			@ Shouldnt fault
+		T(strne) r3, [r0], #4			@ Shouldnt fault
 		ands	ip, ip, #3
 		beq	.Lc2u_0fupi
 .Lc2u_0nowords:	teq	ip, #0
 		beq	.Lc2u_finished
 .Lc2u_nowords:	cmp	ip, #2
 		ldrb	r3, [r1], #1
-USER(		strbt	r3, [r0], #1)			@ May fault
+USER(		T(strb)	r3, [r0], #1)			@ May fault
 		ldrgeb	r3, [r1], #1
-USER(		strgebt	r3, [r0], #1)			@ May fault
+USER(		T(strgeb) r3, [r0], #1)			@ May fault
 		ldrgtb	r3, [r1], #1
-USER(		strgtbt	r3, [r0], #1)			@ May fault
+USER(		T(strgtb) r3, [r0], #1)			@ May fault
 		b	.Lc2u_finished
 
 .Lc2u_not_enough:
@@ -119,7 +120,7 @@ USER(		strgtbt	r3, [r0], #1)			@ May fault
 		mov	r3, r7, pull #8
 		ldr	r7, [r1], #4
 		orr	r3, r3, r7, push #24
-USER(		strt	r3, [r0], #4)			@ May fault
+USER(		T(str)	r3, [r0], #4)			@ May fault
 		mov	ip, r0, lsl #32 - PAGE_SHIFT
 		rsb	ip, ip, #0
 		movs	ip, ip, lsr #32 - PAGE_SHIFT
@@ -154,18 +155,18 @@ USER(		strt	r3, [r0], #4)			@ May fault
 		movne	r3, r7, pull #8
 		ldrne	r7, [r1], #4
 		orrne	r3, r3, r7, push #24
-		strnet	r3, [r0], #4			@ Shouldnt fault
+		T(strne) r3, [r0], #4			@ Shouldnt fault
 		ands	ip, ip, #3
 		beq	.Lc2u_1fupi
 .Lc2u_1nowords:	mov	r3, r7, get_byte_1
 		teq	ip, #0
 		beq	.Lc2u_finished
 		cmp	ip, #2
-USER(		strbt	r3, [r0], #1)			@ May fault
+USER(		T(strb)	r3, [r0], #1)			@ May fault
 		movge	r3, r7, get_byte_2
-USER(		strgebt	r3, [r0], #1)			@ May fault
+USER(		T(strgeb) r3, [r0], #1)			@ May fault
 		movgt	r3, r7, get_byte_3
-USER(		strgtbt	r3, [r0], #1)			@ May fault
+USER(		T(strgtb) r3, [r0], #1)			@ May fault
 		b	.Lc2u_finished
 
 .Lc2u_2fupi:	subs	r2, r2, #4
@@ -174,7 +175,7 @@ USER(		strgtbt	r3, [r0], #1)			@ May fault
 		mov	r3, r7, pull #16
 		ldr	r7, [r1], #4
 		orr	r3, r3, r7, push #16
-USER(		strt	r3, [r0], #4)			@ May fault
+USER(		T(str)	r3, [r0], #4)			@ May fault
 		mov	ip, r0, lsl #32 - PAGE_SHIFT
 		rsb	ip, ip, #0
 		movs	ip, ip, lsr #32 - PAGE_SHIFT
@@ -209,18 +210,18 @@ USER(		strt	r3, [r0], #4)			@ May fault
 		movne	r3, r7, pull #16
 		ldrne	r7, [r1], #4
 		orrne	r3, r3, r7, push #16
-		strnet	r3, [r0], #4			@ Shouldnt fault
+		T(strne) r3, [r0], #4			@ Shouldnt fault
 		ands	ip, ip, #3
 		beq	.Lc2u_2fupi
 .Lc2u_2nowords:	mov	r3, r7, get_byte_2
 		teq	ip, #0
 		beq	.Lc2u_finished
 		cmp	ip, #2
-USER(		strbt	r3, [r0], #1)			@ May fault
+USER(		T(strb)	r3, [r0], #1)			@ May fault
 		movge	r3, r7, get_byte_3
-USER(		strgebt	r3, [r0], #1)			@ May fault
+USER(		T(strgeb) r3, [r0], #1)			@ May fault
 		ldrgtb	r3, [r1], #0
-USER(		strgtbt	r3, [r0], #1)			@ May fault
+USER(		T(strgtb) r3, [r0], #1)			@ May fault
 		b	.Lc2u_finished
 
 .Lc2u_3fupi:	subs	r2, r2, #4
@@ -229,7 +230,7 @@ USER(		strgtbt	r3, [r0], #1)			@ May fault
 		mov	r3, r7, pull #24
 		ldr	r7, [r1], #4
 		orr	r3, r3, r7, push #8
-USER(		strt	r3, [r0], #4)			@ May fault
+USER(		T(str)	r3, [r0], #4)			@ May fault
 		mov	ip, r0, lsl #32 - PAGE_SHIFT
 		rsb	ip, ip, #0
 		movs	ip, ip, lsr #32 - PAGE_SHIFT
@@ -264,18 +265,18 @@ USER(		strt	r3, [r0], #4)			@ May fault
 		movne	r3, r7, pull #24
 		ldrne	r7, [r1], #4
 		orrne	r3, r3, r7, push #8
-		strnet	r3, [r0], #4			@ Shouldnt fault
+		T(strne) r3, [r0], #4			@ Shouldnt fault
 		ands	ip, ip, #3
 		beq	.Lc2u_3fupi
 .Lc2u_3nowords:	mov	r3, r7, get_byte_3
 		teq	ip, #0
 		beq	.Lc2u_finished
 		cmp	ip, #2
-USER(		strbt	r3, [r0], #1)			@ May fault
+USER(		T(strb)	r3, [r0], #1)			@ May fault
 		ldrgeb	r3, [r1], #1
-USER(		strgebt	r3, [r0], #1)			@ May fault
+USER(		T(strgeb) r3, [r0], #1)			@ May fault
 		ldrgtb	r3, [r1], #0
-USER(		strgtbt	r3, [r0], #1)			@ May fault
+USER(		T(strgtb) r3, [r0], #1)			@ May fault
 		b	.Lc2u_finished
 ENDPROC(__copy_to_user)
 
@@ -294,11 +295,11 @@ ENDPROC(__copy_to_user)
 .Lcfu_dest_not_aligned:
 		rsb	ip, ip, #4
 		cmp	ip, #2
-USER(		ldrbt	r3, [r1], #1)			@ May fault
+USER(		T(ldrb)	r3, [r1], #1)			@ May fault
 		strb	r3, [r0], #1
-USER(		ldrgebt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgeb) r3, [r1], #1)			@ May fault
 		strgeb	r3, [r0], #1
-USER(		ldrgtbt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgtb) r3, [r1], #1)			@ May fault
 		strgtb	r3, [r0], #1
 		sub	r2, r2, ip
 		b	.Lcfu_dest_aligned
@@ -321,7 +322,7 @@ ENTRY(__copy_from_user)
 .Lcfu_0fupi:	subs	r2, r2, #4
 		addmi	ip, r2, #4
 		bmi	.Lcfu_0nowords
-USER(		ldrt	r3, [r1], #4)
+USER(		T(ldr)	r3, [r1], #4)
 		str	r3, [r0], #4
 		mov	ip, r1, lsl #32 - PAGE_SHIFT	@ On each page, use a ld/st??t instruction
 		rsb	ip, ip, #0
@@ -350,18 +351,18 @@ USER(		ldrt	r3, [r1], #4)
 		ldmneia	r1!, {r3 - r4}			@ Shouldnt fault
 		stmneia	r0!, {r3 - r4}
 		tst	ip, #4
-		ldrnet	r3, [r1], #4			@ Shouldnt fault
+		T(ldrne) r3, [r1], #4			@ Shouldnt fault
 		strne	r3, [r0], #4
 		ands	ip, ip, #3
 		beq	.Lcfu_0fupi
 .Lcfu_0nowords:	teq	ip, #0
 		beq	.Lcfu_finished
 .Lcfu_nowords:	cmp	ip, #2
-USER(		ldrbt	r3, [r1], #1)			@ May fault
+USER(		T(ldrb)	r3, [r1], #1)			@ May fault
 		strb	r3, [r0], #1
-USER(		ldrgebt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgeb) r3, [r1], #1)			@ May fault
 		strgeb	r3, [r0], #1
-USER(		ldrgtbt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgtb) r3, [r1], #1)			@ May fault
 		strgtb	r3, [r0], #1
 		b	.Lcfu_finished
 
@@ -374,7 +375,7 @@ USER(		ldrgtbt	r3, [r1], #1)			@ May fault
 
 .Lcfu_src_not_aligned:
 		bic	r1, r1, #3
-USER(		ldrt	r7, [r1], #4)			@ May fault
+USER(		T(ldr)	r7, [r1], #4)			@ May fault
 		cmp	ip, #2
 		bgt	.Lcfu_3fupi
 		beq	.Lcfu_2fupi
@@ -382,7 +383,7 @@ USER(		ldrt	r7, [r1], #4)			@ May fault
 		addmi	ip, r2, #4
 		bmi	.Lcfu_1nowords
 		mov	r3, r7, pull #8
-USER(		ldrt	r7, [r1], #4)			@ May fault
+USER(		T(ldr)	r7, [r1], #4)			@ May fault
 		orr	r3, r3, r7, push #24
 		str	r3, [r0], #4
 		mov	ip, r1, lsl #32 - PAGE_SHIFT
@@ -417,7 +418,7 @@ USER(		ldrt	r7, [r1], #4)			@ May fault
 		stmneia	r0!, {r3 - r4}
 		tst	ip, #4
 		movne	r3, r7, pull #8
-USER(		ldrnet	r7, [r1], #4)			@ May fault
+USER(		T(ldrne) r7, [r1], #4)			@ May fault
 		orrne	r3, r3, r7, push #24
 		strne	r3, [r0], #4
 		ands	ip, ip, #3
@@ -437,7 +438,7 @@ USER(		ldrnet	r7, [r1], #4)			@ May fault
 		addmi	ip, r2, #4
 		bmi	.Lcfu_2nowords
 		mov	r3, r7, pull #16
-USER(		ldrt	r7, [r1], #4)			@ May fault
+USER(		T(ldr)	r7, [r1], #4)			@ May fault
 		orr	r3, r3, r7, push #16
 		str	r3, [r0], #4
 		mov	ip, r1, lsl #32 - PAGE_SHIFT
@@ -473,7 +474,7 @@ USER(		ldrt	r7, [r1], #4)			@ May fault
 		stmneia	r0!, {r3 - r4}
 		tst	ip, #4
 		movne	r3, r7, pull #16
-USER(		ldrnet	r7, [r1], #4)			@ May fault
+USER(		T(ldrne) r7, [r1], #4)			@ May fault
 		orrne	r3, r3, r7, push #16
 		strne	r3, [r0], #4
 		ands	ip, ip, #3
@@ -485,7 +486,7 @@ USER(		ldrnet	r7, [r1], #4)			@ May fault
 		strb	r3, [r0], #1
 		movge	r3, r7, get_byte_3
 		strgeb	r3, [r0], #1
-USER(		ldrgtbt	r3, [r1], #0)			@ May fault
+USER(		T(ldrgtb) r3, [r1], #0)			@ May fault
 		strgtb	r3, [r0], #1
 		b	.Lcfu_finished
 
@@ -493,7 +494,7 @@ USER(		ldrgtbt	r3, [r1], #0)			@ May fault
 		addmi	ip, r2, #4
 		bmi	.Lcfu_3nowords
 		mov	r3, r7, pull #24
-USER(		ldrt	r7, [r1], #4)			@ May fault
+USER(		T(ldr)	r7, [r1], #4)			@ May fault
 		orr	r3, r3, r7, push #8
 		str	r3, [r0], #4
 		mov	ip, r1, lsl #32 - PAGE_SHIFT
@@ -528,7 +529,7 @@ USER(		ldrt	r7, [r1], #4)			@ May fault
 		stmneia	r0!, {r3 - r4}
 		tst	ip, #4
 		movne	r3, r7, pull #24
-USER(		ldrnet	r7, [r1], #4)			@ May fault
+USER(		T(ldrne) r7, [r1], #4)			@ May fault
 		orrne	r3, r3, r7, push #8
 		strne	r3, [r0], #4
 		ands	ip, ip, #3
@@ -538,9 +539,9 @@ USER(		ldrnet	r7, [r1], #4)			@ May fault
 		beq	.Lcfu_finished
 		cmp	ip, #2
 		strb	r3, [r0], #1
-USER(		ldrgebt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgeb) r3, [r1], #1)			@ May fault
 		strgeb	r3, [r0], #1
-USER(		ldrgtbt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgtb) r3, [r1], #1)			@ May fault
 		strgtb	r3, [r0], #1
 		b	.Lcfu_finished
 ENDPROC(__copy_from_user)
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 5df74c1..755e081 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -597,6 +597,14 @@ config CPU_CP15_MPU
 	help
 	  Processor has the CP15 register, which has MPU related registers.
 
+config CPU_USE_DOMAINS
+	bool
+	depends on MMU
+	default y if !HAS_TLS_REG
+	help
+	  This option enables or disables the use of domain switching
+	  via the set_fs() function.
+
 #
 # CPU supports 36-bit I/O
 #
diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
index 7d63bea..337f102 100644
--- a/arch/arm/mm/proc-macros.S
+++ b/arch/arm/mm/proc-macros.S
@@ -99,6 +99,10 @@
  *  110x   0   1   0	r/w	r/o
  *  11x0   0   1   0	r/w	r/o
  *  1111   0   1   1	r/w	r/w
+ *
+ * If !CONFIG_CPU_USE_DOMAINS, the following permissions are changed:
+ *  110x   1   1   1	r/o	r/o
+ *  11x0   1   1   1	r/o	r/o
  */
 	.macro	armv6_mt_table pfx
 \pfx\()_mt_table:
@@ -138,8 +142,11 @@
 
 	tst	r1, #L_PTE_USER
 	orrne	r3, r3, #PTE_EXT_AP1
+#ifdef CONFIG_CPU_USE_DOMAINS
+	@ allow kernel read/write access to read-only user pages
 	tstne	r3, #PTE_EXT_APX
 	bicne	r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
+#endif
 
 	tst	r1, #L_PTE_EXEC
 	orreq	r3, r3, #PTE_EXT_XN
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 7aaf88a..c1c3fe0 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -152,8 +152,11 @@ ENTRY(cpu_v7_set_pte_ext)
 
 	tst	r1, #L_PTE_USER
 	orrne	r3, r3, #PTE_EXT_AP1
+#ifdef CONFIG_CPU_USE_DOMAINS
+	@ allow kernel read/write access to read-only user pages
 	tstne	r3, #PTE_EXT_APX
 	bicne	r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
+#endif
 
 	tst	r1, #L_PTE_EXEC
 	orreq	r3, r3, #PTE_EXT_XN
@@ -240,8 +243,6 @@ __v7_setup:
 	mcr	p15, 0, r10, c2, c0, 2		@ TTB control register
 	orr	r4, r4, #TTB_FLAGS
 	mcr	p15, 0, r4, c2, c0, 1		@ load TTB1
-	mov	r10, #0x1f			@ domains 0, 1 = manager
-	mcr	p15, 0, r10, c3, c0, 0		@ load domain access register
 	/*
 	 * Memory region attributes with SCTLR.TRE=1
 	 *

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 5/8] ARM: Fix the __arm_ioremap_caller() definition in nommu.c
  2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
                   ` (3 preceding siblings ...)
  2010-05-04 16:44 ` [PATCH 4/8] ARM: Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas
@ 2010-05-04 16:44 ` Catalin Marinas
  2010-05-04 17:19   ` Russell King - ARM Linux
  2010-05-04 16:44 ` [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU Catalin Marinas
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

Commit 31aa8fd6 introduced the __arm_ioremap_caller() function but the
nommu.c version did not have the _caller suffix.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/mm/nommu.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c
index 9bfeb6b..f8791ee 100644
--- a/arch/arm/mm/nommu.c
+++ b/arch/arm/mm/nommu.c
@@ -87,8 +87,8 @@ void __iomem *__arm_ioremap(unsigned long phys_addr, size_t size,
 }
 EXPORT_SYMBOL(__arm_ioremap);
 
-void __iomem *__arm_ioremap(unsigned long phys_addr, size_t size,
-			    unsigned int mtype, void *caller)
+void __iomem *__arm_ioremap_caller(unsigned long phys_addr, size_t size,
+				   unsigned int mtype, void *caller)
 {
 	return __arm_ioremap(phys_addr, size, mtype);
 }

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU
  2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
                   ` (4 preceding siblings ...)
  2010-05-04 16:44 ` [PATCH 5/8] ARM: Fix the __arm_ioremap_caller() definition in nommu.c Catalin Marinas
@ 2010-05-04 16:44 ` Catalin Marinas
  2010-05-04 17:19   ` Russell King - ARM Linux
  2010-05-04 16:44 ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP Catalin Marinas
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

Commit 7959722 introduced calls to copy_(to|from)_user_page() from
access_process_vm() in mm/nommu.c. The copy_to_user_page() was not
implemented on noMMU ARM.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/mm/nommu.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c
index f8791ee..33b3273 100644
--- a/arch/arm/mm/nommu.c
+++ b/arch/arm/mm/nommu.c
@@ -65,6 +65,15 @@ void flush_dcache_page(struct page *page)
 }
 EXPORT_SYMBOL(flush_dcache_page);
 
+void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
+		       unsigned long uaddr, void *dst, const void *src,
+		       unsigned long len)
+{
+	memcpy(dst, src, len);
+	if (vma->vm_flags & VM_EXEC)
+		__cpuc_coherent_user_range(uaddr, uaddr + len);
+}
+
 void __iomem *__arm_ioremap_pfn(unsigned long pfn, unsigned long offset,
 				size_t size, unsigned int mtype)
 {

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
  2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
                   ` (5 preceding siblings ...)
  2010-05-04 16:44 ` [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU Catalin Marinas
@ 2010-05-04 16:44 ` Catalin Marinas
  2010-05-28 18:50   ` Russell King - ARM Linux
  2010-05-04 16:44 ` [PATCH 8/8] ARM: Implement phys_mem_access_prot() to avoid attributes aliasing Catalin Marinas
  2010-05-04 16:48 ` [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
  8 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

The standard I-cache Invalidate All (ICIALLU) and Branch Prediction
Invalidate All (BPIALL) operations are not automatically broadcast to
the other CPUs in an ARMv7 MP system. This patch uses the Inner Shareable
variants, ICIALLUIS and BPIALLIS, when built for ARMv7 SMP.
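The compile-time selection described above (matching the
`#elif defined(CONFIG_SMP) && __LINUX_ARM_ARCH__ >= 7` case in the
cacheflush.h hunk below) can be sketched in plain C. The names here are
illustrative stand-ins, not kernel API, and the erratum 411920 special
case is ignored for the sketch:

```c
#include <assert.h>

/* Illustrative constants, not kernel definitions. */
enum icache_op { ICIALLU, ICIALLUIS };

/* Pick the I-cache invalidate operation: only an ARMv7 SMP build gets
 * the Inner Shareable variant, which is broadcast to the other CPUs;
 * everything else uses the local ICIALLU. */
static enum icache_op pick_icache_inval(int linux_arm_arch, int smp)
{
    return (smp && linux_arm_arch >= 7) ? ICIALLUIS : ICIALLU;
}
```

The point of the sketch is that a UP kernel, or any pre-v7 build, keeps
the cheaper local operation.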

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/cacheflush.h |    4 ++++
 arch/arm/include/asm/tlbflush.h   |   29 ++++++++++++++++++++++++++++-
 arch/arm/mm/cache-v7.S            |    4 ++++
 arch/arm/mm/tlb-v7.S              |    8 ++++++++
 4 files changed, 44 insertions(+), 1 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 0d08d41..4656a24 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -371,6 +371,10 @@ static inline void __flush_icache_all(void)
 #ifdef CONFIG_ARM_ERRATA_411920
 	extern void v6_icache_inval_all(void);
 	v6_icache_inval_all();
+#elif defined(CONFIG_SMP) && __LINUX_ARM_ARCH__ >= 7
+	asm("mcr	p15, 0, %0, c7, c1, 0	@ invalidate I-cache inner shareable\n"
+	    :
+	    : "r" (0));
 #else
 	asm("mcr	p15, 0, %0, c7, c5, 0	@ invalidate I-cache\n"
 	    :
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index e085e2c..bd863d8 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -46,6 +46,9 @@
 #define TLB_V7_UIS_FULL (1 << 20)
 #define TLB_V7_UIS_ASID (1 << 21)
 
+/* Inner Shareable BTB operation (ARMv7 MP extensions) */
+#define TLB_V7_IS_BTB	(1 << 22)
+
 #define TLB_L2CLEAN_FR	(1 << 29)		/* Feroceon */
 #define TLB_DCLEAN	(1 << 30)
 #define TLB_WB		(1 << 31)
@@ -183,7 +186,7 @@
 #endif
 
 #ifdef CONFIG_SMP
-#define v7wbi_tlb_flags (TLB_WB | TLB_DCLEAN | TLB_BTB | \
+#define v7wbi_tlb_flags (TLB_WB | TLB_DCLEAN | TLB_V7_IS_BTB | \
 			 TLB_V7_UIS_FULL | TLB_V7_UIS_PAGE | TLB_V7_UIS_ASID)
 #else
 #define v7wbi_tlb_flags (TLB_WB | TLB_DCLEAN | TLB_BTB | \
@@ -339,6 +342,12 @@ static inline void local_flush_tlb_all(void)
 		dsb();
 		isb();
 	}
+	if (tlb_flag(TLB_V7_IS_BTB)) {
+		/* flush the branch target cache */
+		asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero) : "cc");
+		dsb();
+		isb();
+	}
 }
 
 static inline void local_flush_tlb_mm(struct mm_struct *mm)
@@ -376,6 +385,12 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm)
 		asm("mcr p15, 0, %0, c7, c5, 6" : : "r" (zero) : "cc");
 		dsb();
 	}
+	if (tlb_flag(TLB_V7_IS_BTB)) {
+		/* flush the branch target cache */
+		asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero) : "cc");
+		dsb();
+		isb();
+	}
 }
 
 static inline void
@@ -416,6 +431,12 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		asm("mcr p15, 0, %0, c7, c5, 6" : : "r" (zero) : "cc");
 		dsb();
 	}
+	if (tlb_flag(TLB_V7_IS_BTB)) {
+		/* flush the branch target cache */
+		asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero) : "cc");
+		dsb();
+		isb();
+	}
 }
 
 static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
@@ -454,6 +475,12 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
 		dsb();
 		isb();
 	}
+	if (tlb_flag(TLB_V7_IS_BTB)) {
+		/* flush the branch target cache */
+		asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero) : "cc");
+		dsb();
+		isb();
+	}
 }
 
 /*
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index bcd64f2..06a90dc 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -167,7 +167,11 @@ ENTRY(v7_coherent_user_range)
 	cmp	r0, r1
 	blo	1b
 	mov	r0, #0
+#ifdef CONFIG_SMP
+	mcr	p15, 0, r0, c7, c1, 6		@ invalidate BTB Inner Shareable
+#else
 	mcr	p15, 0, r0, c7, c5, 6		@ invalidate BTB
+#endif
 	dsb
 	isb
 	mov	pc, lr
diff --git a/arch/arm/mm/tlb-v7.S b/arch/arm/mm/tlb-v7.S
index 0cb1848..f3f288a 100644
--- a/arch/arm/mm/tlb-v7.S
+++ b/arch/arm/mm/tlb-v7.S
@@ -50,7 +50,11 @@ ENTRY(v7wbi_flush_user_tlb_range)
 	cmp	r0, r1
 	blo	1b
 	mov	ip, #0
+#ifdef CONFIG_SMP
+	mcr	p15, 0, ip, c7, c1, 6		@ flush BTAC/BTB Inner Shareable
+#else
 	mcr	p15, 0, ip, c7, c5, 6		@ flush BTAC/BTB
+#endif
 	dsb
 	mov	pc, lr
 ENDPROC(v7wbi_flush_user_tlb_range)
@@ -79,7 +83,11 @@ ENTRY(v7wbi_flush_kern_tlb_range)
 	cmp	r0, r1
 	blo	1b
 	mov	r2, #0
+#ifdef CONFIG_SMP
+	mcr	p15, 0, r2, c7, c1, 6		@ flush BTAC/BTB Inner Shareable
+#else
 	mcr	p15, 0, r2, c7, c5, 6		@ flush BTAC/BTB
+#endif
 	dsb
 	isb
 	mov	pc, lr

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 8/8] ARM: Implement phys_mem_access_prot() to avoid attributes aliasing
  2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
                   ` (6 preceding siblings ...)
  2010-05-04 16:44 ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP Catalin Marinas
@ 2010-05-04 16:44 ` Catalin Marinas
  2010-05-04 16:48 ` [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
  8 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

ARMv7 onwards requires that there are no aliases to the same physical
location using different memory types (i.e. Normal vs Strongly Ordered).
Accesses to Strongly Ordered mappings are also Unpredictable when
unaligned access handling is done in hardware (this affects
pgprot_noncached() mappings in user space).

The /dev/mem driver requires uncached mappings with O_SYNC. The patch
implements the phys_mem_access_prot() function which generates Strongly
Ordered memory attributes if !pfn_valid() (independent of O_SYNC) and
Normal Noncacheable (writecombine) if O_SYNC.
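The attribute selection described above can be modelled in a small,
self-contained C sketch. Everything here is an illustrative stand-in
(the `pfn_valid_stub()` RAM boundary and the `O_SYNC_FLAG` value are
arbitrary assumptions for the sketch, not the kernel's definitions):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative memory-attribute kinds, not kernel pgprot values. */
enum prot { PROT_DEFAULT, PROT_STRONGLY_ORDERED, PROT_WRITECOMBINE };

#define O_SYNC_FLAG 010000          /* stand-in for the real O_SYNC */

/* Stand-in for pfn_valid(): pretend RAM occupies pfns below a limit. */
static bool pfn_valid_stub(unsigned long pfn)
{
    return pfn < 0x80000;
}

static enum prot mem_access_prot(unsigned long pfn, int f_flags)
{
    if (!pfn_valid_stub(pfn))
        return PROT_STRONGLY_ORDERED;  /* not RAM: SO, independent of O_SYNC */
    if (f_flags & O_SYNC_FLAG)
        return PROT_WRITECOMBINE;      /* RAM + O_SYNC: Normal Noncacheable */
    return PROT_DEFAULT;               /* plain RAM: keep vma_prot as-is */
}
```

This mirrors the order of the checks in the mmu.c hunk below: the
!pfn_valid() test wins even when O_SYNC is set, which is what avoids the
Normal/Strongly-Ordered aliasing for device regions.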

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/pgtable.h |    3 +++
 arch/arm/mm/mmu.c              |   14 ++++++++++++++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 1139768..6fae619 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -317,6 +317,9 @@ static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
 #if __LINUX_ARM_ARCH__ >= 7
 #define pgprot_dmacoherent(prot) \
 	__pgprot_modify(prot, L_PTE_MT_MASK|L_PTE_EXEC, L_PTE_MT_BUFFERABLE)
+#define __HAVE_PHYS_MEM_ACCESS_PROT
+extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+				     unsigned long size, pgprot_t vma_prot);
 #else
 #define pgprot_dmacoherent(prot) \
 	__pgprot_modify(prot, L_PTE_MT_MASK|L_PTE_EXEC, L_PTE_MT_UNCACHED)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 241c24a..deb62cc 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -14,6 +14,7 @@
 #include <linux/bootmem.h>
 #include <linux/mman.h>
 #include <linux/nodemask.h>
+#include <linux/fs.h>
 
 #include <asm/cputype.h>
 #include <asm/mach-types.h>
@@ -485,6 +486,19 @@ static void __init build_mem_type_table(void)
 	}
 }
 
+#if __LINUX_ARM_ARCH__ >= 7
+pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+			      unsigned long size, pgprot_t vma_prot)
+{
+	if (!pfn_valid(pfn))
+		return pgprot_noncached(vma_prot);
+	else if (file->f_flags & O_SYNC)
+		return pgprot_writecombine(vma_prot);
+	return vma_prot;
+}
+EXPORT_SYMBOL(phys_mem_access_prot);
+#endif
+
 #define vectors_base()	(vectors_high() ? 0xffff0000 : 0)
 
 static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 0/8] Various patches for comments and upstream
  2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
                   ` (7 preceding siblings ...)
  2010-05-04 16:44 ` [PATCH 8/8] ARM: Implement phys_mem_access_prot() to avoid attributes aliasing Catalin Marinas
@ 2010-05-04 16:48 ` Catalin Marinas
  8 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-04 16:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2010-05-04 at 17:44 +0100, Catalin Marinas wrote:
>  drivers/net/smsc911x.c            |   92 +++++++++++++++++++++----------------
>  mm/bootmem.c                      |    2 +
>  mm/page_alloc.c                   |    1

The summary stats here are wrong (StGit not dealing well with
non-consecutive patches in my series). The individual patches that I
posted are fine.

-- 
Catalin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 1/8] ARM: Improve the L2 cache performance when PL310 is used
  2010-05-04 16:44 ` [PATCH 1/8] ARM: Improve the L2 cache performance when PL310 is used Catalin Marinas
@ 2010-05-04 17:02   ` Jason McMullan
  2010-05-05 16:07     ` Catalin Marinas
  0 siblings, 1 reply; 64+ messages in thread
From: Jason McMullan @ 2010-05-04 17:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 4, 2010 at 12:44 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> With this L2 cache controller, the cache maintenance by PA and sync
> operations are atomic and do not require a "wait" loop or spinlocks.
> This patch conditionally defines the cache_wait() function and locking
> primitives (rather than duplicating the functions or file).
>
> Since L2x0 cache controllers do not work with ARMv7 CPUs, the patch
> automatically enables CACHE_PL310 when CPU_V7 is defined.
>
> [snip snip snip]
> >  static inline void l2x0_inv_all(void)
> > @@ -107,11 +134,11 @@ static inline void l2x0_inv_all(void)
> >        unsigned long flags;
> >
> >        /* invalidate all ways */
> > -       spin_lock_irqsave(&l2x0_lock, flags);
> > +       _l2x0_lock(&l2x0_lock, flags);
> >        writel(0xff, l2x0_base + L2X0_INV_WAY);
> > -       cache_wait(l2x0_base + L2X0_INV_WAY, 0xff);
> > +       cache_wait_always(l2x0_base + L2X0_INV_WAY, 0xff);
> >        cache_sync();
> > -       spin_unlock_irqrestore(&l2x0_lock, flags);
> > +       _l2x0_unlock(&l2x0_lock, flags);
> >  }

So, ah, shouldn't you be using a mask of 0xffff for 16-way PL310s?

And I think we have a potential patch collision in the near future.

Could you integrate in my [arm l2x0] patch I posted today on the list?

It supports 16-way PL310s, and PL210s with fewer than 7 ways.

--
Jason S. McMullan
Netronome Systems, Inc.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-04 16:44 ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops Catalin Marinas
@ 2010-05-04 17:04   ` Jason McMullan
  2010-05-05 16:23     ` Catalin Marinas
  2010-05-05 13:26   ` George G. Davis
  2010-05-12 12:51   ` Ronen Shitrit
  2 siblings, 1 reply; 64+ messages in thread
From: Jason McMullan @ 2010-05-04 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 4, 2010 at 12:44 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> The Snoop Control Unit on the ARM11MPCore hardware does not detect the
> cache operations and the dma_cache_maint*() functions may leave stale
> cache entries on other CPUs. The solution implemented in this patch
> performs a Read or Write For Ownership in the ARMv6 DMA cache
> maintenance functions. These LDR/STR instructions change the cache line
> state to shared or exclusive so that the cache maintenance operation has
> the desired effect.

Is the latter portion of this patch required only for SMP MPCore systems,
or is it also required for uniprocessor MPCore configurations?

--
Jason S. McMullan
Netronome Systems, Inc.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 5/8] ARM: Fix the __arm_ioremap_caller() definition in nommu.c
  2010-05-04 16:44 ` [PATCH 5/8] ARM: Fix the __arm_ioremap_caller() definition in nommu.c Catalin Marinas
@ 2010-05-04 17:19   ` Russell King - ARM Linux
  0 siblings, 0 replies; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-04 17:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 04, 2010 at 05:44:43PM +0100, Catalin Marinas wrote:
> Commit 31aa8fd6 introduced the __arm_ioremap_caller() function but the
> nommu.c version did not have the _caller suffix.

Ok.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU
  2010-05-04 16:44 ` [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU Catalin Marinas
@ 2010-05-04 17:19   ` Russell King - ARM Linux
  0 siblings, 0 replies; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-04 17:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 04, 2010 at 05:44:48PM +0100, Catalin Marinas wrote:
> Commit 7959722 introduced calls to copy_(to|from)_user_page() from
> access_process_vm() in mm/nommu.c. The copy_to_user_page() was not
> implemented on noMMU ARM.

Ok.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-04 16:44 ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops Catalin Marinas
  2010-05-04 17:04   ` Jason McMullan
@ 2010-05-05 13:26   ` George G. Davis
  2010-05-06 14:40     ` Catalin Marinas
  2010-05-12 12:51   ` Ronen Shitrit
  2 siblings, 1 reply; 64+ messages in thread
From: George G. Davis @ 2010-05-05 13:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Definitely need this for stable DMA on ARM11 MPCore.

On Tue, May 04, 2010 at 05:44:26PM +0100, Catalin Marinas wrote:
> The Snoop Control Unit on the ARM11MPCore hardware does not detect the
> cache operations and the dma_cache_maint*() functions may leave stale
> cache entries on other CPUs. The solution implemented in this patch
> performs a Read or Write For Ownership in the ARMv6 DMA cache
> maintenance functions. These LDR/STR instructions change the cache line
> state to shared or exclusive so that the cache maintenance operation has
> the desired effect.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Tested-by: George G. Davis <gdavis@mvista.com>
---

FWIW, lack of ARM11 MPCore DMA cache coherency has been a problem for
well over two years now and it would be good if we can finally get this
fixed in mainline.  Without this applied on current, I observe various
oopses and/or filesystem errors which are resolved by this patch.

If there is some other testing that I can do to help getting this
or some other variation accepted for mainline, let me know.

Thanks!

--
Regards,
George


> ---
>  arch/arm/mm/cache-v6.S |   17 +++++++++++++----
>  1 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index 9d89c67..e46ecd8 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -211,6 +211,9 @@ v6_dma_inv_range:
>  	mcrne	p15, 0, r1, c7, c15, 1		@ clean & invalidate unified line
>  #endif
>  1:
> +#ifdef CONFIG_SMP
> +	str	r0, [r0]			@ write for ownership
> +#endif
>  #ifdef HARVARD_CACHE
>  	mcr	p15, 0, r0, c7, c6, 1		@ invalidate D line
>  #else
> @@ -231,6 +234,9 @@ v6_dma_inv_range:
>  v6_dma_clean_range:
>  	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>  1:
> +#ifdef CONFIG_SMP
> +	ldr	r2, [r0]			@ read for ownership
> +#endif
>  #ifdef HARVARD_CACHE
>  	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
>  #else
> @@ -251,6 +257,10 @@ v6_dma_clean_range:
>  ENTRY(v6_dma_flush_range)
>  	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>  1:
> +#ifdef CONFIG_SMP
> +	ldr	r2, [r0]			@ read for ownership
> +	str	r2, [r0]			@ write for ownership
> +#endif
>  #ifdef HARVARD_CACHE
>  	mcr	p15, 0, r0, c7, c14, 1		@ clean & invalidate D line
>  #else
> @@ -273,7 +283,9 @@ ENTRY(v6_dma_map_area)
>  	add	r1, r1, r0
>  	teq	r2, #DMA_FROM_DEVICE
>  	beq	v6_dma_inv_range
> -	b	v6_dma_clean_range
> +	teq	r2, #DMA_TO_DEVICE
> +	beq	v6_dma_clean_range
> +	b	v6_dma_flush_range
>  ENDPROC(v6_dma_map_area)
>  
>  /*
> @@ -283,9 +295,6 @@ ENDPROC(v6_dma_map_area)
>   *	- dir	- DMA direction
>   */
>  ENTRY(v6_dma_unmap_area)
> -	add	r1, r1, r0
> -	teq	r2, #DMA_TO_DEVICE
> -	bne	v6_dma_inv_range
>  	mov	pc, lr
>  ENDPROC(v6_dma_unmap_area)
>  
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 1/8] ARM: Improve the L2 cache performance when PL310 is used
  2010-05-04 17:02   ` Jason McMullan
@ 2010-05-05 16:07     ` Catalin Marinas
  0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-05 16:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2010-05-04 at 18:02 +0100, Jason McMullan wrote:
> On Tue, May 4, 2010 at 12:44 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > With this L2 cache controller, the cache maintenance by PA and sync
> > operations are atomic and do not require a "wait" loop or spinlocks.
> > This patch conditionally defines the cache_wait() function and locking
> > primitives (rather than duplicating the functions or file).
> >
> > Since L2x0 cache controllers do not work with ARMv7 CPUs, the patch
> > automatically enables CACHE_PL310 when CPU_V7 is defined.
> >
> > [snip snip snip]
> >  static inline void l2x0_inv_all(void)
> > @@ -107,11 +134,11 @@ static inline void l2x0_inv_all(void)
> >        unsigned long flags;
> >
> >        /* invalidate all ways */
> > -       spin_lock_irqsave(&l2x0_lock, flags);
> > +       _l2x0_lock(&l2x0_lock, flags);
> >        writel(0xff, l2x0_base + L2X0_INV_WAY);
> > -       cache_wait(l2x0_base + L2X0_INV_WAY, 0xff);
> > +       cache_wait_always(l2x0_base + L2X0_INV_WAY, 0xff);
> >        cache_sync();
> > -       spin_unlock_irqrestore(&l2x0_lock, flags);
> > +       _l2x0_unlock(&l2x0_lock, flags);
> >  }
> 
> So, ah, shouldn't you be using a mask of 0xffff for 16-way PL310s?
> 
> And I think we have a potential patch collision in the near future.
> 
> Could you integrate in my [arm l2x0] patch I posted today on the list?
> 
> It supports 16-way PL310s, and PL210s with fewer than 7 ways.

I think the best way is to just go ahead and push your patch to
Russell's patch system; I'll update my patch based on yours once I get
some feedback on this list.

-- 
Catalin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-04 17:04   ` Jason McMullan
@ 2010-05-05 16:23     ` Catalin Marinas
  0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-05 16:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2010-05-04 at 18:04 +0100, Jason McMullan wrote:
> On Tue, May 4, 2010 at 12:44 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > The Snoop Control Unit on the ARM11MPCore hardware does not detect the
> > cache operations and the dma_cache_maint*() functions may leave stale
> > cache entries on other CPUs. The solution implemented in this patch
> > performs a Read or Write For Ownership in the ARMv6 DMA cache
> > maintenance functions. These LDR/STR instructions change the cache line
> > state to shared or exclusive so that the cache maintenance operation has
> > the desired effect.
> 
> Is latter portion of this patch required only for SMP MPCore systems, or is
> it also required for uniprocessor MPCore configurations?

It's required for MPCore to work correctly but it's harmless for UP.
It's actually a performance improvement since we don't invalidate the
D-cache twice in map and unmap (and we don't have speculative fetches
into the D-cache on v6).
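The stale-line problem being fixed here can be shown with a toy two-CPU
model. This is a deliberately crude sketch (not hardware-accurate, and
all names are made up): on ARM11 MPCore the SCU does not broadcast cache
maintenance, so a local invalidate on CPU0 leaves a stale copy on CPU1,
while a preceding write-for-ownership first pulls the line exclusive to
CPU0:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: whether each of two CPUs holds a copy of one cache line. */
struct line { bool present[2]; };

/* The STR in the patch: a store claims the line exclusively on this
 * CPU, which removes the other CPU's copy via the coherency protocol. */
static void write_for_ownership(struct line *l, int cpu)
{
    l->present[cpu] = true;
    l->present[!cpu] = false;
}

/* A maintenance op that is NOT broadcast: only this CPU's copy goes. */
static void local_invalidate(struct line *l, int cpu)
{
    l->present[cpu] = false;
}

static bool stale_copy_left(const struct line *l)
{
    return l->present[0] || l->present[1];
}
```

Without the write-for-ownership step, `local_invalidate()` on CPU0
leaves CPU1's copy behind, which is the DMA corruption George reported;
with it, no stale copy survives.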

-- 
Catalin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-05 13:26   ` George G. Davis
@ 2010-05-06 14:40     ` Catalin Marinas
  2010-05-06 15:57       ` George G. Davis
  0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-06 14:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2010-05-05 at 14:26 +0100, George G. Davis wrote:
> Definitely need this for stable DMA on ARM11 MPCore.
> 
> On Tue, May 04, 2010 at 05:44:26PM +0100, Catalin Marinas wrote:
> > The Snoop Control Unit on the ARM11MPCore hardware does not detect the
> > cache operations and the dma_cache_maint*() functions may leave stale
> > cache entries on other CPUs. The solution implemented in this patch
> > performs a Read or Write For Ownership in the ARMv6 DMA cache
> > maintenance functions. These LDR/STR instructions change the cache line
> > state to shared or exclusive so that the cache maintenance operation has
> > the desired effect.
> >
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> 
> Tested-by: George G. Davis <gdavis@mvista.com>

Thanks.

> FWIW, lack of ARM11 MPCore DMA cache coherency has been a problem for
> well over two years now and it would be good if we can finally get this
> fixed in mainline.  Without this applied on current, I observe various
> oopses and/or filesystem errors which are resolved by this patch.

What platform was this tested on?

> If there is some other testing that I can do to help getting this
> or some other variation accepted for mainline, let me know.

The more platforms tested, the better. Anyway, I don't think we have
much choice in how this workaround is implemented, unless the hardware
supports FIQs.

-- 
Catalin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-06 14:40     ` Catalin Marinas
@ 2010-05-06 15:57       ` George G. Davis
  0 siblings, 0 replies; 64+ messages in thread
From: George G. Davis @ 2010-05-06 15:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 06, 2010 at 03:40:32PM +0100, Catalin Marinas wrote:
> On Wed, 2010-05-05 at 14:26 +0100, George G. Davis wrote:
> > Definitely need this for stable DMA on ARM11 MPCore.
> > 
> > On Tue, May 04, 2010 at 05:44:26PM +0100, Catalin Marinas wrote:
> > > The Snoop Control Unit on the ARM11MPCore hardware does not detect the
> > > cache operations and the dma_cache_maint*() functions may leave stale
> > > cache entries on other CPUs. The solution implemented in this patch
> > > performs a Read or Write For Ownership in the ARMv6 DMA cache
> > > maintenance functions. These LDR/STR instructions change the cache line
> > > state to shared or exclusive so that the cache maintenance operation has
> > > the desired effect.
> > >
> > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > 
> > Tested-by: George G. Davis <gdavis@mvista.com>
> 
> Thanks.
> 
> > FWIW, lack of ARM11 MPCore DMA cache coherency has been a problem for
> > well over two years now and it would be good if we can finally get this
> > fixed in mainline.  Without this applied on current, I observe various
> > oopses and/or filesystem errors which are resolved by this patch.
> 
> What platform was this tested on?

Tested on the EMMA Car Series EC-4260 board [1] and (EC-4270) NE1BOARD [2]
using ext4 root filesystem over USB Mass Storage and SD Card.  Here is
the ID Code Register, the same on both systems (I've given up trying to
work out whether this is r1p0, r0p4 or something else :)):

CPU: ARMv6-compatible processor [410fb024] revision 4 (ARMv7), cr=08c5387f

> > If there is some other testing that I can do to help getting this
> > or some other variation accepted for mainline, let me know.
> 
> The more platforms tested, the better. Anyway, I don't think we have
> much choice in how this workaround is implemented, unless the hardware
> supports FIQs.

FWIW, I've also tried testing on a RealView EB ARM11 MPCore [3] but
unfortunately it has a rather old core (r0p0/[410fb020]) which no
longer boots on linux-2.6.34-rc (but oddly still works ok on
v2.6.33).

Thanks!

--
Regards,
George

[1] http://www.arm.linux.org.uk/developer/machines/list.php?id=2639
[2] http://www.arm.linux.org.uk/developer/machines/list.php?id=1655
[3] http://www.arm.linux.org.uk/developer/machines/list.php?id=827

> 
> -- 
> Catalin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-04 16:44 ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops Catalin Marinas
  2010-05-04 17:04   ` Jason McMullan
  2010-05-05 13:26   ` George G. Davis
@ 2010-05-12 12:51   ` Ronen Shitrit
  2010-05-12 13:55     ` Catalin Marinas
  2 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-12 12:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hi 

I'm using an ARMv6 MPCore.
I have applied this patch on top of 2.6.34-rc7 and my system hangs while booting from NFS (through an e1000 NIC).
Previously I used 2.6.32 with the cache broadcast IPI mask and it worked just fine for me.
I have noticed that if I remove the str (write for ownership) in dma_inv, the system boots just fine.

- Why was the IPI solution dropped?
- I can think of a few cases where the above str might be problematic:
1) CPU 0 is doing inv; after the str, CPU 1 changes the line and then CPU 0 invalidates it... probably won't happen since the DMA buffer shouldn't be used by more than one CPU...
2) CPU 0 is doing inv; the str writes garbage which gets to the DRAM because of an eviction and breaks the assumption that inv shouldn't change the DRAM...

Regards
Ronen Shitrit


-----Original Message-----
From: linux-arm-kernel-bounces@lists.infradead.org [mailto:linux-arm-kernel-bounces at lists.infradead.org] On Behalf Of Catalin Marinas
Sent: Tuesday, May 04, 2010 7:44 PM
To: linux-arm-kernel at lists.infradead.org
Subject: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops

The Snoop Control Unit on the ARM11MPCore hardware does not detect the
cache operations and the dma_cache_maint*() functions may leave stale
cache entries on other CPUs. The solution implemented in this patch
performs a Read or Write For Ownership in the ARMv6 DMA cache
maintenance functions. These LDR/STR instructions change the cache line
state to shared or exclusive so that the cache maintenance operation has
the desired effect.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/mm/cache-v6.S |   17 +++++++++++++----
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 9d89c67..e46ecd8 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -211,6 +211,9 @@ v6_dma_inv_range:
 	mcrne	p15, 0, r1, c7, c15, 1		@ clean & invalidate unified line
 #endif
 1:
+#ifdef CONFIG_SMP
+	str	r0, [r0]			@ write for ownership
+#endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c6, 1		@ invalidate D line
 #else
@@ -231,6 +234,9 @@ v6_dma_inv_range:
 v6_dma_clean_range:
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
+#ifdef CONFIG_SMP
+	ldr	r2, [r0]			@ read for ownership
+#endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
 #else
@@ -251,6 +257,10 @@ v6_dma_clean_range:
 ENTRY(v6_dma_flush_range)
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
+#ifdef CONFIG_SMP
+	ldr	r2, [r0]			@ read for ownership
+	str	r2, [r0]			@ write for ownership
+#endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c14, 1		@ clean & invalidate D line
 #else
@@ -273,7 +283,9 @@ ENTRY(v6_dma_map_area)
 	add	r1, r1, r0
 	teq	r2, #DMA_FROM_DEVICE
 	beq	v6_dma_inv_range
-	b	v6_dma_clean_range
+	teq	r2, #DMA_TO_DEVICE
+	beq	v6_dma_clean_range
+	b	v6_dma_flush_range
 ENDPROC(v6_dma_map_area)
 
 /*
@@ -283,9 +295,6 @@ ENDPROC(v6_dma_map_area)
  *	- dir	- DMA direction
  */
 ENTRY(v6_dma_unmap_area)
-	add	r1, r1, r0
-	teq	r2, #DMA_TO_DEVICE
-	bne	v6_dma_inv_range
 	mov	pc, lr
 ENDPROC(v6_dma_unmap_area)
 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel at lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-12 12:51   ` Ronen Shitrit
@ 2010-05-12 13:55     ` Catalin Marinas
  2010-05-12 15:03       ` Ronen Shitrit
  2010-05-12 18:48       ` Russell King - ARM Linux
  0 siblings, 2 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-12 13:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2010-05-12 at 15:51 +0300, Ronen Shitrit wrote:
> I'm using ARM v6 MP core.
> I have applied this patch above 2.6.34-rc7 and my system hang while
> booting from NFS (through e1000 nic).

Does the whole kernel hang or is it just user space?

> Previously I used 2.6.32 with the cache broadcast IPI mask and it
> worked just fine for me.
> I have noticed that if I remove the str, write for ownership, in the
> dma_inv the system boot just fine.

The only way to invalidate the caches on all the CPUs is to force a
write so that the cache line is moved to modified/exclusive state (no
other CPU has the cache line) and then invalidate.

Drivers shouldn't call dma_inv_range if they have valid data in that
buffer (that's why I removed the dma_inv_range call from
dma_unmap_area).

Could you try the patch below:

diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index e46ecd8..332b48c 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -212,7 +212,8 @@ v6_dma_inv_range:
 #endif
 1:
 #ifdef CONFIG_SMP
-	str	r0, [r0]			@ write for ownership
+	ldr	r2, [r0]			@ read for ownership
+	str	r2, [r0]			@ write for ownership
 #endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c6, 1		@ invalidate D line

> - Why was the IPI solution dropped?

It can cause deadlock in some circumstances (there was a thread some
time ago).

> - I can think of few cases the above str might be problematic:
> 1) CPU 0 is doing inv, after the str cpu 1 change the line and then
> cpu 0 inv the line... probably won't happen since the dma buffer
> shouldn't be used by more the one CPU...

This shouldn't happen. Even without my patch, a cache line invalidation
would erase the data written by CPU 1, causing the same corruption.

> 2) CPU 0 is doing inv, the str write garbage which get to the DRAM
> because of eviction and break the assumption that inv shouldn't change
> the DRAM...

This may be the case, but does such a situation occur in practice? The
above patch should fix it, but it slows down the invalidate operation
further.

Thanks.

-- 
Catalin

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-12 13:55     ` Catalin Marinas
@ 2010-05-12 15:03       ` Ronen Shitrit
  2010-05-12 18:48       ` Russell King - ARM Linux
  1 sibling, 0 replies; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-12 15:03 UTC (permalink / raw)
  To: linux-arm-kernel

Thanks for the fast response.
The kernel doesn't hang, but NFS can't finish loading the filesystem; it looks like DMA data corruption...
The patch below seems to solve the issue :)
I guess we can't guarantee that the 2nd scenario I described won't happen...

Regards
Ronen Shitrit


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-12 13:55     ` Catalin Marinas
  2010-05-12 15:03       ` Ronen Shitrit
@ 2010-05-12 18:48       ` Russell King - ARM Linux
  2010-05-12 18:59         ` Russell King - ARM Linux
  1 sibling, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-12 18:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 12, 2010 at 02:55:39PM +0100, Catalin Marinas wrote:
> Drivers shouldn't call dma_inv_range if they have valid date in that
> buffer (that's why I removed the dma_inv_range call from
> dma_unmap_area).

That (removal of dma_inv_range) is the wrong approach - if drivers are
expecting data in the cache to be preserved over a dma_unmap_area() call,
they are buggy and need fixing.

However...

> > 2) CPU 0 is doing inv, the str write garbage which get to the DRAM
> > because of eviction and break the assumption that inv shouldn't change
> > the DRAM...
> 
> This may be the case but is there such situation?

This is why there's a problem - if we (rightfully) have dma_unmap_area()
call dma_inv_range, and dma_inv_range() corrupts the SDRAM, that's where
the bug is - dma_unmap_area() must not change the contents of SDRAM.

Aren't there CPUs which speculatively prefetch _and_ which don't have
broadcast cache ops?  If yes, then we can't use the "read/write to
gain ownership" approach - and since we can't use IPIs either, I think
we're sadly boxed in by hardware restrictions to the point of not being
able to run with DMA on these CPUs.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-12 18:48       ` Russell King - ARM Linux
@ 2010-05-12 18:59         ` Russell King - ARM Linux
  2010-05-12 20:00           ` Ronen Shitrit
  2010-05-12 21:21           ` [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 " Catalin Marinas
  0 siblings, 2 replies; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-12 18:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 12, 2010 at 07:48:52PM +0100, Russell King - ARM Linux wrote:
> Aren't there CPUs which speculatively prefetch _and_ which don't have
> broadcast cache ops?  If yes, then we can't use the "read/write to
> gain ownership" approach - and since we can't use IPIs either, I think
> we're sadly boxed in by hardware restrictions to the point of not being
> able to run with DMA on these CPUs.

I just had a second thought that what I wrote above was tosh, but then
had a third thought which reconfirmed it as a valid point...

Consider if the CPU speculatively prefetches the line you're going to
read before the DMA has completed, and it prefetches the pre-DMA data.

Your read loads the pre-DMA data, and the write writes it back to the
cache line.  An interrupt happens, and the cache line gets evicted some
time later before the invalidate line operation - overwriting the DMA
data, thereby corrupting it.

So, a CPU which speculatively prefetches _and_ doesn't broadcast cache
operations _is_ a big problem.  I hope we don't have any.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-12 18:59         ` Russell King - ARM Linux
@ 2010-05-12 20:00           ` Ronen Shitrit
  2010-05-12 20:04             ` Russell King - ARM Linux
  2010-05-12 21:21           ` [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 " Catalin Marinas
  1 sibling, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-12 20:00 UTC (permalink / raw)
  To: linux-arm-kernel

This is a good point. Assuming the patch disables interrupts around the
DMA cache calls, can you think of any data corruption issues?

Thanks

Sent from my phone

On 12/05/2010, at 21:59, "Russell King - ARM Linux" <linux@arm.linux.org.uk 
 > wrote:

> On Wed, May 12, 2010 at 07:48:52PM +0100, Russell King - ARM Linux  
> wrote:
>> Aren't there CPUs which speculatively prefetch _and_ which don't have
>> broadcast cache ops?  If yes, then we can't use the "read/write to
>> gain ownership" approach - and since we can't use IPIs either, I  
>> think
>> we're sadly boxed in by hardware restrictions to the point of not  
>> being
>> able to run with DMA on these CPUs.
>
> I just had a second thought that what I wrote above was tosh, but then
> had a third thought which reconfirmed it as a valid point...
>
> Consider if the CPU speculatively prefetches the line you're going to
> read before the DMA has completed, and it prefetches the pre-DMA data.
>
> Your read loads the pre-DMA data, and the write writes it back to the
> cache line.  An interrupt happens, and the cache line gets evicted  
> some
> time later before the invalidate line operation - overwriting the DMA
> data, thereby corrupting it.
>
> So, a CPU which speculatively prefetches _and_ doesn't broadcast cache
> operations _is_ a big problem.  I hope we don't have any.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-12 20:00           ` Ronen Shitrit
@ 2010-05-12 20:04             ` Russell King - ARM Linux
  2010-05-12 20:19               ` Ronen Shitrit
  0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-12 20:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 12, 2010 at 11:00:42PM +0300, Ronen Shitrit wrote:
> This is a good point, assuming the patch will lock the int a round the  
> dma cache calls, can you think of any data coruption issues?

FIQs?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-12 20:04             ` Russell King - ARM Linux
@ 2010-05-12 20:19               ` Ronen Shitrit
  0 siblings, 0 replies; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-12 20:19 UTC (permalink / raw)
  To: linux-arm-kernel

Let's assume no FIQs...

Sent from my phone

On 12/05/2010, at 23:04, "Russell King - ARM Linux" <linux@arm.linux.org.uk> wrote:

> On Wed, May 12, 2010 at 11:00:42PM +0300, Ronen Shitrit wrote:
>> This is a good point, assuming the patch will lock the int a round the
>> dma cache calls, can you think of any data coruption issues?
>
> FIQs?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
  2010-05-12 18:59         ` Russell King - ARM Linux
  2010-05-12 20:00           ` Ronen Shitrit
@ 2010-05-12 21:21           ` Catalin Marinas
  2010-05-13  5:27             ` Ronen Shitrit
  1 sibling, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-12 21:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2010-05-12 at 19:59 +0100, Russell King - ARM Linux wrote:
> On Wed, May 12, 2010 at 07:48:52PM +0100, Russell King - ARM Linux wrote:
> > Aren't there CPUs which speculatively prefetch _and_ which don't have
> > broadcast cache ops?  If yes, then we can't use the "read/write to
> > gain ownership" approach - and since we can't use IPIs either, I think
> > we're sadly boxed in by hardware restrictions to the point of not being
> > able to run with DMA on these CPUs.
> 
> I just had a second thought that what I wrote above was tosh, but then
> had a third thought which reconfirmed it as a valid point...
> 
> Consider if the CPU speculatively prefetches the line you're going to
> read before the DMA has completed, and it prefetches the pre-DMA data.

I won't consider this scenario :). For such cores there isn't an easy
workaround to the DMA cache operations. Even with IPI via FIQ, you may
still have the risk of cache lines migrating around CPUs and missing the
cache flushing.

AFAICT, ARM11MPCore doesn't do this.

Since my implementation of v6_dma_inv_range is destructive, I removed it
from dma_unmap_area() with the precondition that the CPU doesn't do
speculative accesses. With my patch, the dma_unmap_area() doesn't change
the content of the SDRAM.

I don't entirely understand why corruption happens with the e1000 driver
(looking at the e1000.c code it uses consistent memory).

-- 
Catalin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
  2010-05-12 21:21           ` [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 " Catalin Marinas
@ 2010-05-13  5:27             ` Ronen Shitrit
  2010-05-13  8:26               ` Catalin Marinas
  0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-13  5:27 UTC (permalink / raw)
  To: linux-arm-kernel

Our ARM v6 does have speculative prefetch...

Sent from my phone

On 13/05/2010, at 00:21, "Catalin Marinas" <catalin.marinas@arm.com>  
wrote:

> On Wed, 2010-05-12 at 19:59 +0100, Russell King - ARM Linux wrote:
>> On Wed, May 12, 2010 at 07:48:52PM +0100, Russell King - ARM Linux  
>> wrote:
>>> Aren't there CPUs which speculatively prefetch _and_ which don't  
>>> have
>>> broadcast cache ops?  If yes, then we can't use the "read/write to
>>> gain ownership" approach - and since we can't use IPIs either, I  
>>> think
>>> we're sadly boxed in by hardware restrictions to the point of not  
>>> being
>>> able to run with DMA on these CPUs.
>>
>> I just had a second thought that what I wrote above was tosh, but  
>> then
>> had a third thought which reconfirmed it as a valid point...
>>
>> Consider if the CPU speculatively prefetches the line you're going to
>> read before the DMA has completed, and it prefetches the pre-DMA  
>> data.
>
> I won't consider this scenario :). For such cores there isn't an easy
> workaround to the DMA cache operations. Even with IPI via FIQ, you may
> still have the risk of cache lines migrating around CPUs and missing  
> the
> cache flushing.
>
> AFAICT, ARM11MPCore doesn't do this.
>>
> Since my implementation of v6_dma_inv_range is destructive, I  
> removed it
> from dma_unmap_area() with the precondition that the CPU doesn't do
> speculative accesses. With my patch, the dma_unmap_area() doesn't  
> change
> the content of the SDRAM.
>
> I don't entirely understand why corruption happens with the e1000  
> driver
> (looking at the e1000.c code it uses consistent memory).
>
> -- 
> Catalin
>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
  2010-05-13  5:27             ` Ronen Shitrit
@ 2010-05-13  8:26               ` Catalin Marinas
  2010-05-13 13:54                 ` George G. Davis
  2010-05-16  6:29                 ` Ronen Shitrit
  0 siblings, 2 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-13  8:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> Our ARM v6 does have speculative prefetch...

That's the I-cache and we should be ok with it. What may affect this
patch is speculative fetches into the D-cache.

-- 
Catalin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
  2010-05-13  8:26               ` Catalin Marinas
@ 2010-05-13 13:54                 ` George G. Davis
  2010-05-13 14:15                   ` Catalin Marinas
  2010-05-16  6:29                 ` Ronen Shitrit
  1 sibling, 1 reply; 64+ messages in thread
From: George G. Davis @ 2010-05-13 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, May 13, 2010 at 09:26:57AM +0100, Catalin Marinas wrote:
> On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> > Our ARM v6 does have speculative prefetch...
> 
> That's the I-cache and we should be ok with it. What may affect this
> patch is speculative fetches into the D-cache.

prefetch()?

--
Regards,
George

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
  2010-05-13 13:54                 ` George G. Davis
@ 2010-05-13 14:15                   ` Catalin Marinas
  2010-05-13 20:34                     ` George G. Davis
  0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-13 14:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-05-13 at 14:54 +0100, George G. Davis wrote:
> Hi,
> 
> On Thu, May 13, 2010 at 09:26:57AM +0100, Catalin Marinas wrote:
> > On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> > > Our ARM v6 does have speculative prefetch...
> >
> > That's the I-cache and we should be ok with it. What may affect this
> > patch is speculative fetches into the D-cache.
> 
> prefetch()?

Drivers should not use prefetch (PLD) on memory handed over to a device
for DMA transfers. To me this is like explicitly accessing such memory
during a DMA transfer.

-- 
Catalin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
  2010-05-13 14:15                   ` Catalin Marinas
@ 2010-05-13 20:34                     ` George G. Davis
  2010-05-14 16:29                       ` Catalin Marinas
  0 siblings, 1 reply; 64+ messages in thread
From: George G. Davis @ 2010-05-13 20:34 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, May 13, 2010 at 03:15:20PM +0100, Catalin Marinas wrote:
> On Thu, 2010-05-13 at 14:54 +0100, George G. Davis wrote:
> > Hi,
> > 
> > On Thu, May 13, 2010 at 09:26:57AM +0100, Catalin Marinas wrote:
> > > On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> > > > Our ARM v6 does have speculative prefetch...
> > >
> > > That's the I-cache and we should be ok with it. What may affect this
> > > patch is speculative fetches into the D-cache.
> > 
> > prefetch()?
> 
> Drivers should not use prefetch (PLD) on memory handed over to a device
> for DMA transfers. To me this is like explicitly accessing such memory
> during a DMA transfer.

I agree and was merely speculating that prefetch() may be a case where
D-Cache lines are explicitly prefetched and may be the cause of this
problem, e.g. from e1000_clean_rx_irq() in drivers/net/e1000e/netdev.c:

	while (rx_desc->status & E1000_RXD_STAT_DD) {
		struct sk_buff *skb;
		u8 status;

		if (*work_done >= work_to_do)
			break;
		(*work_done)++;

		status = rx_desc->status;
		skb = buffer_info->skb;
		buffer_info->skb = NULL;

		prefetch(skb->data - NET_IP_ALIGN);

		i++;
		if (i == rx_ring->count)
			i = 0;
		next_rxd = E1000_RX_DESC(*rx_ring, i);
		prefetch(next_rxd);

		next_buffer = &rx_ring->buffer_info[i];

		cleaned = 1;
		cleaned_count++;
		pci_unmap_single(pdev,
				 buffer_info->dma,
				 adapter->rx_buffer_len,
				 PCI_DMA_FROMDEVICE);


I'm too lazy^Wbusy to drill down, but could the above prefetch(skb->data...)
before pci_unmap_single() or similar cases be causing the problem reported
by Ronen?

I think the above may be one of those "if drivers are expecting data in
the cache to be preserved over a dma_unmap_area() call, they are buggy
and need fixing" cases that Russell noted in a previous post, no?

--
Regards,
George

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
  2010-05-13 20:34                     ` George G. Davis
@ 2010-05-14 16:29                       ` Catalin Marinas
  2010-05-14 16:42                         ` Catalin Marinas
  0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-14 16:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-05-13 at 21:34 +0100, George G. Davis wrote:
> On Thu, May 13, 2010 at 03:15:20PM +0100, Catalin Marinas wrote:
> > On Thu, 2010-05-13 at 14:54 +0100, George G. Davis wrote:
> > > On Thu, May 13, 2010 at 09:26:57AM +0100, Catalin Marinas wrote:
> > > > On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> > > > > Our ARM v6 does have speculative prefetch...
> > > >
> > > > That's the I-cache and we should be ok with it. What may affect this
> > > > patch is speculative fetches into the D-cache.
> > >
> > > prefetch()?
> >
> > Drivers should not use prefetch (PLD) on memory handed over to a device
> > for DMA transfers. To me this is like explicitly accessing such memory
> > during a DMA transfer.
> 
> I agree and was merely speculating that prefetch() may be a case where
> D-Cache lines are explicitly prefetched and may be the cause of this
> problem, e.g. from e1000_clean_rx_irq() in drivers/net/e1000e/netdev.c:
> 
>         while (rx_desc->status & E1000_RXD_STAT_DD) {
>                 struct sk_buff *skb;
>                 u8 status;
> 
>                 if (*work_done >= work_to_do)
>                         break;
>                 (*work_done)++;
> 
>                 status = rx_desc->status;
>                 skb = buffer_info->skb;
>                 buffer_info->skb = NULL;
> 
>                 prefetch(skb->data - NET_IP_ALIGN);
> 
>                 i++;
>                 if (i == rx_ring->count)
>                         i = 0;
>                 next_rxd = E1000_RX_DESC(*rx_ring, i);
>                 prefetch(next_rxd);
> 
>                 next_buffer = &rx_ring->buffer_info[i];
> 
>                 cleaned = 1;
>                 cleaned_count++;
>                 pci_unmap_single(pdev,
>                                  buffer_info->dma,
>                                  adapter->rx_buffer_len,
>                                  PCI_DMA_FROMDEVICE);

If the prefetch is done after the DMA transfer has completed, then there
shouldn't be any problem. I suspect that's the case in this interrupt
routine.

What I think I missed in my patch (and didn't show up with a sata drive)
is the "sync" operations. The dma_sync_single_for_device(FROMDEVICE)
corrupts the existing buffer.

I'll post an updated patch shortly.

-- 
Catalin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
  2010-05-14 16:29                       ` Catalin Marinas
@ 2010-05-14 16:42                         ` Catalin Marinas
  2010-05-15  1:26                           ` George G. Davis
  0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-14 16:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2010-05-14 at 17:29 +0100, Catalin Marinas wrote:
> What I think I missed in my patch (and didn't show up with a sata drive)
> is the "sync" operations. The dma_sync_single_for_device(FROMDEVICE)
> corrupts the existing buffer.
> 
> I'll post an updated patch shortly.

Actually I think the hunk that I posted earlier should be enough:


diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index a4a6840..aa3ac32 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -216,7 +216,8 @@ v6_dma_inv_range:
 #endif
 1:
 #ifdef CONFIG_SMP
-	str	r0, [r0]			@ write for ownership
+	ldr	r2, [r0]			@ read for ownership
+	str	r2, [r0]			@ write for ownership
 #endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c6, 1		@ invalidate D line


The dma_sync_single_for_cpu() eventually calls dmac_unmap_area() which
on ARM11MPCore became a no-op. There is no problem since this function
call is always preceded by a call to either dma_map_single() or
dma_sync_single_for_device(), both of them calling dma_map_area() which
does the cache invalidation.

With the patch above, v6_dma_inv_range() is no longer destructive so the
transferred data to the SDRAM is preserved. We could even re-instate the
v6_dma_unmap_area() function but for performance reasons I would leave
it as a no-op.

Russell, are you ok with such change? Since you already applied the
original patch as #6111, I will send an additional patch with this fix
(I'm sending it directly to the patch system as I won't have access to
my work PC this weekend. Feel free to apply or reject).

Thanks.

-- 
Catalin

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
  2010-05-14 16:42                         ` Catalin Marinas
@ 2010-05-15  1:26                           ` George G. Davis
  2010-05-16  6:28                             ` Ronen Shitrit
  0 siblings, 1 reply; 64+ messages in thread
From: George G. Davis @ 2010-05-15  1:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Fri, May 14, 2010 at 05:42:31PM +0100, Catalin Marinas wrote:
> On Fri, 2010-05-14 at 17:29 +0100, Catalin Marinas wrote:
> > What I think I missed in my patch (and didn't show up with a sata drive)
> > is the "sync" operations. The dma_sync_single_for_device(FROMDEVICE)
> > corrupts the existing buffer.
> > 
> > I'll post an updated patch shortly.
> 
> Actually I think the hunk that I posted earlier should be enough:

Um, apologies for the noise, but I couldn't tell whether Ronen confirmed
that this resolves the e1000 problem or not.

Thanks!

--
Regards,
George

> 
> 
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index a4a6840..aa3ac32 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -216,7 +216,8 @@ v6_dma_inv_range:
>  #endif
>  1:
>  #ifdef CONFIG_SMP
> -	str	r0, [r0]			@ write for ownership
> +	ldr	r2, [r0]			@ read for ownership
> +	str	r2, [r0]			@ write for ownership
>  #endif
>  #ifdef HARVARD_CACHE
>  	mcr	p15, 0, r0, c7, c6, 1		@ invalidate D line
> 
> 
> The dma_sync_single_for_cpu() eventually calls dmac_unmap_area() which
> on ARM11MPCore became a no-op. There is no problem since this function
> call is always preceded by a call to either dma_map_single() or
> dma_sync_single_to_device(), both of them calling dma_map_area() which
> does the cache invalidation.
> 
> With the patch above, v6_dma_inv_range() is no longer destructive so the
> transferred data to the SDRAM is preserved. We could even re-instate the
> v6_dma_unmap_area() function but for performance reasons I would leave
> it as a no-op.
> 
> Russell, are you ok with such change? Since you already applied the
> original patch as #6111, I will send an additional patch with this fix
> (I'm sending it directly to the patch system as I won't have access to
> my work PC this weekend. Feel free to apply or reject).
> 
> Thanks.
> 
> -- 
> Catalin

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-15  1:26                           ` George G. Davis
@ 2010-05-16  6:28                             ` Ronen Shitrit
  0 siblings, 0 replies; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-16  6:28 UTC (permalink / raw)
  To: linux-arm-kernel

Yes it does...

-----Original Message-----
From: George G. Davis [mailto:gdavis at mvista.com] 
Sent: Saturday, May 15, 2010 4:26 AM
To: Catalin Marinas
Cc: Ronen Shitrit; Russell King - ARM Linux; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops

Hi,

On Fri, May 14, 2010 at 05:42:31PM +0100, Catalin Marinas wrote:
> On Fri, 2010-05-14 at 17:29 +0100, Catalin Marinas wrote:
> > What I think I missed in my patch (and didn't show up with a sata drive)
> > is the "sync" operations. The dma_sync_single_for_device(FROMDEVICE)
> > corrupts the existing buffer.
> > 
> > I'll post an updated patch shortly.
> 
> Actually I think the hunk that I posted earlier should be enough:

Um, apologies for the noise but I didn't understand if Ronen said
this resolved the e1000 problem or not.

Thanks!

--
Regards,
George

> 
> 
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index a4a6840..aa3ac32 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -216,7 +216,8 @@ v6_dma_inv_range:
>  #endif
>  1:
>  #ifdef CONFIG_SMP
> -	str	r0, [r0]			@ write for ownership
> +	ldr	r2, [r0]			@ read for ownership
> +	str	r2, [r0]			@ write for ownership
>  #endif
>  #ifdef HARVARD_CACHE
>  	mcr	p15, 0, r0, c7, c6, 1		@ invalidate D line
> 
> 
> The dma_sync_single_for_cpu() eventually calls dmac_unmap_area() which
> on ARM11MPCore became a no-op. There is no problem since this function
> call is always preceded by a call to either dma_map_single() or
> dma_sync_single_to_device(), both of them calling dma_map_area() which
> does the cache invalidation.
> 
> With the patch above, v6_dma_inv_range() is no longer destructive so the
> transferred data to the SDRAM is preserved. We could even re-instate the
> v6_dma_unmap_area() function but for performance reasons I would leave
> it as a no-op.
> 
> Russell, are you ok with such change? Since you already applied the
> original patch as #6111, I will send an additional patch with this fix
> (I'm sending it directly to the patch system as I won't have access to
> my work PC this weekend. Feel free to apply or reject).
> 
> Thanks.
> 
> -- 
> Catalin

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-13  8:26               ` Catalin Marinas
  2010-05-13 13:54                 ` George G. Davis
@ 2010-05-16  6:29                 ` Ronen Shitrit
  2010-05-16 15:01                   ` Russell King - ARM Linux
  2010-05-17  9:51                   ` Catalin Marinas
  1 sibling, 2 replies; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-16  6:29 UTC (permalink / raw)
  To: linux-arm-kernel

Our ARMv6 does speculative reads for both the I and D caches...

Regards

-----Original Message-----
From: Catalin Marinas [mailto:catalin.marinas at arm.com] 
Sent: Thursday, May 13, 2010 11:27 AM
To: Ronen Shitrit
Cc: Russell King - ARM Linux; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops

On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> Our ARM v6 does have speculative prefetch...

That's the I-cache and we should be ok with it. What may affect this
patch is speculative fetches into the D-cache.

-- 
Catalin

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-16  6:29                 ` Ronen Shitrit
@ 2010-05-16 15:01                   ` Russell King - ARM Linux
  2010-05-17  6:29                     ` Ronen Shitrit
  2010-05-17  9:51                   ` Catalin Marinas
  1 sibling, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-16 15:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, May 16, 2010 at 09:29:47AM +0300, Ronen Shitrit wrote:
> Our ARMv6 do speculative rd for both I and D cache...

That means doing this read-write for ownership trick can lead to
corrupted data on your CPU, since the CPU may prefetch data from
the DMA buffer before the unmap has happened.

This means Catalin's patch is unsafe for your situation.

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-16 15:01                   ` Russell King - ARM Linux
@ 2010-05-17  6:29                     ` Ronen Shitrit
  2010-05-17  6:57                       ` Russell King - ARM Linux
  0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17  6:29 UTC (permalink / raw)
  To: linux-arm-kernel

Got it, thanks.

This is my patch over Catalin original patch for v6_dma_inv_range

--- cache-v6.S	2010-05-17 08:12:18.000000000 +0300
+++ cache-v6.S	2010-05-17 08:16:16.000000000 +0300
@@ -210,9 +210,15 @@
 #else
 	mcrne	p15, 0, r1, c7, c15, 1		@ clean & invalidate unified line
 #endif
-1:
 #ifdef CONFIG_SMP
-	str	r0, [r0]			@ write for ownership
+	mrs	r2, cpsr
+	orr	r3, r2, #PSR_F_BIT | PSR_I_BIT
+	msr	cpsr_c, r3                      @ Disable interrupts
+1:
+	ldr	r2, [r0]                        @ read for ownership
+	str	r2, [r0]                        @ write for ownership
+#else
+1:
 #endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c6, 1		@ invalidate D line
@@ -222,6 +228,9 @@
 	add	r0, r0, #D_CACHE_LINE_SIZE
 	cmp	r0, r1
 	blo	1b
+#ifdef CONFIG_SMP
+	msr	cpsr_c, r2                      @ Restore interrupts
+#endif
 	mov	r0, #0
 	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
 	mov	pc, lr



-----Original Message-----
From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk] 
Sent: Sunday, May 16, 2010 6:01 PM
To: Ronen Shitrit
Cc: Catalin Marinas; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops

On Sun, May 16, 2010 at 09:29:47AM +0300, Ronen Shitrit wrote:
> Our ARMv6 do speculative rd for both I and D cache...

That means doing this read-write for ownership trick can lead to
corrupted data on your CPU, since the CPU may prefetch data from
the DMA buffer before the unmap has happened.

This means Catalin's patch is unsafe for your situation.

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17  6:29                     ` Ronen Shitrit
@ 2010-05-17  6:57                       ` Russell King - ARM Linux
  2010-05-17  7:34                         ` Ronen Shitrit
  0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-17  6:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 17, 2010 at 09:29:35AM +0300, Ronen Shitrit wrote:
> Got it, thanks.

No you haven't; disabling interrupts doesn't prevent CPUs from
speculatively prefetching.

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17  6:57                       ` Russell King - ARM Linux
@ 2010-05-17  7:34                         ` Ronen Shitrit
  2010-05-17  7:43                           ` Russell King - ARM Linux
  2010-05-17 10:00                           ` Catalin Marinas
  0 siblings, 2 replies; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17  7:34 UTC (permalink / raw)
  To: linux-arm-kernel

After some more thinking, I think I found the problem...

DMA for receive should operate as follows:
1) The CPU maps (invalidates) the buffers for the DMA
2) The DMA fills the buffers and acks the CPU
3) The CPU calls unmap, which invalidates the buffers again


I can only think of one problematic scenario:
- Between 1 and 2:
	CPU0 did a speculative prefetch of address x.
- After 2:
	CPU0 is doing the invalidate: lock interrupts, ldr x,
		str x (address x is now marked dirty with the wrong data).
	CPU1 is at this time doing a speculative prefetch of x, which
		will snoop CPU0 and cause data corruption... :(

Can I assume that if I rely on the SoC's I/O cache coherency support, none of this is relevant? Or are there some issues hiding with v6 + I/O coherency?


Thanks
Ronen Shitrit



-----Original Message-----
From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk] 
Sent: Monday, May 17, 2010 9:57 AM
To: Ronen Shitrit
Cc: Catalin Marinas; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops

On Mon, May 17, 2010 at 09:29:35AM +0300, Ronen Shitrit wrote:
> Got it, thanks.

No you haven't; disabling interrupts doesn't prevent CPUs speculatively
prefetching.

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17  7:34                         ` Ronen Shitrit
@ 2010-05-17  7:43                           ` Russell King - ARM Linux
  2010-05-17  8:29                             ` Ronen Shitrit
  2010-05-17 10:00                           ` Catalin Marinas
  1 sibling, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-17  7:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 17, 2010 at 10:34:36AM +0300, Ronen Shitrit wrote:
> After some more thinking, I think I found the problem...
> 
> DMA for receive should operate as follows:
> 1) CPU will map (invalidate) the buffers for the DMA
> 2) DMA will place the buffers and ack the CPU
> 3) The CPU will call unmap which will invalidate the buffers again
> 
> 
> I can only think of one problematic scenario:
> - After 1 and before 2: 
> 	CPU0 did spec prefetch for address x 
> - After 2: 
> 	CPU0 is doing inv: lock int, ldr x, 
> 		str x (addr x is marked dirty with wrong data).

The data has been corrupted at this point; you don't need a second CPU.

> 	CPU1 at this time doing spec prefetch for x which 
> 		will snoop CPU0 and will cause data corruption... :(
> 
> I can assume that if I count on the Soc IO cache coherency support all
> this isn't relevant, right? Or there are some issues hiding with v6 +
> IO coherency?

What "IO cache coherency"?  Are you saying that your SoC has a mode
where the DMA controller can snoop the CPU caches?

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17  7:43                           ` Russell King - ARM Linux
@ 2010-05-17  8:29                             ` Ronen Shitrit
  2010-05-17  8:57                               ` Russell King - ARM Linux
  0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17  8:29 UTC (permalink / raw)
  To: linux-arm-kernel



-----Original Message-----
From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk] 
Sent: Monday, May 17, 2010 10:43 AM
To: Ronen Shitrit
Cc: Catalin Marinas; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops

On Mon, May 17, 2010 at 10:34:36AM +0300, Ronen Shitrit wrote:
> After some more thinking, I think I found the problem...
> 
> DMA for receive should operate as follows:
> 1) CPU will map (invalidate) the buffers for the DMA
> 2) DMA will place the buffers and ack the CPU
> 3) The CPU will call unmap which will invalidate the buffers again
> 
> 
> I can only think of one problematic scenario:
> - After 1 and before 2: 
> 	CPU0 did spec prefetch for address x 
> - After 2: 
> 	CPU0 is doing inv: lock int, ldr x, 
> 		str x (addr x is marked dirty with wrong data).

The data has been corrupted at this point; you don't need a second CPU.
[Ronen Shitrit] Why? If there is no second CPU, an invalidate will come just after the str instruction and the wrong data will not get anywhere... (interrupts are locked, so I assume no eviction...)

> 	CPU1 at this time doing spec prefetch for x which 
> 		will snoop CPU0 and will cause data corruption... :(
> 
> I can assume that if I count on the Soc IO cache coherency support all
> this isn't relevant, right? Or there are some issues hiding with v6 +
> IO coherency?

What "IO cache coherency" ?  Are you saying that your SoC has a mode
where the DMA controller can snoop the CPU caches?


[Ronen Shitrit] Yes. 
I also found that I can disable D-cache speculative prefetch.
However, I-cache prefetch still exists and we have a unified shared L2, so I think the ldr/str workaround is still not valid, since it might load garbage from the L2 as a result of I-cache speculative prefetch... right?


I guess that for machines that have only I-cache speculative prefetch, unmap should invalidate only the unified caches (in my case the L2).
And for machines that also have D-cache speculative prefetch, unmap should invalidate all caches.

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17  8:29                             ` Ronen Shitrit
@ 2010-05-17  8:57                               ` Russell King - ARM Linux
  2010-05-17  9:50                                 ` Ronen Shitrit
  0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-17  8:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 17, 2010 at 11:29:53AM +0300, Ronen Shitrit wrote:
> On Mon, May 17, 2010 at 10:34:36AM +0300, Ronen Shitrit wrote:
> > I can only think of one problematic scenario:
> > - After 1 and before 2: 
> > 	CPU0 did spec prefetch for address x 
> > - After 2: 
> > 	CPU0 is doing inv: lock int, ldr x, 
> > 		str x (addr x is marked dirty with wrong data).
> 
> The data has been corrupted at this point; you don't need a second CPU.
>
> [Ronen Shitrit] Why? If the second CPU doesn't exist an invalidate will
> come just after the str inst and the wrong data will not get anywhere...
> (interrupts are locked so I assume no eviction...)

If a speculative prefetch occurs (eg, to prefetch the next ldr) it
could evict the dirty cache line that the str just wrote to.  Cache
replacement algorithms aren't always round-robin.

> 
> > 	CPU1 at this time doing spec prefetch for x which 
> > 		will snoop CPU0 and will cause data corruption... :(
> > 
> > I can assume that if I count on the Soc IO cache coherency support all
> > this isn't relevant, right? Or there are some issues hiding with v6 +
> > IO coherency?
> 
> What "IO cache coherency" ?  Are you saying that your SoC has a mode
> where the DMA controller can snoop the CPU caches?
> 
> [Ronen Shitrit] Yes. 

That suggests to me that your SoC manufacturer intended the system to run
with DMA coherency enabled - maybe to avoid these problems.

We have a macro in the kernel to avoid unnecessary cache handling as a
result of that - arch_is_coherent().  Maybe you should enable the hardware
DMA coherency and ensure arch_is_coherent() returns true for your SoC.
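In kernels of this era arch_is_coherent() defaults to 0 in asm/memory.h and a platform opts in from its mach headers; a minimal sketch (hypothetical platform, path and wording illustrative) would be:

```c
/* arch/arm/mach-myplat/include/mach/memory.h (hypothetical platform) */
#ifndef __ASM_ARCH_MEMORY_H
#define __ASM_ARCH_MEMORY_H

/*
 * DMA masters on this SoC snoop the CPU caches, so the streaming DMA
 * API can skip all cache maintenance.
 */
#define arch_is_coherent()      1

#endif
```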

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17  8:57                               ` Russell King - ARM Linux
@ 2010-05-17  9:50                                 ` Ronen Shitrit
  2010-05-17 10:03                                   ` Russell King - ARM Linux
  0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17  9:50 UTC (permalink / raw)
  To: linux-arm-kernel



-----Original Message-----
From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk] 
Sent: Monday, May 17, 2010 11:58 AM
To: Ronen Shitrit
Cc: Catalin Marinas; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops

On Mon, May 17, 2010 at 11:29:53AM +0300, Ronen Shitrit wrote:
> On Mon, May 17, 2010 at 10:34:36AM +0300, Ronen Shitrit wrote:
> > I can only think of one problematic scenario:
> > - After 1 and before 2: 
> > 	CPU0 did spec prefetch for address x 
> > - After 2: 
> > 	CPU0 is doing inv: lock int, ldr x, 
> > 		str x (addr x is marked dirty with wrong data).
> 
> The data has been corrupted at this point; you don't need a second CPU.
>
> [Ronen Shitrit] Why? If the second CPU doesn't exist an invalidate will
> come just after the str inst and the wrong data will not get anywhere...
> (interrupts are locked so I assume no eviction...)

If a speculative prefetch occurs (eg, to prefetch the next ldr) it
could evict the dirty cache line that the str just wrote to.  Cache
replacement algorithms aren't always round-robin.
[Ronen Shitrit] The only ldr around is for the next line, which shouldn't evict the current line, so I don't see any issue.

> 
> > 	CPU1 at this time doing spec prefetch for x which 
> > 		will snoop CPU0 and will cause data corruption... :(
> > 
> > I can assume that if I count on the Soc IO cache coherency support all
> > this isn't relevant, right? Or there are some issues hiding with v6 +
> > IO coherency?
> 
> What "IO cache coherency" ?  Are you saying that your SoC has a mode
> where the DMA controller can snoop the CPU caches?
> 
> [Ronen Shitrit] Yes. 

That suggests to me that your SoC manufacturer intended the system to run
with DMA coherency enabled - maybe to avoid these problems.

We have a macro in the kernel to avoid unnecessary cache handling as a
result of that - arch_is_coherent().  Maybe you should enable the hardware
DMA coherency and ensure arch_is_coherent() returns true for your SoC.
[Ronen Shitrit] I know, I just wanted to make sure we have a fallback in case of some HW malfunction.

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-16  6:29                 ` Ronen Shitrit
  2010-05-16 15:01                   ` Russell King - ARM Linux
@ 2010-05-17  9:51                   ` Catalin Marinas
  2010-05-17  9:57                     ` Catalin Marinas
  1 sibling, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17  9:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, 2010-05-16 at 07:29 +0100, Ronen Shitrit wrote:
> Our ARMv6 do speculative rd for both I and D cache...

I'll check with the hardware guys here in ARM and get back to you.

-- 
Catalin

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17  9:51                   ` Catalin Marinas
@ 2010-05-17  9:57                     ` Catalin Marinas
  2010-05-17  9:59                       ` Ronen Shitrit
  0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17  9:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2010-05-17 at 10:51 +0100, Catalin Marinas wrote:
> On Sun, 2010-05-16 at 07:29 +0100, Ronen Shitrit wrote:
> > Our ARMv6 do speculative rd for both I and D cache...
> 
> I'll check with the hardware guys here in ARM and get back to you.

Just for clarification - is your ARMv6 MP processor an ARM11MPCore or
your own MP variant?

In case of Cortex-A9 (ARMv7), the TRM states clearly that it can do
speculative loads into the L1 D-cache and this can be disabled via a bit
in the auxiliary control register. Do you have a similar bit on your
ARMv6 MP processor?

Thanks.

-- 
Catalin

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17  9:57                     ` Catalin Marinas
@ 2010-05-17  9:59                       ` Ronen Shitrit
  2010-05-17 11:08                         ` Catalin Marinas
  0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17  9:59 UTC (permalink / raw)
  To: linux-arm-kernel



-----Original Message-----
From: Catalin Marinas [mailto:catalin.marinas at arm.com] 
Sent: Monday, May 17, 2010 12:58 PM
To: Ronen Shitrit
Cc: Russell King - ARM Linux; linux-arm-kernel at lists.infradead.org
Subject: RE: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops

On Mon, 2010-05-17 at 10:51 +0100, Catalin Marinas wrote:
> On Sun, 2010-05-16 at 07:29 +0100, Ronen Shitrit wrote:
> > Our ARMv6 do speculative rd for both I and D cache...
> 
> I'll check with the hardware guys here in ARM and get back to you.

Just for clarification - is your ARMv6 MP processor an ARM11MPCore or
your own MP variant?
[Ronen Shitrit] our own.

In case of Cortex-A9 (ARMv7), the TRM states clearly that it can do
speculative loads into the L1 D-cache and this can be disabled via a bit
in the auxiliary control register. Do you have a similar bit on your
ARMv6 MP processor?
[Ronen Shitrit] Yes.

Thanks.

-- 
Catalin

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17  7:34                         ` Ronen Shitrit
  2010-05-17  7:43                           ` Russell King - ARM Linux
@ 2010-05-17 10:00                           ` Catalin Marinas
  2010-05-17 11:29                             ` Ronen Shitrit
  1 sibling, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 10:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2010-05-17 at 10:34 +0300, Ronen Shitrit wrote:
> After some more thinking, I think I found the problem...
> 
> DMA for receive should operate as follows:
> 1) CPU will map (invalidate) the buffers for the DMA
> 2) DMA will place the buffers and ack the CPU
> 3) The CPU will call unmap which will invalidate the buffers again

(3) doesn't happen with my patch - the unmap function is now a no-op
since the invalidate operation is destructive.

As I said in a previous e-mail, it's the sync operation that does an
invalidate and this would affect the data in the buffer. Hence the patch
to make the invalidate operation non-destructive (as long as your CPU
doesn't do speculative loads into the D-cache or they can be disabled).

-- 
Catalin

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17  9:50                                 ` Ronen Shitrit
@ 2010-05-17 10:03                                   ` Russell King - ARM Linux
  2010-05-17 11:26                                     ` Ronen Shitrit
  0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-17 10:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 17, 2010 at 12:50:01PM +0300, Ronen Shitrit wrote:
> If a speculative prefetch occurs (eg, to prefetch the next ldr) it
> could evict the dirty cache line that the str just wrote to.  Cache
> replacement algorithms aren't always round-robin.
>
> [Ronen Shitrit] only ldr around is the next line, which shouldn't
> evict the current line, so I don't see any issue.

How can you say that the current line won't be evicted?  Do you know
the cache replacement algorithm for your CPU that well?

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17  9:59                       ` Ronen Shitrit
@ 2010-05-17 11:08                         ` Catalin Marinas
  2010-05-17 11:27                           ` Ronen Shitrit
  0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 11:08 UTC (permalink / raw)
  To: linux-arm-kernel

> In case of Cortex-A9 (ARMv7), the TRM states clearly that it can do
> speculative loads into the L1 D-cache and this can be disabled via a bit
> in the auxiliary control register. Do you have a similar bit on your
> ARMv6 MP processor?
> 
> [Ronen Shitrit] Yes.

So, use it :)

-- 
Catalin

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17 10:03                                   ` Russell King - ARM Linux
@ 2010-05-17 11:26                                     ` Ronen Shitrit
  2010-05-17 11:31                                       ` Russell King - ARM Linux
  0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 11:26 UTC (permalink / raw)
  To: linux-arm-kernel



-----Original Message-----
From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk] 
Sent: Monday, May 17, 2010 1:04 PM
To: Ronen Shitrit
Cc: Catalin Marinas; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops

On Mon, May 17, 2010 at 12:50:01PM +0300, Ronen Shitrit wrote:
> If a speculative prefetch occurs (eg, to prefetch the next ldr) it
> could evict the dirty cache line that the str just wrote to.  Cache
> replacement algorithms aren't always round-robin.
>
> [Ronen Shitrit] only ldr around is the next line, which shouldn't
> evict the current line, so I don't see any issue.

How can you say that the current line won't be evicted?
[Ronen Shitrit] Since the only ldr I see around is the ldr to the next line, and the next line will go to a different line in the cache...
  Do you know
the cache replacement algorithm for your CPU that well?
[Ronen Shitrit] We can configure it to use either LRU or random replacement...

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17 11:08                         ` Catalin Marinas
@ 2010-05-17 11:27                           ` Ronen Shitrit
  2010-05-17 11:47                             ` Catalin Marinas
  0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 11:27 UTC (permalink / raw)
  To: linux-arm-kernel

Performance-wise we would prefer to enable it...
But to be on the safe side, we will disable it.

Thanks

-----Original Message-----
From: Catalin Marinas [mailto:catalin.marinas at arm.com] 
Sent: Monday, May 17, 2010 2:08 PM
To: Ronen Shitrit
Cc: Russell King - ARM Linux; linux-arm-kernel at lists.infradead.org
Subject: RE: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops

> In case of Cortex-A9 (ARMv7), the TRM states clearly that it can do
> speculative loads into the L1 D-cache and this can be disabled via a bit
> in the auxiliary control register. Do you have a similar bit on your
> ARMv6 MP processor?
> 
> [Ronen Shitrit] Yes.

So, use it :)

-- 
Catalin

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17 10:00                           ` Catalin Marinas
@ 2010-05-17 11:29                             ` Ronen Shitrit
  2010-05-17 11:42                               ` Catalin Marinas
  0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 11:29 UTC (permalink / raw)
  To: linux-arm-kernel



-----Original Message-----
From: Catalin Marinas [mailto:catalin.marinas at arm.com] 
Sent: Monday, May 17, 2010 1:01 PM
To: Ronen Shitrit
Cc: Russell King - ARM Linux; linux-arm-kernel at lists.infradead.org
Subject: RE: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops

On Mon, 2010-05-17 at 10:34 +0300, Ronen Shitrit wrote:
> After some more thinking, I think I found the problem...
> 
> DMA for receive should operate as follows:
> 1) CPU will map (invalidate) the buffers for the DMA
> 2) DMA will place the buffers and ack the CPU
> 3) The CPU will call unmap which will invalidate the buffers again

(3) doesn't happen with my patch - the unmap function is now a no-op
since the invalidate operation is destructive.

[Ronen Shitrit] So how do you protect yourself from I-cache speculative prefetch?
Do you have a unified L2?

As I said in a previous e-mail, it's the sync operation that does an
invalidate and this would affect the data in the buffer. Hence the patch
to make the invalidate operation non-destructive (as long as your CPU
doesn't do speculative loads into the D-cache or they can be disabled).
[Ronen Shitrit] What about I-cache prefetch? Unified cache?

-- 
Catalin

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17 11:26                                     ` Ronen Shitrit
@ 2010-05-17 11:31                                       ` Russell King - ARM Linux
  2010-05-17 11:45                                         ` Catalin Marinas
  0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-17 11:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 17, 2010 at 02:26:38PM +0300, Ronen Shitrit wrote:
> > How can you say that the current line won't be evicted?
>
> [Ronen Shitrit] Since the only ldr I see around is the ldr to the next
> line and next line will go to different line on the cache...

I think you mean a different index, and yes, you're right about that.

However, I still feel that your solution is unsafe as long as
speculative prefetch is enabled; from an architectural point of
view, one has to assume that the CPU has infinite prefetching.

The fact that we have non-ARM ARMv6 CPUs which do prefetch in ways
we don't know about means that we can't assume that ARMv6 CPUs
aren't going to have aggressive prefetching.

* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17 11:29                             ` Ronen Shitrit
@ 2010-05-17 11:42                               ` Catalin Marinas
  2010-05-17 12:04                                 ` Ronen Shitrit
  0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 11:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2010-05-17 at 14:29 +0300, Ronen Shitrit wrote:
> From: Catalin Marinas [mailto:catalin.marinas at arm.com] 
> On Mon, 2010-05-17 at 10:34 +0300, Ronen Shitrit wrote:
> > After some more thinking, I think I found the problem...
> > 
> > DMA for receive should operate as follows:
> > 1) CPU will map (invalidate) the buffers for the DMA
> > 2) DMA will place the buffers and ack the CPU
> > 3) The CPU will call unmap which will invalidate the buffers again
> 
> (3) doesn't happen with my patch - the unmap function is now a no-op
> since the invalidate operation is destructive.
> 
> [Ronen Shitrit] so how do you protect your self from I-cache spec prefetch?

Strictly looking at the DMA cache maintenance, the API only cares about
the D-side or unified (the I-side is handled by the kernel via other
functions like flush_dcache_page).

> Do you have a unified L2?

Yes, it's a unified L2, so I-cache prefetching can cause allocation into
the L2 cache. The L2 cache is considered "outer cache" in Linux and
invalidated explicitly both before and after the DMA transfer (from
device). Since the L2 (at least on the RealView boards) is common to all
the processors, there is no need for tricks like read/write for
ownership, so the invalidation doesn't affect the underlying RAM.
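For reference, the "read/write for ownership" trick mentioned above looks
roughly like the following in the ARMv6 DMA cache routines. This is an
illustrative sketch modelled on arch/arm/mm/cache-v6.S, not the verbatim
patch; the label and constant names are assumptions:

```
@ Sketch of read/write for ownership in v6_dma_inv_range (illustrative).
@ The LDRB/STRB pair migrates the cache line to the executing CPU, so
@ the subsequent local cache op acts on the line even though ARM11MPCore
@ does not broadcast cache maintenance in hardware.
v6_dma_inv_range:
	ldrb	r2, [r0]		@ read for ownership
	strb	r2, [r0]		@ write for ownership
	bic	r0, r0, #CACHE_LINE_SIZE - 1
1:	mcr	p15, 0, r0, c7, c6, 1	@ invalidate D line (by MVA)
	add	r0, r0, #CACHE_LINE_SIZE
	cmp	r0, r1
	blo	1b
	mov	r0, #0
	mcr	p15, 0, r0, c7, c10, 4	@ drain write buffer
	mov	pc, lr
```

The real routine additionally cleans partially-covered lines at unaligned
start/end addresses; that detail is omitted here for brevity.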

> As I said in a previous e-mail, it's the sync operation that does an
> invalidate and this would affect the data in the buffer. Hence the patch
> to make the invalidate operation non-destructive (as long as your CPU
> doesn't do speculative loads into the D-cache or they can be disabled).
> 
> [Ronen Shitrit] What about I-cache prefetch? Unified cache?

The "sync_for_cpu" operation (from device case) first invalidates the L2
cache and then invalidates the L1 using a LDR/STR pair and cache op. So
a the LDR would see the data in RAM rather than some speculatively
loaded value in the L2.

-- 
Catalin


* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17 11:31                                       ` Russell King - ARM Linux
@ 2010-05-17 11:45                                         ` Catalin Marinas
  0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 11:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2010-05-17 at 12:31 +0100, Russell King - ARM Linux wrote:
> The fact that we have non-ARM ARMv6 CPUs which do prefetch in ways
> we don't know about means that we can't assume that ARMv6 CPUs
> aren't going to have aggressive prefetching.

I agree with this. Maybe we should add a config option for the DMA ops
workaround and state clearly in the help that it doesn't work with
speculative loads into the D-cache. Usually implementations have either
a bit in CP15 ACTLR or some hidden (undocumented) bit somewhere else to
disable speculative loads.
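A Kconfig entry along the lines Catalin suggests might look like the
following. The symbol name and wording here are illustrative, not taken
from an actual patch:

```
config DMA_CACHE_RWFO
	bool "Enable read/write for ownership DMA cache maintenance"
	depends on CPU_V6 && SMP
	default y
	help
	  The Snoop Control Unit on ARM11MPCore does not detect cache
	  maintenance operations, so the dma_{map,unmap}_area()
	  functions may leave stale entries in the caches of other
	  CPUs. Enabling this option makes the ARMv6 DMA cache
	  maintenance functions perform a read or write for ownership
	  first, migrating each line to the executing CPU.

	  Note: this workaround is NOT safe if the CPU performs
	  speculative loads into the D-cache, unless those can be
	  disabled (e.g. via a bit in CP15 ACTLR).
```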

-- 
Catalin


* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17 11:27                           ` Ronen Shitrit
@ 2010-05-17 11:47                             ` Catalin Marinas
  2010-05-17 13:46                               ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 " Catalin Marinas
  0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 11:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2010-05-17 at 14:27 +0300, Ronen Shitrit wrote:
> Performance wise we prefer to enable it...
> But to be on the safe side, it seems we will disable it.

It's worth doing some benchmarking.

Anyway, if it's a new SoC, you may have the option of implementing IPI
via FIQ and you could broadcast the cache maintenance operations via FIQ
(and avoid potential deadlocks in the kernel).

-- 
Catalin


* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17 11:42                               ` Catalin Marinas
@ 2010-05-17 12:04                                 ` Ronen Shitrit
  2010-05-17 13:45                                   ` Catalin Marinas
  0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 12:04 UTC (permalink / raw)
  To: linux-arm-kernel



> As I said in a previous e-mail, it's the sync operation that does an
> invalidate and this would affect the data in the buffer. Hence the patch
> to make the invalidate operation non-destructive (as long as your CPU
> doesn't do speculative loads into the D-cache or they can be disabled).
> 
> [Ronen Shitrit] What about I-cache prefetch? Unified cache?

The "sync_for_cpu" operation (from device case) first invalidates the L2
cache and then invalidates the L1 using a LDR/STR pair and cache op.
[Ronen Shitrit] So for v6-MP, if no D-cache spec pref, why do you need to inv the L1 at all?



-- 
Catalin


* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17 12:04                                 ` Ronen Shitrit
@ 2010-05-17 13:45                                   ` Catalin Marinas
  0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 13:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2010-05-17 at 13:04 +0100, Ronen Shitrit wrote:
> > As I said in a previous e-mail, it's the sync operation that does an
> > invalidate and this would affect the data in the buffer. Hence the patch
> > to make the invalidate operation non-destructive (as long as your CPU
> > doesn't do speculative loads into the D-cache or they can be disabled).
> >
> > [Ronen Shitrit] What about I-cache prefetch? Unified cache?
> 
> The "sync_for_cpu" operation (from device case) first invalidates the L2
> cache and then invalidates the L1 using a LDR/STR pair and cache op.
> [Ronen Shitrit] So for v6-MP, if no D-cache spec pref, why do you need to inv the L1 at all?

You are correct, the L1 invalidation isn't needed in that case. The
___dma_single_dev_to_cpu(FROM_DEVICE) function calls outer_inv_range()
followed by dmac_unmap_area(). The latter, however, is a no-op with my
DMA workaround patch.
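The sequence described here would look roughly like the sketch below. It
is modelled on arch/arm/mm/dma-mapping.c of that era, not the verbatim
kernel source:

```
/* Sketch of the unmap path for a from-device buffer (illustrative). */
static void ___dma_single_dev_to_cpu(const void *kaddr, size_t size,
				     enum dma_data_direction dir)
{
	if (dir != DMA_TO_DEVICE) {
		unsigned long paddr = __pa(kaddr);
		/* L2 ("outer") cache: drop any lines speculatively
		 * allocated during the DMA transfer */
		outer_inv_range(paddr, paddr + size);
	}
	/* L1: a no-op with the RWFO patch, since the map-time
	 * invalidate was already made non-destructive */
	dmac_unmap_area(kaddr, size, dir);
}
```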

-- 
Catalin


* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
  2010-05-17 11:47                             ` Catalin Marinas
@ 2010-05-17 13:46                               ` Catalin Marinas
  0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 13:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2010-05-17 at 12:47 +0100, Catalin Marinas wrote:
> On Mon, 2010-05-17 at 14:27 +0300, Ronen Shitrit wrote:
> > Performance wise we prefer to enable it...
> > But to be on the safe side, it seems we will disable it.
> 
> It's worth doing some benchmarking.
> 
> Anyway, if it's a new SoC, you may have the option of implementing IPI
> via FIQ and you could broadcast the cache maintenance operations via FIQ
> (and avoid potential deadlocks in the kernel).

Or even better - implement cache broadcasting in hardware.

-- 
Catalin


* [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
  2010-05-04 16:44 ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP Catalin Marinas
@ 2010-05-28 18:50   ` Russell King - ARM Linux
  2010-05-28 21:37     ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on " Catalin Marinas
  0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-28 18:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 04, 2010 at 05:44:54PM +0100, Catalin Marinas wrote:
> The standard I-cache Invalidate All (ICIALLU) and Branch Prediction
> Invalidate All (BPIALL) operations are not automatically broadcast to
> the other CPUs in an ARMv7 MP system. The patch adds the Inner Shareable
> variants, ICIALLUIS and BPIALLIS, if ARMv7 and SMP.

FYI, I'm considering reverting this patch; it is one of two patches
responsible for causing a stability regression on the Versatile Express
platform.

The two commits responsible for making Versatile Express userspace
completely unstable are:

b8349b569aae661dea9d59d7d2ee587ccea3336c
ARM: 6112/1: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP

7e5a69e83ba7a0d5917ad830f417cba8b8d6aa72
ARM: 6007/1: fix highmem with VIPT cache and DMA

Most of the instability is from the second patch, but reverting just
the second patch doesn't fully restore userspace stability.


* [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
  2010-05-28 18:50   ` Russell King - ARM Linux
@ 2010-05-28 21:37     ` Catalin Marinas
  0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-28 21:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2010-05-28 at 19:50 +0100, Russell King - ARM Linux wrote:
> On Tue, May 04, 2010 at 05:44:54PM +0100, Catalin Marinas wrote:
> > The standard I-cache Invalidate All (ICIALLU) and Branch Prediction
> > Invalidate All (BPIALL) operations are not automatically broadcast to
> > the other CPUs in an ARMv7 MP system. The patch adds the Inner Shareable
> > variants, ICIALLUIS and BPIALLIS, if ARMv7 and SMP.
> 
> FYI, I'm considering reverting this patch; this patch is one of two
> responsible for causing a stability regression of Versatile Express
> platform.
> 
> The two commits responsible for making Versatile Express userspace
> completely unstable are:
> 
> b8349b569aae661dea9d59d7d2ee587ccea3336c
> ARM: 6112/1: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
> 
> 7e5a69e83ba7a0d5917ad830f417cba8b8d6aa72
> ARM: 6007/1: fix highmem with VIPT cache and DMA
> 
> Most of the instability is from the second patch, but reverting just
> the second patch doesn't fully restore userspace stability.

That's a bit weird, as my patch is supposed to make things better. Does
it fail in the same way with maxcpus=1? For an SMP configuration, you
could also try some of the recent patches from my "proposed" branch:

http://git.kernel.org/?p=linux/kernel/git/cmarinas/linux-2.6-cm.git;a=shortlog;h=refs/heads/proposed

especially:

ARM: Use lazy cache flushing on ARMv7 SMP systems
ARM: Assume new page cache pages have dirty D-cache
ARM: Synchronise the I and D caches via set_pte_at

-- 
Catalin


end of thread, other threads:[~2010-05-28 21:37 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
2010-05-04 16:44 ` [PATCH 1/8] ARM: Improve the L2 cache performance when PL310 is used Catalin Marinas
2010-05-04 17:02   ` Jason McMullan
2010-05-05 16:07     ` Catalin Marinas
2010-05-04 16:44 ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops Catalin Marinas
2010-05-04 17:04   ` Jason McMullan
2010-05-05 16:23     ` Catalin Marinas
2010-05-05 13:26   ` George G. Davis
2010-05-06 14:40     ` Catalin Marinas
2010-05-06 15:57       ` George G. Davis
2010-05-12 12:51   ` Ronen Shitrit
2010-05-12 13:55     ` Catalin Marinas
2010-05-12 15:03       ` Ronen Shitrit
2010-05-12 18:48       ` Russell King - ARM Linux
2010-05-12 18:59         ` Russell King - ARM Linux
2010-05-12 20:00           ` Ronen Shitrit
2010-05-12 20:04             ` Russell King - ARM Linux
2010-05-12 20:19               ` Ronen Shitrit
2010-05-12 21:21           ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 " Catalin Marinas
2010-05-13  5:27             ` Ronen Shitrit
2010-05-13  8:26               ` Catalin Marinas
2010-05-13 13:54                 ` George G. Davis
2010-05-13 14:15                   ` Catalin Marinas
2010-05-13 20:34                     ` George G. Davis
2010-05-14 16:29                       ` Catalin Marinas
2010-05-14 16:42                         ` Catalin Marinas
2010-05-15  1:26                           ` George G. Davis
2010-05-16  6:28                             ` Ronen Shitrit
2010-05-16  6:29                 ` Ronen Shitrit
2010-05-16 15:01                   ` Russell King - ARM Linux
2010-05-17  6:29                     ` Ronen Shitrit
2010-05-17  6:57                       ` Russell King - ARM Linux
2010-05-17  7:34                         ` Ronen Shitrit
2010-05-17  7:43                           ` Russell King - ARM Linux
2010-05-17  8:29                             ` Ronen Shitrit
2010-05-17  8:57                               ` Russell King - ARM Linux
2010-05-17  9:50                                 ` Ronen Shitrit
2010-05-17 10:03                                   ` Russell King - ARM Linux
2010-05-17 11:26                                     ` Ronen Shitrit
2010-05-17 11:31                                       ` Russell King - ARM Linux
2010-05-17 11:45                                         ` Catalin Marinas
2010-05-17 10:00                           ` Catalin Marinas
2010-05-17 11:29                             ` Ronen Shitrit
2010-05-17 11:42                               ` Catalin Marinas
2010-05-17 12:04                                 ` Ronen Shitrit
2010-05-17 13:45                                   ` Catalin Marinas
2010-05-17  9:51                   ` Catalin Marinas
2010-05-17  9:57                     ` Catalin Marinas
2010-05-17  9:59                       ` Ronen Shitrit
2010-05-17 11:08                         ` Catalin Marinas
2010-05-17 11:27                           ` Ronen Shitrit
2010-05-17 11:47                             ` Catalin Marinas
2010-05-17 13:46                               ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 " Catalin Marinas
2010-05-04 16:44 ` [PATCH 3/8] ARM: Align machine_desc.phys_io to a 1MB section Catalin Marinas
2010-05-04 16:44 ` [PATCH 4/8] ARM: Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas
2010-05-04 16:44 ` [PATCH 5/8] ARM: Fix the __arm_ioremap_caller() definition in nommu.c Catalin Marinas
2010-05-04 17:19   ` Russell King - ARM Linux
2010-05-04 16:44 ` [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU Catalin Marinas
2010-05-04 17:19   ` Russell King - ARM Linux
2010-05-04 16:44 ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP Catalin Marinas
2010-05-28 18:50   ` Russell King - ARM Linux
2010-05-28 21:37     ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on " Catalin Marinas
2010-05-04 16:44 ` [PATCH 8/8] ARM: Implement phys_mem_access_prot() to avoid attributes aliasing Catalin Marinas
2010-05-04 16:48 ` [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
