* [PATCH 0/8] Various patches for comments and upstream
@ 2010-05-04 16:44 Catalin Marinas
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
I've been posting most of these patches in the past (some of them for a
long time, e.g. the domains removal) and I would like some comments on
them. There were some previous discussion threads which seem to have
died without a clear conclusion.
A few of these patches are trivial fixes that could go upstream (during
the upcoming merge window). For the others, like the domains removal, I
would like to push them to linux-next if there are no (further :))
comments.
Thanks.
Catalin Marinas (8):
ARM: Improve the L2 cache performance when PL310 is used
ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
ARM: Align machine_desc.phys_io to a 1MB section
ARM: Remove the domain switching on ARMv6k/v7 CPUs
ARM: Fix the __arm_ioremap_caller() definition in nommu.c
ARM: Implement copy_to_user_page() for noMMU
ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
ARM: Implement phys_mem_access_prot() to avoid attributes aliasing
arch/arm/include/asm/assembler.h | 13 +++--
arch/arm/include/asm/cacheflush.h | 4 ++
arch/arm/include/asm/domain.h | 31 ++++++++++++
arch/arm/include/asm/futex.h | 9 ++--
arch/arm/include/asm/pgtable.h | 3 +
arch/arm/include/asm/tlbflush.h | 29 +++++++++++-
arch/arm/include/asm/uaccess.h | 16 +++---
arch/arm/kernel/entry-armv.S | 4 +-
arch/arm/kernel/head.S | 2 +
arch/arm/kernel/setup.c | 2 -
arch/arm/kernel/traps.c | 17 +++++++
arch/arm/lib/getuser.S | 13 +++--
arch/arm/lib/putuser.S | 29 ++++++------
arch/arm/lib/uaccess.S | 83 +++++++++++++++++----------------
arch/arm/mm/Kconfig | 15 ++++++
arch/arm/mm/cache-l2x0.c | 74 +++++++++++++++++++++---------
arch/arm/mm/cache-v6.S | 17 +++++--
arch/arm/mm/cache-v7.S | 4 ++
arch/arm/mm/init.c | 14 ++----
arch/arm/mm/mmu.c | 14 ++++++
arch/arm/mm/nommu.c | 13 ++++-
arch/arm/mm/proc-macros.S | 7 +++
arch/arm/mm/proc-v7.S | 5 +-
arch/arm/mm/tlb-v7.S | 8 +++
drivers/net/smsc911x.c | 92 +++++++++++++++++++++----------------
mm/bootmem.c | 2 +
mm/page_alloc.c | 1
27 files changed, 355 insertions(+), 166 deletions(-)
--
Catalin
* [PATCH 1/8] ARM: Improve the L2 cache performance when PL310 is used
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
To: linux-arm-kernel
With the PL310 L2 cache controller, cache maintenance by PA and the sync
operations are atomic and do not require a "wait" loop or spinlocks.
This patch conditionally defines the cache_wait() function and the
locking primitives (rather than duplicating the functions or the file).
Since the earlier L2x0 cache controllers (L210/L220) do not work with
ARMv7 CPUs, the patch automatically enables CACHE_PL310 when CPU_V7 is
defined.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/mm/Kconfig | 7 ++++
arch/arm/mm/cache-l2x0.c | 73 ++++++++++++++++++++++++++++++++--------------
2 files changed, 57 insertions(+), 23 deletions(-)
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 5bd7c89..5df74c1 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -767,6 +767,13 @@ config CACHE_L2X0
help
This option enables the L2x0 PrimeCell.
+config CACHE_PL310
+ bool
+ depends on CACHE_L2X0
+ default y if CPU_V7
+ help
+ This option enables support for the PL310 cache controller.
+
config CACHE_TAUROS2
bool "Enable the Tauros2 L2 cache controller"
depends on ARCH_DOVE
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 21ad68b..87bd0a0 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -26,15 +26,42 @@
#define CACHE_LINE_SIZE 32
static void __iomem *l2x0_base;
-static DEFINE_SPINLOCK(l2x0_lock);
-static inline void cache_wait(void __iomem *reg, unsigned long mask)
+static inline void cache_wait_always(void __iomem *reg, unsigned long mask)
{
/* wait for the operation to complete */
while (readl(reg) & mask)
;
}
+#ifdef CONFIG_CACHE_PL310
+
+static inline void cache_wait(void __iomem *reg, unsigned long mask)
+{
+ /* cache operations are atomic */
+}
+
+#define _l2x0_lock(lock, flags) ((void)(flags))
+#define _l2x0_unlock(lock, flags) ((void)(flags))
+
+#define block_end(start, end) (end)
+
+#define L2CC_TYPE "PL310/L2C-310"
+
+#else /* !CONFIG_CACHE_PL310 */
+
+#define cache_wait cache_wait_always
+
+static DEFINE_SPINLOCK(l2x0_lock);
+#define _l2x0_lock(lock, flags) spin_lock_irqsave(lock, flags)
+#define _l2x0_unlock(lock, flags) spin_unlock_irqrestore(lock, flags)
+
+#define block_end(start, end) ((start) + min((end) - (start), 4096UL))
+
+#define L2CC_TYPE "L2x0"
+
+#endif /* CONFIG_CACHE_PL310 */
+
static inline void cache_sync(void)
{
void __iomem *base = l2x0_base;
@@ -97,9 +124,9 @@ static void l2x0_cache_sync(void)
{
unsigned long flags;
- spin_lock_irqsave(&l2x0_lock, flags);
+ _l2x0_lock(&l2x0_lock, flags);
cache_sync();
- spin_unlock_irqrestore(&l2x0_lock, flags);
+ _l2x0_unlock(&l2x0_lock, flags);
}
static inline void l2x0_inv_all(void)
@@ -107,11 +134,11 @@ static inline void l2x0_inv_all(void)
unsigned long flags;
/* invalidate all ways */
- spin_lock_irqsave(&l2x0_lock, flags);
+ _l2x0_lock(&l2x0_lock, flags);
writel(0xff, l2x0_base + L2X0_INV_WAY);
- cache_wait(l2x0_base + L2X0_INV_WAY, 0xff);
+ cache_wait_always(l2x0_base + L2X0_INV_WAY, 0xff);
cache_sync();
- spin_unlock_irqrestore(&l2x0_lock, flags);
+ _l2x0_unlock(&l2x0_lock, flags);
}
static void l2x0_inv_range(unsigned long start, unsigned long end)
@@ -119,7 +146,7 @@ static void l2x0_inv_range(unsigned long start, unsigned long end)
void __iomem *base = l2x0_base;
unsigned long flags;
- spin_lock_irqsave(&l2x0_lock, flags);
+ _l2x0_lock(&l2x0_lock, flags);
if (start & (CACHE_LINE_SIZE - 1)) {
start &= ~(CACHE_LINE_SIZE - 1);
debug_writel(0x03);
@@ -136,7 +163,7 @@ static void l2x0_inv_range(unsigned long start, unsigned long end)
}
while (start < end) {
- unsigned long blk_end = start + min(end - start, 4096UL);
+ unsigned long blk_end = block_end(start, end);
while (start < blk_end) {
l2x0_inv_line(start);
@@ -144,13 +171,13 @@ static void l2x0_inv_range(unsigned long start, unsigned long end)
}
if (blk_end < end) {
- spin_unlock_irqrestore(&l2x0_lock, flags);
- spin_lock_irqsave(&l2x0_lock, flags);
+ _l2x0_unlock(&l2x0_lock, flags);
+ _l2x0_lock(&l2x0_lock, flags);
}
}
cache_wait(base + L2X0_INV_LINE_PA, 1);
cache_sync();
- spin_unlock_irqrestore(&l2x0_lock, flags);
+ _l2x0_unlock(&l2x0_lock, flags);
}
static void l2x0_clean_range(unsigned long start, unsigned long end)
@@ -158,10 +185,10 @@ static void l2x0_clean_range(unsigned long start, unsigned long end)
void __iomem *base = l2x0_base;
unsigned long flags;
- spin_lock_irqsave(&l2x0_lock, flags);
+ _l2x0_lock(&l2x0_lock, flags);
start &= ~(CACHE_LINE_SIZE - 1);
while (start < end) {
- unsigned long blk_end = start + min(end - start, 4096UL);
+ unsigned long blk_end = block_end(start, end);
while (start < blk_end) {
l2x0_clean_line(start);
@@ -169,13 +196,13 @@ static void l2x0_clean_range(unsigned long start, unsigned long end)
}
if (blk_end < end) {
- spin_unlock_irqrestore(&l2x0_lock, flags);
- spin_lock_irqsave(&l2x0_lock, flags);
+ _l2x0_unlock(&l2x0_lock, flags);
+ _l2x0_lock(&l2x0_lock, flags);
}
}
cache_wait(base + L2X0_CLEAN_LINE_PA, 1);
cache_sync();
- spin_unlock_irqrestore(&l2x0_lock, flags);
+ _l2x0_unlock(&l2x0_lock, flags);
}
static void l2x0_flush_range(unsigned long start, unsigned long end)
@@ -183,10 +210,10 @@ static void l2x0_flush_range(unsigned long start, unsigned long end)
void __iomem *base = l2x0_base;
unsigned long flags;
- spin_lock_irqsave(&l2x0_lock, flags);
+ _l2x0_lock(&l2x0_lock, flags);
start &= ~(CACHE_LINE_SIZE - 1);
while (start < end) {
- unsigned long blk_end = start + min(end - start, 4096UL);
+ unsigned long blk_end = block_end(start, end);
debug_writel(0x03);
while (start < blk_end) {
@@ -196,13 +223,13 @@ static void l2x0_flush_range(unsigned long start, unsigned long end)
debug_writel(0x00);
if (blk_end < end) {
- spin_unlock_irqrestore(&l2x0_lock, flags);
- spin_lock_irqsave(&l2x0_lock, flags);
+ _l2x0_unlock(&l2x0_lock, flags);
+ _l2x0_lock(&l2x0_lock, flags);
}
}
cache_wait(base + L2X0_CLEAN_INV_LINE_PA, 1);
cache_sync();
- spin_unlock_irqrestore(&l2x0_lock, flags);
+ _l2x0_unlock(&l2x0_lock, flags);
}
void __init l2x0_init(void __iomem *base, __u32 aux_val, __u32 aux_mask)
@@ -236,5 +263,5 @@ void __init l2x0_init(void __iomem *base, __u32 aux_val, __u32 aux_mask)
outer_cache.flush_range = l2x0_flush_range;
outer_cache.sync = l2x0_cache_sync;
- printk(KERN_INFO "L2X0 cache controller enabled\n");
+ pr_info(L2CC_TYPE " cache controller enabled\n");
}
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
To: linux-arm-kernel
The Snoop Control Unit on the ARM11MPCore hardware does not broadcast
cache maintenance operations, so the dma_cache_maint*() functions may
leave stale cache entries on the other CPUs. The solution implemented in
this patch performs a Read or Write For Ownership in the ARMv6 DMA cache
maintenance functions. These LDR/STR instructions move the cache line
into the Shared or Exclusive state so that the subsequent cache
maintenance operation has the desired effect.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/mm/cache-v6.S | 17 +++++++++++++----
1 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 9d89c67..e46ecd8 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -211,6 +211,9 @@ v6_dma_inv_range:
mcrne p15, 0, r1, c7, c15, 1 @ clean & invalidate unified line
#endif
1:
+#ifdef CONFIG_SMP
+ str r0, [r0] @ write for ownership
+#endif
#ifdef HARVARD_CACHE
mcr p15, 0, r0, c7, c6, 1 @ invalidate D line
#else
@@ -231,6 +234,9 @@ v6_dma_inv_range:
v6_dma_clean_range:
bic r0, r0, #D_CACHE_LINE_SIZE - 1
1:
+#ifdef CONFIG_SMP
+ ldr r2, [r0] @ read for ownership
+#endif
#ifdef HARVARD_CACHE
mcr p15, 0, r0, c7, c10, 1 @ clean D line
#else
@@ -251,6 +257,10 @@ v6_dma_clean_range:
ENTRY(v6_dma_flush_range)
bic r0, r0, #D_CACHE_LINE_SIZE - 1
1:
+#ifdef CONFIG_SMP
+ ldr r2, [r0] @ read for ownership
+ str r2, [r0] @ write for ownership
+#endif
#ifdef HARVARD_CACHE
mcr p15, 0, r0, c7, c14, 1 @ clean & invalidate D line
#else
@@ -273,7 +283,9 @@ ENTRY(v6_dma_map_area)
add r1, r1, r0
teq r2, #DMA_FROM_DEVICE
beq v6_dma_inv_range
- b v6_dma_clean_range
+ teq r2, #DMA_TO_DEVICE
+ beq v6_dma_clean_range
+ b v6_dma_flush_range
ENDPROC(v6_dma_map_area)
/*
@@ -283,9 +295,6 @@ ENDPROC(v6_dma_map_area)
* - dir - DMA direction
*/
ENTRY(v6_dma_unmap_area)
- add r1, r1, r0
- teq r2, #DMA_TO_DEVICE
- bne v6_dma_inv_range
mov pc, lr
ENDPROC(v6_dma_unmap_area)
* [PATCH 3/8] ARM: Align machine_desc.phys_io to a 1MB section
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
To: linux-arm-kernel
Platforms like RealView don't pass a section-aligned pointer via the
machine_desc structure. This patch aligns the pointer in the
__create_page_tables function. Reported by Tony Thompson.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/kernel/head.S | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index eb62bf9..82ea924 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -302,6 +302,8 @@ __create_page_tables:
movhi r3, #0x0800
add r6, r0, r3
ldr r3, [r8, #MACHINFO_PHYSIO]
+ mov r3, r3, lsr #20 @ 1MB-aligned address
+ mov r3, r3, lsl #20
orr r3, r3, r7
1: str r3, [r0], #4
add r3, r3, #1 << 20
* [PATCH 4/8] ARM: Remove the domain switching on ARMv6k/v7 CPUs
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
To: linux-arm-kernel
This patch removes the domain switching functionality via the set_fs and
__switch_to functions on cores that have a TLS register.
Currently, the ioremap and vmalloc areas share the same level 1 page
tables and therefore have the same domain (DOMAIN_KERNEL). When the
kernel domain is modified from Client to Manager (via set_fs() or in the
__switch_to() function), the XN (eXecute Never) bit is overridden and
newer CPUs can speculatively prefetch from the ioremap'ed memory.
Linux performs the kernel domain switching so that the user access
functions (copy_to/from_user, get/put_user etc.) can also access kernel
memory (via set_fs(KERNEL_DS)). In order for these functions to work
with the kernel domain set to Client, the patch replaces the
unprivileged LDRT/STRT (and related) instructions with plain LDR/STR
ones.
The access rights of user pages are also changed to kernel read-only,
rather than read/write, so that the copy-on-write mechanism still works.
CPU_USE_DOMAINS is only disabled when HAS_TLS_REG is defined, since
without a TLS register the kernel needs to write the TLS value to the
(read-only) high vectors page, which requires domain switching.
The user addresses passed to the kernel are checked by the access_ok()
function so that they do not point into the kernel space.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/assembler.h | 13 +++---
arch/arm/include/asm/domain.h | 31 +++++++++++++-
arch/arm/include/asm/futex.h | 9 ++--
arch/arm/include/asm/uaccess.h | 16 ++++---
arch/arm/kernel/entry-armv.S | 4 +-
arch/arm/kernel/traps.c | 17 ++++++++
arch/arm/lib/getuser.S | 13 +++---
arch/arm/lib/putuser.S | 29 +++++++------
arch/arm/lib/uaccess.S | 83 +++++++++++++++++++-------------------
arch/arm/mm/Kconfig | 8 ++++
arch/arm/mm/proc-macros.S | 7 +++
arch/arm/mm/proc-v7.S | 5 +-
12 files changed, 150 insertions(+), 85 deletions(-)
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 00f46d9..cc84083 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -18,6 +18,7 @@
#endif
#include <asm/ptrace.h>
+#include <asm/domain.h>
/*
* Endian independent macros for shifting bytes within registers.
@@ -183,12 +184,12 @@
*/
#ifdef CONFIG_THUMB2_KERNEL
- .macro usraccoff, instr, reg, ptr, inc, off, cond, abort
+ .macro usraccoff, instr, reg, ptr, inc, off, cond, abort, t=T()
9999:
.if \inc == 1
- \instr\cond\()bt \reg, [\ptr, #\off]
+ \instr\cond\()b\()\t\().w \reg, [\ptr, #\off]
.elseif \inc == 4
- \instr\cond\()t \reg, [\ptr, #\off]
+ \instr\cond\()\t\().w \reg, [\ptr, #\off]
.else
.error "Unsupported inc macro argument"
.endif
@@ -223,13 +224,13 @@
#else /* !CONFIG_THUMB2_KERNEL */
- .macro usracc, instr, reg, ptr, inc, cond, rept, abort
+ .macro usracc, instr, reg, ptr, inc, cond, rept, abort, t=T()
.rept \rept
9999:
.if \inc == 1
- \instr\cond\()bt \reg, [\ptr], #\inc
+ \instr\cond\()b\()\t \reg, [\ptr], #\inc
.elseif \inc == 4
- \instr\cond\()t \reg, [\ptr], #\inc
+ \instr\cond\()\t \reg, [\ptr], #\inc
.else
.error "Unsupported inc macro argument"
.endif
diff --git a/arch/arm/include/asm/domain.h b/arch/arm/include/asm/domain.h
index cc7ef40..af18cea 100644
--- a/arch/arm/include/asm/domain.h
+++ b/arch/arm/include/asm/domain.h
@@ -45,13 +45,17 @@
*/
#define DOMAIN_NOACCESS 0
#define DOMAIN_CLIENT 1
+#ifdef CONFIG_CPU_USE_DOMAINS
#define DOMAIN_MANAGER 3
+#else
+#define DOMAIN_MANAGER 1
+#endif
#define domain_val(dom,type) ((type) << (2*(dom)))
#ifndef __ASSEMBLY__
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
#define set_domain(x) \
do { \
__asm__ __volatile__( \
@@ -74,5 +78,28 @@
#define modify_domain(dom,type) do { } while (0)
#endif
+/*
+ * Generate the T (user) versions of the LDR/STR and related
+ * instructions (inline assembly)
+ */
+#ifdef CONFIG_CPU_USE_DOMAINS
+#define T(instr) #instr "t"
+#else
+#define T(instr) #instr
#endif
-#endif /* !__ASSEMBLY__ */
+
+#else /* __ASSEMBLY__ */
+
+/*
+ * Generate the T (user) versions of the LDR/STR and related
+ * instructions
+ */
+#ifdef CONFIG_CPU_USE_DOMAINS
+#define T(instr) instr ## t
+#else
+#define T(instr) instr
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* !__ASM_PROC_DOMAIN_H */
diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index bfcc159..8d868bd 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -13,12 +13,13 @@
#include <linux/preempt.h>
#include <linux/uaccess.h>
#include <asm/errno.h>
+#include <asm/domain.h>
#define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \
__asm__ __volatile__( \
- "1: ldrt %1, [%2]\n" \
+ "1: " T(ldr) " %1, [%2]\n" \
" " insn "\n" \
- "2: strt %0, [%2]\n" \
+ "2: " T(str) " %0, [%2]\n" \
" mov %0, #0\n" \
"3:\n" \
" .section __ex_table,\"a\"\n" \
@@ -97,10 +98,10 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
pagefault_disable(); /* implies preempt_disable() */
__asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
- "1: ldrt %0, [%3]\n"
+ "1: " T(ldr) " %0, [%3]\n"
" teq %0, %1\n"
" it eq @ explicit IT needed for the 2b label\n"
- "2: streqt %2, [%3]\n"
+ "2: " T(streq) " %2, [%3]\n"
"3:\n"
" .section __ex_table,\"a\"\n"
" .align 3\n"
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 1d6bd40..e4d0905 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -227,7 +227,7 @@ do { \
#define __get_user_asm_byte(x,addr,err) \
__asm__ __volatile__( \
- "1: ldrbt %1,[%2]\n" \
+ "1: " T(ldrb) " %1,[%2],#0\n" \
"2:\n" \
" .section .fixup,\"ax\"\n" \
" .align 2\n" \
@@ -263,7 +263,7 @@ do { \
#define __get_user_asm_word(x,addr,err) \
__asm__ __volatile__( \
- "1: ldrt %1,[%2]\n" \
+ "1: " T(ldr) " %1,[%2],#0\n" \
"2:\n" \
" .section .fixup,\"ax\"\n" \
" .align 2\n" \
@@ -308,7 +308,7 @@ do { \
#define __put_user_asm_byte(x,__pu_addr,err) \
__asm__ __volatile__( \
- "1: strbt %1,[%2]\n" \
+ "1: " T(strb) " %1,[%2],#0\n" \
"2:\n" \
" .section .fixup,\"ax\"\n" \
" .align 2\n" \
@@ -341,7 +341,7 @@ do { \
#define __put_user_asm_word(x,__pu_addr,err) \
__asm__ __volatile__( \
- "1: strt %1,[%2]\n" \
+ "1: " T(str) " %1,[%2],#0\n" \
"2:\n" \
" .section .fixup,\"ax\"\n" \
" .align 2\n" \
@@ -366,10 +366,10 @@ do { \
#define __put_user_asm_dword(x,__pu_addr,err) \
__asm__ __volatile__( \
- ARM( "1: strt " __reg_oper1 ", [%1], #4\n" ) \
- ARM( "2: strt " __reg_oper0 ", [%1]\n" ) \
- THUMB( "1: strt " __reg_oper1 ", [%1]\n" ) \
- THUMB( "2: strt " __reg_oper0 ", [%1, #4]\n" ) \
+ ARM( "1: " T(str) " " __reg_oper1 ", [%1], #4\n" ) \
+ ARM( "2: " T(str) " " __reg_oper0 ", [%1]\n" ) \
+ THUMB( "1: " T(str) " " __reg_oper1 ", [%1]\n" ) \
+ THUMB( "2: " T(str) " " __reg_oper0 ", [%1, #4]\n" ) \
"3:\n" \
" .section .fixup,\"ax\"\n" \
" .align 2\n" \
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 6c5cf36..694f7ab 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -736,7 +736,7 @@ ENTRY(__switch_to)
THUMB( stmia ip!, {r4 - sl, fp} ) @ Store most regs on stack
THUMB( str sp, [ip], #4 )
THUMB( str lr, [ip], #4 )
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
ldr r6, [r2, #TI_CPU_DOMAIN]
#endif
#if defined(CONFIG_HAS_TLS_REG)
@@ -745,7 +745,7 @@ ENTRY(__switch_to)
mov r4, #0xffff0fff
str r3, [r4, #-15] @ TLS val at 0xffff0ff0
#endif
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
mcr p15, 0, r6, c3, c0, 0 @ Set domain register
#endif
mov r5, r0
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1621e53..6571e19 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -30,6 +30,7 @@
#include <asm/unistd.h>
#include <asm/traps.h>
#include <asm/unwind.h>
+#include <asm/tlbflush.h>
#include "ptrace.h"
#include "signal.h"
@@ -750,6 +751,16 @@ void __init early_trap_init(void)
extern char __vectors_start[], __vectors_end[];
extern char __kuser_helper_start[], __kuser_helper_end[];
int kuser_sz = __kuser_helper_end - __kuser_helper_start;
+#if !defined(CONFIG_CPU_USE_DOMAINS) && defined(CONFIG_MMU)
+ pgd_t *pgd = pgd_offset_k(vectors);
+ pmd_t *pmd = pmd_offset(pgd, vectors);
+ pte_t *pte = pte_offset_kernel(pmd, vectors);
+ pte_t entry = *pte;
+
+ /* allow writing to the vectors page */
+ set_pte_ext(pte, pte_mkwrite(entry), 0);
+ local_flush_tlb_kernel_page(vectors);
+#endif
/*
* Copy the vectors, stubs and kuser helpers (in entry-armv.S)
@@ -769,6 +780,12 @@ void __init early_trap_init(void)
memcpy((void *)KERN_RESTART_CODE, syscall_restart_code,
sizeof(syscall_restart_code));
+#if !defined(CONFIG_CPU_USE_DOMAINS) && defined(CONFIG_MMU)
+ /* restore the vectors page permissions */
+ set_pte_ext(pte, entry, 0);
+ local_flush_tlb_kernel_page(vectors);
+#endif
+
flush_icache_range(vectors, vectors + PAGE_SIZE);
modify_domain(DOMAIN_USER, DOMAIN_CLIENT);
}
diff --git a/arch/arm/lib/getuser.S b/arch/arm/lib/getuser.S
index a1814d9..acc966b 100644
--- a/arch/arm/lib/getuser.S
+++ b/arch/arm/lib/getuser.S
@@ -28,20 +28,21 @@
*/
#include <linux/linkage.h>
#include <asm/errno.h>
+#include <asm/domain.h>
ENTRY(__get_user_1)
-1: ldrbt r2, [r0]
+1: T(ldrb) r2, [r0]
mov r0, #0
mov pc, lr
ENDPROC(__get_user_1)
ENTRY(__get_user_2)
#ifdef CONFIG_THUMB2_KERNEL
-2: ldrbt r2, [r0]
-3: ldrbt r3, [r0, #1]
+2: T(ldrb) r2, [r0]
+3: T(ldrb) r3, [r0, #1]
#else
-2: ldrbt r2, [r0], #1
-3: ldrbt r3, [r0]
+2: T(ldrb) r2, [r0], #1
+3: T(ldrb) r3, [r0]
#endif
#ifndef __ARMEB__
orr r2, r2, r3, lsl #8
@@ -53,7 +54,7 @@ ENTRY(__get_user_2)
ENDPROC(__get_user_2)
ENTRY(__get_user_4)
-4: ldrt r2, [r0]
+4: T(ldr) r2, [r0]
mov r0, #0
mov pc, lr
ENDPROC(__get_user_4)
diff --git a/arch/arm/lib/putuser.S b/arch/arm/lib/putuser.S
index 02fedbf..95b3fe8 100644
--- a/arch/arm/lib/putuser.S
+++ b/arch/arm/lib/putuser.S
@@ -28,9 +28,10 @@
*/
#include <linux/linkage.h>
#include <asm/errno.h>
+#include <asm/domain.h>
ENTRY(__put_user_1)
-1: strbt r2, [r0]
+1: T(strb) r2, [r0]
mov r0, #0
mov pc, lr
ENDPROC(__put_user_1)
@@ -39,19 +40,19 @@ ENTRY(__put_user_2)
mov ip, r2, lsr #8
#ifdef CONFIG_THUMB2_KERNEL
#ifndef __ARMEB__
-2: strbt r2, [r0]
-3: strbt ip, [r0, #1]
+2: T(strb) r2, [r0]
+3: T(strb) ip, [r0, #1]
#else
-2: strbt ip, [r0]
-3: strbt r2, [r0, #1]
+2: T(strb) ip, [r0]
+3: T(strb) r2, [r0, #1]
#endif
#else /* !CONFIG_THUMB2_KERNEL */
#ifndef __ARMEB__
-2: strbt r2, [r0], #1
-3: strbt ip, [r0]
+2: T(strb) r2, [r0], #1
+3: T(strb) ip, [r0]
#else
-2: strbt ip, [r0], #1
-3: strbt r2, [r0]
+2: T(strb) ip, [r0], #1
+3: T(strb) r2, [r0]
#endif
#endif /* CONFIG_THUMB2_KERNEL */
mov r0, #0
@@ -59,18 +60,18 @@ ENTRY(__put_user_2)
ENDPROC(__put_user_2)
ENTRY(__put_user_4)
-4: strt r2, [r0]
+4: T(str) r2, [r0]
mov r0, #0
mov pc, lr
ENDPROC(__put_user_4)
ENTRY(__put_user_8)
#ifdef CONFIG_THUMB2_KERNEL
-5: strt r2, [r0]
-6: strt r3, [r0, #4]
+5: T(str) r2, [r0]
+6: T(str) r3, [r0, #4]
#else
-5: strt r2, [r0], #4
-6: strt r3, [r0]
+5: T(str) r2, [r0], #4
+6: T(str) r3, [r0]
#endif
mov r0, #0
mov pc, lr
diff --git a/arch/arm/lib/uaccess.S b/arch/arm/lib/uaccess.S
index ffdd274..e47cdfd 100644
--- a/arch/arm/lib/uaccess.S
+++ b/arch/arm/lib/uaccess.S
@@ -14,6 +14,7 @@
#include <linux/linkage.h>
#include <asm/assembler.h>
#include <asm/errno.h>
+#include <asm/domain.h>
.text
@@ -31,11 +32,11 @@
rsb ip, ip, #4
cmp ip, #2
ldrb r3, [r1], #1
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
ldrgeb r3, [r1], #1
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
ldrgtb r3, [r1], #1
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
sub r2, r2, ip
b .Lc2u_dest_aligned
@@ -58,7 +59,7 @@ ENTRY(__copy_to_user)
addmi ip, r2, #4
bmi .Lc2u_0nowords
ldr r3, [r1], #4
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
mov ip, r0, lsl #32 - PAGE_SHIFT @ On each page, use a ld/st??t instruction
rsb ip, ip, #0
movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -87,18 +88,18 @@ USER( strt r3, [r0], #4) @ May fault
stmneia r0!, {r3 - r4} @ Shouldnt fault
tst ip, #4
ldrne r3, [r1], #4
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
ands ip, ip, #3
beq .Lc2u_0fupi
.Lc2u_0nowords: teq ip, #0
beq .Lc2u_finished
.Lc2u_nowords: cmp ip, #2
ldrb r3, [r1], #1
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
ldrgeb r3, [r1], #1
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
ldrgtb r3, [r1], #1
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
b .Lc2u_finished
.Lc2u_not_enough:
@@ -119,7 +120,7 @@ USER( strgtbt r3, [r0], #1) @ May fault
mov r3, r7, pull #8
ldr r7, [r1], #4
orr r3, r3, r7, push #24
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
mov ip, r0, lsl #32 - PAGE_SHIFT
rsb ip, ip, #0
movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -154,18 +155,18 @@ USER( strt r3, [r0], #4) @ May fault
movne r3, r7, pull #8
ldrne r7, [r1], #4
orrne r3, r3, r7, push #24
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
ands ip, ip, #3
beq .Lc2u_1fupi
.Lc2u_1nowords: mov r3, r7, get_byte_1
teq ip, #0
beq .Lc2u_finished
cmp ip, #2
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
movge r3, r7, get_byte_2
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
movgt r3, r7, get_byte_3
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
b .Lc2u_finished
.Lc2u_2fupi: subs r2, r2, #4
@@ -174,7 +175,7 @@ USER( strgtbt r3, [r0], #1) @ May fault
mov r3, r7, pull #16
ldr r7, [r1], #4
orr r3, r3, r7, push #16
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
mov ip, r0, lsl #32 - PAGE_SHIFT
rsb ip, ip, #0
movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -209,18 +210,18 @@ USER( strt r3, [r0], #4) @ May fault
movne r3, r7, pull #16
ldrne r7, [r1], #4
orrne r3, r3, r7, push #16
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
ands ip, ip, #3
beq .Lc2u_2fupi
.Lc2u_2nowords: mov r3, r7, get_byte_2
teq ip, #0
beq .Lc2u_finished
cmp ip, #2
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
movge r3, r7, get_byte_3
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
ldrgtb r3, [r1], #0
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
b .Lc2u_finished
.Lc2u_3fupi: subs r2, r2, #4
@@ -229,7 +230,7 @@ USER( strgtbt r3, [r0], #1) @ May fault
mov r3, r7, pull #24
ldr r7, [r1], #4
orr r3, r3, r7, push #8
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
mov ip, r0, lsl #32 - PAGE_SHIFT
rsb ip, ip, #0
movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -264,18 +265,18 @@ USER( strt r3, [r0], #4) @ May fault
movne r3, r7, pull #24
ldrne r7, [r1], #4
orrne r3, r3, r7, push #8
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
ands ip, ip, #3
beq .Lc2u_3fupi
.Lc2u_3nowords: mov r3, r7, get_byte_3
teq ip, #0
beq .Lc2u_finished
cmp ip, #2
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
ldrgeb r3, [r1], #1
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
ldrgtb r3, [r1], #0
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
b .Lc2u_finished
ENDPROC(__copy_to_user)
@@ -294,11 +295,11 @@ ENDPROC(__copy_to_user)
.Lcfu_dest_not_aligned:
rsb ip, ip, #4
cmp ip, #2
-USER( ldrbt r3, [r1], #1) @ May fault
+USER( T(ldrb) r3, [r1], #1) @ May fault
strb r3, [r0], #1
-USER( ldrgebt r3, [r1], #1) @ May fault
+USER( T(ldrgeb) r3, [r1], #1) @ May fault
strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #1) @ May fault
+USER( T(ldrgtb) r3, [r1], #1) @ May fault
strgtb r3, [r0], #1
sub r2, r2, ip
b .Lcfu_dest_aligned
@@ -321,7 +322,7 @@ ENTRY(__copy_from_user)
.Lcfu_0fupi: subs r2, r2, #4
addmi ip, r2, #4
bmi .Lcfu_0nowords
-USER( ldrt r3, [r1], #4)
+USER( T(ldr) r3, [r1], #4)
str r3, [r0], #4
mov ip, r1, lsl #32 - PAGE_SHIFT @ On each page, use a ld/st??t instruction
rsb ip, ip, #0
@@ -350,18 +351,18 @@ USER( ldrt r3, [r1], #4)
ldmneia r1!, {r3 - r4} @ Shouldnt fault
stmneia r0!, {r3 - r4}
tst ip, #4
- ldrnet r3, [r1], #4 @ Shouldnt fault
+ T(ldrne) r3, [r1], #4 @ Shouldnt fault
strne r3, [r0], #4
ands ip, ip, #3
beq .Lcfu_0fupi
.Lcfu_0nowords: teq ip, #0
beq .Lcfu_finished
.Lcfu_nowords: cmp ip, #2
-USER( ldrbt r3, [r1], #1) @ May fault
+USER( T(ldrb) r3, [r1], #1) @ May fault
strb r3, [r0], #1
-USER( ldrgebt r3, [r1], #1) @ May fault
+USER( T(ldrgeb) r3, [r1], #1) @ May fault
strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #1) @ May fault
+USER( T(ldrgtb) r3, [r1], #1) @ May fault
strgtb r3, [r0], #1
b .Lcfu_finished
@@ -374,7 +375,7 @@ USER( ldrgtbt r3, [r1], #1) @ May fault
.Lcfu_src_not_aligned:
bic r1, r1, #3
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
cmp ip, #2
bgt .Lcfu_3fupi
beq .Lcfu_2fupi
@@ -382,7 +383,7 @@ USER( ldrt r7, [r1], #4) @ May fault
addmi ip, r2, #4
bmi .Lcfu_1nowords
mov r3, r7, pull #8
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
orr r3, r3, r7, push #24
str r3, [r0], #4
mov ip, r1, lsl #32 - PAGE_SHIFT
@@ -417,7 +418,7 @@ USER( ldrt r7, [r1], #4) @ May fault
stmneia r0!, {r3 - r4}
tst ip, #4
movne r3, r7, pull #8
-USER( ldrnet r7, [r1], #4) @ May fault
+USER( T(ldrne) r7, [r1], #4) @ May fault
orrne r3, r3, r7, push #24
strne r3, [r0], #4
ands ip, ip, #3
@@ -437,7 +438,7 @@ USER( ldrnet r7, [r1], #4) @ May fault
addmi ip, r2, #4
bmi .Lcfu_2nowords
mov r3, r7, pull #16
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
orr r3, r3, r7, push #16
str r3, [r0], #4
mov ip, r1, lsl #32 - PAGE_SHIFT
@@ -473,7 +474,7 @@ USER( ldrt r7, [r1], #4) @ May fault
stmneia r0!, {r3 - r4}
tst ip, #4
movne r3, r7, pull #16
-USER( ldrnet r7, [r1], #4) @ May fault
+USER( T(ldrne) r7, [r1], #4) @ May fault
orrne r3, r3, r7, push #16
strne r3, [r0], #4
ands ip, ip, #3
@@ -485,7 +486,7 @@ USER( ldrnet r7, [r1], #4) @ May fault
strb r3, [r0], #1
movge r3, r7, get_byte_3
strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #0) @ May fault
+USER( T(ldrgtb) r3, [r1], #0) @ May fault
strgtb r3, [r0], #1
b .Lcfu_finished
@@ -493,7 +494,7 @@ USER( ldrgtbt r3, [r1], #0) @ May fault
addmi ip, r2, #4
bmi .Lcfu_3nowords
mov r3, r7, pull #24
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
orr r3, r3, r7, push #8
str r3, [r0], #4
mov ip, r1, lsl #32 - PAGE_SHIFT
@@ -528,7 +529,7 @@ USER( ldrt r7, [r1], #4) @ May fault
stmneia r0!, {r3 - r4}
tst ip, #4
movne r3, r7, pull #24
-USER( ldrnet r7, [r1], #4) @ May fault
+USER( T(ldrne) r7, [r1], #4) @ May fault
orrne r3, r3, r7, push #8
strne r3, [r0], #4
ands ip, ip, #3
@@ -538,9 +539,9 @@ USER( ldrnet r7, [r1], #4) @ May fault
beq .Lcfu_finished
cmp ip, #2
strb r3, [r0], #1
-USER( ldrgebt r3, [r1], #1) @ May fault
+USER( T(ldrgeb) r3, [r1], #1) @ May fault
strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #1) @ May fault
+USER( T(ldrgtb) r3, [r1], #1) @ May fault
strgtb r3, [r0], #1
b .Lcfu_finished
ENDPROC(__copy_from_user)
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 5df74c1..755e081 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -597,6 +597,14 @@ config CPU_CP15_MPU
help
Processor has the CP15 register, which has MPU related registers.
+config CPU_USE_DOMAINS
+ bool
+ depends on MMU
+ default y if !HAS_TLS_REG
+ help
+ This option enables or disables the use of domain switching
+ via the set_fs() function.
+
#
# CPU supports 36-bit I/O
#
diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
index 7d63bea..337f102 100644
--- a/arch/arm/mm/proc-macros.S
+++ b/arch/arm/mm/proc-macros.S
@@ -99,6 +99,10 @@
* 110x 0 1 0 r/w r/o
* 11x0 0 1 0 r/w r/o
* 1111 0 1 1 r/w r/w
+ *
+ * If !CONFIG_CPU_USE_DOMAINS, the following permissions are changed:
+ * 110x 1 1 1 r/o r/o
+ * 11x0 1 1 1 r/o r/o
*/
.macro armv6_mt_table pfx
\pfx\()_mt_table:
@@ -138,8 +142,11 @@
tst r1, #L_PTE_USER
orrne r3, r3, #PTE_EXT_AP1
+#ifdef CONFIG_CPU_USE_DOMAINS
+ @ allow kernel read/write access to read-only user pages
tstne r3, #PTE_EXT_APX
bicne r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
+#endif
tst r1, #L_PTE_EXEC
orreq r3, r3, #PTE_EXT_XN
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 7aaf88a..c1c3fe0 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -152,8 +152,11 @@ ENTRY(cpu_v7_set_pte_ext)
tst r1, #L_PTE_USER
orrne r3, r3, #PTE_EXT_AP1
+#ifdef CONFIG_CPU_USE_DOMAINS
+ @ allow kernel read/write access to read-only user pages
tstne r3, #PTE_EXT_APX
bicne r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
+#endif
tst r1, #L_PTE_EXEC
orreq r3, r3, #PTE_EXT_XN
@@ -240,8 +243,6 @@ __v7_setup:
mcr p15, 0, r10, c2, c0, 2 @ TTB control register
orr r4, r4, #TTB_FLAGS
mcr p15, 0, r4, c2, c0, 1 @ load TTB1
- mov r10, #0x1f @ domains 0, 1 = manager
- mcr p15, 0, r10, c3, c0, 0 @ load domain access register
/*
* Memory region attributes with SCTLR.TRE=1
*
^ permalink raw reply related [flat|nested] 64+ messages in thread
* [PATCH 5/8] ARM: Fix the __arm_ioremap_caller() definition in nommu.c
2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
` (3 preceding siblings ...)
2010-05-04 16:44 ` [PATCH 4/8] ARM: Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas
@ 2010-05-04 16:44 ` Catalin Marinas
2010-05-04 17:19 ` Russell King - ARM Linux
2010-05-04 16:44 ` [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU Catalin Marinas
` (3 subsequent siblings)
8 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
To: linux-arm-kernel
Commit 31aa8fd6 introduced the __arm_ioremap_caller() function but the
nommu.c version did not have the _caller suffix.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/mm/nommu.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c
index 9bfeb6b..f8791ee 100644
--- a/arch/arm/mm/nommu.c
+++ b/arch/arm/mm/nommu.c
@@ -87,8 +87,8 @@ void __iomem *__arm_ioremap(unsigned long phys_addr, size_t size,
}
EXPORT_SYMBOL(__arm_ioremap);
-void __iomem *__arm_ioremap(unsigned long phys_addr, size_t size,
- unsigned int mtype, void *caller)
+void __iomem *__arm_ioremap_caller(unsigned long phys_addr, size_t size,
+ unsigned int mtype, void *caller)
{
return __arm_ioremap(phys_addr, size, mtype);
}
* [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU
2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
` (4 preceding siblings ...)
2010-05-04 16:44 ` [PATCH 5/8] ARM: Fix the __arm_ioremap_caller() definition in nommu.c Catalin Marinas
@ 2010-05-04 16:44 ` Catalin Marinas
2010-05-04 17:19 ` Russell King - ARM Linux
2010-05-04 16:44 ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP Catalin Marinas
` (2 subsequent siblings)
8 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
To: linux-arm-kernel
Commit 7959722 introduced calls to copy_(to|from)_user_page() from
access_process_vm() in mm/nommu.c. The copy_to_user_page() was not
implemented on noMMU ARM.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/mm/nommu.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c
index f8791ee..33b3273 100644
--- a/arch/arm/mm/nommu.c
+++ b/arch/arm/mm/nommu.c
@@ -65,6 +65,15 @@ void flush_dcache_page(struct page *page)
}
EXPORT_SYMBOL(flush_dcache_page);
+void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
+ unsigned long uaddr, void *dst, const void *src,
+ unsigned long len)
+{
+ memcpy(dst, src, len);
+ if (vma->vm_flags & VM_EXEC)
+ __cpuc_coherent_user_range(uaddr, uaddr + len);
+}
+
void __iomem *__arm_ioremap_pfn(unsigned long pfn, unsigned long offset,
size_t size, unsigned int mtype)
{
* [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
` (5 preceding siblings ...)
2010-05-04 16:44 ` [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU Catalin Marinas
@ 2010-05-04 16:44 ` Catalin Marinas
2010-05-28 18:50 ` Russell King - ARM Linux
2010-05-04 16:44 ` [PATCH 8/8] ARM: Implement phys_mem_access_prot() to avoid attributes aliasing Catalin Marinas
2010-05-04 16:48 ` [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
8 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
To: linux-arm-kernel
The standard I-cache Invalidate All (ICIALLU) and Branch Predictor
Invalidate All (BPIALL) operations are not automatically broadcast to
the other CPUs in an ARMv7 MP system. The patch adds the Inner Shareable
variants, ICIALLUIS and BPIALLIS, if ARMv7 and SMP.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/cacheflush.h | 4 ++++
arch/arm/include/asm/tlbflush.h | 29 ++++++++++++++++++++++++++++-
arch/arm/mm/cache-v7.S | 4 ++++
arch/arm/mm/tlb-v7.S | 8 ++++++++
4 files changed, 44 insertions(+), 1 deletions(-)
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 0d08d41..4656a24 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -371,6 +371,10 @@ static inline void __flush_icache_all(void)
#ifdef CONFIG_ARM_ERRATA_411920
extern void v6_icache_inval_all(void);
v6_icache_inval_all();
+#elif defined(CONFIG_SMP) && __LINUX_ARM_ARCH__ >= 7
+ asm("mcr p15, 0, %0, c7, c1, 0 @ invalidate I-cache inner shareable\n"
+ :
+ : "r" (0));
#else
asm("mcr p15, 0, %0, c7, c5, 0 @ invalidate I-cache\n"
:
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index e085e2c..bd863d8 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -46,6 +46,9 @@
#define TLB_V7_UIS_FULL (1 << 20)
#define TLB_V7_UIS_ASID (1 << 21)
+/* Inner Shareable BTB operation (ARMv7 MP extensions) */
+#define TLB_V7_IS_BTB (1 << 22)
+
#define TLB_L2CLEAN_FR (1 << 29) /* Feroceon */
#define TLB_DCLEAN (1 << 30)
#define TLB_WB (1 << 31)
@@ -183,7 +186,7 @@
#endif
#ifdef CONFIG_SMP
-#define v7wbi_tlb_flags (TLB_WB | TLB_DCLEAN | TLB_BTB | \
+#define v7wbi_tlb_flags (TLB_WB | TLB_DCLEAN | TLB_V7_IS_BTB | \
TLB_V7_UIS_FULL | TLB_V7_UIS_PAGE | TLB_V7_UIS_ASID)
#else
#define v7wbi_tlb_flags (TLB_WB | TLB_DCLEAN | TLB_BTB | \
@@ -339,6 +342,12 @@ static inline void local_flush_tlb_all(void)
dsb();
isb();
}
+ if (tlb_flag(TLB_V7_IS_BTB)) {
+ /* flush the branch target cache */
+ asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero) : "cc");
+ dsb();
+ isb();
+ }
}
static inline void local_flush_tlb_mm(struct mm_struct *mm)
@@ -376,6 +385,12 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm)
asm("mcr p15, 0, %0, c7, c5, 6" : : "r" (zero) : "cc");
dsb();
}
+ if (tlb_flag(TLB_V7_IS_BTB)) {
+ /* flush the branch target cache */
+ asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero) : "cc");
+ dsb();
+ isb();
+ }
}
static inline void
@@ -416,6 +431,12 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
asm("mcr p15, 0, %0, c7, c5, 6" : : "r" (zero) : "cc");
dsb();
}
+ if (tlb_flag(TLB_V7_IS_BTB)) {
+ /* flush the branch target cache */
+ asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero) : "cc");
+ dsb();
+ isb();
+ }
}
static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
@@ -454,6 +475,12 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
dsb();
isb();
}
+ if (tlb_flag(TLB_V7_IS_BTB)) {
+ /* flush the branch target cache */
+ asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero) : "cc");
+ dsb();
+ isb();
+ }
}
/*
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index bcd64f2..06a90dc 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -167,7 +167,11 @@ ENTRY(v7_coherent_user_range)
cmp r0, r1
blo 1b
mov r0, #0
+#ifdef CONFIG_SMP
+ mcr p15, 0, r0, c7, c1, 6 @ invalidate BTB Inner Shareable
+#else
mcr p15, 0, r0, c7, c5, 6 @ invalidate BTB
+#endif
dsb
isb
mov pc, lr
diff --git a/arch/arm/mm/tlb-v7.S b/arch/arm/mm/tlb-v7.S
index 0cb1848..f3f288a 100644
--- a/arch/arm/mm/tlb-v7.S
+++ b/arch/arm/mm/tlb-v7.S
@@ -50,7 +50,11 @@ ENTRY(v7wbi_flush_user_tlb_range)
cmp r0, r1
blo 1b
mov ip, #0
+#ifdef CONFIG_SMP
+ mcr p15, 0, ip, c7, c1, 6 @ flush BTAC/BTB Inner Shareable
+#else
mcr p15, 0, ip, c7, c5, 6 @ flush BTAC/BTB
+#endif
dsb
mov pc, lr
ENDPROC(v7wbi_flush_user_tlb_range)
@@ -79,7 +83,11 @@ ENTRY(v7wbi_flush_kern_tlb_range)
cmp r0, r1
blo 1b
mov r2, #0
+#ifdef CONFIG_SMP
+ mcr p15, 0, r2, c7, c1, 6 @ flush BTAC/BTB Inner Shareable
+#else
mcr p15, 0, r2, c7, c5, 6 @ flush BTAC/BTB
+#endif
dsb
isb
mov pc, lr
* [PATCH 8/8] ARM: Implement phys_mem_access_prot() to avoid attributes aliasing
2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
` (6 preceding siblings ...)
2010-05-04 16:44 ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP Catalin Marinas
@ 2010-05-04 16:44 ` Catalin Marinas
2010-05-04 16:48 ` [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
8 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-04 16:44 UTC (permalink / raw)
To: linux-arm-kernel
ARMv7 onwards requires that there are no aliases to the same physical
location using different memory types (i.e. Normal vs Strongly Ordered).
Unaligned access to Strongly Ordered mappings is also Unpredictable
when unaligned accesses are handled in hardware (this affects
pgprot_noncached() mappings in user space).
The /dev/mem driver requires uncached mappings with O_SYNC. The patch
implements the phys_mem_access_prot() function which generates Strongly
Ordered memory attributes if !pfn_valid() (independent of O_SYNC) and
Normal Noncacheable (writecombine) if O_SYNC.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/pgtable.h | 3 +++
arch/arm/mm/mmu.c | 14 ++++++++++++++
2 files changed, 17 insertions(+), 0 deletions(-)
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 1139768..6fae619 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -317,6 +317,9 @@ static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
#if __LINUX_ARM_ARCH__ >= 7
#define pgprot_dmacoherent(prot) \
__pgprot_modify(prot, L_PTE_MT_MASK|L_PTE_EXEC, L_PTE_MT_BUFFERABLE)
+#define __HAVE_PHYS_MEM_ACCESS_PROT
+extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+ unsigned long size, pgprot_t vma_prot);
#else
#define pgprot_dmacoherent(prot) \
__pgprot_modify(prot, L_PTE_MT_MASK|L_PTE_EXEC, L_PTE_MT_UNCACHED)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 241c24a..deb62cc 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -14,6 +14,7 @@
#include <linux/bootmem.h>
#include <linux/mman.h>
#include <linux/nodemask.h>
+#include <linux/fs.h>
#include <asm/cputype.h>
#include <asm/mach-types.h>
@@ -485,6 +486,19 @@ static void __init build_mem_type_table(void)
}
}
+#if __LINUX_ARM_ARCH__ >= 7
+pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+ unsigned long size, pgprot_t vma_prot)
+{
+ if (!pfn_valid(pfn))
+ return pgprot_noncached(vma_prot);
+ else if (file->f_flags & O_SYNC)
+ return pgprot_writecombine(vma_prot);
+ return vma_prot;
+}
+EXPORT_SYMBOL(phys_mem_access_prot);
+#endif
+
#define vectors_base() (vectors_high() ? 0xffff0000 : 0)
static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
* [PATCH 0/8] Various patches for comments and upstream
2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
` (7 preceding siblings ...)
2010-05-04 16:44 ` [PATCH 8/8] ARM: Implement phys_mem_access_prot() to avoid attributes aliasing Catalin Marinas
@ 2010-05-04 16:48 ` Catalin Marinas
8 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-04 16:48 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 2010-05-04 at 17:44 +0100, Catalin Marinas wrote:
> drivers/net/smsc911x.c | 92 +++++++++++++++++++++----------------
> mm/bootmem.c | 2 +
> mm/page_alloc.c | 1
The summary stats here are wrong (StGit not dealing well with
non-consecutive patches in my series). The individual patches that I
posted are fine.
--
Catalin
* [PATCH 1/8] ARM: Improve the L2 cache performance when PL310 is used
2010-05-04 16:44 ` [PATCH 1/8] ARM: Improve the L2 cache performance when PL310 is used Catalin Marinas
@ 2010-05-04 17:02 ` Jason McMullan
2010-05-05 16:07 ` Catalin Marinas
0 siblings, 1 reply; 64+ messages in thread
From: Jason McMullan @ 2010-05-04 17:02 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 4, 2010 at 12:44 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> With this L2 cache controller, the cache maintenance by PA and sync
> operations are atomic and do not require a "wait" loop or spinlocks.
> This patch conditionally defines the cache_wait() function and locking
> primitives (rather than duplicating the functions or file).
>
> Since L2x0 cache controllers do not work with ARMv7 CPUs, the patch
> automatically enables CACHE_PL310 when CPU_V7 is defined.
>
> [snip snip snip]
> static inline void l2x0_inv_all(void)
> @@ -107,11 +134,11 @@ static inline void l2x0_inv_all(void)
>        unsigned long flags;
>
>        /* invalidate all ways */
> -       spin_lock_irqsave(&l2x0_lock, flags);
> +       _l2x0_lock(&l2x0_lock, flags);
>        writel(0xff, l2x0_base + L2X0_INV_WAY);
> -       cache_wait(l2x0_base + L2X0_INV_WAY, 0xff);
> +       cache_wait_always(l2x0_base + L2X0_INV_WAY, 0xff);
>        cache_sync();
> -       spin_unlock_irqrestore(&l2x0_lock, flags);
> +       _l2x0_unlock(&l2x0_lock, flags);
> }
So, ah, shouldn't you be using a mask of 0xffff for 16-way PL310s?
And I think we have a potential patch collision in the near future.
Could you integrate in my [arm l2x0] patch I posted today on the list?
It supports 16-way PL310s, and PL210s with fewer than 7 ways.
--
Jason S. McMullan
Netronome Systems, Inc.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-04 16:44 ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops Catalin Marinas
@ 2010-05-04 17:04 ` Jason McMullan
2010-05-05 16:23 ` Catalin Marinas
2010-05-05 13:26 ` George G. Davis
2010-05-12 12:51 ` Ronen Shitrit
2 siblings, 1 reply; 64+ messages in thread
From: Jason McMullan @ 2010-05-04 17:04 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 4, 2010 at 12:44 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> The Snoop Control Unit on the ARM11MPCore hardware does not detect the
> cache operations and the dma_cache_maint*() functions may leave stale
> cache entries on other CPUs. The solution implemented in this patch
> performs a Read or Write For Ownership in the ARMv6 DMA cache
> maintenance functions. These LDR/STR instructions change the cache line
> state to shared or exclusive so that the cache maintenance operation has
> the desired effect.
Is the latter portion of this patch required only for SMP MPCore systems, or is
it also required for uniprocessor MPCore configurations?
--
Jason S. McMullan
Netronome Systems, Inc.
* [PATCH 5/8] ARM: Fix the __arm_ioremap_caller() definition in nommu.c
2010-05-04 16:44 ` [PATCH 5/8] ARM: Fix the __arm_ioremap_caller() definition in nommu.c Catalin Marinas
@ 2010-05-04 17:19 ` Russell King - ARM Linux
0 siblings, 0 replies; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-04 17:19 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 04, 2010 at 05:44:43PM +0100, Catalin Marinas wrote:
> Commit 31aa8fd6 introduced the __arm_ioremap_caller() function but the
> nommu.c version did not have the _caller suffix.
Ok.
* [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU
2010-05-04 16:44 ` [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU Catalin Marinas
@ 2010-05-04 17:19 ` Russell King - ARM Linux
0 siblings, 0 replies; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-04 17:19 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 04, 2010 at 05:44:48PM +0100, Catalin Marinas wrote:
> Commit 7959722 introduced calls to copy_(to|from)_user_page() from
> access_process_vm() in mm/nommu.c. The copy_to_user_page() was not
> implemented on noMMU ARM.
Ok.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-04 16:44 ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops Catalin Marinas
2010-05-04 17:04 ` Jason McMullan
@ 2010-05-05 13:26 ` George G. Davis
2010-05-06 14:40 ` Catalin Marinas
2010-05-12 12:51 ` Ronen Shitrit
2 siblings, 1 reply; 64+ messages in thread
From: George G. Davis @ 2010-05-05 13:26 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
Definitely need this for stable DMA on ARM11 MPCore.
On Tue, May 04, 2010 at 05:44:26PM +0100, Catalin Marinas wrote:
> The Snoop Control Unit on the ARM11MPCore hardware does not detect the
> cache operations and the dma_cache_maint*() functions may leave stale
> cache entries on other CPUs. The solution implemented in this patch
> performs a Read or Write For Ownership in the ARMv6 DMA cache
> maintenance functions. These LDR/STR instructions change the cache line
> state to shared or exclusive so that the cache maintenance operation has
> the desired effect.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: George G. Davis <gdavis@mvista.com>
---
FWIW, lack of ARM11 MPCore DMA cache coherency has been a problem for
well over two years now and it would be good if we can finally get this
fixed in mainline. Without this applied on current, I observe various
oopses and/or filesystem errors which are resolved by this patch.
If there is some other testing that I can do to help getting this
or some other variation accepted for mainline, let me know.
Thanks!
--
Regards,
George
> ---
> arch/arm/mm/cache-v6.S | 17 +++++++++++++----
> 1 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index 9d89c67..e46ecd8 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -211,6 +211,9 @@ v6_dma_inv_range:
> mcrne p15, 0, r1, c7, c15, 1 @ clean & invalidate unified line
> #endif
> 1:
> +#ifdef CONFIG_SMP
> + str r0, [r0] @ write for ownership
> +#endif
> #ifdef HARVARD_CACHE
> mcr p15, 0, r0, c7, c6, 1 @ invalidate D line
> #else
> @@ -231,6 +234,9 @@ v6_dma_inv_range:
> v6_dma_clean_range:
> bic r0, r0, #D_CACHE_LINE_SIZE - 1
> 1:
> +#ifdef CONFIG_SMP
> + ldr r2, [r0] @ read for ownership
> +#endif
> #ifdef HARVARD_CACHE
> mcr p15, 0, r0, c7, c10, 1 @ clean D line
> #else
> @@ -251,6 +257,10 @@ v6_dma_clean_range:
> ENTRY(v6_dma_flush_range)
> bic r0, r0, #D_CACHE_LINE_SIZE - 1
> 1:
> +#ifdef CONFIG_SMP
> + ldr r2, [r0] @ read for ownership
> + str r2, [r0] @ write for ownership
> +#endif
> #ifdef HARVARD_CACHE
> mcr p15, 0, r0, c7, c14, 1 @ clean & invalidate D line
> #else
> @@ -273,7 +283,9 @@ ENTRY(v6_dma_map_area)
> add r1, r1, r0
> teq r2, #DMA_FROM_DEVICE
> beq v6_dma_inv_range
> - b v6_dma_clean_range
> + teq r2, #DMA_TO_DEVICE
> + beq v6_dma_clean_range
> + b v6_dma_flush_range
> ENDPROC(v6_dma_map_area)
>
> /*
> @@ -283,9 +295,6 @@ ENDPROC(v6_dma_map_area)
> * - dir - DMA direction
> */
> ENTRY(v6_dma_unmap_area)
> - add r1, r1, r0
> - teq r2, #DMA_TO_DEVICE
> - bne v6_dma_inv_range
> mov pc, lr
> ENDPROC(v6_dma_unmap_area)
>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* [PATCH 1/8] ARM: Improve the L2 cache performance when PL310 is used
2010-05-04 17:02 ` Jason McMullan
@ 2010-05-05 16:07 ` Catalin Marinas
0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-05 16:07 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 2010-05-04 at 18:02 +0100, Jason McMullan wrote:
> On Tue, May 4, 2010 at 12:44 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > With this L2 cache controller, the cache maintenance by PA and sync
> > operations are atomic and do not require a "wait" loop or spinlocks.
> > This patch conditionally defines the cache_wait() function and locking
> > primitives (rather than duplicating the functions or file).
> >
> > Since L2x0 cache controllers do not work with ARMv7 CPUs, the patch
> > automatically enables CACHE_PL310 when CPU_V7 is defined.
> >
> > [snip snip snip]
> > static inline void l2x0_inv_all(void)
> > @@ -107,11 +134,11 @@ static inline void l2x0_inv_all(void)
> > unsigned long flags;
> >
> > /* invalidate all ways */
> > - spin_lock_irqsave(&l2x0_lock, flags);
> > + _l2x0_lock(&l2x0_lock, flags);
> > writel(0xff, l2x0_base + L2X0_INV_WAY);
> > - cache_wait(l2x0_base + L2X0_INV_WAY, 0xff);
> > + cache_wait_always(l2x0_base + L2X0_INV_WAY, 0xff);
> > cache_sync();
> > - spin_unlock_irqrestore(&l2x0_lock, flags);
> > + _l2x0_unlock(&l2x0_lock, flags);
> > }
>
> So, ah, shouldn't you be using a mask of 0xffff for 16-way PL310s?
>
> And I think we have a potential patch collision in the near future.
>
> Could you integrate in my [arm l2x0] patch I posted today on the list?
>
> It supports 16-way PL310s, and PL210s with fewer than 7 ways.
I think the best way is to just go ahead and push your patch to
Russell's patch system, I'll update my patch based on yours once I get
some feedback on this list.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-04 17:04 ` Jason McMullan
@ 2010-05-05 16:23 ` Catalin Marinas
0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-05 16:23 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 2010-05-04 at 18:04 +0100, Jason McMullan wrote:
> On Tue, May 4, 2010 at 12:44 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > The Snoop Control Unit on the ARM11MPCore hardware does not detect the
> > cache operations and the dma_cache_maint*() functions may leave stale
> > cache entries on other CPUs. The solution implemented in this patch
> > performs a Read or Write For Ownership in the ARMv6 DMA cache
> > maintenance functions. These LDR/STR instructions change the cache line
> > state to shared or exclusive so that the cache maintenance operation has
> > the desired effect.
>
> Is the latter portion of this patch required only for SMP MPCore systems, or is
> it also required for uniprocessor MPCore configurations?
It's required for MPCore to work correctly but it's harmless for UP.
It's actually a performance improvement since we don't invalidate the
D-cache twice in map and unmap (and we don't have speculative fetches
into the D-cache on v6).
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-05 13:26 ` George G. Davis
@ 2010-05-06 14:40 ` Catalin Marinas
2010-05-06 15:57 ` George G. Davis
0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-06 14:40 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 2010-05-05 at 14:26 +0100, George G. Davis wrote:
> Definitely need this for stable DMA on ARM11 MPCore.
>
> On Tue, May 04, 2010 at 05:44:26PM +0100, Catalin Marinas wrote:
> > The Snoop Control Unit on the ARM11MPCore hardware does not detect the
> > cache operations and the dma_cache_maint*() functions may leave stale
> > cache entries on other CPUs. The solution implemented in this patch
> > performs a Read or Write For Ownership in the ARMv6 DMA cache
> > maintenance functions. These LDR/STR instructions change the cache line
> > state to shared or exclusive so that the cache maintenance operation has
> > the desired effect.
> >
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>
> Tested-by: George G. Davis <gdavis@mvista.com>
Thanks.
> FWIW, lack of ARM11 MPCore DMA cache coherency has been a problem for
> well over two years now and it would be good if we can finally get this
> fixed in mainline. Without this applied on current, I observe various
> oopses and/or filesystem errors which are resolved by this patch.
What platform was this tested on?
> If there is some other testing that I can do to help getting this
> or some other variation accepted for mainline, let me know.
The more platforms tested, the better. Anyway, I don't think we have
much choice in how this workaround is implemented, unless the hardware
supports FIQs.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-06 14:40 ` Catalin Marinas
@ 2010-05-06 15:57 ` George G. Davis
0 siblings, 0 replies; 64+ messages in thread
From: George G. Davis @ 2010-05-06 15:57 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, May 06, 2010 at 03:40:32PM +0100, Catalin Marinas wrote:
> On Wed, 2010-05-05 at 14:26 +0100, George G. Davis wrote:
> > Definitely need this for stable DMA on ARM11 MPCore.
> >
> > On Tue, May 04, 2010 at 05:44:26PM +0100, Catalin Marinas wrote:
> > > The Snoop Control Unit on the ARM11MPCore hardware does not detect the
> > > cache operations and the dma_cache_maint*() functions may leave stale
> > > cache entries on other CPUs. The solution implemented in this patch
> > > performs a Read or Write For Ownership in the ARMv6 DMA cache
> > > maintenance functions. These LDR/STR instructions change the cache line
> > > state to shared or exclusive so that the cache maintenance operation has
> > > the desired effect.
> > >
> > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> >
> > Tested-by: George G. Davis <gdavis@mvista.com>
>
> Thanks.
>
> > FWIW, lack of ARM11 MPCore DMA cache coherency has been a problem for
> > well over two years now and it would be good if we can finally get this
> > fixed in mainline. Without this applied on current, I observe various
> > oopses and/or filesystem errors which are resolved by this patch.
>
> What platform was this tested on?
Tested on the EMMA Car Series EC-4260 board [1] and (EC-4270) NE1BOARD [2]
using an ext4 root filesystem over USB Mass Storage and SD Card. Here is
the ID Code Register, the same for both systems (I've given up trying to
work out whether this is r1p0, r0p4 or something else : ):
CPU: ARMv6-compatible processor [410fb024] revision 4 (ARMv7), cr=08c5387f
> > If there is some other testing that I can do to help getting this
> > or some other variation accepted for mainline, let me know.
>
> The more platforms tested, the better. Anyway, I don't think we have
> much choice in how this workaround is implemented, unless the hardware
> supports FIQs.
FWIW, I've also tried testing on a RealView EB ARM11 MPCore [3] but
unfortunately it has a rather old core (r0p0/[410fb020]) which no
longer boots on linux-2.6.34-rc (but oddly still works ok on
v2.6.33).
Thanks!
--
Regards,
George
[1] http://www.arm.linux.org.uk/developer/machines/list.php?id=2639
[2] http://www.arm.linux.org.uk/developer/machines/list.php?id=1655
[3] http://www.arm.linux.org.uk/developer/machines/list.php?id=827
>
> --
> Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-04 16:44 ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops Catalin Marinas
2010-05-04 17:04 ` Jason McMullan
2010-05-05 13:26 ` George G. Davis
@ 2010-05-12 12:51 ` Ronen Shitrit
2010-05-12 13:55 ` Catalin Marinas
2 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-12 12:51 UTC (permalink / raw)
To: linux-arm-kernel
Hi
I'm using ARM v6 MP core.
I have applied this patch on top of 2.6.34-rc7 and my system hangs while booting from NFS (through an e1000 NIC).
Previously I used 2.6.32 with the cache broadcast IPI mask and it worked just fine for me.
I have noticed that if I remove the str (write for ownership) in the dma_inv, the system boots just fine.
- Why was the IPI solution dropped?
- I can think of a few cases where the above str might be problematic:
1) CPU 0 is doing inv; after the str, CPU 1 changes the line and then CPU 0 invalidates the line... probably won't happen, since the DMA buffer shouldn't be used by more than one CPU...
2) CPU 0 is doing inv; the str writes garbage which gets to the DRAM because of eviction and breaks the assumption that inv shouldn't change the DRAM...
Regards
Ronen Shitrit
-----Original Message-----
From: linux-arm-kernel-bounces@lists.infradead.org [mailto:linux-arm-kernel-bounces at lists.infradead.org] On Behalf Of Catalin Marinas
Sent: Tuesday, May 04, 2010 7:44 PM
To: linux-arm-kernel at lists.infradead.org
Subject: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
The Snoop Control Unit on the ARM11MPCore hardware does not detect the
cache operations and the dma_cache_maint*() functions may leave stale
cache entries on other CPUs. The solution implemented in this patch
performs a Read or Write For Ownership in the ARMv6 DMA cache
maintenance functions. These LDR/STR instructions change the cache line
state to shared or exclusive so that the cache maintenance operation has
the desired effect.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/mm/cache-v6.S | 17 +++++++++++++----
1 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 9d89c67..e46ecd8 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -211,6 +211,9 @@ v6_dma_inv_range:
mcrne p15, 0, r1, c7, c15, 1 @ clean & invalidate unified line
#endif
1:
+#ifdef CONFIG_SMP
+ str r0, [r0] @ write for ownership
+#endif
#ifdef HARVARD_CACHE
mcr p15, 0, r0, c7, c6, 1 @ invalidate D line
#else
@@ -231,6 +234,9 @@ v6_dma_inv_range:
v6_dma_clean_range:
bic r0, r0, #D_CACHE_LINE_SIZE - 1
1:
+#ifdef CONFIG_SMP
+ ldr r2, [r0] @ read for ownership
+#endif
#ifdef HARVARD_CACHE
mcr p15, 0, r0, c7, c10, 1 @ clean D line
#else
@@ -251,6 +257,10 @@ v6_dma_clean_range:
ENTRY(v6_dma_flush_range)
bic r0, r0, #D_CACHE_LINE_SIZE - 1
1:
+#ifdef CONFIG_SMP
+ ldr r2, [r0] @ read for ownership
+ str r2, [r0] @ write for ownership
+#endif
#ifdef HARVARD_CACHE
mcr p15, 0, r0, c7, c14, 1 @ clean & invalidate D line
#else
@@ -273,7 +283,9 @@ ENTRY(v6_dma_map_area)
add r1, r1, r0
teq r2, #DMA_FROM_DEVICE
beq v6_dma_inv_range
- b v6_dma_clean_range
+ teq r2, #DMA_TO_DEVICE
+ beq v6_dma_clean_range
+ b v6_dma_flush_range
ENDPROC(v6_dma_map_area)
/*
@@ -283,9 +295,6 @@ ENDPROC(v6_dma_map_area)
* - dir - DMA direction
*/
ENTRY(v6_dma_unmap_area)
- add r1, r1, r0
- teq r2, #DMA_TO_DEVICE
- bne v6_dma_inv_range
mov pc, lr
ENDPROC(v6_dma_unmap_area)
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel at lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 64+ messages in thread
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-12 12:51 ` Ronen Shitrit
@ 2010-05-12 13:55 ` Catalin Marinas
2010-05-12 15:03 ` Ronen Shitrit
2010-05-12 18:48 ` Russell King - ARM Linux
0 siblings, 2 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-12 13:55 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 2010-05-12 at 15:51 +0300, Ronen Shitrit wrote:
> I'm using an ARMv6 MPCore.
> I have applied this patch on top of 2.6.34-rc7 and my system hangs while
> booting from NFS (through an e1000 NIC).
Does the whole kernel hang, or is it just user space?
> Previously I used 2.6.32 with the cache broadcast IPI mask and it
> worked just fine for me.
> I have noticed that if I remove the str (write for ownership) in
> dma_inv, the system boots just fine.
The only way to invalidate the caches on all the CPUs is to force a
write so that the cache line is moved to the modified/exclusive state (no
other CPU has the cache line) and then invalidate it.
Drivers shouldn't call dma_inv_range if they have valid data in that
buffer (that's why I removed the dma_inv_range call from
dma_unmap_area).
Could you try the patch below:
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index e46ecd8..332b48c 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -212,7 +212,8 @@ v6_dma_inv_range:
#endif
1:
#ifdef CONFIG_SMP
- str r0, [r0] @ write for ownership
+ ldr r2, [r0] @ read for ownership
+ str r2, [r0] @ write for ownership
#endif
#ifdef HARVARD_CACHE
mcr p15, 0, r0, c7, c6, 1 @ invalidate D line
> - Why was the IPI solution dropped?
It can cause deadlock in some circumstances (there was a thread some
time ago).
> - I can think of a few cases where the above str might be problematic:
> 1) CPU 0 is doing inv; after the str, CPU 1 changes the line and then
> CPU 0 invalidates it... probably won't happen, since the DMA buffer
> shouldn't be used by more than one CPU...
This shouldn't happen. Even without my patch, doing a cache line
invalidation would erase the data written by CPU1, so the corruption
would be the same.
> 2) CPU 0 is doing inv; the str writes garbage which gets to the DRAM
> because of eviction and breaks the assumption that inv shouldn't change
> the DRAM...
This may be the case, but does such a situation occur in practice? The
above patch should fix it, but it slows down the invalidate operation
further.
Thanks.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-12 13:55 ` Catalin Marinas
@ 2010-05-12 15:03 ` Ronen Shitrit
2010-05-12 18:48 ` Russell King - ARM Linux
1 sibling, 0 replies; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-12 15:03 UTC (permalink / raw)
To: linux-arm-kernel
Thanks for the fast response.
The kernel doesn't hang, but NFS can't complete loading the FS; it looks like DMA data corruption...
The patch below seems to solve the issue :)
I guess we can't guarantee that the 2nd scenario below won't happen...
Regards
Ronen Shitrit
-----Original Message-----
From: Catalin Marinas [mailto:catalin.marinas at arm.com]
Sent: Wednesday, May 12, 2010 4:56 PM
To: Ronen Shitrit
Cc: linux-arm-kernel at lists.infradead.org
Subject: RE: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
On Wed, 2010-05-12 at 15:51 +0300, Ronen Shitrit wrote:
> I'm using an ARMv6 MPCore.
> I have applied this patch on top of 2.6.34-rc7 and my system hangs while
> booting from NFS (through an e1000 NIC).
Does the whole kernel hang, or is it just user space?
> Previously I used 2.6.32 with the cache broadcast IPI mask and it
> worked just fine for me.
> I have noticed that if I remove the str (write for ownership) in
> dma_inv, the system boots just fine.
The only way to invalidate the caches on all the CPUs is to force a
write so that the cache line is moved to the modified/exclusive state (no
other CPU has the cache line) and then invalidate it.
Drivers shouldn't call dma_inv_range if they have valid data in that
buffer (that's why I removed the dma_inv_range call from
dma_unmap_area).
Could you try the patch below:
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index e46ecd8..332b48c 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -212,7 +212,8 @@ v6_dma_inv_range:
#endif
1:
#ifdef CONFIG_SMP
- str r0, [r0] @ write for ownership
+ ldr r2, [r0] @ read for ownership
+ str r2, [r0] @ write for ownership
#endif
#ifdef HARVARD_CACHE
mcr p15, 0, r0, c7, c6, 1 @ invalidate D line
> - Why was the IPI solution dropped?
It can cause deadlock in some circumstances (there was a thread some
time ago).
> - I can think of a few cases where the above str might be problematic:
> 1) CPU 0 is doing inv; after the str, CPU 1 changes the line and then
> CPU 0 invalidates it... probably won't happen, since the DMA buffer
> shouldn't be used by more than one CPU...
This shouldn't happen. Even without my patch, doing a cache line
invalidation would erase the data written by CPU1, so the corruption
would be the same.
> 2) CPU 0 is doing inv; the str writes garbage which gets to the DRAM
> because of eviction and breaks the assumption that inv shouldn't change
> the DRAM...
This may be the case, but does such a situation occur in practice? The
above patch should fix it, but it slows down the invalidate operation
further.
Thanks.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-12 13:55 ` Catalin Marinas
2010-05-12 15:03 ` Ronen Shitrit
@ 2010-05-12 18:48 ` Russell King - ARM Linux
2010-05-12 18:59 ` Russell King - ARM Linux
1 sibling, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-12 18:48 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, May 12, 2010 at 02:55:39PM +0100, Catalin Marinas wrote:
> Drivers shouldn't call dma_inv_range if they have valid data in that
> buffer (that's why I removed the dma_inv_range call from
> dma_unmap_area).
That (removal of dma_inv_range) is the wrong approach - if drivers are
expecting data in the cache to be preserved over a dma_unmap_area() call,
they are buggy and need fixing.
However...
> > 2) CPU 0 is doing inv; the str writes garbage which gets to the DRAM
> > because of eviction and breaks the assumption that inv shouldn't change
> > the DRAM...
>
> This may be the case, but does such a situation occur in practice?
This is why there's a problem - if we (rightfully) have dma_unmap_area()
call dma_inv_range, and dma_inv_range() corrupts the SDRAM, that's where
the bug is - dma_unmap_area() must not change the contents of SDRAM.
Aren't there CPUs which speculatively prefetch _and_ which don't have
broadcast cache ops? If yes, then we can't use the "read/write to
gain ownership" approach - and since we can't use IPIs either, I think
we're sadly boxed in by hardware restrictions to the point of not being
able to run with DMA on these CPUs.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-12 18:48 ` Russell King - ARM Linux
@ 2010-05-12 18:59 ` Russell King - ARM Linux
2010-05-12 20:00 ` Ronen Shitrit
2010-05-12 21:21 ` [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 " Catalin Marinas
0 siblings, 2 replies; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-12 18:59 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, May 12, 2010 at 07:48:52PM +0100, Russell King - ARM Linux wrote:
> Aren't there CPUs which speculatively prefetch _and_ which don't have
> broadcast cache ops? If yes, then we can't use the "read/write to
> gain ownership" approach - and since we can't use IPIs either, I think
> we're sadly boxed in by hardware restrictions to the point of not being
> able to run with DMA on these CPUs.
I just had a second thought that what I wrote above was tosh, but then
had a third thought which reconfirmed it as a valid point...
Consider if the CPU speculatively prefetches the line you're going to
read before the DMA has completed, and it prefetches the pre-DMA data.
Your read loads the pre-DMA data, and the write writes it back to the
cache line. An interrupt happens, and the cache line gets evicted some
time later before the invalidate line operation - overwriting the DMA
data, thereby corrupting it.
So, a CPU which speculatively prefetches _and_ doesn't broadcast cache
operations _is_ a big problem. I hope we don't have any.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-12 18:59 ` Russell King - ARM Linux
@ 2010-05-12 20:00 ` Ronen Shitrit
2010-05-12 20:04 ` Russell King - ARM Linux
2010-05-12 21:21 ` [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 " Catalin Marinas
1 sibling, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-12 20:00 UTC (permalink / raw)
To: linux-arm-kernel
This is a good point. Assuming the patch disables interrupts around the
DMA cache calls, can you think of any data corruption issues?
Thanks
Sent from my phone
On 12/05/2010, at 21:59, "Russell King - ARM Linux" <linux@arm.linux.org.uk
> wrote:
> On Wed, May 12, 2010 at 07:48:52PM +0100, Russell King - ARM Linux
> wrote:
>> Aren't there CPUs which speculatively prefetch _and_ which don't have
>> broadcast cache ops? If yes, then we can't use the "read/write to
>> gain ownership" approach - and since we can't use IPIs either, I
>> think
>> we're sadly boxed in by hardware restrictions to the point of not
>> being
>> able to run with DMA on these CPUs.
>
> I just had a second thought that what I wrote above was tosh, but then
> had a third thought which reconfirmed it as a valid point...
>
> Consider if the CPU speculatively prefetches the line you're going to
> read before the DMA has completed, and it prefetches the pre-DMA data.
>
> Your read loads the pre-DMA data, and the write writes it back to the
> cache line. An interrupt happens, and the cache line gets evicted
> some
> time later before the invalidate line operation - overwriting the DMA
> data, thereby corrupting it.
>
> So, a CPU which speculatively prefetches _and_ doesn't broadcast cache
> operations _is_ a big problem. I hope we don't have any.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-12 20:00 ` Ronen Shitrit
@ 2010-05-12 20:04 ` Russell King - ARM Linux
2010-05-12 20:19 ` Ronen Shitrit
0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-12 20:04 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, May 12, 2010 at 11:00:42PM +0300, Ronen Shitrit wrote:
> This is a good point. Assuming the patch disables interrupts around the
> DMA cache calls, can you think of any data corruption issues?
FIQs?
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-12 20:04 ` Russell King - ARM Linux
@ 2010-05-12 20:19 ` Ronen Shitrit
0 siblings, 0 replies; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-12 20:19 UTC (permalink / raw)
To: linux-arm-kernel
Let's assume no FIQs...
Sent from my phone
On 12/05/2010, at 23:04, "Russell King - ARM Linux" <linux at arm.linux.org.uk<mailto:linux@arm.linux.org.uk>> wrote:
On Wed, May 12, 2010 at 11:00:42PM +0300, Ronen Shitrit wrote:
This is a good point. Assuming the patch disables interrupts around the
DMA cache calls, can you think of any data corruption issues?
FIQs?
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-12 18:59 ` Russell King - ARM Linux
2010-05-12 20:00 ` Ronen Shitrit
@ 2010-05-12 21:21 ` Catalin Marinas
2010-05-13 5:27 ` Ronen Shitrit
1 sibling, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-12 21:21 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 2010-05-12 at 19:59 +0100, Russell King - ARM Linux wrote:
> On Wed, May 12, 2010 at 07:48:52PM +0100, Russell King - ARM Linux wrote:
> > Aren't there CPUs which speculatively prefetch _and_ which don't have
> > broadcast cache ops? If yes, then we can't use the "read/write to
> > gain ownership" approach - and since we can't use IPIs either, I think
> > we're sadly boxed in by hardware restrictions to the point of not being
> > able to run with DMA on these CPUs.
>
> I just had a second thought that what I wrote above was tosh, but then
> had a third thought which reconfirmed it as a valid point...
>
> Consider if the CPU speculatively prefetches the line you're going to
> read before the DMA has completed, and it prefetches the pre-DMA data.
I won't consider this scenario :). For such cores there isn't an easy
workaround for the DMA cache operations. Even with an IPI via FIQ, you may
still run the risk of cache lines migrating between CPUs and missing the
cache flushing.
AFAICT, ARM11MPCore doesn't do this.
>
Since my implementation of v6_dma_inv_range is destructive, I removed it
from dma_unmap_area() with the precondition that the CPU doesn't do
speculative accesses. With my patch, dma_unmap_area() doesn't change
the contents of the SDRAM.
I don't entirely understand why corruption happens with the e1000 driver
(looking at the e1000.c code it uses consistent memory).
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-12 21:21 ` [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 " Catalin Marinas
@ 2010-05-13 5:27 ` Ronen Shitrit
2010-05-13 8:26 ` Catalin Marinas
0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-13 5:27 UTC (permalink / raw)
To: linux-arm-kernel
Our ARM v6 does have speculative prefetch...
Sent from my phone
On 13/05/2010, at 00:21, "Catalin Marinas" <catalin.marinas@arm.com>
wrote:
> On Wed, 2010-05-12 at 19:59 +0100, Russell King - ARM Linux wrote:
>> On Wed, May 12, 2010 at 07:48:52PM +0100, Russell King - ARM Linux
>> wrote:
>>> Aren't there CPUs which speculatively prefetch _and_ which don't
>>> have
>>> broadcast cache ops? If yes, then we can't use the "read/write to
>>> gain ownership" approach - and since we can't use IPIs either, I
>>> think
>>> we're sadly boxed in by hardware restrictions to the point of not
>>> being
>>> able to run with DMA on these CPUs.
>>
>> I just had a second thought that what I wrote above was tosh, but
>> then
>> had a third thought which reconfirmed it as a valid point...
>>
>> Consider if the CPU speculatively prefetches the line you're going to
>> read before the DMA has completed, and it prefetches the pre-DMA
>> data.
>
> I won't consider this scenario :). For such cores there isn't an easy
> workaround to the DMA cache operations. Even with IPI via FIQ, you may
> still have the risk of cache lines migrating around CPUs and missing
> the
> cache flushing.
>
> AFAICT, ARM11MPCore doesn't do this.
>>
> Since my implementation of v6_dma_inv_range is destructive, I
> removed it
> from dma_unmap_area() with the precondition that the CPU doesn't do
> speculative accesses. With my patch, the dma_unmap_area() doesn't
> change
> the content of the SDRAM.
>
> I don't entirely understand why corruption happens with the e1000
> driver
> (looking at the e1000.c code it uses consistent memory).
>
> --
> Catalin
>
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-13 5:27 ` Ronen Shitrit
@ 2010-05-13 8:26 ` Catalin Marinas
2010-05-13 13:54 ` George G. Davis
2010-05-16 6:29 ` Ronen Shitrit
0 siblings, 2 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-13 8:26 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> Our ARM v6 does have speculative prefetch...
That's the I-cache and we should be ok with it. What may affect this
patch is speculative fetches into the D-cache.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-13 8:26 ` Catalin Marinas
@ 2010-05-13 13:54 ` George G. Davis
2010-05-13 14:15 ` Catalin Marinas
2010-05-16 6:29 ` Ronen Shitrit
1 sibling, 1 reply; 64+ messages in thread
From: George G. Davis @ 2010-05-13 13:54 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
On Thu, May 13, 2010 at 09:26:57AM +0100, Catalin Marinas wrote:
> On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> > Our ARM v6 does have speculative prefetch...
>
> That's the I-cache and we should be ok with it. What may affect this
> patch is speculative fetches into the D-cache.
prefetch()?
--
Regards,
George
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-13 13:54 ` George G. Davis
@ 2010-05-13 14:15 ` Catalin Marinas
2010-05-13 20:34 ` George G. Davis
0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-13 14:15 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 2010-05-13 at 14:54 +0100, George G. Davis wrote:
> Hi,
>
> On Thu, May 13, 2010 at 09:26:57AM +0100, Catalin Marinas wrote:
> > On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> > > Our ARM v6 does have speculative prefetch...
> >
> > That's the I-cache and we should be ok with it. What may affect this
> > patch is speculative fetches into the D-cache.
>
> prefetch()?
Drivers should not use prefetch (PLD) on memory handed over to a device
for DMA transfers. To me this is like explicitly accessing such memory
during a DMA transfer.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-13 14:15 ` Catalin Marinas
@ 2010-05-13 20:34 ` George G. Davis
2010-05-14 16:29 ` Catalin Marinas
0 siblings, 1 reply; 64+ messages in thread
From: George G. Davis @ 2010-05-13 20:34 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
On Thu, May 13, 2010 at 03:15:20PM +0100, Catalin Marinas wrote:
> On Thu, 2010-05-13 at 14:54 +0100, George G. Davis wrote:
> > Hi,
> >
> > On Thu, May 13, 2010 at 09:26:57AM +0100, Catalin Marinas wrote:
> > > On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> > > > Our ARM v6 does have speculative prefetch...
> > >
> > > That's the I-cache and we should be ok with it. What may affect this
> > > patch is speculative fetches into the D-cache.
> >
> > prefetch()?
>
> Drivers should not use prefetch (PLD) on memory handed over to a device
> for DMA transfers. To me this is like explicitly accessing such memory
> during a DMA transfer.
I agree and was merely speculating that prefetch() may be a case where
D-Cache lines are explicitly prefetched and may be the cause of this
problem, e.g. from e1000_clean_rx_irq() in drivers/net/e1000e/netdev.c:
	while (rx_desc->status & E1000_RXD_STAT_DD) {
		struct sk_buff *skb;
		u8 status;

		if (*work_done >= work_to_do)
			break;
		(*work_done)++;

		status = rx_desc->status;
		skb = buffer_info->skb;
		buffer_info->skb = NULL;

		prefetch(skb->data - NET_IP_ALIGN);

		i++;
		if (i == rx_ring->count)
			i = 0;
		next_rxd = E1000_RX_DESC(*rx_ring, i);
		prefetch(next_rxd);

		next_buffer = &rx_ring->buffer_info[i];

		cleaned = 1;
		cleaned_count++;
		pci_unmap_single(pdev,
				 buffer_info->dma,
				 adapter->rx_buffer_len,
				 PCI_DMA_FROMDEVICE);
I'm too lazy^Wbusy to drill down, but could the above prefetch(skb->data...)
before pci_unmap_single(), or similar cases, be causing the problem reported
by Ronen?
I think the above may be one of those "if drivers are expecting data in
the cache to be preserved over a dma_unmap_area() call, they are buggy
and need fixing" cases that Russell noted in a previous post, no?
--
Regards,
George
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-13 20:34 ` George G. Davis
@ 2010-05-14 16:29 ` Catalin Marinas
2010-05-14 16:42 ` Catalin Marinas
0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-14 16:29 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 2010-05-13 at 21:34 +0100, George G. Davis wrote:
> On Thu, May 13, 2010 at 03:15:20PM +0100, Catalin Marinas wrote:
> > On Thu, 2010-05-13 at 14:54 +0100, George G. Davis wrote:
> > > On Thu, May 13, 2010 at 09:26:57AM +0100, Catalin Marinas wrote:
> > > > On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> > > > > Our ARM v6 does have speculative prefetch...
> > > >
> > > > That's the I-cache and we should be ok with it. What may affect this
> > > > patch is speculative fetches into the D-cache.
> > >
> > > prefetch()?
> >
> > Drivers should not use prefetch (PLD) on memory handed over to a device
> > for DMA transfers. To me this is like explicitly accessing such memory
> > during a DMA transfer.
>
> I agree and was merely speculating that prefetch() may be a case where
> D-Cache lines are explicitly prefetched and may be the cause of this
> problem, e.g. from e1000_clean_rx_irq() in drivers/net/e1000e/netdev.c:
>
> while (rx_desc->status & E1000_RXD_STAT_DD) {
> struct sk_buff *skb;
> u8 status;
>
> if (*work_done >= work_to_do)
> break;
> (*work_done)++;
>
> status = rx_desc->status;
> skb = buffer_info->skb;
> buffer_info->skb = NULL;
>
> prefetch(skb->data - NET_IP_ALIGN);
>
> i++;
> if (i == rx_ring->count)
> i = 0;
> next_rxd = E1000_RX_DESC(*rx_ring, i);
> prefetch(next_rxd);
>
> next_buffer = &rx_ring->buffer_info[i];
>
> cleaned = 1;
> cleaned_count++;
> pci_unmap_single(pdev,
> buffer_info->dma,
> adapter->rx_buffer_len,
> PCI_DMA_FROMDEVICE);
If the prefetch is done after the DMA transfer has completed, then there
shouldn't be any problem. I suspect that's the case in this interrupt
routine.
What I think I missed in my patch (and didn't show up with a sata drive)
is the "sync" operations. The dma_sync_single_for_device(FROMDEVICE)
corrupts the existing buffer.
I'll post an updated patch shortly.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-14 16:29 ` Catalin Marinas
@ 2010-05-14 16:42 ` Catalin Marinas
2010-05-15 1:26 ` George G. Davis
0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-14 16:42 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 2010-05-14 at 17:29 +0100, Catalin Marinas wrote:
> What I think I missed in my patch (and didn't show up with a sata drive)
> is the "sync" operations. The dma_sync_single_for_device(FROMDEVICE)
> corrupts the existing buffer.
>
> I'll post an updated patch shortly.
Actually I think the hunk that I posted earlier should be enough:
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index a4a6840..aa3ac32 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -216,7 +216,8 @@ v6_dma_inv_range:
#endif
1:
#ifdef CONFIG_SMP
- str r0, [r0] @ write for ownership
+ ldr r2, [r0] @ read for ownership
+ str r2, [r0] @ write for ownership
#endif
#ifdef HARVARD_CACHE
mcr p15, 0, r0, c7, c6, 1 @ invalidate D line
The dma_sync_single_for_cpu() eventually calls dmac_unmap_area(), which
on ARM11MPCore became a no-op. There is no problem since this function
call is always preceded by a call to either dma_map_single() or
dma_sync_single_for_device(), both of them calling dma_map_area() which
does the cache invalidation.
With the patch above, v6_dma_inv_range() is no longer destructive, so the
data transferred to the SDRAM is preserved. We could even reinstate the
v6_dma_unmap_area() function, but for performance reasons I would leave
it as a no-op.
Russell, are you ok with such change? Since you already applied the
original patch as #6111, I will send an additional patch with this fix
(I'm sending it directly to the patch system as I won't have access to
my work PC this weekend. Feel free to apply or reject).
Thanks.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-14 16:42 ` Catalin Marinas
@ 2010-05-15 1:26 ` George G. Davis
2010-05-16 6:28 ` Ronen Shitrit
0 siblings, 1 reply; 64+ messages in thread
From: George G. Davis @ 2010-05-15 1:26 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
On Fri, May 14, 2010 at 05:42:31PM +0100, Catalin Marinas wrote:
> On Fri, 2010-05-14 at 17:29 +0100, Catalin Marinas wrote:
> > What I think I missed in my patch (and didn't show up with a sata drive)
> > is the "sync" operations. The dma_sync_single_for_device(FROMDEVICE)
> > corrupts the existing buffer.
> >
> > I'll post an updated patch shortly.
>
> Actually I think the hunk that I posted earlier should be enough:
Um, apologies for the noise, but I didn't understand whether Ronen said
this resolved the e1000 problem or not.
Thanks!
--
Regards,
George
>
>
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index a4a6840..aa3ac32 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -216,7 +216,8 @@ v6_dma_inv_range:
> #endif
> 1:
> #ifdef CONFIG_SMP
> - str r0, [r0] @ write for ownership
> + ldr r2, [r0] @ read for ownership
> + str r2, [r0] @ write for ownership
> #endif
> #ifdef HARVARD_CACHE
> mcr p15, 0, r0, c7, c6, 1 @ invalidate D line
>
>
> The dma_sync_single_for_cpu() eventually calls dmac_unmap_area() which
> on ARM11MPCore became a no-op. There is no problem since this function
> call is always preceded by a call to either dma_map_single() or
> dma_sync_single_to_device(), both of them calling dma_map_area() which
> does the cache invalidation.
>
> With the patch above, v6_dma_inv_range() is no longer destructive so the
> transferred data to the SDRAM is preserved. We could even re-instate the
> v6_dma_unmap_area() function but for performance reasons I would leave
> it as a no-op.
>
> Russell, are you ok with such change? Since you already applied the
> original patch as #6111, I will send an additional patch with this fix
> (I'm sending it directly to the patch system as I won't have access to
> my work PC this weekend. Feel free to apply or reject).
>
> Thanks.
>
> --
> Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-15 1:26 ` George G. Davis
@ 2010-05-16 6:28 ` Ronen Shitrit
0 siblings, 0 replies; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-16 6:28 UTC (permalink / raw)
To: linux-arm-kernel
Yes, it does...
-----Original Message-----
From: George G. Davis [mailto:gdavis at mvista.com]
Sent: Saturday, May 15, 2010 4:26 AM
To: Catalin Marinas
Cc: Ronen Shitrit; Russell King - ARM Linux; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
Hi,
On Fri, May 14, 2010 at 05:42:31PM +0100, Catalin Marinas wrote:
> On Fri, 2010-05-14 at 17:29 +0100, Catalin Marinas wrote:
> > What I think I missed in my patch (and didn't show up with a sata drive)
> > is the "sync" operations. The dma_sync_single_for_device(FROMDEVICE)
> > corrupts the existing buffer.
> >
> > I'll post an updated patch shortly.
>
> Actually I think the hunk that I posted earlier should be enough:
Um, apologies for the noise but I didn't understand if Ronen said
this resolved the e1000 problem or not.
Thanks!
--
Regards,
George
>
>
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index a4a6840..aa3ac32 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -216,7 +216,8 @@ v6_dma_inv_range:
> #endif
> 1:
> #ifdef CONFIG_SMP
> - str r0, [r0] @ write for ownership
> + ldr r2, [r0] @ read for ownership
> + str r2, [r0] @ write for ownership
> #endif
> #ifdef HARVARD_CACHE
> mcr p15, 0, r0, c7, c6, 1 @ invalidate D line
>
>
> The dma_sync_single_for_cpu() eventually calls dmac_unmap_area() which
> on ARM11MPCore became a no-op. There is no problem since this function
> call is always preceded by a call to either dma_map_single() or
> dma_sync_single_to_device(), both of them calling dma_map_area() which
> does the cache invalidation.
>
> With the patch above, v6_dma_inv_range() is no longer destructive so the
> transferred data to the SDRAM is preserved. We could even re-instate the
> v6_dma_unmap_area() function but for performance reasons I would leave
> it as a no-op.
>
> Russell, are you ok with such change? Since you already applied the
> original patch as #6111, I will send an additional patch with this fix
> (I'm sending it directly to the patch system as I won't have access to
> my work PC this weekend. Feel free to apply or reject).
>
> Thanks.
>
> --
> Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-13 8:26 ` Catalin Marinas
2010-05-13 13:54 ` George G. Davis
@ 2010-05-16 6:29 ` Ronen Shitrit
2010-05-16 15:01 ` Russell King - ARM Linux
2010-05-17 9:51 ` Catalin Marinas
1 sibling, 2 replies; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-16 6:29 UTC (permalink / raw)
To: linux-arm-kernel
Our ARMv6 does speculative reads for both the I- and D-caches...
Regards
-----Original Message-----
From: Catalin Marinas [mailto:catalin.marinas at arm.com]
Sent: Thursday, May 13, 2010 11:27 AM
To: Ronen Shitrit
Cc: Russell King - ARM Linux; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
On Thu, 2010-05-13 at 06:27 +0100, Ronen Shitrit wrote:
> Our ARM v6 does have speculative prefetch...
That's the I-cache and we should be ok with it. What may affect this
patch is speculative fetches into the D-cache.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in theARMv6 DMA cache ops
2010-05-16 6:29 ` Ronen Shitrit
@ 2010-05-16 15:01 ` Russell King - ARM Linux
2010-05-17 6:29 ` Ronen Shitrit
2010-05-17 9:51 ` Catalin Marinas
1 sibling, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-16 15:01 UTC (permalink / raw)
To: linux-arm-kernel
On Sun, May 16, 2010 at 09:29:47AM +0300, Ronen Shitrit wrote:
> Our ARMv6 does speculative reads for both the I- and D-cache...
That means doing this read-write for ownership trick can lead to
corrupted data on your CPU, since the CPU may prefetch data from
the DMA buffer before the unmap has happened.
This means Catalin's patch is unsafe for your situation.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-16 15:01 ` Russell King - ARM Linux
@ 2010-05-17 6:29 ` Ronen Shitrit
2010-05-17 6:57 ` Russell King - ARM Linux
0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 6:29 UTC (permalink / raw)
To: linux-arm-kernel
Got it, thanks.
This is my patch over Catalin original patch for v6_dma_inv_range
--- cache-v6.S 2010-05-17 08:12:18.000000000 +0300
+++ cache-v6.S 2010-05-17 08:16:16.000000000 +0300
@@ -210,9 +210,15 @@
#else
mcrne p15, 0, r1, c7, c15, 1 @ clean & invalidate unified line
#endif
-1:
#ifdef CONFIG_SMP
- str r0, [r0] @ write for ownership
+ mrs ip, cpsr @ save CPSR in ip: r2 is clobbered by the ldr below
+ orr r3, ip, #PSR_F_BIT | PSR_I_BIT
+ msr cpsr_c, r3 @ Disable interrupts
+1:
+ ldr r2, [r0] @ read for ownership
+ str r2, [r0] @ write for ownership
+#else
+1:
#endif
#ifdef HARVARD_CACHE
mcr p15, 0, r0, c7, c6, 1 @ invalidate D line
@@ -222,6 +228,9 @@
add r0, r0, #D_CACHE_LINE_SIZE
cmp r0, r1
blo 1b
+#ifdef CONFIG_SMP
+ msr cpsr_c, ip @ Restore interrupts
+#endif
mov r0, #0
mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
-----Original Message-----
From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
Sent: Sunday, May 16, 2010 6:01 PM
To: Ronen Shitrit
Cc: Catalin Marinas; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
On Sun, May 16, 2010 at 09:29:47AM +0300, Ronen Shitrit wrote:
> Our ARMv6 does speculative reads for both the I- and D-cache...
That means doing this read-write for ownership trick can lead to
corrupted data on your CPU, since the CPU may prefetch data from
the DMA buffer before the unmap has happened.
This means Catalin's patch is unsafe for your situation.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 6:29 ` Ronen Shitrit
@ 2010-05-17 6:57 ` Russell King - ARM Linux
2010-05-17 7:34 ` Ronen Shitrit
0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-17 6:57 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, May 17, 2010 at 09:29:35AM +0300, Ronen Shitrit wrote:
> Got it, thanks.
No you haven't; disabling interrupts doesn't prevent CPUs speculatively
prefetching.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 6:57 ` Russell King - ARM Linux
@ 2010-05-17 7:34 ` Ronen Shitrit
2010-05-17 7:43 ` Russell King - ARM Linux
2010-05-17 10:00 ` Catalin Marinas
0 siblings, 2 replies; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 7:34 UTC (permalink / raw)
To: linux-arm-kernel
After some more thinking, I think I found the problem...
DMA for receive should operate as follows:
1) CPU will map (invalidate) the buffers for the DMA
2) DMA will place the buffers and ack the CPU
3) The CPU will call unmap which will invalidate the buffers again
I can only think of one problematic scenario:
- After 1 and before 2:
CPU0 did spec prefetch for address x
- After 2:
CPU0 is doing inv: lock int, ldr x,
str x (addr x is marked dirty with wrong data).
CPU1 at this time doing spec prefetch for x which
will snoop CPU0 and will cause data corruption... :(
Can I assume that if I rely on the SoC's I/O cache coherency support, none of this is relevant? Or are there issues hiding with v6 + I/O coherency?
Thanks
Ronen Shitrit
-----Original Message-----
From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
Sent: Monday, May 17, 2010 9:57 AM
To: Ronen Shitrit
Cc: Catalin Marinas; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
On Mon, May 17, 2010 at 09:29:35AM +0300, Ronen Shitrit wrote:
> Got it, thanks.
No you haven't; disabling interrupts doesn't prevent CPUs speculatively
prefetching.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 7:34 ` Ronen Shitrit
@ 2010-05-17 7:43 ` Russell King - ARM Linux
2010-05-17 8:29 ` Ronen Shitrit
2010-05-17 10:00 ` Catalin Marinas
1 sibling, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-17 7:43 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, May 17, 2010 at 10:34:36AM +0300, Ronen Shitrit wrote:
> After some more thinking, I think I found the problem...
>
> DMA for receive should operate as follows:
> 1) CPU will map (invalidate) the buffers for the DMA
> 2) DMA will place the buffers and ack the CPU
> 3) The CPU will call unmap which will invalidate the buffers again
>
>
> I can only think of one problematic scenario:
> - After 1 and before 2:
> CPU0 did spec prefetch for address x
> - After 2:
> CPU0 is doing inv: lock int, ldr x,
> str x (addr x is marked dirty with wrong data).
The data has been corrupted at this point; you don't need a second CPU.
> CPU1 at this time doing spec prefetch for x which
> will snoop CPU0 and will cause data corruption... :(
>
> Can I assume that if I rely on the SoC's I/O cache coherency support,
> none of this is relevant? Or are there issues hiding with v6 +
> I/O coherency?
What "I/O cache coherency"? Are you saying that your SoC has a mode
where the DMA controller can snoop the CPU caches?
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 7:43 ` Russell King - ARM Linux
@ 2010-05-17 8:29 ` Ronen Shitrit
2010-05-17 8:57 ` Russell King - ARM Linux
0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 8:29 UTC (permalink / raw)
To: linux-arm-kernel
-----Original Message-----
From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
Sent: Monday, May 17, 2010 10:43 AM
To: Ronen Shitrit
Cc: Catalin Marinas; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
On Mon, May 17, 2010 at 10:34:36AM +0300, Ronen Shitrit wrote:
> After some more thinking, I think I found the problem...
>
> DMA for receive should operate as follows:
> 1) CPU will map (invalidate) the buffers for the DMA
> 2) DMA will place the buffers and ack the CPU
> 3) The CPU will call unmap which will invalidate the buffers again
>
>
> I can only think of one problematic scenario:
> - After 1 and before 2:
> CPU0 did spec prefetch for address x
> - After 2:
> CPU0 is doing inv: lock int, ldr x,
> str x (addr x is marked dirty with wrong data).
The data has been corrupted at this point; you don't need a second CPU.
[Ronen Shitrit] Why? If the second CPU doesn't exist, an invalidate will come just after the str instruction and the wrong data will not get anywhere... (interrupts are locked, so I assume no eviction...)
> CPU1 at this time doing spec prefetch for x which
> will snoop CPU0 and will cause data corruption... :(
>
> Can I assume that if I rely on the SoC's I/O cache coherency support,
> none of this is relevant? Or are there issues hiding with v6 +
> I/O coherency?
What "I/O cache coherency"? Are you saying that your SoC has a mode
where the DMA controller can snoop the CPU caches?
[Ronen Shitrit] Yes.
I also found that I can disable D-cache speculative prefetch.
However, I-cache prefetch still exists and we have a unified shared L2, so I think the ldr/str workaround is still not valid, since it might load garbage from the L2 as a result of I-cache speculative prefetch... right?
I guess that for machines that have only I-cache speculative prefetch, unmap should invalidate only the unified caches (in my case the L2).
And for machines that also have D-cache speculative prefetch, unmap should invalidate all caches.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 8:29 ` Ronen Shitrit
@ 2010-05-17 8:57 ` Russell King - ARM Linux
2010-05-17 9:50 ` Ronen Shitrit
0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-17 8:57 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, May 17, 2010 at 11:29:53AM +0300, Ronen Shitrit wrote:
> On Mon, May 17, 2010 at 10:34:36AM +0300, Ronen Shitrit wrote:
> > I can only think of one problematic scenario:
> > - After 1 and before 2:
> > CPU0 did spec prefetch for address x
> > - After 2:
> > CPU0 is doing inv: lock int, ldr x,
> > str x (addr x is marked dirty with wrong data).
>
> The data has been corrupted at this point; you don't need a second CPU.
>
> [Ronen Shitrit] Why? If the second CPU doesn't exist, an invalidate will
> come just after the str instruction and the wrong data will not get
> anywhere... (interrupts are locked, so I assume no eviction...)
If a speculative prefetch occurs (eg, to prefetch the next ldr) it
could evict the dirty cache line that the str just wrote to. Cache
replacement algorithms aren't always round-robin.
>
> > CPU1 at this time doing spec prefetch for x which
> > will snoop CPU0 and will cause data corruption... :(
> >
> > Can I assume that if I rely on the SoC's I/O cache coherency support,
> > none of this is relevant? Or are there issues hiding with v6 +
> > I/O coherency?
>
> What "I/O cache coherency"? Are you saying that your SoC has a mode
> where the DMA controller can snoop the CPU caches?
>
> [Ronen Shitrit] Yes.
That suggests to me that your SoC manufacturer intended the system to run
with DMA coherency enabled - maybe to avoid these problems.
We have a macro in the kernel to avoid unnecessary cache handling as a
result of that - arch_is_coherent(). Maybe you should enable the hardware
DMA coherency and ensure arch_is_coherent() returns true for your SoC.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 8:57 ` Russell King - ARM Linux
@ 2010-05-17 9:50 ` Ronen Shitrit
2010-05-17 10:03 ` Russell King - ARM Linux
0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 9:50 UTC (permalink / raw)
To: linux-arm-kernel
-----Original Message-----
From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
Sent: Monday, May 17, 2010 11:58 AM
To: Ronen Shitrit
Cc: Catalin Marinas; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
On Mon, May 17, 2010 at 11:29:53AM +0300, Ronen Shitrit wrote:
> On Mon, May 17, 2010 at 10:34:36AM +0300, Ronen Shitrit wrote:
> > I can only think of one problematic scenario:
> > - After 1 and before 2:
> > CPU0 did spec prefetch for address x
> > - After 2:
> > CPU0 is doing inv: lock int, ldr x,
> > str x (addr x is marked dirty with wrong data).
>
> The data has been corrupted at this point; you don't need a second CPU.
>
> [Ronen Shitrit] Why? If the second CPU doesn't exist, an invalidate will
> come just after the str instruction and the wrong data will not get
> anywhere... (interrupts are locked, so I assume no eviction...)
If a speculative prefetch occurs (eg, to prefetch the next ldr) it
could evict the dirty cache line that the str just wrote to. Cache
replacement algorithms aren't always round-robin.
[Ronen Shitrit] The only ldr around is to the next line, which shouldn't evict the current line, so I don't see any issue.
>
> > CPU1 at this time doing spec prefetch for x which
> > will snoop CPU0 and will cause data corruption... :(
> >
> > Can I assume that if I rely on the SoC's I/O cache coherency support,
> > none of this is relevant? Or are there issues hiding with v6 +
> > I/O coherency?
>
> What "I/O cache coherency"? Are you saying that your SoC has a mode
> where the DMA controller can snoop the CPU caches?
>
> [Ronen Shitrit] Yes.
That suggests to me that your SoC manufacturer intended the system to run
with DMA coherency enabled - maybe to avoid these problems.
We have a macro in the kernel to avoid unnecessary cache handling as a
result of that - arch_is_coherent(). Maybe you should enable the hardware
DMA coherency and ensure arch_is_coherent() returns true for your SoC.
[Ronen Shitrit] I know; I just wanted to make sure we have a fallback in case of some HW malfunctioning.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-16 6:29 ` Ronen Shitrit
2010-05-16 15:01 ` Russell King - ARM Linux
@ 2010-05-17 9:51 ` Catalin Marinas
2010-05-17 9:57 ` Catalin Marinas
1 sibling, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 9:51 UTC (permalink / raw)
To: linux-arm-kernel
On Sun, 2010-05-16 at 07:29 +0100, Ronen Shitrit wrote:
> Our ARMv6 does speculative reads for both the I- and D-cache...
I'll check with the hardware guys here in ARM and get back to you.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 9:51 ` Catalin Marinas
@ 2010-05-17 9:57 ` Catalin Marinas
2010-05-17 9:59 ` Ronen Shitrit
0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 9:57 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 2010-05-17 at 10:51 +0100, Catalin Marinas wrote:
> On Sun, 2010-05-16 at 07:29 +0100, Ronen Shitrit wrote:
> > Our ARMv6 does speculative reads for both the I- and D-cache...
>
> I'll check with the hardware guys here in ARM and get back to you.
Just for clarification - is your ARMv6 MP processor an ARM11MPCore or
your own MP variant?
In case of Cortex-A9 (ARMv7), the TRM states clearly that it can do
speculative loads into the L1 D-cache and this can be disabled via a bit
in the auxiliary control register. Do you have a similar bit on your
ARMv6 MP processor?
Thanks.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 9:57 ` Catalin Marinas
@ 2010-05-17 9:59 ` Ronen Shitrit
2010-05-17 11:08 ` Catalin Marinas
0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 9:59 UTC (permalink / raw)
To: linux-arm-kernel
-----Original Message-----
From: Catalin Marinas [mailto:catalin.marinas at arm.com]
Sent: Monday, May 17, 2010 12:58 PM
To: Ronen Shitrit
Cc: Russell King - ARM Linux; linux-arm-kernel at lists.infradead.org
Subject: RE: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
On Mon, 2010-05-17 at 10:51 +0100, Catalin Marinas wrote:
> On Sun, 2010-05-16 at 07:29 +0100, Ronen Shitrit wrote:
> > Our ARMv6 does speculative reads for both the I- and D-cache...
>
> I'll check with the hardware guys here in ARM and get back to you.
Just for clarification - is your ARMv6 MP processor an ARM11MPCore or
your own MP variant?
[Ronen Shitrit] our own.
In case of Cortex-A9 (ARMv7), the TRM states clearly that it can do
speculative loads into the L1 D-cache and this can be disabled via a bit
in the auxiliary control register. Do you have a similar bit on your
ARMv6 MP processor?
[Ronen Shitrit] Yes.
Thanks.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 7:34 ` Ronen Shitrit
2010-05-17 7:43 ` Russell King - ARM Linux
@ 2010-05-17 10:00 ` Catalin Marinas
2010-05-17 11:29 ` Ronen Shitrit
1 sibling, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 10:00 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 2010-05-17 at 10:34 +0300, Ronen Shitrit wrote:
> After some more thinking, I think I found the problem...
>
> DMA for receive should operate as follows:
> 1) CPU will map (invalidate) the buffers for the DMA
> 2) DMA will place the buffers and ack the CPU
> 3) The CPU will call unmap which will invalidate the buffers again
(3) doesn't happen with my patch - the unmap function is now a no-op
since the invalidate operation is destructive.
As I said in a previous e-mail, it's the sync operation that does an
invalidate and this would affect the data in the buffer. Hence the patch
to make the invalidate operation non-destructive (as long as your CPU
doesn't do speculative loads into the D-cache or they can be disabled).
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 9:50 ` Ronen Shitrit
@ 2010-05-17 10:03 ` Russell King - ARM Linux
2010-05-17 11:26 ` Ronen Shitrit
0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-17 10:03 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, May 17, 2010 at 12:50:01PM +0300, Ronen Shitrit wrote:
> If a speculative prefetch occurs (eg, to prefetch the next ldr) it
> could evict the dirty cache line that the str just wrote to. Cache
> replacement algorithms aren't always round-robin.
>
> [Ronen Shitrit] The only ldr around is to the next line, which shouldn't
> evict the current line, so I don't see any issue.
How can you say that the current line won't be evicted? Do you know
the cache replacement algorithm for your CPU that well?
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 9:59 ` Ronen Shitrit
@ 2010-05-17 11:08 ` Catalin Marinas
2010-05-17 11:27 ` Ronen Shitrit
0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 11:08 UTC (permalink / raw)
To: linux-arm-kernel
> In case of Cortex-A9 (ARMv7), the TRM states clearly that it can do
> speculative loads into the L1 D-cache and this can be disabled via a bit
> in the auxiliary control register. Do you have a similar bit on your
> ARMv6 MP processor?
>
> [Ronen Shitrit] Yes.
So, use it :)
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 10:03 ` Russell King - ARM Linux
@ 2010-05-17 11:26 ` Ronen Shitrit
2010-05-17 11:31 ` Russell King - ARM Linux
0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 11:26 UTC (permalink / raw)
To: linux-arm-kernel
-----Original Message-----
From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
Sent: Monday, May 17, 2010 1:04 PM
To: Ronen Shitrit
Cc: Catalin Marinas; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
On Mon, May 17, 2010 at 12:50:01PM +0300, Ronen Shitrit wrote:
> If a speculative prefetch occurs (eg, to prefetch the next ldr) it
> could evict the dirty cache line that the str just wrote to. Cache
> replacement algorithms aren't always round-robin.
>
> [Ronen Shitrit] The only ldr around is to the next line, which shouldn't
> evict the current line, so I don't see any issue.
How can you say that the current line won't be evicted?
[Ronen Shitrit] Since the only ldr I see around is the ldr to the next line, and the next line will go to a different line in the cache...
Do you know
the cache replacement algorithm for your CPU that well?
[Ronen Shitrit] We can configure it to use LRU or random replacement...
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 11:08 ` Catalin Marinas
@ 2010-05-17 11:27 ` Ronen Shitrit
2010-05-17 11:47 ` Catalin Marinas
0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 11:27 UTC (permalink / raw)
To: linux-arm-kernel
Performance-wise we prefer to enable it...
But to be on the safe side, we will disable it.
Thanks
-----Original Message-----
From: Catalin Marinas [mailto:catalin.marinas at arm.com]
Sent: Monday, May 17, 2010 2:08 PM
To: Ronen Shitrit
Cc: Russell King - ARM Linux; linux-arm-kernel at lists.infradead.org
Subject: RE: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
> In case of Cortex-A9 (ARMv7), the TRM states clearly that it can do
> speculative loads into the L1 D-cache and this can be disabled via a bit
> in the auxiliary control register. Do you have a similar bit on your
> ARMv6 MP processor?
>
> [Ronen Shitrit] Yes.
So, use it :)
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 10:00 ` Catalin Marinas
@ 2010-05-17 11:29 ` Ronen Shitrit
2010-05-17 11:42 ` Catalin Marinas
0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 11:29 UTC (permalink / raw)
To: linux-arm-kernel
-----Original Message-----
From: Catalin Marinas [mailto:catalin.marinas at arm.com]
Sent: Monday, May 17, 2010 1:01 PM
To: Ronen Shitrit
Cc: Russell King - ARM Linux; linux-arm-kernel at lists.infradead.org
Subject: RE: [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
On Mon, 2010-05-17 at 10:34 +0300, Ronen Shitrit wrote:
> After some more thinking, I think I found the problem...
>
> DMA for receive should operate as follows:
> 1) CPU will map (invalidate) the buffers for the DMA
> 2) DMA will place the buffers and ack the CPU
> 3) The CPU will call unmap which will invalidate the buffers again
(3) doesn't happen with my patch - the unmap function is now a no-op
since the invalidate operation is destructive.
[Ronen Shitrit] So how do you protect yourself from I-cache speculative prefetch?
Do you have a unified L2?
As I said in a previous e-mail, it's the sync operation that does an
invalidate and this would affect the data in the buffer. Hence the patch
to make the invalidate operation non-destructive (as long as your CPU
doesn't do speculative loads into the D-cache or they can be disabled).
[Ronen Shitrit] What about I-cache prefetch? Unified cache?
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 11:26 ` Ronen Shitrit
@ 2010-05-17 11:31 ` Russell King - ARM Linux
2010-05-17 11:45 ` Catalin Marinas
0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-17 11:31 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, May 17, 2010 at 02:26:38PM +0300, Ronen Shitrit wrote:
> > How can you say that the current line won't be evicted?
>
> [Ronen Shitrit] Since the only ldr I see around is the ldr to the next
> line, and the next line will go to a different line in the cache...
I think you mean different index, and yes, you're right about that.
However, I still feel that your solution is unsafe as long as
speculative prefetch is enabled; the assumption with speculative
prefetch from an architectural point of view is to assume that
the CPU has infinite prefetching.
The fact that we have non-ARM ARMv6 CPUs which do prefetch in ways
we don't know about means that we can't assume that ARMv6 CPUs
aren't going to have aggressive prefetching.
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 11:29 ` Ronen Shitrit
@ 2010-05-17 11:42 ` Catalin Marinas
2010-05-17 12:04 ` Ronen Shitrit
0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 11:42 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 2010-05-17 at 14:29 +0300, Ronen Shitrit wrote:
> From: Catalin Marinas [mailto:catalin.marinas at arm.com]
> On Mon, 2010-05-17 at 10:34 +0300, Ronen Shitrit wrote:
> > After some more thinking, I think I found the problem...
> >
> > DMA for receive should operate as follows:
> > 1) CPU will map (invalidate) the buffers for the DMA
> > 2) DMA will place the buffers and ack the CPU
> > 3) The CPU will call unmap which will invalidate the buffers again
>
> (3) doesn't happen with my patch - the unmap function is now a no-op
> since the invalidate operation is destructive.
>
> [Ronen Shitrit] So how do you protect yourself from I-cache speculative prefetch?
Strictly looking at the DMA cache maintenance, the API only cares about
the D-side or unified (the I-side is handled by the kernel via other
functions like flush_dcache_page).
> Do you have a unified L2?
Yes, it's a unified L2, so I-cache prefetching can cause allocation into
the L2 cache. The L2 cache is considered "outer cache" in Linux and
invalidated explicitly both before and after the DMA transfer (from
device). Since the L2 (at least on the RealView boards) is common to all
the processors, there is no need for tricks like read/write for
ownership, so the invalidation doesn't affect the underlying RAM.
> As I said in a previous e-mail, it's the sync operation that does an
> invalidate and this would affect the data in the buffer. Hence the patch
> to make the invalidate operation non-destructive (as long as your CPU
> doesn't do speculative loads into the D-cache or they can be disabled).
>
> [Ronen Shitrit] What about I-cache prefetch? Unified cache?
The "sync_for_cpu" operation (from device case) first invalidates the L2
cache and then invalidates the L1 using an LDR/STR pair and a cache op. So
the LDR would see the data in RAM rather than some speculatively
loaded value in the L2.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 11:31 ` Russell King - ARM Linux
@ 2010-05-17 11:45 ` Catalin Marinas
0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 11:45 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 2010-05-17 at 12:31 +0100, Russell King - ARM Linux wrote:
> The fact that we have non-ARM ARMv6 CPUs which do prefetch in ways
> we don't know about means that we can't assume that ARMv6 CPUs
> aren't going to have aggressive prefetching.
I agree with this. Maybe we should add a config option for the DMA ops
workaround and state clearly in the help that it doesn't work with
speculative loads into the D-cache. Usually implementations have either
a bit in CP15 ACTLR or some hidden (undocumented) bit somewhere else to
disable the speculative loads.
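The config option Catalin suggests could look roughly like this Kconfig sketch (the option name, dependencies and wording here are illustrative guesses, not the actual entry; a similarly named DMA_CACHE_RWFO option did later exist in arch/arm/mm/Kconfig):

```
config DMA_CACHE_RWFO
	bool "Enable read/write for ownership DMA cache maintenance"
	depends on CPU_V6 && SMP
	default y
	help
	  The ARMv6 DMA cache maintenance uses an LDR/STR pair to gain
	  exclusive ownership of a cache line before operating on it.
	  This is not safe on CPUs that do speculative loads into the
	  D-cache; on such implementations, disable this option or
	  disable speculative loads via the auxiliary control register.
```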
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 11:27 ` Ronen Shitrit
@ 2010-05-17 11:47 ` Catalin Marinas
2010-05-17 13:46 ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 " Catalin Marinas
0 siblings, 1 reply; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 11:47 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 2010-05-17 at 14:27 +0300, Ronen Shitrit wrote:
> Performance-wise we prefer to enable it...
> But to be on the safe side, we will disable it.
It's worth doing some benchmarking.
Anyway, if it's a new SoC, you may have the option of implementing IPI
via FIQ and you could broadcast the cache maintenance operations via FIQ
(and avoid potential deadlocks in the kernel).
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 11:42 ` Catalin Marinas
@ 2010-05-17 12:04 ` Ronen Shitrit
2010-05-17 13:45 ` Catalin Marinas
0 siblings, 1 reply; 64+ messages in thread
From: Ronen Shitrit @ 2010-05-17 12:04 UTC (permalink / raw)
To: linux-arm-kernel
> As I said in a previous e-mail, it's the sync operation that does an
> invalidate and this would affect the data in the buffer. Hence the patch
> to make the invalidate operation non-destructive (as long as your CPU
> doesn't do speculative loads into the D-cache or they can be disabled).
>
> [Ronen Shitrit] What about I-cache prefetch? Unified cache?
The "sync_for_cpu" operation (from device case) first invalidates the L2
cache and then invalidates the L1 using an LDR/STR pair and a cache op.
[Ronen Shitrit] So for v6-MP, if there is no D-cache speculative prefetch, why do you need to invalidate the L1 at all?
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 12:04 ` Ronen Shitrit
@ 2010-05-17 13:45 ` Catalin Marinas
0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 13:45 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 2010-05-17 at 13:04 +0100, Ronen Shitrit wrote:
> > As I said in a previous e-mail, it's the sync operation that does an
> > invalidate and this would affect the data in the buffer. Hence the patch
> > to make the invalidate operation non-destructive (as long as your CPU
> > doesn't do speculative loads into the D-cache or they can be disabled).
> >
> > [Ronen Shitrit] What about I-cache prefetch? Unified cache?
>
> The "sync_for_cpu" operation (from device case) first invalidates the L2
> cache and then invalidates the L1 using an LDR/STR pair and a cache op.
> [Ronen Shitrit] So for v6-MP, if there is no D-cache speculative prefetch, why do you need to invalidate the L1 at all?
No, you are correct. The ___dma_single_dev_to_cpu(FROM_DEVICE) function
calls outer_inv_range() followed by dmac_unmap_area(). The latter,
however, is a no-op with my DMA workaround patch.
--
Catalin
* [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops
2010-05-17 11:47 ` Catalin Marinas
@ 2010-05-17 13:46 ` Catalin Marinas
0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-17 13:46 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 2010-05-17 at 12:47 +0100, Catalin Marinas wrote:
> On Mon, 2010-05-17 at 14:27 +0300, Ronen Shitrit wrote:
> > Performance-wise we prefer to enable it...
> > But to be on the safe side, we will disable it.
>
> It's worth doing some benchmarking.
>
> Anyway, if it's a new SoC, you may have the option of implementing IPI
> via FIQ and you could broadcast the cache maintenance operations via FIQ
> (and avoid potential deadlocks in the kernel).
Or even better - implement cache broadcasting in hardware.
--
Catalin
* [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
2010-05-04 16:44 ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP Catalin Marinas
@ 2010-05-28 18:50 ` Russell King - ARM Linux
2010-05-28 21:37 ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on " Catalin Marinas
0 siblings, 1 reply; 64+ messages in thread
From: Russell King - ARM Linux @ 2010-05-28 18:50 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 04, 2010 at 05:44:54PM +0100, Catalin Marinas wrote:
> The standard I-cache Invalidate All (ICIALLU) and Branch Predictor
> Invalidate All (BPIALL) operations are not automatically broadcast to
> the other CPUs in an ARMv7 MP system. The patch adds the Inner Shareable
> variants, ICIALLUIS and BPIALLIS, if ARMv7 and SMP.
FYI, I'm considering reverting this patch; it is one of two
responsible for causing a stability regression on the Versatile Express
platform.
The two commits responsible for making Versatile Express userspace
completely unstable are:
b8349b569aae661dea9d59d7d2ee587ccea3336c
ARM: 6112/1: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
7e5a69e83ba7a0d5917ad830f417cba8b8d6aa72
ARM: 6007/1: fix highmem with VIPT cache and DMA
Most of the instability is from the second patch, but reverting just
the second patch doesn't fully restore userspace stability.
* [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
2010-05-28 18:50 ` Russell King - ARM Linux
@ 2010-05-28 21:37 ` Catalin Marinas
0 siblings, 0 replies; 64+ messages in thread
From: Catalin Marinas @ 2010-05-28 21:37 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 2010-05-28 at 19:50 +0100, Russell King - ARM Linux wrote:
> On Tue, May 04, 2010 at 05:44:54PM +0100, Catalin Marinas wrote:
> > The standard I-cache Invalidate All (ICIALLU) and Branch Prediction
> > Invalidate All (BPIALL) operations are not automatically broadcast to
> > the other CPUs in an ARMv7 MP system. The patch adds the Inner Shareable
> > variants, ICIALLUIS and BPIALLIS, if ARMv7 and SMP.
>
> FYI, I'm considering reverting this patch; it is one of two commits
> responsible for a stability regression on the Versatile Express
> platform.
>
> The two commits responsible for making Versatile Express userspace
> completely unstable are:
>
> b8349b569aae661dea9d59d7d2ee587ccea3336c
> ARM: 6112/1: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
>
> 7e5a69e83ba7a0d5917ad830f417cba8b8d6aa72
> ARM: 6007/1: fix highmem with VIPT cache and DMA
>
> Most of the instability is from the second patch, but reverting just
> the second patch doesn't fully restore userspace stability.
That's a bit odd, as my patch is supposed to make things better. Does
it fail in the same way with maxcpus=1? For an SMP configuration, you
could also try some of the recent patches from my "proposed" branch:
http://git.kernel.org/?p=linux/kernel/git/cmarinas/linux-2.6-cm.git;a=shortlog;h=refs/heads/proposed
especially:
ARM: Use lazy cache flushing on ARMv7 SMP systems
ARM: Assume new page cache pages have dirty D-cache
ARM: Synchronise the I and D caches via set_pte_at
--
Catalin
Thread overview: 64+ messages
2010-05-04 16:44 [PATCH 0/8] Various patches for comments and upstream Catalin Marinas
2010-05-04 16:44 ` [PATCH 1/8] ARM: Improve the L2 cache performance when PL310 is used Catalin Marinas
2010-05-04 17:02 ` Jason McMullan
2010-05-05 16:07 ` Catalin Marinas
2010-05-04 16:44 ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 DMA cache ops Catalin Marinas
2010-05-04 17:04 ` Jason McMullan
2010-05-05 16:23 ` Catalin Marinas
2010-05-05 13:26 ` George G. Davis
2010-05-06 14:40 ` Catalin Marinas
2010-05-06 15:57 ` George G. Davis
2010-05-12 12:51 ` Ronen Shitrit
2010-05-12 13:55 ` Catalin Marinas
2010-05-12 15:03 ` Ronen Shitrit
2010-05-12 18:48 ` Russell King - ARM Linux
2010-05-12 18:59 ` Russell King - ARM Linux
2010-05-12 20:00 ` Ronen Shitrit
2010-05-12 20:04 ` Russell King - ARM Linux
2010-05-12 20:19 ` Ronen Shitrit
2010-05-12 21:21 ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 " Catalin Marinas
2010-05-13 5:27 ` Ronen Shitrit
2010-05-13 8:26 ` Catalin Marinas
2010-05-13 13:54 ` George G. Davis
2010-05-13 14:15 ` Catalin Marinas
2010-05-13 20:34 ` George G. Davis
2010-05-14 16:29 ` Catalin Marinas
2010-05-14 16:42 ` Catalin Marinas
2010-05-15 1:26 ` George G. Davis
2010-05-16 6:28 ` Ronen Shitrit
2010-05-16 6:29 ` Ronen Shitrit
2010-05-16 15:01 ` Russell King - ARM Linux
2010-05-17 6:29 ` Ronen Shitrit
2010-05-17 6:57 ` Russell King - ARM Linux
2010-05-17 7:34 ` Ronen Shitrit
2010-05-17 7:43 ` Russell King - ARM Linux
2010-05-17 8:29 ` Ronen Shitrit
2010-05-17 8:57 ` Russell King - ARM Linux
2010-05-17 9:50 ` Ronen Shitrit
2010-05-17 10:03 ` Russell King - ARM Linux
2010-05-17 11:26 ` Ronen Shitrit
2010-05-17 11:31 ` Russell King - ARM Linux
2010-05-17 11:45 ` Catalin Marinas
2010-05-17 10:00 ` Catalin Marinas
2010-05-17 11:29 ` Ronen Shitrit
2010-05-17 11:42 ` Catalin Marinas
2010-05-17 12:04 ` Ronen Shitrit
2010-05-17 13:45 ` Catalin Marinas
2010-05-17 9:51 ` Catalin Marinas
2010-05-17 9:57 ` Catalin Marinas
2010-05-17 9:59 ` Ronen Shitrit
2010-05-17 11:08 ` Catalin Marinas
2010-05-17 11:27 ` Ronen Shitrit
2010-05-17 11:47 ` Catalin Marinas
2010-05-17 13:46 ` [PATCH 2/8] ARM: Implement read/write for ownership in the ARMv6 " Catalin Marinas
2010-05-04 16:44 ` [PATCH 3/8] ARM: Align machine_desc.phys_io to a 1MB section Catalin Marinas
2010-05-04 16:44 ` [PATCH 4/8] ARM: Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas
2010-05-04 16:44 ` [PATCH 5/8] ARM: Fix the __arm_ioremap_caller() definition in nommu.c Catalin Marinas
2010-05-04 17:19 ` Russell King - ARM Linux
2010-05-04 16:44 ` [PATCH 6/8] ARM: Implement copy_to_user_page() for noMMU Catalin Marinas
2010-05-04 17:19 ` Russell King - ARM Linux
2010-05-04 16:44 ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP Catalin Marinas
2010-05-28 18:50 ` Russell King - ARM Linux
2010-05-28 21:37 ` [PATCH 7/8] ARM: Use the Inner Shareable I-cache and BTB ops on " Catalin Marinas
2010-05-04 16:44 ` [PATCH 8/8] ARM: Implement phys_mem_access_prot() to avoid attributes aliasing Catalin Marinas
2010-05-04 16:48 ` [PATCH 0/8] Various patches for comments and upstream Catalin Marinas