* [PATCH v4 0/4] Patches for -next
@ 2010-06-21 14:46 Catalin Marinas
2010-06-21 14:46 ` [PATCH v4 1/4] ARM: Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Catalin Marinas @ 2010-06-21 14:46 UTC (permalink / raw)
To: linux-arm-kernel
Pretty much the same content as v3 but I reordered the last three
patches to make sure that switching to lazy cache flushing on ARMv7 SMP
doesn't make the set_pte/cache flushing race more visible (issue handled
by the set_pte_at patch).
Catalin Marinas (4):
ARM: Remove the domain switching on ARMv6k/v7 CPUs
ARM: Assume new page cache pages have dirty D-cache
ARM: Synchronise the I and D caches via set_pte_at() on SMP systems
ARM: Use lazy cache flushing on ARMv7 SMP systems
arch/arm/include/asm/assembler.h | 13 +++---
arch/arm/include/asm/cacheflush.h | 6 +--
arch/arm/include/asm/domain.h | 31 +++++++++++++-
arch/arm/include/asm/futex.h | 9 ++--
arch/arm/include/asm/pgtable.h | 25 ++++++++++-
arch/arm/include/asm/smp_plat.h | 4 ++
arch/arm/include/asm/tlbflush.h | 12 ++++-
arch/arm/include/asm/traps.h | 2 +
arch/arm/include/asm/uaccess.h | 16 ++++---
arch/arm/kernel/entry-armv.S | 4 +-
arch/arm/kernel/fiq.c | 5 ++
arch/arm/kernel/traps.c | 14 ++++--
arch/arm/lib/getuser.S | 13 +++---
arch/arm/lib/putuser.S | 29 +++++++------
arch/arm/lib/uaccess.S | 83 +++++++++++++++++++------------------
arch/arm/mm/Kconfig | 8 ++++
arch/arm/mm/copypage-v4mc.c | 2 -
arch/arm/mm/copypage-v6.c | 2 -
arch/arm/mm/copypage-xscale.c | 2 -
arch/arm/mm/dma-mapping.c | 6 +++
arch/arm/mm/fault-armv.c | 8 ++--
arch/arm/mm/flush.c | 31 +++++++++-----
arch/arm/mm/mmu.c | 6 +--
arch/arm/mm/proc-macros.S | 7 +++
arch/arm/mm/proc-v7.S | 5 +-
25 files changed, 226 insertions(+), 117 deletions(-)
--
Catalin
* [PATCH v4 1/4] ARM: Remove the domain switching on ARMv6k/v7 CPUs
2010-06-21 14:46 [PATCH v4 0/4] Patches for -next Catalin Marinas
@ 2010-06-21 14:46 ` Catalin Marinas
2010-06-22 12:47 ` Anton Vorontsov
2010-06-21 14:46 ` [PATCH v4 2/4] ARM: Assume new page cache pages have dirty D-cache Catalin Marinas
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Catalin Marinas @ 2010-06-21 14:46 UTC (permalink / raw)
To: linux-arm-kernel
This patch removes the domain switching functionality via the set_fs and
__switch_to functions on cores that have a TLS register.
Currently, the ioremap and vmalloc areas share the same level 1 page
tables and therefore have the same domain (DOMAIN_KERNEL). When the
kernel domain is modified from Client to Manager (via the __set_fs or in
the __switch_to function), the XN (eXecute Never) bit is overridden and
newer CPUs can speculatively prefetch the ioremap'ed memory.
Linux performs the kernel domain switching to allow user-specific
functions (copy_to/from_user, get/put_user etc.) to access kernel
memory. In order for these functions to work with the kernel domain set
to Client, the patch modifies the LDRT/STRT and related instructions to
the LDR/STR ones.
The user pages access rights are also modified for kernel read-only
access rather than read/write so that the copy-on-write mechanism still
works. CPU_USE_DOMAINS gets disabled only if HAS_TLS_REG is defined
since writing the TLS value to the high vectors page isn't possible.
The user addresses passed to the kernel are checked by the access_ok()
function so that they do not point to the kernel space.
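As an aside for reviewers (this snippet is not part of the patch), the new
T() macro in asm/domain.h works purely by C string pasting; a minimal
user-space sketch of the expansion:

	/* Standalone sketch, not kernel code: build with and without
	 * -DCONFIG_CPU_USE_DOMAINS to see which mnemonic is produced. */
	#include <stdio.h>

	#ifdef CONFIG_CPU_USE_DOMAINS
	#define T(instr)	#instr "t"	/* T(ldr) -> "ldr" "t" == "ldrt" */
	#else
	#define T(instr)	#instr		/* T(ldr) -> "ldr" */
	#endif

	int main(void)
	{
		/* In the kernel these strings end up inside inline asm,
		 * e.g. "1:	" T(ldr) "	%1,[%2],#0\n" in uaccess.h. */
		printf("load:  %s\n", T(ldr));
		printf("store: %s\n", T(strb));
		return 0;
	}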
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/assembler.h | 13 +++---
arch/arm/include/asm/domain.h | 31 +++++++++++++-
arch/arm/include/asm/futex.h | 9 ++--
arch/arm/include/asm/traps.h | 2 +
arch/arm/include/asm/uaccess.h | 16 ++++---
arch/arm/kernel/entry-armv.S | 4 +-
arch/arm/kernel/fiq.c | 5 ++
arch/arm/kernel/traps.c | 14 +++++-
arch/arm/lib/getuser.S | 13 +++---
arch/arm/lib/putuser.S | 29 +++++++------
arch/arm/lib/uaccess.S | 83 +++++++++++++++++++-------------------
arch/arm/mm/Kconfig | 8 ++++
arch/arm/mm/mmu.c | 6 +--
arch/arm/mm/proc-macros.S | 7 +++
arch/arm/mm/proc-v7.S | 5 +-
15 files changed, 153 insertions(+), 92 deletions(-)
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 6e8f05c..66db132 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -18,6 +18,7 @@
#endif
#include <asm/ptrace.h>
+#include <asm/domain.h>
/*
* Endian independent macros for shifting bytes within registers.
@@ -183,12 +184,12 @@
*/
#ifdef CONFIG_THUMB2_KERNEL
- .macro usraccoff, instr, reg, ptr, inc, off, cond, abort
+ .macro usraccoff, instr, reg, ptr, inc, off, cond, abort, t=T()
9999:
.if \inc == 1
- \instr\cond\()bt \reg, [\ptr, #\off]
+ \instr\cond\()b\()\t\().w \reg, [\ptr, #\off]
.elseif \inc == 4
- \instr\cond\()t \reg, [\ptr, #\off]
+ \instr\cond\()\t\().w \reg, [\ptr, #\off]
.else
.error "Unsupported inc macro argument"
.endif
@@ -223,13 +224,13 @@
#else /* !CONFIG_THUMB2_KERNEL */
- .macro usracc, instr, reg, ptr, inc, cond, rept, abort
+ .macro usracc, instr, reg, ptr, inc, cond, rept, abort, t=T()
.rept \rept
9999:
.if \inc == 1
- \instr\cond\()bt \reg, [\ptr], #\inc
+ \instr\cond\()b\()\t \reg, [\ptr], #\inc
.elseif \inc == 4
- \instr\cond\()t \reg, [\ptr], #\inc
+ \instr\cond\()\t \reg, [\ptr], #\inc
.else
.error "Unsupported inc macro argument"
.endif
diff --git a/arch/arm/include/asm/domain.h b/arch/arm/include/asm/domain.h
index cc7ef40..af18cea 100644
--- a/arch/arm/include/asm/domain.h
+++ b/arch/arm/include/asm/domain.h
@@ -45,13 +45,17 @@
*/
#define DOMAIN_NOACCESS 0
#define DOMAIN_CLIENT 1
+#ifdef CONFIG_CPU_USE_DOMAINS
#define DOMAIN_MANAGER 3
+#else
+#define DOMAIN_MANAGER 1
+#endif
#define domain_val(dom,type) ((type) << (2*(dom)))
#ifndef __ASSEMBLY__
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
#define set_domain(x) \
do { \
__asm__ __volatile__( \
@@ -74,5 +78,28 @@
#define modify_domain(dom,type) do { } while (0)
#endif
+/*
+ * Generate the T (user) versions of the LDR/STR and related
+ * instructions (inline assembly)
+ */
+#ifdef CONFIG_CPU_USE_DOMAINS
+#define T(instr) #instr "t"
+#else
+#define T(instr) #instr
#endif
-#endif /* !__ASSEMBLY__ */
+
+#else /* __ASSEMBLY__ */
+
+/*
+ * Generate the T (user) versions of the LDR/STR and related
+ * instructions
+ */
+#ifdef CONFIG_CPU_USE_DOMAINS
+#define T(instr) instr ## t
+#else
+#define T(instr) instr
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* !__ASM_PROC_DOMAIN_H */
diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index 540a044..b33fe70 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -13,12 +13,13 @@
#include <linux/preempt.h>
#include <linux/uaccess.h>
#include <asm/errno.h>
+#include <asm/domain.h>
#define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \
__asm__ __volatile__( \
- "1: ldrt %1, [%2]\n" \
+ "1: " T(ldr) " %1, [%2]\n" \
" " insn "\n" \
- "2: strt %0, [%2]\n" \
+ "2: " T(str) " %0, [%2]\n" \
" mov %0, #0\n" \
"3:\n" \
" .pushsection __ex_table,\"a\"\n" \
@@ -97,10 +98,10 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
pagefault_disable(); /* implies preempt_disable() */
__asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
- "1: ldrt %0, [%3]\n"
+ "1: " T(ldr) " %0, [%3]\n"
" teq %0, %1\n"
" it eq @ explicit IT needed for the 2b label\n"
- "2: streqt %2, [%3]\n"
+ "2: " T(streq) " %2, [%3]\n"
"3:\n"
" .pushsection __ex_table,\"a\"\n"
" .align 3\n"
diff --git a/arch/arm/include/asm/traps.h b/arch/arm/include/asm/traps.h
index 491960b..af5d5d1 100644
--- a/arch/arm/include/asm/traps.h
+++ b/arch/arm/include/asm/traps.h
@@ -27,4 +27,6 @@ static inline int in_exception_text(unsigned long ptr)
extern void __init early_trap_init(void);
extern void dump_backtrace_entry(unsigned long where, unsigned long from, unsigned long frame);
+extern void *vectors_page;
+
#endif
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 33e4a48..b293616 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -227,7 +227,7 @@ do { \
#define __get_user_asm_byte(x,addr,err) \
__asm__ __volatile__( \
- "1: ldrbt %1,[%2]\n" \
+ "1: " T(ldrb) " %1,[%2],#0\n" \
"2:\n" \
" .pushsection .fixup,\"ax\"\n" \
" .align 2\n" \
@@ -263,7 +263,7 @@ do { \
#define __get_user_asm_word(x,addr,err) \
__asm__ __volatile__( \
- "1: ldrt %1,[%2]\n" \
+ "1: " T(ldr) " %1,[%2],#0\n" \
"2:\n" \
" .pushsection .fixup,\"ax\"\n" \
" .align 2\n" \
@@ -308,7 +308,7 @@ do { \
#define __put_user_asm_byte(x,__pu_addr,err) \
__asm__ __volatile__( \
- "1: strbt %1,[%2]\n" \
+ "1: " T(strb) " %1,[%2],#0\n" \
"2:\n" \
" .pushsection .fixup,\"ax\"\n" \
" .align 2\n" \
@@ -341,7 +341,7 @@ do { \
#define __put_user_asm_word(x,__pu_addr,err) \
__asm__ __volatile__( \
- "1: strt %1,[%2]\n" \
+ "1: " T(str) " %1,[%2],#0\n" \
"2:\n" \
" .pushsection .fixup,\"ax\"\n" \
" .align 2\n" \
@@ -366,10 +366,10 @@ do { \
#define __put_user_asm_dword(x,__pu_addr,err) \
__asm__ __volatile__( \
- ARM( "1: strt " __reg_oper1 ", [%1], #4\n" ) \
- ARM( "2: strt " __reg_oper0 ", [%1]\n" ) \
- THUMB( "1: strt " __reg_oper1 ", [%1]\n" ) \
- THUMB( "2: strt " __reg_oper0 ", [%1, #4]\n" ) \
+ ARM( "1: " T(str) " " __reg_oper1 ", [%1], #4\n" ) \
+ ARM( "2: " T(str) " " __reg_oper0 ", [%1]\n" ) \
+ THUMB( "1: " T(str) " " __reg_oper1 ", [%1]\n" ) \
+ THUMB( "2: " T(str) " " __reg_oper0 ", [%1, #4]\n" ) \
"3:\n" \
" .pushsection .fixup,\"ax\"\n" \
" .align 2\n" \
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 7ee48e7..ba654fa 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -736,7 +736,7 @@ ENTRY(__switch_to)
THUMB( stmia ip!, {r4 - sl, fp} ) @ Store most regs on stack
THUMB( str sp, [ip], #4 )
THUMB( str lr, [ip], #4 )
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
ldr r6, [r2, #TI_CPU_DOMAIN]
#endif
#if defined(CONFIG_HAS_TLS_REG)
@@ -745,7 +745,7 @@ ENTRY(__switch_to)
mov r4, #0xffff0fff
str r3, [r4, #-15] @ TLS val at 0xffff0ff0
#endif
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
mcr p15, 0, r6, c3, c0, 0 @ Set domain register
#endif
mov r5, r0
diff --git a/arch/arm/kernel/fiq.c b/arch/arm/kernel/fiq.c
index 6ff7919..d601ef2 100644
--- a/arch/arm/kernel/fiq.c
+++ b/arch/arm/kernel/fiq.c
@@ -45,6 +45,7 @@
#include <asm/fiq.h>
#include <asm/irq.h>
#include <asm/system.h>
+#include <asm/traps.h>
static unsigned long no_fiq_insn;
@@ -77,7 +78,11 @@ int show_fiq_list(struct seq_file *p, void *v)
void set_fiq_handler(void *start, unsigned int length)
{
+#if defined(CONFIG_CPU_USE_DOMAINS)
memcpy((void *)0xffff001c, start, length);
+#else
+ memcpy(vectors_page + 0x1c, start, length);
+#endif
flush_icache_range(0xffff001c, 0xffff001c + length);
if (!vectors_high())
flush_icache_range(0x1c, 0x1c + length);
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1621e53..6bc57c5 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -36,6 +36,8 @@
static const char *handler[]= { "prefetch abort", "data abort", "address exception", "interrupt" };
+void *vectors_page;
+
#ifdef CONFIG_DEBUG_USER
unsigned int user_debug;
@@ -745,7 +747,11 @@ void __init trap_init(void)
void __init early_trap_init(void)
{
+#if defined(CONFIG_CPU_USE_DOMAINS)
unsigned long vectors = CONFIG_VECTORS_BASE;
+#else
+ unsigned long vectors = (unsigned long)vectors_page;
+#endif
extern char __stubs_start[], __stubs_end[];
extern char __vectors_start[], __vectors_end[];
extern char __kuser_helper_start[], __kuser_helper_end[];
@@ -764,10 +770,10 @@ void __init early_trap_init(void)
* Copy signal return handlers into the vector page, and
* set sigreturn to be a pointer to these.
*/
- memcpy((void *)KERN_SIGRETURN_CODE, sigreturn_codes,
- sizeof(sigreturn_codes));
- memcpy((void *)KERN_RESTART_CODE, syscall_restart_code,
- sizeof(syscall_restart_code));
+ memcpy((void *)(vectors + KERN_SIGRETURN_CODE - CONFIG_VECTORS_BASE),
+ sigreturn_codes, sizeof(sigreturn_codes));
+ memcpy((void *)(vectors + KERN_RESTART_CODE - CONFIG_VECTORS_BASE),
+ syscall_restart_code, sizeof(syscall_restart_code));
flush_icache_range(vectors, vectors + PAGE_SIZE);
modify_domain(DOMAIN_USER, DOMAIN_CLIENT);
diff --git a/arch/arm/lib/getuser.S b/arch/arm/lib/getuser.S
index b1631a7..1b049cd 100644
--- a/arch/arm/lib/getuser.S
+++ b/arch/arm/lib/getuser.S
@@ -28,20 +28,21 @@
*/
#include <linux/linkage.h>
#include <asm/errno.h>
+#include <asm/domain.h>
ENTRY(__get_user_1)
-1: ldrbt r2, [r0]
+1: T(ldrb) r2, [r0]
mov r0, #0
mov pc, lr
ENDPROC(__get_user_1)
ENTRY(__get_user_2)
#ifdef CONFIG_THUMB2_KERNEL
-2: ldrbt r2, [r0]
-3: ldrbt r3, [r0, #1]
+2: T(ldrb) r2, [r0]
+3: T(ldrb) r3, [r0, #1]
#else
-2: ldrbt r2, [r0], #1
-3: ldrbt r3, [r0]
+2: T(ldrb) r2, [r0], #1
+3: T(ldrb) r3, [r0]
#endif
#ifndef __ARMEB__
orr r2, r2, r3, lsl #8
@@ -53,7 +54,7 @@ ENTRY(__get_user_2)
ENDPROC(__get_user_2)
ENTRY(__get_user_4)
-4: ldrt r2, [r0]
+4: T(ldr) r2, [r0]
mov r0, #0
mov pc, lr
ENDPROC(__get_user_4)
diff --git a/arch/arm/lib/putuser.S b/arch/arm/lib/putuser.S
index 5a01a23..c023fc1 100644
--- a/arch/arm/lib/putuser.S
+++ b/arch/arm/lib/putuser.S
@@ -28,9 +28,10 @@
*/
#include <linux/linkage.h>
#include <asm/errno.h>
+#include <asm/domain.h>
ENTRY(__put_user_1)
-1: strbt r2, [r0]
+1: T(strb) r2, [r0]
mov r0, #0
mov pc, lr
ENDPROC(__put_user_1)
@@ -39,19 +40,19 @@ ENTRY(__put_user_2)
mov ip, r2, lsr #8
#ifdef CONFIG_THUMB2_KERNEL
#ifndef __ARMEB__
-2: strbt r2, [r0]
-3: strbt ip, [r0, #1]
+2: T(strb) r2, [r0]
+3: T(strb) ip, [r0, #1]
#else
-2: strbt ip, [r0]
-3: strbt r2, [r0, #1]
+2: T(strb) ip, [r0]
+3: T(strb) r2, [r0, #1]
#endif
#else /* !CONFIG_THUMB2_KERNEL */
#ifndef __ARMEB__
-2: strbt r2, [r0], #1
-3: strbt ip, [r0]
+2: T(strb) r2, [r0], #1
+3: T(strb) ip, [r0]
#else
-2: strbt ip, [r0], #1
-3: strbt r2, [r0]
+2: T(strb) ip, [r0], #1
+3: T(strb) r2, [r0]
#endif
#endif /* CONFIG_THUMB2_KERNEL */
mov r0, #0
@@ -59,18 +60,18 @@ ENTRY(__put_user_2)
ENDPROC(__put_user_2)
ENTRY(__put_user_4)
-4: strt r2, [r0]
+4: T(str) r2, [r0]
mov r0, #0
mov pc, lr
ENDPROC(__put_user_4)
ENTRY(__put_user_8)
#ifdef CONFIG_THUMB2_KERNEL
-5: strt r2, [r0]
-6: strt r3, [r0, #4]
+5: T(str) r2, [r0]
+6: T(str) r3, [r0, #4]
#else
-5: strt r2, [r0], #4
-6: strt r3, [r0]
+5: T(str) r2, [r0], #4
+6: T(str) r3, [r0]
#endif
mov r0, #0
mov pc, lr
diff --git a/arch/arm/lib/uaccess.S b/arch/arm/lib/uaccess.S
index fee9f6f..d0ece2a 100644
--- a/arch/arm/lib/uaccess.S
+++ b/arch/arm/lib/uaccess.S
@@ -14,6 +14,7 @@
#include <linux/linkage.h>
#include <asm/assembler.h>
#include <asm/errno.h>
+#include <asm/domain.h>
.text
@@ -31,11 +32,11 @@
rsb ip, ip, #4
cmp ip, #2
ldrb r3, [r1], #1
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
ldrgeb r3, [r1], #1
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
ldrgtb r3, [r1], #1
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
sub r2, r2, ip
b .Lc2u_dest_aligned
@@ -58,7 +59,7 @@ ENTRY(__copy_to_user)
addmi ip, r2, #4
bmi .Lc2u_0nowords
ldr r3, [r1], #4
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
mov ip, r0, lsl #32 - PAGE_SHIFT @ On each page, use a ld/st??t instruction
rsb ip, ip, #0
movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -87,18 +88,18 @@ USER( strt r3, [r0], #4) @ May fault
stmneia r0!, {r3 - r4} @ Shouldnt fault
tst ip, #4
ldrne r3, [r1], #4
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
ands ip, ip, #3
beq .Lc2u_0fupi
.Lc2u_0nowords: teq ip, #0
beq .Lc2u_finished
.Lc2u_nowords: cmp ip, #2
ldrb r3, [r1], #1
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
ldrgeb r3, [r1], #1
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
ldrgtb r3, [r1], #1
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
b .Lc2u_finished
.Lc2u_not_enough:
@@ -119,7 +120,7 @@ USER( strgtbt r3, [r0], #1) @ May fault
mov r3, r7, pull #8
ldr r7, [r1], #4
orr r3, r3, r7, push #24
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
mov ip, r0, lsl #32 - PAGE_SHIFT
rsb ip, ip, #0
movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -154,18 +155,18 @@ USER( strt r3, [r0], #4) @ May fault
movne r3, r7, pull #8
ldrne r7, [r1], #4
orrne r3, r3, r7, push #24
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
ands ip, ip, #3
beq .Lc2u_1fupi
.Lc2u_1nowords: mov r3, r7, get_byte_1
teq ip, #0
beq .Lc2u_finished
cmp ip, #2
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
movge r3, r7, get_byte_2
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
movgt r3, r7, get_byte_3
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
b .Lc2u_finished
.Lc2u_2fupi: subs r2, r2, #4
@@ -174,7 +175,7 @@ USER( strgtbt r3, [r0], #1) @ May fault
mov r3, r7, pull #16
ldr r7, [r1], #4
orr r3, r3, r7, push #16
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
mov ip, r0, lsl #32 - PAGE_SHIFT
rsb ip, ip, #0
movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -209,18 +210,18 @@ USER( strt r3, [r0], #4) @ May fault
movne r3, r7, pull #16
ldrne r7, [r1], #4
orrne r3, r3, r7, push #16
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
ands ip, ip, #3
beq .Lc2u_2fupi
.Lc2u_2nowords: mov r3, r7, get_byte_2
teq ip, #0
beq .Lc2u_finished
cmp ip, #2
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
movge r3, r7, get_byte_3
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
ldrgtb r3, [r1], #0
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
b .Lc2u_finished
.Lc2u_3fupi: subs r2, r2, #4
@@ -229,7 +230,7 @@ USER( strgtbt r3, [r0], #1) @ May fault
mov r3, r7, pull #24
ldr r7, [r1], #4
orr r3, r3, r7, push #8
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
mov ip, r0, lsl #32 - PAGE_SHIFT
rsb ip, ip, #0
movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -264,18 +265,18 @@ USER( strt r3, [r0], #4) @ May fault
movne r3, r7, pull #24
ldrne r7, [r1], #4
orrne r3, r3, r7, push #8
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
ands ip, ip, #3
beq .Lc2u_3fupi
.Lc2u_3nowords: mov r3, r7, get_byte_3
teq ip, #0
beq .Lc2u_finished
cmp ip, #2
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
ldrgeb r3, [r1], #1
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
ldrgtb r3, [r1], #0
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
b .Lc2u_finished
ENDPROC(__copy_to_user)
@@ -294,11 +295,11 @@ ENDPROC(__copy_to_user)
.Lcfu_dest_not_aligned:
rsb ip, ip, #4
cmp ip, #2
-USER( ldrbt r3, [r1], #1) @ May fault
+USER( T(ldrb) r3, [r1], #1) @ May fault
strb r3, [r0], #1
-USER( ldrgebt r3, [r1], #1) @ May fault
+USER( T(ldrgeb) r3, [r1], #1) @ May fault
strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #1) @ May fault
+USER( T(ldrgtb) r3, [r1], #1) @ May fault
strgtb r3, [r0], #1
sub r2, r2, ip
b .Lcfu_dest_aligned
@@ -321,7 +322,7 @@ ENTRY(__copy_from_user)
.Lcfu_0fupi: subs r2, r2, #4
addmi ip, r2, #4
bmi .Lcfu_0nowords
-USER( ldrt r3, [r1], #4)
+USER( T(ldr) r3, [r1], #4)
str r3, [r0], #4
mov ip, r1, lsl #32 - PAGE_SHIFT @ On each page, use a ld/st??t instruction
rsb ip, ip, #0
@@ -350,18 +351,18 @@ USER( ldrt r3, [r1], #4)
ldmneia r1!, {r3 - r4} @ Shouldnt fault
stmneia r0!, {r3 - r4}
tst ip, #4
- ldrnet r3, [r1], #4 @ Shouldnt fault
+ T(ldrne) r3, [r1], #4 @ Shouldnt fault
strne r3, [r0], #4
ands ip, ip, #3
beq .Lcfu_0fupi
.Lcfu_0nowords: teq ip, #0
beq .Lcfu_finished
.Lcfu_nowords: cmp ip, #2
-USER( ldrbt r3, [r1], #1) @ May fault
+USER( T(ldrb) r3, [r1], #1) @ May fault
strb r3, [r0], #1
-USER( ldrgebt r3, [r1], #1) @ May fault
+USER( T(ldrgeb) r3, [r1], #1) @ May fault
strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #1) @ May fault
+USER( T(ldrgtb) r3, [r1], #1) @ May fault
strgtb r3, [r0], #1
b .Lcfu_finished
@@ -374,7 +375,7 @@ USER( ldrgtbt r3, [r1], #1) @ May fault
.Lcfu_src_not_aligned:
bic r1, r1, #3
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
cmp ip, #2
bgt .Lcfu_3fupi
beq .Lcfu_2fupi
@@ -382,7 +383,7 @@ USER( ldrt r7, [r1], #4) @ May fault
addmi ip, r2, #4
bmi .Lcfu_1nowords
mov r3, r7, pull #8
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
orr r3, r3, r7, push #24
str r3, [r0], #4
mov ip, r1, lsl #32 - PAGE_SHIFT
@@ -417,7 +418,7 @@ USER( ldrt r7, [r1], #4) @ May fault
stmneia r0!, {r3 - r4}
tst ip, #4
movne r3, r7, pull #8
-USER( ldrnet r7, [r1], #4) @ May fault
+USER( T(ldrne) r7, [r1], #4) @ May fault
orrne r3, r3, r7, push #24
strne r3, [r0], #4
ands ip, ip, #3
@@ -437,7 +438,7 @@ USER( ldrnet r7, [r1], #4) @ May fault
addmi ip, r2, #4
bmi .Lcfu_2nowords
mov r3, r7, pull #16
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
orr r3, r3, r7, push #16
str r3, [r0], #4
mov ip, r1, lsl #32 - PAGE_SHIFT
@@ -473,7 +474,7 @@ USER( ldrt r7, [r1], #4) @ May fault
stmneia r0!, {r3 - r4}
tst ip, #4
movne r3, r7, pull #16
-USER( ldrnet r7, [r1], #4) @ May fault
+USER( T(ldrne) r7, [r1], #4) @ May fault
orrne r3, r3, r7, push #16
strne r3, [r0], #4
ands ip, ip, #3
@@ -485,7 +486,7 @@ USER( ldrnet r7, [r1], #4) @ May fault
strb r3, [r0], #1
movge r3, r7, get_byte_3
strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #0) @ May fault
+USER( T(ldrgtb) r3, [r1], #0) @ May fault
strgtb r3, [r0], #1
b .Lcfu_finished
@@ -493,7 +494,7 @@ USER( ldrgtbt r3, [r1], #0) @ May fault
addmi ip, r2, #4
bmi .Lcfu_3nowords
mov r3, r7, pull #24
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
orr r3, r3, r7, push #8
str r3, [r0], #4
mov ip, r1, lsl #32 - PAGE_SHIFT
@@ -528,7 +529,7 @@ USER( ldrt r7, [r1], #4) @ May fault
stmneia r0!, {r3 - r4}
tst ip, #4
movne r3, r7, pull #24
-USER( ldrnet r7, [r1], #4) @ May fault
+USER( T(ldrne) r7, [r1], #4) @ May fault
orrne r3, r3, r7, push #8
strne r3, [r0], #4
ands ip, ip, #3
@@ -538,9 +539,9 @@ USER( ldrnet r7, [r1], #4) @ May fault
beq .Lcfu_finished
cmp ip, #2
strb r3, [r0], #1
-USER( ldrgebt r3, [r1], #1) @ May fault
+USER( T(ldrgeb) r3, [r1], #1) @ May fault
strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #1) @ May fault
+USER( T(ldrgtb) r3, [r1], #1) @ May fault
strgtb r3, [r0], #1
b .Lcfu_finished
ENDPROC(__copy_from_user)
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 346ae14..f33c422 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -599,6 +599,14 @@ config CPU_CP15_MPU
help
Processor has the CP15 register, which has MPU related registers.
+config CPU_USE_DOMAINS
+ bool
+ depends on MMU
+ default y if !HAS_TLS_REG
+ help
+ This option enables or disables the use of domain switching
+ via the set_fs() function.
+
#
# CPU supports 36-bit I/O
#
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 2858941..499e22d 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -25,6 +25,7 @@
#include <asm/smp_plat.h>
#include <asm/tlb.h>
#include <asm/highmem.h>
+#include <asm/traps.h>
#include <asm/mach/arch.h>
#include <asm/mach/map.h>
@@ -935,12 +936,11 @@ static void __init devicemaps_init(struct machine_desc *mdesc)
{
struct map_desc map;
unsigned long addr;
- void *vectors;
/*
* Allocate the vector page early.
*/
- vectors = alloc_bootmem_low_pages(PAGE_SIZE);
+ vectors_page = alloc_bootmem_low_pages(PAGE_SIZE);
for (addr = VMALLOC_END; addr; addr += PGDIR_SIZE)
pmd_clear(pmd_off_k(addr));
@@ -980,7 +980,7 @@ static void __init devicemaps_init(struct machine_desc *mdesc)
* location (0xffff0000). If we aren't using high-vectors, also
* create a mapping at the low-vectors virtual address.
*/
- map.pfn = __phys_to_pfn(virt_to_phys(vectors));
+ map.pfn = __phys_to_pfn(virt_to_phys(vectors_page));
map.virtual = 0xffff0000;
map.length = PAGE_SIZE;
map.type = MT_HIGH_VECTORS;
diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
index 7d63bea..337f102 100644
--- a/arch/arm/mm/proc-macros.S
+++ b/arch/arm/mm/proc-macros.S
@@ -99,6 +99,10 @@
* 110x 0 1 0 r/w r/o
* 11x0 0 1 0 r/w r/o
* 1111 0 1 1 r/w r/w
+ *
+ * If !CONFIG_CPU_USE_DOMAINS, the following permissions are changed:
+ * 110x 1 1 1 r/o r/o
+ * 11x0 1 1 1 r/o r/o
*/
.macro armv6_mt_table pfx
\pfx\()_mt_table:
@@ -138,8 +142,11 @@
tst r1, #L_PTE_USER
orrne r3, r3, #PTE_EXT_AP1
+#ifdef CONFIG_CPU_USE_DOMAINS
+ @ allow kernel read/write access to read-only user pages
tstne r3, #PTE_EXT_APX
bicne r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
+#endif
tst r1, #L_PTE_EXEC
orreq r3, r3, #PTE_EXT_XN
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 7aaf88a..c1c3fe0 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -152,8 +152,11 @@ ENTRY(cpu_v7_set_pte_ext)
tst r1, #L_PTE_USER
orrne r3, r3, #PTE_EXT_AP1
+#ifdef CONFIG_CPU_USE_DOMAINS
+ @ allow kernel read/write access to read-only user pages
tstne r3, #PTE_EXT_APX
bicne r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
+#endif
tst r1, #L_PTE_EXEC
orreq r3, r3, #PTE_EXT_XN
@@ -240,8 +243,6 @@ __v7_setup:
mcr p15, 0, r10, c2, c0, 2 @ TTB control register
orr r4, r4, #TTB_FLAGS
mcr p15, 0, r4, c2, c0, 1 @ load TTB1
- mov r10, #0x1f @ domains 0, 1 = manager
- mcr p15, 0, r10, c3, c0, 0 @ load domain access register
/*
* Memory region attributes with SCTLR.TRE=1
*
* [PATCH v4 2/4] ARM: Assume new page cache pages have dirty D-cache
2010-06-21 14:46 [PATCH v4 0/4] Patches for -next Catalin Marinas
2010-06-21 14:46 ` [PATCH v4 1/4] ARM: Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas
@ 2010-06-21 14:46 ` Catalin Marinas
2010-06-22 19:36 ` Rabin Vincent
2010-06-21 14:46 ` [PATCH v4 3/4] ARM: Synchronise the I and D caches via set_pte_at() on SMP systems Catalin Marinas
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Catalin Marinas @ 2010-06-21 14:46 UTC (permalink / raw)
To: linux-arm-kernel
There are places in Linux where writes to newly allocated page cache
pages happen without a subsequent call to flush_dcache_page() (several
PIO drivers including USB HCD). This patch changes the meaning of
PG_arch_1 to be PG_dcache_clean and always flush the D-cache for a newly
mapped page in update_mmu_cache().
The patch also sets the PG_arch_1 bit in the DMA cache maintenance
function to avoid additional cache flushing in update_mmu_cache().
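To summarise the semantic change (the snippet below is only extracted from
the hunks that follow, e.g. the copypage-v4mc.c one), the lazy-flush test
flips polarity:

	/* before: flush only if the page was explicitly marked dirty */
	if (test_and_clear_bit(PG_dcache_dirty, &from->flags))
		__flush_dcache_page(page_mapping(from), from);

	/* after: flush unless the page is already marked clean; a newly
	 * allocated page has PG_dcache_clean clear, so it is treated as
	 * dirty by default and flushed on first mapping */
	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
		__flush_dcache_page(page_mapping(from), from);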
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/cacheflush.h | 6 +++---
arch/arm/include/asm/tlbflush.h | 2 +-
arch/arm/mm/copypage-v4mc.c | 2 +-
arch/arm/mm/copypage-v6.c | 2 +-
arch/arm/mm/copypage-xscale.c | 2 +-
arch/arm/mm/dma-mapping.c | 6 ++++++
arch/arm/mm/fault-armv.c | 4 ++--
arch/arm/mm/flush.c | 3 ++-
8 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 4656a24..d3730f0 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -137,10 +137,10 @@
#endif
/*
- * This flag is used to indicate that the page pointed to by a pte
- * is dirty and requires cleaning before returning it to the user.
+ * This flag is used to indicate that the page pointed to by a pte is clean
+ * and does not require cleaning before returning it to the user.
*/
-#define PG_dcache_dirty PG_arch_1
+#define PG_dcache_clean PG_arch_1
/*
* MM Cache Management
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index bd863d8..40a7092 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -552,7 +552,7 @@ extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
#endif
/*
- * if PG_dcache_dirty is set for the page, we need to ensure that any
+ * If PG_dcache_clean is not set for the page, we need to ensure that any
* cache entries for the kernels virtual memory range are written
* back to the page.
*/
diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
index 598c51a..b806151 100644
--- a/arch/arm/mm/copypage-v4mc.c
+++ b/arch/arm/mm/copypage-v4mc.c
@@ -73,7 +73,7 @@ void v4_mc_copy_user_highpage(struct page *to, struct page *from,
{
void *kto = kmap_atomic(to, KM_USER1);
- if (test_and_clear_bit(PG_dcache_dirty, &from->flags))
+ if (!test_and_set_bit(PG_dcache_clean, &from->flags))
__flush_dcache_page(page_mapping(from), from);
spin_lock(&minicache_lock);
diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
index f55fa10..bdba6c6 100644
--- a/arch/arm/mm/copypage-v6.c
+++ b/arch/arm/mm/copypage-v6.c
@@ -79,7 +79,7 @@ static void v6_copy_user_highpage_aliasing(struct page *to,
unsigned int offset = CACHE_COLOUR(vaddr);
unsigned long kfrom, kto;
- if (test_and_clear_bit(PG_dcache_dirty, &from->flags))
+ if (!test_and_set_bit(PG_dcache_clean, &from->flags))
__flush_dcache_page(page_mapping(from), from);
/* FIXME: not highmem safe */
diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
index 9920c0a..649bbcd 100644
--- a/arch/arm/mm/copypage-xscale.c
+++ b/arch/arm/mm/copypage-xscale.c
@@ -95,7 +95,7 @@ void xscale_mc_copy_user_highpage(struct page *to, struct page *from,
{
void *kto = kmap_atomic(to, KM_USER1);
- if (test_and_clear_bit(PG_dcache_dirty, &from->flags))
+ if (!test_and_set_bit(PG_dcache_clean, &from->flags))
__flush_dcache_page(page_mapping(from), from);
spin_lock(&minicache_lock);
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 13fa536..bb9b612 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -508,6 +508,12 @@ void ___dma_page_dev_to_cpu(struct page *page, unsigned long off,
outer_inv_range(paddr, paddr + size);
dma_cache_maint_page(page, off, size, dir, dmac_unmap_area);
+
+ /*
+ * Mark the D-cache clean for this page to avoid extra flushing.
+ */
+ if (dir != DMA_TO_DEVICE)
+ set_bit(PG_dcache_clean, &page->flags);
}
EXPORT_SYMBOL(___dma_page_dev_to_cpu);
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 9b906de..58846cb 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -141,7 +141,7 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
* a page table, or changing an existing PTE. Basically, there are two
* things that we need to take care of:
*
- * 1. If PG_dcache_dirty is set for the page, we need to ensure
+ * 1. If PG_dcache_clean is not set for the page, we need to ensure
* that any cache entries for the kernels virtual memory
* range are written back to the page.
* 2. If we have multiple shared mappings of the same space in
@@ -169,7 +169,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
mapping = page_mapping(page);
#ifndef CONFIG_SMP
- if (test_and_clear_bit(PG_dcache_dirty, &page->flags))
+ if (!test_and_set_bit(PG_dcache_clean, &page->flags))
__flush_dcache_page(mapping, page);
#endif
if (mapping) {
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index c6844cb..9f9919b 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -248,7 +248,7 @@ void flush_dcache_page(struct page *page)
#ifndef CONFIG_SMP
if (!PageHighMem(page) && mapping && !mapping_mapped(mapping))
- set_bit(PG_dcache_dirty, &page->flags);
+ clear_bit(PG_dcache_clean, &page->flags);
else
#endif
{
@@ -257,6 +257,7 @@ void flush_dcache_page(struct page *page)
__flush_dcache_aliases(mapping, page);
else if (mapping)
__flush_icache_all();
+ set_bit(PG_dcache_clean, &page->flags);
}
}
EXPORT_SYMBOL(flush_dcache_page);
* [PATCH v4 3/4] ARM: Synchronise the I and D caches via set_pte_at() on SMP systems
2010-06-21 14:46 [PATCH v4 0/4] Patches for -next Catalin Marinas
2010-06-21 14:46 ` [PATCH v4 1/4] ARM: Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas
2010-06-21 14:46 ` [PATCH v4 2/4] ARM: Assume new page cache pages have dirty D-cache Catalin Marinas
@ 2010-06-21 14:46 ` Catalin Marinas
2010-06-21 14:46 ` [PATCH v4 4/4] ARM: Use lazy cache flushing on ARMv7 " Catalin Marinas
2010-06-22 10:29 ` [PATCH v4 0/4] Patches for -next Rabin Vincent
4 siblings, 0 replies; 11+ messages in thread
From: Catalin Marinas @ 2010-06-21 14:46 UTC (permalink / raw)
To: linux-arm-kernel
On SMP systems, there is a small chance of a PTE becoming visible to a
different CPU before the cache maintenance operations in
update_mmu_cache(). This patch follows the IA-64 and PowerPC approach of
synchronising the I and D caches via the set_pte_at() function. In this
case, there is no need for update_mmu_cache() to be implemented since
lazy cache flushing is already handled by the time this function is
called.
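The ordering constraint can be sketched as below (illustration only; the
helper name is made up, the real code is in the pgtable.h hunk):

	/* User mappings must be made coherent *before* the PTE becomes
	 * visible, since another CPU may execute from the new mapping as
	 * soon as set_pte_ext() completes. */
	static inline void set_user_pte_sketch(pte_t *ptep, pte_t pteval)
	{
		__sync_icache_dcache(pteval);		/* 1. clean D, invalidate I */
		set_pte_ext(ptep, pteval, PTE_EXT_NG);	/* 2. publish the mapping */
		/* swapping 1 and 2 would reopen the window this patch closes */
	}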
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/pgtable.h | 25 ++++++++++++++++++++++---
arch/arm/include/asm/tlbflush.h | 10 +++++++++-
arch/arm/mm/fault-armv.c | 4 ++--
arch/arm/mm/flush.c | 15 +++++++++++++++
4 files changed, 48 insertions(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index ab68cf1..a3752f5 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -278,9 +278,24 @@ extern struct page *empty_zero_page;
#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
-#define set_pte_at(mm,addr,ptep,pteval) do { \
- set_pte_ext(ptep, pteval, (addr) >= TASK_SIZE ? 0 : PTE_EXT_NG); \
- } while (0)
+#ifndef CONFIG_SMP
+static inline void __sync_icache_dcache(pte_t pteval)
+{
+}
+#else
+extern void __sync_icache_dcache(pte_t pteval);
+#endif
+
+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pteval)
+{
+ if (addr >= TASK_SIZE)
+ set_pte_ext(ptep, pteval, 0);
+ else {
+ __sync_icache_dcache(pteval);
+ set_pte_ext(ptep, pteval, PTE_EXT_NG);
+ }
+}
/*
* The following only work if pte_present() is true.
@@ -292,6 +307,10 @@ extern struct page *empty_zero_page;
#define pte_young(pte) (pte_val(pte) & L_PTE_YOUNG)
#define pte_special(pte) (0)
+#define pte_present_exec_user(pte) \
+ ((pte_val(pte) & (L_PTE_PRESENT | L_PTE_EXEC | L_PTE_USER)) == \
+ (L_PTE_PRESENT | L_PTE_EXEC | L_PTE_USER))
+
#define PTE_BIT_FUNC(fn,op) \
static inline pte_t pte_##fn(pte_t pte) { pte_val(pte) op; return pte; }
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 40a7092..6aabedd 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -554,10 +554,18 @@ extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
/*
* If PG_dcache_clean is not set for the page, we need to ensure that any
* cache entries for the kernels virtual memory range are written
- * back to the page.
+ * back to the page. On SMP systems, the cache coherency is handled in the
+ * set_pte_at() function.
*/
+#ifndef CONFIG_SMP
extern void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
pte_t *ptep);
+#else
+static inline void update_mmu_cache(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+}
+#endif
#endif
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 58846cb..030f1da 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -28,6 +28,7 @@
static unsigned long shared_pte_mask = L_PTE_MT_BUFFERABLE;
+#ifndef CONFIG_SMP
/*
* We take the easy way out of this problem - we make the
* PTE uncacheable. However, we leave the write buffer on.
@@ -168,10 +169,8 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
return;
mapping = page_mapping(page);
-#ifndef CONFIG_SMP
if (!test_and_set_bit(PG_dcache_clean, &page->flags))
__flush_dcache_page(mapping, page);
-#endif
if (mapping) {
if (cache_is_vivt())
make_coherent(mapping, vma, addr, ptep, pfn);
@@ -179,6 +178,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
__flush_icache_all();
}
}
+#endif /* !CONFIG_SMP */
/*
* Check whether the write buffer has physical address aliasing
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 9f9919b..18a8467 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -215,6 +215,21 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
flush_dcache_mmap_unlock(mapping);
}
+#ifdef CONFIG_SMP
+void __sync_icache_dcache(pte_t pteval)
+{
+ unsigned long pfn = pte_pfn(pteval);
+
+ if (pfn_valid(pfn) && pte_present_exec_user(pteval)) {
+ struct page *page = pfn_to_page(pfn);
+
+ if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+ __flush_dcache_page(NULL, page);
+ __flush_icache_all();
+ }
+}
+#endif
+
/*
* Ensure cache coherency between kernel mapping and userspace mapping
* of this page.
* [PATCH v4 4/4] ARM: Use lazy cache flushing on ARMv7 SMP systems
2010-06-21 14:46 [PATCH v4 0/4] Patches for -next Catalin Marinas
` (2 preceding siblings ...)
2010-06-21 14:46 ` [PATCH v4 3/4] ARM: Synchronise the I and D caches via set_pte_at() on SMP systems Catalin Marinas
@ 2010-06-21 14:46 ` Catalin Marinas
2010-06-22 10:29 ` [PATCH v4 0/4] Patches for -next Rabin Vincent
4 siblings, 0 replies; 11+ messages in thread
From: Catalin Marinas @ 2010-06-21 14:46 UTC (permalink / raw)
To: linux-arm-kernel
ARMv7 processors like Cortex-A9 broadcast the cache maintenance
operations in hardware. This patch allows the
flush_dcache_page/update_mmu_cache pair to work in lazy flushing mode
similar to the UP case.
Note that cache flushing on SMP systems now takes place via the
set_pte_at() call (__sync_icache_dcache) and there is no race with other
CPUs executing code from the new PTE before the cache flushing took
place.
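As an illustration (simplified from the flush.c hunk below), on ARMv7 SMP
cache_ops_need_broadcast() is now the compile-time constant 0, so
flush_dcache_page() reduces to the same lazy test used on UP:

	if (!cache_ops_need_broadcast() &&	/* constant 0 on ARMv7 SMP */
	    !PageHighMem(page) && mapping && !mapping_mapped(mapping))
		clear_bit(PG_dcache_clean, &page->flags);	/* defer flushing */
	else {
		/* immediate flush; the real branch (see the hunk below) also
		 * handles VIVT aliases and the I-cache */
		__flush_dcache_page(mapping, page);
	}

On ARMv6K SMP (e.g. ARM11MPCore) the MMFR3 check is still performed at run
time and the immediate flush is kept, as before.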
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/smp_plat.h | 4 ++++
arch/arm/mm/flush.c | 13 ++++---------
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h
index e621530..963a338 100644
--- a/arch/arm/include/asm/smp_plat.h
+++ b/arch/arm/include/asm/smp_plat.h
@@ -13,9 +13,13 @@ static inline int tlb_ops_need_broadcast(void)
return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 2;
}
+#if !defined(CONFIG_SMP) || __LINUX_ARM_ARCH__ >= 7
+#define cache_ops_need_broadcast() 0
+#else
static inline int cache_ops_need_broadcast(void)
{
return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 1;
}
+#endif
#endif
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 18a8467..2d08a5e 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -17,6 +17,7 @@
#include <asm/smp_plat.h>
#include <asm/system.h>
#include <asm/tlbflush.h>
+#include <asm/smp_plat.h>
#include "mm.h"
@@ -93,12 +94,10 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsig
#define flush_pfn_alias(pfn,vaddr) do { } while (0)
#endif
-#ifdef CONFIG_SMP
static void flush_ptrace_access_other(void *args)
{
__flush_icache_all();
}
-#endif
static
void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
@@ -122,11 +121,9 @@ void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
if (vma->vm_flags & VM_EXEC) {
unsigned long addr = (unsigned long)kaddr;
__cpuc_coherent_kern_range(addr, addr + len);
-#ifdef CONFIG_SMP
if (cache_ops_need_broadcast())
smp_call_function(flush_ptrace_access_other,
NULL, 1);
-#endif
}
}
@@ -261,12 +258,10 @@ void flush_dcache_page(struct page *page)
mapping = page_mapping(page);
-#ifndef CONFIG_SMP
- if (!PageHighMem(page) && mapping && !mapping_mapped(mapping))
+ if (!cache_ops_need_broadcast() &&
+ !PageHighMem(page) && mapping && !mapping_mapped(mapping))
clear_bit(PG_dcache_clean, &page->flags);
- else
-#endif
- {
+ else {
__flush_dcache_page(mapping, page);
if (mapping && cache_is_vivt())
__flush_dcache_aliases(mapping, page);
* [PATCH v4 0/4] Patches for -next
2010-06-21 14:46 [PATCH v4 0/4] Patches for -next Catalin Marinas
` (3 preceding siblings ...)
2010-06-21 14:46 ` [PATCH v4 4/4] ARM: Use lazy cache flushing on ARMv7 " Catalin Marinas
@ 2010-06-22 10:29 ` Rabin Vincent
2010-06-22 11:34 ` Catalin Marinas
4 siblings, 1 reply; 11+ messages in thread
From: Rabin Vincent @ 2010-06-22 10:29 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jun 21, 2010 at 8:16 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> Pretty much the same content as v3 but I reordered the last three
> patches to make sure that switching to lazy cache flushing on ARMv7 SMP
> doesn't make the set_pte/cache flushing race more visible (issue handled
> by the set_pte_at patch).
>
>
> Catalin Marinas (4):
>      ARM: Remove the domain switching on ARMv6k/v7 CPUs
>      ARM: Assume new page cache pages have dirty D-cache
>      ARM: Synchronise the I and D caches via set_pte_at() on SMP systems
>      ARM: Use lazy cache flushing on ARMv7 SMP systems
I've tried patches 2, 3, and 4 on a Cortex-A9 SMP system and it resolves the
MMC rootfs init crash without the need for the flush_kernel_dcache_page()
change.
Tested-by: Rabin Vincent <rabin.vincent@stericsson.com>
Rabin
P.S.: Patch 1 (at least the version on your proposed branch), doesn't seem to
apply cleanly on linux-next.
* [PATCH v4 0/4] Patches for -next
2010-06-22 10:29 ` [PATCH v4 0/4] Patches for -next Rabin Vincent
@ 2010-06-22 11:34 ` Catalin Marinas
0 siblings, 0 replies; 11+ messages in thread
From: Catalin Marinas @ 2010-06-22 11:34 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 2010-06-22 at 11:29 +0100, Rabin Vincent wrote:
> On Mon, Jun 21, 2010 at 8:16 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > Pretty much the same content as v3 but I reordered the last three
> > patches to make sure that switching to lazy cache flushing on ARMv7 SMP
> > doesn't make the set_pte/cache flushing race more visible (issue handled
> > by the set_pte_at patch).
> >
> >
> > Catalin Marinas (4):
> > ARM: Remove the domain switching on ARMv6k/v7 CPUs
> > ARM: Assume new page cache pages have dirty D-cache
> > ARM: Synchronise the I and D caches via set_pte_at() on SMP systems
> > ARM: Use lazy cache flushing on ARMv7 SMP systems
>
> I've tried patches 2, 3, and 4 on a Cortex-A9 SMP system and it resolves the
> MMC rootfs init crash without the need for the flush_kernel_dcache_page()
> change.
>
> Tested-by: Rabin Vincent <rabin.vincent@stericsson.com>
Thanks.
> P.S.: Patch 1 (at least the version on your proposed branch), doesn't seem to
> apply cleanly on linux-next.
It's a bit more intrusive and I'd like to push it to linux-next (but
still waiting for review on this list).
--
Catalin
* [PATCH v4 1/4] ARM: Remove the domain switching on ARMv6k/v7 CPUs
2010-06-21 14:46 ` [PATCH v4 1/4] ARM: Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas
@ 2010-06-22 12:47 ` Anton Vorontsov
2010-06-22 13:01 ` Catalin Marinas
0 siblings, 1 reply; 11+ messages in thread
From: Anton Vorontsov @ 2010-06-22 12:47 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jun 21, 2010 at 03:46:26PM +0100, Catalin Marinas wrote:
> This patch removes the domain switching functionality via the set_fs and
> __switch_to functions on cores that have a TLS register.
>
> Currently, the ioremap and vmalloc areas share the same level 1 page
> tables and therefore have the same domain (DOMAIN_KERNEL). When the
> kernel domain is modified from Client to Manager (via the __set_fs or in
> the __switch_to function), the XN (eXecute Never) bit is overridden and
> newer CPUs can speculatively prefetch the ioremap'ed memory.
>
> Linux performs the kernel domain switching to allow user-specific
> functions (copy_to/from_user, get/put_user etc.) to access kernel
> memory. In order for these functions to work with the kernel domain set
> to Client, the patch modifies the LDRT/STRT and related instructions to
> the LDR/STR ones.
>
> The user pages access rights are also modified for kernel read-only
> access rather than read/write so that the copy-on-write mechanism still
> works. CPU_USE_DOMAINS gets disabled only if HAS_TLS_REG is defined
> since writing the TLS value to the high vectors page isn't possible.
>
> The user addresses passed to the kernel are checked by the access_ok()
> function so that they do not point to the kernel space.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
I tested this on ARMv6K (ARM11 MPcore) and ARMv7 (Cortex-A9), and
didn't notice any issues. This is also needed for robust mutexes
support... so, if that helps,
Tested-by: Anton Vorontsov <cbouatmailru@gmail.com>
Thanks!
* [PATCH v4 1/4] ARM: Remove the domain switching on ARMv6k/v7 CPUs
2010-06-22 12:47 ` Anton Vorontsov
@ 2010-06-22 13:01 ` Catalin Marinas
0 siblings, 0 replies; 11+ messages in thread
From: Catalin Marinas @ 2010-06-22 13:01 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 2010-06-22 at 13:47 +0100, Anton Vorontsov wrote:
> On Mon, Jun 21, 2010 at 03:46:26PM +0100, Catalin Marinas wrote:
> > This patch removes the domain switching functionality via the set_fs and
> > __switch_to functions on cores that have a TLS register.
> >
> > Currently, the ioremap and vmalloc areas share the same level 1 page
> > tables and therefore have the same domain (DOMAIN_KERNEL). When the
> > kernel domain is modified from Client to Manager (via the __set_fs or in
> > the __switch_to function), the XN (eXecute Never) bit is overridden and
> > newer CPUs can speculatively prefetch the ioremap'ed memory.
> >
> > Linux performs the kernel domain switching to allow user-specific
> > functions (copy_to/from_user, get/put_user etc.) to access kernel
> > memory. In order for these functions to work with the kernel domain set
> > to Client, the patch modifies the LDRT/STRT and related instructions to
> > the LDR/STR ones.
> >
> > The user pages access rights are also modified for kernel read-only
> > access rather than read/write so that the copy-on-write mechanism still
> > works. CPU_USE_DOMAINS gets disabled only if HAS_TLS_REG is defined
> > since writing the TLS value to the high vectors page isn't possible.
> >
> > The user addresses passed to the kernel are checked by the access_ok()
> > function so that they do not point to the kernel space.
> >
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>
> I tested this on ARMv6K (ARM11 MPcore) and ARMv7 (Cortex-A9), and
> didn't notice any issues. This is also needed for robust mutextes
> support... so, if that helps,
>
> Tested-by: Anton Vorontsov <cbouatmailru@gmail.com>
Thanks.
--
Catalin
* [PATCH v4 2/4] ARM: Assume new page cache pages have dirty D-cache
2010-06-21 14:46 ` [PATCH v4 2/4] ARM: Assume new page cache pages have dirty D-cache Catalin Marinas
@ 2010-06-22 19:36 ` Rabin Vincent
2010-06-22 22:39 ` Catalin Marinas
0 siblings, 1 reply; 11+ messages in thread
From: Rabin Vincent @ 2010-06-22 19:36 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jun 21, 2010 at 03:46:32PM +0100, Catalin Marinas wrote:
> There are places in Linux where writes to newly allocated page cache
> pages happen without a subsequent call to flush_dcache_page() (several
> PIO drivers including USB HCD). This patch changes the meaning of
> PG_arch_1 to be PG_dcache_clean and always flush the D-cache for a newly
> mapped page in update_mmu_cache().
Correct me if I'm misreading the code, but don't this patch and the next
one make the assumption that CONFIG_SMP == VIPT non-aliasing (or PIPT)
caches? This patch does not add flushing on SMP systems, and the next
one handles the I$-D$ coherency issues there (ignoring the set_pte race
fix for a moment). Won't the flushing added in this patch be
unnecessary on non-SMP PIPT systems, since they too only need the
exec-related flushing?
Rabin
* [PATCH v4 2/4] ARM: Assume new page cache pages have dirty D-cache
2010-06-22 19:36 ` Rabin Vincent
@ 2010-06-22 22:39 ` Catalin Marinas
0 siblings, 0 replies; 11+ messages in thread
From: Catalin Marinas @ 2010-06-22 22:39 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 2010-06-22 at 20:36 +0100, Rabin Vincent wrote:
> On Mon, Jun 21, 2010 at 03:46:32PM +0100, Catalin Marinas wrote:
> > There are places in Linux where writes to newly allocated page cache
> > pages happen without a subsequent call to flush_dcache_page() (several
> > PIO drivers including USB HCD). This patch changes the meaning of
> > PG_arch_1 to be PG_dcache_clean and always flush the D-cache for a newly
> > mapped page in update_mmu_cache().
>
> Correct me if I'm misreading the code, but don't this patch and the next
> one make the assumption that CONFIG_SMP == VIPT non-aliasing (or PIPT)
> caches?
Yes.
> This patch does not add flushing on SMP systems, and the next
> one handles the I$-D$ coherency issues there (ignoring the set_pte race
> fix for a moment).
The aim of this patch isn't to add any new flushing; it just changes the
default assumption about the D-cache state of new pages (from clean to
dirty).
The following patch (sync I-D caches) is aimed at sorting out a race.
This patch indeed adds a __flush_dcache_page() call if the page is dirty,
but with well-behaved drivers (which call flush_dcache_page()) the
PG_dcache_clean bit should already be set by the time set_pte_at() is
reached.
On ARM11MPCore, the cache operations are not broadcast in hardware, so
flush_dcache_page() must be called on the CPU that dirtied the cache. So
a driver that doesn't do this could lead to errors in user space on this
processor (when running with more than one CPU).
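(For reference, the "well-behaved driver" pattern looks roughly like the
sketch below; the function and FIFO register are hypothetical, the point is
only the final flush_dcache_page() call on the CPU that did the PIO writes.)

	/* Hypothetical PIO driver, 2010-era kmap_atomic() API */
	static void pio_read_page(struct page *page, void __iomem *fifo)
	{
		u32 *buf = kmap_atomic(page, KM_USER0);
		int i;

		for (i = 0; i < PAGE_SIZE / 4; i++)
			buf[i] = readl(fifo);	/* dirties the local D-cache */

		kunmap_atomic(buf, KM_USER0);
		flush_dcache_page(page);	/* must run on this same CPU on
						 * ARM11MPCore (no hw broadcast) */
	}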
Cortex-A9 broadcasts the cache ops in hardware, so even if you don't do
it via the driver and flush_dcache_page, you catch the dirty page later
in set_pte_at().
So the sync I-D cache patch fixes both the race and the handling of
poorly written drivers (the latter only on Cortex-A9). With all this in
place, the final patch in my series allows Cortex-A9 to use lazy cache
flushing as on UP systems (the flush_dcache_page/update_mmu_cache pair).
BTW, I have a patch for ARM11MPCore which allows lazy cache flushing as
well (by performing a read-for-ownership before flushing the D-cache).
However, I can't guarantee that it works on anything other than
ARM11MPCore as it is too specific to the hardware implementation.
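(A loose sketch of the read-for-ownership idea, not the actual patch;
whether a plain load is sufficient to migrate dirty lines depends on the
SCU behaviour, which is exactly why it is ARM11MPCore-specific.)

	/* Touch one word per cache line of the kernel mapping so that lines
	 * dirtied on another CPU migrate into the local D-cache; a local
	 * clean then covers them despite the lack of hardware broadcasting. */
	static void read_for_ownership_sketch(void *kaddr)
	{
		unsigned long addr = (unsigned long)kaddr;
		unsigned long end = addr + PAGE_SIZE;

		for (; addr < end; addr += L1_CACHE_BYTES)
			(void)*(volatile unsigned char *)addr;
	}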
> Won't the flushing added in this patch be
> unnecessary on non-SMP PIPT systems, since they too only need the
> exec-related flushing?
Yes, this can be optimised. We could extend the __sync_icache_dcache()
patch to all the ARMv7 (UP) and SMP configurations (just a matter of
#ifdef's) where we know for sure that the hardware is VIPT non-aliasing.
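A sketch of the direction (illustration only, not a tested patch): key the
__sync_icache_dcache() declaration off "VIPT non-aliasing hardware" rather
than CONFIG_SMP alone, e.g. in asm/pgtable.h:

	#if defined(CONFIG_SMP) || __LINUX_ARM_ARCH__ >= 7
	extern void __sync_icache_dcache(pte_t pteval);
	#else
	static inline void __sync_icache_dcache(pte_t pteval)
	{
	}
	#endif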
I'll have a look tomorrow and post an update to this patch.
Thanks.
--
Catalin