* [PATCH v4 0/4] Patches for -next
From: Catalin Marinas @ 2010-06-21 14:46 UTC
To: linux-arm-kernel

Pretty much the same content as v3 but I reordered the last three
patches to make sure that switching to lazy cache flushing on ARMv7 SMP
doesn't make the set_pte/cache flushing race more visible (issue handled
by the set_pte_at patch).
Catalin Marinas (4):
ARM: Remove the domain switching on ARMv6k/v7 CPUs
ARM: Assume new page cache pages have dirty D-cache
ARM: Synchronise the I and D caches via set_pte_at() on SMP systems
ARM: Use lazy cache flushing on ARMv7 SMP systems
arch/arm/include/asm/assembler.h | 13 +++---
arch/arm/include/asm/cacheflush.h | 6 +--
arch/arm/include/asm/domain.h | 31 +++++++++++++-
arch/arm/include/asm/futex.h | 9 ++--
arch/arm/include/asm/pgtable.h | 25 ++++++++++-
arch/arm/include/asm/smp_plat.h | 4 ++
arch/arm/include/asm/tlbflush.h | 12 ++++-
arch/arm/include/asm/traps.h | 2 +
arch/arm/include/asm/uaccess.h | 16 ++++---
arch/arm/kernel/entry-armv.S | 4 +-
arch/arm/kernel/fiq.c | 5 ++
arch/arm/kernel/traps.c | 14 ++++--
arch/arm/lib/getuser.S | 13 +++---
arch/arm/lib/putuser.S | 29 +++++++------
arch/arm/lib/uaccess.S | 83 +++++++++++++++++++------------------
arch/arm/mm/Kconfig | 8 ++++
arch/arm/mm/copypage-v4mc.c | 2 +-
arch/arm/mm/copypage-v6.c | 2 +-
arch/arm/mm/copypage-xscale.c | 2 +-
arch/arm/mm/dma-mapping.c | 6 +++
arch/arm/mm/fault-armv.c | 8 ++--
arch/arm/mm/flush.c | 31 +++++++++-----
arch/arm/mm/mmu.c | 6 +--
arch/arm/mm/proc-macros.S | 7 +++
arch/arm/mm/proc-v7.S | 5 +-
25 files changed, 226 insertions(+), 117 deletions(-)
--
Catalin
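
The reordering is easiest to see with a toy model. The C program below is
not kernel code: the two atomics merely stand in for "PTE published" and
"cache maintenance done", and the function names are made up. It shows why
publishing the PTE before the flush (the old update_mmu_cache() order)
lets another CPU observe the mapping while the D-cache is still dirty,
which is the window the set_pte_at() patch closes by flushing first.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int pte;          /* 0 = not mapped, 1 = mapped */
static atomic_int dcache_clean; /* 1 = maintenance performed */

static void *map_page(void *arg)
{
        (void)arg;
        /* Racy order: publish the PTE first, flush afterwards. */
        atomic_store(&pte, 1);
        atomic_store(&dcache_clean, 1);
        return NULL;
}

static void *other_cpu(void *arg)
{
        (void)arg;
        while (!atomic_load(&pte))
                ;       /* the other CPU sees the new mapping */
        if (!atomic_load(&dcache_clean))
                puts("window hit: mapping visible before the flush");
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t2, NULL, other_cpu, NULL);
        pthread_create(&t1, NULL, map_page, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
}

(Build with "cc -pthread"; the message only appears when the scheduler
happens to hit the window, which is the point: the race is rare but real.)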

* [PATCH v4 1/4] ARM: Remove the domain switching on ARMv6k/v7 CPUs

From: Catalin Marinas @ 2010-06-21 14:46 UTC
To: linux-arm-kernel

This patch removes the domain switching functionality via the set_fs and
__switch_to functions on cores that have a TLS register.

Currently, the ioremap and vmalloc areas share the same level 1 page
tables and therefore have the same domain (DOMAIN_KERNEL). When the
kernel domain is modified from Client to Manager (via the __set_fs or in
the __switch_to function), the XN (eXecute Never) bit is overridden and
newer CPUs can speculatively prefetch the ioremap'ed memory.

Linux performs the kernel domain switching to allow user-specific
functions (copy_to/from_user, get/put_user etc.) to access kernel
memory. In order for these functions to work with the kernel domain set
to Client, the patch modifies the LDRT/STRT and related instructions to
the LDR/STR ones.

The user pages access rights are also modified for kernel read-only
access rather than read/write so that the copy-on-write mechanism still
works. CPU_USE_DOMAINS gets disabled only if HAS_TLS_REG is defined
since writing the TLS value to the high vectors page isn't possible.

The user addresses passed to the kernel are checked by the access_ok()
function so that they do not point to the kernel space.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/assembler.h | 13 +++---
 arch/arm/include/asm/domain.h | 31 +++++++++++++-
 arch/arm/include/asm/futex.h | 9 ++--
 arch/arm/include/asm/traps.h | 2 +
 arch/arm/include/asm/uaccess.h | 16 ++++---
 arch/arm/kernel/entry-armv.S | 4 +-
 arch/arm/kernel/fiq.c | 5 ++
 arch/arm/kernel/traps.c | 14 +++++-
 arch/arm/lib/getuser.S | 13 +++---
 arch/arm/lib/putuser.S | 29 +++++++------
 arch/arm/lib/uaccess.S | 83 +++++++++++++++++++-------------------
 arch/arm/mm/Kconfig | 8 ++++
 arch/arm/mm/mmu.c | 6 +--
 arch/arm/mm/proc-macros.S | 7 +++
 arch/arm/mm/proc-v7.S | 5 +-
 15 files changed, 153 insertions(+), 92 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 6e8f05c..66db132 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -18,6 +18,7 @@
 #endif

 #include <asm/ptrace.h>
+#include <asm/domain.h>

 /*
  * Endian independent macros for shifting bytes within registers.
@@ -183,12 +184,12 @@
  */
 #ifdef CONFIG_THUMB2_KERNEL
- .macro usraccoff, instr, reg, ptr, inc, off, cond, abort
+ .macro usraccoff, instr, reg, ptr, inc, off, cond, abort, t=T()
9999:
 .if \inc == 1
- \instr\cond\()bt \reg, [\ptr, #\off]
+ \instr\cond\()b\()\t\().w \reg, [\ptr, #\off]
 .elseif \inc == 4
- \instr\cond\()t \reg, [\ptr, #\off]
+ \instr\cond\()\t\().w \reg, [\ptr, #\off]
 .else
 .error "Unsupported inc macro argument"
 .endif
@@ -223,13 +224,13 @@

 #else /* !CONFIG_THUMB2_KERNEL */

- .macro usracc, instr, reg, ptr, inc, cond, rept, abort
+ .macro usracc, instr, reg, ptr, inc, cond, rept, abort, t=T()
 .rept \rept
9999:
 .if \inc == 1
- \instr\cond\()bt \reg, [\ptr], #\inc
+ \instr\cond\()b\()\t \reg, [\ptr], #\inc
 .elseif \inc == 4
- \instr\cond\()t \reg, [\ptr], #\inc
+ \instr\cond\()\t \reg, [\ptr], #\inc
 .else
 .error "Unsupported inc macro argument"
 .endif

diff --git a/arch/arm/include/asm/domain.h b/arch/arm/include/asm/domain.h
index cc7ef40..af18cea 100644
--- a/arch/arm/include/asm/domain.h
+++ b/arch/arm/include/asm/domain.h
@@ -45,13 +45,17 @@
  */
 #define DOMAIN_NOACCESS 0
 #define DOMAIN_CLIENT 1
+#ifdef CONFIG_CPU_USE_DOMAINS
 #define DOMAIN_MANAGER 3
+#else
+#define DOMAIN_MANAGER 1
+#endif

 #define domain_val(dom,type) ((type) << (2*(dom)))

 #ifndef __ASSEMBLY__

-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
 #define set_domain(x) \
 do { \
 __asm__ __volatile__( \
@@ -74,5 +78,28 @@
 #define modify_domain(dom,type) do { } while (0)
 #endif

+/*
+ * Generate the T (user) versions of the LDR/STR and related
+ * instructions (inline assembly)
+ */
+#ifdef CONFIG_CPU_USE_DOMAINS
+#define T(instr) #instr "t"
+#else
+#define T(instr) #instr
 #endif
-#endif /* !__ASSEMBLY__ */
+
+#else /* __ASSEMBLY__ */
+
+/*
+ * Generate the T (user) versions of the LDR/STR and related
+ * instructions
+ */
+#ifdef CONFIG_CPU_USE_DOMAINS
+#define T(instr) instr ## t
+#else
+#define T(instr) instr
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* !__ASM_PROC_DOMAIN_H */

diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index 540a044..b33fe70 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -13,12 +13,13 @@
 #include <linux/preempt.h>
 #include <linux/uaccess.h>
 #include <asm/errno.h>
+#include <asm/domain.h>

 #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \
 __asm__ __volatile__( \
- "1: ldrt %1, [%2]\n" \
+ "1: " T(ldr) " %1, [%2]\n" \
 " " insn "\n" \
- "2: strt %0, [%2]\n" \
+ "2: " T(str) " %0, [%2]\n" \
 " mov %0, #0\n" \
 "3:\n" \
 " .pushsection __ex_table,\"a\"\n" \
@@ -97,10 +98,10 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
 pagefault_disable(); /* implies preempt_disable() */

 __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
- "1: ldrt %0, [%3]\n"
+ "1: " T(ldr) " %0, [%3]\n"
 " teq %0, %1\n"
 " it eq @ explicit IT needed for the 2b label\n"
- "2: streqt %2, [%3]\n"
+ "2: " T(streq) " %2, [%3]\n"
 "3:\n"
 " .pushsection __ex_table,\"a\"\n"
 " .align 3\n"

diff --git a/arch/arm/include/asm/traps.h b/arch/arm/include/asm/traps.h
index 491960b..af5d5d1 100644
--- a/arch/arm/include/asm/traps.h
+++ b/arch/arm/include/asm/traps.h
@@ -27,4 +27,6 @@ static inline int in_exception_text(unsigned long ptr)
 extern void __init early_trap_init(void);
 extern void dump_backtrace_entry(unsigned long where, unsigned long from, unsigned long frame);

+extern void *vectors_page;
+
 #endif

diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 33e4a48..b293616 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -227,7 +227,7 @@ do { \

 #define __get_user_asm_byte(x,addr,err) \
 __asm__ __volatile__( \
- "1: ldrbt %1,[%2]\n" \
+ "1: " T(ldrb) " %1,[%2],#0\n" \
 "2:\n" \
 " .pushsection .fixup,\"ax\"\n" \
 " .align 2\n" \
@@ -263,7 +263,7 @@ do { \

 #define __get_user_asm_word(x,addr,err) \
 __asm__ __volatile__( \
- "1: ldrt %1,[%2]\n" \
+ "1: " T(ldr) " %1,[%2],#0\n" \
 "2:\n" \
 " .pushsection .fixup,\"ax\"\n" \
 " .align 2\n" \
@@ -308,7 +308,7 @@ do { \

 #define __put_user_asm_byte(x,__pu_addr,err) \
 __asm__ __volatile__( \
- "1: strbt %1,[%2]\n" \
+ "1: " T(strb) " %1,[%2],#0\n" \
 "2:\n" \
 " .pushsection .fixup,\"ax\"\n" \
 " .align 2\n" \
@@ -341,7 +341,7 @@ do { \

 #define __put_user_asm_word(x,__pu_addr,err) \
 __asm__ __volatile__( \
- "1: strt %1,[%2]\n" \
+ "1: " T(str) " %1,[%2],#0\n" \
 "2:\n" \
 " .pushsection .fixup,\"ax\"\n" \
 " .align 2\n" \
@@ -366,10 +366,10 @@ do { \

 #define __put_user_asm_dword(x,__pu_addr,err) \
 __asm__ __volatile__( \
- ARM( "1: strt " __reg_oper1 ", [%1], #4\n" ) \
- ARM( "2: strt " __reg_oper0 ", [%1]\n" ) \
- THUMB( "1: strt " __reg_oper1 ", [%1]\n" ) \
- THUMB( "2: strt " __reg_oper0 ", [%1, #4]\n" ) \
+ ARM( "1: " T(str) " " __reg_oper1 ", [%1], #4\n" ) \
+ ARM( "2: " T(str) " " __reg_oper0 ", [%1]\n" ) \
+ THUMB( "1: " T(str) " " __reg_oper1 ", [%1]\n" ) \
+ THUMB( "2: " T(str) " " __reg_oper0 ", [%1, #4]\n" ) \
 "3:\n" \
 " .pushsection .fixup,\"ax\"\n" \
 " .align 2\n" \

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 7ee48e7..ba654fa 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -736,7 +736,7 @@ ENTRY(__switch_to)
 THUMB( stmia ip!, {r4 - sl, fp} ) @ Store most regs on stack
 THUMB( str sp, [ip], #4 )
 THUMB( str lr, [ip], #4 )
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
 ldr r6, [r2, #TI_CPU_DOMAIN]
 #endif
 #if defined(CONFIG_HAS_TLS_REG)
@@ -745,7 +745,7 @@ ENTRY(__switch_to)
 mov r4, #0xffff0fff
 str r3, [r4, #-15] @ TLS val at 0xffff0ff0
 #endif
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
 mcr p15, 0, r6, c3, c0, 0 @ Set domain register
 #endif
 mov r5, r0

diff --git a/arch/arm/kernel/fiq.c b/arch/arm/kernel/fiq.c
index 6ff7919..d601ef2 100644
--- a/arch/arm/kernel/fiq.c
+++ b/arch/arm/kernel/fiq.c
@@ -45,6 +45,7 @@
 #include <asm/fiq.h>
 #include <asm/irq.h>
 #include <asm/system.h>
+#include <asm/traps.h>

 static unsigned long no_fiq_insn;

@@ -77,7 +78,11 @@ int show_fiq_list(struct seq_file *p, void *v)

 void set_fiq_handler(void *start, unsigned int length)
 {
+#if defined(CONFIG_CPU_USE_DOMAINS)
 memcpy((void *)0xffff001c, start, length);
+#else
+ memcpy(vectors_page + 0x1c, start, length);
+#endif
 flush_icache_range(0xffff001c, 0xffff001c + length);
 if (!vectors_high())
 flush_icache_range(0x1c, 0x1c + length);

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1621e53..6bc57c5 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -36,6 +36,8 @@
 static const char *handler[]= { "prefetch abort", "data abort", "address exception", "interrupt" };

+void *vectors_page;
+
 #ifdef CONFIG_DEBUG_USER
 unsigned int user_debug;

@@ -745,7 +747,11 @@ void __init trap_init(void)

 void __init early_trap_init(void)
 {
+#if defined(CONFIG_CPU_USE_DOMAINS)
 unsigned long vectors = CONFIG_VECTORS_BASE;
+#else
+ unsigned long vectors = (unsigned long)vectors_page;
+#endif
 extern char __stubs_start[], __stubs_end[];
 extern char __vectors_start[], __vectors_end[];
 extern char __kuser_helper_start[], __kuser_helper_end[];
@@ -764,10 +770,10 @@ void __init early_trap_init(void)
 * Copy signal return handlers into the vector page, and
 * set sigreturn to be a pointer to these.
 */
- memcpy((void *)KERN_SIGRETURN_CODE, sigreturn_codes,
- sizeof(sigreturn_codes));
- memcpy((void *)KERN_RESTART_CODE, syscall_restart_code,
- sizeof(syscall_restart_code));
+ memcpy((void *)(vectors + KERN_SIGRETURN_CODE - CONFIG_VECTORS_BASE),
+ sigreturn_codes, sizeof(sigreturn_codes));
+ memcpy((void *)(vectors + KERN_RESTART_CODE - CONFIG_VECTORS_BASE),
+ syscall_restart_code, sizeof(syscall_restart_code));

 flush_icache_range(vectors, vectors + PAGE_SIZE);
 modify_domain(DOMAIN_USER, DOMAIN_CLIENT);

diff --git a/arch/arm/lib/getuser.S b/arch/arm/lib/getuser.S
index b1631a7..1b049cd 100644
--- a/arch/arm/lib/getuser.S
+++ b/arch/arm/lib/getuser.S
@@ -28,20 +28,21 @@
 */
 #include <linux/linkage.h>
 #include <asm/errno.h>
+#include <asm/domain.h>

ENTRY(__get_user_1)
-1: ldrbt r2, [r0]
+1: T(ldrb) r2, [r0]
 mov r0, #0
 mov pc, lr
ENDPROC(__get_user_1)

ENTRY(__get_user_2)
 #ifdef CONFIG_THUMB2_KERNEL
-2: ldrbt r2, [r0]
-3: ldrbt r3, [r0, #1]
+2: T(ldrb) r2, [r0]
+3: T(ldrb) r3, [r0, #1]
 #else
-2: ldrbt r2, [r0], #1
-3: ldrbt r3, [r0]
+2: T(ldrb) r2, [r0], #1
+3: T(ldrb) r3, [r0]
 #endif
 #ifndef __ARMEB__
 orr r2, r2, r3, lsl #8
@@ -53,7 +54,7 @@ ENTRY(__get_user_2)
ENDPROC(__get_user_2)

ENTRY(__get_user_4)
-4: ldrt r2, [r0]
+4: T(ldr) r2, [r0]
 mov r0, #0
 mov pc, lr
ENDPROC(__get_user_4)

diff --git a/arch/arm/lib/putuser.S b/arch/arm/lib/putuser.S
index 5a01a23..c023fc1 100644
--- a/arch/arm/lib/putuser.S
+++ b/arch/arm/lib/putuser.S
@@ -28,9 +28,10 @@
 */
 #include <linux/linkage.h>
 #include <asm/errno.h>
+#include <asm/domain.h>

ENTRY(__put_user_1)
-1: strbt r2, [r0]
+1: T(strb) r2, [r0]
 mov r0, #0
 mov pc, lr
ENDPROC(__put_user_1)

@@ -39,19 +40,19 @@ ENTRY(__put_user_2)
 mov ip, r2, lsr #8
 #ifdef CONFIG_THUMB2_KERNEL
 #ifndef __ARMEB__
-2: strbt r2, [r0]
-3: strbt ip, [r0, #1]
+2: T(strb) r2, [r0]
+3: T(strb) ip, [r0, #1]
 #else
-2: strbt ip, [r0]
-3: strbt r2, [r0, #1]
+2: T(strb) ip, [r0]
+3: T(strb) r2, [r0, #1]
 #endif
 #else /* !CONFIG_THUMB2_KERNEL */
 #ifndef __ARMEB__
-2: strbt r2, [r0], #1
-3: strbt ip, [r0]
+2: T(strb) r2, [r0], #1
+3: T(strb) ip, [r0]
 #else
-2: strbt ip, [r0], #1
-3: strbt r2, [r0]
+2: T(strb) ip, [r0], #1
+3: T(strb) r2, [r0]
 #endif
 #endif /* CONFIG_THUMB2_KERNEL */
 mov r0, #0
@@ -59,18 +60,18 @@ ENTRY(__put_user_2)
ENDPROC(__put_user_2)

ENTRY(__put_user_4)
-4: strt r2, [r0]
+4: T(str) r2, [r0]
 mov r0, #0
 mov pc, lr
ENDPROC(__put_user_4)

ENTRY(__put_user_8)
 #ifdef CONFIG_THUMB2_KERNEL
-5: strt r2, [r0]
-6: strt r3, [r0, #4]
+5: T(str) r2, [r0]
+6: T(str) r3, [r0, #4]
 #else
-5: strt r2, [r0], #4
-6: strt r3, [r0]
+5: T(str) r2, [r0], #4
+6: T(str) r3, [r0]
 #endif
 mov r0, #0
 mov pc, lr

diff --git a/arch/arm/lib/uaccess.S b/arch/arm/lib/uaccess.S
index fee9f6f..d0ece2a 100644
--- a/arch/arm/lib/uaccess.S
+++ b/arch/arm/lib/uaccess.S
@@ -14,6 +14,7 @@
 #include <linux/linkage.h>
 #include <asm/assembler.h>
 #include <asm/errno.h>
+#include <asm/domain.h>

 .text

@@ -31,11 +32,11 @@
 rsb ip, ip, #4
 cmp ip, #2
 ldrb r3, [r1], #1
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
 ldrgeb r3, [r1], #1
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
 ldrgtb r3, [r1], #1
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
 sub r2, r2, ip
 b .Lc2u_dest_aligned

@@ -58,7 +59,7 @@ ENTRY(__copy_to_user)
 addmi ip, r2, #4
 bmi .Lc2u_0nowords
 ldr r3, [r1], #4
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
 mov ip, r0, lsl #32 - PAGE_SHIFT @ On each page, use a ld/st??t instruction
 rsb ip, ip, #0
 movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -87,18 +88,18 @@ USER( strt r3, [r0], #4) @ May fault
 stmneia r0!, {r3 - r4} @ Shouldnt fault
 tst ip, #4
 ldrne r3, [r1], #4
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
 ands ip, ip, #3
 beq .Lc2u_0fupi
.Lc2u_0nowords: teq ip, #0
 beq .Lc2u_finished
.Lc2u_nowords: cmp ip, #2
 ldrb r3, [r1], #1
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
 ldrgeb r3, [r1], #1
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
 ldrgtb r3, [r1], #1
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
 b .Lc2u_finished

.Lc2u_not_enough:
@@ -119,7 +120,7 @@ USER( strgtbt r3, [r0], #1) @ May fault
 mov r3, r7, pull #8
 ldr r7, [r1], #4
 orr r3, r3, r7, push #24
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
 mov ip, r0, lsl #32 - PAGE_SHIFT
 rsb ip, ip, #0
 movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -154,18 +155,18 @@ USER( strt r3, [r0], #4) @ May fault
 movne r3, r7, pull #8
 ldrne r7, [r1], #4
 orrne r3, r3, r7, push #24
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
 ands ip, ip, #3
 beq .Lc2u_1fupi
.Lc2u_1nowords: mov r3, r7, get_byte_1
 teq ip, #0
 beq .Lc2u_finished
 cmp ip, #2
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
 movge r3, r7, get_byte_2
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
 movgt r3, r7, get_byte_3
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
 b .Lc2u_finished

.Lc2u_2fupi: subs r2, r2, #4
@@ -174,7 +175,7 @@ USER( strgtbt r3, [r0], #1) @ May fault
 mov r3, r7, pull #16
 ldr r7, [r1], #4
 orr r3, r3, r7, push #16
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
 mov ip, r0, lsl #32 - PAGE_SHIFT
 rsb ip, ip, #0
 movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -209,18 +210,18 @@ USER( strt r3, [r0], #4) @ May fault
 movne r3, r7, pull #16
 ldrne r7, [r1], #4
 orrne r3, r3, r7, push #16
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
 ands ip, ip, #3
 beq .Lc2u_2fupi
.Lc2u_2nowords: mov r3, r7, get_byte_2
 teq ip, #0
 beq .Lc2u_finished
 cmp ip, #2
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
 movge r3, r7, get_byte_3
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
 ldrgtb r3, [r1], #0
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
 b .Lc2u_finished

.Lc2u_3fupi: subs r2, r2, #4
@@ -229,7 +230,7 @@ USER( strgtbt r3, [r0], #1) @ May fault
 mov r3, r7, pull #24
 ldr r7, [r1], #4
 orr r3, r3, r7, push #8
-USER( strt r3, [r0], #4) @ May fault
+USER( T(str) r3, [r0], #4) @ May fault
 mov ip, r0, lsl #32 - PAGE_SHIFT
 rsb ip, ip, #0
 movs ip, ip, lsr #32 - PAGE_SHIFT
@@ -264,18 +265,18 @@ USER( strt r3, [r0], #4) @ May fault
 movne r3, r7, pull #24
 ldrne r7, [r1], #4
 orrne r3, r3, r7, push #8
- strnet r3, [r0], #4 @ Shouldnt fault
+ T(strne) r3, [r0], #4 @ Shouldnt fault
 ands ip, ip, #3
 beq .Lc2u_3fupi
.Lc2u_3nowords: mov r3, r7, get_byte_3
 teq ip, #0
 beq .Lc2u_finished
 cmp ip, #2
-USER( strbt r3, [r0], #1) @ May fault
+USER( T(strb) r3, [r0], #1) @ May fault
 ldrgeb r3, [r1], #1
-USER( strgebt r3, [r0], #1) @ May fault
+USER( T(strgeb) r3, [r0], #1) @ May fault
 ldrgtb r3, [r1], #0
-USER( strgtbt r3, [r0], #1) @ May fault
+USER( T(strgtb) r3, [r0], #1) @ May fault
 b .Lc2u_finished
ENDPROC(__copy_to_user)

@@ -294,11 +295,11 @@ ENDPROC(__copy_to_user)
.Lcfu_dest_not_aligned:
 rsb ip, ip, #4
 cmp ip, #2
-USER( ldrbt r3, [r1], #1) @ May fault
+USER( T(ldrb) r3, [r1], #1) @ May fault
 strb r3, [r0], #1
-USER( ldrgebt r3, [r1], #1) @ May fault
+USER( T(ldrgeb) r3, [r1], #1) @ May fault
 strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #1) @ May fault
+USER( T(ldrgtb) r3, [r1], #1) @ May fault
 strgtb r3, [r0], #1
 sub r2, r2, ip
 b .Lcfu_dest_aligned

@@ -321,7 +322,7 @@ ENTRY(__copy_from_user)
.Lcfu_0fupi: subs r2, r2, #4
 addmi ip, r2, #4
 bmi .Lcfu_0nowords
-USER( ldrt r3, [r1], #4)
+USER( T(ldr) r3, [r1], #4)
 str r3, [r0], #4
 mov ip, r1, lsl #32 - PAGE_SHIFT @ On each page, use a ld/st??t instruction
 rsb ip, ip, #0
@@ -350,18 +351,18 @@ USER( ldrt r3, [r1], #4)
 ldmneia r1!, {r3 - r4} @ Shouldnt fault
 stmneia r0!, {r3 - r4}
 tst ip, #4
- ldrnet r3, [r1], #4 @ Shouldnt fault
+ T(ldrne) r3, [r1], #4 @ Shouldnt fault
 strne r3, [r0], #4
 ands ip, ip, #3
 beq .Lcfu_0fupi
.Lcfu_0nowords: teq ip, #0
 beq .Lcfu_finished
.Lcfu_nowords: cmp ip, #2
-USER( ldrbt r3, [r1], #1) @ May fault
+USER( T(ldrb) r3, [r1], #1) @ May fault
 strb r3, [r0], #1
-USER( ldrgebt r3, [r1], #1) @ May fault
+USER( T(ldrgeb) r3, [r1], #1) @ May fault
 strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #1) @ May fault
+USER( T(ldrgtb) r3, [r1], #1) @ May fault
 strgtb r3, [r0], #1
 b .Lcfu_finished

@@ -374,7 +375,7 @@ USER( ldrgtbt r3, [r1], #1) @ May fault
.Lcfu_src_not_aligned:
 bic r1, r1, #3
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
 cmp ip, #2
 bgt .Lcfu_3fupi
 beq .Lcfu_2fupi
@@ -382,7 +383,7 @@ USER( ldrt r7, [r1], #4) @ May fault
 addmi ip, r2, #4
 bmi .Lcfu_1nowords
 mov r3, r7, pull #8
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
 orr r3, r3, r7, push #24
 str r3, [r0], #4
 mov ip, r1, lsl #32 - PAGE_SHIFT
@@ -417,7 +418,7 @@ USER( ldrt r7, [r1], #4) @ May fault
 stmneia r0!, {r3 - r4}
 tst ip, #4
 movne r3, r7, pull #8
-USER( ldrnet r7, [r1], #4) @ May fault
+USER( T(ldrne) r7, [r1], #4) @ May fault
 orrne r3, r3, r7, push #24
 strne r3, [r0], #4
 ands ip, ip, #3
@@ -437,7 +438,7 @@ USER( ldrnet r7, [r1], #4) @ May fault
 addmi ip, r2, #4
 bmi .Lcfu_2nowords
 mov r3, r7, pull #16
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
 orr r3, r3, r7, push #16
 str r3, [r0], #4
 mov ip, r1, lsl #32 - PAGE_SHIFT
@@ -473,7 +474,7 @@ USER( ldrt r7, [r1], #4) @ May fault
 stmneia r0!, {r3 - r4}
 tst ip, #4
 movne r3, r7, pull #16
-USER( ldrnet r7, [r1], #4) @ May fault
+USER( T(ldrne) r7, [r1], #4) @ May fault
 orrne r3, r3, r7, push #16
 strne r3, [r0], #4
 ands ip, ip, #3
@@ -485,7 +486,7 @@ USER( ldrnet r7, [r1], #4) @ May fault
 strb r3, [r0], #1
 movge r3, r7, get_byte_3
 strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #0) @ May fault
+USER( T(ldrgtb) r3, [r1], #0) @ May fault
 strgtb r3, [r0], #1
 b .Lcfu_finished

@@ -493,7 +494,7 @@ USER( ldrgtbt r3, [r1], #0) @ May fault
 addmi ip, r2, #4
 bmi .Lcfu_3nowords
 mov r3, r7, pull #24
-USER( ldrt r7, [r1], #4) @ May fault
+USER( T(ldr) r7, [r1], #4) @ May fault
 orr r3, r3, r7, push #8
 str r3, [r0], #4
 mov ip, r1, lsl #32 - PAGE_SHIFT
@@ -528,7 +529,7 @@ USER( ldrt r7, [r1], #4) @ May fault
 stmneia r0!, {r3 - r4}
 tst ip, #4
 movne r3, r7, pull #24
-USER( ldrnet r7, [r1], #4) @ May fault
+USER( T(ldrne) r7, [r1], #4) @ May fault
 orrne r3, r3, r7, push #8
 strne r3, [r0], #4
 ands ip, ip, #3
@@ -538,9 +539,9 @@ USER( ldrnet r7, [r1], #4) @ May fault
 beq .Lcfu_finished
 cmp ip, #2
 strb r3, [r0], #1
-USER( ldrgebt r3, [r1], #1) @ May fault
+USER( T(ldrgeb) r3, [r1], #1) @ May fault
 strgeb r3, [r0], #1
-USER( ldrgtbt r3, [r1], #1) @ May fault
+USER( T(ldrgtb) r3, [r1], #1) @ May fault
 strgtb r3, [r0], #1
 b .Lcfu_finished
ENDPROC(__copy_from_user)

diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 346ae14..f33c422 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -599,6 +599,14 @@ config CPU_CP15_MPU
 help
 Processor has the CP15 register, which has MPU related registers.

+config CPU_USE_DOMAINS
+ bool
+ depends on MMU
+ default y if !HAS_TLS_REG
+ help
+ This option enables or disables the use of domain switching
+ via the set_fs() function.
+
 #
 # CPU supports 36-bit I/O
 #

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 2858941..499e22d 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -25,6 +25,7 @@
 #include <asm/smp_plat.h>
 #include <asm/tlb.h>
 #include <asm/highmem.h>
+#include <asm/traps.h>

 #include <asm/mach/arch.h>
 #include <asm/mach/map.h>
@@ -935,12 +936,11 @@ static void __init devicemaps_init(struct machine_desc *mdesc)
 {
 struct map_desc map;
 unsigned long addr;
- void *vectors;

 /*
 * Allocate the vector page early.
 */
- vectors = alloc_bootmem_low_pages(PAGE_SIZE);
+ vectors_page = alloc_bootmem_low_pages(PAGE_SIZE);

 for (addr = VMALLOC_END; addr; addr += PGDIR_SIZE)
 pmd_clear(pmd_off_k(addr));
@@ -980,7 +980,7 @@ static void __init devicemaps_init(struct machine_desc *mdesc)
 * location (0xffff0000). If we aren't using high-vectors, also
 * create a mapping at the low-vectors virtual address.
 */
- map.pfn = __phys_to_pfn(virt_to_phys(vectors));
+ map.pfn = __phys_to_pfn(virt_to_phys(vectors_page));
 map.virtual = 0xffff0000;
 map.length = PAGE_SIZE;
 map.type = MT_HIGH_VECTORS;

diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
index 7d63bea..337f102 100644
--- a/arch/arm/mm/proc-macros.S
+++ b/arch/arm/mm/proc-macros.S
@@ -99,6 +99,10 @@
 * 110x 0 1 0 r/w r/o
 * 11x0 0 1 0 r/w r/o
 * 1111 0 1 1 r/w r/w
+ *
+ * If !CONFIG_CPU_USE_DOMAINS, the following permissions are changed:
+ * 110x 1 1 1 r/o r/o
+ * 11x0 1 1 1 r/o r/o
 */
 .macro armv6_mt_table pfx
\pfx\()_mt_table:
@@ -138,8 +142,11 @@
 tst r1, #L_PTE_USER
 orrne r3, r3, #PTE_EXT_AP1
+#ifdef CONFIG_CPU_USE_DOMAINS
+ @ allow kernel read/write access to read-only user pages
 tstne r3, #PTE_EXT_APX
 bicne r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
+#endif

 tst r1, #L_PTE_EXEC
 orreq r3, r3, #PTE_EXT_XN

diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 7aaf88a..c1c3fe0 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -152,8 +152,11 @@ ENTRY(cpu_v7_set_pte_ext)
 tst r1, #L_PTE_USER
 orrne r3, r3, #PTE_EXT_AP1
+#ifdef CONFIG_CPU_USE_DOMAINS
+ @ allow kernel read/write access to read-only user pages
 tstne r3, #PTE_EXT_APX
 bicne r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
+#endif

 tst r1, #L_PTE_EXEC
 orreq r3, r3, #PTE_EXT_XN
@@ -240,8 +243,6 @@ __v7_setup:
 mcr p15, 0, r10, c2, c0, 2 @ TTB control register
 orr r4, r4, #TTB_FLAGS
 mcr p15, 0, r4, c2, c0, 1 @ load TTB1
- mov r10, #0x1f @ domains 0, 1 = manager
- mcr p15, 0, r10, c3, c0, 0 @ load domain access register
 /*
 * Memory region attributes with SCTLR.TRE=1
 *
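
What the T() macro buys can be seen outside the kernel. The program below
is a stand-alone illustration, not kernel code: only the T() definition is
taken from the patch above, and printing the mnemonic stands in for
pasting it into an __asm__ string as the uaccess macros do.

#include <stdio.h>

#ifdef CONFIG_CPU_USE_DOMAINS
#define T(instr)        #instr "t"
#else
#define T(instr)        #instr
#endif

int main(void)
{
        /*
         * The kernel pastes the result into inline assembly, e.g.
         *   "1:     " T(ldr) "      %1, [%2]\n"
         * which becomes "ldrt" with domains and "ldr" without.
         */
        printf("__get_user word access: %s\n", T(ldr));
        printf("__put_user byte access: %s\n", T(strb));
        return 0;
}

Compiled with -DCONFIG_CPU_USE_DOMAINS it prints ldrt/strbt; without the
define it prints plain ldr/strb, which rely on ordinary page permissions
(user read-only pages also becoming kernel read-only, so copy-on-write
still works) instead of domain switching.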

* [PATCH v4 1/4] ARM: Remove the domain switching on ARMv6k/v7 CPUs

From: Anton Vorontsov @ 2010-06-22 12:47 UTC
To: linux-arm-kernel

On Mon, Jun 21, 2010 at 03:46:26PM +0100, Catalin Marinas wrote:
> This patch removes the domain switching functionality via the set_fs and
> __switch_to functions on cores that have a TLS register.
>
> Currently, the ioremap and vmalloc areas share the same level 1 page
> tables and therefore have the same domain (DOMAIN_KERNEL). When the
> kernel domain is modified from Client to Manager (via the __set_fs or in
> the __switch_to function), the XN (eXecute Never) bit is overridden and
> newer CPUs can speculatively prefetch the ioremap'ed memory.
>
> Linux performs the kernel domain switching to allow user-specific
> functions (copy_to/from_user, get/put_user etc.) to access kernel
> memory. In order for these functions to work with the kernel domain set
> to Client, the patch modifies the LDRT/STRT and related instructions to
> the LDR/STR ones.
>
> The user pages access rights are also modified for kernel read-only
> access rather than read/write so that the copy-on-write mechanism still
> works. CPU_USE_DOMAINS gets disabled only if HAS_TLS_REG is defined
> since writing the TLS value to the high vectors page isn't possible.
>
> The user addresses passed to the kernel are checked by the access_ok()
> function so that they do not point to the kernel space.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

I tested this on ARMv6K (ARM11 MPcore) and ARMv7 (Cortex-A9), and
didn't notice any issues. This is also needed for robust mutexes
support... so, if that helps,

Tested-by: Anton Vorontsov <cbouatmailru@gmail.com>

Thanks!

* [PATCH v4 1/4] ARM: Remove the domain switching on ARMv6k/v7 CPUs

From: Catalin Marinas @ 2010-06-22 13:01 UTC
To: linux-arm-kernel

On Tue, 2010-06-22 at 13:47 +0100, Anton Vorontsov wrote:
> On Mon, Jun 21, 2010 at 03:46:26PM +0100, Catalin Marinas wrote:
> > This patch removes the domain switching functionality via the set_fs and
> > __switch_to functions on cores that have a TLS register.
> >
> > Currently, the ioremap and vmalloc areas share the same level 1 page
> > tables and therefore have the same domain (DOMAIN_KERNEL). When the
> > kernel domain is modified from Client to Manager (via the __set_fs or in
> > the __switch_to function), the XN (eXecute Never) bit is overridden and
> > newer CPUs can speculatively prefetch the ioremap'ed memory.
> >
> > Linux performs the kernel domain switching to allow user-specific
> > functions (copy_to/from_user, get/put_user etc.) to access kernel
> > memory. In order for these functions to work with the kernel domain set
> > to Client, the patch modifies the LDRT/STRT and related instructions to
> > the LDR/STR ones.
> >
> > The user pages access rights are also modified for kernel read-only
> > access rather than read/write so that the copy-on-write mechanism still
> > works. CPU_USE_DOMAINS gets disabled only if HAS_TLS_REG is defined
> > since writing the TLS value to the high vectors page isn't possible.
> >
> > The user addresses passed to the kernel are checked by the access_ok()
> > function so that they do not point to the kernel space.
> >
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>
> I tested this on ARMv6K (ARM11 MPcore) and ARMv7 (Cortex-A9), and
> didn't notice any issues. This is also needed for robust mutexes
> support... so, if that helps,
>
> Tested-by: Anton Vorontsov <cbouatmailru@gmail.com>

Thanks.

--
Catalin

* [PATCH v4 2/4] ARM: Assume new page cache pages have dirty D-cache

From: Catalin Marinas @ 2010-06-21 14:46 UTC
To: linux-arm-kernel

There are places in Linux where writes to newly allocated page cache
pages happen without a subsequent call to flush_dcache_page() (several
PIO drivers including USB HCD). This patch changes the meaning of
PG_arch_1 to be PG_dcache_clean and always flushes the D-cache for a
newly mapped page in update_mmu_cache().

The patch also sets the PG_arch_1 bit in the DMA cache maintenance
function to avoid additional cache flushing in update_mmu_cache().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/cacheflush.h | 6 +++---
 arch/arm/include/asm/tlbflush.h | 2 +-
 arch/arm/mm/copypage-v4mc.c | 2 +-
 arch/arm/mm/copypage-v6.c | 2 +-
 arch/arm/mm/copypage-xscale.c | 2 +-
 arch/arm/mm/dma-mapping.c | 6 ++++++
 arch/arm/mm/fault-armv.c | 4 ++--
 arch/arm/mm/flush.c | 3 ++-
 8 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 4656a24..d3730f0 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -137,10 +137,10 @@
 #endif

 /*
- * This flag is used to indicate that the page pointed to by a pte
- * is dirty and requires cleaning before returning it to the user.
+ * This flag is used to indicate that the page pointed to by a pte is clean
+ * and does not require cleaning before returning it to the user.
 */
-#define PG_dcache_dirty PG_arch_1
+#define PG_dcache_clean PG_arch_1

 /*
 * MM Cache Management

diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index bd863d8..40a7092 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -552,7 +552,7 @@ extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
 #endif

 /*
- * if PG_dcache_dirty is set for the page, we need to ensure that any
+ * If PG_dcache_clean is not set for the page, we need to ensure that any
 * cache entries for the kernels virtual memory range are written
 * back to the page.
 */

diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
index 598c51a..b806151 100644
--- a/arch/arm/mm/copypage-v4mc.c
+++ b/arch/arm/mm/copypage-v4mc.c
@@ -73,7 +73,7 @@ void v4_mc_copy_user_highpage(struct page *to, struct page *from,
 {
 void *kto = kmap_atomic(to, KM_USER1);

- if (test_and_clear_bit(PG_dcache_dirty, &from->flags))
+ if (!test_and_set_bit(PG_dcache_clean, &from->flags))
 __flush_dcache_page(page_mapping(from), from);

 spin_lock(&minicache_lock);

diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
index f55fa10..bdba6c6 100644
--- a/arch/arm/mm/copypage-v6.c
+++ b/arch/arm/mm/copypage-v6.c
@@ -79,7 +79,7 @@ static void v6_copy_user_highpage_aliasing(struct page *to,
 unsigned int offset = CACHE_COLOUR(vaddr);
 unsigned long kfrom, kto;

- if (test_and_clear_bit(PG_dcache_dirty, &from->flags))
+ if (!test_and_set_bit(PG_dcache_clean, &from->flags))
 __flush_dcache_page(page_mapping(from), from);

 /* FIXME: not highmem safe */

diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
index 9920c0a..649bbcd 100644
--- a/arch/arm/mm/copypage-xscale.c
+++ b/arch/arm/mm/copypage-xscale.c
@@ -95,7 +95,7 @@ void xscale_mc_copy_user_highpage(struct page *to, struct page *from,
 {
 void *kto = kmap_atomic(to, KM_USER1);

- if (test_and_clear_bit(PG_dcache_dirty, &from->flags))
+ if (!test_and_set_bit(PG_dcache_clean, &from->flags))
 __flush_dcache_page(page_mapping(from), from);

 spin_lock(&minicache_lock);

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 13fa536..bb9b612 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -508,6 +508,12 @@ void ___dma_page_dev_to_cpu(struct page *page, unsigned long off,
 outer_inv_range(paddr, paddr + size);

 dma_cache_maint_page(page, off, size, dir, dmac_unmap_area);
+
+ /*
+ * Mark the D-cache clean for this page to avoid extra flushing.
+ */
+ if (dir != DMA_TO_DEVICE)
+ set_bit(PG_dcache_clean, &page->flags);
 }
 EXPORT_SYMBOL(___dma_page_dev_to_cpu);

diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 9b906de..58846cb 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -141,7 +141,7 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
 * a page table, or changing an existing PTE. Basically, there are two
 * things that we need to take care of:
 *
- * 1. If PG_dcache_dirty is set for the page, we need to ensure
+ * 1. If PG_dcache_clean is not set for the page, we need to ensure
 * that any cache entries for the kernels virtual memory
 * range are written back to the page.
 * 2. If we have multiple shared mappings of the same space in
@@ -169,7 +169,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
 mapping = page_mapping(page);

 #ifndef CONFIG_SMP
- if (test_and_clear_bit(PG_dcache_dirty, &page->flags))
+ if (!test_and_set_bit(PG_dcache_clean, &page->flags))
 __flush_dcache_page(mapping, page);
 #endif
 if (mapping) {

diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index c6844cb..9f9919b 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -248,7 +248,7 @@ void flush_dcache_page(struct page *page)

 #ifndef CONFIG_SMP
 if (!PageHighMem(page) && mapping && !mapping_mapped(mapping))
- set_bit(PG_dcache_dirty, &page->flags);
+ clear_bit(PG_dcache_clean, &page->flags);
 else
 #endif
 {
@@ -257,6 +257,7 @@ void flush_dcache_page(struct page *page)
 __flush_dcache_aliases(mapping, page);
 else if (mapping)
 __flush_icache_all();
+ set_bit(PG_dcache_clean, &page->flags);
 }
 }
 EXPORT_SYMBOL(flush_dcache_page);
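
The inverted flag logic is subtle enough to deserve a stand-alone sketch.
This is plain host C, not kernel code: struct page, the bit helper and
sync_if_dirty() are simplified stand-ins for their kernel counterparts.
The point is that a freshly allocated page has the bit clear, i.e. it is
now assumed dirty until the first flush proves otherwise.

#include <stdbool.h>
#include <stdio.h>

struct page { unsigned long flags; };

#define PG_dcache_clean 0       /* the patch's new meaning of PG_arch_1 */

static bool test_and_set_bit(int nr, unsigned long *flags)
{
        bool old = *flags & (1UL << nr);

        *flags |= 1UL << nr;
        return old;
}

static void __flush_dcache_page(struct page *page)
{
        printf("flushing D-cache for page %p\n", (void *)page);
}

/* Mirrors the update_mmu_cache()/copypage pattern in the diff above. */
static void sync_if_dirty(struct page *page)
{
        if (!test_and_set_bit(PG_dcache_clean, &page->flags))
                __flush_dcache_page(page);
}

int main(void)
{
        struct page new_page = { .flags = 0 }; /* new page: assumed dirty */

        sync_if_dirty(&new_page);       /* flushes once */
        sync_if_dirty(&new_page);       /* already clean: no flush */
        return 0;
}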

* [PATCH v4 2/4] ARM: Assume new page cache pages have dirty D-cache

From: Rabin Vincent @ 2010-06-22 19:36 UTC
To: linux-arm-kernel

On Mon, Jun 21, 2010 at 03:46:32PM +0100, Catalin Marinas wrote:
> There are places in Linux where writes to newly allocated page cache
> pages happen without a subsequent call to flush_dcache_page() (several
> PIO drivers including USB HCD). This patch changes the meaning of
> PG_arch_1 to be PG_dcache_clean and always flushes the D-cache for a
> newly mapped page in update_mmu_cache().

Correct me if I'm misreading the code, but don't this patch and the next
one make the assumption that CONFIG_SMP == VIPT non-aliasing (or PIPT)
caches?

This patch does not add flushing on SMP systems, and the next one
handles the I$-D$ coherency issues there (ignoring the set_pte race fix
for a moment). Won't the flushing added in this patch be unnecessary on
non-SMP PIPT systems, since they too only need the exec-related
flushing?

Rabin

* [PATCH v4 2/4] ARM: Assume new page cache pages have dirty D-cache

From: Catalin Marinas @ 2010-06-22 22:39 UTC
To: linux-arm-kernel

On Tue, 2010-06-22 at 20:36 +0100, Rabin Vincent wrote:
> On Mon, Jun 21, 2010 at 03:46:32PM +0100, Catalin Marinas wrote:
> > There are places in Linux where writes to newly allocated page cache
> > pages happen without a subsequent call to flush_dcache_page() (several
> > PIO drivers including USB HCD). This patch changes the meaning of
> > PG_arch_1 to be PG_dcache_clean and always flushes the D-cache for a
> > newly mapped page in update_mmu_cache().
>
> Correct me if I'm misreading the code, but don't this patch and the next
> one make the assumption that CONFIG_SMP == VIPT non-aliasing (or PIPT)
> caches?

Yes.

> This patch does not add flushing on SMP systems, and the next
> one handles the I$-D$ coherency issues there (ignoring the set_pte race
> fix for a moment).

The aim of this patch isn't to add any new flushing, just changes the
default assumptions on the D-cache status for new pages (from clean to
dirty). The following patch (sync I-D caches) is aimed at sorting out a
race. This patch indeed adds a __flush_dcache_page() call if the page is
dirty but with well-behaved drivers (calling flush_dcache_page), the
PG_dcache_clean bit should always be set when reaching set_pte_at().

On ARM11MPCore, the cache operations are not broadcast in hardware, so
flush_dcache_page() must be called on the CPU that dirtied the cache. So
not having a correct driver could lead to errors in user space on this
processor (if running with more than 1). Cortex-A9 broadcasts the cache
ops in hardware, so even if you don't do it via the driver and
flush_dcache_page, you catch the dirty page later in set_pte_at(). So
the sync I-D cache patch fixes both the race and the handling of not
well written drivers (the latter only on Cortex-A9). With all this in
place, the final patch in my series allows Cortex-A9 to use lazy cache
flushing as on UP systems (flush_dcache_page/update_mmu_cache pair).

BTW, I have a patch for ARM11MPCore which allows lazy cache flushing as
well (by performing a read-for-ownership before flushing the D-cache).
However, I can't guarantee that it works on anything other than
ARM11MPCore as it is too specific to the hardware implementation.

> Won't the flushing added in this patch be
> unnecessary on non-SMP PIPT systems, since they too only need the
> exec-related flushing?

Yes, this can be optimised. We could extend the __sync_icache_dcache()
patch to all the ARMv7 (UP) and SMP configurations (just a matter of
#ifdef's) where we know for sure that the hardware is VIPT non-aliasing.
I'll have a look tomorrow and post an update to this patch.

Thanks.

--
Catalin
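
The distinction drawn here between ARM11MPCore and Cortex-A9 is what
cache_ops_need_broadcast() encodes. The sketch below fakes the ID_MMFR3
read (a real kernel uses read_cpuid_ext(CPUID_EXT_MMFR3)); the value
0x00002000 is an assumption standing in for an A9-like part. Bits [15:12]
describe maintenance broadcast support: a value below 1 means cache
operations are not broadcast in hardware and software must reach the
other CPUs itself.

#include <stdio.h>

static unsigned int fake_mmfr3 = 0x00002000;    /* assumed A9-like value */

static int cache_ops_need_broadcast(void)
{
        return ((fake_mmfr3 >> 12) & 0xf) < 1;
}

int main(void)
{
        if (cache_ops_need_broadcast())
                puts("ARM11MPCore-like: flush on the CPU that dirtied "
                     "the cache, or IPI the others");
        else
                puts("Cortex-A9-like: hardware broadcasts maintenance, "
                     "lazy flushing is safe");
        return 0;
}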

* [PATCH v4 3/4] ARM: Synchronise the I and D caches via set_pte_at() on SMP systems

From: Catalin Marinas @ 2010-06-21 14:46 UTC
To: linux-arm-kernel

On SMP systems, there is a small chance of a PTE becoming visible to a
different CPU before the cache maintenance operations in
update_mmu_cache(). This patch follows the IA-64 and PowerPC approach of
synchronising the I and D caches via the set_pte_at() function. In this
case, there is no need for update_mmu_cache() to be implemented since
lazy cache flushing is already handled by the time this function is
called.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/pgtable.h | 25 ++++++++++++++++++++++---
 arch/arm/include/asm/tlbflush.h | 10 +++++++++-
 arch/arm/mm/fault-armv.c | 4 ++--
 arch/arm/mm/flush.c | 15 +++++++++++++++
 4 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index ab68cf1..a3752f5 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -278,9 +278,24 @@ extern struct page *empty_zero_page;

 #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)

-#define set_pte_at(mm,addr,ptep,pteval) do { \
- set_pte_ext(ptep, pteval, (addr) >= TASK_SIZE ? 0 : PTE_EXT_NG); \
- } while (0)
+#ifndef CONFIG_SMP
+static inline void __sync_icache_dcache(pte_t pteval)
+{
+}
+#else
+extern void __sync_icache_dcache(pte_t pteval);
+#endif
+
+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pteval)
+{
+ if (addr >= TASK_SIZE)
+ set_pte_ext(ptep, pteval, 0);
+ else {
+ __sync_icache_dcache(pteval);
+ set_pte_ext(ptep, pteval, PTE_EXT_NG);
+ }
+}

 /*
 * The following only work if pte_present() is true.
@@ -292,6 +307,10 @@ extern struct page *empty_zero_page;
 #define pte_young(pte) (pte_val(pte) & L_PTE_YOUNG)
 #define pte_special(pte) (0)

+#define pte_present_exec_user(pte) \
+ ((pte_val(pte) & (L_PTE_PRESENT | L_PTE_EXEC | L_PTE_USER)) == \
+ (L_PTE_PRESENT | L_PTE_EXEC | L_PTE_USER))
+
 #define PTE_BIT_FUNC(fn,op) \
 static inline pte_t pte_##fn(pte_t pte) { pte_val(pte) op; return pte; }

diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 40a7092..6aabedd 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -554,10 +554,18 @@ extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);

 /*
 * If PG_dcache_clean is not set for the page, we need to ensure that any
 * cache entries for the kernels virtual memory range are written
- * back to the page.
+ * back to the page. On SMP systems, the cache coherency is handled in the
+ * set_pte_at() function.
 */
+#ifndef CONFIG_SMP
 extern void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
 pte_t *ptep);
+#else
+static inline void update_mmu_cache(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+}
+#endif

 #endif

diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 58846cb..030f1da 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -28,6 +28,7 @@
 static unsigned long shared_pte_mask = L_PTE_MT_BUFFERABLE;

+#ifndef CONFIG_SMP
 /*
 * We take the easy way out of this problem - we make the
 * PTE uncacheable. However, we leave the write buffer on.
@@ -168,10 +169,8 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
 return;

 mapping = page_mapping(page);
-#ifndef CONFIG_SMP
 if (!test_and_set_bit(PG_dcache_clean, &page->flags))
 __flush_dcache_page(mapping, page);
-#endif
 if (mapping) {
 if (cache_is_vivt())
 make_coherent(mapping, vma, addr, ptep, pfn);
@@ -179,6 +178,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
 __flush_icache_all();
 }
 }
+#endif /* !CONFIG_SMP */

 /*
 * Check whether the write buffer has physical address aliasing

diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 9f9919b..18a8467 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -215,6 +215,21 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 flush_dcache_mmap_unlock(mapping);
 }

+#ifdef CONFIG_SMP
+void __sync_icache_dcache(pte_t pteval)
+{
+ unsigned long pfn = pte_pfn(pteval);
+
+ if (pfn_valid(pfn) && pte_present_exec_user(pteval)) {
+ struct page *page = pfn_to_page(pfn);
+
+ if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+ __flush_dcache_page(NULL, page);
+ __flush_icache_all();
+ }
+}
+#endif
+
 /*
 * Ensure cache coherency between kernel mapping and userspace mapping
 * of this page.
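
The pte_present_exec_user() test added above is what keeps the sync out
of the common data-page path. A stand-alone version for illustration,
with the L_PTE_* values hard-coded from the classic ARM 2-level layout
of that era (an assumption; real code should use the kernel's own
definitions):

#include <stdio.h>

typedef unsigned long pte_t;

#define L_PTE_PRESENT   (1UL << 0)
#define L_PTE_USER      (1UL << 8)
#define L_PTE_EXEC      (1UL << 9)

static int pte_present_exec_user(pte_t pte)
{
        return (pte & (L_PTE_PRESENT | L_PTE_EXEC | L_PTE_USER)) ==
               (L_PTE_PRESENT | L_PTE_EXEC | L_PTE_USER);
}

int main(void)
{
        pte_t text = L_PTE_PRESENT | L_PTE_USER | L_PTE_EXEC;
        pte_t data = L_PTE_PRESENT | L_PTE_USER;        /* XN page */

        printf("user text page needs I/D sync: %d\n",
               pte_present_exec_user(text));
        printf("user data page needs I/D sync: %d\n",
               pte_present_exec_user(data));
        return 0;
}

Only present, executable, user-visible PTEs pay for __flush_icache_all();
plain data mappings go straight to set_pte_ext().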

* [PATCH v4 4/4] ARM: Use lazy cache flushing on ARMv7 SMP systems

From: Catalin Marinas @ 2010-06-21 14:46 UTC
To: linux-arm-kernel

ARMv7 processors like Cortex-A9 broadcast the cache maintenance
operations in hardware. This patch allows the
flush_dcache_page/update_mmu_cache pair to work in lazy flushing mode
similar to the UP case.

Note that cache flushing on SMP systems now takes place via the
set_pte_at() call (__sync_icache_dcache) and there is no race with other
CPUs executing code from the new PTE before the cache flushing took
place.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/smp_plat.h | 4 ++++
 arch/arm/mm/flush.c | 13 ++++---------
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h
index e621530..963a338 100644
--- a/arch/arm/include/asm/smp_plat.h
+++ b/arch/arm/include/asm/smp_plat.h
@@ -13,9 +13,13 @@ static inline int tlb_ops_need_broadcast(void)
 return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 2;
 }

+#if !defined(CONFIG_SMP) || __LINUX_ARM_ARCH__ >= 7
+#define cache_ops_need_broadcast() 0
+#else
 static inline int cache_ops_need_broadcast(void)
 {
 return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 1;
 }
+#endif

 #endif

diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 18a8467..2d08a5e 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -17,6 +17,7 @@
 #include <asm/smp_plat.h>
 #include <asm/system.h>
 #include <asm/tlbflush.h>
+#include <asm/smp_plat.h>

 #include "mm.h"

@@ -93,12 +94,10 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsig
 #define flush_pfn_alias(pfn,vaddr) do { } while (0)
 #endif

-#ifdef CONFIG_SMP
 static void flush_ptrace_access_other(void *args)
 {
 __flush_icache_all();
 }
-#endif

 static void
 flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
@@ -122,11 +121,9 @@ void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 if (vma->vm_flags & VM_EXEC) {
 unsigned long addr = (unsigned long)kaddr;
 __cpuc_coherent_kern_range(addr, addr + len);
-#ifdef CONFIG_SMP
 if (cache_ops_need_broadcast())
 smp_call_function(flush_ptrace_access_other,
 NULL, 1);
-#endif
 }
 }

@@ -261,12 +258,10 @@ void flush_dcache_page(struct page *page)

 mapping = page_mapping(page);

-#ifndef CONFIG_SMP
- if (!PageHighMem(page) && mapping && !mapping_mapped(mapping))
+ if (!cache_ops_need_broadcast() &&
+ !PageHighMem(page) && mapping && !mapping_mapped(mapping))
 clear_bit(PG_dcache_clean, &page->flags);
- else
-#endif
- {
+ else {
 __flush_dcache_page(mapping, page);
 if (mapping && cache_is_vivt())
 __flush_dcache_aliases(mapping, page);
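
Note the shape of the smp_plat.h change: on ARMv7,
cache_ops_need_broadcast() becomes the compile-time constant 0, so the
broadcast path above compiles away entirely. A host-side sketch of the
idea; smp_broadcast_flush() and the ASSUME_ARMV7 switch are made-up
names, not kernel interfaces:

#include <stdio.h>

#if !defined(CONFIG_SMP) || defined(ASSUME_ARMV7)
#define cache_ops_need_broadcast()      0
#else
static int cache_ops_need_broadcast(void) { return 1; }
#endif

static void smp_broadcast_flush(void)
{
        puts("IPI: flush the I-cache on the other CPUs");
}

static void flush_exec_range(void)
{
        puts("local coherent range flush");
        if (cache_ops_need_broadcast())  /* folds to if (0) on v7 */
                smp_broadcast_flush();
}

int main(void)
{
        flush_exec_range();
        return 0;
}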

* [PATCH v4 0/4] Patches for -next

From: Rabin Vincent @ 2010-06-22 10:29 UTC
To: linux-arm-kernel

On Mon, Jun 21, 2010 at 8:16 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> Pretty much the same content as v3 but I reordered the last three
> patches to make sure that switching to lazy cache flushing on ARMv7 SMP
> doesn't make the set_pte/cache flushing race more visible (issue handled
> by the set_pte_at patch).
>
> Catalin Marinas (4):
>      ARM: Remove the domain switching on ARMv6k/v7 CPUs
>      ARM: Assume new page cache pages have dirty D-cache
>      ARM: Synchronise the I and D caches via set_pte_at() on SMP systems
>      ARM: Use lazy cache flushing on ARMv7 SMP systems

I've tried patches 2, 3, and 4 on a Cortex-A9 SMP system and it resolves
the MMC rootfs init crash without the need for the
flush_kernel_dcache_page() change.

Tested-by: Rabin Vincent <rabin.vincent@stericsson.com>

Rabin

P.S.: Patch 1 (at least the version on your proposed branch), doesn't
seem to apply cleanly on linux-next.

* [PATCH v4 0/4] Patches for -next

From: Catalin Marinas @ 2010-06-22 11:34 UTC
To: linux-arm-kernel

On Tue, 2010-06-22 at 11:29 +0100, Rabin Vincent wrote:
> On Mon, Jun 21, 2010 at 8:16 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > Pretty much the same content as v3 but I reordered the last three
> > patches to make sure that switching to lazy cache flushing on ARMv7 SMP
> > doesn't make the set_pte/cache flushing race more visible (issue handled
> > by the set_pte_at patch).
> >
> > Catalin Marinas (4):
> >      ARM: Remove the domain switching on ARMv6k/v7 CPUs
> >      ARM: Assume new page cache pages have dirty D-cache
> >      ARM: Synchronise the I and D caches via set_pte_at() on SMP systems
> >      ARM: Use lazy cache flushing on ARMv7 SMP systems
>
> I've tried patches 2, 3, and 4 on a Cortex-A9 SMP system and it resolves
> the MMC rootfs init crash without the need for the
> flush_kernel_dcache_page() change.
>
> Tested-by: Rabin Vincent <rabin.vincent@stericsson.com>

Thanks.

> P.S.: Patch 1 (at least the version on your proposed branch), doesn't
> seem to apply cleanly on linux-next.

It's a bit more intrusive and I'd like to push it to linux-next (but
still waiting for review on this list).

--
Catalin