* [PATCH 0/6] Bug-fixes and new features for 2.6.34-rc1
@ 2009-12-07 14:10 Catalin Marinas
2009-12-07 14:10 ` [PATCH 1/6] Global ASID allocation on SMP Catalin Marinas
` (5 more replies)
0 siblings, 6 replies; 20+ messages in thread
From: Catalin Marinas @ 2009-12-07 14:10 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
There are some patches which I've had in my branch for some time and
which I would like to get merged for 2.6.34.
The first three are fixes to allow Linux to work better on SMP systems.
The next two are improvements for ARMv7. The last patch removes the
use of domains for ARMv6k and ARMv7 processors. One of the reasons is
that, because of domain switching, we can get speculative fetches from
I/O areas.
Catalin Marinas (6):
Global ASID allocation on SMP
Broadcast the DMA cache operations on ARMv6 SMP hardware
Fix a race in the vfp_notifier() function on SMP systems
ARMv7: Use lazy cache flushing if hardware broadcasts cache operations
ARMv7: Improved page table format with TRE and AFE
Remove the domain switching on ARMv6k/v7 CPUs
arch/arm/include/asm/assembler.h | 9 +-
arch/arm/include/asm/cacheflush.h | 29 ++++++++
arch/arm/include/asm/domain.h | 31 ++++++++
arch/arm/include/asm/futex.h | 9 +-
arch/arm/include/asm/memory.h | 6 +-
arch/arm/include/asm/mmu.h | 1
arch/arm/include/asm/mmu_context.h | 15 ++++
arch/arm/include/asm/page.h | 8 ++
arch/arm/include/asm/pgalloc.h | 10 ++-
arch/arm/include/asm/pgtable.h | 117 ++++++++++++++++++++++++++++----
arch/arm/include/asm/smp_plat.h | 9 ++
arch/arm/include/asm/uaccess.h | 16 ++--
arch/arm/kernel/entry-armv.S | 4 +
arch/arm/kernel/smp.c | 133 ++++++++++++++++++++++++++++++++++++
arch/arm/kernel/traps.c | 17 +++++
arch/arm/lib/getuser.S | 13 ++--
arch/arm/lib/putuser.S | 29 ++++----
arch/arm/lib/uaccess.S | 83 +++++++++++-----------
arch/arm/mm/Kconfig | 26 +++++++
arch/arm/mm/context.c | 120 +++++++++++++++++++++++++++++---
arch/arm/mm/dma-mapping.c | 20 ++++-
arch/arm/mm/fault-armv.c | 2 -
arch/arm/mm/fault.c | 10 +++
arch/arm/mm/flush.c | 9 +-
arch/arm/mm/mmu.c | 7 +-
arch/arm/mm/proc-v7.S | 58 ++++++----------
arch/arm/vfp/vfpmodule.c | 25 ++++++-
27 files changed, 647 insertions(+), 169 deletions(-)
--
Catalin
^ permalink raw reply [flat|nested] 20+ messages in thread* [PATCH 1/6] Global ASID allocation on SMP 2009-12-07 14:10 [PATCH 0/6] Bug-fixes and new features for 2.6.34-rc1 Catalin Marinas @ 2009-12-07 14:10 ` Catalin Marinas 2009-12-07 14:13 ` [PATCH 2/6] Broadcast the DMA cache operations on ARMv6 SMP hardware Catalin Marinas ` (4 subsequent siblings) 5 siblings, 0 replies; 20+ messages in thread From: Catalin Marinas @ 2009-12-07 14:10 UTC (permalink / raw) To: linux-arm-kernel The current ASID allocation algorithm doesn't ensure the notification of the other CPUs when the ASID rolls over. This may lead to two processes using the same ASID (but different generation) or multiple threads of the same process using different ASIDs. This patch adds the broadcasting of the ASID rollover event to the other CPUs. To avoid a race on multiple CPUs modifying "cpu_last_asid" during the handling of the broadcast, the ASID numbering now starts at "smp_processor_id() + 1". At rollover, the cpu_last_asid will be set to NR_CPUS. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- arch/arm/include/asm/mmu.h | 1 arch/arm/include/asm/mmu_context.h | 15 +++++ arch/arm/mm/context.c | 120 ++++++++++++++++++++++++++++++++---- 3 files changed, 122 insertions(+), 14 deletions(-) diff --git a/arch/arm/include/asm/mmu.h b/arch/arm/include/asm/mmu.h index b561584..68870c7 100644 --- a/arch/arm/include/asm/mmu.h +++ b/arch/arm/include/asm/mmu.h @@ -6,6 +6,7 @@ typedef struct { #ifdef CONFIG_CPU_HAS_ASID unsigned int id; + spinlock_t id_lock; #endif unsigned int kvm_seq; } mm_context_t; diff --git a/arch/arm/include/asm/mmu_context.h b/arch/arm/include/asm/mmu_context.h index de6cefb..a0b3cac 100644 --- a/arch/arm/include/asm/mmu_context.h +++ b/arch/arm/include/asm/mmu_context.h @@ -43,12 +43,23 @@ void __check_kvm_seq(struct mm_struct *mm); #define ASID_FIRST_VERSION (1 << ASID_BITS) extern unsigned int cpu_last_asid; +#ifdef CONFIG_SMP +DECLARE_PER_CPU(struct mm_struct *, current_mm); +#endif void __init_new_context(struct task_struct *tsk, struct mm_struct *mm); void __new_context(struct mm_struct *mm); static inline void check_context(struct mm_struct *mm) { + /* + * This code is executed with interrupts enabled. Therefore, + * mm->context.id cannot be updated to the latest ASID version + * on a different CPU (and condition below not triggered) + * without first getting an IPI to reset the context. The + * alternative is to take a read_lock on mm->context.id_lock + * (after changing its type to rwlock_t). 
+ */ if (unlikely((mm->context.id ^ cpu_last_asid) >> ASID_BITS)) __new_context(mm); @@ -108,6 +119,10 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next, __flush_icache_all(); #endif if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next)) || prev != next) { +#ifdef CONFIG_SMP + struct mm_struct **crt_mm = &per_cpu(current_mm, cpu); + *crt_mm = next; +#endif check_context(next); cpu_switch_mm(next->pgd, next); if (cache_is_vivt()) diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c index a9e22e3..626375b 100644 --- a/arch/arm/mm/context.c +++ b/arch/arm/mm/context.c @@ -10,12 +10,17 @@ #include <linux/init.h> #include <linux/sched.h> #include <linux/mm.h> +#include <linux/smp.h> +#include <linux/percpu.h> #include <asm/mmu_context.h> #include <asm/tlbflush.h> static DEFINE_SPINLOCK(cpu_asid_lock); unsigned int cpu_last_asid = ASID_FIRST_VERSION; +#ifdef CONFIG_SMP +DEFINE_PER_CPU(struct mm_struct *, current_mm); +#endif /* * We fork()ed a process, and we need a new context for the child @@ -26,13 +31,105 @@ unsigned int cpu_last_asid = ASID_FIRST_VERSION; void __init_new_context(struct task_struct *tsk, struct mm_struct *mm) { mm->context.id = 0; + spin_lock_init(&mm->context.id_lock); } +static void flush_context(void) +{ + /* set the reserved ASID before flushing the TLB */ + asm("mcr p15, 0, %0, c13, c0, 1\n" : : "r" (0)); + isb(); + local_flush_tlb_all(); + if (icache_is_vivt_asid_tagged()) { + __flush_icache_all(); + dsb(); + } +} + +#ifdef CONFIG_SMP + +static void set_mm_context(struct mm_struct *mm, unsigned int asid) +{ + /* + * Locking needed for multi-threaded applications where the + * same mm->context.id could be set from different CPUs during + * the broadcast. + */ + spin_lock(&mm->context.id_lock); + if (likely((mm->context.id ^ cpu_last_asid) >> ASID_BITS)) { + /* + * Old version of ASID found. Set the new one and + * reset mm_cpumask(mm). + */ + mm->context.id = asid; + cpumask_clear(mm_cpumask(mm)); + } + spin_unlock(&mm->context.id_lock); + + /* + * Set the mm_cpumask(mm) bit for the current CPU. + */ + cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm)); +} + +/* + * Reset the ASID on the current CPU. This function call is broadcast + * from the CPU handling the ASID rollover and holding cpu_asid_lock. + */ +static void reset_context(void *info) +{ + unsigned int asid; + unsigned int cpu = smp_processor_id(); + struct mm_struct *mm = per_cpu(current_mm, cpu); + + /* + * Check if a current_mm was set on this CPU as it might still + * be in the early booting stages and using the reserved ASID. + */ + if (!mm) + return; + + smp_rmb(); + asid = cpu_last_asid + cpu + 1; + + flush_context(); + set_mm_context(mm, asid); + + /* set the new ASID */ + asm("mcr p15, 0, %0, c13, c0, 1\n" : : "r" (mm->context.id)); +} + +#else + +static inline void set_mm_context(struct mm_struct *mm, unsigned int asid) +{ + mm->context.id = asid; + cpumask_copy(mm_cpumask(mm), cpumask_of(smp_processor_id())); +} + +#endif + void __new_context(struct mm_struct *mm) { unsigned int asid; spin_lock(&cpu_asid_lock); +#ifdef CONFIG_SMP + /* + * Check the ASID again, in case the change was broadcast from + * another CPU before we acquired the lock. 
+ */ + if (unlikely(((mm->context.id ^ cpu_last_asid) >> ASID_BITS) == 0)) { + cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm)); + spin_unlock(&cpu_asid_lock); + return; + } +#endif + /* + * At this point, it is guaranteed that the current mm (with + * an old ASID) isn't active on any other CPU since the ASIDs + * are changed simultaneously via IPI. + */ asid = ++cpu_last_asid; if (asid == 0) asid = cpu_last_asid = ASID_FIRST_VERSION; @@ -42,20 +139,15 @@ void __new_context(struct mm_struct *mm) * to start a new version and flush the TLB. */ if (unlikely((asid & ~ASID_MASK) == 0)) { - asid = ++cpu_last_asid; - /* set the reserved ASID before flushing the TLB */ - asm("mcr p15, 0, %0, c13, c0, 1 @ set reserved context ID\n" - : - : "r" (0)); - isb(); - flush_tlb_all(); - if (icache_is_vivt_asid_tagged()) { - __flush_icache_all(); - dsb(); - } + asid = cpu_last_asid + smp_processor_id() + 1; + flush_context(); +#ifdef CONFIG_SMP + smp_wmb(); + smp_call_function(reset_context, NULL, 1); +#endif + cpu_last_asid += NR_CPUS; } - spin_unlock(&cpu_asid_lock); - cpumask_copy(mm_cpumask(mm), cpumask_of(smp_processor_id())); - mm->context.id = asid; + set_mm_context(mm, asid); + spin_unlock(&cpu_asid_lock); } ^ permalink raw reply related [flat|nested] 20+ messages in thread
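A minimal user-space sketch of the generation check and rollover numbering described in the patch above, assuming ASID_BITS = 8 and NR_CPUS = 4; the names mirror the patch but nothing here is kernel code:

#include <stdio.h>

#define ASID_BITS		8
#define ASID_FIRST_VERSION	(1u << ASID_BITS)
#define NR_CPUS			4

static unsigned int cpu_last_asid = ASID_FIRST_VERSION;

/* Non-zero when "id" belongs to an older ASID generation than cpu_last_asid. */
static int needs_new_asid(unsigned int id)
{
	return (id ^ cpu_last_asid) >> ASID_BITS;
}

int main(void)
{
	unsigned int id = ++cpu_last_asid;		/* task allocated ASID 0x101 */

	printf("stale: %d\n", needs_new_asid(id));	/* 0 - same generation */

	/*
	 * Simulate a rollover: the initiating CPU takes
	 * cpu_last_asid + smp_processor_id() + 1, every other CPU takes its
	 * own slot via IPI, and the counter then advances by NR_CPUS.
	 */
	cpu_last_asid = 2 * ASID_FIRST_VERSION + NR_CPUS;

	printf("stale: %d\n", needs_new_asid(id));	/* non-zero - __new_context() runs */
	return 0;
}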
* [PATCH 2/6] Broadcast the DMA cache operations on ARMv6 SMP hardware 2009-12-07 14:10 [PATCH 0/6] Bug-fixes and new features for 2.6.34-rc1 Catalin Marinas 2009-12-07 14:10 ` [PATCH 1/6] Global ASID allocation on SMP Catalin Marinas @ 2009-12-07 14:13 ` Catalin Marinas 2009-12-07 14:13 ` [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems Catalin Marinas ` (3 subsequent siblings) 5 siblings, 0 replies; 20+ messages in thread From: Catalin Marinas @ 2009-12-07 14:13 UTC (permalink / raw) To: linux-arm-kernel The Snoop Control Unit on the ARM11MPCore hardware does not detect the cache operations and the dma_cache_maint() function may leave stale cache entries on other CPUs. The solution is to broadcast the cache operations to the other CPUs in software. However, there is no restriction to the contexts in which dma_cache_maint() function can be called (interrupt context or IRQs disabled). This patch implements the smp_dma_cache_op() function which performs the broadcast and it can be called with interrupts disabled or from interrupt context. To avoid deadlocking when more than one CPU try to invoke this function, the implementation uses spin_trylock() loop if the IRQs are disabled and, if the lock cannot be acquired, it polls for an incoming IPI and executes it. In the unlikely situation of two or more CPUs calling the smp_dma_cache_op() function with interrupts disabled, there may be spurious (or delayed) IPIs after a CPU completes and enables the IRQs. These are handled by checking the corresponding "unfinished" bits in the IPI handler. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- Just a note - the DMA cache ops broadcasting in software cannot easily use the generic IPI functionality in the kernel because of the restriction to have the interrupts enabled when invoking smp_call_function(). Another reason to do it separately is that the introduced smp_dma_cache_op() function runs the DMA cache operation locally in parallel with the other CPUs while smp_call_function() would only run it on the other CPUs in parallel but not with the current CPU. 
arch/arm/include/asm/cacheflush.h | 29 ++++++++ arch/arm/kernel/smp.c | 133 +++++++++++++++++++++++++++++++++++++ arch/arm/mm/Kconfig | 5 + arch/arm/mm/dma-mapping.c | 14 ++-- 4 files changed, 174 insertions(+), 7 deletions(-) diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h index 3d0cdd2..b3c53f5 100644 --- a/arch/arm/include/asm/cacheflush.h +++ b/arch/arm/include/asm/cacheflush.h @@ -280,6 +280,35 @@ extern void dmac_flush_range(const void *, const void *); #endif +#ifdef CONFIG_CPU_NO_CACHE_BCAST +enum smp_dma_cache_type { + SMP_DMA_CACHE_INV, + SMP_DMA_CACHE_CLEAN, + SMP_DMA_CACHE_FLUSH, +}; + +extern void smp_dma_cache_op(int type, const void *start, const void *end); + +static inline void smp_dma_inv_range(const void *start, const void *end) +{ + smp_dma_cache_op(SMP_DMA_CACHE_INV, start, end); +} + +static inline void smp_dma_clean_range(const void *start, const void *end) +{ + smp_dma_cache_op(SMP_DMA_CACHE_CLEAN, start, end); +} + +static inline void smp_dma_flush_range(const void *start, const void *end) +{ + smp_dma_cache_op(SMP_DMA_CACHE_FLUSH, start, end); +} +#else +#define smp_dma_inv_range dmac_inv_range +#define smp_dma_clean_range dmac_clean_range +#define smp_dma_flush_range dmac_flush_range +#endif + #ifdef CONFIG_OUTER_CACHE extern struct outer_cache_fns outer_cache; diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 57162af..27827bd 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -65,6 +65,9 @@ enum ipi_msg_type { IPI_CALL_FUNC, IPI_CALL_FUNC_SINGLE, IPI_CPU_STOP, +#ifdef CONFIG_CPU_NO_CACHE_BCAST + IPI_DMA_CACHE, +#endif }; int __cpuinit __cpu_up(unsigned int cpu) @@ -473,6 +476,10 @@ static void ipi_cpu_stop(unsigned int cpu) cpu_relax(); } +#ifdef CONFIG_CPU_NO_CACHE_BCAST +static void ipi_dma_cache_op(unsigned int cpu); +#endif + /* * Main handler for inter-processor interrupts * @@ -532,6 +539,12 @@ asmlinkage void __exception do_IPI(struct pt_regs *regs) ipi_cpu_stop(cpu); break; +#ifdef CONFIG_CPU_NO_CACHE_BCAST + case IPI_DMA_CACHE: + ipi_dma_cache_op(cpu); + break; +#endif + default: printk(KERN_CRIT "CPU%u: Unknown IPI message 0x%x\n", cpu, nextmsg); @@ -687,3 +700,123 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end) } else local_flush_tlb_kernel_range(start, end); } + +#ifdef CONFIG_CPU_NO_CACHE_BCAST +/* + * DMA cache maintenance operations on SMP if the automatic hardware + * broadcasting is not available + */ +struct smp_dma_cache_struct { + int type; + const void *start; + const void *end; + cpumask_t unfinished; +}; + +static struct smp_dma_cache_struct *smp_dma_cache_data; +static DEFINE_RWLOCK(smp_dma_cache_data_lock); +static DEFINE_SPINLOCK(smp_dma_cache_lock); + +static void local_dma_cache_op(int type, const void *start, const void *end) +{ + switch (type) { + case SMP_DMA_CACHE_INV: + dmac_inv_range(start, end); + break; + case SMP_DMA_CACHE_CLEAN: + dmac_clean_range(start, end); + break; + case SMP_DMA_CACHE_FLUSH: + dmac_flush_range(start, end); + break; + default: + printk(KERN_CRIT "CPU%u: Unknown SMP DMA cache type %d\n", + smp_processor_id(), type); + } +} + +/* + * This function must be executed with interrupts disabled. 
+ */ +static void ipi_dma_cache_op(unsigned int cpu) +{ + read_lock(&smp_dma_cache_data_lock); + + /* check for spurious IPI */ + if ((smp_dma_cache_data == NULL) || + (!cpu_isset(cpu, smp_dma_cache_data->unfinished))) + goto out; + local_dma_cache_op(smp_dma_cache_data->type, + smp_dma_cache_data->start, smp_dma_cache_data->end); + cpu_clear(cpu, smp_dma_cache_data->unfinished); + out: + read_unlock(&smp_dma_cache_data_lock); +} + +/* + * Execute the DMA cache operations on all online CPUs. This function + * can be called with interrupts disabled or from interrupt context. + */ +static void __smp_dma_cache_op(int type, const void *start, const void *end) +{ + struct smp_dma_cache_struct data; + cpumask_t callmap = cpu_online_map; + unsigned int cpu = get_cpu(); + unsigned long flags; + + cpu_clear(cpu, callmap); + data.type = type; + data.start = start; + data.end = end; + data.unfinished = callmap; + + /* + * If the spinlock cannot be acquired, other CPU is trying to + * send an IPI. If the interrupts are disabled, we have to + * poll for an incoming IPI. + */ + while (!spin_trylock_irqsave(&smp_dma_cache_lock, flags)) { + if (irqs_disabled()) + ipi_dma_cache_op(cpu); + } + + write_lock(&smp_dma_cache_data_lock); + smp_dma_cache_data = &data; + write_unlock(&smp_dma_cache_data_lock); + + if (!cpus_empty(callmap)) + send_ipi_message(&callmap, IPI_DMA_CACHE); + /* run the local operation in parallel with the other CPUs */ + local_dma_cache_op(type, start, end); + + while (!cpus_empty(data.unfinished)) + barrier(); + + write_lock(&smp_dma_cache_data_lock); + smp_dma_cache_data = NULL; + write_unlock(&smp_dma_cache_data_lock); + + spin_unlock_irqrestore(&smp_dma_cache_lock, flags); + put_cpu(); +} + +#define DMA_MAX_RANGE SZ_4K + +/* + * Split the cache range in smaller pieces if interrupts are enabled + * to reduce the latency caused by disabling the interrupts during the + * broadcast. 
+ */ +void smp_dma_cache_op(int type, const void *start, const void *end) +{ + if (irqs_disabled() || (end - start <= DMA_MAX_RANGE)) + __smp_dma_cache_op(type, start, end); + else { + const void *ptr; + for (ptr = start; ptr < end - DMA_MAX_RANGE; + ptr += DMA_MAX_RANGE) + __smp_dma_cache_op(type, ptr, ptr + DMA_MAX_RANGE); + __smp_dma_cache_op(type, ptr, end); + } +} +#endif diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig index 9264d81..ce382f5 100644 --- a/arch/arm/mm/Kconfig +++ b/arch/arm/mm/Kconfig @@ -516,6 +516,11 @@ config CPU_CACHE_VIPT config CPU_CACHE_FA bool +config CPU_NO_CACHE_BCAST + bool + depends on SMP + default y if CPU_V6 + if MMU # The copy-page model config CPU_COPY_V3 diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index b9590a7..176c696 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -219,7 +219,7 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, { void *ptr = page_address(page); memset(ptr, 0, size); - dmac_flush_range(ptr, ptr + size); + smp_dma_flush_range(ptr, ptr + size); outer_flush_range(__pa(ptr), __pa(ptr) + size); } @@ -548,15 +548,15 @@ void dma_cache_maint(const void *start, size_t size, int direction) switch (direction) { case DMA_FROM_DEVICE: /* invalidate only */ - inner_op = dmac_inv_range; + inner_op = smp_dma_inv_range; outer_op = outer_inv_range; break; case DMA_TO_DEVICE: /* writeback only */ - inner_op = dmac_clean_range; + inner_op = smp_dma_clean_range; outer_op = outer_clean_range; break; case DMA_BIDIRECTIONAL: /* writeback and invalidate */ - inner_op = dmac_flush_range; + inner_op = smp_dma_flush_range; outer_op = outer_flush_range; break; default: @@ -578,15 +578,15 @@ static void dma_cache_maint_contiguous(struct page *page, unsigned long offset, switch (direction) { case DMA_FROM_DEVICE: /* invalidate only */ - inner_op = dmac_inv_range; + inner_op = smp_dma_inv_range; outer_op = outer_inv_range; break; case DMA_TO_DEVICE: /* writeback only */ - inner_op = dmac_clean_range; + inner_op = smp_dma_clean_range; outer_op = outer_clean_range; break; case DMA_BIDIRECTIONAL: /* writeback and invalidate */ - inner_op = dmac_flush_range; + inner_op = smp_dma_flush_range; outer_op = outer_flush_range; break; default: ^ permalink raw reply related [flat|nested] 20+ messages in thread
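A stand-alone sketch of the "trylock or service the other CPU's request" pattern that __smp_dma_cache_op() above relies on, written with C11 atomics; the lock flag and the two helper stubs are illustrative stand-ins, not kernel API:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_flag dma_lock = ATOMIC_FLAG_INIT;

/* Stand-in for ipi_dma_cache_op(): handle a request another CPU published. */
static void service_pending_request(void)
{
	/* check the shared descriptor, run the cache op, clear our bit */
}

/* Stand-in for local_dma_cache_op(). */
static void do_local_cache_op(const void *start, const void *end)
{
	(void)start;
	(void)end;
}

static void broadcast_cache_op(const void *start, const void *end,
			       bool irqs_disabled)
{
	/*
	 * If another CPU holds the lock it is waiting for everyone to
	 * acknowledge its IPI; with interrupts disabled we cannot take that
	 * IPI, so poll for the pending request ourselves instead of
	 * spinning blindly, otherwise the two CPUs deadlock.
	 */
	while (atomic_flag_test_and_set_explicit(&dma_lock,
						 memory_order_acquire)) {
		if (irqs_disabled)
			service_pending_request();
	}

	/* publish the descriptor, send IPI_DMA_CACHE to the other CPUs ... */
	do_local_cache_op(start, end);	/* run locally in parallel */
	/* ... wait for the other CPUs to clear their "unfinished" bits */

	atomic_flag_clear_explicit(&dma_lock, memory_order_release);
}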
* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems 2009-12-07 14:10 [PATCH 0/6] Bug-fixes and new features for 2.6.34-rc1 Catalin Marinas 2009-12-07 14:10 ` [PATCH 1/6] Global ASID allocation on SMP Catalin Marinas 2009-12-07 14:13 ` [PATCH 2/6] Broadcast the DMA cache operations on ARMv6 SMP hardware Catalin Marinas @ 2009-12-07 14:13 ` Catalin Marinas 2009-12-12 12:24 ` Russell King - ARM Linux 2009-12-07 14:13 ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas ` (2 subsequent siblings) 5 siblings, 1 reply; 20+ messages in thread From: Catalin Marinas @ 2009-12-07 14:13 UTC (permalink / raw) To: linux-arm-kernel The vfp_notifier(THREAD_NOTIFY_RELEASE) maybe be called with thread->cpu different from the current one, causing a race condition with both the THREAD_NOTIFY_SWITCH path and vfp_support_entry(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- arch/arm/vfp/vfpmodule.c | 25 ++++++++++++++++++++++--- 1 files changed, 22 insertions(+), 3 deletions(-) diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index 2d7423a..fa6692a 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -14,6 +14,7 @@ #include <linux/signal.h> #include <linux/sched.h> #include <linux/init.h> +#include <linux/rcupdate.h> #include <asm/thread_notify.h> #include <asm/vfp.h> @@ -49,14 +50,21 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v) #ifdef CONFIG_SMP /* + * RCU locking is needed in case last_VFP_context[cpu] is + * released on a different CPU. + */ + rcu_read_lock(); + vfp = last_VFP_context[cpu]; + /* * On SMP, if VFP is enabled, save the old state in * case the thread migrates to a different CPU. The * restoring is done lazily. */ - if ((fpexc & FPEXC_EN) && last_VFP_context[cpu]) { - vfp_save_state(last_VFP_context[cpu], fpexc); - last_VFP_context[cpu]->hard.cpu = cpu; + if ((fpexc & FPEXC_EN) && vfp) { + vfp_save_state(vfp, fpexc); + vfp->hard.cpu = cpu; } + rcu_read_unlock(); /* * Thread migration, just force the reloading of the * state on the new CPU in case the VFP registers @@ -91,8 +99,19 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v) } /* flush and release case: Per-thread VFP cleanup. */ +#ifndef CONFIG_SMP if (last_VFP_context[cpu] == vfp) last_VFP_context[cpu] = NULL; +#else + /* + * Since release_thread() may be called from a different CPU, we use + * cmpxchg() here to avoid a race with the vfp_support_entry() code + * which modifies last_VFP_context[cpu]. Note that on SMP systems, a + * STR instruction on a different CPU clears the global exclusive + * monitor state. + */ + (void)cmpxchg(&last_VFP_context[cpu], vfp, NULL); +#endif return NOTIFY_DONE; } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems 2009-12-07 14:13 ` [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems Catalin Marinas @ 2009-12-12 12:24 ` Russell King - ARM Linux 2009-12-12 13:57 ` Russell King - ARM Linux 2009-12-14 12:15 ` Catalin Marinas 0 siblings, 2 replies; 20+ messages in thread From: Russell King - ARM Linux @ 2009-12-12 12:24 UTC (permalink / raw) To: linux-arm-kernel On Mon, Dec 07, 2009 at 02:13:34PM +0000, Catalin Marinas wrote: > The vfp_notifier(THREAD_NOTIFY_RELEASE) maybe be called with thread->cpu > different from the current one, causing a race condition with both the > THREAD_NOTIFY_SWITCH path and vfp_support_entry(). The only call where thread->cpu may not be the current CPU is in the THREAD_NOFTIFY_RELEASE case. When called in the THREAD_NOTIFY_SWITCH case, we are switching to the specified thread, and thread->cpu better be smp_processor_id() or else we're saving our CPUs VFP state into some other CPUs currently running thread. Not only that, but the thread we're switching away from will still be 'owned' by the CPU we're running on, and can't be scheduled onto another CPU without this function first completing, nor can it be flushed nor released. > @@ -49,14 +50,21 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v) > > #ifdef CONFIG_SMP BUG_ON(cpu != smp_processor_id()); since it would be very bad if this was any different. Note that this is also a non-preemptible context - we're called from the scheduler, and the scheduler can't be preempted mid-thead switch. > /* > + * RCU locking is needed in case last_VFP_context[cpu] is > + * released on a different CPU. > + */ > + rcu_read_lock(); Given that we're modifying our CPUs last_VFP_context here, I don't see what the RCU locks give us - the thread we're switching to can _not_ be being released at this time - we can't be switching to a dead task. Not only that, but this notifier is already called under the RCU lock, so this is a no-op. > + vfp = last_VFP_context[cpu]; > + /* > * On SMP, if VFP is enabled, save the old state in > * case the thread migrates to a different CPU. The > * restoring is done lazily. > */ > - if ((fpexc & FPEXC_EN) && last_VFP_context[cpu]) { > - vfp_save_state(last_VFP_context[cpu], fpexc); > - last_VFP_context[cpu]->hard.cpu = cpu; > + if ((fpexc & FPEXC_EN) && vfp) { > + vfp_save_state(vfp, fpexc); > + vfp->hard.cpu = cpu; > } > + rcu_read_unlock(); > /* > * Thread migration, just force the reloading of the > * state on the new CPU in case the VFP registers > @@ -91,8 +99,19 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v) > } > > /* flush and release case: Per-thread VFP cleanup. */ > +#ifndef CONFIG_SMP > if (last_VFP_context[cpu] == vfp) > last_VFP_context[cpu] = NULL; > +#else > + /* > + * Since release_thread() may be called from a different CPU, we use > + * cmpxchg() here to avoid a race with the vfp_support_entry() code > + * which modifies last_VFP_context[cpu]. Note that on SMP systems, a > + * STR instruction on a different CPU clears the global exclusive > + * monitor state. > + */ > + (void)cmpxchg(&last_VFP_context[cpu], vfp, NULL); > +#endif I think this hunk is the only part which makes sense. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems 2009-12-12 12:24 ` Russell King - ARM Linux @ 2009-12-12 13:57 ` Russell King - ARM Linux 2009-12-14 12:21 ` Catalin Marinas 2009-12-14 12:15 ` Catalin Marinas 1 sibling, 1 reply; 20+ messages in thread From: Russell King - ARM Linux @ 2009-12-12 13:57 UTC (permalink / raw) To: linux-arm-kernel On Sat, Dec 12, 2009 at 12:24:47PM +0000, Russell King - ARM Linux wrote: > On Mon, Dec 07, 2009 at 02:13:34PM +0000, Catalin Marinas wrote: > > The vfp_notifier(THREAD_NOTIFY_RELEASE) maybe be called with thread->cpu > > different from the current one, causing a race condition with both the > > THREAD_NOTIFY_SWITCH path and vfp_support_entry(). > > The only call where thread->cpu may not be the current CPU is in the > THREAD_NOFTIFY_RELEASE case. > > When called in the THREAD_NOTIFY_SWITCH case, we are switching to the > specified thread, and thread->cpu better be smp_processor_id() or else > we're saving our CPUs VFP state into some other CPUs currently running > thread. > > Not only that, but the thread we're switching away from will still be > 'owned' by the CPU we're running on, and can't be scheduled onto another > CPU without this function first completing, nor can it be flushed nor > released. Here's a patch which adds this documentation, and fixes the THREAD_NOTIFY_FLUSH case - since that could be preempted. diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index 2d7423a..aed05bc 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -38,16 +38,72 @@ union vfp_state *last_VFP_context[NR_CPUS]; */ unsigned int VFP_arch; +/* + * Per-thread VFP initialization. + */ +static void vfp_thread_flush(struct thread_info *thread) +{ + union vfp_state *vfp = &thread->vfpstate; + unsigned int cpu; + + memset(vfp, 0, sizeof(union vfp_state)); + + vfp->hard.fpexc = FPEXC_EN; + vfp->hard.fpscr = FPSCR_ROUND_NEAREST; + + /* + * Disable VFP to ensure we initialize it first. We must ensure + * that the modification of last_VFP_context[] and hardware disable + * are done for the same CPU and without preemption. + */ + cpu = get_cpu(); + if (last_VFP_context[cpu] == vfp) + last_VFP_context[cpu] = NULL; + fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN); + put_cpu(); +} + +static void vfp_thread_release(struct thread_info *thread) +{ + /* release case: Per-thread VFP cleanup. */ + union vfp_state *vfp = &thread->vfpstate; + unsigned int cpu = thread->cpu; + + if (last_VFP_context[cpu] == vfp) + last_VFP_context[cpu] = NULL; +} + +/* + * When this function is called with the following 'cmd's, the following + * is true while this function is being run: + * THREAD_NOFTIFY_SWTICH: + * - the previously running thread will not be scheduled onto another CPU. + * - the next thread to be run (v) will not be running on another CPU. + * - thread->cpu is the local CPU number + * - not preemptible as we're called in the middle of a thread switch + * THREAD_NOTIFY_FLUSH: + * - the thread (v) will be running on the local CPU, so + * v === current_thread_info() + * - thread->cpu is the local CPU number at the time it is accessed, + * but may change at any time. + * - we could be preempted if tree preempt rcu is enabled, so + * it is unsafe to use thread->cpu. + * THREAD_NOTIFY_RELEASE: + * - the thread (v) will not be running on any CPU; it is a dead thread. + * - thread->cpu will be the last CPU the thread ran on, which may not + * be the current CPU. + * - we could be preempted if tree preempt rcu is enabled. 
+ */ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v) { struct thread_info *thread = v; - union vfp_state *vfp; - __u32 cpu = thread->cpu; if (likely(cmd == THREAD_NOTIFY_SWITCH)) { u32 fpexc = fmrx(FPEXC); #ifdef CONFIG_SMP + unsigned int cpu = thread->cpu; + /* * On SMP, if VFP is enabled, save the old state in * case the thread migrates to a different CPU. The @@ -74,25 +130,10 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v) return NOTIFY_DONE; } - vfp = &thread->vfpstate; - if (cmd == THREAD_NOTIFY_FLUSH) { - /* - * Per-thread VFP initialisation. - */ - memset(vfp, 0, sizeof(union vfp_state)); - - vfp->hard.fpexc = FPEXC_EN; - vfp->hard.fpscr = FPSCR_ROUND_NEAREST; - - /* - * Disable VFP to ensure we initialise it first. - */ - fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN); - } - - /* flush and release case: Per-thread VFP cleanup. */ - if (last_VFP_context[cpu] == vfp) - last_VFP_context[cpu] = NULL; + if (cmd == THREAD_NOTIFY_FLUSH) + vfp_thread_flush(thread); + else + vfp_thread_release(thread); return NOTIFY_DONE; } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems 2009-12-12 13:57 ` Russell King - ARM Linux @ 2009-12-14 12:21 ` Catalin Marinas 0 siblings, 0 replies; 20+ messages in thread From: Catalin Marinas @ 2009-12-14 12:21 UTC (permalink / raw) To: linux-arm-kernel On Sat, 2009-12-12 at 13:57 +0000, Russell King - ARM Linux wrote: > On Sat, Dec 12, 2009 at 12:24:47PM +0000, Russell King - ARM Linux wrote: > > On Mon, Dec 07, 2009 at 02:13:34PM +0000, Catalin Marinas wrote: > > > The vfp_notifier(THREAD_NOTIFY_RELEASE) maybe be called with thread->cpu > > > different from the current one, causing a race condition with both the > > > THREAD_NOTIFY_SWITCH path and vfp_support_entry(). > > > > The only call where thread->cpu may not be the current CPU is in the > > THREAD_NOFTIFY_RELEASE case. > > > > When called in the THREAD_NOTIFY_SWITCH case, we are switching to the > > specified thread, and thread->cpu better be smp_processor_id() or else > > we're saving our CPUs VFP state into some other CPUs currently running > > thread. > > > > Not only that, but the thread we're switching away from will still be > > 'owned' by the CPU we're running on, and can't be scheduled onto another > > CPU without this function first completing, nor can it be flushed nor > > released. > > Here's a patch which adds this documentation, and fixes the > THREAD_NOTIFY_FLUSH case - since that could be preempted. [...] > + * THREAD_NOFTIFY_SWTICH: > + * - the previously running thread will not be scheduled onto another CPU. While this comment is certainly true, I don't think it is relevant since we aren't always switching the VFP context from the thread being switched out but it may be a thread that ran much earlier. Otherwise the patch is fine. -- Catalin ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems 2009-12-12 12:24 ` Russell King - ARM Linux 2009-12-12 13:57 ` Russell King - ARM Linux @ 2009-12-14 12:15 ` Catalin Marinas 2009-12-14 16:28 ` [PATCH 3/6] Fix a race in the vfp_notifier() function on SMPsystems Catalin Marinas 1 sibling, 1 reply; 20+ messages in thread From: Catalin Marinas @ 2009-12-14 12:15 UTC (permalink / raw) To: linux-arm-kernel On Sat, 2009-12-12 at 12:24 +0000, Russell King - ARM Linux wrote: > On Mon, Dec 07, 2009 at 02:13:34PM +0000, Catalin Marinas wrote: > > The vfp_notifier(THREAD_NOTIFY_RELEASE) maybe be called with thread->cpu > > different from the current one, causing a race condition with both the > > THREAD_NOTIFY_SWITCH path and vfp_support_entry(). > > The only call where thread->cpu may not be the current CPU is in the > THREAD_NOFTIFY_RELEASE case. Correct. > When called in the THREAD_NOTIFY_SWITCH case, we are switching to the > specified thread, and thread->cpu better be smp_processor_id() or else > we're saving our CPUs VFP state into some other CPUs currently running > thread. Also correct. > Not only that, but the thread we're switching away from will still be > 'owned' by the CPU we're running on, and can't be scheduled onto another > CPU without this function first completing, nor can it be flushed nor > released. Correct but see below. > > /* > > + * RCU locking is needed in case last_VFP_context[cpu] is > > + * released on a different CPU. > > + */ > > + rcu_read_lock(); > > Given that we're modifying our CPUs last_VFP_context here, I don't see > what the RCU locks give us - the thread we're switching to can _not_ > be being released at this time - we can't be switching to a dead task. > Not only that, but this notifier is already called under the RCU lock, > so this is a no-op. With the current implementation, the last_CPU_context is set when the CPU takes an undef for a VFP instruction and not during every context switch, so last_VFP_context may *not* point to the task we are switching away from. The RCU locking was added to prevent the task structure (and thread_info) pointed to by last_VFP_context from being freed while executing the switch between two tasks other than the one released. If the region is RCU locked anyway, we can simply add a comment but unless we change how last_CPU_context is set, we still need this check. -- Catalin ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMPsystems 2009-12-14 12:15 ` Catalin Marinas @ 2009-12-14 16:28 ` Catalin Marinas 0 siblings, 0 replies; 20+ messages in thread From: Catalin Marinas @ 2009-12-14 16:28 UTC (permalink / raw) To: linux-arm-kernel On Mon, 2009-12-14 at 12:15 +0000, Catalin Marinas wrote: > On Sat, 2009-12-12 at 12:24 +0000, Russell King - ARM Linux wrote: > > On Mon, Dec 07, 2009 at 02:13:34PM +0000, Catalin Marinas wrote: > > > /* > > > + * RCU locking is needed in case last_VFP_context[cpu] is > > > + * released on a different CPU. > > > + */ > > > + rcu_read_lock(); [...] > > Not only that, but this notifier is already called under the RCU lock, > > so this is a no-op. [...] > If the region is RCU locked anyway, we can simply add a comment but > unless we change how last_CPU_context is set, we still need this check. I updated patch below which doesn't take the RCU lock but adds a comment. Apart from the last hunk with which you are OK, the patch makes sure that last_VFP_context[cpu] is only read once and stored to a local variable otherwise you may have the surprise that it becomes NULL if the thread that was using it is released on a different CPU (that's actually the failure we were getting under stress testing). Fix a race in the vfp_notifier() function on SMP systems From: Catalin Marinas <catalin.marinas@arm.com> The vfp_notifier(THREAD_NOTIFY_RELEASE) maybe be called with thread->cpu different from the current one, causing a race condition with both the THREAD_NOTIFY_SWITCH path and vfp_support_entry(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- arch/arm/vfp/vfpmodule.c | 26 +++++++++++++++++++++++--- 1 files changed, 23 insertions(+), 3 deletions(-) diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index 2d7423a..8d1fe44 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -14,6 +14,7 @@ #include <linux/signal.h> #include <linux/sched.h> #include <linux/init.h> +#include <linux/rcupdate.h> #include <asm/thread_notify.h> #include <asm/vfp.h> @@ -49,14 +50,22 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v) #ifdef CONFIG_SMP /* + * The vfpstate structure pointed to by last_VFP_context[cpu] + * may be released via call_rcu(delayed_put_task_struct) but + * atomic_notifier_call_chain() already holds the RCU lock. + */ + vfp = last_VFP_context[cpu]; + + /* * On SMP, if VFP is enabled, save the old state in * case the thread migrates to a different CPU. The * restoring is done lazily. */ - if ((fpexc & FPEXC_EN) && last_VFP_context[cpu]) { - vfp_save_state(last_VFP_context[cpu], fpexc); - last_VFP_context[cpu]->hard.cpu = cpu; + if ((fpexc & FPEXC_EN) && vfp) { + vfp_save_state(vfp, fpexc); + vfp->hard.cpu = cpu; } + /* * Thread migration, just force the reloading of the * state on the new CPU in case the VFP registers @@ -91,8 +100,19 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v) } /* flush and release case: Per-thread VFP cleanup. */ +#ifndef CONFIG_SMP if (last_VFP_context[cpu] == vfp) last_VFP_context[cpu] = NULL; +#else + /* + * Since release_thread() may be called from a different CPU, we use + * cmpxchg() here to avoid a race with the vfp_support_entry() code + * which modifies last_VFP_context[cpu]. Note that on SMP systems, a + * STR instruction on a different CPU clears the global exclusive + * monitor state. 
+ */ + (void)cmpxchg(&last_VFP_context[cpu], vfp, NULL); +#endif return NOTIFY_DONE; } -- Catalin ^ permalink raw reply related [flat|nested] 20+ messages in thread
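For readers less familiar with the idiom in the last hunk, a small sketch of the same compare-and-exchange clear using C11 atomics instead of the kernel's cmpxchg(); the type and variable names are illustrative only:

#include <stdatomic.h>
#include <stddef.h>

struct vfp_state;				/* opaque per-thread FP state */

/* Stands in for last_VFP_context[cpu] on the thread's last CPU. */
static _Atomic(struct vfp_state *) last_ctx;

/*
 * Clear the per-CPU pointer only if it still refers to the thread being
 * released; if vfp_support_entry() on that CPU has already installed a
 * different owner, leave the new pointer untouched.
 */
static void release_vfp_context(struct vfp_state *vfp)
{
	struct vfp_state *expected = vfp;

	atomic_compare_exchange_strong(&last_ctx, &expected, NULL);
}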
* [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations 2009-12-07 14:10 [PATCH 0/6] Bug-fixes and new features for 2.6.34-rc1 Catalin Marinas ` (2 preceding siblings ...) 2009-12-07 14:13 ` [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems Catalin Marinas @ 2009-12-07 14:13 ` Catalin Marinas 2010-03-08 16:25 ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcastscache operations Catalin Marinas 2009-12-07 14:14 ` [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE Catalin Marinas 2009-12-07 14:16 ` [PATCH 6/6] Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas 5 siblings, 1 reply; 20+ messages in thread From: Catalin Marinas @ 2009-12-07 14:13 UTC (permalink / raw) To: linux-arm-kernel ARMv7 processors like Cortex-A9 broadcast the cache maintenance operations in hardware. The patch adds the CPU ID checks for such feature and allows the flush_dcache_page/update_mmu_cache pair to work in lazy flushing mode similar to the UP case. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- arch/arm/include/asm/smp_plat.h | 9 +++++++++ arch/arm/mm/fault-armv.c | 2 -- arch/arm/mm/flush.c | 9 ++++----- 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h index 59303e2..e587167 100644 --- a/arch/arm/include/asm/smp_plat.h +++ b/arch/arm/include/asm/smp_plat.h @@ -13,4 +13,13 @@ static inline int tlb_ops_need_broadcast(void) return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 2; } +#ifndef CONFIG_SMP +#define cache_ops_need_broadcast() 0 +#else +static inline int cache_ops_need_broadcast(void) +{ + return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 1; +} +#endif + #endif diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c index d0d17b6..bb60117 100644 --- a/arch/arm/mm/fault-armv.c +++ b/arch/arm/mm/fault-armv.c @@ -153,10 +153,8 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte) page = pfn_to_page(pfn); mapping = page_mapping(page); -#ifndef CONFIG_SMP if (test_and_clear_bit(PG_dcache_dirty, &page->flags)) __flush_dcache_page(mapping, page); -#endif if (mapping) { if (cache_is_vivt()) make_coherent(mapping, vma, addr, pfn); diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c index 7f294f3..2d3325d 100644 --- a/arch/arm/mm/flush.c +++ b/arch/arm/mm/flush.c @@ -15,6 +15,7 @@ #include <asm/cachetype.h> #include <asm/system.h> #include <asm/tlbflush.h> +#include <asm/smp_plat.h> #include "mm.h" @@ -198,12 +199,10 @@ void flush_dcache_page(struct page *page) { struct address_space *mapping = page_mapping(page); -#ifndef CONFIG_SMP - if (!PageHighMem(page) && mapping && !mapping_mapped(mapping)) + if (!cache_ops_need_broadcast() && + !PageHighMem(page) && mapping && !mapping_mapped(mapping)) set_bit(PG_dcache_dirty, &page->flags); - else -#endif - { + else { __flush_dcache_page(mapping, page); if (mapping && cache_is_vivt()) __flush_dcache_aliases(mapping, page); ^ permalink raw reply related [flat|nested] 20+ messages in thread
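The new cache_ops_need_broadcast() helper decodes the ID_MMFR3 "maintenance broadcast" field; a small sketch of that decode on sample register values (the values below are illustrative, not read from CP15):

#include <stdio.h>

/*
 * Bits [15:12] of ID_MMFR3: 0 = no hardware broadcast, 1 = cache and branch
 * predictor maintenance broadcast, 2 = cache, branch predictor and TLB
 * maintenance broadcast.
 */
static int cache_ops_need_broadcast(unsigned int mmfr3)
{
	return ((mmfr3 >> 12) & 0xf) < 1;
}

static int tlb_ops_need_broadcast(unsigned int mmfr3)
{
	return ((mmfr3 >> 12) & 0xf) < 2;
}

int main(void)
{
	unsigned int a9_like  = 0x2 << 12;	/* HW broadcasts cache + TLB ops */
	unsigned int mp11_like = 0x0 << 12;	/* ARM11MPCore-style: no broadcast */

	printf("cache bcast needed: %d %d\n",
	       cache_ops_need_broadcast(a9_like),
	       cache_ops_need_broadcast(mp11_like));	/* 0 1 */
	printf("tlb bcast needed:   %d %d\n",
	       tlb_ops_need_broadcast(a9_like),
	       tlb_ops_need_broadcast(mp11_like));	/* 0 1 */
	return 0;
}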
* [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcastscache operations 2009-12-07 14:13 ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas @ 2010-03-08 16:25 ` Catalin Marinas 2010-03-08 16:31 ` Russell King - ARM Linux 0 siblings, 1 reply; 20+ messages in thread From: Catalin Marinas @ 2010-03-08 16:25 UTC (permalink / raw) To: linux-arm-kernel Hi Russell, On Mon, 2009-12-07 at 14:13 +0000, Catalin Marinas wrote: > ARMv7 processors like Cortex-A9 broadcast the cache maintenance > operations in hardware. The patch adds the CPU ID checks for such > feature and allows the flush_dcache_page/update_mmu_cache pair to work > in lazy flushing mode similar to the UP case. It looks like I haven't got a final ok from you on this patch (I had the impression that it's in the patch system already but rebased my patches and found that it's not in mainline). Are you ok with it (quoting it below)? > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> > --- > arch/arm/include/asm/smp_plat.h | 9 +++++++++ > arch/arm/mm/fault-armv.c | 2 -- > arch/arm/mm/flush.c | 9 ++++----- > 3 files changed, 13 insertions(+), 7 deletions(-) > > diff --git a/arch/arm/include/asm/smp_plat.h > b/arch/arm/include/asm/smp_plat.h > index 59303e2..e587167 100644 > --- a/arch/arm/include/asm/smp_plat.h > +++ b/arch/arm/include/asm/smp_plat.h > @@ -13,4 +13,13 @@ static inline int tlb_ops_need_broadcast(void) > return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 2; > } > > +#ifndef CONFIG_SMP > +#define cache_ops_need_broadcast() 0 > +#else > +static inline int cache_ops_need_broadcast(void) > +{ > + return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 1; > +} > +#endif > + > #endif > diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c > index d0d17b6..bb60117 100644 > --- a/arch/arm/mm/fault-armv.c > +++ b/arch/arm/mm/fault-armv.c > @@ -153,10 +153,8 @@ void update_mmu_cache(struct vm_area_struct *vma, > unsigned long addr, pte_t pte) > > page = pfn_to_page(pfn); > mapping = page_mapping(page); > -#ifndef CONFIG_SMP > if (test_and_clear_bit(PG_dcache_dirty, &page->flags)) > __flush_dcache_page(mapping, page); > -#endif > if (mapping) { > if (cache_is_vivt()) > make_coherent(mapping, vma, addr, pfn); > diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c > index 7f294f3..2d3325d 100644 > --- a/arch/arm/mm/flush.c > +++ b/arch/arm/mm/flush.c > @@ -15,6 +15,7 @@ > #include <asm/cachetype.h> > #include <asm/system.h> > #include <asm/tlbflush.h> > +#include <asm/smp_plat.h> > > #include "mm.h" > > @@ -198,12 +199,10 @@ void flush_dcache_page(struct page *page) > { > struct address_space *mapping = page_mapping(page); > > -#ifndef CONFIG_SMP > - if (!PageHighMem(page) && mapping && !mapping_mapped(mapping)) > + if (!cache_ops_need_broadcast() && > + !PageHighMem(page) && mapping && !mapping_mapped(mapping)) > set_bit(PG_dcache_dirty, &page->flags); > - else > -#endif > - { > + else { > __flush_dcache_page(mapping, page); > if (mapping && cache_is_vivt()) > __flush_dcache_aliases(mapping, page); -- Catalin ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcastscache operations 2010-03-08 16:25 ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcastscache operations Catalin Marinas @ 2010-03-08 16:31 ` Russell King - ARM Linux 2010-03-08 16:38 ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardwarebroadcastscache operations Catalin Marinas 0 siblings, 1 reply; 20+ messages in thread From: Russell King - ARM Linux @ 2010-03-08 16:31 UTC (permalink / raw) To: linux-arm-kernel On Mon, Mar 08, 2010 at 04:25:01PM +0000, Catalin Marinas wrote: > Hi Russell, > > On Mon, 2009-12-07 at 14:13 +0000, Catalin Marinas wrote: > > ARMv7 processors like Cortex-A9 broadcast the cache maintenance > > operations in hardware. The patch adds the CPU ID checks for such > > feature and allows the flush_dcache_page/update_mmu_cache pair to work > > in lazy flushing mode similar to the UP case. > > It looks like I haven't got a final ok from you on this patch (I had the > impression that it's in the patch system already but rebased my patches > and found that it's not in mainline). It needs to be updated - we have a cache_ops_need_broadcast() in smp_plat.h now for the ptrace issues. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 4/6] ARMv7: Use lazy cache flushing if hardwarebroadcastscache operations 2010-03-08 16:31 ` Russell King - ARM Linux @ 2010-03-08 16:38 ` Catalin Marinas 0 siblings, 0 replies; 20+ messages in thread From: Catalin Marinas @ 2010-03-08 16:38 UTC (permalink / raw) To: linux-arm-kernel On Mon, 2010-03-08 at 16:31 +0000, Russell King - ARM Linux wrote: > On Mon, Mar 08, 2010 at 04:25:01PM +0000, Catalin Marinas wrote: > > Hi Russell, > > > > On Mon, 2009-12-07 at 14:13 +0000, Catalin Marinas wrote: > > > ARMv7 processors like Cortex-A9 broadcast the cache maintenance > > > operations in hardware. The patch adds the CPU ID checks for such > > > feature and allows the flush_dcache_page/update_mmu_cache pair to work > > > in lazy flushing mode similar to the UP case. > > > > It looks like I haven't got a final ok from you on this patch (I had the > > impression that it's in the patch system already but rebased my patches > > and found that it's not in mainline). > > It needs to be updated - we have a cache_ops_need_broadcast() in > smp_plat.h now for the ptrace issues. I noticed that when rebasing. Here's the updated patch: ARMv7: Use lazy cache flushing if hardware broadcasts cache operations From: Catalin Marinas <catalin.marinas@arm.com> ARMv7 processors like Cortex-A9 broadcast the cache maintenance operations in hardware. The patch adds the CPU ID checks for such feature and allows the flush_dcache_page/update_mmu_cache pair to work in lazy flushing mode similar to the UP case. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- arch/arm/include/asm/smp_plat.h | 4 ++++ arch/arm/mm/fault-armv.c | 2 -- arch/arm/mm/flush.c | 9 ++++----- 3 files changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h index e621530..963a338 100644 --- a/arch/arm/include/asm/smp_plat.h +++ b/arch/arm/include/asm/smp_plat.h @@ -13,9 +13,13 @@ static inline int tlb_ops_need_broadcast(void) return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 2; } +#if !defined(CONFIG_SMP) || __LINUX_ARM_ARCH__ >= 7 +#define cache_ops_need_broadcast() 0 +#else static inline int cache_ops_need_broadcast(void) { return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 1; } +#endif #endif diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c index c9b97e9..0866ffd 100644 --- a/arch/arm/mm/fault-armv.c +++ b/arch/arm/mm/fault-armv.c @@ -169,10 +169,8 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, return; mapping = page_mapping(page); -#ifndef CONFIG_SMP if (test_and_clear_bit(PG_dcache_dirty, &page->flags)) __flush_dcache_page(mapping, page); -#endif if (mapping) { if (cache_is_vivt()) make_coherent(mapping, vma, addr, ptep, pfn); diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c index e34f095..c2cea53 100644 --- a/arch/arm/mm/flush.c +++ b/arch/arm/mm/flush.c @@ -16,6 +16,7 @@ #include <asm/smp_plat.h> #include <asm/system.h> #include <asm/tlbflush.h> +#include <asm/smp_plat.h> #include "mm.h" @@ -241,12 +242,10 @@ void flush_dcache_page(struct page *page) mapping = page_mapping(page); -#ifndef CONFIG_SMP - if (!PageHighMem(page) && mapping && !mapping_mapped(mapping)) + if (!cache_ops_need_broadcast() && + !PageHighMem(page) && mapping && !mapping_mapped(mapping)) set_bit(PG_dcache_dirty, &page->flags); - else -#endif - { + else { __flush_dcache_page(mapping, page); if (mapping && cache_is_vivt()) __flush_dcache_aliases(mapping, page); -- Catalin ^ permalink raw reply related [flat|nested] 20+ messages in 
thread
* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE 2009-12-07 14:10 [PATCH 0/6] Bug-fixes and new features for 2.6.34-rc1 Catalin Marinas ` (3 preceding siblings ...) 2009-12-07 14:13 ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas @ 2009-12-07 14:14 ` Catalin Marinas 2009-12-12 11:28 ` Russell King - ARM Linux 2009-12-07 14:16 ` [PATCH 6/6] Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas 5 siblings, 1 reply; 20+ messages in thread From: Catalin Marinas @ 2009-12-07 14:14 UTC (permalink / raw) To: linux-arm-kernel This patch enables the Access Flag in SCTLR and, together with the TEX remapping, allows the use of the spare bits in the page table entry thus removing the Linux specific PTEs. The simplified permission model is used which means that "kernel read/write, user read-only" is no longer available. This was used for the vectors page but with a dedicated TLS register it is no longer necessary. With this feature, the following bits were changed to overlap with the hardware bits: L_PTE_NOEXEC -> XN L_PTE_PRESENT -> bit 1 L_PTE_YOUNG -> AP[0] (access flag) L_PTE_USER -> AP[1] (simplified permission model) L_PTE_NOWRITE -> AP[2] (simplified permission model) L_PTE_DIRTY -> TEX[1] (spare bit) The TEX[2] spare bit is available for future use. Since !L_PTE_PRESENT requires bit 0 to be unset (otherwise it is a Large Page Table entry), L_PTE_FILE occupies bit 2. This requires some changes to the __swp_* and pte_to_pgoff/pgoff_to_pte macros to avoid overriding this bit. PTE_FILE_MAXBITS becomes 29 if AFE is enabled. There are no changes required to the PMD_SECT_* macros because the current usage is compatible with the simplified permission model. If hardware management of the access flag is available and SCTLR.HA is set, the L_PTE_YOUNG bit is automatically set when a page is accessed. With software management of the access flag, an "access flag" fault is generated which is handled by the do_page_fault() function. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- arch/arm/include/asm/memory.h | 6 ++ arch/arm/include/asm/page.h | 8 +++ arch/arm/include/asm/pgalloc.h | 10 ++- arch/arm/include/asm/pgtable.h | 117 +++++++++++++++++++++++++++++++++++----- arch/arm/mm/Kconfig | 12 ++++ arch/arm/mm/dma-mapping.c | 6 ++ arch/arm/mm/fault.c | 10 +++ arch/arm/mm/mmu.c | 7 +- arch/arm/mm/proc-v7.S | 56 ++++++++----------- 9 files changed, 177 insertions(+), 55 deletions(-) diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index bc2ff8b..d57040a 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -113,11 +113,15 @@ #endif /* !CONFIG_MMU */ /* - * Size of DMA-consistent memory region. Must be multiple of 2M, + * Size of DMA-consistent memory region. Must be multiple of 2M (4MB if AFE), * between 2MB and 14MB inclusive. 
*/ #ifndef CONSISTENT_DMA_SIZE +#ifndef CONFIG_CPU_AFE #define CONSISTENT_DMA_SIZE SZ_2M +#else +#define CONSISTENT_DMA_SIZE SZ_4M +#endif #endif /* diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h index 3a32af4..224159d 100644 --- a/arch/arm/include/asm/page.h +++ b/arch/arm/include/asm/page.h @@ -158,7 +158,11 @@ extern void copy_page(void *to, const void *from); */ typedef struct { unsigned long pte; } pte_t; typedef struct { unsigned long pmd; } pmd_t; +#ifndef CONFIG_CPU_AFE typedef struct { unsigned long pgd[2]; } pgd_t; +#else +typedef struct { unsigned long pgd[4]; } pgd_t; +#endif typedef struct { unsigned long pgprot; } pgprot_t; #define pte_val(x) ((x).pte) @@ -176,7 +180,11 @@ typedef struct { unsigned long pgprot; } pgprot_t; */ typedef unsigned long pte_t; typedef unsigned long pmd_t; +#ifndef CONFIG_CPU_AFE typedef unsigned long pgd_t[2]; +#else +typedef unsigned long pgd_t[4]; +#endif typedef unsigned long pgprot_t; #define pte_val(x) (x) diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h index b12cc98..57083dd 100644 --- a/arch/arm/include/asm/pgalloc.h +++ b/arch/arm/include/asm/pgalloc.h @@ -62,7 +62,7 @@ pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr) pte = (pte_t *)__get_free_page(PGALLOC_GFP); if (pte) { clean_dcache_area(pte, sizeof(pte_t) * PTRS_PER_PTE); - pte += PTRS_PER_PTE; + pte += LINUX_PTE_OFFSET; } return pte; @@ -95,7 +95,7 @@ pte_alloc_one(struct mm_struct *mm, unsigned long addr) static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte) { if (pte) { - pte -= PTRS_PER_PTE; + pte -= LINUX_PTE_OFFSET; free_page((unsigned long)pte); } } @@ -110,6 +110,10 @@ static inline void __pmd_populate(pmd_t *pmdp, unsigned long pmdval) { pmdp[0] = __pmd(pmdval); pmdp[1] = __pmd(pmdval + 256 * sizeof(pte_t)); +#ifdef CONFIG_CPU_AFE + pmdp[2] = __pmd(pmdval + 512 * sizeof(pte_t)); + pmdp[3] = __pmd(pmdval + 768 * sizeof(pte_t)); +#endif flush_pmd_entry(pmdp); } @@ -128,7 +132,7 @@ pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep) * The pmd must be loaded with the physical * address of the PTE table */ - pte_ptr -= PTRS_PER_PTE * sizeof(void *); + pte_ptr -= LINUX_PTE_OFFSET * sizeof(void *); __pmd_populate(pmdp, __pa(pte_ptr) | _PAGE_KERNEL_TABLE); } diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index 201ccaa..8429868 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -40,6 +40,7 @@ #define VMALLOC_START (((unsigned long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)) #endif +#ifndef CONFIG_CPU_AFE /* * Hardware-wise, we have a two level page table structure, where the first * level has 4096 entries, and the second level has 256 entries. Each entry @@ -101,13 +102,31 @@ #define PTRS_PER_PTE 512 #define PTRS_PER_PMD 1 #define PTRS_PER_PGD 2048 +#define LINUX_PTE_OFFSET PTRS_PER_PTE +#else +/* + * If the Access Flag is enabled, Linux only uses one version of PTEs. We tell + * LInux that we have 1024 entries in the first level, each of which is 16 + * bytes long (4 hardware pointers to the second level). The PTE level has + * 1024 entries. 
+ */ +#define PTRS_PER_PTE 1024 +#define PTRS_PER_PMD 1 +#define PTRS_PER_PGD 1024 +#define LINUX_PTE_OFFSET 0 +#endif /* * PMD_SHIFT determines the size of the area a second-level page table can map * PGDIR_SHIFT determines what a third-level page table entry can map */ +#ifndef CONFIG_CPU_AFE #define PMD_SHIFT 21 #define PGDIR_SHIFT 21 +#else +#define PMD_SHIFT 22 +#define PGDIR_SHIFT 22 +#endif #define LIBRARY_TEXT_START 0x0c000000 @@ -150,6 +169,7 @@ extern void __pgd_error(const char *file, int line, unsigned long val); #define SUPERSECTION_SIZE (1UL << SUPERSECTION_SHIFT) #define SUPERSECTION_MASK (~(SUPERSECTION_SIZE-1)) +#ifndef CONFIG_CPU_AFE /* * "Linux" PTE definitions. * @@ -169,7 +189,30 @@ extern void __pgd_error(const char *file, int line, unsigned long val); #define L_PTE_USER (1 << 8) #define L_PTE_EXEC (1 << 9) #define L_PTE_SHARED (1 << 10) /* shared(v6), coherent(xsc3) */ +#define L_PTE_NOEXEC 0 +#define L_PTE_NOWRITE 0 +#else +/* + * "Linux" PTE definitions with AFE set. + * + * These bits overlap with the hardware bits but the naming is preserved for + * consistency with the non-AFE version. + */ +#define L_PTE_NOEXEC (1 << 0) /* XN */ +#define L_PTE_PRESENT (1 << 1) +#define L_PTE_FILE (1 << 2) /* only when !PRESENT */ +#define L_PTE_BUFFERABLE (1 << 2) /* B */ +#define L_PTE_CACHEABLE (1 << 3) /* C */ +#define L_PTE_YOUNG (1 << 4) /* access flag */ +#define L_PTE_USER (1 << 5) /* AP[1] */ +#define L_PTE_DIRTY (1 << 7) /* TEX[1] */ +#define L_PTE_NOWRITE (1 << 9) /* AP[2] */ +#define L_PTE_SHARED (1 << 10) /* shared(v6+) */ +#define L_PTE_EXEC 0 +#define L_PTE_WRITE 0 +#endif +#ifndef CONFIG_CPU_AFE /* * These are the memory types, defined to be compatible with * pre-ARMv6 CPUs cacheable and bufferable bits: XXCB @@ -185,6 +228,22 @@ extern void __pgd_error(const char *file, int line, unsigned long val); #define L_PTE_MT_DEV_WC (0x09 << 2) /* 1001 */ #define L_PTE_MT_DEV_CACHED (0x0b << 2) /* 1011 */ #define L_PTE_MT_MASK (0x0f << 2) +#else +/* + * AFE page table format requires TEX remapping as well: TEX[0], C, B. 
+ */ +#define L_PTE_MT_UNCACHED ((0 << 6) | (0 << 2)) /* 000 */ +#define L_PTE_MT_BUFFERABLE ((0 << 6) | (1 << 2)) /* 001 */ +#define L_PTE_MT_WRITETHROUGH ((0 << 6) | (2 << 2)) /* 010 */ +#define L_PTE_MT_WRITEBACK ((0 << 6) | (3 << 2)) /* 011 */ +#define L_PTE_MT_MINICACHE ((1 << 6) | (2 << 2)) /* 110 (sa1100, xscale) */ +#define L_PTE_MT_WRITEALLOC ((1 << 6) | (3 << 2)) /* 111 */ +#define L_PTE_MT_DEV_SHARED ((1 << 6) | (0 << 2)) /* 100 */ +#define L_PTE_MT_DEV_NONSHARED ((1 << 6) | (0 << 2)) /* 100 */ +#define L_PTE_MT_DEV_WC ((0 << 6) | (1 << 2)) /* 001 */ +#define L_PTE_MT_DEV_CACHED ((0 << 6) | (3 << 2)) /* 011 */ +#define L_PTE_MT_MASK ((1 << 6) | (3 << 2)) +#endif #ifndef __ASSEMBLY__ @@ -202,22 +261,22 @@ extern pgprot_t pgprot_kernel; #define _MOD_PROT(p, b) __pgprot(pgprot_val(p) | (b)) #define PAGE_NONE pgprot_user -#define PAGE_SHARED _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_WRITE) +#define PAGE_SHARED _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_WRITE | L_PTE_NOEXEC) #define PAGE_SHARED_EXEC _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_WRITE | L_PTE_EXEC) -#define PAGE_COPY _MOD_PROT(pgprot_user, L_PTE_USER) -#define PAGE_COPY_EXEC _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_EXEC) -#define PAGE_READONLY _MOD_PROT(pgprot_user, L_PTE_USER) -#define PAGE_READONLY_EXEC _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_EXEC) -#define PAGE_KERNEL pgprot_kernel +#define PAGE_COPY _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_NOEXEC | L_PTE_NOWRITE) +#define PAGE_COPY_EXEC _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_EXEC | L_PTE_NOWRITE) +#define PAGE_READONLY _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_NOEXEC | L_PTE_NOWRITE) +#define PAGE_READONLY_EXEC _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_EXEC | L_PTE_NOWRITE) +#define PAGE_KERNEL _MOD_PROT(pgprot_kernel, L_PTE_NOEXEC) #define PAGE_KERNEL_EXEC _MOD_PROT(pgprot_kernel, L_PTE_EXEC) -#define __PAGE_NONE __pgprot(_L_PTE_DEFAULT) -#define __PAGE_SHARED __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_WRITE) +#define __PAGE_NONE __pgprot(_L_PTE_DEFAULT | L_PTE_NOEXEC | L_PTE_NOWRITE) +#define __PAGE_SHARED __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_WRITE | L_PTE_NOEXEC) #define __PAGE_SHARED_EXEC __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_WRITE | L_PTE_EXEC) -#define __PAGE_COPY __pgprot(_L_PTE_DEFAULT | L_PTE_USER) -#define __PAGE_COPY_EXEC __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_EXEC) -#define __PAGE_READONLY __pgprot(_L_PTE_DEFAULT | L_PTE_USER) -#define __PAGE_READONLY_EXEC __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_EXEC) +#define __PAGE_COPY __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_NOEXEC | L_PTE_NOWRITE) +#define __PAGE_COPY_EXEC __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_EXEC | L_PTE_NOWRITE) +#define __PAGE_READONLY __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_NOEXEC | L_PTE_NOWRITE) +#define __PAGE_READONLY_EXEC __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_EXEC | L_PTE_NOWRITE) #endif /* __ASSEMBLY__ */ @@ -287,7 +346,11 @@ extern struct page *empty_zero_page; * Undefined behaviour if not.. 
*/ #define pte_present(pte) (pte_val(pte) & L_PTE_PRESENT) +#ifndef CONFIG_CPU_AFE #define pte_write(pte) (pte_val(pte) & L_PTE_WRITE) +#else +#define pte_write(pte) (!(pte_val(pte) & L_PTE_NOWRITE)) +#endif #define pte_dirty(pte) (pte_val(pte) & L_PTE_DIRTY) #define pte_young(pte) (pte_val(pte) & L_PTE_YOUNG) #define pte_special(pte) (0) @@ -295,8 +358,13 @@ extern struct page *empty_zero_page; #define PTE_BIT_FUNC(fn,op) \ static inline pte_t pte_##fn(pte_t pte) { pte_val(pte) op; return pte; } +#ifndef CONFIG_CPU_AFE PTE_BIT_FUNC(wrprotect, &= ~L_PTE_WRITE); PTE_BIT_FUNC(mkwrite, |= L_PTE_WRITE); +#else +PTE_BIT_FUNC(wrprotect, |= L_PTE_NOWRITE); +PTE_BIT_FUNC(mkwrite, &= ~L_PTE_NOWRITE); +#endif PTE_BIT_FUNC(mkclean, &= ~L_PTE_DIRTY); PTE_BIT_FUNC(mkdirty, |= L_PTE_DIRTY); PTE_BIT_FUNC(mkold, &= ~L_PTE_YOUNG); @@ -316,10 +384,27 @@ static inline pte_t pte_mkspecial(pte_t pte) { return pte; } #define pmd_present(pmd) (pmd_val(pmd)) #define pmd_bad(pmd) (pmd_val(pmd) & 2) +#ifndef CONFIG_CPU_AFE +#define copy_pmd(pmdpd,pmdps) \ + do { \ + pmdpd[0] = pmdps[0]; \ + pmdpd[1] = pmdps[1]; \ + flush_pmd_entry(pmdpd); \ + } while (0) + +#define pmd_clear(pmdp) \ + do { \ + pmdp[0] = __pmd(0); \ + pmdp[1] = __pmd(0); \ + clean_pmd_entry(pmdp); \ + } while (0) +#else #define copy_pmd(pmdpd,pmdps) \ do { \ pmdpd[0] = pmdps[0]; \ pmdpd[1] = pmdps[1]; \ + pmdpd[2] = pmdps[2]; \ + pmdpd[3] = pmdps[3]; \ flush_pmd_entry(pmdpd); \ } while (0) @@ -327,15 +412,18 @@ static inline pte_t pte_mkspecial(pte_t pte) { return pte; } do { \ pmdp[0] = __pmd(0); \ pmdp[1] = __pmd(0); \ + pmdp[2] = __pmd(0); \ + pmdp[3] = __pmd(0); \ clean_pmd_entry(pmdp); \ } while (0) +#endif static inline pte_t *pmd_page_vaddr(pmd_t pmd) { unsigned long ptr; ptr = pmd_val(pmd) & ~(PTRS_PER_PTE * sizeof(void *) - 1); - ptr += PTRS_PER_PTE * sizeof(void *); + ptr += LINUX_PTE_OFFSET * sizeof(void *); return __va(ptr); } @@ -375,7 +463,8 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { - const unsigned long mask = L_PTE_EXEC | L_PTE_WRITE | L_PTE_USER; + const unsigned long mask = L_PTE_EXEC | L_PTE_WRITE | L_PTE_USER | + L_PTE_NOEXEC | L_PTE_NOWRITE; pte_val(pte) = (pte_val(pte) & ~mask) | (pgprot_val(newprot) & mask); return pte; } diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig index ce382f5..56aadfa 100644 --- a/arch/arm/mm/Kconfig +++ b/arch/arm/mm/Kconfig @@ -454,6 +454,18 @@ config CPU_32v6 config CPU_32v7 bool +# Page table format +config CPU_AFE + bool + depends on MMU + default y if CPU_V7 + help + This option sets the Access Flag Enable bit forcing the simplified + permission model and automatic management of the access bit (if + supported by the hardware). With this option enabled and TEX + remapping, Linux no longer keeps a separate page table entry for + storing additional bits. 
+ # The abort model config CPU_ABRT_NOMMU bool diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 176c696..15dafb6 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -25,9 +25,15 @@ #include <asm/sizes.h> /* Sanity check size */ +#ifndef CONFIG_CPU_AFE #if (CONSISTENT_DMA_SIZE % SZ_2M) #error "CONSISTENT_DMA_SIZE must be multiple of 2MiB" #endif +#else +#if (CONSISTENT_DMA_SIZE % SZ_4M) +#error "CONSISTENT_DMA_SIZE must be multiple of 4MiB" +#endif +#endif #define CONSISTENT_END (0xffe00000) #define CONSISTENT_BASE (CONSISTENT_END - CONSISTENT_DMA_SIZE) diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index 10e0680..e398ade 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -107,7 +107,9 @@ void show_pte(struct mm_struct *mm, unsigned long addr) pte = pte_offset_map(pmd, addr); printk(", *pte=%08lx", pte_val(*pte)); +#ifndef CONFIG_CPU_AFE printk(", *ppte=%08lx", pte_val(pte[-PTRS_PER_PTE])); +#endif pte_unmap(pte); } while(0); @@ -458,7 +460,11 @@ static struct fsr_info { { do_bad, SIGILL, BUS_ADRALN, "alignment exception" }, { do_bad, SIGBUS, 0, "external abort on linefetch" }, { do_translation_fault, SIGSEGV, SEGV_MAPERR, "section translation fault" }, +#ifndef CONFIG_CPU_AFE { do_bad, SIGBUS, 0, "external abort on linefetch" }, +#else + { do_page_fault, SIGSEGV, SEGV_MAPERR, "access flag fault" }, +#endif { do_page_fault, SIGSEGV, SEGV_MAPERR, "page translation fault" }, { do_bad, SIGBUS, 0, "external abort on non-linefetch" }, { do_bad, SIGSEGV, SEGV_ACCERR, "section domain fault" }, @@ -532,7 +538,11 @@ static struct fsr_info ifsr_info[] = { { do_bad, SIGSEGV, SEGV_ACCERR, "section access flag fault" }, { do_bad, SIGBUS, 0, "unknown 4" }, { do_translation_fault, SIGSEGV, SEGV_MAPERR, "section translation fault" }, +#ifndef CONFIG_CPU_AFE { do_bad, SIGSEGV, SEGV_ACCERR, "page access flag fault" }, +#else + { do_page_fault, SIGSEGV, SEGV_MAPERR, "access flag fault" }, +#endif { do_page_fault, SIGSEGV, SEGV_MAPERR, "page translation fault" }, { do_bad, SIGBUS, 0, "external abort on non-linefetch" }, { do_bad, SIGSEGV, SEGV_ACCERR, "section domain fault" }, diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index ea67be0..b3796a0 100644 --- a/arch/arm/mm/mmu.c +++ b/arch/arm/mm/mmu.c @@ -190,7 +190,7 @@ void adjust_cr(unsigned long mask, unsigned long set) } #endif -#define PROT_PTE_DEVICE L_PTE_PRESENT|L_PTE_YOUNG|L_PTE_DIRTY|L_PTE_WRITE +#define PROT_PTE_DEVICE L_PTE_PRESENT|L_PTE_YOUNG|L_PTE_DIRTY|L_PTE_WRITE|L_PTE_NOEXEC #define PROT_SECT_DEVICE PMD_TYPE_SECT|PMD_SECT_AP_WRITE static struct mem_type mem_types[] = { @@ -241,7 +241,7 @@ static struct mem_type mem_types[] = { }, [MT_HIGH_VECTORS] = { .prot_pte = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY | - L_PTE_USER | L_PTE_EXEC, + L_PTE_USER | L_PTE_EXEC | L_PTE_NOWRITE, .prot_l1 = PMD_TYPE_TABLE, .domain = DOMAIN_USER, }, @@ -491,7 +491,8 @@ static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr, pte_t *pte; if (pmd_none(*pmd)) { - pte = alloc_bootmem_low_pages(2 * PTRS_PER_PTE * sizeof(pte_t)); + pte = alloc_bootmem_low_pages((LINUX_PTE_OFFSET + + PTRS_PER_PTE) * sizeof(pte_t)); __pmd_populate(pmd, __pa(pte) | type->prot_l1); } diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S index 3a28521..568ccfc 100644 --- a/arch/arm/mm/proc-v7.S +++ b/arch/arm/mm/proc-v7.S @@ -126,38 +126,26 @@ ENDPROC(cpu_v7_switch_mm) * (hardware version is stored at -1024 bytes) * - pte - PTE value to store * - ext - value for extended PTE bits + * + * Simplified permission 
translation (AP0 is the access flag): + * YUWD AP2 AP1 AP0 SVC User + * 0xxx 0 0 0 no acc no acc + * 100x 1 0 1 r/o no acc + * 10x0 1 0 1 r/o no acc + * 1011 0 0 1 r/w no acc + * 110x 1 1 1 r/o r/o + * 11x0 1 1 1 r/o r/o + * 1111 0 1 1 r/w r/w */ ENTRY(cpu_v7_set_pte_ext) #ifdef CONFIG_MMU - ARM( str r1, [r0], #-2048 ) @ linux version - THUMB( str r1, [r0] ) @ linux version - THUMB( sub r0, r0, #2048 ) - - bic r3, r1, #0x000003f0 - bic r3, r3, #PTE_TYPE_MASK - orr r3, r3, r2 - orr r3, r3, #PTE_EXT_AP0 | 2 - - tst r1, #1 << 4 - orrne r3, r3, #PTE_EXT_TEX(1) - - tst r1, #L_PTE_WRITE - tstne r1, #L_PTE_DIRTY - orreq r3, r3, #PTE_EXT_APX - - tst r1, #L_PTE_USER - orrne r3, r3, #PTE_EXT_AP1 - tstne r3, #PTE_EXT_APX - bicne r3, r3, #PTE_EXT_APX | PTE_EXT_AP0 - - tst r1, #L_PTE_EXEC - orreq r3, r3, #PTE_EXT_XN - - tst r1, #L_PTE_YOUNG - tstne r1, #L_PTE_PRESENT - moveq r3, #0 - - str r3, [r0] + tst r1, #L_PTE_PRESENT + beq 1f + tst r1, #L_PTE_DIRTY + orreq r1, #L_PTE_NOWRITE + orr r1, r1, r2 +1: + str r1, [r0] mcr p15, 0, r0, c7, c10, 1 @ flush_pte #endif mov pc, lr @@ -283,14 +271,14 @@ __v7_setup: ENDPROC(__v7_setup) /* AT - * TFR EV X F I D LR S - * .EEE ..EE PUI. .T.T 4RVI ZWRS BLDP WCAM - * rxxx rrxx xxx0 0101 xxxx xxxx x111 xxxx < forced - * 1 0 110 0011 1100 .111 1101 < we want + * TFR EV X F IHD LR S + * .EEE ..EE PUI. .TAT 4RVI ZWRS BLDP WCAM + * rxxx rrxx xxx0 01x1 xxxx xxxx x111 xxxx < forced + * 11 0 110 1 0011 1100 .111 1101 < we want */ .type v7_crval, #object v7_crval: - crval clear=0x0120c302, mmuset=0x10c03c7d, ucset=0x00c01c7c + crval clear=0x0120c302, mmuset=0x30c23c7d, ucset=0x00c01c7c __v7_setup_stack: .space 4 * 11 @ 11 registers ^ permalink raw reply related [flat|nested] 20+ messages in thread
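To make the simplified cpu_v7_set_pte_ext() above easier to follow, here is a
rough C rendering of what the new assembly does when CONFIG_CPU_AFE is
enabled. The helper name is invented for illustration only; the real
implementation is the assembly in proc-v7.S, and the final MCR mirrors its
flush_pte cache clean.

/*
 * Illustrative sketch (hypothetical helper name): with AFE and TEX remapping
 * the Linux PTE layout is the hardware layout, so the PTE value can be
 * stored directly after applying software dirty tracking.
 */
#include <asm/pgtable.h>

static inline void v7_afe_set_pte_ext_sketch(pte_t *ptep, pte_t pte,
					     unsigned int ext)
{
	unsigned long val = pte_val(pte);

	if (val & L_PTE_PRESENT) {
		/* clean pages are mapped read-only (AP[2]) so that the
		 * first write faults and marks the page dirty */
		if (!(val & L_PTE_DIRTY))
			val |= L_PTE_NOWRITE;
		val |= ext;			/* extended bits, e.g. nG */
	}
	*ptep = __pte(val);

	/* clean the cache line so the table walk observes the update */
	asm volatile("mcr p15, 0, %0, c7, c10, 1" : : "r" (ptep) : "memory");
}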
* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE
2009-12-07 14:14 ` [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE Catalin Marinas
@ 2009-12-12 11:28 ` Russell King - ARM Linux
2009-12-14 15:50 ` Catalin Marinas
0 siblings, 1 reply; 20+ messages in thread
From: Russell King - ARM Linux @ 2009-12-12 11:28 UTC (permalink / raw)
To: linux-arm-kernel

On Mon, Dec 07, 2009 at 02:14:10PM +0000, Catalin Marinas wrote:
> This patch enables the Access Flag in SCTLR and, together with the TEX
> remapping, allows the use of the spare bits in the page table entry thus
> removing the Linux specific PTEs. The simplified permission model is
> used which means that "kernel read/write, user read-only" is no longer
> available. This was used for the vectors page but with a dedicated TLS
> register it is no longer necessary.

I really do not want to go here without an explanation of how situations
such as:

- Kernel reads PTE and modifies it
- Hardware accesses page
- TLB reads PTE, updates, and writes new back
- Kernel writes PTE back
- Kernel cleans cache line

are handled.

What about SMP, where CPU0 may access and modify the active page tables
on CPU1 (eg, clearing PTEs)?

Are TLB accesses with AFE enabled guaranteed to read from the L1 cache?
If not, we need to clean _and_ invalidate PTE updates.

^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE
2009-12-12 11:28 ` Russell King - ARM Linux
@ 2009-12-14 15:50 ` Catalin Marinas
2009-12-14 15:58 ` Catalin Marinas
2009-12-14 16:11 ` Russell King - ARM Linux
0 siblings, 2 replies; 20+ messages in thread
From: Catalin Marinas @ 2009-12-14 15:50 UTC (permalink / raw)
To: linux-arm-kernel

On Sat, 2009-12-12 at 11:28 +0000, Russell King - ARM Linux wrote:
> On Mon, Dec 07, 2009 at 02:14:10PM +0000, Catalin Marinas wrote:
> > This patch enables the Access Flag in SCTLR and, together with the TEX
> > remapping, allows the use of the spare bits in the page table entry thus
> > removing the Linux specific PTEs. The simplified permission model is
> > used which means that "kernel read/write, user read-only" is no longer
> > available. This was used for the vectors page but with a dedicated TLS
> > register it is no longer necessary.
>
> I really do not want to go here without an explanation of how situations
> such as:

I think we discussed some of these when I first posted the patch some
time ago but I'm happy to do it again here.

BTW, all the hardware implementations I'm aware of only raise an access
flag fault when this bit is cleared rather than updating it in hardware.
But even if they do it in hardware, it can still work (you also have the
option of disabling the hardware management via the SCTLR.HA bit).

> - Kernel reads PTE and modifies it

B3.3.5 in the ARM ARM describes the requirements for the Hardware
management of the access flag:

    Any implementation of hardware management of the access flag
    must ensure that any software changes to the translation table
    are not lost. The architecture does not require software that
    performs translation table changes to use interlocked
    operations. The hardware management mechanisms for the access
    flag must prevent any loss of data written to translation table
    entries that might occur when, for example, a write by another
    processor occurs between the read and write phases of a
    translation table walk that updates the access flag.

At the hardware level, it could be implemented similarly to an
LDREX/STREX block.

> - Hardware accesses page
> - TLB reads PTE, updates, and writes new back
> - Kernel writes PTE back

Addressed above. The hardware write should fail if there was an STR from
the current or a different CPU.

> - Kernel cleans cache line

A hardware implementation of the AF would probably require the PTWs to go
via L1 (Cortex-A9 has such PTWs), otherwise it breaks the requirements.
It is mandatory for the TTBR cacheability settings to match the mapping
of the page tables (i.e. Normal cached in the Linux case), so we don't
need further kernel modifications.

--
Catalin

^ permalink raw reply [flat|nested] 20+ messages in thread
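To illustrate the "similarly to an LDREX/STREX block" remark, below is a
purely hypothetical software rendering of a monitored access-flag update.
The function does not exist in the kernel; it only models the
retry-on-conflicting-store behaviour that the quoted ARM ARM paragraph
requires of the hardware mechanism, which is not visible to software.

/*
 * Hypothetical sketch: setting the access flag with exclusive accesses.
 * The real mechanism lives inside the MMU; this only models the semantics.
 */
#include <asm/pgtable.h>

static inline void pte_mark_young_exclusive_sketch(pte_t *ptep)
{
	unsigned long tmp, status;

	asm volatile(
	"1:	ldrex	%0, [%2]\n"	/* read the PTE, claim the monitor   */
	"	orr	%0, %0, %3\n"	/* set the access flag               */
	"	strex	%1, %0, [%2]\n"	/* fails if the PTE was stored to    */
	"	teq	%1, #0\n"
	"	bne	1b\n"		/* conflicting write: re-read, retry */
	: "=&r" (tmp), "=&r" (status)
	: "r" (ptep), "I" (L_PTE_YOUNG)
	: "cc", "memory");
}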
* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE
2009-12-14 15:50 ` Catalin Marinas
@ 2009-12-14 15:58 ` Catalin Marinas
0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2009-12-14 15:58 UTC (permalink / raw)
To: linux-arm-kernel

On Mon, 2009-12-14 at 15:50 +0000, Catalin Marinas wrote:
> On Sat, 2009-12-12 at 11:28 +0000, Russell King - ARM Linux wrote:
> > On Mon, Dec 07, 2009 at 02:14:10PM +0000, Catalin Marinas wrote:
> > > This patch enables the Access Flag in SCTLR and, together with the TEX
> > > remapping, allows the use of the spare bits in the page table entry thus
> > > removing the Linux specific PTEs. The simplified permission model is
> > > used which means that "kernel read/write, user read-only" is no longer
> > > available. This was used for the vectors page but with a dedicated TLS
> > > register it is no longer necessary.
> >
> > I really do not want to go here without an explanation of how situations
> > such as:
>
> I think we discussed some of these when I first posted the patch some
> time ago but I'm happy to do it again here.
>
> BTW, all the hardware implementations I'm aware of only raise an access
> flag fault when this bit is cleared rather than updating it in hardware.
>
> But even if they do it in hardware, it can still work (you also have the
> option of disabling the hardware management via the SCTLR.HA bit).
>
> > - Kernel reads PTE and modifies it
>
> B3.3.5 in the ARM ARM describes the requirements for the Hardware
> management of the access flag:
>
>     Any implementation of hardware management of the access flag
>     must ensure that any software changes to the translation table
>     are not lost. The architecture does not require software that
>     performs translation table changes to use interlocked
>     operations. The hardware management mechanisms for the access
>     flag must prevent any loss of data written to translation table
>     entries that might occur when, for example, a write by another
>     processor occurs between the read and write phases of a
>     translation table walk that updates the access flag.
>
> At the hardware level, it could be implemented similarly to an
> LDREX/STREX block.

Just to avoid a question on this - it is possible for the kernel to read
a PTE with AP[0] cleared, the hardware could set AP[0] to 1 as a result
of an access, then the kernel clears it again when storing the modified
PTE.

The above cannot be prevented since the PTE modifications are not atomic,
but it doesn't actually matter. In the worst case, the kernel would think
that a page wasn't accessed for a longer time and it may decide to swap
it out. I doubt this would be a performance hit since trapping the access
faults takes much more time.

If precise access timing is required (not sure why), you can always
disable SCTLR.HA and handle the accesses in software.

--
Catalin

^ permalink raw reply [flat|nested] 20+ messages in thread
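For reference, the non-atomic PTE update being described is essentially the
generic young-bit aging path. The sketch below is modelled on
ptep_test_and_clear_young() but is simplified and not the exact kernel
source; the window between steps (1) and (3) is where a concurrent hardware
access-flag update can be overwritten, with the only consequence being that
the page looks older to the reclaim code.

/*
 * Simplified, hypothetical sketch of the aging path (not the real source).
 */
#include <linux/mm.h>
#include <asm/pgtable.h>

static int clear_young_sketch(struct vm_area_struct *vma,
			      unsigned long addr, pte_t *ptep)
{
	pte_t pte = *ptep;		/* (1) kernel reads the PTE          */
					/* (2) the MMU may set AP[0] here    */
	if (!pte_young(pte))
		return 0;

	/* (3) this store overwrites any update made in (2) */
	set_pte_at(vma->vm_mm, addr, ptep, pte_mkold(pte));
	return 1;
}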
* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE
2009-12-14 15:50 ` Catalin Marinas
2009-12-14 15:58 ` Catalin Marinas
@ 2009-12-14 16:11 ` Russell King - ARM Linux
2009-12-14 16:16 ` Catalin Marinas
1 sibling, 1 reply; 20+ messages in thread
From: Russell King - ARM Linux @ 2009-12-14 16:11 UTC (permalink / raw)
To: linux-arm-kernel

On Mon, Dec 14, 2009 at 03:50:24PM +0000, Catalin Marinas wrote:
> > - Kernel reads PTE and modifies it
>
> B3.3.5 in the ARM ARM describes the requirements for the Hardware
> management of the access flag:
>
>     Any implementation of hardware management of the access flag
>     must ensure that any software changes to the translation table
>     are not lost. The architecture does not require software that
>     performs translation table changes to use interlocked
>     operations. The hardware management mechanisms for the access
>     flag must prevent any loss of data written to translation table
>     entries that might occur when, for example, a write by another
>     processor occurs between the read and write phases of a
>     translation table walk that updates the access flag.
>
> At the hardware level, it could be implemented similarly to an
> LDREX/STREX block.
>
> > - Hardware accesses page
> > - TLB reads PTE, updates, and writes new back
> > - Kernel writes PTE back
>
> Addressed above. The hardware write should fail if there was an STR from
> the current or a different CPU.

I don't think it is - the paragraph you quote talks about the following
situation:

- Hardware reads PTE
- Kernel writes PTE
- Hardware (tries to) write PTE

What it says is that the hardware write in this case must fail.

The case I was talking about is:

- Kernel reads PTE
- Hardware reads PTE
- Hardware writes PTE
- Kernel writes PTE

Since there is no STR between the hardware reading and writing the PTE,
the hardware cannot know that its update has been lost.

Whether it matters or not is a different kettle of fish.

^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE
2009-12-14 16:11 ` Russell King - ARM Linux
@ 2009-12-14 16:16 ` Catalin Marinas
0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2009-12-14 16:16 UTC (permalink / raw)
To: linux-arm-kernel

On Mon, 2009-12-14 at 16:11 +0000, Russell King - ARM Linux wrote:
> On Mon, Dec 14, 2009 at 03:50:24PM +0000, Catalin Marinas wrote:
> > > - Kernel reads PTE and modifies it
> >
> > B3.3.5 in the ARM ARM describes the requirements for the Hardware
> > management of the access flag:
> >
> >     Any implementation of hardware management of the access flag
> >     must ensure that any software changes to the translation table
> >     are not lost. The architecture does not require software that
> >     performs translation table changes to use interlocked
> >     operations. The hardware management mechanisms for the access
> >     flag must prevent any loss of data written to translation table
> >     entries that might occur when, for example, a write by another
> >     processor occurs between the read and write phases of a
> >     translation table walk that updates the access flag.
> >
> > At the hardware level, it could be implemented similarly to an
> > LDREX/STREX block.
> >
> > > - Hardware accesses page
> > > - TLB reads PTE, updates, and writes new back
> > > - Kernel writes PTE back
> >
> > Addressed above. The hardware write should fail if there was an STR from
> > the current or a different CPU.
[...]
> The case I was talking about is:
>
> - Kernel reads PTE
> - Hardware reads PTE
> - Hardware writes PTE
> - Kernel writes PTE
>
> Since there is no STR between the hardware reading and writing the PTE,
> the hardware cannot know that its update has been lost.
>
> Whether it matters or not is a different kettle of fish.

I was expecting this follow-up :-), so I already replied to my post.
I don't think it matters.

--
Catalin

^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 6/6] Remove the domain switching on ARMv6k/v7 CPUs 2009-12-07 14:10 [PATCH 0/6] Bug-fixes and new features for 2.6.34-rc1 Catalin Marinas ` (4 preceding siblings ...) 2009-12-07 14:14 ` [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE Catalin Marinas @ 2009-12-07 14:16 ` Catalin Marinas 5 siblings, 0 replies; 20+ messages in thread From: Catalin Marinas @ 2009-12-07 14:16 UTC (permalink / raw) To: linux-arm-kernel This patch removes the domain switching functionality via the set_fs and __switch_to functions on cores that have a TLS register. Currently, the ioremap and vmalloc areas share the same level 1 page tables and therefore have the same domain (DOMAIN_KERNEL). When the kernel domain is modified from Client to Manager (via the __set_fs or in the __switch_to function), the XN (eXecute Never) bit is overridden and newer CPUs can speculatively prefetch the ioremap'ed memory. Linux performs the kernel domain switching to allow user-specific functions (copy_to/from_user, get/put_user etc.) to access kernel memory. In order for these functions to work with the kernel domain set to Client, the patch modifies the LDRT/STRT and related instructions to the LDR/STR ones. The user pages access rights are also modified for kernel read-only access rather than read/write so that the copy-on-write mechanism still works. CPU_USE_DOMAINS gets disabled only if HAS_TLS_REG is defined since writing the TLS value to the high vectors page isn't possible. The user addresses passed to the kernel are checked by the access_ok() function so that they do not point to the kernel space. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- An additional note - prior to ARMv6 we cannot set the user R/O, kernel R/O permission on a page hence we have to use STRT variant to write such page from the kernel. Because of this, the T macro had to be introduced to differentiate between the STRT and STR usages and build time. A better name could be used instead of "T". arch/arm/include/asm/assembler.h | 9 ++-- arch/arm/include/asm/domain.h | 31 +++++++++++++- arch/arm/include/asm/futex.h | 9 ++-- arch/arm/include/asm/uaccess.h | 16 ++++--- arch/arm/kernel/entry-armv.S | 4 +- arch/arm/kernel/traps.c | 17 ++++++++ arch/arm/lib/getuser.S | 13 +++--- arch/arm/lib/putuser.S | 29 +++++++------ arch/arm/lib/uaccess.S | 83 +++++++++++++++++++------------------- arch/arm/mm/Kconfig | 9 ++++ arch/arm/mm/proc-v7.S | 2 - 11 files changed, 139 insertions(+), 83 deletions(-) diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h index 00f46d9..4b82143 100644 --- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -18,6 +18,7 @@ #endif #include <asm/ptrace.h> +#include <asm/domain.h> /* * Endian independent macros for shifting bytes within registers. 
@@ -186,9 +187,9 @@ .macro usraccoff, instr, reg, ptr, inc, off, cond, abort 9999: .if \inc == 1 - \instr\cond\()bt \reg, [\ptr, #\off] + T(\instr\cond\()b) \reg, [\ptr, #\off] .elseif \inc == 4 - \instr\cond\()t \reg, [\ptr, #\off] + T(\instr\cond\()) \reg, [\ptr, #\off] .else .error "Unsupported inc macro argument" .endif @@ -227,9 +228,9 @@ .rept \rept 9999: .if \inc == 1 - \instr\cond\()bt \reg, [\ptr], #\inc + T(\instr\cond\()b) \reg, [\ptr], #\inc .elseif \inc == 4 - \instr\cond\()t \reg, [\ptr], #\inc + T(\instr\cond\()) \reg, [\ptr], #\inc .else .error "Unsupported inc macro argument" .endif diff --git a/arch/arm/include/asm/domain.h b/arch/arm/include/asm/domain.h index cc7ef40..af18cea 100644 --- a/arch/arm/include/asm/domain.h +++ b/arch/arm/include/asm/domain.h @@ -45,13 +45,17 @@ */ #define DOMAIN_NOACCESS 0 #define DOMAIN_CLIENT 1 +#ifdef CONFIG_CPU_USE_DOMAINS #define DOMAIN_MANAGER 3 +#else +#define DOMAIN_MANAGER 1 +#endif #define domain_val(dom,type) ((type) << (2*(dom))) #ifndef __ASSEMBLY__ -#ifdef CONFIG_MMU +#ifdef CONFIG_CPU_USE_DOMAINS #define set_domain(x) \ do { \ __asm__ __volatile__( \ @@ -74,5 +78,28 @@ #define modify_domain(dom,type) do { } while (0) #endif +/* + * Generate the T (user) versions of the LDR/STR and related + * instructions (inline assembly) + */ +#ifdef CONFIG_CPU_USE_DOMAINS +#define T(instr) #instr "t" +#else +#define T(instr) #instr #endif -#endif /* !__ASSEMBLY__ */ + +#else /* __ASSEMBLY__ */ + +/* + * Generate the T (user) versions of the LDR/STR and related + * instructions + */ +#ifdef CONFIG_CPU_USE_DOMAINS +#define T(instr) instr ## t +#else +#define T(instr) instr +#endif + +#endif /* __ASSEMBLY__ */ + +#endif /* !__ASM_PROC_DOMAIN_H */ diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h index bfcc159..8d868bd 100644 --- a/arch/arm/include/asm/futex.h +++ b/arch/arm/include/asm/futex.h @@ -13,12 +13,13 @@ #include <linux/preempt.h> #include <linux/uaccess.h> #include <asm/errno.h> +#include <asm/domain.h> #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \ __asm__ __volatile__( \ - "1: ldrt %1, [%2]\n" \ + "1: " T(ldr) " %1, [%2]\n" \ " " insn "\n" \ - "2: strt %0, [%2]\n" \ + "2: " T(str) " %0, [%2]\n" \ " mov %0, #0\n" \ "3:\n" \ " .section __ex_table,\"a\"\n" \ @@ -97,10 +98,10 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval) pagefault_disable(); /* implies preempt_disable() */ __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" - "1: ldrt %0, [%3]\n" + "1: " T(ldr) " %0, [%3]\n" " teq %0, %1\n" " it eq @ explicit IT needed for the 2b label\n" - "2: streqt %2, [%3]\n" + "2: " T(streq) " %2, [%3]\n" "3:\n" " .section __ex_table,\"a\"\n" " .align 3\n" diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h index 1d6bd40..e4d0905 100644 --- a/arch/arm/include/asm/uaccess.h +++ b/arch/arm/include/asm/uaccess.h @@ -227,7 +227,7 @@ do { \ #define __get_user_asm_byte(x,addr,err) \ __asm__ __volatile__( \ - "1: ldrbt %1,[%2]\n" \ + "1: " T(ldrb) " %1,[%2],#0\n" \ "2:\n" \ " .section .fixup,\"ax\"\n" \ " .align 2\n" \ @@ -263,7 +263,7 @@ do { \ #define __get_user_asm_word(x,addr,err) \ __asm__ __volatile__( \ - "1: ldrt %1,[%2]\n" \ + "1: " T(ldr) " %1,[%2],#0\n" \ "2:\n" \ " .section .fixup,\"ax\"\n" \ " .align 2\n" \ @@ -308,7 +308,7 @@ do { \ #define __put_user_asm_byte(x,__pu_addr,err) \ __asm__ __volatile__( \ - "1: strbt %1,[%2]\n" \ + "1: " T(strb) " %1,[%2],#0\n" \ "2:\n" \ " .section .fixup,\"ax\"\n" \ " .align 2\n" \ @@ -341,7 +341,7 @@ do { \ 
#define __put_user_asm_word(x,__pu_addr,err) \ __asm__ __volatile__( \ - "1: strt %1,[%2]\n" \ + "1: " T(str) " %1,[%2],#0\n" \ "2:\n" \ " .section .fixup,\"ax\"\n" \ " .align 2\n" \ @@ -366,10 +366,10 @@ do { \ #define __put_user_asm_dword(x,__pu_addr,err) \ __asm__ __volatile__( \ - ARM( "1: strt " __reg_oper1 ", [%1], #4\n" ) \ - ARM( "2: strt " __reg_oper0 ", [%1]\n" ) \ - THUMB( "1: strt " __reg_oper1 ", [%1]\n" ) \ - THUMB( "2: strt " __reg_oper0 ", [%1, #4]\n" ) \ + ARM( "1: " T(str) " " __reg_oper1 ", [%1], #4\n" ) \ + ARM( "2: " T(str) " " __reg_oper0 ", [%1]\n" ) \ + THUMB( "1: " T(str) " " __reg_oper1 ", [%1]\n" ) \ + THUMB( "2: " T(str) " " __reg_oper0 ", [%1, #4]\n" ) \ "3:\n" \ " .section .fixup,\"ax\"\n" \ " .align 2\n" \ diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S index d2903e3..1b31ecb 100644 --- a/arch/arm/kernel/entry-armv.S +++ b/arch/arm/kernel/entry-armv.S @@ -736,7 +736,7 @@ ENTRY(__switch_to) THUMB( stmia ip!, {r4 - sl, fp} ) @ Store most regs on stack THUMB( str sp, [ip], #4 ) THUMB( str lr, [ip], #4 ) -#ifdef CONFIG_MMU +#ifdef CONFIG_CPU_USE_DOMAINS ldr r6, [r2, #TI_CPU_DOMAIN] #endif #if defined(CONFIG_HAS_TLS_REG) @@ -745,7 +745,7 @@ ENTRY(__switch_to) mov r4, #0xffff0fff str r3, [r4, #-15] @ TLS val at 0xffff0ff0 #endif -#ifdef CONFIG_MMU +#ifdef CONFIG_CPU_USE_DOMAINS mcr p15, 0, r6, c3, c0, 0 @ Set domain register #endif mov r5, r0 diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c index 3f361a7..23d7673 100644 --- a/arch/arm/kernel/traps.c +++ b/arch/arm/kernel/traps.c @@ -28,6 +28,7 @@ #include <asm/unistd.h> #include <asm/traps.h> #include <asm/unwind.h> +#include <asm/tlbflush.h> #include "ptrace.h" #include "signal.h" @@ -735,6 +736,16 @@ void __init early_trap_init(void) extern char __vectors_start[], __vectors_end[]; extern char __kuser_helper_start[], __kuser_helper_end[]; int kuser_sz = __kuser_helper_end - __kuser_helper_start; +#ifndef CONFIG_CPU_USE_DOMAINS + pgd_t *pgd = pgd_offset_k(vectors); + pmd_t *pmd = pmd_offset(pgd, vectors); + pte_t *pte = pte_offset_kernel(pmd, vectors); + pte_t entry = *pte; + + /* allow writing to the vectors page */ + set_pte_ext(pte, pte_mkwrite(entry), 0); + local_flush_tlb_kernel_page(vectors); +#endif /* * Copy the vectors, stubs and kuser helpers (in entry-armv.S) @@ -754,6 +765,12 @@ void __init early_trap_init(void) memcpy((void *)KERN_RESTART_CODE, syscall_restart_code, sizeof(syscall_restart_code)); +#ifndef CONFIG_CPU_USE_DOMAINS + /* restore the vectors page permissions */ + set_pte_ext(pte, entry, 0); + local_flush_tlb_kernel_page(vectors); +#endif + flush_icache_range(vectors, vectors + PAGE_SIZE); modify_domain(DOMAIN_USER, DOMAIN_CLIENT); } diff --git a/arch/arm/lib/getuser.S b/arch/arm/lib/getuser.S index a1814d9..acc966b 100644 --- a/arch/arm/lib/getuser.S +++ b/arch/arm/lib/getuser.S @@ -28,20 +28,21 @@ */ #include <linux/linkage.h> #include <asm/errno.h> +#include <asm/domain.h> ENTRY(__get_user_1) -1: ldrbt r2, [r0] +1: T(ldrb) r2, [r0] mov r0, #0 mov pc, lr ENDPROC(__get_user_1) ENTRY(__get_user_2) #ifdef CONFIG_THUMB2_KERNEL -2: ldrbt r2, [r0] -3: ldrbt r3, [r0, #1] +2: T(ldrb) r2, [r0] +3: T(ldrb) r3, [r0, #1] #else -2: ldrbt r2, [r0], #1 -3: ldrbt r3, [r0] +2: T(ldrb) r2, [r0], #1 +3: T(ldrb) r3, [r0] #endif #ifndef __ARMEB__ orr r2, r2, r3, lsl #8 @@ -53,7 +54,7 @@ ENTRY(__get_user_2) ENDPROC(__get_user_2) ENTRY(__get_user_4) -4: ldrt r2, [r0] +4: T(ldr) r2, [r0] mov r0, #0 mov pc, lr ENDPROC(__get_user_4) diff --git a/arch/arm/lib/putuser.S 
b/arch/arm/lib/putuser.S index 02fedbf..95b3fe8 100644 --- a/arch/arm/lib/putuser.S +++ b/arch/arm/lib/putuser.S @@ -28,9 +28,10 @@ */ #include <linux/linkage.h> #include <asm/errno.h> +#include <asm/domain.h> ENTRY(__put_user_1) -1: strbt r2, [r0] +1: T(strb) r2, [r0] mov r0, #0 mov pc, lr ENDPROC(__put_user_1) @@ -39,19 +40,19 @@ ENTRY(__put_user_2) mov ip, r2, lsr #8 #ifdef CONFIG_THUMB2_KERNEL #ifndef __ARMEB__ -2: strbt r2, [r0] -3: strbt ip, [r0, #1] +2: T(strb) r2, [r0] +3: T(strb) ip, [r0, #1] #else -2: strbt ip, [r0] -3: strbt r2, [r0, #1] +2: T(strb) ip, [r0] +3: T(strb) r2, [r0, #1] #endif #else /* !CONFIG_THUMB2_KERNEL */ #ifndef __ARMEB__ -2: strbt r2, [r0], #1 -3: strbt ip, [r0] +2: T(strb) r2, [r0], #1 +3: T(strb) ip, [r0] #else -2: strbt ip, [r0], #1 -3: strbt r2, [r0] +2: T(strb) ip, [r0], #1 +3: T(strb) r2, [r0] #endif #endif /* CONFIG_THUMB2_KERNEL */ mov r0, #0 @@ -59,18 +60,18 @@ ENTRY(__put_user_2) ENDPROC(__put_user_2) ENTRY(__put_user_4) -4: strt r2, [r0] +4: T(str) r2, [r0] mov r0, #0 mov pc, lr ENDPROC(__put_user_4) ENTRY(__put_user_8) #ifdef CONFIG_THUMB2_KERNEL -5: strt r2, [r0] -6: strt r3, [r0, #4] +5: T(str) r2, [r0] +6: T(str) r3, [r0, #4] #else -5: strt r2, [r0], #4 -6: strt r3, [r0] +5: T(str) r2, [r0], #4 +6: T(str) r3, [r0] #endif mov r0, #0 mov pc, lr diff --git a/arch/arm/lib/uaccess.S b/arch/arm/lib/uaccess.S index ffdd274..e47cdfd 100644 --- a/arch/arm/lib/uaccess.S +++ b/arch/arm/lib/uaccess.S @@ -14,6 +14,7 @@ #include <linux/linkage.h> #include <asm/assembler.h> #include <asm/errno.h> +#include <asm/domain.h> .text @@ -31,11 +32,11 @@ rsb ip, ip, #4 cmp ip, #2 ldrb r3, [r1], #1 -USER( strbt r3, [r0], #1) @ May fault +USER( T(strb) r3, [r0], #1) @ May fault ldrgeb r3, [r1], #1 -USER( strgebt r3, [r0], #1) @ May fault +USER( T(strgeb) r3, [r0], #1) @ May fault ldrgtb r3, [r1], #1 -USER( strgtbt r3, [r0], #1) @ May fault +USER( T(strgtb) r3, [r0], #1) @ May fault sub r2, r2, ip b .Lc2u_dest_aligned @@ -58,7 +59,7 @@ ENTRY(__copy_to_user) addmi ip, r2, #4 bmi .Lc2u_0nowords ldr r3, [r1], #4 -USER( strt r3, [r0], #4) @ May fault +USER( T(str) r3, [r0], #4) @ May fault mov ip, r0, lsl #32 - PAGE_SHIFT @ On each page, use a ld/st??t instruction rsb ip, ip, #0 movs ip, ip, lsr #32 - PAGE_SHIFT @@ -87,18 +88,18 @@ USER( strt r3, [r0], #4) @ May fault stmneia r0!, {r3 - r4} @ Shouldnt fault tst ip, #4 ldrne r3, [r1], #4 - strnet r3, [r0], #4 @ Shouldnt fault + T(strne) r3, [r0], #4 @ Shouldnt fault ands ip, ip, #3 beq .Lc2u_0fupi .Lc2u_0nowords: teq ip, #0 beq .Lc2u_finished .Lc2u_nowords: cmp ip, #2 ldrb r3, [r1], #1 -USER( strbt r3, [r0], #1) @ May fault +USER( T(strb) r3, [r0], #1) @ May fault ldrgeb r3, [r1], #1 -USER( strgebt r3, [r0], #1) @ May fault +USER( T(strgeb) r3, [r0], #1) @ May fault ldrgtb r3, [r1], #1 -USER( strgtbt r3, [r0], #1) @ May fault +USER( T(strgtb) r3, [r0], #1) @ May fault b .Lc2u_finished .Lc2u_not_enough: @@ -119,7 +120,7 @@ USER( strgtbt r3, [r0], #1) @ May fault mov r3, r7, pull #8 ldr r7, [r1], #4 orr r3, r3, r7, push #24 -USER( strt r3, [r0], #4) @ May fault +USER( T(str) r3, [r0], #4) @ May fault mov ip, r0, lsl #32 - PAGE_SHIFT rsb ip, ip, #0 movs ip, ip, lsr #32 - PAGE_SHIFT @@ -154,18 +155,18 @@ USER( strt r3, [r0], #4) @ May fault movne r3, r7, pull #8 ldrne r7, [r1], #4 orrne r3, r3, r7, push #24 - strnet r3, [r0], #4 @ Shouldnt fault + T(strne) r3, [r0], #4 @ Shouldnt fault ands ip, ip, #3 beq .Lc2u_1fupi .Lc2u_1nowords: mov r3, r7, get_byte_1 teq ip, #0 beq .Lc2u_finished cmp ip, #2 -USER( strbt r3, [r0], #1) @ May 
fault +USER( T(strb) r3, [r0], #1) @ May fault movge r3, r7, get_byte_2 -USER( strgebt r3, [r0], #1) @ May fault +USER( T(strgeb) r3, [r0], #1) @ May fault movgt r3, r7, get_byte_3 -USER( strgtbt r3, [r0], #1) @ May fault +USER( T(strgtb) r3, [r0], #1) @ May fault b .Lc2u_finished .Lc2u_2fupi: subs r2, r2, #4 @@ -174,7 +175,7 @@ USER( strgtbt r3, [r0], #1) @ May fault mov r3, r7, pull #16 ldr r7, [r1], #4 orr r3, r3, r7, push #16 -USER( strt r3, [r0], #4) @ May fault +USER( T(str) r3, [r0], #4) @ May fault mov ip, r0, lsl #32 - PAGE_SHIFT rsb ip, ip, #0 movs ip, ip, lsr #32 - PAGE_SHIFT @@ -209,18 +210,18 @@ USER( strt r3, [r0], #4) @ May fault movne r3, r7, pull #16 ldrne r7, [r1], #4 orrne r3, r3, r7, push #16 - strnet r3, [r0], #4 @ Shouldnt fault + T(strne) r3, [r0], #4 @ Shouldnt fault ands ip, ip, #3 beq .Lc2u_2fupi .Lc2u_2nowords: mov r3, r7, get_byte_2 teq ip, #0 beq .Lc2u_finished cmp ip, #2 -USER( strbt r3, [r0], #1) @ May fault +USER( T(strb) r3, [r0], #1) @ May fault movge r3, r7, get_byte_3 -USER( strgebt r3, [r0], #1) @ May fault +USER( T(strgeb) r3, [r0], #1) @ May fault ldrgtb r3, [r1], #0 -USER( strgtbt r3, [r0], #1) @ May fault +USER( T(strgtb) r3, [r0], #1) @ May fault b .Lc2u_finished .Lc2u_3fupi: subs r2, r2, #4 @@ -229,7 +230,7 @@ USER( strgtbt r3, [r0], #1) @ May fault mov r3, r7, pull #24 ldr r7, [r1], #4 orr r3, r3, r7, push #8 -USER( strt r3, [r0], #4) @ May fault +USER( T(str) r3, [r0], #4) @ May fault mov ip, r0, lsl #32 - PAGE_SHIFT rsb ip, ip, #0 movs ip, ip, lsr #32 - PAGE_SHIFT @@ -264,18 +265,18 @@ USER( strt r3, [r0], #4) @ May fault movne r3, r7, pull #24 ldrne r7, [r1], #4 orrne r3, r3, r7, push #8 - strnet r3, [r0], #4 @ Shouldnt fault + T(strne) r3, [r0], #4 @ Shouldnt fault ands ip, ip, #3 beq .Lc2u_3fupi .Lc2u_3nowords: mov r3, r7, get_byte_3 teq ip, #0 beq .Lc2u_finished cmp ip, #2 -USER( strbt r3, [r0], #1) @ May fault +USER( T(strb) r3, [r0], #1) @ May fault ldrgeb r3, [r1], #1 -USER( strgebt r3, [r0], #1) @ May fault +USER( T(strgeb) r3, [r0], #1) @ May fault ldrgtb r3, [r1], #0 -USER( strgtbt r3, [r0], #1) @ May fault +USER( T(strgtb) r3, [r0], #1) @ May fault b .Lc2u_finished ENDPROC(__copy_to_user) @@ -294,11 +295,11 @@ ENDPROC(__copy_to_user) .Lcfu_dest_not_aligned: rsb ip, ip, #4 cmp ip, #2 -USER( ldrbt r3, [r1], #1) @ May fault +USER( T(ldrb) r3, [r1], #1) @ May fault strb r3, [r0], #1 -USER( ldrgebt r3, [r1], #1) @ May fault +USER( T(ldrgeb) r3, [r1], #1) @ May fault strgeb r3, [r0], #1 -USER( ldrgtbt r3, [r1], #1) @ May fault +USER( T(ldrgtb) r3, [r1], #1) @ May fault strgtb r3, [r0], #1 sub r2, r2, ip b .Lcfu_dest_aligned @@ -321,7 +322,7 @@ ENTRY(__copy_from_user) .Lcfu_0fupi: subs r2, r2, #4 addmi ip, r2, #4 bmi .Lcfu_0nowords -USER( ldrt r3, [r1], #4) +USER( T(ldr) r3, [r1], #4) str r3, [r0], #4 mov ip, r1, lsl #32 - PAGE_SHIFT @ On each page, use a ld/st??t instruction rsb ip, ip, #0 @@ -350,18 +351,18 @@ USER( ldrt r3, [r1], #4) ldmneia r1!, {r3 - r4} @ Shouldnt fault stmneia r0!, {r3 - r4} tst ip, #4 - ldrnet r3, [r1], #4 @ Shouldnt fault + T(ldrne) r3, [r1], #4 @ Shouldnt fault strne r3, [r0], #4 ands ip, ip, #3 beq .Lcfu_0fupi .Lcfu_0nowords: teq ip, #0 beq .Lcfu_finished .Lcfu_nowords: cmp ip, #2 -USER( ldrbt r3, [r1], #1) @ May fault +USER( T(ldrb) r3, [r1], #1) @ May fault strb r3, [r0], #1 -USER( ldrgebt r3, [r1], #1) @ May fault +USER( T(ldrgeb) r3, [r1], #1) @ May fault strgeb r3, [r0], #1 -USER( ldrgtbt r3, [r1], #1) @ May fault +USER( T(ldrgtb) r3, [r1], #1) @ May fault strgtb r3, [r0], #1 b .Lcfu_finished @@ -374,7 
+375,7 @@ USER( ldrgtbt r3, [r1], #1) @ May fault .Lcfu_src_not_aligned: bic r1, r1, #3 -USER( ldrt r7, [r1], #4) @ May fault +USER( T(ldr) r7, [r1], #4) @ May fault cmp ip, #2 bgt .Lcfu_3fupi beq .Lcfu_2fupi @@ -382,7 +383,7 @@ USER( ldrt r7, [r1], #4) @ May fault addmi ip, r2, #4 bmi .Lcfu_1nowords mov r3, r7, pull #8 -USER( ldrt r7, [r1], #4) @ May fault +USER( T(ldr) r7, [r1], #4) @ May fault orr r3, r3, r7, push #24 str r3, [r0], #4 mov ip, r1, lsl #32 - PAGE_SHIFT @@ -417,7 +418,7 @@ USER( ldrt r7, [r1], #4) @ May fault stmneia r0!, {r3 - r4} tst ip, #4 movne r3, r7, pull #8 -USER( ldrnet r7, [r1], #4) @ May fault +USER( T(ldrne) r7, [r1], #4) @ May fault orrne r3, r3, r7, push #24 strne r3, [r0], #4 ands ip, ip, #3 @@ -437,7 +438,7 @@ USER( ldrnet r7, [r1], #4) @ May fault addmi ip, r2, #4 bmi .Lcfu_2nowords mov r3, r7, pull #16 -USER( ldrt r7, [r1], #4) @ May fault +USER( T(ldr) r7, [r1], #4) @ May fault orr r3, r3, r7, push #16 str r3, [r0], #4 mov ip, r1, lsl #32 - PAGE_SHIFT @@ -473,7 +474,7 @@ USER( ldrt r7, [r1], #4) @ May fault stmneia r0!, {r3 - r4} tst ip, #4 movne r3, r7, pull #16 -USER( ldrnet r7, [r1], #4) @ May fault +USER( T(ldrne) r7, [r1], #4) @ May fault orrne r3, r3, r7, push #16 strne r3, [r0], #4 ands ip, ip, #3 @@ -485,7 +486,7 @@ USER( ldrnet r7, [r1], #4) @ May fault strb r3, [r0], #1 movge r3, r7, get_byte_3 strgeb r3, [r0], #1 -USER( ldrgtbt r3, [r1], #0) @ May fault +USER( T(ldrgtb) r3, [r1], #0) @ May fault strgtb r3, [r0], #1 b .Lcfu_finished @@ -493,7 +494,7 @@ USER( ldrgtbt r3, [r1], #0) @ May fault addmi ip, r2, #4 bmi .Lcfu_3nowords mov r3, r7, pull #24 -USER( ldrt r7, [r1], #4) @ May fault +USER( T(ldr) r7, [r1], #4) @ May fault orr r3, r3, r7, push #8 str r3, [r0], #4 mov ip, r1, lsl #32 - PAGE_SHIFT @@ -528,7 +529,7 @@ USER( ldrt r7, [r1], #4) @ May fault stmneia r0!, {r3 - r4} tst ip, #4 movne r3, r7, pull #24 -USER( ldrnet r7, [r1], #4) @ May fault +USER( T(ldrne) r7, [r1], #4) @ May fault orrne r3, r3, r7, push #8 strne r3, [r0], #4 ands ip, ip, #3 @@ -538,9 +539,9 @@ USER( ldrnet r7, [r1], #4) @ May fault beq .Lcfu_finished cmp ip, #2 strb r3, [r0], #1 -USER( ldrgebt r3, [r1], #1) @ May fault +USER( T(ldrgeb) r3, [r1], #1) @ May fault strgeb r3, [r0], #1 -USER( ldrgtbt r3, [r1], #1) @ May fault +USER( T(ldrgtb) r3, [r1], #1) @ May fault strgtb r3, [r0], #1 b .Lcfu_finished ENDPROC(__copy_from_user) diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig index 56aadfa..8bc0421 100644 --- a/arch/arm/mm/Kconfig +++ b/arch/arm/mm/Kconfig @@ -618,6 +618,15 @@ config CPU_CP15_MPU help Processor has the CP15 register, which has MPU related registers. +config CPU_USE_DOMAINS + bool + depends on MMU + default n if HAS_TLS_REG + default y + help + This option enables or disable the use of domain switching + via the set_fs() function. + # # CPU supports 36-bit I/O # diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S index 568ccfc..f0fc850 100644 --- a/arch/arm/mm/proc-v7.S +++ b/arch/arm/mm/proc-v7.S @@ -223,8 +223,6 @@ __v7_setup: mcr p15, 0, r10, c2, c0, 2 @ TTB control register orr r4, r4, #TTB_FLAGS mcr p15, 0, r4, c2, c0, 1 @ load TTB1 - mov r10, #0x1f @ domains 0, 1 = manager - mcr p15, 0, r10, c3, c0, 0 @ load domain access register /* * Memory region attributes with SCTLR.TRE=1 * ^ permalink raw reply related [flat|nested] 20+ messages in thread
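As a quick illustration of the central T() macro in the patch above: the
definitions below mirror the inline-assembly variant added to asm/domain.h,
while the demo accessor is invented purely to show how the same source
expands to either the translated (user) load or the plain load, depending on
CONFIG_CPU_USE_DOMAINS. The real uaccess helpers additionally carry the
exception-table fixups omitted here.

/* Mirrors the inline-assembly variant of T() from asm/domain.h */
#ifdef CONFIG_CPU_USE_DOMAINS
#define T(instr)	#instr "t"	/* "ldr" -> "ldrt": user-mode access */
#else
#define T(instr)	#instr		/* "ldr" -> "ldr": the page tables
					   enforce the user permissions */
#endif

/* Hypothetical demo accessor (no fixup handling, illustration only) */
#define demo_load_user_word(x, addr)					\
	asm volatile("1:	" T(ldr) "	%0, [%1]\n"		\
		     : "=r" (x) : "r" (addr) : "memory")

/*
 * Expansion of the template string:
 *   CONFIG_CPU_USE_DOMAINS=y:  "1:	ldrt	%0, [%1]\n"
 *   CONFIG_CPU_USE_DOMAINS=n:  "1:	ldr	%0, [%1]\n"
 */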
end of thread, other threads:[~2010-03-08 16:38 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-07 14:10 [PATCH 0/6] Bug-fixes and new features for 2.6.34-rc1 Catalin Marinas
2009-12-07 14:10 ` [PATCH 1/6] Global ASID allocation on SMP Catalin Marinas
2009-12-07 14:13 ` [PATCH 2/6] Broadcast the DMA cache operations on ARMv6 SMP hardware Catalin Marinas
2009-12-07 14:13 ` [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems Catalin Marinas
2009-12-12 12:24 ` Russell King - ARM Linux
2009-12-12 13:57 ` Russell King - ARM Linux
2009-12-14 12:21 ` Catalin Marinas
2009-12-14 12:15 ` Catalin Marinas
2009-12-14 16:28 ` [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems Catalin Marinas
2009-12-07 14:13 ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas
2010-03-08 16:25 ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas
2010-03-08 16:31 ` Russell King - ARM Linux
2010-03-08 16:38 ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas
2009-12-07 14:14 ` [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE Catalin Marinas
2009-12-12 11:28 ` Russell King - ARM Linux
2009-12-14 15:50 ` Catalin Marinas
2009-12-14 15:58 ` Catalin Marinas
2009-12-14 16:11 ` Russell King - ARM Linux
2009-12-14 16:16 ` Catalin Marinas
2009-12-07 14:16 ` [PATCH 6/6] Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas