linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/6] Bug-fixes and new features for 2.6.34-rc1
@ 2009-12-07 14:10 Catalin Marinas
From: Catalin Marinas @ 2009-12-07 14:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

There are some patches which I've had in my branch for some time and I
would like to get them merged by 2.6.34.

The first three are fixes to allow Linux to work better on SMP systems.
The next two are improvements for ARMv7. The last patch removes the
use of domains on ARMv6k and ARMv7 processors; one of the reasons is
that domain switching overrides the XN bit, so we can get speculative
fetches from I/O areas.


Catalin Marinas (6):
      Global ASID allocation on SMP
      Broadcast the DMA cache operations on ARMv6 SMP hardware
      Fix a race in the vfp_notifier() function on SMP systems
      ARMv7: Use lazy cache flushing if hardware broadcasts cache operations
      ARMv7: Improved page table format with TRE and AFE
      Remove the domain switching on ARMv6k/v7 CPUs


 arch/arm/include/asm/assembler.h   |    9 +-
 arch/arm/include/asm/cacheflush.h  |   29 ++++++++
 arch/arm/include/asm/domain.h      |   31 ++++++++
 arch/arm/include/asm/futex.h       |    9 +-
 arch/arm/include/asm/memory.h      |    6 +-
 arch/arm/include/asm/mmu.h         |    1 
 arch/arm/include/asm/mmu_context.h |   15 ++++
 arch/arm/include/asm/page.h        |    8 ++
 arch/arm/include/asm/pgalloc.h     |   10 ++-
 arch/arm/include/asm/pgtable.h     |  117 ++++++++++++++++++++++++++++----
 arch/arm/include/asm/smp_plat.h    |    9 ++
 arch/arm/include/asm/uaccess.h     |   16 ++--
 arch/arm/kernel/entry-armv.S       |    4 +
 arch/arm/kernel/smp.c              |  133 ++++++++++++++++++++++++++++++++++++
 arch/arm/kernel/traps.c            |   17 +++++
 arch/arm/lib/getuser.S             |   13 ++--
 arch/arm/lib/putuser.S             |   29 ++++----
 arch/arm/lib/uaccess.S             |   83 +++++++++++-----------
 arch/arm/mm/Kconfig                |   26 +++++++
 arch/arm/mm/context.c              |  120 +++++++++++++++++++++++++++++---
 arch/arm/mm/dma-mapping.c          |   20 ++++-
 arch/arm/mm/fault-armv.c           |    2 -
 arch/arm/mm/fault.c                |   10 +++
 arch/arm/mm/flush.c                |    9 +-
 arch/arm/mm/mmu.c                  |    7 +-
 arch/arm/mm/proc-v7.S              |   58 ++++++----------
 arch/arm/vfp/vfpmodule.c           |   25 ++++++-
 27 files changed, 647 insertions(+), 169 deletions(-)

-- 
Catalin


* [PATCH 1/6] Global ASID allocation on SMP
From: Catalin Marinas @ 2009-12-07 14:10 UTC (permalink / raw)
  To: linux-arm-kernel

The current ASID allocation algorithm doesn't ensure the notification
of the other CPUs when the ASID rolls over. This may lead to two
processes using the same ASID (but different generation) or multiple
threads of the same process using different ASIDs.

This patch adds the broadcasting of the ASID rollover event to the
other CPUs. To avoid a race with multiple CPUs modifying
"cpu_last_asid" during the handling of the broadcast, the per-CPU ASID
numbering now starts at "smp_processor_id() + 1" and, at rollover,
cpu_last_asid is advanced by NR_CPUS.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
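As a note for reviewers, check_context() and set_mm_context() treat
the top (32 - ASID_BITS) bits of the context ID as a generation
counter, and only the XOR of those bits matters. A minimal standalone
sketch of the idea (asid_is_stale() is a made-up name, and ASID_BITS
is assumed to be 8 as on ARMv6/v7):

	#define ASID_BITS	8

	/* illustrative helper, not part of the patch */
	static int asid_is_stale(unsigned int context_id,
				 unsigned int cpu_last_asid)
	{
		/*
		 * Non-zero whenever the generations (top 24 bits) differ,
		 * regardless of the low ASID bits, in which case the mm
		 * must be given a fresh ASID by __new_context().
		 */
		return (context_id ^ cpu_last_asid) >> ASID_BITS;
	}

	/* e.g. asid_is_stale((4 << 8) | 3, (5 << 8) | 3) is non-zero */
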
 arch/arm/include/asm/mmu.h         |    1 
 arch/arm/include/asm/mmu_context.h |   15 +++++
 arch/arm/mm/context.c              |  120 ++++++++++++++++++++++++++++++++----
 3 files changed, 122 insertions(+), 14 deletions(-)

diff --git a/arch/arm/include/asm/mmu.h b/arch/arm/include/asm/mmu.h
index b561584..68870c7 100644
--- a/arch/arm/include/asm/mmu.h
+++ b/arch/arm/include/asm/mmu.h
@@ -6,6 +6,7 @@
 typedef struct {
 #ifdef CONFIG_CPU_HAS_ASID
 	unsigned int id;
+	spinlock_t id_lock;
 #endif
 	unsigned int kvm_seq;
 } mm_context_t;
diff --git a/arch/arm/include/asm/mmu_context.h b/arch/arm/include/asm/mmu_context.h
index de6cefb..a0b3cac 100644
--- a/arch/arm/include/asm/mmu_context.h
+++ b/arch/arm/include/asm/mmu_context.h
@@ -43,12 +43,23 @@ void __check_kvm_seq(struct mm_struct *mm);
 #define ASID_FIRST_VERSION	(1 << ASID_BITS)
 
 extern unsigned int cpu_last_asid;
+#ifdef CONFIG_SMP
+DECLARE_PER_CPU(struct mm_struct *, current_mm);
+#endif
 
 void __init_new_context(struct task_struct *tsk, struct mm_struct *mm);
 void __new_context(struct mm_struct *mm);
 
 static inline void check_context(struct mm_struct *mm)
 {
+	/*
+	 * This code is executed with interrupts enabled. Therefore,
+	 * mm->context.id cannot be updated to the latest ASID version
+	 * on a different CPU (and condition below not triggered)
+	 * without first getting an IPI to reset the context. The
+	 * alternative is to take a read_lock on mm->context.id_lock
+	 * (after changing its type to rwlock_t).
+	 */
 	if (unlikely((mm->context.id ^ cpu_last_asid) >> ASID_BITS))
 		__new_context(mm);
 
@@ -108,6 +119,10 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		__flush_icache_all();
 #endif
 	if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next)) || prev != next) {
+#ifdef CONFIG_SMP
+		struct mm_struct **crt_mm = &per_cpu(current_mm, cpu);
+		*crt_mm = next;
+#endif
 		check_context(next);
 		cpu_switch_mm(next->pgd, next);
 		if (cache_is_vivt())
diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index a9e22e3..626375b 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -10,12 +10,17 @@
 #include <linux/init.h>
 #include <linux/sched.h>
 #include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/percpu.h>
 
 #include <asm/mmu_context.h>
 #include <asm/tlbflush.h>
 
 static DEFINE_SPINLOCK(cpu_asid_lock);
 unsigned int cpu_last_asid = ASID_FIRST_VERSION;
+#ifdef CONFIG_SMP
+DEFINE_PER_CPU(struct mm_struct *, current_mm);
+#endif
 
 /*
  * We fork()ed a process, and we need a new context for the child
@@ -26,13 +31,105 @@ unsigned int cpu_last_asid = ASID_FIRST_VERSION;
 void __init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
 	mm->context.id = 0;
+	spin_lock_init(&mm->context.id_lock);
 }
 
+static void flush_context(void)
+{
+	/* set the reserved ASID before flushing the TLB */
+	asm("mcr	p15, 0, %0, c13, c0, 1\n" : : "r" (0));
+	isb();
+	local_flush_tlb_all();
+	if (icache_is_vivt_asid_tagged()) {
+		__flush_icache_all();
+		dsb();
+	}
+}
+
+#ifdef CONFIG_SMP
+
+static void set_mm_context(struct mm_struct *mm, unsigned int asid)
+{
+	/*
+	 * Locking needed for multi-threaded applications where the
+	 * same mm->context.id could be set from different CPUs during
+	 * the broadcast.
+	 */
+	spin_lock(&mm->context.id_lock);
+	if (likely((mm->context.id ^ cpu_last_asid) >> ASID_BITS)) {
+		/*
+		 * Old version of ASID found. Set the new one and
+		 * reset mm_cpumask(mm).
+		 */
+		mm->context.id = asid;
+		cpumask_clear(mm_cpumask(mm));
+	}
+	spin_unlock(&mm->context.id_lock);
+
+	/*
+	 * Set the mm_cpumask(mm) bit for the current CPU.
+	 */
+	cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
+}
+
+/*
+ * Reset the ASID on the current CPU. This function call is broadcast
+ * from the CPU handling the ASID rollover and holding cpu_asid_lock.
+ */
+static void reset_context(void *info)
+{
+	unsigned int asid;
+	unsigned int cpu = smp_processor_id();
+	struct mm_struct *mm = per_cpu(current_mm, cpu);
+
+	/*
+	 * Check if a current_mm was set on this CPU as it might still
+	 * be in the early booting stages and using the reserved ASID.
+	 */
+	if (!mm)
+		return;
+
+	smp_rmb();
+	asid = cpu_last_asid + cpu + 1;
+
+	flush_context();
+	set_mm_context(mm, asid);
+
+	/* set the new ASID */
+	asm("mcr	p15, 0, %0, c13, c0, 1\n" : : "r" (mm->context.id));
+}
+
+#else
+
+static inline void set_mm_context(struct mm_struct *mm, unsigned int asid)
+{
+	mm->context.id = asid;
+	cpumask_copy(mm_cpumask(mm), cpumask_of(smp_processor_id()));
+}
+
+#endif
+
 void __new_context(struct mm_struct *mm)
 {
 	unsigned int asid;
 
 	spin_lock(&cpu_asid_lock);
+#ifdef CONFIG_SMP
+	/*
+	 * Check the ASID again, in case the change was broadcast from
+	 * another CPU before we acquired the lock.
+	 */
+	if (unlikely(((mm->context.id ^ cpu_last_asid) >> ASID_BITS) == 0)) {
+		cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
+		spin_unlock(&cpu_asid_lock);
+		return;
+	}
+#endif
+	/*
+	 * At this point, it is guaranteed that the current mm (with
+	 * an old ASID) isn't active on any other CPU since the ASIDs
+	 * are changed simultaneously via IPI.
+	 */
 	asid = ++cpu_last_asid;
 	if (asid == 0)
 		asid = cpu_last_asid = ASID_FIRST_VERSION;
@@ -42,20 +139,15 @@ void __new_context(struct mm_struct *mm)
 	 * to start a new version and flush the TLB.
 	 */
 	if (unlikely((asid & ~ASID_MASK) == 0)) {
-		asid = ++cpu_last_asid;
-		/* set the reserved ASID before flushing the TLB */
-		asm("mcr	p15, 0, %0, c13, c0, 1	@ set reserved context ID\n"
-		    :
-		    : "r" (0));
-		isb();
-		flush_tlb_all();
-		if (icache_is_vivt_asid_tagged()) {
-			__flush_icache_all();
-			dsb();
-		}
+		asid = cpu_last_asid + smp_processor_id() + 1;
+		flush_context();
+#ifdef CONFIG_SMP
+		smp_wmb();
+		smp_call_function(reset_context, NULL, 1);
+#endif
+		cpu_last_asid += NR_CPUS;
 	}
-	spin_unlock(&cpu_asid_lock);
 
-	cpumask_copy(mm_cpumask(mm), cpumask_of(smp_processor_id()));
-	mm->context.id = asid;
+	set_mm_context(mm, asid);
+	spin_unlock(&cpu_asid_lock);
 }


* [PATCH 2/6] Broadcast the DMA cache operations on ARMv6 SMP hardware
From: Catalin Marinas @ 2009-12-07 14:13 UTC (permalink / raw)
  To: linux-arm-kernel

The Snoop Control Unit on the ARM11MPCore hardware does not detect
cache maintenance operations, so the dma_cache_maint() function may
leave stale cache entries on other CPUs. The solution is to broadcast
the cache operations to the other CPUs in software. However, there is
no restriction on the context in which dma_cache_maint() may be called
(interrupt context or with IRQs disabled).

This patch implements the smp_dma_cache_op() function, which performs
the broadcast and can be called with interrupts disabled or from
interrupt context.

To avoid deadlocking when more than one CPU tries to invoke this
function, the implementation uses a spin_trylock() loop if the IRQs
are disabled and, while the lock cannot be acquired, polls for an
incoming IPI and executes it. In the unlikely situation of two or more
CPUs calling the smp_dma_cache_op() function with interrupts disabled,
there may be spurious (or delayed) IPIs after a CPU completes and
re-enables IRQs. These are handled by checking the corresponding
"unfinished" bits in the IPI handler.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---

Just a note - the DMA cache ops broadcasting in software cannot easily
use the kernel's generic IPI functionality because of the restriction
that interrupts must be enabled when invoking smp_call_function().
Another reason to do it separately is that the introduced
smp_dma_cache_op() function runs the DMA cache operation locally in
parallel with the other CPUs, while smp_call_function() would only run
it on the other CPUs, not in parallel with the current one.
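
To make the deadlock concrete: if CPU0 holds smp_dma_cache_lock and
sends IPI_DMA_CACHE while CPU1 is spinning for the same lock with IRQs
disabled, CPU1 can never take CPU0's IPI and neither CPU makes
progress. The trylock loop below breaks the cycle (simplified from the
patch, for illustration only):

	unsigned long flags;

	while (!spin_trylock_irqsave(&smp_dma_cache_lock, flags)) {
		/*
		 * The lock holder may be waiting for this CPU to run its
		 * cache operation. The IPI cannot be taken asynchronously
		 * with IRQs disabled, so execute the handler by hand; the
		 * "unfinished" cpumask makes a spurious call harmless.
		 */
		if (irqs_disabled())
			ipi_dma_cache_op(smp_processor_id());
	}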


 arch/arm/include/asm/cacheflush.h |   29 ++++++++
 arch/arm/kernel/smp.c             |  133 +++++++++++++++++++++++++++++++++++++
 arch/arm/mm/Kconfig               |    5 +
 arch/arm/mm/dma-mapping.c         |   14 ++--
 4 files changed, 174 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 3d0cdd2..b3c53f5 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -280,6 +280,35 @@ extern void dmac_flush_range(const void *, const void *);
 
 #endif
 
+#ifdef CONFIG_CPU_NO_CACHE_BCAST
+enum smp_dma_cache_type {
+	SMP_DMA_CACHE_INV,
+	SMP_DMA_CACHE_CLEAN,
+	SMP_DMA_CACHE_FLUSH,
+};
+
+extern void smp_dma_cache_op(int type, const void *start, const void *end);
+
+static inline void smp_dma_inv_range(const void *start, const void *end)
+{
+	smp_dma_cache_op(SMP_DMA_CACHE_INV, start, end);
+}
+
+static inline void smp_dma_clean_range(const void *start, const void *end)
+{
+	smp_dma_cache_op(SMP_DMA_CACHE_CLEAN, start, end);
+}
+
+static inline void smp_dma_flush_range(const void *start, const void *end)
+{
+	smp_dma_cache_op(SMP_DMA_CACHE_FLUSH, start, end);
+}
+#else
+#define smp_dma_inv_range		dmac_inv_range
+#define smp_dma_clean_range		dmac_clean_range
+#define smp_dma_flush_range		dmac_flush_range
+#endif
+
 #ifdef CONFIG_OUTER_CACHE
 
 extern struct outer_cache_fns outer_cache;
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 57162af..27827bd 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -65,6 +65,9 @@ enum ipi_msg_type {
 	IPI_CALL_FUNC,
 	IPI_CALL_FUNC_SINGLE,
 	IPI_CPU_STOP,
+#ifdef CONFIG_CPU_NO_CACHE_BCAST
+	IPI_DMA_CACHE,
+#endif
 };
 
 int __cpuinit __cpu_up(unsigned int cpu)
@@ -473,6 +476,10 @@ static void ipi_cpu_stop(unsigned int cpu)
 		cpu_relax();
 }
 
+#ifdef CONFIG_CPU_NO_CACHE_BCAST
+static void ipi_dma_cache_op(unsigned int cpu);
+#endif
+
 /*
  * Main handler for inter-processor interrupts
  *
@@ -532,6 +539,12 @@ asmlinkage void __exception do_IPI(struct pt_regs *regs)
 				ipi_cpu_stop(cpu);
 				break;
 
+#ifdef CONFIG_CPU_NO_CACHE_BCAST
+			case IPI_DMA_CACHE:
+				ipi_dma_cache_op(cpu);
+				break;
+#endif
+
 			default:
 				printk(KERN_CRIT "CPU%u: Unknown IPI message 0x%x\n",
 				       cpu, nextmsg);
@@ -687,3 +700,123 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	} else
 		local_flush_tlb_kernel_range(start, end);
 }
+
+#ifdef CONFIG_CPU_NO_CACHE_BCAST
+/*
+ * DMA cache maintenance operations on SMP if the automatic hardware
+ * broadcasting is not available
+ */
+struct smp_dma_cache_struct {
+	int type;
+	const void *start;
+	const void *end;
+	cpumask_t unfinished;
+};
+
+static struct smp_dma_cache_struct *smp_dma_cache_data;
+static DEFINE_RWLOCK(smp_dma_cache_data_lock);
+static DEFINE_SPINLOCK(smp_dma_cache_lock);
+
+static void local_dma_cache_op(int type, const void *start, const void *end)
+{
+	switch (type) {
+	case SMP_DMA_CACHE_INV:
+		dmac_inv_range(start, end);
+		break;
+	case SMP_DMA_CACHE_CLEAN:
+		dmac_clean_range(start, end);
+		break;
+	case SMP_DMA_CACHE_FLUSH:
+		dmac_flush_range(start, end);
+		break;
+	default:
+		printk(KERN_CRIT "CPU%u: Unknown SMP DMA cache type %d\n",
+		       smp_processor_id(), type);
+	}
+}
+
+/*
+ * This function must be executed with interrupts disabled.
+ */
+static void ipi_dma_cache_op(unsigned int cpu)
+{
+	read_lock(&smp_dma_cache_data_lock);
+
+	/* check for spurious IPI */
+	if ((smp_dma_cache_data == NULL) ||
+	    (!cpu_isset(cpu, smp_dma_cache_data->unfinished)))
+		goto out;
+	local_dma_cache_op(smp_dma_cache_data->type,
+			   smp_dma_cache_data->start, smp_dma_cache_data->end);
+	cpu_clear(cpu, smp_dma_cache_data->unfinished);
+ out:
+	read_unlock(&smp_dma_cache_data_lock);
+}
+
+/*
+ * Execute the DMA cache operations on all online CPUs. This function
+ * can be called with interrupts disabled or from interrupt context.
+ */
+static void __smp_dma_cache_op(int type, const void *start, const void *end)
+{
+	struct smp_dma_cache_struct data;
+	cpumask_t callmap = cpu_online_map;
+	unsigned int cpu = get_cpu();
+	unsigned long flags;
+
+	cpu_clear(cpu, callmap);
+	data.type = type;
+	data.start = start;
+	data.end = end;
+	data.unfinished = callmap;
+
+	/*
+	 * If the spinlock cannot be acquired, other CPU is trying to
+	 * send an IPI. If the interrupts are disabled, we have to
+	 * poll for an incoming IPI.
+	 */
+	while (!spin_trylock_irqsave(&smp_dma_cache_lock, flags)) {
+		if (irqs_disabled())
+			ipi_dma_cache_op(cpu);
+	}
+
+	write_lock(&smp_dma_cache_data_lock);
+	smp_dma_cache_data = &data;
+	write_unlock(&smp_dma_cache_data_lock);
+
+	if (!cpus_empty(callmap))
+		send_ipi_message(&callmap, IPI_DMA_CACHE);
+	/* run the local operation in parallel with the other CPUs */
+	local_dma_cache_op(type, start, end);
+
+	while (!cpus_empty(data.unfinished))
+		barrier();
+
+	write_lock(&smp_dma_cache_data_lock);
+	smp_dma_cache_data = NULL;
+	write_unlock(&smp_dma_cache_data_lock);
+
+	spin_unlock_irqrestore(&smp_dma_cache_lock, flags);
+	put_cpu();
+}
+
+#define DMA_MAX_RANGE		SZ_4K
+
+/*
+ * Split the cache range in smaller pieces if interrupts are enabled
+ * to reduce the latency caused by disabling the interrupts during the
+ * broadcast.
+ */
+void smp_dma_cache_op(int type, const void *start, const void *end)
+{
+	if (irqs_disabled() || (end - start <= DMA_MAX_RANGE))
+		__smp_dma_cache_op(type, start, end);
+	else {
+		const void *ptr;
+		for (ptr = start; ptr < end - DMA_MAX_RANGE;
+		     ptr += DMA_MAX_RANGE)
+			__smp_dma_cache_op(type, ptr, ptr + DMA_MAX_RANGE);
+		__smp_dma_cache_op(type, ptr, end);
+	}
+}
+#endif
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 9264d81..ce382f5 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -516,6 +516,11 @@ config CPU_CACHE_VIPT
 config CPU_CACHE_FA
 	bool
 
+config CPU_NO_CACHE_BCAST
+	bool
+	depends on SMP
+	default y if CPU_V6
+
 if MMU
 # The copy-page model
 config CPU_COPY_V3
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index b9590a7..176c696 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -219,7 +219,7 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 	{
 		void *ptr = page_address(page);
 		memset(ptr, 0, size);
-		dmac_flush_range(ptr, ptr + size);
+		smp_dma_flush_range(ptr, ptr + size);
 		outer_flush_range(__pa(ptr), __pa(ptr) + size);
 	}
 
@@ -548,15 +548,15 @@ void dma_cache_maint(const void *start, size_t size, int direction)
 
 	switch (direction) {
 	case DMA_FROM_DEVICE:		/* invalidate only */
-		inner_op = dmac_inv_range;
+		inner_op = smp_dma_inv_range;
 		outer_op = outer_inv_range;
 		break;
 	case DMA_TO_DEVICE:		/* writeback only */
-		inner_op = dmac_clean_range;
+		inner_op = smp_dma_clean_range;
 		outer_op = outer_clean_range;
 		break;
 	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
-		inner_op = dmac_flush_range;
+		inner_op = smp_dma_flush_range;
 		outer_op = outer_flush_range;
 		break;
 	default:
@@ -578,15 +578,15 @@ static void dma_cache_maint_contiguous(struct page *page, unsigned long offset,
 
 	switch (direction) {
 	case DMA_FROM_DEVICE:		/* invalidate only */
-		inner_op = dmac_inv_range;
+		inner_op = smp_dma_inv_range;
 		outer_op = outer_inv_range;
 		break;
 	case DMA_TO_DEVICE:		/* writeback only */
-		inner_op = dmac_clean_range;
+		inner_op = smp_dma_clean_range;
 		outer_op = outer_clean_range;
 		break;
 	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
-		inner_op = dmac_flush_range;
+		inner_op = smp_dma_flush_range;
 		outer_op = outer_flush_range;
 		break;
 	default:


* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems
From: Catalin Marinas @ 2009-12-07 14:13 UTC (permalink / raw)
  To: linux-arm-kernel

The vfp_notifier(THREAD_NOTIFY_RELEASE) call may happen with
thread->cpu different from the current one, causing a race condition
with both the THREAD_NOTIFY_SWITCH path and vfp_support_entry().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
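To spell out the race being closed (an illustrative interleaving, not
an actual trace):

	/*
	 *   CPU A: release_thread(T)        CPU B (T last ran here):
	 *
	 *   if (last_VFP_context[b] == vfp)
	 *                                    last_VFP_context[b] = new_vfp;
	 *       last_VFP_context[b] = NULL;  <- new_vfp silently lost
	 *
	 * cmpxchg() only performs the store if the slot still holds 'vfp',
	 * so a concurrent update from vfp_support_entry() cannot be thrown
	 * away.
	 */
	(void)cmpxchg(&last_VFP_context[cpu], vfp, NULL);
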
 arch/arm/vfp/vfpmodule.c |   25 ++++++++++++++++++++++---
 1 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 2d7423a..fa6692a 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -14,6 +14,7 @@
 #include <linux/signal.h>
 #include <linux/sched.h>
 #include <linux/init.h>
+#include <linux/rcupdate.h>
 
 #include <asm/thread_notify.h>
 #include <asm/vfp.h>
@@ -49,14 +50,21 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v)
 
 #ifdef CONFIG_SMP
 		/*
+		 * RCU locking is needed in case last_VFP_context[cpu] is
+		 * released on a different CPU.
+		 */
+		rcu_read_lock();
+		vfp = last_VFP_context[cpu];
+		/*
 		 * On SMP, if VFP is enabled, save the old state in
 		 * case the thread migrates to a different CPU. The
 		 * restoring is done lazily.
 		 */
-		if ((fpexc & FPEXC_EN) && last_VFP_context[cpu]) {
-			vfp_save_state(last_VFP_context[cpu], fpexc);
-			last_VFP_context[cpu]->hard.cpu = cpu;
+		if ((fpexc & FPEXC_EN) && vfp) {
+			vfp_save_state(vfp, fpexc);
+			vfp->hard.cpu = cpu;
 		}
+		rcu_read_unlock();
 		/*
 		 * Thread migration, just force the reloading of the
 		 * state on the new CPU in case the VFP registers
@@ -91,8 +99,19 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v)
 	}
 
 	/* flush and release case: Per-thread VFP cleanup. */
+#ifndef CONFIG_SMP
 	if (last_VFP_context[cpu] == vfp)
 		last_VFP_context[cpu] = NULL;
+#else
+	/*
+	 * Since release_thread() may be called from a different CPU, we use
+	 * cmpxchg() here to avoid a race with the vfp_support_entry() code
+	 * which modifies last_VFP_context[cpu]. Note that on SMP systems, a
+	 * STR instruction on a different CPU clears the global exclusive
+	 * monitor state.
+	 */
+	(void)cmpxchg(&last_VFP_context[cpu], vfp, NULL);
+#endif
 
 	return NOTIFY_DONE;
 }


* [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations
From: Catalin Marinas @ 2009-12-07 14:13 UTC (permalink / raw)
  To: linux-arm-kernel

ARMv7 processors like Cortex-A9 broadcast the cache maintenance
operations in hardware. The patch adds the CPU ID check for this
feature and allows the flush_dcache_page()/update_mmu_cache() pair to
work in lazy flushing mode, similar to the UP case.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
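For reference, both helpers decode the "maintenance broadcast" field of
ID_MMFR3, bits [15:12]. The field values below are my reading of the
ARM ARM rather than anything the patch adds:

	/*
	 * ID_MMFR3[15:12], maintenance broadcast:
	 *   0 - CP15 maintenance operations only affect the local CPU
	 *   1 - cache (and branch predictor) maintenance is broadcast
	 *   2 - cache, TLB and branch predictor maintenance is broadcast
	 *
	 * So cache_ops_need_broadcast() is true for values below 1 and
	 * tlb_ops_need_broadcast() for values below 2. On Cortex-A9 the
	 * field is at least 1, making the lazy flushing path safe on SMP.
	 */
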
 arch/arm/include/asm/smp_plat.h |    9 +++++++++
 arch/arm/mm/fault-armv.c        |    2 --
 arch/arm/mm/flush.c             |    9 ++++-----
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h
index 59303e2..e587167 100644
--- a/arch/arm/include/asm/smp_plat.h
+++ b/arch/arm/include/asm/smp_plat.h
@@ -13,4 +13,13 @@ static inline int tlb_ops_need_broadcast(void)
 	return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 2;
 }
 
+#ifndef CONFIG_SMP
+#define cache_ops_need_broadcast()	0
+#else
+static inline int cache_ops_need_broadcast(void)
+{
+	return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 1;
+}
+#endif
+
 #endif
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index d0d17b6..bb60117 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -153,10 +153,8 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 
 	page = pfn_to_page(pfn);
 	mapping = page_mapping(page);
-#ifndef CONFIG_SMP
 	if (test_and_clear_bit(PG_dcache_dirty, &page->flags))
 		__flush_dcache_page(mapping, page);
-#endif
 	if (mapping) {
 		if (cache_is_vivt())
 			make_coherent(mapping, vma, addr, pfn);
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7f294f3..2d3325d 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -15,6 +15,7 @@
 #include <asm/cachetype.h>
 #include <asm/system.h>
 #include <asm/tlbflush.h>
+#include <asm/smp_plat.h>
 
 #include "mm.h"
 
@@ -198,12 +199,10 @@ void flush_dcache_page(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 
-#ifndef CONFIG_SMP
-	if (!PageHighMem(page) && mapping && !mapping_mapped(mapping))
+	if (!cache_ops_need_broadcast() &&
+	    !PageHighMem(page) && mapping && !mapping_mapped(mapping))
 		set_bit(PG_dcache_dirty, &page->flags);
-	else
-#endif
-	{
+	else {
 		__flush_dcache_page(mapping, page);
 		if (mapping && cache_is_vivt())
 			__flush_dcache_aliases(mapping, page);


* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE
From: Catalin Marinas @ 2009-12-07 14:14 UTC (permalink / raw)
  To: linux-arm-kernel

This patch enables the Access Flag in SCTLR and, together with TEX
remapping, allows the use of the spare bits in the page table entry,
thus removing the Linux-specific PTEs. The simplified permission model
is used, which means that "kernel read/write, user read-only" is no
longer available. This was used for the vectors page, but with a
dedicated TLS register it is no longer necessary.

With this feature, the following bits were changed to overlap with the
hardware bits:

L_PTE_NOEXEC	-> XN
L_PTE_PRESENT	-> bit 1
L_PTE_YOUNG	-> AP[0] (access flag)
L_PTE_USER	-> AP[1] (simplified permission model)
L_PTE_NOWRITE	-> AP[2] (simplified permission model)
L_PTE_DIRTY	-> TEX[1] (spare bit)

The TEX[2] spare bit is available for future use.

Since !L_PTE_PRESENT requires bit 0 to be unset (otherwise it would be
a large page entry), L_PTE_FILE occupies bit 2. This requires some
changes to the __swp_* and pte_to_pgoff/pgoff_to_pte macros to avoid
overwriting this bit. PTE_FILE_MAXBITS becomes 29 if AFE is enabled.

There are no changes required to the PMD_SECT_* macros because the
current usage is compatible with the simplified permission model.

If hardware management of the access flag is available and SCTLR.HA is
set, the L_PTE_YOUNG bit is automatically set when a page is
accessed. With software management of the access flag, an "access flag"
fault is generated which is handled by the do_page_fault() function.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
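For reviewers checking the first-level arithmetic, the macro changes
work out as follows (derived from the definitions in this patch; the
totals are just the standard 4GB address space with 4KB pages):

	/*
	 * Without AFE: PGDIR_SHIFT = 21, PTRS_PER_PGD = 2048,
	 *	pgd_t = unsigned long[2]  -> 2MB covered per pgd entry;
	 *	a 4KB PTE page holds 512 hardware + 512 Linux entries.
	 *
	 * With AFE:    PGDIR_SHIFT = 22, PTRS_PER_PGD = 1024,
	 *	pgd_t = unsigned long[4]  -> 4MB covered per pgd entry;
	 *	a 4KB PTE page holds 1024 hardware-only entries
	 *	(LINUX_PTE_OFFSET = 0, no shadow copy).
	 *
	 * The 4MB coverage per pgd entry is also why CONSISTENT_DMA_SIZE
	 * must become a multiple of SZ_4M.
	 */
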
 arch/arm/include/asm/memory.h  |    6 ++
 arch/arm/include/asm/page.h    |    8 +++
 arch/arm/include/asm/pgalloc.h |   10 ++-
 arch/arm/include/asm/pgtable.h |  117 +++++++++++++++++++++++++++++++++++-----
 arch/arm/mm/Kconfig            |   12 ++++
 arch/arm/mm/dma-mapping.c      |    6 ++
 arch/arm/mm/fault.c            |   10 +++
 arch/arm/mm/mmu.c              |    7 +-
 arch/arm/mm/proc-v7.S          |   56 ++++++++-----------
 9 files changed, 177 insertions(+), 55 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index bc2ff8b..d57040a 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -113,11 +113,15 @@
 #endif /* !CONFIG_MMU */
 
 /*
- * Size of DMA-consistent memory region.  Must be multiple of 2M,
+ * Size of DMA-consistent memory region.  Must be multiple of 2M (4MB if AFE),
  * between 2MB and 14MB inclusive.
  */
 #ifndef CONSISTENT_DMA_SIZE
+#ifndef CONFIG_CPU_AFE
 #define CONSISTENT_DMA_SIZE SZ_2M
+#else
+#define CONSISTENT_DMA_SIZE SZ_4M
+#endif
 #endif
 
 /*
diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 3a32af4..224159d 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -158,7 +158,11 @@ extern void copy_page(void *to, const void *from);
  */
 typedef struct { unsigned long pte; } pte_t;
 typedef struct { unsigned long pmd; } pmd_t;
+#ifndef CONFIG_CPU_AFE
 typedef struct { unsigned long pgd[2]; } pgd_t;
+#else
+typedef struct { unsigned long pgd[4]; } pgd_t;
+#endif
 typedef struct { unsigned long pgprot; } pgprot_t;
 
 #define pte_val(x)      ((x).pte)
@@ -176,7 +180,11 @@ typedef struct { unsigned long pgprot; } pgprot_t;
  */
 typedef unsigned long pte_t;
 typedef unsigned long pmd_t;
+#ifndef CONFIG_CPU_AFE
 typedef unsigned long pgd_t[2];
+#else
+typedef unsigned long pgd_t[4];
+#endif
 typedef unsigned long pgprot_t;
 
 #define pte_val(x)      (x)
diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index b12cc98..57083dd 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -62,7 +62,7 @@ pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
 	pte = (pte_t *)__get_free_page(PGALLOC_GFP);
 	if (pte) {
 		clean_dcache_area(pte, sizeof(pte_t) * PTRS_PER_PTE);
-		pte += PTRS_PER_PTE;
+		pte += LINUX_PTE_OFFSET;
 	}
 
 	return pte;
@@ -95,7 +95,7 @@ pte_alloc_one(struct mm_struct *mm, unsigned long addr)
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
 	if (pte) {
-		pte -= PTRS_PER_PTE;
+		pte -= LINUX_PTE_OFFSET;
 		free_page((unsigned long)pte);
 	}
 }
@@ -110,6 +110,10 @@ static inline void __pmd_populate(pmd_t *pmdp, unsigned long pmdval)
 {
 	pmdp[0] = __pmd(pmdval);
 	pmdp[1] = __pmd(pmdval + 256 * sizeof(pte_t));
+#ifdef CONFIG_CPU_AFE
+	pmdp[2] = __pmd(pmdval + 512 * sizeof(pte_t));
+	pmdp[3] = __pmd(pmdval + 768 * sizeof(pte_t));
+#endif
 	flush_pmd_entry(pmdp);
 }
 
@@ -128,7 +132,7 @@ pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep)
 	 * The pmd must be loaded with the physical
 	 * address of the PTE table
 	 */
-	pte_ptr -= PTRS_PER_PTE * sizeof(void *);
+	pte_ptr -= LINUX_PTE_OFFSET * sizeof(void *);
 	__pmd_populate(pmdp, __pa(pte_ptr) | _PAGE_KERNEL_TABLE);
 }
 
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 201ccaa..8429868 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -40,6 +40,7 @@
 #define VMALLOC_START		(((unsigned long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1))
 #endif
 
+#ifndef CONFIG_CPU_AFE
 /*
  * Hardware-wise, we have a two level page table structure, where the first
  * level has 4096 entries, and the second level has 256 entries.  Each entry
@@ -101,13 +102,31 @@
 #define PTRS_PER_PTE		512
 #define PTRS_PER_PMD		1
 #define PTRS_PER_PGD		2048
+#define LINUX_PTE_OFFSET	PTRS_PER_PTE
+#else
+/*
+ * If the Access Flag is enabled, Linux only uses one version of PTEs. We tell
+ * Linux that we have 1024 entries in the first level, each of which is 16
+ * bytes long (4 hardware pointers to the second level). The PTE level has
+ * 1024 entries.
+ */
+#define PTRS_PER_PTE		1024
+#define PTRS_PER_PMD		1
+#define PTRS_PER_PGD		1024
+#define LINUX_PTE_OFFSET	0
+#endif
 
 /*
  * PMD_SHIFT determines the size of the area a second-level page table can map
  * PGDIR_SHIFT determines what a third-level page table entry can map
  */
+#ifndef CONFIG_CPU_AFE
 #define PMD_SHIFT		21
 #define PGDIR_SHIFT		21
+#else
+#define PMD_SHIFT		22
+#define PGDIR_SHIFT		22
+#endif
 
 #define LIBRARY_TEXT_START	0x0c000000
 
@@ -150,6 +169,7 @@ extern void __pgd_error(const char *file, int line, unsigned long val);
 #define SUPERSECTION_SIZE	(1UL << SUPERSECTION_SHIFT)
 #define SUPERSECTION_MASK	(~(SUPERSECTION_SIZE-1))
 
+#ifndef CONFIG_CPU_AFE
 /*
  * "Linux" PTE definitions.
  *
@@ -169,7 +189,30 @@ extern void __pgd_error(const char *file, int line, unsigned long val);
 #define L_PTE_USER		(1 << 8)
 #define L_PTE_EXEC		(1 << 9)
 #define L_PTE_SHARED		(1 << 10)	/* shared(v6), coherent(xsc3) */
+#define L_PTE_NOEXEC		0
+#define L_PTE_NOWRITE		0
+#else
+/*
+ * "Linux" PTE definitions with AFE set.
+ *
+ * These bits overlap with the hardware bits but the naming is preserved for
+ * consistency with the non-AFE version.
+ */
+#define L_PTE_NOEXEC		(1 << 0)	/* XN */
+#define L_PTE_PRESENT		(1 << 1)
+#define L_PTE_FILE		(1 << 2)	/* only when !PRESENT */
+#define L_PTE_BUFFERABLE	(1 << 2)	/* B */
+#define L_PTE_CACHEABLE		(1 << 3)	/* C */
+#define L_PTE_YOUNG		(1 << 4)	/* access flag */
+#define L_PTE_USER		(1 << 5)	/* AP[1] */
+#define L_PTE_DIRTY		(1 << 7)	/* TEX[1] */
+#define L_PTE_NOWRITE		(1 << 9)	/* AP[2] */
+#define L_PTE_SHARED		(1 << 10)	/* shared(v6+) */
+#define L_PTE_EXEC		0
+#define L_PTE_WRITE		0
+#endif
 
+#ifndef CONFIG_CPU_AFE
 /*
  * These are the memory types, defined to be compatible with
  * pre-ARMv6 CPUs cacheable and bufferable bits:   XXCB
@@ -185,6 +228,22 @@ extern void __pgd_error(const char *file, int line, unsigned long val);
 #define L_PTE_MT_DEV_WC		(0x09 << 2)	/* 1001 */
 #define L_PTE_MT_DEV_CACHED	(0x0b << 2)	/* 1011 */
 #define L_PTE_MT_MASK		(0x0f << 2)
+#else
+/*
+ * AFE page table format requires TEX remapping as well: TEX[0], C, B.
+ */
+#define L_PTE_MT_UNCACHED	((0 << 6) | (0 << 2))	/* 000 */
+#define L_PTE_MT_BUFFERABLE	((0 << 6) | (1 << 2))	/* 001 */
+#define L_PTE_MT_WRITETHROUGH	((0 << 6) | (2 << 2))	/* 010 */
+#define L_PTE_MT_WRITEBACK	((0 << 6) | (3 << 2))	/* 011 */
+#define L_PTE_MT_MINICACHE	((1 << 6) | (2 << 2))	/* 110 (sa1100, xscale) */
+#define L_PTE_MT_WRITEALLOC	((1 << 6) | (3 << 2))	/* 111 */
+#define L_PTE_MT_DEV_SHARED	((1 << 6) | (0 << 2))	/* 100 */
+#define L_PTE_MT_DEV_NONSHARED	((1 << 6) | (0 << 2))	/* 100 */
+#define L_PTE_MT_DEV_WC		((0 << 6) | (1 << 2))	/* 001 */
+#define L_PTE_MT_DEV_CACHED	((0 << 6) | (3 << 2))	/* 011 */
+#define L_PTE_MT_MASK		((1 << 6) | (3 << 2))
+#endif
 
 #ifndef __ASSEMBLY__
 
@@ -202,22 +261,22 @@ extern pgprot_t		pgprot_kernel;
 #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
 
 #define PAGE_NONE		pgprot_user
-#define PAGE_SHARED		_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_WRITE)
+#define PAGE_SHARED		_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_WRITE | L_PTE_NOEXEC)
 #define PAGE_SHARED_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_WRITE | L_PTE_EXEC)
-#define PAGE_COPY		_MOD_PROT(pgprot_user, L_PTE_USER)
-#define PAGE_COPY_EXEC		_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_EXEC)
-#define PAGE_READONLY		_MOD_PROT(pgprot_user, L_PTE_USER)
-#define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_EXEC)
-#define PAGE_KERNEL		pgprot_kernel
+#define PAGE_COPY		_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_NOEXEC | L_PTE_NOWRITE)
+#define PAGE_COPY_EXEC		_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_EXEC | L_PTE_NOWRITE)
+#define PAGE_READONLY		_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_NOEXEC | L_PTE_NOWRITE)
+#define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_EXEC | L_PTE_NOWRITE)
+#define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_NOEXEC)
 #define PAGE_KERNEL_EXEC	_MOD_PROT(pgprot_kernel, L_PTE_EXEC)
 
-#define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT)
-#define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_WRITE)
+#define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_NOEXEC | L_PTE_NOWRITE)
+#define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_WRITE | L_PTE_NOEXEC)
 #define __PAGE_SHARED_EXEC	__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_WRITE | L_PTE_EXEC)
-#define __PAGE_COPY		__pgprot(_L_PTE_DEFAULT | L_PTE_USER)
-#define __PAGE_COPY_EXEC	__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_EXEC)
-#define __PAGE_READONLY		__pgprot(_L_PTE_DEFAULT | L_PTE_USER)
-#define __PAGE_READONLY_EXEC	__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_EXEC)
+#define __PAGE_COPY		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_NOEXEC | L_PTE_NOWRITE)
+#define __PAGE_COPY_EXEC	__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_EXEC | L_PTE_NOWRITE)
+#define __PAGE_READONLY		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_NOEXEC | L_PTE_NOWRITE)
+#define __PAGE_READONLY_EXEC	__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_EXEC | L_PTE_NOWRITE)
 
 #endif /* __ASSEMBLY__ */
 
@@ -287,7 +346,11 @@ extern struct page *empty_zero_page;
  * Undefined behaviour if not..
  */
 #define pte_present(pte)	(pte_val(pte) & L_PTE_PRESENT)
+#ifndef CONFIG_CPU_AFE
 #define pte_write(pte)		(pte_val(pte) & L_PTE_WRITE)
+#else
+#define pte_write(pte)		(!(pte_val(pte) & L_PTE_NOWRITE))
+#endif
 #define pte_dirty(pte)		(pte_val(pte) & L_PTE_DIRTY)
 #define pte_young(pte)		(pte_val(pte) & L_PTE_YOUNG)
 #define pte_special(pte)	(0)
@@ -295,8 +358,13 @@ extern struct page *empty_zero_page;
 #define PTE_BIT_FUNC(fn,op) \
 static inline pte_t pte_##fn(pte_t pte) { pte_val(pte) op; return pte; }
 
+#ifndef CONFIG_CPU_AFE
 PTE_BIT_FUNC(wrprotect, &= ~L_PTE_WRITE);
 PTE_BIT_FUNC(mkwrite,   |= L_PTE_WRITE);
+#else
+PTE_BIT_FUNC(wrprotect, |= L_PTE_NOWRITE);
+PTE_BIT_FUNC(mkwrite,   &= ~L_PTE_NOWRITE);
+#endif
 PTE_BIT_FUNC(mkclean,   &= ~L_PTE_DIRTY);
 PTE_BIT_FUNC(mkdirty,   |= L_PTE_DIRTY);
 PTE_BIT_FUNC(mkold,     &= ~L_PTE_YOUNG);
@@ -316,10 +384,27 @@ static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
 #define pmd_present(pmd)	(pmd_val(pmd))
 #define pmd_bad(pmd)		(pmd_val(pmd) & 2)
 
+#ifndef CONFIG_CPU_AFE
+#define copy_pmd(pmdpd,pmdps)		\
+	do {				\
+		pmdpd[0] = pmdps[0];	\
+		pmdpd[1] = pmdps[1];	\
+		flush_pmd_entry(pmdpd);	\
+	} while (0)
+
+#define pmd_clear(pmdp)			\
+	do {				\
+		pmdp[0] = __pmd(0);	\
+		pmdp[1] = __pmd(0);	\
+		clean_pmd_entry(pmdp);	\
+	} while (0)
+#else
 #define copy_pmd(pmdpd,pmdps)		\
 	do {				\
 		pmdpd[0] = pmdps[0];	\
 		pmdpd[1] = pmdps[1];	\
+		pmdpd[2] = pmdps[2];	\
+		pmdpd[3] = pmdps[3];	\
 		flush_pmd_entry(pmdpd);	\
 	} while (0)
 
@@ -327,15 +412,18 @@ static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
 	do {				\
 		pmdp[0] = __pmd(0);	\
 		pmdp[1] = __pmd(0);	\
+		pmdp[2] = __pmd(0);	\
+		pmdp[3] = __pmd(0);	\
 		clean_pmd_entry(pmdp);	\
 	} while (0)
+#endif
 
 static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 {
 	unsigned long ptr;
 
 	ptr = pmd_val(pmd) & ~(PTRS_PER_PTE * sizeof(void *) - 1);
-	ptr += PTRS_PER_PTE * sizeof(void *);
+	ptr += LINUX_PTE_OFFSET * sizeof(void *);
 
 	return __va(ptr);
 }
@@ -375,7 +463,8 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
-	const unsigned long mask = L_PTE_EXEC | L_PTE_WRITE | L_PTE_USER;
+	const unsigned long mask = L_PTE_EXEC | L_PTE_WRITE | L_PTE_USER |
+		L_PTE_NOEXEC | L_PTE_NOWRITE;
 	pte_val(pte) = (pte_val(pte) & ~mask) | (pgprot_val(newprot) & mask);
 	return pte;
 }
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index ce382f5..56aadfa 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -454,6 +454,18 @@ config CPU_32v6
 config CPU_32v7
 	bool
 
+# Page table format
+config CPU_AFE
+	bool
+	depends on MMU
+	default y if CPU_V7
+	help
+	  This option sets the Access Flag Enable bit forcing the simplified
+	  permission model and automatic management of the access bit (if
+	  supported by the hardware). With this option enabled and TEX
+	  remapping, Linux no longer keeps a separate page table entry for
+	  storing additional bits.
+
 # The abort model
 config CPU_ABRT_NOMMU
 	bool
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 176c696..15dafb6 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -25,9 +25,15 @@
 #include <asm/sizes.h>
 
 /* Sanity check size */
+#ifndef CONFIG_CPU_AFE
 #if (CONSISTENT_DMA_SIZE % SZ_2M)
 #error "CONSISTENT_DMA_SIZE must be multiple of 2MiB"
 #endif
+#else
+#if (CONSISTENT_DMA_SIZE % SZ_4M)
+#error "CONSISTENT_DMA_SIZE must be multiple of 4MiB"
+#endif
+#endif
 
 #define CONSISTENT_END	(0xffe00000)
 #define CONSISTENT_BASE	(CONSISTENT_END - CONSISTENT_DMA_SIZE)
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 10e0680..e398ade 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -107,7 +107,9 @@ void show_pte(struct mm_struct *mm, unsigned long addr)
 
 		pte = pte_offset_map(pmd, addr);
 		printk(", *pte=%08lx", pte_val(*pte));
+#ifndef CONFIG_CPU_AFE
 		printk(", *ppte=%08lx", pte_val(pte[-PTRS_PER_PTE]));
+#endif
 		pte_unmap(pte);
 	} while(0);
 
@@ -458,7 +460,11 @@ static struct fsr_info {
 	{ do_bad,		SIGILL,	 BUS_ADRALN,	"alignment exception"		   },
 	{ do_bad,		SIGBUS,	 0,		"external abort on linefetch"	   },
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"section translation fault"	   },
+#ifndef CONFIG_CPU_AFE
 	{ do_bad,		SIGBUS,	 0,		"external abort on linefetch"	   },
+#else
+	{ do_page_fault,	SIGSEGV, SEGV_MAPERR,	"access flag fault"		   },
+#endif
 	{ do_page_fault,	SIGSEGV, SEGV_MAPERR,	"page translation fault"	   },
 	{ do_bad,		SIGBUS,	 0,		"external abort on non-linefetch"  },
 	{ do_bad,		SIGSEGV, SEGV_ACCERR,	"section domain fault"		   },
@@ -532,7 +538,11 @@ static struct fsr_info ifsr_info[] = {
 	{ do_bad,		SIGSEGV, SEGV_ACCERR,	"section access flag fault"	   },
 	{ do_bad,		SIGBUS,  0,		"unknown 4"			   },
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"section translation fault"	   },
+#ifndef CONFIG_CPU_AFE
 	{ do_bad,		SIGSEGV, SEGV_ACCERR,	"page access flag fault"	   },
+#else
+	{ do_page_fault,	SIGSEGV, SEGV_MAPERR,	"access flag fault"		   },
+#endif
 	{ do_page_fault,	SIGSEGV, SEGV_MAPERR,	"page translation fault"	   },
 	{ do_bad,		SIGBUS,	 0,		"external abort on non-linefetch"  },
 	{ do_bad,		SIGSEGV, SEGV_ACCERR,	"section domain fault"		   },
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index ea67be0..b3796a0 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -190,7 +190,7 @@ void adjust_cr(unsigned long mask, unsigned long set)
 }
 #endif
 
-#define PROT_PTE_DEVICE		L_PTE_PRESENT|L_PTE_YOUNG|L_PTE_DIRTY|L_PTE_WRITE
+#define PROT_PTE_DEVICE		L_PTE_PRESENT|L_PTE_YOUNG|L_PTE_DIRTY|L_PTE_WRITE|L_PTE_NOEXEC
 #define PROT_SECT_DEVICE	PMD_TYPE_SECT|PMD_SECT_AP_WRITE
 
 static struct mem_type mem_types[] = {
@@ -241,7 +241,7 @@ static struct mem_type mem_types[] = {
 	},
 	[MT_HIGH_VECTORS] = {
 		.prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY |
-				L_PTE_USER | L_PTE_EXEC,
+				L_PTE_USER | L_PTE_EXEC | L_PTE_NOWRITE,
 		.prot_l1   = PMD_TYPE_TABLE,
 		.domain    = DOMAIN_USER,
 	},
@@ -491,7 +491,8 @@ static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
 	pte_t *pte;
 
 	if (pmd_none(*pmd)) {
-		pte = alloc_bootmem_low_pages(2 * PTRS_PER_PTE * sizeof(pte_t));
+		pte = alloc_bootmem_low_pages((LINUX_PTE_OFFSET
+					       + PTRS_PER_PTE) * sizeof(pte_t));
 		__pmd_populate(pmd, __pa(pte) | type->prot_l1);
 	}
 
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 3a28521..568ccfc 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -126,38 +126,26 @@ ENDPROC(cpu_v7_switch_mm)
  *		  (hardware version is stored at -1024 bytes)
  *	- pte   - PTE value to store
  *	- ext	- value for extended PTE bits
+ *
+ *	Simplified permission translation (AP0 is the access flag):
+ *	YUWD  AP2 AP1 AP0	SVC	User
+ *	0xxx   0   0   0	no acc	no acc
+ *	100x   1   0   1	r/o	no acc
+ *	10x0   1   0   1	r/o	no acc
+ *	1011   0   0   1	r/w	no acc
+ *	110x   1   1   1	r/o	r/o
+ *	11x0   1   1   1	r/o	r/o
+ *	1111   0   1   1	r/w	r/w
  */
 ENTRY(cpu_v7_set_pte_ext)
 #ifdef CONFIG_MMU
- ARM(	str	r1, [r0], #-2048	)	@ linux version
- THUMB(	str	r1, [r0]		)	@ linux version
- THUMB(	sub	r0, r0, #2048		)
-
-	bic	r3, r1, #0x000003f0
-	bic	r3, r3, #PTE_TYPE_MASK
-	orr	r3, r3, r2
-	orr	r3, r3, #PTE_EXT_AP0 | 2
-
-	tst	r1, #1 << 4
-	orrne	r3, r3, #PTE_EXT_TEX(1)
-
-	tst	r1, #L_PTE_WRITE
-	tstne	r1, #L_PTE_DIRTY
-	orreq	r3, r3, #PTE_EXT_APX
-
-	tst	r1, #L_PTE_USER
-	orrne	r3, r3, #PTE_EXT_AP1
-	tstne	r3, #PTE_EXT_APX
-	bicne	r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
-
-	tst	r1, #L_PTE_EXEC
-	orreq	r3, r3, #PTE_EXT_XN
-
-	tst	r1, #L_PTE_YOUNG
-	tstne	r1, #L_PTE_PRESENT
-	moveq	r3, #0
-
-	str	r3, [r0]
+	tst	r1, #L_PTE_PRESENT
+	beq	1f
+	tst	r1, #L_PTE_DIRTY
+	orreq	r1, #L_PTE_NOWRITE
+	orr	r1, r1, r2
+1:
+	str	r1, [r0]
 	mcr	p15, 0, r0, c7, c10, 1		@ flush_pte
 #endif
 	mov	pc, lr
@@ -283,14 +271,14 @@ __v7_setup:
 ENDPROC(__v7_setup)
 
 	/*   AT
-	 *  TFR   EV X F   I D LR    S
-	 * .EEE ..EE PUI. .T.T 4RVI ZWRS BLDP WCAM
-	 * rxxx rrxx xxx0 0101 xxxx xxxx x111 xxxx < forced
-	 *    1    0 110       0011 1100 .111 1101 < we want
+	 *  TFR   EV X F   IHD LR    S
+	 * .EEE ..EE PUI. .TAT 4RVI ZWRS BLDP WCAM
+	 * rxxx rrxx xxx0 01x1 xxxx xxxx x111 xxxx < forced
+	 *   11    0 110    1  0011 1100 .111 1101 < we want
 	 */
 	.type	v7_crval, #object
 v7_crval:
-	crval	clear=0x0120c302, mmuset=0x10c03c7d, ucset=0x00c01c7c
+	crval	clear=0x0120c302, mmuset=0x30c23c7d, ucset=0x00c01c7c
 
 __v7_setup_stack:
 	.space	4 * 11				@ 11 registers


* [PATCH 6/6] Remove the domain switching on ARMv6k/v7 CPUs
From: Catalin Marinas @ 2009-12-07 14:16 UTC (permalink / raw)
  To: linux-arm-kernel

This patch removes the domain switching functionality via the set_fs and
__switch_to functions on cores that have a TLS register.

Currently, the ioremap and vmalloc areas share the same level 1 page
tables and therefore have the same domain (DOMAIN_KERNEL). When the
kernel domain is modified from Client to Manager (via __set_fs or
__switch_to), the XN (eXecute Never) bit is overridden and newer CPUs
can speculatively prefetch from the ioremap'ed memory.

Linux performs the kernel domain switching to allow user-specific
functions (copy_to/from_user, get/put_user etc.) to access kernel
memory. In order for these functions to work with the kernel domain set
to Client, the patch modifies the LDRT/STRT and related instructions to
the LDR/STR ones.

The access rights of user pages are also modified to kernel read-only
rather than read/write so that the copy-on-write mechanism still
works. CPU_USE_DOMAINS gets disabled only if HAS_TLS_REG is defined,
since without a TLS register the TLS value must be written to the (now
kernel read-only) high vectors page.

The user addresses passed to the kernel are checked by the access_ok()
function so that they do not point to the kernel space.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---

An additional note - prior to ARMv6 we cannot set "user read-only,
kernel read-only" permissions on a page, hence we have to use the STRT
variant to write to such a page from the kernel. Because of this, the
T() macro had to be introduced to differentiate between the STRT and
STR usages at build time.

A better name could be used instead of "T".
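
To show what this looks like in the uaccess inline assembly, the same
source line assembles both ways (illustrative expansion; the ",#0"
keeps a post-indexed form, which is the only addressing mode the
LDRT/STRT class accepts in ARM state):

	/* CONFIG_CPU_USE_DOMAINS=y: user-mode access forced via domains */
	"1:	ldrt	%1,[%2],#0\n"

	/* CONFIG_CPU_USE_DOMAINS=n: plain access, guarded by access_ok() */
	"1:	ldr	%1,[%2],#0\n"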


 arch/arm/include/asm/assembler.h |    9 ++--
 arch/arm/include/asm/domain.h    |   31 +++++++++++++-
 arch/arm/include/asm/futex.h     |    9 ++--
 arch/arm/include/asm/uaccess.h   |   16 ++++---
 arch/arm/kernel/entry-armv.S     |    4 +-
 arch/arm/kernel/traps.c          |   17 ++++++++
 arch/arm/lib/getuser.S           |   13 +++---
 arch/arm/lib/putuser.S           |   29 +++++++------
 arch/arm/lib/uaccess.S           |   83 +++++++++++++++++++-------------------
 arch/arm/mm/Kconfig              |    9 ++++
 arch/arm/mm/proc-v7.S            |    2 -
 11 files changed, 139 insertions(+), 83 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 00f46d9..4b82143 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -18,6 +18,7 @@
 #endif
 
 #include <asm/ptrace.h>
+#include <asm/domain.h>
 
 /*
  * Endian independent macros for shifting bytes within registers.
@@ -186,9 +187,9 @@
 	.macro	usraccoff, instr, reg, ptr, inc, off, cond, abort
 9999:
 	.if	\inc == 1
-	\instr\cond\()bt \reg, [\ptr, #\off]
+	T(\instr\cond\()b) \reg, [\ptr, #\off]
 	.elseif	\inc == 4
-	\instr\cond\()t \reg, [\ptr, #\off]
+	T(\instr\cond\()) \reg, [\ptr, #\off]
 	.else
 	.error	"Unsupported inc macro argument"
 	.endif
@@ -227,9 +228,9 @@
 	.rept	\rept
 9999:
 	.if	\inc == 1
-	\instr\cond\()bt \reg, [\ptr], #\inc
+	T(\instr\cond\()b) \reg, [\ptr], #\inc
 	.elseif	\inc == 4
-	\instr\cond\()t \reg, [\ptr], #\inc
+	T(\instr\cond\()) \reg, [\ptr], #\inc
 	.else
 	.error	"Unsupported inc macro argument"
 	.endif
diff --git a/arch/arm/include/asm/domain.h b/arch/arm/include/asm/domain.h
index cc7ef40..af18cea 100644
--- a/arch/arm/include/asm/domain.h
+++ b/arch/arm/include/asm/domain.h
@@ -45,13 +45,17 @@
  */
 #define DOMAIN_NOACCESS	0
 #define DOMAIN_CLIENT	1
+#ifdef CONFIG_CPU_USE_DOMAINS
 #define DOMAIN_MANAGER	3
+#else
+#define DOMAIN_MANAGER	1
+#endif
 
 #define domain_val(dom,type)	((type) << (2*(dom)))
 
 #ifndef __ASSEMBLY__
 
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
 #define set_domain(x)					\
 	do {						\
 	__asm__ __volatile__(				\
@@ -74,5 +78,28 @@
 #define modify_domain(dom,type)	do { } while (0)
 #endif
 
+/*
+ * Generate the T (user) versions of the LDR/STR and related
+ * instructions (inline assembly)
+ */
+#ifdef CONFIG_CPU_USE_DOMAINS
+#define T(instr)	#instr "t"
+#else
+#define T(instr)	#instr
 #endif
-#endif /* !__ASSEMBLY__ */
+
+#else /* __ASSEMBLY__ */
+
+/*
+ * Generate the T (user) versions of the LDR/STR and related
+ * instructions
+ */
+#ifdef CONFIG_CPU_USE_DOMAINS
+#define T(instr)	instr ## t
+#else
+#define T(instr)	instr
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* !__ASM_PROC_DOMAIN_H */
diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index bfcc159..8d868bd 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -13,12 +13,13 @@
 #include <linux/preempt.h>
 #include <linux/uaccess.h>
 #include <asm/errno.h>
+#include <asm/domain.h>
 
 #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg)	\
 	__asm__ __volatile__(					\
-	"1:	ldrt	%1, [%2]\n"				\
+	"1:	" T(ldr) "	%1, [%2]\n"			\
 	"	" insn "\n"					\
-	"2:	strt	%0, [%2]\n"				\
+	"2:	" T(str) "	%0, [%2]\n"			\
 	"	mov	%0, #0\n"				\
 	"3:\n"							\
 	"	.section __ex_table,\"a\"\n"			\
@@ -97,10 +98,10 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
 	pagefault_disable();	/* implies preempt_disable() */
 
 	__asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
-	"1:	ldrt	%0, [%3]\n"
+	"1:	" T(ldr) "	%0, [%3]\n"
 	"	teq	%0, %1\n"
 	"	it	eq	@ explicit IT needed for the 2b label\n"
-	"2:	streqt	%2, [%3]\n"
+	"2:	" T(streq) "	%2, [%3]\n"
 	"3:\n"
 	"	.section __ex_table,\"a\"\n"
 	"	.align	3\n"
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 1d6bd40..e4d0905 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -227,7 +227,7 @@ do {									\
 
 #define __get_user_asm_byte(x,addr,err)				\
 	__asm__ __volatile__(					\
-	"1:	ldrbt	%1,[%2]\n"				\
+	"1:	" T(ldrb) "	%1,[%2],#0\n"			\
 	"2:\n"							\
 	"	.section .fixup,\"ax\"\n"			\
 	"	.align	2\n"					\
@@ -263,7 +263,7 @@ do {									\
 
 #define __get_user_asm_word(x,addr,err)				\
 	__asm__ __volatile__(					\
-	"1:	ldrt	%1,[%2]\n"				\
+	"1:	" T(ldr) "	%1,[%2],#0\n"			\
 	"2:\n"							\
 	"	.section .fixup,\"ax\"\n"			\
 	"	.align	2\n"					\
@@ -308,7 +308,7 @@ do {									\
 
 #define __put_user_asm_byte(x,__pu_addr,err)			\
 	__asm__ __volatile__(					\
-	"1:	strbt	%1,[%2]\n"				\
+	"1:	" T(strb) "	%1,[%2],#0\n"			\
 	"2:\n"							\
 	"	.section .fixup,\"ax\"\n"			\
 	"	.align	2\n"					\
@@ -341,7 +341,7 @@ do {									\
 
 #define __put_user_asm_word(x,__pu_addr,err)			\
 	__asm__ __volatile__(					\
-	"1:	strt	%1,[%2]\n"				\
+	"1:	" T(str) "	%1,[%2],#0\n"			\
 	"2:\n"							\
 	"	.section .fixup,\"ax\"\n"			\
 	"	.align	2\n"					\
@@ -366,10 +366,10 @@ do {									\
 
 #define __put_user_asm_dword(x,__pu_addr,err)			\
 	__asm__ __volatile__(					\
- ARM(	"1:	strt	" __reg_oper1 ", [%1], #4\n"	)	\
- ARM(	"2:	strt	" __reg_oper0 ", [%1]\n"	)	\
- THUMB(	"1:	strt	" __reg_oper1 ", [%1]\n"	)	\
- THUMB(	"2:	strt	" __reg_oper0 ", [%1, #4]\n"	)	\
+ ARM(	"1:	" T(str) "	" __reg_oper1 ", [%1], #4\n"	)	\
+ ARM(	"2:	" T(str) "	" __reg_oper0 ", [%1]\n"	)	\
+ THUMB(	"1:	" T(str) "	" __reg_oper1 ", [%1]\n"	)	\
+ THUMB(	"2:	" T(str) "	" __reg_oper0 ", [%1, #4]\n"	)	\
 	"3:\n"							\
 	"	.section .fixup,\"ax\"\n"			\
 	"	.align	2\n"					\
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index d2903e3..1b31ecb 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -736,7 +736,7 @@ ENTRY(__switch_to)
  THUMB(	stmia	ip!, {r4 - sl, fp}	   )	@ Store most regs on stack
  THUMB(	str	sp, [ip], #4		   )
  THUMB(	str	lr, [ip], #4		   )
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
 	ldr	r6, [r2, #TI_CPU_DOMAIN]
 #endif
 #if defined(CONFIG_HAS_TLS_REG)
@@ -745,7 +745,7 @@ ENTRY(__switch_to)
 	mov	r4, #0xffff0fff
 	str	r3, [r4, #-15]			@ TLS val at 0xffff0ff0
 #endif
-#ifdef CONFIG_MMU
+#ifdef CONFIG_CPU_USE_DOMAINS
 	mcr	p15, 0, r6, c3, c0, 0		@ Set domain register
 #endif
 	mov	r5, r0
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 3f361a7..23d7673 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -28,6 +28,7 @@
 #include <asm/unistd.h>
 #include <asm/traps.h>
 #include <asm/unwind.h>
+#include <asm/tlbflush.h>
 
 #include "ptrace.h"
 #include "signal.h"
@@ -735,6 +736,16 @@ void __init early_trap_init(void)
 	extern char __vectors_start[], __vectors_end[];
 	extern char __kuser_helper_start[], __kuser_helper_end[];
 	int kuser_sz = __kuser_helper_end - __kuser_helper_start;
+#ifndef CONFIG_CPU_USE_DOMAINS
+	pgd_t *pgd = pgd_offset_k(vectors);
+	pmd_t *pmd = pmd_offset(pgd, vectors);
+	pte_t *pte = pte_offset_kernel(pmd, vectors);
+	pte_t entry = *pte;
+
+	/* allow writing to the vectors page */
+	set_pte_ext(pte, pte_mkwrite(entry), 0);
+	local_flush_tlb_kernel_page(vectors);
+#endif
 
 	/*
 	 * Copy the vectors, stubs and kuser helpers (in entry-armv.S)
@@ -754,6 +765,12 @@ void __init early_trap_init(void)
 	memcpy((void *)KERN_RESTART_CODE, syscall_restart_code,
 	       sizeof(syscall_restart_code));
 
+#ifndef CONFIG_CPU_USE_DOMAINS
+	/* restore the vectors page permissions */
+	set_pte_ext(pte, entry, 0);
+	local_flush_tlb_kernel_page(vectors);
+#endif
+
 	flush_icache_range(vectors, vectors + PAGE_SIZE);
 	modify_domain(DOMAIN_USER, DOMAIN_CLIENT);
 }
diff --git a/arch/arm/lib/getuser.S b/arch/arm/lib/getuser.S
index a1814d9..acc966b 100644
--- a/arch/arm/lib/getuser.S
+++ b/arch/arm/lib/getuser.S
@@ -28,20 +28,21 @@
  */
 #include <linux/linkage.h>
 #include <asm/errno.h>
+#include <asm/domain.h>
 
 ENTRY(__get_user_1)
-1:	ldrbt	r2, [r0]
+1:	T(ldrb)	r2, [r0]
 	mov	r0, #0
 	mov	pc, lr
 ENDPROC(__get_user_1)
 
 ENTRY(__get_user_2)
 #ifdef CONFIG_THUMB2_KERNEL
-2:	ldrbt	r2, [r0]
-3:	ldrbt	r3, [r0, #1]
+2:	T(ldrb)	r2, [r0]
+3:	T(ldrb)	r3, [r0, #1]
 #else
-2:	ldrbt	r2, [r0], #1
-3:	ldrbt	r3, [r0]
+2:	T(ldrb)	r2, [r0], #1
+3:	T(ldrb)	r3, [r0]
 #endif
 #ifndef __ARMEB__
 	orr	r2, r2, r3, lsl #8
@@ -53,7 +54,7 @@ ENTRY(__get_user_2)
 ENDPROC(__get_user_2)
 
 ENTRY(__get_user_4)
-4:	ldrt	r2, [r0]
+4:	T(ldr)	r2, [r0]
 	mov	r0, #0
 	mov	pc, lr
 ENDPROC(__get_user_4)
diff --git a/arch/arm/lib/putuser.S b/arch/arm/lib/putuser.S
index 02fedbf..95b3fe8 100644
--- a/arch/arm/lib/putuser.S
+++ b/arch/arm/lib/putuser.S
@@ -28,9 +28,10 @@
  */
 #include <linux/linkage.h>
 #include <asm/errno.h>
+#include <asm/domain.h>
 
 ENTRY(__put_user_1)
-1:	strbt	r2, [r0]
+1:	T(strb)	r2, [r0]
 	mov	r0, #0
 	mov	pc, lr
 ENDPROC(__put_user_1)
@@ -39,19 +40,19 @@ ENTRY(__put_user_2)
 	mov	ip, r2, lsr #8
 #ifdef CONFIG_THUMB2_KERNEL
 #ifndef __ARMEB__
-2:	strbt	r2, [r0]
-3:	strbt	ip, [r0, #1]
+2:	T(strb)	r2, [r0]
+3:	T(strb)	ip, [r0, #1]
 #else
-2:	strbt	ip, [r0]
-3:	strbt	r2, [r0, #1]
+2:	T(strb)	ip, [r0]
+3:	T(strb)	r2, [r0, #1]
 #endif
 #else	/* !CONFIG_THUMB2_KERNEL */
 #ifndef __ARMEB__
-2:	strbt	r2, [r0], #1
-3:	strbt	ip, [r0]
+2:	T(strb)	r2, [r0], #1
+3:	T(strb)	ip, [r0]
 #else
-2:	strbt	ip, [r0], #1
-3:	strbt	r2, [r0]
+2:	T(strb)	ip, [r0], #1
+3:	T(strb)	r2, [r0]
 #endif
 #endif	/* CONFIG_THUMB2_KERNEL */
 	mov	r0, #0
@@ -59,18 +60,18 @@ ENTRY(__put_user_2)
 ENDPROC(__put_user_2)
 
 ENTRY(__put_user_4)
-4:	strt	r2, [r0]
+4:	T(str)	r2, [r0]
 	mov	r0, #0
 	mov	pc, lr
 ENDPROC(__put_user_4)
 
 ENTRY(__put_user_8)
 #ifdef CONFIG_THUMB2_KERNEL
-5:	strt	r2, [r0]
-6:	strt	r3, [r0, #4]
+5:	T(str)	r2, [r0]
+6:	T(str)	r3, [r0, #4]
 #else
-5:	strt	r2, [r0], #4
-6:	strt	r3, [r0]
+5:	T(str)	r2, [r0], #4
+6:	T(str)	r3, [r0]
 #endif
 	mov	r0, #0
 	mov	pc, lr
diff --git a/arch/arm/lib/uaccess.S b/arch/arm/lib/uaccess.S
index ffdd274..e47cdfd 100644
--- a/arch/arm/lib/uaccess.S
+++ b/arch/arm/lib/uaccess.S
@@ -14,6 +14,7 @@
 #include <linux/linkage.h>
 #include <asm/assembler.h>
 #include <asm/errno.h>
+#include <asm/domain.h>
 
 		.text
 
@@ -31,11 +32,11 @@
 		rsb	ip, ip, #4
 		cmp	ip, #2
 		ldrb	r3, [r1], #1
-USER(		strbt	r3, [r0], #1)			@ May fault
+USER(		T(strb)	r3, [r0], #1)			@ May fault
 		ldrgeb	r3, [r1], #1
-USER(		strgebt	r3, [r0], #1)			@ May fault
+USER(		T(strgeb) r3, [r0], #1)			@ May fault
 		ldrgtb	r3, [r1], #1
-USER(		strgtbt	r3, [r0], #1)			@ May fault
+USER(		T(strgtb) r3, [r0], #1)			@ May fault
 		sub	r2, r2, ip
 		b	.Lc2u_dest_aligned
 
@@ -58,7 +59,7 @@ ENTRY(__copy_to_user)
 		addmi	ip, r2, #4
 		bmi	.Lc2u_0nowords
 		ldr	r3, [r1], #4
-USER(		strt	r3, [r0], #4)			@ May fault
+USER(		T(str)	r3, [r0], #4)			@ May fault
 		mov	ip, r0, lsl #32 - PAGE_SHIFT	@ On each page, use a ld/st??t instruction
 		rsb	ip, ip, #0
 		movs	ip, ip, lsr #32 - PAGE_SHIFT
@@ -87,18 +88,18 @@ USER(		strt	r3, [r0], #4)			@ May fault
 		stmneia	r0!, {r3 - r4}			@ Shouldnt fault
 		tst	ip, #4
 		ldrne	r3, [r1], #4
-		strnet	r3, [r0], #4			@ Shouldnt fault
+		T(strne) r3, [r0], #4			@ Shouldnt fault
 		ands	ip, ip, #3
 		beq	.Lc2u_0fupi
 .Lc2u_0nowords:	teq	ip, #0
 		beq	.Lc2u_finished
 .Lc2u_nowords:	cmp	ip, #2
 		ldrb	r3, [r1], #1
-USER(		strbt	r3, [r0], #1)			@ May fault
+USER(		T(strb)	r3, [r0], #1)			@ May fault
 		ldrgeb	r3, [r1], #1
-USER(		strgebt	r3, [r0], #1)			@ May fault
+USER(		T(strgeb) r3, [r0], #1)			@ May fault
 		ldrgtb	r3, [r1], #1
-USER(		strgtbt	r3, [r0], #1)			@ May fault
+USER(		T(strgtb) r3, [r0], #1)			@ May fault
 		b	.Lc2u_finished
 
 .Lc2u_not_enough:
@@ -119,7 +120,7 @@ USER(		strgtbt	r3, [r0], #1)			@ May fault
 		mov	r3, r7, pull #8
 		ldr	r7, [r1], #4
 		orr	r3, r3, r7, push #24
-USER(		strt	r3, [r0], #4)			@ May fault
+USER(		T(str)	r3, [r0], #4)			@ May fault
 		mov	ip, r0, lsl #32 - PAGE_SHIFT
 		rsb	ip, ip, #0
 		movs	ip, ip, lsr #32 - PAGE_SHIFT
@@ -154,18 +155,18 @@ USER(		strt	r3, [r0], #4)			@ May fault
 		movne	r3, r7, pull #8
 		ldrne	r7, [r1], #4
 		orrne	r3, r3, r7, push #24
-		strnet	r3, [r0], #4			@ Shouldnt fault
+		T(strne) r3, [r0], #4			@ Shouldnt fault
 		ands	ip, ip, #3
 		beq	.Lc2u_1fupi
 .Lc2u_1nowords:	mov	r3, r7, get_byte_1
 		teq	ip, #0
 		beq	.Lc2u_finished
 		cmp	ip, #2
-USER(		strbt	r3, [r0], #1)			@ May fault
+USER(		T(strb)	r3, [r0], #1)			@ May fault
 		movge	r3, r7, get_byte_2
-USER(		strgebt	r3, [r0], #1)			@ May fault
+USER(		T(strgeb) r3, [r0], #1)			@ May fault
 		movgt	r3, r7, get_byte_3
-USER(		strgtbt	r3, [r0], #1)			@ May fault
+USER(		T(strgtb) r3, [r0], #1)			@ May fault
 		b	.Lc2u_finished
 
 .Lc2u_2fupi:	subs	r2, r2, #4
@@ -174,7 +175,7 @@ USER(		strgtbt	r3, [r0], #1)			@ May fault
 		mov	r3, r7, pull #16
 		ldr	r7, [r1], #4
 		orr	r3, r3, r7, push #16
-USER(		strt	r3, [r0], #4)			@ May fault
+USER(		T(str)	r3, [r0], #4)			@ May fault
 		mov	ip, r0, lsl #32 - PAGE_SHIFT
 		rsb	ip, ip, #0
 		movs	ip, ip, lsr #32 - PAGE_SHIFT
@@ -209,18 +210,18 @@ USER(		strt	r3, [r0], #4)			@ May fault
 		movne	r3, r7, pull #16
 		ldrne	r7, [r1], #4
 		orrne	r3, r3, r7, push #16
-		strnet	r3, [r0], #4			@ Shouldnt fault
+		T(strne) r3, [r0], #4			@ Shouldnt fault
 		ands	ip, ip, #3
 		beq	.Lc2u_2fupi
 .Lc2u_2nowords:	mov	r3, r7, get_byte_2
 		teq	ip, #0
 		beq	.Lc2u_finished
 		cmp	ip, #2
-USER(		strbt	r3, [r0], #1)			@ May fault
+USER(		T(strb)	r3, [r0], #1)			@ May fault
 		movge	r3, r7, get_byte_3
-USER(		strgebt	r3, [r0], #1)			@ May fault
+USER(		T(strgeb) r3, [r0], #1)			@ May fault
 		ldrgtb	r3, [r1], #0
-USER(		strgtbt	r3, [r0], #1)			@ May fault
+USER(		T(strgtb) r3, [r0], #1)			@ May fault
 		b	.Lc2u_finished
 
 .Lc2u_3fupi:	subs	r2, r2, #4
@@ -229,7 +230,7 @@ USER(		strgtbt	r3, [r0], #1)			@ May fault
 		mov	r3, r7, pull #24
 		ldr	r7, [r1], #4
 		orr	r3, r3, r7, push #8
-USER(		strt	r3, [r0], #4)			@ May fault
+USER(		T(str)	r3, [r0], #4)			@ May fault
 		mov	ip, r0, lsl #32 - PAGE_SHIFT
 		rsb	ip, ip, #0
 		movs	ip, ip, lsr #32 - PAGE_SHIFT
@@ -264,18 +265,18 @@ USER(		strt	r3, [r0], #4)			@ May fault
 		movne	r3, r7, pull #24
 		ldrne	r7, [r1], #4
 		orrne	r3, r3, r7, push #8
-		strnet	r3, [r0], #4			@ Shouldnt fault
+		T(strne) r3, [r0], #4			@ Shouldnt fault
 		ands	ip, ip, #3
 		beq	.Lc2u_3fupi
 .Lc2u_3nowords:	mov	r3, r7, get_byte_3
 		teq	ip, #0
 		beq	.Lc2u_finished
 		cmp	ip, #2
-USER(		strbt	r3, [r0], #1)			@ May fault
+USER(		T(strb)	r3, [r0], #1)			@ May fault
 		ldrgeb	r3, [r1], #1
-USER(		strgebt	r3, [r0], #1)			@ May fault
+USER(		T(strgeb) r3, [r0], #1)			@ May fault
 		ldrgtb	r3, [r1], #0
-USER(		strgtbt	r3, [r0], #1)			@ May fault
+USER(		T(strgtb) r3, [r0], #1)			@ May fault
 		b	.Lc2u_finished
 ENDPROC(__copy_to_user)
 
@@ -294,11 +295,11 @@ ENDPROC(__copy_to_user)
 .Lcfu_dest_not_aligned:
 		rsb	ip, ip, #4
 		cmp	ip, #2
-USER(		ldrbt	r3, [r1], #1)			@ May fault
+USER(		T(ldrb)	r3, [r1], #1)			@ May fault
 		strb	r3, [r0], #1
-USER(		ldrgebt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgeb) r3, [r1], #1)			@ May fault
 		strgeb	r3, [r0], #1
-USER(		ldrgtbt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgtb) r3, [r1], #1)			@ May fault
 		strgtb	r3, [r0], #1
 		sub	r2, r2, ip
 		b	.Lcfu_dest_aligned
@@ -321,7 +322,7 @@ ENTRY(__copy_from_user)
 .Lcfu_0fupi:	subs	r2, r2, #4
 		addmi	ip, r2, #4
 		bmi	.Lcfu_0nowords
-USER(		ldrt	r3, [r1], #4)
+USER(		T(ldr)	r3, [r1], #4)
 		str	r3, [r0], #4
 		mov	ip, r1, lsl #32 - PAGE_SHIFT	@ On each page, use a ld/st??t instruction
 		rsb	ip, ip, #0
@@ -350,18 +351,18 @@ USER(		ldrt	r3, [r1], #4)
 		ldmneia	r1!, {r3 - r4}			@ Shouldnt fault
 		stmneia	r0!, {r3 - r4}
 		tst	ip, #4
-		ldrnet	r3, [r1], #4			@ Shouldnt fault
+		T(ldrne) r3, [r1], #4			@ Shouldnt fault
 		strne	r3, [r0], #4
 		ands	ip, ip, #3
 		beq	.Lcfu_0fupi
 .Lcfu_0nowords:	teq	ip, #0
 		beq	.Lcfu_finished
 .Lcfu_nowords:	cmp	ip, #2
-USER(		ldrbt	r3, [r1], #1)			@ May fault
+USER(		T(ldrb)	r3, [r1], #1)			@ May fault
 		strb	r3, [r0], #1
-USER(		ldrgebt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgeb) r3, [r1], #1)			@ May fault
 		strgeb	r3, [r0], #1
-USER(		ldrgtbt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgtb) r3, [r1], #1)			@ May fault
 		strgtb	r3, [r0], #1
 		b	.Lcfu_finished
 
@@ -374,7 +375,7 @@ USER(		ldrgtbt	r3, [r1], #1)			@ May fault
 
 .Lcfu_src_not_aligned:
 		bic	r1, r1, #3
-USER(		ldrt	r7, [r1], #4)			@ May fault
+USER(		T(ldr)	r7, [r1], #4)			@ May fault
 		cmp	ip, #2
 		bgt	.Lcfu_3fupi
 		beq	.Lcfu_2fupi
@@ -382,7 +383,7 @@ USER(		ldrt	r7, [r1], #4)			@ May fault
 		addmi	ip, r2, #4
 		bmi	.Lcfu_1nowords
 		mov	r3, r7, pull #8
-USER(		ldrt	r7, [r1], #4)			@ May fault
+USER(		T(ldr)	r7, [r1], #4)			@ May fault
 		orr	r3, r3, r7, push #24
 		str	r3, [r0], #4
 		mov	ip, r1, lsl #32 - PAGE_SHIFT
@@ -417,7 +418,7 @@ USER(		ldrt	r7, [r1], #4)			@ May fault
 		stmneia	r0!, {r3 - r4}
 		tst	ip, #4
 		movne	r3, r7, pull #8
-USER(		ldrnet	r7, [r1], #4)			@ May fault
+USER(		T(ldrne) r7, [r1], #4)			@ May fault
 		orrne	r3, r3, r7, push #24
 		strne	r3, [r0], #4
 		ands	ip, ip, #3
@@ -437,7 +438,7 @@ USER(		ldrnet	r7, [r1], #4)			@ May fault
 		addmi	ip, r2, #4
 		bmi	.Lcfu_2nowords
 		mov	r3, r7, pull #16
-USER(		ldrt	r7, [r1], #4)			@ May fault
+USER(		T(ldr)	r7, [r1], #4)			@ May fault
 		orr	r3, r3, r7, push #16
 		str	r3, [r0], #4
 		mov	ip, r1, lsl #32 - PAGE_SHIFT
@@ -473,7 +474,7 @@ USER(		ldrt	r7, [r1], #4)			@ May fault
 		stmneia	r0!, {r3 - r4}
 		tst	ip, #4
 		movne	r3, r7, pull #16
-USER(		ldrnet	r7, [r1], #4)			@ May fault
+USER(		T(ldrne) r7, [r1], #4)			@ May fault
 		orrne	r3, r3, r7, push #16
 		strne	r3, [r0], #4
 		ands	ip, ip, #3
@@ -485,7 +486,7 @@ USER(		ldrnet	r7, [r1], #4)			@ May fault
 		strb	r3, [r0], #1
 		movge	r3, r7, get_byte_3
 		strgeb	r3, [r0], #1
-USER(		ldrgtbt	r3, [r1], #0)			@ May fault
+USER(		T(ldrgtb) r3, [r1], #0)			@ May fault
 		strgtb	r3, [r0], #1
 		b	.Lcfu_finished
 
@@ -493,7 +494,7 @@ USER(		ldrgtbt	r3, [r1], #0)			@ May fault
 		addmi	ip, r2, #4
 		bmi	.Lcfu_3nowords
 		mov	r3, r7, pull #24
-USER(		ldrt	r7, [r1], #4)			@ May fault
+USER(		T(ldr)	r7, [r1], #4)			@ May fault
 		orr	r3, r3, r7, push #8
 		str	r3, [r0], #4
 		mov	ip, r1, lsl #32 - PAGE_SHIFT
@@ -528,7 +529,7 @@ USER(		ldrt	r7, [r1], #4)			@ May fault
 		stmneia	r0!, {r3 - r4}
 		tst	ip, #4
 		movne	r3, r7, pull #24
-USER(		ldrnet	r7, [r1], #4)			@ May fault
+USER(		T(ldrne) r7, [r1], #4)			@ May fault
 		orrne	r3, r3, r7, push #8
 		strne	r3, [r0], #4
 		ands	ip, ip, #3
@@ -538,9 +539,9 @@ USER(		ldrnet	r7, [r1], #4)			@ May fault
 		beq	.Lcfu_finished
 		cmp	ip, #2
 		strb	r3, [r0], #1
-USER(		ldrgebt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgeb) r3, [r1], #1)			@ May fault
 		strgeb	r3, [r0], #1
-USER(		ldrgtbt	r3, [r1], #1)			@ May fault
+USER(		T(ldrgtb) r3, [r1], #1)			@ May fault
 		strgtb	r3, [r0], #1
 		b	.Lcfu_finished
 ENDPROC(__copy_from_user)
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 56aadfa..8bc0421 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -618,6 +618,15 @@ config CPU_CP15_MPU
 	help
 	  Processor has the CP15 register, which has MPU related registers.
 
+config CPU_USE_DOMAINS
+	bool
+	depends on MMU
+	default n if HAS_TLS_REG
+	default y
+	help
+	  This option enables or disables the use of domain switching
+	  via the set_fs() function.
+
 #
 # CPU supports 36-bit I/O
 #
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 568ccfc..f0fc850 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -223,8 +223,6 @@ __v7_setup:
 	mcr	p15, 0, r10, c2, c0, 2		@ TTB control register
 	orr	r4, r4, #TTB_FLAGS
 	mcr	p15, 0, r4, c2, c0, 1		@ load TTB1
-	mov	r10, #0x1f			@ domains 0, 1 = manager
-	mcr	p15, 0, r10, c3, c0, 0		@ load domain access register
 	/*
 	 * Memory region attributes with SCTLR.TRE=1
 	 *

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE
  2009-12-07 14:14 ` [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE Catalin Marinas
@ 2009-12-12 11:28   ` Russell King - ARM Linux
  2009-12-14 15:50     ` Catalin Marinas
  0 siblings, 1 reply; 20+ messages in thread
From: Russell King - ARM Linux @ 2009-12-12 11:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 07, 2009 at 02:14:10PM +0000, Catalin Marinas wrote:
> This patch enables the Access Flag in SCTLR and, together with the TEX
> remapping, allows the use of the spare bits in the page table entry thus
> removing the Linux specific PTEs. The simplified permission model is
> used which means that "kernel read/write, user read-only" is no longer
> available. This was used for the vectors page but with a dedicated TLS
> register it is no longer necessary.

I really do not want to go here without an explanation of how situations
such as:

- Kernel reads PTE and modifies it
- Hardware accesses page
-  TLB reads PTE, updates, and writes new back
- Kernel writes PTE back
- Kernel cleans cache line

are handled.  What about SMP, where CPU0 may access and modify the active
page tables on CPU1 (eg, clearing PTEs)?

Are TLB accesses with AFE enabled guaranteed to read from the L1 cache?
If not, we need to clean _and_ invalidate PTE updates.
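
To make the interleaving concrete, here is a minimal sketch of the
unsynchronised read-modify-write (illustrative only; ptep stands for a
pointer to the affected PTE):

	pte_t entry = *ptep;		/* 1. kernel reads the PTE */
	entry = pte_wrprotect(entry);	/*    ... and modifies its local copy */
					/* 2. hardware walks, reads the PTE,
					 *    sets the access flag and writes
					 *    the updated entry back */
	*ptep = entry;			/* 3. kernel write discards the
					 *    hardware update */
	clean_dcache_area(ptep, sizeof(pte_t));
					/* 4. the clean pushes the stale
					 *    entry out to the walker */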

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems
  2009-12-07 14:13 ` [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems Catalin Marinas
@ 2009-12-12 12:24   ` Russell King - ARM Linux
  2009-12-12 13:57     ` Russell King - ARM Linux
  2009-12-14 12:15     ` Catalin Marinas
  0 siblings, 2 replies; 20+ messages in thread
From: Russell King - ARM Linux @ 2009-12-12 12:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 07, 2009 at 02:13:34PM +0000, Catalin Marinas wrote:
> The vfp_notifier(THREAD_NOTIFY_RELEASE) may be called with thread->cpu
> different from the current one, causing a race condition with both the
> THREAD_NOTIFY_SWITCH path and vfp_support_entry().

The only call where thread->cpu may not be the current CPU is in the
THREAD_NOTIFY_RELEASE case.

When called in the THREAD_NOTIFY_SWITCH case, we are switching to the
specified thread, and thread->cpu had better be smp_processor_id(), or
else we're saving our CPU's VFP state into some other CPU's currently
running thread.

Not only that, but the thread we're switching away from will still be
'owned' by the CPU we're running on, and can't be scheduled onto another
CPU without this function first completing, nor can it be flushed nor
released.

> @@ -49,14 +50,21 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v)
>  
>  #ifdef CONFIG_SMP

		BUG_ON(cpu != smp_processor_id());

since it would be very bad if this was any different.  Note that this is
also a non-preemptible context - we're called from the scheduler, and the
scheduler can't be preempted mid-thread switch.

>  		/*
> +		 * RCU locking is needed in case last_VFP_context[cpu] is
> +		 * released on a different CPU.
> +		 */
> +		rcu_read_lock();

Given that we're modifying our CPU's last_VFP_context here, I don't see
what the RCU locks give us - the thread we're switching to can _not_
be being released at this time - we can't be switching to a dead task.
Not only that, but this notifier is already called under the RCU lock,
so this is a no-op.
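
For reference, kernel/notifier.c already runs atomic notifier callbacks
(including vfp_notifier()) inside an RCU read-side critical section;
simplified, the call chain looks like:

	int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
				       unsigned long val, void *v)
	{
		int ret;

		rcu_read_lock();
		ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
		rcu_read_unlock();

		return ret;
	}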

> +		vfp = last_VFP_context[cpu];
> +		/*
>  		 * On SMP, if VFP is enabled, save the old state in
>  		 * case the thread migrates to a different CPU. The
>  		 * restoring is done lazily.
>  		 */
> -		if ((fpexc & FPEXC_EN) && last_VFP_context[cpu]) {
> -			vfp_save_state(last_VFP_context[cpu], fpexc);
> -			last_VFP_context[cpu]->hard.cpu = cpu;
> +		if ((fpexc & FPEXC_EN) && vfp) {
> +			vfp_save_state(vfp, fpexc);
> +			vfp->hard.cpu = cpu;
>  		}
> +		rcu_read_unlock();
>  		/*
>  		 * Thread migration, just force the reloading of the
>  		 * state on the new CPU in case the VFP registers
> @@ -91,8 +99,19 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v)
>  	}
>  
>  	/* flush and release case: Per-thread VFP cleanup. */
> +#ifndef CONFIG_SMP
>  	if (last_VFP_context[cpu] == vfp)
>  		last_VFP_context[cpu] = NULL;
> +#else
> +	/*
> +	 * Since release_thread() may be called from a different CPU, we use
> +	 * cmpxchg() here to avoid a race with the vfp_support_entry() code
> +	 * which modifies last_VFP_context[cpu]. Note that on SMP systems, a
> +	 * STR instruction on a different CPU clears the global exclusive
> +	 * monitor state.
> +	 */
> +	(void)cmpxchg(&last_VFP_context[cpu], vfp, NULL);
> +#endif

I think this hunk is the only part which makes sense.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems
  2009-12-12 12:24   ` Russell King - ARM Linux
@ 2009-12-12 13:57     ` Russell King - ARM Linux
  2009-12-14 12:21       ` Catalin Marinas
  2009-12-14 12:15     ` Catalin Marinas
  1 sibling, 1 reply; 20+ messages in thread
From: Russell King - ARM Linux @ 2009-12-12 13:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Dec 12, 2009 at 12:24:47PM +0000, Russell King - ARM Linux wrote:
> On Mon, Dec 07, 2009 at 02:13:34PM +0000, Catalin Marinas wrote:
> > The vfp_notifier(THREAD_NOTIFY_RELEASE) may be called with thread->cpu
> > different from the current one, causing a race condition with both the
> > THREAD_NOTIFY_SWITCH path and vfp_support_entry().
> 
> The only call where thread->cpu may not be the current CPU is in the
> THREAD_NOTIFY_RELEASE case.
> 
> When called in the THREAD_NOTIFY_SWITCH case, we are switching to the
> specified thread, and thread->cpu had better be smp_processor_id(), or
> else we're saving our CPU's VFP state into some other CPU's currently
> running thread.
> 
> Not only that, but the thread we're switching away from will still be
> 'owned' by the CPU we're running on, and can't be scheduled onto another
> CPU without this function first completing, nor can it be flushed nor
> released.

Here's a patch which adds this documentation, and fixes the
THREAD_NOTIFY_FLUSH case - since that could be preempted.

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 2d7423a..aed05bc 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -38,16 +38,72 @@ union vfp_state *last_VFP_context[NR_CPUS];
  */
 unsigned int VFP_arch;
 
+/*
+ * Per-thread VFP initialization.
+ */
+static void vfp_thread_flush(struct thread_info *thread)
+{
+	union vfp_state *vfp = &thread->vfpstate;
+	unsigned int cpu;
+
+	memset(vfp, 0, sizeof(union vfp_state));
+
+	vfp->hard.fpexc = FPEXC_EN;
+	vfp->hard.fpscr = FPSCR_ROUND_NEAREST;
+
+	/*
+	 * Disable VFP to ensure we initialize it first.  We must ensure
+	 * that the modification of last_VFP_context[] and hardware disable
+	 * are done for the same CPU and without preemption.
+	 */
+	cpu = get_cpu();
+	if (last_VFP_context[cpu] == vfp)
+		last_VFP_context[cpu] = NULL;
+	fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
+	put_cpu();
+}
+
+static void vfp_thread_release(struct thread_info *thread)
+{
+	/* release case: Per-thread VFP cleanup. */
+	union vfp_state *vfp = &thread->vfpstate;
+	unsigned int cpu = thread->cpu;
+
+	if (last_VFP_context[cpu] == vfp)
+		last_VFP_context[cpu] = NULL;
+}
+
+/*
+ * When this function is called with the following 'cmd's, the following
+ * is true while this function is being run:
+ *  THREAD_NOTIFY_SWITCH:
+ *   - the previously running thread will not be scheduled onto another CPU.
+ *   - the next thread to be run (v) will not be running on another CPU.
+ *   - thread->cpu is the local CPU number
+ *   - not preemptible as we're called in the middle of a thread switch
+ *  THREAD_NOTIFY_FLUSH:
+ *   - the thread (v) will be running on the local CPU, so
+ *	v === current_thread_info()
+ *   - thread->cpu is the local CPU number at the time it is accessed,
+ *	but may change at any time.
+ *   - we could be preempted if tree preempt rcu is enabled, so
+ *	it is unsafe to use thread->cpu.
+ *  THREAD_NOTIFY_RELEASE:
+ *   - the thread (v) will not be running on any CPU; it is a dead thread.
+ *   - thread->cpu will be the last CPU the thread ran on, which may not
+ *	be the current CPU.
+ *   - we could be preempted if tree preempt rcu is enabled.
+ */
 static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v)
 {
 	struct thread_info *thread = v;
-	union vfp_state *vfp;
-	__u32 cpu = thread->cpu;
 
 	if (likely(cmd == THREAD_NOTIFY_SWITCH)) {
 		u32 fpexc = fmrx(FPEXC);
 
 #ifdef CONFIG_SMP
+		unsigned int cpu = thread->cpu;
+
 		/*
 		 * On SMP, if VFP is enabled, save the old state in
 		 * case the thread migrates to a different CPU. The
@@ -74,25 +130,10 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v)
 		return NOTIFY_DONE;
 	}
 
-	vfp = &thread->vfpstate;
-	if (cmd == THREAD_NOTIFY_FLUSH) {
-		/*
-		 * Per-thread VFP initialisation.
-		 */
-		memset(vfp, 0, sizeof(union vfp_state));
-
-		vfp->hard.fpexc = FPEXC_EN;
-		vfp->hard.fpscr = FPSCR_ROUND_NEAREST;
-
-		/*
-		 * Disable VFP to ensure we initialise it first.
-		 */
-		fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
-	}
-
-	/* flush and release case: Per-thread VFP cleanup. */
-	if (last_VFP_context[cpu] == vfp)
-		last_VFP_context[cpu] = NULL;
+	if (cmd == THREAD_NOTIFY_FLUSH)
+		vfp_thread_flush(thread);
+	else
+		vfp_thread_release(thread);
 
 	return NOTIFY_DONE;
 }

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems
  2009-12-12 12:24   ` Russell King - ARM Linux
  2009-12-12 13:57     ` Russell King - ARM Linux
@ 2009-12-14 12:15     ` Catalin Marinas
  2009-12-14 16:28       ` [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems Catalin Marinas
  1 sibling, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2009-12-14 12:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2009-12-12 at 12:24 +0000, Russell King - ARM Linux wrote:
> On Mon, Dec 07, 2009 at 02:13:34PM +0000, Catalin Marinas wrote:
> > The vfp_notifier(THREAD_NOTIFY_RELEASE) may be called with thread->cpu
> > different from the current one, causing a race condition with both the
> > THREAD_NOTIFY_SWITCH path and vfp_support_entry().
> 
> The only call where thread->cpu may not be the current CPU is in the
> THREAD_NOTIFY_RELEASE case.

Correct.

> When called in the THREAD_NOTIFY_SWITCH case, we are switching to the
> specified thread, and thread->cpu had better be smp_processor_id(), or
> else we're saving our CPU's VFP state into some other CPU's currently
> running thread.

Also correct.

> Not only that, but the thread we're switching away from will still be
> 'owned' by the CPU we're running on, and can't be scheduled onto another
> CPU without this function first completing, nor can it be flushed nor
> released.

Correct but see below.

> >  		/*
> > +		 * RCU locking is needed in case last_VFP_context[cpu] is
> > +		 * released on a different CPU.
> > +		 */
> > +		rcu_read_lock();
> 
> Given that we're modifying our CPU's last_VFP_context here, I don't see
> what the RCU locks give us - the thread we're switching to can _not_
> be being released at this time - we can't be switching to a dead task.
> Not only that, but this notifier is already called under the RCU lock,
> so this is a no-op.

With the current implementation, last_VFP_context is set when the
CPU takes an undef for a VFP instruction and not during every context
switch, so last_VFP_context may *not* point to the task we are switching
away from. The RCU locking was added to prevent the task structure (and
thread_info) pointed to by last_VFP_context from being freed while
executing the switch between two tasks other than the one released.

If the region is RCU locked anyway, we can simply add a comment, but
unless we change how last_VFP_context is set, we still need this check.
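
To make the failure mode concrete, the fragile pattern is the double
read in the current code (quoted earlier in this thread):
last_VFP_context[cpu] is dereferenced again after the NULL check, so a
release on another CPU in between turns the later reads into a NULL (or
freed) pointer dereference:

	if ((fpexc & FPEXC_EN) && last_VFP_context[cpu]) {	/* 1st read */
		vfp_save_state(last_VFP_context[cpu], fpexc);	/* 2nd read */
		last_VFP_context[cpu]->hard.cpu = cpu;		/* 3rd read */
	}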

-- 
Catalin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems
  2009-12-12 13:57     ` Russell King - ARM Linux
@ 2009-12-14 12:21       ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2009-12-14 12:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2009-12-12 at 13:57 +0000, Russell King - ARM Linux wrote:
> On Sat, Dec 12, 2009 at 12:24:47PM +0000, Russell King - ARM Linux wrote:
> > On Mon, Dec 07, 2009 at 02:13:34PM +0000, Catalin Marinas wrote:
> > > The vfp_notifier(THREAD_NOTIFY_RELEASE) may be called with thread->cpu
> > > different from the current one, causing a race condition with both the
> > > THREAD_NOTIFY_SWITCH path and vfp_support_entry().
> > 
> > The only call where thread->cpu may not be the current CPU is in the
> > THREAD_NOTIFY_RELEASE case.
> > 
> > When called in the THREAD_NOTIFY_SWITCH case, we are switching to the
> > specified thread, and thread->cpu had better be smp_processor_id(), or
> > else we're saving our CPU's VFP state into some other CPU's currently
> > running thread.
> > 
> > Not only that, but the thread we're switching away from will still be
> > 'owned' by the CPU we're running on, and can't be scheduled onto another
> > CPU without this function first completing, nor can it be flushed nor
> > released.
> 
> Here's a patch which adds this documentation, and fixes the
> THREAD_NOTIFY_FLUSH case - since that could be preempted.
[...]
> + *  THREAD_NOTIFY_SWITCH:
> + *   - the previously running thread will not be scheduled onto another CPU.

While this comment is certainly true, I don't think it is relevant:
the VFP context we save is not always that of the thread being
switched out; it may belong to a thread that ran much earlier.

Otherwise the patch is fine.

-- 
Catalin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE
  2009-12-12 11:28   ` Russell King - ARM Linux
@ 2009-12-14 15:50     ` Catalin Marinas
  2009-12-14 15:58       ` Catalin Marinas
  2009-12-14 16:11       ` Russell King - ARM Linux
  0 siblings, 2 replies; 20+ messages in thread
From: Catalin Marinas @ 2009-12-14 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2009-12-12 at 11:28 +0000, Russell King - ARM Linux wrote:
> On Mon, Dec 07, 2009 at 02:14:10PM +0000, Catalin Marinas wrote:
> > This patch enables the Access Flag in SCTLR and, together with the TEX
> > remapping, allows the use of the spare bits in the page table entry thus
> > removing the Linux specific PTEs. The simplified permission model is
> > used which means that "kernel read/write, user read-only" is no longer
> > available. This was used for the vectors page but with a dedicated TLS
> > register it is no longer necessary.
> 
> I really do not want to go here without an explanation of how situations
> such as:

I think we discussed some of these when I first posted the patch some
time ago but I'm happy to do it again here.

BTW, all the hardware implementations I'm aware of only raise an access
flag fault when this bit is cleared, rather than updating it in hardware.

But even if they do it in hardware, it can still work (you also have the
option of disabling the hardware management via the SCTLR.HA bit).

> - Kernel reads PTE and modifies it

B3.3.5 in the ARM ARM describes the requirements for the Hardware
management of the access flag:

        Any implementation of hardware management of the access flag
        must ensure that any software changes to the translation table
        are not lost. The architecture does not require software that
        performs translation table changes to use interlocked
        operations. The hardware management mechanisms for the access
        flag must prevent any loss of data written to translation table
        entries that might occur when, for example, a write by another
        processor occurs between the read and write phases of a
        translation table walk that updates the
        access flag.

At the hardware level, it could be implemented similar to a LDREX/STREX
block.
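
As a sketch only (load_exclusive(), store_exclusive() and ACCESS_FLAG
below are hypothetical names, not any real interface), the walker's
update could behave like:

	do {
		old = load_exclusive(ptep);	/* like LDREX: marks ptep */
		new = old | ACCESS_FLAG;	/* hypothetical AF bit (AP[0]) */
	} while (store_exclusive(ptep, new));	/* like STREX: fails if any CPU
						 * stored to ptep in between */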

> - Hardware accesses page
> -  TLB reads PTE, updates, and writes new back
> - Kernel writes PTE back

Addressed above. The hardware write should fail if there was an STR from
the current or a different CPU.

> - Kernel cleans cache line

A hardware implementation of the AF would probably require the page
table walks (PTWs) to go via the L1 cache (Cortex-A9 has such PTWs),
otherwise it breaks the requirements above. It is mandatory for the
TTBR cacheability settings to match the mapping of the page tables
(i.e. Normal Cacheable in the Linux case), so we don't need further
kernel modifications.

-- 
Catalin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE
  2009-12-14 15:50     ` Catalin Marinas
@ 2009-12-14 15:58       ` Catalin Marinas
  2009-12-14 16:11       ` Russell King - ARM Linux
  1 sibling, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2009-12-14 15:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-12-14 at 15:50 +0000, Catalin Marinas wrote:
> On Sat, 2009-12-12 at 11:28 +0000, Russell King - ARM Linux wrote:
> > On Mon, Dec 07, 2009 at 02:14:10PM +0000, Catalin Marinas wrote:
> > > This patch enables the Access Flag in SCTLR and, together with the TEX
> > > remapping, allows the use of the spare bits in the page table entry thus
> > > removing the Linux specific PTEs. The simplified permission model is
> > > used which means that "kernel read/write, user read-only" is no longer
> > > available. This was used for the vectors page but with a dedicated TLS
> > > register it is no longer necessary.
> > 
> > I really do not want to go here without an explanation of how situations
> > such as:
> 
> I think we discussed some of these when I first posted the patch some
> time ago but I'm happy do it again here.
> 
> BTW, all the hardware implementations I'm aware of only raise an access
> flag fault when this bit is cleared, rather than updating it in hardware.
> 
> But even if they do it in hardware, it can still work (you also have the
> option of disabling the hardware management via the SCTLR.HA bit).
> 
> > - Kernel reads PTE and modifies it
> 
> B3.3.5 in the ARM ARM describes the requirements for the Hardware
> management of the access flag:
> 
>         Any implementation of hardware management of the access flag
>         must ensure that any software changes to the translation table
>         are not lost. The architecture does not require software that
>         performs translation table changes to use interlocked
>         operations. The hardware management mechanisms for the access
>         flag must prevent any loss of data written to translation table
>         entries that might occur when, for example, a write by another
>         processor occurs between the read and write phases of a
>         translation table walk that updates the
>         access flag.
> 
> At the hardware level, it could be implemented similar to a LDREX/STREX
> block.

Just to avoid a question on this: it is possible for the kernel to read
a PTE with AP[0] cleared, for the hardware to set AP[0] to 1 as a result
of an access, and for the kernel to then clear it again when storing the
modified PTE.

The above cannot be prevented since the PTE modifications are not
atomic, but it doesn't actually matter. In the worst case, the kernel
would think that a page hasn't been accessed for a longer time and may
decide to swap it out. I doubt this would be a performance hit, since
trapping the access faults takes much more time.

If precise access timing is required (not sure why), you can always
disable SCTLR.HA and handle the accesses in software.

-- 
Catalin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE
  2009-12-14 15:50     ` Catalin Marinas
  2009-12-14 15:58       ` Catalin Marinas
@ 2009-12-14 16:11       ` Russell King - ARM Linux
  2009-12-14 16:16         ` Catalin Marinas
  1 sibling, 1 reply; 20+ messages in thread
From: Russell King - ARM Linux @ 2009-12-14 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 14, 2009 at 03:50:24PM +0000, Catalin Marinas wrote:
> > - Kernel reads PTE and modifies it
> 
> B3.3.5 in the ARM ARM describes the requirements for the Hardware
> management of the access flag:
> 
>         Any implementation of hardware management of the access flag
>         must ensure that any software changes to the translation table
>         are not lost. The architecture does not require software that
>         performs translation table changes to use interlocked
>         operations. The hardware management mechanisms for the access
>         flag must prevent any loss of data written to translation table
>         entries that might occur when, for example, a write by another
>         processor occurs between the read and write phases of a
>         translation table walk that updates the
>         access flag.
> 
> At the hardware level, it could be implemented similar to a LDREX/STREX
> block.
> 
> > - Hardware accesses page
> > -  TLB reads PTE, updates, and writes new back
> > - Kernel writes PTE back
> 
> Addressed above. The hardware write should fail if there was an STR from
> the current or a different CPU.

I don't think it is - the paragraph you quote talks about the following
situation:

- Hardware reads PTE
- Kernel writes PTE
- Hardware (tries to) write PTE

What it says is that the hardware write in this case must fail.

The case I was talking about is:

- Kernel reads PTE
- Hardware reads PTE
- Hardware writes PTE
- Kernel writes PTE

Since there is no STR between the hardware reading and writing the PTE,
the hardware cannot know that its update has been lost.

Whether it matters or not is a different kettle of fish.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE
  2009-12-14 16:11       ` Russell King - ARM Linux
@ 2009-12-14 16:16         ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2009-12-14 16:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-12-14 at 16:11 +0000, Russell King - ARM Linux wrote:
> On Mon, Dec 14, 2009 at 03:50:24PM +0000, Catalin Marinas wrote:
> > > - Kernel reads PTE and modifies it
> >
> > B3.3.5 in the ARM ARM describes the requirements for the Hardware
> > management of the access flag:
> >
> >         Any implementation of hardware management of the access flag
> >         must ensure that any software changes to the translation table
> >         are not lost. The architecture does not require software that
> >         performs translation table changes to use interlocked
> >         operations. The hardware management mechanisms for the access
> >         flag must prevent any loss of data written to translation table
> >         entries that might occur when, for example, a write by another
> >         processor occurs between the read and write phases of a
> >         translation table walk that updates the
> >         access flag.
> >
> > At the hardware level, it could be implemented similar to a LDREX/STREX
> > block.
> >
> > > - Hardware accesses page
> > > -  TLB reads PTE, updates, and writes new back
> > > - Kernel writes PTE back
> >
> > Addressed above. The hardware write should fail if there was an STR from
> > the current or a different CPU.
[...]
> The case I was talking about is:
> 
> - Kernel reads PTE
> - Hardware reads PTE
> - Hardware writes PTE
> - Kernel writes PTE
> 
> Since there is no STR between the hardware reading and writing the PTE,
> the hardware can not know that its update has been lost.
> 
> Whether it matters or not is a different kettle of fish.

I was expecting this follow-up :-), so I already replied to my own post.
I don't think it matters.

-- 
Catalin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems
  2009-12-14 12:15     ` Catalin Marinas
@ 2009-12-14 16:28       ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2009-12-14 16:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2009-12-14 at 12:15 +0000, Catalin Marinas wrote:
> On Sat, 2009-12-12 at 12:24 +0000, Russell King - ARM Linux wrote:
> > On Mon, Dec 07, 2009 at 02:13:34PM +0000, Catalin Marinas wrote:
> > >             /*
> > > +            * RCU locking is needed in case last_VFP_context[cpu] is
> > > +            * released on a different CPU.
> > > +            */
> > > +           rcu_read_lock();
[...]
> > Not only that, but this notifier is already called under the RCU lock,
> > so this is a no-op.
[...]
> If the region is RCU locked anyway, we can simply add a comment, but
> unless we change how last_VFP_context is set, we still need this check.

I updated the patch (below); it doesn't take the RCU lock but adds a
comment.

Apart from the last hunk, with which you are OK, the patch makes sure
that last_VFP_context[cpu] is read only once and stored in a local
variable; otherwise you may have the surprise that it becomes NULL if
the thread that was using it is released on a different CPU (that's
actually the failure we were getting under stress testing).


Fix a race in the vfp_notifier() function on SMP systems

From: Catalin Marinas <catalin.marinas@arm.com>

The vfp_notifier(THREAD_NOTIFY_RELEASE) may be called with thread->cpu
different from the current one, causing a race condition with both the
THREAD_NOTIFY_SWITCH path and vfp_support_entry().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/vfp/vfpmodule.c |   26 +++++++++++++++++++++++---
 1 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 2d7423a..8d1fe44 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -14,6 +14,7 @@
 #include <linux/signal.h>
 #include <linux/sched.h>
 #include <linux/init.h>
+#include <linux/rcupdate.h>
 
 #include <asm/thread_notify.h>
 #include <asm/vfp.h>
@@ -49,14 +50,22 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v)
 
 #ifdef CONFIG_SMP
 		/*
+		 * The vfpstate structure pointed to by last_VFP_context[cpu]
+		 * may be released via call_rcu(delayed_put_task_struct) but
+		 * atomic_notifier_call_chain() already holds the RCU lock.
+		 */
+		vfp = last_VFP_context[cpu];
+
+		/*
 		 * On SMP, if VFP is enabled, save the old state in
 		 * case the thread migrates to a different CPU. The
 		 * restoring is done lazily.
 		 */
-		if ((fpexc & FPEXC_EN) && last_VFP_context[cpu]) {
-			vfp_save_state(last_VFP_context[cpu], fpexc);
-			last_VFP_context[cpu]->hard.cpu = cpu;
+		if ((fpexc & FPEXC_EN) && vfp) {
+			vfp_save_state(vfp, fpexc);
+			vfp->hard.cpu = cpu;
 		}
+
 		/*
 		 * Thread migration, just force the reloading of the
 		 * state on the new CPU in case the VFP registers
@@ -91,8 +100,19 @@ static int vfp_notifier(struct notifier_block *self, unsigned long cmd, void *v)
 	}
 
 	/* flush and release case: Per-thread VFP cleanup. */
+#ifndef CONFIG_SMP
 	if (last_VFP_context[cpu] == vfp)
 		last_VFP_context[cpu] = NULL;
+#else
+	/*
+	 * Since release_thread() may be called from a different CPU, we use
+	 * cmpxchg() here to avoid a race with the vfp_support_entry() code
+	 * which modifies last_VFP_context[cpu]. Note that on SMP systems, a
+	 * STR instruction on a different CPU clears the global exclusive
+	 * monitor state.
+	 */
+	(void)cmpxchg(&last_VFP_context[cpu], vfp, NULL);
+#endif
 
 	return NOTIFY_DONE;
 }

-- 
Catalin

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations
  2009-12-07 14:13 ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas
@ 2010-03-08 16:25   ` Catalin Marinas
  2010-03-08 16:31     ` Russell King - ARM Linux
  0 siblings, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2010-03-08 16:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On Mon, 2009-12-07 at 14:13 +0000, Catalin Marinas wrote:
> ARMv7 processors like Cortex-A9 broadcast the cache maintenance
> operations in hardware. The patch adds the CPU ID checks for this
> feature and allows the flush_dcache_page/update_mmu_cache pair to work
> in lazy flushing mode similar to the UP case.

It looks like I haven't got a final ok from you on this patch (I had the
impression that it's in the patch system already but rebased my patches
and found that it's not in mainline).

Are you ok with it (quoting it below)?

> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm/include/asm/smp_plat.h |    9 +++++++++
>  arch/arm/mm/fault-armv.c        |    2 --
>  arch/arm/mm/flush.c             |    9 ++++-----
>  3 files changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/include/asm/smp_plat.h
> b/arch/arm/include/asm/smp_plat.h
> index 59303e2..e587167 100644
> --- a/arch/arm/include/asm/smp_plat.h
> +++ b/arch/arm/include/asm/smp_plat.h
> @@ -13,4 +13,13 @@ static inline int tlb_ops_need_broadcast(void)
>         return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 2;
>  }
> 
> +#ifndef CONFIG_SMP
> +#define cache_ops_need_broadcast()     0
> +#else
> +static inline int cache_ops_need_broadcast(void)
> +{
> +       return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 1;
> +}
> +#endif
> +
>  #endif
> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> index d0d17b6..bb60117 100644
> --- a/arch/arm/mm/fault-armv.c
> +++ b/arch/arm/mm/fault-armv.c
> @@ -153,10 +153,8 @@ void update_mmu_cache(struct vm_area_struct *vma,
> unsigned long addr, pte_t pte)
> 
>         page = pfn_to_page(pfn);
>         mapping = page_mapping(page);
> -#ifndef CONFIG_SMP
>         if (test_and_clear_bit(PG_dcache_dirty, &page->flags))
>                 __flush_dcache_page(mapping, page);
> -#endif
>         if (mapping) {
>                 if (cache_is_vivt())
>                         make_coherent(mapping, vma, addr, pfn);
> diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
> index 7f294f3..2d3325d 100644
> --- a/arch/arm/mm/flush.c
> +++ b/arch/arm/mm/flush.c
> @@ -15,6 +15,7 @@
>  #include <asm/cachetype.h>
>  #include <asm/system.h>
>  #include <asm/tlbflush.h>
> +#include <asm/smp_plat.h>
> 
>  #include "mm.h"
> 
> @@ -198,12 +199,10 @@ void flush_dcache_page(struct page *page)
>  {
>         struct address_space *mapping = page_mapping(page);
> 
> -#ifndef CONFIG_SMP
> -       if (!PageHighMem(page) && mapping && !mapping_mapped(mapping))
> +       if (!cache_ops_need_broadcast() &&
> +           !PageHighMem(page) && mapping && !mapping_mapped(mapping))
>                 set_bit(PG_dcache_dirty, &page->flags);
> -       else
> -#endif
> -       {
> +       else {
>                 __flush_dcache_page(mapping, page);
>                 if (mapping && cache_is_vivt())
>                         __flush_dcache_aliases(mapping, page);


-- 
Catalin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations
  2010-03-08 16:25   ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas
@ 2010-03-08 16:31     ` Russell King - ARM Linux
  2010-03-08 16:38       ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas
  0 siblings, 1 reply; 20+ messages in thread
From: Russell King - ARM Linux @ 2010-03-08 16:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 08, 2010 at 04:25:01PM +0000, Catalin Marinas wrote:
> Hi Russell,
> 
> On Mon, 2009-12-07 at 14:13 +0000, Catalin Marinas wrote:
> > ARMv7 processors like Cortex-A9 broadcast the cache maintenance
> > operations in hardware. The patch adds the CPU ID checks for this
> > feature and allows the flush_dcache_page/update_mmu_cache pair to work
> > in lazy flushing mode similar to the UP case.
> 
> It looks like I haven't got a final ok from you on this patch (I had the
> impression that it's in the patch system already but rebased my patches
> and found that it's not in mainline).

It needs to be updated - we have a cache_ops_need_broadcast() in
smp_plat.h now for the ptrace issues.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations
  2010-03-08 16:31     ` Russell King - ARM Linux
@ 2010-03-08 16:38       ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2010-03-08 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2010-03-08 at 16:31 +0000, Russell King - ARM Linux wrote:
> On Mon, Mar 08, 2010 at 04:25:01PM +0000, Catalin Marinas wrote:
> > Hi Russell,
> >
> > On Mon, 2009-12-07 at 14:13 +0000, Catalin Marinas wrote:
> > > ARMv7 processors like Cortex-A9 broadcast the cache maintenance
> > > operations in hardware. The patch adds the CPU ID checks for this
> > > feature and allows the flush_dcache_page/update_mmu_cache pair to work
> > > in lazy flushing mode similar to the UP case.
> >
> > It looks like I haven't got a final ok from you on this patch (I had the
> > impression that it's in the patch system already but rebased my patches
> > and found that it's not in mainline).
> 
> It needs to be updated - we have a cache_ops_need_broadcast() in
> smp_plat.h now for the ptrace issues.

I noticed that when rebasing. Here's the updated patch:

ARMv7: Use lazy cache flushing if hardware broadcasts cache operations

From: Catalin Marinas <catalin.marinas@arm.com>

ARMv7 processors like Cortex-A9 broadcast the cache maintenance
operations in hardware. The patch adds the CPU ID checks for this
feature and allows the flush_dcache_page/update_mmu_cache pair to work
in lazy flushing mode similar to the UP case.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/smp_plat.h |    4 ++++
 arch/arm/mm/fault-armv.c        |    2 --
 arch/arm/mm/flush.c             |    9 ++++-----
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h
index e621530..963a338 100644
--- a/arch/arm/include/asm/smp_plat.h
+++ b/arch/arm/include/asm/smp_plat.h
@@ -13,9 +13,13 @@ static inline int tlb_ops_need_broadcast(void)
 	return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 2;
 }
 
+#if !defined(CONFIG_SMP) || __LINUX_ARM_ARCH__ >= 7
+#define cache_ops_need_broadcast()	0
+#else
 static inline int cache_ops_need_broadcast(void)
 {
 	return ((read_cpuid_ext(CPUID_EXT_MMFR3) >> 12) & 0xf) < 1;
 }
+#endif
 
 #endif
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index c9b97e9..0866ffd 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -169,10 +169,8 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
 		return;
 
 	mapping = page_mapping(page);
-#ifndef CONFIG_SMP
 	if (test_and_clear_bit(PG_dcache_dirty, &page->flags))
 		__flush_dcache_page(mapping, page);
-#endif
 	if (mapping) {
 		if (cache_is_vivt())
 			make_coherent(mapping, vma, addr, ptep, pfn);
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index e34f095..c2cea53 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -16,6 +16,7 @@
 #include <asm/smp_plat.h>
 #include <asm/system.h>
 #include <asm/tlbflush.h>
+#include <asm/smp_plat.h>
 
 #include "mm.h"
 
@@ -241,12 +242,10 @@ void flush_dcache_page(struct page *page)
 
 	mapping = page_mapping(page);
 
-#ifndef CONFIG_SMP
-	if (!PageHighMem(page) && mapping && !mapping_mapped(mapping))
+	if (!cache_ops_need_broadcast() &&
+	    !PageHighMem(page) && mapping && !mapping_mapped(mapping))
 		set_bit(PG_dcache_dirty, &page->flags);
-	else
-#endif
-	{
+	else {
 		__flush_dcache_page(mapping, page);
 		if (mapping && cache_is_vivt())
 			__flush_dcache_aliases(mapping, page);


-- 
Catalin

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-03-08 16:38 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-07 14:10 [PATCH 0/6] Bug-fixes and new features for 2.6.34-rc1 Catalin Marinas
2009-12-07 14:10 ` [PATCH 1/6] Global ASID allocation on SMP Catalin Marinas
2009-12-07 14:13 ` [PATCH 2/6] Broadcast the DMA cache operations on ARMv6 SMP hardware Catalin Marinas
2009-12-07 14:13 ` [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems Catalin Marinas
2009-12-12 12:24   ` Russell King - ARM Linux
2009-12-12 13:57     ` Russell King - ARM Linux
2009-12-14 12:21       ` Catalin Marinas
2009-12-14 12:15     ` Catalin Marinas
2009-12-14 16:28       ` [PATCH 3/6] Fix a race in the vfp_notifier() function on SMP systems Catalin Marinas
2009-12-07 14:13 ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas
2010-03-08 16:25   ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas
2010-03-08 16:31     ` Russell King - ARM Linux
2010-03-08 16:38       ` [PATCH 4/6] ARMv7: Use lazy cache flushing if hardware broadcasts cache operations Catalin Marinas
2009-12-07 14:14 ` [PATCH 5/6] ARMv7: Improved page table format with TRE and AFE Catalin Marinas
2009-12-12 11:28   ` Russell King - ARM Linux
2009-12-14 15:50     ` Catalin Marinas
2009-12-14 15:58       ` Catalin Marinas
2009-12-14 16:11       ` Russell King - ARM Linux
2009-12-14 16:16         ` Catalin Marinas
2009-12-07 14:16 ` [PATCH 6/6] Remove the domain switching on ARMv6k/v7 CPUs Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).