* [PATCH 0/3] New algorithm for ASID allocation and rollover
@ 2012-08-15 16:53 Will Deacon
2012-08-15 16:54 ` [PATCH 1/3] ARM: mm: remove IPI broadcasting on ASID rollover Will Deacon
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Will Deacon @ 2012-08-15 16:53 UTC (permalink / raw)
To: linux-arm-kernel
Hello,
Following some investigation into preempt-rt Linux, it became apparent
that ASID rollover can happen fairly regularly under certain heavy
scheduling workloads. Each time this happens, we broadcast an interrupt
to the secondary CPUs so that we can reset the global ASID numberspace
without assigning duplicate ASIDs to different tasks or accidentally
assigning different ASIDs to threads of the same process.
This leads to a large number of expensive IPIs between cores:
          CPU0       CPU1
IPI0:        0          0  Timer broadcast interrupts
IPI1:    23165     115888  Rescheduling interrupts
IPI2:        0          0  Function call interrupts
IPI3:     6619       1123  Single function call interrupts  <---- IPIs
IPI4:        0          0  CPU stop interrupts
Digging deeper, this also leads to extremely variable wait times on the
cpu_asid_lock. Granted, the lock is only contended for <1% of the time, but
the wait time varies between 0.5 and 734 us!
After some discussion, it became apparent that tracking the ASIDs
currently active on the cores in the system means that, on rollover, we
can automatically reserve those that are in use without having to stop
the world.
This patch series develops that idea so that:
- We can support cores without hardware broadcasting of TLB maintenance
operations without resorting to IPIs.
- The fastpath (that is, the task already has a valid ASID) remains
lockless.
- Assuming that the number of CPUs is less than the number of ASIDs,
the algorithm scales as they increase (using a bitmap for searching).
- Generation overflow is not a problem (we use a u64).
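The generation/ASID split the series relies on can be sketched as follows
(an illustrative model, not the kernel code; the constants mirror
ASID_BITS = 8 from the patches):

```c
#include <stdint.h>

/* Sketch: an 8-bit ASID lives in the low bits of a 64-bit context ID;
 * everything above it is the generation counter (a u64, so overflow is
 * not a practical concern). */
#define ASID_BITS 8
#define ASID_MASK (~0ULL << ASID_BITS)

/* The ASID proper: the low 8 bits of the context ID. */
static uint64_t asid_of(uint64_t ctx)
{
    return ctx & ~ASID_MASK;
}

/* Fastpath test: a context is valid iff its generation matches the
 * current global generation, i.e. the bits above ASID_BITS agree. */
static int same_generation(uint64_t ctx, uint64_t current_gen_ctx)
{
    return ((ctx ^ current_gen_ctx) >> ASID_BITS) == 0;
}
```

This is the same `(id ^ last) >> ASID_BITS` comparison the patches use to
decide whether a task needs a new ASID.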
With these patches applied, I saw ~2% improvement in hackbench scores on
my dual-core Cortex-A15 board and the interrupt statistics now appear as:
          CPU0       CPU1
IPI0:        0          0  Timer broadcast interrupts
IPI1:    64888      74560  Rescheduling interrupts
IPI2:        0          0  Function call interrupts
IPI3:        1          3  Single function call interrupts  <--- Much better!
IPI4:        0          0  CPU stop interrupts
Finally, the wait time on the cpu_asid_lock was reduced to 0.5 - 4.6 us.
All feedback welcome.
Will
Will Deacon (3):
ARM: mm: remove IPI broadcasting on ASID rollover
ARM: mm: avoid taking ASID spinlock on fastpath
ARM: mm: use bitmap operations when allocating new ASIDs
arch/arm/include/asm/mmu.h | 11 +--
arch/arm/include/asm/mmu_context.h | 82 +--------------
arch/arm/mm/context.c | 207 +++++++++++++++++++-----------------
3 files changed, 115 insertions(+), 185 deletions(-)
--
1.7.4.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/3] ARM: mm: remove IPI broadcasting on ASID rollover
2012-08-15 16:53 [PATCH 0/3] New algorithm for ASID allocation and rollover Will Deacon
@ 2012-08-15 16:54 ` Will Deacon
2012-08-15 16:54 ` [PATCH 2/3] ARM: mm: avoid taking ASID spinlock on fastpath Will Deacon
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Will Deacon @ 2012-08-15 16:54 UTC (permalink / raw)
To: linux-arm-kernel
ASIDs are allocated to MMU contexts based on a rolling counter. This
means that after 255 allocations we must invalidate all existing ASIDs
via an expensive IPI mechanism to synchronise all of the online CPUs and
ensure that all tasks execute with an ASID from the new generation.
This patch changes the rollover behaviour so that we rely instead on the
hardware broadcasting of the TLB invalidation to avoid the IPI calls.
This works by keeping track of the active ASID on each core, which is
then reserved in the case of a rollover so that currently scheduled
tasks can continue to run. For cores without hardware TLB broadcasting,
we keep track of pending flushes in a cpumask, so cores can flush their
local TLB before scheduling a new mm.
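The rollover idea described above can be modelled roughly like this
(illustrative only; NR_CPUS, the flat arrays and the bitmask stand in for
the kernel's per-cpu variables and cpumask):

```c
#include <stdint.h>

#define NR_CPUS 4

static uint64_t active_asids[NR_CPUS];   /* ASID each CPU is running */
static uint64_t reserved_asids[NR_CPUS]; /* snapshot taken at rollover */
static unsigned tlb_flush_pending;       /* one bit per CPU */

/* On rollover: snapshot each CPU's running ASID into reserved_asids so
 * that currently scheduled tasks can keep running, and mark every CPU
 * as needing a local TLB flush before it schedules a new mm. */
static void flush_context(void)
{
    for (int i = 0; i < NR_CPUS; i++)
        reserved_asids[i] = active_asids[i];
    tlb_flush_pending = (1u << NR_CPUS) - 1;
}

/* A candidate ASID must not collide with any reserved one. */
static int is_reserved_asid(uint64_t asid)
{
    for (int i = 0; i < NR_CPUS; i++)
        if (reserved_asids[i] == asid)
            return 1;
    return 0;
}
```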
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/mmu.h | 11 +--
arch/arm/include/asm/mmu_context.h | 82 +---------------
arch/arm/mm/context.c | 186 +++++++++++++++++-------------------
3 files changed, 93 insertions(+), 186 deletions(-)
diff --git a/arch/arm/include/asm/mmu.h b/arch/arm/include/asm/mmu.h
index 1496565..5b53b53 100644
--- a/arch/arm/include/asm/mmu.h
+++ b/arch/arm/include/asm/mmu.h
@@ -5,18 +5,15 @@
typedef struct {
#ifdef CONFIG_CPU_HAS_ASID
- unsigned int id;
- raw_spinlock_t id_lock;
+ u64 id;
#endif
unsigned int kvm_seq;
} mm_context_t;
#ifdef CONFIG_CPU_HAS_ASID
-#define ASID(mm) ((mm)->context.id & 255)
-
-/* init_mm.context.id_lock should be initialized. */
-#define INIT_MM_CONTEXT(name) \
- .context.id_lock = __RAW_SPIN_LOCK_UNLOCKED(name.context.id_lock),
+#define ASID_BITS 8
+#define ASID_MASK ((~0ULL) << ASID_BITS)
+#define ASID(mm) ((mm)->context.id & ~ASID_MASK)
#else
#define ASID(mm) (0)
#endif
diff --git a/arch/arm/include/asm/mmu_context.h b/arch/arm/include/asm/mmu_context.h
index 0306bc6..a64f61c 100644
--- a/arch/arm/include/asm/mmu_context.h
+++ b/arch/arm/include/asm/mmu_context.h
@@ -24,84 +24,8 @@ void __check_kvm_seq(struct mm_struct *mm);
#ifdef CONFIG_CPU_HAS_ASID
-/*
- * On ARMv6, we have the following structure in the Context ID:
- *
- * 31 7 0
- * +-------------------------+-----------+
- * | process ID | ASID |
- * +-------------------------+-----------+
- * | context ID |
- * +-------------------------------------+
- *
- * The ASID is used to tag entries in the CPU caches and TLBs.
- * The context ID is used by debuggers and trace logic, and
- * should be unique within all running processes.
- */
-#define ASID_BITS 8
-#define ASID_MASK ((~0) << ASID_BITS)
-#define ASID_FIRST_VERSION (1 << ASID_BITS)
-
-extern unsigned int cpu_last_asid;
-
-void __init_new_context(struct task_struct *tsk, struct mm_struct *mm);
-void __new_context(struct mm_struct *mm);
-void cpu_set_reserved_ttbr0(void);
-
-static inline void switch_new_context(struct mm_struct *mm)
-{
- unsigned long flags;
-
- __new_context(mm);
-
- local_irq_save(flags);
- cpu_switch_mm(mm->pgd, mm);
- local_irq_restore(flags);
-}
-
-static inline void check_and_switch_context(struct mm_struct *mm,
- struct task_struct *tsk)
-{
- if (unlikely(mm->context.kvm_seq != init_mm.context.kvm_seq))
- __check_kvm_seq(mm);
-
- /*
- * Required during context switch to avoid speculative page table
- * walking with the wrong TTBR.
- */
- cpu_set_reserved_ttbr0();
-
- if (!((mm->context.id ^ cpu_last_asid) >> ASID_BITS))
- /*
- * The ASID is from the current generation, just switch to the
- * new pgd. This condition is only true for calls from
- * context_switch() and interrupts are already disabled.
- */
- cpu_switch_mm(mm->pgd, mm);
- else if (irqs_disabled())
- /*
- * Defer the new ASID allocation until after the context
- * switch critical region since __new_context() cannot be
- * called with interrupts disabled (it sends IPIs).
- */
- set_ti_thread_flag(task_thread_info(tsk), TIF_SWITCH_MM);
- else
- /*
- * That is a direct call to switch_mm() or activate_mm() with
- * interrupts enabled and a new context.
- */
- switch_new_context(mm);
-}
-
-#define init_new_context(tsk,mm) (__init_new_context(tsk,mm),0)
-
-#define finish_arch_post_lock_switch \
- finish_arch_post_lock_switch
-static inline void finish_arch_post_lock_switch(void)
-{
- if (test_and_clear_thread_flag(TIF_SWITCH_MM))
- switch_new_context(current->mm);
-}
+void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk);
+#define init_new_context(tsk,mm) ({ mm->context.id = 0; })
#else /* !CONFIG_CPU_HAS_ASID */
@@ -143,6 +67,7 @@ static inline void finish_arch_post_lock_switch(void)
#endif /* CONFIG_CPU_HAS_ASID */
#define destroy_context(mm) do { } while(0)
+#define activate_mm(prev,next) switch_mm(prev, next, NULL)
/*
* This is called when "tsk" is about to enter lazy TLB mode.
@@ -186,6 +111,5 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
}
#define deactivate_mm(tsk,mm) do { } while (0)
-#define activate_mm(prev,next) switch_mm(prev, next, NULL)
#endif
diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index 119bc52..2d1b42d 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -2,6 +2,9 @@
* linux/arch/arm/mm/context.c
*
* Copyright (C) 2002-2003 Deep Blue Solutions Ltd, all rights reserved.
+ * Copyright (C) 2012 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
@@ -14,14 +17,35 @@
#include <linux/percpu.h>
#include <asm/mmu_context.h>
+#include <asm/smp_plat.h>
#include <asm/thread_notify.h>
#include <asm/tlbflush.h>
+/*
+ * On ARMv6, we have the following structure in the Context ID:
+ *
+ * 31 7 0
+ * +-------------------------+-----------+
+ * | process ID | ASID |
+ * +-------------------------+-----------+
+ * | context ID |
+ * +-------------------------------------+
+ *
+ * The ASID is used to tag entries in the CPU caches and TLBs.
+ * The context ID is used by debuggers and trace logic, and
+ * should be unique within all running processes.
+ */
+#define ASID_FIRST_VERSION (1ULL << ASID_BITS)
+
static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
-unsigned int cpu_last_asid = ASID_FIRST_VERSION;
+static u64 cpu_last_asid = ASID_FIRST_VERSION;
+
+static DEFINE_PER_CPU(u64, active_asids);
+static DEFINE_PER_CPU(u64, reserved_asids);
+static cpumask_t tlb_flush_pending;
#ifdef CONFIG_ARM_LPAE
-void cpu_set_reserved_ttbr0(void)
+static void cpu_set_reserved_ttbr0(void)
{
unsigned long ttbl = __pa(swapper_pg_dir);
unsigned long ttbh = 0;
@@ -37,7 +61,7 @@ void cpu_set_reserved_ttbr0(void)
isb();
}
#else
-void cpu_set_reserved_ttbr0(void)
+static void cpu_set_reserved_ttbr0(void)
{
u32 ttb;
/* Copy TTBR1 into TTBR0 */
@@ -83,124 +107,86 @@ static int __init contextidr_notifier_init(void)
arch_initcall(contextidr_notifier_init);
#endif
-/*
- * We fork()ed a process, and we need a new context for the child
- * to run in.
- */
-void __init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+static void flush_context(unsigned int cpu)
{
- mm->context.id = 0;
- raw_spin_lock_init(&mm->context.id_lock);
-}
+ int i;
-static void flush_context(void)
-{
- cpu_set_reserved_ttbr0();
- local_flush_tlb_all();
- if (icache_is_vivt_asid_tagged()) {
+ /* Update the list of reserved ASIDs. */
+ per_cpu(active_asids, cpu) = 0;
+ for_each_possible_cpu(i)
+ per_cpu(reserved_asids, i) = per_cpu(active_asids, i);
+
+ /* Queue a TLB invalidate and flush the I-cache if necessary. */
+ if (!tlb_ops_need_broadcast())
+ cpumask_set_cpu(cpu, &tlb_flush_pending);
+ else
+ cpumask_setall(&tlb_flush_pending);
+
+ if (icache_is_vivt_asid_tagged())
__flush_icache_all();
- dsb();
- }
}
-#ifdef CONFIG_SMP
+static int is_reserved_asid(u64 asid, u64 mask)
+{
+ int cpu;
+ for_each_possible_cpu(cpu)
+ if ((per_cpu(reserved_asids, cpu) & mask) == (asid & mask))
+ return 1;
+ return 0;
+}
-static void set_mm_context(struct mm_struct *mm, unsigned int asid)
+static void new_context(struct mm_struct *mm, unsigned int cpu)
{
- unsigned long flags;
+ u64 asid = mm->context.id;
- /*
- * Locking needed for multi-threaded applications where the
- * same mm->context.id could be set from different CPUs during
- * the broadcast. This function is also called via IPI so the
- * mm->context.id_lock has to be IRQ-safe.
- */
- raw_spin_lock_irqsave(&mm->context.id_lock, flags);
- if (likely((mm->context.id ^ cpu_last_asid) >> ASID_BITS)) {
+ if (asid != 0 && is_reserved_asid(asid, ULLONG_MAX)) {
/*
- * Old version of ASID found. Set the new one and
- * reset mm_cpumask(mm).
+ * Our current ASID was active during a rollover, we can
+ * continue to use it and this was just a false alarm.
*/
- mm->context.id = asid;
+ asid = (cpu_last_asid & ASID_MASK) | (asid & ~ASID_MASK);
+ } else {
+ /*
+ * Allocate a free ASID. If we can't find one, take a
+ * note of the currently active ASIDs and mark the TLBs
+ * as requiring flushes.
+ */
+ do {
+ asid = ++cpu_last_asid;
+ if ((asid & ~ASID_MASK) == 0)
+ flush_context(cpu);
+ } while (is_reserved_asid(asid, ~ASID_MASK));
cpumask_clear(mm_cpumask(mm));
}
- raw_spin_unlock_irqrestore(&mm->context.id_lock, flags);
- /*
- * Set the mm_cpumask(mm) bit for the current CPU.
- */
- cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
+ mm->context.id = asid;
}
-/*
- * Reset the ASID on the current CPU. This function call is broadcast
- * from the CPU handling the ASID rollover and holding cpu_asid_lock.
- */
-static void reset_context(void *info)
+void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
{
- unsigned int asid;
+ unsigned long flags;
unsigned int cpu = smp_processor_id();
- struct mm_struct *mm = current->active_mm;
-
- smp_rmb();
- asid = cpu_last_asid + cpu + 1;
-
- flush_context();
- set_mm_context(mm, asid);
-
- /* set the new ASID */
- cpu_switch_mm(mm->pgd, mm);
-}
-
-#else
-static inline void set_mm_context(struct mm_struct *mm, unsigned int asid)
-{
- mm->context.id = asid;
- cpumask_copy(mm_cpumask(mm), cpumask_of(smp_processor_id()));
-}
+ if (unlikely(mm->context.kvm_seq != init_mm.context.kvm_seq))
+ __check_kvm_seq(mm);
-#endif
-
-void __new_context(struct mm_struct *mm)
-{
- unsigned int asid;
-
- raw_spin_lock(&cpu_asid_lock);
-#ifdef CONFIG_SMP
/*
- * Check the ASID again, in case the change was broadcast from
- * another CPU before we acquired the lock.
+ * Required during context switch to avoid speculative page table
+ * walking with the wrong TTBR.
*/
- if (unlikely(((mm->context.id ^ cpu_last_asid) >> ASID_BITS) == 0)) {
- cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
- raw_spin_unlock(&cpu_asid_lock);
- return;
- }
-#endif
- /*
- * At this point, it is guaranteed that the current mm (with
- * an old ASID) isn't active on any other CPU since the ASIDs
- * are changed simultaneously via IPI.
- */
- asid = ++cpu_last_asid;
- if (asid == 0)
- asid = cpu_last_asid = ASID_FIRST_VERSION;
+ cpu_set_reserved_ttbr0();
- /*
- * If we've used up all our ASIDs, we need
- * to start a new version and flush the TLB.
- */
- if (unlikely((asid & ~ASID_MASK) == 0)) {
- asid = cpu_last_asid + smp_processor_id() + 1;
- flush_context();
-#ifdef CONFIG_SMP
- smp_wmb();
- smp_call_function(reset_context, NULL, 1);
-#endif
- cpu_last_asid += NR_CPUS;
- }
+ raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+ /* Check that our ASID belongs to the current generation. */
+ if ((mm->context.id ^ cpu_last_asid) >> ASID_BITS)
+ new_context(mm, cpu);
- set_mm_context(mm, asid);
- raw_spin_unlock(&cpu_asid_lock);
+ *this_cpu_ptr(&active_asids) = mm->context.id;
+ cpumask_set_cpu(cpu, mm_cpumask(mm));
+
+ if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
+ local_flush_tlb_all();
+ raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
+
+ cpu_switch_mm(mm->pgd, mm);
}
--
1.7.4.1
* [PATCH 2/3] ARM: mm: avoid taking ASID spinlock on fastpath
2012-08-15 16:53 [PATCH 0/3] New algorithm for ASID allocation and rollover Will Deacon
2012-08-15 16:54 ` [PATCH 1/3] ARM: mm: remove IPI broadcasting on ASID rollover Will Deacon
@ 2012-08-15 16:54 ` Will Deacon
2012-08-15 16:54 ` [PATCH 3/3] ARM: mm: use bitmap operations when allocating new ASIDs Will Deacon
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Will Deacon @ 2012-08-15 16:54 UTC (permalink / raw)
To: linux-arm-kernel
When scheduling a new mm, we take a spinlock so that we can:
1. Safely allocate a new ASID, if required
2. Update our active_asids field without worrying about parallel
updates to reserved_asids
3. Ensure that we flush our local TLB, if required
However, this has the nasty effect of serialising context-switch across
all CPUs in the system. The usual (fast) case is where the next mm has
a valid ASID for the current generation. In such a scenario, we can
avoid taking the lock and instead use atomic64_xchg to update the
active_asids variable for the current CPU. If a rollover occurs on
another CPU (which would take the lock), when copying the active_asids
into the reserved_asids another atomic64_xchg is used to replace each
active_asids with 0. The fast path can then detect this case and fall
back to spinning on the lock.
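The xchg handshake can be modelled with userspace C11 atomics (a sketch,
not the kernel code; a single active_asid slot stands in for the per-cpu
variable, and the generation check is elided):

```c
#include <stdatomic.h>
#include <stdint.h>

/* The context switcher publishes its ASID with an atomic exchange; the
 * rollover path steals it by exchanging in 0. If the fastpath sees 0
 * come back, a rollover raced with it and it must fall back to the
 * locked slowpath. */
static _Atomic uint64_t active_asid;

/* Returns 1 if the fastpath may be taken, 0 if we must take the lock.
 * (Assumes ctx_id has already passed the generation check.) */
static int try_fastpath(uint64_t ctx_id)
{
    return atomic_exchange(&active_asid, ctx_id) != 0;
}

/* Rollover side: reserve whatever was active, resetting the slot to 0. */
static uint64_t steal_active(void)
{
    return atomic_exchange(&active_asid, 0);
}
```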
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/mm/context.c | 23 +++++++++++++++--------
1 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index 2d1b42d..733774f 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -38,9 +38,9 @@
#define ASID_FIRST_VERSION (1ULL << ASID_BITS)
static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
-static u64 cpu_last_asid = ASID_FIRST_VERSION;
+static atomic64_t cpu_last_asid = ATOMIC64_INIT(ASID_FIRST_VERSION);
-static DEFINE_PER_CPU(u64, active_asids);
+static DEFINE_PER_CPU(atomic64_t, active_asids);
static DEFINE_PER_CPU(u64, reserved_asids);
static cpumask_t tlb_flush_pending;
@@ -112,9 +112,10 @@ static void flush_context(unsigned int cpu)
int i;
/* Update the list of reserved ASIDs. */
- per_cpu(active_asids, cpu) = 0;
for_each_possible_cpu(i)
- per_cpu(reserved_asids, i) = per_cpu(active_asids, i);
+ per_cpu(reserved_asids, i) =
+ atomic64_xchg(&per_cpu(active_asids, i), 0);
+ per_cpu(reserved_asids, cpu) = 0;
/* Queue a TLB invalidate and flush the I-cache if necessary. */
if (!tlb_ops_need_broadcast())
@@ -144,7 +145,8 @@ static void new_context(struct mm_struct *mm, unsigned int cpu)
* Our current ASID was active during a rollover, we can
* continue to use it and this was just a false alarm.
*/
- asid = (cpu_last_asid & ASID_MASK) | (asid & ~ASID_MASK);
+ asid = (atomic64_read(&cpu_last_asid) & ASID_MASK) | \
+ (asid & ~ASID_MASK);
} else {
/*
* Allocate a free ASID. If we can't find one, take a
@@ -152,7 +154,7 @@ static void new_context(struct mm_struct *mm, unsigned int cpu)
* as requiring flushes.
*/
do {
- asid = ++cpu_last_asid;
+ asid = atomic64_inc_return(&cpu_last_asid);
if ((asid & ~ASID_MASK) == 0)
flush_context(cpu);
} while (is_reserved_asid(asid, ~ASID_MASK));
@@ -170,6 +172,10 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
if (unlikely(mm->context.kvm_seq != init_mm.context.kvm_seq))
__check_kvm_seq(mm);
+ if (!((mm->context.id ^ atomic64_read(&cpu_last_asid)) >> ASID_BITS) &&
+ atomic64_xchg(&per_cpu(active_asids, cpu), mm->context.id))
+ goto switch_mm_fastpath;
+
/*
* Required during context switch to avoid speculative page table
* walking with the wrong TTBR.
@@ -178,15 +184,16 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
raw_spin_lock_irqsave(&cpu_asid_lock, flags);
/* Check that our ASID belongs to the current generation. */
- if ((mm->context.id ^ cpu_last_asid) >> ASID_BITS)
+ if ((mm->context.id ^ atomic64_read(&cpu_last_asid)) >> ASID_BITS)
new_context(mm, cpu);
- *this_cpu_ptr(&active_asids) = mm->context.id;
+ atomic64_set(&per_cpu(active_asids, cpu), mm->context.id);
cpumask_set_cpu(cpu, mm_cpumask(mm));
if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
local_flush_tlb_all();
raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
+switch_mm_fastpath:
cpu_switch_mm(mm->pgd, mm);
}
--
1.7.4.1
* [PATCH 3/3] ARM: mm: use bitmap operations when allocating new ASIDs
2012-08-15 16:53 [PATCH 0/3] New algorithm for ASID allocation and rollover Will Deacon
2012-08-15 16:54 ` [PATCH 1/3] ARM: mm: remove IPI broadcasting on ASID rollover Will Deacon
2012-08-15 16:54 ` [PATCH 2/3] ARM: mm: avoid taking ASID spinlock on fastpath Will Deacon
@ 2012-08-15 16:54 ` Will Deacon
2012-08-15 17:05 ` [PATCH 0/3] New algorithm for ASID allocation and rollover Marc Zyngier
2012-08-19 15:21 ` Arnd Bergmann
4 siblings, 0 replies; 7+ messages in thread
From: Will Deacon @ 2012-08-15 16:54 UTC (permalink / raw)
To: linux-arm-kernel
When allocating a new ASID, we must take care not to re-assign a
reserved ASID-value to a new mm. This requires us to check each
candidate ASID against those currently reserved by other cores before
assigning a new ASID to the current mm.
This patch improves the ASID allocation algorithm by using a
bitmap-based approach. Rather than iterating over the reserved ASID
array for each candidate ASID, we simply find the first zero bit,
ensuring that those indices corresponding to reserved ASIDs are set
when flushing during a rollover event.
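The allocation step can be sketched as follows (illustrative; a byte
array stands in for the kernel's bitmap and find_first_zero_bit, and the
index/ASID offset mirrors the IDX_TO_ASID macro, since ASID 0 is never
handed out):

```c
#include <stdint.h>

#define NUM_USER_ASIDS 255

static unsigned char asid_map[NUM_USER_ASIDS]; /* 0 = free, 1 = in use */

/* Returns a fresh ASID in [1, NUM_USER_ASIDS], or 0 when the map is
 * exhausted, in which case the caller bumps the generation, flushes,
 * re-seeds the map with the reserved ASIDs and retries. */
static unsigned alloc_asid(void)
{
    for (unsigned idx = 0; idx < NUM_USER_ASIDS; idx++) {
        if (!asid_map[idx]) {
            asid_map[idx] = 1;
            return idx + 1; /* IDX_TO_ASID */
        }
    }
    return 0; /* all ASIDs dirty: rollover needed */
}
```

The win over the previous scheme is that a free ASID is found in one scan
of the bitmap rather than by testing each candidate against every CPU's
reserved ASID.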
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/mm/context.c | 56 +++++++++++++++++++++++++++++++-----------------
1 files changed, 36 insertions(+), 20 deletions(-)
diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index 733774f..1a9e2ab 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -36,9 +36,14 @@
* should be unique within all running processes.
*/
#define ASID_FIRST_VERSION (1ULL << ASID_BITS)
+#define NUM_USER_ASIDS (ASID_FIRST_VERSION - 1)
+
+#define ASID_TO_IDX(asid) ((asid & ~ASID_MASK) - 1)
+#define IDX_TO_ASID(idx) ((idx + 1) & ~ASID_MASK)
static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
-static atomic64_t cpu_last_asid = ATOMIC64_INIT(ASID_FIRST_VERSION);
+static atomic64_t asid_generation = ATOMIC64_INIT(ASID_FIRST_VERSION);
+static DECLARE_BITMAP(asid_map, NUM_USER_ASIDS);
static DEFINE_PER_CPU(atomic64_t, active_asids);
static DEFINE_PER_CPU(u64, reserved_asids);
@@ -110,12 +115,19 @@ arch_initcall(contextidr_notifier_init);
static void flush_context(unsigned int cpu)
{
int i;
-
- /* Update the list of reserved ASIDs. */
- for_each_possible_cpu(i)
- per_cpu(reserved_asids, i) =
- atomic64_xchg(&per_cpu(active_asids, i), 0);
- per_cpu(reserved_asids, cpu) = 0;
+ u64 asid;
+
+ /* Update the list of reserved ASIDs and the ASID bitmap. */
+ bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
+ for_each_possible_cpu(i) {
+ if (i == cpu) {
+ asid = 0;
+ } else {
+ asid = atomic64_xchg(&per_cpu(active_asids, i), 0);
+ __set_bit(ASID_TO_IDX(asid), asid_map);
+ }
+ per_cpu(reserved_asids, i) = asid;
+ }
/* Queue a TLB invalidate and flush the I-cache if necessary. */
if (!tlb_ops_need_broadcast())
@@ -127,11 +139,11 @@ static void flush_context(unsigned int cpu)
__flush_icache_all();
}
-static int is_reserved_asid(u64 asid, u64 mask)
+static int is_reserved_asid(u64 asid)
{
int cpu;
for_each_possible_cpu(cpu)
- if ((per_cpu(reserved_asids, cpu) & mask) == (asid & mask))
+ if (per_cpu(reserved_asids, cpu) == asid)
return 1;
return 0;
}
@@ -139,25 +151,29 @@ static int is_reserved_asid(u64 asid, u64 mask)
static void new_context(struct mm_struct *mm, unsigned int cpu)
{
u64 asid = mm->context.id;
+ u64 generation = atomic64_read(&asid_generation);
- if (asid != 0 && is_reserved_asid(asid, ULLONG_MAX)) {
+ if (asid != 0 && is_reserved_asid(asid)) {
/*
* Our current ASID was active during a rollover, we can
* continue to use it and this was just a false alarm.
*/
- asid = (atomic64_read(&cpu_last_asid) & ASID_MASK) | \
- (asid & ~ASID_MASK);
+ asid = generation | (asid & ~ASID_MASK);
} else {
/*
* Allocate a free ASID. If we can't find one, take a
* note of the currently active ASIDs and mark the TLBs
* as requiring flushes.
*/
- do {
- asid = atomic64_inc_return(&cpu_last_asid);
- if ((asid & ~ASID_MASK) == 0)
- flush_context(cpu);
- } while (is_reserved_asid(asid, ~ASID_MASK));
+ asid = find_first_zero_bit(asid_map, NUM_USER_ASIDS);
+ if (asid == NUM_USER_ASIDS) {
+ generation = atomic64_add_return(ASID_FIRST_VERSION,
+ &asid_generation);
+ flush_context(cpu);
+ asid = find_first_zero_bit(asid_map, NUM_USER_ASIDS);
+ }
+ __set_bit(asid, asid_map);
+ asid = generation | IDX_TO_ASID(asid);
cpumask_clear(mm_cpumask(mm));
}
@@ -172,8 +188,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
if (unlikely(mm->context.kvm_seq != init_mm.context.kvm_seq))
__check_kvm_seq(mm);
- if (!((mm->context.id ^ atomic64_read(&cpu_last_asid)) >> ASID_BITS) &&
- atomic64_xchg(&per_cpu(active_asids, cpu), mm->context.id))
+ if (!((mm->context.id ^ atomic64_read(&asid_generation)) >> ASID_BITS)
+ && atomic64_xchg(&per_cpu(active_asids, cpu), mm->context.id))
goto switch_mm_fastpath;
/*
@@ -184,7 +200,7 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
raw_spin_lock_irqsave(&cpu_asid_lock, flags);
/* Check that our ASID belongs to the current generation. */
- if ((mm->context.id ^ atomic64_read(&cpu_last_asid)) >> ASID_BITS)
+ if ((mm->context.id ^ atomic64_read(&asid_generation)) >> ASID_BITS)
new_context(mm, cpu);
atomic64_set(&per_cpu(active_asids, cpu), mm->context.id);
--
1.7.4.1
* [PATCH 0/3] New algorithm for ASID allocation and rollover
2012-08-15 16:53 [PATCH 0/3] New algorithm for ASID allocation and rollover Will Deacon
` (2 preceding siblings ...)
2012-08-15 16:54 ` [PATCH 3/3] ARM: mm: use bitmap operations when allocating new ASIDs Will Deacon
@ 2012-08-15 17:05 ` Marc Zyngier
2012-08-19 15:21 ` Arnd Bergmann
4 siblings, 0 replies; 7+ messages in thread
From: Marc Zyngier @ 2012-08-15 17:05 UTC (permalink / raw)
To: linux-arm-kernel
On 15/08/12 17:53, Will Deacon wrote:
> Hello,
>
> Following some investigation into preempt-rt Linux, it became apparent
> that ASID rollover can happen fairly regularly under certain heavy
> scheduling workloads. Each time this happens, we broadcast an interrupt
> to the secondary CPUs so that we can reset the global ASID numberspace
> without assigning duplicate ASIDs to different tasks or accidentally
> assigning different ASIDs to threads of the same process.
>
> This leads to a large number of expensive IPIs between cores:
>
> CPU0 CPU1
> IPI0: 0 0 Timer broadcast interrupts
> IPI1: 23165 115888 Rescheduling interrupts
> IPI2: 0 0 Function call interrupts
> IPI3: 6619 1123 Single function call interrupts <---- IPIs
> IPI4: 0 0 CPU stop interrupts
>
> Digging deeper, this also leads to an extremely varied waittime on the
> cpu_asid_lock. Granted this is only contended for <1% of the time, but
> the waittime varies between 0.5 and 734 us!
>
> After some discussion, it became apparent that tracking the ASIDs
> currently active on the cores in the system means that, on rollover, we
> can automatically reserve those that are in use without having to stop
> the world.
>
> This patch series develops that idea so that:
>
> - We can support cores without hardware broadcasting of TLB maintenance
> operations without resorting to IPIs.
This particular bit should benefit virtual machines even more, as it
avoids trapping back to the host to handle writes to the (emulated) GIC
distributor and the corresponding interrupt injection on the target vcpus.
I'll give it a spin on KVM.
M.
--
Jazz is not dead. It just smells funny...
* [PATCH 0/3] New algorithm for ASID allocation and rollover
2012-08-15 16:53 [PATCH 0/3] New algorithm for ASID allocation and rollover Will Deacon
` (3 preceding siblings ...)
2012-08-15 17:05 ` [PATCH 0/3] New algorithm for ASID allocation and rollover Marc Zyngier
@ 2012-08-19 15:21 ` Arnd Bergmann
2012-08-20 12:51 ` Will Deacon
4 siblings, 1 reply; 7+ messages in thread
From: Arnd Bergmann @ 2012-08-19 15:21 UTC (permalink / raw)
To: linux-arm-kernel
On Wednesday 15 August 2012, Will Deacon wrote:
> After some discussion, it became apparent that tracking the ASIDs
> currently active on the cores in the system means that, on rollover, we
> can automatically reserve those that are in use without having to stop
> the world.
Just a question for my general understanding of how this is done: How do
you know if an ASID is active or not? Do you broadcast flush an address
space completely when the struct mm goes away, or do you keep track of
which CPUs had TLB entries for an ASID when it went away and then flush that ASID
when you reuse it on that CPU?
Arnd
* [PATCH 0/3] New algorithm for ASID allocation and rollover
2012-08-19 15:21 ` Arnd Bergmann
@ 2012-08-20 12:51 ` Will Deacon
0 siblings, 0 replies; 7+ messages in thread
From: Will Deacon @ 2012-08-20 12:51 UTC (permalink / raw)
To: linux-arm-kernel
Hi Arnd,
Thanks for the interest!
On Sun, Aug 19, 2012 at 04:21:37PM +0100, Arnd Bergmann wrote:
> On Wednesday 15 August 2012, Will Deacon wrote:
> > After some discussion, it became apparent that tracking the ASIDs
> > currently active on the cores in the system means that, on rollover, we
> > can automatically reserve those that are in use without having to stop
> > the world.
>
> Just a question for my general understanding of how this is done: How do
> you know if an ASID is active or not? Do you broadcast flush an address
> space completely when the struct mm goes away, or do you keep track of
> which CPUs had TLB entries for an ASID when it went away and then flush that ASID
> when you reuse it on that CPU?
Ok, I'll address these in turn:
1. We know if an ASID is active or not by updating the per-cpu active_asids
variable when switching mm.
2. Flushing (TLB invalidation) only happens when all ASIDs are dirty. So
when a struct mm is freed, its ASID remains `dirty' in the asid_map.
Eventually, no more ASIDs can be allocated, so the flushing takes place
then. All v7 CPUs broadcast this operation in hardware. At this point, we
mark all the active ASIDs as dirty to prevent them being re-allocated to
different tasks.
I did play with per-ASID TLB-flushing depending on ASID pressure but I
couldn't find any benchmarks that showed an improvement.
Cheers,
Will