linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v2 00/10] arm64 switch_mm improvements
@ 2015-10-06 17:46 Will Deacon
  2015-10-06 17:46 ` [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function Will Deacon
                   ` (10 more replies)
  0 siblings, 11 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

This is version two of the patches previously posted here:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2015-September/370720.html

Changes since v1 include:

  * More comments
  * Added reviewed-by tags

Cheers,

Will

--->8

Will Deacon (10):
  arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function
  arm64: proc: de-scope TLBI operation during cold boot
  arm64: flush: use local TLB and I-cache invalidation
  arm64: mm: rewrite ASID allocator and MM context-switching code
  arm64: tlbflush: remove redundant ASID casts to (unsigned long)
  arm64: tlbflush: avoid flushing when fullmm == 1
  arm64: switch_mm: simplify mm and CPU checks
  arm64: mm: kill mm_cpumask usage
  arm64: tlb: remove redundant barrier from __flush_tlb_pgtable
  arm64: mm: remove dsb from update_mmu_cache

 arch/arm64/include/asm/cacheflush.h  |   7 ++
 arch/arm64/include/asm/mmu.h         |  15 +--
 arch/arm64/include/asm/mmu_context.h | 113 ++++-------------
 arch/arm64/include/asm/pgtable.h     |   6 +-
 arch/arm64/include/asm/thread_info.h |   1 -
 arch/arm64/include/asm/tlb.h         |  26 ++--
 arch/arm64/include/asm/tlbflush.h    |  18 ++-
 arch/arm64/kernel/asm-offsets.c      |   2 +-
 arch/arm64/kernel/efi.c              |   5 +-
 arch/arm64/kernel/smp.c              |   9 +-
 arch/arm64/kernel/suspend.c          |   2 +-
 arch/arm64/mm/context.c              | 236 +++++++++++++++++++++--------------
 arch/arm64/mm/mmu.c                  |   2 +-
 arch/arm64/mm/proc.S                 |   6 +-
 14 files changed, 222 insertions(+), 226 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-07  6:17   ` Ard Biesheuvel
  2015-10-06 17:46 ` [PATCH v2 02/10] arm64: proc: de-scope TLBI operation during cold boot Will Deacon
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

With commit b08d4640a3dc ("arm64: remove dead code"),
cpu_set_idmap_tcr_t0sz is no longer called and can therefore be removed
from the kernel.

This patch removes the function and effectively inlines the helper
function __cpu_set_tcr_t0sz into cpu_set_default_tcr_t0sz.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu_context.h | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 8ec41e5f56f0..549b89554ce8 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -77,34 +77,23 @@ static inline bool __cpu_uses_extended_idmap(void)
 		unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS)));
 }
 
-static inline void __cpu_set_tcr_t0sz(u64 t0sz)
-{
-	unsigned long tcr;
-
-	if (__cpu_uses_extended_idmap())
-		asm volatile (
-		"	mrs	%0, tcr_el1	;"
-		"	bfi	%0, %1, %2, %3	;"
-		"	msr	tcr_el1, %0	;"
-		"	isb"
-		: "=&r" (tcr)
-		: "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
-}
-
-/*
- * Set TCR.T0SZ to the value appropriate for activating the identity map.
- */
-static inline void cpu_set_idmap_tcr_t0sz(void)
-{
-	__cpu_set_tcr_t0sz(idmap_t0sz);
-}
-
 /*
  * Set TCR.T0SZ to its default value (based on VA_BITS)
  */
 static inline void cpu_set_default_tcr_t0sz(void)
 {
-	__cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
+	unsigned long tcr;
+
+	if (!__cpu_uses_extended_idmap())
+		return;
+
+	asm volatile (
+	"	mrs	%0, tcr_el1	;"
+	"	bfi	%0, %1, %2, %3	;"
+	"	msr	tcr_el1, %0	;"
+	"	isb"
+	: "=&r" (tcr)
+	: "r"(TCR_T0SZ(VA_BITS)), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
 }
 
 static inline void switch_new_context(struct mm_struct *mm)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 02/10] arm64: proc: de-scope TLBI operation during cold boot
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
  2015-10-06 17:46 ` [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation Will Deacon
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

When cold-booting a CPU, we must invalidate any junk entries from the
local TLB prior to enabling the MMU. This doesn't require broadcasting
within the inner-shareable domain, so de-scope the operation to apply
only to the local CPU.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/mm/proc.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index e4ee7bd8830a..bbde13d77da5 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -146,8 +146,8 @@ ENDPROC(cpu_do_switch_mm)
  *	value of the SCTLR_EL1 register.
  */
 ENTRY(__cpu_setup)
-	tlbi	vmalle1is			// invalidate I + D TLBs
-	dsb	ish
+	tlbi	vmalle1				// Invalidate local TLB
+	dsb	nsh
 
 	mov	x0, #3 << 20
 	msr	cpacr_el1, x0			// Enable FP/ASIMD
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
  2015-10-06 17:46 ` [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function Will Deacon
  2015-10-06 17:46 ` [PATCH v2 02/10] arm64: proc: de-scope TLBI operation during cold boot Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-07  1:18   ` David Daney
  2015-10-07  6:18   ` Ard Biesheuvel
  2015-10-06 17:46 ` [PATCH v2 04/10] arm64: mm: rewrite ASID allocator and MM context-switching code Will Deacon
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

There are a number of places where a single CPU is running with a
private page-table and we need to perform maintenance on the TLB and
I-cache in order to ensure correctness, but do not require the operation
to be broadcast to other CPUs.

This patch adds local variants of flush_tlb_all and __flush_icache_all
to support these use-cases and updates the callers accordingly.
__local_flush_icache_all also implies an isb, since it is intended to be
used synchronously.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/cacheflush.h | 7 +++++++
 arch/arm64/include/asm/tlbflush.h   | 8 ++++++++
 arch/arm64/kernel/efi.c             | 4 ++--
 arch/arm64/kernel/smp.c             | 2 +-
 arch/arm64/kernel/suspend.c         | 2 +-
 arch/arm64/mm/context.c             | 4 ++--
 arch/arm64/mm/mmu.c                 | 2 +-
 7 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index c75b8d027eb1..54efedaf331f 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -115,6 +115,13 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *);
 
+static inline void __local_flush_icache_all(void)
+{
+	asm("ic iallu");
+	dsb(nsh);
+	isb();
+}
+
 static inline void __flush_icache_all(void)
 {
 	asm("ic	ialluis");
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 7bd2da021658..96f944e75dc4 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -63,6 +63,14 @@
  *		only require the D-TLB to be invalidated.
  *		- kaddr - Kernel virtual memory address
  */
+static inline void local_flush_tlb_all(void)
+{
+	dsb(nshst);
+	asm("tlbi	vmalle1");
+	dsb(nsh);
+	isb();
+}
+
 static inline void flush_tlb_all(void)
 {
 	dsb(ishst);
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 13671a9cf016..4d12926ea40d 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -344,9 +344,9 @@ static void efi_set_pgd(struct mm_struct *mm)
 	else
 		cpu_switch_mm(mm->pgd, mm);
 
-	flush_tlb_all();
+	local_flush_tlb_all();
 	if (icache_is_aivivt())
-		__flush_icache_all();
+		__local_flush_icache_all();
 }
 
 void efi_virtmap_load(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dbdaacddd9a5..fdd4d4dbd64f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -152,7 +152,7 @@ asmlinkage void secondary_start_kernel(void)
 	 * point to zero page to avoid speculatively fetching new entries.
 	 */
 	cpu_set_reserved_ttbr0();
-	flush_tlb_all();
+	local_flush_tlb_all();
 	cpu_set_default_tcr_t0sz();
 
 	preempt_disable();
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index 8297d502217e..3c5e4e6dcf68 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -90,7 +90,7 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 		else
 			cpu_switch_mm(mm->pgd, mm);
 
-		flush_tlb_all();
+		local_flush_tlb_all();
 
 		/*
 		 * Restore per-cpu offset before any kernel
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index d70ff14dbdbd..48b53fb381af 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -48,9 +48,9 @@ static void flush_context(void)
 {
 	/* set the reserved TTBR0 before flushing the TLB */
 	cpu_set_reserved_ttbr0();
-	flush_tlb_all();
+	local_flush_tlb_all();
 	if (icache_is_aivivt())
-		__flush_icache_all();
+		__local_flush_icache_all();
 }
 
 static void set_mm_context(struct mm_struct *mm, unsigned int asid)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9211b8527f25..71a310478c9e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -456,7 +456,7 @@ void __init paging_init(void)
 	 * point to zero page to avoid speculatively fetching new entries.
 	 */
 	cpu_set_reserved_ttbr0();
-	flush_tlb_all();
+	local_flush_tlb_all();
 	cpu_set_default_tcr_t0sz();
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 04/10] arm64: mm: rewrite ASID allocator and MM context-switching code
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (2 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 05/10] arm64: tlbflush: remove redundant ASID casts to (unsigned long) Will Deacon
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

Our current switch_mm implementation suffers from a number of problems:

  (1) The ASID allocator relies on IPIs to synchronise the CPUs on a
      rollover event

  (2) Because of (1), we cannot allocate ASIDs with interrupts disabled
      and therefore make use of a TIF_SWITCH_MM flag to postpone the
      actual switch to finish_arch_post_lock_switch

  (3) We run context switch with a reserved (invalid) TTBR0 value, even
      though the ASID and pgd are updated atomically

  (4) We take a global spinlock (cpu_asid_lock) during context-switch

  (5) We use h/w broadcast TLB operations when they are not required
      (e.g. in flush_context)

This patch addresses these problems by rewriting the ASID algorithm to
match the bitmap-based arch/arm/ implementation more closely. This in
turn allows us to remove much of the complication surrounding switch_mm,
including the ugly thread flag.
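
In essence, the new allocator pairs a global generation counter with a
bitmap of in-use ASIDs. A deliberately simplified, single-threaded sketch
of the allocation step (hypothetical names; the real code below also uses
a raw spinlock, per-cpu active/reserved ASIDs so that running tasks
survive a rollover, and defers the local TLB flush to the next switch):

  #define ASID_BITS	16
  #define NUM_ASIDS	(1UL << ASID_BITS)
  #define ASID_MASK	(NUM_ASIDS - 1)

  static unsigned long generation = NUM_ASIDS;	/* bumped in steps of NUM_ASIDS */
  static unsigned long asid_map[NUM_ASIDS / BITS_PER_LONG];

  /* context_id is generation | asid; zero means "never allocated". */
  static unsigned long get_context_id(unsigned long context_id)
  {
  	unsigned long asid = context_id & ASID_MASK;

  	/* Current generation: nothing to do (the switch_mm fast path). */
  	if (context_id && (context_id & ~ASID_MASK) == generation)
  		return context_id;

  	/* Stale generation: try to re-use the same ASID number. */
  	if (asid && !test_and_set_bit(asid, asid_map))
  		return generation | asid;

  	/*
  	 * Otherwise allocate a fresh ASID (#0 is reserved for init_mm),
  	 * rolling the generation over and flushing TLBs when the map fills.
  	 */
  	asid = find_next_zero_bit(asid_map, NUM_ASIDS, 1);
  	if (asid == NUM_ASIDS) {
  		generation += NUM_ASIDS;
  		bitmap_zero(asid_map, NUM_ASIDS);	/* cf. flush_context() */
  		asid = find_next_zero_bit(asid_map, NUM_ASIDS, 1);
  	}
  	__set_bit(asid, asid_map);
  	return generation | asid;
  }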

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu.h         |  15 +--
 arch/arm64/include/asm/mmu_context.h |  76 ++---------
 arch/arm64/include/asm/thread_info.h |   1 -
 arch/arm64/kernel/asm-offsets.c      |   2 +-
 arch/arm64/kernel/efi.c              |   1 -
 arch/arm64/mm/context.c              | 238 +++++++++++++++++++++--------------
 arch/arm64/mm/proc.S                 |   2 +-
 7 files changed, 166 insertions(+), 169 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 030208767185..990124a67eeb 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -17,15 +17,16 @@
 #define __ASM_MMU_H
 
 typedef struct {
-	unsigned int id;
-	raw_spinlock_t id_lock;
-	void *vdso;
+	atomic64_t	id;
+	void		*vdso;
 } mm_context_t;
 
-#define INIT_MM_CONTEXT(name) \
-	.context.id_lock = __RAW_SPIN_LOCK_UNLOCKED(name.context.id_lock),
-
-#define ASID(mm)	((mm)->context.id & 0xffff)
+/*
+ * This macro is only used by the TLBI code, which cannot race with an
+ * ASID change and therefore doesn't need to reload the counter using
+ * atomic64_read.
+ */
+#define ASID(mm)	((mm)->context.id.counter & 0xffff)
 
 extern void paging_init(void);
 extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 549b89554ce8..f4c74a951b6c 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -28,13 +28,6 @@
 #include <asm/cputype.h>
 #include <asm/pgtable.h>
 
-#define MAX_ASID_BITS	16
-
-extern unsigned int cpu_last_asid;
-
-void __init_new_context(struct task_struct *tsk, struct mm_struct *mm);
-void __new_context(struct mm_struct *mm);
-
 #ifdef CONFIG_PID_IN_CONTEXTIDR
 static inline void contextidr_thread_switch(struct task_struct *next)
 {
@@ -96,66 +89,19 @@ static inline void cpu_set_default_tcr_t0sz(void)
 	: "r"(TCR_T0SZ(VA_BITS)), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
 }
 
-static inline void switch_new_context(struct mm_struct *mm)
-{
-	unsigned long flags;
-
-	__new_context(mm);
-
-	local_irq_save(flags);
-	cpu_switch_mm(mm->pgd, mm);
-	local_irq_restore(flags);
-}
-
-static inline void check_and_switch_context(struct mm_struct *mm,
-					    struct task_struct *tsk)
-{
-	/*
-	 * Required during context switch to avoid speculative page table
-	 * walking with the wrong TTBR.
-	 */
-	cpu_set_reserved_ttbr0();
-
-	if (!((mm->context.id ^ cpu_last_asid) >> MAX_ASID_BITS))
-		/*
-		 * The ASID is from the current generation, just switch to the
-		 * new pgd. This condition is only true for calls from
-		 * context_switch() and interrupts are already disabled.
-		 */
-		cpu_switch_mm(mm->pgd, mm);
-	else if (irqs_disabled())
-		/*
-		 * Defer the new ASID allocation until after the context
-		 * switch critical region since __new_context() cannot be
-		 * called with interrupts disabled.
-		 */
-		set_ti_thread_flag(task_thread_info(tsk), TIF_SWITCH_MM);
-	else
-		/*
-		 * That is a direct call to switch_mm() or activate_mm() with
-		 * interrupts enabled and a new context.
-		 */
-		switch_new_context(mm);
-}
-
-#define init_new_context(tsk,mm)	(__init_new_context(tsk,mm),0)
+/*
+ * It would be nice to return ASIDs back to the allocator, but unfortunately
+ * that introduces a race with a generation rollover where we could erroneously
+ * free an ASID allocated in a future generation. We could workaround this by
+ * freeing the ASID from the context of the dying mm (e.g. in arch_exit_mmap),
+ * but we'd then need to make sure that we didn't dirty any TLBs afterwards.
+ * Setting a reserved TTBR0 or EPD0 would work, but it all gets ugly when you
+ * take CPU migration into account.
+ */
 #define destroy_context(mm)		do { } while(0)
+void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
 
-#define finish_arch_post_lock_switch \
-	finish_arch_post_lock_switch
-static inline void finish_arch_post_lock_switch(void)
-{
-	if (test_and_clear_thread_flag(TIF_SWITCH_MM)) {
-		struct mm_struct *mm = current->mm;
-		unsigned long flags;
-
-		__new_context(mm);
-
-		local_irq_save(flags);
-		cpu_switch_mm(mm->pgd, mm);
-		local_irq_restore(flags);
-	}
-}
+#define init_new_context(tsk,mm)	({ atomic64_set(&mm->context.id, 0); 0; })
 
 /*
  * This is called when "tsk" is about to enter lazy TLB mode.
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index dcd06d18a42a..555c6dec5ef2 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -111,7 +111,6 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_RESTORE_SIGMASK	20
 #define TIF_SINGLESTEP		21
 #define TIF_32BIT		22	/* 32bit process */
-#define TIF_SWITCH_MM		23	/* deferred switch_mm */
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 8d89cf8dae55..25de8b244961 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -60,7 +60,7 @@ int main(void)
   DEFINE(S_SYSCALLNO,		offsetof(struct pt_regs, syscallno));
   DEFINE(S_FRAME_SIZE,		sizeof(struct pt_regs));
   BLANK();
-  DEFINE(MM_CONTEXT_ID,		offsetof(struct mm_struct, context.id));
+  DEFINE(MM_CONTEXT_ID,		offsetof(struct mm_struct, context.id.counter));
   BLANK();
   DEFINE(VMA_VM_MM,		offsetof(struct vm_area_struct, vm_mm));
   DEFINE(VMA_VM_FLAGS,		offsetof(struct vm_area_struct, vm_flags));
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 4d12926ea40d..a48d1f477b2e 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -48,7 +48,6 @@ static struct mm_struct efi_mm = {
 	.mmap_sem		= __RWSEM_INITIALIZER(efi_mm.mmap_sem),
 	.page_table_lock	= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
 	.mmlist			= LIST_HEAD_INIT(efi_mm.mmlist),
-	INIT_MM_CONTEXT(efi_mm)
 };
 
 static int uefi_debug __initdata;
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 48b53fb381af..e902229b1a3d 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -17,135 +17,187 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
-#include <linux/init.h>
+#include <linux/bitops.h>
 #include <linux/sched.h>
+#include <linux/slab.h>
 #include <linux/mm.h>
-#include <linux/smp.h>
-#include <linux/percpu.h>
 
+#include <asm/cpufeature.h>
 #include <asm/mmu_context.h>
 #include <asm/tlbflush.h>
-#include <asm/cachetype.h>
 
-#define asid_bits(reg) \
-	(((read_cpuid(ID_AA64MMFR0_EL1) & 0xf0) >> 2) + 8)
+static u32 asid_bits;
+static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
 
-#define ASID_FIRST_VERSION	(1 << MAX_ASID_BITS)
+static atomic64_t asid_generation;
+static unsigned long *asid_map;
 
-static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
-unsigned int cpu_last_asid = ASID_FIRST_VERSION;
+static DEFINE_PER_CPU(atomic64_t, active_asids);
+static DEFINE_PER_CPU(u64, reserved_asids);
+static cpumask_t tlb_flush_pending;
 
-/*
- * We fork()ed a process, and we need a new context for the child to run in.
- */
-void __init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+#define ASID_MASK		(~GENMASK(asid_bits - 1, 0))
+#define ASID_FIRST_VERSION	(1UL << asid_bits)
+#define NUM_USER_ASIDS		ASID_FIRST_VERSION
+
+static void flush_context(unsigned int cpu)
 {
-	mm->context.id = 0;
-	raw_spin_lock_init(&mm->context.id_lock);
+	int i;
+	u64 asid;
+
+	/* Update the list of reserved ASIDs and the ASID bitmap. */
+	bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
+
+	/*
+	 * Ensure the generation bump is observed before we xchg the
+	 * active_asids.
+	 */
+	smp_wmb();
+
+	for_each_possible_cpu(i) {
+		asid = atomic64_xchg_relaxed(&per_cpu(active_asids, i), 0);
+		/*
+		 * If this CPU has already been through a
+		 * rollover, but hasn't run another task in
+		 * the meantime, we must preserve its reserved
+		 * ASID, as this is the only trace we have of
+		 * the process it is still running.
+		 */
+		if (asid == 0)
+			asid = per_cpu(reserved_asids, i);
+		__set_bit(asid & ~ASID_MASK, asid_map);
+		per_cpu(reserved_asids, i) = asid;
+	}
+
+	/* Queue a TLB invalidate and flush the I-cache if necessary. */
+	cpumask_setall(&tlb_flush_pending);
+
+	if (icache_is_aivivt())
+		__flush_icache_all();
 }
 
-static void flush_context(void)
+static int is_reserved_asid(u64 asid)
 {
-	/* set the reserved TTBR0 before flushing the TLB */
-	cpu_set_reserved_ttbr0();
-	local_flush_tlb_all();
-	if (icache_is_aivivt())
-		__local_flush_icache_all();
+	int cpu;
+	for_each_possible_cpu(cpu)
+		if (per_cpu(reserved_asids, cpu) == asid)
+			return 1;
+	return 0;
 }
 
-static void set_mm_context(struct mm_struct *mm, unsigned int asid)
+static u64 new_context(struct mm_struct *mm, unsigned int cpu)
 {
-	unsigned long flags;
+	static u32 cur_idx = 1;
+	u64 asid = atomic64_read(&mm->context.id);
+	u64 generation = atomic64_read(&asid_generation);
 
-	/*
-	 * Locking needed for multi-threaded applications where the same
-	 * mm->context.id could be set from different CPUs during the
-	 * broadcast. This function is also called via IPI so the
-	 * mm->context.id_lock has to be IRQ-safe.
-	 */
-	raw_spin_lock_irqsave(&mm->context.id_lock, flags);
-	if (likely((mm->context.id ^ cpu_last_asid) >> MAX_ASID_BITS)) {
+	if (asid != 0) {
 		/*
-		 * Old version of ASID found. Set the new one and reset
-		 * mm_cpumask(mm).
+		 * If our current ASID was active during a rollover, we
+		 * can continue to use it and this was just a false alarm.
 		 */
-		mm->context.id = asid;
-		cpumask_clear(mm_cpumask(mm));
+		if (is_reserved_asid(asid))
+			return generation | (asid & ~ASID_MASK);
+
+		/*
+		 * We had a valid ASID in a previous life, so try to re-use
+		 * it if possible.
+		 */
+		asid &= ~ASID_MASK;
+		if (!__test_and_set_bit(asid, asid_map))
+			goto bump_gen;
 	}
-	raw_spin_unlock_irqrestore(&mm->context.id_lock, flags);
 
 	/*
-	 * Set the mm_cpumask(mm) bit for the current CPU.
+	 * Allocate a free ASID. If we can't find one, take a note of the
+	 * currently active ASIDs and mark the TLBs as requiring flushes.
+	 * We always count from ASID #1, as we use ASID #0 when setting a
+	 * reserved TTBR0 for the init_mm.
 	 */
-	cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
+	asid = find_next_zero_bit(asid_map, NUM_USER_ASIDS, cur_idx);
+	if (asid != NUM_USER_ASIDS)
+		goto set_asid;
+
+	/* We're out of ASIDs, so increment the global generation count */
+	generation = atomic64_add_return_relaxed(ASID_FIRST_VERSION,
+						 &asid_generation);
+	flush_context(cpu);
+
+	/* We have at least 1 ASID per CPU, so this will always succeed */
+	asid = find_next_zero_bit(asid_map, NUM_USER_ASIDS, 1);
+
+set_asid:
+	__set_bit(asid, asid_map);
+	cur_idx = asid;
+
+bump_gen:
+	asid |= generation;
+	cpumask_clear(mm_cpumask(mm));
+	return asid;
 }
 
-/*
- * Reset the ASID on the current CPU. This function call is broadcast from the
- * CPU handling the ASID rollover and holding cpu_asid_lock.
- */
-static void reset_context(void *info)
+void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 {
-	unsigned int asid;
-	unsigned int cpu = smp_processor_id();
-	struct mm_struct *mm = current->active_mm;
+	unsigned long flags;
+	u64 asid;
+
+	asid = atomic64_read(&mm->context.id);
 
 	/*
-	 * current->active_mm could be init_mm for the idle thread immediately
-	 * after secondary CPU boot or hotplug. TTBR0_EL1 is already set to
-	 * the reserved value, so no need to reset any context.
+	 * The memory ordering here is subtle. We rely on the control
+	 * dependency between the generation read and the update of
+	 * active_asids to ensure that we are synchronised with a
+	 * parallel rollover (i.e. this pairs with the smp_wmb() in
+	 * flush_context).
 	 */
-	if (mm == &init_mm)
-		return;
+	if (!((asid ^ atomic64_read(&asid_generation)) >> asid_bits)
+	    && atomic64_xchg_relaxed(&per_cpu(active_asids, cpu), asid))
+		goto switch_mm_fastpath;
+
+	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+	/* Check that our ASID belongs to the current generation. */
+	asid = atomic64_read(&mm->context.id);
+	if ((asid ^ atomic64_read(&asid_generation)) >> asid_bits) {
+		asid = new_context(mm, cpu);
+		atomic64_set(&mm->context.id, asid);
+	}
 
-	smp_rmb();
-	asid = cpu_last_asid + cpu;
+	if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
+		local_flush_tlb_all();
 
-	flush_context();
-	set_mm_context(mm, asid);
+	atomic64_set(&per_cpu(active_asids, cpu), asid);
+	cpumask_set_cpu(cpu, mm_cpumask(mm));
+	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
 
-	/* set the new ASID */
+switch_mm_fastpath:
 	cpu_switch_mm(mm->pgd, mm);
 }
 
-void __new_context(struct mm_struct *mm)
+static int asids_init(void)
 {
-	unsigned int asid;
-	unsigned int bits = asid_bits();
-
-	raw_spin_lock(&cpu_asid_lock);
-	/*
-	 * Check the ASID again, in case the change was broadcast from another
-	 * CPU before we acquired the lock.
-	 */
-	if (!unlikely((mm->context.id ^ cpu_last_asid) >> MAX_ASID_BITS)) {
-		cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
-		raw_spin_unlock(&cpu_asid_lock);
-		return;
-	}
-	/*
-	 * At this point, it is guaranteed that the current mm (with an old
-	 * ASID) isn't active on any other CPU since the ASIDs are changed
-	 * simultaneously via IPI.
-	 */
-	asid = ++cpu_last_asid;
-
-	/*
-	 * If we've used up all our ASIDs, we need to start a new version and
-	 * flush the TLB.
-	 */
-	if (unlikely((asid & ((1 << bits) - 1)) == 0)) {
-		/* increment the ASID version */
-		cpu_last_asid += (1 << MAX_ASID_BITS) - (1 << bits);
-		if (cpu_last_asid == 0)
-			cpu_last_asid = ASID_FIRST_VERSION;
-		asid = cpu_last_asid + smp_processor_id();
-		flush_context();
-		smp_wmb();
-		smp_call_function(reset_context, NULL, 1);
-		cpu_last_asid += NR_CPUS - 1;
+	int fld = cpuid_feature_extract_field(read_cpuid(ID_AA64MMFR0_EL1), 4);
+
+	switch (fld) {
+	default:
+		pr_warn("Unknown ASID size (%d); assuming 8-bit\n", fld);
+		/* Fallthrough */
+	case 0:
+		asid_bits = 8;
+		break;
+	case 2:
+		asid_bits = 16;
 	}
 
-	set_mm_context(mm, asid);
-	raw_spin_unlock(&cpu_asid_lock);
+	/* If we end up with more CPUs than ASIDs, expect things to crash */
+	WARN_ON(NUM_USER_ASIDS < num_possible_cpus());
+	atomic64_set(&asid_generation, ASID_FIRST_VERSION);
+	asid_map = kzalloc(BITS_TO_LONGS(NUM_USER_ASIDS) * sizeof(*asid_map),
+			   GFP_KERNEL);
+	if (!asid_map)
+		panic("Failed to allocate bitmap for %lu ASIDs\n",
+		      NUM_USER_ASIDS);
+
+	pr_info("ASID allocator initialised with %lu entries\n", NUM_USER_ASIDS);
+	return 0;
 }
+early_initcall(asids_init);
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index bbde13d77da5..91cb2eaac256 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -130,7 +130,7 @@ ENDPROC(cpu_do_resume)
  *	- pgd_phys - physical address of new TTB
  */
 ENTRY(cpu_do_switch_mm)
-	mmid	w1, x1				// get mm->context.id
+	mmid	x1, x1				// get mm->context.id
 	bfi	x0, x1, #48, #16		// set the ASID
 	msr	ttbr0_el1, x0			// set TTBR0
 	isb
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 05/10] arm64: tlbflush: remove redundant ASID casts to (unsigned long)
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (3 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 04/10] arm64: mm: rewrite ASID allocator and MM context-switching code Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 06/10] arm64: tlbflush: avoid flushing when fullmm == 1 Will Deacon
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

The ASID macro returns a 64-bit (long long) value, so there is no need
to cast to (unsigned long) before shifting prior to a TLBI operation.
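
For reference, the macro as introduced by the previous patch is:

  /* arch/arm64/include/asm/mmu.h */
  #define ASID(mm)	((mm)->context.id.counter & 0xffff)

context.id is now an atomic64_t, so .counter is a 64-bit signed type and
the shift by 48 below is already performed in 64 bits without any cast.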

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlbflush.h | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 96f944e75dc4..93e9f964805c 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -81,7 +81,7 @@ static inline void flush_tlb_all(void)
 
 static inline void flush_tlb_mm(struct mm_struct *mm)
 {
-	unsigned long asid = (unsigned long)ASID(mm) << 48;
+	unsigned long asid = ASID(mm) << 48;
 
 	dsb(ishst);
 	asm("tlbi	aside1is, %0" : : "r" (asid));
@@ -91,8 +91,7 @@ static inline void flush_tlb_mm(struct mm_struct *mm)
 static inline void flush_tlb_page(struct vm_area_struct *vma,
 				  unsigned long uaddr)
 {
-	unsigned long addr = uaddr >> 12 |
-		((unsigned long)ASID(vma->vm_mm) << 48);
+	unsigned long addr = uaddr >> 12 | (ASID(vma->vm_mm) << 48);
 
 	dsb(ishst);
 	asm("tlbi	vale1is, %0" : : "r" (addr));
@@ -109,7 +108,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long start, unsigned long end,
 				     bool last_level)
 {
-	unsigned long asid = (unsigned long)ASID(vma->vm_mm) << 48;
+	unsigned long asid = ASID(vma->vm_mm) << 48;
 	unsigned long addr;
 
 	if ((end - start) > MAX_TLB_RANGE) {
@@ -162,7 +161,7 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
 static inline void __flush_tlb_pgtable(struct mm_struct *mm,
 				       unsigned long uaddr)
 {
-	unsigned long addr = uaddr >> 12 | ((unsigned long)ASID(mm) << 48);
+	unsigned long addr = uaddr >> 12 | (ASID(mm) << 48);
 
 	dsb(ishst);
 	asm("tlbi	vae1is, %0" : : "r" (addr));
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 06/10] arm64: tlbflush: avoid flushing when fullmm == 1
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (4 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 05/10] arm64: tlbflush: remove redundant ASID casts to (unsigned long) Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 07/10] arm64: switch_mm: simplify mm and CPU checks Will Deacon
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

The TLB gather code sets fullmm=1 when tearing down the entire address
space for an mm_struct on exit or execve. Given that the ASID allocator
will never re-allocate a dirty ASID, this flushing is not needed and can
simply be avoided in the flushing code.
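
For context, the generic mmu_gather code only sets fullmm when the whole
user address space is being torn down; paraphrasing mm/memory.c, the
condition is essentially (helper name here purely for illustration):

  /* fullmm only for a 0..~0 teardown, i.e. the exit/execve case above. */
  static inline bool teardown_is_fullmm(unsigned long start, unsigned long end)
  {
  	return !(start | (end + 1));
  }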

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlb.h | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index d6e6b6660380..ffdaea7954bb 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -37,17 +37,21 @@ static inline void __tlb_remove_table(void *_table)
 
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
-	if (tlb->fullmm) {
-		flush_tlb_mm(tlb->mm);
-	} else {
-		struct vm_area_struct vma = { .vm_mm = tlb->mm, };
-		/*
-		 * The intermediate page table levels are already handled by
-		 * the __(pte|pmd|pud)_free_tlb() functions, so last level
-		 * TLBI is sufficient here.
-		 */
-		__flush_tlb_range(&vma, tlb->start, tlb->end, true);
-	}
+	struct vm_area_struct vma = { .vm_mm = tlb->mm, };
+
+	/*
+	 * The ASID allocator will either invalidate the ASID or mark
+	 * it as used.
+	 */
+	if (tlb->fullmm)
+		return;
+
+	/*
+	 * The intermediate page table levels are already handled by
+	 * the __(pte|pmd|pud)_free_tlb() functions, so last level
+	 * TLBI is sufficient here.
+	 */
+	__flush_tlb_range(&vma, tlb->start, tlb->end, true);
 }
 
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 07/10] arm64: switch_mm: simplify mm and CPU checks
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (5 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 06/10] arm64: tlbflush: avoid flushing when fullmm == 1 Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 08/10] arm64: mm: kill mm_cpumask usage Will Deacon
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

switch_mm performs some checks to try and avoid entering the ASID
allocator:

  (1) If we're switching to the init_mm (no user mappings), then simply
      set a reserved TTBR0 value with no page table (the zero page)

  (2) If prev == next *and* the mm_cpumask indicates that we've run on
      this CPU before, then we can skip the allocator.

However, there is plenty of redundancy here. With the new ASID allocator,
if prev == next, then we know that our ASID is valid and do not need to
worry about re-allocation. Consequently, we can drop the mm_cpumask check
in (2) and move the prev == next check before the init_mm check, since
if prev == next == init_mm then there's nothing to do.
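
For reference, the resulting switch_mm() reads roughly as follows
(reconstructed from the hunks below):

  static inline void
  switch_mm(struct mm_struct *prev, struct mm_struct *next,
  	  struct task_struct *tsk)
  {
  	unsigned int cpu = smp_processor_id();

  	if (prev == next)
  		return;

  	/*
  	 * init_mm.pgd does not contain any user mappings and it is always
  	 * active for kernel addresses in TTBR1. Just set the reserved TTBR0.
  	 */
  	if (next == &init_mm) {
  		cpu_set_reserved_ttbr0();
  		return;
  	}

  	check_and_switch_context(next, cpu);
  }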

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu_context.h | 6 ++++--
 arch/arm64/mm/context.c              | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index f4c74a951b6c..c0e87898ba96 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -129,6 +129,9 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 {
 	unsigned int cpu = smp_processor_id();
 
+	if (prev == next)
+		return;
+
 	/*
 	 * init_mm.pgd does not contain any user mappings and it is always
 	 * active for kernel addresses in TTBR1. Just set the reserved TTBR0.
@@ -138,8 +141,7 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		return;
 	}
 
-	if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next)) || prev != next)
-		check_and_switch_context(next, tsk);
+	check_and_switch_context(next, cpu);
 }
 
 #define deactivate_mm(tsk,mm)	do { } while (0)
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index e902229b1a3d..4b9ec4484e3f 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -166,10 +166,10 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 		local_flush_tlb_all();
 
 	atomic64_set(&per_cpu(active_asids, cpu), asid);
-	cpumask_set_cpu(cpu, mm_cpumask(mm));
 	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
 
 switch_mm_fastpath:
+	cpumask_set_cpu(cpu, mm_cpumask(mm));
 	cpu_switch_mm(mm->pgd, mm);
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 08/10] arm64: mm: kill mm_cpumask usage
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (6 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 07/10] arm64: switch_mm: simplify mm and CPU checks Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 09/10] arm64: tlb: remove redundant barrier from __flush_tlb_pgtable Will Deacon
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

mm_cpumask isn't actually used for anything on arm64, so remove all the
code trying to keep it up-to-date.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/smp.c | 7 -------
 arch/arm64/mm/context.c | 2 --
 2 files changed, 9 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index fdd4d4dbd64f..03b0aa28ea61 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -142,7 +142,6 @@ asmlinkage void secondary_start_kernel(void)
 	 */
 	atomic_inc(&mm->mm_count);
 	current->active_mm = mm;
-	cpumask_set_cpu(cpu, mm_cpumask(mm));
 
 	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
 	printk("CPU%u: Booted secondary processor\n", cpu);
@@ -233,12 +232,6 @@ int __cpu_disable(void)
 	 * OK - migrate IRQs away from this CPU
 	 */
 	migrate_irqs();
-
-	/*
-	 * Remove this CPU from the vm mask set of all processes.
-	 */
-	clear_tasks_mm_cpumask(cpu);
-
 	return 0;
 }
 
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 4b9ec4484e3f..f636a2639f03 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -132,7 +132,6 @@ set_asid:
 
 bump_gen:
 	asid |= generation;
-	cpumask_clear(mm_cpumask(mm));
 	return asid;
 }
 
@@ -169,7 +168,6 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
 
 switch_mm_fastpath:
-	cpumask_set_cpu(cpu, mm_cpumask(mm));
 	cpu_switch_mm(mm->pgd, mm);
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 09/10] arm64: tlb: remove redundant barrier from __flush_tlb_pgtable
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (7 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 08/10] arm64: mm: kill mm_cpumask usage Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 10/10] arm64: mm: remove dsb from update_mmu_cache Will Deacon
  2015-10-07 10:59 ` [PATCH v2 00/10] arm64 switch_mm improvements Catalin Marinas
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

__flush_tlb_pgtable is used to invalidate intermediate page table
entries after they have been cleared and are about to be freed. Since
the pXd_clear() helpers imply memory barriers, we don't need the extra
one here.
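
For reference, the table-level setters already end with a barrier;
set_pmd() at the time looked approximately like this, and pmd_clear() is
just set_pmd(pmdp, __pmd(0)):

  static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
  {
  	*pmdp = pmd;
  	dsb(ishst);	/* orders the table update before any later TLBI */
  	isb();
  }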

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlbflush.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 93e9f964805c..b460ae28e346 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -163,7 +163,6 @@ static inline void __flush_tlb_pgtable(struct mm_struct *mm,
 {
 	unsigned long addr = uaddr >> 12 | (ASID(mm) << 48);
 
-	dsb(ishst);
 	asm("tlbi	vae1is, %0" : : "r" (addr));
 	dsb(ish);
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 10/10] arm64: mm: remove dsb from update_mmu_cache
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (8 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 09/10] arm64: tlb: remove redundant barrier from __flush_tlb_pgtable Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-07 10:59 ` [PATCH v2 00/10] arm64 switch_mm improvements Catalin Marinas
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

update_mmu_cache() consists of a dsb(ishst) instruction so that new user
mappings are guaranteed to be visible to the page table walker on
exception return.

In reality this can be a very expensive operation which is rarely needed.
Removing this barrier shows a modest improvement in hackbench scores and,
in the worst case, we re-take the user fault and establish that there
was nothing to do.
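
For reference, set_pte() at the time only issued barriers for valid kernel
mappings (approximately the code below), which is why update_mmu_cache()
carried the dsb for the user case in the first place:

  static inline void set_pte(pte_t *ptep, pte_t pte)
  {
  	*ptep = pte;

  	/*
  	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
  	 * or update_mmu_cache() have the necessary barriers.
  	 */
  	if (pte_valid_not_user(pte)) {
  		dsb(ishst);
  		isb();
  	}
  }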

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 26b066690593..0d18e88e1cfa 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -646,10 +646,10 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 				    unsigned long addr, pte_t *ptep)
 {
 	/*
-	 * set_pte() does not have a DSB for user mappings, so make sure that
-	 * the page table write is visible.
+	 * We don't do anything here, so there's a very small chance of
+	 * us retaking a user fault which we just fixed up. The alternative
+	 * is doing a dsb(ishst), but that penalises the fastpath.
 	 */
-	dsb(ishst);
 }
 
 #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation
  2015-10-06 17:46 ` [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation Will Deacon
@ 2015-10-07  1:18   ` David Daney
  2015-10-07  6:18   ` Ard Biesheuvel
  1 sibling, 0 replies; 15+ messages in thread
From: David Daney @ 2015-10-07  1:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/06/2015 10:46 AM, Will Deacon wrote:
> There are a number of places where a single CPU is running with a
> private page-table and we need to perform maintenance on the TLB and
> I-cache in order to ensure correctness, but do not require the operation
> to be broadcast to other CPUs.
>
> This patch adds local variants of flush_tlb_all and __flush_icache_all
> to support these use-cases and updates the callers accordingly.
> __local_flush_icache_all also implies an isb, since it is intended to be
> used synchronously.
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

I like this one.  It is similar to what some of my earlier patches did.

Acked-by: David Daney <david.daney@cavium.com>


> ---
>   arch/arm64/include/asm/cacheflush.h | 7 +++++++
>   arch/arm64/include/asm/tlbflush.h   | 8 ++++++++
>   arch/arm64/kernel/efi.c             | 4 ++--
>   arch/arm64/kernel/smp.c             | 2 +-
>   arch/arm64/kernel/suspend.c         | 2 +-
>   arch/arm64/mm/context.c             | 4 ++--
>   arch/arm64/mm/mmu.c                 | 2 +-
>   7 files changed, 22 insertions(+), 7 deletions(-)
>
[...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function
  2015-10-06 17:46 ` [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function Will Deacon
@ 2015-10-07  6:17   ` Ard Biesheuvel
  0 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2015-10-07  6:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 October 2015 at 18:46, Will Deacon <will.deacon@arm.com> wrote:
> With commit b08d4640a3dc ("arm64: remove dead code"),
> cpu_set_idmap_tcr_t0sz is no longer called and can therefore be removed
> from the kernel.
>
> This patch removes the function and effectively inlines the helper
> function __cpu_set_tcr_t0sz into cpu_set_default_tcr_t0sz.
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---
>  arch/arm64/include/asm/mmu_context.h | 35 ++++++++++++-----------------------
>  1 file changed, 12 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index 8ec41e5f56f0..549b89554ce8 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -77,34 +77,23 @@ static inline bool __cpu_uses_extended_idmap(void)
>                 unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS)));
>  }
>
> -static inline void __cpu_set_tcr_t0sz(u64 t0sz)
> -{
> -       unsigned long tcr;
> -
> -       if (__cpu_uses_extended_idmap())
> -               asm volatile (
> -               "       mrs     %0, tcr_el1     ;"
> -               "       bfi     %0, %1, %2, %3  ;"
> -               "       msr     tcr_el1, %0     ;"
> -               "       isb"
> -               : "=&r" (tcr)
> -               : "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
> -}
> -
> -/*
> - * Set TCR.T0SZ to the value appropriate for activating the identity map.
> - */
> -static inline void cpu_set_idmap_tcr_t0sz(void)
> -{
> -       __cpu_set_tcr_t0sz(idmap_t0sz);
> -}
> -
>  /*
>   * Set TCR.T0SZ to its default value (based on VA_BITS)
>   */
>  static inline void cpu_set_default_tcr_t0sz(void)
>  {
> -       __cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
> +       unsigned long tcr;
> +
> +       if (!__cpu_uses_extended_idmap())
> +               return;
> +
> +       asm volatile (
> +       "       mrs     %0, tcr_el1     ;"
> +       "       bfi     %0, %1, %2, %3  ;"
> +       "       msr     tcr_el1, %0     ;"
> +       "       isb"
> +       : "=&r" (tcr)
> +       : "r"(TCR_T0SZ(VA_BITS)), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
>  }
>
>  static inline void switch_new_context(struct mm_struct *mm)
> --
> 2.1.4
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation
  2015-10-06 17:46 ` [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation Will Deacon
  2015-10-07  1:18   ` David Daney
@ 2015-10-07  6:18   ` Ard Biesheuvel
  1 sibling, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2015-10-07  6:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 October 2015 at 18:46, Will Deacon <will.deacon@arm.com> wrote:
> There are a number of places where a single CPU is running with a
> private page-table and we need to perform maintenance on the TLB and
> I-cache in order to ensure correctness, but do not require the operation
> to be broadcast to other CPUs.
>
> This patch adds local variants of flush_tlb_all and __flush_icache_all
> to support these use-cases and updates the callers accordingly.
> __local_flush_icache_all also implies an isb, since it is intended to be
> used synchronously.
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---
>  arch/arm64/include/asm/cacheflush.h | 7 +++++++
>  arch/arm64/include/asm/tlbflush.h   | 8 ++++++++
>  arch/arm64/kernel/efi.c             | 4 ++--
>  arch/arm64/kernel/smp.c             | 2 +-
>  arch/arm64/kernel/suspend.c         | 2 +-
>  arch/arm64/mm/context.c             | 4 ++--
>  arch/arm64/mm/mmu.c                 | 2 +-
>  7 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index c75b8d027eb1..54efedaf331f 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -115,6 +115,13 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  extern void flush_dcache_page(struct page *);
>
> +static inline void __local_flush_icache_all(void)
> +{
> +       asm("ic iallu");
> +       dsb(nsh);
> +       isb();
> +}
> +
>  static inline void __flush_icache_all(void)
>  {
>         asm("ic ialluis");
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index 7bd2da021658..96f944e75dc4 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -63,6 +63,14 @@
>   *             only require the D-TLB to be invalidated.
>   *             - kaddr - Kernel virtual memory address
>   */
> +static inline void local_flush_tlb_all(void)
> +{
> +       dsb(nshst);
> +       asm("tlbi       vmalle1");
> +       dsb(nsh);
> +       isb();
> +}
> +
>  static inline void flush_tlb_all(void)
>  {
>         dsb(ishst);
> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> index 13671a9cf016..4d12926ea40d 100644
> --- a/arch/arm64/kernel/efi.c
> +++ b/arch/arm64/kernel/efi.c
> @@ -344,9 +344,9 @@ static void efi_set_pgd(struct mm_struct *mm)
>         else
>                 cpu_switch_mm(mm->pgd, mm);
>
> -       flush_tlb_all();
> +       local_flush_tlb_all();
>         if (icache_is_aivivt())
> -               __flush_icache_all();
> +               __local_flush_icache_all();
>  }
>
>  void efi_virtmap_load(void)
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index dbdaacddd9a5..fdd4d4dbd64f 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -152,7 +152,7 @@ asmlinkage void secondary_start_kernel(void)
>          * point to zero page to avoid speculatively fetching new entries.
>          */
>         cpu_set_reserved_ttbr0();
> -       flush_tlb_all();
> +       local_flush_tlb_all();
>         cpu_set_default_tcr_t0sz();
>
>         preempt_disable();
> diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
> index 8297d502217e..3c5e4e6dcf68 100644
> --- a/arch/arm64/kernel/suspend.c
> +++ b/arch/arm64/kernel/suspend.c
> @@ -90,7 +90,7 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
>                 else
>                         cpu_switch_mm(mm->pgd, mm);
>
> -               flush_tlb_all();
> +               local_flush_tlb_all();
>
>                 /*
>                  * Restore per-cpu offset before any kernel
> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
> index d70ff14dbdbd..48b53fb381af 100644
> --- a/arch/arm64/mm/context.c
> +++ b/arch/arm64/mm/context.c
> @@ -48,9 +48,9 @@ static void flush_context(void)
>  {
>         /* set the reserved TTBR0 before flushing the TLB */
>         cpu_set_reserved_ttbr0();
> -       flush_tlb_all();
> +       local_flush_tlb_all();
>         if (icache_is_aivivt())
> -               __flush_icache_all();
> +               __local_flush_icache_all();
>  }
>
>  static void set_mm_context(struct mm_struct *mm, unsigned int asid)
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 9211b8527f25..71a310478c9e 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -456,7 +456,7 @@ void __init paging_init(void)
>          * point to zero page to avoid speculatively fetching new entries.
>          */
>         cpu_set_reserved_ttbr0();
> -       flush_tlb_all();
> +       local_flush_tlb_all();
>         cpu_set_default_tcr_t0sz();
>  }
>
> --
> 2.1.4
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 00/10] arm64 switch_mm improvements
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (9 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 10/10] arm64: mm: remove dsb from update_mmu_cache Will Deacon
@ 2015-10-07 10:59 ` Catalin Marinas
  10 siblings, 0 replies; 15+ messages in thread
From: Catalin Marinas @ 2015-10-07 10:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 06, 2015 at 06:46:20PM +0100, Will Deacon wrote:
> Will Deacon (10):
>   arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function
>   arm64: proc: de-scope TLBI operation during cold boot
>   arm64: flush: use local TLB and I-cache invalidation
>   arm64: mm: rewrite ASID allocator and MM context-switching code
>   arm64: tlbflush: remove redundant ASID casts to (unsigned long)
>   arm64: tlbflush: avoid flushing when fullmm == 1
>   arm64: switch_mm: simplify mm and CPU checks
>   arm64: mm: kill mm_cpumask usage
>   arm64: tlb: remove redundant barrier from __flush_tlb_pgtable
>   arm64: mm: remove dsb from update_mmu_cache

Series queued for 4.4. Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 15+ messages in thread
