* [PATCH 0/2] riscv: asid: switch to alternative way to fix stale TLB entries
@ 2023-02-26 15:01 Sergey Matyukevich
From: Sergey Matyukevich @ 2023-02-26 15:01 UTC (permalink / raw)
To: linux-riscv, linux-arch
Cc: Lad Prabhakar, Zong Li, Guo Ren, Albert Ou, Palmer Dabbelt,
Paul Walmsley, Sergey Matyukevich
Hi all,
Some time ago, two different patches were posted to fix the stale TLB
entries that caused application crashes.
The patch [0] suggested 'aggregating' mm_cpumask, i.e. the current CPU is
not cleared for the switched-out task in the switch_mm() function. For
additional explanation see the commit message by Guo Ren. The same
approach is used by the arc architecture, so another good reference is
the comment for switch_mm() in arch/arc/include/asm/mmu_context.h.
The patch [1] attempted to reduce the number of TLB flushes by deferring
(and possibly avoiding) them for CPUs not running the task.
Patch [1] has been merged. However, we have already received two bug
reports from different vendors, so apparently something is missing in
the approach suggested in [1]. In both cases patch [0] fixed the issue.
This patch series reverts [1] and replaces it with [0].
Regards,
Sergey
[0] https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
[1] https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.com/
Guo Ren (1):
  riscv: asid: Fixup stale TLB entry cause application crash

Sergey Matyukevich (1):
  Revert "riscv: mm: notify remote harts about mmu cache updates"

 arch/riscv/include/asm/mmu.h      |  2 --
 arch/riscv/include/asm/tlbflush.h | 18 --------------
 arch/riscv/mm/context.c           | 40 +++++++++++++++----------------
 arch/riscv/mm/tlbflush.c          | 28 +++++++++++++---------
 4 files changed, 37 insertions(+), 51 deletions(-)
--
2.39.2
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* [PATCH 1/2] Revert "riscv: mm: notify remote harts about mmu cache updates"
From: Sergey Matyukevich @ 2023-02-26 15:01 UTC (permalink / raw)
To: linux-riscv, linux-arch
Cc: Lad Prabhakar, Zong Li, Guo Ren, Albert Ou, Palmer Dabbelt,
	Paul Walmsley, Sergey Matyukevich, Sergey Matyukevich, stable

From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>

This reverts the remaining bits of commit 4bd1d80efb5a ("riscv: mm:
notify remote harts about mmu cache updates").

According to bug reports, the suggested approach to fix stale TLB
entries is not sufficient. It needs to be replaced by a more robust
solution.

Fixes: 4bd1d80efb5a ("riscv: mm: notify remote harts about mmu cache updates")
Reported-by: Zong Li <zong.li@sifive.com>
Reported-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Cc: stable@vger.kernel.org

---
 arch/riscv/include/asm/mmu.h      |  2 --
 arch/riscv/include/asm/tlbflush.h | 18 ------------------
 arch/riscv/mm/context.c           | 10 ----------
 arch/riscv/mm/tlbflush.c          | 28 +++++++++++++++++-----------
 4 files changed, 17 insertions(+), 41 deletions(-)

diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
index 5ff1f19fd45c..0099dc116168 100644
--- a/arch/riscv/include/asm/mmu.h
+++ b/arch/riscv/include/asm/mmu.h
@@ -19,8 +19,6 @@ typedef struct {
 #ifdef CONFIG_SMP
 	/* A local icache flush is needed before user execution can resume. */
 	cpumask_t icache_stale_mask;
-	/* A local tlb flush is needed before user execution can resume. */
-	cpumask_t tlb_stale_mask;
 #endif
 } mm_context_t;
 
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 907b9efd39a8..801019381dea 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -22,24 +22,6 @@ static inline void local_flush_tlb_page(unsigned long addr)
 {
 	ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
 }
-
-static inline void local_flush_tlb_all_asid(unsigned long asid)
-{
-	__asm__ __volatile__ ("sfence.vma x0, %0"
-			:
-			: "r" (asid)
-			: "memory");
-}
-
-static inline void local_flush_tlb_page_asid(unsigned long addr,
-		unsigned long asid)
-{
-	__asm__ __volatile__ ("sfence.vma %0, %1"
-			:
-			: "r" (addr), "r" (asid)
-			: "memory");
-}
-
 #else /* CONFIG_MMU */
 #define local_flush_tlb_all()			do { } while (0)
 #define local_flush_tlb_page(addr)		do { } while (0)
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 80ce9caba8d2..7acbfbd14557 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -196,16 +196,6 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 
 	if (need_flush_tlb)
 		local_flush_tlb_all();
-#ifdef CONFIG_SMP
-	else {
-		cpumask_t *mask = &mm->context.tlb_stale_mask;
-
-		if (cpumask_test_cpu(cpu, mask)) {
-			cpumask_clear_cpu(cpu, mask);
-			local_flush_tlb_all_asid(cntx & asid_mask);
-		}
-	}
-#endif
 }
 
 static void set_mm_noasid(struct mm_struct *mm)
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index ce7dfc81bb3f..37ed760d007c 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -5,7 +5,23 @@
 #include <linux/sched.h>
 #include <asm/sbi.h>
 #include <asm/mmu_context.h>
-#include <asm/tlbflush.h>
+
+static inline void local_flush_tlb_all_asid(unsigned long asid)
+{
+	__asm__ __volatile__ ("sfence.vma x0, %0"
+			:
+			: "r" (asid)
+			: "memory");
+}
+
+static inline void local_flush_tlb_page_asid(unsigned long addr,
+		unsigned long asid)
+{
+	__asm__ __volatile__ ("sfence.vma %0, %1"
+			:
+			: "r" (addr), "r" (asid)
+			: "memory");
+}
 
 void flush_tlb_all(void)
 {
@@ -15,7 +31,6 @@ void flush_tlb_all(void)
 static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
 				  unsigned long size, unsigned long stride)
 {
-	struct cpumask *pmask = &mm->context.tlb_stale_mask;
 	struct cpumask *cmask = mm_cpumask(mm);
 	unsigned int cpuid;
 	bool broadcast;
@@ -29,15 +44,6 @@ static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
 	if (static_branch_unlikely(&use_asid_allocator)) {
 		unsigned long asid = atomic_long_read(&mm->context.id);
 
-		/*
-		 * TLB will be immediately flushed on harts concurrently
-		 * executing this MM context. TLB flush on other harts
-		 * is deferred until this MM context migrates there.
-		 */
-		cpumask_setall(pmask);
-		cpumask_clear_cpu(cpuid, pmask);
-		cpumask_andnot(pmask, pmask, cmask);
-
 		if (broadcast) {
 			sbi_remote_sfence_vma_asid(cmask, start, size, asid);
 		} else if (size <= stride) {
-- 
2.39.2
* Re: [PATCH 1/2] Revert "riscv: mm: notify remote harts about mmu cache updates"
From: Guo Ren @ 2023-02-28  3:15 UTC (permalink / raw)
To: Sergey Matyukevich
Cc: linux-riscv, linux-arch, Lad Prabhakar, Zong Li, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Sergey Matyukevich, stable

Thx

Reviewed-by: Guo Ren <guoren@kernel.org>

On Sun, Feb 26, 2023 at 11:02 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
>
> From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
>
> This reverts the remaining bits of commit 4bd1d80efb5a ("riscv: mm:
> notify remote harts about mmu cache updates").
>
> According to bug reports, the suggested approach to fix stale TLB
> entries is not sufficient. It needs to be replaced by a more robust
> solution.
>
> [...]

-- 
Best Regards
 Guo Ren
* [PATCH 2/2] riscv: asid: Fixup stale TLB entry cause application crash
From: Sergey Matyukevich @ 2023-02-26 15:01 UTC (permalink / raw)
To: linux-riscv, linux-arch
Cc: Lad Prabhakar, Zong Li, Guo Ren, Albert Ou, Palmer Dabbelt,
	Paul Walmsley, Sergey Matyukevich, Guo Ren, Sergey Matyukevich,
	Anup Patel, Palmer Dabbelt, stable

From: Guo Ren <guoren@linux.alibaba.com>

After use_asid_allocator is enabled, userspace applications may crash
due to stale TLB entries. Using only cpumask_clear_cpu without
local_flush_tlb_all cannot guarantee that a CPU's TLB entries are fresh.
Then set_mm_asid would let the userspace application read a stale value
through a stale TLB entry, while set_mm_noasid is okay.

Here is the symptom of the bug:
unhandled signal 11 code 0x1 (coredump)
   0x0000003fd6d22524 <+4>:     auipc   s0,0x70
   0x0000003fd6d22528 <+8>:     ld      s0,-148(s0) # 0x3fd6d92490
=> 0x0000003fd6d2252c <+12>:    ld      a5,0(s0)
(gdb) i r s0
s0             0x8082ed1cc3198b21       0x8082ed1cc3198b21
(gdb) x /2x 0x3fd6d92490
0x3fd6d92490:   0xd80ac8a8      0x0000003f
The core dump file shows that register s0 is wrong, but the value in
memory is correct. Because 'ld s0, -148(s0)' used a stale mapping entry
in the TLB and got a wrong result from an incorrect physical address.

When the task ran on CPU0, which loaded/speculative-loaded the value of
address (0x3fd6d92490), the first version of the mapping entry was
PTWed into CPU0's TLB.
When the task switched from CPU0 to CPU1 (no local_tlb_flush_all here
because of asid), it happened to write a value to the address
(0x3fd6d92490). It caused do_page_fault -> wp_page_copy ->
ptep_clear_flush -> ptep_get_and_clear & flush_tlb_page.
The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need a
TLB flush, but CPU0 had cleared CPU0's bit in the previous switch_mm.
So we only flushed the CPU1 TLB and set the second version mapping of
the PTE. When the task switched from CPU1 to CPU0 again, CPU0 still used
a stale TLB mapping entry which contained a wrong target physical
address. It raised a bug when the task happened to read that value.

   CPU0                               CPU1
   - switch 'task' in
   - read addr (Fill stale mapping
     entry into TLB)
   - switch 'task' out (no tlb_flush)
                                      - switch 'task' in (no tlb_flush)
                                      - write addr cause pagefault
                                        do_page_fault() (change to
                                        new addr mapping)
                                          wp_page_copy()
                                            ptep_clear_flush()
                                              ptep_get_and_clear()
                                              & flush_tlb_page()
                                        write new value into addr
                                      - switch 'task' out (no tlb_flush)
   - switch 'task' in (no tlb_flush)
   - read addr again (Use stale
     mapping entry in TLB)
     get wrong value from old physical
     addr, BUG!

The solution is to keep all CPUs' footmarks in cpumask(mm) in switch_mm,
which guarantees that all stale TLB entries are invalidated during a TLB
flush.

Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Zong Li <zong.li@sifive.com>
Tested-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Cc: Anup Patel <apatel@ventanamicro.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: stable@vger.kernel.org

---
 arch/riscv/mm/context.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 7acbfbd14557..0f784e3d307b 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -205,12 +205,24 @@ static void set_mm_noasid(struct mm_struct *mm)
 		local_flush_tlb_all();
 }
 
-static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
+static inline void set_mm(struct mm_struct *prev,
+			  struct mm_struct *next, unsigned int cpu)
 {
-	if (static_branch_unlikely(&use_asid_allocator))
-		set_mm_asid(mm, cpu);
-	else
-		set_mm_noasid(mm);
+	/*
+	 * The mm_cpumask indicates which harts' TLBs contain the virtual
+	 * address mapping of the mm. Compared to noasid, using asid
+	 * can't guarantee that stale TLB entries are invalidated because
+	 * the asid mechanism wouldn't flush TLB for every switch_mm for
+	 * performance. So when using asid, keep all CPUs footmarks in
+	 * cpumask() until mm reset.
+	 */
+	cpumask_set_cpu(cpu, mm_cpumask(next));
+	if (static_branch_unlikely(&use_asid_allocator)) {
+		set_mm_asid(next, cpu);
+	} else {
+		cpumask_clear_cpu(cpu, mm_cpumask(prev));
+		set_mm_noasid(next);
+	}
 }
 
 static int __init asids_init(void)
@@ -264,7 +276,8 @@ static int __init asids_init(void)
 }
 early_initcall(asids_init);
 #else
-static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
+static inline void set_mm(struct mm_struct *prev,
+			  struct mm_struct *next, unsigned int cpu)
 {
 	/* Nothing to do here when there is no MMU */
 }
@@ -317,10 +330,7 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	 */
 	cpu = smp_processor_id();
 
-	cpumask_clear_cpu(cpu, mm_cpumask(prev));
-	cpumask_set_cpu(cpu, mm_cpumask(next));
-
-	set_mm(next, cpu);
+	set_mm(prev, next, cpu);
 
 	flush_icache_deferred(next, cpu);
 }
-- 
2.39.2
* Re: [PATCH 2/2] riscv: asid: Fixup stale TLB entry cause application crash
From: Andrew Jones @ 2023-02-26 16:44 UTC (permalink / raw)
To: Sergey Matyukevich
Cc: linux-riscv, linux-arch, Lad Prabhakar, Zong Li, Guo Ren, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Guo Ren, Sergey Matyukevich,
	Anup Patel, Palmer Dabbelt, stable

On Sun, Feb 26, 2023 at 06:01:37PM +0300, Sergey Matyukevich wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
>
> After use_asid_allocator is enabled, userspace applications may crash
> due to stale TLB entries. Using only cpumask_clear_cpu without
> local_flush_tlb_all cannot guarantee that a CPU's TLB entries are fresh.
> Then set_mm_asid would let the userspace application read a stale value
> through a stale TLB entry, while set_mm_noasid is okay.
>
> [...]

This is identical to what I reviewed before, so my r-b could have been
kept, anyway here it is again

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

Thanks,
drew
* Re: [PATCH 0/2] riscv: asid: switch to alternative way to fix stale TLB entries
From: patchwork-bot+linux-riscv @ 2023-03-10  2:30 UTC (permalink / raw)
To: Sergey Matyukevich
Cc: linux-riscv, linux-arch, prabhakar.mahadev-lad.rj, zong.li, guoren,
	aou, palmer, paul.walmsley

Hello:

This series was applied to riscv/linux.git (fixes)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Sun, 26 Feb 2023 18:01:35 +0300 you wrote:
> Hi all,
>
> Some time ago two different patches have been posted to fix stale TLB
> entries that caused applications crashes.
>
> The patch [0] suggested 'aggregating' mm_cpumask, i.e. current cpu is not
> cleared for the switched-out task in switch_mm function. For additional
> explanations see the commit message by Guo Ren. The same approach is
> used by arc architecture, so another good comment is for switch_mm
> in arch/arc/include/asm/mmu_context.h.
>
> [...]

Here is the summary with links:
  - [1/2] Revert "riscv: mm: notify remote harts about mmu cache updates"
    https://git.kernel.org/riscv/c/e921050022f1
  - [2/2] riscv: asid: Fixup stale TLB entry cause application crash
    https://git.kernel.org/riscv/c/82dd33fde026

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html