From: guoren@kernel.org
To: anup@brainfault.org, paul.walmsley@sifive.com,
palmer@dabbelt.com, conor.dooley@microchip.com, heiko@sntech.de,
philipp.tomsich@vrull.eu, alex@ghiti.fr, hch@lst.de,
ajones@ventanamicro.com, gary@garyguo.net, jszhang@kernel.org
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
Guo Ren <guoren@linux.alibaba.com>, Guo Ren <guoren@kernel.org>,
Anup Patel <apatel@ventanamicro.com>,
Palmer Dabbelt <palmer@rivosinc.com>
Subject: [PATCH V3] riscv: asid: Fixup stale TLB entry cause application crash
Date: Fri, 11 Nov 2022 02:59:02 -0500 [thread overview]
Message-ID: <20221111075902.798571-1-guoren@kernel.org> (raw)
From: Guo Ren <guoren@linux.alibaba.com>
After use_asid_allocator is enabled, the userspace application will
crash by stale TLB entries. Because only using cpumask_clear_cpu without
local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh.
Then set_mm_asid would cause the user space application to get a stale
value by stale TLB entry, but set_mm_noasid is okay.
Here is the symptom of the bug:
unhandled signal 11 code 0x1 (coredump)
0x0000003fd6d22524 <+4>: auipc s0,0x70
0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490
=> 0x0000003fd6d2252c <+12>: ld a5,0(s0)
(gdb) i r s0
s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21
(gdb) x /2x 0x3fd6d92490
0x3fd6d92490: 0xd80ac8a8 0x0000003f
The core dump file shows that register s0 is wrong, but the value in
memory is correct. Because 'ld s0, -148(s0)' used a stale mapping entry
in TLB and got a wrong result from an incorrect physical address.
When the task ran on CPU0, which loaded/speculative-loaded the value of
address(0x3fd6d92490), then the first version of the mapping entry was
PTWed into CPU0's TLB.
When the task switched from CPU0 to CPU1 (No local_tlb_flush_all here by
asid), it happened to write a value on the address (0x3fd6d92490). It
caused do_page_fault -> wp_page_copy -> ptep_clear_flush ->
ptep_get_and_clear & flush_tlb_page.
The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need TLB
flush, but CPU0 had cleared the CPU0's mm_cpumask in the previous
switch_mm. So we only flushed the CPU1 TLB and set the second version
mapping of the PTE. When the task switched from CPU1 to CPU0 again, CPU0
still used a stale TLB mapping entry which contained a wrong target
physical address. It raised a bug when the task happened to read that
value.
CPU0 CPU1
- switch 'task' in
- read addr (Fill stale mapping
entry into TLB)
- switch 'task' out (no tlb_flush)
- switch 'task' in (no tlb_flush)
- write addr cause pagefault
do_page_fault() (change to
new addr mapping)
wp_page_copy()
ptep_clear_flush()
ptep_get_and_clear()
& flush_tlb_page()
write new value into addr
- switch 'task' out (no tlb_flush)
- switch 'task' in (no tlb_flush)
- read addr again (Use stale
mapping entry in TLB)
get wrong value from old phyical
addr, BUG!
The solution is to keep all CPUs' footmarks of cpumask(mm) in switch_mm,
which could guarantee to invalidate all stale TLB entries during TLB
flush.
Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Anup Patel <apatel@ventanamicro.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
---
Changes in v3:
- Move set/clear cpumask(mm) into set_mm (Make code more pretty
with Andrew's advice)
- Optimize comment description
Changes in v2:
- Fixup nommu compile problem (Thx Conor, Also Reported-by: kernel
test robot <lkp@intel.com>)
- Keep cpumask_clear_cpu for noasid
---
arch/riscv/mm/context.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 7acbfbd14557..0f784e3d307b 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -205,12 +205,24 @@ static void set_mm_noasid(struct mm_struct *mm)
local_flush_tlb_all();
}
-static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
+static inline void set_mm(struct mm_struct *prev,
+ struct mm_struct *next, unsigned int cpu)
{
- if (static_branch_unlikely(&use_asid_allocator))
- set_mm_asid(mm, cpu);
- else
- set_mm_noasid(mm);
+ /*
+ * The mm_cpumask indicates which harts' TLBs contain the virtual
+ * address mapping of the mm. Compared to noasid, using asid
+ * can't guarantee that stale TLB entries are invalidated because
+ * the asid mechanism wouldn't flush TLB for every switch_mm for
+ * performance. So when using asid, keep all CPUs footmarks in
+ * cpumask() until mm reset.
+ */
+ cpumask_set_cpu(cpu, mm_cpumask(next));
+ if (static_branch_unlikely(&use_asid_allocator)) {
+ set_mm_asid(next, cpu);
+ } else {
+ cpumask_clear_cpu(cpu, mm_cpumask(prev));
+ set_mm_noasid(next);
+ }
}
static int __init asids_init(void)
@@ -264,7 +276,8 @@ static int __init asids_init(void)
}
early_initcall(asids_init);
#else
-static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
+static inline void set_mm(struct mm_struct *prev,
+ struct mm_struct *next, unsigned int cpu)
{
/* Nothing to do here when there is no MMU */
}
@@ -317,10 +330,7 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
*/
cpu = smp_processor_id();
- cpumask_clear_cpu(cpu, mm_cpumask(prev));
- cpumask_set_cpu(cpu, mm_cpumask(next));
-
- set_mm(next, cpu);
+ set_mm(prev, next, cpu);
flush_icache_deferred(next, cpu);
}
--
2.36.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
WARNING: multiple messages have this Message-ID (diff)
From: guoren@kernel.org
To: anup@brainfault.org, paul.walmsley@sifive.com,
palmer@dabbelt.com, conor.dooley@microchip.com, heiko@sntech.de,
philipp.tomsich@vrull.eu, alex@ghiti.fr, hch@lst.de,
ajones@ventanamicro.com, gary@garyguo.net, jszhang@kernel.org
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
Guo Ren <guoren@linux.alibaba.com>, Guo Ren <guoren@kernel.org>,
Anup Patel <apatel@ventanamicro.com>,
Palmer Dabbelt <palmer@rivosinc.com>
Subject: [PATCH V3] riscv: asid: Fixup stale TLB entry cause application crash
Date: Fri, 11 Nov 2022 02:59:02 -0500 [thread overview]
Message-ID: <20221111075902.798571-1-guoren@kernel.org> (raw)
From: Guo Ren <guoren@linux.alibaba.com>
After use_asid_allocator is enabled, the userspace application will
crash by stale TLB entries. Because only using cpumask_clear_cpu without
local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh.
Then set_mm_asid would cause the user space application to get a stale
value by stale TLB entry, but set_mm_noasid is okay.
Here is the symptom of the bug:
unhandled signal 11 code 0x1 (coredump)
0x0000003fd6d22524 <+4>: auipc s0,0x70
0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490
=> 0x0000003fd6d2252c <+12>: ld a5,0(s0)
(gdb) i r s0
s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21
(gdb) x /2x 0x3fd6d92490
0x3fd6d92490: 0xd80ac8a8 0x0000003f
The core dump file shows that register s0 is wrong, but the value in
memory is correct. Because 'ld s0, -148(s0)' used a stale mapping entry
in TLB and got a wrong result from an incorrect physical address.
When the task ran on CPU0, which loaded/speculative-loaded the value of
address(0x3fd6d92490), then the first version of the mapping entry was
PTWed into CPU0's TLB.
When the task switched from CPU0 to CPU1 (No local_tlb_flush_all here by
asid), it happened to write a value on the address (0x3fd6d92490). It
caused do_page_fault -> wp_page_copy -> ptep_clear_flush ->
ptep_get_and_clear & flush_tlb_page.
The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need TLB
flush, but CPU0 had cleared the CPU0's mm_cpumask in the previous
switch_mm. So we only flushed the CPU1 TLB and set the second version
mapping of the PTE. When the task switched from CPU1 to CPU0 again, CPU0
still used a stale TLB mapping entry which contained a wrong target
physical address. It raised a bug when the task happened to read that
value.
CPU0 CPU1
- switch 'task' in
- read addr (Fill stale mapping
entry into TLB)
- switch 'task' out (no tlb_flush)
- switch 'task' in (no tlb_flush)
- write addr cause pagefault
do_page_fault() (change to
new addr mapping)
wp_page_copy()
ptep_clear_flush()
ptep_get_and_clear()
& flush_tlb_page()
write new value into addr
- switch 'task' out (no tlb_flush)
- switch 'task' in (no tlb_flush)
- read addr again (Use stale
mapping entry in TLB)
get wrong value from old phyical
addr, BUG!
The solution is to keep all CPUs' footmarks of cpumask(mm) in switch_mm,
which could guarantee to invalidate all stale TLB entries during TLB
flush.
Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Anup Patel <apatel@ventanamicro.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
---
Changes in v3:
- Move set/clear cpumask(mm) into set_mm (Make code more pretty
with Andrew's advice)
- Optimize comment description
Changes in v2:
- Fixup nommu compile problem (Thx Conor, Also Reported-by: kernel
test robot <lkp@intel.com>)
- Keep cpumask_clear_cpu for noasid
---
arch/riscv/mm/context.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 7acbfbd14557..0f784e3d307b 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -205,12 +205,24 @@ static void set_mm_noasid(struct mm_struct *mm)
local_flush_tlb_all();
}
-static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
+static inline void set_mm(struct mm_struct *prev,
+ struct mm_struct *next, unsigned int cpu)
{
- if (static_branch_unlikely(&use_asid_allocator))
- set_mm_asid(mm, cpu);
- else
- set_mm_noasid(mm);
+ /*
+ * The mm_cpumask indicates which harts' TLBs contain the virtual
+ * address mapping of the mm. Compared to noasid, using asid
+ * can't guarantee that stale TLB entries are invalidated because
+ * the asid mechanism wouldn't flush TLB for every switch_mm for
+ * performance. So when using asid, keep all CPUs footmarks in
+ * cpumask() until mm reset.
+ */
+ cpumask_set_cpu(cpu, mm_cpumask(next));
+ if (static_branch_unlikely(&use_asid_allocator)) {
+ set_mm_asid(next, cpu);
+ } else {
+ cpumask_clear_cpu(cpu, mm_cpumask(prev));
+ set_mm_noasid(next);
+ }
}
static int __init asids_init(void)
@@ -264,7 +276,8 @@ static int __init asids_init(void)
}
early_initcall(asids_init);
#else
-static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
+static inline void set_mm(struct mm_struct *prev,
+ struct mm_struct *next, unsigned int cpu)
{
/* Nothing to do here when there is no MMU */
}
@@ -317,10 +330,7 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
*/
cpu = smp_processor_id();
- cpumask_clear_cpu(cpu, mm_cpumask(prev));
- cpumask_set_cpu(cpu, mm_cpumask(next));
-
- set_mm(next, cpu);
+ set_mm(prev, next, cpu);
flush_icache_deferred(next, cpu);
}
--
2.36.1
next reply other threads:[~2022-11-11 7:59 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-11 7:59 guoren [this message]
2022-11-11 7:59 ` [PATCH V3] riscv: asid: Fixup stale TLB entry cause application crash guoren
2022-11-11 8:29 ` Andrew Jones
2022-11-11 8:29 ` Andrew Jones
2022-11-18 20:57 ` Sergey Matyukevich
2022-11-18 20:57 ` Sergey Matyukevich
2022-11-19 3:37 ` Guo Ren
2022-11-19 3:37 ` Guo Ren
2022-11-21 20:03 ` Sergey Matyukevich
2022-11-21 20:03 ` Sergey Matyukevich
2022-12-08 23:30 ` Palmer Dabbelt
2022-12-08 23:30 ` Palmer Dabbelt
2022-12-09 3:13 ` Guo Ren
2022-12-09 3:13 ` Guo Ren
2022-12-09 3:15 ` Guo Ren
2022-12-09 3:15 ` Guo Ren
2022-11-21 19:59 ` Sergey Matyukevich
2022-11-21 19:59 ` Sergey Matyukevich
2022-12-23 12:53 ` Lad, Prabhakar
2022-12-23 12:53 ` Lad, Prabhakar
2023-02-23 17:57 ` Zong Li
2023-02-23 17:57 ` Zong Li
2023-02-23 23:20 ` Guo Ren
2023-02-23 23:20 ` Guo Ren
2023-02-25 19:29 ` Sergey Matyukevich
2023-02-25 19:29 ` Sergey Matyukevich
2023-02-26 4:24 ` Guo Ren
2023-02-26 4:24 ` Guo Ren
2023-02-27 22:40 ` Gary Guo
2023-02-27 22:40 ` Gary Guo
2023-02-28 3:12 ` Guo Ren
2023-02-28 3:12 ` Guo Ren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221111075902.798571-1-guoren@kernel.org \
--to=guoren@kernel.org \
--cc=ajones@ventanamicro.com \
--cc=alex@ghiti.fr \
--cc=anup@brainfault.org \
--cc=apatel@ventanamicro.com \
--cc=conor.dooley@microchip.com \
--cc=gary@garyguo.net \
--cc=guoren@linux.alibaba.com \
--cc=hch@lst.de \
--cc=heiko@sntech.de \
--cc=jszhang@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=palmer@dabbelt.com \
--cc=palmer@rivosinc.com \
--cc=paul.walmsley@sifive.com \
--cc=philipp.tomsich@vrull.eu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.