From: Sergey Matyukevich <geomatsi@gmail.com>
To: guoren@kernel.org
Cc: anup@brainfault.org, paul.walmsley@sifive.com,
palmer@dabbelt.com, conor.dooley@microchip.com, heiko@sntech.de,
philipp.tomsich@vrull.eu, alex@ghiti.fr, hch@lst.de,
ajones@ventanamicro.com, gary@garyguo.net, jszhang@kernel.org,
linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
Guo Ren <guoren@linux.alibaba.com>,
Anup Patel <apatel@ventanamicro.com>,
Palmer Dabbelt <palmer@rivosinc.com>
Subject: Re: [PATCH V3] riscv: asid: Fixup stale TLB entry cause application crash
Date: Mon, 21 Nov 2022 22:59:50 +0300 [thread overview]
Message-ID: <Y3vYtobn72sqA2Tq@curiosity> (raw)
In-Reply-To: <20221111075902.798571-1-guoren@kernel.org>
> After use_asid_allocator is enabled, the userspace application will
> crash by stale TLB entries. Because only using cpumask_clear_cpu without
> local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh.
> Then set_mm_asid would cause the user space application to get a stale
> value by stale TLB entry, but set_mm_noasid is okay.
>
> Here is the symptom of the bug:
> unhandled signal 11 code 0x1 (coredump)
> 0x0000003fd6d22524 <+4>: auipc s0,0x70
> 0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490
> => 0x0000003fd6d2252c <+12>: ld a5,0(s0)
> (gdb) i r s0
> s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21
> (gdb) x /2x 0x3fd6d92490
> 0x3fd6d92490: 0xd80ac8a8 0x0000003f
> The core dump file shows that register s0 is wrong, but the value in
> memory is correct. Because 'ld s0, -148(s0)' used a stale mapping entry
> in TLB and got a wrong result from an incorrect physical address.
>
> When the task ran on CPU0, which loaded/speculative-loaded the value of
> address(0x3fd6d92490), then the first version of the mapping entry was
> PTWed into CPU0's TLB.
> When the task switched from CPU0 to CPU1 (No local_tlb_flush_all here by
> asid), it happened to write a value on the address (0x3fd6d92490). It
> caused do_page_fault -> wp_page_copy -> ptep_clear_flush ->
> ptep_get_and_clear & flush_tlb_page.
> The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need TLB
> flush, but CPU0 had cleared the CPU0's mm_cpumask in the previous
> switch_mm. So we only flushed the CPU1 TLB and set the second version
> mapping of the PTE. When the task switched from CPU1 to CPU0 again, CPU0
> still used a stale TLB mapping entry which contained a wrong target
> physical address. It raised a bug when the task happened to read that
> value.
>
> CPU0 CPU1
> - switch 'task' in
> - read addr (Fill stale mapping
> entry into TLB)
> - switch 'task' out (no tlb_flush)
> - switch 'task' in (no tlb_flush)
> - write addr cause pagefault
> do_page_fault() (change to
> new addr mapping)
> wp_page_copy()
> ptep_clear_flush()
> ptep_get_and_clear()
> & flush_tlb_page()
> write new value into addr
> - switch 'task' out (no tlb_flush)
> - switch 'task' in (no tlb_flush)
> - read addr again (Use stale
> mapping entry in TLB)
> get wrong value from old phyical
> addr, BUG!
>
> The solution is to keep all CPUs' footmarks of cpumask(mm) in switch_mm,
> which could guarantee to invalidate all stale TLB entries during TLB
> flush.
>
> Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator")
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> Cc: Anup Patel <apatel@ventanamicro.com>
> Cc: Palmer Dabbelt <palmer@rivosinc.com>
> ---
> Changes in v3:
> - Move set/clear cpumask(mm) into set_mm (Make code more pretty
> with Andrew's advice)
> - Optimize comment description
>
> Changes in v2:
> - Fixup nommu compile problem (Thx Conor, Also Reported-by: kernel
> test robot <lkp@intel.com>)
> - Keep cpumask_clear_cpu for noasid
> ---
> arch/riscv/mm/context.c | 30 ++++++++++++++++++++----------
> 1 file changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index 7acbfbd14557..0f784e3d307b 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -205,12 +205,24 @@ static void set_mm_noasid(struct mm_struct *mm)
> local_flush_tlb_all();
> }
>
> -static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
> +static inline void set_mm(struct mm_struct *prev,
> + struct mm_struct *next, unsigned int cpu)
> {
> - if (static_branch_unlikely(&use_asid_allocator))
> - set_mm_asid(mm, cpu);
> - else
> - set_mm_noasid(mm);
> + /*
> + * The mm_cpumask indicates which harts' TLBs contain the virtual
> + * address mapping of the mm. Compared to noasid, using asid
> + * can't guarantee that stale TLB entries are invalidated because
> + * the asid mechanism wouldn't flush TLB for every switch_mm for
> + * performance. So when using asid, keep all CPUs footmarks in
> + * cpumask() until mm reset.
> + */
> + cpumask_set_cpu(cpu, mm_cpumask(next));
> + if (static_branch_unlikely(&use_asid_allocator)) {
> + set_mm_asid(next, cpu);
> + } else {
> + cpumask_clear_cpu(cpu, mm_cpumask(prev));
> + set_mm_noasid(next);
> + }
> }
>
> static int __init asids_init(void)
> @@ -264,7 +276,8 @@ static int __init asids_init(void)
> }
> early_initcall(asids_init);
> #else
> -static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
> +static inline void set_mm(struct mm_struct *prev,
> + struct mm_struct *next, unsigned int cpu)
> {
> /* Nothing to do here when there is no MMU */
> }
> @@ -317,10 +330,7 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> */
> cpu = smp_processor_id();
>
> - cpumask_clear_cpu(cpu, mm_cpumask(prev));
> - cpumask_set_cpu(cpu, mm_cpumask(next));
> -
> - set_mm(next, cpu);
> + set_mm(prev, next, cpu);
>
> flush_icache_deferred(next, cpu);
> }
Tested-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Thanks,
Sergey
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
WARNING: multiple messages have this Message-ID (diff)
From: Sergey Matyukevich <geomatsi@gmail.com>
To: guoren@kernel.org
Cc: anup@brainfault.org, paul.walmsley@sifive.com,
palmer@dabbelt.com, conor.dooley@microchip.com, heiko@sntech.de,
philipp.tomsich@vrull.eu, alex@ghiti.fr, hch@lst.de,
ajones@ventanamicro.com, gary@garyguo.net, jszhang@kernel.org,
linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
Guo Ren <guoren@linux.alibaba.com>,
Anup Patel <apatel@ventanamicro.com>,
Palmer Dabbelt <palmer@rivosinc.com>
Subject: Re: [PATCH V3] riscv: asid: Fixup stale TLB entry cause application crash
Date: Mon, 21 Nov 2022 22:59:50 +0300 [thread overview]
Message-ID: <Y3vYtobn72sqA2Tq@curiosity> (raw)
In-Reply-To: <20221111075902.798571-1-guoren@kernel.org>
> After use_asid_allocator is enabled, the userspace application will
> crash by stale TLB entries. Because only using cpumask_clear_cpu without
> local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh.
> Then set_mm_asid would cause the user space application to get a stale
> value by stale TLB entry, but set_mm_noasid is okay.
>
> Here is the symptom of the bug:
> unhandled signal 11 code 0x1 (coredump)
> 0x0000003fd6d22524 <+4>: auipc s0,0x70
> 0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490
> => 0x0000003fd6d2252c <+12>: ld a5,0(s0)
> (gdb) i r s0
> s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21
> (gdb) x /2x 0x3fd6d92490
> 0x3fd6d92490: 0xd80ac8a8 0x0000003f
> The core dump file shows that register s0 is wrong, but the value in
> memory is correct. Because 'ld s0, -148(s0)' used a stale mapping entry
> in TLB and got a wrong result from an incorrect physical address.
>
> When the task ran on CPU0, which loaded/speculative-loaded the value of
> address(0x3fd6d92490), then the first version of the mapping entry was
> PTWed into CPU0's TLB.
> When the task switched from CPU0 to CPU1 (No local_tlb_flush_all here by
> asid), it happened to write a value on the address (0x3fd6d92490). It
> caused do_page_fault -> wp_page_copy -> ptep_clear_flush ->
> ptep_get_and_clear & flush_tlb_page.
> The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need TLB
> flush, but CPU0 had cleared the CPU0's mm_cpumask in the previous
> switch_mm. So we only flushed the CPU1 TLB and set the second version
> mapping of the PTE. When the task switched from CPU1 to CPU0 again, CPU0
> still used a stale TLB mapping entry which contained a wrong target
> physical address. It raised a bug when the task happened to read that
> value.
>
> CPU0 CPU1
> - switch 'task' in
> - read addr (Fill stale mapping
> entry into TLB)
> - switch 'task' out (no tlb_flush)
> - switch 'task' in (no tlb_flush)
> - write addr cause pagefault
> do_page_fault() (change to
> new addr mapping)
> wp_page_copy()
> ptep_clear_flush()
> ptep_get_and_clear()
> & flush_tlb_page()
> write new value into addr
> - switch 'task' out (no tlb_flush)
> - switch 'task' in (no tlb_flush)
> - read addr again (Use stale
> mapping entry in TLB)
> get wrong value from old phyical
> addr, BUG!
>
> The solution is to keep all CPUs' footmarks of cpumask(mm) in switch_mm,
> which could guarantee to invalidate all stale TLB entries during TLB
> flush.
>
> Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator")
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> Cc: Anup Patel <apatel@ventanamicro.com>
> Cc: Palmer Dabbelt <palmer@rivosinc.com>
> ---
> Changes in v3:
> - Move set/clear cpumask(mm) into set_mm (Make code more pretty
> with Andrew's advice)
> - Optimize comment description
>
> Changes in v2:
> - Fixup nommu compile problem (Thx Conor, Also Reported-by: kernel
> test robot <lkp@intel.com>)
> - Keep cpumask_clear_cpu for noasid
> ---
> arch/riscv/mm/context.c | 30 ++++++++++++++++++++----------
> 1 file changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index 7acbfbd14557..0f784e3d307b 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -205,12 +205,24 @@ static void set_mm_noasid(struct mm_struct *mm)
> local_flush_tlb_all();
> }
>
> -static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
> +static inline void set_mm(struct mm_struct *prev,
> + struct mm_struct *next, unsigned int cpu)
> {
> - if (static_branch_unlikely(&use_asid_allocator))
> - set_mm_asid(mm, cpu);
> - else
> - set_mm_noasid(mm);
> + /*
> + * The mm_cpumask indicates which harts' TLBs contain the virtual
> + * address mapping of the mm. Compared to noasid, using asid
> + * can't guarantee that stale TLB entries are invalidated because
> + * the asid mechanism wouldn't flush TLB for every switch_mm for
> + * performance. So when using asid, keep all CPUs footmarks in
> + * cpumask() until mm reset.
> + */
> + cpumask_set_cpu(cpu, mm_cpumask(next));
> + if (static_branch_unlikely(&use_asid_allocator)) {
> + set_mm_asid(next, cpu);
> + } else {
> + cpumask_clear_cpu(cpu, mm_cpumask(prev));
> + set_mm_noasid(next);
> + }
> }
>
> static int __init asids_init(void)
> @@ -264,7 +276,8 @@ static int __init asids_init(void)
> }
> early_initcall(asids_init);
> #else
> -static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
> +static inline void set_mm(struct mm_struct *prev,
> + struct mm_struct *next, unsigned int cpu)
> {
> /* Nothing to do here when there is no MMU */
> }
> @@ -317,10 +330,7 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> */
> cpu = smp_processor_id();
>
> - cpumask_clear_cpu(cpu, mm_cpumask(prev));
> - cpumask_set_cpu(cpu, mm_cpumask(next));
> -
> - set_mm(next, cpu);
> + set_mm(prev, next, cpu);
>
> flush_icache_deferred(next, cpu);
> }
Tested-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Thanks,
Sergey
next prev parent reply other threads:[~2022-11-21 20:00 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-11 7:59 [PATCH V3] riscv: asid: Fixup stale TLB entry cause application crash guoren
2022-11-11 7:59 ` guoren
2022-11-11 8:29 ` Andrew Jones
2022-11-11 8:29 ` Andrew Jones
2022-11-18 20:57 ` Sergey Matyukevich
2022-11-18 20:57 ` Sergey Matyukevich
2022-11-19 3:37 ` Guo Ren
2022-11-19 3:37 ` Guo Ren
2022-11-21 20:03 ` Sergey Matyukevich
2022-11-21 20:03 ` Sergey Matyukevich
2022-12-08 23:30 ` Palmer Dabbelt
2022-12-08 23:30 ` Palmer Dabbelt
2022-12-09 3:13 ` Guo Ren
2022-12-09 3:13 ` Guo Ren
2022-12-09 3:15 ` Guo Ren
2022-12-09 3:15 ` Guo Ren
2022-11-21 19:59 ` Sergey Matyukevich [this message]
2022-11-21 19:59 ` Sergey Matyukevich
2022-12-23 12:53 ` Lad, Prabhakar
2022-12-23 12:53 ` Lad, Prabhakar
2023-02-23 17:57 ` Zong Li
2023-02-23 17:57 ` Zong Li
2023-02-23 23:20 ` Guo Ren
2023-02-23 23:20 ` Guo Ren
2023-02-25 19:29 ` Sergey Matyukevich
2023-02-25 19:29 ` Sergey Matyukevich
2023-02-26 4:24 ` Guo Ren
2023-02-26 4:24 ` Guo Ren
2023-02-27 22:40 ` Gary Guo
2023-02-27 22:40 ` Gary Guo
2023-02-28 3:12 ` Guo Ren
2023-02-28 3:12 ` Guo Ren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y3vYtobn72sqA2Tq@curiosity \
--to=geomatsi@gmail.com \
--cc=ajones@ventanamicro.com \
--cc=alex@ghiti.fr \
--cc=anup@brainfault.org \
--cc=apatel@ventanamicro.com \
--cc=conor.dooley@microchip.com \
--cc=gary@garyguo.net \
--cc=guoren@kernel.org \
--cc=guoren@linux.alibaba.com \
--cc=hch@lst.de \
--cc=heiko@sntech.de \
--cc=jszhang@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=palmer@dabbelt.com \
--cc=palmer@rivosinc.com \
--cc=paul.walmsley@sifive.com \
--cc=philipp.tomsich@vrull.eu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.