From: Sergey Matyukevich <geomatsi@gmail.com>
To: "Lad, Prabhakar" <prabhakar.csengg@gmail.com>
Cc: linux-riscv@lists.infradead.org, linux-arch@vger.kernel.org,
Anup Patel <anup@brainfault.org>,
Atish Patra <atishp@rivosinc.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Palmer Dabbelt <palmer@dabbelt.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Guo Ren <guoren@kernel.org>,
Alexandre Ghiti <alexandre.ghiti@canonical.com>,
Heiko Stuebner <heiko@sntech.de>,
Sergey Matyukevich <sergey.matyukevich@syntacore.com>,
Geert Uytterhoeven <geert+renesas@glider.be>,
Conor Dooley <Conor.Dooley@microchip.com>
Subject: Re: [RFC PATCH 1/1] riscv: mm: notify remote harts about mmu cache updates
Date: Sat, 24 Dec 2022 14:48:00 +0300 [thread overview]
Message-ID: <Y6bm8KIsHhFq1RFR@curiosity> (raw)
In-Reply-To: <CA+V-a8uwmwriyoSZ1ftVQ1L1_uTL3k=VSs7Cid7++XEshq1RsQ@mail.gmail.com>
Hi Prabhakar,
> > > > > > > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > > > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > > > > > > not take into account other harts currently running the same mm context
> > > > > > > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > > > > > > TLB flush is not performed for every context switch if ASID support
> > > > > > > > > > is enabled.
> > > > > > > > > >
> > > > > > > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > > > > > > flushing local TLB entirely. This patch takes into account other
> > > > > > > > > > harts currently running the same mm context as well as possible
> > > > > > > > > > migration of this context to other harts.
> > > > > > > > > >
> > > > > > > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > > > > > > harts currently running the same mm context are informed via SBI calls
> > > > > > > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > > > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > > > > > > >
> > > > > > > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > > > > ---
> > > > > > > > > > arch/riscv/include/asm/mmu.h | 2 ++
> > > > > > > > > > arch/riscv/include/asm/pgtable.h | 2 +-
> > > > > > > > > > arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > > > > > > > arch/riscv/mm/context.c | 10 ++++++++++
> > > > > > > > > > arch/riscv/mm/tlbflush.c | 28 +++++++++++-----------------
> > > > > > > > > > 5 files changed, 42 insertions(+), 18 deletions(-)
> > > > > > > > > >
> > > > > > > <snip>
> > > > > > > > > [ 133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > > > > > > [ 133.015338] Call Trace:
> > > > > > > > > [ 133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > > > > > > [ 133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > > > > > > [ 133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > > > > > > [ 133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > > > > > > [ 133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > > > > > > [ 133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > > > > > > [ 133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > > > > > > [ 133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > > > > > > [ 133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > > > > > > [ 133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > > > > > > [ 133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > > > > > > init! exitcode=0x0000000b ]---
> > > > > > > > >
> > > > > > > > > If I revert this patch [0] bonnie++ works as expected.
> > > > > > > > >
> > > > > > > > > Any pointers on what could be the issue here?
> > > > > > > > >
> > > > > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Prabhakar
> > > > > > > >
> > > > > > > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > > > > > > possible ways to fix it can be found in the following email thread:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > > > > > > >
> > > > > > > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > > > > > > if you have any issues with your test ? Besides, could you please share
> > > > > > > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > > > > > > >
> > > > > > > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> > > > > > >
> > > > > > > I have pasted the script here [0] and attached config.
> > > > > > >
> > > > > > > [0] https://paste.debian.net/hidden/a7a769b5/
> > > > > >
> > > > > > Thanks for the script and config. Could you please also share the
> > > > > > following information:
> > > > > > - how many cores your system has
> > > > > The Renesas RZ/Five SoC has a single Andes AX45MP core.
> > > > >
> > > > > > - does your system support ASID
> > > > > >
> > > > > With a quick look at [0] It does support ASID, unless there is a way
> > > > > to disable it.
> > > > >
> > > > > [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> > > >
> > > > So you have a single-core system, but your kernel configuration enables
> > > > CONFIG_SMP. Additional 'deferred TLB flush' logic is dropped at build
> > > > time if CONFIG_SMP is disabled. On the other hand, system should not
> > > > fail that way even if SMP is enabled.
> > > >
> > > I enabled CONFIG_SMP while doing some testing of PMA code and indeed
> > > enabling this config should not introduce a failure.
> > >
> > > > Let me double-check if anything can go wrong if cpumasks may have only
> > > > a single cpu. Another suspect is a change in update_mmu_cache: probably
> > > > making it asid-specific (and thus more granular) was a bad idea.
> > > >
> > > Thanks.
> > >
> > > BTW I tested the patch [0] which you pointed out and that fixes the
> > > issues seen earlier.
> > >
> > > [0] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/
> >
> > All looks good with cpumasks in the single-core case. So deferred TLB
> > flush logic is not even executed in your case. So the root cause
> > should be in update_mmu_cache change.
> >
> > May I ask you to repeat the original emmc test on your platform from
> > for-next (i.e. with [0] and without [1]) with the following partial revert:
> >
> > : diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > : index ec6fb83349ce..92ec2d9d7273 100644
> > : --- a/arch/riscv/include/asm/pgtable.h
> > : +++ b/arch/riscv/include/asm/pgtable.h
> > : @@ -415,7 +415,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > : * Relying on flush_tlb_fix_spurious_fault would suffice, but
> > : * the extra traps reduce performance. So, eagerly SFENCE.VMA.
> > : */
> > : - flush_tlb_page(vma, address);
> > : + local_flush_tlb_page(address);
> > : }
> > :
> > : static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> >
> I tested your above proposed changes and I am no longer seeing an
> issue on my platform.
Great. I will send a fixup after I double-check on several other
hardware platforms. Thanks for testing !
Regards,
Sergey
WARNING: multiple messages have this Message-ID (diff)
From: Sergey Matyukevich <geomatsi@gmail.com>
To: "Lad, Prabhakar" <prabhakar.csengg@gmail.com>
Cc: linux-riscv@lists.infradead.org, linux-arch@vger.kernel.org,
Anup Patel <anup@brainfault.org>,
Atish Patra <atishp@rivosinc.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Palmer Dabbelt <palmer@dabbelt.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Guo Ren <guoren@kernel.org>,
Alexandre Ghiti <alexandre.ghiti@canonical.com>,
Heiko Stuebner <heiko@sntech.de>,
Sergey Matyukevich <sergey.matyukevich@syntacore.com>,
Geert Uytterhoeven <geert+renesas@glider.be>,
Conor Dooley <Conor.Dooley@microchip.com>
Subject: Re: [RFC PATCH 1/1] riscv: mm: notify remote harts about mmu cache updates
Date: Sat, 24 Dec 2022 14:48:00 +0300 [thread overview]
Message-ID: <Y6bm8KIsHhFq1RFR@curiosity> (raw)
In-Reply-To: <CA+V-a8uwmwriyoSZ1ftVQ1L1_uTL3k=VSs7Cid7++XEshq1RsQ@mail.gmail.com>
Hi Prabhakar,
> > > > > > > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > > > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > > > > > > not take into account other harts currently running the same mm context
> > > > > > > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > > > > > > TLB flush is not performed for every context switch if ASID support
> > > > > > > > > > is enabled.
> > > > > > > > > >
> > > > > > > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > > > > > > flushing local TLB entirely. This patch takes into account other
> > > > > > > > > > harts currently running the same mm context as well as possible
> > > > > > > > > > migration of this context to other harts.
> > > > > > > > > >
> > > > > > > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > > > > > > harts currently running the same mm context are informed via SBI calls
> > > > > > > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > > > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > > > > > > >
> > > > > > > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > > > > ---
> > > > > > > > > > arch/riscv/include/asm/mmu.h | 2 ++
> > > > > > > > > > arch/riscv/include/asm/pgtable.h | 2 +-
> > > > > > > > > > arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > > > > > > > arch/riscv/mm/context.c | 10 ++++++++++
> > > > > > > > > > arch/riscv/mm/tlbflush.c | 28 +++++++++++-----------------
> > > > > > > > > > 5 files changed, 42 insertions(+), 18 deletions(-)
> > > > > > > > > >
> > > > > > > <snip>
> > > > > > > > > [ 133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > > > > > > [ 133.015338] Call Trace:
> > > > > > > > > [ 133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > > > > > > [ 133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > > > > > > [ 133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > > > > > > [ 133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > > > > > > [ 133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > > > > > > [ 133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > > > > > > [ 133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > > > > > > [ 133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > > > > > > [ 133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > > > > > > [ 133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > > > > > > [ 133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > > > > > > init! exitcode=0x0000000b ]---
> > > > > > > > >
> > > > > > > > > If I revert this patch [0] bonnie++ works as expected.
> > > > > > > > >
> > > > > > > > > Any pointers on what could be the issue here?
> > > > > > > > >
> > > > > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Prabhakar
> > > > > > > >
> > > > > > > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > > > > > > possible ways to fix it can be found in the following email thread:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > > > > > > >
> > > > > > > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > > > > > > if you have any issues with your test ? Besides, could you please share
> > > > > > > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > > > > > > >
> > > > > > > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> > > > > > >
> > > > > > > I have pasted the script here [0] and attached config.
> > > > > > >
> > > > > > > [0] https://paste.debian.net/hidden/a7a769b5/
> > > > > >
> > > > > > Thanks for the script and config. Could you please also share the
> > > > > > following information:
> > > > > > - how many cores your system has
> > > > > The Renesas RZ/Five SoC has a single Andes AX45MP core.
> > > > >
> > > > > > - does your system support ASID
> > > > > >
> > > > > With a quick look at [0] It does support ASID, unless there is a way
> > > > > to disable it.
> > > > >
> > > > > [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> > > >
> > > > So you have a single-core system, but your kernel configuration enables
> > > > CONFIG_SMP. Additional 'deferred TLB flush' logic is dropped at build
> > > > time if CONFIG_SMP is disabled. On the other hand, system should not
> > > > fail that way even if SMP is enabled.
> > > >
> > > I enabled CONFIG_SMP while doing some testing of PMA code and indeed
> > > enabling this config should not introduce a failure.
> > >
> > > > Let me double-check if anything can go wrong if cpumasks may have only
> > > > a single cpu. Another suspect is a change in update_mmu_cache: probably
> > > > making it asid-specific (and thus more granular) was a bad idea.
> > > >
> > > Thanks.
> > >
> > > BTW I tested the patch [0] which you pointed out and that fixes the
> > > issues seen earlier.
> > >
> > > [0] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/
> >
> > All looks good with cpumasks in the single-core case. So deferred TLB
> > flush logic is not even executed in your case. So the root cause
> > should be in update_mmu_cache change.
> >
> > May I ask you to repeat the original emmc test on your platform from
> > for-next (i.e. with [0] and without [1]) with the following partial revert:
> >
> > : diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > : index ec6fb83349ce..92ec2d9d7273 100644
> > : --- a/arch/riscv/include/asm/pgtable.h
> > : +++ b/arch/riscv/include/asm/pgtable.h
> > : @@ -415,7 +415,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > : * Relying on flush_tlb_fix_spurious_fault would suffice, but
> > : * the extra traps reduce performance. So, eagerly SFENCE.VMA.
> > : */
> > : - flush_tlb_page(vma, address);
> > : + local_flush_tlb_page(address);
> > : }
> > :
> > : static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> >
> I tested your above proposed changes and I am no longer seeing an
> issue on my platform.
Great. I will send a fixup after I double-check on several other
hardware platforms. Thanks for testing !
Regards,
Sergey
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
next prev parent reply other threads:[~2022-12-24 11:48 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-29 20:52 [RFC PATCH 1/1] riscv: mm: notify remote harts about mmu cache updates Sergey Matyukevich
2022-08-29 20:52 ` Sergey Matyukevich
2022-12-22 17:50 ` Lad, Prabhakar
2022-12-22 17:50 ` Lad, Prabhakar
2022-12-22 19:54 ` Sergey Matyukevich
2022-12-22 19:54 ` Sergey Matyukevich
2022-12-22 21:00 ` Lad, Prabhakar
2022-12-22 21:14 ` Sergey Matyukevich
2022-12-22 21:14 ` Sergey Matyukevich
2022-12-22 21:26 ` Lad, Prabhakar
2022-12-22 21:26 ` Lad, Prabhakar
2022-12-22 22:20 ` Sergey Matyukevich
2022-12-22 22:20 ` Sergey Matyukevich
2022-12-23 13:02 ` Lad, Prabhakar
2022-12-23 13:02 ` Lad, Prabhakar
2022-12-23 17:22 ` Sergey Matyukevich
2022-12-23 17:22 ` Sergey Matyukevich
2022-12-24 8:46 ` Lad, Prabhakar
2022-12-24 8:46 ` Lad, Prabhakar
2022-12-24 11:48 ` Sergey Matyukevich [this message]
2022-12-24 11:48 ` Sergey Matyukevich
2022-12-30 16:15 ` Lad, Prabhakar
2022-12-30 16:15 ` Lad, Prabhakar
2022-12-30 16:53 ` Sergey Matyukevich
2022-12-30 16:53 ` Sergey Matyukevich
2022-12-30 17:28 ` Lad, Prabhakar
2022-12-30 17:28 ` Lad, Prabhakar
2023-01-26 17:17 ` Lad, Prabhakar
2023-01-26 17:17 ` Lad, Prabhakar
2023-01-26 20:07 ` Sergey Matyukevich
2023-01-26 20:07 ` Sergey Matyukevich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y6bm8KIsHhFq1RFR@curiosity \
--to=geomatsi@gmail.com \
--cc=Conor.Dooley@microchip.com \
--cc=alexandre.ghiti@canonical.com \
--cc=anup@brainfault.org \
--cc=aou@eecs.berkeley.edu \
--cc=atishp@rivosinc.com \
--cc=geert+renesas@glider.be \
--cc=guoren@kernel.org \
--cc=heiko@sntech.de \
--cc=linux-arch@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=palmer@dabbelt.com \
--cc=paul.walmsley@sifive.com \
--cc=prabhakar.csengg@gmail.com \
--cc=sergey.matyukevich@syntacore.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.