* [PATCH v8] riscv: mm: Add support for Svinval extension @ 2024-07-02 10:26 Mayuresh Chitale 2024-07-02 12:32 ` Alexandre Ghiti 2024-07-24 14:50 ` Palmer Dabbelt 0 siblings, 2 replies; 8+ messages in thread From: Mayuresh Chitale @ 2024-07-02 10:26 UTC (permalink / raw) To: linux-riscv, linux-kernel Cc: Mayuresh Chitale, Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti, Samuel Holland, Andrew Jones The Svinval extension splits SFENCE.VMA instruction into finer-grained invalidation and ordering operations and is mandatory for RVA23S64 profile. When Svinval is enabled the local_flush_tlb_range_threshold_asid function should use the following sequence to optimize the tlb flushes instead of a simple sfence.vma: sfence.w.inval svinval.vma . . svinval.vma sfence.inval.ir The maximum number of consecutive svinval.vma instructions that can be executed in local_flush_tlb_range_threshold_asid function is limited to 64. This is required to avoid soft lockups and the approach is similar to that used in arm64. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> --- Changes in v8: - Fix line wrap - Add RB tag Changes in v7: - Use existing svinval macros in the insn-def.h - Rename local_sinval_vma_asid to local_sinval_vma Changes in v6: - Rebase on latest torvalds/master Changes in v5: - Reduce tlb flush threshold to 64 - Improve implementation of local_flush_tlb* functions Changes in v4: - Rebase and refactor as per latest changes on torvalds/master - Drop patch 1 in the series Changes in v3: - Fix incorrect vma used for sinval instructions - Use unified static key mechanism for svinval - Rebased on torvalds/master Changes in v2: - Rebased on 5.18-rc3 - update riscv_fill_hwcap to probe Svinval extension arch/riscv/mm/tlbflush.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index 9b6e86ce3867..782147a63f3b 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -6,6 +6,27 @@ #include <linux/hugetlb.h> #include <asm/sbi.h> #include <asm/mmu_context.h> +#include <asm/cpufeature.h> + +#define has_svinval() riscv_has_extension_unlikely(RISCV_ISA_EXT_SVINVAL) + +static inline void local_sfence_inval_ir(void) +{ + asm volatile(SFENCE_INVAL_IR() ::: "memory"); +} + +static inline void local_sfence_w_inval(void) +{ + asm volatile(SFENCE_W_INVAL() ::: "memory"); +} + +static inline void local_sinval_vma(unsigned long vma, unsigned long asid) +{ + if (asid != FLUSH_TLB_NO_ASID) + asm volatile(SINVAL_VMA(%0, %1) : : "r" (vma), "r" (asid) : "memory"); + else + asm volatile(SINVAL_VMA(%0, zero) : : "r" (vma) : "memory"); +} /* * Flush entire TLB if number of entries to be flushed is greater @@ -26,6 +47,16 @@ static void local_flush_tlb_range_threshold_asid(unsigned long start, return; } + if (has_svinval()) { + local_sfence_w_inval(); + for (i = 0; i < nr_ptes_in_range; ++i) { + local_sinval_vma(start, asid); + start += stride; + } + local_sfence_inval_ir(); + return; + } + for (i = 0; i < nr_ptes_in_range; ++i) { local_flush_tlb_page_asid(start, asid); start += stride; -- 2.34.1 _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v8] riscv: mm: Add support for Svinval extension 2024-07-02 10:26 [PATCH v8] riscv: mm: Add support for Svinval extension Mayuresh Chitale @ 2024-07-02 12:32 ` Alexandre Ghiti 2024-07-24 14:50 ` Palmer Dabbelt 1 sibling, 0 replies; 8+ messages in thread From: Alexandre Ghiti @ 2024-07-02 12:32 UTC (permalink / raw) To: Mayuresh Chitale Cc: linux-riscv, linux-kernel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Samuel Holland, Andrew Jones Hi Mayuresh, On Tue, Jul 2, 2024 at 12:26 PM Mayuresh Chitale <mchitale@ventanamicro.com> wrote: > > The Svinval extension splits SFENCE.VMA instruction into finer-grained > invalidation and ordering operations and is mandatory for RVA23S64 profile. > When Svinval is enabled the local_flush_tlb_range_threshold_asid function > should use the following sequence to optimize the tlb flushes instead of > a simple sfence.vma: > > sfence.w.inval > svinval.vma > . > . > svinval.vma > sfence.inval.ir > > The maximum number of consecutive svinval.vma instructions that > can be executed in local_flush_tlb_range_threshold_asid function > is limited to 64. This is required to avoid soft lockups and the > approach is similar to that used in arm64. > > Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> > Reviewed-by: Andrew Jones <ajones@ventanamicro.com> > --- > Changes in v8: > - Fix line wrap > - Add RB tag > > Changes in v7: > - Use existing svinval macros in the insn-def.h > - Rename local_sinval_vma_asid to local_sinval_vma > > Changes in v6: > - Rebase on latest torvalds/master > > Changes in v5: > - Reduce tlb flush threshold to 64 > - Improve implementation of local_flush_tlb* functions > > Changes in v4: > - Rebase and refactor as per latest changes on torvalds/master > - Drop patch 1 in the series > > Changes in v3: > - Fix incorrect vma used for sinval instructions > - Use unified static key mechanism for svinval > - Rebased on torvalds/master > > Changes in v2: > - Rebased on 5.18-rc3 > - update riscv_fill_hwcap to probe Svinval extension > > arch/riscv/mm/tlbflush.c | 32 ++++++++++++++++++++++++++++++++ > 1 file changed, 32 insertions(+) > > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c > index 9b6e86ce3867..782147a63f3b 100644 > --- a/arch/riscv/mm/tlbflush.c > +++ b/arch/riscv/mm/tlbflush.c > @@ -6,6 +6,27 @@ > #include <linux/hugetlb.h> > #include <asm/sbi.h> > #include <asm/mmu_context.h> > +#include <asm/cpufeature.h> > + > +#define has_svinval() riscv_has_extension_unlikely(RISCV_ISA_EXT_SVINVAL) > + > +static inline void local_sfence_inval_ir(void) > +{ > + asm volatile(SFENCE_INVAL_IR() ::: "memory"); > +} > + > +static inline void local_sfence_w_inval(void) > +{ > + asm volatile(SFENCE_W_INVAL() ::: "memory"); > +} > + > +static inline void local_sinval_vma(unsigned long vma, unsigned long asid) > +{ > + if (asid != FLUSH_TLB_NO_ASID) > + asm volatile(SINVAL_VMA(%0, %1) : : "r" (vma), "r" (asid) : "memory"); > + else > + asm volatile(SINVAL_VMA(%0, zero) : : "r" (vma) : "memory"); > +} > > /* > * Flush entire TLB if number of entries to be flushed is greater > @@ -26,6 +47,16 @@ static void local_flush_tlb_range_threshold_asid(unsigned long start, > return; > } > > + if (has_svinval()) { > + local_sfence_w_inval(); > + for (i = 0; i < nr_ptes_in_range; ++i) { > + local_sinval_vma(start, asid); > + start += stride; > + } > + local_sfence_inval_ir(); > + return; > + } > + > for (i = 0; i < nr_ptes_in_range; ++i) { > local_flush_tlb_page_asid(start, asid); > start += stride; > -- > 2.34.1 > Great, thanks again for reworking this patchset! Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Thanks, Alex _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v8] riscv: mm: Add support for Svinval extension 2024-07-02 10:26 [PATCH v8] riscv: mm: Add support for Svinval extension Mayuresh Chitale 2024-07-02 12:32 ` Alexandre Ghiti @ 2024-07-24 14:50 ` Palmer Dabbelt 2024-07-30 8:43 ` Mayuresh Chitale 1 sibling, 1 reply; 8+ messages in thread From: Palmer Dabbelt @ 2024-07-24 14:50 UTC (permalink / raw) To: mchitale Cc: linux-riscv, linux-kernel, mchitale, Paul Walmsley, aou, alexghiti, samuel.holland, ajones On Tue, 02 Jul 2024 03:26:37 PDT (-0700), mchitale@ventanamicro.com wrote: > The Svinval extension splits SFENCE.VMA instruction into finer-grained > invalidation and ordering operations and is mandatory for RVA23S64 profile. > When Svinval is enabled the local_flush_tlb_range_threshold_asid function > should use the following sequence to optimize the tlb flushes instead of Do you have any performance numbers for the optimization? As per here <https://lore.kernel.org/all/mhng-f799bd2b-7f22-4c03-bdb2-903fa3b5d508@palmer-ri-x1c9a/>. > a simple sfence.vma: > > sfence.w.inval > svinval.vma > . > . > svinval.vma > sfence.inval.ir > > The maximum number of consecutive svinval.vma instructions that > can be executed in local_flush_tlb_range_threshold_asid function > is limited to 64. This is required to avoid soft lockups and the > approach is similar to that used in arm64. > > Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> > Reviewed-by: Andrew Jones <ajones@ventanamicro.com> > --- > Changes in v8: > - Fix line wrap > - Add RB tag > > Changes in v7: > - Use existing svinval macros in the insn-def.h > - Rename local_sinval_vma_asid to local_sinval_vma > > Changes in v6: > - Rebase on latest torvalds/master > > Changes in v5: > - Reduce tlb flush threshold to 64 > - Improve implementation of local_flush_tlb* functions > > Changes in v4: > - Rebase and refactor as per latest changes on torvalds/master > - Drop patch 1 in the series > > Changes in v3: > - Fix incorrect vma used for sinval instructions > - Use unified static key mechanism for svinval > - Rebased on torvalds/master > > Changes in v2: > - Rebased on 5.18-rc3 > - update riscv_fill_hwcap to probe Svinval extension > > arch/riscv/mm/tlbflush.c | 32 ++++++++++++++++++++++++++++++++ > 1 file changed, 32 insertions(+) > > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c > index 9b6e86ce3867..782147a63f3b 100644 > --- a/arch/riscv/mm/tlbflush.c > +++ b/arch/riscv/mm/tlbflush.c > @@ -6,6 +6,27 @@ > #include <linux/hugetlb.h> > #include <asm/sbi.h> > #include <asm/mmu_context.h> > +#include <asm/cpufeature.h> > + > +#define has_svinval() riscv_has_extension_unlikely(RISCV_ISA_EXT_SVINVAL) > + > +static inline void local_sfence_inval_ir(void) > +{ > + asm volatile(SFENCE_INVAL_IR() ::: "memory"); > +} > + > +static inline void local_sfence_w_inval(void) > +{ > + asm volatile(SFENCE_W_INVAL() ::: "memory"); > +} > + > +static inline void local_sinval_vma(unsigned long vma, unsigned long asid) > +{ > + if (asid != FLUSH_TLB_NO_ASID) > + asm volatile(SINVAL_VMA(%0, %1) : : "r" (vma), "r" (asid) : "memory"); > + else > + asm volatile(SINVAL_VMA(%0, zero) : : "r" (vma) : "memory"); > +} > > /* > * Flush entire TLB if number of entries to be flushed is greater > @@ -26,6 +47,16 @@ static void local_flush_tlb_range_threshold_asid(unsigned long start, > return; > } > > + if (has_svinval()) { > + local_sfence_w_inval(); > + for (i = 0; i < nr_ptes_in_range; ++i) { > + local_sinval_vma(start, asid); > + start += stride; > + } > + local_sfence_inval_ir(); > + return; > + } > + > for (i = 0; i < nr_ptes_in_range; ++i) { > local_flush_tlb_page_asid(start, asid); > start += stride; _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v8] riscv: mm: Add support for Svinval extension 2024-07-24 14:50 ` Palmer Dabbelt @ 2024-07-30 8:43 ` Mayuresh Chitale 2025-05-20 16:55 ` Alexandre Ghiti 0 siblings, 1 reply; 8+ messages in thread From: Mayuresh Chitale @ 2024-07-30 8:43 UTC (permalink / raw) To: Palmer Dabbelt Cc: linux-riscv, linux-kernel, Paul Walmsley, aou, alexghiti, samuel.holland, ajones On Wed, Jul 24, 2024 at 8:20 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > > On Tue, 02 Jul 2024 03:26:37 PDT (-0700), mchitale@ventanamicro.com wrote: > > The Svinval extension splits SFENCE.VMA instruction into finer-grained > > invalidation and ordering operations and is mandatory for RVA23S64 profile. > > When Svinval is enabled the local_flush_tlb_range_threshold_asid function > > should use the following sequence to optimize the tlb flushes instead of > > Do you have any performance numbers for the optimization? As per here > <https://lore.kernel.org/all/mhng-f799bd2b-7f22-4c03-bdb2-903fa3b5d508@palmer-ri-x1c9a/>. No, currently there are no numbers available for comparison but the rationale for the optimization is described in the spec. The extension is mandatory for the RVA23S64 profile but any platform that doesn't support this extension will not be impacted as the code executes only if the svinval extension is enabled at the boot up. > > > a simple sfence.vma: > > > > sfence.w.inval > > svinval.vma > > . > > . > > svinval.vma > > sfence.inval.ir > > > > The maximum number of consecutive svinval.vma instructions that > > can be executed in local_flush_tlb_range_threshold_asid function > > is limited to 64. This is required to avoid soft lockups and the > > approach is similar to that used in arm64. > > > > Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> > > Reviewed-by: Andrew Jones <ajones@ventanamicro.com> > > --- > > Changes in v8: > > - Fix line wrap > > - Add RB tag > > > > Changes in v7: > > - Use existing svinval macros in the insn-def.h > > - Rename local_sinval_vma_asid to local_sinval_vma > > > > Changes in v6: > > - Rebase on latest torvalds/master > > > > Changes in v5: > > - Reduce tlb flush threshold to 64 > > - Improve implementation of local_flush_tlb* functions > > > > Changes in v4: > > - Rebase and refactor as per latest changes on torvalds/master > > - Drop patch 1 in the series > > > > Changes in v3: > > - Fix incorrect vma used for sinval instructions > > - Use unified static key mechanism for svinval > > - Rebased on torvalds/master > > > > Changes in v2: > > - Rebased on 5.18-rc3 > > - update riscv_fill_hwcap to probe Svinval extension > > > > arch/riscv/mm/tlbflush.c | 32 ++++++++++++++++++++++++++++++++ > > 1 file changed, 32 insertions(+) > > > > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c > > index 9b6e86ce3867..782147a63f3b 100644 > > --- a/arch/riscv/mm/tlbflush.c > > +++ b/arch/riscv/mm/tlbflush.c > > @@ -6,6 +6,27 @@ > > #include <linux/hugetlb.h> > > #include <asm/sbi.h> > > #include <asm/mmu_context.h> > > +#include <asm/cpufeature.h> > > + > > +#define has_svinval() riscv_has_extension_unlikely(RISCV_ISA_EXT_SVINVAL) > > + > > +static inline void local_sfence_inval_ir(void) > > +{ > > + asm volatile(SFENCE_INVAL_IR() ::: "memory"); > > +} > > + > > +static inline void local_sfence_w_inval(void) > > +{ > > + asm volatile(SFENCE_W_INVAL() ::: "memory"); > > +} > > + > > +static inline void local_sinval_vma(unsigned long vma, unsigned long asid) > > +{ > > + if (asid != FLUSH_TLB_NO_ASID) > > + asm volatile(SINVAL_VMA(%0, %1) : : "r" (vma), "r" (asid) : "memory"); > > + else > > + asm volatile(SINVAL_VMA(%0, zero) : : "r" (vma) : "memory"); > > +} > > > > /* > > * Flush entire TLB if number of entries to be flushed is greater > > @@ -26,6 +47,16 @@ static void local_flush_tlb_range_threshold_asid(unsigned long start, > > return; > > } > > > > + if (has_svinval()) { > > + local_sfence_w_inval(); > > + for (i = 0; i < nr_ptes_in_range; ++i) { > > + local_sinval_vma(start, asid); > > + start += stride; > > + } > > + local_sfence_inval_ir(); > > + return; > > + } > > + > > for (i = 0; i < nr_ptes_in_range; ++i) { > > local_flush_tlb_page_asid(start, asid); > > start += stride; _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v8] riscv: mm: Add support for Svinval extension 2024-07-30 8:43 ` Mayuresh Chitale @ 2025-05-20 16:55 ` Alexandre Ghiti 2025-05-22 15:11 ` Mayuresh Chitale 0 siblings, 1 reply; 8+ messages in thread From: Alexandre Ghiti @ 2025-05-20 16:55 UTC (permalink / raw) To: Mayuresh Chitale, Palmer Dabbelt Cc: linux-riscv, linux-kernel, Paul Walmsley, aou, alexghiti, samuel.holland, ajones Hi Mayuresh! On 7/30/24 10:43, Mayuresh Chitale wrote: > On Wed, Jul 24, 2024 at 8:20 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: >> On Tue, 02 Jul 2024 03:26:37 PDT (-0700), mchitale@ventanamicro.com wrote: >>> The Svinval extension splits SFENCE.VMA instruction into finer-grained >>> invalidation and ordering operations and is mandatory for RVA23S64 profile. >>> When Svinval is enabled the local_flush_tlb_range_threshold_asid function >>> should use the following sequence to optimize the tlb flushes instead of >> Do you have any performance numbers for the optimization? As per here >> <https://lore.kernel.org/all/mhng-f799bd2b-7f22-4c03-bdb2-903fa3b5d508@palmer-ri-x1c9a/>. > No, currently there are no numbers available for comparison but the > rationale for the optimization is described in the spec. The extension > is mandatory for the RVA23S64 profile but any platform that doesn't > support this extension will not be impacted as the code executes only > if the svinval extension is enabled at the boot up. So I finally have some numbers! I tested this patchset on the BananaPi: I measured the number of cycles when flushing 64 entries (which is our current threshold) and I made sure to touch the entries beforehand. Here they are: * svinval: #cycles: 364920 #cycles: 365856 #cycles: 367993 * !svinval: #cycles: 663585 #cycles: 663105 #cycles: 664073 So that's roughly /2 using svinval. To me that's good enough to merge that for 6.16 :) Sorry for the very very long delay and thanks again for the multiple revisions! Alex >>> a simple sfence.vma: >>> >>> sfence.w.inval >>> svinval.vma >>> . >>> . >>> svinval.vma >>> sfence.inval.ir >>> >>> The maximum number of consecutive svinval.vma instructions that >>> can be executed in local_flush_tlb_range_threshold_asid function >>> is limited to 64. This is required to avoid soft lockups and the >>> approach is similar to that used in arm64. >>> >>> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> >>> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> >>> --- >>> Changes in v8: >>> - Fix line wrap >>> - Add RB tag >>> >>> Changes in v7: >>> - Use existing svinval macros in the insn-def.h >>> - Rename local_sinval_vma_asid to local_sinval_vma >>> >>> Changes in v6: >>> - Rebase on latest torvalds/master >>> >>> Changes in v5: >>> - Reduce tlb flush threshold to 64 >>> - Improve implementation of local_flush_tlb* functions >>> >>> Changes in v4: >>> - Rebase and refactor as per latest changes on torvalds/master >>> - Drop patch 1 in the series >>> >>> Changes in v3: >>> - Fix incorrect vma used for sinval instructions >>> - Use unified static key mechanism for svinval >>> - Rebased on torvalds/master >>> >>> Changes in v2: >>> - Rebased on 5.18-rc3 >>> - update riscv_fill_hwcap to probe Svinval extension >>> >>> arch/riscv/mm/tlbflush.c | 32 ++++++++++++++++++++++++++++++++ >>> 1 file changed, 32 insertions(+) >>> >>> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c >>> index 9b6e86ce3867..782147a63f3b 100644 >>> --- a/arch/riscv/mm/tlbflush.c >>> +++ b/arch/riscv/mm/tlbflush.c >>> @@ -6,6 +6,27 @@ >>> #include <linux/hugetlb.h> >>> #include <asm/sbi.h> >>> #include <asm/mmu_context.h> >>> +#include <asm/cpufeature.h> >>> + >>> +#define has_svinval() riscv_has_extension_unlikely(RISCV_ISA_EXT_SVINVAL) >>> + >>> +static inline void local_sfence_inval_ir(void) >>> +{ >>> + asm volatile(SFENCE_INVAL_IR() ::: "memory"); >>> +} >>> + >>> +static inline void local_sfence_w_inval(void) >>> +{ >>> + asm volatile(SFENCE_W_INVAL() ::: "memory"); >>> +} >>> + >>> +static inline void local_sinval_vma(unsigned long vma, unsigned long asid) >>> +{ >>> + if (asid != FLUSH_TLB_NO_ASID) >>> + asm volatile(SINVAL_VMA(%0, %1) : : "r" (vma), "r" (asid) : "memory"); >>> + else >>> + asm volatile(SINVAL_VMA(%0, zero) : : "r" (vma) : "memory"); >>> +} >>> >>> /* >>> * Flush entire TLB if number of entries to be flushed is greater >>> @@ -26,6 +47,16 @@ static void local_flush_tlb_range_threshold_asid(unsigned long start, >>> return; >>> } >>> >>> + if (has_svinval()) { >>> + local_sfence_w_inval(); >>> + for (i = 0; i < nr_ptes_in_range; ++i) { >>> + local_sinval_vma(start, asid); >>> + start += stride; >>> + } >>> + local_sfence_inval_ir(); >>> + return; >>> + } >>> + >>> for (i = 0; i < nr_ptes_in_range; ++i) { >>> local_flush_tlb_page_asid(start, asid); >>> start += stride; > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v8] riscv: mm: Add support for Svinval extension 2025-05-20 16:55 ` Alexandre Ghiti @ 2025-05-22 15:11 ` Mayuresh Chitale 2025-06-06 18:22 ` Palmer Dabbelt 0 siblings, 1 reply; 8+ messages in thread From: Mayuresh Chitale @ 2025-05-22 15:11 UTC (permalink / raw) To: Alexandre Ghiti, Palmer Dabbelt Cc: linux-riscv, linux-kernel, Paul Walmsley, aou, alexghiti, samuel.holland, ajones Hi Alex, On Tue, May 20, 2025 at 10:25 PM Alexandre Ghiti <alex@ghiti.fr> wrote: > > Hi Mayuresh! > > On 7/30/24 10:43, Mayuresh Chitale wrote: > > On Wed, Jul 24, 2024 at 8:20 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > >> On Tue, 02 Jul 2024 03:26:37 PDT (-0700), mchitale@ventanamicro.com wrote: > >>> The Svinval extension splits SFENCE.VMA instruction into finer-grained > >>> invalidation and ordering operations and is mandatory for RVA23S64 profile. > >>> When Svinval is enabled the local_flush_tlb_range_threshold_asid function > >>> should use the following sequence to optimize the tlb flushes instead of > >> Do you have any performance numbers for the optimization? As per here > >> <https://lore.kernel.org/all/mhng-f799bd2b-7f22-4c03-bdb2-903fa3b5d508@palmer-ri-x1c9a/>. > > No, currently there are no numbers available for comparison but the > > rationale for the optimization is described in the spec. The extension > > is mandatory for the RVA23S64 profile but any platform that doesn't > > support this extension will not be impacted as the code executes only > > if the svinval extension is enabled at the boot up. > > > So I finally have some numbers! I tested this patchset on the BananaPi: > I measured the number of cycles when flushing 64 entries (which is our > current threshold) and I made sure to touch the entries beforehand. > > Here they are: > > * svinval: > > #cycles: 364920 > #cycles: 365856 > #cycles: 367993 > > * !svinval: > > #cycles: 663585 > #cycles: 663105 > #cycles: 664073 > That's awesome !! Thank you so much for getting the data. > So that's roughly /2 using svinval. To me that's good enough to merge > that for 6.16 :) > > Sorry for the very very long delay and thanks again for the multiple > revisions! > > Alex > > > >>> a simple sfence.vma: > >>> > >>> sfence.w.inval > >>> svinval.vma > >>> . > >>> . > >>> svinval.vma > >>> sfence.inval.ir > >>> > >>> The maximum number of consecutive svinval.vma instructions that > >>> can be executed in local_flush_tlb_range_threshold_asid function > >>> is limited to 64. This is required to avoid soft lockups and the > >>> approach is similar to that used in arm64. > >>> > >>> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> > >>> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> > >>> --- > >>> Changes in v8: > >>> - Fix line wrap > >>> - Add RB tag > >>> > >>> Changes in v7: > >>> - Use existing svinval macros in the insn-def.h > >>> - Rename local_sinval_vma_asid to local_sinval_vma > >>> > >>> Changes in v6: > >>> - Rebase on latest torvalds/master > >>> > >>> Changes in v5: > >>> - Reduce tlb flush threshold to 64 > >>> - Improve implementation of local_flush_tlb* functions > >>> > >>> Changes in v4: > >>> - Rebase and refactor as per latest changes on torvalds/master > >>> - Drop patch 1 in the series > >>> > >>> Changes in v3: > >>> - Fix incorrect vma used for sinval instructions > >>> - Use unified static key mechanism for svinval > >>> - Rebased on torvalds/master > >>> > >>> Changes in v2: > >>> - Rebased on 5.18-rc3 > >>> - update riscv_fill_hwcap to probe Svinval extension > >>> > >>> arch/riscv/mm/tlbflush.c | 32 ++++++++++++++++++++++++++++++++ > >>> 1 file changed, 32 insertions(+) > >>> > >>> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c > >>> index 9b6e86ce3867..782147a63f3b 100644 > >>> --- a/arch/riscv/mm/tlbflush.c > >>> +++ b/arch/riscv/mm/tlbflush.c > >>> @@ -6,6 +6,27 @@ > >>> #include <linux/hugetlb.h> > >>> #include <asm/sbi.h> > >>> #include <asm/mmu_context.h> > >>> +#include <asm/cpufeature.h> > >>> + > >>> +#define has_svinval() riscv_has_extension_unlikely(RISCV_ISA_EXT_SVINVAL) > >>> + > >>> +static inline void local_sfence_inval_ir(void) > >>> +{ > >>> + asm volatile(SFENCE_INVAL_IR() ::: "memory"); > >>> +} > >>> + > >>> +static inline void local_sfence_w_inval(void) > >>> +{ > >>> + asm volatile(SFENCE_W_INVAL() ::: "memory"); > >>> +} > >>> + > >>> +static inline void local_sinval_vma(unsigned long vma, unsigned long asid) > >>> +{ > >>> + if (asid != FLUSH_TLB_NO_ASID) > >>> + asm volatile(SINVAL_VMA(%0, %1) : : "r" (vma), "r" (asid) : "memory"); > >>> + else > >>> + asm volatile(SINVAL_VMA(%0, zero) : : "r" (vma) : "memory"); > >>> +} > >>> > >>> /* > >>> * Flush entire TLB if number of entries to be flushed is greater > >>> @@ -26,6 +47,16 @@ static void local_flush_tlb_range_threshold_asid(unsigned long start, > >>> return; > >>> } > >>> > >>> + if (has_svinval()) { > >>> + local_sfence_w_inval(); > >>> + for (i = 0; i < nr_ptes_in_range; ++i) { > >>> + local_sinval_vma(start, asid); > >>> + start += stride; > >>> + } > >>> + local_sfence_inval_ir(); > >>> + return; > >>> + } > >>> + > >>> for (i = 0; i < nr_ptes_in_range; ++i) { > >>> local_flush_tlb_page_asid(start, asid); > >>> start += stride; > > _______________________________________________ > > linux-riscv mailing list > > linux-riscv@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-riscv _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v8] riscv: mm: Add support for Svinval extension 2025-05-22 15:11 ` Mayuresh Chitale @ 2025-06-06 18:22 ` Palmer Dabbelt 2025-06-13 14:08 ` Alexandre Ghiti 0 siblings, 1 reply; 8+ messages in thread From: Palmer Dabbelt @ 2025-06-06 18:22 UTC (permalink / raw) To: mchitale Cc: Alexandre Ghiti, linux-riscv, linux-kernel, Paul Walmsley, aou, alexghiti, samuel.holland, ajones On Thu, 22 May 2025 08:11:09 PDT (-0700), mchitale@ventanamicro.com wrote: > Hi Alex, > > On Tue, May 20, 2025 at 10:25 PM Alexandre Ghiti <alex@ghiti.fr> wrote: >> >> Hi Mayuresh! >> >> On 7/30/24 10:43, Mayuresh Chitale wrote: >> > On Wed, Jul 24, 2024 at 8:20 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: >> >> On Tue, 02 Jul 2024 03:26:37 PDT (-0700), mchitale@ventanamicro.com wrote: >> >>> The Svinval extension splits SFENCE.VMA instruction into finer-grained >> >>> invalidation and ordering operations and is mandatory for RVA23S64 profile. >> >>> When Svinval is enabled the local_flush_tlb_range_threshold_asid function >> >>> should use the following sequence to optimize the tlb flushes instead of >> >> Do you have any performance numbers for the optimization? As per here >> >> <https://lore.kernel.org/all/mhng-f799bd2b-7f22-4c03-bdb2-903fa3b5d508@palmer-ri-x1c9a/>. >> > No, currently there are no numbers available for comparison but the >> > rationale for the optimization is described in the spec. The extension >> > is mandatory for the RVA23S64 profile but any platform that doesn't >> > support this extension will not be impacted as the code executes only >> > if the svinval extension is enabled at the boot up. I seem to remember writing something like this up at some point, but I didn't see it so I figured I'd just write it up again in case anyone's looking in the future... My core worry here is that there's two ways to tuse the Svinval extension: * This, which just replaces the sfence.vma with a loop of Svinval instructions. This keeps the Svinval instructions close to each other. * Some more extensive modifications to move the Svinval instructions farther apart from each other. Which approach is better is going to depend on what them microarchicture implements: does it want the Svinval instructions close to each other so it can pattern match them, or does it want them far apart so the ordering instructions stall less. It's entirely possible that the wrong flavor of Svinval will actually decrease performance on systems that were built around the other flavor, and we don't know which flavor is getting implemented. So it's not one of these extensions where we can just say "these are only used on systems that implement the extensions, so it's obviously faster". We really need some numbers to demonstrate this flavor is actually faster... >> So I finally have some numbers! I tested this patchset on the BananaPi: >> I measured the number of cycles when flushing 64 entries (which is our >> current threshold) and I made sure to touch the entries beforehand. >> >> Here they are: >> >> * svinval: >> >> #cycles: 364920 >> #cycles: 365856 >> #cycles: 367993 >> >> * !svinval: >> >> #cycles: 663585 >> #cycles: 663105 >> #cycles: 664073 >> > > That's awesome !! Thank you so much for getting the data. So I think that's good enough to start, so it's going up to Linus and hopefully it lands (there's been a bit of chaos, but hopefully that's all sorted out). >> So that's roughly /2 using svinval. To me that's good enough to merge >> that for 6.16 :) Ya, but probably low enough that we're going to want a lower threshold too -- if the svinval loop is only a factor of 2 faster for 64 entries, it's probably going to end up slower at some point... On the flip side, it's a big enough change that we probably want to re-evaluate tlb_flush_all_threshold, too. >> Sorry for the very very long delay and thanks again for the multiple >> revisions! >> >> Alex >> >> >> >>> a simple sfence.vma: >> >>> >> >>> sfence.w.inval >> >>> svinval.vma >> >>> . >> >>> . >> >>> svinval.vma >> >>> sfence.inval.ir >> >>> >> >>> The maximum number of consecutive svinval.vma instructions that >> >>> can be executed in local_flush_tlb_range_threshold_asid function >> >>> is limited to 64. This is required to avoid soft lockups and the >> >>> approach is similar to that used in arm64. >> >>> >> >>> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> >> >>> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> >> >>> --- >> >>> Changes in v8: >> >>> - Fix line wrap >> >>> - Add RB tag >> >>> >> >>> Changes in v7: >> >>> - Use existing svinval macros in the insn-def.h >> >>> - Rename local_sinval_vma_asid to local_sinval_vma >> >>> >> >>> Changes in v6: >> >>> - Rebase on latest torvalds/master >> >>> >> >>> Changes in v5: >> >>> - Reduce tlb flush threshold to 64 >> >>> - Improve implementation of local_flush_tlb* functions >> >>> >> >>> Changes in v4: >> >>> - Rebase and refactor as per latest changes on torvalds/master >> >>> - Drop patch 1 in the series >> >>> >> >>> Changes in v3: >> >>> - Fix incorrect vma used for sinval instructions >> >>> - Use unified static key mechanism for svinval >> >>> - Rebased on torvalds/master >> >>> >> >>> Changes in v2: >> >>> - Rebased on 5.18-rc3 >> >>> - update riscv_fill_hwcap to probe Svinval extension >> >>> >> >>> arch/riscv/mm/tlbflush.c | 32 ++++++++++++++++++++++++++++++++ >> >>> 1 file changed, 32 insertions(+) >> >>> >> >>> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c >> >>> index 9b6e86ce3867..782147a63f3b 100644 >> >>> --- a/arch/riscv/mm/tlbflush.c >> >>> +++ b/arch/riscv/mm/tlbflush.c >> >>> @@ -6,6 +6,27 @@ >> >>> #include <linux/hugetlb.h> >> >>> #include <asm/sbi.h> >> >>> #include <asm/mmu_context.h> >> >>> +#include <asm/cpufeature.h> >> >>> + >> >>> +#define has_svinval() riscv_has_extension_unlikely(RISCV_ISA_EXT_SVINVAL) >> >>> + >> >>> +static inline void local_sfence_inval_ir(void) >> >>> +{ >> >>> + asm volatile(SFENCE_INVAL_IR() ::: "memory"); >> >>> +} >> >>> + >> >>> +static inline void local_sfence_w_inval(void) >> >>> +{ >> >>> + asm volatile(SFENCE_W_INVAL() ::: "memory"); >> >>> +} >> >>> + >> >>> +static inline void local_sinval_vma(unsigned long vma, unsigned long asid) >> >>> +{ >> >>> + if (asid != FLUSH_TLB_NO_ASID) >> >>> + asm volatile(SINVAL_VMA(%0, %1) : : "r" (vma), "r" (asid) : "memory"); >> >>> + else >> >>> + asm volatile(SINVAL_VMA(%0, zero) : : "r" (vma) : "memory"); >> >>> +} >> >>> >> >>> /* >> >>> * Flush entire TLB if number of entries to be flushed is greater >> >>> @@ -26,6 +47,16 @@ static void local_flush_tlb_range_threshold_asid(unsigned long start, >> >>> return; >> >>> } >> >>> >> >>> + if (has_svinval()) { >> >>> + local_sfence_w_inval(); >> >>> + for (i = 0; i < nr_ptes_in_range; ++i) { >> >>> + local_sinval_vma(start, asid); >> >>> + start += stride; >> >>> + } >> >>> + local_sfence_inval_ir(); >> >>> + return; >> >>> + } >> >>> + >> >>> for (i = 0; i < nr_ptes_in_range; ++i) { >> >>> local_flush_tlb_page_asid(start, asid); >> >>> start += stride; >> > _______________________________________________ >> > linux-riscv mailing list >> > linux-riscv@lists.infradead.org >> > http://lists.infradead.org/mailman/listinfo/linux-riscv _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v8] riscv: mm: Add support for Svinval extension 2025-06-06 18:22 ` Palmer Dabbelt @ 2025-06-13 14:08 ` Alexandre Ghiti 0 siblings, 0 replies; 8+ messages in thread From: Alexandre Ghiti @ 2025-06-13 14:08 UTC (permalink / raw) To: Palmer Dabbelt, mchitale Cc: linux-riscv, linux-kernel, Paul Walmsley, aou, alexghiti, samuel.holland, ajones On 6/6/25 20:22, Palmer Dabbelt wrote: > On Thu, 22 May 2025 08:11:09 PDT (-0700), mchitale@ventanamicro.com > wrote: >> Hi Alex, >> >> On Tue, May 20, 2025 at 10:25 PM Alexandre Ghiti <alex@ghiti.fr> wrote: >>> >>> Hi Mayuresh! >>> >>> On 7/30/24 10:43, Mayuresh Chitale wrote: >>> > On Wed, Jul 24, 2024 at 8:20 PM Palmer Dabbelt >>> <palmer@dabbelt.com> wrote: >>> >> On Tue, 02 Jul 2024 03:26:37 PDT (-0700), >>> mchitale@ventanamicro.com wrote: >>> >>> The Svinval extension splits SFENCE.VMA instruction into >>> finer-grained >>> >>> invalidation and ordering operations and is mandatory for >>> RVA23S64 profile. >>> >>> When Svinval is enabled the local_flush_tlb_range_threshold_asid >>> function >>> >>> should use the following sequence to optimize the tlb flushes >>> instead of >>> >> Do you have any performance numbers for the optimization? As per >>> here >>> >> >>> <https://lore.kernel.org/all/mhng-f799bd2b-7f22-4c03-bdb2-903fa3b5d508@palmer-ri-x1c9a/>. >>> > No, currently there are no numbers available for comparison but the >>> > rationale for the optimization is described in the spec. The >>> extension >>> > is mandatory for the RVA23S64 profile but any platform that doesn't >>> > support this extension will not be impacted as the code executes only >>> > if the svinval extension is enabled at the boot up. > > I seem to remember writing something like this up at some point, but I > didn't see it so I figured I'd just write it up again in case anyone's > looking in the future... > > My core worry here is that there's two ways to tuse the Svinval > extension: > > * This, which just replaces the sfence.vma with a loop of Svinval > instructions. This keeps the Svinval instructions close to each other. > * Some more extensive modifications to move the Svinval instructions > farther apart from each other. Yes and those 2 ways are complementary, the next step is to implement the second one. > > Which approach is better is going to depend on what them > microarchicture implements: does it want the Svinval instructions > close to each other so it can pattern match them, or does it want them > far apart so the ordering instructions stall less. It's entirely > possible that the wrong flavor of Svinval will actually decrease > performance on systems that were built around the other flavor, and we > don't know which flavor is getting implemented. > > So it's not one of these extensions where we can just say "these are > only used on systems that implement the extensions, so it's obviously > faster". We really need some numbers to demonstrate this flavor is > actually faster... I don't see how an svinval implementation would lead in that case to poorer performance, unless the platform implements a "dummy" svinval to conform to rva23, in that case, I think we should not worry about this case in the long run. > >>> So I finally have some numbers! I tested this patchset on the BananaPi: >>> I measured the number of cycles when flushing 64 entries (which is our >>> current threshold) and I made sure to touch the entries beforehand. >>> >>> Here they are: >>> >>> * svinval: >>> >>> #cycles: 364920 >>> #cycles: 365856 >>> #cycles: 367993 >>> >>> * !svinval: >>> >>> #cycles: 663585 >>> #cycles: 663105 >>> #cycles: 664073 >>> >> >> That's awesome !! Thank you so much for getting the data. > > So I think that's good enough to start, so it's going up to Linus and > hopefully it lands (there's been a bit of chaos, but hopefully that's > all sorted out). > >>> So that's roughly /2 using svinval. To me that's good enough to merge >>> that for 6.16 :) > > Ya, but probably low enough that we're going to want a lower threshold > too -- if the svinval loop is only a factor of 2 faster for 64 > entries, it's probably going to end up slower at some point... > > On the flip side, it's a big enough change that we probably want to > re-evaluate tlb_flush_all_threshold, too. > >>> Sorry for the very very long delay and thanks again for the multiple >>> revisions! >>> >>> Alex >>> >>> >>> >>> a simple sfence.vma: >>> >>> >>> >>> sfence.w.inval >>> >>> svinval.vma >>> >>> . >>> >>> . >>> >>> svinval.vma >>> >>> sfence.inval.ir >>> >>> >>> >>> The maximum number of consecutive svinval.vma instructions that >>> >>> can be executed in local_flush_tlb_range_threshold_asid function >>> >>> is limited to 64. This is required to avoid soft lockups and the >>> >>> approach is similar to that used in arm64. >>> >>> >>> >>> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> >>> >>> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> >>> >>> --- >>> >>> Changes in v8: >>> >>> - Fix line wrap >>> >>> - Add RB tag >>> >>> >>> >>> Changes in v7: >>> >>> - Use existing svinval macros in the insn-def.h >>> >>> - Rename local_sinval_vma_asid to local_sinval_vma >>> >>> >>> >>> Changes in v6: >>> >>> - Rebase on latest torvalds/master >>> >>> >>> >>> Changes in v5: >>> >>> - Reduce tlb flush threshold to 64 >>> >>> - Improve implementation of local_flush_tlb* functions >>> >>> >>> >>> Changes in v4: >>> >>> - Rebase and refactor as per latest changes on torvalds/master >>> >>> - Drop patch 1 in the series >>> >>> >>> >>> Changes in v3: >>> >>> - Fix incorrect vma used for sinval instructions >>> >>> - Use unified static key mechanism for svinval >>> >>> - Rebased on torvalds/master >>> >>> >>> >>> Changes in v2: >>> >>> - Rebased on 5.18-rc3 >>> >>> - update riscv_fill_hwcap to probe Svinval extension >>> >>> >>> >>> arch/riscv/mm/tlbflush.c | 32 ++++++++++++++++++++++++++++++++ >>> >>> 1 file changed, 32 insertions(+) >>> >>> >>> >>> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c >>> >>> index 9b6e86ce3867..782147a63f3b 100644 >>> >>> --- a/arch/riscv/mm/tlbflush.c >>> >>> +++ b/arch/riscv/mm/tlbflush.c >>> >>> @@ -6,6 +6,27 @@ >>> >>> #include <linux/hugetlb.h> >>> >>> #include <asm/sbi.h> >>> >>> #include <asm/mmu_context.h> >>> >>> +#include <asm/cpufeature.h> >>> >>> + >>> >>> +#define has_svinval() >>> riscv_has_extension_unlikely(RISCV_ISA_EXT_SVINVAL) >>> >>> + >>> >>> +static inline void local_sfence_inval_ir(void) >>> >>> +{ >>> >>> + asm volatile(SFENCE_INVAL_IR() ::: "memory"); >>> >>> +} >>> >>> + >>> >>> +static inline void local_sfence_w_inval(void) >>> >>> +{ >>> >>> + asm volatile(SFENCE_W_INVAL() ::: "memory"); >>> >>> +} >>> >>> + >>> >>> +static inline void local_sinval_vma(unsigned long vma, unsigned >>> long asid) >>> >>> +{ >>> >>> + if (asid != FLUSH_TLB_NO_ASID) >>> >>> + asm volatile(SINVAL_VMA(%0, %1) : : "r" (vma), "r" >>> (asid) : "memory"); >>> >>> + else >>> >>> + asm volatile(SINVAL_VMA(%0, zero) : : "r" (vma) : >>> "memory"); >>> >>> +} >>> >>> >>> >>> /* >>> >>> * Flush entire TLB if number of entries to be flushed is greater >>> >>> @@ -26,6 +47,16 @@ static void >>> local_flush_tlb_range_threshold_asid(unsigned long start, >>> >>> return; >>> >>> } >>> >>> >>> >>> + if (has_svinval()) { >>> >>> + local_sfence_w_inval(); >>> >>> + for (i = 0; i < nr_ptes_in_range; ++i) { >>> >>> + local_sinval_vma(start, asid); >>> >>> + start += stride; >>> >>> + } >>> >>> + local_sfence_inval_ir(); >>> >>> + return; >>> >>> + } >>> >>> + >>> >>> for (i = 0; i < nr_ptes_in_range; ++i) { >>> >>> local_flush_tlb_page_asid(start, asid); >>> >>> start += stride; >>> > _______________________________________________ >>> > linux-riscv mailing list >>> > linux-riscv@lists.infradead.org >>> > http://lists.infradead.org/mailman/listinfo/linux-riscv > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-06-13 15:21 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-07-02 10:26 [PATCH v8] riscv: mm: Add support for Svinval extension Mayuresh Chitale 2024-07-02 12:32 ` Alexandre Ghiti 2024-07-24 14:50 ` Palmer Dabbelt 2024-07-30 8:43 ` Mayuresh Chitale 2025-05-20 16:55 ` Alexandre Ghiti 2025-05-22 15:11 ` Mayuresh Chitale 2025-06-06 18:22 ` Palmer Dabbelt 2025-06-13 14:08 ` Alexandre Ghiti
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox