linux-mips.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexandre Ghiti <alexghiti@rivosinc.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	 Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Michael Ellerman <mpe@ellerman.id.au>,
	 Nicholas Piggin <npiggin@gmail.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	 Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Ved Shanbhogue <ved@rivosinc.com>,  Matt Evans <mev@rivosinc.com>,
	Dylan Jhong <dylan@andestech.com>,
	 "linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	 "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>,
	 "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	 "linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH RFC/RFT 3/4] riscv: Stop emitting preventive sfence.vma for new userspace mappings
Date: Fri, 8 Dec 2023 15:39:22 +0100	[thread overview]
Message-ID: <CAHVXubit2NjBREKFb=s14q13PvtcjuthLn2oKnBq8PfFnVMieQ@mail.gmail.com> (raw)
In-Reply-To: <e783436f-631c-4b02-ba9c-b6145e0e8b5a@csgroup.eu>

On Thu, Dec 7, 2023 at 5:37 PM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
> The subject says "riscv:" but it changes core part and several arch.
> Maybe this commit should be split in two commits, one for API changes
> that changes flush_tlb_fix_spurious_fault() to
> flush_tlb_fix_spurious_write_fault() and adds
> flush_tlb_fix_spurious_read_fault() including the change in memory.c,
> then a second patch with the changes to riscv.

You're right, I'll do that, thanks.

>
> Le 07/12/2023 à 16:03, Alexandre Ghiti a écrit :
> > The preventive sfence.vma were emitted because new mappings must be made
> > visible to the page table walker, either the uarch caches invalid
> > entries or not.
> >
> > Actually, there is no need to preventively sfence.vma on new mappings for
> > userspace, this should be handled only in the page fault path.
> >
> > This allows to drastically reduce the number of sfence.vma emitted:
> >
> > * Ubuntu boot to login:
> > Before: ~630k sfence.vma
> > After:  ~200k sfence.vma
> >
> > * ltp - mmapstress01
> > Before: ~45k
> > After:  ~6.3k
> >
> > * lmbench - lat_pagefault
> > Before: ~665k
> > After:   832 (!)
> >
> > * lmbench - lat_mmap
> > Before: ~546k
> > After:   718 (!)
> >
> > The only issue with the removal of sfence.vma in update_mmu_cache() is
> > that on uarchs that cache invalid entries, those won't be invalidated
> > until the process takes a fault: so that's an additional fault in those
> > cases.
> >
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > ---
> >   arch/arm64/include/asm/pgtable.h              |  2 +-
> >   arch/mips/include/asm/pgtable.h               |  6 +--
> >   arch/powerpc/include/asm/book3s/64/tlbflush.h |  8 ++--
> >   arch/riscv/include/asm/pgtable.h              | 43 +++++++++++--------
> >   include/linux/pgtable.h                       |  8 +++-
> >   mm/memory.c                                   | 12 +++++-
> >   6 files changed, 48 insertions(+), 31 deletions(-)
>
> Did you forget mm/pgtable-generic.c ?

Indeed, I "missed" the occurrence of flush_tlb_fix_spurious_fault()
there, thanks.

>
> >
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 7f7d9b1df4e5..728f25f529a5 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -57,7 +57,7 @@ static inline bool arch_thp_swp_supported(void)
> >    * fault on one CPU which has been handled concurrently by another CPU
> >    * does not need to perform additional invalidation.
> >    */
> > -#define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)
> > +#define flush_tlb_fix_spurious_write_fault(vma, address, ptep) do { } while (0)
>
> Why do you need to do that change ? Nothing is explained about that in
> the commit message.

I renamed this macro because in the page fault path,
flush_tlb_fix_spurious_fault() is called only when the fault is a
write fault (see
https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L5016).
I'll check if that fits the occurrence in mm/pgtable-generic.c too.

Thanks again for the review,

Alex

>
> >
> >   /*
> >    * ZERO_PAGE is a global shared page that is always zero: used
> > diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> > index 430b208c0130..84439fe6ed29 100644
> > --- a/arch/mips/include/asm/pgtable.h
> > +++ b/arch/mips/include/asm/pgtable.h
> > @@ -478,9 +478,9 @@ static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
> >       return __pgprot(prot);
> >   }
> >
> > -static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
> > -                                             unsigned long address,
> > -                                             pte_t *ptep)
> > +static inline void flush_tlb_fix_spurious_write_fault(struct vm_area_struct *vma,
> > +                                                   unsigned long address,
> > +                                                   pte_t *ptep)
> >   {
> >   }
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > index 1950c1b825b4..7166d56f90db 100644
> > --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > @@ -128,10 +128,10 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
> >   #define flush_tlb_page(vma, addr)   local_flush_tlb_page(vma, addr)
> >   #endif /* CONFIG_SMP */
> >
> > -#define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
> > -static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
> > -                                             unsigned long address,
> > -                                             pte_t *ptep)
> > +#define flush_tlb_fix_spurious_write_fault flush_tlb_fix_spurious_write_fault
> > +static inline void flush_tlb_fix_spurious_write_fault(struct vm_area_struct *vma,
> > +                                                   unsigned long address,
> > +                                                   pte_t *ptep)
> >   {
> >       /*
> >        * Book3S 64 does not require spurious fault flushes because the PTE
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index b2ba3f79cfe9..89aa5650f104 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -472,28 +472,20 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
> >               struct vm_area_struct *vma, unsigned long address,
> >               pte_t *ptep, unsigned int nr)
> >   {
> > -     /*
> > -      * The kernel assumes that TLBs don't cache invalid entries, but
> > -      * in RISC-V, SFENCE.VMA specifies an ordering constraint, not a
> > -      * cache flush; it is necessary even after writing invalid entries.
> > -      * Relying on flush_tlb_fix_spurious_fault would suffice, but
> > -      * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
> > -      */
> > -     while (nr--)
> > -             local_flush_tlb_page(address + nr * PAGE_SIZE);
> >   }
> >   #define update_mmu_cache(vma, addr, ptep) \
> >       update_mmu_cache_range(NULL, vma, addr, ptep, 1)
> >
> >   #define __HAVE_ARCH_UPDATE_MMU_TLB
> > -#define update_mmu_tlb update_mmu_cache
> > +static inline void update_mmu_tlb(struct vm_area_struct *vma,
> > +                               unsigned long address, pte_t *ptep)
> > +{
> > +     flush_tlb_range(vma, address, address + PAGE_SIZE);
> > +}
> >
> >   static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> >               unsigned long address, pmd_t *pmdp)
> >   {
> > -     pte_t *ptep = (pte_t *)pmdp;
> > -
> > -     update_mmu_cache(vma, address, ptep);
> >   }
> >
> >   #define __HAVE_ARCH_PTE_SAME
> > @@ -548,13 +540,26 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
> >                                       unsigned long address, pte_t *ptep,
> >                                       pte_t entry, int dirty)
> >   {
> > -     if (!pte_same(*ptep, entry))
> > +     if (!pte_same(*ptep, entry)) {
> >               __set_pte_at(ptep, entry);
> > -     /*
> > -      * update_mmu_cache will unconditionally execute, handling both
> > -      * the case that the PTE changed and the spurious fault case.
> > -      */
> > -     return true;
> > +             /* Here only not svadu is impacted */
> > +             flush_tlb_page(vma, address);
> > +             return true;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> > +extern u64 nr_sfence_vma_handle_exception;
> > +extern bool tlb_caching_invalid_entries;
> > +
> > +#define flush_tlb_fix_spurious_read_fault flush_tlb_fix_spurious_read_fault
> > +static inline void flush_tlb_fix_spurious_read_fault(struct vm_area_struct *vma,
> > +                                                  unsigned long address,
> > +                                                  pte_t *ptep)
> > +{
> > +     if (tlb_caching_invalid_entries)
> > +             flush_tlb_page(vma, address);
> >   }
> >
> >   #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
> > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> > index af7639c3b0a3..7abaf42ef612 100644
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -931,8 +931,12 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
> >   # define pte_accessible(mm, pte)    ((void)(pte), 1)
> >   #endif
> >
> > -#ifndef flush_tlb_fix_spurious_fault
> > -#define flush_tlb_fix_spurious_fault(vma, address, ptep) flush_tlb_page(vma, address)
> > +#ifndef flush_tlb_fix_spurious_write_fault
> > +#define flush_tlb_fix_spurious_write_fault(vma, address, ptep) flush_tlb_page(vma, address)
> > +#endif
> > +
> > +#ifndef flush_tlb_fix_spurious_read_fault
> > +#define flush_tlb_fix_spurious_read_fault(vma, address, ptep)
> >   #endif
> >
> >   /*
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 517221f01303..5cb0ccf0c03f 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5014,8 +5014,16 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> >                * with threads.
> >                */
> >               if (vmf->flags & FAULT_FLAG_WRITE)
> > -                     flush_tlb_fix_spurious_fault(vmf->vma, vmf->address,
> > -                                                  vmf->pte);
> > +                     flush_tlb_fix_spurious_write_fault(vmf->vma, vmf->address,
> > +                                                        vmf->pte);
> > +             else
> > +                     /*
> > +                      * With the pte_same(ptep_get(vmf->pte), entry) check
> > +                      * that calls update_mmu_tlb() above, multiple threads
> > +                      * faulting at the same time won't get there.
> > +                      */
> > +                     flush_tlb_fix_spurious_read_fault(vmf->vma, vmf->address,
> > +                                                       vmf->pte);
> >       }
> >   unlock:
> >       pte_unmap_unlock(vmf->pte, vmf->ptl);

  reply	other threads:[~2023-12-08 14:39 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-07 15:03 [PATCH RFC/RFT 0/4] Remove preventive sfence.vma Alexandre Ghiti
2023-12-07 15:03 ` [PATCH RFC/RFT 1/4] riscv: Stop emitting preventive sfence.vma for new vmalloc mappings Alexandre Ghiti
2023-12-07 15:52   ` Christophe Leroy
2023-12-08 14:28     ` Alexandre Ghiti
2023-12-07 15:03 ` [PATCH RFC/RFT 2/4] riscv: Add a runtime detection of invalid TLB entries caching Alexandre Ghiti
2023-12-07 15:55   ` Christophe Leroy
2023-12-08 14:30     ` Alexandre Ghiti
2023-12-07 15:03 ` [PATCH RFC/RFT 3/4] riscv: Stop emitting preventive sfence.vma for new userspace mappings Alexandre Ghiti
2023-12-07 16:37   ` Christophe Leroy
2023-12-08 14:39     ` Alexandre Ghiti [this message]
2023-12-07 15:03 ` [PATCH RFC/RFT 4/4] TEMP: riscv: Add debugfs interface to retrieve #sfence.vma Alexandre Ghiti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHVXubit2NjBREKFb=s14q13PvtcjuthLn2oKnBq8PfFnVMieQ@mail.gmail.com' \
    --to=alexghiti@rivosinc.com \
    --cc=akpm@linux-foundation.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=dylan@andestech.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mev@rivosinc.com \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=tsbogend@alpha.franken.de \
    --cc=ved@rivosinc.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).