From: Alexandre Ghiti <alexghiti@rivosinc.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Andrew Morton <akpm@linux-foundation.org>,
Ved Shanbhogue <ved@rivosinc.com>, Matt Evans <mev@rivosinc.com>,
Dylan Jhong <dylan@andestech.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH RFC/RFT 3/4] riscv: Stop emitting preventive sfence.vma for new userspace mappings
Date: Fri, 8 Dec 2023 15:39:22 +0100 [thread overview]
Message-ID: <CAHVXubit2NjBREKFb=s14q13PvtcjuthLn2oKnBq8PfFnVMieQ@mail.gmail.com> (raw)
In-Reply-To: <e783436f-631c-4b02-ba9c-b6145e0e8b5a@csgroup.eu>
On Thu, Dec 7, 2023 at 5:37 PM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
> The subject says "riscv:" but it changes core part and several arch.
> Maybe this commit should be split in two commits, one for API changes
> that changes flush_tlb_fix_spurious_fault() to
> flush_tlb_fix_spurious_write_fault() and adds
> flush_tlb_fix_spurious_read_fault() including the change in memory.c,
> then a second patch with the changes to riscv.
You're right, I'll do that, thanks.
>
> Le 07/12/2023 à 16:03, Alexandre Ghiti a écrit :
> > The preventive sfence.vma were emitted because new mappings must be made
> > visible to the page table walker, either the uarch caches invalid
> > entries or not.
> >
> > Actually, there is no need to preventively sfence.vma on new mappings for
> > userspace, this should be handled only in the page fault path.
> >
> > This allows to drastically reduce the number of sfence.vma emitted:
> >
> > * Ubuntu boot to login:
> > Before: ~630k sfence.vma
> > After: ~200k sfence.vma
> >
> > * ltp - mmapstress01
> > Before: ~45k
> > After: ~6.3k
> >
> > * lmbench - lat_pagefault
> > Before: ~665k
> > After: 832 (!)
> >
> > * lmbench - lat_mmap
> > Before: ~546k
> > After: 718 (!)
> >
> > The only issue with the removal of sfence.vma in update_mmu_cache() is
> > that on uarchs that cache invalid entries, those won't be invalidated
> > until the process takes a fault: so that's an additional fault in those
> > cases.
> >
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > ---
> > arch/arm64/include/asm/pgtable.h | 2 +-
> > arch/mips/include/asm/pgtable.h | 6 +--
> > arch/powerpc/include/asm/book3s/64/tlbflush.h | 8 ++--
> > arch/riscv/include/asm/pgtable.h | 43 +++++++++++--------
> > include/linux/pgtable.h | 8 +++-
> > mm/memory.c | 12 +++++-
> > 6 files changed, 48 insertions(+), 31 deletions(-)
>
> Did you forget mm/pgtable-generic.c ?
Indeed, I "missed" the occurrence of flush_tlb_fix_spurious_fault()
there, thanks.
>
> >
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 7f7d9b1df4e5..728f25f529a5 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -57,7 +57,7 @@ static inline bool arch_thp_swp_supported(void)
> > * fault on one CPU which has been handled concurrently by another CPU
> > * does not need to perform additional invalidation.
> > */
> > -#define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)
> > +#define flush_tlb_fix_spurious_write_fault(vma, address, ptep) do { } while (0)
>
> Why do you need to do that change ? Nothing is explained about that in
> the commit message.
I renamed this macro because in the page fault path,
flush_tlb_fix_spurious_fault() is called only when the fault is a
write fault (see
https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L5016).
I'll check if that fits the occurrence in mm/pgtable-generic.c too.
Thanks again for the review,
Alex
>
> >
> > /*
> > * ZERO_PAGE is a global shared page that is always zero: used
> > diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> > index 430b208c0130..84439fe6ed29 100644
> > --- a/arch/mips/include/asm/pgtable.h
> > +++ b/arch/mips/include/asm/pgtable.h
> > @@ -478,9 +478,9 @@ static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
> > return __pgprot(prot);
> > }
> >
> > -static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
> > - unsigned long address,
> > - pte_t *ptep)
> > +static inline void flush_tlb_fix_spurious_write_fault(struct vm_area_struct *vma,
> > + unsigned long address,
> > + pte_t *ptep)
> > {
> > }
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > index 1950c1b825b4..7166d56f90db 100644
> > --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > @@ -128,10 +128,10 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
> > #define flush_tlb_page(vma, addr) local_flush_tlb_page(vma, addr)
> > #endif /* CONFIG_SMP */
> >
> > -#define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
> > -static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
> > - unsigned long address,
> > - pte_t *ptep)
> > +#define flush_tlb_fix_spurious_write_fault flush_tlb_fix_spurious_write_fault
> > +static inline void flush_tlb_fix_spurious_write_fault(struct vm_area_struct *vma,
> > + unsigned long address,
> > + pte_t *ptep)
> > {
> > /*
> > * Book3S 64 does not require spurious fault flushes because the PTE
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index b2ba3f79cfe9..89aa5650f104 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -472,28 +472,20 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
> > struct vm_area_struct *vma, unsigned long address,
> > pte_t *ptep, unsigned int nr)
> > {
> > - /*
> > - * The kernel assumes that TLBs don't cache invalid entries, but
> > - * in RISC-V, SFENCE.VMA specifies an ordering constraint, not a
> > - * cache flush; it is necessary even after writing invalid entries.
> > - * Relying on flush_tlb_fix_spurious_fault would suffice, but
> > - * the extra traps reduce performance. So, eagerly SFENCE.VMA.
> > - */
> > - while (nr--)
> > - local_flush_tlb_page(address + nr * PAGE_SIZE);
> > }
> > #define update_mmu_cache(vma, addr, ptep) \
> > update_mmu_cache_range(NULL, vma, addr, ptep, 1)
> >
> > #define __HAVE_ARCH_UPDATE_MMU_TLB
> > -#define update_mmu_tlb update_mmu_cache
> > +static inline void update_mmu_tlb(struct vm_area_struct *vma,
> > + unsigned long address, pte_t *ptep)
> > +{
> > + flush_tlb_range(vma, address, address + PAGE_SIZE);
> > +}
> >
> > static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> > unsigned long address, pmd_t *pmdp)
> > {
> > - pte_t *ptep = (pte_t *)pmdp;
> > -
> > - update_mmu_cache(vma, address, ptep);
> > }
> >
> > #define __HAVE_ARCH_PTE_SAME
> > @@ -548,13 +540,26 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
> > unsigned long address, pte_t *ptep,
> > pte_t entry, int dirty)
> > {
> > - if (!pte_same(*ptep, entry))
> > + if (!pte_same(*ptep, entry)) {
> > __set_pte_at(ptep, entry);
> > - /*
> > - * update_mmu_cache will unconditionally execute, handling both
> > - * the case that the PTE changed and the spurious fault case.
> > - */
> > - return true;
> > + /* Here only not svadu is impacted */
> > + flush_tlb_page(vma, address);
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > +extern u64 nr_sfence_vma_handle_exception;
> > +extern bool tlb_caching_invalid_entries;
> > +
> > +#define flush_tlb_fix_spurious_read_fault flush_tlb_fix_spurious_read_fault
> > +static inline void flush_tlb_fix_spurious_read_fault(struct vm_area_struct *vma,
> > + unsigned long address,
> > + pte_t *ptep)
> > +{
> > + if (tlb_caching_invalid_entries)
> > + flush_tlb_page(vma, address);
> > }
> >
> > #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
> > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> > index af7639c3b0a3..7abaf42ef612 100644
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -931,8 +931,12 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
> > # define pte_accessible(mm, pte) ((void)(pte), 1)
> > #endif
> >
> > -#ifndef flush_tlb_fix_spurious_fault
> > -#define flush_tlb_fix_spurious_fault(vma, address, ptep) flush_tlb_page(vma, address)
> > +#ifndef flush_tlb_fix_spurious_write_fault
> > +#define flush_tlb_fix_spurious_write_fault(vma, address, ptep) flush_tlb_page(vma, address)
> > +#endif
> > +
> > +#ifndef flush_tlb_fix_spurious_read_fault
> > +#define flush_tlb_fix_spurious_read_fault(vma, address, ptep)
> > #endif
> >
> > /*
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 517221f01303..5cb0ccf0c03f 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5014,8 +5014,16 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> > * with threads.
> > */
> > if (vmf->flags & FAULT_FLAG_WRITE)
> > - flush_tlb_fix_spurious_fault(vmf->vma, vmf->address,
> > - vmf->pte);
> > + flush_tlb_fix_spurious_write_fault(vmf->vma, vmf->address,
> > + vmf->pte);
> > + else
> > + /*
> > + * With the pte_same(ptep_get(vmf->pte), entry) check
> > + * that calls update_mmu_tlb() above, multiple threads
> > + * faulting at the same time won't get there.
> > + */
> > + flush_tlb_fix_spurious_read_fault(vmf->vma, vmf->address,
> > + vmf->pte);
> > }
> > unlock:
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
next prev parent reply other threads:[~2023-12-08 14:39 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-07 15:03 [PATCH RFC/RFT 0/4] Remove preventive sfence.vma Alexandre Ghiti
2023-12-07 15:03 ` [PATCH RFC/RFT 1/4] riscv: Stop emitting preventive sfence.vma for new vmalloc mappings Alexandre Ghiti
2023-12-07 15:52 ` Christophe Leroy
2023-12-08 14:28 ` Alexandre Ghiti
2023-12-07 15:03 ` [PATCH RFC/RFT 2/4] riscv: Add a runtime detection of invalid TLB entries caching Alexandre Ghiti
2023-12-07 15:55 ` Christophe Leroy
2023-12-08 14:30 ` Alexandre Ghiti
2023-12-07 15:03 ` [PATCH RFC/RFT 3/4] riscv: Stop emitting preventive sfence.vma for new userspace mappings Alexandre Ghiti
2023-12-07 16:37 ` Christophe Leroy
2023-12-08 14:39 ` Alexandre Ghiti [this message]
2023-12-07 15:03 ` [PATCH RFC/RFT 4/4] TEMP: riscv: Add debugfs interface to retrieve #sfence.vma Alexandre Ghiti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHVXubit2NjBREKFb=s14q13PvtcjuthLn2oKnBq8PfFnVMieQ@mail.gmail.com' \
--to=alexghiti@rivosinc.com \
--cc=akpm@linux-foundation.org \
--cc=aou@eecs.berkeley.edu \
--cc=catalin.marinas@arm.com \
--cc=christophe.leroy@csgroup.eu \
--cc=dylan@andestech.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mips@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-riscv@lists.infradead.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mev@rivosinc.com \
--cc=mpe@ellerman.id.au \
--cc=npiggin@gmail.com \
--cc=palmer@dabbelt.com \
--cc=paul.walmsley@sifive.com \
--cc=tsbogend@alpha.franken.de \
--cc=ved@rivosinc.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).