public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dylan Jhong <dylan@andestech.com>
To: Alexandre Ghiti <alex@ghiti.fr>
Cc: <paul.walmsley@sifive.com>, <palmer@dabbelt.com>,
	<aou@eecs.berkeley.edu>, <ajones@ventanamicro.com>,
	<alexghiti@rivosinc.com>, <anup@brainfault.org>,
	<rppt@kernel.org>, <samuel@sholland.org>,
	<panqinglin2020@iscas.ac.cn>, <sergey.matyukevich@syntacore.com>,
	<maz@kernel.org>, <linux-riscv@lists.infradead.org>,
	<conor.dooley@microchip.com>, <linux-kernel@vger.kernel.org>,
	<ycliang@andestech.com>, <x5710999x@gmail.com>,
	<tim609@andestech.com>
Subject: Re: [PATCH 1/1] riscv: Implement arch_sync_kernel_mappings() for "preventive" TLB flush
Date: Wed, 9 Aug 2023 19:16:00 +0800	[thread overview]
Message-ID: <ZNN1cEO1jMrPlPfj@atctrx.andestech.com> (raw)
In-Reply-To: <5681817e-2751-0166-b823-df03aebedf9f@ghiti.fr>

On Tue, Aug 08, 2023 at 12:16:50PM +0200, Alexandre Ghiti wrote:
> Hi Dylan,
> 
> On 07/08/2023 10:23, Dylan Jhong wrote:
> > Since RISC-V is a microarchitecture that allows caching invalid entries in the TLB,
> > it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
> > the correct kernel mapping.
> > 
> > The patch implements TLB flushing in arch_sync_kernel_mappings(), ensuring that kernel
> > page table mappings created via vmap/vmalloc() are updated before switching MM.
> > 
> > Signed-off-by: Dylan Jhong <dylan@andestech.com>
> > ---
> >   arch/riscv/include/asm/page.h |  2 ++
> >   arch/riscv/mm/tlbflush.c      | 12 ++++++++++++
> >   2 files changed, 14 insertions(+)
> > 
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index b55ba20903ec..6c86ab69687e 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -21,6 +21,8 @@
> >   #define HPAGE_MASK              (~(HPAGE_SIZE - 1))
> >   #define HUGETLB_PAGE_ORDER      (HPAGE_SHIFT - PAGE_SHIFT)
> > +#define ARCH_PAGE_TABLE_SYNC_MASK	PGTBL_PTE_MODIFIED
> > +
> >   /*
> >    * PAGE_OFFSET -- the first address of the first page of memory.
> >    * When not using MMU this corresponds to the first free page in
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index 77be59aadc73..d63364948c85 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -149,3 +149,15 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> >   	__flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
> >   }
> >   #endif
> > +
> > +/*
> > + * Since RISC-V is a microarchitecture that allows caching invalid entries in the TLB,
> > + * it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
> > + * the correct kernel mapping. arch_sync_kernel_mappings() will ensure that kernel
> > + * page table mappings created via vmap/vmalloc() are updated before switching MM.
> > + */
> > +void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
> > +{
> > +	if (start < VMALLOC_END && end > VMALLOC_START)
> 
> 
> This test is too restrictive, it should catch the range [MODULES_VADDR; 
> MODULES_END[ too, sorry I did not notice that at first.
> 
> 
> > +		flush_tlb_all();
> > +}
> > \ No newline at end of file
> 
> 
> I have to admit that I *think* both your patch and mine are wrong: one of
> the problem that led to the removal of vmalloc_fault() is the possibility
> for tracing functions to actually allocate vmalloc regions in the vmalloc
> page fault path, which could give rise to nested exceptions (see
> https://lore.kernel.org/lkml/20200508144043.13893-1-joro@8bytes.org/).
> 
> Here, everytime we allocate a vmalloc region, we send an IPI. If a vmalloc
> allocation happens in this path (if it is traced for example), it will give
> rise to an IPI...and so on.
> 
> So I came to the conclusion that the only way to actually fix this issue is
> by resolving the vmalloc faults very early in the page fault path (by
> emitting a sfence.vma on uarch that cache invalid entries), before the
> kernel stack is even accessed. That's the best solution since it would
> completely remove all the preventive sfence.vma in
> flush_cache_vmap()/arch_sync_kernel_mappings(), we would rely on faulting
> which I assume should not happen a lot (?).
> 

Hi Alex,

Agree. 

If we could introduce a "new vmalloc_fault()" function before accessing the kernel stack,
which would trigger an SFENCE.VMA instruction, then each time we call vmalloc() or vmap()
to create new kernel mappings, we wouldn't need to execute flush_cache_vmap() or
arch_sync_kernel_mappings() to update the TLB. This should be able to balance both
performance and correctness.

> I'm implementing this solution, but I'm pretty sure it won't be ready for
> 6.5. In the meantime, we need either your patch or mine to fix your issue...
> 

If there are no others reporting this issues, I believe encountering this TLB flush problem
might not be so common. Perhaps we could wait until you've finished implementing the
"new vmalloc_fault()" feature. If anyone encounters problems in the meantime, I think they
can temporarily apply either my patch or yours to workaround the issue of updating TLB for
vmalloc.

Best regards,
Dylan Jhong


      reply	other threads:[~2023-08-09 11:16 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-07  8:23 [PATCH 0/1] Enhanced TLB flushing for vmap/vmalloc() Dylan Jhong
2023-08-07  8:23 ` [PATCH 1/1] riscv: Implement arch_sync_kernel_mappings() for "preventive" TLB flush Dylan Jhong
2023-08-07  9:35   ` kernel test robot
2023-08-07 12:28   ` kernel test robot
2023-08-08 10:16   ` Alexandre Ghiti
2023-08-09 11:16     ` Dylan Jhong [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZNN1cEO1jMrPlPfj@atctrx.andestech.com \
    --to=dylan@andestech.com \
    --cc=ajones@ventanamicro.com \
    --cc=alex@ghiti.fr \
    --cc=alexghiti@rivosinc.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=conor.dooley@microchip.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=maz@kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=panqinglin2020@iscas.ac.cn \
    --cc=paul.walmsley@sifive.com \
    --cc=rppt@kernel.org \
    --cc=samuel@sholland.org \
    --cc=sergey.matyukevich@syntacore.com \
    --cc=tim609@andestech.com \
    --cc=x5710999x@gmail.com \
    --cc=ycliang@andestech.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox