public inbox for linux-riscv@lists.infradead.org
 help / color / mirror / Atom feed
From: mchitale@ventanamicro.com
To: Alexandre Ghiti <alex@ghiti.fr>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	 Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	linux-riscv@lists.infradead.org
Cc: Nanyong Sun <sunnanyong@huawei.com>,
	Anup Patel <apatel@ventanamicro.com>,
	 Andrew Jones <ajones@ventanamicro.com>
Subject: Re: [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP
Date: Mon, 30 Jan 2023 12:59:42 +0530	[thread overview]
Message-ID: <80649c8a017f7e5cbb4b23d7625bbb4737bfe5db.camel@ventanamicro.com> (raw)
In-Reply-To: <117752cf-c02b-4264-87c3-42c81aa429f7@ghiti.fr>

On Thu, 2023-01-26 at 16:33 +0100, Alexandre Ghiti wrote:
> Hi Mayuresh,
> 
> On 1/25/23 13:55, Mayuresh Chitale wrote:
> > When THP is enabled, 4K pages are collapsed into a single huge
> > page using the generic pmdp_collapse_flush() which will further
> > use flush_tlb_range() to shoot-down stale TLB entries.
> > Unfortunately,
> > the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
> > using address specific SFENCEs which results in repetitive (or
> > unpredictable) page faults on RISC-V implementations which cache
> > non-leaf PTEs.
> 
> That's interesting! I'm wondering if the same issue will happen if a 
> user maps 4K, unmaps it and at the same address maps a 2MB hugepage:
> I'm 
> not sure the mm code would correctly flush the non-leaf PTE when 
> unmapping the 4KB page. In that case, your patch only fixes the THP 
> usecase and maybe we should try to catch this non-leaf -> leaf
> upgrade 
> at some lower level page table functions, what do you think?

I will look into it but I dont know how to reproduce the issue without
the THP use case. It would be great if you could share the test case or
test code to reproduce it.

> 
> Alex
> 
> 
> > Provide a RISC-V specific pmdp_collapse_flush() which ensures both
> > cached leaf and non-leaf PTEs are invalidated by using non-address
> > specific SFENCEs as recommended by the RISC-V privileged
> > specification.
> > 
> > Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
> > Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> > ---
> >   arch/riscv/include/asm/pgtable.h | 24 ++++++++++++++++++++++++
> >   1 file changed, 24 insertions(+)
> > 
> > diff --git a/arch/riscv/include/asm/pgtable.h
> > b/arch/riscv/include/asm/pgtable.h
> > index 4eba9a98d0e3..6d948dec6020 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -721,6 +721,30 @@ static inline pmd_t pmdp_establish(struct
> > vm_area_struct *vma,
> >   	page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
> >   	return __pmd(atomic_long_xchg((atomic_long_t *)pmdp,
> > pmd_val(pmd)));
> >   }
> > +
> > +#define pmdp_collapse_flush pmdp_collapse_flush
> > +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct
> > *vma,
> > +					unsigned long address, pmd_t
> > *pmdp)
> > +{
> > +	pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
> > +
> > +	/*
> > +	 * When leaf PTE enteries (regular pages) are collapsed into a
> > leaf
> > +	 * PMD entry (huge page), a valid non-leaf PTE is converted
> > into a
> > +	 * valid leaf PTE at the level 1 page table. The RISC-V
> > privileged v1.12
> > +	 * specification allows implementations to cache valid non-leaf 
> > PTEs,
> > +	 * but the section "4.2.1 Supervisor Memory-Management Fence
> > +	 * Instruction" recommends the following:
> > +	 * "If software modifies a non-leaf PTE, it should execute
> > SFENCE.VMA
> > +	 * with rs1=x0. If any PTE along the traversal path had its G
> > bit set,
> > +	 * rs2 must be x0; otherwise, rs2 should be set to the ASID for
> > which
> > +	 * the translation is being modified."
> > +	 * Based on the above recommendation, we should do full flush
> > whenever
> > +	 * leaf PTE entries are collapsed into a leaf PMD entry.
> > +	 */
> > +	flush_tlb_mm(vma->vm_mm);
> > +	return pmd;
> > +}
> >   #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >   
> >   /*
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

      parent reply	other threads:[~2023-01-30  7:30 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-25 12:55 [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP Mayuresh Chitale
2023-01-25 15:35 ` Andrew Jones
2023-01-30  7:26   ` mchitale
2023-01-26 15:33 ` Alexandre Ghiti
2023-01-26 18:14   ` Anup Patel
2023-01-27  8:41     ` Alexandre Ghiti
2023-01-30  7:34       ` mchitale
2023-01-30  7:29   ` mchitale [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=80649c8a017f7e5cbb4b23d7625bbb4737bfe5db.camel@ventanamicro.com \
    --to=mchitale@ventanamicro.com \
    --cc=ajones@ventanamicro.com \
    --cc=alex@ghiti.fr \
    --cc=aou@eecs.berkeley.edu \
    --cc=apatel@ventanamicro.com \
    --cc=linux-riscv@lists.infradead.org \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=sunnanyong@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox