* [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP
@ 2023-01-25 12:55 Mayuresh Chitale
2023-01-25 15:35 ` Andrew Jones
2023-01-26 15:33 ` Alexandre Ghiti
0 siblings, 2 replies; 8+ messages in thread
From: Mayuresh Chitale @ 2023-01-25 12:55 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv
Cc: Mayuresh Chitale, Nanyong Sun, Anup Patel, Andrew Jones
When THP is enabled, 4K pages are collapsed into a single huge
page using the generic pmdp_collapse_flush() which will further
use flush_tlb_range() to shoot-down stale TLB entries. Unfortunately,
the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
using address specific SFENCEs which results in repetitive (or
unpredictable) page faults on RISC-V implementations which cache
non-leaf PTEs.
Provide a RISC-V specific pmdp_collapse_flush() which ensures both
cached leaf and non-leaf PTEs are invalidated by using non-address
specific SFENCEs as recommended by the RISC-V privileged specification.
Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
---
arch/riscv/include/asm/pgtable.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 4eba9a98d0e3..6d948dec6020 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -721,6 +721,30 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
}
+
+#define pmdp_collapse_flush pmdp_collapse_flush
+static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
+{
+ pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
+
+ /*
+ * When leaf PTE enteries (regular pages) are collapsed into a leaf
+ * PMD entry (huge page), a valid non-leaf PTE is converted into a
+ * valid leaf PTE at the level 1 page table. The RISC-V privileged v1.12
+ * specification allows implementations to cache valid non-leaf PTEs,
+ * but the section "4.2.1 Supervisor Memory-Management Fence
+ * Instruction" recommends the following:
+ * "If software modifies a non-leaf PTE, it should execute SFENCE.VMA
+ * with rs1=x0. If any PTE along the traversal path had its G bit set,
+ * rs2 must be x0; otherwise, rs2 should be set to the ASID for which
+ * the translation is being modified."
+ * Based on the above recommendation, we should do full flush whenever
+ * leaf PTE entries are collapsed into a leaf PMD entry.
+ */
+ flush_tlb_mm(vma->vm_mm);
+ return pmd;
+}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
/*
--
2.34.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP
2023-01-25 12:55 [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP Mayuresh Chitale
@ 2023-01-25 15:35 ` Andrew Jones
2023-01-30 7:26 ` mchitale
2023-01-26 15:33 ` Alexandre Ghiti
1 sibling, 1 reply; 8+ messages in thread
From: Andrew Jones @ 2023-01-25 15:35 UTC (permalink / raw)
To: Mayuresh Chitale
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv,
Nanyong Sun, Anup Patel
On Wed, Jan 25, 2023 at 06:25:11PM +0530, Mayuresh Chitale wrote:
> When THP is enabled, 4K pages are collapsed into a single huge
> page using the generic pmdp_collapse_flush() which will further
> use flush_tlb_range() to shoot-down stale TLB entries. Unfortunately,
> the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
> using address specific SFENCEs which results in repetitive (or
> unpredictable) page faults on RISC-V implementations which cache
> non-leaf PTEs.
>
> Provide a RISC-V specific pmdp_collapse_flush() which ensures both
> cached leaf and non-leaf PTEs are invalidated by using non-address
> specific SFENCEs as recommended by the RISC-V privileged specification.
>
> Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> ---
> arch/riscv/include/asm/pgtable.h | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 4eba9a98d0e3..6d948dec6020 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -721,6 +721,30 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
> return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
> }
> +
> +#define pmdp_collapse_flush pmdp_collapse_flush
> +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
> + unsigned long address, pmd_t *pmdp)
> +{
> + pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
The generic version of this function has a couple sanity checks for
kernels compiled with CONFIG_DEBUG_VM. Shouldn't we duplicate those?
Thanks,
drew
> +
> + /*
> + * When leaf PTE enteries (regular pages) are collapsed into a leaf
> + * PMD entry (huge page), a valid non-leaf PTE is converted into a
> + * valid leaf PTE at the level 1 page table. The RISC-V privileged v1.12
> + * specification allows implementations to cache valid non-leaf PTEs,
> + * but the section "4.2.1 Supervisor Memory-Management Fence
> + * Instruction" recommends the following:
> + * "If software modifies a non-leaf PTE, it should execute SFENCE.VMA
> + * with rs1=x0. If any PTE along the traversal path had its G bit set,
> + * rs2 must be x0; otherwise, rs2 should be set to the ASID for which
> + * the translation is being modified."
> + * Based on the above recommendation, we should do full flush whenever
> + * leaf PTE entries are collapsed into a leaf PMD entry.
> + */
> + flush_tlb_mm(vma->vm_mm);
> + return pmd;
> +}
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> /*
> --
> 2.34.1
>
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP
2023-01-25 15:35 ` Andrew Jones
@ 2023-01-30 7:26 ` mchitale
0 siblings, 0 replies; 8+ messages in thread
From: mchitale @ 2023-01-30 7:26 UTC (permalink / raw)
To: Andrew Jones
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv,
Nanyong Sun, Anup Patel
On Wed, 2023-01-25 at 16:35 +0100, Andrew Jones wrote:
> On Wed, Jan 25, 2023 at 06:25:11PM +0530, Mayuresh Chitale wrote:
> > When THP is enabled, 4K pages are collapsed into a single huge
> > page using the generic pmdp_collapse_flush() which will further
> > use flush_tlb_range() to shoot-down stale TLB entries.
> > Unfortunately,
> > the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
> > using address specific SFENCEs which results in repetitive (or
> > unpredictable) page faults on RISC-V implementations which cache
> > non-leaf PTEs.
> >
> > Provide a RISC-V specific pmdp_collapse_flush() which ensures both
> > cached leaf and non-leaf PTEs are invalidated by using non-address
> > specific SFENCEs as recommended by the RISC-V privileged
> > specification.
> >
> > Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
> > Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> > ---
> > arch/riscv/include/asm/pgtable.h | 24 ++++++++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/pgtable.h
> > b/arch/riscv/include/asm/pgtable.h
> > index 4eba9a98d0e3..6d948dec6020 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -721,6 +721,30 @@ static inline pmd_t pmdp_establish(struct
> > vm_area_struct *vma,
> > page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
> > return __pmd(atomic_long_xchg((atomic_long_t *)pmdp,
> > pmd_val(pmd)));
> > }
> > +
> > +#define pmdp_collapse_flush pmdp_collapse_flush
> > +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct
> > *vma,
> > + unsigned long address, pmd_t
> > *pmdp)
> > +{
> > + pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
>
> The generic version of this function has a couple sanity checks for
> kernels compiled with CONFIG_DEBUG_VM. Shouldn't we duplicate those?
Ok. I will add those checks too.
>
> Thanks,
> drew
>
> > +
> > + /*
> > + * When leaf PTE enteries (regular pages) are collapsed into a
> > leaf
> > + * PMD entry (huge page), a valid non-leaf PTE is converted
> > into a
> > + * valid leaf PTE at the level 1 page table. The RISC-V
> > privileged v1.12
> > + * specification allows implementations to cache valid non-leaf
> > PTEs,
> > + * but the section "4.2.1 Supervisor Memory-Management Fence
> > + * Instruction" recommends the following:
> > + * "If software modifies a non-leaf PTE, it should execute
> > SFENCE.VMA
> > + * with rs1=x0. If any PTE along the traversal path had its G
> > bit set,
> > + * rs2 must be x0; otherwise, rs2 should be set to the ASID for
> > which
> > + * the translation is being modified."
> > + * Based on the above recommendation, we should do full flush
> > whenever
> > + * leaf PTE entries are collapsed into a leaf PMD entry.
> > + */
> > + flush_tlb_mm(vma->vm_mm);
> > + return pmd;
> > +}
> > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >
> > /*
> > --
> > 2.34.1
> >
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP
2023-01-25 12:55 [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP Mayuresh Chitale
2023-01-25 15:35 ` Andrew Jones
@ 2023-01-26 15:33 ` Alexandre Ghiti
2023-01-26 18:14 ` Anup Patel
2023-01-30 7:29 ` mchitale
1 sibling, 2 replies; 8+ messages in thread
From: Alexandre Ghiti @ 2023-01-26 15:33 UTC (permalink / raw)
To: Mayuresh Chitale, Paul Walmsley, Palmer Dabbelt, Albert Ou,
linux-riscv
Cc: Nanyong Sun, Anup Patel, Andrew Jones
Hi Mayuresh,
On 1/25/23 13:55, Mayuresh Chitale wrote:
> When THP is enabled, 4K pages are collapsed into a single huge
> page using the generic pmdp_collapse_flush() which will further
> use flush_tlb_range() to shoot-down stale TLB entries. Unfortunately,
> the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
> using address specific SFENCEs which results in repetitive (or
> unpredictable) page faults on RISC-V implementations which cache
> non-leaf PTEs.
That's interesting! I'm wondering if the same issue will happen if a
user maps 4K, unmaps it and at the same address maps a 2MB hugepage: I'm
not sure the mm code would correctly flush the non-leaf PTE when
unmapping the 4KB page. In that case, your patch only fixes the THP
usecase and maybe we should try to catch this non-leaf -> leaf upgrade
at some lower level page table functions, what do you think?
Alex
> Provide a RISC-V specific pmdp_collapse_flush() which ensures both
> cached leaf and non-leaf PTEs are invalidated by using non-address
> specific SFENCEs as recommended by the RISC-V privileged specification.
>
> Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> ---
> arch/riscv/include/asm/pgtable.h | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 4eba9a98d0e3..6d948dec6020 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -721,6 +721,30 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
> return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
> }
> +
> +#define pmdp_collapse_flush pmdp_collapse_flush
> +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
> + unsigned long address, pmd_t *pmdp)
> +{
> + pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
> +
> + /*
> + * When leaf PTE enteries (regular pages) are collapsed into a leaf
> + * PMD entry (huge page), a valid non-leaf PTE is converted into a
> + * valid leaf PTE at the level 1 page table. The RISC-V privileged v1.12
> + * specification allows implementations to cache valid non-leaf PTEs,
> + * but the section "4.2.1 Supervisor Memory-Management Fence
> + * Instruction" recommends the following:
> + * "If software modifies a non-leaf PTE, it should execute SFENCE.VMA
> + * with rs1=x0. If any PTE along the traversal path had its G bit set,
> + * rs2 must be x0; otherwise, rs2 should be set to the ASID for which
> + * the translation is being modified."
> + * Based on the above recommendation, we should do full flush whenever
> + * leaf PTE entries are collapsed into a leaf PMD entry.
> + */
> + flush_tlb_mm(vma->vm_mm);
> + return pmd;
> +}
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> /*
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP
2023-01-26 15:33 ` Alexandre Ghiti
@ 2023-01-26 18:14 ` Anup Patel
2023-01-27 8:41 ` Alexandre Ghiti
2023-01-30 7:29 ` mchitale
1 sibling, 1 reply; 8+ messages in thread
From: Anup Patel @ 2023-01-26 18:14 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: Mayuresh Chitale, Paul Walmsley, Palmer Dabbelt, Albert Ou,
linux-riscv, Nanyong Sun, Andrew Jones
On Thu, Jan 26, 2023 at 9:03 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> Hi Mayuresh,
>
> On 1/25/23 13:55, Mayuresh Chitale wrote:
> > When THP is enabled, 4K pages are collapsed into a single huge
> > page using the generic pmdp_collapse_flush() which will further
> > use flush_tlb_range() to shoot-down stale TLB entries. Unfortunately,
> > the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
> > using address specific SFENCEs which results in repetitive (or
> > unpredictable) page faults on RISC-V implementations which cache
> > non-leaf PTEs.
>
>
> That's interesting! I'm wondering if the same issue will happen if a
> user maps 4K, unmaps it and at the same address maps a 2MB hugepage: I'm
> not sure the mm code would correctly flush the non-leaf PTE when
> unmapping the 4KB page. In that case, your patch only fixes the THP
> usecase and maybe we should try to catch this non-leaf -> leaf upgrade
> at some lower level page table functions, what do you think?
This issue can happen whenever existing/valid non-leaf PTE is modified.
We hit this issue in the THP case but we can also hit this issue in other
scenarios where the page table programming pattern is similar.
Regards,
Anup
>
> Alex
>
>
> > Provide a RISC-V specific pmdp_collapse_flush() which ensures both
> > cached leaf and non-leaf PTEs are invalidated by using non-address
> > specific SFENCEs as recommended by the RISC-V privileged specification.
> >
> > Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
> > Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> > ---
> > arch/riscv/include/asm/pgtable.h | 24 ++++++++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 4eba9a98d0e3..6d948dec6020 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -721,6 +721,30 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> > page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
> > return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
> > }
> > +
> > +#define pmdp_collapse_flush pmdp_collapse_flush
> > +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
> > + unsigned long address, pmd_t *pmdp)
> > +{
> > + pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
> > +
> > + /*
> > + * When leaf PTE enteries (regular pages) are collapsed into a leaf
> > + * PMD entry (huge page), a valid non-leaf PTE is converted into a
> > + * valid leaf PTE at the level 1 page table. The RISC-V privileged v1.12
> > + * specification allows implementations to cache valid non-leaf PTEs,
> > + * but the section "4.2.1 Supervisor Memory-Management Fence
> > + * Instruction" recommends the following:
> > + * "If software modifies a non-leaf PTE, it should execute SFENCE.VMA
> > + * with rs1=x0. If any PTE along the traversal path had its G bit set,
> > + * rs2 must be x0; otherwise, rs2 should be set to the ASID for which
> > + * the translation is being modified."
> > + * Based on the above recommendation, we should do full flush whenever
> > + * leaf PTE entries are collapsed into a leaf PMD entry.
> > + */
> > + flush_tlb_mm(vma->vm_mm);
> > + return pmd;
> > +}
> > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >
> > /*
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP
2023-01-26 18:14 ` Anup Patel
@ 2023-01-27 8:41 ` Alexandre Ghiti
2023-01-30 7:34 ` mchitale
0 siblings, 1 reply; 8+ messages in thread
From: Alexandre Ghiti @ 2023-01-27 8:41 UTC (permalink / raw)
To: Anup Patel
Cc: Mayuresh Chitale, Paul Walmsley, Palmer Dabbelt, Albert Ou,
linux-riscv, Nanyong Sun, Andrew Jones
On 1/26/23 19:14, Anup Patel wrote:
> On Thu, Jan 26, 2023 at 9:03 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>> Hi Mayuresh,
>>
>> On 1/25/23 13:55, Mayuresh Chitale wrote:
>>> When THP is enabled, 4K pages are collapsed into a single huge
>>> page using the generic pmdp_collapse_flush() which will further
>>> use flush_tlb_range() to shoot-down stale TLB entries. Unfortunately,
>>> the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
>>> using address specific SFENCEs which results in repetitive (or
>>> unpredictable) page faults on RISC-V implementations which cache
>>> non-leaf PTEs.
>>
>> That's interesting! I'm wondering if the same issue will happen if a
>> user maps 4K, unmaps it and at the same address maps a 2MB hugepage: I'm
>> not sure the mm code would correctly flush the non-leaf PTE when
>> unmapping the 4KB page. In that case, your patch only fixes the THP
>> usecase and maybe we should try to catch this non-leaf -> leaf upgrade
>> at some lower level page table functions, what do you think?
> This issue can happen whenever existing/valid non-leaf PTE is modified.
>
> We hit this issue in the THP case but we can also hit this issue in other
> scenarios where the page table programming pattern is similar.
Then what about trying to get all those cases at once? We can easily
catch those spurious page faults as the pte would be valid: we would
just have to flush_tlb_mm at the first page fault, if any.
>
> Regards,
> Anup
>
>> Alex
>>
>>
>>> Provide a RISC-V specific pmdp_collapse_flush() which ensures both
>>> cached leaf and non-leaf PTEs are invalidated by using non-address
>>> specific SFENCEs as recommended by the RISC-V privileged specification.
>>>
>>> Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
>>> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
>>> ---
>>> arch/riscv/include/asm/pgtable.h | 24 ++++++++++++++++++++++++
>>> 1 file changed, 24 insertions(+)
>>>
>>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>>> index 4eba9a98d0e3..6d948dec6020 100644
>>> --- a/arch/riscv/include/asm/pgtable.h
>>> +++ b/arch/riscv/include/asm/pgtable.h
>>> @@ -721,6 +721,30 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>>> page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
>>> return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
>>> }
>>> +
>>> +#define pmdp_collapse_flush pmdp_collapse_flush
>>> +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
>>> + unsigned long address, pmd_t *pmdp)
>>> +{
>>> + pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
>>> +
>>> + /*
>>> + * When leaf PTE enteries (regular pages) are collapsed into a leaf
>>> + * PMD entry (huge page), a valid non-leaf PTE is converted into a
>>> + * valid leaf PTE at the level 1 page table. The RISC-V privileged v1.12
>>> + * specification allows implementations to cache valid non-leaf PTEs,
>>> + * but the section "4.2.1 Supervisor Memory-Management Fence
>>> + * Instruction" recommends the following:
>>> + * "If software modifies a non-leaf PTE, it should execute SFENCE.VMA
>>> + * with rs1=x0. If any PTE along the traversal path had its G bit set,
>>> + * rs2 must be x0; otherwise, rs2 should be set to the ASID for which
>>> + * the translation is being modified."
>>> + * Based on the above recommendation, we should do full flush whenever
>>> + * leaf PTE entries are collapsed into a leaf PMD entry.
>>> + */
>>> + flush_tlb_mm(vma->vm_mm);
>>> + return pmd;
>>> +}
>>> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>>
>>> /*
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP
2023-01-27 8:41 ` Alexandre Ghiti
@ 2023-01-30 7:34 ` mchitale
0 siblings, 0 replies; 8+ messages in thread
From: mchitale @ 2023-01-30 7:34 UTC (permalink / raw)
To: Alexandre Ghiti, Anup Patel
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv,
Nanyong Sun, Andrew Jones
On Fri, 2023-01-27 at 09:41 +0100, Alexandre Ghiti wrote:
> On 1/26/23 19:14, Anup Patel wrote:
> > On Thu, Jan 26, 2023 at 9:03 PM Alexandre Ghiti <alex@ghiti.fr>
> > wrote:
> > > Hi Mayuresh,
> > >
> > > On 1/25/23 13:55, Mayuresh Chitale wrote:
> > > > When THP is enabled, 4K pages are collapsed into a single huge
> > > > page using the generic pmdp_collapse_flush() which will further
> > > > use flush_tlb_range() to shoot-down stale TLB entries.
> > > > Unfortunately,
> > > > the generic pmdp_collapse_flush() only invalidates cached leaf
> > > > PTEs
> > > > using address specific SFENCEs which results in repetitive (or
> > > > unpredictable) page faults on RISC-V implementations which
> > > > cache
> > > > non-leaf PTEs.
> > >
> > > That's interesting! I'm wondering if the same issue will happen
> > > if a
> > > user maps 4K, unmaps it and at the same address maps a 2MB
> > > hugepage: I'm
> > > not sure the mm code would correctly flush the non-leaf PTE when
> > > unmapping the 4KB page. In that case, your patch only fixes the
> > > THP
> > > usecase and maybe we should try to catch this non-leaf -> leaf
> > > upgrade
> > > at some lower level page table functions, what do you think?
> > This issue can happen whenever existing/valid non-leaf PTE is
> > modified.
> >
> > We hit this issue in the THP case but we can also hit this issue in
> > other
> > scenarios where the page table programming pattern is similar.
>
> Then what about trying to get all those cases at once? We can easily
> catch those spurious page faults as the pte would be valid: we would
> just have to flush_tlb_mm at the first page fault, if any.
IMO, we can consider fixing those cases in a separate patch.
>
>
> > Regards,
> > Anup
> >
> > > Alex
> > >
> > >
> > > > Provide a RISC-V specific pmdp_collapse_flush() which ensures
> > > > both
> > > > cached leaf and non-leaf PTEs are invalidated by using non-
> > > > address
> > > > specific SFENCEs as recommended by the RISC-V privileged
> > > > specification.
> > > >
> > > > Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
> > > > Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> > > > ---
> > > > arch/riscv/include/asm/pgtable.h | 24
> > > > ++++++++++++++++++++++++
> > > > 1 file changed, 24 insertions(+)
> > > >
> > > > diff --git a/arch/riscv/include/asm/pgtable.h
> > > > b/arch/riscv/include/asm/pgtable.h
> > > > index 4eba9a98d0e3..6d948dec6020 100644
> > > > --- a/arch/riscv/include/asm/pgtable.h
> > > > +++ b/arch/riscv/include/asm/pgtable.h
> > > > @@ -721,6 +721,30 @@ static inline pmd_t pmdp_establish(struct
> > > > vm_area_struct *vma,
> > > > page_table_check_pmd_set(vma->vm_mm, address, pmdp,
> > > > pmd);
> > > > return __pmd(atomic_long_xchg((atomic_long_t *)pmdp,
> > > > pmd_val(pmd)));
> > > > }
> > > > +
> > > > +#define pmdp_collapse_flush pmdp_collapse_flush
> > > > +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct
> > > > *vma,
> > > > + unsigned long address,
> > > > pmd_t *pmdp)
> > > > +{
> > > > + pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address,
> > > > pmdp);
> > > > +
> > > > + /*
> > > > + * When leaf PTE enteries (regular pages) are collapsed
> > > > into a leaf
> > > > + * PMD entry (huge page), a valid non-leaf PTE is
> > > > converted into a
> > > > + * valid leaf PTE at the level 1 page table. The RISC-V
> > > > privileged v1.12
> > > > + * specification allows implementations to cache valid
> > > > non-leaf PTEs,
> > > > + * but the section "4.2.1 Supervisor Memory-Management
> > > > Fence
> > > > + * Instruction" recommends the following:
> > > > + * "If software modifies a non-leaf PTE, it should
> > > > execute SFENCE.VMA
> > > > + * with rs1=x0. If any PTE along the traversal path had
> > > > its G bit set,
> > > > + * rs2 must be x0; otherwise, rs2 should be set to the
> > > > ASID for which
> > > > + * the translation is being modified."
> > > > + * Based on the above recommendation, we should do full
> > > > flush whenever
> > > > + * leaf PTE entries are collapsed into a leaf PMD entry.
> > > > + */
> > > > + flush_tlb_mm(vma->vm_mm);
> > > > + return pmd;
> > > > +}
> > > > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > > >
> > > > /*
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP
2023-01-26 15:33 ` Alexandre Ghiti
2023-01-26 18:14 ` Anup Patel
@ 2023-01-30 7:29 ` mchitale
1 sibling, 0 replies; 8+ messages in thread
From: mchitale @ 2023-01-30 7:29 UTC (permalink / raw)
To: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Albert Ou,
linux-riscv
Cc: Nanyong Sun, Anup Patel, Andrew Jones
On Thu, 2023-01-26 at 16:33 +0100, Alexandre Ghiti wrote:
> Hi Mayuresh,
>
> On 1/25/23 13:55, Mayuresh Chitale wrote:
> > When THP is enabled, 4K pages are collapsed into a single huge
> > page using the generic pmdp_collapse_flush() which will further
> > use flush_tlb_range() to shoot-down stale TLB entries.
> > Unfortunately,
> > the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
> > using address specific SFENCEs which results in repetitive (or
> > unpredictable) page faults on RISC-V implementations which cache
> > non-leaf PTEs.
>
> That's interesting! I'm wondering if the same issue will happen if a
> user maps 4K, unmaps it and at the same address maps a 2MB hugepage:
> I'm
> not sure the mm code would correctly flush the non-leaf PTE when
> unmapping the 4KB page. In that case, your patch only fixes the THP
> usecase and maybe we should try to catch this non-leaf -> leaf
> upgrade
> at some lower level page table functions, what do you think?
I will look into it but I dont know how to reproduce the issue without
the THP use case. It would be great if you could share the test case or
test code to reproduce it.
>
> Alex
>
>
> > Provide a RISC-V specific pmdp_collapse_flush() which ensures both
> > cached leaf and non-leaf PTEs are invalidated by using non-address
> > specific SFENCEs as recommended by the RISC-V privileged
> > specification.
> >
> > Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
> > Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> > ---
> > arch/riscv/include/asm/pgtable.h | 24 ++++++++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/pgtable.h
> > b/arch/riscv/include/asm/pgtable.h
> > index 4eba9a98d0e3..6d948dec6020 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -721,6 +721,30 @@ static inline pmd_t pmdp_establish(struct
> > vm_area_struct *vma,
> > page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
> > return __pmd(atomic_long_xchg((atomic_long_t *)pmdp,
> > pmd_val(pmd)));
> > }
> > +
> > +#define pmdp_collapse_flush pmdp_collapse_flush
> > +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct
> > *vma,
> > + unsigned long address, pmd_t
> > *pmdp)
> > +{
> > + pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
> > +
> > + /*
> > + * When leaf PTE enteries (regular pages) are collapsed into a
> > leaf
> > + * PMD entry (huge page), a valid non-leaf PTE is converted
> > into a
> > + * valid leaf PTE at the level 1 page table. The RISC-V
> > privileged v1.12
> > + * specification allows implementations to cache valid non-leaf
> > PTEs,
> > + * but the section "4.2.1 Supervisor Memory-Management Fence
> > + * Instruction" recommends the following:
> > + * "If software modifies a non-leaf PTE, it should execute
> > SFENCE.VMA
> > + * with rs1=x0. If any PTE along the traversal path had its G
> > bit set,
> > + * rs2 must be x0; otherwise, rs2 should be set to the ASID for
> > which
> > + * the translation is being modified."
> > + * Based on the above recommendation, we should do full flush
> > whenever
> > + * leaf PTE entries are collapsed into a leaf PMD entry.
> > + */
> > + flush_tlb_mm(vma->vm_mm);
> > + return pmd;
> > +}
> > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >
> > /*
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-01-30 7:35 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-25 12:55 [PATCH] riscv: mm: Implement pmdp_collapse_flush for THP Mayuresh Chitale
2023-01-25 15:35 ` Andrew Jones
2023-01-30 7:26 ` mchitale
2023-01-26 15:33 ` Alexandre Ghiti
2023-01-26 18:14 ` Anup Patel
2023-01-27 8:41 ` Alexandre Ghiti
2023-01-30 7:34 ` mchitale
2023-01-30 7:29 ` mchitale
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox