linux-mm.kvack.org archive mirror
* [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
@ 2025-03-21 13:06 Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 1/9] riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes Alexandre Ghiti
                   ` (10 more replies)
  0 siblings, 11 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-03-21 13:06 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm
  Cc: Alexandre Ghiti

This patchset intends to merge the contiguous ptes hugetlbfs implementation
of arm64 and riscv.

Both arm64 and riscv support the use of contiguous ptes to map pages that
are larger than the base page size; the feature is called contpte on arm64
and svnapot on riscv.

The riscv implementation differs from arm64's in that the LSBs of the
pfn of a svnapot pte are used to store the size of the mapping, allowing
for future sizes to be added (for now only 64KB is supported). That's an
issue for the core mm code, which expects to find the *real* pfn a pte points
to. Patch 2 fixes that by always returning svnapot ptes with the real pfn
and by restoring the size of the mapping when a pte is written to a page table.
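
To illustrate the encoding described above, here is a minimal user-space
sketch in C. It is not kernel code: PFN_SHIFT, PFN_MASK and N_BIT are
stand-ins modelled on the riscv _PAGE_PFN_SHIFT, _PAGE_PFN_MASK and
_PAGE_NAPOT definitions, the helper names are hypothetical, and
__builtin_ctzll() assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of a riscv64 pte: pfn in bits 53:10, N bit in bit 63. */
#define PFN_SHIFT 10
#define PFN_MASK  (((UINT64_C(1) << 44) - 1) << PFN_SHIFT)
#define N_BIT     (UINT64_C(1) << 63)

/* Extract the raw pfn field, whatever it currently encodes. */
static uint64_t napot_pfn_field(uint64_t pte)
{
	return (pte & PFN_MASK) >> PFN_SHIFT;
}

/*
 * Turn a pte carrying the real (aligned) pfn into what the hardware
 * expects: the low `order` pfn bits become the size marker 100..0 and
 * the N bit is set.
 */
static uint64_t napot_encode(uint64_t pte, unsigned int order)
{
	assert(order >= 1 && order < 44);

	uint64_t low = ((UINT64_C(1) << order) - 1) << PFN_SHIFT;
	uint64_t marker = UINT64_C(1) << (order - 1 + PFN_SHIFT);

	return (pte & ~low) | marker | N_BIT;
}

/* Recover the order from the position of the lowest set pfn bit. */
static unsigned int napot_order(uint64_t pte)
{
	uint64_t pfn = napot_pfn_field(pte);

	assert(pfn != 0);
	return (unsigned int)__builtin_ctzll(pfn) + 1;
}

/* Recover the pfn of the first page by clearing the size marker. */
static uint64_t napot_base_pfn(uint64_t pte)
{
	uint64_t pfn = napot_pfn_field(pte);

	return pfn & (pfn - 1);
}
```

With order 4 (16 ptes, i.e. 64KB with 4KB base pages), a pte whose real pfn
is 0x12340 is stored with a pfn field of 0x12348, and napot_base_pfn()
gives back 0x12340. The stored pfn field is what core mm would misread as
the pfn without the conversion.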

The remaining patches merge the two very similar implementations that
currently exist in arm64 and riscv. This paves the way for reusing the
recent contpte THP work by Ryan [1] instead of reimplementing it in riscv.

This patchset was tested by running the libhugetlbfs testsuite with 64KB
and 2MB pages on both architectures (on a 4KB base page size arm64 kernel).

[1] https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/

v4: https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
v3: https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
v2: https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
v1: https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/

Changes in v5:
  - Fix "int i" unused variable in patch 2 (as reported by PW)
  - Fix !svnapot build
  - Fix arch_make_huge_pte() which returned a real napot pte
  - Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot aware to
    avoid leaking real napot pfns to core mm
  - Fix arch_contpte_get_num_contig() that used to always try to get the
    mapping size from the ptep, which does not work if the ptep comes from
    the core mm
  - Rebase on top of 6.14-rc7 + fix for
    huge_ptep_get_and_clear()/huge_pte_clear()
    https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/

Changes in v4:
  - Rebase on top of 6.13

Changes in v3:
  - Split set_ptes and ptep_get into internal and external API (Ryan)
  - Rename ARCH_HAS_CONTPTE into ARCH_WANT_GENERAL_HUGETLB_CONTPTE so that
    we split hugetlb functions from contpte functions (actually riscv contpte
    functions to support THP will come into another series) (Ryan)
  - Rebase on top of 6.11-rc1

Changes in v2:
  - Rebase on top of 6.9-rc3

Alexandre Ghiti (9):
  riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
  riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
  mm: Use common huge_ptep_get() function for riscv/arm64
  mm: Use common set_huge_pte_at() function for riscv/arm64
  mm: Use common huge_pte_clear() function for riscv/arm64
  mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
  mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
  mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
  mm: Use common huge_ptep_clear_flush() function for riscv/arm64

 arch/arm64/Kconfig                  |   1 +
 arch/arm64/include/asm/hugetlb.h    |  22 +--
 arch/arm64/include/asm/pgtable.h    |  68 ++++++-
 arch/arm64/mm/hugetlbpage.c         | 294 +---------------------------
 arch/riscv/Kconfig                  |   1 +
 arch/riscv/include/asm/hugetlb.h    |  36 +---
 arch/riscv/include/asm/pgtable-64.h |  11 ++
 arch/riscv/include/asm/pgtable.h    | 222 ++++++++++++++++++---
 arch/riscv/mm/hugetlbpage.c         | 243 +----------------------
 arch/riscv/mm/pgtable.c             |   6 +-
 include/linux/hugetlb_contpte.h     |  39 ++++
 mm/Kconfig                          |   3 +
 mm/Makefile                         |   1 +
 mm/hugetlb_contpte.c                | 258 ++++++++++++++++++++++++
 14 files changed, 583 insertions(+), 622 deletions(-)
 create mode 100644 include/linux/hugetlb_contpte.h
 create mode 100644 mm/hugetlb_contpte.c

-- 
2.39.2




* [PATCH v5 1/9] riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
  2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
@ 2025-03-21 13:06 ` Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 2/9] riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code Alexandre Ghiti
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-03-21 13:06 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm
  Cc: Alexandre Ghiti

The pte_t pointer is expected to point to the first entry of the NAPOT
mapping, so there is no need to call huge_pte_offset(), similarly to what
is done in arm64.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/mm/hugetlbpage.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 375dd96bb4a0..3192ad804279 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -287,7 +287,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 
 	order = napot_cont_order(pte);
 	pte_num = napot_pte_num(order);
-	ptep = huge_pte_offset(mm, addr, napot_cont_size(order));
 	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
 
 	if (pte_dirty(orig_pte))
@@ -334,7 +333,6 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 
 	order = napot_cont_order(pte);
 	pte_num = napot_pte_num(order);
-	ptep = huge_pte_offset(mm, addr, napot_cont_size(order));
 	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
 
 	orig_pte = pte_wrprotect(orig_pte);
-- 
2.39.2




* [PATCH v5 2/9] riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
  2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 1/9] riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes Alexandre Ghiti
@ 2025-03-21 13:06 ` Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 3/9] mm: Use common huge_ptep_get() function for riscv/arm64 Alexandre Ghiti
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-03-21 13:06 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm
  Cc: Alexandre Ghiti

The core mm code expects to be able to extract the pfn from a pte. NAPOT
mappings work differently: all their ptes point to the first pfn of the
mapping, with the pfn LSBs being used to encode the size of the mapping.

So modify ptep_get() so that it returns a pte value that contains the
*real* pfn (which is then different from what the HW expects) and right
before storing the ptes to the page table, reset the pfn LSBs to the
size of the mapping.

And make sure that all NAPOT mappings are set using set_ptes().
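
The read/write conversion described above can be sketched in user space
as follows. This is only a model under stated assumptions: the page table
is a plain array, the helpers hw_to_core() and core_to_hw() are
hypothetical names, the bit layout constants are simplified stand-ins for
the riscv definitions, and the index of an entry inside its NAPOT range is
derived from the array index rather than from pointer alignment as the
patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PFN_SHIFT 10
#define PFN_MASK  (((UINT64_C(1) << 44) - 1) << PFN_SHIFT)
#define N_BIT     (UINT64_C(1) << 63)

/*
 * Write path: core mm hands over a pte with the N bit set and the real
 * pfn of the first page. Every entry of the range receives the same
 * value, with the pfn LSBs replaced by the size marker.
 */
static void core_to_hw(uint64_t *table, size_t first, uint64_t core_pte,
		       unsigned int order)
{
	assert(order >= 1 && order < 44);

	uint64_t low = ((UINT64_C(1) << order) - 1) << PFN_SHIFT;
	uint64_t marker = UINT64_C(1) << (order - 1 + PFN_SHIFT);
	uint64_t raw = (core_pte & ~low) | marker | N_BIT;

	for (size_t i = 0; i < ((size_t)1 << order); i++)
		table[first + i] = raw;
}

/*
 * Read path: replace the size marker with the real pfn of the entry
 * being read, i.e. the base pfn plus the entry's offset in the range.
 */
static uint64_t hw_to_core(const uint64_t *table, size_t idx)
{
	uint64_t raw = table[idx];
	uint64_t pfn = (raw & PFN_MASK) >> PFN_SHIFT;

	/* Not a NAPOT entry (or malformed): return it untouched. */
	if (!(raw & N_BIT) || pfn == 0)
		return raw;

	unsigned int order = (unsigned int)__builtin_ctzll(pfn) + 1;
	size_t first = idx & ~(((size_t)1 << order) - 1);
	uint64_t low = ((UINT64_C(1) << order) - 1) << PFN_SHIFT;

	return (raw & ~low) + ((uint64_t)(idx - first) << PFN_SHIFT);
}
```

In this model, writing a 16-entry range with base pfn 0x4560 stores the
pfn field 0x4568 in all 16 slots, while reading back slot 5 of the range
yields pfn 0x4565, which is the value core mm expects to see.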

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/include/asm/pgtable-64.h |  11 ++
 arch/riscv/include/asm/pgtable.h    | 155 ++++++++++++++++++++++++----
 arch/riscv/mm/hugetlbpage.c         |  15 ++-
 3 files changed, 152 insertions(+), 29 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
index 0897dd99ab8d..cddbe426f618 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -104,6 +104,17 @@ enum napot_cont_order {
 #define napot_cont_mask(order)	(~(napot_cont_size(order) - 1UL))
 #define napot_pte_num(order)	BIT(order)
 
+static inline bool is_napot_order(unsigned int order)
+{
+	unsigned int napot_order;
+
+	for_each_napot_order(napot_order)
+		if (order == napot_order)
+			return true;
+
+	return false;
+}
+
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
 #define HUGE_MAX_HSTATE		(2 + (NAPOT_ORDER_MAX - NAPOT_CONT_ORDER_BASE))
 #else
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 050fdc49b5ad..2e62d7e607db 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -296,6 +296,17 @@ static inline unsigned long pte_napot(pte_t pte)
 	return pte_val(pte) & _PAGE_NAPOT;
 }
 
+#define pte_valid_napot(pte)	(pte_present(pte) && pte_napot(pte))
+
+/*
+ * contpte is what we expose to the core mm code, this is not exactly a napot
+ * mapping since the size is not encoded in the pfn yet.
+ */
+static inline pte_t pte_mkcont(pte_t pte)
+{
+	return __pte(pte_val(pte) | _PAGE_NAPOT);
+}
+
 static inline pte_t pte_mknapot(pte_t pte, unsigned int order)
 {
 	int pos = order - 1 + _PAGE_PFN_SHIFT;
@@ -305,6 +316,12 @@ static inline pte_t pte_mknapot(pte_t pte, unsigned int order)
 	return __pte((pte_val(pte) & napot_mask) | napot_bit | _PAGE_NAPOT);
 }
 
+/* pte at entry must *not* encode the mapping size in the pfn LSBs. */
+static inline pte_t pte_clear_napot(pte_t pte)
+{
+	return __pte(pte_val(pte) & ~_PAGE_NAPOT);
+}
+
 #else
 
 static __always_inline bool has_svnapot(void) { return false; }
@@ -314,17 +331,24 @@ static inline unsigned long pte_napot(pte_t pte)
 	return 0;
 }
 
+static inline pte_t pte_clear_napot(pte_t pte)
+{
+	return pte;
+}
+
+static inline pte_t pte_mknapot(pte_t pte, unsigned int order)
+{
+	return pte;
+}
+
+#define pte_valid_napot(pte)	false
+
 #endif /* CONFIG_RISCV_ISA_SVNAPOT */
 
 /* Yields the page frame number (PFN) of a page table entry */
 static inline unsigned long pte_pfn(pte_t pte)
 {
-	unsigned long res  = __page_val_to_pfn(pte_val(pte));
-
-	if (has_svnapot() && pte_napot(pte))
-		res = res & (res - 1UL);
-
-	return res;
+	return __page_val_to_pfn(pte_val(pte));
 }
 
 #define pte_page(x)     pfn_to_page(pte_pfn(x))
@@ -559,8 +583,13 @@ static inline void __set_pte_at(struct mm_struct *mm, pte_t *ptep, pte_t pteval)
 
 #define PFN_PTE_SHIFT		_PAGE_PFN_SHIFT
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pteval, unsigned int nr)
+static inline pte_t ___ptep_get(pte_t *ptep)
+{
+	return READ_ONCE(*ptep);
+}
+
+static inline void ___set_ptes(struct mm_struct *mm, unsigned long addr,
+			       pte_t *ptep, pte_t pteval, unsigned int nr)
 {
 	page_table_check_ptes_set(mm, ptep, pteval, nr);
 
@@ -569,10 +598,13 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		if (--nr == 0)
 			break;
 		ptep++;
+
+		if (unlikely(pte_valid_napot(pteval)))
+			continue;
+
 		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
 	}
 }
-#define set_ptes set_ptes
 
 static inline void pte_clear(struct mm_struct *mm,
 	unsigned long addr, pte_t *ptep)
@@ -587,17 +619,6 @@ extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long addre
 extern int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long address,
 				     pte_t *ptep);
 
-#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
-static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
-				       unsigned long address, pte_t *ptep)
-{
-	pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));
-
-	page_table_check_pte_clear(mm, pte);
-
-	return pte;
-}
-
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long address, pte_t *ptep)
@@ -627,6 +648,100 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
 	return ptep_test_and_clear_young(vma, address, ptep);
 }
 
+#ifdef CONFIG_RISCV_ISA_SVNAPOT
+static inline pte_t pte_napot_clear_pfn(pte_t *ptep, pte_t pte)
+{
+	/*
+	 * The pte we load has the N bit set and the size of the mapping in
+	 * the pfn LSBs: keep the N bit and replace the mapping size with
+	 * the *real* pfn since the core mm code expects to find it there.
+	 * The mapping size will be reset just before being written to the
+	 * page table in set_ptes().
+	 */
+	if (unlikely(pte_valid_napot(pte))) {
+		unsigned int order = napot_cont_order(pte);
+		int pos = order - 1 + _PAGE_PFN_SHIFT;
+		unsigned long napot_mask = ~GENMASK(pos, _PAGE_PFN_SHIFT);
+		pte_t *orig_ptep = PTR_ALIGN_DOWN(ptep, sizeof(*ptep) * napot_pte_num(order));
+
+		pte = __pte((pte_val(pte) & napot_mask) + ((ptep - orig_ptep) << _PAGE_PFN_SHIFT));
+	}
+
+	return pte;
+}
+
+#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
+static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
+				       unsigned long address, pte_t *ptep)
+{
+	pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));
+
+	pte = pte_napot_clear_pfn(ptep, pte);
+
+	page_table_check_pte_clear(mm, pte);
+
+	return pte;
+}
+
+static inline pte_t __ptep_get(pte_t *ptep)
+{
+	pte_t pte = ___ptep_get(ptep);
+
+	return pte_napot_clear_pfn(ptep, pte);
+}
+
+static inline void __set_ptes(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pteval, unsigned int nr)
+{
+	if (unlikely(pte_valid_napot(pteval))) {
+		unsigned int order = ilog2(nr);
+
+		if (!is_napot_order(order)) {
+			/*
+			 * Something's weird, we are given a NAPOT pte but the
+			 * size of the mapping is not a known NAPOT mapping
+			 * size, so clear the NAPOT bit and map this without
+			 * NAPOT support: core mm only manipulates pte with the
+			 * real pfn so we know the pte is valid without the N
+			 * bit.
+			 */
+			pr_err("Incorrect NAPOT mapping, resetting.\n");
+			pteval = pte_clear_napot(pteval);
+		} else {
+			/*
+			 * NAPOT ptes that arrive here only have the N bit set
+			 * and their pfn does not contain the mapping size, so
+			 * set that here.
+			 */
+			pteval = pte_mknapot(pteval, order);
+		}
+	}
+
+	___set_ptes(mm, addr, ptep, pteval, nr);
+}
+
+#define ptep_get			__ptep_get
+#define set_ptes			__set_ptes
+#define set_contptes(mm, addr, ptep, pte, nr, pgsize)			\
+			set_ptes(mm, addr, ptep, pte, nr)
+#else
+#define ptep_get			___ptep_get
+#define __ptep_get			___ptep_get
+#define set_ptes			___set_ptes
+#define __set_ptes			___set_ptes
+
+#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
+static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
+				       unsigned long address, pte_t *ptep)
+{
+	pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));
+
+	page_table_check_pte_clear(mm, pte);
+
+	return pte;
+}
+#endif /* CONFIG_RISCV_ISA_SVNAPOT */
+
 #define pgprot_nx pgprot_nx
 static inline pgprot_t pgprot_nx(pgprot_t _prot)
 {
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 3192ad804279..60b7e738b31a 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -190,7 +190,7 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 
 	for_each_napot_order(order) {
 		if (shift == napot_cont_shift(order)) {
-			entry = pte_mknapot(entry, order);
+			entry = pte_mkcont(entry);
 			break;
 		}
 	}
@@ -267,8 +267,7 @@ void set_huge_pte_at(struct mm_struct *mm,
 
 	clear_flush(mm, addr, ptep, pgsize, pte_num);
 
-	for (i = 0; i < pte_num; i++, ptep++, addr += pgsize)
-		set_pte_at(mm, addr, ptep, pte);
+	set_ptes(mm, addr, ptep, pte, pte_num);
 }
 
 int huge_ptep_set_access_flags(struct vm_area_struct *vma,
@@ -280,7 +279,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long order;
 	pte_t orig_pte;
-	int i, pte_num;
+	int pte_num;
 
 	if (!pte_napot(pte))
 		return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
@@ -295,8 +294,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	if (pte_young(orig_pte))
 		pte = pte_mkyoung(pte);
 
-	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
-		set_pte_at(mm, addr, ptep, pte);
+	set_ptes(mm, addr, ptep, pte, pte_num);
 
 	return true;
 }
@@ -324,7 +322,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 	pte_t pte = ptep_get(ptep);
 	unsigned long order;
 	pte_t orig_pte;
-	int i, pte_num;
+	int pte_num;
 
 	if (!pte_napot(pte)) {
 		ptep_set_wrprotect(mm, addr, ptep);
@@ -337,8 +335,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 
 	orig_pte = pte_wrprotect(orig_pte);
 
-	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
-		set_pte_at(mm, addr, ptep, orig_pte);
+	set_ptes(mm, addr, ptep, orig_pte, pte_num);
 }
 
 pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
-- 
2.39.2




* [PATCH v5 3/9] mm: Use common huge_ptep_get() function for riscv/arm64
  2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 1/9] riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 2/9] riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code Alexandre Ghiti
@ 2025-03-21 13:06 ` Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 4/9] mm: Use common set_huge_pte_at() " Alexandre Ghiti
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-03-21 13:06 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm
  Cc: Alexandre Ghiti

After some adjustments, both architectures have the same implementation
so move it to the generic code.
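
For readers unfamiliar with the function being moved, the logic shared by
both architectures (visible in the removed arm64 and riscv versions below)
can be summarized by the following user-space sketch. The flag values and
the name fold_contig_flags() are hypothetical and chosen only for
illustration; the real code uses the pte_present(), pte_cont(),
pte_dirty() and pte_young() helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, not any architecture's real layout. */
#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_YOUNG   (UINT64_C(1) << 1)
#define PTE_DIRTY   (UINT64_C(1) << 2)
#define PTE_CONT    (UINT64_C(1) << 3)

/*
 * Return the first pte of a contiguous range, with the dirty and young
 * bits accumulated over all entries: the hardware may set them on any
 * single entry of the range.
 */
static uint64_t fold_contig_flags(const uint64_t *ptep, size_t ncontig)
{
	assert(ptep != NULL && ncontig > 0);

	uint64_t orig = ptep[0];

	/* Non-present or non-contiguous entries are returned as is. */
	if (!(orig & PTE_PRESENT) || !(orig & PTE_CONT))
		return orig;

	for (size_t i = 0; i < ncontig; i++)
		orig |= ptep[i] & (PTE_DIRTY | PTE_YOUNG);

	return orig;
}
```

The page table itself is left unmodified; only the returned value carries
the merged bits.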

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/hugetlb.h |  3 +-
 arch/arm64/include/asm/pgtable.h | 57 +++++++++++++++++++++++++--
 arch/arm64/mm/hugetlbpage.c      | 66 ++------------------------------
 arch/riscv/Kconfig               |  1 +
 arch/riscv/include/asm/hugetlb.h |  6 +--
 arch/riscv/include/asm/pgtable.h | 45 ++++++++++++++++++++++
 arch/riscv/mm/hugetlbpage.c      | 62 +++---------------------------
 include/linux/hugetlb_contpte.h  | 12 ++++++
 mm/Kconfig                       |  3 ++
 mm/Makefile                      |  1 +
 mm/hugetlb_contpte.c             | 32 ++++++++++++++++
 12 files changed, 161 insertions(+), 128 deletions(-)
 create mode 100644 include/linux/hugetlb_contpte.h
 create mode 100644 mm/hugetlb_contpte.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 940343beb3d4..5a1e1bc73c15 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -113,6 +113,7 @@ config ARM64
 	select ARCH_WANT_DEFAULT_BPF_JIT
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
 	select ARCH_WANT_FRAME_POINTERS
+	select ARCH_WANT_GENERAL_HUGETLB_CONTPTE
 	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_EXECMEM_LATE
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 07fbf5bf85a7..0604e01dca97 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -13,6 +13,7 @@
 #include <asm/cacheflush.h>
 #include <asm/mte.h>
 #include <asm/page.h>
+#include <linux/hugetlb_contpte.h>
 
 #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
 #define arch_hugetlb_migration_supported arch_hugetlb_migration_supported
@@ -53,8 +54,6 @@ extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 #define __HAVE_ARCH_HUGE_PTE_CLEAR
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 			   pte_t *ptep, unsigned long sz);
-#define __HAVE_ARCH_HUGE_PTEP_GET
-extern pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
 
 void __init arm64_hugetlb_cma_reserve(void);
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0b2a2ad1b9e8..af8156929c1d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -420,9 +420,10 @@ static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr)
 	return pfn_pte(pte_pfn(pte) + nr, pte_pgprot(pte));
 }
 
-static inline void __set_ptes(struct mm_struct *mm,
-			      unsigned long __always_unused addr,
-			      pte_t *ptep, pte_t pte, unsigned int nr)
+static inline void ___set_ptes(struct mm_struct *mm,
+			       unsigned long __always_unused addr,
+			       pte_t *ptep, pte_t pte, unsigned int nr,
+			       size_t pgsize)
 {
 	page_table_check_ptes_set(mm, ptep, pte, nr);
 	__sync_cache_and_tags(pte, nr);
@@ -433,10 +434,15 @@ static inline void __set_ptes(struct mm_struct *mm,
 		if (--nr == 0)
 			break;
 		ptep++;
-		pte = pte_advance_pfn(pte, 1);
+		pte = pte_advance_pfn(pte, pgsize >> PAGE_SHIFT);
 	}
 }
 
+#define __set_ptes(mm, addr, ptep, pte, nr)				\
+			___set_ptes(mm, addr, ptep, pte, nr, PAGE_SIZE)
+
+#define set_contptes	___set_ptes
+
 /*
  * Hugetlb definitions.
  */
@@ -1825,6 +1831,49 @@ static inline void clear_young_dirty_ptes(struct vm_area_struct *vma,
 
 #endif /* CONFIG_ARM64_CONTPTE */
 
+static inline bool __hugetlb_valid_size(unsigned long size)
+{
+	switch (size) {
+#ifndef __PAGETABLE_PMD_FOLDED
+	case PUD_SIZE:
+		return pud_sect_supported();
+#endif
+	case CONT_PMD_SIZE:
+	case PMD_SIZE:
+	case CONT_PTE_SIZE:
+		return true;
+	}
+
+	return false;
+}
+
+static inline int arch_contpte_get_num_contig(pte_t *ptep,
+					      unsigned long size,
+					      size_t *pgsize)
+{
+	int contig_ptes = 1;
+
+	if (pgsize)
+		*pgsize = size;
+
+	switch (size) {
+	case CONT_PMD_SIZE:
+		if (pgsize)
+			*pgsize = PMD_SIZE;
+		contig_ptes = CONT_PMDS;
+		break;
+	case CONT_PTE_SIZE:
+		if (pgsize)
+			*pgsize = PAGE_SIZE;
+		contig_ptes = CONT_PTES;
+		break;
+	default:
+		WARN_ON(!__hugetlb_valid_size(size));
+	}
+
+	return contig_ptes;
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index b3a7fafe8892..60a2bb7575c1 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -49,22 +49,6 @@ void __init arm64_hugetlb_cma_reserve(void)
 }
 #endif /* CONFIG_CMA */
 
-static bool __hugetlb_valid_size(unsigned long size)
-{
-	switch (size) {
-#ifndef __PAGETABLE_PMD_FOLDED
-	case PUD_SIZE:
-		return pud_sect_supported();
-#endif
-	case CONT_PMD_SIZE:
-	case PMD_SIZE:
-	case CONT_PTE_SIZE:
-		return true;
-	}
-
-	return false;
-}
-
 #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
 bool arch_hugetlb_migration_supported(struct hstate *h)
 {
@@ -98,50 +82,6 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
 	return CONT_PTES;
 }
 
-static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
-{
-	int contig_ptes = 1;
-
-	*pgsize = size;
-
-	switch (size) {
-	case CONT_PMD_SIZE:
-		*pgsize = PMD_SIZE;
-		contig_ptes = CONT_PMDS;
-		break;
-	case CONT_PTE_SIZE:
-		*pgsize = PAGE_SIZE;
-		contig_ptes = CONT_PTES;
-		break;
-	default:
-		WARN_ON(!__hugetlb_valid_size(size));
-	}
-
-	return contig_ptes;
-}
-
-pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
-	int ncontig, i;
-	size_t pgsize;
-	pte_t orig_pte = __ptep_get(ptep);
-
-	if (!pte_present(orig_pte) || !pte_cont(orig_pte))
-		return orig_pte;
-
-	ncontig = num_contig_ptes(page_size(pte_page(orig_pte)), &pgsize);
-	for (i = 0; i < ncontig; i++, ptep++) {
-		pte_t pte = __ptep_get(ptep);
-
-		if (pte_dirty(pte))
-			orig_pte = pte_mkdirty(orig_pte);
-
-		if (pte_young(pte))
-			orig_pte = pte_mkyoung(orig_pte);
-	}
-	return orig_pte;
-}
-
 /*
  * Changing some bits of contiguous entries requires us to follow a
  * Break-Before-Make approach, breaking the whole contiguous set
@@ -221,7 +161,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	unsigned long pfn, dpfn;
 	pgprot_t hugeprot;
 
-	ncontig = num_contig_ptes(sz, &pgsize);
+	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
 
 	if (!pte_present(pte)) {
 		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
@@ -382,7 +322,7 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 	int i, ncontig;
 	size_t pgsize;
 
-	ncontig = num_contig_ptes(sz, &pgsize);
+	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
 
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
 		__pte_clear(mm, addr, ptep);
@@ -394,7 +334,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 	int ncontig;
 	size_t pgsize;
 
-	ncontig = num_contig_ptes(sz, &pgsize);
+	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
 	return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
 }
 
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 7612c52e9b1e..2a5b2a9f2816 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -79,6 +79,7 @@ config RISCV
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_GENERAL_HUGETLB if !RISCV_ISA_SVNAPOT
+	select ARCH_WANT_GENERAL_HUGETLB_CONTPTE if RISCV_ISA_SVNAPOT
 	select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
 	select ARCH_WANT_LD_ORPHAN_WARN if !XIP_KERNEL
 	select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index 446126497768..69393346ade0 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -4,6 +4,9 @@
 
 #include <asm/cacheflush.h>
 #include <asm/page.h>
+#ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB_CONTPTE
+#include <linux/hugetlb_contpte.h>
+#endif
 
 static inline void arch_clear_hugetlb_flags(struct folio *folio)
 {
@@ -44,9 +47,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 			       unsigned long addr, pte_t *ptep,
 			       pte_t pte, int dirty);
 
-#define __HAVE_ARCH_HUGE_PTEP_GET
-pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
-
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
 
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 2e62d7e607db..286fe1a32ded 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -296,6 +296,8 @@ static inline unsigned long pte_napot(pte_t pte)
 	return pte_val(pte) & _PAGE_NAPOT;
 }
 
+#define pte_cont		pte_napot
+
 #define pte_valid_napot(pte)	(pte_present(pte) && pte_napot(pte))
 
 /*
@@ -606,6 +608,49 @@ static inline void ___set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#ifdef CONFIG_RISCV_ISA_SVNAPOT
+/*
+ * Some hugetlb functions can be called on !present ptes, so we must use the
+ * size parameter when it is passed.
+ */
+static inline int arch_contpte_get_num_contig(pte_t *ptep, unsigned long size,
+					      size_t *pgsize)
+{
+	unsigned long hugepage_shift;
+	pte_t __pte;
+
+	if (size) {
+		if (size >= PGDIR_SIZE)
+			hugepage_shift = PGDIR_SHIFT;
+		else if (size >= P4D_SIZE)
+			hugepage_shift = P4D_SHIFT;
+		else if (size >= PUD_SIZE)
+			hugepage_shift = PUD_SHIFT;
+		else if (size >= PMD_SIZE)
+			hugepage_shift = PMD_SHIFT;
+		else
+			hugepage_shift = PAGE_SHIFT;
+	} else {
+		/*
+		 * We must read the raw value of the pte to get the size of
+		 * the mapping
+		 */
+		__pte = ___ptep_get(ptep);
+
+		/* Make sure __pte is not a swap entry */
+		BUG_ON(!pte_valid_napot(__pte));
+
+		hugepage_shift = PAGE_SHIFT;
+		size = napot_cont_size(napot_cont_order(__pte));
+	}
+
+	if (pgsize)
+		*pgsize = BIT(hugepage_shift);
+
+	return size >> hugepage_shift;
+}
+#endif
+
 static inline void pte_clear(struct mm_struct *mm,
 	unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 60b7e738b31a..b9eb6b7b214d 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -3,30 +3,6 @@
 #include <linux/err.h>
 
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
-pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
-	unsigned long pte_num;
-	int i;
-	pte_t orig_pte = ptep_get(ptep);
-
-	if (!pte_present(orig_pte) || !pte_napot(orig_pte))
-		return orig_pte;
-
-	pte_num = napot_pte_num(napot_cont_order(orig_pte));
-
-	for (i = 0; i < pte_num; i++, ptep++) {
-		pte_t pte = ptep_get(ptep);
-
-		if (pte_dirty(pte))
-			orig_pte = pte_mkdirty(orig_pte);
-
-		if (pte_young(pte))
-			orig_pte = pte_mkyoung(orig_pte);
-	}
-
-	return orig_pte;
-}
-
 pte_t *huge_pte_alloc(struct mm_struct *mm,
 		      struct vm_area_struct *vma,
 		      unsigned long addr,
@@ -215,26 +191,6 @@ static void clear_flush(struct mm_struct *mm,
 	flush_tlb_range(&vma, saddr, addr);
 }
 
-static int num_contig_ptes_from_size(unsigned long sz, size_t *pgsize)
-{
-	unsigned long hugepage_shift;
-
-	if (sz >= PGDIR_SIZE)
-		hugepage_shift = PGDIR_SHIFT;
-	else if (sz >= P4D_SIZE)
-		hugepage_shift = P4D_SHIFT;
-	else if (sz >= PUD_SIZE)
-		hugepage_shift = PUD_SHIFT;
-	else if (sz >= PMD_SIZE)
-		hugepage_shift = PMD_SHIFT;
-	else
-		hugepage_shift = PAGE_SHIFT;
-
-	*pgsize = 1 << hugepage_shift;
-
-	return sz >> hugepage_shift;
-}
-
 /*
  * When dealing with NAPOT mappings, the privileged specification indicates that
  * "if an update needs to be made, the OS generally should first mark all of the
@@ -252,7 +208,7 @@ void set_huge_pte_at(struct mm_struct *mm,
 	size_t pgsize;
 	int i, pte_num;
 
-	pte_num = num_contig_ptes_from_size(sz, &pgsize);
+	pte_num = arch_contpte_get_num_contig(ptep, sz, &pgsize);
 
 	if (!pte_present(pte)) {
 		for (i = 0; i < pte_num; i++, ptep++, addr += pgsize)
@@ -277,15 +233,13 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 			       int dirty)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long order;
 	pte_t orig_pte;
 	int pte_num;
 
 	if (!pte_napot(pte))
 		return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
 
-	order = napot_cont_order(pte);
-	pte_num = napot_pte_num(order);
+	pte_num = arch_contpte_get_num_contig(ptep, 0, NULL);
 	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
 
 	if (pte_dirty(orig_pte))
@@ -303,14 +257,13 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 			      unsigned long addr,
 			      pte_t *ptep, unsigned long sz)
 {
-	size_t pgsize;
 	pte_t orig_pte = ptep_get(ptep);
 	int pte_num;
 
 	if (!pte_napot(orig_pte))
 		return ptep_get_and_clear(mm, addr, ptep);
 
-	pte_num = num_contig_ptes_from_size(sz, &pgsize);
+	pte_num = arch_contpte_get_num_contig(ptep, sz, NULL);
 
 	return get_clear_contig(mm, addr, ptep, pte_num);
 }
@@ -320,7 +273,6 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 			     pte_t *ptep)
 {
 	pte_t pte = ptep_get(ptep);
-	unsigned long order;
 	pte_t orig_pte;
 	int pte_num;
 
@@ -329,8 +281,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 		return;
 	}
 
-	order = napot_cont_order(pte);
-	pte_num = napot_pte_num(order);
+	pte_num = arch_contpte_get_num_contig(ptep, 0, NULL);
 	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
 
 	orig_pte = pte_wrprotect(orig_pte);
@@ -348,7 +299,7 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 	if (!pte_napot(pte))
 		return ptep_clear_flush(vma, addr, ptep);
 
-	pte_num = napot_pte_num(napot_cont_order(pte));
+	pte_num = arch_contpte_get_num_contig(ptep, 0, NULL);
 
 	return get_clear_contig_flush(vma->vm_mm, addr, ptep, pte_num);
 }
@@ -367,8 +318,7 @@ void huge_pte_clear(struct mm_struct *mm,
 		return;
 	}
 
-	pte_num = num_contig_ptes_from_size(sz, &pgsize);
-
+	pte_num = arch_contpte_get_num_contig(ptep, sz, &pgsize);
 	for (i = 0; i < pte_num; i++, addr += pgsize, ptep++)
 		pte_clear(mm, addr, ptep);
 }
diff --git a/include/linux/hugetlb_contpte.h b/include/linux/hugetlb_contpte.h
new file mode 100644
index 000000000000..2ea17e4fe36b
--- /dev/null
+++ b/include/linux/hugetlb_contpte.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2025 Rivos Inc.
+ */
+
+#ifndef _LINUX_HUGETLB_CONTPTE_H
+#define _LINUX_HUGETLB_CONTPTE_H
+
+#define __HAVE_ARCH_HUGE_PTEP_GET
+extern pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
+
+#endif /* _LINUX_HUGETLB_CONTPTE_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 1b501db06417..f9d3f3d49f3e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -853,6 +853,9 @@ config NOMMU_INITIAL_TRIM_EXCESS
 config ARCH_WANT_GENERAL_HUGETLB
 	bool
 
+config ARCH_WANT_GENERAL_HUGETLB_CONTPTE
+	bool
+
 config ARCH_WANTS_THP_SWAP
 	def_bool n
 
diff --git a/mm/Makefile b/mm/Makefile
index 850386a67b3e..76e8b995f551 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -96,6 +96,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_NUMA) += memory-tiers.o
 obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
+obj-$(CONFIG_ARCH_WANT_GENERAL_HUGETLB_CONTPTE) += hugetlb_contpte.o
 obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
 obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o
 obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
diff --git a/mm/hugetlb_contpte.c b/mm/hugetlb_contpte.c
new file mode 100644
index 000000000000..500d0b96a680
--- /dev/null
+++ b/mm/hugetlb_contpte.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2025 Rivos Inc.
+ */
+
+#include <linux/pgtable.h>
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+
+pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+	int ncontig, i;
+	pte_t orig_pte = __ptep_get(ptep);
+
+	if (!pte_present(orig_pte) || !pte_cont(orig_pte))
+		return orig_pte;
+
+	ncontig = arch_contpte_get_num_contig(ptep,
+					      page_size(pte_page(orig_pte)),
+					      NULL);
+
+	for (i = 0; i < ncontig; i++, ptep++) {
+		pte_t pte = __ptep_get(ptep);
+
+		if (pte_dirty(pte))
+			orig_pte = pte_mkdirty(orig_pte);
+
+		if (pte_young(pte))
+			orig_pte = pte_mkyoung(orig_pte);
+	}
+	return orig_pte;
+}
-- 
2.39.2




* [PATCH v5 4/9] mm: Use common set_huge_pte_at() function for riscv/arm64
  2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
                   ` (2 preceding siblings ...)
  2025-03-21 13:06 ` [PATCH v5 3/9] mm: Use common huge_ptep_get() function for riscv/arm64 Alexandre Ghiti
@ 2025-03-21 13:06 ` Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 5/9] mm: Use common huge_pte_clear() " Alexandre Ghiti
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-03-21 13:06 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm
  Cc: Alexandre Ghiti

After some adjustments, both architectures have the same implementation
so move it to the generic code.
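
For readers unfamiliar with the break-before-make sequence that the
common set_huge_pte_at() follows, here is a minimal user-space sketch.
It is only a model, not the kernel code: model_pte_t, the MODEL_* bit
layout, model_tlb_flushes and both helpers are hypothetical names
invented for illustration, and the per-entry pfn increment mirrors the
arm64 loop removed by this patch rather than the new set_contptes()
helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
typedef uint64_t model_pte_t;

#define MODEL_PTE_PRESENT (1ULL << 0)
#define MODEL_PTE_CONT    (1ULL << 1)
#define MODEL_PFN_SHIFT   12

/* Counts simulated TLB range flushes so the ordering can be observed. */
static unsigned long model_tlb_flushes;

/* Break step: invalidate every entry of the set, then flush once. */
static void model_clear_flush(model_pte_t *ptep, size_t ncontig)
{
	for (size_t i = 0; i < ncontig; i++)
		ptep[i] = 0;
	model_tlb_flushes++;
}

/*
 * Model of the three cases handled by set_huge_pte_at():
 *  - non-present pte: copy it into every slot, no flush needed;
 *  - present but not contiguous: write a single slot;
 *  - contiguous: break (clear + flush), then make (write all slots).
 * dpfn is the pfn stride between two consecutive slots.
 */
static void model_set_huge_pte_at(model_pte_t *ptep, model_pte_t pte,
				  size_t ncontig, uint64_t dpfn)
{
	assert(ncontig > 0);

	if (!(pte & MODEL_PTE_PRESENT)) {
		for (size_t i = 0; i < ncontig; i++)
			ptep[i] = pte;
		return;
	}

	if (!(pte & MODEL_PTE_CONT)) {
		ptep[0] = pte;
		return;
	}

	model_clear_flush(ptep, ncontig);

	for (size_t i = 0; i < ncontig; i++)
		ptep[i] = pte + (((uint64_t)i * dpfn) << MODEL_PFN_SHIFT);
}
```

In this model, a contiguous update always costs exactly one flush,
issued after all slots are invalid and before any new value is written.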

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/arm64/include/asm/hugetlb.h |  3 --
 arch/arm64/mm/hugetlbpage.c      | 56 --------------------------------
 arch/riscv/include/asm/hugetlb.h |  5 ---
 arch/riscv/include/asm/pgtable.h | 13 ++++----
 arch/riscv/mm/hugetlbpage.c      | 50 ----------------------------
 include/linux/hugetlb_contpte.h  |  5 +++
 mm/hugetlb_contpte.c             | 56 ++++++++++++++++++++++++++++++++
 7 files changed, 68 insertions(+), 120 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 0604e01dca97..cfdc04e11585 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -35,9 +35,6 @@ static inline void arch_clear_hugetlb_flags(struct folio *folio)
 
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
-#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
-extern void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-			    pte_t *ptep, pte_t pte, unsigned long sz);
 #define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
 extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 				      unsigned long addr, pte_t *ptep,
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 60a2bb7575c1..6feb90ed2e7d 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -128,62 +128,6 @@ static pte_t get_clear_contig_flush(struct mm_struct *mm,
 	return orig_pte;
 }
 
-/*
- * Changing some bits of contiguous entries requires us to follow a
- * Break-Before-Make approach, breaking the whole contiguous set
- * before we can change any entries. See ARM DDI 0487A.k_iss10775,
- * "Misprogramming of the Contiguous bit", page D4-1762.
- *
- * This helper performs the break step for use cases where the
- * original pte is not needed.
- */
-static void clear_flush(struct mm_struct *mm,
-			     unsigned long addr,
-			     pte_t *ptep,
-			     unsigned long pgsize,
-			     unsigned long ncontig)
-{
-	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
-	unsigned long i, saddr = addr;
-
-	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
-		__ptep_get_and_clear(mm, addr, ptep);
-
-	flush_tlb_range(&vma, saddr, addr);
-}
-
-void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-			    pte_t *ptep, pte_t pte, unsigned long sz)
-{
-	size_t pgsize;
-	int i;
-	int ncontig;
-	unsigned long pfn, dpfn;
-	pgprot_t hugeprot;
-
-	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
-
-	if (!pte_present(pte)) {
-		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
-			__set_ptes(mm, addr, ptep, pte, 1);
-		return;
-	}
-
-	if (!pte_cont(pte)) {
-		__set_ptes(mm, addr, ptep, pte, 1);
-		return;
-	}
-
-	pfn = pte_pfn(pte);
-	dpfn = pgsize >> PAGE_SHIFT;
-	hugeprot = pte_pgprot(pte);
-
-	clear_flush(mm, addr, ptep, pgsize, ncontig);
-
-	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
-		__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
-}
-
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz)
 {
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index 69393346ade0..7049a17b819d 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -24,11 +24,6 @@ bool arch_hugetlb_migration_supported(struct hstate *h);
 void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 		    pte_t *ptep, unsigned long sz);
 
-#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
-void set_huge_pte_at(struct mm_struct *mm,
-		     unsigned long addr, pte_t *ptep, pte_t pte,
-		     unsigned long sz);
-
 #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 			      unsigned long addr, pte_t *ptep,
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 286fe1a32ded..5b34b3c9c0f9 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -715,9 +715,8 @@ static inline pte_t pte_napot_clear_pfn(pte_t *ptep, pte_t pte)
 	return pte;
 }
 
-#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
-static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
-				       unsigned long address, pte_t *ptep)
+static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
+					 unsigned long address, pte_t *ptep)
 {
 	pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));
 
@@ -775,9 +774,8 @@ static inline void __set_ptes(struct mm_struct *mm, unsigned long addr,
 #define set_ptes			___set_ptes
 #define __set_ptes			___set_ptes
 
-#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
-static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
-				       unsigned long address, pte_t *ptep)
+static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
+					 unsigned long address, pte_t *ptep)
 {
 	pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));
 
@@ -787,6 +785,9 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 }
 #endif /* CONFIG_RISCV_ISA_SVNAPOT */
 
+#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
+#define ptep_get_and_clear	__ptep_get_and_clear
+
 #define pgprot_nx pgprot_nx
 static inline pgprot_t pgprot_nx(pgprot_t _prot)
 {
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index b9eb6b7b214d..75faeacc8138 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -176,56 +176,6 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 	return entry;
 }
 
-static void clear_flush(struct mm_struct *mm,
-			unsigned long addr,
-			pte_t *ptep,
-			unsigned long pgsize,
-			unsigned long ncontig)
-{
-	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
-	unsigned long i, saddr = addr;
-
-	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
-		ptep_get_and_clear(mm, addr, ptep);
-
-	flush_tlb_range(&vma, saddr, addr);
-}
-
-/*
- * When dealing with NAPOT mappings, the privileged specification indicates that
- * "if an update needs to be made, the OS generally should first mark all of the
- * PTEs invalid, then issue SFENCE.VMA instruction(s) covering all 4 KiB regions
- * within the range, [...] then update the PTE(s), as described in Section
- * 4.2.1.". That's the equivalent of the Break-Before-Make approach used by
- * arm64.
- */
-void set_huge_pte_at(struct mm_struct *mm,
-		     unsigned long addr,
-		     pte_t *ptep,
-		     pte_t pte,
-		     unsigned long sz)
-{
-	size_t pgsize;
-	int i, pte_num;
-
-	pte_num = arch_contpte_get_num_contig(ptep, sz, &pgsize);
-
-	if (!pte_present(pte)) {
-		for (i = 0; i < pte_num; i++, ptep++, addr += pgsize)
-			set_ptes(mm, addr, ptep, pte, 1);
-		return;
-	}
-
-	if (!pte_napot(pte)) {
-		set_ptes(mm, addr, ptep, pte, 1);
-		return;
-	}
-
-	clear_flush(mm, addr, ptep, pgsize, pte_num);
-
-	set_ptes(mm, addr, ptep, pte, pte_num);
-}
-
 int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 			       unsigned long addr,
 			       pte_t *ptep,
diff --git a/include/linux/hugetlb_contpte.h b/include/linux/hugetlb_contpte.h
index 2ea17e4fe36b..135b68bd09ca 100644
--- a/include/linux/hugetlb_contpte.h
+++ b/include/linux/hugetlb_contpte.h
@@ -9,4 +9,9 @@
 #define __HAVE_ARCH_HUGE_PTEP_GET
 extern pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
 
+#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
+extern void set_huge_pte_at(struct mm_struct *mm,
+			    unsigned long addr, pte_t *ptep, pte_t pte,
+			    unsigned long sz);
+
 #endif /* _LINUX_HUGETLB_CONTPTE_H */
diff --git a/mm/hugetlb_contpte.c b/mm/hugetlb_contpte.c
index 500d0b96a680..cbf93ffcd882 100644
--- a/mm/hugetlb_contpte.c
+++ b/mm/hugetlb_contpte.c
@@ -30,3 +30,59 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 	}
 	return orig_pte;
 }
+
+/*
+ * ARM64: Changing some bits of contiguous entries requires us to follow a
+ * Break-Before-Make approach, breaking the whole contiguous set
+ * before we can change any entries. See ARM DDI 0487A.k_iss10775,
+ * "Misprogramming of the Contiguous bit", page D4-1762.
+ *
+ * RISCV: When dealing with NAPOT mappings, the privileged specification
+ * indicates that "if an update needs to be made, the OS generally should first
+ * mark all of the PTEs invalid, then issue SFENCE.VMA instruction(s) covering
+ * all 4 KiB regions within the range, [...] then update the PTE(s), as
+ * described in Section 4.2.1.". That's the equivalent of the Break-Before-Make
+ * approach used by arm64.
+ *
+ * This helper performs the break step for use cases where the
+ * original pte is not needed.
+ */
+static void clear_flush(struct mm_struct *mm,
+			unsigned long addr,
+			pte_t *ptep,
+			unsigned long pgsize,
+			unsigned long ncontig)
+{
+	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
+	unsigned long i, saddr = addr;
+
+	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
+		__ptep_get_and_clear(mm, addr, ptep);
+
+	flush_tlb_range(&vma, saddr, addr);
+}
+
+void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+		     pte_t *ptep, pte_t pte, unsigned long sz)
+{
+	size_t pgsize;
+	int i;
+	int ncontig;
+
+	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
+
+	if (!pte_present(pte)) {
+		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
+			__set_ptes(mm, addr, ptep, pte, 1);
+		return;
+	}
+
+	if (!pte_cont(pte)) {
+		__set_ptes(mm, addr, ptep, pte, 1);
+		return;
+	}
+
+	clear_flush(mm, addr, ptep, pgsize, ncontig);
+
+	set_contptes(mm, addr, ptep, pte, ncontig, pgsize);
+}
-- 
2.39.2




* [PATCH v5 5/9] mm: Use common huge_pte_clear() function for riscv/arm64
  2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
                   ` (3 preceding siblings ...)
  2025-03-21 13:06 ` [PATCH v5 4/9] mm: Use common set_huge_pte_at() " Alexandre Ghiti
@ 2025-03-21 13:06 ` Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 6/9] mm: Use common huge_ptep_get_and_clear() " Alexandre Ghiti
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-03-21 13:06 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm
  Cc: Alexandre Ghiti

Both architectures have the same implementation so move it to generic code.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/arm64/include/asm/hugetlb.h |  3 ---
 arch/arm64/mm/hugetlbpage.c      | 12 ------------
 arch/riscv/include/asm/hugetlb.h |  4 ----
 arch/riscv/include/asm/pgtable.h |  5 +++--
 arch/riscv/mm/hugetlbpage.c      | 19 -------------------
 include/linux/hugetlb_contpte.h  |  4 ++++
 mm/hugetlb_contpte.c             | 12 ++++++++++++
 7 files changed, 19 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index cfdc04e11585..ed75631ad63c 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -48,9 +48,6 @@ extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 				   unsigned long addr, pte_t *ptep);
-#define __HAVE_ARCH_HUGE_PTE_CLEAR
-extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
-			   pte_t *ptep, unsigned long sz);
 
 void __init arm64_hugetlb_cma_reserve(void);
 
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 6feb90ed2e7d..99728b02a3ca 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -260,18 +260,6 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 	return entry;
 }
 
-void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
-		    pte_t *ptep, unsigned long sz)
-{
-	int i, ncontig;
-	size_t pgsize;
-
-	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
-
-	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
-		__pte_clear(mm, addr, ptep);
-}
-
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, unsigned long sz)
 {
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index 7049a17b819d..467bc30c2153 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -20,10 +20,6 @@ bool arch_hugetlb_migration_supported(struct hstate *h);
 #endif
 
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
-#define __HAVE_ARCH_HUGE_PTE_CLEAR
-void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
-		    pte_t *ptep, unsigned long sz);
-
 #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 			      unsigned long addr, pte_t *ptep,
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 5b34b3c9c0f9..72d3592454d3 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -651,8 +651,8 @@ static inline int arch_contpte_get_num_contig(pte_t *ptep, unsigned long size,
 }
 #endif
 
-static inline void pte_clear(struct mm_struct *mm,
-	unsigned long addr, pte_t *ptep)
+static inline void __pte_clear(struct mm_struct *mm,
+			       unsigned long addr, pte_t *ptep)
 {
 	__set_pte_at(mm, ptep, __pte(0));
 }
@@ -787,6 +787,7 @@ static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 #define ptep_get_and_clear	__ptep_get_and_clear
+#define pte_clear		__pte_clear
 
 #define pgprot_nx pgprot_nx
 static inline pgprot_t pgprot_nx(pgprot_t _prot)
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 75faeacc8138..fe82284c3dc4 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -254,25 +254,6 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 	return get_clear_contig_flush(vma->vm_mm, addr, ptep, pte_num);
 }
 
-void huge_pte_clear(struct mm_struct *mm,
-		    unsigned long addr,
-		    pte_t *ptep,
-		    unsigned long sz)
-{
-	size_t pgsize;
-	pte_t pte = ptep_get(ptep);
-	int i, pte_num;
-
-	if (!pte_napot(pte)) {
-		pte_clear(mm, addr, ptep);
-		return;
-	}
-
-	pte_num = arch_contpte_get_num_contig(ptep, sz, &pgsize);
-	for (i = 0; i < pte_num; i++, addr += pgsize, ptep++)
-		pte_clear(mm, addr, ptep);
-}
-
 static bool is_napot_size(unsigned long size)
 {
 	unsigned long order;
diff --git a/include/linux/hugetlb_contpte.h b/include/linux/hugetlb_contpte.h
index 135b68bd09ca..e6aa9befa78c 100644
--- a/include/linux/hugetlb_contpte.h
+++ b/include/linux/hugetlb_contpte.h
@@ -14,4 +14,8 @@ extern void set_huge_pte_at(struct mm_struct *mm,
 			    unsigned long addr, pte_t *ptep, pte_t pte,
 			    unsigned long sz);
 
+#define __HAVE_ARCH_HUGE_PTE_CLEAR
+extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
+			   pte_t *ptep, unsigned long sz);
+
 #endif /* _LINUX_HUGETLB_CONTPTE_H */
diff --git a/mm/hugetlb_contpte.c b/mm/hugetlb_contpte.c
index cbf93ffcd882..e881b302dd63 100644
--- a/mm/hugetlb_contpte.c
+++ b/mm/hugetlb_contpte.c
@@ -86,3 +86,15 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 
 	set_contptes(mm, addr, ptep, pte, ncontig, pgsize);
 }
+
+void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
+		    pte_t *ptep, unsigned long sz)
+{
+	int i, ncontig;
+	size_t pgsize;
+
+	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
+
+	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
+		__pte_clear(mm, addr, ptep);
+}
-- 
2.39.2




* [PATCH v5 6/9] mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
  2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
                   ` (4 preceding siblings ...)
  2025-03-21 13:06 ` [PATCH v5 5/9] mm: Use common huge_pte_clear() " Alexandre Ghiti
@ 2025-03-21 13:06 ` Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 7/9] mm: Use common huge_ptep_set_access_flags() " Alexandre Ghiti
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-03-21 13:06 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm
  Cc: Alexandre Ghiti

After some adjustments, both architectures have the same implementation
so move it to the generic code.

Note that the get_clear_contig() function is duplicated in the generic
and the arm64 code because it is still used by some arm64 functions that
will be moved to the generic code in the next commits. Once all of them
have been moved, the arm64 version will be removed.
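
To illustrate what get_clear_contig() returns, here is a minimal
user-space sketch of the same folding logic: every entry of the set is
cleared, and the dirty/young bits of the trailing entries are merged
into the first one when that first entry was present. The type
model_pte_t, the MODEL_* bits and both helpers are hypothetical names
used only for this model; they are not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
typedef uint64_t model_pte_t;

#define MODEL_PTE_PRESENT (1ULL << 0)
#define MODEL_PTE_DIRTY   (1ULL << 2)
#define MODEL_PTE_YOUNG   (1ULL << 3)

/* Read the old value and clear the slot (non-atomic in this model). */
static model_pte_t model_get_and_clear(model_pte_t *ptep)
{
	model_pte_t old = *ptep;

	*ptep = 0;
	return old;
}

/*
 * Clear ncontig slots and return the first entry, with the dirty and
 * young bits of all the other entries folded in. The folding is skipped
 * when the first entry is not present, as in the patch.
 */
static model_pte_t model_get_clear_contig(model_pte_t *ptep, size_t ncontig)
{
	model_pte_t pte;
	int present;

	assert(ncontig > 0);

	pte = model_get_and_clear(ptep);
	present = (pte & MODEL_PTE_PRESENT) != 0;

	for (size_t i = 1; i < ncontig; i++) {
		model_pte_t tmp = model_get_and_clear(&ptep[i]);

		if (present)
			pte |= tmp & (MODEL_PTE_DIRTY | MODEL_PTE_YOUNG);
	}

	return pte;
}
```

The caller therefore sees a single pte that is dirty (or young) if any
entry of the contiguous set was, which is what the core mm code expects
from a huge pte.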

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/arm64/include/asm/hugetlb.h |  3 --
 arch/arm64/mm/hugetlbpage.c      | 10 -------
 arch/riscv/include/asm/hugetlb.h |  5 ----
 arch/riscv/mm/hugetlbpage.c      | 15 ----------
 include/linux/hugetlb_contpte.h  |  5 ++++
 mm/hugetlb_contpte.c             | 51 ++++++++++++++++++++++++++++++++
 6 files changed, 56 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index ed75631ad63c..9b1c25775bea 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -39,9 +39,6 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 				      unsigned long addr, pte_t *ptep,
 				      pte_t pte, int dirty);
-#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
-extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
-				     pte_t *ptep, unsigned long sz);
 #define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
 				    unsigned long addr, pte_t *ptep);
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 99728b02a3ca..62a66ce2b2fe 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -260,16 +260,6 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 	return entry;
 }
 
-pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, unsigned long sz)
-{
-	int ncontig;
-	size_t pgsize;
-
-	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
-	return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
-}
-
 /*
  * huge_ptep_set_access_flags will update access flags (dirty, accesssed)
  * and write permission.
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index 467bc30c2153..0fbb6b19df79 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -20,11 +20,6 @@ bool arch_hugetlb_migration_supported(struct hstate *h);
 #endif
 
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
-#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
-pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
-			      unsigned long addr, pte_t *ptep,
-			      unsigned long sz);
-
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 			    unsigned long addr, pte_t *ptep);
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index fe82284c3dc4..87168123d4a2 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -203,21 +203,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	return true;
 }
 
-pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
-			      unsigned long addr,
-			      pte_t *ptep, unsigned long sz)
-{
-	pte_t orig_pte = ptep_get(ptep);
-	int pte_num;
-
-	if (!pte_napot(orig_pte))
-		return ptep_get_and_clear(mm, addr, ptep);
-
-	pte_num = arch_contpte_get_num_contig(ptep, sz, NULL);
-
-	return get_clear_contig(mm, addr, ptep, pte_num);
-}
-
 void huge_ptep_set_wrprotect(struct mm_struct *mm,
 			     unsigned long addr,
 			     pte_t *ptep)
diff --git a/include/linux/hugetlb_contpte.h b/include/linux/hugetlb_contpte.h
index e6aa9befa78c..1c8f46ff95ea 100644
--- a/include/linux/hugetlb_contpte.h
+++ b/include/linux/hugetlb_contpte.h
@@ -18,4 +18,9 @@ extern void set_huge_pte_at(struct mm_struct *mm,
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 			   pte_t *ptep, unsigned long sz);
 
+#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
+extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+				     unsigned long addr, pte_t *ptep,
+				     unsigned long sz);
+
 #endif /* _LINUX_HUGETLB_CONTPTE_H */
diff --git a/mm/hugetlb_contpte.c b/mm/hugetlb_contpte.c
index e881b302dd63..82f49eb79ffb 100644
--- a/mm/hugetlb_contpte.c
+++ b/mm/hugetlb_contpte.c
@@ -98,3 +98,54 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
 		__pte_clear(mm, addr, ptep);
 }
+
+/*
+ * ARM64: Changing some bits of contiguous entries requires us to follow a
+ * Break-Before-Make approach, breaking the whole contiguous set
+ * before we can change any entries. See ARM DDI 0487A.k_iss10775,
+ * "Misprogramming of the Contiguous bit", page D4-1762.
+ *
+ * RISCV: When dealing with NAPOT mappings, the privileged specification
+ * indicates that "if an update needs to be made, the OS generally should first
+ * mark all of the PTEs invalid, then issue SFENCE.VMA instruction(s) covering
+ * all 4 KiB regions within the range, [...] then update the PTE(s), as
+ * described in Section 4.2.1.". That's the equivalent of the Break-Before-Make
+ * approach used by arm64.
+ *
+ * This helper performs the break step.
+ */
+static pte_t get_clear_contig(struct mm_struct *mm,
+			     unsigned long addr,
+			     pte_t *ptep,
+			     unsigned long pgsize,
+			     unsigned long ncontig)
+{
+	pte_t pte, tmp_pte;
+	bool present;
+
+	pte = __ptep_get_and_clear(mm, addr, ptep);
+	present = pte_present(pte);
+	while (--ncontig) {
+		ptep++;
+		addr += pgsize;
+		tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
+		if (present) {
+			if (pte_dirty(tmp_pte))
+				pte = pte_mkdirty(pte);
+			if (pte_young(tmp_pte))
+				pte = pte_mkyoung(pte);
+		}
+	}
+	return pte;
+}
+
+pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+			      unsigned long addr, pte_t *ptep, unsigned long sz)
+{
+	int ncontig;
+	size_t pgsize;
+
+	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
+
+	return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
+}
-- 
2.39.2




* [PATCH v5 7/9] mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
  2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
                   ` (5 preceding siblings ...)
  2025-03-21 13:06 ` [PATCH v5 6/9] mm: Use common huge_ptep_get_and_clear() " Alexandre Ghiti
@ 2025-03-21 13:06 ` Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 8/9] mm: Use common huge_ptep_set_wrprotect() " Alexandre Ghiti
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-03-21 13:06 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm
  Cc: Alexandre Ghiti

Both architectures have almost the same implementation:
__cont_access_flags_changed() is also correct on riscv and brings the
same benefit (i.e. nothing is done if the flags are unchanged).

As in the previous commit, get_clear_contig_flush() is duplicated in both
the arch and the generic code; it will be removed from the arch code when
the last reference there is moved to the generic code.
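
As an illustration of the early-exit check mentioned above, here is a
minimal user-space sketch of the comparison done by
__cont_access_flags_changed(): write permission is compared on the
first entry only, while dirty and young are compared on every entry of
the set. The type model_pte_t, the MODEL_* bits and the function name
are hypothetical and exist only in this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
typedef uint64_t model_pte_t;

#define MODEL_PTE_WRITE (1ULL << 1)
#define MODEL_PTE_DIRTY (1ULL << 2)
#define MODEL_PTE_YOUNG (1ULL << 3)

/*
 * Return true if installing 'pte' would change anything:
 *  - write permission is checked against the first entry only;
 *  - dirty and young are checked against each of the ncontig entries.
 */
static bool model_cont_access_flags_changed(const model_pte_t *ptep,
					    model_pte_t pte, size_t ncontig)
{
	assert(ncontig > 0);

	if ((pte ^ ptep[0]) & MODEL_PTE_WRITE)
		return true;

	for (size_t i = 0; i < ncontig; i++) {
		if ((pte ^ ptep[i]) & (MODEL_PTE_DIRTY | MODEL_PTE_YOUNG))
			return true;
	}

	return false;
}
```

When this returns false, the caller can skip the clear, flush and
rewrite of the whole set, which is the benefit riscv gains from the
shared implementation.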

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/arm64/include/asm/hugetlb.h |  4 --
 arch/arm64/include/asm/pgtable.h | 15 +++++-
 arch/arm64/mm/hugetlbpage.c      | 69 +--------------------------
 arch/riscv/include/asm/hugetlb.h |  5 --
 arch/riscv/include/asm/pgtable.h | 11 +++--
 arch/riscv/mm/hugetlbpage.c      | 32 ++-----------
 arch/riscv/mm/pgtable.c          |  6 +--
 include/linux/hugetlb_contpte.h  |  5 ++
 mm/hugetlb_contpte.c             | 81 ++++++++++++++++++++++++++++++--
 9 files changed, 110 insertions(+), 118 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 9b1c25775bea..29a9dac52cef 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -35,10 +35,6 @@ static inline void arch_clear_hugetlb_flags(struct folio *folio)
 
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
-#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
-extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
-				      unsigned long addr, pte_t *ptep,
-				      pte_t pte, int dirty);
 #define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
 				    unsigned long addr, pte_t *ptep);
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index af8156929c1d..9b5c57e56691 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1847,12 +1847,23 @@ static inline bool __hugetlb_valid_size(unsigned long size)
 	return false;
 }
 
-static inline int arch_contpte_get_num_contig(pte_t *ptep,
-					      unsigned long size,
+extern int find_num_contig(struct mm_struct *mm, unsigned long addr,
+			   pte_t *ptep, size_t *pgsize);
+
+static inline int arch_contpte_get_num_contig(struct mm_struct *mm,
+					      unsigned long addr,
+					      pte_t *ptep, unsigned long size,
 					      size_t *pgsize)
 {
 	int contig_ptes = 1;
 
+	/*
+	 * If the size is not passed, we need to go through the page table to
+	 * find out the number of contiguous ptes.
+	 */
+	if (size == 0)
+		return find_num_contig(mm, addr, ptep, pgsize);
+
 	if (pgsize)
 		*pgsize = size;
 
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 62a66ce2b2fe..03cb757f7935 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -63,8 +63,8 @@ bool arch_hugetlb_migration_supported(struct hstate *h)
 }
 #endif
 
-static int find_num_contig(struct mm_struct *mm, unsigned long addr,
-			   pte_t *ptep, size_t *pgsize)
+int find_num_contig(struct mm_struct *mm, unsigned long addr,
+		    pte_t *ptep, size_t *pgsize)
 {
 	pgd_t *pgdp = pgd_offset(mm, addr);
 	p4d_t *p4dp;
@@ -260,71 +260,6 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 	return entry;
 }
 
-/*
- * huge_ptep_set_access_flags will update access flags (dirty, accesssed)
- * and write permission.
- *
- * For a contiguous huge pte range we need to check whether or not write
- * permission has to change only on the first pte in the set. Then for
- * all the contiguous ptes we need to check whether or not there is a
- * discrepancy between dirty or young.
- */
-static int __cont_access_flags_changed(pte_t *ptep, pte_t pte, int ncontig)
-{
-	int i;
-
-	if (pte_write(pte) != pte_write(__ptep_get(ptep)))
-		return 1;
-
-	for (i = 0; i < ncontig; i++) {
-		pte_t orig_pte = __ptep_get(ptep + i);
-
-		if (pte_dirty(pte) != pte_dirty(orig_pte))
-			return 1;
-
-		if (pte_young(pte) != pte_young(orig_pte))
-			return 1;
-	}
-
-	return 0;
-}
-
-int huge_ptep_set_access_flags(struct vm_area_struct *vma,
-			       unsigned long addr, pte_t *ptep,
-			       pte_t pte, int dirty)
-{
-	int ncontig, i;
-	size_t pgsize = 0;
-	unsigned long pfn = pte_pfn(pte), dpfn;
-	struct mm_struct *mm = vma->vm_mm;
-	pgprot_t hugeprot;
-	pte_t orig_pte;
-
-	if (!pte_cont(pte))
-		return __ptep_set_access_flags(vma, addr, ptep, pte, dirty);
-
-	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-	dpfn = pgsize >> PAGE_SHIFT;
-
-	if (!__cont_access_flags_changed(ptep, pte, ncontig))
-		return 0;
-
-	orig_pte = get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
-
-	/* Make sure we don't lose the dirty or young state */
-	if (pte_dirty(orig_pte))
-		pte = pte_mkdirty(pte);
-
-	if (pte_young(orig_pte))
-		pte = pte_mkyoung(pte);
-
-	hugeprot = pte_pgprot(pte);
-	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
-		__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
-
-	return 1;
-}
-
 void huge_ptep_set_wrprotect(struct mm_struct *mm,
 			     unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index 0fbb6b19df79..bf533c2cef84 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -28,11 +28,6 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 void huge_ptep_set_wrprotect(struct mm_struct *mm,
 			     unsigned long addr, pte_t *ptep);
 
-#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
-int huge_ptep_set_access_flags(struct vm_area_struct *vma,
-			       unsigned long addr, pte_t *ptep,
-			       pte_t pte, int dirty);
-
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
 
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 72d3592454d3..081385e0d10a 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -613,7 +613,9 @@ static inline void ___set_ptes(struct mm_struct *mm, unsigned long addr,
  * Some hugetlb functions can be called on !present ptes, so we must use the
  * size parameter when it is passed.
  */
-static inline int arch_contpte_get_num_contig(pte_t *ptep, unsigned long size,
+static inline int arch_contpte_get_num_contig(struct mm_struct *mm,
+					      unsigned long addr,
+					      pte_t *ptep, unsigned long size,
 					      size_t *pgsize)
 {
 	unsigned long hugepage_shift;
@@ -657,9 +659,8 @@ static inline void __pte_clear(struct mm_struct *mm,
 	__set_pte_at(mm, ptep, __pte(0));
 }
 
-#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS	/* defined in mm/pgtable.c */
-extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
-				 pte_t *ptep, pte_t entry, int dirty);
+extern int __ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
+				   pte_t *ptep, pte_t entry, int dirty);
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG	/* defined in mm/pgtable.c */
 extern int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long address,
 				     pte_t *ptep);
@@ -788,6 +789,8 @@ static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 #define ptep_get_and_clear	__ptep_get_and_clear
 #define pte_clear		__pte_clear
+#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
+#define ptep_set_access_flags	__ptep_set_access_flags
 
 #define pgprot_nx pgprot_nx
 static inline pgprot_t pgprot_nx(pgprot_t _prot)
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 87168123d4a2..b2046f4bd445 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -176,33 +176,6 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 	return entry;
 }
 
-int huge_ptep_set_access_flags(struct vm_area_struct *vma,
-			       unsigned long addr,
-			       pte_t *ptep,
-			       pte_t pte,
-			       int dirty)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	pte_t orig_pte;
-	int pte_num;
-
-	if (!pte_napot(pte))
-		return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
-
-	pte_num = arch_contpte_get_num_contig(ptep, 0, NULL);
-	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
-
-	if (pte_dirty(orig_pte))
-		pte = pte_mkdirty(pte);
-
-	if (pte_young(orig_pte))
-		pte = pte_mkyoung(pte);
-
-	set_ptes(mm, addr, ptep, pte, pte_num);
-
-	return true;
-}
-
 void huge_ptep_set_wrprotect(struct mm_struct *mm,
 			     unsigned long addr,
 			     pte_t *ptep)
@@ -216,7 +189,8 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 		return;
 	}
 
-	pte_num = arch_contpte_get_num_contig(ptep, 0, NULL);
+	pte_num = arch_contpte_get_num_contig(mm, addr, ptep, 0, NULL);
+
 	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
 
 	orig_pte = pte_wrprotect(orig_pte);
@@ -234,7 +208,7 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 	if (!pte_napot(pte))
 		return ptep_clear_flush(vma, addr, ptep);
 
-	pte_num = arch_contpte_get_num_contig(ptep, 0, NULL);
+	pte_num = arch_contpte_get_num_contig(vma->vm_mm, addr, ptep, 0, NULL);
 
 	return get_clear_contig_flush(vma->vm_mm, addr, ptep, pte_num);
 }
diff --git a/arch/riscv/mm/pgtable.c b/arch/riscv/mm/pgtable.c
index 4ae67324f992..af8b3769a349 100644
--- a/arch/riscv/mm/pgtable.c
+++ b/arch/riscv/mm/pgtable.c
@@ -5,9 +5,9 @@
 #include <linux/kernel.h>
 #include <linux/pgtable.h>
 
-int ptep_set_access_flags(struct vm_area_struct *vma,
-			  unsigned long address, pte_t *ptep,
-			  pte_t entry, int dirty)
+int __ptep_set_access_flags(struct vm_area_struct *vma,
+			    unsigned long address, pte_t *ptep,
+			    pte_t entry, int dirty)
 {
 	asm goto(ALTERNATIVE("nop", "j %l[svvptc]", 0, RISCV_ISA_EXT_SVVPTC, 1)
 		 : : : : svvptc);
diff --git a/include/linux/hugetlb_contpte.h b/include/linux/hugetlb_contpte.h
index 1c8f46ff95ea..e129578f6500 100644
--- a/include/linux/hugetlb_contpte.h
+++ b/include/linux/hugetlb_contpte.h
@@ -23,4 +23,9 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 				     unsigned long addr, pte_t *ptep,
 				     unsigned long sz);
 
+#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
+extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
+				      unsigned long addr, pte_t *ptep,
+				      pte_t pte, int dirty);
+
 #endif /* _LINUX_HUGETLB_CONTPTE_H */
diff --git a/mm/hugetlb_contpte.c b/mm/hugetlb_contpte.c
index 82f49eb79ffb..b4c409d11195 100644
--- a/mm/hugetlb_contpte.c
+++ b/mm/hugetlb_contpte.c
@@ -15,7 +15,7 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 	if (!pte_present(orig_pte) || !pte_cont(orig_pte))
 		return orig_pte;
 
-	ncontig = arch_contpte_get_num_contig(ptep,
+	ncontig = arch_contpte_get_num_contig(mm, addr, ptep,
 					      page_size(pte_page(orig_pte)),
 					      NULL);
 
@@ -69,7 +69,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	int i;
 	int ncontig;
 
-	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
+	ncontig = arch_contpte_get_num_contig(mm, addr, ptep, sz, &pgsize);
 
 	if (!pte_present(pte)) {
 		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
@@ -93,7 +93,7 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 	int i, ncontig;
 	size_t pgsize;
 
-	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
+	ncontig = arch_contpte_get_num_contig(mm, addr, ptep, sz, &pgsize);
 
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
 		__pte_clear(mm, addr, ptep);
@@ -145,7 +145,80 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 	int ncontig;
 	size_t pgsize;
 
-	ncontig = arch_contpte_get_num_contig(ptep, sz, &pgsize);
+	ncontig = arch_contpte_get_num_contig(mm, addr, ptep, sz, &pgsize);
 
 	return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
 }
+
+/*
+ * huge_ptep_set_access_flags will update access flags (dirty, accesssed)
+ * and write permission.
+ *
+ * For a contiguous huge pte range we need to check whether or not write
+ * permission has to change only on the first pte in the set. Then for
+ * all the contiguous ptes we need to check whether or not there is a
+ * discrepancy between dirty or young.
+ */
+static int __cont_access_flags_changed(pte_t *ptep, pte_t pte, int ncontig)
+{
+	int i;
+
+	if (pte_write(pte) != pte_write(__ptep_get(ptep)))
+		return 1;
+
+	for (i = 0; i < ncontig; i++) {
+		pte_t orig_pte = __ptep_get(ptep + i);
+
+		if (pte_dirty(pte) != pte_dirty(orig_pte))
+			return 1;
+
+		if (pte_young(pte) != pte_young(orig_pte))
+			return 1;
+	}
+
+	return 0;
+}
+
+static pte_t get_clear_contig_flush(struct mm_struct *mm,
+				    unsigned long addr,
+				    pte_t *ptep,
+				    unsigned long pgsize,
+				    unsigned long ncontig)
+{
+	pte_t orig_pte = get_clear_contig(mm, addr, ptep, pgsize, ncontig);
+	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
+
+	flush_tlb_range(&vma, addr, addr + (pgsize * ncontig));
+	return orig_pte;
+}
+
+int huge_ptep_set_access_flags(struct vm_area_struct *vma,
+			       unsigned long addr, pte_t *ptep,
+			       pte_t pte, int dirty)
+{
+	int ncontig;
+	size_t pgsize = 0;
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t orig_pte;
+
+	if (!pte_cont(pte))
+		return __ptep_set_access_flags(vma, addr, ptep, pte, dirty);
+
+	ncontig = arch_contpte_get_num_contig(vma->vm_mm, addr, ptep, 0, &pgsize);
+
+	if (!__cont_access_flags_changed(ptep, pte, ncontig))
+		return 0;
+
+	orig_pte = get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
+
+	/* Make sure we don't lose the dirty or young state */
+	if (pte_dirty(orig_pte))
+		pte = pte_mkdirty(pte);
+
+	if (pte_young(orig_pte))
+		pte = pte_mkyoung(pte);
+
+	set_contptes(mm, addr, ptep, pte, ncontig, pgsize);
+
+	return 1;
+}
-- 
2.39.2




* [PATCH v5 8/9] mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
  2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
                   ` (6 preceding siblings ...)
  2025-03-21 13:06 ` [PATCH v5 7/9] mm: Use common huge_ptep_set_access_flags() " Alexandre Ghiti
@ 2025-03-21 13:06 ` Alexandre Ghiti
  2025-03-21 13:06 ` [PATCH v5 9/9] mm: Use common huge_ptep_clear_flush() " Alexandre Ghiti
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-03-21 13:06 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm
  Cc: Alexandre Ghiti

After some adjustments, both architectures have the same implementation
so move it to the generic code.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/arm64/include/asm/hugetlb.h |  3 ---
 arch/arm64/mm/hugetlbpage.c      | 27 ---------------------------
 arch/riscv/include/asm/hugetlb.h |  4 ----
 arch/riscv/include/asm/pgtable.h |  7 ++++---
 arch/riscv/mm/hugetlbpage.c      | 22 ----------------------
 include/linux/hugetlb_contpte.h  |  4 ++++
 mm/hugetlb_contpte.c             | 20 ++++++++++++++++++++
 7 files changed, 28 insertions(+), 59 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 29a9dac52cef..f568467e8ba2 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -35,9 +35,6 @@ static inline void arch_clear_hugetlb_flags(struct folio *folio)
 
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
-#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
-extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
-				    unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 				   unsigned long addr, pte_t *ptep);
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 03cb757f7935..17f1ed34356d 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -260,33 +260,6 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 	return entry;
 }
 
-void huge_ptep_set_wrprotect(struct mm_struct *mm,
-			     unsigned long addr, pte_t *ptep)
-{
-	unsigned long pfn, dpfn;
-	pgprot_t hugeprot;
-	int ncontig, i;
-	size_t pgsize;
-	pte_t pte;
-
-	if (!pte_cont(__ptep_get(ptep))) {
-		__ptep_set_wrprotect(mm, addr, ptep);
-		return;
-	}
-
-	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-	dpfn = pgsize >> PAGE_SHIFT;
-
-	pte = get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
-	pte = pte_wrprotect(pte);
-
-	hugeprot = pte_pgprot(pte);
-	pfn = pte_pfn(pte);
-
-	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
-		__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
-}
-
 pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 			    unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index bf533c2cef84..4c692dd82779 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -24,10 +24,6 @@ bool arch_hugetlb_migration_supported(struct hstate *h);
 pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 			    unsigned long addr, pte_t *ptep);
 
-#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
-void huge_ptep_set_wrprotect(struct mm_struct *mm,
-			     unsigned long addr, pte_t *ptep);
-
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
 
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 081385e0d10a..c41b49948ee9 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -665,9 +665,8 @@ extern int __ptep_set_access_flags(struct vm_area_struct *vma, unsigned long add
 extern int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long address,
 				     pte_t *ptep);
 
-#define __HAVE_ARCH_PTEP_SET_WRPROTECT
-static inline void ptep_set_wrprotect(struct mm_struct *mm,
-				      unsigned long address, pte_t *ptep)
+static inline void __ptep_set_wrprotect(struct mm_struct *mm,
+					unsigned long address, pte_t *ptep)
 {
 	atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep);
 }
@@ -791,6 +790,8 @@ static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
 #define pte_clear		__pte_clear
 #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 #define ptep_set_access_flags	__ptep_set_access_flags
+#define __HAVE_ARCH_PTEP_SET_WRPROTECT
+#define ptep_set_wrprotect	__ptep_set_wrprotect
 
 #define pgprot_nx pgprot_nx
 static inline pgprot_t pgprot_nx(pgprot_t _prot)
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index b2046f4bd445..db13f7bcdd54 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -176,28 +176,6 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 	return entry;
 }
 
-void huge_ptep_set_wrprotect(struct mm_struct *mm,
-			     unsigned long addr,
-			     pte_t *ptep)
-{
-	pte_t pte = ptep_get(ptep);
-	pte_t orig_pte;
-	int pte_num;
-
-	if (!pte_napot(pte)) {
-		ptep_set_wrprotect(mm, addr, ptep);
-		return;
-	}
-
-	pte_num = arch_contpte_get_num_contig(mm, addr, ptep, 0, NULL);
-
-	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
-
-	orig_pte = pte_wrprotect(orig_pte);
-
-	set_ptes(mm, addr, ptep, orig_pte, pte_num);
-}
-
 pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 			    unsigned long addr,
 			    pte_t *ptep)
diff --git a/include/linux/hugetlb_contpte.h b/include/linux/hugetlb_contpte.h
index e129578f6500..9ec8792a2f4d 100644
--- a/include/linux/hugetlb_contpte.h
+++ b/include/linux/hugetlb_contpte.h
@@ -28,4 +28,8 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 				      unsigned long addr, pte_t *ptep,
 				      pte_t pte, int dirty);
 
+#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
+extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
+				    unsigned long addr, pte_t *ptep);
+
 #endif /* _LINUX_HUGETLB_CONTPTE_H */
diff --git a/mm/hugetlb_contpte.c b/mm/hugetlb_contpte.c
index b4c409d11195..629878765081 100644
--- a/mm/hugetlb_contpte.c
+++ b/mm/hugetlb_contpte.c
@@ -222,3 +222,23 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 
 	return 1;
 }
+
+void huge_ptep_set_wrprotect(struct mm_struct *mm,
+			     unsigned long addr, pte_t *ptep)
+{
+	int ncontig;
+	size_t pgsize;
+	pte_t pte;
+
+	if (!pte_cont(__ptep_get(ptep))) {
+		__ptep_set_wrprotect(mm, addr, ptep);
+		return;
+	}
+
+	ncontig = arch_contpte_get_num_contig(mm, addr, ptep, 0, &pgsize);
+
+	pte = get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
+	pte = pte_wrprotect(pte);
+
+	set_contptes(mm, addr, ptep, pte, ncontig, pgsize);
+}
-- 
2.39.2




* [PATCH v5 9/9] mm: Use common huge_ptep_clear_flush() function for riscv/arm64
  2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
                   ` (7 preceding siblings ...)
  2025-03-21 13:06 ` [PATCH v5 8/9] mm: Use common huge_ptep_set_wrprotect() " Alexandre Ghiti
@ 2025-03-21 13:06 ` Alexandre Ghiti
  2025-03-21 17:24 ` [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Christophe Leroy
  2025-04-07 12:04 ` Alexandre Ghiti
  10 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-03-21 13:06 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm
  Cc: Alexandre Ghiti

After some adjustments, both architectures have the same implementation
so move it to the generic code.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/arm64/include/asm/hugetlb.h |  3 --
 arch/arm64/mm/hugetlbpage.c      | 60 --------------------------------
 arch/riscv/include/asm/hugetlb.h |  7 +---
 arch/riscv/mm/hugetlbpage.c      | 54 ----------------------------
 include/linux/hugetlb_contpte.h  |  4 +++
 mm/hugetlb_contpte.c             | 14 ++++++++
 6 files changed, 19 insertions(+), 123 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index f568467e8ba2..368600764127 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -35,9 +35,6 @@ static inline void arch_clear_hugetlb_flags(struct folio *folio)
 
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
-#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
-				   unsigned long addr, pte_t *ptep);
 
 void __init arm64_hugetlb_cma_reserve(void);
 
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 17f1ed34356d..08316cf4b104 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -82,52 +82,6 @@ int find_num_contig(struct mm_struct *mm, unsigned long addr,
 	return CONT_PTES;
 }
 
-/*
- * Changing some bits of contiguous entries requires us to follow a
- * Break-Before-Make approach, breaking the whole contiguous set
- * before we can change any entries. See ARM DDI 0487A.k_iss10775,
- * "Misprogramming of the Contiguous bit", page D4-1762.
- *
- * This helper performs the break step.
- */
-static pte_t get_clear_contig(struct mm_struct *mm,
-			     unsigned long addr,
-			     pte_t *ptep,
-			     unsigned long pgsize,
-			     unsigned long ncontig)
-{
-	pte_t pte, tmp_pte;
-	bool present;
-
-	pte = __ptep_get_and_clear(mm, addr, ptep);
-	present = pte_present(pte);
-	while (--ncontig) {
-		ptep++;
-		addr += pgsize;
-		tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
-		if (present) {
-			if (pte_dirty(tmp_pte))
-				pte = pte_mkdirty(pte);
-			if (pte_young(tmp_pte))
-				pte = pte_mkyoung(pte);
-		}
-	}
-	return pte;
-}
-
-static pte_t get_clear_contig_flush(struct mm_struct *mm,
-				    unsigned long addr,
-				    pte_t *ptep,
-				    unsigned long pgsize,
-				    unsigned long ncontig)
-{
-	pte_t orig_pte = get_clear_contig(mm, addr, ptep, pgsize, ncontig);
-	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
-
-	flush_tlb_range(&vma, addr, addr + (pgsize * ncontig));
-	return orig_pte;
-}
-
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz)
 {
@@ -260,20 +214,6 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 	return entry;
 }
 
-pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
-			    unsigned long addr, pte_t *ptep)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	size_t pgsize;
-	int ncontig;
-
-	if (!pte_cont(__ptep_get(ptep)))
-		return ptep_clear_flush(vma, addr, ptep);
-
-	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-	return get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
-}
-
 static int __init hugetlbpage_init(void)
 {
 	/*
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index 4c692dd82779..63c7e4fa342a 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -20,14 +20,9 @@ bool arch_hugetlb_migration_supported(struct hstate *h);
 #endif
 
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
-#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
-			    unsigned long addr, pte_t *ptep);
-
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
-
-#endif /*CONFIG_RISCV_ISA_SVNAPOT*/
+#endif /* CONFIG_RISCV_ISA_SVNAPOT */
 
 #include <asm-generic/hugetlb.h>
 
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index db13f7bcdd54..a6176415432a 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -121,45 +121,6 @@ unsigned long hugetlb_mask_last_page(struct hstate *h)
 	return 0UL;
 }
 
-static pte_t get_clear_contig(struct mm_struct *mm,
-			      unsigned long addr,
-			      pte_t *ptep,
-			      unsigned long ncontig)
-{
-	pte_t pte, tmp_pte;
-	bool present;
-
-	pte = ptep_get_and_clear(mm, addr, ptep);
-	present = pte_present(pte);
-	while (--ncontig) {
-		ptep++;
-		addr += PAGE_SIZE;
-		tmp_pte = ptep_get_and_clear(mm, addr, ptep);
-		if (present) {
-			if (pte_dirty(tmp_pte))
-				pte = pte_mkdirty(pte);
-			if (pte_young(tmp_pte))
-				pte = pte_mkyoung(pte);
-		}
-	}
-	return pte;
-}
-
-static pte_t get_clear_contig_flush(struct mm_struct *mm,
-				    unsigned long addr,
-				    pte_t *ptep,
-				    unsigned long pte_num)
-{
-	pte_t orig_pte = get_clear_contig(mm, addr, ptep, pte_num);
-	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
-	bool valid = !pte_none(orig_pte);
-
-	if (valid)
-		flush_tlb_range(&vma, addr, addr + (PAGE_SIZE * pte_num));
-
-	return orig_pte;
-}
-
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 {
 	unsigned long order;
@@ -176,21 +137,6 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 	return entry;
 }
 
-pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
-			    unsigned long addr,
-			    pte_t *ptep)
-{
-	pte_t pte = ptep_get(ptep);
-	int pte_num;
-
-	if (!pte_napot(pte))
-		return ptep_clear_flush(vma, addr, ptep);
-
-	pte_num = arch_contpte_get_num_contig(vma->vm_mm, addr, ptep, 0, NULL);
-
-	return get_clear_contig_flush(vma->vm_mm, addr, ptep, pte_num);
-}
-
 static bool is_napot_size(unsigned long size)
 {
 	unsigned long order;
diff --git a/include/linux/hugetlb_contpte.h b/include/linux/hugetlb_contpte.h
index 9ec8792a2f4d..e217a3412b13 100644
--- a/include/linux/hugetlb_contpte.h
+++ b/include/linux/hugetlb_contpte.h
@@ -32,4 +32,8 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
 				    unsigned long addr, pte_t *ptep);
 
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
+extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+				   unsigned long addr, pte_t *ptep);
+
 #endif /* _LINUX_HUGETLB_CONTPTE_H */
diff --git a/mm/hugetlb_contpte.c b/mm/hugetlb_contpte.c
index 629878765081..1dc211d6fbe1 100644
--- a/mm/hugetlb_contpte.c
+++ b/mm/hugetlb_contpte.c
@@ -242,3 +242,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 
 	set_contptes(mm, addr, ptep, pte, ncontig, pgsize);
 }
+
+pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+			    unsigned long addr, pte_t *ptep)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	size_t pgsize;
+	int ncontig;
+
+	if (!pte_cont(__ptep_get(ptep)))
+		return ptep_clear_flush(vma, addr, ptep);
+
+	ncontig = arch_contpte_get_num_contig(mm, addr, ptep, 0, &pgsize);
+	return get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
+}
-- 
2.39.2




* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
                   ` (8 preceding siblings ...)
  2025-03-21 13:06 ` [PATCH v5 9/9] mm: Use common huge_ptep_clear_flush() " Alexandre Ghiti
@ 2025-03-21 17:24 ` Christophe Leroy
  2025-03-25 12:36   ` Alexandre Ghiti
  2025-04-07 12:04 ` Alexandre Ghiti
  10 siblings, 1 reply; 22+ messages in thread
From: Christophe Leroy @ 2025-03-21 17:24 UTC (permalink / raw)
  To: Alexandre Ghiti, Catalin Marinas, Will Deacon, Ryan Roberts,
	Mark Rutland, Matthew Wilcox, Paul Walmsley, Palmer Dabbelt,
	Alexandre Ghiti, Andrew Morton, linux-arm-kernel, linux-kernel,
	linux-riscv, linux-mm



On 21/03/2025 at 14:06, Alexandre Ghiti wrote:
> This patchset intends to merge the contiguous ptes hugetlbfs implementation
> of arm64 and riscv.

Can we also add powerpc to the dance?

powerpc also uses contiguous PTEs, although there is not (yet) a special
name for it:
- b250c8c08c79 powerpc/8xx: Manage 512k huge pages as standard pages
- e47168f3d1b1 powerpc/8xx: Support 16k hugepages with 4k pages

powerpc also uses contiguous PMDs/PUDs for larger hugepages:
- 57fb15c32f4f ("powerpc/64s: use contiguous PMD/PUD instead of HUGEPD")
- 7c44202e3609 ("powerpc/e500: use contiguous PMD instead of hugepd")
- 0549e7666373 ("powerpc/8xx: rework support for 8M pages using 
contiguous PTE entries")

Christophe

> 
> Both arm64 and riscv support the use of contiguous ptes to map pages that
> are larger than the default page table size, respectively called contpte
> and svnapot.
> 
> The riscv implementation differs from the arm64's in that the LSBs of the
> pfn of a svnapot pte are used to store the size of the mapping, allowing
> for future sizes to be added (for now only 64KB is supported). That's an
> issue for the core mm code which expects to find the *real* pfn a pte points
> to. Patch 1 fixes that by always returning svnapot ptes with the real pfn
> and restores the size of the mapping when it is written to a page table.
> 
> The following patches are just merges of the 2 different implementations
> that currently exist in arm64 and riscv which are very similar. It paves
> the way to the reuse of the recent contpte THP work by Ryan [1] to avoid
> reimplementing the same in riscv.
> 
> This patchset was tested by running the libhugetlbfs testsuite with 64KB
> and 2MB pages on both architectures (on a 4KB base page size arm64 kernel).
> 
> [1] https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/
> 
> v4: https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
> v3: https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
> v2: https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
> v1: https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/
> 
> Changes in v5:
>    - Fix "int i" unused variable in patch 2 (as reported by PW)
>    - Fix !svnapot build
>    - Fix arch_make_huge_pte() which returned a real napot pte
>    - Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot aware to
>      avoid leaking real napot pfns to core mm
>    - Fix arch_contpte_get_num_contig() that used to always try to get the
>      mapping size from the ptep, which does not work if the ptep comes from the core mm
>    - Rebase on top of 6.14-rc7 + fix for
>      huge_ptep_get_and_clear()/huge_pte_clear()
>      https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/
> 
> Changes in v4:
>    - Rebase on top of 6.13
> 
> Changes in v3:
>    - Split set_ptes and ptep_get into internal and external API (Ryan)
>    - Rename ARCH_HAS_CONTPTE into ARCH_WANT_GENERAL_HUGETLB_CONTPTE so that
>      we split hugetlb functions from contpte functions (actually riscv contpte
>      functions to support THP will come into another series) (Ryan)
>    - Rebase on top of 6.11-rc1
> 
> Changes in v2:
>    - Rebase on top of 6.9-rc3
> 
> Alexandre Ghiti (9):
>    riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
>    riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
>    mm: Use common huge_ptep_get() function for riscv/arm64
>    mm: Use common set_huge_pte_at() function for riscv/arm64
>    mm: Use common huge_pte_clear() function for riscv/arm64
>    mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
>    mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
>    mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
>    mm: Use common huge_ptep_clear_flush() function for riscv/arm64
> 
>   arch/arm64/Kconfig                  |   1 +
>   arch/arm64/include/asm/hugetlb.h    |  22 +--
>   arch/arm64/include/asm/pgtable.h    |  68 ++++++-
>   arch/arm64/mm/hugetlbpage.c         | 294 +---------------------------
>   arch/riscv/Kconfig                  |   1 +
>   arch/riscv/include/asm/hugetlb.h    |  36 +---
>   arch/riscv/include/asm/pgtable-64.h |  11 ++
>   arch/riscv/include/asm/pgtable.h    | 222 ++++++++++++++++++---
>   arch/riscv/mm/hugetlbpage.c         | 243 +----------------------
>   arch/riscv/mm/pgtable.c             |   6 +-
>   include/linux/hugetlb_contpte.h     |  39 ++++
>   mm/Kconfig                          |   3 +
>   mm/Makefile                         |   1 +
>   mm/hugetlb_contpte.c                | 258 ++++++++++++++++++++++++
>   14 files changed, 583 insertions(+), 622 deletions(-)
>   create mode 100644 include/linux/hugetlb_contpte.h
>   create mode 100644 mm/hugetlb_contpte.c
> 




* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-03-21 17:24 ` [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Christophe Leroy
@ 2025-03-25 12:36   ` Alexandre Ghiti
  0 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-03-25 12:36 UTC (permalink / raw)
  To: Christophe Leroy, Alexandre Ghiti, Catalin Marinas, Will Deacon,
	Ryan Roberts, Mark Rutland, Matthew Wilcox, Paul Walmsley,
	Palmer Dabbelt, Andrew Morton, linux-arm-kernel, linux-kernel,
	linux-riscv, linux-mm

Hi Christophe,

On 21/03/2025 18:24, Christophe Leroy wrote:
>
>
> On 21/03/2025 at 14:06, Alexandre Ghiti wrote:
>> This patchset intends to merge the contiguous ptes hugetlbfs 
>> implementation
>> of arm64 and riscv.
>
> Can we also add powerpc to the dance?
>
> powerpc also uses contiguous PTEs, although there is not (yet) a
> special name for it:
> - b250c8c08c79 powerpc/8xx: Manage 512k huge pages as standard pages
> - e47168f3d1b1 powerpc/8xx: Support 16k hugepages with 4k pages
>
> powerpc also uses contiguous PMDs/PUDs for larger hugepages:
> - 57fb15c32f4f ("powerpc/64s: use contiguous PMD/PUD instead of HUGEPD")
> - 7c44202e3609 ("powerpc/e500: use contiguous PMD instead of hugepd")
> - 0549e7666373 ("powerpc/8xx: rework support for 8M pages using 
> contiguous PTE entries")


So I have been looking at the powerpc hugetlb implementation and I have 
to admit that I'm struggling to find similarities with how arm64 and 
riscv deal with contiguous pte mappings.

I think the 2 main characteristics of contpte (arm64) and svnapot 
(riscv) are the break-before-make requirement and the HW A/D update on 
only a single pte. Those make the handling of hugetlb pages very similar 
between arm64 and riscv.

But I may have missed something: the powerpc hugetlb implementation is
quite "scattered" because of the radix/hash page tables and the 32/64-bit split.
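
To make the two characteristics mentioned above concrete, here is a minimal
user-space sketch of the break-before-make sequence with A/D folding. It is
only a toy model loosely following the shape of get_clear_contig() and
huge_ptep_set_wrprotect() in this series: toy_pte_t, the TOY_* bit layout and
the toy_* helpers are all hypothetical, there is no pfn handling, and the TLB
flush is only a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy pte, NOT the kernel's pte_t. The bit layout is made up. */
typedef uint64_t toy_pte_t;

#define TOY_PRESENT	(1ULL << 0)
#define TOY_WRITE	(1ULL << 1)
#define TOY_YOUNG	(1ULL << 2)
#define TOY_DIRTY	(1ULL << 3)

/*
 * Break step: clear every entry of the contiguous set. Since the hardware
 * may have set young/dirty on any single entry, fold those bits from all
 * entries into the value returned for the first one.
 */
static toy_pte_t toy_get_clear_contig(toy_pte_t *ptep, size_t ncontig)
{
	toy_pte_t pte;
	bool present;
	size_t i;

	assert(ncontig > 0);

	pte = ptep[0];
	present = (pte & TOY_PRESENT) != 0;
	ptep[0] = 0;

	for (i = 1; i < ncontig; i++) {
		toy_pte_t tmp = ptep[i];

		ptep[i] = 0;
		if (present)
			pte |= tmp & (TOY_YOUNG | TOY_DIRTY);
	}

	return pte;
}

/* Make step: write the same (toy) value back to every entry of the set. */
static void toy_set_contig(toy_pte_t *ptep, toy_pte_t pte, size_t ncontig)
{
	size_t i;

	for (i = 0; i < ncontig; i++)
		ptep[i] = pte;
}

/* Write-protect a whole contiguous set: break, modify, then make. */
static void toy_wrprotect_contig(toy_pte_t *ptep, size_t ncontig)
{
	toy_pte_t pte = toy_get_clear_contig(ptep, ncontig);

	/* A real implementation flushes the TLB here, before the make step. */
	pte &= ~TOY_WRITE;
	toy_set_contig(ptep, pte, ncontig);
}
```

In this model, a set where only one entry was marked dirty and another young
comes back with both bits set on every entry after write-protection, which is
the behaviour the generic helpers rely on for both architectures.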

Thanks,

Alex


>
> Christophe
>
>>
>> Both arm64 and riscv support the use of contiguous ptes to map pages 
>> that
>> are larger than the default page table size, respectively called contpte
>> and svnapot.
>>
>> The riscv implementation differs from the arm64's in that the LSBs of 
>> the
>> pfn of a svnapot pte are used to store the size of the mapping, allowing
>> for future sizes to be added (for now only 64KB is supported). That's an
>> issue for the core mm code which expects to find the *real* pfn a pte 
>> points
>> to. Patch 1 fixes that by always returning svnapot ptes with the real 
>> pfn
>> and restores the size of the mapping when it is written to a page table.
>>
>> The following patches are just merges of the 2 different implementations
>> that currently exist in arm64 and riscv which are very similar. It paves
>> the way to the reuse of the recent contpte THP work by Ryan [1] to avoid
>> reimplementing the same in riscv.
>>
>> This patchset was tested by running the libhugetlbfs testsuite with 64KB
>> and 2MB pages on both architectures (on a 4KB base page size arm64 
>> kernel).
>>
>> [1] 
>> https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/
>>
>> v4: 
>> https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
>> v3: 
>> https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
>> v2: 
>> https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
>> v1: 
>> https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/
>>
>> Changes in v5:
>>    - Fix "int i" unused variable in patch 2 (as reported by PW)
>>    - Fix !svnapot build
>>    - Fix arch_make_huge_pte() which returned a real napot pte
>>    - Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot 
>> aware to
>>      avoid leaking real napot pfns to core mm
>>    - Fix arch_contpte_get_num_contig() that used to always try to get the
>>      mapping size from the ptep, which does not work if the ptep comes
>>      from the core mm
>>    - Rebase on top of 6.14-rc7 + fix for
>>      huge_ptep_get_and_clear()/huge_pte_clear()
>> https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/
>>
>> Changes in v4:
>>    - Rebase on top of 6.13
>>
>> Changes in v3:
>>    - Split set_ptes and ptep_get into internal and external API (Ryan)
>>    - Rename ARCH_HAS_CONTPTE into ARCH_WANT_GENERAL_HUGETLB_CONTPTE 
>> so that
>>      we split hugetlb functions from contpte functions (actually 
>> riscv contpte
>>      functions to support THP will come into another series) (Ryan)
>>    - Rebase on top of 6.11-rc1
>>
>> Changes in v2:
>>    - Rebase on top of 6.9-rc3
>>
>> Alexandre Ghiti (9):
>>    riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
>>    riscv: Restore the pfn in a NAPOT pte when manipulated by core mm 
>> code
>>    mm: Use common huge_ptep_get() function for riscv/arm64
>>    mm: Use common set_huge_pte_at() function for riscv/arm64
>>    mm: Use common huge_pte_clear() function for riscv/arm64
>>    mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
>>    mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
>>    mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
>>    mm: Use common huge_ptep_clear_flush() function for riscv/arm64
>>
>>   arch/arm64/Kconfig                  |   1 +
>>   arch/arm64/include/asm/hugetlb.h    |  22 +--
>>   arch/arm64/include/asm/pgtable.h    |  68 ++++++-
>>   arch/arm64/mm/hugetlbpage.c         | 294 +---------------------------
>>   arch/riscv/Kconfig                  |   1 +
>>   arch/riscv/include/asm/hugetlb.h    |  36 +---
>>   arch/riscv/include/asm/pgtable-64.h |  11 ++
>>   arch/riscv/include/asm/pgtable.h    | 222 ++++++++++++++++++---
>>   arch/riscv/mm/hugetlbpage.c         | 243 +----------------------
>>   arch/riscv/mm/pgtable.c             |   6 +-
>>   include/linux/hugetlb_contpte.h     |  39 ++++
>>   mm/Kconfig                          |   3 +
>>   mm/Makefile                         |   1 +
>>   mm/hugetlb_contpte.c                | 258 ++++++++++++++++++++++++
>>   14 files changed, 583 insertions(+), 622 deletions(-)
>>   create mode 100644 include/linux/hugetlb_contpte.h
>>   create mode 100644 mm/hugetlb_contpte.c
>>
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
                   ` (9 preceding siblings ...)
  2025-03-21 17:24 ` [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Christophe Leroy
@ 2025-04-07 12:04 ` Alexandre Ghiti
  2025-04-29 14:09   ` Ryan Roberts
  10 siblings, 1 reply; 22+ messages in thread
From: Alexandre Ghiti @ 2025-04-07 12:04 UTC (permalink / raw)
  To: Alexandre Ghiti, Catalin Marinas, Will Deacon, Ryan Roberts,
	Mark Rutland, Matthew Wilcox, Paul Walmsley, Palmer Dabbelt,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm

Can someone from arm64 review this? I think it's preferable to share the 
same implementation between riscv and arm64.

The end goal is the support of mTHP using svnapot on riscv, which we 
want soon, so if that patchset does not gain any traction, I'll just 
copy/paste the arm64 implementation into riscv.

Thanks,

Alex

On 21/03/2025 14:06, Alexandre Ghiti wrote:
> This patchset intends to merge the contiguous ptes hugetlbfs implementation
> of arm64 and riscv.
>
> Both arm64 and riscv support the use of contiguous ptes to map pages that
> are larger than the default page table size, respectively called contpte
> and svnapot.
>
> The riscv implementation differs from the arm64's in that the LSBs of the
> pfn of a svnapot pte are used to store the size of the mapping, allowing
> for future sizes to be added (for now only 64KB is supported). That's an
> issue for the core mm code which expects to find the *real* pfn a pte points
> to. Patch 1 fixes that by always returning svnapot ptes with the real pfn
> and restores the size of the mapping when it is written to a page table.
>
> The following patches are just merges of the 2 different implementations
> that currently exist in arm64 and riscv which are very similar. It paves
> the way to the reuse of the recent contpte THP work by Ryan [1] to avoid
> reimplementing the same in riscv.
>
> This patchset was tested by running the libhugetlbfs testsuite with 64KB
> and 2MB pages on both architectures (on a 4KB base page size arm64 kernel).
>
> [1] https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/
>
> v4: https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
> v3: https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
> v2: https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
> v1: https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/
>
> Changes in v5:
>    - Fix "int i" unused variable in patch 2 (as reported by PW)
>    - Fix !svnapot build
>    - Fix arch_make_huge_pte() which returned a real napot pte
>    - Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot aware to
>      avoid leaking real napot pfns to core mm
>    - Fix arch_contpte_get_num_contig() that used to always try to get the
>      mapping size from the ptep, which does not work if the ptep comes from the core mm
>    - Rebase on top of 6.14-rc7 + fix for
>      huge_ptep_get_and_clear()/huge_pte_clear()
>      https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/
>
> Changes in v4:
>    - Rebase on top of 6.13
>
> Changes in v3:
>    - Split set_ptes and ptep_get into internal and external API (Ryan)
>    - Rename ARCH_HAS_CONTPTE into ARCH_WANT_GENERAL_HUGETLB_CONTPTE so that
>      we split hugetlb functions from contpte functions (actually riscv contpte
>      functions to support THP will come into another series) (Ryan)
>    - Rebase on top of 6.11-rc1
>
> Changes in v2:
>    - Rebase on top of 6.9-rc3
>
> Alexandre Ghiti (9):
>    riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
>    riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
>    mm: Use common huge_ptep_get() function for riscv/arm64
>    mm: Use common set_huge_pte_at() function for riscv/arm64
>    mm: Use common huge_pte_clear() function for riscv/arm64
>    mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
>    mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
>    mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
>    mm: Use common huge_ptep_clear_flush() function for riscv/arm64
>
>   arch/arm64/Kconfig                  |   1 +
>   arch/arm64/include/asm/hugetlb.h    |  22 +--
>   arch/arm64/include/asm/pgtable.h    |  68 ++++++-
>   arch/arm64/mm/hugetlbpage.c         | 294 +---------------------------
>   arch/riscv/Kconfig                  |   1 +
>   arch/riscv/include/asm/hugetlb.h    |  36 +---
>   arch/riscv/include/asm/pgtable-64.h |  11 ++
>   arch/riscv/include/asm/pgtable.h    | 222 ++++++++++++++++++---
>   arch/riscv/mm/hugetlbpage.c         | 243 +----------------------
>   arch/riscv/mm/pgtable.c             |   6 +-
>   include/linux/hugetlb_contpte.h     |  39 ++++
>   mm/Kconfig                          |   3 +
>   mm/Makefile                         |   1 +
>   mm/hugetlb_contpte.c                | 258 ++++++++++++++++++++++++
>   14 files changed, 583 insertions(+), 622 deletions(-)
>   create mode 100644 include/linux/hugetlb_contpte.h
>   create mode 100644 mm/hugetlb_contpte.c
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-04-07 12:04 ` Alexandre Ghiti
@ 2025-04-29 14:09   ` Ryan Roberts
  2025-05-05 16:08     ` Alexandre Ghiti
  0 siblings, 1 reply; 22+ messages in thread
From: Ryan Roberts @ 2025-04-29 14:09 UTC (permalink / raw)
  To: Alexandre Ghiti, Alexandre Ghiti, Catalin Marinas, Will Deacon,
	Mark Rutland, Matthew Wilcox, Paul Walmsley, Palmer Dabbelt,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm

Hi Alexandre,

On 07/04/2025 13:04, Alexandre Ghiti wrote:
> Can someone from arm64 review this? I think it's preferable to share the same
> implementation between riscv and arm64.

I've been thinking about this for a while and had some conversations internally.
This patchset has both pros and cons.

In the pros column, it increases code reuse in an area that has had quite a few
bugs popping up lately; so this would bring more eyes and hopefully higher
quality in the long run.

But in the cons column, we have seen HW errata in similar areas in the past and
I'm nervous that by hoisting this code to mm, we make it harder to work around
any future errata. Additionally I can imagine that this change could make it
harder to support future Arm architecture enhancements.

I appreciate the cons are not strong *technical* arguments but nevertheless they
are winning out in this case; My opinion is that we should keep the arm64
implementations of huge_pte_ (and contpte_ too - I know you have a separate
series for this) private to arm64.

Sorry about that.

> 
> The end goal is the support of mTHP using svnapot on riscv, which we want soon,
> so if that patchset does not gain any traction, I'll just copy/paste the arm64
> implementation into riscv.

This copy/paste approach would be my preference.

Thanks,
Ryan

> 
> Thanks,
> 
> Alex
> 
> On 21/03/2025 14:06, Alexandre Ghiti wrote:
>> This patchset intends to merge the contiguous ptes hugetlbfs implementation
>> of arm64 and riscv.
>>
>> Both arm64 and riscv support the use of contiguous ptes to map pages that
>> are larger than the default page table size, respectively called contpte
>> and svnapot.
>>
>> The riscv implementation differs from the arm64's in that the LSBs of the
>> pfn of a svnapot pte are used to store the size of the mapping, allowing
>> for future sizes to be added (for now only 64KB is supported). That's an
>> issue for the core mm code which expects to find the *real* pfn a pte points
>> to. Patch 1 fixes that by always returning svnapot ptes with the real pfn
>> and restores the size of the mapping when it is written to a page table.
>>
>> The following patches are just merges of the 2 different implementations
>> that currently exist in arm64 and riscv which are very similar. It paves
>> the way to the reuse of the recent contpte THP work by Ryan [1] to avoid
>> reimplementing the same in riscv.
>>
>> This patchset was tested by running the libhugetlbfs testsuite with 64KB
>> and 2MB pages on both architectures (on a 4KB base page size arm64 kernel).
>>
>> [1] https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/
>>
>> v4: https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
>> v3: https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
>> v2: https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
>> v1: https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/
>>
>> Changes in v5:
>>    - Fix "int i" unused variable in patch 2 (as reported by PW)
>>    - Fix !svnapot build
>>    - Fix arch_make_huge_pte() which returned a real napot pte
>>    - Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot aware to
>>      avoid leaking real napot pfns to core mm
>>    - Fix arch_contpte_get_num_contig() that used to always try to get the
>>      mapping size from the ptep, which does not work if the ptep comes from
>>      the core mm
>>    - Rebase on top of 6.14-rc7 + fix for
>>      huge_ptep_get_and_clear()/huge_pte_clear()
>>      https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/
>>
>> Changes in v4:
>>    - Rebase on top of 6.13
>>
>> Changes in v3:
>>    - Split set_ptes and ptep_get into internal and external API (Ryan)
>>    - Rename ARCH_HAS_CONTPTE into ARCH_WANT_GENERAL_HUGETLB_CONTPTE so that
>>      we split hugetlb functions from contpte functions (actually riscv contpte
>>      functions to support THP will come into another series) (Ryan)
>>    - Rebase on top of 6.11-rc1
>>
>> Changes in v2:
>>    - Rebase on top of 6.9-rc3
>>
>> Alexandre Ghiti (9):
>>    riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
>>    riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
>>    mm: Use common huge_ptep_get() function for riscv/arm64
>>    mm: Use common set_huge_pte_at() function for riscv/arm64
>>    mm: Use common huge_pte_clear() function for riscv/arm64
>>    mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
>>    mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
>>    mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
>>    mm: Use common huge_ptep_clear_flush() function for riscv/arm64
>>
>>   arch/arm64/Kconfig                  |   1 +
>>   arch/arm64/include/asm/hugetlb.h    |  22 +--
>>   arch/arm64/include/asm/pgtable.h    |  68 ++++++-
>>   arch/arm64/mm/hugetlbpage.c         | 294 +---------------------------
>>   arch/riscv/Kconfig                  |   1 +
>>   arch/riscv/include/asm/hugetlb.h    |  36 +---
>>   arch/riscv/include/asm/pgtable-64.h |  11 ++
>>   arch/riscv/include/asm/pgtable.h    | 222 ++++++++++++++++++---
>>   arch/riscv/mm/hugetlbpage.c         | 243 +----------------------
>>   arch/riscv/mm/pgtable.c             |   6 +-
>>   include/linux/hugetlb_contpte.h     |  39 ++++
>>   mm/Kconfig                          |   3 +
>>   mm/Makefile                         |   1 +
>>   mm/hugetlb_contpte.c                | 258 ++++++++++++++++++++++++
>>   14 files changed, 583 insertions(+), 622 deletions(-)
>>   create mode 100644 include/linux/hugetlb_contpte.h
>>   create mode 100644 mm/hugetlb_contpte.c
>>
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-04-29 14:09   ` Ryan Roberts
@ 2025-05-05 16:08     ` Alexandre Ghiti
  2025-05-08 12:30       ` Will Deacon
  0 siblings, 1 reply; 22+ messages in thread
From: Alexandre Ghiti @ 2025-05-05 16:08 UTC (permalink / raw)
  To: Ryan Roberts, Alexandre Ghiti, Catalin Marinas, Will Deacon,
	Mark Rutland, Matthew Wilcox, Paul Walmsley, Palmer Dabbelt,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm

Hi Ryan,

On 29/04/2025 16:09, Ryan Roberts wrote:
> Hi Alexandre,
>
> On 07/04/2025 13:04, Alexandre Ghiti wrote:
>> Can someone from arm64 review this? I think it's preferable to share the same
>> implementation between riscv and arm64.
> I've been thinking about this for a while and had some conversations internally.
> This patchset has both pros and cons.
>
> In the pros column, it increases code reuse in an area that has had quite a few
> bugs popping up lately; so this would bring more eyes and hopefully higher
> quality in the long run.
>
> But in the cons column, we have seen HW errata in similar areas in the past and
> I'm nervous that by hoisting this code to mm, we make it harder to work around
> any future errata. Additionally I can imagine that this change could make it
> harder to support future Arm architecture enhancements.
>
> I appreciate the cons are not strong *technical* arguments but nevertheless they
> are winning out in this case; My opinion is that we should keep the arm64
> implementations of huge_pte_ (and contpte_ too - I know you have a separate
> series for this) private to arm64.
>
> Sorry about that.
>
>> The end goal is the support of mTHP using svnapot on riscv, which we want soon,
>> so if that patchset does not gain any traction, I'll just copy/paste the arm64
>> implementation into riscv.
> This copy/paste approach would be my preference.


I have to admit that I disagree with this approach. The riscv and arm64 
implementations are *exactly* the same, so it sounds weird to duplicate 
code; to me, the pros you mention outweigh the cons.

Unless I'm missing something about the errata? To me, that's easily 
fixed by providing arch-specific overrides, no? Can you describe what 
sort of errata would not fit then?
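
To make the override idea concrete, here is a minimal user-space sketch of the usual "#ifndef hook" idiom: the shared code calls a per-entry helper, and an architecture that needs a workaround defines its own version of that helper before the generic fallback is seen. This is only an illustration under assumptions; model_arch_clear_one() and model_contpte_clear() are hypothetical names and are not taken from the series.

```c
/*
 * User-space model only. All names are hypothetical. An architecture
 * needing an erratum workaround would provide its own
 * model_arch_clear_one() and "#define model_arch_clear_one
 * model_arch_clear_one" in its header, before this point.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_pte_t;

#ifndef model_arch_clear_one
/* Generic fallback: clear one entry and return its previous value. */
static inline model_pte_t model_arch_clear_one(model_pte_t *ptep)
{
	model_pte_t old = *ptep;

	*ptep = 0;
	return old;
}
#define model_arch_clear_one model_arch_clear_one
#endif

/*
 * Shared code: clear every entry of a contiguous mapping through the
 * hook and return the previous value of the first entry.
 */
static model_pte_t model_contpte_clear(model_pte_t *ptep, size_t ncontig)
{
	model_pte_t first;
	size_t i;

	assert(ptep != NULL && ncontig > 0);

	first = model_arch_clear_one(&ptep[0]);
	for (i = 1; i < ncontig; i++)
		(void)model_arch_clear_one(&ptep[i]);

	return first;
}
```

With this shape the common loop stays in one place, and only the per-entry step changes for an architecture that overrides the hook. Whether every possible erratum can be confined to such a hook is exactly the open question.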

Thanks,

Alex


>
> Thanks,
> Ryan
>
>> Thanks,
>>
>> Alex
>>
>> On 21/03/2025 14:06, Alexandre Ghiti wrote:
>>> This patchset intends to merge the contiguous ptes hugetlbfs implementation
>>> of arm64 and riscv.
>>>
>>> Both arm64 and riscv support the use of contiguous ptes to map pages that
>>> are larger than the default page table size, respectively called contpte
>>> and svnapot.
>>>
>>> The riscv implementation differs from the arm64's in that the LSBs of the
>>> pfn of a svnapot pte are used to store the size of the mapping, allowing
>>> for future sizes to be added (for now only 64KB is supported). That's an
>>> issue for the core mm code which expects to find the *real* pfn a pte points
>>> to. Patch 1 fixes that by always returning svnapot ptes with the real pfn
>>> and restores the size of the mapping when it is written to a page table.
>>>
>>> The following patches are just merges of the 2 different implementations
>>> that currently exist in arm64 and riscv which are very similar. It paves
>>> the way to the reuse of the recent contpte THP work by Ryan [1] to avoid
>>> reimplementing the same in riscv.
>>>
>>> This patchset was tested by running the libhugetlbfs testsuite with 64KB
>>> and 2MB pages on both architectures (on a 4KB base page size arm64 kernel).
>>>
>>> [1] https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/
>>>
>>> v4: https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
>>> v3: https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
>>> v2: https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
>>> v1: https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/
>>>
>>> Changes in v5:
>>>     - Fix "int i" unused variable in patch 2 (as reported by PW)
>>>     - Fix !svnapot build
>>>     - Fix arch_make_huge_pte() which returned a real napot pte
>>>     - Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot aware to
>>>       avoid leaking real napot pfns to core mm
>>>     - Fix arch_contpte_get_num_contig() that used to always try to get the
>>>       mapping size from the ptep, which does not work if the ptep comes from
>>>       the core mm
>>>     - Rebase on top of 6.14-rc7 + fix for
>>>       huge_ptep_get_and_clear()/huge_pte_clear()
>>>       https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/
>>>
>>> Changes in v4:
>>>     - Rebase on top of 6.13
>>>
>>> Changes in v3:
>>>     - Split set_ptes and ptep_get into internal and external API (Ryan)
>>>     - Rename ARCH_HAS_CONTPTE into ARCH_WANT_GENERAL_HUGETLB_CONTPTE so that
>>>       we split hugetlb functions from contpte functions (actually riscv contpte
>>>       functions to support THP will come into another series) (Ryan)
>>>     - Rebase on top of 6.11-rc1
>>>
>>> Changes in v2:
>>>     - Rebase on top of 6.9-rc3
>>>
>>> Alexandre Ghiti (9):
>>>     riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
>>>     riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
>>>     mm: Use common huge_ptep_get() function for riscv/arm64
>>>     mm: Use common set_huge_pte_at() function for riscv/arm64
>>>     mm: Use common huge_pte_clear() function for riscv/arm64
>>>     mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
>>>     mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
>>>     mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
>>>     mm: Use common huge_ptep_clear_flush() function for riscv/arm64
>>>
>>>    arch/arm64/Kconfig                  |   1 +
>>>    arch/arm64/include/asm/hugetlb.h    |  22 +--
>>>    arch/arm64/include/asm/pgtable.h    |  68 ++++++-
>>>    arch/arm64/mm/hugetlbpage.c         | 294 +---------------------------
>>>    arch/riscv/Kconfig                  |   1 +
>>>    arch/riscv/include/asm/hugetlb.h    |  36 +---
>>>    arch/riscv/include/asm/pgtable-64.h |  11 ++
>>>    arch/riscv/include/asm/pgtable.h    | 222 ++++++++++++++++++---
>>>    arch/riscv/mm/hugetlbpage.c         | 243 +----------------------
>>>    arch/riscv/mm/pgtable.c             |   6 +-
>>>    include/linux/hugetlb_contpte.h     |  39 ++++
>>>    mm/Kconfig                          |   3 +
>>>    mm/Makefile                         |   1 +
>>>    mm/hugetlb_contpte.c                | 258 ++++++++++++++++++++++++
>>>    14 files changed, 583 insertions(+), 622 deletions(-)
>>>    create mode 100644 include/linux/hugetlb_contpte.h
>>>    create mode 100644 mm/hugetlb_contpte.c
>>>
>> Message-ID: <4dd5d187-f977-4f27-9937-8608991797b5@ghiti.fr>
>> Date: Mon, 7 Apr 2025 14:04:27 +0200
>> MIME-Version: 1.0
>> User-Agent: Mozilla Thunderbird
>> Subject: Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
>> Content-Language: en-US
>> To: Alexandre Ghiti <alexghiti@rivosinc.com>,
>> Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>,
>> Ryan Roberts <ryan.roberts@arm.com>, Mark Rutland <mark.rutland@arm.com>,
>> Matthew Wilcox <willy@infradead.org>,
>> Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt
>> <palmer@dabbelt.com>, Andrew Morton <akpm@linux-foundation.org>,
>> linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
>> linux-riscv@lists.infradead.org, linux-mm@kvack.org
>> References: <20250321130635.227011-1-alexghiti@rivosinc.com>
>> From: Alexandre Ghiti <alex@ghiti.fr>
>> In-Reply-To: <20250321130635.227011-1-alexghiti@rivosinc.com>
>>
>> Can someone from arm64 review this? I think it's preferable to share the same
>> implementation between riscv and arm64.
>>
>> The end goal is the support of mTHP using svnapot on riscv, which we want soon,
>> so if that patchset does not gain any traction, I'll just copy/paste the arm64
>> implementation into riscv.
>>
>> Thanks,
>>
>> Alex
>>
>> On 21/03/2025 14:06, Alexandre Ghiti wrote:
>>> This patchset intends to merge the contiguous ptes hugetlbfs implementation
>>> of arm64 and riscv.
>>>
>>> Both arm64 and riscv support the use of contiguous ptes to map pages that
>>> are larger than the default page table size, respectively called contpte
>>> and svnapot.
>>>
>>> The riscv implementation differs from the arm64's in that the LSBs of the
>>> pfn of a svnapot pte are used to store the size of the mapping, allowing
>>> for future sizes to be added (for now only 64KB is supported). That's an
>>> issue for the core mm code which expects to find the *real* pfn a pte points
>>> to. Patch 1 fixes that by always returning svnapot ptes with the real pfn
>>> and restores the size of the mapping when it is written to a page table.
>>>
>>> The following patches are just merges of the 2 different implementations
>>> that currently exist in arm64 and riscv which are very similar. It paves
>>> the way to the reuse of the recent contpte THP work by Ryan [1] to avoid
>>> reimplementing the same in riscv.
>>>
>>> This patchset was tested by running the libhugetlbfs testsuite with 64KB
>>> and 2MB pages on both architectures (on a 4KB base page size arm64 kernel).
>>>
>>> [1] https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/
>>>
>>> v4: https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
>>> v3: https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
>>> v2: https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
>>> v1: https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/
>>>
>>> Changes in v5:
>>>     - Fix "int i" unused variable in patch 2 (as reported by PW)
>>>     - Fix !svnapot build
>>>     - Fix arch_make_huge_pte() which returned a real napot pte
>>>     - Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot aware to
>>>       avoid leaking real napot pfns to core mm
>>>     - Fix arch_contpte_get_num_contig() that used to always try to get the
>>>       mapping size from the ptep, which does not work if the ptep comes from
>>>       the core mm
>>>     - Rebase on top of 6.14-rc7 + fix for
>>>       huge_ptep_get_and_clear()/huge_pte_clear()
>>>       https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/
>>>
>>> Changes in v4:
>>>     - Rebase on top of 6.13
>>>
>>> Changes in v3:
>>>     - Split set_ptes and ptep_get into internal and external API (Ryan)
>>>     - Rename ARCH_HAS_CONTPTE into ARCH_WANT_GENERAL_HUGETLB_CONTPTE so that
>>>       we split hugetlb functions from contpte functions (actually riscv contpte
>>>       functions to support THP will come into another series) (Ryan)
>>>     - Rebase on top of 6.11-rc1
>>>
>>> Changes in v2:
>>>     - Rebase on top of 6.9-rc3
>>>
>>> Alexandre Ghiti (9):
>>>     riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
>>>     riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
>>>     mm: Use common huge_ptep_get() function for riscv/arm64
>>>     mm: Use common set_huge_pte_at() function for riscv/arm64
>>>     mm: Use common huge_pte_clear() function for riscv/arm64
>>>     mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
>>>     mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
>>>     mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
>>>     mm: Use common huge_ptep_clear_flush() function for riscv/arm64
>>>
>>>    arch/arm64/Kconfig                  |   1 +
>>>    arch/arm64/include/asm/hugetlb.h    |  22 +--
>>>    arch/arm64/include/asm/pgtable.h    |  68 ++++++-
>>>    arch/arm64/mm/hugetlbpage.c         | 294 +---------------------------
>>>    arch/riscv/Kconfig                  |   1 +
>>>    arch/riscv/include/asm/hugetlb.h    |  36 +---
>>>    arch/riscv/include/asm/pgtable-64.h |  11 ++
>>>    arch/riscv/include/asm/pgtable.h    | 222 ++++++++++++++++++---
>>>    arch/riscv/mm/hugetlbpage.c         | 243 +----------------------
>>>    arch/riscv/mm/pgtable.c             |   6 +-
>>>    include/linux/hugetlb_contpte.h     |  39 ++++
>>>    mm/Kconfig                          |   3 +
>>>    mm/Makefile                         |   1 +
>>>    mm/hugetlb_contpte.c                | 258 ++++++++++++++++++++++++
>>>    14 files changed, 583 insertions(+), 622 deletions(-)
>>>    create mode 100644 include/linux/hugetlb_contpte.h
>>>    create mode 100644 mm/hugetlb_contpte.c
>>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>

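The svnapot encoding described in the cover letter quoted above (the size of
the mapping stored in the LSBs of the pfn) can be sketched in user space as
follows. This is only an illustration, not code from the series: the demo_*
names are hypothetical, and the bit positions (N bit at 63, PPN field starting
at bit 10, order 4 for 64KB on a 4KB base page) are assumptions taken from the
Svnapot extension rather than from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed RISC-V PTE layout (hypothetical demo_ names, not kernel macros). */
#define DEMO_PFN_SHIFT  10                          /* PPN field starts at bit 10 */
#define DEMO_PFN_MASK   (((1ULL << 44) - 1) << DEMO_PFN_SHIFT)
#define DEMO_NAPOT_BIT  (1ULL << 63)                /* Svnapot "N" bit */

/* Encode: clear the low `order` pfn bits, set bit (order - 1), set N. */
static uint64_t demo_mk_napot(uint64_t pte, unsigned int order)
{
	uint64_t size_bit, low_mask;

	assert(order >= 1 && order < 44);
	size_bit = 1ULL << (order - 1 + DEMO_PFN_SHIFT);
	low_mask = ((1ULL << order) - 1) << DEMO_PFN_SHIFT;

	return (pte & ~low_mask) | size_bit | DEMO_NAPOT_BIT;
}

/* The pfn field exactly as stored, size bits included. */
static uint64_t demo_raw_pfn(uint64_t pte)
{
	return (pte & DEMO_PFN_MASK) >> DEMO_PFN_SHIFT;
}

/* Decode the order: position of the lowest set pfn bit, plus one. */
static unsigned int demo_napot_order(uint64_t pte)
{
	uint64_t pfn = demo_raw_pfn(pte);
	unsigned int order = 1;

	assert((pte & DEMO_NAPOT_BIT) && pfn != 0);
	while (!(pfn & 1)) {
		pfn >>= 1;
		order++;
	}
	return order;
}

/* The "real" pfn of the first page of the range: size bits stripped. */
static uint64_t demo_real_base_pfn(uint64_t pte)
{
	unsigned int order = demo_napot_order(pte);

	return demo_raw_pfn(pte) & ~((1ULL << order) - 1);
}
```

With an order of 4, a base pfn of 0x12340 is stored as 0x12348, which is the
value core mm must not see; demo_real_base_pfn() gives back 0x12340.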

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-05-05 16:08     ` Alexandre Ghiti
@ 2025-05-08 12:30       ` Will Deacon
  2025-05-09 11:09         ` Alexandre Ghiti
  0 siblings, 1 reply; 22+ messages in thread
From: Will Deacon @ 2025-05-08 12:30 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Ryan Roberts, Alexandre Ghiti, Catalin Marinas, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Andrew Morton,
	linux-arm-kernel, linux-kernel, linux-riscv, linux-mm

Hi folks,

On Mon, May 05, 2025 at 06:08:50PM +0200, Alexandre Ghiti wrote:
> On 29/04/2025 16:09, Ryan Roberts wrote:
> > On 07/04/2025 13:04, Alexandre Ghiti wrote:
> > > Can someone from arm64 review this? I think it's preferable to share the same
> > > implementation between riscv and arm64.
> > I've been thinking about this for a while and had some conversations internally.
> > This patchset has both pros and cons.
> > 
> > > In the pros column, it increases code reuse in an area that has had quite a few
> > bugs popping up lately; so this would bring more eyes and hopefully higher
> > quality in the long run.
> > 
> > But in the cons column, we have seen HW errata in similar areas in the past and
> > > I'm nervous that by hoisting this code to mm, we make it harder to work around
> > any future errata. Additionally I can imagine that this change could make it
> > harder to support future Arm architecture enhancements.
> > 
> > I appreciate the cons are not strong *technical* arguments but nevertheless they
> > > are winning out in this case; my opinion is that we should keep the arm64
> > implementations of huge_pte_ (and contpte_ too - I know you have a separate
> > series for this) private to arm64.
> > 
> > Sorry about that.
> > 
> > > The end goal is the support of mTHP using svnapot on riscv, which we want soon,
> > > so if that patchset does not gain any traction, I'll just copy/paste the arm64
> > > implementation into riscv.
> > This copy/paste approach would be my preference.
> 
> 
> I have to admit that I disagree with this approach: the riscv and arm64
> implementations are *exactly* the same, so it sounds weird to duplicate code;
> the pros you mention outweigh the cons.
> 
> Unless I'm missing something about the errata? To me, that's easily fixed
> by providing arch-specific overrides, no? Can you describe what sort of
> errata would not fit then?

If we start with the common implementation you have here, nothing
prevents us from forking the code in future if the architectures diverge
so I'd be inclined to merge this series and see how we get on. However,
one thing I *do* think we need to ensure is that the relevant folks from
both arm64 (i.e. Ryan) and riscv (i.e. Alexandre) are cc'd on changes to
the common code. Otherwise, it's going to be a step backwards in terms
of maintainability.

Could we add something to MAINTAINERS so that the new file picks you both
up as reviewers?

Will


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-05-08 12:30       ` Will Deacon
@ 2025-05-09 11:09         ` Alexandre Ghiti
  2025-05-09 13:02           ` Ryan Roberts
  0 siblings, 1 reply; 22+ messages in thread
From: Alexandre Ghiti @ 2025-05-09 11:09 UTC (permalink / raw)
  To: Will Deacon, Lorenzo Stoakes
  Cc: Alexandre Ghiti, Ryan Roberts, Catalin Marinas, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Andrew Morton,
	linux-arm-kernel, linux-kernel, linux-riscv, linux-mm

Hi Will,

On Thu, May 8, 2025 at 2:30 PM Will Deacon <will@kernel.org> wrote:
>
> Hi folks,
>
> On Mon, May 05, 2025 at 06:08:50PM +0200, Alexandre Ghiti wrote:
> > On 29/04/2025 16:09, Ryan Roberts wrote:
> > > On 07/04/2025 13:04, Alexandre Ghiti wrote:
> > > > Can someone from arm64 review this? I think it's preferable to share the same
> > > > implementation between riscv and arm64.
> > > I've been thinking about this for a while and had some conversations internally.
> > > This patchset has both pros and cons.
> > >
> > > > In the pros column, it increases code reuse in an area that has had quite a few
> > > bugs popping up lately; so this would bring more eyes and hopefully higher
> > > quality in the long run.
> > >
> > > But in the cons column, we have seen HW errata in similar areas in the past and
> > > > I'm nervous that by hoisting this code to mm, we make it harder to work around
> > > any future errata. Additionally I can imagine that this change could make it
> > > harder to support future Arm architecture enhancements.
> > >
> > > I appreciate the cons are not strong *technical* arguments but nevertheless they
> > > > are winning out in this case; my opinion is that we should keep the arm64
> > > implementations of huge_pte_ (and contpte_ too - I know you have a separate
> > > series for this) private to arm64.
> > >
> > > Sorry about that.
> > >
> > > > The end goal is the support of mTHP using svnapot on riscv, which we want soon,
> > > > so if that patchset does not gain any traction, I'll just copy/paste the arm64
> > > > implementation into riscv.
> > > This copy/paste approach would be my preference.
> >
> >
> > I have to admit that I disagree with this approach: the riscv and arm64
> > implementations are *exactly* the same, so it sounds weird to duplicate code;
> > the pros you mention outweigh the cons.
> >
> > Unless I'm missing something about the errata? To me, that's easily fixed
> > by providing arch-specific overrides, no? Can you describe what sort of
> > errata would not fit then?
>
> If we start with the common implementation you have here, nothing
> prevents us from forking the code in future if the architectures diverge
> so I'd be inclined to merge this series and see how we get on. However,
> one thing I *do* think we need to ensure is that the relevant folks from
> both arm64 (i.e. Ryan) and riscv (i.e. Alexandre) are cc'd on changes to
> the common code. Otherwise, it's going to be a step backwards in terms
> of maintainability.
>
> Could we add something to MAINTAINERS so that the new file picks you both
> up as reviewers?

I'm adding Lorenzo as he is cleaning the mm MAINTAINERS entries.

@Lorenzo: should we add a new section "CONTPTE" for this? FYI, hugetlb
is the first patchset, I have another patchset to merge THP contpte
support [1] as well so the "HUGETLB" section does not seem to be a
good fit.

[1] https://lore.kernel.org/linux-riscv/20240508191931.46060-1-alexghiti@rivosinc.com/

Thanks,

Alex

>
> Will


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-05-09 11:09         ` Alexandre Ghiti
@ 2025-05-09 13:02           ` Ryan Roberts
  2025-05-21 14:57             ` Lorenzo Stoakes
  0 siblings, 1 reply; 22+ messages in thread
From: Ryan Roberts @ 2025-05-09 13:02 UTC (permalink / raw)
  To: Alexandre Ghiti, Will Deacon, Lorenzo Stoakes
  Cc: Alexandre Ghiti, Catalin Marinas, Mark Rutland, Matthew Wilcox,
	Paul Walmsley, Palmer Dabbelt, Andrew Morton, linux-arm-kernel,
	linux-kernel, linux-riscv, linux-mm

On 09/05/2025 12:09, Alexandre Ghiti wrote:
> Hi Will,
> 
> On Thu, May 8, 2025 at 2:30 PM Will Deacon <will@kernel.org> wrote:
>>
>> Hi folks,
>>
>> On Mon, May 05, 2025 at 06:08:50PM +0200, Alexandre Ghiti wrote:
>>> On 29/04/2025 16:09, Ryan Roberts wrote:
>>>> On 07/04/2025 13:04, Alexandre Ghiti wrote:
>>>>> Can someone from arm64 review this? I think it's preferable to share the same
>>>>> implementation between riscv and arm64.
>>>> I've been thinking about this for a while and had some conversations internally.
>>>> This patchset has both pros and cons.
>>>>
>>>> In the pros column, it increases code reuse in an area that has had quite a few
>>>> bugs popping up lately; so this would bring more eyes and hopefully higher
>>>> quality in the long run.
>>>>
>>>> But in the cons column, we have seen HW errata in similar areas in the past and
>>>> I'm nervous that by hoisting this code to mm, we make it harder to work around
>>>> any future errata. Additionally I can imagine that this change could make it
>>>> harder to support future Arm architecture enhancements.
>>>>
>>>> I appreciate the cons are not strong *technical* arguments but nevertheless they
>>>> are winning out in this case; my opinion is that we should keep the arm64
>>>> implementations of huge_pte_ (and contpte_ too - I know you have a separate
>>>> series for this) private to arm64.
>>>>
>>>> Sorry about that.
>>>>
>>>>> The end goal is the support of mTHP using svnapot on riscv, which we want soon,
>>>>> so if that patchset does not gain any traction, I'll just copy/paste the arm64
>>>>> implementation into riscv.
>>>> This copy/paste approach would be my preference.
>>>
>>>
>>> I have to admit that I disagree with this approach: the riscv and arm64
>>> implementations are *exactly* the same, so it sounds weird to duplicate code;
>>> the pros you mention outweigh the cons.
>>>
>>> Unless I'm missing something about the errata? To me, that's easily fixed
>>> by providing arch-specific overrides, no? Can you describe what sort of
>>> errata would not fit then?

One concrete feature is the use of Arm's FEAT_BBM level 2 to avoid having to do
break-before-make and TLB maintenance when doing a fold or unfold operation.
There is a series in flight to add this support at [1]. I can see this type of
approach being extended to the hugetlb helpers in future.

I also have another series in flight at [2] that tidies up the hugetlb
implementation and does some optimizations. But the optimizations depend on
arm64-specific TLB maintenance APIs.

[1]
https://lore.kernel.org/linux-arm-kernel/20250428153514.55772-2-miko.lenczewski@arm.com/

[2]
https://lore.kernel.org/linux-arm-kernel/20250422081822.1836315-1-ryan.roberts@arm.com/

As for errata, that's obviously much more fuzzy; there have been a bunch
relating to the MMU in the recent past, and I wouldn't be shocked if more turned up.

For future architecture enhancements, I'm aware of one potential feature being
discussed for which this change would likely make it harder to implement.

>>
>> If we start with the common implementation you have here, nothing
>> prevents us from forking the code in future if the architectures diverge
>> so I'd be inclined to merge this series and see how we get on. 

OK, if that's your preference, I'm fine with it. I don't have a strong opinion,
just a sense that we will end up with loads of arch-specific overrides. As you
say, let's see.

Alexandre, I guess this series is quite old now and will need to incorporate the
hugetlb fixes I did last cycle? Ideally I'd also like [2] to land first so that
it can be incorporated into your next version. (I'm still hopeful we can get
[2] into v6.16 and have been waiting patiently for Will to pick it up ;) ).

I guess we can worry about [1] later as that is only affected by your other series.

How does that sound?

>> However,
>> one thing I *do* think we need to ensure is that the relevant folks from
>> both arm64 (i.e. Ryan) and riscv (i.e. Alexandre) are cc'd on changes to
>> the common code. Otherwise, it's going to be a step backwards in terms
>> of maintainability.
>>
>> Could we add something to MAINTAINERS so that the new file picks you both
>> up as reviewers?

That's fine with me. Lorenzo added me for some parts of MM this cycle anyway.

Thanks,
Ryan

> 
> I'm adding Lorenzo as he is cleaning the mm MAINTAINERS entries.
> 
> @Lorenzo: should we add a new section "CONTPTE" for this? FYI, hugetlb
> is the first patchset, I have another patchset to merge THP contpte
> support [1] as well so the "HUGETLB" section does not seem to be a
> good fit.
> 
> [1] https://lore.kernel.org/linux-riscv/20240508191931.46060-1-alexghiti@rivosinc.com/
> 
> Thanks,
> 
> Alex
> 
>>
>> Will
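
The "arch-specific overrides" discussed in this message usually take the form
of a guard macro that lets an architecture replace a generic helper. A minimal
user-space sketch of that pattern follows; it is not code from the series, and
every demo_* and DEMO_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long demo_pte_t;	/* hypothetical stand-in for pte_t */

/*
 * "Arch header" part: an architecture that needs special handling (for
 * example an erratum workaround) defines the guard macro and supplies its
 * own helper. Dropping this part selects the generic fallback instead.
 */
#define DEMO_ARCH_HAS_GET_AND_CLEAR_ONE
static int demo_arch_calls;	/* counts uses of the arch override */

static demo_pte_t demo_get_and_clear_one(demo_pte_t *ptep)
{
	demo_pte_t old = *ptep;

	demo_arch_calls++;	/* an arch-specific sequence would go here */
	*ptep = 0;
	return old;
}

/* "Common code" part: generic fallback, compiled only without an override. */
#ifndef DEMO_ARCH_HAS_GET_AND_CLEAR_ONE
static demo_pte_t demo_get_and_clear_one(demo_pte_t *ptep)
{
	demo_pte_t old = *ptep;

	*ptep = 0;
	return old;
}
#endif

/* Shared loop: clear every entry of the range and merge the old bits. */
static demo_pte_t demo_get_and_clear(demo_pte_t *ptep, size_t ncontig)
{
	demo_pte_t acc = 0;
	size_t i;

	assert(ptep != NULL && ncontig > 0);
	for (i = 0; i < ncontig; i++)
		acc |= demo_get_and_clear_one(&ptep[i]);
	return acc;
}
```

The shared loop stays in one place while each architecture can still swap out
the per-entry step; the concern raised above is how many such hooks would
accumulate over time.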



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-05-09 13:02           ` Ryan Roberts
@ 2025-05-21 14:57             ` Lorenzo Stoakes
  2025-05-27  9:25               ` Alexandre Ghiti
  0 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-05-21 14:57 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Alexandre Ghiti, Will Deacon, Alexandre Ghiti, Catalin Marinas,
	Mark Rutland, Matthew Wilcox, Paul Walmsley, Palmer Dabbelt,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm

-cc my gmail, I no longer check kernel mail here at all, everything is via my
 work mail (lorenzo.stoakes@oracle.com :)

So apologies for missing this.

On Fri, May 09, 2025 at 02:02:03PM +0100, Ryan Roberts wrote:
> On 09/05/2025 12:09, Alexandre Ghiti wrote:
> > Hi Will,
> >
> > On Thu, May 8, 2025 at 2:30 PM Will Deacon <will@kernel.org> wrote:
> >>
> >> Hi folks,
> >>
> >> On Mon, May 05, 2025 at 06:08:50PM +0200, Alexandre Ghiti wrote:
> >>> On 29/04/2025 16:09, Ryan Roberts wrote:
> >>>> On 07/04/2025 13:04, Alexandre Ghiti wrote:
> >>>>> Can someone from arm64 review this? I think it's preferable to share the same
> >>>>> implementation between riscv and arm64.
> >>>> I've been thinking about this for a while and had some conversations internally.
> >>>> This patchset has both pros and cons.
> >>>>
> >>>> In the pros column, it increases code reuse in an area that has had quite a few
> >>>> bugs popping up lately; so this would bring more eyes and hopefully higher
> >>>> quality in the long run.
> >>>>
> >>>> But in the cons column, we have seen HW errata in similar areas in the past and
> >>>> I'm nervous that by hoisting this code to mm, we make it harder to work around
> >>>> any future errata. Additionally I can imagine that this change could make it
> >>>> harder to support future Arm architecture enhancements.
> >>>>
> >>>> I appreciate the cons are not strong *technical* arguments but nevertheless they
> >>>> are winning out in this case; my opinion is that we should keep the arm64
> >>>> implementations of huge_pte_ (and contpte_ too - I know you have a separate
> >>>> series for this) private to arm64.
> >>>>
> >>>> Sorry about that.
> >>>>
> >>>>> The end goal is the support of mTHP using svnapot on riscv, which we want soon,
> >>>>> so if that patchset does not gain any traction, I'll just copy/paste the arm64
> >>>>> implementation into riscv.
> >>>> This copy/paste approach would be my preference.
> >>>
> >>>
> >>> I have to admit that I disagree with this approach: the riscv and arm64
> >>> implementations are *exactly* the same, so it sounds weird to duplicate code;
> >>> the pros you mention outweigh the cons.
> >>>
> >>> Unless I'm missing something about the errata? To me, that's easily fixed
> >>> by providing arch-specific overrides, no? Can you describe what sort of
> >>> errata would not fit then?
>
> One concrete feature is the use of Arm's FEAT_BBM level 2 to avoid having to do
> break-before-make and TLB maintenance when doing a fold or unfold operation.
> There is a series in flight to add this support at [1]. I can see this type of
> approach being extended to the hugetlb helpers in future.
>
> I also have another series in flight at [2] that tidies up the hugetlb
> implementation and does some optimizations. But the optimizations depend on
> arm64-specific TLB maintenance APIs.
>
> [1]
> https://lore.kernel.org/linux-arm-kernel/20250428153514.55772-2-miko.lenczewski@arm.com/
>
> [2]
> https://lore.kernel.org/linux-arm-kernel/20250422081822.1836315-1-ryan.roberts@arm.com/
>
> As for errata, that's obviously much more fuzzy; there have been a bunch
> relating to the MMU in the recent past, and I wouldn't be shocked if more turned up.
>
> For future architecture enhancements, I'm aware of one potential feature being
> discussed for which this change would likely make it harder to implement.
>
> >>
> >> If we start with the common implementation you have here, nothing
> >> prevents us from forking the code in future if the architectures diverge
> >> so I'd be inclined to merge this series and see how we get on.
>
> OK, if that's your preference, I'm fine with it. I don't have a strong opinion,
> just a sense that we will end up with loads of arch-specific overrides. As you
> say, let's see.
>
> Alexandre, I guess this series is quite old now and will need to incorporate the
> hugetlb fixes I did last cycle? Ideally I'd also like [2] to land first so that
> it can be incorporated into your next version. (I'm still hopeful we can get
> [2] into v6.16 and have been waiting patiently for Will to pick it up ;) ).
>
> I guess we can worry about [1] later as that is only affected by your other series.
>
> How does that sound?
>
> >> However,
> >> one thing I *do* think we need to ensure is that the relevant folks from
> >> both arm64 (i.e. Ryan) and riscv (i.e. Alexandre) are cc'd on changes to
> >> the common code. Otherwise, it's going to be a step backwards in terms
> >> of maintainability.
> >>
> >> Could we add something to MAINTAINERS so that the new file picks you both
> >> up as reviewers?
>
> That's fine with me. Lorenzo added me for some parts of MM this cycle anyway.
>
> Thanks,
> Ryan

Indeed :) happy to have you there Ryan!

>
> >
> > I'm adding Lorenzo as he is cleaning the mm MAINTAINERS entries.
> >
> > @Lorenzo: should we add a new section "CONTPTE" for this? FYI, hugetlb
> > is the first patchset, I have another patchset to merge THP contpte
> > support [1] as well so the "HUGETLB" section does not seem to be a
> > good fit.

Hm, this does seem to be very arm64-specific right?

But having said that, I can literally see RISC-V entries :)

We are in a strange sort of scenario where there's some cross-over here.

I don't strictly object to it though; this stuff is important and we should
absolutely get the mm files under an appropriate MAINTAINERS entry.

So right now it seems the files would consist of:

include/linux/hugetlb_contpte.h
mm/hugetlb_contpte.c

Is this correct?

Is this series intended to be taken by Andrew or through an arch tree?

And who would you sensibly propose for M's and R's?

If we are definitely adding things that sit outside hugetlb or anything
arch-specific, and that are in fact generic mm code, then yes this should be a
section.

Does contpte stand for 'Contiguous PTE'?

Then the entry could perhaps be:

MEMORY MANAGEMENT - CONTPTE (CONTIGUOUS PTE SUPPORT)

I'd say this entry should probably be added as a patch in this series.

If you give me a list of R's and M's and confirm those files I can very quickly
copy/pasta from an existing entry and then you could respin (and cc my work mail
for the series :P) and include that as an additional patch?

Happy to ACK that in that case.


> >
> > [1] https://lore.kernel.org/linux-riscv/20240508191931.46060-1-alexghiti@rivosinc.com/
> >
> > Thanks,
> >
> > Alex
> >
> >>
> >> Will
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-05-21 14:57             ` Lorenzo Stoakes
@ 2025-05-27  9:25               ` Alexandre Ghiti
  2025-05-27  9:37                 ` Lorenzo Stoakes
  0 siblings, 1 reply; 22+ messages in thread
From: Alexandre Ghiti @ 2025-05-27  9:25 UTC (permalink / raw)
  To: Lorenzo Stoakes, Ryan Roberts
  Cc: Alexandre Ghiti, Will Deacon, Catalin Marinas, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Andrew Morton,
	linux-arm-kernel, linux-kernel, linux-riscv, linux-mm

Hi Lorenzo,

On 5/21/25 16:57, Lorenzo Stoakes wrote:
> -cc my gmail, I no longer check kernel mail here at all, everything is via my
>   work mail (lorenzo.stoakes@oracle.com :)
>
> So apologies for missing this.
>
> On Fri, May 09, 2025 at 02:02:03PM +0100, Ryan Roberts wrote:
>> On 09/05/2025 12:09, Alexandre Ghiti wrote:
>>> Hi Will,
>>>
>>> On Thu, May 8, 2025 at 2:30 PM Will Deacon <will@kernel.org> wrote:
>>>> Hi folks,
>>>>
>>>> On Mon, May 05, 2025 at 06:08:50PM +0200, Alexandre Ghiti wrote:
>>>>> On 29/04/2025 16:09, Ryan Roberts wrote:
>>>>>> On 07/04/2025 13:04, Alexandre Ghiti wrote:
>>>>>>> Can someone from arm64 review this? I think it's preferable to share the same
>>>>>>> implementation between riscv and arm64.
>>>>>> I've been thinking about this for a while and had some conversations internally.
>>>>>> This patchset has both pros and cons.
>>>>>>
>>>>>> In the pros column, it increases code reuse in an area that has had quite a few
>>>>>> bugs popping up lately; so this would bring more eyes and hopefully higher
>>>>>> quality in the long run.
>>>>>>
>>>>>> But in the cons column, we have seen HW errata in similar areas in the past and
>>>>>> I'm nervous that by hoisting this code to mm, we make it harder to work around
>>>>>> any future errata. Additionally I can imagine that this change could make it
>>>>>> harder to support future Arm architecture enhancements.
>>>>>>
>>>>>> I appreciate the cons are not strong *technical* arguments but nevertheless they
>>>>>> are winning out in this case; My opinion is that we should keep the arm64
>>>>>> implementations of huge_pte_ (and contpte_ too - I know you have a separate
>>>>>> series for this) private to arm64.
>>>>>>
>>>>>> Sorry about that.
>>>>>>
>>>>>>> The end goal is the support of mTHP using svnapot on riscv, which we want soon,
>>>>>>> so if that patchset does not gain any traction, I'll just copy/paste the arm64
>>>>>>> implementation into riscv.
>>>>>> This copy/paste approach would be my preference.
>>>>>
>>>>> I have to admit that I disagree with this approach, the riscv and arm64
>>>>> implementations are *exactly* the same so it sounds weird to duplicate code,
>>>>> the pros you mention outweigh the cons.
>>>>>
>>>>> Unless I'm missing something about the errata? To me, that's easily fixed
>>>>> by providing arch-specific overrides, no? Can you describe what sort of
>>>>> errata would not fit then?
>> One concrete feature is the use of Arm's FEAT_BBM level 2 to avoid having to do
>> break-before-make and TLB maintenance when doing a fold or unfold operation.
>> There is a series in flight to add this support at [1]. I can see this type of
>> approach being extended to the hugetlb helpers in future.
>>
>> I also have another series in flight at [2] that tidies up the hugetlb
>> implementation and does some optimizations. But the optimizations depend on
>> arm64-specific TLB maintenance APIs.
>>
>> [1]
>> https://lore.kernel.org/linux-arm-kernel/20250428153514.55772-2-miko.lenczewski@arm.com/
>>
>> [2]
>> https://lore.kernel.org/linux-arm-kernel/20250422081822.1836315-1-ryan.roberts@arm.com/
>>
>> As for errata, that's obviously much more fuzzy; there have been a bunch
>> relating to the MMU in the recent past, and I wouldn't be shocked if more turned up.
>>
>> For future architecture enhancements, I'm aware of one potential feature being
>> discussed for which this change would likely make it harder to implement.
>>
>>>> If we start with the common implementation you have here, nothing
>>>> prevents us from forking the code in future if the architectures diverge
>>>> so I'd be inclined to merge this series and see how we get on.
>> OK if that's your preference, I'm ok with it. I don't have strong opinion, just
>> a sense that we will end up with loads of arch-specific overrides. As you say,
>> let's see.
>>
>> Alexandre, I guess this series is quite old now and will need to incorporate the
>> hugetlb fixes I did last cycle? And ideally I'd like [2] to land then for that
>> to also be incorporated into your next version. (I'm still hopeful we can get
>> [2] into v6.16 and have been waiting patiently for Will to pick it up ;) ).
>>
>> I guess we can worry about [1] later as that is only affected by your other series.
>>
>> How does that sound?
>>
>>>> However,
>>>> one thing I *do* think we need to ensure is that the relevant folks from
>>>> both arm64 (i.e. Ryan) and riscv (i.e. Alexandre) are cc'd on changes to
>>>> the common code. Otherwise, it's going to be a step backwards in terms
>>>> of maintainability.
>>>> Could we add something to MAINTAINERS so that the new file picks you both
>>>> up as reviewers?
>> That's fine with me. Lorenzo added me for some parts of MM this cycle anyway.
>>
>> Thanks,
>> Ryan
> Indeed :) happy to have you there Ryan!
>
>>> I'm adding Lorenzo as he is cleaning the mm MAINTAINERS entries.
>>>
>>> @Lorenzo: should we add a new section "CONTPTE" for this? FYI, hugetlb
>>> is the first patchset, I have another patchset to merge THP contpte
>>> support [1] as well so the "HUGETLB" section does not seem to be a
>>> good fit.
> Hm, this does seem to be very arm64-specific right?
>
> But having said that, literally can see risc v entries :)
>
> We are in a strange sort of scenario where there's some cross-over here.
>
> I don't strictly object to it though, this stuff is important and we should get
> the mm files absolutely under an appropriate MAINTAINER entry.
>
> So right now it seems the files would consist of:
>
> include/linux/hugetlb_contpte.h
> mm/hugetlb_contpte.c
>
> Is this correct?


For now, it is, yes. When this first series gets merged, I would come up 
with another series that will introduce other files for riscv to support 
thp contpte based on the arm64 implementation.


>
> Is this series intended to be taken by Andrew or through an arch tree?


I can pick it up in the riscv tree once I have Acked-by from arm64 
maintainers.


>
> And who would you sensibly propose for M's and R's?


Ryan is definitely a M, I would be happy to help as M too but if needed, 
a R is enough for me.


>
> If we are definitely adding things that sit outside hugetlb or anything
> arch-specific, and is in fact generic mm code, then yes this should be a
> section.
>
> Does contpte stand for 'Contiguous PTE'?


Yes, that's the name arm64 gave to this feature (more understandable 
than svnapot for the riscv feature).


>
> Then entry could perhaps be:
>
> MEMORY MANAGEMENT - CONTPTE (CONTIGUOUS PTE SUPPORT)
>
> I'd say this entry should probably be added as a patch in this series.
>
> If you give me a list of R's and M's and confirm those files I can very quickly
> copy/pasta from an existing entry and then you could respin (and cc my work mail
> for the series :P) and include that as an additional patch?


You can do that or I can do it on my own based on your previous patches, 
as you prefer.


>
> Happy to ACK that in that case.


Thanks for jumping in!

Alex


>
>
>>> [1] https://lore.kernel.org/linux-riscv/20240508191931.46060-1-alexghiti@rivosinc.com/
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>>> Will
> Cheers, Lorenzo
>



* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-05-27  9:25               ` Alexandre Ghiti
@ 2025-05-27  9:37                 ` Lorenzo Stoakes
  2025-05-28 14:51                   ` Ryan Roberts
  0 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-05-27  9:37 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Ryan Roberts, Alexandre Ghiti, Will Deacon, Catalin Marinas,
	Mark Rutland, Matthew Wilcox, Paul Walmsley, Palmer Dabbelt,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-mm

Andrew - does taking this proposed MAINTAINERS change through the riscv tree
work for you?

This series introduces the files being added there, so it seems sensible to add
the MAINTAINERS change to this series.

And I believe this series is intended to be taken through the riscv tree so
seems sensible to do it there?

Proposed entry is 'MEMORY MANAGEMENT - CONTPTE (CONTIGUOUS PTE SUPPORT)', which
is explicitly relevant for arm64, riscv.

Thanks!

On Tue, May 27, 2025 at 11:25:57AM +0200, Alexandre Ghiti wrote:
> Hi Lorenzo,
>
> On 5/21/25 16:57, Lorenzo Stoakes wrote:
[snip]
> > So right now it seems the files would consist of:
> >
> > include/linux/hugetlb_contpte.h
> > mm/hugetlb_contpte.c
> >
> > Is this correct?
>
>
> For now, it is, yes. When this first series gets merged, I would come up
> with another series that will introduce other files for riscv to support thp
> contpte based on the arm64 implementation.

Cool!

>
>
> >
> > Is this series intended to be taken by Andrew or through an arch tree?
>
>
> I can pick it up in the riscv tree once I have Acked-by from arm64
> maintainers.

Have pinged Andrew above on this, you'd need an acked-by from mm people also of
course.

But I guess what makes sense is to take this as a patch in the next respin of
this series that actually introduces this stuff.

So if Andrew took it, he'd have to take the whole series I would say.

>
>
> >
> > And who would you sensibly propose for M's and R's?
>
>
> Ryan is definitely a M, I would be happy to help as M too but if needed, a R
> is enough for me.

Ryan understands this area better than I do, so I would say it's up to him as to
whether he thinks this makes sense.

>
>
> >
> > If we are definitely adding things that sit outside hugetlb or anything
> > arch-specific, and is in fact generic mm code, then yes this should be a
> > section.
> >
> > Does contpte stand for 'Contiguous PTE'?
>
>
> Yes, that's the name arm64 gave to this feature (more understandable than
> svnapot for the riscv feature).

Cheers!

svnapot, guys... what? :P

>
>
> >
> > Then entry could perhaps be:
> >
> > MEMORY MANAGEMENT - CONTPTE (CONTIGUOUS PTE SUPPORT)
> >
> > I'd say this entry should probably be added as a patch in this series.
> >
> > If you give me a list of R's and M's and confirm those files I can very quickly
> > copy/pasta from an existing entry and then you could respin (and cc my work mail
> > for the series :P) and include that as an additional patch?
>
>
> You can do that or I can do it on my own based on your previous patches, as
> you prefer.

I absolutely prefer you to do the work haha! ;)

Please cc me on the next respin with this change in and I can take a look.

>
>
> >
> > Happy to ACK that in that case.
>
>
> Thanks for jumping in!

No problem!

>
> Alex
>
>
> >
> >
> > > > [1] https://lore.kernel.org/linux-riscv/20240508191931.46060-1-alexghiti@rivosinc.com/
> > > >
> > > > Thanks,
> > > >
> > > > Alex
> > > >
> > > > > Will
> > Cheers, Lorenzo
> >

Cheers, Lorenzo



* Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
  2025-05-27  9:37                 ` Lorenzo Stoakes
@ 2025-05-28 14:51                   ` Ryan Roberts
  0 siblings, 0 replies; 22+ messages in thread
From: Ryan Roberts @ 2025-05-28 14:51 UTC (permalink / raw)
  To: Lorenzo Stoakes, Alexandre Ghiti
  Cc: Alexandre Ghiti, Will Deacon, Catalin Marinas, Mark Rutland,
	Matthew Wilcox, Paul Walmsley, Palmer Dabbelt, Andrew Morton,
	linux-arm-kernel, linux-kernel, linux-riscv, linux-mm

On 27/05/2025 10:37, Lorenzo Stoakes wrote:

[...]

>>>
>>> And who would you sensibly propose for M's and R's?
>>
>>
>> Ryan is definitely a M, I would be happy to help as M too but if needed, a R
>> is enough for me.
> 
> Ryan understands this area better than I do, so I would say it's up to him as to
> whether he thinks this makes sense.

I'd certainly like to be an R. I'd prefer not to sign up for M right now though,
unless there is nobody else willing to take it on.





Thread overview: 22+ messages
2025-03-21 13:06 [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Alexandre Ghiti
2025-03-21 13:06 ` [PATCH v5 1/9] riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes Alexandre Ghiti
2025-03-21 13:06 ` [PATCH v5 2/9] riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code Alexandre Ghiti
2025-03-21 13:06 ` [PATCH v5 3/9] mm: Use common huge_ptep_get() function for riscv/arm64 Alexandre Ghiti
2025-03-21 13:06 ` [PATCH v5 4/9] mm: Use common set_huge_pte_at() " Alexandre Ghiti
2025-03-21 13:06 ` [PATCH v5 5/9] mm: Use common huge_pte_clear() " Alexandre Ghiti
2025-03-21 13:06 ` [PATCH v5 6/9] mm: Use common huge_ptep_get_and_clear() " Alexandre Ghiti
2025-03-21 13:06 ` [PATCH v5 7/9] mm: Use common huge_ptep_set_access_flags() " Alexandre Ghiti
2025-03-21 13:06 ` [PATCH v5 8/9] mm: Use common huge_ptep_set_wrprotect() " Alexandre Ghiti
2025-03-21 13:06 ` [PATCH v5 9/9] mm: Use common huge_ptep_clear_flush() " Alexandre Ghiti
2025-03-21 17:24 ` [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support Christophe Leroy
2025-03-25 12:36   ` Alexandre Ghiti
2025-04-07 12:04 ` Alexandre Ghiti
2025-04-29 14:09   ` Ryan Roberts
2025-05-05 16:08     ` Alexandre Ghiti
2025-05-08 12:30       ` Will Deacon
2025-05-09 11:09         ` Alexandre Ghiti
2025-05-09 13:02           ` Ryan Roberts
2025-05-21 14:57             ` Lorenzo Stoakes
2025-05-27  9:25               ` Alexandre Ghiti
2025-05-27  9:37                 ` Lorenzo Stoakes
2025-05-28 14:51                   ` Ryan Roberts
