linux-csky.vger.kernel.org archive mirror
* [PATCH v2 00/12] Always call constructor for kernel page tables
From: Kevin Brodsky @ 2025-04-08  9:52 UTC
  To: linux-mm
  Cc: linux-kernel, Kevin Brodsky, Albert Ou, Andreas Larsson,
	Andrew Morton, Catalin Marinas, Dave Hansen, David S. Miller,
	Geert Uytterhoeven, Linus Walleij, Madhavan Srinivasan,
	Mark Rutland, Matthew Wilcox, Michael Ellerman,
	Mike Rapoport (IBM), Palmer Dabbelt, Paul Walmsley,
	Peter Zijlstra, Qi Zheng, Ryan Roberts, Will Deacon, Yang Shi,
	linux-arch, linux-arm-kernel, linux-csky, linux-m68k,
	linux-openrisc, linux-riscv, linux-s390, linuxppc-dev, sparclinux,
	x86

There has been much confusion around exactly when page table
constructors/destructors (pagetable_*_[cd]tor) are supposed to be
called. They were initially introduced for user PTEs only (to support
split page table locks), then at the PMD level for the same purpose.
Accounting was added later on, starting at the PTE level and then moving
to higher levels (PMD, PUD). Finally, with my earlier series "Account
page tables at all levels" [1], the ctor/dtor is run for all levels, all
the way to PGD.

I thought this was the end of the story, and it hopefully is for user
pgtables, but I was wrong as far as kernel pgtables are concerned. The
current situation there makes very little sense:

* At the PTE level, the ctor/dtor is not called (at least in the generic
  implementation). Specific helpers are used for kernel pgtables at this
  level (pte_{alloc,free}_kernel()) and those have never called the
  ctor/dtor, most likely because they were initially irrelevant in the
  kernel case.

* At all other levels, the ctor/dtor is normally called. This is
  potentially wasteful at the PMD level (more on that later).

This series aims to ensure that the ctor/dtor is always called for kernel
pgtables, as it already is for user pgtables. Besides consistency, the
main motivation is to guarantee that the ctor/dtor is systematically
called; this makes it possible, for instance, to insert hooks that
protect page tables [2]. There is, however, an extra challenge: split
locks are not used for kernel pgtables, and it would therefore be
wasteful to initialise them (ptlock_init()).

It is worth clarifying exactly when split locks are used. They clearly
are for user pgtables, but as illustrated in commit 61444cde9170 ("ARM:
8591/1: mm: use fully constructed struct pages for EFI pgd
allocations"), they also are for special page tables like efi_mm. The
one case where split locks are definitely unused is pgtables owned by
init_mm; this is consistent with the behaviour of apply_to_pte_range().

The approach chosen in this series is therefore to pass the mm
associated with the pgtables being constructed to
pagetable_{pte,pmd}_ctor() (patch 1), and skip ptlock_init() if
mm == &init_mm (patches 3 and 7). This makes it possible to call the PTE
ctor/dtor from pte_{alloc,free}_kernel() without unintended consequences
(patch 3). As a result, the accounting functions are now called at
all levels for kernel pgtables, and split locks are never initialised.
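The ctor change described above can be modelled in userspace. This is a minimal sketch, not the kernel code: struct mm_struct, struct ptdesc, init_mm, efi_mm and ptlock_init() are simplified stand-ins, and the ptlock_allocs counter is a hypothetical way of observing the per-PTP allocation that happens in configurations where ptlocks are dynamically allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in types: only the fields this sketch needs. */
struct mm_struct { const char *name; };
struct ptdesc { void *ptl; int accounted; };

static struct mm_struct init_mm = { "init_mm" };
static struct mm_struct efi_mm = { "efi_mm" };

/* Hypothetical counter of dynamically allocated ptlocks. */
static int ptlock_allocs;

/* Models ptlock_init() when the ptlock is allocated separately. */
static bool ptlock_init(struct ptdesc *ptdesc)
{
	ptdesc->ptl = malloc(sizeof(int));
	if (!ptdesc->ptl)
		return false;
	ptlock_allocs++;
	return true;
}

/* Models the accounting part of the ctor. */
static void __pagetable_ctor(struct ptdesc *ptdesc)
{
	ptdesc->accounted = 1;
}

/* Split locks are only set up for pgtables not owned by init_mm. */
static bool pagetable_pte_ctor(struct mm_struct *mm, struct ptdesc *ptdesc)
{
	if (mm != &init_mm && !ptlock_init(ptdesc))
		return false;
	__pagetable_ctor(ptdesc);
	return true;
}
```

Note that passing NULL as mm, as the special allocators do after patch 1, falls on the `mm != &init_mm` side, so a ptlock is still initialised in that case.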

In configurations where ptlocks are dynamically allocated (32-bit,
PREEMPT_RT, etc.) and ARCH_ENABLE_SPLIT_PMD_PTLOCK is selected, this
series results in the removal of a kmem_cache allocation for every
kernel PMD. Additionally, for certain architectures that do not use
<asm-generic/pgalloc.h>, such as s390, the same optimisation occurs at
the PTE level.

---

Things get more complicated when it comes to special pgtable allocators
(patches 8-12). All architectures need such allocators to create initial
kernel pgtables; we are not concerned with those, as the ctor cannot be
called so early in the boot sequence. However, those allocators may also
be used later in the boot sequence or during normal operations. There
are two main use-cases:

1. Mapping EFI memory: efi_mm (arm, arm64, riscv)
2. arch_add_memory(): init_mm

The ctor is already explicitly run (at the PTE/PMD level) in the first
case, as required for pgtables that are not associated with init_mm.
However, the same allocators may also be used for the second use-case
(or others), and this is where it gets messy. Patch 1 calls the ctor
with NULL as mm in those situations, as the actual mm isn't available.
In practice, this means that ptlocks will be unconditionally initialised.
This is fine on arm, where create_mapping_late() is only used for the
EFI mapping. On arm64, __create_pgd_mapping() is also used by
arch_add_memory(); patches 8, 9 and 11 ensure that ctors are called at
all levels with the appropriate mm. The situation is similar on riscv,
but propagating the mm down to the ctor would require significant
refactoring. Since the ctors are already called unconditionally there,
this series leaves riscv no worse off; patch 10 adds comments to clarify
the situation.
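To make the point about special allocators concrete, here is a hedged sketch of an allocator that is told both the owning mm and the level it is allocating for, and runs the matching ctor. The enum pgtable_level values and special_pgtable_alloc() are hypothetical illustrations, not the identifiers used in patches 8-11; page allocation is elided and the ctors are reduced to counters.

```c
#include <assert.h>

struct mm_struct { int unused; };
static struct mm_struct init_mm, efi_mm;

/* Hypothetical level identifiers (instead of PAGE_SHIFT/PMD_SHIFT). */
enum pgtable_level { LEVEL_PTE, LEVEL_PMD, LEVEL_PUD, LEVEL_P4D };

static int ctor_calls[4];	/* ctor calls per level */
static int ptlock_inits;	/* ptlocks that would be initialised */

/* PTE/PMD ctors: accounting, plus a ptlock unless mm is init_mm. */
static void pte_or_pmd_ctor(struct mm_struct *mm, enum pgtable_level level)
{
	ctor_calls[level]++;
	if (mm != &init_mm)
		ptlock_inits++;
}

/* Upper-level ctors: accounting only, no ptlock involved. */
static void upper_ctor(enum pgtable_level level)
{
	ctor_calls[level]++;
}

/* Every level gets a ctor call, with the mm supplied by the caller. */
static void special_pgtable_alloc(struct mm_struct *mm,
				  enum pgtable_level level)
{
	switch (level) {
	case LEVEL_PTE:
	case LEVEL_PMD:
		pte_or_pmd_ctor(mm, level);
		break;
	case LEVEL_PUD:
	case LEVEL_P4D:
		upper_ctor(level);
		break;
	}
}
```

With the mm threaded through, an arch_add_memory()-style caller can pass &init_mm and avoid ptlocks, while an EFI caller passes efi_mm and still gets them.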

From a cursory look at other architectures implementing
arch_add_memory(), s390 and x86 may also need a similar treatment to add
constructor calls. This is to be taken care of in a future version or as
a follow-up.

---

The complications in those special pgtable allocators raise the
question: does it really make sense to treat efi_mm and init_mm
differently in e.g. apply_to_pte_range()? Maybe what we really need is a
way to tell whether an mm corresponds to user memory, and never use split
locks for non-user mms. Feedback and suggestions welcome!

- Kevin

[1] https://lore.kernel.org/linux-mm/20250103184415.2744423-1-kevin.brodsky@arm.com/
[2] https://lore.kernel.org/linux-hardening/20250203101839.1223008-1-kevin.brodsky@arm.com/
---
Changelog

v1..v2:

- Added patch 2 to fix a BUG() on x86 caused by a missing dtor call at
  the PTE level.

- Patch 9: declared the new helpers with __maybe_unused to avoid a
  warning if CONFIG_MEMORY_HOTPLUG isn't selected.

v1: https://lore.kernel.org/linux-mm/20250317141700.3701581-1-kevin.brodsky@arm.com/
---
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-csky@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-openrisc@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: x86@kernel.org
---
Kevin Brodsky (12):
  mm: Pass mm down to pagetable_{pte,pmd}_ctor
  x86: pgtable: Always use pte_free_kernel()
  mm: Call ctor/dtor for kernel PTEs
  m68k: mm: Call ctor/dtor for kernel PTEs
  powerpc: mm: Call ctor/dtor for kernel PTEs
  sparc64: mm: Call ctor/dtor for kernel PTEs
  mm: Skip ptlock_init() for kernel PMDs
  arm64: mm: Use enum to identify pgtable level instead of *_SHIFT
  arm64: mm: Always call PTE/PMD ctor in __create_pgd_mapping()
  riscv: mm: Clarify ctor mm argument in alloc_{pte,pmd}_late
  arm64: mm: Call PUD/P4D ctor in __create_pgd_mapping()
  riscv: mm: Call PUD/P4D ctor in special kernel pgtable alloc

 arch/arm/mm/mmu.c                        |  2 +-
 arch/arm64/mm/mmu.c                      | 93 ++++++++++++++----------
 arch/csky/include/asm/pgalloc.h          |  2 +-
 arch/loongarch/include/asm/pgalloc.h     |  2 +-
 arch/m68k/include/asm/mcf_pgalloc.h      |  8 +-
 arch/m68k/include/asm/motorola_pgalloc.h | 10 +--
 arch/m68k/mm/motorola.c                  |  6 +-
 arch/microblaze/mm/pgtable.c             |  2 +-
 arch/mips/include/asm/pgalloc.h          |  2 +-
 arch/openrisc/mm/ioremap.c               |  2 +-
 arch/parisc/include/asm/pgalloc.h        |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c       |  2 +-
 arch/powerpc/mm/pgtable-frag.c           | 30 ++++----
 arch/riscv/mm/init.c                     | 26 ++++---
 arch/s390/include/asm/pgalloc.h          |  2 +-
 arch/s390/mm/pgalloc.c                   |  2 +-
 arch/sparc/mm/init_64.c                  | 29 ++++----
 arch/sparc/mm/srmmu.c                    |  2 +-
 arch/x86/mm/pgtable.c                    |  9 +--
 include/asm-generic/pgalloc.h            | 11 ++-
 include/linux/mm.h                       | 10 ++-
 21 files changed, 142 insertions(+), 112 deletions(-)


base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
-- 
2.47.0



* [PATCH v2 01/12] mm: Pass mm down to pagetable_{pte,pmd}_ctor
From: Kevin Brodsky @ 2025-04-08  9:52 UTC

In preparation for calling constructors for all kernel page tables
while eliding unnecessary ptlock initialisation, let's pass down the
associated mm to the PTE/PMD level ctors. (These are the two levels
where ptlocks are used.)

In most cases the mm is already around at the point of calling the
ctor so we simply pass it down. This is however not the case for
special page table allocators:

* arch/arm/mm/mmu.c
* arch/arm64/mm/mmu.c
* arch/riscv/mm/init.c

In those cases, the page tables being allocated are either for
standard kernel memory (init_mm) or special page directories, which
may not be associated with any mm. For now, let's pass NULL as mm; this
will be refined where possible in future patches.

No functional change in this patch.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm/mm/mmu.c                        |  2 +-
 arch/arm64/mm/mmu.c                      |  4 ++--
 arch/loongarch/include/asm/pgalloc.h     |  2 +-
 arch/m68k/include/asm/mcf_pgalloc.h      |  2 +-
 arch/m68k/include/asm/motorola_pgalloc.h | 10 +++++-----
 arch/m68k/mm/motorola.c                  |  6 +++---
 arch/mips/include/asm/pgalloc.h          |  2 +-
 arch/parisc/include/asm/pgalloc.h        |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c       |  2 +-
 arch/powerpc/mm/pgtable-frag.c           |  2 +-
 arch/riscv/mm/init.c                     |  4 ++--
 arch/s390/include/asm/pgalloc.h          |  2 +-
 arch/s390/mm/pgalloc.c                   |  2 +-
 arch/sparc/mm/init_64.c                  |  2 +-
 arch/sparc/mm/srmmu.c                    |  2 +-
 arch/x86/mm/pgtable.c                    |  2 +-
 include/asm-generic/pgalloc.h            |  4 ++--
 include/linux/mm.h                       |  6 ++++--
 18 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index f02f872ea8a9..edb7f56b7c91 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -735,7 +735,7 @@ static void *__init late_alloc(unsigned long sz)
 	void *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM,
 			get_order(sz));
 
-	if (!ptdesc || !pagetable_pte_ctor(ptdesc))
+	if (!ptdesc || !pagetable_pte_ctor(NULL, ptdesc))
 		BUG();
 	return ptdesc_to_virt(ptdesc);
 }
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ea6695d53fb9..8c5c471cfb06 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -494,9 +494,9 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
 	 * folded, and if so pagetable_pte_ctor() becomes nop.
 	 */
 	if (shift == PAGE_SHIFT)
-		BUG_ON(!pagetable_pte_ctor(ptdesc));
+		BUG_ON(!pagetable_pte_ctor(NULL, ptdesc));
 	else if (shift == PMD_SHIFT)
-		BUG_ON(!pagetable_pmd_ctor(ptdesc));
+		BUG_ON(!pagetable_pmd_ctor(NULL, ptdesc));
 
 	return pa;
 }
diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
index b58f587f0f0a..1c63a9d9a6d3 100644
--- a/arch/loongarch/include/asm/pgalloc.h
+++ b/arch/loongarch/include/asm/pgalloc.h
@@ -69,7 +69,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 	if (!ptdesc)
 		return NULL;
 
-	if (!pagetable_pmd_ctor(ptdesc)) {
+	if (!pagetable_pmd_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}
diff --git a/arch/m68k/include/asm/mcf_pgalloc.h b/arch/m68k/include/asm/mcf_pgalloc.h
index 4c648b51e7fd..465a71101b7d 100644
--- a/arch/m68k/include/asm/mcf_pgalloc.h
+++ b/arch/m68k/include/asm/mcf_pgalloc.h
@@ -48,7 +48,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
 
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pte_ctor(ptdesc)) {
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}
diff --git a/arch/m68k/include/asm/motorola_pgalloc.h b/arch/m68k/include/asm/motorola_pgalloc.h
index 5abe7da8ac5a..1091fb0affbe 100644
--- a/arch/m68k/include/asm/motorola_pgalloc.h
+++ b/arch/m68k/include/asm/motorola_pgalloc.h
@@ -15,7 +15,7 @@ enum m68k_table_types {
 };
 
 extern void init_pointer_table(void *table, int type);
-extern void *get_pointer_table(int type);
+extern void *get_pointer_table(struct mm_struct *mm, int type);
 extern int free_pointer_table(void *table, int type);
 
 /*
@@ -26,7 +26,7 @@ extern int free_pointer_table(void *table, int type);
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
-	return get_pointer_table(TABLE_PTE);
+	return get_pointer_table(mm, TABLE_PTE);
 }
 
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
@@ -36,7 +36,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 
 static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
 {
-	return get_pointer_table(TABLE_PTE);
+	return get_pointer_table(mm, TABLE_PTE);
 }
 
 static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
@@ -53,7 +53,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pgtable,
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 {
-	return get_pointer_table(TABLE_PMD);
+	return get_pointer_table(mm, TABLE_PMD);
 }
 
 static inline int pmd_free(struct mm_struct *mm, pmd_t *pmd)
@@ -75,7 +75,7 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	return get_pointer_table(TABLE_PGD);
+	return get_pointer_table(mm, TABLE_PGD);
 }
 
 
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 73651e093c4d..6ab3ef39ba7a 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -139,7 +139,7 @@ void __init init_pointer_table(void *table, int type)
 	return;
 }
 
-void *get_pointer_table(int type)
+void *get_pointer_table(struct mm_struct *mm, int type)
 {
 	ptable_desc *dp = ptable_list[type].next;
 	unsigned int mask = list_empty(&ptable_list[type]) ? 0 : PD_MARKBITS(dp);
@@ -164,10 +164,10 @@ void *get_pointer_table(int type)
 			 * m68k doesn't have SPLIT_PTE_PTLOCKS for not having
 			 * SMP.
 			 */
-			pagetable_pte_ctor(virt_to_ptdesc(page));
+			pagetable_pte_ctor(mm, virt_to_ptdesc(page));
 			break;
 		case TABLE_PMD:
-			pagetable_pmd_ctor(virt_to_ptdesc(page));
+			pagetable_pmd_ctor(mm, virt_to_ptdesc(page));
 			break;
 		case TABLE_PGD:
 			pagetable_pgd_ctor(virt_to_ptdesc(page));
diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index bbca420c96d3..942af87f1cdd 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -62,7 +62,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 	if (!ptdesc)
 		return NULL;
 
-	if (!pagetable_pmd_ctor(ptdesc)) {
+	if (!pagetable_pmd_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}
diff --git a/arch/parisc/include/asm/pgalloc.h b/arch/parisc/include/asm/pgalloc.h
index 2ca74a56415c..3b84ee93edaa 100644
--- a/arch/parisc/include/asm/pgalloc.h
+++ b/arch/parisc/include/asm/pgalloc.h
@@ -39,7 +39,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 	ptdesc = pagetable_alloc(gfp, PMD_TABLE_ORDER);
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pmd_ctor(ptdesc)) {
+	if (!pagetable_pmd_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 8f7d41ce2ca1..a282233c8785 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -422,7 +422,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 	ptdesc = pagetable_alloc(gfp, 0);
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pmd_ctor(ptdesc)) {
+	if (!pagetable_pmd_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 713268ccb1a0..387e9b1fe12c 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -61,7 +61,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 		ptdesc = pagetable_alloc(PGALLOC_GFP | __GFP_ACCOUNT, 0);
 		if (!ptdesc)
 			return NULL;
-		if (!pagetable_pte_ctor(ptdesc)) {
+		if (!pagetable_pte_ctor(mm, ptdesc)) {
 			pagetable_free(ptdesc);
 			return NULL;
 		}
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index ab475ec6ca42..e5ef693fc778 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -442,7 +442,7 @@ static phys_addr_t __meminit alloc_pte_late(uintptr_t va)
 {
 	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
 
-	BUG_ON(!ptdesc || !pagetable_pte_ctor(ptdesc));
+	BUG_ON(!ptdesc || !pagetable_pte_ctor(NULL, ptdesc));
 	return __pa((pte_t *)ptdesc_address(ptdesc));
 }
 
@@ -522,7 +522,7 @@ static phys_addr_t __meminit alloc_pmd_late(uintptr_t va)
 {
 	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
 
-	BUG_ON(!ptdesc || !pagetable_pmd_ctor(ptdesc));
+	BUG_ON(!ptdesc || !pagetable_pmd_ctor(NULL, ptdesc));
 	return __pa((pmd_t *)ptdesc_address(ptdesc));
 }
 
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 005497ffebda..5345398df653 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -97,7 +97,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long vmaddr)
 	if (!table)
 		return NULL;
 	crst_table_init(table, _SEGMENT_ENTRY_EMPTY);
-	if (!pagetable_pmd_ctor(virt_to_ptdesc(table))) {
+	if (!pagetable_pmd_ctor(mm, virt_to_ptdesc(table))) {
 		crst_table_free(mm, table);
 		return NULL;
 	}
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index e3a6f8ae156c..619d6917e3b7 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -145,7 +145,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	ptdesc = pagetable_alloc(GFP_KERNEL, 0);
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pte_ctor(ptdesc)) {
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 760818950464..5c8eabda1d17 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2895,7 +2895,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
 
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pte_ctor(ptdesc)) {
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}
diff --git a/arch/sparc/mm/srmmu.c b/arch/sparc/mm/srmmu.c
index dd32711022f5..f8fb4911d360 100644
--- a/arch/sparc/mm/srmmu.c
+++ b/arch/sparc/mm/srmmu.c
@@ -350,7 +350,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
 	page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
 	spin_lock(&mm->page_table_lock);
 	if (page_ref_inc_return(page) == 2 &&
-			!pagetable_pte_ctor(page_ptdesc(page))) {
+			!pagetable_pte_ctor(mm, page_ptdesc(page))) {
 		page_ref_dec(page);
 		ptep = NULL;
 	}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index a05fcddfc811..7930f234c5f6 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -205,7 +205,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 
 		if (!ptdesc)
 			failed = true;
-		if (ptdesc && !pagetable_pmd_ctor(ptdesc)) {
+		if (ptdesc && !pagetable_pmd_ctor(mm, ptdesc)) {
 			pagetable_free(ptdesc);
 			ptdesc = NULL;
 			failed = true;
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 892ece4558a2..e164ca66f0f6 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -70,7 +70,7 @@ static inline pgtable_t __pte_alloc_one_noprof(struct mm_struct *mm, gfp_t gfp)
 	ptdesc = pagetable_alloc_noprof(gfp, 0);
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pte_ctor(ptdesc)) {
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}
@@ -137,7 +137,7 @@ static inline pmd_t *pmd_alloc_one_noprof(struct mm_struct *mm, unsigned long ad
 	ptdesc = pagetable_alloc_noprof(gfp, 0);
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pmd_ctor(ptdesc)) {
+	if (!pagetable_pmd_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b7f13f087954..f9b793cce2c1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3100,7 +3100,8 @@ static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
 	pagetable_free(ptdesc);
 }
 
-static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
+static inline bool pagetable_pte_ctor(struct mm_struct *mm,
+				      struct ptdesc *ptdesc)
 {
 	if (!ptlock_init(ptdesc))
 		return false;
@@ -3206,7 +3207,8 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
 	return ptl;
 }
 
-static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
+static inline bool pagetable_pmd_ctor(struct mm_struct *mm,
+				      struct ptdesc *ptdesc)
 {
 	if (!pmd_ptlock_init(ptdesc))
 		return false;
-- 
2.47.0



* [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel()
From: Kevin Brodsky @ 2025-04-08  9:52 UTC

Page table pages are normally freed using the appropriate helper for
the given page table level. On x86, pud_free_pmd_page() and
pmd_free_pte_page() are an exception to the rule: they call
free_page() directly.

Constructor/destructor calls are about to be introduced for kernel
PTEs. To avoid missing dtor calls in those helpers, free the PTE
pages using pte_free_kernel() instead of free_page().

While at it, also use pmd_free() instead of calling pagetable_dtor()
explicitly at the PMD level.
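Why freeing with free_page() becomes a problem once kernel PTEs have a ctor can be shown with a toy counter. This sketch assumes a single global counter standing in for the state the ctor sets up and the dtor tears down (accounting and similar); pte_alloc_kernel_toy(), pte_free_kernel_toy() and raw_free() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for state set up by the ctor and torn down by the dtor. */
static long nr_pgtables;

static void *pte_alloc_kernel_toy(void)
{
	void *pte = calloc(1, 4096);

	if (pte)
		nr_pgtables++;		/* ctor */
	return pte;
}

/* Correct path: run the dtor, then free (like pagetable_dtor_free()). */
static void pte_free_kernel_toy(void *pte)
{
	nr_pgtables--;			/* dtor */
	free(pte);
}

/* Old x86 path: the page goes straight back and the dtor never runs. */
static void raw_free(void *pte)
{
	free(pte);
}
```

After one allocation is released through raw_free(), the counter stays elevated: the kind of imbalance that switching these helpers to pte_free_kernel() avoids.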

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/x86/mm/pgtable.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 7930f234c5f6..1dee9bdbeea5 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -809,14 +809,13 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 	for (i = 0; i < PTRS_PER_PMD; i++) {
 		if (!pmd_none(pmd_sv[i])) {
 			pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
-			free_page((unsigned long)pte);
+			pte_free_kernel(&init_mm, pte);
 		}
 	}
 
 	free_page((unsigned long)pmd_sv);
 
-	pagetable_dtor(virt_to_ptdesc(pmd));
-	free_page((unsigned long)pmd);
+	pmd_free(&init_mm, pmd);
 
 	return 1;
 }
@@ -839,7 +838,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 	/* INVLPG to clear all paging-structure caches */
 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
 
-	free_page((unsigned long)pte);
+	pte_free_kernel(&init_mm, pte);
 
 	return 1;
 }
-- 
2.47.0



* [PATCH v2 03/12] mm: Call ctor/dtor for kernel PTEs
From: Kevin Brodsky @ 2025-04-08  9:52 UTC

Since [1], constructors/destructors are expected to be called for
all page table pages, at all levels and for both user and kernel
pgtables. There is however one glaring exception: kernel PTEs are
managed via separate helpers (pte_alloc_kernel/pte_free_kernel),
which do not call the [cd]tor, at least not in the generic
implementation.

The most obvious reason for this anomaly is that init_mm is
special-cased not to use split page table locks. As a result, calling
ptlock_init() for PTEs associated with init_mm would be wasteful,
potentially resulting in dynamic memory allocation. However, pgtable
[cd]tors perform other actions: currently accounting/statistics, and
potentially more functionally significant ones in the future.

Now that pagetable_pte_ctor() is passed the associated mm, we can
make it skip the call to ptlock_init() for init_mm; this allows us
to call the ctor from pte_alloc_one_kernel() too. This is matched by
a call to the pgtable destructor in pte_free_kernel(); no
special-casing is needed on that path, as ptlock_free() is already
called unconditionally. (ptlock_free() is a no-op unless a ptlock
was allocated for the given PTP.)
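The asymmetry between the two paths (the ctor checks for init_mm, the dtor does not) relies on ptlock_free() tolerating a PTP that never had a ptlock. A minimal userspace model, assuming the dynamically allocated ptlock configuration and using simplified stand-in types and functions, makes this explicit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mm_struct { int unused; };
struct ptdesc { int *ptl; };	/* NULL until a ptlock is allocated */

static struct mm_struct init_mm, user_mm;
static int live_ptlocks;

static bool ptlock_init(struct ptdesc *ptdesc)
{
	ptdesc->ptl = malloc(sizeof(*ptdesc->ptl));
	if (!ptdesc->ptl)
		return false;
	live_ptlocks++;
	return true;
}

/* No-op unless a ptlock was allocated for this PTP. */
static void ptlock_free(struct ptdesc *ptdesc)
{
	if (!ptdesc->ptl)
		return;
	free(ptdesc->ptl);
	ptdesc->ptl = NULL;
	live_ptlocks--;
}

/* The ctor needs the mm to decide whether to set up a ptlock. */
static bool pte_ctor(struct mm_struct *mm, struct ptdesc *ptdesc)
{
	if (mm != &init_mm && !ptlock_init(ptdesc))
		return false;
	return true;
}

/* The dtor needs no mm: it calls ptlock_free() unconditionally. */
static void pte_dtor(struct ptdesc *ptdesc)
{
	ptlock_free(ptdesc);
}
```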

This patch ensures that all architectures that rely on
<asm-generic/pgalloc.h> call the [cd]tor for kernel PTEs.
pte_free_kernel() cannot be overridden, so changing the generic
implementation is sufficient. pte_alloc_one_kernel() can be
overridden using __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL, and a few
architectures implement it by calling the page allocator directly.
We amend those so that they call the generic
__pte_alloc_one_kernel() instead, if possible, ensuring that the
ctor is called.
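The override mechanism follows the usual asm-generic pattern: the generic header only provides pte_alloc_one_kernel() when the architecture has not defined __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL. The sketch below compresses an arch header and the generic header into one file. The mm argument is dropped for brevity, the ctor is reduced to a counter, and the arch-specific post-processing (filling entries with a hypothetical ARCH_INVALID_PTE value) is an assumption loosely inspired by the csky hunk.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { unsigned long val; } pte_t;
#define PTRS_PER_PTE		512
#define ARCH_INVALID_PTE	0x1UL	/* hypothetical arch-specific value */

static int ctor_calls;

/* Generic helper: allocate the page and run the ctor. */
static pte_t *__pte_alloc_one_kernel(void)
{
	pte_t *pte = calloc(PTRS_PER_PTE, sizeof(pte_t));

	if (pte)
		ctor_calls++;	/* stands in for pagetable_pte_ctor() */
	return pte;
}

/* --- arch header: override, but reuse the generic helper --- */
#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
static pte_t *pte_alloc_one_kernel(void)
{
	pte_t *pte = __pte_alloc_one_kernel();

	if (!pte)
		return NULL;
	for (int i = 0; i < PTRS_PER_PTE; i++)
		pte[i].val = ARCH_INVALID_PTE;
	return pte;
}

/* --- generic header: only used if the arch did not override --- */
#ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
static pte_t *pte_alloc_one_kernel(void)
{
	return __pte_alloc_one_kernel();
}
#endif
```

Because the override goes through __pte_alloc_one_kernel() rather than the page allocator, the ctor runs without the architecture having to call it itself.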

A few architectures do not use <asm-generic/pgalloc.h>; those will
be taken care of separately.

[1] https://lore.kernel.org/linux-mm/20250103184415.2744423-1-kevin.brodsky@arm.com/

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/csky/include/asm/pgalloc.h | 2 +-
 arch/microblaze/mm/pgtable.c    | 2 +-
 arch/openrisc/mm/ioremap.c      | 2 +-
 include/asm-generic/pgalloc.h   | 7 ++++++-
 include/linux/mm.h              | 2 +-
 5 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
index 11055c574968..9ed2b15ffd94 100644
--- a/arch/csky/include/asm/pgalloc.h
+++ b/arch/csky/include/asm/pgalloc.h
@@ -29,7 +29,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 	unsigned long i;
 
-	pte = (pte_t *) __get_free_page(GFP_KERNEL);
+	pte = __pte_alloc_one_kernel(mm);
 	if (!pte)
 		return NULL;
 
diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
index 9f73265aad4e..e96dd1b7aba4 100644
--- a/arch/microblaze/mm/pgtable.c
+++ b/arch/microblaze/mm/pgtable.c
@@ -245,7 +245,7 @@ unsigned long iopa(unsigned long addr)
 __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
 	if (mem_init_done)
-		return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		return __pte_alloc_one_kernel(mm);
 	else
 		return memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
 					      MEMBLOCK_LOW_LIMIT,
diff --git a/arch/openrisc/mm/ioremap.c b/arch/openrisc/mm/ioremap.c
index 8e63e86251ca..3b352f97fecb 100644
--- a/arch/openrisc/mm/ioremap.c
+++ b/arch/openrisc/mm/ioremap.c
@@ -36,7 +36,7 @@ pte_t __ref *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 
 	if (likely(mem_init_done)) {
-		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+		pte = __pte_alloc_one_kernel(mm);
 	} else {
 		pte = memblock_alloc_or_panic(PAGE_SIZE, PAGE_SIZE);
 	}
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index e164ca66f0f6..3c8ec3bfea44 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -23,6 +23,11 @@ static inline pte_t *__pte_alloc_one_kernel_noprof(struct mm_struct *mm)
 
 	if (!ptdesc)
 		return NULL;
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
+		pagetable_free(ptdesc);
+		return NULL;
+	}
+
 	return ptdesc_address(ptdesc);
 }
 #define __pte_alloc_one_kernel(...)	alloc_hooks(__pte_alloc_one_kernel_noprof(__VA_ARGS__))
@@ -48,7 +53,7 @@ static inline pte_t *pte_alloc_one_kernel_noprof(struct mm_struct *mm)
  */
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-	pagetable_free(virt_to_ptdesc(pte));
+	pagetable_dtor_free(virt_to_ptdesc(pte));
 }
 
 /**
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f9b793cce2c1..3f48e449574a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3103,7 +3103,7 @@ static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
 static inline bool pagetable_pte_ctor(struct mm_struct *mm,
 				      struct ptdesc *ptdesc)
 {
-	if (!ptlock_init(ptdesc))
+	if (mm != &init_mm && !ptlock_init(ptdesc))
 		return false;
 	__pagetable_ctor(ptdesc);
 	return true;
-- 
2.47.0



* [PATCH v2 04/12] m68k: mm: Call ctor/dtor for kernel PTEs
From: Kevin Brodsky @ 2025-04-08  9:52 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Kevin Brodsky, Albert Ou, Andreas Larsson,
	Andrew Morton, Catalin Marinas, Dave Hansen, David S. Miller,
	Geert Uytterhoeven, Linus Walleij, Madhavan Srinivasan,
	Mark Rutland, Matthew Wilcox, Michael Ellerman,
	Mike Rapoport (IBM), Palmer Dabbelt, Paul Walmsley,
	Peter Zijlstra, Qi Zheng, Ryan Roberts, Will Deacon, Yang Shi,
	linux-arch, linux-arm-kernel, linux-csky, linux-m68k,
	linux-openrisc, linux-riscv, linux-s390, linuxppc-dev, sparclinux,
	x86

The generic implementation of pte_{alloc_one,free}_kernel now calls
the [cd]tor. Align the m68k/ColdFire implementation of those
functions by calling the [cd]tor explicitly.
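
The ctor/dtor pairing that the m68k helpers now follow can be modelled in userspace. This is a hypothetical sketch: `toy_ptdesc`, `toy_pte_ctor()` and the other `toy_` names are invented stand-ins, not kernel APIs, and the toy ctor only bumps an accounting counter that the dtor drops again. It shows three rules. Run the ctor after a successful allocation. If the ctor fails, free without the dtor. On release, run the dtor before freeing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ptdesc: just enough state for the sketch. */
struct toy_ptdesc {
	bool constructed;
};

/* Page tables currently accounted by the toy ctor. */
static long toy_accounted;

/* Set to simulate a ctor failure (e.g. a failed ptlock allocation). */
static bool toy_ctor_should_fail;

static bool toy_pte_ctor(struct toy_ptdesc *pt)
{
	if (toy_ctor_should_fail)
		return false;
	pt->constructed = true;
	toy_accounted++;
	return true;
}

static void toy_pte_dtor(struct toy_ptdesc *pt)
{
	assert(pt->constructed);	/* dtor only undoes a successful ctor */
	pt->constructed = false;
	toy_accounted--;
}

/* Same shape as the patched pte_alloc_one_kernel(). */
static struct toy_ptdesc *toy_pte_alloc_kernel(void)
{
	struct toy_ptdesc *pt = calloc(1, sizeof(*pt));

	if (!pt)
		return NULL;
	if (!toy_pte_ctor(pt)) {
		free(pt);	/* ctor failed: free without running the dtor */
		return NULL;
	}
	return pt;
}

/* Same shape as the patched pte_free_kernel(): dtor, then free. */
static void toy_pte_free_kernel(struct toy_ptdesc *pt)
{
	toy_pte_dtor(pt);
	free(pt);
}
```

After any sequence of successful allocations and frees, the counter returns to zero. A failed ctor leaves it unchanged.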

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/m68k/include/asm/mcf_pgalloc.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/include/asm/mcf_pgalloc.h b/arch/m68k/include/asm/mcf_pgalloc.h
index 465a71101b7d..fc5454d37da3 100644
--- a/arch/m68k/include/asm/mcf_pgalloc.h
+++ b/arch/m68k/include/asm/mcf_pgalloc.h
@@ -7,7 +7,7 @@
 
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-	pagetable_free(virt_to_ptdesc(pte));
+	pagetable_dtor_free(virt_to_ptdesc(pte));
 }
 
 extern const char bad_pmd_string[];
@@ -19,6 +19,10 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 
 	if (!ptdesc)
 		return NULL;
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
+		pagetable_free(ptdesc);
+		return NULL;
+	}
 
 	return ptdesc_address(ptdesc);
 }
-- 
2.47.0



* [PATCH v2 05/12] powerpc: mm: Call ctor/dtor for kernel PTEs
From: Kevin Brodsky @ 2025-04-08  9:52 UTC

The generic implementation of pte_{alloc_one,free}_kernel now calls
the [cd]tor, without initialising the ptlock needlessly as
pagetable_pte_ctor() skips it for init_mm.

On powerpc, all functions related to PTE allocation are implemented
by common helpers, which are passed a boolean to differentiate user
from kernel pgtables. This patch aligns the powerpc implementation
with the generic one by calling pagetable_pte_[cd]tor()
unconditionally in those helpers.
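
The diff changes both the allocation flags and the free-path dispatch in pte_fragment_free(). In the real code, folio_test_clear_active() is only evaluated for user tables: the `||` short-circuits for kernel ones, just as the old if/else if chain never reached it. Modelling its result as a plain boolean is therefore enough. The hypothetical sketch below uses invented `toy_` names and flag values. It checks that user tables keep the same dispatch. Kernel tables now go through pte_free_now(), the function already used for user tables, instead of a bare pagetable_free().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real GFP flags are defined elsewhere. */
#define TOY_PGALLOC_GFP	0x1u
#define TOY_GFP_ACCOUNT	0x2u

/* Allocation side: only user page tables are charged to the memcg. */
static unsigned int toy_alloc_gfp(bool kernel)
{
	unsigned int gfp = TOY_PGALLOC_GFP;

	if (!kernel)
		gfp |= TOY_GFP_ACCOUNT;
	return gfp;
}

/* What happens once the last fragment of a PTE page is released. */
enum toy_free_action {
	TOY_PLAIN_FREE,	/* pagetable_free(): no dtor */
	TOY_FREE_NOW,	/* pte_free_now() called directly */
	TOY_FREE_RCU,	/* call_rcu(..., pte_free_now) */
};

/* Before the patch; 'active' models folio_test_clear_active(). */
static enum toy_free_action toy_free_old(bool kernel, bool active)
{
	if (kernel)
		return TOY_PLAIN_FREE;
	else if (active)
		return TOY_FREE_RCU;
	else
		return TOY_FREE_NOW;
}

/* After the patch. */
static enum toy_free_action toy_free_new(bool kernel, bool active)
{
	if (kernel || !active)
		return TOY_FREE_NOW;
	else
		return TOY_FREE_RCU;
}

static void toy_check_free_paths(void)
{
	/* User tables: same dispatch before and after the patch. */
	assert(toy_free_old(false, false) == toy_free_new(false, false));
	assert(toy_free_old(false, true) == toy_free_new(false, true));
	/* Kernel tables: still freed immediately, now via pte_free_now(). */
	assert(toy_free_new(true, false) == TOY_FREE_NOW);
	assert(toy_free_new(true, true) == TOY_FREE_NOW);
}
```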

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/powerpc/mm/pgtable-frag.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 387e9b1fe12c..77e55eac16e4 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -56,19 +56,17 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 {
 	void *ret = NULL;
 	struct ptdesc *ptdesc;
+	gfp_t gfp = PGALLOC_GFP;
 
-	if (!kernel) {
-		ptdesc = pagetable_alloc(PGALLOC_GFP | __GFP_ACCOUNT, 0);
-		if (!ptdesc)
-			return NULL;
-		if (!pagetable_pte_ctor(mm, ptdesc)) {
-			pagetable_free(ptdesc);
-			return NULL;
-		}
-	} else {
-		ptdesc = pagetable_alloc(PGALLOC_GFP, 0);
-		if (!ptdesc)
-			return NULL;
+	if (!kernel)
+		gfp |= __GFP_ACCOUNT;
+
+	ptdesc = pagetable_alloc(gfp, 0);
+	if (!ptdesc)
+		return NULL;
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
+		pagetable_free(ptdesc);
+		return NULL;
 	}
 
 	atomic_set(&ptdesc->pt_frag_refcount, 1);
@@ -124,12 +122,10 @@ void pte_fragment_free(unsigned long *table, int kernel)
 
 	BUG_ON(atomic_read(&ptdesc->pt_frag_refcount) <= 0);
 	if (atomic_dec_and_test(&ptdesc->pt_frag_refcount)) {
-		if (kernel)
-			pagetable_free(ptdesc);
-		else if (folio_test_clear_active(ptdesc_folio(ptdesc)))
-			call_rcu(&ptdesc->pt_rcu_head, pte_free_now);
-		else
+		if (kernel || !folio_test_clear_active(ptdesc_folio(ptdesc)))
 			pte_free_now(&ptdesc->pt_rcu_head);
+		else
+			call_rcu(&ptdesc->pt_rcu_head, pte_free_now);
 	}
 }
 
-- 
2.47.0



* [PATCH v2 06/12] sparc64: mm: Call ctor/dtor for kernel PTEs
From: Kevin Brodsky @ 2025-04-08  9:52 UTC

The generic implementation of pte_{alloc_one,free}_kernel now calls
the [cd]tor, without initialising the ptlock needlessly as
pagetable_pte_ctor() skips it for init_mm.

Align sparc64 with the generic implementation by ensuring
pagetable_pte_[cd]tor() are called for kernel PTEs. As a result
the kernel and user alloc/free functions have the same
implementation, and since pgtable_t is defined as pte_t *, we can
have both call a common helper.
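
On sparc64, pgtable_t is a typedef for pte_t *, so one helper can back both entry points without any casts. Below is a minimal sketch using hypothetical `toy_` types. A counter stands in for the ctor/dtor. The real helper allocates with pagetable_alloc() and calls the PTE ctor, as the diff shows.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { unsigned long val; } toy_pte_t;
/* As on sparc64, the "user" page table handle is just a PTE pointer. */
typedef toy_pte_t *toy_pgtable_t;

#define TOY_PTRS_PER_PTE 512	/* hypothetical table size */

static int toy_ctor_calls;

/* Common helper: zeroed allocation plus the (modelled) ctor call. */
static toy_pte_t *toy_pte_alloc_common(void)
{
	toy_pte_t *pte = calloc(TOY_PTRS_PER_PTE, sizeof(*pte));

	if (pte)
		toy_ctor_calls++;
	return pte;
}

static toy_pte_t *toy_pte_alloc_one_kernel(void)
{
	return toy_pte_alloc_common();
}

/* Needs no cast because toy_pgtable_t and toy_pte_t * are the same type. */
static toy_pgtable_t toy_pte_alloc_one(void)
{
	return toy_pte_alloc_common();
}

/* Common free path for both kinds of table: (modelled) dtor, then free. */
static void toy_pte_free_common(toy_pgtable_t pte)
{
	assert(toy_ctor_calls > 0);
	toy_ctor_calls--;
	free(pte);
}
```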

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/sparc/mm/init_64.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 5c8eabda1d17..25ae4c897aae 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2878,18 +2878,7 @@ void __flush_tlb_all(void)
 			     : : "r" (pstate));
 }
 
-pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
-{
-	struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	pte_t *pte = NULL;
-
-	if (page)
-		pte = (pte_t *) page_address(page);
-
-	return pte;
-}
-
-pgtable_t pte_alloc_one(struct mm_struct *mm)
+static pte_t *__pte_alloc_one(struct mm_struct *mm)
 {
 	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL | __GFP_ZERO, 0);
 
@@ -2902,9 +2891,14 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
 	return ptdesc_address(ptdesc);
 }
 
-void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
-	free_page((unsigned long)pte);
+	return __pte_alloc_one(mm);
+}
+
+pgtable_t pte_alloc_one(struct mm_struct *mm)
+{
+	return __pte_alloc_one(mm);
 }
 
 static void __pte_free(pgtable_t pte)
@@ -2915,6 +2909,11 @@ static void __pte_free(pgtable_t pte)
 	pagetable_free(ptdesc);
 }
 
+void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+{
+	__pte_free(pte);
+}
+
 void pte_free(struct mm_struct *mm, pgtable_t pte)
 {
 	__pte_free(pte);
-- 
2.47.0



* [PATCH v2 07/12] mm: Skip ptlock_init() for kernel PMDs
From: Kevin Brodsky @ 2025-04-08  9:52 UTC

Split page table locks are not used for pgtables associated with
init_mm, at any level. As a result, pte_alloc_kernel() does not call
ptlock_init(). There are, however, no separate alloc/free functions
for kernel PMDs, and pmd_ptlock_init() is called unconditionally.
When ALLOC_SPLIT_PTLOCKS is true (e.g. on 32-bit architectures or if
CONFIG_PREEMPT_RT is selected), this results in an unnecessary
dynamic memory allocation every time a kernel PMD is allocated.

Now that pagetable_pmd_ctor() is passed the associated mm, we can
easily remove this overhead by skipping pmd_ptlock_init() if the
pgtable is associated with init_mm. No special-casing is needed on
the dtor path, as ptlock_free() is already called unconditionally for
all levels. (ptlock_free() is a no-op unless a ptlock was allocated
for the given page table page.)
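
The effect can be sketched with a hypothetical userspace model. It assumes a configuration where the ptlock lives in a separate allocation, as when ALLOC_SPLIT_PTLOCKS is true. `toy_mm`, `toy_init_mm` and the other `toy_` names are invented, and malloc() stands in for the ptlock allocation. Skipping the lock for init_mm saves one allocation per kernel PMD. The unconditional free stays safe because it does nothing when no lock exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_mm { int id; };
static struct toy_mm toy_init_mm;	/* stands in for init_mm */

struct toy_ptdesc {
	int *ptl;	/* dynamically allocated lock, or NULL */
};

static int toy_ptlock_allocs;

static bool toy_pmd_ptlock_init(struct toy_ptdesc *pt)
{
	pt->ptl = malloc(sizeof(*pt->ptl));
	if (!pt->ptl)
		return false;
	toy_ptlock_allocs++;
	return true;
}

/* Like the patched pagetable_pmd_ctor(): no ptlock for init_mm. */
static bool toy_pmd_ctor(struct toy_mm *mm, struct toy_ptdesc *pt)
{
	pt->ptl = NULL;
	if (mm != &toy_init_mm && !toy_pmd_ptlock_init(pt))
		return false;
	return true;
}

/* Like ptlock_free(): called unconditionally, a no-op without a lock. */
static void toy_ptlock_free(struct toy_ptdesc *pt)
{
	if (pt->ptl) {
		free(pt->ptl);
		pt->ptl = NULL;
		toy_ptlock_allocs--;
	}
	assert(toy_ptlock_allocs >= 0);
}
```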

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3f48e449574a..ef420f4dc72c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3210,7 +3210,7 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
 static inline bool pagetable_pmd_ctor(struct mm_struct *mm,
 				      struct ptdesc *ptdesc)
 {
-	if (!pmd_ptlock_init(ptdesc))
+	if (mm != &init_mm && !pmd_ptlock_init(ptdesc))
 		return false;
 	ptdesc_pmd_pts_init(ptdesc);
 	__pagetable_ctor(ptdesc);
-- 
2.47.0



* [PATCH v2 08/12] arm64: mm: Use enum to identify pgtable level instead of *_SHIFT
From: Kevin Brodsky @ 2025-04-08  9:52 UTC

Commit 90292aca9854 ("arm64: mm: use appropriate ctors for page
tables") introduced pgtable ctor calls in pgd_pgtable_alloc(). To
identify the pgtable level and call the appropriate ctor, the
*_SHIFT value associated with the pgtable level is used. However,
those values do not unambiguously identify a level, because if a
given level is folded, the *_SHIFT value will be equal to that of
the upper level (e.g. PMD_SHIFT == PUD_SHIFT if PMD is folded).

As things stand, there is probably not much damage done by calling
the ctor for a different level, and ARCH_ENABLE_SPLIT_PMD_PTLOCK is
only selected if the PMD isn't folded (so we don't needlessly
initialise pmd_ptlock). Still, this is pretty confusing, and it would
get even more confusing when adding ctor calls for the remaining
levels.

Let's simplify all this by using an enum to identify the pgtable
level instead; this way folding becomes irrelevant. This is inspired
by one of the m68k pgtable allocators
(arch/m68k/include/asm/motorola_pgalloc.h).
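
Consider a configuration where the PMD is folded, so that PMD_SHIFT is defined as PUD_SHIFT. This shows why a shift cannot identify a level once folding is involved. In the hypothetical sketch below, the shift values are only illustrative, and toy_ctor_for_shift() mirrors the old if/else if dispatch in pgd_pgtable_alloc(). A request for a PUD table lands in the PMD branch. The enum-based version, which mirrors this patch, keeps the levels distinct.

```c
#include <assert.h>

/* Illustrative shift values; the PMD level is folded into the PUD. */
#define TOY_PAGE_SHIFT	12
#define TOY_PUD_SHIFT	30
#define TOY_PMD_SHIFT	TOY_PUD_SHIFT	/* folded: same value as the level above */

/* The ambiguity the patch removes, checked at compile time. */
static_assert(TOY_PMD_SHIFT == TOY_PUD_SHIFT, "PMD shift aliases PUD shift");

enum toy_ctor { TOY_CTOR_NONE, TOY_CTOR_PTE, TOY_CTOR_PMD };

/* Old scheme: the level is inferred from a shift. */
static enum toy_ctor toy_ctor_for_shift(int shift)
{
	if (shift == TOY_PAGE_SHIFT)
		return TOY_CTOR_PTE;
	else if (shift == TOY_PMD_SHIFT)
		return TOY_CTOR_PMD;
	return TOY_CTOR_NONE;
}

/* New scheme: the caller states the level, so folding is irrelevant. */
enum toy_pgtable_type { TOY_TABLE_PTE, TOY_TABLE_PMD, TOY_TABLE_PUD, TOY_TABLE_P4D };

static enum toy_ctor toy_ctor_for_type(enum toy_pgtable_type type)
{
	switch (type) {
	case TOY_TABLE_PTE:
		return TOY_CTOR_PTE;
	case TOY_TABLE_PMD:
		return TOY_CTOR_PMD;
	default:
		return TOY_CTOR_NONE;	/* no PUD/P4D ctor at this point */
	}
}
```

With these definitions, toy_ctor_for_shift(TOY_PUD_SHIFT) returns TOY_CTOR_PMD, so a PUD table would get the PMD ctor. toy_ctor_for_type(TOY_TABLE_PUD) correctly returns TOY_CTOR_NONE.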

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/mm/mmu.c | 54 +++++++++++++++++++++++++++------------------
 1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8c5c471cfb06..eca324b3a6fc 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -46,6 +46,13 @@
 #define NO_CONT_MAPPINGS	BIT(1)
 #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
 
+enum pgtable_type {
+	TABLE_PTE,
+	TABLE_PMD,
+	TABLE_PUD,
+	TABLE_P4D,
+};
+
 u64 kimage_voffset __ro_after_init;
 EXPORT_SYMBOL(kimage_voffset);
 
@@ -107,7 +114,7 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
 
-static phys_addr_t __init early_pgtable_alloc(int shift)
+static phys_addr_t __init early_pgtable_alloc(enum pgtable_type pgtable_type)
 {
 	phys_addr_t phys;
 
@@ -192,7 +199,7 @@ static void init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
 static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(int),
+				phys_addr_t (*pgtable_alloc)(enum pgtable_type),
 				int flags)
 {
 	unsigned long next;
@@ -207,7 +214,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 		if (flags & NO_EXEC_MAPPINGS)
 			pmdval |= PMD_TABLE_PXN;
 		BUG_ON(!pgtable_alloc);
-		pte_phys = pgtable_alloc(PAGE_SHIFT);
+		pte_phys = pgtable_alloc(TABLE_PTE);
 		ptep = pte_set_fixmap(pte_phys);
 		init_clear_pgtable(ptep);
 		ptep += pte_index(addr);
@@ -243,7 +250,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 
 static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		     phys_addr_t phys, pgprot_t prot,
-		     phys_addr_t (*pgtable_alloc)(int), int flags)
+		     phys_addr_t (*pgtable_alloc)(enum pgtable_type), int flags)
 {
 	unsigned long next;
 
@@ -277,7 +284,8 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
 static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(int), int flags)
+				phys_addr_t (*pgtable_alloc)(enum pgtable_type),
+				int flags)
 {
 	unsigned long next;
 	pud_t pud = READ_ONCE(*pudp);
@@ -294,7 +302,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 		if (flags & NO_EXEC_MAPPINGS)
 			pudval |= PUD_TABLE_PXN;
 		BUG_ON(!pgtable_alloc);
-		pmd_phys = pgtable_alloc(PMD_SHIFT);
+		pmd_phys = pgtable_alloc(TABLE_PMD);
 		pmdp = pmd_set_fixmap(pmd_phys);
 		init_clear_pgtable(pmdp);
 		pmdp += pmd_index(addr);
@@ -325,7 +333,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 
 static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
 			   phys_addr_t phys, pgprot_t prot,
-			   phys_addr_t (*pgtable_alloc)(int),
+			   phys_addr_t (*pgtable_alloc)(enum pgtable_type),
 			   int flags)
 {
 	unsigned long next;
@@ -339,7 +347,7 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
 		if (flags & NO_EXEC_MAPPINGS)
 			p4dval |= P4D_TABLE_PXN;
 		BUG_ON(!pgtable_alloc);
-		pud_phys = pgtable_alloc(PUD_SHIFT);
+		pud_phys = pgtable_alloc(TABLE_PUD);
 		pudp = pud_set_fixmap(pud_phys);
 		init_clear_pgtable(pudp);
 		pudp += pud_index(addr);
@@ -383,7 +391,7 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
 
 static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
 			   phys_addr_t phys, pgprot_t prot,
-			   phys_addr_t (*pgtable_alloc)(int),
+			   phys_addr_t (*pgtable_alloc)(enum pgtable_type),
 			   int flags)
 {
 	unsigned long next;
@@ -397,7 +405,7 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
 		if (flags & NO_EXEC_MAPPINGS)
 			pgdval |= PGD_TABLE_PXN;
 		BUG_ON(!pgtable_alloc);
-		p4d_phys = pgtable_alloc(P4D_SHIFT);
+		p4d_phys = pgtable_alloc(TABLE_P4D);
 		p4dp = p4d_set_fixmap(p4d_phys);
 		init_clear_pgtable(p4dp);
 		p4dp += p4d_index(addr);
@@ -427,7 +435,7 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
 static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
 					unsigned long virt, phys_addr_t size,
 					pgprot_t prot,
-					phys_addr_t (*pgtable_alloc)(int),
+					phys_addr_t (*pgtable_alloc)(enum pgtable_type),
 					int flags)
 {
 	unsigned long addr, end, next;
@@ -455,7 +463,7 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
 static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 				 unsigned long virt, phys_addr_t size,
 				 pgprot_t prot,
-				 phys_addr_t (*pgtable_alloc)(int),
+				 phys_addr_t (*pgtable_alloc)(enum pgtable_type),
 				 int flags)
 {
 	mutex_lock(&fixmap_lock);
@@ -468,10 +476,11 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 extern __alias(__create_pgd_mapping_locked)
 void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
 			     phys_addr_t size, pgprot_t prot,
-			     phys_addr_t (*pgtable_alloc)(int), int flags);
+			     phys_addr_t (*pgtable_alloc)(enum pgtable_type),
+			     int flags);
 #endif
 
-static phys_addr_t __pgd_pgtable_alloc(int shift)
+static phys_addr_t __pgd_pgtable_alloc(enum pgtable_type pgtable_type)
 {
 	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
 	void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL & ~__GFP_ZERO);
@@ -480,23 +489,26 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
 	return __pa(ptr);
 }
 
-static phys_addr_t pgd_pgtable_alloc(int shift)
+static phys_addr_t pgd_pgtable_alloc(enum pgtable_type pgtable_type)
 {
-	phys_addr_t pa = __pgd_pgtable_alloc(shift);
+	phys_addr_t pa = __pgd_pgtable_alloc(pgtable_type);
 	struct ptdesc *ptdesc = page_ptdesc(phys_to_page(pa));
 
 	/*
 	 * Call proper page table ctor in case later we need to
 	 * call core mm functions like apply_to_page_range() on
 	 * this pre-allocated page table.
-	 *
-	 * We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
-	 * folded, and if so pagetable_pte_ctor() becomes nop.
 	 */
-	if (shift == PAGE_SHIFT)
+	switch (pgtable_type) {
+	case TABLE_PTE:
 		BUG_ON(!pagetable_pte_ctor(NULL, ptdesc));
-	else if (shift == PMD_SHIFT)
+		break;
+	case TABLE_PMD:
 		BUG_ON(!pagetable_pmd_ctor(NULL, ptdesc));
+		break;
+	default:
+		break;
+	}
 
 	return pa;
 }
-- 
2.47.0



* [PATCH v2 09/12] arm64: mm: Always call PTE/PMD ctor in __create_pgd_mapping()
From: Kevin Brodsky @ 2025-04-08  9:52 UTC

TL;DR: always call the PTE/PMD ctor, passing the appropriate mm to
skip ptlock_init() if unneeded.

__create_pgd_mapping() is used for creating different kinds of
mappings, and may allocate page table pages if passed an allocator
callback. There are currently three such cases:

1. create_pgd_mapping(), which is used to create the EFI mapping
2. arch_add_memory()
3. map_entry_trampoline()

1. uses pgd_pgtable_alloc() as allocator callback, which calls the
PTE/PMD ctor, while 2. and 3. use __pgd_pgtable_alloc(), which does
not. The rationale is most likely that pgtables associated with
init_mm do not make use of split page table locks, and it is
therefore unnecessary to initialise them by calling the ctor. 2.
operates on swapper_pg_dir, so the allocated pgtables are clearly
associated with init_mm; this is arguably the case for 3. too (the
trampoline mapping is never modified, so ptlocks are irrelevant
anyway). 1. corresponds to efi_mm, so ptlocks need to be initialised
in that case.

We are now moving towards calling the ctor for all page tables, even
those associated with init_mm. pagetable_{pte,pmd}_ctor() have
become aware of the associated mm so that the ptlock initialisation
can be skipped for init_mm. This patch therefore amends the
allocator callbacks so that the PTE/PMD ctors are always called, with
an appropriate mm pointer to avoid unnecessary ptlock overhead.

Modifying the prototype of the allocator callbacks to take the mm
and propagating that pointer all the way down would be pretty
invasive. Instead:

* __pgd_pgtable_alloc() (cases 2. and 3. above) is replaced with
  pgd_pgtable_alloc_init_mm(), resulting in the ctors being called
  with &init_mm. This is the main functional change in this patch;
  the ptlock still isn't initialised, but other ctor actions (e.g.
  accounting-related) are now carried out for those allocated
  pgtables.

* pgd_pgtable_alloc() (case 1. above) is replaced with
  pgd_pgtable_alloc_special_mm(), resulting in the ctors being
  called with NULL as mm. No functional change here; NULL
  essentially means "not init_mm", and the ptlock is still
  initialised.

__pgd_pgtable_alloc() is now the common implementation of those two
helpers. While at it, we switch it to pagetable_alloc(), like
standard pgtable allocator functions, and remove the comment
regarding ctor calls (ctors are now always expected to be called).
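
The callback prototype stays unchanged, and thin wrappers bind the mm. The hypothetical userspace model below sketches this approach. `toy_` names and the fake physical addresses are invented. The common allocator only records whether a real ctor would have initialised a ptlock. That happens for PTE/PMD tables with any mm other than init_mm, including NULL.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mm { int id; };
static struct toy_mm toy_init_mm;	/* stands in for init_mm */

enum toy_pgtable_type { TOY_TABLE_PTE, TOY_TABLE_PMD, TOY_TABLE_PUD, TOY_TABLE_P4D };

/* The callback type stays mm-agnostic, as in __create_pgd_mapping(). */
typedef unsigned long (*toy_pgtable_alloc_fn)(enum toy_pgtable_type type);

static int toy_ptlocks_initialised;
static unsigned long toy_next_pa = 0x1000;	/* fake physical addresses */

static unsigned long toy_pgtable_alloc_common(struct toy_mm *mm,
					      enum toy_pgtable_type type)
{
	unsigned long pa = toy_next_pa;

	toy_next_pa += 0x1000;
	/* Only PTE/PMD ctors deal with ptlocks, and they skip init_mm. */
	if ((type == TOY_TABLE_PTE || type == TOY_TABLE_PMD) &&
	    mm != &toy_init_mm)
		toy_ptlocks_initialised++;
	return pa;
}

/* Wrappers bind the mm so the callback prototype does not change. */
static unsigned long toy_alloc_init_mm(enum toy_pgtable_type type)
{
	return toy_pgtable_alloc_common(&toy_init_mm, type);
}

static unsigned long toy_alloc_special_mm(enum toy_pgtable_type type)
{
	return toy_pgtable_alloc_common(NULL, type);	/* NULL: "not init_mm" */
}

/* Stand-in for __create_pgd_mapping(): it only ever sees the callback. */
static void toy_create_mapping(toy_pgtable_alloc_fn pgtable_alloc)
{
	assert(pgtable_alloc != NULL);
	pgtable_alloc(TOY_TABLE_PUD);
	pgtable_alloc(TOY_TABLE_PMD);
	pgtable_alloc(TOY_TABLE_PTE);
}
```

Binding the mm in a wrapper keeps the walker functions untouched. The trade-off is that each new mm "flavour" needs its own wrapper.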

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/mm/mmu.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index eca324b3a6fc..51cfc891f6a1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -480,31 +480,22 @@ void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
 			     int flags);
 #endif
 
-static phys_addr_t __pgd_pgtable_alloc(enum pgtable_type pgtable_type)
+static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm,
+				       enum pgtable_type pgtable_type)
 {
 	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
-	void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL & ~__GFP_ZERO);
+	struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_ZERO, 0);
+	phys_addr_t pa;
 
-	BUG_ON(!ptr);
-	return __pa(ptr);
-}
-
-static phys_addr_t pgd_pgtable_alloc(enum pgtable_type pgtable_type)
-{
-	phys_addr_t pa = __pgd_pgtable_alloc(pgtable_type);
-	struct ptdesc *ptdesc = page_ptdesc(phys_to_page(pa));
+	BUG_ON(!ptdesc);
+	pa = page_to_phys(ptdesc_page(ptdesc));
 
-	/*
-	 * Call proper page table ctor in case later we need to
-	 * call core mm functions like apply_to_page_range() on
-	 * this pre-allocated page table.
-	 */
 	switch (pgtable_type) {
 	case TABLE_PTE:
-		BUG_ON(!pagetable_pte_ctor(NULL, ptdesc));
+		BUG_ON(!pagetable_pte_ctor(mm, ptdesc));
 		break;
 	case TABLE_PMD:
-		BUG_ON(!pagetable_pmd_ctor(NULL, ptdesc));
+		BUG_ON(!pagetable_pmd_ctor(mm, ptdesc));
 		break;
 	default:
 		break;
@@ -513,6 +504,18 @@ static phys_addr_t pgd_pgtable_alloc(enum pgtable_type pgtable_type)
 	return pa;
 }
 
+static phys_addr_t __maybe_unused
+pgd_pgtable_alloc_init_mm(enum pgtable_type pgtable_type)
+{
+	return __pgd_pgtable_alloc(&init_mm, pgtable_type);
+}
+
+static phys_addr_t
+pgd_pgtable_alloc_special_mm(enum pgtable_type pgtable_type)
+{
+	return __pgd_pgtable_alloc(NULL, pgtable_type);
+}
+
 /*
  * This function can only be used to modify existing table entries,
  * without allocating new levels of table. Note that this permits the
@@ -542,7 +545,7 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
 	__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
-			     pgd_pgtable_alloc, flags);
+			     pgd_pgtable_alloc_special_mm, flags);
 }
 
 static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
@@ -756,7 +759,7 @@ static int __init map_entry_trampoline(void)
 	memset(tramp_pg_dir, 0, PGD_SIZE);
 	__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS,
 			     entry_tramp_text_size(), prot,
-			     __pgd_pgtable_alloc, NO_BLOCK_MAPPINGS);
+			     pgd_pgtable_alloc_init_mm, NO_BLOCK_MAPPINGS);
 
 	/* Map both the text and data into the kernel page table */
 	for (i = 0; i < DIV_ROUND_UP(entry_tramp_text_size(), PAGE_SIZE); i++)
@@ -1362,7 +1365,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
 		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
 	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
-			     size, params->pgprot, __pgd_pgtable_alloc,
+			     size, params->pgprot, pgd_pgtable_alloc_init_mm,
 			     flags);
 
 	memblock_clear_nomap(start, size);
-- 
2.47.0



* [PATCH v2 10/12] riscv: mm: Clarify ctor mm argument in alloc_{pte,pmd}_late
From: Kevin Brodsky @ 2025-04-08  9:52 UTC

pagetable_{pte,pmd}_ctor(mm, ptdesc) skip the ptlock initialisation
if mm is &init_mm. To avoid unnecessary overhead, it is therefore
preferable to pass the actual mm associated with the PTE/PMD.

Unfortunately, this proves challenging for alloc_{pte,pmd}_late(), as
the associated mm is not available at the point where they are
called; in fact, not even top-level functions like
create_pgd_mapping() are passed the mm. As a result, they both call
the ctor with NULL as mm; this is safe but potentially wasteful.

This is not a new situation, but let's add a couple of comments to
clarify it.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/riscv/mm/init.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index e5ef693fc778..59a982f88908 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -442,6 +442,11 @@ static phys_addr_t __meminit alloc_pte_late(uintptr_t va)
 {
 	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
 
+	/*
+	 * We do not know which mm the PTE page is associated to at this point.
+	 * Passing NULL to the ctor is the safe option, though it may result
+	 * in unnecessary work (e.g. initialising the ptlock for init_mm).
+	 */
 	BUG_ON(!ptdesc || !pagetable_pte_ctor(NULL, ptdesc));
 	return __pa((pte_t *)ptdesc_address(ptdesc));
 }
@@ -522,6 +527,7 @@ static phys_addr_t __meminit alloc_pmd_late(uintptr_t va)
 {
 	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
 
+	/* See comment in alloc_pte_late() regarding NULL passed the ctor */
 	BUG_ON(!ptdesc || !pagetable_pmd_ctor(NULL, ptdesc));
 	return __pa((pmd_t *)ptdesc_address(ptdesc));
 }
-- 
2.47.0



* [PATCH v2 11/12] arm64: mm: Call PUD/P4D ctor in __create_pgd_mapping()
From: Kevin Brodsky @ 2025-04-08  9:52 UTC

Constructors for PUD/P4D-level pgtables were recently introduced.
They should be called for all pgtables; make sure they are called
for special kernel mappings created by __create_pgd_mapping() too.
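
Dropping the `default:` label makes the switch in __pgd_pgtable_alloc() exhaustive over enum pgtable_type. With GCC's -Wswitch (enabled by -Wall), an enumerator added later without a matching case is reported at build time. Below is a hypothetical model of the resulting dispatch. `toy_` names are invented, counters stand in for the ctors, and abort() stands in for BUG_ON(). As in the diff, only the PTE/PMD ctors report failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum toy_pgtable_type { TOY_TABLE_PTE, TOY_TABLE_PMD, TOY_TABLE_PUD, TOY_TABLE_P4D };

/* Per-level ctor call counters, indexed by enum toy_pgtable_type. */
static int toy_ctor_calls[TOY_TABLE_P4D + 1];

/* PTE/PMD ctors can fail (ptlock allocation); PUD/P4D ctors return nothing. */
static bool toy_pte_ctor(void) { toy_ctor_calls[TOY_TABLE_PTE]++; return true; }
static bool toy_pmd_ctor(void) { toy_ctor_calls[TOY_TABLE_PMD]++; return true; }
static void toy_pud_ctor(void) { toy_ctor_calls[TOY_TABLE_PUD]++; }
static void toy_p4d_ctor(void) { toy_ctor_calls[TOY_TABLE_P4D]++; }

static void toy_call_ctor(enum toy_pgtable_type type)
{
	/* No default label: -Wswitch reports any enumerator left unhandled. */
	switch (type) {
	case TOY_TABLE_PTE:
		if (!toy_pte_ctor())
			abort();	/* BUG_ON() in the kernel */
		break;
	case TOY_TABLE_PMD:
		if (!toy_pmd_ctor())
			abort();
		break;
	case TOY_TABLE_PUD:
		toy_pud_ctor();
		break;
	case TOY_TABLE_P4D:
		toy_p4d_ctor();
		break;
	}
}

static int toy_ctor_count(enum toy_pgtable_type type)
{
	assert((unsigned int)type <= TOY_TABLE_P4D);
	return toy_ctor_calls[type];
}
```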

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/mm/mmu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 51cfc891f6a1..8fcf59ba39db 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -497,7 +497,11 @@ static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm,
 	case TABLE_PMD:
 		BUG_ON(!pagetable_pmd_ctor(mm, ptdesc));
 		break;
-	default:
+	case TABLE_PUD:
+		pagetable_pud_ctor(ptdesc);
+		break;
+	case TABLE_P4D:
+		pagetable_p4d_ctor(ptdesc);
 		break;
 	}
 
-- 
2.47.0



* [PATCH v2 12/12] riscv: mm: Call PUD/P4D ctor in special kernel pgtable alloc
From: Kevin Brodsky @ 2025-04-08  9:52 UTC

Constructors for PUD/P4D-level pgtables were recently introduced.
They should be called for all pgtables; make sure they are called
for special kernel mappings created by create_pgd_mapping() too.

While at it, also switch to using pagetable_alloc(), as
alloc_{pte,pmd}_late() already do.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/riscv/mm/init.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 59a982f88908..8d0374d7ce8e 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -590,11 +590,11 @@ static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
 
 static phys_addr_t __meminit alloc_pud_late(uintptr_t va)
 {
-	unsigned long vaddr;
+	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
 
-	vaddr = __get_free_page(GFP_KERNEL);
-	BUG_ON(!vaddr);
-	return __pa(vaddr);
+	BUG_ON(!ptdesc);
+	pagetable_pud_ctor(ptdesc);
+	return __pa((pud_t *)ptdesc_address(ptdesc));
 }
 
 static p4d_t *__init get_p4d_virt_early(phys_addr_t pa)
@@ -628,11 +628,11 @@ static phys_addr_t __init alloc_p4d_fixmap(uintptr_t va)
 
 static phys_addr_t __meminit alloc_p4d_late(uintptr_t va)
 {
-	unsigned long vaddr;
+	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
 
-	vaddr = __get_free_page(GFP_KERNEL);
-	BUG_ON(!vaddr);
-	return __pa(vaddr);
+	BUG_ON(!ptdesc);
+	pagetable_p4d_ctor(ptdesc);
+	return __pa((p4d_t *)ptdesc_address(ptdesc));
 }
 
 static void __meminit create_pud_mapping(pud_t *pudp, uintptr_t va, phys_addr_t pa, phys_addr_t sz,
-- 
2.47.0

* Re: [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel()
  2025-04-08  9:52 ` [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel() Kevin Brodsky
@ 2025-04-08 15:22   ` Dave Hansen
  2025-04-08 16:37     ` Matthew Wilcox
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Hansen @ 2025-04-08 15:22 UTC (permalink / raw)
  To: Kevin Brodsky, linux-mm
  Cc: linux-kernel, Albert Ou, Andreas Larsson, Andrew Morton,
	Catalin Marinas, Dave Hansen, David S. Miller, Geert Uytterhoeven,
	Linus Walleij, Madhavan Srinivasan, Mark Rutland, Matthew Wilcox,
	Michael Ellerman, Mike Rapoport (IBM), Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, Qi Zheng, Ryan Roberts,
	Will Deacon, Yang Shi, linux-arch, linux-arm-kernel, linux-csky,
	linux-m68k, linux-openrisc, linux-riscv, linux-s390, linuxppc-dev,
	sparclinux, x86

On 4/8/25 02:52, Kevin Brodsky wrote:
> Page table pages are normally freed using the appropriate helper for
> the given page table level. On x86, pud_free_pmd_page() and
> pmd_free_pte_page() are an exception to the rule: they call
> free_page() directly.
> 
> Constructor/destructor calls are about to be introduced for kernel
> PTEs. To avoid missing dtor calls in those helpers, free the PTE
> pages using pte_free_kernel() instead of free_page().
> 
> While at it also use pmd_free() instead of calling pagetable_dtor()
> explicitly at the PMD level.

Looks sane and adding consistency is nice.

Are there any tests for folio_test_pgtable() at free_page() time? If we
had that, it would make it less likely that another free_page() user
could sneak in without calling the destructor.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

* Re: [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel()
  2025-04-08 15:22   ` Dave Hansen
@ 2025-04-08 16:37     ` Matthew Wilcox
  2025-04-08 16:54       ` Dave Hansen
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2025-04-08 16:37 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kevin Brodsky, linux-mm, linux-kernel, Albert Ou, Andreas Larsson,
	Andrew Morton, Catalin Marinas, Dave Hansen, David S. Miller,
	Geert Uytterhoeven, Linus Walleij, Madhavan Srinivasan,
	Mark Rutland, Michael Ellerman, Mike Rapoport (IBM),
	Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, Qi Zheng,
	Ryan Roberts, Will Deacon, Yang Shi, linux-arch, linux-arm-kernel,
	linux-csky, linux-m68k, linux-openrisc, linux-riscv, linux-s390,
	linuxppc-dev, sparclinux, x86

On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote:
> Are there any tests for folio_test_pgtable() at free_page() time? If we
> had that, it would make it less likely that another free_page() user
> could sneak in without calling the destructor.

It's hidden, but yes:

static inline bool page_expected_state(struct page *page,
                                        unsigned long check_flags)
{
        if (unlikely(atomic_read(&page->_mapcount) != -1))
                return false;

PageTable uses page_type which aliases with mapcount, so this check
covers "PageTable is still set when the last refcount to it is put".

I don't think we really use the page refcount when allocating/freeing
page tables.  Anyone want to try switching it over to using
alloc_frozen_pages() / free_frozen_pages()?  Might need to move that API
out of mm/internal.h ...
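The page_type/_mapcount aliasing described above can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel's layout: `struct fake_page`, `PAGE_TYPE_NONE_MODEL`, `PGTY_TABLE_MODEL` and the helper names are hypothetical, and the real encoding of page types in struct page differs in detail. It shows why a page that still has its table type set fails an "_mapcount != -1" style check when freed, and why running the destructor (which clears the type) first avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: the type field shares storage with the mapcount,
 * in the same way page_type aliases _mapcount in struct page.
 */
struct fake_page {
	union {
		int mapcount;		/* -1 means "not mapped, no type" */
		unsigned int page_type;
	};
};

/* Hypothetical encoding: all bits set means "no type" (mapcount == -1). */
#define PAGE_TYPE_NONE_MODEL	0xffffffffu
#define PGTY_TABLE_MODEL	0xf3000000u

/* Models the ctor setting the page table type. */
static void model_ctor(struct fake_page *p)
{
	p->page_type = PGTY_TABLE_MODEL;
}

/* Models the dtor clearing the page table type again. */
static void model_dtor(struct fake_page *p)
{
	p->page_type = PAGE_TYPE_NONE_MODEL;
}

/* Mirrors the first check in page_expected_state(). */
static bool expected_state(const struct fake_page *p)
{
	return p->mapcount == -1;
}
```

A page that goes through model_ctor() and is then "freed" without model_dtor() fails expected_state(), which is the situation that would end up in bad_page().
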

* Re: [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel()
  2025-04-08 16:37     ` Matthew Wilcox
@ 2025-04-08 16:54       ` Dave Hansen
  2025-04-08 17:40         ` Matthew Wilcox
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Hansen @ 2025-04-08 16:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kevin Brodsky, linux-mm, linux-kernel, Albert Ou, Andreas Larsson,
	Andrew Morton, Catalin Marinas, Dave Hansen, David S. Miller,
	Geert Uytterhoeven, Linus Walleij, Madhavan Srinivasan,
	Mark Rutland, Michael Ellerman, Mike Rapoport (IBM),
	Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, Qi Zheng,
	Ryan Roberts, Will Deacon, Yang Shi, linux-arch, linux-arm-kernel,
	linux-csky, linux-m68k, linux-openrisc, linux-riscv, linux-s390,
	linuxppc-dev, sparclinux, x86

On 4/8/25 09:37, Matthew Wilcox wrote:
> On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote:
>> Are there any tests for folio_test_pgtable() at free_page() time? If we
>> had that, it would make it less likely that another free_page() user
>> could sneak in without calling the destructor.
> It's hidden, but yes:
> 
> static inline bool page_expected_state(struct page *page,
>                                         unsigned long check_flags)
> {
>         if (unlikely(atomic_read(&page->_mapcount) != -1))
>                 return false;
> 
> PageTable uses page_type which aliases with mapcount, so this check
> covers "PageTable is still set when the last refcount to it is put".

Huh, so shouldn't we have ended up in bad_page() for these, other than:

        pagetable_dtor(virt_to_ptdesc(pmd));
        free_page((unsigned long)pmd);

?

* Re: [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel()
  2025-04-08 16:54       ` Dave Hansen
@ 2025-04-08 17:40         ` Matthew Wilcox
  2025-04-08 17:42           ` Dave Hansen
  2025-04-09 14:50           ` Kevin Brodsky
  0 siblings, 2 replies; 21+ messages in thread
From: Matthew Wilcox @ 2025-04-08 17:40 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kevin Brodsky, linux-mm, linux-kernel, Albert Ou, Andreas Larsson,
	Andrew Morton, Catalin Marinas, Dave Hansen, David S. Miller,
	Geert Uytterhoeven, Linus Walleij, Madhavan Srinivasan,
	Mark Rutland, Michael Ellerman, Mike Rapoport (IBM),
	Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, Qi Zheng,
	Ryan Roberts, Will Deacon, Yang Shi, linux-arch, linux-arm-kernel,
	linux-csky, linux-m68k, linux-openrisc, linux-riscv, linux-s390,
	linuxppc-dev, sparclinux, x86

On Tue, Apr 08, 2025 at 09:54:42AM -0700, Dave Hansen wrote:
> On 4/8/25 09:37, Matthew Wilcox wrote:
> > On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote:
> >> Are there any tests for folio_test_pgtable() at free_page() time? If we
> >> had that, it would make it less likely that another free_page() user
> >> could sneak in without calling the destructor.
> > It's hidden, but yes:
> > 
> > static inline bool page_expected_state(struct page *page,
> >                                         unsigned long check_flags)
> > {
> >         if (unlikely(atomic_read(&page->_mapcount) != -1))
> >                 return false;
> > 
> > PageTable uses page_type which aliases with mapcount, so this check
> > covers "PageTable is still set when the last refcount to it is put".
> 
> Huh, so shouldn't we have ended up in bad_page() for these, other than:
> 
>         pagetable_dtor(virt_to_ptdesc(pmd));
>         free_page((unsigned long)pmd);

I think at this point in Kevin's series, we don't call the ctor for
these pages, so we never set PageTable() on them.  I could be wrong;
as Kevin says, this is all very twisty and confusing with exceptions and
exceptions to exceptions.  This series should reduce the confusion.

* Re: [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel()
  2025-04-08 17:40         ` Matthew Wilcox
@ 2025-04-08 17:42           ` Dave Hansen
  2025-04-09 14:50           ` Kevin Brodsky
  1 sibling, 0 replies; 21+ messages in thread
From: Dave Hansen @ 2025-04-08 17:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kevin Brodsky, linux-mm, linux-kernel, Albert Ou, Andreas Larsson,
	Andrew Morton, Catalin Marinas, Dave Hansen, David S. Miller,
	Geert Uytterhoeven, Linus Walleij, Madhavan Srinivasan,
	Mark Rutland, Michael Ellerman, Mike Rapoport (IBM),
	Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, Qi Zheng,
	Ryan Roberts, Will Deacon, Yang Shi, linux-arch, linux-arm-kernel,
	linux-csky, linux-m68k, linux-openrisc, linux-riscv, linux-s390,
	linuxppc-dev, sparclinux, x86

On 4/8/25 10:40, Matthew Wilcox wrote:
> I think at this point in Kevin's series, we don't call the ctor for
> these pages, so we never set PageTable() on them.  I could be wrong;
> as Kevin says, this is all very twisty and confusing with exceptions and
> exceptions to exceptions.  This series should reduce the confusion.

Oh, for sure. I didn't mean to detract from the series as a whole. It
totally looks like a good idea!

* Re: [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel()
  2025-04-08 17:40         ` Matthew Wilcox
  2025-04-08 17:42           ` Dave Hansen
@ 2025-04-09 14:50           ` Kevin Brodsky
  1 sibling, 0 replies; 21+ messages in thread
From: Kevin Brodsky @ 2025-04-09 14:50 UTC (permalink / raw)
  To: Matthew Wilcox, Dave Hansen
  Cc: linux-mm, linux-kernel, Albert Ou, Andreas Larsson, Andrew Morton,
	Catalin Marinas, Dave Hansen, David S. Miller, Geert Uytterhoeven,
	Linus Walleij, Madhavan Srinivasan, Mark Rutland,
	Michael Ellerman, Mike Rapoport (IBM), Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, Qi Zheng, Ryan Roberts,
	Will Deacon, Yang Shi, linux-arch, linux-arm-kernel, linux-csky,
	linux-m68k, linux-openrisc, linux-riscv, linux-s390, linuxppc-dev,
	sparclinux, x86

On 08/04/2025 19:40, Matthew Wilcox wrote:
> On Tue, Apr 08, 2025 at 09:54:42AM -0700, Dave Hansen wrote:
>> On 4/8/25 09:37, Matthew Wilcox wrote:
>>> On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote:
>>>> Are there any tests for folio_test_pgtable() at free_page() time? If we
>>>> had that, it would make it less likely that another free_page() user
>>>> could sneak in without calling the destructor.
>>> It's hidden, but yes:
>>>
>>> static inline bool page_expected_state(struct page *page,
>>>                                         unsigned long check_flags)
>>> {
>>>         if (unlikely(atomic_read(&page->_mapcount) != -1))
>>>                 return false;
>>>
>>> PageTable uses page_type which aliases with mapcount, so this check
>>> covers "PageTable is still set when the last refcount to it is put".
>> Huh, so shouldn't we have ended up in bad_page() for these, other than:
>>
>>         pagetable_dtor(virt_to_ptdesc(pmd));
>>         free_page((unsigned long)pmd);
> I think at this point in Kevin's series, we don't call the ctor for
> these pages, so we never set PageTable() on them. I could be wrong;

Correct, that's why I added this patch early in the series (the next
patch adds the ctor call in pte_alloc_one_kernel()).

The BUG() in v1 was indeed triggered by a page_expected_state() check [1].

> as Kevin says, this is all very twisty and confusing with exceptions and
> exceptions to exceptions.  This series should reduce the confusion.

I hope so!

- Kevin

[1] https://lore.kernel.org/oe-lkp/202503211612.e11bd73f-lkp@intel.com/

* Re: [PATCH v2 01/12] mm: Pass mm down to pagetable_{pte,pmd}_ctor
  2025-04-08  9:52 ` [PATCH v2 01/12] mm: Pass mm down to pagetable_{pte,pmd}_ctor Kevin Brodsky
@ 2025-04-25 16:34   ` Alexander Gordeev
  0 siblings, 0 replies; 21+ messages in thread
From: Alexander Gordeev @ 2025-04-25 16:34 UTC (permalink / raw)
  To: Kevin Brodsky
  Cc: linux-mm, linux-kernel, Albert Ou, Andreas Larsson, Andrew Morton,
	Catalin Marinas, Dave Hansen, David S. Miller, Geert Uytterhoeven,
	Linus Walleij, Madhavan Srinivasan, Mark Rutland, Matthew Wilcox,
	Michael Ellerman, Mike Rapoport (IBM), Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, Qi Zheng, Ryan Roberts,
	Will Deacon, Yang Shi, linux-arch, linux-arm-kernel, linux-csky,
	linux-m68k, linux-openrisc, linux-riscv, linux-s390, linuxppc-dev,
	sparclinux, x86

On Tue, Apr 08, 2025 at 10:52:11AM +0100, Kevin Brodsky wrote:
> In preparation for calling constructors for all kernel page tables
> while eliding unnecessary ptlock initialisation, let's pass down the
> associated mm to the PTE/PMD level ctors. (These are the two levels
> where ptlocks are used.)
> 
> In most cases the mm is already around at the point of calling the
> ctor so we simply pass it down. This is however not the case for
> special page table allocators:
> 
> * arch/arm/mm/mmu.c
> * arch/arm64/mm/mmu.c
> * arch/riscv/mm/init.c
> 
> In those cases, the page tables being allocated are either for
> standard kernel memory (init_mm) or special page directories, which
> may not be associated with any mm. For now, let's pass NULL as mm; this
> will be refined where possible in future patches.
> 
> No functional change in this patch.
> 
> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
> ---
>  arch/arm/mm/mmu.c                        |  2 +-
>  arch/arm64/mm/mmu.c                      |  4 ++--
>  arch/loongarch/include/asm/pgalloc.h     |  2 +-
>  arch/m68k/include/asm/mcf_pgalloc.h      |  2 +-
>  arch/m68k/include/asm/motorola_pgalloc.h | 10 +++++-----
>  arch/m68k/mm/motorola.c                  |  6 +++---
>  arch/mips/include/asm/pgalloc.h          |  2 +-
>  arch/parisc/include/asm/pgalloc.h        |  2 +-
>  arch/powerpc/mm/book3s64/pgtable.c       |  2 +-
>  arch/powerpc/mm/pgtable-frag.c           |  2 +-
>  arch/riscv/mm/init.c                     |  4 ++--
>  arch/s390/include/asm/pgalloc.h          |  2 +-
>  arch/s390/mm/pgalloc.c                   |  2 +-
>  arch/sparc/mm/init_64.c                  |  2 +-
>  arch/sparc/mm/srmmu.c                    |  2 +-
>  arch/x86/mm/pgtable.c                    |  2 +-
>  include/asm-generic/pgalloc.h            |  4 ++--
>  include/linux/mm.h                       |  6 ++++--
>  18 files changed, 30 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index f02f872ea8a9..edb7f56b7c91 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -735,7 +735,7 @@ static void *__init late_alloc(unsigned long sz)
>  	void *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM,
>  			get_order(sz));
>  
> -	if (!ptdesc || !pagetable_pte_ctor(ptdesc))
> +	if (!ptdesc || !pagetable_pte_ctor(NULL, ptdesc))
>  		BUG();
>  	return ptdesc_to_virt(ptdesc);
>  }
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ea6695d53fb9..8c5c471cfb06 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -494,9 +494,9 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
>  	 * folded, and if so pagetable_pte_ctor() becomes nop.
>  	 */
>  	if (shift == PAGE_SHIFT)
> -		BUG_ON(!pagetable_pte_ctor(ptdesc));
> +		BUG_ON(!pagetable_pte_ctor(NULL, ptdesc));
>  	else if (shift == PMD_SHIFT)
> -		BUG_ON(!pagetable_pmd_ctor(ptdesc));
> +		BUG_ON(!pagetable_pmd_ctor(NULL, ptdesc));
>  
>  	return pa;
>  }
> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
> index b58f587f0f0a..1c63a9d9a6d3 100644
> --- a/arch/loongarch/include/asm/pgalloc.h
> +++ b/arch/loongarch/include/asm/pgalloc.h
> @@ -69,7 +69,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
>  	if (!ptdesc)
>  		return NULL;
>  
> -	if (!pagetable_pmd_ctor(ptdesc)) {
> +	if (!pagetable_pmd_ctor(mm, ptdesc)) {
>  		pagetable_free(ptdesc);
>  		return NULL;
>  	}
> diff --git a/arch/m68k/include/asm/mcf_pgalloc.h b/arch/m68k/include/asm/mcf_pgalloc.h
> index 4c648b51e7fd..465a71101b7d 100644
> --- a/arch/m68k/include/asm/mcf_pgalloc.h
> +++ b/arch/m68k/include/asm/mcf_pgalloc.h
> @@ -48,7 +48,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
>  
>  	if (!ptdesc)
>  		return NULL;
> -	if (!pagetable_pte_ctor(ptdesc)) {
> +	if (!pagetable_pte_ctor(mm, ptdesc)) {
>  		pagetable_free(ptdesc);
>  		return NULL;
>  	}
> diff --git a/arch/m68k/include/asm/motorola_pgalloc.h b/arch/m68k/include/asm/motorola_pgalloc.h
> index 5abe7da8ac5a..1091fb0affbe 100644
> --- a/arch/m68k/include/asm/motorola_pgalloc.h
> +++ b/arch/m68k/include/asm/motorola_pgalloc.h
> @@ -15,7 +15,7 @@ enum m68k_table_types {
>  };
>  
>  extern void init_pointer_table(void *table, int type);
> -extern void *get_pointer_table(int type);
> +extern void *get_pointer_table(struct mm_struct *mm, int type);
>  extern int free_pointer_table(void *table, int type);
>  
>  /*
> @@ -26,7 +26,7 @@ extern int free_pointer_table(void *table, int type);
>  
>  static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>  {
> -	return get_pointer_table(TABLE_PTE);
> +	return get_pointer_table(mm, TABLE_PTE);
>  }
>  
>  static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
> @@ -36,7 +36,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  
>  static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
>  {
> -	return get_pointer_table(TABLE_PTE);
> +	return get_pointer_table(mm, TABLE_PTE);
>  }
>  
>  static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
> @@ -53,7 +53,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pgtable,
>  
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
>  {
> -	return get_pointer_table(TABLE_PMD);
> +	return get_pointer_table(mm, TABLE_PMD);
>  }
>  
>  static inline int pmd_free(struct mm_struct *mm, pmd_t *pmd)
> @@ -75,7 +75,7 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
>  
>  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
> -	return get_pointer_table(TABLE_PGD);
> +	return get_pointer_table(mm, TABLE_PGD);
>  }
>  
>  
> diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
> index 73651e093c4d..6ab3ef39ba7a 100644
> --- a/arch/m68k/mm/motorola.c
> +++ b/arch/m68k/mm/motorola.c
> @@ -139,7 +139,7 @@ void __init init_pointer_table(void *table, int type)
>  	return;
>  }
>  
> -void *get_pointer_table(int type)
> +void *get_pointer_table(struct mm_struct *mm, int type)
>  {
>  	ptable_desc *dp = ptable_list[type].next;
>  	unsigned int mask = list_empty(&ptable_list[type]) ? 0 : PD_MARKBITS(dp);
> @@ -164,10 +164,10 @@ void *get_pointer_table(int type)
>  			 * m68k doesn't have SPLIT_PTE_PTLOCKS for not having
>  			 * SMP.
>  			 */
> -			pagetable_pte_ctor(virt_to_ptdesc(page));
> +			pagetable_pte_ctor(mm, virt_to_ptdesc(page));
>  			break;
>  		case TABLE_PMD:
> -			pagetable_pmd_ctor(virt_to_ptdesc(page));
> +			pagetable_pmd_ctor(mm, virt_to_ptdesc(page));
>  			break;
>  		case TABLE_PGD:
>  			pagetable_pgd_ctor(virt_to_ptdesc(page));
> diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
> index bbca420c96d3..942af87f1cdd 100644
> --- a/arch/mips/include/asm/pgalloc.h
> +++ b/arch/mips/include/asm/pgalloc.h
> @@ -62,7 +62,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
>  	if (!ptdesc)
>  		return NULL;
>  
> -	if (!pagetable_pmd_ctor(ptdesc)) {
> +	if (!pagetable_pmd_ctor(mm, ptdesc)) {
>  		pagetable_free(ptdesc);
>  		return NULL;
>  	}
> diff --git a/arch/parisc/include/asm/pgalloc.h b/arch/parisc/include/asm/pgalloc.h
> index 2ca74a56415c..3b84ee93edaa 100644
> --- a/arch/parisc/include/asm/pgalloc.h
> +++ b/arch/parisc/include/asm/pgalloc.h
> @@ -39,7 +39,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
>  	ptdesc = pagetable_alloc(gfp, PMD_TABLE_ORDER);
>  	if (!ptdesc)
>  		return NULL;
> -	if (!pagetable_pmd_ctor(ptdesc)) {
> +	if (!pagetable_pmd_ctor(mm, ptdesc)) {
>  		pagetable_free(ptdesc);
>  		return NULL;
>  	}
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 8f7d41ce2ca1..a282233c8785 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -422,7 +422,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
>  	ptdesc = pagetable_alloc(gfp, 0);
>  	if (!ptdesc)
>  		return NULL;
> -	if (!pagetable_pmd_ctor(ptdesc)) {
> +	if (!pagetable_pmd_ctor(mm, ptdesc)) {
>  		pagetable_free(ptdesc);
>  		return NULL;
>  	}
> diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
> index 713268ccb1a0..387e9b1fe12c 100644
> --- a/arch/powerpc/mm/pgtable-frag.c
> +++ b/arch/powerpc/mm/pgtable-frag.c
> @@ -61,7 +61,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
>  		ptdesc = pagetable_alloc(PGALLOC_GFP | __GFP_ACCOUNT, 0);
>  		if (!ptdesc)
>  			return NULL;
> -		if (!pagetable_pte_ctor(ptdesc)) {
> +		if (!pagetable_pte_ctor(mm, ptdesc)) {
>  			pagetable_free(ptdesc);
>  			return NULL;
>  		}
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index ab475ec6ca42..e5ef693fc778 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -442,7 +442,7 @@ static phys_addr_t __meminit alloc_pte_late(uintptr_t va)
>  {
>  	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
>  
> -	BUG_ON(!ptdesc || !pagetable_pte_ctor(ptdesc));
> +	BUG_ON(!ptdesc || !pagetable_pte_ctor(NULL, ptdesc));
>  	return __pa((pte_t *)ptdesc_address(ptdesc));
>  }
>  
> @@ -522,7 +522,7 @@ static phys_addr_t __meminit alloc_pmd_late(uintptr_t va)
>  {
>  	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
>  
> -	BUG_ON(!ptdesc || !pagetable_pmd_ctor(ptdesc));
> +	BUG_ON(!ptdesc || !pagetable_pmd_ctor(NULL, ptdesc));
>  	return __pa((pmd_t *)ptdesc_address(ptdesc));
>  }
>  
> diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
> index 005497ffebda..5345398df653 100644
> --- a/arch/s390/include/asm/pgalloc.h
> +++ b/arch/s390/include/asm/pgalloc.h
> @@ -97,7 +97,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long vmaddr)
>  	if (!table)
>  		return NULL;
>  	crst_table_init(table, _SEGMENT_ENTRY_EMPTY);
> -	if (!pagetable_pmd_ctor(virt_to_ptdesc(table))) {
> +	if (!pagetable_pmd_ctor(mm, virt_to_ptdesc(table))) {
>  		crst_table_free(mm, table);
>  		return NULL;
>  	}
> diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
> index e3a6f8ae156c..619d6917e3b7 100644
> --- a/arch/s390/mm/pgalloc.c
> +++ b/arch/s390/mm/pgalloc.c
> @@ -145,7 +145,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>  	ptdesc = pagetable_alloc(GFP_KERNEL, 0);
>  	if (!ptdesc)
>  		return NULL;
> -	if (!pagetable_pte_ctor(ptdesc)) {
> +	if (!pagetable_pte_ctor(mm, ptdesc)) {
>  		pagetable_free(ptdesc);
>  		return NULL;
>  	}
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index 760818950464..5c8eabda1d17 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -2895,7 +2895,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
>  
>  	if (!ptdesc)
>  		return NULL;
> -	if (!pagetable_pte_ctor(ptdesc)) {
> +	if (!pagetable_pte_ctor(mm, ptdesc)) {
>  		pagetable_free(ptdesc);
>  		return NULL;
>  	}
> diff --git a/arch/sparc/mm/srmmu.c b/arch/sparc/mm/srmmu.c
> index dd32711022f5..f8fb4911d360 100644
> --- a/arch/sparc/mm/srmmu.c
> +++ b/arch/sparc/mm/srmmu.c
> @@ -350,7 +350,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
>  	page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
>  	spin_lock(&mm->page_table_lock);
>  	if (page_ref_inc_return(page) == 2 &&
> -			!pagetable_pte_ctor(page_ptdesc(page))) {
> +			!pagetable_pte_ctor(mm, page_ptdesc(page))) {
>  		page_ref_dec(page);
>  		ptep = NULL;
>  	}
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index a05fcddfc811..7930f234c5f6 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -205,7 +205,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
>  
>  		if (!ptdesc)
>  			failed = true;
> -		if (ptdesc && !pagetable_pmd_ctor(ptdesc)) {
> +		if (ptdesc && !pagetable_pmd_ctor(mm, ptdesc)) {
>  			pagetable_free(ptdesc);
>  			ptdesc = NULL;
>  			failed = true;
> diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
> index 892ece4558a2..e164ca66f0f6 100644
> --- a/include/asm-generic/pgalloc.h
> +++ b/include/asm-generic/pgalloc.h
> @@ -70,7 +70,7 @@ static inline pgtable_t __pte_alloc_one_noprof(struct mm_struct *mm, gfp_t gfp)
>  	ptdesc = pagetable_alloc_noprof(gfp, 0);
>  	if (!ptdesc)
>  		return NULL;
> -	if (!pagetable_pte_ctor(ptdesc)) {
> +	if (!pagetable_pte_ctor(mm, ptdesc)) {
>  		pagetable_free(ptdesc);
>  		return NULL;
>  	}
> @@ -137,7 +137,7 @@ static inline pmd_t *pmd_alloc_one_noprof(struct mm_struct *mm, unsigned long ad
>  	ptdesc = pagetable_alloc_noprof(gfp, 0);
>  	if (!ptdesc)
>  		return NULL;
> -	if (!pagetable_pmd_ctor(ptdesc)) {
> +	if (!pagetable_pmd_ctor(mm, ptdesc)) {
>  		pagetable_free(ptdesc);
>  		return NULL;
>  	}
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b7f13f087954..f9b793cce2c1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3100,7 +3100,8 @@ static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
>  	pagetable_free(ptdesc);
>  }
>  
> -static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
> +static inline bool pagetable_pte_ctor(struct mm_struct *mm,
> +				      struct ptdesc *ptdesc)
>  {
>  	if (!ptlock_init(ptdesc))
>  		return false;
> @@ -3206,7 +3207,8 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
>  	return ptl;
>  }
>  
> -static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
> +static inline bool pagetable_pmd_ctor(struct mm_struct *mm,
> +				      struct ptdesc *ptdesc)
>  {
>  	if (!pmd_ptlock_init(ptdesc))
>  		return false;

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>	# s390

* Re: [PATCH v2 03/12] mm: Call ctor/dtor for kernel PTEs
  2025-04-08  9:52 ` [PATCH v2 03/12] mm: Call ctor/dtor for kernel PTEs Kevin Brodsky
@ 2025-04-25 16:35   ` Alexander Gordeev
  0 siblings, 0 replies; 21+ messages in thread
From: Alexander Gordeev @ 2025-04-25 16:35 UTC (permalink / raw)
  To: Kevin Brodsky
  Cc: linux-mm, linux-kernel, Albert Ou, Andreas Larsson, Andrew Morton,
	Catalin Marinas, Dave Hansen, David S. Miller, Geert Uytterhoeven,
	Linus Walleij, Madhavan Srinivasan, Mark Rutland, Matthew Wilcox,
	Michael Ellerman, Mike Rapoport (IBM), Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, Qi Zheng, Ryan Roberts,
	Will Deacon, Yang Shi, linux-arch, linux-arm-kernel, linux-csky,
	linux-m68k, linux-openrisc, linux-riscv, linux-s390, linuxppc-dev,
	sparclinux, x86

On Tue, Apr 08, 2025 at 10:52:13AM +0100, Kevin Brodsky wrote:
> Since [1], constructors/destructors are expected to be called for
> all page table pages, at all levels and for both user and kernel
> pgtables. There is however one glaring exception: kernel PTEs are
> managed via separate helpers (pte_alloc_kernel/pte_free_kernel),
> which do not call the [cd]tor, at least not in the generic
> implementation.
> 
> The most obvious reason for this anomaly is that init_mm is
> special-cased not to use split page table locks. As a result calling
> ptlock_init() for PTEs associated with init_mm would be wasteful,
> potentially resulting in dynamic memory allocation. However, pgtable
> [cd]tors perform other actions - currently related to
> accounting/statistics, and potentially more functionally significant
> in the future.
> 
> Now that pagetable_pte_ctor() is passed the associated mm, we can
> make it skip the call to ptlock_init() for init_mm; this allows us
> to call the ctor from pte_alloc_one_kernel() too. This is matched by
> a call to the pgtable destructor in pte_free_kernel(); no
> special-casing is needed on that path, as ptlock_free() is already
> called unconditionally. (ptlock_free() is a no-op unless a ptlock
> was allocated for the given PTP.)
> 
> This patch ensures that all architectures that rely on
> <asm-generic/pgalloc.h> call the [cd]tor for kernel PTEs.
> pte_free_kernel() cannot be overridden so changing the generic
> implementation is sufficient. pte_alloc_one_kernel() can be
> overridden using __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL, and a few
> architectures implement it by calling the page allocator directly.
> We amend those so that they call the generic
> __pte_alloc_one_kernel() instead, if possible, ensuring that the
> ctor is called.
> 
> A few architectures do not use <asm-generic/pgalloc.h>; those will
> be taken care of separately.
> 
> [1] https://lore.kernel.org/linux-mm/20250103184415.2744423-1-kevin.brodsky@arm.com/
> 
> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
> ---
>  arch/csky/include/asm/pgalloc.h | 2 +-
>  arch/microblaze/mm/pgtable.c    | 2 +-
>  arch/openrisc/mm/ioremap.c      | 2 +-
>  include/asm-generic/pgalloc.h   | 7 ++++++-
>  include/linux/mm.h              | 2 +-
>  5 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
> index 11055c574968..9ed2b15ffd94 100644
> --- a/arch/csky/include/asm/pgalloc.h
> +++ b/arch/csky/include/asm/pgalloc.h
> @@ -29,7 +29,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>  	pte_t *pte;
>  	unsigned long i;
>  
> -	pte = (pte_t *) __get_free_page(GFP_KERNEL);
> +	pte = __pte_alloc_one_kernel(mm);
>  	if (!pte)
>  		return NULL;
>  
> diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
> index 9f73265aad4e..e96dd1b7aba4 100644
> --- a/arch/microblaze/mm/pgtable.c
> +++ b/arch/microblaze/mm/pgtable.c
> @@ -245,7 +245,7 @@ unsigned long iopa(unsigned long addr)
>  __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>  {
>  	if (mem_init_done)
> -		return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> +		return __pte_alloc_one_kernel(mm);
>  	else
>  		return memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
>  					      MEMBLOCK_LOW_LIMIT,
> diff --git a/arch/openrisc/mm/ioremap.c b/arch/openrisc/mm/ioremap.c
> index 8e63e86251ca..3b352f97fecb 100644
> --- a/arch/openrisc/mm/ioremap.c
> +++ b/arch/openrisc/mm/ioremap.c
> @@ -36,7 +36,7 @@ pte_t __ref *pte_alloc_one_kernel(struct mm_struct *mm)
>  	pte_t *pte;
>  
>  	if (likely(mem_init_done)) {
> -		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
> +		pte = __pte_alloc_one_kernel(mm);
>  	} else {
>  		pte = memblock_alloc_or_panic(PAGE_SIZE, PAGE_SIZE);
>  	}
> diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
> index e164ca66f0f6..3c8ec3bfea44 100644
> --- a/include/asm-generic/pgalloc.h
> +++ b/include/asm-generic/pgalloc.h
> @@ -23,6 +23,11 @@ static inline pte_t *__pte_alloc_one_kernel_noprof(struct mm_struct *mm)
>  
>  	if (!ptdesc)
>  		return NULL;
> +	if (!pagetable_pte_ctor(mm, ptdesc)) {
> +		pagetable_free(ptdesc);
> +		return NULL;
> +	}
> +
>  	return ptdesc_address(ptdesc);
>  }
>  #define __pte_alloc_one_kernel(...)	alloc_hooks(__pte_alloc_one_kernel_noprof(__VA_ARGS__))
> @@ -48,7 +53,7 @@ static inline pte_t *pte_alloc_one_kernel_noprof(struct mm_struct *mm)
>   */
>  static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  {
> -	pagetable_free(virt_to_ptdesc(pte));
> +	pagetable_dtor_free(virt_to_ptdesc(pte));
>  }
>  
>  /**
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f9b793cce2c1..3f48e449574a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3103,7 +3103,7 @@ static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
>  static inline bool pagetable_pte_ctor(struct mm_struct *mm,
>  				      struct ptdesc *ptdesc)
>  {
> -	if (!ptlock_init(ptdesc))
> +	if (mm != &init_mm && !ptlock_init(ptdesc))
>  		return false;
>  	__pagetable_ctor(ptdesc);
>  	return true;

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390


end of thread, other threads:[~2025-04-25 16:36 UTC | newest]

Thread overview: 21+ messages
2025-04-08  9:52 [PATCH v2 00/12] Always call constructor for kernel page tables Kevin Brodsky
2025-04-08  9:52 ` [PATCH v2 01/12] mm: Pass mm down to pagetable_{pte,pmd}_ctor Kevin Brodsky
2025-04-25 16:34   ` Alexander Gordeev
2025-04-08  9:52 ` [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel() Kevin Brodsky
2025-04-08 15:22   ` Dave Hansen
2025-04-08 16:37     ` Matthew Wilcox
2025-04-08 16:54       ` Dave Hansen
2025-04-08 17:40         ` Matthew Wilcox
2025-04-08 17:42           ` Dave Hansen
2025-04-09 14:50           ` Kevin Brodsky
2025-04-08  9:52 ` [PATCH v2 03/12] mm: Call ctor/dtor for kernel PTEs Kevin Brodsky
2025-04-25 16:35   ` Alexander Gordeev
2025-04-08  9:52 ` [PATCH v2 04/12] m68k: " Kevin Brodsky
2025-04-08  9:52 ` [PATCH v2 05/12] powerpc: " Kevin Brodsky
2025-04-08  9:52 ` [PATCH v2 06/12] sparc64: " Kevin Brodsky
2025-04-08  9:52 ` [PATCH v2 07/12] mm: Skip ptlock_init() for kernel PMDs Kevin Brodsky
2025-04-08  9:52 ` [PATCH v2 08/12] arm64: mm: Use enum to identify pgtable level instead of *_SHIFT Kevin Brodsky
2025-04-08  9:52 ` [PATCH v2 09/12] arm64: mm: Always call PTE/PMD ctor in __create_pgd_mapping() Kevin Brodsky
2025-04-08  9:52 ` [PATCH v2 10/12] riscv: mm: Clarify ctor mm argument in alloc_{pte,pmd}_late Kevin Brodsky
2025-04-08  9:52 ` [PATCH v2 11/12] arm64: mm: Call PUD/P4D ctor in __create_pgd_mapping() Kevin Brodsky
2025-04-08  9:52 ` [PATCH v2 12/12] riscv: mm: Call PUD/P4D ctor in special kernel pgtable alloc Kevin Brodsky