linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v3 0/2]  fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64
@ 2025-12-18 19:47 Yeoreum Yun
  2025-12-18 19:47 ` [PATCH v3 1/2] arm64: mmu: avoid allocating pages while splitting the linear mapping Yeoreum Yun
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Yeoreum Yun @ 2025-12-18 19:47 UTC (permalink / raw)
  To: catalin.marinas, will, ryan.roberts, akpm, david, kevin.brodsky,
	quic_zhenhuah, dev.jain, yang, chaitanyas.prakash, bigeasy,
	clrkwllms, rostedt, lorenzo.stoakes, ardb, jackmanb, vbabka,
	mhocko
  Cc: linux-arm-kernel, linux-kernel, linux-rt-devel, Yeoreum Yun

Under PREEMPT_RT, calling the generic memory allocation/free APIs
(e.g. __get_free_pages(), pgtable_alloc(), free_pages(), etc.)
with preemption disabled is not allowed; only the *_nolock() variants may
be used there. The generic APIs may acquire a spin lock that becomes
sleepable on RT, potentially causing a sleep during page allocation
(see Documentation/core-api/real-time/differences.rst, "Memory allocation" section).
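
For illustration, a minimal sketch of the forbidden pattern (hypothetical
function names, not taken from the patches):

/*
 * The callback runs on the stopper thread with preemption disabled on
 * every CPU.  On PREEMPT_RT even a GFP_ATOMIC allocation may take a
 * spinlock_t inside the page allocator, and spinlock_t is a sleeping
 * lock there, so the allocation can sleep in a non-preemptible context.
 */
static int bad_stop_machine_cb(void *unused)
{
	unsigned long page;

	/* Not allowed on PREEMPT_RT: the page allocator may sleep here. */
	page = __get_free_pages(GFP_ATOMIC | __GFP_ZERO, 0);
	if (!page)
		return -ENOMEM;

	/* ... use the temporary page ... */

	free_pages(page, 0);
	return 0;
}

static void __init do_split(void)
{
	stop_machine(bad_stop_machine_cb, NULL, cpu_online_mask);
}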

However, on arm64, __linear_map_split_to_ptes() and
__kpti_install_ng_mappings(), which are called by the stopper thread via
stop_machine(), use the generic memory allocation/free APIs.

This patchset fixes this problem and is based on v6.19-rc1.

Patch History
==============
from v2 to v3:
  - remove split-mode and split_args;
    pass the proper function pointer while splitting.
  - rename functions.
  - https://lore.kernel.org/all/20251217182007.2345700-1-yeoreum.yun@arm.com/

from v1 to v2:
  - drop pagetable_alloc_nolock()
  - following @Ryan Roberts' suggestion.
  - https://lore.kernel.org/all/20251212161832.2067134-1-yeoreum.yun@arm.com/



Yeoreum Yun (2):
  arm64: mmu: avoid allocating pages while splitting the linear mapping
  arm64: mmu: avoid allocating pages while installing ng-mapping for
    KPTI

 arch/arm64/mm/mmu.c | 254 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 197 insertions(+), 57 deletions(-)

--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}




* [PATCH v3 1/2] arm64: mmu: avoid allocating pages while splitting the linear mapping
  2025-12-18 19:47 [PATCH v3 0/2] fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64 Yeoreum Yun
@ 2025-12-18 19:47 ` Yeoreum Yun
  2026-01-02 11:03   ` Ryan Roberts
  2025-12-18 19:47 ` [PATCH v3 2/2] arm64: mmu: avoid allocating pages while installing ng-mapping for KPTI Yeoreum Yun
  2025-12-31 10:07 ` [PATCH v3 0/2] fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64 Yeoreum Yun
  2 siblings, 1 reply; 11+ messages in thread
From: Yeoreum Yun @ 2025-12-18 19:47 UTC (permalink / raw)
  To: catalin.marinas, will, ryan.roberts, akpm, david, kevin.brodsky,
	quic_zhenhuah, dev.jain, yang, chaitanyas.prakash, bigeasy,
	clrkwllms, rostedt, lorenzo.stoakes, ardb, jackmanb, vbabka,
	mhocko
  Cc: linux-arm-kernel, linux-kernel, linux-rt-devel, Yeoreum Yun

linear_map_split_to_ptes() currently allocates page tables while
splitting the linear mapping into PTEs under stop_machine() using GFP_ATOMIC.

This is fine for non-PREEMPT_RT configurations.
However, it becomes problematic on PREEMPT_RT, because
generic memory allocation/free APIs (e.g. pgtable_alloc(), __get_free_pages(), etc.)
cannot be called from a non-preemptible context, except for the _nolock() variants.
This is because generic memory allocation/free paths are sleepable,
as they rely on spin_lock(), which becomes sleepable on PREEMPT_RT.

In other words, even calling pgtable_alloc() with GFP_ATOMIC is not permitted
in __linear_map_split_to_pte() when it is executed by the stopper thread,
where preemption is disabled on PREEMPT_RT.

To address this, the required number of page tables is first collected
and preallocated, and the preallocated page tables are then used
when splitting the linear mapping in __linear_map_split_to_pte().
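
In outline, the consumer side looks like the sketch below (simplified; the
names match the diff that follows, but error handling and the ctor call are
omitted):

static struct ptdesc **split_pgtables;
static unsigned long split_pgtables_count;
static unsigned long split_pgtables_idx;

/* Called from the stop_machine() callback: only consumes, never allocates. */
static phys_addr_t pgd_pgtable_get_preallocated(enum pgtable_type pgtable_type)
{
	if (WARN_ON(split_pgtables_idx >= split_pgtables_count))
		return INVALID_PHYS_ADDR;

	return page_to_phys(ptdesc_page(split_pgtables[split_pgtables_idx++]));
}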

Fixes: 3df6979d222b ("arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs")
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
 arch/arm64/mm/mmu.c | 232 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 184 insertions(+), 48 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9ae7ce00a7ef..96a9fa505e71 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -527,18 +527,15 @@ static void early_create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 		panic("Failed to create page tables\n");
 }
 
-static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,
-				       enum pgtable_type pgtable_type)
-{
-	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
-	struct ptdesc *ptdesc = pagetable_alloc(gfp & ~__GFP_ZERO, 0);
-	phys_addr_t pa;
-
-	if (!ptdesc)
-		return INVALID_PHYS_ADDR;
-
-	pa = page_to_phys(ptdesc_page(ptdesc));
+static struct ptdesc **split_pgtables;
+static int split_pgtables_order;
+static unsigned long split_pgtables_count;
+static unsigned long split_pgtables_idx;
 
+static __always_inline void __pgd_pgtable_init(struct mm_struct *mm,
+					       struct ptdesc *ptdesc,
+					       enum pgtable_type pgtable_type)
+{
 	switch (pgtable_type) {
 	case TABLE_PTE:
 		BUG_ON(!pagetable_pte_ctor(mm, ptdesc));
@@ -554,19 +551,28 @@ static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,
 		break;
 	}
 
-	return pa;
 }
 
-static phys_addr_t
-pgd_pgtable_alloc_init_mm_gfp(enum pgtable_type pgtable_type, gfp_t gfp)
+static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,
+				       enum pgtable_type pgtable_type)
 {
-	return __pgd_pgtable_alloc(&init_mm, gfp, pgtable_type);
+	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
+	struct ptdesc *ptdesc = pagetable_alloc(gfp & ~__GFP_ZERO, 0);
+
+	if (!ptdesc)
+		return INVALID_PHYS_ADDR;
+
+	__pgd_pgtable_init(mm, ptdesc, pgtable_type);
+
+	return page_to_phys(ptdesc_page(ptdesc));
 }
 
+typedef phys_addr_t (split_pgtable_alloc_fn)(enum pgtable_type);
+
 static phys_addr_t __maybe_unused
 pgd_pgtable_alloc_init_mm(enum pgtable_type pgtable_type)
 {
-	return pgd_pgtable_alloc_init_mm_gfp(pgtable_type, GFP_PGTABLE_KERNEL);
+	return __pgd_pgtable_alloc(&init_mm, GFP_PGTABLE_KERNEL, pgtable_type);
 }
 
 static phys_addr_t
@@ -575,6 +581,23 @@ pgd_pgtable_alloc_special_mm(enum pgtable_type pgtable_type)
 	return  __pgd_pgtable_alloc(NULL, GFP_PGTABLE_KERNEL, pgtable_type);
 }
 
+static phys_addr_t
+pgd_pgtable_get_preallocated(enum pgtable_type pgtable_type)
+{
+	struct ptdesc *ptdesc;
+
+	if (WARN_ON(split_pgtables_idx >= split_pgtables_count))
+		return INVALID_PHYS_ADDR;
+
+	ptdesc = split_pgtables[split_pgtables_idx++];
+	if (!ptdesc)
+		return INVALID_PHYS_ADDR;
+
+	__pgd_pgtable_init(&init_mm, ptdesc, pgtable_type);
+
+	return page_to_phys(ptdesc_page(ptdesc));
+}
+
 static void split_contpte(pte_t *ptep)
 {
 	int i;
@@ -584,7 +607,9 @@ static void split_contpte(pte_t *ptep)
 		__set_pte(ptep, pte_mknoncont(__ptep_get(ptep)));
 }
 
-static int split_pmd(pmd_t *pmdp, pmd_t pmd, gfp_t gfp, bool to_cont)
+static int split_pmd(pmd_t *pmdp, pmd_t pmd,
+		     split_pgtable_alloc_fn *pgtable_alloc_fn,
+		     bool to_cont)
 {
 	pmdval_t tableprot = PMD_TYPE_TABLE | PMD_TABLE_UXN | PMD_TABLE_AF;
 	unsigned long pfn = pmd_pfn(pmd);
@@ -593,7 +618,7 @@ static int split_pmd(pmd_t *pmdp, pmd_t pmd, gfp_t gfp, bool to_cont)
 	pte_t *ptep;
 	int i;
 
-	pte_phys = pgd_pgtable_alloc_init_mm_gfp(TABLE_PTE, gfp);
+	pte_phys = pgtable_alloc_fn(TABLE_PTE);
 	if (pte_phys == INVALID_PHYS_ADDR)
 		return -ENOMEM;
 	ptep = (pte_t *)phys_to_virt(pte_phys);
@@ -628,7 +653,9 @@ static void split_contpmd(pmd_t *pmdp)
 		set_pmd(pmdp, pmd_mknoncont(pmdp_get(pmdp)));
 }
 
-static int split_pud(pud_t *pudp, pud_t pud, gfp_t gfp, bool to_cont)
+static int split_pud(pud_t *pudp, pud_t pud,
+		     split_pgtable_alloc_fn *pgtable_alloc_fn,
+		     bool to_cont)
 {
 	pudval_t tableprot = PUD_TYPE_TABLE | PUD_TABLE_UXN | PUD_TABLE_AF;
 	unsigned int step = PMD_SIZE >> PAGE_SHIFT;
@@ -638,7 +665,7 @@ static int split_pud(pud_t *pudp, pud_t pud, gfp_t gfp, bool to_cont)
 	pmd_t *pmdp;
 	int i;
 
-	pmd_phys = pgd_pgtable_alloc_init_mm_gfp(TABLE_PMD, gfp);
+	pmd_phys = pgtable_alloc_fn(TABLE_PMD);
 	if (pmd_phys == INVALID_PHYS_ADDR)
 		return -ENOMEM;
 	pmdp = (pmd_t *)phys_to_virt(pmd_phys);
@@ -707,7 +734,7 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
 	if (!pud_present(pud))
 		goto out;
 	if (pud_leaf(pud)) {
-		ret = split_pud(pudp, pud, GFP_PGTABLE_KERNEL, true);
+		ret = split_pud(pudp, pud, pgd_pgtable_alloc_init_mm, true);
 		if (ret)
 			goto out;
 	}
@@ -732,7 +759,7 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
 		 */
 		if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
 			goto out;
-		ret = split_pmd(pmdp, pmd, GFP_PGTABLE_KERNEL, true);
+		ret = split_pmd(pmdp, pmd, pgd_pgtable_alloc_init_mm, true);
 		if (ret)
 			goto out;
 	}
@@ -831,34 +858,35 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
 static int split_to_ptes_pud_entry(pud_t *pudp, unsigned long addr,
 				   unsigned long next, struct mm_walk *walk)
 {
-	gfp_t gfp = *(gfp_t *)walk->private;
+	split_pgtable_alloc_fn *pgtable_alloc_fn = walk->private;
 	pud_t pud = pudp_get(pudp);
-	int ret = 0;
 
-	if (pud_leaf(pud))
-		ret = split_pud(pudp, pud, gfp, false);
+	if (!pud_leaf(pud))
+		return 0;
 
-	return ret;
+	return split_pud(pudp, pud, pgtable_alloc_fn, false);
 }
 
 static int split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
 				   unsigned long next, struct mm_walk *walk)
 {
-	gfp_t gfp = *(gfp_t *)walk->private;
+	split_pgtable_alloc_fn *pgtable_alloc_fn = walk->private;
 	pmd_t pmd = pmdp_get(pmdp);
-	int ret = 0;
+	int ret;
 
-	if (pmd_leaf(pmd)) {
-		if (pmd_cont(pmd))
-			split_contpmd(pmdp);
-		ret = split_pmd(pmdp, pmd, gfp, false);
+	if (!pmd_leaf(pmd))
+		return 0;
 
-		/*
-		 * We have split the pmd directly to ptes so there is no need to
-		 * visit each pte to check if they are contpte.
-		 */
-		walk->action = ACTION_CONTINUE;
-	}
+	if (pmd_cont(pmd))
+		split_contpmd(pmdp);
+
+	ret = split_pmd(pmdp, pmd, pgtable_alloc_fn, false);
+
+	/*
+	 * We have split the pmd directly to ptes so there is no need to
+	 * visit each pte to check if they are contpte.
+	 */
+	walk->action = ACTION_CONTINUE;
 
 	return ret;
 }
@@ -880,13 +908,15 @@ static const struct mm_walk_ops split_to_ptes_ops = {
 	.pte_entry	= split_to_ptes_pte_entry,
 };
 
-static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp)
+static int range_split_to_ptes(unsigned long start, unsigned long end,
+			       split_pgtable_alloc_fn *pgtable_alloc_fn)
 {
 	int ret;
 
 	arch_enter_lazy_mmu_mode();
 	ret = walk_kernel_page_table_range_lockless(start, end,
-					&split_to_ptes_ops, NULL, &gfp);
+						    &split_to_ptes_ops, NULL,
+						    pgtable_alloc_fn);
 	arch_leave_lazy_mmu_mode();
 
 	return ret;
@@ -903,6 +933,105 @@ static void __init init_idmap_kpti_bbml2_flag(void)
 	smp_mb();
 }
 
+static int __init
+collect_to_split_pud_entry(pud_t *pudp, unsigned long addr,
+			   unsigned long next, struct mm_walk *walk)
+{
+	pud_t pud = pudp_get(pudp);
+
+	if (pud_leaf(pud))
+		split_pgtables_count += 1 + PTRS_PER_PMD;
+
+	return 0;
+}
+
+static int __init
+collect_to_split_pmd_entry(pmd_t *pmdp, unsigned long addr,
+			   unsigned long next, struct mm_walk *walk)
+{
+	pmd_t pmd = pmdp_get(pmdp);
+
+	if (pmd_leaf(pmd))
+		split_pgtables_count++;
+
+	walk->action = ACTION_CONTINUE;
+
+	return 0;
+}
+
+static void __init linear_map_free_split_pgtables(void)
+{
+	int i;
+
+	if (!split_pgtables_count || !split_pgtables)
+		goto skip_free;
+
+	for (i = split_pgtables_idx; i < split_pgtables_count; i++) {
+		if (split_pgtables[i])
+			pagetable_free(split_pgtables[i]);
+	}
+
+	free_pages((unsigned long)split_pgtables, split_pgtables_order);
+
+skip_free:
+	split_pgtables = NULL;
+	split_pgtables_count = 0;
+	split_pgtables_idx = 0;
+	split_pgtables_order = 0;
+}
+
+static int __init linear_map_prealloc_split_pgtables(void)
+{
+	int ret, i;
+	unsigned long lstart = _PAGE_OFFSET(vabits_actual);
+	unsigned long lend = PAGE_END;
+	unsigned long kstart = (unsigned long)lm_alias(_stext);
+	unsigned long kend = (unsigned long)lm_alias(__init_begin);
+
+	const struct mm_walk_ops collect_to_split_ops = {
+		.pud_entry	= collect_to_split_pud_entry,
+		.pmd_entry	= collect_to_split_pmd_entry
+	};
+
+	split_pgtables_idx = 0;
+	split_pgtables_count = 0;
+
+	ret = walk_kernel_page_table_range_lockless(lstart, kstart,
+						    &collect_to_split_ops,
+						    NULL, NULL);
+	if (!ret)
+		ret = walk_kernel_page_table_range_lockless(kend, lend,
+							    &collect_to_split_ops,
+							    NULL, NULL);
+	if (ret || !split_pgtables_count)
+		goto error;
+
+	ret = -ENOMEM;
+
+	split_pgtables_order =
+		order_base_2(PAGE_ALIGN(split_pgtables_count *
+					sizeof(struct ptdesc *)) >> PAGE_SHIFT);
+
+	split_pgtables = (struct ptdesc **) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+							     split_pgtables_order);
+	if (!split_pgtables)
+		goto error;
+
+	for (i = 0; i < split_pgtables_count; i++) {
+		split_pgtables[i] = pagetable_alloc(GFP_KERNEL, 0);
+		if (!split_pgtables[i])
+			goto error;
+	}
+
+	ret = 0;
+
+error:
+	if (ret)
+		linear_map_free_split_pgtables();
+
+	return ret;
+}
+
 static int __init linear_map_split_to_ptes(void *__unused)
 {
 	/*
@@ -928,9 +1057,9 @@ static int __init linear_map_split_to_ptes(void *__unused)
 		 * PTE. The kernel alias remains static throughout runtime so
 		 * can continue to be safely mapped with large mappings.
 		 */
-		ret = range_split_to_ptes(lstart, kstart, GFP_ATOMIC);
+		ret = range_split_to_ptes(lstart, kstart, pgd_pgtable_get_preallocated);
 		if (!ret)
-			ret = range_split_to_ptes(kend, lend, GFP_ATOMIC);
+			ret = range_split_to_ptes(kend, lend, pgd_pgtable_get_preallocated);
 		if (ret)
 			panic("Failed to split linear map\n");
 		flush_tlb_kernel_range(lstart, lend);
@@ -963,10 +1092,16 @@ static int __init linear_map_split_to_ptes(void *__unused)
 
 void __init linear_map_maybe_split_to_ptes(void)
 {
-	if (linear_map_requires_bbml2 && !system_supports_bbml2_noabort()) {
-		init_idmap_kpti_bbml2_flag();
-		stop_machine(linear_map_split_to_ptes, NULL, cpu_online_mask);
-	}
+	if (!linear_map_requires_bbml2 || system_supports_bbml2_noabort())
+		return;
+
+	if (linear_map_prealloc_split_pgtables())
+		panic("Failed to split linear map\n");
+
+	init_idmap_kpti_bbml2_flag();
+	stop_machine(linear_map_split_to_ptes, NULL, cpu_online_mask);
+
+	linear_map_free_split_pgtables();
 }
 
 /*
@@ -1088,6 +1223,7 @@ bool arch_kfence_init_pool(void)
 	unsigned long end = start + KFENCE_POOL_SIZE;
 	int ret;
 
+
 	/* Exit early if we know the linear map is already pte-mapped. */
 	if (!split_leaf_mapping_possible())
 		return true;
@@ -1097,7 +1233,7 @@ bool arch_kfence_init_pool(void)
 		return true;
 
 	mutex_lock(&pgtable_split_lock);
-	ret = range_split_to_ptes(start, end, GFP_PGTABLE_KERNEL);
+	ret = range_split_to_ptes(start, end, pgd_pgtable_alloc_init_mm);
 	mutex_unlock(&pgtable_split_lock);
 
 	/*
-- 
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}




* [PATCH v3 2/2] arm64: mmu: avoid allocating pages while installing ng-mapping for KPTI
  2025-12-18 19:47 [PATCH v3 0/2] fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64 Yeoreum Yun
  2025-12-18 19:47 ` [PATCH v3 1/2] arm64: mmu: avoid allocating pages while splitting the linear mapping Yeoreum Yun
@ 2025-12-18 19:47 ` Yeoreum Yun
  2026-01-02 11:10   ` Ryan Roberts
  2025-12-31 10:07 ` [PATCH v3 0/2] fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64 Yeoreum Yun
  2 siblings, 1 reply; 11+ messages in thread
From: Yeoreum Yun @ 2025-12-18 19:47 UTC (permalink / raw)
  To: catalin.marinas, will, ryan.roberts, akpm, david, kevin.brodsky,
	quic_zhenhuah, dev.jain, yang, chaitanyas.prakash, bigeasy,
	clrkwllms, rostedt, lorenzo.stoakes, ardb, jackmanb, vbabka,
	mhocko
  Cc: linux-arm-kernel, linux-kernel, linux-rt-devel, Yeoreum Yun

The current __kpti_install_ng_mappings() allocates a temporary PGD
while installing the NG mapping for KPTI under stop_machine(),
using GFP_ATOMIC.

This is fine in the non-PREEMPT_RT case. However, it becomes a problem
under PREEMPT_RT because generic memory allocation/free APIs
(e.g., pgtable_alloc(), __get_free_pages(), etc.) cannot be invoked
in a non-preemptible context, except for the *_nolock() variants.
These generic allocators may sleep due to their use of spin_lock().

In other words, calling __get_free_pages(), even with GFP_ATOMIC,
is not allowed in __kpti_install_ng_mappings(), which is executed by
the stopper thread where preemption is disabled under PREEMPT_RT.

To address this, preallocate the page needed for the temporary PGD
before invoking __kpti_install_ng_mappings() via stop_machine().
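
In outline (simplified sketch; names match the diff below, early-exit checks
omitted):

void __init kpti_install_ng_mappings(void)
{
	int order = order_base_2(CONFIG_PGTABLE_LEVELS);
	u64 alloc;

	/* Preemption is still enabled here, so GFP_KERNEL is fine. */
	alloc = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
	if (!alloc)
		panic("Failed to alloc page tables\n");

	/* The callback only consumes the preallocated pages. */
	stop_machine(__kpti_install_ng_mappings, &alloc, cpu_online_mask);

	free_pages(alloc, order);
}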

Fixes: 47546a1912fc ("arm64: mm: install KPTI nG mappings with MMU enabled")
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/mm/mmu.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 96a9fa505e71..9ad9612728e6 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1369,7 +1369,7 @@ static phys_addr_t __init kpti_ng_pgd_alloc(enum pgtable_type type)
 	return kpti_ng_temp_alloc;
 }
 
-static int __init __kpti_install_ng_mappings(void *__unused)
+static int __init __kpti_install_ng_mappings(void *data)
 {
 	typedef void (kpti_remap_fn)(int, int, phys_addr_t, unsigned long);
 	extern kpti_remap_fn idmap_kpti_install_ng_mappings;
@@ -1377,10 +1377,9 @@ static int __init __kpti_install_ng_mappings(void *__unused)
 
 	int cpu = smp_processor_id();
 	int levels = CONFIG_PGTABLE_LEVELS;
-	int order = order_base_2(levels);
 	u64 kpti_ng_temp_pgd_pa = 0;
 	pgd_t *kpti_ng_temp_pgd;
-	u64 alloc = 0;
+	u64 alloc = *(u64 *)data;
 
 	if (levels == 5 && !pgtable_l5_enabled())
 		levels = 4;
@@ -1391,8 +1390,6 @@ static int __init __kpti_install_ng_mappings(void *__unused)
 
 	if (!cpu) {
 		int ret;
-
-		alloc = __get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
 		kpti_ng_temp_pgd = (pgd_t *)(alloc + (levels - 1) * PAGE_SIZE);
 		kpti_ng_temp_alloc = kpti_ng_temp_pgd_pa = __pa(kpti_ng_temp_pgd);
 
@@ -1423,16 +1420,17 @@ static int __init __kpti_install_ng_mappings(void *__unused)
 	remap_fn(cpu, num_online_cpus(), kpti_ng_temp_pgd_pa, KPTI_NG_TEMP_VA);
 	cpu_uninstall_idmap();
 
-	if (!cpu) {
-		free_pages(alloc, order);
+	if (!cpu)
 		arm64_use_ng_mappings = true;
-	}
 
 	return 0;
 }
 
 void __init kpti_install_ng_mappings(void)
 {
+	int order = order_base_2(CONFIG_PGTABLE_LEVELS);
+	u64 alloc;
+
 	/* Check whether KPTI is going to be used */
 	if (!arm64_kernel_unmapped_at_el0())
 		return;
@@ -1445,8 +1443,14 @@ void __init kpti_install_ng_mappings(void)
 	if (arm64_use_ng_mappings)
 		return;
 
+	alloc = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
+	if (!alloc)
+		panic("Failed to alloc page tables\n");
+
 	init_idmap_kpti_bbml2_flag();
-	stop_machine(__kpti_install_ng_mappings, NULL, cpu_online_mask);
+	stop_machine(__kpti_install_ng_mappings, &alloc, cpu_online_mask);
+
+	free_pages(alloc, order);
 }
 
 static pgprot_t __init kernel_exec_prot(void)
-- 
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}




* Re: [PATCH v3 0/2]  fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64
  2025-12-18 19:47 [PATCH v3 0/2] fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64 Yeoreum Yun
  2025-12-18 19:47 ` [PATCH v3 1/2] arm64: mmu: avoid allocating pages while splitting the linear mapping Yeoreum Yun
  2025-12-18 19:47 ` [PATCH v3 2/2] arm64: mmu: avoid allocating pages while installing ng-mapping for KPTI Yeoreum Yun
@ 2025-12-31 10:07 ` Yeoreum Yun
  2025-12-31 12:34   ` David Hildenbrand (Red Hat)
  2 siblings, 1 reply; 11+ messages in thread
From: Yeoreum Yun @ 2025-12-31 10:07 UTC (permalink / raw)
  To: catalin.marinas, will, ryan.roberts, akpm, david, kevin.brodsky,
	quic_zhenhuah, dev.jain, yang, chaitanyas.prakash, bigeasy,
	clrkwllms, rostedt, lorenzo.stoakes, ardb, jackmanb, vbabka,
	mhocko
  Cc: linux-arm-kernel, linux-kernel, linux-rt-devel

Gentle ping in case this has been forgotten.

On Thu, Dec 18, 2025 at 07:47:48PM +0000, Yeoreum Yun wrote:
> Under PREEMPT_RT, calling the generic memory allocation/free APIs
> (e.g. __get_free_pages(), pgtable_alloc(), free_pages(), etc.)
> with preemption disabled is not allowed; only the *_nolock() variants may
> be used there. The generic APIs may acquire a spin lock that becomes
> sleepable on RT, potentially causing a sleep during page allocation
> (see Documentation/core-api/real-time/differences.rst, "Memory allocation" section).
>
> However, on arm64, __linear_map_split_to_ptes() and
> __kpti_install_ng_mappings(), which are called by the stopper thread via
> stop_machine(), use the generic memory allocation/free APIs.
>
> This patchset fixes this problem and is based on v6.19-rc1.
>
> Patch History
> ==============
> from v2 to v3:
>   - remove split-mode and split_args;
>     pass the proper function pointer while splitting.
>   - rename functions.
>   - https://lore.kernel.org/all/20251217182007.2345700-1-yeoreum.yun@arm.com/
>
> from v1 to v2:
>   - drop pagetable_alloc_nolock()
>   - following @Ryan Roberts' suggestion.
>   - https://lore.kernel.org/all/20251212161832.2067134-1-yeoreum.yun@arm.com/
>
>
>
> Yeoreum Yun (2):
>   arm64: mmu: avoid allocating pages while splitting the linear mapping
>   arm64: mmu: avoid allocating pages while installing ng-mapping for
>     KPTI
>
>  arch/arm64/mm/mmu.c | 254 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 197 insertions(+), 57 deletions(-)
>
> --
> LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
>

--
Sincerely,
Yeoreum Yun



* Re: [PATCH v3 0/2] fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64
  2025-12-31 10:07 ` [PATCH v3 0/2] fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64 Yeoreum Yun
@ 2025-12-31 12:34   ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-31 12:34 UTC (permalink / raw)
  To: Yeoreum Yun, catalin.marinas, will, ryan.roberts, akpm,
	kevin.brodsky, quic_zhenhuah, dev.jain, yang, chaitanyas.prakash,
	bigeasy, clrkwllms, rostedt, lorenzo.stoakes, ardb, jackmanb,
	vbabka, mhocko
  Cc: linux-arm-kernel, linux-kernel, linux-rt-devel

On 12/31/25 11:07, Yeoreum Yun wrote:
> Gentle ping in case this has been forgotten.

A lot of people (including me, wait, why am I reading mails ;) ) are 
currently out, and will likely be out for the remainder of the week.

I'd assume people will look into it next week.

-- 
Cheers

David



* Re: [PATCH v3 1/2] arm64: mmu: avoid allocating pages while splitting the linear mapping
  2025-12-18 19:47 ` [PATCH v3 1/2] arm64: mmu: avoid allocating pages while splitting the linear mapping Yeoreum Yun
@ 2026-01-02 11:03   ` Ryan Roberts
  2026-01-02 11:09     ` Ryan Roberts
  2026-01-02 12:21     ` Yeoreum Yun
  0 siblings, 2 replies; 11+ messages in thread
From: Ryan Roberts @ 2026-01-02 11:03 UTC (permalink / raw)
  To: Yeoreum Yun, catalin.marinas, will, akpm, david, kevin.brodsky,
	quic_zhenhuah, dev.jain, yang, chaitanyas.prakash, bigeasy,
	clrkwllms, rostedt, lorenzo.stoakes, ardb, jackmanb, vbabka,
	mhocko
  Cc: linux-arm-kernel, linux-kernel, linux-rt-devel

On 18/12/2025 19:47, Yeoreum Yun wrote:
> linear_map_split_to_ptes() currently allocates page tables while
> splitting the linear mapping into PTEs under stop_machine() using GFP_ATOMIC.
> 
> This is fine for non-PREEMPT_RT configurations.
> However, it becomes problematic on PREEMPT_RT, because
> generic memory allocation/free APIs (e.g. pgtable_alloc(), __get_free_pages(), etc.)
> cannot be called from a non-preemptible context, except for the _nolock() variants.
> This is because generic memory allocation/free paths are sleepable,
> as they rely on spin_lock(), which becomes sleepable on PREEMPT_RT.
> 
> In other words, even calling pgtable_alloc() with GFP_ATOMIC is not permitted
> in __linear_map_split_to_pte() when it is executed by the stopper thread,
> where preemption is disabled on PREEMPT_RT.
> 
> To address this, the required number of page tables is first collected
> and preallocated, and the preallocated page tables are then used
> when splitting the linear mapping in __linear_map_split_to_pte().
> 
> Fixes: 3df6979d222b ("arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs")
> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> ---
>  arch/arm64/mm/mmu.c | 232 +++++++++++++++++++++++++++++++++++---------
>  1 file changed, 184 insertions(+), 48 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 9ae7ce00a7ef..96a9fa505e71 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -527,18 +527,15 @@ static void early_create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
>  		panic("Failed to create page tables\n");
>  }
>  
> -static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,
> -				       enum pgtable_type pgtable_type)
> -{
> -	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
> -	struct ptdesc *ptdesc = pagetable_alloc(gfp & ~__GFP_ZERO, 0);
> -	phys_addr_t pa;
> -
> -	if (!ptdesc)
> -		return INVALID_PHYS_ADDR;
> -
> -	pa = page_to_phys(ptdesc_page(ptdesc));
> +static struct ptdesc **split_pgtables;
> +static int split_pgtables_order;
> +static unsigned long split_pgtables_count;
> +static unsigned long split_pgtables_idx;
>  
> +static __always_inline void __pgd_pgtable_init(struct mm_struct *mm,
> +					       struct ptdesc *ptdesc,
> +					       enum pgtable_type pgtable_type)
> +{
>  	switch (pgtable_type) {
>  	case TABLE_PTE:
>  		BUG_ON(!pagetable_pte_ctor(mm, ptdesc));
> @@ -554,19 +551,28 @@ static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,
>  		break;
>  	}
>  

nit: no need for this empty line

> -	return pa;
>  }
>  
> -static phys_addr_t
> -pgd_pgtable_alloc_init_mm_gfp(enum pgtable_type pgtable_type, gfp_t gfp)
> +static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,

nit: all remaining callers pass gfp=GFP_PGTABLE_KERNEL so you could remove the
param?

> +				       enum pgtable_type pgtable_type)
>  {
> -	return __pgd_pgtable_alloc(&init_mm, gfp, pgtable_type);
> +	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
> +	struct ptdesc *ptdesc = pagetable_alloc(gfp & ~__GFP_ZERO, 0);
> +
> +	if (!ptdesc)
> +		return INVALID_PHYS_ADDR;
> +
> +	__pgd_pgtable_init(mm, ptdesc, pgtable_type);
> +
> +	return page_to_phys(ptdesc_page(ptdesc));
>  }
>  
> +typedef phys_addr_t (split_pgtable_alloc_fn)(enum pgtable_type);

This type is used more generally than just for splitting. Perhaps simply call it
"pgtable_alloc_fn"?

We also pass this type around in __create_pgd_mapping() and friends; perhaps we
should have a preparatory patch to define this type and consistently use it
everywhere instead of passing around "phys_addr_t (*pgtable_alloc)(enum
pgtable_type)"?

> +
>  static phys_addr_t __maybe_unused

This is no longer __maybe_unused; you can drop the decorator.

>  pgd_pgtable_alloc_init_mm(enum pgtable_type pgtable_type)
>  {
> -	return pgd_pgtable_alloc_init_mm_gfp(pgtable_type, GFP_PGTABLE_KERNEL);
> +	return __pgd_pgtable_alloc(&init_mm, GFP_PGTABLE_KERNEL, pgtable_type);
>  }
>  
>  static phys_addr_t
> @@ -575,6 +581,23 @@ pgd_pgtable_alloc_special_mm(enum pgtable_type pgtable_type)
>  	return  __pgd_pgtable_alloc(NULL, GFP_PGTABLE_KERNEL, pgtable_type);
>  }
>  
> +static phys_addr_t
> +pgd_pgtable_get_preallocated(enum pgtable_type pgtable_type)
> +{
> +	struct ptdesc *ptdesc;
> +
> +	if (WARN_ON(split_pgtables_idx >= split_pgtables_count))
> +		return INVALID_PHYS_ADDR;
> +
> +	ptdesc = split_pgtables[split_pgtables_idx++];
> +	if (!ptdesc)
> +		return INVALID_PHYS_ADDR;
> +
> +	__pgd_pgtable_init(&init_mm, ptdesc, pgtable_type);
> +
> +	return page_to_phys(ptdesc_page(ptdesc));
> +}
> +
>  static void split_contpte(pte_t *ptep)
>  {
>  	int i;
> @@ -584,7 +607,9 @@ static void split_contpte(pte_t *ptep)
>  		__set_pte(ptep, pte_mknoncont(__ptep_get(ptep)));
>  }
>  
> -static int split_pmd(pmd_t *pmdp, pmd_t pmd, gfp_t gfp, bool to_cont)
> +static int split_pmd(pmd_t *pmdp, pmd_t pmd,
> +		     split_pgtable_alloc_fn *pgtable_alloc_fn,

nit: I believe the * has no effect when passing function pointers and the usual
convention in Linux is to not use the *. Existing functions are also calling it
"pgtable_alloc" so perhaps this is a bit more consistent:

pgtable_alloc_fn pgtable_alloc

(same nitty comment for all uses below :) )

> +		     bool to_cont)
>  {
>  	pmdval_t tableprot = PMD_TYPE_TABLE | PMD_TABLE_UXN | PMD_TABLE_AF;
>  	unsigned long pfn = pmd_pfn(pmd);
> @@ -593,7 +618,7 @@ static int split_pmd(pmd_t *pmdp, pmd_t pmd, gfp_t gfp, bool to_cont)
>  	pte_t *ptep;
>  	int i;
>  
> -	pte_phys = pgd_pgtable_alloc_init_mm_gfp(TABLE_PTE, gfp);
> +	pte_phys = pgtable_alloc_fn(TABLE_PTE);
>  	if (pte_phys == INVALID_PHYS_ADDR)
>  		return -ENOMEM;
>  	ptep = (pte_t *)phys_to_virt(pte_phys);
> @@ -628,7 +653,9 @@ static void split_contpmd(pmd_t *pmdp)
>  		set_pmd(pmdp, pmd_mknoncont(pmdp_get(pmdp)));
>  }
>  
> -static int split_pud(pud_t *pudp, pud_t pud, gfp_t gfp, bool to_cont)
> +static int split_pud(pud_t *pudp, pud_t pud,
> +		     split_pgtable_alloc_fn *pgtable_alloc_fn,
> +		     bool to_cont)
>  {
>  	pudval_t tableprot = PUD_TYPE_TABLE | PUD_TABLE_UXN | PUD_TABLE_AF;
>  	unsigned int step = PMD_SIZE >> PAGE_SHIFT;
> @@ -638,7 +665,7 @@ static int split_pud(pud_t *pudp, pud_t pud, gfp_t gfp, bool to_cont)
>  	pmd_t *pmdp;
>  	int i;
>  
> -	pmd_phys = pgd_pgtable_alloc_init_mm_gfp(TABLE_PMD, gfp);
> +	pmd_phys = pgtable_alloc_fn(TABLE_PMD);
>  	if (pmd_phys == INVALID_PHYS_ADDR)
>  		return -ENOMEM;
>  	pmdp = (pmd_t *)phys_to_virt(pmd_phys);
> @@ -707,7 +734,7 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
>  	if (!pud_present(pud))
>  		goto out;
>  	if (pud_leaf(pud)) {
> -		ret = split_pud(pudp, pud, GFP_PGTABLE_KERNEL, true);
> +		ret = split_pud(pudp, pud, pgd_pgtable_alloc_init_mm, true);
>  		if (ret)
>  			goto out;
>  	}
> @@ -732,7 +759,7 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
>  		 */
>  		if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
>  			goto out;
> -		ret = split_pmd(pmdp, pmd, GFP_PGTABLE_KERNEL, true);
> +		ret = split_pmd(pmdp, pmd, pgd_pgtable_alloc_init_mm, true);
>  		if (ret)
>  			goto out;
>  	}
> @@ -831,34 +858,35 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
>  static int split_to_ptes_pud_entry(pud_t *pudp, unsigned long addr,
>  				   unsigned long next, struct mm_walk *walk)
>  {
> -	gfp_t gfp = *(gfp_t *)walk->private;
> +	split_pgtable_alloc_fn *pgtable_alloc_fn = walk->private;
>  	pud_t pud = pudp_get(pudp);
> -	int ret = 0;
>  
> -	if (pud_leaf(pud))
> -		ret = split_pud(pudp, pud, gfp, false);
> +	if (!pud_leaf(pud))
> +		return 0;
>  
> -	return ret;
> +	return split_pud(pudp, pud, pgtable_alloc_fn, false);

Why are you changing the layout of this function? Seems like unnecessary churn.
Just pass pgtable_alloc to split_pud() instead of gfp.

>  }
>  
>  static int split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
>  				   unsigned long next, struct mm_walk *walk)
>  {
> -	gfp_t gfp = *(gfp_t *)walk->private;
> +	split_pgtable_alloc_fn *pgtable_alloc_fn = walk->private;
>  	pmd_t pmd = pmdp_get(pmdp);
> -	int ret = 0;
> +	int ret;
>  
> -	if (pmd_leaf(pmd)) {
> -		if (pmd_cont(pmd))
> -			split_contpmd(pmdp);
> -		ret = split_pmd(pmdp, pmd, gfp, false);
> +	if (!pmd_leaf(pmd))
> +		return 0;
>  
> -		/*
> -		 * We have split the pmd directly to ptes so there is no need to
> -		 * visit each pte to check if they are contpte.
> -		 */
> -		walk->action = ACTION_CONTINUE;
> -	}
> +	if (pmd_cont(pmd))
> +		split_contpmd(pmdp);
> +
> +	ret = split_pmd(pmdp, pmd, pgtable_alloc_fn, false);
> +
> +	/*
> +	 * We have split the pmd directly to ptes so there is no need to
> +	 * visit each pte to check if they are contpte.
> +	 */
> +	walk->action = ACTION_CONTINUE;

Same comment; no need to change the layout of the function.

>  
>  	return ret;
>  }
> @@ -880,13 +908,15 @@ static const struct mm_walk_ops split_to_ptes_ops = {
>  	.pte_entry	= split_to_ptes_pte_entry,
>  };
>  
> -static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp)
> +static int range_split_to_ptes(unsigned long start, unsigned long end,
> +			       split_pgtable_alloc_fn *pgtable_alloc_fn)
>  {
>  	int ret;
>  
>  	arch_enter_lazy_mmu_mode();
>  	ret = walk_kernel_page_table_range_lockless(start, end,
> -					&split_to_ptes_ops, NULL, &gfp);
> +						    &split_to_ptes_ops, NULL,
> +						    pgtable_alloc_fn);
>  	arch_leave_lazy_mmu_mode();
>  
>  	return ret;
> @@ -903,6 +933,105 @@ static void __init init_idmap_kpti_bbml2_flag(void)
>  	smp_mb();
>  }
>  
> +static int __init
> +collect_to_split_pud_entry(pud_t *pudp, unsigned long addr,
> +			   unsigned long next, struct mm_walk *walk)
> +{
> +	pud_t pud = pudp_get(pudp);
> +
> +	if (pud_leaf(pud))
> +		split_pgtables_count += 1 + PTRS_PER_PMD;

I think you probably want:

	walk->action = ACTION_CONTINUE;

Likely you will end up with the same behaviour regardless. But you should at
least be consistent with collect_to_split_pmd_entry().

> +
> +	return 0;
> +}
> +
> +static int __init
> +collect_to_split_pmd_entry(pmd_t *pmdp, unsigned long addr,
> +			   unsigned long next, struct mm_walk *walk)
> +{
> +	pmd_t pmd = pmdp_get(pmdp);
> +
> +	if (pmd_leaf(pmd))
> +		split_pgtables_count++;
> +
> +	walk->action = ACTION_CONTINUE;
> +
> +	return 0;
> +}
> +
> +static void __init linear_map_free_split_pgtables(void)
> +{
> +	int i;
> +
> +	if (!split_pgtables_count || !split_pgtables)
> +		goto skip_free;
> +
> +	for (i = split_pgtables_idx; i < split_pgtables_count; i++) {
> +		if (split_pgtables[i])
> +			pagetable_free(split_pgtables[i]);
> +	}
> +
> +	free_pages((unsigned long)split_pgtables, split_pgtables_order);
> +
> +skip_free:
> +	split_pgtables = NULL;
> +	split_pgtables_count = 0;
> +	split_pgtables_idx = 0;
> +	split_pgtables_order = 0;
> +}
> +
> +static int __init linear_map_prealloc_split_pgtables(void)
> +{
> +	int ret, i;
> +	unsigned long lstart = _PAGE_OFFSET(vabits_actual);
> +	unsigned long lend = PAGE_END;
> +	unsigned long kstart = (unsigned long)lm_alias(_stext);
> +	unsigned long kend = (unsigned long)lm_alias(__init_begin);
> +
> +	const struct mm_walk_ops collect_to_split_ops = {
> +		.pud_entry	= collect_to_split_pud_entry,
> +		.pmd_entry	= collect_to_split_pmd_entry
> +	};
> +
> +	split_pgtables_idx = 0;
> +	split_pgtables_count = 0;
> +
> +	ret = walk_kernel_page_table_range_lockless(lstart, kstart,
> +						    &collect_to_split_ops,
> +						    NULL, NULL);
> +	if (!ret)
> +		ret = walk_kernel_page_table_range_lockless(kend, lend,
> +							    &collect_to_split_ops,
> +							    NULL, NULL);
> +	if (ret || !split_pgtables_count)
> +		goto error;
> +
> +	ret = -ENOMEM;
> +
> +	split_pgtables_order =
> +		order_base_2(PAGE_ALIGN(split_pgtables_count *
> +					sizeof(struct ptdesc *)) >> PAGE_SHIFT);

Wouldn't this be simpler?

	split_pgtables_order = get_order(split_pgtables_count *
					 sizeof(struct ptdesc *));

> +
> +	split_pgtables = (struct ptdesc **) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +							     split_pgtables_order);

Do you really need the cast? (I'm not sure).

> +	if (!split_pgtables)
> +		goto error;
> +
> +	for (i = 0; i < split_pgtables_count; i++) {
> +		split_pgtables[i] = pagetable_alloc(GFP_KERNEL, 0);

For consistency with other code, perhaps spell it out?:

	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
	gfp_t gfp = GFP_PGTABLE_KERNEL & ~__GFP_ZERO;

	for (i = 0; i < split_pgtables_count; i++) {
		split_pgtables[i] = pagetable_alloc(gfp, 0);

> +		if (!split_pgtables[i])
> +			goto error;
> +	}
> +
> +	ret = 0;
> +
> +error:
> +	if (ret)
> +		linear_map_free_split_pgtables();
> +
> +	return ret;
> +}

I wonder if there is value in generalizing this a bit to separate out the
determination of the number of pages from the actual pre-allocation and free
functions? Then you have a reusable pre-allocation function that you could also
use for the KPTI case instead of it having yet another private pre-allocator?

> +
>  static int __init linear_map_split_to_ptes(void *__unused)
>  {
>  	/*
> @@ -928,9 +1057,9 @@ static int __init linear_map_split_to_ptes(void *__unused)
>  		 * PTE. The kernel alias remains static throughout runtime so
>  		 * can continue to be safely mapped with large mappings.
>  		 */
> -		ret = range_split_to_ptes(lstart, kstart, GFP_ATOMIC);
> +		ret = range_split_to_ptes(lstart, kstart, pgd_pgtable_get_preallocated);
>  		if (!ret)
> -			ret = range_split_to_ptes(kend, lend, GFP_ATOMIC);
> +			ret = range_split_to_ptes(kend, lend, pgd_pgtable_get_preallocated);
>  		if (ret)
>  			panic("Failed to split linear map\n");
>  		flush_tlb_kernel_range(lstart, lend);
> @@ -963,10 +1092,16 @@ static int __init linear_map_split_to_ptes(void *__unused)
>  
>  void __init linear_map_maybe_split_to_ptes(void)
>  {
> -	if (linear_map_requires_bbml2 && !system_supports_bbml2_noabort()) {
> -		init_idmap_kpti_bbml2_flag();
> -		stop_machine(linear_map_split_to_ptes, NULL, cpu_online_mask);
> -	}
> +	if (!linear_map_requires_bbml2 || system_supports_bbml2_noabort())
> +		return;
> +
> +	if (linear_map_prealloc_split_pgtables())
> +		panic("Failed to split linear map\n");
> +
> +	init_idmap_kpti_bbml2_flag();
> +	stop_machine(linear_map_split_to_ptes, NULL, cpu_online_mask);
> +
> +	linear_map_free_split_pgtables();
>  }
>  
>  /*
> @@ -1088,6 +1223,7 @@ bool arch_kfence_init_pool(void)
>  	unsigned long end = start + KFENCE_POOL_SIZE;
>  	int ret;
>  
> +

nit: Remove extra empty line.

This is looking much cleaner now; nearly there!

Thanks,
Ryan


>  	/* Exit early if we know the linear map is already pte-mapped. */
>  	if (!split_leaf_mapping_possible())
>  		return true;
> @@ -1097,7 +1233,7 @@ bool arch_kfence_init_pool(void)
>  		return true;
>  
>  	mutex_lock(&pgtable_split_lock);
> -	ret = range_split_to_ptes(start, end, GFP_PGTABLE_KERNEL);
> +	ret = range_split_to_ptes(start, end, pgd_pgtable_alloc_init_mm);
>  	mutex_unlock(&pgtable_split_lock);
>  
>  	/*




* Re: [PATCH v3 1/2] arm64: mmu: avoid allocating pages while splitting the linear mapping
  2026-01-02 11:03   ` Ryan Roberts
@ 2026-01-02 11:09     ` Ryan Roberts
  2026-01-02 12:21     ` Yeoreum Yun
  1 sibling, 0 replies; 11+ messages in thread
From: Ryan Roberts @ 2026-01-02 11:09 UTC (permalink / raw)
  To: Yeoreum Yun, catalin.marinas, will, akpm, david, kevin.brodsky,
	quic_zhenhuah, dev.jain, yang, chaitanyas.prakash, bigeasy,
	clrkwllms, rostedt, lorenzo.stoakes, ardb, jackmanb, vbabka,
	mhocko
  Cc: linux-arm-kernel, linux-kernel, linux-rt-devel

On 02/01/2026 11:03, Ryan Roberts wrote:
> On 18/12/2025 19:47, Yeoreum Yun wrote:
>> linear_map_split_to_ptes() currently allocates page tables while
>> splitting the linear mapping into PTEs under stop_machine() using GFP_ATOMIC.
>>
>> This is fine for non-PREEMPT_RT configurations.
>> However, it becomes problematic on PREEMPT_RT, because
>> generic memory allocation/free APIs (e.g. pgtable_alloc(), __get_free_pages(), etc.)
>> cannot be called from a non-preemptible context, except for the _nolock() variants.
>> This is because generic memory allocation/free paths are sleepable,
>> as they rely on spin_lock(), which becomes sleepable on PREEMPT_RT.
>>
>> In other words, even calling pgtable_alloc() with GFP_ATOMIC is not permitted
>> in __linear_map_split_to_pte() when it is executed by the stopper thread,
>> where preemption is disabled on PREEMPT_RT.
>>
>> To address this, the required number of page tables is first collected
>> and preallocated, and the preallocated page tables are then used
>> when splitting the linear mapping in __linear_map_split_to_pte().
>>
>> Fixes: 3df6979d222b ("arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs")
>> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
>> ---
>>  arch/arm64/mm/mmu.c | 232 +++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 184 insertions(+), 48 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 9ae7ce00a7ef..96a9fa505e71 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -527,18 +527,15 @@ static void early_create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
>>  		panic("Failed to create page tables\n");
>>  }
>>  
>> -static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,
>> -				       enum pgtable_type pgtable_type)
>> -{
>> -	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
>> -	struct ptdesc *ptdesc = pagetable_alloc(gfp & ~__GFP_ZERO, 0);
>> -	phys_addr_t pa;
>> -
>> -	if (!ptdesc)
>> -		return INVALID_PHYS_ADDR;
>> -
>> -	pa = page_to_phys(ptdesc_page(ptdesc));
>> +static struct ptdesc **split_pgtables;
>> +static int split_pgtables_order;
>> +static unsigned long split_pgtables_count;
>> +static unsigned long split_pgtables_idx;
>>  
>> +static __always_inline void __pgd_pgtable_init(struct mm_struct *mm,
>> +					       struct ptdesc *ptdesc,
>> +					       enum pgtable_type pgtable_type)
>> +{
>>  	switch (pgtable_type) {
>>  	case TABLE_PTE:
>>  		BUG_ON(!pagetable_pte_ctor(mm, ptdesc));
>> @@ -554,19 +551,28 @@ static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,
>>  		break;
>>  	}
>>  
> 
> nit: no need for this empty line
> 
>> -	return pa;
>>  }
>>  
>> -static phys_addr_t
>> -pgd_pgtable_alloc_init_mm_gfp(enum pgtable_type pgtable_type, gfp_t gfp)
>> +static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,
> 
> nit: all remaining callers pass gfp=GFP_PGTABLE_KERNEL so you could remove the
> param?
> 
>> +				       enum pgtable_type pgtable_type)
>>  {
>> -	return __pgd_pgtable_alloc(&init_mm, gfp, pgtable_type);
>> +	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
>> +	struct ptdesc *ptdesc = pagetable_alloc(gfp & ~__GFP_ZERO, 0);
>> +
>> +	if (!ptdesc)
>> +		return INVALID_PHYS_ADDR;
>> +
>> +	__pgd_pgtable_init(mm, ptdesc, pgtable_type);
>> +
>> +	return page_to_phys(ptdesc_page(ptdesc));
>>  }
>>  
>> +typedef phys_addr_t (split_pgtable_alloc_fn)(enum pgtable_type);
> 
> This type is used more generally than just for splitting. Perhaps simply call it
> "pgtable_alloc_fn"?
> 
> We also pass this type around in __create_pgd_mapping() and friends; perhaps we
> should have a preparatory patch to define this type and consistently use it
> everywhere instead of passing around "phys_addr_t (*pgtable_alloc)(enum
> pgtable_type)"?
> 
>> +
>>  static phys_addr_t __maybe_unused
> 
> This is no longer __maybe_unused; you can drop the decorator.
> 
>>  pgd_pgtable_alloc_init_mm(enum pgtable_type pgtable_type)
>>  {
>> -	return pgd_pgtable_alloc_init_mm_gfp(pgtable_type, GFP_PGTABLE_KERNEL);
>> +	return __pgd_pgtable_alloc(&init_mm, GFP_PGTABLE_KERNEL, pgtable_type);
>>  }
>>  
>>  static phys_addr_t
>> @@ -575,6 +581,23 @@ pgd_pgtable_alloc_special_mm(enum pgtable_type pgtable_type)
>>  	return  __pgd_pgtable_alloc(NULL, GFP_PGTABLE_KERNEL, pgtable_type);
>>  }
>>  
>> +static phys_addr_t
>> +pgd_pgtable_get_preallocated(enum pgtable_type pgtable_type)
>> +{
>> +	struct ptdesc *ptdesc;
>> +
>> +	if (WARN_ON(split_pgtables_idx >= split_pgtables_count))
>> +		return INVALID_PHYS_ADDR;
>> +
>> +	ptdesc = split_pgtables[split_pgtables_idx++];
>> +	if (!ptdesc)
>> +		return INVALID_PHYS_ADDR;
>> +
>> +	__pgd_pgtable_init(&init_mm, ptdesc, pgtable_type);
>> +
>> +	return page_to_phys(ptdesc_page(ptdesc));
>> +}
>> +
>>  static void split_contpte(pte_t *ptep)
>>  {
>>  	int i;
>> @@ -584,7 +607,9 @@ static void split_contpte(pte_t *ptep)
>>  		__set_pte(ptep, pte_mknoncont(__ptep_get(ptep)));
>>  }
>>  
>> -static int split_pmd(pmd_t *pmdp, pmd_t pmd, gfp_t gfp, bool to_cont)
>> +static int split_pmd(pmd_t *pmdp, pmd_t pmd,
>> +		     split_pgtable_alloc_fn *pgtable_alloc_fn,
> 
> nit: I believe the * has no effect when passing function pointers and the usual
> convention in Linux is to not use the *. Existing functions are also calling it
> "pgtable_alloc" so perhaps this is a bit more consistent:
> 
> pgtable_alloc_fn pgtable_alloc
> 
> (same nitty comment for all uses below :) )
> 
>> +		     bool to_cont)
>>  {
>>  	pmdval_t tableprot = PMD_TYPE_TABLE | PMD_TABLE_UXN | PMD_TABLE_AF;
>>  	unsigned long pfn = pmd_pfn(pmd);
>> @@ -593,7 +618,7 @@ static int split_pmd(pmd_t *pmdp, pmd_t pmd, gfp_t gfp, bool to_cont)
>>  	pte_t *ptep;
>>  	int i;
>>  
>> -	pte_phys = pgd_pgtable_alloc_init_mm_gfp(TABLE_PTE, gfp);
>> +	pte_phys = pgtable_alloc_fn(TABLE_PTE);
>>  	if (pte_phys == INVALID_PHYS_ADDR)
>>  		return -ENOMEM;
>>  	ptep = (pte_t *)phys_to_virt(pte_phys);
>> @@ -628,7 +653,9 @@ static void split_contpmd(pmd_t *pmdp)
>>  		set_pmd(pmdp, pmd_mknoncont(pmdp_get(pmdp)));
>>  }
>>  
>> -static int split_pud(pud_t *pudp, pud_t pud, gfp_t gfp, bool to_cont)
>> +static int split_pud(pud_t *pudp, pud_t pud,
>> +		     split_pgtable_alloc_fn *pgtable_alloc_fn,
>> +		     bool to_cont)
>>  {
>>  	pudval_t tableprot = PUD_TYPE_TABLE | PUD_TABLE_UXN | PUD_TABLE_AF;
>>  	unsigned int step = PMD_SIZE >> PAGE_SHIFT;
>> @@ -638,7 +665,7 @@ static int split_pud(pud_t *pudp, pud_t pud, gfp_t gfp, bool to_cont)
>>  	pmd_t *pmdp;
>>  	int i;
>>  
>> -	pmd_phys = pgd_pgtable_alloc_init_mm_gfp(TABLE_PMD, gfp);
>> +	pmd_phys = pgtable_alloc_fn(TABLE_PMD);
>>  	if (pmd_phys == INVALID_PHYS_ADDR)
>>  		return -ENOMEM;
>>  	pmdp = (pmd_t *)phys_to_virt(pmd_phys);
>> @@ -707,7 +734,7 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
>>  	if (!pud_present(pud))
>>  		goto out;
>>  	if (pud_leaf(pud)) {
>> -		ret = split_pud(pudp, pud, GFP_PGTABLE_KERNEL, true);
>> +		ret = split_pud(pudp, pud, pgd_pgtable_alloc_init_mm, true);
>>  		if (ret)
>>  			goto out;
>>  	}
>> @@ -732,7 +759,7 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
>>  		 */
>>  		if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
>>  			goto out;
>> -		ret = split_pmd(pmdp, pmd, GFP_PGTABLE_KERNEL, true);
>> +		ret = split_pmd(pmdp, pmd, pgd_pgtable_alloc_init_mm, true);
>>  		if (ret)
>>  			goto out;
>>  	}
>> @@ -831,34 +858,35 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
>>  static int split_to_ptes_pud_entry(pud_t *pudp, unsigned long addr,
>>  				   unsigned long next, struct mm_walk *walk)
>>  {
>> -	gfp_t gfp = *(gfp_t *)walk->private;
>> +	split_pgtable_alloc_fn *pgtable_alloc_fn = walk->private;
>>  	pud_t pud = pudp_get(pudp);
>> -	int ret = 0;
>>  
>> -	if (pud_leaf(pud))
>> -		ret = split_pud(pudp, pud, gfp, false);
>> +	if (!pud_leaf(pud))
>> +		return 0;
>>  
>> -	return ret;
>> +	return split_pud(pudp, pud, pgtable_alloc_fn, false);
> 
> Why are you changing the layout of this function? Seems like unnecessary churn.
> Just pass pgtable_alloc to split_pud() instead of gfp.
> 
>>  }
>>  
>>  static int split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
>>  				   unsigned long next, struct mm_walk *walk)
>>  {
>> -	gfp_t gfp = *(gfp_t *)walk->private;
>> +	split_pgtable_alloc_fn *pgtable_alloc_fn = walk->private;
>>  	pmd_t pmd = pmdp_get(pmdp);
>> -	int ret = 0;
>> +	int ret;
>>  
>> -	if (pmd_leaf(pmd)) {
>> -		if (pmd_cont(pmd))
>> -			split_contpmd(pmdp);
>> -		ret = split_pmd(pmdp, pmd, gfp, false);
>> +	if (!pmd_leaf(pmd))
>> +		return 0;
>>  
>> -		/*
>> -		 * We have split the pmd directly to ptes so there is no need to
>> -		 * visit each pte to check if they are contpte.
>> -		 */
>> -		walk->action = ACTION_CONTINUE;
>> -	}
>> +	if (pmd_cont(pmd))
>> +		split_contpmd(pmdp);
>> +
>> +	ret = split_pmd(pmdp, pmd, pgtable_alloc_fn, false);
>> +
>> +	/*
>> +	 * We have split the pmd directly to ptes so there is no need to
>> +	 * visit each pte to check if they are contpte.
>> +	 */
>> +	walk->action = ACTION_CONTINUE;
> 
> Same comment; no need to change the layout of the function.
> 
>>  
>>  	return ret;
>>  }
>> @@ -880,13 +908,15 @@ static const struct mm_walk_ops split_to_ptes_ops = {
>>  	.pte_entry	= split_to_ptes_pte_entry,
>>  };
>>  
>> -static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp)
>> +static int range_split_to_ptes(unsigned long start, unsigned long end,
>> +			       split_pgtable_alloc_fn *pgtable_alloc_fn)
>>  {
>>  	int ret;
>>  
>>  	arch_enter_lazy_mmu_mode();
>>  	ret = walk_kernel_page_table_range_lockless(start, end,
>> -					&split_to_ptes_ops, NULL, &gfp);
>> +						    &split_to_ptes_ops, NULL,
>> +						    pgtable_alloc_fn);
>>  	arch_leave_lazy_mmu_mode();
>>  
>>  	return ret;
>> @@ -903,6 +933,105 @@ static void __init init_idmap_kpti_bbml2_flag(void)
>>  	smp_mb();
>>  }
>>  
>> +static int __init
>> +collect_to_split_pud_entry(pud_t *pudp, unsigned long addr,
>> +			   unsigned long next, struct mm_walk *walk)
>> +{
>> +	pud_t pud = pudp_get(pudp);
>> +
>> +	if (pud_leaf(pud))
>> +		split_pgtables_count += 1 + PTRS_PER_PMD;
> 
> I think you probably want:
> 
> 	walk->action = ACTION_CONTINUE;
> 
> Likely you will end up with the same behaviour regardless. But you should at
> least be consistent with collect_to_split_pmd_entry().
> 
>> +
>> +	return 0;
>> +}
>> +
>> +static int __init
>> +collect_to_split_pmd_entry(pmd_t *pmdp, unsigned long addr,
>> +			   unsigned long next, struct mm_walk *walk)
>> +{
>> +	pmd_t pmd = pmdp_get(pmdp);
>> +
>> +	if (pmd_leaf(pmd))
>> +		split_pgtables_count++;
>> +
>> +	walk->action = ACTION_CONTINUE;
>> +
>> +	return 0;
>> +}
>> +
>> +static void __init linear_map_free_split_pgtables(void)
>> +{
>> +	int i;
>> +
>> +	if (!split_pgtables_count || !split_pgtables)
>> +		goto skip_free;
>> +
>> +	for (i = split_pgtables_idx; i < split_pgtables_count; i++) {
>> +		if (split_pgtables[i])
>> +			pagetable_free(split_pgtables[i]);
>> +	}
>> +
>> +	free_pages((unsigned long)split_pgtables, split_pgtables_order);
>> +
>> +skip_free:
>> +	split_pgtables = NULL;
>> +	split_pgtables_count = 0;
>> +	split_pgtables_idx = 0;
>> +	split_pgtables_order = 0;
>> +}
>> +
>> +static int __init linear_map_prealloc_split_pgtables(void)
>> +{
>> +	int ret, i;
>> +	unsigned long lstart = _PAGE_OFFSET(vabits_actual);
>> +	unsigned long lend = PAGE_END;
>> +	unsigned long kstart = (unsigned long)lm_alias(_stext);
>> +	unsigned long kend = (unsigned long)lm_alias(__init_begin);
>> +
>> +	const struct mm_walk_ops collect_to_split_ops = {
>> +		.pud_entry	= collect_to_split_pud_entry,
>> +		.pmd_entry	= collect_to_split_pmd_entry
>> +	};
>> +
>> +	split_pgtables_idx = 0;
>> +	split_pgtables_count = 0;
>> +
>> +	ret = walk_kernel_page_table_range_lockless(lstart, kstart,
>> +						    &collect_to_split_ops,
>> +						    NULL, NULL);
>> +	if (!ret)
>> +		ret = walk_kernel_page_table_range_lockless(kend, lend,
>> +							    &collect_to_split_ops,
>> +							    NULL, NULL);
>> +	if (ret || !split_pgtables_count)
>> +		goto error;
>> +
>> +	ret = -ENOMEM;
>> +
>> +	split_pgtables_order =
>> +		order_base_2(PAGE_ALIGN(split_pgtables_count *
>> +					sizeof(struct ptdesc *)) >> PAGE_SHIFT);
> 
> Wouldn't this be simpler?
> 
> 	split_pgtables_order = get_order(split_pgtables_count *
> 					 sizeof(struct ptdesc *));
> 
>> +
>> +	split_pgtables = (struct ptdesc **) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
>> +							     split_pgtables_order);
> 
> Do you really need the cast? (I'm not sure).
> 
>> +	if (!split_pgtables)
>> +		goto error;
>> +
>> +	for (i = 0; i < split_pgtables_count; i++) {
>> +		split_pgtables[i] = pagetable_alloc(GFP_KERNEL, 0);
> 
> For consistency with other code, perhaps spell it out?:
> 
> 	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
> 	gfp_t gfp = GFP_PGTABLE_KERNEL & ~__GFP_ZERO;
> 
> 	for (i = 0; i < split_pgtables_count; i++) {
> 		split_pgtables[i] = pagetable_alloc(gfp, 0);
> 
>> +		if (!split_pgtables[i])
>> +			goto error;
>> +	}
>> +
>> +	ret = 0;
>> +
>> +error:
>> +	if (ret)
>> +		linear_map_free_split_pgtables();
>> +
>> +	return ret;
>> +}
> 
> I wonder if there is value in generalizing this a bit to separate out the
> determination of the number of pages from the actual pre-allocation and free
> functions? Then you have a reusable pre-allocation function that you could also
> use for the KPTI case instead of it having yet another private pre-allocator?

Actually ignore this comment; looks like the KPTI case requires the allocation
to be contiguous because it needs to map it into a temp pgtable. So there is a
valid reason for the 2 pre-allocators to be different.

> 
>> +
>>  static int __init linear_map_split_to_ptes(void *__unused)
>>  {
>>  	/*
>> @@ -928,9 +1057,9 @@ static int __init linear_map_split_to_ptes(void *__unused)
>>  		 * PTE. The kernel alias remains static throughout runtime so
>>  		 * can continue to be safely mapped with large mappings.
>>  		 */
>> -		ret = range_split_to_ptes(lstart, kstart, GFP_ATOMIC);
>> +		ret = range_split_to_ptes(lstart, kstart, pgd_pgtable_get_preallocated);
>>  		if (!ret)
>> -			ret = range_split_to_ptes(kend, lend, GFP_ATOMIC);
>> +			ret = range_split_to_ptes(kend, lend, pgd_pgtable_get_preallocated);
>>  		if (ret)
>>  			panic("Failed to split linear map\n");
>>  		flush_tlb_kernel_range(lstart, lend);
>> @@ -963,10 +1092,16 @@ static int __init linear_map_split_to_ptes(void *__unused)
>>  
>>  void __init linear_map_maybe_split_to_ptes(void)
>>  {
>> -	if (linear_map_requires_bbml2 && !system_supports_bbml2_noabort()) {
>> -		init_idmap_kpti_bbml2_flag();
>> -		stop_machine(linear_map_split_to_ptes, NULL, cpu_online_mask);
>> -	}
>> +	if (!linear_map_requires_bbml2 || system_supports_bbml2_noabort())
>> +		return;
>> +
>> +	if (linear_map_prealloc_split_pgtables())
>> +		panic("Failed to split linear map\n");
>> +
>> +	init_idmap_kpti_bbml2_flag();
>> +	stop_machine(linear_map_split_to_ptes, NULL, cpu_online_mask);
>> +
>> +	linear_map_free_split_pgtables();
>>  }
>>  
>>  /*
>> @@ -1088,6 +1223,7 @@ bool arch_kfence_init_pool(void)
>>  	unsigned long end = start + KFENCE_POOL_SIZE;
>>  	int ret;
>>  
>> +
> 
> nit: Remove extra empty line.
> 
> This is looking much cleaner now; nearly there!
> 
> Thanks,
> Ryan
> 
> 
>>  	/* Exit early if we know the linear map is already pte-mapped. */
>>  	if (!split_leaf_mapping_possible())
>>  		return true;
>> @@ -1097,7 +1233,7 @@ bool arch_kfence_init_pool(void)
>>  		return true;
>>  
>>  	mutex_lock(&pgtable_split_lock);
>> -	ret = range_split_to_ptes(start, end, GFP_PGTABLE_KERNEL);
>> +	ret = range_split_to_ptes(start, end, pgd_pgtable_alloc_init_mm);
>>  	mutex_unlock(&pgtable_split_lock);
>>  
>>  	/*
> 




* Re: [PATCH v3 2/2] arm64: mmu: avoid allocating pages while installing ng-mapping for KPTI
  2025-12-18 19:47 ` [PATCH v3 2/2] arm64: mmu: avoid allocating pages while installing ng-mapping for KPTI Yeoreum Yun
@ 2026-01-02 11:10   ` Ryan Roberts
  2026-01-02 14:48     ` Yeoreum Yun
  0 siblings, 1 reply; 11+ messages in thread
From: Ryan Roberts @ 2026-01-02 11:10 UTC (permalink / raw)
  To: Yeoreum Yun, catalin.marinas, will, akpm, david, kevin.brodsky,
	quic_zhenhuah, dev.jain, yang, chaitanyas.prakash, bigeasy,
	clrkwllms, rostedt, lorenzo.stoakes, ardb, jackmanb, vbabka,
	mhocko
  Cc: linux-arm-kernel, linux-kernel, linux-rt-devel

On 18/12/2025 19:47, Yeoreum Yun wrote:
> The current __kpti_install_ng_mappings() allocates a temporary PGD
> while installing the NG mapping for KPTI under stop_machine(),
> using GFP_ATOMIC.
> 
> This is fine in the non-PREEMPT_RT case. However, it becomes a problem
> under PREEMPT_RT because generic memory allocation/free APIs
> (e.g., pgtable_alloc(), __get_free_pages(), etc.) cannot be invoked
> in a non-preemptible context, except for the *_nolock() variants.
> These generic allocators may sleep due to their use of spin_lock().
> 
> In other words, calling __get_free_pages(), even with GFP_ATOMIC,
> is not allowed in __kpti_install_ng_mappings(), which is executed by
> the stopper thread where preemption is disabled under PREEMPT_RT.
> 
> To address this, preallocate the page needed for the temporary PGD
> before invoking __kpti_install_ng_mappings() via stop_machine().
> 
> Fixes: 47546a1912fc ("arm64: mm: install KPTI nG mappings with MMU enabled")
> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/mm/mmu.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 96a9fa505e71..9ad9612728e6 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1369,7 +1369,7 @@ static phys_addr_t __init kpti_ng_pgd_alloc(enum pgtable_type type)
>  	return kpti_ng_temp_alloc;
>  }
>  
> -static int __init __kpti_install_ng_mappings(void *__unused)
> +static int __init __kpti_install_ng_mappings(void *data)
>  {
>  	typedef void (kpti_remap_fn)(int, int, phys_addr_t, unsigned long);
>  	extern kpti_remap_fn idmap_kpti_install_ng_mappings;
> @@ -1377,10 +1377,9 @@ static int __init __kpti_install_ng_mappings(void *__unused)
>  
>  	int cpu = smp_processor_id();
>  	int levels = CONFIG_PGTABLE_LEVELS;
> -	int order = order_base_2(levels);
>  	u64 kpti_ng_temp_pgd_pa = 0;
>  	pgd_t *kpti_ng_temp_pgd;
> -	u64 alloc = 0;
> +	u64 alloc = *(u64 *)data;
>  
>  	if (levels == 5 && !pgtable_l5_enabled())
>  		levels = 4;
> @@ -1391,8 +1390,6 @@ static int __init __kpti_install_ng_mappings(void *__unused)
>  
>  	if (!cpu) {
>  		int ret;
> -
> -		alloc = __get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
>  		kpti_ng_temp_pgd = (pgd_t *)(alloc + (levels - 1) * PAGE_SIZE);
>  		kpti_ng_temp_alloc = kpti_ng_temp_pgd_pa = __pa(kpti_ng_temp_pgd);
>  
> @@ -1423,16 +1420,17 @@ static int __init __kpti_install_ng_mappings(void *__unused)
>  	remap_fn(cpu, num_online_cpus(), kpti_ng_temp_pgd_pa, KPTI_NG_TEMP_VA);
>  	cpu_uninstall_idmap();
>  
> -	if (!cpu) {
> -		free_pages(alloc, order);
> +	if (!cpu)
>  		arm64_use_ng_mappings = true;
> -	}
>  
>  	return 0;
>  }
>  
>  void __init kpti_install_ng_mappings(void)
>  {
> +	int order = order_base_2(CONFIG_PGTABLE_LEVELS);
> +	u64 alloc;
> +

nit: Restore the blank line between the variable definitions and the logic.

But you already have my R-b :)

>  	/* Check whether KPTI is going to be used */
>  	if (!arm64_kernel_unmapped_at_el0())
>  		return;
> @@ -1445,8 +1443,14 @@ void __init kpti_install_ng_mappings(void)
>  	if (arm64_use_ng_mappings)
>  		return;
>  
> +	alloc = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> +	if (!alloc)
> +		panic("Failed to alloc page tables\n");
> +
>  	init_idmap_kpti_bbml2_flag();
> -	stop_machine(__kpti_install_ng_mappings, NULL, cpu_online_mask);
> +	stop_machine(__kpti_install_ng_mappings, &alloc, cpu_online_mask);
> +
> +	free_pages(alloc, order);
>  }
>  
>  static pgprot_t __init kernel_exec_prot(void)



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] arm64: mmu: avoid allocating pages while splitting the linear mapping
  2026-01-02 11:03   ` Ryan Roberts
  2026-01-02 11:09     ` Ryan Roberts
@ 2026-01-02 12:21     ` Yeoreum Yun
  2026-01-02 12:35       ` Ryan Roberts
  1 sibling, 1 reply; 11+ messages in thread
From: Yeoreum Yun @ 2026-01-02 12:21 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: catalin.marinas, will, akpm, david, kevin.brodsky, quic_zhenhuah,
	dev.jain, yang, chaitanyas.prakash, bigeasy, clrkwllms, rostedt,
	lorenzo.stoakes, ardb, jackmanb, vbabka, mhocko, linux-arm-kernel,
	linux-kernel, linux-rt-devel

Hi Ryan,

Thanks for your review :)

> > linear_map_split_to_ptes() currently allocates page tables while
> > splitting the linear mapping into PTEs under stop_machine() using GFP_ATOMIC.
> >
> > This is fine for non-PREEMPT_RT configurations.
> > However, it becomes problematic on PREEMPT_RT, because
> > generic memory allocation/free APIs (e.g. pgtable_alloc(), __get_free_pages(), etc.)
> > cannot be called from a non-preemptible context, except for the _nolock() variants.
> > This is because generic memory allocation/free paths are sleepable,
> > as they rely on spin_lock(), which becomes sleepable on PREEMPT_RT.
> >
> > In other words, even calling pgtable_alloc() with GFP_ATOMIC is not permitted
> > in __linear_map_split_to_pte() when it is executed by the stopper thread,
> >  where preemption is disabled on PREEMPT_RT.
> >
> > To address this, the required number of page tables is first collected
> > and preallocated, and the preallocated page tables are then used
> > when splitting the linear mapping in __linear_map_split_to_pte().
> >
> > Fixes: 3df6979d222b ("arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs")
> > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > ---
> >  arch/arm64/mm/mmu.c | 232 +++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 184 insertions(+), 48 deletions(-)
> >
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 9ae7ce00a7ef..96a9fa505e71 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -527,18 +527,15 @@ static void early_create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> >  		panic("Failed to create page tables\n");
> >  }
> >
> > -static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,
> > -				       enum pgtable_type pgtable_type)
> > -{
> > -	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
> > -	struct ptdesc *ptdesc = pagetable_alloc(gfp & ~__GFP_ZERO, 0);
> > -	phys_addr_t pa;
> > -
> > -	if (!ptdesc)
> > -		return INVALID_PHYS_ADDR;
> > -
> > -	pa = page_to_phys(ptdesc_page(ptdesc));
> > +static struct ptdesc **split_pgtables;
> > +static int split_pgtables_order;
> > +static unsigned long split_pgtables_count;
> > +static unsigned long split_pgtables_idx;
> >
> > +static __always_inline void __pgd_pgtable_init(struct mm_struct *mm,
> > +					       struct ptdesc *ptdesc,
> > +					       enum pgtable_type pgtable_type)
> > +{
> >  	switch (pgtable_type) {
> >  	case TABLE_PTE:
> >  		BUG_ON(!pagetable_pte_ctor(mm, ptdesc));
> > @@ -554,19 +551,28 @@ static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,
> >  		break;
> >  	}
> >
>
> nit: no need for this empty line

Okay. I'll remove it.

>
> > -	return pa;
> >  }
> >
> > -static phys_addr_t
> > -pgd_pgtable_alloc_init_mm_gfp(enum pgtable_type pgtable_type, gfp_t gfp)
> > +static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm, gfp_t gfp,
>
> nit: all remaining callers pass gfp=GFP_PGTABLE_KERNEL so you could remove the
> param?

Agree. I'll remove it.

>
> > +				       enum pgtable_type pgtable_type)
> >  {
> > -	return __pgd_pgtable_alloc(&init_mm, gfp, pgtable_type);
> > +	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
> > +	struct ptdesc *ptdesc = pagetable_alloc(gfp & ~__GFP_ZERO, 0);
> > +
> > +	if (!ptdesc)
> > +		return INVALID_PHYS_ADDR;
> > +
> > +	__pgd_pgtable_init(mm, ptdesc, pgtable_type);
> > +
> > +	return page_to_phys(ptdesc_page(ptdesc));
> >  }
> >
> > +typedef phys_addr_t (split_pgtable_alloc_fn)(enum pgtable_type);
>
> This type is used more generally than just for splitting. Perhaps simply call it
> "pgtable_alloc_fn"?
>
> We also pass this type around in __create_pgd_mapping() and friends; perhaps we
> should have a preparatory patch to define this type and consistently use it
> everywhere instead of passing around "phys_addr_t (*pgtable_alloc)(enum
> pgtable_type)"?

Oh, I missed that __create_pgd_mapping() uses the same callback type.
I'll follow your suggestion. Thanks!
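
(For reference, roughly what I have in mind, as a sketch only; I've written the
other parameter lists from memory, so treat them as illustrative:)

	/* shared allocator callback type, per your suggestion */
	typedef phys_addr_t (pgtable_alloc_fn)(enum pgtable_type);

	static int split_pmd(pmd_t *pmdp, pmd_t pmd,
			     pgtable_alloc_fn pgtable_alloc, bool to_cont);

	static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
					 unsigned long virt, phys_addr_t size,
					 pgprot_t prot,
					 pgtable_alloc_fn pgtable_alloc,
					 int flags);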

>
> > +
> >  static phys_addr_t __maybe_unused
>
> This is no longer __maybe_unused; you can drop the decorator.

Good point. Thanks!

>
> >  pgd_pgtable_alloc_init_mm(enum pgtable_type pgtable_type)
> >  {
> > -	return pgd_pgtable_alloc_init_mm_gfp(pgtable_type, GFP_PGTABLE_KERNEL);
> > +	return __pgd_pgtable_alloc(&init_mm, GFP_PGTABLE_KERNEL, pgtable_type);
> >  }
> >
> >  static phys_addr_t
> > @@ -575,6 +581,23 @@ pgd_pgtable_alloc_special_mm(enum pgtable_type pgtable_type)
> >  	return  __pgd_pgtable_alloc(NULL, GFP_PGTABLE_KERNEL, pgtable_type);
> >  }
> >
> > +static phys_addr_t
> > +pgd_pgtable_get_preallocated(enum pgtable_type pgtable_type)
> > +{
> > +	struct ptdesc *ptdesc;
> > +
> > +	if (WARN_ON(split_pgtables_idx >= split_pgtables_count))
> > +		return INVALID_PHYS_ADDR;
> > +
> > +	ptdesc = split_pgtables[split_pgtables_idx++];
> > +	if (!ptdesc)
> > +		return INVALID_PHYS_ADDR;
> > +
> > +	__pgd_pgtable_init(&init_mm, ptdesc, pgtable_type);
> > +
> > +	return page_to_phys(ptdesc_page(ptdesc));
> > +}
> > +
> >  static void split_contpte(pte_t *ptep)
> >  {
> >  	int i;
> > @@ -584,7 +607,9 @@ static void split_contpte(pte_t *ptep)
> >  		__set_pte(ptep, pte_mknoncont(__ptep_get(ptep)));
> >  }
> >
> > -static int split_pmd(pmd_t *pmdp, pmd_t pmd, gfp_t gfp, bool to_cont)
> > +static int split_pmd(pmd_t *pmdp, pmd_t pmd,
> > +		     split_pgtable_alloc_fn *pgtable_alloc_fn,
>
> nit: I believe the * has no effect when passing function pointers and the usual
> convention in Linux is to not use the *. Existing functions are also calling it
> "pgtable_alloc" so perhaps this is a bit more consistent:
>
> pgtable_alloc_fn pgtable_alloc
>
> (same nitty comment for all uses below :) )

You're right. It's just my *bad* habit. I'll remove it.
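
(Just to spell the nit out for myself: both forms below declare the same
parameter type, since a function-type parameter is adjusted to a pointer to
function, so only the naming really changes:)

	static int split_pmd(pmd_t *pmdp, pmd_t pmd,
			     split_pgtable_alloc_fn *pgtable_alloc_fn, bool to_cont);
	static int split_pmd(pmd_t *pmdp, pmd_t pmd,
			     pgtable_alloc_fn pgtable_alloc, bool to_cont);
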
>
> > +		     bool to_cont)
> >  {
> >  	pmdval_t tableprot = PMD_TYPE_TABLE | PMD_TABLE_UXN | PMD_TABLE_AF;
> >  	unsigned long pfn = pmd_pfn(pmd);
> > @@ -593,7 +618,7 @@ static int split_pmd(pmd_t *pmdp, pmd_t pmd, gfp_t gfp, bool to_cont)
> >  	pte_t *ptep;
> >  	int i;
> >
> > -	pte_phys = pgd_pgtable_alloc_init_mm_gfp(TABLE_PTE, gfp);
> > +	pte_phys = pgtable_alloc_fn(TABLE_PTE);
> >  	if (pte_phys == INVALID_PHYS_ADDR)
> >  		return -ENOMEM;
> >  	ptep = (pte_t *)phys_to_virt(pte_phys);
> > @@ -628,7 +653,9 @@ static void split_contpmd(pmd_t *pmdp)
> >  		set_pmd(pmdp, pmd_mknoncont(pmdp_get(pmdp)));
> >  }
> >
> > -static int split_pud(pud_t *pudp, pud_t pud, gfp_t gfp, bool to_cont)
> > +static int split_pud(pud_t *pudp, pud_t pud,
> > +		     split_pgtable_alloc_fn *pgtable_alloc_fn,
> > +		     bool to_cont)
> >  {
> >  	pudval_t tableprot = PUD_TYPE_TABLE | PUD_TABLE_UXN | PUD_TABLE_AF;
> >  	unsigned int step = PMD_SIZE >> PAGE_SHIFT;
> > @@ -638,7 +665,7 @@ static int split_pud(pud_t *pudp, pud_t pud, gfp_t gfp, bool to_cont)
> >  	pmd_t *pmdp;
> >  	int i;
> >
> > -	pmd_phys = pgd_pgtable_alloc_init_mm_gfp(TABLE_PMD, gfp);
> > +	pmd_phys = pgtable_alloc_fn(TABLE_PMD);
> >  	if (pmd_phys == INVALID_PHYS_ADDR)
> >  		return -ENOMEM;
> >  	pmdp = (pmd_t *)phys_to_virt(pmd_phys);
> > @@ -707,7 +734,7 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
> >  	if (!pud_present(pud))
> >  		goto out;
> >  	if (pud_leaf(pud)) {
> > -		ret = split_pud(pudp, pud, GFP_PGTABLE_KERNEL, true);
> > +		ret = split_pud(pudp, pud, pgd_pgtable_alloc_init_mm, true);
> >  		if (ret)
> >  			goto out;
> >  	}
> > @@ -732,7 +759,7 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
> >  		 */
> >  		if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
> >  			goto out;
> > -		ret = split_pmd(pmdp, pmd, GFP_PGTABLE_KERNEL, true);
> > +		ret = split_pmd(pmdp, pmd, pgd_pgtable_alloc_init_mm, true);
> >  		if (ret)
> >  			goto out;
> >  	}
> > @@ -831,34 +858,35 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
> >  static int split_to_ptes_pud_entry(pud_t *pudp, unsigned long addr,
> >  				   unsigned long next, struct mm_walk *walk)
> >  {
> > -	gfp_t gfp = *(gfp_t *)walk->private;
> > +	split_pgtable_alloc_fn *pgtable_alloc_fn = walk->private;
> >  	pud_t pud = pudp_get(pudp);
> > -	int ret = 0;
> >
> > -	if (pud_leaf(pud))
> > -		ret = split_pud(pudp, pud, gfp, false);
> > +	if (!pud_leaf(pud))
> > +		return 0;
> >
> > -	return ret;
> > +	return split_pud(pudp, pud, pgtable_alloc_fn, false);
>
> why are you changing the layout of this function? Seems like unnecessary churn.
> Just pass pgtable_alloc to split_pud() instead of gfp.
>
> >  }
> >
> >  static int split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
> >  				   unsigned long next, struct mm_walk *walk)
> >  {
> > -	gfp_t gfp = *(gfp_t *)walk->private;
> > +	split_pgtable_alloc_fn *pgtable_alloc_fn = walk->private;
> >  	pmd_t pmd = pmdp_get(pmdp);
> > -	int ret = 0;
> > +	int ret;
> >
> > -	if (pmd_leaf(pmd)) {
> > -		if (pmd_cont(pmd))
> > -			split_contpmd(pmdp);
> > -		ret = split_pmd(pmdp, pmd, gfp, false);
> > +	if (!pmd_leaf(pmd))
> > +		return 0;
> >
> > -		/*
> > -		 * We have split the pmd directly to ptes so there is no need to
> > -		 * visit each pte to check if they are contpte.
> > -		 */
> > -		walk->action = ACTION_CONTINUE;
> > -	}
> > +	if (pmd_cont(pmd))
> > +		split_contpmd(pmdp);
> > +
> > +	ret = split_pmd(pmdp, pmd, pgtable_alloc_fn, false);
> > +
> > +	/*
> > +	 * We have split the pmd directly to ptes so there is no need to
> > +	 * visit each pte to check if they are contpte.
> > +	 */
> > +	walk->action = ACTION_CONTINUE;
>
> Same comment; no need to change the layout of the function.

Okay. I'll keep the former layout.

>
> >
> >  	return ret;
> >  }
> > @@ -880,13 +908,15 @@ static const struct mm_walk_ops split_to_ptes_ops = {
> >  	.pte_entry	= split_to_ptes_pte_entry,
> >  };
> >
> > -static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp)
> > +static int range_split_to_ptes(unsigned long start, unsigned long end,
> > +			       split_pgtable_alloc_fn *pgtable_alloc_fn)
> >  {
> >  	int ret;
> >
> >  	arch_enter_lazy_mmu_mode();
> >  	ret = walk_kernel_page_table_range_lockless(start, end,
> > -					&split_to_ptes_ops, NULL, &gfp);
> > +						    &split_to_ptes_ops, NULL,
> > +						    pgtable_alloc_fn);
> >  	arch_leave_lazy_mmu_mode();
> >
> >  	return ret;
> > @@ -903,6 +933,105 @@ static void __init init_idmap_kpti_bbml2_flag(void)
> >  	smp_mb();
> >  }
> >
> > +static int __init
> > +collect_to_split_pud_entry(pud_t *pudp, unsigned long addr,
> > +			   unsigned long next, struct mm_walk *walk)
> > +{
> > +	pud_t pud = pudp_get(pudp);
> > +
> > +	if (pud_leaf(pud))
> > +		split_pgtables_count += 1 + PTRS_PER_PMD;
>
> I think you probably want:
>
> 	walk->action = ACTION_CONTINUE;
>
> Likely you will end up with the same behaviour regardless. But you should at
> least be consistent with collect_to_split_pmd_entry().

My fault for missing that. Thanks for catching it :)
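
(So, if I've read the suggestion right, roughly:)

	static int __init
	collect_to_split_pud_entry(pud_t *pudp, unsigned long addr,
				   unsigned long next, struct mm_walk *walk)
	{
		pud_t pud = pudp_get(pudp);

		if (pud_leaf(pud)) {
			split_pgtables_count += 1 + PTRS_PER_PMD;
			/* nothing below a leaf entry to descend into */
			walk->action = ACTION_CONTINUE;
		}

		return 0;
	}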

>
> > +
> > +	return 0;
> > +}
> > +
> > +static int __init
> > +collect_to_split_pmd_entry(pmd_t *pmdp, unsigned long addr,
> > +			   unsigned long next, struct mm_walk *walk)
> > +{
> > +	pmd_t pmd = pmdp_get(pmdp);
> > +
> > +	if (pmd_leaf(pmd))
> > +		split_pgtables_count++;
> > +
> > +	walk->action = ACTION_CONTINUE;
> > +
> > +	return 0;
> > +}
> > +
> > +static void __init linear_map_free_split_pgtables(void)
> > +{
> > +	int i;
> > +
> > +	if (!split_pgtables_count || !split_pgtables)
> > +		goto skip_free;
> > +
> > +	for (i = split_pgtables_idx; i < split_pgtables_count; i++) {
> > +		if (split_pgtables[i])
> > +			pagetable_free(split_pgtables[i]);
> > +	}
> > +
> > +	free_pages((unsigned long)split_pgtables, split_pgtables_order);
> > +
> > +skip_free:
> > +	split_pgtables = NULL;
> > +	split_pgtables_count = 0;
> > +	split_pgtables_idx = 0;
> > +	split_pgtables_order = 0;
> > +}
> > +
> > +static int __init linear_map_prealloc_split_pgtables(void)
> > +{
> > +	int ret, i;
> > +	unsigned long lstart = _PAGE_OFFSET(vabits_actual);
> > +	unsigned long lend = PAGE_END;
> > +	unsigned long kstart = (unsigned long)lm_alias(_stext);
> > +	unsigned long kend = (unsigned long)lm_alias(__init_begin);
> > +
> > +	const struct mm_walk_ops collect_to_split_ops = {
> > +		.pud_entry	= collect_to_split_pud_entry,
> > +		.pmd_entry	= collect_to_split_pmd_entry
> > +	};
> > +
> > +	split_pgtables_idx = 0;
> > +	split_pgtables_count = 0;
> > +
> > +	ret = walk_kernel_page_table_range_lockless(lstart, kstart,
> > +						    &collect_to_split_ops,
> > +						    NULL, NULL);
> > +	if (!ret)
> > +		ret = walk_kernel_page_table_range_lockless(kend, lend,
> > +							    &collect_to_split_ops,
> > +							    NULL, NULL);
> > +	if (ret || !split_pgtables_count)
> > +		goto error;
> > +
> > +	ret = -ENOMEM;
> > +
> > +	split_pgtables_order =
> > +		order_base_2(PAGE_ALIGN(split_pgtables_count *
> > +					sizeof(struct ptdesc *)) >> PAGE_SHIFT);
>
> Wouldn't this be simpler?
>
> 	split_pgtables_order = get_order(split_pgtables_count *
> 					 sizeof(struct ptdesc *));
>

Yes, that would be simpler. But I think we can use
kvmalloc() for split_pgtables, since
linear_map_prealloc_split_pgtables() is called after mm_core_init().
So the order computation could be dropped entirely, or am I missing something?

> > +
> > +	split_pgtables = (struct ptdesc **) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> > +							     split_pgtables_order);
>
> Do you really need the cast? (I'm not sure).

Since __get_free_pages() returns an "unsigned long", the cast is required.
But as I said above, I think it could be replaced with kvmalloc()
for split_pgtables.
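
(As a sketch of what I mean, assuming kvcalloc()/kvfree() are fine to use at
this point:)

	split_pgtables = kvcalloc(split_pgtables_count,
				  sizeof(struct ptdesc *), GFP_KERNEL);
	if (!split_pgtables)
		goto error;

	/* ... and in linear_map_free_split_pgtables(), instead of free_pages(): */
	kvfree(split_pgtables);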

>
> > +	if (!split_pgtables)
> > +		goto error;
> > +
> > +	for (i = 0; i < split_pgtables_count; i++) {
> > +		split_pgtables[i] = pagetable_alloc(GFP_KERNEL, 0);
>
> For consistency with other code, perhaps spell it out?:
>
> 	/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
> 	gfp_t gfp = GFP_PGTABLE_KERNEL & ~__GFP_ZERO;

Using the gfp_t gfp is fine, but the comment should be changed,
since init_clear_pgtable() isn't used while splitting; instead,
splitting fills the pgtable with the formerly mapped pages.

I'll use gfp with the proper comment. Thanks!
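
(i.e. something along these lines, with the comment wording still a first stab:)

	/*
	 * No need for __GFP_ZERO here: when a block mapping is split, every
	 * entry of the new table is populated from the previously mapped
	 * pages, so zeroing it up front would be wasted work.
	 */
	gfp_t gfp = GFP_PGTABLE_KERNEL & ~__GFP_ZERO;

	for (i = 0; i < split_pgtables_count; i++) {
		split_pgtables[i] = pagetable_alloc(gfp, 0);
		if (!split_pgtables[i])
			goto error;
	}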

>
> 	for (i = 0; i < split_pgtables_count; i++) {
> 		split_pgtables[i] = pagetable_alloc(gfp, 0);
>
> > +		if (!split_pgtables[i])
> > +			goto error;
> > +	}
> > +
> > +	ret = 0;
> > +
> > +error:
> > +	if (ret)
> > +		linear_map_free_split_pgtables();
> > +
> > +	return ret;
> > +}
>
> I wonder if there is value in generalizing this a bit to separate out the
> determination of the number of pages from the actual pre-allocation and free
> functions? Then you have a reusable pre-allocation function that you could also
> use for the KPTI case instead of having it have yet another private pre-allocator?

I saw your later comment on this :D, so I'll skip this suggestion.
>
> > +
> >  static int __init linear_map_split_to_ptes(void *__unused)
> >  {
> >  	/*
> > @@ -928,9 +1057,9 @@ static int __init linear_map_split_to_ptes(void *__unused)
> >  		 * PTE. The kernel alias remains static throughout runtime so
> >  		 * can continue to be safely mapped with large mappings.
> >  		 */
> > -		ret = range_split_to_ptes(lstart, kstart, GFP_ATOMIC);
> > +		ret = range_split_to_ptes(lstart, kstart, pgd_pgtable_get_preallocated);
> >  		if (!ret)
> > -			ret = range_split_to_ptes(kend, lend, GFP_ATOMIC);
> > +			ret = range_split_to_ptes(kend, lend, pgd_pgtable_get_preallocated);
> >  		if (ret)
> >  			panic("Failed to split linear map\n");
> >  		flush_tlb_kernel_range(lstart, lend);
> > @@ -963,10 +1092,16 @@ static int __init linear_map_split_to_ptes(void *__unused)
> >
> >  void __init linear_map_maybe_split_to_ptes(void)
> >  {
> > -	if (linear_map_requires_bbml2 && !system_supports_bbml2_noabort()) {
> > -		init_idmap_kpti_bbml2_flag();
> > -		stop_machine(linear_map_split_to_ptes, NULL, cpu_online_mask);
> > -	}
> > +	if (!linear_map_requires_bbml2 || system_supports_bbml2_noabort())
> > +		return;
> > +
> > +	if (linear_map_prealloc_split_pgtables())
> > +		panic("Failed to split linear map\n");
> > +
> > +	init_idmap_kpti_bbml2_flag();
> > +	stop_machine(linear_map_split_to_ptes, NULL, cpu_online_mask);
> > +
> > +	linear_map_free_split_pgtables();
> >  }
> >
> >  /*
> > @@ -1088,6 +1223,7 @@ bool arch_kfence_init_pool(void)
> >  	unsigned long end = start + KFENCE_POOL_SIZE;
> >  	int ret;
> >
> > +
>
> nit: Remove extra empty line.

Okay. I'll remove it.

>
> This is looking much cleaner now; nearly there!

Thanks for your detailed review!

>
> Thanks,
> Ryan
>
>
> >  	/* Exit early if we know the linear map is already pte-mapped. */
> >  	if (!split_leaf_mapping_possible())
> >  		return true;
> > @@ -1097,7 +1233,7 @@ bool arch_kfence_init_pool(void)
> >  		return true;
> >
> >  	mutex_lock(&pgtable_split_lock);
> > -	ret = range_split_to_ptes(start, end, GFP_PGTABLE_KERNEL);
> > +	ret = range_split_to_ptes(start, end, pgd_pgtable_alloc_init_mm);
> >  	mutex_unlock(&pgtable_split_lock);
> >
> >  	/*
>

--
Sincerely,
Yeoreum Yun


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] arm64: mmu: avoid allocating pages while splitting the linear mapping
  2026-01-02 12:21     ` Yeoreum Yun
@ 2026-01-02 12:35       ` Ryan Roberts
  0 siblings, 0 replies; 11+ messages in thread
From: Ryan Roberts @ 2026-01-02 12:35 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: catalin.marinas, will, akpm, david, kevin.brodsky, quic_zhenhuah,
	dev.jain, yang, chaitanyas.prakash, bigeasy, clrkwllms, rostedt,
	lorenzo.stoakes, ardb, jackmanb, vbabka, mhocko, linux-arm-kernel,
	linux-kernel, linux-rt-devel

On 02/01/2026 12:21, Yeoreum Yun wrote:
> Hi Ryan,
> 
> Thanks for your review :)
> 
> [...]
>>> +	split_pgtables_order =
>>> +		order_base_2(PAGE_ALIGN(split_pgtables_count *
>>> +					sizeof(struct ptdesc *)) >> PAGE_SHIFT);
>>
>> Wouldn't this be simpler?
>>
>> 	split_pgtables_order = get_order(split_pgtables_count *
>> 					 sizeof(struct ptdesc *));
>>
> 
> Yes, that would be simpler. But I think we can use
> kvmalloc() for split_pgtables, since
> linear_map_prealloc_split_pgtables() is called after mm_core_init().
> So the order computation could be dropped entirely, or am I missing something?
> 


If kvmalloc is usable at this point, I agree that would be much better.

Thanks,
Ryan



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] arm64: mmu: avoid allocating pages while installing ng-mapping for KPTI
  2026-01-02 11:10   ` Ryan Roberts
@ 2026-01-02 14:48     ` Yeoreum Yun
  0 siblings, 0 replies; 11+ messages in thread
From: Yeoreum Yun @ 2026-01-02 14:48 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: catalin.marinas, will, akpm, david, kevin.brodsky, quic_zhenhuah,
	dev.jain, yang, chaitanyas.prakash, bigeasy, clrkwllms, rostedt,
	lorenzo.stoakes, ardb, jackmanb, vbabka, mhocko, linux-arm-kernel,
	linux-kernel, linux-rt-devel

Hi Ryan,

> On 18/12/2025 19:47, Yeoreum Yun wrote:
> > The current __kpti_install_ng_mappings() allocates a temporary PGD
> > while installing the NG mapping for KPTI under stop_machine(),
> > using GFP_ATOMIC.
> >
> > This is fine in the non-PREEMPT_RT case. However, it becomes a problem
> > under PREEMPT_RT because generic memory allocation/free APIs
> > (e.g., pgtable_alloc(), __get_free_pages(), etc.) cannot be invoked
> > in a non-preemptible context, except for the *_nolock() variants.
> > These generic allocators may sleep due to their use of spin_lock().
> >
> > In other words, calling __get_free_pages(), even with GFP_ATOMIC,
> > is not allowed in __kpti_install_ng_mappings(), which is executed by
> > the stopper thread where preemption is disabled under PREEMPT_RT.
> >
> > To address this, preallocate the page needed for the temporary PGD
> > before invoking __kpti_install_ng_mappings() via stop_machine().
> >
> > Fixes: 47546a1912fc ("arm64: mm: install KPTI nG mappings with MMU enabled")
> > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> > ---
> >  arch/arm64/mm/mmu.c | 22 +++++++++++++---------
> >  1 file changed, 13 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 96a9fa505e71..9ad9612728e6 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -1369,7 +1369,7 @@ static phys_addr_t __init kpti_ng_pgd_alloc(enum pgtable_type type)
> >  	return kpti_ng_temp_alloc;
> >  }
> >
> > -static int __init __kpti_install_ng_mappings(void *__unused)
> > +static int __init __kpti_install_ng_mappings(void *data)
> >  {
> >  	typedef void (kpti_remap_fn)(int, int, phys_addr_t, unsigned long);
> >  	extern kpti_remap_fn idmap_kpti_install_ng_mappings;
> > @@ -1377,10 +1377,9 @@ static int __init __kpti_install_ng_mappings(void *__unused)
> >
> >  	int cpu = smp_processor_id();
> >  	int levels = CONFIG_PGTABLE_LEVELS;
> > -	int order = order_base_2(levels);
> >  	u64 kpti_ng_temp_pgd_pa = 0;
> >  	pgd_t *kpti_ng_temp_pgd;
> > -	u64 alloc = 0;
> > +	u64 alloc = *(u64 *)data;
> >
> >  	if (levels == 5 && !pgtable_l5_enabled())
> >  		levels = 4;
> > @@ -1391,8 +1390,6 @@ static int __init __kpti_install_ng_mappings(void *__unused)
> >
> >  	if (!cpu) {
> >  		int ret;
> > -
> > -		alloc = __get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
> >  		kpti_ng_temp_pgd = (pgd_t *)(alloc + (levels - 1) * PAGE_SIZE);
> >  		kpti_ng_temp_alloc = kpti_ng_temp_pgd_pa = __pa(kpti_ng_temp_pgd);
> >
> > @@ -1423,16 +1420,17 @@ static int __init __kpti_install_ng_mappings(void *__unused)
> >  	remap_fn(cpu, num_online_cpus(), kpti_ng_temp_pgd_pa, KPTI_NG_TEMP_VA);
> >  	cpu_uninstall_idmap();
> >
> > -	if (!cpu) {
> > -		free_pages(alloc, order);
> > +	if (!cpu)
> >  		arm64_use_ng_mappings = true;
> > -	}
> >
> >  	return 0;
> >  }
> >
> >  void __init kpti_install_ng_mappings(void)
> >  {
> > +	int order = order_base_2(CONFIG_PGTABLE_LEVELS);
> > +	u64 alloc;
> > +
>
> nit: Restore the blank line between the variable definitions and the logic.

Okay. I'll restore that line.

Thanks!

--
Sincerely,
Yeoreum Yun


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-01-02 14:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-18 19:47 [PATCH v3 0/2] fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64 Yeoreum Yun
2025-12-18 19:47 ` [PATCH v3 1/2] arm64: mmu: avoid allocating pages while splitting the linear mapping Yeoreum Yun
2026-01-02 11:03   ` Ryan Roberts
2026-01-02 11:09     ` Ryan Roberts
2026-01-02 12:21     ` Yeoreum Yun
2026-01-02 12:35       ` Ryan Roberts
2025-12-18 19:47 ` [PATCH v3 2/2] arm64: mmu: avoid allocating pages while installing ng-mapping for KPTI Yeoreum Yun
2026-01-02 11:10   ` Ryan Roberts
2026-01-02 14:48     ` Yeoreum Yun
2025-12-31 10:07 ` [PATCH v3 0/2] fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64 Yeoreum Yun
2025-12-31 12:34   ` David Hildenbrand (Red Hat)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).