* [PATCH v2 0/3] Don't sleep in split_kernel_leaf_mapping() when in atomic context
@ 2025-11-06 16:09 Ryan Roberts
  2025-11-06 16:09 ` [PATCH v2 1/3] arm64: mm: " Ryan Roberts
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Ryan Roberts @ 2025-11-06 16:09 UTC (permalink / raw)
  To: catalin.marinas, will, yang, david, ardb, dev.jain, scott, cl
  Cc: Ryan Roberts, linux-arm-kernel, linux-kernel

Hi Will,

This is v2 of the fix for split_kernel_leaf_mapping(). I've expanded it into 3
patches based on feedback from v1 [1].

Once you're happy with the content, patch 1 is needed urgently for the next
-rc to fix a regression introduced in 6.18-rc1. The other patches could wait
until 6.19, but I'd prefer they all go into 6.18 together.

Changes since v1 [1]
====================

Patch 1: The fix
  - Removed arch_kfence_init_pool() declaration for !KFENCE case (per Will)
  - Removed lazy mode mmu optimization (now separate patch) (per Will)
  - Simplified arch_kfence_init_pool() return expression (per Will)
  - Added comment about not needing tlbi
  - Generalized comment softirq -> atomic (per Yang Shi)
Patch 2: lazy mode mmu optimization (per Will)
Patch 3: force_pte_mapping() tidy ups (per David)

[1] https://lore.kernel.org/linux-arm-kernel/20251103125738.3073566-1-ryan.roberts@arm.com/

Thanks,
Ryan


Ryan Roberts (3):
  arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic
    context
  arm64: mm: Optimize range_split_to_ptes()
  arm64: mm: Tidy up force_pte_mapping()

 arch/arm64/include/asm/kfence.h |   3 +-
 arch/arm64/mm/mmu.c             | 111 +++++++++++++++++++++++---------
 2 files changed, 81 insertions(+), 33 deletions(-)

--
2.43.0



* [PATCH v2 1/3] arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context
  2025-11-06 16:09 [PATCH v2 0/3] Don't sleep in split_kernel_leaf_mapping() when in atomic context Ryan Roberts
@ 2025-11-06 16:09 ` Ryan Roberts
  2025-11-06 20:46   ` Yang Shi
  2025-11-06 21:08   ` David Hildenbrand (Red Hat)
  2025-11-06 16:09 ` [PATCH v2 2/3] arm64: mm: Optimize range_split_to_ptes() Ryan Roberts
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Ryan Roberts @ 2025-11-06 16:09 UTC (permalink / raw)
  To: catalin.marinas, will, yang, david, ardb, dev.jain, scott, cl
  Cc: Ryan Roberts, linux-arm-kernel, linux-kernel, Guenter Roeck

It has been reported that split_kernel_leaf_mapping() is trying to sleep
in non-sleepable context. It does this when acquiring the
pgtable_split_lock mutex; that path is reached when either
CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is enabled, since those features
change linear map permissions from softirq context during memory
allocation and/or freeing. All other paths into this function are called
from sleepable context and so are safe.
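
For context, a minimal sketch of the sort of call chain that triggers the
report; kfence_protect_page() and set_memory_valid() are taken from
arch/arm64/include/asm/kfence.h below, while the exact intermediate callers
are illustrative assumptions rather than a literal backtrace:

	/*
	 * softirq context (e.g. KFENCE (un)protecting a page on free):
	 *
	 *   kfence_protect_page(addr, protect)
	 *     -> set_memory_valid(addr, 1, !protect)
	 *       -> split_kernel_leaf_mapping(start, end)
	 *         -> mutex_lock(&pgtable_split_lock)  <- may sleep: not allowed here
	 */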

But it turns out that the memory whose permissions these two features may
attempt to modify is always pte-mapped, so there is no need to attempt to
split the mapping. So let's exit early in these cases and avoid taking the
mutex.

There is one wrinkle to this approach: late-initialized kfence allocates
its pool from the buddy allocator, so the pool may be block-mapped. We
must therefore hook that allocation and convert it to pte mappings up
front. Previously this was done as a side-effect of kfence protecting all
the individual pages in its pool at init time, but that no longer works
due to the early exit path added to split_kernel_leaf_mapping().

So instead, do this via the existing arch_kfence_init_pool() arch hook,
and reuse the existing linear_map_split_to_ptes() infrastructure.
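
For reference, a rough sketch of how the generic KFENCE code consumes this
hook; this is paraphrased from memory of mm/kfence/core.c, so treat the
exact control flow as an assumption:

	/*
	 * kfence_init_pool() (mm/kfence/core.c) calls arch_kfence_init_pool()
	 * before setting up the pool pages, and aborts pool initialisation if
	 * it returns false. arm64 now uses this hook to split any block
	 * mappings covering __kfence_pool down to ptes, so the later per-page
	 * kfence_protect_page() calls never need to split (and so never need
	 * the mutex).
	 */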

Closes: https://lore.kernel.org/all/f24b9032-0ec9-47b1-8b95-c0eeac7a31c5@roeck-us.net/
Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
Tested-by: Guenter Roeck <groeck@google.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kfence.h |  3 +-
 arch/arm64/mm/mmu.c             | 92 +++++++++++++++++++++++----------
 2 files changed, 67 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/kfence.h b/arch/arm64/include/asm/kfence.h
index a81937fae9f6..21dbc9dda747 100644
--- a/arch/arm64/include/asm/kfence.h
+++ b/arch/arm64/include/asm/kfence.h
@@ -10,8 +10,6 @@
 
 #include <asm/set_memory.h>
 
-static inline bool arch_kfence_init_pool(void) { return true; }
-
 static inline bool kfence_protect_page(unsigned long addr, bool protect)
 {
 	set_memory_valid(addr, 1, !protect);
@@ -25,6 +23,7 @@ static inline bool arm64_kfence_can_set_direct_map(void)
 {
 	return !kfence_early_init;
 }
+bool arch_kfence_init_pool(void);
 #else /* CONFIG_KFENCE */
 static inline bool arm64_kfence_can_set_direct_map(void) { return false; }
 #endif /* CONFIG_KFENCE */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b8d37eb037fc..a364ac2c9c61 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -708,6 +708,16 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
 	return ret;
 }
 
+static inline bool force_pte_mapping(void)
+{
+	bool bbml2 = system_capabilities_finalized() ?
+		system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
+
+	return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
+			   is_realm_world())) ||
+		debug_pagealloc_enabled();
+}
+
 static DEFINE_MUTEX(pgtable_split_lock);
 
 int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
@@ -723,6 +733,16 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
 	if (!system_supports_bbml2_noabort())
 		return 0;
 
+	/*
+	 * If the region is within a pte-mapped area, there is no need to try to
+	 * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
+	 * change permissions from atomic context so for those cases (which are
+	 * always pte-mapped), we must not go any further because taking the
+	 * mutex below may sleep.
+	 */
+	if (force_pte_mapping() || is_kfence_address((void *)start))
+		return 0;
+
 	/*
 	 * Ensure start and end are at least page-aligned since this is the
 	 * finest granularity we can split to.
@@ -758,30 +778,30 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
 	return ret;
 }
 
-static int __init split_to_ptes_pud_entry(pud_t *pudp, unsigned long addr,
-					  unsigned long next,
-					  struct mm_walk *walk)
+static int split_to_ptes_pud_entry(pud_t *pudp, unsigned long addr,
+				   unsigned long next, struct mm_walk *walk)
 {
+	gfp_t gfp = *(gfp_t *)walk->private;
 	pud_t pud = pudp_get(pudp);
 	int ret = 0;
 
 	if (pud_leaf(pud))
-		ret = split_pud(pudp, pud, GFP_ATOMIC, false);
+		ret = split_pud(pudp, pud, gfp, false);
 
 	return ret;
 }
 
-static int __init split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
-					  unsigned long next,
-					  struct mm_walk *walk)
+static int split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
+				   unsigned long next, struct mm_walk *walk)
 {
+	gfp_t gfp = *(gfp_t *)walk->private;
 	pmd_t pmd = pmdp_get(pmdp);
 	int ret = 0;
 
 	if (pmd_leaf(pmd)) {
 		if (pmd_cont(pmd))
 			split_contpmd(pmdp);
-		ret = split_pmd(pmdp, pmd, GFP_ATOMIC, false);
+		ret = split_pmd(pmdp, pmd, gfp, false);
 
 		/*
 		 * We have split the pmd directly to ptes so there is no need to
@@ -793,9 +813,8 @@ static int __init split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
 	return ret;
 }
 
-static int __init split_to_ptes_pte_entry(pte_t *ptep, unsigned long addr,
-					  unsigned long next,
-					  struct mm_walk *walk)
+static int split_to_ptes_pte_entry(pte_t *ptep, unsigned long addr,
+				   unsigned long next, struct mm_walk *walk)
 {
 	pte_t pte = __ptep_get(ptep);
 
@@ -805,12 +824,18 @@ static int __init split_to_ptes_pte_entry(pte_t *ptep, unsigned long addr,
 	return 0;
 }
 
-static const struct mm_walk_ops split_to_ptes_ops __initconst = {
+static const struct mm_walk_ops split_to_ptes_ops = {
 	.pud_entry	= split_to_ptes_pud_entry,
 	.pmd_entry	= split_to_ptes_pmd_entry,
 	.pte_entry	= split_to_ptes_pte_entry,
 };
 
+static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp)
+{
+	return walk_kernel_page_table_range_lockless(start, end,
+					&split_to_ptes_ops, NULL, &gfp);
+}
+
 static bool linear_map_requires_bbml2 __initdata;
 
 u32 idmap_kpti_bbml2_flag;
@@ -847,11 +872,9 @@ static int __init linear_map_split_to_ptes(void *__unused)
 		 * PTE. The kernel alias remains static throughout runtime so
 		 * can continue to be safely mapped with large mappings.
 		 */
-		ret = walk_kernel_page_table_range_lockless(lstart, kstart,
-						&split_to_ptes_ops, NULL, NULL);
+		ret = range_split_to_ptes(lstart, kstart, GFP_ATOMIC);
 		if (!ret)
-			ret = walk_kernel_page_table_range_lockless(kend, lend,
-						&split_to_ptes_ops, NULL, NULL);
+			ret = range_split_to_ptes(kend, lend, GFP_ATOMIC);
 		if (ret)
 			panic("Failed to split linear map\n");
 		flush_tlb_kernel_range(lstart, lend);
@@ -1002,6 +1025,33 @@ static void __init arm64_kfence_map_pool(phys_addr_t kfence_pool, pgd_t *pgdp)
 	memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
 	__kfence_pool = phys_to_virt(kfence_pool);
 }
+
+bool arch_kfence_init_pool(void)
+{
+	unsigned long start = (unsigned long)__kfence_pool;
+	unsigned long end = start + KFENCE_POOL_SIZE;
+	int ret;
+
+	/* Exit early if we know the linear map is already pte-mapped. */
+	if (!system_supports_bbml2_noabort() || force_pte_mapping())
+		return true;
+
+	/* Kfence pool is already pte-mapped for the early init case. */
+	if (kfence_early_init)
+		return true;
+
+	mutex_lock(&pgtable_split_lock);
+	ret = range_split_to_ptes(start, end, GFP_PGTABLE_KERNEL);
+	mutex_unlock(&pgtable_split_lock);
+
+	/*
+	 * Since the system supports bbml2_noabort, tlb invalidation is not
+	 * required here; the pgtable mappings have been split to pte but larger
+	 * entries may safely linger in the TLB.
+	 */
+
+	return !ret;
+}
 #else /* CONFIG_KFENCE */
 
 static inline phys_addr_t arm64_kfence_alloc_pool(void) { return 0; }
@@ -1009,16 +1059,6 @@ static inline void arm64_kfence_map_pool(phys_addr_t kfence_pool, pgd_t *pgdp) {
 
 #endif /* CONFIG_KFENCE */
 
-static inline bool force_pte_mapping(void)
-{
-	bool bbml2 = system_capabilities_finalized() ?
-		system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
-
-	return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
-			   is_realm_world())) ||
-		debug_pagealloc_enabled();
-}
-
 static void __init map_mem(pgd_t *pgdp)
 {
 	static const u64 direct_map_end = _PAGE_END(VA_BITS_MIN);
-- 
2.43.0



* [PATCH v2 2/3] arm64: mm: Optimize range_split_to_ptes()
  2025-11-06 16:09 [PATCH v2 0/3] Don't sleep in split_kernel_leaf_mapping() when in atomic context Ryan Roberts
  2025-11-06 16:09 ` [PATCH v2 1/3] arm64: mm: " Ryan Roberts
@ 2025-11-06 16:09 ` Ryan Roberts
  2025-11-06 20:47   ` Yang Shi
  2025-11-06 16:09 ` [PATCH v2 3/3] arm64: mm: Tidy up force_pte_mapping() Ryan Roberts
  2025-11-07 15:53 ` [PATCH v2 0/3] Don't sleep in split_kernel_leaf_mapping() when in atomic context Will Deacon
  3 siblings, 1 reply; 11+ messages in thread
From: Ryan Roberts @ 2025-11-06 16:09 UTC (permalink / raw)
  To: catalin.marinas, will, yang, david, ardb, dev.jain, scott, cl
  Cc: Ryan Roberts, linux-arm-kernel, linux-kernel

Enter lazy_mmu mode while splitting a range of memory to pte mappings.
This causes barriers, which would otherwise be emitted after every pte
(and pmd/pud) write, to be deferred until exiting lazy_mmu mode.
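
A minimal sketch of the batching effect, assuming arm64's current lazy_mmu
implementation; the exact barrier sequence is stated from memory and should
be treated as an assumption:

	/*
	 * Without lazy_mmu, each pgtable write is followed by its own barriers:
	 *
	 *	__set_pte(ptep, pte);		// store + dsb(ishst) + isb
	 *	__set_pte(ptep + 1, pte2);	// store + dsb(ishst) + isb
	 *
	 * Inside lazy_mmu, the stores are still issued immediately but the
	 * barriers are deferred and emitted once:
	 *
	 *	arch_enter_lazy_mmu_mode();
	 *	__set_pte(ptep, pte);		// store only
	 *	__set_pte(ptep + 1, pte2);	// store only
	 *	arch_leave_lazy_mmu_mode();	// one dsb(ishst) + isb for the batch
	 */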

For large systems, this is expected to significantly speed up fallback
to pte-mapping the linear map for the case where the boot CPU has
BBML2_NOABORT, but secondary CPUs do not. I haven't directly measured
it, but this is equivalent to commit 1fcb7cea8a5f ("arm64: mm: Batch dsb
and isb when populating pgtables").

Note that for the path from arch_kfence_init_pool(), we may sleep while
allocating memory inside the lazy_mmu mode. Sleeping is not allowed by
generic code inside lazy_mmu, but we know that the arm64 implementation
is sleep-safe. So this is ok and follows the same pattern already used
by split_kernel_leaf_mapping().
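
For clarity, a short note on where that sleep can come from; the GFP
definition is stated from memory, so treat it as an assumption:

	/*
	 * GFP_PGTABLE_KERNEL is GFP_KERNEL | __GFP_ZERO, so the pgtable
	 * allocations made while splitting on the arch_kfence_init_pool()
	 * path may block. The linear_map_split_to_ptes() path keeps using
	 * GFP_ATOMIC and so never sleeps inside lazy_mmu.
	 */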

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/mm/mmu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a364ac2c9c61..652bb8c14035 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -832,8 +832,14 @@ static const struct mm_walk_ops split_to_ptes_ops = {
 
 static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp)
 {
-	return walk_kernel_page_table_range_lockless(start, end,
+	int ret;
+
+	arch_enter_lazy_mmu_mode();
+	ret = walk_kernel_page_table_range_lockless(start, end,
 					&split_to_ptes_ops, NULL, &gfp);
+	arch_leave_lazy_mmu_mode();
+
+	return ret;
 }
 
 static bool linear_map_requires_bbml2 __initdata;
-- 
2.43.0



* [PATCH v2 3/3] arm64: mm: Tidy up force_pte_mapping()
  2025-11-06 16:09 [PATCH v2 0/3] Don't sleep in split_kernel_leaf_mapping() when in atomic context Ryan Roberts
  2025-11-06 16:09 ` [PATCH v2 1/3] arm64: mm: " Ryan Roberts
  2025-11-06 16:09 ` [PATCH v2 2/3] arm64: mm: Optimize range_split_to_ptes() Ryan Roberts
@ 2025-11-06 16:09 ` Ryan Roberts
  2025-11-06 20:51   ` Yang Shi
  2025-11-06 21:08   ` David Hildenbrand (Red Hat)
  2025-11-07 15:53 ` [PATCH v2 0/3] Don't sleep in split_kernel_leaf_mapping() when in atomic context Will Deacon
  3 siblings, 2 replies; 11+ messages in thread
From: Ryan Roberts @ 2025-11-06 16:09 UTC (permalink / raw)
  To: catalin.marinas, will, yang, david, ardb, dev.jain, scott, cl
  Cc: Ryan Roberts, linux-arm-kernel, linux-kernel, David Hildenbrand

Tidy up the implementation of force_pte_mapping() to make it easier to
read and introduce the split_leaf_mapping_possible() helper to reduce
code duplication in split_kernel_leaf_mapping() and
arch_kfence_init_pool().

Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/mm/mmu.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 652bb8c14035..2ba01dc8ef82 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -710,12 +710,26 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
 
 static inline bool force_pte_mapping(void)
 {
-	bool bbml2 = system_capabilities_finalized() ?
+	const bool bbml2 = system_capabilities_finalized() ?
 		system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
 
-	return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
-			   is_realm_world())) ||
-		debug_pagealloc_enabled();
+	if (debug_pagealloc_enabled())
+		return true;
+	if (bbml2)
+		return false;
+	return rodata_full || arm64_kfence_can_set_direct_map() || is_realm_world();
+}
+
+static inline bool split_leaf_mapping_possible(void)
+{
+	/*
+	 * !BBML2_NOABORT systems should never run into scenarios where we would
+	 * have to split. So exit early and let calling code detect it and raise
+	 * a warning.
+	 */
+	if (!system_supports_bbml2_noabort())
+		return false;
+	return !force_pte_mapping();
 }
 
 static DEFINE_MUTEX(pgtable_split_lock);
@@ -725,22 +739,11 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
 	int ret;
 
 	/*
-	 * !BBML2_NOABORT systems should not be trying to change permissions on
-	 * anything that is not pte-mapped in the first place. Just return early
-	 * and let the permission change code raise a warning if not already
-	 * pte-mapped.
-	 */
-	if (!system_supports_bbml2_noabort())
-		return 0;
-
-	/*
-	 * If the region is within a pte-mapped area, there is no need to try to
-	 * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
-	 * change permissions from atomic context so for those cases (which are
-	 * always pte-mapped), we must not go any further because taking the
-	 * mutex below may sleep.
+	 * Exit early if the region is within a pte-mapped area or if we can't
+	 * split. For the latter case, the permission change code will raise a
+	 * warning if not already pte-mapped.
 	 */
-	if (force_pte_mapping() || is_kfence_address((void *)start))
+	if (!split_leaf_mapping_possible() || is_kfence_address((void *)start))
 		return 0;
 
 	/*
@@ -1039,7 +1042,7 @@ bool arch_kfence_init_pool(void)
 	int ret;
 
 	/* Exit early if we know the linear map is already pte-mapped. */
-	if (!system_supports_bbml2_noabort() || force_pte_mapping())
+	if (!split_leaf_mapping_possible())
 		return true;
 
 	/* Kfence pool is already pte-mapped for the early init case. */
-- 
2.43.0



* Re: [PATCH v2 1/3] arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context
  2025-11-06 16:09 ` [PATCH v2 1/3] arm64: mm: " Ryan Roberts
@ 2025-11-06 20:46   ` Yang Shi
  2025-11-07 12:10     ` Ryan Roberts
  2025-11-06 21:08   ` David Hildenbrand (Red Hat)
  1 sibling, 1 reply; 11+ messages in thread
From: Yang Shi @ 2025-11-06 20:46 UTC (permalink / raw)
  To: Ryan Roberts, catalin.marinas, will, david, ardb, dev.jain, scott,
	cl
  Cc: linux-arm-kernel, linux-kernel, Guenter Roeck



On 11/6/25 8:09 AM, Ryan Roberts wrote:
> It has been reported that split_kernel_leaf_mapping() is trying to sleep
> in non-sleepable context. It does this when acquiring the
> pgtable_split_lock mutex, when either CONFIG_DEBUG_PAGEALLOC or
> CONFIG_KFENCE are enabled, which change linear map permissions within
> softirq context during memory allocation and/or freeing. All other paths
> into this function are called from sleepable context and so are safe.
>
> But it turns out that the memory for which these 2 features may attempt
> to modify the permissions is always mapped by pte, so there is no need
> to attempt to split the mapping. So let's exit early in these cases and
> avoid attempting to take the mutex.
>
> There is one wrinkle to this approach; late-initialized kfence allocates
> it's pool from the buddy which may be block mapped. So we must hook that
> allocation and convert it to pte-mappings up front. Previously this was
> done as a side-effect of kfence protecting all the individual pages in
> its pool at init-time, but this no longer works due to the added early
> exit path in split_kernel_leaf_mapping().
>
> So instead, do this via the existing arch_kfence_init_pool() arch hook,
> and reuse the existing linear_map_split_to_ptes() infrastructure.
>
> Closes: https://lore.kernel.org/all/f24b9032-0ec9-47b1-8b95-c0eeac7a31c5@roeck-us.net/
> Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
> Tested-by: Guenter Roeck <groeck@google.com>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>

Reviewed-by: Yang Shi <yang@os.amperecomputing.com>

Just a nit below:

> ---
>   arch/arm64/include/asm/kfence.h |  3 +-
>   arch/arm64/mm/mmu.c             | 92 +++++++++++++++++++++++----------
>   2 files changed, 67 insertions(+), 28 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kfence.h b/arch/arm64/include/asm/kfence.h
> index a81937fae9f6..21dbc9dda747 100644
> --- a/arch/arm64/include/asm/kfence.h
> +++ b/arch/arm64/include/asm/kfence.h
> @@ -10,8 +10,6 @@
>   
>   #include <asm/set_memory.h>
>   
> -static inline bool arch_kfence_init_pool(void) { return true; }
> -
>   static inline bool kfence_protect_page(unsigned long addr, bool protect)
>   {
>   	set_memory_valid(addr, 1, !protect);
> @@ -25,6 +23,7 @@ static inline bool arm64_kfence_can_set_direct_map(void)
>   {
>   	return !kfence_early_init;
>   }
> +bool arch_kfence_init_pool(void);
>   #else /* CONFIG_KFENCE */
>   static inline bool arm64_kfence_can_set_direct_map(void) { return false; }
>   #endif /* CONFIG_KFENCE */
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index b8d37eb037fc..a364ac2c9c61 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -708,6 +708,16 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
>   	return ret;
>   }
>   
> +static inline bool force_pte_mapping(void)
> +{
> +	bool bbml2 = system_capabilities_finalized() ?
> +		system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
> +
> +	return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
> +			   is_realm_world())) ||
> +		debug_pagealloc_enabled();
> +}
> +
>   static DEFINE_MUTEX(pgtable_split_lock);
>   
>   int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
> @@ -723,6 +733,16 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
>   	if (!system_supports_bbml2_noabort())
>   		return 0;
>   
> +	/*
> +	 * If the region is within a pte-mapped area, there is no need to try to
> +	 * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
> +	 * change permissions from atomic context so for those cases (which are
> +	 * always pte-mapped), we must not go any further because taking the
> +	 * mutex below may sleep.

Patch 3 changes this comment, but since patch 3 just does some cleanup
and code deduplication with no functional change, why not just use the
comment from patch 3 here?

Thanks,
Yang

> +	 */
> +	if (force_pte_mapping() || is_kfence_address((void *)start))
> +		return 0;
> +
>   	/*
>   	 * Ensure start and end are at least page-aligned since this is the
>   	 * finest granularity we can split to.
> @@ -758,30 +778,30 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
>   	return ret;
>   }
>   
> -static int __init split_to_ptes_pud_entry(pud_t *pudp, unsigned long addr,
> -					  unsigned long next,
> -					  struct mm_walk *walk)
> +static int split_to_ptes_pud_entry(pud_t *pudp, unsigned long addr,
> +				   unsigned long next, struct mm_walk *walk)
>   {
> +	gfp_t gfp = *(gfp_t *)walk->private;
>   	pud_t pud = pudp_get(pudp);
>   	int ret = 0;
>   
>   	if (pud_leaf(pud))
> -		ret = split_pud(pudp, pud, GFP_ATOMIC, false);
> +		ret = split_pud(pudp, pud, gfp, false);
>   
>   	return ret;
>   }
>   
> -static int __init split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
> -					  unsigned long next,
> -					  struct mm_walk *walk)
> +static int split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
> +				   unsigned long next, struct mm_walk *walk)
>   {
> +	gfp_t gfp = *(gfp_t *)walk->private;
>   	pmd_t pmd = pmdp_get(pmdp);
>   	int ret = 0;
>   
>   	if (pmd_leaf(pmd)) {
>   		if (pmd_cont(pmd))
>   			split_contpmd(pmdp);
> -		ret = split_pmd(pmdp, pmd, GFP_ATOMIC, false);
> +		ret = split_pmd(pmdp, pmd, gfp, false);
>   
>   		/*
>   		 * We have split the pmd directly to ptes so there is no need to
> @@ -793,9 +813,8 @@ static int __init split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
>   	return ret;
>   }
>   
> -static int __init split_to_ptes_pte_entry(pte_t *ptep, unsigned long addr,
> -					  unsigned long next,
> -					  struct mm_walk *walk)
> +static int split_to_ptes_pte_entry(pte_t *ptep, unsigned long addr,
> +				   unsigned long next, struct mm_walk *walk)
>   {
>   	pte_t pte = __ptep_get(ptep);
>   
> @@ -805,12 +824,18 @@ static int __init split_to_ptes_pte_entry(pte_t *ptep, unsigned long addr,
>   	return 0;
>   }
>   
> -static const struct mm_walk_ops split_to_ptes_ops __initconst = {
> +static const struct mm_walk_ops split_to_ptes_ops = {
>   	.pud_entry	= split_to_ptes_pud_entry,
>   	.pmd_entry	= split_to_ptes_pmd_entry,
>   	.pte_entry	= split_to_ptes_pte_entry,
>   };
>   
> +static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp)
> +{
> +	return walk_kernel_page_table_range_lockless(start, end,
> +					&split_to_ptes_ops, NULL, &gfp);
> +}
> +
>   static bool linear_map_requires_bbml2 __initdata;
>   
>   u32 idmap_kpti_bbml2_flag;
> @@ -847,11 +872,9 @@ static int __init linear_map_split_to_ptes(void *__unused)
>   		 * PTE. The kernel alias remains static throughout runtime so
>   		 * can continue to be safely mapped with large mappings.
>   		 */
> -		ret = walk_kernel_page_table_range_lockless(lstart, kstart,
> -						&split_to_ptes_ops, NULL, NULL);
> +		ret = range_split_to_ptes(lstart, kstart, GFP_ATOMIC);
>   		if (!ret)
> -			ret = walk_kernel_page_table_range_lockless(kend, lend,
> -						&split_to_ptes_ops, NULL, NULL);
> +			ret = range_split_to_ptes(kend, lend, GFP_ATOMIC);
>   		if (ret)
>   			panic("Failed to split linear map\n");
>   		flush_tlb_kernel_range(lstart, lend);
> @@ -1002,6 +1025,33 @@ static void __init arm64_kfence_map_pool(phys_addr_t kfence_pool, pgd_t *pgdp)
>   	memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
>   	__kfence_pool = phys_to_virt(kfence_pool);
>   }
> +
> +bool arch_kfence_init_pool(void)
> +{
> +	unsigned long start = (unsigned long)__kfence_pool;
> +	unsigned long end = start + KFENCE_POOL_SIZE;
> +	int ret;
> +
> +	/* Exit early if we know the linear map is already pte-mapped. */
> +	if (!system_supports_bbml2_noabort() || force_pte_mapping())
> +		return true;
> +
> +	/* Kfence pool is already pte-mapped for the early init case. */
> +	if (kfence_early_init)
> +		return true;
> +
> +	mutex_lock(&pgtable_split_lock);
> +	ret = range_split_to_ptes(start, end, GFP_PGTABLE_KERNEL);
> +	mutex_unlock(&pgtable_split_lock);
> +
> +	/*
> +	 * Since the system supports bbml2_noabort, tlb invalidation is not
> +	 * required here; the pgtable mappings have been split to pte but larger
> +	 * entries may safely linger in the TLB.
> +	 */
> +
> +	return !ret;
> +}
>   #else /* CONFIG_KFENCE */
>   
>   static inline phys_addr_t arm64_kfence_alloc_pool(void) { return 0; }
> @@ -1009,16 +1059,6 @@ static inline void arm64_kfence_map_pool(phys_addr_t kfence_pool, pgd_t *pgdp) {
>   
>   #endif /* CONFIG_KFENCE */
>   
> -static inline bool force_pte_mapping(void)
> -{
> -	bool bbml2 = system_capabilities_finalized() ?
> -		system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
> -
> -	return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
> -			   is_realm_world())) ||
> -		debug_pagealloc_enabled();
> -}
> -
>   static void __init map_mem(pgd_t *pgdp)
>   {
>   	static const u64 direct_map_end = _PAGE_END(VA_BITS_MIN);



* Re: [PATCH v2 2/3] arm64: mm: Optimize range_split_to_ptes()
  2025-11-06 16:09 ` [PATCH v2 2/3] arm64: mm: Optimize range_split_to_ptes() Ryan Roberts
@ 2025-11-06 20:47   ` Yang Shi
  0 siblings, 0 replies; 11+ messages in thread
From: Yang Shi @ 2025-11-06 20:47 UTC (permalink / raw)
  To: Ryan Roberts, catalin.marinas, will, david, ardb, dev.jain, scott,
	cl
  Cc: linux-arm-kernel, linux-kernel



On 11/6/25 8:09 AM, Ryan Roberts wrote:
> Enter lazy_mmu mode while splitting a range of memory to pte mappings.
> This causes barriers, which would otherwise be emitted after every pte
> (and pmd/pud) write, to be deferred until exiting lazy_mmu mode.
>
> For large systems, this is expected to significantly speed up fallback
> to pte-mapping the linear map for the case where the boot CPU has
> BBML2_NOABORT, but secondary CPUs do not. I haven't directly measured
> it, but this is equivalent to commit 1fcb7cea8a5f ("arm64: mm: Batch dsb
> and isb when populating pgtables").
>
> Note that for the path from arch_kfence_init_pool(), we may sleep while
> allocating memory inside the lazy_mmu mode. Sleeping is not allowed by
> generic code inside lazy_mmu, but we know that the arm64 implementation
> is sleep-safe. So this is ok and follows the same pattern already used
> by split_kernel_leaf_mapping().
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>

Reviewed-by: Yang Shi <yang@os.amperecomputing.com>

Thanks,
Yang

> ---
>   arch/arm64/mm/mmu.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index a364ac2c9c61..652bb8c14035 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -832,8 +832,14 @@ static const struct mm_walk_ops split_to_ptes_ops = {
>   
>   static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp)
>   {
> -	return walk_kernel_page_table_range_lockless(start, end,
> +	int ret;
> +
> +	arch_enter_lazy_mmu_mode();
> +	ret = walk_kernel_page_table_range_lockless(start, end,
>   					&split_to_ptes_ops, NULL, &gfp);
> +	arch_leave_lazy_mmu_mode();
> +
> +	return ret;
>   }
>   
>   static bool linear_map_requires_bbml2 __initdata;



* Re: [PATCH v2 3/3] arm64: mm: Tidy up force_pte_mapping()
  2025-11-06 16:09 ` [PATCH v2 3/3] arm64: mm: Tidy up force_pte_mapping() Ryan Roberts
@ 2025-11-06 20:51   ` Yang Shi
  2025-11-06 21:08   ` David Hildenbrand (Red Hat)
  1 sibling, 0 replies; 11+ messages in thread
From: Yang Shi @ 2025-11-06 20:51 UTC (permalink / raw)
  To: Ryan Roberts, catalin.marinas, will, david, ardb, dev.jain, scott,
	cl
  Cc: linux-arm-kernel, linux-kernel, David Hildenbrand



On 11/6/25 8:09 AM, Ryan Roberts wrote:
> Tidy up the implementation of force_pte_mapping() to make it easier to
> read and introduce the split_leaf_mapping_possible() helper to reduce
> code duplication in split_kernel_leaf_mapping() and
> arch_kfence_init_pool().
>
> Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>

Reviewed-by: Yang Shi <yang@os.amperecomputing.com>

Thanks,
Yang

> ---
>   arch/arm64/mm/mmu.c | 43 +++++++++++++++++++++++--------------------
>   1 file changed, 23 insertions(+), 20 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 652bb8c14035..2ba01dc8ef82 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -710,12 +710,26 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
>   
>   static inline bool force_pte_mapping(void)
>   {
> -	bool bbml2 = system_capabilities_finalized() ?
> +	const bool bbml2 = system_capabilities_finalized() ?
>   		system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
>   
> -	return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
> -			   is_realm_world())) ||
> -		debug_pagealloc_enabled();
> +	if (debug_pagealloc_enabled())
> +		return true;
> +	if (bbml2)
> +		return false;
> +	return rodata_full || arm64_kfence_can_set_direct_map() || is_realm_world();
> +}
> +
> +static inline bool split_leaf_mapping_possible(void)
> +{
> +	/*
> +	 * !BBML2_NOABORT systems should never run into scenarios where we would
> +	 * have to split. So exit early and let calling code detect it and raise
> +	 * a warning.
> +	 */
> +	if (!system_supports_bbml2_noabort())
> +		return false;
> +	return !force_pte_mapping();
>   }
>   
>   static DEFINE_MUTEX(pgtable_split_lock);
> @@ -725,22 +739,11 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
>   	int ret;
>   
>   	/*
> -	 * !BBML2_NOABORT systems should not be trying to change permissions on
> -	 * anything that is not pte-mapped in the first place. Just return early
> -	 * and let the permission change code raise a warning if not already
> -	 * pte-mapped.
> -	 */
> -	if (!system_supports_bbml2_noabort())
> -		return 0;
> -
> -	/*
> -	 * If the region is within a pte-mapped area, there is no need to try to
> -	 * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
> -	 * change permissions from atomic context so for those cases (which are
> -	 * always pte-mapped), we must not go any further because taking the
> -	 * mutex below may sleep.
> +	 * Exit early if the region is within a pte-mapped area or if we can't
> +	 * split. For the latter case, the permission change code will raise a
> +	 * warning if not already pte-mapped.
>   	 */
> -	if (force_pte_mapping() || is_kfence_address((void *)start))
> +	if (!split_leaf_mapping_possible() || is_kfence_address((void *)start))
>   		return 0;
>   
>   	/*
> @@ -1039,7 +1042,7 @@ bool arch_kfence_init_pool(void)
>   	int ret;
>   
>   	/* Exit early if we know the linear map is already pte-mapped. */
> -	if (!system_supports_bbml2_noabort() || force_pte_mapping())
> +	if (!split_leaf_mapping_possible())
>   		return true;
>   
>   	/* Kfence pool is already pte-mapped for the early init case. */



* Re: [PATCH v2 1/3] arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context
  2025-11-06 16:09 ` [PATCH v2 1/3] arm64: mm: " Ryan Roberts
  2025-11-06 20:46   ` Yang Shi
@ 2025-11-06 21:08   ` David Hildenbrand (Red Hat)
  1 sibling, 0 replies; 11+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-06 21:08 UTC (permalink / raw)
  To: Ryan Roberts, catalin.marinas, will, yang, ardb, dev.jain, scott,
	cl
  Cc: linux-arm-kernel, linux-kernel, Guenter Roeck

On 06.11.25 17:09, Ryan Roberts wrote:
> It has been reported that split_kernel_leaf_mapping() is trying to sleep
> in non-sleepable context. It does this when acquiring the
> pgtable_split_lock mutex, when either CONFIG_DEBUG_PAGEALLOC or
> CONFIG_KFENCE are enabled, which change linear map permissions within
> softirq context during memory allocation and/or freeing. All other paths
> into this function are called from sleepable context and so are safe.
> 
> But it turns out that the memory for which these 2 features may attempt
> to modify the permissions is always mapped by pte, so there is no need
> to attempt to split the mapping. So let's exit early in these cases and
> avoid attempting to take the mutex.
> 
> There is one wrinkle to this approach; late-initialized kfence allocates
> it's pool from the buddy which may be block mapped. So we must hook that
> allocation and convert it to pte-mappings up front. Previously this was
> done as a side-effect of kfence protecting all the individual pages in
> its pool at init-time, but this no longer works due to the added early
> exit path in split_kernel_leaf_mapping().
> 
> So instead, do this via the existing arch_kfence_init_pool() arch hook,
> and reuse the existing linear_map_split_to_ptes() infrastructure.
> 
> Closes: https://lore.kernel.org/all/f24b9032-0ec9-47b1-8b95-c0eeac7a31c5@roeck-us.net/
> Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
> Tested-by: Guenter Roeck <groeck@google.com>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---

Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>


* Re: [PATCH v2 3/3] arm64: mm: Tidy up force_pte_mapping()
  2025-11-06 16:09 ` [PATCH v2 3/3] arm64: mm: Tidy up force_pte_mapping() Ryan Roberts
  2025-11-06 20:51   ` Yang Shi
@ 2025-11-06 21:08   ` David Hildenbrand (Red Hat)
  1 sibling, 0 replies; 11+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-06 21:08 UTC (permalink / raw)
  To: Ryan Roberts, catalin.marinas, will, yang, ardb, dev.jain, scott,
	cl
  Cc: linux-arm-kernel, linux-kernel

On 06.11.25 17:09, Ryan Roberts wrote:
> Tidy up the implementation of force_pte_mapping() to make it easier to
> read and introduce the split_leaf_mapping_possible() helper to reduce
> code duplication in split_kernel_leaf_mapping() and
> arch_kfence_init_pool().
> 
> Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---

Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>

-- 
Cheers

David


* Re: [PATCH v2 1/3] arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context
  2025-11-06 20:46   ` Yang Shi
@ 2025-11-07 12:10     ` Ryan Roberts
  0 siblings, 0 replies; 11+ messages in thread
From: Ryan Roberts @ 2025-11-07 12:10 UTC (permalink / raw)
  To: Yang Shi, catalin.marinas, will, david, ardb, dev.jain, scott, cl
  Cc: linux-arm-kernel, linux-kernel, Guenter Roeck

On 06/11/2025 20:46, Yang Shi wrote:
> 
> 
> On 11/6/25 8:09 AM, Ryan Roberts wrote:
>> It has been reported that split_kernel_leaf_mapping() is trying to sleep
>> in non-sleepable context. It does this when acquiring the
>> pgtable_split_lock mutex, when either CONFIG_DEBUG_PAGEALLOC or
>> CONFIG_KFENCE are enabled, which change linear map permissions within
>> softirq context during memory allocation and/or freeing. All other paths
>> into this function are called from sleepable context and so are safe.
>>
>> But it turns out that the memory for which these 2 features may attempt
>> to modify the permissions is always mapped by pte, so there is no need
>> to attempt to split the mapping. So let's exit early in these cases and
>> avoid attempting to take the mutex.
>>
>> There is one wrinkle to this approach; late-initialized kfence allocates
>> its pool from the buddy which may be block mapped. So we must hook that
>> allocation and convert it to pte-mappings up front. Previously this was
>> done as a side-effect of kfence protecting all the individual pages in
>> its pool at init-time, but this no longer works due to the added early
>> exit path in split_kernel_leaf_mapping().
>>
>> So instead, do this via the existing arch_kfence_init_pool() arch hook,
>> and reuse the existing linear_map_split_to_ptes() infrastructure.
>>
>> Closes: https://lore.kernel.org/all/f24b9032-0ec9-47b1-8b95-
>> c0eeac7a31c5@roeck-us.net/
>> Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
>> Tested-by: Guenter Roeck <groeck@google.com>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> 
> Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
> 
> Just a nit below:
> 
>> ---
>>   arch/arm64/include/asm/kfence.h |  3 +-
>>   arch/arm64/mm/mmu.c             | 92 +++++++++++++++++++++++----------
>>   2 files changed, 67 insertions(+), 28 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kfence.h b/arch/arm64/include/asm/kfence.h
>> index a81937fae9f6..21dbc9dda747 100644
>> --- a/arch/arm64/include/asm/kfence.h
>> +++ b/arch/arm64/include/asm/kfence.h
>> @@ -10,8 +10,6 @@
>>     #include <asm/set_memory.h>
>>   -static inline bool arch_kfence_init_pool(void) { return true; }
>> -
>>   static inline bool kfence_protect_page(unsigned long addr, bool protect)
>>   {
>>       set_memory_valid(addr, 1, !protect);
>> @@ -25,6 +23,7 @@ static inline bool arm64_kfence_can_set_direct_map(void)
>>   {
>>       return !kfence_early_init;
>>   }
>> +bool arch_kfence_init_pool(void);
>>   #else /* CONFIG_KFENCE */
>>   static inline bool arm64_kfence_can_set_direct_map(void) { return false; }
>>   #endif /* CONFIG_KFENCE */
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index b8d37eb037fc..a364ac2c9c61 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -708,6 +708,16 @@ static int split_kernel_leaf_mapping_locked(unsigned long
>> addr)
>>       return ret;
>>   }
>>   +static inline bool force_pte_mapping(void)
>> +{
>> +    bool bbml2 = system_capabilities_finalized() ?
>> +        system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
>> +
>> +    return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
>> +               is_realm_world())) ||
>> +        debug_pagealloc_enabled();
>> +}
>> +
>>   static DEFINE_MUTEX(pgtable_split_lock);
>>     int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
>> @@ -723,6 +733,16 @@ int split_kernel_leaf_mapping(unsigned long start,
>> unsigned long end)
>>       if (!system_supports_bbml2_noabort())
>>           return 0;
>>   +    /*
>> +     * If the region is within a pte-mapped area, there is no need to try to
>> +     * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
>> +     * change permissions from atomic context so for those cases (which are
>> +     * always pte-mapped), we must not go any further because taking the
>> +     * mutex below may sleep.
> 
> The path 3 changed the comment, but since patch 3 just does some cleanup and
> code deduplication, there is no functional change, so why not just use the
> comment from patch 3?

The reason I changed it is that patch 3 introduces
split_leaf_mapping_possible(), which also does the
!system_supports_bbml2_noabort() check above, so the original comment only
applies to a subset of the reasons we may exit here. That felt confusing to
me, so I decided to simplify it. The rationale is all captured in the commit
log, so I didn't think it was a big deal.

Thanks,
Ryan

> 
> Thanks,
> Yang
> 
>> +     */
>> +    if (force_pte_mapping() || is_kfence_address((void *)start))
>> +        return 0;
>> +
>>       /*
>>        * Ensure start and end are at least page-aligned since this is the
>>        * finest granularity we can split to.
>> @@ -758,30 +778,30 @@ int split_kernel_leaf_mapping(unsigned long start,
>> unsigned long end)
>>       return ret;
>>   }
>>   -static int __init split_to_ptes_pud_entry(pud_t *pudp, unsigned long addr,
>> -                      unsigned long next,
>> -                      struct mm_walk *walk)
>> +static int split_to_ptes_pud_entry(pud_t *pudp, unsigned long addr,
>> +                   unsigned long next, struct mm_walk *walk)
>>   {
>> +    gfp_t gfp = *(gfp_t *)walk->private;
>>       pud_t pud = pudp_get(pudp);
>>       int ret = 0;
>>         if (pud_leaf(pud))
>> -        ret = split_pud(pudp, pud, GFP_ATOMIC, false);
>> +        ret = split_pud(pudp, pud, gfp, false);
>>         return ret;
>>   }
>>   -static int __init split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
>> -                      unsigned long next,
>> -                      struct mm_walk *walk)
>> +static int split_to_ptes_pmd_entry(pmd_t *pmdp, unsigned long addr,
>> +                   unsigned long next, struct mm_walk *walk)
>>   {
>> +    gfp_t gfp = *(gfp_t *)walk->private;
>>       pmd_t pmd = pmdp_get(pmdp);
>>       int ret = 0;
>>         if (pmd_leaf(pmd)) {
>>           if (pmd_cont(pmd))
>>               split_contpmd(pmdp);
>> -        ret = split_pmd(pmdp, pmd, GFP_ATOMIC, false);
>> +        ret = split_pmd(pmdp, pmd, gfp, false);
>>             /*
>>            * We have split the pmd directly to ptes so there is no need to
>> @@ -793,9 +813,8 @@ static int __init split_to_ptes_pmd_entry(pmd_t *pmdp,
>> unsigned long addr,
>>       return ret;
>>   }
>>   -static int __init split_to_ptes_pte_entry(pte_t *ptep, unsigned long addr,
>> -                      unsigned long next,
>> -                      struct mm_walk *walk)
>> +static int split_to_ptes_pte_entry(pte_t *ptep, unsigned long addr,
>> +                   unsigned long next, struct mm_walk *walk)
>>   {
>>       pte_t pte = __ptep_get(ptep);
>>   @@ -805,12 +824,18 @@ static int __init split_to_ptes_pte_entry(pte_t *ptep,
>> unsigned long addr,
>>       return 0;
>>   }
>>   -static const struct mm_walk_ops split_to_ptes_ops __initconst = {
>> +static const struct mm_walk_ops split_to_ptes_ops = {
>>       .pud_entry    = split_to_ptes_pud_entry,
>>       .pmd_entry    = split_to_ptes_pmd_entry,
>>       .pte_entry    = split_to_ptes_pte_entry,
>>   };
>>   +static int range_split_to_ptes(unsigned long start, unsigned long end,
>> gfp_t gfp)
>> +{
>> +    return walk_kernel_page_table_range_lockless(start, end,
>> +                    &split_to_ptes_ops, NULL, &gfp);
>> +}
>> +
>>   static bool linear_map_requires_bbml2 __initdata;
>>     u32 idmap_kpti_bbml2_flag;
>> @@ -847,11 +872,9 @@ static int __init linear_map_split_to_ptes(void *__unused)
>>            * PTE. The kernel alias remains static throughout runtime so
>>            * can continue to be safely mapped with large mappings.
>>            */
>> -        ret = walk_kernel_page_table_range_lockless(lstart, kstart,
>> -                        &split_to_ptes_ops, NULL, NULL);
>> +        ret = range_split_to_ptes(lstart, kstart, GFP_ATOMIC);
>>           if (!ret)
>> -            ret = walk_kernel_page_table_range_lockless(kend, lend,
>> -                        &split_to_ptes_ops, NULL, NULL);
>> +            ret = range_split_to_ptes(kend, lend, GFP_ATOMIC);
>>           if (ret)
>>               panic("Failed to split linear map\n");
>>           flush_tlb_kernel_range(lstart, lend);
>> @@ -1002,6 +1025,33 @@ static void __init arm64_kfence_map_pool(phys_addr_t
>> kfence_pool, pgd_t *pgdp)
>>       memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
>>       __kfence_pool = phys_to_virt(kfence_pool);
>>   }
>> +
>> +bool arch_kfence_init_pool(void)
>> +{
>> +    unsigned long start = (unsigned long)__kfence_pool;
>> +    unsigned long end = start + KFENCE_POOL_SIZE;
>> +    int ret;
>> +
>> +    /* Exit early if we know the linear map is already pte-mapped. */
>> +    if (!system_supports_bbml2_noabort() || force_pte_mapping())
>> +        return true;
>> +
>> +    /* Kfence pool is already pte-mapped for the early init case. */
>> +    if (kfence_early_init)
>> +        return true;
>> +
>> +    mutex_lock(&pgtable_split_lock);
>> +    ret = range_split_to_ptes(start, end, GFP_PGTABLE_KERNEL);
>> +    mutex_unlock(&pgtable_split_lock);
>> +
>> +    /*
>> +     * Since the system supports bbml2_noabort, tlb invalidation is not
>> +     * required here; the pgtable mappings have been split to pte but larger
>> +     * entries may safely linger in the TLB.
>> +     */
>> +
>> +    return !ret;
>> +}
>>   #else /* CONFIG_KFENCE */
>>     static inline phys_addr_t arm64_kfence_alloc_pool(void) { return 0; }
>> @@ -1009,16 +1059,6 @@ static inline void arm64_kfence_map_pool(phys_addr_t
>> kfence_pool, pgd_t *pgdp) {
>>     #endif /* CONFIG_KFENCE */
>>   -static inline bool force_pte_mapping(void)
>> -{
>> -    bool bbml2 = system_capabilities_finalized() ?
>> -        system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
>> -
>> -    return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
>> -               is_realm_world())) ||
>> -        debug_pagealloc_enabled();
>> -}
>> -
>>   static void __init map_mem(pgd_t *pgdp)
>>   {
>>       static const u64 direct_map_end = _PAGE_END(VA_BITS_MIN);
> 



* Re: [PATCH v2 0/3] Don't sleep in split_kernel_leaf_mapping() when in atomic context
  2025-11-06 16:09 [PATCH v2 0/3] Don't sleep in split_kernel_leaf_mapping() when in atomic context Ryan Roberts
                   ` (2 preceding siblings ...)
  2025-11-06 16:09 ` [PATCH v2 3/3] arm64: mm: Tidy up force_pte_mapping() Ryan Roberts
@ 2025-11-07 15:53 ` Will Deacon
  3 siblings, 0 replies; 11+ messages in thread
From: Will Deacon @ 2025-11-07 15:53 UTC (permalink / raw)
  To: catalin.marinas, yang, david, ardb, dev.jain, scott, cl,
	Ryan Roberts
  Cc: kernel-team, Will Deacon, linux-arm-kernel, linux-kernel

On Thu, 06 Nov 2025 16:09:40 +0000, Ryan Roberts wrote:
> This is v2 of the fix for split_kernel_leaf_mapping(). I've expanded it into 3
> patches based on feedback from v1 [1].
> 
> Once happy with the content, patch 1 is needed urgently for next -rc to fix
> regression since 6.18-rc1. The other patches could wait until 6.19, but I'd
> prefer they all go together into 6.18.
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/3] arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context
      https://git.kernel.org/arm64/c/ce2b3a50ad92
[2/3] arm64: mm: Optimize range_split_to_ptes()
      https://git.kernel.org/arm64/c/40a292f70147
[3/3] arm64: mm: Tidy up force_pte_mapping()
      https://git.kernel.org/arm64/c/53357f14f924

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

