* [PATCH v3 0/2] x86/mm: Improve alloc handling of phys_*_init()
@ 2025-06-10 10:15 Em Sharnoff
  2025-06-10 10:16 ` [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init() Em Sharnoff
  2025-06-10 10:17 ` [PATCH v3 2/2] x86/mm: Use GFP_KERNEL for alloc_low_pages() after boot Em Sharnoff
  0 siblings, 2 replies; 12+ messages in thread
From: Em Sharnoff @ 2025-06-10 10:15 UTC (permalink / raw)
  To: linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov,
	Edgecombe, Rick P, Oleg Vasilev, Arthur Petukhovsky, Stefan Radig,
	Misha Sakhnov

Hi folks,

See changelog + more context below.

tl;dr:

* Currently alloc_low_page() uses GFP_ATOMIC after boot, which may fail.
* Those failures aren't currently handled by phys_pud_init() and similar
  functions.
* Those failures can happen during memory hotplug.

So:

1. Add handling for those allocation failures
2. Use GFP_KERNEL instead of GFP_ATOMIC
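
As a sketch of the failure mode (simplified; not the literal kernel code):

  pmd = alloc_low_page();           /* GFP_ATOMIC after boot, may return NULL */
  paddr_last = phys_pmd_init(pmd, ...);   /* dereferences pmd unconditionally */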

Previous version here:
https://lore.kernel.org/all/0ce5e150-19e0-457f-bec3-ee031c0be7e7@neon.tech/

=== Changelog ===

v2:
- Switch from special-casing zero values to ERR_PTR()
- Add patch to move from GFP_ATOMIC -> GFP_KERNEL
- Move commentary out of the patch message and into this cover letter
v3:
- Fix -Wint-conversion issues

=== Background ===

We recently started observing these null pointer dereferences happening
in practice (albeit quite rarely), triggered by allocation failures
during virtio-mem hotplug.

We use virtio-mem quite heavily - adding/removing memory based on
resource usage of customer workloads across a fleet of VMs - so it's
somewhat expected that we have occasional allocation failures here, if
we run out of memory before hotplug takes place.

We started seeing this bug after upgrading from 6.6.64 to 6.12.26, but
there didn't appear to be relevant changes in the codepaths involved, so
we figured the upgrade was triggering a latent issue.

The possibility of this issue was also pointed out a while back:

> For alloc_low_pages(), I noticed the callers don’t check for allocation
> failure. I'm a little surprised that there haven't been reports of the
> allocation failing, because these operations could result in a lot more
> pages getting allocated way past boot, and failure causes a NULL
> pointer dereference.

https://lore.kernel.org/all/5aee7bcdf49b1c6b8ee902dd2abd9220169c694b.camel@intel.com/

For completeness, here's an example stack trace we saw (on 6.12.26):

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  ....
  Call Trace:
   <TASK>
   phys_pud_init+0xa0/0x390
   phys_p4d_init+0x93/0x330
   __kernel_physical_mapping_init+0xa1/0x370
   kernel_physical_mapping_init+0xf/0x20
   init_memory_mapping+0x1fa/0x430
   arch_add_memory+0x2b/0x50
   add_memory_resource+0xe6/0x260
   add_memory_driver_managed+0x78/0xc0
   virtio_mem_add_memory+0x46/0xc0
   virtio_mem_sbm_plug_and_add_mb+0xa3/0x160
   virtio_mem_run_wq+0x1035/0x16c0
   process_one_work+0x17a/0x3c0
   worker_thread+0x2c5/0x3f0
   ? _raw_spin_unlock_irqrestore+0x9/0x30
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xdc/0x110
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x35/0x60
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>

and the allocation failure preceding it:

  kworker/0:2: page allocation failure: order:0, mode:0x920(GFP_ATOMIC|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
  ...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5b/0x70
   dump_stack+0x10/0x20
   warn_alloc+0x103/0x180
   __alloc_pages_slowpath.constprop.0+0x738/0xf30
   __alloc_pages_noprof+0x1e9/0x340
   alloc_pages_mpol_noprof+0x47/0x100
   alloc_pages_noprof+0x4b/0x80
   get_free_pages_noprof+0xc/0x40
   alloc_low_pages+0xc2/0x150
   phys_pud_init+0x82/0x390
  ...

(everything from phys_pud_init and below was the same)

There's some additional context in a github issue we opened on our side:
https://github.com/neondatabase/autoscaling/issues/1391

=== Reproducing / Testing ===

I was able to partially reproduce the original issue we saw by
modifying phys_pud_init() to simulate alloc_low_page() returning null
after boot, and then doing memory hotplug to trigger the "failure".
Something roughly like:

  - pmd = alloc_low_page();
  + if (!after_bootmem)
  + 	pmd = alloc_low_page();
  + else
  + 	pmd = 0;

To test recovery, I also tried simulating just one alloc_low_page()
failure after boot. This change seemed to handle it at a basic level
(virtio-mem hotplug succeeded with the right amount of memory, after
retrying), but I didn't dig further.
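
Simulating the one-shot failure looked something like this (again a rough
sketch, with a hypothetical static flag):

  - pmd = alloc_low_page();
  + if (after_bootmem && !simulated_failure) {
  + 	simulated_failure = true;
  + 	pmd = NULL;
  + } else {
  + 	pmd = alloc_low_page();
  + }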

We also plan to test this in our production environment (where we should
see the difference after a few days); as of 2025-06-10, we haven't yet
rolled that out.


Em Sharnoff (2):
  x86/mm: Handle alloc failure in phys_*_init()
  x86/mm: Use GFP_KERNEL for alloc_low_pages() after boot

 arch/x86/mm/init.c    |  8 +++++--
 arch/x86/mm/init_64.c | 54 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 56 insertions(+), 6 deletions(-)


base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
-- 
2.39.5




* [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init()
  2025-06-10 10:15 [PATCH v3 0/2] x86/mm: Improve alloc handling of phys_*_init() Em Sharnoff
@ 2025-06-10 10:16 ` Em Sharnoff
  2025-06-10 14:55   ` Dave Hansen
  2025-06-10 15:07   ` Dave Hansen
  2025-06-10 10:17 ` [PATCH v3 2/2] x86/mm: Use GFP_KERNEL for alloc_low_pages() after boot Em Sharnoff
  1 sibling, 2 replies; 12+ messages in thread
From: Em Sharnoff @ 2025-06-10 10:16 UTC (permalink / raw)
  To: linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov,
	Edgecombe, Rick P, Oleg Vasilev, Arthur Petukhovsky, Stefan Radig,
	Misha Sakhnov

During memory hotplug, allocation failures in phys_*_init() aren't
handled, which results in a null pointer dereference if they occur.

To handle that, change phys_pud_init() and similar functions to return
allocation errors via ERR_PTR() and check for that in arch_add_memory().

Signed-off-by: Em Sharnoff <sharnoff@neon.tech>
---
Changelog:
- v2: switch from special-casing zero value to using ERR_PTR()
- v3: Fix -Wint-conversion errors
---
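A note on the error-passing convention, reduced to its core (illustrative
only, not part of the patch): the errno travels through the functions'
unsigned long return values, so each encode/decode step casts through
void *:

  unsigned long ret = (unsigned long)ERR_PTR(-ENOMEM);  /* encode */

  if (IS_ERR((void *)ret))                   /* decode at a caller */
          return (int)PTR_ERR((void *)ret);  /* yields -ENOMEM */
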
 arch/x86/mm/init.c    |  6 ++++-
 arch/x86/mm/init_64.c | 54 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index bfa444a7dbb0..a2665b6fe376 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -533,6 +533,7 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
  * Setup the direct mapping of the physical memory at PAGE_OFFSET.
  * This runs before bootmem is initialized and gets pages directly from
  * the physical memory. To access them they are temporarily mapped.
+ * Allocation errors are returned with ERR_PTR.
  */
 unsigned long __ref init_memory_mapping(unsigned long start,
 					unsigned long end, pgprot_t prot)
@@ -547,10 +548,13 @@ unsigned long __ref init_memory_mapping(unsigned long start,
 	memset(mr, 0, sizeof(mr));
 	nr_range = split_mem_range(mr, 0, start, end);
 
-	for (i = 0; i < nr_range; i++)
+	for (i = 0; i < nr_range; i++) {
 		ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
 						   mr[i].page_size_mask,
 						   prot);
+		if (IS_ERR((void *)ret))
+			return ret;
+	}
 
 	add_pfn_range_mapped(start >> PAGE_SHIFT, ret >> PAGE_SHIFT);
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 7c4f6f591f2b..712006afcd6c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -502,7 +502,8 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end,
 /*
  * Create PMD level page table mapping for physical addresses. The virtual
  * and physical address have to be aligned at this level.
- * It returns the last physical address mapped.
+ * It returns the last physical address mapped. Allocation errors are
+ * returned with ERR_PTR.
  */
 static unsigned long __meminit
 phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
@@ -572,7 +573,14 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
 		}
 
 		pte = alloc_low_page();
+		if (!pte)
+			return (unsigned long)ERR_PTR(-ENOMEM);
 		paddr_last = phys_pte_init(pte, paddr, paddr_end, new_prot, init);
+		/*
+		 * phys_{ppmd,pud,p4d}_init return allocation errors via ERR_PTR.
+		 * phys_pte_init makes no allocations, so should not error.
+		 */
+		BUG_ON(IS_ERR((void *)paddr_last));
 
 		spin_lock(&init_mm.page_table_lock);
 		pmd_populate_kernel_init(&init_mm, pmd, pte, init);
@@ -586,7 +594,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
  * Create PUD level page table mapping for physical addresses. The virtual
  * and physical address do not have to be aligned at this level. KASLR can
  * randomize virtual addresses up to this level.
- * It returns the last physical address mapped.
+ * It returns the last physical address mapped. Allocation errors are
+ * returned with ERR_PTR.
  */
 static unsigned long __meminit
 phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
@@ -623,6 +632,8 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 							   paddr_end,
 							   page_size_mask,
 							   prot, init);
+				if (IS_ERR((void *)paddr_last))
+					return paddr_last;
 				continue;
 			}
 			/*
@@ -658,12 +669,22 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 		}
 
 		pmd = alloc_low_page();
+		if (!pmd)
+			return (unsigned long)ERR_PTR(-ENOMEM);
 		paddr_last = phys_pmd_init(pmd, paddr, paddr_end,
 					   page_size_mask, prot, init);
 
+		/*
+		 * We might have IS_ERR(paddr_last) if allocation failed, but we should
+		 * still update pud before bailing, so that subsequent retries can pick
+		 * up on progress (here and in phys_pmd_init) without leaking pmd.
+		 */
 		spin_lock(&init_mm.page_table_lock);
 		pud_populate_init(&init_mm, pud, pmd, init);
 		spin_unlock(&init_mm.page_table_lock);
+
+		if (IS_ERR((void *)paddr_last))
+			return paddr_last;
 	}
 
 	update_page_count(PG_LEVEL_1G, pages);
@@ -707,16 +728,26 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 			pud = pud_offset(p4d, 0);
 			paddr_last = phys_pud_init(pud, paddr, __pa(vaddr_end),
 					page_size_mask, prot, init);
+			if (IS_ERR((void *)paddr_last))
+				return paddr_last;
 			continue;
 		}
 
 		pud = alloc_low_page();
+		if (!pud)
+			return (unsigned long)ERR_PTR(-ENOMEM);
 		paddr_last = phys_pud_init(pud, paddr, __pa(vaddr_end),
 					   page_size_mask, prot, init);
 
 		spin_lock(&init_mm.page_table_lock);
 		p4d_populate_init(&init_mm, p4d, pud, init);
 		spin_unlock(&init_mm.page_table_lock);
+
+		/*
+		 * Bail only after updating p4d to keep progress from pud across retries.
+		 */
+		if (IS_ERR((void *)paddr_last))
+			return paddr_last;
 	}
 
 	return paddr_last;
@@ -748,10 +779,14 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
 						   __pa(vaddr_end),
 						   page_size_mask,
 						   prot, init);
+			if (IS_ERR((void *)paddr_last))
+				return paddr_last;
 			continue;
 		}
 
 		p4d = alloc_low_page();
+		if (!p4d)
+			return (unsigned long)ERR_PTR(-ENOMEM);
 		paddr_last = phys_p4d_init(p4d, __pa(vaddr), __pa(vaddr_end),
 					   page_size_mask, prot, init);
 
@@ -763,6 +798,13 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
 					  (pud_t *) p4d, init);
 
 		spin_unlock(&init_mm.page_table_lock);
+
+		/*
+		 * Bail only after updating pgd/p4d to keep progress from p4d across retries.
+		 */
+		if (IS_ERR((void *)paddr_last))
+			return paddr_last;
+
 		pgd_changed = true;
 	}
 
@@ -777,7 +819,8 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
  * Create page table mapping for the physical memory for specific physical
  * addresses. Note that it can only be used to populate non-present entries.
  * The virtual and physical addresses have to be aligned on PMD level
- * down. It returns the last physical address mapped.
+ * down. It returns the last physical address mapped. Allocation errors are
+ * returned with ERR_PTR.
  */
 unsigned long __meminit
 kernel_physical_mapping_init(unsigned long paddr_start,
@@ -980,8 +1023,11 @@ int arch_add_memory(int nid, u64 start, u64 size,
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
+	unsigned long ret = 0;
 
-	init_memory_mapping(start, start + size, params->pgprot);
+	ret = init_memory_mapping(start, start + size, params->pgprot);
+	if (IS_ERR((void *)ret))
+		return (int)PTR_ERR((void *)ret);
 
 	return add_pages(nid, start_pfn, nr_pages, params);
 }
-- 
2.39.5




* [PATCH v3 2/2] x86/mm: Use GFP_KERNEL for alloc_low_pages() after boot
  2025-06-10 10:15 [PATCH v3 0/2] x86/mm: Improve alloc handling of phys_*_init() Em Sharnoff
  2025-06-10 10:16 ` [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init() Em Sharnoff
@ 2025-06-10 10:17 ` Em Sharnoff
  1 sibling, 0 replies; 12+ messages in thread
From: Em Sharnoff @ 2025-06-10 10:17 UTC (permalink / raw)
  To: linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov,
	Edgecombe, Rick P, Oleg Vasilev, Arthur Petukhovsky, Stefan Radig,
	Misha Sakhnov

Currently it's GFP_ATOMIC. GFP_KERNEL seems more correct, as these
allocations never happen in atomic context.

From Ingo M. [1]

> There's no real reason why it should be GFP_ATOMIC AFAICS, other than
> some historic inertia that nobody bothered to fix.

and previously Mike R. [2]

> The few callers that effectively use page allocator for the direct map
> updates are gart_iommu_init() and memory hotplug. Neither of them
> happen in an atomic context so there is no reason to use GFP_ATOMIC
> for these allocations.
>
> Replace GFP_ATOMIC with GFP_KERNEL to avoid using atomic reserves for
> allocations that do not require that.

[1]: https://lore.kernel.org/all/aEE6_S2a-1tk1dtI@gmail.com/
[2]: https://lore.kernel.org/all/20211111110241.25968-5-rppt@kernel.org/

Signed-off-by: Em Sharnoff <sharnoff@neon.tech>
---
Changelog:
- v2: Add this patch
- v3: No changes
---
 arch/x86/mm/init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index a2665b6fe376..3a25cd9e9076 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -131,7 +131,7 @@ __ref void *alloc_low_pages(unsigned int num)
 		unsigned int order;
 
 		order = get_order((unsigned long)num << PAGE_SHIFT);
-		return (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
+		return (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
 	}
 
 	if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
-- 
2.39.5




* Re: [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init()
  2025-06-10 10:16 ` [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init() Em Sharnoff
@ 2025-06-10 14:55   ` Dave Hansen
  2025-06-11  8:38     ` Em Sharnoff
  2025-06-11 22:56     ` H. Peter Anvin
  2025-06-10 15:07   ` Dave Hansen
  1 sibling, 2 replies; 12+ messages in thread
From: Dave Hansen @ 2025-06-10 14:55 UTC (permalink / raw)
  To: Em Sharnoff, linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov,
	Edgecombe, Rick P, Oleg Vasilev, Arthur Petukhovsky, Stefan Radig,
	Misha Sakhnov

On 6/10/25 03:16, Em Sharnoff wrote:
> +		if (!pmd)
> +			return (unsigned long)ERR_PTR(-ENOMEM);

All of this casting isn't great to look at. Just about every line of
code that this patch touches or introduces has a cast.

Could you please find a way to reduce the number of casts?

> +		/*
> +		 * We might have IS_ERR(paddr_last) if allocation failed, but we should
> +		 * still update pud before bailing, so that subsequent retries can pick
> +		 * up on progress (here and in phys_pmd_init) without leaking pmd.
> +		 */

Please write everything in imperative voice. No "we's", please.

> -	for (i = 0; i < nr_range; i++)
> +	for (i = 0; i < nr_range; i++) {
>  		ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
>  						   mr[i].page_size_mask,
>  						   prot);
> +		if (IS_ERR((void *)ret))
> +			return ret;
> +	}

Are there any _actual_ users of 'paddr_last'? I see a lot of setting it
and passing it around, but I _think_ this is the only place it actually
gets used. Here, the fact that it's an address doesn't even matter.





* Re: [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init()
  2025-06-10 10:16 ` [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init() Em Sharnoff
  2025-06-10 14:55   ` Dave Hansen
@ 2025-06-10 15:07   ` Dave Hansen
  2025-06-11  8:39     ` Em Sharnoff
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2025-06-10 15:07 UTC (permalink / raw)
  To: Em Sharnoff, linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov,
	Edgecombe, Rick P, Oleg Vasilev, Arthur Petukhovsky, Stefan Radig,
	Misha Sakhnov

On 6/10/25 03:16, Em Sharnoff wrote:
> +		/*
> +		 * phys_{ppmd,pud,p4d}_init return allocation errors via ERR_PTR.

			 ^ typo?

> +		 * phys_pte_init makes no allocations, so should not error.
> +		 */
> +		BUG_ON(IS_ERR((void *)paddr_last));




* Re: [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init()
  2025-06-10 14:55   ` Dave Hansen
@ 2025-06-11  8:38     ` Em Sharnoff
  2025-06-11 14:16       ` Dave Hansen
  2025-06-11 22:56     ` H. Peter Anvin
  1 sibling, 1 reply; 12+ messages in thread
From: Em Sharnoff @ 2025-06-11  8:38 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Edgecombe, Rick P, Oleg Vasilev,
	Arthur Petukhovsky, Stefan Radig, Misha Sakhnov

On 6/10/25 15:55, Dave Hansen wrote:

> Are there any _actual_ users of 'paddr_last'? I see a lot of setting it
> and passing it around, but I _think_ this is the only place it actually
> gets used. Here, the fact that it's an address doesn't even matter.

I checked and didn't find any other users, but I'm happy to give others a
chance to correct me in case I missed something.

> Could you please find a way to reduce the number of casts?

What do you think about changing the return for these functions to just 'int'
for errors?
It's a larger change (especially if all callers should be updated to check the
return value), but I think much cleaner in the end.
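
That is, something of this shape (sketch only; the last physical address
mapped would then have to be reported some other way):

  static int __meminit
  phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
                unsigned long page_size_mask, pgprot_t prot, bool init);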

> Please write everything in imperative voice. No "we's", please.

Noted, thanks - will address in the next version.

Em



* Re: [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init()
  2025-06-10 15:07   ` Dave Hansen
@ 2025-06-11  8:39     ` Em Sharnoff
  0 siblings, 0 replies; 12+ messages in thread
From: Em Sharnoff @ 2025-06-11  8:39 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Edgecombe, Rick P, Oleg Vasilev,
	Arthur Petukhovsky, Stefan Radig, Misha Sakhnov

On 6/10/25 16:07, Dave Hansen wrote:

>> +		 * phys_{ppmd,pud,p4d}_init return allocation errors via ERR_PTR.
> 
> 			 ^ typo?

Oh- yes, good catch, thanks!

Em



* Re: [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init()
  2025-06-11  8:38     ` Em Sharnoff
@ 2025-06-11 14:16       ` Dave Hansen
  2025-06-11 19:26         ` Em Sharnoff
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2025-06-11 14:16 UTC (permalink / raw)
  To: Em Sharnoff, linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Edgecombe, Rick P, Oleg Vasilev,
	Arthur Petukhovsky, Stefan Radig, Misha Sakhnov

On 6/11/25 01:38, Em Sharnoff wrote:
>> Could you please find a way to reduce the number of casts?
> What do you think about changing the return for these functions to just 'int'
> for errors?
> It's a larger change (especially if all callers should be updated to check the
> return value), but I think much cleaner in the end.

Fine with me. No reason to cram errno's into a physical address that's
never used as a physical address.





* Re: [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init()
  2025-06-11 14:16       ` Dave Hansen
@ 2025-06-11 19:26         ` Em Sharnoff
  2025-06-11 19:36           ` Dave Hansen
  0 siblings, 1 reply; 12+ messages in thread
From: Em Sharnoff @ 2025-06-11 19:26 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Edgecombe, Rick P, Oleg Vasilev,
	Arthur Petukhovsky, Stefan Radig, Misha Sakhnov

On 6/11/25 15:16, Dave Hansen wrote:
> On 6/11/25 01:38, Em Sharnoff wrote:
>>> Could you please find a way to reduce the number of casts?
>> What do you think about changing the return for these functions to just 'int'
>> for errors?
> Fine with me. No reason to cram errno's into a physical address that's
> never used as a physical address.

Just realized paddr_last is actually used to set 'max_pfn_mapped'.
In init_memory_mapping():

> add_pfn_range_mapped(start >> PAGE_SHIFT, ret >> PAGE_SHIFT);

which in turn only uses it to update max_pfn_mapped:

> max_pfn_mapped = max(max_pfn_mapped, end_pfn)

This was added in cc6150321903 ("x86: account overlapped mappings in
max_pfn_mapped").

---

Some other options to reduce the number of casts:

1. Add helpers to do the '(void *)' casting for ERR_PTR, keeping everything
   else the same (see the sketch below).
2. Change the phys_*_init() functions to return int, and directly update
   max_pfn_mapped from within them. They already call update_page_count(),
   maybe this is similar?
3. Change the phys_*_init() functions to return int, and calculate the
   expected paddr_last externally.

The third option I think is possible in theory, but probably too complicated
and fragile. (At a glance, there's complex emergent logic - but maybe someone
familiar with the code could make the case for something simple.)
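
For option 1, a minimal sketch of such helpers (hypothetical names, just to
illustrate the shape):

  static inline bool paddr_is_err(unsigned long paddr)
  {
          return IS_ERR((void *)paddr);
  }

  static inline unsigned long paddr_mk_err(int err)
  {
          return (unsigned long)ERR_PTR(err);
  }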


Thoughts?

Em



* Re: [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init()
  2025-06-11 19:26         ` Em Sharnoff
@ 2025-06-11 19:36           ` Dave Hansen
  2025-06-13 20:17             ` Em Sharnoff
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2025-06-11 19:36 UTC (permalink / raw)
  To: Em Sharnoff, linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Edgecombe, Rick P, Oleg Vasilev,
	Arthur Petukhovsky, Stefan Radig, Misha Sakhnov

On 6/11/25 12:26, Em Sharnoff wrote:
> 2. Change the phys_*_init() functions to return int, and directly update
>    max_pfn_mapped from within them. They already call update_page_count(),
>    maybe this is similar?

That seems like the most straightforward. Each time they update a
mapping, they call a helper which bumps up 'max_pfn_mapped'.
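
Something like (hypothetical helper, mirroring what add_pfn_range_mapped()
already does with the end pfn):

  static void __meminit bump_max_pfn_mapped(unsigned long paddr_last)
  {
          max_pfn_mapped = max(max_pfn_mapped, paddr_last >> PAGE_SHIFT);
  }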



* Re: [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init()
  2025-06-10 14:55   ` Dave Hansen
  2025-06-11  8:38     ` Em Sharnoff
@ 2025-06-11 22:56     ` H. Peter Anvin
  1 sibling, 0 replies; 12+ messages in thread
From: H. Peter Anvin @ 2025-06-11 22:56 UTC (permalink / raw)
  To: Dave Hansen, Em Sharnoff, linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Edgecombe, Rick P, Oleg Vasilev,
	Arthur Petukhovsky, Stefan Radig, Misha Sakhnov

On June 10, 2025 7:55:36 AM PDT, Dave Hansen <dave.hansen@intel.com> wrote:
>On 6/10/25 03:16, Em Sharnoff wrote:
>> +		if (!pmd)
>> +			return (unsigned long)ERR_PTR(-ENOMEM);
>
>All of this casting isn't great to look at. Just about every line of
>code that this patch touches or introduces has a cast.
>
>Could you please find a way to reduce the number of casts?
>
>> +		/*
>> +		 * We might have IS_ERR(paddr_last) if allocation failed, but we should
>> +		 * still update pud before bailing, so that subsequent retries can pick
>> +		 * up on progress (here and in phys_pmd_init) without leaking pmd.
>> +		 */
>
>Please write everything in imperative voice. No "we's", please.
>
>> -	for (i = 0; i < nr_range; i++)
>> +	for (i = 0; i < nr_range; i++) {
>>  		ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
>>  						   mr[i].page_size_mask,
>>  						   prot);
>> +		if (IS_ERR((void *)ret))
>> +			return ret;
>> +	}
>
>Are there any _actual_ users of 'paddr_last'? I see a lot of setting it
>and passing it around, but I _think_ this is the only place it actually
>gets used. Here, the fact that it's an address doesn't even matter.
>
>

Given that ERR_PTR and IS_ERR are basically just casts to and from pointers, why have them at all?
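
For unsigned long values specifically there's also IS_ERR_VALUE() in
include/linux/err.h, which tests the errno range directly and would avoid
the round-trip through a pointer type:

  if (IS_ERR_VALUE(paddr_last))
          return paddr_last;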



* Re: [PATCH v3 1/2] x86/mm: Handle alloc failure in phys_*_init()
  2025-06-11 19:36           ` Dave Hansen
@ 2025-06-13 20:17             ` Em Sharnoff
  0 siblings, 0 replies; 12+ messages in thread
From: Em Sharnoff @ 2025-06-13 20:17 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov, Edgecombe, Rick P, Oleg Vasilev,
	Arthur Petukhovsky, Stefan Radig, Misha Sakhnov

On 6/11/25 20:36, Dave Hansen wrote:
> On 6/11/25 12:26, Em Sharnoff wrote:
> > 2. Change the phys_*_init() functions to return int, and directly update
> >    max_pfn_mapped from within them. They already call update_page_count(),
> >    maybe this is similar?
> 
> That seems like the most straightforward. Each time they update a
> mapping, they call a helper which bumps up 'max_pfn_mapped'.

Update on this -- turns out I was wrong again. 'paddr_last' is also used to
update 'pfn_mapped', which is more complex. Hopefully still within reason.

I've posted a new patch set, patch 1/4 adopts the "call a helper" route.
Design-wise, it's less clean than I'd like, but let me know what you think.
https://lore.kernel.org/all/7d0d307d-71eb-4913-8023-bccc7a8a4a3d@neon.tech/

Em


