linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/4] x86/mm: Improve alloc handling of phys_*_init()
@ 2025-06-13 20:09 Em Sharnoff
  2025-06-13 20:10 ` [PATCH v4 1/4] x86/mm: Update mapped addresses in phys_{pmd,pud}_init() Em Sharnoff
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Em Sharnoff @ 2025-06-13 20:09 UTC (permalink / raw)
  To: linux-kernel, x86, linux-mm
  Cc: Ingo Molnar, H. Peter Anvin, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Borislav Petkov,
	Edgecombe, Rick P, Oleg Vasilev, Arthur Petukhovsky, Stefan Radig,
	Misha Sakhnov

Hi folks,

See changelog + more context below.

tl;dr:

* Currently alloc_low_page() uses GFP_ATOMIC after boot, which may fail
* Those failures aren't currently handled by phys_pud_init() and similar
  functions.
* Those failures can happen during memory hotplug

So:

1. Add handling for those allocation failures (patches 1-3)
2. Use GFP_KERNEL instead of GFP_ATOMIC (patch 4)

Previous version here:
https://lore.kernel.org/all/a31e3b89-5040-4426-9ce8-d674b8554aa1@neon.tech/

=== Changelog ===

v2:
- Switch from special-casing zero values to ERR_PTR()
- Add patch to move from GFP_ATOMIC -> GFP_KERNEL
- Move commentary out of the patch message and into this cover letter
v3:
- Fix -Wint-conversion issues
v4:
- new patch: move 'paddr_last' usage into phys_{pud,pmd}_init() so the
  return from those functions is no longer needed.
- new patch: make phys_*_init() and their callers return int

I'm not sure if patch 2/4 ("Allow error returns ...") should be separate
from patch 3/4 ("Handle alloc failure ..."), but it's easy enough to
combine them if need be.

=== Background ===

We recently started observing these null pointer dereferences happening
in practice (albeit quite rarely), triggered by allocation failures
during virtio-mem hotplug.

We use virtio-mem quite heavily - adding/removing memory based on
resource usage of customer workloads across a fleet of VMs - so it's
somewhat expected that we have occasional allocation failures here, if
we run out of memory before hotplug takes place.

We started seeing this bug after upgrading from 6.6.64 to 6.12.26, but
there didn't appear to be relevant changes in the codepaths involved, so
we figured the upgrade was triggering a latent issue.

The possibility for this issue was also pointed out a while back:

> For alloc_low_pages(), I noticed the callers don’t check for allocation
> failure. I'm a little surprised that there haven't been reports of the
> allocation failing, because these operations could result in a lot more
> pages getting allocated way past boot, and failure causes a NULL
> pointer dereference.

https://lore.kernel.org/all/5aee7bcdf49b1c6b8ee902dd2abd9220169c694b.camel@intel.com/

For completeness, here's an example stack trace we saw (on 6.12.26):

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  ....
  Call Trace:
   <TASK>
   phys_pud_init+0xa0/0x390
   phys_p4d_init+0x93/0x330
   __kernel_physical_mapping_init+0xa1/0x370
   kernel_physical_mapping_init+0xf/0x20
   init_memory_mapping+0x1fa/0x430
   arch_add_memory+0x2b/0x50
   add_memory_resource+0xe6/0x260
   add_memory_driver_managed+0x78/0xc0
   virtio_mem_add_memory+0x46/0xc0
   virtio_mem_sbm_plug_and_add_mb+0xa3/0x160
   virtio_mem_run_wq+0x1035/0x16c0
   process_one_work+0x17a/0x3c0
   worker_thread+0x2c5/0x3f0
   ? _raw_spin_unlock_irqrestore+0x9/0x30
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xdc/0x110
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x35/0x60
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>

and the allocation failure preceding it:

  kworker/0:2: page allocation failure: order:0, mode:0x920(GFP_ATOMIC|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
  ...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5b/0x70
   dump_stack+0x10/0x20
   warn_alloc+0x103/0x180
   __alloc_pages_slowpath.constprop.0+0x738/0xf30
   __alloc_pages_noprof+0x1e9/0x340
   alloc_pages_mpol_noprof+0x47/0x100
   alloc_pages_noprof+0x4b/0x80
   get_free_pages_noprof+0xc/0x40
   alloc_low_pages+0xc2/0x150
   phys_pud_init+0x82/0x390
  ...

(everything from phys_pud_init and below was the same)

There's some additional context in a github issue we opened on our side:
https://github.com/neondatabase/autoscaling/issues/1391

=== Reproducing / Testing ===

I was able to partially reproduce the original issue we saw by
modifying phys_pud_init() to simulate alloc_low_page() returning null
after boot, and then doing memory hotplug to trigger the "failure".
Something roughly like:

  - pmd = alloc_low_page();
  + if (!after_bootmem)
  + 	pmd = alloc_low_page();
  + else
  + 	pmd = 0;

To test recovery, I also tried simulating just one alloc_low_page()
failure after boot. This change seemed to handle it at a basic level
(virito-mem hotplug succeeded with the right amount, after retrying),
but I didn't dig further.

We also plan to test this in our production environment (where we should
see the difference after a few days); as of 2025-06-13, we haven't yet
rolled that out.


Em Sharnoff (4):
  x86/mm: Update mapped addresses in phys_{pmd,pud}_init()
  x86/mm: Allow error returns from phys_*_init()
  x86/mm: Handle alloc failure in phys_*_init()
  x86/mm: Use GFP_KERNEL for alloc_low_pages() after boot

 arch/x86/include/asm/pgtable.h |   3 +-
 arch/x86/mm/init.c             |  29 ++++++---
 arch/x86/mm/init_32.c          |   6 +-
 arch/x86/mm/init_64.c          | 116 ++++++++++++++++++++++-----------
 arch/x86/mm/mem_encrypt_amd.c  |   8 ++-
 arch/x86/mm/mm_internal.h      |  13 ++--
 6 files changed, 113 insertions(+), 62 deletions(-)


base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
-- 
2.39.5


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-06-13 20:12 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-13 20:09 [PATCH v4 0/4] x86/mm: Improve alloc handling of phys_*_init() Em Sharnoff
2025-06-13 20:10 ` [PATCH v4 1/4] x86/mm: Update mapped addresses in phys_{pmd,pud}_init() Em Sharnoff
2025-06-13 20:11 ` [PATCH v4 2/4] x86/mm: Allow error returns from phys_*_init() Em Sharnoff
2025-06-13 20:11 ` [PATCH v4 3/4] x86/mm: Handle alloc failure in phys_*_init() Em Sharnoff
2025-06-13 20:12 ` [PATCH v4 4/4] x86/mm: Use GFP_KERNEL for alloc_low_pages() after boot Em Sharnoff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).