Linux driver-core infrastructure
* [RFC PATCH 0/3] mm/numa: reserve standby NUMA nodes for runtime claiming
From: Gregory Price @ 2026-06-10  1:45 UTC
  To: linux-mm
  Cc: x86, linux-doc, linux-kernel, linux-acpi, driver-core,
	kernel-team, corbet, skhan, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, rafael, lenb, gregkh, dakr, akpm, rppt, rdunlap,
	feng.tang, dapeng1.mi, elver, kuba, ebiggers, lirongqing, paulmck,
	gourry, dave.jiang, jic23, xueshuai, kai.huang

A NUMA node must be "possible" at __init time to be usable later; a node
that is not described at boot cannot be brought online afterwards.

For memory tiering or isolation it is sometimes desirable to spread
hotplug memory (CXL, GPU, virtio-mem, ...) across more nodes than
firmware describes.  Additionally, some memory devices may provide
more than a single class of memory and need flexibility to redefine
the effective topology at runtime instead of depending on BIOS.

This series adds a way to reserve empty "standby" NUMA nodes at boot
so drivers can place hotplugged memory on distinct nodes later, at
runtime, without those nodes being described by BIOS.

Using the feature
=================
A standby node is an empty, offline-but-possible NUMA node: at boot it
has no memory and no CPUs.  A driver claims one at runtime, brings
memory online on it, and releases it when done.
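
As a rough illustration of the claim/release lifecycle described above, here
is a minimal userspace model.  It is only a sketch: struct standby_pool,
standby_claim(), standby_release(), MAX_NODES and NO_NODE are hypothetical
names invented for this example and are not the interfaces added by the
series.  It also assumes a claim hands out the lowest free id, which the
cover letter does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64	/* toy limit, not the kernel's MAX_NUMNODES */
#define NO_NODE   (-1)	/* stand-in for "no node available" */

/* Hypothetical pool: one bit per node id. */
struct standby_pool {
	uint64_t reserved;	/* nodes set aside at boot */
	uint64_t claimed;	/* subset currently owned by a driver */
};

/* Claim the lowest free standby node, or NO_NODE if none are left. */
static int standby_claim(struct standby_pool *p)
{
	uint64_t free_mask;

	assert(p);
	free_mask = p->reserved & ~p->claimed;
	for (int nid = 0; nid < MAX_NODES; nid++) {
		uint64_t bit = UINT64_C(1) << nid;

		if (free_mask & bit) {
			p->claimed |= bit;
			return nid;
		}
	}
	return NO_NODE;
}

/* Return a node to the pool; false if it was not currently claimed. */
static bool standby_release(struct standby_pool *p, int nid)
{
	uint64_t bit;

	assert(p);
	if (nid < 0 || nid >= MAX_NODES)
		return false;
	bit = UINT64_C(1) << nid;
	if (!(p->claimed & bit))
		return false;
	p->claimed &= ~bit;
	return true;
}
```

In this model a driver would call standby_claim(), online its memory on the
returned id, and call standby_release() with the same id once that memory
has been removed again.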

This series adds three ways to reserve standby nodes.

  - numa=standby=N
      Boot parameter.  Reserve N extra empty nodes.  Platform
      independent; works with or without ACPI.

  - CONFIG_ACPI_NUMA_STANDBY_NODES=N
      Reserve N extra empty nodes on ACPI systems (honoured only when
      firmware produces a usable NUMA configuration).

  - CONFIG_ACPI_NUMA_ADD_CFMWS_NODES=K
      Reserve K extra empty nodes per CXL Fixed Memory Window (CEDT
      CFMWS), for CXL topologies that want several nodes behind one
      window.

All three default to off (0 / unset).
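
For the boot parameter, the value handling can be pictured roughly as below.
This is a userspace sketch using the C library, not the parser from the
patch: parse_numa_standby() is a hypothetical helper, and the choice to
reject malformed values outright (rather than ignore them) is an assumption
made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the value part of "numa=<opt>", e.g. "standby=4".
 * Returns the requested count, 0 when the option is some other numa=
 * setting (the default-off case), or -1 on malformed input.
 */
static long parse_numa_standby(const char *opt)
{
	static const char prefix[] = "standby=";
	unsigned long n;
	char *end;

	assert(opt);
	if (strncmp(opt, prefix, sizeof(prefix) - 1) != 0)
		return 0;	/* not a standby request */

	opt += sizeof(prefix) - 1;
	if (*opt < '0' || *opt > '9')
		return -1;	/* empty, signed or non-numeric value */

	errno = 0;
	n = strtoul(opt, &end, 10);
	if (errno || *end != '\0' || n > INT_MAX)
		return -1;	/* overflow or trailing garbage */
	return (long)n;
}
```

With this helper, "standby=4" yields 4, an unrelated option such as "fake=2"
yields 0, and "standby=3x" is rejected.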

Reserved nodes show up in /sys/devices/system/node/possible but not
.../online until a driver claims one and onlines memory on it.
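
One way to spot such nodes from userspace is to diff the two sysfs lists.
The sketch below parses the usual range-list format (for example "0-1,4-5")
into a bitmask and computes possible-but-not-online.  parse_node_list() and
offline_possible() are hypothetical helper names, node ids are limited to
0..63 to keep the example short, and note that an offline possible node is
not necessarily a standby node; firmware may describe other hotplug nodes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a sysfs-style list such as "0-1,4" (optionally newline
 * terminated) into a bitmask.  Returns 0 on success, -1 on malformed
 * input or ids above 63.
 */
static int parse_node_list(const char *s, uint64_t *mask)
{
	assert(s && mask);
	*mask = 0;
	while (*s && *s != '\n') {
		unsigned long lo, hi;
		char *end;

		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		if (*end == '-') {		/* range "lo-hi" */
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (hi < lo || hi > 63)
			return -1;
		for (unsigned long n = lo; n <= hi; n++)
			*mask |= UINT64_C(1) << n;
		s = end;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return 0;
}

/* Nodes that are possible but currently offline. */
static uint64_t offline_possible(uint64_t possible, uint64_t online)
{
	return possible & ~online;
}
```

Feeding it the contents of .../possible and .../online gives a mask of
candidate nodes; with "0-1,4-5" possible and "0-1" online, bits 4 and 5
remain.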

Testing
=======
Built and booted under QEMU (virtme-ng) across a matrix of boot
parameters and topologies:

  - Each reservation source, individually and combined: reserved nodes
    appear as possible-but-offline with no memory, claim/release
    round-trips correctly, and node distances are sane.

    The CFMWS path was exercised with an emulated CXL Type-3 device
    presenting a CEDT/CFMWS.

  - Fallback: when ACPI NUMA init does not produce a usable config,
    no standby nodes are reserved.

  - NUMA emulation (numa=fake), which renumbers the node space.

    Standby nodes are created only after the (possibly emulated)
    topology is final, so their ids can never alias emulated nodes.

    numa=fake boots cleanly with the feature enabled and behaves
    identically to a baseline kernel without this series.

    Tested with CONFIG_NUMA_EMU both enabled and disabled, and with
    and without numa=fake on the command line.

  - Default-off builds behave identically to a baseline kernel.
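
To illustrate why creating standby nodes after the topology is final avoids
id aliasing under numa=fake, here is a small model of the ordering.  It is
a sketch only: reserve_standby_ids() and TOY_MAX_NODES are hypothetical, and
picking the lowest unused ids is an assumption, not something the series
states.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_NODES 64	/* toy limit, not the kernel's MAX_NUMNODES */

/*
 * Given the mask of node ids already used by the final (possibly
 * emulated) topology, pick up to 'want' unused ids for standby nodes.
 * Returns how many ids were actually reserved.
 */
static unsigned int reserve_standby_ids(uint64_t used, unsigned int want,
					uint64_t *standby)
{
	unsigned int got = 0;

	assert(standby);
	*standby = 0;
	for (int nid = 0; nid < TOY_MAX_NODES && got < want; nid++) {
		uint64_t bit = UINT64_C(1) << nid;

		if (used & bit)
			continue;	/* id belongs to a real or emulated node */
		*standby |= bit;
		got++;
	}
	return got;
}
```

Because the used mask is only read after emulation has finished
renumbering, the resulting standby mask is disjoint from it by
construction.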

Gregory Price (3):
  mm/numa: add exclusive node pool and numa=standby boot parameter
  acpi/numa: add CONFIG_ACPI_NUMA_STANDBY_NODES
  acpi/numa: add CONFIG_ACPI_NUMA_ADD_CFMWS_NODES

 .../admin-guide/kernel-parameters.txt         |   8 ++
 arch/x86/mm/numa.c                            |   2 +
 drivers/acpi/numa/Kconfig                     |  35 ++++++
 drivers/acpi/numa/srat.c                      |  14 ++-
 drivers/base/arch_numa.c                      |   2 +
 include/linux/numa.h                          |  14 +++
 include/linux/numa_memblks.h                  |   3 +
 mm/numa.c                                     |  90 +++++++++++++
 mm/numa_memblks.c                             | 118 +++++++++++++++++-
 9 files changed, 284 insertions(+), 2 deletions(-)

--
2.54.0

