* [PATCH v8 0/1] hw/mem: add spm-memory device for Specific Purpose Memory
@ 2026-05-27 7:42 fanhuang
2026-05-27 7:42 ` [PATCH v8 1/1] " fanhuang
From: fanhuang @ 2026-05-27 7:42 UTC
To: qemu-devel, imammedo, david, gourry; +Cc: Zhigang.Luo, Lianjie.Shi, fanhuang
This series adds a TYPE_MEMORY_DEVICE subclass `spm-memory` for boot-time
SOFT_RESERVED guest memory, following the direction from the v7 thread [1].
Background
----------
This series targets coherent CPU + accelerator shared-address-space
systems, where the accelerator's HBM is not a device-private framebuffer
behind a PCIe BAR but a tier of host system memory: visible to the CPU
in the platform physical address space, shared coherently with the
accelerator over the platform fabric, and bound to a NUMA proximity
domain set by platform firmware during boot-time fabric training.
For such a region to function correctly in the guest, two things must
hold simultaneously: the CPU memory subsystem has to see it in the
system memory map (so the CPU side can address it), and it has to be
reserved exclusively for the accelerator's driver (so the kernel's
general allocator does not hand HBM pages to unrelated workloads). The
SOFT_RESERVED memory type in E820 plus a matching SRAT memory-affinity
entry is the mechanism that delivers both: a firmware-produced topology
that the CPU memory subsystem honors and the accelerator's driver
consumes for its own range.
Approach
--------
The patch adds a new TYPE_MEMORY_DEVICE subclass `spm-memory`. Each
instance binds one host memory backend to a single NUMA proximity
domain and is boot-time only; placement, hotplug rejection, and QMP
introspection come from the existing memory-device framework. The
device emits one E820 SOFT_RESERVED entry per instance at
machine_done and one SRAT memory-affinity entry per instance at
acpi-build, the latter flagged ENABLED only.
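As a rough illustration of the E820 side, the sketch below appends one
SOFT_RESERVED entry per device to a small table. It is not the QEMU code:
the struct and the sketch_* helpers are hypothetical stand-ins, and only the
type value 0xefffffff is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM           1
#define E820_SOFT_RESERVED 0xefffffff   /* value used by the patch */

/* Hypothetical stand-in for an E820 entry: address, length, type. */
struct sketch_e820_entry {
    uint64_t address;
    uint64_t length;
    uint32_t type;
};

#define SKETCH_E820_MAX 16

/* Hypothetical fixed-size table; the real table lives in QEMU. */
struct sketch_e820_table {
    struct sketch_e820_entry entries[SKETCH_E820_MAX];
    size_t nr;
};

/* Append one entry; returns 0 on success, -1 if full or length is zero. */
static int sketch_e820_add(struct sketch_e820_table *t, uint64_t address,
                           uint64_t length, uint32_t type)
{
    assert(t != NULL);
    if (t->nr >= SKETCH_E820_MAX || length == 0) {
        return -1;
    }
    t->entries[t->nr++] = (struct sketch_e820_entry){ address, length, type };
    return 0;
}

/* Count the entries of a given type, e.g. one per spm-memory instance. */
static size_t sketch_e820_count(const struct sketch_e820_table *t,
                                uint32_t type)
{
    size_t n = 0;

    for (size_t i = 0; i < t->nr; i++) {
        if (t->entries[i].type == type) {
            n++;
        }
    }
    return n;
}
```

With two devices, the table would carry two SOFT_RESERVED entries next to
the ordinary RAM entries, which is the shape the guest is expected to see.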
The device_memory SRAT umbrella entry in hw/i386/acpi-build.c is
restructured into a per-kind partition: for each plugged
TYPE_SPM_MEMORY device the partition emits an ENABLED entry at the
device's proximity_domain, and the remaining sub-ranges (gaps between
SPM devices, leading and trailing padding, and ranges occupied by
non-SPM memory devices) are emitted as HOTPLUGGABLE | ENABLED entries
at the placeholder PXM (nb_numa_nodes - 1).
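The partition described above can be sketched in standalone C as follows.
This is a simplified model, not the patch itself: SketchSpmRange,
SketchSratEntry and sketch_partition() are hypothetical names, and the
input is assumed to be sorted by address, non-overlapping and contained in
the region. The flag bit positions follow the ACPI SRAT memory-affinity
flags (bit 0 enabled, bit 1 hot-pluggable).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ENABLED      (1u << 0)
#define SKETCH_HOTPLUGGABLE (1u << 1)

typedef struct {
    uint64_t addr;
    uint64_t size;
    uint32_t node;
} SketchSpmRange;

typedef struct {
    uint64_t base;
    uint64_t len;
    uint32_t pxm;
    uint32_t flags;
} SketchSratEntry;

/*
 * Split [base, base + size) into entries: ENABLED at the device's node
 * for each SPM range, HOTPLUGGABLE | ENABLED at placeholder_pxm for
 * every remaining sub-range. Returns the number of entries written.
 */
static size_t sketch_partition(uint64_t base, uint64_t size,
                               const SketchSpmRange *spm, size_t nr_spm,
                               uint32_t placeholder_pxm,
                               SketchSratEntry *out, size_t out_cap)
{
    uint64_t cursor = base;
    uint64_t end = base + size;
    size_t n = 0;

    /* Worst case: one gap before every SPM range plus a trailing gap. */
    assert(out_cap >= 2 * nr_spm + 1);

    for (size_t i = 0; i < nr_spm; i++) {
        /* Input must be sorted, non-overlapping and inside the region. */
        assert(spm[i].addr >= cursor);
        assert(spm[i].addr + spm[i].size <= end);

        if (cursor < spm[i].addr) {
            out[n++] = (SketchSratEntry){ cursor, spm[i].addr - cursor,
                                          placeholder_pxm,
                                          SKETCH_HOTPLUGGABLE |
                                          SKETCH_ENABLED };
        }
        out[n++] = (SketchSratEntry){ spm[i].addr, spm[i].size,
                                      spm[i].node, SKETCH_ENABLED };
        cursor = spm[i].addr + spm[i].size;
    }

    if (cursor < end) {
        out[n++] = (SketchSratEntry){ cursor, end - cursor, placeholder_pxm,
                                      SKETCH_HOTPLUGGABLE | SKETCH_ENABLED };
    }
    return n;
}
```

For a 4 GiB region with two adjacent 1 GiB SPM devices starting 1 GiB in,
this yields four entries: leading placeholder, two ENABLED device entries,
and a trailing placeholder.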
No firmware-side change is required: the existing OVMF and SeaBIOS
handling for E820 SOFT_RESERVED and ACPI SRAT covers the guest-facing
contract.
Testing
-------
Verified end-to-end on SeaBIOS and OVMF, q35 + KVM, for:
- single spm-memory instance (natural placement and explicit addr=)
- two spm-memory instances on different NUMA nodes (tightly packed and
  with an inter-device gap)
- one spm-memory + one pc-dimm on different NUMA nodes
Guest observations: /proc/iomem shows SOFT_RESERVED at the expected
addresses, and dmesg SRAT parsing finds the per-device memory_affinity
entries with the correct PXM.
Previous versions
-----------------
v1: https://lore.kernel.org/qemu-devel/20250924103324.2074819-1-FangSheng.Huang@amd.com/
v2: https://lore.kernel.org/qemu-devel/20251020090701.4036748-1-FangSheng.Huang@amd.com/
v3: https://lore.kernel.org/qemu-devel/20251208105137.2058928-1-FangSheng.Huang@amd.com/
v4: https://lore.kernel.org/qemu-devel/20251209093841.2250527-1-FangSheng.Huang@amd.com/
v5: https://lore.kernel.org/qemu-devel/20260123024312.1601732-1-FangSheng.Huang@amd.com/
v6: https://lore.kernel.org/qemu-devel/20260226105023.256568-1-FangSheng.Huang@amd.com/
v7: https://lore.kernel.org/qemu-devel/20260306082735.1106690-1-FangSheng.Huang@amd.com/
[1] v7 thread above, closed out by:
https://lore.kernel.org/qemu-devel/666a7ba1-5d3a-4732-b872-0d9fb2fe8461@amd.com/
fanhuang (1):
hw/mem: add spm-memory device for Specific Purpose Memory
MAINTAINERS | 2 +
hw/i386/Kconfig | 2 +
hw/i386/acpi-build.c | 105 ++++++++++++--
hw/i386/e820_memory_layout.h | 11 +-
hw/mem/Kconfig | 4 +
hw/mem/meson.build | 1 +
hw/mem/spm-memory.c | 269 +++++++++++++++++++++++++++++++++++
include/hw/mem/spm-memory.h | 43 ++++++
qapi/machine.json | 43 +++++-
9 files changed, 459 insertions(+), 21 deletions(-)
create mode 100644 hw/mem/spm-memory.c
create mode 100644 include/hw/mem/spm-memory.h
--
2.34.1
* [PATCH v8 1/1] hw/mem: add spm-memory device for Specific Purpose Memory
2026-05-27 7:42 [PATCH v8 0/1] hw/mem: add spm-memory device for Specific Purpose Memory fanhuang
@ 2026-05-27 7:42 ` fanhuang
2026-06-01 8:50 ` Igor Mammedov
From: fanhuang @ 2026-05-27 7:42 UTC
To: qemu-devel, imammedo, david, gourry; +Cc: Zhigang.Luo, Lianjie.Shi, fanhuang
Introduce a TYPE_MEMORY_DEVICE subclass `spm-memory` for boot-time
SOFT_RESERVED memory exposed to the guest with a per-device NUMA
proximity domain.
The device targets accelerator memory (HBM and similar) that the
firmware hands to the guest OS as SOFT_RESERVED memory, so a driver
in the guest -- rather than the kernel's general allocator -- owns
the range. Per-device NUMA placement matches the natural shape of
multiple HBM blocks (one block == one driver claim == one PXM).
Usage:
-object memory-backend-ram,id=spm0,size=8G
-numa node,nodeid=N
-device spm-memory,id=dev0,memdev=spm0,node=N[,addr=GPA]
The device:
- inherits TYPE_DEVICE and implements TYPE_MEMORY_DEVICE; placement
in machine->device_memory goes through the standard memory-device
framework (memory_device_pre_plug + memory_device_plug)
- is boot-time only: dc->hotpluggable = false, and realize rejects
attempts past PHASE_MACHINE_READY
- emits one E820 SOFT_RESERVED entry per instance at machine_done
- emits one SRAT memory_affinity entry per instance at acpi-build,
ENABLED-only (no HOTPLUGGABLE flag)
- rejects mixed-memory configurations on the target NUMA node at
realize-time
- is reported by QMP query-memory-devices as a dedicated kind,
MEMORY_DEVICE_INFO_KIND_SPM_MEMORY
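The "SPM-only node" rule from the list above can be modelled roughly as
below. This is a standalone sketch under stated assumptions, not the
realize-time code: SketchNodeCheck and sketch_check_node_exclusive() are
hypothetical, node_mem[] stands in for the per-node boot memory size, and
plugged_nodes[] stands in for the node ids of the memory devices that are
already plugged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SKETCH_NODE_OK,
    SKETCH_NODE_OUT_OF_RANGE,   /* node id is not a configured NUMA node */
    SKETCH_NODE_HAS_BOOT_MEM,   /* node already has boot memory attached */
    SKETCH_NODE_HAS_DEVICE,     /* another memory device uses this node  */
} SketchNodeCheck;

/*
 * Decide whether 'node' may host a new SPM device. The checks run in
 * the same order as described in the patch: bounds, boot memory, then
 * other memory devices on the same node.
 */
static SketchNodeCheck
sketch_check_node_exclusive(uint32_t node,
                            const uint64_t *node_mem, uint32_t nr_nodes,
                            const uint32_t *plugged_nodes, size_t nr_plugged)
{
    assert(nr_nodes == 0 || node_mem != NULL);
    if (node >= nr_nodes) {
        return SKETCH_NODE_OUT_OF_RANGE;
    }
    if (node_mem[node] > 0) {
        return SKETCH_NODE_HAS_BOOT_MEM;
    }
    for (size_t i = 0; i < nr_plugged; i++) {
        if (plugged_nodes[i] == node) {
            return SKETCH_NODE_HAS_DEVICE;
        }
    }
    return SKETCH_NODE_OK;
}
```

Note that in this model a second SPM device on the same node is rejected
as well, since any already-plugged memory device with a matching node
counts as a conflict.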
The device_memory SRAT umbrella entry in hw/i386/acpi-build.c is
restructured to partition the region into per-kind chunks rather
than emitting a single HOTPLUGGABLE entry covering everything.
For each plugged TYPE_SPM_MEMORY device the partition emits an
ENABLED entry at the device's proximity_domain; the remaining
sub-ranges (gaps between SPM devices, leading and trailing
padding, and ranges occupied by non-SPM memory devices) are
emitted as HOTPLUGGABLE | ENABLED entries at the placeholder
PXM (nb_numa_nodes - 1), preserving the upstream convention.
E820_SOFT_RESERVED is added to hw/i386/e820_memory_layout.h
alongside the other type codes.
CONFIG_SPM_MEMORY is selected by the i386 PC and Q35 machines
(same as DIMM).
MAINTAINERS gets new file entries under the existing "Memory devices"
stanza.
Signed-off-by: FangSheng Huang <FangSheng.Huang@amd.com>
---
MAINTAINERS | 2 +
hw/i386/Kconfig | 2 +
hw/i386/acpi-build.c | 105 ++++++++++++--
hw/i386/e820_memory_layout.h | 11 +-
hw/mem/Kconfig | 4 +
hw/mem/meson.build | 1 +
hw/mem/spm-memory.c | 269 +++++++++++++++++++++++++++++++++++
include/hw/mem/spm-memory.h | 43 ++++++
qapi/machine.json | 43 +++++-
9 files changed, 459 insertions(+), 21 deletions(-)
create mode 100644 hw/mem/spm-memory.c
create mode 100644 include/hw/mem/spm-memory.h
diff --git a/MAINTAINERS b/MAINTAINERS
index cd5c4831e2..2a06515fc8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3361,9 +3361,11 @@ S: Supported
F: hw/mem/memory-device*.c
F: hw/mem/nvdimm.c
F: hw/mem/pc-dimm.c
+F: hw/mem/spm-memory.c
F: include/hw/mem/memory-device.h
F: include/hw/mem/nvdimm.h
F: include/hw/mem/pc-dimm.h
+F: include/hw/mem/spm-memory.h
F: docs/nvdimm.txt
SPICE
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 12473acaa7..e31a25b634 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -84,6 +84,7 @@ config I440FX
select PCI_I440FX
select PIIX
select DIMM
+ select SPM_MEMORY
select SMBIOS
select SMBIOS_LEGACY
select FW_CFG_DMA
@@ -113,6 +114,7 @@ config Q35
select LPC_ICH9
select AHCI_ICH9
select DIMM
+ select SPM_MEMORY
select SMBIOS
select FW_CFG_DMA
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 0d7c83d5e9..865ab5fa4f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -52,6 +52,7 @@
#include "migration/vmstate.h"
#include "hw/mem/memory-device.h"
#include "hw/mem/nvdimm.h"
+#include "hw/mem/spm-memory.h"
#include "system/numa.h"
#include "system/reset.h"
#include "hw/hyperv/vmbus-bridge.h"
@@ -1346,6 +1347,95 @@ build_tpm_tcpa(GArray *table_data, BIOSLinker *linker, GArray *tcpalog,
}
#endif
+typedef struct {
+ uint64_t addr;
+ uint64_t size;
+ uint32_t node;
+} SpmRange;
+
+static int collect_spm_ranges_cb(Object *obj, void *opaque)
+{
+ GArray *ranges = opaque;
+ SpmMemoryDevice *spm;
+ MemoryDeviceClass *mdc;
+ SpmRange r;
+
+ if (!object_dynamic_cast(obj, TYPE_SPM_MEMORY)) {
+ return 0;
+ }
+ spm = SPM_MEMORY(obj);
+ mdc = MEMORY_DEVICE_GET_CLASS(MEMORY_DEVICE(spm));
+ r.addr = mdc->get_addr(MEMORY_DEVICE(spm));
+ r.size = memory_region_size(
+ host_memory_backend_get_memory(spm->hostmem));
+ r.node = spm->node;
+ g_array_append_val(ranges, r);
+ return 0;
+}
+
+static gint spm_range_compare(gconstpointer a, gconstpointer b)
+{
+ const SpmRange *range_a = a;
+ const SpmRange *range_b = b;
+
+ if (range_a->addr < range_b->addr) {
+ return -1;
+ }
+ if (range_a->addr > range_b->addr) {
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Emit SRAT memory-affinity entries covering the device_memory region:
+ * - ENABLED entry at the device's proximity_domain for each plugged
+ * TYPE_SPM_MEMORY instance.
+ * - HOTPLUGGABLE | ENABLED entry with PXM = nb_numa_nodes - 1 for
+ * every remaining sub-range (gaps, leading/trailing padding, and
+ * ranges occupied by non-SPM memory devices).
+ */
+static void build_srat_device_memory(GArray *table_data, MachineState *ms)
+{
+ g_autoptr(GArray) ranges = g_array_new(FALSE, TRUE, sizeof(SpmRange));
+ uint64_t cursor, end;
+ int nb_nodes = ms->numa_state ? ms->numa_state->num_nodes : 0;
+ uint32_t hotplug_pxm = nb_nodes > 0 ? nb_nodes - 1 : 0;
+ guint i;
+
+ if (!ms->device_memory) {
+ return;
+ }
+
+ cursor = ms->device_memory->base;
+ end = cursor + memory_region_size(&ms->device_memory->mr);
+
+ object_child_foreach_recursive(qdev_get_machine(),
+ collect_spm_ranges_cb, ranges);
+ g_array_sort(ranges, spm_range_compare);
+
+ for (i = 0; i < ranges->len; i++) {
+ SpmRange *r = &g_array_index(ranges, SpmRange, i);
+
+ if (cursor < r->addr) {
+ build_srat_memory(table_data, cursor, r->addr - cursor,
+ hotplug_pxm,
+ MEM_AFFINITY_HOTPLUGGABLE |
+ MEM_AFFINITY_ENABLED);
+ }
+ build_srat_memory(table_data, r->addr, r->size, r->node,
+ MEM_AFFINITY_ENABLED);
+ cursor = r->addr + r->size;
+ }
+
+ if (cursor < end) {
+ build_srat_memory(table_data, cursor, end - cursor,
+ hotplug_pxm,
+ MEM_AFFINITY_HOTPLUGGABLE |
+ MEM_AFFINITY_ENABLED);
+ }
+}
+
#define HOLE_640K_START (640 * KiB)
#define HOLE_640K_END (1 * MiB)
@@ -1473,20 +1563,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
build_srat_generic_affinity_structures(table_data);
- /*
- * Entry is required for Windows to enable memory hotplug in OS
- * and for Linux to enable SWIOTLB when booted with less than
- * 4G of RAM. Windows works better if the entry sets proximity
- * to the highest NUMA node in the machine.
- * Memory devices may override proximity set by this entry,
- * providing _PXM method if necessary.
- */
- if (machine->device_memory) {
- build_srat_memory(table_data, machine->device_memory->base,
- memory_region_size(&machine->device_memory->mr),
- nb_numa_nodes - 1,
- MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
- }
+ build_srat_device_memory(table_data, machine);
acpi_table_end(linker, &table);
}
diff --git a/hw/i386/e820_memory_layout.h b/hw/i386/e820_memory_layout.h
index b50acfa201..6ef169db9c 100644
--- a/hw/i386/e820_memory_layout.h
+++ b/hw/i386/e820_memory_layout.h
@@ -10,11 +10,12 @@
#define HW_I386_E820_MEMORY_LAYOUT_H
/* e820 types */
-#define E820_RAM 1
-#define E820_RESERVED 2
-#define E820_ACPI 3
-#define E820_NVS 4
-#define E820_UNUSABLE 5
+#define E820_RAM 1
+#define E820_RESERVED 2
+#define E820_ACPI 3
+#define E820_NVS 4
+#define E820_UNUSABLE 5
+#define E820_SOFT_RESERVED 0xefffffff
struct e820_entry {
uint64_t address;
diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
index 73c5ae8ad9..4145870881 100644
--- a/hw/mem/Kconfig
+++ b/hw/mem/Kconfig
@@ -16,3 +16,7 @@ config CXL_MEM_DEVICE
bool
default y if CXL
select MEM_DEVICE
+
+config SPM_MEMORY
+ bool
+ select MEM_DEVICE
diff --git a/hw/mem/meson.build b/hw/mem/meson.build
index 8c2beeb7d4..2c28104282 100644
--- a/hw/mem/meson.build
+++ b/hw/mem/meson.build
@@ -4,6 +4,7 @@ mem_ss.add(when: 'CONFIG_DIMM', if_true: files('pc-dimm.c'))
mem_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_mc.c'))
mem_ss.add(when: 'CONFIG_NVDIMM', if_true: files('nvdimm.c'))
mem_ss.add(when: 'CONFIG_CXL_MEM_DEVICE', if_true: files('cxl_type3.c'))
+mem_ss.add(when: 'CONFIG_SPM_MEMORY', if_true: files('spm-memory.c'))
stub_ss.add(files('cxl_type3_stubs.c'))
stub_ss.add(files('memory-device-stubs.c'))
diff --git a/hw/mem/spm-memory.c b/hw/mem/spm-memory.c
new file mode 100644
index 0000000000..85887b2479
--- /dev/null
+++ b/hw/mem/spm-memory.c
@@ -0,0 +1,269 @@
+/*
+ * Specific Purpose Memory (SPM) device
+ *
+ * Copyright (c) 2026 Advanced Micro Devices, Inc.
+ *
+ * Authors:
+ * FangSheng Huang <FangSheng.Huang@amd.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+#include "hw/core/boards.h"
+#include "hw/core/qdev-properties.h"
+#include "hw/core/qdev.h"
+#include "hw/mem/spm-memory.h"
+#include "hw/mem/memory-device.h"
+#include "hw/i386/e820_memory_layout.h"
+#include "migration/vmstate.h"
+#include "system/hostmem.h"
+#include "system/numa.h"
+#include "system/system.h"
+
+static QLIST_HEAD(, SpmMemoryDevice) spm_memory_list =
+ QLIST_HEAD_INITIALIZER(spm_memory_list);
+static Notifier spm_machine_done_notifier;
+static bool spm_machine_done_registered;
+
+#define SPM_MEMORY_MEMDEV_PROP "memdev"
+#define SPM_MEMORY_NODE_PROP "node"
+#define SPM_MEMORY_ADDR_PROP "addr"
+
+static const Property spm_memory_properties[] = {
+ DEFINE_PROP_LINK(SPM_MEMORY_MEMDEV_PROP, SpmMemoryDevice, hostmem,
+ TYPE_MEMORY_BACKEND, HostMemoryBackend *),
+ DEFINE_PROP_UINT32(SPM_MEMORY_NODE_PROP, SpmMemoryDevice, node, 0),
+ DEFINE_PROP_UINT64(SPM_MEMORY_ADDR_PROP, SpmMemoryDevice, addr, 0),
+};
+
+static uint64_t spm_memory_md_get_addr(const MemoryDeviceState *md)
+{
+ return SPM_MEMORY(md)->addr;
+}
+
+static void spm_memory_md_set_addr(MemoryDeviceState *md, uint64_t addr,
+ Error **errp)
+{
+ SPM_MEMORY(md)->addr = addr;
+}
+
+static MemoryRegion *spm_memory_md_get_memory_region(MemoryDeviceState *md,
+ Error **errp)
+{
+ SpmMemoryDevice *spm = SPM_MEMORY(md);
+
+ if (!spm->hostmem) {
+ error_setg(errp, "'memdev' property must be set");
+ return NULL;
+ }
+ return host_memory_backend_get_memory(spm->hostmem);
+}
+
+static uint64_t spm_memory_md_get_plugged_size(const MemoryDeviceState *md,
+ Error **errp)
+{
+ SpmMemoryDevice *spm = SPM_MEMORY(md);
+ return spm->hostmem ?
+ memory_region_size(host_memory_backend_get_memory(spm->hostmem)) : 0;
+}
+
+static void spm_memory_md_fill_device_info(const MemoryDeviceState *md,
+ MemoryDeviceInfo *info)
+{
+ SpmMemoryDeviceInfo *di = g_new0(SpmMemoryDeviceInfo, 1);
+ SpmMemoryDevice *spm = SPM_MEMORY(md);
+ DeviceState *dev = DEVICE(md);
+
+ di->id = dev->id ? g_strdup(dev->id) : NULL;
+ di->memaddr = spm->addr;
+ di->size = spm->hostmem ? memory_region_size(
+ host_memory_backend_get_memory(spm->hostmem)) : 0;
+ di->node = spm->node;
+ di->memdev = spm->hostmem ?
+ object_get_canonical_path(OBJECT(spm->hostmem)) : NULL;
+
+ info->u.spm_memory.data = di;
+ info->type = MEMORY_DEVICE_INFO_KIND_SPM_MEMORY;
+}
+
+typedef struct {
+ uint32_t node_id;
+ const SpmMemoryDevice *self; /* exclude self when walking */
+ bool conflict;
+} SpmNodeCheckCtx;
+
+static int spm_check_node_collision_cb(Object *obj, void *opaque)
+{
+ SpmNodeCheckCtx *ctx = opaque;
+ uint32_t other_node;
+
+ if (!object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) {
+ return 0;
+ }
+ /*
+ * Skip self. Compare canonical Object* pointers, not interface-cast
+ * MemoryDeviceState* (different address under INTERFACE_CHECK).
+ */
+ if (obj == OBJECT(ctx->self)) {
+ return 0;
+ }
+
+ /*
+ * Not all memory-device subclasses have a "node" property; skip
+ * those silently rather than asserting.
+ */
+ if (!object_property_find(obj, "node")) {
+ return 0;
+ }
+ other_node = (uint32_t)object_property_get_uint(obj, "node", NULL);
+ if (other_node == ctx->node_id) {
+ ctx->conflict = true;
+ return 1; /* stop walk */
+ }
+ return 0;
+}
+
+/*
+ * Require the target NUMA node to be SPM-only: driver-side discovery
+ * uses proximity_domain as the key, so a node mixing SPM with other
+ * memory yields ambiguous discovery.
+ */
+static void spm_memory_check_node_exclusive(SpmMemoryDevice *spm,
+ MachineState *ms, Error **errp)
+{
+ ERRP_GUARD();
+ SpmNodeCheckCtx ctx = { spm->node, spm, false };
+
+ /* Bounds check: spm->node must be a valid NUMA node id */
+ if (!ms->numa_state || spm->node >= ms->numa_state->num_nodes) {
+ error_setg(errp,
+ "spm-memory: node %u out of range "
+ "(numa_state has %d nodes)", spm->node,
+ ms->numa_state ? ms->numa_state->num_nodes : 0);
+ return;
+ }
+
+ /* Check 1: target node must not have memory from -numa node,memdev= */
+ if (ms->numa_state->nodes[spm->node].node_mem > 0) {
+ error_setg(errp,
+ "spm-memory: NUMA node %u already has memory attached "
+ "via -numa node,memdev=; SPM nodes must be SPM-only",
+ spm->node);
+ return;
+ }
+
+ /* Check 2: target node must not already have another memory device */
+ object_child_foreach_recursive(qdev_get_machine(),
+ spm_check_node_collision_cb, &ctx);
+ if (ctx.conflict) {
+ error_setg(errp,
+ "spm-memory: NUMA node %u already has another memory "
+ "device plugged; SPM nodes must be SPM-only", spm->node);
+ return;
+ }
+}
+
+static void spm_memory_machine_done(Notifier *n, void *opaque)
+{
+ SpmMemoryDevice *spm;
+ MemoryDeviceClass *mdc;
+ uint64_t addr, size;
+
+ QLIST_FOREACH(spm, &spm_memory_list, next) {
+ g_assert(spm->hostmem);
+ mdc = MEMORY_DEVICE_GET_CLASS(MEMORY_DEVICE(spm));
+ addr = mdc->get_addr(MEMORY_DEVICE(spm));
+ size = memory_region_size(
+ host_memory_backend_get_memory(spm->hostmem));
+ e820_add_entry(addr, size, E820_SOFT_RESERVED);
+ }
+}
+
+static void spm_memory_realize(DeviceState *dev, Error **errp)
+{
+ ERRP_GUARD();
+ SpmMemoryDevice *spm = SPM_MEMORY(dev);
+ MachineState *ms = MACHINE(qdev_get_machine());
+
+ if (phase_check(PHASE_MACHINE_READY)) {
+ error_setg(errp, "spm-memory: hotplug is not supported "
+ "(boot-time-only device)");
+ return;
+ }
+
+ if (!spm->hostmem) {
+ error_setg(errp, "'%s' property is required", SPM_MEMORY_MEMDEV_PROP);
+ return;
+ }
+ if (host_memory_backend_is_mapped(spm->hostmem)) {
+ error_setg(errp, "memory backend '%s' is already in use",
+ object_get_canonical_path_component(OBJECT(spm->hostmem)));
+ return;
+ }
+
+ spm_memory_check_node_exclusive(spm, ms, errp);
+ if (*errp) {
+ return;
+ }
+
+ memory_device_pre_plug(MEMORY_DEVICE(spm), ms, errp);
+ if (*errp) {
+ return;
+ }
+
+ host_memory_backend_set_mapped(spm->hostmem, true);
+ memory_device_plug(MEMORY_DEVICE(spm), ms);
+
+ QLIST_INSERT_HEAD(&spm_memory_list, spm, next);
+
+ if (!spm_machine_done_registered) {
+ spm_machine_done_notifier.notify = spm_memory_machine_done;
+ qemu_add_machine_init_done_notifier(&spm_machine_done_notifier);
+ spm_machine_done_registered = true;
+ }
+}
+
+static const VMStateDescription vmstate_spm_memory = {
+ .name = TYPE_SPM_MEMORY,
+ .unmigratable = 1,
+};
+
+static void spm_memory_class_init(ObjectClass *oc, const void *data)
+{
+ DeviceClass *dc = DEVICE_CLASS(oc);
+ MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(oc);
+
+ dc->desc = "SPM (Specific Purpose Memory) device";
+ dc->hotpluggable = false;
+ dc->realize = spm_memory_realize;
+ dc->vmsd = &vmstate_spm_memory;
+ device_class_set_props(dc, spm_memory_properties);
+
+ mdc->get_addr = spm_memory_md_get_addr;
+ mdc->set_addr = spm_memory_md_set_addr;
+ mdc->get_memory_region = spm_memory_md_get_memory_region;
+ mdc->get_plugged_size = spm_memory_md_get_plugged_size;
+ mdc->fill_device_info = spm_memory_md_fill_device_info;
+}
+
+static const TypeInfo spm_memory_info = {
+ .name = TYPE_SPM_MEMORY,
+ .parent = TYPE_DEVICE,
+ .class_size = sizeof(SpmMemoryDeviceClass),
+ .class_init = spm_memory_class_init,
+ .instance_size = sizeof(SpmMemoryDevice),
+ .interfaces = (InterfaceInfo[]) {
+ { TYPE_MEMORY_DEVICE },
+ { }
+ },
+};
+
+static void spm_memory_register_types(void)
+{
+ type_register_static(&spm_memory_info);
+}
+
+type_init(spm_memory_register_types)
diff --git a/include/hw/mem/spm-memory.h b/include/hw/mem/spm-memory.h
new file mode 100644
index 0000000000..c662864b29
--- /dev/null
+++ b/include/hw/mem/spm-memory.h
@@ -0,0 +1,43 @@
+/*
+ * Specific Purpose Memory (SPM) device
+ *
+ * TYPE_MEMORY_DEVICE subclass for boot-time-only memory exposed to the
+ * guest as an E820 SOFT_RESERVED range with a SRAT memory-affinity entry.
+ *
+ * Copyright (c) 2026 Advanced Micro Devices, Inc.
+ *
+ * Authors:
+ * FangSheng Huang <FangSheng.Huang@amd.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef QEMU_SPM_MEMORY_H
+#define QEMU_SPM_MEMORY_H
+
+#include "hw/mem/memory-device.h"
+#include "hw/core/qdev.h"
+#include "qom/object.h"
+#include "system/hostmem.h"
+
+#define TYPE_SPM_MEMORY "spm-memory"
+
+OBJECT_DECLARE_TYPE(SpmMemoryDevice, SpmMemoryDeviceClass, SPM_MEMORY)
+
+struct SpmMemoryDevice {
+ /*< private >*/
+ DeviceState parent_obj;
+ QLIST_ENTRY(SpmMemoryDevice) next;
+
+ /*< public >*/
+ HostMemoryBackend *hostmem; /* memdev= backend */
+ uint32_t node; /* NUMA proximity domain (node=) */
+ uint64_t addr; /* GPA (from addr= or framework-assigned) */
+};
+
+struct SpmMemoryDeviceClass {
+ /*< private >*/
+ DeviceClass parent_class;
+};
+
+#endif /* QEMU_SPM_MEMORY_H */
diff --git a/qapi/machine.json b/qapi/machine.json
index 685e4e29b8..51b06d7cba 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1413,6 +1413,32 @@
}
}
+##
+# @SpmMemoryDeviceInfo:
+#
+# spm-memory device state information
+#
+# @id: device's ID
+#
+# @memaddr: physical address in memory, where device is mapped
+#
+# @size: size of memory that the device provides
+#
+# @node: NUMA proximity domain to which the device is assigned
+#
+# @memdev: memory backend linked with device
+#
+# Since: 11.1
+##
+{ 'struct': 'SpmMemoryDeviceInfo',
+ 'data': { '*id': 'str',
+ 'memaddr': 'size',
+ 'size': 'size',
+ 'node': 'int',
+ 'memdev': 'str'
+ }
+}
+
##
# @MemoryDeviceInfoKind:
#
@@ -1426,11 +1452,13 @@
#
# @hv-balloon: since 8.2.
#
+# @spm-memory: since 11.1.
+#
# Since: 2.1
##
{ 'enum': 'MemoryDeviceInfoKind',
'data': [ 'dimm', 'nvdimm', 'virtio-pmem', 'virtio-mem', 'sgx-epc',
- 'hv-balloon' ] }
+ 'hv-balloon', 'spm-memory' ] }
##
# @PCDIMMDeviceInfoWrapper:
@@ -1482,6 +1510,16 @@
{ 'struct': 'HvBalloonDeviceInfoWrapper',
'data': { 'data': 'HvBalloonDeviceInfo' } }
+##
+# @SpmMemoryDeviceInfoWrapper:
+#
+# @data: spm-memory device state information
+#
+# Since: 11.1
+##
+{ 'struct': 'SpmMemoryDeviceInfoWrapper',
+ 'data': { 'data': 'SpmMemoryDeviceInfo' } }
+
##
# @MemoryDeviceInfo:
#
@@ -1499,7 +1537,8 @@
'virtio-pmem': 'VirtioPMEMDeviceInfoWrapper',
'virtio-mem': 'VirtioMEMDeviceInfoWrapper',
'sgx-epc': 'SgxEPCDeviceInfoWrapper',
- 'hv-balloon': 'HvBalloonDeviceInfoWrapper'
+ 'hv-balloon': 'HvBalloonDeviceInfoWrapper',
+ 'spm-memory': 'SpmMemoryDeviceInfoWrapper'
}
}
--
2.34.1
* Re: [PATCH v8 1/1] hw/mem: add spm-memory device for Specific Purpose Memory
2026-05-27 7:42 ` [PATCH v8 1/1] " fanhuang
@ 2026-06-01 8:50 ` Igor Mammedov
2026-06-01 10:28 ` Huang, FangSheng (Jerry)
From: Igor Mammedov @ 2026-06-01 8:50 UTC
To: fanhuang; +Cc: qemu-devel, david, gourry, Zhigang.Luo, Lianjie.Shi
On Wed, 27 May 2026 15:42:15 +0800
fanhuang <FangSheng.Huang@amd.com> wrote:
> Introduce a TYPE_MEMORY_DEVICE subclass `spm-memory` for boot-time
> SOFT_RESERVED memory exposed to the guest with a per-device NUMA
> proximity domain.
>
> The device targets accelerator memory (HBM and similar) that the
> firmware hands to the guest OS as SOFT_RESERVED memory, so a driver
> in the guest -- rather than the kernel's general allocator -- owns
> the range. Per-device NUMA placement matches the natural shape of
> multiple HBM blocks (one block == one driver claim == one PXM).
>
> Usage:
>
> -object memory-backend-ram,id=spm0,size=8G
> -numa node,nodeid=N
> -device spm-memory,id=dev0,memdev=spm0,node=N[,addr=GPA]
>
> The device:
>
> - inherits TYPE_DEVICE and implements TYPE_MEMORY_DEVICE; placement
> in machine->device_memory goes through the standard memory-device
> framework (memory_device_pre_plug + memory_device_plug)
> - is boot-time only: dc->hotpluggable = false, and realize rejects
> attempts past PHASE_MACHINE_READY
> - emits one E820 SOFT_RESERVED entry per instance at machine_done
> - emits one SRAT memory_affinity entry per instance at acpi-build,
> ENABLED-only (no HOTPLUGGABLE flag)
> - rejects mixed-memory configurations on the target NUMA node at
> realize-time
> - is reported by QMP query-memory-devices as a dedicated kind,
> MEMORY_DEVICE_INFO_KIND_SPM_MEMORY
>
> The device_memory SRAT umbrella entry in hw/i386/acpi-build.c is
> restructured to partition the region into per-kind chunks rather
> than emitting a single HOTPLUGGABLE entry covering everything.
> For each plugged TYPE_SPM_MEMORY device the partition emits an
> ENABLED entry at the device's proximity_domain; the remaining
> sub-ranges (gaps between SPM devices, leading and trailing
> padding, and ranges occupied by non-SPM memory devices) are
> emitted as HOTPLUGGABLE | ENABLED entries at the placeholder
> PXM (nb_numa_nodes - 1), preserving the upstream convention.
>
> E820_SOFT_RESERVED is added to hw/i386/e820_memory_layout.h
> alongside the other type codes.
>
> CONFIG_SPM_MEMORY is selected by the i386 PC and Q35 machines
> (same as DIMM).
This pass is mostly a high-level review.
The patch does too many things at once.
I suggest splitting it into several pieces:
1. introducing the spm-memory boilerplate code
2. SRAT mangling
3. adding the E820 entry
>
> MAINTAINERS gets new file entries under the existing "Memory devices"
> stanza.
>
> Signed-off-by: FangSheng Huang <FangSheng.Huang@amd.com>
> ---
> MAINTAINERS | 2 +
A separate patch, please.
> hw/i386/Kconfig | 2 +
> hw/i386/acpi-build.c | 105 ++++++++++++--
> hw/i386/e820_memory_layout.h | 11 +-
> hw/mem/Kconfig | 4 +
> hw/mem/meson.build | 1 +
> hw/mem/spm-memory.c | 269 +++++++++++++++++++++++++++++++++++
> include/hw/mem/spm-memory.h | 43 ++++++
> qapi/machine.json | 43 +++++-
> 9 files changed, 459 insertions(+), 21 deletions(-)
> create mode 100644 hw/mem/spm-memory.c
> create mode 100644 include/hw/mem/spm-memory.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cd5c4831e2..2a06515fc8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3361,9 +3361,11 @@ S: Supported
> F: hw/mem/memory-device*.c
> F: hw/mem/nvdimm.c
> F: hw/mem/pc-dimm.c
> +F: hw/mem/spm-memory.c
> F: include/hw/mem/memory-device.h
> F: include/hw/mem/nvdimm.h
> F: include/hw/mem/pc-dimm.h
> +F: include/hw/mem/spm-memory.h
> F: docs/nvdimm.txt
>
> SPICE
> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> index 12473acaa7..e31a25b634 100644
> --- a/hw/i386/Kconfig
> +++ b/hw/i386/Kconfig
> @@ -84,6 +84,7 @@ config I440FX
> select PCI_I440FX
> select PIIX
> select DIMM
> + select SPM_MEMORY
> select SMBIOS
> select SMBIOS_LEGACY
> select FW_CFG_DMA
> @@ -113,6 +114,7 @@ config Q35
> select LPC_ICH9
> select AHCI_ICH9
> select DIMM
> + select SPM_MEMORY
> select SMBIOS
> select FW_CFG_DMA
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 0d7c83d5e9..865ab5fa4f 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -52,6 +52,7 @@
> #include "migration/vmstate.h"
> #include "hw/mem/memory-device.h"
> #include "hw/mem/nvdimm.h"
> +#include "hw/mem/spm-memory.h"
> #include "system/numa.h"
> #include "system/reset.h"
> #include "hw/hyperv/vmbus-bridge.h"
> @@ -1346,6 +1347,95 @@ build_tpm_tcpa(GArray *table_data, BIOSLinker *linker, GArray *tcpalog,
> }
> #endif
>
> +typedef struct {
> + uint64_t addr;
> + uint64_t size;
> + uint32_t node;
> +} SpmRange;
> +
> +static int collect_spm_ranges_cb(Object *obj, void *opaque)
> +{
> + GArray *ranges = opaque;
> + SpmMemoryDevice *spm;
> + MemoryDeviceClass *mdc;
> + SpmRange r;
> +
> + if (!object_dynamic_cast(obj, TYPE_SPM_MEMORY)) {
> + return 0;
> + }
> + spm = SPM_MEMORY(obj);
> + mdc = MEMORY_DEVICE_GET_CLASS(MEMORY_DEVICE(spm));
> + r.addr = mdc->get_addr(MEMORY_DEVICE(spm));
> + r.size = memory_region_size(
> + host_memory_backend_get_memory(spm->hostmem));
> + r.node = spm->node;
> + g_array_append_val(ranges, r);
> + return 0;
> +}
> +
> +static gint spm_range_compare(gconstpointer a, gconstpointer b)
> +{
> + const SpmRange *range_a = a;
> + const SpmRange *range_b = b;
> +
> + if (range_a->addr < range_b->addr) {
> + return -1;
> + }
> + if (range_a->addr > range_b->addr) {
> + return 1;
> + }
> + return 0;
> +}
> +
> +/*
> + * Emit SRAT memory-affinity entries covering the device_memory region:
> + * - ENABLED entry at the device's proximity_domain for each plugged
> + * TYPE_SPM_MEMORY instance.
> + * - HOTPLUGGABLE | ENABLED entry with PXM = nb_numa_nodes - 1 for
> + * every remaining sub-range (gaps, leading/trailing padding, and
> + * ranges occupied by non-SPM memory devices).
> + */
> +static void build_srat_device_memory(GArray *table_data, MachineState *ms)
> +{
> + g_autoptr(GArray) ranges = g_array_new(FALSE, TRUE, sizeof(SpmRange));
> + uint64_t cursor, end;
> + int nb_nodes = ms->numa_state ? ms->numa_state->num_nodes : 0;
> + uint32_t hotplug_pxm = nb_nodes > 0 ? nb_nodes - 1 : 0;
> + guint i;
> +
> + if (!ms->device_memory) {
> + return;
> + }
> +
> + cursor = ms->device_memory->base;
> + end = cursor + memory_region_size(&ms->device_memory->mr);
> +
> + object_child_foreach_recursive(qdev_get_machine(),
> + collect_spm_ranges_cb, ranges);
This is not an objection, but could we do better here? The idea would be:
instead of a full machine scan, take ms->device_memory, walk its
child regions, and pick only the ones owned by SPM devices.
> + g_array_sort(ranges, spm_range_compare);
> +
> + for (i = 0; i < ranges->len; i++) {
> + SpmRange *r = &g_array_index(ranges, SpmRange, i);
> +
> + if (cursor < r->addr) {
> + build_srat_memory(table_data, cursor, r->addr - cursor,
> + hotplug_pxm,
> + MEM_AFFINITY_HOTPLUGGABLE |
> + MEM_AFFINITY_ENABLED);
> + }
> + build_srat_memory(table_data, r->addr, r->size, r->node,
> + MEM_AFFINITY_ENABLED);
> + cursor = r->addr + r->size;
> + }
> +
> + if (cursor < end) {
> + build_srat_memory(table_data, cursor, end - cursor,
> + hotplug_pxm,
> + MEM_AFFINITY_HOTPLUGGABLE |
> + MEM_AFFINITY_ENABLED);
> + }
> +}
> +
> #define HOLE_640K_START (640 * KiB)
> #define HOLE_640K_END (1 * MiB)
>
> @@ -1473,20 +1563,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>
> build_srat_generic_affinity_structures(table_data);
>
> - /*
> - * Entry is required for Windows to enable memory hotplug in OS
> - * and for Linux to enable SWIOTLB when booted with less than
> - * 4G of RAM. Windows works better if the entry sets proximity
> - * to the highest NUMA node in the machine.
> - * Memory devices may override proximity set by this entry,
> - * providing _PXM method if necessary.
> - */
Don't just delete the comment, as it still holds true. We should keep a
reminder of why we are adding the placeholder(s) and of their quirks.
> - if (machine->device_memory) {
> - build_srat_memory(table_data, machine->device_memory->base,
> - memory_region_size(&machine->device_memory->mr),
> - nb_numa_nodes - 1,
> - MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
> - }
> + build_srat_device_memory(table_data, machine);
>
> acpi_table_end(linker, &table);
> }
> diff --git a/hw/i386/e820_memory_layout.h b/hw/i386/e820_memory_layout.h
> index b50acfa201..6ef169db9c 100644
> --- a/hw/i386/e820_memory_layout.h
> +++ b/hw/i386/e820_memory_layout.h
> @@ -10,11 +10,12 @@
> #define HW_I386_E820_MEMORY_LAYOUT_H
>
> /* e820 types */
> -#define E820_RAM 1
> -#define E820_RESERVED 2
> -#define E820_ACPI 3
> -#define E820_NVS 4
> -#define E820_UNUSABLE 5
> +#define E820_RAM 1
> +#define E820_RESERVED 2
> +#define E820_ACPI 3
> +#define E820_NVS 4
> +#define E820_UNUSABLE 5
> +#define E820_SOFT_RESERVED 0xefffffff
>
> struct e820_entry {
> uint64_t address;
> diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
> index 73c5ae8ad9..4145870881 100644
> --- a/hw/mem/Kconfig
> +++ b/hw/mem/Kconfig
> @@ -16,3 +16,7 @@ config CXL_MEM_DEVICE
> bool
> default y if CXL
> select MEM_DEVICE
> +
> +config SPM_MEMORY
> + bool
> + select MEM_DEVICE
> diff --git a/hw/mem/meson.build b/hw/mem/meson.build
> index 8c2beeb7d4..2c28104282 100644
> --- a/hw/mem/meson.build
> +++ b/hw/mem/meson.build
> @@ -4,6 +4,7 @@ mem_ss.add(when: 'CONFIG_DIMM', if_true: files('pc-dimm.c'))
> mem_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_mc.c'))
> mem_ss.add(when: 'CONFIG_NVDIMM', if_true: files('nvdimm.c'))
> mem_ss.add(when: 'CONFIG_CXL_MEM_DEVICE', if_true: files('cxl_type3.c'))
> +mem_ss.add(when: 'CONFIG_SPM_MEMORY', if_true: files('spm-memory.c'))
> stub_ss.add(files('cxl_type3_stubs.c'))
>
> stub_ss.add(files('memory-device-stubs.c'))
> diff --git a/hw/mem/spm-memory.c b/hw/mem/spm-memory.c
> new file mode 100644
> index 0000000000..85887b2479
> --- /dev/null
> +++ b/hw/mem/spm-memory.c
> @@ -0,0 +1,269 @@
> +/*
> + * Specific Purpose Memory (SPM) device
> + *
> + * Copyright (c) 2026 Advanced Micro Devices, Inc.
> + *
> + * Authors:
> + * FangSheng Huang <FangSheng.Huang@amd.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/module.h"
> +#include "qapi/error.h"
> +#include "hw/core/boards.h"
> +#include "hw/core/qdev-properties.h"
> +#include "hw/core/qdev.h"
> +#include "hw/mem/spm-memory.h"
> +#include "hw/mem/memory-device.h"
> +#include "hw/i386/e820_memory_layout.h"
> +#include "migration/vmstate.h"
> +#include "system/hostmem.h"
> +#include "system/numa.h"
> +#include "system/system.h"
> +
> +static QLIST_HEAD(, SpmMemoryDevice) spm_memory_list =
> + QLIST_HEAD_INITIALIZER(spm_memory_list);
> +static Notifier spm_machine_done_notifier;
> +static bool spm_machine_done_registered;
> +
> +#define SPM_MEMORY_MEMDEV_PROP "memdev"
> +#define SPM_MEMORY_NODE_PROP "node"
> +#define SPM_MEMORY_ADDR_PROP "addr"
> +
> +static const Property spm_memory_properties[] = {
> + DEFINE_PROP_LINK(SPM_MEMORY_MEMDEV_PROP, SpmMemoryDevice, hostmem,
> + TYPE_MEMORY_BACKEND, HostMemoryBackend *),
> + DEFINE_PROP_UINT32(SPM_MEMORY_NODE_PROP, SpmMemoryDevice, node, 0),
> + DEFINE_PROP_UINT64(SPM_MEMORY_ADDR_PROP, SpmMemoryDevice, addr, 0),
> +};
> +
> +static uint64_t spm_memory_md_get_addr(const MemoryDeviceState *md)
> +{
> + return SPM_MEMORY(md)->addr;
> +}
> +
> +static void spm_memory_md_set_addr(MemoryDeviceState *md, uint64_t addr,
> + Error **errp)
> +{
> + SPM_MEMORY(md)->addr = addr;
> +}
> +
> +static MemoryRegion *spm_memory_md_get_memory_region(MemoryDeviceState *md,
> + Error **errp)
> +{
> + SpmMemoryDevice *spm = SPM_MEMORY(md);
> +
> + if (!spm->hostmem) {
> + error_setg(errp, "'memdev' property must be set");
> + return NULL;
> + }
> + return host_memory_backend_get_memory(spm->hostmem);
> +}
> +
> +static uint64_t spm_memory_md_get_plugged_size(const MemoryDeviceState *md,
> + Error **errp)
> +{
> + SpmMemoryDevice *spm = SPM_MEMORY(md);
> + return spm->hostmem ?
> + memory_region_size(host_memory_backend_get_memory(spm->hostmem)) : 0;
> +}
> +
> +static void spm_memory_md_fill_device_info(const MemoryDeviceState *md,
> + MemoryDeviceInfo *info)
> +{
> + SpmMemoryDeviceInfo *di = g_new0(SpmMemoryDeviceInfo, 1);
> + SpmMemoryDevice *spm = SPM_MEMORY(md);
> + DeviceState *dev = DEVICE(md);
> +
> + di->id = dev->id ? g_strdup(dev->id) : NULL;
> + di->memaddr = spm->addr;
> + di->size = spm->hostmem ? memory_region_size(
> + host_memory_backend_get_memory(spm->hostmem)) : 0;
> + di->node = spm->node;
> + di->memdev = spm->hostmem ?
> + object_get_canonical_path(OBJECT(spm->hostmem)) : NULL;
> +
> + info->u.spm_memory.data = di;
> + info->type = MEMORY_DEVICE_INFO_KIND_SPM_MEMORY;
> +}
> +
> +typedef struct {
> + uint32_t node_id;
> + const SpmMemoryDevice *self; /* exclude self when walking */
> + bool conflict;
> +} SpmNodeCheckCtx;
> +
> +static int spm_check_node_collision_cb(Object *obj, void *opaque)
> +{
> + SpmNodeCheckCtx *ctx = opaque;
> + uint32_t other_node;
> +
> + if (!object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) {
> + return 0;
> + }
> + /*
> + * Skip self. Compare canonical Object* pointers, not interface-cast
> + * MemoryDeviceState* (different address under INTERFACE_CHECK).
> + */
> + if (obj == OBJECT(ctx->self)) {
> + return 0;
> + }
> +
> + /*
> + * Not all memory-device subclasses have a "node" property; skip
> + * those silently rather than asserting.
> + */
> + if (!object_property_find(obj, "node")) {
> + return 0;
> + }
> + other_node = (uint32_t)object_property_get_uint(obj, "node", NULL);
> + if (other_node == ctx->node_id) {
> + ctx->conflict = true;
> + return 1; /* stop walk */
> + }
> + return 0;
> +}
> +
> +/*
> + * Require the target NUMA node to be SPM-only: driver-side discovery
> + * uses proximity_domain as the key, so a node mixing SPM with other
> + * memory yields ambiguous discovery.
> + */
> +static void spm_memory_check_node_exclusive(SpmMemoryDevice *spm,
> + MachineState *ms, Error **errp)
> +{
> + ERRP_GUARD();
> + SpmNodeCheckCtx ctx = { spm->node, spm, false };
> +
> + /* Bounds check: spm->node must be a valid NUMA node id */
> + if (!ms->numa_state || spm->node >= ms->numa_state->num_nodes) {
> + error_setg(errp,
> + "spm-memory: node %u out of range "
> + "(numa_state has %d nodes)", spm->node,
> + ms->numa_state ? ms->numa_state->num_nodes : 0);
> + return;
> + }
> +
> + /* Check 1: target node must not have memory from -numa node,memdev= */
> + if (ms->numa_state->nodes[spm->node].node_mem > 0) {
> + error_setg(errp,
> + "spm-memory: NUMA node %u already has memory attached "
> + "via -numa node,memdev=; SPM nodes must be SPM-only",
> + spm->node);
> + return;
> + }
> +
> + /* Check 2: target node must not already have another memory device */
> + object_child_foreach_recursive(qdev_get_machine(),
> + spm_check_node_collision_cb, &ctx);
> + if (ctx.conflict) {
> + error_setg(errp,
> + "spm-memory: NUMA node %u already has another memory "
> + "device plugged; SPM nodes must be SPM-only", spm->node);
> + return;
> + }
> +}
> +
> +static void spm_memory_machine_done(Notifier *n, void *opaque)
> +{
> + SpmMemoryDevice *spm;
> + MemoryDeviceClass *mdc;
> + uint64_t addr, size;
> +
> + QLIST_FOREACH(spm, &spm_memory_list, next) {
> + g_assert(spm->hostmem);
> + mdc = MEMORY_DEVICE_GET_CLASS(MEMORY_DEVICE(spm));
> + addr = mdc->get_addr(MEMORY_DEVICE(spm));
> + size = memory_region_size(
> + host_memory_backend_get_memory(spm->hostmem));
> + e820_add_entry(addr, size, E820_SOFT_RESERVED);
> + }
> +}
> +
> +static void spm_memory_realize(DeviceState *dev, Error **errp)
> +{
> + ERRP_GUARD();
> + SpmMemoryDevice *spm = SPM_MEMORY(dev);
> + MachineState *ms = MACHINE(qdev_get_machine());
Please do not use the machine from device-proper code.
We do have plug handlers that provide it when necessary.
> +
> + if (phase_check(PHASE_MACHINE_READY)) {
> + error_setg(errp, "spm-memory: hotplug is not supported "
> + "(boot-time-only device)");
> + return;
> + }
shouldn't be necessary, dc->hotpluggable in class init should be sufficient.
> +
> + if (!spm->hostmem) {
> + error_setg(errp, "'%s' property is required", SPM_MEMORY_MEMDEV_PROP);
> + return;
> + }
> + if (host_memory_backend_is_mapped(spm->hostmem)) {
> + error_setg(errp, "memory backend '%s' is already in use",
> + object_get_canonical_path_component(OBJECT(spm->hostmem)));
> + return;
> + }
> +
> + spm_memory_check_node_exclusive(spm, ms, errp);
> + if (*errp) {
> + return;
> + }
As far as I understood from previous discussions, so far it's our
own precaution.
I'd drop that. If you find a spec requiring it, then
it should be a separate patch pointing to the spec (or something else that
justifies it).
> +
> + memory_device_pre_plug(MEMORY_DEVICE(spm), ms, errp);
> + if (*errp) {
> + return;
> + }
> +
> + host_memory_backend_set_mapped(spm->hostmem, true);
> + memory_device_plug(MEMORY_DEVICE(spm), ms);
That's basically code duplication
that doesn't belong in realize_fn; see how it's used by other devices.
The gist is: mapping into the address space, generic checks, and machine-related
steps go into machine handlers.
> +
> + QLIST_INSERT_HEAD(&spm_memory_list, spm, next);
Don't use a global list unless you have to; see below.
> +
> + if (!spm_machine_done_registered) {
> + spm_machine_done_notifier.notify = spm_memory_machine_done;
> + qemu_add_machine_init_done_notifier(&spm_machine_done_notifier);
> + spm_machine_done_registered = true;
> + }
The e820 part should also go into the machine-specific plug handler;
that will also help with getting rid of spm_memory_list.
It should also let you drop the machine_done handler:
the machine plug handler would do the job instead (and much earlier).
> +}
> +
> +static const VMStateDescription vmstate_spm_memory = {
> + .name = TYPE_SPM_MEMORY,
> + .unmigratable = 1,
> +};
> +
> +static void spm_memory_class_init(ObjectClass *oc, const void *data)
> +{
> + DeviceClass *dc = DEVICE_CLASS(oc);
> + MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(oc);
> +
> + dc->desc = "SPM (Specific Purpose Memory) device";
> + dc->hotpluggable = false;
> + dc->realize = spm_memory_realize;
> + dc->vmsd = &vmstate_spm_memory;
> + device_class_set_props(dc, spm_memory_properties);
> +
> + mdc->get_addr = spm_memory_md_get_addr;
> + mdc->set_addr = spm_memory_md_set_addr;
> + mdc->get_memory_region = spm_memory_md_get_memory_region;
> + mdc->get_plugged_size = spm_memory_md_get_plugged_size;
> + mdc->fill_device_info = spm_memory_md_fill_device_info;
> +}
> +
> +static const TypeInfo spm_memory_info = {
> + .name = TYPE_SPM_MEMORY,
> + .parent = TYPE_DEVICE,
> + .class_size = sizeof(SpmMemoryDeviceClass),
> + .class_init = spm_memory_class_init,
> + .instance_size = sizeof(SpmMemoryDevice),
> + .interfaces = (InterfaceInfo[]) {
> + { TYPE_MEMORY_DEVICE },
> + { }
> + },
> +};
> +
> +static void spm_memory_register_types(void)
> +{
> + type_register_static(&spm_memory_info);
> +}
> +
> +type_init(spm_memory_register_types)
> diff --git a/include/hw/mem/spm-memory.h b/include/hw/mem/spm-memory.h
> new file mode 100644
> index 0000000000..c662864b29
> --- /dev/null
> +++ b/include/hw/mem/spm-memory.h
> @@ -0,0 +1,43 @@
> +/*
> + * Specific Purpose Memory (SPM) device
> + *
> + * TYPE_MEMORY_DEVICE subclass for boot-time-only memory exposed to the
> + * guest as an E820 SOFT_RESERVED range with a SRAT memory-affinity entry.
> + *
> + * Copyright (c) 2026 Advanced Micro Devices, Inc.
> + *
> + * Authors:
> + * FangSheng Huang <FangSheng.Huang@amd.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef QEMU_SPM_MEMORY_H
> +#define QEMU_SPM_MEMORY_H
> +
> +#include "hw/mem/memory-device.h"
> +#include "hw/core/qdev.h"
> +#include "qom/object.h"
> +#include "system/hostmem.h"
> +
> +#define TYPE_SPM_MEMORY "spm-memory"
> +
> +OBJECT_DECLARE_TYPE(SpmMemoryDevice, SpmMemoryDeviceClass, SPM_MEMORY)
> +
> +struct SpmMemoryDevice {
> + /*< private >*/
> + DeviceState parent_obj;
> + QLIST_ENTRY(SpmMemoryDevice) next;
> +
> + /*< public >*/
> + HostMemoryBackend *hostmem; /* memdev= backend */
> + uint32_t node; /* NUMA proximity domain (node=) */
> + uint64_t addr; /* GPA (from addr= or framework-assigned) */
> +};
> +
> +struct SpmMemoryDeviceClass {
> + /*< private >*/
> + DeviceClass parent_class;
> +};
> +
> +#endif /* QEMU_SPM_MEMORY_H */
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 685e4e29b8..51b06d7cba 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1413,6 +1413,32 @@
> }
> }
>
> +##
> +# @SpmMemoryDeviceInfo:
> +#
> +# spm-memory device state information
> +#
> +# @id: device's ID
> +#
> +# @memaddr: physical address in memory, where device is mapped
> +#
> +# @size: size of memory that the device provides
> +#
> +# @node: NUMA proximity domain to which the device is assigned
> +#
> +# @memdev: memory backend linked with device
> +#
> +# Since: 11.1
> +##
> +{ 'struct': 'SpmMemoryDeviceInfo',
> + 'data': { '*id': 'str',
> + 'memaddr': 'size',
> + 'size': 'size',
> + 'node': 'int',
> + 'memdev': 'str'
> + }
> +}
> +
> ##
> # @MemoryDeviceInfoKind:
> #
> @@ -1426,11 +1452,13 @@
> #
> # @hv-balloon: since 8.2.
> #
> +# @spm-memory: since 11.1.
> +#
> # Since: 2.1
> ##
> { 'enum': 'MemoryDeviceInfoKind',
> 'data': [ 'dimm', 'nvdimm', 'virtio-pmem', 'virtio-mem', 'sgx-epc',
> - 'hv-balloon' ] }
> + 'hv-balloon', 'spm-memory' ] }
>
> ##
> # @PCDIMMDeviceInfoWrapper:
> @@ -1482,6 +1510,16 @@
> { 'struct': 'HvBalloonDeviceInfoWrapper',
> 'data': { 'data': 'HvBalloonDeviceInfo' } }
>
> +##
> +# @SpmMemoryDeviceInfoWrapper:
> +#
> +# @data: spm-memory device state information
> +#
> +# Since: 11.1
> +##
> +{ 'struct': 'SpmMemoryDeviceInfoWrapper',
> + 'data': { 'data': 'SpmMemoryDeviceInfo' } }
> +
> ##
> # @MemoryDeviceInfo:
> #
> @@ -1499,7 +1537,8 @@
> 'virtio-pmem': 'VirtioPMEMDeviceInfoWrapper',
> 'virtio-mem': 'VirtioMEMDeviceInfoWrapper',
> 'sgx-epc': 'SgxEPCDeviceInfoWrapper',
> - 'hv-balloon': 'HvBalloonDeviceInfoWrapper'
> + 'hv-balloon': 'HvBalloonDeviceInfoWrapper',
> + 'spm-memory': 'SpmMemoryDeviceInfoWrapper'
> }
> }
>
* Re: [PATCH v8 1/1] hw/mem: add spm-memory device for Specific Purpose Memory
2026-06-01 8:50 ` Igor Mammedov
@ 2026-06-01 10:28 ` Huang, FangSheng (Jerry)
0 siblings, 0 replies; 4+ messages in thread
From: Huang, FangSheng (Jerry) @ 2026-06-01 10:28 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, david, gourry, Zhigang.Luo, Lianjie.Shi
Hi Igor,
Thanks for the detailed review. Inline.
On 6/1/2026 4:50 PM, Igor Mammedov wrote:
> On Wed, 27 May 2026 15:42:15 +0800
> fanhuang <FangSheng.Huang@amd.com> wrote:
>
>> Introduce a TYPE_MEMORY_DEVICE subclass `spm-memory` for boot-time
>> SOFT_RESERVED memory exposed to the guest with a per-device NUMA
>> proximity domain.
>>
>> The device targets accelerator memory (HBM and similar) that the
>> firmware hands to the guest OS as SOFT_RESERVED memory, so a driver
>> in the guest -- rather than the kernel's general allocator -- owns
>> the range. Per-device NUMA placement matches the natural shape of
>> multiple HBM blocks (one block == one driver claim == one PXM).
>>
>> [...]
>>
>> CONFIG_SPM_MEMORY is selected by the i386 PC and Q35 machines
>> (same as DIMM).
>
> This pass is mostly a high-level review.
> The patch is doing too many things at once.
> I suggest splitting it into several pieces:
> 1. introducing spm-memory boiler plate code
> 2. SRAT mangling
> 3. adding E820 entry
>
Will split as (1) spm-memory boilerplate, (2) SRAT mangling, and
(3) E820 + pc.c plug-handler integration; MAINTAINERS becomes (4).
>>
>> MAINTAINERS gets new file entries under the existing "Memory devices"
>> stanza.
>>
>> Signed-off-by: FangSheng Huang <FangSheng.Huang@amd.com>
>> ---
>> MAINTAINERS | 2 +
> a separate patch, pls.
>
Yes, will be patch (4) on its own.
>
> [...]
>
>> + cursor = ms->device_memory->base;
>> + end = cursor + memory_region_size(&ms->device_memory->mr);
>> +
>> + object_child_foreach_recursive(qdev_get_machine(),
>> + collect_spm_ranges_cb, ranges);
> It's not an objection, but could we do better here? The idea would be:
> instead of a full machine scan, take ms->device_memory, walk its
> child regions, and pick only the ones owned by SPM devices.
>
Noted -- I'll stay with the current full-scan pattern for v9.
The subregion's owner is the backend rather than the device, so
a clean device_memory walk would need an extra backend->device
reverse lookup. Happy to add that if you'd prefer.
>> - /*
>> - * Entry is required for Windows to enable memory hotplug in OS
>> - * and for Linux to enable SWIOTLB when booted with less than
>> - * 4G of RAM. Windows works better if the entry sets proximity
>> - * to the highest NUMA node in the machine.
>> - * Memory devices may override proximity set by this entry,
>> - * providing _PXM method if necessary.
>> - */
>
> Don't just delete the comment, as it still holds true. We should keep a reminder
> of why we are adding the placeholder(s) and of their quirks.
>
Will keep an adapted version of the comment alongside the new
partition logic.
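
For readers following the partition logic discussed here, the sketch below
models it in standalone C, outside QEMU. It assumes the SPM ranges do not
overlap and lie inside the device-memory window. The names SpmRangeModel,
SratEntryModel, and partition_device_memory() are hypothetical and exist only
for illustration; the flag constants merely mirror the ACPI SRAT bit positions
(bit 0 enabled, bit 1 hot-pluggable).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins for the SRAT memory-affinity flags (not QEMU's macros). */
enum { AFF_ENABLED = 1u << 0, AFF_HOTPLUGGABLE = 1u << 1 };

/* Hypothetical model types; the patch's real types live in QEMU. */
typedef struct { uint64_t addr, size; uint32_t node; } SpmRangeModel;
typedef struct { uint64_t base, len; uint32_t node, flags; } SratEntryModel;

static int spm_range_cmp(const void *a, const void *b)
{
    const SpmRangeModel *ra = a, *rb = b;
    return (ra->addr > rb->addr) - (ra->addr < rb->addr);
}

/*
 * Split the window [base, base + size) into SRAT entries. Each SPM range
 * gets its own node and ENABLED only; every gap keeps the hotpluggable
 * placeholder on hotplug_pxm. Sorts ranges in place and returns the number
 * of entries written to out (at most 2 * n + 1).
 */
static size_t partition_device_memory(uint64_t base, uint64_t size,
                                      SpmRangeModel *ranges, size_t n,
                                      uint32_t hotplug_pxm,
                                      SratEntryModel *out)
{
    uint64_t cursor = base, end = base + size;
    size_t count = 0;

    qsort(ranges, n, sizeof(*ranges), spm_range_cmp);

    for (size_t i = 0; i < n; i++) {
        const SpmRangeModel *r = &ranges[i];

        assert(r->addr >= cursor);      /* assumption: no overlaps */
        if (cursor < r->addr) {         /* gap before this SPM range */
            out[count++] = (SratEntryModel){
                cursor, r->addr - cursor, hotplug_pxm,
                AFF_HOTPLUGGABLE | AFF_ENABLED };
        }
        out[count++] = (SratEntryModel){
            r->addr, r->size, r->node, AFF_ENABLED };
        cursor = r->addr + r->size;
    }
    if (cursor < end) {                 /* tail after the last SPM range */
        out[count++] = (SratEntryModel){
            cursor, end - cursor, hotplug_pxm,
            AFF_HOTPLUGGABLE | AFF_ENABLED };
    }
    return count;
}
```

With two SPM ranges inside a window, the model yields one ENABLED-only entry
per range and one hotpluggable placeholder per remaining gap, which is the
reason the original comment about the placeholder still applies to each gap.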
>
> [...]
>
>> +static void spm_memory_realize(DeviceState *dev, Error **errp)
>> +{
>> + ERRP_GUARD();
>> + SpmMemoryDevice *spm = SPM_MEMORY(dev);
>
>> + MachineState *ms = MACHINE(qdev_get_machine());
>
> Please do not use the machine from device-proper code.
> We do have plug handlers that provide it when necessary.
>
Will remove the MachineState reference; machine-level work moves
to the pc.c plug handler (see below).
>> +
>
>> + if (phase_check(PHASE_MACHINE_READY)) {
>> + error_setg(errp, "spm-memory: hotplug is not supported "
>> + "(boot-time-only device)");
>> + return;
>> + }
>
> shouldn't be necessary, dc->hotpluggable in class init should be sufficient.
>
Confirmed -- will drop the check.
>
>> + spm_memory_check_node_exclusive(spm, ms, errp);
>> + if (*errp) {
>> + return;
>> + }
>
> As far as I understood from previous discussions, so far it's our
> own precaution.
>
> I'd drop that. If you find a spec requiring it, then
> it should be a separate patch pointing to the spec (or something else that
> justifies it).
>
No spec to cite. Will drop the check and the associated helpers
in v9.
>> +
>> + memory_device_pre_plug(MEMORY_DEVICE(spm), ms, errp);
>> + if (*errp) {
>> + return;
>> + }
>
>> +
>> + host_memory_backend_set_mapped(spm->hostmem, true);
>> + memory_device_plug(MEMORY_DEVICE(spm), ms);
>
> That's basically code duplication
> that doesn't belong in realize_fn; see how it's used by other devices.
>
> The gist is: mapping into the address space, generic checks, and machine-related
> steps go into machine handlers.
>
Will move pre_plug / plug / set_mapped into pc_spm_memory_pre_plug /
pc_spm_memory_plug in hw/i386/pc.c, hooked via the existing
pc_machine_device_{pre_,}plug_cb dispatch (same pattern as pc-dimm).
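
The pre_plug / plug split mentioned above can be illustrated with a small
standalone model. This is only a sketch under assumptions: BackendModel,
SpmDevModel, model_pre_plug(), and model_plug() are hypothetical names, not
QEMU's hotplug-handler API. It shows the general shape only, where pre_plug
validates without side effects and plug commits state and is not expected to
fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the memory backend and the SPM device. */
typedef struct { bool mapped; } BackendModel;
typedef struct { BackendModel *hostmem; bool plugged; } SpmDevModel;

/*
 * Phase 1: validation only, no state is changed.
 * Returns NULL on success or a static error message on failure.
 */
static const char *model_pre_plug(const SpmDevModel *dev)
{
    if (!dev->hostmem) {
        return "'memdev' property is required";
    }
    if (dev->hostmem->mapped) {
        return "memory backend is already in use";
    }
    return NULL;
}

/*
 * Phase 2: commit. Runs only after pre_plug succeeded, so it has no
 * error path of its own.
 */
static void model_plug(SpmDevModel *dev)
{
    assert(model_pre_plug(dev) == NULL);
    dev->hostmem->mapped = true;    /* backend is now claimed */
    dev->plugged = true;
}
```

Keeping all checks in the first phase means a rejected device leaves the
backend untouched, so a second device sharing the same backend is refused
before anything is mapped.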
>
>> +
>> + QLIST_INSERT_HEAD(&spm_memory_list, spm, next);
> Don't use a global list unless you have to; see below.
>
Goes away with the plug-handler move below.
>> +
>> + if (!spm_machine_done_registered) {
>> + spm_machine_done_notifier.notify = spm_memory_machine_done;
>> + qemu_add_machine_init_done_notifier(&spm_machine_done_notifier);
>> + spm_machine_done_registered = true;
>> + }
>
> The e820 part should also go into the machine-specific plug handler;
> that will also help with getting rid of spm_memory_list.
> It should also let you drop the machine_done handler:
> the machine plug handler would do the job instead (and much earlier).
>
Yes -- e820_add_entry moves into pc_spm_memory_plug(), eliminating
both the global list and the machine-init-done notifier.
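
As a rough illustration of why the plug-time approach needs neither a global
list nor a notifier, the standalone model below appends one SOFT_RESERVED
entry per device at the moment it is plugged. Everything here is a sketch
under assumptions: the table is a hypothetical fixed-size array (E820TableModel),
the entry layout and the helpers model_e820_add() and model_spm_plug() are
invented for illustration, and only the type value 0xefffffff is taken from
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_E820_RAM            1
#define MODEL_E820_SOFT_RESERVED  0xefffffff  /* value used by the patch */
#define MODEL_E820_MAX            16          /* hypothetical capacity */

/* Hypothetical entry and table layout, not QEMU's real definitions. */
typedef struct { uint64_t address, length; uint32_t type; } E820EntryModel;
typedef struct { E820EntryModel e[MODEL_E820_MAX]; size_t n; } E820TableModel;

/* Append one entry; returns 0 on success, -1 if full or zero-length. */
static int model_e820_add(E820TableModel *t, uint64_t addr, uint64_t len,
                          uint32_t type)
{
    if (t->n == MODEL_E820_MAX || len == 0) {
        return -1;
    }
    t->e[t->n++] = (E820EntryModel){ addr, len, type };
    return 0;
}

/*
 * Model of the plug step: the device's range is recorded right away,
 * so nothing has to be remembered until a later machine-done callback.
 */
static int model_spm_plug(E820TableModel *t, uint64_t addr, uint64_t size)
{
    return model_e820_add(t, addr, size, MODEL_E820_SOFT_RESERVED);
}

/* Count entries of a given type, e.g. to check one entry per device. */
static size_t model_e820_count_type(const E820TableModel *t, uint32_t type)
{
    size_t count = 0;

    for (size_t i = 0; i < t->n; i++) {
        count += (t->e[i].type == type);
    }
    return count;
}
```

Because each plug call writes its own entry, the table ends up with exactly
one SOFT_RESERVED entry per plugged device without any per-device bookkeeping
outside the table itself.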
> [...]
>
Will respin and post v9 shortly.
Best regards,
FangSheng Huang (Jerry)