[PATCH v8 0/1] hw/mem: add spm-memory device for Specific Purpose Memory
From: fanhuang
Date: 2026-05-27 7:42 UTC
To: qemu-devel, imammedo, david, gourry
Cc: Zhigang.Luo, Lianjie.Shi, fanhuang
This series adds a TYPE_MEMORY_DEVICE subclass `spm-memory` for boot-time
SOFT_RESERVED guest memory, following the direction from the v7 thread [1].
Background
----------
This series targets coherent CPU + accelerator shared-address-space
systems, where the accelerator's HBM is not a device-private framebuffer
behind a PCIe BAR but a tier of host system memory: visible to the CPU
in the platform physical address space, shared coherently with the
accelerator over the platform fabric, and bound to a NUMA proximity
domain that platform firmware assigns during boot-time fabric training.
For such a region to function correctly in the guest, two things must
hold simultaneously: the CPU memory subsystem has to see it in the
system memory map (so the CPU side can address it), and it has to be
reserved exclusively for the accelerator's driver (so the kernel's
general allocator does not hand HBM pages to unrelated workloads). The
SOFT_RESERVED memory type in E820 plus a matching SRAT memory-affinity
entry is the mechanism that delivers both: a firmware-produced topology
that the CPU memory subsystem honors and the accelerator's driver
consumes for its own range.
Approach
--------
The patch adds a new TYPE_MEMORY_DEVICE subclass `spm-memory`. Each
instance binds one host memory backend to a single NUMA proximity
domain and is boot-time only; placement, hotplug rejection, and QMP
introspection come from the existing memory-device framework. The
device emits one E820 SOFT_RESERVED entry per instance at
machine_done and one SRAT memory-affinity entry per instance at
acpi-build, the latter flagged ENABLED only.
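As a rough illustration of the pairing described above, the sketch below
fills one E820 record and one SRAT memory-affinity record for the same
range. It is not QEMU code: E820Sketch, SratMemSketch and spm_describe()
are hypothetical names, and the records are simplified stand-ins for the
real table layouts. Only the E820_SOFT_RESERVED value comes from the
patch; the flag bit positions are assumed to follow the ACPI SRAT
memory-affinity flags.

```c
#include <assert.h>
#include <stdint.h>

/* Type code added by the patch in hw/i386/e820_memory_layout.h. */
#define E820_SOFT_RESERVED 0xefffffff

/* Assumed SRAT memory-affinity flag bits (per the ACPI specification). */
#define MEM_AFFINITY_ENABLED      (1u << 0)
#define MEM_AFFINITY_HOTPLUGGABLE (1u << 1)

/* Hypothetical, simplified records; not QEMU's real structures. */
typedef struct {
    uint64_t address;
    uint64_t length;
    uint32_t type;
} E820Sketch;

typedef struct {
    uint64_t base;
    uint64_t length;
    uint32_t proximity_domain;
    uint32_t flags;
} SratMemSketch;

/* Describe one spm-memory instance: the same range goes in both tables. */
static void spm_describe(uint64_t gpa, uint64_t size, uint32_t node,
                         E820Sketch *e820, SratMemSketch *srat)
{
    assert(size > 0 && e820 && srat);

    /* E820: the range is visible but kept away from the general allocator. */
    e820->address = gpa;
    e820->length = size;
    e820->type = E820_SOFT_RESERVED;

    /* SRAT: the range belongs to the device's own proximity domain. */
    srat->base = gpa;
    srat->length = size;
    srat->proximity_domain = node;
    srat->flags = MEM_AFFINITY_ENABLED; /* HOTPLUGGABLE deliberately unset */
}
```

Calling spm_describe(0x100000000, 8 GiB, 1, ...) would yield an E820 record
of type 0xefffffff and an SRAT record at PXM 1 with only the ENABLED bit
set.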
The device_memory SRAT umbrella entry in hw/i386/acpi-build.c is
restructured into a per-kind partition: for each plugged
TYPE_SPM_MEMORY device the partition emits an ENABLED entry at the
device's proximity_domain, and the remaining sub-ranges (gaps between
SPM devices, leading and trailing padding, and ranges occupied by
non-SPM memory devices) are emitted as HOTPLUGGABLE | ENABLED entries
at the placeholder PXM (nb_numa_nodes - 1).
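The partition can be sketched outside QEMU as a single pass over the SPM
ranges sorted by address. The code below is a standalone approximation,
not the patch itself: SratRange and partition_device_memory() are
hypothetical names, the output is an array rather than ACPI table bytes,
and the flag bit values are assumed to match the ACPI SRAT definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed SRAT memory-affinity flag bits (per the ACPI specification). */
#define MEM_AFFINITY_ENABLED      (1u << 0)
#define MEM_AFFINITY_HOTPLUGGABLE (1u << 1)

/* Hypothetical record for one SRAT memory-affinity entry. */
typedef struct {
    uint64_t addr;
    uint64_t size;
    uint32_t node;
    uint32_t flags;
} SratRange;

/*
 * Partition [base, base + region_size) around SPM ranges that are sorted
 * by address and do not overlap. The flags field of spm[] is ignored.
 * Writes at most 2 * n_spm + 1 entries to out and returns the count.
 */
static size_t partition_device_memory(uint64_t base, uint64_t region_size,
                                      const SratRange *spm, size_t n_spm,
                                      uint32_t placeholder_pxm,
                                      SratRange *out)
{
    const uint32_t filler = MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED;
    uint64_t cursor = base;
    uint64_t end = base + region_size;
    size_t n = 0;

    for (size_t i = 0; i < n_spm; i++) {
        /* Precondition: sorted, non-overlapping, inside the region. */
        assert(spm[i].addr >= cursor && spm[i].addr + spm[i].size <= end);

        /* Gap before this SPM device goes to the placeholder PXM. */
        if (cursor < spm[i].addr) {
            out[n++] = (SratRange){ cursor, spm[i].addr - cursor,
                                    placeholder_pxm, filler };
        }
        /* The SPM device itself: ENABLED only, at its own PXM. */
        out[n++] = (SratRange){ spm[i].addr, spm[i].size, spm[i].node,
                                MEM_AFFINITY_ENABLED };
        cursor = spm[i].addr + spm[i].size;
    }

    /* Trailing padding, if any, also goes to the placeholder PXM. */
    if (cursor < end) {
        out[n++] = (SratRange){ cursor, end - cursor, placeholder_pxm,
                                filler };
    }
    return n;
}
```

With two SPM devices and a gap between them, the result alternates between
placeholder entries and per-device entries; when the last device ends
exactly at the end of the region, no trailing entry is produced.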
No firmware-side change is required: the existing OVMF and SeaBIOS
handling for E820 SOFT_RESERVED and ACPI SRAT covers the guest-facing
contract.
Testing
-------
Verified end-to-end on SeaBIOS and OVMF, q35 + KVM, for:
- single spm-memory instance (natural placement and explicit addr=)
- two spm-memory instances on different NUMA nodes (tightly packed and
  with an inter-device gap)
- one spm-memory + one pc-dimm on different NUMA nodes
Guest observations: /proc/iomem shows the SOFT_RESERVED ranges at the
expected addresses, and SRAT parsing in dmesg finds the per-device
memory-affinity entries with the correct PXM.
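The /proc/iomem check can be scripted along these lines. This is only a
sketch: parse_soft_reserved() is a hypothetical helper, and it assumes the
guest kernel labels the range "Soft Reserved" and that the file is read
with enough privilege for real addresses to be shown.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem line such as
 *   "100000000-2ffffffff : Soft Reserved"
 * Returns true and fills start/end (both inclusive) when the line names
 * a Soft Reserved resource; returns false for any other line.
 */
static bool parse_soft_reserved(const char *line, uint64_t *start,
                                uint64_t *end)
{
    char name[64];

    assert(line && start && end);

    /* Two hex bounds separated by '-', then " : ", then the resource name. */
    if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %63[^\n]",
               start, end, name) != 3) {
        return false;
    }
    return strcmp(name, "Soft Reserved") == 0;
}
```

Feeding each line of /proc/iomem through the helper and comparing the
returned bounds with the addr= and size values given on the command line
gives a simple pass/fail check per device.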
Previous versions
-----------------
v1: https://lore.kernel.org/qemu-devel/20250924103324.2074819-1-FangSheng.Huang@amd.com/
v2: https://lore.kernel.org/qemu-devel/20251020090701.4036748-1-FangSheng.Huang@amd.com/
v3: https://lore.kernel.org/qemu-devel/20251208105137.2058928-1-FangSheng.Huang@amd.com/
v4: https://lore.kernel.org/qemu-devel/20251209093841.2250527-1-FangSheng.Huang@amd.com/
v5: https://lore.kernel.org/qemu-devel/20260123024312.1601732-1-FangSheng.Huang@amd.com/
v6: https://lore.kernel.org/qemu-devel/20260226105023.256568-1-FangSheng.Huang@amd.com/
v7: https://lore.kernel.org/qemu-devel/20260306082735.1106690-1-FangSheng.Huang@amd.com/
[1] v7 thread above, closed out by:
https://lore.kernel.org/qemu-devel/666a7ba1-5d3a-4732-b872-0d9fb2fe8461@amd.com/
fanhuang (1):
hw/mem: add spm-memory device for Specific Purpose Memory
MAINTAINERS | 2 +
hw/i386/Kconfig | 2 +
hw/i386/acpi-build.c | 105 ++++++++++++--
hw/i386/e820_memory_layout.h | 11 +-
hw/mem/Kconfig | 4 +
hw/mem/meson.build | 1 +
hw/mem/spm-memory.c | 269 +++++++++++++++++++++++++++++++++++
include/hw/mem/spm-memory.h | 43 ++++++
qapi/machine.json | 43 +++++-
9 files changed, 459 insertions(+), 21 deletions(-)
create mode 100644 hw/mem/spm-memory.c
create mode 100644 include/hw/mem/spm-memory.h
--
2.34.1
[PATCH v8 1/1] hw/mem: add spm-memory device for Specific Purpose Memory
From: fanhuang
Date: 2026-05-27 7:42 UTC
To: qemu-devel, imammedo, david, gourry
Cc: Zhigang.Luo, Lianjie.Shi, fanhuang

Introduce a TYPE_MEMORY_DEVICE subclass `spm-memory` for boot-time
SOFT_RESERVED memory exposed to the guest with a per-device NUMA
proximity domain.

The device targets accelerator memory (HBM and similar) that the
firmware hands to the guest OS as SOFT_RESERVED memory, so a driver
in the guest -- rather than the kernel's general allocator -- owns
the range. Per-device NUMA placement matches the natural shape of
multiple HBM blocks (one block == one driver claim == one PXM).

Usage:

  -object memory-backend-ram,id=spm0,size=8G
  -numa node,nodeid=N
  -device spm-memory,id=dev0,memdev=spm0,node=N[,addr=GPA]

The device:

  - inherits TYPE_DEVICE and implements TYPE_MEMORY_DEVICE; placement
    in machine->device_memory goes through the standard memory-device
    framework (memory_device_pre_plug + memory_device_plug)
  - is boot-time only: dc->hotpluggable = false, and realize rejects
    attempts past PHASE_MACHINE_READY
  - emits one E820 SOFT_RESERVED entry per instance at machine_done
  - emits one SRAT memory_affinity entry per instance at acpi-build,
    ENABLED-only (no HOTPLUGGABLE flag)
  - rejects mixed-memory configurations on the target NUMA node at
    realize-time
  - is reported by QMP query-memory-devices as a dedicated kind,
    MEMORY_DEVICE_INFO_KIND_SPM_MEMORY

The device_memory SRAT umbrella entry in hw/i386/acpi-build.c is
restructured to partition the region into per-kind chunks rather
than emitting a single HOTPLUGGABLE entry covering everything.
For each plugged TYPE_SPM_MEMORY device the partition emits an
ENABLED entry at the device's proximity_domain; the remaining
sub-ranges (gaps between SPM devices, leading and trailing
padding, and ranges occupied by non-SPM memory devices) are
emitted as HOTPLUGGABLE | ENABLED entries at the placeholder
PXM (nb_numa_nodes - 1), preserving the upstream convention.

E820_SOFT_RESERVED is added to hw/i386/e820_memory_layout.h
alongside the other type codes.

CONFIG_SPM_MEMORY is selected by the i386 PC and Q35 machines
(same as DIMM).

MAINTAINERS gets new file entries under the existing "Memory devices"
stanza.

Signed-off-by: FangSheng Huang <FangSheng.Huang@amd.com>
---
 MAINTAINERS                  |   2 +
 hw/i386/Kconfig              |   2 +
 hw/i386/acpi-build.c         | 105 ++++++++++++--
 hw/i386/e820_memory_layout.h |  11 +-
 hw/mem/Kconfig               |   4 +
 hw/mem/meson.build           |   1 +
 hw/mem/spm-memory.c          | 269 +++++++++++++++++++++++++++++++++++
 include/hw/mem/spm-memory.h  |  43 ++++++
 qapi/machine.json            |  43 +++++-
 9 files changed, 459 insertions(+), 21 deletions(-)
 create mode 100644 hw/mem/spm-memory.c
 create mode 100644 include/hw/mem/spm-memory.h

diff --git a/MAINTAINERS b/MAINTAINERS
index cd5c4831e2..2a06515fc8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3361,9 +3361,11 @@ S: Supported
 F: hw/mem/memory-device*.c
 F: hw/mem/nvdimm.c
 F: hw/mem/pc-dimm.c
+F: hw/mem/spm-memory.c
 F: include/hw/mem/memory-device.h
 F: include/hw/mem/nvdimm.h
 F: include/hw/mem/pc-dimm.h
+F: include/hw/mem/spm-memory.h
 F: docs/nvdimm.txt

 SPICE
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 12473acaa7..e31a25b634 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -84,6 +84,7 @@ config I440FX
     select PCI_I440FX
     select PIIX
     select DIMM
+    select SPM_MEMORY
     select SMBIOS
     select SMBIOS_LEGACY
     select FW_CFG_DMA
@@ -113,6 +114,7 @@ config Q35
     select LPC_ICH9
     select AHCI_ICH9
     select DIMM
+    select SPM_MEMORY
     select SMBIOS
     select FW_CFG_DMA

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 0d7c83d5e9..865ab5fa4f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -52,6 +52,7 @@
 #include "migration/vmstate.h"
 #include "hw/mem/memory-device.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/mem/spm-memory.h"
 #include "system/numa.h"
 #include "system/reset.h"
 #include "hw/hyperv/vmbus-bridge.h"
@@ -1346,6 +1347,95 @@ build_tpm_tcpa(GArray *table_data, BIOSLinker *linker, GArray *tcpalog,
 }
 #endif

+typedef struct {
+    uint64_t addr;
+    uint64_t size;
+    uint32_t node;
+} SpmRange;
+
+static int collect_spm_ranges_cb(Object *obj, void *opaque)
+{
+    GArray *ranges = opaque;
+    SpmMemoryDevice *spm;
+    MemoryDeviceClass *mdc;
+    SpmRange r;
+
+    if (!object_dynamic_cast(obj, TYPE_SPM_MEMORY)) {
+        return 0;
+    }
+    spm = SPM_MEMORY(obj);
+    mdc = MEMORY_DEVICE_GET_CLASS(MEMORY_DEVICE(spm));
+    r.addr = mdc->get_addr(MEMORY_DEVICE(spm));
+    r.size = memory_region_size(
+        host_memory_backend_get_memory(spm->hostmem));
+    r.node = spm->node;
+    g_array_append_val(ranges, r);
+    return 0;
+}
+
+static gint spm_range_compare(gconstpointer a, gconstpointer b)
+{
+    const SpmRange *range_a = a;
+    const SpmRange *range_b = b;
+
+    if (range_a->addr < range_b->addr) {
+        return -1;
+    }
+    if (range_a->addr > range_b->addr) {
+        return 1;
+    }
+    return 0;
+}
+
+/*
+ * Emit SRAT memory-affinity entries covering the device_memory region:
+ *   - ENABLED entry at the device's proximity_domain for each plugged
+ *     TYPE_SPM_MEMORY instance.
+ *   - HOTPLUGGABLE | ENABLED entry with PXM = nb_numa_nodes - 1 for
+ *     every remaining sub-range (gaps, leading/trailing padding, and
+ *     ranges occupied by non-SPM memory devices).
+ */
+static void build_srat_device_memory(GArray *table_data, MachineState *ms)
+{
+    g_autoptr(GArray) ranges = g_array_new(FALSE, TRUE, sizeof(SpmRange));
+    uint64_t cursor, end;
+    int nb_nodes = ms->numa_state ? ms->numa_state->num_nodes : 0;
+    uint32_t hotplug_pxm = nb_nodes > 0 ? nb_nodes - 1 : 0;
+    guint i;
+
+    if (!ms->device_memory) {
+        return;
+    }
+
+    cursor = ms->device_memory->base;
+    end = cursor + memory_region_size(&ms->device_memory->mr);
+
+    object_child_foreach_recursive(qdev_get_machine(),
+                                   collect_spm_ranges_cb, ranges);
+    g_array_sort(ranges, spm_range_compare);
+
+    for (i = 0; i < ranges->len; i++) {
+        SpmRange *r = &g_array_index(ranges, SpmRange, i);
+
+        if (cursor < r->addr) {
+            build_srat_memory(table_data, cursor, r->addr - cursor,
+                              hotplug_pxm,
+                              MEM_AFFINITY_HOTPLUGGABLE |
+                              MEM_AFFINITY_ENABLED);
+        }
+        build_srat_memory(table_data, r->addr, r->size, r->node,
+                          MEM_AFFINITY_ENABLED);
+        cursor = r->addr + r->size;
+    }
+
+    if (cursor < end) {
+        build_srat_memory(table_data, cursor, end - cursor,
+                          hotplug_pxm,
+                          MEM_AFFINITY_HOTPLUGGABLE |
+                          MEM_AFFINITY_ENABLED);
+    }
+}
+
 #define HOLE_640K_START  (640 * KiB)
 #define HOLE_640K_END   (1 * MiB)

@@ -1473,20 +1563,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)

     build_srat_generic_affinity_structures(table_data);

-    /*
-     * Entry is required for Windows to enable memory hotplug in OS
-     * and for Linux to enable SWIOTLB when booted with less than
-     * 4G of RAM. Windows works better if the entry sets proximity
-     * to the highest NUMA node in the machine.
-     * Memory devices may override proximity set by this entry,
-     * providing _PXM method if necessary.
-     */
-    if (machine->device_memory) {
-        build_srat_memory(table_data, machine->device_memory->base,
-                          memory_region_size(&machine->device_memory->mr),
-                          nb_numa_nodes - 1,
-                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
-    }
+    build_srat_device_memory(table_data, machine);

     acpi_table_end(linker, &table);
 }
diff --git a/hw/i386/e820_memory_layout.h b/hw/i386/e820_memory_layout.h
index b50acfa201..6ef169db9c 100644
--- a/hw/i386/e820_memory_layout.h
+++ b/hw/i386/e820_memory_layout.h
@@ -10,11 +10,12 @@
 #define HW_I386_E820_MEMORY_LAYOUT_H

 /* e820 types */
-#define E820_RAM        1
-#define E820_RESERVED   2
-#define E820_ACPI       3
-#define E820_NVS        4
-#define E820_UNUSABLE   5
+#define E820_RAM            1
+#define E820_RESERVED       2
+#define E820_ACPI           3
+#define E820_NVS            4
+#define E820_UNUSABLE       5
+#define E820_SOFT_RESERVED  0xefffffff

 struct e820_entry {
     uint64_t address;
diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
index 73c5ae8ad9..4145870881 100644
--- a/hw/mem/Kconfig
+++ b/hw/mem/Kconfig
@@ -16,3 +16,7 @@ config CXL_MEM_DEVICE
     bool
     default y if CXL
     select MEM_DEVICE
+
+config SPM_MEMORY
+    bool
+    select MEM_DEVICE
diff --git a/hw/mem/meson.build b/hw/mem/meson.build
index 8c2beeb7d4..2c28104282 100644
--- a/hw/mem/meson.build
+++ b/hw/mem/meson.build
@@ -4,6 +4,7 @@ mem_ss.add(when: 'CONFIG_DIMM', if_true: files('pc-dimm.c'))
 mem_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_mc.c'))
 mem_ss.add(when: 'CONFIG_NVDIMM', if_true: files('nvdimm.c'))
 mem_ss.add(when: 'CONFIG_CXL_MEM_DEVICE', if_true: files('cxl_type3.c'))
+mem_ss.add(when: 'CONFIG_SPM_MEMORY', if_true: files('spm-memory.c'))
 stub_ss.add(files('cxl_type3_stubs.c'))

 stub_ss.add(files('memory-device-stubs.c'))
diff --git a/hw/mem/spm-memory.c b/hw/mem/spm-memory.c
new file mode 100644
index 0000000000..85887b2479
--- /dev/null
+++ b/hw/mem/spm-memory.c
@@ -0,0 +1,269 @@
+/*
+ * Specific Purpose Memory (SPM) device
+ *
+ * Copyright (c) 2026 Advanced Micro Devices, Inc.
+ *
+ * Authors:
+ *  FangSheng Huang <FangSheng.Huang@amd.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+#include "hw/core/boards.h"
+#include "hw/core/qdev-properties.h"
+#include "hw/core/qdev.h"
+#include "hw/mem/spm-memory.h"
+#include "hw/mem/memory-device.h"
+#include "hw/i386/e820_memory_layout.h"
+#include "migration/vmstate.h"
+#include "system/hostmem.h"
+#include "system/numa.h"
+#include "system/system.h"
+
+static QLIST_HEAD(, SpmMemoryDevice) spm_memory_list =
+    QLIST_HEAD_INITIALIZER(spm_memory_list);
+static Notifier spm_machine_done_notifier;
+static bool spm_machine_done_registered;
+
+#define SPM_MEMORY_MEMDEV_PROP "memdev"
+#define SPM_MEMORY_NODE_PROP   "node"
+#define SPM_MEMORY_ADDR_PROP   "addr"
+
+static const Property spm_memory_properties[] = {
+    DEFINE_PROP_LINK(SPM_MEMORY_MEMDEV_PROP, SpmMemoryDevice, hostmem,
+                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
+    DEFINE_PROP_UINT32(SPM_MEMORY_NODE_PROP, SpmMemoryDevice, node, 0),
+    DEFINE_PROP_UINT64(SPM_MEMORY_ADDR_PROP, SpmMemoryDevice, addr, 0),
+};
+
+static uint64_t spm_memory_md_get_addr(const MemoryDeviceState *md)
+{
+    return SPM_MEMORY(md)->addr;
+}
+
+static void spm_memory_md_set_addr(MemoryDeviceState *md, uint64_t addr,
+                                   Error **errp)
+{
+    SPM_MEMORY(md)->addr = addr;
+}
+
+static MemoryRegion *spm_memory_md_get_memory_region(MemoryDeviceState *md,
+                                                     Error **errp)
+{
+    SpmMemoryDevice *spm = SPM_MEMORY(md);
+
+    if (!spm->hostmem) {
+        error_setg(errp, "'memdev' property must be set");
+        return NULL;
+    }
+    return host_memory_backend_get_memory(spm->hostmem);
+}
+
+static uint64_t spm_memory_md_get_plugged_size(const MemoryDeviceState *md,
+                                               Error **errp)
+{
+    SpmMemoryDevice *spm = SPM_MEMORY(md);
+    return spm->hostmem ?
+        memory_region_size(host_memory_backend_get_memory(spm->hostmem)) : 0;
+}
+
+static void spm_memory_md_fill_device_info(const MemoryDeviceState *md,
+                                           MemoryDeviceInfo *info)
+{
+    SpmMemoryDeviceInfo *di = g_new0(SpmMemoryDeviceInfo, 1);
+    SpmMemoryDevice *spm = SPM_MEMORY(md);
+    DeviceState *dev = DEVICE(md);
+
+    di->id = dev->id ? g_strdup(dev->id) : NULL;
+    di->memaddr = spm->addr;
+    di->size = spm->hostmem ? memory_region_size(
+        host_memory_backend_get_memory(spm->hostmem)) : 0;
+    di->node = spm->node;
+    di->memdev = spm->hostmem ?
+        object_get_canonical_path(OBJECT(spm->hostmem)) : NULL;
+
+    info->u.spm_memory.data = di;
+    info->type = MEMORY_DEVICE_INFO_KIND_SPM_MEMORY;
+}
+
+typedef struct {
+    uint32_t node_id;
+    const SpmMemoryDevice *self; /* exclude self when walking */
+    bool conflict;
+} SpmNodeCheckCtx;
+
+static int spm_check_node_collision_cb(Object *obj, void *opaque)
+{
+    SpmNodeCheckCtx *ctx = opaque;
+    uint32_t other_node;
+
+    if (!object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) {
+        return 0;
+    }
+    /*
+     * Skip self. Compare canonical Object* pointers, not interface-cast
+     * MemoryDeviceState* (different address under INTERFACE_CHECK).
+     */
+    if (obj == OBJECT(ctx->self)) {
+        return 0;
+    }
+
+    /*
+     * Not all memory-device subclasses have a "node" property; skip
+     * those silently rather than asserting.
+     */
+    if (!object_property_find(obj, "node")) {
+        return 0;
+    }
+    other_node = (uint32_t)object_property_get_uint(obj, "node", NULL);
+    if (other_node == ctx->node_id) {
+        ctx->conflict = true;
+        return 1; /* stop walk */
+    }
+    return 0;
+}
+
+/*
+ * Require the target NUMA node to be SPM-only: driver-side discovery
+ * uses proximity_domain as the key, so a node mixing SPM with other
+ * memory yields ambiguous discovery.
+ */
+static void spm_memory_check_node_exclusive(SpmMemoryDevice *spm,
+                                            MachineState *ms, Error **errp)
+{
+    ERRP_GUARD();
+    SpmNodeCheckCtx ctx = { spm->node, spm, false };
+
+    /* Bounds check: spm->node must be a valid NUMA node id */
+    if (!ms->numa_state || spm->node >= ms->numa_state->num_nodes) {
+        error_setg(errp,
+                   "spm-memory: node %u out of range "
+                   "(numa_state has %d nodes)", spm->node,
+                   ms->numa_state ? ms->numa_state->num_nodes : 0);
+        return;
+    }
+
+    /* Check 1: target node must not have memory from -numa node,memdev= */
+    if (ms->numa_state->nodes[spm->node].node_mem > 0) {
+        error_setg(errp,
+                   "spm-memory: NUMA node %u already has memory attached "
+                   "via -numa node,memdev=; SPM nodes must be SPM-only",
+                   spm->node);
+        return;
+    }
+
+    /* Check 2: target node must not already have another memory device */
+    object_child_foreach_recursive(qdev_get_machine(),
+                                   spm_check_node_collision_cb, &ctx);
+    if (ctx.conflict) {
+        error_setg(errp,
+                   "spm-memory: NUMA node %u already has another memory "
+                   "device plugged; SPM nodes must be SPM-only", spm->node);
+        return;
+    }
+}
+
+static void spm_memory_machine_done(Notifier *n, void *opaque)
+{
+    SpmMemoryDevice *spm;
+    MemoryDeviceClass *mdc;
+    uint64_t addr, size;
+
+    QLIST_FOREACH(spm, &spm_memory_list, next) {
+        g_assert(spm->hostmem);
+        mdc = MEMORY_DEVICE_GET_CLASS(MEMORY_DEVICE(spm));
+        addr = mdc->get_addr(MEMORY_DEVICE(spm));
+        size = memory_region_size(
+            host_memory_backend_get_memory(spm->hostmem));
+        e820_add_entry(addr, size, E820_SOFT_RESERVED);
+    }
+}
+
+static void spm_memory_realize(DeviceState *dev, Error **errp)
+{
+    ERRP_GUARD();
+    SpmMemoryDevice *spm = SPM_MEMORY(dev);
+    MachineState *ms = MACHINE(qdev_get_machine());
+
+    if (phase_check(PHASE_MACHINE_READY)) {
+        error_setg(errp, "spm-memory: hotplug is not supported "
+                   "(boot-time-only device)");
+        return;
+    }
+
+    if (!spm->hostmem) {
+        error_setg(errp, "'%s' property is required", SPM_MEMORY_MEMDEV_PROP);
+        return;
+    }
+    if (host_memory_backend_is_mapped(spm->hostmem)) {
+        error_setg(errp, "memory backend '%s' is already in use",
+                   object_get_canonical_path_component(OBJECT(spm->hostmem)));
+        return;
+    }
+
+    spm_memory_check_node_exclusive(spm, ms, errp);
+    if (*errp) {
+        return;
+    }
+
+    memory_device_pre_plug(MEMORY_DEVICE(spm), ms, errp);
+    if (*errp) {
+        return;
+    }
+
+    host_memory_backend_set_mapped(spm->hostmem, true);
+    memory_device_plug(MEMORY_DEVICE(spm), ms);
+
+    QLIST_INSERT_HEAD(&spm_memory_list, spm, next);
+
+    if (!spm_machine_done_registered) {
+        spm_machine_done_notifier.notify = spm_memory_machine_done;
+        qemu_add_machine_init_done_notifier(&spm_machine_done_notifier);
+        spm_machine_done_registered = true;
+    }
+}
+
+static const VMStateDescription vmstate_spm_memory = {
+    .name = TYPE_SPM_MEMORY,
+    .unmigratable = 1,
+};
+
+static void spm_memory_class_init(ObjectClass *oc, const void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(oc);
+
+    dc->desc = "SPM (Specific Purpose Memory) device";
+    dc->hotpluggable = false;
+    dc->realize = spm_memory_realize;
+    dc->vmsd = &vmstate_spm_memory;
+    device_class_set_props(dc, spm_memory_properties);
+
+    mdc->get_addr = spm_memory_md_get_addr;
+    mdc->set_addr = spm_memory_md_set_addr;
+    mdc->get_memory_region = spm_memory_md_get_memory_region;
+    mdc->get_plugged_size = spm_memory_md_get_plugged_size;
+    mdc->fill_device_info = spm_memory_md_fill_device_info;
+}
+
+static const TypeInfo spm_memory_info = {
+    .name = TYPE_SPM_MEMORY,
+    .parent = TYPE_DEVICE,
+    .class_size = sizeof(SpmMemoryDeviceClass),
+    .class_init = spm_memory_class_init,
+    .instance_size = sizeof(SpmMemoryDevice),
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_MEMORY_DEVICE },
+        { }
+    },
+};
+
+static void spm_memory_register_types(void)
+{
+    type_register_static(&spm_memory_info);
+}
+
+type_init(spm_memory_register_types)
diff --git a/include/hw/mem/spm-memory.h b/include/hw/mem/spm-memory.h
new file mode 100644
index 0000000000..c662864b29
--- /dev/null
+++ b/include/hw/mem/spm-memory.h
@@ -0,0 +1,43 @@
+/*
+ * Specific Purpose Memory (SPM) device
+ *
+ * TYPE_MEMORY_DEVICE subclass for boot-time-only memory exposed to the
+ * guest as an E820 SOFT_RESERVED range with a SRAT memory-affinity entry.
+ *
+ * Copyright (c) 2026 Advanced Micro Devices, Inc.
+ *
+ * Authors:
+ *  FangSheng Huang <FangSheng.Huang@amd.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef QEMU_SPM_MEMORY_H
+#define QEMU_SPM_MEMORY_H
+
+#include "hw/mem/memory-device.h"
+#include "hw/core/qdev.h"
+#include "qom/object.h"
+#include "system/hostmem.h"
+
+#define TYPE_SPM_MEMORY "spm-memory"
+
+OBJECT_DECLARE_TYPE(SpmMemoryDevice, SpmMemoryDeviceClass, SPM_MEMORY)
+
+struct SpmMemoryDevice {
+    /*< private >*/
+    DeviceState parent_obj;
+    QLIST_ENTRY(SpmMemoryDevice) next;
+
+    /*< public >*/
+    HostMemoryBackend *hostmem; /* memdev= backend */
+    uint32_t node;              /* NUMA proximity domain (node=) */
+    uint64_t addr;              /* GPA (from addr= or framework-assigned) */
+};
+
+struct SpmMemoryDeviceClass {
+    /*< private >*/
+    DeviceClass parent_class;
+};
+
+#endif /* QEMU_SPM_MEMORY_H */
diff --git a/qapi/machine.json b/qapi/machine.json
index 685e4e29b8..51b06d7cba 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1413,6 +1413,32 @@
           }
 }

+##
+# @SpmMemoryDeviceInfo:
+#
+# spm-memory device state information
+#
+# @id: device's ID
+#
+# @memaddr: physical address in memory, where device is mapped
+#
+# @size: size of memory that the device provides
+#
+# @node: NUMA proximity domain to which the device is assigned
+#
+# @memdev: memory backend linked with device
+#
+# Since: 11.1
+##
+{ 'struct': 'SpmMemoryDeviceInfo',
+  'data': { '*id': 'str',
+            'memaddr': 'size',
+            'size': 'size',
+            'node': 'int',
+            'memdev': 'str'
+          }
+}
+
 ##
 # @MemoryDeviceInfoKind:
 #
@@ -1426,11 +1452,13 @@
 #
 # @hv-balloon: since 8.2.
 #
+# @spm-memory: since 11.1.
+#
 # Since: 2.1
 ##
 { 'enum': 'MemoryDeviceInfoKind',
   'data': [ 'dimm', 'nvdimm', 'virtio-pmem', 'virtio-mem', 'sgx-epc',
-            'hv-balloon' ] }
+            'hv-balloon', 'spm-memory' ] }

 ##
 # @PCDIMMDeviceInfoWrapper:
@@ -1482,6 +1510,16 @@
 { 'struct': 'HvBalloonDeviceInfoWrapper',
   'data': { 'data': 'HvBalloonDeviceInfo' } }

+##
+# @SpmMemoryDeviceInfoWrapper:
+#
+# @data: spm-memory device state information
+#
+# Since: 11.1
+##
+{ 'struct': 'SpmMemoryDeviceInfoWrapper',
+  'data': { 'data': 'SpmMemoryDeviceInfo' } }
+
 ##
 # @MemoryDeviceInfo:
 #
@@ -1499,7 +1537,8 @@
             'virtio-pmem': 'VirtioPMEMDeviceInfoWrapper',
             'virtio-mem': 'VirtioMEMDeviceInfoWrapper',
             'sgx-epc': 'SgxEPCDeviceInfoWrapper',
-            'hv-balloon': 'HvBalloonDeviceInfoWrapper'
+            'hv-balloon': 'HvBalloonDeviceInfoWrapper',
+            'spm-memory': 'SpmMemoryDeviceInfoWrapper'
           }
 }

--
2.34.1
Re: [PATCH v8 1/1] hw/mem: add spm-memory device for Specific Purpose Memory
From: Igor Mammedov
Date: 2026-06-01 8:50 UTC
To: fanhuang
Cc: qemu-devel, david, gourry, Zhigang.Luo, Lianjie.Shi

On Wed, 27 May 2026 15:42:15 +0800
fanhuang <FangSheng.Huang@amd.com> wrote:

> Introduce a TYPE_MEMORY_DEVICE subclass `spm-memory` for boot-time
> SOFT_RESERVED memory exposed to the guest with a per-device NUMA
> proximity domain.
>
> The device targets accelerator memory (HBM and similar) that the
> firmware hands to the guest OS as SOFT_RESERVED memory, so a driver
> in the guest -- rather than the kernel's general allocator -- owns
> the range. Per-device NUMA placement matches the natural shape of
> multiple HBM blocks (one block == one driver claim == one PXM).
>
> Usage:
>
>   -object memory-backend-ram,id=spm0,size=8G
>   -numa node,nodeid=N
>   -device spm-memory,id=dev0,memdev=spm0,node=N[,addr=GPA]
>
> The device:
>
>   - inherits TYPE_DEVICE and implements TYPE_MEMORY_DEVICE; placement
>     in machine->device_memory goes through the standard memory-device
>     framework (memory_device_pre_plug + memory_device_plug)
>   - is boot-time only: dc->hotpluggable = false, and realize rejects
>     attempts past PHASE_MACHINE_READY
>   - emits one E820 SOFT_RESERVED entry per instance at machine_done
>   - emits one SRAT memory_affinity entry per instance at acpi-build,
>     ENABLED-only (no HOTPLUGGABLE flag)
>   - rejects mixed-memory configurations on the target NUMA node at
>     realize-time
>   - is reported by QMP query-memory-devices as a dedicated kind,
>     MEMORY_DEVICE_INFO_KIND_SPM_MEMORY
>
> The device_memory SRAT umbrella entry in hw/i386/acpi-build.c is
> restructured to partition the region into per-kind chunks rather
> than emitting a single HOTPLUGGABLE entry covering everything.
> For each plugged TYPE_SPM_MEMORY device the partition emits an
> ENABLED entry at the device's proximity_domain; the remaining
> sub-ranges (gaps between SPM devices, leading and trailing
> padding, and ranges occupied by non-SPM memory devices) are
> emitted as HOTPLUGGABLE | ENABLED entries at the placeholder
> PXM (nb_numa_nodes - 1), preserving the upstream convention.
>
> E820_SOFT_RESERVED is added to hw/i386/e820_memory_layout.h
> alongside the other type codes.
>
> CONFIG_SPM_MEMORY is selected by the i386 PC and Q35 machines
> (same as DIMM).

This pass is mostly a high-level review.

The patch does too many things at once. I suggest splitting it into
several pieces:
  1. introducing the spm-memory boilerplate code
  2. SRAT mangling
  3. adding the E820 entry

>
> MAINTAINERS gets new file entries under the existing "Memory devices"
> stanza.
>
> Signed-off-by: FangSheng Huang <FangSheng.Huang@amd.com>
> ---
>  MAINTAINERS                  |   2 +

A separate patch, please.

>  hw/i386/Kconfig              |   2 +
>  hw/i386/acpi-build.c         | 105 ++++++++++++--
>  hw/i386/e820_memory_layout.h |  11 +-
>  hw/mem/Kconfig               |   4 +
>  hw/mem/meson.build           |   1 +
>  hw/mem/spm-memory.c          | 269 +++++++++++++++++++++++++++++++++++
>  include/hw/mem/spm-memory.h  |  43 ++++++
>  qapi/machine.json            |  43 +++++-
>  9 files changed, 459 insertions(+), 21 deletions(-)
>  create mode 100644 hw/mem/spm-memory.c
>  create mode 100644 include/hw/mem/spm-memory.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cd5c4831e2..2a06515fc8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3361,9 +3361,11 @@ S: Supported
>  F: hw/mem/memory-device*.c
>  F: hw/mem/nvdimm.c
>  F: hw/mem/pc-dimm.c
> +F: hw/mem/spm-memory.c
>  F: include/hw/mem/memory-device.h
>  F: include/hw/mem/nvdimm.h
>  F: include/hw/mem/pc-dimm.h
> +F: include/hw/mem/spm-memory.h
>  F: docs/nvdimm.txt
>
>  SPICE
> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> index 12473acaa7..e31a25b634 100644
> --- a/hw/i386/Kconfig
> +++ b/hw/i386/Kconfig
> @@ -84,6 +84,7 @@ config I440FX
>      select PCI_I440FX
>      select PIIX
>      select DIMM
> +    select SPM_MEMORY
>      select SMBIOS
>      select SMBIOS_LEGACY
>      select FW_CFG_DMA
> @@ -113,6 +114,7 @@ config Q35
>      select LPC_ICH9
>      select AHCI_ICH9
>      select DIMM
> +    select SPM_MEMORY
>      select SMBIOS
>      select FW_CFG_DMA
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 0d7c83d5e9..865ab5fa4f 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -52,6 +52,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/mem/memory-device.h"
>  #include "hw/mem/nvdimm.h"
> +#include "hw/mem/spm-memory.h"
>  #include "system/numa.h"
>  #include "system/reset.h"
>  #include "hw/hyperv/vmbus-bridge.h"
> @@ -1346,6 +1347,95 @@ build_tpm_tcpa(GArray *table_data, BIOSLinker *linker, GArray *tcpalog,
>  }
>  #endif
>
> +typedef struct {
> +    uint64_t addr;
> +    uint64_t size;
> +    uint32_t node;
> +} SpmRange;
> +
> +static int collect_spm_ranges_cb(Object *obj, void *opaque)
> +{
> +    GArray *ranges = opaque;
> +    SpmMemoryDevice *spm;
> +    MemoryDeviceClass *mdc;
> +    SpmRange r;
> +
> +    if (!object_dynamic_cast(obj, TYPE_SPM_MEMORY)) {
> +        return 0;
> +    }
> +    spm = SPM_MEMORY(obj);
> +    mdc = MEMORY_DEVICE_GET_CLASS(MEMORY_DEVICE(spm));
> +    r.addr = mdc->get_addr(MEMORY_DEVICE(spm));
> +    r.size = memory_region_size(
> +        host_memory_backend_get_memory(spm->hostmem));
> +    r.node = spm->node;
> +    g_array_append_val(ranges, r);
> +    return 0;
> +}
> +
> +static gint spm_range_compare(gconstpointer a, gconstpointer b)
> +{
> +    const SpmRange *range_a = a;
> +    const SpmRange *range_b = b;
> +
> +    if (range_a->addr < range_b->addr) {
> +        return -1;
> +    }
> +    if (range_a->addr > range_b->addr) {
> +        return 1;
> +    }
> +    return 0;
> +}
> +
> +/*
> + * Emit SRAT memory-affinity entries covering the device_memory region:
> + *   - ENABLED entry at the device's proximity_domain for each plugged
> + *     TYPE_SPM_MEMORY instance.
> + *   - HOTPLUGGABLE | ENABLED entry with PXM = nb_numa_nodes - 1 for
> + *     every remaining sub-range (gaps, leading/trailing padding, and
> + *     ranges occupied by non-SPM memory devices).
> + */
> +static void build_srat_device_memory(GArray *table_data, MachineState *ms)
> +{
> +    g_autoptr(GArray) ranges = g_array_new(FALSE, TRUE, sizeof(SpmRange));
> +    uint64_t cursor, end;
> +    int nb_nodes = ms->numa_state ? ms->numa_state->num_nodes : 0;
> +    uint32_t hotplug_pxm = nb_nodes > 0 ? nb_nodes - 1 : 0;
> +    guint i;
> +
> +    if (!ms->device_memory) {
> +        return;
> +    }
> +
> +    cursor = ms->device_memory->base;
> +    end = cursor + memory_region_size(&ms->device_memory->mr);
> +
> +    object_child_foreach_recursive(qdev_get_machine(),
> +                                   collect_spm_ranges_cb, ranges);

This is not an objection, but could we do better here? The idea would
be: instead of a full machine scan, take ms->device_memory, walk its
child regions, and pick only the ones owned by SPM devices.

> +    g_array_sort(ranges, spm_range_compare);
> +
> +    for (i = 0; i < ranges->len; i++) {
> +        SpmRange *r = &g_array_index(ranges, SpmRange, i);
> +
> +        if (cursor < r->addr) {
> +            build_srat_memory(table_data, cursor, r->addr - cursor,
> +                              hotplug_pxm,
> +                              MEM_AFFINITY_HOTPLUGGABLE |
> +                              MEM_AFFINITY_ENABLED);
> +        }
> +        build_srat_memory(table_data, r->addr, r->size, r->node,
> +                          MEM_AFFINITY_ENABLED);
> +        cursor = r->addr + r->size;
> +    }
> +
> +    if (cursor < end) {
> +        build_srat_memory(table_data, cursor, end - cursor,
> +                          hotplug_pxm,
> +                          MEM_AFFINITY_HOTPLUGGABLE |
> +                          MEM_AFFINITY_ENABLED);
> +    }
> +}
> +
>  #define HOLE_640K_START  (640 * KiB)
>  #define HOLE_640K_END   (1 * MiB)
>
> @@ -1473,20 +1563,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>
>      build_srat_generic_affinity_structures(table_data);
>
> -    /*
> -     * Entry is required for Windows to enable memory hotplug in OS
> -     * and for Linux to enable SWIOTLB when booted with less than
> -     * 4G of RAM. Windows works better if the entry sets proximity
> -     * to the highest NUMA node in the machine.
> -     * Memory devices may override proximity set by this entry,
> -     * providing _PXM method if necessary.
> -     */

Don't just delete the comment, as it still holds true. We should keep
a reminder of why we are adding the placeholder(s), along with its
quirks.

> -    if (machine->device_memory) {
> -        build_srat_memory(table_data, machine->device_memory->base,
> -                          memory_region_size(&machine->device_memory->mr),
> -                          nb_numa_nodes - 1,
> -                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
> -    }
> +    build_srat_device_memory(table_data, machine);
>
>      acpi_table_end(linker, &table);
>  }
> diff --git a/hw/i386/e820_memory_layout.h b/hw/i386/e820_memory_layout.h
> index b50acfa201..6ef169db9c 100644
> --- a/hw/i386/e820_memory_layout.h
> +++ b/hw/i386/e820_memory_layout.h
> @@ -10,11 +10,12 @@
>  #define HW_I386_E820_MEMORY_LAYOUT_H
>
>  /* e820 types */
> -#define E820_RAM        1
> -#define E820_RESERVED   2
> -#define E820_ACPI       3
> -#define E820_NVS        4
> -#define E820_UNUSABLE   5
> +#define E820_RAM            1
> +#define E820_RESERVED       2
> +#define E820_ACPI           3
> +#define E820_NVS            4
> +#define E820_UNUSABLE       5
> +#define E820_SOFT_RESERVED  0xefffffff
>
>  struct e820_entry {
>      uint64_t address;
> diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
> index 73c5ae8ad9..4145870881 100644
> --- a/hw/mem/Kconfig
> +++ b/hw/mem/Kconfig
> @@ -16,3 +16,7 @@ config CXL_MEM_DEVICE
>      bool
>      default y if CXL
>      select MEM_DEVICE
> +
> +config SPM_MEMORY
> +    bool
> +    select MEM_DEVICE
> diff --git a/hw/mem/meson.build b/hw/mem/meson.build
> index 8c2beeb7d4..2c28104282 100644
> --- a/hw/mem/meson.build
> +++ b/hw/mem/meson.build
> @@ -4,6 +4,7 @@ mem_ss.add(when: 'CONFIG_DIMM', if_true: files('pc-dimm.c'))
>  mem_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_mc.c'))
>  mem_ss.add(when: 'CONFIG_NVDIMM', if_true: files('nvdimm.c'))
>  mem_ss.add(when: 'CONFIG_CXL_MEM_DEVICE', if_true: files('cxl_type3.c'))
> +mem_ss.add(when: 'CONFIG_SPM_MEMORY', if_true: files('spm-memory.c'))
>  stub_ss.add(files('cxl_type3_stubs.c'))
>
>  stub_ss.add(files('memory-device-stubs.c'))
> diff --git a/hw/mem/spm-memory.c b/hw/mem/spm-memory.c
> new file mode 100644
> index 0000000000..85887b2479
> --- /dev/null
> +++ b/hw/mem/spm-memory.c
> @@ -0,0 +1,269 @@
> +/*
> + * Specific Purpose Memory (SPM) device
> + *
> + * Copyright (c) 2026 Advanced Micro Devices, Inc.
> + *
> + * Authors:
> + *  FangSheng Huang <FangSheng.Huang@amd.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/module.h"
> +#include "qapi/error.h"
> +#include "hw/core/boards.h"
> +#include "hw/core/qdev-properties.h"
> +#include "hw/core/qdev.h"
> +#include "hw/mem/spm-memory.h"
> +#include "hw/mem/memory-device.h"
> +#include "hw/i386/e820_memory_layout.h"
> +#include "migration/vmstate.h"
> +#include "system/hostmem.h"
> +#include "system/numa.h"
> +#include "system/system.h"
> +
> +static QLIST_HEAD(, SpmMemoryDevice) spm_memory_list =
> +    QLIST_HEAD_INITIALIZER(spm_memory_list);
> +static Notifier spm_machine_done_notifier;
> +static bool spm_machine_done_registered;
> +
> +#define SPM_MEMORY_MEMDEV_PROP "memdev"
> +#define SPM_MEMORY_NODE_PROP   "node"
> +#define SPM_MEMORY_ADDR_PROP   "addr"
> +
> +static const Property spm_memory_properties[] = {
> +    DEFINE_PROP_LINK(SPM_MEMORY_MEMDEV_PROP, SpmMemoryDevice, hostmem,
> +                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
> +    DEFINE_PROP_UINT32(SPM_MEMORY_NODE_PROP, SpmMemoryDevice, node, 0),
> +    DEFINE_PROP_UINT64(SPM_MEMORY_ADDR_PROP, SpmMemoryDevice, addr, 0),
> +};
> +
> +static uint64_t spm_memory_md_get_addr(const MemoryDeviceState *md)
> +{
> +    return SPM_MEMORY(md)->addr;
> +}
> +
> +static void spm_memory_md_set_addr(MemoryDeviceState *md, uint64_t addr,
> +                                   Error **errp)
> +{
> +    SPM_MEMORY(md)->addr = addr;
> +}
> +
> +static MemoryRegion *spm_memory_md_get_memory_region(MemoryDeviceState *md,
> +                                                     Error **errp)
> +{
> +    SpmMemoryDevice *spm = SPM_MEMORY(md);
> +
> +    if (!spm->hostmem) {
> +        error_setg(errp, "'memdev' property must be set");
> +        return NULL;
> +    }
> +    return host_memory_backend_get_memory(spm->hostmem);
> +}
> +
> +static uint64_t spm_memory_md_get_plugged_size(const MemoryDeviceState *md,
> +                                               Error **errp)
> +{
> +    SpmMemoryDevice *spm = SPM_MEMORY(md);
> +    return spm->hostmem ?
> +        memory_region_size(host_memory_backend_get_memory(spm->hostmem)) : 0;
> +}
> +
> +static void spm_memory_md_fill_device_info(const MemoryDeviceState *md,
> +                                           MemoryDeviceInfo *info)
> +{
> +    SpmMemoryDeviceInfo *di = g_new0(SpmMemoryDeviceInfo, 1);
> +    SpmMemoryDevice *spm = SPM_MEMORY(md);
> +    DeviceState *dev = DEVICE(md);
> +
> +    di->id = dev->id ? g_strdup(dev->id) : NULL;
> +    di->memaddr = spm->addr;
> +    di->size = spm->hostmem ? memory_region_size(
> +        host_memory_backend_get_memory(spm->hostmem)) : 0;
> +    di->node = spm->node;
> +    di->memdev = spm->hostmem ?
> +        object_get_canonical_path(OBJECT(spm->hostmem)) : NULL;
> +
> +    info->u.spm_memory.data = di;
> +    info->type = MEMORY_DEVICE_INFO_KIND_SPM_MEMORY;
> +}
> +
> +typedef struct {
> +    uint32_t node_id;
> +    const SpmMemoryDevice *self; /* exclude self when walking */
> +    bool conflict;
> +} SpmNodeCheckCtx;
> +
> +static int spm_check_node_collision_cb(Object *obj, void *opaque)
> +{
> +    SpmNodeCheckCtx *ctx = opaque;
> +    uint32_t other_node;
> +
> +    if (!object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) {
> +        return 0;
> +    }
> +    /*
> +     * Skip self. Compare canonical Object* pointers, not interface-cast
> +     * MemoryDeviceState* (different address under INTERFACE_CHECK).
> +     */
> +    if (obj == OBJECT(ctx->self)) {
> +        return 0;
> +    }
> +
> +    /*
> +     * Not all memory-device subclasses have a "node" property; skip
> +     * those silently rather than asserting.
> +     */
> +    if (!object_property_find(obj, "node")) {
> +        return 0;
> +    }
> +    other_node = (uint32_t)object_property_get_uint(obj, "node", NULL);
> +    if (other_node == ctx->node_id) {
> +        ctx->conflict = true;
> +        return 1; /* stop walk */
> +    }
> +    return 0;
> +}
> +
> +/*
> + * Require the target NUMA node to be SPM-only: driver-side discovery
> + * uses proximity_domain as the key, so a node mixing SPM with other
> + * memory yields ambiguous discovery.
> + */
> +static void spm_memory_check_node_exclusive(SpmMemoryDevice *spm,
> +                                            MachineState *ms, Error **errp)
> +{
> +    ERRP_GUARD();
> +    SpmNodeCheckCtx ctx = { spm->node, spm, false };
> +
> +    /* Bounds check: spm->node must be a valid NUMA node id */
> +    if (!ms->numa_state || spm->node >= ms->numa_state->num_nodes) {
> +        error_setg(errp,
> +                   "spm-memory: node %u out of range "
> +                   "(numa_state has %d nodes)", spm->node,
> +                   ms->numa_state ? ms->numa_state->num_nodes : 0);
> +        return;
> +    }
> +
> +    /* Check 1: target node must not have memory from -numa node,memdev= */
> +    if (ms->numa_state->nodes[spm->node].node_mem > 0) {
> +        error_setg(errp,
> +                   "spm-memory: NUMA node %u already has memory attached "
> +                   "via -numa node,memdev=; SPM nodes must be SPM-only",
> +                   spm->node);
> +        return;
> +    }
> +
> +    /* Check 2: target node must not already have another memory device */
> +    object_child_foreach_recursive(qdev_get_machine(),
> +                                   spm_check_node_collision_cb, &ctx);
> +    if (ctx.conflict) {
> +        error_setg(errp,
> +                   "spm-memory: NUMA node %u already has another memory "
> +                   "device plugged; SPM nodes must be SPM-only", spm->node);
> +        return;
> +    }
> +}
> +
> +static void spm_memory_machine_done(Notifier *n, void *opaque)
> +{
> +    SpmMemoryDevice *spm;
> +    MemoryDeviceClass *mdc;
> +    uint64_t addr, size;
> +
> +    QLIST_FOREACH(spm, &spm_memory_list, next) {
> +        g_assert(spm->hostmem);
> +        mdc = MEMORY_DEVICE_GET_CLASS(MEMORY_DEVICE(spm));
> +        addr = mdc->get_addr(MEMORY_DEVICE(spm));
> +        size = memory_region_size(
> +            host_memory_backend_get_memory(spm->hostmem));
> +        e820_add_entry(addr, size, E820_SOFT_RESERVED);
> +    }
> +}
> +
> +static void spm_memory_realize(DeviceState *dev, Error **errp)
> +{
> +    ERRP_GUARD();
> +    SpmMemoryDevice *spm = SPM_MEMORY(dev);
> +    MachineState *ms = MACHINE(qdev_get_machine());

pls do not use machine from device proper code.
we do have plug handlers that provide it at the time when necessary.

> +
> +    if (phase_check(PHASE_MACHINE_READY)) {
> +        error_setg(errp, "spm-memory: hotplug is not supported "
> +                   "(boot-time-only device)");
> +        return;
> +    }

shouldn't be necessary, dc->hotpluggable in class init should be sufficient.

> +
> +    if (!spm->hostmem) {
> +        error_setg(errp, "'%s' property is required", SPM_MEMORY_MEMDEV_PROP);
> +        return;
> +    }
> +    if (host_memory_backend_is_mapped(spm->hostmem)) {
> +        error_setg(errp, "memory backend '%s' is already in use",
> +                   object_get_canonical_path_component(OBJECT(spm->hostmem)));
> +        return;
> +    }
> +
> +    spm_memory_check_node_exclusive(spm, ms, errp);
> +    if (*errp) {
> +        return;
> +    }

As far as I understood from previous discussions, so far it's our
own precaution.

I'd drop that, well, if you find a spec requiring it then
it should be a separate patch pointing to spec (or something else that
justifies it).

> +
> +    memory_device_pre_plug(MEMORY_DEVICE(spm), ms, errp);
> +    if (*errp) {
> +        return;
> +    }
> +
> +    host_memory_backend_set_mapped(spm->hostmem, true);
> +    memory_device_plug(MEMORY_DEVICE(spm), ms);

That's basically code duplication,
that doesn't belong to realize_fn, see how it's used by other devices.

The gist is mapping into address space, generic checks, machine related
steps go into machine handlers.

> +
> +    QLIST_INSERT_HEAD(&spm_memory_list, spm, next);

Don't use global list, unless you have to, see below.
> +
> +    if (!spm_machine_done_registered) {
> +        spm_machine_done_notifier.notify = spm_memory_machine_done;
> +        qemu_add_machine_init_done_notifier(&spm_machine_done_notifier);
> +        spm_machine_done_registered = true;
> +    }

e820 part should also go to machine specific plug handler,
that will also help with getting rid of spm_memory_list.
That also should let you get rid of adding machine_done handler,
the machine plug handler, would do the job instead (and much earlier).

> +}
> +
> +static const VMStateDescription vmstate_spm_memory = {
> +    .name = TYPE_SPM_MEMORY,
> +    .unmigratable = 1,
> +};
> +
> +static void spm_memory_class_init(ObjectClass *oc, const void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(oc);
> +    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(oc);
> +
> +    dc->desc = "SPM (Specific Purpose Memory) device";
> +    dc->hotpluggable = false;
> +    dc->realize = spm_memory_realize;
> +    dc->vmsd = &vmstate_spm_memory;
> +    device_class_set_props(dc, spm_memory_properties);
> +
> +    mdc->get_addr = spm_memory_md_get_addr;
> +    mdc->set_addr = spm_memory_md_set_addr;
> +    mdc->get_memory_region = spm_memory_md_get_memory_region;
> +    mdc->get_plugged_size = spm_memory_md_get_plugged_size;
> +    mdc->fill_device_info = spm_memory_md_fill_device_info;
> +}
> +
> +static const TypeInfo spm_memory_info = {
> +    .name = TYPE_SPM_MEMORY,
> +    .parent = TYPE_DEVICE,
> +    .class_size = sizeof(SpmMemoryDeviceClass),
> +    .class_init = spm_memory_class_init,
> +    .instance_size = sizeof(SpmMemoryDevice),
> +    .interfaces = (InterfaceInfo[]) {
> +        { TYPE_MEMORY_DEVICE },
> +        { }
> +    },
> +};
> +
> +static void spm_memory_register_types(void)
> +{
> +    type_register_static(&spm_memory_info);
> +}
> +
> +type_init(spm_memory_register_types)
> diff --git a/include/hw/mem/spm-memory.h b/include/hw/mem/spm-memory.h
> new file mode 100644
> index 0000000000..c662864b29
> --- /dev/null
> +++ b/include/hw/mem/spm-memory.h
> @@ -0,0 +1,43 @@
> +/*
> + * Specific Purpose Memory (SPM) device
> + *
> + * TYPE_MEMORY_DEVICE subclass for boot-time-only memory exposed to the
> + * guest as an E820 SOFT_RESERVED range with a SRAT memory-affinity entry.
> + *
> + * Copyright (c) 2026 Advanced Micro Devices, Inc.
> + *
> + * Authors:
> + *   FangSheng Huang <FangSheng.Huang@amd.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef QEMU_SPM_MEMORY_H
> +#define QEMU_SPM_MEMORY_H
> +
> +#include "hw/mem/memory-device.h"
> +#include "hw/core/qdev.h"
> +#include "qom/object.h"
> +#include "system/hostmem.h"
> +
> +#define TYPE_SPM_MEMORY "spm-memory"
> +
> +OBJECT_DECLARE_TYPE(SpmMemoryDevice, SpmMemoryDeviceClass, SPM_MEMORY)
> +
> +struct SpmMemoryDevice {
> +    /*< private >*/
> +    DeviceState parent_obj;
> +    QLIST_ENTRY(SpmMemoryDevice) next;
> +
> +    /*< public >*/
> +    HostMemoryBackend *hostmem; /* memdev= backend */
> +    uint32_t node;              /* NUMA proximity domain (node=) */
> +    uint64_t addr;              /* GPA (from addr= or framework-assigned) */
> +};
> +
> +struct SpmMemoryDeviceClass {
> +    /*< private >*/
> +    DeviceClass parent_class;
> +};
> +
> +#endif /* QEMU_SPM_MEMORY_H */
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 685e4e29b8..51b06d7cba 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1413,6 +1413,32 @@
>            }
>  }
>
> +##
> +# @SpmMemoryDeviceInfo:
> +#
> +# spm-memory device state information
> +#
> +# @id: device's ID
> +#
> +# @memaddr: physical address in memory, where device is mapped
> +#
> +# @size: size of memory that the device provides
> +#
> +# @node: NUMA proximity domain to which the device is assigned
> +#
> +# @memdev: memory backend linked with device
> +#
> +# Since: 11.1
> +##
> +{ 'struct': 'SpmMemoryDeviceInfo',
> +  'data': { '*id': 'str',
> +            'memaddr': 'size',
> +            'size': 'size',
> +            'node': 'int',
> +            'memdev': 'str'
> +          }
> +}
> +
>  ##
>  # @MemoryDeviceInfoKind:
>  #
> @@ -1426,11 +1452,13 @@
>  #
>  # @hv-balloon: since 8.2.
>  #
> +# @spm-memory: since 11.1.
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'MemoryDeviceInfoKind',
>    'data': [ 'dimm', 'nvdimm', 'virtio-pmem', 'virtio-mem', 'sgx-epc',
> -            'hv-balloon' ] }
> +            'hv-balloon', 'spm-memory' ] }
>
>  ##
>  # @PCDIMMDeviceInfoWrapper:
> @@ -1482,6 +1510,16 @@
>  { 'struct': 'HvBalloonDeviceInfoWrapper',
>    'data': { 'data': 'HvBalloonDeviceInfo' } }
>
> +##
> +# @SpmMemoryDeviceInfoWrapper:
> +#
> +# @data: spm-memory device state information
> +#
> +# Since: 11.1
> +##
> +{ 'struct': 'SpmMemoryDeviceInfoWrapper',
> +  'data': { 'data': 'SpmMemoryDeviceInfo' } }
> +
>  ##
>  # @MemoryDeviceInfo:
>  #
> @@ -1499,7 +1537,8 @@
>              'virtio-pmem': 'VirtioPMEMDeviceInfoWrapper',
>              'virtio-mem': 'VirtioMEMDeviceInfoWrapper',
>              'sgx-epc': 'SgxEPCDeviceInfoWrapper',
> -            'hv-balloon': 'HvBalloonDeviceInfoWrapper'
> +            'hv-balloon': 'HvBalloonDeviceInfoWrapper',
> +            'spm-memory': 'SpmMemoryDeviceInfoWrapper'
>            }
>  }
>
* Re: [PATCH v8 1/1] hw/mem: add spm-memory device for Specific Purpose Memory
2026-06-01 8:50 ` Igor Mammedov
@ 2026-06-01 10:28 ` Huang, FangSheng (Jerry)
0 siblings, 0 replies; 4+ messages in thread
From: Huang, FangSheng (Jerry) @ 2026-06-01 10:28 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, david, gourry, Zhigang.Luo, Lianjie.Shi

Hi Igor,

Thanks for the detailed review. Inline.

On 6/1/2026 4:50 PM, Igor Mammedov wrote:
> On Wed, 27 May 2026 15:42:15 +0800
> fanhuang <FangSheng.Huang@amd.com> wrote:
>
>> Introduce a TYPE_MEMORY_DEVICE subclass `spm-memory` for boot-time
>> SOFT_RESERVED memory exposed to the guest with a per-device NUMA
>> proximity domain.
>>
>> The device targets accelerator memory (HBM and similar) that the
>> firmware hands to the guest OS as SOFT_RESERVED memory, so a driver
>> in the guest -- rather than the kernel's general allocator -- owns
>> the range. Per-device NUMA placement matches the natural shape of
>> multiple HBM blocks (one block == one driver claim == one PXM).
>>
>> [...]
>>
>> CONFIG_SPM_MEMORY is selected by the i386 PC and Q35 machines
>> (same as DIMM).
>
> this pass is mostly high level review.
> patch is doing too many things at once,
> Suggest to split it on several pieces,
>   1. introducing spm-memory boiler plate code
>   2. SRAT mangling
>   3. adding E820 entry
>

Will split as (1) spm-memory boilerplate, (2) SRAT mangling, and
(3) E820 + pc.c plug-handler integration; MAINTAINERS becomes (4).

>>
>> MAINTAINERS gets new file entries under the existing "Memory devices"
>> stanza.
>>
>> Signed-off-by: FangSheng Huang <FangSheng.Huang@amd.com>
>> ---
>>  MAINTAINERS | 2 +
> a separate patch, pls.
>

Yes, will be patch (4) on its own.

>
> [...]
>
>> +    cursor = ms->device_memory->base;
>> +    end = cursor + memory_region_size(&ms->device_memory->mr);
>> +
>> +    object_child_foreach_recursive(qdev_get_machine(),
>> +                                   collect_spm_ranges_cb, ranges);
> it's not an objection, but could we do better here, i.e. idea would be:
> instead of full machine scan, take ms->device_memory and go over
> children regions -> pick only SPM device owned ones.
>

Noted -- I'll stay with the current full-scan pattern for v9. The
subregion's owner is the backend rather than the device, so a clean
device_memory walk would need an extra backend->device reverse lookup.
Happy to add that if you'd prefer.

>> -    /*
>> -     * Entry is required for Windows to enable memory hotplug in OS
>> -     * and for Linux to enable SWIOTLB when booted with less than
>> -     * 4G of RAM. Windows works better if the entry sets proximity
>> -     * to the highest NUMA node in the machine.
>> -     * Memory devices may override proximity set by this entry,
>> -     * providing _PXM method if necessary.
>> -     */
>
> don't just delete comment, as it still stands true. we should keep reminder
> why we are adding place holder(s) and its quirks.
>

Will keep an adapted version of the comment alongside the new
partition logic.

>
> [...]
>
>> +static void spm_memory_realize(DeviceState *dev, Error **errp)
>> +{
>> +    ERRP_GUARD();
>> +    SpmMemoryDevice *spm = SPM_MEMORY(dev);
>
>> +    MachineState *ms = MACHINE(qdev_get_machine());
>
> pls do not use machine from device proper code.
> we do have plug handlers that provide it at the time when necessary.
>

Will remove the MachineState reference; machine-level work moves to
the pc.c plug handler (see below).

>> +
>
>> +    if (phase_check(PHASE_MACHINE_READY)) {
>> +        error_setg(errp, "spm-memory: hotplug is not supported "
>> +                   "(boot-time-only device)");
>> +        return;
>> +    }
>
> shouldn't be necessary, dc->hotpluggable in class init should be sufficient.
>

Confirmed -- will drop the check.

>
>> +    spm_memory_check_node_exclusive(spm, ms, errp);
>> +    if (*errp) {
>> +        return;
>> +    }
>
> As far as I understood from previous discussions, so far it's our
> own precaution.
>
> I'd drop that, well, if you find a spec requiring it then
> it should be a separate patch pointing to spec (or something else that
> justifies it).
>

No spec to cite. Will drop the check and the associated helpers in v9.

>> +
>> +    memory_device_pre_plug(MEMORY_DEVICE(spm), ms, errp);
>> +    if (*errp) {
>> +        return;
>> +    }
>
>> +
>> +    host_memory_backend_set_mapped(spm->hostmem, true);
>> +    memory_device_plug(MEMORY_DEVICE(spm), ms);
>
> That's basically code duplication,
> that doesn't belong to realize_fn, see how it's used by other devices.
>
> The gist is mapping into address space, generic checks, machine related
> steps go into machine handlers.
>

Will move pre_plug / plug / set_mapped into pc_spm_memory_pre_plug /
pc_spm_memory_plug in hw/i386/pc.c, hooked via the existing
pc_machine_device_{pre_,}plug_cb dispatch (same pattern as pc-dimm).

>
>> +
>> +    QLIST_INSERT_HEAD(&spm_memory_list, spm, next);
> Don't use global list, unless you have to, see below.
>

Goes away with the plug-handler move below.

>> +
>> +    if (!spm_machine_done_registered) {
>> +        spm_machine_done_notifier.notify = spm_memory_machine_done;
>> +        qemu_add_machine_init_done_notifier(&spm_machine_done_notifier);
>> +        spm_machine_done_registered = true;
>> +    }
>
> e820 part should also go to machine specific plug handler,
> that will also help with getting rid of spm_memory_list.
> That also should let you get rid of adding machine_done handler,
> the machine plug handler, would do the job instead (and much earlier).
>

Yes -- e820_add_entry moves into pc_spm_memory_plug(), eliminating
both the global list and the machine-init-done notifier.

> [...]
>

Will respin and post v9 shortly.

Best regards,
FangSheng Huang (Jerry)
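The cursor walk in the quoted build_srat_device_memory() can be
illustrated outside QEMU. What follows is a minimal standalone sketch,
not the patch's code: Range, Entry, partition_device_memory() and the
AFF_* names are hypothetical stand-ins for SpmRange, build_srat_memory()
and MEM_AFFINITY_*, and it assumes the SPM ranges do not overlap and all
lie inside the device-memory window.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bit positions follow the ACPI SRAT memory-affinity flags field. */
#define AFF_ENABLED      (1u << 0)
#define AFF_HOTPLUGGABLE (1u << 1)

/* Hypothetical stand-in for SpmRange: one SPM device's GPA range. */
typedef struct { uint64_t addr, size; uint32_t node; } Range;
/* Hypothetical stand-in for one emitted SRAT memory-affinity entry. */
typedef struct { uint64_t base, len; uint32_t pxm, flags; } Entry;

static int range_cmp(const void *a, const void *b)
{
    const Range *x = a, *y = b;
    return (x->addr > y->addr) - (x->addr < y->addr);
}

/*
 * Split [base, base + size) into entries: every SPM range becomes an
 * ENABLED-only entry on its own node, and every gap (leading, between
 * ranges, trailing) becomes a HOTPLUGGABLE | ENABLED entry on
 * hotplug_pxm. Sorts spm[] in place by address. out[] must have room
 * for 2 * n + 1 entries. Returns the number of entries written.
 */
static size_t partition_device_memory(uint64_t base, uint64_t size,
                                      Range *spm, size_t n,
                                      uint32_t hotplug_pxm, Entry *out)
{
    uint64_t cursor = base, end = base + size;
    size_t count = 0;

    qsort(spm, n, sizeof(*spm), range_cmp);

    for (size_t i = 0; i < n; i++) {
        if (cursor < spm[i].addr) {
            /* Gap before this SPM range. */
            out[count++] = (Entry){ cursor, spm[i].addr - cursor,
                                    hotplug_pxm,
                                    AFF_HOTPLUGGABLE | AFF_ENABLED };
        }
        out[count++] = (Entry){ spm[i].addr, spm[i].size, spm[i].node,
                                AFF_ENABLED };
        cursor = spm[i].addr + spm[i].size;
    }

    if (cursor < end) {
        /* Trailing gap after the last SPM range. */
        out[count++] = (Entry){ cursor, end - cursor, hotplug_pxm,
                                AFF_HOTPLUGGABLE | AFF_ENABLED };
    }
    return count;
}
```

For a window at 0x1000 of size 0x4000 with two 0x1000-byte SPM ranges at
0x1000 (node 1) and 0x3000 (node 2), the walk yields four entries in
address order: SPM, gap, SPM, trailing gap, with both gaps carrying the
hot-pluggable proximity domain.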
end of thread, other threads:[~2026-06-01 10:34 UTC | newest]

Thread overview: 4+ messages
2026-05-27  7:42 [PATCH v8 0/1] hw/mem: add spm-memory device for Specific Purpose Memory fanhuang
2026-05-27  7:42 ` [PATCH v8 1/1] " fanhuang
2026-06-01  8:50   ` Igor Mammedov
2026-06-01 10:28     ` Huang, FangSheng (Jerry)