* [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
@ 2025-12-09 9:38 fanhuang
2025-12-09 9:38 ` [PATCH v4 1/1] " fanhuang
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: fanhuang @ 2025-12-09 9:38 UTC (permalink / raw)
To: qemu-devel, david, imammedo, jonathan.cameron
Cc: Zhigang.Luo, Lianjie.Shi, FangSheng.Huang
Hi all,
This is v4 of the SPM (Specific Purpose Memory) patch. Thank you Jonathan
for the detailed review.
Changes in v4 (addressing Jonathan's feedback):
- Added architecture check: spm=on now reports error on non-x86 machines
- Simplified return logic in e820_update_entry_type() (return true/false directly)
- Changed 4GB boundary spanning from warn_report to error_report + exit
- Updated QAPI documentation to be architecture-agnostic (removed E820 reference)
- Removed unnecessary comments
Use case:
This feature allows passing EFI_MEMORY_SP (Specific Purpose Memory) from
host to guest VM, useful for memory reserved for specific PCI devices
(e.g., GPU memory via VFIO-PCI). The SPM memory appears as soft reserved
to the guest and is managed by device drivers rather than the OS memory
allocator.
Example usage:
-object memory-backend-ram,size=8G,id=m0
-object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
-numa node,nodeid=0,memdev=m0
-numa node,nodeid=1,memdev=m1,spm=on
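
As a rough illustration of what spm=on means for the guest memory map, the
sketch below splits one RAM range into up to three pieces, with the SPM part
retyped as soft reserved. The struct range type and the split_range() helper
are hypothetical names made up for this example and are not QEMU APIs. Only
the E820_SOFT_RESERVED value is taken from the patch, and E820_RAM is assumed
to be the usual E820 type 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM            1u
#define E820_SOFT_RESERVED  0xEFFFFFFFu

/* Hypothetical stand-in for an E820 entry (host byte order, no packing). */
struct range {
    uint64_t start;
    uint64_t length;
    uint32_t type;
};

/*
 * Split 'entry' around [start, start + length) and store the pieces in
 * 'out' (room for three). The middle piece gets 'new_type'; any leading or
 * trailing remainder keeps the original type.
 * Returns the number of pieces, or 0 if the requested range is empty,
 * wraps around, or is not fully contained in 'entry'.
 */
static size_t split_range(struct range entry, uint64_t start, uint64_t length,
                          uint32_t new_type, struct range out[3])
{
    uint64_t end = start + length;
    uint64_t entry_end = entry.start + entry.length;
    size_t n = 0;

    assert(out != NULL);

    if (length == 0 || end < start ||
        start < entry.start || end > entry_end) {
        return 0;
    }
    if (entry.start < start) {
        out[n++] = (struct range){ entry.start, start - entry.start,
                                   entry.type };
    }
    out[n++] = (struct range){ start, length, new_type };
    if (end < entry_end) {
        out[n++] = (struct range){ end, entry_end - end, entry.type };
    }
    return n;
}
```

With a 16G RAM entry starting at 4G and the upper 8G retyped, split_range()
yields two pieces: 8G of E820_RAM followed by 8G of E820_SOFT_RESERVED.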
Please review. Thanks!
Best regards,
Jerry Huang
--
2.34.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-09 9:38 [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory fanhuang
@ 2025-12-09 9:38 ` fanhuang
2025-12-23 9:56 ` Jonathan Cameron via
2025-12-29 18:26 ` [PATCH v4 0/1] " Gregory Price
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: fanhuang @ 2025-12-09 9:38 UTC (permalink / raw)
To: qemu-devel, david, imammedo, jonathan.cameron
Cc: Zhigang.Luo, Lianjie.Shi, FangSheng.Huang

This patch adds support for Specific Purpose Memory (SPM) through the
NUMA node configuration. When 'spm=on' is specified for a NUMA node,
the memory region will be reported to the guest as soft reserved,
allowing device drivers to manage it separately from normal system RAM.

Note: This option is only supported on x86 platforms. Using spm=on
on non-x86 machines will result in an error.

Usage:
  -numa node,nodeid=0,memdev=m1,spm=on

Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
---
 hw/core/numa.c               |  9 +++++
 hw/i386/e820_memory_layout.c | 72 ++++++++++++++++++++++++++++++++++++
 hw/i386/e820_memory_layout.h |  2 +
 hw/i386/pc.c                 | 51 +++++++++++++++++++++++++
 include/system/numa.h        |  1 +
 qapi/machine.json            |  7 ++++
 qemu-options.hx              | 11 +++++-
 7 files changed, 151 insertions(+), 2 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 218576f745..83079ba1fb 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -37,6 +37,7 @@
 #include "hw/mem/pc-dimm.h"
 #include "hw/boards.h"
 #include "hw/mem/memory-device.h"
+#include "hw/i386/x86.h"
 #include "qemu/option.h"
 #include "qemu/config-file.h"
 #include "qemu/cutils.h"
@@ -163,6 +164,14 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
         numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
     }
 
+    if (node->has_spm && node->spm) {
+        if (!object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
+            error_setg(errp, "spm option is only supported on x86 machines");
+            return;
+        }
+        numa_info[nodenr].is_spm = true;
+    }
+
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
     ms->numa_state->num_nodes++;
diff --git a/hw/i386/e820_memory_layout.c b/hw/i386/e820_memory_layout.c
index 3e848fb69c..4c62b5ddea 100644
--- a/hw/i386/e820_memory_layout.c
+++ b/hw/i386/e820_memory_layout.c
@@ -46,3 +46,75 @@ bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
     }
     return false;
 }
+
+bool e820_update_entry_type(uint64_t start, uint64_t length, uint32_t new_type)
+{
+    uint64_t end = start + length;
+    assert(!e820_done);
+
+    /* For E820_SOFT_RESERVED, validate range is within E820_RAM */
+    if (new_type == E820_SOFT_RESERVED) {
+        bool range_in_ram = false;
+
+        for (size_t j = 0; j < e820_entries; j++) {
+            uint64_t ram_start = le64_to_cpu(e820_table[j].address);
+            uint64_t ram_end = ram_start + le64_to_cpu(e820_table[j].length);
+            uint32_t ram_type = le32_to_cpu(e820_table[j].type);
+
+            if (ram_type == E820_RAM && ram_start <= start && ram_end >= end) {
+                range_in_ram = true;
+                break;
+            }
+        }
+        if (!range_in_ram) {
+            return false;
+        }
+    }
+
+    /* Find entry that contains the target range and update it */
+    for (size_t i = 0; i < e820_entries; i++) {
+        uint64_t entry_start = le64_to_cpu(e820_table[i].address);
+        uint64_t entry_length = le64_to_cpu(e820_table[i].length);
+        uint64_t entry_end = entry_start + entry_length;
+
+        if (entry_start <= start && entry_end >= end) {
+            uint32_t original_type = e820_table[i].type;
+
+            /* Remove original entry */
+            memmove(&e820_table[i], &e820_table[i + 1],
+                    (e820_entries - i - 1) * sizeof(struct e820_entry));
+            e820_entries--;
+
+            /* Add split parts inline */
+            if (entry_start < start) {
+                e820_table = g_renew(struct e820_entry, e820_table,
+                                     e820_entries + 1);
+                e820_table[e820_entries].address = cpu_to_le64(entry_start);
+                e820_table[e820_entries].length =
+                    cpu_to_le64(start - entry_start);
+                e820_table[e820_entries].type = original_type;
+                e820_entries++;
+            }
+
+            e820_table = g_renew(struct e820_entry, e820_table,
+                                 e820_entries + 1);
+            e820_table[e820_entries].address = cpu_to_le64(start);
+            e820_table[e820_entries].length = cpu_to_le64(length);
+            e820_table[e820_entries].type = cpu_to_le32(new_type);
+            e820_entries++;
+
+            if (end < entry_end) {
+                e820_table = g_renew(struct e820_entry, e820_table,
+                                     e820_entries + 1);
+                e820_table[e820_entries].address = cpu_to_le64(end);
+                e820_table[e820_entries].length = cpu_to_le64(entry_end - end);
+                e820_table[e820_entries].type = original_type;
+                e820_entries++;
+            }
+
+            return true;
+        }
+    }
+
+    return false;
+}
diff --git a/hw/i386/e820_memory_layout.h b/hw/i386/e820_memory_layout.h
index b50acfa201..657cc679e2 100644
--- a/hw/i386/e820_memory_layout.h
+++ b/hw/i386/e820_memory_layout.h
@@ -15,6 +15,7 @@
 #define E820_ACPI 3
 #define E820_NVS 4
 #define E820_UNUSABLE 5
+#define E820_SOFT_RESERVED 0xEFFFFFFF
 
 struct e820_entry {
     uint64_t address;
@@ -26,5 +27,6 @@ void e820_add_entry(uint64_t address, uint64_t length, uint32_t type);
 bool e820_get_entry(int index, uint32_t type,
                     uint64_t *address, uint64_t *length);
 int e820_get_table(struct e820_entry **table);
+bool e820_update_entry_type(uint64_t start, uint64_t length, uint32_t new_type);
 
 #endif
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f8b919cb6c..96066a1465 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -791,6 +791,54 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
     return pc_above_4g_end(pcms) - 1;
 }
 
+/*
+ * Update E820 entries for NUMA nodes marked as SPM (Specific Purpose Memory).
+ */
+static void pc_update_spm_memory(X86MachineState *x86ms)
+{
+    MachineState *ms = MACHINE(x86ms);
+    uint64_t addr = 0;
+
+    for (int i = 0; i < ms->numa_state->num_nodes; i++) {
+        NodeInfo *numa_info = &ms->numa_state->nodes[i];
+        uint64_t node_size = numa_info->node_mem;
+
+        /* Process SPM nodes */
+        if (numa_info->is_spm && numa_info->node_memdev) {
+            uint64_t guest_addr;
+
+            /* Calculate guest physical address accounting for PCI hole */
+            if (addr < x86ms->below_4g_mem_size) {
+                if (addr + node_size <= x86ms->below_4g_mem_size) {
+                    /* Entirely below 4GB */
+                    guest_addr = addr;
+                } else {
+                    error_report("SPM node %d spans across 4GB boundary, "
+                                 "this configuration is not supported", i);
+                    exit(EXIT_FAILURE);
+                }
+            } else {
+                /* Above 4GB, account for PCI hole */
+                guest_addr = 0x100000000ULL +
+                             (addr - x86ms->below_4g_mem_size);
+            }
+
+            /* Update E820 entry type to E820_SOFT_RESERVED */
+            if (!e820_update_entry_type(guest_addr, node_size,
+                                        E820_SOFT_RESERVED)) {
+                warn_report("Failed to update E820 entry for SPM node %d "
+                            "at 0x%" PRIx64 " length 0x%" PRIx64,
+                            i, guest_addr, node_size);
+            }
+        }
+
+        /* Accumulate address for next node */
+        if (numa_info->node_memdev) {
+            addr += node_size;
+        }
+    }
+}
+
 /*
  * AMD systems with an IOMMU have an additional hole close to the
  * 1Tb, which are special GPAs that cannot be DMA mapped. Depending
@@ -907,6 +955,9 @@ void pc_memory_init(PCMachineState *pcms,
         e820_add_entry(pcms->sgx_epc.base, pcms->sgx_epc.size, E820_RESERVED);
     }
 
+    /* Update E820 for NUMA nodes marked as SPM */
+    pc_update_spm_memory(x86ms);
+
     if (!pcmc->has_reserved_memory &&
         (machine->ram_slots ||
          (machine->maxram_size > machine->ram_size))) {
diff --git a/include/system/numa.h b/include/system/numa.h
index 1044b0eb6e..438511a756 100644
--- a/include/system/numa.h
+++ b/include/system/numa.h
@@ -41,6 +41,7 @@ typedef struct NodeInfo {
     bool present;
     bool has_cpu;
     bool has_gi;
+    bool is_spm;
     uint8_t lb_info_provided;
     uint16_t initiator;
     uint8_t distance[MAX_NODES];
diff --git a/qapi/machine.json b/qapi/machine.json
index 907cb25f75..cbb19da35c 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -500,6 +500,12 @@
 # @memdev: memory backend object. If specified for one node, it must
 #     be specified for all nodes.
 #
+# @spm: if true, mark the memory region of this node as Specific
+#     Purpose Memory (SPM). The memory will be reported to the
+#     guest as soft reserved, allowing device drivers to manage it
+#     separately from normal system RAM. Currently only supported
+#     on x86. (default: false, since 10.0)
+#
 # @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145, points
 #     to the nodeid which has the memory controller responsible for
 #     this NUMA node. This field provides additional information as
@@ -514,6 +520,7 @@
             '*cpus': ['uint16'],
             '*mem': 'size',
             '*memdev': 'str',
+            '*spm': 'bool',
             '*initiator': 'uint16' }}
 
 ##
diff --git a/qemu-options.hx b/qemu-options.hx
index fca2b7bc74..ffcd1f47cf 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -431,7 +431,7 @@ ERST
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
-    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
+    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node][,spm=on|off]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
     "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
     "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
@@ -440,7 +440,7 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
 SRST
 ``-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
   \
-``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
+``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator][,spm=on|off]``
   \
 ``-numa dist,src=source,dst=destination,val=distance``
   \
@@ -508,6 +508,13 @@ SRST
     largest bandwidth) to this NUMA node. Note that this option can be
     set only when the machine property 'hmat' is set to 'on'.
 
+    '\ ``spm``\ ' option marks the memory region of this NUMA node as
+    Specific Purpose Memory (SPM). When enabled, the memory will be
+    reported to the guest as soft reserved, allowing device drivers to
+    manage it separately from normal system RAM. This is useful for
+    device-specific memory that should not be used as general purpose
+    memory. This option is only supported on x86 platforms.
+
     Following example creates a machine with 2 NUMA nodes, node 0 has CPU.
     node 1 has only memory, and its initiator is node 0. Note that because
     node 0 has CPU, by default the initiator of node 0 is itself
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-09 9:38 ` [PATCH v4 1/1] " fanhuang
@ 2025-12-23 9:56 ` Jonathan Cameron via
2025-12-23 10:01 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Cameron via @ 2025-12-23 9:56 UTC (permalink / raw)
To: fanhuang
Cc: qemu-devel, david, imammedo, Zhigang.Luo, Lianjie.Shi, Alistair Popple, Bhardwaj, Rajneesh, Paul Blinzer, dan.j.williams

On Tue, 9 Dec 2025 17:38:41 +0800
fanhuang <FangSheng.Huang@amd.com> wrote:

> This patch adds support for Specific Purpose Memory (SPM) through the
> NUMA node configuration. When 'spm=on' is specified for a NUMA node,
> the memory region will be reported to the guest as soft reserved,
> allowing device drivers to manage it separately from normal system RAM.
>
> Note: This option is only supported on x86 platforms. Using spm=on
> on non-x86 machines will result in an error.
>
> Usage:
>   -numa node,nodeid=0,memdev=m1,spm=on
>
> Signed-off-by: fanhuang <FangSheng.Huang@amd.com>

Given the discussions at LPC around how to present GPU/HBM memory and
suggestions that reserved might be a better choice, I wonder if this
patch should provide that option as well? Or maybe as a potential follow
up. The fun there is that you also need to arrange for DSDT entries to
tie the region to the driver that actually wants it.

Anyhow that session reminded me of what SPM was designed for
(you don't want to know how long it took to come up with the name)
and it is a little more subtle than the description in here suggests.

The x86 specific code looks fine to me but I'm more or less totally
unfamiliar with that, so need review from others.

+CC a few folk from that discussion. I wasn't there in person and
it sounded like the discussion moved to the hallway so it may
have come to a totally different conclusion!

https://lpc.events/event/19/contributions/2064/ has links to slides
and youtube video.

> diff --git a/qapi/machine.json b/qapi/machine.json
> index 907cb25f75..cbb19da35c 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -500,6 +500,12 @@
>  # @memdev: memory backend object. If specified for one node, it must
>  #     be specified for all nodes.
>  #
> +# @spm: if true, mark the memory region of this node as Specific
> +#     Purpose Memory (SPM). The memory will be reported to the
> +#     guest as soft reserved, allowing device drivers to manage it
> +#     separately from normal system RAM. Currently only supported
> +#     on x86. (default: false, since 10.0)

As below. This needs to say something about letting the guest know
that it might want to let a driver manage it separately from normal
system RAM.

> +#
>  # @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145, points
>  #     to the nodeid which has the memory controller responsible for
>  #     this NUMA node. This field provides additional information as
> @@ -514,6 +520,7 @@
>              '*cpus': ['uint16'],
>              '*mem': 'size',
>              '*memdev': 'str',
> +            '*spm': 'bool',
>              '*initiator': 'uint16' }}
>
>  ##
> diff --git a/qemu-options.hx b/qemu-options.hx
> index fca2b7bc74..ffcd1f47cf 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -431,7 +431,7 @@ ERST
>
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node][,spm=on|off]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
>      "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
>      "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
> @@ -440,7 +440,7 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>  SRST
>  ``-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
>    \
> -``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
> +``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator][,spm=on|off]``
>    \
>  ``-numa dist,src=source,dst=destination,val=distance``
>    \
> @@ -508,6 +508,13 @@ SRST
>      largest bandwidth) to this NUMA node. Note that this option can be
>      set only when the machine property 'hmat' is set to 'on'.
>
> +    '\ ``spm``\ ' option marks the memory region of this NUMA node as
> +    Specific Purpose Memory (SPM). When enabled, the memory will be
> +    reported to the guest as soft reserved, allowing device drivers to
> +    manage it separately from normal system RAM. This is useful for
> +    device-specific memory that should not be used as general purpose
> +    memory. This option is only supported on x86 platforms.

This wants tweaking. As came up at the LPC discussion, SPM is for
memory that 'might' be used as general purpose memory if the policy of the
guest is to do so - as Alistair pointed out at LPC, people don't actually
do that very often, but nonetheless that's why this type exists. It is
a strong hint to the guest that it needs to apply a policy choice to
what happens to this memory.

Reserved is for memory that is only suitable for use other than generic memory.

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-23 9:56 ` Jonathan Cameron via
@ 2025-12-23 10:01 ` David Hildenbrand (Red Hat)
2025-12-26 7:15 ` Huang, FangSheng (Jerry)
0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-23 10:01 UTC (permalink / raw)
To: Jonathan Cameron, fanhuang
Cc: qemu-devel, imammedo, Zhigang.Luo, Lianjie.Shi, Alistair Popple, Bhardwaj, Rajneesh, Paul Blinzer, dan.j.williams

On 12/23/25 10:56, Jonathan Cameron via wrote:
> On Tue, 9 Dec 2025 17:38:41 +0800
> fanhuang <FangSheng.Huang@amd.com> wrote:
>
>> This patch adds support for Specific Purpose Memory (SPM) through the
>> NUMA node configuration. When 'spm=on' is specified for a NUMA node,
>> the memory region will be reported to the guest as soft reserved,
>> allowing device drivers to manage it separately from normal system RAM.
>>
>> Note: This option is only supported on x86 platforms. Using spm=on
>> on non-x86 machines will result in an error.
>>
>> Usage:
>>   -numa node,nodeid=0,memdev=m1,spm=on
>>
>> Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
>
> Given the discussions at LPC around how to present GPU/HBM memory and
> suggestions that reserved might be a better choice, I wonder if this
> patch should provide that option as well? Or maybe as a potential follow
> up. The fun there is that you also need to arrange for DSDT entries to
> tie the region to the driver that actually wants it.
>
> Anyhow that session reminded me of what SPM was designed for
> (you don't want to know how long it took to come up with the name)
> and it is a little more subtle than the description in here suggests.
>
> The x86 specific code looks fine to me but I'm more or less totally
> unfamiliar with that, so need review from others.
>
> +CC a few folk from that discussion. I wasn't there in person and
> it sounded like the discussion moved to the hallway so it may
> have come to a totally different conclusion!
>
> https://lpc.events/event/19/contributions/2064/ has links to slides
> and youtube video.
>
>> diff --git a/qapi/machine.json b/qapi/machine.json
>> index 907cb25f75..cbb19da35c 100644
>> --- a/qapi/machine.json
>> +++ b/qapi/machine.json
>> @@ -500,6 +500,12 @@
>>  # @memdev: memory backend object. If specified for one node, it must
>>  #     be specified for all nodes.
>>  #
>> +# @spm: if true, mark the memory region of this node as Specific
>> +#     Purpose Memory (SPM). The memory will be reported to the
>> +#     guest as soft reserved, allowing device drivers to manage it
>> +#     separately from normal system RAM. Currently only supported
>> +#     on x86. (default: false, since 10.0)
>
> As below. This needs to say something about letting the guest know
> that it might want to let a driver manage it separately from normal
> system RAM.
>
>> +#
>>  # @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145, points
>>  #     to the nodeid which has the memory controller responsible for
>>  #     this NUMA node. This field provides additional information as
>> @@ -514,6 +520,7 @@
>>              '*cpus': ['uint16'],
>>              '*mem': 'size',
>>              '*memdev': 'str',
>> +            '*spm': 'bool',
>>              '*initiator': 'uint16' }}
>>
>>  ##
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index fca2b7bc74..ffcd1f47cf 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -431,7 +431,7 @@ ERST
>>
>>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node][,spm=on|off]\n"
>>      "-numa dist,src=source,dst=destination,val=distance\n"
>>      "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
>>      "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
>> @@ -440,7 +440,7 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>  SRST
>>  ``-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
>>    \
>> -``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
>> +``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator][,spm=on|off]``
>>    \
>>  ``-numa dist,src=source,dst=destination,val=distance``
>>    \
>> @@ -508,6 +508,13 @@ SRST
>>      largest bandwidth) to this NUMA node. Note that this option can be
>>      set only when the machine property 'hmat' is set to 'on'.
>>
>> +    '\ ``spm``\ ' option marks the memory region of this NUMA node as
>> +    Specific Purpose Memory (SPM). When enabled, the memory will be
>> +    reported to the guest as soft reserved, allowing device drivers to
>> +    manage it separately from normal system RAM. This is useful for
>> +    device-specific memory that should not be used as general purpose
>> +    memory. This option is only supported on x86 platforms.
>
> This wants tweaking. As came up at the LPC discussion, SPM is for
> memory that 'might' be used as general purpose memory if the policy of the
> guest is to do so - as Alistair pointed out at LPC, people don't actually
> do that very often, but nonetheless that's why this type exists. It is
> a strong hint to the guest that it needs to apply a policy choice to
> what happens to this memory.

Just curious, it's the same on real hardware, right?

--
Cheers

David

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-23 10:01 ` David Hildenbrand (Red Hat)
@ 2025-12-26 7:15 ` Huang, FangSheng (Jerry)
2025-12-26 22:46 ` Alistair Popple
2025-12-30 20:09 ` David Hildenbrand (Red Hat)
0 siblings, 2 replies; 18+ messages in thread
From: Huang, FangSheng (Jerry) @ 2025-12-26 7:15 UTC (permalink / raw)
To: David Hildenbrand (Red Hat), Jonathan Cameron
Cc: qemu-devel, imammedo, Zhigang.Luo, Lianjie.Shi, Alistair Popple, Bhardwaj, Rajneesh, Paul Blinzer, dan.j.williams

Hi Jonathan, David,

Thanks for the review and for pointing out the LPC discussion!

On 12/23/2025 6:01 PM, David Hildenbrand (Red Hat) wrote:
> On 12/23/25 10:56, Jonathan Cameron via wrote:
>> On Tue, 9 Dec 2025 17:38:41 +0800
>> fanhuang <FangSheng.Huang@amd.com> wrote:
>>
>>> This patch adds support for Specific Purpose Memory (SPM) through the
>>> NUMA node configuration. When 'spm=on' is specified for a NUMA node,
>>> the memory region will be reported to the guest as soft reserved,
>>> allowing device drivers to manage it separately from normal system RAM.
>>>
>>> Note: This option is only supported on x86 platforms. Using spm=on
>>> on non-x86 machines will result in an error.
>>>
>>> Usage:
>>>   -numa node,nodeid=0,memdev=m1,spm=on
>>>
>>> Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
>>
>> Given the discussions at LPC around how to present GPU/HBM memory and
>> suggestions that reserved might be a better choice, I wonder if this
>> patch should provide that option as well? Or maybe as a potential follow
>> up. The fun there is that you also need to arrange for DSDT entries to
>> tie the region to the driver that actually wants it.
>>
>> Anyhow that session reminded me of what SPM was designed for
>> (you don't want to know how long it took to come up with the name)
>> and it is a little more subtle than the description in here suggests.
>>
>> The x86 specific code looks fine to me but I'm more or less totally
>> unfamiliar with that, so need review from others.
>>
>> +CC a few folk from that discussion. I wasn't there in person and
>> it sounded like the discussion moved to the hallway so it may
>> have come to a totally different conclusion!
>>
>> https://lpc.events/event/19/contributions/2064/ has links to slides
>> and youtube video.
>>

I watched the slides. Actually we've been experimenting with a combined
approach: SBIOS reports HBM as SPM, then driver dynamically partitions and
hot-plugs it as driver-managed memory to NUMA nodes. So SPM and
driver-managed are complementary rather than mutually exclusive. This patch
focuses on the first part - enabling QEMU to report memory as SPM to the
guest.

For the `reserved` option - agree it could be a potential follow-up, though
it needs more investigation. For now, let's focus on SPM and soft reserved.

>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>> index 907cb25f75..cbb19da35c 100644
>>> --- a/qapi/machine.json
>>> +++ b/qapi/machine.json
>>> @@ -500,6 +500,12 @@
>>>  # @memdev: memory backend object. If specified for one node, it must
>>>  #     be specified for all nodes.
>>>  #
>>> +# @spm: if true, mark the memory region of this node as Specific
>>> +#     Purpose Memory (SPM). The memory will be reported to the
>>> +#     guest as soft reserved, allowing device drivers to manage it
>>> +#     separately from normal system RAM. Currently only supported
>>> +#     on x86. (default: false, since 10.0)
>>
>> As below. This needs to say something about letting the guest know
>> that it might want to let a driver manage it separately from normal
>> system RAM.
>>
>>> +#
>>>  # @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145, points
>>>  #     to the nodeid which has the memory controller responsible for
>>>  #     this NUMA node. This field provides additional information as
>>> @@ -514,6 +520,7 @@
>>>              '*cpus': ['uint16'],
>>>              '*mem': 'size',
>>>              '*memdev': 'str',
>>> +            '*spm': 'bool',
>>>              '*initiator': 'uint16' }}
>>>  ##
>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>> index fca2b7bc74..ffcd1f47cf 100644
>>> --- a/qemu-options.hx
>>> +++ b/qemu-options.hx
>>> @@ -431,7 +431,7 @@ ERST
>>>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>>      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]
>>> [,initiator=node]\n"
>>> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
>>> [,initiator=node]\n"
>>> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
>>> [,initiator=node][,spm=on|off]\n"
>>>      "-numa dist,src=source,dst=destination,val=distance\n"
>>>      "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
>>>      "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|
>>> first-level|second-level|third-level,data-type=access-latency|read-
>>> latency|write-latency[,latency=lat][,bandwidth=bw]\n"
>>> @@ -440,7 +440,7 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>>  SRST
>>>  ``-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]
>>> [,initiator=initiator]``
>>>    \
>>> -``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
>>> [,initiator=initiator]``
>>> +``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
>>> [,initiator=initiator][,spm=on|off]``
>>>    \
>>>  ``-numa dist,src=source,dst=destination,val=distance``
>>>    \
>>> @@ -508,6 +508,13 @@ SRST
>>>      largest bandwidth) to this NUMA node. Note that this option can be
>>>      set only when the machine property 'hmat' is set to 'on'.
>>> +    '\ ``spm``\ ' option marks the memory region of this NUMA node as
>>> +    Specific Purpose Memory (SPM). When enabled, the memory will be
>>> +    reported to the guest as soft reserved, allowing device drivers to
>>> +    manage it separately from normal system RAM. This is useful for
>>> +    device-specific memory that should not be used as general purpose
>>> +    memory. This option is only supported on x86 platforms.
>>
>> This wants tweaking. As came up at the LPC discussion, SPM is for
>> memory that 'might' be used as general purpose memory if the policy of
>> the
>> guest is to do so - as Alistair pointed out at LPC, people don't actually
>> do that very often, but nonetheless that's why this type exists. It is
>> a strong hint to the guest that it needs to apply a policy choice to
>> what happens to this memory.

Got it. To clarify - this patch only handles the "reporting" part, just like
how SBIOS reports HBM as SPM on real hardware. The guest kernel then decides
how to use this memory based on its own policy (kernel config, boot
parameters, etc.).

Will update the docs to describe SPM as a policy hint rather than a
definitive restriction.

>
> Just curious, it's the same on real hardware, right?
>

Hi David, could you clarify what you're asking about? Whether the SPM
semantics are the same, or whether this QEMU implementation matches real
hardware behavior?

Best Regards,
Jerry Huang

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-26 7:15 ` Huang, FangSheng (Jerry)
@ 2025-12-26 22:46 ` Alistair Popple
2025-12-30 20:09 ` David Hildenbrand (Red Hat)
1 sibling, 0 replies; 18+ messages in thread
From: Alistair Popple @ 2025-12-26 22:46 UTC (permalink / raw)
To: Huang, FangSheng (Jerry)
Cc: David Hildenbrand (Red Hat), Jonathan Cameron, qemu-devel, imammedo, Zhigang.Luo, Lianjie.Shi, Bhardwaj, Rajneesh, Paul Blinzer, dan.j.williams, Gregory Price

On 2025-12-26 at 18:15 +1100, "Huang, FangSheng (Jerry)" <FangSheng.Huang@amd.com> wrote...
> Hi Jonathan, David,
>
> Thanks for the review and for pointing out the LPC discussion!
>
> On 12/23/2025 6:01 PM, David Hildenbrand (Red Hat) wrote:
> > On 12/23/25 10:56, Jonathan Cameron via wrote:
> > > On Tue, 9 Dec 2025 17:38:41 +0800
> > > fanhuang <FangSheng.Huang@amd.com> wrote:
> > >
> > > > This patch adds support for Specific Purpose Memory (SPM) through the
> > > > NUMA node configuration. When 'spm=on' is specified for a NUMA node,
> > > > the memory region will be reported to the guest as soft reserved,
> > > > allowing device drivers to manage it separately from normal system RAM.
> > > >
> > > > Note: This option is only supported on x86 platforms. Using spm=on
> > > > on non-x86 machines will result in an error.
> > > >
> > > > Usage:
> > > >   -numa node,nodeid=0,memdev=m1,spm=on
> > > >
> > > > Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
> > >
> > > Given the discussions at LPC around how to present GPU/HBM memory and
> > > suggestions that reserved might be a better choice, I wonder if this
> > > patch should provide that option as well? Or maybe as a potential follow
> > > up. The fun there is that you also need to arrange for DSDT entries to
> > > tie the region to the driver that actually wants it.
> > >
> > > Anyhow that session reminded me of what SPM was designed for
> > > (you don't want to know how long it took to come up with the name)
> > > and it is a little more subtle than the description in here suggests.
> > >
> > > The x86 specific code looks fine to me but I'm more or less totally
> > > unfamiliar with that, so need review from others.
> > >
> > > +CC a few folk from that discussion. I wasn't there in person and
> > > it sounded like the discussion moved to the hallway so it may
> > > have come to a totally different conclusion!

Indeed it did! We had an interesting discussion. I'm out of office for the
next week or so though so don't have much to add for now but adding Gregory
to this discussion as well.

 - Alistair

> > > https://lpc.events/event/19/contributions/2064/ has links to slides
> > > and youtube video.
> > >
>
> I watched the slides. Actually we've been experimenting with
> a combined approach: SBIOS
> reports HBM as SPM, then driver dynamically partitions and hot-plugs it as
> driver-managed memory to NUMA nodes. So SPM and driver-managed are
> complementary rather than mutually exclusive. This patch focuses on the
> first part - enabling QEMU to report memory as SPM to the guest.
>
> For the `reserved` option - agree it could be a potential follow-up, though
> it needs more investigation. For now, let's focus on SPM and soft reserved.
>
> > > > diff --git a/qapi/machine.json b/qapi/machine.json
> > > > index 907cb25f75..cbb19da35c 100644
> > > > --- a/qapi/machine.json
> > > > +++ b/qapi/machine.json
> > > > @@ -500,6 +500,12 @@
> > > >  # @memdev: memory backend object. If specified for one node, it must
> > > >  #     be specified for all nodes.
> > > >  #
> > > > +# @spm: if true, mark the memory region of this node as Specific
> > > > +#     Purpose Memory (SPM). The memory will be reported to the
> > > > +#     guest as soft reserved, allowing device drivers to manage it
> > > > +#     separately from normal system RAM. Currently only supported
> > > > +#     on x86. (default: false, since 10.0)
> > >
> > > As below. This needs to say something about letting the guest know
> > > that it might want to let a driver manage it separately from normal
> > > system RAM.
> > >
> > > > +#
> > > >  # @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145, points
> > > >  #     to the nodeid which has the memory controller responsible for
> > > >  #     this NUMA node. This field provides additional information as
> > > > @@ -514,6 +520,7 @@
> > > >              '*cpus': ['uint16'],
> > > >              '*mem': 'size',
> > > >              '*memdev': 'str',
> > > > +            '*spm': 'bool',
> > > >              '*initiator': 'uint16' }}
> > > >  ##
> > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > index fca2b7bc74..ffcd1f47cf 100644
> > > > --- a/qemu-options.hx
> > > > +++ b/qemu-options.hx
> > > > @@ -431,7 +431,7 @@ ERST
> > > >  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> > > >      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=node]\n"
> > > > -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=node]\n"
> > > > +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=node][,spm=on|off]\n"
> > > >      "-numa dist,src=source,dst=destination,val=distance\n"
> > > >      "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
> > > >      "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|
> > > > first-level|second-level|third-level,data-type=access-latency|read-
> > > > latency|write-latency[,latency=lat][,bandwidth=bw]\n"
> > > > @@ -440,7 +440,7 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> > > >  SRST
> > > >  ``-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=initiator]``
> > > >    \
> > > > -``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=initiator]``
> > > > +``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=initiator][,spm=on|off]``
> > > >    \
> >
> > ``-numa dist,src=source,dst=destination,val=distance`` > > > > \ > > > > @@ -508,6 +508,13 @@ SRST > > > > largest bandwidth) to this NUMA node. Note that this option can be > > > > set only when the machine property 'hmat' is set to 'on'. > > > > + '\ ``spm``\ ' option marks the memory region of this NUMA node as > > > > + Specific Purpose Memory (SPM). When enabled, the memory will be > > > > + reported to the guest as soft reserved, allowing device drivers to > > > > + manage it separately from normal system RAM. This is useful for > > > > + device-specific memory that should not be used as general purpose > > > > + memory. This option is only supported on x86 platforms. > > > > > > This wants tweaking. As came up at the LPC discussion, SPM is for > > > memory that 'might' be used as general purpose memory if the policy of > > > the > > > guest is to do so - as Alistair pointed out at LPC, people don't actually > > > do that very often, but none the less that's why this type exists. It is > > > a strong hint to the guest that it needs to apply a policy choice to > > > what happens to this memory. > > Got it. To clarify - this patch only handles the "reporting" part, just > like how SBIOS reports HBM as SPM on real hardware. The guest kernel > then decides how to use this memory based on its own policy (kernel config, > boot parameters, etc.). Will update the docs to describe SPM as a > policy hint rather than a definitive restriction. > > > > > Just curious, it's the same on real hardware, right? > > > > Hi David, could you clarify what you're asking about? Whether the SPM > semantics are the same, or whether this QEMU implementation matches real > hardware behavior? > > Best Regards, > Jerry Huang ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-26 7:15 ` Huang, FangSheng (Jerry)
2025-12-26 22:46 ` Alistair Popple
@ 2025-12-30 20:09 ` David Hildenbrand (Red Hat)
2026-01-04 10:43 ` Huang, FangSheng (Jerry)
1 sibling, 1 reply; 18+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-30 20:09 UTC (permalink / raw)
To: Huang, FangSheng (Jerry), Jonathan Cameron
Cc: qemu-devel, imammedo, Zhigang.Luo, Lianjie.Shi, Alistair Popple, Bhardwaj, Rajneesh, Paul Blinzer, dan.j.williams
>>
>> Just curious, it's the same on real hardware, right?
>>
>
> Hi David, could you clarify what you're asking about? Whether the SPM
> semantics are the same, or whether this QEMU implementation matches real
> hardware behavior?
Yes exactly. If it matches real hardware behavior then there are no real
surprises exposed by the QEMU implementation.
--
Cheers
David
* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-30 20:09 ` David Hildenbrand (Red Hat)
@ 2026-01-04 10:43 ` Huang, FangSheng (Jerry)
0 siblings, 0 replies; 18+ messages in thread
From: Huang, FangSheng (Jerry) @ 2026-01-04 10:43 UTC (permalink / raw)
To: David Hildenbrand (Red Hat), Jonathan Cameron
Cc: qemu-devel, imammedo, Zhigang.Luo, Lianjie.Shi, Alistair Popple, Bhardwaj, Rajneesh, Paul Blinzer, dan.j.williams
On 12/31/2025 4:09 AM, David Hildenbrand (Red Hat) wrote:
>>>
>>> Just curious, it's the same on real hardware, right?
>>>
>>
>> Hi David, could you clarify what you're asking about? Whether the SPM
>> semantics are the same, or whether this QEMU implementation matches real
>> hardware behavior?
>
> Yes exactly. If it matches real hardware behavior then there are no real
> surprises exposed by the QEMU implementation.
>
For the SBIOS pre-configured scenario, yes, it matches. This QEMU
implementation assumes SBIOS pre-configures the NUMA node with SPM via
SRAT/E820 - SPM is static boot memory from VM start.
One potential difference on real hardware: SPM might be initially
soft-reserved by SBIOS, then dynamically added to a NUMA node via
add_memory_driver_managed() at runtime. In that case, it's not pre-bound
boot memory.
This patch targets the first scenario, which should have no surprises.
Thanks,
Jerry
* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-09 9:38 [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory fanhuang
2025-12-09 9:38 ` [PATCH v4 1/1] " fanhuang
@ 2025-12-29 18:26 ` Gregory Price
2025-12-30 2:55 ` Huang, FangSheng (Jerry)
2026-01-02 13:09 ` Igor Mammedov
2026-01-02 16:30 ` Gregory Price
3 siblings, 1 reply; 18+ messages in thread
From: Gregory Price @ 2025-12-29 18:26 UTC (permalink / raw)
To: fanhuang
Cc: qemu-devel, david, imammedo, jonathan.cameron, Zhigang.Luo, Lianjie.Shi
On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
> Example usage:
> -object memory-backend-ram,size=8G,id=m0
> -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
> -numa node,nodeid=0,memdev=m0
> -numa node,nodeid=1,memdev=m1,spm=on
>
Interesting that you added spm= to NUMA rather than the memory backend,
but then in the patch you consume it to apply to the EFI/E820 memory
maps.
Sorry, I've missed prior versions. Is NUMA the right place to put this,
considering that the node is not necessarily 100% SPM on a real system?
(in practice it should be, but not technically required to be)
~Gregory
* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-29 18:26 ` [PATCH v4 0/1] " Gregory Price
@ 2025-12-30 2:55 ` Huang, FangSheng (Jerry)
2025-12-30 14:06 ` Gregory Price
0 siblings, 1 reply; 18+ messages in thread
From: Huang, FangSheng (Jerry) @ 2025-12-30 2:55 UTC (permalink / raw)
To: Gregory Price
Cc: qemu-devel, david, imammedo, jonathan.cameron, Zhigang.Luo, Lianjie.Shi
Hi Gregory,
Thanks for your review and good question!
On 12/30/2025 2:26 AM, Gregory Price wrote:
> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>> Example usage:
>> -object memory-backend-ram,size=8G,id=m0
>> -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
>> -numa node,nodeid=0,memdev=m0
>> -numa node,nodeid=1,memdev=m1,spm=on
>>
>
> Interesting that you added spm= to NUMA rather than the memory backend,
> but then in the patch you consume it to apply to the EFI/E820 memory
> maps.
>
> Sorry i've missed prior versions, is numa the right place to put this,
> considering that the node is not necessarily 100% SPM on a real system?
>
The decision to add `spm=` to NUMA rather than the memory backend was based
on earlier feedback from David during our initial RFC discussions.
David raised a concern that if we put the spm flag on the memory backend, a
user could accidentally pass such a memory backend to DIMM/virtio-mem/boot
memory, which would have very undesired side effects.
> (in practice it should be, but not technically required to be)
You're right that on a real system, a NUMA node is not technically required
to be 100% SPM. However, in AMD's use case, the entire NUMA node memory
(backed by memdev) is intended to be SPM, and this approach provides a
cleaner and safer configuration interface.
>
> ~Gregory
Please let me know if you have further concerns or suggestions.
Best Regards,
Jerry Huang
* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-30 2:55 ` Huang, FangSheng (Jerry)
@ 2025-12-30 14:06 ` Gregory Price
2025-12-30 20:15 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 18+ messages in thread
From: Gregory Price @ 2025-12-30 14:06 UTC (permalink / raw)
To: Huang, FangSheng (Jerry)
Cc: qemu-devel, david, imammedo, jonathan.cameron, Zhigang.Luo, Lianjie.Shi
On Tue, Dec 30, 2025 at 10:55:02AM +0800, Huang, FangSheng (Jerry) wrote:
> Hi Gregory,
>
> > Sorry i've missed prior versions, is numa the right place to put this,
> > considering that the node is not necessarily 100% SPM on a real system?
> >
>
> The decision to add `spm=` to NUMA rather than the memory backend was based
> on earlier feedback from David during our initial RFC discussions.
>
> David raised a concern that if we put the spm flag on the memory backend, a
> user could accidentally pass such a memory backend to DIMM/virtio-mem/boot
> memory, which would have very undesired side effects.
>
This makes sense, and in fact I almost wonder if we should actually
encode a warning in Linux in general if a single NUMA node contains
both normal and SPM memory. That would help drive consistency between
QEMU/KVM and real platforms from the direction of Linux.
> > (in practice it should be, but not technically required to be)
>
> You're right that on a real system, a NUMA node is not technically required
> to be 100% SPM. However, in AMD's use case, the entire NUMA node memory
> (backed by memdev) is intended to be SPM, and this approach provides a
> cleaner and safer configuration interface.
>
I figured this was the case, and honestly this just provides more
evidence that any given NUMA node probably should only have 1 "type" of
memory (or otherwise stated: uniform access within a node, non-uniform
across nodes).
---
bit of an aside - but at LPC we also talked about SPM NUMA nodes:
https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/
Would be cool to be able to detect this in the drivers and have hotplug
automatically mark a node SPM unless a driver overrides it.
(MHP flag? Sorry David :P)
> >
> > ~Gregory
>
> Please let me know if you have further concerns or suggestions.
>
I'll look at the patch details a bit more, but generally I like the
direction - with an obvious note that I have a bias given the above.
~Gregory
* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-30 14:06 ` Gregory Price
@ 2025-12-30 20:15 ` David Hildenbrand (Red Hat)
2025-12-30 23:03 ` Gregory Price
0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-30 20:15 UTC (permalink / raw)
To: Gregory Price, Huang, FangSheng (Jerry)
Cc: qemu-devel, imammedo, jonathan.cameron, Zhigang.Luo, Lianjie.Shi
On 12/30/25 15:06, Gregory Price wrote:
> On Tue, Dec 30, 2025 at 10:55:02AM +0800, Huang, FangSheng (Jerry) wrote:
>> Hi Gregory,
>>
>>> Sorry i've missed prior versions, is numa the right place to put this,
>>> considering that the node is not necessarily 100% SPM on a real system?
>>>
>>
>> The decision to add `spm=` to NUMA rather than the memory backend was based
>> on earlier feedback from David during our initial RFC discussions.
>>
>> David raised a concern that if we put the spm flag on the memory backend, a
>> user could accidentally pass such a memory backend to DIMM/virtio-mem/boot
>> memory, which would have very undesired side effects.
>>
>
> This makes sense, and in fact I almost wonder if we should actually
> encode a warning in Linux in general if a single NUMA node contains
> both normal and SPM memory. That would help drive consistency between
> QEMU/KVM and real platforms from the direction of Linux.
Yeah, in theory we would have a "memory device" for all boot memory (boot
DIMM, not sure ...) and that one would actually be marked as "spm".
It's not really a thing of a memory backend after all, it's only how that
memory is exposed to the VM.
And given we don't have a boot memory device, the idea was to set it for the
Node, where it means "all boot memory is SPM". And we only allow one type of
boot memory (one memory backend) per node in QEMU.
The tricky question is what happens with memory hotplug (DIMMs etc) on such
a node. I'd argue that it's simply not SPM.
>
>>> (in practice it should be, but not technically required to be)
>>
>> You're right that on a real system, a NUMA node is not technically required
>> to be 100% SPM. However, in AMD's use case, the entire NUMA node memory
>> (backed by memdev) is intended to be SPM, and this approach provides a
>> cleaner and safer configuration interface.
>>
>
> I figured this was the case, and honestly this just provides more
> evidence that any given NUMA node probably should only have 1 "type" of
> memory (or otherwise stated: uniform access within a node, non-uniform
> across nodes).
That makes sense.
>
> ---
>
> bit of an aside - but at LPC we also talked about SPM NUMA nodes:
> https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/
>
> Would be cool to be able to detect this in the drivers and have hotplug
> automatically mark a node SPM unless a driver overrides it.
> (MHP flag? Sorry David :P)
:)
If it's a per-node thing, MHP flags feel a bit like "too late". It should be
configured earlier for the node somehow.
>
>>>
>>> ~Gregory
>>
>> Please let me know if you have further concerns or suggestions.
>>
>
> I'll look at the patch details a bit more, but generally I like the
> direction - with an obvious note that I have a bias given the above.
Thanks for taking a look!
--
Cheers
David
* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-30 20:15 ` David Hildenbrand (Red Hat)
@ 2025-12-30 23:03 ` Gregory Price
0 siblings, 0 replies; 18+ messages in thread
From: Gregory Price @ 2025-12-30 23:03 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: Huang, FangSheng (Jerry), qemu-devel, imammedo, jonathan.cameron, Zhigang.Luo, Lianjie.Shi
On Tue, Dec 30, 2025 at 09:15:34PM +0100, David Hildenbrand (Red Hat) wrote:
> On 12/30/25 15:06, Gregory Price wrote:
>
> And given we don't have a boot memory device, the idea was to set it for the
> Node, where it means "all boot memory is SPM". And we only allow one type of
> boot memory (one memory backend) per node in QEMU.
>
> The tricky question is what happens with memory hotplug (DIMMs etc) on such
> a node. I'd argue that it's simply not SPM.
>
...
+++ .../docs/whatever
+ Don't do that. :]
> >
> > ---
> >
> > bit of an aside - but at LPC we also talked about SPM NUMA nodes:
> > https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/
> >
> > Would be cool to be able to detect this in the drivers and have hotplug
> > automatically mark a node SPM unless a driver overrides it.
> > (MHP flag? Sorry David :P)
>
> :)
>
> If it's a per-node thing, MHP flags feel a bit like "too late". It should be
> configured earlier for the node somehow.
>
Just a clarification: the flag would be an override to have mhp mark a node
N_MEMORY instead of N_SPM.
As it stands right now, a node is "online with memory" if N_MEMORY is set
for that node.
https://elixir.bootlin.com/linux/v6.14-rc6/source/mm/memory_hotplug.c#L717
I imagine hotplugged N_SPM would operate the same. So mhp code would look
like
    if (node_data->is_spm && !override)
        node_set_state(node, N_SPM)
    else
        node_set_state(node, N_MEMORY)
Basically this would allow SPM nodes to operate the same as they did before
when hotplugged, to retain existing behavior.
(Sorry, I'm thinking waaaaaaaaaaaaay far ahead here)
~Gregory
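[Editorial sketch, not part of the thread] The node-state choice described in the message above can be modelled in a few lines of plain C. This is a user-space illustration only, not kernel code: N_SPM is a proposal in this discussion rather than an existing node state as far as the thread shows, and the names model_node, MODEL_N_SPM, MODEL_N_MEMORY and model_hotplug_state() are hypothetical stand-ins for node_data->is_spm, N_SPM, N_MEMORY and the hotplug path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the node states named in the thread. */
enum model_node_state {
    MODEL_N_MEMORY,   /* node is online with ordinary memory */
    MODEL_N_SPM       /* proposed state: node holds Specific Purpose Memory */
};

/* Hypothetical per-node data; is_spm mirrors node_data->is_spm above. */
struct model_node {
    bool is_spm;
};

/*
 * Pick the state a hotplugged node would get. An SPM node becomes
 * MODEL_N_SPM unless the caller asks for the override, in which case it
 * is treated like ordinary memory (the pre-existing behavior).
 */
static enum model_node_state
model_hotplug_state(const struct model_node *node, bool override_to_memory)
{
    assert(node != NULL);
    if (node->is_spm && !override_to_memory) {
        return MODEL_N_SPM;
    }
    return MODEL_N_MEMORY;
}
```

Under these assumptions, a non-SPM node always ends up as MODEL_N_MEMORY, and the override only matters for nodes flagged as SPM.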
* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-09 9:38 [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory fanhuang
2025-12-09 9:38 ` [PATCH v4 1/1] " fanhuang
2025-12-29 18:26 ` [PATCH v4 0/1] " Gregory Price
@ 2026-01-02 13:09 ` Igor Mammedov
2026-01-02 16:28 ` Gregory Price
2026-01-02 16:30 ` Gregory Price
3 siblings, 1 reply; 18+ messages in thread
From: Igor Mammedov @ 2026-01-02 13:09 UTC (permalink / raw)
To: fanhuang; +Cc: qemu-devel, david, jonathan.cameron, Zhigang.Luo, Lianjie.Shi
On Tue, 9 Dec 2025 17:38:40 +0800
fanhuang <FangSheng.Huang@amd.com> wrote:
> Hi all,
>
> This is v4 of the SPM (Specific Purpose Memory) patch. Thank you Jonathan
> for the detailed review.
>
> Changes in v4 (addressing Jonathan's feedback):
> - Added architecture check: spm=on now reports error on non-x86 machines
> - Simplified return logic in e820_update_entry_type() (return true/false directly)
> - Changed 4GB boundary spanning from warn_report to error_report + exit
> - Updated QAPI documentation to be architecture-agnostic (removed E820 reference)
> - Removed unnecessary comments
>
> Use case:
> This feature allows passing EFI_MEMORY_SP (Specific Purpose Memory) from
> host to guest VM, useful for memory reserved for specific PCI devices
> (e.g., GPU memory via VFIO-PCI). The SPM memory appears as soft reserved
> to the guest and is managed by device drivers rather than the OS memory
> allocator.
>
> Example usage:
> -object memory-backend-ram,size=8G,id=m0
> -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
> -numa node,nodeid=0,memdev=m0
> -numa node,nodeid=1,memdev=m1,spm=on
I'm still not fond of the 'spm' toggle on the NUMA node itself (even though
on AMD hardware such memory has a 1:1 mapping) without a device model in
between.
Can we try the following instead:
* add an 'spm' property to the DIMM device and disable hotplug on it in
  such a case
* make E820 enumerate DIMMs marked as spm/not hotpluggable.
That will let us later have mixed memory on the node if such a need arises
without breaking the QEMU CLI.
> Please review. Thanks!
>
> Best regards,
> Jerry Huang
>
* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
2026-01-02 13:09 ` Igor Mammedov
@ 2026-01-02 16:28 ` Gregory Price
0 siblings, 0 replies; 18+ messages in thread
From: Gregory Price @ 2026-01-02 16:28 UTC (permalink / raw)
To: Igor Mammedov
Cc: fanhuang, qemu-devel, david, jonathan.cameron, Zhigang.Luo, Lianjie.Shi
On Fri, Jan 02, 2026 at 02:09:22PM +0100, Igor Mammedov wrote:
> That will let us later have mixed memory on the node
We were just discussing strongly dissuading such a configuration from a
Linux perspective, even if it's technically allowed, if only because it
makes reasoning about placement policy on such a node completely
impossible.
~Gregory
* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
2025-12-09 9:38 [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory fanhuang
` (2 preceding siblings ...)
2026-01-02 13:09 ` Igor Mammedov
@ 2026-01-02 16:30 ` Gregory Price
2026-01-05 15:29 ` David Hildenbrand (Red Hat)
3 siblings, 1 reply; 18+ messages in thread
From: Gregory Price @ 2026-01-02 16:30 UTC (permalink / raw)
To: fanhuang
Cc: qemu-devel, david, imammedo, jonathan.cameron, Zhigang.Luo, Lianjie.Shi
On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
> -numa node,nodeid=0,memdev=m0
> -numa node,nodeid=1,memdev=m1,spm=on
>
From discussion with Jonathan - whatever form this ends up taking, can
we change this from [on,off] to [normal,spm,reserved] and apply the
appropriate types accordingly?
Don't know what to name the tag in that case, something like...
memmap_type=[normal,spm,reserved] ?
(not married to this, open to suggestions)
~Gregory
* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
2026-01-02 16:30 ` Gregory Price
@ 2026-01-05 15:29 ` David Hildenbrand (Red Hat)
2026-01-07 9:03 ` Huang, FangSheng (Jerry)
0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-05 15:29 UTC (permalink / raw)
To: Gregory Price, fanhuang
Cc: qemu-devel, imammedo, jonathan.cameron, Zhigang.Luo, Lianjie.Shi
On 1/2/26 17:30, Gregory Price wrote:
> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>> -numa node,nodeid=0,memdev=m0
>> -numa node,nodeid=1,memdev=m1,spm=on
>>
>
> From discussion with Jonathan - whatever form this ends up taking, can
> we change this from [on,off] to [normal,spm,reserved] and apply the
> appropriate types accordingly?
>
> Don't know what to name the tag in that case, something like...
>
> memmap_type=[normal,spm,reserved] ?
That looks more extensible indeed.
The semantics would be unchanged compared to spm=on: only applies to boot
memory. Although, as discussed, mixing and matching types per node should be
avoided either way.
--
Cheers
David
* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
2026-01-05 15:29 ` David Hildenbrand (Red Hat)
@ 2026-01-07 9:03 ` Huang, FangSheng (Jerry)
0 siblings, 0 replies; 18+ messages in thread
From: Huang, FangSheng (Jerry) @ 2026-01-07 9:03 UTC (permalink / raw)
To: David Hildenbrand (Red Hat), Gregory Price
Cc: qemu-devel, imammedo, jonathan.cameron, Zhigang.Luo, Lianjie.Shi
On 1/5/2026 11:29 PM, David Hildenbrand (Red Hat) wrote:
> On 1/2/26 17:30, Gregory Price wrote:
>> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>>> -numa node,nodeid=0,memdev=m0
>>> -numa node,nodeid=1,memdev=m1,spm=on
>>>
>>
>> From discussion with Jonathan - whatever form this ends up taking, can
>> we change this from [on,off] to [normal,spm,reserved] and apply the
>> appropriate types accordingly?
>>
>> Don't know what to name the tag in that case, something like...
>>
>> memmap_type=[normal,spm,reserved] ?
>
> That looks more extensible indeed.
>
> The semantics would be unchanged compared to spm=on: only applies to
> boot memory. Although, as discussed, mixing and matching types per node
> should be avoided either way.
>
Hi Gregory, David,
Thank you for the suggestion on making this more extensible. I agree that
`memmap_type=[normal,spm,reserved]` is a better approach than the simple
boolean `spm=on|off`.
I've analyzed the required changes and will prepare an updated patch
implementing this. However, I need to go through an internal review process
before submitting to the community, which may take some time.
In the meantime, any feedback or suggestions on the design are welcome.
Best Regards,
Jerry Huang
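[Editorial sketch, not part of the thread] To make the proposed `memmap_type=[normal,spm,reserved]` option concrete, here is a minimal stand-alone C sketch of how such a value might be parsed and mapped to an E820 entry type. Nothing here is taken from the patch: every identifier prefixed with sketch_/SKETCH_ is hypothetical, the RAM and reserved codes are the standard E820 values 1 and 2, and the soft-reserved code is assumed to be the value Linux uses for E820_TYPE_SOFT_RESERVED. The constant the QEMU patch itself defines is not shown in this part of the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* E820 type codes used by this sketch (soft reserved: assumed Linux value). */
#define SKETCH_E820_RAM           1u
#define SKETCH_E820_RESERVED      2u
#define SKETCH_E820_SOFT_RESERVED 0xefffffffu

/* Hypothetical enum for the three values suggested in the thread. */
enum sketch_memmap_type {
    SKETCH_MEMMAP_NORMAL,
    SKETCH_MEMMAP_SPM,
    SKETCH_MEMMAP_RESERVED
};

/* Parse the option value; returns 0 on success, -1 for an unknown string. */
static int sketch_memmap_type_parse(const char *value,
                                    enum sketch_memmap_type *out)
{
    assert(value != NULL && out != NULL);
    if (strcmp(value, "normal") == 0) {
        *out = SKETCH_MEMMAP_NORMAL;
    } else if (strcmp(value, "spm") == 0) {
        *out = SKETCH_MEMMAP_SPM;
    } else if (strcmp(value, "reserved") == 0) {
        *out = SKETCH_MEMMAP_RESERVED;
    } else {
        return -1;
    }
    return 0;
}

/* Map the per-node type to the E820 type reported for its boot memory. */
static uint32_t sketch_memmap_type_to_e820(enum sketch_memmap_type type)
{
    switch (type) {
    case SKETCH_MEMMAP_SPM:
        return SKETCH_E820_SOFT_RESERVED;
    case SKETCH_MEMMAP_RESERVED:
        return SKETCH_E820_RESERVED;
    case SKETCH_MEMMAP_NORMAL:
    default:
        return SKETCH_E820_RAM;
    }
}
```

An enum keeps the command line extensible in the way David describes: adding a fourth type later would not change how existing values are spelled, whereas a boolean `spm=on|off` would need a second option.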