* [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
@ 2025-12-09  9:38 fanhuang
  2025-12-09  9:38 ` [PATCH v4 1/1] " fanhuang
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: fanhuang @ 2025-12-09  9:38 UTC (permalink / raw)
  To: qemu-devel, david, imammedo, jonathan.cameron
  Cc: Zhigang.Luo, Lianjie.Shi, FangSheng.Huang

Hi all,

This is v4 of the SPM (Specific Purpose Memory) patch. Thank you Jonathan
for the detailed review.

Changes in v4 (addressing Jonathan's feedback):
- Added architecture check: spm=on now reports an error on non-x86 machines
- Simplified return logic in e820_update_entry_type() (return true/false directly)
- Changed 4GB boundary spanning from warn_report to error_report + exit
- Updated QAPI documentation to be architecture-agnostic (removed E820 reference)
- Removed unnecessary comments

Use case:
This feature allows passing EFI_MEMORY_SP (Specific Purpose Memory) from
host to guest VM, useful for memory reserved for specific PCI devices
(e.g., GPU memory via VFIO-PCI). The SPM memory appears as soft reserved
to the guest and is managed by device drivers rather than the OS memory
allocator.
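
For readers new to the mechanism, here is a minimal standalone sketch of how
a RAM range can be retyped as soft reserved by splitting the entry that
contains it into up to three pieces. It is an illustration only, not the
code in this patch: struct range, struct table, table_retype() and
MAX_ENTRIES are hypothetical names, the table is a fixed-size array kept in
address order, and only the 0xEFFFFFFF type value is taken from the patch
(E820_SOFT_RESERVED).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TYPE_RAM           1u
#define TYPE_SOFT_RESERVED 0xEFFFFFFFu  /* value used by the patch */
#define MAX_ENTRIES        16           /* hypothetical fixed capacity */

struct range {
    uint64_t start;
    uint64_t len;
    uint32_t type;
};

struct table {
    struct range e[MAX_ENTRIES];
    size_t n;
};

/*
 * Retype [start, start + len) to new_type. The range must lie entirely
 * inside one RAM entry; that entry is replaced in place by up to three
 * entries: leading RAM, the retyped range, trailing RAM.
 */
bool table_retype(struct table *t, uint64_t start, uint64_t len,
                  uint32_t new_type)
{
    uint64_t end = start + len;

    assert(len > 0 && end > start);  /* reject empty or wrapping ranges */

    for (size_t i = 0; i < t->n; i++) {
        struct range old = t->e[i];
        uint64_t old_end = old.start + old.len;
        struct range parts[3];
        size_t np = 0;

        if (old.type != TYPE_RAM || old.start > start || old_end < end) {
            continue;
        }
        if (old.start < start) {
            parts[np++] = (struct range){ old.start, start - old.start,
                                          old.type };
        }
        parts[np++] = (struct range){ start, len, new_type };
        if (end < old_end) {
            parts[np++] = (struct range){ end, old_end - end, old.type };
        }
        if (t->n - 1 + np > MAX_ENTRIES) {
            return false;  /* no room for the extra entries */
        }
        /* Shift the tail, then drop the pieces into the freed slots. */
        memmove(&t->e[i + np], &t->e[i + 1],
                (t->n - i - 1) * sizeof(t->e[0]));
        memcpy(&t->e[i], parts, np * sizeof(parts[0]));
        t->n += np - 1;
        return true;
    }
    return false;  /* range is not contained in any single RAM entry */
}
```

Unlike this sketch, the patch removes the containing entry and appends the
resulting pieces at the end of a dynamically grown table.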

Example usage:
  -object memory-backend-ram,size=8G,id=m0
  -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
  -numa node,nodeid=0,memdev=m0
  -numa node,nodeid=1,memdev=m1,spm=on

Please review. Thanks!

Best regards,
Jerry Huang

-- 
2.34.1




* [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-09  9:38 [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory fanhuang
@ 2025-12-09  9:38 ` fanhuang
  2025-12-23  9:56   ` Jonathan Cameron via
  2025-12-29 18:26 ` [PATCH v4 0/1] " Gregory Price
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: fanhuang @ 2025-12-09  9:38 UTC (permalink / raw)
  To: qemu-devel, david, imammedo, jonathan.cameron
  Cc: Zhigang.Luo, Lianjie.Shi, FangSheng.Huang

This patch adds support for Specific Purpose Memory (SPM) through the
NUMA node configuration. When 'spm=on' is specified for a NUMA node,
the memory region will be reported to the guest as soft reserved,
allowing device drivers to manage it separately from normal system RAM.

Note: This option is only supported on x86 platforms. Using spm=on
on non-x86 machines will result in an error.

Usage:
  -numa node,nodeid=0,memdev=m1,spm=on

Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
---
 hw/core/numa.c               |  9 +++++
 hw/i386/e820_memory_layout.c | 72 ++++++++++++++++++++++++++++++++++++
 hw/i386/e820_memory_layout.h |  2 +
 hw/i386/pc.c                 | 51 +++++++++++++++++++++++++
 include/system/numa.h        |  1 +
 qapi/machine.json            |  7 ++++
 qemu-options.hx              | 11 +++++-
 7 files changed, 151 insertions(+), 2 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 218576f745..83079ba1fb 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -37,6 +37,7 @@
 #include "hw/mem/pc-dimm.h"
 #include "hw/boards.h"
 #include "hw/mem/memory-device.h"
+#include "hw/i386/x86.h"
 #include "qemu/option.h"
 #include "qemu/config-file.h"
 #include "qemu/cutils.h"
@@ -163,6 +164,14 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
         numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
     }
 
+    if (node->has_spm && node->spm) {
+        if (!object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
+            error_setg(errp, "spm option is only supported on x86 machines");
+            return;
+        }
+        numa_info[nodenr].is_spm = true;
+    }
+
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
     ms->numa_state->num_nodes++;
diff --git a/hw/i386/e820_memory_layout.c b/hw/i386/e820_memory_layout.c
index 3e848fb69c..4c62b5ddea 100644
--- a/hw/i386/e820_memory_layout.c
+++ b/hw/i386/e820_memory_layout.c
@@ -46,3 +46,75 @@ bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
     }
     return false;
 }
+
+bool e820_update_entry_type(uint64_t start, uint64_t length, uint32_t new_type)
+{
+    uint64_t end = start + length;
+    assert(!e820_done);
+
+    /* For E820_SOFT_RESERVED, validate range is within E820_RAM */
+    if (new_type == E820_SOFT_RESERVED) {
+        bool range_in_ram = false;
+
+        for (size_t j = 0; j < e820_entries; j++) {
+            uint64_t ram_start = le64_to_cpu(e820_table[j].address);
+            uint64_t ram_end = ram_start + le64_to_cpu(e820_table[j].length);
+            uint32_t ram_type = le32_to_cpu(e820_table[j].type);
+
+            if (ram_type == E820_RAM && ram_start <= start && ram_end >= end) {
+                range_in_ram = true;
+                break;
+            }
+        }
+        if (!range_in_ram) {
+            return false;
+        }
+    }
+
+    /* Find entry that contains the target range and update it */
+    for (size_t i = 0; i < e820_entries; i++) {
+        uint64_t entry_start = le64_to_cpu(e820_table[i].address);
+        uint64_t entry_length = le64_to_cpu(e820_table[i].length);
+        uint64_t entry_end = entry_start + entry_length;
+
+        if (entry_start <= start && entry_end >= end) {
+            uint32_t original_type = e820_table[i].type;
+
+            /* Remove original entry */
+            memmove(&e820_table[i], &e820_table[i + 1],
+                    (e820_entries - i - 1) * sizeof(struct e820_entry));
+            e820_entries--;
+
+            /* Add split parts inline */
+            if (entry_start < start) {
+                e820_table = g_renew(struct e820_entry, e820_table,
+                                     e820_entries + 1);
+                e820_table[e820_entries].address = cpu_to_le64(entry_start);
+                e820_table[e820_entries].length =
+                    cpu_to_le64(start - entry_start);
+                e820_table[e820_entries].type = original_type;
+                e820_entries++;
+            }
+
+            e820_table = g_renew(struct e820_entry, e820_table,
+                                 e820_entries + 1);
+            e820_table[e820_entries].address = cpu_to_le64(start);
+            e820_table[e820_entries].length = cpu_to_le64(length);
+            e820_table[e820_entries].type = cpu_to_le32(new_type);
+            e820_entries++;
+
+            if (end < entry_end) {
+                e820_table = g_renew(struct e820_entry, e820_table,
+                                     e820_entries + 1);
+                e820_table[e820_entries].address = cpu_to_le64(end);
+                e820_table[e820_entries].length = cpu_to_le64(entry_end - end);
+                e820_table[e820_entries].type = original_type;
+                e820_entries++;
+            }
+
+            return true;
+        }
+    }
+
+    return false;
+}
diff --git a/hw/i386/e820_memory_layout.h b/hw/i386/e820_memory_layout.h
index b50acfa201..657cc679e2 100644
--- a/hw/i386/e820_memory_layout.h
+++ b/hw/i386/e820_memory_layout.h
@@ -15,6 +15,7 @@
 #define E820_ACPI       3
 #define E820_NVS        4
 #define E820_UNUSABLE   5
+#define E820_SOFT_RESERVED  0xEFFFFFFF
 
 struct e820_entry {
     uint64_t address;
@@ -26,5 +27,6 @@ void e820_add_entry(uint64_t address, uint64_t length, uint32_t type);
 bool e820_get_entry(int index, uint32_t type,
                     uint64_t *address, uint64_t *length);
 int e820_get_table(struct e820_entry **table);
+bool e820_update_entry_type(uint64_t start, uint64_t length, uint32_t new_type);
 
 #endif
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f8b919cb6c..96066a1465 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -791,6 +791,54 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
     return pc_above_4g_end(pcms) - 1;
 }
 
+/*
+ * Update E820 entries for NUMA nodes marked as SPM (Specific Purpose Memory).
+ */
+static void pc_update_spm_memory(X86MachineState *x86ms)
+{
+    MachineState *ms = MACHINE(x86ms);
+    uint64_t addr = 0;
+
+    for (int i = 0; i < ms->numa_state->num_nodes; i++) {
+        NodeInfo *numa_info = &ms->numa_state->nodes[i];
+        uint64_t node_size = numa_info->node_mem;
+
+        /* Process SPM nodes */
+        if (numa_info->is_spm && numa_info->node_memdev) {
+            uint64_t guest_addr;
+
+            /* Calculate guest physical address accounting for PCI hole */
+            if (addr < x86ms->below_4g_mem_size) {
+                if (addr + node_size <= x86ms->below_4g_mem_size) {
+                    /* Entirely below 4GB */
+                    guest_addr = addr;
+                } else {
+                    error_report("SPM node %d spans across 4GB boundary, "
+                                 "this configuration is not supported", i);
+                    exit(EXIT_FAILURE);
+                }
+            } else {
+                /* Above 4GB, account for PCI hole */
+                guest_addr = 0x100000000ULL +
+                            (addr - x86ms->below_4g_mem_size);
+            }
+
+            /* Update E820 entry type to E820_SOFT_RESERVED */
+            if (!e820_update_entry_type(guest_addr, node_size,
+                                       E820_SOFT_RESERVED)) {
+                warn_report("Failed to update E820 entry for SPM node %d "
+                           "at 0x%" PRIx64 " length 0x%" PRIx64,
+                           i, guest_addr, node_size);
+            }
+        }
+
+        /* Accumulate address for next node */
+        if (numa_info->node_memdev) {
+            addr += node_size;
+        }
+    }
+}
+
 /*
  * AMD systems with an IOMMU have an additional hole close to the
  * 1Tb, which are special GPAs that cannot be DMA mapped. Depending
@@ -907,6 +955,9 @@ void pc_memory_init(PCMachineState *pcms,
         e820_add_entry(pcms->sgx_epc.base, pcms->sgx_epc.size, E820_RESERVED);
     }
 
+    /* Update E820 for NUMA nodes marked as SPM */
+    pc_update_spm_memory(x86ms);
+
     if (!pcmc->has_reserved_memory &&
         (machine->ram_slots ||
          (machine->maxram_size > machine->ram_size))) {
diff --git a/include/system/numa.h b/include/system/numa.h
index 1044b0eb6e..438511a756 100644
--- a/include/system/numa.h
+++ b/include/system/numa.h
@@ -41,6 +41,7 @@ typedef struct NodeInfo {
     bool present;
     bool has_cpu;
     bool has_gi;
+    bool is_spm;
     uint8_t lb_info_provided;
     uint16_t initiator;
     uint8_t distance[MAX_NODES];
diff --git a/qapi/machine.json b/qapi/machine.json
index 907cb25f75..cbb19da35c 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -500,6 +500,12 @@
 # @memdev: memory backend object.  If specified for one node, it must
 #     be specified for all nodes.
 #
+# @spm: if true, mark the memory region of this node as Specific
+#     Purpose Memory (SPM).  The memory will be reported to the
+#     guest as soft reserved, allowing device drivers to manage it
+#     separately from normal system RAM.  Currently only supported
+#     on x86.  (default: false, since 10.0)
+#
 # @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145, points
 #     to the nodeid which has the memory controller responsible for
 #     this NUMA node.  This field provides additional information as
@@ -514,6 +520,7 @@
    '*cpus':   ['uint16'],
    '*mem':    'size',
    '*memdev': 'str',
+   '*spm':    'bool',
    '*initiator': 'uint16' }}
 
 ##
diff --git a/qemu-options.hx b/qemu-options.hx
index fca2b7bc74..ffcd1f47cf 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -431,7 +431,7 @@ ERST
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
-    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
+    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node][,spm=on|off]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
     "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
     "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
@@ -440,7 +440,7 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
 SRST
 ``-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
   \ 
-``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
+``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator][,spm=on|off]``
   \
 ``-numa dist,src=source,dst=destination,val=distance``
   \ 
@@ -508,6 +508,13 @@ SRST
     largest bandwidth) to this NUMA node. Note that this option can be
     set only when the machine property 'hmat' is set to 'on'.
 
+    '\ ``spm``\ ' option marks the memory region of this NUMA node as
+    Specific Purpose Memory (SPM). When enabled, the memory will be
+    reported to the guest as soft reserved, allowing device drivers to
+    manage it separately from normal system RAM. This is useful for
+    device-specific memory that should not be used as general purpose
+    memory. This option is only supported on x86 platforms.
+
     Following example creates a machine with 2 NUMA nodes, node 0 has
     CPU. node 1 has only memory, and its initiator is node 0. Note that
     because node 0 has CPU, by default the initiator of node 0 is itself
-- 
2.34.1




* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-09  9:38 ` [PATCH v4 1/1] " fanhuang
@ 2025-12-23  9:56   ` Jonathan Cameron via
  2025-12-23 10:01     ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Cameron via @ 2025-12-23  9:56 UTC (permalink / raw)
  To: fanhuang
  Cc: qemu-devel, david, imammedo, Zhigang.Luo, Lianjie.Shi,
	Alistair Popple, Bhardwaj, Rajneesh, Paul Blinzer, dan.j.williams

On Tue, 9 Dec 2025 17:38:41 +0800
fanhuang <FangSheng.Huang@amd.com> wrote:

> This patch adds support for Specific Purpose Memory (SPM) through the
> NUMA node configuration. When 'spm=on' is specified for a NUMA node,
> the memory region will be reported to the guest as soft reserved,
> allowing device drivers to manage it separately from normal system RAM.
> 
> Note: This option is only supported on x86 platforms. Using spm=on
> on non-x86 machines will result in an error.
> 
> Usage:
>   -numa node,nodeid=0,memdev=m1,spm=on
> 
> Signed-off-by: fanhuang <FangSheng.Huang@amd.com>

Given the discussions at LPC around how to present GPU/HBM memory, and
the suggestions that reserved might be a better choice, I wonder if this
patch should provide that option as well?  Or maybe as a potential
follow-up. The fun there is that you also need to arrange for DSDT entries
to tie the region to the driver that actually wants it.

Anyhow that session reminded me of what SPM was designed for
(you don't want to know how long it took to come up with the name)
and it is a little more subtle than the description in here suggests.

The x86 specific code looks fine to me but I'm more or less totally
unfamiliar with that, so need review from others.

+CC a few folk from that discussion. I wasn't there in person and
it sounded like the discussion moved to the hallway so it may
have come to a totally different conclusion!

https://lpc.events/event/19/contributions/2064/ has links to slides
and youtube video.

> diff --git a/qapi/machine.json b/qapi/machine.json
> index 907cb25f75..cbb19da35c 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -500,6 +500,12 @@
>  # @memdev: memory backend object.  If specified for one node, it must
>  #     be specified for all nodes.
>  #
> +# @spm: if true, mark the memory region of this node as Specific
> +#     Purpose Memory (SPM).  The memory will be reported to the
> +#     guest as soft reserved, allowing device drivers to manage it
> +#     separately from normal system RAM.  Currently only supported
> +#     on x86.  (default: false, since 10.0)

As below. This needs to say something about letting the guest know
that it might want to let a driver manage it separately from normal
system RAM.

> +#
>  # @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145, points
>  #     to the nodeid which has the memory controller responsible for
>  #     this NUMA node.  This field provides additional information as
> @@ -514,6 +520,7 @@
>     '*cpus':   ['uint16'],
>     '*mem':    'size',
>     '*memdev': 'str',
> +   '*spm':    'bool',
>     '*initiator': 'uint16' }}
>  
>  ##
> diff --git a/qemu-options.hx b/qemu-options.hx
> index fca2b7bc74..ffcd1f47cf 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -431,7 +431,7 @@ ERST
>  
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node][,spm=on|off]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
>      "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
>      "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
> @@ -440,7 +440,7 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>  SRST
>  ``-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
>    \ 
> -``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
> +``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator][,spm=on|off]``
>    \
>  ``-numa dist,src=source,dst=destination,val=distance``
>    \ 
> @@ -508,6 +508,13 @@ SRST
>      largest bandwidth) to this NUMA node. Note that this option can be
>      set only when the machine property 'hmat' is set to 'on'.
>  
> +    '\ ``spm``\ ' option marks the memory region of this NUMA node as
> +    Specific Purpose Memory (SPM). When enabled, the memory will be
> +    reported to the guest as soft reserved, allowing device drivers to
> +    manage it separately from normal system RAM. This is useful for
> +    device-specific memory that should not be used as general purpose
> +    memory. This option is only supported on x86 platforms.

This wants tweaking.  As came up at the LPC discussion, SPM is for
memory that 'might' be used as general purpose memory if the policy of the
guest is to do so - as Alistair pointed out at LPC, people don't actually
do that very often, but none the less that's why this type exists. It is
a strong hint to the guest that it needs to apply a policy choice to
what happens to this memory.

Reserved is for memory that is only suitable for use other than generic
memory.



* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-23  9:56   ` Jonathan Cameron via
@ 2025-12-23 10:01     ` David Hildenbrand (Red Hat)
  2025-12-26  7:15       ` Huang, FangSheng (Jerry)
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-23 10:01 UTC (permalink / raw)
  To: Jonathan Cameron, fanhuang
  Cc: qemu-devel, imammedo, Zhigang.Luo, Lianjie.Shi, Alistair Popple,
	Bhardwaj, Rajneesh, Paul Blinzer, dan.j.williams

On 12/23/25 10:56, Jonathan Cameron via wrote:
> On Tue, 9 Dec 2025 17:38:41 +0800
> fanhuang <FangSheng.Huang@amd.com> wrote:
> 
>> This patch adds support for Specific Purpose Memory (SPM) through the
>> NUMA node configuration. When 'spm=on' is specified for a NUMA node,
>> the memory region will be reported to the guest as soft reserved,
>> allowing device drivers to manage it separately from normal system RAM.
>>
>> Note: This option is only supported on x86 platforms. Using spm=on
>> on non-x86 machines will result in an error.
>>
>> Usage:
>>    -numa node,nodeid=0,memdev=m1,spm=on
>>
>> Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
> 
> Given the discussions at LPC around how to present GPU/HBM memory, and
> the suggestions that reserved might be a better choice, I wonder if this
> patch should provide that option as well?  Or maybe as a potential
> follow-up. The fun there is that you also need to arrange for DSDT entries
> to tie the region to the driver that actually wants it.
> 
> Anyhow that session reminded me of what SPM was designed for
> (you don't want to know how long it took to come up with the name)
> and it is a little more subtle than the description in here suggests.
> 
> The x86 specific code looks fine to me but I'm more or less totally
> unfamiliar with that, so need review from others.
> 
> +CC a few folk from that discussion. I wasn't there in person and
> it sounded like the discussion moved to the hallway so it may
> have come to a totally different conclusion!
> 
> https://lpc.events/event/19/contributions/2064/ has links to slides
> and youtube video.
> 
>> diff --git a/qapi/machine.json b/qapi/machine.json
>> index 907cb25f75..cbb19da35c 100644
>> --- a/qapi/machine.json
>> +++ b/qapi/machine.json
>> @@ -500,6 +500,12 @@
>>   # @memdev: memory backend object.  If specified for one node, it must
>>   #     be specified for all nodes.
>>   #
>> +# @spm: if true, mark the memory region of this node as Specific
>> +#     Purpose Memory (SPM).  The memory will be reported to the
>> +#     guest as soft reserved, allowing device drivers to manage it
>> +#     separately from normal system RAM.  Currently only supported
>> +#     on x86.  (default: false, since 10.0)
> 
> As below. This needs to say something about letting the guest know
> that it might want to let a driver manage it separately from normal
> system RAM.
> 
>> +#
>>   # @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145, points
>>   #     to the nodeid which has the memory controller responsible for
>>   #     this NUMA node.  This field provides additional information as
>> @@ -514,6 +520,7 @@
>>      '*cpus':   ['uint16'],
>>      '*mem':    'size',
>>      '*memdev': 'str',
>> +   '*spm':    'bool',
>>      '*initiator': 'uint16' }}
>>   
>>   ##
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index fca2b7bc74..ffcd1f47cf 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -431,7 +431,7 @@ ERST
>>   
>>   DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>       "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node][,spm=on|off]\n"
>>       "-numa dist,src=source,dst=destination,val=distance\n"
>>       "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
>>       "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
>> @@ -440,7 +440,7 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>   SRST
>>   ``-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
>>     \
>> -``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
>> +``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator][,spm=on|off]``
>>     \
>>   ``-numa dist,src=source,dst=destination,val=distance``
>>     \
>> @@ -508,6 +508,13 @@ SRST
>>       largest bandwidth) to this NUMA node. Note that this option can be
>>       set only when the machine property 'hmat' is set to 'on'.
>>   
>> +    '\ ``spm``\ ' option marks the memory region of this NUMA node as
>> +    Specific Purpose Memory (SPM). When enabled, the memory will be
>> +    reported to the guest as soft reserved, allowing device drivers to
>> +    manage it separately from normal system RAM. This is useful for
>> +    device-specific memory that should not be used as general purpose
>> +    memory. This option is only supported on x86 platforms.
> 
> This wants tweaking.  As came up at the LPC discussion, SPM is for
> memory that 'might' be used as general purpose memory if the policy of the
> guest is to do so - as Alistair pointed out at LPC, people don't actually
> do that very often, but none the less that's why this type exists. It is
> a strong hint to the guest that it needs to apply a policy choice to
> what happens to this memory.

Just curious, it's the same on real hardware, right?

-- 
Cheers

David



* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-23 10:01     ` David Hildenbrand (Red Hat)
@ 2025-12-26  7:15       ` Huang, FangSheng (Jerry)
  2025-12-26 22:46         ` Alistair Popple
  2025-12-30 20:09         ` David Hildenbrand (Red Hat)
  0 siblings, 2 replies; 18+ messages in thread
From: Huang, FangSheng (Jerry) @ 2025-12-26  7:15 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat), Jonathan Cameron
  Cc: qemu-devel, imammedo, Zhigang.Luo, Lianjie.Shi, Alistair Popple,
	Bhardwaj, Rajneesh, Paul Blinzer, dan.j.williams

Hi Jonathan, David,

Thanks for the review and for pointing out the LPC discussion!

On 12/23/2025 6:01 PM, David Hildenbrand (Red Hat) wrote:
> On 12/23/25 10:56, Jonathan Cameron via wrote:
>> On Tue, 9 Dec 2025 17:38:41 +0800
>> fanhuang <FangSheng.Huang@amd.com> wrote:
>>
>>> This patch adds support for Specific Purpose Memory (SPM) through the
>>> NUMA node configuration. When 'spm=on' is specified for a NUMA node,
>>> the memory region will be reported to the guest as soft reserved,
>>> allowing device drivers to manage it separately from normal system RAM.
>>>
>>> Note: This option is only supported on x86 platforms. Using spm=on
>>> on non-x86 machines will result in an error.
>>>
>>> Usage:
>>>    -numa node,nodeid=0,memdev=m1,spm=on
>>>
>>> Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
>>
>> Given the discussions at LPC around how to present GPU/HBM memory, and
>> the suggestions that reserved might be a better choice, I wonder if this
>> patch should provide that option as well?  Or maybe as a potential
>> follow-up. The fun there is that you also need to arrange for DSDT entries
>> to tie the region to the driver that actually wants it.
>>
>> Anyhow that session reminded me of what SPM was designed for
>> (you don't want to know how long it took to come up with the name)
>> and it is a little more subtle than the description in here suggests.
>>
>> The x86 specific code looks fine to me but I'm more or less totally
>> unfamiliar with that, so need review from others.
>>
>> +CC a few folk from that discussion. I wasn't there in person and
>> it sounded like the discussion moved to the hallway so it may
>> have come to a totally different conclusion!
>>
>> https://lpc.events/event/19/contributions/2064/ has links to slides
>> and youtube video.
>>

I went through the slides. We've actually been experimenting with a
combined approach: the SBIOS reports HBM as SPM, and the driver then
dynamically partitions it and hot-plugs it into NUMA nodes as
driver-managed memory. So SPM and driver-managed memory are complementary
rather than mutually exclusive. This patch focuses on the first part:
enabling QEMU to report memory as SPM to the guest.

For the `reserved` option - agree it could be a potential follow-up, though
it needs more investigation. For now, let's focus on SPM and soft reserved.

>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>> index 907cb25f75..cbb19da35c 100644
>>> --- a/qapi/machine.json
>>> +++ b/qapi/machine.json
>>> @@ -500,6 +500,12 @@
>>>   # @memdev: memory backend object.  If specified for one node, it must
>>>   #     be specified for all nodes.
>>>   #
>>> +# @spm: if true, mark the memory region of this node as Specific
>>> +#     Purpose Memory (SPM).  The memory will be reported to the
>>> +#     guest as soft reserved, allowing device drivers to manage it
>>> +#     separately from normal system RAM.  Currently only supported
>>> +#     on x86.  (default: false, since 10.0)
>>
>> As below. This needs to say something about letting the guest know
>> that it might want to let a driver manage it separately from normal
>> system RAM.
>>
>>> +#
>>>   # @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145, points
>>>   #     to the nodeid which has the memory controller responsible for
>>>   #     this NUMA node.  This field provides additional information as
>>> @@ -514,6 +520,7 @@
>>>      '*cpus':   ['uint16'],
>>>      '*mem':    'size',
>>>      '*memdev': 'str',
>>> +   '*spm':    'bool',
>>>      '*initiator': 'uint16' }}
>>>   ##
>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>> index fca2b7bc74..ffcd1f47cf 100644
>>> --- a/qemu-options.hx
>>> +++ b/qemu-options.hx
>>> @@ -431,7 +431,7 @@ ERST
>>>   DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>>       "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node][,spm=on|off]\n"
>>>       "-numa dist,src=source,dst=destination,val=distance\n"
>>>       "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
>>>       "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
>>> @@ -440,7 +440,7 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>>   SRST
>>>   ``-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
>>>     \
>>> -``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
>>> +``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator][,spm=on|off]``
>>>     \
>>>   ``-numa dist,src=source,dst=destination,val=distance``
>>>     \
>>> @@ -508,6 +508,13 @@ SRST
>>>       largest bandwidth) to this NUMA node. Note that this option can be
>>>       set only when the machine property 'hmat' is set to 'on'.
>>> +    '\ ``spm``\ ' option marks the memory region of this NUMA node as
>>> +    Specific Purpose Memory (SPM). When enabled, the memory will be
>>> +    reported to the guest as soft reserved, allowing device drivers to
>>> +    manage it separately from normal system RAM. This is useful for
>>> +    device-specific memory that should not be used as general purpose
>>> +    memory. This option is only supported on x86 platforms.
>>
>> This wants tweaking.  As came up at the LPC discussion, SPM is for
>> memory that 'might' be used as general purpose memory if the policy of the
>> guest is to do so - as Alistair pointed out at LPC, people don't actually
>> do that very often, but none the less that's why this type exists. It is
>> a strong hint to the guest that it needs to apply a policy choice to
>> what happens to this memory.

Got it. To clarify - this patch only handles the "reporting" part, just
like how SBIOS reports HBM as SPM on real hardware. The guest kernel
then decides how to use this memory based on its own policy (kernel config,
boot parameters, etc.). Will update the docs to describe SPM as a
policy hint rather than a definitive restriction.

> 
> Just curious, it's the same on real hardware, right?
> 

Hi David, could you clarify what you're asking about? Whether the SPM
semantics are the same, or whether this QEMU implementation matches real
hardware behavior?

Best Regards,
Jerry Huang



* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-26  7:15       ` Huang, FangSheng (Jerry)
@ 2025-12-26 22:46         ` Alistair Popple
  2025-12-30 20:09         ` David Hildenbrand (Red Hat)
  1 sibling, 0 replies; 18+ messages in thread
From: Alistair Popple @ 2025-12-26 22:46 UTC (permalink / raw)
  To: Huang, FangSheng (Jerry)
  Cc: David Hildenbrand (Red Hat), Jonathan Cameron, qemu-devel,
	imammedo, Zhigang.Luo, Lianjie.Shi, Bhardwaj, Rajneesh,
	Paul Blinzer, dan.j.williams, Gregory Price

On 2025-12-26 at 18:15 +1100, "Huang, FangSheng (Jerry)" <FangSheng.Huang@amd.com> wrote...
> 
> Hi Jonathan, David,
> 
> Thanks for the review and for pointing out the LPC discussion!
>
> On 12/23/2025 6:01 PM, David Hildenbrand (Red Hat) wrote:
> > On 12/23/25 10:56, Jonathan Cameron via wrote:
> > > On Tue, 9 Dec 2025 17:38:41 +0800
> > > fanhuang <FangSheng.Huang@amd.com> wrote:
> > > 
> > > > This patch adds support for Specific Purpose Memory (SPM) through the
> > > > NUMA node configuration. When 'spm=on' is specified for a NUMA node,
> > > > the memory region will be reported to the guest as soft reserved,
> > > > allowing device drivers to manage it separately from normal system RAM.
> > > > 
> > > > Note: This option is only supported on x86 platforms. Using spm=on
> > > > on non-x86 machines will result in an error.
> > > > 
> > > > Usage:
> > > >    -numa node,nodeid=0,memdev=m1,spm=on
> > > > 
> > > > Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
> > > 
> > > Given the discussions at LPC around how to present GPU/HBM memory and
> > > suggestions that reserved might be a better choice. I wonder if this
> > > patch should provide that option as well?  Or maybe as a potential follow
> > > > up. The fun there is that you also need to arrange for DSDT entries to
> > > tie the region to the driver that actually wants it.
> > > 
> > > Anyhow that session reminded me of what SPM was designed for
> > > (you don't want to know how long it took to come up with the name)
> > > and it is a little more subtle than the description in here suggests.
> > > 
> > > The x86 specific code looks fine to me but I'm more or less totally
> > > unfamiliar with that, so need review from others.
> > > 
> > > +CC a few folk from that discussion. I wasn't there in person and
> > > it sounded like the discussion moved to the hallway so it may
> > > have come to a totally different conclusion!

Indeed it did! We had an interesting discussion. I'm out of office for the next
week or so though so don't have much to add for now but adding Gregory to this
discussion as well.

 - Alistair

> > > https://lpc.events/event/19/contributions/2064/ has links to slides
> > > and youtube video.
> > > 
> 
> I watched the slides.  Actually we've been experimenting with a combined
> approach: SBIOS reports HBM as SPM, then driver dynamically partitions and
> hot-plugs it as driver-managed memory to NUMA nodes. So SPM and
> driver-managed are complementary rather than mutually exclusive. This patch
> focuses on the first part - enabling QEMU to report memory as SPM to the
> guest.
> 
> For the `reserved` option - agree it could be a potential follow-up, though
> it needs more investigation. For now, let's focus on SPM and soft reserved.
> 
> > > > diff --git a/qapi/machine.json b/qapi/machine.json
> > > > index 907cb25f75..cbb19da35c 100644
> > > > --- a/qapi/machine.json
> > > > +++ b/qapi/machine.json
> > > > @@ -500,6 +500,12 @@
> > > >   # @memdev: memory backend object.  If specified for one node, it must
> > > >   #     be specified for all nodes.
> > > >   #
> > > > +# @spm: if true, mark the memory region of this node as Specific
> > > > +#     Purpose Memory (SPM).  The memory will be reported to the
> > > > +#     guest as soft reserved, allowing device drivers to manage it
> > > > +#     separately from normal system RAM.  Currently only supported
> > > > +#     on x86.  (default: false, since 10.0)
> > > 
> > > As below. This needs to say something about letting the guest know
> > > that it might want to let a driver manage it separately from normal
> > > system RAM.
> > > 
> > > > +#
> > > >   # @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145, points
> > > >   #     to the nodeid which has the memory controller responsible for
> > > >   #     this NUMA node.  This field provides additional information as
> > > > @@ -514,6 +520,7 @@
> > > >      '*cpus':   ['uint16'],
> > > >      '*mem':    'size',
> > > >      '*memdev': 'str',
> > > > +   '*spm':    'bool',
> > > >      '*initiator': 'uint16' }}
> > > >   ##
> > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > index fca2b7bc74..ffcd1f47cf 100644
> > > > --- a/qemu-options.hx
> > > > +++ b/qemu-options.hx
> > > > @@ -431,7 +431,7 @@ ERST
> > > >   DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> > > >       "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=node]\n"
> > > > -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=node]\n"
> > > > +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=node][,spm=on|off]\n"
> > > >       "-numa dist,src=source,dst=destination,val=distance\n"
> > > >       "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
> > > >       "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|
> > > > first-level|second-level|third-level,data-type=access-latency|read-
> > > > latency|write-latency[,latency=lat][,bandwidth=bw]\n"
> > > > @@ -440,7 +440,7 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> > > >   SRST
> > > >   ``-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=initiator]``
> > > >     \
> > > > -``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=initiator]``
> > > > +``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]
> > > > [,initiator=initiator][,spm=on|off]``
> > > >     \
> > > >   ``-numa dist,src=source,dst=destination,val=distance``
> > > >     \
> > > > @@ -508,6 +508,13 @@ SRST
> > > >       largest bandwidth) to this NUMA node. Note that this option can be
> > > >       set only when the machine property 'hmat' is set to 'on'.
> > > > +    '\ ``spm``\ ' option marks the memory region of this NUMA node as
> > > > +    Specific Purpose Memory (SPM). When enabled, the memory will be
> > > > +    reported to the guest as soft reserved, allowing device drivers to
> > > > +    manage it separately from normal system RAM. This is useful for
> > > > +    device-specific memory that should not be used as general purpose
> > > > +    memory. This option is only supported on x86 platforms.
> > > 
> > > This wants tweaking.  As came up at the LPC discussion, SPM is for
> > > memory that 'might' be used as general purpose memory if the policy of
> > > the
> > > guest is to do so - as Alistair pointed out at LPC, people don't actually
> > > do that very often, but none the less that's why this type exists. It is
> > > a strong hint to the guest that it needs to apply a policy choice to
> > > what happens to this memory.
> 
> Got it. To clarify - this patch only handles the "reporting" part, just
> like how SBIOS reports HBM as SPM on real hardware. The guest kernel
> then decides how to use this memory based on its own policy (kernel config,
> boot parameters, etc.). Will update the docs to describe SPM as a
> policy hint rather than a definitive restriction.
> 
> > 
> > Just curious, it's the same on real hardware, right?
> > 
> 
> Hi David, could you clarify what you're asking about? Whether the SPM
> semantics are the same, or whether this QEMU implementation matches real
> hardware behavior?
> 
> Best Regards,
> Jerry Huang


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-09  9:38 [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory fanhuang
  2025-12-09  9:38 ` [PATCH v4 1/1] " fanhuang
@ 2025-12-29 18:26 ` Gregory Price
  2025-12-30  2:55   ` Huang, FangSheng (Jerry)
  2026-01-02 13:09 ` Igor Mammedov
  2026-01-02 16:30 ` Gregory Price
  3 siblings, 1 reply; 18+ messages in thread
From: Gregory Price @ 2025-12-29 18:26 UTC (permalink / raw)
  To: fanhuang
  Cc: qemu-devel, david, imammedo, jonathan.cameron, Zhigang.Luo,
	Lianjie.Shi

On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
> Example usage:
>   -object memory-backend-ram,size=8G,id=m0
>   -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
>   -numa node,nodeid=0,memdev=m0
>   -numa node,nodeid=1,memdev=m1,spm=on
> 

Interesting that you added spm= to NUMA rather than the memory backend,
but then in the patch you consume it to apply to the EFI/E820 memory
maps.

Sorry i've missed prior versions, is numa the right place to put this,
considering that the node is not necessarily 100% SPM on a real system?

(in practice it should be, but not technically required to be)

~Gregory


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-29 18:26 ` [PATCH v4 0/1] " Gregory Price
@ 2025-12-30  2:55   ` Huang, FangSheng (Jerry)
  2025-12-30 14:06     ` Gregory Price
  0 siblings, 1 reply; 18+ messages in thread
From: Huang, FangSheng (Jerry) @ 2025-12-30  2:55 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, david, imammedo, jonathan.cameron, Zhigang.Luo,
	Lianjie.Shi

Hi Gregory,

Thanks for your review and good question!

On 12/30/2025 2:26 AM, Gregory Price wrote:
> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>> Example usage:
>>    -object memory-backend-ram,size=8G,id=m0
>>    -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
>>    -numa node,nodeid=0,memdev=m0
>>    -numa node,nodeid=1,memdev=m1,spm=on
>>
> 
> Interesting that you added spm= to NUMA rather than the memory backend,
> but then in the patch you consume it to apply to the EFI/E820 memory
> maps.
> 
> Sorry i've missed prior versions, is numa the right place to put this,
> considering that the node is not necessarily 100% SPM on a real system?
> 

The decision to add `spm=` to NUMA rather than the memory backend was based
on earlier feedback from David during our initial RFC discussions.

David raised a concern that if we put the spm flag on the memory backend, a
user could accidentally pass such a memory backend to DIMM/virtio-mem/boot
memory, which would have very undesired side effects.

> (in practice it should be, but not technically required to be)

You're right that on a real system, a NUMA node is not technically required
to be 100% SPM. However, in AMD's use case, the entire NUMA node memory
(backed by memdev) is intended to be SPM, and this approach provides a
cleaner and safer configuration interface.

> 
> ~Gregory

Please let me know if you have further concerns or suggestions.

Best Regards,
Jerry Huang


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-30  2:55   ` Huang, FangSheng (Jerry)
@ 2025-12-30 14:06     ` Gregory Price
  2025-12-30 20:15       ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 18+ messages in thread
From: Gregory Price @ 2025-12-30 14:06 UTC (permalink / raw)
  To: Huang, FangSheng (Jerry)
  Cc: qemu-devel, david, imammedo, jonathan.cameron, Zhigang.Luo,
	Lianjie.Shi

On Tue, Dec 30, 2025 at 10:55:02AM +0800, Huang, FangSheng (Jerry) wrote:
> Hi Gregory,
> 
> > Sorry i've missed prior versions, is numa the right place to put this,
> > considering that the node is not necessarily 100% SPM on a real system?
> > 
> 
> The decision to add `spm=` to NUMA rather than the memory backend was based
> on earlier feedback from David during our initial RFC discussions.
> 
> David raised a concern that if we put the spm flag on the memory backend, a
> user could accidentally pass such a memory backend to DIMM/virtio-mem/boot
> memory, which would have very undesired side effects.
> 

This makes sense, and in fact I almost wonder if we should actually
encode a warning in Linux in general if a single NUMA node contains
both normal and SPM.  That would help drive consistency between QEMU/KVM
and real platforms from the direction of linux.

> > (in practice it should be, but not technically required to be)
> 
> You're right that on a real system, a NUMA node is not technically required
> to be 100% SPM. However, in AMD's use case, the entire NUMA node memory
> (backed by memdev) is intended to be SPM, and this approach provides a
> cleaner and safer configuration interface.
> 

I figured this was the case, and honestly this just provides more
evidence that any given NUMA node probably should only have 1 "type" of
memory (or otherwise stated: uniform access within a node, non-uniform
across nodes).

---

bit of an aside - but at LPC we also talked about SPM NUMA nodes:
https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/

Would be cool to be able to detect this in the drivers and have hotplug
automatically mark a node SPM unless a driver overrides it.
(MHP flag? Sorry David :P)

> > 
> > ~Gregory
> 
> Please let me know if you have further concerns or suggestions.
> 

I'll look at the patch details a bit more, but generally I like the
direction - with an obvious note that I am biased given the above.

~Gregory


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-26  7:15       ` Huang, FangSheng (Jerry)
  2025-12-26 22:46         ` Alistair Popple
@ 2025-12-30 20:09         ` David Hildenbrand (Red Hat)
  2026-01-04 10:43           ` Huang, FangSheng (Jerry)
  1 sibling, 1 reply; 18+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-30 20:09 UTC (permalink / raw)
  To: Huang, FangSheng (Jerry), Jonathan Cameron
  Cc: qemu-devel, imammedo, Zhigang.Luo, Lianjie.Shi, Alistair Popple,
	Bhardwaj, Rajneesh, Paul Blinzer, dan.j.williams

>>
>> Just curious, it's the same on real hardware, right?
>>
> 
> Hi David, could you clarify what you're asking about? Whether the SPM
> semantics are the same, or whether this QEMU implementation matches real
> hardware behavior?

Yes exactly. If it matches real hardware behavior then there are no real 
surprises exposed by the QEMU implementation.

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-30 14:06     ` Gregory Price
@ 2025-12-30 20:15       ` David Hildenbrand (Red Hat)
  2025-12-30 23:03         ` Gregory Price
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-30 20:15 UTC (permalink / raw)
  To: Gregory Price, Huang, FangSheng (Jerry)
  Cc: qemu-devel, imammedo, jonathan.cameron, Zhigang.Luo, Lianjie.Shi

On 12/30/25 15:06, Gregory Price wrote:
> On Tue, Dec 30, 2025 at 10:55:02AM +0800, Huang, FangSheng (Jerry) wrote:
>> Hi Gregory,
>>
>>> Sorry i've missed prior versions, is numa the right place to put this,
>>> considering that the node is not necessarily 100% SPM on a real system?
>>>
>>
>> The decision to add `spm=` to NUMA rather than the memory backend was based
>> on earlier feedback from David during our initial RFC discussions.
>>
>> David raised a concern that if we put the spm flag on the memory backend, a
>> user could accidentally pass such a memory backend to DIMM/virtio-mem/boot
>> memory, which would have very undesired side effects.
>>
> 
> This makes sense, and in fact I almost wonder if we should actually
> encode a warning in Linux in general if a single NUMA node contains
> both normal and SPM.  That would help drive consistency between QEMU/KVM
> and real platforms from the direction of linux.

Yeah, in theory we would have a "memory device" for all boot memory 
(boot DIMM, not sure ...) and that one would actually be marked as "spm".

It's not really a thing of a memory backend after all, it's only how 
that memory is exposed to the VM.

And given we don't have a boot memory device, the idea was to set it for 
the Node, where it means "all boot memory is SPM". And we only allow one 
type of boot memory (one memory backend) per node in QEMU.

The tricky question is what happens with memory hotplug (DIMMs etc) on 
such a node. I'd argue that it's simply not SPM.

> 
>>> (in practice it should be, but not technically required to be)
>>
>> You're right that on a real system, a NUMA node is not technically required
>> to be 100% SPM. However, in AMD's use case, the entire NUMA node memory
>> (backed by memdev) is intended to be SPM, and this approach provides a
>> cleaner and safer configuration interface.
>>
> 
> I figured this was the case, and honestly this just provides more
> evidence that any given NUMA node probably should only have 1 "type" of
> memory (or otherwise stated: uniform access within a node, non-uniform
> across nodes).

That makes sense.

> 
> ---
> 
> bit of an aside - but at LPC we also talked about SPM NUMA nodes:
> https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/
> 
> Would be cool to be able to detect this in the drivers and have hotplug
> automatically mark a node SPM unless a driver overrides it.
> (MHP flag? Sorry David :P)

:)

If it's a per-node thing, MHP flags feel a bit like "too late". It 
should be configured earlier for the node somehow.

> 
>>>
>>> ~Gregory
>>
>> Please let me know if you have further concerns or suggestions.
>>
> 
> I'll look at the patch details a bit more, but generally I like the
> direction - with an obvious note that I am biased given the above.


Thanks for taking a look!


-- 
Cheers

David


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-30 20:15       ` David Hildenbrand (Red Hat)
@ 2025-12-30 23:03         ` Gregory Price
  0 siblings, 0 replies; 18+ messages in thread
From: Gregory Price @ 2025-12-30 23:03 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Huang, FangSheng (Jerry), qemu-devel, imammedo, jonathan.cameron,
	Zhigang.Luo, Lianjie.Shi

On Tue, Dec 30, 2025 at 09:15:34PM +0100, David Hildenbrand (Red Hat) wrote:
> On 12/30/25 15:06, Gregory Price wrote:
> 
> And given we don't have a boot memory device, the idea was to set it for the
> Node, where it means "all boot memory is SPM". And we only allow one type of
> boot memory (one memory backend) per node in QEMU.
> 
> The tricky question is what happens with memory hotplug (DIMMs etc) on such
> a node. I'd argue that it's simply not SPM.
>

...

+++ .../docs/whatever

+ Don't do that.

:]

> > 
> > ---
> > 
> > bit of an aside - but at LPC we also talked about SPM NUMA nodes:
> > https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/
> > 
> > Would be cool to be able to detect this in the drivers and have hotplug
> > automatically mark a node SPM unless a driver overrides it.
> > (MHP flag? Sorry David :P)
> 
> :)
> 
> If it's a per-node thing, MHP flags feel a bit like "too late". It should be
> configured earlier for the node somehow.
> 

just a clarification, the flag would be an override to have mhp mark a
node N_MEMORY instead of N_SPM.

As it stands right now, a node is "online with memory" if N_MEMORY is
set for that node.

https://elixir.bootlin.com/linux/v6.14-rc6/source/mm/memory_hotplug.c#L717

I imagine hotplugged N_SPM would operate the same.

So mhp code would look like

if (node_data->is_spm && !override)
	node_set_state(node, N_SPM)
else
	node_set_state(node, N_MEMORY)

Basically would allow SPM nodes to operate the same as they did before
when hotplugged to retain existing behavior.
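
The override logic above can be written out as a small self-contained C
sketch. N_SPM is only a proposal in this thread, so every name below
(sketch_node_state, sketch_node_data, sketch_hotplug_node_state) is
hypothetical and does not correspond to existing kernel code; it only models
the decision, not the real memory hotplug path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node states. N_SPM is a proposal, not an existing kernel flag. */
enum sketch_node_state {
    SKETCH_N_MEMORY,   /* node is "online with memory" for general use */
    SKETCH_N_SPM       /* node holds Specific Purpose Memory only */
};

/* Hypothetical per-node data: was the node described as SPM by firmware? */
struct sketch_node_data {
    bool is_spm;
};

/*
 * Pick the state a hotplugged node would be marked with.
 * 'override' models a driver asking for the existing behaviour
 * (plain N_MEMORY) even though the node is SPM.
 */
static enum sketch_node_state
sketch_hotplug_node_state(const struct sketch_node_data *nd, bool override)
{
    assert(nd != NULL);

    if (nd->is_spm && !override) {
        return SKETCH_N_SPM;
    }
    return SKETCH_N_MEMORY;
}
```

With this shape, an SPM node hotplugged without the override ends up as
SKETCH_N_SPM, while passing override=true keeps today's N_MEMORY behaviour.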

(Sorry, I'm thinking waaaaaaaaaaaaay far ahead here)

~Gregory


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-09  9:38 [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory fanhuang
  2025-12-09  9:38 ` [PATCH v4 1/1] " fanhuang
  2025-12-29 18:26 ` [PATCH v4 0/1] " Gregory Price
@ 2026-01-02 13:09 ` Igor Mammedov
  2026-01-02 16:28   ` Gregory Price
  2026-01-02 16:30 ` Gregory Price
  3 siblings, 1 reply; 18+ messages in thread
From: Igor Mammedov @ 2026-01-02 13:09 UTC (permalink / raw)
  To: fanhuang; +Cc: qemu-devel, david, jonathan.cameron, Zhigang.Luo, Lianjie.Shi

On Tue, 9 Dec 2025 17:38:40 +0800
fanhuang <FangSheng.Huang@amd.com> wrote:

> Hi all,
> 
> This is v4 of the SPM (Specific Purpose Memory) patch. Thank you Jonathan
> for the detailed review.
> 
> Changes in v4 (addressing Jonathan's feedback):
> - Added architecture check: spm=on now reports error on non-x86 machines
> - Simplified return logic in e820_update_entry_type() (return true/false directly)
> - Changed 4GB boundary spanning from warn_report to error_report + exit
> - Updated QAPI documentation to be architecture-agnostic (removed E820 reference)
> - Removed unnecessary comments
> 
> Use case:
> This feature allows passing EFI_MEMORY_SP (Specific Purpose Memory) from
> host to guest VM, useful for memory reserved for specific PCI devices
> (e.g., GPU memory via VFIO-PCI). The SPM memory appears as soft reserved
> to the guest and is managed by device drivers rather than the OS memory
> allocator.
> 
> Example usage:
>   -object memory-backend-ram,size=8G,id=m0
>   -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
>   -numa node,nodeid=0,memdev=m0
>   -numa node,nodeid=1,memdev=m1,spm=on

I'm still not fond of the 'spm' toggle on the NUMA node itself, without a
device model in between (even though on AMD hardware such memory has a 1:1
mapping).

Can we try the following instead:
  * add an 'spm' property to the DIMM device and disable hotplug on it in that case
  * make E820 enumerate DIMMs marked as spm/non-hotpluggable.

That will later let us have mixed memory on the node, if such a need arises,
without breaking the QEMU CLI.

> Please review. Thanks!
> 
> Best regards,
> Jerry Huang
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
  2026-01-02 13:09 ` Igor Mammedov
@ 2026-01-02 16:28   ` Gregory Price
  0 siblings, 0 replies; 18+ messages in thread
From: Gregory Price @ 2026-01-02 16:28 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: fanhuang, qemu-devel, david, jonathan.cameron, Zhigang.Luo,
	Lianjie.Shi

On Fri, Jan 02, 2026 at 02:09:22PM +0100, Igor Mammedov wrote:
> That will later let us have mixed memory on the node

We were just discussing strongly-dissuading such a configuration from
a linux perspective, even if it's technically allowed.

If only because it makes reasoning about placement policy on such a node
completely impossible.

~Gregory


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-09  9:38 [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory fanhuang
                   ` (2 preceding siblings ...)
  2026-01-02 13:09 ` Igor Mammedov
@ 2026-01-02 16:30 ` Gregory Price
  2026-01-05 15:29   ` David Hildenbrand (Red Hat)
  3 siblings, 1 reply; 18+ messages in thread
From: Gregory Price @ 2026-01-02 16:30 UTC (permalink / raw)
  To: fanhuang
  Cc: qemu-devel, david, imammedo, jonathan.cameron, Zhigang.Luo,
	Lianjie.Shi

On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>   -numa node,nodeid=0,memdev=m0
>   -numa node,nodeid=1,memdev=m1,spm=on
> 

Should discussion with Jonathan - whatever form this ends up taking, can
we change this from [on,off] to [normal,spm,reserved] and apply the
appropriate types accordingly?

don't know what to name the tag in that case, something like..

memmap_type=[normal,spm,reserved] ?

(not married to this, open to suggestions)

~Gregory


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/1] numa: add 'spm' option for Specific Purpose Memory
  2025-12-30 20:09         ` David Hildenbrand (Red Hat)
@ 2026-01-04 10:43           ` Huang, FangSheng (Jerry)
  0 siblings, 0 replies; 18+ messages in thread
From: Huang, FangSheng (Jerry) @ 2026-01-04 10:43 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat), Jonathan Cameron
  Cc: qemu-devel, imammedo, Zhigang.Luo, Lianjie.Shi, Alistair Popple,
	Bhardwaj, Rajneesh, Paul Blinzer, dan.j.williams



On 12/31/2025 4:09 AM, David Hildenbrand (Red Hat) wrote:
>>>
>>> Just curious, it's the same on real hardware, right?
>>>
>>
>> Hi David, could you clarify what you're asking about? Whether the SPM
>> semantics are the same, or whether this QEMU implementation matches real
>> hardware behavior?
> 
> Yes exactly. If it matches real hardware behavior then there are no real 
> surprises exposed by the QEMU implementation.
> 
For the SBIOS pre-configured scenario, yes, it matches.

This QEMU implementation assumes SBIOS pre-configures the NUMA node
with SPM via SRAT/E820 - SPM is static boot memory from VM start.

One potential difference on real hardware: SPM might be initially
soft-reserved by SBIOS, then dynamically added to a NUMA node via
add_memory_driver_managed() at runtime. In that case, it's not
pre-bound boot memory.

This patch targets the first scenario, which should have
no surprises.
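
As a rough illustration of the static case, the sketch below retypes one
boot-time E820 range from RAM to soft reserved. It is not the code from the
patch: the struct, the function name sketch_e820_retype and the constant
names are all hypothetical, and the soft reserved value is an assumption
modelled on Linux's E820_TYPE_SOFT_RESERVED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* E820 type codes; the soft reserved value is an assumption (Linux-style). */
#define SKETCH_E820_RAM           1u
#define SKETCH_E820_RESERVED      2u
#define SKETCH_E820_SOFT_RESERVED 0xefffffffu

/* Hypothetical E820 entry layout for illustration only. */
struct sketch_e820_entry {
    uint64_t address;
    uint64_t length;
    uint32_t type;
};

/*
 * Retype the entry that exactly covers [start, start + length).
 * Returns true if a matching entry was found and updated, false otherwise.
 * Splitting or merging entries is deliberately left out of this sketch.
 */
static bool sketch_e820_retype(struct sketch_e820_entry *table, size_t n,
                               uint64_t start, uint64_t length,
                               uint32_t new_type)
{
    assert(table != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (table[i].address == start && table[i].length == length) {
            table[i].type = new_type;
            return true;
        }
    }
    return false;
}
```

The guest then sees the range as soft reserved from boot, which is the
SBIOS pre-configured scenario; the runtime add_memory_driver_managed() path
is not modelled here.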

Thanks,
Jerry


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
  2026-01-02 16:30 ` Gregory Price
@ 2026-01-05 15:29   ` David Hildenbrand (Red Hat)
  2026-01-07  9:03     ` Huang, FangSheng (Jerry)
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-05 15:29 UTC (permalink / raw)
  To: Gregory Price, fanhuang
  Cc: qemu-devel, imammedo, jonathan.cameron, Zhigang.Luo, Lianjie.Shi

On 1/2/26 17:30, Gregory Price wrote:
> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>>    -numa node,nodeid=0,memdev=m0
>>    -numa node,nodeid=1,memdev=m1,spm=on
>>
> 
> Should discussion with Jonathan - whatever form this ends up taking, can
> we change this from [on,off] to [normal,spm,reserved] and apply the
> appropriate types accordingly?
> 
> don't know what to name the tag in that case, something like..
> 
> memmap_type=[normal,spm,reserved] ?

That looks more extensible indeed.

The semantics would be unchanged compared to spm=on: only applies to 
boot memory. Although, as discussed, mixing and matching types per node 
should be avoided either way.

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
  2026-01-05 15:29   ` David Hildenbrand (Red Hat)
@ 2026-01-07  9:03     ` Huang, FangSheng (Jerry)
  0 siblings, 0 replies; 18+ messages in thread
From: Huang, FangSheng (Jerry) @ 2026-01-07  9:03 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat), Gregory Price
  Cc: qemu-devel, imammedo, jonathan.cameron, Zhigang.Luo, Lianjie.Shi



On 1/5/2026 11:29 PM, David Hildenbrand (Red Hat) wrote:
> On 1/2/26 17:30, Gregory Price wrote:
>> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>>>    -numa node,nodeid=0,memdev=m0
>>>    -numa node,nodeid=1,memdev=m1,spm=on
>>>
>>
>> Should discussion with Jonathan - whatever form this ends up taking, can
>> we change this from [on,off] to [normal,spm,reserved] and apply the
>> appropriate types accordingly?
>>
>> don't know what to name the tag in that case, something like..
>>
>> memmap_type=[normal,spm,reserved] ?
> 
> That looks more extensible indeed.
> 
> The semantics would be unchanged compared to spm=on: only applies to 
> boot memory. Although, as discussed, mixing and matching types per node 
> should be avoided either way.
> 
Hi Gregory, David,

Thank you for the suggestion on making this more extensible.

I agree that `memmap_type=[normal,spm,reserved]` is a better approach
than the simple boolean `spm=on|off`.
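
To make the proposal concrete, here is a minimal sketch of how a three-way
option could be parsed and mapped to an E820 type. Nothing here exists in
QEMU: the enum, both function names and the option name itself are still
under discussion, and the soft reserved value is an assumption modelled on
Linux's E820_TYPE_SOFT_RESERVED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for the proposed memmap_type option. */
enum sketch_memmap_type {
    SKETCH_MEMMAP_NORMAL,
    SKETCH_MEMMAP_SPM,
    SKETCH_MEMMAP_RESERVED
};

/* Parse "normal", "spm" or "reserved"; returns false on anything else. */
static bool sketch_memmap_type_parse(const char *s,
                                     enum sketch_memmap_type *out)
{
    assert(s != NULL && out != NULL);

    if (strcmp(s, "normal") == 0) {
        *out = SKETCH_MEMMAP_NORMAL;
    } else if (strcmp(s, "spm") == 0) {
        *out = SKETCH_MEMMAP_SPM;
    } else if (strcmp(s, "reserved") == 0) {
        *out = SKETCH_MEMMAP_RESERVED;
    } else {
        return false;
    }
    return true;
}

/* Map the option to an E820 type: 1 = RAM, 2 = reserved, assumed soft reserved. */
static uint32_t sketch_memmap_type_to_e820(enum sketch_memmap_type t)
{
    switch (t) {
    case SKETCH_MEMMAP_SPM:
        return 0xefffffffu;
    case SKETCH_MEMMAP_RESERVED:
        return 2u;
    case SKETCH_MEMMAP_NORMAL:
    default:
        return 1u;
    }
}
```

Compared with a boolean, an enum like this leaves room for further types
later without changing the shape of the command line option.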

I've analyzed the required changes and will prepare an updated patch
implementing this. However, I need to go through an internal review
process before submitting to the community, which may take some time.

In the meantime, any feedback or suggestions on the design
are welcome.

Best Regards,
Jerry Huang


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2026-01-07  9:09 UTC | newest]

Thread overview: 18+ messages
2025-12-09  9:38 [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory fanhuang
2025-12-09  9:38 ` [PATCH v4 1/1] " fanhuang
2025-12-23  9:56   ` Jonathan Cameron via
2025-12-23 10:01     ` David Hildenbrand (Red Hat)
2025-12-26  7:15       ` Huang, FangSheng (Jerry)
2025-12-26 22:46         ` Alistair Popple
2025-12-30 20:09         ` David Hildenbrand (Red Hat)
2026-01-04 10:43           ` Huang, FangSheng (Jerry)
2025-12-29 18:26 ` [PATCH v4 0/1] " Gregory Price
2025-12-30  2:55   ` Huang, FangSheng (Jerry)
2025-12-30 14:06     ` Gregory Price
2025-12-30 20:15       ` David Hildenbrand (Red Hat)
2025-12-30 23:03         ` Gregory Price
2026-01-02 13:09 ` Igor Mammedov
2026-01-02 16:28   ` Gregory Price
2026-01-02 16:30 ` Gregory Price
2026-01-05 15:29   ` David Hildenbrand (Red Hat)
2026-01-07  9:03     ` Huang, FangSheng (Jerry)
