qemu-devel.nongnu.org archive mirror
* [PATCH] numa: add 'spm' option for special purpose memory
@ 2025-09-24 10:33 fanhuang
  2025-09-24 10:33 ` fanhuang
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: fanhuang @ 2025-09-24 10:33 UTC (permalink / raw)
  To: qemu-devel, david; +Cc: Zhigang.Luo, Lianjie.Shi, FangSheng.Huang

Hi David,

I hope this email finds you well. It has been several months since Zhigang last discussed the Special Purpose Memory (SPM) implementation in QEMU with you, so I want to provide some background before presenting the new patch, which is based on your suggestions.

Previous Discussion Summary
===========================
Back in December 2024, we had an extensive discussion about Zhigang's original patch, which added the `hmem` option to `memory-backend-file`. During that conversation, you raised several important concerns about the design approach:

Original Approach (December 2024)
----------------------------------
- Zhigang's patch: Added `hmem=on` option to `memory-backend-file`
- QEMU cmdline example:
  -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on
  -numa node,nodeid=1,memdev=m1

Your Concerns and Suggestions
-----------------------------
You correctly identified some issues with the original approach:
- Configuration Safety: Users could create problematic configurations like:
   -object memory-backend-file,size=16G,id=unused,mem-path=whatever,hmem=on

- Your Recommendation: You proposed a cleaner approach using NUMA node configuration:
   -numa node,nodeid=1,memdev=m1,spm=on

Project Context
===============
To refresh your memory on the use case:
- Objective: Pass `EFI_MEMORY_SP` (Special Purpose Memory) type memory from host to QEMU virtual machine
- Application: Memory reserved for specific PCI devices (e.g., VFIO-PCI devices)
- Guest Behavior: The SPM memory should be recognized by the guest OS and claimed by hmem-dax driver

Complete QEMU Configuration Example:
-object memory-backend-ram,size=8G,id=m0
-object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G
-numa node,nodeid=0,memdev=m0
-numa node,nodeid=1,memdev=m1,spm=on  # <-- New approach based on your suggestion

New Patch Implementation
========================
Following your recommendations, I have completely redesigned the implementation:

Key Changes:
1. Removed `hmem` option from `memory-backend-file`
2. Added `spm` (special-purpose) option to NUMA node configuration

I would appreciate your review of the new patch implementation. The design now follows your suggested approach of using NUMA node configuration rather than memory backend options, which should resolve the safety and scope issues we discussed.
Thank you for your time and valuable guidance on this implementation.

Please note that I'm located in UTC+8 timezone, so there might be some delay in my responses to your emails due to the time difference. I appreciate your patience and understanding.

Best regards,
FangSheng Huang



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] numa: add 'spm' option for special purpose memory
  2025-09-24 10:33 [PATCH] numa: add 'spm' option for special purpose memory fanhuang
@ 2025-09-24 10:33 ` fanhuang
  2025-09-24 17:03 ` David Hildenbrand
  2025-10-02 14:11 ` Igor Mammedov
  2 siblings, 0 replies; 9+ messages in thread
From: fanhuang @ 2025-09-24 10:33 UTC (permalink / raw)
  To: qemu-devel, david; +Cc: Zhigang.Luo, Lianjie.Shi, FangSheng.Huang

This patch adds support for special purpose memory (SPM) through the
NUMA node configuration. When 'spm=on' is specified for a NUMA node,
QEMU will:

1. Set the RAM_SPM flag in the RAM block of the corresponding memory region
2. Set the E820 type to E820_SOFT_RESERVED for this memory region

This allows guest operating systems to recognize the memory as soft reserved
memory, which can be used for heterogeneous memory management or other
special purposes.

Usage:
  -numa node,nodeid=0,memdev=m1,spm=on

Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
---
 hw/core/numa.c               |  3 +++
 hw/i386/e820_memory_layout.h |  1 +
 hw/i386/pc.c                 | 34 ++++++++++++++++++++++++++++++++++
 include/exec/cpu-common.h    |  1 +
 include/system/memory.h      |  3 +++
 include/system/numa.h        |  1 +
 qapi/machine.json            |  6 ++++++
 system/physmem.c             |  7 ++++++-
 8 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 218576f745..e680130460 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -163,6 +163,9 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
         numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
     }
 
+    /* Store spm configuration for later processing */
+    numa_info[nodenr].is_spm = node->has_spm && node->spm;
+
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
     ms->numa_state->num_nodes++;
diff --git a/hw/i386/e820_memory_layout.h b/hw/i386/e820_memory_layout.h
index b50acfa201..8af6a9cfac 100644
--- a/hw/i386/e820_memory_layout.h
+++ b/hw/i386/e820_memory_layout.h
@@ -15,6 +15,7 @@
 #define E820_ACPI       3
 #define E820_NVS        4
 #define E820_UNUSABLE   5
+#define E820_SOFT_RESERVED  0xEFFFFFFF
 
 struct e820_entry {
     uint64_t address;
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index bc048a6d13..10ecd25728 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -26,6 +26,7 @@
 #include "qemu/units.h"
 #include "exec/target_page.h"
 #include "hw/i386/pc.h"
+#include "system/ramblock.h"
 #include "hw/char/serial-isa.h"
 #include "hw/char/parallel.h"
 #include "hw/hyperv/hv-balloon.h"
@@ -787,6 +788,38 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
     return pc_above_4g_end(pcms) - 1;
 }
 
+static int pc_update_spm_memory(RAMBlock *rb, void *opaque)
+{
+    X86MachineState *x86ms = opaque;
+    MachineState *ms = MACHINE(x86ms);
+    ram_addr_t offset;
+    ram_addr_t length;
+    bool is_spm = false;
+
+    /* Check if this RAM block belongs to a NUMA node with spm=on */
+    for (int i = 0; i < ms->numa_state->num_nodes; i++) {
+        NodeInfo *numa_info = &ms->numa_state->nodes[i];
+        if (numa_info->is_spm && numa_info->node_memdev) {
+            MemoryRegion *mr = &numa_info->node_memdev->mr;
+            if (mr->ram_block == rb) {
+                /* Mark this RAM block as SPM and set the flag */
+                rb->flags |= RAM_SPM;
+                is_spm = true;
+                break;
+            }
+        }
+    }
+
+    if (is_spm) {
+        offset = qemu_ram_get_offset(rb) +
+                 (0x100000000ULL - x86ms->below_4g_mem_size);
+        length = qemu_ram_get_used_length(rb);
+        e820_add_entry(offset, length, E820_SOFT_RESERVED);
+    }
+
+    return 0;
+}
+
 /*
  * AMD systems with an IOMMU have an additional hole close to the
  * 1Tb, which are special GPAs that cannot be DMA mapped. Depending
@@ -901,6 +934,7 @@ void pc_memory_init(PCMachineState *pcms,
     if (pcms->sgx_epc.size != 0) {
         e820_add_entry(pcms->sgx_epc.base, pcms->sgx_epc.size, E820_RESERVED);
     }
+    qemu_ram_foreach_block(pc_update_spm_memory, x86ms);
 
     if (!pcmc->has_reserved_memory &&
         (machine->ram_slots ||
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 9b658a3f48..9b437eaa10 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -89,6 +89,7 @@ ram_addr_t qemu_ram_get_fd_offset(RAMBlock *rb);
 ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
 ram_addr_t qemu_ram_get_max_length(RAMBlock *rb);
 bool qemu_ram_is_shared(RAMBlock *rb);
+bool qemu_ram_is_spm(RAMBlock *rb);
 bool qemu_ram_is_noreserve(RAMBlock *rb);
 bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
 void qemu_ram_set_uf_zeroable(RAMBlock *rb);
diff --git a/include/system/memory.h b/include/system/memory.h
index aa85fc27a1..520dda969e 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -275,6 +275,9 @@ typedef struct IOMMUTLBEvent {
  */
 #define RAM_PRIVATE (1 << 13)
 
+/* RAM is special purpose memory */
+#define RAM_SPM (1 << 14)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
diff --git a/include/system/numa.h b/include/system/numa.h
index 1044b0eb6e..438511a756 100644
--- a/include/system/numa.h
+++ b/include/system/numa.h
@@ -41,6 +41,7 @@ typedef struct NodeInfo {
     bool present;
     bool has_cpu;
     bool has_gi;
+    bool is_spm;
     uint8_t lb_info_provided;
     uint16_t initiator;
     uint8_t distance[MAX_NODES];
diff --git a/qapi/machine.json b/qapi/machine.json
index 038eab281c..1a513b38cf 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -500,6 +500,11 @@
 # @memdev: memory backend object.  If specified for one node, it must
 #     be specified for all nodes.
 #
+# @spm: if true, mark the memory region of this node as special
+#     purpose memory (SPM). This will set the RAM_SPM flag for the
+#     corresponding memory region and set the E820 type to
+#     E820_SOFT_RESERVED. (default: false, since 9.2)
+#
 # @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145, points
 #     to the nodeid which has the memory controller responsible for
 #     this NUMA node.  This field provides additional information as
@@ -514,6 +519,7 @@
    '*cpus':   ['uint16'],
    '*mem':    'size',
    '*memdev': 'str',
+   '*spm':    'bool',
    '*initiator': 'uint16' }}
 
 ##
diff --git a/system/physmem.c b/system/physmem.c
index ae8ecd50ea..0090d9955d 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1611,6 +1611,11 @@ bool qemu_ram_is_noreserve(RAMBlock *rb)
     return rb->flags & RAM_NORESERVE;
 }
 
+bool qemu_ram_is_spm(RAMBlock *rb)
+{
+    return rb->flags & RAM_SPM;
+}
+
 /* Note: Only set at the start of postcopy */
 bool qemu_ram_is_uf_zeroable(RAMBlock *rb)
 {
@@ -2032,7 +2037,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, ram_addr_t max_size,
     ram_flags &= ~RAM_PRIVATE;
 
     /* Just support these ram flags by now. */
-    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
+    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SPM | RAM_NORESERVE |
                           RAM_PROTECTED | RAM_NAMED_FILE | RAM_READONLY |
                           RAM_READONLY_FD | RAM_GUEST_MEMFD |
                           RAM_RESIZEABLE)) == 0);
-- 
2.34.1
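
For readers following the address arithmetic in pc_update_spm_memory(), here is a minimal standalone sketch of the same calculation. It assumes, as the patch appears to, that the RAM block's offset equals its position within guest RAM and that the block lies entirely above the below-4G split. The helper name spm_guest_addr is hypothetical and is not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GIB 0x100000000ULL

/*
 * Hypothetical helper (not a QEMU function). It mirrors the expression
 *   qemu_ram_get_offset(rb) + (0x100000000ULL - below_4g_mem_size)
 * used in pc_update_spm_memory().
 */
static uint64_t spm_guest_addr(uint64_t ram_offset, uint64_t below_4g_mem_size)
{
    /* The low RAM window can never extend past the 4 GiB boundary. */
    assert(below_4g_mem_size <= FOUR_GIB);
    /* Assumption: the block starts at or above the split, so it is relocated. */
    assert(ram_offset >= below_4g_mem_size);
    /* Skip the hole between the end of low RAM and 4 GiB. */
    return ram_offset + (FOUR_GIB - below_4g_mem_size);
}
```

As an illustration, with an assumed below_4g_mem_size of 2 GiB, a block at RAM offset 8 GiB (0x200000000) maps to guest address 0x280000000 (10 GiB). A block that straddles the split would need two ranges; the sketch, like the expression it mirrors, does not handle that case.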



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] numa: add 'spm' option for special purpose memory
  2025-09-24 10:33 [PATCH] numa: add 'spm' option for special purpose memory fanhuang
  2025-09-24 10:33 ` fanhuang
@ 2025-09-24 17:03 ` David Hildenbrand
  2025-09-25  7:39   ` Huang, FangSheng (Jerry)
  2025-10-02 14:11 ` Igor Mammedov
  2 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-09-24 17:03 UTC (permalink / raw)
  To: fanhuang, qemu-devel
  Cc: Zhigang.Luo, Lianjie.Shi, Igor Mammedov, Jonathan Cameron

On 24.09.25 12:33, fanhuang wrote:
> Hi David,

Hi!

CCing Igor and Jonathan.

> 
> I hope this email finds you well. It's been several months since Zhigang last discussion about the Special Purpose Memory (SPM) implementation in QEMU with you, and I wanted to provide some background context before presenting the new patch based on your valuable suggestions.
> 
> Previous Discussion Summary
> ===========================
> Back in December 2024, we had an extensive discussion regarding my original patch that added the `hmem` option to `memory-backend-file`. During that conversation, you raised several important concerns about the design approach:
> 
> Original Approach (December 2024)
> ----------------------------------
> - Zhigang's patch: Added `hmem=on` option to `memory-backend-file`
> - QEMU cmdline example:
>    -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on
>    -numa node,nodeid=1,memdev=m1
> 
> Your Concerns and Suggestions
> -----------------------------
> You correctly identified some issues with the original approach:
> - Configuration Safety: Users could create problematic configurations like:
>     -object memory-backend-file,size=16G,id=unused,mem-path=whatever,hmem=on
> 
> - Your Recommendation: You proposed a cleaner approach using NUMA node configuration:
>     -numa node,nodeid=1,memdev=m1,spm=on

Oh my, I don't remember all the details from that discussion :)

I assume that any memory devices (DIMM/NVDIMM/virtio-mem) we would
cold/hotplug to such a NUMA node would not be indicated as spm, correct?

> 
> Project Context
> ===============
> To refresh your memory on the use case:
> - Objective: Pass `EFI_MEMORY_SP` (Special Purpose Memory) type memory from host to QEMU virtual machine
> - Application: Memory reserved for specific PCI devices (e.g., VFIO-PCI devices)
> - Guest Behavior: The SPM memory should be recognized by the guest OS and claimed by hmem-dax driver
> 
> Complete QEMU Configuration Example:
> -object memory-backend-ram,size=8G,id=m0
> -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G
> -numa node,nodeid=0,memdev=m0
> -numa node,nodeid=1,memdev=m1,spm=on  # <-- New approach based on your suggestion

The only alternative I could think of is gluing it to a memory device. For example,
have something like:

-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1 \
-device pc-dimm,id=sp0,memdev=m1,sp=true

But we would not want (and cannot easily) use DIMMs for that purpose.

> 
> New Patch Implementation
> ========================
> Following your recommendations, I have completely redesigned the implementation:
> 
> Key Changes:
> 1. Removed `hmem` option from `memory-backend-file`
> 2. Added `spm` (special-purpose) option to NUMA node configuration

That definitely sounds better to me: essentially "spm" would say: the boot memory assigned to this
node (through memdev=) will be indicated as EFI_MEMORY_SP.

> 
> I would appreciate your review of the new patch implementation. The design now follows your suggested approach of using NUMA node configuration rather than memory backend options, which should resolve the safety and scope issues we discussed.
> Thank you for your time and valuable guidance on this implementation.
> 
> Please note that I'm located in UTC+8 timezone, so there might be some delay in my responses to your emails due to the time difference. I appreciate your patience and understanding.

No worries :)

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] numa: add 'spm' option for special purpose memory
  2025-09-24 17:03 ` David Hildenbrand
@ 2025-09-25  7:39   ` Huang, FangSheng (Jerry)
  2025-09-25 11:11     ` Huang, FangSheng (Jerry)
  0 siblings, 1 reply; 9+ messages in thread
From: Huang, FangSheng (Jerry) @ 2025-09-25  7:39 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Zhigang.Luo, Lianjie.Shi, Igor Mammedov, Jonathan Cameron


Hi David,

Thank you for the quick response and for bringing Igor and Jonathan into 
the discussion!

On 9/25/2025 1:03 AM, David Hildenbrand wrote:
> On 24.09.25 12:33, fanhuang wrote:
>> Hi David,
> 
> Hi!
> 
> CCing Igor and Jonathan.
> 
>>
>> I hope this email finds you well. It's been several months since 
>> Zhigang last discussion about the Special Purpose Memory (SPM) 
>> implementation in QEMU with you, and I wanted to provide some 
>> background context before presenting the new patch based on your 
>> valuable suggestions.
>>
>> Previous Discussion Summary
>> ===========================
>> Back in December 2024, we had an extensive discussion regarding my 
>> original patch that added the `hmem` option to `memory-backend-file`. 
>> During that conversation, you raised several important concerns about 
>> the design approach:
>>
>> Original Approach (December 2024)
>> ----------------------------------
>> - Zhigang's patch: Added `hmem=on` option to `memory-backend-file`
>> - QEMU cmdline example:
>>    -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on
>>    -numa node,nodeid=1,memdev=m1
>>
>> Your Concerns and Suggestions
>> -----------------------------
>> You correctly identified some issues with the original approach:
>> - Configuration Safety: Users could create problematic configurations 
>> like:
>>     -object memory-backend-file,size=16G,id=unused,mem-path=whatever,hmem=on
>>
>> - Your Recommendation: You proposed a cleaner approach using NUMA node 
>> configuration:
>>     -numa node,nodeid=1,memdev=m1,spm=on
> 
> Oh my, I don't remember all the details from that discussion :)
> 
> I assume that any memory devices (DIMM/NVDIMM/virtio-mem) we would
> cold/hotplug to such a NUMA node would not be indicated as spm, correct?
>
Yes, that's absolutely correct. The `spm=on` option only affects the boot
memory that is assigned to the NUMA node through the `memdev=` parameter.
>>
>> Project Context
>> ===============
>> To refresh your memory on the use case:
>> - Objective: Pass `EFI_MEMORY_SP` (Special Purpose Memory) type memory 
>> from host to QEMU virtual machine
>> - Application: Memory reserved for specific PCI devices (e.g.,
>> VFIO-PCI devices)
>> - Guest Behavior: The SPM memory should be recognized by the guest OS 
>> and claimed by hmem-dax driver
>>
>> Complete QEMU Configuration Example:
>> -object memory-backend-ram,size=8G,id=m0
>> -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G
>> -numa node,nodeid=0,memdev=m0
>> -numa node,nodeid=1,memdev=m1,spm=on  # <-- New approach based on your 
>> suggestion
> 
> The only alternative I could think of is gluing it to a memory device. 
> For example,
> have something like:
> 
> -numa node,nodeid=0,memdev=m0 \
> -numa node,nodeid=1 \
> -device pc-dimm,id=sp0,memdev=m1,sp=true
> 
> But we would not want (and cannot easily) use DIMMs for that purpose.
> 
>>
>> New Patch Implementation
>> ========================
>> Following your recommendations, I have completely redesigned the 
>> implementation:
>>
>> Key Changes:
>> 1. Removed `hmem` option from `memory-backend-file`
>> 2. Added `spm` (special-purpose) option to NUMA node configuration
> 
> That definitely sounds better to me: essentially "spm" would say: the 
> boot memory assigned to this
> node (through memdev=) will be indicated as EFI_MEMORY_SP.
> 
Thanks, that's exactly how the implementation works. The `spm=on` option
ensures that when QEMU builds the EFI memory map, the memory region
corresponding to the specified `memdev` will be marked with the
EFI_MEMORY_SP attribute.
>>
>> I would appreciate your review of the new patch implementation. The 
>> design now follows your suggested approach of using NUMA node 
>> configuration rather than memory backend options, which should resolve 
>> the safety and scope issues we discussed.
>> Thank you for your time and valuable guidance on this implementation.
>>
>> Please note that I'm located in UTC+8 timezone, so there might be some 
>> delay in my responses to your emails due to the time difference. I 
>> appreciate your patience and understanding.
> 
> No worries :)
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] numa: add 'spm' option for special purpose memory
  2025-09-25  7:39   ` Huang, FangSheng (Jerry)
@ 2025-09-25 11:11     ` Huang, FangSheng (Jerry)
  0 siblings, 0 replies; 9+ messages in thread
From: Huang, FangSheng (Jerry) @ 2025-09-25 11:11 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Zhigang.Luo, Lianjie.Shi, Igor Mammedov, Jonathan Cameron

Hi David,

I need to update you on the SPM (Special Purpose Memory) implementation.
While working on the OVMF patch, I discovered an issue in the current QEMU
SPM patch: it creates overlapping E820 entries, which lead to memory
allocation conflicts in the OVMF/UEFI firmware.

I'm currently working on fixing this issue and have already implemented a
preliminary solution. I'll keep you updated on the progress and send the
updated patches once the fix is properly tested and validated.
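
The message above does not say where the overlap comes from. One plausible reading, offered here only as an assumption, is that the soft-reserved entry is added on top of a RAM entry that already covers the same range. The sketch below shows how such an overlap could be detected before adding an entry; every sketch_* name is hypothetical and none of this is QEMU code or the eventual fix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative type values; 0xEFFFFFFF is the value the patch defines. */
#define SKETCH_E820_RAM            1
#define SKETCH_E820_SOFT_RESERVED  0xEFFFFFFF

/* Simplified stand-in for an E820 table entry. */
struct sketch_e820_entry {
    uint64_t address;
    uint64_t length;
    uint32_t type;
};

/* True when the half-open ranges [address, address + length) intersect. */
static bool sketch_e820_overlaps(const struct sketch_e820_entry *a,
                                 const struct sketch_e820_entry *b)
{
    return a->address < b->address + b->length &&
           b->address < a->address + a->length;
}

/* Returns the index of the first entry overlapping 'candidate', or -1. */
static int sketch_e820_find_overlap(const struct sketch_e820_entry *table,
                                    size_t count,
                                    const struct sketch_e820_entry *candidate)
{
    for (size_t i = 0; i < count; i++) {
        if (sketch_e820_overlaps(&table[i], candidate)) {
            return (int)i;
        }
    }
    return -1;
}
```

With a table that holds a single RAM entry for all memory above 4 GiB, a soft-reserved candidate for the 16G node falls inside that entry, so the lookup reports a conflict. Under that assumption, a fix would have to split or shrink the RAM entry rather than simply append a new one.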

On 9/25/2025 3:39 PM, Huang, FangSheng (Jerry) wrote:
> 
> Hi David,
> 
> Thank you for the quick response and for bringing Igor and Jonathan into 
> the discussion!
> 
> On 9/25/2025 1:03 AM, David Hildenbrand wrote:
>> On 24.09.25 12:33, fanhuang wrote:
>>> Hi David,
>>
>> Hi!
>>
>> CCing Igor and Jonathan.
>>
>>>
>>> I hope this email finds you well. It's been several months since 
>>> Zhigang last discussion about the Special Purpose Memory (SPM) 
>>> implementation in QEMU with you, and I wanted to provide some 
>>> background context before presenting the new patch based on your 
>>> valuable suggestions.
>>>
>>> Previous Discussion Summary
>>> ===========================
>>> Back in December 2024, we had an extensive discussion regarding my 
>>> original patch that added the `hmem` option to `memory-backend-file`. 
>>> During that conversation, you raised several important concerns about 
>>> the design approach:
>>>
>>> Original Approach (December 2024)
>>> ----------------------------------
>>> - Zhigang's patch: Added `hmem=on` option to `memory-backend-file`
>>> - QEMU cmdline example:
>>>    -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on
>>>    -numa node,nodeid=1,memdev=m1
>>>
>>> Your Concerns and Suggestions
>>> -----------------------------
>>> You correctly identified some issues with the original approach:
>>> - Configuration Safety: Users could create problematic configurations 
>>> like:
>>>     -object memory-backend-file,size=16G,id=unused,mem-path=whatever,hmem=on
>>>
>>> - Your Recommendation: You proposed a cleaner approach using NUMA 
>>> node configuration:
>>>     -numa node,nodeid=1,memdev=m1,spm=on
>>
>> Oh my, I don't remember all the details from that discussion :)
>>
>> I assume that any memory devices (DIMM/NVDIMM/virtio-mem) we would
>> cold/hotplug to such a NUMA node would not be indicated as spm, correct?
>>
> Yes, that's absolutely correct. The `spm=on` option only affects the boot
> memory that is assigned to the NUMA node through the `memdev=` parameter.
>>>
>>> Project Context
>>> ===============
>>> To refresh your memory on the use case:
>>> - Objective: Pass `EFI_MEMORY_SP` (Special Purpose Memory) type 
>>> memory from host to QEMU virtual machine
>>> - Application: Memory reserved for specific PCI devices (e.g.,
>>> VFIO-PCI devices)
>>> - Guest Behavior: The SPM memory should be recognized by the guest OS 
>>> and claimed by hmem-dax driver
>>>
>>> Complete QEMU Configuration Example:
>>> -object memory-backend-ram,size=8G,id=m0
>>> -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G
>>> -numa node,nodeid=0,memdev=m0
>>> -numa node,nodeid=1,memdev=m1,spm=on  # <-- New approach based on 
>>> your suggestion
>>
>> The only alternative I could think of is gluing it to a memory device. 
>> For example,
>> have something like:
>>
>> -numa node,nodeid=0,memdev=m0 \
>> -numa node,nodeid=1 \
>> -device pc-dimm,id=sp0,memdev=m1,sp=true
>>
>> But we would not want (and cannot easily) use DIMMs for that purpose.
>>
>>>
>>> New Patch Implementation
>>> ========================
>>> Following your recommendations, I have completely redesigned the 
>>> implementation:
>>>
>>> Key Changes:
>>> 1. Removed `hmem` option from `memory-backend-file`
>>> 2. Added `spm` (special-purpose) option to NUMA node configuration
>>
>> That definitely sounds better to me: essentially "spm" would say: the 
>> boot memory assigned to this
>> node (through memdev=) will be indicated as EFI_MEMORY_SP.
>>
> Thanks, that's exactly how the implementation works. The `spm=on` option
> ensures that when QEMU builds the EFI memory map, the memory region
> corresponding to the specified `memdev` will be marked with the
> EFI_MEMORY_SP attribute.
>>>
>>> I would appreciate your review of the new patch implementation. The 
>>> design now follows your suggested approach of using NUMA node 
>>> configuration rather than memory backend options, which should 
>>> resolve the safety and scope issues we discussed.
>>> Thank you for your time and valuable guidance on this implementation.
>>>
>>> Please note that I'm located in UTC+8 timezone, so there might be 
>>> some delay in my responses to your emails due to the time difference. 
>>> I appreciate your patience and understanding.
>>
>> No worries :)
>>
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] numa: add 'spm' option for special purpose memory
  2025-09-24 10:33 [PATCH] numa: add 'spm' option for special purpose memory fanhuang
  2025-09-24 10:33 ` fanhuang
  2025-09-24 17:03 ` David Hildenbrand
@ 2025-10-02 14:11 ` Igor Mammedov
  2025-10-02 14:19   ` David Hildenbrand
  2 siblings, 1 reply; 9+ messages in thread
From: Igor Mammedov @ 2025-10-02 14:11 UTC (permalink / raw)
  To: fanhuang; +Cc: qemu-devel, david, Zhigang.Luo, Lianjie.Shi

On Wed, 24 Sep 2025 18:33:23 +0800
fanhuang <FangSheng.Huang@amd.com> wrote:

> Hi David,
> 
> I hope this email finds you well. It's been several months since Zhigang last discussion about the Special Purpose Memory (SPM) implementation in QEMU with you, and I wanted to provide some background context before presenting the new patch based on your valuable suggestions.
> 
> Previous Discussion Summary
> ===========================
> Back in December 2024, we had an extensive discussion regarding my original patch that added the `hmem` option to `memory-backend-file`. During that conversation, you raised several important concerns about the design approach:
> 
> Original Approach (December 2024)
> ----------------------------------
> - Zhigang's patch: Added `hmem=on` option to `memory-backend-file`
> - QEMU cmdline example:
>   -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on
>   -numa node,nodeid=1,memdev=m1
> 
> Your Concerns and Suggestions
> -----------------------------
> You correctly identified some issues with the original approach:
> - Configuration Safety: Users could create problematic configurations like:
>    -object memory-backend-file,size=16G,id=unused,mem-path=whatever,hmem=on
> 
> - Your Recommendation: You proposed a cleaner approach using NUMA node configuration:
>    -numa node,nodeid=1,memdev=m1,spm=on

that seems to me a bit backwards,
aka it's just a particular case where node would have SPM memory only,
which (spm) is not a property of numa node, but rather of memory device attached to it.
 
> Project Context
> ===============
> To refresh your memory on the use case:
> - Objective: Pass `EFI_MEMORY_SP` (Special Purpose Memory) type memory from host to QEMU virtual machine
> - Application: Memory reserved for specific PCI devices (e.g., VFIO-PCI devices)

could you elaborate on this some more /maybe with examples (assume I know nothing about it)?

> - Guest Behavior: The SPM memory should be recognized by the guest OS and claimed by hmem-dax driver
> 
> Complete QEMU Configuration Example:
> -object memory-backend-ram,size=8G,id=m0
> -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G
> -numa node,nodeid=0,memdev=m0
> -numa node,nodeid=1,memdev=m1,spm=on  # <-- New approach based on your suggestion
> 
> New Patch Implementation
> ========================
> Following your recommendations, I have completely redesigned the implementation:
> 
> Key Changes:
> 1. Removed `hmem` option from `memory-backend-file`
> 2. Added `spm` (special-purpose) option to NUMA node configuration
> 
> I would appreciate your review of the new patch implementation. The design now follows your suggested approach of using NUMA node configuration rather than memory backend options, which should resolve the safety and scope issues we discussed.
> Thank you for your time and valuable guidance on this implementation.
> 
> Please note that I'm located in UTC+8 timezone, so there might be some delay in my responses to your emails due to the time difference. I appreciate your patience and understanding.
> 
> Best regards,
> FangSheng Huang
> 
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] numa: add 'spm' option for special purpose memory
  2025-10-02 14:11 ` Igor Mammedov
@ 2025-10-02 14:19   ` David Hildenbrand
  2025-10-02 14:59     ` Igor Mammedov
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-10-02 14:19 UTC (permalink / raw)
  To: Igor Mammedov, fanhuang; +Cc: qemu-devel, Zhigang.Luo, Lianjie.Shi

On 02.10.25 16:11, Igor Mammedov wrote:
> On Wed, 24 Sep 2025 18:33:23 +0800
> fanhuang <FangSheng.Huang@amd.com> wrote:
> 
>> Hi David,
>>
>> I hope this email finds you well. It's been several months since Zhigang last discussion about the Special Purpose Memory (SPM) implementation in QEMU with you, and I wanted to provide some background context before presenting the new patch based on your valuable suggestions.
>>
>> Previous Discussion Summary
>> ===========================
>> Back in December 2024, we had an extensive discussion regarding my original patch that added the `hmem` option to `memory-backend-file`. During that conversation, you raised several important concerns about the design approach:
>>
>> Original Approach (December 2024)
>> ----------------------------------
>> - Zhigang's patch: Added `hmem=on` option to `memory-backend-file`
>> - QEMU cmdline example:
>>    -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on
>>    -numa node,nodeid=1,memdev=m1
>>
>> Your Concerns and Suggestions
>> -----------------------------
>> You correctly identified some issues with the original approach:
>> - Configuration Safety: Users could create problematic configurations like:
>>     -object memory-backend-file,size=16G,id=unused,mem-path=whatever,hmem=on
>>
>> - Your Recommendation: You proposed a cleaner approach using NUMA node configuration:
>>     -numa node,nodeid=1,memdev=m1,spm=on
> 
> that seems to me a bit backwards,
> aka it's just a particular case where node would have SPM memory only,
> which (spm) is not a property of numa node, but rather of memory device attached to it.

The problem is that boot memory is not modeled as a memory device.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] numa: add 'spm' option for special purpose memory
  2025-10-02 14:19   ` David Hildenbrand
@ 2025-10-02 14:59     ` Igor Mammedov
  2025-10-02 15:51       ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Igor Mammedov @ 2025-10-02 14:59 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: fanhuang, qemu-devel, Zhigang.Luo, Lianjie.Shi

On Thu, 2 Oct 2025 16:19:00 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 02.10.25 16:11, Igor Mammedov wrote:
> > On Wed, 24 Sep 2025 18:33:23 +0800
> > fanhuang <FangSheng.Huang@amd.com> wrote:
> >   
> >> Hi David,
> >>
> >> I hope this email finds you well. It's been several months since Zhigang last discussion about the Special Purpose Memory (SPM) implementation in QEMU with you, and I wanted to provide some background context before presenting the new patch based on your valuable suggestions.
> >>
> >> Previous Discussion Summary
> >> ===========================
> >> Back in December 2024, we had an extensive discussion regarding my original patch that added the `hmem` option to `memory-backend-file`. During that conversation, you raised several important concerns about the design approach:
> >>
> >> Original Approach (December 2024)
> >> ----------------------------------
> >> - Zhigang's patch: Added `hmem=on` option to `memory-backend-file`
> >> - QEMU cmdline example:
> >>    -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on
> >>    -numa node,nodeid=1,memdev=m1
> >>
> >> Your Concerns and Suggestions
> >> -----------------------------
> >> You correctly identified some issues with the original approach:
> >> - Configuration Safety: Users could create problematic configurations like:
> >>     -object memory-backend-file,size=16G,id=unused,mem-path=whatever,hmem=on
> >>
> >> - Your Recommendation: You proposed a cleaner approach using NUMA node configuration:
> >>     -numa node,nodeid=1,memdev=m1,spm=on  
> > 
> > that seems to me a bit backwards,
> > aka it's just a particular case where node would have SPM memory only,
> > which (spm) is not a property of numa node, but rather of memory device attached to it.  
> 
> The problem is that boot memory is not modeled as a memory device.

That's a historical abomination we currently have.
Question is: does it have to be boot memory, and why?

Also, that's why I've asked for use cases / example devices that would make use of this feature
(VFIO was mentioned here).




* Re: [PATCH] numa: add 'spm' option for special purpose memory
  2025-10-02 14:59     ` Igor Mammedov
@ 2025-10-02 15:51       ` David Hildenbrand
  0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2025-10-02 15:51 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: fanhuang, qemu-devel, Zhigang.Luo, Lianjie.Shi

On 02.10.25 16:59, Igor Mammedov wrote:
> On Thu, 2 Oct 2025 16:19:00 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 02.10.25 16:11, Igor Mammedov wrote:
>>> On Wed, 24 Sep 2025 18:33:23 +0800
>>> fanhuang <FangSheng.Huang@amd.com> wrote:
>>>    
>>>> Hi David,
>>>>
>>>> I hope this email finds you well. It's been several months since Zhigang last discussion about the Special Purpose Memory (SPM) implementation in QEMU with you, and I wanted to provide some background context before presenting the new patch based on your valuable suggestions.
>>>>
>>>> Previous Discussion Summary
>>>> ===========================
>>>> Back in December 2024, we had an extensive discussion regarding my original patch that added the `hmem` option to `memory-backend-file`. During that conversation, you raised several important concerns about the design approach:
>>>>
>>>> Original Approach (December 2024)
>>>> ----------------------------------
>>>> - Zhigang's patch: Added `hmem=on` option to `memory-backend-file`
>>>> - QEMU cmdline example:
>>>>     -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on
>>>>     -numa node,nodeid=1,memdev=m1
>>>>
>>>> Your Concerns and Suggestions
>>>> -----------------------------
>>>> You correctly identified some issues with the original approach:
>>>> - Configuration Safety: Users could create problematic configurations like:
>>>>      -object memory-backend-file,size=16G,id=unused,mem-path=whatever,hmem=on
>>>>
>>>> - Your Recommendation: You proposed a cleaner approach using NUMA node configuration:
>>>>      -numa node,nodeid=1,memdev=m1,spm=on
>>>
>>> that seems to me a bit backwards,
>>> aka it's just a particular case where node would have SPM memory only,
>>> which (spm) is not a property of numa node, but rather of memory device attached to it.
>>
>> The problem is that boot memory is not modeled as a memory device.
> 
> That's a historical abomination we currently have.

Right, and that's what the memdev= parameter for the node is all about.

> Question is: does it have to be boot memory, and why?

I wondered the same in my reply: I'm afraid it cannot be a DIMM/NVDIMM, because 
these ranges are only described in E820 as "hotplug area".

I think it must be something that's present in the memory map right from 
the start, where the OS would identify it as SP and treat it accordingly.
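
A minimal sketch of what "present in the memory map right from the start" 
could look like. This is not QEMU's actual code: the struct and function 
names are hypothetical, and the soft-reserved type value 0xefffffff is an 
assumption borrowed from Linux's internal E820_TYPE_SOFT_RESERVED. The 
node layout mirrors the 16G spm=on example quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RAM = 1 follows the usual E820 convention. The soft-reserved value is
 * an assumption: it mirrors what Linux uses internally, not necessarily
 * what QEMU would expose. */
#define SKETCH_E820_RAM            1u
#define SKETCH_E820_SOFT_RESERVED  0xefffffffu

/* Hypothetical boot memory map entry. */
struct sketch_e820_entry {
    uint64_t addr;
    uint64_t len;
    uint32_t type;
};

/* Hypothetical per-node boot memory description (-numa node,...,spm=on). */
struct sketch_node {
    uint64_t base;
    uint64_t size;
    int spm;            /* non-zero if the node was configured with spm=on */
};

/* Emit one map entry per node. SPM nodes are typed as soft reserved at
 * boot, so the guest can identify the range as SP from the start instead
 * of discovering it later via hotplug. Returns the number of entries. */
static size_t sketch_build_map(const struct sketch_node *nodes, size_t n,
                               struct sketch_e820_entry *out)
{
    assert(nodes != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        out[i].addr = nodes[i].base;
        out[i].len  = nodes[i].size;
        out[i].type = nodes[i].spm ? SKETCH_E820_SOFT_RESERVED
                                   : SKETCH_E820_RAM;
    }
    return n;
}
```

With one regular node and one 16G node marked spm=on, the second entry 
would carry the soft-reserved type while the first stays plain RAM.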

> 
> Also, that's why I've asked for use cases / example devices that would make use of this feature
> (VFIO was mentioned here).

Yes, good point.

-- 
Cheers

David / dhildenb




end of thread, other threads:[~2025-10-02 15:54 UTC | newest]

Thread overview: 9+ messages
2025-09-24 10:33 [PATCH] numa: add 'spm' option for special purpose memory fanhuang
2025-09-24 10:33 ` fanhuang
2025-09-24 17:03 ` David Hildenbrand
2025-09-25  7:39   ` Huang, FangSheng (Jerry)
2025-09-25 11:11     ` Huang, FangSheng (Jerry)
2025-10-02 14:11 ` Igor Mammedov
2025-10-02 14:19   ` David Hildenbrand
2025-10-02 14:59     ` Igor Mammedov
2025-10-02 15:51       ` David Hildenbrand
