* [PATCH v2] numa: add 'spm' option for Specific Purpose Memory
@ 2025-10-20  9:07 fanhuang
From: fanhuang @ 2025-10-20  9:07 UTC (permalink / raw)
  To: qemu-devel, david, imammedo; +Cc: Zhigang.Luo, Lianjie.Shi, FangSheng.Huang

Hi David and Igor,

I apologize for the delayed response. Thank you very much for your thoughtful
questions and feedback on the SPM patch series.

Before addressing your questions, I'd like to briefly mention what the new
QEMU patch series additionally resolves:

1. **Corrected SPM terminology**: Fixed the description error from the previous
   version. SPM stands for "Specific Purpose Memory" (not "special purpose
   memory" as previously stated).

2. **Fixed overlapping E820 entries**: Reworked the implementation so that
   E820 RAM entries and E820_SOFT_RESERVED regions no longer overlap in the
   map handed to the firmware.

   The previous implementation created overlapping E820 entries by first adding
   a large E820_RAM entry covering the entire above-4GB memory range, then
   adding E820_SOFT_RESERVED entries for SPM regions that overlapped with the
   RAM entry. This violated the E820 specification and caused OVMF/UEFI
   firmware to receive conflicting memory type information for the same
   physical addresses.

   The new implementation processes SPM regions first to identify reserved
   areas, then adds RAM entries around the SPM regions, generating a clean,
   non-overlapping E820 map.
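
To make the splitting logic concrete, here is a minimal, self-contained
sketch of the idea (this is not the patch itself; `add_entry`, `build_e820`,
and the region values are illustrative only, with the addresses borrowed from
the example configuration further below):

```
/* Sketch of building a non-overlapping E820 map: emit SOFT_RESERVED
 * entries for the SPM regions first and fill the gaps with RAM entries.
 * Assumes SPM regions are sorted and lie inside the above-4GB RAM range. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct region { uint64_t start, len; };

static void add_entry(const char *type, uint64_t start, uint64_t len)
{
    if (len) {
        printf("%-14s 0x%010" PRIx64 "-0x%010" PRIx64 "\n",
               type, start, start + len - 1);
    }
}

static void build_e820(uint64_t ram_start, uint64_t ram_len,
                       const struct region *spm, int nr_spm)
{
    uint64_t cursor = ram_start;

    for (int i = 0; i < nr_spm; i++) {
        /* RAM gap before this SPM region, if any. */
        add_entry("RAM", cursor, spm[i].start - cursor);
        add_entry("SOFT_RESERVED", spm[i].start, spm[i].len);
        cursor = spm[i].start + spm[i].len;
    }
    /* Remaining RAM after the last SPM region, if any. */
    add_entry("RAM", cursor, ram_start + ram_len - cursor);
}

int main(void)
{
    /* 6G of regular RAM above 4G followed by the 8G SPM node. */
    struct region spm[] = { { 0x280000000ULL, 0x200000000ULL } };

    build_e820(0x100000000ULL, 0x380000000ULL, spm, 1);
    return 0;
}
```

For that configuration the sketch prints the same two above-4GB entries that
appear in the guest's BIOS-e820 output shown below.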

Now, regarding your questions:

========================================================================
Why SPM Must Be Boot Memory
========================================================================

SPM cannot be implemented as hotplug memory (DIMM/NVDIMM).

The primary goal of SPM is to ensure that the memory is managed by guest
device drivers, not by the guest OS. This requires boot-time discovery
for two key reasons:

1. SPM regions must appear in the E820 memory map as `E820_SOFT_RESERVED`
   during firmware initialization, before the OS starts.

2. Hotplug memory is integrated into kernel memory management, making
   it unavailable for device-specific use.

========================================================================
Detailed Use Case
========================================================================

**Background**
Unified Address Space for CPU and GPU:

Modern heterogeneous computing architectures implement a coherent,
unified address space shared between CPUs and GPUs. Unlike traditional
discrete GPU designs with a dedicated frame buffer, these accelerators
connect the CPU and GPU through high-speed interconnects (e.g., XGMI):

- **HBM (High Bandwidth Memory)**: Physically attached to each GPU,
  reported to the OS as driver-managed system memory

- **XGMI (eXternal Global Memory Interconnect, aka. Infinity Fabric)**:
  Maintains data coherence between CPU and GPU, enabling direct CPU
  access to GPU HBM without data copying

In this architecture, GPU HBM is reported as system memory to the OS,
but it needs to be managed exclusively by the GPU driver rather than
the general OS memory allocator. This driver-managed memory provides
optimal performance for GPU workloads while enabling coherent CPU-GPU
data sharing over XGMI. This is where SPM (Specific Purpose
Memory) becomes essential.

**Virtualization Scenario**

In virtualization, the hypervisor needs to expose this memory topology
to guest VMs while maintaining the same driver-managed vs. OS-managed
distinction.

In the example below, `0000:c1:02.0` is a GPU Virtual Function (VF)
device that requires dedicated memory allocation. The host driver
obtains the VF's HBM information and creates a user-space device node
for each VF (for example `/dev/vf_hbm_0000.c1.02.0`) that provides an
mmap() interface, allowing QEMU to allocate memory from the VF's HBM
(a sketch of this mapping follows the configuration below). By marking
this memory as SPM, it is reserved exclusively for the GPU driver in
the guest rather than being available for general OS allocation.

**QEMU Configuration**:
```
-object memory-backend-ram,size=8G,id=m0 \
-numa node,nodeid=0,memdev=m0 \
-object memory-backend-file,size=8G,id=m1,mem-path=/dev/vf_hbm_0000.c1.02.0,prealloc=on,align=16M \
-numa node,nodeid=1,memdev=m1,spm=on \
-device vfio-pci,host=0000:c1:02.0,bus=pcie.0
```
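
For node 1, the memory-backend-file object above boils down to opening the
per-VF device node and mmap()ing it, and the resulting mapping is what the
guest sees as that node's RAM. A rough, self-contained sketch of that step
(not QEMU source code; the device path is the hypothetical node from this
example, and error handling is kept minimal):

```
/* Illustration only: map the (hypothetical) per-VF HBM device node,
 * much as QEMU's memory-backend-file does internally for node 1 above. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t size = 8ULL << 30;   /* 8G, matching size=8G above */
    int fd = open("/dev/vf_hbm_0000.c1.02.0", O_RDWR);

    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* MAP_SHARED so guest writes go straight to the VF's HBM. */
    void *hbm = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (hbm == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    /* ... QEMU would register 'hbm' as the RAM block backing NUMA node 1 ... */

    munmap(hbm, size);
    close(fd);
    return 0;
}
```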

**BIOS-e820**

The BIOS-provided physical RAM map, in which 0x280000000-0x47fffffff
is reported as soft reserved:

```
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000027fffffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000280000000-0x000000047fffffff] soft reserved
```

**Guest OS**

The guest OS sees the 8GB range (0x280000000-0x47fffffff) as "Soft
Reserved" memory that only the GPU driver can use, preventing conflicts
with general OS memory allocation:

```
100000000-27fffffff : System RAM
  1b7a00000-1b8ffffff : Kernel code
  1b9000000-1b9825fff : Kernel rodata
  1b9a00000-1b9e775bf : Kernel data
  1ba397000-1ba7fffff : Kernel bss
280000000-47fffffff : Soft Reserved
  280000000-47fffffff : dax0.0
    280000000-47fffffff : System RAM (kmem)
```

========================================================================

I hope this addresses your concerns. Please let me know if you need any
further clarification or have additional questions.

Best regards,
Jerry Huang


