From: David Hildenbrand <david@redhat.com>
To: fanhuang <FangSheng.Huang@amd.com>,
	qemu-devel@nongnu.org, imammedo@redhat.com
Cc: Zhigang.Luo@amd.com, Lianjie.Shi@amd.com,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>
Subject: Re: [PATCH v2] numa: add 'spm' option for Specific Purpose Memory
Date: Mon, 20 Oct 2025 22:10:21 +0200	[thread overview]
Message-ID: <c35a21dd-40e5-4fa5-87c4-18ebe8ca73ca@redhat.com> (raw)
In-Reply-To: <20251020090701.4036748-1-FangSheng.Huang@amd.com>

On 20.10.25 11:07, fanhuang wrote:
> Hi David and Igor,
> 
> I apologize for the delayed response. Thank you very much for your thoughtful
> questions and feedback on the SPM patch series.
> 
> Before addressing your questions, I'd like to briefly mention what the new
> QEMU patch series additionally resolves:
> 
> 1. **Corrected SPM terminology**: Fixed the description error from the previous
>     version. SPM stands for "Specific Purpose Memory" (not "special
>     purpose memory" as previously stated).
> 
> 2. **Fixed overlapping E820 entries**: Updated the implementation to properly
>     handle overlapping E820 RAM entries before adding E820_SOFT_RESERVED
>     regions.
> 
>     The previous implementation created overlapping E820 entries by first adding
>     a large E820_RAM entry covering the entire above-4GB memory range, then
>     adding E820_SOFT_RESERVED entries for SPM regions that overlapped with the
>     RAM entry. This violated the E820 specification and caused OVMF/UEFI
>     firmware to receive conflicting memory type information for the same
>     physical addresses.
> 
>     The new implementation processes SPM regions first to identify reserved
>     areas, then adds RAM entries around the SPM regions, generating a clean,
>     non-overlapping E820 map (a sketch follows below).
> 
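> To make that ordering concrete, here is a minimal sketch (not the
> patch code itself; the struct and function names are made up for
> illustration). It assumes QEMU's e820_add_entry() helper, the
> E820_SOFT_RESERVED type constant this series introduces, and a
> sorted, non-overlapping list of SPM regions within the range:
> 
> ```c
> /* Sketch only: emit a non-overlapping e820 map for above-4GB memory
>  * by carving plain-RAM entries around the (pre-sorted) SPM regions. */
> typedef struct SpmRegion {
>     uint64_t base;
>     uint64_t size;
> } SpmRegion;
> 
> static void build_e820_above_4g(uint64_t ram_base, uint64_t ram_size,
>                                 const SpmRegion *spm, int nr_spm)
> {
>     uint64_t cur = ram_base;
>     uint64_t end = ram_base + ram_size;
> 
>     for (int i = 0; i < nr_spm; i++) {
>         if (spm[i].base > cur) {
>             /* plain RAM gap before this SPM region */
>             e820_add_entry(cur, spm[i].base - cur, E820_RAM);
>         }
>         /* the SPM region itself is soft-reserved, never plain RAM */
>         e820_add_entry(spm[i].base, spm[i].size, E820_SOFT_RESERVED);
>         cur = spm[i].base + spm[i].size;
>     }
>     if (cur < end) {
>         e820_add_entry(cur, end - cur, E820_RAM); /* trailing RAM */
>     }
> }
> ```
> 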
> Now, regarding your questions:
> 
> ========================================================================
> Why SPM Must Be Boot Memory
> ========================================================================
> 
> SPM cannot be implemented as hotplug memory (DIMM/NVDIMM) because:
> 
> The primary goal of SPM is to ensure that memory is managed by guest
> device drivers, not the guest OS. This requires boot-time discovery
> for two key reasons (a guest-side sketch follows the list):
> 
> 1. SPM regions must appear in the E820 memory map as `E820_SOFT_RESERVED`
>     during firmware initialization, before the OS starts.
> 
> 2. Hotplug memory is integrated into kernel memory management, making
>     it unavailable for device-specific use.
> 
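> As a quick guest-side illustration of that distinction (assuming a
> Linux guest, which labels such ranges "Soft Reserved" in /proc/iomem),
> a tiny scanner like the one below lists the SPM ranges; hotplugged
> DIMMs never show up this way, since they are onlined into the kernel
> page allocator instead:
> 
> ```c
> /* Sketch: print the "Soft Reserved" ranges a Linux guest derives
>  * from the firmware memory map at boot. */
> #include <stdio.h>
> #include <string.h>
> 
> int main(void)
> {
>     char line[256];
>     FILE *f = fopen("/proc/iomem", "r"); /* root shows real addresses */
> 
>     if (!f) {
>         perror("fopen");
>         return 1;
>     }
>     while (fgets(line, sizeof(line), f)) {
>         if (strstr(line, "Soft Reserved")) {
>             fputs(line, stdout);
>         }
>     }
>     fclose(f);
>     return 0;
> }
> ```
> 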
> ========================================================================
> Detailed Use Case
> ========================================================================
> 
> **Background**
> Unified Address Space for CPU and GPU:
> 
> Modern heterogeneous computing architectures implement a coherent,
> unified address space shared between CPUs and GPUs. Unlike traditional
> discrete GPU designs with a dedicated frame buffer, these accelerators
> connect CPU and GPU through high-speed interconnects (e.g., XGMI):
> 
> - **HBM (High Bandwidth Memory)**: Physically attached to each GPU,
>    reported to the OS as driver-managed system memory
> 
> - **XGMI (eXternal Global Memory Interconnect, a.k.a. Infinity Fabric)**:
>    Maintains data coherence between CPU and GPU, enabling direct CPU
>    access to GPU HBM without data copying
> 
> In this architecture, GPU HBM is reported as system memory to the OS,
> but it needs to be managed exclusively by the GPU driver rather than
> the general OS memory allocator. This driver-managed memory provides
> optimal performance for GPU workloads while enabling coherent CPU-GPU
> data sharing through XGMI. This is where SPM (Specific Purpose
> Memory) becomes essential.
> 
> **Virtualization Scenario**
> 
> In virtualization, the hypervisor needs to expose this memory topology
> to guest VMs while maintaining the same driver-managed vs. OS-managed
> distinction.
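> 
> As a concrete (hypothetical) usage example, a guest with one SPM node
> could be launched roughly like this; the exact option spelling is
> whatever this series settles on, and the sizes and IDs are purely
> illustrative:
> 
> ```
> qemu-system-x86_64 -m 64G \
>   -object memory-backend-ram,id=m0,size=32G \
>   -object memory-backend-ram,id=m1,size=32G \
>   -numa node,nodeid=0,memdev=m0 \
>   -numa node,nodeid=1,memdev=m1,spm=on
> ```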

Just wondering, could device hotplug in that model ever work? I guess we 
wouldn't expose the memory at all in e820 (after all, it gets hotplugged 
later) and instead the device driver in the guest would have to 
detect+hotplug that memory.

But that sounds weird, because the device driver in the VM shouldn't do 
something virt-specific.

Which raises the question: how is device hotplug of such GPUs handled on
bare metal? Or does it simply not work? :)

-- 
Cheers

David / dhildenb



Thread overview: 12+ messages
2025-10-20  9:07 [PATCH v2] numa: add 'spm' option for Specific Purpose Memory fanhuang
2025-10-20  9:07 ` fanhuang
2025-11-03 12:32   ` David Hildenbrand
2025-11-04  8:00     ` Huang, FangSheng (Jerry)
2025-10-20 10:15 ` Jonathan Cameron via
2025-10-20 20:03   ` David Hildenbrand
2025-10-22 10:19     ` Huang, FangSheng (Jerry)
2025-10-20 20:10 ` David Hildenbrand [this message]
2025-10-22 10:09   ` Huang, FangSheng (Jerry)
2025-10-22 10:28     ` David Hildenbrand
2025-11-03  3:01       ` Huang, FangSheng (Jerry)
2025-11-03 12:36         ` David Hildenbrand (Red Hat)
