From: "Huang, FangSheng (Jerry)" <FangSheng.Huang@amd.com>
To: David Hildenbrand <david@redhat.com>, <qemu-devel@nongnu.org>,
	<imammedo@redhat.com>
Cc: <Zhigang.Luo@amd.com>, <Lianjie.Shi@amd.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>
Subject: Re: [PATCH v2] numa: add 'spm' option for Specific Purpose Memory
Date: Mon, 3 Nov 2025 11:01:25 +0800	[thread overview]
Message-ID: <1fc33dfc-ae73-4d23-a21a-a3a5ed480dd1@amd.com> (raw)
In-Reply-To: <eb1b524d-3a8c-481b-85eb-6697f5ee332b@redhat.com>

Hi David,

I hope this email finds you well. I wanted to follow up on the SPM
patch series we discussed back in October.

I'm reaching out to check on the current status and see if there's
anything else I should address or any additional information I can
provide.

Thank you for your time and guidance on this!

Best regards,
Jerry Huang

On 10/22/2025 6:28 PM, David Hildenbrand wrote:
> On 22.10.25 12:09, Huang, FangSheng (Jerry) wrote:
>>
>>
>> On 10/21/2025 4:10 AM, David Hildenbrand wrote:
>>> On 20.10.25 11:07, fanhuang wrote:
>>>> Hi David and Igor,
>>>>
>>>> I apologize for the delayed response. Thank you very much for your
>>>> thoughtful questions and feedback on the SPM patch series.
>>>>
>>>> Before addressing your questions, I'd like to briefly mention what the
>>>> new QEMU patch series additionally resolves:
>>>>
>>>> 1. **Corrected SPM terminology**: Fixed the description error from the
>>>>    previous version. The correct acronym is "Specific Purpose Memory"
>>>>    (not "special purpose memory" as previously stated).
>>>>
>>>> 2. **Fixed overlapping E820 entries**: Updated the implementation to
>>>>    properly handle overlapping E820 RAM entries before adding
>>>>    E820_SOFT_RESERVED regions.
>>>>
>>>>    The previous implementation created overlapping E820 entries by first
>>>>    adding a large E820_RAM entry covering the entire above-4GB memory
>>>>    range, then adding E820_SOFT_RESERVED entries for SPM regions that
>>>>    overlapped with the RAM entry. This violated the E820 specification
>>>>    and caused OVMF/UEFI firmware to receive conflicting memory type
>>>>    information for the same physical addresses.
>>>>
>>>>    The new implementation processes SPM regions first to identify
>>>>    reserved areas, then adds RAM entries around the SPM regions,
>>>>    generating a clean, non-overlapping E820 map.
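>>>>
>>>>    For illustration, a minimal sketch of that carving step (the
>>>>    SpmRegion type and add_ram_around_spm() are made-up names for this
>>>>    sketch; E820_SOFT_RESERVED is the type the patch introduces, and
>>>>    e820_add_entry() is QEMU's existing e820 helper):
>>>>
>>>>        #include <stdint.h>
>>>>        /* E820_RAM, E820_SOFT_RESERVED and e820_add_entry() are assumed
>>>>         * to come from QEMU's e820 helpers. */
>>>>
>>>>        typedef struct SpmRegion {
>>>>            uint64_t start;   /* inclusive */
>>>>            uint64_t end;     /* exclusive */
>>>>        } SpmRegion;
>>>>
>>>>        /*
>>>>         * Emit non-overlapping E820 entries for [start, end):
>>>>         * RAM everywhere except the SPM regions, which become
>>>>         * E820_SOFT_RESERVED. 'spm' must be sorted by start address,
>>>>         * non-overlapping, and fully contained in [start, end).
>>>>         */
>>>>        static void add_ram_around_spm(uint64_t start, uint64_t end,
>>>>                                       const SpmRegion *spm, int nr_spm)
>>>>        {
>>>>            uint64_t cur = start;
>>>>
>>>>            for (int i = 0; i < nr_spm; i++) {
>>>>                if (cur < spm[i].start) {
>>>>                    /* plain RAM up to the next SPM region */
>>>>                    e820_add_entry(cur, spm[i].start - cur, E820_RAM);
>>>>                }
>>>>                /* the SPM region itself is soft-reserved, never RAM */
>>>>                e820_add_entry(spm[i].start, spm[i].end - spm[i].start,
>>>>                               E820_SOFT_RESERVED);
>>>>                cur = spm[i].end;
>>>>            }
>>>>            if (cur < end) {
>>>>                /* remaining RAM after the last SPM region */
>>>>                e820_add_entry(cur, end - cur, E820_RAM);
>>>>            }
>>>>        }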
>>>>
>>>> Now, regarding your questions:
>>>>
>>>> ========================================================================
>>>> Why SPM Must Be Boot Memory
>>>> ========================================================================
>>>>
>>>> SPM cannot be implemented as hotplug memory (DIMM/NVDIMM) because:
>>>>
>>>> The primary goal of SPM is to ensure that memory is managed by guest
>>>> device drivers, not the guest OS. This requires boot-time discovery
>>>> for two key reasons:
>>>>
>>>> 1. SPM regions must appear in the E820 memory map as `E820_SOFT_RESERVED`
>>>>    during firmware initialization, before the OS starts.
>>>>
>>>> 2. Hotplug memory is integrated into kernel memory management, making
>>>>      it unavailable for device-specific use.
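>>>>
>>>> Both points assume the SPM range is declared when the VM is created.
>>>> As a usage sketch only (the 'spm=on' spelling below is assumed; the
>>>> exact property name is whatever the patch documents), an SPM node
>>>> would look something like:
>>>>
>>>>     # Illustrative invocation; the 'spm' property spelling is assumed.
>>>>     qemu-system-x86_64 -machine q35 -m 24G \
>>>>         -object memory-backend-ram,id=m0,size=8G \
>>>>         -numa node,nodeid=0,memdev=m0 \
>>>>         -object memory-backend-ram,id=m1,size=16G \
>>>>         -numa node,nodeid=1,memdev=m1,spm=on
>>>>
>>>> With the region declared as boot memory like this, firmware can mark
>>>> the node-1 range E820_SOFT_RESERVED before the guest OS ever runs.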
>>>>
>>>> ========================================================================
>>>> Detailed Use Case
>>>> ========================================================================
>>>>
>>>> **Background**
>>>> Unified Address Space for CPU and GPU:
>>>>
>>>> Modern heterogeneous computing architectures implement a coherent and
>>>> unified address space shared between CPUs and GPUs. Unlike traditional
>>>> discrete GPU designs with dedicated frame buffers, these accelerators
>>>> connect CPU and GPU through high-speed interconnects (e.g., XGMI):
>>>>
>>>> - **HBM (High Bandwidth Memory)**: Physically attached to each GPU,
>>>>     reported to the OS as driver-managed system memory
>>>>
>>>> - **XGMI (eXternal Global Memory Interconnect, a.k.a. Infinity Fabric)**:
>>>>     Maintains data coherence between CPU and GPU, enabling direct CPU
>>>>     access to GPU HBM without data copying
>>>>
>>>> In this architecture, GPU HBM is reported as system memory to the OS,
>>>> but it needs to be managed exclusively by the GPU driver rather than
>>>> the general OS memory allocator. This driver-managed memory provides
>>>> optimal performance for GPU workloads while enabling coherent CPU-GPU
>>>> data sharing through the XGMI. This is where SPM (Specific Purpose
>>>> Memory) becomes essential.
>>>>
>>>> **Virtualization Scenario**
>>>>
>>>> In virtualization, the hypervisor needs to expose this memory topology to
>>>> guest VMs while maintaining the same driver-managed vs OS-managed
>>>> distinction.
>>>
>>> Just wondering, could device hotplug in that model ever work? I guess we
>>> wouldn't expose the memory at all in e820 (after all, it gets hotplugged
>>> later) and instead the device driver in the guest would have to
>>> detect+hotplug that memory.
>>>
>>> But that sounds weird, because the device driver in the VM shouldn't do
>>> something virt specific.
>>>
>>> Which raises the question: how is device hotplug of such GPUs handled on
>>> bare metal? Or does it simply not work? :)
>>>
>> Hi David, thank you for your thoughtful feedback.
>> To directly answer your question: in our use case, GPU device hotplug
>> does NOT work on bare metal, and this is by design.
> 
> Cool, thanks for clarifying!
> 



