From: David Hildenbrand <david@redhat.com>
To: "Luo, Zhigang" <Zhigang.Luo@amd.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Cc: "kraxel@redhat.com" <kraxel@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>
Subject: Re: [PATCH] hostmem-file: add the 'hmem' option
Date: Tue, 10 Dec 2024 23:01:47 +0100
Message-ID: <7e9298da-79e2-43b6-a616-b1e1e1e1a883@redhat.com>
In-Reply-To: <BL1PR12MB5317EAF52CFCABB96E05D538F13D2@BL1PR12MB5317.namprd12.prod.outlook.com>

On 10.12.24 22:51, Luo, Zhigang wrote:
> 
>> -----Original Message-----
>> From: David Hildenbrand <david@redhat.com>
>> Sent: Tuesday, December 10, 2024 2:55 PM
>> To: Luo, Zhigang <Zhigang.Luo@amd.com>; qemu-devel@nongnu.org
>> Cc: kraxel@redhat.com; Igor Mammedov <imammedo@redhat.com>
>> Subject: Re: [PATCH] hostmem-file: add the 'hmem' option
>>
>> On 10.12.24 20:32, Luo, Zhigang wrote:
>>>
>>> Hi David,
>>>
>>
>> Hi,
>>
>>>>>
>>>>> Thanks for your comments.
>>>>> Let me give you some background for this patch.
>>>>> I am currently engaged in a project that requires passing
>>>>> EFI_MEMORY_SP (Special Purpose Memory) type memory from the host to
>>>>> a virtual machine within QEMU. The memory needs to be of the
>>>>> EFI_MEMORY_SP type in the virtual machine as well.
>>>>> This particular memory type is essential for the functionality of my project.
>>>>
>>>> Which exact guest memory will be backed by this memory? All guest-memory?
>>> [Luo, Zhigang] Not all guest memory; only the memory reserved for a
>>> specific device.
>>
>> Can you show me an example QEMU cmdline, and how you would pass that
>> hostmem-file object to the device?
>>
> [Luo, Zhigang] The following is an example: m1 is the memory reserved
> for PCI device "0000:03:00.0". Both the memory and the PCI device are
> assigned to the same NUMA node.
> 
> -object memory-backend-ram,size=8G,id=m0 \
> -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on \
> -numa node,nodeid=0,memdev=m0 -numa node,nodeid=1,memdev=m1 \

Okay, so you expose this memory as a second NUMA node, and you want the 
guest to identify that node as SP so it is not used during boot.
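
For reference: inside a Linux guest, such a range shows up as "Soft 
Reserved" and is claimed by the dax hmem driver by default. A quick way 
to check from the guest (address range made up for illustration):

  $ grep -i "soft reserved" /proc/iomem
    280000000-67fffffff : Soft Reserved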

Let me CC Jonathan, I am pretty sure he has an idea what to do here.

> -device pxb-pcie,id=pcie.1,numa_node=1,bus_nr=2,bus=pcie.0 \
> -device ioh3420,id=pcie_port1,bus=pcie.1,chassis=1 \
> -device vfio-pci,host=0000:03:00.0,id=hostdev0,bus=pcie_port1
> 
>>>
>>>>
>>>> And, what is the guest OS going to do with this memory?
>>> [Luo, Zhigang] The device driver in the guest will use this reserved
>>> memory.
>>
>> Okay, so just like CXL memory.
>>
>>>
>>>>
>>>> Usually, this SP memory (dax, cxl, ...) is not used as boot memory.
>>>> Like on a bare metal system, one would expect that only CXL memory
>>>> is marked as special and set aside for the cxl driver, so that the
>>>> OS can boot on ordinary DIMMs and the cxl driver can online the
>>>> memory later, etc.
>>>>
>>>> So maybe you would want to expose this memory to the VM using a
>>>> CXL-mem device? Or a DIMM?
>>>>
>>>> I assume the alternative is to tell the VM on the Linux kernel
>>>> cmdline to set EFI_MEMORY_SP on this memory. I recall that there is
>>>> a way to achieve that.
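>>>>
>>>> (Presumably the efi_fake_mem= parameter; a sketch with made-up
>>>> numbers, assuming the guest kernel was built with support for it:
>>>>
>>>>   efi_fake_mem=16G@0x100000000:0x40000
>>>>
>>>> where 0x40000 is the EFI_MEMORY_SP attribute bit.)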
>>>>
>>> [Luo, Zhigang] I know about this option, but it requires the end
>>> user to know where the memory is located on the guest side (start
>>> address, size).
>>
>> Right.
>>
>>>
>>>
>>>>> In Linux, the SPM memory will be claimed by the hmem-dax driver by
>>>>> default. With this patch I can use the following config to pass the
>>>>> SPM memory to the guest VM:
>>>>>
>>>>> -object memory-backend-file,size=30G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on
>>>>>
>>>>> I was thinking of changing the option name from "hmem" to "spm" to
>>>>> avoid confusion.
>>>>
>>>> Likely it should be specified elsewhere that you want specific
>>>> guest RAM ranges to be EFI_MEMORY_SP. For a DIMM it could be a
>>>> property; similarly maybe for CXL-mem devices (no expert on that).
>>>>
>>>> For boot memory / machine memory it could be a machine property. But
>>>> I'll first have to learn which ranges you actually want to expose
>>>> that way, and what the VM will do with that information.
>>> [Luo, Zhigang] We want to expose the SPM memory reserved for a
>>> specific device. We will pass both the SPM memory and the device to
>>> the guest; then the device driver can use the SPM memory on the
>>> guest side.
>>
>> Then the device driver should likely have a way to configure that, not the memory
>> backend.
>>
>> After all, the device driver will map it somehow into guest physical address space
>> (how?).
>>
> [Luo, Zhigang] From the guest's view it's still system memory, just
> marked as SPM, so QEMU will map the memory into guest physical address
> space. The device driver merely claims the SPM memory on the guest side.
> 
>>>
>>>>
>>>>>
>>>>> Do you have any suggestions for achieving this more reasonably?
>>>>
>>>> The problem with qemu_ram_foreach_block() is that you would also
>>>> mark DIMMs, virtio-mem, ... and even RAMBlocks that are not used to
>>>> back anything in the VM as EFI_MEMORY_SP, which is wrong.
>>> [Luo, Zhigang] qemu_ram_foreach_block() will list all memory blocks,
>>> but in pc_update_hmem_memory() only the memory blocks with the "hmem"
>>> flag will be marked as SPM memory.
>>
>> Yes, but imagine a user passing such a memory backend to a
>> DIMM/virtio-mem/boot memory etc. It would have very undesirable side
>> effects.
>>
> [Luo, Zhigang] The user should know what they are doing when they set
> the flag for the memory region.

No, we must not allow users to create configurations that don't make 
any sense.

It is sufficient to add:

-object memory-backend-file,size=16G,id=unused,mem-path=whatever,hmem=on

to the cmdline to cause a mess: that backend doesn't back anything in 
the guest, yet it would still be flagged as EFI_MEMORY_SP.


Maybe it should be a "numa" node configuration like

-numa node,nodeid=1,memdev=m1,sp=on
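
That is, with your example it could look like this ("sp" being a 
hypothetical property that does not exist today):

-object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G \
-numa node,nodeid=1,memdev=m1,sp=on

so the SP attribute would be tied to the guest NUMA node instead of to 
the memory backend.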

But I recall that we discussed something related to this with Jonathan, 
so I'm hoping we can get his input.

-- 
Cheers,

David / dhildenb


