Re: [PATCH v3 00/12] KVM/hostmem: Support init-shared guest-memfd as VM backends

From: Peter Xu <peterx@redhat.com>
To: Michael Roth <michael.roth@amd.com>
Cc: qemu-devel@nongnu.org, Juraj Marcin <jmarcin@redhat.com>,
	David Hildenbrand <david@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Chenyi Qiang <chenyi.qiang@intel.com>,
	Fabiano Rosas <farosas@suse.de>,
	Alexey Kardashevskiy <aik@amd.com>,
	Li Xiaoyao <xiaoyao.li@intel.com>
Subject: Re: [PATCH v3 00/12] KVM/hostmem: Support init-shared guest-memfd as VM backends
Date: Fri, 5 Jun 2026 10:57:34 -0400
Message-ID: <aiLj3o7IIixCvX2A@x1.local>
In-Reply-To: <lpkcfd2crgparcd64ydry3ocryx3sfc5gj5pzrrms4nwvw6j4c@ulc3wa3rmefo>

On Thu, Jun 04, 2026 at 05:36:42PM -0500, Michael Roth wrote:
> > IIUC it's a matter of if we expect future property of guest-memfd that will
> > stop applying to memfd anymore?
> 
> Yah, I think that's the main thing to consider. There are a few things in
> the pipeline where the options associated with guest_memfd might diverge
> quite a bit from memfd:

Thanks for all the context.  I'll throw some random questions below; some
of them may not be directly related to the current discussion, but please
bear with me.

> 
>   - hugetlb: yes, these could potentially use the same options memfd
>     uses, and I'm guessing that will end up being the case, but one
>     large gap there is that shared memory is always split to 4K, which
>     we've accepted for now, but if you consider use-cases like DPDK
>     there can still be major performance bottlenecks that would drive
>     us to try to enable larger mappings for the shared ranges, and then
>     we'd end up with guest-memfd-specific parameters intermixed with
>     normal memfd options, and our related documentation would need to
>     cover these differences case by case

The first thing I thought about is mTHP and how it can similarly be applied
to normal memfd (now or in the future, I'm not sure).

Before that..  shouldn't the whole concept of private mem / gmem be about
reducing what is mapped into the host (including DPDK, if we're talking
about things like Open vSwitch)?  Can you roughly describe how huge mappings
are expected to be allowed in such a case?  Does it mean the guest driver
should also know to allocate huge contiguous physical memory for DMA only?

>   - DAX-like stuff: there are some proposals for making device memory
>     available to use as private guest memory, and since 'guest-memfd'
>     is generally responsible for managing private memory, it will
>     likely end up being extended to handle this at some point. One
>     proposal/PoC[1] would involve at least needing additional options
>     for the /dev/dax path, but there have also been discussions about
>     having a general notion of custom allocators that can be plugged
>     into guest_memfd, and some of these might have overlapping options
>     WRT things like hugepages/etc. But at a high-level, DAX would map
>     more to memory-backend-file than memory-backend-memfd, so we'd
>     already be crossing up some wires there.

I have no deep understanding of this, but IIUC we used to stick with
memory-backend-file for dax.  Why switch to memory-backend-guest-memfd?
Are we still ultimately exposing a dax device via a file path, even with
CoCo?

Note, here I want to differentiate two concepts: QEMU interfacing and
kernel/KVM interfacing.  I mean, I have a gut feeling that for CoCo dax we
could still stick with memory-backend-file, even if internally we use new
KVM ioctls to set it up: there's no rule saying only
memory-backend-guest-memfd can use the KVM ioctl.  IMHO they're different
stories, and I'm focused more on the QEMU interfacing that we're discussing
here.

IMHO for QEMU's interfacing, any memory-backend should play one single role,
which is to point QEMU (as a hypervisor) to a backing store for some piece
of resource that can be used as guest memory.  It doesn't need to have any
implication on how we implement that backend internally.
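
To make that split concrete, here is a rough sketch in Python rather than
QEMU's C.  Every name in it (Backend, VMConfig, KernelBinding,
choose_binding, the /dev/dax0.0 path) is made up for illustration and is
not an existing QEMU or KVM interface.  It only shows the idea that the
user-facing backend says where memory comes from, while the way it is
wired into KVM is chosen from the VM setup.

```python
from dataclasses import dataclass
from enum import Enum, auto


class KernelBinding(Enum):
    """How the hypervisor wires a backing store into KVM (hypothetical)."""
    PLAIN_USERSPACE_MAPPING = auto()  # memslot backed by a host mapping
    GUEST_MEMFD_BINDING = auto()      # memslot set up via guest_memfd-style ioctls


@dataclass(frozen=True)
class Backend:
    """User-facing backend: only describes where the memory comes from."""
    kind: str    # e.g. "memory-backend-file" or "memory-backend-memfd"
    source: str  # e.g. a file path, or "anonymous"
    size: int


@dataclass(frozen=True)
class VMConfig:
    """Per-VM setup; only the bit this sketch needs."""
    confidential: bool


def choose_binding(backend: Backend, vm: VMConfig) -> KernelBinding:
    # The choice depends on the VM, not on the backend's type name, so a
    # file-based backend can still use the newer KVM path internally.
    if vm.confidential:
        return KernelBinding.GUEST_MEMFD_BINDING
    return KernelBinding.PLAIN_USERSPACE_MAPPING


dax = Backend(kind="memory-backend-file", source="/dev/dax0.0", size=1 << 30)
print(choose_binding(dax, VMConfig(confidential=True)).name)
print(choose_binding(dax, VMConfig(confidential=False)).name)
```

In this model the same memory-backend-file object is valid for both a
normal and a CoCo VM; only the internal binding differs.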

>   - live update: there's work[2] on enabling preservation of confidential
>     guest memory across kexec by preserving it through guest_memfd. This
>     one is still a bit mind-blowing to me but I could see us needing
>     some additional options here that would really make no sense for
>     memfd.

Could you elaborate what kind of parameter you would expect?

I'm not sure if you have looked into QEMU's CPR approach; the memfd backend
is now really the core of supporting such infrastructure, where fds can be
persisted.  For live update, they'll be persisted across kexec and kernel
switchover.  CPR actually also works with cpr-reboot, which has its own
tricky way of persisting memory.

In general, what I want to say is, I really think they should behave the
same in the live update case too: if we need to register some fd for
persistence, we need to register gmem and kvm, but also memfd if some of
them are attached to the current VM, right?

>   - directmap removal: these[3] patches allow a new guest_memfd flag to
>     be set to unmap guest_memfd pages from kernel directmap to help
>     mitigate speculative attacks, probably would involve a new option
>     as well that wouldn't be applicable to normal memfds

Now the question is, do we want to remove directmap for "some" memory
backend, or do we want to remove it per-VM?

This is another thing I want to make sure we're on the same page: I want to
make sure we don't introduce per-VM setup for memory backends.

Say, "init-shared" or "in-place CoCo", what should we use for one gmem fd?
IMHO it shouldn't be a parameter in the memory-backend.  It should be a
parameter for -machine or some similar per-VM setup, which will apply to
all gmem fds across the current VM.

My understanding is that directmap removal is similar in this case, which
seems to be a per-VM (rather than per-memory-backend) attribute?  We can
still operate on that per-memory-backend, but then it'll be internal: the
backends need to understand the VM setup and do things properly, IMHO.
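
As a rough illustration of per-VM versus per-backend, here is a sketch in
Python (not QEMU's C).  MachinePolicy, GmemBackend and realize_backends are
hypothetical names, and the flag strings are placeholders rather than
kernel constants.  The policy is stated once for the VM, and each backend
derives its own flags from it internally instead of exposing them as
backend options.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class MachinePolicy:
    """Per-VM knobs, as they might hang off -machine (hypothetical)."""
    init_shared: bool = False
    remove_directmap: bool = False


@dataclass
class GmemBackend:
    """One guest-memfd-backed region; note it has no policy options."""
    name: str
    size: int
    flags: Set[str] = field(default_factory=set)  # filled in internally


def realize_backends(policy: MachinePolicy,
                     backends: List[GmemBackend]) -> List[GmemBackend]:
    # Every backend reads the same VM-wide policy, so the backends of one
    # VM cannot end up with conflicting settings.
    for backend in backends:
        if policy.init_shared:
            backend.flags.add("init-shared")
        if policy.remove_directmap:
            backend.flags.add("no-directmap")
    return backends


policy = MachinePolicy(init_shared=True, remove_directmap=True)
regions = [GmemBackend("ram0", 1 << 30), GmemBackend("ram1", 1 << 30)]
for region in realize_backends(policy, regions):
    print(region.name, sorted(region.flags))
```

The operation is still performed per backend, but the decision is made
once, at the VM level.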

> 
> It could also end up that even memory-backend-guest-memfd is too
> generic, and that some of these would involve a more specialized memory
> backend where maybe they can share a common base class for some of the
> core guest_memfd stuff but otherwise be separate backends with their
> own specific options. So to me, starting off building up
> memory-backend-memfd seems like a potential misstep, whereas we don't
> really lose much to start with a clean slate.
> 
> [1] DAX: https://lwn.net/ml/all/20260423170219.281618-1-dave.jiang@intel.com/
> [2] LUO: https://lore.kernel.org/all/cover.1779080766.git.tarunsahu@google.com/#r
> [3] directmap removal: https://lore.kernel.org/kvm/20260317141031.514-1-kalyazin@amazon.com/
> 
> > 
> > > 
> > > I also saw you were open to having someone pick up these patches if you
> > > don't think you'll have a chance to get to them near-term, so I'd be
> > > happy to pick them up if that's preferable.
> > 
> > Sure!  Indeed I don't have bandwidth to keep working on this one in the
> > near future. Please feel free to pick whatever needed into your series.
> 
> Ok, sounds good, I'll pick these up for my next posting and incorporate
> any changes/comments that might still be pending at that time.
> 
> Thanks for getting things to this stage!

Thanks for picking it up!  Juraj on our team may do some future exploration
of gmem over 1G for postcopy on init-shared, so it's great the code is
moving closer to that direction.

Thanks,

-- 
Peter Xu

Thread overview: 47+ messages
2025-12-15 20:51 [PATCH v3 00/12] KVM/hostmem: Support init-shared guest-memfd as VM backends Peter Xu
2025-12-15 20:51 ` [PATCH v3 01/12] kvm: Decouple memory attribute check from kvm_guest_memfd_supported Peter Xu
2025-12-16 12:41   ` Xiaoyao Li
2025-12-23 16:56     ` Peter Xu
2025-12-16 13:53   ` Fabiano Rosas
2025-12-23 17:02     ` Peter Xu
2026-06-02  1:10   ` Michael Roth
2025-12-15 20:51 ` [PATCH v3 02/12] kvm: Detect guest-memfd flags supported Peter Xu
2025-12-16 13:54   ` Fabiano Rosas
2026-06-02  1:29   ` Michael Roth
2025-12-15 20:51 ` [PATCH v3 03/12] kvm: Provide explicit error for kvm_create_guest_memfd() Peter Xu
2025-12-16  4:03   ` Xiaoyao Li
2025-12-16 13:55   ` Fabiano Rosas
2026-06-02  1:31   ` Michael Roth
2025-12-15 20:51 ` [PATCH v3 04/12] ramblock: Rename guest_memfd to guest_memfd_private Peter Xu
2026-06-02  1:37   ` Michael Roth
2025-12-15 20:51 ` [PATCH v3 05/12] memory: Rename RAM_GUEST_MEMFD to RAM_GUEST_MEMFD_PRIVATE Peter Xu
2025-12-16  5:49   ` Xiaoyao Li
2025-12-23 17:04     ` Peter Xu
2026-06-02  1:39   ` Michael Roth
2025-12-15 20:51 ` [PATCH v3 06/12] memory: Rename memory_region_has_guest_memfd() to *_private() Peter Xu
2026-06-02  1:40   ` Michael Roth
2025-12-15 20:51 ` [PATCH v3 07/12] hostmem: Rename guest_memfd to guest_memfd_private Peter Xu
2025-12-16  5:54   ` Xiaoyao Li
2026-06-02 18:56   ` Michael Roth
2025-12-15 20:51 ` [PATCH v3 08/12] hostmem: Support fully shared guest memfd to back a VM Peter Xu
2025-12-16  6:54   ` Xiaoyao Li
2025-12-16 14:02   ` Fabiano Rosas
2026-06-02 21:40   ` Michael Roth
2026-06-05  7:23     ` David Hildenbrand (Arm)
2026-06-05 11:23       ` David Hildenbrand (Arm)
2025-12-15 20:52 ` [PATCH v3 09/12] machine: Rename machine_require_guest_memfd() to *_private() Peter Xu
2025-12-16  6:55   ` Xiaoyao Li
2026-06-02 21:46   ` Michael Roth
2025-12-15 20:52 ` [PATCH v3 10/12] memory: Rename memory_region_init_ram_guest_memfd() " Peter Xu
2025-12-16  6:56   ` Xiaoyao Li
2026-06-02 21:49   ` Michael Roth
2025-12-15 20:52 ` [PATCH v3 11/12] tests/migration-test: Support guest-memfd init shared mem type Peter Xu
2025-12-16 14:18   ` Fabiano Rosas
2025-12-23 17:09     ` Peter Xu
2025-12-15 20:52 ` [PATCH v3 12/12] tests/migration-test: Add a precopy test for guest-memfd Peter Xu
2025-12-16 14:20   ` Fabiano Rosas
2026-06-02 22:02 ` [PATCH v3 00/12] KVM/hostmem: Support init-shared guest-memfd as VM backends Michael Roth
2026-06-03 19:27   ` Peter Xu
2026-06-04 22:36     ` Michael Roth
2026-06-05 14:57       ` Peter Xu [this message]
2026-06-08 17:59         ` Michael Roth
