* [PATCH RFC 00/12] guest_memfd: support in-place memory conversion
@ 2026-05-28  0:03 Michael Roth
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

This patchset is also available at:

  https://github.com/amdese/qemu/commits/snp-inplace-rfc1

which is in turn based on the following series:

  [PATCH 0/4] "guest_memfd: Fix handling for conversions of MMIO ranges"
  https://lists.gnu.org/archive/html/qemu-devel/2026-05/msg07547.html


OVERVIEW
--------

This series adds guest_memfd support for in-place conversion of memory
between private/shared, and enables it for SEV-SNP guests. It is based
on recently-added kernel support for mmap()-able guest_memfd
instances[1], which allow it to be used for shared memory, and the
following patchset[2], which adds additional guest_memfd interfaces to
allow it to be used to perform in-place conversion:

  "[PATCH v7 00/42] guest_memfd: In-place conversion support"
  https://lore.kernel.org/kvm/20260522-gmem-inplace-conversion-v7-0-2f0fae496530@google.com/

That series also introduces a new 'vm_memory_attributes' KVM
module option, which sets whether memory attributes are tracked
VM-wide by KVM (vm_memory_attributes=1: the existing 'legacy' mode),
or per-guest_memfd instance (vm_memory_attributes=0: the new mode
which allows for in-place conversion). The latter is intended to
eventually deprecate the legacy mode, at which point in-place
conversion would become the primarily-supported mode.


MOTIVATION
----------

Today, SEV-SNP guests (and other CoCo VM types using guest_memfd) keep
shared and private memory on separate physical backings: a userspace
memory-backend object for shared pages, and a kernel-allocated
guest_memfd file descriptor for private pages. KVM_SET_MEMORY_ATTRIBUTES
flips which backing the guest sees for a given GPA range, and the old
backing is typically discarded / hole-punched on conversion to avoid
doubled memory usage.

That model works, but has a number of downsides that impact certain
use-cases:

  - Each conversion involves discarding pages on one side and faulting
    them in on the other, which incurs allocation overheads in the
    host kernel for every conversion.

  - Some use-cases, like pKVM[3], rely on memory isolation rather than
    encryption and need in-place conversion to pass through things
    like secured framebuffer memory without bouncing data through
    separate shared/private HPAs, which would introduce unacceptable
    latency for that sort of workload.

  - Hugetlb support[4] for guest_memfd will rely on it, since things like
    1GB hugepages with a mix of shared/private sub-ranges would generally
    require two 1GB hugetlb pages to remain available to handle shared vs.
    private accesses, which quickly causes doubling of guest memory usage.

Recent kernel work[1][2] makes guest_memfd mmap()-able and lets the *same*
physical pages be used for both shared and private states for a given
GPA range, allowing the above pitfalls to be naturally avoided.

This series wires that support up in QEMU.


DESIGN
------

A new dedicated memory backend, memory-backend-guest-memfd, allocates
its memory via a guest_memfd file descriptor obtained from KVM with
the GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED flags. The fd
is mmap()ed so userspace can access pages directly while they are in
the shared state. For a normal/non-confidential VM, this backend can
be used in a similar fashion as the existing memory-backend-memfd.

For confidential VMs, a new 'convert-in-place' flag is added to switch
on in-place conversion support. When running in this mode, the user
*MUST* use memory-backend-guest-memfd for backing guest RAM. A new
RAM_GUEST_MEMFD_SHARED RAMBlock flag is added to track/enforce the
dependency. Additionally, QEMU is modified to use mmap()-able
guest_memfd and set this flag for other cases where it allocates RAM
internally. As a result, block->fd will generally always be a
guest_memfd, and when RAM_GUEST_MEMFD_SHARED is set, block->fd will
also be qemu_dup()'d as the FD handle for private memory (which is
what block->guest_memfd currently points to). This allows the prior
non-in-place handling around block->guest_memfd to be kept mostly
unchanged.
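
To make the fd relationship above concrete, here is a rough sketch, not
the code from the series. It assumes a trimmed-down, hypothetical
stand-in for RAMBlock (struct sketch_ram_block), uses memfd_create() as
a stand-in for a guest_memfd created with GUEST_MEMFD_FLAG_MMAP |
GUEST_MEMFD_FLAG_INIT_SHARED, and uses plain dup() where QEMU would use
qemu_dup(). All sketch_* names are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical, trimmed-down stand-in for QEMU's RAMBlock. */
struct sketch_ram_block {
    int fd;                  /* mmap()-able handle used for shared access */
    int guest_memfd;         /* handle used for private memory, or -1 */
    bool guest_memfd_shared; /* models the RAM_GUEST_MEMFD_SHARED flag */
};

/*
 * Assumption: memfd_create() stands in for creating a guest_memfd, and
 * dup() stands in for qemu_dup(). Returns 0 on success, -1 on failure.
 */
static int sketch_block_init(struct sketch_ram_block *b, bool in_place)
{
    b->fd = memfd_create("sketch-gmem", MFD_CLOEXEC);
    b->guest_memfd = -1;
    b->guest_memfd_shared = in_place;
    if (b->fd < 0) {
        return -1;
    }
    if (in_place) {
        /* Second handle to the same inode, used for the private side. */
        b->guest_memfd = dup(b->fd);
        if (b->guest_memfd < 0) {
            close(b->fd);
            b->fd = -1;
            return -1;
        }
    }
    return 0;
}

/* True if both descriptors refer to the same underlying inode. */
static bool sketch_same_inode(int a, int b)
{
    struct stat sa, sb;

    if (fstat(a, &sa) != 0 || fstat(b, &sb) != 0) {
        return false;
    }
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

The point of the sketch is only that the two handles are distinct
descriptors backed by one inode, so code written against
block->guest_memfd does not need to know whether in-place conversion is
in use.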

When running with convert-in-place=true, shared/private conversions
are no longer handled directly by KVM, but instead by a new guest_memfd
ioctl, KVM_SET_MEMORY_ATTRIBUTES2, which purposely provides similar
naming/implementation to the KVM_SET_MEMORY_ATTRIBUTES KVM ioctl that
it replaces. This series adds handling to route conversion requests to
the appropriate ioctls based on whether or not in-place conversion is
enabled.
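
As a rough illustration of that routing decision (not the code from the
series), the sketch below picks a target based on whether in-place
conversion is enabled. The struct and function names are hypothetical,
and it assumes the guest_memfd ioctl is addressed by an offset into the
file while the VM ioctl is addressed by GPA; no real ioctl is issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which interface a conversion request is sent to. */
enum sketch_convert_path {
    SKETCH_PATH_VM_IOCTL,          /* KVM_SET_MEMORY_ATTRIBUTES on the VM fd */
    SKETCH_PATH_GUEST_MEMFD_IOCTL, /* KVM_SET_MEMORY_ATTRIBUTES2 on guest_memfd */
};

/* Hypothetical description of one conversion call; not a KVM uapi struct. */
struct sketch_convert_req {
    enum sketch_convert_path path;
    int fd;          /* fd the ioctl would be issued against */
    uint64_t start;  /* GPA (VM path) or file offset (guest_memfd path, assumed) */
    uint64_t size;
    bool to_private;
};

/* Hypothetical view of a memory slot: its GPA base and its place in the fd. */
struct sketch_slot {
    uint64_t base_gpa;
    uint64_t gmem_offset;
    int gmem_fd;
};

static struct sketch_convert_req
sketch_route_conversion(bool convert_in_place, int vm_fd,
                        const struct sketch_slot *slot,
                        uint64_t gpa, uint64_t size, bool to_private)
{
    struct sketch_convert_req req = { .size = size, .to_private = to_private };

    /* The caller is expected to pass a GPA that lies inside the slot. */
    assert(gpa >= slot->base_gpa);

    if (convert_in_place) {
        /* New mode: attributes are tracked per guest_memfd instance. */
        req.path = SKETCH_PATH_GUEST_MEMFD_IOCTL;
        req.fd = slot->gmem_fd;
        req.start = slot->gmem_offset + (gpa - slot->base_gpa);
    } else {
        /* Legacy mode: attributes are tracked VM-wide, keyed by GPA. */
        req.path = SKETCH_PATH_VM_IOCTL;
        req.fd = vm_fd;
        req.start = gpa;
    }
    return req;
}
```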

Since guest_memfd ioctls need to be called against the specific
guest_memfd inode associated with each memory slot/region, some
refactoring is needed to handle conversions on a per-section basis.
Much of that is inherited from the bugfix series this patchset is based
on, which adds the initial logic for handling multiple sections within
a range that gets heavily re-used here.
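
A minimal sketch of what per-section handling could look like, under
the assumption that each section maps a contiguous GPA range to an
offset within one guest_memfd. The sketch_* types, the callback
signature, and the recording helper are all hypothetical and exist only
to show how one GPA range gets clipped into per-fd calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section: a GPA range backed by one guest_memfd. */
struct sketch_section {
    uint64_t gpa;
    uint64_t size;
    int gmem_fd;
    uint64_t fd_offset; /* offset of this section within gmem_fd */
};

/* Callback standing in for the per-guest_memfd conversion ioctl. */
typedef int (*sketch_convert_fn)(int gmem_fd, uint64_t offset,
                                 uint64_t size, void *opaque);

/*
 * Clip [start, start + size) against each section and issue one call per
 * overlapping section. Returns the number of calls made, or a negative
 * value if the callback fails.
 */
static int sketch_convert_range(const struct sketch_section *secs, size_t n,
                                uint64_t start, uint64_t size,
                                sketch_convert_fn fn, void *opaque)
{
    uint64_t end = start + size;
    int calls = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t s_end = secs[i].gpa + secs[i].size;
        uint64_t lo = start > secs[i].gpa ? start : secs[i].gpa;
        uint64_t hi = end < s_end ? end : s_end;

        if (lo >= hi) {
            continue; /* no overlap with this section */
        }
        int ret = fn(secs[i].gmem_fd, secs[i].fd_offset + (lo - secs[i].gpa),
                     hi - lo, opaque);
        if (ret < 0) {
            return ret;
        }
        calls++;
    }
    return calls;
}

/* Recording callback used to inspect which calls would be made. */
#define SKETCH_LOG_MAX 8
struct sketch_log {
    size_t n;
    int fd[SKETCH_LOG_MAX];
    uint64_t offset[SKETCH_LOG_MAX];
    uint64_t size[SKETCH_LOG_MAX];
};

static int sketch_record_call(int gmem_fd, uint64_t offset, uint64_t size,
                              void *opaque)
{
    struct sketch_log *log = opaque;

    if (log->n >= SKETCH_LOG_MAX) {
        return -1;
    }
    log->fd[log->n] = gmem_fd;
    log->offset[log->n] = offset;
    log->size[log->n] = size;
    log->n++;
    return 0;
}
```

For example, a conversion covering GPAs 0x800-0x17ff across two
adjacent sections would turn into two calls, one per guest_memfd, each
with its own file offset.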


USAGE
-----

After applying this series against a kernel with the RFC patches above
present, an SEV-SNP guest can be started with in-place conversion via:

    qemu-system-x86_64 \
        -machine q35,confidential-guest-support=sev0,memory-backend=ram0 \
        -object memory-backend-guest-memfd,id=ram0,size=8G,share=on \
        -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,convert-in-place=on \
        ...

The new memory-backend-guest-memfd can also be used by normal VMs:

    qemu-system-x86_64 \
        -machine q35,memory-backend=ram0 \
        -object memory-backend-guest-memfd,id=ram0,size=8G,share=on \
        ...

This is mainly useful for testing at the moment, but in the future there
may be more use-cases around using guest_memfd as a general-purpose
backend for non-confidential VMs, so it is intended to work in this
manner as well.


NOTES/TODO
----------

  - the CPR handling to support resetting of confidential VMs is
    currently disabled when in-place conversion is enabled.
  - TDX testing would be great; in theory it can be enabled with this
    series (similarly to the top patch), but I'm not sure if there are
    other special requirements before we can switch it on.
  - kernel patches are still in-flight, but fairly mature at this point
    and nearing upstream.


REFERENCES
----------

[1] https://lore.kernel.org/kvm/20250729225455.670324-1-seanjc@google.com/
[2] https://lore.kernel.org/kvm/20260522-gmem-inplace-conversion-v7-0-2f0fae496530@google.com/
[3] https://www.youtube.com/watch?v=MMfAGNW9RVg
[4] 1GB hugetlb v2


Thoughts, feedback, and testing are very much appreciated.

Thanks,

Mike


----------------------------------------------------------------
Michael Roth (12):
      accel/kvm: Decouple guest_memfd checks from memory attribute checks
      hostmem: Introduce dedicated memory backend for guest_memfd
      linux-headers: Update headers for v7 of in-place conversion kernel support
      accel/kvm: Add CGS option to control in-place conversion support
      system/memory: Re-use memory-backend-guest-memfd inode for private memory
      system/memory: Default to guest_memfd for RAM for in-place conversion
      accel/kvm: Move post-conversion updates to a separate helper
      accel/kvm: Re-order attribute notifications for in-place conversion
      accel/kvm: Support shared/private conversions via guest_memfd ioctls
      accel/kvm: Don't default to private attributes for in-place conversion
      i386/sev: Update SNP_LAUNCH_UPDATE for in-place conversion
      i386/sev: Allow in-place conversion for SEV-SNP guests

 accel/kvm/kvm-all.c                                | 286 +++++++++++--
 accel/stubs/kvm-stub.c                             |   9 +-
 backends/confidential-guest-support.c              |  25 ++
 backends/hostmem-guest-memfd.c                     |  93 +++++
 backends/meson.build                               |   1 +
 include/standard-headers/drm/drm_fourcc.h          |  28 +-
 include/standard-headers/linux/const.h             |  18 +
 include/standard-headers/linux/ethtool.h           |  28 +-
 include/standard-headers/linux/input-event-codes.h |  13 +
 include/standard-headers/linux/pci_regs.h          |  71 +++-
 include/standard-headers/linux/typelimits.h        |   8 +
 include/standard-headers/linux/virtio_ring.h       |   5 +-
 include/standard-headers/linux/virtio_rtc.h        | 237 +++++++++++
 include/standard-headers/linux/vmclock-abi.h       |  20 +
 include/system/confidential-guest-support.h        |  14 +
 include/system/hostmem.h                           |   1 +
 include/system/kvm.h                               |   3 +-
 include/system/memory.h                            |   8 +-
 linux-headers/asm-arm64/kvm.h                      |   1 +
 linux-headers/asm-arm64/unistd_64.h                |   1 +
 linux-headers/asm-generic/unistd.h                 |   5 +-
 linux-headers/asm-loongarch/kvm.h                  |   5 +
 linux-headers/asm-loongarch/kvm_para.h             |   1 +
 linux-headers/asm-loongarch/unistd_64.h            |   2 +
 linux-headers/asm-mips/unistd_n32.h                |   1 +
 linux-headers/asm-mips/unistd_n64.h                |   1 +
 linux-headers/asm-mips/unistd_o32.h                |   1 +
 linux-headers/asm-powerpc/unistd_32.h              |   1 +
 linux-headers/asm-powerpc/unistd_64.h              |   1 +
 linux-headers/asm-riscv/kvm.h                      |  11 +-
 linux-headers/asm-riscv/ptrace.h                   |  37 ++
 linux-headers/asm-riscv/unistd_32.h                |   1 +
 linux-headers/asm-riscv/unistd_64.h                |   1 +
 linux-headers/asm-s390/unistd_32.h                 | 446 ---------------------
 linux-headers/asm-s390/unistd_64.h                 |   1 +
 linux-headers/asm-x86/kvm.h                        |  21 +-
 linux-headers/asm-x86/unistd_32.h                  |   1 +
 linux-headers/asm-x86/unistd_64.h                  |   1 +
 linux-headers/asm-x86/unistd_x32.h                 |   1 +
 linux-headers/linux/const.h                        |  18 +
 linux-headers/linux/iommufd.h                      |  48 +++
 linux-headers/linux/kvm.h                          |  62 ++-
 linux-headers/linux/mshv.h                         |   4 +-
 linux-headers/linux/psp-sev.h                      |   2 +-
 linux-headers/linux/stddef.h                       |   4 +
 linux-headers/linux/vduse.h                        |  85 +++-
 linux-headers/linux/vfio.h                         |  30 +-
 qapi/qom.json                                      |  35 +-
 qemu-options.hx                                    |   5 +
 system/memory.c                                    |  22 +-
 system/physmem.c                                   |  50 ++-
 target/i386/sev.c                                  |  12 +-
 52 files changed, 1253 insertions(+), 533 deletions(-)
 create mode 100644 backends/hostmem-guest-memfd.c
 create mode 100644 include/standard-headers/linux/typelimits.h
 create mode 100644 include/standard-headers/linux/virtio_rtc.h
 delete mode 100644 linux-headers/asm-s390/unistd_32.h

