[PATCH RFC 00/12] guest_memfd: support in-place memory conversion


* [PATCH RFC 00/12] guest_memfd: support in-place memory conversion
@ 2026-05-28  0:03 Michael Roth
  2026-05-28  0:03 ` [PATCH RFC 01/12] accel/kvm: Decouple guest_memfd checks from memory attribute checks Michael Roth
                   ` (11 more replies)
  0 siblings, 12 replies; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

This patchset is also available at:

  https://github.com/amdese/qemu/commits/snp-inplace-rfc1

which is in turn based on the following series:

  [PATCH 0/4] "guest_memfd: Fix handling for conversions of MMIO ranges"
  https://lists.gnu.org/archive/html/qemu-devel/2026-05/msg07547.html

OVERVIEW
--------

This series adds guest_memfd support for in-place conversion of memory
between private/shared, and enables it for SEV-SNP guests. It is based
on recently-added kernel support for mmap()-able guest_memfd
instances[1], which allow it to be used for shared memory, and the
following patchset[2], which adds additional guest_memfd interfaces to
allow it to be used to perform in-place conversion:

  "[PATCH v7 00/42] guest_memfd: In-place conversion support"
  https://lore.kernel.org/kvm/20260522-gmem-inplace-conversion-v7-0-2f0fae496530@google.com/

That series also introduces a new 'vm_memory_attributes' KVM
module option, which sets whether memory attributes are tracked
VM-wide by KVM (vm_memory_attributes=1: the existing 'legacy' mode),
or per-guest_memfd instance (vm_memory_attributes=0: the new mode
which allows for in-place conversion). The latter is intended to
eventually deprecate the legacy mode, at which point in-place
conversion would become the primarily-supported mode.

MOTIVATION
----------

Today, SEV-SNP guests (and other CoCo VM types using guest_memfd) keep
shared and private memory on separate physical backings: a userspace
memory-backend object for shared pages, and a kernel-allocated
guest_memfd file descriptor for private pages. KVM_SET_MEMORY_ATTRIBUTES
flips which backing the guest sees for a given GPA range, and the old
backing is typically discarded / hole-punched on conversion to avoid
doubled memory usage.

That model works, but has a number of downsides that impact certain
use-cases:

  - Each conversion involves discarding pages on one side and faulting
    them in on the other, which incurs allocation overheads in the
    host kernel for every conversion.

  - Some use-cases, like pKVM[3], rely on memory isolation rather than
    encryption and depend on in-place conversion to pass through things
    like secured framebuffer memory without needing to bounce data
    through separate shared/private HPAs, which would introduce
    unacceptable latency for that sort of workload.

  - Hugetlb support[4] for guest_memfd will rely on it, since things like
    1GB hugepages with a mix of shared/private sub-ranges would generally
    require 2 1GB hugetlb pages to remain available to handle shared vs.
    private accesses, which quickly causes doubling of guest memory usage.

Recent kernel work[1][2] makes guest_memfd mmap()-able and lets the *same*
physical pages be used for both shared and private states for a given
GPA range, allowing the above pitfalls to be naturally avoided.
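
As a rough illustration of the hugetlb point above, the following sketch
models worst-case host memory under a deliberately simplified assumption:
every 1GB range that mixes shared and private sub-ranges pins one hugepage
per backing in the legacy model, and exactly one hugepage with in-place
conversion. The helper name and the model itself are hypothetical and are
not taken from the kernel or QEMU code.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/*
 * Simplified model (an assumption for illustration): host bytes needed
 * for a guest made of `guest_pages` 1GB hugepage-sized GPA ranges, of
 * which `mixed_pages` contain both shared and private sub-ranges.
 */
static uint64_t host_bytes_needed(uint64_t guest_pages, uint64_t mixed_pages,
                                  int in_place)
{
    assert(mixed_pages <= guest_pages);

    if (in_place) {
        /* One physical hugepage serves both the shared and private state. */
        return guest_pages * GiB;
    }

    /* Separate backings: each mixed range keeps a hugepage on both sides. */
    return (guest_pages + mixed_pages) * GiB;
}
```

Under this model, an 8G guest whose hugepages are all mixed would need 16G
of host memory with separate backings, versus 8G with in-place conversion.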

This series wires that support up in QEMU.

DESIGN
------

A new dedicated memory backend, memory-backend-guest-memfd, allocates
its memory via a guest_memfd file descriptor obtained from KVM with
the GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED flags. The fd
is mmap()ed so userspace can access pages directly while they are in
the shared state. For a normal/non-confidential VM, this backend can
be used in a similar fashion to the existing memory-backend-memfd.

For confidential VMs, a new 'convert-in-place' flag is added to switch
on in-place conversion support. When running in this mode, the user
*MUST* use memory-backend-guest-memfd for backing guest RAM. A new
RAM_GUEST_MEMFD_SHARED RAMBlock flag is added to track/enforce the
dependency. Additionally, QEMU is modified to use mmap()-able
guest_memfd and set this flag for other cases where it allocates RAM
internally. As a result, block->fd will generally always be a guest_memfd,
and when RAM_GUEST_MEMFD_SHARED is set, that block->fd will also be
qemu_dup()'d to serve as the FD handle for private memory (which is
currently what block->guest_memfd points to). This allows the prior
non-in-place handling around block->guest_memfd to be kept mostly
unchanged.
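
The dup()-based arrangement described above can be sketched with plain
POSIX calls. This is only a stand-in: an anonymous temporary file takes
the place of a real guest_memfd, dup() takes the place of qemu_dup(), and
the function names are hypothetical. It shows that the duplicated
descriptor refers to the same inode, which is the property that lets the
existing block->guest_memfd handling keep working.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if both descriptors refer to the same inode, 0 if they do
 * not, and -1 if fstat() fails.
 */
static int same_inode(int fd_a, int fd_b)
{
    struct stat a, b;

    assert(fd_a >= 0 && fd_b >= 0);

    if (fstat(fd_a, &a) < 0 || fstat(fd_b, &b) < 0) {
        return -1;
    }
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/*
 * shared_fd plays the role of block->fd, and the dup()'d descriptor
 * plays the role of block->guest_memfd. A temporary file substitutes
 * for the guest_memfd (assumption for illustration only).
 */
static int demo_dup_shares_inode(void)
{
    FILE *backing = tmpfile();
    int shared_fd, private_fd, ret;

    if (!backing) {
        return -1;
    }

    shared_fd = fileno(backing);
    private_fd = dup(shared_fd);
    if (private_fd < 0) {
        fclose(backing);
        return -1;
    }

    ret = same_inode(shared_fd, private_fd);

    close(private_fd);
    fclose(backing);
    return ret;
}
```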

When running with convert-in-place=true, shared/private conversions
are no longer handled directly by KVM, but instead by a new guest_memfd
ioctl, KVM_SET_MEMORY_ATTRIBUTES2, which purposely provides similar
naming/implementation to the KVM_SET_MEMORY_ATTRIBUTES KVM ioctl that
it replaces. This series adds handling to route conversion requests to
the appropriate ioctls based on whether or not in-place conversion is
enabled.

Since guest_memfd ioctls need to be called against the specific
guest_memfd inode associated with each memory slot/region, some
refactoring is needed to handle conversions on a per-section basis. Much of
that is inherited from the bugfix series this patchset is based on top
of, which adds the initial logic for handling multiple sections within
a range that gets heavily re-used here.
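
The routing and per-section handling described above can be sketched as
follows. This is a minimal model, not the code from this series: the
struct names, the use of -1 to denote the VM-wide path, and the
assumption that the guest_memfd path is addressed by an offset into the
fd are all hypothetical and used for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory section backed by one guest_memfd. */
struct section {
    uint64_t gpa;        /* guest-physical start of the section */
    uint64_t size;       /* length in bytes */
    int      gmem_fd;    /* guest_memfd backing this section */
    uint64_t fd_offset;  /* offset of the section within that guest_memfd */
};

/* One conversion request after routing. */
struct request {
    int      fd;      /* -1: VM-wide ioctl; otherwise the target guest_memfd */
    uint64_t offset;  /* GPA on the VM-wide path, fd offset otherwise */
    uint64_t size;
};

/*
 * Split the GPA range [start, start + size) into conversion requests.
 * Legacy mode yields a single VM-wide request. In-place mode yields one
 * request per overlapping section, expressed relative to that section's
 * guest_memfd. Returns the number of requests, or -1 if out is too small.
 */
static int route_conversion(const struct section *secs, size_t nsecs,
                            uint64_t start, uint64_t size, int in_place,
                            struct request *out, size_t max_out)
{
    uint64_t end = start + size;
    size_t n = 0;

    assert(out != NULL);

    if (!in_place) {
        if (max_out < 1) {
            return -1;
        }
        out[0] = (struct request){ .fd = -1, .offset = start, .size = size };
        return 1;
    }

    for (size_t i = 0; i < nsecs; i++) {
        uint64_t s_end = secs[i].gpa + secs[i].size;
        uint64_t lo = start > secs[i].gpa ? start : secs[i].gpa;
        uint64_t hi = end < s_end ? end : s_end;

        if (lo >= hi) {
            continue;  /* no overlap with this section */
        }
        if (n == max_out) {
            return -1;
        }
        out[n++] = (struct request){
            .fd = secs[i].gmem_fd,
            .offset = secs[i].fd_offset + (lo - secs[i].gpa),
            .size = hi - lo,
        };
    }
    return (int)n;
}
```

A range that straddles two sections produces two requests in in-place
mode, each aimed at a different guest_memfd, while the same range in
legacy mode stays a single VM-wide request.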

USAGE
-----

After applying this series against a kernel with the RFC patches above
present, an SEV-SNP guest can be started with in-place conversion via:

    qemu-system-x86_64 \
        -machine q35,confidential-guest-support=sev0,memory-backend=ram0 \
        -object memory-backend-guest-memfd,id=ram0,size=8G,share=on \
        -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,\
                convert-in-place=on \
        ...

The new memory-backend-guest-memfd can also be used by normal VMs:

    qemu-system-x86_64 \
        -machine q35,memory-backend=ram0 \
        -object memory-backend-guest-memfd,id=ram0,size=8G,share=on \
        ...

This is currently useful mainly for testing, but in the future there may
be more use-cases around using guest_memfd as a general-purpose backend
for non-confidential VMs, so it is intended to work in this manner as
well.

NOTES/TODO
----------

  - the CPR handling to support resetting of confidential VMs is
    currently disabled when in-place conversion is enabled.
  - TDX testing would be great. In theory it can be enabled with this
    series (similarly to the top patch), but I'm not sure if there are
    other special requirements before we can switch it on.
  - kernel patches are still in-flight, but fairly mature at this point
    and nearing upstream.

REFERENCES
----------

[1] https://lore.kernel.org/kvm/20250729225455.670324-1-seanjc@google.com/
[2] https://lore.kernel.org/kvm/20260522-gmem-inplace-conversion-v7-0-2f0fae496530@google.com/
[3] https://www.youtube.com/watch?v=MMfAGNW9RVg
[4] 1GB hugetlb v2

Thoughts, feedback, and testing are very much appreciated.

Thanks,

Mike

----------------------------------------------------------------
Michael Roth (12):
      accel/kvm: Decouple guest_memfd checks from memory attribute checks
      hostmem: Introduce dedicated memory backend for guest_memfd
      linux-headers: Update headers for v7 of in-place conversion kernel support
      accel/kvm: Add CGS option to control in-place conversion support
      system/memory: Re-use memory-backend-guest-memfd inode for private memory
      system/memory: Default to guest_memfd for RAM for in-place conversion
      accel/kvm: Move post-conversion updates to a separate helper
      accel/kvm: Re-order attribute notifications for in-place conversion
      accel/kvm: Support shared/private conversions via guest_memfd ioctls
      accel/kvm: Don't default to private attributes for in-place conversion
      i386/sev: Update SNP_LAUNCH_UPDATE for in-place conversion
      i386/sev: Allow in-place conversion for SEV-SNP guests

 accel/kvm/kvm-all.c                                | 286 +++++++++++--
 accel/stubs/kvm-stub.c                             |   9 +-
 backends/confidential-guest-support.c              |  25 ++
 backends/hostmem-guest-memfd.c                     |  93 +++++
 backends/meson.build                               |   1 +
 include/standard-headers/drm/drm_fourcc.h          |  28 +-
 include/standard-headers/linux/const.h             |  18 +
 include/standard-headers/linux/ethtool.h           |  28 +-
 include/standard-headers/linux/input-event-codes.h |  13 +
 include/standard-headers/linux/pci_regs.h          |  71 +++-
 include/standard-headers/linux/typelimits.h        |   8 +
 include/standard-headers/linux/virtio_ring.h       |   5 +-
 include/standard-headers/linux/virtio_rtc.h        | 237 +++++++++++
 include/standard-headers/linux/vmclock-abi.h       |  20 +
 include/system/confidential-guest-support.h        |  14 +
 include/system/hostmem.h                           |   1 +
 include/system/kvm.h                               |   3 +-
 include/system/memory.h                            |   8 +-
 linux-headers/asm-arm64/kvm.h                      |   1 +
 linux-headers/asm-arm64/unistd_64.h                |   1 +
 linux-headers/asm-generic/unistd.h                 |   5 +-
 linux-headers/asm-loongarch/kvm.h                  |   5 +
 linux-headers/asm-loongarch/kvm_para.h             |   1 +
 linux-headers/asm-loongarch/unistd_64.h            |   2 +
 linux-headers/asm-mips/unistd_n32.h                |   1 +
 linux-headers/asm-mips/unistd_n64.h                |   1 +
 linux-headers/asm-mips/unistd_o32.h                |   1 +
 linux-headers/asm-powerpc/unistd_32.h              |   1 +
 linux-headers/asm-powerpc/unistd_64.h              |   1 +
 linux-headers/asm-riscv/kvm.h                      |  11 +-
 linux-headers/asm-riscv/ptrace.h                   |  37 ++
 linux-headers/asm-riscv/unistd_32.h                |   1 +
 linux-headers/asm-riscv/unistd_64.h                |   1 +
 linux-headers/asm-s390/unistd_32.h                 | 446 ---------------------
 linux-headers/asm-s390/unistd_64.h                 |   1 +
 linux-headers/asm-x86/kvm.h                        |  21 +-
 linux-headers/asm-x86/unistd_32.h                  |   1 +
 linux-headers/asm-x86/unistd_64.h                  |   1 +
 linux-headers/asm-x86/unistd_x32.h                 |   1 +
 linux-headers/linux/const.h                        |  18 +
 linux-headers/linux/iommufd.h                      |  48 +++
 linux-headers/linux/kvm.h                          |  62 ++-
 linux-headers/linux/mshv.h                         |   4 +-
 linux-headers/linux/psp-sev.h                      |   2 +-
 linux-headers/linux/stddef.h                       |   4 +
 linux-headers/linux/vduse.h                        |  85 +++-
 linux-headers/linux/vfio.h                         |  30 +-
 qapi/qom.json                                      |  35 +-
 qemu-options.hx                                    |   5 +
 system/memory.c                                    |  22 +-
 system/physmem.c                                   |  50 ++-
 target/i386/sev.c                                  |  12 +-
 52 files changed, 1253 insertions(+), 533 deletions(-)
 create mode 100644 backends/hostmem-guest-memfd.c
 create mode 100644 include/standard-headers/linux/typelimits.h
 create mode 100644 include/standard-headers/linux/virtio_rtc.h
 delete mode 100644 linux-headers/asm-s390/unistd_32.h


* [PATCH RFC 01/12] accel/kvm: Decouple guest_memfd checks from memory attribute checks
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
@ 2026-05-28  0:03 ` Michael Roth
  2026-05-28  0:03 ` [PATCH RFC 02/12] hostmem: Introduce dedicated memory backend for guest_memfd Michael Roth
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

Currently QEMU supports using guest_memfd internally (separately from
user-specified memory backends) to handle private memory for
confidential VMs, and as a result has checks for guest_memfd support
merged with checks to see if KVM can handle mapping private memory (as
determined by KVM_MEMORY_ATTRIBUTE_PRIVATE).

Future QEMU support will allow using guest_memfd not just for private
memory, but as mmap()'able memory that can be used by non-confidential
guests as well.

In prep for this, split the checks for guest_memfd out from the check
for KVM_MEMORY_ATTRIBUTE_PRIVATE, and rename the current
kvm_create_guest_memfd() to kvm_create_guest_memfd_private() to
self-document current behavior/expectations and disambiguate from future
helpers intended for creating a guest_memfd to handle non-private/shared
memory. While there, fix up the missing error_setg() handling in the
stub functions.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 accel/kvm/kvm-all.c     | 20 +++++++++++++++++---
 accel/stubs/kvm-stub.c  |  3 ++-
 include/system/kvm.h    |  2 +-
 include/system/memory.h |  5 +++--
 system/physmem.c        |  8 ++++----
 5 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 585f1cea35..02911ff6e3 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -795,6 +795,11 @@ static int kvm_mem_flags(MemoryRegion *mr)
     }
     if (memory_region_has_guest_memfd(mr)) {
         assert(kvm_guest_memfd_supported);
+        /*
+         * memory_region_has_guest_memfd() is specifically pertaining to
+         * using guest_memfd to handle private memory use cases.
+         */
+        assert(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE);
         flags |= KVM_MEM_GUEST_MEMFD;
     }
     return flags;
@@ -3066,8 +3071,7 @@ static int kvm_init(AccelState *as, MachineState *ms)
     kvm_supported_memory_attributes = kvm_vm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES);
     kvm_guest_memfd_supported =
         kvm_vm_check_extension(s, KVM_CAP_GUEST_MEMFD) &&
-        kvm_vm_check_extension(s, KVM_CAP_USER_MEMORY2) &&
-        (kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE);
+        kvm_vm_check_extension(s, KVM_CAP_USER_MEMORY2);
     kvm_pre_fault_memory_supported = kvm_vm_check_extension(s, KVM_CAP_PRE_FAULT_MEMORY);
 
     if (s->kernel_irqchip_split == ON_OFF_AUTO_AUTO) {
@@ -4854,7 +4858,7 @@ void kvm_mark_guest_state_protected(void)
     kvm_state->guest_state_protected = true;
 }
 
-int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
+static int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
 {
     int fd;
     struct kvm_create_guest_memfd guest_memfd = {
@@ -4875,3 +4879,13 @@ int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
 
     return fd;
 }
+
+int kvm_create_guest_memfd_private(uint64_t size, Error **errp)
+{
+    if (!(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+        error_setg(errp, "KVM does not support using guest_memfd for private memory");
+        return -1;
+    }
+
+    return kvm_create_guest_memfd(size, 0, errp);
+}
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index c4617caac6..1940bcbd2c 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -139,7 +139,8 @@ bool kvm_hwpoisoned_mem(void)
     return false;
 }
 
-int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
+int kvm_create_guest_memfd_private(uint64_t size, Error **errp)
 {
+    error_setg(errp, "guest_memfd is not supported for this configuration");
     return -ENOSYS;
 }
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 5fa33eddda..aeb0c7ca8f 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -561,7 +561,7 @@ void kvm_mark_guest_state_protected(void);
  */
 bool kvm_hwpoisoned_mem(void);
 
-int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
+int kvm_create_guest_memfd_private(uint64_t size, Error **errp);
 
 int kvm_set_memory_attributes_private(hwaddr start, uint64_t size);
 int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size);
diff --git a/include/system/memory.h b/include/system/memory.h
index 1417132f6d..24c68720aa 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -1745,9 +1745,10 @@ bool memory_region_is_protected(const MemoryRegion *mr);
 
 /**
  * memory_region_has_guest_memfd: check whether a memory region has guest_memfd
- *     associated
+ *     associated with it for handling private memory
  *
- * Returns %true if a memory region's ram_block has valid guest_memfd assigned.
+ * Returns %true if a memory region's ram_block has valid guest_memfd assigned
+ * for handling private memory.
  *
  * @mr: the memory region being queried
  */
diff --git a/system/physmem.c b/system/physmem.c
index 7bcbf87573..04c7c38721 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2202,8 +2202,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
             goto out_free;
         }
 
-        new_block->guest_memfd = kvm_create_guest_memfd(new_block->max_length,
-                                                        0, errp);
+        new_block->guest_memfd = kvm_create_guest_memfd_private(new_block->max_length,
+                                                                errp);
         if (new_block->guest_memfd < 0) {
             qemu_mutex_unlock_ramlist();
             goto out_free;
@@ -2835,8 +2835,8 @@ int ram_block_rebind(Error **errp)
             if (block->guest_memfd >= 0) {
                 close(block->guest_memfd);
             }
-            block->guest_memfd = kvm_create_guest_memfd(block->max_length,
-                                                        0, errp);
+            block->guest_memfd = kvm_create_guest_memfd_private(block->max_length,
+                                                                errp);
             if (block->guest_memfd < 0) {
                 qemu_mutex_unlock_ramlist();
                 return -1;
-- 
2.43.0



* [PATCH RFC 02/12] hostmem: Introduce dedicated memory backend for guest_memfd
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
  2026-05-28  0:03 ` [PATCH RFC 01/12] accel/kvm: Decouple guest_memfd checks from memory attribute checks Michael Roth
@ 2026-05-28  0:03 ` Michael Roth
  2026-06-02  8:22   ` Markus Armbruster
  2026-05-28  0:03 ` [PATCH RFC 04/12] accel/kvm: Add CGS option to control in-place conversion support Michael Roth
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

In the initial implementation of guest_memfd in the Linux kernel, it
was not possible to map memory into userspace for direct access; instead
the memory provided by the memory backend would be used for cases where
a confidential VM wants to access normal/unprotected/unencrypted memory
that can be used for shared memory use cases, and for access to private
memory a guest_memfd could be associated with the same memslot. A memory
'private' attribute set via KVM_SET_MEMORY_ATTRIBUTES could then be used
to have KVM route to the appropriate backing memory.

In that model, it didn't make sense to introduce a specific backend for
guest_memfd, since there was generally always a need to have a separate
backend type to handle shared memory access/allocation. Instead, QEMU
configures the guest_memfd support for the associated memslots
internally for cases where it is running a confidential VM.

However, with recent changes in guest_memfd kernel support, it is now
possible to mmap() a guest_memfd FD into userspace and use it for shared
memory, as well as continue to use the same physical pages for the same
GPA ranges after they are converted to private ("in-place conversion").

To allow this mmap()-able, guest_memfd-provided memory to be used for
normal/shared memory instead of just for private memory,
introduce a dedicated guest_memfd memory backend that can be used both
for confidential VMs that wish to make use of in-place conversion, as
well as for non-confidential VMs that just want to make use of
guest_memfd for normal memory (which can be useful both for testing as
well as a stepping stone to things like software-protected VMs where the
host can be trusted to provide some additional degree of isolation for
the VM independently of hardware support).

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 accel/kvm/kvm-all.c            | 15 ++++++
 accel/stubs/kvm-stub.c         |  6 +++
 backends/hostmem-guest-memfd.c | 92 ++++++++++++++++++++++++++++++++++
 backends/meson.build           |  1 +
 include/system/hostmem.h       |  1 +
 include/system/kvm.h           |  1 +
 qapi/qom.json                  | 19 ++++++-
 qemu-options.hx                |  5 ++
 8 files changed, 139 insertions(+), 1 deletion(-)
 create mode 100644 backends/hostmem-guest-memfd.c

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 02911ff6e3..e6ae2e8ced 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -108,6 +108,7 @@ static bool kvm_has_guest_debug;
 static int kvm_sstep_flags;
 static bool kvm_immediate_exit;
 static uint64_t kvm_supported_memory_attributes;
+static uint64_t kvm_supported_guest_memfd_flags;
 static bool kvm_guest_memfd_supported;
 static hwaddr kvm_max_slot_size = ~0;
 
@@ -3069,6 +3070,7 @@ static int kvm_init(AccelState *as, MachineState *ms)
     }
 
     kvm_supported_memory_attributes = kvm_vm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES);
+    kvm_supported_guest_memfd_flags = kvm_vm_check_extension(s, KVM_CAP_GUEST_MEMFD_FLAGS);
     kvm_guest_memfd_supported =
         kvm_vm_check_extension(s, KVM_CAP_GUEST_MEMFD) &&
         kvm_vm_check_extension(s, KVM_CAP_USER_MEMORY2);
@@ -4889,3 +4891,16 @@ int kvm_create_guest_memfd_private(uint64_t size, Error **errp)
 
     return kvm_create_guest_memfd(size, 0, errp);
 }
+
+int kvm_create_guest_memfd_shared(uint64_t size, Error **errp)
+{
+    if (!(kvm_supported_guest_memfd_flags & GUEST_MEMFD_FLAG_MMAP) ||
+        !(kvm_supported_guest_memfd_flags & GUEST_MEMFD_FLAG_INIT_SHARED)) {
+        error_setg(errp, "KVM does not support using guest_memfd for shared memory");
+        return -1;
+    }
+
+    return kvm_create_guest_memfd(size,
+                                  GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED,
+                                  errp);
+}
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 1940bcbd2c..e50329f26e 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -144,3 +144,9 @@ int kvm_create_guest_memfd_private(uint64_t size, Error **errp)
     error_setg(errp, "guest_memfd is not supported for this configuration");
     return -ENOSYS;
 }
+
+int kvm_create_guest_memfd_shared(uint64_t size, Error **errp)
+{
+    error_setg(errp, "guest_memfd is not supported for this configuration");
+    return -ENOSYS;
+}
diff --git a/backends/hostmem-guest-memfd.c b/backends/hostmem-guest-memfd.c
new file mode 100644
index 0000000000..deb796a6bd
--- /dev/null
+++ b/backends/hostmem-guest-memfd.c
@@ -0,0 +1,92 @@
+/*
+ * QEMU guest_memfd memory backend
+ *
+ * Copyright (C) 2026 Advanced Micro Devices, Inc.
+ *
+ * Authors:
+ *   Michael Roth <michael.roth@amd.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "system/hostmem.h"
+#include "qom/object_interfaces.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "migration/cpr.h"
+#include "system/kvm.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(HostMemoryBackendGuestMemfd, MEMORY_BACKEND_GUEST_MEMFD)
+
+struct HostMemoryBackendGuestMemfd {
+    HostMemoryBackend parent_obj;
+};
+
+static bool
+guest_memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
+{
+    g_autofree char *name = host_memory_backend_get_name(backend);
+    int fd = cpr_find_fd(name, 0);
+    uint32_t ram_flags;
+
+    if (!backend->size) {
+        error_setg(errp, "can't create backend with size 0");
+        return false;
+    }
+
+    if (!backend->share) {
+        error_setg(errp, "can't create backend with share=off");
+        return false;
+    }
+
+    if (fd >= 0) {
+        goto have_fd;
+    }
+
+    fd = kvm_create_guest_memfd_shared(backend->size, errp);
+    if (fd < 0) {
+        return false;
+    }
+    cpr_save_fd(name, 0, fd);
+
+have_fd:
+    backend->aligned = true;
+    ram_flags = backend->share ? RAM_SHARED : RAM_PRIVATE;
+    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
+    return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
+                                          backend->size, ram_flags, fd, 0, errp);
+}
+
+static void
+guest_memfd_backend_instance_init(Object *obj)
+{
+    HostMemoryBackendGuestMemfd *m = MEMORY_BACKEND_GUEST_MEMFD(obj);
+
+    MEMORY_BACKEND(m)->share = true;
+}
+
+static void
+guest_memfd_backend_class_init(ObjectClass *oc, const void *data)
+{
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+
+    bc->alloc = guest_memfd_backend_memory_alloc;
+}
+
+static const TypeInfo guest_memfd_backend_info = {
+    .name = TYPE_MEMORY_BACKEND_GUEST_MEMFD,
+    .parent = TYPE_MEMORY_BACKEND,
+    .instance_init = guest_memfd_backend_instance_init,
+    .class_init = guest_memfd_backend_class_init,
+    .instance_size = sizeof(HostMemoryBackendGuestMemfd),
+};
+
+static void register_types(void)
+{
+    type_register_static(&guest_memfd_backend_info);
+}
+
+type_init(register_types);
diff --git a/backends/meson.build b/backends/meson.build
index 60021f45d1..6c53f4a097 100644
--- a/backends/meson.build
+++ b/backends/meson.build
@@ -20,6 +20,7 @@ endif
 if host_os == 'linux'
   system_ss.add(files('hostmem-memfd.c'))
   system_ss.add(files('host_iommu_device.c'))
+  system_ss.add(files('hostmem-guest-memfd.c'))
 endif
 if keyutils.found()
     system_ss.add(keyutils, files('cryptodev-lkcf.c'))
diff --git a/include/system/hostmem.h b/include/system/hostmem.h
index 88fa791ac7..2d0c25a43e 100644
--- a/include/system/hostmem.h
+++ b/include/system/hostmem.h
@@ -41,6 +41,7 @@ OBJECT_DECLARE_TYPE(HostMemoryBackend, HostMemoryBackendClass,
 
 #define TYPE_MEMORY_BACKEND_MEMFD "memory-backend-memfd"
 
+#define TYPE_MEMORY_BACKEND_GUEST_MEMFD "memory-backend-guest-memfd"
 
 /**
  * HostMemoryBackendClass:
diff --git a/include/system/kvm.h b/include/system/kvm.h
index aeb0c7ca8f..b959a6d3df 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -562,6 +562,7 @@ void kvm_mark_guest_state_protected(void);
 bool kvm_hwpoisoned_mem(void);
 
 int kvm_create_guest_memfd_private(uint64_t size, Error **errp);
+int kvm_create_guest_memfd_shared(uint64_t size, Error **errp);
 
 int kvm_set_memory_attributes_private(hwaddr start, uint64_t size);
 int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size);
diff --git a/qapi/qom.json b/qapi/qom.json
index dd45ac1087..502fafeb15 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -661,7 +661,8 @@
 # @share: if false, the memory is private to QEMU; if true, it is
 #     shared (default false for backends memory-backend-file and
 #     memory-backend-ram, true for backends memory-backend-epc,
-#     memory-backend-memfd, and memory-backend-shm)
+#     memory-backend-memfd, memory-backend-shm, and
+#     memory-backend-guest-memfd)
 #
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 #     (default: true) (since 6.1)
@@ -780,6 +781,18 @@
             '*seal': 'bool' },
   'if': 'CONFIG_LINUX' }
 
+##
+# @MemoryBackendGuestMemfdProperties:
+#
+# Properties for memory-backend-guest-memfd objects.
+#
+# Since: 11.1
+##
+{ 'struct': 'MemoryBackendGuestMemfdProperties',
+  'base': 'MemoryBackendProperties',
+  'data': {},
+  'if': 'CONFIG_LINUX' }
+
 ##
 # @MemoryBackendShmProperties:
 #
@@ -1234,6 +1247,8 @@
     'memory-backend-file',
     { 'name': 'memory-backend-memfd',
       'if': 'CONFIG_LINUX' },
+    { 'name': 'memory-backend-guest-memfd',
+      'if': 'CONFIG_LINUX' },
     'memory-backend-ram',
     { 'name': 'memory-backend-shm',
       'if': 'CONFIG_POSIX' },
@@ -1312,6 +1327,8 @@
       'memory-backend-file':        'MemoryBackendFileProperties',
       'memory-backend-memfd':       { 'type': 'MemoryBackendMemfdProperties',
                                       'if': 'CONFIG_LINUX' },
+      'memory-backend-guest-memfd': { 'type': 'MemoryBackendGuestMemfdProperties',
+                                      'if': 'CONFIG_LINUX' },
       'memory-backend-ram':         'MemoryBackendProperties',
       'memory-backend-shm':         { 'type': 'MemoryBackendShmProperties',
                                       'if': 'CONFIG_POSIX' },
diff --git a/qemu-options.hx b/qemu-options.hx
index 96ae41f787..3c754c149f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -5858,6 +5858,11 @@ SRST
         off will cause a failure during allocation because it is not supported
         by this backend.
 
+    ``-object memory-backend-guest-memfd,id=id,prealloc=on|off,size=size,host-nodes=host-nodes,policy=default|preferred|bind|interleave``
+        Creates an anonymous memory file backend object that has similar
+        semantics to memfd, but is also usable as private memory when
+        running as a confidential VM. (Linux only)
+
     ``-object iommufd,id=id[,fd=fd]``
         Creates an iommufd backend which allows control of DMA mapping
         through the ``/dev/iommu`` device.
-- 
2.43.0



* [PATCH RFC 04/12] accel/kvm: Add CGS option to control in-place conversion support
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
  2026-05-28  0:03 ` [PATCH RFC 01/12] accel/kvm: Decouple guest_memfd checks from memory attribute checks Michael Roth
  2026-05-28  0:03 ` [PATCH RFC 02/12] hostmem: Introduce dedicated memory backend for guest_memfd Michael Roth
@ 2026-05-28  0:03 ` Michael Roth
  2026-06-02  8:23   ` Markus Armbruster
  2026-05-28  0:03 ` [PATCH RFC 05/12] system/memory: Re-use memory-backend-guest-memfd inode for private memory Michael Roth
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

For confidential guests, guest_memfd is currently used only for private
guest memory, and normal guest memory comes from the configured memory
backend just as it does for a non-confidential guest. It is now possible
to use the same physical memory to back a particular GPA regardless of
whether it is in a shared or private state. This avoids the need to
rely on discarding memory between shared/private conversions (to avoid
doubled memory usage). It is intended to be the primary mode of using
guest_memfd for confidential guests moving forward, and future features
like hugepage support will likely require it.

Add an option to enable this support. Since ConfidentialGuestSupport is
already used to track some guest_memfd-related functionality (e.g.
whether it is required for the configured machine), similarly introduce
this option as a property of ConfidentialGuestSupport.

Also add the KVM-specific checks to enable this support, but leave the
option disabled until other required changes are implemented for
CGS variants that intend to make use of KVM's in-place conversion
support.
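
As a rough illustration of the KVM-side check described above, here is
a minimal standalone sketch. It is not the QEMU code: struct sketch_cgs,
sketch_check_in_place() and SKETCH_ATTR_PRIVATE are hypothetical
stand-ins, and gmem_attrs models the bitmask that the capability query
would report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for KVM_MEMORY_ATTRIBUTE_PRIVATE; the value is an assumption. */
#define SKETCH_ATTR_PRIVATE (1ULL << 3)

/* Hypothetical, trimmed-down model of the CGS state added by this patch. */
struct sketch_cgs {
    bool convert_in_place;
};

/*
 * gmem_attrs models the attribute bitmask reported for guest_memfd.
 * Returns 0 on success, or -EINVAL when in-place conversion was
 * requested but private attributes cannot be set via guest_memfd.
 * *supported is only updated when in-place conversion is active.
 */
static int sketch_check_in_place(const struct sketch_cgs *cgs,
                                 uint64_t gmem_attrs, uint64_t *supported)
{
    if (!cgs || !cgs->convert_in_place) {
        return 0;
    }
    if (!(gmem_attrs & SKETCH_ATTR_PRIVATE)) {
        return -EINVAL;
    }
    *supported = gmem_attrs;
    return 0;
}

/* Returns 0 when all three cases behave as described above. */
static int sketch_check_demo(void)
{
    struct sketch_cgs on = { .convert_in_place = true };
    struct sketch_cgs off = { .convert_in_place = false };
    uint64_t supported = 0;

    if (sketch_check_in_place(&off, 0, &supported) != 0 || supported != 0) {
        return 1;
    }
    if (sketch_check_in_place(&on, 0, &supported) != -EINVAL) {
        return 2;
    }
    if (sketch_check_in_place(&on, SKETCH_ATTR_PRIVATE, &supported) != 0) {
        return 3;
    }
    return supported == SKETCH_ATTR_PRIVATE ? 0 : 4;
}
```

The point of the sketch is only that the check is skipped entirely when
the option is off, so existing configurations are unaffected.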

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 accel/kvm/kvm-all.c                         | 21 +++++++++++++++++
 backends/confidential-guest-support.c       | 25 +++++++++++++++++++++
 include/system/confidential-guest-support.h | 14 ++++++++++++
 qapi/qom.json                               | 16 +++++++++++++
 4 files changed, 76 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e6ae2e8ced..a1832712a4 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -52,6 +52,7 @@
 #include "kvm-cpus.h"
 #include "system/dirtylimit.h"
 #include "qemu/range.h"
+#include "system/confidential-guest-support.h"
 
 #include "hw/core/boards.h"
 #include "system/stats.h"
@@ -2901,6 +2902,7 @@ static int kvm_reset_vmfd(MachineState *ms)
 static int kvm_init(AccelState *as, MachineState *ms)
 {
     MachineClass *mc = MACHINE_GET_CLASS(ms);
+    ConfidentialGuestSupport *cgs = ms->cgs;
     static const char upgrade_note[] =
         "Please upgrade to at least kernel 4.5.\n";
     const struct {
@@ -3076,6 +3078,25 @@ static int kvm_init(AccelState *as, MachineState *ms)
         kvm_vm_check_extension(s, KVM_CAP_USER_MEMORY2);
     kvm_pre_fault_memory_supported = kvm_vm_check_extension(s, KVM_CAP_PRE_FAULT_MEMORY);
 
+    if (cgs && cgs->convert_in_place) {
+        uint64_t guest_memfd_supported_memory_attributes;
+
+        guest_memfd_supported_memory_attributes =
+            kvm_vm_check_extension(s, KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES);
+
+        if (!(guest_memfd_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+            ret = -EINVAL;
+            error_report("In-place conversion is only supported if private "
+                         "memory attributes can be set via guest_memfd. "
+                         "Please ensure the 'vm_memory_attributes' KVM module "
+                         "parameter is set to 0.");
+            goto err;
+        }
+
+        assert(kvm_guest_memfd_supported);
+        kvm_supported_memory_attributes = guest_memfd_supported_memory_attributes;
+    }
+
     if (s->kernel_irqchip_split == ON_OFF_AUTO_AUTO) {
         s->kernel_irqchip_split = mc->default_kernel_irqchip_split ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
     }
diff --git a/backends/confidential-guest-support.c b/backends/confidential-guest-support.c
index 156dd15e66..c89bcf3cb3 100644
--- a/backends/confidential-guest-support.c
+++ b/backends/confidential-guest-support.c
@@ -21,6 +21,24 @@ OBJECT_DEFINE_ABSTRACT_TYPE(ConfidentialGuestSupport,
                             CONFIDENTIAL_GUEST_SUPPORT,
                             OBJECT)
 
+static bool
+cgs_get_convert_in_place(Object *obj, Error **errp)
+{
+    return CONFIDENTIAL_GUEST_SUPPORT(obj)->convert_in_place;
+}
+
+static void
+cgs_set_convert_in_place(Object *obj, bool value, Error **errp)
+{
+    ConfidentialGuestSupport *cgs = CONFIDENTIAL_GUEST_SUPPORT(obj);
+
+    if (!cgs->allow_convert_in_place && value) {
+        error_setg(errp, "In-place conversion is not supported for this guest configuration");
+        return;
+    }
+    cgs->convert_in_place = value;
+}
+
 static bool check_support(ConfidentialGuestPlatformType platform,
                          uint16_t platform_version, uint8_t highest_vtl,
                          uint64_t shared_gpa_boundary)
@@ -70,6 +88,13 @@ static void confidential_guest_support_class_init(ObjectClass *oc,
 
 static void confidential_guest_support_init(Object *obj)
 {
+    ConfidentialGuestSupport *cgs = CONFIDENTIAL_GUEST_SUPPORT(obj);
+
+    object_property_add_bool(obj, "convert-in-place", cgs_get_convert_in_place,
+                             cgs_set_convert_in_place);
+
+    cgs->convert_in_place = false;
+    cgs->allow_convert_in_place = false;
 }
 
 static void confidential_guest_support_finalize(Object *obj)
diff --git a/include/system/confidential-guest-support.h b/include/system/confidential-guest-support.h
index 5dca717308..c1e9c41ad2 100644
--- a/include/system/confidential-guest-support.h
+++ b/include/system/confidential-guest-support.h
@@ -20,6 +20,7 @@
 
 #include "qom/object.h"
 #include "exec/hwaddr.h"
+#include "qapi/qapi-visit-qom.h"
 
 #define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support"
 OBJECT_DECLARE_TYPE(ConfidentialGuestSupport,
@@ -92,6 +93,19 @@ struct ConfidentialGuestSupport {
      * so 'ready' is not set, we'll abort.
      */
     bool ready;
+
+    /*
+     * True if the machine re-uses physical pages when converting
+     * between shared/private (as opposed to using different
+     * physical pages depending on the access type).
+     */
+    bool convert_in_place;
+
+    /*
+     * CGS implementations will use this to indicate whether or not
+     * in-place conversion can be enabled by users.
+     */
+    bool allow_convert_in_place;
 };
 
 typedef struct ConfidentialGuestSupportClass {
diff --git a/qapi/qom.json b/qapi/qom.json
index 502fafeb15..037c078799 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -1014,6 +1014,21 @@
   'if': 'CONFIG_IGVM',
   'data': { 'file': 'str' } }
 
+##
+# @ConfidentialGuestSupportProperties:
+#
+# Properties for ConfidentialGuestSupport base class.
+#
+# @convert-in-place: If true, the same physical pages are reused
+#     when memory is converted between shared and private states.
+#     If false (default), separate allocations are used depending
+#     on whether the page is private or shared.
+#
+# Since: 11.1
+##
+{ 'struct': 'ConfidentialGuestSupportProperties',
+  'data': { '*convert-in-place': 'bool' } }
+
 ##
 # @SevCommonProperties:
 #
@@ -1038,6 +1053,7 @@
 # Since: 9.1
 ##
 { 'struct': 'SevCommonProperties',
+  'base': 'ConfidentialGuestSupportProperties',
   'data': { '*sev-device': 'str',
             '*cbitpos': 'uint32',
             'reduced-phys-bits': 'uint32',
-- 
2.43.0



* [PATCH RFC 05/12] system/memory: Re-use memory-backend-guest-memfd inode for private memory
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
                   ` (2 preceding siblings ...)
  2026-05-28  0:03 ` [PATCH RFC 04/12] accel/kvm: Add CGS option to control in-place conversion support Michael Roth
@ 2026-05-28  0:03 ` Michael Roth
  2026-05-28  0:03 ` [PATCH RFC 06/12] system/memory: Default to guest_memfd for RAM for in-place conversion Michael Roth
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

When convert-in-place=true, the shared memory allocated/provided by the
guest-memfd memory backend should also be used internally for private
memory. Do this by dup()'ing the guest_memfd FD so separate cleanup
paths for shared vs. private FDs can be managed in the same way they are
currently for convert-in-place=false (where shared memory comes from
some other backend like memory-backend-memfd).

Since it only currently makes sense to allow a
memory-backend-guest-memfd FD to be used for private memory, introduce a
new RAM_GUEST_MEMFD_SHARED flag that can be used to limit dup()'ing to
specific backend types like memory-backend-guest-memfd.
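
To illustrate why dup()'ing is sufficient here, the following
standalone sketch shows that a duplicated descriptor refers to the same
inode and stays usable after the original is closed. It uses an
anonymous pipe as a stand-in for a guest_memfd, and the sketch_* helper
names are hypothetical, not QEMU functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both fds refer to the same inode, 0 if not, -1 on error. */
static int sketch_same_inode(int a, int b)
{
    struct stat sa, sb;

    if (fstat(a, &sa) < 0 || fstat(b, &sb) < 0) {
        return -1;
    }
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Duplicate fd with close-on-exec set; returns the new fd or -1. */
static int sketch_dup_cloexec(int fd)
{
    return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}

/* Returns 0 if the duplicate shares the inode and outlives the original. */
static int sketch_dup_demo(void)
{
    int fds[2];
    int shared_fd, private_fd, same;
    struct stat st;

    /* An anonymous pipe stands in for the guest_memfd in this sketch. */
    if (pipe(fds) < 0) {
        return -1;
    }
    shared_fd = fds[0];
    private_fd = sketch_dup_cloexec(shared_fd);
    if (private_fd < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    same = sketch_same_inode(shared_fd, private_fd);

    /* Independent cleanup: closing the "shared" fd leaves the dup valid. */
    close(shared_fd);
    if (same != 1 || fstat(private_fd, &st) < 0) {
        close(private_fd);
        close(fds[1]);
        return -1;
    }

    close(private_fd);
    close(fds[1]);
    return 0;
}
```

Because each descriptor can be closed on its own, the existing shared
and private cleanup paths do not need to know about each other.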

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 backends/hostmem-guest-memfd.c |  1 +
 include/system/memory.h        |  3 +++
 system/physmem.c               | 46 +++++++++++++++++++++++++++++++---
 3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/backends/hostmem-guest-memfd.c b/backends/hostmem-guest-memfd.c
index deb796a6bd..8ab8242892 100644
--- a/backends/hostmem-guest-memfd.c
+++ b/backends/hostmem-guest-memfd.c
@@ -56,6 +56,7 @@ have_fd:
     ram_flags = backend->share ? RAM_SHARED : RAM_PRIVATE;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
     ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
+    ram_flags |= RAM_GUEST_MEMFD_SHARED;
     return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
                                           backend->size, ram_flags, fd, 0, errp);
 }
diff --git a/include/system/memory.h b/include/system/memory.h
index 24c68720aa..0a371b686a 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -282,6 +282,9 @@ typedef struct IOMMUTLBEvent {
  */
 #define RAM_PRIVATE (1 << 13)
 
+/* RAM is backed by a KVM guest_memfd that is also usable for shared memory */
+#define RAM_GUEST_MEMFD_SHARED   (1 << 14)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
diff --git a/system/physmem.c b/system/physmem.c
index 04c7c38721..ebec7ae7a4 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -59,6 +59,7 @@
 #include "system/hostmem.h"
 #include "system/hw_accel.h"
 #include "system/xen-mapcache.h"
+#include "system/confidential-guest-support.h"
 #include "trace.h"
 
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
@@ -2187,11 +2188,14 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
     if (new_block->flags & RAM_GUEST_MEMFD) {
         int ret;
 
+        assert(current_machine->cgs);
+
         if (!kvm_enabled()) {
             error_setg(errp, "cannot set up private guest memory for %s: KVM required",
                        object_get_typename(OBJECT(current_machine->cgs)));
             goto out_free;
         }
+
         assert(new_block->guest_memfd < 0);
 
         ret = ram_block_coordinated_discard_require(true);
@@ -2202,8 +2206,38 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
             goto out_free;
         }
 
-        new_block->guest_memfd = kvm_create_guest_memfd_private(new_block->max_length,
-                                                                errp);
+        /*
+         * If both shared/private memory are handled by guest_memfd, make sure to
+         * re-use the guest_memfd inode that should have already been created for
+         * handling shared memory.
+         */
+        if (current_machine->cgs->convert_in_place) {
+            if (!(new_block->flags & RAM_GUEST_MEMFD_SHARED)) {
+                error_setg(errp, "configured memory backend is not compatible with in-place conversion");
+                qemu_mutex_unlock_ramlist();
+                goto out_free;
+            }
+            assert(new_block->fd >= 0);
+
+            /*
+             * Current logic calculates guest_memfd_offset on the assumption that
+             * offset 0 corresponds to the first GPA that is backed by the RAM
+             * block/backend. For cases where the guest_memfd is only used for
+             * private memory and created internally as-needed this is always the
+             * case, but when re-using a guest_memfd that's also usable for shared
+             * memory (e.g. via memory-backend-guest-memfd) it's possible that
+             * guest_memfd might be mmap()'d starting at some non-zero offset. For
+             * now, this isn't a reachable condition, but assert this in case this
+             * ever changes and the logic needs to be updated to account for this.
+             */
+            assert(new_block->fd_offset == 0);
+
+            new_block->guest_memfd = qemu_dup(new_block->fd);
+        } else {
+            new_block->guest_memfd =
+                kvm_create_guest_memfd_private(new_block->max_length, errp);
+        }
+
         if (new_block->guest_memfd < 0) {
             qemu_mutex_unlock_ramlist();
             goto out_free;
@@ -2315,7 +2349,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, ram_addr_t max_size,
     assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
                           RAM_PROTECTED | RAM_NAMED_FILE | RAM_READONLY |
                           RAM_READONLY_FD | RAM_GUEST_MEMFD |
-                          RAM_RESIZEABLE)) == 0);
+                          RAM_RESIZEABLE | RAM_GUEST_MEMFD_SHARED)) == 0);
     assert(max_size >= size);
 
     if (xen_enabled()) {
@@ -2828,6 +2862,12 @@ int ram_block_rebind(Error **errp)
 {
     RAMBlock *block;
 
+    if (current_machine->cgs && current_machine->cgs->convert_in_place) {
+        error_setg(errp,
+                   "reset support is not yet enabled for in-place conversion");
+        return -1;
+    }
+
     qemu_mutex_lock_ramlist();
 
     RAMBLOCK_FOREACH(block) {
-- 
2.43.0



* [PATCH RFC 06/12] system/memory: Default to guest_memfd for RAM for in-place conversion
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
                   ` (3 preceding siblings ...)
  2026-05-28  0:03 ` [PATCH RFC 05/12] system/memory: Re-use memory-backend-guest-memfd inode for private memory Michael Roth
@ 2026-05-28  0:03 ` Michael Roth
  2026-05-28  0:03 ` [PATCH RFC 07/12] accel/kvm: Move post-conversion updates to a separate helper Michael Roth
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

memory_region_init_ram_guest_memfd() is called in some cases (legacy
BIOS regions / IGVM regions) to allocate a new RAM region with a
guest_memfd FD under the covers to handle private memory since the GPA
range can be converted between shared/private guest RAM.

When in-place conversion is enabled, the conversions happen with the
guest_memfd inode itself, so the same inode must be used for both shared
and private memory. Handle this accordingly when convert-in-place=true.
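
The choice between the two allocation paths can be sketched as below.
This is only a model: the SKETCH_* flag values and the sketch_plan
structure are assumptions made for illustration, and no memory is
actually allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits; the values are assumptions for this sketch. */
#define SKETCH_RAM_SHARED             (1u << 1)
#define SKETCH_RAM_GUEST_MEMFD        (1u << 12)
#define SKETCH_RAM_GUEST_MEMFD_SHARED (1u << 14)

enum sketch_alloc_path {
    /* Anonymous RAM for shared, separate guest_memfd for private. */
    SKETCH_ALLOC_SEPARATE,
    /* One mmap()-able guest_memfd backing both shared and private. */
    SKETCH_ALLOC_FROM_GMEM_FD,
};

struct sketch_plan {
    enum sketch_alloc_path path;
    uint32_t ram_flags;
};

/* Pick the allocation path and RAM flags for a guest_memfd region. */
static struct sketch_plan sketch_plan_region(bool convert_in_place)
{
    struct sketch_plan plan;

    if (convert_in_place) {
        plan.path = SKETCH_ALLOC_FROM_GMEM_FD;
        plan.ram_flags = SKETCH_RAM_SHARED | SKETCH_RAM_GUEST_MEMFD |
                         SKETCH_RAM_GUEST_MEMFD_SHARED;
    } else {
        plan.path = SKETCH_ALLOC_SEPARATE;
        plan.ram_flags = SKETCH_RAM_GUEST_MEMFD;
    }
    return plan;
}
```

With convert_in_place set, the region carries the extra "shared" marker
so that the RAM block setup can re-use the same FD for private memory.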

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 system/memory.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/system/memory.c b/system/memory.c
index 739ba11da6..f6c695fd23 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -35,6 +35,7 @@
 #include "hw/core/boards.h"
 #include "migration/vmstate.h"
 #include "system/address-spaces.h"
+#include "system/confidential-guest-support.h"
 
 #include "memory-internal.h"
 
@@ -3674,10 +3675,25 @@ bool memory_region_init_ram_guest_memfd(MemoryRegion *mr, Object *owner,
                                         const char *name, uint64_t size,
                                         Error **errp)
 {
-    if (!memory_region_init_ram_flags_nomigrate(mr, owner, name, size,
-                                                RAM_GUEST_MEMFD, errp)) {
-        return false;
+    if (current_machine->cgs && current_machine->cgs->convert_in_place) {
+        int fd = kvm_create_guest_memfd_shared(size, errp);
+        if (fd < 0) {
+            return false;
+        }
+
+        if (!memory_region_init_ram_from_fd(mr, owner, name, size,
+                                            RAM_SHARED | RAM_GUEST_MEMFD |
+                                            RAM_GUEST_MEMFD_SHARED,
+                                            fd, 0, errp)) {
+            return false;
+        }
+    } else {
+        if (!memory_region_init_ram_flags_nomigrate(mr, owner, name, size,
+                                                    RAM_GUEST_MEMFD, errp)) {
+            return false;
+        }
     }
+
     memory_region_register_ram(mr, owner);
     return true;
 }
-- 
2.43.0



* [PATCH RFC 07/12] accel/kvm: Move post-conversion updates to a separate helper
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
                   ` (4 preceding siblings ...)
  2026-05-28  0:03 ` [PATCH RFC 06/12] system/memory: Default to guest_memfd for RAM for in-place conversion Michael Roth
@ 2026-05-28  0:03 ` Michael Roth
  2026-05-28  0:03 ` [PATCH RFC 08/12] accel/kvm: Re-order attribute notifications for in-place conversion Michael Roth
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

Currently memory attribute conversions are followed up by other
bookkeeping tasks like discarding unused memory or issuing iommufd
notifications. Move these tasks to a separate post-conversion helper to
better compartmentalize and track these tasks, and in doing so lay the
groundwork for a pre-conversion helper which will be needed in the
future.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 accel/kvm/kvm-all.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a1832712a4..0e6ff2de4b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3445,20 +3445,26 @@ static int kvm_convert_section(MemoryRegionSection *section, bool to_private)
 {
     hwaddr start = section->offset_within_address_space;
     hwaddr size = int128_get64(section->size);
-    MemoryRegion *mr = section->mr;
-    ram_addr_t offset;
-    RAMBlock *rb;
-    void *addr;
-    int ret = -EINVAL;
+    int ret;
 
     if (to_private) {
         ret = kvm_set_memory_attributes_private(start, size);
     } else {
         ret = kvm_set_memory_attributes_shared(start, size);
     }
-    if (ret) {
-        return ret;
-    }
+
+    return ret;
+}
+
+static int kvm_post_convert_section(MemoryRegionSection *section, bool to_private)
+{
+    hwaddr start = section->offset_within_address_space;
+    hwaddr size = int128_get64(section->size);
+    MemoryRegion *mr = section->mr;
+    ram_addr_t offset;
+    RAMBlock *rb;
+    void *addr;
+    int ret;
 
     addr = memory_region_get_ram_ptr(mr) + section->offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);
@@ -3485,7 +3491,7 @@ static int kvm_convert_section(MemoryRegionSection *section, bool to_private)
         ret = ram_block_discard_guest_memfd_range(rb, offset, size);
     }
 
-    return ret;
+    return 0;
 }
 
 int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
@@ -3533,6 +3539,12 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
         }
 
         ret = kvm_convert_section(&section, to_private);
+        if (ret) {
+            memory_region_unref(section.mr);
+            break;
+        }
+
+        ret = kvm_post_convert_section(&section, to_private);
         memory_region_unref(section.mr);
 
         if (ret) {
-- 
2.43.0



* [PATCH RFC 08/12] accel/kvm: Re-order attribute notifications for in-place conversion
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
                   ` (5 preceding siblings ...)
  2026-05-28  0:03 ` [PATCH RFC 07/12] accel/kvm: Move post-conversion updates to a separate helper Michael Roth
@ 2026-05-28  0:03 ` Michael Roth
  2026-05-28  0:03 ` [PATCH RFC 09/12] accel/kvm: Support shared/private conversions via guest_memfd ioctls Michael Roth
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

ram-block-attribute update notifications are currently sent after
conversions from/to private pages to trigger DMA maps/unmaps of shared
GPA ranges (respectively). However, with in-place conversion additional
requirements on the kernel side come into play which require this
behavior to be adjusted.

For shared->private conversions: the attributes need to be set to
private *after* the notification, since when using VFIO it may not be
possible to update the attribute while it remains pinned due to the
IOMMU mapping, so issue the notification first to ensure unmappings are
done in advance.

For private->shared conversions: the attributes need to be set to shared
*before* the notification, since it will possibly result in the page
being mapped into an IOMMU and trigger guest_memfd's fault handler,
which expects the page's attributes to be set to shared and otherwise
raises SIGBUS.

Implement this to enable passthrough support for CoCo guests with
in-place conversion enabled. For non-in-place conversion, pages
mapped into the IOMMU are not the same physical pages as the ones used
for private accesses by the guest, so neither order risks DMA accesses
to private memory and that path can be consolidated to use the same
handling as well.
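
The resulting ordering can be modelled with a small standalone sketch
that only records which step runs when. The sketch_* names are
hypothetical, and discards and error handling are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_STEPS 4

/* Records the order in which the conversion steps were executed. */
struct sketch_trace {
    const char *steps[SKETCH_MAX_STEPS];
    int count;
};

static void sketch_record(struct sketch_trace *t, const char *step)
{
    if (t->count < SKETCH_MAX_STEPS) {
        t->steps[t->count++] = step;
    }
}

/* shared->private: notify first so IOMMU unmaps happen in advance. */
static void sketch_pre_convert(struct sketch_trace *t, bool to_private)
{
    if (to_private) {
        sketch_record(t, "notify");
    }
}

/* The attribute update itself, in either direction. */
static void sketch_convert(struct sketch_trace *t)
{
    sketch_record(t, "set-attributes");
}

/* private->shared: notify last so mappings see shared attributes. */
static void sketch_post_convert(struct sketch_trace *t, bool to_private)
{
    if (!to_private) {
        sketch_record(t, "notify");
    }
}

static void sketch_convert_memory(struct sketch_trace *t, bool to_private)
{
    sketch_pre_convert(t, to_private);
    sketch_convert(t);
    sketch_post_convert(t, to_private);
}

/* Returns 1 if exactly the two named steps ran in the given order. */
static int sketch_order_is(bool to_private, const char *first,
                           const char *second)
{
    struct sketch_trace t = { .count = 0 };

    sketch_convert_memory(&t, to_private);
    return t.count == 2 &&
           strcmp(t.steps[0], first) == 0 &&
           strcmp(t.steps[1], second) == 0;
}
```

In both directions exactly one notification is issued per section; only
its position relative to the attribute update changes.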

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 accel/kvm/kvm-all.c | 70 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 63 insertions(+), 7 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0e6ff2de4b..62f2e8aa15 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3456,6 +3456,47 @@ static int kvm_convert_section(MemoryRegionSection *section, bool to_private)
     return ret;
 }
 
+static int kvm_pre_convert_section(MemoryRegionSection *section, bool to_private)
+{
+    hwaddr start = section->offset_within_address_space;
+    hwaddr size = int128_get64(section->size);
+    MemoryRegion *mr = section->mr;
+    ram_addr_t offset;
+    RAMBlock *rb;
+    void *addr;
+    int ret;
+
+    addr = memory_region_get_ram_ptr(mr) + section->offset_within_region;
+    rb = qemu_ram_block_from_host(addr, false, &offset);
+
+    /*
+     * The attributes need to be set to private *after* the notification
+     * of a shared->private conversion, since when using VFIO it may not
+     * be possible to update the attribute while it remains pinned due
+     * to the IOMMU mapping, so issue the notification first to ensure
+     * unmappings are done in advance.
+     *
+     * There is an asymmetry here in that if the subsequent memory
+     * attribute update fails, this notification is out of sync with the
+     * state as tracked by guest_memfd, which isn't ideal, but memory
+     * attribute failures are not expected to be recoverable anyway, so
+     * it would be a waste of time to roll back the notification and
+     * re-trigger things like mapping the page via iommufd.
+     */
+    if (to_private) {
+        ret = ram_block_attributes_state_change(rb->attributes,
+                                                offset, size, to_private);
+        if (ret) {
+            error_report("Failed to notify the listener the state change of "
+                         "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s, ret %d",
+                         start, size, to_private ? "private" : "shared", ret);
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 static int kvm_post_convert_section(MemoryRegionSection *section, bool to_private)
 {
     hwaddr start = section->offset_within_address_space;
@@ -3469,13 +3510,22 @@ static int kvm_post_convert_section(MemoryRegionSection *section, bool to_privat
     addr = memory_region_get_ram_ptr(mr) + section->offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);
 
-    ret = ram_block_attributes_state_change(rb->attributes,
-                                            offset, size, to_private);
-    if (ret) {
-        error_report("Failed to notify the listener the state change of "
-                     "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s, ret %d",
-                     start, size, to_private ? "private" : "shared", ret);
-        return ret;
+    /*
+     * The attributes need to have been set to shared *before* the notification
+     * of a private->shared conversion, since it will possibly result in the
+     * page being mapped into an IOMMU when using VFIO and trigger
+     * guest_memfd's fault handler, which will expect the page to have its
+     * attributes set to shared.
+     */
+    if (!to_private) {
+        ret = ram_block_attributes_state_change(rb->attributes,
+                                                offset, size, to_private);
+        if (ret) {
+            error_report("Failed to notify the listener the state change of "
+                         "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s, ret %d",
+                         start, size, to_private ? "private" : "shared", ret);
+            return ret;
+        }
     }
 
     if (to_private) {
@@ -3538,6 +3588,12 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
             continue;
         }
 
+        ret = kvm_pre_convert_section(&section, to_private);
+        if (ret) {
+            memory_region_unref(section.mr);
+            break;
+        }
+
         ret = kvm_convert_section(&section, to_private);
         if (ret) {
             memory_region_unref(section.mr);
-- 
2.43.0



* [PATCH RFC 09/12] accel/kvm: Support shared/private conversions via guest_memfd ioctls
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
                   ` (6 preceding siblings ...)
  2026-05-28  0:03 ` [PATCH RFC 08/12] accel/kvm: Re-order attribute notifications for in-place conversion Michael Roth
@ 2026-05-28  0:03 ` Michael Roth
  2026-06-04 13:19   ` Gupta, Pankaj
  2026-05-28  0:03 ` [PATCH RFC 10/12] accel/kvm: Don't default to private attributes for in-place conversion Michael Roth
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

When using guest_memfd with support for shared memory / in-place
conversion, it is necessary to use the guest_memfd ioctls to handle
conversions instead of KVM ioctls. Implement support for this by looping
through all the sections within a conversion range. Implement everything
in terms of the kvm_convert_memory() loop, which already deals with some
special considerations regarding various holes / region types that might
be encountered.

Also update kvm_set_memory_attributes_*() to use the same common path
when convert-in-place=false. This potentially results in a small change
in behavior due to the additional MMIO checks/skips now being applied in
that case (generally QEMU-triggered during setup) rather than only for
kvm_convert_memory() (generally guest-triggered), but this is arguably
safer, and it keeps behavior consistent between convert-in-place=false
and convert-in-place=true. The latter *must* skip MMIO holes because
the regions (and associated guest_memfds) themselves track
shared/private state internally, so passing the whole conversion range
through to KVM is not an option in that case.
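
The retry behavior used for the guest_memfd attribute ioctl can be
sketched in isolation as follows. The ioctl is replaced by a
hypothetical callback (sketch_op_fn) so the example runs anywhere, and
the "flaky" operation is made up purely to exercise the loop.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for an ioctl-style call: returns -1 and sets errno on error. */
typedef int (*sketch_op_fn)(void *opaque);

/* Convert the -1/errno convention into a negative errno return value. */
static int sketch_call_neg_errno(sketch_op_fn op, void *opaque)
{
    int ret = op(opaque);

    if (ret == -1) {
        ret = -errno;
    }
    return ret;
}

/* Retry for as long as the operation reports a transient -EAGAIN. */
static int sketch_retry_eagain(sketch_op_fn op, void *opaque)
{
    int r;

    do {
        r = sketch_call_neg_errno(op, opaque);
    } while (r == -EAGAIN);

    return r;
}

/* Hypothetical operation that is busy a fixed number of times. */
struct sketch_flaky {
    int failures_left;
    int calls;
};

static int sketch_flaky_op(void *opaque)
{
    struct sketch_flaky *f = opaque;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}

/* Returns the number of calls needed on success, or a negative errno. */
static int sketch_retry_demo(void)
{
    struct sketch_flaky f = { .failures_left = 2, .calls = 0 };
    int r = sketch_retry_eagain(sketch_flaky_op, &f);

    return r == 0 ? f.calls : r;
}
```

Note that the loop has no upper bound, matching the assumption stated
in the patch that only transient conditions cause -EAGAIN.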

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 accel/kvm/kvm-all.c | 131 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 114 insertions(+), 17 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 62f2e8aa15..fd01435a0f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1626,14 +1626,78 @@ static int kvm_set_memory_attributes(hwaddr start, uint64_t size, uint64_t attr)
     return r;
 }
 
-int kvm_set_memory_attributes_private(hwaddr start, uint64_t size)
+static int kvm_gmem_ioctl(int guest_memfd, unsigned long type, ...)
 {
-    return kvm_set_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
+    int ret;
+    void *arg;
+    va_list ap;
+
+    va_start(ap, type);
+    arg = va_arg(ap, void *);
+    va_end(ap);
+
+    ret = ioctl(guest_memfd, type, arg);
+    if (ret == -1) {
+        ret = -errno;
+    }
+    return ret;
 }
 
-int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size)
+static int guest_memfd_set_memory_attributes_fd(int guest_memfd, hwaddr offset,
+                                                uint64_t size, uint64_t attr)
 {
-    return kvm_set_memory_attributes(start, size, 0);
+    struct kvm_memory_attributes2 attrs;
+    int r;
+
+    assert((attr & kvm_supported_memory_attributes) == attr);
+    attrs.attributes = attr;
+    attrs.offset = offset;
+    attrs.size = size;
+    attrs.flags = 0;
+
+    /*
+     * guest_memfd may need to delay conversion requests due to
+     * the memory being in-use by the kernel. In most cases these
+     * will be transient uses. In some cases, userspace itself may
+     * be the cause of the memory being considered in-use, though
+     * QEMU currently takes steps to avoid this (e.g. via
+     * RamBlockAttributes). On that basis, this code loops
+     * indefinitely with the assumption that only transient cases
+     * will block, and that those will be for relatively short
+     * periods vs. the overall conversion path.
+     * If those assumptions at some point prove false, most likely
+     * this will manifest as guest-side lockups on their conversion
+     * path, which seems like the appropriate way to surface this
+     * situation to the guest owner rather than some hard timeout.
+     */
+    do {
+        r = kvm_gmem_ioctl(guest_memfd, KVM_SET_MEMORY_ATTRIBUTES2, &attrs);
+    } while (r == -EAGAIN);
+
+    if (r) {
+        error_report("failed to set memory (0x%" HWADDR_PRIx "+0x%" PRIx64 ") "
+                     "with attr 0x%" PRIx64 " error '%s'",
+                     offset, size, attr, strerror(-r));
+    }
+    return r;
+}
+
+static int guest_memfd_set_memory_section_attributes(MemoryRegionSection *section, uint64_t attr)
+{
+    hwaddr convert_offset, convert_size;
+    MemoryRegion *mr = section->mr;
+    RAMBlock *rb;
+
+    assert(mr);
+    rb = mr->ram_block;
+    assert(rb->guest_memfd >= 0);
+    convert_offset = section->offset_within_region;
+    convert_size = int128_get64(section->size);
+
+    return guest_memfd_set_memory_attributes_fd(rb->guest_memfd,
+                                                convert_offset,
+                                                convert_size,
+                                                attr);
 }
 
 /* Called with KVMMemoryListener.slots_lock held */
@@ -3447,10 +3511,18 @@ static int kvm_convert_section(MemoryRegionSection *section, bool to_private)
     hwaddr size = int128_get64(section->size);
     int ret;
 
-    if (to_private) {
-        ret = kvm_set_memory_attributes_private(start, size);
+    if (current_machine->cgs && current_machine->cgs->convert_in_place) {
+        ret = guest_memfd_set_memory_section_attributes(section,
+                                                        to_private ? KVM_MEMORY_ATTRIBUTE_PRIVATE
+                                                                   : 0);
     } else {
-        ret = kvm_set_memory_attributes_shared(start, size);
+        /*
+         * Without in-place conversion, attribute-tracking is handled by KVM
+         * across all guest memory rather than on a per-section/slot basis.
+         */
+        ret = kvm_set_memory_attributes(start, size,
+                                        to_private ? KVM_MEMORY_ATTRIBUTE_PRIVATE
+                                                   : 0);
     }
 
     return ret;
@@ -3544,7 +3616,8 @@ static int kvm_post_convert_section(MemoryRegionSection *section, bool to_privat
     return 0;
 }
 
-int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
+static int kvm_convert_memory_full(hwaddr start, hwaddr size, bool to_private,
+                                   bool pre_hooks, bool post_hooks)
 {
     int ret = -EINVAL;
 
@@ -3588,10 +3661,12 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
             continue;
         }
 
-        ret = kvm_pre_convert_section(&section, to_private);
-        if (ret) {
-            memory_region_unref(section.mr);
-            break;
+        if (pre_hooks) {
+            ret = kvm_pre_convert_section(&section, to_private);
+            if (ret) {
+                memory_region_unref(section.mr);
+                break;
+            }
         }
 
         ret = kvm_convert_section(&section, to_private);
@@ -3600,13 +3675,15 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
             break;
         }
 
-        ret = kvm_post_convert_section(&section, to_private);
-        memory_region_unref(section.mr);
-
-        if (ret) {
-            break;
+        if (post_hooks) {
+            ret = kvm_post_convert_section(&section, to_private);
+            if (ret) {
+                memory_region_unref(section.mr);
+                break;
+            }
         }
 
+        memory_region_unref(section.mr);
         size -= section_end - start;
         start = section_end;
     }
@@ -3614,6 +3691,26 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     return ret;
 }
 
+int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
+{
+    return kvm_convert_memory_full(start, size, to_private, true, true);
+}
+
+static int kvm_convert_memory_attributes(hwaddr start, hwaddr size, bool to_private)
+{
+    return kvm_convert_memory_full(start, size, to_private, false, false);
+}
+
+int kvm_set_memory_attributes_private(hwaddr start, uint64_t size)
+{
+    return kvm_convert_memory_attributes(start, size, true);
+}
+
+int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size)
+{
+    return kvm_convert_memory_attributes(start, size, false);
+}
+
 int kvm_cpu_exec(CPUState *cpu)
 {
     struct kvm_run *run = cpu->kvm_run;
-- 
2.43.0



* [PATCH RFC 10/12] accel/kvm: Don't default to private attributes for in-place conversion
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
                   ` (7 preceding siblings ...)
  2026-05-28  0:03 ` [PATCH RFC 09/12] accel/kvm: Support shared/private conversions via guest_memfd ioctls Michael Roth
@ 2026-05-28  0:03 ` Michael Roth
  2026-05-28  0:03 ` [PATCH RFC 11/12] i386/sev: Update SNP_LAUNCH_UPDATE " Michael Roth
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

Without in-place conversion, QEMU can still access shared memory to load
initial state into guest memory prior to launch even if the GPA's memory
attributes default to private, since userspace is accessing a completely
separate pool of memory. With in-place conversion, each range touched by
such an access would first need to be converted to shared, then back to
private, since all of the memory comes from guest_memfd and userspace can
only access memory that is in the shared state.

To avoid sprinkling these differences in behavior throughout QEMU when
in-place conversion is enabled, just default to shared. This does not
compromise guest security, since Confidential VMs necessarily enforce
the private/shared state of their memory via trusted entities, and
simply generate implicit page state changes if their default
expectations don't match KVM's. In most cases, however, a guest will
explicitly convert memory to a particular state before actually using
it, so even these implicit conversion requests should be rare.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
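Illustration only, not part of the patch: a minimal standalone sketch of
the default-attribute decision described above. The sketch_* names are
hypothetical stand-ins for QEMU's ConfidentialGuestSupport state and for
memory_region_has_guest_memfd(); only the KVM_MEMORY_ATTRIBUTE_PRIVATE
bit value is taken from the kernel UAPI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as KVM_MEMORY_ATTRIBUTE_PRIVATE in <linux/kvm.h> (bit 3). */
#define SKETCH_ATTR_PRIVATE (UINT64_C(1) << 3)

/* Hypothetical stand-in for the one CGS field this decision depends on. */
struct sketch_cgs {
    bool convert_in_place;
};

/*
 * Return the attribute a newly registered slot starts with.
 * 'cgs' may be NULL, which models a machine without CGS configured.
 */
static uint64_t sketch_default_attr(bool has_guest_memfd,
                                    const struct sketch_cgs *cgs)
{
    bool in_place = cgs && cgs->convert_in_place;

    /* Legacy mode: guest_memfd-backed slots default to private. */
    if (has_guest_memfd && !in_place) {
        return SKETCH_ATTR_PRIVATE;
    }
    /* In-place mode (or no guest_memfd): leave the range shared. */
    return 0;
}
```

With this shape, the in-place case needs no shared/private round trip
when QEMU loads initial state, since the range is already shared.
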
 accel/kvm/kvm-all.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index fd01435a0f..c3d399517d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1808,7 +1808,26 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
             abort();
         }
 
-        if (memory_region_has_guest_memfd(mr)) {
+        /*
+         * Without in-place conversion, QEMU can still access shared memory
+         * to load initial state into guest memory prior to launch even if
+         * the GPA's memory attributes default to private, since userspace
+         * is accessing a completely separate pool of memory. With in-place
+         * conversion, each accessed range would first need to be converted
+         * to shared, then back to private, since the memory all comes from
+         * guest_memfd and only shared memory can be accessed by userspace.
+         *
+         * To avoid sprinkling these differences in behavior throughout QEMU
+         * when in-place conversion is enabled, just default to shared. This
+         * does not compromise guest security, since Confidential VMs
+         * enforce page state via trusted entities, and simply generate
+         * implicit page state changes if their default expectations don't
+         * match KVM's. However, in most cases a guest will explicitly
+         * convert memory to a particular state before actually using it, so
+         * even these implicit conversion requests should be rare.
+         */
+        if (memory_region_has_guest_memfd(mr) &&
+            !(current_machine->cgs && current_machine->cgs->convert_in_place)) {
             err = kvm_set_memory_attributes_private(start_addr, slot_size);
             if (err) {
                 error_report("%s: failed to set memory attribute private: %s",
-- 
2.43.0



* [PATCH RFC 11/12] i386/sev: Update SNP_LAUNCH_UPDATE for in-place conversion
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
                   ` (8 preceding siblings ...)
  2026-05-28  0:03 ` [PATCH RFC 10/12] accel/kvm: Don't default to private attributes for in-place conversion Michael Roth
@ 2026-05-28  0:03 ` Michael Roth
  2026-05-28  0:03 ` [PATCH RFC 12/12] i386/sev: Allow in-place conversion for SEV-SNP guests Michael Roth
  2026-05-28  5:44 ` [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Xiaoyao Li
  11 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

For in-place conversion, the source pointer is expected to be NULL since
the data has already been written directly to guest memory and doesn't
need to be copied in before being encrypted in place as part of the
initial guest memory payload.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
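Illustration only, not part of the patch: a minimal standalone sketch of
how the update parameters differ between the two modes. The struct is a
hypothetical subset of struct kvm_sev_snp_launch_update, and the 4 KiB
page shift is an assumption standing in for TARGET_PAGE_BITS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12 /* assumption: 4 KiB target pages */

/* Hypothetical subset of the fields passed to SNP_LAUNCH_UPDATE. */
struct sketch_launch_update {
    uint64_t gfn_start;
    uint64_t uaddr;
    uint64_t len;
};

static struct sketch_launch_update
sketch_fill_update(const void *hva, uint64_t gpa, uint64_t len,
                   bool convert_in_place)
{
    struct sketch_launch_update update = {0};

    /*
     * Legacy mode: the payload lives in a separate userspace buffer and
     * must be copied in, so pass its address. In-place mode: the payload
     * is already in guest memory, so the source pointer stays NULL (0).
     */
    if (!convert_in_place) {
        update.uaddr = (uint64_t)(uintptr_t)hva;
    }
    update.gfn_start = gpa >> SKETCH_PAGE_BITS;
    update.len = len;
    return update;
}
```
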
 target/i386/sev.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index b44b5a1c2b..32a5e605bf 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1186,6 +1186,8 @@ sev_snp_launch_update(SevSnpGuestState *sev_snp_guest,
     int ret, fw_error;
     SnpCpuidInfo snp_cpuid_info;
     struct kvm_sev_snp_launch_update update = {0};
+    ConfidentialGuestSupport *cgs =
+        CONFIDENTIAL_GUEST_SUPPORT(OBJECT(sev_snp_guest));
 
     if (!data->hva || !data->len) {
         error_report("SNP_LAUNCH_UPDATE called with invalid address"
@@ -1199,7 +1201,14 @@ sev_snp_launch_update(SevSnpGuestState *sev_snp_guest,
         memcpy(&snp_cpuid_info, data->hva, sizeof(snp_cpuid_info));
     }
 
-    update.uaddr = (__u64)(unsigned long)data->hva;
+    /*
+     * For in-place conversion, the source pointer is expected to be NULL
+     * since the data has already been written directly to guest memory
+     * and only needs to be encrypted in-place for secure access.
+     */
+    if (!cgs->convert_in_place) {
+        update.uaddr = (__u64)(unsigned long)data->hva;
+    }
     update.gfn_start = data->gpa >> TARGET_PAGE_BITS;
     update.len = data->len;
     update.type = data->type;
-- 
2.43.0



* [PATCH RFC 12/12] i386/sev: Allow in-place conversion for SEV-SNP guests
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
                   ` (9 preceding siblings ...)
  2026-05-28  0:03 ` [PATCH RFC 11/12] i386/sev: Update SNP_LAUNCH_UPDATE " Michael Roth
@ 2026-05-28  0:03 ` Michael Roth
  2026-05-28  5:44 ` [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Xiaoyao Li
  11 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-05-28  0:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

All the necessary changes are now in place for an SNP guest to be able
to leverage in-place conversion support. Allow it to be switched on by
users. KVM-specific checks will still gate whether or not the option is
ultimately allowed; this just allows the option to be set via the
command line.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
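Illustration only, not part of the patch: a minimal standalone sketch of
the kind of gating the new flag enables. The property setter itself is
not shown in this patch, so the function below and its -EINVAL return
are assumptions; only the two field names mirror the ones used in this
series.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two CGS fields involved. */
struct sketch_cgs_opts {
    bool allow_convert_in_place; /* set by the CGS variant at instance init */
    bool convert_in_place;       /* requested by the user */
};

/*
 * Accept a request to enable in-place conversion only if the CGS
 * variant has opted in. Disabling it is always accepted.
 */
static int sketch_set_convert_in_place(struct sketch_cgs_opts *cgs, bool value)
{
    if (value && !cgs->allow_convert_in_place) {
        return -EINVAL;
    }
    cgs->convert_in_place = value;
    return 0;
}
```

Under this assumption, SNP opting in at instance init is what makes the
option settable at all, while the KVM capability checks still run later.
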
 target/i386/sev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 32a5e605bf..a56367aa5e 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -3198,6 +3198,7 @@ sev_snp_guest_instance_init(Object *obj)
     SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(obj);
 
     cgs->require_guest_memfd = true;
+    cgs->allow_convert_in_place = true;
 
     /* default init/start/finish params for kvm */
     sev_snp_guest->kvm_start_conf.policy = DEFAULT_SEV_SNP_POLICY;
-- 
2.43.0



* Re: [PATCH RFC 00/12] guest_memfd: support in-place memory conversion
  2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
                   ` (10 preceding siblings ...)
  2026-05-28  0:03 ` [PATCH RFC 12/12] i386/sev: Allow in-place conversion for SEV-SNP guests Michael Roth
@ 2026-05-28  5:44 ` Xiaoyao Li
  2026-06-02 22:20   ` Michael Roth
  11 siblings, 1 reply; 24+ messages in thread
From: Xiaoyao Li @ 2026-05-28  5:44 UTC (permalink / raw)
  To: Michael Roth, qemu-devel, Peter Xu
  Cc: kvm, pbonzini, berrange, armbru, pankaj.gupta, isaku.yamahata,
	chao.p.peng, david, ashish.kalra, ackerleytng

On 5/28/2026 8:03 AM, Michael Roth wrote:
> This patchset is also available at:
> 
>    https://github.com/amdese/qemu/commits/snp-inplace-rfc1
> 
> which is in turn based on the following series:
> 
>    [PATCH 0/4] "guest_memfd: Fix handling for conversions of MMIO ranges"
>    https://lists.gnu.org/archive/html/qemu-devel/2026-05/msg07547.html
> 
> 
> OVERVIEW
> --------
> 
> This series adds guest_memfd support for in-place conversion of memory
> between private/shared, and enables it for SEV-SNP guests. It is based
> on recently-added kernel support for mmap()-able guest_memfd
> instances[1], which allow it to be used for shared memory, and the
> following patchset[2], which adds additional guest_memfd interfaces to
> allow it to be used to perform in-place conversion:
> 
>    "[PATCH v7 00/42] guest_memfd: In-place conversion support"
>    https://lore.kernel.org/kvm/20260522-gmem-inplace-conversion-v7-0-2f0fae496530@google.com/
> 
> That series also introduces a new 'vm_memory_attributes' KVM
> module option, which sets whether memory attributes are tracked
> VM-wide by KVM (vm_memory_attributes=1: the existing 'legacy' mode),
> or per-guest_memfd instance (vm_memory_attributes=0: the new mode
> which allows for in-place conversion). The latter is intended to
> eventually deprecate the legacy mode, at which point in-place
> conversion would become the primarily-supported mode.
> 
> 
> MOTIVATION
> ----------
> 
> Today, SEV-SNP guests (and other CoCo VM types using guest_memfd) keep
> shared and private memory on separate physical backings: a userspace
> memory-backend object for shared pages, and a kernel-allocated
> guest_memfd file descriptor for private pages. KVM_SET_MEMORY_ATTRIBUTES
> flips which backing the guest sees for a given GPA range, and the old
> backing is typically discarded / hole-punched on conversion to avoid
> doubled memory usage.
> 
> That model works, but has a number of downsides that impact certain
> use-cases:
> 
>    - Each conversion involves discarding pages on one side and faulting
>      them in on the other, which incurs allocation overheads in the
>      host kernel for every conversion.
> 
>    - Some use-cases, like pKVM[3], rely on memory isolation rather than
>      encryption and rely on in-place conversion to pass through things
>      like secured framebuffer memory without needing to bounce data
>      through separate shared/private HPAs, which would introduce
>      unacceptable latency for that sort of workload.
> 
>    - Hugetlb support[4] for guest_memfd will rely on it, since things like
>      1GB hugepages with a mix of shared/private sub-ranges would generally
>      require 2 1GB hugetlb pages to remain available to handle shared vs.
>      private accesses, which quickly causes doubling of guest memory usage.
> 
> Recent kernel work[2] makes guest_memfd mmap()-able and lets the *same*
> physical pages be used for both shared and private states for a given
> GPA range, allowing the above pitfalls to be naturally avoided.
> 
> This series wires that support up in QEMU.

+ Peter,

Peter posted a series[*] that enables mmap() of guest_memfd and allows it
to serve as unencrypted memory for VMs. I believe there is some overlap
between the two efforts.

[*] 
https://lore.kernel.org/qemu-devel/20251215205203.1185099-1-peterx@redhat.com/

> 
> DESIGN
> ------
> 
> A new dedicated memory backend, memory-backend-guest-memfd, allocates
> its memory via a guest_memfd file descriptor obtained from KVM with
> the GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED flags. 

Some quick feedback:

The design choice in Peter's series was to extend the current
hostmem-memfd backend to support guest_memfd instead of adding a new
dedicated backend.
I think we need to evaluate the pros and cons of each approach and make a
choice.

(I will go read the other part later and provide more feedback)


* Re: [PATCH RFC 02/12] hostmem: Introduce dedicated memory backend for guest_memfd
  2026-05-28  0:03 ` [PATCH RFC 02/12] hostmem: Introduce dedicated memory backend for guest_memfd Michael Roth
@ 2026-06-02  8:22   ` Markus Armbruster
  2026-06-03  6:19     ` Michael Roth
  0 siblings, 1 reply; 24+ messages in thread
From: Markus Armbruster @ 2026-06-02  8:22 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, kvm, pbonzini, berrange, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

Michael Roth <michael.roth@amd.com> writes:

> In the initial implementation of guest_memfd in the linux kernel, it
> was not possible to map memory into userspace for direct access; instead
> the memory provided by the memory backend would be used for cases where
> a confidential VM wants to access normal/unprotected/unencrypted memory
> that can be used for shared memory use cases, and for access to private
> memory a guest_memfd could be associated with the same memslot. A memory
> 'private' attribute set via KVM_SET_MEMORY_ATTRIBUTES could then be used
> to have KVM route to the appropriate backing memory.
>
> In that model, it didn't make sense to introduce a specific backend for
> guest_memfd, since there was always a generally need to have a separate

a general need?

> backend type to handle shared memory access/allocation. Instead, QEMU
> configures the guest_memfd support for the associated memslots
> internally for cases where it is running a confidential VM.
>
> However, with recent changes in guest_memfd kernel support, it is now
> possible to mmap() a guest_memfd FD into userspace and use it for shared
> memory, as well as continue to use the same physical pages for the same
> GPA ranges after they are converted to private ("in-place conversion").
>
> To enable the use of this mmap()-able/guest_memfd-provided memory to be
> used for normal/shared memory instead of just for private memory,
> introduce a dedicated guest_memfd memory backend that can be used both
> for confidential VMs that wish to make use of in-place conversion, as
> well as for non-confidential VMs that just want to make use of
> guest_memfd for normal memory (which can be useful both for testing as
> well as a stepping stone to things like software-protected VMs where the
> host can be trusted to provide some additional degree of isolation for
> the VM independently of hardware support).
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>

[...]

> diff --git a/qapi/qom.json b/qapi/qom.json
> index dd45ac1087..502fafeb15 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -661,7 +661,8 @@
>  # @share: if false, the memory is private to QEMU; if true, it is
>  #     shared (default false for backends memory-backend-file and
>  #     memory-backend-ram, true for backends memory-backend-epc,
> -#     memory-backend-memfd, and memory-backend-shm)
> +#     memory-backend-memfd, memory-backend-shm, and
> +#     memory-backend-guest-memfd)
>  #
>  # @reserve: if true, reserve swap space (or huge pages) if applicable
>  #     (default: true) (since 6.1)
> @@ -780,6 +781,18 @@
>              '*seal': 'bool' },
>    'if': 'CONFIG_LINUX' }
>  
> +##
> +# @MemoryBackendGuestMemfdProperties:
> +#
> +# Properties for memory-backend-guest-memfd objects.
> +#
> +# Since: 11.1
> +##
> +{ 'struct': 'MemoryBackendGuestMemfdProperties',
> +  'base': 'MemoryBackendProperties',
> +  'data': {},
> +  'if': 'CONFIG_LINUX' }
> +

Identical to MemoryBackendProperties so far.

>  ##
>  # @MemoryBackendShmProperties:
>  #
> @@ -1234,6 +1247,8 @@
>      'memory-backend-file',
>      { 'name': 'memory-backend-memfd',
>        'if': 'CONFIG_LINUX' },
> +    { 'name': 'memory-backend-guest-memfd',
> +      'if': 'CONFIG_LINUX' },
>      'memory-backend-ram',
>      { 'name': 'memory-backend-shm',
>        'if': 'CONFIG_POSIX' },
> @@ -1312,6 +1327,8 @@
>        'memory-backend-file':        'MemoryBackendFileProperties',
>        'memory-backend-memfd':       { 'type': 'MemoryBackendMemfdProperties',
>                                        'if': 'CONFIG_LINUX' },
> +      'memory-backend-guest-memfd': { 'type': 'MemoryBackendGuestMemfdProperties',
> +                                      'if': 'CONFIG_LINUX' },

You could use MemoryBackendProperties here, and drop
MemoryBackendGuestMemfdProperties, similar to how memory-backend-ram
is done.

>        'memory-backend-ram':         'MemoryBackendProperties',
>        'memory-backend-shm':         { 'type': 'MemoryBackendShmProperties',
>                                        'if': 'CONFIG_POSIX' },

Should we provide guidance on when to use which memory backend?  The
commit message provides some clues...

> diff --git a/qemu-options.hx b/qemu-options.hx
> index 96ae41f787..3c754c149f 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -5858,6 +5858,11 @@ SRST
>          off will cause a failure during allocation because it is not supported
>          by this backend.
>  
> +    ``-object memory-backend-guest-memfd,id=id,prealloc=on|off,size=size,host-nodes=host-nodes,policy=default|preferred|bind|interleave``
> +        Creates an anonymous memory file backend object that has similar
> +        semantics to memfd, but is also usable as private memory when
> +        running as a confidential VM. (Linux only)

There is no object type "memfd".  Do you mean "memory-backend-memfd"?

If yes, that one has additional properties @hugetlb, @hugetlbsize, and
@seal.  Why are they not needed for memory-backend-guest-memfd?

> +
>      ``-object iommufd,id=id[,fd=fd]``
>          Creates an iommufd backend which allows control of DMA mapping
>          through the ``/dev/iommu`` device.



* Re: [PATCH RFC 04/12] accel/kvm: Add CGS option to control in-place conversion support
  2026-05-28  0:03 ` [PATCH RFC 04/12] accel/kvm: Add CGS option to control in-place conversion support Michael Roth
@ 2026-06-02  8:23   ` Markus Armbruster
  2026-06-03  6:39     ` Michael Roth
  0 siblings, 1 reply; 24+ messages in thread
From: Markus Armbruster @ 2026-06-02  8:23 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, kvm, pbonzini, berrange, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

Michael Roth <michael.roth@amd.com> writes:

> For confidential guests, guest_memfd is currently used only for private
> guest memory, and normal guest memory comes from the configured memory
> backend just as it does for a non-confidential guest. It is now possible
> to use the same physical memory to back a particular GPA regardless of
> whether it is in a shared or private state. This avoids the need to
> rely on discarding memory between shared/private conversions (to avoid
> doubled memory usage), and is intended to be the primary mode of using
> guest_memfd for confidential guests moving forward, and future features
> like hugepage support will likely require it.
>
> Add an option to enable this support. Since ConfidentialGuestSupport is
> already used to track some guest_memfd-related functionality (e.g.
> whether it is required for the configured machine), similarly introduce
> this option as a property of ConfidentialGuestSupport.
>
> Also add the KVM-specific checks to enable this support, but leave the
> option disabled until other required changes are implemented for
> CGS variants that intend to make use of KVM's in-place conversion
> support.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>

[...]

> diff --git a/qapi/qom.json b/qapi/qom.json
> index 502fafeb15..037c078799 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -1014,6 +1014,21 @@
>    'if': 'CONFIG_IGVM',
>    'data': { 'file': 'str' } }
>  
> +##
> +# @ConfidentialGuestSupportProperties:
> +#
> +# Properties for ConfidentialGuestSupport base class.
> +#
> +# @convert-in-place: If true, the same physical pages are reused
> +#     when memory is converted between shared and private states.
> +#     If false (default), separate allocations are used depending
> +#     on whether the page is private or shared.
> +#
> +# Since: 11.1
> +##
> +{ 'struct': 'ConfidentialGuestSupportProperties',
> +  'data': { '*convert-in-place': 'bool' } }
> +
>  ##
>  # @SevCommonProperties:
>  #
> @@ -1038,6 +1053,7 @@
>  # Since: 9.1
>  ##
>  { 'struct': 'SevCommonProperties',
> +  'base': 'ConfidentialGuestSupportProperties',
>    'data': { '*sev-device': 'str',
>              '*cbitpos': 'uint32',
>              'reduced-phys-bits': 'uint32',

Why use a base type instead of simply adding @convert-in-place to
SevCommonProperties?



* Re: [PATCH RFC 00/12] guest_memfd: support in-place memory conversion
  2026-05-28  5:44 ` [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Xiaoyao Li
@ 2026-06-02 22:20   ` Michael Roth
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-06-02 22:20 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: qemu-devel, Peter Xu, kvm, pbonzini, berrange, armbru,
	pankaj.gupta, isaku.yamahata, chao.p.peng, david, ashish.kalra,
	ackerleytng

On Thu, May 28, 2026 at 01:44:39PM +0800, Xiaoyao Li wrote:
> On 5/28/2026 8:03 AM, Michael Roth wrote:
> > This patchset is also available at:
> > 
> >    https://github.com/amdese/qemu/commits/snp-inplace-rfc1
> > 
> > which is in turn based on the following series:
> > 
> >    [PATCH 0/4] "guest_memfd: Fix handling for conversions of MMIO ranges"
> >    https://lists.gnu.org/archive/html/qemu-devel/2026-05/msg07547.html
> > 
> > 
> > OVERVIEW
> > --------
> > 
> > This series adds guest_memfd support for in-place conversion of memory
> > between private/shared, and enables it for SEV-SNP guests. It is based
> > on recently-added kernel support for mmap()-able guest_memfd
> > instances[1], which allow it to be used for shared memory, and the
> > following patchset[2], which adds additional guest_memfd interfaces to
> > allow it to be used to perform in-place conversion:
> > 
> >    "[PATCH v7 00/42] guest_memfd: In-place conversion support"
> >    https://lore.kernel.org/kvm/20260522-gmem-inplace-conversion-v7-0-2f0fae496530@google.com/
> > 
> > That series also introduces a new 'vm_memory_attributes' KVM
> > module option, which sets whether memory attributes are tracked
> > VM-wide by KVM (vm_memory_attributes=1: the existing 'legacy' mode),
> > or per-guest_memfd instance (vm_memory_attributes=0: the new mode
> > which allows for in-place conversion). The latter is intended to
> > eventually deprecate the legacy mode, at which point in-place
> > conversion would become the primarily-supported mode.
> > 
> > 
> > MOTIVATION
> > ----------
> > 
> > Today, SEV-SNP guests (and other CoCo VM types using guest_memfd) keep
> > shared and private memory on separate physical backings: a userspace
> > memory-backend object for shared pages, and a kernel-allocated
> > guest_memfd file descriptor for private pages. KVM_SET_MEMORY_ATTRIBUTES
> > flips which backing the guest sees for a given GPA range, and the old
> > backing is typically discarded / hole-punched on conversion to avoid
> > doubled memory usage.
> > 
> > That model works, but has a number of downsides that impact certain
> > use-cases:
> > 
> >    - Each conversion involves discarding pages on one side and faulting
> >      them in on the other, which incurs allocation overheads in the
> >      host kernel for every conversion.
> > 
> >    - Some use-cases, like pKVM[3], rely on memory isolation rather than
> >      encryption and rely on in-place conversion to pass through things
> >      like secured framebuffer memory without needing to bounce data
> >      through separate shared/private HPAs, which would introduce
> >      unacceptable latency for that sort of workload.
> > 
> >    - Hugetlb support[4] for guest_memfd will rely on it, since things like
> >      1GB hugepages with a mix of shared/private sub-ranges would generally
> >      require 2 1GB hugetlb pages to remain available to handle shared vs.
> >      private accesses, which quickly causes doubling of guest memory usage.
> > 
> > Recent kernel work[2] makes guest_memfd mmap()-able and lets the *same*
> > physical pages be used for both shared and private states for a given
> > GPA range, allowing the above pitfalls to be naturally avoided.
> > 
> > This series wires that support up in QEMU.
> 
> + Peter,
> 
> Peter posted a series[*] that enables mmap() of guest_memfd and allows it
> to serve as unencrypted memory for VMs. I believe there is some overlap
> between the two efforts.
> 
> [*] https://lore.kernel.org/qemu-devel/20251215205203.1185099-1-peterx@redhat.com/

Thanks, I wasn't aware of that series, but it definitely seems like a
good idea to take it as the base for mmap()-able guest_memfd support for
normal VMs and then rebase my in-place conversion / confidential VM
patches on top.

I do think it would be a good idea to introduce a dedicated backend,
however. I raised that in that thread, but I think Peter's series mostly
only calls patch #2 from this series into question, and most of the
other patches still seem like they'll be needed for confidential VMs.

Thanks for pointing this out.

-Mike

> 
> > 
> > DESIGN
> > ------
> > 
> > A new dedicated memory backend, memory-backend-guest-memfd, allocates
> > its memory via a guest_memfd file descriptor obtained from KVM with
> > the GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED flags.
> 
> Some quick feedback:
> 
> The design choice in Peter's series was to extend the current
> hostmem-memfd backend to support guest_memfd instead of adding a new
> dedicated backend.
> I think we need to evaluate the pros and cons of each approach and make a
> choice.
> 
> (I will go read the other part later and provide more feedback)


* Re: [PATCH RFC 02/12] hostmem: Introduce dedicated memory backend for guest_memfd
  2026-06-02  8:22   ` Markus Armbruster
@ 2026-06-03  6:19     ` Michael Roth
  2026-06-08  8:20       ` Markus Armbruster
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Roth @ 2026-06-03  6:19 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, kvm, pbonzini, berrange, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

On Tue, Jun 02, 2026 at 10:22:01AM +0200, Markus Armbruster wrote:
> Michael Roth <michael.roth@amd.com> writes:
> 
> > In the initial implementation of guest_memfd in the linux kernel, it
> > was not possible to map memory into userspace for direct access; instead
> > the memory provided by the memory backend would be used for cases where
> > a confidential VM wants to access normal/unprotected/unencrypted memory
> > that can be used for shared memory use cases, and for access to private
> > memory a guest_memfd could be associated with the same memslot. A memory
> > 'private' attribute set via KVM_SET_MEMORY_ATTRIBUTES could then be used
> > to have KVM route to the appropriate backing memory.
> >
> > In that model, it didn't make sense to introduce a specific backend for
> > guest_memfd, since there was always a generally need to have a separate
> 
> a general need?

Much nicer :)

> 
> > backend type to handle shared memory access/allocation. Instead, QEMU
> > configures the guest_memfd support for the associated memslots
> > internally for cases where it is running a confidential VM.
> >
> > However, with recent changes in guest_memfd kernel support, it is now
> > possible to mmap() a guest_memfd FD into userspace and use it for shared
> > memory, as well as continue to use the same physical pages for the same
> > GPA ranges after they are converted to private ("in-place conversion").
> >
> > To enable the use of this mmap()-able/guest_memfd-provided memory to be
> > used for normal/shared memory instead of just for private memory,
> > introduce a dedicated guest_memfd memory backend that can be used both
> > for confidential VMs that wish to make use of in-place conversion, as
> > well as for non-confidential VMs that just want to make use of
> > guest_memfd for normal memory (which can be useful both for testing as
> > well as a stepping stone to things like software-protected VMs where the
> > host can be trusted to provide some additional degree of isolation for
> > the VM independently of hardware support).
> >
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> 
> [...]
> 
> > diff --git a/qapi/qom.json b/qapi/qom.json
> > index dd45ac1087..502fafeb15 100644
> > --- a/qapi/qom.json
> > +++ b/qapi/qom.json
> > @@ -661,7 +661,8 @@
> >  # @share: if false, the memory is private to QEMU; if true, it is
> >  #     shared (default false for backends memory-backend-file and
> >  #     memory-backend-ram, true for backends memory-backend-epc,
> > -#     memory-backend-memfd, and memory-backend-shm)
> > +#     memory-backend-memfd, memory-backend-shm, and
> > +#     memory-backend-guest-memfd)
> >  #
> >  # @reserve: if true, reserve swap space (or huge pages) if applicable
> >  #     (default: true) (since 6.1)
> > @@ -780,6 +781,18 @@
> >              '*seal': 'bool' },
> >    'if': 'CONFIG_LINUX' }
> >  
> > +##
> > +# @MemoryBackendGuestMemfdProperties:
> > +#
> > +# Properties for memory-backend-guest-memfd objects.
> > +#
> > +# Since: 11.1
> > +##
> > +{ 'struct': 'MemoryBackendGuestMemfdProperties',
> > +  'base': 'MemoryBackendProperties',
> > +  'data': {},
> > +  'if': 'CONFIG_LINUX' }
> > +
> 
> Identical to MemoryBackendProperties so far.
> 
> >  ##
> >  # @MemoryBackendShmProperties:
> >  #
> > @@ -1234,6 +1247,8 @@
> >      'memory-backend-file',
> >      { 'name': 'memory-backend-memfd',
> >        'if': 'CONFIG_LINUX' },
> > +    { 'name': 'memory-backend-guest-memfd',
> > +      'if': 'CONFIG_LINUX' },
> >      'memory-backend-ram',
> >      { 'name': 'memory-backend-shm',
> >        'if': 'CONFIG_POSIX' },
> > @@ -1312,6 +1327,8 @@
> >        'memory-backend-file':        'MemoryBackendFileProperties',
> >        'memory-backend-memfd':       { 'type': 'MemoryBackendMemfdProperties',
> >                                        'if': 'CONFIG_LINUX' },
> > +      'memory-backend-guest-memfd': { 'type': 'MemoryBackendGuestMemfdProperties',
> > +                                      'if': 'CONFIG_LINUX' },
> 
> You could use MemoryBackendProperties here, and drop
> MemoryBackendGuestMemfdProperties, similar to how memory-backend-ram
> is done.

That's true. I think I was anticipating it being warranted at some point, but
that doesn't need to happen here.

> 
> >        'memory-backend-ram':         'MemoryBackendProperties',
> >        'memory-backend-shm':         { 'type': 'MemoryBackendShmProperties',
> >                                        'if': 'CONFIG_POSIX' },
> 
> Should we provide guidance on when to use which memory backend?  The
> commit message provides some clues...

Were you thinking from a schema perspective, or something more
user-facing?

Either way, docs/system/confidential-guest-support.rst could definitely
use some sprucing up as part of this series, so I can cover this aspect
there as well.

> 
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 96ae41f787..3c754c149f 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -5858,6 +5858,11 @@ SRST
> >          off will cause a failure during allocation because it is not supported
> >          by this backend.
> >  
> > +    ``-object memory-backend-guest-memfd,id=id,prealloc=on|off,size=size,host-nodes=host-nodes,policy=default|preferred|bind|interleave``
> > +        Creates an anonymous memory file backend object that has similar
> > +        semantics to memfd, but is also usable as private memory when
> > +        running as a confidential VM. (Linux only)
> 
> There is no object type "memfd".  Do you mean "memory-backend-memfd"?

Yes, will update.

> 
> If yes, that one has additional properties @hugetlb, @hugetlbsize, and
> @seal.  Why are they not needed for memory-backend-guest-memfd?

ATM, hugetlb is not enabled for guest_memfd in the kernel. It's likely the
same set of options will apply, but there are also efforts to do things like
plumb DAX memory through guest_memfd for confidential VMs where maybe we end
up needing to be a bit more flexible/creative... not sure, but it seemed
like a good idea to give ourselves a clean slate since the support isn't
there yet anyway.

For seal, I'm not aware of any plan to support that for guest_memfd, so
it seems like unnecessary baggage to pull in.

Thanks,

Mike

> 
> > +
> >      ``-object iommufd,id=id[,fd=fd]``
> >          Creates an iommufd backend which allows control of DMA mapping
> >          through the ``/dev/iommu`` device.
> 


* Re: [PATCH RFC 04/12] accel/kvm: Add CGS option to control in-place conversion support
  2026-06-02  8:23   ` Markus Armbruster
@ 2026-06-03  6:39     ` Michael Roth
  2026-06-08  8:15       ` Markus Armbruster
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Roth @ 2026-06-03  6:39 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, kvm, pbonzini, berrange, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

On Tue, Jun 02, 2026 at 10:23:40AM +0200, Markus Armbruster wrote:
> Michael Roth <michael.roth@amd.com> writes:
> 
> > For confidential guests, guest_memfd is currently used only for private
> > guest memory, and normal guest memory comes from the configured memory
> > backend just as it does for a non-confidential guest. It is now possible
> > to use the same physical memory to back a particular GPA regardless of
> > whether it is in a shared or private state. This avoids the need to
> > rely on discarding memory between shared/private conversions (to avoid
> > doubled memory usage), and is intended to be the primary mode of using
> > guest_memfd for confidential guests moving forward, and future features
> > like hugepage support will likely require it.
> >
> > Add an option to enable this support. Since ConfidentialGuestSupport is
> > already used to track some guest_memfd-related functionality (e.g.
> > whether it is required for the configured machine), similarly introduce
> > this option as a property of ConfidentialGuestSupport.
> >
> > Also add the KVM-specific checks to enable this support, but leave the
> > option disabled until other required changes are implemented for
> > CGS variants that intend to make use of KVM's in-place conversion
> > support.
> >
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> 
> [...]
> 
> > diff --git a/qapi/qom.json b/qapi/qom.json
> > index 502fafeb15..037c078799 100644
> > --- a/qapi/qom.json
> > +++ b/qapi/qom.json
> > @@ -1014,6 +1014,21 @@
> >    'if': 'CONFIG_IGVM',
> >    'data': { 'file': 'str' } }
> >  
> > +##
> > +# @ConfidentialGuestSupportProperties:
> > +#
> > +# Properties for ConfidentialGuestSupport base class.
> > +#
> > +# @convert-in-place: If true, the same physical pages are reused
> > +#     when memory is converted between shared and private states.
> > +#     If false (default), separate allocations are used depending
> > +#     on whether the page is private or shared.
> > +#
> > +# Since: 11.1
> > +##
> > +{ 'struct': 'ConfidentialGuestSupportProperties',
> > +  'data': { '*convert-in-place': 'bool' } }
> > +
> >  ##
> >  # @SevCommonProperties:
> >  #
> > @@ -1038,6 +1053,7 @@
> >  # Since: 9.1
> >  ##
> >  { 'struct': 'SevCommonProperties',
> > +  'base': 'ConfidentialGuestSupportProperties',
> >    'data': { '*sev-device': 'str',
> >              '*cbitpos': 'uint32',
> >              'reduced-phys-bits': 'uint32',
> 
> Why use a base type instead of simply adding @convert-in-place to
> SevCommonProperties?
> 

My thinking was that TDX and other implementations would similarly enable
this through their CGS implementation, so I went ahead and carved out a
set of common properties so that all ConfidentialGuestSupport
implementations could use the same ,convert-in-place=true option (or set
it by default for newer implementations).

It is sort of tied to the 'allow_convert_in_place' flag that is part of
the common ConfidentialGuestSupport object struct, so the property
handling is sort of tied to the common ConfidentialGuestSupport base
class as well rather than something implementation-specific.

Not sure if there are better ways to handle all that though.

Thanks,

Mike



* Re: [PATCH RFC 09/12] accel/kvm: Support shared/private conversions via guest_memfd ioctls
  2026-05-28  0:03 ` [PATCH RFC 09/12] accel/kvm: Support shared/private conversions via guest_memfd ioctls Michael Roth
@ 2026-06-04 13:19   ` Gupta, Pankaj
  2026-06-04 23:36     ` Michael Roth
  0 siblings, 1 reply; 24+ messages in thread
From: Gupta, Pankaj @ 2026-06-04 13:19 UTC (permalink / raw)
  To: Michael Roth, qemu-devel
  Cc: kvm, pbonzini, berrange, armbru, isaku.yamahata, xiaoyao.li,
	chao.p.peng, david, ashish.kalra, ackerleytng


> When using guest_memfd with support for shared memory / in-place
> conversion, it is necessary to use the guest_memfd ioctls to handle
> conversions instead of KVM ioctls. Implement support for this by looping
> through all the sections within a conversion range. Implement everything
> in terms of the kvm_convert_memory() loop, which already deals with some
> special considerations regarding various holes / region types that might
> be encountered.
>
> Also update kvm_set_memory_attributes_*() to use the same common path
> when convert-in-place=false. This potentially results in a small change
> in behavior due to the additional MMIO checks/skips now being applied in
> that case (generally qemu-triggered during setup) rather than only for
> kvm_convert_memory() (generally guest-triggered), but this is arguably
> safer, and it provides similar behavior between convert-in-place=false
> vs. convert-in-place=true, the latter of which *must* skip MMIO holes
> because the regions (and associated guest_memfds) themselves track
> shared/private state internally and passing the whole conversion range
> through to KVM is not an option in that case.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>   accel/kvm/kvm-all.c | 131 ++++++++++++++++++++++++++++++++++++++------
>   1 file changed, 114 insertions(+), 17 deletions(-)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 62f2e8aa15..fd01435a0f 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -1626,14 +1626,78 @@ static int kvm_set_memory_attributes(hwaddr start, uint64_t size, uint64_t attr)
>       return r;
>   }
>   
> -int kvm_set_memory_attributes_private(hwaddr start, uint64_t size)
> +static int kvm_gmem_ioctl(int guest_memfd, unsigned long type, ...)
>   {
> -    return kvm_set_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +    int ret;
> +    void *arg;
> +    va_list ap;
> +
> +    va_start(ap, type);
> +    arg = va_arg(ap, void *);
> +    va_end(ap);
> +
> +    ret = ioctl(guest_memfd, type, arg);
> +    if (ret == -1) {
> +        ret = -errno;
> +    }
> +    return ret;
>   }
>   
> -int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size)
> +static int guest_memfd_set_memory_attributes_fd(int guest_memfd, hwaddr offset,
> +                                                uint64_t size, uint64_t attr)
>   {
> -    return kvm_set_memory_attributes(start, size, 0);
> +    struct kvm_memory_attributes2 attrs;

-    struct kvm_memory_attributes2 attrs;
+    struct kvm_memory_attributes2 attrs = {0};

Zero-initializing 'attrs' fixed an '-EINVAL' error caused by the kernel's
'attrs.reserved' check failing in 'kvm_gmem_set_attributes()'.
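
As a minimal sketch of this failure mode: the struct and checker below are
hypothetical stand-ins (demo_attrs, demo_check_attrs), not the real kernel
UAPI layout or kernel code. They only illustrate why a request struct with
reserved fields should start out zeroed before the named fields are set.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a UAPI request struct with reserved fields. */
struct demo_attrs {
    uint64_t offset;
    uint64_t size;
    uint64_t attributes;
    uint64_t flags;
    uint64_t reserved[4];
};

/* Mock of a kernel-side check: reject any non-zero reserved field. */
static int demo_check_attrs(const struct demo_attrs *a)
{
    assert(a != NULL);
    for (size_t i = 0; i < sizeof(a->reserved) / sizeof(a->reserved[0]); i++) {
        if (a->reserved[i]) {
            return -EINVAL;
        }
    }
    return 0;
}

/* Start from a zeroed struct so the reserved fields are guaranteed to be 0. */
static int demo_set_attrs(uint64_t offset, uint64_t size, uint64_t attr)
{
    struct demo_attrs attrs = {0};

    attrs.offset = offset;
    attrs.size = size;
    attrs.attributes = attr;
    return demo_check_attrs(&attrs);
}

/*
 * Simulate leftover stack contents with a fill pattern, then set only the
 * named fields, as the unpatched code did. The reserved fields stay dirty.
 */
static int demo_set_attrs_uninit(uint64_t offset, uint64_t size, uint64_t attr)
{
    struct demo_attrs attrs;

    memset(&attrs, 0xa5, sizeof(attrs));
    attrs.offset = offset;
    attrs.size = size;
    attrs.attributes = attr;
    attrs.flags = 0;
    return demo_check_attrs(&attrs);
}
```

With these mocks, demo_set_attrs() passes the reserved-field check, while
demo_set_attrs_uninit() is rejected with -EINVAL.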

Thanks,

Pankaj

> +    int r;
> +
> +    assert((attr & kvm_supported_memory_attributes) == attr);
> +    attrs.attributes = attr;
> +    attrs.offset = offset;
> +    attrs.size = size;
> +    attrs.flags = 0;
> +
> +    /*
> +     * guest_memfd may need to delay conversion requests due to
> +     * the memory being in-use by the kernel. In most cases these
> +     * will be transient uses. In some cases, userspace itself may
> +     * be the cause of the memory being considered in-use, though
> +     * QEMU currently takes steps to avoid this (e.g. via
> +     * RamBlockAttributes). On that basis, this code loops
> +     * indefinitely with the assumption that only transient cases
> +     * will block, and that those will be for relatively short
> +     * periods vs. the overall conversion path.
> +     * If those assumptions at some point prove false, most likely
> +     * this will manifest as guest-side lockups on their conversion
> +     * path, which seems like the appropriate way to surface this
> +     * situation to the guest owner rather than some hard timeout.
> +     */
> +    do {
> +        r = kvm_gmem_ioctl(guest_memfd, KVM_SET_MEMORY_ATTRIBUTES2, &attrs);
> +    } while (r == -EAGAIN);
> +
> +    if (r) {
> +        error_report("failed to set memory (0x%" HWADDR_PRIx "+0x%" PRIx64 ") "
> +                     "with attr 0x%" PRIx64 " error '%s'",
> +                     offset, size, attr, strerror(-r));
> +    }
> +    return r;
> +}
> +
> +static int guest_memfd_set_memory_section_attributes(MemoryRegionSection *section, uint64_t attr)
> +{
> +    hwaddr convert_offset, convert_size;
> +    MemoryRegion *mr = section->mr;
> +    RAMBlock *rb;
> +
> +    assert(mr);
> +    rb = mr->ram_block;
> +    assert(rb->guest_memfd);
> +    convert_offset = section->offset_within_region;
> +    convert_size = int128_get64(section->size);
> +
> +    return guest_memfd_set_memory_attributes_fd(rb->guest_memfd,
> +                                                convert_offset,
> +                                                convert_size,
> +                                                attr);
>   }
>   
>   /* Called with KVMMemoryListener.slots_lock held */
> @@ -3447,10 +3511,18 @@ static int kvm_convert_section(MemoryRegionSection *section, bool to_private)
>       hwaddr size = int128_get64(section->size);
>       int ret;
>   
> -    if (to_private) {
> -        ret = kvm_set_memory_attributes_private(start, size);
> +    if (current_machine->cgs && current_machine->cgs->convert_in_place) {
> +        ret = guest_memfd_set_memory_section_attributes(section,
> +                                                        to_private ? KVM_MEMORY_ATTRIBUTE_PRIVATE
> +                                                                   : 0);
>       } else {
> -        ret = kvm_set_memory_attributes_shared(start, size);
> +        /*
> +         * Without in-place conversion, attribute-tracking is handled by KVM
> +         * across all guest memory rather than on a per-section/slot basis.
> +         */
> +        ret = kvm_set_memory_attributes(start, size,
> +                                        to_private ? KVM_MEMORY_ATTRIBUTE_PRIVATE
> +                                                   : 0);
>       }
>   
>       return ret;
> @@ -3544,7 +3616,8 @@ static int kvm_post_convert_section(MemoryRegionSection *section, bool to_privat
>       return 0;
>   }
>   
> -int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
> +static int kvm_convert_memory_full(hwaddr start, hwaddr size, bool to_private,
> +                                   bool pre_hooks, bool post_hooks)
>   {
>       int ret = -EINVAL;
>   
> @@ -3588,10 +3661,12 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
>               continue;
>           }
>   
> -        ret = kvm_pre_convert_section(&section, to_private);
> -        if (ret) {
> -            memory_region_unref(section.mr);
> -            break;
> +        if (pre_hooks) {
> +            ret = kvm_pre_convert_section(&section, to_private);
> +            if (ret) {
> +                memory_region_unref(section.mr);
> +                break;
> +            }
>           }
>   
>           ret = kvm_convert_section(&section, to_private);
> @@ -3600,13 +3675,15 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
>               break;
>           }
>   
> -        ret = kvm_post_convert_section(&section, to_private);
> -        memory_region_unref(section.mr);
> -
> -        if (ret) {
> -            break;
> +        if (post_hooks) {
> +            ret = kvm_post_convert_section(&section, to_private);
> +            if (ret) {
> +                memory_region_unref(section.mr);
> +                break;
> +            }
>           }
>   
> +        memory_region_unref(section.mr);
>           size -= section_end - start;
>           start = section_end;
>       }
> @@ -3614,6 +3691,26 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
>       return ret;
>   }
>   
> +int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
> +{
> +    return kvm_convert_memory_full(start, size, to_private, true, true);
> +}
> +
> +static int kvm_convert_memory_attributes(hwaddr start, hwaddr size, bool to_private)
> +{
> +    return kvm_convert_memory_full(start, size, to_private, false, false);
> +}
> +
> +int kvm_set_memory_attributes_private(hwaddr start, uint64_t size)
> +{
> +    return kvm_convert_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +}
> +
> +int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size)
> +{
> +    return kvm_convert_memory_attributes(start, size, 0);
> +}
> +
>   int kvm_cpu_exec(CPUState *cpu)
>   {
>       struct kvm_run *run = cpu->kvm_run;
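
The -EAGAIN handling in guest_memfd_set_memory_attributes_fd() above follows
a plain retry pattern, which can be sketched in isolation as below. All names
here (demo_retry_eagain, demo_busy_op, demo_invalid_op) are hypothetical; the
callbacks are mocks standing in for the guest_memfd ioctl, so this only shows
the loop's behavior, not QEMU's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation type standing in for the guest_memfd ioctl call. */
typedef int (*demo_op_fn)(void *opaque);

/*
 * Retry for as long as the operation reports a transient -EAGAIN.
 * Any other result, success or failure, ends the loop immediately.
 * Like the patch, this has no timeout and assumes -EAGAIN is short-lived.
 */
static int demo_retry_eagain(demo_op_fn op, void *opaque)
{
    int r;

    assert(op != NULL);
    do {
        r = op(opaque);
    } while (r == -EAGAIN);
    return r;
}

/* Mock: reports -EAGAIN until the counter reaches zero, then succeeds. */
static int demo_busy_op(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0) {
        (*remaining)--;
        return -EAGAIN;
    }
    return 0;
}

/* Mock: counts calls and always fails with -EINVAL (must not be retried). */
static int demo_invalid_op(void *opaque)
{
    int *calls = opaque;

    (*calls)++;
    return -EINVAL;
}
```

A busy operation is retried until it succeeds, whereas a hard error such as
-EINVAL is returned after a single call.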


* Re: [PATCH RFC 09/12] accel/kvm: Support shared/private conversions via guest_memfd ioctls
  2026-06-04 13:19   ` Gupta, Pankaj
@ 2026-06-04 23:36     ` Michael Roth
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-06-04 23:36 UTC (permalink / raw)
  To: Gupta, Pankaj
  Cc: qemu-devel, kvm, pbonzini, berrange, armbru, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

On Thu, Jun 04, 2026 at 03:19:17PM +0200, Gupta, Pankaj wrote:
> 
> > When using guest_memfd with support for shared memory / in-place
> > conversion, it is necessary to use the guest_memfd ioctls to handle
> > conversions instead of KVM ioctls. Implement support for this by looping
> > through all the sections within a conversion range. Implement everything
> > in terms of the kvm_convert_memory() loop, which already deals with some
> > special considerations regarding various holes / region types that might
> > be encountered.
> > 
> > Also update kvm_set_memory_attributes_*() to use the same common path
> > when convert-in-place=false. This potentially results in a small change
> > in behavior due to the additional MMIO checks/skips now being applied in
> > that case (generally qemu-triggered during setup) rather than only for
> > kvm_convert_memory() (generally guest-triggered), but this is arguably
> > safer, and it provides similar behavior between convert-in-place=false
> > vs. convert-in-place=true, the latter of which *must* skip MMIO holes
> > because the regions (and associated guest_memfds) themselves track
> > shared/private state internally and passing the whole conversion range
> > through to KVM is not an option in that case.
> > 
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >   accel/kvm/kvm-all.c | 131 ++++++++++++++++++++++++++++++++++++++------
> >   1 file changed, 114 insertions(+), 17 deletions(-)
> > 
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index 62f2e8aa15..fd01435a0f 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -1626,14 +1626,78 @@ static int kvm_set_memory_attributes(hwaddr start, uint64_t size, uint64_t attr)
> >       return r;
> >   }
> > -int kvm_set_memory_attributes_private(hwaddr start, uint64_t size)
> > +static int kvm_gmem_ioctl(int guest_memfd, unsigned long type, ...)
> >   {
> > -    return kvm_set_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> > +    int ret;
> > +    void *arg;
> > +    va_list ap;
> > +
> > +    va_start(ap, type);
> > +    arg = va_arg(ap, void *);
> > +    va_end(ap);
> > +
> > +    ret = ioctl(guest_memfd, type, arg);
> > +    if (ret == -1) {
> > +        ret = -errno;
> > +    }
> > +    return ret;
> >   }
> > -int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size)
> > +static int guest_memfd_set_memory_attributes_fd(int guest_memfd, hwaddr offset,
> > +                                                uint64_t size, uint64_t attr)
> >   {
> > -    return kvm_set_memory_attributes(start, size, 0);
> > +    struct kvm_memory_attributes2 attrs;
> 
> -    struct kvm_memory_attributes2 attrs;
> +    struct kvm_memory_attributes2 attrs = {0};
> 
> Zero-initializing 'attrs' fixed an '-EINVAL' error caused by the kernel's
> 'attrs.reserved' check failing in 'kvm_gmem_set_attributes()'.

Indeed, thanks for the catch!

-Mike

> 
> Thanks,
> 
> Pankaj
> 
> > +    int r;
> > +
> > +    assert((attr & kvm_supported_memory_attributes) == attr);
> > +    attrs.attributes = attr;
> > +    attrs.offset = offset;
> > +    attrs.size = size;
> > +    attrs.flags = 0;
> > +
> > +    /*
> > +     * guest_memfd may need to delay conversion requests due to
> > +     * the memory being in-use by the kernel. In most cases these
> > +     * will be transient uses. In some cases, userspace itself may
> > +     * be the cause of the memory being considered in-use, though
> > +     * QEMU currently takes steps to avoid this (e.g. via
> > +     * RamBlockAttributes). On that basis, this code loops
> > +     * indefinitely with the assumption that only transient cases
> > +     * will block, and that those will be for relatively short
> > +     * periods vs. the overall conversion path.
> > +     * If those assumptions at some point prove false, most likely
> > +     * this will manifest as guest-side lockups on their conversion
> > +     * path, which seems like the appropriate way to surface this
> > +     * situation to the guest owner rather than some hard timeout.
> > +     */
> > +    do {
> > +        r = kvm_gmem_ioctl(guest_memfd, KVM_SET_MEMORY_ATTRIBUTES2, &attrs);
> > +    } while (r == -EAGAIN);
> > +
> > +    if (r) {
> > +        error_report("failed to set memory (0x%" HWADDR_PRIx "+0x%" PRIx64 ") "
> > +                     "with attr 0x%" PRIx64 " error '%s'",
> > +                     offset, size, attr, strerror(-r));
> > +    }
> > +    return r;
> > +}
> > +
> > +static int guest_memfd_set_memory_section_attributes(MemoryRegionSection *section, uint64_t attr)
> > +{
> > +    hwaddr convert_offset, convert_size;
> > +    MemoryRegion *mr = section->mr;
> > +    RAMBlock *rb;
> > +
> > +    assert(mr);
> > +    rb = mr->ram_block;
> > +    assert(rb->guest_memfd);
> > +    convert_offset = section->offset_within_region;
> > +    convert_size = int128_get64(section->size);
> > +
> > +    return guest_memfd_set_memory_attributes_fd(rb->guest_memfd,
> > +                                                convert_offset,
> > +                                                convert_size,
> > +                                                attr);
> >   }
> >   /* Called with KVMMemoryListener.slots_lock held */
> > @@ -3447,10 +3511,18 @@ static int kvm_convert_section(MemoryRegionSection *section, bool to_private)
> >       hwaddr size = int128_get64(section->size);
> >       int ret;
> > -    if (to_private) {
> > -        ret = kvm_set_memory_attributes_private(start, size);
> > +    if (current_machine->cgs && current_machine->cgs->convert_in_place) {
> > +        ret = guest_memfd_set_memory_section_attributes(section,
> > +                                                        to_private ? KVM_MEMORY_ATTRIBUTE_PRIVATE
> > +                                                                   : 0);
> >       } else {
> > -        ret = kvm_set_memory_attributes_shared(start, size);
> > +        /*
> > +         * Without in-place conversion, attribute-tracking is handled by KVM
> > +         * across all guest memory rather than on a per-section/slot basis.
> > +         */
> > +        ret = kvm_set_memory_attributes(start, size,
> > +                                        to_private ? KVM_MEMORY_ATTRIBUTE_PRIVATE
> > +                                                   : 0);
> >       }
> >       return ret;
> > @@ -3544,7 +3616,8 @@ static int kvm_post_convert_section(MemoryRegionSection *section, bool to_privat
> >       return 0;
> >   }
> > -int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
> > +static int kvm_convert_memory_full(hwaddr start, hwaddr size, bool to_private,
> > +                                   bool pre_hooks, bool post_hooks)
> >   {
> >       int ret = -EINVAL;
> > @@ -3588,10 +3661,12 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
> >               continue;
> >           }
> > -        ret = kvm_pre_convert_section(&section, to_private);
> > -        if (ret) {
> > -            memory_region_unref(section.mr);
> > -            break;
> > +        if (pre_hooks) {
> > +            ret = kvm_pre_convert_section(&section, to_private);
> > +            if (ret) {
> > +                memory_region_unref(section.mr);
> > +                break;
> > +            }
> >           }
> >           ret = kvm_convert_section(&section, to_private);
> > @@ -3600,13 +3675,15 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
> >               break;
> >           }
> > -        ret = kvm_post_convert_section(&section, to_private);
> > -        memory_region_unref(section.mr);
> > -
> > -        if (ret) {
> > -            break;
> > +        if (post_hooks) {
> > +            ret = kvm_post_convert_section(&section, to_private);
> > +            if (ret) {
> > +                memory_region_unref(section.mr);
> > +                break;
> > +            }
> >           }
> > +        memory_region_unref(section.mr);
> >           size -= section_end - start;
> >           start = section_end;
> >       }
> > @@ -3614,6 +3691,26 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
> >       return ret;
> >   }
> > +int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
> > +{
> > +    return kvm_convert_memory_full(start, size, to_private, true, true);
> > +}
> > +
> > +static int kvm_convert_memory_attributes(hwaddr start, hwaddr size, bool to_private)
> > +{
> > +    return kvm_convert_memory_full(start, size, to_private, false, false);
> > +}
> > +
> > +int kvm_set_memory_attributes_private(hwaddr start, uint64_t size)
> > +{
> > +    return kvm_convert_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> > +}
> > +
> > +int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size)
> > +{
> > +    return kvm_convert_memory_attributes(start, size, 0);
> > +}
> > +
> >   int kvm_cpu_exec(CPUState *cpu)
> >   {
> >       struct kvm_run *run = cpu->kvm_run;
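
The per-section flow in kvm_convert_memory_full() (optional pre hook, the
conversion itself, optional post hook, and a reference drop on every path)
can be sketched with mocks as below. The demo_* names and the demo_section
struct are hypothetical. Note that this sketch uses a single release point
instead of the patch's unref-before-each-break shape; it is an illustration
of the control flow under those assumptions, not a proposed replacement.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a referenced memory section. */
struct demo_section {
    int refcount;
    int pre_ret, convert_ret, post_ret;        /* results the mocks return */
    int pre_calls, convert_calls, post_calls;  /* how often each step ran */
};

static int demo_pre(struct demo_section *s)
{
    s->pre_calls++;
    return s->pre_ret;
}

static int demo_convert(struct demo_section *s)
{
    s->convert_calls++;
    return s->convert_ret;
}

static int demo_post(struct demo_section *s)
{
    s->post_calls++;
    return s->post_ret;
}

/*
 * Convert one section. The hooks run only when requested, a failing step
 * skips the remaining steps, and the reference is released exactly once.
 */
static int demo_convert_one(struct demo_section *s, bool pre_hooks,
                            bool post_hooks)
{
    int ret = 0;

    assert(s);
    s->refcount++;              /* models the reference taken by the lookup */
    if (pre_hooks) {
        ret = demo_pre(s);
    }
    if (!ret) {
        ret = demo_convert(s);
    }
    if (!ret && post_hooks) {
        ret = demo_post(s);
    }
    s->refcount--;              /* single release covering every exit path */
    return ret;
}
```

Passing false for both flags corresponds to the attribute-only path used by
kvm_set_memory_attributes_*(), and true for both to kvm_convert_memory().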


* Re: [PATCH RFC 04/12] accel/kvm: Add CGS option to control in-place conversion support
  2026-06-03  6:39     ` Michael Roth
@ 2026-06-08  8:15       ` Markus Armbruster
  2026-06-08 20:21         ` Michael Roth
  0 siblings, 1 reply; 24+ messages in thread
From: Markus Armbruster @ 2026-06-08  8:15 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, kvm, pbonzini, berrange, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

Michael Roth <michael.roth@amd.com> writes:

> On Tue, Jun 02, 2026 at 10:23:40AM +0200, Markus Armbruster wrote:
>> Michael Roth <michael.roth@amd.com> writes:
>> 
>> > For confidential guests, guest_memfd is currently used only for private
>> > guest memory, and normal guest memory comes from the configured memory
>> > backend just as it does for a non-confidential guest. It is now possible
>> > to use the same physical memory to back a particular GPA regardless of
>> > whether it is in a shared or private state. This avoids the need to
>> > rely on discarding memory between shared/private conversions (to avoid
>> > doubled memory usage), and is intended to be the primary mode of using
>> > guest_memfd for confidential guests moving forward, and future features
>> > like hugepage support will likely require it.
>> >
>> > Add an option to enable this support. Since ConfidentialGuestSupport is
>> > already used to track some guest_memfd-related functionality (e.g.
>> > whether it is required for the configured machine), similarly introduce
>> > this option as a property of ConfidentialGuestSupport.
>> >
>> > Also add the KVM-specific checks to enable this support, but leave the
>> > option disabled until other required changes are implemented for
>> > CGS variants that intend to make use of KVM's in-place conversion
>> > support.
>> >
>> > Signed-off-by: Michael Roth <michael.roth@amd.com>
>> 
>> [...]
>> 
>> > diff --git a/qapi/qom.json b/qapi/qom.json
>> > index 502fafeb15..037c078799 100644
>> > --- a/qapi/qom.json
>> > +++ b/qapi/qom.json
>> > @@ -1014,6 +1014,21 @@
>> >    'if': 'CONFIG_IGVM',
>> >    'data': { 'file': 'str' } }
>> >  
>> > +##
>> > +# @ConfidentialGuestSupportProperties:
>> > +#
>> > +# Properties for ConfidentialGuestSupport base class.
>> > +#
>> > +# @convert-in-place: If true, the same physical pages are reused
>> > +#     when memory is converted between shared and private states.
>> > +#     If false (default), separate allocations are used depending
>> > +#     on whether the page is private or shared.
>> > +#
>> > +# Since: 11.1
>> > +##
>> > +{ 'struct': 'ConfidentialGuestSupportProperties',
>> > +  'data': { '*convert-in-place': 'bool' } }
>> > +
>> >  ##
>> >  # @SevCommonProperties:
>> >  #
>> > @@ -1038,6 +1053,7 @@
>> >  # Since: 9.1
>> >  ##
>> >  { 'struct': 'SevCommonProperties',
>> > +  'base': 'ConfidentialGuestSupportProperties',
>> >    'data': { '*sev-device': 'str',
>> >              '*cbitpos': 'uint32',
>> >              'reduced-phys-bits': 'uint32',
>> 
>> Why use a base type instead of simply adding @convert-in-place to
>> SevCommonProperties?
>> 
>
> My thinking was that TDX and other implementations would similarly enable
> this through their CGS implementation, so I went ahead and carved out a
> set of common properties so that all ConfidentialGuestSupport
> implementations could use the same ,convert-in-place=true option (or set
> it by default for newer implementations).

How confident are we in future reuse by TDX and others?

If there are doubts, refactoring for reuse when reuse happens would be
smarter.  The refactoring would be a bit of churn, but not all that
much.

If it's something like "pretty much inevitable", preparing the reuse now
saves us that churn, and makes sense.

Judgement call, i.e. you decide.

> It is sort of tied to the 'allow_convert_in_place' flag that is part of
> the common ConfidentialGuestSupport object struct, so the property
> handling is sort of tied to the common ConfidentialGuestSupport base
> class as well rather than something implementation-specific.

Valid point.  Not sure how much weight to assign to it, though.

> Not sure if there are better ways to handle all that though.

Work your rationale into the commit message, please.



* Re: [PATCH RFC 02/12] hostmem: Introduce dedicated memory backend for guest_memfd
  2026-06-03  6:19     ` Michael Roth
@ 2026-06-08  8:20       ` Markus Armbruster
  2026-06-08 20:42         ` Michael Roth
  0 siblings, 1 reply; 24+ messages in thread
From: Markus Armbruster @ 2026-06-08  8:20 UTC (permalink / raw)
  To: Michael Roth
  Cc: Markus Armbruster, qemu-devel, kvm, pbonzini, berrange,
	pankaj.gupta, isaku.yamahata, xiaoyao.li, chao.p.peng, david,
	ashish.kalra, ackerleytng

Michael Roth <michael.roth@amd.com> writes:

> On Tue, Jun 02, 2026 at 10:22:01AM +0200, Markus Armbruster wrote:
>> Michael Roth <michael.roth@amd.com> writes:
>> 
>> > In the initial implementation of guest_memfd in the linux kernel, it
>> > was not possible to map memory into userspace for direct access; instead
>> > the memory provided by the memory backend would be used for cases where
>> > a confidential VM wants to access normal/unprotected/unencrypted memory
>> > that can be used for shared memory use cases, and for access to private
>> > memory a guest_memfd could be associated with the same memslot. A memory
>> > 'private' attribute set via KVM_SET_MEMORY_ATTRIBUTES could then be used
>> > to have KVM route to the appropriate backing memory.
>> >
>> > In that model, it didn't make sense to introduce a specific backend for
>> > guest_memfd, since there was always a generally need to have a separate
>> 
>> a general need?
>
> Much nicer :)
>
>> 
>> > backend type to handle shared memory access/allocation. Instead, QEMU
>> > configures the guest_memfd support for the associated memslots
>> > internally for cases where it is running a confidential VM.
>> >
>> > However, with recent changes in guest_memfd kernel support, it is now
>> > possible to mmap() a guest_memfd FD into userspace and use it for shared
>> > memory, as well as continue to use the same physical pages for the same
>> > GPA ranges after they are converted to private ("in-place conversion").
>> >
>> > To enable the use of this mmap()-able/guest_memfd-provided memory to be
>> > used for normal/shared memory instead of just for private memory,
>> > introduce a dedicated guest_memfd memory backend that can be used both
>> > for confidential VMs that wish to make use of in-place conversion, as
>> > well as for non-confidential VMs that just want to make use of
>> > guest_memfd for normal memory (which can be useful both for testing as
>> > well as a stepping stone to things like software-protected VMs where the
>> > host can be trusted to provide some additional degree of isolation for
>> > the VM independently of hardware support).
>> >
>> > Signed-off-by: Michael Roth <michael.roth@amd.com>
>> 
>> [...]
>> 
>> > diff --git a/qapi/qom.json b/qapi/qom.json
>> > index dd45ac1087..502fafeb15 100644
>> > --- a/qapi/qom.json
>> > +++ b/qapi/qom.json
>> > @@ -661,7 +661,8 @@
>> >  # @share: if false, the memory is private to QEMU; if true, it is
>> >  #     shared (default false for backends memory-backend-file and
>> >  #     memory-backend-ram, true for backends memory-backend-epc,
>> > -#     memory-backend-memfd, and memory-backend-shm)
>> > +#     memory-backend-memfd, memory-backend-shm, and
>> > +#     memory-backend-guest-memfd)
>> >  #
>> >  # @reserve: if true, reserve swap space (or huge pages) if applicable
>> >  #     (default: true) (since 6.1)
>> > @@ -780,6 +781,18 @@
>> >              '*seal': 'bool' },
>> >    'if': 'CONFIG_LINUX' }
>> >  
>> > +##
>> > +# @MemoryBackendGuestMemfdProperties:
>> > +#
>> > +# Properties for memory-backend-guest-memfd objects.
>> > +#
>> > +# Since: 11.1
>> > +##
>> > +{ 'struct': 'MemoryBackendGuestMemfdProperties',
>> > +  'base': 'MemoryBackendProperties',
>> > +  'data': {},
>> > +  'if': 'CONFIG_LINUX' }
>> > +
>> 
>> Identical to MemoryBackendProperties so far.
>> 
>> >  ##
>> >  # @MemoryBackendShmProperties:
>> >  #
>> > @@ -1234,6 +1247,8 @@
>> >      'memory-backend-file',
>> >      { 'name': 'memory-backend-memfd',
>> >        'if': 'CONFIG_LINUX' },
>> > +    { 'name': 'memory-backend-guest-memfd',
>> > +      'if': 'CONFIG_LINUX' },
>> >      'memory-backend-ram',
>> >      { 'name': 'memory-backend-shm',
>> >        'if': 'CONFIG_POSIX' },
>> > @@ -1312,6 +1327,8 @@
>> >        'memory-backend-file':        'MemoryBackendFileProperties',
>> >        'memory-backend-memfd':       { 'type': 'MemoryBackendMemfdProperties',
>> >                                        'if': 'CONFIG_LINUX' },
>> > +      'memory-backend-guest-memfd': { 'type': 'MemoryBackendGuestMemfdProperties',
>> > +                                      'if': 'CONFIG_LINUX' },
>> 
>> You could use MemoryBackendProperties here, and drop
>> MemoryBackendGuestMemfdProperties, similar to how memory-backend-ram
>> is done.
>
> That's true. I think I was anticipating it being warranted at some point, but
> that doesn't need to happen here.
>
>> 
>> >        'memory-backend-ram':         'MemoryBackendProperties',
>> >        'memory-backend-shm':         { 'type': 'MemoryBackendShmProperties',
>> >                                        'if': 'CONFIG_POSIX' },
>> 
>> Should we provide guidance on when to use which memory backend?  The
>> commit message provides some clues...
>
> Were you thinking from a schema perspective, or something more
> user-facing?

The QAPI schema doc comments become the QEMU QMP Reference Manual, which
I believe is the first stop for "how do I use this?"

Sometimes, a full answer just doesn't fit there comfortably.  So we put
it elsewhere, and point to it from the QMP Reference.

> Either way, docs/system/confidential-guest-support.rst could definitely
> use some sprucing up as part of this series, so I can cover this aspect
> there as well.
>
>> 
>> > diff --git a/qemu-options.hx b/qemu-options.hx
>> > index 96ae41f787..3c754c149f 100644
>> > --- a/qemu-options.hx
>> > +++ b/qemu-options.hx
>> > @@ -5858,6 +5858,11 @@ SRST
>> >          off will cause a failure during allocation because it is not supported
>> >          by this backend.
>> >  
>> > +    ``-object memory-backend-guest-memfd,id=id,prealloc=on|off,size=size,host-nodes=host-nodes,policy=default|preferred|bind|interleave``
>> > +        Creates an anonymous memory file backend object that has similar
>> > +        semantics to memfd, but is also usable as private memory when
>> > +        running as a confidential VM. (Linux only)
>> 
>> There is no object type "memfd".  Do you mean "memory-backend-memfd"?
>
> Yes, will update.
>
>> 
>> If yes, that one has additional properties @hugetlb, @hugetlbsize, and
>> @seal.  Why are they not needed for memory-backend-guest-memfd?
>
> ATM, hugetlb is not enabled for guest_memfd in the kernel. It's likely the
> same set of options will apply, but there are also efforts to do things like
> plumb DAX memory through guest_memfd for confidential VMs where maybe we end
> up needing to be a bit more flexible/creative... not sure, but it seemed
> like a good idea to give ourselves a clean slate since the support isn't
> there yet anyway.

I gather these properties cannot work today.  I agree we shouldn't add
them until they do.

> For seal, I'm not aware of any plan to support that for guest_memfd, so
> it seems like unnecessary baggage to pull in.

Likewise.

> Thanks,
>
> Mike
>
>> 
>> > +
>> >      ``-object iommufd,id=id[,fd=fd]``
>> >          Creates an iommufd backend which allows control of DMA mapping
>> >          through the ``/dev/iommu`` device.
>> 



* Re: [PATCH RFC 04/12] accel/kvm: Add CGS option to control in-place conversion support
  2026-06-08  8:15       ` Markus Armbruster
@ 2026-06-08 20:21         ` Michael Roth
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-06-08 20:21 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, kvm, pbonzini, berrange, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

On Mon, Jun 08, 2026 at 10:15:41AM +0200, Markus Armbruster wrote:
> Michael Roth <michael.roth@amd.com> writes:
> 
> > On Tue, Jun 02, 2026 at 10:23:40AM +0200, Markus Armbruster wrote:
> >> Michael Roth <michael.roth@amd.com> writes:
> >> 
> >> > For confidential guests, guest_memfd is currently used only for private
> >> > guest memory, and normal guest memory comes from the configured memory
> >> > backend just as it does for a non-confidential guest. It is now possible
> >> > to use the same physical memory to back a particular GPA regardless of
> >> > whether it is in a shared or private state. This avoids the need to
> >> > rely on discarding memory between shared/private conversions (to avoid
> >> > doubled memory usage), and is intended to be the primary mode of using
> >> > guest_memfd for confidential guests moving forward, and future features
> >> > like hugepage support will likely require it.
> >> >
> >> > Add an option to enable this support. Since ConfidentialGuestSupport is
> >> > already used to track some guest_memfd-related functionality (e.g.
> >> > whether it is required for the configured machine), similarly introduce
> >> > this option as a property of ConfidentialGuestSupport.
> >> >
> >> > Also add the KVM-specific checks to enable this support, but leave the
> >> > option disabled until other required changes are implemented for
> >> > CGS variants that intend to make use of KVM's in-place conversion
> >> > support.
> >> >
> >> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> >> 
> >> [...]
> >> 
> >> > diff --git a/qapi/qom.json b/qapi/qom.json
> >> > index 502fafeb15..037c078799 100644
> >> > --- a/qapi/qom.json
> >> > +++ b/qapi/qom.json
> >> > @@ -1014,6 +1014,21 @@
> >> >    'if': 'CONFIG_IGVM',
> >> >    'data': { 'file': 'str' } }
> >> >  
> >> > +##
> >> > +# @ConfidentialGuestSupportProperties:
> >> > +#
> >> > +# Properties for ConfidentialGuestSupport base class.
> >> > +#
> >> > +# @convert-in-place: If true, the same physical pages are reused
> >> > +#     when memory is converted between shared and private states.
> >> > +#     If false (default), separate allocations are used depending
> >> > +#     on whether the page is private or shared.
> >> > +#
> >> > +# Since: 11.1
> >> > +##
> >> > +{ 'struct': 'ConfidentialGuestSupportProperties',
> >> > +  'data': { '*convert-in-place': 'bool' } }
> >> > +
> >> >  ##
> >> >  # @SevCommonProperties:
> >> >  #
> >> > @@ -1038,6 +1053,7 @@
> >> >  # Since: 9.1
> >> >  ##
> >> >  { 'struct': 'SevCommonProperties',
> >> > +  'base': 'ConfidentialGuestSupportProperties',
> >> >    'data': { '*sev-device': 'str',
> >> >              '*cbitpos': 'uint32',
> >> >              'reduced-phys-bits': 'uint32',
> >> 
> >> Why use a base type instead of simply adding @convert-in-place to
> >> SevCommonProperties?
> >> 
> >
> > My thinking was that TDX and other implementations would similarly enable
> > this through their CGS implementation, so I went ahead and carved out a
> > set of common properties that ConfidentialGuestSupport implementations
> > could use the same ,convert-in-place=true option (or set it by default
> > for newer implementations)
> 
> How confident are we in future reuse by TDX and others?
> 
> If there are doubts, refactoring for reuse when reuse happens would be
> smarter.  The refactoring would be a bit of churn, but not all that
> much.
> 
> If it's something like "pretty much inevitable", preparing the reuse now
> saves us that churn, and makes sense.
> 
> Judgement call, i.e. you decide.

Hoping to hear back from the TDX folks on whether there's anything
missing to switch things on there too, at which point maybe it won't be
premature to have a common type. But yah, until then I can plan to keep
the option specific to SEV/SevCommonProperties. As you said, not a big
deal to move it out to a common base after-the-fact.
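
For reference, keeping it SEV-specific would look roughly like the sketch
below: the '*convert-in-place' member from this patch moves directly into
SevCommonProperties (along with its doc comment), and the
ConfidentialGuestSupportProperties base type goes away. This is untested,
the member placement is arbitrary, and the remaining members are elided
just as in the hunk quoted above:

  { 'struct': 'SevCommonProperties',
    'data': { '*convert-in-place': 'bool',
              '*sev-device': 'str',
              '*cbitpos': 'uint32',
              'reduced-phys-bits': 'uint32',
              [...]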

> 
> > It is sort of tied to the 'allow_convert_in_place' flag that is part of
> > the common ConfidentialGuestSupport object struct, so the property
> > handling is sort of tied to the common ConfidentialGuestSupport base
> > class as well rather than something implementation-specific.
> 
> Valid point.  Not sure how much weight to assign to it, though.

Yah, I was purposely trying to make it easy for other platforms to switch
it on, but if we do keep it limited to SNP for the time being, then
there are probably other ways to go about it.

> 
> > Not sure if there are better ways to handle all that though.
> 
> Work your rationale into the commit message, please.
> 

Will do!

Thanks,

Mike


* Re: [PATCH RFC 02/12] hostmem: Introduce dedicated memory backend for guest_memfd
  2026-06-08  8:20       ` Markus Armbruster
@ 2026-06-08 20:42         ` Michael Roth
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2026-06-08 20:42 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, kvm, pbonzini, berrange, pankaj.gupta, isaku.yamahata,
	xiaoyao.li, chao.p.peng, david, ashish.kalra, ackerleytng

On Mon, Jun 08, 2026 at 10:20:22AM +0200, Markus Armbruster wrote:
> Michael Roth <michael.roth@amd.com> writes:
> 
> > On Tue, Jun 02, 2026 at 10:22:01AM +0200, Markus Armbruster wrote:
> >> Michael Roth <michael.roth@amd.com> writes:
> >> 
> >> > In the initial implementation of guest_memfd in the linux kernel, it
> >> > was not possible to map memory into userspace for direct access; instead
> >> > the memory provided by the memory backend would be used for cases where
> >> > a confidential VM wants to access normal/unprotected/unencrypted memory
> >> > that can be used for shared memory use cases, and for access to private
> >> > memory a guest_memfd could be associated with the same memslot. A memory
> >> > 'private' attribute set via KVM_SET_MEMORY_ATTRIBUTES could then be used
> >> > to have KVM route to the appropriate backing memory.
> >> >
> >> > In that model, it didn't make sense to introduce a specific backend for
> >> > guest_memfd, since there was always a generally need to have a separate
> >> 
> >> a general need?
> >
> > Much nicer :)
> >
> >> 
> >> > backend type to handle shared memory access/allocation. Instead, QEMU
> >> > configures the guest_memfd support for the associated memslots
> >> > internally for cases where it is running a confidential VM.
> >> >
> >> > However, with recent changes in guest_memfd kernel support, it is now
> >> > possible to mmap() a guest_memfd FD into userspace and use it for shared
> >> > memory, as well as continue to use the same physical pages for the same
> >> > GPA ranges after they are converted to private ("in-place conversion").
> >> >
> >> > To enable the use of this mmap()-able/guest_memfd-provided memory to be
> >> > used for normal/shared memory instead of just for private memory,
> >> > introduce a dedicated guest_memfd memory backend that can be used both
> >> > for confidential VMs that wish to make use of in-place conversion, as
> >> > well as for non-confidential VMs that just want to make use of
> >> > guest_memfd for normal memory (which can be useful both for testing as
> >> > well as a stepping stone to things like software-protected VMs where the
> >> > host can be trusted to provide some additional degree of isolation for
> >> > the VM independently of hardware support).
> >> >
> >> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> >> 
> >> [...]
> >> 
> >> > diff --git a/qapi/qom.json b/qapi/qom.json
> >> > index dd45ac1087..502fafeb15 100644
> >> > --- a/qapi/qom.json
> >> > +++ b/qapi/qom.json
> >> > @@ -661,7 +661,8 @@
> >> >  # @share: if false, the memory is private to QEMU; if true, it is
> >> >  #     shared (default false for backends memory-backend-file and
> >> >  #     memory-backend-ram, true for backends memory-backend-epc,
> >> > -#     memory-backend-memfd, and memory-backend-shm)
> >> > +#     memory-backend-memfd, memory-backend-shm, and
> >> > +#     memory-backend-guest-memfd)
> >> >  #
> >> >  # @reserve: if true, reserve swap space (or huge pages) if applicable
> >> >  #     (default: true) (since 6.1)
> >> > @@ -780,6 +781,18 @@
> >> >              '*seal': 'bool' },
> >> >    'if': 'CONFIG_LINUX' }
> >> >  
> >> > +##
> >> > +# @MemoryBackendGuestMemfdProperties:
> >> > +#
> >> > +# Properties for memory-backend-guest-memfd objects.
> >> > +#
> >> > +# Since: 11.1
> >> > +##
> >> > +{ 'struct': 'MemoryBackendGuestMemfdProperties',
> >> > +  'base': 'MemoryBackendProperties',
> >> > +  'data': {},
> >> > +  'if': 'CONFIG_LINUX' }
> >> > +
> >> 
> >> Identical to MemoryBackendProperties so far.
> >> 
> >> >  ##
> >> >  # @MemoryBackendShmProperties:
> >> >  #
> >> > @@ -1234,6 +1247,8 @@
> >> >      'memory-backend-file',
> >> >      { 'name': 'memory-backend-memfd',
> >> >        'if': 'CONFIG_LINUX' },
> >> > +    { 'name': 'memory-backend-guest-memfd',
> >> > +      'if': 'CONFIG_LINUX' },
> >> >      'memory-backend-ram',
> >> >      { 'name': 'memory-backend-shm',
> >> >        'if': 'CONFIG_POSIX' },
> >> > @@ -1312,6 +1327,8 @@
> >> >        'memory-backend-file':        'MemoryBackendFileProperties',
> >> >        'memory-backend-memfd':       { 'type': 'MemoryBackendMemfdProperties',
> >> >                                        'if': 'CONFIG_LINUX' },
> >> > +      'memory-backend-guest-memfd': { 'type': 'MemoryBackendGuestMemfdProperties',
> >> > +                                      'if': 'CONFIG_LINUX' },
> >> 
> >> You could use MemoryBackendProperties here, and drop
> >> MemoryBackendGuestMemfdProperties, similar to how memory-backend-ram
> >> is done.
> >
> > That's true. I think I was anticipating it being warranted at some point, but
> > that doesn't need to happen here.
> >
> >> 
> >> >        'memory-backend-ram':         'MemoryBackendProperties',
> >> >        'memory-backend-shm':         { 'type': 'MemoryBackendShmProperties',
> >> >                                        'if': 'CONFIG_POSIX' },
> >> 
> >> Should we provide guidance on when to use which memory backend?  The
> >> commit message provides some clues...
> >
> > Were you thinking from a schema perspective, or something more
> > user-facing?
> 
> The QAPI schema doc comments become the QEMU QMP Reference Manual, which
> I believe is the first stop for "how do I use this?"
> 
> Sometimes, a full answer just doesn't fit there comfortably.  So we put
> it elsewhere, and point to it from the QMP Reference.

Makes sense, I'll cross reference the documentation and provide some
background on how the backends / options are used.

Thanks,

Mike

> 
> > Either way, docs/system/confidential-guest-support.rst could definitely
> > use some sprucing up as part of this series, so I can cover this aspect
> > there as well.
> >
> >> 
> >> > diff --git a/qemu-options.hx b/qemu-options.hx
> >> > index 96ae41f787..3c754c149f 100644
> >> > --- a/qemu-options.hx
> >> > +++ b/qemu-options.hx
> >> > @@ -5858,6 +5858,11 @@ SRST
> >> >          off will cause a failure during allocation because it is not supported
> >> >          by this backend.
> >> >  
> >> > +    ``-object memory-backend-guest-memfd,id=id,prealloc=on|off,size=size,host-nodes=host-nodes,policy=default|preferred|bind|interleave``
> >> > +        Creates an anonymous memory file backend object that has similar
> >> > +        semantics to memfd, but is also usable as private memory when
> >> > +        running as a confidential VM. (Linux only)
> >> 
> >> There is no object type "memfd".  Do you mean "memory-backend-memfd"?
> >
> > Yes, will update.
> >
> >> 
> >> If yes, that one has additional properties @hugetlb, @hugetlbsize, and
> >> @seal.  Why are they not needed for memory-backend-guest-memfd?
> >
> > ATM, hugetlb is not enabled for guest_memfd in the kernel. It's likely the
> > same set of options will apply, but there are also efforts to do things like
> > plumb DAX memory through guest_memfd for confidential VMs where maybe we end
> > up needing to be a bit more flexible/creative... not sure, but it seemed
> > like a good idea to give ourselves a clean slate since the support isn't
> > there yet anyway.
> 
> I gather these properties cannot work today.  I agree we shouldn't add
> them until they do.
> 
> > For seal, I'm not aware of any plan to support that for guest_memfd, so
> > it seems like unnecessary baggage to pull in.
> 
> Likewise.

Sounds good, though I'm sort of now leaning more toward the
memory-backend-memfd,guest_memfd=on approach that Peter implemented[1],
since it requires fewer assumptions about what we'll need to do later.
If we want to introduce a backend specifically for guest_memfd, we'll
still have the option. But if we do it now and then decide to go back
to re-using the existing *-memfd/*-file/*-etc backends because the
option format seems more familiar to QEMU users, then the dedicated
backend is a little bit more of a pain to turn around and try to
deprecate.

Not sure yet what we'll end up doing, but hopefully for v2 we'll have a
plan for what to do initially at least.
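
To make the comparison concrete, the two forms would look roughly like
this on the command line. The id and size values are placeholders, and
the exact property spelling in [1] may differ from the shorthand used
above:

  # dedicated backend, as proposed in this patch
  -object memory-backend-guest-memfd,id=ram0,size=4G

  # existing backend plus an option, along the lines of [1]
  -object memory-backend-memfd,id=ram0,size=4G,guest_memfd=on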

Thanks,

Mike

[1] https://lore.kernel.org/qemu-devel/aiCAFWKEAHkPLCO5@x1.local/

> 
> > Thanks,
> >
> > Mike
> >
> >> 
> >> > +
> >> >      ``-object iommufd,id=id[,fd=fd]``
> >> >          Creates an iommufd backend which allows control of DMA mapping
> >> >          through the ``/dev/iommu`` device.
> >> 
> 


Thread overview: 24+ messages
2026-05-28  0:03 [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Michael Roth
2026-05-28  0:03 ` [PATCH RFC 01/12] accel/kvm: Decouple guest_memfd checks from memory attribute checks Michael Roth
2026-05-28  0:03 ` [PATCH RFC 02/12] hostmem: Introduce dedicated memory backend for guest_memfd Michael Roth
2026-06-02  8:22   ` Markus Armbruster
2026-06-03  6:19     ` Michael Roth
2026-06-08  8:20       ` Markus Armbruster
2026-06-08 20:42         ` Michael Roth
2026-05-28  0:03 ` [PATCH RFC 04/12] accel/kvm: Add CGS option to control in-place conversion support Michael Roth
2026-06-02  8:23   ` Markus Armbruster
2026-06-03  6:39     ` Michael Roth
2026-06-08  8:15       ` Markus Armbruster
2026-06-08 20:21         ` Michael Roth
2026-05-28  0:03 ` [PATCH RFC 05/12] system/memory: Re-use memory-backend-guest-memfd inode for private memory Michael Roth
2026-05-28  0:03 ` [PATCH RFC 06/12] system/memory: Default to guest_memfd for RAM for in-place conversion Michael Roth
2026-05-28  0:03 ` [PATCH RFC 07/12] accel/kvm: Move post-conversion updates to a separate helper Michael Roth
2026-05-28  0:03 ` [PATCH RFC 08/12] accel/kvm: Re-order attribute notifications for in-place conversion Michael Roth
2026-05-28  0:03 ` [PATCH RFC 09/12] accel/kvm: Support shared/private conversions via guest_memfd ioctls Michael Roth
2026-06-04 13:19   ` Gupta, Pankaj
2026-06-04 23:36     ` Michael Roth
2026-05-28  0:03 ` [PATCH RFC 10/12] accel/kvm: Don't default to private attributes for in-place conversion Michael Roth
2026-05-28  0:03 ` [PATCH RFC 11/12] i386/sev: Update SNP_LAUNCH_UPDATE " Michael Roth
2026-05-28  0:03 ` [PATCH RFC 12/12] i386/sev: Allow in-place conversion for SEV-SNP guests Michael Roth
2026-05-28  5:44 ` [PATCH RFC 00/12] guest_memfd: support in-place memory conversion Xiaoyao Li
2026-06-02 22:20   ` Michael Roth
