[POC PATCH 0/5] QEMU: Enable in-place conversion and hugetlb gmem

* [POC PATCH 0/5] QEMU: Enable in-place conversion and hugetlb gmem
       [not found] <cover.1747264138.git.ackerleytng@google.com>
@ 2025-07-15  3:31 ` Xiaoyao Li
  2025-07-15  3:31   ` [POC PATCH 1/5] update-linux-headers: Add guestmem.h Xiaoyao Li
                     ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Xiaoyao Li @ 2025-07-15  3:31 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, ackerleytng, seanjc
  Cc: Fuad Tabba, Vishal Annapurve, rick.p.edgecombe, Kai Huang,
	binbin.wu, yan.y.zhao, ira.weiny, michael.roth, kvm, qemu-devel,
	Peter Xu, Philippe Mathieu-Daudé

Hi all,

This is a POC that enables in-place conversion and hugetlb support for
gmem (guest_memfd) in QEMU. Together with the 1G gmem support series[1]
and the TDX hugepage support series[2], it can run a TDX guest backed by
huge pages. I don't have an SNP environment, so I don't know how it
behaves with SNP.

It is only a POC; we share it to show how QEMU works with the gmem ABI.

The POC takes the simple approach of switching to in-place conversion
and hugetlb whenever they are supported. It does not introduce any new
QEMU interface, so an existing command line that boots a TDX guest works
without any change.

Please go to the individual patches (specifically patches 3, 4 and 5) to
discuss the ABI usage, potential issues, and what an upstreamable design
might look like.

[1] https://lore.kernel.org/all/cover.1747264138.git.ackerleytng@google.com/
[2] https://lore.kernel.org/all/20250424030033.32635-1-yan.y.zhao@intel.com/
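
As a rough illustration of the "use it when it is supported" policy
described above, the flag selection can be sketched as a pure function.
This is not code from the series: pick_guest_memfd_flags() and
uses_in_place_conversion() are hypothetical names, the flag values are
copied from the kvm.h hunk in patch 2/5, and the way the hugetlb bits
are combined is an assumption (patch 4/5 is the authority on that).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the kvm.h hunk in patch 2/5 (POC ABI). */
#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
#define GUEST_MEMFD_FLAG_INIT_PRIVATE   (1ULL << 1)
#define GUEST_MEMFD_FLAG_HUGETLB        (1ULL << 2)

/*
 * Hypothetical helper: build the guest_memfd creation flags from what the
 * host kernel reports.  The in-place part mirrors ram_block_add() in
 * patch 3/5; the hugetlb part is an assumption made for illustration.
 */
static uint64_t pick_guest_memfd_flags(bool inplace_ok, bool hugetlb_ok,
                                       uint64_t hugetlb_size_flag)
{
    uint64_t flags = 0;

    if (inplace_ok) {
        flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED |
                 GUEST_MEMFD_FLAG_INIT_PRIVATE;
    }
    if (hugetlb_ok) {
        /* guestmem.h requires an explicit huge page size. */
        assert(hugetlb_size_flag != 0);
        flags |= GUEST_MEMFD_FLAG_HUGETLB | hugetlb_size_flag;
    }
    return flags;
}

/* Same test as memory_region_guest_memfd_in_place_conversion(). */
static bool uses_in_place_conversion(uint64_t flags)
{
    return (flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED) != 0;
}
```

With neither feature available the function returns 0, which matches
the flags value QEMU passes to kvm_create_guest_memfd() today.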

Xiaoyao Li (4):
  update-linux-headers: Add guestmem.h
  headers: Fetch gmem updates
  memory/guest_memfd: Enable hugetlb support
  [HACK] memory: Don't enable in-place conversion for internal
    MemoryRegion with gmem

Yan Zhao (1):
  memory/guest_memfd: Enable in-place conversion when available

 accel/kvm/kvm-all.c             | 82 ++++++++++++++++++++++++---------
 accel/stubs/kvm-stub.c          |  2 +
 include/system/kvm.h            |  2 +
 include/system/memory.h         |  5 ++
 include/system/ramblock.h       |  1 +
 linux-headers/linux/guestmem.h  | 29 ++++++++++++
 linux-headers/linux/kvm.h       | 18 ++++++++
 scripts/update-linux-headers.sh |  2 +-
 system/memory.c                 |  9 +++-
 system/physmem.c                | 40 ++++++++++++++--
 10 files changed, 163 insertions(+), 27 deletions(-)
 create mode 100644 linux-headers/linux/guestmem.h

-- 
2.43.0




* [POC PATCH 1/5] update-linux-headers: Add guestmem.h
  2025-07-15  3:31 ` [POC PATCH 0/5] QEMU: Enable in-place conversion and hugetlb gmem Xiaoyao Li
@ 2025-07-15  3:31   ` Xiaoyao Li
  2025-07-15  3:31   ` [POC PATCH 2/5] headers: Fetch gmem updates Xiaoyao Li
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Xiaoyao Li @ 2025-07-15  3:31 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, ackerleytng, seanjc
  Cc: Fuad Tabba, Vishal Annapurve, rick.p.edgecombe, Kai Huang,
	binbin.wu, yan.y.zhao, ira.weiny, michael.roth, kvm, qemu-devel,
	Peter Xu, Philippe Mathieu-Daudé

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index b43b8ef75a63..3f6169a121a8 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -200,7 +200,7 @@ rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
 for header in const.h stddef.h kvm.h vfio.h vfio_ccw.h vfio_zdev.h vhost.h \
               psci.h psp-sev.h userfaultfd.h memfd.h mman.h nvme_ioctl.h \
-              vduse.h iommufd.h bits.h; do
+              vduse.h iommufd.h bits.h guestmem.h; do
     cp "$hdrdir/include/linux/$header" "$output/linux-headers/linux"
 done
 
-- 
2.43.0




* [POC PATCH 2/5] headers: Fetch gmem updates
  2025-07-15  3:31 ` [POC PATCH 0/5] QEMU: Enable in-place conversion and hugetlb gmem Xiaoyao Li
  2025-07-15  3:31   ` [POC PATCH 1/5] update-linux-headers: Add guestmem.h Xiaoyao Li
@ 2025-07-15  3:31   ` Xiaoyao Li
  2025-07-15  3:31   ` [POC PATCH 3/5] memory/guest_memfd: Enable in-place conversion when available Xiaoyao Li
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Xiaoyao Li @ 2025-07-15  3:31 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, ackerleytng, seanjc
  Cc: Fuad Tabba, Vishal Annapurve, rick.p.edgecombe, Kai Huang,
	binbin.wu, yan.y.zhao, ira.weiny, michael.roth, kvm, qemu-devel,
	Peter Xu, Philippe Mathieu-Daudé

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 linux-headers/linux/guestmem.h | 29 +++++++++++++++++++++++++++++
 linux-headers/linux/kvm.h      | 18 ++++++++++++++++++
 2 files changed, 47 insertions(+)
 create mode 100644 linux-headers/linux/guestmem.h

diff --git a/linux-headers/linux/guestmem.h b/linux-headers/linux/guestmem.h
new file mode 100644
index 000000000000..be045fbad230
--- /dev/null
+++ b/linux-headers/linux/guestmem.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_GUESTMEM_H
+#define _LINUX_GUESTMEM_H
+
+/*
+ * Huge page size must be explicitly defined when using the guestmem_hugetlb
+ * allocator for guest_memfd.  It is the responsibility of the application to
+ * know which sizes are supported on the running system.  See mmap(2) man page
+ * for details.
+ */
+
+#define GUESTMEM_HUGETLB_FLAG_SHIFT	58
+#define GUESTMEM_HUGETLB_FLAG_MASK	0x3fUL
+
+#define GUESTMEM_HUGETLB_FLAG_16KB	(14UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_64KB	(16UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_512KB	(19UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_1MB	(20UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_2MB	(21UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_8MB	(23UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_16MB	(24UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_32MB	(25UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_256MB	(28UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_512MB	(29UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_1GB	(30UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_2GB	(31UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_16GB	(34UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+
+#endif /* _LINUX_GUESTMEM_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 32c5885a3c20..ff9ef5fb37c5 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -952,6 +952,9 @@ struct kvm_enable_cap {
 #define KVM_CAP_ARM_EL2 240
 #define KVM_CAP_ARM_EL2_E2H0 241
 #define KVM_CAP_RISCV_MP_STATE_RESET 242
+#define KVM_CAP_GMEM_SHARED_MEM 240
+#define KVM_CAP_GMEM_CONVERSION 241
+#define KVM_CAP_GMEM_HUGETLB 242
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
@@ -1589,12 +1592,27 @@ struct kvm_memory_attributes {
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
 
+#define GUEST_MEMFD_FLAG_SUPPORT_SHARED	(1UL << 0)
+#define GUEST_MEMFD_FLAG_INIT_PRIVATE	(1UL << 1)
+#define GUEST_MEMFD_FLAG_HUGETLB	(1UL << 2)
+
 struct kvm_create_guest_memfd {
 	__u64 size;
 	__u64 flags;
 	__u64 reserved[6];
 };
 
+#define KVM_GMEM_IO 0xAF
+#define KVM_GMEM_CONVERT_SHARED		_IOWR(KVM_GMEM_IO,  0x41, struct kvm_gmem_convert)
+#define KVM_GMEM_CONVERT_PRIVATE	_IOWR(KVM_GMEM_IO,  0x42, struct kvm_gmem_convert)
+
+struct kvm_gmem_convert {
+	__u64 offset;
+	__u64 size;
+	__u64 error_offset;
+	__u64 reserved[5];
+};
+
 #define KVM_PRE_FAULT_MEMORY	_IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
 
 struct kvm_pre_fault_memory {
-- 
2.43.0
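
For readers unfamiliar with the encoding in guestmem.h: each size flag
is log2 of the huge page size, stored in the 6 bits starting at bit 58.
The sketch below shows that arithmetic. The two helper names are
hypothetical and not part of the header; the constants are copied from
the hunk above, with ULL instead of UL so the sketch also behaves on
32-bit hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the guestmem.h hunk above (UL widened to ULL). */
#define GUESTMEM_HUGETLB_FLAG_SHIFT 58
#define GUESTMEM_HUGETLB_FLAG_MASK  0x3fULL
#define GUESTMEM_HUGETLB_FLAG_2MB   (21ULL << GUESTMEM_HUGETLB_FLAG_SHIFT)
#define GUESTMEM_HUGETLB_FLAG_1GB   (30ULL << GUESTMEM_HUGETLB_FLAG_SHIFT)

/*
 * Hypothetical helper: encode a huge page size as a size flag.
 * Returns 0 when page_size is not a power of two (and for a size of 1,
 * whose log2 is 0 and therefore has no distinct encoding).
 */
static uint64_t gmem_hugetlb_size_flag(uint64_t page_size)
{
    uint64_t order = 0;

    if (page_size == 0 || (page_size & (page_size - 1)) != 0) {
        return 0;
    }
    while ((page_size >> order) != 1) {
        order++;
    }
    assert(order <= GUESTMEM_HUGETLB_FLAG_MASK);
    return order << GUESTMEM_HUGETLB_FLAG_SHIFT;
}

/* Hypothetical helper: recover the page size from a flags word. */
static uint64_t gmem_hugetlb_page_size(uint64_t flags)
{
    uint64_t order = (flags >> GUESTMEM_HUGETLB_FLAG_SHIFT) &
                     GUESTMEM_HUGETLB_FLAG_MASK;

    return order ? (1ULL << order) : 0;
}
```

Because the size lives in the top bits, it can be OR-ed with the low
GUEST_MEMFD_FLAG_* bits without overlap.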




* [POC PATCH 3/5] memory/guest_memfd: Enable in-place conversion when available
  2025-07-15  3:31 ` [POC PATCH 0/5] QEMU: Enable in-place conversion and hugetlb gmem Xiaoyao Li
  2025-07-15  3:31   ` [POC PATCH 1/5] update-linux-headers: Add guestmem.h Xiaoyao Li
  2025-07-15  3:31   ` [POC PATCH 2/5] headers: Fetch gmem updates Xiaoyao Li
@ 2025-07-15  3:31   ` Xiaoyao Li
  2025-07-17  2:02     ` Chenyi Qiang
  2025-07-15  3:31   ` [POC PATCH 4/5] memory/guest_memfd: Enable hugetlb support Xiaoyao Li
  2025-07-15  3:31   ` [POC PATCH 5/5] [HACK] memory: Don't enable in-place conversion for internal MemoryRegion with gmem Xiaoyao Li
  4 siblings, 1 reply; 8+ messages in thread
From: Xiaoyao Li @ 2025-07-15  3:31 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, ackerleytng, seanjc
  Cc: Fuad Tabba, Vishal Annapurve, rick.p.edgecombe, Kai Huang,
	binbin.wu, yan.y.zhao, ira.weiny, michael.roth, kvm, qemu-devel,
	Peter Xu, Philippe Mathieu-Daudé

From: Yan Zhao <yan.y.zhao@intel.com>

(This is just POC code that uses in-place conversion gmem.)

Try to use in-place conversion gmem when it is supported.

When in-place conversion is enabled, there is no need to discard memory
on conversion, because the same memory is reused with the opposite
attribute after the conversion.

For an upstreamable solution, we can introduce memory-backend-guestmemfd
for in-place conversion. Without in-place conversion, separate non-gmem
memory is needed to back the shared memory, and gmem is created
implicitly and internally based on the VM type. With in-place
conversion, no separate non-gmem memory is needed because gmem itself
can serve as shared memory. We can therefore introduce
memory-backend-guestmemfd as the dedicated backend for in-place
conversion gmem.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c       | 79 ++++++++++++++++++++++++++++-----------
 accel/stubs/kvm-stub.c    |  1 +
 include/system/kvm.h      |  1 +
 include/system/memory.h   |  2 +
 include/system/ramblock.h |  1 +
 system/memory.c           |  7 ++++
 system/physmem.c          | 21 ++++++++++-
 7 files changed, 90 insertions(+), 22 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a106d1ba0f0b..609537738d38 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -105,6 +105,7 @@ static int kvm_sstep_flags;
 static bool kvm_immediate_exit;
 static uint64_t kvm_supported_memory_attributes;
 static bool kvm_guest_memfd_supported;
+bool kvm_guest_memfd_inplace_supported;
 static hwaddr kvm_max_slot_size = ~0;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
@@ -1487,6 +1488,30 @@ static int kvm_set_memory_attributes(hwaddr start, uint64_t size, uint64_t attr)
     return r;
 }
 
+static int kvm_set_guest_memfd_shareability(MemoryRegion *mr, ram_addr_t offset,
+                                            uint64_t size, bool shared)
+{
+    int guest_memfd = mr->ram_block->guest_memfd;
+    struct kvm_gmem_convert param = {
+                .offset = offset,
+                .size = size,
+                .error_offset = 0,
+    };
+    unsigned long op;
+    int r;
+
+    op = shared ? KVM_GMEM_CONVERT_SHARED : KVM_GMEM_CONVERT_PRIVATE;
+
+    r = ioctl(guest_memfd, op, &param);
+    if (r) {
+        error_report("failed to set guest_memfd offset 0x" RAM_ADDR_FMT
+                     " size 0x%" PRIx64 " to %s: '%s', error offset 0x%llx",
+                     offset, size, shared ? "shared" : "private",
+                     strerror(errno), param.error_offset);
+    }
+    return r;
+}
+
 int kvm_set_memory_attributes_private(hwaddr start, uint64_t size)
 {
     return kvm_set_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
@@ -1604,7 +1629,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
             abort();
         }
 
-        if (memory_region_has_guest_memfd(mr)) {
+        if (memory_region_has_guest_memfd(mr) &&
+            !memory_region_guest_memfd_in_place_conversion(mr)) {
             err = kvm_set_memory_attributes_private(start_addr, slot_size);
             if (err) {
                 error_report("%s: failed to set memory attribute private: %s",
@@ -2779,6 +2805,9 @@ static int kvm_init(AccelState *as, MachineState *ms)
         kvm_check_extension(s, KVM_CAP_GUEST_MEMFD) &&
         kvm_check_extension(s, KVM_CAP_USER_MEMORY2) &&
         (kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE);
+    kvm_guest_memfd_inplace_supported =
+        kvm_check_extension(s, KVM_CAP_GMEM_SHARED_MEM) &&
+        kvm_check_extension(s, KVM_CAP_GMEM_CONVERSION);
     kvm_pre_fault_memory_supported = kvm_vm_check_extension(s, KVM_CAP_PRE_FAULT_MEMORY);
 
     if (s->kernel_irqchip_split == ON_OFF_AUTO_AUTO) {
@@ -3056,6 +3085,7 @@ static void kvm_eat_signals(CPUState *cpu)
 
 int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
 {
+    bool in_place_conversion = false;
     MemoryRegionSection section;
     ram_addr_t offset;
     MemoryRegion *mr;
@@ -3112,18 +3142,23 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
         goto out_unref;
     }
 
-    if (to_private) {
-        ret = kvm_set_memory_attributes_private(start, size);
-    } else {
-        ret = kvm_set_memory_attributes_shared(start, size);
-    }
-    if (ret) {
-        goto out_unref;
-    }
-
     addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);
 
+    in_place_conversion = memory_region_guest_memfd_in_place_conversion(mr);
+    if (in_place_conversion) {
+        ret = kvm_set_guest_memfd_shareability(mr, offset, size, !to_private);
+    } else {
+        if (to_private) {
+            ret = kvm_set_memory_attributes_private(start, size);
+        } else {
+            ret = kvm_set_memory_attributes_shared(start, size);
+        }
+    }
+    if (ret) {
+        goto out_unref;
+    }
+
     ret = ram_block_attributes_state_change(RAM_BLOCK_ATTRIBUTES(mr->rdm),
                                             offset, size, to_private);
     if (ret) {
@@ -3133,17 +3168,19 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
         goto out_unref;
     }
 
-    if (to_private) {
-        if (rb->page_size != qemu_real_host_page_size()) {
-            /*
-             * shared memory is backed by hugetlb, which is supposed to be
-             * pre-allocated and doesn't need to be discarded
-             */
-            goto out_unref;
-        }
-        ret = ram_block_discard_range(rb, offset, size);
-    } else {
-        ret = ram_block_discard_guest_memfd_range(rb, offset, size);
+    if (!in_place_conversion) {
+        if (to_private) {
+            if (rb->page_size != qemu_real_host_page_size()) {
+                /*
+                 * shared memory is backed by hugetlb, which is supposed to be
+                 * pre-allocated and doesn't need to be discarded
+                 */
+                goto out_unref;
+            }
+            ret = ram_block_discard_range(rb, offset, size);
+        } else {
+            ret = ram_block_discard_guest_memfd_range(rb, offset, size);
+        }
     }
 
 out_unref:
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 68cd33ba9735..bf0ccae27b62 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -24,6 +24,7 @@ bool kvm_gsi_direct_mapping;
 bool kvm_allowed;
 bool kvm_readonly_mem_allowed;
 bool kvm_msi_use_devid;
+bool kvm_guest_memfd_inplace_supported;
 
 void kvm_flush_coalesced_mmio_buffer(void)
 {
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 3c7d31473663..32f2be5f92e1 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -43,6 +43,7 @@ extern bool kvm_gsi_direct_mapping;
 extern bool kvm_readonly_mem_allowed;
 extern bool kvm_msi_use_devid;
 extern bool kvm_pre_fault_memory_supported;
+extern bool kvm_guest_memfd_inplace_supported;
 
 #define kvm_enabled()           (kvm_allowed)
 /**
diff --git a/include/system/memory.h b/include/system/memory.h
index 46248d4a52c4..f14fbf65805d 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -1812,6 +1812,8 @@ bool memory_region_is_protected(MemoryRegion *mr);
  */
 bool memory_region_has_guest_memfd(MemoryRegion *mr);
 
+bool memory_region_guest_memfd_in_place_conversion(MemoryRegion *mr);
+
 /**
  * memory_region_get_iommu: check whether a memory region is an iommu
  *
diff --git a/include/system/ramblock.h b/include/system/ramblock.h
index 87e847e184aa..87757940ea21 100644
--- a/include/system/ramblock.h
+++ b/include/system/ramblock.h
@@ -46,6 +46,7 @@ struct RAMBlock {
     int fd;
     uint64_t fd_offset;
     int guest_memfd;
+    uint64_t guest_memfd_flags;
     RamBlockAttributes *attributes;
     size_t page_size;
     /* dirty bitmap used during migration */
diff --git a/system/memory.c b/system/memory.c
index e8d9b15b28f6..6870a41629ef 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -35,6 +35,7 @@
 
 #include "memory-internal.h"
 
+#include <linux/kvm.h>
 //#define DEBUG_UNASSIGNED
 
 static unsigned memory_region_transaction_depth;
@@ -1878,6 +1879,12 @@ bool memory_region_has_guest_memfd(MemoryRegion *mr)
     return mr->ram_block && mr->ram_block->guest_memfd >= 0;
 }
 
+bool memory_region_guest_memfd_in_place_conversion(MemoryRegion *mr)
+{
+    return mr && memory_region_has_guest_memfd(mr) &&
+           (mr->ram_block->guest_memfd_flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED);
+}
+
 uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
 {
     uint8_t mask = mr->dirty_log_mask;
diff --git a/system/physmem.c b/system/physmem.c
index 130c148ffb5c..955480685310 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -89,6 +89,9 @@
 
 #include "memory-internal.h"
 
+#include <linux/guestmem.h>
+#include <linux/kvm.h>
+
 //#define DEBUG_SUBPAGE
 
 /* ram_list is read under rcu_read_lock()/rcu_read_unlock().  Writes
@@ -1913,6 +1916,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
 
     if (new_block->flags & RAM_GUEST_MEMFD) {
         int ret;
+        bool in_place = kvm_guest_memfd_inplace_supported;
+
+        new_block->guest_memfd_flags = 0;
 
         if (!kvm_enabled()) {
             error_setg(errp, "cannot set up private guest memory for %s: KVM required",
@@ -1929,13 +1935,26 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
             goto out_free;
         }
 
+        if (in_place) {
+            new_block->guest_memfd_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED |
+                                            GUEST_MEMFD_FLAG_INIT_PRIVATE;
+        }
+
         new_block->guest_memfd = kvm_create_guest_memfd(new_block->max_length,
-                                                        0, errp);
+                                 new_block->guest_memfd_flags, errp);
         if (new_block->guest_memfd < 0) {
             qemu_mutex_unlock_ramlist();
             goto out_free;
         }
 
+        if (in_place) {
+            qemu_ram_munmap(new_block->fd, new_block->host, new_block->max_length);
+            new_block->host = qemu_ram_mmap(new_block->guest_memfd,
+                                            new_block->max_length,
+                                            QEMU_VMALLOC_ALIGN,
+                                            QEMU_MAP_SHARED, 0);
+        }
+
         /*
          * The attribute bitmap of the RamBlockAttributes is default to
          * discarded, which mimics the behavior of kvm_set_phys_mem() when it
-- 
2.43.0




* Re: [POC PATCH 3/5] memory/guest_memfd: Enable in-place conversion when available
  2025-07-15  3:31   ` [POC PATCH 3/5] memory/guest_memfd: Enable in-place conversion when available Xiaoyao Li
@ 2025-07-17  2:02     ` Chenyi Qiang
  2025-08-01  2:33       ` Xiaoyao Li
  0 siblings, 1 reply; 8+ messages in thread
From: Chenyi Qiang @ 2025-07-17  2:02 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, David Hildenbrand, ackerleytng, seanjc
  Cc: Fuad Tabba, Vishal Annapurve, rick.p.edgecombe, Kai Huang,
	binbin.wu, yan.y.zhao, ira.weiny, michael.roth, kvm, qemu-devel,
	Peter Xu, Philippe Mathieu-Daudé



On 7/15/2025 11:31 AM, Xiaoyao Li wrote:
> [...]
> @@ -3056,6 +3085,7 @@ static void kvm_eat_signals(CPUState *cpu)
>  
>  int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
>  {
> +    bool in_place_conversion = false;
>      MemoryRegionSection section;
>      ram_addr_t offset;
>      MemoryRegion *mr;
> @@ -3112,18 +3142,23 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
>          goto out_unref;
>      }
>  
> -    if (to_private) {
> -        ret = kvm_set_memory_attributes_private(start, size);
> -    } else {
> -        ret = kvm_set_memory_attributes_shared(start, size);
> -    }
> -    if (ret) {
> -        goto out_unref;
> -    }
> -
>      addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
>      rb = qemu_ram_block_from_host(addr, false, &offset);
>  
> +    in_place_conversion = memory_region_guest_memfd_in_place_conversion(mr);
> +    if (in_place_conversion) {
> +        ret = kvm_set_guest_memfd_shareability(mr, offset, size, !to_private);
> +    } else {
> +        if (to_private) {
> +            ret = kvm_set_memory_attributes_private(start, size);
> +        } else {
> +            ret = kvm_set_memory_attributes_shared(start, size);
> +        }
> +    }
> +    if (ret) {
> +        goto out_unref;
> +    }
> +
>      ret = ram_block_attributes_state_change(RAM_BLOCK_ATTRIBUTES(mr->rdm),
>                                              offset, size, to_private);
>      if (ret) {

One thing is required for shared device assignment with in-place
conversion: we need to follow the sequence of
unmap-before-conversion-to-private and map-after-conversion-to-shared.
Maybe change it like this:

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a54e68e769..e9e62ae8f2 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3146,6 +3146,17 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);
 
+    if (to_private) {
+        ret = ram_block_attributes_state_change(RAM_BLOCK_ATTRIBUTES(mr->rdm),
+                                                offset, size, to_private);
+        if (ret) {
+            error_report("Failed to notify the listener the state change of "
+                         "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
+                         start, size, to_private ? "private" : "shared");
+            goto out_unref;
+        }
+    }
+
     in_place_conversion = memory_region_guest_memfd_in_place_conversion(mr);
     if (in_place_conversion) {
         ret = kvm_set_guest_memfd_shareability(mr, offset, size, !to_private);
@@ -3160,13 +3171,15 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
         goto out_unref;
     }
 
-    ret = ram_block_attributes_state_change(RAM_BLOCK_ATTRIBUTES(mr->rdm),
-                                            offset, size, to_private);
-    if (ret) {
-        error_report("Failed to notify the listener the state change of "
-                     "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
-                     start, size, to_private ? "private" : "shared");
-        goto out_unref;
+    if (!to_private) {
+        ret = ram_block_attributes_state_change(RAM_BLOCK_ATTRIBUTES(mr->rdm),
+                                                offset, size, to_private);
+        if (ret) {
+            error_report("Failed to notify the listener the state change of "
+                         "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
+                         start, size, to_private ? "private" : "shared");
+            goto out_unref;
+        }
     }
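
The ordering proposed in the diff above can be modelled in isolation as
follows. This is only a sketch: step_fn, convert_with_ordering() and
the two trace callbacks are hypothetical stand-ins for
ram_block_attributes_state_change() (the listener notification that
unmaps or maps) and for the actual conversion call.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical callback type standing in for one step of the sequence. */
typedef int (*step_fn)(bool to_private, void *opaque);

/*
 * Notify listeners before converting to private (so they unmap first)
 * and after converting to shared (so they map only shared memory).
 */
static int convert_with_ordering(bool to_private, step_fn notify,
                                 step_fn convert, void *opaque)
{
    int ret;

    assert(notify && convert);

    if (to_private) {
        ret = notify(to_private, opaque);
        if (ret) {
            return ret;
        }
    }

    ret = convert(to_private, opaque);
    if (ret) {
        return ret;
    }

    if (!to_private) {
        ret = notify(to_private, opaque);
    }
    return ret;
}

/* Demo callbacks: append 'U' (unmap), 'M' (map) or 'C' (convert). */
static int trace_notify(bool to_private, void *opaque)
{
    strcat(opaque, to_private ? "U" : "M");
    return 0;
}

static int trace_convert(bool to_private, void *opaque)
{
    (void)to_private;
    strcat(opaque, "C");
    return 0;
}
```

Running the two directions against a small char buffer gives the trace
"UC" for shared-to-private and "CM" for private-to-shared. If the
conversion step fails in the to-private direction, the listeners have
already unmapped the range; whether that needs a rollback is left open
here.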






* Re: [POC PATCH 3/5] memory/guest_memfd: Enable in-place conversion when available
  2025-07-17  2:02     ` Chenyi Qiang
@ 2025-08-01  2:33       ` Xiaoyao Li
  0 siblings, 0 replies; 8+ messages in thread
From: Xiaoyao Li @ 2025-08-01  2:33 UTC (permalink / raw)
  To: Chenyi Qiang, Paolo Bonzini, David Hildenbrand, ackerleytng,
	seanjc
  Cc: Fuad Tabba, Vishal Annapurve, rick.p.edgecombe, Kai Huang,
	binbin.wu, yan.y.zhao, ira.weiny, michael.roth, kvm, qemu-devel,
	Peter Xu, Philippe Mathieu-Daudé

On 7/17/2025 10:02 AM, Chenyi Qiang wrote:
> 
> 
> On 7/15/2025 11:31 AM, Xiaoyao Li wrote:
>> [...]
>> @@ -3056,6 +3085,7 @@ static void kvm_eat_signals(CPUState *cpu)
>>   
>>   int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
>>   {
>> +    bool in_place_conversion = false;
>>       MemoryRegionSection section;
>>       ram_addr_t offset;
>>       MemoryRegion *mr;
>> @@ -3112,18 +3142,23 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
>>           goto out_unref;
>>       }
>>   
>> -    if (to_private) {
>> -        ret = kvm_set_memory_attributes_private(start, size);
>> -    } else {
>> -        ret = kvm_set_memory_attributes_shared(start, size);
>> -    }
>> -    if (ret) {
>> -        goto out_unref;
>> -    }
>> -
>>       addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
>>       rb = qemu_ram_block_from_host(addr, false, &offset);
>>   
>> +    in_place_conversion = memory_region_guest_memfd_in_place_conversion(mr);
>> +    if (in_place_conversion) {
>> +        ret = kvm_set_guest_memfd_shareability(mr, offset, size, !to_private);
>> +    } else {
>> +        if (to_private) {
>> +            ret = kvm_set_memory_attributes_private(start, size);
>> +        } else {
>> +            ret = kvm_set_memory_attributes_shared(start, size);
>> +        }
>> +    }
>> +    if (ret) {
>> +        goto out_unref;
>> +    }
>> +
>>       ret = ram_block_attributes_state_change(RAM_BLOCK_ATTRIBUTES(mr->rdm),
>>                                               offset, size, to_private);
>>       if (ret) {
> 
> One thing is required for shared device assignment with in-place conversion: we need to follow the
> sequence of unmap-before-conversion-to-private and map-after-conversion-to-shared. Maybe change it like this:
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index a54e68e769..e9e62ae8f2 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -3146,6 +3146,17 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
>       addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
>       rb = qemu_ram_block_from_host(addr, false, &offset);
>   
> +    if (to_private) {
> +        ret = ram_block_attributes_state_change(RAM_BLOCK_ATTRIBUTES(mr->rdm),
> +                                                offset, size, to_private);
> +        if (ret) {
> +            error_report("Failed to notify the listener the state change of "
> +                         "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
> +                         start, size, to_private ? "private" : "shared");
> +            goto out_unref;
> +        }
> +    }
> +
>       in_place_conversion = memory_region_guest_memfd_in_place_conversion(mr);
>       if (in_place_conversion) {
>           ret = kvm_set_guest_memfd_shareability(mr, offset, size, !to_private);
> @@ -3160,13 +3171,15 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
>           goto out_unref;
>       }
>   
> -    ret = ram_block_attributes_state_change(RAM_BLOCK_ATTRIBUTES(mr->rdm),
> -                                            offset, size, to_private);
> -    if (ret) {
> -        error_report("Failed to notify the listener the state change of "
> -                     "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
> -                     start, size, to_private ? "private" : "shared");
> -        goto out_unref;
> +    if (!to_private) {
> +        ret = ram_block_attributes_state_change(RAM_BLOCK_ATTRIBUTES(mr->rdm),
> +                                                offset, size, to_private);
> +        if (ret) {
> +            error_report("Failed to notify the listener the state change of "
> +                         "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
> +                         start, size, to_private ? "private" : "shared");
> +            goto out_unref;
> +        }
>       }

(Sorry for forgetting to reply on the list.)

Thanks for catching and reporting this. I have incorporated it into the
internal branch.
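
The ordering suggested above can be sketched in isolation as follows.
This is only a minimal model, not the QEMU code: notify_listeners() and
do_convert() are hypothetical stand-ins for
ram_block_attributes_state_change() and for the actual conversion
(guest_memfd ioctl or memory attributes), and the trace array exists
purely to make the order observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trace of the steps taken, for illustration only. */
enum step { STEP_NOTIFY, STEP_CONVERT };
static enum step trace[4];
static size_t trace_len;

/* Stand-in for ram_block_attributes_state_change(): listeners unmap on
 * a to-private change and map on a to-shared change. Always succeeds. */
static int notify_listeners(bool to_private)
{
    (void)to_private;
    assert(trace_len < 4);
    trace[trace_len++] = STEP_NOTIFY;
    return 0;
}

/* Stand-in for the conversion itself. Always succeeds in this sketch. */
static int do_convert(bool to_private)
{
    (void)to_private;
    assert(trace_len < 4);
    trace[trace_len++] = STEP_CONVERT;
    return 0;
}

/* Unmap before converting to private; map after converting to shared. */
static int convert_memory_ordered(bool to_private)
{
    int ret;

    trace_len = 0;
    if (to_private) {
        ret = notify_listeners(true);
        if (ret) {
            return ret;
        }
    }

    ret = do_convert(to_private);
    if (ret) {
        return ret;
    }

    if (!to_private) {
        ret = notify_listeners(false);
    }
    return ret;
}
```

With this shape, a failed unmap aborts a to-private conversion before
any state has changed, and listeners never map a range that is still
private.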



* [POC PATCH 4/5] memory/guest_memfd: Enable hugetlb support
  2025-07-15  3:31 ` [POC PATCH 0/5] QEMU: Enable in-place conversion and hugetlb gmem Xiaoyao Li
                     ` (2 preceding siblings ...)
  2025-07-15  3:31   ` [POC PATCH 3/5] memory/guest_memfd: Enable in-place conversion when available Xiaoyao Li
@ 2025-07-15  3:31   ` Xiaoyao Li
  2025-07-15  3:31   ` [POC PATCH 5/5] [HACK] memory: Don't enable in-place conversion for internal MemoryRegion with gmem Xiaoyao Li
  4 siblings, 0 replies; 8+ messages in thread
From: Xiaoyao Li @ 2025-07-15  3:31 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, ackerleytng, seanjc
  Cc: Fuad Tabba, Vishal Annapurve, rick.p.edgecombe, Kai Huang,
	binbin.wu, yan.y.zhao, ira.weiny, michael.roth, kvm, qemu-devel,
	Peter Xu, Philippe Mathieu-Daudé

(This is just the POC code to use gmem with hugetlb.)

Try hugetlb first when gmem supports it. If hugetlb cannot satisfy the
requested memory size and returns -ENOMEM, fall back to creating gmem
without hugetlb.
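
As a rough illustration of the try-then-fall-back idea, here is a
minimal standalone sketch. It does not use the real KVM interface: the
SKETCH_* flag bits are placeholders (the real values come from the
kernel uAPI headers), gmem_create_fn is a hypothetical stand-in for
kvm_create_guest_memfd() that returns an fd or a negative errno, and
the two stub creators exist only to exercise both paths.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag bits for this sketch only. */
#define SKETCH_GMEM_HUGETLB  (1ULL << 0)
#define SKETCH_GMEM_HUGE_2MB (1ULL << 1)
#define SKETCH_HUGE_MASK     (SKETCH_GMEM_HUGETLB | SKETCH_GMEM_HUGE_2MB)

/* Hypothetical creator: returns an fd >= 0 or a negative errno. */
typedef int (*gmem_create_fn)(uint64_t size, uint64_t flags);

static int create_gmem_with_fallback(gmem_create_fn create, uint64_t size,
                                     uint64_t flags, int hugetlb_supported,
                                     uint64_t *used_flags)
{
    int fd;

    assert(create != NULL && used_flags != NULL);

    if (hugetlb_supported) {
        fd = create(size, flags | SKETCH_HUGE_MASK);
        if (fd != -ENOMEM) {
            /* Success, or an error that a retry would not fix. */
            *used_flags = flags | SKETCH_HUGE_MASK;
            return fd;
        }
    }

    /* Hugetlb unsupported or out of huge pages: retry without it. */
    *used_flags = flags;
    return create(size, flags);
}

/* Stub: the hugetlb pool is exhausted, normal pages work. */
static int stub_create_no_hugepages(uint64_t size, uint64_t flags)
{
    (void)size;
    return (flags & SKETCH_GMEM_HUGETLB) ? -ENOMEM : 3;
}

/* Stub: every request succeeds. */
static int stub_create_ok(uint64_t size, uint64_t flags)
{
    (void)size;
    (void)flags;
    return 4;
}
```

The sketch returns as soon as the first attempt yields anything other
than -ENOMEM, so a successful hugetlb fd is never overwritten by a
second creation call.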

The hugetlb size is hardcoded as GUESTMEM_HUGETLB_FLAG_2MB. I'm not sure
whether it would be better for gmem to report the supported hugetlb
sizes. But looking at the current implementation of memfd, it just tries
the hugetlb size requested by the user and fails when that size is not
supported. Hence gmem can work the same way without the supported sizes
being enumerated.

For an upstreamable solution, the hugetlb support of gmem can be
implemented as "hugetlb" and "hugetlbsize" properties of
memory-backend-guestmemfd, similar to memory-backend-memfd. (This
requires memory-backend-guestmemfd to be introduced first for in-place
conversion gmem.)

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c    |  3 ++-
 accel/stubs/kvm-stub.c |  1 +
 include/system/kvm.h   |  1 +
 system/physmem.c       | 13 +++++++++++++
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 609537738d38..2d18e961714e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -106,6 +106,7 @@ static bool kvm_immediate_exit;
 static uint64_t kvm_supported_memory_attributes;
 static bool kvm_guest_memfd_supported;
 bool kvm_guest_memfd_inplace_supported;
+bool kvm_guest_memfd_hugetlb_supported;
 static hwaddr kvm_max_slot_size = ~0;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
@@ -2808,6 +2809,7 @@ static int kvm_init(AccelState *as, MachineState *ms)
     kvm_guest_memfd_inplace_supported =
         kvm_check_extension(s, KVM_CAP_GMEM_SHARED_MEM) &&
         kvm_check_extension(s, KVM_CAP_GMEM_CONVERSION);
+    kvm_guest_memfd_hugetlb_supported = kvm_check_extension(s, KVM_CAP_GMEM_HUGETLB);
     kvm_pre_fault_memory_supported = kvm_vm_check_extension(s, KVM_CAP_PRE_FAULT_MEMORY);
 
     if (s->kernel_irqchip_split == ON_OFF_AUTO_AUTO) {
@@ -4536,7 +4538,6 @@ int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
     fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &guest_memfd);
     if (fd < 0) {
         error_setg_errno(errp, errno, "Error creating KVM guest_memfd");
-        return -1;
     }
 
     return fd;
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index bf0ccae27b62..fbc1d7c4e9b5 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -25,6 +25,7 @@ bool kvm_allowed;
 bool kvm_readonly_mem_allowed;
 bool kvm_msi_use_devid;
 bool kvm_guest_memfd_inplace_supported;
+bool kvm_guest_memfd_hugetlb_supported;
 
 void kvm_flush_coalesced_mmio_buffer(void)
 {
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 32f2be5f92e1..d1d79510ee26 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -44,6 +44,7 @@ extern bool kvm_readonly_mem_allowed;
 extern bool kvm_msi_use_devid;
 extern bool kvm_pre_fault_memory_supported;
 extern bool kvm_guest_memfd_inplace_supported;
+extern bool kvm_guest_memfd_hugetlb_supported;
 
 #define kvm_enabled()           (kvm_allowed)
 /**
diff --git a/system/physmem.c b/system/physmem.c
index 955480685310..ea1c27ea2b99 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1940,8 +1940,21 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
                                             GUEST_MEMFD_FLAG_INIT_PRIVATE;
         }
 
+        if (kvm_guest_memfd_hugetlb_supported) {
+            new_block->guest_memfd_flags |= GUEST_MEMFD_FLAG_HUGETLB |
+                                            GUESTMEM_HUGETLB_FLAG_2MB;
+        }
+
+        new_block->guest_memfd = kvm_create_guest_memfd(new_block->max_length,
+                                 new_block->guest_memfd_flags, &err);
+        if (new_block->guest_memfd == -ENOMEM) {
+            error_free(err);
+            new_block->guest_memfd_flags &= ~(GUEST_MEMFD_FLAG_HUGETLB |
+                                              GUESTMEM_HUGETLB_FLAG_2MB);
+        }
         new_block->guest_memfd = kvm_create_guest_memfd(new_block->max_length,
                                  new_block->guest_memfd_flags, errp);
+
         if (new_block->guest_memfd < 0) {
             qemu_mutex_unlock_ramlist();
             goto out_free;
-- 
2.43.0




* [POC PATCH 5/5] [HACK] memory: Don't enable in-place conversion for internal MemoryRegion with gmem
  2025-07-15  3:31 ` [POC PATCH 0/5] QEMU: Enable in-place conversion and hugetlb gmem Xiaoyao Li
                     ` (3 preceding siblings ...)
  2025-07-15  3:31   ` [POC PATCH 4/5] memory/guest_memfd: Enable hugetlb support Xiaoyao Li
@ 2025-07-15  3:31   ` Xiaoyao Li
  4 siblings, 0 replies; 8+ messages in thread
From: Xiaoyao Li @ 2025-07-15  3:31 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, ackerleytng, seanjc
  Cc: Fuad Tabba, Vishal Annapurve, rick.p.edgecombe, Kai Huang,
	binbin.wu, yan.y.zhao, ira.weiny, michael.roth, kvm, qemu-devel,
	Peter Xu, Philippe Mathieu-Daudé

Currently, TDVF cannot work with gmem in-place conversion because the
current implementation of KVM_TDX_INIT_MEM_REGION in KVM requires the
gmem of TDVF to be valid for both shared and private at the same time.

To work around it, explicitly do not enable in-place conversion for
internal MemoryRegions with gmem, so that TDVF doesn't use in-place
conversion gmem and KVM_TDX_INIT_MEM_REGION initializes the gmem from
the separate shared memory.
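
The resulting decision can be sketched on its own as below. This is a
simplified model rather than the physmem.c code: the RAM_GUEST_MEMFD
value is an assumption made for the sketch (check memory.h),
RAM_GUEST_MEMFD_NO_INPLACE uses the value added by this patch, and
gmem_use_in_place() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_GUEST_MEMFD            (1u << 12) /* assumed value for this sketch */
#define RAM_GUEST_MEMFD_NO_INPLACE (1u << 14) /* value added by this patch */

/* Hypothetical helper: should this RAM block use in-place conversion? */
static bool gmem_use_in_place(uint32_t ram_flags, bool kvm_inplace_supported)
{
    /* The opt-out flag only makes sense on a guest_memfd-backed block. */
    assert(!(ram_flags & RAM_GUEST_MEMFD_NO_INPLACE) ||
           (ram_flags & RAM_GUEST_MEMFD));

    if (!(ram_flags & RAM_GUEST_MEMFD)) {
        return false;
    }
    return !(ram_flags & RAM_GUEST_MEMFD_NO_INPLACE) && kvm_inplace_supported;
}
```

Internal regions created through memory_region_init_ram_guest_memfd()
carry the opt-out flag, so they take the false branch even when KVM
supports in-place conversion.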

To make in-place conversion work with TDX's initial memory, one
possible solution and flow is described below; it requires KVM
changes:

- QEMU creates gmem as shared;
- QEMU mmaps the gmem and loads the TDVF binary into it;
- QEMU converts the gmem to private with the content preserved[1];
- QEMU invokes KVM_TDX_INIT_MEM_REGION without a valid src, so that KVM
  knows to fetch the content in-place and use in-place PAGE.ADD for TDX.
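
The four steps above can be modelled with a toy state machine, shown
here only to make the intended ordering concrete. Nothing in it is a
real KVM or QEMU interface: struct gmem_model and all of the functions
are hypothetical, and the "no valid src" convention is represented by
passing NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of one gmem-backed firmware region. */
struct gmem_model {
    bool is_private;        /* false: shared and mappable by userspace */
    bool added_in_place;    /* set once the in-place add has happened */
    unsigned char data[64];
};

/* Step 1: create the gmem as shared. */
static void gmem_create_shared(struct gmem_model *g)
{
    memset(g, 0, sizeof(*g));
}

/* Step 2: load the firmware through the (modelled) shared mapping. */
static int gmem_load_firmware(struct gmem_model *g, const void *img, size_t len)
{
    if (g->is_private || len > sizeof(g->data)) {
        return -1;
    }
    memcpy(g->data, img, len);
    return 0;
}

/* Step 3: convert to private while keeping the content. */
static void gmem_convert_private_preserve(struct gmem_model *g)
{
    g->is_private = true;
}

/* Step 4: init with no src, meaning "use the content already in place". */
static int tdx_init_in_place(struct gmem_model *g, const void *src)
{
    if (src != NULL || !g->is_private) {
        return -1;
    }
    g->added_in_place = true;
    return 0;
}

/* Runs the whole flow; returns 0 if the content survived to step 4. */
static int run_proposed_flow(const void *img, size_t len)
{
    struct gmem_model g;

    gmem_create_shared(&g);
    if (gmem_load_firmware(&g, img, len)) {
        return -1;
    }
    gmem_convert_private_preserve(&g);
    if (tdx_init_in_place(&g, NULL)) {
        return -1;
    }
    return memcmp(g.data, img, len) == 0 && g.added_in_place ? 0 : -1;
}
```

In this model, calling tdx_init_in_place() before the conversion fails,
which mirrors the point that the content must already be private when
the in-place add takes place.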

[1] https://lore.kernel.org/all/aG0pNijVpl0czqXu@google.com/

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 include/system/memory.h | 3 +++
 system/memory.c         | 2 +-
 system/physmem.c        | 8 +++++---
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/system/memory.h b/include/system/memory.h
index f14fbf65805d..89d6449cef70 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -256,6 +256,9 @@ typedef struct IOMMUTLBEvent {
  */
 #define RAM_PRIVATE (1 << 13)
 
+/* Don't enable in-place conversion for the guest memfd backend */
+#define RAM_GUEST_MEMFD_NO_INPLACE (1 << 14)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
diff --git a/system/memory.c b/system/memory.c
index 6870a41629ef..c1b73abc4c94 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -3702,7 +3702,7 @@ bool memory_region_init_ram_guest_memfd(MemoryRegion *mr,
     DeviceState *owner_dev;
 
     if (!memory_region_init_ram_flags_nomigrate(mr, owner, name, size,
-                                                RAM_GUEST_MEMFD, errp)) {
+                                                RAM_GUEST_MEMFD | RAM_GUEST_MEMFD_NO_INPLACE, errp)) {
         return false;
     }
     /* This will assert if owner is neither NULL nor a DeviceState.
diff --git a/system/physmem.c b/system/physmem.c
index ea1c27ea2b99..c23379082f38 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1916,7 +1916,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
 
     if (new_block->flags & RAM_GUEST_MEMFD) {
         int ret;
-        bool in_place = kvm_guest_memfd_inplace_supported;
+        bool in_place = !(new_block->flags & RAM_GUEST_MEMFD_NO_INPLACE) &&
+                        kvm_guest_memfd_inplace_supported;
 
         new_block->guest_memfd_flags = 0;
 
@@ -2230,7 +2231,8 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     ram_flags &= ~RAM_PRIVATE;
 
     assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
-                          RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
+                          RAM_NORESERVE | RAM_GUEST_MEMFD |
+                          RAM_GUEST_MEMFD_NO_INPLACE)) == 0);
     assert(!host ^ (ram_flags & RAM_PREALLOC));
     assert(max_size >= size);
 
@@ -2314,7 +2316,7 @@ RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
                          MemoryRegion *mr, Error **errp)
 {
     assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE | RAM_GUEST_MEMFD |
-                          RAM_PRIVATE)) == 0);
+                          RAM_PRIVATE | RAM_GUEST_MEMFD_NO_INPLACE)) == 0);
     return qemu_ram_alloc_internal(size, size, NULL, NULL, ram_flags, mr, errp);
 }
 
-- 
2.43.0




end of thread, other threads:[~2025-08-01  2:34 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <cover.1747264138.git.ackerleytng@google.com>
2025-07-15  3:31 ` [POC PATCH 0/5] QEMU: Enable in-place conversion and hugetlb gmem Xiaoyao Li
2025-07-15  3:31   ` [POC PATCH 1/5] update-linux-headers: Add guestmem.h Xiaoyao Li
2025-07-15  3:31   ` [POC PATCH 2/5] headers: Fetch gmem updates Xiaoyao Li
2025-07-15  3:31   ` [POC PATCH 3/5] memory/guest_memfd: Enable in-place conversion when available Xiaoyao Li
2025-07-17  2:02     ` Chenyi Qiang
2025-08-01  2:33       ` Xiaoyao Li
2025-07-15  3:31   ` [POC PATCH 4/5] memory/guest_memfd: Enable hugetlb support Xiaoyao Li
2025-07-15  3:31   ` [POC PATCH 5/5] [HACK] memory: Don't enable in-place conversion for internal MemoryRegion with gmem Xiaoyao Li
