[PATCH v4 00/13] Enable shared device assignment

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v4 00/13] Enable shared device assignment
@ 2025-04-07  7:49 Chenyi Qiang
  2025-04-07  7:49 ` [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range Chenyi Qiang
                   ` (12 more replies)
  0 siblings, 13 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

This is the v4 series of the shared device assignment support.

Compared with v3 series, the main changes are:

- Introduced a new GenericStateManager parent class, so that the existing
  RamDiscardManager and new PrivateSharedManager can be its child class
  and manage different states.
- Changed the name of MemoryAttributeManager to RamBlockAttribute to
  distinguish from the XXXManager interface and still use it to manage
  guest_memfd information. Meanwhile, Use it to implement
  PrivateSharedManager instead of RamDiscardManager to distinguish the
  states of populate/discard and shared/private.
- Moved the attribute change operations into a listener so that both the
  attribute change and IOMMU pins can be invoked in listener callbacks.
- Added priority listener support in PrivateSharedListener so that the
  attribute change listener and VFIO listener can be triggered in
  expected order to comply with in-place conversin requirement.
- v3: https://lore.kernel.org/qemu-devel/20250310081837.13123-1-chenyi.qiang@intel.com/

The overview of this series:
- Patch 1-3: preparation patches. These include function exposure and
  some definition changes to return values.
- Patch 4: Introduce a generic state change parent class with
  RamDiscardManager as its child class. This paves the way to introduce
  new child classes to manage other memory states.
- Patch 5-6: Introduce a new child class, PrivateSharedManager, to
  manage the private and shared states. Also adds VFIO support for this
  new interface to coordinate RAM discard support. 
- Patch 7-9: Introduce a new object to implement the
  PrivateSharedManager interface and a callback to notify the
  shared/private state change. Stores it in RAMBlocks and register it in
  the target MemoryRegion so that the object can notify page conversion
  events to other systems.
- Patch 10-11: Moves the state change handling into a
  PrivateSharedListener so that it can be invoked together with the VFIO
  listener by the state_change() call.
- Patch 12: To comply with in-place conversion, introduces the priority
  listener support so that the attribute change and IOMMU pin can follow
  the expected order.
- Patch 13: Unlocks the coordinate discard so that the shared device
  assignment (VFIO) can work with guest_memfd.

More small changes or details can be found in the individual patches.

---
Original cover letter with minor changes related to new parent class:

Background
==========
Confidential VMs have two classes of memory: shared and private memory.
Shared memory is accessible from the host/VMM while private memory is
not. Confidential VMs can decide which memory is shared/private and
convert memory between shared/private at runtime.

"guest_memfd" is a new kind of fd whose primary goal is to serve guest
private memory. In current implementation, shared memory is allocated
with normal methods (e.g. mmap or fallocate) while private memory is
allocated from guest_memfd. When a VM performs memory conversions, QEMU
frees pages via madvise or via PUNCH_HOLE on memfd or guest_memfd from
one side, and allocates new pages from the other side. This will cause a
stale IOMMU mapping issue mentioned in [1] when we try to enable shared
device assignment in confidential VMs.

Solution
========
The key to enable shared device assignment is to update the IOMMU mappings
on page conversion. RamDiscardManager, an existing interface currently
utilized by virtio-mem, offers a means to modify IOMMU mappings in
accordance with VM page assignment. Although the required operations in
VFIO for page conversion are similar to memory plug/unplug, the states of
private/shared are different from discard/populated. We want a similar
mechanism with RamDiscardManager but used to manage the state of private
and shared.

This series introduce a new parent abstract class to manage a pair of
opposite states with RamDiscardManager as its child to manage
populate/discard states, and introduce a new child class,
PrivateSharedManager, which can also utilize the same infrastructure to
notify VFIO of page conversions.

Relationship with in-place page conversion
==========================================
To support 1G page support for guest_memfd [2], the current direction is to
allow mmap() of guest_memfd to userspace so that both private and shared
memory can use the same physical pages as the backend. This in-place page
conversion design eliminates the need to discard pages during shared/private
conversions. However, device assignment will still be blocked because the
in-place page conversion will reject the conversion when the page is pinned
by VFIO.

To address this, the key difference lies in the sequence of VFIO map/unmap
operations and the page conversion. It can be adjusted to achieve
unmap-before-conversion-to-private and map-after-conversion-to-shared,
ensuring compatibility with guest_memfd.

Limitation
==========
One limitation is that VFIO expects the DMA mapping for a specific IOVA
to be mapped and unmapped with the same granularity. The guest may
perform partial conversions, such as converting a small region within a
larger region. To prevent such invalid cases, all operations are
performed with 4K granularity. This could be optimized after the
cut_mapping operation [3] is introduced in future. We can alway perform a
split-before-unmap if partial conversions happen. If the split succeeds,
the unmap will succeed and be atomic. If the split fails, the unmap
process fails.

Testing
=======
This patch series is tested based on TDX patches available at:
KVM: https://github.com/intel/tdx/tree/kvm-coco-queue-snapshot/kvm-coco-queue-snapshot-20250322
     (With the revert of HEAD commit)
QEMU: https://github.com/intel-staging/qemu-tdx/tree/tdx-upstream-snapshot-2025-04-07

To facilitate shared device assignment with the NIC, employ the legacy
type1 VFIO with the QEMU command:

qemu-system-x86_64 [...]
    -device vfio-pci,host=XX:XX.X

The parameter of dma_entry_limit needs to be adjusted. For example, a
16GB guest needs to adjust the parameter like
vfio_iommu_type1.dma_entry_limit=4194304.

If use the iommufd-backed VFIO with the qemu command:

qemu-system-x86_64 [...]
    -object iommufd,id=iommufd0 \
    -device vfio-pci,host=XX:XX.X,iommufd=iommufd0

No additional adjustment required.

Following the bootup of the TD guest, the guest's IP address becomes
visible, and iperf is able to successfully send and receive data.

Related link
============
[1] https://lore.kernel.org/qemu-devel/20240423150951.41600-54-pbonzini@redhat.com/
[2] https://lore.kernel.org/lkml/cover.1726009989.git.ackerleytng@google.com/
[3] https://lore.kernel.org/linux-iommu/7-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/

Chenyi Qiang (13):
  memory: Export a helper to get intersection of a MemoryRegionSection
    with a given range
  memory: Change memory_region_set_ram_discard_manager() to return the
    result
  memory: Unify the definiton of ReplayRamPopulate() and
    ReplayRamDiscard()
  memory: Introduce generic state change parent class for
    RamDiscardManager
  memory: Introduce PrivateSharedManager Interface as child of
    GenericStateManager
  vfio: Add the support for PrivateSharedManager Interface
  ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock
    with guest_memfd
  ram-block-attribute: Introduce a callback to notify shared/private
    state changes
  memory: Attach RamBlockAttribute to guest_memfd-backed RAMBlocks
  memory: Change NotifyStateClear() definition to return the result
  KVM: Introduce CVMPrivateSharedListener for attribute changes during
    page conversions
  ram-block-attribute: Add priority listener support for
    PrivateSharedListener
  RAMBlock: Make guest_memfd require coordinate discard

 accel/kvm/kvm-all.c                         |  81 +++-
 hw/vfio/common.c                            | 131 +++++-
 hw/vfio/container-base.c                    |   1 +
 hw/virtio/virtio-mem.c                      | 168 +++----
 include/exec/memory.h                       | 407 ++++++++++------
 include/exec/ramblock.h                     |  25 +
 include/hw/vfio/vfio-container-base.h       |  10 +
 include/system/confidential-guest-support.h |  10 +
 migration/ram.c                             |  21 +-
 system/memory.c                             | 137 ++++--
 system/memory_mapping.c                     |   6 +-
 system/meson.build                          |   1 +
 system/physmem.c                            |  20 +-
 system/ram-block-attribute.c                | 495 ++++++++++++++++++++
 target/i386/kvm/tdx.c                       |   1 +
 target/i386/sev.c                           |   1 +
 16 files changed, 1192 insertions(+), 323 deletions(-)
 create mode 100644 system/ram-block-attribute.c

-- 
2.43.5

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-04-09  2:47   ` Alexey Kardashevskiy
  2025-05-12  3:24   ` Zhao Liu
  2025-04-07  7:49 ` [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager() to return the result Chenyi Qiang
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

Rename the helper to memory_region_section_intersect_range() to make it
more generic. Meanwhile, define the @end as Int128 and replace the
related operations with Int128_* format since the helper is exported as
a wider API.

Suggested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - No change.

Changes in v3:
    - No change

Changes in v2:
    - Make memory_region_section_intersect_range() an inline function.
    - Add Reviewed-by from David
    - Define the @end as Int128 and use the related Int128_* ops as a wilder
      API (Alexey)
---
 hw/virtio/virtio-mem.c | 32 +++++---------------------------
 include/exec/memory.h  | 27 +++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index b1a003736b..21f16e4912 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -244,28 +244,6 @@ static int virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
-/*
- * Adjust the memory section to cover the intersection with the given range.
- *
- * Returns false if the intersection is empty, otherwise returns true.
- */
-static bool virtio_mem_intersect_memory_section(MemoryRegionSection *s,
-                                                uint64_t offset, uint64_t size)
-{
-    uint64_t start = MAX(s->offset_within_region, offset);
-    uint64_t end = MIN(s->offset_within_region + int128_get64(s->size),
-                       offset + size);
-
-    if (end <= start) {
-        return false;
-    }
-
-    s->offset_within_address_space += start - s->offset_within_region;
-    s->offset_within_region = start;
-    s->size = int128_make64(end - start);
-    return true;
-}
-
 typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void *arg);
 
 static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
@@ -287,7 +265,7 @@ static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
                                       first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -319,7 +297,7 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
                                  first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -355,7 +333,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         rdl->notify_discard(rdl, &tmp);
@@ -371,7 +349,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         ret = rdl->notify_populate(rdl, &tmp);
@@ -388,7 +366,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
             if (rdl2 == rdl) {
                 break;
             }
-            if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
                 continue;
             }
             rdl2->notify_discard(rdl2, &tmp);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52..3bebc43d59 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1202,6 +1202,33 @@ MemoryRegionSection *memory_region_section_new_copy(MemoryRegionSection *s);
  */
 void memory_region_section_free_copy(MemoryRegionSection *s);
 
+/**
+ * memory_region_section_intersect_range: Adjust the memory section to cover
+ * the intersection with the given range.
+ *
+ * @s: the #MemoryRegionSection to be adjusted
+ * @offset: the offset of the given range in the memory region
+ * @size: the size of the given range
+ *
+ * Returns false if the intersection is empty, otherwise returns true.
+ */
+static inline bool memory_region_section_intersect_range(MemoryRegionSection *s,
+                                                         uint64_t offset, uint64_t size)
+{
+    uint64_t start = MAX(s->offset_within_region, offset);
+    Int128 end = int128_min(int128_add(int128_make64(s->offset_within_region), s->size),
+                            int128_add(int128_make64(offset), int128_make64(size)));
+
+    if (int128_le(end, int128_make64(start))) {
+        return false;
+    }
+
+    s->offset_within_address_space += start - s->offset_within_region;
+    s->offset_within_region = start;
+    s->size = int128_sub(end, int128_make64(start));
+    return true;
+}
+
 /**
  * memory_region_init: Initialize a memory region
  *
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager() to return the result
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
  2025-04-07  7:49 ` [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-04-07  9:53   ` Xiaoyao Li
  2025-04-09  5:35   ` Alexey Kardashevskiy
  2025-04-07  7:49 ` [PATCH v4 03/13] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard() Chenyi Qiang
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

Modify memory_region_set_ram_discard_manager() to return false if a
RamDiscardManager is already set in the MemoryRegion. The caller must
handle this failure, such as having virtio-mem undo its actions and fail
the realize() process. Opportunistically move the call earlier to avoid
complex error handling.

This change is beneficial when introducing a new RamDiscardManager
instance besides virtio-mem. After
ram_block_coordinated_discard_require(true) unlocks all
RamDiscardManager instances, only one instance is allowed to be set for
a MemoryRegion at present.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - No change.

Changes in v3:
    - Move set_ram_discard_manager() up to avoid a g_free()
    - Clean up set_ram_discard_manager() definition

Changes in v2:
    - newly added.
---
 hw/virtio/virtio-mem.c | 29 ++++++++++++++++-------------
 include/exec/memory.h  |  6 +++---
 system/memory.c        | 10 +++++++---
 3 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 21f16e4912..d0d3a0240f 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -1049,6 +1049,17 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /*
+     * Set ourselves as RamDiscardManager before the plug handler maps the
+     * memory region and exposes it via an address space.
+     */
+    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
+                                              RAM_DISCARD_MANAGER(vmem))) {
+        error_setg(errp, "Failed to set RamDiscardManager");
+        ram_block_coordinated_discard_require(false);
+        return;
+    }
+
     /*
      * We don't know at this point whether shared RAM is migrated using
      * QEMU or migrated using the file content. "x-ignore-shared" will be
@@ -1124,13 +1135,6 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     vmem->system_reset = VIRTIO_MEM_SYSTEM_RESET(obj);
     vmem->system_reset->vmem = vmem;
     qemu_register_resettable(obj);
-
-    /*
-     * Set ourselves as RamDiscardManager before the plug handler maps the
-     * memory region and exposes it via an address space.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr,
-                                          RAM_DISCARD_MANAGER(vmem));
 }
 
 static void virtio_mem_device_unrealize(DeviceState *dev)
@@ -1138,12 +1142,6 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOMEM *vmem = VIRTIO_MEM(dev);
 
-    /*
-     * The unplug handler unmapped the memory region, it cannot be
-     * found via an address space anymore. Unset ourselves.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
-
     qemu_unregister_resettable(OBJECT(vmem->system_reset));
     object_unref(OBJECT(vmem->system_reset));
 
@@ -1156,6 +1154,11 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     virtio_del_queue(vdev, 0);
     virtio_cleanup(vdev);
     g_free(vmem->bitmap);
+    /*
+     * The unplug handler unmapped the memory region, it cannot be
+     * found via an address space anymore. Unset ourselves.
+     */
+    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
     ram_block_coordinated_discard_require(false);
 }
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3bebc43d59..390477b588 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2487,13 +2487,13 @@ static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
  *
  * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
  * that does not cover RAM, or a #MemoryRegion that already has a
- * #RamDiscardManager assigned.
+ * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
  *
  * @mr: the #MemoryRegion
  * @rdm: #RamDiscardManager to set
  */
-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm);
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm);
 
 /**
  * memory_region_find: translate an address/size relative to a
diff --git a/system/memory.c b/system/memory.c
index b17b5538ff..62d6b410f0 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2115,12 +2115,16 @@ RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
     return mr->rdm;
 }
 
-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm)
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm)
 {
     g_assert(memory_region_is_ram(mr));
-    g_assert(!rdm || !mr->rdm);
+    if (mr->rdm && rdm) {
+        return -EBUSY;
+    }
+
     mr->rdm = rdm;
+    return 0;
 }
 
 uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 03/13] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard()
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
  2025-04-07  7:49 ` [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range Chenyi Qiang
  2025-04-07  7:49 ` [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager() to return the result Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-04-09  5:43   ` Alexey Kardashevskiy
  2025-04-25 12:42   ` David Hildenbrand
  2025-04-07  7:49 ` [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager Chenyi Qiang
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

Update ReplayRamDiscard() function to return the result and unify the
ReplayRamPopulate() and ReplayRamDiscard() to ReplayStateChange() at
the same time due to their identical definitions. This unification
simplifies related structures, such as VirtIOMEMReplayData, which makes
it more cleaner and maintainable.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Modify the commit message. We won't use Replay() operation when
      doing the attribute change like v3.

Changes in v3:
    - Newly added.
---
 hw/virtio/virtio-mem.c | 20 ++++++++++----------
 include/exec/memory.h  | 31 ++++++++++++++++---------------
 migration/ram.c        |  5 +++--
 system/memory.c        | 12 ++++++------
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index d0d3a0240f..1a88d649cb 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -1733,7 +1733,7 @@ static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
 }
 
 struct VirtIOMEMReplayData {
-    void *fn;
+    ReplayStateChange fn;
     void *opaque;
 };
 
@@ -1741,12 +1741,12 @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    return ((ReplayRamPopulate)data->fn)(s, data->opaque);
+    return data->fn(s, data->opaque);
 }
 
 static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
                                            MemoryRegionSection *s,
-                                           ReplayRamPopulate replay_fn,
+                                           ReplayStateChange replay_fn,
                                            void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
@@ -1765,14 +1765,14 @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    ((ReplayRamDiscard)data->fn)(s, data->opaque);
+    data->fn(s, data->opaque);
     return 0;
 }
 
-static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
-                                            MemoryRegionSection *s,
-                                            ReplayRamDiscard replay_fn,
-                                            void *opaque)
+static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                           MemoryRegionSection *s,
+                                           ReplayStateChange replay_fn,
+                                           void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
     struct VirtIOMEMReplayData data = {
@@ -1781,8 +1781,8 @@ static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
     };
 
     g_assert(s->mr == &vmem->memdev->mr);
-    virtio_mem_for_each_unplugged_section(vmem, s, &data,
-                                          virtio_mem_rdm_replay_discarded_cb);
+    return virtio_mem_for_each_unplugged_section(vmem, s, &data,
+                                                 virtio_mem_rdm_replay_discarded_cb);
 }
 
 static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 390477b588..3b1d25a403 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -566,8 +566,7 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
     rdl->double_discard_supported = double_discard_supported;
 }
 
-typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);
-typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque);
+typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
 
 /*
  * RamDiscardManagerClass:
@@ -641,36 +640,38 @@ struct RamDiscardManagerClass {
     /**
      * @replay_populated:
      *
-     * Call the #ReplayRamPopulate callback for all populated parts within the
+     * Call the #ReplayStateChange callback for all populated parts within the
      * #MemoryRegionSection via the #RamDiscardManager.
      *
      * In case any call fails, no further calls are made.
      *
      * @rdm: the #RamDiscardManager
      * @section: the #MemoryRegionSection
-     * @replay_fn: the #ReplayRamPopulate callback
+     * @replay_fn: the #ReplayStateChange callback
      * @opaque: pointer to forward to the callback
      *
      * Returns 0 on success, or a negative error if any notification failed.
      */
     int (*replay_populated)(const RamDiscardManager *rdm,
                             MemoryRegionSection *section,
-                            ReplayRamPopulate replay_fn, void *opaque);
+                            ReplayStateChange replay_fn, void *opaque);
 
     /**
      * @replay_discarded:
      *
-     * Call the #ReplayRamDiscard callback for all discarded parts within the
+     * Call the #ReplayStateChange callback for all discarded parts within the
      * #MemoryRegionSection via the #RamDiscardManager.
      *
      * @rdm: the #RamDiscardManager
      * @section: the #MemoryRegionSection
-     * @replay_fn: the #ReplayRamDiscard callback
+     * @replay_fn: the #ReplayStateChange callback
      * @opaque: pointer to forward to the callback
+     *
+     * Returns 0 on success, or a negative error if any notification failed.
      */
-    void (*replay_discarded)(const RamDiscardManager *rdm,
-                             MemoryRegionSection *section,
-                             ReplayRamDiscard replay_fn, void *opaque);
+    int (*replay_discarded)(const RamDiscardManager *rdm,
+                            MemoryRegionSection *section,
+                            ReplayStateChange replay_fn, void *opaque);
 
     /**
      * @register_listener:
@@ -713,13 +714,13 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
 
 int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                          MemoryRegionSection *section,
-                                         ReplayRamPopulate replay_fn,
+                                         ReplayStateChange replay_fn,
                                          void *opaque);
 
-void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                          MemoryRegionSection *section,
-                                          ReplayRamDiscard replay_fn,
-                                          void *opaque);
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+                                         MemoryRegionSection *section,
+                                         ReplayStateChange replay_fn,
+                                         void *opaque);
 
 void ram_discard_manager_register_listener(RamDiscardManager *rdm,
                                            RamDiscardListener *rdl,
diff --git a/migration/ram.c b/migration/ram.c
index ce28328141..053730367b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -816,8 +816,8 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
     return ret;
 }
 
-static void dirty_bitmap_clear_section(MemoryRegionSection *section,
-                                       void *opaque)
+static int dirty_bitmap_clear_section(MemoryRegionSection *section,
+                                      void *opaque)
 {
     const hwaddr offset = section->offset_within_region;
     const hwaddr size = int128_get64(section->size);
@@ -836,6 +836,7 @@ static void dirty_bitmap_clear_section(MemoryRegionSection *section,
     }
     *cleared_bits += bitmap_count_one_with_offset(rb->bmap, start, npages);
     bitmap_clear(rb->bmap, start, npages);
+    return 0;
 }
 
 /*
diff --git a/system/memory.c b/system/memory.c
index 62d6b410f0..b5ab729e13 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2147,7 +2147,7 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
 
 int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                          MemoryRegionSection *section,
-                                         ReplayRamPopulate replay_fn,
+                                         ReplayStateChange replay_fn,
                                          void *opaque)
 {
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
@@ -2156,15 +2156,15 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
     return rdmc->replay_populated(rdm, section, replay_fn, opaque);
 }
 
-void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                          MemoryRegionSection *section,
-                                          ReplayRamDiscard replay_fn,
-                                          void *opaque)
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+                                         MemoryRegionSection *section,
+                                         ReplayStateChange replay_fn,
+                                         void *opaque)
 {
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
 
     g_assert(rdmc->replay_discarded);
-    rdmc->replay_discarded(rdm, section, replay_fn, opaque);
+    return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
 }
 
 void ram_discard_manager_register_listener(RamDiscardManager *rdm,
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
                   ` (2 preceding siblings ...)
  2025-04-07  7:49 ` [PATCH v4 03/13] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard() Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-04-09  9:56   ` Alexey Kardashevskiy
  2025-04-07  7:49 ` [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager Chenyi Qiang
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

RamDiscardManager is an interface used by virtio-mem to adjust VFIO
mappings in relation to VM page assignment. It manages the state of
populated and discard for the RAM. To accommodate future scnarios for
managing RAM states, such as private and shared states in confidential
VMs, the existing RamDiscardManager interface needs to be generalized.

Introduce a parent class, GenericStateManager, to manage a pair of
opposite states with RamDiscardManager as its child. The changes include
- Define a new abstract class GenericStateChange.
- Extract six callbacks into GenericStateChangeClass and allow the child
  classes to inherit them.
- Modify RamDiscardManager-related helpers to use GenericStateManager
  ones.
- Define a generic StatChangeListener to extract fields from
  RamDiscardManager listener which allows future listeners to embed it
  and avoid duplication.
- Change the users of RamDiscardManager (virtio-mem, migration, etc.) to
  switch to use GenericStateChange helpers.

It can provide a more flexible and resuable framework for RAM state
management, facilitating future enhancements and use cases.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Newly added.
---
 hw/vfio/common.c        |  30 ++--
 hw/virtio/virtio-mem.c  |  95 ++++++------
 include/exec/memory.h   | 313 ++++++++++++++++++++++------------------
 migration/ram.c         |  16 +-
 system/memory.c         | 106 ++++++++------
 system/memory_mapping.c |   6 +-
 6 files changed, 310 insertions(+), 256 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f7499a9b74..3172d877cc 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -335,9 +335,10 @@ out:
     rcu_read_unlock();
 }
 
-static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
+static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
                                             MemoryRegionSection *section)
 {
+    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
     VFIOContainerBase *bcontainer = vrdl->bcontainer;
@@ -353,9 +354,10 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
     }
 }
 
-static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
+static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
                                             MemoryRegionSection *section)
 {
+    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
     VFIOContainerBase *bcontainer = vrdl->bcontainer;
@@ -381,7 +383,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
                                      vaddr, section->readonly);
         if (ret) {
             /* Rollback */
-            vfio_ram_discard_notify_discard(rdl, section);
+            vfio_ram_discard_notify_discard(scl, section);
             return ret;
         }
     }
@@ -391,8 +393,9 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
 static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                MemoryRegionSection *section)
 {
-    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
     VFIORamDiscardListener *vrdl;
+    RamDiscardListener *rdl;
 
     /* Ignore some corner cases not relevant in practice. */
     g_assert(QEMU_IS_ALIGNED(section->offset_within_region, TARGET_PAGE_SIZE));
@@ -405,17 +408,18 @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
     vrdl->mr = section->mr;
     vrdl->offset_within_address_space = section->offset_within_address_space;
     vrdl->size = int128_get64(section->size);
-    vrdl->granularity = ram_discard_manager_get_min_granularity(rdm,
-                                                                section->mr);
+    vrdl->granularity = generic_state_manager_get_min_granularity(gsm,
+                                                                  section->mr);
 
     g_assert(vrdl->granularity && is_power_of_2(vrdl->granularity));
     g_assert(bcontainer->pgsizes &&
              vrdl->granularity >= 1ULL << ctz64(bcontainer->pgsizes));
 
-    ram_discard_listener_init(&vrdl->listener,
+    rdl = &vrdl->listener;
+    ram_discard_listener_init(rdl,
                               vfio_ram_discard_notify_populate,
                               vfio_ram_discard_notify_discard, true);
-    ram_discard_manager_register_listener(rdm, &vrdl->listener, section);
+    generic_state_manager_register_listener(gsm, &rdl->scl, section);
     QLIST_INSERT_HEAD(&bcontainer->vrdl_list, vrdl, next);
 
     /*
@@ -465,8 +469,9 @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
 static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                  MemoryRegionSection *section)
 {
-    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
     VFIORamDiscardListener *vrdl = NULL;
+    RamDiscardListener *rdl;
 
     QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
         if (vrdl->mr == section->mr &&
@@ -480,7 +485,8 @@ static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
         hw_error("vfio: Trying to unregister missing RAM discard listener");
     }
 
-    ram_discard_manager_unregister_listener(rdm, &vrdl->listener);
+    rdl = &vrdl->listener;
+    generic_state_manager_unregister_listener(gsm, &rdl->scl);
     QLIST_REMOVE(vrdl, next);
     g_free(vrdl);
 }
@@ -1265,7 +1271,7 @@ static int
 vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
                                             MemoryRegionSection *section)
 {
-    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
     VFIORamDiscardListener *vrdl = NULL;
 
     QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
@@ -1284,7 +1290,7 @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
      * We only want/can synchronize the bitmap for actually mapped parts -
      * which correspond to populated parts. Replay all populated parts.
      */
-    return ram_discard_manager_replay_populated(rdm, section,
+    return generic_state_manager_replay_on_state_set(gsm, section,
                                               vfio_ram_discard_get_dirty_bitmap,
                                                 &vrdl);
 }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 1a88d649cb..40e8267254 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -312,16 +312,16 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
 
 static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg)
 {
-    RamDiscardListener *rdl = arg;
+    StateChangeListener *scl = arg;
 
-    return rdl->notify_populate(rdl, s);
+    return scl->notify_to_state_set(scl, s);
 }
 
 static int virtio_mem_notify_discard_cb(MemoryRegionSection *s, void *arg)
 {
-    RamDiscardListener *rdl = arg;
+    StateChangeListener *scl = arg;
 
-    rdl->notify_discard(rdl, s);
+    scl->notify_to_state_clear(scl, s);
     return 0;
 }
 
@@ -331,12 +331,13 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
     RamDiscardListener *rdl;
 
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
-        MemoryRegionSection tmp = *rdl->section;
+        StateChangeListener *scl = &rdl->scl;
+        MemoryRegionSection tmp = *scl->section;
 
         if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
-        rdl->notify_discard(rdl, &tmp);
+        scl->notify_to_state_clear(scl, &tmp);
     }
 }
 
@@ -347,12 +348,13 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
     int ret = 0;
 
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
-        MemoryRegionSection tmp = *rdl->section;
+        StateChangeListener *scl = &rdl->scl;
+        MemoryRegionSection tmp = *scl->section;
 
         if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
-        ret = rdl->notify_populate(rdl, &tmp);
+        ret = scl->notify_to_state_set(scl, &tmp);
         if (ret) {
             break;
         }
@@ -361,7 +363,8 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
     if (ret) {
         /* Notify all already-notified listeners. */
         QLIST_FOREACH(rdl2, &vmem->rdl_list, next) {
-            MemoryRegionSection tmp = *rdl2->section;
+            StateChangeListener *scl2 = &rdl2->scl;
+            MemoryRegionSection tmp = *scl2->section;
 
             if (rdl2 == rdl) {
                 break;
@@ -369,7 +372,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
             if (!memory_region_section_intersect_range(&tmp, offset, size)) {
                 continue;
             }
-            rdl2->notify_discard(rdl2, &tmp);
+            scl2->notify_to_state_clear(scl2, &tmp);
         }
     }
     return ret;
@@ -384,10 +387,11 @@ static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem)
     }
 
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
+        StateChangeListener *scl = &rdl->scl;
         if (rdl->double_discard_supported) {
-            rdl->notify_discard(rdl, rdl->section);
+            scl->notify_to_state_clear(scl, scl->section);
         } else {
-            virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
+            virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
                                                 virtio_mem_notify_discard_cb);
         }
     }
@@ -1053,8 +1057,8 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
      * Set ourselves as RamDiscardManager before the plug handler maps the
      * memory region and exposes it via an address space.
      */
-    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
-                                              RAM_DISCARD_MANAGER(vmem))) {
+    if (memory_region_set_generic_state_manager(&vmem->memdev->mr,
+                                                GENERIC_STATE_MANAGER(vmem))) {
         error_setg(errp, "Failed to set RamDiscardManager");
         ram_block_coordinated_discard_require(false);
         return;
@@ -1158,7 +1162,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
      * The unplug handler unmapped the memory region, it cannot be
      * found via an address space anymore. Unset ourselves.
      */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
+    memory_region_set_generic_state_manager(&vmem->memdev->mr, NULL);
     ram_block_coordinated_discard_require(false);
 }
 
@@ -1207,7 +1211,8 @@ static int virtio_mem_post_load_bitmap(VirtIOMEM *vmem)
      * into an address space. Replay, now that we updated the bitmap.
      */
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
-        ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
+        StateChangeListener *scl = &rdl->scl;
+        ret = virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
                                                  virtio_mem_notify_populate_cb);
         if (ret) {
             return ret;
@@ -1704,19 +1709,19 @@ static const Property virtio_mem_properties[] = {
                      dynamic_memslots, false),
 };
 
-static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardManager *rdm,
+static uint64_t virtio_mem_rdm_get_min_granularity(const GenericStateManager *gsm,
                                                    const MemoryRegion *mr)
 {
-    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
 
     g_assert(mr == &vmem->memdev->mr);
     return vmem->block_size;
 }
 
-static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
+static bool virtio_mem_rdm_is_populated(const GenericStateManager *gsm,
                                         const MemoryRegionSection *s)
 {
-    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
     uint64_t start_gpa = vmem->addr + s->offset_within_region;
     uint64_t end_gpa = start_gpa + int128_get64(s->size);
 
@@ -1744,12 +1749,12 @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
     return data->fn(s, data->opaque);
 }
 
-static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
+static int virtio_mem_rdm_replay_populated(const GenericStateManager *gsm,
                                            MemoryRegionSection *s,
                                            ReplayStateChange replay_fn,
                                            void *opaque)
 {
-    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
     struct VirtIOMEMReplayData data = {
         .fn = replay_fn,
         .opaque = opaque,
@@ -1769,12 +1774,12 @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
     return 0;
 }
 
-static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
+static int virtio_mem_rdm_replay_discarded(const GenericStateManager *gsm,
                                            MemoryRegionSection *s,
                                            ReplayStateChange replay_fn,
                                            void *opaque)
 {
-    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
     struct VirtIOMEMReplayData data = {
         .fn = replay_fn,
         .opaque = opaque,
@@ -1785,18 +1790,19 @@ static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
                                                  virtio_mem_rdm_replay_discarded_cb);
 }
 
-static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
-                                             RamDiscardListener *rdl,
+static void virtio_mem_rdm_register_listener(GenericStateManager *gsm,
+                                             StateChangeListener *scl,
                                              MemoryRegionSection *s)
 {
-    VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    VirtIOMEM *vmem = VIRTIO_MEM(gsm);
+    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
     int ret;
 
     g_assert(s->mr == &vmem->memdev->mr);
-    rdl->section = memory_region_section_new_copy(s);
+    scl->section = memory_region_section_new_copy(s);
 
     QLIST_INSERT_HEAD(&vmem->rdl_list, rdl, next);
-    ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
+    ret = virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
                                               virtio_mem_notify_populate_cb);
     if (ret) {
         error_report("%s: Replaying plugged ranges failed: %s", __func__,
@@ -1804,23 +1810,24 @@ static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
     }
 }
 
-static void virtio_mem_rdm_unregister_listener(RamDiscardManager *rdm,
-                                               RamDiscardListener *rdl)
+static void virtio_mem_rdm_unregister_listener(GenericStateManager *gsm,
+                                               StateChangeListener *scl)
 {
-    VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    VirtIOMEM *vmem = VIRTIO_MEM(gsm);
+    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
 
-    g_assert(rdl->section->mr == &vmem->memdev->mr);
+    g_assert(scl->section->mr == &vmem->memdev->mr);
     if (vmem->size) {
         if (rdl->double_discard_supported) {
-            rdl->notify_discard(rdl, rdl->section);
+            scl->notify_to_state_clear(scl, scl->section);
         } else {
-            virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
+            virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
                                                 virtio_mem_notify_discard_cb);
         }
     }
 
-    memory_region_section_free_copy(rdl->section);
-    rdl->section = NULL;
+    memory_region_section_free_copy(scl->section);
+    scl->section = NULL;
     QLIST_REMOVE(rdl, next);
 }
 
@@ -1853,7 +1860,7 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
     VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
     VirtIOMEMClass *vmc = VIRTIO_MEM_CLASS(klass);
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(klass);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(klass);
 
     device_class_set_props(dc, virtio_mem_properties);
     dc->vmsd = &vmstate_virtio_mem;
@@ -1874,12 +1881,12 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data)
     vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier;
     vmc->unplug_request_check = virtio_mem_unplug_request_check;
 
-    rdmc->get_min_granularity = virtio_mem_rdm_get_min_granularity;
-    rdmc->is_populated = virtio_mem_rdm_is_populated;
-    rdmc->replay_populated = virtio_mem_rdm_replay_populated;
-    rdmc->replay_discarded = virtio_mem_rdm_replay_discarded;
-    rdmc->register_listener = virtio_mem_rdm_register_listener;
-    rdmc->unregister_listener = virtio_mem_rdm_unregister_listener;
+    gsmc->get_min_granularity = virtio_mem_rdm_get_min_granularity;
+    gsmc->is_state_set = virtio_mem_rdm_is_populated;
+    gsmc->replay_on_state_set = virtio_mem_rdm_replay_populated;
+    gsmc->replay_on_state_clear = virtio_mem_rdm_replay_discarded;
+    gsmc->register_listener = virtio_mem_rdm_register_listener;
+    gsmc->unregister_listener = virtio_mem_rdm_unregister_listener;
 }
 
 static const TypeInfo virtio_mem_info = {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3b1d25a403..30e5838d02 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -43,6 +43,12 @@ typedef struct IOMMUMemoryRegionClass IOMMUMemoryRegionClass;
 DECLARE_OBJ_CHECKERS(IOMMUMemoryRegion, IOMMUMemoryRegionClass,
                      IOMMU_MEMORY_REGION, TYPE_IOMMU_MEMORY_REGION)
 
+#define TYPE_GENERIC_STATE_MANAGER "generic-state-manager"
+typedef struct GenericStateManagerClass GenericStateManagerClass;
+typedef struct GenericStateManager GenericStateManager;
+DECLARE_OBJ_CHECKERS(GenericStateManager, GenericStateManagerClass,
+                     GENERIC_STATE_MANAGER, TYPE_GENERIC_STATE_MANAGER)
+
 #define TYPE_RAM_DISCARD_MANAGER "ram-discard-manager"
 typedef struct RamDiscardManagerClass RamDiscardManagerClass;
 typedef struct RamDiscardManager RamDiscardManager;
@@ -506,103 +512,59 @@ struct IOMMUMemoryRegionClass {
     int (*num_indexes)(IOMMUMemoryRegion *iommu);
 };
 
-typedef struct RamDiscardListener RamDiscardListener;
-typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl,
-                                 MemoryRegionSection *section);
-typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl,
+typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
+
+typedef struct StateChangeListener StateChangeListener;
+typedef int (*NotifyStateSet)(StateChangeListener *scl,
+                              MemoryRegionSection *section);
+typedef void (*NotifyStateClear)(StateChangeListener *scl,
                                  MemoryRegionSection *section);
 
-struct RamDiscardListener {
+struct StateChangeListener {
     /*
-     * @notify_populate:
+     * @notify_to_state_set:
      *
-     * Notification that previously discarded memory is about to get populated.
-     * Listeners are able to object. If any listener objects, already
-     * successfully notified listeners are notified about a discard again.
+     * Notification that previously state clear part is about to be set.
      *
-     * @rdl: the #RamDiscardListener getting notified
-     * @section: the #MemoryRegionSection to get populated. The section
+     * @scl: the #StateChangeListener getting notified
+     * @section: the #MemoryRegionSection to be state-set. The section
      *           is aligned within the memory region to the minimum granularity
      *           unless it would exceed the registered section.
      *
      * Returns 0 on success. If the notification is rejected by the listener,
      * an error is returned.
      */
-    NotifyRamPopulate notify_populate;
+    NotifyStateSet notify_to_state_set;
 
     /*
-     * @notify_discard:
+     * @notify_to_state_clear:
      *
-     * Notification that previously populated memory was discarded successfully
-     * and listeners should drop all references to such memory and prevent
-     * new population (e.g., unmap).
+     * Notification that previously state set part is about to be cleared
      *
-     * @rdl: the #RamDiscardListener getting notified
-     * @section: the #MemoryRegionSection to get populated. The section
+     * @scl: the #StateChangeListener getting notified
+     * @section: the #MemoryRegionSection to be state-cleared. The section
      *           is aligned within the memory region to the minimum granularity
      *           unless it would exceed the registered section.
-     */
-    NotifyRamDiscard notify_discard;
-
-    /*
-     * @double_discard_supported:
      *
-     * The listener suppors getting @notify_discard notifications that span
-     * already discarded parts.
+     * Returns 0 on success. If the notification is rejected by the listener,
+     * an error is returned.
      */
-    bool double_discard_supported;
+    NotifyStateClear notify_to_state_clear;
 
     MemoryRegionSection *section;
-    QLIST_ENTRY(RamDiscardListener) next;
 };
 
-static inline void ram_discard_listener_init(RamDiscardListener *rdl,
-                                             NotifyRamPopulate populate_fn,
-                                             NotifyRamDiscard discard_fn,
-                                             bool double_discard_supported)
-{
-    rdl->notify_populate = populate_fn;
-    rdl->notify_discard = discard_fn;
-    rdl->double_discard_supported = double_discard_supported;
-}
-
-typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
-
 /*
- * RamDiscardManagerClass:
- *
- * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion
- * regions are currently populated to be used/accessed by the VM, notifying
- * after parts were discarded (freeing up memory) and before parts will be
- * populated (consuming memory), to be used/accessed by the VM.
- *
- * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the
- * #MemoryRegion isn't mapped into an address space yet (either directly
- * or via an alias); it cannot change while the #MemoryRegion is
- * mapped into an address space.
+ * GenericStateManagerClass:
  *
- * The #RamDiscardManager is intended to be used by technologies that are
- * incompatible with discarding of RAM (e.g., VFIO, which may pin all
- * memory inside a #MemoryRegion), and require proper coordination to only
- * map the currently populated parts, to hinder parts that are expected to
- * remain discarded from silently getting populated and consuming memory.
- * Technologies that support discarding of RAM don't have to bother and can
- * simply map the whole #MemoryRegion.
- *
- * An example #RamDiscardManager is virtio-mem, which logically (un)plugs
- * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
- * Logically unplugging memory consists of discarding RAM. The VM agreed to not
- * access unplugged (discarded) memory - especially via DMA. virtio-mem will
- * properly coordinate with listeners before memory is plugged (populated),
- * and after memory is unplugged (discarded).
+ * A #GenericStateManager is a common interface used to manage the state of
+ * a #MemoryRegion. The managed states is a pair of opposite states, such as
+ * populated and discarded, or private and shared. It is abstract as set and
+ * clear in below callbacks, and the actual state is managed by the
+ * implementation.
  *
- * Listeners are called in multiples of the minimum granularity (unless it
- * would exceed the registered range) and changes are aligned to the minimum
- * granularity within the #MemoryRegion. Listeners have to prepare for memory
- * becoming discarded in a different granularity than it was populated and the
- * other way around.
  */
-struct RamDiscardManagerClass {
+struct GenericStateManagerClass {
     /* private */
     InterfaceClass parent_class;
 
@@ -612,122 +574,188 @@ struct RamDiscardManagerClass {
      * @get_min_granularity:
      *
      * Get the minimum granularity in which listeners will get notified
-     * about changes within the #MemoryRegion via the #RamDiscardManager.
+     * about changes within the #MemoryRegion via the #GenericStateManager.
      *
-     * @rdm: the #RamDiscardManager
+     * @gsm: the #GenericStateManager
      * @mr: the #MemoryRegion
      *
      * Returns the minimum granularity.
      */
-    uint64_t (*get_min_granularity)(const RamDiscardManager *rdm,
+    uint64_t (*get_min_granularity)(const GenericStateManager *gsm,
                                     const MemoryRegion *mr);
 
     /**
-     * @is_populated:
+     * @is_state_set:
      *
-     * Check whether the given #MemoryRegionSection is completely populated
-     * (i.e., no parts are currently discarded) via the #RamDiscardManager.
-     * There are no alignment requirements.
+     * Check whether the given #MemoryRegionSection state is set.
+     * via the #GenericStateManager.
      *
-     * @rdm: the #RamDiscardManager
+     * @gsm: the #GenericStateManager
      * @section: the #MemoryRegionSection
      *
-     * Returns whether the given range is completely populated.
+     * Returns whether the given range is completely set.
      */
-    bool (*is_populated)(const RamDiscardManager *rdm,
+    bool (*is_state_set)(const GenericStateManager *gsm,
                          const MemoryRegionSection *section);
 
     /**
-     * @replay_populated:
+     * @replay_on_state_set:
      *
-     * Call the #ReplayStateChange callback for all populated parts within the
-     * #MemoryRegionSection via the #RamDiscardManager.
+     * Call the #ReplayStateChange callback for all state set parts within the
+     * #MemoryRegionSection via the #GenericStateManager.
      *
      * In case any call fails, no further calls are made.
      *
-     * @rdm: the #RamDiscardManager
+     * @gsm: the #GenericStateManager
      * @section: the #MemoryRegionSection
      * @replay_fn: the #ReplayStateChange callback
      * @opaque: pointer to forward to the callback
      *
      * Returns 0 on success, or a negative error if any notification failed.
      */
-    int (*replay_populated)(const RamDiscardManager *rdm,
-                            MemoryRegionSection *section,
-                            ReplayStateChange replay_fn, void *opaque);
+    int (*replay_on_state_set)(const GenericStateManager *gsm,
+                               MemoryRegionSection *section,
+                               ReplayStateChange replay_fn, void *opaque);
 
     /**
-     * @replay_discarded:
+     * @replay_on_state_clear:
      *
-     * Call the #ReplayStateChange callback for all discarded parts within the
-     * #MemoryRegionSection via the #RamDiscardManager.
+     * Call the #ReplayStateChange callback for all state clear parts within the
+     * #MemoryRegionSection via the #GenericStateManager.
+     *
+     * In case any call fails, no further calls are made.
      *
-     * @rdm: the #RamDiscardManager
+     * @gsm: the #GenericStateManager
      * @section: the #MemoryRegionSection
      * @replay_fn: the #ReplayStateChange callback
      * @opaque: pointer to forward to the callback
      *
      * Returns 0 on success, or a negative error if any notification failed.
      */
-    int (*replay_discarded)(const RamDiscardManager *rdm,
-                            MemoryRegionSection *section,
-                            ReplayStateChange replay_fn, void *opaque);
+    int (*replay_on_state_clear)(const GenericStateManager *gsm,
+                                 MemoryRegionSection *section,
+                                 ReplayStateChange replay_fn, void *opaque);
 
     /**
      * @register_listener:
      *
-     * Register a #RamDiscardListener for the given #MemoryRegionSection and
-     * immediately notify the #RamDiscardListener about all populated parts
-     * within the #MemoryRegionSection via the #RamDiscardManager.
+     * Register a #StateChangeListener for the given #MemoryRegionSection and
+     * immediately notify the #StateChangeListener about all state-set parts
+     * within the #MemoryRegionSection via the #GenericStateManager.
      *
      * In case any notification fails, no further notifications are triggered
      * and an error is logged.
      *
-     * @rdm: the #RamDiscardManager
-     * @rdl: the #RamDiscardListener
+     * @rdm: the #GenericStateManager
+     * @rdl: the #StateChangeListener
      * @section: the #MemoryRegionSection
      */
-    void (*register_listener)(RamDiscardManager *rdm,
-                              RamDiscardListener *rdl,
+    void (*register_listener)(GenericStateManager *gsm,
+                              StateChangeListener *scl,
                               MemoryRegionSection *section);
 
     /**
      * @unregister_listener:
      *
-     * Unregister a previously registered #RamDiscardListener via the
-     * #RamDiscardManager after notifying the #RamDiscardListener about all
-     * populated parts becoming unpopulated within the registered
+     * Unregister a previously registered #StateChangeListener via the
+     * #GenericStateManager after notifying the #StateChangeListener about all
+     * state-set parts becoming state-cleared within the registered
      * #MemoryRegionSection.
      *
-     * @rdm: the #RamDiscardManager
-     * @rdl: the #RamDiscardListener
+     * @rdm: the #GenericStateManager
+     * @rdl: the #StateChangeListener
      */
-    void (*unregister_listener)(RamDiscardManager *rdm,
-                                RamDiscardListener *rdl);
+    void (*unregister_listener)(GenericStateManager *gsm,
+                                StateChangeListener *scl);
 };
 
-uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
-                                                 const MemoryRegion *mr);
+uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm,
+                                                   const MemoryRegion *mr);
 
-bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
-                                      const MemoryRegionSection *section);
+bool generic_state_manager_is_state_set(const GenericStateManager *gsm,
+                                        const MemoryRegionSection *section);
 
-int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
-                                         MemoryRegionSection *section,
-                                         ReplayStateChange replay_fn,
-                                         void *opaque);
+int generic_state_manager_replay_on_state_set(const GenericStateManager *gsm,
+                                           MemoryRegionSection *section,
+                                           ReplayStateChange replay_fn,
+                                           void *opaque);
 
-int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                         MemoryRegionSection *section,
-                                         ReplayStateChange replay_fn,
-                                         void *opaque);
+int generic_state_manager_replay_on_state_clear(const GenericStateManager *gsm,
+                                                MemoryRegionSection *section,
+                                                ReplayStateChange replay_fn,
+                                                void *opaque);
 
-void ram_discard_manager_register_listener(RamDiscardManager *rdm,
-                                           RamDiscardListener *rdl,
-                                           MemoryRegionSection *section);
+void generic_state_manager_register_listener(GenericStateManager *gsm,
+                                             StateChangeListener *scl,
+                                             MemoryRegionSection *section);
 
-void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
-                                             RamDiscardListener *rdl);
+void generic_state_manager_unregister_listener(GenericStateManager *gsm,
+                                               StateChangeListener *scl);
+
+typedef struct RamDiscardListener RamDiscardListener;
+
+struct RamDiscardListener {
+    struct StateChangeListener scl;
+
+    /*
+     * @double_discard_supported:
+     *
+     * The listener suppors getting @notify_discard notifications that span
+     * already discarded parts.
+     */
+    bool double_discard_supported;
+
+    QLIST_ENTRY(RamDiscardListener) next;
+};
+
+static inline void ram_discard_listener_init(RamDiscardListener *rdl,
+                                             NotifyStateSet populate_fn,
+                                             NotifyStateClear discard_fn,
+                                             bool double_discard_supported)
+{
+    rdl->scl.notify_to_state_set = populate_fn;
+    rdl->scl.notify_to_state_clear = discard_fn;
+    rdl->double_discard_supported = double_discard_supported;
+}
+
+/*
+ * RamDiscardManagerClass:
+ *
+ * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion
+ * regions are currently populated to be used/accessed by the VM, notifying
+ * after parts were discarded (freeing up memory) and before parts will be
+ * populated (consuming memory), to be used/accessed by the VM.
+ *
+ * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the
+ * #MemoryRegion isn't mapped into an address space yet (either directly
+ * or via an alias); it cannot change while the #MemoryRegion is
+ * mapped into an address space.
+ *
+ * The #RamDiscardManager is intended to be used by technologies that are
+ * incompatible with discarding of RAM (e.g., VFIO, which may pin all
+ * memory inside a #MemoryRegion), and require proper coordination to only
+ * map the currently populated parts, to hinder parts that are expected to
+ * remain discarded from silently getting populated and consuming memory.
+ * Technologies that support discarding of RAM don't have to bother and can
+ * simply map the whole #MemoryRegion.
+ *
+ * An example #RamDiscardManager is virtio-mem, which logically (un)plugs
+ * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
+ * Logically unplugging memory consists of discarding RAM. The VM agreed to not
+ * access unplugged (discarded) memory - especially via DMA. virtio-mem will
+ * properly coordinate with listeners before memory is plugged (populated),
+ * and after memory is unplugged (discarded).
+ *
+ * Listeners are called in multiples of the minimum granularity (unless it
+ * would exceed the registered range) and changes are aligned to the minimum
+ * granularity within the #MemoryRegion. Listeners have to prepare for memory
+ * becoming discarded in a different granularity than it was populated and the
+ * other way around.
+ */
+struct RamDiscardManagerClass {
+    /* private */
+    GenericStateManagerClass parent_class;
+};
 
 /**
  * memory_get_xlat_addr: Extract addresses from a TLB entry
@@ -795,7 +823,7 @@ struct MemoryRegion {
     const char *name;
     unsigned ioeventfd_nb;
     MemoryRegionIoeventfd *ioeventfds;
-    RamDiscardManager *rdm; /* Only for RAM */
+    GenericStateManager *gsm; /* Only for RAM */
 
     /* For devices designed to perform re-entrant IO into their own IO MRs */
     bool disable_reentrancy_guard;
@@ -2462,39 +2490,36 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr);
 bool memory_region_is_mapped(MemoryRegion *mr);
 
 /**
- * memory_region_get_ram_discard_manager: get the #RamDiscardManager for a
+ * memory_region_get_generic_state_manager: get the #GenericStateManager for a
  * #MemoryRegion
  *
- * The #RamDiscardManager cannot change while a memory region is mapped.
+ * The #GenericStateManager cannot change while a memory region is mapped.
  *
  * @mr: the #MemoryRegion
  */
-RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr);
+GenericStateManager *memory_region_get_generic_state_manager(MemoryRegion *mr);
 
 /**
- * memory_region_has_ram_discard_manager: check whether a #MemoryRegion has a
- * #RamDiscardManager assigned
+ * memory_region_set_generic_state_manager: set the #GenericStateManager for a
+ * #MemoryRegion
+ *
+ * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
+ * that does not cover RAM, or a #MemoryRegion that already has a
+ * #GenericStateManager assigned. Return 0 if the gsm is set successfully.
  *
  * @mr: the #MemoryRegion
+ * @gsm: #GenericStateManager to set
  */
-static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
-{
-    return !!memory_region_get_ram_discard_manager(mr);
-}
+int memory_region_set_generic_state_manager(MemoryRegion *mr,
+                                            GenericStateManager *gsm);
 
 /**
- * memory_region_set_ram_discard_manager: set the #RamDiscardManager for a
- * #MemoryRegion
- *
- * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
- * that does not cover RAM, or a #MemoryRegion that already has a
- * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
+ * memory_region_has_ram_discard_manager: check whether a #MemoryRegion has a
+ * #RamDiscardManager assigned
  *
  * @mr: the #MemoryRegion
- * @rdm: #RamDiscardManager to set
  */
-int memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                          RamDiscardManager *rdm);
+bool memory_region_has_ram_discard_manager(MemoryRegion *mr);
 
 /**
  * memory_region_find: translate an address/size relative to a
diff --git a/migration/ram.c b/migration/ram.c
index 053730367b..c881523e64 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -857,14 +857,14 @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb)
     uint64_t cleared_bits = 0;
 
     if (rb->mr && rb->bmap && memory_region_has_ram_discard_manager(rb->mr)) {
-        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
         MemoryRegionSection section = {
             .mr = rb->mr,
             .offset_within_region = 0,
             .size = int128_make64(qemu_ram_get_used_length(rb)),
         };
 
-        ram_discard_manager_replay_discarded(rdm, &section,
+        generic_state_manager_replay_on_state_clear(gsm, &section,
                                              dirty_bitmap_clear_section,
                                              &cleared_bits);
     }
@@ -880,14 +880,14 @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb)
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start)
 {
     if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
-        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
         MemoryRegionSection section = {
             .mr = rb->mr,
             .offset_within_region = start,
             .size = int128_make64(qemu_ram_pagesize(rb)),
         };
 
-        return !ram_discard_manager_is_populated(rdm, &section);
+        return !generic_state_manager_is_state_set(gsm, &section);
     }
     return false;
 }
@@ -1545,14 +1545,14 @@ static void ram_block_populate_read(RAMBlock *rb)
      * Note: The result is only stable while migrating (precopy/postcopy).
      */
     if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
-        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
         MemoryRegionSection section = {
             .mr = rb->mr,
             .offset_within_region = 0,
             .size = rb->mr->size,
         };
 
-        ram_discard_manager_replay_populated(rdm, &section,
+        generic_state_manager_replay_on_state_set(gsm, &section,
                                              populate_read_section, NULL);
     } else {
         populate_read_range(rb, 0, rb->used_length);
@@ -1604,14 +1604,14 @@ static int ram_block_uffd_protect(RAMBlock *rb, int uffd_fd)
 
     /* See ram_block_populate_read() */
     if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
-        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
         MemoryRegionSection section = {
             .mr = rb->mr,
             .offset_within_region = 0,
             .size = rb->mr->size,
         };
 
-        return ram_discard_manager_replay_populated(rdm, &section,
+        return generic_state_manager_replay_on_state_set(gsm, &section,
                                                     uffd_protect_section,
                                                     (void *)(uintptr_t)uffd_fd);
     }
diff --git a/system/memory.c b/system/memory.c
index b5ab729e13..7b921c66a6 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2107,83 +2107,93 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr)
     return imrc->num_indexes(iommu_mr);
 }
 
-RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
+GenericStateManager *memory_region_get_generic_state_manager(MemoryRegion *mr)
 {
     if (!memory_region_is_ram(mr)) {
         return NULL;
     }
-    return mr->rdm;
+    return mr->gsm;
 }
 
-int memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                          RamDiscardManager *rdm)
+int memory_region_set_generic_state_manager(MemoryRegion *mr,
+                                            GenericStateManager *gsm)
 {
     g_assert(memory_region_is_ram(mr));
-    if (mr->rdm && rdm) {
+    if (mr->gsm && gsm) {
         return -EBUSY;
     }
 
-    mr->rdm = rdm;
+    mr->gsm = gsm;
     return 0;
 }
 
-uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
-                                                 const MemoryRegion *mr)
+bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    if (!memory_region_is_ram(mr) ||
+        !object_dynamic_cast(OBJECT(mr->gsm), TYPE_RAM_DISCARD_MANAGER)) {
+        return false;
+    }
+
+    return true;
+}
+
+uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm,
+                                                   const MemoryRegion *mr)
+{
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->get_min_granularity);
-    return rdmc->get_min_granularity(rdm, mr);
+    g_assert(gsmc->get_min_granularity);
+    return gsmc->get_min_granularity(gsm, mr);
 }
 
-bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
-                                      const MemoryRegionSection *section)
+bool generic_state_manager_is_state_set(const GenericStateManager *gsm,
+                                        const MemoryRegionSection *section)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->is_populated);
-    return rdmc->is_populated(rdm, section);
+    g_assert(gsmc->is_state_set);
+    return gsmc->is_state_set(gsm, section);
 }
 
-int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
-                                         MemoryRegionSection *section,
-                                         ReplayStateChange replay_fn,
-                                         void *opaque)
+int generic_state_manager_replay_on_state_set(const GenericStateManager *gsm,
+                                              MemoryRegionSection *section,
+                                              ReplayStateChange replay_fn,
+                                              void *opaque)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->replay_populated);
-    return rdmc->replay_populated(rdm, section, replay_fn, opaque);
+    g_assert(gsmc->replay_on_state_set);
+    return gsmc->replay_on_state_set(gsm, section, replay_fn, opaque);
 }
 
-int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                         MemoryRegionSection *section,
-                                         ReplayStateChange replay_fn,
-                                         void *opaque)
+int generic_state_manager_replay_on_state_clear(const GenericStateManager *gsm,
+                                                MemoryRegionSection *section,
+                                                ReplayStateChange replay_fn,
+                                                void *opaque)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->replay_discarded);
-    return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
+    g_assert(gsmc->replay_on_state_clear);
+    return gsmc->replay_on_state_clear(gsm, section, replay_fn, opaque);
 }
 
-void ram_discard_manager_register_listener(RamDiscardManager *rdm,
-                                           RamDiscardListener *rdl,
-                                           MemoryRegionSection *section)
+void generic_state_manager_register_listener(GenericStateManager *gsm,
+                                             StateChangeListener *scl,
+                                             MemoryRegionSection *section)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->register_listener);
-    rdmc->register_listener(rdm, rdl, section);
+    g_assert(gsmc->register_listener);
+    gsmc->register_listener(gsm, scl, section);
 }
 
-void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
-                                             RamDiscardListener *rdl)
+void generic_state_manager_unregister_listener(GenericStateManager *gsm,
+                                               StateChangeListener *scl)
 {
-    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
 
-    g_assert(rdmc->unregister_listener);
-    rdmc->unregister_listener(rdm, rdl);
+    g_assert(gsmc->unregister_listener);
+    gsmc->unregister_listener(gsm, scl);
 }
 
 /* Called with rcu_read_lock held.  */
@@ -2210,7 +2220,7 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
         error_setg(errp, "iommu map to non memory area %" HWADDR_PRIx "", xlat);
         return false;
     } else if (memory_region_has_ram_discard_manager(mr)) {
-        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
+        GenericStateManager *gsm = memory_region_get_generic_state_manager(mr);
         MemoryRegionSection tmp = {
             .mr = mr,
             .offset_within_region = xlat,
@@ -2225,7 +2235,7 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
          * Disallow that. vmstate priorities make sure any RamDiscardManager
          * were already restored before IOMMUs are restored.
          */
-        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
+        if (!generic_state_manager_is_state_set(gsm, &tmp)) {
             error_setg(errp, "iommu map to discarded memory (e.g., unplugged"
                          " via virtio-mem): %" HWADDR_PRIx "",
                          iotlb->translated_addr);
@@ -3814,8 +3824,15 @@ static const TypeInfo iommu_memory_region_info = {
     .abstract           = true,
 };
 
-static const TypeInfo ram_discard_manager_info = {
+static const TypeInfo generic_state_manager_info = {
     .parent             = TYPE_INTERFACE,
+    .name               = TYPE_GENERIC_STATE_MANAGER,
+    .class_size         = sizeof(GenericStateManagerClass),
+    .abstract           = true,
+};
+
+static const TypeInfo ram_discard_manager_info = {
+    .parent             = TYPE_GENERIC_STATE_MANAGER,
     .name               = TYPE_RAM_DISCARD_MANAGER,
     .class_size         = sizeof(RamDiscardManagerClass),
 };
@@ -3824,6 +3841,7 @@ static void memory_register_types(void)
 {
     type_register_static(&memory_region_info);
     type_register_static(&iommu_memory_region_info);
+    type_register_static(&generic_state_manager_info);
     type_register_static(&ram_discard_manager_info);
 }
 
diff --git a/system/memory_mapping.c b/system/memory_mapping.c
index 37d3325f77..e9d15c737d 100644
--- a/system/memory_mapping.c
+++ b/system/memory_mapping.c
@@ -271,10 +271,8 @@ static void guest_phys_blocks_region_add(MemoryListener *listener,
 
     /* for special sparse regions, only add populated parts */
     if (memory_region_has_ram_discard_manager(section->mr)) {
-        RamDiscardManager *rdm;
-
-        rdm = memory_region_get_ram_discard_manager(section->mr);
-        ram_discard_manager_replay_populated(rdm, section,
+        GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
+        generic_state_manager_replay_on_state_set(gsm, section,
                                              guest_phys_ram_populate_cb, g);
         return;
     }
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
                   ` (3 preceding siblings ...)
  2025-04-07  7:49 ` [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-04-09  9:56   ` Alexey Kardashevskiy
  2025-04-25 12:57   ` David Hildenbrand
  2025-04-07  7:49 ` [PATCH v4 06/13] vfio: Add the support for PrivateSharedManager Interface Chenyi Qiang
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

To manage the private and shared RAM states in confidential VMs,
introduce a new class of PrivateShareManager as a child of
GenericStateManager, which inherits the six interface callbacks. With a
different interface type, it can be distinguished from the
RamDiscardManager object and provide the flexibility for addressing
specific requirements of confidential VMs in the future.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Newly added.
---
 include/exec/memory.h | 44 +++++++++++++++++++++++++++++++++++++++++--
 system/memory.c       | 17 +++++++++++++++++
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 30e5838d02..08f25e5e84 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -55,6 +55,12 @@ typedef struct RamDiscardManager RamDiscardManager;
 DECLARE_OBJ_CHECKERS(RamDiscardManager, RamDiscardManagerClass,
                      RAM_DISCARD_MANAGER, TYPE_RAM_DISCARD_MANAGER);
 
+#define TYPE_PRIVATE_SHARED_MANAGER "private-shared-manager"
+typedef struct PrivateSharedManagerClass PrivateSharedManagerClass;
+typedef struct PrivateSharedManager PrivateSharedManager;
+DECLARE_OBJ_CHECKERS(PrivateSharedManager, PrivateSharedManagerClass,
+                     PRIVATE_SHARED_MANAGER, TYPE_PRIVATE_SHARED_MANAGER)
+
 #ifdef CONFIG_FUZZ
 void fuzz_dma_read_cb(size_t addr,
                       size_t len,
@@ -692,6 +698,14 @@ void generic_state_manager_register_listener(GenericStateManager *gsm,
 void generic_state_manager_unregister_listener(GenericStateManager *gsm,
                                                StateChangeListener *scl);
 
+static inline void state_change_listener_init(StateChangeListener *scl,
+                                              NotifyStateSet state_set_fn,
+                                              NotifyStateClear state_clear_fn)
+{
+    scl->notify_to_state_set = state_set_fn;
+    scl->notify_to_state_clear = state_clear_fn;
+}
+
 typedef struct RamDiscardListener RamDiscardListener;
 
 struct RamDiscardListener {
@@ -713,8 +727,7 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
                                              NotifyStateClear discard_fn,
                                              bool double_discard_supported)
 {
-    rdl->scl.notify_to_state_set = populate_fn;
-    rdl->scl.notify_to_state_clear = discard_fn;
+    state_change_listener_init(&rdl->scl, populate_fn, discard_fn);
     rdl->double_discard_supported = double_discard_supported;
 }
 
@@ -757,6 +770,25 @@ struct RamDiscardManagerClass {
     GenericStateManagerClass parent_class;
 };
 
+typedef struct PrivateSharedListener PrivateSharedListener;
+struct PrivateSharedListener {
+    struct StateChangeListener scl;
+
+    QLIST_ENTRY(PrivateSharedListener) next;
+};
+
+struct PrivateSharedManagerClass {
+    /* private */
+    GenericStateManagerClass parent_class;
+};
+
+static inline void private_shared_listener_init(PrivateSharedListener *psl,
+                                                NotifyStateSet populate_fn,
+                                                NotifyStateClear discard_fn)
+{
+    state_change_listener_init(&psl->scl, populate_fn, discard_fn);
+}
+
 /**
  * memory_get_xlat_addr: Extract addresses from a TLB entry
  *
@@ -2521,6 +2553,14 @@ int memory_region_set_generic_state_manager(MemoryRegion *mr,
  */
 bool memory_region_has_ram_discard_manager(MemoryRegion *mr);
 
+/**
+ * memory_region_has_private_shared_manager: check whether a #MemoryRegion has a
+ * #PrivateSharedManager assigned
+ *
+ * @mr: the #MemoryRegion
+ */
+bool memory_region_has_private_shared_manager(MemoryRegion *mr);
+
 /**
  * memory_region_find: translate an address/size relative to a
  * MemoryRegion into a #MemoryRegionSection.
diff --git a/system/memory.c b/system/memory.c
index 7b921c66a6..e6e944d9c0 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2137,6 +2137,16 @@ bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
     return true;
 }
 
+bool memory_region_has_private_shared_manager(MemoryRegion *mr)
+{
+    if (!memory_region_is_ram(mr) ||
+        !object_dynamic_cast(OBJECT(mr->gsm), TYPE_PRIVATE_SHARED_MANAGER)) {
+        return false;
+    }
+
+    return true;
+}
+
 uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm,
                                                    const MemoryRegion *mr)
 {
@@ -3837,12 +3847,19 @@ static const TypeInfo ram_discard_manager_info = {
     .class_size         = sizeof(RamDiscardManagerClass),
 };
 
+static const TypeInfo private_shared_manager_info = {
+    .parent             = TYPE_GENERIC_STATE_MANAGER,
+    .name               = TYPE_PRIVATE_SHARED_MANAGER,
+    .class_size         = sizeof(PrivateSharedManagerClass),
+};
+
 static void memory_register_types(void)
 {
     type_register_static(&memory_region_info);
     type_register_static(&iommu_memory_region_info);
     type_register_static(&generic_state_manager_info);
     type_register_static(&ram_discard_manager_info);
+    type_register_static(&private_shared_manager_info);
 }
 
 type_init(memory_register_types)
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 06/13] vfio: Add the support for PrivateSharedManager Interface
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
                   ` (4 preceding siblings ...)
  2025-04-07  7:49 ` [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-04-09  9:58   ` Alexey Kardashevskiy
  2025-04-07  7:49 ` [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd Chenyi Qiang
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

Subsystems like VFIO previously disabled ram block discard and only
allowed coordinated discarding via RamDiscardManager. However,
guest_memfd in confidential VMs relies on discard operations for page
conversion between private and shared memory. This can lead to stale
IOMMU mapping issue when assigning a hardware device to a confidential
VM via shared memory. With the introduction of PrivateSharedManager
interface to manage private and shared states and being distinct from
RamDiscardManager, include PrivateSharedManager in coordinated RAM
discard and add related support in VFIO.

Currently, migration support for confidential VMs is not available, so
vfio_sync_dirty_bitmap() handling for PrivateSharedListener can be
ignored. The register/unregister of PrivateSharedListener is necessary
during vfio_listener_region_add/del(). The listener callbacks are
similar between RamDiscardListener and PrivateSharedListener, allowing
for extraction of common parts opportunisticlly.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4
    - Newly added.
---
 hw/vfio/common.c                      | 104 +++++++++++++++++++++++---
 hw/vfio/container-base.c              |   1 +
 include/hw/vfio/vfio-container-base.h |  10 +++
 3 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 3172d877cc..48468a12c3 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -335,13 +335,9 @@ out:
     rcu_read_unlock();
 }
 
-static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
-                                            MemoryRegionSection *section)
+static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer,
+                                                    MemoryRegionSection *section)
 {
-    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
-    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
-                                                listener);
-    VFIOContainerBase *bcontainer = vrdl->bcontainer;
     const hwaddr size = int128_get64(section->size);
     const hwaddr iova = section->offset_within_address_space;
     int ret;
@@ -354,13 +350,28 @@ static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
     }
 }
 
-static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
+static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
                                             MemoryRegionSection *section)
 {
     RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
-    VFIOContainerBase *bcontainer = vrdl->bcontainer;
+    vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section);
+}
+
+static void vfio_private_shared_notify_to_private(StateChangeListener *scl,
+                                                  MemoryRegionSection *section)
+{
+    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
+    VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener,
+                                                   listener);
+    vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section);
+}
+
+static int vfio_state_change_notify_to_state_set(VFIOContainerBase *bcontainer,
+                                                 MemoryRegionSection *section,
+                                                 uint64_t granularity)
+{
     const hwaddr end = section->offset_within_region +
                        int128_get64(section->size);
     hwaddr start, next, iova;
@@ -372,7 +383,7 @@ static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
      * unmap in minimum granularity later.
      */
     for (start = section->offset_within_region; start < end; start = next) {
-        next = ROUND_UP(start + 1, vrdl->granularity);
+        next = ROUND_UP(start + 1, granularity);
         next = MIN(next, end);
 
         iova = start - section->offset_within_region +
@@ -383,13 +394,33 @@ static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
                                      vaddr, section->readonly);
         if (ret) {
             /* Rollback */
-            vfio_ram_discard_notify_discard(scl, section);
+            vfio_state_change_notify_to_state_clear(bcontainer, section);
             return ret;
         }
     }
     return 0;
 }
 
+static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
+                                            MemoryRegionSection *section)
+{
+    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
+    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
+                                                listener);
+    return vfio_state_change_notify_to_state_set(vrdl->bcontainer, section,
+                                                 vrdl->granularity);
+}
+
+static int vfio_private_shared_notify_to_shared(StateChangeListener *scl,
+                                                MemoryRegionSection *section)
+{
+    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
+    VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener,
+                                                   listener);
+    return vfio_state_change_notify_to_state_set(vpsl->bcontainer, section,
+                                                 vpsl->granularity);
+}
+
 static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                MemoryRegionSection *section)
 {
@@ -466,6 +497,27 @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
     }
 }
 
+static void vfio_register_private_shared_listener(VFIOContainerBase *bcontainer,
+                                                  MemoryRegionSection *section)
+{
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
+    VFIOPrivateSharedListener *vpsl;
+    PrivateSharedListener *psl;
+
+    vpsl = g_new0(VFIOPrivateSharedListener, 1);
+    vpsl->bcontainer = bcontainer;
+    vpsl->mr = section->mr;
+    vpsl->offset_within_address_space = section->offset_within_address_space;
+    vpsl->granularity = generic_state_manager_get_min_granularity(gsm,
+                                                                  section->mr);
+
+    psl = &vpsl->listener;
+    private_shared_listener_init(psl, vfio_private_shared_notify_to_shared,
+                                 vfio_private_shared_notify_to_private);
+    generic_state_manager_register_listener(gsm, &psl->scl, section);
+    QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next);
+}
+
 static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                  MemoryRegionSection *section)
 {
@@ -491,6 +543,31 @@ static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
     g_free(vrdl);
 }
 
+static void vfio_unregister_private_shared_listener(VFIOContainerBase *bcontainer,
+                                                    MemoryRegionSection *section)
+{
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
+    VFIOPrivateSharedListener *vpsl = NULL;
+    PrivateSharedListener *psl;
+
+    QLIST_FOREACH(vpsl, &bcontainer->vpsl_list, next) {
+        if (vpsl->mr == section->mr &&
+            vpsl->offset_within_address_space ==
+            section->offset_within_address_space) {
+            break;
+        }
+    }
+
+    if (!vpsl) {
+        hw_error("vfio: Trying to unregister missing RAM discard listener");
+    }
+
+    psl = &vpsl->listener;
+    generic_state_manager_unregister_listener(gsm, &psl->scl);
+    QLIST_REMOVE(vpsl, next);
+    g_free(vpsl);
+}
+
 static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
 {
     MemoryRegion *mr = section->mr;
@@ -644,6 +721,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
     if (memory_region_has_ram_discard_manager(section->mr)) {
         vfio_register_ram_discard_listener(bcontainer, section);
         return;
+    } else if (memory_region_has_private_shared_manager(section->mr)) {
+        vfio_register_private_shared_listener(bcontainer, section);
+        return;
     }
 
     vaddr = memory_region_get_ram_ptr(section->mr) +
@@ -764,6 +844,10 @@ static void vfio_listener_region_del(MemoryListener *listener,
         vfio_unregister_ram_discard_listener(bcontainer, section);
         /* Unregistering will trigger an unmap. */
         try_unmap = false;
+    } else if (memory_region_has_private_shared_manager(section->mr)) {
+        vfio_unregister_private_shared_listener(bcontainer, section);
+        /* Unregistering will trigger an unmap. */
+        try_unmap = false;
     }
 
     if (try_unmap) {
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 749a3fd29d..ff5df925c2 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -135,6 +135,7 @@ static void vfio_container_instance_init(Object *obj)
     bcontainer->iova_ranges = NULL;
     QLIST_INIT(&bcontainer->giommu_list);
     QLIST_INIT(&bcontainer->vrdl_list);
+    QLIST_INIT(&bcontainer->vpsl_list);
 }
 
 static const TypeInfo types[] = {
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 4cff9943ab..8d7c0b1179 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -47,6 +47,7 @@ typedef struct VFIOContainerBase {
     bool dirty_pages_started; /* Protected by BQL */
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
+    QLIST_HEAD(, VFIOPrivateSharedListener) vpsl_list;
     QLIST_ENTRY(VFIOContainerBase) next;
     QLIST_HEAD(, VFIODevice) device_list;
     GList *iova_ranges;
@@ -71,6 +72,15 @@ typedef struct VFIORamDiscardListener {
     QLIST_ENTRY(VFIORamDiscardListener) next;
 } VFIORamDiscardListener;
 
+typedef struct VFIOPrivateSharedListener {
+    VFIOContainerBase *bcontainer;
+    MemoryRegion *mr;
+    hwaddr offset_within_address_space;
+    uint64_t granularity;
+    PrivateSharedListener listener;
+    QLIST_ENTRY(VFIOPrivateSharedListener) next;
+} VFIOPrivateSharedListener;
+
 int vfio_container_dma_map(VFIOContainerBase *bcontainer,
                            hwaddr iova, ram_addr_t size,
                            void *vaddr, bool readonly);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
                   ` (5 preceding siblings ...)
  2025-04-07  7:49 ` [PATCH v4 06/13] vfio: Add the support for PrivateSharedManager Interface Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-04-09  9:57   ` Alexey Kardashevskiy
                     ` (2 more replies)
  2025-04-07  7:49 ` [PATCH v4 08/13] ram-block-attribute: Introduce a callback to notify shared/private state changes Chenyi Qiang
                   ` (5 subsequent siblings)
  12 siblings, 3 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
discard") highlighted that subsystems like VFIO may disable RAM block
discard. However, guest_memfd relies on discard operations for page
conversion between private and shared memory, potentially leading to
stale IOMMU mapping issue when assigning hardware devices to
confidential VMs via shared memory. To address this, it is crucial to
ensure systems like VFIO refresh its IOMMU mappings.

PrivateSharedManager is introduced to manage private and shared states in
confidential VMs, similar to RamDiscardManager, which supports
coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
guest_memfd can facilitate the adjustment of VFIO mappings in response
to page conversion events.

Since guest_memfd is not an object, it cannot directly implement the
PrivateSharedManager interface. Implementing it in HostMemoryBackend is
not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
have a memory backend while others do not. Notably, virtual BIOS
RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
backend.

To manage RAMBlocks with guest_memfd, define a new object named
RamBlockAttribute to implement the RamDiscardManager interface. This
object stores guest_memfd information such as shared_bitmap, and handles
page conversion notification. The memory state is tracked at the host
page size granularity, as the minimum memory conversion size can be one
page per request. Additionally, VFIO expects the DMA mapping for a
specific iova to be mapped and unmapped with the same granularity.
Confidential VMs may perform partial conversions, such as conversions on
small regions within larger regions. To prevent invalid cases and until
cut_mapping operation support is available, all operations are performed
with 4K granularity.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Change the name from memory-attribute-manager to
      ram-block-attribute.
    - Implement the newly-introduced PrivateSharedManager instead of
      RamDiscardManager and change related commit message.
    - Define the new object in ramblock.h instead of adding a new file.

Changes in v3:
    - Some rename (bitmap_size->shared_bitmap_size,
      first_one/zero_bit->first_bit, etc.)
    - Change shared_bitmap_size from uint32_t to unsigned
    - Return mgr->mr->ram_block->page_size in get_block_size()
    - Move set_ram_discard_manager() up to avoid a g_free() in failure
      case.
    - Add const for the memory_attribute_manager_get_block_size()
    - Unify the ReplayRamPopulate and ReplayRamDiscard and related
      callback.

Changes in v2:
    - Rename the object name to MemoryAttributeManager
    - Rename the bitmap to shared_bitmap to make it more clear.
    - Remove block_size field and get it from a helper. In future, we
      can get the page_size from RAMBlock if necessary.
    - Remove the unncessary "struct" before GuestMemfdReplayData
    - Remove the unncessary g_free() for the bitmap
    - Add some error report when the callback failure for
      populated/discarded section.
    - Move the realize()/unrealize() definition to this patch.
---
 include/exec/ramblock.h      |  24 +++
 system/meson.build           |   1 +
 system/ram-block-attribute.c | 282 +++++++++++++++++++++++++++++++++++
 3 files changed, 307 insertions(+)
 create mode 100644 system/ram-block-attribute.c

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 0babd105c0..b8b5469db9 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -23,6 +23,10 @@
 #include "cpu-common.h"
 #include "qemu/rcu.h"
 #include "exec/ramlist.h"
+#include "system/hostmem.h"
+
+#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
+OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass, RAM_BLOCK_ATTRIBUTE)
 
 struct RAMBlock {
     struct rcu_head rcu;
@@ -90,5 +94,25 @@ struct RAMBlock {
      */
     ram_addr_t postcopy_length;
 };
+
+struct RamBlockAttribute {
+    Object parent;
+
+    MemoryRegion *mr;
+
+    /* 1-setting of the bit represents the memory is populated (shared) */
+    unsigned shared_bitmap_size;
+    unsigned long *shared_bitmap;
+
+    QLIST_HEAD(, PrivateSharedListener) psl_list;
+};
+
+struct RamBlockAttributeClass {
+    ObjectClass parent_class;
+};
+
+int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr);
+void ram_block_attribute_unrealize(RamBlockAttribute *attr);
+
 #endif
 #endif
diff --git a/system/meson.build b/system/meson.build
index 4952f4b2c7..50a5a64f1c 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -15,6 +15,7 @@ system_ss.add(files(
   'dirtylimit.c',
   'dma-helpers.c',
   'globals.c',
+  'ram-block-attribute.c',
   'memory_mapping.c',
   'qdev-monitor.c',
   'qtest.c',
diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
new file mode 100644
index 0000000000..283c03b354
--- /dev/null
+++ b/system/ram-block-attribute.c
@@ -0,0 +1,282 @@
+/*
+ * QEMU ram block attribute
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang <chenyi.qiang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "exec/ramblock.h"
+
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(RamBlockAttribute,
+                                   ram_block_attribute,
+                                   RAM_BLOCK_ATTRIBUTE,
+                                   OBJECT,
+                                   { TYPE_PRIVATE_SHARED_MANAGER },
+                                   { })
+
+static size_t ram_block_attribute_get_block_size(const RamBlockAttribute *attr)
+{
+    /*
+     * Because page conversion could be manipulated in the size of at least 4K or 4K aligned,
+     * Use the host page size as the granularity to track the memory attribute.
+     */
+    g_assert(attr && attr->mr && attr->mr->ram_block);
+    g_assert(attr->mr->ram_block->page_size == qemu_real_host_page_size());
+    return attr->mr->ram_block->page_size;
+}
+
+
+static bool ram_block_attribute_psm_is_shared(const GenericStateManager *gsm,
+                                              const MemoryRegionSection *section)
+{
+    const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    uint64_t first_bit = section->offset_within_region / block_size;
+    uint64_t last_bit = first_bit + int128_get64(section->size) / block_size - 1;
+    unsigned long first_discard_bit;
+
+    first_discard_bit = find_next_zero_bit(attr->shared_bitmap, last_bit + 1, first_bit);
+    return first_discard_bit > last_bit;
+}
+
+typedef int (*ram_block_attribute_section_cb)(MemoryRegionSection *s, void *arg);
+
+static int ram_block_attribute_notify_shared_cb(MemoryRegionSection *section, void *arg)
+{
+    StateChangeListener *scl = arg;
+
+    return scl->notify_to_state_set(scl, section);
+}
+
+static int ram_block_attribute_notify_private_cb(MemoryRegionSection *section, void *arg)
+{
+    StateChangeListener *scl = arg;
+
+    scl->notify_to_state_clear(scl, section);
+    return 0;
+}
+
+static int ram_block_attribute_for_each_shared_section(const RamBlockAttribute *attr,
+                                                       MemoryRegionSection *section,
+                                                       void *arg,
+                                                       ram_block_attribute_section_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    int ret = 0;
+
+    first_bit = section->offset_within_region / block_size;
+    first_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size, first_bit);
+
+    while (first_bit < attr->shared_bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_bit * block_size;
+        last_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
+                         strerror(-ret));
+            break;
+        }
+
+        first_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size,
+                                  last_bit + 2);
+    }
+
+    return ret;
+}
+
+static int ram_block_attribute_for_each_private_section(const RamBlockAttribute *attr,
+                                                        MemoryRegionSection *section,
+                                                        void *arg,
+                                                        ram_block_attribute_section_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    int ret = 0;
+
+    first_bit = section->offset_within_region / block_size;
+    first_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size,
+                                   first_bit);
+
+    while (first_bit < attr->shared_bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_bit * block_size;
+        last_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
+                         strerror(-ret));
+            break;
+        }
+
+        first_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size,
+                                       last_bit + 2);
+    }
+
+    return ret;
+}
+
+static uint64_t ram_block_attribute_psm_get_min_granularity(const GenericStateManager *gsm,
+                                                            const MemoryRegion *mr)
+{
+    const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+
+    g_assert(mr == attr->mr);
+    return ram_block_attribute_get_block_size(attr);
+}
+
+static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
+                                                      StateChangeListener *scl,
+                                                      MemoryRegionSection *section)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
+    int ret;
+
+    g_assert(section->mr == attr->mr);
+    scl->section = memory_region_section_new_copy(section);
+
+    QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
+
+    ret = ram_block_attribute_for_each_shared_section(attr, section, scl,
+                                                      ram_block_attribute_notify_shared_cb);
+    if (ret) {
+        error_report("%s: Failed to register RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+}
+
+static void ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm,
+                                                        StateChangeListener *scl)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
+    int ret;
+
+    g_assert(scl->section);
+    g_assert(scl->section->mr == attr->mr);
+
+    ret = ram_block_attribute_for_each_shared_section(attr, scl->section, scl,
+                                                      ram_block_attribute_notify_private_cb);
+    if (ret) {
+        error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+
+    memory_region_section_free_copy(scl->section);
+    scl->section = NULL;
+    QLIST_REMOVE(psl, next);
+}
+
+typedef struct RamBlockAttributeReplayData {
+    ReplayStateChange fn;
+    void *opaque;
+} RamBlockAttributeReplayData;
+
+static int ram_block_attribute_psm_replay_cb(MemoryRegionSection *section, void *arg)
+{
+    RamBlockAttributeReplayData *data = arg;
+
+    return data->fn(section, data->opaque);
+}
+
+static int ram_block_attribute_psm_replay_on_shared(const GenericStateManager *gsm,
+                                                    MemoryRegionSection *section,
+                                                    ReplayStateChange replay_fn,
+                                                    void *opaque)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+    RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == attr->mr);
+    return ram_block_attribute_for_each_shared_section(attr, section, &data,
+                                                       ram_block_attribute_psm_replay_cb);
+}
+
+static int ram_block_attribute_psm_replay_on_private(const GenericStateManager *gsm,
+                                                     MemoryRegionSection *section,
+                                                     ReplayStateChange replay_fn,
+                                                     void *opaque)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
+    RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == attr->mr);
+    return ram_block_attribute_for_each_private_section(attr, section, &data,
+                                                        ram_block_attribute_psm_replay_cb);
+}
+
+int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr)
+{
+    uint64_t shared_bitmap_size;
+    const int block_size  = qemu_real_host_page_size();
+    int ret;
+
+    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
+
+    attr->mr = mr;
+    ret = memory_region_set_generic_state_manager(mr, GENERIC_STATE_MANAGER(attr));
+    if (ret) {
+        return ret;
+    }
+    attr->shared_bitmap_size = shared_bitmap_size;
+    attr->shared_bitmap = bitmap_new(shared_bitmap_size);
+
+    return ret;
+}
+
+void ram_block_attribute_unrealize(RamBlockAttribute *attr)
+{
+    g_free(attr->shared_bitmap);
+    memory_region_set_generic_state_manager(attr->mr, NULL);
+}
+
+static void ram_block_attribute_init(Object *obj)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj);
+
+    QLIST_INIT(&attr->psl_list);
+}
+
+static void ram_block_attribute_finalize(Object *obj)
+{
+}
+
+static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
+{
+    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc);
+
+    gsmc->get_min_granularity = ram_block_attribute_psm_get_min_granularity;
+    gsmc->register_listener = ram_block_attribute_psm_register_listener;
+    gsmc->unregister_listener = ram_block_attribute_psm_unregister_listener;
+    gsmc->is_state_set = ram_block_attribute_psm_is_shared;
+    gsmc->replay_on_state_set = ram_block_attribute_psm_replay_on_shared;
+    gsmc->replay_on_state_clear = ram_block_attribute_psm_replay_on_private;
+}
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 08/13] ram-block-attribute: Introduce a callback to notify shared/private state changes
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
                   ` (6 preceding siblings ...)
  2025-04-07  7:49 ` [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-04-07  7:49 ` [PATCH v4 09/13] memory: Attach RamBlockAttribute to guest_memfd-backed RAMBlocks Chenyi Qiang
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

A new state_change() callback is introduced in PrivateSharedManageClass
to efficiently notify all registered PrivateSharedListeners, including
VFIO listeners, about memory conversion events in guest_memfd. The VFIO
listener can dynamically DMA map/unmap shared pages based on conversion
types:
- For conversions from shared to private, the VFIO system ensures the
  discarding of shared mapping from the IOMMU.
- For conversions from private to shared, it triggers the population of
  the shared mapping into the IOMMU.

Additionally, special conversion requests are handled as followed:
- If a conversion request is made for a page already in the desired
  state, the helper simply returns success.
- For requests involving a range partially in the desired state, only
  the necessary segments are converted, ensuring efficient compliance
  with the request. In this case, fallback to "1 block at a time"
  handling so that the range passed to the notify_to_private/shared() is
  always in the desired state.
- If a conversion request is declined by other systems, such as a
  failure from VFIO during notify_to_shared(), the helper rolls back the
  request to maintain consistency. As for notify_to_private() handling,
  failure in VFIO is unexpected, so no error check is performed.

Note that the bitmap status is updated before callbacks, allowing
listeners to handle memory based on the latest status.

Opportunistically, introduce a helper to trigger the state_change()
callback of the class.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Add the state_change() callback in PrivateSharedManagerClass
      instead of the RamBlockAttribute.

Changes in v3:
    - Move the bitmap update before notifier callbacks.
    - Call the notifier callbacks directly in notify_discard/populate()
      with the expectation that the request memory range is in the
      desired attribute.
    - For the case that only partial range in the desire status, handle
      the range with block_size granularity for ease of rollback
      (https://lore.kernel.org/qemu-devel/812768d7-a02d-4b29-95f3-fb7a125cf54e@redhat.com/)

Changes in v2:
    - Do the alignment changes due to the rename to MemoryAttributeManager
    - Move the state_change() helper definition in this patch.
---
 include/exec/memory.h        |   7 ++
 system/memory.c              |  10 ++
 system/ram-block-attribute.c | 191 +++++++++++++++++++++++++++++++++++
 3 files changed, 208 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 08f25e5e84..a61896251c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -780,6 +780,9 @@ struct PrivateSharedListener {
 struct PrivateSharedManagerClass {
     /* private */
     GenericStateManagerClass parent_class;
+
+    int (*state_change)(PrivateSharedManager *mgr, uint64_t offset, uint64_t size,
+                        bool to_private);
 };
 
 static inline void private_shared_listener_init(PrivateSharedListener *psl,
@@ -789,6 +792,10 @@ static inline void private_shared_listener_init(PrivateSharedListener *psl,
     state_change_listener_init(&psl->scl, populate_fn, discard_fn);
 }
 
+int private_shared_manager_state_change(PrivateSharedManager *mgr,
+                                        uint64_t offset, uint64_t size,
+                                        bool to_private);
+
 /**
  * memory_get_xlat_addr: Extract addresses from a TLB entry
  *
diff --git a/system/memory.c b/system/memory.c
index e6e944d9c0..2f6eaf6314 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2206,6 +2206,16 @@ void generic_state_manager_unregister_listener(GenericStateManager *gsm,
     gsmc->unregister_listener(gsm, scl);
 }
 
+int private_shared_manager_state_change(PrivateSharedManager *mgr,
+                                        uint64_t offset, uint64_t size,
+                                        bool to_private)
+{
+    PrivateSharedManagerClass *psmc = PRIVATE_SHARED_MANAGER_GET_CLASS(mgr);
+
+    g_assert(psmc->state_change);
+    return psmc->state_change(mgr, offset, size, to_private);
+}
+
 /* Called with rcu_read_lock held.  */
 bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
                           ram_addr_t *ram_addr, bool *read_only,
diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
index 283c03b354..06ed134cda 100644
--- a/system/ram-block-attribute.c
+++ b/system/ram-block-attribute.c
@@ -233,6 +233,195 @@ static int ram_block_attribute_psm_replay_on_private(const GenericStateManager *
                                                         ram_block_attribute_psm_replay_cb);
 }
 
+static bool ram_block_attribute_is_valid_range(RamBlockAttribute *attr,
+                                               uint64_t offset, uint64_t size)
+{
+    MemoryRegion *mr = attr->mr;
+
+    g_assert(mr);
+
+    uint64_t region_size = memory_region_size(mr);
+    int block_size = ram_block_attribute_get_block_size(attr);
+
+    if (!QEMU_IS_ALIGNED(offset, block_size)) {
+        return false;
+    }
+    if (offset + size < offset || !size) {
+        return false;
+    }
+    if (offset >= region_size || offset + size > region_size) {
+        return false;
+    }
+    return true;
+}
+
+static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr,
+                                                  uint64_t offset, uint64_t size)
+{
+    PrivateSharedListener *psl;
+
+    QLIST_FOREACH(psl, &attr->psl_list, next) {
+        StateChangeListener *scl = &psl->scl;
+        MemoryRegionSection tmp = *scl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+        scl->notify_to_state_clear(scl, &tmp);
+    }
+}
+
+static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr,
+                                                uint64_t offset, uint64_t size)
+{
+    PrivateSharedListener *psl, *psl2;
+    int ret = 0;
+
+    QLIST_FOREACH(psl, &attr->psl_list, next) {
+        StateChangeListener *scl = &psl->scl;
+        MemoryRegionSection tmp = *scl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+        ret = scl->notify_to_state_set(scl, &tmp);
+        if (ret) {
+            break;
+        }
+    }
+
+    if (ret) {
+        /* Notify all already-notified listeners. */
+        QLIST_FOREACH(psl2, &attr->psl_list, next) {
+            StateChangeListener *scl2 = &psl2->scl;
+            MemoryRegionSection tmp = *scl2->section;
+
+            if (psl == psl2) {
+                break;
+            }
+            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+                continue;
+            }
+            scl2->notify_to_state_clear(scl2, &tmp);
+        }
+    }
+    return ret;
+}
+
+static bool ram_block_attribute_is_range_shared(RamBlockAttribute *attr,
+                                                uint64_t offset, uint64_t size)
+{
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    const unsigned long first_bit = offset / block_size;
+    const unsigned long last_bit = first_bit + (size / block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
+    found_bit = find_next_zero_bit(attr->shared_bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+static bool ram_block_attribute_is_range_private(RamBlockAttribute *attr,
+                                                uint64_t offset, uint64_t size)
+{
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    const unsigned long first_bit = offset / block_size;
+    const unsigned long last_bit = first_bit + (size / block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
+    found_bit = find_next_bit(attr->shared_bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+static int ram_block_attribute_psm_state_change(PrivateSharedManager *mgr, uint64_t offset,
+                                                uint64_t size, bool to_private)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(mgr);
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    const unsigned long first_bit = offset / block_size;
+    const unsigned long nbits = size / block_size;
+    const uint64_t end = offset + size;
+    unsigned long bit;
+    uint64_t cur;
+    int ret = 0;
+
+    if (!ram_block_attribute_is_valid_range(attr, offset, size)) {
+        error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
+                     __func__, offset, size);
+        return -1;
+    }
+
+    if (to_private) {
+        if (ram_block_attribute_is_range_private(attr, offset, size)) {
+            /* Already private */
+        } else if (!ram_block_attribute_is_range_shared(attr, offset, size)) {
+            /* Unexpected mixture: process individual blocks */
+            for (cur = offset; cur < end; cur += block_size) {
+                bit = cur / block_size;
+                if (!test_bit(bit, attr->shared_bitmap)) {
+                    continue;
+                }
+                clear_bit(bit, attr->shared_bitmap);
+                ram_block_attribute_notify_to_private(attr, cur, block_size);
+            }
+        } else {
+            /* Completely shared */
+            bitmap_clear(attr->shared_bitmap, first_bit, nbits);
+            ram_block_attribute_notify_to_private(attr, offset, size);
+        }
+    } else {
+        if (ram_block_attribute_is_range_shared(attr, offset, size)) {
+            /* Already shared */
+        } else if (!ram_block_attribute_is_range_private(attr, offset, size)) {
+            /* Unexpected mixture: process individual blocks */
+            unsigned long *modified_bitmap = bitmap_new(nbits);
+
+            for (cur = offset; cur < end; cur += block_size) {
+                bit = cur / block_size;
+                if (test_bit(bit, attr->shared_bitmap)) {
+                    continue;
+                }
+                set_bit(bit, attr->shared_bitmap);
+                ret = ram_block_attribute_notify_to_shared(attr, cur, block_size);
+                if (!ret) {
+                    set_bit(bit - first_bit, modified_bitmap);
+                    continue;
+                }
+                clear_bit(bit, attr->shared_bitmap);
+                break;
+            }
+
+            if (ret) {
+                /*
+                 * Very unexpected: something went wrong. Revert to the old
+                 * state, marking only the blocks as private that we converted
+                 * to shared.
+                 */
+                for (cur = offset; cur < end; cur += block_size) {
+                    bit = cur / block_size;
+                    if (!test_bit(bit - first_bit, modified_bitmap)) {
+                        continue;
+                    }
+                    assert(test_bit(bit, attr->shared_bitmap));
+                    clear_bit(bit, attr->shared_bitmap);
+                    ram_block_attribute_notify_to_private(attr, cur, block_size);
+                }
+            }
+            g_free(modified_bitmap);
+        } else {
+            /* Complete private */
+            bitmap_set(attr->shared_bitmap, first_bit, nbits);
+            ret = ram_block_attribute_notify_to_shared(attr, offset, size);
+            if (ret) {
+                bitmap_clear(attr->shared_bitmap, first_bit, nbits);
+            }
+        }
+    }
+
+    return ret;
+}
+
 int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr)
 {
     uint64_t shared_bitmap_size;
@@ -272,6 +461,7 @@ static void ram_block_attribute_finalize(Object *obj)
 static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
 {
     GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc);
+    PrivateSharedManagerClass *psmc = PRIVATE_SHARED_MANAGER_CLASS(oc);
 
     gsmc->get_min_granularity = ram_block_attribute_psm_get_min_granularity;
     gsmc->register_listener = ram_block_attribute_psm_register_listener;
@@ -279,4 +469,5 @@ static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
     gsmc->is_state_set = ram_block_attribute_psm_is_shared;
     gsmc->replay_on_state_set = ram_block_attribute_psm_replay_on_shared;
     gsmc->replay_on_state_clear = ram_block_attribute_psm_replay_on_private;
+    psmc->state_change = ram_block_attribute_psm_state_change;
 }
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 09/13] memory: Attach RamBlockAttribute to guest_memfd-backed RAMBlocks
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
                   ` (7 preceding siblings ...)
  2025-04-07  7:49 ` [PATCH v4 08/13] ram-block-attribute: Introduce a callback to notify shared/private state changes Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-04-07  7:49 ` [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result Chenyi Qiang
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

A new field, ram_block_attribute, is introduced in RAMBlock to link to a
RamBlockAttribute object. This change centralizes all guest_memfd state
information (such as fd and shared_bitmap) within a RAMBlock,
simplifying management.

The realize()/unrealized() helpers are used to initialize/uninitialize
the RamBlockAttribute object. The object is registered/unregistered in
the target RAMBlock's MemoryRegion when creating guest_memfd.

Additionally, use the private_shared_manager_state_change() helper to
notify the registered PrivateSharedListener of these changes.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Remove the replay operations for attribute changes which will be
      handled in a listener in following patches.
    - Add some comment in the error path of realize() to remind the
      future development of the unified error path.

Changes in v3:
    - Use ram_discard_manager_reply_populated/discarded() to set the
      memory attribute and add the undo support if state_change()
      failed.
    - Didn't add Reviewed-by from Alexey due to the new changes in this
      commit.

Changes in v2:
    - Introduce a new field memory_attribute_manager in RAMBlock.
    - Move the state_change() handling during page conversion in this patch.
    - Undo what we did if it fails to set.
    - Change the order of close(guest_memfd) and memory_attribute_manager cleanup.
---
 accel/kvm/kvm-all.c     |  9 +++++++++
 include/exec/ramblock.h |  1 +
 system/physmem.c        | 16 ++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c1fea69d58..546b58b737 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3088,6 +3088,15 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);
 
+    ret = private_shared_manager_state_change(PRIVATE_SHARED_MANAGER(mr->gsm),
+                                              offset, size, to_private);
+    if (ret) {
+        error_report("Failed to notify the listener the state change of "
+                     "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
+                     start, size, to_private ? "private" : "shared");
+        goto out_unref;
+    }
+
     if (to_private) {
         if (rb->page_size != qemu_real_host_page_size()) {
             /*
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index b8b5469db9..78eb031819 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -46,6 +46,7 @@ struct RAMBlock {
     int fd;
     uint64_t fd_offset;
     int guest_memfd;
+    RamBlockAttribute *ram_block_attribute;
     size_t page_size;
     /* dirty bitmap used during migration */
     unsigned long *bmap;
diff --git a/system/physmem.c b/system/physmem.c
index c76503aea8..fb74321e10 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1885,6 +1885,20 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
             qemu_mutex_unlock_ramlist();
             goto out_free;
         }
+
+        new_block->ram_block_attribute = RAM_BLOCK_ATTRIBUTE(object_new(TYPE_RAM_BLOCK_ATTRIBUTE));
+        if (ram_block_attribute_realize(new_block->ram_block_attribute, new_block->mr)) {
+            error_setg(errp, "Failed to realize ram block attribute");
+            /*
+             * The error path could be unified if the rest of ram_block_add() ever
+             * develops a need to check for errors.
+             */
+            object_unref(OBJECT(new_block->ram_block_attribute));
+            close(new_block->guest_memfd);
+            ram_block_discard_require(false);
+            qemu_mutex_unlock_ramlist();
+            goto out_free;
+        }
     }
 
     ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
@@ -2138,6 +2152,8 @@ static void reclaim_ramblock(RAMBlock *block)
     }
 
     if (block->guest_memfd >= 0) {
+        ram_block_attribute_unrealize(block->ram_block_attribute);
+        object_unref(OBJECT(block->ram_block_attribute));
         close(block->guest_memfd);
         ram_block_discard_require(false);
     }
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
                   ` (8 preceding siblings ...)
  2025-04-07  7:49 ` [PATCH v4 09/13] memory: Attach RamBlockAttribute to guest_memfd-backed RAMBlocks Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-04-27  2:26   ` Chenyi Qiang
  2025-04-07  7:49 ` [PATCH v4 11/13] KVM: Introduce CVMPrivateSharedListener for attribute changes during page conversions Chenyi Qiang
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

So that the caller can check the result of NotifyStateClear() handler if
the operation fails.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Newly added.
---
 hw/vfio/common.c      | 18 ++++++++++--------
 include/exec/memory.h |  4 ++--
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 48468a12c3..6e49ae597d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -335,8 +335,8 @@ out:
     rcu_read_unlock();
 }
 
-static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer,
-                                                    MemoryRegionSection *section)
+static int vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer,
+                                                   MemoryRegionSection *section)
 {
     const hwaddr size = int128_get64(section->size);
     const hwaddr iova = section->offset_within_address_space;
@@ -348,24 +348,26 @@ static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontaine
         error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
                      strerror(-ret));
     }
+
+    return ret;
 }
 
-static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
-                                            MemoryRegionSection *section)
+static int vfio_ram_discard_notify_discard(StateChangeListener *scl,
+                                           MemoryRegionSection *section)
 {
     RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
-    vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section);
+    return vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section);
 }
 
-static void vfio_private_shared_notify_to_private(StateChangeListener *scl,
-                                                  MemoryRegionSection *section)
+static int vfio_private_shared_notify_to_private(StateChangeListener *scl,
+                                                 MemoryRegionSection *section)
 {
     PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
     VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener,
                                                    listener);
-    vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section);
+    return vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section);
 }
 
 static int vfio_state_change_notify_to_state_set(VFIOContainerBase *bcontainer,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index a61896251c..9472d9e9b4 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -523,8 +523,8 @@ typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
 typedef struct StateChangeListener StateChangeListener;
 typedef int (*NotifyStateSet)(StateChangeListener *scl,
                               MemoryRegionSection *section);
-typedef void (*NotifyStateClear)(StateChangeListener *scl,
-                                 MemoryRegionSection *section);
+typedef int (*NotifyStateClear)(StateChangeListener *scl,
+                                MemoryRegionSection *section);
 
 struct StateChangeListener {
     /*
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 11/13] KVM: Introduce CVMPrivateSharedListener for attribute changes during page conversions
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
                   ` (9 preceding siblings ...)
  2025-04-07  7:49 ` [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-05-09  9:03   ` Baolu Lu
  2025-04-07  7:49 ` [PATCH v4 12/13] ram-block-attribute: Add priority listener support for PrivateSharedListener Chenyi Qiang
  2025-04-07  7:49 ` [PATCH v4 13/13] RAMBlock: Make guest_memfd require coordinate discard Chenyi Qiang
  12 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

With the introduction of the RamBlockAttribute object to manage
RAMBlocks with guest_memfd and the implementation of
PrivateSharedManager interface to convey page conversion events, it is
more elegant to move attribute changes into a PrivateSharedListener.

The PrivateSharedListener is reigstered/unregistered for each memory
region section during kvm_region_add/del(), and listeners are stored in
a CVMPrivateSharedListener list for easy management. The listener
handler performs attribute changes upon receiving notifications from
private_shared_manager_state_change() calls. With this change, the
state changes operations in kvm_convert_memory() can be removed.

Note that after moving attribute changes into a listener, errors can be
returned in ram_block_attribute_notify_to_private() if attribute changes
fail in corner cases (e.g. -ENOMEM). Since there is currently no rollback
operation for the to_private case, an assert is used to prevent the
guest from continuing with a partially changed attribute state.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Newly added.
---
 accel/kvm/kvm-all.c                         | 73 ++++++++++++++++++---
 include/system/confidential-guest-support.h | 10 +++
 system/ram-block-attribute.c                | 17 ++++-
 target/i386/kvm/tdx.c                       |  1 +
 target/i386/sev.c                           |  1 +
 5 files changed, 90 insertions(+), 12 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 546b58b737..aec64d559b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -48,6 +48,7 @@
 #include "kvm-cpus.h"
 #include "system/dirtylimit.h"
 #include "qemu/range.h"
+#include "system/confidential-guest-support.h"
 
 #include "hw/boards.h"
 #include "system/stats.h"
@@ -1691,28 +1692,91 @@ static int kvm_dirty_ring_init(KVMState *s)
     return 0;
 }
 
+static int kvm_private_shared_notify(StateChangeListener *scl,
+                                     MemoryRegionSection *section,
+                                     bool to_private)
+{
+    hwaddr start = section->offset_within_address_space;
+    hwaddr size = section->size;
+
+    if (to_private) {
+        return kvm_set_memory_attributes_private(start, size);
+    } else {
+        return kvm_set_memory_attributes_shared(start, size);
+    }
+}
+
+static int kvm_private_shared_notify_to_shared(StateChangeListener *scl,
+                                               MemoryRegionSection *section)
+{
+    return kvm_private_shared_notify(scl, section, false);
+}
+
+static int kvm_private_shared_notify_to_private(StateChangeListener *scl,
+                                                MemoryRegionSection *section)
+{
+    return kvm_private_shared_notify(scl, section, true);
+}
+
 static void kvm_region_add(MemoryListener *listener,
                            MemoryRegionSection *section)
 {
     KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
+    ConfidentialGuestSupport *cgs = MACHINE(qdev_get_machine())->cgs;
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
     KVMMemoryUpdate *update;
+    CVMPrivateSharedListener *cpsl;
+    PrivateSharedListener *psl;
+
 
     update = g_new0(KVMMemoryUpdate, 1);
     update->section = *section;
 
     QSIMPLEQ_INSERT_TAIL(&kml->transaction_add, update, next);
+
+    if (!memory_region_has_guest_memfd(section->mr) || !gsm) {
+        return;
+    }
+
+    cpsl = g_new0(CVMPrivateSharedListener, 1);
+    cpsl->mr = section->mr;
+    cpsl->offset_within_address_space = section->offset_within_address_space;
+    cpsl->granularity = generic_state_manager_get_min_granularity(gsm, section->mr);
+    psl = &cpsl->listener;
+    QLIST_INSERT_HEAD(&cgs->cvm_private_shared_list, cpsl, next);
+    private_shared_listener_init(psl, kvm_private_shared_notify_to_shared,
+                                 kvm_private_shared_notify_to_private);
+    generic_state_manager_register_listener(gsm, &psl->scl, section);
 }
 
 static void kvm_region_del(MemoryListener *listener,
                            MemoryRegionSection *section)
 {
     KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
+    ConfidentialGuestSupport *cgs = MACHINE(qdev_get_machine())->cgs;
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
     KVMMemoryUpdate *update;
+    CVMPrivateSharedListener *cpsl;
+    PrivateSharedListener *psl;
 
     update = g_new0(KVMMemoryUpdate, 1);
     update->section = *section;
 
     QSIMPLEQ_INSERT_TAIL(&kml->transaction_del, update, next);
+    if (!memory_region_has_guest_memfd(section->mr) || !gsm) {
+        return;
+    }
+
+    QLIST_FOREACH(cpsl, &cgs->cvm_private_shared_list, next) {
+        if (cpsl->mr == section->mr &&
+            cpsl->offset_within_address_space == section->offset_within_address_space) {
+            psl = &cpsl->listener;
+            generic_state_manager_unregister_listener(gsm, &psl->scl);
+            QLIST_REMOVE(cpsl, next);
+            g_free(cpsl);
+            break;
+        }
+    }
 }
 
 static void kvm_region_commit(MemoryListener *listener)
@@ -3076,15 +3140,6 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
         goto out_unref;
     }
 
-    if (to_private) {
-        ret = kvm_set_memory_attributes_private(start, size);
-    } else {
-        ret = kvm_set_memory_attributes_shared(start, size);
-    }
-    if (ret) {
-        goto out_unref;
-    }
-
     addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);
 
diff --git a/include/system/confidential-guest-support.h b/include/system/confidential-guest-support.h
index b68c4bebbc..64f67db19a 100644
--- a/include/system/confidential-guest-support.h
+++ b/include/system/confidential-guest-support.h
@@ -23,12 +23,20 @@
 #endif
 
 #include "qom/object.h"
+#include "exec/memory.h"
 
 #define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support"
 OBJECT_DECLARE_TYPE(ConfidentialGuestSupport,
                     ConfidentialGuestSupportClass,
                     CONFIDENTIAL_GUEST_SUPPORT)
 
+typedef struct CVMPrivateSharedListener {
+    MemoryRegion *mr;
+    hwaddr offset_within_address_space;
+    uint64_t granularity;
+    PrivateSharedListener listener;
+    QLIST_ENTRY(CVMPrivateSharedListener) next;
+} CVMPrivateSharedListener;
 
 struct ConfidentialGuestSupport {
     Object parent;
@@ -38,6 +46,8 @@ struct ConfidentialGuestSupport {
      */
     bool require_guest_memfd;
 
+    QLIST_HEAD(, CVMPrivateSharedListener) cvm_private_shared_list;
+
     /*
      * ready: flag set by CGS initialization code once it's ready to
      *        start executing instructions in a potentially-secure
diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
index 06ed134cda..15c9aebd09 100644
--- a/system/ram-block-attribute.c
+++ b/system/ram-block-attribute.c
@@ -259,6 +259,7 @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr,
                                                   uint64_t offset, uint64_t size)
 {
     PrivateSharedListener *psl;
+    int ret;
 
     QLIST_FOREACH(psl, &attr->psl_list, next) {
         StateChangeListener *scl = &psl->scl;
@@ -267,7 +268,12 @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr,
         if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
-        scl->notify_to_state_clear(scl, &tmp);
+        /*
+         * No undo operation for the state_clear() callback failure at present.
+         * Expect the state_clear() callback always succeed.
+         */
+        ret = scl->notify_to_state_clear(scl, &tmp);
+        g_assert(!ret);
     }
 }
 
@@ -275,7 +281,7 @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr,
                                                 uint64_t offset, uint64_t size)
 {
     PrivateSharedListener *psl, *psl2;
-    int ret = 0;
+    int ret = 0, ret2 = 0;
 
     QLIST_FOREACH(psl, &attr->psl_list, next) {
         StateChangeListener *scl = &psl->scl;
@@ -302,7 +308,12 @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr,
             if (!memory_region_section_intersect_range(&tmp, offset, size)) {
                 continue;
             }
-            scl2->notify_to_state_clear(scl2, &tmp);
+            /*
+             * No undo operation for the state_clear() callback failure at present.
+             * Expect the state_clear() callback always succeed.
+             */
+            ret2 = scl2->notify_to_state_clear(scl2, &tmp);
+            g_assert(!ret2);
         }
     }
     return ret;
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index c906a76c4c..718385c8de 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -1179,6 +1179,7 @@ static void tdx_guest_init(Object *obj)
     qemu_mutex_init(&tdx->lock);
 
     cgs->require_guest_memfd = true;
+    QLIST_INIT(&cgs->cvm_private_shared_list);
     tdx->attributes = TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
 
     object_property_add_uint64_ptr(obj, "attributes", &tdx->attributes,
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 217b19ad7b..6647727a44 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -2432,6 +2432,7 @@ sev_snp_guest_instance_init(Object *obj)
     SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(obj);
 
     cgs->require_guest_memfd = true;
+    QLIST_INIT(&cgs->cvm_private_shared_list);
 
     /* default init/start/finish params for kvm */
     sev_snp_guest->kvm_start_conf.policy = DEFAULT_SEV_SNP_POLICY;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 12/13] ram-block-attribute: Add priority listener support for PrivateSharedListener
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
                   ` (10 preceding siblings ...)
  2025-04-07  7:49 ` [PATCH v4 11/13] KVM: Introduce CVMPrivateSharedListener for attribute changes during page conversions Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  2025-05-09  9:23   ` Baolu Lu
  2025-04-07  7:49 ` [PATCH v4 13/13] RAMBlock: Make guest_memfd require coordinate discard Chenyi Qiang
  12 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

In-place page conversion requires operations to follow a specific
sequence: unmap-before-conversion-to-private and
map-after-conversion-to-shared. Currently, both attribute changes and
VFIO DMA map/unmap operations are handled by PrivateSharedListeners,
they need to be invoked in a specific order.

For private to shared conversion:
- Change attribute to shared.
- VFIO populates the shared mappings into the IOMMU.
- Restore attribute if the operation fails.

For shared to private conversion:
- VFIO discards shared mapping from the IOMMU.
- Change attribute to private.

To faciliate this sequence, priority support is added to
PrivateSharedListener so that listeners are stored in a determined
order based on priority. A tail queue is used to store listeners,
allowing traversal in either direction.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Newly added.
---
 accel/kvm/kvm-all.c          |  3 ++-
 hw/vfio/common.c             |  3 ++-
 include/exec/memory.h        | 19 +++++++++++++++++--
 include/exec/ramblock.h      |  2 +-
 system/ram-block-attribute.c | 23 +++++++++++++++++------
 5 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index aec64d559b..879c61b391 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1745,7 +1745,8 @@ static void kvm_region_add(MemoryListener *listener,
     psl = &cpsl->listener;
     QLIST_INSERT_HEAD(&cgs->cvm_private_shared_list, cpsl, next);
     private_shared_listener_init(psl, kvm_private_shared_notify_to_shared,
-                                 kvm_private_shared_notify_to_private);
+                                 kvm_private_shared_notify_to_private,
+                                 PRIVATE_SHARED_LISTENER_PRIORITY_MIN);
     generic_state_manager_register_listener(gsm, &psl->scl, section);
 }
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6e49ae597d..a8aacae26c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -515,7 +515,8 @@ static void vfio_register_private_shared_listener(VFIOContainerBase *bcontainer,
 
     psl = &vpsl->listener;
     private_shared_listener_init(psl, vfio_private_shared_notify_to_shared,
-                                 vfio_private_shared_notify_to_private);
+                                 vfio_private_shared_notify_to_private,
+                                 PRIVATE_SHARED_LISTENER_PRIORITY_COMMON);
     generic_state_manager_register_listener(gsm, &psl->scl, section);
     QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next);
 }
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9472d9e9b4..3d06cc04a0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -770,11 +770,24 @@ struct RamDiscardManagerClass {
     GenericStateManagerClass parent_class;
 };
 
+#define PRIVATE_SHARED_LISTENER_PRIORITY_MIN       0
+#define PRIVATE_SHARED_LISTENER_PRIORITY_COMMON    10
+
 typedef struct PrivateSharedListener PrivateSharedListener;
 struct PrivateSharedListener {
     struct StateChangeListener scl;
 
-    QLIST_ENTRY(PrivateSharedListener) next;
+    /*
+     * @priority:
+     *
+     * Govern the order in which ram discard listeners are invoked. Lower priorities
+     * are invoked earlier.
+     * The listener priority can help to undo the effects of previous listeners in
+     * a reverse order in case of a failure callback.
+     */
+    int priority;
+
+    QTAILQ_ENTRY(PrivateSharedListener) next;
 };
 
 struct PrivateSharedManagerClass {
@@ -787,9 +800,11 @@ struct PrivateSharedManagerClass {
 
 static inline void private_shared_listener_init(PrivateSharedListener *psl,
                                                 NotifyStateSet populate_fn,
-                                                NotifyStateClear discard_fn)
+                                                NotifyStateClear discard_fn,
+                                                int priority)
 {
     state_change_listener_init(&psl->scl, populate_fn, discard_fn);
+    psl->priority = priority;
 }
 
 int private_shared_manager_state_change(PrivateSharedManager *mgr,
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 78eb031819..7a3dd709bb 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -105,7 +105,7 @@ struct RamBlockAttribute {
     unsigned shared_bitmap_size;
     unsigned long *shared_bitmap;
 
-    QLIST_HEAD(, PrivateSharedListener) psl_list;
+    QTAILQ_HEAD(, PrivateSharedListener) psl_list;
 };
 
 struct RamBlockAttributeClass {
diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
index 15c9aebd09..fd041148c7 100644
--- a/system/ram-block-attribute.c
+++ b/system/ram-block-attribute.c
@@ -158,12 +158,23 @@ static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
 {
     RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
     PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
+    PrivateSharedListener *other = NULL;
     int ret;
 
     g_assert(section->mr == attr->mr);
     scl->section = memory_region_section_new_copy(section);
 
-    QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
+    if (QTAILQ_EMPTY(&attr->psl_list) ||
+        psl->priority >= QTAILQ_LAST(&attr->psl_list)->priority) {
+        QTAILQ_INSERT_TAIL(&attr->psl_list, psl, next);
+    } else {
+        QTAILQ_FOREACH(other, &attr->psl_list, next) {
+            if (psl->priority < other->priority) {
+                break;
+            }
+        }
+        QTAILQ_INSERT_BEFORE(other, psl, next);
+    }
 
     ret = ram_block_attribute_for_each_shared_section(attr, section, scl,
                                                       ram_block_attribute_notify_shared_cb);
@@ -192,7 +203,7 @@ static void ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm
 
     memory_region_section_free_copy(scl->section);
     scl->section = NULL;
-    QLIST_REMOVE(psl, next);
+    QTAILQ_REMOVE(&attr->psl_list, psl, next);
 }
 
 typedef struct RamBlockAttributeReplayData {
@@ -261,7 +272,7 @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr,
     PrivateSharedListener *psl;
     int ret;
 
-    QLIST_FOREACH(psl, &attr->psl_list, next) {
+    QTAILQ_FOREACH_REVERSE(psl, &attr->psl_list, next) {
         StateChangeListener *scl = &psl->scl;
         MemoryRegionSection tmp = *scl->section;
 
@@ -283,7 +294,7 @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr,
     PrivateSharedListener *psl, *psl2;
     int ret = 0, ret2 = 0;
 
-    QLIST_FOREACH(psl, &attr->psl_list, next) {
+    QTAILQ_FOREACH(psl, &attr->psl_list, next) {
         StateChangeListener *scl = &psl->scl;
         MemoryRegionSection tmp = *scl->section;
 
@@ -298,7 +309,7 @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr,
 
     if (ret) {
         /* Notify all already-notified listeners. */
-        QLIST_FOREACH(psl2, &attr->psl_list, next) {
+        QTAILQ_FOREACH(psl2, &attr->psl_list, next) {
             StateChangeListener *scl2 = &psl2->scl;
             MemoryRegionSection tmp = *scl2->section;
 
@@ -462,7 +473,7 @@ static void ram_block_attribute_init(Object *obj)
 {
     RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj);
 
-    QLIST_INIT(&attr->psl_list);
+    QTAILQ_INIT(&attr->psl_list);
 }
 
 static void ram_block_attribute_finalize(Object *obj)
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 13/13] RAMBlock: Make guest_memfd require coordinate discard
  2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
                   ` (11 preceding siblings ...)
  2025-04-07  7:49 ` [PATCH v4 12/13] ram-block-attribute: Add priority listener support for PrivateSharedListener Chenyi Qiang
@ 2025-04-07  7:49 ` Chenyi Qiang
  12 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-07  7:49 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: Chenyi Qiang, qemu-devel, kvm, Williams Dan J, Peng Chao P,
	Gao Chao, Xu Yilun, Li Xiaoyao

As guest_memfd is now managed by ram_block_attribute with
PrivateSharedManager, only block uncoordinated discard.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Modify commit message (RamDiscardManager->PrivateSharedManager).

Changes in v3:
    - No change.

Changes in v2:
    - Change the ram_block_discard_require(false) to
      ram_block_coordinated_discard_require(false).
---
 system/physmem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index fb74321e10..5e72d2a544 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1871,7 +1871,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         assert(kvm_enabled());
         assert(new_block->guest_memfd < 0);
 
-        ret = ram_block_discard_require(true);
+        ret = ram_block_coordinated_discard_require(true);
         if (ret < 0) {
             error_setg_errno(errp, -ret,
                              "cannot set up private guest memory: discard currently blocked");
@@ -1895,7 +1895,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
              */
             object_unref(OBJECT(new_block->ram_block_attribute));
             close(new_block->guest_memfd);
-            ram_block_discard_require(false);
+            ram_block_coordinated_discard_require(false);
             qemu_mutex_unlock_ramlist();
             goto out_free;
         }
@@ -2155,7 +2155,7 @@ static void reclaim_ramblock(RAMBlock *block)
         ram_block_attribute_unrealize(block->ram_block_attribute);
         object_unref(OBJECT(block->ram_block_attribute));
         close(block->guest_memfd);
-        ram_block_discard_require(false);
+        ram_block_coordinated_discard_require(false);
     }
 
     g_free(block);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager() to return the result
  2025-04-07  7:49 ` [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager() to return the result Chenyi Qiang
@ 2025-04-07  9:53   ` Xiaoyao Li
  2025-04-08  0:50     ` Chenyi Qiang
  2025-04-09  5:35   ` Alexey Kardashevskiy
  1 sibling, 1 reply; 67+ messages in thread
From: Xiaoyao Li @ 2025-04-07  9:53 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun

On 4/7/2025 3:49 PM, Chenyi Qiang wrote:
> Modify memory_region_set_ram_discard_manager() to return false if a
> RamDiscardManager is already set in the MemoryRegion. 

It doesn't return false, but -EBUSY.

> The caller must
> handle this failure, such as having virtio-mem undo its actions and fail
> the realize() process. Opportunistically move the call earlier to avoid
> complex error handling.
> 
> This change is beneficial when introducing a new RamDiscardManager
> instance besides virtio-mem. After
> ram_block_coordinated_discard_require(true) unlocks all
> RamDiscardManager instances, only one instance is allowed to be set for
> a MemoryRegion at present.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v4:
>      - No change.
> 
> Changes in v3:
>      - Move set_ram_discard_manager() up to avoid a g_free()
>      - Clean up set_ram_discard_manager() definition
> 
> Changes in v2:
>      - newly added.
> ---
>   hw/virtio/virtio-mem.c | 29 ++++++++++++++++-------------
>   include/exec/memory.h  |  6 +++---
>   system/memory.c        | 10 +++++++---
>   3 files changed, 26 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 21f16e4912..d0d3a0240f 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -1049,6 +1049,17 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>           return;
>       }
>   
> +    /*
> +     * Set ourselves as RamDiscardManager before the plug handler maps the
> +     * memory region and exposes it via an address space.
> +     */
> +    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
> +                                              RAM_DISCARD_MANAGER(vmem))) {
> +        error_setg(errp, "Failed to set RamDiscardManager");
> +        ram_block_coordinated_discard_require(false);
> +        return;
> +    }
> +
>       /*
>        * We don't know at this point whether shared RAM is migrated using
>        * QEMU or migrated using the file content. "x-ignore-shared" will be
> @@ -1124,13 +1135,6 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>       vmem->system_reset = VIRTIO_MEM_SYSTEM_RESET(obj);
>       vmem->system_reset->vmem = vmem;
>       qemu_register_resettable(obj);
> -
> -    /*
> -     * Set ourselves as RamDiscardManager before the plug handler maps the
> -     * memory region and exposes it via an address space.
> -     */
> -    memory_region_set_ram_discard_manager(&vmem->memdev->mr,
> -                                          RAM_DISCARD_MANAGER(vmem));
>   }
>   
>   static void virtio_mem_device_unrealize(DeviceState *dev)
> @@ -1138,12 +1142,6 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
>       VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>       VirtIOMEM *vmem = VIRTIO_MEM(dev);
>   
> -    /*
> -     * The unplug handler unmapped the memory region, it cannot be
> -     * found via an address space anymore. Unset ourselves.
> -     */
> -    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
> -
>       qemu_unregister_resettable(OBJECT(vmem->system_reset));
>       object_unref(OBJECT(vmem->system_reset));
>   
> @@ -1156,6 +1154,11 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
>       virtio_del_queue(vdev, 0);
>       virtio_cleanup(vdev);
>       g_free(vmem->bitmap);
> +    /*
> +     * The unplug handler unmapped the memory region, it cannot be
> +     * found via an address space anymore. Unset ourselves.
> +     */
> +    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
>       ram_block_coordinated_discard_require(false);
>   }
>   
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 3bebc43d59..390477b588 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2487,13 +2487,13 @@ static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
>    *
>    * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
>    * that does not cover RAM, or a #MemoryRegion that already has a
> - * #RamDiscardManager assigned.
> + * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
>    *
>    * @mr: the #MemoryRegion
>    * @rdm: #RamDiscardManager to set
>    */
> -void memory_region_set_ram_discard_manager(MemoryRegion *mr,
> -                                           RamDiscardManager *rdm);
> +int memory_region_set_ram_discard_manager(MemoryRegion *mr,
> +                                          RamDiscardManager *rdm);
>   
>   /**
>    * memory_region_find: translate an address/size relative to a
> diff --git a/system/memory.c b/system/memory.c
> index b17b5538ff..62d6b410f0 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2115,12 +2115,16 @@ RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
>       return mr->rdm;
>   }
>   
> -void memory_region_set_ram_discard_manager(MemoryRegion *mr,
> -                                           RamDiscardManager *rdm)
> +int memory_region_set_ram_discard_manager(MemoryRegion *mr,
> +                                          RamDiscardManager *rdm)
>   {
>       g_assert(memory_region_is_ram(mr));
> -    g_assert(!rdm || !mr->rdm);
> +    if (mr->rdm && rdm) {
> +        return -EBUSY;
> +    }
> +
>       mr->rdm = rdm;
> +    return 0;
>   }
>   
>   uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager() to return the result
  2025-04-07  9:53   ` Xiaoyao Li
@ 2025-04-08  0:50     ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-08  0:50 UTC (permalink / raw)
  To: Xiaoyao Li, David Hildenbrand, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun



On 4/7/2025 5:53 PM, Xiaoyao Li wrote:
> On 4/7/2025 3:49 PM, Chenyi Qiang wrote:
>> Modify memory_region_set_ram_discard_manager() to return false if a
>> RamDiscardManager is already set in the MemoryRegion. 
> 
> It doesn't return false, but -EBUSY.

Nice catch! Forgot to modify this commit message.

> 
>> The caller must
>> handle this failure, such as having virtio-mem undo its actions and fail
>> the realize() process. Opportunistically move the call earlier to avoid
>> complex error handling.
>>
>> This change is beneficial when introducing a new RamDiscardManager
>> instance besides virtio-mem. After
>> ram_block_coordinated_discard_require(true) unlocks all
>> RamDiscardManager instances, only one instance is allowed to be set for
>> a MemoryRegion at present.
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>> Changes in v4:
>>      - No change.
>>
>> Changes in v3:
>>      - Move set_ram_discard_manager() up to avoid a g_free()
>>      - Clean up set_ram_discard_manager() definition
>>
>> Changes in v2:
>>      - newly added.
>> ---
>>   hw/virtio/virtio-mem.c | 29 ++++++++++++++++-------------
>>   include/exec/memory.h  |  6 +++---
>>   system/memory.c        | 10 +++++++---
>>   3 files changed, 26 insertions(+), 19 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>> index 21f16e4912..d0d3a0240f 100644
>> --- a/hw/virtio/virtio-mem.c
>> +++ b/hw/virtio/virtio-mem.c
>> @@ -1049,6 +1049,17 @@ static void
>> virtio_mem_device_realize(DeviceState *dev, Error **errp)
>>           return;
>>       }
>>   +    /*
>> +     * Set ourselves as RamDiscardManager before the plug handler
>> maps the
>> +     * memory region and exposes it via an address space.
>> +     */
>> +    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
>> +                                             
>> RAM_DISCARD_MANAGER(vmem))) {
>> +        error_setg(errp, "Failed to set RamDiscardManager");
>> +        ram_block_coordinated_discard_require(false);
>> +        return;
>> +    }
>> +
>>       /*
>>        * We don't know at this point whether shared RAM is migrated using
>>        * QEMU or migrated using the file content. "x-ignore-shared"
>> will be
>> @@ -1124,13 +1135,6 @@ static void
>> virtio_mem_device_realize(DeviceState *dev, Error **errp)
>>       vmem->system_reset = VIRTIO_MEM_SYSTEM_RESET(obj);
>>       vmem->system_reset->vmem = vmem;
>>       qemu_register_resettable(obj);
>> -
>> -    /*
>> -     * Set ourselves as RamDiscardManager before the plug handler
>> maps the
>> -     * memory region and exposes it via an address space.
>> -     */
>> -    memory_region_set_ram_discard_manager(&vmem->memdev->mr,
>> -                                          RAM_DISCARD_MANAGER(vmem));
>>   }
>>     static void virtio_mem_device_unrealize(DeviceState *dev)
>> @@ -1138,12 +1142,6 @@ static void
>> virtio_mem_device_unrealize(DeviceState *dev)
>>       VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>       VirtIOMEM *vmem = VIRTIO_MEM(dev);
>>   -    /*
>> -     * The unplug handler unmapped the memory region, it cannot be
>> -     * found via an address space anymore. Unset ourselves.
>> -     */
>> -    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
>> -
>>       qemu_unregister_resettable(OBJECT(vmem->system_reset));
>>       object_unref(OBJECT(vmem->system_reset));
>>   @@ -1156,6 +1154,11 @@ static void
>> virtio_mem_device_unrealize(DeviceState *dev)
>>       virtio_del_queue(vdev, 0);
>>       virtio_cleanup(vdev);
>>       g_free(vmem->bitmap);
>> +    /*
>> +     * The unplug handler unmapped the memory region, it cannot be
>> +     * found via an address space anymore. Unset ourselves.
>> +     */
>> +    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
>>       ram_block_coordinated_discard_require(false);
>>   }
>>   diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 3bebc43d59..390477b588 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -2487,13 +2487,13 @@ static inline bool
>> memory_region_has_ram_discard_manager(MemoryRegion *mr)
>>    *
>>    * This function must not be called for a mapped #MemoryRegion, a
>> #MemoryRegion
>>    * that does not cover RAM, or a #MemoryRegion that already has a
>> - * #RamDiscardManager assigned.
>> + * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
>>    *
>>    * @mr: the #MemoryRegion
>>    * @rdm: #RamDiscardManager to set
>>    */
>> -void memory_region_set_ram_discard_manager(MemoryRegion *mr,
>> -                                           RamDiscardManager *rdm);
>> +int memory_region_set_ram_discard_manager(MemoryRegion *mr,
>> +                                          RamDiscardManager *rdm);
>>     /**
>>    * memory_region_find: translate an address/size relative to a
>> diff --git a/system/memory.c b/system/memory.c
>> index b17b5538ff..62d6b410f0 100644
>> --- a/system/memory.c
>> +++ b/system/memory.c
>> @@ -2115,12 +2115,16 @@ RamDiscardManager
>> *memory_region_get_ram_discard_manager(MemoryRegion *mr)
>>       return mr->rdm;
>>   }
>>   -void memory_region_set_ram_discard_manager(MemoryRegion *mr,
>> -                                           RamDiscardManager *rdm)
>> +int memory_region_set_ram_discard_manager(MemoryRegion *mr,
>> +                                          RamDiscardManager *rdm)
>>   {
>>       g_assert(memory_region_is_ram(mr));
>> -    g_assert(!rdm || !mr->rdm);
>> +    if (mr->rdm && rdm) {
>> +        return -EBUSY;
>> +    }
>> +
>>       mr->rdm = rdm;
>> +    return 0;
>>   }
>>     uint64_t ram_discard_manager_get_min_granularity(const
>> RamDiscardManager *rdm,
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range
  2025-04-07  7:49 ` [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range Chenyi Qiang
@ 2025-04-09  2:47   ` Alexey Kardashevskiy
  2025-04-09  6:26     ` Chenyi Qiang
  2025-05-12  3:24   ` Zhao Liu
  1 sibling, 1 reply; 67+ messages in thread
From: Alexey Kardashevskiy @ 2025-04-09  2:47 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao


On 7/4/25 17:49, Chenyi Qiang wrote:
> Rename the helper to memory_region_section_intersect_range() to make it
> more generic. Meanwhile, define the @end as Int128 and replace the
> related operations with Int128_* format since the helper is exported as
> a wider API.
> 
> Suggested-by: Alexey Kardashevskiy <aik@amd.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>

./scripts/checkpatch.pl complains "WARNING: line over 80 characters"

with that fixed,

Reviewed-by: Alexey Kardashevskiy <aik@amd.com>

> ---
> Changes in v4:
>      - No change.
> 
> Changes in v3:
>      - No change
> 
> Changes in v2:
>      - Make memory_region_section_intersect_range() an inline function.
>      - Add Reviewed-by from David
>      - Define the @end as Int128 and use the related Int128_* ops as a wilder
>        API (Alexey)
> ---
>   hw/virtio/virtio-mem.c | 32 +++++---------------------------
>   include/exec/memory.h  | 27 +++++++++++++++++++++++++++
>   2 files changed, 32 insertions(+), 27 deletions(-)
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index b1a003736b..21f16e4912 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -244,28 +244,6 @@ static int virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
>       return ret;
>   }
>   
> -/*
> - * Adjust the memory section to cover the intersection with the given range.
> - *
> - * Returns false if the intersection is empty, otherwise returns true.
> - */
> -static bool virtio_mem_intersect_memory_section(MemoryRegionSection *s,
> -                                                uint64_t offset, uint64_t size)
> -{
> -    uint64_t start = MAX(s->offset_within_region, offset);
> -    uint64_t end = MIN(s->offset_within_region + int128_get64(s->size),
> -                       offset + size);
> -
> -    if (end <= start) {
> -        return false;
> -    }
> -
> -    s->offset_within_address_space += start - s->offset_within_region;
> -    s->offset_within_region = start;
> -    s->size = int128_make64(end - start);
> -    return true;
> -}
> -
>   typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void *arg);
>   
>   static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
> @@ -287,7 +265,7 @@ static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
>                                         first_bit + 1) - 1;
>           size = (last_bit - first_bit + 1) * vmem->block_size;
>   
> -        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
> +        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
>               break;
>           }
>           ret = cb(&tmp, arg);
> @@ -319,7 +297,7 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
>                                    first_bit + 1) - 1;
>           size = (last_bit - first_bit + 1) * vmem->block_size;
>   
> -        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
> +        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
>               break;
>           }
>           ret = cb(&tmp, arg);
> @@ -355,7 +333,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
>       QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
>           MemoryRegionSection tmp = *rdl->section;
>   
> -        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
> +        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
>               continue;
>           }
>           rdl->notify_discard(rdl, &tmp);
> @@ -371,7 +349,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
>       QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
>           MemoryRegionSection tmp = *rdl->section;
>   
> -        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
> +        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
>               continue;
>           }
>           ret = rdl->notify_populate(rdl, &tmp);
> @@ -388,7 +366,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
>               if (rdl2 == rdl) {
>                   break;
>               }
> -            if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
> +            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
>                   continue;
>               }
>               rdl2->notify_discard(rdl2, &tmp);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 3ee1901b52..3bebc43d59 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1202,6 +1202,33 @@ MemoryRegionSection *memory_region_section_new_copy(MemoryRegionSection *s);
>    */
>   void memory_region_section_free_copy(MemoryRegionSection *s);
>   
> +/**
> + * memory_region_section_intersect_range: Adjust the memory section to cover
> + * the intersection with the given range.
> + *
> + * @s: the #MemoryRegionSection to be adjusted
> + * @offset: the offset of the given range in the memory region
> + * @size: the size of the given range
> + *
> + * Returns false if the intersection is empty, otherwise returns true.
> + */
> +static inline bool memory_region_section_intersect_range(MemoryRegionSection *s,
> +                                                         uint64_t offset, uint64_t size)
> +{
> +    uint64_t start = MAX(s->offset_within_region, offset);
> +    Int128 end = int128_min(int128_add(int128_make64(s->offset_within_region), s->size),
> +                            int128_add(int128_make64(offset), int128_make64(size)));
> +
> +    if (int128_le(end, int128_make64(start))) {
> +        return false;
> +    }
> +
> +    s->offset_within_address_space += start - s->offset_within_region;
> +    s->offset_within_region = start;
> +    s->size = int128_sub(end, int128_make64(start));
> +    return true;
> +}
> +
>   /**
>    * memory_region_init: Initialize a memory region
>    *

-- 
Alexey


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager() to return the result
  2025-04-07  7:49 ` [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager() to return the result Chenyi Qiang
  2025-04-07  9:53   ` Xiaoyao Li
@ 2025-04-09  5:35   ` Alexey Kardashevskiy
  2025-04-09  5:52     ` Chenyi Qiang
  1 sibling, 1 reply; 67+ messages in thread
From: Alexey Kardashevskiy @ 2025-04-09  5:35 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 7/4/25 17:49, Chenyi Qiang wrote:
> Modify memory_region_set_ram_discard_manager() to return false if a
> RamDiscardManager is already set in the MemoryRegion. The caller must
> handle this failure, such as having virtio-mem undo its actions and fail
> the realize() process. Opportunistically move the call earlier to avoid
> complex error handling.
> 
> This change is beneficial when introducing a new RamDiscardManager
> instance besides virtio-mem. After
> ram_block_coordinated_discard_require(true) unlocks all
> RamDiscardManager instances, only one instance is allowed to be set for
> a MemoryRegion at present.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v4:
>      - No change.
> 
> Changes in v3:
>      - Move set_ram_discard_manager() up to avoid a g_free()
>      - Clean up set_ram_discard_manager() definition
> 
> Changes in v2:
>      - newly added.
> ---
>   hw/virtio/virtio-mem.c | 29 ++++++++++++++++-------------
>   include/exec/memory.h  |  6 +++---
>   system/memory.c        | 10 +++++++---
>   3 files changed, 26 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 21f16e4912..d0d3a0240f 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -1049,6 +1049,17 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>           return;
>       }
>   
> +    /*
> +     * Set ourselves as RamDiscardManager before the plug handler maps the
> +     * memory region and exposes it via an address space.
> +     */
> +    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
> +                                              RAM_DISCARD_MANAGER(vmem))) {
> +        error_setg(errp, "Failed to set RamDiscardManager");
> +        ram_block_coordinated_discard_require(false);
> +        return;
> +    }
> +
>       /*
>        * We don't know at this point whether shared RAM is migrated using
>        * QEMU or migrated using the file content. "x-ignore-shared" will be

Right after the end of this comment block, do not you want 
memory_region_set_generic_state_manager(..., NULL)?


> @@ -1124,13 +1135,6 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>       vmem->system_reset = VIRTIO_MEM_SYSTEM_RESET(obj);
>       vmem->system_reset->vmem = vmem;
>       qemu_register_resettable(obj);
> -
> -    /*
> -     * Set ourselves as RamDiscardManager before the plug handler maps the
> -     * memory region and exposes it via an address space.
> -     */
> -    memory_region_set_ram_discard_manager(&vmem->memdev->mr,
> -                                          RAM_DISCARD_MANAGER(vmem));
>   }
>   
>   static void virtio_mem_device_unrealize(DeviceState *dev)
> @@ -1138,12 +1142,6 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
>       VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>       VirtIOMEM *vmem = VIRTIO_MEM(dev);
>   
> -    /*
> -     * The unplug handler unmapped the memory region, it cannot be
> -     * found via an address space anymore. Unset ourselves.
> -     */
> -    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
> -
>       qemu_unregister_resettable(OBJECT(vmem->system_reset));
>       object_unref(OBJECT(vmem->system_reset));
>   
> @@ -1156,6 +1154,11 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
>       virtio_del_queue(vdev, 0);
>       virtio_cleanup(vdev);
>       g_free(vmem->bitmap);
> +    /*
> +     * The unplug handler unmapped the memory region, it cannot be
> +     * found via an address space anymore. Unset ourselves.
> +     */
> +    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
>       ram_block_coordinated_discard_require(false);
>   }
>   
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 3bebc43d59..390477b588 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2487,13 +2487,13 @@ static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
>    *
>    * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
>    * that does not cover RAM, or a #MemoryRegion that already has a
> - * #RamDiscardManager assigned.
> + * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
>    *
>    * @mr: the #MemoryRegion
>    * @rdm: #RamDiscardManager to set
>    */
> -void memory_region_set_ram_discard_manager(MemoryRegion *mr,
> -                                           RamDiscardManager *rdm);
> +int memory_region_set_ram_discard_manager(MemoryRegion *mr,
> +                                          RamDiscardManager *rdm);
>   
>   /**
>    * memory_region_find: translate an address/size relative to a
> diff --git a/system/memory.c b/system/memory.c
> index b17b5538ff..62d6b410f0 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2115,12 +2115,16 @@ RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
>       return mr->rdm;
>   }
>   
> -void memory_region_set_ram_discard_manager(MemoryRegion *mr,
> -                                           RamDiscardManager *rdm)
> +int memory_region_set_ram_discard_manager(MemoryRegion *mr,
> +                                          RamDiscardManager *rdm)
>   {
>       g_assert(memory_region_is_ram(mr));
> -    g_assert(!rdm || !mr->rdm);
> +    if (mr->rdm && rdm) {
> +        return -EBUSY;
> +    }
> +
>       mr->rdm = rdm;
> +    return 0;

This is a change which can potentially break something, or currently 
there is no way to trigger -EBUSY? Thanks,


>   }
>   
>   uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,

-- 
Alexey


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 03/13] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard()
  2025-04-07  7:49 ` [PATCH v4 03/13] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard() Chenyi Qiang
@ 2025-04-09  5:43   ` Alexey Kardashevskiy
  2025-04-09  6:56     ` Chenyi Qiang
  2025-04-25 12:44     ` David Hildenbrand
  2025-04-25 12:42   ` David Hildenbrand
  1 sibling, 2 replies; 67+ messages in thread
From: Alexey Kardashevskiy @ 2025-04-09  5:43 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 7/4/25 17:49, Chenyi Qiang wrote:
> Update ReplayRamDiscard() function to return the result and unify the
> ReplayRamPopulate() and ReplayRamDiscard() to ReplayStateChange() at
> the same time due to their identical definitions. This unification
> simplifies related structures, such as VirtIOMEMReplayData, which makes
> it more cleaner and maintainable.
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v4:
>      - Modify the commit message. We won't use Replay() operation when
>        doing the attribute change like v3.
> 
> Changes in v3:
>      - Newly added.
> ---
>   hw/virtio/virtio-mem.c | 20 ++++++++++----------
>   include/exec/memory.h  | 31 ++++++++++++++++---------------
>   migration/ram.c        |  5 +++--
>   system/memory.c        | 12 ++++++------
>   4 files changed, 35 insertions(+), 33 deletions(-)
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index d0d3a0240f..1a88d649cb 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -1733,7 +1733,7 @@ static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
>   }
>   
>   struct VirtIOMEMReplayData {
> -    void *fn;
> +    ReplayStateChange fn;


s/ReplayStateChange/ReplayRamStateChange/

Just "State" is way too generic imho.


>       void *opaque;
>   };
>   
> @@ -1741,12 +1741,12 @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
>   {
>       struct VirtIOMEMReplayData *data = arg;
>   
> -    return ((ReplayRamPopulate)data->fn)(s, data->opaque);
> +    return data->fn(s, data->opaque);
>   }
>   
>   static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
>                                              MemoryRegionSection *s,
> -                                           ReplayRamPopulate replay_fn,
> +                                           ReplayStateChange replay_fn,
>                                              void *opaque)
>   {
>       const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
> @@ -1765,14 +1765,14 @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
>   {
>       struct VirtIOMEMReplayData *data = arg;
>   
> -    ((ReplayRamDiscard)data->fn)(s, data->opaque);
> +    data->fn(s, data->opaque);
>       return 0;

return data->fn(s, data->opaque); ?

Or a comment why we ignore the return result? Thanks,

>   }
>   
> -static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
> -                                            MemoryRegionSection *s,
> -                                            ReplayRamDiscard replay_fn,
> -                                            void *opaque)
> +static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
> +                                           MemoryRegionSection *s,
> +                                           ReplayStateChange replay_fn,
> +                                           void *opaque)
>   {
>       const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
>       struct VirtIOMEMReplayData data = {
> @@ -1781,8 +1781,8 @@ static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
>       };
>   
>       g_assert(s->mr == &vmem->memdev->mr);
> -    virtio_mem_for_each_unplugged_section(vmem, s, &data,
> -                                          virtio_mem_rdm_replay_discarded_cb);
> +    return virtio_mem_for_each_unplugged_section(vmem, s, &data,
> +                                                 virtio_mem_rdm_replay_discarded_cb);


a nit: "WARNING: line over 80 characters" - I have no idea what is the 
best thing to do here though.



>   }
>   
>   static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 390477b588..3b1d25a403 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -566,8 +566,7 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
>       rdl->double_discard_supported = double_discard_supported;
>   }
>   
> -typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);
> -typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque);
> +typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
>   
>   /*
>    * RamDiscardManagerClass:
> @@ -641,36 +640,38 @@ struct RamDiscardManagerClass {
>       /**
>        * @replay_populated:
>        *
> -     * Call the #ReplayRamPopulate callback for all populated parts within the
> +     * Call the #ReplayStateChange callback for all populated parts within the
>        * #MemoryRegionSection via the #RamDiscardManager.
>        *
>        * In case any call fails, no further calls are made.
>        *
>        * @rdm: the #RamDiscardManager
>        * @section: the #MemoryRegionSection
> -     * @replay_fn: the #ReplayRamPopulate callback
> +     * @replay_fn: the #ReplayStateChange callback
>        * @opaque: pointer to forward to the callback
>        *
>        * Returns 0 on success, or a negative error if any notification failed.
>        */
>       int (*replay_populated)(const RamDiscardManager *rdm,
>                               MemoryRegionSection *section,
> -                            ReplayRamPopulate replay_fn, void *opaque);
> +                            ReplayStateChange replay_fn, void *opaque);
>   
>       /**
>        * @replay_discarded:
>        *
> -     * Call the #ReplayRamDiscard callback for all discarded parts within the
> +     * Call the #ReplayStateChange callback for all discarded parts within the
>        * #MemoryRegionSection via the #RamDiscardManager.
>        *
>        * @rdm: the #RamDiscardManager
>        * @section: the #MemoryRegionSection
> -     * @replay_fn: the #ReplayRamDiscard callback
> +     * @replay_fn: the #ReplayStateChange callback
>        * @opaque: pointer to forward to the callback
> +     *
> +     * Returns 0 on success, or a negative error if any notification failed.
>        */
> -    void (*replay_discarded)(const RamDiscardManager *rdm,
> -                             MemoryRegionSection *section,
> -                             ReplayRamDiscard replay_fn, void *opaque);
> +    int (*replay_discarded)(const RamDiscardManager *rdm,
> +                            MemoryRegionSection *section,
> +                            ReplayStateChange replay_fn, void *opaque);
>   
>       /**
>        * @register_listener:
> @@ -713,13 +714,13 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
>   
>   int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
>                                            MemoryRegionSection *section,
> -                                         ReplayRamPopulate replay_fn,
> +                                         ReplayStateChange replay_fn,
>                                            void *opaque);
>   
> -void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
> -                                          MemoryRegionSection *section,
> -                                          ReplayRamDiscard replay_fn,
> -                                          void *opaque);
> +int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
> +                                         MemoryRegionSection *section,
> +                                         ReplayStateChange replay_fn,
> +                                         void *opaque);
>   
>   void ram_discard_manager_register_listener(RamDiscardManager *rdm,
>                                              RamDiscardListener *rdl,
> diff --git a/migration/ram.c b/migration/ram.c
> index ce28328141..053730367b 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -816,8 +816,8 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>       return ret;
>   }
>   
> -static void dirty_bitmap_clear_section(MemoryRegionSection *section,
> -                                       void *opaque)
> +static int dirty_bitmap_clear_section(MemoryRegionSection *section,
> +                                      void *opaque)
>   {
>       const hwaddr offset = section->offset_within_region;
>       const hwaddr size = int128_get64(section->size);
> @@ -836,6 +836,7 @@ static void dirty_bitmap_clear_section(MemoryRegionSection *section,
>       }
>       *cleared_bits += bitmap_count_one_with_offset(rb->bmap, start, npages);
>       bitmap_clear(rb->bmap, start, npages);
> +    return 0;
>   }
>   
>   /*
> diff --git a/system/memory.c b/system/memory.c
> index 62d6b410f0..b5ab729e13 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2147,7 +2147,7 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
>   
>   int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
>                                            MemoryRegionSection *section,
> -                                         ReplayRamPopulate replay_fn,
> +                                         ReplayStateChange replay_fn,
>                                            void *opaque)
>   {
>       RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
> @@ -2156,15 +2156,15 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
>       return rdmc->replay_populated(rdm, section, replay_fn, opaque);
>   }
>   
> -void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
> -                                          MemoryRegionSection *section,
> -                                          ReplayRamDiscard replay_fn,
> -                                          void *opaque)
> +int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
> +                                         MemoryRegionSection *section,
> +                                         ReplayStateChange replay_fn,
> +                                         void *opaque)
>   {
>       RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
>   
>       g_assert(rdmc->replay_discarded);
> -    rdmc->replay_discarded(rdm, section, replay_fn, opaque);
> +    return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
>   }
>   
>   void ram_discard_manager_register_listener(RamDiscardManager *rdm,

-- 
Alexey


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager() to return the result
  2025-04-09  5:35   ` Alexey Kardashevskiy
@ 2025-04-09  5:52     ` Chenyi Qiang
  2025-04-25 12:35       ` David Hildenbrand
  0 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-09  5:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/9/2025 1:35 PM, Alexey Kardashevskiy wrote:
> 
> 
> On 7/4/25 17:49, Chenyi Qiang wrote:
>> Modify memory_region_set_ram_discard_manager() to return false if a
>> RamDiscardManager is already set in the MemoryRegion. The caller must
>> handle this failure, such as having virtio-mem undo its actions and fail
>> the realize() process. Opportunistically move the call earlier to avoid
>> complex error handling.
>>
>> This change is beneficial when introducing a new RamDiscardManager
>> instance besides virtio-mem. After
>> ram_block_coordinated_discard_require(true) unlocks all
>> RamDiscardManager instances, only one instance is allowed to be set for
>> a MemoryRegion at present.
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>> Changes in v4:
>>      - No change.
>>
>> Changes in v3:
>>      - Move set_ram_discard_manager() up to avoid a g_free()
>>      - Clean up set_ram_discard_manager() definition
>>
>> Changes in v2:
>>      - newly added.
>> ---
>>   hw/virtio/virtio-mem.c | 29 ++++++++++++++++-------------
>>   include/exec/memory.h  |  6 +++---
>>   system/memory.c        | 10 +++++++---
>>   3 files changed, 26 insertions(+), 19 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>> index 21f16e4912..d0d3a0240f 100644
>> --- a/hw/virtio/virtio-mem.c
>> +++ b/hw/virtio/virtio-mem.c
>> @@ -1049,6 +1049,17 @@ static void
>> virtio_mem_device_realize(DeviceState *dev, Error **errp)
>>           return;
>>       }
>>   +    /*
>> +     * Set ourselves as RamDiscardManager before the plug handler
>> maps the
>> +     * memory region and exposes it via an address space.
>> +     */
>> +    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
>> +                                             
>> RAM_DISCARD_MANAGER(vmem))) {
>> +        error_setg(errp, "Failed to set RamDiscardManager");
>> +        ram_block_coordinated_discard_require(false);
>> +        return;
>> +    }
>> +
>>       /*
>>        * We don't know at this point whether shared RAM is migrated using
>>        * QEMU or migrated using the file content. "x-ignore-shared"
>> will be
> 
> Right after the end of this comment block, do not you want
> memory_region_set_generic_state_manager(..., NULL)?

Nice catch! Miss to add memory_region_set_generic_state_manager(NULL)
for the ram_block_discard_range() failure case.

> 
> 
>> @@ -1124,13 +1135,6 @@ static void
>> virtio_mem_device_realize(DeviceState *dev, Error **errp)
>>       vmem->system_reset = VIRTIO_MEM_SYSTEM_RESET(obj);
>>       vmem->system_reset->vmem = vmem;
>>       qemu_register_resettable(obj);
>> -
>> -    /*
>> -     * Set ourselves as RamDiscardManager before the plug handler
>> maps the
>> -     * memory region and exposes it via an address space.
>> -     */
>> -    memory_region_set_ram_discard_manager(&vmem->memdev->mr,
>> -                                          RAM_DISCARD_MANAGER(vmem));
>>   }
>>     static void virtio_mem_device_unrealize(DeviceState *dev)
>> @@ -1138,12 +1142,6 @@ static void
>> virtio_mem_device_unrealize(DeviceState *dev)
>>       VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>       VirtIOMEM *vmem = VIRTIO_MEM(dev);
>>   -    /*
>> -     * The unplug handler unmapped the memory region, it cannot be
>> -     * found via an address space anymore. Unset ourselves.
>> -     */
>> -    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
>> -
>>       qemu_unregister_resettable(OBJECT(vmem->system_reset));
>>       object_unref(OBJECT(vmem->system_reset));
>>   @@ -1156,6 +1154,11 @@ static void
>> virtio_mem_device_unrealize(DeviceState *dev)
>>       virtio_del_queue(vdev, 0);
>>       virtio_cleanup(vdev);
>>       g_free(vmem->bitmap);
>> +    /*
>> +     * The unplug handler unmapped the memory region, it cannot be
>> +     * found via an address space anymore. Unset ourselves.
>> +     */
>> +    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
>>       ram_block_coordinated_discard_require(false);
>>   }
>>   diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 3bebc43d59..390477b588 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -2487,13 +2487,13 @@ static inline bool
>> memory_region_has_ram_discard_manager(MemoryRegion *mr)
>>    *
>>    * This function must not be called for a mapped #MemoryRegion, a
>> #MemoryRegion
>>    * that does not cover RAM, or a #MemoryRegion that already has a
>> - * #RamDiscardManager assigned.
>> + * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
>>    *
>>    * @mr: the #MemoryRegion
>>    * @rdm: #RamDiscardManager to set
>>    */
>> -void memory_region_set_ram_discard_manager(MemoryRegion *mr,
>> -                                           RamDiscardManager *rdm);
>> +int memory_region_set_ram_discard_manager(MemoryRegion *mr,
>> +                                          RamDiscardManager *rdm);
>>     /**
>>    * memory_region_find: translate an address/size relative to a
>> diff --git a/system/memory.c b/system/memory.c
>> index b17b5538ff..62d6b410f0 100644
>> --- a/system/memory.c
>> +++ b/system/memory.c
>> @@ -2115,12 +2115,16 @@ RamDiscardManager
>> *memory_region_get_ram_discard_manager(MemoryRegion *mr)
>>       return mr->rdm;
>>   }
>>   -void memory_region_set_ram_discard_manager(MemoryRegion *mr,
>> -                                           RamDiscardManager *rdm)
>> +int memory_region_set_ram_discard_manager(MemoryRegion *mr,
>> +                                          RamDiscardManager *rdm)
>>   {
>>       g_assert(memory_region_is_ram(mr));
>> -    g_assert(!rdm || !mr->rdm);
>> +    if (mr->rdm && rdm) {
>> +        return -EBUSY;
>> +    }
>> +
>>       mr->rdm = rdm;
>> +    return 0;
> 
> This is a change which can potentially break something, or currently
> there is no way to trigger -EBUSY? Thanks,

Before this series, virtio-mem is the only user to
set_ram_discard_manager(), there's no way to trigger -EBUSY. With this
series, guest_memfd-backed RAMBlocks become the second user. It can be
triggered if we try to use virtio-mem in confidential VMs.

> 
> 
>>   }
>>     uint64_t ram_discard_manager_get_min_granularity(const
>> RamDiscardManager *rdm,
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range
  2025-04-09  2:47   ` Alexey Kardashevskiy
@ 2025-04-09  6:26     ` Chenyi Qiang
  2025-04-09  6:45       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-09  6:26 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/9/2025 10:47 AM, Alexey Kardashevskiy wrote:
> 
> On 7/4/25 17:49, Chenyi Qiang wrote:
>> Rename the helper to memory_region_section_intersect_range() to make it
>> more generic. Meanwhile, define the @end as Int128 and replace the
>> related operations with Int128_* format since the helper is exported as
>> a wider API.
>>
>> Suggested-by: Alexey Kardashevskiy <aik@amd.com>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> 
> ./scripts/checkpatch.pl complains "WARNING: line over 80 characters"
> 
> with that fixed,

I observed many places in QEMU ignore the WARNING for over 80
characters, so I also ignored them in my series.

After checking the rule in docs/devel/style.rst, I think I should try
best to make it not longer than 80. But if it is hard to do so due to
long function or symbol names, it is acceptable to not wrap it.

Then, I would modify the first warning code. For the latter two
warnings, see code below

> 
> Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
> 
>> ---
>> Changes in v4:
>>      - No change.
>>
>> Changes in v3:
>>      - No change
>>
>> Changes in v2:
>>      - Make memory_region_section_intersect_range() an inline function.
>>      - Add Reviewed-by from David
>>      - Define the @end as Int128 and use the related Int128_* ops as a
>> wilder
>>        API (Alexey)
>> ---
>>   hw/virtio/virtio-mem.c | 32 +++++---------------------------
>>   include/exec/memory.h  | 27 +++++++++++++++++++++++++++
>>   2 files changed, 32 insertions(+), 27 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>> index b1a003736b..21f16e4912 100644
>> --- a/hw/virtio/virtio-mem.c
>> +++ b/hw/virtio/virtio-mem.c
>> @@ -244,28 +244,6 @@ static int
>> virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
>>       return ret;
>>   }
>>   -/*
>> - * Adjust the memory section to cover the intersection with the given
>> range.
>> - *
>> - * Returns false if the intersection is empty, otherwise returns true.
>> - */
>> -static bool virtio_mem_intersect_memory_section(MemoryRegionSection *s,
>> -                                                uint64_t offset,
>> uint64_t size)
>> -{
>> -    uint64_t start = MAX(s->offset_within_region, offset);
>> -    uint64_t end = MIN(s->offset_within_region + int128_get64(s->size),
>> -                       offset + size);
>> -
>> -    if (end <= start) {
>> -        return false;
>> -    }
>> -
>> -    s->offset_within_address_space += start - s->offset_within_region;
>> -    s->offset_within_region = start;
>> -    s->size = int128_make64(end - start);
>> -    return true;
>> -}
>> -
>>   typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void
>> *arg);
>>     static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
>> @@ -287,7 +265,7 @@ static int
>> virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
>>                                         first_bit + 1) - 1;
>>           size = (last_bit - first_bit + 1) * vmem->block_size;
>>   -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>> size)) {
>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>> size)) {
>>               break;
>>           }
>>           ret = cb(&tmp, arg);
>> @@ -319,7 +297,7 @@ static int
>> virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
>>                                    first_bit + 1) - 1;
>>           size = (last_bit - first_bit + 1) * vmem->block_size;
>>   -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>> size)) {
>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>> size)) {
>>               break;
>>           }
>>           ret = cb(&tmp, arg);
>> @@ -355,7 +333,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM
>> *vmem, uint64_t offset,
>>       QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
>>           MemoryRegionSection tmp = *rdl->section;
>>   -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>> size)) {
>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>> size)) {
>>               continue;
>>           }
>>           rdl->notify_discard(rdl, &tmp);
>> @@ -371,7 +349,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem,
>> uint64_t offset,
>>       QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
>>           MemoryRegionSection tmp = *rdl->section;
>>   -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>> size)) {
>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>> size)) {
>>               continue;
>>           }
>>           ret = rdl->notify_populate(rdl, &tmp);
>> @@ -388,7 +366,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem,
>> uint64_t offset,
>>               if (rdl2 == rdl) {
>>                   break;
>>               }
>> -            if (!virtio_mem_intersect_memory_section(&tmp, offset,
>> size)) {
>> +            if (!memory_region_section_intersect_range(&tmp, offset,
>> size)) {
>>                   continue;
>>               }
>>               rdl2->notify_discard(rdl2, &tmp);
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 3ee1901b52..3bebc43d59 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -1202,6 +1202,33 @@ MemoryRegionSection
>> *memory_region_section_new_copy(MemoryRegionSection *s);
>>    */
>>   void memory_region_section_free_copy(MemoryRegionSection *s);
>>   +/**
>> + * memory_region_section_intersect_range: Adjust the memory section
>> to cover
>> + * the intersection with the given range.
>> + *
>> + * @s: the #MemoryRegionSection to be adjusted
>> + * @offset: the offset of the given range in the memory region
>> + * @size: the size of the given range
>> + *
>> + * Returns false if the intersection is empty, otherwise returns true.
>> + */
>> +static inline bool
>> memory_region_section_intersect_range(MemoryRegionSection *s,
>> +                                                         uint64_t
>> offset, uint64_t size)
>> +{
>> +    uint64_t start = MAX(s->offset_within_region, offset);
>> +    Int128 end = int128_min(int128_add(int128_make64(s-
>> >offset_within_region), s->size),
>> +                            int128_add(int128_make64(offset),
>> int128_make64(size)));

The Int128_* format helper make the line over 80. I think it's better
not wrap it for readability.

>> +
>> +    if (int128_le(end, int128_make64(start))) {
>> +        return false;
>> +    }
>> +
>> +    s->offset_within_address_space += start - s->offset_within_region;
>> +    s->offset_within_region = start;
>> +    s->size = int128_sub(end, int128_make64(start));
>> +    return true;
>> +}
>> +
>>   /**
>>    * memory_region_init: Initialize a memory region
>>    *
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range
  2025-04-09  6:26     ` Chenyi Qiang
@ 2025-04-09  6:45       ` Alexey Kardashevskiy
  2025-04-09  7:38         ` Chenyi Qiang
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Kardashevskiy @ 2025-04-09  6:45 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 9/4/25 16:26, Chenyi Qiang wrote:
> 
> 
> On 4/9/2025 10:47 AM, Alexey Kardashevskiy wrote:
>>
>> On 7/4/25 17:49, Chenyi Qiang wrote:
>>> Rename the helper to memory_region_section_intersect_range() to make it
>>> more generic. Meanwhile, define the @end as Int128 and replace the
>>> related operations with Int128_* format since the helper is exported as
>>> a wider API.
>>>
>>> Suggested-by: Alexey Kardashevskiy <aik@amd.com>
>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>>
>> ./scripts/checkpatch.pl complains "WARNING: line over 80 characters"
>>
>> with that fixed,
> 
> I observed many places in QEMU ignore the WARNING for over 80
> characters, so I also ignored them in my series.
> 
> After checking the rule in docs/devel/style.rst, I think I should try
> best to make it not longer than 80. But if it is hard to do so due to
> long function or symbol names, it is acceptable to not wrap it.
 >
> Then, I would modify the first warning code. For the latter two
> warnings, see code below
> 
>>
>> Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
>>
>>> ---
>>> Changes in v4:
>>>       - No change.
>>>
>>> Changes in v3:
>>>       - No change
>>>
>>> Changes in v2:
>>>       - Make memory_region_section_intersect_range() an inline function.
>>>       - Add Reviewed-by from David
>>>       - Define the @end as Int128 and use the related Int128_* ops as a
>>> wilder
>>>         API (Alexey)
>>> ---
>>>    hw/virtio/virtio-mem.c | 32 +++++---------------------------
>>>    include/exec/memory.h  | 27 +++++++++++++++++++++++++++
>>>    2 files changed, 32 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>>> index b1a003736b..21f16e4912 100644
>>> --- a/hw/virtio/virtio-mem.c
>>> +++ b/hw/virtio/virtio-mem.c
>>> @@ -244,28 +244,6 @@ static int
>>> virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
>>>        return ret;
>>>    }
>>>    -/*
>>> - * Adjust the memory section to cover the intersection with the given
>>> range.
>>> - *
>>> - * Returns false if the intersection is empty, otherwise returns true.
>>> - */
>>> -static bool virtio_mem_intersect_memory_section(MemoryRegionSection *s,
>>> -                                                uint64_t offset,
>>> uint64_t size)
>>> -{
>>> -    uint64_t start = MAX(s->offset_within_region, offset);
>>> -    uint64_t end = MIN(s->offset_within_region + int128_get64(s->size),
>>> -                       offset + size);
>>> -
>>> -    if (end <= start) {
>>> -        return false;
>>> -    }
>>> -
>>> -    s->offset_within_address_space += start - s->offset_within_region;
>>> -    s->offset_within_region = start;
>>> -    s->size = int128_make64(end - start);
>>> -    return true;
>>> -}
>>> -
>>>    typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void
>>> *arg);
>>>      static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
>>> @@ -287,7 +265,7 @@ static int
>>> virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
>>>                                          first_bit + 1) - 1;
>>>            size = (last_bit - first_bit + 1) * vmem->block_size;
>>>    -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>>> size)) {
>>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>>> size)) {
>>>                break;
>>>            }
>>>            ret = cb(&tmp, arg);
>>> @@ -319,7 +297,7 @@ static int
>>> virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
>>>                                     first_bit + 1) - 1;
>>>            size = (last_bit - first_bit + 1) * vmem->block_size;
>>>    -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>>> size)) {
>>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>>> size)) {
>>>                break;
>>>            }
>>>            ret = cb(&tmp, arg);
>>> @@ -355,7 +333,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM
>>> *vmem, uint64_t offset,
>>>        QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
>>>            MemoryRegionSection tmp = *rdl->section;
>>>    -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>>> size)) {
>>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>>> size)) {
>>>                continue;
>>>            }
>>>            rdl->notify_discard(rdl, &tmp);
>>> @@ -371,7 +349,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem,
>>> uint64_t offset,
>>>        QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
>>>            MemoryRegionSection tmp = *rdl->section;
>>>    -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>>> size)) {
>>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>>> size)) {
>>>                continue;
>>>            }
>>>            ret = rdl->notify_populate(rdl, &tmp);
>>> @@ -388,7 +366,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem,
>>> uint64_t offset,
>>>                if (rdl2 == rdl) {
>>>                    break;
>>>                }
>>> -            if (!virtio_mem_intersect_memory_section(&tmp, offset,
>>> size)) {
>>> +            if (!memory_region_section_intersect_range(&tmp, offset,
>>> size)) {
>>>                    continue;
>>>                }
>>>                rdl2->notify_discard(rdl2, &tmp);
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index 3ee1901b52..3bebc43d59 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -1202,6 +1202,33 @@ MemoryRegionSection
>>> *memory_region_section_new_copy(MemoryRegionSection *s);
>>>     */
>>>    void memory_region_section_free_copy(MemoryRegionSection *s);
>>>    +/**
>>> + * memory_region_section_intersect_range: Adjust the memory section
>>> to cover
>>> + * the intersection with the given range.
>>> + *
>>> + * @s: the #MemoryRegionSection to be adjusted
>>> + * @offset: the offset of the given range in the memory region
>>> + * @size: the size of the given range
>>> + *
>>> + * Returns false if the intersection is empty, otherwise returns true.
>>> + */
>>> +static inline bool
>>> memory_region_section_intersect_range(MemoryRegionSection *s,
>>> +                                                         uint64_t
>>> offset, uint64_t size)
>>> +{
>>> +    uint64_t start = MAX(s->offset_within_region, offset);
>>> +    Int128 end = int128_min(int128_add(int128_make64(s-
>>>> offset_within_region), s->size),
>>> +                            int128_add(int128_make64(offset),
>>> int128_make64(size)));
> 
> The Int128_* format helper make the line over 80. I think it's better
> not wrap it for readability.

I'd just reduce indent to previous line + 4 spaces vs the current "under 
the opening bracket" rule which I dislike anyway :) Thanks,


>>> +
>>> +    if (int128_le(end, int128_make64(start))) {
>>> +        return false;
>>> +    }
>>> +
>>> +    s->offset_within_address_space += start - s->offset_within_region;
>>> +    s->offset_within_region = start;
>>> +    s->size = int128_sub(end, int128_make64(start));
>>> +    return true;
>>> +}
>>> +
>>>    /**
>>>     * memory_region_init: Initialize a memory region
>>>     *
>>
> 

-- 
Alexey


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 03/13] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard()
  2025-04-09  5:43   ` Alexey Kardashevskiy
@ 2025-04-09  6:56     ` Chenyi Qiang
  2025-04-25 12:44     ` David Hildenbrand
  1 sibling, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-09  6:56 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/9/2025 1:43 PM, Alexey Kardashevskiy wrote:
> 
> 
> On 7/4/25 17:49, Chenyi Qiang wrote:
>> Update ReplayRamDiscard() function to return the result and unify the
>> ReplayRamPopulate() and ReplayRamDiscard() to ReplayStateChange() at
>> the same time due to their identical definitions. This unification
>> simplifies related structures, such as VirtIOMEMReplayData, which makes
>> it more cleaner and maintainable.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>> Changes in v4:
>>      - Modify the commit message. We won't use Replay() operation when
>>        doing the attribute change like v3.
>>
>> Changes in v3:
>>      - Newly added.
>> ---
>>   hw/virtio/virtio-mem.c | 20 ++++++++++----------
>>   include/exec/memory.h  | 31 ++++++++++++++++---------------
>>   migration/ram.c        |  5 +++--
>>   system/memory.c        | 12 ++++++------
>>   4 files changed, 35 insertions(+), 33 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>> index d0d3a0240f..1a88d649cb 100644
>> --- a/hw/virtio/virtio-mem.c
>> +++ b/hw/virtio/virtio-mem.c
>> @@ -1733,7 +1733,7 @@ static bool virtio_mem_rdm_is_populated(const
>> RamDiscardManager *rdm,
>>   }
>>     struct VirtIOMEMReplayData {
>> -    void *fn;
>> +    ReplayStateChange fn;
> 
> 
> s/ReplayStateChange/ReplayRamStateChange/
> 
> Just "State" is way too generic imho.

LGTM.

> 
> 
>>       void *opaque;
>>   };
>>   @@ -1741,12 +1741,12 @@ static int
>> virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
>>   {
>>       struct VirtIOMEMReplayData *data = arg;
>>   -    return ((ReplayRamPopulate)data->fn)(s, data->opaque);
>> +    return data->fn(s, data->opaque);
>>   }
>>     static int virtio_mem_rdm_replay_populated(const RamDiscardManager
>> *rdm,
>>                                              MemoryRegionSection *s,
>> -                                           ReplayRamPopulate replay_fn,
>> +                                           ReplayStateChange replay_fn,
>>                                              void *opaque)
>>   {
>>       const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
>> @@ -1765,14 +1765,14 @@ static int
>> virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
>>   {
>>       struct VirtIOMEMReplayData *data = arg;
>>   -    ((ReplayRamDiscard)data->fn)(s, data->opaque);
>> +    data->fn(s, data->opaque);
>>       return 0;
> 
> return data->fn(s, data->opaque); ?
> 
> Or a comment why we ignore the return result? Thanks,

You are right, since we have made ReplayRamDiscard() return the results,
it is doable to use "return data->fn(s, data->opaque)" directly. Thanks.

> 
>>   }
>>   -static void virtio_mem_rdm_replay_discarded(const RamDiscardManager
>> *rdm,
>> -                                            MemoryRegionSection *s,
>> -                                            ReplayRamDiscard replay_fn,
>> -                                            void *opaque)
>> +static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
>> +                                           MemoryRegionSection *s,
>> +                                           ReplayStateChange replay_fn,
>> +                                           void *opaque)
>>   {
>>       const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
>>       struct VirtIOMEMReplayData data = {
>> @@ -1781,8 +1781,8 @@ static void
>> virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
>>       };
>>         g_assert(s->mr == &vmem->memdev->mr);
>> -    virtio_mem_for_each_unplugged_section(vmem, s, &data,
>> -                                         
>> virtio_mem_rdm_replay_discarded_cb);
>> +    return virtio_mem_for_each_unplugged_section(vmem, s, &data,
>> +                                                
>> virtio_mem_rdm_replay_discarded_cb);
> 
> 
> a nit: "WARNING: line over 80 characters" - I have no idea what is the
> best thing to do here though.

It is not easy to adjust the line. Then I think we can keep it.

> 
> 
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range
  2025-04-09  6:45       ` Alexey Kardashevskiy
@ 2025-04-09  7:38         ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-09  7:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/9/2025 2:45 PM, Alexey Kardashevskiy wrote:
> 
> 
> On 9/4/25 16:26, Chenyi Qiang wrote:
>>
>>
>> On 4/9/2025 10:47 AM, Alexey Kardashevskiy wrote:
>>>
>>> On 7/4/25 17:49, Chenyi Qiang wrote:
>>>> Rename the helper to memory_region_section_intersect_range() to make it
>>>> more generic. Meanwhile, define the @end as Int128 and replace the
>>>> related operations with Int128_* format since the helper is exported as
>>>> a wider API.
>>>>
>>>> Suggested-by: Alexey Kardashevskiy <aik@amd.com>
>>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>>>
>>> ./scripts/checkpatch.pl complains "WARNING: line over 80 characters"
>>>
>>> with that fixed,
>>
>> I observed many places in QEMU ignore the WARNING for over 80
>> characters, so I also ignored them in my series.
>>
>> After checking the rule in docs/devel/style.rst, I think I should try
>> best to make it not longer than 80. But if it is hard to do so due to
>> long function or symbol names, it is acceptable to not wrap it.
>>
>> Then, I would modify the first warning code. For the latter two
>> warnings, see code below
>>
>>>
>>> Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
>>>
>>>> ---
>>>> Changes in v4:
>>>>       - No change.
>>>>
>>>> Changes in v3:
>>>>       - No change
>>>>
>>>> Changes in v2:
>>>>       - Make memory_region_section_intersect_range() an inline
>>>> function.
>>>>       - Add Reviewed-by from David
>>>>       - Define the @end as Int128 and use the related Int128_* ops as a
>>>> wilder
>>>>         API (Alexey)
>>>> ---
>>>>    hw/virtio/virtio-mem.c | 32 +++++---------------------------
>>>>    include/exec/memory.h  | 27 +++++++++++++++++++++++++++
>>>>    2 files changed, 32 insertions(+), 27 deletions(-)
>>>>
>>>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>>>> index b1a003736b..21f16e4912 100644
>>>> --- a/hw/virtio/virtio-mem.c
>>>> +++ b/hw/virtio/virtio-mem.c
>>>> @@ -244,28 +244,6 @@ static int
>>>> virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
>>>>        return ret;
>>>>    }
>>>>    -/*
>>>> - * Adjust the memory section to cover the intersection with the given
>>>> range.
>>>> - *
>>>> - * Returns false if the intersection is empty, otherwise returns true.
>>>> - */
>>>> -static bool virtio_mem_intersect_memory_section(MemoryRegionSection
>>>> *s,
>>>> -                                                uint64_t offset,
>>>> uint64_t size)
>>>> -{
>>>> -    uint64_t start = MAX(s->offset_within_region, offset);
>>>> -    uint64_t end = MIN(s->offset_within_region + int128_get64(s-
>>>> >size),
>>>> -                       offset + size);
>>>> -
>>>> -    if (end <= start) {
>>>> -        return false;
>>>> -    }
>>>> -
>>>> -    s->offset_within_address_space += start - s->offset_within_region;
>>>> -    s->offset_within_region = start;
>>>> -    s->size = int128_make64(end - start);
>>>> -    return true;
>>>> -}
>>>> -
>>>>    typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void
>>>> *arg);
>>>>      static int virtio_mem_for_each_plugged_section(const VirtIOMEM
>>>> *vmem,
>>>> @@ -287,7 +265,7 @@ static int
>>>> virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
>>>>                                          first_bit + 1) - 1;
>>>>            size = (last_bit - first_bit + 1) * vmem->block_size;
>>>>    -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>>>> size)) {
>>>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>>>> size)) {
>>>>                break;
>>>>            }
>>>>            ret = cb(&tmp, arg);
>>>> @@ -319,7 +297,7 @@ static int
>>>> virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
>>>>                                     first_bit + 1) - 1;
>>>>            size = (last_bit - first_bit + 1) * vmem->block_size;
>>>>    -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>>>> size)) {
>>>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>>>> size)) {
>>>>                break;
>>>>            }
>>>>            ret = cb(&tmp, arg);
>>>> @@ -355,7 +333,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM
>>>> *vmem, uint64_t offset,
>>>>        QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
>>>>            MemoryRegionSection tmp = *rdl->section;
>>>>    -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>>>> size)) {
>>>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>>>> size)) {
>>>>                continue;
>>>>            }
>>>>            rdl->notify_discard(rdl, &tmp);
>>>> @@ -371,7 +349,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem,
>>>> uint64_t offset,
>>>>        QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
>>>>            MemoryRegionSection tmp = *rdl->section;
>>>>    -        if (!virtio_mem_intersect_memory_section(&tmp, offset,
>>>> size)) {
>>>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>>>> size)) {
>>>>                continue;
>>>>            }
>>>>            ret = rdl->notify_populate(rdl, &tmp);
>>>> @@ -388,7 +366,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem,
>>>> uint64_t offset,
>>>>                if (rdl2 == rdl) {
>>>>                    break;
>>>>                }
>>>> -            if (!virtio_mem_intersect_memory_section(&tmp, offset,
>>>> size)) {
>>>> +            if (!memory_region_section_intersect_range(&tmp, offset,
>>>> size)) {
>>>>                    continue;
>>>>                }
>>>>                rdl2->notify_discard(rdl2, &tmp);
>>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>>> index 3ee1901b52..3bebc43d59 100644
>>>> --- a/include/exec/memory.h
>>>> +++ b/include/exec/memory.h
>>>> @@ -1202,6 +1202,33 @@ MemoryRegionSection
>>>> *memory_region_section_new_copy(MemoryRegionSection *s);
>>>>     */
>>>>    void memory_region_section_free_copy(MemoryRegionSection *s);
>>>>    +/**
>>>> + * memory_region_section_intersect_range: Adjust the memory section
>>>> to cover
>>>> + * the intersection with the given range.
>>>> + *
>>>> + * @s: the #MemoryRegionSection to be adjusted
>>>> + * @offset: the offset of the given range in the memory region
>>>> + * @size: the size of the given range
>>>> + *
>>>> + * Returns false if the intersection is empty, otherwise returns true.
>>>> + */
>>>> +static inline bool
>>>> memory_region_section_intersect_range(MemoryRegionSection *s,
>>>> +                                                         uint64_t
>>>> offset, uint64_t size)
>>>> +{
>>>> +    uint64_t start = MAX(s->offset_within_region, offset);
>>>> +    Int128 end = int128_min(int128_add(int128_make64(s-
>>>>> offset_within_region), s->size),

[..]

>>>> +                            int128_add(int128_make64(offset),
>>>> int128_make64(size)));
>>
>> The Int128_* format helper make the line over 80. I think it's better
>> not wrap it for readability.
> 
> I'd just reduce indent to previous line + 4 spaces vs the current "under
> the opening bracket" rule which I dislike anyway :) Thanks,

I can make the adjustment for this line. As for the previous line which
also reports a warning, do you think it needs to wrap?

> 
> 



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager
  2025-04-07  7:49 ` [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager Chenyi Qiang
@ 2025-04-09  9:56   ` Alexey Kardashevskiy
  2025-04-09 12:57     ` Chenyi Qiang
  2025-04-25 12:49     ` David Hildenbrand
  0 siblings, 2 replies; 67+ messages in thread
From: Alexey Kardashevskiy @ 2025-04-09  9:56 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 7/4/25 17:49, Chenyi Qiang wrote:
> RamDiscardManager is an interface used by virtio-mem to adjust VFIO
> mappings in relation to VM page assignment. It manages the state of
> populated and discard for the RAM. To accommodate future scnarios for
> managing RAM states, such as private and shared states in confidential
> VMs, the existing RamDiscardManager interface needs to be generalized.
> 
> Introduce a parent class, GenericStateManager, to manage a pair of

"GenericState" is the same as "State" really. Call it RamStateManager.



> opposite states with RamDiscardManager as its child. The changes include
> - Define a new abstract class GenericStateChange.
> - Extract six callbacks into GenericStateChangeClass and allow the child
>    classes to inherit them.
> - Modify RamDiscardManager-related helpers to use GenericStateManager
>    ones.
> - Define a generic StatChangeListener to extract fields from

"e" missing in StateChangeListener.

>    RamDiscardManager listener which allows future listeners to embed it
>    and avoid duplication.
> - Change the users of RamDiscardManager (virtio-mem, migration, etc.) to
>    switch to use GenericStateChange helpers.
> 
> It can provide a more flexible and resuable framework for RAM state
> management, facilitating future enhancements and use cases.

I fail to see how new interface helps with this. RamDiscardManager 
manipulates populated/discarded. It would make sense may be if the new 
class had more bits per page, say private/shared/discarded but it does 
not. And PrivateSharedManager cannot coexist with RamDiscard. imho this 
is going in a wrong direction.


> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v4:
>      - Newly added.
> ---
>   hw/vfio/common.c        |  30 ++--
>   hw/virtio/virtio-mem.c  |  95 ++++++------
>   include/exec/memory.h   | 313 ++++++++++++++++++++++------------------
>   migration/ram.c         |  16 +-
>   system/memory.c         | 106 ++++++++------
>   system/memory_mapping.c |   6 +-
>   6 files changed, 310 insertions(+), 256 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f7499a9b74..3172d877cc 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -335,9 +335,10 @@ out:
>       rcu_read_unlock();
>   }
>   
> -static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
> +static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
>                                               MemoryRegionSection *section)
>   {
> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
>       VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
>                                                   listener);
>       VFIOContainerBase *bcontainer = vrdl->bcontainer;
> @@ -353,9 +354,10 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
>       }
>   }
>   
> -static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
> +static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
>                                               MemoryRegionSection *section)
>   {
> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
>       VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
>                                                   listener);

VFIORamDiscardListener *vrdl = container_of(scl, VFIORamDiscardListener, 
listener.scl) and drop @ rdl? Thanks,


>       VFIOContainerBase *bcontainer = vrdl->bcontainer;
> @@ -381,7 +383,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>                                        vaddr, section->readonly);
>           if (ret) {
>               /* Rollback */
> -            vfio_ram_discard_notify_discard(rdl, section);
> +            vfio_ram_discard_notify_discard(scl, section);
>               return ret;
>           }
>       }
> @@ -391,8 +393,9 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>   static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
>                                                  MemoryRegionSection *section)
>   {
> -    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
> +    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
>       VFIORamDiscardListener *vrdl;
> +    RamDiscardListener *rdl;
>   
>       /* Ignore some corner cases not relevant in practice. */
>       g_assert(QEMU_IS_ALIGNED(section->offset_within_region, TARGET_PAGE_SIZE));
> @@ -405,17 +408,18 @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
>       vrdl->mr = section->mr;
>       vrdl->offset_within_address_space = section->offset_within_address_space;
>       vrdl->size = int128_get64(section->size);
> -    vrdl->granularity = ram_discard_manager_get_min_granularity(rdm,
> -                                                                section->mr);
> +    vrdl->granularity = generic_state_manager_get_min_granularity(gsm,
> +                                                                  section->mr);
>   
>       g_assert(vrdl->granularity && is_power_of_2(vrdl->granularity));
>       g_assert(bcontainer->pgsizes &&
>                vrdl->granularity >= 1ULL << ctz64(bcontainer->pgsizes));
>   
> -    ram_discard_listener_init(&vrdl->listener,
> +    rdl = &vrdl->listener;
> +    ram_discard_listener_init(rdl,
>                                 vfio_ram_discard_notify_populate,
>                                 vfio_ram_discard_notify_discard, true);
> -    ram_discard_manager_register_listener(rdm, &vrdl->listener, section);
> +    generic_state_manager_register_listener(gsm, &rdl->scl, section);
>       QLIST_INSERT_HEAD(&bcontainer->vrdl_list, vrdl, next);
>   
>       /*
> @@ -465,8 +469,9 @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
>   static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
>                                                    MemoryRegionSection *section)
>   {
> -    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
> +    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
>       VFIORamDiscardListener *vrdl = NULL;
> +    RamDiscardListener *rdl;
>   
>       QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
>           if (vrdl->mr == section->mr &&
> @@ -480,7 +485,8 @@ static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
>           hw_error("vfio: Trying to unregister missing RAM discard listener");
>       }
>   
> -    ram_discard_manager_unregister_listener(rdm, &vrdl->listener);
> +    rdl = &vrdl->listener;
> +    generic_state_manager_unregister_listener(gsm, &rdl->scl);
>       QLIST_REMOVE(vrdl, next);
>       g_free(vrdl);
>   }
> @@ -1265,7 +1271,7 @@ static int
>   vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
>                                               MemoryRegionSection *section)
>   {
> -    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
> +    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
>       VFIORamDiscardListener *vrdl = NULL;
>   
>       QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
> @@ -1284,7 +1290,7 @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
>        * We only want/can synchronize the bitmap for actually mapped parts -
>        * which correspond to populated parts. Replay all populated parts.
>        */
> -    return ram_discard_manager_replay_populated(rdm, section,
> +    return generic_state_manager_replay_on_state_set(gsm, section,
>                                                 vfio_ram_discard_get_dirty_bitmap,
>                                                   &vrdl);
>   }
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 1a88d649cb..40e8267254 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -312,16 +312,16 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
>   
>   static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg)
>   {
> -    RamDiscardListener *rdl = arg;
> +    StateChangeListener *scl = arg;
>   
> -    return rdl->notify_populate(rdl, s);
> +    return scl->notify_to_state_set(scl, s);
>   }
>   
>   static int virtio_mem_notify_discard_cb(MemoryRegionSection *s, void *arg)
>   {
> -    RamDiscardListener *rdl = arg;
> +    StateChangeListener *scl = arg;
>   
> -    rdl->notify_discard(rdl, s);
> +    scl->notify_to_state_clear(scl, s);
>       return 0;
>   }
>   
> @@ -331,12 +331,13 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
>       RamDiscardListener *rdl;
>   
>       QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
> -        MemoryRegionSection tmp = *rdl->section;
> +        StateChangeListener *scl = &rdl->scl;
> +        MemoryRegionSection tmp = *scl->section;
>   
>           if (!memory_region_section_intersect_range(&tmp, offset, size)) {
>               continue;
>           }
> -        rdl->notify_discard(rdl, &tmp);
> +        scl->notify_to_state_clear(scl, &tmp);
>       }
>   }
>   
> @@ -347,12 +348,13 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
>       int ret = 0;
>   
>       QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
> -        MemoryRegionSection tmp = *rdl->section;
> +        StateChangeListener *scl = &rdl->scl;
> +        MemoryRegionSection tmp = *scl->section;
>   
>           if (!memory_region_section_intersect_range(&tmp, offset, size)) {
>               continue;
>           }
> -        ret = rdl->notify_populate(rdl, &tmp);
> +        ret = scl->notify_to_state_set(scl, &tmp);
>           if (ret) {
>               break;
>           }
> @@ -361,7 +363,8 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
>       if (ret) {
>           /* Notify all already-notified listeners. */
>           QLIST_FOREACH(rdl2, &vmem->rdl_list, next) {
> -            MemoryRegionSection tmp = *rdl2->section;
> +            StateChangeListener *scl2 = &rdl2->scl;
> +            MemoryRegionSection tmp = *scl2->section;
>   
>               if (rdl2 == rdl) {
>                   break;
> @@ -369,7 +372,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
>               if (!memory_region_section_intersect_range(&tmp, offset, size)) {
>                   continue;
>               }
> -            rdl2->notify_discard(rdl2, &tmp);
> +            scl2->notify_to_state_clear(scl2, &tmp);
>           }
>       }
>       return ret;
> @@ -384,10 +387,11 @@ static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem)
>       }
>   
>       QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
> +        StateChangeListener *scl = &rdl->scl;
>           if (rdl->double_discard_supported) {
> -            rdl->notify_discard(rdl, rdl->section);
> +            scl->notify_to_state_clear(scl, scl->section);
>           } else {
> -            virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
> +            virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
>                                                   virtio_mem_notify_discard_cb);
>           }
>       }
> @@ -1053,8 +1057,8 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>        * Set ourselves as RamDiscardManager before the plug handler maps the
>        * memory region and exposes it via an address space.
>        */
> -    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
> -                                              RAM_DISCARD_MANAGER(vmem))) {
> +    if (memory_region_set_generic_state_manager(&vmem->memdev->mr,
> +                                                GENERIC_STATE_MANAGER(vmem))) {
>           error_setg(errp, "Failed to set RamDiscardManager");
>           ram_block_coordinated_discard_require(false);
>           return;
> @@ -1158,7 +1162,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
>        * The unplug handler unmapped the memory region, it cannot be
>        * found via an address space anymore. Unset ourselves.
>        */
> -    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
> +    memory_region_set_generic_state_manager(&vmem->memdev->mr, NULL);
>       ram_block_coordinated_discard_require(false);
>   }
>   
> @@ -1207,7 +1211,8 @@ static int virtio_mem_post_load_bitmap(VirtIOMEM *vmem)
>        * into an address space. Replay, now that we updated the bitmap.
>        */
>       QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
> -        ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
> +        StateChangeListener *scl = &rdl->scl;
> +        ret = virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
>                                                    virtio_mem_notify_populate_cb);
>           if (ret) {
>               return ret;
> @@ -1704,19 +1709,19 @@ static const Property virtio_mem_properties[] = {
>                        dynamic_memslots, false),
>   };
>   
> -static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardManager *rdm,
> +static uint64_t virtio_mem_rdm_get_min_granularity(const GenericStateManager *gsm,
>                                                      const MemoryRegion *mr)
>   {
> -    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
> +    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
>   
>       g_assert(mr == &vmem->memdev->mr);
>       return vmem->block_size;
>   }
>   
> -static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
> +static bool virtio_mem_rdm_is_populated(const GenericStateManager *gsm,
>                                           const MemoryRegionSection *s)
>   {
> -    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
> +    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
>       uint64_t start_gpa = vmem->addr + s->offset_within_region;
>       uint64_t end_gpa = start_gpa + int128_get64(s->size);
>   
> @@ -1744,12 +1749,12 @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
>       return data->fn(s, data->opaque);
>   }
>   
> -static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
> +static int virtio_mem_rdm_replay_populated(const GenericStateManager *gsm,
>                                              MemoryRegionSection *s,
>                                              ReplayStateChange replay_fn,
>                                              void *opaque)
>   {
> -    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
> +    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
>       struct VirtIOMEMReplayData data = {
>           .fn = replay_fn,
>           .opaque = opaque,
> @@ -1769,12 +1774,12 @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
>       return 0;
>   }
>   
> -static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
> +static int virtio_mem_rdm_replay_discarded(const GenericStateManager *gsm,
>                                              MemoryRegionSection *s,
>                                              ReplayStateChange replay_fn,
>                                              void *opaque)
>   {
> -    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
> +    const VirtIOMEM *vmem = VIRTIO_MEM(gsm);
>       struct VirtIOMEMReplayData data = {
>           .fn = replay_fn,
>           .opaque = opaque,
> @@ -1785,18 +1790,19 @@ static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
>                                                    virtio_mem_rdm_replay_discarded_cb);
>   }
>   
> -static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
> -                                             RamDiscardListener *rdl,
> +static void virtio_mem_rdm_register_listener(GenericStateManager *gsm,
> +                                             StateChangeListener *scl,
>                                                MemoryRegionSection *s)
>   {
> -    VirtIOMEM *vmem = VIRTIO_MEM(rdm);
> +    VirtIOMEM *vmem = VIRTIO_MEM(gsm);
> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
>       int ret;
>   
>       g_assert(s->mr == &vmem->memdev->mr);
> -    rdl->section = memory_region_section_new_copy(s);
> +    scl->section = memory_region_section_new_copy(s);
>   
>       QLIST_INSERT_HEAD(&vmem->rdl_list, rdl, next);
> -    ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
> +    ret = virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
>                                                 virtio_mem_notify_populate_cb);
>       if (ret) {
>           error_report("%s: Replaying plugged ranges failed: %s", __func__,
> @@ -1804,23 +1810,24 @@ static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
>       }
>   }
>   
> -static void virtio_mem_rdm_unregister_listener(RamDiscardManager *rdm,
> -                                               RamDiscardListener *rdl)
> +static void virtio_mem_rdm_unregister_listener(GenericStateManager *gsm,
> +                                               StateChangeListener *scl)
>   {
> -    VirtIOMEM *vmem = VIRTIO_MEM(rdm);
> +    VirtIOMEM *vmem = VIRTIO_MEM(gsm);
> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
>   
> -    g_assert(rdl->section->mr == &vmem->memdev->mr);
> +    g_assert(scl->section->mr == &vmem->memdev->mr);
>       if (vmem->size) {
>           if (rdl->double_discard_supported) {
> -            rdl->notify_discard(rdl, rdl->section);
> +            scl->notify_to_state_clear(scl, scl->section);
>           } else {
> -            virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
> +            virtio_mem_for_each_plugged_section(vmem, scl->section, scl,
>                                                   virtio_mem_notify_discard_cb);
>           }
>       }
>   
> -    memory_region_section_free_copy(rdl->section);
> -    rdl->section = NULL;
> +    memory_region_section_free_copy(scl->section);
> +    scl->section = NULL;
>       QLIST_REMOVE(rdl, next);
>   }
>   
> @@ -1853,7 +1860,7 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data)
>       DeviceClass *dc = DEVICE_CLASS(klass);
>       VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
>       VirtIOMEMClass *vmc = VIRTIO_MEM_CLASS(klass);
> -    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(klass);
> +    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(klass);
>   
>       device_class_set_props(dc, virtio_mem_properties);
>       dc->vmsd = &vmstate_virtio_mem;
> @@ -1874,12 +1881,12 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data)
>       vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier;
>       vmc->unplug_request_check = virtio_mem_unplug_request_check;
>   
> -    rdmc->get_min_granularity = virtio_mem_rdm_get_min_granularity;
> -    rdmc->is_populated = virtio_mem_rdm_is_populated;
> -    rdmc->replay_populated = virtio_mem_rdm_replay_populated;
> -    rdmc->replay_discarded = virtio_mem_rdm_replay_discarded;
> -    rdmc->register_listener = virtio_mem_rdm_register_listener;
> -    rdmc->unregister_listener = virtio_mem_rdm_unregister_listener;
> +    gsmc->get_min_granularity = virtio_mem_rdm_get_min_granularity;
> +    gsmc->is_state_set = virtio_mem_rdm_is_populated;
> +    gsmc->replay_on_state_set = virtio_mem_rdm_replay_populated;
> +    gsmc->replay_on_state_clear = virtio_mem_rdm_replay_discarded;
> +    gsmc->register_listener = virtio_mem_rdm_register_listener;
> +    gsmc->unregister_listener = virtio_mem_rdm_unregister_listener;
>   }
>   
>   static const TypeInfo virtio_mem_info = {
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 3b1d25a403..30e5838d02 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -43,6 +43,12 @@ typedef struct IOMMUMemoryRegionClass IOMMUMemoryRegionClass;
>   DECLARE_OBJ_CHECKERS(IOMMUMemoryRegion, IOMMUMemoryRegionClass,
>                        IOMMU_MEMORY_REGION, TYPE_IOMMU_MEMORY_REGION)
>   
> +#define TYPE_GENERIC_STATE_MANAGER "generic-state-manager"
> +typedef struct GenericStateManagerClass GenericStateManagerClass;
> +typedef struct GenericStateManager GenericStateManager;
> +DECLARE_OBJ_CHECKERS(GenericStateManager, GenericStateManagerClass,
> +                     GENERIC_STATE_MANAGER, TYPE_GENERIC_STATE_MANAGER)
> +
>   #define TYPE_RAM_DISCARD_MANAGER "ram-discard-manager"
>   typedef struct RamDiscardManagerClass RamDiscardManagerClass;
>   typedef struct RamDiscardManager RamDiscardManager;
> @@ -506,103 +512,59 @@ struct IOMMUMemoryRegionClass {
>       int (*num_indexes)(IOMMUMemoryRegion *iommu);
>   };
>   
> -typedef struct RamDiscardListener RamDiscardListener;
> -typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl,
> -                                 MemoryRegionSection *section);
> -typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl,
> +typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
> +
> +typedef struct StateChangeListener StateChangeListener;
> +typedef int (*NotifyStateSet)(StateChangeListener *scl,
> +                              MemoryRegionSection *section);
> +typedef void (*NotifyStateClear)(StateChangeListener *scl,
>                                    MemoryRegionSection *section);
>   
> -struct RamDiscardListener {
> +struct StateChangeListener {
>       /*
> -     * @notify_populate:
> +     * @notify_to_state_set:
>        *
> -     * Notification that previously discarded memory is about to get populated.
> -     * Listeners are able to object. If any listener objects, already
> -     * successfully notified listeners are notified about a discard again.
> +     * Notification that previously state clear part is about to be set.
>        *
> -     * @rdl: the #RamDiscardListener getting notified
> -     * @section: the #MemoryRegionSection to get populated. The section
> +     * @scl: the #StateChangeListener getting notified
> +     * @section: the #MemoryRegionSection to be state-set. The section
>        *           is aligned within the memory region to the minimum granularity
>        *           unless it would exceed the registered section.
>        *
>        * Returns 0 on success. If the notification is rejected by the listener,
>        * an error is returned.
>        */
> -    NotifyRamPopulate notify_populate;
> +    NotifyStateSet notify_to_state_set;
>   
>       /*
> -     * @notify_discard:
> +     * @notify_to_state_clear:
>        *
> -     * Notification that previously populated memory was discarded successfully
> -     * and listeners should drop all references to such memory and prevent
> -     * new population (e.g., unmap).
> +     * Notification that previously state set part is about to be cleared
>        *
> -     * @rdl: the #RamDiscardListener getting notified
> -     * @section: the #MemoryRegionSection to get populated. The section
> +     * @scl: the #StateChangeListener getting notified
> +     * @section: the #MemoryRegionSection to be state-cleared. The section
>        *           is aligned within the memory region to the minimum granularity
>        *           unless it would exceed the registered section.
> -     */
> -    NotifyRamDiscard notify_discard;
> -
> -    /*
> -     * @double_discard_supported:
>        *
> -     * The listener suppors getting @notify_discard notifications that span
> -     * already discarded parts.
> +     * Returns 0 on success. If the notification is rejected by the listener,
> +     * an error is returned.
>        */
> -    bool double_discard_supported;
> +    NotifyStateClear notify_to_state_clear;
>   
>       MemoryRegionSection *section;
> -    QLIST_ENTRY(RamDiscardListener) next;
>   };
>   
> -static inline void ram_discard_listener_init(RamDiscardListener *rdl,
> -                                             NotifyRamPopulate populate_fn,
> -                                             NotifyRamDiscard discard_fn,
> -                                             bool double_discard_supported)
> -{
> -    rdl->notify_populate = populate_fn;
> -    rdl->notify_discard = discard_fn;
> -    rdl->double_discard_supported = double_discard_supported;
> -}
> -
> -typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
> -
>   /*
> - * RamDiscardManagerClass:
> - *
> - * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion
> - * regions are currently populated to be used/accessed by the VM, notifying
> - * after parts were discarded (freeing up memory) and before parts will be
> - * populated (consuming memory), to be used/accessed by the VM.
> - *
> - * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the
> - * #MemoryRegion isn't mapped into an address space yet (either directly
> - * or via an alias); it cannot change while the #MemoryRegion is
> - * mapped into an address space.
> + * GenericStateManagerClass:
>    *
> - * The #RamDiscardManager is intended to be used by technologies that are
> - * incompatible with discarding of RAM (e.g., VFIO, which may pin all
> - * memory inside a #MemoryRegion), and require proper coordination to only
> - * map the currently populated parts, to hinder parts that are expected to
> - * remain discarded from silently getting populated and consuming memory.
> - * Technologies that support discarding of RAM don't have to bother and can
> - * simply map the whole #MemoryRegion.
> - *
> - * An example #RamDiscardManager is virtio-mem, which logically (un)plugs
> - * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
> - * Logically unplugging memory consists of discarding RAM. The VM agreed to not
> - * access unplugged (discarded) memory - especially via DMA. virtio-mem will
> - * properly coordinate with listeners before memory is plugged (populated),
> - * and after memory is unplugged (discarded).
> + * A #GenericStateManager is a common interface used to manage the state of
> + * a #MemoryRegion. The managed states is a pair of opposite states, such as
> + * populated and discarded, or private and shared. It is abstract as set and
> + * clear in below callbacks, and the actual state is managed by the
> + * implementation.
>    *
> - * Listeners are called in multiples of the minimum granularity (unless it
> - * would exceed the registered range) and changes are aligned to the minimum
> - * granularity within the #MemoryRegion. Listeners have to prepare for memory
> - * becoming discarded in a different granularity than it was populated and the
> - * other way around.
>    */
> -struct RamDiscardManagerClass {
> +struct GenericStateManagerClass {
>       /* private */
>       InterfaceClass parent_class;
>   
> @@ -612,122 +574,188 @@ struct RamDiscardManagerClass {
>        * @get_min_granularity:
>        *
>        * Get the minimum granularity in which listeners will get notified
> -     * about changes within the #MemoryRegion via the #RamDiscardManager.
> +     * about changes within the #MemoryRegion via the #GenericStateManager.
>        *
> -     * @rdm: the #RamDiscardManager
> +     * @gsm: the #GenericStateManager
>        * @mr: the #MemoryRegion
>        *
>        * Returns the minimum granularity.
>        */
> -    uint64_t (*get_min_granularity)(const RamDiscardManager *rdm,
> +    uint64_t (*get_min_granularity)(const GenericStateManager *gsm,
>                                       const MemoryRegion *mr);
>   
>       /**
> -     * @is_populated:
> +     * @is_state_set:
>        *
> -     * Check whether the given #MemoryRegionSection is completely populated
> -     * (i.e., no parts are currently discarded) via the #RamDiscardManager.
> -     * There are no alignment requirements.
> +     * Check whether the given #MemoryRegionSection state is set.
> +     * via the #GenericStateManager.
>        *
> -     * @rdm: the #RamDiscardManager
> +     * @gsm: the #GenericStateManager
>        * @section: the #MemoryRegionSection
>        *
> -     * Returns whether the given range is completely populated.
> +     * Returns whether the given range is completely set.
>        */
> -    bool (*is_populated)(const RamDiscardManager *rdm,
> +    bool (*is_state_set)(const GenericStateManager *gsm,
>                            const MemoryRegionSection *section);
>   
>       /**
> -     * @replay_populated:
> +     * @replay_on_state_set:
>        *
> -     * Call the #ReplayStateChange callback for all populated parts within the
> -     * #MemoryRegionSection via the #RamDiscardManager.
> +     * Call the #ReplayStateChange callback for all state set parts within the
> +     * #MemoryRegionSection via the #GenericStateManager.
>        *
>        * In case any call fails, no further calls are made.
>        *
> -     * @rdm: the #RamDiscardManager
> +     * @gsm: the #GenericStateManager
>        * @section: the #MemoryRegionSection
>        * @replay_fn: the #ReplayStateChange callback
>        * @opaque: pointer to forward to the callback
>        *
>        * Returns 0 on success, or a negative error if any notification failed.
>        */
> -    int (*replay_populated)(const RamDiscardManager *rdm,
> -                            MemoryRegionSection *section,
> -                            ReplayStateChange replay_fn, void *opaque);
> +    int (*replay_on_state_set)(const GenericStateManager *gsm,
> +                               MemoryRegionSection *section,
> +                               ReplayStateChange replay_fn, void *opaque);
>   
>       /**
> -     * @replay_discarded:
> +     * @replay_on_state_clear:
>        *
> -     * Call the #ReplayStateChange callback for all discarded parts within the
> -     * #MemoryRegionSection via the #RamDiscardManager.
> +     * Call the #ReplayStateChange callback for all state clear parts within the
> +     * #MemoryRegionSection via the #GenericStateManager.
> +     *
> +     * In case any call fails, no further calls are made.
>        *
> -     * @rdm: the #RamDiscardManager
> +     * @gsm: the #GenericStateManager
>        * @section: the #MemoryRegionSection
>        * @replay_fn: the #ReplayStateChange callback
>        * @opaque: pointer to forward to the callback
>        *
>        * Returns 0 on success, or a negative error if any notification failed.
>        */
> -    int (*replay_discarded)(const RamDiscardManager *rdm,
> -                            MemoryRegionSection *section,
> -                            ReplayStateChange replay_fn, void *opaque);
> +    int (*replay_on_state_clear)(const GenericStateManager *gsm,
> +                                 MemoryRegionSection *section,
> +                                 ReplayStateChange replay_fn, void *opaque);
>   
>       /**
>        * @register_listener:
>        *
> -     * Register a #RamDiscardListener for the given #MemoryRegionSection and
> -     * immediately notify the #RamDiscardListener about all populated parts
> -     * within the #MemoryRegionSection via the #RamDiscardManager.
> +     * Register a #StateChangeListener for the given #MemoryRegionSection and
> +     * immediately notify the #StateChangeListener about all state-set parts
> +     * within the #MemoryRegionSection via the #GenericStateManager.
>        *
>        * In case any notification fails, no further notifications are triggered
>        * and an error is logged.
>        *
> -     * @rdm: the #RamDiscardManager
> -     * @rdl: the #RamDiscardListener
> +     * @rdm: the #GenericStateManager
> +     * @rdl: the #StateChangeListener
>        * @section: the #MemoryRegionSection
>        */
> -    void (*register_listener)(RamDiscardManager *rdm,
> -                              RamDiscardListener *rdl,
> +    void (*register_listener)(GenericStateManager *gsm,
> +                              StateChangeListener *scl,
>                                 MemoryRegionSection *section);
>   
>       /**
>        * @unregister_listener:
>        *
> -     * Unregister a previously registered #RamDiscardListener via the
> -     * #RamDiscardManager after notifying the #RamDiscardListener about all
> -     * populated parts becoming unpopulated within the registered
> +     * Unregister a previously registered #StateChangeListener via the
> +     * #GenericStateManager after notifying the #StateChangeListener about all
> +     * state-set parts becoming state-cleared within the registered
>        * #MemoryRegionSection.
>        *
> -     * @rdm: the #RamDiscardManager
> -     * @rdl: the #RamDiscardListener
> +     * @rdm: the #GenericStateManager
> +     * @rdl: the #StateChangeListener
>        */
> -    void (*unregister_listener)(RamDiscardManager *rdm,
> -                                RamDiscardListener *rdl);
> +    void (*unregister_listener)(GenericStateManager *gsm,
> +                                StateChangeListener *scl);
>   };
>   
> -uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
> -                                                 const MemoryRegion *mr);
> +uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm,
> +                                                   const MemoryRegion *mr);
>   
> -bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
> -                                      const MemoryRegionSection *section);
> +bool generic_state_manager_is_state_set(const GenericStateManager *gsm,
> +                                        const MemoryRegionSection *section);
>   
> -int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
> -                                         MemoryRegionSection *section,
> -                                         ReplayStateChange replay_fn,
> -                                         void *opaque);
> +int generic_state_manager_replay_on_state_set(const GenericStateManager *gsm,
> +                                           MemoryRegionSection *section,
> +                                           ReplayStateChange replay_fn,
> +                                           void *opaque);
>   
> -int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
> -                                         MemoryRegionSection *section,
> -                                         ReplayStateChange replay_fn,
> -                                         void *opaque);
> +int generic_state_manager_replay_on_state_clear(const GenericStateManager *gsm,
> +                                                MemoryRegionSection *section,
> +                                                ReplayStateChange replay_fn,
> +                                                void *opaque);
>   
> -void ram_discard_manager_register_listener(RamDiscardManager *rdm,
> -                                           RamDiscardListener *rdl,
> -                                           MemoryRegionSection *section);
> +void generic_state_manager_register_listener(GenericStateManager *gsm,
> +                                             StateChangeListener *scl,
> +                                             MemoryRegionSection *section);
>   
> -void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> -                                             RamDiscardListener *rdl);
> +void generic_state_manager_unregister_listener(GenericStateManager *gsm,
> +                                               StateChangeListener *scl);
> +
> +typedef struct RamDiscardListener RamDiscardListener;
> +
> +struct RamDiscardListener {
> +    struct StateChangeListener scl;
> +
> +    /*
> +     * @double_discard_supported:
> +     *
> +     * The listener suppors getting @notify_discard notifications that span
> +     * already discarded parts.
> +     */
> +    bool double_discard_supported;
> +
> +    QLIST_ENTRY(RamDiscardListener) next;
> +};
> +
> +static inline void ram_discard_listener_init(RamDiscardListener *rdl,
> +                                             NotifyStateSet populate_fn,
> +                                             NotifyStateClear discard_fn,
> +                                             bool double_discard_supported)
> +{
> +    rdl->scl.notify_to_state_set = populate_fn;
> +    rdl->scl.notify_to_state_clear = discard_fn;
> +    rdl->double_discard_supported = double_discard_supported;
> +}
> +
> +/*
> + * RamDiscardManagerClass:
> + *
> + * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion
> + * regions are currently populated to be used/accessed by the VM, notifying
> + * after parts were discarded (freeing up memory) and before parts will be
> + * populated (consuming memory), to be used/accessed by the VM.
> + *
> + * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the
> + * #MemoryRegion isn't mapped into an address space yet (either directly
> + * or via an alias); it cannot change while the #MemoryRegion is
> + * mapped into an address space.
> + *
> + * The #RamDiscardManager is intended to be used by technologies that are
> + * incompatible with discarding of RAM (e.g., VFIO, which may pin all
> + * memory inside a #MemoryRegion), and require proper coordination to only
> + * map the currently populated parts, to hinder parts that are expected to
> + * remain discarded from silently getting populated and consuming memory.
> + * Technologies that support discarding of RAM don't have to bother and can
> + * simply map the whole #MemoryRegion.
> + *
> + * An example #RamDiscardManager is virtio-mem, which logically (un)plugs
> + * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
> + * Logically unplugging memory consists of discarding RAM. The VM agreed to not
> + * access unplugged (discarded) memory - especially via DMA. virtio-mem will
> + * properly coordinate with listeners before memory is plugged (populated),
> + * and after memory is unplugged (discarded).
> + *
> + * Listeners are called in multiples of the minimum granularity (unless it
> + * would exceed the registered range) and changes are aligned to the minimum
> + * granularity within the #MemoryRegion. Listeners have to prepare for memory
> + * becoming discarded in a different granularity than it was populated and the
> + * other way around.
> + */
> +struct RamDiscardManagerClass {
> +    /* private */
> +    GenericStateManagerClass parent_class;
> +};
>   
>   /**
>    * memory_get_xlat_addr: Extract addresses from a TLB entry
> @@ -795,7 +823,7 @@ struct MemoryRegion {
>       const char *name;
>       unsigned ioeventfd_nb;
>       MemoryRegionIoeventfd *ioeventfds;
> -    RamDiscardManager *rdm; /* Only for RAM */
> +    GenericStateManager *gsm; /* Only for RAM */
>   
>       /* For devices designed to perform re-entrant IO into their own IO MRs */
>       bool disable_reentrancy_guard;
> @@ -2462,39 +2490,36 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr);
>   bool memory_region_is_mapped(MemoryRegion *mr);
>   
>   /**
> - * memory_region_get_ram_discard_manager: get the #RamDiscardManager for a
> + * memory_region_get_generic_state_manager: get the #GenericStateManager for a
>    * #MemoryRegion
>    *
> - * The #RamDiscardManager cannot change while a memory region is mapped.
> + * The #GenericStateManager cannot change while a memory region is mapped.
>    *
>    * @mr: the #MemoryRegion
>    */
> -RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr);
> +GenericStateManager *memory_region_get_generic_state_manager(MemoryRegion *mr);
>   
>   /**
> - * memory_region_has_ram_discard_manager: check whether a #MemoryRegion has a
> - * #RamDiscardManager assigned
> + * memory_region_set_generic_state_manager: set the #GenericStateManager for a
> + * #MemoryRegion
> + *
> + * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
> + * that does not cover RAM, or a #MemoryRegion that already has a
> + * #GenericStateManager assigned. Return 0 if the gsm is set successfully.
>    *
>    * @mr: the #MemoryRegion
> + * @gsm: #GenericStateManager to set
>    */
> -static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
> -{
> -    return !!memory_region_get_ram_discard_manager(mr);
> -}
> +int memory_region_set_generic_state_manager(MemoryRegion *mr,
> +                                            GenericStateManager *gsm);
>   
>   /**
> - * memory_region_set_ram_discard_manager: set the #RamDiscardManager for a
> - * #MemoryRegion
> - *
> - * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
> - * that does not cover RAM, or a #MemoryRegion that already has a
> - * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
> + * memory_region_has_ram_discard_manager: check whether a #MemoryRegion has a
> + * #RamDiscardManager assigned
>    *
>    * @mr: the #MemoryRegion
> - * @rdm: #RamDiscardManager to set
>    */
> -int memory_region_set_ram_discard_manager(MemoryRegion *mr,
> -                                          RamDiscardManager *rdm);
> +bool memory_region_has_ram_discard_manager(MemoryRegion *mr);
>   
>   /**
>    * memory_region_find: translate an address/size relative to a
> diff --git a/migration/ram.c b/migration/ram.c
> index 053730367b..c881523e64 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -857,14 +857,14 @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb)
>       uint64_t cleared_bits = 0;
>   
>       if (rb->mr && rb->bmap && memory_region_has_ram_discard_manager(rb->mr)) {
> -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
> +        GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
>           MemoryRegionSection section = {
>               .mr = rb->mr,
>               .offset_within_region = 0,
>               .size = int128_make64(qemu_ram_get_used_length(rb)),
>           };
>   
> -        ram_discard_manager_replay_discarded(rdm, &section,
> +        generic_state_manager_replay_on_state_clear(gsm, &section,
>                                                dirty_bitmap_clear_section,
>                                                &cleared_bits);
>       }
> @@ -880,14 +880,14 @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb)
>   bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start)
>   {
>       if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
> -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
> +        GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
>           MemoryRegionSection section = {
>               .mr = rb->mr,
>               .offset_within_region = start,
>               .size = int128_make64(qemu_ram_pagesize(rb)),
>           };
>   
> -        return !ram_discard_manager_is_populated(rdm, &section);
> +        return !generic_state_manager_is_state_set(gsm, &section);
>       }
>       return false;
>   }
> @@ -1545,14 +1545,14 @@ static void ram_block_populate_read(RAMBlock *rb)
>        * Note: The result is only stable while migrating (precopy/postcopy).
>        */
>       if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
> -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
> +        GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
>           MemoryRegionSection section = {
>               .mr = rb->mr,
>               .offset_within_region = 0,
>               .size = rb->mr->size,
>           };
>   
> -        ram_discard_manager_replay_populated(rdm, &section,
> +        generic_state_manager_replay_on_state_set(gsm, &section,
>                                                populate_read_section, NULL);
>       } else {
>           populate_read_range(rb, 0, rb->used_length);
> @@ -1604,14 +1604,14 @@ static int ram_block_uffd_protect(RAMBlock *rb, int uffd_fd)
>   
>       /* See ram_block_populate_read() */
>       if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
> -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
> +        GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr);
>           MemoryRegionSection section = {
>               .mr = rb->mr,
>               .offset_within_region = 0,
>               .size = rb->mr->size,
>           };
>   
> -        return ram_discard_manager_replay_populated(rdm, &section,
> +        return generic_state_manager_replay_on_state_set(gsm, &section,
>                                                       uffd_protect_section,
>                                                       (void *)(uintptr_t)uffd_fd);
>       }
> diff --git a/system/memory.c b/system/memory.c
> index b5ab729e13..7b921c66a6 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2107,83 +2107,93 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr)
>       return imrc->num_indexes(iommu_mr);
>   }
>   
> -RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
> +GenericStateManager *memory_region_get_generic_state_manager(MemoryRegion *mr)
>   {
>       if (!memory_region_is_ram(mr)) {
>           return NULL;
>       }
> -    return mr->rdm;
> +    return mr->gsm;
>   }
>   
> -int memory_region_set_ram_discard_manager(MemoryRegion *mr,
> -                                          RamDiscardManager *rdm)
> +int memory_region_set_generic_state_manager(MemoryRegion *mr,
> +                                            GenericStateManager *gsm)
>   {
>       g_assert(memory_region_is_ram(mr));
> -    if (mr->rdm && rdm) {
> +    if (mr->gsm && gsm) {
>           return -EBUSY;
>       }
>   
> -    mr->rdm = rdm;
> +    mr->gsm = gsm;
>       return 0;
>   }
>   
> -uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
> -                                                 const MemoryRegion *mr)
> +bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
>   {
> -    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
> +    if (!memory_region_is_ram(mr) ||
> +        !object_dynamic_cast(OBJECT(mr->gsm), TYPE_RAM_DISCARD_MANAGER)) {
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm,
> +                                                   const MemoryRegion *mr)
> +{
> +    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
>   
> -    g_assert(rdmc->get_min_granularity);
> -    return rdmc->get_min_granularity(rdm, mr);
> +    g_assert(gsmc->get_min_granularity);
> +    return gsmc->get_min_granularity(gsm, mr);
>   }
>   
> -bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
> -                                      const MemoryRegionSection *section)
> +bool generic_state_manager_is_state_set(const GenericStateManager *gsm,
> +                                        const MemoryRegionSection *section)
>   {
> -    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
> +    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
>   
> -    g_assert(rdmc->is_populated);
> -    return rdmc->is_populated(rdm, section);
> +    g_assert(gsmc->is_state_set);
> +    return gsmc->is_state_set(gsm, section);
>   }
>   
> -int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
> -                                         MemoryRegionSection *section,
> -                                         ReplayStateChange replay_fn,
> -                                         void *opaque)
> +int generic_state_manager_replay_on_state_set(const GenericStateManager *gsm,
> +                                              MemoryRegionSection *section,
> +                                              ReplayStateChange replay_fn,
> +                                              void *opaque)
>   {
> -    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
> +    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
>   
> -    g_assert(rdmc->replay_populated);
> -    return rdmc->replay_populated(rdm, section, replay_fn, opaque);
> +    g_assert(gsmc->replay_on_state_set);
> +    return gsmc->replay_on_state_set(gsm, section, replay_fn, opaque);
>   }
>   
> -int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
> -                                         MemoryRegionSection *section,
> -                                         ReplayStateChange replay_fn,
> -                                         void *opaque)
> +int generic_state_manager_replay_on_state_clear(const GenericStateManager *gsm,
> +                                                MemoryRegionSection *section,
> +                                                ReplayStateChange replay_fn,
> +                                                void *opaque)
>   {
> -    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
> +    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
>   
> -    g_assert(rdmc->replay_discarded);
> -    return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
> +    g_assert(gsmc->replay_on_state_clear);
> +    return gsmc->replay_on_state_clear(gsm, section, replay_fn, opaque);
>   }
>   
> -void ram_discard_manager_register_listener(RamDiscardManager *rdm,
> -                                           RamDiscardListener *rdl,
> -                                           MemoryRegionSection *section)
> +void generic_state_manager_register_listener(GenericStateManager *gsm,
> +                                             StateChangeListener *scl,
> +                                             MemoryRegionSection *section)
>   {
> -    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
> +    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
>   
> -    g_assert(rdmc->register_listener);
> -    rdmc->register_listener(rdm, rdl, section);
> +    g_assert(gsmc->register_listener);
> +    gsmc->register_listener(gsm, scl, section);
>   }
>   
> -void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> -                                             RamDiscardListener *rdl)
> +void generic_state_manager_unregister_listener(GenericStateManager *gsm,
> +                                               StateChangeListener *scl)
>   {
> -    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
> +    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm);
>   
> -    g_assert(rdmc->unregister_listener);
> -    rdmc->unregister_listener(rdm, rdl);
> +    g_assert(gsmc->unregister_listener);
> +    gsmc->unregister_listener(gsm, scl);
>   }
>   
>   /* Called with rcu_read_lock held.  */
> @@ -2210,7 +2220,7 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
>           error_setg(errp, "iommu map to non memory area %" HWADDR_PRIx "", xlat);
>           return false;
>       } else if (memory_region_has_ram_discard_manager(mr)) {
> -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> +        GenericStateManager *gsm = memory_region_get_generic_state_manager(mr);
>           MemoryRegionSection tmp = {
>               .mr = mr,
>               .offset_within_region = xlat,
> @@ -2225,7 +2235,7 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
>            * Disallow that. vmstate priorities make sure any RamDiscardManager
>            * were already restored before IOMMUs are restored.
>            */
> -        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> +        if (!generic_state_manager_is_state_set(gsm, &tmp)) {
>               error_setg(errp, "iommu map to discarded memory (e.g., unplugged"
>                            " via virtio-mem): %" HWADDR_PRIx "",
>                            iotlb->translated_addr);
> @@ -3814,8 +3824,15 @@ static const TypeInfo iommu_memory_region_info = {
>       .abstract           = true,
>   };
>   
> -static const TypeInfo ram_discard_manager_info = {
> +static const TypeInfo generic_state_manager_info = {
>       .parent             = TYPE_INTERFACE,
> +    .name               = TYPE_GENERIC_STATE_MANAGER,
> +    .class_size         = sizeof(GenericStateManagerClass),
> +    .abstract           = true,
> +};
> +
> +static const TypeInfo ram_discard_manager_info = {
> +    .parent             = TYPE_GENERIC_STATE_MANAGER,
>       .name               = TYPE_RAM_DISCARD_MANAGER,
>       .class_size         = sizeof(RamDiscardManagerClass),
>   };
> @@ -3824,6 +3841,7 @@ static void memory_register_types(void)
>   {
>       type_register_static(&memory_region_info);
>       type_register_static(&iommu_memory_region_info);
> +    type_register_static(&generic_state_manager_info);
>       type_register_static(&ram_discard_manager_info);
>   }
>   
> diff --git a/system/memory_mapping.c b/system/memory_mapping.c
> index 37d3325f77..e9d15c737d 100644
> --- a/system/memory_mapping.c
> +++ b/system/memory_mapping.c
> @@ -271,10 +271,8 @@ static void guest_phys_blocks_region_add(MemoryListener *listener,
>   
>       /* for special sparse regions, only add populated parts */
>       if (memory_region_has_ram_discard_manager(section->mr)) {
> -        RamDiscardManager *rdm;
> -
> -        rdm = memory_region_get_ram_discard_manager(section->mr);
> -        ram_discard_manager_replay_populated(rdm, section,
> +        GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
> +        generic_state_manager_replay_on_state_set(gsm, section,
>                                                guest_phys_ram_populate_cb, g);
>           return;
>       }

-- 
Alexey


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager
  2025-04-07  7:49 ` [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager Chenyi Qiang
@ 2025-04-09  9:56   ` Alexey Kardashevskiy
  2025-04-10  3:47     ` Chenyi Qiang
  2025-04-25 12:57   ` David Hildenbrand
  1 sibling, 1 reply; 67+ messages in thread
From: Alexey Kardashevskiy @ 2025-04-09  9:56 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 7/4/25 17:49, Chenyi Qiang wrote:
> To manage the private and shared RAM states in confidential VMs,
> introduce a new class of PrivateShareManager as a child of

missing "d" in "PrivateShareManager"


> GenericStateManager, which inherits the six interface callbacks. With a
> different interface type, it can be distinguished from the
> RamDiscardManager object and provide the flexibility for addressing
> specific requirements of confidential VMs in the future.

This is still one bit per page, right? What does "set" mean here - 
private or shared? It is either RamPrivateManager or RamSharedManager imho.


> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v4:
>      - Newly added.
> ---
>   include/exec/memory.h | 44 +++++++++++++++++++++++++++++++++++++++++--
>   system/memory.c       | 17 +++++++++++++++++
>   2 files changed, 59 insertions(+), 2 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 30e5838d02..08f25e5e84 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -55,6 +55,12 @@ typedef struct RamDiscardManager RamDiscardManager;
>   DECLARE_OBJ_CHECKERS(RamDiscardManager, RamDiscardManagerClass,
>                        RAM_DISCARD_MANAGER, TYPE_RAM_DISCARD_MANAGER);
>   
> +#define TYPE_PRIVATE_SHARED_MANAGER "private-shared-manager"
> +typedef struct PrivateSharedManagerClass PrivateSharedManagerClass;
> +typedef struct PrivateSharedManager PrivateSharedManager;
> +DECLARE_OBJ_CHECKERS(PrivateSharedManager, PrivateSharedManagerClass,
> +                     PRIVATE_SHARED_MANAGER, TYPE_PRIVATE_SHARED_MANAGER)
> +
>   #ifdef CONFIG_FUZZ
>   void fuzz_dma_read_cb(size_t addr,
>                         size_t len,
> @@ -692,6 +698,14 @@ void generic_state_manager_register_listener(GenericStateManager *gsm,
>   void generic_state_manager_unregister_listener(GenericStateManager *gsm,
>                                                  StateChangeListener *scl);
>   
> +static inline void state_change_listener_init(StateChangeListener *scl,
> +                                              NotifyStateSet state_set_fn,
> +                                              NotifyStateClear state_clear_fn)

This belongs to 04/13 as there is nothing about PrivateSharedManager. 
Thanks,


> +{
> +    scl->notify_to_state_set = state_set_fn;
> +    scl->notify_to_state_clear = state_clear_fn;
> +}
> +
>   typedef struct RamDiscardListener RamDiscardListener;
>   
>   struct RamDiscardListener {
> @@ -713,8 +727,7 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
>                                                NotifyStateClear discard_fn,
>                                                bool double_discard_supported)
>   {
> -    rdl->scl.notify_to_state_set = populate_fn;
> -    rdl->scl.notify_to_state_clear = discard_fn;
> +    state_change_listener_init(&rdl->scl, populate_fn, discard_fn);
>       rdl->double_discard_supported = double_discard_supported;
>   }
>   
> @@ -757,6 +770,25 @@ struct RamDiscardManagerClass {
>       GenericStateManagerClass parent_class;
>   };
>   
> +typedef struct PrivateSharedListener PrivateSharedListener;
> +struct PrivateSharedListener {
> +    struct StateChangeListener scl;
> +
> +    QLIST_ENTRY(PrivateSharedListener) next;
> +};
> +
> +struct PrivateSharedManagerClass {
> +    /* private */
> +    GenericStateManagerClass parent_class;
> +};
> +
> +static inline void private_shared_listener_init(PrivateSharedListener *psl,
> +                                                NotifyStateSet populate_fn,
> +                                                NotifyStateClear discard_fn)
> +{
> +    state_change_listener_init(&psl->scl, populate_fn, discard_fn);
> +}
> +
>   /**
>    * memory_get_xlat_addr: Extract addresses from a TLB entry
>    *
> @@ -2521,6 +2553,14 @@ int memory_region_set_generic_state_manager(MemoryRegion *mr,
>    */
>   bool memory_region_has_ram_discard_manager(MemoryRegion *mr);
>   
> +/**
> + * memory_region_has_private_shared_manager: check whether a #MemoryRegion has a
> + * #PrivateSharedManager assigned
> + *
> + * @mr: the #MemoryRegion
> + */
> +bool memory_region_has_private_shared_manager(MemoryRegion *mr);
> +
>   /**
>    * memory_region_find: translate an address/size relative to a
>    * MemoryRegion into a #MemoryRegionSection.
> diff --git a/system/memory.c b/system/memory.c
> index 7b921c66a6..e6e944d9c0 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2137,6 +2137,16 @@ bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
>       return true;
>   }
>   
> +bool memory_region_has_private_shared_manager(MemoryRegion *mr)
> +{
> +    if (!memory_region_is_ram(mr) ||
> +        !object_dynamic_cast(OBJECT(mr->gsm), TYPE_PRIVATE_SHARED_MANAGER)) {
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
>   uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm,
>                                                      const MemoryRegion *mr)
>   {
> @@ -3837,12 +3847,19 @@ static const TypeInfo ram_discard_manager_info = {
>       .class_size         = sizeof(RamDiscardManagerClass),
>   };
>   
> +static const TypeInfo private_shared_manager_info = {
> +    .parent             = TYPE_GENERIC_STATE_MANAGER,
> +    .name               = TYPE_PRIVATE_SHARED_MANAGER,
> +    .class_size         = sizeof(PrivateSharedManagerClass),
> +};
> +
>   static void memory_register_types(void)
>   {
>       type_register_static(&memory_region_info);
>       type_register_static(&iommu_memory_region_info);
>       type_register_static(&generic_state_manager_info);
>       type_register_static(&ram_discard_manager_info);
> +    type_register_static(&private_shared_manager_info);
>   }
>   
>   type_init(memory_register_types)

-- 
Alexey


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd
  2025-04-07  7:49 ` [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd Chenyi Qiang
@ 2025-04-09  9:57   ` Alexey Kardashevskiy
  2025-04-10  7:37     ` Chenyi Qiang
  2025-05-09  6:41   ` Baolu Lu
  2025-05-12  8:07   ` Zhao Liu
  2 siblings, 1 reply; 67+ messages in thread
From: Alexey Kardashevskiy @ 2025-04-09  9:57 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 7/4/25 17:49, Chenyi Qiang wrote:
> Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
> discard") highlighted that subsystems like VFIO may disable RAM block
> discard. However, guest_memfd relies on discard operations for page
> conversion between private and shared memory, potentially leading to
> stale IOMMU mapping issue when assigning hardware devices to
> confidential VMs via shared memory. To address this, it is crucial to
> ensure systems like VFIO refresh its IOMMU mappings.
> 
> PrivateSharedManager is introduced to manage private and shared states in
> confidential VMs, similar to RamDiscardManager, which supports
> coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
> guest_memfd can facilitate the adjustment of VFIO mappings in response
> to page conversion events.
> 
> Since guest_memfd is not an object, it cannot directly implement the
> PrivateSharedManager interface. Implementing it in HostMemoryBackend is
> not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
> have a memory backend while others do not. 

HostMemoryBackend::mr::ram_block::guest_memfd?
And there is HostMemoryBackendMemfd too.

> Notably, virtual BIOS
> RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
> backend.

I thought private memory can be allocated from guest_memfd only. And it 
is still not clear if this BIOS memory can be discarded or not, does it 
change state during the VM lifetime?
(sorry I keep asking but I do not remember definitive answer).

> To manage RAMBlocks with guest_memfd, define a new object named
> RamBlockAttribute to implement the RamDiscardManager interface. This
> object stores guest_memfd information such as shared_bitmap, and handles
> page conversion notification. The memory state is tracked at the host
> page size granularity, as the minimum memory conversion size can be one
> page per request. Additionally, VFIO expects the DMA mapping for a
> specific iova to be mapped and unmapped with the same granularity.
> Confidential VMs may perform partial conversions, such as conversions on
> small regions within larger regions. To prevent invalid cases and until
> cut_mapping operation support is available, all operations are performed
> with 4K granularity.
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v4:
>      - Change the name from memory-attribute-manager to
>        ram-block-attribute.
>      - Implement the newly-introduced PrivateSharedManager instead of
>        RamDiscardManager and change related commit message.
>      - Define the new object in ramblock.h instead of adding a new file.
> 
> Changes in v3:
>      - Some rename (bitmap_size->shared_bitmap_size,
>        first_one/zero_bit->first_bit, etc.)
>      - Change shared_bitmap_size from uint32_t to unsigned
>      - Return mgr->mr->ram_block->page_size in get_block_size()
>      - Move set_ram_discard_manager() up to avoid a g_free() in failure
>        case.
>      - Add const for the memory_attribute_manager_get_block_size()
>      - Unify the ReplayRamPopulate and ReplayRamDiscard and related
>        callback.
> 
> Changes in v2:
>      - Rename the object name to MemoryAttributeManager
>      - Rename the bitmap to shared_bitmap to make it more clear.
>      - Remove block_size field and get it from a helper. In future, we
>        can get the page_size from RAMBlock if necessary.
>      - Remove the unncessary "struct" before GuestMemfdReplayData
>      - Remove the unncessary g_free() for the bitmap
>      - Add some error report when the callback failure for
>        populated/discarded section.
>      - Move the realize()/unrealize() definition to this patch.
> ---
>   include/exec/ramblock.h      |  24 +++
>   system/meson.build           |   1 +
>   system/ram-block-attribute.c | 282 +++++++++++++++++++++++++++++++++++
>   3 files changed, 307 insertions(+)
>   create mode 100644 system/ram-block-attribute.c
> 
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 0babd105c0..b8b5469db9 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -23,6 +23,10 @@
>   #include "cpu-common.h"
>   #include "qemu/rcu.h"
>   #include "exec/ramlist.h"
> +#include "system/hostmem.h"
> +
> +#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
> +OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass, RAM_BLOCK_ATTRIBUTE)
>   
>   struct RAMBlock {
>       struct rcu_head rcu;
> @@ -90,5 +94,25 @@ struct RAMBlock {
>        */
>       ram_addr_t postcopy_length;
>   };
> +
> +struct RamBlockAttribute {
> +    Object parent;
> +
> +    MemoryRegion *mr;
> +
> +    /* 1-setting of the bit represents the memory is populated (shared) */

It is either RamBlockShared, or it is a "generic" RamBlockAttribute 
implementing a bitmap with a bit per page and no special meaning 
(shared/private or discarded/populated). And if it is a generic 
RamBlockAttribute, then this hunk from 09/13 (which should be in this 
patch) should look like:


--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -46,6 +46,7 @@ struct RAMBlock {
      int fd;
      uint64_t fd_offset;
      int guest_memfd;
+    RamBlockAttribute *ram_shared; // and not "ram_block_attribute"

Thanks,


> +    unsigned shared_bitmap_size;
> +    unsigned long *shared_bitmap;
> +
> +    QLIST_HEAD(, PrivateSharedListener) psl_list;
> +};
> +
> +struct RamBlockAttributeClass {
> +    ObjectClass parent_class;
> +};
> +
> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr);
> +void ram_block_attribute_unrealize(RamBlockAttribute *attr);
> +
>   #endif
>   #endif
> diff --git a/system/meson.build b/system/meson.build
> index 4952f4b2c7..50a5a64f1c 100644
> --- a/system/meson.build
> +++ b/system/meson.build
> @@ -15,6 +15,7 @@ system_ss.add(files(
>     'dirtylimit.c',
>     'dma-helpers.c',
>     'globals.c',
> +  'ram-block-attribute.c',
>     'memory_mapping.c',
>     'qdev-monitor.c',
>     'qtest.c',
> diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
> new file mode 100644
> index 0000000000..283c03b354
> --- /dev/null
> +++ b/system/ram-block-attribute.c
> @@ -0,0 +1,282 @@
> +/*
> + * QEMU ram block attribute
> + *
> + * Copyright Intel
> + *
> + * Author:
> + *      Chenyi Qiang <chenyi.qiang@intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "exec/ramblock.h"
> +
> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(RamBlockAttribute,
> +                                   ram_block_attribute,
> +                                   RAM_BLOCK_ATTRIBUTE,
> +                                   OBJECT,
> +                                   { TYPE_PRIVATE_SHARED_MANAGER },
> +                                   { })
> +
> +static size_t ram_block_attribute_get_block_size(const RamBlockAttribute *attr)
> +{
> +    /*
> +     * Because page conversion could be manipulated in the size of at least 4K or 4K aligned,
> +     * Use the host page size as the granularity to track the memory attribute.
> +     */
> +    g_assert(attr && attr->mr && attr->mr->ram_block);
> +    g_assert(attr->mr->ram_block->page_size == qemu_real_host_page_size());
> +    return attr->mr->ram_block->page_size;
> +}
> +
> +
> +static bool ram_block_attribute_psm_is_shared(const GenericStateManager *gsm,
> +                                              const MemoryRegionSection *section)
> +{
> +    const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
> +    const int block_size = ram_block_attribute_get_block_size(attr);
> +    uint64_t first_bit = section->offset_within_region / block_size;
> +    uint64_t last_bit = first_bit + int128_get64(section->size) / block_size - 1;
> +    unsigned long first_discard_bit;
> +
> +    first_discard_bit = find_next_zero_bit(attr->shared_bitmap, last_bit + 1, first_bit);
> +    return first_discard_bit > last_bit;
> +}
> +
> +typedef int (*ram_block_attribute_section_cb)(MemoryRegionSection *s, void *arg);
> +
> +static int ram_block_attribute_notify_shared_cb(MemoryRegionSection *section, void *arg)
> +{
> +    StateChangeListener *scl = arg;
> +
> +    return scl->notify_to_state_set(scl, section);
> +}
> +
> +static int ram_block_attribute_notify_private_cb(MemoryRegionSection *section, void *arg)
> +{
> +    StateChangeListener *scl = arg;
> +
> +    scl->notify_to_state_clear(scl, section);
> +    return 0;
> +}
> +
> +static int ram_block_attribute_for_each_shared_section(const RamBlockAttribute *attr,
> +                                                       MemoryRegionSection *section,
> +                                                       void *arg,
> +                                                       ram_block_attribute_section_cb cb)
> +{
> +    unsigned long first_bit, last_bit;
> +    uint64_t offset, size;
> +    const int block_size = ram_block_attribute_get_block_size(attr);
> +    int ret = 0;
> +
> +    first_bit = section->offset_within_region / block_size;
> +    first_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size, first_bit);
> +
> +    while (first_bit < attr->shared_bitmap_size) {
> +        MemoryRegionSection tmp = *section;
> +
> +        offset = first_bit * block_size;
> +        last_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size,
> +                                      first_bit + 1) - 1;
> +        size = (last_bit - first_bit + 1) * block_size;
> +
> +        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
> +            break;
> +        }
> +
> +        ret = cb(&tmp, arg);
> +        if (ret) {
> +            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
> +                         strerror(-ret));
> +            break;
> +        }
> +
> +        first_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size,
> +                                  last_bit + 2);
> +    }
> +
> +    return ret;
> +}
> +
> +static int ram_block_attribute_for_each_private_section(const RamBlockAttribute *attr,
> +                                                        MemoryRegionSection *section,
> +                                                        void *arg,
> +                                                        ram_block_attribute_section_cb cb)
> +{
> +    unsigned long first_bit, last_bit;
> +    uint64_t offset, size;
> +    const int block_size = ram_block_attribute_get_block_size(attr);
> +    int ret = 0;
> +
> +    first_bit = section->offset_within_region / block_size;
> +    first_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size,
> +                                   first_bit);
> +
> +    while (first_bit < attr->shared_bitmap_size) {
> +        MemoryRegionSection tmp = *section;
> +
> +        offset = first_bit * block_size;
> +        last_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size,
> +                                      first_bit + 1) - 1;
> +        size = (last_bit - first_bit + 1) * block_size;
> +
> +        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
> +            break;
> +        }
> +
> +        ret = cb(&tmp, arg);
> +        if (ret) {
> +            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
> +                         strerror(-ret));
> +            break;
> +        }
> +
> +        first_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size,
> +                                       last_bit + 2);
> +    }
> +
> +    return ret;
> +}
> +
> +static uint64_t ram_block_attribute_psm_get_min_granularity(const GenericStateManager *gsm,
> +                                                            const MemoryRegion *mr)
> +{
> +    const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
> +
> +    g_assert(mr == attr->mr);
> +    return ram_block_attribute_get_block_size(attr);
> +}
> +
> +static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
> +                                                      StateChangeListener *scl,
> +                                                      MemoryRegionSection *section)
> +{
> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
> +    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
> +    int ret;
> +
> +    g_assert(section->mr == attr->mr);
> +    scl->section = memory_region_section_new_copy(section);
> +
> +    QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
> +
> +    ret = ram_block_attribute_for_each_shared_section(attr, section, scl,
> +                                                      ram_block_attribute_notify_shared_cb);
> +    if (ret) {
> +        error_report("%s: Failed to register RAM discard listener: %s", __func__,
> +                     strerror(-ret));
> +    }
> +}
> +
> +static void ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm,
> +                                                        StateChangeListener *scl)
> +{
> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
> +    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
> +    int ret;
> +
> +    g_assert(scl->section);
> +    g_assert(scl->section->mr == attr->mr);
> +
> +    ret = ram_block_attribute_for_each_shared_section(attr, scl->section, scl,
> +                                                      ram_block_attribute_notify_private_cb);
> +    if (ret) {
> +        error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
> +                     strerror(-ret));
> +    }
> +
> +    memory_region_section_free_copy(scl->section);
> +    scl->section = NULL;
> +    QLIST_REMOVE(psl, next);
> +}
> +
> +typedef struct RamBlockAttributeReplayData {
> +    ReplayStateChange fn;
> +    void *opaque;
> +} RamBlockAttributeReplayData;
> +
> +static int ram_block_attribute_psm_replay_cb(MemoryRegionSection *section, void *arg)
> +{
> +    RamBlockAttributeReplayData *data = arg;
> +
> +    return data->fn(section, data->opaque);
> +}
> +
> +static int ram_block_attribute_psm_replay_on_shared(const GenericStateManager *gsm,
> +                                                    MemoryRegionSection *section,
> +                                                    ReplayStateChange replay_fn,
> +                                                    void *opaque)
> +{
> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
> +    RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
> +
> +    g_assert(section->mr == attr->mr);
> +    return ram_block_attribute_for_each_shared_section(attr, section, &data,
> +                                                       ram_block_attribute_psm_replay_cb);
> +}
> +
> +static int ram_block_attribute_psm_replay_on_private(const GenericStateManager *gsm,
> +                                                     MemoryRegionSection *section,
> +                                                     ReplayStateChange replay_fn,
> +                                                     void *opaque)
> +{
> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
> +    RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
> +
> +    g_assert(section->mr == attr->mr);
> +    return ram_block_attribute_for_each_private_section(attr, section, &data,
> +                                                        ram_block_attribute_psm_replay_cb);
> +}
> +
> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr)
> +{
> +    uint64_t shared_bitmap_size;
> +    const int block_size  = qemu_real_host_page_size();
> +    int ret;
> +
> +    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
> +
> +    attr->mr = mr;
> +    ret = memory_region_set_generic_state_manager(mr, GENERIC_STATE_MANAGER(attr));
> +    if (ret) {
> +        return ret;
> +    }
> +    attr->shared_bitmap_size = shared_bitmap_size;
> +    attr->shared_bitmap = bitmap_new(shared_bitmap_size);
> +
> +    return ret;
> +}
> +
> +void ram_block_attribute_unrealize(RamBlockAttribute *attr)
> +{
> +    g_free(attr->shared_bitmap);
> +    memory_region_set_generic_state_manager(attr->mr, NULL);
> +}
> +
> +static void ram_block_attribute_init(Object *obj)
> +{
> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj);
> +
> +    QLIST_INIT(&attr->psl_list);
> +}
> +
> +static void ram_block_attribute_finalize(Object *obj)
> +{
> +}
> +
> +static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
> +{
> +    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc);
> +
> +    gsmc->get_min_granularity = ram_block_attribute_psm_get_min_granularity;
> +    gsmc->register_listener = ram_block_attribute_psm_register_listener;
> +    gsmc->unregister_listener = ram_block_attribute_psm_unregister_listener;
> +    gsmc->is_state_set = ram_block_attribute_psm_is_shared;
> +    gsmc->replay_on_state_set = ram_block_attribute_psm_replay_on_shared;
> +    gsmc->replay_on_state_clear = ram_block_attribute_psm_replay_on_private;
> +}

-- 
Alexey


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 06/13] vfio: Add the support for PrivateSharedManager Interface
  2025-04-07  7:49 ` [PATCH v4 06/13] vfio: Add the support for PrivateSharedManager Interface Chenyi Qiang
@ 2025-04-09  9:58   ` Alexey Kardashevskiy
  2025-04-10  5:53     ` Chenyi Qiang
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Kardashevskiy @ 2025-04-09  9:58 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 7/4/25 17:49, Chenyi Qiang wrote:
> Subsystems like VFIO previously disabled ram block discard and only
> allowed coordinated discarding via RamDiscardManager. However,
> guest_memfd in confidential VMs relies on discard operations for page
> conversion between private and shared memory. This can lead to stale
> IOMMU mapping issue when assigning a hardware device to a confidential
> VM via shared memory. With the introduction of PrivateSharedManager
> interface to manage private and shared states and being distinct from
> RamDiscardManager, include PrivateSharedManager in coordinated RAM
> discard and add related support in VFIO.

How does the new behavior differ from what 
vfio_register_ram_discard_listener() does? Thanks,


> Currently, migration support for confidential VMs is not available, so
> vfio_sync_dirty_bitmap() handling for PrivateSharedListener can be
> ignored. The register/unregister of PrivateSharedListener is necessary
> during vfio_listener_region_add/del(). The listener callbacks are
> similar between RamDiscardListener and PrivateSharedListener, allowing
> for extraction of common parts opportunisticlly.
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v4
>      - Newly added.
> ---
>   hw/vfio/common.c                      | 104 +++++++++++++++++++++++---
>   hw/vfio/container-base.c              |   1 +
>   include/hw/vfio/vfio-container-base.h |  10 +++
>   3 files changed, 105 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 3172d877cc..48468a12c3 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -335,13 +335,9 @@ out:
>       rcu_read_unlock();
>   }
>   
> -static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
> -                                            MemoryRegionSection *section)
> +static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer,
> +                                                    MemoryRegionSection *section)
>   {
> -    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
> -    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
> -                                                listener);
> -    VFIOContainerBase *bcontainer = vrdl->bcontainer;
>       const hwaddr size = int128_get64(section->size);
>       const hwaddr iova = section->offset_within_address_space;
>       int ret;
> @@ -354,13 +350,28 @@ static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
>       }
>   }
>   
> -static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
> +static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
>                                               MemoryRegionSection *section)
>   {
>       RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
>       VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
>                                                   listener);
> -    VFIOContainerBase *bcontainer = vrdl->bcontainer;
> +    vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section);
> +}
> +
> +static void vfio_private_shared_notify_to_private(StateChangeListener *scl,
> +                                                  MemoryRegionSection *section)
> +{
> +    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
> +    VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener,
> +                                                   listener);
> +    vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section);
> +}
> +
> +static int vfio_state_change_notify_to_state_set(VFIOContainerBase *bcontainer,
> +                                                 MemoryRegionSection *section,
> +                                                 uint64_t granularity)
> +{
>       const hwaddr end = section->offset_within_region +
>                          int128_get64(section->size);
>       hwaddr start, next, iova;
> @@ -372,7 +383,7 @@ static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
>        * unmap in minimum granularity later.
>        */
>       for (start = section->offset_within_region; start < end; start = next) {
> -        next = ROUND_UP(start + 1, vrdl->granularity);
> +        next = ROUND_UP(start + 1, granularity);
>           next = MIN(next, end);
>   
>           iova = start - section->offset_within_region +
> @@ -383,13 +394,33 @@ static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
>                                        vaddr, section->readonly);
>           if (ret) {
>               /* Rollback */
> -            vfio_ram_discard_notify_discard(scl, section);
> +            vfio_state_change_notify_to_state_clear(bcontainer, section);
>               return ret;
>           }
>       }
>       return 0;
>   }
>   
> +static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
> +                                            MemoryRegionSection *section)
> +{
> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
> +    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
> +                                                listener);
> +    return vfio_state_change_notify_to_state_set(vrdl->bcontainer, section,
> +                                                 vrdl->granularity);
> +}
> +
> +static int vfio_private_shared_notify_to_shared(StateChangeListener *scl,
> +                                                MemoryRegionSection *section)
> +{
> +    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
> +    VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener,
> +                                                   listener);
> +    return vfio_state_change_notify_to_state_set(vpsl->bcontainer, section,
> +                                                 vpsl->granularity);
> +}
> +
>   static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
>                                                  MemoryRegionSection *section)
>   {
> @@ -466,6 +497,27 @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
>       }
>   }
>   
> +static void vfio_register_private_shared_listener(VFIOContainerBase *bcontainer,
> +                                                  MemoryRegionSection *section)
> +{
> +    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
> +    VFIOPrivateSharedListener *vpsl;
> +    PrivateSharedListener *psl;
> +
> +    vpsl = g_new0(VFIOPrivateSharedListener, 1);
> +    vpsl->bcontainer = bcontainer;
> +    vpsl->mr = section->mr;
> +    vpsl->offset_within_address_space = section->offset_within_address_space;
> +    vpsl->granularity = generic_state_manager_get_min_granularity(gsm,
> +                                                                  section->mr);
> +
> +    psl = &vpsl->listener;
> +    private_shared_listener_init(psl, vfio_private_shared_notify_to_shared,
> +                                 vfio_private_shared_notify_to_private);
> +    generic_state_manager_register_listener(gsm, &psl->scl, section);
> +    QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next);
> +}
> +
>   static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
>                                                    MemoryRegionSection *section)
>   {
> @@ -491,6 +543,31 @@ static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
>       g_free(vrdl);
>   }
>   
> +static void vfio_unregister_private_shared_listener(VFIOContainerBase *bcontainer,
> +                                                    MemoryRegionSection *section)
> +{
> +    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
> +    VFIOPrivateSharedListener *vpsl = NULL;
> +    PrivateSharedListener *psl;
> +
> +    QLIST_FOREACH(vpsl, &bcontainer->vpsl_list, next) {
> +        if (vpsl->mr == section->mr &&
> +            vpsl->offset_within_address_space ==
> +            section->offset_within_address_space) {
> +            break;
> +        }
> +    }
> +
> +    if (!vpsl) {
> +        hw_error("vfio: Trying to unregister missing RAM discard listener");
> +    }
> +
> +    psl = &vpsl->listener;
> +    generic_state_manager_unregister_listener(gsm, &psl->scl);
> +    QLIST_REMOVE(vpsl, next);
> +    g_free(vpsl);
> +}
> +
>   static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
>   {
>       MemoryRegion *mr = section->mr;
> @@ -644,6 +721,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
>       if (memory_region_has_ram_discard_manager(section->mr)) {
>           vfio_register_ram_discard_listener(bcontainer, section);
>           return;
> +    } else if (memory_region_has_private_shared_manager(section->mr)) {
> +        vfio_register_private_shared_listener(bcontainer, section);
> +        return;
>       }
>   
>       vaddr = memory_region_get_ram_ptr(section->mr) +
> @@ -764,6 +844,10 @@ static void vfio_listener_region_del(MemoryListener *listener,
>           vfio_unregister_ram_discard_listener(bcontainer, section);
>           /* Unregistering will trigger an unmap. */
>           try_unmap = false;
> +    } else if (memory_region_has_private_shared_manager(section->mr)) {
> +        vfio_unregister_private_shared_listener(bcontainer, section);
> +        /* Unregistering will trigger an unmap. */
> +        try_unmap = false;
>       }
>   
>       if (try_unmap) {
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 749a3fd29d..ff5df925c2 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -135,6 +135,7 @@ static void vfio_container_instance_init(Object *obj)
>       bcontainer->iova_ranges = NULL;
>       QLIST_INIT(&bcontainer->giommu_list);
>       QLIST_INIT(&bcontainer->vrdl_list);
> +    QLIST_INIT(&bcontainer->vpsl_list);
>   }
>   
>   static const TypeInfo types[] = {
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 4cff9943ab..8d7c0b1179 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -47,6 +47,7 @@ typedef struct VFIOContainerBase {
>       bool dirty_pages_started; /* Protected by BQL */
>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>       QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
> +    QLIST_HEAD(, VFIOPrivateSharedListener) vpsl_list;
>       QLIST_ENTRY(VFIOContainerBase) next;
>       QLIST_HEAD(, VFIODevice) device_list;
>       GList *iova_ranges;
> @@ -71,6 +72,15 @@ typedef struct VFIORamDiscardListener {
>       QLIST_ENTRY(VFIORamDiscardListener) next;
>   } VFIORamDiscardListener;
>   
> +typedef struct VFIOPrivateSharedListener {
> +    VFIOContainerBase *bcontainer;
> +    MemoryRegion *mr;
> +    hwaddr offset_within_address_space;
> +    uint64_t granularity;
> +    PrivateSharedListener listener;
> +    QLIST_ENTRY(VFIOPrivateSharedListener) next;
> +} VFIOPrivateSharedListener;
> +
>   int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>                              hwaddr iova, ram_addr_t size,
>                              void *vaddr, bool readonly);

-- 
Alexey


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager
  2025-04-09  9:56   ` Alexey Kardashevskiy
@ 2025-04-09 12:57     ` Chenyi Qiang
  2025-04-10  0:11       ` Alexey Kardashevskiy
  2025-04-25 12:49     ` David Hildenbrand
  1 sibling, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-09 12:57 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/9/2025 5:56 PM, Alexey Kardashevskiy wrote:
> 
> 
> On 7/4/25 17:49, Chenyi Qiang wrote:
>> RamDiscardManager is an interface used by virtio-mem to adjust VFIO
>> mappings in relation to VM page assignment. It manages the state of
>> populated and discard for the RAM. To accommodate future scnarios for
>> managing RAM states, such as private and shared states in confidential
>> VMs, the existing RamDiscardManager interface needs to be generalized.
>>
>> Introduce a parent class, GenericStateManager, to manage a pair of
> 
> "GenericState" is the same as "State" really. Call it RamStateManager.

OK to me.

> 
> 
> 
>> opposite states with RamDiscardManager as its child. The changes include
>> - Define a new abstract class GenericStateChange.
>> - Extract six callbacks into GenericStateChangeClass and allow the child
>>    classes to inherit them.
>> - Modify RamDiscardManager-related helpers to use GenericStateManager
>>    ones.
>> - Define a generic StatChangeListener to extract fields from
> 
> "e" missing in StateChangeListener.

Fixed. Thanks.

> 
>>    RamDiscardManager listener which allows future listeners to embed it
>>    and avoid duplication.
>> - Change the users of RamDiscardManager (virtio-mem, migration, etc.) to
>>    switch to use GenericStateChange helpers.
>>
>> It can provide a more flexible and resuable framework for RAM state
>> management, facilitating future enhancements and use cases.
> 
> I fail to see how new interface helps with this. RamDiscardManager
> manipulates populated/discarded. It would make sense may be if the new
> class had more bits per page, say private/shared/discarded but it does
> not. And PrivateSharedManager cannot coexist with RamDiscard. imho this
> is going in a wrong direction.

I think we have two questions here:

1. whether we should define an abstract parent class and distinguish the
RamDiscardManager and PrivateSharedManager?

I vote for this. First, After making the distinction, the
PrivateSharedManager won't go into the RamDiscardManager path which
PrivateSharedManager may have not supported yet. e.g. the migration
related path. In addtional, we can extend the PrivateSharedManager for
specific handling, e.g. the priority listener, state_change() callback.

2. How we should abstract the parent class?

I think this is the problem. My current implementation extracts all the
callbacks in RamDiscardManager into the parent class and call them
state_set and state_clear, which can only manage a pair of opposite
states. As you mentioned, there could be private/shared/discarded three
states in the future, which is not compatible with current design. Maybe
we can make the parent class more generic, e.g. only extract the
register/unregister_listener() into it.

> 
> 
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>> Changes in v4:
>>      - Newly added.
>> ---
>>   hw/vfio/common.c        |  30 ++--
>>   hw/virtio/virtio-mem.c  |  95 ++++++------
>>   include/exec/memory.h   | 313 ++++++++++++++++++++++------------------
>>   migration/ram.c         |  16 +-
>>   system/memory.c         | 106 ++++++++------
>>   system/memory_mapping.c |   6 +-
>>   6 files changed, 310 insertions(+), 256 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index f7499a9b74..3172d877cc 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -335,9 +335,10 @@ out:
>>       rcu_read_unlock();
>>   }
>>   -static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
>> +static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
>>                                               MemoryRegionSection
>> *section)
>>   {
>> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener,
>> scl);
>>       VFIORamDiscardListener *vrdl = container_of(rdl,
>> VFIORamDiscardListener,
>>                                                   listener);
>>       VFIOContainerBase *bcontainer = vrdl->bcontainer;
>> @@ -353,9 +354,10 @@ static void
>> vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
>>       }
>>   }
>>   -static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>> +static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
>>                                               MemoryRegionSection
>> *section)
>>   {
>> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener,
>> scl);
>>       VFIORamDiscardListener *vrdl = container_of(rdl,
>> VFIORamDiscardListener,
>>                                                   listener);
> 
> VFIORamDiscardListener *vrdl = container_of(scl, VFIORamDiscardListener,
> listener.scl) and drop @ rdl? Thanks,

Modified. Thanks!

> 
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager
  2025-04-09 12:57     ` Chenyi Qiang
@ 2025-04-10  0:11       ` Alexey Kardashevskiy
  2025-04-10  1:44         ` Chenyi Qiang
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Kardashevskiy @ 2025-04-10  0:11 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 9/4/25 22:57, Chenyi Qiang wrote:
> 
> 
> On 4/9/2025 5:56 PM, Alexey Kardashevskiy wrote:
>>
>>
>> On 7/4/25 17:49, Chenyi Qiang wrote:
>>> RamDiscardManager is an interface used by virtio-mem to adjust VFIO
>>> mappings in relation to VM page assignment. It manages the state of
>>> populated and discard for the RAM. To accommodate future scnarios for
>>> managing RAM states, such as private and shared states in confidential
>>> VMs, the existing RamDiscardManager interface needs to be generalized.
>>>
>>> Introduce a parent class, GenericStateManager, to manage a pair of
>>
>> "GenericState" is the same as "State" really. Call it RamStateManager.
> 
> OK to me.

Sorry, nah. "Generic" would mean "machine" in QEMU.


>>
>>
>>> opposite states with RamDiscardManager as its child. The changes include
>>> - Define a new abstract class GenericStateChange.
>>> - Extract six callbacks into GenericStateChangeClass and allow the child
>>>     classes to inherit them.
>>> - Modify RamDiscardManager-related helpers to use GenericStateManager
>>>     ones.
>>> - Define a generic StatChangeListener to extract fields from
>>
>> "e" missing in StateChangeListener.
> 
> Fixed. Thanks.
> 
>>
>>>     RamDiscardManager listener which allows future listeners to embed it
>>>     and avoid duplication.
>>> - Change the users of RamDiscardManager (virtio-mem, migration, etc.) to
>>>     switch to use GenericStateChange helpers.
>>>
>>> It can provide a more flexible and resuable framework for RAM state
>>> management, facilitating future enhancements and use cases.
>>
>> I fail to see how new interface helps with this. RamDiscardManager
>> manipulates populated/discarded. It would make sense may be if the new
>> class had more bits per page, say private/shared/discarded but it does
>> not. And PrivateSharedManager cannot coexist with RamDiscard. imho this
>> is going in a wrong direction.
> 
> I think we have two questions here:
> 
> 1. whether we should define an abstract parent class and distinguish the
> RamDiscardManager and PrivateSharedManager?

If it is 1 bit per page with the meaning "1 == populated == shared", 
then no, one class will do.


> I vote for this. First, After making the distinction, the
> PrivateSharedManager won't go into the RamDiscardManager path which
> PrivateSharedManager may have not supported yet. e.g. the migration
> related path. In addtional, we can extend the PrivateSharedManager for
> specific handling, e.g. the priority listener, state_change() callback.
> 
> 2. How we should abstract the parent class?
> 
> I think this is the problem. My current implementation extracts all the
> callbacks in RamDiscardManager into the parent class and call them
> state_set and state_clear, which can only manage a pair of opposite
> states. As you mentioned, there could be private/shared/discarded three
> states in the future, which is not compatible with current design. Maybe
> we can make the parent class more generic, e.g. only extract the
> register/unregister_listener() into it.

Or we could rename RamDiscardManager to RamStateManager, implement 2bit 
per page (0 = discarded, 1 = populated+shared, 2 = populated+private).
Eventually we will have to deal with the mix of private and shared 
mappings for the same device, how 1 bit per page is going to work? Thanks,


> 
>>
>>
>>>
>>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>>> ---
>>> Changes in v4:
>>>       - Newly added.
>>> ---
>>>    hw/vfio/common.c        |  30 ++--
>>>    hw/virtio/virtio-mem.c  |  95 ++++++------
>>>    include/exec/memory.h   | 313 ++++++++++++++++++++++------------------
>>>    migration/ram.c         |  16 +-
>>>    system/memory.c         | 106 ++++++++------
>>>    system/memory_mapping.c |   6 +-
>>>    6 files changed, 310 insertions(+), 256 deletions(-)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index f7499a9b74..3172d877cc 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -335,9 +335,10 @@ out:
>>>        rcu_read_unlock();
>>>    }
>>>    -static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
>>> +static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
>>>                                                MemoryRegionSection
>>> *section)
>>>    {
>>> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener,
>>> scl);
>>>        VFIORamDiscardListener *vrdl = container_of(rdl,
>>> VFIORamDiscardListener,
>>>                                                    listener);
>>>        VFIOContainerBase *bcontainer = vrdl->bcontainer;
>>> @@ -353,9 +354,10 @@ static void
>>> vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
>>>        }
>>>    }
>>>    -static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>>> +static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
>>>                                                MemoryRegionSection
>>> *section)
>>>    {
>>> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener,
>>> scl);
>>>        VFIORamDiscardListener *vrdl = container_of(rdl,
>>> VFIORamDiscardListener,
>>>                                                    listener);
>>
>> VFIORamDiscardListener *vrdl = container_of(scl, VFIORamDiscardListener,
>> listener.scl) and drop @ rdl? Thanks,
> 
> Modified. Thanks!
> 
>>
>>

-- 
Alexey


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager
  2025-04-10  0:11       ` Alexey Kardashevskiy
@ 2025-04-10  1:44         ` Chenyi Qiang
  2025-04-16  3:32           ` Chenyi Qiang
  0 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-10  1:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/10/2025 8:11 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 9/4/25 22:57, Chenyi Qiang wrote:
>>
>>
>> On 4/9/2025 5:56 PM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 7/4/25 17:49, Chenyi Qiang wrote:
>>>> RamDiscardManager is an interface used by virtio-mem to adjust VFIO
>>>> mappings in relation to VM page assignment. It manages the state of
>>>> populated and discard for the RAM. To accommodate future scnarios for
>>>> managing RAM states, such as private and shared states in confidential
>>>> VMs, the existing RamDiscardManager interface needs to be generalized.
>>>>
>>>> Introduce a parent class, GenericStateManager, to manage a pair of
>>>
>>> "GenericState" is the same as "State" really. Call it RamStateManager.
>>
>> OK to me.
> 
> Sorry, nah. "Generic" would mean "machine" in QEMU.

OK, anyway, I can rename to RamStateManager if we follow this direction.

> 
> 
>>>
>>>
>>>> opposite states with RamDiscardManager as its child. The changes
>>>> include
>>>> - Define a new abstract class GenericStateChange.
>>>> - Extract six callbacks into GenericStateChangeClass and allow the
>>>> child
>>>>     classes to inherit them.
>>>> - Modify RamDiscardManager-related helpers to use GenericStateManager
>>>>     ones.
>>>> - Define a generic StatChangeListener to extract fields from
>>>
>>> "e" missing in StateChangeListener.
>>
>> Fixed. Thanks.
>>
>>>
>>>>     RamDiscardManager listener which allows future listeners to
>>>> embed it
>>>>     and avoid duplication.
>>>> - Change the users of RamDiscardManager (virtio-mem, migration,
>>>> etc.) to
>>>>     switch to use GenericStateChange helpers.
>>>>
>>>> It can provide a more flexible and resuable framework for RAM state
>>>> management, facilitating future enhancements and use cases.
>>>
>>> I fail to see how new interface helps with this. RamDiscardManager
>>> manipulates populated/discarded. It would make sense may be if the new
>>> class had more bits per page, say private/shared/discarded but it does
>>> not. And PrivateSharedManager cannot coexist with RamDiscard. imho this
>>> is going in a wrong direction.
>>
>> I think we have two questions here:
>>
>> 1. whether we should define an abstract parent class and distinguish the
>> RamDiscardManager and PrivateSharedManager?
> 
> If it is 1 bit per page with the meaning "1 == populated == shared",
> then no, one class will do.

Not restrict to 1 bit per page. As mentioned in questions 2, the parent
class can be more generic, e.g. only including
register/unregister_listener().

Like in this way:

The parent class:

struct StateChangeListener {
    MemoryRegionSection *section;
}

struct RamStateManagerClass {
    void (*register_listener)();
    void (*unregister_listener)();
}

The child class:

1. RamDiscardManager

struct RamDiscardListener {
    StateChangeListener scl;
    NotifyPopulate notify_populate;
    NotifyDiscard notify_discard;
    bool double_discard_supported;

    QLIST_ENTRY(RamDiscardListener) next;
}

struct RamDiscardManagerClass {
    RamStateManagerClass parent_class;
    uint64_t (*get_min_granularity)();
    bool (*is_populate)();
    bool (*replay_populate)();
    bool (*replay_discard)();
}

2. PrivateSharedManager (or other name like ConfidentialRamManager?)

struct PrivateSharedListener {
    StateChangeListener scl;
    NotifyShared notify_shared;
    NotifyPrivate notify_private;
    int priority;

    QLIST_ENTRY(PrivateSharedListener) next;
}

struct PrivateSharedManagerClass {
    RamStateManagerClass parent_class;
    uint64_t (*get_min_granularity)();
    bool (*is_shared)();
    // No need to define replay_private/replay_shared as no use case at
present.
}

In the future, if we want to manage three states, we can only extend
PrivateSharedManagerClass/PrivateSharedListener.

> 
> 
>> I vote for this. First, After making the distinction, the
>> PrivateSharedManager won't go into the RamDiscardManager path which
>> PrivateSharedManager may have not supported yet. e.g. the migration
>> related path. In addtional, we can extend the PrivateSharedManager for
>> specific handling, e.g. the priority listener, state_change() callback.
>>
>> 2. How we should abstract the parent class?
>>
>> I think this is the problem. My current implementation extracts all the
>> callbacks in RamDiscardManager into the parent class and call them
>> state_set and state_clear, which can only manage a pair of opposite
>> states. As you mentioned, there could be private/shared/discarded three
>> states in the future, which is not compatible with current design. Maybe
>> we can make the parent class more generic, e.g. only extract the
>> register/unregister_listener() into it.
> 
> Or we could rename RamDiscardManager to RamStateManager, implement 2bit
> per page (0 = discarded, 1 = populated+shared, 2 = populated+private).
> Eventually we will have to deal with the mix of private and shared
> mappings for the same device, how 1 bit per page is going to work? Thanks,

Only renaming RamDiscardManager seems not sufficient. Current
RamDiscardManagerClass can only manage two states. For example, its
callback functions only have the name of xxx_populate and xxx_discard.
If we want to extend it to manage three states, we have to modify those
callbacks, e.g. add some new argument like is_populate(bool is_private),
or define some new callbacks like is_populate_private(). It will make
this class more complicated, but actually not necessary in legacy VMs
without the concept of private/shared.

> 
> 
>>
>>>
>>>
>>>>
>>>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>>>> ---
>>>> Changes in v4:
>>>>       - Newly added.
>>>> ---
>>>>    hw/vfio/common.c        |  30 ++--
>>>>    hw/virtio/virtio-mem.c  |  95 ++++++------
>>>>    include/exec/memory.h   | 313 +++++++++++++++++++++
>>>> +------------------
>>>>    migration/ram.c         |  16 +-
>>>>    system/memory.c         | 106 ++++++++------
>>>>    system/memory_mapping.c |   6 +-
>>>>    6 files changed, 310 insertions(+), 256 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index f7499a9b74..3172d877cc 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -335,9 +335,10 @@ out:
>>>>        rcu_read_unlock();
>>>>    }
>>>>    -static void vfio_ram_discard_notify_discard(RamDiscardListener
>>>> *rdl,
>>>> +static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
>>>>                                                MemoryRegionSection
>>>> *section)
>>>>    {
>>>> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener,
>>>> scl);
>>>>        VFIORamDiscardListener *vrdl = container_of(rdl,
>>>> VFIORamDiscardListener,
>>>>                                                    listener);
>>>>        VFIOContainerBase *bcontainer = vrdl->bcontainer;
>>>> @@ -353,9 +354,10 @@ static void
>>>> vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
>>>>        }
>>>>    }
>>>>    -static int vfio_ram_discard_notify_populate(RamDiscardListener
>>>> *rdl,
>>>> +static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
>>>>                                                MemoryRegionSection
>>>> *section)
>>>>    {
>>>> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener,
>>>> scl);
>>>>        VFIORamDiscardListener *vrdl = container_of(rdl,
>>>> VFIORamDiscardListener,
>>>>                                                    listener);
>>>
>>> VFIORamDiscardListener *vrdl = container_of(scl, VFIORamDiscardListener,
>>> listener.scl) and drop @ rdl? Thanks,
>>
>> Modified. Thanks!
>>
>>>
>>>
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager
  2025-04-09  9:56   ` Alexey Kardashevskiy
@ 2025-04-10  3:47     ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-10  3:47 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/9/2025 5:56 PM, Alexey Kardashevskiy wrote:
> 
> 
> On 7/4/25 17:49, Chenyi Qiang wrote:
>> To manage the private and shared RAM states in confidential VMs,
>> introduce a new class of PrivateShareManager as a child of
> 
> missing "d" in "PrivateShareManager"

Fixed. Thanks!

> 
> 
>> GenericStateManager, which inherits the six interface callbacks. With a
>> different interface type, it can be distinguished from the
>> RamDiscardManager object and provide the flexibility for addressing
>> specific requirements of confidential VMs in the future.
> 
> This is still one bit per page, right? What does "set" mean here -
> private or shared? It is either RamPrivateManager or RamSharedManager imho.

This series only allows one bit per page, let's continue the discussion
for this in patch 04/13.

"Set" just mean the bitmap set. Maybe rename to RamPrivateManager
(corresponding to RamDiscardManager) or CVMRamManager (Confidential VM
Ram Manager) if we want to introduce more states in the future.

> 
> 
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>> Changes in v4:
>>      - Newly added.
>> ---
>>   include/exec/memory.h | 44 +++++++++++++++++++++++++++++++++++++++++--
>>   system/memory.c       | 17 +++++++++++++++++
>>   2 files changed, 59 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 30e5838d02..08f25e5e84 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -55,6 +55,12 @@ typedef struct RamDiscardManager RamDiscardManager;
>>   DECLARE_OBJ_CHECKERS(RamDiscardManager, RamDiscardManagerClass,
>>                        RAM_DISCARD_MANAGER, TYPE_RAM_DISCARD_MANAGER);
>>   +#define TYPE_PRIVATE_SHARED_MANAGER "private-shared-manager"
>> +typedef struct PrivateSharedManagerClass PrivateSharedManagerClass;
>> +typedef struct PrivateSharedManager PrivateSharedManager;
>> +DECLARE_OBJ_CHECKERS(PrivateSharedManager, PrivateSharedManagerClass,
>> +                     PRIVATE_SHARED_MANAGER,
>> TYPE_PRIVATE_SHARED_MANAGER)
>> +
>>   #ifdef CONFIG_FUZZ
>>   void fuzz_dma_read_cb(size_t addr,
>>                         size_t len,
>> @@ -692,6 +698,14 @@ void
>> generic_state_manager_register_listener(GenericStateManager *gsm,
>>   void generic_state_manager_unregister_listener(GenericStateManager
>> *gsm,
>>                                                  StateChangeListener
>> *scl);
>>   +static inline void state_change_listener_init(StateChangeListener
>> *scl,
>> +                                              NotifyStateSet
>> state_set_fn,
>> +                                              NotifyStateClear
>> state_clear_fn)
> 
> This belongs to 04/13 as there is nothing about PrivateSharedManager.
> Thanks,

Will move it. Thanks.

> 
> 
>> +{
>> +    scl->notify_to_state_set = state_set_fn;
>> +    scl->notify_to_state_clear = state_clear_fn;
>> +}
>> +
>>   typedef struct RamDiscardListener RamDiscardListener;
>>     struct RamDiscardListener {
>> @@ -713,8 +727,7 @@ static inline void
>> ram_discard_listener_init(RamDiscardListener *rdl,
>>                                                NotifyStateClear
>> discard_fn,
>>                                                bool
>> double_discard_supported)
>>   {
>> -    rdl->scl.notify_to_state_set = populate_fn;
>> -    rdl->scl.notify_to_state_clear = discard_fn;
>> +    state_change_listener_init(&rdl->scl, populate_fn, discard_fn);
>>       rdl->double_discard_supported = double_discard_supported;
>>   }
>>   @@ -757,6 +770,25 @@ struct RamDiscardManagerClass {
>>       GenericStateManagerClass parent_class;
>>   };
>>   +typedef struct PrivateSharedListener PrivateSharedListener;
>> +struct PrivateSharedListener {
>> +    struct StateChangeListener scl;
>> +
>> +    QLIST_ENTRY(PrivateSharedListener) next;
>> +};
>> +
>> +struct PrivateSharedManagerClass {
>> +    /* private */
>> +    GenericStateManagerClass parent_class;
>> +};
>> +
>> +static inline void private_shared_listener_init(PrivateSharedListener
>> *psl,
>> +                                                NotifyStateSet
>> populate_fn,
>> +                                                NotifyStateClear
>> discard_fn)
>> +{
>> +    state_change_listener_init(&psl->scl, populate_fn, discard_fn);
>> +}
>> +
>>   /**
>>    * memory_get_xlat_addr: Extract addresses from a TLB entry
>>    *
>> @@ -2521,6 +2553,14 @@ int
>> memory_region_set_generic_state_manager(MemoryRegion *mr,
>>    */
>>   bool memory_region_has_ram_discard_manager(MemoryRegion *mr);
>>   +/**
>> + * memory_region_has_private_shared_manager: check whether a
>> #MemoryRegion has a
>> + * #PrivateSharedManager assigned
>> + *
>> + * @mr: the #MemoryRegion
>> + */
>> +bool memory_region_has_private_shared_manager(MemoryRegion *mr);
>> +
>>   /**
>>    * memory_region_find: translate an address/size relative to a
>>    * MemoryRegion into a #MemoryRegionSection.
>> diff --git a/system/memory.c b/system/memory.c
>> index 7b921c66a6..e6e944d9c0 100644
>> --- a/system/memory.c
>> +++ b/system/memory.c
>> @@ -2137,6 +2137,16 @@ bool
>> memory_region_has_ram_discard_manager(MemoryRegion *mr)
>>       return true;
>>   }
>>   +bool memory_region_has_private_shared_manager(MemoryRegion *mr)
>> +{
>> +    if (!memory_region_is_ram(mr) ||
>> +        !object_dynamic_cast(OBJECT(mr->gsm),
>> TYPE_PRIVATE_SHARED_MANAGER)) {
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>   uint64_t generic_state_manager_get_min_granularity(const
>> GenericStateManager *gsm,
>>                                                      const
>> MemoryRegion *mr)
>>   {
>> @@ -3837,12 +3847,19 @@ static const TypeInfo ram_discard_manager_info
>> = {
>>       .class_size         = sizeof(RamDiscardManagerClass),
>>   };
>>   +static const TypeInfo private_shared_manager_info = {
>> +    .parent             = TYPE_GENERIC_STATE_MANAGER,
>> +    .name               = TYPE_PRIVATE_SHARED_MANAGER,
>> +    .class_size         = sizeof(PrivateSharedManagerClass),
>> +};
>> +
>>   static void memory_register_types(void)
>>   {
>>       type_register_static(&memory_region_info);
>>       type_register_static(&iommu_memory_region_info);
>>       type_register_static(&generic_state_manager_info);
>>       type_register_static(&ram_discard_manager_info);
>> +    type_register_static(&private_shared_manager_info);
>>   }
>>     type_init(memory_register_types)
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 06/13] vfio: Add the support for PrivateSharedManager Interface
  2025-04-09  9:58   ` Alexey Kardashevskiy
@ 2025-04-10  5:53     ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-10  5:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/9/2025 5:58 PM, Alexey Kardashevskiy wrote:
> 
> 
> On 7/4/25 17:49, Chenyi Qiang wrote:
>> Subsystems like VFIO previously disabled ram block discard and only
>> allowed coordinated discarding via RamDiscardManager. However,
>> guest_memfd in confidential VMs relies on discard operations for page
>> conversion between private and shared memory. This can lead to stale
>> IOMMU mapping issue when assigning a hardware device to a confidential
>> VM via shared memory. With the introduction of PrivateSharedManager
>> interface to manage private and shared states and being distinct from
>> RamDiscardManager, include PrivateSharedManager in coordinated RAM
>> discard and add related support in VFIO.
> 
> How does the new behavior differ from what
> vfio_register_ram_discard_listener() does? Thanks,

Strictly speaking, there is no particular difference except the embedded
PrivateSharedListener and RamDiscardListener in VFIOXXXListener.

It is possible to extract some common part between
VFIOPrivateSharedListener and VFIORamDiscardListener and some common
part of vfio_register/unregister_xxx_listener(). But I'm not sure if it
can become more concise.

> 
> 
>> Currently, migration support for confidential VMs is not available, so
>> vfio_sync_dirty_bitmap() handling for PrivateSharedListener can be
>> ignored. The register/unregister of PrivateSharedListener is necessary
>> during vfio_listener_region_add/del(). The listener callbacks are
>> similar between RamDiscardListener and PrivateSharedListener, allowing
>> for extraction of common parts opportunisticlly.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>> Changes in v4
>>      - Newly added.
>> ---
>>   hw/vfio/common.c                      | 104 +++++++++++++++++++++++---
>>   hw/vfio/container-base.c              |   1 +
>>   include/hw/vfio/vfio-container-base.h |  10 +++
>>   3 files changed, 105 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 3172d877cc..48468a12c3 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -335,13 +335,9 @@ out:
>>       rcu_read_unlock();
>>   }
>>   -static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
>> -                                            MemoryRegionSection
>> *section)
>> +static void vfio_state_change_notify_to_state_clear(VFIOContainerBase
>> *bcontainer,
>> +                                                   
>> MemoryRegionSection *section)
>>   {
>> -    RamDiscardListener *rdl = container_of(scl, RamDiscardListener,
>> scl);
>> -    VFIORamDiscardListener *vrdl = container_of(rdl,
>> VFIORamDiscardListener,
>> -                                                listener);
>> -    VFIOContainerBase *bcontainer = vrdl->bcontainer;
>>       const hwaddr size = int128_get64(section->size);
>>       const hwaddr iova = section->offset_within_address_space;
>>       int ret;
>> @@ -354,13 +350,28 @@ static void
>> vfio_ram_discard_notify_discard(StateChangeListener *scl,
>>       }
>>   }
>>   -static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
>> +static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
>>                                               MemoryRegionSection
>> *section)
>>   {
>>       RamDiscardListener *rdl = container_of(scl, RamDiscardListener,
>> scl);
>>       VFIORamDiscardListener *vrdl = container_of(rdl,
>> VFIORamDiscardListener,
>>                                                   listener);
>> -    VFIOContainerBase *bcontainer = vrdl->bcontainer;
>> +    vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section);
>> +}
>> +
>> +static void vfio_private_shared_notify_to_private(StateChangeListener
>> *scl,
>> +                                                  MemoryRegionSection
>> *section)
>> +{
>> +    PrivateSharedListener *psl = container_of(scl,
>> PrivateSharedListener, scl);
>> +    VFIOPrivateSharedListener *vpsl = container_of(psl,
>> VFIOPrivateSharedListener,
>> +                                                   listener);
>> +    vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section);
>> +}
>> +
>> +static int vfio_state_change_notify_to_state_set(VFIOContainerBase
>> *bcontainer,
>> +                                                 MemoryRegionSection
>> *section,
>> +                                                 uint64_t granularity)
>> +{
>>       const hwaddr end = section->offset_within_region +
>>                          int128_get64(section->size);
>>       hwaddr start, next, iova;
>> @@ -372,7 +383,7 @@ static int
>> vfio_ram_discard_notify_populate(StateChangeListener *scl,
>>        * unmap in minimum granularity later.
>>        */
>>       for (start = section->offset_within_region; start < end; start =
>> next) {
>> -        next = ROUND_UP(start + 1, vrdl->granularity);
>> +        next = ROUND_UP(start + 1, granularity);
>>           next = MIN(next, end);
>>             iova = start - section->offset_within_region +
>> @@ -383,13 +394,33 @@ static int
>> vfio_ram_discard_notify_populate(StateChangeListener *scl,
>>                                        vaddr, section->readonly);
>>           if (ret) {
>>               /* Rollback */
>> -            vfio_ram_discard_notify_discard(scl, section);
>> +            vfio_state_change_notify_to_state_clear(bcontainer,
>> section);
>>               return ret;
>>           }
>>       }
>>       return 0;
>>   }
>>   +static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
>> +                                            MemoryRegionSection
>> *section)
>> +{
>> +    RamDiscardListener *rdl = container_of(scl, RamDiscardListener,
>> scl);
>> +    VFIORamDiscardListener *vrdl = container_of(rdl,
>> VFIORamDiscardListener,
>> +                                                listener);
>> +    return vfio_state_change_notify_to_state_set(vrdl->bcontainer,
>> section,
>> +                                                 vrdl->granularity);
>> +}
>> +
>> +static int vfio_private_shared_notify_to_shared(StateChangeListener
>> *scl,
>> +                                                MemoryRegionSection
>> *section)
>> +{
>> +    PrivateSharedListener *psl = container_of(scl,
>> PrivateSharedListener, scl);
>> +    VFIOPrivateSharedListener *vpsl = container_of(psl,
>> VFIOPrivateSharedListener,
>> +                                                   listener);
>> +    return vfio_state_change_notify_to_state_set(vpsl->bcontainer,
>> section,
>> +                                                 vpsl->granularity);
>> +}
>> +
>>   static void vfio_register_ram_discard_listener(VFIOContainerBase
>> *bcontainer,
>>                                                  MemoryRegionSection
>> *section)
>>   {
>> @@ -466,6 +497,27 @@ static void
>> vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
>>       }
>>   }
>>   +static void vfio_register_private_shared_listener(VFIOContainerBase
>> *bcontainer,
>> +                                                  MemoryRegionSection
>> *section)
>> +{
>> +    GenericStateManager *gsm =
>> memory_region_get_generic_state_manager(section->mr);
>> +    VFIOPrivateSharedListener *vpsl;
>> +    PrivateSharedListener *psl;
>> +
>> +    vpsl = g_new0(VFIOPrivateSharedListener, 1);
>> +    vpsl->bcontainer = bcontainer;
>> +    vpsl->mr = section->mr;
>> +    vpsl->offset_within_address_space = section-
>> >offset_within_address_space;
>> +    vpsl->granularity = generic_state_manager_get_min_granularity(gsm,
>> +                                                                 
>> section->mr);
>> +
>> +    psl = &vpsl->listener;
>> +    private_shared_listener_init(psl,
>> vfio_private_shared_notify_to_shared,
>> +                                 vfio_private_shared_notify_to_private);
>> +    generic_state_manager_register_listener(gsm, &psl->scl, section);
>> +    QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next);
>> +}
>> +
>>   static void vfio_unregister_ram_discard_listener(VFIOContainerBase
>> *bcontainer,
>>                                                    MemoryRegionSection
>> *section)
>>   {
>> @@ -491,6 +543,31 @@ static void
>> vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
>>       g_free(vrdl);
>>   }
>>   +static void
>> vfio_unregister_private_shared_listener(VFIOContainerBase *bcontainer,
>> +                                                   
>> MemoryRegionSection *section)
>> +{
>> +    GenericStateManager *gsm =
>> memory_region_get_generic_state_manager(section->mr);
>> +    VFIOPrivateSharedListener *vpsl = NULL;
>> +    PrivateSharedListener *psl;
>> +
>> +    QLIST_FOREACH(vpsl, &bcontainer->vpsl_list, next) {
>> +        if (vpsl->mr == section->mr &&
>> +            vpsl->offset_within_address_space ==
>> +            section->offset_within_address_space) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (!vpsl) {
>> +        hw_error("vfio: Trying to unregister missing RAM discard
>> listener");
>> +    }
>> +
>> +    psl = &vpsl->listener;
>> +    generic_state_manager_unregister_listener(gsm, &psl->scl);
>> +    QLIST_REMOVE(vpsl, next);
>> +    g_free(vpsl);
>> +}
>> +
>>   static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
>>   {
>>       MemoryRegion *mr = section->mr;
>> @@ -644,6 +721,9 @@ static void
>> vfio_listener_region_add(MemoryListener *listener,
>>       if (memory_region_has_ram_discard_manager(section->mr)) {
>>           vfio_register_ram_discard_listener(bcontainer, section);
>>           return;
>> +    } else if (memory_region_has_private_shared_manager(section->mr)) {
>> +        vfio_register_private_shared_listener(bcontainer, section);
>> +        return;
>>       }
>>         vaddr = memory_region_get_ram_ptr(section->mr) +
>> @@ -764,6 +844,10 @@ static void
>> vfio_listener_region_del(MemoryListener *listener,
>>           vfio_unregister_ram_discard_listener(bcontainer, section);
>>           /* Unregistering will trigger an unmap. */
>>           try_unmap = false;
>> +    } else if (memory_region_has_private_shared_manager(section->mr)) {
>> +        vfio_unregister_private_shared_listener(bcontainer, section);
>> +        /* Unregistering will trigger an unmap. */
>> +        try_unmap = false;
>>       }
>>         if (try_unmap) {
>> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
>> index 749a3fd29d..ff5df925c2 100644
>> --- a/hw/vfio/container-base.c
>> +++ b/hw/vfio/container-base.c
>> @@ -135,6 +135,7 @@ static void vfio_container_instance_init(Object *obj)
>>       bcontainer->iova_ranges = NULL;
>>       QLIST_INIT(&bcontainer->giommu_list);
>>       QLIST_INIT(&bcontainer->vrdl_list);
>> +    QLIST_INIT(&bcontainer->vpsl_list);
>>   }
>>     static const TypeInfo types[] = {
>> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/
>> vfio-container-base.h
>> index 4cff9943ab..8d7c0b1179 100644
>> --- a/include/hw/vfio/vfio-container-base.h
>> +++ b/include/hw/vfio/vfio-container-base.h
>> @@ -47,6 +47,7 @@ typedef struct VFIOContainerBase {
>>       bool dirty_pages_started; /* Protected by BQL */
>>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>>       QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
>> +    QLIST_HEAD(, VFIOPrivateSharedListener) vpsl_list;
>>       QLIST_ENTRY(VFIOContainerBase) next;
>>       QLIST_HEAD(, VFIODevice) device_list;
>>       GList *iova_ranges;
>> @@ -71,6 +72,15 @@ typedef struct VFIORamDiscardListener {
>>       QLIST_ENTRY(VFIORamDiscardListener) next;
>>   } VFIORamDiscardListener;
>>   +typedef struct VFIOPrivateSharedListener {
>> +    VFIOContainerBase *bcontainer;
>> +    MemoryRegion *mr;
>> +    hwaddr offset_within_address_space;
>> +    uint64_t granularity;
>> +    PrivateSharedListener listener;
>> +    QLIST_ENTRY(VFIOPrivateSharedListener) next;
>> +} VFIOPrivateSharedListener;
>> +
>>   int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>>                              hwaddr iova, ram_addr_t size,
>>                              void *vaddr, bool readonly);
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd
  2025-04-09  9:57   ` Alexey Kardashevskiy
@ 2025-04-10  7:37     ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-10  7:37 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/9/2025 5:57 PM, Alexey Kardashevskiy wrote:
> 
> 
> On 7/4/25 17:49, Chenyi Qiang wrote:
>> Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
>> discard") highlighted that subsystems like VFIO may disable RAM block
>> discard. However, guest_memfd relies on discard operations for page
>> conversion between private and shared memory, potentially leading to
>> stale IOMMU mapping issue when assigning hardware devices to
>> confidential VMs via shared memory. To address this, it is crucial to
>> ensure systems like VFIO refresh its IOMMU mappings.
>>
>> PrivateSharedManager is introduced to manage private and shared states in
>> confidential VMs, similar to RamDiscardManager, which supports
>> coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
>> guest_memfd can facilitate the adjustment of VFIO mappings in response
>> to page conversion events.
>>
>> Since guest_memfd is not an object, it cannot directly implement the
>> PrivateSharedManager interface. Implementing it in HostMemoryBackend is
>> not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
>> have a memory backend while others do not. 
> 
> HostMemoryBackend::mr::ram_block::guest_memfd?
> And there is HostMemoryBackendMemfd too.

HostMemoryBackend is the parent of HostMemoryBackendMemfd. It is also
possible to use HostMemoryBackendFile or HostMemoryBackendRAM.

> 
>> Notably, virtual BIOS
>> RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
>> backend.
> 
> I thought private memory can be allocated from guest_memfd only. And it
> is still not clear if this BIOS memory can be discarded or not, does it
> change state during the VM lifetime?
> (sorry I keep asking but I do not remember definitive answer).

The BIOS region supports conversion as it is backed by guest_memfd. It
can change the state but it never does during VM lifetime.

> 
>> To manage RAMBlocks with guest_memfd, define a new object named
>> RamBlockAttribute to implement the RamDiscardManager interface. This
>> object stores guest_memfd information such as shared_bitmap, and handles
>> page conversion notification. The memory state is tracked at the host
>> page size granularity, as the minimum memory conversion size can be one
>> page per request. Additionally, VFIO expects the DMA mapping for a
>> specific iova to be mapped and unmapped with the same granularity.
>> Confidential VMs may perform partial conversions, such as conversions on
>> small regions within larger regions. To prevent invalid cases and until
>> cut_mapping operation support is available, all operations are performed
>> with 4K granularity.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>> Changes in v4:
>>      - Change the name from memory-attribute-manager to
>>        ram-block-attribute.
>>      - Implement the newly-introduced PrivateSharedManager instead of
>>        RamDiscardManager and change related commit message.
>>      - Define the new object in ramblock.h instead of adding a new file.
>>
>> Changes in v3:
>>      - Some rename (bitmap_size->shared_bitmap_size,
>>        first_one/zero_bit->first_bit, etc.)
>>      - Change shared_bitmap_size from uint32_t to unsigned
>>      - Return mgr->mr->ram_block->page_size in get_block_size()
>>      - Move set_ram_discard_manager() up to avoid a g_free() in failure
>>        case.
>>      - Add const for the memory_attribute_manager_get_block_size()
>>      - Unify the ReplayRamPopulate and ReplayRamDiscard and related
>>        callback.
>>
>> Changes in v2:
>>      - Rename the object name to MemoryAttributeManager
>>      - Rename the bitmap to shared_bitmap to make it more clear.
>>      - Remove block_size field and get it from a helper. In future, we
>>        can get the page_size from RAMBlock if necessary.
>>      - Remove the unncessary "struct" before GuestMemfdReplayData
>>      - Remove the unncessary g_free() for the bitmap
>>      - Add some error report when the callback failure for
>>        populated/discarded section.
>>      - Move the realize()/unrealize() definition to this patch.
>> ---
>>   include/exec/ramblock.h      |  24 +++
>>   system/meson.build           |   1 +
>>   system/ram-block-attribute.c | 282 +++++++++++++++++++++++++++++++++++
>>   3 files changed, 307 insertions(+)
>>   create mode 100644 system/ram-block-attribute.c
>>
>> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
>> index 0babd105c0..b8b5469db9 100644
>> --- a/include/exec/ramblock.h
>> +++ b/include/exec/ramblock.h
>> @@ -23,6 +23,10 @@
>>   #include "cpu-common.h"
>>   #include "qemu/rcu.h"
>>   #include "exec/ramlist.h"
>> +#include "system/hostmem.h"
>> +
>> +#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
>> +OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass,
>> RAM_BLOCK_ATTRIBUTE)
>>     struct RAMBlock {
>>       struct rcu_head rcu;
>> @@ -90,5 +94,25 @@ struct RAMBlock {
>>        */
>>       ram_addr_t postcopy_length;
>>   };
>> +
>> +struct RamBlockAttribute {
>> +    Object parent;
>> +
>> +    MemoryRegion *mr;
>> +
>> +    /* 1-setting of the bit represents the memory is populated
>> (shared) */
> 
> It is either RamBlockShared, or it is a "generic" RamBlockAttribute
> implementing a bitmap with a bit per page and no special meaning
> (shared/private or discarded/populated). And if it is a generic
> RamBlockAttribute, then this hunk from 09/13 (which should be in this
> patch) should look like:
> 
> 
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -46,6 +46,7 @@ struct RAMBlock {
>      int fd;
>      uint64_t fd_offset;
>      int guest_memfd;
> +    RamBlockAttribute *ram_shared; // and not "ram_block_attribute"
> 
> Thanks,

I prefer generic RamBlockAttribute as we can extend to manage
private/shared/discarded states in the future if necessary. With the
same reason, I hope to keep the variable name of ram_block_attribute.

> 
> 
>> +    unsigned shared_bitmap_size;
>> +    unsigned long *shared_bitmap;
>> +
>> +    QLIST_HEAD(, PrivateSharedListener) psl_list;
>> +};
>> +
>> +struct RamBlockAttributeClass {
>> +    ObjectClass parent_class;
>> +};
>> +
>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion
>> *mr);
>> +void ram_block_attribute_unrealize(RamBlockAttribute *attr);
>> +
>>   #endif
>>   #endif
>> diff --git a/system/meson.build b/system/meson.build
>> index 4952f4b2c7..50a5a64f1c 100644
>> --- a/system/meson.build
>> +++ b/system/meson.build
>> @@ -15,6 +15,7 @@ system_ss.add(files(
>>     'dirtylimit.c',
>>     'dma-helpers.c',
>>     'globals.c',
>> +  'ram-block-attribute.c',
>>     'memory_mapping.c',
>>     'qdev-monitor.c',
>>     'qtest.c',
>> diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
>> new file mode 100644
>> index 0000000000..283c03b354
>> --- /dev/null
>> +++ b/system/ram-block-attribute.c
>> @@ -0,0 +1,282 @@
>> +/*
>> + * QEMU ram block attribute
>> + *
>> + * Copyright Intel
>> + *
>> + * Author:
>> + *      Chenyi Qiang <chenyi.qiang@intel.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> later.
>> + * See the COPYING file in the top-level directory
>> + *
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>> +#include "exec/ramblock.h"
>> +
>> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(RamBlockAttribute,
>> +                                   ram_block_attribute,
>> +                                   RAM_BLOCK_ATTRIBUTE,
>> +                                   OBJECT,
>> +                                   { TYPE_PRIVATE_SHARED_MANAGER },
>> +                                   { })
>> +
>> +static size_t ram_block_attribute_get_block_size(const
>> RamBlockAttribute *attr)
>> +{
>> +    /*
>> +     * Because page conversion could be manipulated in the size of at
>> least 4K or 4K aligned,
>> +     * Use the host page size as the granularity to track the memory
>> attribute.
>> +     */
>> +    g_assert(attr && attr->mr && attr->mr->ram_block);
>> +    g_assert(attr->mr->ram_block->page_size ==
>> qemu_real_host_page_size());
>> +    return attr->mr->ram_block->page_size;
>> +}
>> +
>> +
>> +static bool ram_block_attribute_psm_is_shared(const
>> GenericStateManager *gsm,
>> +                                              const
>> MemoryRegionSection *section)
>> +{
>> +    const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    const int block_size = ram_block_attribute_get_block_size(attr);
>> +    uint64_t first_bit = section->offset_within_region / block_size;
>> +    uint64_t last_bit = first_bit + int128_get64(section->size) /
>> block_size - 1;
>> +    unsigned long first_discard_bit;
>> +
>> +    first_discard_bit = find_next_zero_bit(attr->shared_bitmap,
>> last_bit + 1, first_bit);
>> +    return first_discard_bit > last_bit;
>> +}
>> +
>> +typedef int (*ram_block_attribute_section_cb)(MemoryRegionSection *s,
>> void *arg);
>> +
>> +static int ram_block_attribute_notify_shared_cb(MemoryRegionSection
>> *section, void *arg)
>> +{
>> +    StateChangeListener *scl = arg;
>> +
>> +    return scl->notify_to_state_set(scl, section);
>> +}
>> +
>> +static int ram_block_attribute_notify_private_cb(MemoryRegionSection
>> *section, void *arg)
>> +{
>> +    StateChangeListener *scl = arg;
>> +
>> +    scl->notify_to_state_clear(scl, section);
>> +    return 0;
>> +}
>> +
>> +static int ram_block_attribute_for_each_shared_section(const
>> RamBlockAttribute *attr,
>> +                                                      
>> MemoryRegionSection *section,
>> +                                                       void *arg,
>> +                                                      
>> ram_block_attribute_section_cb cb)
>> +{
>> +    unsigned long first_bit, last_bit;
>> +    uint64_t offset, size;
>> +    const int block_size = ram_block_attribute_get_block_size(attr);
>> +    int ret = 0;
>> +
>> +    first_bit = section->offset_within_region / block_size;
>> +    first_bit = find_next_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size, first_bit);
>> +
>> +    while (first_bit < attr->shared_bitmap_size) {
>> +        MemoryRegionSection tmp = *section;
>> +
>> +        offset = first_bit * block_size;
>> +        last_bit = find_next_zero_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> +                                      first_bit + 1) - 1;
>> +        size = (last_bit - first_bit + 1) * block_size;
>> +
>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>> size)) {
>> +            break;
>> +        }
>> +
>> +        ret = cb(&tmp, arg);
>> +        if (ret) {
>> +            error_report("%s: Failed to notify RAM discard listener:
>> %s", __func__,
>> +                         strerror(-ret));
>> +            break;
>> +        }
>> +
>> +        first_bit = find_next_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> +                                  last_bit + 2);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static int ram_block_attribute_for_each_private_section(const
>> RamBlockAttribute *attr,
>> +                                                       
>> MemoryRegionSection *section,
>> +                                                        void *arg,
>> +                                                       
>> ram_block_attribute_section_cb cb)
>> +{
>> +    unsigned long first_bit, last_bit;
>> +    uint64_t offset, size;
>> +    const int block_size = ram_block_attribute_get_block_size(attr);
>> +    int ret = 0;
>> +
>> +    first_bit = section->offset_within_region / block_size;
>> +    first_bit = find_next_zero_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> +                                   first_bit);
>> +
>> +    while (first_bit < attr->shared_bitmap_size) {
>> +        MemoryRegionSection tmp = *section;
>> +
>> +        offset = first_bit * block_size;
>> +        last_bit = find_next_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> +                                      first_bit + 1) - 1;
>> +        size = (last_bit - first_bit + 1) * block_size;
>> +
>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>> size)) {
>> +            break;
>> +        }
>> +
>> +        ret = cb(&tmp, arg);
>> +        if (ret) {
>> +            error_report("%s: Failed to notify RAM discard listener:
>> %s", __func__,
>> +                         strerror(-ret));
>> +            break;
>> +        }
>> +
>> +        first_bit = find_next_zero_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> +                                       last_bit + 2);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static uint64_t ram_block_attribute_psm_get_min_granularity(const
>> GenericStateManager *gsm,
>> +                                                            const
>> MemoryRegion *mr)
>> +{
>> +    const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +
>> +    g_assert(mr == attr->mr);
>> +    return ram_block_attribute_get_block_size(attr);
>> +}
>> +
>> +static void
>> ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
>> +                                                     
>> StateChangeListener *scl,
>> +                                                     
>> MemoryRegionSection *section)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    PrivateSharedListener *psl = container_of(scl,
>> PrivateSharedListener, scl);
>> +    int ret;
>> +
>> +    g_assert(section->mr == attr->mr);
>> +    scl->section = memory_region_section_new_copy(section);
>> +
>> +    QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
>> +
>> +    ret = ram_block_attribute_for_each_shared_section(attr, section,
>> scl,
>> +                                                     
>> ram_block_attribute_notify_shared_cb);
>> +    if (ret) {
>> +        error_report("%s: Failed to register RAM discard listener:
>> %s", __func__,
>> +                     strerror(-ret));
>> +    }
>> +}
>> +
>> +static void
>> ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm,
>> +                                                       
>> StateChangeListener *scl)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    PrivateSharedListener *psl = container_of(scl,
>> PrivateSharedListener, scl);
>> +    int ret;
>> +
>> +    g_assert(scl->section);
>> +    g_assert(scl->section->mr == attr->mr);
>> +
>> +    ret = ram_block_attribute_for_each_shared_section(attr, scl-
>> >section, scl,
>> +                                                     
>> ram_block_attribute_notify_private_cb);
>> +    if (ret) {
>> +        error_report("%s: Failed to unregister RAM discard listener:
>> %s", __func__,
>> +                     strerror(-ret));
>> +    }
>> +
>> +    memory_region_section_free_copy(scl->section);
>> +    scl->section = NULL;
>> +    QLIST_REMOVE(psl, next);
>> +}
>> +
>> +typedef struct RamBlockAttributeReplayData {
>> +    ReplayStateChange fn;
>> +    void *opaque;
>> +} RamBlockAttributeReplayData;
>> +
>> +static int ram_block_attribute_psm_replay_cb(MemoryRegionSection
>> *section, void *arg)
>> +{
>> +    RamBlockAttributeReplayData *data = arg;
>> +
>> +    return data->fn(section, data->opaque);
>> +}
>> +
>> +static int ram_block_attribute_psm_replay_on_shared(const
>> GenericStateManager *gsm,
>> +                                                   
>> MemoryRegionSection *section,
>> +                                                    ReplayStateChange
>> replay_fn,
>> +                                                    void *opaque)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque =
>> opaque };
>> +
>> +    g_assert(section->mr == attr->mr);
>> +    return ram_block_attribute_for_each_shared_section(attr, section,
>> &data,
>> +                                                      
>> ram_block_attribute_psm_replay_cb);
>> +}
>> +
>> +static int ram_block_attribute_psm_replay_on_private(const
>> GenericStateManager *gsm,
>> +                                                    
>> MemoryRegionSection *section,
>> +                                                    
>> ReplayStateChange replay_fn,
>> +                                                     void *opaque)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque =
>> opaque };
>> +
>> +    g_assert(section->mr == attr->mr);
>> +    return ram_block_attribute_for_each_private_section(attr,
>> section, &data,
>> +                                                       
>> ram_block_attribute_psm_replay_cb);
>> +}
>> +
>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion
>> *mr)
>> +{
>> +    uint64_t shared_bitmap_size;
>> +    const int block_size  = qemu_real_host_page_size();
>> +    int ret;
>> +
>> +    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
>> +
>> +    attr->mr = mr;
>> +    ret = memory_region_set_generic_state_manager(mr,
>> GENERIC_STATE_MANAGER(attr));
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +    attr->shared_bitmap_size = shared_bitmap_size;
>> +    attr->shared_bitmap = bitmap_new(shared_bitmap_size);
>> +
>> +    return ret;
>> +}
>> +
>> +void ram_block_attribute_unrealize(RamBlockAttribute *attr)
>> +{
>> +    g_free(attr->shared_bitmap);
>> +    memory_region_set_generic_state_manager(attr->mr, NULL);
>> +}
>> +
>> +static void ram_block_attribute_init(Object *obj)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj);
>> +
>> +    QLIST_INIT(&attr->psl_list);
>> +}
>> +
>> +static void ram_block_attribute_finalize(Object *obj)
>> +{
>> +}
>> +
>> +static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
>> +{
>> +    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc);
>> +
>> +    gsmc->get_min_granularity =
>> ram_block_attribute_psm_get_min_granularity;
>> +    gsmc->register_listener = ram_block_attribute_psm_register_listener;
>> +    gsmc->unregister_listener =
>> ram_block_attribute_psm_unregister_listener;
>> +    gsmc->is_state_set = ram_block_attribute_psm_is_shared;
>> +    gsmc->replay_on_state_set =
>> ram_block_attribute_psm_replay_on_shared;
>> +    gsmc->replay_on_state_clear =
>> ram_block_attribute_psm_replay_on_private;
>> +}
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager
  2025-04-10  1:44         ` Chenyi Qiang
@ 2025-04-16  3:32           ` Chenyi Qiang
  2025-04-17 23:10             ` Alexey Kardashevskiy
  2025-04-25 12:54             ` David Hildenbrand
  0 siblings, 2 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-16  3:32 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/10/2025 9:44 AM, Chenyi Qiang wrote:
> 
> 
> On 4/10/2025 8:11 AM, Alexey Kardashevskiy wrote:
>>
>>
>> On 9/4/25 22:57, Chenyi Qiang wrote:
>>>
>>>
>>> On 4/9/2025 5:56 PM, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 7/4/25 17:49, Chenyi Qiang wrote:
>>>>> RamDiscardManager is an interface used by virtio-mem to adjust VFIO
>>>>> mappings in relation to VM page assignment. It manages the state of
>>>>> populated and discard for the RAM. To accommodate future scnarios for
>>>>> managing RAM states, such as private and shared states in confidential
>>>>> VMs, the existing RamDiscardManager interface needs to be generalized.
>>>>>
>>>>> Introduce a parent class, GenericStateManager, to manage a pair of
>>>>
>>>> "GenericState" is the same as "State" really. Call it RamStateManager.
>>>
>>> OK to me.
>>
>> Sorry, nah. "Generic" would mean "machine" in QEMU.
> 
> OK, anyway, I can rename to RamStateManager if we follow this direction.
> 
>>
>>
>>>>
>>>>
>>>>> opposite states with RamDiscardManager as its child. The changes
>>>>> include
>>>>> - Define a new abstract class GenericStateChange.
>>>>> - Extract six callbacks into GenericStateChangeClass and allow the
>>>>> child
>>>>>     classes to inherit them.
>>>>> - Modify RamDiscardManager-related helpers to use GenericStateManager
>>>>>     ones.
>>>>> - Define a generic StatChangeListener to extract fields from
>>>>
>>>> "e" missing in StateChangeListener.
>>>
>>> Fixed. Thanks.
>>>
>>>>
>>>>>     RamDiscardManager listener which allows future listeners to
>>>>> embed it
>>>>>     and avoid duplication.
>>>>> - Change the users of RamDiscardManager (virtio-mem, migration,
>>>>> etc.) to
>>>>>     switch to use GenericStateChange helpers.
>>>>>
>>>>> It can provide a more flexible and resuable framework for RAM state
>>>>> management, facilitating future enhancements and use cases.
>>>>
>>>> I fail to see how new interface helps with this. RamDiscardManager
>>>> manipulates populated/discarded. It would make sense may be if the new
>>>> class had more bits per page, say private/shared/discarded but it does
>>>> not. And PrivateSharedManager cannot coexist with RamDiscard. imho this
>>>> is going in a wrong direction.
>>>
>>> I think we have two questions here:
>>>
>>> 1. whether we should define an abstract parent class and distinguish the
>>> RamDiscardManager and PrivateSharedManager?
>>
>> If it is 1 bit per page with the meaning "1 == populated == shared",
>> then no, one class will do.
> 
> Not restrict to 1 bit per page. As mentioned in questions 2, the parent
> class can be more generic, e.g. only including
> register/unregister_listener().
> 
> Like in this way:
> 
> The parent class:
> 
> struct StateChangeListener {
>     MemoryRegionSection *section;
> }
> 
> struct RamStateManagerClass {
>     void (*register_listener)();
>     void (*unregister_listener)();
> }
> 
> The child class:
> 
> 1. RamDiscardManager
> 
> struct RamDiscardListener {
>     StateChangeListener scl;
>     NotifyPopulate notify_populate;
>     NotifyDiscard notify_discard;
>     bool double_discard_supported;
> 
>     QLIST_ENTRY(RamDiscardListener) next;
> }
> 
> struct RamDiscardManagerClass {
>     RamStateManagerClass parent_class;
>     uint64_t (*get_min_granularity)();
>     bool (*is_populate)();
>     bool (*replay_populate)();
>     bool (*replay_discard)();
> }
> 
> 2. PrivateSharedManager (or other name like ConfidentialRamManager?)
> 
> struct PrivateSharedListener {
>     StateChangeListener scl;
>     NotifyShared notify_shared;
>     NotifyPrivate notify_private;
>     int priority;
> 
>     QLIST_ENTRY(PrivateSharedListener) next;
> }
> 
> struct PrivateSharedManagerClass {
>     RamStateManagerClass parent_class;
>     uint64_t (*get_min_granularity)();
>     bool (*is_shared)();
>     // No need to define replay_private/replay_shared as no use case at
> present.
> }
> 
> In the future, if we want to manage three states, we can only extend
> PrivateSharedManagerClass/PrivateSharedListener.

Hi Alexey & David,

Any thoughts on this proposal?

> 
>>
>>
>>> I vote for this. First, After making the distinction, the
>>> PrivateSharedManager won't go into the RamDiscardManager path which
>>> PrivateSharedManager may have not supported yet. e.g. the migration
>>> related path. In addtional, we can extend the PrivateSharedManager for
>>> specific handling, e.g. the priority listener, state_change() callback.
>>>
>>> 2. How we should abstract the parent class?
>>>
>>> I think this is the problem. My current implementation extracts all the
>>> callbacks in RamDiscardManager into the parent class and call them
>>> state_set and state_clear, which can only manage a pair of opposite
>>> states. As you mentioned, there could be private/shared/discarded three
>>> states in the future, which is not compatible with current design. Maybe
>>> we can make the parent class more generic, e.g. only extract the
>>> register/unregister_listener() into it.
>>
>> Or we could rename RamDiscardManager to RamStateManager, implement 2bit
>> per page (0 = discarded, 1 = populated+shared, 2 = populated+private).
>> Eventually we will have to deal with the mix of private and shared
>> mappings for the same device, how 1 bit per page is going to work? Thanks,
> 
> Only renaming RamDiscardManager seems not sufficient. Current
> RamDiscardManagerClass can only manage two states. For example, its
> callback functions only have the name of xxx_populate and xxx_discard.
> If we want to extend it to manage three states, we have to modify those
> callbacks, e.g. add some new argument like is_populate(bool is_private),
> or define some new callbacks like is_populate_private(). It will make
> this class more complicated, but actually not necessary in legacy VMs
> without the concept of private/shared.
> 



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager
  2025-04-16  3:32           ` Chenyi Qiang
@ 2025-04-17 23:10             ` Alexey Kardashevskiy
  2025-04-18  3:49               ` Chenyi Qiang
  2025-04-25 12:54             ` David Hildenbrand
  1 sibling, 1 reply; 67+ messages in thread
From: Alexey Kardashevskiy @ 2025-04-17 23:10 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 16/4/25 13:32, Chenyi Qiang wrote:
> 
> 
> On 4/10/2025 9:44 AM, Chenyi Qiang wrote:
>>
>>
>> On 4/10/2025 8:11 AM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 9/4/25 22:57, Chenyi Qiang wrote:
>>>>
>>>>
>>>> On 4/9/2025 5:56 PM, Alexey Kardashevskiy wrote:
>>>>>
>>>>>
>>>>> On 7/4/25 17:49, Chenyi Qiang wrote:
>>>>>> RamDiscardManager is an interface used by virtio-mem to adjust VFIO
>>>>>> mappings in relation to VM page assignment. It manages the state of
>>>>>> populated and discard for the RAM. To accommodate future scnarios for
>>>>>> managing RAM states, such as private and shared states in confidential
>>>>>> VMs, the existing RamDiscardManager interface needs to be generalized.
>>>>>>
>>>>>> Introduce a parent class, GenericStateManager, to manage a pair of
>>>>>
>>>>> "GenericState" is the same as "State" really. Call it RamStateManager.
>>>>
>>>> OK to me.
>>>
>>> Sorry, nah. "Generic" would mean "machine" in QEMU.
>>
>> OK, anyway, I can rename to RamStateManager if we follow this direction.
>>
>>>
>>>
>>>>>
>>>>>
>>>>>> opposite states with RamDiscardManager as its child. The changes
>>>>>> include
>>>>>> - Define a new abstract class GenericStateChange.
>>>>>> - Extract six callbacks into GenericStateChangeClass and allow the
>>>>>> child
>>>>>>      classes to inherit them.
>>>>>> - Modify RamDiscardManager-related helpers to use GenericStateManager
>>>>>>      ones.
>>>>>> - Define a generic StatChangeListener to extract fields from
>>>>>
>>>>> "e" missing in StateChangeListener.
>>>>
>>>> Fixed. Thanks.
>>>>
>>>>>
>>>>>>      RamDiscardManager listener which allows future listeners to
>>>>>> embed it
>>>>>>      and avoid duplication.
>>>>>> - Change the users of RamDiscardManager (virtio-mem, migration,
>>>>>> etc.) to
>>>>>>      switch to use GenericStateChange helpers.
>>>>>>
>>>>>> It can provide a more flexible and resuable framework for RAM state
>>>>>> management, facilitating future enhancements and use cases.
>>>>>
>>>>> I fail to see how new interface helps with this. RamDiscardManager
>>>>> manipulates populated/discarded. It would make sense may be if the new
>>>>> class had more bits per page, say private/shared/discarded but it does
>>>>> not. And PrivateSharedManager cannot coexist with RamDiscard. imho this
>>>>> is going in a wrong direction.
>>>>
>>>> I think we have two questions here:
>>>>
>>>> 1. whether we should define an abstract parent class and distinguish the
>>>> RamDiscardManager and PrivateSharedManager?
>>>
>>> If it is 1 bit per page with the meaning "1 == populated == shared",
>>> then no, one class will do.
>>
>> Not restrict to 1 bit per page. As mentioned in questions 2, the parent
>> class can be more generic, e.g. only including
>> register/unregister_listener().
>>
>> Like in this way:
>>
>> The parent class:
>>
>> struct StateChangeListener {
>>      MemoryRegionSection *section;
>> }
>>
>> struct RamStateManagerClass {
>>      void (*register_listener)();
>>      void (*unregister_listener)();
>> }
>>
>> The child class:
>>
>> 1. RamDiscardManager
>>
>> struct RamDiscardListener {
>>      StateChangeListener scl;
>>      NotifyPopulate notify_populate;
>>      NotifyDiscard notify_discard;
>>      bool double_discard_supported;
>>
>>      QLIST_ENTRY(RamDiscardListener) next;
>> }
>>
>> struct RamDiscardManagerClass {
>>      RamStateManagerClass parent_class;
>>      uint64_t (*get_min_granularity)();
>>      bool (*is_populate)();
>>      bool (*replay_populate)();
>>      bool (*replay_discard)();
>> }
>>
>> 2. PrivateSharedManager (or other name like ConfidentialRamManager?)
>>
>> struct PrivateSharedListener {
>>      StateChangeListener scl;
>>      NotifyShared notify_shared;
>>      NotifyPrivate notify_private;
>>      int priority;
>>
>>      QLIST_ENTRY(PrivateSharedListener) next;
>> }
>>
>> struct PrivateSharedManagerClass {
>>      RamStateManagerClass parent_class;
>>      uint64_t (*get_min_granularity)();
>>      bool (*is_shared)();
>>      // No need to define replay_private/replay_shared as no use case at
>> present.
>> }
>>
>> In the future, if we want to manage three states, we can only extend
>> PrivateSharedManagerClass/PrivateSharedListener.
> 
> Hi Alexey & David,
> 
> Any thoughts on this proposal?


Sorry it is taking a while, I'll comment after the holidays. It is just a bit hard to follow how we started with just 1 patch and ended up with 13 patches with no clear answer why. Thanks,


> 
>>
>>>
>>>
>>>> I vote for this. First, After making the distinction, the
>>>> PrivateSharedManager won't go into the RamDiscardManager path which
>>>> PrivateSharedManager may have not supported yet. e.g. the migration
>>>> related path. In addtional, we can extend the PrivateSharedManager for
>>>> specific handling, e.g. the priority listener, state_change() callback.
>>>>
>>>> 2. How we should abstract the parent class?
>>>>
>>>> I think this is the problem. My current implementation extracts all the
>>>> callbacks in RamDiscardManager into the parent class and call them
>>>> state_set and state_clear, which can only manage a pair of opposite
>>>> states. As you mentioned, there could be private/shared/discarded three
>>>> states in the future, which is not compatible with current design. Maybe
>>>> we can make the parent class more generic, e.g. only extract the
>>>> register/unregister_listener() into it.
>>>
>>> Or we could rename RamDiscardManager to RamStateManager, implement 2bit
>>> per page (0 = discarded, 1 = populated+shared, 2 = populated+private).
>>> Eventually we will have to deal with the mix of private and shared
>>> mappings for the same device, how 1 bit per page is going to work? Thanks,
>>
>> Only renaming RamDiscardManager seems not sufficient. Current
>> RamDiscardManagerClass can only manage two states. For example, its
>> callback functions only have the name of xxx_populate and xxx_discard.
>> If we want to extend it to manage three states, we have to modify those
>> callbacks, e.g. add some new argument like is_populate(bool is_private),
>> or define some new callbacks like is_populate_private(). It will make
>> this class more complicated, but actually not necessary in legacy VMs
>> without the concept of private/shared.
>>
> 
> 

-- 
Alexey


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager
  2025-04-17 23:10             ` Alexey Kardashevskiy
@ 2025-04-18  3:49               ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-18  3:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Hildenbrand, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/18/2025 7:10 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 16/4/25 13:32, Chenyi Qiang wrote:
>>
>>
>> On 4/10/2025 9:44 AM, Chenyi Qiang wrote:
>>>
>>>
>>> On 4/10/2025 8:11 AM, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 9/4/25 22:57, Chenyi Qiang wrote:
>>>>>
>>>>>
>>>>> On 4/9/2025 5:56 PM, Alexey Kardashevskiy wrote:
>>>>>>
>>>>>>
>>>>>> On 7/4/25 17:49, Chenyi Qiang wrote:
>>>>>>> RamDiscardManager is an interface used by virtio-mem to adjust VFIO
>>>>>>> mappings in relation to VM page assignment. It manages the state of
>>>>>>> populated and discard for the RAM. To accommodate future scnarios
>>>>>>> for
>>>>>>> managing RAM states, such as private and shared states in
>>>>>>> confidential
>>>>>>> VMs, the existing RamDiscardManager interface needs to be
>>>>>>> generalized.
>>>>>>>
>>>>>>> Introduce a parent class, GenericStateManager, to manage a pair of
>>>>>>
>>>>>> "GenericState" is the same as "State" really. Call it
>>>>>> RamStateManager.
>>>>>
>>>>> OK to me.
>>>>
>>>> Sorry, nah. "Generic" would mean "machine" in QEMU.
>>>
>>> OK, anyway, I can rename to RamStateManager if we follow this direction.
>>>
>>>>
>>>>
>>>>>>
>>>>>>
>>>>>>> opposite states with RamDiscardManager as its child. The changes
>>>>>>> include
>>>>>>> - Define a new abstract class GenericStateChange.
>>>>>>> - Extract six callbacks into GenericStateChangeClass and allow the
>>>>>>> child
>>>>>>>      classes to inherit them.
>>>>>>> - Modify RamDiscardManager-related helpers to use
>>>>>>> GenericStateManager
>>>>>>>      ones.
>>>>>>> - Define a generic StatChangeListener to extract fields from
>>>>>>
>>>>>> "e" missing in StateChangeListener.
>>>>>
>>>>> Fixed. Thanks.
>>>>>
>>>>>>
>>>>>>>      RamDiscardManager listener which allows future listeners to
>>>>>>> embed it
>>>>>>>      and avoid duplication.
>>>>>>> - Change the users of RamDiscardManager (virtio-mem, migration,
>>>>>>> etc.) to
>>>>>>>      switch to use GenericStateChange helpers.
>>>>>>>
>>>>>>> It can provide a more flexible and resuable framework for RAM state
>>>>>>> management, facilitating future enhancements and use cases.
>>>>>>
>>>>>> I fail to see how new interface helps with this. RamDiscardManager
>>>>>> manipulates populated/discarded. It would make sense may be if the
>>>>>> new
>>>>>> class had more bits per page, say private/shared/discarded but it
>>>>>> does
>>>>>> not. And PrivateSharedManager cannot coexist with RamDiscard. imho
>>>>>> this
>>>>>> is going in a wrong direction.
>>>>>
>>>>> I think we have two questions here:
>>>>>
>>>>> 1. whether we should define an abstract parent class and
>>>>> distinguish the
>>>>> RamDiscardManager and PrivateSharedManager?
>>>>
>>>> If it is 1 bit per page with the meaning "1 == populated == shared",
>>>> then no, one class will do.
>>>
>>> Not restrict to 1 bit per page. As mentioned in questions 2, the parent
>>> class can be more generic, e.g. only including
>>> register/unregister_listener().
>>>
>>> Like in this way:
>>>
>>> The parent class:
>>>
>>> struct StateChangeListener {
>>>      MemoryRegionSection *section;
>>> }
>>>
>>> struct RamStateManagerClass {
>>>      void (*register_listener)();
>>>      void (*unregister_listener)();
>>> }
>>>
>>> The child class:
>>>
>>> 1. RamDiscardManager
>>>
>>> struct RamDiscardListener {
>>>      StateChangeListener scl;
>>>      NotifyPopulate notify_populate;
>>>      NotifyDiscard notify_discard;
>>>      bool double_discard_supported;
>>>
>>>      QLIST_ENTRY(RamDiscardListener) next;
>>> }
>>>
>>> struct RamDiscardManagerClass {
>>>      RamStateManagerClass parent_class;
>>>      uint64_t (*get_min_granularity)();
>>>      bool (*is_populate)();
>>>      bool (*replay_populate)();
>>>      bool (*replay_discard)();
>>> }
>>>
>>> 2. PrivateSharedManager (or other name like ConfidentialRamManager?)
>>>
>>> struct PrivateSharedListener {
>>>      StateChangeListener scl;
>>>      NotifyShared notify_shared;
>>>      NotifyPrivate notify_private;
>>>      int priority;
>>>
>>>      QLIST_ENTRY(PrivateSharedListener) next;
>>> }
>>>
>>> struct PrivateSharedManagerClass {
>>>      RamStateManagerClass parent_class;
>>>      uint64_t (*get_min_granularity)();
>>>      bool (*is_shared)();
>>>      // No need to define replay_private/replay_shared as no use case at
>>> present.
>>> }
>>>
>>> In the future, if we want to manage three states, we can only extend
>>> PrivateSharedManagerClass/PrivateSharedListener.
>>
>> Hi Alexey & David,
>>
>> Any thoughts on this proposal?
> 
> 
> Sorry it is taking a while, I'll comment after the holidays. It is just
> a bit hard to follow how we started with just 1 patch and ended up with
> 13 patches with no clear answer why. Thanks,

Have a nice holiday! :)

And sorry for the confusion. I missed to paste the history link for the
motivation of the change
(https://lore.kernel.org/qemu-devel/0ed6faf8-f6f4-4050-994b-2722d2726bef@amd.com/)

I think the original RamDiscardManager solution can just work. This
framework change is mainly to facilitate future extension.

> 
> 
>>
>>>
>>>>
>>>>
>>>>> I vote for this. First, After making the distinction, the
>>>>> PrivateSharedManager won't go into the RamDiscardManager path which
>>>>> PrivateSharedManager may have not supported yet. e.g. the migration
>>>>> related path. In addtional, we can extend the PrivateSharedManager for
>>>>> specific handling, e.g. the priority listener, state_change()
>>>>> callback.
>>>>>
>>>>> 2. How we should abstract the parent class?
>>>>>
>>>>> I think this is the problem. My current implementation extracts all
>>>>> the
>>>>> callbacks in RamDiscardManager into the parent class and call them
>>>>> state_set and state_clear, which can only manage a pair of opposite
>>>>> states. As you mentioned, there could be private/shared/discarded
>>>>> three
>>>>> states in the future, which is not compatible with current design.
>>>>> Maybe
>>>>> we can make the parent class more generic, e.g. only extract the
>>>>> register/unregister_listener() into it.
>>>>
>>>> Or we could rename RamDiscardManager to RamStateManager, implement 2bit
>>>> per page (0 = discarded, 1 = populated+shared, 2 = populated+private).
>>>> Eventually we will have to deal with the mix of private and shared
>>>> mappings for the same device, how 1 bit per page is going to work?
>>>> Thanks,
>>>
>>> Only renaming RamDiscardManager seems not sufficient. Current
>>> RamDiscardManagerClass can only manage two states. For example, its
>>> callback functions only have the name of xxx_populate and xxx_discard.
>>> If we want to extend it to manage three states, we have to modify those
>>> callbacks, e.g. add some new argument like is_populate(bool is_private),
>>> or define some new callbacks like is_populate_private(). It will make
>>> this class more complicated, but actually not necessary in legacy VMs
>>> without the concept of private/shared.
>>>
>>
>>
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager() to return the result
  2025-04-09  5:52     ` Chenyi Qiang
@ 2025-04-25 12:35       ` David Hildenbrand
  0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-04-25 12:35 UTC (permalink / raw)
  To: Chenyi Qiang, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

                               RamDiscardManager *rdm);>>>      /**
>>>     * memory_region_find: translate an address/size relative to a
>>> diff --git a/system/memory.c b/system/memory.c
>>> index b17b5538ff..62d6b410f0 100644
>>> --- a/system/memory.c
>>> +++ b/system/memory.c
>>> @@ -2115,12 +2115,16 @@ RamDiscardManager
>>> *memory_region_get_ram_discard_manager(MemoryRegion *mr)
>>>        return mr->rdm;
>>>    }
>>>    -void memory_region_set_ram_discard_manager(MemoryRegion *mr,
>>> -                                           RamDiscardManager *rdm)
>>> +int memory_region_set_ram_discard_manager(MemoryRegion *mr,
>>> +                                          RamDiscardManager *rdm)
>>>    {
>>>        g_assert(memory_region_is_ram(mr));
>>> -    g_assert(!rdm || !mr->rdm);
>>> +    if (mr->rdm && rdm) {
>>> +        return -EBUSY;
>>> +    }
>>> +
>>>        mr->rdm = rdm;
>>> +    return 0;
>>
>> This is a change which can potentially break something, or currently
>> there is no way to trigger -EBUSY? Thanks,
> 
> Before this series, virtio-mem is the only user to
> set_ram_discard_manager(), there's no way to trigger -EBUSY. With this
> series, guest_memfd-backed RAMBlocks become the second user. It can be
> triggered if we try to use virtio-mem in confidential VMs.

Right. I have plans on looking into using virtio-mem for confidential 
VMs, so far it's not compatible.

One challenge will be resolving how to deal with two sources of information.

Assume virtio-mem says "memory is now plugged", we would have to check 
with guest_memfd if it is also in the "shared" state. If not, no need to 
notify anybody.

Similarly, if guest_memfd says "memory is now shared", we would have to 
check with guest_memfd if it is in the "plugged" state.


I don't know yet how exactly the solution would look like.

Possibly, we have a list of such "populate/discard" information sources, 
and a real "manager" on top, that gets notified by these sources.

That real "manager" would then collect information from other sources to 
make a decision whether to propagate the populate / shared notification.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 03/13] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard()
  2025-04-07  7:49 ` [PATCH v4 03/13] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard() Chenyi Qiang
  2025-04-09  5:43   ` Alexey Kardashevskiy
@ 2025-04-25 12:42   ` David Hildenbrand
  2025-04-27  2:13     ` Chenyi Qiang
  1 sibling, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-04-25 12:42 UTC (permalink / raw)
  To: Chenyi Qiang, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

On 07.04.25 09:49, Chenyi Qiang wrote:
> Update ReplayRamDiscard() function to return the result and unify the
> ReplayRamPopulate() and ReplayRamDiscard() to ReplayStateChange() at
> the same time due to their identical definitions. This unification
> simplifies related structures, such as VirtIOMEMReplayData, which makes
> it more cleaner and maintainable.
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v4:
>      - Modify the commit message. We won't use Replay() operation when
>        doing the attribute change like v3.
> 
> Changes in v3:
>      - Newly added.
> ---

[...]

>   
> -typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);
> -typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque);
> +typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
>   

But it's not a state change.

ReplayRamState maybe?

[...]
>   /*
> diff --git a/system/memory.c b/system/memory.c
> index 62d6b410f0..b5ab729e13 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2147,7 +2147,7 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
>   
>   int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
>                                            MemoryRegionSection *section,
> -                                         ReplayRamPopulate replay_fn,
> +                                         ReplayStateChange replay_fn,
>                                            void *opaque)
>   {
>       RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
> @@ -2156,15 +2156,15 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
>       return rdmc->replay_populated(rdm, section, replay_fn, opaque);
>   }
>   
> -void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
> -                                          MemoryRegionSection *section,
> -                                          ReplayRamDiscard replay_fn,
> -                                          void *opaque)
> +int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
> +                                         MemoryRegionSection *section,
> +                                         ReplayStateChange replay_fn,
> +                                         void *opaque)
>   {
>       RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
>   
>       g_assert(rdmc->replay_discarded);
> -    rdmc->replay_discarded(rdm, section, replay_fn, opaque);
> +    return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
>   }

The idea was that ram_discard_manager_replay_discarded() would never be 
able to fail. But I don't think this really matters, because the 
function is provided by the caller, that can just always return 0 -- 
like we do in dirty_bitmap_clear_section() now.

So yeah, this looks fine to me, given that we don't call it a "state 
change" when we are merely replaying a selected state.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 03/13] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard()
  2025-04-09  5:43   ` Alexey Kardashevskiy
  2025-04-09  6:56     ` Chenyi Qiang
@ 2025-04-25 12:44     ` David Hildenbrand
  1 sibling, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-04-25 12:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Chenyi Qiang, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

On 09.04.25 07:43, Alexey Kardashevskiy wrote:
> 
> 
> On 7/4/25 17:49, Chenyi Qiang wrote:
>> Update ReplayRamDiscard() function to return the result and unify the
>> ReplayRamPopulate() and ReplayRamDiscard() to ReplayStateChange() at
>> the same time due to their identical definitions. This unification
>> simplifies related structures, such as VirtIOMEMReplayData, which makes
>> it more cleaner and maintainable.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>> Changes in v4:
>>       - Modify the commit message. We won't use Replay() operation when
>>         doing the attribute change like v3.
>>
>> Changes in v3:
>>       - Newly added.
>> ---
>>    hw/virtio/virtio-mem.c | 20 ++++++++++----------
>>    include/exec/memory.h  | 31 ++++++++++++++++---------------
>>    migration/ram.c        |  5 +++--
>>    system/memory.c        | 12 ++++++------
>>    4 files changed, 35 insertions(+), 33 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>> index d0d3a0240f..1a88d649cb 100644
>> --- a/hw/virtio/virtio-mem.c
>> +++ b/hw/virtio/virtio-mem.c
>> @@ -1733,7 +1733,7 @@ static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
>>    }
>>    
>>    struct VirtIOMEMReplayData {
>> -    void *fn;
>> +    ReplayStateChange fn;
> 
> 
> s/ReplayStateChange/ReplayRamStateChange/
> 
> Just "State" is way too generic imho.

Right, but raised in my review, the "Change" is wrong, it's not a change.

ReplayRamState ... ReplayRamDiscardState or sth like that ?

After all, it's the "RAM Discard manager".

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager
  2025-04-09  9:56   ` Alexey Kardashevskiy
  2025-04-09 12:57     ` Chenyi Qiang
@ 2025-04-25 12:49     ` David Hildenbrand
  2025-04-27  1:33       ` Chenyi Qiang
  1 sibling, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-04-25 12:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Chenyi Qiang, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

On 09.04.25 11:56, Alexey Kardashevskiy wrote:
> 
> 
> On 7/4/25 17:49, Chenyi Qiang wrote:
>> RamDiscardManager is an interface used by virtio-mem to adjust VFIO
>> mappings in relation to VM page assignment. It manages the state of
>> populated and discard for the RAM. To accommodate future scnarios for
>> managing RAM states, such as private and shared states in confidential
>> VMs, the existing RamDiscardManager interface needs to be generalized.
>>
>> Introduce a parent class, GenericStateManager, to manage a pair of
> 
> "GenericState" is the same as "State" really. Call it RamStateManager.
> 
> 
> 
>> opposite states with RamDiscardManager as its child. The changes include
>> - Define a new abstract class GenericStateChange.
>> - Extract six callbacks into GenericStateChangeClass and allow the child
>>     classes to inherit them.
>> - Modify RamDiscardManager-related helpers to use GenericStateManager
>>     ones.
>> - Define a generic StatChangeListener to extract fields from
> 
> "e" missing in StateChangeListener.
> 
>>     RamDiscardManager listener which allows future listeners to embed it
>>     and avoid duplication.
>> - Change the users of RamDiscardManager (virtio-mem, migration, etc.) to
>>     switch to use GenericStateChange helpers.
>>
>> It can provide a more flexible and resuable framework for RAM state
>> management, facilitating future enhancements and use cases.
> 
> I fail to see how new interface helps with this. RamDiscardManager
> manipulates populated/discarded. It would make sense may be if the new
> class had more bits per page, say private/shared/discarded but it does
> not. And PrivateSharedManager cannot coexist with RamDiscard. imho this
> is going in a wrong direction.

Agreed.

In the future, we will have virtio-mem co-exist with guest_memfd.

Both are information sources, and likely we'd have some instance on top, 
that merges these sources to identify if anybody needs to be notified.

Until we figure out how that would look like, I would suggest to keep it 
as is.

Maybe, in the future we would have a single RamDiscardManager and 
multiple RamDiscardSources per RAMBlock.

The sources notify the manager, and the manager can ask other sources to 
merge the information.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager
  2025-04-16  3:32           ` Chenyi Qiang
  2025-04-17 23:10             ` Alexey Kardashevskiy
@ 2025-04-25 12:54             ` David Hildenbrand
  1 sibling, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-04-25 12:54 UTC (permalink / raw)
  To: Chenyi Qiang, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

On 16.04.25 05:32, Chenyi Qiang wrote:
> 
> 
> On 4/10/2025 9:44 AM, Chenyi Qiang wrote:
>>
>>
>> On 4/10/2025 8:11 AM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 9/4/25 22:57, Chenyi Qiang wrote:
>>>>
>>>>
>>>> On 4/9/2025 5:56 PM, Alexey Kardashevskiy wrote:
>>>>>
>>>>>
>>>>> On 7/4/25 17:49, Chenyi Qiang wrote:
>>>>>> RamDiscardManager is an interface used by virtio-mem to adjust VFIO
>>>>>> mappings in relation to VM page assignment. It manages the state of
>>>>>> populated and discard for the RAM. To accommodate future scnarios for
>>>>>> managing RAM states, such as private and shared states in confidential
>>>>>> VMs, the existing RamDiscardManager interface needs to be generalized.
>>>>>>
>>>>>> Introduce a parent class, GenericStateManager, to manage a pair of
>>>>>
>>>>> "GenericState" is the same as "State" really. Call it RamStateManager.
>>>>
>>>> OK to me.
>>>
>>> Sorry, nah. "Generic" would mean "machine" in QEMU.
>>
>> OK, anyway, I can rename to RamStateManager if we follow this direction.
>>
>>>
>>>
>>>>>
>>>>>
>>>>>> opposite states with RamDiscardManager as its child. The changes
>>>>>> include
>>>>>> - Define a new abstract class GenericStateChange.
>>>>>> - Extract six callbacks into GenericStateChangeClass and allow the
>>>>>> child
>>>>>>      classes to inherit them.
>>>>>> - Modify RamDiscardManager-related helpers to use GenericStateManager
>>>>>>      ones.
>>>>>> - Define a generic StatChangeListener to extract fields from
>>>>>
>>>>> "e" missing in StateChangeListener.
>>>>
>>>> Fixed. Thanks.
>>>>
>>>>>
>>>>>>      RamDiscardManager listener which allows future listeners to
>>>>>> embed it
>>>>>>      and avoid duplication.
>>>>>> - Change the users of RamDiscardManager (virtio-mem, migration,
>>>>>> etc.) to
>>>>>>      switch to use GenericStateChange helpers.
>>>>>>
>>>>>> It can provide a more flexible and resuable framework for RAM state
>>>>>> management, facilitating future enhancements and use cases.
>>>>>
>>>>> I fail to see how new interface helps with this. RamDiscardManager
>>>>> manipulates populated/discarded. It would make sense may be if the new
>>>>> class had more bits per page, say private/shared/discarded but it does
>>>>> not. And PrivateSharedManager cannot coexist with RamDiscard. imho this
>>>>> is going in a wrong direction.
>>>>
>>>> I think we have two questions here:
>>>>
>>>> 1. whether we should define an abstract parent class and distinguish the
>>>> RamDiscardManager and PrivateSharedManager?
>>>
>>> If it is 1 bit per page with the meaning "1 == populated == shared",
>>> then no, one class will do.
>>
>> Not restrict to 1 bit per page. As mentioned in questions 2, the parent
>> class can be more generic, e.g. only including
>> register/unregister_listener().
>>
>> Like in this way:
>>
>> The parent class:
>>
>> struct StateChangeListener {
>>      MemoryRegionSection *section;
>> }
>>
>> struct RamStateManagerClass {
>>      void (*register_listener)();
>>      void (*unregister_listener)();
>> }
>>
>> The child class:
>>
>> 1. RamDiscardManager
>>
>> struct RamDiscardListener {
>>      StateChangeListener scl;
>>      NotifyPopulate notify_populate;
>>      NotifyDiscard notify_discard;
>>      bool double_discard_supported;
>>
>>      QLIST_ENTRY(RamDiscardListener) next;
>> }
>>
>> struct RamDiscardManagerClass {
>>      RamStateManagerClass parent_class;
>>      uint64_t (*get_min_granularity)();
>>      bool (*is_populate)();
>>      bool (*replay_populate)();
>>      bool (*replay_discard)();
>> }
>>
>> 2. PrivateSharedManager (or other name like ConfidentialRamManager?)
>>
>> struct PrivateSharedListener {
>>      StateChangeListener scl;
>>      NotifyShared notify_shared;
>>      NotifyPrivate notify_private;
>>      int priority;
>>
>>      QLIST_ENTRY(PrivateSharedListener) next;
>> }
>>
>> struct PrivateSharedManagerClass {
>>      RamStateManagerClass parent_class;
>>      uint64_t (*get_min_granularity)();
>>      bool (*is_shared)();
>>      // No need to define replay_private/replay_shared as no use case at
>> present.
>> }
>>
>> In the future, if we want to manage three states, we can only extend
>> PrivateSharedManagerClass/PrivateSharedListener.
> 
> Hi Alexey & David,
> 
> Any thoughts on this proposal?

Thinking about how to reasonable make virtio-mem and guest_memdfd work 
in the future together, I don't think such an abstraction might 
necessarily help. (see my other mails)

In the end we populate/discard, how to merge that information from 
multiple sources (or maintain it in a single object) is TBD.

virtio-mem has a bitmap that is usually 1 bit per block (e.g., 2 MiB). 
guest_memfd has a bitmap that is usually 1 bit per page.

Maybe a GuestRamStateManager would store both separately if requested. 
virtio-mem would register itself with it, and guest_memfd would register 
itself with that.

GuestRamStateManager would then implement the logic of merging both 
information (shared vs. private, plugged vs. unplugged).

But that needs more thought: essentially, the virtio-mem bitmap would 
move to the GuestRamStateManager.

OFC, we would only want the bitmaps and the manager if there is an 
actual provider for it (e.g., virtio-mem for the plugged part, 
guest_memfd for the cc part).

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager
  2025-04-07  7:49 ` [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager Chenyi Qiang
  2025-04-09  9:56   ` Alexey Kardashevskiy
@ 2025-04-25 12:57   ` David Hildenbrand
  2025-04-27  1:40     ` Chenyi Qiang
  1 sibling, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-04-25 12:57 UTC (permalink / raw)
  To: Chenyi Qiang, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

On 07.04.25 09:49, Chenyi Qiang wrote:
> To manage the private and shared RAM states in confidential VMs,
> introduce a new class of PrivateShareManager as a child of
> GenericStateManager, which inherits the six interface callbacks. With a
> different interface type, it can be distinguished from the
> RamDiscardManager object and provide the flexibility for addressing
> specific requirements of confidential VMs in the future.
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---

See my other mail, likely this is going into the wrong direction.

If we want to abstract more into a RamStateManager, then it would have 
to have two two sets of states, and allow for registering a provider for 
each of the states.

It would then merge these informations.

But the private vs. shared provider and the plugged vs. unplugged 
provider would not be a subclass of the RamStateManager.

They would have a different interface.

(e.g., RamDiscardStateProvider vs. RamPrivateStateProvider)

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager
  2025-04-25 12:49     ` David Hildenbrand
@ 2025-04-27  1:33       ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-27  1:33 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

Thanks David for your review!

On 4/25/2025 8:49 PM, David Hildenbrand wrote:
> On 09.04.25 11:56, Alexey Kardashevskiy wrote:
>>
>>
>> On 7/4/25 17:49, Chenyi Qiang wrote:
>>> RamDiscardManager is an interface used by virtio-mem to adjust VFIO
>>> mappings in relation to VM page assignment. It manages the state of
>>> populated and discard for the RAM. To accommodate future scnarios for
>>> managing RAM states, such as private and shared states in confidential
>>> VMs, the existing RamDiscardManager interface needs to be generalized.
>>>
>>> Introduce a parent class, GenericStateManager, to manage a pair of
>>
>> "GenericState" is the same as "State" really. Call it RamStateManager.
>>
>>
>>
>>> opposite states with RamDiscardManager as its child. The changes include
>>> - Define a new abstract class GenericStateChange.
>>> - Extract six callbacks into GenericStateChangeClass and allow the child
>>>     classes to inherit them.
>>> - Modify RamDiscardManager-related helpers to use GenericStateManager
>>>     ones.
>>> - Define a generic StatChangeListener to extract fields from
>>
>> "e" missing in StateChangeListener.
>>
>>>     RamDiscardManager listener which allows future listeners to embed it
>>>     and avoid duplication.
>>> - Change the users of RamDiscardManager (virtio-mem, migration, etc.) to
>>>     switch to use GenericStateChange helpers.
>>>
>>> It can provide a more flexible and resuable framework for RAM state
>>> management, facilitating future enhancements and use cases.
>>
>> I fail to see how new interface helps with this. RamDiscardManager
>> manipulates populated/discarded. It would make sense may be if the new
>> class had more bits per page, say private/shared/discarded but it does
>> not. And PrivateSharedManager cannot coexist with RamDiscard. imho this
>> is going in a wrong direction.
> 
> Agreed.
> 
> In the future, we will have virtio-mem co-exist with guest_memfd.
> 
> Both are information sources, and likely we'd have some instance on top,
> that merges these sources to identify if anybody needs to be notified.

Yes, that's the problem. My current proposal is not so abstract to fit
for this target. In my proposal, it tries to create a center for each
target. For example, virtio-mem has a VirtioMemRamStateManager to manage
populate/discard states. guest-memfd has a GuestRamStateManager to
manage private/shared/discarded states. That is poorly structured and
would bring a lot of duplication.

> 
> Until we figure out how that would look like, I would suggest to keep it
> as is.

OK, I'll revert to the old implementation in next version. Thanks!

> 
> Maybe, in the future we would have a single RamDiscardManager and
> multiple RamDiscardSources per RAMBlock.
> 
> The sources notify the manager, and the manager can ask other sources to
> merge the information.
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager
  2025-04-25 12:57   ` David Hildenbrand
@ 2025-04-27  1:40     ` Chenyi Qiang
  2025-04-29 10:01       ` David Hildenbrand
  0 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-27  1:40 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/25/2025 8:57 PM, David Hildenbrand wrote:
> On 07.04.25 09:49, Chenyi Qiang wrote:
>> To manage the private and shared RAM states in confidential VMs,
>> introduce a new class of PrivateShareManager as a child of
>> GenericStateManager, which inherits the six interface callbacks. With a
>> different interface type, it can be distinguished from the
>> RamDiscardManager object and provide the flexibility for addressing
>> specific requirements of confidential VMs in the future.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
> 
> See my other mail, likely this is going into the wrong direction.
> 
> If we want to abstract more into a RamStateManager, then it would have
> to have two two sets of states, and allow for registering a provider for
> each of the states.
> 
> It would then merge these informations.
> 
> But the private vs. shared provider and the plugged vs. unplugged
> provider would not be a subclass of the RamStateManager.
> 
> They would have a different interface.
> 
> (e.g., RamDiscardStateProvider vs. RamPrivateStateProvider)

Got it! Before the real use case (guest_memfd + virtio-mem) comes, I
would keep the original design. Maybe seek the new framework after that
work.

> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 03/13] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard()
  2025-04-25 12:42   ` David Hildenbrand
@ 2025-04-27  2:13     ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-27  2:13 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 4/25/2025 8:42 PM, David Hildenbrand wrote:
> On 07.04.25 09:49, Chenyi Qiang wrote:
>> Update ReplayRamDiscard() function to return the result and unify the
>> ReplayRamPopulate() and ReplayRamDiscard() to ReplayStateChange() at
>> the same time due to their identical definitions. This unification
>> simplifies related structures, such as VirtIOMEMReplayData, which makes
>> it more cleaner and maintainable.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>> Changes in v4:
>>      - Modify the commit message. We won't use Replay() operation when
>>        doing the attribute change like v3.
>>
>> Changes in v3:
>>      - Newly added.
>> ---
> 
> [...]
> 
>>   -typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void
>> *opaque);
>> -typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void
>> *opaque);
>> +typedef int (*ReplayStateChange)(MemoryRegionSection *section, void
>> *opaque);
>>   
> 
> But it's not a state change.
> 
> ReplayRamState maybe?

OK. Will rename it to ReplayRamDiscardState as mentioned in another
thread. Thanks.

> 
> [...]
>>   /*
>> diff --git a/system/memory.c b/system/memory.c
>> index 62d6b410f0..b5ab729e13 100644
>> --- a/system/memory.c
>> +++ b/system/memory.c
>> @@ -2147,7 +2147,7 @@ bool ram_discard_manager_is_populated(const
>> RamDiscardManager *rdm,
>>     int ram_discard_manager_replay_populated(const RamDiscardManager
>> *rdm,
>>                                            MemoryRegionSection *section,
>> -                                         ReplayRamPopulate replay_fn,
>> +                                         ReplayStateChange replay_fn,
>>                                            void *opaque)
>>   {
>>       RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
>> @@ -2156,15 +2156,15 @@ int ram_discard_manager_replay_populated(const
>> RamDiscardManager *rdm,
>>       return rdmc->replay_populated(rdm, section, replay_fn, opaque);
>>   }
>>   -void ram_discard_manager_replay_discarded(const RamDiscardManager
>> *rdm,
>> -                                          MemoryRegionSection *section,
>> -                                          ReplayRamDiscard replay_fn,
>> -                                          void *opaque)
>> +int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
>> +                                         MemoryRegionSection *section,
>> +                                         ReplayStateChange replay_fn,
>> +                                         void *opaque)
>>   {
>>       RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
>>         g_assert(rdmc->replay_discarded);
>> -    rdmc->replay_discarded(rdm, section, replay_fn, opaque);
>> +    return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
>>   }
> 
> The idea was that ram_discard_manager_replay_discarded() would never be
> able to fail. But I don't think this really matters, because the
> function is provided by the caller, that can just always return 0 --
> like we do in dirty_bitmap_clear_section() now.
> 
> So yeah, this looks fine to me, given that we don't call it a "state
> change" when we are merely replaying a selected state.
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result
  2025-04-07  7:49 ` [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result Chenyi Qiang
@ 2025-04-27  2:26   ` Chenyi Qiang
  2025-05-09  2:38     ` Chao Gao
  2025-05-09  8:22     ` Baolu Lu
  0 siblings, 2 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-04-27  2:26 UTC (permalink / raw)
  To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

Hi David,

Any thought on patch 10-12, which is to move the change attribute into a
priority listener. A problem is how to handle the error handling of
private_to_shared failure. Previously, we thought it would never be able
to fail, but right now, it is possible in corner cases (e.g. -ENOMEM) in
set_attribute_private(). At present, I simply raise an assert instead of
adding any rollback work (see patch 11).

On 4/7/2025 3:49 PM, Chenyi Qiang wrote:
> So that the caller can check the result of NotifyStateClear() handler if
> the operation fails.
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v4:
>     - Newly added.
> ---
>  hw/vfio/common.c      | 18 ++++++++++--------
>  include/exec/memory.h |  4 ++--
>  2 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 48468a12c3..6e49ae597d 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -335,8 +335,8 @@ out:
>      rcu_read_unlock();
>  }
>  
> -static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer,
> -                                                    MemoryRegionSection *section)
> +static int vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer,
> +                                                   MemoryRegionSection *section)
>  {
>      const hwaddr size = int128_get64(section->size);
>      const hwaddr iova = section->offset_within_address_space;
> @@ -348,24 +348,26 @@ static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontaine
>          error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
>                       strerror(-ret));
>      }
> +
> +    return ret;
>  }
>  
> -static void vfio_ram_discard_notify_discard(StateChangeListener *scl,
> -                                            MemoryRegionSection *section)
> +static int vfio_ram_discard_notify_discard(StateChangeListener *scl,
> +                                           MemoryRegionSection *section)
>  {
>      RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
>      VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
>                                                  listener);
> -    vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section);
> +    return vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section);
>  }
>  
> -static void vfio_private_shared_notify_to_private(StateChangeListener *scl,
> -                                                  MemoryRegionSection *section)
> +static int vfio_private_shared_notify_to_private(StateChangeListener *scl,
> +                                                 MemoryRegionSection *section)
>  {
>      PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
>      VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener,
>                                                     listener);
> -    vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section);
> +    return vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section);
>  }
>  
>  static int vfio_state_change_notify_to_state_set(VFIOContainerBase *bcontainer,
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index a61896251c..9472d9e9b4 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -523,8 +523,8 @@ typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
>  typedef struct StateChangeListener StateChangeListener;
>  typedef int (*NotifyStateSet)(StateChangeListener *scl,
>                                MemoryRegionSection *section);
> -typedef void (*NotifyStateClear)(StateChangeListener *scl,
> -                                 MemoryRegionSection *section);
> +typedef int (*NotifyStateClear)(StateChangeListener *scl,
> +                                MemoryRegionSection *section);
>  
>  struct StateChangeListener {
>      /*


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager
  2025-04-27  1:40     ` Chenyi Qiang
@ 2025-04-29 10:01       ` David Hildenbrand
  0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-04-29 10:01 UTC (permalink / raw)
  To: Chenyi Qiang, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

On 27.04.25 03:40, Chenyi Qiang wrote:
> 
> 
> On 4/25/2025 8:57 PM, David Hildenbrand wrote:
>> On 07.04.25 09:49, Chenyi Qiang wrote:
>>> To manage the private and shared RAM states in confidential VMs,
>>> introduce a new class of PrivateShareManager as a child of
>>> GenericStateManager, which inherits the six interface callbacks. With a
>>> different interface type, it can be distinguished from the
>>> RamDiscardManager object and provide the flexibility for addressing
>>> specific requirements of confidential VMs in the future.
>>>
>>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>>> ---
>>
>> See my other mail, likely this is going into the wrong direction.
>>
>> If we want to abstract more into a RamStateManager, then it would have
>> to have two two sets of states, and allow for registering a provider for
>> each of the states.
>>
>> It would then merge these informations.
>>
>> But the private vs. shared provider and the plugged vs. unplugged
>> provider would not be a subclass of the RamStateManager.
>>
>> They would have a different interface.
>>
>> (e.g., RamDiscardStateProvider vs. RamPrivateStateProvider)
> 
> Got it! Before the real use case (guest_memfd + virtio-mem) comes, I
> would keep the original design.

Yes, absolutely fine with me.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result
  2025-04-27  2:26   ` Chenyi Qiang
@ 2025-05-09  2:38     ` Chao Gao
  2025-05-09  8:20       ` David Hildenbrand
  2025-05-09  8:22     ` Baolu Lu
  1 sibling, 1 reply; 67+ messages in thread
From: Chao Gao @ 2025-05-09  2:38 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth,
	qemu-devel, kvm, Williams Dan J, Peng Chao P, Xu Yilun,
	Li Xiaoyao

On Sun, Apr 27, 2025 at 10:26:52AM +0800, Chenyi Qiang wrote:
>Hi David,
>
>Any thought on patch 10-12, which is to move the change attribute into a
>priority listener. A problem is how to handle the error handling of
>private_to_shared failure. Previously, we thought it would never be able
>to fail, but right now, it is possible in corner cases (e.g. -ENOMEM) in
>set_attribute_private(). At present, I simply raise an assert instead of
>adding any rollback work (see patch 11).

I took a look at patches 10-12, and here are my thoughts:

Moving the change attribute into a priority listener seems sensible. It can
ensure the correct order between setting memory attributes and VFIO's DMA
map/unmap operations, and it can also simplify rollbacks. Since
MemoryListener already uses a priority-based list, it should be a good fit
for page conversion listeners.

Regarding error handling, -ENOMEM won't occur during page conversion
because the attribute xarray on the KVM side is populated earlier when QEMU
calls kvm_set_phys_mem() -> kvm_set_memory_attributes_private(). Other
errors, such as -EINVAL and -EFAULT, are unlikely to occur unless there is
a bug in QEMU. So, the assertion is appropriate for now. And, since any
failure in page conversion currently leads to QEMU quit (as seen in
kvm_cpu_exec() -> kvm_convert_memory()), implementing a complex rollback in
this case doesn't add value and merely adds code that is difficult to test.

Let's see what others think.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd
  2025-04-07  7:49 ` [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd Chenyi Qiang
  2025-04-09  9:57   ` Alexey Kardashevskiy
@ 2025-05-09  6:41   ` Baolu Lu
  2025-05-09  7:55     ` Chenyi Qiang
  2025-05-12  8:07   ` Zhao Liu
  2 siblings, 1 reply; 67+ messages in thread
From: Baolu Lu @ 2025-05-09  6:41 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

On 4/7/25 15:49, Chenyi Qiang wrote:
> Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
> discard") highlighted that subsystems like VFIO may disable RAM block
> discard. However, guest_memfd relies on discard operations for page
> conversion between private and shared memory, potentially leading to
> stale IOMMU mapping issue when assigning hardware devices to
> confidential VMs via shared memory. To address this, it is crucial to
> ensure systems like VFIO refresh its IOMMU mappings.
> 
> PrivateSharedManager is introduced to manage private and shared states in
> confidential VMs, similar to RamDiscardManager, which supports
> coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
> guest_memfd can facilitate the adjustment of VFIO mappings in response
> to page conversion events.
> 
> Since guest_memfd is not an object, it cannot directly implement the
> PrivateSharedManager interface. Implementing it in HostMemoryBackend is
> not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
> have a memory backend while others do not. Notably, virtual BIOS
> RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
> backend.
> 
> To manage RAMBlocks with guest_memfd, define a new object named
> RamBlockAttribute to implement the RamDiscardManager interface. This
> object stores guest_memfd information such as shared_bitmap, and handles
> page conversion notification. The memory state is tracked at the host
> page size granularity, as the minimum memory conversion size can be one
> page per request. Additionally, VFIO expects the DMA mapping for a
> specific iova to be mapped and unmapped with the same granularity.
> Confidential VMs may perform partial conversions, such as conversions on
> small regions within larger regions. To prevent invalid cases and until
> cut_mapping operation support is available, all operations are performed
> with 4K granularity.

Just for your information, IOMMUFD plans to introduce the support for
cut operation. The kickoff patch series is under discussion here:

https://lore.kernel.org/linux-iommu/0-v2-5c26bde5c22d+58b-iommu_pt_jgg@nvidia.com/

This new cut support is expected to be exclusive to IOMMUFD and not
directly available in the VFIO container context. The VFIO uAPI for map/
unmap is being superseded by IOMMUFD, and all new features will only be
available in IOMMUFD.

> 
> Signed-off-by: Chenyi Qiang<chenyi.qiang@intel.com>

<...>

> +
> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr)
> +{
> +    uint64_t shared_bitmap_size;
> +    const int block_size  = qemu_real_host_page_size();
> +    int ret;
> +
> +    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
> +
> +    attr->mr = mr;
> +    ret = memory_region_set_generic_state_manager(mr, GENERIC_STATE_MANAGER(attr));
> +    if (ret) {
> +        return ret;
> +    }
> +    attr->shared_bitmap_size = shared_bitmap_size;
> +    attr->shared_bitmap = bitmap_new(shared_bitmap_size);

Above introduces a bitmap to track the private/shared state of each 4KB
page. While functional, for large RAM blocks managed by guest_memfd,
this could lead to significant memory consumption.

Have you considered an alternative like a Maple Tree or a generic
interval tree? Both are often more memory-efficient for tracking ranges
of contiguous states.

> +
> +    return ret;
> +}
> +
> +void ram_block_attribute_unrealize(RamBlockAttribute *attr)
> +{
> +    g_free(attr->shared_bitmap);
> +    memory_region_set_generic_state_manager(attr->mr, NULL);
> +}

Thanks,
baolu

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd
  2025-05-09  6:41   ` Baolu Lu
@ 2025-05-09  7:55     ` Chenyi Qiang
  2025-05-09  8:18       ` David Hildenbrand
  0 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-05-09  7:55 UTC (permalink / raw)
  To: Baolu Lu, David Hildenbrand, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

Thanks Baolu for your review!

On 5/9/2025 2:41 PM, Baolu Lu wrote:
> On 4/7/25 15:49, Chenyi Qiang wrote:
>> Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
>> discard") highlighted that subsystems like VFIO may disable RAM block
>> discard. However, guest_memfd relies on discard operations for page
>> conversion between private and shared memory, potentially leading to
>> stale IOMMU mapping issue when assigning hardware devices to
>> confidential VMs via shared memory. To address this, it is crucial to
>> ensure systems like VFIO refresh its IOMMU mappings.
>>
>> PrivateSharedManager is introduced to manage private and shared states in
>> confidential VMs, similar to RamDiscardManager, which supports
>> coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
>> guest_memfd can facilitate the adjustment of VFIO mappings in response
>> to page conversion events.
>>
>> Since guest_memfd is not an object, it cannot directly implement the
>> PrivateSharedManager interface. Implementing it in HostMemoryBackend is
>> not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
>> have a memory backend while others do not. Notably, virtual BIOS
>> RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
>> backend.
>>
>> To manage RAMBlocks with guest_memfd, define a new object named
>> RamBlockAttribute to implement the RamDiscardManager interface. This
>> object stores guest_memfd information such as shared_bitmap, and handles
>> page conversion notification. The memory state is tracked at the host
>> page size granularity, as the minimum memory conversion size can be one
>> page per request. Additionally, VFIO expects the DMA mapping for a
>> specific iova to be mapped and unmapped with the same granularity.
>> Confidential VMs may perform partial conversions, such as conversions on
>> small regions within larger regions. To prevent invalid cases and until
>> cut_mapping operation support is available, all operations are performed
>> with 4K granularity.
> 
> Just for your information, IOMMUFD plans to introduce the support for
> cut operation. The kickoff patch series is under discussion here:
> 
> https://lore.kernel.org/linux-iommu/0-v2-5c26bde5c22d+58b-
> iommu_pt_jgg@nvidia.com/

Thanks for this info. Just find the new version comes out.

> 
> This new cut support is expected to be exclusive to IOMMUFD and not
> directly available in the VFIO container context. The VFIO uAPI for map/
> unmap is being superseded by IOMMUFD, and all new features will only be
> available in IOMMUFD.

Yeah. I would suggest the test step to use iommufd in my cover letter
since this is the direction.

> 
>>
>> Signed-off-by: Chenyi Qiang<chenyi.qiang@intel.com>
> 
> <...>
> 
>> +
>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion
>> *mr)
>> +{
>> +    uint64_t shared_bitmap_size;
>> +    const int block_size  = qemu_real_host_page_size();
>> +    int ret;
>> +
>> +    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
>> +
>> +    attr->mr = mr;
>> +    ret = memory_region_set_generic_state_manager(mr,
>> GENERIC_STATE_MANAGER(attr));
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +    attr->shared_bitmap_size = shared_bitmap_size;
>> +    attr->shared_bitmap = bitmap_new(shared_bitmap_size);
> 
> Above introduces a bitmap to track the private/shared state of each 4KB
> page. While functional, for large RAM blocks managed by guest_memfd,
> this could lead to significant memory consumption.
> 
> Have you considered an alternative like a Maple Tree or a generic
> interval tree? Both are often more memory-efficient for tracking ranges
> of contiguous states.

Maybe not necessary. The memory overhead is 1 bit per page
(1/(4096*8)=0.003%). I think it is not too much.

> 
>> +
>> +    return ret;
>> +}
>> +
>> +void ram_block_attribute_unrealize(RamBlockAttribute *attr)
>> +{
>> +    g_free(attr->shared_bitmap);
>> +    memory_region_set_generic_state_manager(attr->mr, NULL);
>> +}
> 
> Thanks,
> baolu


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd
  2025-05-09  7:55     ` Chenyi Qiang
@ 2025-05-09  8:18       ` David Hildenbrand
  2025-05-09 10:37         ` Chenyi Qiang
  0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-05-09  8:18 UTC (permalink / raw)
  To: Chenyi Qiang, Baolu Lu, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

>>>
>>> Signed-off-by: Chenyi Qiang<chenyi.qiang@intel.com>
>>
>> <...>
>>
>>> +
>>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion
>>> *mr)
>>> +{
>>> +    uint64_t shared_bitmap_size;
>>> +    const int block_size  = qemu_real_host_page_size();
>>> +    int ret;
>>> +
>>> +    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
>>> +
>>> +    attr->mr = mr;
>>> +    ret = memory_region_set_generic_state_manager(mr,
>>> GENERIC_STATE_MANAGER(attr));
>>> +    if (ret) {
>>> +        return ret;
>>> +    }
>>> +    attr->shared_bitmap_size = shared_bitmap_size;
>>> +    attr->shared_bitmap = bitmap_new(shared_bitmap_size);
>>
>> Above introduces a bitmap to track the private/shared state of each 4KB
>> page. While functional, for large RAM blocks managed by guest_memfd,
>> this could lead to significant memory consumption.
>>
>> Have you considered an alternative like a Maple Tree or a generic
>> interval tree? Both are often more memory-efficient for tracking ranges
>> of contiguous states.
> 
> Maybe not necessary. The memory overhead is 1 bit per page
> (1/(4096*8)=0.003%). I think it is not too much.

It's certainly not optimal.

IIRC, QEMU already maintains 3 dirty bitmaps in
ram_list.dirty_memory (DIRTY_MEMORY_NUM = 3) for guest ram.

With KVM, we also allocate yet another dirty bitmap without 
KVM_MEM_LOG_DIRTY_PAGES.

Assuming a 4 TiB VM, a single bitmap should be 128 MiB.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result
  2025-05-09  2:38     ` Chao Gao
@ 2025-05-09  8:20       ` David Hildenbrand
  2025-05-09  9:19         ` Chenyi Qiang
  0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2025-05-09  8:20 UTC (permalink / raw)
  To: Chao Gao, Chenyi Qiang
  Cc: Alexey Kardashevskiy, Peter Xu, Gupta Pankaj, Paolo Bonzini,
	Philippe Mathieu-Daudé, Michael Roth, qemu-devel, kvm,
	Williams Dan J, Peng Chao P, Xu Yilun, Li Xiaoyao

On 09.05.25 04:38, Chao Gao wrote:
> On Sun, Apr 27, 2025 at 10:26:52AM +0800, Chenyi Qiang wrote:
>> Hi David,
>>
>> Any thought on patch 10-12, which is to move the change attribute into a
>> priority listener. A problem is how to handle the error handling of
>> private_to_shared failure. Previously, we thought it would never be able
>> to fail, but right now, it is possible in corner cases (e.g. -ENOMEM) in
>> set_attribute_private(). At present, I simply raise an assert instead of
>> adding any rollback work (see patch 11).
> 
> I took a look at patches 10-12, and here are my thoughts:
> 
> Moving the change attribute into a priority listener seems sensible. It can
> ensure the correct order between setting memory attributes and VFIO's DMA
> map/unmap operations, and it can also simplify rollbacks. Since
> MemoryListener already uses a priority-based list, it should be a good fit
> for page conversion listeners.
> 
> Regarding error handling, -ENOMEM won't occur during page conversion
> because the attribute xarray on the KVM side is populated earlier when QEMU
> calls kvm_set_phys_mem() -> kvm_set_memory_attributes_private(). 

I'll note that, with guest_memfd supporting in-place conversion in the 
future, this conversion path will likely change, and we might more 
likely in getting more errors on some conversion paths. (e.g., shared -> 
private could fail).

But I agree, we should keep complex error handling out of the picture 
for now if not required.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result
  2025-04-27  2:26   ` Chenyi Qiang
  2025-05-09  2:38     ` Chao Gao
@ 2025-05-09  8:22     ` Baolu Lu
  2025-05-09 10:04       ` Chenyi Qiang
  1 sibling, 1 reply; 67+ messages in thread
From: Baolu Lu @ 2025-05-09  8:22 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: baolu.lu, qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao,
	Xu Yilun, Li Xiaoyao

On 4/27/2025 10:26 AM, Chenyi Qiang wrote:
> Hi David,
> 
> Any thought on patch 10-12, which is to move the change attribute into a
> priority listener. A problem is how to handle the error handling of
> private_to_shared failure. Previously, we thought it would never be able
> to fail, but right now, it is possible in corner cases (e.g. -ENOMEM) in
> set_attribute_private(). At present, I simply raise an assert instead of
> adding any rollback work (see patch 11).

Do the pages need to be pinned when converting them to a shared state
and unpinned when converting to a private state? Or is this handled
within the vfio_state_change_notify callbacks?

Thanks,
baolu

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 11/13] KVM: Introduce CVMPrivateSharedListener for attribute changes during page conversions
  2025-04-07  7:49 ` [PATCH v4 11/13] KVM: Introduce CVMPrivateSharedListener for attribute changes during page conversions Chenyi Qiang
@ 2025-05-09  9:03   ` Baolu Lu
  2025-05-12  3:18     ` Chenyi Qiang
  0 siblings, 1 reply; 67+ messages in thread
From: Baolu Lu @ 2025-05-09  9:03 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: baolu.lu, qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao,
	Xu Yilun, Li Xiaoyao

On 4/7/2025 3:49 PM, Chenyi Qiang wrote:
> With the introduction of the RamBlockAttribute object to manage
> RAMBlocks with guest_memfd and the implementation of
> PrivateSharedManager interface to convey page conversion events, it is
> more elegant to move attribute changes into a PrivateSharedListener.
> 
> The PrivateSharedListener is reigstered/unregistered for each memory
> region section during kvm_region_add/del(), and listeners are stored in
> a CVMPrivateSharedListener list for easy management. The listener
> handler performs attribute changes upon receiving notifications from
> private_shared_manager_state_change() calls. With this change, the
> state changes operations in kvm_convert_memory() can be removed.
> 
> Note that after moving attribute changes into a listener, errors can be
> returned in ram_block_attribute_notify_to_private() if attribute changes
> fail in corner cases (e.g. -ENOMEM). Since there is currently no rollback
> operation for the to_private case, an assert is used to prevent the
> guest from continuing with a partially changed attribute state.

 From the kernel IOMMU subsystem's perspective, this lack of rollback
might not be a significant issue. Currently, converting memory pages
from shared to private involves unpinning the pages and removing the
mappings from the IOMMU page table, both of which are typically non-
failing operations.

But, in the future, when it comes to partial conversions, there might be
a cut operation before the VFIO unmap. The kernel IOMMU subsystem cannot
guarantee an always-successful cut operation.

Thanks,
baolu

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result
  2025-05-09  8:20       ` David Hildenbrand
@ 2025-05-09  9:19         ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-05-09  9:19 UTC (permalink / raw)
  To: David Hildenbrand, Chao Gao
  Cc: Alexey Kardashevskiy, Peter Xu, Gupta Pankaj, Paolo Bonzini,
	Philippe Mathieu-Daudé, Michael Roth, qemu-devel, kvm,
	Williams Dan J, Peng Chao P, Xu Yilun, Li Xiaoyao



On 5/9/2025 4:20 PM, David Hildenbrand wrote:
> On 09.05.25 04:38, Chao Gao wrote:
>> On Sun, Apr 27, 2025 at 10:26:52AM +0800, Chenyi Qiang wrote:
>>> Hi David,
>>>
>>> Any thought on patch 10-12, which is to move the change attribute into a
>>> priority listener. A problem is how to handle the error handling of
>>> private_to_shared failure. Previously, we thought it would never be able
>>> to fail, but right now, it is possible in corner cases (e.g. -ENOMEM) in
>>> set_attribute_private(). At present, I simply raise an assert instead of
>>> adding any rollback work (see patch 11).
>>
>> I took a look at patches 10-12, and here are my thoughts:
>>
>> Moving the change attribute into a priority listener seems sensible.
>> It can
>> ensure the correct order between setting memory attributes and VFIO's DMA
>> map/unmap operations, and it can also simplify rollbacks. Since
>> MemoryListener already uses a priority-based list, it should be a good
>> fit
>> for page conversion listeners.
>>
>> Regarding error handling, -ENOMEM won't occur during page conversion
>> because the attribute xarray on the KVM side is populated earlier when
>> QEMU
>> calls kvm_set_phys_mem() -> kvm_set_memory_attributes_private(). 
> 
> I'll note that, with guest_memfd supporting in-place conversion in the
> future, this conversion path will likely change, and we might more
> likely in getting more errors on some conversion paths. (e.g., shared ->
> private could fail).
> 
> But I agree, we should keep complex error handling out of the picture
> for now if not required.

OK, I'll keep the current to_private conversion path simple without
rollback. Just give an assert to kvm_set_memory_attributes_private() in
the listener.

Per the to_shared conversion, As Chao mentioned, since any
failure in page conversion currently leads to QEMU quit (as seen in
kvm_cpu_exec() -> kvm_convert_memory()), the complex rollback seems
meaningless as well. Not sure if we need to keep it with the expectation
that QEMU changes to resume guest with some error status returned in the
future.

> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 12/13] ram-block-attribute: Add priority listener support for PrivateSharedListener
  2025-04-07  7:49 ` [PATCH v4 12/13] ram-block-attribute: Add priority listener support for PrivateSharedListener Chenyi Qiang
@ 2025-05-09  9:23   ` Baolu Lu
  2025-05-09  9:39     ` Chenyi Qiang
  0 siblings, 1 reply; 67+ messages in thread
From: Baolu Lu @ 2025-05-09  9:23 UTC (permalink / raw)
  To: Chenyi Qiang, David Hildenbrand, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: baolu.lu, qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao,
	Xu Yilun, Li Xiaoyao

On 4/7/2025 3:49 PM, Chenyi Qiang wrote:
> In-place page conversion requires operations to follow a specific
> sequence: unmap-before-conversion-to-private and
> map-after-conversion-to-shared. Currently, both attribute changes and
> VFIO DMA map/unmap operations are handled by PrivateSharedListeners,
> they need to be invoked in a specific order.
> 
> For private to shared conversion:
> - Change attribute to shared.
> - VFIO populates the shared mappings into the IOMMU.
> - Restore attribute if the operation fails.
> 
> For shared to private conversion:
> - VFIO discards shared mapping from the IOMMU.
> - Change attribute to private.
> 
> To faciliate this sequence, priority support is added to
> PrivateSharedListener so that listeners are stored in a determined
> order based on priority. A tail queue is used to store listeners,
> allowing traversal in either direction.
> 
> Signed-off-by: Chenyi Qiang<chenyi.qiang@intel.com>
> ---
> Changes in v4:
>      - Newly added.
> ---
>   accel/kvm/kvm-all.c          |  3 ++-
>   hw/vfio/common.c             |  3 ++-
>   include/exec/memory.h        | 19 +++++++++++++++++--
>   include/exec/ramblock.h      |  2 +-
>   system/ram-block-attribute.c | 23 +++++++++++++++++------
>   5 files changed, 39 insertions(+), 11 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index aec64d559b..879c61b391 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -1745,7 +1745,8 @@ static void kvm_region_add(MemoryListener *listener,
>       psl = &cpsl->listener;
>       QLIST_INSERT_HEAD(&cgs->cvm_private_shared_list, cpsl, next);
>       private_shared_listener_init(psl, kvm_private_shared_notify_to_shared,
> -                                 kvm_private_shared_notify_to_private);
> +                                 kvm_private_shared_notify_to_private,
> +                                 PRIVATE_SHARED_LISTENER_PRIORITY_MIN);
>       generic_state_manager_register_listener(gsm, &psl->scl, section);
>   }
>   
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 6e49ae597d..a8aacae26c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -515,7 +515,8 @@ static void vfio_register_private_shared_listener(VFIOContainerBase *bcontainer,
>   
>       psl = &vpsl->listener;
>       private_shared_listener_init(psl, vfio_private_shared_notify_to_shared,
> -                                 vfio_private_shared_notify_to_private);
> +                                 vfio_private_shared_notify_to_private,
> +                                 PRIVATE_SHARED_LISTENER_PRIORITY_COMMON);
>       generic_state_manager_register_listener(gsm, &psl->scl, section);
>       QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next);
>   }
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 9472d9e9b4..3d06cc04a0 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -770,11 +770,24 @@ struct RamDiscardManagerClass {
>       GenericStateManagerClass parent_class;
>   };
>   
> +#define PRIVATE_SHARED_LISTENER_PRIORITY_MIN       0
> +#define PRIVATE_SHARED_LISTENER_PRIORITY_COMMON    10

For the current implementation with primarily KVM and VFIO needing
ordered execution, the two priority levels are likely sufficient. Not
sure whether it needs more priority levels for future development.

Thanks,
baolu

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 12/13] ram-block-attribute: Add priority listener support for PrivateSharedListener
  2025-05-09  9:23   ` Baolu Lu
@ 2025-05-09  9:39     ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-05-09  9:39 UTC (permalink / raw)
  To: Baolu Lu, David Hildenbrand, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 5/9/2025 5:23 PM, Baolu Lu wrote:
> On 4/7/2025 3:49 PM, Chenyi Qiang wrote:
>> In-place page conversion requires operations to follow a specific
>> sequence: unmap-before-conversion-to-private and
>> map-after-conversion-to-shared. Currently, both attribute changes and
>> VFIO DMA map/unmap operations are handled by PrivateSharedListeners,
>> they need to be invoked in a specific order.
>>
>> For private to shared conversion:
>> - Change attribute to shared.
>> - VFIO populates the shared mappings into the IOMMU.
>> - Restore attribute if the operation fails.
>>
>> For shared to private conversion:
>> - VFIO discards shared mapping from the IOMMU.
>> - Change attribute to private.
>>
>> To faciliate this sequence, priority support is added to
>> PrivateSharedListener so that listeners are stored in a determined
>> order based on priority. A tail queue is used to store listeners,
>> allowing traversal in either direction.
>>
>> Signed-off-by: Chenyi Qiang<chenyi.qiang@intel.com>
>> ---
>> Changes in v4:
>>      - Newly added.
>> ---
>>   accel/kvm/kvm-all.c          |  3 ++-
>>   hw/vfio/common.c             |  3 ++-
>>   include/exec/memory.h        | 19 +++++++++++++++++--
>>   include/exec/ramblock.h      |  2 +-
>>   system/ram-block-attribute.c | 23 +++++++++++++++++------
>>   5 files changed, 39 insertions(+), 11 deletions(-)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index aec64d559b..879c61b391 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -1745,7 +1745,8 @@ static void kvm_region_add(MemoryListener
>> *listener,
>>       psl = &cpsl->listener;
>>       QLIST_INSERT_HEAD(&cgs->cvm_private_shared_list, cpsl, next);
>>       private_shared_listener_init(psl,
>> kvm_private_shared_notify_to_shared,
>> -                                 kvm_private_shared_notify_to_private);
>> +                                 kvm_private_shared_notify_to_private,
>> +                                 PRIVATE_SHARED_LISTENER_PRIORITY_MIN);
>>       generic_state_manager_register_listener(gsm, &psl->scl, section);
>>   }
>>   diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 6e49ae597d..a8aacae26c 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -515,7 +515,8 @@ static void
>> vfio_register_private_shared_listener(VFIOContainerBase *bcontainer,
>>         psl = &vpsl->listener;
>>       private_shared_listener_init(psl,
>> vfio_private_shared_notify_to_shared,
>> -                                 vfio_private_shared_notify_to_private);
>> +                                 vfio_private_shared_notify_to_private,
>> +                                
>> PRIVATE_SHARED_LISTENER_PRIORITY_COMMON);
>>       generic_state_manager_register_listener(gsm, &psl->scl, section);
>>       QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next);
>>   }
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 9472d9e9b4..3d06cc04a0 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -770,11 +770,24 @@ struct RamDiscardManagerClass {
>>       GenericStateManagerClass parent_class;
>>   };
>>   +#define PRIVATE_SHARED_LISTENER_PRIORITY_MIN       0
>> +#define PRIVATE_SHARED_LISTENER_PRIORITY_COMMON    10
> 
> For the current implementation with primarily KVM and VFIO needing
> ordered execution, the two priority levels are likely sufficient. Not
> sure whether it needs more priority levels for future development.

For the priority levels, I think they can be classified. Subsystems like
VFIO don't require explicit order and can be put in the same level. The
primary KVM responsible for changing attribute should be classified
separately.

In addition, since this priority support mainly serves the in-place
conversion. I'll drop this patch temporarily since in-place conversion
will likely change the path.

> 
> Thanks,
> baolu


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result
  2025-05-09  8:22     ` Baolu Lu
@ 2025-05-09 10:04       ` Chenyi Qiang
  2025-05-12  7:54         ` David Hildenbrand
  0 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-05-09 10:04 UTC (permalink / raw)
  To: Baolu Lu, David Hildenbrand, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 5/9/2025 4:22 PM, Baolu Lu wrote:
> On 4/27/2025 10:26 AM, Chenyi Qiang wrote:
>> Hi David,
>>
>> Any thought on patch 10-12, which is to move the change attribute into a
>> priority listener. A problem is how to handle the error handling of
>> private_to_shared failure. Previously, we thought it would never be able
>> to fail, but right now, it is possible in corner cases (e.g. -ENOMEM) in
>> set_attribute_private(). At present, I simply raise an assert instead of
>> adding any rollback work (see patch 11).
> 
> Do the pages need to be pinned when converting them to a shared state
> and unpinned when converting to a private state? Or is this handled
> within the vfio_state_change_notify callbacks?

I think it is handled in vfio_state_change_notify(). Just like the
device passthrough in legacy VM, the shared memory will be pinned during
vfio dma-map and unpin during unmap.

> 
> Thanks,
> baolu


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd
  2025-05-09  8:18       ` David Hildenbrand
@ 2025-05-09 10:37         ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-05-09 10:37 UTC (permalink / raw)
  To: David Hildenbrand, Baolu Lu, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 5/9/2025 4:18 PM, David Hildenbrand wrote:
>>>>
>>>> Signed-off-by: Chenyi Qiang<chenyi.qiang@intel.com>
>>>
>>> <...>
>>>
>>>> +
>>>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion
>>>> *mr)
>>>> +{
>>>> +    uint64_t shared_bitmap_size;
>>>> +    const int block_size  = qemu_real_host_page_size();
>>>> +    int ret;
>>>> +
>>>> +    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
>>>> +
>>>> +    attr->mr = mr;
>>>> +    ret = memory_region_set_generic_state_manager(mr,
>>>> GENERIC_STATE_MANAGER(attr));
>>>> +    if (ret) {
>>>> +        return ret;
>>>> +    }
>>>> +    attr->shared_bitmap_size = shared_bitmap_size;
>>>> +    attr->shared_bitmap = bitmap_new(shared_bitmap_size);
>>>
>>> Above introduces a bitmap to track the private/shared state of each 4KB
>>> page. While functional, for large RAM blocks managed by guest_memfd,
>>> this could lead to significant memory consumption.
>>>
>>> Have you considered an alternative like a Maple Tree or a generic
>>> interval tree? Both are often more memory-efficient for tracking ranges
>>> of contiguous states.
>>
>> Maybe not necessary. The memory overhead is 1 bit per page
>> (1/(4096*8)=0.003%). I think it is not too much.
> 
> It's certainly not optimal.
> 
> IIRC, QEMU already maintains 3 dirty bitmaps in
> ram_list.dirty_memory (DIRTY_MEMORY_NUM = 3) for guest ram.
> 
> With KVM, we also allocate yet another dirty bitmap without
> KVM_MEM_LOG_DIRTY_PAGES.
> 
> Assuming a 4 TiB VM, a single bitmap should be 128 MiB.

OK. So this is a long-term issue which could be optimized in many
places. I think it needs more efforts to evaluate the benefits of the
change. Currently, maybe put it as a future work.

> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 11/13] KVM: Introduce CVMPrivateSharedListener for attribute changes during page conversions
  2025-05-09  9:03   ` Baolu Lu
@ 2025-05-12  3:18     ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-05-12  3:18 UTC (permalink / raw)
  To: Baolu Lu, David Hildenbrand, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 5/9/2025 5:03 PM, Baolu Lu wrote:
> On 4/7/2025 3:49 PM, Chenyi Qiang wrote:
>> With the introduction of the RamBlockAttribute object to manage
>> RAMBlocks with guest_memfd and the implementation of
>> PrivateSharedManager interface to convey page conversion events, it is
>> more elegant to move attribute changes into a PrivateSharedListener.
>>
>> The PrivateSharedListener is reigstered/unregistered for each memory
>> region section during kvm_region_add/del(), and listeners are stored in
>> a CVMPrivateSharedListener list for easy management. The listener
>> handler performs attribute changes upon receiving notifications from
>> private_shared_manager_state_change() calls. With this change, the
>> state changes operations in kvm_convert_memory() can be removed.
>>
>> Note that after moving attribute changes into a listener, errors can be
>> returned in ram_block_attribute_notify_to_private() if attribute changes
>> fail in corner cases (e.g. -ENOMEM). Since there is currently no rollback
>> operation for the to_private case, an assert is used to prevent the
>> guest from continuing with a partially changed attribute state.
> 
> From the kernel IOMMU subsystem's perspective, this lack of rollback
> might not be a significant issue. Currently, converting memory pages
> from shared to private involves unpinning the pages and removing the
> mappings from the IOMMU page table, both of which are typically non-
> failing operations.
> 
> But, in the future, when it comes to partial conversions, there might be
> a cut operation before the VFIO unmap. The kernel IOMMU subsystem cannot
> guarantee an always-successful cut operation.

Indeed. cut_mapping could fail and in-place conversion path would
change, which makes the error handling more complicated in the future.

At present, the basic convert memory handling does it in the simplest
way, i.e. QEMU quit if failed, which puzzled me a little bit: Should I
follow this simplest thought to just return without rollback, or keep
the rollback logic in case the future change requires it. Maybe I can
move the rollback handling in a individual patch for ease of management.

> 
> Thanks,
> baolu


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range
  2025-04-07  7:49 ` [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range Chenyi Qiang
  2025-04-09  2:47   ` Alexey Kardashevskiy
@ 2025-05-12  3:24   ` Zhao Liu
  1 sibling, 0 replies; 67+ messages in thread
From: Zhao Liu @ 2025-05-12  3:24 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth,
	qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

On Mon, Apr 07, 2025 at 03:49:21PM +0800, Chenyi Qiang wrote:
> Date: Mon,  7 Apr 2025 15:49:21 +0800
> From: Chenyi Qiang <chenyi.qiang@intel.com>
> Subject: [PATCH v4 01/13] memory: Export a helper to get intersection of a
>  MemoryRegionSection with a given range
> X-Mailer: git-send-email 2.43.5
> 
> Rename the helper to memory_region_section_intersect_range() to make it
> more generic. Meanwhile, define the @end as Int128 and replace the
> related operations with Int128_* format since the helper is exported as
> a wider API.
> 
> Suggested-by: Alexey Kardashevskiy <aik@amd.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v4:
>     - No change.
> 
> Changes in v3:
>     - No change
> 
> Changes in v2:
>     - Make memory_region_section_intersect_range() an inline function.
>     - Add Reviewed-by from David
>     - Define the @end as Int128 and use the related Int128_* ops as a wilder
>       API (Alexey)
> ---
>  hw/virtio/virtio-mem.c | 32 +++++---------------------------
>  include/exec/memory.h  | 27 +++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+), 27 deletions(-)

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result
  2025-05-09 10:04       ` Chenyi Qiang
@ 2025-05-12  7:54         ` David Hildenbrand
  0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2025-05-12  7:54 UTC (permalink / raw)
  To: Chenyi Qiang, Baolu Lu, Alexey Kardashevskiy, Peter Xu,
	Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé,
	Michael Roth
  Cc: qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

On 09.05.25 12:04, Chenyi Qiang wrote:
> 
> 
> On 5/9/2025 4:22 PM, Baolu Lu wrote:
>> On 4/27/2025 10:26 AM, Chenyi Qiang wrote:
>>> Hi David,
>>>
>>> Any thought on patch 10-12, which is to move the change attribute into a
>>> priority listener. A problem is how to handle the error handling of
>>> private_to_shared failure. Previously, we thought it would never be able
>>> to fail, but right now, it is possible in corner cases (e.g. -ENOMEM) in
>>> set_attribute_private(). At present, I simply raise an assert instead of
>>> adding any rollback work (see patch 11).
>>
>> Do the pages need to be pinned when converting them to a shared state
>> and unpinned when converting to a private state? Or is this handled
>> within the vfio_state_change_notify callbacks?
> 
> I think it is handled in vfio_state_change_notify(). Just like the
> device passthrough in legacy VM, the shared memory will be pinned during
> vfio dma-map and unpin during unmap.

We'll have to "unmap/unpin before shared->private" and "map/pin after 
private->shared" conversion.


vfio cannot fail unmap/unpin, but guest_memfd will be able to "easily" 
fail shared->private. But in that case (temporary references) we'll 
likely have to retry the conversion until it works.

guest_memfd cannot "easily" fail private->shared conversion, but vfio 
can fail map/pin, in which case we probably have to abort the conversion.

Error handling/recovery will be a bit more tricky than it is today.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd
  2025-04-07  7:49 ` [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd Chenyi Qiang
  2025-04-09  9:57   ` Alexey Kardashevskiy
  2025-05-09  6:41   ` Baolu Lu
@ 2025-05-12  8:07   ` Zhao Liu
  2025-05-12  9:43     ` Chenyi Qiang
  2 siblings, 1 reply; 67+ messages in thread
From: Zhao Liu @ 2025-05-12  8:07 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth,
	qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

[snip]

> ---
>  include/exec/ramblock.h      |  24 +++
>  system/meson.build           |   1 +
>  system/ram-block-attribute.c | 282 +++++++++++++++++++++++++++++++++++
>  3 files changed, 307 insertions(+)
>  create mode 100644 system/ram-block-attribute.c

checkpatch.pl complains a lot about code line length:

total: 5 errors, 34 warnings, 324 lines checked

> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 0babd105c0..b8b5469db9 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -23,6 +23,10 @@
>  #include "cpu-common.h"
>  #include "qemu/rcu.h"
>  #include "exec/ramlist.h"
> +#include "system/hostmem.h"
> +
> +#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
> +OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass, RAM_BLOCK_ATTRIBUTE)

Could we use "OBJECT_DECLARE_SIMPLE_TYPE" here? Since I find class
doesn't have any virtual method.

>  struct RAMBlock {
>      struct rcu_head rcu;
> @@ -90,5 +94,25 @@ struct RAMBlock {
>       */
>      ram_addr_t postcopy_length;
>  };
> +
> +struct RamBlockAttribute {
> +    Object parent;
> +
> +    MemoryRegion *mr;
> +
> +    /* 1-setting of the bit represents the memory is populated (shared) */
> +    unsigned shared_bitmap_size;
> +    unsigned long *shared_bitmap;
> +
> +    QLIST_HEAD(, PrivateSharedListener) psl_list;
> +};
> +
> +struct RamBlockAttributeClass {
> +    ObjectClass parent_class;
> +};

With OBJECT_DECLARE_SIMPLE_TYPE, this class definition is not needed.

> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr);
> +void ram_block_attribute_unrealize(RamBlockAttribute *attr);
> +
>  #endif
>  #endif
> diff --git a/system/meson.build b/system/meson.build
> index 4952f4b2c7..50a5a64f1c 100644
> --- a/system/meson.build
> +++ b/system/meson.build
> @@ -15,6 +15,7 @@ system_ss.add(files(
>    'dirtylimit.c',
>    'dma-helpers.c',
>    'globals.c',
> +  'ram-block-attribute.c',

This new file is missing a MAINTAINERS entry.

>    'memory_mapping.c',
>    'qdev-monitor.c',
>    'qtest.c',

[snip]

> +static size_t ram_block_attribute_get_block_size(const RamBlockAttribute *attr)
> +{
> +    /*
> +     * Because page conversion could be manipulated in the size of at least 4K or 4K aligned,
> +     * Use the host page size as the granularity to track the memory attribute.
> +     */
> +    g_assert(attr && attr->mr && attr->mr->ram_block);
> +    g_assert(attr->mr->ram_block->page_size == qemu_real_host_page_size());
> +    return attr->mr->ram_block->page_size;

What about using qemu_ram_pagesize() instead of accessing
ram_block->page_size directly?

Additionally, maybe we can add a simple helper to get page size from
RamBlockAttribute.

> +}
> +

[snip]

> +static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
> +                                                      StateChangeListener *scl,
> +                                                      MemoryRegionSection *section)
> +{
> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
> +    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
> +    int ret;
> +
> +    g_assert(section->mr == attr->mr);
> +    scl->section = memory_region_section_new_copy(section);
> +
> +    QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
> +
> +    ret = ram_block_attribute_for_each_shared_section(attr, section, scl,
> +                                                      ram_block_attribute_notify_shared_cb);
> +    if (ret) {
> +        error_report("%s: Failed to register RAM discard listener: %s", __func__,
> +                     strerror(-ret));

There will be 2 error messages: one is the above, and another is from
ram_block_attribute_for_each_shared_section().

Could we just exit to handle this error?

> +    }
> +}
> +
> +static void ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm,
> +                                                        StateChangeListener *scl)
> +{
> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
> +    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
> +    int ret;
> +
> +    g_assert(scl->section);
> +    g_assert(scl->section->mr == attr->mr);
> +
> +    ret = ram_block_attribute_for_each_shared_section(attr, scl->section, scl,
> +                                                      ram_block_attribute_notify_private_cb);
> +    if (ret) {
> +        error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
> +                     strerror(-ret));

Ditto.

> +    }
> +
> +    memory_region_section_free_copy(scl->section);
> +    scl->section = NULL;
> +    QLIST_REMOVE(psl, next);
> +}
> +

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd
  2025-05-12  8:07   ` Zhao Liu
@ 2025-05-12  9:43     ` Chenyi Qiang
  2025-05-13  8:31       ` Zhao Liu
  0 siblings, 1 reply; 67+ messages in thread
From: Chenyi Qiang @ 2025-05-12  9:43 UTC (permalink / raw)
  To: Zhao Liu
  Cc: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth,
	qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

Thanks Zhao for your review!

On 5/12/2025 4:07 PM, Zhao Liu wrote:
> [snip]
> 
>> ---
>>  include/exec/ramblock.h      |  24 +++
>>  system/meson.build           |   1 +
>>  system/ram-block-attribute.c | 282 +++++++++++++++++++++++++++++++++++
>>  3 files changed, 307 insertions(+)
>>  create mode 100644 system/ram-block-attribute.c
> 
> checkpatch.pl complains a lot about code line length:
> 
> total: 5 errors, 34 warnings, 324 lines checked

Thanks for reminder, I have adjusted indent locally and fixed most of
them but still have some warnings for the function definition like:
static uint64_t ram_block_attribute_rdm_get_min_granularity(const
RamDiscardManager *rdm, const MemoryRegion *mr). The "rdm" argument in
the same line with function name will exceed 80 width and I think it is
acceptable to keep it.

> 
>> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
>> index 0babd105c0..b8b5469db9 100644
>> --- a/include/exec/ramblock.h
>> +++ b/include/exec/ramblock.h
>> @@ -23,6 +23,10 @@
>>  #include "cpu-common.h"
>>  #include "qemu/rcu.h"
>>  #include "exec/ramlist.h"
>> +#include "system/hostmem.h"
>> +
>> +#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
>> +OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass, RAM_BLOCK_ATTRIBUTE)
> 
> Could we use "OBJECT_DECLARE_SIMPLE_TYPE" here? Since I find class
> doesn't have any virtual method.

Yes, we can. Previously, I defined the state_change() method for the
class (MemoryAttributeManagerClass) [1] instead of parent
PrivateSharedManagerClass. And leave it unchanged in this version.

In next version, I will drop PrivateShareManager and revert to use
RamDiscardManager. Then, maybe I should also use
OBJECT_DECLARE_SIMPLE_TYPE and make state_change() an exported function
instead of a virtual method since no derived class for RamBlockAttribute.

[1]
https://lore.kernel.org/qemu-devel/20250310081837.13123-6-chenyi.qiang@intel.com/

> 
>>  struct RAMBlock {
>>      struct rcu_head rcu;
>> @@ -90,5 +94,25 @@ struct RAMBlock {
>>       */
>>      ram_addr_t postcopy_length;
>>  };
>> +
>> +struct RamBlockAttribute {
>> +    Object parent;
>> +
>> +    MemoryRegion *mr;
>> +
>> +    /* 1-setting of the bit represents the memory is populated (shared) */
>> +    unsigned shared_bitmap_size;
>> +    unsigned long *shared_bitmap;
>> +
>> +    QLIST_HEAD(, PrivateSharedListener) psl_list;
>> +};
>> +
>> +struct RamBlockAttributeClass {
>> +    ObjectClass parent_class;
>> +};
> 
> With OBJECT_DECLARE_SIMPLE_TYPE, this class definition is not needed.
> 
>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr);
>> +void ram_block_attribute_unrealize(RamBlockAttribute *attr);
>> +
>>  #endif
>>  #endif
>> diff --git a/system/meson.build b/system/meson.build
>> index 4952f4b2c7..50a5a64f1c 100644
>> --- a/system/meson.build
>> +++ b/system/meson.build
>> @@ -15,6 +15,7 @@ system_ss.add(files(
>>    'dirtylimit.c',
>>    'dma-helpers.c',
>>    'globals.c',
>> +  'ram-block-attribute.c',
> 
> This new file is missing a MAINTAINERS entry.

It is still uncertain to me if we need to introduce a new file or add
the code in an existing one like system/physmem.c. Anyway, I can add the
MAINTAINERS entry if no objection.

> 
>>    'memory_mapping.c',
>>    'qdev-monitor.c',
>>    'qtest.c',
> 
> [snip]
> 
>> +static size_t ram_block_attribute_get_block_size(const RamBlockAttribute *attr)
>> +{
>> +    /*
>> +     * Because page conversion could be manipulated in the size of at least 4K or 4K aligned,
>> +     * Use the host page size as the granularity to track the memory attribute.
>> +     */
>> +    g_assert(attr && attr->mr && attr->mr->ram_block);
>> +    g_assert(attr->mr->ram_block->page_size == qemu_real_host_page_size());
>> +    return attr->mr->ram_block->page_size;
> 
> What about using qemu_ram_pagesize() instead of accessing
> ram_block->page_size directly?

Make sense!

> 
> Additionally, maybe we can add a simple helper to get page size from
> RamBlockAttribute.

Do you mean introduce a new field page_size and related helper? That was
my first version and but suggested with current implementation
(https://lore.kernel.org/qemu-devel/b55047fd-7b73-4669-b6d2-31653064f27f@intel.com/)

> 
>> +}
>> +
> 
> [snip]
> 
>> +static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
>> +                                                      StateChangeListener *scl,
>> +                                                      MemoryRegionSection *section)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
>> +    int ret;
>> +
>> +    g_assert(section->mr == attr->mr);
>> +    scl->section = memory_region_section_new_copy(section);
>> +
>> +    QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
>> +
>> +    ret = ram_block_attribute_for_each_shared_section(attr, section, scl,
>> +                                                      ram_block_attribute_notify_shared_cb);
>> +    if (ret) {
>> +        error_report("%s: Failed to register RAM discard listener: %s", __func__,
>> +                     strerror(-ret));
> 
> There will be 2 error messages: one is the above, and another is from
> ram_block_attribute_for_each_shared_section().
> 
> Could we just exit to handle this error?

Sure, will remove this message as well as the below one.

> 
>> +    }
>> +}
>> +
>> +static void ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm,
>> +                                                        StateChangeListener *scl)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
>> +    int ret;
>> +
>> +    g_assert(scl->section);
>> +    g_assert(scl->section->mr == attr->mr);
>> +
>> +    ret = ram_block_attribute_for_each_shared_section(attr, scl->section, scl,
>> +                                                      ram_block_attribute_notify_private_cb);
>> +    if (ret) {
>> +        error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
>> +                     strerror(-ret));
> 
> Ditto.
> 
>> +    }
>> +
>> +    memory_region_section_free_copy(scl->section);
>> +    scl->section = NULL;
>> +    QLIST_REMOVE(psl, next);
>> +}
>> +


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd
  2025-05-12  9:43     ` Chenyi Qiang
@ 2025-05-13  8:31       ` Zhao Liu
  2025-05-14  1:39         ` Chenyi Qiang
  0 siblings, 1 reply; 67+ messages in thread
From: Zhao Liu @ 2025-05-13  8:31 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth,
	qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao

> >> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> >> index 0babd105c0..b8b5469db9 100644
> >> --- a/include/exec/ramblock.h
> >> +++ b/include/exec/ramblock.h
> >> @@ -23,6 +23,10 @@
> >>  #include "cpu-common.h"
> >>  #include "qemu/rcu.h"
> >>  #include "exec/ramlist.h"
> >> +#include "system/hostmem.h"
> >> +
> >> +#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
> >> +OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass, RAM_BLOCK_ATTRIBUTE)
> > 
> > Could we use "OBJECT_DECLARE_SIMPLE_TYPE" here? Since I find class
> > doesn't have any virtual method.
> 
> Yes, we can. Previously, I defined the state_change() method for the
> class (MemoryAttributeManagerClass) [1] instead of parent
> PrivateSharedManagerClass. And leave it unchanged in this version.
> 
> In next version, I will drop PrivateShareManager and revert to use
> RamDiscardManager. Then, maybe I should also use
> OBJECT_DECLARE_SIMPLE_TYPE and make state_change() an exported function
> instead of a virtual method since no derived class for RamBlockAttribute.

Thank you! I see. I don't have an opinion on whether to add virtual
method or not, if you feel it's appropriate then adding class is fine.
(My comment may be outdated, it's just for the fact that there is no
need to add class in this patch.) Looking forward to your next version.

> [1]
> https://lore.kernel.org/qemu-devel/20250310081837.13123-6-chenyi.qiang@intel.com/
> 
> > 
> >>  struct RAMBlock {
> >>      struct rcu_head rcu;
> >> @@ -90,5 +94,25 @@ struct RAMBlock {
> >>       */
> >>      ram_addr_t postcopy_length;
> >>  };
> >> +

[snip]

> >> +static size_t ram_block_attribute_get_block_size(const RamBlockAttribute *attr)
> >> +{
> >> +    /*
> >> +     * Because page conversion could be manipulated in the size of at least 4K or 4K aligned,
> >> +     * Use the host page size as the granularity to track the memory attribute.
> >> +     */
> >> +    g_assert(attr && attr->mr && attr->mr->ram_block);
> >> +    g_assert(attr->mr->ram_block->page_size == qemu_real_host_page_size());
> >> +    return attr->mr->ram_block->page_size;
> > 
> > What about using qemu_ram_pagesize() instead of accessing
> > ram_block->page_size directly?
> 
> Make sense!
> 
> > 
> > Additionally, maybe we can add a simple helper to get page size from
> > RamBlockAttribute.
> 
> Do you mean introduce a new field page_size and related helper? That was
> my first version and but suggested with current implementation
> (https://lore.kernel.org/qemu-devel/b55047fd-7b73-4669-b6d2-31653064f27f@intel.com/)

Yes, that's exactly my point. It's up to you if it's really necessary :-).

> > 
> >> +}
> >> +
> > 
> > [snip]
> > 
> >> +static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
> >> +                                                      StateChangeListener *scl,
> >> +                                                      MemoryRegionSection *section)
> >> +{
> >> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
> >> +    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
> >> +    int ret;
> >> +
> >> +    g_assert(section->mr == attr->mr);
> >> +    scl->section = memory_region_section_new_copy(section);
> >> +
> >> +    QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
> >> +
> >> +    ret = ram_block_attribute_for_each_shared_section(attr, section, scl,
> >> +                                                      ram_block_attribute_notify_shared_cb);
> >> +    if (ret) {
> >> +        error_report("%s: Failed to register RAM discard listener: %s", __func__,
> >> +                     strerror(-ret));
> > 
> > There will be 2 error messages: one is the above, and another is from
> > ram_block_attribute_for_each_shared_section().
> > 
> > Could we just exit to handle this error?
> 
> Sure, will remove this message as well as the below one.

   if (ret) {
       error_report("%s: Failed to register RAM discard listener: %s", __func__,
                    strerror(-ret);
       exit(1);
   }

I mean adding a exit() here. When there's the error, if we expect it not to
break the QEMU, then perhaps warning is better. Otherwise, it's better to
handle this error. Direct exit() feels like an option.

Thanks,
Zhao

> > 
> >> +    }
> >> +}
> >> +

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd
  2025-05-13  8:31       ` Zhao Liu
@ 2025-05-14  1:39         ` Chenyi Qiang
  0 siblings, 0 replies; 67+ messages in thread
From: Chenyi Qiang @ 2025-05-14  1:39 UTC (permalink / raw)
  To: Zhao Liu
  Cc: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
	Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth,
	qemu-devel, kvm, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun,
	Li Xiaoyao



On 5/13/2025 4:31 PM, Zhao Liu wrote:
>>>> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
>>>> index 0babd105c0..b8b5469db9 100644
>>>> --- a/include/exec/ramblock.h
>>>> +++ b/include/exec/ramblock.h
>>>> @@ -23,6 +23,10 @@
>>>>  #include "cpu-common.h"
>>>>  #include "qemu/rcu.h"
>>>>  #include "exec/ramlist.h"
>>>> +#include "system/hostmem.h"
>>>> +
>>>> +#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
>>>> +OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass, RAM_BLOCK_ATTRIBUTE)
>>>
>>> Could we use "OBJECT_DECLARE_SIMPLE_TYPE" here? Since I find class
>>> doesn't have any virtual method.
>>
>> Yes, we can. Previously, I defined the state_change() method for the
>> class (MemoryAttributeManagerClass) [1] instead of parent
>> PrivateSharedManagerClass. And leave it unchanged in this version.
>>
>> In next version, I will drop PrivateShareManager and revert to use
>> RamDiscardManager. Then, maybe I should also use
>> OBJECT_DECLARE_SIMPLE_TYPE and make state_change() an exported function
>> instead of a virtual method since no derived class for RamBlockAttribute.
> 
> Thank you! I see. I don't have an opinion on whether to add virtual
> method or not, if you feel it's appropriate then adding class is fine.
> (My comment may be outdated, it's just for the fact that there is no
> need to add class in this patch.) Looking forward to your next version.
> 
>> [1]
>> https://lore.kernel.org/qemu-devel/20250310081837.13123-6-chenyi.qiang@intel.com/
>>
>>>
>>>>  struct RAMBlock {
>>>>      struct rcu_head rcu;
>>>> @@ -90,5 +94,25 @@ struct RAMBlock {
>>>>       */
>>>>      ram_addr_t postcopy_length;
>>>>  };
>>>> +
> 
> [snip]
> 
>>>> +static size_t ram_block_attribute_get_block_size(const RamBlockAttribute *attr)
>>>> +{
>>>> +    /*
>>>> +     * Because page conversion could be manipulated in the size of at least 4K or 4K aligned,
>>>> +     * Use the host page size as the granularity to track the memory attribute.
>>>> +     */
>>>> +    g_assert(attr && attr->mr && attr->mr->ram_block);
>>>> +    g_assert(attr->mr->ram_block->page_size == qemu_real_host_page_size());
>>>> +    return attr->mr->ram_block->page_size;
>>>
>>> What about using qemu_ram_pagesize() instead of accessing
>>> ram_block->page_size directly?
>>
>> Make sense!
>>
>>>
>>> Additionally, maybe we can add a simple helper to get page size from
>>> RamBlockAttribute.
>>
>> Do you mean introduce a new field page_size and related helper? That was
>> my first version and but suggested with current implementation
>> (https://lore.kernel.org/qemu-devel/b55047fd-7b73-4669-b6d2-31653064f27f@intel.com/)
> 
> Yes, that's exactly my point. It's up to you if it's really necessary :-).
> 
>>>
>>>> +}
>>>> +
>>>
>>> [snip]
>>>
>>>> +static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
>>>> +                                                      StateChangeListener *scl,
>>>> +                                                      MemoryRegionSection *section)
>>>> +{
>>>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>>>> +    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
>>>> +    int ret;
>>>> +
>>>> +    g_assert(section->mr == attr->mr);
>>>> +    scl->section = memory_region_section_new_copy(section);
>>>> +
>>>> +    QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
>>>> +
>>>> +    ret = ram_block_attribute_for_each_shared_section(attr, section, scl,
>>>> +                                                      ram_block_attribute_notify_shared_cb);
>>>> +    if (ret) {
>>>> +        error_report("%s: Failed to register RAM discard listener: %s", __func__,
>>>> +                     strerror(-ret));
>>>
>>> There will be 2 error messages: one is the above, and another is from
>>> ram_block_attribute_for_each_shared_section().
>>>
>>> Could we just exit to handle this error?
>>
>> Sure, will remove this message as well as the below one.
> 
>    if (ret) {
>        error_report("%s: Failed to register RAM discard listener: %s", __func__,
>                     strerror(-ret);
>        exit(1);
>    }
> 
> I mean adding a exit() here. When there's the error, if we expect it not to
> break the QEMU, then perhaps warning is better. Otherwise, it's better to
> handle this error. Direct exit() feels like an option.

Sorry for my misunderstanding. You are right, only warning may cause
unexpected behavior especially after adding a new listener for changing
attribute. Will add a direct exit() here.

> 
> Thanks,
> Zhao
> 
>>>
>>>> +    }
>>>> +}
>>>> +


^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2025-05-14  1:40 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-07  7:49 [PATCH v4 00/13] Enable shared device assignment Chenyi Qiang
2025-04-07  7:49 ` [PATCH v4 01/13] memory: Export a helper to get intersection of a MemoryRegionSection with a given range Chenyi Qiang
2025-04-09  2:47   ` Alexey Kardashevskiy
2025-04-09  6:26     ` Chenyi Qiang
2025-04-09  6:45       ` Alexey Kardashevskiy
2025-04-09  7:38         ` Chenyi Qiang
2025-05-12  3:24   ` Zhao Liu
2025-04-07  7:49 ` [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager() to return the result Chenyi Qiang
2025-04-07  9:53   ` Xiaoyao Li
2025-04-08  0:50     ` Chenyi Qiang
2025-04-09  5:35   ` Alexey Kardashevskiy
2025-04-09  5:52     ` Chenyi Qiang
2025-04-25 12:35       ` David Hildenbrand
2025-04-07  7:49 ` [PATCH v4 03/13] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard() Chenyi Qiang
2025-04-09  5:43   ` Alexey Kardashevskiy
2025-04-09  6:56     ` Chenyi Qiang
2025-04-25 12:44     ` David Hildenbrand
2025-04-25 12:42   ` David Hildenbrand
2025-04-27  2:13     ` Chenyi Qiang
2025-04-07  7:49 ` [PATCH v4 04/13] memory: Introduce generic state change parent class for RamDiscardManager Chenyi Qiang
2025-04-09  9:56   ` Alexey Kardashevskiy
2025-04-09 12:57     ` Chenyi Qiang
2025-04-10  0:11       ` Alexey Kardashevskiy
2025-04-10  1:44         ` Chenyi Qiang
2025-04-16  3:32           ` Chenyi Qiang
2025-04-17 23:10             ` Alexey Kardashevskiy
2025-04-18  3:49               ` Chenyi Qiang
2025-04-25 12:54             ` David Hildenbrand
2025-04-25 12:49     ` David Hildenbrand
2025-04-27  1:33       ` Chenyi Qiang
2025-04-07  7:49 ` [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager Chenyi Qiang
2025-04-09  9:56   ` Alexey Kardashevskiy
2025-04-10  3:47     ` Chenyi Qiang
2025-04-25 12:57   ` David Hildenbrand
2025-04-27  1:40     ` Chenyi Qiang
2025-04-29 10:01       ` David Hildenbrand
2025-04-07  7:49 ` [PATCH v4 06/13] vfio: Add the support for PrivateSharedManager Interface Chenyi Qiang
2025-04-09  9:58   ` Alexey Kardashevskiy
2025-04-10  5:53     ` Chenyi Qiang
2025-04-07  7:49 ` [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBLock with guest_memfd Chenyi Qiang
2025-04-09  9:57   ` Alexey Kardashevskiy
2025-04-10  7:37     ` Chenyi Qiang
2025-05-09  6:41   ` Baolu Lu
2025-05-09  7:55     ` Chenyi Qiang
2025-05-09  8:18       ` David Hildenbrand
2025-05-09 10:37         ` Chenyi Qiang
2025-05-12  8:07   ` Zhao Liu
2025-05-12  9:43     ` Chenyi Qiang
2025-05-13  8:31       ` Zhao Liu
2025-05-14  1:39         ` Chenyi Qiang
2025-04-07  7:49 ` [PATCH v4 08/13] ram-block-attribute: Introduce a callback to notify shared/private state changes Chenyi Qiang
2025-04-07  7:49 ` [PATCH v4 09/13] memory: Attach RamBlockAttribute to guest_memfd-backed RAMBlocks Chenyi Qiang
2025-04-07  7:49 ` [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result Chenyi Qiang
2025-04-27  2:26   ` Chenyi Qiang
2025-05-09  2:38     ` Chao Gao
2025-05-09  8:20       ` David Hildenbrand
2025-05-09  9:19         ` Chenyi Qiang
2025-05-09  8:22     ` Baolu Lu
2025-05-09 10:04       ` Chenyi Qiang
2025-05-12  7:54         ` David Hildenbrand
2025-04-07  7:49 ` [PATCH v4 11/13] KVM: Introduce CVMPrivateSharedListener for attribute changes during page conversions Chenyi Qiang
2025-05-09  9:03   ` Baolu Lu
2025-05-12  3:18     ` Chenyi Qiang
2025-04-07  7:49 ` [PATCH v4 12/13] ram-block-attribute: Add priority listener support for PrivateSharedListener Chenyi Qiang
2025-05-09  9:23   ` Baolu Lu
2025-05-09  9:39     ` Chenyi Qiang
2025-04-07  7:49 ` [PATCH v4 13/13] RAMBlock: Make guest_memfd require coordinate discard Chenyi Qiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).