* [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem
@ 2026-05-04 12:30 Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 01/13] system/memory: split RamDiscardManager into source and manager Marc-André Lureau
` (13 more replies)
0 siblings, 14 replies; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Marc-André Lureau, Cédric Le Goater
Hi,
This is an attempt to fix the incompatibility of virtio-mem with confidential
VMs. The solution implements what was discussed earlier with D. Hildenbrand:
https://patchwork.ozlabs.org/project/qemu-devel/patch/20250407074939.18657-5-chenyi.qiang@intel.com/#3502238
The first patches are misc cleanups. Then comes some code refactoring to split
the manager from the source. Finally, the manager learns to deal with multiple
sources.
I haven't done thorough testing; I only launched a SEV guest with a virtio-mem
device. It would be nice to have more tests for those scenarios combining
VFIO, virtio-mem and confidential VMs. In any case, review & testing are needed!
(This should help fix https://issues.redhat.com/browse/RHEL-131968.)
v4:
- added "system/physmem: make ram_block_discard_range() handle guest_memfd"
- added "monitor: add 'info ramblock-attributes' command"
- added "RFC: hw/virtio: start virtio-mem guest_memfd regions as shared"
- skip calling source in notify_populate (it may not have updated its
internal state)
- rebased, collected trailer tags
v3: issues found by Cédric
- fix assertion error on shutdown, due to rcu-defer cleanup
- fix API doc warnings
v2:
- drop replay_{populated,discarded} from source, suggested by Peter Xu
- add extra manager cleanup
- add r-b tags for preliminary patches
---
Marc-André Lureau (13):
system/memory: split RamDiscardManager into source and manager
system/memory: move RamDiscardManager to separate compilation unit
system/memory: constify section arguments
system/ram-discard-manager: implement replay via is_populated iteration
virtio-mem: remove replay_populated/replay_discarded implementation
system/ram-discard-manager: drop replay from source interface
system/memory: implement RamDiscardManager multi-source aggregation
system/physmem: destroy ram block attributes before RCU-deferred reclaim
system/memory: add RamDiscardManager reference counting and cleanup
tests: add unit tests for RamDiscardManager multi-source aggregation
system/physmem: make ram_block_discard_range() handle guest_memfd
monitor: add 'info ramblock-attributes' command
RFC: hw/virtio: start virtio-mem guest_memfd regions as shared
MAINTAINERS | 4 +
qapi/machine.json | 55 ++
include/hw/vfio/vfio-container.h | 2 +-
include/hw/vfio/vfio-cpr.h | 2 +-
include/hw/virtio/virtio-mem.h | 3 -
include/monitor/hmp.h | 1 +
include/system/memory.h | 283 +-----
include/system/ram-discard-manager.h | 358 ++++++++
include/system/ramblock.h | 6 +-
accel/kvm/kvm-all.c | 5 +-
hw/core/machine-hmp-cmds.c | 32 +
hw/vfio/cpr-legacy.c | 4 +-
hw/vfio/listener.c | 10 +-
hw/virtio/virtio-mem.c | 286 ++-----
migration/ram.c | 6 +-
system/memory.c | 83 +-
system/memory_mapping.c | 4 +-
system/physmem.c | 27 +-
system/ram-block-attributes.c | 329 +++----
system/ram-discard-manager.c | 612 +++++++++++++
target/i386/kvm/tdx.c | 2 +-
tests/unit/test-ram-discard-manager-stubs.c | 48 ++
tests/unit/test-ram-discard-manager.c | 1235 +++++++++++++++++++++++++++
hmp-commands-info.hx | 13 +
rust/bindings/system-sys/lib.rs | 2 +-
system/meson.build | 1 +
system/trace-events | 2 +-
tests/unit/meson.build | 8 +-
28 files changed, 2597 insertions(+), 826 deletions(-)
---
base-commit: ac0cc20ad2fe0b8df2e5d9458e90a095ac711ab1
change-id: 20260414-rdm5-b6df2366d603
Best regards,
--
Marc-André Lureau <marcandre.lureau@redhat.com>
* [PATCH v4 01/13] system/memory: split RamDiscardManager into source and manager
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 02/13] system/memory: move RamDiscardManager to separate compilation unit Marc-André Lureau
` (12 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Marc-André Lureau
Refactor the RamDiscardManager interface into two distinct components:
- RamDiscardSource: An interface that state providers (virtio-mem,
RamBlockAttributes) implement to provide discard state information
(granularity, populated/discarded ranges, replay callbacks).
- RamDiscardManager: A concrete QOM object that wraps a source, owns
the listener list, and handles listener registration/unregistration
and notifications.
This separation moves the listener management logic from individual
source implementations into the central RamDiscardManager, reducing
code duplication between virtio-mem and RamBlockAttributes.
The change prepares for future work where a RamDiscardManager could
aggregate multiple sources.
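To illustrate the shape of this separation, here is a deliberately simplified,
self-contained sketch in plain C (function pointers instead of QEMU's QOM
machinery; all type and function names below are hypothetical, not the actual
QEMU API): the source only answers state queries, while the manager owns the
listener list and fans out notifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RamDiscardSourceClass: state queries only. */
typedef struct DiscardSourceOps {
    uint64_t (*get_min_granularity)(void *state);
} DiscardSourceOps;

typedef struct DiscardSource {
    const DiscardSourceOps *ops;
    void *state;
} DiscardSource;

/* Hypothetical stand-in for RamDiscardListener. */
typedef struct DiscardListener {
    void (*notify_discard)(struct DiscardListener *l,
                           uint64_t offset, uint64_t size);
    int notified; /* test bookkeeping */
} DiscardListener;

#define MAX_LISTENERS 4

/* The manager wraps one source and owns the listener list. */
typedef struct DiscardManager {
    DiscardSource *source;
    DiscardListener *listeners[MAX_LISTENERS];
    int n_listeners;
} DiscardManager;

void manager_register_listener(DiscardManager *m, DiscardListener *l)
{
    assert(m->n_listeners < MAX_LISTENERS);
    m->listeners[m->n_listeners++] = l;
}

/* Listener notification lives in the manager, not in each source. */
void manager_notify_discard(DiscardManager *m, uint64_t offset, uint64_t size)
{
    for (int i = 0; i < m->n_listeners; i++) {
        m->listeners[i]->notify_discard(m->listeners[i], offset, size);
    }
}

/* State queries are forwarded to the wrapped source. */
uint64_t manager_get_min_granularity(DiscardManager *m)
{
    return m->source->ops->get_min_granularity(m->source->state);
}

/* Toy source: fixed 2 MiB granularity, akin to a virtio-mem block size. */
static uint64_t toy_granularity(void *state)
{
    (void)state;
    return 2 * 1024 * 1024;
}

static void counting_notify(DiscardListener *l, uint64_t offset, uint64_t size)
{
    (void)offset;
    (void)size;
    l->notified++;
}
```

With this split, a second source per manager is a matter of iterating over a
source list in the query forwarders, without touching listener management.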
Note that the original virtio-mem code guarded the discard notification:

    if (vmem->size) {
        rdl->notify_discard(rdl, rdl->section);
    }

however, the new code calls notify_discard unconditionally. This is considered
safe, since populate and discard of sections are already asymmetrical (unplug
and unregister already notify discard of all listener sections unconditionally).
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
include/hw/virtio/virtio-mem.h | 3 -
include/system/memory.h | 197 +++++++++++++++++++++----------------
include/system/ramblock.h | 3 +-
hw/virtio/virtio-mem.c | 163 ++++++------------------------
system/memory.c | 218 ++++++++++++++++++++++++++++++++++++-----
system/ram-block-attributes.c | 171 ++++++++++----------------------
6 files changed, 386 insertions(+), 369 deletions(-)
diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index 221cfd76bf9..5d1d19c6bec 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -118,9 +118,6 @@ struct VirtIOMEM {
/* notifiers to notify when "size" changes */
NotifierList size_change_notifiers;
- /* listeners to notify on plug/unplug activity. */
- QLIST_HEAD(, RamDiscardListener) rdl_list;
-
/* Catch system resets -> qemu_devices_reset() only. */
VirtioMemSystemReset *system_reset;
};
diff --git a/include/system/memory.h b/include/system/memory.h
index 1417132f6d9..a37d320293a 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -54,6 +54,12 @@ typedef struct RamDiscardManager RamDiscardManager;
DECLARE_OBJ_CHECKERS(RamDiscardManager, RamDiscardManagerClass,
RAM_DISCARD_MANAGER, TYPE_RAM_DISCARD_MANAGER);
+#define TYPE_RAM_DISCARD_SOURCE "ram-discard-source"
+typedef struct RamDiscardSourceClass RamDiscardSourceClass;
+typedef struct RamDiscardSource RamDiscardSource;
+DECLARE_OBJ_CHECKERS(RamDiscardSource, RamDiscardSourceClass,
+ RAM_DISCARD_SOURCE, TYPE_RAM_DISCARD_SOURCE);
+
#ifdef CONFIG_FUZZ
void fuzz_dma_read_cb(size_t addr,
size_t len,
@@ -595,8 +601,8 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
/**
* typedef ReplayRamDiscardState:
*
- * The callback handler for #RamDiscardManagerClass.replay_populated/
- * #RamDiscardManagerClass.replay_discarded to invoke on populated/discarded
+ * The callback handler for #RamDiscardSourceClass.replay_populated/
+ * #RamDiscardSourceClass.replay_discarded to invoke on populated/discarded
* parts.
*
* @section: the #MemoryRegionSection of populated/discarded part
@@ -608,40 +614,17 @@ typedef int (*ReplayRamDiscardState)(MemoryRegionSection *section,
void *opaque);
/*
- * RamDiscardManagerClass:
- *
- * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion
- * regions are currently populated to be used/accessed by the VM, notifying
- * after parts were discarded (freeing up memory) and before parts will be
- * populated (consuming memory), to be used/accessed by the VM.
+ * RamDiscardSourceClass:
*
- * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the
- * #MemoryRegion isn't mapped into an address space yet (either directly
- * or via an alias); it cannot change while the #MemoryRegion is
- * mapped into an address space.
+ * A #RamDiscardSource provides information about which parts of a specific
+ * RAM #MemoryRegion are currently populated (accessible) vs discarded.
*
- * The #RamDiscardManager is intended to be used by technologies that are
- * incompatible with discarding of RAM (e.g., VFIO, which may pin all
- * memory inside a #MemoryRegion), and require proper coordination to only
- * map the currently populated parts, to hinder parts that are expected to
- * remain discarded from silently getting populated and consuming memory.
- * Technologies that support discarding of RAM don't have to bother and can
- * simply map the whole #MemoryRegion.
- *
- * An example #RamDiscardManager is virtio-mem, which logically (un)plugs
- * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
- * Logically unplugging memory consists of discarding RAM. The VM agreed to not
- * access unplugged (discarded) memory - especially via DMA. virtio-mem will
- * properly coordinate with listeners before memory is plugged (populated),
- * and after memory is unplugged (discarded).
- *
- * Listeners are called in multiples of the minimum granularity (unless it
- * would exceed the registered range) and changes are aligned to the minimum
- * granularity within the #MemoryRegion. Listeners have to prepare for memory
- * becoming discarded in a different granularity than it was populated and the
- * other way around.
+ * This is an interface that state providers (like virtio-mem or
+ * RamBlockAttributes) implement to provide discard state information. A
+ * #RamDiscardManager wraps sources and manages listener registrations and
+ * notifications.
*/
-struct RamDiscardManagerClass {
+struct RamDiscardSourceClass {
/* private */
InterfaceClass parent_class;
@@ -651,47 +634,47 @@ struct RamDiscardManagerClass {
* @get_min_granularity:
*
* Get the minimum granularity in which listeners will get notified
- * about changes within the #MemoryRegion via the #RamDiscardManager.
+ * about changes within the #MemoryRegion via the #RamDiscardSource.
*
- * @rdm: the #RamDiscardManager
+ * @rds: the #RamDiscardSource
* @mr: the #MemoryRegion
*
* Returns the minimum granularity.
*/
- uint64_t (*get_min_granularity)(const RamDiscardManager *rdm,
+ uint64_t (*get_min_granularity)(const RamDiscardSource *rds,
const MemoryRegion *mr);
/**
* @is_populated:
*
* Check whether the given #MemoryRegionSection is completely populated
- * (i.e., no parts are currently discarded) via the #RamDiscardManager.
+ * (i.e., no parts are currently discarded) via the #RamDiscardSource.
* There are no alignment requirements.
*
- * @rdm: the #RamDiscardManager
+ * @rds: the #RamDiscardSource
* @section: the #MemoryRegionSection
*
* Returns whether the given range is completely populated.
*/
- bool (*is_populated)(const RamDiscardManager *rdm,
+ bool (*is_populated)(const RamDiscardSource *rds,
const MemoryRegionSection *section);
/**
* @replay_populated:
*
* Call the #ReplayRamDiscardState callback for all populated parts within
- * the #MemoryRegionSection via the #RamDiscardManager.
+ * the #MemoryRegionSection via the #RamDiscardSource.
*
* In case any call fails, no further calls are made.
*
- * @rdm: the #RamDiscardManager
+ * @rds: the #RamDiscardSource
* @section: the #MemoryRegionSection
* @replay_fn: the #ReplayRamDiscardState callback
* @opaque: pointer to forward to the callback
*
* Returns 0 on success, or a negative error if any notification failed.
*/
- int (*replay_populated)(const RamDiscardManager *rdm,
+ int (*replay_populated)(const RamDiscardSource *rds,
MemoryRegionSection *section,
ReplayRamDiscardState replay_fn, void *opaque);
@@ -699,50 +682,60 @@ struct RamDiscardManagerClass {
* @replay_discarded:
*
* Call the #ReplayRamDiscardState callback for all discarded parts within
- * the #MemoryRegionSection via the #RamDiscardManager.
+ * the #MemoryRegionSection via the #RamDiscardSource.
*
- * @rdm: the #RamDiscardManager
+ * @rds: the #RamDiscardSource
* @section: the #MemoryRegionSection
* @replay_fn: the #ReplayRamDiscardState callback
* @opaque: pointer to forward to the callback
*
* Returns 0 on success, or a negative error if any notification failed.
*/
- int (*replay_discarded)(const RamDiscardManager *rdm,
+ int (*replay_discarded)(const RamDiscardSource *rds,
MemoryRegionSection *section,
ReplayRamDiscardState replay_fn, void *opaque);
+};
- /**
- * @register_listener:
- *
- * Register a #RamDiscardListener for the given #MemoryRegionSection and
- * immediately notify the #RamDiscardListener about all populated parts
- * within the #MemoryRegionSection via the #RamDiscardManager.
- *
- * In case any notification fails, no further notifications are triggered
- * and an error is logged.
- *
- * @rdm: the #RamDiscardManager
- * @rdl: the #RamDiscardListener
- * @section: the #MemoryRegionSection
- */
- void (*register_listener)(RamDiscardManager *rdm,
- RamDiscardListener *rdl,
- MemoryRegionSection *section);
+/**
+ * RamDiscardManager:
+ *
+ * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion
+ * regions are currently populated to be used/accessed by the VM, notifying
+ * after parts were discarded (freeing up memory) and before parts will be
+ * populated (consuming memory), to be used/accessed by the VM.
+ *
+ * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the
+ * #MemoryRegion isn't mapped into an address space yet (either directly
+ * or via an alias); it cannot change while the #MemoryRegion is
+ * mapped into an address space.
+ *
+ * The #RamDiscardManager is intended to be used by technologies that are
+ * incompatible with discarding of RAM (e.g., VFIO, which may pin all
+ * memory inside a #MemoryRegion), and require proper coordination to only
+ * map the currently populated parts, to hinder parts that are expected to
+ * remain discarded from silently getting populated and consuming memory.
+ * Technologies that support discarding of RAM don't have to bother and can
+ * simply map the whole #MemoryRegion.
+ *
+ * An example #RamDiscardSource is virtio-mem, which logically (un)plugs
+ * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
+ * Logically unplugging memory consists of discarding RAM. The VM agreed to not
+ * access unplugged (discarded) memory - especially via DMA. virtio-mem will
+ * properly coordinate with listeners before memory is plugged (populated),
+ * and after memory is unplugged (discarded).
+ *
+ * Listeners are called in multiples of the minimum granularity (unless it
+ * would exceed the registered range) and changes are aligned to the minimum
+ * granularity within the #MemoryRegion. Listeners have to prepare for memory
+ * becoming discarded in a different granularity than it was populated and the
+ * other way around.
+ */
+struct RamDiscardManager {
+ Object parent;
- /**
- * @unregister_listener:
- *
- * Unregister a previously registered #RamDiscardListener via the
- * #RamDiscardManager after notifying the #RamDiscardListener about all
- * populated parts becoming unpopulated within the registered
- * #MemoryRegionSection.
- *
- * @rdm: the #RamDiscardManager
- * @rdl: the #RamDiscardListener
- */
- void (*unregister_listener)(RamDiscardManager *rdm,
- RamDiscardListener *rdl);
+ RamDiscardSource *rds;
+ MemoryRegion *mr;
+ QLIST_HEAD(, RamDiscardListener) rdl_list;
};
uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
@@ -754,8 +747,8 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
/**
* ram_discard_manager_replay_populated:
*
- * A wrapper to call the #RamDiscardManagerClass.replay_populated callback
- * of the #RamDiscardManager.
+ * A wrapper to call the #RamDiscardSourceClass.replay_populated callback
+ * of the #RamDiscardSource.
*
* @rdm: the #RamDiscardManager
* @section: the #MemoryRegionSection
@@ -772,8 +765,8 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
/**
* ram_discard_manager_replay_discarded:
*
- * A wrapper to call the #RamDiscardManagerClass.replay_discarded callback
- * of the #RamDiscardManager.
+ * A wrapper to call the #RamDiscardSourceClass.replay_discarded callback
+ * of the #RamDiscardSource.
*
* @rdm: the #RamDiscardManager
* @section: the #MemoryRegionSection
@@ -794,6 +787,34 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
RamDiscardListener *rdl);
+/*
+ * Note: later refactoring should take the source into account and the manager
+ * should be able to aggregate multiple sources.
+ */
+int ram_discard_manager_notify_populate(RamDiscardManager *rdm,
+ uint64_t offset, uint64_t size);
+
+/*
+ * Note: later refactoring should take the source into account and the manager
+ * should be able to aggregate multiple sources.
+ */
+void ram_discard_manager_notify_discard(RamDiscardManager *rdm,
+ uint64_t offset, uint64_t size);
+
+/*
+ * Note: later refactoring should take the source into account and the manager
+ * should be able to aggregate multiple sources.
+ */
+void ram_discard_manager_notify_discard_all(RamDiscardManager *rdm);
+
+/*
+ * Replay populated sections to all registered listeners.
+ *
+ * Note: later refactoring should take the source into account and the manager
+ * should be able to aggregate multiple sources.
+ */
+int ram_discard_manager_replay_populated_to_listeners(RamDiscardManager *rdm);
+
/**
* memory_translate_iotlb: Extract addresses from a TLB entry.
* Called with rcu_read_lock held.
@@ -2486,18 +2507,22 @@ static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
}
/**
- * memory_region_set_ram_discard_manager: set the #RamDiscardManager for a
+ * memory_region_add_ram_discard_source: add a #RamDiscardSource for a
* #MemoryRegion
*
- * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
- * that does not cover RAM, or a #MemoryRegion that already has a
- * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
+ * @mr: the #MemoryRegion
+ * @source: #RamDiscardSource to add
+ */
+int memory_region_add_ram_discard_source(MemoryRegion *mr, RamDiscardSource *source);
+
+/**
+ * memory_region_del_ram_discard_source: remove a #RamDiscardSource for a
+ * #MemoryRegion
*
* @mr: the #MemoryRegion
- * @rdm: #RamDiscardManager to set
+ * @source: #RamDiscardSource to remove
*/
-int memory_region_set_ram_discard_manager(MemoryRegion *mr,
- RamDiscardManager *rdm);
+void memory_region_del_ram_discard_source(MemoryRegion *mr, RamDiscardSource *source);
/**
* memory_region_find: translate an address/size relative to a
diff --git a/include/system/ramblock.h b/include/system/ramblock.h
index 4435f8d55fe..f0b557af416 100644
--- a/include/system/ramblock.h
+++ b/include/system/ramblock.h
@@ -99,11 +99,10 @@ struct RamBlockAttributes {
/* 1-setting of the bitmap represents ram is populated (shared) */
unsigned bitmap_size;
unsigned long *bitmap;
-
- QLIST_HEAD(, RamDiscardListener) rdl_list;
};
/* @offset: the offset within the RAMBlock */
+
int ram_block_discard_range(RAMBlock *rb, uint64_t offset, size_t length);
/* @offset: the offset within the RAMBlock */
int ram_block_discard_guest_memfd_range(RAMBlock *rb, uint64_t offset,
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index a4b71974a1c..be149ee9441 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -16,6 +16,7 @@
#include "qemu/error-report.h"
#include "qemu/units.h"
#include "qemu/target-info-qapi.h"
+#include "system/memory.h"
#include "system/numa.h"
#include "system/system.h"
#include "system/ramblock.h"
@@ -324,74 +325,31 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
return ret;
}
-static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg)
-{
- RamDiscardListener *rdl = arg;
-
- return rdl->notify_populate(rdl, s);
-}
-
static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
uint64_t size)
{
- RamDiscardListener *rdl;
+ RamDiscardManager *rdm = memory_region_get_ram_discard_manager(&vmem->memdev->mr);
- QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
- MemoryRegionSection tmp = *rdl->section;
-
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- continue;
- }
- rdl->notify_discard(rdl, &tmp);
- }
+ ram_discard_manager_notify_discard(rdm, offset, size);
}
static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
uint64_t size)
{
- RamDiscardListener *rdl, *rdl2;
- int ret = 0;
-
- QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
- MemoryRegionSection tmp = *rdl->section;
+ RamDiscardManager *rdm = memory_region_get_ram_discard_manager(&vmem->memdev->mr);
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- continue;
- }
- ret = rdl->notify_populate(rdl, &tmp);
- if (ret) {
- break;
- }
- }
-
- if (ret) {
- /* Notify all already-notified listeners. */
- QLIST_FOREACH(rdl2, &vmem->rdl_list, next) {
- MemoryRegionSection tmp = *rdl2->section;
-
- if (rdl2 == rdl) {
- break;
- }
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- continue;
- }
- rdl2->notify_discard(rdl2, &tmp);
- }
- }
- return ret;
+ return ram_discard_manager_notify_populate(rdm, offset, size);
}
static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem)
{
- RamDiscardListener *rdl;
+ RamDiscardManager *rdm = memory_region_get_ram_discard_manager(&vmem->memdev->mr);
if (!vmem->size) {
return;
}
- QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
- rdl->notify_discard(rdl, rdl->section);
- }
+ ram_discard_manager_notify_discard_all(rdm);
}
static bool virtio_mem_is_range_plugged(const VirtIOMEM *vmem,
@@ -1037,13 +995,9 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
return;
}
- /*
- * Set ourselves as RamDiscardManager before the plug handler maps the
- * memory region and exposes it via an address space.
- */
- if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
- RAM_DISCARD_MANAGER(vmem))) {
- error_setg(errp, "Failed to set RamDiscardManager");
+ if (memory_region_add_ram_discard_source(&vmem->memdev->mr,
+ RAM_DISCARD_SOURCE(vmem))) {
+ error_setg(errp, "Failed to add RAM discard source");
ram_block_coordinated_discard_require(false);
return;
}
@@ -1062,7 +1016,8 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
if (ret) {
error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
- memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
+ memory_region_del_ram_discard_source(&vmem->memdev->mr,
+ RAM_DISCARD_SOURCE(vmem));
ram_block_coordinated_discard_require(false);
return;
}
@@ -1147,7 +1102,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
* The unplug handler unmapped the memory region, it cannot be
* found via an address space anymore. Unset ourselves.
*/
- memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
+ memory_region_del_ram_discard_source(&vmem->memdev->mr, RAM_DISCARD_SOURCE(vmem));
ram_block_coordinated_discard_require(false);
}
@@ -1175,9 +1130,7 @@ static int virtio_mem_activate_memslot_range_cb(VirtIOMEM *vmem, void *arg,
static int virtio_mem_post_load_bitmap(VirtIOMEM *vmem)
{
- RamDiscardListener *rdl;
- int ret;
-
+ RamDiscardManager *rdm = memory_region_get_ram_discard_manager(&vmem->memdev->mr);
/*
* We restored the bitmap and updated the requested size; activate all
* memslots (so listeners register) before notifying about plugged blocks.
@@ -1195,14 +1148,7 @@ static int virtio_mem_post_load_bitmap(VirtIOMEM *vmem)
* We started out with all memory discarded and our memory region is mapped
* into an address space. Replay, now that we updated the bitmap.
*/
- QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
- ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
- virtio_mem_notify_populate_cb);
- if (ret) {
- return ret;
- }
- }
- return 0;
+ return ram_discard_manager_replay_populated_to_listeners(rdm);
}
static int virtio_mem_post_load(void *opaque, int version_id)
@@ -1650,7 +1596,6 @@ static void virtio_mem_instance_init(Object *obj)
VirtIOMEM *vmem = VIRTIO_MEM(obj);
notifier_list_init(&vmem->size_change_notifiers);
- QLIST_INIT(&vmem->rdl_list);
object_property_add(obj, VIRTIO_MEM_SIZE_PROP, "size", virtio_mem_get_size,
NULL, NULL, NULL);
@@ -1694,19 +1639,19 @@ static const Property virtio_mem_legacy_guests_properties[] = {
unplugged_inaccessible, ON_OFF_AUTO_ON),
};
-static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardManager *rdm,
+static uint64_t virtio_mem_rds_get_min_granularity(const RamDiscardSource *rds,
const MemoryRegion *mr)
{
- const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+ const VirtIOMEM *vmem = VIRTIO_MEM(rds);
g_assert(mr == &vmem->memdev->mr);
return vmem->block_size;
}
-static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
+static bool virtio_mem_rds_is_populated(const RamDiscardSource *rds,
const MemoryRegionSection *s)
{
- const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+ const VirtIOMEM *vmem = VIRTIO_MEM(rds);
uint64_t start_gpa = vmem->addr + s->offset_within_region;
uint64_t end_gpa = start_gpa + int128_get64(s->size);
@@ -1727,19 +1672,19 @@ struct VirtIOMEMReplayData {
void *opaque;
};
-static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
+static int virtio_mem_rds_replay_cb(MemoryRegionSection *s, void *arg)
{
struct VirtIOMEMReplayData *data = arg;
return data->fn(s, data->opaque);
}
-static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
+static int virtio_mem_rds_replay_populated(const RamDiscardSource *rds,
MemoryRegionSection *s,
ReplayRamDiscardState replay_fn,
void *opaque)
{
- const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+ const VirtIOMEM *vmem = VIRTIO_MEM(rds);
struct VirtIOMEMReplayData data = {
.fn = replay_fn,
.opaque = opaque,
@@ -1747,23 +1692,15 @@ static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
g_assert(s->mr == &vmem->memdev->mr);
return virtio_mem_for_each_plugged_section(vmem, s, &data,
- virtio_mem_rdm_replay_populated_cb);
-}
-
-static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
- void *arg)
-{
- struct VirtIOMEMReplayData *data = arg;
-
- return data->fn(s, data->opaque);
+ virtio_mem_rds_replay_cb);
}
-static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
+static int virtio_mem_rds_replay_discarded(const RamDiscardSource *rds,
MemoryRegionSection *s,
ReplayRamDiscardState replay_fn,
void *opaque)
{
- const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+ const VirtIOMEM *vmem = VIRTIO_MEM(rds);
struct VirtIOMEMReplayData data = {
.fn = replay_fn,
.opaque = opaque,
@@ -1771,41 +1708,7 @@ static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
g_assert(s->mr == &vmem->memdev->mr);
return virtio_mem_for_each_unplugged_section(vmem, s, &data,
- virtio_mem_rdm_replay_discarded_cb);
-}
-
-static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
- RamDiscardListener *rdl,
- MemoryRegionSection *s)
-{
- VirtIOMEM *vmem = VIRTIO_MEM(rdm);
- int ret;
-
- g_assert(s->mr == &vmem->memdev->mr);
- rdl->section = memory_region_section_new_copy(s);
-
- QLIST_INSERT_HEAD(&vmem->rdl_list, rdl, next);
- ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
- virtio_mem_notify_populate_cb);
- if (ret) {
- error_report("%s: Replaying plugged ranges failed: %s", __func__,
- strerror(-ret));
- }
-}
-
-static void virtio_mem_rdm_unregister_listener(RamDiscardManager *rdm,
- RamDiscardListener *rdl)
-{
- VirtIOMEM *vmem = VIRTIO_MEM(rdm);
-
- g_assert(rdl->section->mr == &vmem->memdev->mr);
- if (vmem->size) {
- rdl->notify_discard(rdl, rdl->section);
- }
-
- memory_region_section_free_copy(rdl->section);
- rdl->section = NULL;
- QLIST_REMOVE(rdl, next);
+ virtio_mem_rds_replay_cb);
}
static void virtio_mem_unplug_request_check(VirtIOMEM *vmem, Error **errp)
@@ -1837,7 +1740,7 @@ static void virtio_mem_class_init(ObjectClass *klass, const void *data)
DeviceClass *dc = DEVICE_CLASS(klass);
VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
VirtIOMEMClass *vmc = VIRTIO_MEM_CLASS(klass);
- RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(klass);
+ RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_CLASS(klass);
device_class_set_props(dc, virtio_mem_properties);
if (virtio_mem_has_legacy_guests()) {
@@ -1861,12 +1764,10 @@ static void virtio_mem_class_init(ObjectClass *klass, const void *data)
vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier;
vmc->unplug_request_check = virtio_mem_unplug_request_check;
- rdmc->get_min_granularity = virtio_mem_rdm_get_min_granularity;
- rdmc->is_populated = virtio_mem_rdm_is_populated;
- rdmc->replay_populated = virtio_mem_rdm_replay_populated;
- rdmc->replay_discarded = virtio_mem_rdm_replay_discarded;
- rdmc->register_listener = virtio_mem_rdm_register_listener;
- rdmc->unregister_listener = virtio_mem_rdm_unregister_listener;
+ rdsc->get_min_granularity = virtio_mem_rds_get_min_granularity;
+ rdsc->is_populated = virtio_mem_rds_is_populated;
+ rdsc->replay_populated = virtio_mem_rds_replay_populated;
+ rdsc->replay_discarded = virtio_mem_rds_replay_discarded;
}
static const TypeInfo virtio_mem_info = {
@@ -1878,7 +1779,7 @@ static const TypeInfo virtio_mem_info = {
.class_init = virtio_mem_class_init,
.class_size = sizeof(VirtIOMEMClass),
.interfaces = (const InterfaceInfo[]) {
- { TYPE_RAM_DISCARD_MANAGER },
+ { TYPE_RAM_DISCARD_SOURCE },
{ }
},
};
diff --git a/system/memory.c b/system/memory.c
index 225bbe38c32..77966513113 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2042,34 +2042,88 @@ RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
return mr->rdm;
}
-int memory_region_set_ram_discard_manager(MemoryRegion *mr,
- RamDiscardManager *rdm)
+static RamDiscardManager *ram_discard_manager_new(MemoryRegion *mr,
+ RamDiscardSource *rds)
+{
+ RamDiscardManager *rdm = RAM_DISCARD_MANAGER(object_new(TYPE_RAM_DISCARD_MANAGER));
+
+ rdm->rds = rds;
+ rdm->mr = mr;
+ QLIST_INIT(&rdm->rdl_list);
+ return rdm;
+}
+
+int memory_region_add_ram_discard_source(MemoryRegion *mr,
+ RamDiscardSource *source)
{
g_assert(memory_region_is_ram(mr));
- if (mr->rdm && rdm) {
+ if (mr->rdm) {
return -EBUSY;
}
- mr->rdm = rdm;
+ mr->rdm = ram_discard_manager_new(mr, RAM_DISCARD_SOURCE(source));
return 0;
}
+void memory_region_del_ram_discard_source(MemoryRegion *mr,
+ RamDiscardSource *source)
+{
+ g_assert(mr->rdm->rds == source);
+
+ object_unref(mr->rdm);
+ mr->rdm = NULL;
+}
+
+static uint64_t ram_discard_source_get_min_granularity(const RamDiscardSource *rds,
+ const MemoryRegion *mr)
+{
+ RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
+
+ g_assert(rdsc->get_min_granularity);
+ return rdsc->get_min_granularity(rds, mr);
+}
+
+static bool ram_discard_source_is_populated(const RamDiscardSource *rds,
+ const MemoryRegionSection *section)
+{
+ RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
+
+ g_assert(rdsc->is_populated);
+ return rdsc->is_populated(rds, section);
+}
+
+static int ram_discard_source_replay_populated(const RamDiscardSource *rds,
+ MemoryRegionSection *section,
+ ReplayRamDiscardState replay_fn,
+ void *opaque)
+{
+ RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
+
+ g_assert(rdsc->replay_populated);
+ return rdsc->replay_populated(rds, section, replay_fn, opaque);
+}
+
+static int ram_discard_source_replay_discarded(const RamDiscardSource *rds,
+ MemoryRegionSection *section,
+ ReplayRamDiscardState replay_fn,
+ void *opaque)
+{
+ RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
+
+ g_assert(rdsc->replay_discarded);
+ return rdsc->replay_discarded(rds, section, replay_fn, opaque);
+}
+
uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
const MemoryRegion *mr)
{
- RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
-
- g_assert(rdmc->get_min_granularity);
- return rdmc->get_min_granularity(rdm, mr);
+ return ram_discard_source_get_min_granularity(rdm->rds, mr);
}
bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
const MemoryRegionSection *section)
{
- RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
-
- g_assert(rdmc->is_populated);
- return rdmc->is_populated(rdm, section);
+ return ram_discard_source_is_populated(rdm->rds, section);
}
int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
@@ -2077,10 +2131,7 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
ReplayRamDiscardState replay_fn,
void *opaque)
{
- RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
-
- g_assert(rdmc->replay_populated);
- return rdmc->replay_populated(rdm, section, replay_fn, opaque);
+ return ram_discard_source_replay_populated(rdm->rds, section, replay_fn, opaque);
}
int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
@@ -2088,29 +2139,133 @@ int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
ReplayRamDiscardState replay_fn,
void *opaque)
{
- RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+ return ram_discard_source_replay_discarded(rdm->rds, section, replay_fn, opaque);
+}
+
+static void ram_discard_manager_initfn(Object *obj)
+{
+ RamDiscardManager *rdm = RAM_DISCARD_MANAGER(obj);
+
+ QLIST_INIT(&rdm->rdl_list);
+}
+
+static void ram_discard_manager_finalize(Object *obj)
+{
+ RamDiscardManager *rdm = RAM_DISCARD_MANAGER(obj);
- g_assert(rdmc->replay_discarded);
- return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
+ g_assert(QLIST_EMPTY(&rdm->rdl_list));
+}
+
+int ram_discard_manager_notify_populate(RamDiscardManager *rdm,
+ uint64_t offset, uint64_t size)
+{
+ RamDiscardListener *rdl, *rdl2;
+ int ret = 0;
+
+ QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
+ MemoryRegionSection tmp = *rdl->section;
+
+ if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+ continue;
+ }
+ ret = rdl->notify_populate(rdl, &tmp);
+ if (ret) {
+ break;
+ }
+ }
+
+ if (ret) {
+ /* Notify all already-notified listeners about discard. */
+ QLIST_FOREACH(rdl2, &rdm->rdl_list, next) {
+ MemoryRegionSection tmp = *rdl2->section;
+
+ if (rdl2 == rdl) {
+ break;
+ }
+ if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+ continue;
+ }
+ rdl2->notify_discard(rdl2, &tmp);
+ }
+ }
+ return ret;
+}
+
+void ram_discard_manager_notify_discard(RamDiscardManager *rdm,
+ uint64_t offset, uint64_t size)
+{
+ RamDiscardListener *rdl;
+
+ QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
+ MemoryRegionSection tmp = *rdl->section;
+
+ if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+ continue;
+ }
+ rdl->notify_discard(rdl, &tmp);
+ }
+}
+
+void ram_discard_manager_notify_discard_all(RamDiscardManager *rdm)
+{
+ RamDiscardListener *rdl;
+
+ QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
+ rdl->notify_discard(rdl, rdl->section);
+ }
+}
+
+static int rdm_populate_cb(MemoryRegionSection *section, void *opaque)
+{
+ RamDiscardListener *rdl = opaque;
+
+ return rdl->notify_populate(rdl, section);
}
void ram_discard_manager_register_listener(RamDiscardManager *rdm,
RamDiscardListener *rdl,
MemoryRegionSection *section)
{
- RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+ int ret;
+
+ g_assert(section->mr == rdm->mr);
+
+ rdl->section = memory_region_section_new_copy(section);
+ QLIST_INSERT_HEAD(&rdm->rdl_list, rdl, next);
- g_assert(rdmc->register_listener);
- rdmc->register_listener(rdm, rdl, section);
+ ret = ram_discard_source_replay_populated(rdm->rds, rdl->section,
+ rdm_populate_cb, rdl);
+ if (ret) {
+ error_report("%s: Replaying populated ranges failed: %s", __func__,
+ strerror(-ret));
+ }
}
void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
RamDiscardListener *rdl)
{
- RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
+ g_assert(rdl->section);
+ g_assert(rdl->section->mr == rdm->mr);
+
+ rdl->notify_discard(rdl, rdl->section);
+ memory_region_section_free_copy(rdl->section);
+ rdl->section = NULL;
+ QLIST_REMOVE(rdl, next);
+}
+
+int ram_discard_manager_replay_populated_to_listeners(RamDiscardManager *rdm)
+{
+ RamDiscardListener *rdl;
+ int ret = 0;
- g_assert(rdmc->unregister_listener);
- rdmc->unregister_listener(rdm, rdl);
+ QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
+ ret = ram_discard_source_replay_populated(rdm->rds, rdl->section,
+ rdm_populate_cb, rdl);
+ if (ret) {
+ break;
+ }
+ }
+ return ret;
}
/* Called with rcu_read_lock held. */
@@ -3743,9 +3898,17 @@ static const TypeInfo iommu_memory_region_info = {
};
static const TypeInfo ram_discard_manager_info = {
- .parent = TYPE_INTERFACE,
+ .parent = TYPE_OBJECT,
.name = TYPE_RAM_DISCARD_MANAGER,
- .class_size = sizeof(RamDiscardManagerClass),
+ .instance_size = sizeof(RamDiscardManager),
+ .instance_init = ram_discard_manager_initfn,
+ .instance_finalize = ram_discard_manager_finalize,
+};
+
+static const TypeInfo ram_discard_source_info = {
+ .parent = TYPE_INTERFACE,
+ .name = TYPE_RAM_DISCARD_SOURCE,
+ .class_size = sizeof(RamDiscardSourceClass),
};
static void memory_register_types(void)
@@ -3753,6 +3916,7 @@ static void memory_register_types(void)
type_register_static(&memory_region_info);
type_register_static(&iommu_memory_region_info);
type_register_static(&ram_discard_manager_info);
+ type_register_static(&ram_discard_source_info);
}
type_init(memory_register_types)
diff --git a/system/ram-block-attributes.c b/system/ram-block-attributes.c
index 630b0fda126..a72924eea7d 100644
--- a/system/ram-block-attributes.c
+++ b/system/ram-block-attributes.c
@@ -18,7 +18,7 @@ OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(RamBlockAttributes,
ram_block_attributes,
RAM_BLOCK_ATTRIBUTES,
OBJECT,
- { TYPE_RAM_DISCARD_MANAGER },
+ { TYPE_RAM_DISCARD_SOURCE },
{ })
static size_t
@@ -32,35 +32,9 @@ ram_block_attributes_get_block_size(void)
return qemu_real_host_page_size();
}
-
-static bool
-ram_block_attributes_rdm_is_populated(const RamDiscardManager *rdm,
- const MemoryRegionSection *section)
-{
- const RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
- const size_t block_size = ram_block_attributes_get_block_size();
- const uint64_t first_bit = section->offset_within_region / block_size;
- const uint64_t last_bit =
- first_bit + int128_get64(section->size) / block_size - 1;
- unsigned long first_discarded_bit;
-
- first_discarded_bit = find_next_zero_bit(attr->bitmap, last_bit + 1,
- first_bit);
- return first_discarded_bit > last_bit;
-}
-
typedef int (*ram_block_attributes_section_cb)(MemoryRegionSection *s,
void *arg);
-static int
-ram_block_attributes_notify_populate_cb(MemoryRegionSection *section,
- void *arg)
-{
- RamDiscardListener *rdl = arg;
-
- return rdl->notify_populate(rdl, section);
-}
-
static int
ram_block_attributes_for_each_populated_section(const RamBlockAttributes *attr,
MemoryRegionSection *section,
@@ -144,93 +118,73 @@ ram_block_attributes_for_each_discarded_section(const RamBlockAttributes *attr,
return ret;
}
-static uint64_t
-ram_block_attributes_rdm_get_min_granularity(const RamDiscardManager *rdm,
- const MemoryRegion *mr)
-{
- const RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
- g_assert(mr == attr->ram_block->mr);
- return ram_block_attributes_get_block_size();
-}
+typedef struct RamBlockAttributesReplayData {
+ ReplayRamDiscardState fn;
+ void *opaque;
+} RamBlockAttributesReplayData;
-static void
-ram_block_attributes_rdm_register_listener(RamDiscardManager *rdm,
- RamDiscardListener *rdl,
- MemoryRegionSection *section)
+static int ram_block_attributes_rds_replay_cb(MemoryRegionSection *section,
+ void *arg)
{
- RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
- int ret;
-
- g_assert(section->mr == attr->ram_block->mr);
- rdl->section = memory_region_section_new_copy(section);
-
- QLIST_INSERT_HEAD(&attr->rdl_list, rdl, next);
+ RamBlockAttributesReplayData *data = arg;
- ret = ram_block_attributes_for_each_populated_section(attr, section, rdl,
- ram_block_attributes_notify_populate_cb);
- if (ret) {
- error_report("%s: Failed to register RAM discard listener: %s",
- __func__, strerror(-ret));
- exit(1);
- }
+ return data->fn(section, data->opaque);
}
-static void
-ram_block_attributes_rdm_unregister_listener(RamDiscardManager *rdm,
- RamDiscardListener *rdl)
+/* RamDiscardSource interface implementation */
+static uint64_t
+ram_block_attributes_rds_get_min_granularity(const RamDiscardSource *rds,
+ const MemoryRegion *mr)
{
- RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
+ const RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rds);
- g_assert(rdl->section);
- g_assert(rdl->section->mr == attr->ram_block->mr);
-
- rdl->notify_discard(rdl, rdl->section);
-
- memory_region_section_free_copy(rdl->section);
- rdl->section = NULL;
- QLIST_REMOVE(rdl, next);
+ g_assert(mr == attr->ram_block->mr);
+ return ram_block_attributes_get_block_size();
}
-typedef struct RamBlockAttributesReplayData {
- ReplayRamDiscardState fn;
- void *opaque;
-} RamBlockAttributesReplayData;
-
-static int ram_block_attributes_rdm_replay_cb(MemoryRegionSection *section,
- void *arg)
+static bool
+ram_block_attributes_rds_is_populated(const RamDiscardSource *rds,
+ const MemoryRegionSection *section)
{
- RamBlockAttributesReplayData *data = arg;
+ const RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rds);
+ const size_t block_size = ram_block_attributes_get_block_size();
+ const uint64_t first_bit = section->offset_within_region / block_size;
+ const uint64_t last_bit =
+ first_bit + int128_get64(section->size) / block_size - 1;
+ unsigned long first_discarded_bit;
- return data->fn(section, data->opaque);
+ first_discarded_bit = find_next_zero_bit(attr->bitmap, last_bit + 1,
+ first_bit);
+ return first_discarded_bit > last_bit;
}
static int
-ram_block_attributes_rdm_replay_populated(const RamDiscardManager *rdm,
+ram_block_attributes_rds_replay_populated(const RamDiscardSource *rds,
MemoryRegionSection *section,
ReplayRamDiscardState replay_fn,
void *opaque)
{
- RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
+ RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rds);
RamBlockAttributesReplayData data = { .fn = replay_fn, .opaque = opaque };
g_assert(section->mr == attr->ram_block->mr);
return ram_block_attributes_for_each_populated_section(attr, section, &data,
- ram_block_attributes_rdm_replay_cb);
+ ram_block_attributes_rds_replay_cb);
}
static int
-ram_block_attributes_rdm_replay_discarded(const RamDiscardManager *rdm,
+ram_block_attributes_rds_replay_discarded(const RamDiscardSource *rds,
MemoryRegionSection *section,
ReplayRamDiscardState replay_fn,
void *opaque)
{
- RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
+ RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rds);
RamBlockAttributesReplayData data = { .fn = replay_fn, .opaque = opaque };
g_assert(section->mr == attr->ram_block->mr);
return ram_block_attributes_for_each_discarded_section(attr, section, &data,
- ram_block_attributes_rdm_replay_cb);
+ ram_block_attributes_rds_replay_cb);
}
static bool
@@ -257,42 +211,23 @@ ram_block_attributes_is_valid_range(RamBlockAttributes *attr, uint64_t offset,
return true;
}
-static void ram_block_attributes_notify_discard(RamBlockAttributes *attr,
- uint64_t offset,
- uint64_t size)
+static void
+ram_block_attributes_notify_discard(RamBlockAttributes *attr,
+ uint64_t offset,
+ uint64_t size)
{
- RamDiscardListener *rdl;
+ RamDiscardManager *rdm = memory_region_get_ram_discard_manager(attr->ram_block->mr);
- QLIST_FOREACH(rdl, &attr->rdl_list, next) {
- MemoryRegionSection tmp = *rdl->section;
-
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- continue;
- }
- rdl->notify_discard(rdl, &tmp);
- }
+ ram_discard_manager_notify_discard(rdm, offset, size);
}
static int
ram_block_attributes_notify_populate(RamBlockAttributes *attr,
uint64_t offset, uint64_t size)
{
- RamDiscardListener *rdl;
- int ret = 0;
-
- QLIST_FOREACH(rdl, &attr->rdl_list, next) {
- MemoryRegionSection tmp = *rdl->section;
-
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- continue;
- }
- ret = rdl->notify_populate(rdl, &tmp);
- if (ret) {
- break;
- }
- }
+ RamDiscardManager *rdm = memory_region_get_ram_discard_manager(attr->ram_block->mr);
- return ret;
+ return ram_discard_manager_notify_populate(rdm, offset, size);
}
int ram_block_attributes_state_change(RamBlockAttributes *attr,
@@ -376,7 +311,8 @@ RamBlockAttributes *ram_block_attributes_create(RAMBlock *ram_block)
attr = RAM_BLOCK_ATTRIBUTES(object_new(TYPE_RAM_BLOCK_ATTRIBUTES));
attr->ram_block = ram_block;
- if (memory_region_set_ram_discard_manager(mr, RAM_DISCARD_MANAGER(attr))) {
+
+ if (memory_region_add_ram_discard_source(mr, RAM_DISCARD_SOURCE(attr))) {
object_unref(OBJECT(attr));
return NULL;
}
@@ -391,15 +327,12 @@ void ram_block_attributes_destroy(RamBlockAttributes *attr)
g_assert(attr);
g_free(attr->bitmap);
- memory_region_set_ram_discard_manager(attr->ram_block->mr, NULL);
+ memory_region_del_ram_discard_source(attr->ram_block->mr, RAM_DISCARD_SOURCE(attr));
object_unref(OBJECT(attr));
}
static void ram_block_attributes_init(Object *obj)
{
- RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(obj);
-
- QLIST_INIT(&attr->rdl_list);
}
static void ram_block_attributes_finalize(Object *obj)
@@ -409,12 +342,10 @@ static void ram_block_attributes_finalize(Object *obj)
static void ram_block_attributes_class_init(ObjectClass *klass,
const void *data)
{
- RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(klass);
-
- rdmc->get_min_granularity = ram_block_attributes_rdm_get_min_granularity;
- rdmc->register_listener = ram_block_attributes_rdm_register_listener;
- rdmc->unregister_listener = ram_block_attributes_rdm_unregister_listener;
- rdmc->is_populated = ram_block_attributes_rdm_is_populated;
- rdmc->replay_populated = ram_block_attributes_rdm_replay_populated;
- rdmc->replay_discarded = ram_block_attributes_rdm_replay_discarded;
+ RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_CLASS(klass);
+
+ rdsc->get_min_granularity = ram_block_attributes_rds_get_min_granularity;
+ rdsc->is_populated = ram_block_attributes_rds_is_populated;
+ rdsc->replay_populated = ram_block_attributes_rds_replay_populated;
+ rdsc->replay_discarded = ram_block_attributes_rds_replay_discarded;
}
--
2.54.0
* [PATCH v4 02/13] system/memory: move RamDiscardManager to separate compilation unit
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 01/13] system/memory: split RamDiscardManager into source and manager Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 03/13] system/memory: constify section arguments Marc-André Lureau
From: Marc-André Lureau @ 2026-05-04 12:30 UTC
To: qemu-devel; +Cc: Peter Xu, Marc-André Lureau
Extract RamDiscardManager and RamDiscardSource from system/memory.c into
a dedicated compilation unit.
This reduces coupling and lets code that only needs the
RamDiscardManager interface avoid pulling in all of memory.h's
dependencies.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
MAINTAINERS | 2 +
include/system/memory.h | 280 +--------------------------------
include/system/ram-discard-manager.h | 297 +++++++++++++++++++++++++++++++++++
system/memory.c | 221 --------------------------
system/ram-discard-manager.c | 240 ++++++++++++++++++++++++++++
rust/bindings/system-sys/lib.rs | 2 +-
system/meson.build | 1 +
7 files changed, 542 insertions(+), 501 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index afa178c5cce..239218bc1f1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3315,6 +3315,7 @@ F: include/system/memory.h
F: include/system/memory_cached.h
F: include/system/memory_ldst*
F: include/system/physmem.h
+F: include/system/ram-discard-manager.h
F: include/system/ramblock.h
F: include/system/memory_mapping.h
F: system/dma-helpers.c
@@ -3325,6 +3326,7 @@ F: system/physmem.c
F: system/memory_ldst*
F: system/memory-internal.h
F: system/ram-block-attributes.c
+F: system/ram-discard-manager.c
F: scripts/coccinelle/memory-region-housekeeping.cocci
Memory devices
diff --git a/include/system/memory.h b/include/system/memory.h
index a37d320293a..28a75dac4ae 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -16,6 +16,7 @@
#include "exec/hwaddr.h"
#include "system/ram_addr.h"
+#include "system/ram-discard-manager.h"
#include "exec/memattrs.h"
#include "exec/memop.h"
#include "qemu/bswap.h"
@@ -48,18 +49,6 @@ typedef struct IOMMUMemoryRegionClass IOMMUMemoryRegionClass;
DECLARE_OBJ_CHECKERS(IOMMUMemoryRegion, IOMMUMemoryRegionClass,
IOMMU_MEMORY_REGION, TYPE_IOMMU_MEMORY_REGION)
-#define TYPE_RAM_DISCARD_MANAGER "ram-discard-manager"
-typedef struct RamDiscardManagerClass RamDiscardManagerClass;
-typedef struct RamDiscardManager RamDiscardManager;
-DECLARE_OBJ_CHECKERS(RamDiscardManager, RamDiscardManagerClass,
- RAM_DISCARD_MANAGER, TYPE_RAM_DISCARD_MANAGER);
-
-#define TYPE_RAM_DISCARD_SOURCE "ram-discard-source"
-typedef struct RamDiscardSourceClass RamDiscardSourceClass;
-typedef struct RamDiscardSource RamDiscardSource;
-DECLARE_OBJ_CHECKERS(RamDiscardSource, RamDiscardSourceClass,
- RAM_DISCARD_SOURCE, TYPE_RAM_DISCARD_SOURCE);
-
#ifdef CONFIG_FUZZ
void fuzz_dma_read_cb(size_t addr,
size_t len,
@@ -548,273 +537,6 @@ struct IOMMUMemoryRegionClass {
int (*num_indexes)(IOMMUMemoryRegion *iommu);
};
-typedef struct RamDiscardListener RamDiscardListener;
-typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl,
- MemoryRegionSection *section);
-typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl,
- MemoryRegionSection *section);
-
-struct RamDiscardListener {
- /*
- * @notify_populate:
- *
- * Notification that previously discarded memory is about to get populated.
- * Listeners are able to object. If any listener objects, already
- * successfully notified listeners are notified about a discard again.
- *
- * @rdl: the #RamDiscardListener getting notified
- * @section: the #MemoryRegionSection to get populated. The section
- * is aligned within the memory region to the minimum granularity
- * unless it would exceed the registered section.
- *
- * Returns 0 on success. If the notification is rejected by the listener,
- * an error is returned.
- */
- NotifyRamPopulate notify_populate;
-
- /*
- * @notify_discard:
- *
- * Notification that previously populated memory was discarded successfully
- * and listeners should drop all references to such memory and prevent
- * new population (e.g., unmap).
- *
- * @rdl: the #RamDiscardListener getting notified
- * @section: the #MemoryRegionSection to get discarded. The section
- * is aligned within the memory region to the minimum granularity
- * unless it would exceed the registered section.
- */
- NotifyRamDiscard notify_discard;
-
- MemoryRegionSection *section;
- QLIST_ENTRY(RamDiscardListener) next;
-};
-
-static inline void ram_discard_listener_init(RamDiscardListener *rdl,
- NotifyRamPopulate populate_fn,
- NotifyRamDiscard discard_fn)
-{
- rdl->notify_populate = populate_fn;
- rdl->notify_discard = discard_fn;
-}
-
-/**
- * typedef ReplayRamDiscardState:
- *
- * The callback handler for #RamDiscardSourceClass.replay_populated/
- * #RamDiscardSourceClass.replay_discarded to invoke on populated/discarded
- * parts.
- *
- * @section: the #MemoryRegionSection of populated/discarded part
- * @opaque: pointer to forward to the callback
- *
- * Returns 0 on success, or a negative error if failed.
- */
-typedef int (*ReplayRamDiscardState)(MemoryRegionSection *section,
- void *opaque);
-
-/*
- * RamDiscardSourceClass:
- *
- * A #RamDiscardSource provides information about which parts of a specific
- * RAM #MemoryRegion are currently populated (accessible) vs discarded.
- *
- * This is an interface that state providers (like virtio-mem or
- * RamBlockAttributes) implement to provide discard state information. A
- * #RamDiscardManager wraps sources and manages listener registrations and
- * notifications.
- */
-struct RamDiscardSourceClass {
- /* private */
- InterfaceClass parent_class;
-
- /* public */
-
- /**
- * @get_min_granularity:
- *
- * Get the minimum granularity in which listeners will get notified
- * about changes within the #MemoryRegion via the #RamDiscardSource.
- *
- * @rds: the #RamDiscardSource
- * @mr: the #MemoryRegion
- *
- * Returns the minimum granularity.
- */
- uint64_t (*get_min_granularity)(const RamDiscardSource *rds,
- const MemoryRegion *mr);
-
- /**
- * @is_populated:
- *
- * Check whether the given #MemoryRegionSection is completely populated
- * (i.e., no parts are currently discarded) via the #RamDiscardSource.
- * There are no alignment requirements.
- *
- * @rds: the #RamDiscardSource
- * @section: the #MemoryRegionSection
- *
- * Returns whether the given range is completely populated.
- */
- bool (*is_populated)(const RamDiscardSource *rds,
- const MemoryRegionSection *section);
-
- /**
- * @replay_populated:
- *
- * Call the #ReplayRamDiscardState callback for all populated parts within
- * the #MemoryRegionSection via the #RamDiscardSource.
- *
- * In case any call fails, no further calls are made.
- *
- * @rds: the #RamDiscardSource
- * @section: the #MemoryRegionSection
- * @replay_fn: the #ReplayRamDiscardState callback
- * @opaque: pointer to forward to the callback
- *
- * Returns 0 on success, or a negative error if any notification failed.
- */
- int (*replay_populated)(const RamDiscardSource *rds,
- MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn, void *opaque);
-
- /**
- * @replay_discarded:
- *
- * Call the #ReplayRamDiscardState callback for all discarded parts within
- * the #MemoryRegionSection via the #RamDiscardSource.
- *
- * @rds: the #RamDiscardSource
- * @section: the #MemoryRegionSection
- * @replay_fn: the #ReplayRamDiscardState callback
- * @opaque: pointer to forward to the callback
- *
- * Returns 0 on success, or a negative error if any notification failed.
- */
- int (*replay_discarded)(const RamDiscardSource *rds,
- MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn, void *opaque);
-};
-
-/**
- * RamDiscardManager:
- *
- * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion
- * regions are currently populated to be used/accessed by the VM, notifying
- * after parts were discarded (freeing up memory) and before parts will be
- * populated (consuming memory), to be used/accessed by the VM.
- *
- * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the
- * #MemoryRegion isn't mapped into an address space yet (either directly
- * or via an alias); it cannot change while the #MemoryRegion is
- * mapped into an address space.
- *
- * The #RamDiscardManager is intended to be used by technologies that are
- * incompatible with discarding of RAM (e.g., VFIO, which may pin all
- * memory inside a #MemoryRegion), and require proper coordination to only
- * map the currently populated parts, to hinder parts that are expected to
- * remain discarded from silently getting populated and consuming memory.
- * Technologies that support discarding of RAM don't have to bother and can
- * simply map the whole #MemoryRegion.
- *
- * An example #RamDiscardSource is virtio-mem, which logically (un)plugs
- * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
- * Logically unplugging memory consists of discarding RAM. The VM agreed to not
- * access unplugged (discarded) memory - especially via DMA. virtio-mem will
- * properly coordinate with listeners before memory is plugged (populated),
- * and after memory is unplugged (discarded).
- *
- * Listeners are called in multiples of the minimum granularity (unless it
- * would exceed the registered range) and changes are aligned to the minimum
- * granularity within the #MemoryRegion. Listeners have to prepare for memory
- * becoming discarded in a different granularity than it was populated and the
- * other way around.
- */
-struct RamDiscardManager {
- Object parent;
-
- RamDiscardSource *rds;
- MemoryRegion *mr;
- QLIST_HEAD(, RamDiscardListener) rdl_list;
-};
-
-uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
- const MemoryRegion *mr);
-
-bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
- const MemoryRegionSection *section);
-
-/**
- * ram_discard_manager_replay_populated:
- *
- * A wrapper to call the #RamDiscardSourceClass.replay_populated callback
- * of the #RamDiscardSource sources.
- *
- * @rdm: the #RamDiscardManager
- * @section: the #MemoryRegionSection
- * @replay_fn: the #ReplayRamDiscardState callback
- * @opaque: pointer to forward to the callback
- *
- * Returns 0 on success, or a negative error if any notification failed.
- */
-int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
- MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn,
- void *opaque);
-
-/**
- * ram_discard_manager_replay_discarded:
- *
- * A wrapper to call the #RamDiscardSourceClass.replay_discarded callback
- * of the #RamDiscardSource sources.
- *
- * @rdm: the #RamDiscardManager
- * @section: the #MemoryRegionSection
- * @replay_fn: the #ReplayRamDiscardState callback
- * @opaque: pointer to forward to the callback
- *
- * Returns 0 on success, or a negative error if any notification failed.
- */
-int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
- MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn,
- void *opaque);
-
-void ram_discard_manager_register_listener(RamDiscardManager *rdm,
- RamDiscardListener *rdl,
- MemoryRegionSection *section);
-
-void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
- RamDiscardListener *rdl);
-
-/*
- * Note: later refactoring should take the source into account and the manager
- * should be able to aggregate multiple sources.
- */
-int ram_discard_manager_notify_populate(RamDiscardManager *rdm,
- uint64_t offset, uint64_t size);
-
- /*
- * Note: later refactoring should take the source into account and the manager
- * should be able to aggregate multiple sources.
- */
-void ram_discard_manager_notify_discard(RamDiscardManager *rdm,
- uint64_t offset, uint64_t size);
-
-/*
- * Note: later refactoring should take the source into account and the manager
- * should be able to aggregate multiple sources.
- */
-void ram_discard_manager_notify_discard_all(RamDiscardManager *rdm);
-
-/*
- * Replay populated sections to all registered listeners.
- *
- * Note: later refactoring should take the source into account and the manager
- * should be able to aggregate multiple sources.
- */
-int ram_discard_manager_replay_populated_to_listeners(RamDiscardManager *rdm);
-
/**
* memory_translate_iotlb: Extract addresses from a TLB entry.
* Called with rcu_read_lock held.
diff --git a/include/system/ram-discard-manager.h b/include/system/ram-discard-manager.h
new file mode 100644
index 00000000000..da55658169f
--- /dev/null
+++ b/include/system/ram-discard-manager.h
@@ -0,0 +1,297 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * RAM Discard Manager
+ *
+ * Copyright Red Hat, Inc. 2026
+ */
+
+#ifndef RAM_DISCARD_MANAGER_H
+#define RAM_DISCARD_MANAGER_H
+
+#include "qemu/typedefs.h"
+#include "qom/object.h"
+#include "qemu/queue.h"
+
+#define TYPE_RAM_DISCARD_MANAGER "ram-discard-manager"
+typedef struct RamDiscardManagerClass RamDiscardManagerClass;
+typedef struct RamDiscardManager RamDiscardManager;
+DECLARE_OBJ_CHECKERS(RamDiscardManager, RamDiscardManagerClass,
+ RAM_DISCARD_MANAGER, TYPE_RAM_DISCARD_MANAGER);
+
+#define TYPE_RAM_DISCARD_SOURCE "ram-discard-source"
+typedef struct RamDiscardSourceClass RamDiscardSourceClass;
+typedef struct RamDiscardSource RamDiscardSource;
+DECLARE_OBJ_CHECKERS(RamDiscardSource, RamDiscardSourceClass,
+ RAM_DISCARD_SOURCE, TYPE_RAM_DISCARD_SOURCE);
+
+typedef struct RamDiscardListener RamDiscardListener;
+typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl,
+ MemoryRegionSection *section);
+typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl,
+ MemoryRegionSection *section);
+
+struct RamDiscardListener {
+ /*
+ * @notify_populate:
+ *
+ * Notification that previously discarded memory is about to get populated.
+ * Listeners are able to object. If any listener objects, already
+ * successfully notified listeners are notified about a discard again.
+ *
+ * @rdl: the #RamDiscardListener getting notified
+ * @section: the #MemoryRegionSection to get populated. The section
+ * is aligned within the memory region to the minimum granularity
+ * unless it would exceed the registered section.
+ *
+ * Returns 0 on success. If the notification is rejected by the listener,
+ * an error is returned.
+ */
+ NotifyRamPopulate notify_populate;
+
+ /*
+ * @notify_discard:
+ *
+ * Notification that previously populated memory was discarded successfully
+ * and listeners should drop all references to such memory and prevent
+ * new population (e.g., unmap).
+ *
+ * @rdl: the #RamDiscardListener getting notified
+ * @section: the #MemoryRegionSection to get discarded. The section
+ * is aligned within the memory region to the minimum granularity
+ * unless it would exceed the registered section.
+ */
+ NotifyRamDiscard notify_discard;
+
+ MemoryRegionSection *section;
+ QLIST_ENTRY(RamDiscardListener) next;
+};
+
+static inline void ram_discard_listener_init(RamDiscardListener *rdl,
+ NotifyRamPopulate populate_fn,
+ NotifyRamDiscard discard_fn)
+{
+ rdl->notify_populate = populate_fn;
+ rdl->notify_discard = discard_fn;
+}
+
+/**
+ * typedef ReplayRamDiscardState:
+ *
+ * The callback handler for #RamDiscardSourceClass.replay_populated/
+ * #RamDiscardSourceClass.replay_discarded to invoke on populated/discarded
+ * parts.
+ *
+ * @section: the #MemoryRegionSection of populated/discarded part
+ * @opaque: pointer to forward to the callback
+ *
+ * Returns 0 on success, or a negative error if failed.
+ */
+typedef int (*ReplayRamDiscardState)(MemoryRegionSection *section,
+ void *opaque);
+
+/*
+ * RamDiscardSourceClass:
+ *
+ * A #RamDiscardSource provides information about which parts of a specific
+ * RAM #MemoryRegion are currently populated (accessible) vs discarded.
+ *
+ * This is an interface that state providers (like virtio-mem or
+ * RamBlockAttributes) implement to provide discard state information. A
+ * #RamDiscardManager wraps sources and manages listener registrations and
+ * notifications.
+ */
+struct RamDiscardSourceClass {
+ /* private */
+ InterfaceClass parent_class;
+
+ /* public */
+
+ /**
+ * @get_min_granularity:
+ *
+ * Get the minimum granularity in which listeners will get notified
+ * about changes within the #MemoryRegion via the #RamDiscardSource.
+ *
+ * @rds: the #RamDiscardSource
+ * @mr: the #MemoryRegion
+ *
+ * Returns the minimum granularity.
+ */
+ uint64_t (*get_min_granularity)(const RamDiscardSource *rds,
+ const MemoryRegion *mr);
+
+ /**
+ * @is_populated:
+ *
+ * Check whether the given #MemoryRegionSection is completely populated
+ * (i.e., no parts are currently discarded) via the #RamDiscardSource.
+ * There are no alignment requirements.
+ *
+ * @rds: the #RamDiscardSource
+ * @section: the #MemoryRegionSection
+ *
+ * Returns whether the given range is completely populated.
+ */
+ bool (*is_populated)(const RamDiscardSource *rds,
+ const MemoryRegionSection *section);
+
+ /**
+ * @replay_populated:
+ *
+ * Call the #ReplayRamDiscardState callback for all populated parts within
+ * the #MemoryRegionSection via the #RamDiscardSource.
+ *
+ * In case any call fails, no further calls are made.
+ *
+ * @rds: the #RamDiscardSource
+ * @section: the #MemoryRegionSection
+ * @replay_fn: the #ReplayRamDiscardState callback
+ * @opaque: pointer to forward to the callback
+ *
+ * Returns 0 on success, or a negative error if any notification failed.
+ */
+ int (*replay_populated)(const RamDiscardSource *rds,
+ MemoryRegionSection *section,
+ ReplayRamDiscardState replay_fn, void *opaque);
+
+ /**
+ * @replay_discarded:
+ *
+ * Call the #ReplayRamDiscardState callback for all discarded parts within
+ * the #MemoryRegionSection via the #RamDiscardSource.
+ *
+ * @rds: the #RamDiscardSource
+ * @section: the #MemoryRegionSection
+ * @replay_fn: the #ReplayRamDiscardState callback
+ * @opaque: pointer to forward to the callback
+ *
+ * Returns 0 on success, or a negative error if any notification failed.
+ */
+ int (*replay_discarded)(const RamDiscardSource *rds,
+ MemoryRegionSection *section,
+ ReplayRamDiscardState replay_fn, void *opaque);
+};
+
+/**
+ * RamDiscardManager:
+ *
+ * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion
+ * regions are currently populated to be used/accessed by the VM, notifying
+ * after parts were discarded (freeing up memory) and before parts will be
+ * populated (consuming memory), to be used/accessed by the VM.
+ *
+ * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the
+ * #MemoryRegion isn't mapped into an address space yet (either directly
+ * or via an alias); it cannot change while the #MemoryRegion is
+ * mapped into an address space.
+ *
+ * The #RamDiscardManager is intended to be used by technologies that are
+ * incompatible with discarding of RAM (e.g., VFIO, which may pin all
+ * memory inside a #MemoryRegion), and require proper coordination to only
+ * map the currently populated parts, to hinder parts that are expected to
+ * remain discarded from silently getting populated and consuming memory.
+ * Technologies that support discarding of RAM don't have to bother and can
+ * simply map the whole #MemoryRegion.
+ *
+ * An example #RamDiscardSource is virtio-mem, which logically (un)plugs
+ * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
+ * Logically unplugging memory consists of discarding RAM. The VM agreed to not
+ * access unplugged (discarded) memory - especially via DMA. virtio-mem will
+ * properly coordinate with listeners before memory is plugged (populated),
+ * and after memory is unplugged (discarded).
+ *
+ * Listeners are called in multiples of the minimum granularity (unless it
+ * would exceed the registered range) and changes are aligned to the minimum
+ * granularity within the #MemoryRegion. Listeners have to prepare for memory
+ * becoming discarded in a different granularity than it was populated and the
+ * other way around.
+ */
+struct RamDiscardManager {
+ Object parent;
+
+ RamDiscardSource *rds;
+ MemoryRegion *mr;
+ QLIST_HEAD(, RamDiscardListener) rdl_list;
+};
+
+RamDiscardManager *ram_discard_manager_new(MemoryRegion *mr,
+ RamDiscardSource *rds);
+
+uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
+ const MemoryRegion *mr);
+
+bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
+ const MemoryRegionSection *section);
+
+/**
+ * ram_discard_manager_replay_populated:
+ *
+ * A wrapper to call the #RamDiscardSourceClass.replay_populated callback
+ * of the manager's #RamDiscardSource.
+ *
+ * @rdm: the #RamDiscardManager
+ * @section: the #MemoryRegionSection
+ * @replay_fn: the #ReplayRamDiscardState callback
+ * @opaque: pointer to forward to the callback
+ *
+ * Returns 0 on success, or a negative error if any notification failed.
+ */
+int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
+ MemoryRegionSection *section,
+ ReplayRamDiscardState replay_fn,
+ void *opaque);
+
+/**
+ * ram_discard_manager_replay_discarded:
+ *
+ * A wrapper to call the #RamDiscardSourceClass.replay_discarded callback
+ * of the manager's #RamDiscardSource.
+ *
+ * @rdm: the #RamDiscardManager
+ * @section: the #MemoryRegionSection
+ * @replay_fn: the #ReplayRamDiscardState callback
+ * @opaque: pointer to forward to the callback
+ *
+ * Returns 0 on success, or a negative error if any notification failed.
+ */
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+ MemoryRegionSection *section,
+ ReplayRamDiscardState replay_fn,
+ void *opaque);
+
+void ram_discard_manager_register_listener(RamDiscardManager *rdm,
+ RamDiscardListener *rdl,
+ MemoryRegionSection *section);
+
+void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
+ RamDiscardListener *rdl);
+
+/*
+ * Note: later refactoring should take the source into account and the manager
+ * should be able to aggregate multiple sources.
+ */
+int ram_discard_manager_notify_populate(RamDiscardManager *rdm,
+ uint64_t offset, uint64_t size);
+
+/*
+ * Note: later refactoring should take the source into account and the manager
+ * should be able to aggregate multiple sources.
+ */
+void ram_discard_manager_notify_discard(RamDiscardManager *rdm,
+ uint64_t offset, uint64_t size);
+
+/*
+ * Note: later refactoring should take the source into account and the manager
+ * should be able to aggregate multiple sources.
+ */
+void ram_discard_manager_notify_discard_all(RamDiscardManager *rdm);
+
+/*
+ * Replay populated sections to all registered listeners.
+ *
+ * Note: later refactoring should take the source into account and the manager
+ * should be able to aggregate multiple sources.
+ */
+int ram_discard_manager_replay_populated_to_listeners(RamDiscardManager *rdm);
+
+#endif /* RAM_DISCARD_MANAGER_H */
diff --git a/system/memory.c b/system/memory.c
index 77966513113..97695a253e6 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2042,17 +2042,6 @@ RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
return mr->rdm;
}
-static RamDiscardManager *ram_discard_manager_new(MemoryRegion *mr,
- RamDiscardSource *rds)
-{
- RamDiscardManager *rdm = RAM_DISCARD_MANAGER(object_new(TYPE_RAM_DISCARD_MANAGER));
-
- rdm->rds = rds;
- rdm->mr = mr;
- QLIST_INIT(&rdm->rdl_list);
- return rdm;
-}
-
int memory_region_add_ram_discard_source(MemoryRegion *mr,
RamDiscardSource *source)
{
@@ -2074,200 +2063,6 @@ void memory_region_del_ram_discard_source(MemoryRegion *mr,
mr->rdm = NULL;
}
-static uint64_t ram_discard_source_get_min_granularity(const RamDiscardSource *rds,
- const MemoryRegion *mr)
-{
- RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
-
- g_assert(rdsc->get_min_granularity);
- return rdsc->get_min_granularity(rds, mr);
-}
-
-static bool ram_discard_source_is_populated(const RamDiscardSource *rds,
- const MemoryRegionSection *section)
-{
- RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
-
- g_assert(rdsc->is_populated);
- return rdsc->is_populated(rds, section);
-}
-
-static int ram_discard_source_replay_populated(const RamDiscardSource *rds,
- MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn,
- void *opaque)
-{
- RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
-
- g_assert(rdsc->replay_populated);
- return rdsc->replay_populated(rds, section, replay_fn, opaque);
-}
-
-static int ram_discard_source_replay_discarded(const RamDiscardSource *rds,
- MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn,
- void *opaque)
-{
- RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
-
- g_assert(rdsc->replay_discarded);
- return rdsc->replay_discarded(rds, section, replay_fn, opaque);
-}
-
-uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
- const MemoryRegion *mr)
-{
- return ram_discard_source_get_min_granularity(rdm->rds, mr);
-}
-
-bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
- const MemoryRegionSection *section)
-{
- return ram_discard_source_is_populated(rdm->rds, section);
-}
-
-int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
- MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn,
- void *opaque)
-{
- return ram_discard_source_replay_populated(rdm->rds, section, replay_fn, opaque);
-}
-
-int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
- MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn,
- void *opaque)
-{
- return ram_discard_source_replay_discarded(rdm->rds, section, replay_fn, opaque);
-}
-
-static void ram_discard_manager_initfn(Object *obj)
-{
- RamDiscardManager *rdm = RAM_DISCARD_MANAGER(obj);
-
- QLIST_INIT(&rdm->rdl_list);
-}
-
-static void ram_discard_manager_finalize(Object *obj)
-{
- RamDiscardManager *rdm = RAM_DISCARD_MANAGER(obj);
-
- g_assert(QLIST_EMPTY(&rdm->rdl_list));
-}
-
-int ram_discard_manager_notify_populate(RamDiscardManager *rdm,
- uint64_t offset, uint64_t size)
-{
- RamDiscardListener *rdl, *rdl2;
- int ret = 0;
-
- QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
- MemoryRegionSection tmp = *rdl->section;
-
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- continue;
- }
- ret = rdl->notify_populate(rdl, &tmp);
- if (ret) {
- break;
- }
- }
-
- if (ret) {
- /* Notify all already-notified listeners about discard. */
- QLIST_FOREACH(rdl2, &rdm->rdl_list, next) {
- MemoryRegionSection tmp = *rdl2->section;
-
- if (rdl2 == rdl) {
- break;
- }
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- continue;
- }
- rdl2->notify_discard(rdl2, &tmp);
- }
- }
- return ret;
-}
-
-void ram_discard_manager_notify_discard(RamDiscardManager *rdm,
- uint64_t offset, uint64_t size)
-{
- RamDiscardListener *rdl;
-
- QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
- MemoryRegionSection tmp = *rdl->section;
-
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- continue;
- }
- rdl->notify_discard(rdl, &tmp);
- }
-}
-
-void ram_discard_manager_notify_discard_all(RamDiscardManager *rdm)
-{
- RamDiscardListener *rdl;
-
- QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
- rdl->notify_discard(rdl, rdl->section);
- }
-}
-
-static int rdm_populate_cb(MemoryRegionSection *section, void *opaque)
-{
- RamDiscardListener *rdl = opaque;
-
- return rdl->notify_populate(rdl, section);
-}
-
-void ram_discard_manager_register_listener(RamDiscardManager *rdm,
- RamDiscardListener *rdl,
- MemoryRegionSection *section)
-{
- int ret;
-
- g_assert(section->mr == rdm->mr);
-
- rdl->section = memory_region_section_new_copy(section);
- QLIST_INSERT_HEAD(&rdm->rdl_list, rdl, next);
-
- ret = ram_discard_source_replay_populated(rdm->rds, rdl->section,
- rdm_populate_cb, rdl);
- if (ret) {
- error_report("%s: Replaying populated ranges failed: %s", __func__,
- strerror(-ret));
- }
-}
-
-void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
- RamDiscardListener *rdl)
-{
- g_assert(rdl->section);
- g_assert(rdl->section->mr == rdm->mr);
-
- rdl->notify_discard(rdl, rdl->section);
- memory_region_section_free_copy(rdl->section);
- rdl->section = NULL;
- QLIST_REMOVE(rdl, next);
-}
-
-int ram_discard_manager_replay_populated_to_listeners(RamDiscardManager *rdm)
-{
- RamDiscardListener *rdl;
- int ret = 0;
-
- QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
- ret = ram_discard_source_replay_populated(rdm->rds, rdl->section,
- rdm_populate_cb, rdl);
- if (ret) {
- break;
- }
- }
- return ret;
-}
-
/* Called with rcu_read_lock held. */
MemoryRegion *memory_translate_iotlb(IOMMUTLBEntry *iotlb, hwaddr *xlat_p,
Error **errp)
@@ -3897,26 +3692,10 @@ static const TypeInfo iommu_memory_region_info = {
.abstract = true,
};
-static const TypeInfo ram_discard_manager_info = {
- .parent = TYPE_OBJECT,
- .name = TYPE_RAM_DISCARD_MANAGER,
- .instance_size = sizeof(RamDiscardManager),
- .instance_init = ram_discard_manager_initfn,
- .instance_finalize = ram_discard_manager_finalize,
-};
-
-static const TypeInfo ram_discard_source_info = {
- .parent = TYPE_INTERFACE,
- .name = TYPE_RAM_DISCARD_SOURCE,
- .class_size = sizeof(RamDiscardSourceClass),
-};
-
static void memory_register_types(void)
{
type_register_static(&memory_region_info);
type_register_static(&iommu_memory_region_info);
- type_register_static(&ram_discard_manager_info);
- type_register_static(&ram_discard_source_info);
}
type_init(memory_register_types)
diff --git a/system/ram-discard-manager.c b/system/ram-discard-manager.c
new file mode 100644
index 00000000000..3d8c85617d7
--- /dev/null
+++ b/system/ram-discard-manager.c
@@ -0,0 +1,240 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * RAM Discard Manager
+ *
+ * Copyright Red Hat, Inc. 2026
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "system/memory.h"
+
+static uint64_t ram_discard_source_get_min_granularity(const RamDiscardSource *rds,
+ const MemoryRegion *mr)
+{
+ RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
+
+ g_assert(rdsc->get_min_granularity);
+ return rdsc->get_min_granularity(rds, mr);
+}
+
+static bool ram_discard_source_is_populated(const RamDiscardSource *rds,
+ const MemoryRegionSection *section)
+{
+ RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
+
+ g_assert(rdsc->is_populated);
+ return rdsc->is_populated(rds, section);
+}
+
+static int ram_discard_source_replay_populated(const RamDiscardSource *rds,
+ MemoryRegionSection *section,
+ ReplayRamDiscardState replay_fn,
+ void *opaque)
+{
+ RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
+
+ g_assert(rdsc->replay_populated);
+ return rdsc->replay_populated(rds, section, replay_fn, opaque);
+}
+
+static int ram_discard_source_replay_discarded(const RamDiscardSource *rds,
+ MemoryRegionSection *section,
+ ReplayRamDiscardState replay_fn,
+ void *opaque)
+{
+ RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
+
+ g_assert(rdsc->replay_discarded);
+ return rdsc->replay_discarded(rds, section, replay_fn, opaque);
+}
+
+RamDiscardManager *ram_discard_manager_new(MemoryRegion *mr,
+ RamDiscardSource *rds)
+{
+ RamDiscardManager *rdm;
+
+ rdm = RAM_DISCARD_MANAGER(object_new(TYPE_RAM_DISCARD_MANAGER));
+ rdm->rds = rds;
+ rdm->mr = mr;
+ QLIST_INIT(&rdm->rdl_list);
+ return rdm;
+}
+
+uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
+ const MemoryRegion *mr)
+{
+ return ram_discard_source_get_min_granularity(rdm->rds, mr);
+}
+
+bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
+ const MemoryRegionSection *section)
+{
+ return ram_discard_source_is_populated(rdm->rds, section);
+}
+
+int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
+ MemoryRegionSection *section,
+ ReplayRamDiscardState replay_fn,
+ void *opaque)
+{
+ return ram_discard_source_replay_populated(rdm->rds, section,
+ replay_fn, opaque);
+}
+
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+ MemoryRegionSection *section,
+ ReplayRamDiscardState replay_fn,
+ void *opaque)
+{
+ return ram_discard_source_replay_discarded(rdm->rds, section,
+ replay_fn, opaque);
+}
+
+static void ram_discard_manager_initfn(Object *obj)
+{
+ RamDiscardManager *rdm = RAM_DISCARD_MANAGER(obj);
+
+ QLIST_INIT(&rdm->rdl_list);
+}
+
+static void ram_discard_manager_finalize(Object *obj)
+{
+ RamDiscardManager *rdm = RAM_DISCARD_MANAGER(obj);
+
+ g_assert(QLIST_EMPTY(&rdm->rdl_list));
+}
+
+int ram_discard_manager_notify_populate(RamDiscardManager *rdm,
+ uint64_t offset, uint64_t size)
+{
+ RamDiscardListener *rdl, *rdl2;
+ int ret = 0;
+
+ QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
+ MemoryRegionSection tmp = *rdl->section;
+
+ if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+ continue;
+ }
+ ret = rdl->notify_populate(rdl, &tmp);
+ if (ret) {
+ break;
+ }
+ }
+
+ if (ret) {
+ /* Notify all already-notified listeners about discard. */
+ QLIST_FOREACH(rdl2, &rdm->rdl_list, next) {
+ MemoryRegionSection tmp = *rdl2->section;
+
+ if (rdl2 == rdl) {
+ break;
+ }
+ if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+ continue;
+ }
+ rdl2->notify_discard(rdl2, &tmp);
+ }
+ }
+ return ret;
+}
+
+void ram_discard_manager_notify_discard(RamDiscardManager *rdm,
+ uint64_t offset, uint64_t size)
+{
+ RamDiscardListener *rdl;
+
+ QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
+ MemoryRegionSection tmp = *rdl->section;
+
+ if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+ continue;
+ }
+ rdl->notify_discard(rdl, &tmp);
+ }
+}
+
+void ram_discard_manager_notify_discard_all(RamDiscardManager *rdm)
+{
+ RamDiscardListener *rdl;
+
+ QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
+ rdl->notify_discard(rdl, rdl->section);
+ }
+}
+
+static int rdm_populate_cb(MemoryRegionSection *section, void *opaque)
+{
+ RamDiscardListener *rdl = opaque;
+
+ return rdl->notify_populate(rdl, section);
+}
+
+void ram_discard_manager_register_listener(RamDiscardManager *rdm,
+ RamDiscardListener *rdl,
+ MemoryRegionSection *section)
+{
+ int ret;
+
+ g_assert(section->mr == rdm->mr);
+
+ rdl->section = memory_region_section_new_copy(section);
+ QLIST_INSERT_HEAD(&rdm->rdl_list, rdl, next);
+
+ ret = ram_discard_source_replay_populated(rdm->rds, rdl->section,
+ rdm_populate_cb, rdl);
+ if (ret) {
+ error_report("%s: Replaying populated ranges failed: %s", __func__,
+ strerror(-ret));
+ }
+}
+
+void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
+ RamDiscardListener *rdl)
+{
+ g_assert(rdl->section);
+ g_assert(rdl->section->mr == rdm->mr);
+
+ rdl->notify_discard(rdl, rdl->section);
+ memory_region_section_free_copy(rdl->section);
+ rdl->section = NULL;
+ QLIST_REMOVE(rdl, next);
+}
+
+int ram_discard_manager_replay_populated_to_listeners(RamDiscardManager *rdm)
+{
+ RamDiscardListener *rdl;
+ int ret = 0;
+
+ QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
+ ret = ram_discard_source_replay_populated(rdm->rds, rdl->section,
+ rdm_populate_cb, rdl);
+ if (ret) {
+ break;
+ }
+ }
+ return ret;
+}
+
+static const TypeInfo ram_discard_manager_info = {
+ .parent = TYPE_OBJECT,
+ .name = TYPE_RAM_DISCARD_MANAGER,
+ .instance_size = sizeof(RamDiscardManager),
+ .instance_init = ram_discard_manager_initfn,
+ .instance_finalize = ram_discard_manager_finalize,
+};
+
+static const TypeInfo ram_discard_source_info = {
+ .parent = TYPE_INTERFACE,
+ .name = TYPE_RAM_DISCARD_SOURCE,
+ .class_size = sizeof(RamDiscardSourceClass),
+};
+
+static void ram_discard_manager_register_types(void)
+{
+ type_register_static(&ram_discard_manager_info);
+ type_register_static(&ram_discard_source_info);
+}
+
+type_init(ram_discard_manager_register_types)
diff --git a/rust/bindings/system-sys/lib.rs b/rust/bindings/system-sys/lib.rs
index 022fe65dd83..30adf683c35 100644
--- a/rust/bindings/system-sys/lib.rs
+++ b/rust/bindings/system-sys/lib.rs
@@ -20,7 +20,7 @@
use common::Zeroable;
use hwcore_sys::{qemu_irq, DeviceClass, DeviceState};
-use qom_sys::{InterfaceClass, Object, ObjectClass};
+use qom_sys::{Object, ObjectClass};
use util_sys::{Error, EventNotifier, QEMUBH};
#[cfg(MESON)]
diff --git a/system/meson.build b/system/meson.build
index 9cdfe1b3e75..cd3193d170b 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -14,6 +14,7 @@ system_ss.add(files(
'globals.c',
'ioport.c',
'ram-block-attributes.c',
+ 'ram-discard-manager.c',
'memory_mapping.c',
'memory.c',
'physmem.c',
--
2.54.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 03/13] system/memory: constify section arguments
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 01/13] system/memory: split RamDiscardManager into source and manager Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 02/13] system/memory: move RamDiscardManager to separate compilation unit Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 04/13] system/ram-discard-manager: implement replay via is_populated iteration Marc-André Lureau
` (10 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Cédric Le Goater, Marc-André Lureau
These callbacks shouldn't modify the sections they are given, so constify the section arguments.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
include/hw/vfio/vfio-container.h | 2 +-
include/hw/vfio/vfio-cpr.h | 2 +-
include/system/ram-discard-manager.h | 14 +++++++-------
hw/vfio/cpr-legacy.c | 4 ++--
hw/vfio/listener.c | 10 +++++-----
hw/virtio/virtio-mem.c | 10 +++++-----
migration/ram.c | 6 +++---
system/memory_mapping.c | 4 ++--
system/ram-block-attributes.c | 8 ++++----
system/ram-discard-manager.c | 10 +++++-----
10 files changed, 35 insertions(+), 35 deletions(-)
diff --git a/include/hw/vfio/vfio-container.h b/include/hw/vfio/vfio-container.h
index a7d5c5ed679..b2e7f4312c3 100644
--- a/include/hw/vfio/vfio-container.h
+++ b/include/hw/vfio/vfio-container.h
@@ -277,7 +277,7 @@ struct VFIOIOMMUClass {
};
VFIORamDiscardListener *vfio_find_ram_discard_listener(
- VFIOContainer *bcontainer, MemoryRegionSection *section);
+ VFIOContainer *bcontainer, const MemoryRegionSection *section);
void vfio_container_region_add(VFIOContainer *bcontainer,
MemoryRegionSection *section, bool cpr_remap);
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index 4606da500a7..ecabe0c747d 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -69,7 +69,7 @@ void vfio_cpr_giommu_remap(struct VFIOContainer *bcontainer,
MemoryRegionSection *section);
bool vfio_cpr_ram_discard_replay_populated(
- struct VFIOContainer *bcontainer, MemoryRegionSection *section);
+ struct VFIOContainer *bcontainer, const MemoryRegionSection *section);
void vfio_cpr_save_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
int nr, int fd);
diff --git a/include/system/ram-discard-manager.h b/include/system/ram-discard-manager.h
index da55658169f..b188e09a30f 100644
--- a/include/system/ram-discard-manager.h
+++ b/include/system/ram-discard-manager.h
@@ -26,9 +26,9 @@ DECLARE_OBJ_CHECKERS(RamDiscardSource, RamDiscardSourceClass,
typedef struct RamDiscardListener RamDiscardListener;
typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl,
- MemoryRegionSection *section);
+ const MemoryRegionSection *section);
typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl,
- MemoryRegionSection *section);
+ const MemoryRegionSection *section);
struct RamDiscardListener {
/*
@@ -86,7 +86,7 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
*
 * Returns 0 on success, or a negative error on failure.
*/
-typedef int (*ReplayRamDiscardState)(MemoryRegionSection *section,
+typedef int (*ReplayRamDiscardState)(const MemoryRegionSection *section,
void *opaque);
/*
@@ -151,7 +151,7 @@ struct RamDiscardSourceClass {
* Returns 0 on success, or a negative error if any notification failed.
*/
int (*replay_populated)(const RamDiscardSource *rds,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
ReplayRamDiscardState replay_fn, void *opaque);
/**
@@ -168,7 +168,7 @@ struct RamDiscardSourceClass {
* Returns 0 on success, or a negative error if any notification failed.
*/
int (*replay_discarded)(const RamDiscardSource *rds,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
ReplayRamDiscardState replay_fn, void *opaque);
};
@@ -237,7 +237,7 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
* Returns 0 on success, or a negative error if any notification failed.
*/
int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
ReplayRamDiscardState replay_fn,
void *opaque);
@@ -255,7 +255,7 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
* Returns 0 on success, or a negative error if any notification failed.
*/
int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
ReplayRamDiscardState replay_fn,
void *opaque);
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index 033a546c301..cca7dd08dfc 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -226,7 +226,7 @@ void vfio_cpr_giommu_remap(VFIOContainer *bcontainer,
memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
}
-static int vfio_cpr_rdm_remap(MemoryRegionSection *section, void *opaque)
+static int vfio_cpr_rdm_remap(const MemoryRegionSection *section, void *opaque)
{
RamDiscardListener *rdl = opaque;
@@ -242,7 +242,7 @@ static int vfio_cpr_rdm_remap(MemoryRegionSection *section, void *opaque)
* directly, which calls vfio_legacy_cpr_dma_map.
*/
bool vfio_cpr_ram_discard_replay_populated(VFIOContainer *bcontainer,
- MemoryRegionSection *section)
+ const MemoryRegionSection *section)
{
RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
VFIORamDiscardListener *vrdl =
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index 31c3113f8fb..e784868f62a 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -201,7 +201,7 @@ out:
}
static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
- MemoryRegionSection *section)
+ const MemoryRegionSection *section)
{
VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
listener);
@@ -219,7 +219,7 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
}
static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
- MemoryRegionSection *section)
+ const MemoryRegionSection *section)
{
VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
listener);
@@ -461,7 +461,7 @@ static void vfio_device_error_append(VFIODevice *vbasedev, Error **errp)
}
VFIORamDiscardListener *vfio_find_ram_discard_listener(
- VFIOContainer *bcontainer, MemoryRegionSection *section)
+ VFIOContainer *bcontainer, const MemoryRegionSection *section)
{
VFIORamDiscardListener *vrdl = NULL;
@@ -1143,8 +1143,8 @@ out:
}
}
-static int vfio_ram_discard_query_dirty_bitmap(MemoryRegionSection *section,
- void *opaque)
+static int vfio_ram_discard_query_dirty_bitmap(const MemoryRegionSection *section,
+ void *opaque)
{
const hwaddr size = int128_get64(section->size);
const hwaddr iova = section->offset_within_address_space;
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index be149ee9441..ec165503205 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -262,7 +262,7 @@ static int virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void *arg);
static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
- MemoryRegionSection *s,
+ const MemoryRegionSection *s,
void *arg,
virtio_mem_section_cb cb)
{
@@ -294,7 +294,7 @@ static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
}
static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
- MemoryRegionSection *s,
+ const MemoryRegionSection *s,
void *arg,
virtio_mem_section_cb cb)
{
@@ -1680,7 +1680,7 @@ static int virtio_mem_rds_replay_cb(MemoryRegionSection *s, void *arg)
}
static int virtio_mem_rds_replay_populated(const RamDiscardSource *rds,
- MemoryRegionSection *s,
+ const MemoryRegionSection *s,
ReplayRamDiscardState replay_fn,
void *opaque)
{
@@ -1692,11 +1692,11 @@ static int virtio_mem_rds_replay_populated(const RamDiscardSource *rds,
g_assert(s->mr == &vmem->memdev->mr);
return virtio_mem_for_each_plugged_section(vmem, s, &data,
- virtio_mem_rds_replay_cb);
+ virtio_mem_rds_replay_cb);
}
static int virtio_mem_rds_replay_discarded(const RamDiscardSource *rds,
- MemoryRegionSection *s,
+ const MemoryRegionSection *s,
ReplayRamDiscardState replay_fn,
void *opaque)
{
diff --git a/migration/ram.c b/migration/ram.c
index 2046f16caa2..3f69b8733eb 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -860,7 +860,7 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
return ret;
}
-static int dirty_bitmap_clear_section(MemoryRegionSection *section,
+static int dirty_bitmap_clear_section(const MemoryRegionSection *section,
void *opaque)
{
const hwaddr offset = section->offset_within_region;
@@ -1595,7 +1595,7 @@ static inline void populate_read_range(RAMBlock *block, ram_addr_t offset,
}
}
-static inline int populate_read_section(MemoryRegionSection *section,
+static inline int populate_read_section(const MemoryRegionSection *section,
void *opaque)
{
const hwaddr size = int128_get64(section->size);
@@ -1670,7 +1670,7 @@ void ram_write_tracking_prepare(void)
}
}
-static inline int uffd_protect_section(MemoryRegionSection *section,
+static inline int uffd_protect_section(const MemoryRegionSection *section,
void *opaque)
{
const hwaddr size = int128_get64(section->size);
diff --git a/system/memory_mapping.c b/system/memory_mapping.c
index da708a08ab7..cacef504f68 100644
--- a/system/memory_mapping.c
+++ b/system/memory_mapping.c
@@ -196,7 +196,7 @@ typedef struct GuestPhysListener {
} GuestPhysListener;
static void guest_phys_block_add_section(GuestPhysListener *g,
- MemoryRegionSection *section)
+ const MemoryRegionSection *section)
{
const hwaddr target_start = section->offset_within_address_space;
const hwaddr target_end = target_start + int128_get64(section->size);
@@ -248,7 +248,7 @@ static void guest_phys_block_add_section(GuestPhysListener *g,
#endif
}
-static int guest_phys_ram_populate_cb(MemoryRegionSection *section,
+static int guest_phys_ram_populate_cb(const MemoryRegionSection *section,
void *opaque)
{
GuestPhysListener *g = opaque;
diff --git a/system/ram-block-attributes.c b/system/ram-block-attributes.c
index a72924eea7d..79c7e97d9a1 100644
--- a/system/ram-block-attributes.c
+++ b/system/ram-block-attributes.c
@@ -37,7 +37,7 @@ typedef int (*ram_block_attributes_section_cb)(MemoryRegionSection *s,
static int
ram_block_attributes_for_each_populated_section(const RamBlockAttributes *attr,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
void *arg,
ram_block_attributes_section_cb cb)
{
@@ -78,7 +78,7 @@ ram_block_attributes_for_each_populated_section(const RamBlockAttributes *attr,
static int
ram_block_attributes_for_each_discarded_section(const RamBlockAttributes *attr,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
void *arg,
ram_block_attributes_section_cb cb)
{
@@ -161,7 +161,7 @@ ram_block_attributes_rds_is_populated(const RamDiscardSource *rds,
static int
ram_block_attributes_rds_replay_populated(const RamDiscardSource *rds,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
ReplayRamDiscardState replay_fn,
void *opaque)
{
@@ -175,7 +175,7 @@ ram_block_attributes_rds_replay_populated(const RamDiscardSource *rds,
static int
ram_block_attributes_rds_replay_discarded(const RamDiscardSource *rds,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
ReplayRamDiscardState replay_fn,
void *opaque)
{
diff --git a/system/ram-discard-manager.c b/system/ram-discard-manager.c
index 3d8c85617d7..1c9ff7fda58 100644
--- a/system/ram-discard-manager.c
+++ b/system/ram-discard-manager.c
@@ -28,7 +28,7 @@ static bool ram_discard_source_is_populated(const RamDiscardSource *rds,
}
static int ram_discard_source_replay_populated(const RamDiscardSource *rds,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
ReplayRamDiscardState replay_fn,
void *opaque)
{
@@ -39,7 +39,7 @@ static int ram_discard_source_replay_populated(const RamDiscardSource *rds,
}
static int ram_discard_source_replay_discarded(const RamDiscardSource *rds,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
ReplayRamDiscardState replay_fn,
void *opaque)
{
@@ -74,7 +74,7 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
}
int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
ReplayRamDiscardState replay_fn,
void *opaque)
{
@@ -83,7 +83,7 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
}
int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
- MemoryRegionSection *section,
+ const MemoryRegionSection *section,
ReplayRamDiscardState replay_fn,
void *opaque)
{
@@ -164,7 +164,7 @@ void ram_discard_manager_notify_discard_all(RamDiscardManager *rdm)
}
}
-static int rdm_populate_cb(MemoryRegionSection *section, void *opaque)
+static int rdm_populate_cb(const MemoryRegionSection *section, void *opaque)
{
RamDiscardListener *rdl = opaque;
--
2.54.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 04/13] system/ram-discard-manager: implement replay via is_populated iteration
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
` (2 preceding siblings ...)
2026-05-04 12:30 ` [PATCH v4 03/13] system/memory: constify section arguments Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-13 20:40 ` Peter Xu
2026-05-04 12:30 ` [PATCH v4 05/13] virtio-mem: remove replay_populated/replay_discarded implementation Marc-André Lureau
` (9 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Marc-André Lureau
Replace the source-level replay wrappers with a new
replay_by_populated_state() helper that iterates the section at
min-granularity, calls is_populated() for each chunk, and aggregates
consecutive chunks of the same state before invoking the callback.
This moves the iteration logic from individual sources into the manager,
preparing for multi-source aggregation where the manager must combine
state from multiple sources anyway.
The replay_populated/replay_discarded vtable entries in
RamDiscardSourceClass are no longer called but remain in the interface
for now; they will be removed in follow-up commits along with the
now-dead source implementations.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
system/ram-discard-manager.c | 85 +++++++++++++++++++++++++++++++-------------
1 file changed, 61 insertions(+), 24 deletions(-)
diff --git a/system/ram-discard-manager.c b/system/ram-discard-manager.c
index 1c9ff7fda58..a907ddf3708 100644
--- a/system/ram-discard-manager.c
+++ b/system/ram-discard-manager.c
@@ -27,26 +27,65 @@ static bool ram_discard_source_is_populated(const RamDiscardSource *rds,
return rdsc->is_populated(rds, section);
}
-static int ram_discard_source_replay_populated(const RamDiscardSource *rds,
- const MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn,
- void *opaque)
+/*
+ * Iterate the section at source granularity, aggregating consecutive chunks
+ * with matching populated state, and call replay_fn for each run.
+ */
+static int replay_by_populated_state(const RamDiscardManager *rdm,
+ const MemoryRegionSection *section,
+ bool replay_populated,
+ ReplayRamDiscardState replay_fn,
+ void *opaque)
{
- RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
+ uint64_t granularity, offset, size, end, pos, run_start = 0;
+ bool in_run = false;
+ int ret = 0;
- g_assert(rdsc->replay_populated);
- return rdsc->replay_populated(rds, section, replay_fn, opaque);
-}
+ granularity = ram_discard_source_get_min_granularity(rdm->rds, rdm->mr);
+ offset = section->offset_within_region;
+ size = int128_get64(section->size);
+ end = offset + size;
+
+ /* Align iteration to granularity boundaries */
+ pos = QEMU_ALIGN_DOWN(offset, granularity);
+
+ for (; pos < end; pos += granularity) {
+ MemoryRegionSection chunk = {
+ .mr = section->mr,
+ .offset_within_region = pos,
+ .size = int128_make64(granularity),
+ };
+ bool populated = ram_discard_source_is_populated(rdm->rds, &chunk);
+
+ if (populated == replay_populated) {
+ if (!in_run) {
+ run_start = pos;
+ in_run = true;
+ }
+ } else if (in_run) {
+ MemoryRegionSection tmp = *section;
+
+ if (memory_region_section_intersect_range(&tmp, run_start,
+ pos - run_start)) {
+ ret = replay_fn(&tmp, opaque);
+ if (ret) {
+ return ret;
+ }
+ }
+ in_run = false;
+ }
+ }
-static int ram_discard_source_replay_discarded(const RamDiscardSource *rds,
- const MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn,
- void *opaque)
-{
- RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_GET_CLASS(rds);
+ if (in_run) {
+ MemoryRegionSection tmp = *section;
- g_assert(rdsc->replay_discarded);
- return rdsc->replay_discarded(rds, section, replay_fn, opaque);
+ if (memory_region_section_intersect_range(&tmp, run_start,
+ pos - run_start)) {
+ ret = replay_fn(&tmp, opaque);
+ }
+ }
+
+ return ret;
}
RamDiscardManager *ram_discard_manager_new(MemoryRegion *mr,
@@ -78,8 +117,7 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
ReplayRamDiscardState replay_fn,
void *opaque)
{
- return ram_discard_source_replay_populated(rdm->rds, section,
- replay_fn, opaque);
+ return replay_by_populated_state(rdm, section, true, replay_fn, opaque);
}
int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
@@ -87,8 +125,7 @@ int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
ReplayRamDiscardState replay_fn,
void *opaque)
{
- return ram_discard_source_replay_discarded(rdm->rds, section,
- replay_fn, opaque);
+ return replay_by_populated_state(rdm, section, false, replay_fn, opaque);
}
static void ram_discard_manager_initfn(Object *obj)
@@ -182,8 +219,8 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
rdl->section = memory_region_section_new_copy(section);
QLIST_INSERT_HEAD(&rdm->rdl_list, rdl, next);
- ret = ram_discard_source_replay_populated(rdm->rds, rdl->section,
- rdm_populate_cb, rdl);
+ ret = ram_discard_manager_replay_populated(rdm, rdl->section,
+ rdm_populate_cb, rdl);
if (ret) {
error_report("%s: Replaying populated ranges failed: %s", __func__,
strerror(-ret));
@@ -208,8 +245,8 @@ int ram_discard_manager_replay_populated_to_listeners(RamDiscardManager *rdm)
int ret = 0;
QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
- ret = ram_discard_source_replay_populated(rdm->rds, rdl->section,
- rdm_populate_cb, rdl);
+ ret = ram_discard_manager_replay_populated(rdm, rdl->section,
+ rdm_populate_cb, rdl);
if (ret) {
break;
}
--
2.54.0

* [PATCH v4 05/13] virtio-mem: remove replay_populated/replay_discarded implementation
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
` (3 preceding siblings ...)
2026-05-04 12:30 ` [PATCH v4 04/13] system/ram-discard-manager: implement replay via is_populated iteration Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-13 20:40 ` Peter Xu
2026-05-04 12:30 ` [PATCH v4 06/13] system/ram-discard-manager: drop replay from source interface Marc-André Lureau
` (8 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Marc-André Lureau
The replay iteration logic has been moved into the RamDiscardManager,
which now iterates at source granularity using is_populated(). The
source-level replay_populated/replay_discarded methods and their
helpers are no longer called.
Remove the now-dead replay methods, the VirtIOMEMReplayData struct,
the virtio_mem_for_each_plugged/unplugged_section() helpers (only used
by the replay methods), and the virtio_mem_section_cb typedef.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
hw/virtio/virtio-mem.c | 112 -------------------------------------------------
1 file changed, 112 deletions(-)
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index ec165503205..2b67b2882d2 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -259,72 +259,6 @@ static int virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
return ret;
}
-typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void *arg);
-
-static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
- const MemoryRegionSection *s,
- void *arg,
- virtio_mem_section_cb cb)
-{
- unsigned long first_bit, last_bit;
- uint64_t offset, size;
- int ret = 0;
-
- first_bit = s->offset_within_region / vmem->block_size;
- first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size, first_bit);
- while (first_bit < vmem->bitmap_size) {
- MemoryRegionSection tmp = *s;
-
- offset = first_bit * vmem->block_size;
- last_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
- first_bit + 1) - 1;
- size = (last_bit - first_bit + 1) * vmem->block_size;
-
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- break;
- }
- ret = cb(&tmp, arg);
- if (ret) {
- break;
- }
- first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
- last_bit + 2);
- }
- return ret;
-}
-
-static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
- const MemoryRegionSection *s,
- void *arg,
- virtio_mem_section_cb cb)
-{
- unsigned long first_bit, last_bit;
- uint64_t offset, size;
- int ret = 0;
-
- first_bit = s->offset_within_region / vmem->block_size;
- first_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size, first_bit);
- while (first_bit < vmem->bitmap_size) {
- MemoryRegionSection tmp = *s;
-
- offset = first_bit * vmem->block_size;
- last_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
- first_bit + 1) - 1;
- size = (last_bit - first_bit + 1) * vmem->block_size;
-
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- break;
- }
- ret = cb(&tmp, arg);
- if (ret) {
- break;
- }
- first_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
- last_bit + 2);
- }
- return ret;
-}
-
static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
uint64_t size)
{
@@ -1667,50 +1601,6 @@ static bool virtio_mem_rds_is_populated(const RamDiscardSource *rds,
return virtio_mem_is_range_plugged(vmem, start_gpa, end_gpa - start_gpa);
}
-struct VirtIOMEMReplayData {
- ReplayRamDiscardState fn;
- void *opaque;
-};
-
-static int virtio_mem_rds_replay_cb(MemoryRegionSection *s, void *arg)
-{
- struct VirtIOMEMReplayData *data = arg;
-
- return data->fn(s, data->opaque);
-}
-
-static int virtio_mem_rds_replay_populated(const RamDiscardSource *rds,
- const MemoryRegionSection *s,
- ReplayRamDiscardState replay_fn,
- void *opaque)
-{
- const VirtIOMEM *vmem = VIRTIO_MEM(rds);
- struct VirtIOMEMReplayData data = {
- .fn = replay_fn,
- .opaque = opaque,
- };
-
- g_assert(s->mr == &vmem->memdev->mr);
- return virtio_mem_for_each_plugged_section(vmem, s, &data,
- virtio_mem_rds_replay_cb);
-}
-
-static int virtio_mem_rds_replay_discarded(const RamDiscardSource *rds,
- const MemoryRegionSection *s,
- ReplayRamDiscardState replay_fn,
- void *opaque)
-{
- const VirtIOMEM *vmem = VIRTIO_MEM(rds);
- struct VirtIOMEMReplayData data = {
- .fn = replay_fn,
- .opaque = opaque,
- };
-
- g_assert(s->mr == &vmem->memdev->mr);
- return virtio_mem_for_each_unplugged_section(vmem, s, &data,
- virtio_mem_rds_replay_cb);
-}
-
static void virtio_mem_unplug_request_check(VirtIOMEM *vmem, Error **errp)
{
if (vmem->unplugged_inaccessible == ON_OFF_AUTO_OFF) {
@@ -1766,8 +1656,6 @@ static void virtio_mem_class_init(ObjectClass *klass, const void *data)
rdsc->get_min_granularity = virtio_mem_rds_get_min_granularity;
rdsc->is_populated = virtio_mem_rds_is_populated;
- rdsc->replay_populated = virtio_mem_rds_replay_populated;
- rdsc->replay_discarded = virtio_mem_rds_replay_discarded;
}
static const TypeInfo virtio_mem_info = {
--
2.54.0
* [PATCH v4 06/13] system/ram-discard-manager: drop replay from source interface
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
` (4 preceding siblings ...)
2026-05-04 12:30 ` [PATCH v4 05/13] virtio-mem: remove replay_populated/replay_discarded implementation Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-13 20:40 ` Peter Xu
2026-05-04 12:30 ` [PATCH v4 07/13] system/memory: implement RamDiscardManager multi-source aggregation Marc-André Lureau
` (7 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Marc-André Lureau
Remove replay_populated and replay_discarded from RamDiscardSourceClass
now that the RamDiscardManager handles replay iteration internally via
is_populated.
Remove the now-dead replay methods, helpers, and
for_each_populated/discarded_section() from ram-block-attributes, which
was the last source still carrying this code.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
include/system/ram-discard-manager.h | 52 +++-----------
system/ram-block-attributes.c | 130 -----------------------------------
2 files changed, 10 insertions(+), 172 deletions(-)
diff --git a/include/system/ram-discard-manager.h b/include/system/ram-discard-manager.h
index b188e09a30f..b5dbcb4a82d 100644
--- a/include/system/ram-discard-manager.h
+++ b/include/system/ram-discard-manager.h
@@ -77,8 +77,8 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
/**
* typedef ReplayRamDiscardState:
*
- * The callback handler for #RamDiscardSourceClass.replay_populated/
- * #RamDiscardSourceClass.replay_discarded to invoke on populated/discarded
+ * The callback handler used by ram_discard_manager_replay_populated() and
+ * ram_discard_manager_replay_discarded() to invoke on populated/discarded
* parts.
*
* @section: the #MemoryRegionSection of populated/discarded part
@@ -134,42 +134,6 @@ struct RamDiscardSourceClass {
*/
bool (*is_populated)(const RamDiscardSource *rds,
const MemoryRegionSection *section);
-
- /**
- * @replay_populated:
- *
- * Call the #ReplayRamDiscardState callback for all populated parts within
- * the #MemoryRegionSection via the #RamDiscardSource.
- *
- * In case any call fails, no further calls are made.
- *
- * @rds: the #RamDiscardSource
- * @section: the #MemoryRegionSection
- * @replay_fn: the #ReplayRamDiscardState callback
- * @opaque: pointer to forward to the callback
- *
- * Returns 0 on success, or a negative error if any notification failed.
- */
- int (*replay_populated)(const RamDiscardSource *rds,
- const MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn, void *opaque);
-
- /**
- * @replay_discarded:
- *
- * Call the #ReplayRamDiscardState callback for all discarded parts within
- * the #MemoryRegionSection via the #RamDiscardSource.
- *
- * @rds: the #RamDiscardSource
- * @section: the #MemoryRegionSection
- * @replay_fn: the #ReplayRamDiscardState callback
- * @opaque: pointer to forward to the callback
- *
- * Returns 0 on success, or a negative error if any notification failed.
- */
- int (*replay_discarded)(const RamDiscardSource *rds,
- const MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn, void *opaque);
};
/**
@@ -226,8 +190,10 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
/**
* ram_discard_manager_replay_populated:
*
- * A wrapper to call the #RamDiscardSourceClass.replay_populated callback
- * of the #RamDiscardSource sources.
+ * Iterate the given #MemoryRegionSection at minimum granularity, calling
+ * #RamDiscardSourceClass.is_populated for each chunk, and invoke @replay_fn
+ * for each contiguous populated range. In case any call fails, no further
+ * calls are made.
*
* @rdm: the #RamDiscardManager
* @section: the #MemoryRegionSection
@@ -244,8 +210,10 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
/**
* ram_discard_manager_replay_discarded:
*
- * A wrapper to call the #RamDiscardSourceClass.replay_discarded callback
- * of the #RamDiscardSource sources.
+ * Iterate the given #MemoryRegionSection at minimum granularity, calling
+ * #RamDiscardSourceClass.is_populated for each chunk, and invoke @replay_fn
+ * for each contiguous discarded range. In case any call fails, no further
+ * calls are made.
*
* @rdm: the #RamDiscardManager
* @section: the #MemoryRegionSection
diff --git a/system/ram-block-attributes.c b/system/ram-block-attributes.c
index 79c7e97d9a1..718c7075cec 100644
--- a/system/ram-block-attributes.c
+++ b/system/ram-block-attributes.c
@@ -32,106 +32,6 @@ ram_block_attributes_get_block_size(void)
return qemu_real_host_page_size();
}
-typedef int (*ram_block_attributes_section_cb)(MemoryRegionSection *s,
- void *arg);
-
-static int
-ram_block_attributes_for_each_populated_section(const RamBlockAttributes *attr,
- const MemoryRegionSection *section,
- void *arg,
- ram_block_attributes_section_cb cb)
-{
- unsigned long first_bit, last_bit;
- uint64_t offset, size;
- const size_t block_size = ram_block_attributes_get_block_size();
- int ret = 0;
-
- first_bit = section->offset_within_region / block_size;
- first_bit = find_next_bit(attr->bitmap, attr->bitmap_size,
- first_bit);
-
- while (first_bit < attr->bitmap_size) {
- MemoryRegionSection tmp = *section;
-
- offset = first_bit * block_size;
- last_bit = find_next_zero_bit(attr->bitmap, attr->bitmap_size,
- first_bit + 1) - 1;
- size = (last_bit - first_bit + 1) * block_size;
-
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- break;
- }
-
- ret = cb(&tmp, arg);
- if (ret) {
- error_report("%s: Failed to notify RAM discard listener: %s",
- __func__, strerror(-ret));
- break;
- }
-
- first_bit = find_next_bit(attr->bitmap, attr->bitmap_size,
- last_bit + 2);
- }
-
- return ret;
-}
-
-static int
-ram_block_attributes_for_each_discarded_section(const RamBlockAttributes *attr,
- const MemoryRegionSection *section,
- void *arg,
- ram_block_attributes_section_cb cb)
-{
- unsigned long first_bit, last_bit;
- uint64_t offset, size;
- const size_t block_size = ram_block_attributes_get_block_size();
- int ret = 0;
-
- first_bit = section->offset_within_region / block_size;
- first_bit = find_next_zero_bit(attr->bitmap, attr->bitmap_size,
- first_bit);
-
- while (first_bit < attr->bitmap_size) {
- MemoryRegionSection tmp = *section;
-
- offset = first_bit * block_size;
- last_bit = find_next_bit(attr->bitmap, attr->bitmap_size,
- first_bit + 1) - 1;
- size = (last_bit - first_bit + 1) * block_size;
-
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- break;
- }
-
- ret = cb(&tmp, arg);
- if (ret) {
- error_report("%s: Failed to notify RAM discard listener: %s",
- __func__, strerror(-ret));
- break;
- }
-
- first_bit = find_next_zero_bit(attr->bitmap,
- attr->bitmap_size,
- last_bit + 2);
- }
-
- return ret;
-}
-
-
-typedef struct RamBlockAttributesReplayData {
- ReplayRamDiscardState fn;
- void *opaque;
-} RamBlockAttributesReplayData;
-
-static int ram_block_attributes_rds_replay_cb(MemoryRegionSection *section,
- void *arg)
-{
- RamBlockAttributesReplayData *data = arg;
-
- return data->fn(section, data->opaque);
-}
-
/* RamDiscardSource interface implementation */
static uint64_t
ram_block_attributes_rds_get_min_granularity(const RamDiscardSource *rds,
@@ -159,34 +59,6 @@ ram_block_attributes_rds_is_populated(const RamDiscardSource *rds,
return first_discarded_bit > last_bit;
}
-static int
-ram_block_attributes_rds_replay_populated(const RamDiscardSource *rds,
- const MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn,
- void *opaque)
-{
- RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rds);
- RamBlockAttributesReplayData data = { .fn = replay_fn, .opaque = opaque };
-
- g_assert(section->mr == attr->ram_block->mr);
- return ram_block_attributes_for_each_populated_section(attr, section, &data,
- ram_block_attributes_rds_replay_cb);
-}
-
-static int
-ram_block_attributes_rds_replay_discarded(const RamDiscardSource *rds,
- const MemoryRegionSection *section,
- ReplayRamDiscardState replay_fn,
- void *opaque)
-{
- RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rds);
- RamBlockAttributesReplayData data = { .fn = replay_fn, .opaque = opaque };
-
- g_assert(section->mr == attr->ram_block->mr);
- return ram_block_attributes_for_each_discarded_section(attr, section, &data,
- ram_block_attributes_rds_replay_cb);
-}
-
static bool
ram_block_attributes_is_valid_range(RamBlockAttributes *attr, uint64_t offset,
uint64_t size)
@@ -346,6 +218,4 @@ static void ram_block_attributes_class_init(ObjectClass *klass,
rdsc->get_min_granularity = ram_block_attributes_rds_get_min_granularity;
rdsc->is_populated = ram_block_attributes_rds_is_populated;
- rdsc->replay_populated = ram_block_attributes_rds_replay_populated;
- rdsc->replay_discarded = ram_block_attributes_rds_replay_discarded;
}
--
2.54.0
* [PATCH v4 07/13] system/memory: implement RamDiscardManager multi-source aggregation
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
` (5 preceding siblings ...)
2026-05-04 12:30 ` [PATCH v4 06/13] system/ram-discard-manager: drop replay from source interface Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 08/13] system/physmem: destroy ram block attributes before RCU-deferred reclaim Marc-André Lureau
` (6 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Marc-André Lureau
Refactor RamDiscardManager to aggregate multiple RamDiscardSource
instances. This enables scenarios where multiple components (e.g.,
virtio-mem and RamBlockAttributes) can coordinate memory discard
state for the same memory region.
The aggregation uses:
- Populated: ALL sources populated
- Discarded: ANY source discarded
When a source is added with existing listeners, they are notified
about regions that become discarded. When a source is removed,
listeners are notified about regions that become populated.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
include/system/ram-discard-manager.h | 141 ++++++++++--
hw/virtio/virtio-mem.c | 8 +-
system/memory.c | 15 +-
system/ram-block-attributes.c | 6 +-
system/ram-discard-manager.c | 427 +++++++++++++++++++++++++++++++----
5 files changed, 514 insertions(+), 83 deletions(-)
diff --git a/include/system/ram-discard-manager.h b/include/system/ram-discard-manager.h
index b5dbcb4a82d..05d3d31b55a 100644
--- a/include/system/ram-discard-manager.h
+++ b/include/system/ram-discard-manager.h
@@ -170,30 +170,96 @@ struct RamDiscardSourceClass {
* becoming discarded in a different granularity than it was populated and the
* other way around.
*/
+
+typedef struct RamDiscardSourceEntry RamDiscardSourceEntry;
+
+struct RamDiscardSourceEntry {
+ RamDiscardSource *rds;
+ QLIST_ENTRY(RamDiscardSourceEntry) next;
+};
+
struct RamDiscardManager {
Object parent;
- RamDiscardSource *rds;
MemoryRegion *mr;
+ QLIST_HEAD(, RamDiscardSourceEntry) source_list;
+ uint64_t min_granularity;
QLIST_HEAD(, RamDiscardListener) rdl_list;
};
-RamDiscardManager *ram_discard_manager_new(MemoryRegion *mr,
- RamDiscardSource *rds);
+RamDiscardManager *ram_discard_manager_new(MemoryRegion *mr);
+
+/**
+ * ram_discard_manager_add_source:
+ *
+ * Register a #RamDiscardSource with the #RamDiscardManager. The manager
+ * aggregates state from all registered sources using AND semantics: a region
+ * is considered populated only if ALL sources report it as populated.
+ *
+ * If listeners are already registered, they will be notified about any
+ * regions that become discarded due to adding this source. Specifically,
+ * for each region that the new source reports as discarded, if all other
+ * sources reported it as populated, listeners receive a discard notification.
+ *
+ * If any listener rejects the notification (returns an error), previously
+ * notified listeners are rolled back with populate notifications and the
+ * source is not added.
+ *
+ * @rdm: the #RamDiscardManager
+ * @source: the #RamDiscardSource to add
+ *
+ * Returns: 0 on success, -EBUSY if @source is already registered, or a
+ * negative error code if a listener rejected the state change.
+ */
+int ram_discard_manager_add_source(RamDiscardManager *rdm,
+ RamDiscardSource *source);
+
+/**
+ * ram_discard_manager_del_source:
+ *
+ * Unregister a #RamDiscardSource from the #RamDiscardManager.
+ *
+ * If listeners are already registered, they will be notified about any
+ * regions that become populated due to removing this source. Specifically,
+ * for each region that the removed source reported as discarded, if all
+ * remaining sources report it as populated, listeners receive a populate
+ * notification.
+ *
+ * If any listener rejects the notification (returns an error), previously
+ * notified listeners are rolled back with discard notifications and the
+ * source is not removed.
+ *
+ * @rdm: the #RamDiscardManager
+ * @source: the #RamDiscardSource to remove
+ *
+ * Returns: 0 on success, -ENOENT if @source is not registered, or a
+ * negative error code if a listener rejected the state change.
+ */
+int ram_discard_manager_del_source(RamDiscardManager *rdm,
+ RamDiscardSource *source);
+
uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
const MemoryRegion *mr);
+/**
+ * ram_discard_manager_is_populated:
+ *
+ * Check if the given memory region section is populated.
+ * If the manager has no sources, it is considered populated.
+ *
+ * @rdm: the #RamDiscardManager
+ * @section: the #MemoryRegionSection to check
+ *
+ * Returns: true if the section is populated, false otherwise.
+ */
bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
const MemoryRegionSection *section);
/**
* ram_discard_manager_replay_populated:
*
- * Iterate the given #MemoryRegionSection at minimum granularity, calling
- * #RamDiscardSourceClass.is_populated for each chunk, and invoke @replay_fn
- * for each contiguous populated range. In case any call fails, no further
- * calls are made.
+ * Call @replay_fn on regions that are populated in all sources.
*
* @rdm: the #RamDiscardManager
* @section: the #MemoryRegionSection
@@ -210,10 +276,7 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
/**
* ram_discard_manager_replay_discarded:
*
- * Iterate the given #MemoryRegionSection at minimum granularity, calling
- * #RamDiscardSourceClass.is_populated for each chunk, and invoke @replay_fn
- * for each contiguous discarded range. In case any call fails, no further
- * calls are made.
+ * Call @replay_fn on regions that are discarded in any sources.
*
* @rdm: the #RamDiscardManager
* @section: the #MemoryRegionSection
@@ -234,31 +297,61 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
RamDiscardListener *rdl);
-/*
- * Note: later refactoring should take the source into account and the manager
- * should be able to aggregate multiple sources.
+/**
+ * ram_discard_manager_notify_populate:
+ *
+ * Notify listeners that a region is about to be populated by a source.
+ * For multi-source aggregation, only notifies when all sources agree
+ * the region is populated (intersection).
+ *
+ * @rdm: the #RamDiscardManager
+ * @source: the #RamDiscardSource that is populating
+ * @offset: offset within the memory region
+ * @size: size of the region being populated
+ *
+ * Returns 0 on success, or a negative error if any listener rejects.
*/
int ram_discard_manager_notify_populate(RamDiscardManager *rdm,
+ RamDiscardSource *source,
uint64_t offset, uint64_t size);
-/*
- * Note: later refactoring should take the source into account and the manager
- * should be able to aggregate multiple sources.
+/**
+ * ram_discard_manager_notify_discard:
+ *
+ * Notify listeners that a region has been discarded by a source.
+ * For multi-source aggregation, always notifies immediately
+ * (union semantics - any source discarding makes region discarded).
+ *
+ * @rdm: the #RamDiscardManager
+ * @source: the #RamDiscardSource that is discarding
+ * @offset: offset within the memory region
+ * @size: size of the region being discarded
*/
void ram_discard_manager_notify_discard(RamDiscardManager *rdm,
+ RamDiscardSource *source,
uint64_t offset, uint64_t size);
-/*
- * Note: later refactoring should take the source into account and the manager
- * should be able to aggregate multiple sources.
+/**
+ * ram_discard_manager_notify_discard_all:
+ *
+ * Notify listeners that all regions have been discarded by a source.
+ *
+ * @rdm: the #RamDiscardManager
+ * @source: the #RamDiscardSource that is discarding
*/
-void ram_discard_manager_notify_discard_all(RamDiscardManager *rdm);
+void ram_discard_manager_notify_discard_all(RamDiscardManager *rdm,
+ RamDiscardSource *source);
-/*
+/**
+ * ram_discard_manager_replay_populated_to_listeners:
+ *
* Replay populated sections to all registered listeners.
+ * For multi-source aggregation, only replays regions where all sources
+ * are populated (intersection).
*
- * Note: later refactoring should take the source into account and the manager
- * should be able to aggregate multiple sources.
+ * @rdm: the #RamDiscardManager
+ *
+ * Returns 0 on success, or a negative error if any notification failed.
*/
int ram_discard_manager_replay_populated_to_listeners(RamDiscardManager *rdm);
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 2b67b2882d2..35e03ed7599 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -264,7 +264,8 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
{
RamDiscardManager *rdm = memory_region_get_ram_discard_manager(&vmem->memdev->mr);
- ram_discard_manager_notify_discard(rdm, offset, size);
+ ram_discard_manager_notify_discard(rdm, RAM_DISCARD_SOURCE(vmem),
+ offset, size);
}
static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
@@ -272,7 +273,8 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
{
RamDiscardManager *rdm = memory_region_get_ram_discard_manager(&vmem->memdev->mr);
- return ram_discard_manager_notify_populate(rdm, offset, size);
+ return ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(vmem),
+ offset, size);
}
static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem)
@@ -283,7 +285,7 @@ static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem)
return;
}
- ram_discard_manager_notify_discard_all(rdm);
+ ram_discard_manager_notify_discard_all(rdm, RAM_DISCARD_SOURCE(vmem));
}
static bool virtio_mem_is_range_plugged(const VirtIOMEM *vmem,
diff --git a/system/memory.c b/system/memory.c
index 97695a253e6..4768375053a 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2046,21 +2046,22 @@ int memory_region_add_ram_discard_source(MemoryRegion *mr,
RamDiscardSource *source)
{
g_assert(memory_region_is_ram(mr));
- if (mr->rdm) {
- return -EBUSY;
+
+ if (!mr->rdm) {
+ mr->rdm = ram_discard_manager_new(mr);
}
- mr->rdm = ram_discard_manager_new(mr, RAM_DISCARD_SOURCE(source));
- return 0;
+ return ram_discard_manager_add_source(mr->rdm, source);
}
void memory_region_del_ram_discard_source(MemoryRegion *mr,
RamDiscardSource *source)
{
- g_assert(mr->rdm->rds == source);
+ g_assert(mr->rdm);
+
+ ram_discard_manager_del_source(mr->rdm, source);
- object_unref(mr->rdm);
- mr->rdm = NULL;
+ /* if there is no source and no listener left, we could free rdm */
}
/* Called with rcu_read_lock held. */
diff --git a/system/ram-block-attributes.c b/system/ram-block-attributes.c
index 718c7075cec..59ec7a28eb0 100644
--- a/system/ram-block-attributes.c
+++ b/system/ram-block-attributes.c
@@ -90,7 +90,8 @@ ram_block_attributes_notify_discard(RamBlockAttributes *attr,
{
RamDiscardManager *rdm = memory_region_get_ram_discard_manager(attr->ram_block->mr);
- ram_discard_manager_notify_discard(rdm, offset, size);
+ ram_discard_manager_notify_discard(rdm, RAM_DISCARD_SOURCE(attr),
+ offset, size);
}
static int
@@ -99,7 +100,8 @@ ram_block_attributes_notify_populate(RamBlockAttributes *attr,
{
RamDiscardManager *rdm = memory_region_get_ram_discard_manager(attr->ram_block->mr);
- return ram_discard_manager_notify_populate(rdm, offset, size);
+ return ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(attr),
+ offset, size);
}
int ram_block_attributes_state_change(RamBlockAttributes *attr,
diff --git a/system/ram-discard-manager.c b/system/ram-discard-manager.c
index a907ddf3708..7da91bf648a 100644
--- a/system/ram-discard-manager.c
+++ b/system/ram-discard-manager.c
@@ -7,6 +7,7 @@
#include "qemu/osdep.h"
#include "qemu/error-report.h"
+#include "qemu/queue.h"
#include "system/memory.h"
static uint64_t ram_discard_source_get_min_granularity(const RamDiscardSource *rds,
@@ -28,20 +29,21 @@ static bool ram_discard_source_is_populated(const RamDiscardSource *rds,
}
/*
- * Iterate the section at source granularity, aggregating consecutive chunks
- * with matching populated state, and call replay_fn for each run.
+ * Iterate a single source's populated or discarded regions and call
+ * replay_fn for each contiguous run.
*/
-static int replay_by_populated_state(const RamDiscardManager *rdm,
- const MemoryRegionSection *section,
- bool replay_populated,
- ReplayRamDiscardState replay_fn,
- void *opaque)
+static int replay_source_by_state(const RamDiscardSource *source,
+ const MemoryRegion *mr,
+ const MemoryRegionSection *section,
+ bool replay_populated,
+ ReplayRamDiscardState replay_fn,
+ void *opaque)
{
uint64_t granularity, offset, size, end, pos, run_start = 0;
bool in_run = false;
int ret = 0;
- granularity = ram_discard_source_get_min_granularity(rdm->rds, rdm->mr);
+ granularity = ram_discard_source_get_min_granularity(source, mr);
offset = section->offset_within_region;
size = int128_get64(section->size);
end = offset + size;
@@ -55,7 +57,7 @@ static int replay_by_populated_state(const RamDiscardManager *rdm,
.offset_within_region = pos,
.size = int128_make64(granularity),
};
- bool populated = ram_discard_source_is_populated(rdm->rds, &chunk);
+ bool populated = ram_discard_source_is_populated(source, &chunk);
if (populated == replay_populated) {
if (!in_run) {
@@ -88,28 +90,338 @@ static int replay_by_populated_state(const RamDiscardManager *rdm,
return ret;
}
-RamDiscardManager *ram_discard_manager_new(MemoryRegion *mr,
- RamDiscardSource *rds)
+RamDiscardManager *ram_discard_manager_new(MemoryRegion *mr)
{
RamDiscardManager *rdm;
rdm = RAM_DISCARD_MANAGER(object_new(TYPE_RAM_DISCARD_MANAGER));
- rdm->rds = rds;
rdm->mr = mr;
- QLIST_INIT(&rdm->rdl_list);
return rdm;
}
+static void ram_discard_manager_update_granularity(RamDiscardManager *rdm)
+{
+ RamDiscardSourceEntry *entry;
+ uint64_t granularity = 0;
+
+ QLIST_FOREACH(entry, &rdm->source_list, next) {
+ uint64_t src_granularity;
+
+ src_granularity =
+ ram_discard_source_get_min_granularity(entry->rds, rdm->mr);
+ g_assert(src_granularity != 0);
+ if (granularity == 0) {
+ granularity = src_granularity;
+ } else {
+ granularity = MIN(granularity, src_granularity);
+ }
+ }
+ rdm->min_granularity = granularity;
+}
+
+static RamDiscardSourceEntry *
+ram_discard_manager_find_source(RamDiscardManager *rdm, RamDiscardSource *rds)
+{
+ RamDiscardSourceEntry *entry;
+
+ QLIST_FOREACH(entry, &rdm->source_list, next) {
+ if (entry->rds == rds) {
+ return entry;
+ }
+ }
+ return NULL;
+}
+
+static int rdl_populate_cb(const MemoryRegionSection *section, void *opaque)
+{
+ RamDiscardListener *rdl = opaque;
+ MemoryRegionSection tmp = *rdl->section;
+
+ g_assert(section->mr == rdl->section->mr);
+
+ if (!memory_region_section_intersect_range(&tmp,
+ section->offset_within_region,
+ int128_get64(section->size))) {
+ return 0;
+ }
+
+ return rdl->notify_populate(rdl, &tmp);
+}
+
+static int rdl_discard_cb(const MemoryRegionSection *section, void *opaque)
+{
+ RamDiscardListener *rdl = opaque;
+ MemoryRegionSection tmp = *rdl->section;
+
+ g_assert(section->mr == rdl->section->mr);
+
+ if (!memory_region_section_intersect_range(&tmp,
+ section->offset_within_region,
+ int128_get64(section->size))) {
+ return 0;
+ }
+
+ rdl->notify_discard(rdl, &tmp);
+ return 0;
+}
+
+static bool rdm_is_all_populated_skip(const RamDiscardManager *rdm,
+ const MemoryRegionSection *section,
+ const RamDiscardSource *skip_source)
+{
+ RamDiscardSourceEntry *entry;
+
+ QLIST_FOREACH(entry, &rdm->source_list, next) {
+ if (skip_source && entry->rds == skip_source) {
+ continue;
+ }
+ if (!ram_discard_source_is_populated(entry->rds, section)) {
+ return false;
+ }
+ }
+ return true;
+}
+
+typedef struct SourceNotifyCtx {
+ RamDiscardManager *rdm;
+ RamDiscardListener *rdl;
+ RamDiscardSource *source; /* added or removed */
+} SourceNotifyCtx;
+
+/*
+ * Unified helper to replay regions based on populated state.
+ * If replay_populated is true: replay regions where ALL sources are populated.
+ * If replay_populated is false: replay regions where ANY source is discarded.
+ */
+static int replay_by_populated_state(const RamDiscardManager *rdm,
+ const MemoryRegionSection *section,
+ const RamDiscardSource *skip_source,
+ bool replay_populated,
+ ReplayRamDiscardState replay_fn,
+ void *user_opaque)
+{
+ uint64_t granularity = rdm->min_granularity;
+ uint64_t offset, end_offset;
+ uint64_t run_start = 0;
+ bool in_run = false;
+ int ret = 0;
+
+ if (QLIST_EMPTY(&rdm->source_list)) {
+ if (replay_populated) {
+ return replay_fn(section, user_opaque);
+ }
+ return 0;
+ }
+
+ g_assert(granularity != 0);
+
+ offset = section->offset_within_region;
+ end_offset = offset + int128_get64(section->size);
+
+ while (offset < end_offset) {
+ MemoryRegionSection subsection = {
+ .mr = section->mr,
+ .offset_within_region = offset,
+ .size = int128_make64(MIN(granularity, end_offset - offset)),
+ };
+ bool all_populated;
+ bool included;
+
+ all_populated = rdm_is_all_populated_skip(rdm, &subsection,
+ skip_source);
+ included = replay_populated ? all_populated : !all_populated;
+
+ if (included) {
+ if (!in_run) {
+ run_start = offset;
+ in_run = true;
+ }
+ } else {
+ if (in_run) {
+ MemoryRegionSection run_section = {
+ .mr = section->mr,
+ .offset_within_region = run_start,
+ .size = int128_make64(offset - run_start),
+ };
+ ret = replay_fn(&run_section, user_opaque);
+ if (ret) {
+ return ret;
+ }
+ in_run = false;
+ }
+ }
+ if (granularity > end_offset - offset) {
+ break;
+ }
+ offset += granularity;
+ }
+
+ if (in_run) {
+ MemoryRegionSection run_section = {
+ .mr = section->mr,
+ .offset_within_region = run_start,
+ .size = int128_make64(end_offset - run_start),
+ };
+ ret = replay_fn(&run_section, user_opaque);
+ }
+
+ return ret;
+}
+
+static int add_source_check_discard_cb(const MemoryRegionSection *section,
+ void *opaque)
+{
+ SourceNotifyCtx *ctx = opaque;
+
+ return replay_by_populated_state(ctx->rdm, section, ctx->source, true,
+ rdl_discard_cb, ctx->rdl);
+}
+
+static int del_source_check_populate_cb(const MemoryRegionSection *section,
+ void *opaque)
+{
+ SourceNotifyCtx *ctx = opaque;
+
+ return replay_by_populated_state(ctx->rdm, section, ctx->source, true,
+ rdl_populate_cb, ctx->rdl);
+}
+
+int ram_discard_manager_add_source(RamDiscardManager *rdm,
+ RamDiscardSource *source)
+{
+ RamDiscardSourceEntry *entry;
+ RamDiscardListener *rdl, *rdl2;
+ int ret = 0;
+
+ if (ram_discard_manager_find_source(rdm, source)) {
+ return -EBUSY;
+ }
+
+ /*
+ * If there are existing listeners, notify them about regions that
+ * become discarded due to adding this source. Only notify for regions
+ * that were previously populated (all other sources agreed).
+ */
+ QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
+ SourceNotifyCtx ctx = {
+ .rdm = rdm,
+ .rdl = rdl,
+ /* no need to set source */
+ };
+ ret = replay_source_by_state(source, rdm->mr, rdl->section,
+ false,
+ add_source_check_discard_cb, &ctx);
+ if (ret) {
+ break;
+ }
+ }
+ if (ret) {
+ QLIST_FOREACH(rdl2, &rdm->rdl_list, next) {
+ SourceNotifyCtx ctx = {
+ .rdm = rdm,
+ .rdl = rdl2,
+ };
+ replay_source_by_state(source, rdm->mr, rdl2->section,
+ false,
+ del_source_check_populate_cb,
+ &ctx);
+ if (rdl == rdl2) {
+ break;
+ }
+ }
+
+ return ret;
+ }
+
+ entry = g_new0(RamDiscardSourceEntry, 1);
+ entry->rds = source;
+ QLIST_INSERT_HEAD(&rdm->source_list, entry, next);
+
+ ram_discard_manager_update_granularity(rdm);
+
+ return ret;
+}
+
+int ram_discard_manager_del_source(RamDiscardManager *rdm,
+ RamDiscardSource *source)
+{
+ RamDiscardSourceEntry *entry;
+ RamDiscardListener *rdl, *rdl2;
+ int ret = 0;
+
+ entry = ram_discard_manager_find_source(rdm, source);
+ if (!entry) {
+ return -ENOENT;
+ }
+
+ /*
+ * If there are existing listeners, check if any regions become
+ * populated due to removing this source.
+ */
+ QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
+ SourceNotifyCtx ctx = {
+ .rdm = rdm,
+ .rdl = rdl,
+ .source = source,
+ };
+ /*
+ * From the previously discarded regions, check if any
+ * regions become populated.
+ */
+ ret = replay_source_by_state(source, rdm->mr, rdl->section,
+ false,
+ del_source_check_populate_cb,
+ &ctx);
+ if (ret) {
+ break;
+ }
+ }
+ if (ret) {
+ QLIST_FOREACH(rdl2, &rdm->rdl_list, next) {
+ SourceNotifyCtx ctx = {
+ .rdm = rdm,
+ .rdl = rdl2,
+ .source = source,
+ };
+ replay_source_by_state(source, rdm->mr, rdl2->section,
+ false,
+ add_source_check_discard_cb,
+ &ctx);
+ if (rdl == rdl2) {
+ break;
+ }
+ }
+
+ return ret;
+ }
+
+ QLIST_REMOVE(entry, next);
+ g_free(entry);
+ ram_discard_manager_update_granularity(rdm);
+ return ret;
+}
+
uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
const MemoryRegion *mr)
{
- return ram_discard_source_get_min_granularity(rdm->rds, mr);
+ g_assert(mr == rdm->mr);
+ return rdm->min_granularity;
}
+/*
+ * Aggregated query: returns true only if ALL sources report populated (AND).
+ */
bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
const MemoryRegionSection *section)
{
- return ram_discard_source_is_populated(rdm->rds, section);
+ RamDiscardSourceEntry *entry;
+
+ QLIST_FOREACH(entry, &rdm->source_list, next) {
+ if (!ram_discard_source_is_populated(entry->rds, section)) {
+ return false;
+ }
+ }
+ return true;
}
int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
@@ -117,7 +429,8 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
ReplayRamDiscardState replay_fn,
void *opaque)
{
- return replay_by_populated_state(rdm, section, true, replay_fn, opaque);
+ return replay_by_populated_state(rdm, section, NULL, true,
+ replay_fn, opaque);
}
int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
@@ -125,14 +438,17 @@ int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
ReplayRamDiscardState replay_fn,
void *opaque)
{
- return replay_by_populated_state(rdm, section, false, replay_fn, opaque);
+ return replay_by_populated_state(rdm, section, NULL, false,
+ replay_fn, opaque);
}
static void ram_discard_manager_initfn(Object *obj)
{
RamDiscardManager *rdm = RAM_DISCARD_MANAGER(obj);
+ QLIST_INIT(&rdm->source_list);
QLIST_INIT(&rdm->rdl_list);
+ rdm->min_granularity = 0;
}
static void ram_discard_manager_finalize(Object *obj)
@@ -140,74 +456,91 @@ static void ram_discard_manager_finalize(Object *obj)
RamDiscardManager *rdm = RAM_DISCARD_MANAGER(obj);
g_assert(QLIST_EMPTY(&rdm->rdl_list));
+ g_assert(QLIST_EMPTY(&rdm->source_list));
}
int ram_discard_manager_notify_populate(RamDiscardManager *rdm,
+ RamDiscardSource *source,
uint64_t offset, uint64_t size)
{
RamDiscardListener *rdl, *rdl2;
+ MemoryRegionSection section = {
+ .mr = rdm->mr,
+ .offset_within_region = offset,
+ .size = int128_make64(size),
+ };
int ret = 0;
- QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
- MemoryRegionSection tmp = *rdl->section;
+ g_assert(ram_discard_manager_find_source(rdm, source));
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- continue;
- }
- ret = rdl->notify_populate(rdl, &tmp);
+ /*
+ * Only notify about regions that are populated in ALL sources.
+ * Skip the calling source: it has implicitly declared itself populated
+ * for this range but may not have updated its bitmap yet.
+ */
+ QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
+ ret = replay_by_populated_state(rdm, &section, source, true,
+ rdl_populate_cb, rdl);
if (ret) {
break;
}
}
if (ret) {
- /* Notify all already-notified listeners about discard. */
+ /*
+ * Rollback: notify discard for listeners we already notified,
+ * including the failing listener which may have been partially
+ * notified. Listeners must handle discard notifications for
+ * regions they didn't receive populate notifications for.
+ */
QLIST_FOREACH(rdl2, &rdm->rdl_list, next) {
- MemoryRegionSection tmp = *rdl2->section;
-
+ replay_by_populated_state(rdm, &section, source, true,
+ rdl_discard_cb, rdl2);
if (rdl2 == rdl) {
break;
}
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- continue;
- }
- rdl2->notify_discard(rdl2, &tmp);
}
}
return ret;
}
void ram_discard_manager_notify_discard(RamDiscardManager *rdm,
+ RamDiscardSource *source,
uint64_t offset, uint64_t size)
{
RamDiscardListener *rdl;
-
+ MemoryRegionSection section = {
+ .mr = rdm->mr,
+ .offset_within_region = offset,
+ .size = int128_make64(size),
+ };
+
+ g_assert(ram_discard_manager_find_source(rdm, source));
+
+ /*
+ * Only notify about ranges that were aggregately populated before this
+ * source's discard. Since the source has already updated its state,
+ * we use replay_by_populated_state with this source skipped - it will
+ * replay only the ranges where all OTHER sources are populated.
+ */
QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
- MemoryRegionSection tmp = *rdl->section;
-
- if (!memory_region_section_intersect_range(&tmp, offset, size)) {
- continue;
- }
- rdl->notify_discard(rdl, &tmp);
+ replay_by_populated_state(rdm, &section, source, true,
+ rdl_discard_cb, rdl);
}
}
-void ram_discard_manager_notify_discard_all(RamDiscardManager *rdm)
+void ram_discard_manager_notify_discard_all(RamDiscardManager *rdm,
+ RamDiscardSource *source)
{
RamDiscardListener *rdl;
+ g_assert(ram_discard_manager_find_source(rdm, source));
+
QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
rdl->notify_discard(rdl, rdl->section);
}
}
-static int rdm_populate_cb(const MemoryRegionSection *section, void *opaque)
-{
- RamDiscardListener *rdl = opaque;
-
- return rdl->notify_populate(rdl, section);
-}
-
void ram_discard_manager_register_listener(RamDiscardManager *rdm,
RamDiscardListener *rdl,
MemoryRegionSection *section)
@@ -220,7 +553,7 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
QLIST_INSERT_HEAD(&rdm->rdl_list, rdl, next);
ret = ram_discard_manager_replay_populated(rdm, rdl->section,
- rdm_populate_cb, rdl);
+ rdl_populate_cb, rdl);
if (ret) {
error_report("%s: Replaying populated ranges failed: %s", __func__,
strerror(-ret));
@@ -246,7 +579,7 @@ int ram_discard_manager_replay_populated_to_listeners(RamDiscardManager *rdm)
QLIST_FOREACH(rdl, &rdm->rdl_list, next) {
ret = ram_discard_manager_replay_populated(rdm, rdl->section,
- rdm_populate_cb, rdl);
+ rdl_populate_cb, rdl);
if (ret) {
break;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 08/13] system/physmem: destroy ram block attributes before RCU-deferred reclaim
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
` (6 preceding siblings ...)
2026-05-04 12:30 ` [PATCH v4 07/13] system/memory: implement RamDiscardManager multi-source aggregation Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 09/13] system/memory: add RamDiscardManager reference counting and cleanup Marc-André Lureau
` (5 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Marc-André Lureau
ram_block_attributes_destroy() was called from reclaim_ramblock(), which
runs as an RCU callback deferred by call_rcu().
However, when the RamDiscardManager is finalized, it will assert (as of
the next commit) that its source_list is empty. Since the RCU callback hasn't
run yet, the source added by ram_block_attributes_create() is still
attached.
Move ram_block_attributes_destroy() into qemu_ram_free() so the source
is removed synchronously. This is safe because qemu_ram_free() during
shutdown runs after pause_all_vcpus(), so no vCPU thread can
concurrently access the attributes via kvm_convert_memory().
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
system/physmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/system/physmem.c b/system/physmem.c
index c58d940e807..a8472c91dff 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2583,7 +2583,6 @@ static void reclaim_ramblock(RAMBlock *block)
}
if (block->guest_memfd >= 0) {
- ram_block_attributes_destroy(block->attributes);
close(block->guest_memfd);
ram_block_coordinated_discard_require(false);
}
@@ -2612,6 +2611,7 @@ void qemu_ram_free(RAMBlock *block)
/* Write list before version */
smp_wmb();
ram_list.version++;
+ g_clear_pointer(&block->attributes, ram_block_attributes_destroy);
call_rcu(block, reclaim_ramblock, rcu);
qemu_mutex_unlock_ramlist();
}
--
2.54.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 09/13] system/memory: add RamDiscardManager reference counting and cleanup
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
` (7 preceding siblings ...)
2026-05-04 12:30 ` [PATCH v4 08/13] system/physmem: destroy ram block attributes before RCU-deferred reclaim Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 10/13] tests: add unit tests for RamDiscardManager multi-source aggregation Marc-André Lureau
` (4 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Marc-André Lureau
Listeners now hold a reference to the RamDiscardManager, ensuring it
stays alive while listeners are registered. The RDM is eagerly freed
when the last source and listener are removed, and also unreffed during
MemoryRegion finalization as a safety net.
This completes the TODO left in the previous commit and prevents both
use-after-free and memory leaks of the RamDiscardManager.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
system/memory.c | 7 +++++--
system/ram-discard-manager.c | 2 ++
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/system/memory.c b/system/memory.c
index 4768375053a..456c53a1af4 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1754,6 +1754,7 @@ static void memory_region_finalize(Object *obj)
memory_region_clear_coalescing(mr);
g_free((char *)mr->name);
g_free(mr->ioeventfds);
+ object_unref(mr->rdm);
}
Object *memory_region_owner(const MemoryRegion *mr)
@@ -2060,8 +2061,10 @@ void memory_region_del_ram_discard_source(MemoryRegion *mr,
g_assert(mr->rdm);
ram_discard_manager_del_source(mr->rdm, source);
-
- /* if there is no source and no listener left, we could free rdm */
+ if (QLIST_EMPTY(&mr->rdm->source_list) && QLIST_EMPTY(&mr->rdm->rdl_list)) {
+ object_unref(mr->rdm);
+ mr->rdm = NULL;
+ }
}
/* Called with rcu_read_lock held. */
diff --git a/system/ram-discard-manager.c b/system/ram-discard-manager.c
index 7da91bf648a..4e8816e5a2f 100644
--- a/system/ram-discard-manager.c
+++ b/system/ram-discard-manager.c
@@ -549,6 +549,7 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
g_assert(section->mr == rdm->mr);
+ object_ref(rdm);
rdl->section = memory_region_section_new_copy(section);
QLIST_INSERT_HEAD(&rdm->rdl_list, rdl, next);
@@ -570,6 +571,7 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
memory_region_section_free_copy(rdl->section);
rdl->section = NULL;
QLIST_REMOVE(rdl, next);
+ object_unref(rdm);
}
int ram_discard_manager_replay_populated_to_listeners(RamDiscardManager *rdm)
--
2.54.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 10/13] tests: add unit tests for RamDiscardManager multi-source aggregation
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
` (8 preceding siblings ...)
2026-05-04 12:30 ` [PATCH v4 09/13] system/memory: add RamDiscardManager reference counting and cleanup Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 11/13] system/physmem: make ram_block_discard_range() handle guest_memfd Marc-André Lureau
` (3 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Marc-André Lureau
Add various unit tests for the RamDiscardManager multi-source
aggregation functionality.
The test uses a TestRamDiscardSource QOM object that tracks populated
state via a bitmap, similar to the RamBlockAttributes implementation.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
MAINTAINERS | 2 +
tests/unit/test-ram-discard-manager-stubs.c | 48 ++
tests/unit/test-ram-discard-manager.c | 1235 +++++++++++++++++++++++++++
tests/unit/meson.build | 8 +-
4 files changed, 1292 insertions(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 239218bc1f1..3601da89080 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3328,6 +3328,8 @@ F: system/memory-internal.h
F: system/ram-block-attributes.c
F: system/ram-discard-manager.c
F: scripts/coccinelle/memory-region-housekeeping.cocci
+F: tests/unit/test-ram-discard-manager.c
+F: tests/unit/test-ram-discard-manager-stubs.c
Memory devices
M: David Hildenbrand <david@kernel.org>
diff --git a/tests/unit/test-ram-discard-manager-stubs.c b/tests/unit/test-ram-discard-manager-stubs.c
new file mode 100644
index 00000000000..c4fb3ff473b
--- /dev/null
+++ b/tests/unit/test-ram-discard-manager-stubs.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#include "qemu/osdep.h"
+#include "qom/object.h"
+#include "glib.h"
+#include "system/memory.h"
+
+RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
+{
+ return mr->rdm;
+}
+
+int memory_region_add_ram_discard_source(MemoryRegion *mr,
+ RamDiscardSource *source)
+{
+ if (!mr->rdm) {
+ mr->rdm = ram_discard_manager_new(mr);
+ }
+ return ram_discard_manager_add_source(mr->rdm, source);
+}
+
+void memory_region_del_ram_discard_source(MemoryRegion *mr,
+ RamDiscardSource *source)
+{
+ RamDiscardManager *rdm = mr->rdm;
+
+ if (!rdm) {
+ return;
+ }
+
+ ram_discard_manager_del_source(rdm, source);
+}
+
+uint64_t memory_region_size(const MemoryRegion *mr)
+{
+ return int128_get64(mr->size);
+}
+
+MemoryRegionSection *memory_region_section_new_copy(MemoryRegionSection *s)
+{
+ MemoryRegionSection *copy = g_new(MemoryRegionSection, 1);
+ *copy = *s;
+ return copy;
+}
+
+void memory_region_section_free_copy(MemoryRegionSection *s)
+{
+ g_free(s);
+}
diff --git a/tests/unit/test-ram-discard-manager.c b/tests/unit/test-ram-discard-manager.c
new file mode 100644
index 00000000000..3d39a1e94ba
--- /dev/null
+++ b/tests/unit/test-ram-discard-manager.c
@@ -0,0 +1,1235 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#include "qemu/osdep.h"
+#include "qemu/bitmap.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "qom/qom-qobject.h"
+#include "glib.h"
+#include "system/memory.h"
+
+#define TYPE_TEST_RAM_DISCARD_SOURCE "test-ram-discard-source"
+
+OBJECT_DECLARE_SIMPLE_TYPE(TestRamDiscardSource, TEST_RAM_DISCARD_SOURCE)
+
+struct TestRamDiscardSource {
+ Object parent;
+
+ MemoryRegion *mr;
+ uint64_t granularity;
+ unsigned long *bitmap;
+ uint64_t bitmap_size;
+};
+
+static uint64_t test_rds_get_min_granularity(const RamDiscardSource *rds,
+ const MemoryRegion *mr)
+{
+ TestRamDiscardSource *src = TEST_RAM_DISCARD_SOURCE(rds);
+
+ g_assert(mr == src->mr);
+ return src->granularity;
+}
+
+static bool test_rds_is_populated(const RamDiscardSource *rds,
+ const MemoryRegionSection *section)
+{
+ TestRamDiscardSource *src = TEST_RAM_DISCARD_SOURCE(rds);
+ uint64_t offset = section->offset_within_region;
+ uint64_t size = int128_get64(section->size);
+ uint64_t first_bit = offset / src->granularity;
+ uint64_t last_bit = (offset + size - 1) / src->granularity;
+ unsigned long found;
+
+ g_assert(section->mr == src->mr);
+
+ /* Check if any bit in range is zero (discarded) */
+ found = find_next_zero_bit(src->bitmap, last_bit + 1, first_bit);
+ return found > last_bit;
+}
+
+static void test_rds_class_init(ObjectClass *klass, const void *data)
+{
+ RamDiscardSourceClass *rdsc = RAM_DISCARD_SOURCE_CLASS(klass);
+
+ rdsc->get_min_granularity = test_rds_get_min_granularity;
+ rdsc->is_populated = test_rds_is_populated;
+}
+
+static const TypeInfo test_rds_info = {
+ .name = TYPE_TEST_RAM_DISCARD_SOURCE,
+ .parent = TYPE_OBJECT,
+ .instance_size = sizeof(TestRamDiscardSource),
+ .class_init = test_rds_class_init,
+ .interfaces = (const InterfaceInfo[]) {
+ { TYPE_RAM_DISCARD_SOURCE },
+ { }
+ },
+};
+
+static TestRamDiscardSource *test_source_new(MemoryRegion *mr,
+ uint64_t granularity)
+{
+ TestRamDiscardSource *src;
+ uint64_t region_size = memory_region_size(mr);
+
+ src = TEST_RAM_DISCARD_SOURCE(object_new(TYPE_TEST_RAM_DISCARD_SOURCE));
+ src->mr = mr;
+ src->granularity = granularity;
+ src->bitmap_size = DIV_ROUND_UP(region_size, granularity);
+ src->bitmap = bitmap_new(src->bitmap_size);
+
+ return src;
+}
+
+static void test_source_free(TestRamDiscardSource *src)
+{
+ g_free(src->bitmap);
+ object_unref(OBJECT(src));
+}
+
+static void test_source_populate(TestRamDiscardSource *src,
+ uint64_t offset, uint64_t size)
+{
+ uint64_t first_bit = offset / src->granularity;
+ uint64_t nbits = size / src->granularity;
+
+ bitmap_set(src->bitmap, first_bit, nbits);
+}
+
+static void test_source_discard(TestRamDiscardSource *src,
+ uint64_t offset, uint64_t size)
+{
+ uint64_t first_bit = offset / src->granularity;
+ uint64_t nbits = size / src->granularity;
+
+ bitmap_clear(src->bitmap, first_bit, nbits);
+}
+
+typedef struct TestListener {
+ RamDiscardListener rdl;
+ int populate_count;
+ int discard_count;
+ uint64_t last_populate_offset;
+ uint64_t last_populate_size;
+ uint64_t last_discard_offset;
+ uint64_t last_discard_size;
+ int fail_on_populate; /* Return error on Nth populate */
+ int populate_call_num;
+} TestListener;
+
+static int test_listener_populate(RamDiscardListener *rdl,
+ const MemoryRegionSection *section)
+{
+ TestListener *tl = container_of(rdl, TestListener, rdl);
+
+ tl->populate_call_num++;
+ if (tl->fail_on_populate > 0 &&
+ tl->populate_call_num >= tl->fail_on_populate) {
+ return -ENOMEM;
+ }
+
+ tl->populate_count++;
+ tl->last_populate_offset = section->offset_within_region;
+ tl->last_populate_size = int128_get64(section->size);
+ return 0;
+}
+
+static void test_listener_discard(RamDiscardListener *rdl,
+ const MemoryRegionSection *section)
+{
+ TestListener *tl = container_of(rdl, TestListener, rdl);
+
+ tl->discard_count++;
+ tl->last_discard_offset = section->offset_within_region;
+ tl->last_discard_size = int128_get64(section->size);
+}
+
+static void test_listener_init(TestListener *tl)
+{
+ ram_discard_listener_init(&tl->rdl,
+ test_listener_populate,
+ test_listener_discard);
+}
+
+#define TEST_REGION_SIZE (16 * 1024 * 1024) /* 16 MB */
+#define GRANULARITY_4K (4 * 1024)
+#define GRANULARITY_2M (2 * 1024 * 1024)
+
+static MemoryRegion *test_mr;
+
+static void test_setup(void)
+{
+ test_mr = g_new0(MemoryRegion, 1);
+ test_mr->size = int128_make64(TEST_REGION_SIZE);
+ test_mr->ram = true;
+}
+
+static void test_teardown(void)
+{
+ g_clear_pointer(&test_mr->rdm, object_unref);
+ object_unparent(OBJECT(test_mr));
+ g_free(test_mr);
+ test_mr = NULL;
+}
+
+static void test_single_source_basic(void)
+{
+ TestRamDiscardSource *src;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ int ret;
+
+ test_setup();
+
+ src = test_source_new(test_mr, GRANULARITY_4K);
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+ g_assert_null(rdm);
+
+ /* Add source */
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src));
+ g_assert_cmpint(ret, ==, 0);
+
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+ g_assert_nonnull(rdm);
+
+ g_assert_cmpuint(ram_discard_manager_get_min_granularity(rdm, test_mr),
+ ==, GRANULARITY_4K);
+
+ /* Initially all discarded */
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(GRANULARITY_4K);
+ g_assert_false(ram_discard_manager_is_populated(rdm, &section));
+
+ /* Populate a range in source */
+ test_source_populate(src, 0, GRANULARITY_4K * 4);
+
+ /* Now should be populated */
+ g_assert_true(ram_discard_manager_is_populated(rdm, &section));
+
+ /* Check larger section */
+ section.size = int128_make64(GRANULARITY_4K * 4);
+ g_assert_true(ram_discard_manager_is_populated(rdm, &section));
+
+ /* Check section that spans populated and discarded */
+ section.size = int128_make64(GRANULARITY_4K * 8);
+ g_assert_false(ram_discard_manager_is_populated(rdm, &section));
+
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src));
+
+ g_assert_true(ram_discard_manager_is_populated(rdm, &section));
+
+ test_source_free(src);
+ test_teardown();
+}
+
+static void test_single_source_listener(void)
+{
+ TestRamDiscardSource *src;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ TestListener tl = { 0, };
+ int ret;
+
+ test_setup();
+
+ src = test_source_new(test_mr, GRANULARITY_4K);
+
+ /* Populate some ranges before adding listener */
+ test_source_populate(src, 0, GRANULARITY_4K * 4);
+ test_source_populate(src, GRANULARITY_4K * 8, GRANULARITY_4K * 4);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src));
+ g_assert_cmpint(ret, ==, 0);
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+ g_assert_nonnull(rdm);
+
+ /* Register listener */
+ test_listener_init(&tl);
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(TEST_REGION_SIZE);
+
+ ram_discard_manager_register_listener(rdm, &tl.rdl, &section);
+
+ /* Should have been notified about populated regions */
+ g_assert_cmpint(tl.populate_count, ==, 2);
+
+ /* Notify populate for new range */
+ tl.populate_count = 0;
+ test_source_populate(src, GRANULARITY_4K * 16, GRANULARITY_4K * 2);
+ ret = ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(src),
+ GRANULARITY_4K * 16,
+ GRANULARITY_4K * 2);
+ g_assert_cmpint(ret, ==, 0);
+ g_assert_cmpint(tl.populate_count, ==, 1);
+ g_assert_cmpuint(tl.last_populate_offset, ==, GRANULARITY_4K * 16);
+ g_assert_cmpuint(tl.last_populate_size, ==, GRANULARITY_4K * 2);
+
+ /* Notify discard */
+ tl.discard_count = 0;
+ test_source_discard(src, 0, GRANULARITY_4K * 4);
+ ram_discard_manager_notify_discard(rdm, RAM_DISCARD_SOURCE(src),
+ 0, GRANULARITY_4K * 4);
+ g_assert_cmpint(tl.discard_count, ==, 1);
+ g_assert_cmpuint(tl.last_discard_offset, ==, 0);
+ g_assert_cmpuint(tl.last_discard_size, ==, GRANULARITY_4K * 4);
+
+ /* Unregister listener */
+ ram_discard_manager_unregister_listener(rdm, &tl.rdl);
+
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src));
+ test_source_free(src);
+ test_teardown();
+}
+
+static void test_two_sources_same_granularity(void)
+{
+ TestRamDiscardSource *src1, *src2;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ int ret;
+
+ test_setup();
+
+ src1 = test_source_new(test_mr, GRANULARITY_4K);
+ src2 = test_source_new(test_mr, GRANULARITY_4K);
+
+ /* Add first source */
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src1));
+ g_assert_cmpint(ret, ==, 0);
+
+ /* Add second source */
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src2));
+ g_assert_cmpint(ret, ==, 0);
+
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+ g_assert_nonnull(rdm);
+
+ /* Check granularity */
+ g_assert_cmpuint(ram_discard_manager_get_min_granularity(rdm, test_mr),
+ ==, GRANULARITY_4K);
+
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(GRANULARITY_4K);
+
+ /* Both discarded -> aggregated discarded */
+ g_assert_false(ram_discard_manager_is_populated(rdm, &section));
+
+ /* Populate in src1 only */
+ test_source_populate(src1, 0, GRANULARITY_4K);
+ g_assert_false(ram_discard_manager_is_populated(rdm, &section));
+
+ /* Populate in src2 only */
+ test_source_discard(src1, 0, GRANULARITY_4K);
+ test_source_populate(src2, 0, GRANULARITY_4K);
+ g_assert_false(ram_discard_manager_is_populated(rdm, &section));
+
+ /* Populate in both -> aggregated populated */
+ test_source_populate(src1, 0, GRANULARITY_4K);
+ g_assert_true(ram_discard_manager_is_populated(rdm, &section));
+
+ /* Remove sources */
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src2));
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src1));
+
+ test_source_free(src2);
+ test_source_free(src1);
+ test_teardown();
+}
+
+/*
+ * Test: Two sources with different granularities (4K and 2M).
+ * The aggregated granularity should be GCD(4K, 2M) = 4K.
+ */
+static void test_two_sources_different_granularity(void)
+{
+ TestRamDiscardSource *src_4k, *src_2m;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ int ret;
+
+ test_setup();
+
+ src_4k = test_source_new(test_mr, GRANULARITY_4K);
+ src_2m = test_source_new(test_mr, GRANULARITY_2M);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src_4k));
+ g_assert_cmpint(ret, ==, 0);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src_2m));
+ g_assert_cmpint(ret, ==, 0);
+
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ g_assert_cmpuint(ram_discard_manager_get_min_granularity(rdm, test_mr),
+ ==, GRANULARITY_4K);
+
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(GRANULARITY_4K);
+
+ /* Both discarded */
+ g_assert_false(ram_discard_manager_is_populated(rdm, &section));
+
+ /* Populate 4K in src_4k, but src_2m still discarded the whole 2M block */
+ test_source_populate(src_4k, 0, GRANULARITY_4K);
+ g_assert_false(ram_discard_manager_is_populated(rdm, &section));
+
+ /* Populate 2M in src_2m (which includes the 4K block) */
+ test_source_populate(src_2m, 0, GRANULARITY_2M);
+ g_assert_true(ram_discard_manager_is_populated(rdm, &section));
+
+ /* Check a 4K block at offset 4K (populated in src_2m but not in src_4k) */
+ section.offset_within_region = GRANULARITY_4K;
+ g_assert_false(ram_discard_manager_is_populated(rdm, &section));
+
+ /* Populate it in src_4k */
+ test_source_populate(src_4k, GRANULARITY_4K, GRANULARITY_4K);
+ g_assert_true(ram_discard_manager_is_populated(rdm, &section));
+
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src_2m));
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src_4k));
+
+ test_source_free(src_2m);
+ test_source_free(src_4k);
+ test_teardown();
+}
+
+/*
+ * Test: Notification with two sources.
+ * Populate notification should only fire when all sources are populated.
+ */
+static void test_two_sources_notification(void)
+{
+ TestRamDiscardSource *src1, *src2;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ TestListener tl = { 0, };
+ int ret;
+
+ test_setup();
+
+ src1 = test_source_new(test_mr, GRANULARITY_4K);
+ src2 = test_source_new(test_mr, GRANULARITY_4K);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src1));
+ g_assert_cmpint(ret, ==, 0);
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src2));
+ g_assert_cmpint(ret, ==, 0);
+
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ /* Register listener */
+ test_listener_init(&tl);
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(TEST_REGION_SIZE);
+ ram_discard_manager_register_listener(rdm, &tl.rdl, &section);
+
+ /* No populate notifications yet (all discarded) */
+ g_assert_cmpint(tl.populate_count, ==, 0);
+
+ /* Populate in src1 only - no notification (src2 still discarded) */
+ test_source_populate(src1, 0, GRANULARITY_4K * 4);
+ ret = ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(src1),
+ 0, GRANULARITY_4K * 4);
+ g_assert_cmpint(ret, ==, 0);
+ g_assert_cmpint(tl.populate_count, ==, 0);
+
+ /* Populate same range in src2 - now should notify */
+ test_source_populate(src2, 0, GRANULARITY_4K * 4);
+ ret = ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(src2),
+ 0, GRANULARITY_4K * 4);
+ g_assert_cmpint(ret, ==, 0);
+ g_assert_cmpint(tl.populate_count, ==, 1);
+
+ /* Discard from src1 - should notify discard immediately */
+ tl.discard_count = 0;
+ test_source_discard(src1, 0, GRANULARITY_4K * 2);
+ ram_discard_manager_notify_discard(rdm, RAM_DISCARD_SOURCE(src1),
+ 0, GRANULARITY_4K * 2);
+ g_assert_cmpint(tl.discard_count, ==, 1);
+
+ ram_discard_manager_unregister_listener(rdm, &tl.rdl);
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src2));
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src1));
+
+ test_source_free(src2);
+ test_source_free(src1);
+ test_teardown();
+}
+
+/*
+ * Test: Adding source with existing listener.
+ * When a new source is added, listeners should be notified about
+ * regions that become discarded.
+ */
+static void test_add_source_with_listener(void)
+{
+ TestRamDiscardSource *src1, *src2;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ TestListener tl = { 0, };
+ int ret;
+
+ test_setup();
+
+ src1 = test_source_new(test_mr, GRANULARITY_4K);
+ src2 = test_source_new(test_mr, GRANULARITY_4K);
+
+ /* Populate some range in src1 */
+ test_source_populate(src1, 0, GRANULARITY_4K * 8);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src1));
+ g_assert_cmpint(ret, ==, 0);
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ /* Register listener */
+ test_listener_init(&tl);
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(TEST_REGION_SIZE);
+ ram_discard_manager_register_listener(rdm, &tl.rdl, &section);
+
+ /* Should have been notified about populated region */
+ g_assert_cmpint(tl.populate_count, ==, 1);
+ g_assert_cmpuint(tl.last_populate_offset, ==, 0);
+ g_assert_cmpuint(tl.last_populate_size, ==, GRANULARITY_4K * 8);
+
+ /* src2: first 4 blocks populated, the rest discarded */
+ test_source_populate(src2, 0, GRANULARITY_4K * 4);
+
+ /* Add src2 - listener should be notified about newly discarded regions */
+ tl.discard_count = 0;
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src2));
+ g_assert_cmpint(ret, ==, 0);
+
+ /*
+ * The range 4K*4 to 4K*8 was populated in src1 but discarded in src2,
+ * so it becomes aggregated-discarded. Listener should be notified.
+ * Only this range should trigger a discard notification - regions beyond
+ * 4K*8 were already discarded in src1, so adding src2 doesn't change them.
+ */
+ g_assert_cmpint(tl.discard_count, ==, 1);
+ g_assert_cmpuint(tl.last_discard_offset, ==, GRANULARITY_4K * 4);
+ g_assert_cmpuint(tl.last_discard_size, ==, GRANULARITY_4K * 4);
+
+ ram_discard_manager_unregister_listener(rdm, &tl.rdl);
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src2));
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src1));
+
+ test_source_free(src2);
+ test_source_free(src1);
+ test_teardown();
+}
+
+/*
+ * Test: Removing source with existing listener.
+ * When a source is removed, listeners should be notified about
+ * regions that become populated.
+ */
+static void test_remove_source_with_listener(void)
+{
+ TestRamDiscardSource *src1, *src2;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ TestListener tl = { 0, };
+ int ret;
+
+ test_setup();
+
+ src1 = test_source_new(test_mr, GRANULARITY_4K);
+ src2 = test_source_new(test_mr, GRANULARITY_4K);
+
+ /* src1: all of first 8 blocks populated */
+ test_source_populate(src1, 0, GRANULARITY_4K * 8);
+ /* src2: only first 4 blocks populated */
+ test_source_populate(src2, 0, GRANULARITY_4K * 4);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src1));
+ g_assert_cmpint(ret, ==, 0);
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src2));
+ g_assert_cmpint(ret, ==, 0);
+
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ /* Register listener */
+ test_listener_init(&tl);
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(TEST_REGION_SIZE);
+ ram_discard_manager_register_listener(rdm, &tl.rdl, &section);
+
+ /* Only first 4 blocks are aggregated-populated */
+ g_assert_cmpint(tl.populate_count, ==, 1);
+ g_assert_cmpuint(tl.last_populate_size, ==, GRANULARITY_4K * 4);
+
+ /* Remove src2 - blocks 4-8 should become populated */
+ tl.populate_count = 0;
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src2));
+
+ /* Listener should be notified about newly populated region (4K*4 to 4K*8) */
+ g_assert_cmpint(tl.populate_count, >=, 1);
+
+ ram_discard_manager_unregister_listener(rdm, &tl.rdl);
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src1));
+
+ test_source_free(src2);
+ test_source_free(src1);
+ test_teardown();
+}
+
+/*
+ * Test: Add a source, register a listener, remove the source, then add it back.
+ * This checks the transition from 0 sources (all populated) to 1 source
+ * (partially discarded) with an active listener.
+ */
+static void test_readd_source_with_listener(void)
+{
+ TestRamDiscardSource *src;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ TestListener tl = { 0, };
+ int ret;
+
+ test_setup();
+
+ src = test_source_new(test_mr, GRANULARITY_4K);
+
+ /* Populate some range in src */
+ test_source_populate(src, 0, GRANULARITY_4K * 8);
+
+ /* 1. Add source */
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src));
+ g_assert_cmpint(ret, ==, 0);
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ /* 2. Register listener */
+ test_listener_init(&tl);
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(TEST_REGION_SIZE);
+ ram_discard_manager_register_listener(rdm, &tl.rdl, &section);
+
+ /* Listener notified about populated region (0 - 32K) */
+ g_assert_cmpint(tl.populate_count, ==, 1);
+ g_assert_cmpuint(tl.last_populate_size, ==, GRANULARITY_4K * 8);
+
+ /* 3. Remove source */
+ tl.populate_count = 0;
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src));
+
+ /*
+ * With 0 sources, everything is populated.
+ * The range that was discarded in src (from 32K to end) becomes populated.
+ */
+ g_assert_cmpint(tl.populate_count, ==, 1);
+ g_assert_cmpuint(tl.last_populate_offset, ==, GRANULARITY_4K * 8);
+ g_assert_cmpuint(tl.last_populate_size, ==, TEST_REGION_SIZE - GRANULARITY_4K * 8);
+
+ /* 4. Add source back */
+ tl.discard_count = 0;
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src));
+ g_assert_cmpint(ret, ==, 0);
+
+ /*
+ * Now we have 1 source again. The range (32K to end) is discarded again.
+ * Listener should be notified about this discard.
+ */
+ g_assert_cmpint(tl.discard_count, ==, 1);
+ g_assert_cmpuint(tl.last_discard_offset, ==, GRANULARITY_4K * 8);
+ g_assert_cmpuint(tl.last_discard_size, ==, TEST_REGION_SIZE - GRANULARITY_4K * 8);
+
+ ram_discard_manager_unregister_listener(rdm, &tl.rdl);
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src));
+ test_source_free(src);
+ test_teardown();
+}
+
+/*
+ * Test: Duplicate source registration should fail.
+ */
+static void test_duplicate_source(void)
+{
+ TestRamDiscardSource *src;
+ int ret;
+
+ test_setup();
+
+ src = test_source_new(test_mr, GRANULARITY_4K);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src));
+ g_assert_cmpint(ret, ==, 0);
+
+ /* Adding same source again should fail */
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src));
+ g_assert_cmpint(ret, ==, -EBUSY);
+
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src));
+ test_source_free(src);
+ test_teardown();
+}
+
+/*
+ * Test: Populate notification rollback on listener error.
+ */
+static void test_populate_rollback(void)
+{
+ TestRamDiscardSource *src;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ TestListener tl1 = { 0, }, tl2 = { 0, };
+ int ret;
+
+ test_setup();
+
+ src = test_source_new(test_mr, GRANULARITY_4K);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src));
+ g_assert_cmpint(ret, ==, 0);
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ /* Register two listeners */
+ test_listener_init(&tl1);
+ test_listener_init(&tl2);
+ tl2.fail_on_populate = 1; /* Second listener fails on first populate */
+
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(TEST_REGION_SIZE);
+
+ /*
+ * Register tl2 first so it's visited second (QLIST_INSERT_HEAD reverses
+ * registration order). This ensures tl1 receives populate before tl2
+ * fails.
+ */
+ ram_discard_manager_register_listener(rdm, &tl2.rdl, &section);
+ ram_discard_manager_register_listener(rdm, &tl1.rdl, &section);
+
+ /* Try to populate - should fail and roll back */
+ test_source_populate(src, 0, GRANULARITY_4K);
+ ret = ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(src),
+ 0, GRANULARITY_4K);
+ g_assert_cmpint(ret, ==, -ENOMEM);
+
+ /* First listener should have received populate then discard (rollback) */
+ g_assert_cmpint(tl1.populate_count, ==, 1);
+ g_assert_cmpint(tl1.discard_count, ==, 1);
+
+ ram_discard_manager_unregister_listener(rdm, &tl1.rdl);
+ ram_discard_manager_unregister_listener(rdm, &tl2.rdl);
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src));
+ test_source_free(src);
+ test_teardown();
+}
+
+/*
+ * Test: Replay populated with two sources (intersection).
+ */
+static void test_replay_populated_intersection(void)
+{
+ TestRamDiscardSource *src1, *src2;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ TestListener tl = { 0, };
+ int ret;
+
+ test_setup();
+
+ src1 = test_source_new(test_mr, GRANULARITY_4K);
+ src2 = test_source_new(test_mr, GRANULARITY_4K);
+
+ /*
+ * src1: blocks 0-7 populated
+ * src2: blocks 4-11 populated
+ * Intersection: blocks 4-7
+ */
+ test_source_populate(src1, 0, GRANULARITY_4K * 8);
+ test_source_populate(src2, GRANULARITY_4K * 4, GRANULARITY_4K * 8);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src1));
+ g_assert_cmpint(ret, ==, 0);
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src2));
+ g_assert_cmpint(ret, ==, 0);
+
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ /* Register listener - should only get notified about intersection */
+ test_listener_init(&tl);
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(TEST_REGION_SIZE);
+ ram_discard_manager_register_listener(rdm, &tl.rdl, &section);
+
+ /* Should have been notified about blocks 4-7 (intersection) */
+ g_assert_cmpint(tl.populate_count, ==, 1);
+ g_assert_cmpuint(tl.last_populate_offset, ==, GRANULARITY_4K * 4);
+ g_assert_cmpuint(tl.last_populate_size, ==, GRANULARITY_4K * 4);
+
+ ram_discard_manager_unregister_listener(rdm, &tl.rdl);
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src2));
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src1));
+
+ test_source_free(src2);
+ test_source_free(src1);
+ test_teardown();
+}
+
+/*
+ * Test: Empty region (no sources).
+ */
+static void test_no_sources(void)
+{
+ test_setup();
+
+ /* No sources - should have no manager */
+ g_assert_null(memory_region_get_ram_discard_manager(test_mr));
+ g_assert_false(memory_region_has_ram_discard_manager(test_mr));
+
+ test_teardown();
+}
+
+static void test_redundant_discard(void)
+{
+ TestRamDiscardSource *src1, *src2;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ TestListener tl = { 0, };
+ int ret;
+
+ test_setup();
+
+ src1 = test_source_new(test_mr, GRANULARITY_4K);
+ src2 = test_source_new(test_mr, GRANULARITY_4K);
+
+ /* Add sources */
+ ret = memory_region_add_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src1));
+ g_assert_cmpint(ret, ==, 0);
+ ret = memory_region_add_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src2));
+ g_assert_cmpint(ret, ==, 0);
+
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ /* Register listener */
+ test_listener_init(&tl);
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(TEST_REGION_SIZE);
+ ram_discard_manager_register_listener(rdm, &tl.rdl, &section);
+
+ /* Populate intersection (0-4K) in both sources */
+ test_source_populate(src1, 0, GRANULARITY_4K);
+ test_source_populate(src2, 0, GRANULARITY_4K);
+
+ /* Notify populate src1 - should trigger listener populate */
+ ret = ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(src1),
+ 0, GRANULARITY_4K);
+ g_assert_cmpint(ret, ==, 0);
+ g_assert_cmpint(tl.populate_count, ==, 1);
+
+ /* Now discard src1 -> aggregated discarded */
+ tl.discard_count = 0;
+ test_source_discard(src1, 0, GRANULARITY_4K);
+ ram_discard_manager_notify_discard(rdm, RAM_DISCARD_SOURCE(src1), 0, GRANULARITY_4K);
+ g_assert_cmpint(tl.discard_count, ==, 1);
+
+ /* Now discard src2 -> aggregated discarded (already discarded!) */
+ /* The listener must NOT receive another discard notification for the same range. */
+ test_source_discard(src2, 0, GRANULARITY_4K);
+ ram_discard_manager_notify_discard(rdm, RAM_DISCARD_SOURCE(src2), 0, GRANULARITY_4K);
+
+ g_assert_cmpint(tl.discard_count, ==, 1);
+
+ ram_discard_manager_unregister_listener(rdm, &tl.rdl);
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src2));
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src1));
+
+ test_source_free(src2);
+ test_source_free(src1);
+ test_teardown();
+}
+
+/*
+ * Test: Listener with partial section coverage.
+ * Listener should only receive notifications for its registered range.
+ */
+static void test_partial_listener_section(void)
+{
+ TestRamDiscardSource *src;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ TestListener tl = { 0, };
+ int ret;
+
+ test_setup();
+
+ src = test_source_new(test_mr, GRANULARITY_4K);
+
+ /* Populate blocks 0-7 */
+ test_source_populate(src, 0, GRANULARITY_4K * 8);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src));
+ g_assert_cmpint(ret, ==, 0);
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ /* Register listener for only blocks 2-5 (not the full region) */
+ test_listener_init(&tl);
+ section.mr = test_mr;
+ section.offset_within_region = GRANULARITY_4K * 2;
+ section.size = int128_make64(GRANULARITY_4K * 4);
+ ram_discard_manager_register_listener(rdm, &tl.rdl, &section);
+
+ /* Should be notified only about blocks 2-5 (intersection) */
+ g_assert_cmpint(tl.populate_count, ==, 1);
+ g_assert_cmpuint(tl.last_populate_offset, ==, GRANULARITY_4K * 2);
+ g_assert_cmpuint(tl.last_populate_size, ==, GRANULARITY_4K * 4);
+
+ /* Discard block 0 - outside listener's section, no notification */
+ tl.discard_count = 0;
+ test_source_discard(src, 0, GRANULARITY_4K);
+ ram_discard_manager_notify_discard(rdm, RAM_DISCARD_SOURCE(src),
+ 0, GRANULARITY_4K);
+ g_assert_cmpint(tl.discard_count, ==, 0);
+
+ /* Discard block 3 - inside listener's section */
+ test_source_discard(src, GRANULARITY_4K * 3, GRANULARITY_4K);
+ ram_discard_manager_notify_discard(rdm, RAM_DISCARD_SOURCE(src),
+ GRANULARITY_4K * 3, GRANULARITY_4K);
+ g_assert_cmpint(tl.discard_count, ==, 1);
+ g_assert_cmpuint(tl.last_discard_offset, ==, GRANULARITY_4K * 3);
+
+ /* Discard spanning boundary (blocks 5-6) - only block 5 in section */
+ tl.discard_count = 0;
+ test_source_discard(src, GRANULARITY_4K * 5, GRANULARITY_4K * 2);
+ ram_discard_manager_notify_discard(rdm, RAM_DISCARD_SOURCE(src),
+ GRANULARITY_4K * 5, GRANULARITY_4K * 2);
+ g_assert_cmpint(tl.discard_count, ==, 1);
+ g_assert_cmpuint(tl.last_discard_offset, ==, GRANULARITY_4K * 5);
+ g_assert_cmpuint(tl.last_discard_size, ==, GRANULARITY_4K);
+
+ ram_discard_manager_unregister_listener(rdm, &tl.rdl);
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src));
+ test_source_free(src);
+ test_teardown();
+}
+
+/*
+ * Test: Multiple listeners with different (non-overlapping) sections.
+ */
+static void test_multiple_listeners_different_sections(void)
+{
+ TestRamDiscardSource *src;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section1, section2;
+ TestListener tl1 = { 0, }, tl2 = { 0, };
+ int ret;
+
+ test_setup();
+
+ src = test_source_new(test_mr, GRANULARITY_4K);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src));
+ g_assert_cmpint(ret, ==, 0);
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ /* Listener 1: blocks 0-3 */
+ test_listener_init(&tl1);
+ section1.mr = test_mr;
+ section1.offset_within_region = 0;
+ section1.size = int128_make64(GRANULARITY_4K * 4);
+ ram_discard_manager_register_listener(rdm, &tl1.rdl, &section1);
+
+ /* Listener 2: blocks 8-11 */
+ test_listener_init(&tl2);
+ section2.mr = test_mr;
+ section2.offset_within_region = GRANULARITY_4K * 8;
+ section2.size = int128_make64(GRANULARITY_4K * 4);
+ ram_discard_manager_register_listener(rdm, &tl2.rdl, &section2);
+
+ /* Initially all discarded - no populate notifications */
+ g_assert_cmpint(tl1.populate_count, ==, 0);
+ g_assert_cmpint(tl2.populate_count, ==, 0);
+
+ /* Populate blocks 0-3 - only tl1 should be notified */
+ test_source_populate(src, 0, GRANULARITY_4K * 4);
+ ret = ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(src),
+ 0, GRANULARITY_4K * 4);
+ g_assert_cmpint(ret, ==, 0);
+ g_assert_cmpint(tl1.populate_count, ==, 1);
+ g_assert_cmpint(tl2.populate_count, ==, 0);
+
+ /* Populate blocks 8-11 - only tl2 should be notified */
+ test_source_populate(src, GRANULARITY_4K * 8, GRANULARITY_4K * 4);
+ ret = ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(src),
+ GRANULARITY_4K * 8,
+ GRANULARITY_4K * 4);
+ g_assert_cmpint(ret, ==, 0);
+ g_assert_cmpint(tl1.populate_count, ==, 1);
+ g_assert_cmpint(tl2.populate_count, ==, 1);
+
+ /* Populate blocks 4-7 (gap) - neither listener should be notified */
+ test_source_populate(src, GRANULARITY_4K * 4, GRANULARITY_4K * 4);
+ ret = ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(src),
+ GRANULARITY_4K * 4,
+ GRANULARITY_4K * 4);
+ g_assert_cmpint(ret, ==, 0);
+ g_assert_cmpint(tl1.populate_count, ==, 1);
+ g_assert_cmpint(tl2.populate_count, ==, 1);
+
+ ram_discard_manager_unregister_listener(rdm, &tl2.rdl);
+ ram_discard_manager_unregister_listener(rdm, &tl1.rdl);
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src));
+ test_source_free(src);
+ test_teardown();
+}
+
+/*
+ * Test: Multiple listeners with overlapping sections.
+ */
+static void test_overlapping_listener_sections(void)
+{
+ TestRamDiscardSource *src;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section1, section2;
+ TestListener tl1 = { 0, }, tl2 = { 0, };
+ int ret;
+
+ test_setup();
+
+ src = test_source_new(test_mr, GRANULARITY_4K);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src));
+ g_assert_cmpint(ret, ==, 0);
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ /* Listener 1: blocks 0-7 */
+ test_listener_init(&tl1);
+ section1.mr = test_mr;
+ section1.offset_within_region = 0;
+ section1.size = int128_make64(GRANULARITY_4K * 8);
+ ram_discard_manager_register_listener(rdm, &tl1.rdl, &section1);
+
+ /* Listener 2: blocks 4-11 (overlaps with tl1 on blocks 4-7) */
+ test_listener_init(&tl2);
+ section2.mr = test_mr;
+ section2.offset_within_region = GRANULARITY_4K * 4;
+ section2.size = int128_make64(GRANULARITY_4K * 8);
+ ram_discard_manager_register_listener(rdm, &tl2.rdl, &section2);
+
+ /* Populate blocks 4-7 (overlap region) - both should be notified */
+ test_source_populate(src, GRANULARITY_4K * 4, GRANULARITY_4K * 4);
+ ret = ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(src),
+ GRANULARITY_4K * 4,
+ GRANULARITY_4K * 4);
+ g_assert_cmpint(ret, ==, 0);
+ g_assert_cmpint(tl1.populate_count, ==, 1);
+ g_assert_cmpint(tl2.populate_count, ==, 1);
+
+ /* Populate blocks 0-3 - only tl1 */
+ test_source_populate(src, 0, GRANULARITY_4K * 4);
+ ret = ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(src),
+ 0, GRANULARITY_4K * 4);
+ g_assert_cmpint(ret, ==, 0);
+ g_assert_cmpint(tl1.populate_count, ==, 2);
+ g_assert_cmpint(tl2.populate_count, ==, 1);
+
+ /* Populate blocks 8-11 - only tl2 */
+ test_source_populate(src, GRANULARITY_4K * 8, GRANULARITY_4K * 4);
+ ret = ram_discard_manager_notify_populate(rdm, RAM_DISCARD_SOURCE(src),
+ GRANULARITY_4K * 8,
+ GRANULARITY_4K * 4);
+ g_assert_cmpint(ret, ==, 0);
+ g_assert_cmpint(tl1.populate_count, ==, 2);
+ g_assert_cmpint(tl2.populate_count, ==, 2);
+
+ ram_discard_manager_unregister_listener(rdm, &tl2.rdl);
+ ram_discard_manager_unregister_listener(rdm, &tl1.rdl);
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src));
+ test_source_free(src);
+ test_teardown();
+}
+
+/*
+ * Test: Listener at exact memory region boundaries.
+ */
+static void test_boundary_section(void)
+{
+ TestRamDiscardSource *src;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ TestListener tl = { 0, };
+ uint64_t last_offset;
+ int ret;
+
+ test_setup();
+
+ src = test_source_new(test_mr, GRANULARITY_4K);
+
+ /* Populate last 4 blocks of the region */
+ last_offset = TEST_REGION_SIZE - GRANULARITY_4K * 4;
+ test_source_populate(src, last_offset, GRANULARITY_4K * 4);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src));
+ g_assert_cmpint(ret, ==, 0);
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ /* Register listener for exactly the last 4 blocks */
+ test_listener_init(&tl);
+ section.mr = test_mr;
+ section.offset_within_region = last_offset;
+ section.size = int128_make64(GRANULARITY_4K * 4);
+ ram_discard_manager_register_listener(rdm, &tl.rdl, &section);
+
+ /* Should receive notification for the populated range */
+ g_assert_cmpint(tl.populate_count, ==, 1);
+ g_assert_cmpuint(tl.last_populate_offset, ==, last_offset);
+ g_assert_cmpuint(tl.last_populate_size, ==, GRANULARITY_4K * 4);
+
+ /* Discard exactly at boundary */
+ tl.discard_count = 0;
+ test_source_discard(src, last_offset, GRANULARITY_4K * 4);
+ ram_discard_manager_notify_discard(rdm, RAM_DISCARD_SOURCE(src),
+ last_offset, GRANULARITY_4K * 4);
+ g_assert_cmpint(tl.discard_count, ==, 1);
+
+ ram_discard_manager_unregister_listener(rdm, &tl.rdl);
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src));
+ test_source_free(src);
+ test_teardown();
+}
+
+static int count_discarded_blocks(const MemoryRegionSection *section,
+ void *opaque)
+{
+ int *count = opaque;
+ *count += int128_get64(section->size) / GRANULARITY_4K;
+ return 0;
+}
+
+/*
+ * Test: replay_discarded with two sources (union semantics).
+ */
+static void test_replay_discarded(void)
+{
+ TestRamDiscardSource *src1, *src2;
+ RamDiscardManager *rdm;
+ MemoryRegionSection section;
+ int count = 0;
+ int ret;
+
+ test_setup();
+
+ src1 = test_source_new(test_mr, GRANULARITY_4K);
+ src2 = test_source_new(test_mr, GRANULARITY_4K);
+
+ /*
+ * src1: blocks 0-3 populated, rest discarded
+ * src2: blocks 2-5 populated, rest discarded
+ * Aggregated populated: blocks 2-3 (intersection)
+ * Aggregated discarded: blocks 0-1, 4-5, 6+ (union of discarded)
+ */
+ test_source_populate(src1, 0, GRANULARITY_4K * 4);
+ test_source_populate(src2, GRANULARITY_4K * 2, GRANULARITY_4K * 4);
+
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src1));
+ g_assert_cmpint(ret, ==, 0);
+ ret = memory_region_add_ram_discard_source(test_mr,
+ RAM_DISCARD_SOURCE(src2));
+ g_assert_cmpint(ret, ==, 0);
+
+ rdm = memory_region_get_ram_discard_manager(test_mr);
+
+ section.mr = test_mr;
+ section.offset_within_region = 0;
+ section.size = int128_make64(GRANULARITY_4K * 8);
+
+ /* Count discarded blocks */
+ ret = ram_discard_manager_replay_discarded(rdm, &section,
+ count_discarded_blocks, &count);
+
+ g_assert_cmpint(ret, ==, 0);
+ /* Discarded: blocks 0-1 (2), blocks 4-5 (2), blocks 6-7 (2) = 6 blocks */
+ g_assert_cmpint(count, ==, 6);
+
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src2));
+ memory_region_del_ram_discard_source(test_mr, RAM_DISCARD_SOURCE(src1));
+
+ test_source_free(src2);
+ test_source_free(src1);
+ test_teardown();
+}
+
+int main(int argc, char **argv)
+{
+ g_test_init(&argc, &argv, NULL);
+
+ module_call_init(MODULE_INIT_QOM);
+ type_register_static(&test_rds_info);
+
+ g_test_add_func("/ram-discard-manager/single-source/basic",
+ test_single_source_basic);
+ g_test_add_func("/ram-discard-manager/single-source/listener",
+ test_single_source_listener);
+ g_test_add_func("/ram-discard-manager/two-sources/same-granularity",
+ test_two_sources_same_granularity);
+ g_test_add_func("/ram-discard-manager/two-sources/different-granularity",
+ test_two_sources_different_granularity);
+ g_test_add_func("/ram-discard-manager/two-sources/notification",
+ test_two_sources_notification);
+ g_test_add_func("/ram-discard-manager/dynamic/add-source-with-listener",
+ test_add_source_with_listener);
+ g_test_add_func("/ram-discard-manager/dynamic/remove-source-with-listener",
+ test_remove_source_with_listener);
+ g_test_add_func("/ram-discard-manager/dynamic/readd-source-with-listener",
+ test_readd_source_with_listener);
+ g_test_add_func("/ram-discard-manager/edge/duplicate-source",
+ test_duplicate_source);
+ g_test_add_func("/ram-discard-manager/edge/populate-rollback",
+ test_populate_rollback);
+ g_test_add_func("/ram-discard-manager/edge/replay-intersection",
+ test_replay_populated_intersection);
+ g_test_add_func("/ram-discard-manager/edge/no-sources",
+ test_no_sources);
+ g_test_add_func("/ram-discard-manager/multi-source/redundant-discard",
+ test_redundant_discard);
+ g_test_add_func("/ram-discard-manager/listener/partial-section",
+ test_partial_listener_section);
+ g_test_add_func("/ram-discard-manager/listener/multiple-different",
+ test_multiple_listeners_different_sections);
+ g_test_add_func("/ram-discard-manager/listener/overlapping",
+ test_overlapping_listener_sections);
+ g_test_add_func("/ram-discard-manager/edge/boundary-section",
+ test_boundary_section);
+ g_test_add_func("/ram-discard-manager/multi-source/replay-discarded",
+ test_replay_discarded);
+
+ return g_test_run();
+}
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 03d36748c73..e303f80119f 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -137,7 +137,13 @@ if have_system
'test-bufferiszero': [],
'test-smp-parse': [qom, meson.project_source_root() / 'hw/core/machine-smp.c'],
'test-vmstate': [migration, io],
- 'test-yank': ['socket-helpers.c', qom, io, chardev]
+ 'test-yank': ['socket-helpers.c', qom, io, chardev],
+ 'test-ram-discard-manager': [
+ 'test-ram-discard-manager.c',
+ 'test-ram-discard-manager-stubs.c',
+ meson.project_source_root() / 'system/ram-discard-manager.c',
+ genh, qemuutil, qom
+ ],
}
if config_host_data.get('CONFIG_INOTIFY1')
tests += {'test-util-filemonitor': []}
--
2.54.0
* [PATCH v4 11/13] system/physmem: make ram_block_discard_range() handle guest_memfd
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
` (9 preceding siblings ...)
2026-05-04 12:30 ` [PATCH v4 10/13] tests: add unit tests for RamDiscardManager multi-source aggregation Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-13 20:37 ` Peter Xu
2026-05-04 12:30 ` [PATCH v4 12/13] monitor: add 'info ramblock-attributes' command Marc-André Lureau
` (2 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Marc-André Lureau
Most callers of ram_block_discard_range() want to discard both the
shared and guest_memfd backing. Only kvm_convert_memory() intentionally
discards a single plane during private/shared conversions.
Rename the current implementation to ram_block_discard_shared_range()
and make ram_block_discard_range() a composite that also discards
guest_memfd when present (rb->guest_memfd >= 0). This ensures callers
like virtio-mem, virtio-balloon, hv-balloon, and migration reclaim
private pages on discard.
Update kvm_convert_memory() to use the plane-specific
ram_block_discard_shared_range() since it only needs to discard
the shared backing when converting to private.
Likewise, after TDVF image copy, use ram_block_discard_shared_range().
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
include/system/ramblock.h | 3 ++-
accel/kvm/kvm-all.c | 2 +-
system/physmem.c | 25 +++++++++++++++++++++----
target/i386/kvm/tdx.c | 2 +-
system/trace-events | 2 +-
5 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/include/system/ramblock.h b/include/system/ramblock.h
index f0b557af416..76a84fd9c88 100644
--- a/include/system/ramblock.h
+++ b/include/system/ramblock.h
@@ -104,7 +104,8 @@ struct RamBlockAttributes {
/* @offset: the offset within the RAMBlock */
int ram_block_discard_range(RAMBlock *rb, uint64_t offset, size_t length);
-/* @offset: the offset within the RAMBlock */
+int ram_block_discard_shared_range(RAMBlock *rb, uint64_t offset,
+ size_t length);
int ram_block_discard_guest_memfd_range(RAMBlock *rb, uint64_t offset,
size_t length);
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 92af42503b1..97463a683f4 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3426,7 +3426,7 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
*/
goto out_unref;
}
- ret = ram_block_discard_range(rb, offset, size);
+ ret = ram_block_discard_shared_range(rb, offset, size);
} else {
ret = ram_block_discard_guest_memfd_range(rb, offset, size);
}
diff --git a/system/physmem.c b/system/physmem.c
index a8472c91dff..5af9d5ac1a8 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -4085,7 +4085,7 @@ int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
* Returns: 0 on success, none-0 on failure
*
*/
-int ram_block_discard_range(RAMBlock *rb, uint64_t offset, size_t length)
+int ram_block_discard_shared_range(RAMBlock *rb, uint64_t offset, size_t length)
{
int ret = -1;
@@ -4134,7 +4134,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t offset, size_t length)
* have a MAP_PRIVATE mapping, possibly messing with other
* MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
* change that behavior whithout violating the promised
- * semantics of ram_block_discard_range().
+ * semantics of ram_block_discard_shared_range().
*
* Only warn, because it works as long as nobody else uses that
* file.
@@ -4190,8 +4190,9 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t offset, size_t length)
goto err;
#endif
}
- trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
- need_madvise, need_fallocate, ret);
+ trace_ram_block_discard_shared_range(rb->idstr, host_startaddr, length,
+ need_madvise, need_fallocate,
+ ret);
} else {
error_report("%s: Overrun block '%s' (%" PRIu64 "/%zx/" RAM_ADDR_FMT")",
__func__, rb->idstr, offset, length, rb->max_length);
@@ -4201,6 +4202,22 @@ err:
return ret;
}
+int ram_block_discard_range(RAMBlock *rb, uint64_t offset, size_t length)
+{
+ int ret;
+
+ ret = ram_block_discard_shared_range(rb, offset, length);
+ if (ret) {
+ return ret;
+ }
+
+ if (rb->guest_memfd >= 0) {
+ ret = ram_block_discard_guest_memfd_range(rb, offset, length);
+ }
+
+ return ret;
+}
+
int ram_block_discard_guest_memfd_range(RAMBlock *rb, uint64_t offset,
size_t length)
{
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 4714c9d514e..fcb11aa67e4 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -385,7 +385,7 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
* KVM_MEMORY_MAPPING. It becomes useless.
*/
ram_block = tdx_guest->tdvf_mr->ram_block;
- ram_block_discard_range(ram_block, 0, ram_block->max_length);
+ ram_block_discard_shared_range(ram_block, 0, ram_block->max_length);
tdx_vm_ioctl(KVM_TDX_FINALIZE_VM, 0, NULL, &error_fatal);
CONFIDENTIAL_GUEST_SUPPORT(tdx_guest)->ready = true;
diff --git a/system/trace-events b/system/trace-events
index e6e1b612798..51b4a4679a2 100644
--- a/system/trace-events
+++ b/system/trace-events
@@ -32,7 +32,7 @@ global_dirty_changed(unsigned int bitmask) "bitmask 0x%"PRIx32
address_space_map(void *as, uint64_t addr, uint64_t len, bool is_write, uint32_t attrs) "as:%p addr 0x%"PRIx64":%"PRIx64" write:%d attrs:0x%x"
find_ram_offset(uint64_t size, uint64_t offset) "size: 0x%" PRIx64 " @ 0x%" PRIx64
find_ram_offset_loop(uint64_t size, uint64_t candidate, uint64_t offset, uint64_t next, uint64_t mingap) "trying size: 0x%" PRIx64 " @ 0x%" PRIx64 ", offset: 0x%" PRIx64" next: 0x%" PRIx64 " mingap: 0x%" PRIx64
-ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_madvise, bool need_fallocate, int ret) "%s@%p + 0x%zx: madvise: %d fallocate: %d ret: %d"
+ram_block_discard_shared_range(const char *rbname, void *hva, size_t length, bool need_madvise, bool need_fallocate, int ret) "%s@%p + 0x%zx: madvise: %d fallocate: %d ret: %d"
qemu_ram_alloc_shared(const char *name, size_t size, size_t max_size, int fd, void *host) "%s size %zu max_size %zu fd %d host %p"
subpage_register(void *subpage, uint32_t start, uint32_t end, int idx, int eidx, uint16_t section) "subpage %p start 0x%08x end 0x%08x idx 0x%08x eidx 0x%08x section %u"
--
2.54.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 12/13] monitor: add 'info ramblock-attributes' command
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
` (10 preceding siblings ...)
2026-05-04 12:30 ` [PATCH v4 11/13] system/physmem: make ram_block_discard_range() handle guest_memfd Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-13 20:39 ` Peter Xu
2026-05-04 12:30 ` [PATCH v4 13/13] RFC: hw/virtio: start virtio-mem guest_memfd regions as shared Marc-André Lureau
2026-05-13 20:53 ` [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Peter Xu
13 siblings, 1 reply; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Marc-André Lureau
Add a new 'info ramblock-attributes' HMP command and the corresponding
'x-query-ramblock-attributes' QMP command to display the shared/private
memory attributes for ram blocks.
The QMP command returns structured data (RamBlockAttributesInfo list
with per-range shared/populated attributes), while HMP formats it for
human consumption.
This is useful for debugging confidential guests (TDX, SNP) to inspect
which memory regions are shared vs private, and their population state
when a RamDiscardManager is present (e.g. virtio-mem).
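As an illustration of the aggregation the command performs, the bitmap walk in qmp_x_query_ramblock_attributes() amounts to a run-length pass over the shared/private bitmap. A minimal Python sketch (the 4 KiB block size, function name, and bool-list bitmap are illustrative, not the QEMU API):

```python
# Cut the shared/private bitmap into maximal runs of equal state,
# mirroring the find_next_bit()/find_next_zero_bit() loop in the patch.
BLOCK_SIZE = 4096  # illustrative; QEMU uses ram_block_attributes_get_block_size()

def bitmap_to_ranges(bitmap):
    ranges = []
    pos = 0
    while pos < len(bitmap):
        shared = bitmap[pos]
        nxt = pos
        # advance to the next bit of the opposite state (end of this run)
        while nxt < len(bitmap) and bitmap[nxt] == shared:
            nxt += 1
        ranges.append((pos * BLOCK_SIZE, (nxt - pos) * BLOCK_SIZE, shared))
        pos = nxt
    return ranges

# blocks 0-1 shared, 2-4 private, 5 shared
print(bitmap_to_ranges([True, True, False, False, False, True]))
# → [(0, 8192, True), (8192, 12288, False), (20480, 4096, True)]
```

Each tuple corresponds to one RamBlockAttributeRange (start, length, shared).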
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
qapi/machine.json | 55 +++++++++++++++++++++++++++++++++
include/monitor/hmp.h | 1 +
hw/core/machine-hmp-cmds.c | 32 +++++++++++++++++++
system/ram-block-attributes.c | 72 +++++++++++++++++++++++++++++++++++++++++++
hmp-commands-info.hx | 13 ++++++++
5 files changed, 173 insertions(+)
diff --git a/qapi/machine.json b/qapi/machine.json
index 685e4e29b87..aac8a235cf6 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1738,6 +1738,61 @@
'returns': 'HumanReadableText',
'features': [ 'unstable' ] }
+##
+# @RamBlockAttributeRange:
+#
+# A contiguous range within a ram block with uniform attributes.
+#
+# @start: start offset in bytes within the ram block
+#
+# @length: length in bytes of the range
+#
+# @shared: true if the range is shared, false if private
+#
+# @populated: true if the range is populated (only present when a
+# RamDiscardManager is managing the block)
+#
+# Since: 11.1
+##
+{ 'struct': 'RamBlockAttributeRange',
+ 'data': { 'start': 'uint64',
+ 'length': 'uint64',
+ 'shared': 'bool',
+ '*populated': 'bool' } }
+
+##
+# @RamBlockAttributesInfo:
+#
+# Shared/private memory attributes for a ram block.
+#
+# @name: the ram block identifier
+#
+# @ranges: list of attribute ranges
+#
+# Since: 11.1
+##
+{ 'struct': 'RamBlockAttributesInfo',
+ 'data': { 'name': 'str',
+ 'ranges': [ 'RamBlockAttributeRange' ] } }
+
+##
+# @x-query-ramblock-attributes:
+#
+# Query ram block shared/private attributes. This is useful
+# to debug confidential guests.
+#
+# Features:
+#
+# @unstable: This command is meant for debugging.
+#
+# Returns: list of ram block attributes
+#
+# Since: 11.1
+##
+{ 'command': 'x-query-ramblock-attributes',
+ 'returns': [ 'RamBlockAttributesInfo' ],
+ 'features': [ 'unstable' ] }
+
##
# @x-query-roms:
#
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index e222bea60cd..4493952b417 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -143,6 +143,7 @@ void hmp_info_dump(Monitor *mon, const QDict *qdict);
void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict);
+void hmp_info_ramblock_attributes(Monitor *mon, const QDict *qdict);
void hmp_info_replay(Monitor *mon, const QDict *qdict);
void hmp_replay_break(Monitor *mon, const QDict *qdict);
void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index 46846f741a2..122e1a0f735 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -24,6 +24,7 @@
#include "qapi/string-output-visitor.h"
#include "qemu/error-report.h"
#include "system/numa.h"
+#include "system/ramblock.h"
#include "hw/core/boards.h"
void hmp_info_cpus(Monitor *mon, const QDict *qdict)
@@ -388,3 +389,34 @@ void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict)
}
hmp_handle_error(mon, err);
}
+
+void hmp_info_ramblock_attributes(Monitor *mon, const QDict *qdict)
+{
+ Error *err = NULL;
+ g_autoptr(RamBlockAttributesInfoList) list = NULL;
+ RamBlockAttributesInfoList *it;
+
+ list = qmp_x_query_ramblock_attributes(&err);
+ if (hmp_handle_error(mon, err)) {
+ return;
+ }
+
+ for (it = list; it; it = it->next) {
+ RamBlockAttributesInfo *rba = it->value;
+ RamBlockAttributeRangeList *r;
+
+ monitor_printf(mon, "%s:\n", rba->name);
+ for (r = rba->ranges; r; r = r->next) {
+ RamBlockAttributeRange *range = r->value;
+ const char *shared = range->shared ? "shared" : "private";
+ const char *pop = range->has_populated ?
+ (range->populated ? "+populated" : "-populated") : "";
+
+ monitor_printf(mon,
+ " 0x%016" PRIx64 "-0x%016" PRIx64 " %s%s\n",
+ range->start,
+ range->start + range->length - 1,
+ shared, pop);
+ }
+ }
+}
diff --git a/system/ram-block-attributes.c b/system/ram-block-attributes.c
index 59ec7a28eb0..f9573801b60 100644
--- a/system/ram-block-attributes.c
+++ b/system/ram-block-attributes.c
@@ -11,6 +11,7 @@
#include "qemu/osdep.h"
#include "qemu/error-report.h"
+#include "qapi/qapi-commands-machine.h"
#include "system/ramblock.h"
#include "trace.h"
@@ -221,3 +222,74 @@ static void ram_block_attributes_class_init(ObjectClass *klass,
rdsc->get_min_granularity = ram_block_attributes_rds_get_min_granularity;
rdsc->is_populated = ram_block_attributes_rds_is_populated;
}
+
+RamBlockAttributesInfoList *qmp_x_query_ramblock_attributes(Error **errp)
+{
+ RamBlockAttributesInfoList *head = NULL, **tail = &head;
+ RAMBlock *block;
+ size_t rba_block_size = ram_block_attributes_get_block_size();
+
+ RCU_READ_LOCK_GUARD();
+
+ RAMBLOCK_FOREACH(block) {
+ RamBlockAttributesInfo *rba;
+ RamBlockAttributeRangeList **range_tail;
+ RamBlockAttributes *attr = block->attributes;
+ RamDiscardManager *rdm;
+ bool has_rdm;
+ unsigned long pos;
+
+ if (!attr) {
+ continue;
+ }
+
+ rdm = memory_region_get_ram_discard_manager(block->mr);
+ has_rdm = rdm != NULL;
+
+ rba = g_new0(RamBlockAttributesInfo, 1);
+ rba->name = g_strdup(block->idstr);
+ range_tail = &rba->ranges;
+
+ pos = 0;
+ while (pos < attr->bitmap_size) {
+ bool is_shared = test_bit(pos, attr->bitmap);
+ unsigned long next;
+ uint64_t start_offset, length;
+ RamBlockAttributeRange *range;
+
+ if (is_shared) {
+ next = find_next_zero_bit(attr->bitmap,
+ attr->bitmap_size, pos);
+ } else {
+ next = find_next_bit(attr->bitmap,
+ attr->bitmap_size, pos);
+ }
+
+ start_offset = (uint64_t)pos * rba_block_size;
+ length = (uint64_t)(next - pos) * rba_block_size;
+
+ range = g_new0(RamBlockAttributeRange, 1);
+ range->start = start_offset;
+ range->length = length;
+ range->shared = is_shared;
+
+ if (has_rdm) {
+ MemoryRegionSection section = {
+ .mr = block->mr,
+ .offset_within_region = start_offset,
+ .size = int128_make64(length),
+ };
+ range->has_populated = true;
+ range->populated =
+ ram_discard_manager_is_populated(rdm, &section);
+ }
+
+ QAPI_LIST_APPEND(range_tail, range);
+ pos = next;
+ }
+
+ QAPI_LIST_APPEND(tail, rba);
+ }
+
+ return head;
+}
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 74c741f80e2..1168b4c20ca 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -774,6 +774,19 @@ SRST
Dump all the ramblocks of the system.
ERST
+ {
+ .name = "ramblock-attributes",
+ .args_type = "",
+ .params = "",
+ .help = "Display ramblock shared/private attributes",
+ .cmd = hmp_info_ramblock_attributes,
+ },
+
+SRST
+ ``info ramblock-attributes``
+ Display the shared/private memory attributes for ram blocks.
+ERST
+
{
.name = "hotpluggable-cpus",
.args_type = "",
--
2.54.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 13/13] RFC: hw/virtio: start virtio-mem guest_memfd regions as shared
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
` (11 preceding siblings ...)
2026-05-04 12:30 ` [PATCH v4 12/13] monitor: add 'info ramblock-attributes' command Marc-André Lureau
@ 2026-05-04 12:30 ` Marc-André Lureau
2026-05-13 20:47 ` Peter Xu
2026-05-14 7:32 ` Chenyi Qiang
2026-05-13 20:53 ` [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Peter Xu
13 siblings, 2 replies; 23+ messages in thread
From: Marc-André Lureau @ 2026-05-04 12:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Marc-André Lureau
In TDX guests, virtio-mem plug/unplug/re-plug fails because
kvm_set_phys_mem() unconditionally sets KVM memory attributes to
PRIVATE for all guest_memfd regions. On re-plug, the PRIVATE->PRIVATE
transition is a no-op, so KVM doesn't re-AUG pages and the guest's
TDG.MEM.PAGE.ACCEPT fails.
Implement the "start-shared" approach: virtio-mem memory starts with
shared KVM attributes. The guest converts shared->private on plug (via
set_memory_encrypted -> MapGPA + ACCEPT), and back to shared on unplug
(via set_memory_decrypted). This ensures every plug triggers a real
SHARED->PRIVATE transition, causing KVM to AUG fresh pages.
Add RAM_GUEST_MEMFD_START_SHARED flag and set it during virtio-mem
realize for guest_memfd-backed regions. Use
ram_block_attributes_state_change() to properly update the attributes
bitmap through the API. Skip setting PRIVATE in kvm_set_phys_mem()
when the flag is set. On unplug, explicitly reset KVM attributes to
shared on the host side to handle the case where the guest skips
set_memory_decrypted().
See also virtio-comment "[PATCH RFC] virtio-mem: add shared/private memory property details".
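The plug/unplug attribute dance above can be sketched as a toy model (Python; states and names are purely illustrative, not the QEMU/KVM API): KVM only (re-)AUGs pages on a real SHARED->PRIVATE transition, so starting private makes the first plug a no-op, while starting shared makes every plug a real transition.

```python
SHARED, PRIVATE = "shared", "private"

def set_attr(state, new):
    """Return (new_state, augs): augs is True iff KVM would AUG fresh pages,
    i.e. only on a real SHARED->PRIVATE transition."""
    return new, (state == SHARED and new == PRIVATE)

# Old behaviour: guest_memfd regions start PRIVATE, so plug is a
# PRIVATE->PRIVATE no-op, no AUG happens, and the guest's ACCEPT fails.
state = PRIVATE
state, augs = set_attr(state, PRIVATE)
assert not augs

# Start-shared model: begin SHARED; unplug resets to SHARED, so every
# plug (including re-plug) is a real transition that AUGs fresh pages.
state = SHARED
state, augs = set_attr(state, PRIVATE)   # plug
assert augs
state, _ = set_attr(state, SHARED)       # unplug resets to shared
state, augs = set_attr(state, PRIVATE)   # re-plug
assert augs
print("ok")
```

This is why the patch both sets the flag at realize time and explicitly resets the attributes to shared on unplug.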
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
include/system/memory.h | 6 ++++++
accel/kvm/kvm-all.c | 3 ++-
hw/virtio/virtio-mem.c | 27 ++++++++++++++++++++++++++-
3 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/include/system/memory.h b/include/system/memory.h
index 28a75dac4ae..9dbf67efe50 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -277,6 +277,12 @@ typedef struct IOMMUTLBEvent {
*/
#define RAM_PRIVATE (1 << 13)
+/*
+ * RAM with guest_memfd that should start with shared KVM memory
+ * attributes. The guest converts to private on use.
+ */
+#define RAM_GUEST_MEMFD_START_SHARED (1 << 14)
+
static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
hwaddr start, hwaddr end,
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 97463a683f4..c034e74c8e5 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1737,7 +1737,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
abort();
}
- if (memory_region_has_guest_memfd(mr)) {
+ if (memory_region_has_guest_memfd(mr) &&
+ !(mr->ram_block->flags & RAM_GUEST_MEMFD_START_SHARED)) {
err = kvm_set_memory_attributes_private(start_addr, slot_size);
if (err) {
error_report("%s: failed to set memory attribute private: %s",
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 35e03ed7599..b46efe21126 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -19,6 +19,7 @@
#include "system/memory.h"
#include "system/numa.h"
#include "system/system.h"
+#include "system/kvm.h"
#include "system/ramblock.h"
#include "system/reset.h"
#include "system/runstate.h"
@@ -479,6 +480,11 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
if (vmem->dynamic_memslots) {
virtio_mem_deactivate_unplugged_memslots(vmem, offset, size);
}
+ if (rb->flags & RAM_GUEST_MEMFD_START_SHARED) {
+ kvm_set_memory_attributes_shared(start_gpa, size);
+ ram_block_attributes_state_change(rb->attributes,
+ offset, size, false);
+ }
return 0;
}
@@ -606,10 +612,12 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem)
RAMBlock *rb = vmem->memdev->mr.ram_block;
if (vmem->size) {
+ uint64_t used = qemu_ram_get_used_length(rb);
+
if (virtio_mem_is_busy()) {
return -EBUSY;
}
- if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
+ if (ram_block_discard_range(rb, 0, used)) {
return -EBUSY;
}
virtio_mem_notify_unplug_all(vmem);
@@ -622,6 +630,11 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem)
if (vmem->dynamic_memslots) {
virtio_mem_deactivate_unplugged_memslots(vmem, 0, region_size);
}
+ if (rb->flags & RAM_GUEST_MEMFD_START_SHARED) {
+ kvm_set_memory_attributes_shared(vmem->addr, used);
+ ram_block_attributes_state_change(rb->attributes,
+ 0, used, false);
+ }
}
trace_virtio_mem_unplugged_all();
@@ -859,6 +872,18 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
rb = vmem->memdev->mr.ram_block;
page_size = qemu_ram_pagesize(rb);
+ /*
+ * For CoCo VMs with guest_memfd, use the "start-shared" model:
+ * memory starts as shared and the guest converts to private on
+ * plug.
+ */
+ if (rb->flags & RAM_GUEST_MEMFD) {
+ rb->flags |= RAM_GUEST_MEMFD_START_SHARED;
+ ram_block_attributes_state_change(rb->attributes, 0,
+ qemu_ram_get_used_length(rb),
+ false);
+ }
+
if (virtio_mem_has_legacy_guests()) {
switch (vmem->unplugged_inaccessible) {
case ON_OFF_AUTO_AUTO:
--
2.54.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v4 11/13] system/physmem: make ram_block_discard_range() handle guest_memfd
2026-05-04 12:30 ` [PATCH v4 11/13] system/physmem: make ram_block_discard_range() handle guest_memfd Marc-André Lureau
@ 2026-05-13 20:37 ` Peter Xu
0 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2026-05-13 20:37 UTC (permalink / raw)
To: Marc-André Lureau; +Cc: qemu-devel
On Mon, May 04, 2026 at 04:30:17PM +0400, Marc-André Lureau wrote:
> Most callers of ram_block_discard_range() want to discard both the
> shared and guest_memfd backing. Only kvm_convert_memory() intentionally
> discards a single plane during private/shared conversions.
>
> Rename the current implementation to ram_block_discard_shared_range()
> and make ram_block_discard_range() a composite that also discards
> guest_memfd when present (rb->guest_memfd >= 0). This ensures callers
> like virtio-mem, virtio-balloon, hv-balloon, migration.. reclaim
> private pages on discard.
>
> Update kvm_convert_memory() to use the plane-specific
> ram_block_discard_shared_range() since it only needs to discard
> the shared backing when converting to private.
>
> Likewise, after TDVF image copy, use ram_block_discard_shared_range().
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 12/13] monitor: add 'info ramblock-attributes' command
2026-05-04 12:30 ` [PATCH v4 12/13] monitor: add 'info ramblock-attributes' command Marc-André Lureau
@ 2026-05-13 20:39 ` Peter Xu
0 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2026-05-13 20:39 UTC (permalink / raw)
To: Marc-André Lureau
Cc: qemu-devel, Markus Armbruster, Dr. David Alan Gilbert
On Mon, May 04, 2026 at 04:30:18PM +0400, Marc-André Lureau wrote:
> Add a new 'info ramblock-attributes' HMP command and the corresponding
> 'x-query-ramblock-attributes' QMP command to display the shared/private
> memory attributes for ram blocks.
>
> The QMP command returns structured data (RamBlockAttributesInfo list
> with per-range shared/populated attributes), while HMP formats it for
> human consumption.
>
> This is useful for debugging confidential guests (TDX, SNP) to inspect
> which memory regions are shared vs private, and their population state
> when a RamDiscardManager is present (e.g. virtio-mem).
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Nobody is copied on this patch.. Let me add Markus and Dave at minimum..
Only one trivial question below:
> ---
> qapi/machine.json | 55 +++++++++++++++++++++++++++++++++
> include/monitor/hmp.h | 1 +
> hw/core/machine-hmp-cmds.c | 32 +++++++++++++++++++
> system/ram-block-attributes.c | 72 +++++++++++++++++++++++++++++++++++++++++++
> hmp-commands-info.hx | 13 ++++++++
> 5 files changed, 173 insertions(+)
>
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 685e4e29b87..aac8a235cf6 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1738,6 +1738,61 @@
> 'returns': 'HumanReadableText',
> 'features': [ 'unstable' ] }
>
> +##
> +# @RamBlockAttributeRange:
> +#
> +# A contiguous range within a ram block with uniform attributes.
> +#
> +# @start: start offset in bytes within the ram block
> +#
> +# @length: length in bytes of the range
> +#
> +# @shared: true if the range is shared, false if private
> +#
> +# @populated: true if the range is populated (only present when a
> +# RamDiscardManager is managing the block)
> +#
> +# Since: 11.1
> +##
> +{ 'struct': 'RamBlockAttributeRange',
> + 'data': { 'start': 'uint64',
> + 'length': 'uint64',
> + 'shared': 'bool',
> + '*populated': 'bool' } }
> +
> +##
> +# @RamBlockAttributesInfo:
> +#
> +# Shared/private memory attributes for a ram block.
> +#
> +# @name: the ram block identifier
> +#
> +# @ranges: list of attribute ranges
> +#
> +# Since: 11.1
> +##
> +{ 'struct': 'RamBlockAttributesInfo',
> + 'data': { 'name': 'str',
> + 'ranges': [ 'RamBlockAttributeRange' ] } }
> +
> +##
> +# @x-query-ramblock-attributes:
> +#
> +# Query ram block shared/private attributes. This is useful
> +# to debug confidential guests.
> +#
> +# Features:
> +#
> +# @unstable: This command is meant for debugging.
> +#
> +# Returns: list of ram block attributes
> +#
> +# Since: 11.1
> +##
> +{ 'command': 'x-query-ramblock-attributes',
> + 'returns': [ 'RamBlockAttributesInfo' ],
> + 'features': [ 'unstable' ] }
> +
> ##
> # @x-query-roms:
> #
> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
> index e222bea60cd..4493952b417 100644
> --- a/include/monitor/hmp.h
> +++ b/include/monitor/hmp.h
> @@ -143,6 +143,7 @@ void hmp_info_dump(Monitor *mon, const QDict *qdict);
> void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
> void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
> void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict);
> +void hmp_info_ramblock_attributes(Monitor *mon, const QDict *qdict);
> void hmp_info_replay(Monitor *mon, const QDict *qdict);
> void hmp_replay_break(Monitor *mon, const QDict *qdict);
> void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
> diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
> index 46846f741a2..122e1a0f735 100644
> --- a/hw/core/machine-hmp-cmds.c
> +++ b/hw/core/machine-hmp-cmds.c
> @@ -24,6 +24,7 @@
> #include "qapi/string-output-visitor.h"
> #include "qemu/error-report.h"
> #include "system/numa.h"
> +#include "system/ramblock.h"
> #include "hw/core/boards.h"
>
> void hmp_info_cpus(Monitor *mon, const QDict *qdict)
> @@ -388,3 +389,34 @@ void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict)
> }
> hmp_handle_error(mon, err);
> }
> +
> +void hmp_info_ramblock_attributes(Monitor *mon, const QDict *qdict)
> +{
> + Error *err = NULL;
> + g_autoptr(RamBlockAttributesInfoList) list = NULL;
> + RamBlockAttributesInfoList *it;
> +
> + list = qmp_x_query_ramblock_attributes(&err);
> + if (hmp_handle_error(mon, err)) {
> + return;
> + }
> +
> + for (it = list; it; it = it->next) {
> + RamBlockAttributesInfo *rba = it->value;
> + RamBlockAttributeRangeList *r;
> +
> + monitor_printf(mon, "%s:\n", rba->name);
> + for (r = rba->ranges; r; r = r->next) {
> + RamBlockAttributeRange *range = r->value;
> + const char *shared = range->shared ? "shared" : "private";
> + const char *pop = range->has_populated ?
> + (range->populated ? "+populated" : "-populated") : "";
> +
> + monitor_printf(mon,
> + " 0x%016" PRIx64 "-0x%016" PRIx64 " %s%s\n",
> + range->start,
> + range->start + range->length - 1,
> + shared, pop);
> + }
> + }
> +}
> diff --git a/system/ram-block-attributes.c b/system/ram-block-attributes.c
> index 59ec7a28eb0..f9573801b60 100644
> --- a/system/ram-block-attributes.c
> +++ b/system/ram-block-attributes.c
> @@ -11,6 +11,7 @@
>
> #include "qemu/osdep.h"
> #include "qemu/error-report.h"
> +#include "qapi/qapi-commands-machine.h"
> #include "system/ramblock.h"
> #include "trace.h"
>
> @@ -221,3 +222,74 @@ static void ram_block_attributes_class_init(ObjectClass *klass,
> rdsc->get_min_granularity = ram_block_attributes_rds_get_min_granularity;
> rdsc->is_populated = ram_block_attributes_rds_is_populated;
> }
> +
> +RamBlockAttributesInfoList *qmp_x_query_ramblock_attributes(Error **errp)
> +{
> + RamBlockAttributesInfoList *head = NULL, **tail = &head;
> + RAMBlock *block;
> + size_t rba_block_size = ram_block_attributes_get_block_size();
> +
> + RCU_READ_LOCK_GUARD();
> +
> + RAMBLOCK_FOREACH(block) {
> + RamBlockAttributesInfo *rba;
> + RamBlockAttributeRangeList **range_tail;
> + RamBlockAttributes *attr = block->attributes;
> + RamDiscardManager *rdm;
> + bool has_rdm;
> + unsigned long pos;
> +
> + if (!attr) {
> + continue;
> + }
> +
> + rdm = memory_region_get_ram_discard_manager(block->mr);
> + has_rdm = rdm != NULL;
> +
> + rba = g_new0(RamBlockAttributesInfo, 1);
> + rba->name = g_strdup(block->idstr);
> + range_tail = &rba->ranges;
> +
> + pos = 0;
> + while (pos < attr->bitmap_size) {
> + bool is_shared = test_bit(pos, attr->bitmap);
> + unsigned long next;
> + uint64_t start_offset, length;
> + RamBlockAttributeRange *range;
> +
> + if (is_shared) {
> + next = find_next_zero_bit(attr->bitmap,
> + attr->bitmap_size, pos);
> + } else {
> + next = find_next_bit(attr->bitmap,
> + attr->bitmap_size, pos);
> + }
> +
> + start_offset = (uint64_t)pos * rba_block_size;
> + length = (uint64_t)(next - pos) * rba_block_size;
> +
> + range = g_new0(RamBlockAttributeRange, 1);
> + range->start = start_offset;
> + range->length = length;
> + range->shared = is_shared;
> +
> + if (has_rdm) {
Is this only for extra safety? My understanding is when reaching here,
attr != NULL, which means we have at least one discard manager source, then
the manager must be there. So I wonder if we could assert. But not a big
deal. Maybe I overlooked that it is needed?
> + MemoryRegionSection section = {
> + .mr = block->mr,
> + .offset_within_region = start_offset,
> + .size = int128_make64(length),
> + };
> + range->has_populated = true;
> + range->populated =
> + ram_discard_manager_is_populated(rdm, &section);
> + }
> +
> + QAPI_LIST_APPEND(range_tail, range);
> + pos = next;
> + }
> +
> + QAPI_LIST_APPEND(tail, rba);
> + }
> +
> + return head;
> +}
> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> index 74c741f80e2..1168b4c20ca 100644
> --- a/hmp-commands-info.hx
> +++ b/hmp-commands-info.hx
> @@ -774,6 +774,19 @@ SRST
> Dump all the ramblocks of the system.
> ERST
>
> + {
> + .name = "ramblock-attributes",
> + .args_type = "",
> + .params = "",
> + .help = "Display ramblock shared/private attributes",
> + .cmd = hmp_info_ramblock_attributes,
> + },
> +
> +SRST
> + ``info ramblock-attributes``
> + Display the shared/private memory attributes for ram blocks.
> +ERST
> +
> {
> .name = "hotpluggable-cpus",
> .args_type = "",
>
> --
> 2.54.0
>
>
--
Peter Xu
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 04/13] system/ram-discard-manager: implement replay via is_populated iteration
2026-05-04 12:30 ` [PATCH v4 04/13] system/ram-discard-manager: implement replay via is_populated iteration Marc-André Lureau
@ 2026-05-13 20:40 ` Peter Xu
0 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2026-05-13 20:40 UTC (permalink / raw)
To: Marc-André Lureau; +Cc: qemu-devel
On Mon, May 04, 2026 at 04:30:10PM +0400, Marc-André Lureau wrote:
> Replace the source-level replay wrappers with a new
> replay_by_populated_state() helper that iterates the section at
> min-granularity, calls is_populated() for each chunk, and aggregates
> consecutive chunks of the same state before invoking the callback.
>
> This moves the iteration logic from individual sources into the manager,
> preparing for multi-source aggregation where the manager must combine
> state from multiple sources anyway.
>
> The replay_populated/replay_discarded vtable entries in
> RamDiscardSourceClass are no longer called but remain in the interface
> for now; they will be removed in follow-up commits along with the
> now-dead source implementations.
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 05/13] virtio-mem: remove replay_populated/replay_discarded implementation
2026-05-04 12:30 ` [PATCH v4 05/13] virtio-mem: remove replay_populated/replay_discarded implementation Marc-André Lureau
@ 2026-05-13 20:40 ` Peter Xu
0 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2026-05-13 20:40 UTC (permalink / raw)
To: Marc-André Lureau; +Cc: qemu-devel
On Mon, May 04, 2026 at 04:30:11PM +0400, Marc-André Lureau wrote:
> The replay iteration logic has been moved into the RamDiscardManager,
> which now iterates at source granularity using is_populated(). The
> source-level replay_populated/replay_discarded methods and their
> helpers are no longer called.
>
> Remove the now-dead replay methods, the VirtIOMEMReplayData struct,
> the virtio_mem_for_each_plugged/unplugged_section() helpers (only used
> by the replay methods), and the virtio_mem_section_cb typedef.
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 06/13] system/ram-discard-manager: drop replay from source interface
2026-05-04 12:30 ` [PATCH v4 06/13] system/ram-discard-manager: drop replay from source interface Marc-André Lureau
@ 2026-05-13 20:40 ` Peter Xu
0 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2026-05-13 20:40 UTC (permalink / raw)
To: Marc-André Lureau; +Cc: qemu-devel
On Mon, May 04, 2026 at 04:30:12PM +0400, Marc-André Lureau wrote:
> Remove replay_populated and replay_discarded from RamDiscardSourceClass
> now that the RamDiscardManager handles replay iteration internally via
> is_populated.
>
> Remove the now-dead replay methods, helpers, and
> for_each_populated/discarded_section() from ram-block-attributes, which
> was the last source still carrying this code.
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 13/13] RFC: hw/virtio: start virtio-mem guest_memfd regions as shared
2026-05-04 12:30 ` [PATCH v4 13/13] RFC: hw/virtio: start virtio-mem guest_memfd regions as shared Marc-André Lureau
@ 2026-05-13 20:47 ` Peter Xu
2026-05-14 7:32 ` Chenyi Qiang
1 sibling, 0 replies; 23+ messages in thread
From: Peter Xu @ 2026-05-13 20:47 UTC (permalink / raw)
To: Marc-André Lureau; +Cc: qemu-devel, Paolo Bonzini, Chenyi Qiang
On Mon, May 04, 2026 at 04:30:19PM +0400, Marc-André Lureau wrote:
> In TDX guests, virtio-mem plug/unplug/re-plug fails because
> kvm_set_phys_mem() unconditionally sets KVM memory attributes to
> PRIVATE for all guest_memfd regions. On re-plug, the PRIVATE->PRIVATE
> transition is a no-op, so KVM doesn't re-AUG pages and the guest's
> TDG.MEM.PAGE.ACCEPT fails.
I know little about TDX, so please bear with me..
I saw KVM does a seamcall to ADD or AUG whenever a new EPT pte is set, via
this path:
__tdp_mmu_set_spte_atomic
set_external_spte_present
tdx_sept_set_private_spte <------
On unplug, I expect that with your prior patches the gmem pages will be
truncated properly, so they'll all be gone.
Then QEMU does a replug -> the guest gets that event and starts accessing the
page -> EPT violation, KVM resolves the page fault with
__tdp_mmu_set_spte_atomic() (per above) and a new page -> triggering AUG (not
ADD, since it's post-boot).
Could you elaborate here why AUG is missing in the first place?
Thanks,
>
> Implement the "start-shared" approach: virtio-mem memory starts with
> shared KVM attributes. The guest converts shared->private on plug (via
> set_memory_encrypted -> MapGPA + ACCEPT), and back to shared on unplug
> (via set_memory_decrypted). This ensures every plug triggers a real
> SHARED->PRIVATE transition, causing KVM to AUG fresh pages.
>
> Add RAM_GUEST_MEMFD_START_SHARED flag and set it during virtio-mem
> realize for guest_memfd-backed regions. Use
> ram_block_attributes_state_change() to properly update the attributes
> bitmap through the API. Skip setting PRIVATE in kvm_set_phys_mem()
> when the flag is set. On unplug, explicitly reset KVM attributes to
> shared on the host side to handle the case where the guest skips
> set_memory_decrypted().
>
> See also virtio-comment "[PATCH RFC] virtio-mem: add shared/private memory property details".
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
> include/system/memory.h | 6 ++++++
> accel/kvm/kvm-all.c | 3 ++-
> hw/virtio/virtio-mem.c | 27 ++++++++++++++++++++++++++-
> 3 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/include/system/memory.h b/include/system/memory.h
> index 28a75dac4ae..9dbf67efe50 100644
> --- a/include/system/memory.h
> +++ b/include/system/memory.h
> @@ -277,6 +277,12 @@ typedef struct IOMMUTLBEvent {
> */
> #define RAM_PRIVATE (1 << 13)
>
> +/*
> + * RAM with guest_memfd that should start with shared KVM memory
> + * attributes. The guest converts to private on use.
> + */
> +#define RAM_GUEST_MEMFD_START_SHARED (1 << 14)
> +
> static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> IOMMUNotifierFlag flags,
> hwaddr start, hwaddr end,
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 97463a683f4..c034e74c8e5 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -1737,7 +1737,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
> abort();
> }
>
> - if (memory_region_has_guest_memfd(mr)) {
> + if (memory_region_has_guest_memfd(mr) &&
> + !(mr->ram_block->flags & RAM_GUEST_MEMFD_START_SHARED)) {
> err = kvm_set_memory_attributes_private(start_addr, slot_size);
> if (err) {
> error_report("%s: failed to set memory attribute private: %s",
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 35e03ed7599..b46efe21126 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -19,6 +19,7 @@
> #include "system/memory.h"
> #include "system/numa.h"
> #include "system/system.h"
> +#include "system/kvm.h"
> #include "system/ramblock.h"
> #include "system/reset.h"
> #include "system/runstate.h"
> @@ -479,6 +480,11 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
> if (vmem->dynamic_memslots) {
> virtio_mem_deactivate_unplugged_memslots(vmem, offset, size);
> }
> + if (rb->flags & RAM_GUEST_MEMFD_START_SHARED) {
> + kvm_set_memory_attributes_shared(start_gpa, size);
> + ram_block_attributes_state_change(rb->attributes,
> + offset, size, false);
> + }
> return 0;
> }
>
> @@ -606,10 +612,12 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem)
> RAMBlock *rb = vmem->memdev->mr.ram_block;
>
> if (vmem->size) {
> + uint64_t used = qemu_ram_get_used_length(rb);
> +
> if (virtio_mem_is_busy()) {
> return -EBUSY;
> }
> - if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
> + if (ram_block_discard_range(rb, 0, used)) {
> return -EBUSY;
> }
> virtio_mem_notify_unplug_all(vmem);
> @@ -622,6 +630,11 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem)
> if (vmem->dynamic_memslots) {
> virtio_mem_deactivate_unplugged_memslots(vmem, 0, region_size);
> }
> + if (rb->flags & RAM_GUEST_MEMFD_START_SHARED) {
> + kvm_set_memory_attributes_shared(vmem->addr, used);
> + ram_block_attributes_state_change(rb->attributes,
> + 0, used, false);
> + }
> }
>
> trace_virtio_mem_unplugged_all();
> @@ -859,6 +872,18 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
> rb = vmem->memdev->mr.ram_block;
> page_size = qemu_ram_pagesize(rb);
>
> + /*
> + * For CoCo VMs with guest_memfd, use the "start-shared" model:
> + * memory starts as shared and the guest converts to private on
> + * plug.
> + */
> + if (rb->flags & RAM_GUEST_MEMFD) {
> + rb->flags |= RAM_GUEST_MEMFD_START_SHARED;
> + ram_block_attributes_state_change(rb->attributes, 0,
> + qemu_ram_get_used_length(rb),
> + false);
> + }
> +
> if (virtio_mem_has_legacy_guests()) {
> switch (vmem->unplugged_inaccessible) {
> case ON_OFF_AUTO_AUTO:
>
> --
> 2.54.0
>
>
--
Peter Xu
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
` (12 preceding siblings ...)
2026-05-04 12:30 ` [PATCH v4 13/13] RFC: hw/virtio: start virtio-mem guest_memfd regions as shared Marc-André Lureau
@ 2026-05-13 20:53 ` Peter Xu
2026-05-14 5:15 ` Chenyi Qiang
13 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2026-05-13 20:53 UTC (permalink / raw)
To: Marc-André Lureau
Cc: qemu-devel, Cédric Le Goater, David Hildenbrand,
Chenyi Qiang, Paolo Bonzini
On Mon, May 04, 2026 at 04:30:06PM +0400, Marc-André Lureau wrote:
> Hi,
>
> This is an attempt to fix the incompatibility of virtio-mem with confidential
> VMs. The solution implements what was discussed earlier with D. Hildenbrand:
> https://patchwork.ozlabs.org/project/qemu-devel/patch/20250407074939.18657-5-chenyi.qiang@intel.com/#3502238
>
> The first patches are misc cleanups. Then some code refactoring splits out a
> manager and a source. And finally, the manager learns to deal with multiple sources.
>
> I haven't done thorough testing. I only launched a SEV guest with a virtio-mem
> device. It would be nice to have more tests for those scenarios with
> VFIO/virtio-mem/confvm.. In any case, review & testing needed!
>
> (help fix https://issues.redhat.com/browse/RHEL-131968)
Copy David for virtio-mem: David, please see if you're OK with virtio-mem
side of things; if you have time look at everything it'll be even better.
Copy Chenyi: would you please check with your environment on whether you
still hit issue with this series?
The whole series can be found here:
https://lore.kernel.org/all/20260504-rdm5-v4-0-bdf61e57c1e1@redhat.com/
Side note: this version still contains quite a few "over 80 chars" checkpatch
complaints... please consider fixing them when you repost.
Thanks,
>
> v4:
> - added "system/physmem: make ram_block_discard_range() handle guest_memfd"
> - added "monitor: add 'info ramblock-attributes' command"
> - added "RFC: hw/virtio: start virtio-mem guest_memfd regions as shared"
> - skip calling source in notify_populate (it may not have updated its
> internal state)
> - rebased, collected trailer tags
>
> v3: issues found by Cédric
> - fix assertion error on shutdown, due to rcu-defer cleanup
> - fix API doc warnings
>
> v2:
> - drop replay_{populated,discarded} from source, suggested by Peter Xu
> - add extra manager cleanup
> - add r-b tags for preliminary patches
>
> ---
> Marc-André Lureau (13):
> system/memory: split RamDiscardManager into source and manager
> system/memory: move RamDiscardManager to separate compilation unit
> system/memory: constify section arguments
> system/ram-discard-manager: implement replay via is_populated iteration
> virtio-mem: remove replay_populated/replay_discarded implementation
> system/ram-discard-manager: drop replay from source interface
> system/memory: implement RamDiscardManager multi-source aggregation
> system/physmem: destroy ram block attributes before RCU-deferred reclaim
> system/memory: add RamDiscardManager reference counting and cleanup
> tests: add unit tests for RamDiscardManager multi-source aggregation
> system/physmem: make ram_block_discard_range() handle guest_memfd
> monitor: add 'info ramblock-attributes' command
> RFC: hw/virtio: start virtio-mem guest_memfd regions as shared
>
> MAINTAINERS | 4 +
> qapi/machine.json | 55 ++
> include/hw/vfio/vfio-container.h | 2 +-
> include/hw/vfio/vfio-cpr.h | 2 +-
> include/hw/virtio/virtio-mem.h | 3 -
> include/monitor/hmp.h | 1 +
> include/system/memory.h | 283 +-----
> include/system/ram-discard-manager.h | 358 ++++++++
> include/system/ramblock.h | 6 +-
> accel/kvm/kvm-all.c | 5 +-
> hw/core/machine-hmp-cmds.c | 32 +
> hw/vfio/cpr-legacy.c | 4 +-
> hw/vfio/listener.c | 10 +-
> hw/virtio/virtio-mem.c | 286 ++-----
> migration/ram.c | 6 +-
> system/memory.c | 83 +-
> system/memory_mapping.c | 4 +-
> system/physmem.c | 27 +-
> system/ram-block-attributes.c | 329 +++----
> system/ram-discard-manager.c | 612 +++++++++++++
> target/i386/kvm/tdx.c | 2 +-
> tests/unit/test-ram-discard-manager-stubs.c | 48 ++
> tests/unit/test-ram-discard-manager.c | 1235 +++++++++++++++++++++++++++
> hmp-commands-info.hx | 13 +
> rust/bindings/system-sys/lib.rs | 2 +-
> system/meson.build | 1 +
> system/trace-events | 2 +-
> tests/unit/meson.build | 8 +-
> 28 files changed, 2597 insertions(+), 826 deletions(-)
> ---
> base-commit: ac0cc20ad2fe0b8df2e5d9458e90a095ac711ab1
> change-id: 20260414-rdm5-b6df2366d603
>
> Best regards,
> --
> Marc-André Lureau <marcandre.lureau@redhat.com>
>
--
Peter Xu
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem
2026-05-13 20:53 ` [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Peter Xu
@ 2026-05-14 5:15 ` Chenyi Qiang
0 siblings, 0 replies; 23+ messages in thread
From: Chenyi Qiang @ 2026-05-14 5:15 UTC (permalink / raw)
To: Peter Xu, Marc-André Lureau
Cc: qemu-devel, Cédric Le Goater, David Hildenbrand,
Paolo Bonzini
On 5/14/2026 4:53 AM, Peter Xu wrote:
> On Mon, May 04, 2026 at 04:30:06PM +0400, Marc-André Lureau wrote:
>> Hi,
>>
>> This is an attempt to fix the incompatibility of virtio-mem with confidential
>> VMs. The solution implements what was discussed earlier with D. Hildenbrand:
>> https://patchwork.ozlabs.org/project/qemu-devel/patch/20250407074939.18657-5-chenyi.qiang@intel.com/#3502238
>>
>> The first patches are misc cleanups. Then some code refactoring splits out a
>> manager and a source. And finally, the manager learns to deal with multiple sources.
>>
>> I haven't done thorough testing. I only launched a SEV guest with a virtio-mem
>> device. It would be nice to have more tests for those scenarios with
>> VFIO/virtio-mem/confvm.. In any case, review & testing needed!
>>
>> (help fix https://issues.redhat.com/browse/RHEL-131968)
>
> Copy David for virtio-mem: David, please see if you're OK with virtio-mem
> side of things; if you have time look at everything it'll be even better.
>
> Copy Chenyi: would you please check with your environment on whether you
> still hit issue with this series?
With this series, along with the ongoing virtio-mem guest change[1], I can
launch a TD guest with a virtio-mem device and do the plug/unplug/replug
operations successfully.
[1] https://lore.kernel.org/lkml/20260401-coco-v1-1-b9c3072e2d9c@redhat.com/
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 13/13] RFC: hw/virtio: start virtio-mem guest_memfd regions as shared
2026-05-04 12:30 ` [PATCH v4 13/13] RFC: hw/virtio: start virtio-mem guest_memfd regions as shared Marc-André Lureau
2026-05-13 20:47 ` Peter Xu
@ 2026-05-14 7:32 ` Chenyi Qiang
1 sibling, 0 replies; 23+ messages in thread
From: Chenyi Qiang @ 2026-05-14 7:32 UTC (permalink / raw)
To: Marc-André Lureau, qemu-devel; +Cc: Peter Xu
On 5/4/2026 8:30 PM, Marc-André Lureau wrote:
> In TDX guests, virtio-mem plug/unplug/re-plug fails because
> kvm_set_phys_mem() unconditionally sets KVM memory attributes to
> PRIVATE for all guest_memfd regions. On re-plug, the PRIVATE->PRIVATE
> transition is a no-op, so KVM doesn't re-AUG pages and the guest's
> TDG.MEM.PAGE.ACCEPT fails.
I think the private->private conversion is a no-op success; it will continue
to do KVM_PRE_FAULT_MEMORY in kvm_handle_hc_map_gpa_range(), and KVM will AUG pages.
>
> Implement the "start-shared" approach: virtio-mem memory starts with
> shared KVM attributes. The guest converts shared->private on plug (via
> set_memory_encrypted -> MapGPA + ACCEPT), and back to shared on unplug
> (via set_memory_decrypted). This ensures every plug triggers a real
> SHARED->PRIVATE transition, causing KVM to AUG fresh pages.
>
> Add RAM_GUEST_MEMFD_START_SHARED flag and set it during virtio-mem
> realize for guest_memfd-backed regions. Use
> ram_block_attributes_state_change() to properly update the attributes
> bitmap through the API. Skip setting PRIVATE in kvm_set_phys_mem()
> when the flag is set. On unplug, explicitly reset KVM attributes to
> shared on the host side to handle the case where the guest skips
> set_memory_decrypted().
If we only want to support unplugging shared memory, should we check the
attribute instead of resetting to shared unconditionally?
>
> See also virtio-comment "[PATCH RFC] virtio-mem: add shared/private memory property details".
Maybe I missed some context; can you provide the link to this RFC patch?
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2026-05-14 7:33 UTC | newest]
Thread overview: 23+ messages
2026-05-04 12:30 [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 01/13] system/memory: split RamDiscardManager into source and manager Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 02/13] system/memory: move RamDiscardManager to separate compilation unit Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 03/13] system/memory: constify section arguments Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 04/13] system/ram-discard-manager: implement replay via is_populated iteration Marc-André Lureau
2026-05-13 20:40 ` Peter Xu
2026-05-04 12:30 ` [PATCH v4 05/13] virtio-mem: remove replay_populated/replay_discarded implementation Marc-André Lureau
2026-05-13 20:40 ` Peter Xu
2026-05-04 12:30 ` [PATCH v4 06/13] system/ram-discard-manager: drop replay from source interface Marc-André Lureau
2026-05-13 20:40 ` Peter Xu
2026-05-04 12:30 ` [PATCH v4 07/13] system/memory: implement RamDiscardManager multi-source aggregation Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 08/13] system/physmem: destroy ram block attributes before RCU-deferred reclaim Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 09/13] system/memory: add RamDiscardManager reference counting and cleanup Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 10/13] tests: add unit tests for RamDiscardManager multi-source aggregation Marc-André Lureau
2026-05-04 12:30 ` [PATCH v4 11/13] system/physmem: make ram_block_discard_range() handle guest_memfd Marc-André Lureau
2026-05-13 20:37 ` Peter Xu
2026-05-04 12:30 ` [PATCH v4 12/13] monitor: add 'info ramblock-attributes' command Marc-André Lureau
2026-05-13 20:39 ` Peter Xu
2026-05-04 12:30 ` [PATCH v4 13/13] RFC: hw/virtio: start virtio-mem guest_memfd regions as shared Marc-André Lureau
2026-05-13 20:47 ` Peter Xu
2026-05-14 7:32 ` Chenyi Qiang
2026-05-13 20:53 ` [PATCH v4 00/13] Make RamDiscardManager work with multiple sources & virtio-mem Peter Xu
2026-05-14 5:15 ` Chenyi Qiang