From: "Gupta, Pankaj" <pankaj.gupta@amd.com>
To: "Chenyi Qiang" <chenyi.qiang@intel.com>,
"David Hildenbrand" <david@redhat.com>,
"Alexey Kardashevskiy" <aik@amd.com>,
"Peter Xu" <peterx@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Michael Roth" <michael.roth@amd.com>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org,
Williams Dan J <dan.j.williams@intel.com>,
Peng Chao P <chao.p.peng@intel.com>,
Gao Chao <chao.gao@intel.com>, Xu Yilun <yilun.xu@intel.com>,
Li Xiaoyao <xiaoyao.li@intel.com>
Subject: Re: [PATCH v3 4/7] memory-attribute-manager: Introduce MemoryAttributeManager to manage RAMBLock with guest_memfd
Date: Fri, 14 Mar 2025 13:11:13 +0100 [thread overview]
Message-ID: <2ab368b2-62ca-4163-a483-68e9d332201a@amd.com> (raw)
In-Reply-To: <20250310081837.13123-5-chenyi.qiang@intel.com>
On 3/10/2025 9:18 AM, Chenyi Qiang wrote:
> As the commit 852f0048f3 ("RAMBlock: make guest_memfd require
> uncoordinated discard") highlighted, some subsystems like VFIO may
> disable ram block discard. However, guest_memfd relies on the discard
> operation to perform page conversion between private and shared memory.
> This can lead to stale IOMMU mapping issue when assigning a hardware
> device to a confidential VM via shared memory. To address this, it is
> crucial to ensure systems like VFIO refresh its IOMMU mappings.
>
> RamDiscardManager is an existing concept (used by virtio-mem) to adjust
> VFIO mappings in relation to VM page assignment. Effectively page
> conversion is similar to hot-removing a page in one mode and adding it
> back in the other. Therefore, similar actions are required for page
> conversion events. Introduce the RamDiscardManager to guest_memfd to
> facilitate this process.
>
> Since guest_memfd is not an object, it cannot directly implement the
> RamDiscardManager interface. One potential attempt is to implement it in
> HostMemoryBackend. This is not appropriate because guest_memfd is per
> RAMBlock. Some RAMBlocks have a memory backend but others do not. In
> particular, the ones like virtual BIOS calling
> memory_region_init_ram_guest_memfd() do not.
>
> To manage the RAMBlocks with guest_memfd, define a new object named
> MemoryAttributeManager to implement the RamDiscardManager interface. The
Isn't this should be the other way around. 'MemoryAttributeManager'
should be an interface and RamDiscardManager a type of it, an
implementation?
MemoryAttributeManager have the data like 'shared_bitmap' etc that
information can also be used by the other implementation types?
Or maybe I am getting it entirely wrong.
Thanks,
Pankaj
> object stores guest_memfd information such as shared_bitmap, and handles
> page conversion notification. Using the name of MemoryAttributeManager is
> aimed to make it more generic. The term "Memory" emcompasses not only RAM
> but also private MMIO in TEE I/O, which might rely on this
> object/interface to handle page conversion events in the future. The
> term "Attribute" allows for the management of various attributes beyond
> shared and private. For instance, it could support scenarios where
> discard vs. populated and shared vs. private states co-exists, such as
> supporting virtio-mem or something similar in the future.
>
> In the current context, MemoryAttributeManager signifies discarded state
> as private and populated state as shared. Memory state is tracked at the
> host page size granularity, as the minimum memory conversion size can be one
> page per request. Additionally, VFIO expects the DMA mapping for a
> specific iova to be mapped and unmapped with the same granularity.
> Confidential VMs may perform partial conversions, e.g. conversion
> happens on a small region within a large region. To prevent such invalid
> cases and until cut_mapping operation support is introduced, all
> operations are performed with 4K granularity.
>
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v3:
> - Some rename (bitmap_size->shared_bitmap_size,
> first_one/zero_bit->first_bit, etc.)
> - Change shared_bitmap_size from uint32_t to unsigned
> - Return mgr->mr->ram_block->page_size in get_block_size()
> - Move set_ram_discard_manager() up to avoid a g_free() in failure
> case.
> - Add const for the memory_attribute_manager_get_block_size()
> - Unify the ReplayRamPopulate and ReplayRamDiscard and related
> callback.
>
> Changes in v2:
> - Rename the object name to MemoryAttributeManager
> - Rename the bitmap to shared_bitmap to make it more clear.
> - Remove block_size field and get it from a helper. In future, we
> can get the page_size from RAMBlock if necessary.
> - Remove the unncessary "struct" before GuestMemfdReplayData
> - Remove the unncessary g_free() for the bitmap
> - Add some error report when the callback failure for
> populated/discarded section.
> - Move the realize()/unrealize() definition to this patch.
> ---
> include/system/memory-attribute-manager.h | 42 ++++
> system/memory-attribute-manager.c | 283 ++++++++++++++++++++++
> system/meson.build | 1 +
> 3 files changed, 326 insertions(+)
> create mode 100644 include/system/memory-attribute-manager.h
> create mode 100644 system/memory-attribute-manager.c
>
> diff --git a/include/system/memory-attribute-manager.h b/include/system/memory-attribute-manager.h
> new file mode 100644
> index 0000000000..23375a14b8
> --- /dev/null
> +++ b/include/system/memory-attribute-manager.h
> @@ -0,0 +1,42 @@
> +/*
> + * QEMU memory attribute manager
> + *
> + * Copyright Intel
> + *
> + * Author:
> + * Chenyi Qiang <chenyi.qiang@intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory
> + *
> + */
> +
> +#ifndef SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H
> +#define SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H
> +
> +#include "system/hostmem.h"
> +
> +#define TYPE_MEMORY_ATTRIBUTE_MANAGER "memory-attribute-manager"
> +
> +OBJECT_DECLARE_TYPE(MemoryAttributeManager, MemoryAttributeManagerClass, MEMORY_ATTRIBUTE_MANAGER)
> +
> +struct MemoryAttributeManager {
> + Object parent;
> +
> + MemoryRegion *mr;
> +
> + /* 1-setting of the bit represents the memory is populated (shared) */
> + unsigned shared_bitmap_size;
> + unsigned long *shared_bitmap;
> +
> + QLIST_HEAD(, RamDiscardListener) rdl_list;
> +};
> +
> +struct MemoryAttributeManagerClass {
> + ObjectClass parent_class;
> +};
> +
> +int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr);
> +void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr);
> +
> +#endif
> diff --git a/system/memory-attribute-manager.c b/system/memory-attribute-manager.c
> new file mode 100644
> index 0000000000..7c3789cf49
> --- /dev/null
> +++ b/system/memory-attribute-manager.c
> @@ -0,0 +1,283 @@
> +/*
> + * QEMU memory attribute manager
> + *
> + * Copyright Intel
> + *
> + * Author:
> + * Chenyi Qiang <chenyi.qiang@intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "exec/ramblock.h"
> +#include "system/memory-attribute-manager.h"
> +
> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(MemoryAttributeManager,
> + memory_attribute_manager,
> + MEMORY_ATTRIBUTE_MANAGER,
> + OBJECT,
> + { TYPE_RAM_DISCARD_MANAGER },
> + { })
> +
> +static size_t memory_attribute_manager_get_block_size(const MemoryAttributeManager *mgr)
> +{
> + /*
> + * Because page conversion could be manipulated in the size of at least 4K or 4K aligned,
> + * Use the host page size as the granularity to track the memory attribute.
> + */
> + g_assert(mgr && mgr->mr && mgr->mr->ram_block);
> + g_assert(mgr->mr->ram_block->page_size == qemu_real_host_page_size());
> + return mgr->mr->ram_block->page_size;
> +}
> +
> +
> +static bool memory_attribute_rdm_is_populated(const RamDiscardManager *rdm,
> + const MemoryRegionSection *section)
> +{
> + const MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> + const int block_size = memory_attribute_manager_get_block_size(mgr);
> + uint64_t first_bit = section->offset_within_region / block_size;
> + uint64_t last_bit = first_bit + int128_get64(section->size) / block_size - 1;
> + unsigned long first_discard_bit;
> +
> + first_discard_bit = find_next_zero_bit(mgr->shared_bitmap, last_bit + 1, first_bit);
> + return first_discard_bit > last_bit;
> +}
> +
> +typedef int (*memory_attribute_section_cb)(MemoryRegionSection *s, void *arg);
> +
> +static int memory_attribute_notify_populate_cb(MemoryRegionSection *section, void *arg)
> +{
> + RamDiscardListener *rdl = arg;
> +
> + return rdl->notify_populate(rdl, section);
> +}
> +
> +static int memory_attribute_notify_discard_cb(MemoryRegionSection *section, void *arg)
> +{
> + RamDiscardListener *rdl = arg;
> +
> + rdl->notify_discard(rdl, section);
> +
> + return 0;
> +}
> +
> +static int memory_attribute_for_each_populated_section(const MemoryAttributeManager *mgr,
> + MemoryRegionSection *section,
> + void *arg,
> + memory_attribute_section_cb cb)
> +{
> + unsigned long first_bit, last_bit;
> + uint64_t offset, size;
> + const int block_size = memory_attribute_manager_get_block_size(mgr);
> + int ret = 0;
> +
> + first_bit = section->offset_within_region / block_size;
> + first_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size, first_bit);
> +
> + while (first_bit < mgr->shared_bitmap_size) {
> + MemoryRegionSection tmp = *section;
> +
> + offset = first_bit * block_size;
> + last_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
> + first_bit + 1) - 1;
> + size = (last_bit - first_bit + 1) * block_size;
> +
> + if (!memory_region_section_intersect_range(&tmp, offset, size)) {
> + break;
> + }
> +
> + ret = cb(&tmp, arg);
> + if (ret) {
> + error_report("%s: Failed to notify RAM discard listener: %s", __func__,
> + strerror(-ret));
> + break;
> + }
> +
> + first_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
> + last_bit + 2);
> + }
> +
> + return ret;
> +}
> +
> +static int memory_attribute_for_each_discarded_section(const MemoryAttributeManager *mgr,
> + MemoryRegionSection *section,
> + void *arg,
> + memory_attribute_section_cb cb)
> +{
> + unsigned long first_bit, last_bit;
> + uint64_t offset, size;
> + const int block_size = memory_attribute_manager_get_block_size(mgr);
> + int ret = 0;
> +
> + first_bit = section->offset_within_region / block_size;
> + first_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
> + first_bit);
> +
> + while (first_bit < mgr->shared_bitmap_size) {
> + MemoryRegionSection tmp = *section;
> +
> + offset = first_bit * block_size;
> + last_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
> + first_bit + 1) - 1;
> + size = (last_bit - first_bit + 1) * block_size;
> +
> + if (!memory_region_section_intersect_range(&tmp, offset, size)) {
> + break;
> + }
> +
> + ret = cb(&tmp, arg);
> + if (ret) {
> + error_report("%s: Failed to notify RAM discard listener: %s", __func__,
> + strerror(-ret));
> + break;
> + }
> +
> + first_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
> + last_bit + 2);
> + }
> +
> + return ret;
> +}
> +
> +static uint64_t memory_attribute_rdm_get_min_granularity(const RamDiscardManager *rdm,
> + const MemoryRegion *mr)
> +{
> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> +
> + g_assert(mr == mgr->mr);
> + return memory_attribute_manager_get_block_size(mgr);
> +}
> +
> +static void memory_attribute_rdm_register_listener(RamDiscardManager *rdm,
> + RamDiscardListener *rdl,
> + MemoryRegionSection *section)
> +{
> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> + int ret;
> +
> + g_assert(section->mr == mgr->mr);
> + rdl->section = memory_region_section_new_copy(section);
> +
> + QLIST_INSERT_HEAD(&mgr->rdl_list, rdl, next);
> +
> + ret = memory_attribute_for_each_populated_section(mgr, section, rdl,
> + memory_attribute_notify_populate_cb);
> + if (ret) {
> + error_report("%s: Failed to register RAM discard listener: %s", __func__,
> + strerror(-ret));
> + }
> +}
> +
> +static void memory_attribute_rdm_unregister_listener(RamDiscardManager *rdm,
> + RamDiscardListener *rdl)
> +{
> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> + int ret;
> +
> + g_assert(rdl->section);
> + g_assert(rdl->section->mr == mgr->mr);
> +
> + ret = memory_attribute_for_each_populated_section(mgr, rdl->section, rdl,
> + memory_attribute_notify_discard_cb);
> + if (ret) {
> + error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
> + strerror(-ret));
> + }
> +
> + memory_region_section_free_copy(rdl->section);
> + rdl->section = NULL;
> + QLIST_REMOVE(rdl, next);
> +
> +}
> +
> +typedef struct MemoryAttributeReplayData {
> + ReplayRamStateChange fn;
> + void *opaque;
> +} MemoryAttributeReplayData;
> +
> +static int memory_attribute_rdm_replay_cb(MemoryRegionSection *section, void *arg)
> +{
> + MemoryAttributeReplayData *data = arg;
> +
> + return data->fn(section, data->opaque);
> +}
> +
> +static int memory_attribute_rdm_replay_populated(const RamDiscardManager *rdm,
> + MemoryRegionSection *section,
> + ReplayRamStateChange replay_fn,
> + void *opaque)
> +{
> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> + MemoryAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
> +
> + g_assert(section->mr == mgr->mr);
> + return memory_attribute_for_each_populated_section(mgr, section, &data,
> + memory_attribute_rdm_replay_cb);
> +}
> +
> +static int memory_attribute_rdm_replay_discarded(const RamDiscardManager *rdm,
> + MemoryRegionSection *section,
> + ReplayRamStateChange replay_fn,
> + void *opaque)
> +{
> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> + MemoryAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
> +
> + g_assert(section->mr == mgr->mr);
> + return memory_attribute_for_each_discarded_section(mgr, section, &data,
> + memory_attribute_rdm_replay_cb);
> +}
> +
> +int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr)
> +{
> + uint64_t shared_bitmap_size;
> + const int block_size = qemu_real_host_page_size();
> + int ret;
> +
> + shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
> +
> + mgr->mr = mr;
> + ret = memory_region_set_ram_discard_manager(mgr->mr, RAM_DISCARD_MANAGER(mgr));
> + if (ret) {
> + return ret;
> + }
> + mgr->shared_bitmap_size = shared_bitmap_size;
> + mgr->shared_bitmap = bitmap_new(shared_bitmap_size);
> +
> + return ret;
> +}
> +
> +void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr)
> +{
> + g_free(mgr->shared_bitmap);
> + memory_region_set_ram_discard_manager(mgr->mr, NULL);
> +}
> +
> +static void memory_attribute_manager_init(Object *obj)
> +{
> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(obj);
> +
> + QLIST_INIT(&mgr->rdl_list);
> +}
> +
> +static void memory_attribute_manager_finalize(Object *obj)
> +{
> +}
> +
> +static void memory_attribute_manager_class_init(ObjectClass *oc, void *data)
> +{
> + RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
> +
> + rdmc->get_min_granularity = memory_attribute_rdm_get_min_granularity;
> + rdmc->register_listener = memory_attribute_rdm_register_listener;
> + rdmc->unregister_listener = memory_attribute_rdm_unregister_listener;
> + rdmc->is_populated = memory_attribute_rdm_is_populated;
> + rdmc->replay_populated = memory_attribute_rdm_replay_populated;
> + rdmc->replay_discarded = memory_attribute_rdm_replay_discarded;
> +}
Would this initialization be for
> diff --git a/system/meson.build b/system/meson.build
> index 4952f4b2c7..ab07ff1442 100644
> --- a/system/meson.build
> +++ b/system/meson.build
> @@ -15,6 +15,7 @@ system_ss.add(files(
> 'dirtylimit.c',
> 'dma-helpers.c',
> 'globals.c',
> + 'memory-attribute-manager.c',
> 'memory_mapping.c',
> 'qdev-monitor.c',
> 'qtest.c',
next prev parent reply other threads:[~2025-03-14 12:11 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-10 8:18 [PATCH v3 0/7] Enable shared device assignment Chenyi Qiang
2025-03-10 8:18 ` [PATCH v3 1/7] memory: Export a helper to get intersection of a MemoryRegionSection with a given range Chenyi Qiang
2025-03-10 8:18 ` [PATCH v3 2/7] memory: Change memory_region_set_ram_discard_manager() to return the result Chenyi Qiang
2025-03-10 8:18 ` [PATCH v3 3/7] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard() Chenyi Qiang
2025-03-17 3:00 ` Kishen Maloor
2025-03-17 4:11 ` Chenyi Qiang
2025-03-10 8:18 ` [PATCH v3 4/7] memory-attribute-manager: Introduce MemoryAttributeManager to manage RAMBLock with guest_memfd Chenyi Qiang
2025-03-14 12:11 ` Gupta, Pankaj [this message]
2025-03-17 2:54 ` Chenyi Qiang
2025-03-17 10:36 ` David Hildenbrand
2025-03-17 17:01 ` Gupta, Pankaj
2025-03-18 1:54 ` Chenyi Qiang
2025-03-19 8:55 ` Gupta, Pankaj
2025-03-19 11:23 ` Chenyi Qiang
2025-03-19 11:56 ` Gupta, Pankaj
2025-03-20 3:15 ` Chenyi Qiang
2025-03-24 4:04 ` Chenyi Qiang
2025-03-10 8:18 ` [PATCH v3 5/7] memory-attribute-manager: Introduce a callback to notify the shared/private state change Chenyi Qiang
2025-03-10 8:18 ` [PATCH v3 6/7] memory: Attach MemoryAttributeManager to guest_memfd-backed RAMBlocks Chenyi Qiang
2025-03-14 8:21 ` Chenyi Qiang
2025-03-14 9:00 ` David Hildenbrand
2025-03-14 9:30 ` Chenyi Qiang
2025-03-14 9:50 ` David Hildenbrand
2025-03-14 10:23 ` Chenyi Qiang
2025-03-31 8:55 ` Chenyi Qiang
2025-03-17 6:18 ` Tony Lindgren
2025-03-17 7:32 ` Chenyi Qiang
2025-03-17 9:45 ` Tony Lindgren
2025-03-17 10:21 ` Chenyi Qiang
2025-03-17 11:07 ` Tony Lindgren
2025-03-10 8:18 ` [PATCH v3 7/7] RAMBlock: Make guest_memfd require coordinate discard Chenyi Qiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2ab368b2-62ca-4163-a483-68e9d332201a@amd.com \
--to=pankaj.gupta@amd.com \
--cc=aik@amd.com \
--cc=chao.gao@intel.com \
--cc=chao.p.peng@intel.com \
--cc=chenyi.qiang@intel.com \
--cc=dan.j.williams@intel.com \
--cc=david@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=michael.roth@amd.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=xiaoyao.li@intel.com \
--cc=yilun.xu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox