public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: "Gupta, Pankaj" <pankaj.gupta@amd.com>
To: "Chenyi Qiang" <chenyi.qiang@intel.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Alexey Kardashevskiy" <aik@amd.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Michael Roth" <michael.roth@amd.com>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org,
	Williams Dan J <dan.j.williams@intel.com>,
	Peng Chao P <chao.p.peng@intel.com>,
	Gao Chao <chao.gao@intel.com>, Xu Yilun <yilun.xu@intel.com>,
	Li Xiaoyao <xiaoyao.li@intel.com>
Subject: Re: [PATCH v3 4/7] memory-attribute-manager: Introduce MemoryAttributeManager to manage RAMBLock with guest_memfd
Date: Fri, 14 Mar 2025 13:11:13 +0100	[thread overview]
Message-ID: <2ab368b2-62ca-4163-a483-68e9d332201a@amd.com> (raw)
In-Reply-To: <20250310081837.13123-5-chenyi.qiang@intel.com>

On 3/10/2025 9:18 AM, Chenyi Qiang wrote:
> As the commit 852f0048f3 ("RAMBlock: make guest_memfd require
> uncoordinated discard") highlighted, some subsystems like VFIO may
> disable ram block discard. However, guest_memfd relies on the discard
> operation to perform page conversion between private and shared memory.
> This can lead to stale IOMMU mapping issue when assigning a hardware
> device to a confidential VM via shared memory. To address this, it is
> crucial to ensure systems like VFIO refresh its IOMMU mappings.
> 
> RamDiscardManager is an existing concept (used by virtio-mem) to adjust
> VFIO mappings in relation to VM page assignment. Effectively page
> conversion is similar to hot-removing a page in one mode and adding it
> back in the other. Therefore, similar actions are required for page
> conversion events. Introduce the RamDiscardManager to guest_memfd to
> facilitate this process.
> 
> Since guest_memfd is not an object, it cannot directly implement the
> RamDiscardManager interface. One potential attempt is to implement it in
> HostMemoryBackend. This is not appropriate because guest_memfd is per
> RAMBlock. Some RAMBlocks have a memory backend but others do not. In
> particular, the ones like virtual BIOS calling
> memory_region_init_ram_guest_memfd() do not.
> 
> To manage the RAMBlocks with guest_memfd, define a new object named
> MemoryAttributeManager to implement the RamDiscardManager interface. The

Isn't this should be the other way around. 'MemoryAttributeManager' 
should be an interface and RamDiscardManager a type of it, an 
implementation?

MemoryAttributeManager have the data like 'shared_bitmap' etc that
information can also be used by the other implementation types?

Or maybe I am getting it entirely wrong.

Thanks,
Pankaj

> object stores guest_memfd information such as shared_bitmap, and handles
> page conversion notification. Using the name of MemoryAttributeManager is
> aimed to make it more generic. The term "Memory" emcompasses not only RAM
> but also private MMIO in TEE I/O, which might rely on this
> object/interface to handle page conversion events in the future. The
> term "Attribute" allows for the management of various attributes beyond
> shared and private. For instance, it could support scenarios where
> discard vs. populated and shared vs. private states co-exists, such as
> supporting virtio-mem or something similar in the future.
> 
> In the current context, MemoryAttributeManager signifies discarded state
> as private and populated state as shared. Memory state is tracked at the
> host page size granularity, as the minimum memory conversion size can be one
> page per request. Additionally, VFIO expects the DMA mapping for a
> specific iova to be mapped and unmapped with the same granularity.
> Confidential VMs may perform  partial conversions, e.g. conversion
> happens on a small region within a large region. To prevent such invalid
> cases and until cut_mapping operation support is introduced, all
> operations are performed with 4K granularity.
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v3:
>      - Some rename (bitmap_size->shared_bitmap_size,
>        first_one/zero_bit->first_bit, etc.)
>      - Change shared_bitmap_size from uint32_t to unsigned
>      - Return mgr->mr->ram_block->page_size in get_block_size()
>      - Move set_ram_discard_manager() up to avoid a g_free() in failure
>        case.
>      - Add const for the memory_attribute_manager_get_block_size()
>      - Unify the ReplayRamPopulate and ReplayRamDiscard and related
>        callback.
> 
> Changes in v2:
>      - Rename the object name to MemoryAttributeManager
>      - Rename the bitmap to shared_bitmap to make it more clear.
>      - Remove block_size field and get it from a helper. In future, we
>        can get the page_size from RAMBlock if necessary.
>      - Remove the unncessary "struct" before GuestMemfdReplayData
>      - Remove the unncessary g_free() for the bitmap
>      - Add some error report when the callback failure for
>        populated/discarded section.
>      - Move the realize()/unrealize() definition to this patch.
> ---
>   include/system/memory-attribute-manager.h |  42 ++++
>   system/memory-attribute-manager.c         | 283 ++++++++++++++++++++++
>   system/meson.build                        |   1 +
>   3 files changed, 326 insertions(+)
>   create mode 100644 include/system/memory-attribute-manager.h
>   create mode 100644 system/memory-attribute-manager.c
> 
> diff --git a/include/system/memory-attribute-manager.h b/include/system/memory-attribute-manager.h
> new file mode 100644
> index 0000000000..23375a14b8
> --- /dev/null
> +++ b/include/system/memory-attribute-manager.h
> @@ -0,0 +1,42 @@
> +/*
> + * QEMU memory attribute manager
> + *
> + * Copyright Intel
> + *
> + * Author:
> + *      Chenyi Qiang <chenyi.qiang@intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory
> + *
> + */
> +
> +#ifndef SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H
> +#define SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H
> +
> +#include "system/hostmem.h"
> +
> +#define TYPE_MEMORY_ATTRIBUTE_MANAGER "memory-attribute-manager"
> +
> +OBJECT_DECLARE_TYPE(MemoryAttributeManager, MemoryAttributeManagerClass, MEMORY_ATTRIBUTE_MANAGER)
> +
> +struct MemoryAttributeManager {
> +    Object parent;
> +
> +    MemoryRegion *mr;
> +
> +    /* 1-setting of the bit represents the memory is populated (shared) */
> +    unsigned shared_bitmap_size;
> +    unsigned long *shared_bitmap;
> +
> +    QLIST_HEAD(, RamDiscardListener) rdl_list;
> +};
> +
> +struct MemoryAttributeManagerClass {
> +    ObjectClass parent_class;
> +};
> +
> +int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr);
> +void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr);
> +
> +#endif
> diff --git a/system/memory-attribute-manager.c b/system/memory-attribute-manager.c
> new file mode 100644
> index 0000000000..7c3789cf49
> --- /dev/null
> +++ b/system/memory-attribute-manager.c
> @@ -0,0 +1,283 @@
> +/*
> + * QEMU memory attribute manager
> + *
> + * Copyright Intel
> + *
> + * Author:
> + *      Chenyi Qiang <chenyi.qiang@intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "exec/ramblock.h"
> +#include "system/memory-attribute-manager.h"
> +
> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(MemoryAttributeManager,
> +                                   memory_attribute_manager,
> +                                   MEMORY_ATTRIBUTE_MANAGER,
> +                                   OBJECT,
> +                                   { TYPE_RAM_DISCARD_MANAGER },
> +                                   { })
> +
> +static size_t memory_attribute_manager_get_block_size(const MemoryAttributeManager *mgr)
> +{
> +    /*
> +     * Because page conversion could be manipulated in the size of at least 4K or 4K aligned,
> +     * Use the host page size as the granularity to track the memory attribute.
> +     */
> +    g_assert(mgr && mgr->mr && mgr->mr->ram_block);
> +    g_assert(mgr->mr->ram_block->page_size == qemu_real_host_page_size());
> +    return mgr->mr->ram_block->page_size;
> +}
> +
> +
> +static bool memory_attribute_rdm_is_populated(const RamDiscardManager *rdm,
> +                                              const MemoryRegionSection *section)
> +{
> +    const MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> +    const int block_size = memory_attribute_manager_get_block_size(mgr);
> +    uint64_t first_bit = section->offset_within_region / block_size;
> +    uint64_t last_bit = first_bit + int128_get64(section->size) / block_size - 1;
> +    unsigned long first_discard_bit;
> +
> +    first_discard_bit = find_next_zero_bit(mgr->shared_bitmap, last_bit + 1, first_bit);
> +    return first_discard_bit > last_bit;
> +}
> +
> +typedef int (*memory_attribute_section_cb)(MemoryRegionSection *s, void *arg);
> +
> +static int memory_attribute_notify_populate_cb(MemoryRegionSection *section, void *arg)
> +{
> +    RamDiscardListener *rdl = arg;
> +
> +    return rdl->notify_populate(rdl, section);
> +}
> +
> +static int memory_attribute_notify_discard_cb(MemoryRegionSection *section, void *arg)
> +{
> +    RamDiscardListener *rdl = arg;
> +
> +    rdl->notify_discard(rdl, section);
> +
> +    return 0;
> +}
> +
> +static int memory_attribute_for_each_populated_section(const MemoryAttributeManager *mgr,
> +                                                       MemoryRegionSection *section,
> +                                                       void *arg,
> +                                                       memory_attribute_section_cb cb)
> +{
> +    unsigned long first_bit, last_bit;
> +    uint64_t offset, size;
> +    const int block_size = memory_attribute_manager_get_block_size(mgr);
> +    int ret = 0;
> +
> +    first_bit = section->offset_within_region / block_size;
> +    first_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size, first_bit);
> +
> +    while (first_bit < mgr->shared_bitmap_size) {
> +        MemoryRegionSection tmp = *section;
> +
> +        offset = first_bit * block_size;
> +        last_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
> +                                      first_bit + 1) - 1;
> +        size = (last_bit - first_bit + 1) * block_size;
> +
> +        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
> +            break;
> +        }
> +
> +        ret = cb(&tmp, arg);
> +        if (ret) {
> +            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
> +                         strerror(-ret));
> +            break;
> +        }
> +
> +        first_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
> +                                  last_bit + 2);
> +    }
> +
> +    return ret;
> +}
> +
> +static int memory_attribute_for_each_discarded_section(const MemoryAttributeManager *mgr,
> +                                                       MemoryRegionSection *section,
> +                                                       void *arg,
> +                                                       memory_attribute_section_cb cb)
> +{
> +    unsigned long first_bit, last_bit;
> +    uint64_t offset, size;
> +    const int block_size = memory_attribute_manager_get_block_size(mgr);
> +    int ret = 0;
> +
> +    first_bit = section->offset_within_region / block_size;
> +    first_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
> +                                   first_bit);
> +
> +    while (first_bit < mgr->shared_bitmap_size) {
> +        MemoryRegionSection tmp = *section;
> +
> +        offset = first_bit * block_size;
> +        last_bit = find_next_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
> +                                      first_bit + 1) - 1;
> +        size = (last_bit - first_bit + 1) * block_size;
> +
> +        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
> +            break;
> +        }
> +
> +        ret = cb(&tmp, arg);
> +        if (ret) {
> +            error_report("%s: Failed to notify RAM discard listener: %s", __func__,
> +                         strerror(-ret));
> +            break;
> +        }
> +
> +        first_bit = find_next_zero_bit(mgr->shared_bitmap, mgr->shared_bitmap_size,
> +                                       last_bit + 2);
> +    }
> +
> +    return ret;
> +}
> +
> +static uint64_t memory_attribute_rdm_get_min_granularity(const RamDiscardManager *rdm,
> +                                                         const MemoryRegion *mr)
> +{
> +    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> +
> +    g_assert(mr == mgr->mr);
> +    return memory_attribute_manager_get_block_size(mgr);
> +}
> +
> +static void memory_attribute_rdm_register_listener(RamDiscardManager *rdm,
> +                                                   RamDiscardListener *rdl,
> +                                                   MemoryRegionSection *section)
> +{
> +    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> +    int ret;
> +
> +    g_assert(section->mr == mgr->mr);
> +    rdl->section = memory_region_section_new_copy(section);
> +
> +    QLIST_INSERT_HEAD(&mgr->rdl_list, rdl, next);
> +
> +    ret = memory_attribute_for_each_populated_section(mgr, section, rdl,
> +                                                      memory_attribute_notify_populate_cb);
> +    if (ret) {
> +        error_report("%s: Failed to register RAM discard listener: %s", __func__,
> +                     strerror(-ret));
> +    }
> +}
> +
> +static void memory_attribute_rdm_unregister_listener(RamDiscardManager *rdm,
> +                                                     RamDiscardListener *rdl)
> +{
> +    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> +    int ret;
> +
> +    g_assert(rdl->section);
> +    g_assert(rdl->section->mr == mgr->mr);
> +
> +    ret = memory_attribute_for_each_populated_section(mgr, rdl->section, rdl,
> +                                                      memory_attribute_notify_discard_cb);
> +    if (ret) {
> +        error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
> +                     strerror(-ret));
> +    }
> +
> +    memory_region_section_free_copy(rdl->section);
> +    rdl->section = NULL;
> +    QLIST_REMOVE(rdl, next);
> +
> +}
> +
> +typedef struct MemoryAttributeReplayData {
> +    ReplayRamStateChange fn;
> +    void *opaque;
> +} MemoryAttributeReplayData;
> +
> +static int memory_attribute_rdm_replay_cb(MemoryRegionSection *section, void *arg)
> +{
> +    MemoryAttributeReplayData *data = arg;
> +
> +    return data->fn(section, data->opaque);
> +}
> +
> +static int memory_attribute_rdm_replay_populated(const RamDiscardManager *rdm,
> +                                                 MemoryRegionSection *section,
> +                                                 ReplayRamStateChange replay_fn,
> +                                                 void *opaque)
> +{
> +    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> +    MemoryAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
> +
> +    g_assert(section->mr == mgr->mr);
> +    return memory_attribute_for_each_populated_section(mgr, section, &data,
> +                                                       memory_attribute_rdm_replay_cb);
> +}
> +
> +static int memory_attribute_rdm_replay_discarded(const RamDiscardManager *rdm,
> +                                                 MemoryRegionSection *section,
> +                                                 ReplayRamStateChange replay_fn,
> +                                                 void *opaque)
> +{
> +    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm);
> +    MemoryAttributeReplayData data = { .fn = replay_fn, .opaque = opaque };
> +
> +    g_assert(section->mr == mgr->mr);
> +    return memory_attribute_for_each_discarded_section(mgr, section, &data,
> +                                                       memory_attribute_rdm_replay_cb);
> +}
> +
> +int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr)
> +{
> +    uint64_t shared_bitmap_size;
> +    const int block_size  = qemu_real_host_page_size();
> +    int ret;
> +
> +    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
> +
> +    mgr->mr = mr;
> +    ret = memory_region_set_ram_discard_manager(mgr->mr, RAM_DISCARD_MANAGER(mgr));
> +    if (ret) {
> +        return ret;
> +    }
> +    mgr->shared_bitmap_size = shared_bitmap_size;
> +    mgr->shared_bitmap = bitmap_new(shared_bitmap_size);
> +
> +    return ret;
> +}
> +
> +void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr)
> +{
> +    g_free(mgr->shared_bitmap);
> +    memory_region_set_ram_discard_manager(mgr->mr, NULL);
> +}
> +
> +static void memory_attribute_manager_init(Object *obj)
> +{
> +    MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(obj);
> +
> +    QLIST_INIT(&mgr->rdl_list);
> +}
> +
> +static void memory_attribute_manager_finalize(Object *obj)
> +{
> +}
> +
> +static void memory_attribute_manager_class_init(ObjectClass *oc, void *data)
> +{
> +    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
> +
> +    rdmc->get_min_granularity = memory_attribute_rdm_get_min_granularity;
> +    rdmc->register_listener = memory_attribute_rdm_register_listener;
> +    rdmc->unregister_listener = memory_attribute_rdm_unregister_listener;
> +    rdmc->is_populated = memory_attribute_rdm_is_populated;
> +    rdmc->replay_populated = memory_attribute_rdm_replay_populated;
> +    rdmc->replay_discarded = memory_attribute_rdm_replay_discarded;
> +}

Would this initialization be for
> diff --git a/system/meson.build b/system/meson.build
> index 4952f4b2c7..ab07ff1442 100644
> --- a/system/meson.build
> +++ b/system/meson.build
> @@ -15,6 +15,7 @@ system_ss.add(files(
>     'dirtylimit.c',
>     'dma-helpers.c',
>     'globals.c',
> +  'memory-attribute-manager.c',
>     'memory_mapping.c',
>     'qdev-monitor.c',
>     'qtest.c',


  reply	other threads:[~2025-03-14 12:11 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-10  8:18 [PATCH v3 0/7] Enable shared device assignment Chenyi Qiang
2025-03-10  8:18 ` [PATCH v3 1/7] memory: Export a helper to get intersection of a MemoryRegionSection with a given range Chenyi Qiang
2025-03-10  8:18 ` [PATCH v3 2/7] memory: Change memory_region_set_ram_discard_manager() to return the result Chenyi Qiang
2025-03-10  8:18 ` [PATCH v3 3/7] memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard() Chenyi Qiang
2025-03-17  3:00   ` Kishen Maloor
2025-03-17  4:11     ` Chenyi Qiang
2025-03-10  8:18 ` [PATCH v3 4/7] memory-attribute-manager: Introduce MemoryAttributeManager to manage RAMBLock with guest_memfd Chenyi Qiang
2025-03-14 12:11   ` Gupta, Pankaj [this message]
2025-03-17  2:54     ` Chenyi Qiang
2025-03-17 10:36       ` David Hildenbrand
2025-03-17 17:01         ` Gupta, Pankaj
2025-03-18  1:54           ` Chenyi Qiang
2025-03-19  8:55             ` Gupta, Pankaj
2025-03-19 11:23               ` Chenyi Qiang
2025-03-19 11:56                 ` Gupta, Pankaj
2025-03-20  3:15                   ` Chenyi Qiang
2025-03-24  4:04                     ` Chenyi Qiang
2025-03-10  8:18 ` [PATCH v3 5/7] memory-attribute-manager: Introduce a callback to notify the shared/private state change Chenyi Qiang
2025-03-10  8:18 ` [PATCH v3 6/7] memory: Attach MemoryAttributeManager to guest_memfd-backed RAMBlocks Chenyi Qiang
2025-03-14  8:21   ` Chenyi Qiang
2025-03-14  9:00     ` David Hildenbrand
2025-03-14  9:30       ` Chenyi Qiang
2025-03-14  9:50         ` David Hildenbrand
2025-03-14 10:23           ` Chenyi Qiang
2025-03-31  8:55             ` Chenyi Qiang
2025-03-17  6:18   ` Tony Lindgren
2025-03-17  7:32     ` Chenyi Qiang
2025-03-17  9:45       ` Tony Lindgren
2025-03-17 10:21         ` Chenyi Qiang
2025-03-17 11:07           ` Tony Lindgren
2025-03-10  8:18 ` [PATCH v3 7/7] RAMBlock: Make guest_memfd require coordinate discard Chenyi Qiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2ab368b2-62ca-4163-a483-68e9d332201a@amd.com \
    --to=pankaj.gupta@amd.com \
    --cc=aik@amd.com \
    --cc=chao.gao@intel.com \
    --cc=chao.p.peng@intel.com \
    --cc=chenyi.qiang@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=michael.roth@amd.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yilun.xu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox