From: Alexey Kardashevskiy <aik@amd.com>
To: "Chenyi Qiang" <chenyi.qiang@intel.com>,
"David Hildenbrand" <david@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Michael Roth" <michael.roth@amd.com>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org,
Williams Dan J <dan.j.williams@intel.com>,
Peng Chao P <chao.p.peng@intel.com>,
Gao Chao <chao.gao@intel.com>, Xu Yilun <yilun.xu@intel.com>,
Li Xiaoyao <xiaoyao.li@intel.com>
Subject: Re: [PATCH v2 4/6] memory-attribute-manager: Introduce a callback to notify the shared/private state change
Date: Tue, 18 Feb 2025 20:19:29 +1100 [thread overview]
Message-ID: <9a8fe1a7-528d-466a-a72d-89ceb88f47fb@amd.com> (raw)
In-Reply-To: <20250217081833.21568-5-chenyi.qiang@intel.com>
On 17/2/25 19:18, Chenyi Qiang wrote:
> Introduce a new state_change() callback in MemoryAttributeManagerClass to
> efficiently notify all registered RamDiscardListeners, including VFIO
> listeners about the memory conversion events in guest_memfd. The
> existing VFIO listener can dynamically DMA map/unmap the shared pages
> based on conversion types:
> - For conversions from shared to private, the VFIO system ensures the
> discarding of shared mapping from the IOMMU.
> - For conversions from private to shared, it triggers the population of
> the shared mapping into the IOMMU.
>
> Additionally, there could be some special conversion requests:
> - When a conversion request is made for a page already in the desired
> state, the helper simply returns success.
> - For requests involving a range partially in the desired state, only
> the necessary segments are converted, ensuring the entire range
> complies with the request efficiently.
> - In scenarios where a conversion request is declined by other systems,
> such as a failure from VFIO during notify_populate(), the helper will
> roll back the request, maintaining consistency.
>
> Opportunistically introduce a helper to trigger the state_change()
> callback of the class.
>
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
> Changes in v2:
> - Do the alignment changes due to the rename to MemoryAttributeManager
> - Move the state_change() helper definition in this patch.
> ---
> include/system/memory-attribute-manager.h | 20 +++
> system/memory-attribute-manager.c | 148 ++++++++++++++++++++++
> 2 files changed, 168 insertions(+)
>
> diff --git a/include/system/memory-attribute-manager.h b/include/system/memory-attribute-manager.h
> index 72adc0028e..c3dab4e47b 100644
> --- a/include/system/memory-attribute-manager.h
> +++ b/include/system/memory-attribute-manager.h
> @@ -34,8 +34,28 @@ struct MemoryAttributeManager {
>
> struct MemoryAttributeManagerClass {
> ObjectClass parent_class;
> +
> + int (*state_change)(MemoryAttributeManager *mgr, uint64_t offset, uint64_t size,
> + bool shared_to_private);
> };
>
> +static inline int memory_attribute_manager_state_change(MemoryAttributeManager *mgr, uint64_t offset,
> + uint64_t size, bool shared_to_private)
> +{
> + MemoryAttributeManagerClass *klass;
> +
> + if (mgr == NULL) {
> + return 0;
> + }
> +
> + klass = MEMORY_ATTRIBUTE_MANAGER_GET_CLASS(mgr);
> + if (klass->state_change) {
> + return klass->state_change(mgr, offset, size, shared_to_private);
> + }
> +
> + return 0;
nit: a MemoryAttributeManagerClass that leaves this (its only) callback
undefined should produce some error imho, or at least assert, rather
than silently return success.
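Something along these lines, just a sketch with stand-in types (not the
real QEMU declarations) to show the assert-on-missing-callback idea:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types, hypothetical, only to illustrate the suggestion. */
typedef struct Mgr Mgr;
typedef struct MgrClass {
    int (*state_change)(Mgr *mgr, uint64_t offset, uint64_t size,
                        bool to_private);
} MgrClass;
struct Mgr {
    const MgrClass *klass;
};

static int mgr_state_change(Mgr *mgr, uint64_t offset, uint64_t size,
                            bool to_private)
{
    if (!mgr) {
        return 0;   /* no manager installed: nothing to do */
    }
    /* A class that is instantiated but never sets its only callback is
     * a programming error; catch it loudly instead of returning 0. */
    assert(mgr->klass && mgr->klass->state_change);
    return mgr->klass->state_change(mgr, offset, size, to_private);
}

static int demo_state_change(Mgr *mgr, uint64_t offset, uint64_t size,
                             bool to_private)
{
    (void)mgr; (void)offset; (void)size;
    return to_private ? 1 : 0;   /* distinguishable value for the demo */
}
```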
> +}
> +
> int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr);
> void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr);
>
> diff --git a/system/memory-attribute-manager.c b/system/memory-attribute-manager.c
> index ed97e43dd0..17c70cf677 100644
> --- a/system/memory-attribute-manager.c
> +++ b/system/memory-attribute-manager.c
> @@ -241,6 +241,151 @@ static void memory_attribute_rdm_replay_discarded(const RamDiscardManager *rdm,
> memory_attribute_rdm_replay_discarded_cb);
> }
>
> +static bool memory_attribute_is_valid_range(MemoryAttributeManager *mgr,
> + uint64_t offset, uint64_t size)
> +{
> + MemoryRegion *mr = mgr->mr;
> +
> + g_assert(mr);
> +
> + uint64_t region_size = memory_region_size(mr);
> + int block_size = memory_attribute_manager_get_block_size(mgr);
> +
> + if (!QEMU_IS_ALIGNED(offset, block_size)) {
> + return false;
> + }
> + if (offset + size < offset || !size) {
> + return false;
> + }
> + if (offset >= region_size || offset + size > region_size) {
> + return false;
> + }
> + return true;
> +}
> +
> +static void memory_attribute_notify_discard(MemoryAttributeManager *mgr,
> + uint64_t offset, uint64_t size)
> +{
> + RamDiscardListener *rdl;
> +
> + QLIST_FOREACH(rdl, &mgr->rdl_list, next) {
> + MemoryRegionSection tmp = *rdl->section;
> +
> + if (!memory_region_section_intersect_range(&tmp, offset, size)) {
> + continue;
> + }
> +
> + memory_attribute_for_each_populated_section(mgr, &tmp, rdl,
> + memory_attribute_notify_discard_cb);
> + }
> +}
> +
> +static int memory_attribute_notify_populate(MemoryAttributeManager *mgr,
> + uint64_t offset, uint64_t size)
> +{
> + RamDiscardListener *rdl, *rdl2;
> + int ret = 0;
> +
> + QLIST_FOREACH(rdl, &mgr->rdl_list, next) {
> + MemoryRegionSection tmp = *rdl->section;
> +
> + if (!memory_region_section_intersect_range(&tmp, offset, size)) {
> + continue;
> + }
> +
> + ret = memory_attribute_for_each_discarded_section(mgr, &tmp, rdl,
> + memory_attribute_notify_populate_cb);
> + if (ret) {
> + break;
> + }
> + }
> +
> + if (ret) {
> + /* Notify all already-notified listeners. */
> + QLIST_FOREACH(rdl2, &mgr->rdl_list, next) {
> + MemoryRegionSection tmp = *rdl2->section;
> +
> + if (rdl2 == rdl) {
> + break;
> + }
> + if (!memory_region_section_intersect_range(&tmp, offset, size)) {
> + continue;
> + }
> +
> + memory_attribute_for_each_discarded_section(mgr, &tmp, rdl2,
> + memory_attribute_notify_discard_cb);
> + }
> + }
> + return ret;
> +}
> +
> +static bool memory_attribute_is_range_populated(MemoryAttributeManager *mgr,
> + uint64_t offset, uint64_t size)
> +{
> + int block_size = memory_attribute_manager_get_block_size(mgr);
> + const unsigned long first_bit = offset / block_size;
> + const unsigned long last_bit = first_bit + (size / block_size) - 1;
> + unsigned long found_bit;
> +
> + /* We fake a shorter bitmap to avoid searching too far. */
> + found_bit = find_next_zero_bit(mgr->shared_bitmap, last_bit + 1, first_bit);
> + return found_bit > last_bit;
> +}
> +
> +static bool memory_attribute_is_range_discarded(MemoryAttributeManager *mgr,
> + uint64_t offset, uint64_t size)
> +{
> + int block_size = memory_attribute_manager_get_block_size(mgr);
> + const unsigned long first_bit = offset / block_size;
> + const unsigned long last_bit = first_bit + (size / block_size) - 1;
> + unsigned long found_bit;
> +
> + /* We fake a shorter bitmap to avoid searching too far. */
This comment reads oddly imho - why "fake"? You are checking whether all
pages within [offset, offset+size) are discarded, and you do not want to
search beyond the end of that range anyway, right? Passing last_bit + 1
as the size is just the normal way to bound the search.
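i.e. nothing about the bitmap is faked; the "size" argument already
bounds the search. A standalone sketch (demo helper, not the real
kernel/QEMU implementation):

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Minimal find_next_bit(addr, size, offset): index of the first set bit
 * in [offset, size), or size if none - same contract as the real
 * helper, written out here only for the demo. */
static unsigned long demo_find_next_bit(const unsigned long *addr,
                                        unsigned long size,
                                        unsigned long offset)
{
    unsigned long i;
    for (i = offset; i < size; i++) {
        if (addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))) {
            return i;
        }
    }
    return size;
}

/* The pattern under discussion: passing last_bit + 1 as "size" simply
 * restricts the search to [first_bit, last_bit]. */
static bool range_is_clear(const unsigned long *bitmap,
                           unsigned long first_bit, unsigned long last_bit)
{
    return demo_find_next_bit(bitmap, last_bit + 1, first_bit) > last_bit;
}
```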
> + found_bit = find_next_bit(mgr->shared_bitmap, last_bit + 1, first_bit);
> + return found_bit > last_bit;
> +}
> +
> +static int memory_attribute_state_change(MemoryAttributeManager *mgr, uint64_t offset,
> + uint64_t size, bool shared_to_private)
Elsewhere this parameter is called just "to_private"; worth keeping the
naming consistent.
> +{
> + int block_size = memory_attribute_manager_get_block_size(mgr);
> + int ret = 0;
> +
> + if (!memory_attribute_is_valid_range(mgr, offset, size)) {
> + error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
> + __func__, offset, size);
> + return -1;
> + }
> +
> + if ((shared_to_private && memory_attribute_is_range_discarded(mgr, offset, size)) ||
> + (!shared_to_private && memory_attribute_is_range_populated(mgr, offset, size))) {
> + return 0;
> + }
> +
> + if (shared_to_private) {
> + memory_attribute_notify_discard(mgr, offset, size);
> + } else {
> + ret = memory_attribute_notify_populate(mgr, offset, size);
> + }
> +
> + if (!ret) {
> + unsigned long first_bit = offset / block_size;
> + unsigned long nbits = size / block_size;
> +
> + g_assert((first_bit + nbits) <= mgr->bitmap_size);
> +
> + if (shared_to_private) {
> + bitmap_clear(mgr->shared_bitmap, first_bit, nbits);
> + } else {
> + bitmap_set(mgr->shared_bitmap, first_bit, nbits);
> + }
> +
> + return 0;
This return is not needed - the function falls through to "return ret"
with ret == 0 here anyway. Thanks,
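i.e. the tail of the function could simply be (untested sketch, reusing
the patch's own identifiers):

```c
    if (!ret) {
        unsigned long first_bit = offset / block_size;
        unsigned long nbits = size / block_size;

        g_assert((first_bit + nbits) <= mgr->bitmap_size);

        if (shared_to_private) {
            bitmap_clear(mgr->shared_bitmap, first_bit, nbits);
        } else {
            bitmap_set(mgr->shared_bitmap, first_bit, nbits);
        }
    }

    return ret;
```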
> + }
> +
> + return ret;
> +}
> +
> int memory_attribute_manager_realize(MemoryAttributeManager *mgr, MemoryRegion *mr)
> {
> uint64_t bitmap_size;
> @@ -281,8 +426,11 @@ static void memory_attribute_manager_finalize(Object *obj)
>
> static void memory_attribute_manager_class_init(ObjectClass *oc, void *data)
> {
> + MemoryAttributeManagerClass *mamc = MEMORY_ATTRIBUTE_MANAGER_CLASS(oc);
> RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
>
> + mamc->state_change = memory_attribute_state_change;
> +
> rdmc->get_min_granularity = memory_attribute_rdm_get_min_granularity;
> rdmc->register_listener = memory_attribute_rdm_register_listener;
> rdmc->unregister_listener = memory_attribute_rdm_unregister_listener;
--
Alexey