From: Chenyi Qiang <chenyi.qiang@intel.com>
To: "David Hildenbrand" <david@redhat.com>,
	"Alexey Kardashevskiy" <aik@amd.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Gupta Pankaj" <pankaj.gupta@amd.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Michael Roth" <michael.roth@amd.com>
Cc: Chenyi Qiang <chenyi.qiang@intel.com>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org,
	Williams Dan J <dan.j.williams@intel.com>,
	Peng Chao P <chao.p.peng@intel.com>,
	Gao Chao <chao.gao@intel.com>, Xu Yilun <yilun.xu@intel.com>,
	Li Xiaoyao <xiaoyao.li@intel.com>
Subject: [PATCH v4 08/13] ram-block-attribute: Introduce a callback to notify shared/private state changes
Date: Mon,  7 Apr 2025 15:49:28 +0800
Message-ID: <20250407074939.18657-9-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>

A new state_change() callback is introduced in PrivateSharedManagerClass
to efficiently notify all registered PrivateSharedListeners, including
VFIO listeners, about memory conversion events in guest_memfd. The VFIO
listener can then dynamically DMA map/unmap shared pages based on the
conversion type:
- For conversions from shared to private, the VFIO system ensures the
  shared mapping is discarded from the IOMMU.
- For conversions from private to shared, it triggers the population of
  the shared mapping into the IOMMU.
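
For illustration only (the actual VFIO wiring lands in a separate patch
of this series), a listener could hook into these notifications roughly
as follows. This is a hedged sketch: the demo_* callback names and
bodies are hypothetical, and only the private_shared_listener_init()
shape visible in this patch is assumed.

    /* Hypothetical callbacks; a real backend would map/unmap DMA here. */
    static int demo_notify_shared(StateChangeListener *scl,
                                  MemoryRegionSection *section)
    {
        /* DMA map the now-shared section; a failure triggers rollback. */
        return 0;
    }

    static void demo_notify_private(StateChangeListener *scl,
                                    MemoryRegionSection *section)
    {
        /* DMA unmap the now-private section; not expected to fail. */
    }

    PrivateSharedListener psl;
    private_shared_listener_init(&psl, demo_notify_shared,
                                 demo_notify_private);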

Additionally, special conversion requests are handled as follows:
- If a conversion request is made for a page already in the desired
  state, the helper simply returns success.
- For requests covering a range only partially in the desired state,
  only the necessary segments are converted, so blocks already in the
  desired state are left untouched. In this case, fall back to
  "1 block at a time" handling so that the range passed to
  notify_to_private/shared() is always in the desired state.
- If a conversion request is declined by another subsystem, such as a
  failure from VFIO during notify_to_shared(), the helper rolls back
  the request to keep the state consistent. For notify_to_private(),
  a failure in VFIO is unexpected, so no error check is performed.
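
For context, a minimal caller sketch of the new helper (the mgr lookup
is hypothetical; in-tree callers are added later in the series):

    /* Request a private -> shared conversion of [offset, offset + size). */
    int ret = private_shared_manager_state_change(mgr, offset, size,
                                                  false /* to_private */);
    if (ret) {
        /*
         * A listener (e.g. VFIO) declined the population; the helper
         * has already rolled the shared bitmap back, so the range
         * remains private.
         */
    }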

Note that the bitmap is updated before the callbacks are invoked, so
listeners can handle the memory based on the latest state.

Opportunistically, introduce a helper to trigger the state_change()
callback of the class.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
Changes in v4:
    - Add the state_change() callback in PrivateSharedManagerClass
      instead of the RamBlockAttribute.

Changes in v3:
    - Move the bitmap update before notifier callbacks.
    - Call the notifier callbacks directly in notify_discard/populate()
      with the expectation that the request memory range is in the
      desired attribute.
    - For the case where only a partial range is in the desired state,
      handle the range at block_size granularity for ease of rollback
      (https://lore.kernel.org/qemu-devel/812768d7-a02d-4b29-95f3-fb7a125cf54e@redhat.com/)

Changes in v2:
    - Adjust alignment following the rename to MemoryAttributeManager
    - Move the state_change() helper definition into this patch.
---
 include/exec/memory.h        |   7 ++
 system/memory.c              |  10 ++
 system/ram-block-attribute.c | 191 +++++++++++++++++++++++++++++++++++
 3 files changed, 208 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 08f25e5e84..a61896251c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -780,6 +780,9 @@ struct PrivateSharedListener {
 struct PrivateSharedManagerClass {
     /* private */
     GenericStateManagerClass parent_class;
+
+    int (*state_change)(PrivateSharedManager *mgr, uint64_t offset,
+                        uint64_t size, bool to_private);
 };
 
 static inline void private_shared_listener_init(PrivateSharedListener *psl,
@@ -789,6 +792,10 @@ static inline void private_shared_listener_init(PrivateSharedListener *psl,
     state_change_listener_init(&psl->scl, populate_fn, discard_fn);
 }
 
+int private_shared_manager_state_change(PrivateSharedManager *mgr,
+                                        uint64_t offset, uint64_t size,
+                                        bool to_private);
+
 /**
  * memory_get_xlat_addr: Extract addresses from a TLB entry
  *
diff --git a/system/memory.c b/system/memory.c
index e6e944d9c0..2f6eaf6314 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2206,6 +2206,16 @@ void generic_state_manager_unregister_listener(GenericStateManager *gsm,
     gsmc->unregister_listener(gsm, scl);
 }
 
+int private_shared_manager_state_change(PrivateSharedManager *mgr,
+                                        uint64_t offset, uint64_t size,
+                                        bool to_private)
+{
+    PrivateSharedManagerClass *psmc = PRIVATE_SHARED_MANAGER_GET_CLASS(mgr);
+
+    g_assert(psmc->state_change);
+    return psmc->state_change(mgr, offset, size, to_private);
+}
+
 /* Called with rcu_read_lock held.  */
 bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
                           ram_addr_t *ram_addr, bool *read_only,
diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
index 283c03b354..06ed134cda 100644
--- a/system/ram-block-attribute.c
+++ b/system/ram-block-attribute.c
@@ -233,6 +233,195 @@ static int ram_block_attribute_psm_replay_on_private(const GenericStateManager *
                                                         ram_block_attribute_psm_replay_cb);
 }
 
+static bool ram_block_attribute_is_valid_range(RamBlockAttribute *attr,
+                                               uint64_t offset, uint64_t size)
+{
+    MemoryRegion *mr = attr->mr;
+
+    g_assert(mr);
+
+    uint64_t region_size = memory_region_size(mr);
+    int block_size = ram_block_attribute_get_block_size(attr);
+
+    if (!QEMU_IS_ALIGNED(offset, block_size)) {
+        return false;
+    }
+    if (offset + size < offset || !size) {
+        return false;
+    }
+    if (offset >= region_size || offset + size > region_size) {
+        return false;
+    }
+    return true;
+}
+
+static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr,
+                                                  uint64_t offset, uint64_t size)
+{
+    PrivateSharedListener *psl;
+
+    QLIST_FOREACH(psl, &attr->psl_list, next) {
+        StateChangeListener *scl = &psl->scl;
+        MemoryRegionSection tmp = *scl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+        scl->notify_to_state_clear(scl, &tmp);
+    }
+}
+
+static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr,
+                                                uint64_t offset, uint64_t size)
+{
+    PrivateSharedListener *psl, *psl2;
+    int ret = 0;
+
+    QLIST_FOREACH(psl, &attr->psl_list, next) {
+        StateChangeListener *scl = &psl->scl;
+        MemoryRegionSection tmp = *scl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+        ret = scl->notify_to_state_set(scl, &tmp);
+        if (ret) {
+            break;
+        }
+    }
+
+    if (ret) {
+        /* Roll back the listeners that have already been notified. */
+        QLIST_FOREACH(psl2, &attr->psl_list, next) {
+            StateChangeListener *scl2 = &psl2->scl;
+            MemoryRegionSection tmp = *scl2->section;
+
+            if (psl == psl2) {
+                break;
+            }
+            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+                continue;
+            }
+            scl2->notify_to_state_clear(scl2, &tmp);
+        }
+    }
+    return ret;
+}
+
+static bool ram_block_attribute_is_range_shared(RamBlockAttribute *attr,
+                                                uint64_t offset, uint64_t size)
+{
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    const unsigned long first_bit = offset / block_size;
+    const unsigned long last_bit = first_bit + (size / block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
+    found_bit = find_next_zero_bit(attr->shared_bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+static bool ram_block_attribute_is_range_private(RamBlockAttribute *attr,
+                                                uint64_t offset, uint64_t size)
+{
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    const unsigned long first_bit = offset / block_size;
+    const unsigned long last_bit = first_bit + (size / block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
+    found_bit = find_next_bit(attr->shared_bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+static int ram_block_attribute_psm_state_change(PrivateSharedManager *mgr, uint64_t offset,
+                                                uint64_t size, bool to_private)
+{
+    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(mgr);
+    const int block_size = ram_block_attribute_get_block_size(attr);
+    const unsigned long first_bit = offset / block_size;
+    const unsigned long nbits = size / block_size;
+    const uint64_t end = offset + size;
+    unsigned long bit;
+    uint64_t cur;
+    int ret = 0;
+
+    if (!ram_block_attribute_is_valid_range(attr, offset, size)) {
+        error_report("%s, invalid range: offset 0x%" PRIx64 ", size 0x%"
+                     PRIx64, __func__, offset, size);
+        return -1;
+    }
+
+    if (to_private) {
+        if (ram_block_attribute_is_range_private(attr, offset, size)) {
+            /* Already private */
+        } else if (!ram_block_attribute_is_range_shared(attr, offset, size)) {
+            /* Unexpected mixture: process individual blocks */
+            for (cur = offset; cur < end; cur += block_size) {
+                bit = cur / block_size;
+                if (!test_bit(bit, attr->shared_bitmap)) {
+                    continue;
+                }
+                clear_bit(bit, attr->shared_bitmap);
+                ram_block_attribute_notify_to_private(attr, cur, block_size);
+            }
+        } else {
+            /* Completely shared */
+            bitmap_clear(attr->shared_bitmap, first_bit, nbits);
+            ram_block_attribute_notify_to_private(attr, offset, size);
+        }
+    } else {
+        if (ram_block_attribute_is_range_shared(attr, offset, size)) {
+            /* Already shared */
+        } else if (!ram_block_attribute_is_range_private(attr, offset, size)) {
+            /* Unexpected mixture: process individual blocks */
+            unsigned long *modified_bitmap = bitmap_new(nbits);
+
+            for (cur = offset; cur < end; cur += block_size) {
+                bit = cur / block_size;
+                if (test_bit(bit, attr->shared_bitmap)) {
+                    continue;
+                }
+                set_bit(bit, attr->shared_bitmap);
+                ret = ram_block_attribute_notify_to_shared(attr, cur, block_size);
+                if (!ret) {
+                    set_bit(bit - first_bit, modified_bitmap);
+                    continue;
+                }
+                clear_bit(bit, attr->shared_bitmap);
+                break;
+            }
+
+            if (ret) {
+                /*
+                 * Very unexpected: something went wrong. Revert to the old
+                 * state, marking only the blocks as private that we converted
+                 * to shared.
+                 */
+                for (cur = offset; cur < end; cur += block_size) {
+                    bit = cur / block_size;
+                    if (!test_bit(bit - first_bit, modified_bitmap)) {
+                        continue;
+                    }
+                    assert(test_bit(bit, attr->shared_bitmap));
+                    clear_bit(bit, attr->shared_bitmap);
+                    ram_block_attribute_notify_to_private(attr, cur, block_size);
+                }
+            }
+            g_free(modified_bitmap);
+        } else {
+            /* Completely private */
+            bitmap_set(attr->shared_bitmap, first_bit, nbits);
+            ret = ram_block_attribute_notify_to_shared(attr, offset, size);
+            if (ret) {
+                bitmap_clear(attr->shared_bitmap, first_bit, nbits);
+            }
+        }
+    }
+
+    return ret;
+}
+
 int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr)
 {
     uint64_t shared_bitmap_size;
@@ -272,6 +461,7 @@ static void ram_block_attribute_finalize(Object *obj)
 static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
 {
     GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc);
+    PrivateSharedManagerClass *psmc = PRIVATE_SHARED_MANAGER_CLASS(oc);
 
     gsmc->get_min_granularity = ram_block_attribute_psm_get_min_granularity;
     gsmc->register_listener = ram_block_attribute_psm_register_listener;
@@ -279,4 +469,5 @@ static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
     gsmc->is_state_set = ram_block_attribute_psm_is_shared;
     gsmc->replay_on_state_set = ram_block_attribute_psm_replay_on_shared;
     gsmc->replay_on_state_clear = ram_block_attribute_psm_replay_on_private;
+    psmc->state_change = ram_block_attribute_psm_state_change;
 }
-- 
2.43.5


