qemu-devel.nongnu.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: qemu-devel@nongnu.org
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Si-Wei Liu <si-wei.liu@oracle.com>,
	Joao Martins <joao.m.martins@oracle.com>,
	Jason Wang <jasowang@redhat.com>
Subject: [PULL v2 02/88] vhost: Perform memory section dirty scans once per iteration
Date: Tue, 2 Jul 2024 16:15:10 -0400
Message-ID: <c5cd7e5f230afb56891e3826fbb60f9e2b6c086a.1719951168.git.mst@redhat.com>
In-Reply-To: <cover.1719951168.git.mst@redhat.com>

From: Si-Wei Liu <si-wei.liu@oracle.com>

On setups with one or more virtio-net devices with vhost on,
the cost of each dirty tracking iteration grows with the number
of queues that are set up. For example, when migrating an idle
guest, the following is observed with virtio-net with vhost=on:

48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
1 queue -> 6.89%     [.] vhost_dev_sync_region.isra.13
2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14

With high memory dirty rates the symptom is lack of convergence as
soon as the guest has a vhost device with a sufficiently high number
of queues, or a sufficient number of vhost devices.

On every migration iteration (every 100 msecs) the *shared log* is
redundantly queried as many times as there are queues configured
with vhost in the guest. For the virtqueue data this is necessary,
but not for the memory sections, which are the same for all of
them. So essentially we end up scanning the dirty log too often.
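
As a rough illustration of the redundancy described above, the sketch
below counts memory-region scans for one sync pass when every device
walks every region. It is a simplified model, not QEMU code: the
function name and the `ndevs` and `nregions` parameters are
hypothetical, and it assumes each vhost-enabled queue is backed by its
own device instance taking part in the scan.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical cost model (not QEMU code): number of memory-region
 * scans performed in one dirty-sync pass when every device walks
 * every region of the same shared log.
 */
static size_t region_scans_per_iteration(size_t ndevs, size_t nregions)
{
    size_t scans = 0;

    /* The model is only meaningful with at least one device and region. */
    assert(ndevs > 0 && nregions > 0);

    for (size_t d = 0; d < ndevs; d++) {
        /* Each device repeats the walk over the same regions. */
        for (size_t r = 0; r < nregions; r++) {
            scans++;
        }
    }
    return scans;
}
```

In this model the work grows linearly with the number of devices even
though the regions being scanned never change between devices.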

To fix that, select a vhost device responsible for scanning the
log with regard to memory sections dirty tracking. It is selected
when we enable the logger (during migration) and cleared when we
disable the logger. If the vhost logger device goes away for some
reason, the logger will be re-selected from the remaining vhost
devices.
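
The selection scheme can be sketched outside of QEMU as follows. This
is a minimal model under stated assumptions, not the patch itself:
`struct sketch_dev`, `sketch_elect()` and `sketch_should_log()` are
hypothetical names, a hand-rolled doubly linked list stands in for
QEMU's QLIST macros, and a single list is used instead of one list per
backend type. The head of the list is the elected logger, and new
devices are placed right after the head so the current logger is not
displaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vhost device; not QEMU's struct vhost_dev. */
struct sketch_dev {
    struct sketch_dev *prev, *next;
    bool inserted;
};

/* Head of the logger list; the first entry is the elected logger. */
static struct sketch_dev *log_devs;

/* Only the head of the list scans the memory sections. */
static bool sketch_should_log(const struct sketch_dev *dev)
{
    return dev == log_devs;
}

/* add == true joins the list, add == false leaves it. */
static void sketch_elect(struct sketch_dev *dev, bool add)
{
    assert(dev != NULL);

    if (add && !dev->inserted) {
        if (log_devs == NULL) {
            /* First device becomes the logger. */
            dev->prev = dev->next = NULL;
            log_devs = dev;
        } else {
            /* Insert after the head so the current logger stays elected. */
            dev->prev = log_devs;
            dev->next = log_devs->next;
            if (dev->next) {
                dev->next->prev = dev;
            }
            log_devs->next = dev;
        }
        dev->inserted = true;
    } else if (!add && dev->inserted) {
        if (dev->prev) {
            dev->prev->next = dev->next;
        } else {
            /* The logger itself leaves: its successor takes over. */
            log_devs = dev->next;
        }
        if (dev->next) {
            dev->next->prev = dev->prev;
        }
        dev->prev = dev->next = NULL;
        dev->inserted = false;
    }
}
```

With three zero-initialized devices added in the order a, b, c, the
list becomes a, c, b: a stays the logger while it is present, and c
takes over once a is removed.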

After making the mem-section logger a singleton instance, a constant
cost of 7%-9% (like the 1 queue report) is seen, no matter how many
queues or how many vhost devices are configured:

48 queues -> 8.71%    [.] vhost_dev_sync_region.isra.13
2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14

Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <1710448055-11709-2-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 include/hw/virtio/vhost.h |  1 +
 hw/virtio/vhost.c         | 67 +++++++++++++++++++++++++++++++++++----
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 02477788df..d75faf46e9 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -129,6 +129,7 @@ struct vhost_dev {
     void *opaque;
     struct vhost_log *log;
     QLIST_ENTRY(vhost_dev) entry;
+    QLIST_ENTRY(vhost_dev) logdev_entry;
     QLIST_HEAD(, vhost_iommu) iommu_list;
     IOMMUNotifier n;
     const VhostDevConfigOps *config_ops;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index a1e8b79e1a..06fc71746e 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -45,6 +45,7 @@
 
 static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
 static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
+static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];
 
 /* Memslots used by backends that support private memslots (without an fd). */
 static unsigned int used_memslots;
@@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
     }
 }
 
+static inline bool vhost_dev_should_log(struct vhost_dev *dev)
+{
+    assert(dev->vhost_ops);
+    assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
+    assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
+
+    return dev == QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);
+}
+
+static inline void vhost_dev_elect_mem_logger(struct vhost_dev *hdev, bool add)
+{
+    VhostBackendType backend_type;
+
+    assert(hdev->vhost_ops);
+
+    backend_type = hdev->vhost_ops->backend_type;
+    assert(backend_type > VHOST_BACKEND_TYPE_NONE);
+    assert(backend_type < VHOST_BACKEND_TYPE_MAX);
+
+    if (add && !QLIST_IS_INSERTED(hdev, logdev_entry)) {
+        if (QLIST_EMPTY(&vhost_log_devs[backend_type])) {
+            QLIST_INSERT_HEAD(&vhost_log_devs[backend_type],
+                              hdev, logdev_entry);
+        } else {
+            /*
+             * The first vhost_device in the list is selected as the shared
+             * logger to scan memory sections. Put new entry next to the head
+             * to avoid inadvertent change to the underlying logger device.
+             * This is done in order to get better cache locality and to avoid
+             * performance churn on the hot path for log scanning. Even when
+             * new devices come and go quickly, it wouldn't end up changing
+             * the active leading logger device at all.
+             */
+            QLIST_INSERT_AFTER(QLIST_FIRST(&vhost_log_devs[backend_type]),
+                               hdev, logdev_entry);
+        }
+    } else if (!add && QLIST_IS_INSERTED(hdev, logdev_entry)) {
+        QLIST_REMOVE(hdev, logdev_entry);
+    }
+}
+
 static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
                                    MemoryRegionSection *section,
                                    hwaddr first,
@@ -166,12 +208,14 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
     start_addr = MAX(first, start_addr);
     end_addr = MIN(last, end_addr);
 
-    for (i = 0; i < dev->mem->nregions; ++i) {
-        struct vhost_memory_region *reg = dev->mem->regions + i;
-        vhost_dev_sync_region(dev, section, start_addr, end_addr,
-                              reg->guest_phys_addr,
-                              range_get_last(reg->guest_phys_addr,
-                                             reg->memory_size));
+    if (vhost_dev_should_log(dev)) {
+        for (i = 0; i < dev->mem->nregions; ++i) {
+            struct vhost_memory_region *reg = dev->mem->regions + i;
+            vhost_dev_sync_region(dev, section, start_addr, end_addr,
+                                  reg->guest_phys_addr,
+                                  range_get_last(reg->guest_phys_addr,
+                                                 reg->memory_size));
+        }
     }
     for (i = 0; i < dev->nvqs; ++i) {
         struct vhost_virtqueue *vq = dev->vqs + i;
@@ -383,6 +427,7 @@ static void vhost_log_put(struct vhost_dev *dev, bool sync)
         g_free(log);
     }
 
+    vhost_dev_elect_mem_logger(dev, false);
     dev->log = NULL;
     dev->log_size = 0;
 }
@@ -998,6 +1043,15 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
             goto err_vq;
         }
     }
+
+    /*
+     * At log start we select our vhost_device logger that will scan the
+     * memory sections and skip for the others. This is possible because
+     * the log is shared amongst all vhost devices for a given type of
+     * backend.
+     */
+    vhost_dev_elect_mem_logger(dev, enable_log);
+
     return 0;
 err_vq:
     for (; i >= 0; --i) {
@@ -2075,6 +2129,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
             VHOST_OPS_DEBUG(r, "vhost_set_log_base failed");
             goto fail_log;
         }
+        vhost_dev_elect_mem_logger(hdev, true);
     }
     if (vrings) {
         r = vhost_dev_set_vring_enable(hdev, true);
-- 
MST


