From: "Michael S. Tsirkin" <mst@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
"Eugenio Pérez" <eperezma@redhat.com>,
"Lei Yang" <leiyang@redhat.com>,
"Si-Wei Liu" <si-wei.liu@oracle.com>,
"Jason Wang" <jasowang@redhat.com>,
"Jonah Palmer" <jonah.palmer@oracle.com>,
"Stefano Garzarella" <sgarzare@redhat.com>
Subject: [PULL 30/31] vdpa: move memory listener register to vhost_vdpa_init
Date: Sun, 1 Jun 2025 11:26:03 -0400
Message-ID: <e88eeb089f33c6cb4c177952038c8e2613be7342.1748791463.git.mst@redhat.com>
In-Reply-To: <cover.1748791463.git.mst@redhat.com>
From: Eugenio Pérez <eperezma@redhat.com>
Memory operations like pinning may take a lot of time at the
destination. Currently they are done after the source of the migration is
stopped, and before the workload is resumed at the destination. This is a
period where neither traffic can flow nor can the VM workload continue
(downtime).
We can do better, as we know the memory layout of the guest RAM at the
destination from the moment that all devices are initialized. Moving
that operation allows QEMU to communicate the maps to the kernel
while the workload is still running in the source, so Linux can start
mapping them.
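The listener lifecycle this patch introduces can be sketched as a toy model. This is only an illustration, not QEMU code: ToyAddressSpace, ToyListener and the toy_* functions are hypothetical stand-ins, and map_passes merely counts how often a registration would trigger a pin-and-map pass over guest memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address space; only its identity matters. */
typedef struct {
    const char *name;
} ToyAddressSpace;

/* Hypothetical stand-in for the shared vDPA listener state. */
typedef struct {
    const ToyAddressSpace *as; /* address space currently listened to */
    bool registered;           /* plays the role of listener_registered */
    unsigned map_passes;       /* number of (expensive) pin+map passes */
} ToyListener;

static void toy_register(ToyListener *l, const ToyAddressSpace *as)
{
    assert(!l->registered);
    l->as = as;
    l->registered = true;
    l->map_passes++; /* registration is where the mapping cost is paid */
}

static void toy_unregister(ToyListener *l)
{
    l->as = NULL;
    l->registered = false;
}

/* Init time: optimistically listen to plain guest memory. */
static void toy_init(ToyListener *l, const ToyAddressSpace *guest_mem)
{
    toy_register(l, guest_mem);
}

/* Start time: re-register only if the DMA address space turned out
 * to be a different one (for example, behind a vIOMMU). */
static void toy_dev_start(ToyListener *l, const ToyAddressSpace *dma_as)
{
    if (l->registered && l->as != dma_as) {
        toy_unregister(l);
    }
    if (!l->registered) {
        toy_register(l, dma_as);
    }
}

/* Device reset: drop the listener so the next start begins clean. */
static void toy_reset(ToyListener *l)
{
    toy_unregister(l);
}
```

In this model, a start against the same address space that was registered at init costs no extra mapping pass, which is the case the patch optimizes for.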
As a small drawback, there is a time in the initialization where QEMU
cannot respond to QMP etc. By some testing, this time is about
0.2 seconds. This may be further reduced (or increased) depending on the
vdpa driver and the platform hardware, and it is dominated by the cost
of memory pinning.
This matches the time that we move out of the so-called downtime window.
The downtime is measured as the elapsed trace time between the last
vhost_vdpa_suspend on the source and the last vhost_vdpa_set_vring_enable_one
on the destination. In other words, from "guest CPUs freeze" to the
instant the final Rx/Tx queue-pair is able to start moving data.
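As a rough sketch of that measurement, assuming the trace events of both hosts have already been parsed into (name, timestamp) pairs on a comparable clock: the TraceEvent struct and both helper functions are hypothetical and not part of QEMU's tracing infrastructure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed trace record; t is in seconds on a clock that is
 * assumed to be comparable between source and destination. */
typedef struct {
    const char *name;
    double t;
} TraceEvent;

/* Timestamp of the latest event called `name`, or -1.0 if there is none. */
static double last_timestamp(const TraceEvent *ev, size_t n, const char *name)
{
    double last = -1.0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ev[i].name, name) == 0 && ev[i].t > last) {
            last = ev[i].t;
        }
    }
    return last;
}

/* Downtime as defined above: last suspend on the source to last vring
 * enable on the destination. Returns -1.0 if either event is missing. */
static double downtime_seconds(const TraceEvent *src, size_t n_src,
                               const TraceEvent *dst, size_t n_dst)
{
    double stop = last_timestamp(src, n_src, "vhost_vdpa_suspend");
    double resume = last_timestamp(dst, n_dst,
                                   "vhost_vdpa_set_vring_enable_one");
    if (stop < 0.0 || resume < 0.0) {
        return -1.0;
    }
    return resume - stop;
}
```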
Using ConnectX-6 Dx (MLX5) NICs in vhost-vDPA mode with 8 queue-pairs,
the series reduces guest-visible downtime during back-to-back live
migrations by more than half:
- 39G VM: 4.72s -> 2.09s (-2.63s, ~56% improvement)
- 128G VM: 14.72s -> 5.83s (-8.89s, ~60% improvement)
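The percentages follow directly from the before/after figures. A minimal check, where reduction_percent is a hypothetical helper written only for this illustration:

```c
#include <assert.h>

/* Relative reduction of `after` with respect to `before`, in percent. */
static double reduction_percent(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

With the numbers above, reduction_percent(4.72, 2.09) is about 55.7 and reduction_percent(14.72, 5.83) is about 60.4, consistent with the rounded ~56% and ~60%.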
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20250522145839.59974-8-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
hw/virtio/vhost-vdpa.c | 35 ++++++++++++++++++++++++++++-------
1 file changed, 28 insertions(+), 7 deletions(-)
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index de834f2ebd..e20da95f30 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -894,8 +894,14 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
     ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
     trace_vhost_vdpa_reset_device(dev);
+    if (ret) {
+        return ret;
+    }
+
+    memory_listener_unregister(&v->shared->listener);
+    v->shared->listener_registered = false;
     v->suspended = false;
-    return ret;
+    return 0;
 }
 static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
@@ -1379,6 +1385,11 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
                          "IOMMU and try again");
             return -1;
         }
+        if (v->shared->listener_registered &&
+            dev->vdev->dma_as != v->shared->listener.address_space) {
+            memory_listener_unregister(&v->shared->listener);
+            v->shared->listener_registered = false;
+        }
         if (!v->shared->listener_registered) {
             memory_listener_register(&v->shared->listener, dev->vdev->dma_as);
             v->shared->listener_registered = true;
@@ -1392,8 +1403,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 static void vhost_vdpa_reset_status(struct vhost_dev *dev)
 {
-    struct vhost_vdpa *v = dev->opaque;
-
     if (!vhost_vdpa_last_dev(dev)) {
         return;
     }
@@ -1401,9 +1410,6 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev)
     vhost_vdpa_reset_device(dev);
     vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
                                VIRTIO_CONFIG_S_DRIVER);
-    memory_listener_unregister(&v->shared->listener);
-    v->shared->listener_registered = false;
-
 }
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
@@ -1537,12 +1543,27 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
 static int vhost_vdpa_set_owner(struct vhost_dev *dev)
 {
+    int r;
+    struct vhost_vdpa *v;
+
     if (!vhost_vdpa_first_dev(dev)) {
         return 0;
     }
     trace_vhost_vdpa_set_owner(dev);
-    return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
+    r = vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
+    if (unlikely(r < 0)) {
+        return r;
+    }
+
+    /*
+     * Being optimistic and listening address space memory. If the device
+     * uses vIOMMU, it is changed at vhost_vdpa_dev_start.
+     */
+    v = dev->opaque;
+    memory_listener_register(&v->shared->listener, &address_space_memory);
+    v->shared->listener_registered = true;
+    return 0;
 }
 static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
--
MST