* [RFC PATCH 0/5] virtio-net: Introduce LM early load
@ 2023-09-18 4:49 Yajun Wu
2023-09-18 4:49 ` [RFC PATCH 1/5] vhost-user: Add presetup protocol feature and op Yajun Wu
` (6 more replies)
0 siblings, 7 replies; 14+ messages in thread
From: Yajun Wu @ 2023-09-18 4:49 UTC (permalink / raw)
To: qemu-devel, jasowang, mst, yajunw
This series of patches aims to minimize the downtime during live migration of a
virtio-net device with a vhost-user backend. In the case of a hardware virtual
Data Path Acceleration (vDPA) implementation, the hardware configuration, which
includes tasks like VQ creation and RSS setting, may take more than 200 ms. This
significantly increases the downtime of the VM, particularly for networking.
To reduce VM downtime, the proposed approach captures the basic device
state/configuration during the VM's running stage and performs the initial
device configuration (presetup). During the normal configuration process, when
the VM is in the stopped state, the second configuration is compared to the
first one and only the differences are applied, reducing downtime. Ideally,
only the vring available index needs to be changed while the VM is stopped.
This feature is disabled by default, because backends such as DPDK also need
to add support for the new vhost message. The new device property
"x-early-migration" enables this feature.
1. Register a new vmstate for virtio-net with an early_setup flag to send the
device state during migration setup.
2. After the device state is loaded on the destination VM, the device
configuration needs to be sent to the vhost backend in a new way. Introduce a
new vhost-user message, VHOST_USER_PRESETUP, to notify the backend of presetup.
3. Let virtio-net, vhost-net, vhost-dev support presetup. Main flow:
a. vhost-dev sending presetup start.
b. virtio-net setting mtu.
c. vhost-dev sending vring configuration and setting dummy call/kick fd.
d. vhost-net sending vring enable.
e. vhost-dev sending presetup end.
TODOs:
======
- No vhost-vdpa/kernel support. Need to discuss/design new kernel interface
if there's same requirement for vhost-vdpa.
- No vIOMMU support so far. If there is a need for vIOMMU support, it is
planned to be addressed in a follow-up patchset.
Test:
=====
- Live migrated a VM with 2 virtio-net devices, together with DPDK patch [1];
ping recovers.
- The time consumption of DPDK function dev_conf is reduced from 191.4 ms
to 6.6 ms.
References:
===========
[1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37
Any comments or feedback are highly appreciated.
Thanks,
Yajun
Yajun Wu (5):
vhost-user: Add presetup protocol feature and op
vhost: Add support for presetup
vhost-net: Add support for presetup
virtio: Add VMState for early load
virtio-net: Introduce LM early load
docs/interop/vhost-user.rst | 10 ++
hw/net/trace-events | 1 +
hw/net/vhost_net.c | 40 +++++++
hw/net/virtio-net.c | 100 ++++++++++++++++++
hw/virtio/vhost-user.c | 30 ++++++
hw/virtio/vhost.c | 166 +++++++++++++++++++++++++-----
hw/virtio/virtio.c | 152 ++++++++++++++++-----------
include/hw/virtio/vhost-backend.h | 3 +
include/hw/virtio/vhost.h | 12 +++
include/hw/virtio/virtio-net.h | 1 +
include/hw/virtio/virtio.h | 10 +-
include/net/vhost_net.h | 3 +
12 files changed, 443 insertions(+), 85 deletions(-)
--
2.27.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [RFC PATCH 1/5] vhost-user: Add presetup protocol feature and op
2023-09-18 4:49 [RFC PATCH 0/5] virtio-net: Introduce LM early load Yajun Wu
@ 2023-09-18 4:49 ` Yajun Wu
2023-09-18 4:49 ` [RFC PATCH 2/5] vhost: Add support for presetup Yajun Wu
` (5 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Yajun Wu @ 2023-09-18 4:49 UTC (permalink / raw)
To: qemu-devel, jasowang, mst, yajunw; +Cc: Avihai Horon, Jiri Pirko
This patch implements the VHOST_USER_PROTOCOL_F_PRESETUP protocol feature
and the VHOST_USER_PRESETUP message, so that the backend can know the
beginning and completion of the early setup phase for the virtio device.
Unlike the regular device state load, which occurs in the VM stop
phase, this pre-setup takes place in the live migration setup stage.
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
docs/interop/vhost-user.rst | 10 ++++++++++
hw/virtio/vhost-user.c | 30 ++++++++++++++++++++++++++++++
include/hw/virtio/vhost-backend.h | 3 +++
3 files changed, 43 insertions(+)
diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 5a070adbc1..70b8e2694c 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -885,6 +885,7 @@ Protocol features
#define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15
#define VHOST_USER_PROTOCOL_F_STATUS 16
#define VHOST_USER_PROTOCOL_F_XEN_MMAP 17
+ #define VHOST_USER_PROTOCOL_F_PRESETUP 18
Front-end message types
-----------------------
@@ -1440,6 +1441,15 @@ Front-end message types
query the back-end for its device status as defined in the Virtio
specification.
+``VHOST_USER_PRESETUP``
+ :id: 41
+ :equivalent ioctl: N/A
+ :request payload: ``u64``
+ :reply payload: N/A
+
+ When the ``VHOST_USER_PROTOCOL_F_PRESETUP`` protocol feature has been
+ successfully negotiated, this message is submitted by the front-end to
+ indicate the start or the end of early setup. Value 1 means start, 2 means end.
Back-end message types
----------------------
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8dcf049d42..71018d06c1 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -74,6 +74,8 @@ enum VhostUserProtocolFeature {
/* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
VHOST_USER_PROTOCOL_F_STATUS = 16,
+ /* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
+ VHOST_USER_PROTOCOL_F_PRESETUP = 18,
VHOST_USER_PROTOCOL_F_MAX
};
@@ -121,6 +123,7 @@ typedef enum VhostUserRequest {
VHOST_USER_REM_MEM_REG = 38,
VHOST_USER_SET_STATUS = 39,
VHOST_USER_GET_STATUS = 40,
+ VHOST_USER_PRESETUP = 41,
VHOST_USER_MAX
} VhostUserRequest;
@@ -132,6 +135,11 @@ typedef enum VhostUserBackendRequest {
VHOST_USER_BACKEND_MAX
} VhostUserBackendRequest;
+typedef enum VhostUserPresetupState {
+ VHOST_USER_PRESETUP_START = 1,
+ VHOST_USER_PRESETUP_END = 2,
+} VhostUserPresetupState;
+
typedef struct VhostUserMemoryRegion {
uint64_t guest_phys_addr;
uint64_t memory_size;
@@ -2741,6 +2749,27 @@ static void vhost_user_reset_status(struct vhost_dev *dev)
}
}
+static int vhost_user_set_presetup_state(struct vhost_dev *dev, bool start)
+{
+ if (start) {
+ return vhost_user_set_u64(dev, VHOST_USER_PRESETUP,
+ VHOST_USER_PRESETUP_START, false);
+ } else {
+ return vhost_user_set_u64(dev, VHOST_USER_PRESETUP,
+ VHOST_USER_PRESETUP_END, false);
+ }
+}
+
+static int vhost_user_presetup(struct vhost_dev *dev, bool start)
+{
+ if (!virtio_has_feature(dev->protocol_features,
+ VHOST_USER_PROTOCOL_F_PRESETUP)) {
+ return -ENOTSUP;
+ }
+
+ return vhost_user_set_presetup_state(dev, start);
+}
+
const VhostOps user_ops = {
.backend_type = VHOST_BACKEND_TYPE_USER,
.vhost_backend_init = vhost_user_backend_init,
@@ -2777,4 +2806,5 @@ const VhostOps user_ops = {
.vhost_set_inflight_fd = vhost_user_set_inflight_fd,
.vhost_dev_start = vhost_user_dev_start,
.vhost_reset_status = vhost_user_reset_status,
+ .vhost_presetup = vhost_user_presetup,
};
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 31a251a9f5..00dd532df9 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -133,6 +133,8 @@ typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
+typedef int (*vhost_presetup_op)(struct vhost_dev *dev, bool start);
+
typedef struct VhostOps {
VhostBackendType backend_type;
vhost_backend_init vhost_backend_init;
@@ -181,6 +183,7 @@ typedef struct VhostOps {
vhost_force_iommu_op vhost_force_iommu;
vhost_set_config_call_op vhost_set_config_call;
vhost_reset_status_op vhost_reset_status;
+ vhost_presetup_op vhost_presetup;
} VhostOps;
int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
--
2.27.0
* [RFC PATCH 2/5] vhost: Add support for presetup
2023-09-18 4:49 [RFC PATCH 0/5] virtio-net: Introduce LM early load Yajun Wu
2023-09-18 4:49 ` [RFC PATCH 1/5] vhost-user: Add presetup protocol feature and op Yajun Wu
@ 2023-09-18 4:49 ` Yajun Wu
2023-12-22 18:46 ` Eugenio Perez Martin
2023-09-18 4:49 ` [RFC PATCH 3/5] vhost-net: " Yajun Wu
` (4 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Yajun Wu @ 2023-09-18 4:49 UTC (permalink / raw)
To: qemu-devel, jasowang, mst, yajunw; +Cc: Avihai Horon, Jiri Pirko
Add a new API, vhost_dev_set_presetup_state, to notify the backend of the
start and end of presetup.
Add API vhost_dev_presetup to send out the device configuration:
1. acked_features
2. memory table
3. vring information
4. disabling host/guest notifiers (dummy kick/call fds).
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
hw/virtio/vhost.c | 166 ++++++++++++++++++++++++++++++++------
include/hw/virtio/vhost.h | 12 +++
2 files changed, 152 insertions(+), 26 deletions(-)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e2f6ffb446..5b162590fb 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1138,24 +1138,71 @@ out:
return ret;
}
-int vhost_virtqueue_start(struct vhost_dev *dev,
- struct VirtIODevice *vdev,
- struct vhost_virtqueue *vq,
- unsigned idx)
+static void vhost_virtqueue_memory_unmap(struct vhost_dev *dev,
+ struct VirtIODevice *vdev,
+ struct vhost_virtqueue *vq,
+ unsigned idx)
+{
+ if (vq->used) {
+ vhost_memory_unmap(dev, vq->used,
+ virtio_queue_get_used_size(vdev, idx),
+ 1, virtio_queue_get_used_size(vdev, idx));
+ vq->used = NULL;
+ }
+
+ if (vq->avail) {
+ vhost_memory_unmap(dev, vq->avail,
+ virtio_queue_get_avail_size(vdev, idx),
+ 0, virtio_queue_get_avail_size(vdev, idx));
+ vq->avail = NULL;
+ }
+
+ if (vq->desc) {
+ vhost_memory_unmap(dev, vq->desc,
+ virtio_queue_get_desc_size(vdev, idx),
+ 0, virtio_queue_get_desc_size(vdev, idx));
+ vq->desc = NULL;
+ }
+}
+
+static int vhost_virtqueue_disable_notify(struct vhost_dev *dev,
+ struct VirtIODevice *vdev,
+ struct vhost_virtqueue *vq,
+ unsigned idx)
{
- BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
- VirtioBusState *vbus = VIRTIO_BUS(qbus);
- VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
- hwaddr s, l, a;
- int r;
int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx);
struct vhost_vring_file file = {
.index = vhost_vq_index
};
+ int r;
+
+ file.fd = -1;
+ r = dev->vhost_ops->vhost_set_vring_kick(dev, &file);
+ if (r) {
+ VHOST_OPS_DEBUG(r, "vhost_set_vring_kick failed");
+ return r;
+ }
+
+ r = dev->vhost_ops->vhost_set_vring_call(dev, &file);
+ if (r) {
+ VHOST_OPS_DEBUG(r, "vhost_set_vring_call failed");
+ return r;
+ }
+
+ return 0;
+}
+
+static int vhost_virtqueue_vring_setup(struct vhost_dev *dev,
+ struct VirtIODevice *vdev,
+ struct vhost_virtqueue *vq,
+ unsigned idx)
+{
+ hwaddr s, l, a;
+ int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx);
struct vhost_vring_state state = {
.index = vhost_vq_index
};
- struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
+ int r;
a = virtio_queue_get_desc_addr(vdev, idx);
if (a == 0) {
@@ -1186,6 +1233,10 @@ int vhost_virtqueue_start(struct vhost_dev *dev,
}
}
+ if (vq->desc) {
+ vhost_virtqueue_memory_unmap(dev, vdev, vq, idx);
+ }
+
vq->desc_size = s = l = virtio_queue_get_desc_size(vdev, idx);
vq->desc_phys = a;
vq->desc = vhost_memory_map(dev, a, &l, false);
@@ -1212,6 +1263,36 @@ int vhost_virtqueue_start(struct vhost_dev *dev,
if (r < 0) {
goto fail_alloc;
}
+ return 0;
+
+fail_alloc:
+fail_alloc_used:
+fail_alloc_avail:
+ vhost_virtqueue_memory_unmap(dev, vdev, vq, idx);
+fail_alloc_desc:
+ return r;
+}
+
+int vhost_virtqueue_start(struct vhost_dev *dev,
+ struct VirtIODevice *vdev,
+ struct vhost_virtqueue *vq,
+ unsigned idx)
+{
+ BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+ VirtioBusState *vbus = VIRTIO_BUS(qbus);
+ VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
+ int r;
+ int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx);
+ struct vhost_vring_file file = {
+ .index = vhost_vq_index
+ };
+ struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
+
+ r = vhost_virtqueue_vring_setup(dev, vdev, vq, idx);
+ if (r) {
+ VHOST_OPS_DEBUG(r, "vhost_virtqueue_vring_setup failed");
+ goto fail_vring_setup;
+ }
file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq));
r = dev->vhost_ops->vhost_set_vring_kick(dev, &file);
@@ -1245,16 +1326,8 @@ int vhost_virtqueue_start(struct vhost_dev *dev,
fail_vector:
fail_kick:
-fail_alloc:
- vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
- 0, 0);
-fail_alloc_used:
- vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
- 0, 0);
-fail_alloc_avail:
- vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
- 0, 0);
-fail_alloc_desc:
+ vhost_virtqueue_memory_unmap(dev, vdev, vq, idx);
+fail_vring_setup:
return r;
}
@@ -1296,12 +1369,7 @@ void vhost_virtqueue_stop(struct vhost_dev *dev,
vhost_vq_index);
}
- vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
- 1, virtio_queue_get_used_size(vdev, idx));
- vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
- 0, virtio_queue_get_avail_size(vdev, idx));
- vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
- 0, virtio_queue_get_desc_size(vdev, idx));
+ vhost_virtqueue_memory_unmap(dev, vdev, vq, idx);
}
static int vhost_virtqueue_set_busyloop_timeout(struct vhost_dev *dev,
@@ -1921,6 +1989,43 @@ static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
return hdev->vhost_ops->vhost_set_vring_enable(hdev, enable);
}
+int vhost_dev_presetup(struct vhost_dev *hdev, VirtIODevice *vdev)
+{
+ int i, r;
+
+ /* should only be called after backend is connected */
+ assert(hdev->vhost_ops);
+
+ r = vhost_dev_set_features(hdev, hdev->log_enabled);
+ if (r < 0) {
+ return r;
+ }
+
+ r = hdev->vhost_ops->vhost_set_mem_table(hdev, hdev->mem);
+ if (r < 0) {
+ VHOST_OPS_DEBUG(r, "vhost_set_mem_table failed");
+ return r;
+ }
+
+ for (i = 0; i < hdev->nvqs; ++i) {
+ r = vhost_virtqueue_vring_setup(hdev, vdev,
+ hdev->vqs + i,
+ hdev->vq_index + i);
+ if (r < 0) {
+ VHOST_OPS_DEBUG(r, "vhost_virtqueue_vring_setup failed");
+ return r;
+ }
+ r = vhost_virtqueue_disable_notify(hdev, vdev,
+ hdev->vqs + i,
+ hdev->vq_index + i);
+ if (r < 0) {
+ return r;
+ }
+ }
+
+ return 0;
+}
+
/* Host notifiers must be enabled at this point. */
int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
{
@@ -2087,3 +2192,12 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
return -ENOSYS;
}
+
+int vhost_dev_set_presetup_state(struct vhost_dev *hdev, bool start)
+{
+ if (!hdev->vhost_ops->vhost_presetup) {
+ return -ENOTSUP;
+ }
+
+ return hdev->vhost_ops->vhost_presetup(hdev, start);
+}
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 6a173cb9fa..95a8031d12 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -192,6 +192,17 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
bool vhost_config_pending(struct vhost_dev *hdev);
void vhost_config_mask(struct vhost_dev *hdev, VirtIODevice *vdev, bool mask);
+/**
+ * vhost_dev_presetup() - pre-setup the vhost device in LM
+ * @hdev: common vhost_dev structure
+ * @vdev: the VirtIODevice structure
+ *
+ * During live migration, send device information to the backend while the
+ * source VM is still running, so the backend has enough time to prepare the HW.
+ * Return: 0 on success, < 0 on error.
+ */
+int vhost_dev_presetup(struct vhost_dev *hdev, VirtIODevice *vdev);
+
/**
* vhost_dev_is_started() - report status of vhost device
* @hdev: common vhost_dev structure
@@ -338,4 +349,5 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
struct vhost_inflight *inflight);
bool vhost_dev_has_iommu(struct vhost_dev *dev);
+int vhost_dev_set_presetup_state(struct vhost_dev *hdev, bool start);
#endif
--
2.27.0
* [RFC PATCH 3/5] vhost-net: Add support for presetup
2023-09-18 4:49 [RFC PATCH 0/5] virtio-net: Introduce LM early load Yajun Wu
2023-09-18 4:49 ` [RFC PATCH 1/5] vhost-user: Add presetup protocol feature and op Yajun Wu
2023-09-18 4:49 ` [RFC PATCH 2/5] vhost: Add support for presetup Yajun Wu
@ 2023-09-18 4:49 ` Yajun Wu
2023-09-18 4:49 ` [RFC PATCH 4/5] virtio: Add VMState for early load Yajun Wu
` (3 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Yajun Wu @ 2023-09-18 4:49 UTC (permalink / raw)
To: qemu-devel, jasowang, mst, yajunw; +Cc: Avihai Horon, Jiri Pirko
Introduce a new API, vhost_net_presetup, to send the virtio-net device
configuration to the backend during LM setup.
It mainly calls vhost_dev_presetup, then sends out vring enable.
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
hw/net/vhost_net.c | 40 ++++++++++++++++++++++++++++++++++++++++
include/net/vhost_net.h | 3 +++
2 files changed, 43 insertions(+)
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6b958d6363..dcb818ccf1 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -345,6 +345,46 @@ static void vhost_net_stop_one(struct vhost_net *net,
vhost_dev_disable_notifiers(&net->dev, dev);
}
+int vhost_net_presetup(VirtIODevice *dev, NetClientState *ncs,
+ int data_queue_pairs, int cvq)
+{
+ VirtIONet *n = VIRTIO_NET(dev);
+ int nvhosts = data_queue_pairs + cvq;
+ struct vhost_net *net;
+ int r = 0, i, index_end = data_queue_pairs * 2;
+ NetClientState *peer;
+
+ if (cvq) {
+ index_end += 1;
+ }
+
+ for (i = 0; i < nvhosts; i++) {
+ if (i < data_queue_pairs) {
+ peer = qemu_get_peer(ncs, i);
+ } else { /* Control Virtqueue */
+ peer = qemu_get_peer(ncs, n->max_queue_pairs);
+ }
+
+ net = get_vhost_net(peer);
+ vhost_net_set_vq_index(net, i * 2, index_end);
+
+ r = vhost_dev_presetup(&net->dev, dev);
+ if (r < 0) {
+ return r;
+ }
+
+ if (peer->vring_enable) {
+ /* restore vring enable state */
+ r = vhost_set_vring_enable(peer, peer->vring_enable);
+ if (r < 0) {
+ return r;
+ }
+ }
+ }
+
+ return r;
+}
+
int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
int data_queue_pairs, int cvq)
{
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index c37aba35e6..2c9020c5a2 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -26,6 +26,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
int data_queue_pairs, int cvq);
+int vhost_net_presetup(VirtIODevice *dev, NetClientState *ncs,
+ int data_queue_pairs, int cvq);
+
void vhost_net_cleanup(VHostNetState *net);
uint64_t vhost_net_get_features(VHostNetState *net, uint64_t features);
--
2.27.0
* [RFC PATCH 4/5] virtio: Add VMState for early load
2023-09-18 4:49 [RFC PATCH 0/5] virtio-net: Introduce LM early load Yajun Wu
` (2 preceding siblings ...)
2023-09-18 4:49 ` [RFC PATCH 3/5] vhost-net: " Yajun Wu
@ 2023-09-18 4:49 ` Yajun Wu
2023-12-22 17:36 ` Eugenio Perez Martin
2023-09-18 4:49 ` [RFC PATCH 5/5] virtio-net: Introduce LM " Yajun Wu
` (2 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Yajun Wu @ 2023-09-18 4:49 UTC (permalink / raw)
To: qemu-devel, jasowang, mst, yajunw; +Cc: Avihai Horon, Jiri Pirko
Define a new virtio device vmstate for early save/load (which happens in the
LM setup stage). It is the same as the original vmstate, except that in the
LM setup phase of the destination VM the guest memory has not been
synchronized yet. To address this, a flag is added to the virtio_load
function to skip the index check.
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
hw/virtio/virtio.c | 152 +++++++++++++++++++++++--------------
include/hw/virtio/virtio.h | 10 ++-
2 files changed, 103 insertions(+), 59 deletions(-)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 969c25f4cf..8c73c245dd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2832,7 +2832,17 @@ virtio_device_get(QEMUFile *f, void *opaque, size_t size,
VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
DeviceClass *dc = DEVICE_CLASS(VIRTIO_DEVICE_GET_CLASS(vdev));
- return virtio_load(vdev, f, dc->vmsd->version_id);
+ return virtio_load(vdev, f, dc->vmsd->version_id, false);
+}
+
+/* A wrapper for use as a VMState .get function */
+static int virtio_early_device_get(QEMUFile *f, void *opaque, size_t size,
+ const VMStateField *field)
+{
+ VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+ DeviceClass *dc = DEVICE_CLASS(VIRTIO_DEVICE_GET_CLASS(vdev));
+
+ return virtio_load(vdev, f, dc->vmsd->version_id, true);
}
const VMStateInfo virtio_vmstate_info = {
@@ -2841,6 +2851,12 @@ const VMStateInfo virtio_vmstate_info = {
.put = virtio_device_put,
};
+const VMStateInfo virtio_early_vmstate_info = {
+ .name = "virtio-early",
+ .get = virtio_early_device_get,
+ .put = virtio_device_put,
+};
+
static int virtio_set_features_nocheck(VirtIODevice *vdev, uint64_t val)
{
VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
@@ -2940,8 +2956,75 @@ size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
return config_size;
}
+static int virtio_load_check_index(VirtIODevice *vdev, int num)
+{
+ int i;
+
+ RCU_READ_LOCK_GUARD();
+
+ for (i = 0; i < num; i++) {
+ if (vdev->vq[i].vring.desc) {
+ uint16_t nheads;
+
+ /*
+ * VIRTIO-1 devices migrate desc, used, and avail ring addresses so
+ * only the region cache needs to be set up. Legacy devices need
+ * to calculate used and avail ring addresses based on the desc
+ * address.
+ */
+ if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+ virtio_init_region_cache(vdev, i);
+ } else {
+ virtio_queue_update_rings(vdev, i);
+ }
+
+ if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+ vdev->vq[i].shadow_avail_idx = vdev->vq[i].last_avail_idx;
+ vdev->vq[i].shadow_avail_wrap_counter =
+ vdev->vq[i].last_avail_wrap_counter;
+ continue;
+ }
+
+ nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
+ /* Check it isn't doing strange things with descriptor numbers. */
+ if (nheads > vdev->vq[i].vring.num) {
+ virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x "
+ "inconsistent with Host index 0x%x: delta 0x%x",
+ i, vdev->vq[i].vring.num,
+ vring_avail_idx(&vdev->vq[i]),
+ vdev->vq[i].last_avail_idx, nheads);
+ vdev->vq[i].used_idx = 0;
+ vdev->vq[i].shadow_avail_idx = 0;
+ vdev->vq[i].inuse = 0;
+ continue;
+ }
+ vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
+ vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
+
+ /*
+ * Some devices migrate VirtQueueElements that have been popped
+ * from the avail ring but not yet returned to the used ring.
+ * Since max ring size < UINT16_MAX it's safe to use modulo
+ * UINT16_MAX + 1 subtraction.
+ */
+ vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
+ vdev->vq[i].used_idx);
+ if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
+ error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
+ "used_idx 0x%x",
+ i, vdev->vq[i].vring.num,
+ vdev->vq[i].last_avail_idx,
+ vdev->vq[i].used_idx);
+ return -1;
+ }
+ }
+ }
+
+ return 0;
+}
+
int coroutine_mixed_fn
-virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
+virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id, bool early)
{
int i, ret;
int32_t config_len;
@@ -3078,62 +3161,15 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
vdev->start_on_kick = true;
}
- RCU_READ_LOCK_GUARD();
- for (i = 0; i < num; i++) {
- if (vdev->vq[i].vring.desc) {
- uint16_t nheads;
-
- /*
- * VIRTIO-1 devices migrate desc, used, and avail ring addresses so
- * only the region cache needs to be set up. Legacy devices need
- * to calculate used and avail ring addresses based on the desc
- * address.
- */
- if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
- virtio_init_region_cache(vdev, i);
- } else {
- virtio_queue_update_rings(vdev, i);
- }
-
- if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
- vdev->vq[i].shadow_avail_idx = vdev->vq[i].last_avail_idx;
- vdev->vq[i].shadow_avail_wrap_counter =
- vdev->vq[i].last_avail_wrap_counter;
- continue;
- }
-
- nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
- /* Check it isn't doing strange things with descriptor numbers. */
- if (nheads > vdev->vq[i].vring.num) {
- virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x "
- "inconsistent with Host index 0x%x: delta 0x%x",
- i, vdev->vq[i].vring.num,
- vring_avail_idx(&vdev->vq[i]),
- vdev->vq[i].last_avail_idx, nheads);
- vdev->vq[i].used_idx = 0;
- vdev->vq[i].shadow_avail_idx = 0;
- vdev->vq[i].inuse = 0;
- continue;
- }
- vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
- vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
-
- /*
- * Some devices migrate VirtQueueElements that have been popped
- * from the avail ring but not yet returned to the used ring.
- * Since max ring size < UINT16_MAX it's safe to use modulo
- * UINT16_MAX + 1 subtraction.
- */
- vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
- vdev->vq[i].used_idx);
- if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
- error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
- "used_idx 0x%x",
- i, vdev->vq[i].vring.num,
- vdev->vq[i].last_avail_idx,
- vdev->vq[i].used_idx);
- return -1;
- }
+ /*
+ * Early setup happens in the LM setup stage, when the guest memory hasn't
+ * been synced to the target VM yet. So skip all guest memory accesses and
+ * index checks in early load.
+ */
+ if (!early) {
+ ret = virtio_load_check_index(vdev, num);
+ if (ret) {
+ return ret;
}
}
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..c9e6faf72c 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -280,6 +280,7 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
int virtio_save(VirtIODevice *vdev, QEMUFile *f);
extern const VMStateInfo virtio_vmstate_info;
+extern const VMStateInfo virtio_early_vmstate_info;
#define VMSTATE_VIRTIO_DEVICE \
{ \
@@ -288,7 +289,14 @@ extern const VMStateInfo virtio_vmstate_info;
.flags = VMS_SINGLE, \
}
-int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id);
+#define VMSTATE_EARLY_VIRTIO_DEVICE \
+ { \
+ .name = "virtio-early", \
+ .info = &virtio_early_vmstate_info,\
+ .flags = VMS_SINGLE, \
+ }
+
+int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id, bool early);
/**
* virtio_notify_config() - signal a change to device config
--
2.27.0
* [RFC PATCH 5/5] virtio-net: Introduce LM early load
2023-09-18 4:49 [RFC PATCH 0/5] virtio-net: Introduce LM early load Yajun Wu
` (3 preceding siblings ...)
2023-09-18 4:49 ` [RFC PATCH 4/5] virtio: Add VMState for early load Yajun Wu
@ 2023-09-18 4:49 ` Yajun Wu
2023-12-22 18:58 ` Eugenio Perez Martin
2023-10-17 7:32 ` [RFC PATCH 0/5] " Yajun Wu
2023-10-17 16:47 ` Eugenio Perez Martin
6 siblings, 1 reply; 14+ messages in thread
From: Yajun Wu @ 2023-09-18 4:49 UTC (permalink / raw)
To: qemu-devel, jasowang, mst, yajunw; +Cc: Avihai Horon, Jiri Pirko
Register a new vmstate for virtio-net with an early_setup flag to send
the device state during migration setup.
This can reduce the migration downtime of a virtio-net device with a
vhost-user backend.
This feature is disabled by default and can be enabled by setting the
"x-early-migration" device property to on.
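An illustrative fragment of a QEMU command line enabling the feature (the
netdev/chardev ids are placeholders; only the x-early-migration property comes
from this series):

```
-netdev vhost-user,id=net0,chardev=chr0 \
-device virtio-net-pci,netdev=net0,x-early-migration=on
```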
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
hw/net/trace-events | 1 +
hw/net/virtio-net.c | 100 +++++++++++++++++++++++++++++++++
include/hw/virtio/virtio-net.h | 1 +
3 files changed, 102 insertions(+)
diff --git a/hw/net/trace-events b/hw/net/trace-events
index 6b5ba669a2..ec89229044 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -399,6 +399,7 @@ virtio_net_post_load_device(void)
virtio_net_rss_disable(void)
virtio_net_rss_error(const char *msg, uint32_t value) "%s, value 0x%08x"
virtio_net_rss_enable(uint32_t p1, uint16_t p2, uint8_t p3) "hashes 0x%x, table of %d, key of %d"
+virtio_net_load_early_setup(void) ""
# tulip.c
tulip_reg_write(uint64_t addr, const char *name, int size, uint64_t val) "addr 0x%02"PRIx64" (%s) size %d value 0x%08"PRIx64
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7102ec4817..d0b0cc2ffe 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -46,6 +46,7 @@
#include "net_rx_pkt.h"
#include "hw/virtio/vhost.h"
#include "sysemu/qtest.h"
+#include "sysemu/runstate.h"
#define VIRTIO_NET_VM_VERSION 11
@@ -3568,6 +3569,95 @@ static bool failover_hide_primary_device(DeviceListener *listener,
return qatomic_read(&n->failover_primary_hidden);
}
+static int virtio_net_load_early_setup(void *opaque, int version_id)
+{
+ VirtIONet *n = opaque;
+ VirtIODevice *vdev = VIRTIO_DEVICE(n);
+ NetClientState *nc = qemu_get_queue(n->nic);
+ int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+ int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
+ n->max_ncs - n->max_queue_pairs : 0;
+ VHostNetState *net;
+ int r;
+
+ assert(nc->peer);
+ assert(nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER);
+
+ net = get_vhost_net(nc->peer);
+ assert(net);
+ assert(net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
+
+ trace_virtio_net_load_early_setup();
+
+ /* backend should support presetup */
+ r = vhost_dev_set_presetup_state(&net->dev, true);
+ if (r < 0) {
+ error_report("Failed to start device presetup: %d", r);
+ return r;
+ }
+
+ if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_MTU)) {
+ r = vhost_net_set_mtu(get_vhost_net(nc->peer), n->net_conf.mtu);
+ if (r < 0) {
+ error_report("%uBytes MTU not supported by the backend",
+ n->net_conf.mtu);
+ goto error;
+ }
+ }
+
+ r = vhost_net_presetup(vdev, n->nic->ncs, queue_pairs, cvq);
+ if (r < 0) {
+ error_report("Device presetup failed: %d", r);
+ goto error;
+ }
+
+ r = vhost_dev_set_presetup_state(&net->dev, false);
+ if (r < 0) {
+ error_report("Failed to finish device presetup: %d", r);
+ return r;
+ }
+ return 0;
+
+error:
+ vhost_dev_set_presetup_state(&net->dev, false);
+ return r;
+}
+
+static bool virtio_net_early_setup_needed(void *opaque)
+{
+ VirtIONet *n = opaque;
+ NetClientState *nc = qemu_get_queue(n->nic);
+ VHostNetState *net = get_vhost_net(nc->peer);
+
+ /*
+ * Presetup aims to reduce live migration downtime by syncing device
+ * state in the setup stage. So only do presetup when the source VM is
+ * in the running state.
+ */
+ if (runstate_is_running() &&
+ nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER &&
+ net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER &&
+ !vhost_dev_has_iommu(&net->dev) &&
+ n->vhost_started &&
+ n->status & VIRTIO_NET_S_LINK_UP) {
+ return true;
+ }
+ return false;
+}
+
+static const VMStateDescription vmstate_virtio_net_early = {
+ .name = "virtio-net-early",
+ .minimum_version_id = VIRTIO_NET_VM_VERSION,
+ .version_id = VIRTIO_NET_VM_VERSION,
+ .fields = (VMStateField[]) {
+ VMSTATE_EARLY_VIRTIO_DEVICE,
+ VMSTATE_END_OF_LIST()
+ },
+ .early_setup = true,
+ .post_load = virtio_net_load_early_setup,
+ .needed = virtio_net_early_setup_needed,
+};
+
static void virtio_net_device_realize(DeviceState *dev, Error **errp)
{
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -3743,6 +3833,11 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
virtio_net_load_ebpf(n);
}
+
+ if (n->early_migration) {
+ vmstate_register(NULL, VMSTATE_INSTANCE_ID_ANY,
+ &vmstate_virtio_net_early, n);
+ }
}
static void virtio_net_device_unrealize(DeviceState *dev)
@@ -3787,6 +3882,10 @@ static void virtio_net_device_unrealize(DeviceState *dev)
g_free(n->rss_data.indirections_table);
net_rx_pkt_uninit(n->rx_pkt);
virtio_cleanup(vdev);
+
+ if (n->early_migration) {
+ vmstate_unregister(NULL, &vmstate_virtio_net_early, n);
+ }
}
static void virtio_net_instance_init(Object *obj)
@@ -3922,6 +4021,7 @@ static Property virtio_net_properties[] = {
DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
DEFINE_PROP_BOOL("failover", VirtIONet, failover, false),
+ DEFINE_PROP_BOOL("x-early-migration", VirtIONet, early_migration, false),
DEFINE_PROP_END_OF_LIST(),
};
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index e07a723027..9e6f90b46f 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -212,6 +212,7 @@ struct VirtIONet {
/* primary failover device is hidden*/
bool failover_primary_hidden;
bool failover;
+ bool early_migration;
DeviceListener primary_listener;
QDict *primary_opts;
bool primary_opts_from_json;
--
2.27.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 0/5] virtio-net: Introduce LM early load
2023-09-18 4:49 [RFC PATCH 0/5] virtio-net: Introduce LM early load Yajun Wu
` (4 preceding siblings ...)
2023-09-18 4:49 ` [RFC PATCH 5/5] virtio-net: Introduce LM " Yajun Wu
@ 2023-10-17 7:32 ` Yajun Wu
2023-10-17 16:47 ` Eugenio Perez Martin
6 siblings, 0 replies; 14+ messages in thread
From: Yajun Wu @ 2023-10-17 7:32 UTC (permalink / raw)
To: qemu-devel@nongnu.org, jasowang@redhat.com, mst@redhat.com
Ping.
On 9/18/2023 12:49 PM, Yajun Wu wrote:
> This series of patches aims to minimize the downtime during live migration of a
> virtio-net device with a vhost-user backend. In the case of hardware virtual
> Data Path Acceleration (vDPA) implementation, the hardware configuration, which
> includes tasks like VQ creation and RSS setting, may take above 200ms. This
> significantly increases the downtime of the VM, particularly in terms of
> networking.
>
> To reduce the VM downtime, the proposed approach involves capturing the basic
> device state/configuration during the VM's running stage and performing the
> initial device configuration(presetup). During the normal configuration process
> when the VM is in a stopped state, the second configuration is compared to the
> first one, and only the differences are applied to reduce downtime. Ideally,
> only the vring available index needs to be changed within VM stop.
>
> This feature is disabled by default, because backend like dpdk also needs
> adding support for vhost new message. New device property "x-early-migration"
> can enable this feature.
>
> 1. Register a new vmstate for virtio-net with an early_setup flag to send the
> device state during migration setup.
> 2. After device state load on destination VM, need to send device status to
> vhost backend in a new way. Introduce new vhost-user message:
> VHOST_USER_PRESETUP, to notify backend of presetup.
> 3. Let virtio-net, vhost-net, vhost-dev support presetup. Main flow:
> a. vhost-dev sending presetup start.
> b. virtio-net setting mtu.
> c. vhost-dev sending vring configuration and setting dummy call/kick fd.
> d. vhost-net sending vring enable.
> e. vhost-dev sending presetup end.
>
>
> TODOs:
> ======
> - No vhost-vdpa/kernel support. Need to discuss/design new kernel interface
> if there's same requirement for vhost-vdpa.
>
> - No vIOMMU support so far. If there is a need for vIOMMU support, it is
> planned to be addressed in a follow-up patchset.
>
>
> Test:
> =====
> - Live migration VM with 2 virtio-net devices, ping can recover.
> Together with DPDK patch [1].
> - The time consumption of DPDK function dev_conf is reduced from 191.4 ms
> to 6.6 ms.
>
>
> References:
> ===========
>
> [1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37
>
> Any comments or feedback are highly appreciated.
>
> Thanks,
> Yajun
>
>
> Yajun Wu (5):
> vhost-user: Add presetup protocol feature and op
> vhost: Add support for presetup
> vhost-net: Add support for presetup
> virtio: Add VMState for early load
> virtio-net: Introduce LM early load
>
> docs/interop/vhost-user.rst | 10 ++
> hw/net/trace-events | 1 +
> hw/net/vhost_net.c | 40 +++++++
> hw/net/virtio-net.c | 100 ++++++++++++++++++
> hw/virtio/vhost-user.c | 30 ++++++
> hw/virtio/vhost.c | 166 +++++++++++++++++++++++++-----
> hw/virtio/virtio.c | 152 ++++++++++++++++-----------
> include/hw/virtio/vhost-backend.h | 3 +
> include/hw/virtio/vhost.h | 12 +++
> include/hw/virtio/virtio-net.h | 1 +
> include/hw/virtio/virtio.h | 10 +-
> include/net/vhost_net.h | 3 +
> 12 files changed, 443 insertions(+), 85 deletions(-)
>
* Re: [RFC PATCH 0/5] virtio-net: Introduce LM early load
2023-09-18 4:49 [RFC PATCH 0/5] virtio-net: Introduce LM early load Yajun Wu
` (5 preceding siblings ...)
2023-10-17 7:32 ` [RFC PATCH 0/5] " Yajun Wu
@ 2023-10-17 16:47 ` Eugenio Perez Martin
2023-10-18 6:40 ` Yajun Wu
6 siblings, 1 reply; 14+ messages in thread
From: Eugenio Perez Martin @ 2023-10-17 16:47 UTC (permalink / raw)
To: Yajun Wu; +Cc: qemu-devel, jasowang, mst
On Mon, Sep 18, 2023 at 6:51 AM Yajun Wu <yajunw@nvidia.com> wrote:
>
> This series of patches aims to minimize the downtime during live migration of a
> virtio-net device with a vhost-user backend. In the case of hardware virtual
> Data Path Acceleration (vDPA) implementation, the hardware configuration, which
> includes tasks like VQ creation and RSS setting, may take above 200ms. This
> significantly increases the downtime of the VM, particularly in terms of
> networking.
>
Hi!
Sorry I totally missed this email. Please CC me in next versions.
Just for completeness, there is an ongoing plan to reduce the downtime
in vhost-vdpa. You can find more details at [1].
Sending the state periodically is on the roadmap, but some benchmarking
detected that memory pinning and unpinning affects downtime more. I'll
send an RFC about this soon. The plan was to continue with iterative
state restoring, so I'm happy to know more people are looking into it!
In the case of vhost-vdpa it already restores the state by not
enabling dataplane until migration completes. All the load is
performed using CVQ, as you can see in
net/vhost-vdpa.c:vhost_vdpa_net_load. After that, all dataplane is
started again.
My idea is to start vhost-vdpa (by calling vhost_vdpa_dev_start) at
the destination at the same moment the migration starts, as it will
not have dataplane enabled. After that, the source should send the
virtio-net vmstate every time it changes. vhost-vdpa net is able to
send and receive through CVQ, so it should be able to modify net
device configuration as many times as needed. I guess that could be
done by calling something along the lines of your
vhost_user_set_presetup_state.
This can be improved in vhost-vdpa by being able to send only the new state.
When the migration is fully completed, the vhost-vdpa net dataplane
should start as it does now.
If you are interested in avoiding changes to the vhost-user protocol,
maybe QEMU could just disable the dataplane too with
VHOST_USER_SET_VRING_ENABLE? If not, I think both approaches have a
lot in common, so I'm sure we can develop one backend on top of the
other.
Thanks!
[1] https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg00659.html
> To reduce the VM downtime, the proposed approach involves capturing the basic
> device state/configuration during the VM's running stage and performing the
> initial device configuration(presetup). During the normal configuration process
> when the VM is in a stopped state, the second configuration is compared to the
> first one, and only the differences are applied to reduce downtime. Ideally,
> only the vring available index needs to be changed within VM stop.
>
> This feature is disabled by default, because backend like dpdk also needs
> adding support for vhost new message. New device property "x-early-migration"
> can enable this feature.
>
> 1. Register a new vmstate for virtio-net with an early_setup flag to send the
> device state during migration setup.
> 2. After device state load on destination VM, need to send device status to
> vhost backend in a new way. Introduce new vhost-user message:
> VHOST_USER_PRESETUP, to notify backend of presetup.
> 3. Let virtio-net, vhost-net, vhost-dev support presetup. Main flow:
> a. vhost-dev sending presetup start.
> b. virtio-net setting mtu.
> c. vhost-dev sending vring configuration and setting dummy call/kick fd.
> d. vhost-net sending vring enable.
> e. vhost-dev sending presetup end.
>
>
> TODOs:
> ======
> - No vhost-vdpa/kernel support. Need to discuss/design new kernel interface
> if there's same requirement for vhost-vdpa.
>
> - No vIOMMU support so far. If there is a need for vIOMMU support, it is
> planned to be addressed in a follow-up patchset.
>
>
> Test:
> =====
> - Live migration VM with 2 virtio-net devices, ping can recover.
> Together with DPDK patch [1].
> - The time consumption of DPDK function dev_conf is reduced from 191.4 ms
> to 6.6 ms.
>
>
> References:
> ===========
>
> [1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37
>
> Any comments or feedback are highly appreciated.
>
> Thanks,
> Yajun
>
>
> Yajun Wu (5):
> vhost-user: Add presetup protocol feature and op
> vhost: Add support for presetup
> vhost-net: Add support for presetup
> virtio: Add VMState for early load
> virtio-net: Introduce LM early load
>
> docs/interop/vhost-user.rst | 10 ++
> hw/net/trace-events | 1 +
> hw/net/vhost_net.c | 40 +++++++
> hw/net/virtio-net.c | 100 ++++++++++++++++++
> hw/virtio/vhost-user.c | 30 ++++++
> hw/virtio/vhost.c | 166 +++++++++++++++++++++++++-----
> hw/virtio/virtio.c | 152 ++++++++++++++++-----------
> include/hw/virtio/vhost-backend.h | 3 +
> include/hw/virtio/vhost.h | 12 +++
> include/hw/virtio/virtio-net.h | 1 +
> include/hw/virtio/virtio.h | 10 +-
> include/net/vhost_net.h | 3 +
> 12 files changed, 443 insertions(+), 85 deletions(-)
>
> --
> 2.27.0
>
>
* Re: [RFC PATCH 0/5] virtio-net: Introduce LM early load
2023-10-17 16:47 ` Eugenio Perez Martin
@ 2023-10-18 6:40 ` Yajun Wu
2023-10-19 15:00 ` Eugenio Perez Martin
0 siblings, 1 reply; 14+ messages in thread
From: Yajun Wu @ 2023-10-18 6:40 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: qemu-devel@nongnu.org, jasowang@redhat.com, mst@redhat.com, parav,
jiri
On 10/18/2023 12:47 AM, Eugenio Perez Martin wrote:
> On Mon, Sep 18, 2023 at 6:51 AM Yajun Wu <yajunw@nvidia.com> wrote:
>> This series of patches aims to minimize the downtime during live migration of a
>> virtio-net device with a vhost-user backend. In the case of hardware virtual
>> Data Path Acceleration (vDPA) implementation, the hardware configuration, which
>> includes tasks like VQ creation and RSS setting, may take above 200ms. This
>> significantly increases the downtime of the VM, particularly in terms of
>> networking.
>>
> Hi!
>
> Sorry I totally missed this email. Please CC me in next versions.
>
> Just for completion, there is an ongoing plan to reduce the downtime
> in vhost-vdpa. You can find more details at [1].
>
> To send the state periodically is in the roadmap, but some
> benchmarking detected that memory pinning and unpinning affects more
> to downtime. I'll send a RFC soon with this. The plan was to continue
> with iterative state restoring, so I'm happy to know more people are
> looking into it!
>
> In the case of vhost-vdpa it already restores the state by not
> enabling dataplane until migration completes. All the load is
> performed using CVQ, as you can see in
> net/vhost-vdpa.c:vhost_vdpa_net_load. After that, all dataplane is
> started again.
>
> My idea is to start vhost-vdpa (by calling vhost_vdpa_dev_start) at
> the destination at the same moment the migration starts, as it will
> not have dataplane enabled. After that, the source should send the
> virtio-net vmstate every time it changes. vhost-vdpa net is able to
> send and receive through CVQ, so it should be able to modify net
> device configuration as many times as needed. I guess that could be
> done by calling something in the line of your
> vhost_user_set_presetup_state.
This is a very good approach. How do you know when the virtio-net
vmstate changes? vhost-user and vhost-vdpa should share the same code
for the early virtio-net vmstate sync.
>
> This can be improved in vhost-vdpa by being able to send only the new state.
>
> When all the migration is completed, vhost-vdpa net dataplane should
> start as it does now.
>
> If you are interested in saving changes to vhost-user protocol, maybe
> qemu could just disable the dataplane too with
> VHOST_USER_SET_VRING_ENABLE? If not, I think both approaches have a
> lot in common, so I'm sure we can develop one backend on top of
> another.
>
> Thanks!
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg00659.html
I'm afraid that, just as DRIVER_OK is a hint for vhost-user vDPA to
apply all the configuration to HW, vhost-user also needs the same kind
of hint at the end of each round of vmstate sync to apply the
configuration to HW. That's why I need to define a new protocol
message.
Because MQ can also change, VQ enable is a valid parameter for HW: HW
will create only the enabled queues, and the number of enabled queues
affects the RSS setting.
>
>> To reduce the VM downtime, the proposed approach involves capturing the basic
>> device state/configuration during the VM's running stage and performing the
>> initial device configuration(presetup). During the normal configuration process
>> when the VM is in a stopped state, the second configuration is compared to the
>> first one, and only the differences are applied to reduce downtime. Ideally,
>> only the vring available index needs to be changed within VM stop.
>>
>> This feature is disabled by default, because backend like dpdk also needs
>> adding support for vhost new message. New device property "x-early-migration"
>> can enable this feature.
>>
>> 1. Register a new vmstate for virtio-net with an early_setup flag to send the
>> device state during migration setup.
>> 2. After device state load on destination VM, need to send device status to
>> vhost backend in a new way. Introduce new vhost-user message:
>> VHOST_USER_PRESETUP, to notify backend of presetup.
>> 3. Let virtio-net, vhost-net, vhost-dev support presetup. Main flow:
>> a. vhost-dev sending presetup start.
>> b. virtio-net setting mtu.
>> c. vhost-dev sending vring configuration and setting dummy call/kick fd.
>> d. vhost-net sending vring enable.
>> e. vhost-dev sending presetup end.
>>
>>
>> TODOs:
>> ======
>> - No vhost-vdpa/kernel support. Need to discuss/design new kernel interface
>> if there's same requirement for vhost-vdpa.
>>
>> - No vIOMMU support so far. If there is a need for vIOMMU support, it is
>> planned to be addressed in a follow-up patchset.
>>
>>
>> Test:
>> =====
>> - Live migration VM with 2 virtio-net devices, ping can recover.
>> Together with DPDK patch [1].
>> - The time consumption of DPDK function dev_conf is reduced from 191.4 ms
>> to 6.6 ms.
>>
>>
>> References:
>> ===========
>>
>> [1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37
>>
>> Any comments or feedback are highly appreciated.
>>
>> Thanks,
>> Yajun
>>
>>
>> Yajun Wu (5):
>> vhost-user: Add presetup protocol feature and op
>> vhost: Add support for presetup
>> vhost-net: Add support for presetup
>> virtio: Add VMState for early load
>> virtio-net: Introduce LM early load
>>
>> docs/interop/vhost-user.rst | 10 ++
>> hw/net/trace-events | 1 +
>> hw/net/vhost_net.c | 40 +++++++
>> hw/net/virtio-net.c | 100 ++++++++++++++++++
>> hw/virtio/vhost-user.c | 30 ++++++
>> hw/virtio/vhost.c | 166 +++++++++++++++++++++++++-----
>> hw/virtio/virtio.c | 152 ++++++++++++++++-----------
>> include/hw/virtio/vhost-backend.h | 3 +
>> include/hw/virtio/vhost.h | 12 +++
>> include/hw/virtio/virtio-net.h | 1 +
>> include/hw/virtio/virtio.h | 10 +-
>> include/net/vhost_net.h | 3 +
>> 12 files changed, 443 insertions(+), 85 deletions(-)
>>
>> --
>> 2.27.0
>>
>>
* Re: [RFC PATCH 0/5] virtio-net: Introduce LM early load
2023-10-18 6:40 ` Yajun Wu
@ 2023-10-19 15:00 ` Eugenio Perez Martin
2023-12-22 19:07 ` Eugenio Perez Martin
0 siblings, 1 reply; 14+ messages in thread
From: Eugenio Perez Martin @ 2023-10-19 15:00 UTC (permalink / raw)
To: Yajun Wu
Cc: qemu-devel@nongnu.org, jasowang@redhat.com, mst@redhat.com, parav,
jiri
On Wed, Oct 18, 2023 at 8:41 AM Yajun Wu <yajunw@nvidia.com> wrote:
>
>
> On 10/18/2023 12:47 AM, Eugenio Perez Martin wrote:
> > On Mon, Sep 18, 2023 at 6:51 AM Yajun Wu <yajunw@nvidia.com> wrote:
> >> This series of patches aims to minimize the downtime during live migration of a
> >> virtio-net device with a vhost-user backend. In the case of hardware virtual
> >> Data Path Acceleration (vDPA) implementation, the hardware configuration, which
> >> includes tasks like VQ creation and RSS setting, may take above 200ms. This
> >> significantly increases the downtime of the VM, particularly in terms of
> >> networking.
> >>
> > Hi!
> >
> > Sorry I totally missed this email. Please CC me in next versions.
> >
> > Just for completion, there is an ongoing plan to reduce the downtime
> > in vhost-vdpa. You can find more details at [1].
> >
> > To send the state periodically is in the roadmap, but some
> > benchmarking detected that memory pinning and unpinning affects more
> > to downtime. I'll send a RFC soon with this. The plan was to continue
> > with iterative state restoring, so I'm happy to know more people are
> > looking into it!
> >
> > In the case of vhost-vdpa it already restores the state by not
> > enabling dataplane until migration completes. All the load is
> > performed using CVQ, as you can see in
> > net/vhost-vdpa.c:vhost_vdpa_net_load. After that, all dataplane is
> > started again.
> >
> > My idea is to start vhost-vdpa (by calling vhost_vdpa_dev_start) at
> > the destination at the same moment the migration starts, as it will
> > not have dataplane enabled. After that, the source should send the
> > virtio-net vmstate every time it changes. vhost-vdpa net is able to
> > send and receive through CVQ, so it should be able to modify net
> > device configuration as many times as needed. I guess that could be
> > done by calling something in the line of your
> > vhost_user_set_presetup_state.
> This is very good approach. How do you know when virtio-net vmstate
> change? vhost-user and vhost-vdpa should share same code of virtio-net
> vmstate early sync.
CVQ in vhost-vdpa must already be shadowed to be able to migrate.
Every time the guest places a buffer in CVQ,
net/vhost-vdpa.c:vhost_vdpa_net_handle_ctrl_avail is called, which
calls virtio_net_handle_ctrl_iov.
So virtio_net_handle_ctrl_iov should be able to check whether we're
migrating and signal that the state must be re-sent.
> >
> > This can be improved in vhost-vdpa by being able to send only the new state.
> >
> > When all the migration is completed, vhost-vdpa net dataplane should
> > start as it does now.
> >
> > If you are interested in saving changes to vhost-user protocol, maybe
> > qemu could just disable the dataplane too with
> > VHOST_USER_SET_VRING_ENABLE? If not, I think both approaches have a
> > lot in common, so I'm sure we can develop one backend on top of
> > another.
> >
> > Thanks!
> >
> > [1] https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg00659.html
>
> I'm afraid just like DRIVER_OK as a hint for vhost-user vDPA to apply
> all the configuration to HW. Vhost-user also needs same hint as the end
> of each round vmstate sync to apply configuration to HW. That's why I
> need define new protocol message.
>
> Because of MQ can also change, VQ enable is a valid parameter to HW. HW
> will create only enabled queue, number of enabled queues affects RSS
> setting.
>
I'm not sure I follow 100%. The first part is true for properties like
the vq address, etc.; for those to change, a full device reset at the
destination is needed.
But for the number of queues, the destination QEMU is able to send
multiple CVQ commands before starting the dataplane, as long as the
device supports late enabling of the dataplane.
>
> >
> >> To reduce the VM downtime, the proposed approach involves capturing the basic
> >> device state/configuration during the VM's running stage and performing the
> >> initial device configuration(presetup). During the normal configuration process
> >> when the VM is in a stopped state, the second configuration is compared to the
> >> first one, and only the differences are applied to reduce downtime. Ideally,
> >> only the vring available index needs to be changed within VM stop.
> >>
> >> This feature is disabled by default, because backend like dpdk also needs
> >> adding support for vhost new message. New device property "x-early-migration"
> >> can enable this feature.
> >>
> >> 1. Register a new vmstate for virtio-net with an early_setup flag to send the
> >> device state during migration setup.
> >> 2. After device state load on destination VM, need to send device status to
> >> vhost backend in a new way. Introduce new vhost-user message:
> >> VHOST_USER_PRESETUP, to notify backend of presetup.
> >> 3. Let virtio-net, vhost-net, vhost-dev support presetup. Main flow:
> >> a. vhost-dev sending presetup start.
> >> b. virtio-net setting mtu.
> >> c. vhost-dev sending vring configuration and setting dummy call/kick fd.
> >> d. vhost-net sending vring enable.
> >> e. vhost-dev sending presetup end.
> >>
> >>
> >> TODOs:
> >> ======
> >> - No vhost-vdpa/kernel support. Need to discuss/design new kernel interface
> >> if there's same requirement for vhost-vdpa.
> >>
> >> - No vIOMMU support so far. If there is a need for vIOMMU support, it is
> >> planned to be addressed in a follow-up patchset.
> >>
> >>
> >> Test:
> >> =====
> >> - Live migration VM with 2 virtio-net devices, ping can recover.
> >> Together with DPDK patch [1].
> >> - The time consumption of DPDK function dev_conf is reduced from 191.4 ms
> >> to 6.6 ms.
> >>
> >>
> >> References:
> >> ===========
> >>
> >> [1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37
> >>
> >> Any comments or feedback are highly appreciated.
> >>
> >> Thanks,
> >> Yajun
> >>
> >>
> >> Yajun Wu (5):
> >> vhost-user: Add presetup protocol feature and op
> >> vhost: Add support for presetup
> >> vhost-net: Add support for presetup
> >> virtio: Add VMState for early load
> >> virtio-net: Introduce LM early load
> >>
> >> docs/interop/vhost-user.rst | 10 ++
> >> hw/net/trace-events | 1 +
> >> hw/net/vhost_net.c | 40 +++++++
> >> hw/net/virtio-net.c | 100 ++++++++++++++++++
> >> hw/virtio/vhost-user.c | 30 ++++++
> >> hw/virtio/vhost.c | 166 +++++++++++++++++++++++++-----
> >> hw/virtio/virtio.c | 152 ++++++++++++++++-----------
> >> include/hw/virtio/vhost-backend.h | 3 +
> >> include/hw/virtio/vhost.h | 12 +++
> >> include/hw/virtio/virtio-net.h | 1 +
> >> include/hw/virtio/virtio.h | 10 +-
> >> include/net/vhost_net.h | 3 +
> >> 12 files changed, 443 insertions(+), 85 deletions(-)
> >>
> >> --
> >> 2.27.0
> >>
> >>
>
* Re: [RFC PATCH 4/5] virtio: Add VMState for early load
2023-09-18 4:49 ` [RFC PATCH 4/5] virtio: Add VMState for early load Yajun Wu
@ 2023-12-22 17:36 ` Eugenio Perez Martin
0 siblings, 0 replies; 14+ messages in thread
From: Eugenio Perez Martin @ 2023-12-22 17:36 UTC (permalink / raw)
To: Yajun Wu; +Cc: qemu-devel, jasowang, mst, Avihai Horon, Jiri Pirko
On Mon, Sep 18, 2023 at 6:51 AM Yajun Wu <yajunw@nvidia.com> wrote:
>
> Define new virtio device vmstate for early save/load (happens in
> LM setup stage). Same as original vmstate, except:
>
> In LM setup phase of the destination VM, the guest memory has not
> been synchronized yet. To address this, a flag has been added to
> virtio_load function to skip the index check.
>
> Signed-off-by: Yajun Wu <yajunw@nvidia.com>
> Reviewed-by: Avihai Horon <avihaih@nvidia.com>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> ---
> hw/virtio/virtio.c | 152 +++++++++++++++++++++++--------------
> include/hw/virtio/virtio.h | 10 ++-
> 2 files changed, 103 insertions(+), 59 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 969c25f4cf..8c73c245dd 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2832,7 +2832,17 @@ virtio_device_get(QEMUFile *f, void *opaque, size_t size,
> VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
> DeviceClass *dc = DEVICE_CLASS(VIRTIO_DEVICE_GET_CLASS(vdev));
>
> - return virtio_load(vdev, f, dc->vmsd->version_id);
> + return virtio_load(vdev, f, dc->vmsd->version_id, false);
> +}
> +
> +/* A wrapper for use as a VMState .get function */
> +static int virtio_early_device_get(QEMUFile *f, void *opaque, size_t size,
> + const VMStateField *field)
> +{
> + VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
> + DeviceClass *dc = DEVICE_CLASS(VIRTIO_DEVICE_GET_CLASS(vdev));
> +
> + return virtio_load(vdev, f, dc->vmsd->version_id, true);
> }
>
> const VMStateInfo virtio_vmstate_info = {
> @@ -2841,6 +2851,12 @@ const VMStateInfo virtio_vmstate_info = {
> .put = virtio_device_put,
> };
>
> +const VMStateInfo virtio_early_vmstate_info = {
> + .name = "virtio-early",
> + .get = virtio_early_device_get,
> + .put = virtio_device_put,
> +};
> +
> static int virtio_set_features_nocheck(VirtIODevice *vdev, uint64_t val)
> {
> VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> @@ -2940,8 +2956,75 @@ size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
> return config_size;
> }
>
> +static int virtio_load_check_index(VirtIODevice *vdev, int num)
> +{
> + int i;
> +
> + RCU_READ_LOCK_GUARD();
> +
I didn't check manually, but in the original function the
vdc->post_load call was also protected by the RCU. Maybe it is better
to leave it in the caller?
> + for (i = 0; i < num; i++) {
> + if (vdev->vq[i].vring.desc) {
> + uint16_t nheads;
> +
> + /*
> + * VIRTIO-1 devices migrate desc, used, and avail ring addresses so
> + * only the region cache needs to be set up. Legacy devices need
> + * to calculate used and avail ring addresses based on the desc
> + * address.
> + */
> + if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> + virtio_init_region_cache(vdev, i);
> + } else {
> + virtio_queue_update_rings(vdev, i);
> + }
> +
> + if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> + vdev->vq[i].shadow_avail_idx = vdev->vq[i].last_avail_idx;
> + vdev->vq[i].shadow_avail_wrap_counter =
> + vdev->vq[i].last_avail_wrap_counter;
> + continue;
> + }
> +
> + nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
> + /* Check it isn't doing strange things with descriptor numbers. */
> + if (nheads > vdev->vq[i].vring.num) {
> + virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x "
> + "inconsistent with Host index 0x%x: delta 0x%x",
> + i, vdev->vq[i].vring.num,
> + vring_avail_idx(&vdev->vq[i]),
> + vdev->vq[i].last_avail_idx, nheads);
> + vdev->vq[i].used_idx = 0;
> + vdev->vq[i].shadow_avail_idx = 0;
> + vdev->vq[i].inuse = 0;
> + continue;
> + }
> + vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
> + vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
> +
> + /*
> + * Some devices migrate VirtQueueElements that have been popped
> + * from the avail ring but not yet returned to the used ring.
> + * Since max ring size < UINT16_MAX it's safe to use modulo
> + * UINT16_MAX + 1 subtraction.
> + */
> + vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
> + vdev->vq[i].used_idx);
> + if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
> + error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
> + "used_idx 0x%x",
> + i, vdev->vq[i].vring.num,
> + vdev->vq[i].last_avail_idx,
> + vdev->vq[i].used_idx);
> + return -1;
> + }
> + }
> + }
> +
> + return 0;
> +}
> +
> int coroutine_mixed_fn
> -virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
> +virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id, bool early)
As an alternative to introducing a parameter, maybe we can check the
incoming migration state?
migration_incoming_get_current()->state == MIGRATION_STATUS_SETUP should work.
> {
> int i, ret;
> int32_t config_len;
> @@ -3078,62 +3161,15 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
> vdev->start_on_kick = true;
> }
>
> - RCU_READ_LOCK_GUARD();
> - for (i = 0; i < num; i++) {
> - if (vdev->vq[i].vring.desc) {
> - uint16_t nheads;
> -
> - /*
> - * VIRTIO-1 devices migrate desc, used, and avail ring addresses so
> - * only the region cache needs to be set up. Legacy devices need
> - * to calculate used and avail ring addresses based on the desc
> - * address.
> - */
> - if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> - virtio_init_region_cache(vdev, i);
> - } else {
> - virtio_queue_update_rings(vdev, i);
> - }
> -
> - if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> - vdev->vq[i].shadow_avail_idx = vdev->vq[i].last_avail_idx;
> - vdev->vq[i].shadow_avail_wrap_counter =
> - vdev->vq[i].last_avail_wrap_counter;
> - continue;
> - }
> -
> - nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
> - /* Check it isn't doing strange things with descriptor numbers. */
> - if (nheads > vdev->vq[i].vring.num) {
> - virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x "
> - "inconsistent with Host index 0x%x: delta 0x%x",
> - i, vdev->vq[i].vring.num,
> - vring_avail_idx(&vdev->vq[i]),
> - vdev->vq[i].last_avail_idx, nheads);
> - vdev->vq[i].used_idx = 0;
> - vdev->vq[i].shadow_avail_idx = 0;
> - vdev->vq[i].inuse = 0;
> - continue;
> - }
> - vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
> - vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
> -
> - /*
> - * Some devices migrate VirtQueueElements that have been popped
> - * from the avail ring but not yet returned to the used ring.
> - * Since max ring size < UINT16_MAX it's safe to use modulo
> - * UINT16_MAX + 1 subtraction.
> - */
> - vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
> - vdev->vq[i].used_idx);
> - if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
> - error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
> - "used_idx 0x%x",
> - i, vdev->vq[i].vring.num,
> - vdev->vq[i].last_avail_idx,
> - vdev->vq[i].used_idx);
> - return -1;
> - }
> + /*
> + * Early setup happens in the LM setup stage, when the guest memory
> + * hasn't been synchronized to the target VM yet. So skip all guest
> + * memory accesses and index checks in early load.
> + */
> + if (!early) {
> + ret = virtio_load_check_index(vdev, num);
> + if (ret) {
> + return ret;
> }
> }
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index c8f72850bc..c9e6faf72c 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -280,6 +280,7 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
> int virtio_save(VirtIODevice *vdev, QEMUFile *f);
>
> extern const VMStateInfo virtio_vmstate_info;
> +extern const VMStateInfo virtio_early_vmstate_info;
>
> #define VMSTATE_VIRTIO_DEVICE \
> { \
> @@ -288,7 +289,14 @@ extern const VMStateInfo virtio_vmstate_info;
> .flags = VMS_SINGLE, \
> }
>
> -int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id);
> +#define VMSTATE_EARLY_VIRTIO_DEVICE \
> + { \
> + .name = "virtio-early", \
> + .info = &virtio_early_vmstate_info,\
> + .flags = VMS_SINGLE, \
> + }
> +
> +int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id, bool early);
>
> /**
> * virtio_notify_config() - signal a change to device config
> --
> 2.27.0
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/5] vhost: Add support for presetup
2023-09-18 4:49 ` [RFC PATCH 2/5] vhost: Add support for presetup Yajun Wu
@ 2023-12-22 18:46 ` Eugenio Perez Martin
0 siblings, 0 replies; 14+ messages in thread
From: Eugenio Perez Martin @ 2023-12-22 18:46 UTC (permalink / raw)
To: Yajun Wu; +Cc: qemu-devel, jasowang, mst, Avihai Horon, Jiri Pirko
On Mon, Sep 18, 2023 at 6:56 AM Yajun Wu <yajunw@nvidia.com> wrote:
>
> Add new API vhost_dev_set_presetup_state to notify the backend of the
> start and end of presetup.
>
> API vhost_dev_presetup to send out the device configurations:
> 1. acked_features
> 2. memory table
> 3. vring information
> 4. disabling host/guest notifiers (dummy call/kick fds).
>
> Signed-off-by: Yajun Wu <yajunw@nvidia.com>
> Reviewed-by: Avihai Horon <avihaih@nvidia.com>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> ---
> hw/virtio/vhost.c | 166 ++++++++++++++++++++++++++++++++------
> include/hw/virtio/vhost.h | 12 +++
> 2 files changed, 152 insertions(+), 26 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index e2f6ffb446..5b162590fb 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1138,24 +1138,71 @@ out:
> return ret;
> }
>
> -int vhost_virtqueue_start(struct vhost_dev *dev,
> - struct VirtIODevice *vdev,
> - struct vhost_virtqueue *vq,
> - unsigned idx)
> +static void vhost_virtqueue_memory_unmap(struct vhost_dev *dev,
> + struct VirtIODevice *vdev,
> + struct vhost_virtqueue *vq,
> + unsigned idx)
> +{
> + if (vq->used) {
> + vhost_memory_unmap(dev, vq->used,
> + virtio_queue_get_used_size(vdev, idx),
> + 1, virtio_queue_get_used_size(vdev, idx));
> + vq->used = NULL;
> + }
> +
> + if (vq->avail) {
> + vhost_memory_unmap(dev, vq->avail,
> + virtio_queue_get_avail_size(vdev, idx),
> + 0, virtio_queue_get_avail_size(vdev, idx));
> + vq->avail = NULL;
> + }
> +
> + if (vq->desc) {
> + vhost_memory_unmap(dev, vq->desc,
> + virtio_queue_get_desc_size(vdev, idx),
> + 0, virtio_queue_get_desc_size(vdev, idx));
> + vq->desc = NULL;
> + }
> +}
Can we split the vhost_virtqueue_memory_unmap refactor into its own patch?
> +
> +static int vhost_virtqueue_disable_notify(struct vhost_dev *dev,
> + struct VirtIODevice *vdev,
> + struct vhost_virtqueue *vq,
> + unsigned idx)
> {
> - BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> - VirtioBusState *vbus = VIRTIO_BUS(qbus);
> - VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> - hwaddr s, l, a;
> - int r;
> int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx);
> struct vhost_vring_file file = {
> .index = vhost_vq_index
> };
> + int r;
> +
> + file.fd = -1;
> + r = dev->vhost_ops->vhost_set_vring_kick(dev, &file);
> + if (r) {
> + VHOST_OPS_DEBUG(r, "vhost_set_vring_kick failed");
> + return r;
> + }
> +
> + r = dev->vhost_ops->vhost_set_vring_call(dev, &file);
> + if (r) {
> + VHOST_OPS_DEBUG(r, "vhost_set_vring_call failed");
> + return r;
> + }
> +
> + return 0;
> +}
> +
> +static int vhost_virtqueue_vring_setup(struct vhost_dev *dev,
> + struct VirtIODevice *vdev,
> + struct vhost_virtqueue *vq,
> + unsigned idx)
> +{
> + hwaddr s, l, a;
> + int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx);
> struct vhost_vring_state state = {
> .index = vhost_vq_index
> };
> - struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
> + int r;
>
> a = virtio_queue_get_desc_addr(vdev, idx);
> if (a == 0) {
> @@ -1186,6 +1233,10 @@ int vhost_virtqueue_start(struct vhost_dev *dev,
> }
> }
>
> + if (vq->desc) {
> + vhost_virtqueue_memory_unmap(dev, vdev, vq, idx);
> + }
> +
How is it that we need to unmap here? Actually, vq->desc should
always be NULL at this point, shouldn't it?
I guess it is because vhost_virtqueue_vring_setup is called twice in
vhost-net: One when the first device state reaches the destination,
and another time at vhost_virtqueue_start. Would it work to not call
vhost_virtqueue_vring_setup at vhost_virtqueue_start if vq->desc !=
NULL?
> vq->desc_size = s = l = virtio_queue_get_desc_size(vdev, idx);
> vq->desc_phys = a;
> vq->desc = vhost_memory_map(dev, a, &l, false);
> @@ -1212,6 +1263,36 @@ int vhost_virtqueue_start(struct vhost_dev *dev,
> if (r < 0) {
> goto fail_alloc;
> }
> + return 0;
> +
> +fail_alloc:
> +fail_alloc_used:
> +fail_alloc_avail:
> + vhost_virtqueue_memory_unmap(dev, vdev, vq, idx);
> +fail_alloc_desc:
> + return r;
> +}
> +
> +int vhost_virtqueue_start(struct vhost_dev *dev,
> + struct VirtIODevice *vdev,
> + struct vhost_virtqueue *vq,
> + unsigned idx)
> +{
> + BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> + VirtioBusState *vbus = VIRTIO_BUS(qbus);
> + VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> + int r;
> + int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx);
> + struct vhost_vring_file file = {
> + .index = vhost_vq_index
> + };
> + struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
> +
> + r = vhost_virtqueue_vring_setup(dev, vdev, vq, idx);
> + if (r) {
> + VHOST_OPS_DEBUG(r, "vhost_virtqueue_vring_setup failed");
> + goto fail_vring_setup;
> + }
>
> file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq));
> r = dev->vhost_ops->vhost_set_vring_kick(dev, &file);
> @@ -1245,16 +1326,8 @@ int vhost_virtqueue_start(struct vhost_dev *dev,
>
> fail_vector:
> fail_kick:
> -fail_alloc:
> - vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
> - 0, 0);
> -fail_alloc_used:
> - vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
> - 0, 0);
> -fail_alloc_avail:
> - vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
> - 0, 0);
> -fail_alloc_desc:
> + vhost_virtqueue_memory_unmap(dev, vdev, vq, idx);
> +fail_vring_setup:
> return r;
> }
>
> @@ -1296,12 +1369,7 @@ void vhost_virtqueue_stop(struct vhost_dev *dev,
> vhost_vq_index);
> }
>
> - vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
> - 1, virtio_queue_get_used_size(vdev, idx));
> - vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
> - 0, virtio_queue_get_avail_size(vdev, idx));
> - vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
> - 0, virtio_queue_get_desc_size(vdev, idx));
> + vhost_virtqueue_memory_unmap(dev, vdev, vq, idx);
> }
>
> static int vhost_virtqueue_set_busyloop_timeout(struct vhost_dev *dev,
> @@ -1921,6 +1989,43 @@ static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
> return hdev->vhost_ops->vhost_set_vring_enable(hdev, enable);
> }
>
> +int vhost_dev_presetup(struct vhost_dev *hdev, VirtIODevice *vdev)
> +{
> + int i, r;
> +
> + /* should only be called after backend is connected */
> + assert(hdev->vhost_ops);
> +
> + r = vhost_dev_set_features(hdev, hdev->log_enabled);
> + if (r < 0) {
> + return r;
> + }
> +
> + r = hdev->vhost_ops->vhost_set_mem_table(hdev, hdev->mem);
> + if (r < 0) {
> + VHOST_OPS_DEBUG(r, "vhost_set_mem_table failed");
> + return r;
> + }
> +
> + for (i = 0; i < hdev->nvqs; ++i) {
> + r = vhost_virtqueue_vring_setup(hdev, vdev,
> + hdev->vqs + i,
> + hdev->vq_index + i);
> + if (r < 0) {
> +            VHOST_OPS_DEBUG(r, "vhost_virtqueue_vring_setup failed");
> + return r;
> + }
> + r = vhost_virtqueue_disable_notify(hdev, vdev,
> + hdev->vqs + i,
> + hdev->vq_index + i);
Why is this call needed? The vhost backend should not have any kick or
call fd configured at this moment, should it?
> + if (r < 0) {
> + return r;
> + }
> + }
> +
> + return 0;
> +}
> +
> /* Host notifiers must be enabled at this point. */
> int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
> {
> @@ -2087,3 +2192,12 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>
> return -ENOSYS;
> }
> +
> +int vhost_dev_set_presetup_state(struct vhost_dev *hdev, bool start)
> +{
> + if (!hdev->vhost_ops->vhost_presetup) {
> + return -ENOTSUP;
I wonder if we must return an error here.
Presetup is only "warming up" the device, as all the information is
re-sent at vhost_dev_start. If we annotate the device state somewhere
(bool presetup_has_run), we can just call vhost_virtqueue_vring_setup
at vhost_virtqueue_start and configure the virtqueues selectively.
This way we enable migration between all backends, whether they
support presetup or not.
> + }
> +
> + return hdev->vhost_ops->vhost_presetup(hdev, start);
> +}
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 6a173cb9fa..95a8031d12 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -192,6 +192,17 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
> bool vhost_config_pending(struct vhost_dev *hdev);
> void vhost_config_mask(struct vhost_dev *hdev, VirtIODevice *vdev, bool mask);
>
> +/**
> + * vhost_dev_presetup() - pre-setup the vhost device in LM
> + * @hdev: common vhost_dev structure
> + * @vdev: the VirtIODevice structure
> + *
> + * During live migration, send device information to the backend while the
> + * source VM is still running, so the backend has enough time to prepare
> + * the HW.
> + * Return: 0 on success, < 0 on error.
> + */
> +int vhost_dev_presetup(struct vhost_dev *hdev, VirtIODevice *vdev);
> +
> /**
> * vhost_dev_is_started() - report status of vhost device
> * @hdev: common vhost_dev structure
> @@ -338,4 +349,5 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
> int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
> struct vhost_inflight *inflight);
> bool vhost_dev_has_iommu(struct vhost_dev *dev);
> +int vhost_dev_set_presetup_state(struct vhost_dev *hdev, bool start);
> #endif
> --
> 2.27.0
>
>
* Re: [RFC PATCH 5/5] virtio-net: Introduce LM early load
2023-09-18 4:49 ` [RFC PATCH 5/5] virtio-net: Introduce LM " Yajun Wu
@ 2023-12-22 18:58 ` Eugenio Perez Martin
0 siblings, 0 replies; 14+ messages in thread
From: Eugenio Perez Martin @ 2023-12-22 18:58 UTC (permalink / raw)
To: Yajun Wu; +Cc: qemu-devel, jasowang, mst, Avihai Horon, Jiri Pirko
On Mon, Sep 18, 2023 at 6:51 AM Yajun Wu <yajunw@nvidia.com> wrote:
>
> Register a new vmstate for virtio-net with an early_setup flag to send
> the device state during migration setup.
>
> This can reduce the migration downtime of a virtio-net device with a
> vhost-user backend.
>
> This feature is disabled by default and can be enabled by setting the
> "x-early-migration" device property to on.
>
> Signed-off-by: Yajun Wu <yajunw@nvidia.com>
> Reviewed-by: Avihai Horon <avihaih@nvidia.com>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> ---
> hw/net/trace-events | 1 +
> hw/net/virtio-net.c | 100 +++++++++++++++++++++++++++++++++
> include/hw/virtio/virtio-net.h | 1 +
> 3 files changed, 102 insertions(+)
>
> diff --git a/hw/net/trace-events b/hw/net/trace-events
> index 6b5ba669a2..ec89229044 100644
> --- a/hw/net/trace-events
> +++ b/hw/net/trace-events
> @@ -399,6 +399,7 @@ virtio_net_post_load_device(void)
> virtio_net_rss_disable(void)
> virtio_net_rss_error(const char *msg, uint32_t value) "%s, value 0x%08x"
> virtio_net_rss_enable(uint32_t p1, uint16_t p2, uint8_t p3) "hashes 0x%x, table of %d, key of %d"
> +virtio_net_load_early_setup(void) ""
>
> # tulip.c
> tulip_reg_write(uint64_t addr, const char *name, int size, uint64_t val) "addr 0x%02"PRIx64" (%s) size %d value 0x%08"PRIx64
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 7102ec4817..d0b0cc2ffe 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -46,6 +46,7 @@
> #include "net_rx_pkt.h"
> #include "hw/virtio/vhost.h"
> #include "sysemu/qtest.h"
> +#include "sysemu/runstate.h"
>
> #define VIRTIO_NET_VM_VERSION 11
>
> @@ -3568,6 +3569,95 @@ static bool failover_hide_primary_device(DeviceListener *listener,
> return qatomic_read(&n->failover_primary_hidden);
> }
>
> +static int virtio_net_load_early_setup(void *opaque, int version_id)
> +{
> + VirtIONet *n = opaque;
> + VirtIODevice *vdev = VIRTIO_DEVICE(n);
> + NetClientState *nc = qemu_get_queue(n->nic);
> + int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> + int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> + n->max_ncs - n->max_queue_pairs : 0;
> + VHostNetState *net;
> + int r;
> +
> + assert(nc->peer);
> + assert(nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER);
> +
> + net = get_vhost_net(nc->peer);
> + assert(net);
> + assert(net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
> +
> + trace_virtio_net_load_early_setup();
> +
> + /* backend should support presetup */
> + r = vhost_dev_set_presetup_state(&net->dev, true);
> + if (r < 0) {
> +        error_report("Failed to start device presetup: %d", r);
> + return r;
> + }
> +
> + if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_MTU)) {
> + r = vhost_net_set_mtu(get_vhost_net(nc->peer), n->net_conf.mtu);
> + if (r < 0) {
> + error_report("%uBytes MTU not supported by the backend",
> + n->net_conf.mtu);
> + goto error;
> + }
> + }
> +
> + r = vhost_net_presetup(vdev, n->nic->ncs, queue_pairs, cvq);
> + if (r < 0) {
> +        error_report("Device presetup failed: %d", r);
> + goto error;
> + }
> +
> + r = vhost_dev_set_presetup_state(&net->dev, false);
I guess this is to signal the backend the end of the presetup
information, isn't it?
Can we do it in the vhost-user backend itself? You can check the queue
a function is running against with dev->vq_index and
dev->vq_index_end.
You can see an example of checking if the function is running on the
first device at vhost_user_backend_init, which checks
dev->vq_index == 0.
You can see an example of vq_index_end at vhost_user_dev_start, which
only adds the status if it runs on the last device. In that case, the
check is (dev->vq_index + dev->nvqs != dev->vq_index_end).
> + if (r < 0) {
> +        error_report("Failed to finish device presetup: %d", r);
> + return r;
> + }
> + return 0;
> +
> +error:
> + vhost_dev_set_presetup_state(&net->dev, false);
> + return r;
> +}
> +
> +static bool virtio_net_early_setup_needed(void *opaque)
> +{
> + VirtIONet *n = opaque;
> + NetClientState *nc = qemu_get_queue(n->nic);
> + VHostNetState *net = get_vhost_net(nc->peer);
> +
> + /*
> +     * Presetup aims to reduce live migration downtime by syncing device
> +     * state in the setup stage. So only do presetup when the source VM is
> +     * in the running state.
> + */
> + if (runstate_is_running() &&
> + nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER &&
> + net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER &&
> + !vhost_dev_has_iommu(&net->dev) &&
> + n->vhost_started &&
> + n->status & VIRTIO_NET_S_LINK_UP) {
> + return true;
> + }
> + return false;
> +}
I think it is better not to check for vhost-user here, as:
* All backends can potentially benefit from this.
* Source running vhost-user does not mean the destination is running
vhost-user too.
Another nitpick: you can directly "return runstate_is_running() &&
...;". But I'm fine with this version too.
> +
> +static const VMStateDescription vmstate_virtio_net_early = {
> + .name = "virtio-net-early",
> + .minimum_version_id = VIRTIO_NET_VM_VERSION,
> + .version_id = VIRTIO_NET_VM_VERSION,
> + .fields = (VMStateField[]) {
> + VMSTATE_EARLY_VIRTIO_DEVICE,
> + VMSTATE_END_OF_LIST()
> + },
> + .early_setup = true,
> + .post_load = virtio_net_load_early_setup,
> + .needed = virtio_net_early_setup_needed,
> +};
> +
> static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> {
> VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -3743,6 +3833,11 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
> virtio_net_load_ebpf(n);
> }
> +
> + if (n->early_migration) {
> + vmstate_register(NULL, VMSTATE_INSTANCE_ID_ANY,
> + &vmstate_virtio_net_early, n);
> + }
> }
>
> static void virtio_net_device_unrealize(DeviceState *dev)
> @@ -3787,6 +3882,10 @@ static void virtio_net_device_unrealize(DeviceState *dev)
> g_free(n->rss_data.indirections_table);
> net_rx_pkt_uninit(n->rx_pkt);
> virtio_cleanup(vdev);
> +
> + if (n->early_migration) {
> + vmstate_unregister(NULL, &vmstate_virtio_net_early, n);
> + }
> }
>
> static void virtio_net_instance_init(Object *obj)
> @@ -3922,6 +4021,7 @@ static Property virtio_net_properties[] = {
> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> DEFINE_PROP_BOOL("failover", VirtIONet, failover, false),
> + DEFINE_PROP_BOOL("x-early-migration", VirtIONet, early_migration, false),
> DEFINE_PROP_END_OF_LIST(),
> };
>
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index e07a723027..9e6f90b46f 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -212,6 +212,7 @@ struct VirtIONet {
> /* primary failover device is hidden*/
> bool failover_primary_hidden;
> bool failover;
> + bool early_migration;
> DeviceListener primary_listener;
> QDict *primary_opts;
> bool primary_opts_from_json;
> --
> 2.27.0
>
>
* Re: [RFC PATCH 0/5] virtio-net: Introduce LM early load
2023-10-19 15:00 ` Eugenio Perez Martin
@ 2023-12-22 19:07 ` Eugenio Perez Martin
0 siblings, 0 replies; 14+ messages in thread
From: Eugenio Perez Martin @ 2023-12-22 19:07 UTC (permalink / raw)
To: Yajun Wu
Cc: qemu-devel@nongnu.org, jasowang@redhat.com, mst@redhat.com, parav,
jiri
Hi Yajun,
Sorry for the late reply.
Apart from the few nitpicks commented, I think it is valid to start
from this series and then add the capability to re-send the
configuration in case the source changes it by another series on top.
That would allow us to keep both series small.
Not sure if all can be done before the next release, so we don't have
to change the virtio-net migration format twice...
Please let me know what you think about the comments.
Thanks!
On Thu, Oct 19, 2023 at 5:00 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Oct 18, 2023 at 8:41 AM Yajun Wu <yajunw@nvidia.com> wrote:
> >
> >
> > On 10/18/2023 12:47 AM, Eugenio Perez Martin wrote:
...
end of thread, other threads:[~2023-12-22 19:08 UTC | newest]
Thread overview: 14+ messages
2023-09-18 4:49 [RFC PATCH 0/5] virtio-net: Introduce LM early load Yajun Wu
2023-09-18 4:49 ` [RFC PATCH 1/5] vhost-user: Add presetup protocol feature and op Yajun Wu
2023-09-18 4:49 ` [RFC PATCH 2/5] vhost: Add support for presetup Yajun Wu
2023-12-22 18:46 ` Eugenio Perez Martin
2023-09-18 4:49 ` [RFC PATCH 3/5] vhost-net: " Yajun Wu
2023-09-18 4:49 ` [RFC PATCH 4/5] virtio: Add VMState for early load Yajun Wu
2023-12-22 17:36 ` Eugenio Perez Martin
2023-09-18 4:49 ` [RFC PATCH 5/5] virtio-net: Introduce LM " Yajun Wu
2023-12-22 18:58 ` Eugenio Perez Martin
2023-10-17 7:32 ` [RFC PATCH 0/5] " Yajun Wu
2023-10-17 16:47 ` Eugenio Perez Martin
2023-10-18 6:40 ` Yajun Wu
2023-10-19 15:00 ` Eugenio Perez Martin
2023-12-22 19:07 ` Eugenio Perez Martin