qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration
@ 2023-02-24 15:54 Eugenio Pérez
  2023-02-24 15:54 ` [PATCH v4 01/15] vdpa net: move iova tree creation from init to start Eugenio Pérez
                   ` (15 more replies)
  0 siblings, 16 replies; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

It's possible to migrate vdpa net devices if they are shadowed from the
start.  But to always shadow the dataplane is to effectively break its host
passthrough, so its not efficient in vDPA scenarios.

This series enables dynamically switching to shadow mode only at
migration time.  This allows full data virtqueues passthrough all the
time qemu is not migrating.

In this series only net devices with no CVQ are migratable.  CVQ adds
additional state that would make the series bigger and still had some
controversy on previous RFC, so let's split it.

Successfully tested with vdpa_sim_net with patch [1] applied and with the qemu
emulated device with vp_vdpa with some restrictions:
* No CVQ. No feature that didn't work with SVQ previously (packed, ...)
* VIRTIO_RING_F_STATE patches implementing [2].
* Expose _F_SUSPEND, but ignore it and suspend on ring state fetch like
  DPDK.

Previous versions were tested by many vendors. Not carrying Tested-by because
of code changes, so re-testing would be appreciated.

Comments are welcome.

v4:
- Recover used_idx from guest's vring if device cannot suspend.
- Fix starting device in the middle of a migration.  Removed some
  duplication in setting / clearing enable_shadow_vqs and shadow_data
  members of vhost_vdpa.
- Fix (again) "Check for SUSPEND in vhost_dev.backend_cap, as
  .backend_features is empty at the check moment.". It was reverted by
  mistake in v3.
- Fix memory leak of iova tree.
- Properly rewind SVQ as in flight descriptors were still being accounted
  in vq base.
- Expand documentation.

v3:
- Start datapatch in SVQ in device started while migrating.
- Properly register migration blockers if device present unsupported features.
- Fix race condition because of not stopping the SVQ until device cleanup.
- Explain purpose of iova tree in the first patch message.
- s/dynamycally/dynamically/ in cover letter.
- at lore.kernel.org/qemu-devel/20230215173850.298832-14-eperezma@redhat.com

v2:
- Check for SUSPEND in vhost_dev.backend_cap, as .backend_features is empty at
  the check moment.
- at https://lore.kernel.org/all/20230208094253.702672-12-eperezma@redhat.com/T/

v1:
- Omit all code working with CVQ and block migration if the device supports
  CVQ.
- Remove spurious kick.
- Move all possible checks for migration to vhost-vdpa instead of the net
  backend. Move them to init code from start code.
- Suspend on vhost_vdpa_dev_start(false) instead of in vhost-vdpa net backend.
- Properly split suspend after geting base and adding of status_reset patches.
- Add possible TODOs to points where this series can improve in the future.
- Check the state of migration using migration_in_setup and
  migration_has_failed instead of checking all the possible migration status in
  a switch.
- Add TODO with possible low hand fruit using RESUME ops.
- Always offer _F_LOG from virtio/vhost-vdpa and let migration blockers do
  their thing instead of adding a variable.
- RFC v2 at https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02574.html

RFC v2:
- Use a migration listener instead of a memory listener to know when
  the migration starts.
- Add stuff not picked with ASID patches, like enable rings after
  driver_ok
- Add rewinding on the migration src, not in dst
- RFC v1 at https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg01664.html

[1] https://lore.kernel.org/lkml/20230203142501.300125-1-eperezma@redhat.com/T/
[2] https://lists.oasis-open.org/archives/virtio-comment/202103/msg00036.html

Eugenio Pérez (15):
  vdpa net: move iova tree creation from init to start
  vdpa: Remember last call fd set
  vdpa: stop svq at vhost_vdpa_dev_start(false)
  vdpa: Negotiate _F_SUSPEND feature
  vdpa: move vhost reset after get vring base
  vdpa: add vhost_vdpa->suspended parameter
  vdpa: add vhost_vdpa_suspend
  vdpa: rewind at get_base, not set_base
  vdpa: add vdpa net migration state notifier
  vdpa: disable RAM block discard only for the first device
  vdpa net: block migration if the device has CVQ
  vdpa: block migration if device has unsupported features
  vdpa: block migration if SVQ does not admit a feature
  vdpa net: allow VHOST_F_LOG_ALL
  vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices

 include/hw/virtio/vhost-backend.h  |   4 +
 include/hw/virtio/vhost-vdpa.h     |   3 +
 hw/virtio/vhost-shadow-virtqueue.c |   8 +-
 hw/virtio/vhost-vdpa.c             | 128 +++++++++++++------
 hw/virtio/vhost.c                  |   3 +
 net/vhost-vdpa.c                   | 198 ++++++++++++++++++++++++-----
 hw/virtio/trace-events             |   1 +
 7 files changed, 273 insertions(+), 72 deletions(-)

-- 
2.31.1




^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v4 01/15] vdpa net: move iova tree creation from init to start
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-27  7:04   ` Jason Wang
  2023-02-24 15:54 ` [PATCH v4 02/15] vdpa: Remember last call fd set Eugenio Pérez
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

Only create iova_tree if and when it is needed.

The cleanup keeps being responsible of last VQ but this change allows it
to merge both cleanup functions.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
v4:
* Remove leak of iova_tree because double allocation
* Document better the sharing of IOVA tree between data and CVQ
---
 net/vhost-vdpa.c | 113 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 83 insertions(+), 30 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index de5ed8ff22..b89c99066a 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -178,13 +178,9 @@ err_init:
 static void vhost_vdpa_cleanup(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
-    struct vhost_dev *dev = &s->vhost_net->dev;
 
     qemu_vfree(s->cvq_cmd_out_buffer);
     qemu_vfree(s->status);
-    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
-        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
-    }
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
         g_free(s->vhost_net);
@@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
     return size;
 }
 
+/** From any vdpa net client, get the netclient of first queue pair */
+static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
+{
+    NICState *nic = qemu_get_nic(s->nc.peer);
+    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
+
+    return DO_UPCAST(VhostVDPAState, nc, nc0);
+}
+
+static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
+{
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+
+    if (v->shadow_vqs_enabled) {
+        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
+                                           v->iova_range.last);
+    }
+}
+
+static int vhost_vdpa_net_data_start(NetClientState *nc)
+{
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+
+    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+    if (v->index == 0) {
+        vhost_vdpa_net_data_start_first(s);
+        return 0;
+    }
+
+    if (v->shadow_vqs_enabled) {
+        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
+        v->iova_tree = s0->vhost_vdpa.iova_tree;
+    }
+
+    return 0;
+}
+
+static void vhost_vdpa_net_client_stop(NetClientState *nc)
+{
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_dev *dev;
+
+    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+    dev = s->vhost_vdpa.dev;
+    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
+        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
+    }
+}
+
 static NetClientInfo net_vhost_vdpa_info = {
         .type = NET_CLIENT_DRIVER_VHOST_VDPA,
         .size = sizeof(VhostVDPAState),
         .receive = vhost_vdpa_receive,
+        .start = vhost_vdpa_net_data_start,
+        .stop = vhost_vdpa_net_client_stop,
         .cleanup = vhost_vdpa_cleanup,
         .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
         .has_ufo = vhost_vdpa_has_ufo,
@@ -351,7 +401,7 @@ dma_map_err:
 
 static int vhost_vdpa_net_cvq_start(NetClientState *nc)
 {
-    VhostVDPAState *s;
+    VhostVDPAState *s, *s0;
     struct vhost_vdpa *v;
     uint64_t backend_features;
     int64_t cvq_group;
@@ -415,8 +465,6 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
         return r;
     }
 
-    v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
-                                       v->iova_range.last);
     v->shadow_vqs_enabled = true;
     s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
 
@@ -425,6 +473,27 @@ out:
         return 0;
     }
 
+    s0 = vhost_vdpa_net_first_nc_vdpa(s);
+    if (s0->vhost_vdpa.iova_tree) {
+        /*
+         * SVQ is already configured for all virtqueues.  Reuse IOVA tree for
+         * simplicity, wether CVQ shares ASID with guest or not, because:
+         * - Memory listener need access to guest's memory addresses allocated
+         *   in the IOVA tree.
+         * - There should be plenty of IOVA address space for both ASID not to
+         *   worry about collisions between them.  Guest's translations are
+         *   still validated with virtio virtqueue_pop so there is no risk for
+         *   the guest to access memory it shouldn't.
+         *
+         * To allocate a iova tree per ASID is doable but it complicates the
+         * code and it is not worth for the moment.
+         */
+        v->iova_tree = s0->vhost_vdpa.iova_tree;
+    } else {
+        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
+                                           v->iova_range.last);
+    }
+
     r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
                                vhost_vdpa_net_cvq_cmd_page_len(), false);
     if (unlikely(r < 0)) {
@@ -449,15 +518,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
     if (s->vhost_vdpa.shadow_vqs_enabled) {
         vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
         vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
-        if (!s->always_svq) {
-            /*
-             * If only the CVQ is shadowed we can delete this safely.
-             * If all the VQs are shadows this will be needed by the time the
-             * device is started again to register SVQ vrings and similar.
-             */
-            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
-        }
     }
+
+    vhost_vdpa_net_client_stop(nc);
 }
 
 static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
@@ -667,8 +730,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                        int nvqs,
                                        bool is_datapath,
                                        bool svq,
-                                       struct vhost_vdpa_iova_range iova_range,
-                                       VhostIOVATree *iova_tree)
+                                       struct vhost_vdpa_iova_range iova_range)
 {
     NetClientState *nc = NULL;
     VhostVDPAState *s;
@@ -690,7 +752,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     s->vhost_vdpa.shadow_vqs_enabled = svq;
     s->vhost_vdpa.iova_range = iova_range;
     s->vhost_vdpa.shadow_data = svq;
-    s->vhost_vdpa.iova_tree = iova_tree;
     if (!is_datapath) {
         s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
                                             vhost_vdpa_net_cvq_cmd_page_len());
@@ -760,7 +821,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     uint64_t features;
     int vdpa_device_fd;
     g_autofree NetClientState **ncs = NULL;
-    g_autoptr(VhostIOVATree) iova_tree = NULL;
     struct vhost_vdpa_iova_range iova_range;
     NetClientState *nc;
     int queue_pairs, r, i = 0, has_cvq = 0;
@@ -812,12 +872,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         goto err;
     }
 
-    if (opts->x_svq) {
-        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
-            goto err_svq;
-        }
-
-        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
+    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
+        goto err;
     }
 
     ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
@@ -825,7 +881,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     for (i = 0; i < queue_pairs; i++) {
         ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
                                      vdpa_device_fd, i, 2, true, opts->x_svq,
-                                     iova_range, iova_tree);
+                                     iova_range);
         if (!ncs[i])
             goto err;
     }
@@ -833,13 +889,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
                                  vdpa_device_fd, i, 1, false,
-                                 opts->x_svq, iova_range, iova_tree);
+                                 opts->x_svq, iova_range);
         if (!nc)
             goto err;
     }
 
-    /* iova_tree ownership belongs to last NetClientState */
-    g_steal_pointer(&iova_tree);
     return 0;
 
 err:
@@ -849,7 +903,6 @@ err:
         }
     }
 
-err_svq:
     qemu_close(vdpa_device_fd);
 
     return -1;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 02/15] vdpa: Remember last call fd set
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
  2023-02-24 15:54 ` [PATCH v4 01/15] vdpa net: move iova tree creation from init to start Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-24 15:54 ` [PATCH v4 03/15] vdpa: stop svq at vhost_vdpa_dev_start(false) Eugenio Pérez
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

As SVQ can be enabled dynamically at any time, it needs to store call fd
always.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 542e003101..4f72a52a43 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1240,16 +1240,16 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
                                        struct vhost_vring_file *file)
 {
     struct vhost_vdpa *v = dev->opaque;
+    int vdpa_idx = file->index - dev->vq_index;
+    VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
 
+    /* Remember last call fd because we can switch to SVQ anytime. */
+    vhost_svq_set_svq_call_fd(svq, file->fd);
     if (v->shadow_vqs_enabled) {
-        int vdpa_idx = file->index - dev->vq_index;
-        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
-
-        vhost_svq_set_svq_call_fd(svq, file->fd);
         return 0;
-    } else {
-        return vhost_vdpa_set_vring_dev_call(dev, file);
     }
+
+    return vhost_vdpa_set_vring_dev_call(dev, file);
 }
 
 static int vhost_vdpa_get_features(struct vhost_dev *dev,
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 03/15] vdpa: stop svq at vhost_vdpa_dev_start(false)
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
  2023-02-24 15:54 ` [PATCH v4 01/15] vdpa net: move iova tree creation from init to start Eugenio Pérez
  2023-02-24 15:54 ` [PATCH v4 02/15] vdpa: Remember last call fd set Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-27  7:15   ` Jason Wang
  2023-02-24 15:54 ` [PATCH v4 04/15] vdpa: Negotiate _F_SUSPEND feature Eugenio Pérez
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

It used to be done at vhost_vdpa_svq_cleanup, since a device couldn't
switch to SVQ mode dynamically.  Now that we need to fetch the state and
trust SVQ will not modify guest's used_idx at migration, stop
effectively SVQ at suspend time, as a real device would do.

Leaving old vhost_svq_stop call at vhost_vdpa_svq_cleanup, as its
supported to call it many times and it follows other operations that are
called redundantly there too:
* vhost_vdpa_host_notifiers_uninit
* memory_listener_unregister

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v3: New in v3
---
 hw/virtio/vhost-vdpa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 4f72a52a43..d9260191cc 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1100,10 +1100,11 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
 
     for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
         VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
+
+        vhost_svq_stop(svq);
         vhost_vdpa_svq_unmap_rings(dev, svq);
 
         event_notifier_cleanup(&svq->hdev_kick);
-        event_notifier_cleanup(&svq->hdev_call);
     }
 }
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 04/15] vdpa: Negotiate _F_SUSPEND feature
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (2 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 03/15] vdpa: stop svq at vhost_vdpa_dev_start(false) Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-24 15:54 ` [PATCH v4 05/15] vdpa: move vhost reset after get vring base Eugenio Pérez
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

This is needed for qemu to know it can suspend the device to retrieve
its status and enable SVQ with it, so all the process is transparent to
the guest.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index d9260191cc..4fac144169 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -659,7 +659,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
     uint64_t features;
     uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
         0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
-        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
+        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
+        0x1ULL << VHOST_BACKEND_F_SUSPEND;
     int r;
 
     if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 05/15] vdpa: move vhost reset after get vring base
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (3 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 04/15] vdpa: Negotiate _F_SUSPEND feature Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-27  7:22   ` Jason Wang
  2023-02-24 15:54 ` [PATCH v4 06/15] vdpa: add vhost_vdpa->suspended parameter Eugenio Pérez
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

The function vhost.c:vhost_dev_stop calls vhost operation
vhost_dev_start(false). In the case of vdpa it totally reset and wipes
the device, making the fetching of the vring base (virtqueue state) totally
useless.

The kernel backend does not use vhost_dev_start vhost op callback, but
vhost-user do. A patch to make vhost_user_dev_start more similar to vdpa
is desirable, but it can be added on top.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-backend.h |  4 ++++
 hw/virtio/vhost-vdpa.c            | 22 ++++++++++++++++------
 hw/virtio/vhost.c                 |  3 +++
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index c5ab49051e..ec3fbae58d 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -130,6 +130,9 @@ typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
 
 typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
                                        int fd);
+
+typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -177,6 +180,7 @@ typedef struct VhostOps {
     vhost_get_device_id_op vhost_get_device_id;
     vhost_force_iommu_op vhost_force_iommu;
     vhost_set_config_call_op vhost_set_config_call;
+    vhost_reset_status_op vhost_reset_status;
 } VhostOps;
 
 int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 4fac144169..8cc9c98db9 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1134,14 +1134,23 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
     if (started) {
         memory_listener_register(&v->listener, &address_space_memory);
         return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
-    } else {
-        vhost_vdpa_reset_device(dev);
-        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
-                                   VIRTIO_CONFIG_S_DRIVER);
-        memory_listener_unregister(&v->listener);
+    }
 
-        return 0;
+    return 0;
+}
+
+static void vhost_vdpa_reset_status(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+        return;
     }
+
+    vhost_vdpa_reset_device(dev);
+    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                               VIRTIO_CONFIG_S_DRIVER);
+    memory_listener_unregister(&v->listener);
 }
 
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
@@ -1328,4 +1337,5 @@ const VhostOps vdpa_ops = {
         .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
         .vhost_force_iommu = vhost_vdpa_force_iommu,
         .vhost_set_config_call = vhost_vdpa_set_config_call,
+        .vhost_reset_status = vhost_vdpa_reset_status,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index eb8c4c378c..a266396576 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2049,6 +2049,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
                              hdev->vqs + i,
                              hdev->vq_index + i);
     }
+    if (hdev->vhost_ops->vhost_reset_status) {
+        hdev->vhost_ops->vhost_reset_status(hdev);
+    }
 
     if (vhost_dev_has_iommu(hdev)) {
         if (hdev->vhost_ops->vhost_set_iotlb_callback) {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 06/15] vdpa: add vhost_vdpa->suspended parameter
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (4 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 05/15] vdpa: move vhost reset after get vring base Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-27  7:24   ` Jason Wang
  2023-02-24 15:54 ` [PATCH v4 07/15] vdpa: add vhost_vdpa_suspend Eugenio Pérez
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

This allows vhost_vdpa to track if it is safe to get vring base from the
device or not.  If it is not, vhost can fall back to fetch idx from the
guest buffer again.

No functional change intended in this patch, later patches will use this
field.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-vdpa.h | 2 ++
 hw/virtio/vhost-vdpa.c         | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 7997f09a8d..4a7d396674 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -42,6 +42,8 @@ typedef struct vhost_vdpa {
     bool shadow_vqs_enabled;
     /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */
     bool shadow_data;
+    /* Device suspended successfully */
+    bool suspended;
     /* IOVA mapping used by the Shadow Virtqueue */
     VhostIOVATree *iova_tree;
     GPtrArray *shadow_vqs;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 8cc9c98db9..228677895a 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1227,6 +1227,14 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
         return 0;
     }
 
+    if (!v->suspended) {
+        /*
+         * Cannot trust in value returned by device, let vhost recover used
+         * idx from guest.
+         */
+        return -1;
+    }
+
     ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
     trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
     return ret;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 07/15] vdpa: add vhost_vdpa_suspend
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (5 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 06/15] vdpa: add vhost_vdpa->suspended parameter Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-27  7:27   ` Jason Wang
  2023-03-01  1:30   ` Si-Wei Liu
  2023-02-24 15:54 ` [PATCH v4 08/15] vdpa: rewind at get_base, not set_base Eugenio Pérez
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

The function vhost.c:vhost_dev_stop fetches the vring base so the vq
state can be migrated to other devices.  However, this is unreliable in
vdpa, since we didn't signal the device to suspend the queues, making
the value fetched useless.

Suspend the device if possible before fetching first and subsequent
vring bases.

Moreover, vdpa totally reset and wipes the device at the last device
before fetch its vrings base, making that operation useless in the last
device. This will be fixed in later patches of this series.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v4:
* Look for _F_SUSPEND at vhost_dev->backend_cap, not backend_features
* Fall back on reset & fetch used idx from guest's memory
---
 hw/virtio/vhost-vdpa.c | 25 +++++++++++++++++++++++++
 hw/virtio/trace-events |  1 +
 2 files changed, 26 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 228677895a..f542960a64 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -712,6 +712,7 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
 
     ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
     trace_vhost_vdpa_reset_device(dev, status);
+    v->suspended = false;
     return ret;
 }
 
@@ -1109,6 +1110,29 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
     }
 }
 
+static void vhost_vdpa_suspend(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    int r;
+
+    if (!vhost_vdpa_first_dev(dev)) {
+        return;
+    }
+
+    if (!(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
+        trace_vhost_vdpa_suspend(dev);
+        r = ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
+        if (unlikely(r)) {
+            error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno);
+        } else {
+            v->suspended = true;
+            return;
+        }
+    }
+
+    vhost_vdpa_reset_device(dev);
+}
+
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 {
     struct vhost_vdpa *v = dev->opaque;
@@ -1123,6 +1147,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         }
         vhost_vdpa_set_vring_ready(dev);
     } else {
+        vhost_vdpa_suspend(dev);
         vhost_vdpa_svqs_stop(dev);
         vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     }
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index a87c5f39a2..8f8d05cf9b 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -50,6 +50,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
 vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
 vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
 vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
+vhost_vdpa_suspend(void *dev) "dev: %p"
 vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
 vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
 vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 08/15] vdpa: rewind at get_base, not set_base
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (6 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 07/15] vdpa: add vhost_vdpa_suspend Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-27  7:34   ` Jason Wang
  2023-02-24 15:54 ` [PATCH v4 09/15] vdpa: add vdpa net migration state notifier Eugenio Pérez
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

At this moment it is only possible to migrate to a vdpa device running
with x-svq=on. As a protective measure, the rewind of the inflight
descriptors was done at the destination. That way if the source sent a
virtqueue with inuse descriptors they are always discarded.

Since this series allows to migrate also to passthrough devices with no
SVQ, the right thing to do is to rewind at the source so the base of
vrings are correct.

Support for inflight descriptors may be added in the future.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v4:
* Use virtqueue_unpop at vhost_svq_stop instead of rewinding at
  vhost_vdpa_get_vring_base.
---
 hw/virtio/vhost-shadow-virtqueue.c |  8 ++++++--
 hw/virtio/vhost-vdpa.c             | 11 -----------
 2 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 4307296358..523b379439 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -694,13 +694,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
         g_autofree VirtQueueElement *elem = NULL;
         elem = g_steal_pointer(&svq->desc_state[i].elem);
         if (elem) {
-            virtqueue_detach_element(svq->vq, elem, 0);
+            /*
+             * TODO: This is ok for networking, but other kinds of devices
+             * might have problems with just unpop these.
+             */
+            virtqueue_unpop(svq->vq, elem, 0);
         }
     }
 
     next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
     if (next_avail_elem) {
-        virtqueue_detach_element(svq->vq, next_avail_elem, 0);
+        virtqueue_unpop(svq->vq, next_avail_elem, 0);
     }
     svq->vq = NULL;
     g_free(svq->desc_next);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index f542960a64..71e3dc21fe 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1218,18 +1218,7 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
                                        struct vhost_vring_state *ring)
 {
     struct vhost_vdpa *v = dev->opaque;
-    VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
 
-    /*
-     * vhost-vdpa devices does not support in-flight requests. Set all of them
-     * as available.
-     *
-     * TODO: This is ok for networking, but other kinds of devices might
-     * have problems with these retransmissions.
-     */
-    while (virtqueue_rewind(vq, 1)) {
-        continue;
-    }
     if (v->shadow_vqs_enabled) {
         /*
          * Device vring base was set at device start. SVQ base is handled by
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 09/15] vdpa: add vdpa net migration state notifier
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (7 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 08/15] vdpa: rewind at get_base, not set_base Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-27  8:08   ` Jason Wang
  2023-02-24 15:54 ` [PATCH v4 10/15] vdpa: disable RAM block discard only for the first device Eugenio Pérez
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

This allows net to restart the device backend to configure SVQ on it.

Ideally, these changes should not be net specific. However, the vdpa net
backend is the one with enough knowledge to configure everything because
of some reasons:
* Queues might need to be shadowed or not depending on its kind (control
  vs data).
* Queues need to share the same map translations (iova tree).

Because of that it is cleaner to restart the whole net backend and
configure again as expected, similar to how vhost-kernel moves between
userspace and passthrough.

If more kinds of devices need dynamic switching to SVQ we can create a
callback struct like VhostOps and move most of the code there.
VhostOps cannot be reused since all vdpa backend share them, and to
personalize just for networking would be too heavy.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v4:
* Delete duplication of set shadow_data and shadow_vqs_enabled moving it
  to data / cvq net start functions.

v3:
* Check for migration state at vdpa device start to enable SVQ in data
  vqs.

v1 from RFC:
* Add TODO to use the resume operation in the future.
* Use migration_in_setup and migration_has_failed instead of a
  complicated switch case.
---
 net/vhost-vdpa.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 69 insertions(+), 3 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index b89c99066a..c5512ddf10 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -26,12 +26,15 @@
 #include <err.h>
 #include "standard-headers/linux/virtio_net.h"
 #include "monitor/monitor.h"
+#include "migration/migration.h"
+#include "migration/misc.h"
 #include "hw/virtio/vhost.h"
 
 /* Todo:need to add the multiqueue support here */
 typedef struct VhostVDPAState {
     NetClientState nc;
     struct vhost_vdpa vhost_vdpa;
+    Notifier migration_state;
     VHostNetState *vhost_net;
 
     /* Control commands shadow buffers */
@@ -239,10 +242,59 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
     return DO_UPCAST(VhostVDPAState, nc, nc0);
 }
 
+static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
+{
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+    VirtIONet *n;
+    VirtIODevice *vdev;
+    int data_queue_pairs, cvq, r;
+
+    /* We are only called on the first data vqs and only if x-svq is not set */
+    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
+        return;
+    }
+
+    vdev = v->dev->vdev;
+    n = VIRTIO_NET(vdev);
+    if (!n->vhost_started) {
+        return;
+    }
+
+    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
+                                  n->max_ncs - n->max_queue_pairs : 0;
+    /*
+     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
+     * in the future and resume the device if read-only operations between
+     * suspend and reset goes wrong.
+     */
+    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
+
+    /* Start will check migration setup_or_active to configure or not SVQ */
+    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
+    if (unlikely(r < 0)) {
+        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
+    }
+}
+
+static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
+{
+    MigrationState *migration = data;
+    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
+                                     migration_state);
+
+    if (migration_in_setup(migration)) {
+        vhost_vdpa_net_log_global_enable(s, true);
+    } else if (migration_has_failed(migration)) {
+        vhost_vdpa_net_log_global_enable(s, false);
+    }
+}
+
 static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
 {
     struct vhost_vdpa *v = &s->vhost_vdpa;
 
+    add_migration_state_change_notifier(&s->migration_state);
     if (v->shadow_vqs_enabled) {
         v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
                                            v->iova_range.last);
@@ -256,6 +308,15 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
 
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
+    if (s->always_svq ||
+        migration_is_setup_or_active(migrate_get_current()->state)) {
+        v->shadow_vqs_enabled = true;
+        v->shadow_data = true;
+    } else {
+        v->shadow_vqs_enabled = false;
+        v->shadow_data = false;
+    }
+
     if (v->index == 0) {
         vhost_vdpa_net_data_start_first(s);
         return 0;
@@ -276,6 +337,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
 
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
+    if (s->vhost_vdpa.index == 0) {
+        remove_migration_state_change_notifier(&s->migration_state);
+    }
+
     dev = s->vhost_vdpa.dev;
     if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
         g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
@@ -412,11 +477,12 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
     s = DO_UPCAST(VhostVDPAState, nc, nc);
     v = &s->vhost_vdpa;
 
-    v->shadow_data = s->always_svq;
+    s0 = vhost_vdpa_net_first_nc_vdpa(s);
+    v->shadow_data = s0->vhost_vdpa.shadow_vqs_enabled;
     v->shadow_vqs_enabled = s->always_svq;
     s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
 
-    if (s->always_svq) {
+    if (s->vhost_vdpa.shadow_data) {
         /* SVQ is already configured for all virtqueues */
         goto out;
     }
@@ -473,7 +539,6 @@ out:
         return 0;
     }
 
-    s0 = vhost_vdpa_net_first_nc_vdpa(s);
     if (s0->vhost_vdpa.iova_tree) {
         /*
          * SVQ is already configured for all virtqueues.  Reuse IOVA tree for
@@ -749,6 +814,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
     s->always_svq = svq;
+    s->migration_state.notify = vdpa_net_migration_state_notifier;
     s->vhost_vdpa.shadow_vqs_enabled = svq;
     s->vhost_vdpa.iova_range = iova_range;
     s->vhost_vdpa.shadow_data = svq;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 10/15] vdpa: disable RAM block discard only for the first device
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (8 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 09/15] vdpa: add vdpa net migration state notifier Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-27  8:11   ` Jason Wang
  2023-02-24 15:54 ` [PATCH v4 11/15] vdpa net: block migration if the device has CVQ Eugenio Pérez
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

Although it does not make a big difference, its more correct and
simplifies the cleanup path in subsequent patches.

Move ram_block_discard_disable(false) call to the top of
vhost_vdpa_cleanup because:
* We cannot use vhost_vdpa_first_dev after dev->opaque = NULL
  assignment.
* Improve the stack order in cleanup: since it is the last action taken
  in init, it should be the first at cleanup.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 71e3dc21fe..27655e7582 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -431,16 +431,6 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
     trace_vhost_vdpa_init(dev, opaque);
     int ret;
 
-    /*
-     * Similar to VFIO, we end up pinning all guest memory and have to
-     * disable discarding of RAM.
-     */
-    ret = ram_block_discard_disable(true);
-    if (ret) {
-        error_report("Cannot set discarding of RAM broken");
-        return ret;
-    }
-
     v = opaque;
     v->dev = dev;
     dev->opaque =  opaque ;
@@ -452,6 +442,16 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
         return 0;
     }
 
+    /*
+     * Similar to VFIO, we end up pinning all guest memory and have to
+     * disable discarding of RAM.
+     */
+    ret = ram_block_discard_disable(true);
+    if (ret) {
+        error_report("Cannot set discarding of RAM broken");
+        return ret;
+    }
+
     vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
                                VIRTIO_CONFIG_S_DRIVER);
 
@@ -577,12 +577,15 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
     v = dev->opaque;
     trace_vhost_vdpa_cleanup(dev, v);
+    if (vhost_vdpa_first_dev(dev)) {
+        ram_block_discard_disable(false);
+    }
+
     vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     memory_listener_unregister(&v->listener);
     vhost_vdpa_svq_cleanup(dev);
 
     dev->opaque = NULL;
-    ram_block_discard_disable(false);
 
     return 0;
 }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 11/15] vdpa net: block migration if the device has CVQ
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (9 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 10/15] vdpa: disable RAM block discard only for the first device Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-27  8:12   ` Jason Wang
  2023-02-24 15:54 ` [PATCH v4 12/15] vdpa: block migration if device has unsupported features Eugenio Pérez
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

Devices with CVQ needs to migrate state beyond vq state.  Leaving this
to future series.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v3: Migration blocker is registered in vhost_dev.
---
 include/hw/virtio/vhost-vdpa.h | 1 +
 hw/virtio/vhost-vdpa.c         | 1 +
 net/vhost-vdpa.c               | 9 +++++++++
 3 files changed, 11 insertions(+)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 4a7d396674..c278a2a8de 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -50,6 +50,7 @@ typedef struct vhost_vdpa {
     const VhostShadowVirtqueueOps *shadow_vq_ops;
     void *shadow_vq_ops_opaque;
     struct vhost_dev *dev;
+    Error *migration_blocker;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
 
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 27655e7582..25b64ae854 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -438,6 +438,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
     v->msg_type = VHOST_IOTLB_MSG_V2;
     vhost_vdpa_init_svq(dev, v);
 
+    error_propagate(&dev->migration_blocker, v->migration_blocker);
     if (!vhost_vdpa_first_dev(dev)) {
         return 0;
     }
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index c5512ddf10..4f983df000 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -828,6 +828,15 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
         s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
         s->vhost_vdpa.shadow_vq_ops_opaque = s;
+
+        /*
+         * TODO: We cannot migrate devices with CVQ as there is no way to set
+         * the device state (MAC, MQ, etc) before starting datapath.
+         *
+         * Migration blocker ownership now belongs to v
+         */
+        error_setg(&s->vhost_vdpa.migration_blocker,
+                   "net vdpa cannot migrate with CVQ feature");
     }
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 12/15] vdpa: block migration if device has unsupported features
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (10 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 11/15] vdpa net: block migration if the device has CVQ Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-27  8:15   ` Jason Wang
  2023-02-24 15:54 ` [PATCH v4 13/15] vdpa: block migration if SVQ does not admit a feature Eugenio Pérez
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

A vdpa net device must initialize with SVQ in order to be migratable at
this moment, and initialization code verifies some conditions.  If the
device is not initialized with the x-svq parameter, it will not expose
_F_LOG so the vhost subsystem will block VM migration from its
initialization.

Next patches change this, so we need to verify migration conditions
differently.

QEMU only supports a subset of net features in SVQ, and it cannot
migrate state that cannot track or restore in the destination.  Add a
migration blocker if the device offer an unsupported feature.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v3: add mirgation blocker properly so vhost_dev can handle it.
---
 net/vhost-vdpa.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 4f983df000..094dc1c2d0 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -795,7 +795,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                        int nvqs,
                                        bool is_datapath,
                                        bool svq,
-                                       struct vhost_vdpa_iova_range iova_range)
+                                       struct vhost_vdpa_iova_range iova_range,
+                                       uint64_t features)
 {
     NetClientState *nc = NULL;
     VhostVDPAState *s;
@@ -818,7 +819,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     s->vhost_vdpa.shadow_vqs_enabled = svq;
     s->vhost_vdpa.iova_range = iova_range;
     s->vhost_vdpa.shadow_data = svq;
-    if (!is_datapath) {
+    if (queue_pair_index == 0) {
+        vhost_vdpa_net_valid_svq_features(features,
+                                          &s->vhost_vdpa.migration_blocker);
+    } else if (!is_datapath) {
         s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
                                             vhost_vdpa_net_cvq_cmd_page_len());
         memset(s->cvq_cmd_out_buffer, 0, vhost_vdpa_net_cvq_cmd_page_len());
@@ -956,7 +960,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     for (i = 0; i < queue_pairs; i++) {
         ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
                                      vdpa_device_fd, i, 2, true, opts->x_svq,
-                                     iova_range);
+                                     iova_range, features);
         if (!ncs[i])
             goto err;
     }
@@ -964,7 +968,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
                                  vdpa_device_fd, i, 1, false,
-                                 opts->x_svq, iova_range);
+                                 opts->x_svq, iova_range, features);
         if (!nc)
             goto err;
     }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 13/15] vdpa: block migration if SVQ does not admit a feature
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (11 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 12/15] vdpa: block migration if device has unsupported features Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-24 15:54 ` [PATCH v4 14/15] vdpa net: allow VHOST_F_LOG_ALL Eugenio Pérez
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

Next patches enable devices to be migrated even if vdpa netdev has not
been started with x-svq. However, not all devices are migratable, so we
need to block migration if we detect that.

Block migration if we detect the device expose a feature SVQ does not
know how to work with.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 25b64ae854..8702780ad6 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -443,6 +443,21 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
         return 0;
     }
 
+    /*
+     * If dev->shadow_vqs_enabled at initialization that means the device has
+     * been started with x-svq=on, so don't block migration
+     */
+    if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
+        /* We don't have dev->features yet */
+        uint64_t features;
+        ret = vhost_vdpa_get_dev_features(dev, &features);
+        if (unlikely(ret)) {
+            error_setg_errno(errp, -ret, "Could not get device features");
+            return ret;
+        }
+        vhost_svq_valid_features(features, &dev->migration_blocker);
+    }
+
     /*
      * Similar to VFIO, we end up pinning all guest memory and have to
      * disable discarding of RAM.
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 14/15] vdpa net: allow VHOST_F_LOG_ALL
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (12 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 13/15] vdpa: block migration if SVQ does not admit a feature Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-24 15:54 ` [PATCH v4 15/15] vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices Eugenio Pérez
  2023-02-27 12:40 ` [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Alvaro Karsz
  15 siblings, 0 replies; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

Since some actions move to the start function instead of init, the
device features may not be the parent vdpa device's, but the one
returned by vhost backend.  If transition to SVQ is supported, the vhost
backend will return _F_LOG_ALL to signal the device is migratable.

Add VHOST_F_LOG_ALL.  HW dirty page tracking can be added on top of this
change if the device supports it in the future.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 net/vhost-vdpa.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 094dc1c2d0..f55bb31400 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -101,6 +101,8 @@ static const uint64_t vdpa_svq_device_features =
     BIT_ULL(VIRTIO_NET_F_MQ) |
     BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
     BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
+    /* VHOST_F_LOG_ALL is exposed by SVQ */
+    BIT_ULL(VHOST_F_LOG_ALL) |
     BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
     BIT_ULL(VIRTIO_NET_F_STANDBY);
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 15/15] vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (13 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 14/15] vdpa net: allow VHOST_F_LOG_ALL Eugenio Pérez
@ 2023-02-24 15:54 ` Eugenio Pérez
  2023-02-27 12:40 ` [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Alvaro Karsz
  15 siblings, 0 replies; 48+ messages in thread
From: Eugenio Pérez @ 2023-02-24 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

vhost-vdpa devices can return this features now that blockers have been
set in case some features are not met.

Expose VHOST_F_LOG_ALL only in that case.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 8702780ad6..2a66cb51fc 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1307,10 +1307,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
 static int vhost_vdpa_get_features(struct vhost_dev *dev,
                                      uint64_t *features)
 {
-    struct vhost_vdpa *v = dev->opaque;
     int ret = vhost_vdpa_get_dev_features(dev, features);
 
-    if (ret == 0 && v->shadow_vqs_enabled) {
+    if (ret == 0) {
         /* Add SVQ logging capabilities */
         *features |= BIT_ULL(VHOST_F_LOG_ALL);
     }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 01/15] vdpa net: move iova tree creation from init to start
  2023-02-24 15:54 ` [PATCH v4 01/15] vdpa net: move iova tree creation from init to start Eugenio Pérez
@ 2023-02-27  7:04   ` Jason Wang
  2023-03-01  7:01     ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-02-27  7:04 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Gautam Dawar, Laurent Vivier,
	alvaro.karsz, longpeng2, virtualization, Stefan Hajnoczi,
	Cindy Lu, Michael S. Tsirkin, si-wei.liu, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang


在 2023/2/24 23:54, Eugenio Pérez 写道:
> Only create iova_tree if and when it is needed.
>
> The cleanup keeps being responsible of last VQ but this change allows it
> to merge both cleanup functions.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
> ---
> v4:
> * Remove leak of iova_tree because double allocation
> * Document better the sharing of IOVA tree between data and CVQ
> ---
>   net/vhost-vdpa.c | 113 ++++++++++++++++++++++++++++++++++-------------
>   1 file changed, 83 insertions(+), 30 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index de5ed8ff22..b89c99066a 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -178,13 +178,9 @@ err_init:
>   static void vhost_vdpa_cleanup(NetClientState *nc)
>   {
>       VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> -    struct vhost_dev *dev = &s->vhost_net->dev;
>   
>       qemu_vfree(s->cvq_cmd_out_buffer);
>       qemu_vfree(s->status);
> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> -    }
>       if (s->vhost_net) {
>           vhost_net_cleanup(s->vhost_net);
>           g_free(s->vhost_net);
> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>       return size;
>   }
>   
> +/** From any vdpa net client, get the netclient of first queue pair */
> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> +{
> +    NICState *nic = qemu_get_nic(s->nc.peer);
> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> +
> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
> +}
> +
> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> +{
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +
> +    if (v->shadow_vqs_enabled) {
> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> +                                           v->iova_range.last);
> +    }
> +}
> +
> +static int vhost_vdpa_net_data_start(NetClientState *nc)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +
> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +
> +    if (v->index == 0) {
> +        vhost_vdpa_net_data_start_first(s);
> +        return 0;
> +    }
> +
> +    if (v->shadow_vqs_enabled) {
> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> +    }
> +
> +    return 0;
> +}
> +
> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_dev *dev;
> +
> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +
> +    dev = s->vhost_vdpa.dev;
> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> +    }
> +}
> +
>   static NetClientInfo net_vhost_vdpa_info = {
>           .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>           .size = sizeof(VhostVDPAState),
>           .receive = vhost_vdpa_receive,
> +        .start = vhost_vdpa_net_data_start,
> +        .stop = vhost_vdpa_net_client_stop,


Looking at the implementation, it seems nothing net specific, any reason 
we can't simply use vhost_vdpa_dev_start()?


>           .cleanup = vhost_vdpa_cleanup,
>           .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
>           .has_ufo = vhost_vdpa_has_ufo,
> @@ -351,7 +401,7 @@ dma_map_err:
>   
>   static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>   {
> -    VhostVDPAState *s;
> +    VhostVDPAState *s, *s0;
>       struct vhost_vdpa *v;
>       uint64_t backend_features;
>       int64_t cvq_group;
> @@ -415,8 +465,6 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>           return r;
>       }
>   
> -    v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> -                                       v->iova_range.last);
>       v->shadow_vqs_enabled = true;
>       s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
>   
> @@ -425,6 +473,27 @@ out:
>           return 0;
>       }
>   
> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +    if (s0->vhost_vdpa.iova_tree) {
> +        /*
> +         * SVQ is already configured for all virtqueues.  Reuse IOVA tree for
> +         * simplicity, wether CVQ shares ASID with guest or not, because:


Typo, should be "whether", or "regardless of whether"(not a native speaker).

Other looks good.

Thanks


> +         * - Memory listener need access to guest's memory addresses allocated
> +         *   in the IOVA tree.
> +         * - There should be plenty of IOVA address space for both ASID not to
> +         *   worry about collisions between them.  Guest's translations are
> +         *   still validated with virtio virtqueue_pop so there is no risk for
> +         *   the guest to access memory it shouldn't.
> +         *
> +         * To allocate a iova tree per ASID is doable but it complicates the
> +         * code and it is not worth for the moment.
> +         */
> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> +    } else {
> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> +                                           v->iova_range.last);
> +    }
> +
>       r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
>                                  vhost_vdpa_net_cvq_cmd_page_len(), false);
>       if (unlikely(r < 0)) {
> @@ -449,15 +518,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>       if (s->vhost_vdpa.shadow_vqs_enabled) {
>           vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>           vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> -        if (!s->always_svq) {
> -            /*
> -             * If only the CVQ is shadowed we can delete this safely.
> -             * If all the VQs are shadows this will be needed by the time the
> -             * device is started again to register SVQ vrings and similar.
> -             */
> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> -        }
>       }
> +
> +    vhost_vdpa_net_client_stop(nc);
>   }
>   
>   static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
> @@ -667,8 +730,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>                                          int nvqs,
>                                          bool is_datapath,
>                                          bool svq,
> -                                       struct vhost_vdpa_iova_range iova_range,
> -                                       VhostIOVATree *iova_tree)
> +                                       struct vhost_vdpa_iova_range iova_range)
>   {
>       NetClientState *nc = NULL;
>       VhostVDPAState *s;
> @@ -690,7 +752,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>       s->vhost_vdpa.shadow_vqs_enabled = svq;
>       s->vhost_vdpa.iova_range = iova_range;
>       s->vhost_vdpa.shadow_data = svq;
> -    s->vhost_vdpa.iova_tree = iova_tree;
>       if (!is_datapath) {
>           s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>                                               vhost_vdpa_net_cvq_cmd_page_len());
> @@ -760,7 +821,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       uint64_t features;
>       int vdpa_device_fd;
>       g_autofree NetClientState **ncs = NULL;
> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
>       struct vhost_vdpa_iova_range iova_range;
>       NetClientState *nc;
>       int queue_pairs, r, i = 0, has_cvq = 0;
> @@ -812,12 +872,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>           goto err;
>       }
>   
> -    if (opts->x_svq) {
> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
> -            goto err_svq;
> -        }
> -
> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
> +        goto err;
>       }
>   
>       ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> @@ -825,7 +881,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       for (i = 0; i < queue_pairs; i++) {
>           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                        vdpa_device_fd, i, 2, true, opts->x_svq,
> -                                     iova_range, iova_tree);
> +                                     iova_range);
>           if (!ncs[i])
>               goto err;
>       }
> @@ -833,13 +889,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       if (has_cvq) {
>           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                    vdpa_device_fd, i, 1, false,
> -                                 opts->x_svq, iova_range, iova_tree);
> +                                 opts->x_svq, iova_range);
>           if (!nc)
>               goto err;
>       }
>   
> -    /* iova_tree ownership belongs to last NetClientState */
> -    g_steal_pointer(&iova_tree);
>       return 0;
>   
>   err:
> @@ -849,7 +903,6 @@ err:
>           }
>       }
>   
> -err_svq:
>       qemu_close(vdpa_device_fd);
>   
>       return -1;



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 03/15] vdpa: stop svq at vhost_vdpa_dev_start(false)
  2023-02-24 15:54 ` [PATCH v4 03/15] vdpa: stop svq at vhost_vdpa_dev_start(false) Eugenio Pérez
@ 2023-02-27  7:15   ` Jason Wang
  2023-03-03 16:29     ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-02-27  7:15 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Gautam Dawar, Laurent Vivier,
	alvaro.karsz, longpeng2, virtualization, Stefan Hajnoczi,
	Cindy Lu, Michael S. Tsirkin, si-wei.liu, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang


在 2023/2/24 23:54, Eugenio Pérez 写道:
> It used to be done at vhost_vdpa_svq_cleanup, since a device couldn't
> switch to SVQ mode dynamically.  Now that we need to fetch the state and
> trust SVQ will not modify guest's used_idx at migration, stop
> effectively SVQ at suspend time, as a real device would do.
>
> Leaving old vhost_svq_stop call at vhost_vdpa_svq_cleanup, as its
> supported to call it many times and it follows other operations that are
> called redundantly there too:
> * vhost_vdpa_host_notifiers_uninit
> * memory_listener_unregister
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v3: New in v3
> ---
>   hw/virtio/vhost-vdpa.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 4f72a52a43..d9260191cc 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1100,10 +1100,11 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
>   
>       for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>           VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> +
> +        vhost_svq_stop(svq);
>           vhost_vdpa_svq_unmap_rings(dev, svq);
>   
>           event_notifier_cleanup(&svq->hdev_kick);
> -        event_notifier_cleanup(&svq->hdev_call);


Any reason we need to not clean callfd? (Not explained in the change 
log, or should be another patch?).

Thanks


>       }
>   }
>   



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 05/15] vdpa: move vhost reset after get vring base
  2023-02-24 15:54 ` [PATCH v4 05/15] vdpa: move vhost reset after get vring base Eugenio Pérez
@ 2023-02-27  7:22   ` Jason Wang
  2023-03-01 19:11     ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-02-27  7:22 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Gautam Dawar, Laurent Vivier,
	alvaro.karsz, longpeng2, virtualization, Stefan Hajnoczi,
	Cindy Lu, Michael S. Tsirkin, si-wei.liu, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang


在 2023/2/24 23:54, Eugenio Pérez 写道:
> The function vhost.c:vhost_dev_stop calls vhost operation
> vhost_dev_start(false). In the case of vdpa it totally reset and wipes
> the device, making the fetching of the vring base (virtqueue state) totally
> useless.


As discussed before, should we do something reverse: in 
vhost_vdpa_dev_started() since what proposed in the patch doesn't solve 
the issue (index could be moved after the get_vring_base())

1) if _F_SUSPEND is negotiated, suspend instead of reset

2) if _F_SUSPEND is not negotiated, reset and fail 
vhost_get_vring_base() to allow graceful fallback?

Thanks


>
> The kernel backend does not use vhost_dev_start vhost op callback, but
> vhost-user do. A patch to make vhost_user_dev_start more similar to vdpa
> is desirable, but it can be added on top.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/hw/virtio/vhost-backend.h |  4 ++++
>   hw/virtio/vhost-vdpa.c            | 22 ++++++++++++++++------
>   hw/virtio/vhost.c                 |  3 +++
>   3 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index c5ab49051e..ec3fbae58d 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -130,6 +130,9 @@ typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
>   
>   typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
>                                          int fd);
> +
> +typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
> +
>   typedef struct VhostOps {
>       VhostBackendType backend_type;
>       vhost_backend_init vhost_backend_init;
> @@ -177,6 +180,7 @@ typedef struct VhostOps {
>       vhost_get_device_id_op vhost_get_device_id;
>       vhost_force_iommu_op vhost_force_iommu;
>       vhost_set_config_call_op vhost_set_config_call;
> +    vhost_reset_status_op vhost_reset_status;
>   } VhostOps;
>   
>   int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 4fac144169..8cc9c98db9 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1134,14 +1134,23 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>       if (started) {
>           memory_listener_register(&v->listener, &address_space_memory);
>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> -    } else {
> -        vhost_vdpa_reset_device(dev);
> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> -                                   VIRTIO_CONFIG_S_DRIVER);
> -        memory_listener_unregister(&v->listener);
> +    }
>   
> -        return 0;
> +    return 0;
> +}
> +
> +static void vhost_vdpa_reset_status(struct vhost_dev *dev)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> +        return;
>       }
> +
> +    vhost_vdpa_reset_device(dev);
> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +                               VIRTIO_CONFIG_S_DRIVER);
> +    memory_listener_unregister(&v->listener);
>   }
>   
>   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> @@ -1328,4 +1337,5 @@ const VhostOps vdpa_ops = {
>           .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
>           .vhost_force_iommu = vhost_vdpa_force_iommu,
>           .vhost_set_config_call = vhost_vdpa_set_config_call,
> +        .vhost_reset_status = vhost_vdpa_reset_status,
>   };
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index eb8c4c378c..a266396576 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -2049,6 +2049,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
>                                hdev->vqs + i,
>                                hdev->vq_index + i);
>       }
> +    if (hdev->vhost_ops->vhost_reset_status) {
> +        hdev->vhost_ops->vhost_reset_status(hdev);
> +    }
>   
>       if (vhost_dev_has_iommu(hdev)) {
>           if (hdev->vhost_ops->vhost_set_iotlb_callback) {



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 06/15] vdpa: add vhost_vdpa->suspended parameter
  2023-02-24 15:54 ` [PATCH v4 06/15] vdpa: add vhost_vdpa->suspended parameter Eugenio Pérez
@ 2023-02-27  7:24   ` Jason Wang
  2023-03-01 19:11     ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-02-27  7:24 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Gautam Dawar, Laurent Vivier,
	alvaro.karsz, longpeng2, virtualization, Stefan Hajnoczi,
	Cindy Lu, Michael S. Tsirkin, si-wei.liu, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang


在 2023/2/24 23:54, Eugenio Pérez 写道:
> This allows vhost_vdpa to track if it is safe to get vring base from the
> device or not.  If it is not, vhost can fall back to fetch idx from the
> guest buffer again.
>
> No functional change intended in this patch, later patches will use this
> field.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


I think we probably need to re-order the patch, e.g to let this come 
before at least patch 5.


> ---
>   include/hw/virtio/vhost-vdpa.h | 2 ++
>   hw/virtio/vhost-vdpa.c         | 8 ++++++++
>   2 files changed, 10 insertions(+)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 7997f09a8d..4a7d396674 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -42,6 +42,8 @@ typedef struct vhost_vdpa {
>       bool shadow_vqs_enabled;
>       /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */
>       bool shadow_data;
> +    /* Device suspended successfully */
> +    bool suspended;


Should we implement the set/clear in this patch as well?

Thanks


>       /* IOVA mapping used by the Shadow Virtqueue */
>       VhostIOVATree *iova_tree;
>       GPtrArray *shadow_vqs;
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 8cc9c98db9..228677895a 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1227,6 +1227,14 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>           return 0;
>       }
>   
> +    if (!v->suspended) {
> +        /*
> +         * Cannot trust in value returned by device, let vhost recover used
> +         * idx from guest.
> +         */
> +        return -1;
> +    }
> +
>       ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
>       trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
>       return ret;



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 07/15] vdpa: add vhost_vdpa_suspend
  2023-02-24 15:54 ` [PATCH v4 07/15] vdpa: add vhost_vdpa_suspend Eugenio Pérez
@ 2023-02-27  7:27   ` Jason Wang
  2023-03-01  1:30   ` Si-Wei Liu
  1 sibling, 0 replies; 48+ messages in thread
From: Jason Wang @ 2023-02-27  7:27 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Gautam Dawar, Laurent Vivier,
	alvaro.karsz, longpeng2, virtualization, Stefan Hajnoczi,
	Cindy Lu, Michael S. Tsirkin, si-wei.liu, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang


在 2023/2/24 23:54, Eugenio Pérez 写道:
> The function vhost.c:vhost_dev_stop fetches the vring base so the vq
> state can be migrated to other devices.  However, this is unreliable in
> vdpa, since we didn't signal the device to suspend the queues, making
> the value fetched useless.
>
> Suspend the device if possible before fetching first and subsequent
> vring bases.
>
> Moreover, vdpa totally reset and wipes the device at the last device
> before fetch its vrings base, making that operation useless in the last
> device. This will be fixed in later patches of this series.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


I suggest to squash this into patch 5 (or even squash patch 6 into this) 
since it's not good to introduce a bug in 5 and fix in 7.


> ---
> v4:
> * Look for _F_SUSPEND at vhost_dev->backend_cap, not backend_features
> * Fall back on reset & fetch used idx from guest's memory


A hint to squash patch 6.

Thanks


> ---
>   hw/virtio/vhost-vdpa.c | 25 +++++++++++++++++++++++++
>   hw/virtio/trace-events |  1 +
>   2 files changed, 26 insertions(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 228677895a..f542960a64 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -712,6 +712,7 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>   
>       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>       trace_vhost_vdpa_reset_device(dev, status);
> +    v->suspended = false;
>       return ret;
>   }
>   
> @@ -1109,6 +1110,29 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
>       }
>   }
>   
> +static void vhost_vdpa_suspend(struct vhost_dev *dev)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    int r;
> +
> +    if (!vhost_vdpa_first_dev(dev)) {
> +        return;
> +    }
> +
> +    if (!(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
> +        trace_vhost_vdpa_suspend(dev);
> +        r = ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
> +        if (unlikely(r)) {
> +            error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno);
> +        } else {
> +            v->suspended = true;
> +            return;
> +        }
> +    }
> +
> +    vhost_vdpa_reset_device(dev);
> +}
> +
>   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> @@ -1123,6 +1147,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>           }
>           vhost_vdpa_set_vring_ready(dev);
>       } else {
> +        vhost_vdpa_suspend(dev);
>           vhost_vdpa_svqs_stop(dev);
>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>       }
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index a87c5f39a2..8f8d05cf9b 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -50,6 +50,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
>   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
>   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
>   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
> +vhost_vdpa_suspend(void *dev) "dev: %p"
>   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
>   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
>   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 08/15] vdpa: rewind at get_base, not set_base
  2023-02-24 15:54 ` [PATCH v4 08/15] vdpa: rewind at get_base, not set_base Eugenio Pérez
@ 2023-02-27  7:34   ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2023-02-27  7:34 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Gautam Dawar, Laurent Vivier,
	alvaro.karsz, longpeng2, virtualization, Stefan Hajnoczi,
	Cindy Lu, Michael S. Tsirkin, si-wei.liu, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang


在 2023/2/24 23:54, Eugenio Pérez 写道:
> At this moment it is only possible to migrate to a vdpa device running
> with x-svq=on. As a protective measure, the rewind of the inflight
> descriptors was done at the destination. That way if the source sent a
> virtqueue with inuse descriptors they are always discarded.
>
> Since this series allows to migrate also to passthrough devices with no
> SVQ, the right thing to do is to rewind at the source so the base of
> vrings are correct.
>
> Support for inflight descriptors may be added in the future.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> ---
> v4:
> * Use virtqueue_unpop at vhost_svq_stop instead of rewinding at
>    vhost_vdpa_get_vring_base.
> ---
>   hw/virtio/vhost-shadow-virtqueue.c |  8 ++++++--
>   hw/virtio/vhost-vdpa.c             | 11 -----------
>   2 files changed, 6 insertions(+), 13 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 4307296358..523b379439 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -694,13 +694,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>           g_autofree VirtQueueElement *elem = NULL;
>           elem = g_steal_pointer(&svq->desc_state[i].elem);
>           if (elem) {
> -            virtqueue_detach_element(svq->vq, elem, 0);
> +            /*
> +             * TODO: This is ok for networking, but other kinds of devices
> +             * might have problems with just unpop these.
> +             */
> +            virtqueue_unpop(svq->vq, elem, 0);
>           }
>       }
>   
>       next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
>       if (next_avail_elem) {
> -        virtqueue_detach_element(svq->vq, next_avail_elem, 0);
> +        virtqueue_unpop(svq->vq, next_avail_elem, 0);
>       }
>       svq->vq = NULL;
>       g_free(svq->desc_next);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index f542960a64..71e3dc21fe 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1218,18 +1218,7 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
>                                          struct vhost_vring_state *ring)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> -    VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
>   
> -    /*
> -     * vhost-vdpa devices does not support in-flight requests. Set all of them
> -     * as available.
> -     *
> -     * TODO: This is ok for networking, but other kinds of devices might
> -     * have problems with these retransmissions.
> -     */
> -    while (virtqueue_rewind(vq, 1)) {
> -        continue;
> -    }
>       if (v->shadow_vqs_enabled) {
>           /*
>            * Device vring base was set at device start. SVQ base is handled by



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 09/15] vdpa: add vdpa net migration state notifier
  2023-02-24 15:54 ` [PATCH v4 09/15] vdpa: add vdpa net migration state notifier Eugenio Pérez
@ 2023-02-27  8:08   ` Jason Wang
  2023-03-01 19:26     ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-02-27  8:08 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Gautam Dawar, Laurent Vivier,
	alvaro.karsz, longpeng2, virtualization, Stefan Hajnoczi,
	Cindy Lu, Michael S. Tsirkin, si-wei.liu, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang


在 2023/2/24 23:54, Eugenio Pérez 写道:
> This allows net to restart the device backend to configure SVQ on it.
>
> Ideally, these changes should not be net specific. However, the vdpa net
> backend is the one with enough knowledge to configure everything because
> of some reasons:
> * Queues might need to be shadowed or not depending on its kind (control
>    vs data).
> * Queues need to share the same map translations (iova tree).
>
> Because of that it is cleaner to restart the whole net backend and
> configure again as expected, similar to how vhost-kernel moves between
> userspace and passthrough.
>
> If more kinds of devices need dynamic switching to SVQ we can create a
> callback struct like VhostOps and move most of the code there.
> VhostOps cannot be reused since all vdpa backend share them, and to
> personalize just for networking would be too heavy.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v4:
> * Delete duplication of set shadow_data and shadow_vqs_enabled moving it
>    to data / cvq net start functions.
>
> v3:
> * Check for migration state at vdpa device start to enable SVQ in data
>    vqs.
>
> v1 from RFC:
> * Add TODO to use the resume operation in the future.
> * Use migration_in_setup and migration_has_failed instead of a
>    complicated switch case.
> ---
>   net/vhost-vdpa.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 69 insertions(+), 3 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index b89c99066a..c5512ddf10 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -26,12 +26,15 @@
>   #include <err.h>
>   #include "standard-headers/linux/virtio_net.h"
>   #include "monitor/monitor.h"
> +#include "migration/migration.h"
> +#include "migration/misc.h"
>   #include "hw/virtio/vhost.h"
>   
>   /* Todo:need to add the multiqueue support here */
>   typedef struct VhostVDPAState {
>       NetClientState nc;
>       struct vhost_vdpa vhost_vdpa;
> +    Notifier migration_state;
>       VHostNetState *vhost_net;
>   
>       /* Control commands shadow buffers */
> @@ -239,10 +242,59 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>       return DO_UPCAST(VhostVDPAState, nc, nc0);
>   }
>   
> +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
> +{
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +    VirtIONet *n;
> +    VirtIODevice *vdev;
> +    int data_queue_pairs, cvq, r;
> +
> +    /* We are only called on the first data vqs and only if x-svq is not set */
> +    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> +        return;
> +    }
> +
> +    vdev = v->dev->vdev;
> +    n = VIRTIO_NET(vdev);
> +    if (!n->vhost_started) {
> +        return;
> +    }
> +
> +    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> +    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> +                                  n->max_ncs - n->max_queue_pairs : 0;
> +    /*
> +     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
> +     * in the future and resume the device if read-only operations between
> +     * suspend and reset goes wrong.
> +     */
> +    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> +
> +    /* Start will check migration setup_or_active to configure or not SVQ */
> +    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> +    if (unlikely(r < 0)) {
> +        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
> +    }
> +}
> +
> +static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
> +{
> +    MigrationState *migration = data;
> +    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
> +                                     migration_state);
> +
> +    if (migration_in_setup(migration)) {
> +        vhost_vdpa_net_log_global_enable(s, true);
> +    } else if (migration_has_failed(migration)) {
> +        vhost_vdpa_net_log_global_enable(s, false);
> +    }
> +}
> +
>   static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>   {
>       struct vhost_vdpa *v = &s->vhost_vdpa;
>   
> +    add_migration_state_change_notifier(&s->migration_state);
>       if (v->shadow_vqs_enabled) {
>           v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>                                              v->iova_range.last);
> @@ -256,6 +308,15 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
>   
>       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>   
> +    if (s->always_svq ||
> +        migration_is_setup_or_active(migrate_get_current()->state)) {
> +        v->shadow_vqs_enabled = true;
> +        v->shadow_data = true;
> +    } else {
> +        v->shadow_vqs_enabled = false;
> +        v->shadow_data = false;
> +    }
> +
>       if (v->index == 0) {
>           vhost_vdpa_net_data_start_first(s);
>           return 0;
> @@ -276,6 +337,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
>   
>       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>   
> +    if (s->vhost_vdpa.index == 0) {
> +        remove_migration_state_change_notifier(&s->migration_state);
> +    }


This should work but I just realize that vhost support 
vhost_dev_set_log(), I wonder if it would be simpler to go with that way.

Using vhost_virtqueue_set_addr(, enable_log = true)?

Thanks


> +
>       dev = s->vhost_vdpa.dev;
>       if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>           g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> @@ -412,11 +477,12 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>       s = DO_UPCAST(VhostVDPAState, nc, nc);
>       v = &s->vhost_vdpa;
>   
> -    v->shadow_data = s->always_svq;
> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +    v->shadow_data = s0->vhost_vdpa.shadow_vqs_enabled;
>       v->shadow_vqs_enabled = s->always_svq;
>       s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
>   
> -    if (s->always_svq) {
> +    if (s->vhost_vdpa.shadow_data) {
>           /* SVQ is already configured for all virtqueues */
>           goto out;
>       }
> @@ -473,7 +539,6 @@ out:
>           return 0;
>       }
>   
> -    s0 = vhost_vdpa_net_first_nc_vdpa(s);
>       if (s0->vhost_vdpa.iova_tree) {
>           /*
>            * SVQ is already configured for all virtqueues.  Reuse IOVA tree for
> @@ -749,6 +814,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>       s->vhost_vdpa.device_fd = vdpa_device_fd;
>       s->vhost_vdpa.index = queue_pair_index;
>       s->always_svq = svq;
> +    s->migration_state.notify = vdpa_net_migration_state_notifier;
>       s->vhost_vdpa.shadow_vqs_enabled = svq;
>       s->vhost_vdpa.iova_range = iova_range;
>       s->vhost_vdpa.shadow_data = svq;



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 10/15] vdpa: disable RAM block discard only for the first device
  2023-02-24 15:54 ` [PATCH v4 10/15] vdpa: disable RAM block discard only for the first device Eugenio Pérez
@ 2023-02-27  8:11   ` Jason Wang
  2023-03-02 15:11     ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-02-27  8:11 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Gautam Dawar, Laurent Vivier,
	alvaro.karsz, longpeng2, virtualization, Stefan Hajnoczi,
	Cindy Lu, Michael S. Tsirkin, si-wei.liu, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang


在 2023/2/24 23:54, Eugenio Pérez 写道:
> Although it does not make a big difference, its more correct and
> simplifies the cleanup path in subsequent patches.
>
> Move ram_block_discard_disable(false) call to the top of
> vhost_vdpa_cleanup because:
> * We cannot use vhost_vdpa_first_dev after dev->opaque = NULL
>    assignment.
> * Improve the stack order in cleanup: since it is the last action taken
>    in init, it should be the first at cleanup.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 25 ++++++++++++++-----------
>   1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 71e3dc21fe..27655e7582 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -431,16 +431,6 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>       trace_vhost_vdpa_init(dev, opaque);
>       int ret;
>   
> -    /*
> -     * Similar to VFIO, we end up pinning all guest memory and have to
> -     * disable discarding of RAM.
> -     */
> -    ret = ram_block_discard_disable(true);
> -    if (ret) {
> -        error_report("Cannot set discarding of RAM broken");
> -        return ret;
> -    }
> -
>       v = opaque;
>       v->dev = dev;
>       dev->opaque =  opaque ;
> @@ -452,6 +442,16 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>           return 0;
>       }
>   
> +    /*
> +     * Similar to VFIO, we end up pinning all guest memory and have to
> +     * disable discarding of RAM.
> +     */
> +    ret = ram_block_discard_disable(true);
> +    if (ret) {
> +        error_report("Cannot set discarding of RAM broken");
> +        return ret;
> +    }


We seems to lose the chance to free svq allocated by 
vhost_vdpa_init_svq() in this case?

Thanks


> +
>       vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>                                  VIRTIO_CONFIG_S_DRIVER);
>   
> @@ -577,12 +577,15 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>       assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
>       v = dev->opaque;
>       trace_vhost_vdpa_cleanup(dev, v);
> +    if (vhost_vdpa_first_dev(dev)) {
> +        ram_block_discard_disable(false);
> +    }
> +
>       vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>       memory_listener_unregister(&v->listener);
>       vhost_vdpa_svq_cleanup(dev);
>   
>       dev->opaque = NULL;
> -    ram_block_discard_disable(false);
>   
>       return 0;
>   }



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 11/15] vdpa net: block migration if the device has CVQ
  2023-02-24 15:54 ` [PATCH v4 11/15] vdpa net: block migration if the device has CVQ Eugenio Pérez
@ 2023-02-27  8:12   ` Jason Wang
  2023-03-02 15:13     ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-02-27  8:12 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Gautam Dawar, Laurent Vivier,
	alvaro.karsz, longpeng2, virtualization, Stefan Hajnoczi,
	Cindy Lu, Michael S. Tsirkin, si-wei.liu, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang


在 2023/2/24 23:54, Eugenio Pérez 写道:
> Devices with CVQ needs to migrate state beyond vq state.  Leaving this
> to future series.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v3: Migration blocker is registered in vhost_dev.
> ---
>   include/hw/virtio/vhost-vdpa.h | 1 +
>   hw/virtio/vhost-vdpa.c         | 1 +
>   net/vhost-vdpa.c               | 9 +++++++++
>   3 files changed, 11 insertions(+)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 4a7d396674..c278a2a8de 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -50,6 +50,7 @@ typedef struct vhost_vdpa {
>       const VhostShadowVirtqueueOps *shadow_vq_ops;
>       void *shadow_vq_ops_opaque;
>       struct vhost_dev *dev;
> +    Error *migration_blocker;
>       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>   } VhostVDPA;
>   
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 27655e7582..25b64ae854 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -438,6 +438,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>       v->msg_type = VHOST_IOTLB_MSG_V2;
>       vhost_vdpa_init_svq(dev, v);
>   
> +    error_propagate(&dev->migration_blocker, v->migration_blocker);
>       if (!vhost_vdpa_first_dev(dev)) {
>           return 0;
>       }
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index c5512ddf10..4f983df000 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -828,6 +828,15 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>   
>           s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
>           s->vhost_vdpa.shadow_vq_ops_opaque = s;
> +
> +        /*
> +         * TODO: We cannot migrate devices with CVQ as there is no way to set
> +         * the device state (MAC, MQ, etc) before starting datapath.
> +         *
> +         * Migration blocker ownership now belongs to v


The sentence is incomplete.

Other looks good.

Thanks


> +         */
> +        error_setg(&s->vhost_vdpa.migration_blocker,
> +                   "net vdpa cannot migrate with CVQ feature");
>       }
>       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>       if (ret) {



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 12/15] vdpa: block migration if device has unsupported features
  2023-02-24 15:54 ` [PATCH v4 12/15] vdpa: block migration if device has unsupported features Eugenio Pérez
@ 2023-02-27  8:15   ` Jason Wang
  2023-02-27  8:19     ` Jason Wang
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-02-27  8:15 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Gautam Dawar, Laurent Vivier,
	alvaro.karsz, longpeng2, virtualization, Stefan Hajnoczi,
	Cindy Lu, Michael S. Tsirkin, si-wei.liu, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang


在 2023/2/24 23:54, Eugenio Pérez 写道:
> A vdpa net device must initialize with SVQ in order to be migratable at
> this moment, and initialization code verifies some conditions.  If the
> device is not initialized with the x-svq parameter, it will not expose
> _F_LOG so the vhost subsystem will block VM migration from its
> initialization.
>
> Next patches change this, so we need to verify migration conditions
> differently.
>
> QEMU only supports a subset of net features in SVQ, and it cannot
> migrate state that cannot track or restore in the destination.  Add a
> migration blocker if the device offer an unsupported feature.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v3: add mirgation blocker properly so vhost_dev can handle it.
> ---
>   net/vhost-vdpa.c | 12 ++++++++----
>   1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 4f983df000..094dc1c2d0 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -795,7 +795,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>                                          int nvqs,
>                                          bool is_datapath,
>                                          bool svq,
> -                                       struct vhost_vdpa_iova_range iova_range)
> +                                       struct vhost_vdpa_iova_range iova_range,
> +                                       uint64_t features)
>   {
>       NetClientState *nc = NULL;
>       VhostVDPAState *s;
> @@ -818,7 +819,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>       s->vhost_vdpa.shadow_vqs_enabled = svq;
>       s->vhost_vdpa.iova_range = iova_range;
>       s->vhost_vdpa.shadow_data = svq;
> -    if (!is_datapath) {
> +    if (queue_pair_index == 0) {
> +        vhost_vdpa_net_valid_svq_features(features,
> +                                          &s->vhost_vdpa.migration_blocker);


Since we do validation at initialization, is this necessary to valid 
once again in other places?

Thanks


> +    } else if (!is_datapath) {
>           s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>                                               vhost_vdpa_net_cvq_cmd_page_len());
>           memset(s->cvq_cmd_out_buffer, 0, vhost_vdpa_net_cvq_cmd_page_len());
> @@ -956,7 +960,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       for (i = 0; i < queue_pairs; i++) {
>           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                        vdpa_device_fd, i, 2, true, opts->x_svq,
> -                                     iova_range);
> +                                     iova_range, features);
>           if (!ncs[i])
>               goto err;
>       }
> @@ -964,7 +968,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       if (has_cvq) {
>           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                    vdpa_device_fd, i, 1, false,
> -                                 opts->x_svq, iova_range);
> +                                 opts->x_svq, iova_range, features);
>           if (!nc)
>               goto err;
>       }



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 12/15] vdpa: block migration if device has unsupported features
  2023-02-27  8:15   ` Jason Wang
@ 2023-02-27  8:19     ` Jason Wang
  2023-03-01 19:32       ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-02-27  8:19 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Gautam Dawar, Laurent Vivier,
	alvaro.karsz, longpeng2, virtualization, Stefan Hajnoczi,
	Cindy Lu, Michael S. Tsirkin, si-wei.liu, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang

On Mon, Feb 27, 2023 at 4:15 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > A vdpa net device must initialize with SVQ in order to be migratable at
> > this moment, and initialization code verifies some conditions.  If the
> > device is not initialized with the x-svq parameter, it will not expose
> > _F_LOG so the vhost subsystem will block VM migration from its
> > initialization.
> >
> > Next patches change this, so we need to verify migration conditions
> > differently.
> >
> > QEMU only supports a subset of net features in SVQ, and it cannot
> > migrate state that cannot track or restore in the destination.  Add a
> > migration blocker if the device offer an unsupported feature.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v3: add mirgation blocker properly so vhost_dev can handle it.
> > ---
> >   net/vhost-vdpa.c | 12 ++++++++----
> >   1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 4f983df000..094dc1c2d0 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -795,7 +795,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >                                          int nvqs,
> >                                          bool is_datapath,
> >                                          bool svq,
> > -                                       struct vhost_vdpa_iova_range iova_range)
> > +                                       struct vhost_vdpa_iova_range iova_range,
> > +                                       uint64_t features)
> >   {
> >       NetClientState *nc = NULL;
> >       VhostVDPAState *s;
> > @@ -818,7 +819,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >       s->vhost_vdpa.shadow_vqs_enabled = svq;
> >       s->vhost_vdpa.iova_range = iova_range;
> >       s->vhost_vdpa.shadow_data = svq;
> > -    if (!is_datapath) {
> > +    if (queue_pair_index == 0) {
> > +        vhost_vdpa_net_valid_svq_features(features,
> > +                                          &s->vhost_vdpa.migration_blocker);
>
>
> Since we do validation at initialization, is this necessary to valid
> once again in other places?

Ok, after reading patch 13, I think the question is:

The validation seems to be independent to net, can we valid it once
during vhost_vdpa_init()?

Thanks

>
> Thanks
>
>
> > +    } else if (!is_datapath) {
> >           s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> >                                               vhost_vdpa_net_cvq_cmd_page_len());
> >           memset(s->cvq_cmd_out_buffer, 0, vhost_vdpa_net_cvq_cmd_page_len());
> > @@ -956,7 +960,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       for (i = 0; i < queue_pairs; i++) {
> >           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >                                        vdpa_device_fd, i, 2, true, opts->x_svq,
> > -                                     iova_range);
> > +                                     iova_range, features);
> >           if (!ncs[i])
> >               goto err;
> >       }
> > @@ -964,7 +968,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       if (has_cvq) {
> >           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >                                    vdpa_device_fd, i, 1, false,
> > -                                 opts->x_svq, iova_range);
> > +                                 opts->x_svq, iova_range, features);
> >           if (!nc)
> >               goto err;
> >       }



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration
  2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (14 preceding siblings ...)
  2023-02-24 15:54 ` [PATCH v4 15/15] vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices Eugenio Pérez
@ 2023-02-27 12:40 ` Alvaro Karsz
  15 siblings, 0 replies; 48+ messages in thread
From: Alvaro Karsz @ 2023-02-27 12:40 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Jason Wang,
	Gautam Dawar, Laurent Vivier, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

>
> It's possible to migrate vdpa net devices if they are shadowed from the
> start.  But to always shadow the dataplane is to effectively break its host
> passthrough, so its not efficient in vDPA scenarios.
>
> This series enables dynamically switching to shadow mode only at
> migration time.  This allows full data virtqueues passthrough all the
> time qemu is not migrating.
>
> In this series only net devices with no CVQ are migratable.  CVQ adds
> additional state that would make the series bigger and still had some
> controversy on previous RFC, so let's split it.
>
> Successfully tested with vdpa_sim_net with patch [1] applied and with the qemu
> emulated device with vp_vdpa with some restrictions:
> * No CVQ. No feature that didn't work with SVQ previously (packed, ...)
> * VIRTIO_RING_F_STATE patches implementing [2].
> * Expose _F_SUSPEND, but ignore it and suspend on ring state fetch like
>   DPDK.
>
> Previous versions were tested by many vendors. Not carrying Tested-by because
> of code changes, so re-testing would be appreciated.
>
> Comments are welcome.
>
> v4:
> - Recover used_idx from guest's vring if device cannot suspend.
> - Fix starting device in the middle of a migration.  Removed some
>   duplication in setting / clearing enable_shadow_vqs and shadow_data
>   members of vhost_vdpa.
> - Fix (again) "Check for SUSPEND in vhost_dev.backend_cap, as
>   .backend_features is empty at the check moment.". It was reverted by
>   mistake in v3.
> - Fix memory leak of iova tree.
> - Properly rewind SVQ as in flight descriptors were still being accounted
>   in vq base.
> - Expand documentation.
>
> v3:
> - Start datapatch in SVQ in device started while migrating.
> - Properly register migration blockers if device present unsupported features.
> - Fix race condition because of not stopping the SVQ until device cleanup.
> - Explain purpose of iova tree in the first patch message.
> - s/dynamycally/dynamically/ in cover letter.
> - at lore.kernel.org/qemu-devel/20230215173850.298832-14-eperezma@redhat.com
>
> v2:
> - Check for SUSPEND in vhost_dev.backend_cap, as .backend_features is empty at
>   the check moment.
> - at https://lore.kernel.org/all/20230208094253.702672-12-eperezma@redhat.com/T/
>
> v1:
> - Omit all code working with CVQ and block migration if the device supports
>   CVQ.
> - Remove spurious kick.
> - Move all possible checks for migration to vhost-vdpa instead of the net
>   backend. Move them to init code from start code.
> - Suspend on vhost_vdpa_dev_start(false) instead of in vhost-vdpa net backend.
> - Properly split suspend after geting base and adding of status_reset patches.
> - Add possible TODOs to points where this series can improve in the future.
> - Check the state of migration using migration_in_setup and
>   migration_has_failed instead of checking all the possible migration status in
>   a switch.
> - Add TODO with possible low hand fruit using RESUME ops.
> - Always offer _F_LOG from virtio/vhost-vdpa and let migration blockers do
>   their thing instead of adding a variable.
> - RFC v2 at https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02574.html
>
> RFC v2:
> - Use a migration listener instead of a memory listener to know when
>   the migration starts.
> - Add stuff not picked with ASID patches, like enable rings after
>   driver_ok
> - Add rewinding on the migration src, not in dst
> - RFC v1 at https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg01664.html
>
> [1] https://lore.kernel.org/lkml/20230203142501.300125-1-eperezma@redhat.com/T/
> [2] https://lists.oasis-open.org/archives/virtio-comment/202103/msg00036.html
>
> Eugenio Pérez (15):
>   vdpa net: move iova tree creation from init to start
>   vdpa: Remember last call fd set
>   vdpa: stop svq at vhost_vdpa_dev_start(false)
>   vdpa: Negotiate _F_SUSPEND feature
>   vdpa: move vhost reset after get vring base
>   vdpa: add vhost_vdpa->suspended parameter
>   vdpa: add vhost_vdpa_suspend
>   vdpa: rewind at get_base, not set_base
>   vdpa: add vdpa net migration state notifier
>   vdpa: disable RAM block discard only for the first device
>   vdpa net: block migration if the device has CVQ
>   vdpa: block migration if device has unsupported features
>   vdpa: block migration if SVQ does not admit a feature
>   vdpa net: allow VHOST_F_LOG_ALL
>   vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices
>
>  include/hw/virtio/vhost-backend.h  |   4 +
>  include/hw/virtio/vhost-vdpa.h     |   3 +
>  hw/virtio/vhost-shadow-virtqueue.c |   8 +-
>  hw/virtio/vhost-vdpa.c             | 128 +++++++++++++------
>  hw/virtio/vhost.c                  |   3 +
>  net/vhost-vdpa.c                   | 198 ++++++++++++++++++++++++-----
>  hw/virtio/trace-events             |   1 +
>  7 files changed, 273 insertions(+), 72 deletions(-)
>
> --

The migration works with SolidNET DPU.

Tested-by: Alvaro Karsz <alvaro.karsz@solid-run.com>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 07/15] vdpa: add vhost_vdpa_suspend
  2023-02-24 15:54 ` [PATCH v4 07/15] vdpa: add vhost_vdpa_suspend Eugenio Pérez
  2023-02-27  7:27   ` Jason Wang
@ 2023-03-01  1:30   ` Si-Wei Liu
  2023-03-03 16:34     ` Eugenio Perez Martin
  1 sibling, 1 reply; 48+ messages in thread
From: Si-Wei Liu @ 2023-03-01  1:30 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Stefano Garzarella, Shannon Nelson, Jason Wang, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, Liuxiangdong,
	Parav Pandit, Eli Cohen, Zhu Lingshan, Harpreet Singh Anand,
	Gonglei (Arei), Lei Yang



On 2/24/2023 7:54 AM, Eugenio Pérez wrote:
> The function vhost.c:vhost_dev_stop fetches the vring base so the vq
> state can be migrated to other devices.  However, this is unreliable in
> vdpa, since we didn't signal the device to suspend the queues, making
> the value fetched useless.
>
> Suspend the device if possible before fetching first and subsequent
> vring bases.
>
> Moreover, vdpa totally reset and wipes the device at the last device
> before fetch its vrings base, making that operation useless in the last
> device. This will be fixed in later patches of this series.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v4:
> * Look for _F_SUSPEND at vhost_dev->backend_cap, not backend_features
> * Fall back on reset & fetch used idx from guest's memory
> ---
>   hw/virtio/vhost-vdpa.c | 25 +++++++++++++++++++++++++
>   hw/virtio/trace-events |  1 +
>   2 files changed, 26 insertions(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 228677895a..f542960a64 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -712,6 +712,7 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>   
>       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>       trace_vhost_vdpa_reset_device(dev, status);
> +    v->suspended = false;
>       return ret;
>   }
>   
> @@ -1109,6 +1110,29 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
>       }
>   }
>   
> +static void vhost_vdpa_suspend(struct vhost_dev *dev)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    int r;
> +
> +    if (!vhost_vdpa_first_dev(dev)) {
> +        return;
> +    }
> +
> +    if (!(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
Polarity reversed. This ends up device getting reset always even if the 
backend offers _F_SUSPEND.

-Siwei

> +        trace_vhost_vdpa_suspend(dev);
> +        r = ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
> +        if (unlikely(r)) {
> +            error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno);
> +        } else {
> +            v->suspended = true;
> +            return;
> +        }
> +    }
> +
> +    vhost_vdpa_reset_device(dev);
> +}
> +
>   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> @@ -1123,6 +1147,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>           }
>           vhost_vdpa_set_vring_ready(dev);
>       } else {
> +        vhost_vdpa_suspend(dev);
>           vhost_vdpa_svqs_stop(dev);
>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>       }
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index a87c5f39a2..8f8d05cf9b 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -50,6 +50,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
>   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
>   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
>   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
> +vhost_vdpa_suspend(void *dev) "dev: %p"
>   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
>   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
>   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 01/15] vdpa net: move iova tree creation from init to start
  2023-02-27  7:04   ` Jason Wang
@ 2023-03-01  7:01     ` Eugenio Perez Martin
  2023-03-03  3:32       ` Jason Wang
  0 siblings, 1 reply; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-01  7:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Mon, Feb 27, 2023 at 8:04 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > Only create iova_tree if and when it is needed.
> >
> > The cleanup keeps being responsible of last VQ but this change allows it
> > to merge both cleanup functions.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
> > ---
> > v4:
> > * Remove leak of iova_tree because double allocation
> > * Document better the sharing of IOVA tree between data and CVQ
> > ---
> >   net/vhost-vdpa.c | 113 ++++++++++++++++++++++++++++++++++-------------
> >   1 file changed, 83 insertions(+), 30 deletions(-)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index de5ed8ff22..b89c99066a 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -178,13 +178,9 @@ err_init:
> >   static void vhost_vdpa_cleanup(NetClientState *nc)
> >   {
> >       VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > -    struct vhost_dev *dev = &s->vhost_net->dev;
> >
> >       qemu_vfree(s->cvq_cmd_out_buffer);
> >       qemu_vfree(s->status);
> > -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > -    }
> >       if (s->vhost_net) {
> >           vhost_net_cleanup(s->vhost_net);
> >           g_free(s->vhost_net);
> > @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
> >       return size;
> >   }
> >
> > +/** From any vdpa net client, get the netclient of first queue pair */
> > +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> > +{
> > +    NICState *nic = qemu_get_nic(s->nc.peer);
> > +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> > +
> > +    return DO_UPCAST(VhostVDPAState, nc, nc0);
> > +}
> > +
> > +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> > +{
> > +    struct vhost_vdpa *v = &s->vhost_vdpa;
> > +
> > +    if (v->shadow_vqs_enabled) {
> > +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> > +                                           v->iova_range.last);
> > +    }
> > +}
> > +
> > +static int vhost_vdpa_net_data_start(NetClientState *nc)
> > +{
> > +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +    struct vhost_vdpa *v = &s->vhost_vdpa;
> > +
> > +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +
> > +    if (v->index == 0) {
> > +        vhost_vdpa_net_data_start_first(s);
> > +        return 0;
> > +    }
> > +
> > +    if (v->shadow_vqs_enabled) {
> > +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> > +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static void vhost_vdpa_net_client_stop(NetClientState *nc)
> > +{
> > +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +    struct vhost_dev *dev;
> > +
> > +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +
> > +    dev = s->vhost_vdpa.dev;
> > +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > +    }
> > +}
> > +
> >   static NetClientInfo net_vhost_vdpa_info = {
> >           .type = NET_CLIENT_DRIVER_VHOST_VDPA,
> >           .size = sizeof(VhostVDPAState),
> >           .receive = vhost_vdpa_receive,
> > +        .start = vhost_vdpa_net_data_start,
> > +        .stop = vhost_vdpa_net_client_stop,
>
>
> Looking at the implementation, it seems nothing net specific, any reason
> we can't simply use vhost_vdpa_dev_start()?
>

IOVA tree must be shared between (at least) all dataplane vhost_vdpa.
How could we move the call to vhost_vdpa_net_first_nc_vdpa to
vhost_vdpa_dev_start?

A possibility is to always allocate it just in case. But it seems to
me it is better to not start allocating resources just in case :).

>
> >           .cleanup = vhost_vdpa_cleanup,
> >           .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> >           .has_ufo = vhost_vdpa_has_ufo,
> > @@ -351,7 +401,7 @@ dma_map_err:
> >
> >   static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> >   {
> > -    VhostVDPAState *s;
> > +    VhostVDPAState *s, *s0;
> >       struct vhost_vdpa *v;
> >       uint64_t backend_features;
> >       int64_t cvq_group;
> > @@ -415,8 +465,6 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> >           return r;
> >       }
> >
> > -    v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> > -                                       v->iova_range.last);
> >       v->shadow_vqs_enabled = true;
> >       s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
> >
> > @@ -425,6 +473,27 @@ out:
> >           return 0;
> >       }
> >
> > +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> > +    if (s0->vhost_vdpa.iova_tree) {
> > +        /*
> > +         * SVQ is already configured for all virtqueues.  Reuse IOVA tree for
> > +         * simplicity, wether CVQ shares ASID with guest or not, because:
>
>
> Typo, should be "whether", or "regardless of whether"(not a native speaker).
>

Good catch, I can fix it in the next version.

Thanks!

> Other looks good.
>
> Thanks
>
>
> > +         * - Memory listener need access to guest's memory addresses allocated
> > +         *   in the IOVA tree.
> > +         * - There should be plenty of IOVA address space for both ASID not to
> > +         *   worry about collisions between them.  Guest's translations are
> > +         *   still validated with virtio virtqueue_pop so there is no risk for
> > +         *   the guest to access memory it shouldn't.
> > +         *
> > +         * To allocate a iova tree per ASID is doable but it complicates the
> > +         * code and it is not worth for the moment.
> > +         */
> > +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> > +    } else {
> > +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> > +                                           v->iova_range.last);
> > +    }
> > +
> >       r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
> >                                  vhost_vdpa_net_cvq_cmd_page_len(), false);
> >       if (unlikely(r < 0)) {
> > @@ -449,15 +518,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
> >       if (s->vhost_vdpa.shadow_vqs_enabled) {
> >           vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
> >           vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> > -        if (!s->always_svq) {
> > -            /*
> > -             * If only the CVQ is shadowed we can delete this safely.
> > -             * If all the VQs are shadows this will be needed by the time the
> > -             * device is started again to register SVQ vrings and similar.
> > -             */
> > -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > -        }
> >       }
> > +
> > +    vhost_vdpa_net_client_stop(nc);
> >   }
> >
> >   static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
> > @@ -667,8 +730,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >                                          int nvqs,
> >                                          bool is_datapath,
> >                                          bool svq,
> > -                                       struct vhost_vdpa_iova_range iova_range,
> > -                                       VhostIOVATree *iova_tree)
> > +                                       struct vhost_vdpa_iova_range iova_range)
> >   {
> >       NetClientState *nc = NULL;
> >       VhostVDPAState *s;
> > @@ -690,7 +752,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >       s->vhost_vdpa.shadow_vqs_enabled = svq;
> >       s->vhost_vdpa.iova_range = iova_range;
> >       s->vhost_vdpa.shadow_data = svq;
> > -    s->vhost_vdpa.iova_tree = iova_tree;
> >       if (!is_datapath) {
> >           s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> >                                               vhost_vdpa_net_cvq_cmd_page_len());
> > @@ -760,7 +821,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       uint64_t features;
> >       int vdpa_device_fd;
> >       g_autofree NetClientState **ncs = NULL;
> > -    g_autoptr(VhostIOVATree) iova_tree = NULL;
> >       struct vhost_vdpa_iova_range iova_range;
> >       NetClientState *nc;
> >       int queue_pairs, r, i = 0, has_cvq = 0;
> > @@ -812,12 +872,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >           goto err;
> >       }
> >
> > -    if (opts->x_svq) {
> > -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
> > -            goto err_svq;
> > -        }
> > -
> > -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> > +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
> > +        goto err;
> >       }
> >
> >       ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> > @@ -825,7 +881,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       for (i = 0; i < queue_pairs; i++) {
> >           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >                                        vdpa_device_fd, i, 2, true, opts->x_svq,
> > -                                     iova_range, iova_tree);
> > +                                     iova_range);
> >           if (!ncs[i])
> >               goto err;
> >       }
> > @@ -833,13 +889,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       if (has_cvq) {
> >           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >                                    vdpa_device_fd, i, 1, false,
> > -                                 opts->x_svq, iova_range, iova_tree);
> > +                                 opts->x_svq, iova_range);
> >           if (!nc)
> >               goto err;
> >       }
> >
> > -    /* iova_tree ownership belongs to last NetClientState */
> > -    g_steal_pointer(&iova_tree);
> >       return 0;
> >
> >   err:
> > @@ -849,7 +903,6 @@ err:
> >           }
> >       }
> >
> > -err_svq:
> >       qemu_close(vdpa_device_fd);
> >
> >       return -1;
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 06/15] vdpa: add vhost_vdpa->suspended parameter
  2023-02-27  7:24   ` Jason Wang
@ 2023-03-01 19:11     ` Eugenio Perez Martin
  0 siblings, 0 replies; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-01 19:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Mon, Feb 27, 2023 at 8:24 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > This allows vhost_vdpa to track if it is safe to get vring base from the
> > device or not.  If it is not, vhost can fall back to fetch idx from the
> > guest buffer again.
> >
> > No functional change intended in this patch, later patches will use this
> > field.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>
>
> I think we probably need to re-order the patch, e.g to let this come
> before at least patch 5.
>

Right, that was a miss. I'll reorder them.

>
> > ---
> >   include/hw/virtio/vhost-vdpa.h | 2 ++
> >   hw/virtio/vhost-vdpa.c         | 8 ++++++++
> >   2 files changed, 10 insertions(+)
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index 7997f09a8d..4a7d396674 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -42,6 +42,8 @@ typedef struct vhost_vdpa {
> >       bool shadow_vqs_enabled;
> >       /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */
> >       bool shadow_data;
> > +    /* Device suspended successfully */
> > +    bool suspended;
>
>
> Should we implement the set/clear in this patch as well?
>

I'd prefer to keep each patch separated in declaration / usage but
they can be squashed for sure.

Thanks!

> Thanks
>
>
> >       /* IOVA mapping used by the Shadow Virtqueue */
> >       VhostIOVATree *iova_tree;
> >       GPtrArray *shadow_vqs;
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 8cc9c98db9..228677895a 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -1227,6 +1227,14 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
> >           return 0;
> >       }
> >
> > +    if (!v->suspended) {
> > +        /*
> > +         * Cannot trust in value returned by device, let vhost recover used
> > +         * idx from guest.
> > +         */
> > +        return -1;
> > +    }
> > +
> >       ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
> >       trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
> >       return ret;
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 05/15] vdpa: move vhost reset after get vring base
  2023-02-27  7:22   ` Jason Wang
@ 2023-03-01 19:11     ` Eugenio Perez Martin
  0 siblings, 0 replies; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-01 19:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Mon, Feb 27, 2023 at 8:22 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > The function vhost.c:vhost_dev_stop calls vhost operation
> > vhost_dev_start(false). In the case of vdpa it totally reset and wipes
> > the device, making the fetching of the vring base (virtqueue state) totally
> > useless.
>
>
> As discussed before, should we do something reverse: in
> vhost_vdpa_dev_started() since what proposed in the patch doesn't solve
> the issue (index could be moved after the get_vring_base())
>
> 1) if _F_SUSPEND is negotiated, suspend instead of reset
>
> 2) if _F_SUSPEND is not negotiated, reset and fail
> vhost_get_vring_base() to allow graceful fallback?
>

Right. I think option 2 is the right one, as it introduces the changes
more gradually.

Thanks!

> Thanks
>
>
> >
> > The kernel backend does not use vhost_dev_start vhost op callback, but
> > vhost-user do. A patch to make vhost_user_dev_start more similar to vdpa
> > is desirable, but it can be added on top.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   include/hw/virtio/vhost-backend.h |  4 ++++
> >   hw/virtio/vhost-vdpa.c            | 22 ++++++++++++++++------
> >   hw/virtio/vhost.c                 |  3 +++
> >   3 files changed, 23 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > index c5ab49051e..ec3fbae58d 100644
> > --- a/include/hw/virtio/vhost-backend.h
> > +++ b/include/hw/virtio/vhost-backend.h
> > @@ -130,6 +130,9 @@ typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
> >
> >   typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
> >                                          int fd);
> > +
> > +typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
> > +
> >   typedef struct VhostOps {
> >       VhostBackendType backend_type;
> >       vhost_backend_init vhost_backend_init;
> > @@ -177,6 +180,7 @@ typedef struct VhostOps {
> >       vhost_get_device_id_op vhost_get_device_id;
> >       vhost_force_iommu_op vhost_force_iommu;
> >       vhost_set_config_call_op vhost_set_config_call;
> > +    vhost_reset_status_op vhost_reset_status;
> >   } VhostOps;
> >
> >   int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 4fac144169..8cc9c98db9 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -1134,14 +1134,23 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >       if (started) {
> >           memory_listener_register(&v->listener, &address_space_memory);
> >           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > -    } else {
> > -        vhost_vdpa_reset_device(dev);
> > -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > -                                   VIRTIO_CONFIG_S_DRIVER);
> > -        memory_listener_unregister(&v->listener);
> > +    }
> >
> > -        return 0;
> > +    return 0;
> > +}
> > +
> > +static void vhost_vdpa_reset_status(struct vhost_dev *dev)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +
> > +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> > +        return;
> >       }
> > +
> > +    vhost_vdpa_reset_device(dev);
> > +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > +                               VIRTIO_CONFIG_S_DRIVER);
> > +    memory_listener_unregister(&v->listener);
> >   }
> >
> >   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> > @@ -1328,4 +1337,5 @@ const VhostOps vdpa_ops = {
> >           .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
> >           .vhost_force_iommu = vhost_vdpa_force_iommu,
> >           .vhost_set_config_call = vhost_vdpa_set_config_call,
> > +        .vhost_reset_status = vhost_vdpa_reset_status,
> >   };
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index eb8c4c378c..a266396576 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -2049,6 +2049,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
> >                                hdev->vqs + i,
> >                                hdev->vq_index + i);
> >       }
> > +    if (hdev->vhost_ops->vhost_reset_status) {
> > +        hdev->vhost_ops->vhost_reset_status(hdev);
> > +    }
> >
> >       if (vhost_dev_has_iommu(hdev)) {
> >           if (hdev->vhost_ops->vhost_set_iotlb_callback) {
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 09/15] vdpa: add vdpa net migration state notifier
  2023-02-27  8:08   ` Jason Wang
@ 2023-03-01 19:26     ` Eugenio Perez Martin
  2023-03-03  3:34       ` Jason Wang
  0 siblings, 1 reply; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-01 19:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Mon, Feb 27, 2023 at 9:08 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > This allows net to restart the device backend to configure SVQ on it.
> >
> > Ideally, these changes should not be net specific. However, the vdpa net
> > backend is the one with enough knowledge to configure everything because
> > of some reasons:
> > * Queues might need to be shadowed or not depending on its kind (control
> >    vs data).
> > * Queues need to share the same map translations (iova tree).
> >
> > Because of that it is cleaner to restart the whole net backend and
> > configure again as expected, similar to how vhost-kernel moves between
> > userspace and passthrough.
> >
> > If more kinds of devices need dynamic switching to SVQ we can create a
> > callback struct like VhostOps and move most of the code there.
> > VhostOps cannot be reused since all vdpa backend share them, and to
> > personalize just for networking would be too heavy.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v4:
> > * Delete duplication of set shadow_data and shadow_vqs_enabled moving it
> >    to data / cvq net start functions.
> >
> > v3:
> > * Check for migration state at vdpa device start to enable SVQ in data
> >    vqs.
> >
> > v1 from RFC:
> > * Add TODO to use the resume operation in the future.
> > * Use migration_in_setup and migration_has_failed instead of a
> >    complicated switch case.
> > ---
> >   net/vhost-vdpa.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++--
> >   1 file changed, 69 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index b89c99066a..c5512ddf10 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -26,12 +26,15 @@
> >   #include <err.h>
> >   #include "standard-headers/linux/virtio_net.h"
> >   #include "monitor/monitor.h"
> > +#include "migration/migration.h"
> > +#include "migration/misc.h"
> >   #include "hw/virtio/vhost.h"
> >
> >   /* Todo:need to add the multiqueue support here */
> >   typedef struct VhostVDPAState {
> >       NetClientState nc;
> >       struct vhost_vdpa vhost_vdpa;
> > +    Notifier migration_state;
> >       VHostNetState *vhost_net;
> >
> >       /* Control commands shadow buffers */
> > @@ -239,10 +242,59 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> >       return DO_UPCAST(VhostVDPAState, nc, nc0);
> >   }
> >
> > +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
> > +{
> > +    struct vhost_vdpa *v = &s->vhost_vdpa;
> > +    VirtIONet *n;
> > +    VirtIODevice *vdev;
> > +    int data_queue_pairs, cvq, r;
> > +
> > +    /* We are only called on the first data vqs and only if x-svq is not set */
> > +    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> > +        return;
> > +    }
> > +
> > +    vdev = v->dev->vdev;
> > +    n = VIRTIO_NET(vdev);
> > +    if (!n->vhost_started) {
> > +        return;
> > +    }
> > +
> > +    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> > +    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> > +                                  n->max_ncs - n->max_queue_pairs : 0;
> > +    /*
> > +     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
> > +     * in the future and resume the device if read-only operations between
> > +     * suspend and reset goes wrong.
> > +     */
> > +    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> > +
> > +    /* Start will check migration setup_or_active to configure or not SVQ */
> > +    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> > +    if (unlikely(r < 0)) {
> > +        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
> > +    }
> > +}
> > +
> > +static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
> > +{
> > +    MigrationState *migration = data;
> > +    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
> > +                                     migration_state);
> > +
> > +    if (migration_in_setup(migration)) {
> > +        vhost_vdpa_net_log_global_enable(s, true);
> > +    } else if (migration_has_failed(migration)) {
> > +        vhost_vdpa_net_log_global_enable(s, false);
> > +    }
> > +}
> > +
> >   static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> >   {
> >       struct vhost_vdpa *v = &s->vhost_vdpa;
> >
> > +    add_migration_state_change_notifier(&s->migration_state);
> >       if (v->shadow_vqs_enabled) {
> >           v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >                                              v->iova_range.last);
> > @@ -256,6 +308,15 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
> >
> >       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >
> > +    if (s->always_svq ||
> > +        migration_is_setup_or_active(migrate_get_current()->state)) {
> > +        v->shadow_vqs_enabled = true;
> > +        v->shadow_data = true;
> > +    } else {
> > +        v->shadow_vqs_enabled = false;
> > +        v->shadow_data = false;
> > +    }
> > +
> >       if (v->index == 0) {
> >           vhost_vdpa_net_data_start_first(s);
> >           return 0;
> > @@ -276,6 +337,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
> >
> >       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >
> > +    if (s->vhost_vdpa.index == 0) {
> > +        remove_migration_state_change_notifier(&s->migration_state);
> > +    }
>
>
> This should work but I just realize that vhost support
> vhost_dev_set_log(), I wonder if it would be simpler to go with that way.
>
> Using vhost_virtqueue_set_addr(, enable_log = true)?
>

We can do that but it has the same problem as with checking _F_LOG_ALL
in set_features:

1. We're tearing down a vhost device using a listener registered
against that device, at start / stop.
2. We need to traverse all the devices many times to first get all the
vqs state and then transverse them again to set them up properly.

My two ideas to solve the recursiveness of 1 are:
a. Duplicating vhost_dev_start / vhost_dev_stop at
vhost_vdpa_set_features / vhost_vdpa_set_vring_addr.

This has the same problem as all duplications: It will get out of sync
eventually. For example, the latest changes about configure interrupt
would need to be duplicated in this new call.

b. Add a new parameter to vhost_dev_start/stop to skip the
set_features / set_vring_address step.
Now that the virtio queue reset changes have exposed these functions
it is also possible to call them from vhost-vdpa.

Maybe we can store that parameter in vhost_vdpa so we don't call
vhost_dev_start / stop there instead of affecting all backends, but
the idea is the same.

For problem 2 I still do not have a solution. CVQ / MQ Is out of the
scope for this series but I think it will bite us when we add it
(hopefully soon).

Thanks!

> Thanks
>
>
> > +
> >       dev = s->vhost_vdpa.dev;
> >       if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >           g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > @@ -412,11 +477,12 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> >       s = DO_UPCAST(VhostVDPAState, nc, nc);
> >       v = &s->vhost_vdpa;
> >
> > -    v->shadow_data = s->always_svq;
> > +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> > +    v->shadow_data = s0->vhost_vdpa.shadow_vqs_enabled;
> >       v->shadow_vqs_enabled = s->always_svq;
> >       s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
> >
> > -    if (s->always_svq) {
> > +    if (s->vhost_vdpa.shadow_data) {
> >           /* SVQ is already configured for all virtqueues */
> >           goto out;
> >       }
> > @@ -473,7 +539,6 @@ out:
> >           return 0;
> >       }
> >
> > -    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> >       if (s0->vhost_vdpa.iova_tree) {
> >           /*
> >            * SVQ is already configured for all virtqueues.  Reuse IOVA tree for
> > @@ -749,6 +814,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >       s->vhost_vdpa.device_fd = vdpa_device_fd;
> >       s->vhost_vdpa.index = queue_pair_index;
> >       s->always_svq = svq;
> > +    s->migration_state.notify = vdpa_net_migration_state_notifier;
> >       s->vhost_vdpa.shadow_vqs_enabled = svq;
> >       s->vhost_vdpa.iova_range = iova_range;
> >       s->vhost_vdpa.shadow_data = svq;
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 12/15] vdpa: block migration if device has unsupported features
  2023-02-27  8:19     ` Jason Wang
@ 2023-03-01 19:32       ` Eugenio Perez Martin
  2023-03-03  3:48         ` Jason Wang
  0 siblings, 1 reply; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-01 19:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Mon, Feb 27, 2023 at 9:20 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Feb 27, 2023 at 4:15 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > > A vdpa net device must initialize with SVQ in order to be migratable at
> > > this moment, and initialization code verifies some conditions.  If the
> > > device is not initialized with the x-svq parameter, it will not expose
> > > _F_LOG so the vhost subsystem will block VM migration from its
> > > initialization.
> > >
> > > Next patches change this, so we need to verify migration conditions
> > > differently.
> > >
> > > QEMU only supports a subset of net features in SVQ, and it cannot
> > > migrate state that cannot track or restore in the destination.  Add a
> > > migration blocker if the device offer an unsupported feature.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > > v3: add mirgation blocker properly so vhost_dev can handle it.
> > > ---
> > >   net/vhost-vdpa.c | 12 ++++++++----
> > >   1 file changed, 8 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > index 4f983df000..094dc1c2d0 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -795,7 +795,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >                                          int nvqs,
> > >                                          bool is_datapath,
> > >                                          bool svq,
> > > -                                       struct vhost_vdpa_iova_range iova_range)
> > > +                                       struct vhost_vdpa_iova_range iova_range,
> > > +                                       uint64_t features)
> > >   {
> > >       NetClientState *nc = NULL;
> > >       VhostVDPAState *s;
> > > @@ -818,7 +819,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >       s->vhost_vdpa.shadow_vqs_enabled = svq;
> > >       s->vhost_vdpa.iova_range = iova_range;
> > >       s->vhost_vdpa.shadow_data = svq;
> > > -    if (!is_datapath) {
> > > +    if (queue_pair_index == 0) {
> > > +        vhost_vdpa_net_valid_svq_features(features,
> > > +                                          &s->vhost_vdpa.migration_blocker);
> >
> >
> > Since we do validation at initialization, is this necessary to valid
> > once again in other places?
>
> Ok, after reading patch 13, I think the question is:
>
> The validation seems to be independent to net, can we valid it once
> during vhost_vdpa_init()?
>

vhost_vdpa_net_valid_svq_features also checks for net features. In
particular, all the non transport features must be in
vdpa_svq_device_features.

This is how we protect that the device / guest will never negotiate
things like VLAN filtering support, as SVQ still does not know how to
restore at the destination.

In the VLAN filtering case CVQ is needed to restore VLAN, so it is
covered by patch 11/15. But other future features may need support for
restoring it in the destination.

Thanks!

> Thanks
>
> >
> > Thanks
> >
> >
> > > +    } else if (!is_datapath) {
> > >           s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> > >                                               vhost_vdpa_net_cvq_cmd_page_len());
> > >           memset(s->cvq_cmd_out_buffer, 0, vhost_vdpa_net_cvq_cmd_page_len());
> > > @@ -956,7 +960,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >       for (i = 0; i < queue_pairs; i++) {
> > >           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > >                                        vdpa_device_fd, i, 2, true, opts->x_svq,
> > > -                                     iova_range);
> > > +                                     iova_range, features);
> > >           if (!ncs[i])
> > >               goto err;
> > >       }
> > > @@ -964,7 +968,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >       if (has_cvq) {
> > >           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > >                                    vdpa_device_fd, i, 1, false,
> > > -                                 opts->x_svq, iova_range);
> > > +                                 opts->x_svq, iova_range, features);
> > >           if (!nc)
> > >               goto err;
> > >       }
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 10/15] vdpa: disable RAM block discard only for the first device
  2023-02-27  8:11   ` Jason Wang
@ 2023-03-02 15:11     ` Eugenio Perez Martin
  0 siblings, 0 replies; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-02 15:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Mon, Feb 27, 2023 at 9:11 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > Although it does not make a big difference, its more correct and
> > simplifies the cleanup path in subsequent patches.
> >
> > Move ram_block_discard_disable(false) call to the top of
> > vhost_vdpa_cleanup because:
> > * We cannot use vhost_vdpa_first_dev after dev->opaque = NULL
> >    assignment.
> > * Improve the stack order in cleanup: since it is the last action taken
> >    in init, it should be the first at cleanup.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-vdpa.c | 25 ++++++++++++++-----------
> >   1 file changed, 14 insertions(+), 11 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 71e3dc21fe..27655e7582 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -431,16 +431,6 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >       trace_vhost_vdpa_init(dev, opaque);
> >       int ret;
> >
> > -    /*
> > -     * Similar to VFIO, we end up pinning all guest memory and have to
> > -     * disable discarding of RAM.
> > -     */
> > -    ret = ram_block_discard_disable(true);
> > -    if (ret) {
> > -        error_report("Cannot set discarding of RAM broken");
> > -        return ret;
> > -    }
> > -
> >       v = opaque;
> >       v->dev = dev;
> >       dev->opaque =  opaque ;
> > @@ -452,6 +442,16 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >           return 0;
> >       }
> >
> > +    /*
> > +     * Similar to VFIO, we end up pinning all guest memory and have to
> > +     * disable discarding of RAM.
> > +     */
> > +    ret = ram_block_discard_disable(true);
> > +    if (ret) {
> > +        error_report("Cannot set discarding of RAM broken");
> > +        return ret;
> > +    }
>
>
> We seems to lose the chance to free svq allocated by
> vhost_vdpa_init_svq() in this case?
>

Right, I'll fix it in the next version.

Thanks!

> Thanks
>
>
> > +
> >       vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >                                  VIRTIO_CONFIG_S_DRIVER);
> >
> > @@ -577,12 +577,15 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
> >       assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> >       v = dev->opaque;
> >       trace_vhost_vdpa_cleanup(dev, v);
> > +    if (vhost_vdpa_first_dev(dev)) {
> > +        ram_block_discard_disable(false);
> > +    }
> > +
> >       vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >       memory_listener_unregister(&v->listener);
> >       vhost_vdpa_svq_cleanup(dev);
> >
> >       dev->opaque = NULL;
> > -    ram_block_discard_disable(false);
> >
> >       return 0;
> >   }
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 11/15] vdpa net: block migration if the device has CVQ
  2023-02-27  8:12   ` Jason Wang
@ 2023-03-02 15:13     ` Eugenio Perez Martin
  0 siblings, 0 replies; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-02 15:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Mon, Feb 27, 2023 at 9:13 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > Devices with CVQ needs to migrate state beyond vq state.  Leaving this
> > to future series.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v3: Migration blocker is registered in vhost_dev.
> > ---
> >   include/hw/virtio/vhost-vdpa.h | 1 +
> >   hw/virtio/vhost-vdpa.c         | 1 +
> >   net/vhost-vdpa.c               | 9 +++++++++
> >   3 files changed, 11 insertions(+)
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index 4a7d396674..c278a2a8de 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -50,6 +50,7 @@ typedef struct vhost_vdpa {
> >       const VhostShadowVirtqueueOps *shadow_vq_ops;
> >       void *shadow_vq_ops_opaque;
> >       struct vhost_dev *dev;
> > +    Error *migration_blocker;
> >       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> >   } VhostVDPA;
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 27655e7582..25b64ae854 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -438,6 +438,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >       v->msg_type = VHOST_IOTLB_MSG_V2;
> >       vhost_vdpa_init_svq(dev, v);
> >
> > +    error_propagate(&dev->migration_blocker, v->migration_blocker);
> >       if (!vhost_vdpa_first_dev(dev)) {
> >           return 0;
> >       }
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index c5512ddf10..4f983df000 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -828,6 +828,15 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >
> >           s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
> >           s->vhost_vdpa.shadow_vq_ops_opaque = s;
> > +
> > +        /*
> > +         * TODO: We cannot migrate devices with CVQ as there is no way to set
> > +         * the device state (MAC, MQ, etc) before starting datapath.
> > +         *
> > +         * Migration blocker ownership now belongs to v
>
>
> The sentence is incomplete.
>

Right, I'll fix it for the next version.

Thanks!

> Other looks good.
>
> Thanks
>
>
> > +         */
> > +        error_setg(&s->vhost_vdpa.migration_blocker,
> > +                   "net vdpa cannot migrate with CVQ feature");
> >       }
> >       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> >       if (ret) {
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 01/15] vdpa net: move iova tree creation from init to start
  2023-03-01  7:01     ` Eugenio Perez Martin
@ 2023-03-03  3:32       ` Jason Wang
  2023-03-03  8:00         ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-03-03  3:32 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang


在 2023/3/1 15:01, Eugenio Perez Martin 写道:
> On Mon, Feb 27, 2023 at 8:04 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2023/2/24 23:54, Eugenio Pérez 写道:
>>> Only create iova_tree if and when it is needed.
>>>
>>> The cleanup keeps being responsible of last VQ but this change allows it
>>> to merge both cleanup functions.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>> v4:
>>> * Remove leak of iova_tree because double allocation
>>> * Document better the sharing of IOVA tree between data and CVQ
>>> ---
>>>    net/vhost-vdpa.c | 113 ++++++++++++++++++++++++++++++++++-------------
>>>    1 file changed, 83 insertions(+), 30 deletions(-)
>>>
>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>>> index de5ed8ff22..b89c99066a 100644
>>> --- a/net/vhost-vdpa.c
>>> +++ b/net/vhost-vdpa.c
>>> @@ -178,13 +178,9 @@ err_init:
>>>    static void vhost_vdpa_cleanup(NetClientState *nc)
>>>    {
>>>        VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>> -    struct vhost_dev *dev = &s->vhost_net->dev;
>>>
>>>        qemu_vfree(s->cvq_cmd_out_buffer);
>>>        qemu_vfree(s->status);
>>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>> -    }
>>>        if (s->vhost_net) {
>>>            vhost_net_cleanup(s->vhost_net);
>>>            g_free(s->vhost_net);
>>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>>>        return size;
>>>    }
>>>
>>> +/** From any vdpa net client, get the netclient of first queue pair */
>>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>>> +{
>>> +    NICState *nic = qemu_get_nic(s->nc.peer);
>>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
>>> +
>>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
>>> +}
>>> +
>>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>>> +{
>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>> +
>>> +    if (v->shadow_vqs_enabled) {
>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>> +                                           v->iova_range.last);
>>> +    }
>>> +}
>>> +
>>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
>>> +{
>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>> +
>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>> +
>>> +    if (v->index == 0) {
>>> +        vhost_vdpa_net_data_start_first(s);
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (v->shadow_vqs_enabled) {
>>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
>>> +{
>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>> +    struct vhost_dev *dev;
>>> +
>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>> +
>>> +    dev = s->vhost_vdpa.dev;
>>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>> +    }
>>> +}
>>> +
>>>    static NetClientInfo net_vhost_vdpa_info = {
>>>            .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>>>            .size = sizeof(VhostVDPAState),
>>>            .receive = vhost_vdpa_receive,
>>> +        .start = vhost_vdpa_net_data_start,
>>> +        .stop = vhost_vdpa_net_client_stop,
>>
>> Looking at the implementation, it seems nothing net specific, any reason
>> we can't simply use vhost_vdpa_dev_start()?
>>
> IOVA tree must be shared between (at least) all dataplane vhost_vdpa.
> How could we move the call to vhost_vdpa_net_first_nc_vdpa to
> vhost_vdpa_dev_start?


Ok, I think I get it. We should really consider to implement a parent 
structure in the future for vhost_vdpa then we can avoid tricks like:

vq_index_end and vhost_vdpa_net_first_nc_vdpa()

Thanks


>
> A possibility is to always allocate it just in case. But it seems to
> me it is better to not start allocating resources just in case :).
>
>>>            .cleanup = vhost_vdpa_cleanup,
>>>            .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
>>>            .has_ufo = vhost_vdpa_has_ufo,
>>> @@ -351,7 +401,7 @@ dma_map_err:
>>>
>>>    static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>>>    {
>>> -    VhostVDPAState *s;
>>> +    VhostVDPAState *s, *s0;
>>>        struct vhost_vdpa *v;
>>>        uint64_t backend_features;
>>>        int64_t cvq_group;
>>> @@ -415,8 +465,6 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>>>            return r;
>>>        }
>>>
>>> -    v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>> -                                       v->iova_range.last);
>>>        v->shadow_vqs_enabled = true;
>>>        s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
>>>
>>> @@ -425,6 +473,27 @@ out:
>>>            return 0;
>>>        }
>>>
>>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>> +    if (s0->vhost_vdpa.iova_tree) {
>>> +        /*
>>> +         * SVQ is already configured for all virtqueues.  Reuse IOVA tree for
>>> +         * simplicity, wether CVQ shares ASID with guest or not, because:
>>
>> Typo, should be "whether", or "regardless of whether"(not a native speaker).
>>
> Good catch, I can fix it in the next version.
>
> Thanks!
>
>> Other looks good.
>>
>> Thanks
>>
>>
>>> +         * - Memory listener need access to guest's memory addresses allocated
>>> +         *   in the IOVA tree.
>>> +         * - There should be plenty of IOVA address space for both ASID not to
>>> +         *   worry about collisions between them.  Guest's translations are
>>> +         *   still validated with virtio virtqueue_pop so there is no risk for
>>> +         *   the guest to access memory it shouldn't.
>>> +         *
>>> +         * To allocate a iova tree per ASID is doable but it complicates the
>>> +         * code and it is not worth for the moment.
>>> +         */
>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>> +    } else {
>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>> +                                           v->iova_range.last);
>>> +    }
>>> +
>>>        r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
>>>                                   vhost_vdpa_net_cvq_cmd_page_len(), false);
>>>        if (unlikely(r < 0)) {
>>> @@ -449,15 +518,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>>>        if (s->vhost_vdpa.shadow_vqs_enabled) {
>>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
>>> -        if (!s->always_svq) {
>>> -            /*
>>> -             * If only the CVQ is shadowed we can delete this safely.
>>> -             * If all the VQs are shadows this will be needed by the time the
>>> -             * device is started again to register SVQ vrings and similar.
>>> -             */
>>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>> -        }
>>>        }
>>> +
>>> +    vhost_vdpa_net_client_stop(nc);
>>>    }
>>>
>>>    static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
>>> @@ -667,8 +730,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>                                           int nvqs,
>>>                                           bool is_datapath,
>>>                                           bool svq,
>>> -                                       struct vhost_vdpa_iova_range iova_range,
>>> -                                       VhostIOVATree *iova_tree)
>>> +                                       struct vhost_vdpa_iova_range iova_range)
>>>    {
>>>        NetClientState *nc = NULL;
>>>        VhostVDPAState *s;
>>> @@ -690,7 +752,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
>>>        s->vhost_vdpa.iova_range = iova_range;
>>>        s->vhost_vdpa.shadow_data = svq;
>>> -    s->vhost_vdpa.iova_tree = iova_tree;
>>>        if (!is_datapath) {
>>>            s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>>>                                                vhost_vdpa_net_cvq_cmd_page_len());
>>> @@ -760,7 +821,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>        uint64_t features;
>>>        int vdpa_device_fd;
>>>        g_autofree NetClientState **ncs = NULL;
>>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
>>>        struct vhost_vdpa_iova_range iova_range;
>>>        NetClientState *nc;
>>>        int queue_pairs, r, i = 0, has_cvq = 0;
>>> @@ -812,12 +872,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>            goto err;
>>>        }
>>>
>>> -    if (opts->x_svq) {
>>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
>>> -            goto err_svq;
>>> -        }
>>> -
>>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
>>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
>>> +        goto err;
>>>        }
>>>
>>>        ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>>> @@ -825,7 +881,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>        for (i = 0; i < queue_pairs; i++) {
>>>            ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>                                         vdpa_device_fd, i, 2, true, opts->x_svq,
>>> -                                     iova_range, iova_tree);
>>> +                                     iova_range);
>>>            if (!ncs[i])
>>>                goto err;
>>>        }
>>> @@ -833,13 +889,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>        if (has_cvq) {
>>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>                                     vdpa_device_fd, i, 1, false,
>>> -                                 opts->x_svq, iova_range, iova_tree);
>>> +                                 opts->x_svq, iova_range);
>>>            if (!nc)
>>>                goto err;
>>>        }
>>>
>>> -    /* iova_tree ownership belongs to last NetClientState */
>>> -    g_steal_pointer(&iova_tree);
>>>        return 0;
>>>
>>>    err:
>>> @@ -849,7 +903,6 @@ err:
>>>            }
>>>        }
>>>
>>> -err_svq:
>>>        qemu_close(vdpa_device_fd);
>>>
>>>        return -1;



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 09/15] vdpa: add vdpa net migration state notifier
  2023-03-01 19:26     ` Eugenio Perez Martin
@ 2023-03-03  3:34       ` Jason Wang
  2023-03-03  8:42         ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-03-03  3:34 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang


在 2023/3/2 03:26, Eugenio Perez Martin 写道:
> On Mon, Feb 27, 2023 at 9:08 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2023/2/24 23:54, Eugenio Pérez 写道:
>>> This allows net to restart the device backend to configure SVQ on it.
>>>
>>> Ideally, these changes should not be net specific. However, the vdpa net
>>> backend is the one with enough knowledge to configure everything because
>>> of some reasons:
>>> * Queues might need to be shadowed or not depending on its kind (control
>>>     vs data).
>>> * Queues need to share the same map translations (iova tree).
>>>
>>> Because of that it is cleaner to restart the whole net backend and
>>> configure again as expected, similar to how vhost-kernel moves between
>>> userspace and passthrough.
>>>
>>> If more kinds of devices need dynamic switching to SVQ we can create a
>>> callback struct like VhostOps and move most of the code there.
>>> VhostOps cannot be reused since all vdpa backend share them, and to
>>> personalize just for networking would be too heavy.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>> v4:
>>> * Delete duplication of set shadow_data and shadow_vqs_enabled moving it
>>>     to data / cvq net start functions.
>>>
>>> v3:
>>> * Check for migration state at vdpa device start to enable SVQ in data
>>>     vqs.
>>>
>>> v1 from RFC:
>>> * Add TODO to use the resume operation in the future.
>>> * Use migration_in_setup and migration_has_failed instead of a
>>>     complicated switch case.
>>> ---
>>>    net/vhost-vdpa.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++--
>>>    1 file changed, 69 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>>> index b89c99066a..c5512ddf10 100644
>>> --- a/net/vhost-vdpa.c
>>> +++ b/net/vhost-vdpa.c
>>> @@ -26,12 +26,15 @@
>>>    #include <err.h>
>>>    #include "standard-headers/linux/virtio_net.h"
>>>    #include "monitor/monitor.h"
>>> +#include "migration/migration.h"
>>> +#include "migration/misc.h"
>>>    #include "hw/virtio/vhost.h"
>>>
>>>    /* Todo:need to add the multiqueue support here */
>>>    typedef struct VhostVDPAState {
>>>        NetClientState nc;
>>>        struct vhost_vdpa vhost_vdpa;
>>> +    Notifier migration_state;
>>>        VHostNetState *vhost_net;
>>>
>>>        /* Control commands shadow buffers */
>>> @@ -239,10 +242,59 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>>>        return DO_UPCAST(VhostVDPAState, nc, nc0);
>>>    }
>>>
>>> +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
>>> +{
>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>> +    VirtIONet *n;
>>> +    VirtIODevice *vdev;
>>> +    int data_queue_pairs, cvq, r;
>>> +
>>> +    /* We are only called on the first data vqs and only if x-svq is not set */
>>> +    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
>>> +        return;
>>> +    }
>>> +
>>> +    vdev = v->dev->vdev;
>>> +    n = VIRTIO_NET(vdev);
>>> +    if (!n->vhost_started) {
>>> +        return;
>>> +    }
>>> +
>>> +    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
>>> +    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
>>> +                                  n->max_ncs - n->max_queue_pairs : 0;
>>> +    /*
>>> +     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
>>> +     * in the future and resume the device if read-only operations between
>>> +     * suspend and reset goes wrong.
>>> +     */
>>> +    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
>>> +
>>> +    /* Start will check migration setup_or_active to configure or not SVQ */
>>> +    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
>>> +    if (unlikely(r < 0)) {
>>> +        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
>>> +    }
>>> +}
>>> +
>>> +static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
>>> +{
>>> +    MigrationState *migration = data;
>>> +    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
>>> +                                     migration_state);
>>> +
>>> +    if (migration_in_setup(migration)) {
>>> +        vhost_vdpa_net_log_global_enable(s, true);
>>> +    } else if (migration_has_failed(migration)) {
>>> +        vhost_vdpa_net_log_global_enable(s, false);
>>> +    }
>>> +}
>>> +
>>>    static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>>>    {
>>>        struct vhost_vdpa *v = &s->vhost_vdpa;
>>>
>>> +    add_migration_state_change_notifier(&s->migration_state);
>>>        if (v->shadow_vqs_enabled) {
>>>            v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>>                                               v->iova_range.last);
>>> @@ -256,6 +308,15 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
>>>
>>>        assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>>
>>> +    if (s->always_svq ||
>>> +        migration_is_setup_or_active(migrate_get_current()->state)) {
>>> +        v->shadow_vqs_enabled = true;
>>> +        v->shadow_data = true;
>>> +    } else {
>>> +        v->shadow_vqs_enabled = false;
>>> +        v->shadow_data = false;
>>> +    }
>>> +
>>>        if (v->index == 0) {
>>>            vhost_vdpa_net_data_start_first(s);
>>>            return 0;
>>> @@ -276,6 +337,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
>>>
>>>        assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>>
>>> +    if (s->vhost_vdpa.index == 0) {
>>> +        remove_migration_state_change_notifier(&s->migration_state);
>>> +    }
>>
>> This should work but I just realize that vhost support
>> vhost_dev_set_log(), I wonder if it would be simpler to go with that way.
>>
>> Using vhost_virtqueue_set_addr(, enable_log = true)?
>>
> We can do that but it has the same problem as with checking _F_LOG_ALL
> in set_features:
>
> 1. We're tearing down a vhost device using a listener registered
> against that device, at start / stop.
> 2. We need to traverse all the devices many times to first get all the
> vqs state and then transverse them again to set them up properly.
>
> My two ideas to solve the recursiveness of 1 are:
> a. Duplicating vhost_dev_start / vhost_dev_stop at
> vhost_vdpa_set_features / vhost_vdpa_set_vring_addr.
>
> This has the same problem as all duplications: It will get out of sync
> eventually. For example, the latest changes about configure interrupt
> would need to be duplicated in this new call.
>
> b. Add a new parameter to vhost_dev_start/stop to skip the
> set_features / set_vring_address step.
> Now that the virtio queue reset changes have exposed these functions
> it is also possible to call them from vhost-vdpa.
>
> Maybe we can store that parameter in vhost_vdpa so we don't call
> vhost_dev_start / stop there instead of affecting all backends, but
> the idea is the same.
>
> For problem 2 I still do not have a solution. CVQ / MQ Is out of the
> scope for this series but I think it will bite us when we add it
> (hopefully soon).


Thanks for the clarification, I'd suggest to document the above in the 
changlog.


>
> Thanks!
>
>> Thanks
>>
>>
>>> +
>>>        dev = s->vhost_vdpa.dev;
>>>        if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>>            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>> @@ -412,11 +477,12 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>>>        s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>        v = &s->vhost_vdpa;
>>>
>>> -    v->shadow_data = s->always_svq;
>>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>> +    v->shadow_data = s0->vhost_vdpa.shadow_vqs_enabled;
>>>        v->shadow_vqs_enabled = s->always_svq;
>>>        s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
>>>
>>> -    if (s->always_svq) {
>>> +    if (s->vhost_vdpa.shadow_data) {
>>>            /* SVQ is already configured for all virtqueues */
>>>            goto out;
>>>        }
>>> @@ -473,7 +539,6 @@ out:
>>>            return 0;
>>>        }
>>>
>>> -    s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>>        if (s0->vhost_vdpa.iova_tree) {
>>>            /*
>>>             * SVQ is already configured for all virtqueues.  Reuse IOVA tree for
>>> @@ -749,6 +814,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>        s->vhost_vdpa.device_fd = vdpa_device_fd;
>>>        s->vhost_vdpa.index = queue_pair_index;
>>>        s->always_svq = svq;
>>> +    s->migration_state.notify = vdpa_net_migration_state_notifier;
>>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
>>>        s->vhost_vdpa.iova_range = iova_range;
>>>        s->vhost_vdpa.shadow_data = svq;



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 12/15] vdpa: block migration if device has unsupported features
  2023-03-01 19:32       ` Eugenio Perez Martin
@ 2023-03-03  3:48         ` Jason Wang
  2023-03-03  8:58           ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-03-03  3:48 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang


在 2023/3/2 03:32, Eugenio Perez Martin 写道:
> On Mon, Feb 27, 2023 at 9:20 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Mon, Feb 27, 2023 at 4:15 PM Jason Wang <jasowang@redhat.com> wrote:
>>>
>>> 在 2023/2/24 23:54, Eugenio Pérez 写道:
>>>> A vdpa net device must initialize with SVQ in order to be migratable at
>>>> this moment, and initialization code verifies some conditions.  If the
>>>> device is not initialized with the x-svq parameter, it will not expose
>>>> _F_LOG so the vhost subsystem will block VM migration from its
>>>> initialization.
>>>>
>>>> Next patches change this, so we need to verify migration conditions
>>>> differently.
>>>>
>>>> QEMU only supports a subset of net features in SVQ, and it cannot
>>>> migrate state that cannot track or restore in the destination.  Add a
>>>> migration blocker if the device offer an unsupported feature.
>>>>
>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>> ---
>>>> v3: add mirgation blocker properly so vhost_dev can handle it.
>>>> ---
>>>>    net/vhost-vdpa.c | 12 ++++++++----
>>>>    1 file changed, 8 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>>>> index 4f983df000..094dc1c2d0 100644
>>>> --- a/net/vhost-vdpa.c
>>>> +++ b/net/vhost-vdpa.c
>>>> @@ -795,7 +795,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>>                                           int nvqs,
>>>>                                           bool is_datapath,
>>>>                                           bool svq,
>>>> -                                       struct vhost_vdpa_iova_range iova_range)
>>>> +                                       struct vhost_vdpa_iova_range iova_range,
>>>> +                                       uint64_t features)
>>>>    {
>>>>        NetClientState *nc = NULL;
>>>>        VhostVDPAState *s;
>>>> @@ -818,7 +819,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
>>>>        s->vhost_vdpa.iova_range = iova_range;
>>>>        s->vhost_vdpa.shadow_data = svq;
>>>> -    if (!is_datapath) {
>>>> +    if (queue_pair_index == 0) {
>>>> +        vhost_vdpa_net_valid_svq_features(features,
>>>> +                                          &s->vhost_vdpa.migration_blocker);
>>>
>>> Since we do validation at initialization, is this necessary to valid
>>> once again in other places?
>> Ok, after reading patch 13, I think the question is:
>>
>> The validation seems to be independent to net, can we valid it once
>> during vhost_vdpa_init()?
>>
> vhost_vdpa_net_valid_svq_features also checks for net features. In
> particular, all the non transport features must be in
> vdpa_svq_device_features.
>
> This is how we protect that the device / guest will never negotiate
> things like VLAN filtering support, as SVQ still does not know how to
> restore at the destination.
>
> In the VLAN filtering case CVQ is needed to restore VLAN, so it is
> covered by patch 11/15. But other future features may need support for
> restoring it in the destination.


I wonder how hard to have a general validation code let net specific 
code to advertise a blacklist to avoid code duplication.

Thanks


>
> Thanks!
>
>> Thanks
>>
>>> Thanks
>>>
>>>
>>>> +    } else if (!is_datapath) {
>>>>            s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>>>>                                                vhost_vdpa_net_cvq_cmd_page_len());
>>>>            memset(s->cvq_cmd_out_buffer, 0, vhost_vdpa_net_cvq_cmd_page_len());
>>>> @@ -956,7 +960,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>        for (i = 0; i < queue_pairs; i++) {
>>>>            ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>>                                         vdpa_device_fd, i, 2, true, opts->x_svq,
>>>> -                                     iova_range);
>>>> +                                     iova_range, features);
>>>>            if (!ncs[i])
>>>>                goto err;
>>>>        }
>>>> @@ -964,7 +968,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>        if (has_cvq) {
>>>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>>                                     vdpa_device_fd, i, 1, false,
>>>> -                                 opts->x_svq, iova_range);
>>>> +                                 opts->x_svq, iova_range, features);
>>>>            if (!nc)
>>>>                goto err;
>>>>        }



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 01/15] vdpa net: move iova tree creation from init to start
  2023-03-03  3:32       ` Jason Wang
@ 2023-03-03  8:00         ` Eugenio Perez Martin
  2023-03-06  3:43           ` Jason Wang
  0 siblings, 1 reply; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-03  8:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Fri, Mar 3, 2023 at 4:32 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/3/1 15:01, Eugenio Perez Martin 写道:
> > On Mon, Feb 27, 2023 at 8:04 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> >>> Only create iova_tree if and when it is needed.
> >>>
> >>> The cleanup keeps being responsible of last VQ but this change allows it
> >>> to merge both cleanup functions.
> >>>
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> Acked-by: Jason Wang <jasowang@redhat.com>
> >>> ---
> >>> v4:
> >>> * Remove leak of iova_tree because double allocation
> >>> * Document better the sharing of IOVA tree between data and CVQ
> >>> ---
> >>>    net/vhost-vdpa.c | 113 ++++++++++++++++++++++++++++++++++-------------
> >>>    1 file changed, 83 insertions(+), 30 deletions(-)
> >>>
> >>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> >>> index de5ed8ff22..b89c99066a 100644
> >>> --- a/net/vhost-vdpa.c
> >>> +++ b/net/vhost-vdpa.c
> >>> @@ -178,13 +178,9 @@ err_init:
> >>>    static void vhost_vdpa_cleanup(NetClientState *nc)
> >>>    {
> >>>        VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>> -    struct vhost_dev *dev = &s->vhost_net->dev;
> >>>
> >>>        qemu_vfree(s->cvq_cmd_out_buffer);
> >>>        qemu_vfree(s->status);
> >>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>> -    }
> >>>        if (s->vhost_net) {
> >>>            vhost_net_cleanup(s->vhost_net);
> >>>            g_free(s->vhost_net);
> >>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
> >>>        return size;
> >>>    }
> >>>
> >>> +/** From any vdpa net client, get the netclient of first queue pair */
> >>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> >>> +{
> >>> +    NICState *nic = qemu_get_nic(s->nc.peer);
> >>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> >>> +
> >>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
> >>> +}
> >>> +
> >>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> >>> +{
> >>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> >>> +
> >>> +    if (v->shadow_vqs_enabled) {
> >>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >>> +                                           v->iova_range.last);
> >>> +    }
> >>> +}
> >>> +
> >>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
> >>> +{
> >>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> >>> +
> >>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>> +
> >>> +    if (v->index == 0) {
> >>> +        vhost_vdpa_net_data_start_first(s);
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    if (v->shadow_vqs_enabled) {
> >>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> >>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> >>> +    }
> >>> +
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
> >>> +{
> >>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>> +    struct vhost_dev *dev;
> >>> +
> >>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>> +
> >>> +    dev = s->vhost_vdpa.dev;
> >>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>> +    }
> >>> +}
> >>> +
> >>>    static NetClientInfo net_vhost_vdpa_info = {
> >>>            .type = NET_CLIENT_DRIVER_VHOST_VDPA,
> >>>            .size = sizeof(VhostVDPAState),
> >>>            .receive = vhost_vdpa_receive,
> >>> +        .start = vhost_vdpa_net_data_start,
> >>> +        .stop = vhost_vdpa_net_client_stop,
> >>
> >> Looking at the implementation, it seems nothing net specific, any reason
> >> we can't simply use vhost_vdpa_dev_start()?
> >>
> > IOVA tree must be shared between (at least) all dataplane vhost_vdpa.
> > How could we move the call to vhost_vdpa_net_first_nc_vdpa to
> > vhost_vdpa_dev_start?
>
>
> Ok, I think I get it. We should really consider to implement a parent
> structure in the future for vhost_vdpa then we can avoid tricks like:
>
> vq_index_end and vhost_vdpa_net_first_nc_vdpa()
>

Sounds right. Maybe it is enough to link all of them with a QLIST?

Thanks!

> Thanks
>
>
> >
> > A possibility is to always allocate it just in case. But it seems to
> > me it is better to not start allocating resources just in case :).
> >
> >>>            .cleanup = vhost_vdpa_cleanup,
> >>>            .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> >>>            .has_ufo = vhost_vdpa_has_ufo,
> >>> @@ -351,7 +401,7 @@ dma_map_err:
> >>>
> >>>    static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> >>>    {
> >>> -    VhostVDPAState *s;
> >>> +    VhostVDPAState *s, *s0;
> >>>        struct vhost_vdpa *v;
> >>>        uint64_t backend_features;
> >>>        int64_t cvq_group;
> >>> @@ -415,8 +465,6 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> >>>            return r;
> >>>        }
> >>>
> >>> -    v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >>> -                                       v->iova_range.last);
> >>>        v->shadow_vqs_enabled = true;
> >>>        s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
> >>>
> >>> @@ -425,6 +473,27 @@ out:
> >>>            return 0;
> >>>        }
> >>>
> >>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> >>> +    if (s0->vhost_vdpa.iova_tree) {
> >>> +        /*
> >>> +         * SVQ is already configured for all virtqueues.  Reuse IOVA tree for
> >>> +         * simplicity, wether CVQ shares ASID with guest or not, because:
> >>
> >> Typo, should be "whether", or "regardless of whether"(not a native speaker).
> >>
> > Good catch, I can fix it in the next version.
> >
> > Thanks!
> >
> >> Other looks good.
> >>
> >> Thanks
> >>
> >>
> >>> +         * - Memory listener need access to guest's memory addresses allocated
> >>> +         *   in the IOVA tree.
> >>> +         * - There should be plenty of IOVA address space for both ASID not to
> >>> +         *   worry about collisions between them.  Guest's translations are
> >>> +         *   still validated with virtio virtqueue_pop so there is no risk for
> >>> +         *   the guest to access memory it shouldn't.
> >>> +         *
> >>> +         * To allocate a iova tree per ASID is doable but it complicates the
> >>> +         * code and it is not worth for the moment.
> >>> +         */
> >>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> >>> +    } else {
> >>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >>> +                                           v->iova_range.last);
> >>> +    }
> >>> +
> >>>        r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
> >>>                                   vhost_vdpa_net_cvq_cmd_page_len(), false);
> >>>        if (unlikely(r < 0)) {
> >>> @@ -449,15 +518,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
> >>>        if (s->vhost_vdpa.shadow_vqs_enabled) {
> >>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
> >>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> >>> -        if (!s->always_svq) {
> >>> -            /*
> >>> -             * If only the CVQ is shadowed we can delete this safely.
> >>> -             * If all the VQs are shadows this will be needed by the time the
> >>> -             * device is started again to register SVQ vrings and similar.
> >>> -             */
> >>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>> -        }
> >>>        }
> >>> +
> >>> +    vhost_vdpa_net_client_stop(nc);
> >>>    }
> >>>
> >>>    static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
> >>> @@ -667,8 +730,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>                                           int nvqs,
> >>>                                           bool is_datapath,
> >>>                                           bool svq,
> >>> -                                       struct vhost_vdpa_iova_range iova_range,
> >>> -                                       VhostIOVATree *iova_tree)
> >>> +                                       struct vhost_vdpa_iova_range iova_range)
> >>>    {
> >>>        NetClientState *nc = NULL;
> >>>        VhostVDPAState *s;
> >>> @@ -690,7 +752,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
> >>>        s->vhost_vdpa.iova_range = iova_range;
> >>>        s->vhost_vdpa.shadow_data = svq;
> >>> -    s->vhost_vdpa.iova_tree = iova_tree;
> >>>        if (!is_datapath) {
> >>>            s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> >>>                                                vhost_vdpa_net_cvq_cmd_page_len());
> >>> @@ -760,7 +821,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>        uint64_t features;
> >>>        int vdpa_device_fd;
> >>>        g_autofree NetClientState **ncs = NULL;
> >>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
> >>>        struct vhost_vdpa_iova_range iova_range;
> >>>        NetClientState *nc;
> >>>        int queue_pairs, r, i = 0, has_cvq = 0;
> >>> @@ -812,12 +872,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>            goto err;
> >>>        }
> >>>
> >>> -    if (opts->x_svq) {
> >>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
> >>> -            goto err_svq;
> >>> -        }
> >>> -
> >>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> >>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
> >>> +        goto err;
> >>>        }
> >>>
> >>>        ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> >>> @@ -825,7 +881,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>        for (i = 0; i < queue_pairs; i++) {
> >>>            ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>>                                         vdpa_device_fd, i, 2, true, opts->x_svq,
> >>> -                                     iova_range, iova_tree);
> >>> +                                     iova_range);
> >>>            if (!ncs[i])
> >>>                goto err;
> >>>        }
> >>> @@ -833,13 +889,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>        if (has_cvq) {
> >>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>>                                     vdpa_device_fd, i, 1, false,
> >>> -                                 opts->x_svq, iova_range, iova_tree);
> >>> +                                 opts->x_svq, iova_range);
> >>>            if (!nc)
> >>>                goto err;
> >>>        }
> >>>
> >>> -    /* iova_tree ownership belongs to last NetClientState */
> >>> -    g_steal_pointer(&iova_tree);
> >>>        return 0;
> >>>
> >>>    err:
> >>> @@ -849,7 +903,6 @@ err:
> >>>            }
> >>>        }
> >>>
> >>> -err_svq:
> >>>        qemu_close(vdpa_device_fd);
> >>>
> >>>        return -1;
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 09/15] vdpa: add vdpa net migration state notifier
  2023-03-03  3:34       ` Jason Wang
@ 2023-03-03  8:42         ` Eugenio Perez Martin
  0 siblings, 0 replies; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-03  8:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Fri, Mar 3, 2023 at 4:34 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/3/2 03:26, Eugenio Perez Martin 写道:
> > On Mon, Feb 27, 2023 at 9:08 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> >>> This allows net to restart the device backend to configure SVQ on it.
> >>>
> >>> Ideally, these changes should not be net specific. However, the vdpa net
> >>> backend is the one with enough knowledge to configure everything because
> >>> of some reasons:
> >>> * Queues might need to be shadowed or not depending on its kind (control
> >>>     vs data).
> >>> * Queues need to share the same map translations (iova tree).
> >>>
> >>> Because of that it is cleaner to restart the whole net backend and
> >>> configure again as expected, similar to how vhost-kernel moves between
> >>> userspace and passthrough.
> >>>
> >>> If more kinds of devices need dynamic switching to SVQ we can create a
> >>> callback struct like VhostOps and move most of the code there.
> >>> VhostOps cannot be reused since all vdpa backend share them, and to
> >>> personalize just for networking would be too heavy.
> >>>
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> ---
> >>> v4:
> >>> * Delete duplication of set shadow_data and shadow_vqs_enabled moving it
> >>>     to data / cvq net start functions.
> >>>
> >>> v3:
> >>> * Check for migration state at vdpa device start to enable SVQ in data
> >>>     vqs.
> >>>
> >>> v1 from RFC:
> >>> * Add TODO to use the resume operation in the future.
> >>> * Use migration_in_setup and migration_has_failed instead of a
> >>>     complicated switch case.
> >>> ---
> >>>    net/vhost-vdpa.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++--
> >>>    1 file changed, 69 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> >>> index b89c99066a..c5512ddf10 100644
> >>> --- a/net/vhost-vdpa.c
> >>> +++ b/net/vhost-vdpa.c
> >>> @@ -26,12 +26,15 @@
> >>>    #include <err.h>
> >>>    #include "standard-headers/linux/virtio_net.h"
> >>>    #include "monitor/monitor.h"
> >>> +#include "migration/migration.h"
> >>> +#include "migration/misc.h"
> >>>    #include "hw/virtio/vhost.h"
> >>>
> >>>    /* Todo:need to add the multiqueue support here */
> >>>    typedef struct VhostVDPAState {
> >>>        NetClientState nc;
> >>>        struct vhost_vdpa vhost_vdpa;
> >>> +    Notifier migration_state;
> >>>        VHostNetState *vhost_net;
> >>>
> >>>        /* Control commands shadow buffers */
> >>> @@ -239,10 +242,59 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> >>>        return DO_UPCAST(VhostVDPAState, nc, nc0);
> >>>    }
> >>>
> >>> +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
> >>> +{
> >>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> >>> +    VirtIONet *n;
> >>> +    VirtIODevice *vdev;
> >>> +    int data_queue_pairs, cvq, r;
> >>> +
> >>> +    /* We are only called on the first data vqs and only if x-svq is not set */
> >>> +    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    vdev = v->dev->vdev;
> >>> +    n = VIRTIO_NET(vdev);
> >>> +    if (!n->vhost_started) {
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> >>> +    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> >>> +                                  n->max_ncs - n->max_queue_pairs : 0;
> >>> +    /*
> >>> +     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
> >>> +     * in the future and resume the device if read-only operations between
> >>> +     * suspend and reset goes wrong.
> >>> +     */
> >>> +    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> >>> +
> >>> +    /* Start will check migration setup_or_active to configure or not SVQ */
> >>> +    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> >>> +    if (unlikely(r < 0)) {
> >>> +        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
> >>> +    }
> >>> +}
> >>> +
> >>> +static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
> >>> +{
> >>> +    MigrationState *migration = data;
> >>> +    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
> >>> +                                     migration_state);
> >>> +
> >>> +    if (migration_in_setup(migration)) {
> >>> +        vhost_vdpa_net_log_global_enable(s, true);
> >>> +    } else if (migration_has_failed(migration)) {
> >>> +        vhost_vdpa_net_log_global_enable(s, false);
> >>> +    }
> >>> +}
> >>> +
> >>>    static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> >>>    {
> >>>        struct vhost_vdpa *v = &s->vhost_vdpa;
> >>>
> >>> +    add_migration_state_change_notifier(&s->migration_state);
> >>>        if (v->shadow_vqs_enabled) {
> >>>            v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >>>                                               v->iova_range.last);
> >>> @@ -256,6 +308,15 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
> >>>
> >>>        assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>>
> >>> +    if (s->always_svq ||
> >>> +        migration_is_setup_or_active(migrate_get_current()->state)) {
> >>> +        v->shadow_vqs_enabled = true;
> >>> +        v->shadow_data = true;
> >>> +    } else {
> >>> +        v->shadow_vqs_enabled = false;
> >>> +        v->shadow_data = false;
> >>> +    }
> >>> +
> >>>        if (v->index == 0) {
> >>>            vhost_vdpa_net_data_start_first(s);
> >>>            return 0;
> >>> @@ -276,6 +337,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
> >>>
> >>>        assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>>
> >>> +    if (s->vhost_vdpa.index == 0) {
> >>> +        remove_migration_state_change_notifier(&s->migration_state);
> >>> +    }
> >>
> >> This should work but I just realize that vhost support
> >> vhost_dev_set_log(), I wonder if it would be simpler to go with that way.
> >>
> >> Using vhost_virtqueue_set_addr(, enable_log = true)?
> >>
> > We can do that but it has the same problem as with checking _F_LOG_ALL
> > in set_features:
> >
> > 1. We're tearing down a vhost device using a listener registered
> > against that device, at start / stop.
> > 2. We need to traverse all the devices many times to first get all the
> > vqs state and then transverse them again to set them up properly.
> >
> > My two ideas to solve the recursiveness of 1 are:
> > a. Duplicating vhost_dev_start / vhost_dev_stop at
> > vhost_vdpa_set_features / vhost_vdpa_set_vring_addr.
> >
> > This has the same problem as all duplications: It will get out of sync
> > eventually. For example, the latest changes about configure interrupt
> > would need to be duplicated in this new call.
> >
> > b. Add a new parameter to vhost_dev_start/stop to skip the
> > set_features / set_vring_address step.
> > Now that the virtio queue reset changes have exposed these functions
> > it is also possible to call them from vhost-vdpa.
> >
> > Maybe we can store that parameter in vhost_vdpa so we don't call
> > vhost_dev_start / stop there instead of affecting all backends, but
> > the idea is the same.
> >
> > For problem 2 I still do not have a solution. CVQ / MQ Is out of the
> > scope for this series but I think it will bite us when we add it
> > (hopefully soon).
>
>
> Thanks for the clarification, I'd suggest to document the above in the
> changlog.
>

Please let me know if you agree on the next message for this patch:

vdpa: add vdpa net migration state notifier

This allows net to restart the device backend to configure SVQ on it.

Ideally, these changes should not be net specific and they could be done
in:
* vhost_vdpa_set_features (with VHOST_F_LOG_ALL)
* vhost_vdpa_set_vring_addr (with .enable_log)
* vhost_vdpa_set_log_base.

However, the vdpa net backend is the one with enough knowledge to
configure everything because of some reasons:
* Queues might need to be shadowed or not depending on its kind (control
  vs data).
* Queues need to share the same map translations (iova tree).

Also, there are other problems that may have solutions but complicates
the implementation at this stage:
* We're basically duplicating vhost_dev_start and vhost_dev_stop, and
  they could go out of sync.  If we want to reuse them, we need a way to
  skip some function calls to avoid recursiveness (either vhost_ops ->
  vhost_set_features, vhost_set_vring_addr, ...).
* We need to traverse all vhost_dev of a given net device twice: one to
  stop and get the vq state and another one after the reset to
  configure properties like address, fd, etc.

Because of that it is cleaner to restart the whole net backend and
configure again as expected, similar to how vhost-kernel moves between
userspace and passthrough.

If more kinds of devices need dynamic switching to SVQ we can:
* Create a callback struct like VhostOps and move most of the code
  there.  VhostOps cannot be reused since all vdpa backend share them,
  and to personalize just for networking would be too heavy.
* Add a parent struct or link all the vhost_vdpa or vhost_dev structs so
  we can traverse them.
---

Thanks!

>
> >
> > Thanks!
> >
> >> Thanks
> >>
> >>
> >>> +
> >>>        dev = s->vhost_vdpa.dev;
> >>>        if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >>>            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>> @@ -412,11 +477,12 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> >>>        s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>>        v = &s->vhost_vdpa;
> >>>
> >>> -    v->shadow_data = s->always_svq;
> >>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> >>> +    v->shadow_data = s0->vhost_vdpa.shadow_vqs_enabled;
> >>>        v->shadow_vqs_enabled = s->always_svq;
> >>>        s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
> >>>
> >>> -    if (s->always_svq) {
> >>> +    if (s->vhost_vdpa.shadow_data) {
> >>>            /* SVQ is already configured for all virtqueues */
> >>>            goto out;
> >>>        }
> >>> @@ -473,7 +539,6 @@ out:
> >>>            return 0;
> >>>        }
> >>>
> >>> -    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> >>>        if (s0->vhost_vdpa.iova_tree) {
> >>>            /*
> >>>             * SVQ is already configured for all virtqueues.  Reuse IOVA tree for
> >>> @@ -749,6 +814,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>        s->vhost_vdpa.device_fd = vdpa_device_fd;
> >>>        s->vhost_vdpa.index = queue_pair_index;
> >>>        s->always_svq = svq;
> >>> +    s->migration_state.notify = vdpa_net_migration_state_notifier;
> >>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
> >>>        s->vhost_vdpa.iova_range = iova_range;
> >>>        s->vhost_vdpa.shadow_data = svq;
>
vdpa: add vdpa net migration state notifier

This allows net to restart the device backend to configure SVQ on it.

Ideally, these changes should not be net specific and they could be done
in:
* vhost_vdpa_set_features (with VHOST_F_LOG_ALL)
* vhost_vdpa_set_vring_addr (with .enable_log)
* vhost_vdpa_set_log_base.

However, the vdpa net backend is the one with enough knowledge to
configure everything because of some reasons:
* Queues might need to be shadowed or not depending on its kind (control
  vs data).
* Queues need to share the same map translations (iova tree).

Also, there are other problems that may have solutions but complicates
the implementation at this stage:
* We're basically duplicating vhost_dev_start and vhost_dev_stop, and
  they could go out of sync.  If we want to reuse them, we need a way to
  skip some function call to avoid recursiveness (either vhost_ops ->
  vhost_set_features, vhost_set_vring_addr, ...).
* We need to traverse all vhost_dev of a given net device twice: one to
  stop and get the vq state and another one after the reset to
  configure properties like address, fd, etc.

Because of that it is cleaner to restart the whole net backend and
configure again as expected, similar to how vhost-kernel moves between
userspace and passthrough.

If more kinds of devices need dynamic switching to SVQ we can:
* Create a callback struct like VhostOps and move most of the code
  there.  VhostOps cannot be reused since all vdpa backend share them,
  and to personalize just for networking would be too heavy.
* Add a parent struct or link all the vhost_vdpa or vhost_dev structs so
  we can traverse them.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 12/15] vdpa: block migration if device has unsupported features
  2023-03-03  3:48         ` Jason Wang
@ 2023-03-03  8:58           ` Eugenio Perez Martin
  2023-03-06  3:42             ` Jason Wang
  0 siblings, 1 reply; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-03  8:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Fri, Mar 3, 2023 at 4:48 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/3/2 03:32, Eugenio Perez Martin 写道:
> > On Mon, Feb 27, 2023 at 9:20 AM Jason Wang <jasowang@redhat.com> wrote:
> >> On Mon, Feb 27, 2023 at 4:15 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>
> >>> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> >>>> A vdpa net device must initialize with SVQ in order to be migratable at
> >>>> this moment, and initialization code verifies some conditions.  If the
> >>>> device is not initialized with the x-svq parameter, it will not expose
> >>>> _F_LOG so the vhost subsystem will block VM migration from its
> >>>> initialization.
> >>>>
> >>>> Next patches change this, so we need to verify migration conditions
> >>>> differently.
> >>>>
> >>>> QEMU only supports a subset of net features in SVQ, and it cannot
> >>>> migrate state that cannot track or restore in the destination.  Add a
> >>>> migration blocker if the device offer an unsupported feature.
> >>>>
> >>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>>> ---
> >>>> v3: add mirgation blocker properly so vhost_dev can handle it.
> >>>> ---
> >>>>    net/vhost-vdpa.c | 12 ++++++++----
> >>>>    1 file changed, 8 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> >>>> index 4f983df000..094dc1c2d0 100644
> >>>> --- a/net/vhost-vdpa.c
> >>>> +++ b/net/vhost-vdpa.c
> >>>> @@ -795,7 +795,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>>                                           int nvqs,
> >>>>                                           bool is_datapath,
> >>>>                                           bool svq,
> >>>> -                                       struct vhost_vdpa_iova_range iova_range)
> >>>> +                                       struct vhost_vdpa_iova_range iova_range,
> >>>> +                                       uint64_t features)
> >>>>    {
> >>>>        NetClientState *nc = NULL;
> >>>>        VhostVDPAState *s;
> >>>> @@ -818,7 +819,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
> >>>>        s->vhost_vdpa.iova_range = iova_range;
> >>>>        s->vhost_vdpa.shadow_data = svq;
> >>>> -    if (!is_datapath) {
> >>>> +    if (queue_pair_index == 0) {
> >>>> +        vhost_vdpa_net_valid_svq_features(features,
> >>>> +                                          &s->vhost_vdpa.migration_blocker);
> >>>
> >>> Since we do validation at initialization, is this necessary to valid
> >>> once again in other places?
> >> Ok, after reading patch 13, I think the question is:
> >>
> >> The validation seems to be independent to net, can we valid it once
> >> during vhost_vdpa_init()?
> >>
> > vhost_vdpa_net_valid_svq_features also checks for net features. In
> > particular, all the non transport features must be in
> > vdpa_svq_device_features.
> >
> > This is how we protect that the device / guest will never negotiate
> > things like VLAN filtering support, as SVQ still does not know how to
> > restore at the destination.
> >
> > In the VLAN filtering case CVQ is needed to restore VLAN, so it is
> > covered by patch 11/15. But other future features may need support for
> > restoring it in the destination.
>
>
> I wonder how hard to have a general validation code let net specific
> code to advertise a blacklist to avoid code duplication.
>

A blacklist does not work here, because I don't know if SVQ needs
changes for future feature bits that are still not in / proposed to
the standard.

Regarding the code duplication, do you mean to validate transport
features and net specific features in one shot, instead of having a
dedicated function for SVQ transport?

Thanks!

> Thanks
>
>
> >
> > Thanks!
> >
> >> Thanks
> >>
> >>> Thanks
> >>>
> >>>
> >>>> +    } else if (!is_datapath) {
> >>>>            s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> >>>>                                                vhost_vdpa_net_cvq_cmd_page_len());
> >>>>            memset(s->cvq_cmd_out_buffer, 0, vhost_vdpa_net_cvq_cmd_page_len());
> >>>> @@ -956,7 +960,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>>        for (i = 0; i < queue_pairs; i++) {
> >>>>            ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>>>                                         vdpa_device_fd, i, 2, true, opts->x_svq,
> >>>> -                                     iova_range);
> >>>> +                                     iova_range, features);
> >>>>            if (!ncs[i])
> >>>>                goto err;
> >>>>        }
> >>>> @@ -964,7 +968,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>>        if (has_cvq) {
> >>>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>>>                                     vdpa_device_fd, i, 1, false,
> >>>> -                                 opts->x_svq, iova_range);
> >>>> +                                 opts->x_svq, iova_range, features);
> >>>>            if (!nc)
> >>>>                goto err;
> >>>>        }
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 03/15] vdpa: stop svq at vhost_vdpa_dev_start(false)
  2023-02-27  7:15   ` Jason Wang
@ 2023-03-03 16:29     ` Eugenio Perez Martin
  0 siblings, 0 replies; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-03 16:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Mon, Feb 27, 2023 at 8:15 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > It used to be done at vhost_vdpa_svq_cleanup, since a device couldn't
> > switch to SVQ mode dynamically.  Now that we need to fetch the state and
> > trust SVQ will not modify guest's used_idx at migration, stop
> > effectively SVQ at suspend time, as a real device would do.
> >
> > Leaving old vhost_svq_stop call at vhost_vdpa_svq_cleanup, as its
> > supported to call it many times and it follows other operations that are
> > called redundantly there too:
> > * vhost_vdpa_host_notifiers_uninit
> > * memory_listener_unregister
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v3: New in v3
> > ---
> >   hw/virtio/vhost-vdpa.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 4f72a52a43..d9260191cc 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -1100,10 +1100,11 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
> >
> >       for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> >           VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> > +
> > +        vhost_svq_stop(svq);
> >           vhost_vdpa_svq_unmap_rings(dev, svq);
> >
> >           event_notifier_cleanup(&svq->hdev_kick);
> > -        event_notifier_cleanup(&svq->hdev_call);
>
>
> Any reason we need to not clean callfd? (Not explained in the change
> log, or should be another patch?).
>

This was actually an artifact of rebasing, sorry. This patch will be
removed fron v5 as there is already present in staging, commit
2e1a9de96b48 ("vdpa: stop all svq on device deletion").

Thanks!



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 07/15] vdpa: add vhost_vdpa_suspend
  2023-03-01  1:30   ` Si-Wei Liu
@ 2023-03-03 16:34     ` Eugenio Perez Martin
  0 siblings, 0 replies; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-03 16:34 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Jason Wang,
	Gautam Dawar, Laurent Vivier, alvaro.karsz, longpeng2,
	virtualization, Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Wed, Mar 1, 2023 at 2:30 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 2/24/2023 7:54 AM, Eugenio Pérez wrote:
> > The function vhost.c:vhost_dev_stop fetches the vring base so the vq
> > state can be migrated to other devices.  However, this is unreliable in
> > vdpa, since we didn't signal the device to suspend the queues, making
> > the value fetched useless.
> >
> > Suspend the device if possible before fetching first and subsequent
> > vring bases.
> >
> > Moreover, vdpa totally reset and wipes the device at the last device
> > before fetch its vrings base, making that operation useless in the last
> > device. This will be fixed in later patches of this series.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v4:
> > * Look for _F_SUSPEND at vhost_dev->backend_cap, not backend_features
> > * Fall back on reset & fetch used idx from guest's memory
> > ---
> >   hw/virtio/vhost-vdpa.c | 25 +++++++++++++++++++++++++
> >   hw/virtio/trace-events |  1 +
> >   2 files changed, 26 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 228677895a..f542960a64 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -712,6 +712,7 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
> >
> >       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
> >       trace_vhost_vdpa_reset_device(dev, status);
> > +    v->suspended = false;
> >       return ret;
> >   }
> >
> > @@ -1109,6 +1110,29 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
> >       }
> >   }
> >
> > +static void vhost_vdpa_suspend(struct vhost_dev *dev)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +    int r;
> > +
> > +    if (!vhost_vdpa_first_dev(dev)) {
> > +        return;
> > +    }
> > +
> > +    if (!(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
> Polarity reversed. This ends up device getting reset always even if the
> backend offers _F_SUSPEND.
>

Good catch, I'll fix it in v5.

Thanks!

> -Siwei
>
> > +        trace_vhost_vdpa_suspend(dev);
> > +        r = ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
> > +        if (unlikely(r)) {
> > +            error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno);
> > +        } else {
> > +            v->suspended = true;
> > +            return;
> > +        }
> > +    }
> > +
> > +    vhost_vdpa_reset_device(dev);
> > +}
> > +
> >   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >   {
> >       struct vhost_vdpa *v = dev->opaque;
> > @@ -1123,6 +1147,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >           }
> >           vhost_vdpa_set_vring_ready(dev);
> >       } else {
> > +        vhost_vdpa_suspend(dev);
> >           vhost_vdpa_svqs_stop(dev);
> >           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >       }
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index a87c5f39a2..8f8d05cf9b 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -50,6 +50,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
> >   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
> >   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
> >   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
> > +vhost_vdpa_suspend(void *dev) "dev: %p"
> >   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
> >   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
> >   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 12/15] vdpa: block migration if device has unsupported features
  2023-03-03  8:58           ` Eugenio Perez Martin
@ 2023-03-06  3:42             ` Jason Wang
  2023-03-06 11:32               ` Eugenio Perez Martin
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2023-03-06  3:42 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Fri, Mar 3, 2023 at 4:58 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Fri, Mar 3, 2023 at 4:48 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2023/3/2 03:32, Eugenio Perez Martin 写道:
> > > On Mon, Feb 27, 2023 at 9:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > >> On Mon, Feb 27, 2023 at 4:15 PM Jason Wang <jasowang@redhat.com> wrote:
> > >>>
> > >>> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > >>>> A vdpa net device must initialize with SVQ in order to be migratable at
> > >>>> this moment, and initialization code verifies some conditions.  If the
> > >>>> device is not initialized with the x-svq parameter, it will not expose
> > >>>> _F_LOG so the vhost subsystem will block VM migration from its
> > >>>> initialization.
> > >>>>
> > >>>> Next patches change this, so we need to verify migration conditions
> > >>>> differently.
> > >>>>
> > >>>> QEMU only supports a subset of net features in SVQ, and it cannot
> > >>>> migrate state that cannot track or restore in the destination.  Add a
> > >>>> migration blocker if the device offer an unsupported feature.
> > >>>>
> > >>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>>> ---
> > >>>> v3: add mirgation blocker properly so vhost_dev can handle it.
> > >>>> ---
> > >>>>    net/vhost-vdpa.c | 12 ++++++++----
> > >>>>    1 file changed, 8 insertions(+), 4 deletions(-)
> > >>>>
> > >>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > >>>> index 4f983df000..094dc1c2d0 100644
> > >>>> --- a/net/vhost-vdpa.c
> > >>>> +++ b/net/vhost-vdpa.c
> > >>>> @@ -795,7 +795,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >>>>                                           int nvqs,
> > >>>>                                           bool is_datapath,
> > >>>>                                           bool svq,
> > >>>> -                                       struct vhost_vdpa_iova_range iova_range)
> > >>>> +                                       struct vhost_vdpa_iova_range iova_range,
> > >>>> +                                       uint64_t features)
> > >>>>    {
> > >>>>        NetClientState *nc = NULL;
> > >>>>        VhostVDPAState *s;
> > >>>> @@ -818,7 +819,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >>>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
> > >>>>        s->vhost_vdpa.iova_range = iova_range;
> > >>>>        s->vhost_vdpa.shadow_data = svq;
> > >>>> -    if (!is_datapath) {
> > >>>> +    if (queue_pair_index == 0) {
> > >>>> +        vhost_vdpa_net_valid_svq_features(features,
> > >>>> +                                          &s->vhost_vdpa.migration_blocker);
> > >>>
> > >>> Since we do validation at initialization, is this necessary to valid
> > >>> once again in other places?
> > >> Ok, after reading patch 13, I think the question is:
> > >>
> > >> The validation seems to be independent to net, can we valid it once
> > >> during vhost_vdpa_init()?
> > >>
> > > vhost_vdpa_net_valid_svq_features also checks for net features. In
> > > particular, all the non transport features must be in
> > > vdpa_svq_device_features.
> > >
> > > This is how we protect that the device / guest will never negotiate
> > > things like VLAN filtering support, as SVQ still does not know how to
> > > restore at the destination.
> > >
> > > In the VLAN filtering case CVQ is needed to restore VLAN, so it is
> > > covered by patch 11/15. But other future features may need support for
> > > restoring it in the destination.
> >
> >
> > I wonder how hard to have a general validation code let net specific
> > code to advertise a blacklist to avoid code duplication.
> >
>
> A blacklist does not work here, because I don't know if SVQ needs
> changes for future feature bits that are still not in / proposed to
> the standard.

Could you give me an example for this?

>
> Regarding the code duplication, do you mean to validate transport
> features and net specific features in one shot, instead of having a
> dedicated function for SVQ transport?

Nope.

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > Thanks!
> > >
> > >> Thanks
> > >>
> > >>> Thanks
> > >>>
> > >>>
> > >>>> +    } else if (!is_datapath) {
> > >>>>            s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> > >>>>                                                vhost_vdpa_net_cvq_cmd_page_len());
> > >>>>            memset(s->cvq_cmd_out_buffer, 0, vhost_vdpa_net_cvq_cmd_page_len());
> > >>>> @@ -956,7 +960,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >>>>        for (i = 0; i < queue_pairs; i++) {
> > >>>>            ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > >>>>                                         vdpa_device_fd, i, 2, true, opts->x_svq,
> > >>>> -                                     iova_range);
> > >>>> +                                     iova_range, features);
> > >>>>            if (!ncs[i])
> > >>>>                goto err;
> > >>>>        }
> > >>>> @@ -964,7 +968,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >>>>        if (has_cvq) {
> > >>>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > >>>>                                     vdpa_device_fd, i, 1, false,
> > >>>> -                                 opts->x_svq, iova_range);
> > >>>> +                                 opts->x_svq, iova_range, features);
> > >>>>            if (!nc)
> > >>>>                goto err;
> > >>>>        }
> >
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 01/15] vdpa net: move iova tree creation from init to start
  2023-03-03  8:00         ` Eugenio Perez Martin
@ 2023-03-06  3:43           ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2023-03-06  3:43 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Fri, Mar 3, 2023 at 4:01 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Fri, Mar 3, 2023 at 4:32 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2023/3/1 15:01, Eugenio Perez Martin 写道:
> > > On Mon, Feb 27, 2023 at 8:04 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > >>> Only create iova_tree if and when it is needed.
> > >>>
> > >>> The cleanup keeps being responsible of last VQ but this change allows it
> > >>> to merge both cleanup functions.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>> Acked-by: Jason Wang <jasowang@redhat.com>
> > >>> ---
> > >>> v4:
> > >>> * Remove leak of iova_tree because double allocation
> > >>> * Document better the sharing of IOVA tree between data and CVQ
> > >>> ---
> > >>>    net/vhost-vdpa.c | 113 ++++++++++++++++++++++++++++++++++-------------
> > >>>    1 file changed, 83 insertions(+), 30 deletions(-)
> > >>>
> > >>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > >>> index de5ed8ff22..b89c99066a 100644
> > >>> --- a/net/vhost-vdpa.c
> > >>> +++ b/net/vhost-vdpa.c
> > >>> @@ -178,13 +178,9 @@ err_init:
> > >>>    static void vhost_vdpa_cleanup(NetClientState *nc)
> > >>>    {
> > >>>        VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > >>> -    struct vhost_dev *dev = &s->vhost_net->dev;
> > >>>
> > >>>        qemu_vfree(s->cvq_cmd_out_buffer);
> > >>>        qemu_vfree(s->status);
> > >>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > >>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > >>> -    }
> > >>>        if (s->vhost_net) {
> > >>>            vhost_net_cleanup(s->vhost_net);
> > >>>            g_free(s->vhost_net);
> > >>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
> > >>>        return size;
> > >>>    }
> > >>>
> > >>> +/** From any vdpa net client, get the netclient of first queue pair */
> > >>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> > >>> +{
> > >>> +    NICState *nic = qemu_get_nic(s->nc.peer);
> > >>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> > >>> +
> > >>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
> > >>> +}
> > >>> +
> > >>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> > >>> +{
> > >>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> > >>> +
> > >>> +    if (v->shadow_vqs_enabled) {
> > >>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> > >>> +                                           v->iova_range.last);
> > >>> +    }
> > >>> +}
> > >>> +
> > >>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
> > >>> +{
> > >>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > >>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> > >>> +
> > >>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > >>> +
> > >>> +    if (v->index == 0) {
> > >>> +        vhost_vdpa_net_data_start_first(s);
> > >>> +        return 0;
> > >>> +    }
> > >>> +
> > >>> +    if (v->shadow_vqs_enabled) {
> > >>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> > >>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> > >>> +    }
> > >>> +
> > >>> +    return 0;
> > >>> +}
> > >>> +
> > >>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
> > >>> +{
> > >>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > >>> +    struct vhost_dev *dev;
> > >>> +
> > >>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > >>> +
> > >>> +    dev = s->vhost_vdpa.dev;
> > >>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > >>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > >>> +    }
> > >>> +}
> > >>> +
> > >>>    static NetClientInfo net_vhost_vdpa_info = {
> > >>>            .type = NET_CLIENT_DRIVER_VHOST_VDPA,
> > >>>            .size = sizeof(VhostVDPAState),
> > >>>            .receive = vhost_vdpa_receive,
> > >>> +        .start = vhost_vdpa_net_data_start,
> > >>> +        .stop = vhost_vdpa_net_client_stop,
> > >>
> > >> Looking at the implementation, it seems nothing net specific, any reason
> > >> we can't simply use vhost_vdpa_dev_start()?
> > >>
> > > IOVA tree must be shared between (at least) all dataplane vhost_vdpa.
> > > How could we move the call to vhost_vdpa_net_first_nc_vdpa to
> > > vhost_vdpa_dev_start?
> >
> >
> > Ok, I think I get it. We should really consider to implement a parent
> > structure in the future for vhost_vdpa then we can avoid tricks like:
> >
> > vq_index_end and vhost_vdpa_net_first_nc_vdpa()
> >
>
> Sounds right. Maybe it is enough to link all of them with a QLIST?

That's also fine but you need to place the parent data into the head
structure which seems not clean than having pointers to point to the
same parent structure.

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > A possibility is to always allocate it just in case. But it seems to
> > > me it is better to not start allocating resources just in case :).
> > >
> > >>>            .cleanup = vhost_vdpa_cleanup,
> > >>>            .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> > >>>            .has_ufo = vhost_vdpa_has_ufo,
> > >>> @@ -351,7 +401,7 @@ dma_map_err:
> > >>>
> > >>>    static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> > >>>    {
> > >>> -    VhostVDPAState *s;
> > >>> +    VhostVDPAState *s, *s0;
> > >>>        struct vhost_vdpa *v;
> > >>>        uint64_t backend_features;
> > >>>        int64_t cvq_group;
> > >>> @@ -415,8 +465,6 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> > >>>            return r;
> > >>>        }
> > >>>
> > >>> -    v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> > >>> -                                       v->iova_range.last);
> > >>>        v->shadow_vqs_enabled = true;
> > >>>        s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
> > >>>
> > >>> @@ -425,6 +473,27 @@ out:
> > >>>            return 0;
> > >>>        }
> > >>>
> > >>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> > >>> +    if (s0->vhost_vdpa.iova_tree) {
> > >>> +        /*
> > >>> +         * SVQ is already configured for all virtqueues.  Reuse IOVA tree for
> > >>> +         * simplicity, wether CVQ shares ASID with guest or not, because:
> > >>
> > >> Typo, should be "whether", or "regardless of whether"(not a native speaker).
> > >>
> > > Good catch, I can fix it in the next version.
> > >
> > > Thanks!
> > >
> > >> Other looks good.
> > >>
> > >> Thanks
> > >>
> > >>
> > >>> +         * - Memory listener need access to guest's memory addresses allocated
> > >>> +         *   in the IOVA tree.
> > >>> +         * - There should be plenty of IOVA address space for both ASID not to
> > >>> +         *   worry about collisions between them.  Guest's translations are
> > >>> +         *   still validated with virtio virtqueue_pop so there is no risk for
> > >>> +         *   the guest to access memory it shouldn't.
> > >>> +         *
> > >>> +         * To allocate a iova tree per ASID is doable but it complicates the
> > >>> +         * code and it is not worth for the moment.
> > >>> +         */
> > >>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> > >>> +    } else {
> > >>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> > >>> +                                           v->iova_range.last);
> > >>> +    }
> > >>> +
> > >>>        r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
> > >>>                                   vhost_vdpa_net_cvq_cmd_page_len(), false);
> > >>>        if (unlikely(r < 0)) {
> > >>> @@ -449,15 +518,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
> > >>>        if (s->vhost_vdpa.shadow_vqs_enabled) {
> > >>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
> > >>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> > >>> -        if (!s->always_svq) {
> > >>> -            /*
> > >>> -             * If only the CVQ is shadowed we can delete this safely.
> > >>> -             * If all the VQs are shadows this will be needed by the time the
> > >>> -             * device is started again to register SVQ vrings and similar.
> > >>> -             */
> > >>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > >>> -        }
> > >>>        }
> > >>> +
> > >>> +    vhost_vdpa_net_client_stop(nc);
> > >>>    }
> > >>>
> > >>>    static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
> > >>> @@ -667,8 +730,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >>>                                           int nvqs,
> > >>>                                           bool is_datapath,
> > >>>                                           bool svq,
> > >>> -                                       struct vhost_vdpa_iova_range iova_range,
> > >>> -                                       VhostIOVATree *iova_tree)
> > >>> +                                       struct vhost_vdpa_iova_range iova_range)
> > >>>    {
> > >>>        NetClientState *nc = NULL;
> > >>>        VhostVDPAState *s;
> > >>> @@ -690,7 +752,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
> > >>>        s->vhost_vdpa.iova_range = iova_range;
> > >>>        s->vhost_vdpa.shadow_data = svq;
> > >>> -    s->vhost_vdpa.iova_tree = iova_tree;
> > >>>        if (!is_datapath) {
> > >>>            s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> > >>>                                                vhost_vdpa_net_cvq_cmd_page_len());
> > >>> @@ -760,7 +821,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >>>        uint64_t features;
> > >>>        int vdpa_device_fd;
> > >>>        g_autofree NetClientState **ncs = NULL;
> > >>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
> > >>>        struct vhost_vdpa_iova_range iova_range;
> > >>>        NetClientState *nc;
> > >>>        int queue_pairs, r, i = 0, has_cvq = 0;
> > >>> @@ -812,12 +872,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >>>            goto err;
> > >>>        }
> > >>>
> > >>> -    if (opts->x_svq) {
> > >>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
> > >>> -            goto err_svq;
> > >>> -        }
> > >>> -
> > >>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> > >>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
> > >>> +        goto err;
> > >>>        }
> > >>>
> > >>>        ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> > >>> @@ -825,7 +881,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >>>        for (i = 0; i < queue_pairs; i++) {
> > >>>            ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > >>>                                         vdpa_device_fd, i, 2, true, opts->x_svq,
> > >>> -                                     iova_range, iova_tree);
> > >>> +                                     iova_range);
> > >>>            if (!ncs[i])
> > >>>                goto err;
> > >>>        }
> > >>> @@ -833,13 +889,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >>>        if (has_cvq) {
> > >>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > >>>                                     vdpa_device_fd, i, 1, false,
> > >>> -                                 opts->x_svq, iova_range, iova_tree);
> > >>> +                                 opts->x_svq, iova_range);
> > >>>            if (!nc)
> > >>>                goto err;
> > >>>        }
> > >>>
> > >>> -    /* iova_tree ownership belongs to last NetClientState */
> > >>> -    g_steal_pointer(&iova_tree);
> > >>>        return 0;
> > >>>
> > >>>    err:
> > >>> @@ -849,7 +903,6 @@ err:
> > >>>            }
> > >>>        }
> > >>>
> > >>> -err_svq:
> > >>>        qemu_close(vdpa_device_fd);
> > >>>
> > >>>        return -1;
> >
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 12/15] vdpa: block migration if device has unsupported features
  2023-03-06  3:42             ` Jason Wang
@ 2023-03-06 11:32               ` Eugenio Perez Martin
  2023-03-07  6:48                 ` Jason Wang
  0 siblings, 1 reply; 48+ messages in thread
From: Eugenio Perez Martin @ 2023-03-06 11:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Mon, Mar 6, 2023 at 4:42 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, Mar 3, 2023 at 4:58 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Fri, Mar 3, 2023 at 4:48 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > 在 2023/3/2 03:32, Eugenio Perez Martin 写道:
> > > > On Mon, Feb 27, 2023 at 9:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >> On Mon, Feb 27, 2023 at 4:15 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >>>
> > > >>> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > > >>>> A vdpa net device must initialize with SVQ in order to be migratable at
> > > >>>> this moment, and initialization code verifies some conditions.  If the
> > > >>>> device is not initialized with the x-svq parameter, it will not expose
> > > >>>> _F_LOG so the vhost subsystem will block VM migration from its
> > > >>>> initialization.
> > > >>>>
> > > >>>> Next patches change this, so we need to verify migration conditions
> > > >>>> differently.
> > > >>>>
> > > >>>> QEMU only supports a subset of net features in SVQ, and it cannot
> > > >>>> migrate state that cannot track or restore in the destination.  Add a
> > > >>>> migration blocker if the device offer an unsupported feature.
> > > >>>>
> > > >>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > >>>> ---
> > > >>>> v3: add mirgation blocker properly so vhost_dev can handle it.
> > > >>>> ---
> > > >>>>    net/vhost-vdpa.c | 12 ++++++++----
> > > >>>>    1 file changed, 8 insertions(+), 4 deletions(-)
> > > >>>>
> > > >>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > >>>> index 4f983df000..094dc1c2d0 100644
> > > >>>> --- a/net/vhost-vdpa.c
> > > >>>> +++ b/net/vhost-vdpa.c
> > > >>>> @@ -795,7 +795,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > >>>>                                           int nvqs,
> > > >>>>                                           bool is_datapath,
> > > >>>>                                           bool svq,
> > > >>>> -                                       struct vhost_vdpa_iova_range iova_range)
> > > >>>> +                                       struct vhost_vdpa_iova_range iova_range,
> > > >>>> +                                       uint64_t features)
> > > >>>>    {
> > > >>>>        NetClientState *nc = NULL;
> > > >>>>        VhostVDPAState *s;
> > > >>>> @@ -818,7 +819,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > >>>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
> > > >>>>        s->vhost_vdpa.iova_range = iova_range;
> > > >>>>        s->vhost_vdpa.shadow_data = svq;
> > > >>>> -    if (!is_datapath) {
> > > >>>> +    if (queue_pair_index == 0) {
> > > >>>> +        vhost_vdpa_net_valid_svq_features(features,
> > > >>>> +                                          &s->vhost_vdpa.migration_blocker);
> > > >>>
> > > >>> Since we do validation at initialization, is this necessary to valid
> > > >>> once again in other places?
> > > >> Ok, after reading patch 13, I think the question is:
> > > >>
> > > >> The validation seems to be independent to net, can we valid it once
> > > >> during vhost_vdpa_init()?
> > > >>
> > > > vhost_vdpa_net_valid_svq_features also checks for net features. In
> > > > particular, all the non transport features must be in
> > > > vdpa_svq_device_features.
> > > >
> > > > This is how we protect that the device / guest will never negotiate
> > > > things like VLAN filtering support, as SVQ still does not know how to
> > > > restore at the destination.
> > > >
> > > > In the VLAN filtering case CVQ is needed to restore VLAN, so it is
> > > > covered by patch 11/15. But other future features may need support for
> > > > restoring it in the destination.
> > >
> > >
> > > I wonder how hard to have a general validation code let net specific
> > > code to advertise a blacklist to avoid code duplication.
> > >
> >
> > A blacklist does not work here, because I don't know if SVQ needs
> > changes for future feature bits that are still not in / proposed to
> > the standard.
>
> Could you give me an example for this?
>

Maybe I'm not understanding your blacklist proposal. I'm going to
explain my thoughts on it with a few examples.

SVQ was merged in qemu before VIRTIO_F_RING_RESET, and there are some
proposals like VIRTIO_NET_F_NOTF_COAL or VIRTIO_F_PARTIAL_ORDER in the
virtio-comment list.

If we had gone with the blacklist approach back then, the blacklist
would contain all the features of Virtio standard but the one we do
support in SVQ, isn't it? Then, VIRTIO_F_RING_RESET would get merged,
and SVQ would claim it supports it, but it is not true.

The same can happen with the other two features.
VIRTIO_NET_F_NOTF_COAL will be required to migrate coalescence
parameters, but it is not supported for the moment. _F_PARTIAL_ORDER
will also require changes to hw/virtio/vhost-shadow-virtqueue.c code,
since SVQ it's the "driver" in charge of the SVQ vring.

Most of the changes will only require small changes to support sending
the CVQ message in the destination and to track the state change
parsing CVQ, or no changes at all (like for supporting
VIRTIO_NET_F_SPEED_DUPLEX). But SVQ cannot claim it supports it
anyway.

The only alternative I can think of is to actually block new proposals
(like past VIRTIO_F_RING_RESET) until they either do the changes on
SVQ too or add a blacklist item. I think it is too intrusive.
Especially on this stage of SVQ where not even all QEMU features are
supported. Maybe we can reevaluate it in Q3 or Q4 for example?


> >
> > Regarding the code duplication, do you mean to validate transport
> > features and net specific features in one shot, instead of having a
> > dedicated function for SVQ transport?
>
> Nope.
>

Please expand, maybe I can do something to solve it :).

Thanks!



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 12/15] vdpa: block migration if device has unsupported features
  2023-03-06 11:32               ` Eugenio Perez Martin
@ 2023-03-07  6:48                 ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2023-03-07  6:48 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Stefano Garzarella, Shannon Nelson, Gautam Dawar,
	Laurent Vivier, alvaro.karsz, longpeng2, virtualization,
	Stefan Hajnoczi, Cindy Lu, Michael S. Tsirkin, si-wei.liu,
	Liuxiangdong, Parav Pandit, Eli Cohen, Zhu Lingshan,
	Harpreet Singh Anand, Gonglei (Arei), Lei Yang

On Mon, Mar 6, 2023 at 7:33 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Mon, Mar 6, 2023 at 4:42 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Fri, Mar 3, 2023 at 4:58 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Fri, Mar 3, 2023 at 4:48 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > 在 2023/3/2 03:32, Eugenio Perez Martin 写道:
> > > > > On Mon, Feb 27, 2023 at 9:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >> On Mon, Feb 27, 2023 at 4:15 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>>
> > > > >>> 在 2023/2/24 23:54, Eugenio Pérez 写道:
> > > > >>>> A vdpa net device must initialize with SVQ in order to be migratable at
> > > > >>>> this moment, and initialization code verifies some conditions.  If the
> > > > >>>> device is not initialized with the x-svq parameter, it will not expose
> > > > >>>> _F_LOG so the vhost subsystem will block VM migration from its
> > > > >>>> initialization.
> > > > >>>>
> > > > >>>> Next patches change this, so we need to verify migration conditions
> > > > >>>> differently.
> > > > >>>>
> > > > >>>> QEMU only supports a subset of net features in SVQ, and it cannot
> > > > >>>> migrate state that cannot track or restore in the destination.  Add a
> > > > >>>> migration blocker if the device offer an unsupported feature.
> > > > >>>>
> > > > >>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > >>>> ---
> > > > >>>> v3: add mirgation blocker properly so vhost_dev can handle it.
> > > > >>>> ---
> > > > >>>>    net/vhost-vdpa.c | 12 ++++++++----
> > > > >>>>    1 file changed, 8 insertions(+), 4 deletions(-)
> > > > >>>>
> > > > >>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > >>>> index 4f983df000..094dc1c2d0 100644
> > > > >>>> --- a/net/vhost-vdpa.c
> > > > >>>> +++ b/net/vhost-vdpa.c
> > > > >>>> @@ -795,7 +795,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > > >>>>                                           int nvqs,
> > > > >>>>                                           bool is_datapath,
> > > > >>>>                                           bool svq,
> > > > >>>> -                                       struct vhost_vdpa_iova_range iova_range)
> > > > >>>> +                                       struct vhost_vdpa_iova_range iova_range,
> > > > >>>> +                                       uint64_t features)
> > > > >>>>    {
> > > > >>>>        NetClientState *nc = NULL;
> > > > >>>>        VhostVDPAState *s;
> > > > >>>> @@ -818,7 +819,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > > >>>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
> > > > >>>>        s->vhost_vdpa.iova_range = iova_range;
> > > > >>>>        s->vhost_vdpa.shadow_data = svq;
> > > > >>>> -    if (!is_datapath) {
> > > > >>>> +    if (queue_pair_index == 0) {
> > > > >>>> +        vhost_vdpa_net_valid_svq_features(features,
> > > > >>>> +                                          &s->vhost_vdpa.migration_blocker);
> > > > >>>
> > > > >>> Since we do validation at initialization, is this necessary to valid
> > > > >>> once again in other places?
> > > > >> Ok, after reading patch 13, I think the question is:
> > > > >>
> > > > >> The validation seems to be independent to net, can we valid it once
> > > > >> during vhost_vdpa_init()?
> > > > >>
> > > > > vhost_vdpa_net_valid_svq_features also checks for net features. In
> > > > > particular, all the non transport features must be in
> > > > > vdpa_svq_device_features.
> > > > >
> > > > > This is how we protect that the device / guest will never negotiate
> > > > > things like VLAN filtering support, as SVQ still does not know how to
> > > > > restore at the destination.
> > > > >
> > > > > In the VLAN filtering case CVQ is needed to restore VLAN, so it is
> > > > > covered by patch 11/15. But other future features may need support for
> > > > > restoring it in the destination.
> > > >
> > > >
> > > > I wonder how hard to have a general validation code let net specific
> > > > code to advertise a blacklist to avoid code duplication.
> > > >
> > >
> > > A blacklist does not work here, because I don't know if SVQ needs
> > > changes for future feature bits that are still not in / proposed to
> > > the standard.
> >
> > Could you give me an example for this?
> >
>
> Maybe I'm not understanding your blacklist proposal. I'm going to
> explain my thoughts on it with a few examples.
>
> SVQ was merged in qemu before VIRTIO_F_RING_RESET, and there are some
> proposals like VIRTIO_NET_F_NOTF_COAL or VIRTIO_F_PARTIAL_ORDER in the
> virtio-comment list.
>
> If we had gone with the blacklist approach back then, the blacklist
> would contain all the features of Virtio standard but the one we do
> support in SVQ, isn't it? Then, VIRTIO_F_RING_RESET would get merged,
> and SVQ would claim it supports it, but it is not true.

To solve this, the general SVQ code can have a whitelist for transport features?

>
> The same can happen with the other two features.
> VIRTIO_NET_F_NOTF_COAL will be required to migrate coalescence
> parameters, but it is not supported for the moment. _F_PARTIAL_ORDER
> will also require changes to hw/virtio/vhost-shadow-virtqueue.c code,
> since SVQ it's the "driver" in charge of the SVQ vring.
>
> Most of the changes will only require small changes to support sending
> the CVQ message in the destination and to track the state change
> parsing CVQ, or no changes at all (like for supporting
> VIRTIO_NET_F_SPEED_DUPLEX). But SVQ cannot claim it supports it
> anyway.
>
> The only alternative I can think of is to actually block new proposals
> (like past VIRTIO_F_RING_RESET) until they either do the changes on
> SVQ too or add a blacklist item. I think it is too intrusive.
> Especially on this stage of SVQ where not even all QEMU features are
> supported. Maybe we can reevaluate it in Q3 or Q4 for example?

Yes, the change is not a must just want to see if we can simply do anything.

>
>
> > >
> > > Regarding the code duplication, do you mean to validate transport
> > > features and net specific features in one shot, instead of having a
> > > dedicated function for SVQ transport?
> >
> > Nope.
> >
>
> Please expand, maybe I can do something to solve it :).

Sure, I just want to make sure we are talking about the same thing
before I can expand :)

Thanks

>
> Thanks!
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2023-03-07  6:49 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-24 15:54 [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
2023-02-24 15:54 ` [PATCH v4 01/15] vdpa net: move iova tree creation from init to start Eugenio Pérez
2023-02-27  7:04   ` Jason Wang
2023-03-01  7:01     ` Eugenio Perez Martin
2023-03-03  3:32       ` Jason Wang
2023-03-03  8:00         ` Eugenio Perez Martin
2023-03-06  3:43           ` Jason Wang
2023-02-24 15:54 ` [PATCH v4 02/15] vdpa: Remember last call fd set Eugenio Pérez
2023-02-24 15:54 ` [PATCH v4 03/15] vdpa: stop svq at vhost_vdpa_dev_start(false) Eugenio Pérez
2023-02-27  7:15   ` Jason Wang
2023-03-03 16:29     ` Eugenio Perez Martin
2023-02-24 15:54 ` [PATCH v4 04/15] vdpa: Negotiate _F_SUSPEND feature Eugenio Pérez
2023-02-24 15:54 ` [PATCH v4 05/15] vdpa: move vhost reset after get vring base Eugenio Pérez
2023-02-27  7:22   ` Jason Wang
2023-03-01 19:11     ` Eugenio Perez Martin
2023-02-24 15:54 ` [PATCH v4 06/15] vdpa: add vhost_vdpa->suspended parameter Eugenio Pérez
2023-02-27  7:24   ` Jason Wang
2023-03-01 19:11     ` Eugenio Perez Martin
2023-02-24 15:54 ` [PATCH v4 07/15] vdpa: add vhost_vdpa_suspend Eugenio Pérez
2023-02-27  7:27   ` Jason Wang
2023-03-01  1:30   ` Si-Wei Liu
2023-03-03 16:34     ` Eugenio Perez Martin
2023-02-24 15:54 ` [PATCH v4 08/15] vdpa: rewind at get_base, not set_base Eugenio Pérez
2023-02-27  7:34   ` Jason Wang
2023-02-24 15:54 ` [PATCH v4 09/15] vdpa: add vdpa net migration state notifier Eugenio Pérez
2023-02-27  8:08   ` Jason Wang
2023-03-01 19:26     ` Eugenio Perez Martin
2023-03-03  3:34       ` Jason Wang
2023-03-03  8:42         ` Eugenio Perez Martin
2023-02-24 15:54 ` [PATCH v4 10/15] vdpa: disable RAM block discard only for the first device Eugenio Pérez
2023-02-27  8:11   ` Jason Wang
2023-03-02 15:11     ` Eugenio Perez Martin
2023-02-24 15:54 ` [PATCH v4 11/15] vdpa net: block migration if the device has CVQ Eugenio Pérez
2023-02-27  8:12   ` Jason Wang
2023-03-02 15:13     ` Eugenio Perez Martin
2023-02-24 15:54 ` [PATCH v4 12/15] vdpa: block migration if device has unsupported features Eugenio Pérez
2023-02-27  8:15   ` Jason Wang
2023-02-27  8:19     ` Jason Wang
2023-03-01 19:32       ` Eugenio Perez Martin
2023-03-03  3:48         ` Jason Wang
2023-03-03  8:58           ` Eugenio Perez Martin
2023-03-06  3:42             ` Jason Wang
2023-03-06 11:32               ` Eugenio Perez Martin
2023-03-07  6:48                 ` Jason Wang
2023-02-24 15:54 ` [PATCH v4 13/15] vdpa: block migration if SVQ does not admit a feature Eugenio Pérez
2023-02-24 15:54 ` [PATCH v4 14/15] vdpa net: allow VHOST_F_LOG_ALL Eugenio Pérez
2023-02-24 15:54 ` [PATCH v4 15/15] vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices Eugenio Pérez
2023-02-27 12:40 ` [PATCH v4 00/15] Dynamically switch to vhost shadow virtqueues at vdpa net migration Alvaro Karsz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).