* [PATCH 0/2] Send all the SVQ control commands in parallel
From: Hawkins Jiawei @ 2023-04-19 11:49 UTC
To: eperezma, jasowang
Cc: yin31149, 18801353760, qemu-devel

This patchset allows QEMU to poll and check the device used buffers
after sending all SVQ control commands, instead of polling and checking
immediately after sending each SVQ control command, so that QEMU can
send all the SVQ control commands in parallel, which improves
performance.

I use vdpa_sim_net to simulate a vdpa device, refactor
vhost_vdpa_net_load() to call vhost_vdpa_net_load_mac() 30 times, and
refactor `net_vhost_vdpa_cvq_info.load` to call vhost_vdpa_net_load()
1000 times, in order to build a test environment for sending multiple
SVQ control commands. Time on the monotonic clock to finish
`net_vhost_vdpa_cvq_info.load`:

                 QEMU monotonic time
  --------------------------------------------------
  not patched    89202
  --------------------------------------------------
  patched        80455

This patchset resolves the GitLab issue at
https://gitlab.com/qemu-project/qemu/-/issues/1578.

Hawkins Jiawei (2):
  vdpa: rename vhost_vdpa_net_cvq_add()
  vdpa: send CVQ state load commands in parallel

 net/vhost-vdpa.c | 150 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 112 insertions(+), 38 deletions(-)

-- 
2.25.1
* [PATCH 1/2] vdpa: rename vhost_vdpa_net_cvq_add()
From: Hawkins Jiawei @ 2023-04-19 11:49 UTC
To: eperezma, jasowang
Cc: yin31149, 18801353760, qemu-devel

We want to introduce a new version of vhost_vdpa_net_cvq_add() that
does not poll immediately after forwarding custom buffers to the
device, so that QEMU can send all the SVQ control commands in parallel
instead of serially. Rename the current function to
vhost_vdpa_net_cvq_add_and_wait() to make room for the new variant.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
---
 net/vhost-vdpa.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 99904a0da7..10804c7200 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -590,8 +590,14 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
     vhost_vdpa_net_client_stop(nc);
 }
 
-static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
-                                      size_t in_len)
+/**
+ * vhost_vdpa_net_cvq_add_and_wait() adds SVQ control commands to SVQ,
+ * kicks the device and polls the device used buffers.
+ *
+ * Return the length written by the device.
+ */
+static ssize_t vhost_vdpa_net_cvq_add_and_wait(VhostVDPAState *s,
+                                               size_t out_len, size_t in_len)
 {
     /* Buffers for the device */
     const struct iovec out = {
@@ -636,7 +642,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
     memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl));
     memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size);
 
-    return vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + data_size,
+    return vhost_vdpa_net_cvq_add_and_wait(s, sizeof(ctrl) + data_size,
                                   sizeof(virtio_net_ctrl_ack));
 }
 
@@ -753,7 +759,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
         dev_written = sizeof(status);
         *s->status = VIRTIO_NET_OK;
     } else {
-        dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status));
+        dev_written = vhost_vdpa_net_cvq_add_and_wait(s, out.iov_len,
+                                                      sizeof(status));
         if (unlikely(dev_written < 0)) {
             goto out;
         }
-- 
2.25.1
* [PATCH 2/2] vdpa: send CVQ state load commands in parallel
From: Hawkins Jiawei @ 2023-04-19 11:49 UTC
To: eperezma, jasowang
Cc: yin31149, 18801353760, qemu-devel

This patch introduces vhost_vdpa_net_cvq_add() and refactors
vhost_vdpa_net_load*(), so that QEMU can send CVQ state load
commands in parallel.

To be more specific, this patch introduces vhost_vdpa_net_cvq_add()
to add SVQ control commands to SVQ and kick the device, without
polling the device used buffers. QEMU does not poll and check the
device used buffers in vhost_vdpa_net_load() until all CVQ state
load commands have been sent to the device.

What's more, in order to avoid buffer overwriting caused by using
`s->cvq_cmd_out_buffer` and `s->status` as the buffers for all CVQ
state load commands when sending them in parallel, this patch
introduces `out_cursor` and `in_cursor` in vhost_vdpa_net_load(),
pointing to the next available space in the out descriptor and in
descriptor buffers, so that each CVQ state load command gets its
own unique buffer.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
---
 net/vhost-vdpa.c | 137 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 102 insertions(+), 35 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 10804c7200..d1f7113992 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -590,6 +590,44 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
     vhost_vdpa_net_client_stop(nc);
 }
 
+/**
+ * vhost_vdpa_net_cvq_add() adds SVQ control commands to SVQ,
+ * kicks the device but does not poll the device used buffers.
+ *
+ * Return the length of the device used buffers.
+ */
+static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s,
+                                      void **out_cursor, size_t out_len,
+                                      void **in_cursor, size_t in_len)
+{
+    /* Buffers for the device */
+    const struct iovec out = {
+        .iov_base = *out_cursor,
+        .iov_len = out_len,
+    };
+    const struct iovec in = {
+        .iov_base = *in_cursor,
+        .iov_len = in_len,
+    };
+    VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0);
+    int r;
+
+    r = vhost_svq_add(svq, &out, 1, &in, 1, NULL);
+    if (unlikely(r != 0)) {
+        if (unlikely(r == -ENOSPC)) {
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
+                          __func__);
+        }
+        return r;
+    }
+
+    /* Update the cursor */
+    *out_cursor += out_len;
+    *in_cursor += in_len;
+
+    return in_len;
+}
+
 /**
  * vhost_vdpa_net_cvq_add_and_wait() adds SVQ control commands to SVQ,
  * kicks the device and polls the device used buffers.
@@ -628,69 +666,64 @@ static ssize_t vhost_vdpa_net_cvq_add_and_wait(VhostVDPAState *s,
     return vhost_svq_poll(svq);
 }
 
-static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
-                                       uint8_t cmd, const void *data,
-                                       size_t data_size)
+static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s,
+                                       void **out_cursor, uint8_t class,
+                                       uint8_t cmd, const void *data,
+                                       size_t data_size, void **in_cursor)
 {
     const struct virtio_net_ctrl_hdr ctrl = {
         .class = class,
         .cmd = cmd,
     };
 
-    assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl));
+    assert(sizeof(ctrl) < vhost_vdpa_net_cvq_cmd_page_len() -
+                          (*out_cursor - s->cvq_cmd_out_buffer));
+    assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl) -
+                       (*out_cursor - s->cvq_cmd_out_buffer));
 
-    memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl));
-    memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size);
+    memcpy(*out_cursor, &ctrl, sizeof(ctrl));
+    memcpy(*out_cursor + sizeof(ctrl), data, data_size);
 
-    return vhost_vdpa_net_cvq_add_and_wait(s, sizeof(ctrl) + data_size,
-                                           sizeof(virtio_net_ctrl_ack));
+    return vhost_vdpa_net_cvq_add(s, out_cursor, sizeof(ctrl) + data_size,
+                                  in_cursor, sizeof(virtio_net_ctrl_ack));
 }
 
-static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n)
+static ssize_t vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n,
+                                       void **out_cursor, void **in_cursor)
 {
     uint64_t features = n->parent_obj.guest_features;
     if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) {
-        ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC,
-                                                  VIRTIO_NET_CTRL_MAC_ADDR_SET,
-                                                  n->mac, sizeof(n->mac));
-        if (unlikely(dev_written < 0)) {
-            return dev_written;
-        }
-
-        return *s->status != VIRTIO_NET_OK;
+        return vhost_vdpa_net_load_cmd(s, out_cursor, VIRTIO_NET_CTRL_MAC,
+                                       VIRTIO_NET_CTRL_MAC_ADDR_SET,
+                                       n->mac, sizeof(n->mac), in_cursor);
     }
 
     return 0;
 }
 
-static int vhost_vdpa_net_load_mq(VhostVDPAState *s,
-                                  const VirtIONet *n)
+static ssize_t vhost_vdpa_net_load_mq(VhostVDPAState *s, const VirtIONet *n,
+                                      void **out_cursor, void **in_cursor)
 {
     struct virtio_net_ctrl_mq mq;
     uint64_t features = n->parent_obj.guest_features;
-    ssize_t dev_written;
 
     if (!(features & BIT_ULL(VIRTIO_NET_F_MQ))) {
         return 0;
     }
 
-    mq.virtqueue_pairs = cpu_to_le16(n->curr_queue_pairs);
-    dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MQ,
-                                          VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &mq,
-                                          sizeof(mq));
-    if (unlikely(dev_written < 0)) {
-        return dev_written;
-    }
-
-    return *s->status != VIRTIO_NET_OK;
+    return vhost_vdpa_net_load_cmd(s, out_cursor, VIRTIO_NET_CTRL_MQ,
+                                   VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET,
+                                   &mq, sizeof(mq), in_cursor);
 }
 
 static int vhost_vdpa_net_load(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    VhostShadowVirtqueue *svq;
+    void *out_cursor, *in_cursor;
     struct vhost_vdpa *v = &s->vhost_vdpa;
     const VirtIONet *n;
-    int r;
+    ssize_t need_poll_len = 0, r, dev_written;
 
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
@@ -699,16 +732,50 @@ static int vhost_vdpa_net_load(NetClientState *nc)
     }
 
     n = VIRTIO_NET(v->dev->vdev);
-    r = vhost_vdpa_net_load_mac(s, n);
+
+    need_poll_len = 0;
+    out_cursor = s->cvq_cmd_out_buffer;
+    in_cursor = s->status;
+
+    r = vhost_vdpa_net_load_mac(s, n, &out_cursor, &in_cursor);
     if (unlikely(r < 0)) {
-        return r;
+        goto load_err;
+    }
+    need_poll_len += r;
+
+    r = vhost_vdpa_net_load_mq(s, n, &out_cursor, &in_cursor);
+    if (unlikely(r < 0)) {
+        goto load_err;
+    }
+    need_poll_len += r;
+
+load_err:
+    /* Poll for all SVQ control commands used buffer from device */
+    svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0);
+    while (need_poll_len > 0) {
+        /*
+         * We can poll here since we've had BQL from the time we sent the
+         * descriptor. Also, we need to take the answer before SVQ pulls
+         * by itself, when BQL is released
+         */
+        dev_written = vhost_svq_poll(svq);
+        need_poll_len -= dev_written;
     }
-    r = vhost_vdpa_net_load_mq(s, n);
-    if (unlikely(r)) {
+
+    /* If code comes from `load_err` label, then we should return directly */
+    if (unlikely(r < 0)) {
         return r;
     }
 
-    return 0;
+    /* Check the used buffer from device */
+    for (virtio_net_ctrl_ack *status = s->status; (void *)status < in_cursor;
+         ++status) {
+        if (*status != VIRTIO_NET_OK) {
+            ++r;
+        }
+    }
+
+    return r;
 }
 
 static NetClientInfo net_vhost_vdpa_cvq_info = {
-- 
2.25.1
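The cursor bookkeeping described in the commit message above can be
illustrated in isolation. The following is a standalone sketch, not the
QEMU code: the fixed-size buffers, command sizes, and the queue_cmd()
helper are all stand-ins for `s->cvq_cmd_out_buffer`, `s->status` and
vhost_vdpa_net_cvq_add().

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define PAGE_LEN 4096

    static char out_buffer[PAGE_LEN];   /* models s->cvq_cmd_out_buffer */
    static uint8_t in_buffer[PAGE_LEN]; /* models the s->status acks */

    /*
     * Copy one command at *out_cursor and advance both cursors,
     * mirroring the cursor updates in vhost_vdpa_net_cvq_add().
     */
    static void queue_cmd(void **out_cursor, const void *cmd, size_t out_len,
                          void **in_cursor, size_t in_len)
    {
        assert((char *)*out_cursor + out_len <= out_buffer + PAGE_LEN);
        memcpy(*out_cursor, cmd, out_len);
        *out_cursor = (char *)*out_cursor + out_len;
        *in_cursor = (uint8_t *)*in_cursor + in_len;
    }

    int main(void)
    {
        void *out_cursor = out_buffer, *in_cursor = in_buffer;

        queue_cmd(&out_cursor, "MAC-cmd", 8, &in_cursor, 1);
        queue_cmd(&out_cursor, "MQ-cmd", 7, &in_cursor, 1);

        /* The two commands occupy disjoint regions of the shared buffers. */
        printf("out used: %td bytes, acks pending: %td\n",
               (char *)out_cursor - out_buffer,
               (uint8_t *)in_cursor - in_buffer);
        return 0;
    }

Because every queued command advances the cursors by its own lengths, no
two in-flight commands ever share buffer space, which is what makes the
parallel submission safe.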
* Re: [PATCH 2/2] vdpa: send CVQ state load commands in parallel
From: Eugenio Perez Martin @ 2023-04-19 17:42 UTC
To: Hawkins Jiawei
Cc: jasowang, 18801353760, qemu-devel

On Wed, Apr 19, 2023 at 1:50 PM Hawkins Jiawei <yin31149@gmail.com> wrote:
>
> This patch introduces vhost_vdpa_net_cvq_add() and refactors
> vhost_vdpa_net_load*(), so that QEMU can send CVQ state load
> commands in parallel.
[...]
> -    mq.virtqueue_pairs = cpu_to_le16(n->curr_queue_pairs);

So where is mq filled now?

[...]
>      n = VIRTIO_NET(v->dev->vdev);
> -    r = vhost_vdpa_net_load_mac(s, n);
> +

Extra newline.

> +    need_poll_len = 0;

Maybe we can call it "cmds_in_flight" or something similar? It is not a
length.

[...]
> +        dev_written = vhost_svq_poll(svq);

If vhost_svq_poll returns 0 we must return an error from this code.
Otherwise this will be an infinite loop.

> +        need_poll_len -= dev_written;

On the other hand, vhost_svq_poll returns 1 because that is the length
written in the "in" buffer, but it is not obvious here. It is more
clear if we just do need_poll_len-- here.

[...]
> +    /* If code comes from `load_err` label, then we should return directly */
> +    if (unlikely(r < 0)) {

If this function returns a failure we can avoid the sending of the
queued commands, as the caller must cancel the start of the device
anyway. We can return directly from the failure in
vhost_vdpa_net_load_* and avoid the label.

[...]
> +    /* Check the used buffer from device */
> +    for (virtio_net_ctrl_ack *status = s->status; (void *)status < in_cursor;

in_cursor can be a virtio_net_ctrl_ack so we don't need the casting here.

[...]
> +    return r;

Although the caller is fine with >=0, I think we should keep the 0 ==
success. The number of commands delivered does not make a lot of sense
for the callers, just if the call succeeded or not.

Thanks!
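For illustration, a minimal sketch of how the poll-and-check tail of
vhost_vdpa_net_load() could look with the comments above applied. This
is not the committed code: the vhost_vdpa_net_svq_flush() name and the
-EAGAIN/-EINVAL error codes are illustrative, each command is assumed to
use a one-byte ack in `s->status`, and vhost_svq_poll() is assumed to
return 0 when polling fails.

    /*
     * Sketch: wait for one used buffer per queued command, then map the
     * per-command acks onto a single 0-on-success return value.
     */
    static int vhost_vdpa_net_svq_flush(VhostVDPAState *s,
                                        VhostShadowVirtqueue *svq,
                                        size_t cmds_in_flight)
    {
        /* One ack per command; decrement the count, not a byte length. */
        for (size_t left = cmds_in_flight; left > 0; left--) {
            ssize_t dev_written = vhost_svq_poll(svq);
            if (unlikely(dev_written == 0)) {
                return -EAGAIN; /* device did not answer in time */
            }
        }

        /* Keep 0 == success: callers only care whether the load worked. */
        for (size_t i = 0; i < cmds_in_flight; i++) {
            if (s->status[i] != VIRTIO_NET_OK) {
                return -EINVAL;
            }
        }
        return 0;
    }

With the failures in vhost_vdpa_net_load_*() returned directly, the
`load_err` label disappears and this helper only runs once every command
has been queued successfully.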
* Re: [PATCH 2/2] vdpa: send CVQ state load commands in parallel
From: Hawkins Jiawei @ 2023-04-20 11:38 UTC
To: Eugenio Perez Martin
Cc: jasowang, 18801353760, qemu-devel

On Thu, 20 Apr 2023 at 01:43, Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Apr 19, 2023 at 1:50 PM Hawkins Jiawei <yin31149@gmail.com> wrote:
[...]
> > -    mq.virtqueue_pairs = cpu_to_le16(n->curr_queue_pairs);
>
> So where is mq filled now?

This is a typo; this line should not have been deleted. I will correct
it in the next version of the patch.

[...]
> >      n = VIRTIO_NET(v->dev->vdev);
> > -    r = vhost_vdpa_net_load_mac(s, n);
> > +
>
> Extra newline.

Thanks for the comment, I will remove it in the next version.

> > +    need_poll_len = 0;
>
> Maybe we can call it "cmds_in_flight" or something similar? It is not a
> length.

Do you mean we should maintain the number of in-flight commands here?

My initial thought was to maintain the total length of the device used
buffers, so that we know when to stop polling for them. So far, each
SVQ command uses a one-byte used buffer, so maintaining the total
used-buffer length and maintaining the number of in-flight commands
have the same effect.

Still, I prefer to maintain the total length of the device used
buffers, because it avoids the assumption that each SVQ command uses a
one-byte used buffer, and it makes better use of the vhost_svq_poll()
return value during polling, which looks more reasonable to me. What do
you think?

[...]
> > +        dev_written = vhost_svq_poll(svq);
>
> If vhost_svq_poll returns 0 we must return an error from this code.
> Otherwise this will be an infinite loop.

Yes, you are right, we should return an error here. What error code do
you think we should return? It seems that vhost_svq_poll() returns 0
when QEMU waits too long or there is no available used buffer from the
device, so is it okay to return -EAGAIN (resource temporarily
unavailable)?

> > +        need_poll_len -= dev_written;
>
> On the other hand, vhost_svq_poll returns 1 because that is the length
> written in the "in" buffer, but it is not obvious here. It is more
> clear if we just do need_poll_len-- here.

It is the same point as above: if we maintain the total length of the
device used buffers in `need_poll_len`, subtracting the polled length
looks more reasonable here. Maybe `used_buffer_len` is a clearer name?

[...]
> If this function returns a failure we can avoid the sending of the
> queued commands, as the caller must cancel the start of the device
> anyway. We can return directly from the failure in
> vhost_vdpa_net_load_* and avoid the label.

Thanks for the explanation, I will refactor the patch as you suggested.

[...]
> in_cursor can be a virtio_net_ctrl_ack so we don't need the casting
> here.

Thanks for the suggestion, I will refactor the patch as you suggested.

[...]
> Although the caller is fine with >=0, I think we should keep the 0 ==
> success. The number of commands delivered does not make a lot of sense
> for the callers, just if the call succeeded or not.

Thanks for the explanation, I will refactor the patch as you suggested.

> Thanks!
* Re: [PATCH 2/2] vdpa: send CVQ state load commands in parallel
From: Hawkins Jiawei @ 2023-04-20 15:22 UTC
To: Eugenio Perez Martin
Cc: jasowang, 18801353760, qemu-devel

On Thu, 20 Apr 2023 at 19:38, Hawkins Jiawei <yin31149@gmail.com> wrote:
>
> On Thu, 20 Apr 2023 at 01:43, Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > Although the caller is fine with >=0, I think we should keep the 0 ==
> > success. The number of commands delivered does not make a lot of sense
> > for the callers, just if the call succeeded or not.
>
> Thanks for the explanation, I will refactor the patch as you suggested.

I still have some questions about the check on the device used buffers.

My initial thought was to return the number of commands whose `in`
buffer value is not VIRTIO_NET_OK. If we are not supposed to return a
value > 0, what should we return if some commands' `in` buffer value is
not VIRTIO_NET_OK?

Should we return an error code, such as -EINVAL (invalid argument),
indicating that QEMU cannot successfully send all SVQ commands in the
current state? Or should we simply not check the device used buffers,
and return 0 when QEMU finishes polling?

Thanks!
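For reference, the first option could look like the sketch below. It is
illustrative only: the thread leaves the choice open, check_cvq_acks()
is a hypothetical helper, and -EINVAL is just one candidate error code.

    /* Sketch: map any non-OK ack onto a single error code. */
    static int check_cvq_acks(const virtio_net_ctrl_ack *status,
                              const virtio_net_ctrl_ack *in_cursor)
    {
        for (; status < in_cursor; ++status) {
            if (*status != VIRTIO_NET_OK) {
                return -EINVAL; /* at least one command was rejected */
            }
        }
        return 0; /* every command was acked with VIRTIO_NET_OK */
    }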
* Re: [PATCH 0/2] Send all the SVQ control commands in parallel
From: Eugenio Perez Martin @ 2023-04-19 17:16 UTC
To: Hawkins Jiawei
Cc: jasowang, 18801353760, qemu-devel

On Wed, Apr 19, 2023 at 1:50 PM Hawkins Jiawei <yin31149@gmail.com> wrote:
>
> This patchset allows QEMU to poll and check the device used buffers
> after sending all SVQ control commands, instead of polling and checking
> immediately after sending each SVQ control command, so that QEMU can
> send all the SVQ control commands in parallel, which improves
> performance.
>
> I use vdpa_sim_net to simulate a vdpa device, refactor
> vhost_vdpa_net_load() to call vhost_vdpa_net_load_mac() 30 times, and
> refactor `net_vhost_vdpa_cvq_info.load` to call vhost_vdpa_net_load()
> 1000 times,

Maybe a little bit too high for real scenarios but it gives us a hint
for sure :). Maybe it is more realistic to send ~10 or ~100 commands?

> in order to build a test environment for sending multiple
> SVQ control commands. Time on the monotonic clock to finish
> `net_vhost_vdpa_cvq_info.load`:
>
>                  QEMU monotonic time
>   --------------------------------------------------
>   not patched    89202
>   --------------------------------------------------
>   patched        80455

Is the time expressed in seconds or milliseconds? I'm going to assume
ms.

So let's say all the time was spent in the context switch between QEMU
and the kernel; this is a saving of (89202 - 80455) / 30000 = 0.3 ms
per command?

Thanks!

[...]
* Re: [PATCH 0/2] Send all the SVQ control commands in parallel
From: Hawkins Jiawei @ 2023-04-20 8:32 UTC
To: Eugenio Perez Martin
Cc: jasowang, 18801353760, qemu-devel

On Thu, 20 Apr 2023 at 01:17, Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Apr 19, 2023 at 1:50 PM Hawkins Jiawei <yin31149@gmail.com> wrote:
> >
> > I use vdpa_sim_net to simulate a vdpa device, refactor
> > vhost_vdpa_net_load() to call vhost_vdpa_net_load_mac() 30 times, and
> > refactor `net_vhost_vdpa_cvq_info.load` to call vhost_vdpa_net_load()
> > 1000 times,
>
> Maybe a little bit too high for real scenarios but it gives us a hint
> for sure :). Maybe it is more realistic to send ~10 or ~100 commands?

Yes, calling vhost_vdpa_net_load() 1000 times is absolutely too high
for real scenarios. But the time to execute vhost_vdpa_net_load_mac()
30 times is very short, so the measured result could be highly unstable
and fluctuate greatly. That is why I call vhost_vdpa_net_load() 1000
times, hoping to get a more stable result.

> Is the time expressed in seconds or milliseconds? I'm going to assume
> ms.

I got the numbers by calling g_get_monotonic_time(), so they should be
in microseconds according to [1].

[1]. https://docs.gtk.org/glib/func.get_monotonic_time.html

> So let's say all the time was spent in the context switch between QEMU
> and the kernel; this is a saving of (89202 - 80455) / 30000 = 0.3 ms
> per command?

Yes, except that I think it is a saving of about 0.3 microseconds per
command.

Thanks!
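To make the measurement above concrete, here is a standalone sketch of
the timing harness. It is illustrative only: the actual test code is
not part of the patchset, and do_load() is a stand-in for the
refactored `net_vhost_vdpa_cvq_info.load` on the vdpa_sim_net device.

    /*
     * Time 1000 load iterations with GLib's monotonic clock, which
     * g_get_monotonic_time() reports in microseconds.
     * Build with: gcc bench.c $(pkg-config --cflags --libs glib-2.0)
     */
    #include <glib.h>
    #include <stdio.h>

    static void do_load(void)
    {
        /* placeholder for vhost_vdpa_net_load() on the test device */
    }

    int main(void)
    {
        gint64 start = g_get_monotonic_time();

        for (int i = 0; i < 1000; i++) {
            do_load();
        }

        gint64 elapsed_us = g_get_monotonic_time() - start;
        printf("QEMU monotonic time: %" G_GINT64_FORMAT " us\n", elapsed_us);
        return 0;
    }

With 30 commands per load and 1000 loads, the reported 89202 vs 80455
microseconds works out to the roughly 0.3 us-per-command saving
discussed above.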