* [PATCH V3 0/6] vhost code cleanup and minor enhancement
@ 2013-09-02 8:40 Jason Wang
2013-09-02 8:40 ` [PATCH V3 1/6] vhost_net: make vhost_zerocopy_signal_used() return void Jason Wang
` (6 more replies)
0 siblings, 7 replies; 13+ messages in thread
From: Jason Wang @ 2013-09-02 8:40 UTC (permalink / raw)
To: mst, kvm, virtualization, netdev, linux-kernel
This series tries to unify and simplify the vhost code, especially the
zerocopy paths. With this series, a 5%-10% improvement in per-CPU throughput
was seen during netperf guest transmission tests.
Please review.
Changes from V2:
- Typo and code style fixes
- Add performance gain in the commit log of patch 2/6
- Retest and update the results in patch 6/6
Changes from V1:
- Fix the zerocopy enabling check by changing the check of upend_idx != done_idx
to (upend_idx + 1) % UIO_MAXIOV == done_idx.
- Switch to use put_user() in __vhost_add_used_n() when there's only one used element to add
- Keep the max pending check based on Michael's suggestion.
Jason Wang (6):
vhost_net: make vhost_zerocopy_signal_used() return void
vhost_net: use vhost_add_used_and_signal_n() in
vhost_zerocopy_signal_used()
vhost: switch to use vhost_add_used_n()
vhost_net: determine whether or not to use zerocopy at one time
vhost_net: poll vhost queue after marking DMA is done
vhost_net: correctly limit the max pending buffers
drivers/vhost/net.c | 92 ++++++++++++++++++++++--------------------------
drivers/vhost/vhost.c | 54 ++++++----------------------
2 files changed, 54 insertions(+), 92 deletions(-)
* [PATCH V3 1/6] vhost_net: make vhost_zerocopy_signal_used() return void
2013-09-02 8:40 [PATCH V3 0/6] vhost code cleanup and minor enhancement Jason Wang
@ 2013-09-02 8:40 ` Jason Wang
2013-09-02 8:40 ` [PATCH V3 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used() Jason Wang
` (5 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2013-09-02 8:40 UTC (permalink / raw)
To: mst, kvm, virtualization, netdev, linux-kernel
None of its callers uses the return value, so let it return void.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 969a859..280ee66 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -276,8 +276,8 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
* of used idx. Once lower device DMA done contiguously, we will signal KVM
* guest used idx.
*/
-static int vhost_zerocopy_signal_used(struct vhost_net *net,
- struct vhost_virtqueue *vq)
+static void vhost_zerocopy_signal_used(struct vhost_net *net,
+ struct vhost_virtqueue *vq)
{
struct vhost_net_virtqueue *nvq =
container_of(vq, struct vhost_net_virtqueue, vq);
@@ -297,7 +297,6 @@ static int vhost_zerocopy_signal_used(struct vhost_net *net,
}
if (j)
nvq->done_idx = i;
- return j;
}
static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
--
1.7.1
* [PATCH V3 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()
2013-09-02 8:40 [PATCH V3 0/6] vhost code cleanup and minor enhancement Jason Wang
2013-09-02 8:40 ` [PATCH V3 1/6] vhost_net: make vhost_zerocopy_signal_used() return void Jason Wang
@ 2013-09-02 8:40 ` Jason Wang
2013-09-02 8:40 ` [PATCH V3 3/6] vhost: switch to use vhost_add_used_n() Jason Wang
` (4 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2013-09-02 8:40 UTC (permalink / raw)
To: mst, kvm, virtualization, netdev, linux-kernel
We batch the used adding and signaling in vhost_zerocopy_callback(), which in
some cases may result in more than 100 used buffers being updated in
vhost_zerocopy_signal_used(). So switch to vhost_add_used_and_signal_n() to
avoid multiple calls to vhost_add_used_and_signal(), which means far fewer
used index updates and memory barriers.
A 2% performance improvement was seen on the netperf TCP_RR test.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 13 ++++++++-----
1 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 280ee66..8a6dd0d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -281,7 +281,7 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
{
struct vhost_net_virtqueue *nvq =
container_of(vq, struct vhost_net_virtqueue, vq);
- int i;
+ int i, add;
int j = 0;
for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
@@ -289,14 +289,17 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
vhost_net_tx_err(net);
if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
- vhost_add_used_and_signal(vq->dev, vq,
- vq->heads[i].id, 0);
++j;
} else
break;
}
- if (j)
- nvq->done_idx = i;
+ while (j) {
+ add = min(UIO_MAXIOV - nvq->done_idx, j);
+ vhost_add_used_and_signal_n(vq->dev, vq,
+ &vq->heads[nvq->done_idx], add);
+ nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
+ j -= add;
+ }
}
static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
--
1.7.1
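For illustration, the wrap-around handling in the new flush loop can be
modeled in userspace. This is a minimal sketch, not the driver's code;
RING_SIZE and the helper names are assumptions standing in for UIO_MAXIOV
and the real vhost types:

```c
#define RING_SIZE 1024  /* stands in for UIO_MAXIOV (assumed, for illustration) */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Model of the loop this patch adds: flush j completed entries starting
 * at done_idx in contiguous chunks, splitting once at the ring boundary.
 * Returns how many times a real vhost_add_used_and_signal_n() would be
 * called, and stores the final done_idx via new_done_idx. */
static int flush_batches(int done_idx, int j, int *new_done_idx)
{
    int calls = 0;

    while (j) {
        int add = min_int(RING_SIZE - done_idx, j);
        /* the driver would call vhost_add_used_and_signal_n() here on
         * &vq->heads[done_idx] for 'add' entries */
        done_idx = (done_idx + add) % RING_SIZE;
        j -= add;
        calls++;
    }
    *new_done_idx = done_idx;
    return calls;
}
```

A batch that crosses the end of the ring is flushed in two calls, one for
each contiguous run; a batch that fits is flushed in one.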
* [PATCH V3 3/6] vhost: switch to use vhost_add_used_n()
2013-09-02 8:40 [PATCH V3 0/6] vhost code cleanup and minor enhancement Jason Wang
2013-09-02 8:40 ` [PATCH V3 1/6] vhost_net: make vhost_zerocopy_signal_used() return void Jason Wang
2013-09-02 8:40 ` [PATCH V3 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used() Jason Wang
@ 2013-09-02 8:40 ` Jason Wang
2013-09-02 8:40 ` [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time Jason Wang
` (3 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2013-09-02 8:40 UTC (permalink / raw)
To: mst, kvm, virtualization, netdev, linux-kernel
Let vhost_add_used() use vhost_add_used_n() to reduce code duplication. To
avoid the overhead brought by __copy_to_user(), we use put_user() when only
one used element needs to be added.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/vhost.c | 54 ++++++++++--------------------------------------
1 files changed, 12 insertions(+), 42 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 448efe0..9a9502a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1332,48 +1332,9 @@ EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
* want to notify the guest, using eventfd. */
int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
{
- struct vring_used_elem __user *used;
+ struct vring_used_elem heads = { head, len };
- /* The virtqueue contains a ring of used buffers. Get a pointer to the
- * next entry in that used ring. */
- used = &vq->used->ring[vq->last_used_idx % vq->num];
- if (__put_user(head, &used->id)) {
- vq_err(vq, "Failed to write used id");
- return -EFAULT;
- }
- if (__put_user(len, &used->len)) {
- vq_err(vq, "Failed to write used len");
- return -EFAULT;
- }
- /* Make sure buffer is written before we update index. */
- smp_wmb();
- if (__put_user(vq->last_used_idx + 1, &vq->used->idx)) {
- vq_err(vq, "Failed to increment used idx");
- return -EFAULT;
- }
- if (unlikely(vq->log_used)) {
- /* Make sure data is seen before log. */
- smp_wmb();
- /* Log used ring entry write. */
- log_write(vq->log_base,
- vq->log_addr +
- ((void __user *)used - (void __user *)vq->used),
- sizeof *used);
- /* Log used index update. */
- log_write(vq->log_base,
- vq->log_addr + offsetof(struct vring_used, idx),
- sizeof vq->used->idx);
- if (vq->log_ctx)
- eventfd_signal(vq->log_ctx, 1);
- }
- vq->last_used_idx++;
- /* If the driver never bothers to signal in a very long while,
- * used index might wrap around. If that happens, invalidate
- * signalled_used index we stored. TODO: make sure driver
- * signals at least once in 2^16 and remove this. */
- if (unlikely(vq->last_used_idx == vq->signalled_used))
- vq->signalled_used_valid = false;
- return 0;
+ return vhost_add_used_n(vq, &heads, 1);
}
EXPORT_SYMBOL_GPL(vhost_add_used);
@@ -1387,7 +1348,16 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
start = vq->last_used_idx % vq->num;
used = vq->used->ring + start;
- if (__copy_to_user(used, heads, count * sizeof *used)) {
+ if (count == 1) {
+ if (__put_user(heads[0].id, &used->id)) {
+ vq_err(vq, "Failed to write used id");
+ return -EFAULT;
+ }
+ if (__put_user(heads[0].len, &used->len)) {
+ vq_err(vq, "Failed to write used len");
+ return -EFAULT;
+ }
+ } else if (__copy_to_user(used, heads, count * sizeof *used)) {
vq_err(vq, "Failed to write used");
return -EFAULT;
}
--
1.7.1
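The count == 1 fast path can be sketched in userspace. The struct and
function names below are illustrative stand-ins, not the kernel's types:

```c
#include <string.h>
#include <stdint.h>

/* Illustrative stand-in for struct vring_used_elem. */
struct used_elem {
    uint32_t id;
    uint32_t len;
};

/* Userspace model of the count == 1 fast path added to
 * __vhost_add_used_n(): a single element is written field by field (as
 * put_user() would do), a batch with one bulk copy (as __copy_to_user()
 * would do). */
static void add_used_n(struct used_elem *ring, int start,
                       const struct used_elem *heads, int count)
{
    if (count == 1) {
        ring[start].id = heads[0].id;
        ring[start].len = heads[0].len;
    } else {
        memcpy(ring + start, heads, count * sizeof(*heads));
    }
}
```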
* [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
2013-09-02 8:40 [PATCH V3 0/6] vhost code cleanup and minor enhancement Jason Wang
` (2 preceding siblings ...)
2013-09-02 8:40 ` [PATCH V3 3/6] vhost: switch to use vhost_add_used_n() Jason Wang
@ 2013-09-02 8:40 ` Jason Wang
2013-09-04 11:59 ` Michael S. Tsirkin
2013-09-02 8:41 ` [PATCH V3 5/6] vhost_net: poll vhost queue after marking DMA is done Jason Wang
` (2 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2013-09-02 8:40 UTC (permalink / raw)
To: mst, kvm, virtualization, netdev, linux-kernel
Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, we
still set zcopy_used to true when upend_idx != done_idx, and roll this
choice back later. This can be avoided by deciding on zerocopy once,
checking all the conditions up front.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 47 ++++++++++++++++++++---------------------------
1 files changed, 20 insertions(+), 27 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8a6dd0d..3f89dea 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
iov_length(nvq->hdr, s), hdr_size);
break;
}
- zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
- nvq->upend_idx != nvq->done_idx);
+
+ zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
+ && (nvq->upend_idx + 1) % UIO_MAXIOV !=
+ nvq->done_idx
+ && vhost_net_tx_select_zcopy(net);
/* use msg_control to pass vhost zerocopy ubuf info to skb */
if (zcopy_used) {
+ struct ubuf_info *ubuf;
+ ubuf = nvq->ubuf_info + nvq->upend_idx;
+
vq->heads[nvq->upend_idx].id = head;
- if (!vhost_net_tx_select_zcopy(net) ||
- len < VHOST_GOODCOPY_LEN) {
- /* copy don't need to wait for DMA done */
- vq->heads[nvq->upend_idx].len =
- VHOST_DMA_DONE_LEN;
- msg.msg_control = NULL;
- msg.msg_controllen = 0;
- ubufs = NULL;
- } else {
- struct ubuf_info *ubuf;
- ubuf = nvq->ubuf_info + nvq->upend_idx;
-
- vq->heads[nvq->upend_idx].len =
- VHOST_DMA_IN_PROGRESS;
- ubuf->callback = vhost_zerocopy_callback;
- ubuf->ctx = nvq->ubufs;
- ubuf->desc = nvq->upend_idx;
- msg.msg_control = ubuf;
- msg.msg_controllen = sizeof(ubuf);
- ubufs = nvq->ubufs;
- kref_get(&ubufs->kref);
- }
+ vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
+ ubuf->callback = vhost_zerocopy_callback;
+ ubuf->ctx = nvq->ubufs;
+ ubuf->desc = nvq->upend_idx;
+ msg.msg_control = ubuf;
+ msg.msg_controllen = sizeof(ubuf);
+ ubufs = nvq->ubufs;
+ kref_get(&ubufs->kref);
nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
- } else
+ } else {
msg.msg_control = NULL;
+ ubufs = NULL;
+ }
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(NULL, sock, &msg, len);
if (unlikely(err < 0)) {
if (zcopy_used) {
- if (ubufs)
- vhost_net_ubuf_put(ubufs);
+ vhost_net_ubuf_put(ubufs);
nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
% UIO_MAXIOV;
}
--
1.7.1
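The combined condition can be read as a single predicate. Below is a hedged
userspace sketch; the constants and names are assumptions, not the driver's
(in particular GOODCOPY_LEN is just a placeholder value):

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 1024     /* stands in for UIO_MAXIOV */
#define GOODCOPY_LEN 256   /* placeholder for VHOST_GOODCOPY_LEN */

/* All four conditions from the patch folded into one up-front decision:
 * zerocopy enabled, packet long enough, at least one free slot left in
 * the heads ring, and the tx error heuristic still selecting zerocopy. */
static bool zcopy_used(bool zcopy, size_t len,
                       int upend_idx, int done_idx, bool select_zcopy)
{
    return zcopy && len >= GOODCOPY_LEN
        && (upend_idx + 1) % RING_SIZE != done_idx
        && select_zcopy;
}
```

Once the predicate is false, the non-zerocopy path is taken immediately;
there is no longer a rollback after the fact.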
* [PATCH V3 5/6] vhost_net: poll vhost queue after marking DMA is done
2013-09-02 8:40 [PATCH V3 0/6] vhost code cleanup and minor enhancement Jason Wang
` (3 preceding siblings ...)
2013-09-02 8:40 ` [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time Jason Wang
@ 2013-09-02 8:41 ` Jason Wang
2013-09-02 8:41 ` [PATCH V3 6/6] vhost_net: correctly limit the max pending buffers Jason Wang
2013-09-04 2:47 ` [PATCH V3 0/6] vhost code cleanup and minor enhancement David Miller
6 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2013-09-02 8:41 UTC (permalink / raw)
To: mst, kvm, virtualization, netdev, linux-kernel; +Cc: Jason Wang
We used to poll the vhost queue before marking the DMA as done. This is
racy: if the vhost thread is woken up before the DMA is marked done, the
signal can be missed. Fix this by always polling the vhost queue after
marking the DMA as done.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
- The patch is needed for stable 3.4+
---
drivers/vhost/net.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 3f89dea..8e9dc55 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -308,6 +308,11 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
struct vhost_virtqueue *vq = ubufs->vq;
int cnt = atomic_read(&ubufs->kref.refcount);
+ /* set len to mark this desc buffers done DMA */
+ vq->heads[ubuf->desc].len = success ?
+ VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
+ vhost_net_ubuf_put(ubufs);
+
/*
* Trigger polling thread if guest stopped submitting new buffers:
* in this case, the refcount after decrement will eventually reach 1
@@ -318,10 +323,6 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
*/
if (cnt <= 2 || !(cnt % 16))
vhost_poll_queue(&vq->poll);
- /* set len to mark this desc buffers done DMA */
- vq->heads[ubuf->desc].len = success ?
- VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
- vhost_net_ubuf_put(ubufs);
}
/* Expects to be always run from workqueue - which acts as
--
1.7.1
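The ordering the patch establishes can be shown with a recorded event
trace. Everything below is an illustrative userspace model, not the driver;
the real statements each record() call stands for are noted in comments:

```c
enum { EV_MARK_DONE = 1, EV_PUT_REF, EV_POLL };

static int events[8];
static int nev;

static void record(int ev) { events[nev++] = ev; }

/* Model of the ordering this patch enforces in
 * vhost_zerocopy_callback(): mark the DMA as done and drop the ubuf
 * reference before any decision to poll the vhost queue, so a vhost
 * thread woken by the poll always observes the DMA-done mark. */
static void zerocopy_callback(int cnt)
{
    record(EV_MARK_DONE); /* vq->heads[desc].len = VHOST_DMA_DONE_LEN */
    record(EV_PUT_REF);   /* vhost_net_ubuf_put(ubufs) */
    if (cnt <= 2 || !(cnt % 16))
        record(EV_POLL);  /* vhost_poll_queue(&vq->poll) */
}
```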
* [PATCH V3 6/6] vhost_net: correctly limit the max pending buffers
2013-09-02 8:40 [PATCH V3 0/6] vhost code cleanup and minor enhancement Jason Wang
` (4 preceding siblings ...)
2013-09-02 8:41 ` [PATCH V3 5/6] vhost_net: poll vhost queue after marking DMA is done Jason Wang
@ 2013-09-02 8:41 ` Jason Wang
2013-09-04 2:47 ` [PATCH V3 0/6] vhost code cleanup and minor enhancement David Miller
6 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2013-09-02 8:41 UTC (permalink / raw)
To: mst, kvm, virtualization, netdev, linux-kernel
As Michael pointed out, we used to limit the max pending DMAs to get better
cache utilization. But this was not done correctly, since the check was only
performed when no new buffers were submitted from the guest. The guest can
easily exceed the limit by continuously sending packets.
So this patch moves the check into the main loop. Tests show about a 5%-10%
improvement in per-CPU throughput for guest tx.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 18 +++++++-----------
1 files changed, 7 insertions(+), 11 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8e9dc55..831eb4f 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -363,6 +363,13 @@ static void handle_tx(struct vhost_net *net)
if (zcopy)
vhost_zerocopy_signal_used(net, vq);
+ /* If more outstanding DMAs, queue the work.
+ * Handle upend_idx wrap around
+ */
+ if (unlikely((nvq->upend_idx + vq->num - VHOST_MAX_PEND)
+ % UIO_MAXIOV == nvq->done_idx))
+ break;
+
head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
ARRAY_SIZE(vq->iov),
&out, &in,
@@ -372,17 +379,6 @@ static void handle_tx(struct vhost_net *net)
break;
/* Nothing new? Wait for eventfd to tell us they refilled. */
if (head == vq->num) {
- int num_pends;
-
- /* If more outstanding DMAs, queue the work.
- * Handle upend_idx wrap around
- */
- num_pends = likely(nvq->upend_idx >= nvq->done_idx) ?
- (nvq->upend_idx - nvq->done_idx) :
- (nvq->upend_idx + UIO_MAXIOV -
- nvq->done_idx);
- if (unlikely(num_pends > VHOST_MAX_PEND))
- break;
if (unlikely(vhost_enable_notify(&net->dev, vq))) {
vhost_disable_notify(&net->dev, vq);
continue;
--
1.7.1
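For reference, the wrap-aware pending count computed by the hunk this patch
removes behaves as below. This is a standalone sketch with an assumed ring
size, not the driver's code:

```c
#define RING_SIZE 1024  /* stands in for UIO_MAXIOV */

/* Wrap-aware count of outstanding zerocopy DMAs, as the removed
 * "nothing new" branch computed it: straight subtraction when upend is
 * ahead of done, plus one ring length when upend has wrapped. */
static int num_pending(int upend_idx, int done_idx)
{
    return upend_idx >= done_idx ?
           upend_idx - done_idx :
           upend_idx + RING_SIZE - done_idx;
}
```

Moving the limit check into the main loop means this count is bounded on
every iteration, not only when the guest stops submitting buffers.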
* Re: [PATCH V3 0/6] vhost code cleanup and minor enhancement
2013-09-02 8:40 [PATCH V3 0/6] vhost code cleanup and minor enhancement Jason Wang
` (5 preceding siblings ...)
2013-09-02 8:41 ` [PATCH V3 6/6] vhost_net: correctly limit the max pending buffers Jason Wang
@ 2013-09-04 2:47 ` David Miller
6 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2013-09-04 2:47 UTC (permalink / raw)
To: jasowang; +Cc: netdev, virtualization, linux-kernel, kvm, mst
From: Jason Wang <jasowang@redhat.com>
Date: Mon, 2 Sep 2013 16:40:55 +0800
> This series tries to unify and simplify vhost codes especially for
> zerocopy. With this series, 5% - 10% improvement for per cpu throughput were
> seen during netperf guest sending test.
>
> Plase review.
Applied and patch #5 queued up for -stable, thanks.
* Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
2013-09-02 8:40 ` [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time Jason Wang
@ 2013-09-04 11:59 ` Michael S. Tsirkin
2013-09-05 2:54 ` Jason Wang
0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2013-09-04 11:59 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
> upend_idx != done_idx we still set zcopy_used to true and rollback this choice
> later. This could be avoided by determining zerocopy once by checking all
> conditions at one time before.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/net.c | 47 ++++++++++++++++++++---------------------------
> 1 files changed, 20 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 8a6dd0d..3f89dea 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
> iov_length(nvq->hdr, s), hdr_size);
> break;
> }
> - zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
> - nvq->upend_idx != nvq->done_idx);
> +
> + zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
> + && (nvq->upend_idx + 1) % UIO_MAXIOV !=
> + nvq->done_idx
Thinking about this, this looks strange.
The original idea was that once we start doing zcopy, we keep
using the heads ring even for short packets until no zcopy is outstanding.
What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
here?
> + && vhost_net_tx_select_zcopy(net);
>
> /* use msg_control to pass vhost zerocopy ubuf info to skb */
> if (zcopy_used) {
> + struct ubuf_info *ubuf;
> + ubuf = nvq->ubuf_info + nvq->upend_idx;
> +
> vq->heads[nvq->upend_idx].id = head;
> - if (!vhost_net_tx_select_zcopy(net) ||
> - len < VHOST_GOODCOPY_LEN) {
> - /* copy don't need to wait for DMA done */
> - vq->heads[nvq->upend_idx].len =
> - VHOST_DMA_DONE_LEN;
> - msg.msg_control = NULL;
> - msg.msg_controllen = 0;
> - ubufs = NULL;
> - } else {
> - struct ubuf_info *ubuf;
> - ubuf = nvq->ubuf_info + nvq->upend_idx;
> -
> - vq->heads[nvq->upend_idx].len =
> - VHOST_DMA_IN_PROGRESS;
> - ubuf->callback = vhost_zerocopy_callback;
> - ubuf->ctx = nvq->ubufs;
> - ubuf->desc = nvq->upend_idx;
> - msg.msg_control = ubuf;
> - msg.msg_controllen = sizeof(ubuf);
> - ubufs = nvq->ubufs;
> - kref_get(&ubufs->kref);
> - }
> + vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
> + ubuf->callback = vhost_zerocopy_callback;
> + ubuf->ctx = nvq->ubufs;
> + ubuf->desc = nvq->upend_idx;
> + msg.msg_control = ubuf;
> + msg.msg_controllen = sizeof(ubuf);
> + ubufs = nvq->ubufs;
> + kref_get(&ubufs->kref);
> nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
> - } else
> + } else {
> msg.msg_control = NULL;
> + ubufs = NULL;
> + }
> /* TODO: Check specific error and bomb out unless ENOBUFS? */
> err = sock->ops->sendmsg(NULL, sock, &msg, len);
> if (unlikely(err < 0)) {
> if (zcopy_used) {
> - if (ubufs)
> - vhost_net_ubuf_put(ubufs);
> + vhost_net_ubuf_put(ubufs);
> nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
> % UIO_MAXIOV;
> }
> --
> 1.7.1
* Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
2013-09-04 11:59 ` Michael S. Tsirkin
@ 2013-09-05 2:54 ` Jason Wang
2013-09-23 7:16 ` Michael S. Tsirkin
0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2013-09-05 2:54 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, kvm, virtualization
On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote:
> On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
>> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
>> upend_idx != done_idx we still set zcopy_used to true and rollback this choice
>> later. This could be avoided by determining zerocopy once by checking all
>> conditions at one time before.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/vhost/net.c | 47 ++++++++++++++++++++---------------------------
>> 1 files changed, 20 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 8a6dd0d..3f89dea 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
>> iov_length(nvq->hdr, s), hdr_size);
>> break;
>> }
>> - zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
>> - nvq->upend_idx != nvq->done_idx);
>> +
>> + zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
>> + && (nvq->upend_idx + 1) % UIO_MAXIOV !=
>> + nvq->done_idx
> Thinking about this, this looks strange.
> The original idea was that once we start doing zcopy, we keep
> using the heads ring even for short packets until no zcopy is outstanding.
What's the reason to keep using the heads ring?
>
> What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
> here?
Because we initialize both upend_idx and done_idx to zero, upend_idx !=
done_idx cannot be used to check whether or not the heads ring is full.
>> + && vhost_net_tx_select_zcopy(net);
>>
>> /* use msg_control to pass vhost zerocopy ubuf info to skb */
>> if (zcopy_used) {
>> + struct ubuf_info *ubuf;
>> + ubuf = nvq->ubuf_info + nvq->upend_idx;
>> +
>> vq->heads[nvq->upend_idx].id = head;
>> - if (!vhost_net_tx_select_zcopy(net) ||
>> - len < VHOST_GOODCOPY_LEN) {
>> - /* copy don't need to wait for DMA done */
>> - vq->heads[nvq->upend_idx].len =
>> - VHOST_DMA_DONE_LEN;
>> - msg.msg_control = NULL;
>> - msg.msg_controllen = 0;
>> - ubufs = NULL;
>> - } else {
>> - struct ubuf_info *ubuf;
>> - ubuf = nvq->ubuf_info + nvq->upend_idx;
>> -
>> - vq->heads[nvq->upend_idx].len =
>> - VHOST_DMA_IN_PROGRESS;
>> - ubuf->callback = vhost_zerocopy_callback;
>> - ubuf->ctx = nvq->ubufs;
>> - ubuf->desc = nvq->upend_idx;
>> - msg.msg_control = ubuf;
>> - msg.msg_controllen = sizeof(ubuf);
>> - ubufs = nvq->ubufs;
>> - kref_get(&ubufs->kref);
>> - }
>> + vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
>> + ubuf->callback = vhost_zerocopy_callback;
>> + ubuf->ctx = nvq->ubufs;
>> + ubuf->desc = nvq->upend_idx;
>> + msg.msg_control = ubuf;
>> + msg.msg_controllen = sizeof(ubuf);
>> + ubufs = nvq->ubufs;
>> + kref_get(&ubufs->kref);
>> nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
>> - } else
>> + } else {
>> msg.msg_control = NULL;
>> + ubufs = NULL;
>> + }
>> /* TODO: Check specific error and bomb out unless ENOBUFS? */
>> err = sock->ops->sendmsg(NULL, sock, &msg, len);
>> if (unlikely(err < 0)) {
>> if (zcopy_used) {
>> - if (ubufs)
>> - vhost_net_ubuf_put(ubufs);
>> + vhost_net_ubuf_put(ubufs);
>> nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
>> % UIO_MAXIOV;
>> }
>> --
>> 1.7.1
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
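The point in the reply above is the classic ring-buffer full/empty
ambiguity. A minimal sketch (ring size is an assumption, chosen tiny for
illustration) of why the "+ 1" appears in the check:

```c
#include <stdbool.h>

#define RING_SIZE 8  /* tiny illustrative ring */

/* With both indices starting at zero, upend == done means "empty", so
 * equality alone cannot also mean "full". Sacrificing one slot, i.e.
 * treating (upend + 1) % RING_SIZE == done as "full", keeps the two
 * states distinguishable, which is the logic behind the check in the
 * patch. */
static bool ring_empty(int upend, int done) { return upend == done; }

static bool ring_full(int upend, int done)
{
    return (upend + 1) % RING_SIZE == done;
}
```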
* Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
2013-09-05 2:54 ` Jason Wang
@ 2013-09-23 7:16 ` Michael S. Tsirkin
2013-09-26 4:30 ` Jason Wang
0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2013-09-23 7:16 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
On Thu, Sep 05, 2013 at 10:54:44AM +0800, Jason Wang wrote:
> On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote:
> > On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
> >> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
> >> upend_idx != done_idx we still set zcopy_used to true and rollback this choice
> >> later. This could be avoided by determining zerocopy once by checking all
> >> conditions at one time before.
> >>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> ---
> >> drivers/vhost/net.c | 47 ++++++++++++++++++++---------------------------
> >> 1 files changed, 20 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index 8a6dd0d..3f89dea 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
> >> iov_length(nvq->hdr, s), hdr_size);
> >> break;
> >> }
> >> - zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
> >> - nvq->upend_idx != nvq->done_idx);
> >> +
> >> + zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
> >> + && (nvq->upend_idx + 1) % UIO_MAXIOV !=
> >> + nvq->done_idx
> > Thinking about this, this looks strange.
> > The original idea was that once we start doing zcopy, we keep
> > using the heads ring even for short packets until no zcopy is outstanding.
>
> What's the reason for keep using the heads ring?
To keep completions in order.
> >
> > What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
> > here?
>
> Because we initialize both upend_idx and done_idx to zero, so upend_idx
> != done_idx could not be used to check whether or not the heads ring
> were full.
But what does ring full have to do with zerocopy use?
> >> + && vhost_net_tx_select_zcopy(net);
> >>
> >> /* use msg_control to pass vhost zerocopy ubuf info to skb */
> >> if (zcopy_used) {
> >> + struct ubuf_info *ubuf;
> >> + ubuf = nvq->ubuf_info + nvq->upend_idx;
> >> +
> >> vq->heads[nvq->upend_idx].id = head;
> >> - if (!vhost_net_tx_select_zcopy(net) ||
> >> - len < VHOST_GOODCOPY_LEN) {
> >> - /* copy don't need to wait for DMA done */
> >> - vq->heads[nvq->upend_idx].len =
> >> - VHOST_DMA_DONE_LEN;
> >> - msg.msg_control = NULL;
> >> - msg.msg_controllen = 0;
> >> - ubufs = NULL;
> >> - } else {
> >> - struct ubuf_info *ubuf;
> >> - ubuf = nvq->ubuf_info + nvq->upend_idx;
> >> -
> >> - vq->heads[nvq->upend_idx].len =
> >> - VHOST_DMA_IN_PROGRESS;
> >> - ubuf->callback = vhost_zerocopy_callback;
> >> - ubuf->ctx = nvq->ubufs;
> >> - ubuf->desc = nvq->upend_idx;
> >> - msg.msg_control = ubuf;
> >> - msg.msg_controllen = sizeof(ubuf);
> >> - ubufs = nvq->ubufs;
> >> - kref_get(&ubufs->kref);
> >> - }
> >> + vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
> >> + ubuf->callback = vhost_zerocopy_callback;
> >> + ubuf->ctx = nvq->ubufs;
> >> + ubuf->desc = nvq->upend_idx;
> >> + msg.msg_control = ubuf;
> >> + msg.msg_controllen = sizeof(ubuf);
> >> + ubufs = nvq->ubufs;
> >> + kref_get(&ubufs->kref);
> >> nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
> >> - } else
> >> + } else {
> >> msg.msg_control = NULL;
> >> + ubufs = NULL;
> >> + }
> >> /* TODO: Check specific error and bomb out unless ENOBUFS? */
> >> err = sock->ops->sendmsg(NULL, sock, &msg, len);
> >> if (unlikely(err < 0)) {
> >> if (zcopy_used) {
> >> - if (ubufs)
> >> - vhost_net_ubuf_put(ubufs);
> >> + vhost_net_ubuf_put(ubufs);
> >> nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
> >> % UIO_MAXIOV;
> >> }
> >> --
> >> 1.7.1
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
2013-09-23 7:16 ` Michael S. Tsirkin
@ 2013-09-26 4:30 ` Jason Wang
2013-09-29 9:36 ` Jason Wang
0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2013-09-26 4:30 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, kvm, virtualization
On 09/23/2013 03:16 PM, Michael S. Tsirkin wrote:
> On Thu, Sep 05, 2013 at 10:54:44AM +0800, Jason Wang wrote:
>> > On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote:
>>> > > On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
>>>> > >> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
>>>> > >> upend_idx != done_idx we still set zcopy_used to true and rollback this choice
>>>> > >> later. This could be avoided by determining zerocopy once by checking all
>>>> > >> conditions at one time before.
>>>> > >>
>>>> > >> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> > >> ---
>>>> > >> drivers/vhost/net.c | 47 ++++++++++++++++++++---------------------------
>>>> > >> 1 files changed, 20 insertions(+), 27 deletions(-)
>>>> > >>
>>>> > >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>> > >> index 8a6dd0d..3f89dea 100644
>>>> > >> --- a/drivers/vhost/net.c
>>>> > >> +++ b/drivers/vhost/net.c
>>>> > >> @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
>>>> > >> iov_length(nvq->hdr, s), hdr_size);
>>>> > >> break;
>>>> > >> }
>>>> > >> - zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
>>>> > >> - nvq->upend_idx != nvq->done_idx);
>>>> > >> +
>>>> > >> + zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
>>>> > >> + && (nvq->upend_idx + 1) % UIO_MAXIOV !=
>>>> > >> + nvq->done_idx
>>> > > Thinking about this, this looks strange.
>>> > > The original idea was that once we start doing zcopy, we keep
>>> > > using the heads ring even for short packets until no zcopy is outstanding.
>> >
>> > What's the reason for keep using the heads ring?
> To keep completions in order.
Ok, I will do some test to see the impact.
>>> > >
>>> > > What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
>>> > > here?
>> >
>> > Because we initialize both upend_idx and done_idx to zero, so upend_idx
>> > != done_idx could not be used to check whether or not the heads ring
>> > were full.
> But what does ring full have to do with zerocopy use?
>
It was used to forbid zerocopy when the heads ring is full, but since we
have the pending limit now, it could be removed.
* Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
2013-09-26 4:30 ` Jason Wang
@ 2013-09-29 9:36 ` Jason Wang
0 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2013-09-29 9:36 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, kvm, virtualization
On 09/26/2013 12:30 PM, Jason Wang wrote:
> On 09/23/2013 03:16 PM, Michael S. Tsirkin wrote:
>> > On Thu, Sep 05, 2013 at 10:54:44AM +0800, Jason Wang wrote:
>>>> >> > On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote:
>>>>>> >>> > > On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
>>>>>>>> >>>> > >> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
>>>>>>>> >>>> > >> upend_idx != done_idx we still set zcopy_used to true and rollback this choice
>>>>>>>> >>>> > >> later. This could be avoided by determining zerocopy once by checking all
>>>>>>>> >>>> > >> conditions at one time before.
>>>>>>>> >>>> > >>
>>>>>>>> >>>> > >> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>>>>> >>>> > >> ---
>>>>>>>> >>>> > >> drivers/vhost/net.c | 47 ++++++++++++++++++++---------------------------
>>>>>>>> >>>> > >> 1 files changed, 20 insertions(+), 27 deletions(-)
>>>>>>>> >>>> > >>
>>>>>>>> >>>> > >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>>>>>> >>>> > >> index 8a6dd0d..3f89dea 100644
>>>>>>>> >>>> > >> --- a/drivers/vhost/net.c
>>>>>>>> >>>> > >> +++ b/drivers/vhost/net.c
>>>>>>>> >>>> > >> @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
>>>>>>>> >>>> > >> iov_length(nvq->hdr, s), hdr_size);
>>>>>>>> >>>> > >> break;
>>>>>>>> >>>> > >> }
>>>>>>>> >>>> > >> - zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
>>>>>>>> >>>> > >> - nvq->upend_idx != nvq->done_idx);
>>>>>>>> >>>> > >> +
>>>>>>>> >>>> > >> + zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
>>>>>>>> >>>> > >> + && (nvq->upend_idx + 1) % UIO_MAXIOV !=
>>>>>>>> >>>> > >> + nvq->done_idx
>>>>>> >>> > > Thinking about this, this looks strange.
>>>>>> >>> > > The original idea was that once we start doing zcopy, we keep
>>>>>> >>> > > using the heads ring even for short packets until no zcopy is outstanding.
>>>> >> >
>>>> >> > What's the reason for keep using the heads ring?
>> > To keep completions in order.
> Ok, I will do some test to see the impact.
Out-of-order completion happens when switching between zerocopy and
non-zerocopy. I tested this with two sessions of netperf in burst mode, one
with 1-byte TCP_RR and another with 512-byte TCP_RR. There was no
difference with the patch applied.