* [PATCH net-next V2 0/3] vhost net tx batching
@ 2016-12-28  8:09 Jason Wang
  2016-12-28  8:09 ` [PATCH net-next V2 1/3] vhost: better detection of available buffers Jason Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jason Wang @ 2016-12-28  8:09 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel
Hi:
This series tries to implement tx batching support for vhost. This was
done by using MSG_MORE as a hint for under layer socket. The backend
(e.g tap) can then batch the packets temporarily in a list and
submit it all once the number of bacthed exceeds a limitation.
Tests shows obvious improvement on guest pktgen over over
mlx4(noqueue) on host:
                    Mpps  -+%
        rx_batched=0  0.90  +0%
        rx_batched=4  0.97  +7.8%
        rx_batched=8  0.97  +7.8%
        rx_batched=16 0.98  +8.9%
        rx_batched=32 1.03  +14.4%
        rx_batched=48 1.09  +21.1%
        rx_batched=64 1.02  +13.3%
Changes from V1:
- drop NAPI handler since we don't use NAPI now
- fix the issues that may exceeds max pending of zerocopy
- more improvement on available buffer detection
- move the limitation of batched pacekts from vhost to tuntap
Please review.
Thanks
Jason Wang (3):
  vhost: better detection of available buffers
  vhost_net: tx batching
  tun: rx batching
 drivers/net/tun.c     | 66 ++++++++++++++++++++++++++++++++++++++++++++-------
 drivers/vhost/net.c   | 23 +++++++++++++++---
 drivers/vhost/vhost.c |  8 +++++--
 3 files changed, 84 insertions(+), 13 deletions(-)
-- 
2.7.4
^ permalink raw reply	[flat|nested] 11+ messages in thread* [PATCH net-next V2 1/3] vhost: better detection of available buffers 2016-12-28 8:09 [PATCH net-next V2 0/3] vhost net tx batching Jason Wang @ 2016-12-28 8:09 ` Jason Wang 2017-01-03 13:08 ` Stefan Hajnoczi 2016-12-28 8:09 ` [PATCH net-next V2 2/3] vhost_net: tx batching Jason Wang 2016-12-28 8:09 ` [PATCH net-next V2 3/3] tun: rx batching Jason Wang 2 siblings, 1 reply; 11+ messages in thread From: Jason Wang @ 2016-12-28 8:09 UTC (permalink / raw) To: mst, kvm, virtualization, netdev, linux-kernel This patch tries to do several tweaks on vhost_vq_avail_empty() for a better performance: - check cached avail index first which could avoid userspace memory access. - using unlikely() for the failure of userspace access - check vq->last_avail_idx instead of cached avail index as the last step. This patch is need for batching supports which needs to peek whether or not there's still available buffers in the ring. Signed-off-by: Jason Wang <jasowang@redhat.com> --- drivers/vhost/vhost.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index d643260..9f11838 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2241,11 +2241,15 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq) __virtio16 avail_idx; int r; + if (vq->avail_idx != vq->last_avail_idx) + return false; + r = vhost_get_user(vq, avail_idx, &vq->avail->idx); - if (r) + if (unlikely(r)) return false; + vq->avail_idx = vhost16_to_cpu(vq, avail_idx); - return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx; + return vq->avail_idx == vq->last_avail_idx; } EXPORT_SYMBOL_GPL(vhost_vq_avail_empty); -- 2.7.4 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 1/3] vhost: better detection of available buffers 2016-12-28 8:09 ` [PATCH net-next V2 1/3] vhost: better detection of available buffers Jason Wang @ 2017-01-03 13:08 ` Stefan Hajnoczi 0 siblings, 0 replies; 11+ messages in thread From: Stefan Hajnoczi @ 2017-01-03 13:08 UTC (permalink / raw) To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, mst [-- Attachment #1.1: Type: text/plain, Size: 719 bytes --] On Wed, Dec 28, 2016 at 04:09:29PM +0800, Jason Wang wrote: > This patch tries to do several tweaks on vhost_vq_avail_empty() for a > better performance: > > - check cached avail index first which could avoid userspace memory access. > - using unlikely() for the failure of userspace access > - check vq->last_avail_idx instead of cached avail index as the last > step. > > This patch is need for batching supports which needs to peek whether > or not there's still available buffers in the ring. > > Signed-off-by: Jason Wang <jasowang@redhat.com> > --- > drivers/vhost/vhost.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] [-- Attachment #2: Type: text/plain, Size: 183 bytes --] _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net-next V2 2/3] vhost_net: tx batching 2016-12-28 8:09 [PATCH net-next V2 0/3] vhost net tx batching Jason Wang 2016-12-28 8:09 ` [PATCH net-next V2 1/3] vhost: better detection of available buffers Jason Wang @ 2016-12-28 8:09 ` Jason Wang 2017-01-03 13:16 ` Stefan Hajnoczi 2016-12-28 8:09 ` [PATCH net-next V2 3/3] tun: rx batching Jason Wang 2 siblings, 1 reply; 11+ messages in thread From: Jason Wang @ 2016-12-28 8:09 UTC (permalink / raw) To: mst, kvm, virtualization, netdev, linux-kernel This patch tries to utilize tuntap rx batching by peeking the tx virtqueue during transmission, if there's more available buffers in the virtqueue, set MSG_MORE flag for a hint for backend (e.g tuntap) to batch the packets. Signed-off-by: Jason Wang <jasowang@redhat.com> --- drivers/vhost/net.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 5dc3465..c42e9c3 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -351,6 +351,15 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net, return r; } +static bool vhost_exceeds_maxpend(struct vhost_net *net) +{ + struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; + struct vhost_virtqueue *vq = &nvq->vq; + + return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV + == nvq->done_idx; +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_tx(struct vhost_net *net) @@ -394,8 +403,7 @@ static void handle_tx(struct vhost_net *net) /* If more outstanding DMAs, queue the work. * Handle upend_idx wrap around */ - if (unlikely((nvq->upend_idx + vq->num - VHOST_MAX_PEND) - % UIO_MAXIOV == nvq->done_idx)) + if (unlikely(vhost_exceeds_maxpend(net))) break; head = vhost_net_tx_get_vq_desc(net, vq, vq->iov, @@ -454,6 +462,16 @@ static void handle_tx(struct vhost_net *net) msg.msg_control = NULL; ubufs = NULL; } + + total_len += len; + if (total_len < VHOST_NET_WEIGHT && + !vhost_vq_avail_empty(&net->dev, vq) && + likely(!vhost_exceeds_maxpend(net))) { + msg.msg_flags |= MSG_MORE; + } else { + msg.msg_flags &= ~MSG_MORE; + } + /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock->ops->sendmsg(sock, &msg, len); if (unlikely(err < 0)) { @@ -472,7 +490,6 @@ static void handle_tx(struct vhost_net *net) vhost_add_used_and_signal(&net->dev, vq, head, 0); else vhost_zerocopy_signal_used(net, vq); - total_len += len; vhost_net_tx_packet(net); if (unlikely(total_len >= VHOST_NET_WEIGHT)) { vhost_poll_queue(&vq->poll); -- 2.7.4 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 2/3] vhost_net: tx batching 2016-12-28 8:09 ` [PATCH net-next V2 2/3] vhost_net: tx batching Jason Wang @ 2017-01-03 13:16 ` Stefan Hajnoczi 0 siblings, 0 replies; 11+ messages in thread From: Stefan Hajnoczi @ 2017-01-03 13:16 UTC (permalink / raw) To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, mst [-- Attachment #1.1: Type: text/plain, Size: 519 bytes --] On Wed, Dec 28, 2016 at 04:09:30PM +0800, Jason Wang wrote: > This patch tries to utilize tuntap rx batching by peeking the tx > virtqueue during transmission, if there's more available buffers in > the virtqueue, set MSG_MORE flag for a hint for backend (e.g tuntap) > to batch the packets. > > Signed-off-by: Jason Wang <jasowang@redhat.com> > --- > drivers/vhost/net.c | 23 ++++++++++++++++++++--- > 1 file changed, 20 insertions(+), 3 deletions(-) Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] [-- Attachment #2: Type: text/plain, Size: 183 bytes --] _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net-next V2 3/3] tun: rx batching 2016-12-28 8:09 [PATCH net-next V2 0/3] vhost net tx batching Jason Wang 2016-12-28 8:09 ` [PATCH net-next V2 1/3] vhost: better detection of available buffers Jason Wang 2016-12-28 8:09 ` [PATCH net-next V2 2/3] vhost_net: tx batching Jason Wang @ 2016-12-28 8:09 ` Jason Wang 2016-12-29 16:35 ` David Miller 2017-01-03 13:33 ` Stefan Hajnoczi 2 siblings, 2 replies; 11+ messages in thread From: Jason Wang @ 2016-12-28 8:09 UTC (permalink / raw) To: mst, kvm, virtualization, netdev, linux-kernel; +Cc: Jason Wang We can only process 1 packet at one time during sendmsg(). This often lead bad cache utilization under heavy load. So this patch tries to do some batching during rx before submitting them to host network stack. This is done through accepting MSG_MORE as a hint from sendmsg() caller, if it was set, batch the packet temporarily in a linked list and submit them all once MSG_MORE were cleared. Tests were done by pktgen (burst=128) in guest over mlx4(noqueue) on host: Mpps -+% rx_batched=0 0.90 +0% rx_batched=4 0.97 +7.8% rx_batched=8 0.97 +7.8% rx_batched=16 0.98 +8.9% rx_batched=32 1.03 +14.4% rx_batched=48 1.09 +21.1% rx_batched=64 1.02 +13.3% The maximum number of batched packets were specified through a module parameter. Signed-off-by: Jason Wang <jasowang@redhat.com> --- drivers/net/tun.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 58 insertions(+), 8 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index cd8e02c..6ea5d6d 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -75,6 +75,10 @@ #include <linux/uaccess.h> +static int rx_batched; +module_param(rx_batched, int, 0444); +MODULE_PARM_DESC(rx_batched, "Number of packets batched in rx"); + /* Uncomment to enable debugging */ /* #define TUN_DEBUG 1 */ @@ -522,6 +526,7 @@ static void tun_queue_purge(struct tun_file *tfile) while ((skb = skb_array_consume(&tfile->tx_array)) != NULL) kfree_skb(skb); + skb_queue_purge(&tfile->sk.sk_write_queue); skb_queue_purge(&tfile->sk.sk_error_queue); } @@ -1140,10 +1145,44 @@ static struct sk_buff *tun_alloc_skb(struct tun_file *tfile, return skb; } +static int tun_rx_batched(struct tun_file *tfile, struct sk_buff *skb, + int more) +{ + struct sk_buff_head *queue = &tfile->sk.sk_write_queue; + struct sk_buff_head process_queue; + int qlen; + bool rcv = false; + + spin_lock(&queue->lock); + qlen = skb_queue_len(queue); + if (qlen > rx_batched) + goto drop; + __skb_queue_tail(queue, skb); + if (!more || qlen + 1 > rx_batched) { + __skb_queue_head_init(&process_queue); + skb_queue_splice_tail_init(queue, &process_queue); + rcv = true; + } + spin_unlock(&queue->lock); + + if (rcv) { + local_bh_disable(); + while ((skb = __skb_dequeue(&process_queue))) + netif_receive_skb(skb); + local_bh_enable(); + } + + return 0; +drop: + spin_unlock(&queue->lock); + kfree_skb(skb); + return -EFAULT; +} + /* Get packet from user space buffer */ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, void *msg_control, struct iov_iter *from, - int noblock) + int noblock, bool more) { struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) }; struct sk_buff *skb; @@ -1283,18 +1322,27 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, skb_probe_transport_header(skb, 0); rxhash = skb_get_hash(skb); + #ifndef CONFIG_4KSTACKS - local_bh_disable(); - netif_receive_skb(skb); - local_bh_enable(); + if (!rx_batched) { + local_bh_disable(); + netif_receive_skb(skb); + local_bh_enable(); + } else { + err = tun_rx_batched(tfile, skb, more); + } #else netif_rx_ni(skb); #endif stats = get_cpu_ptr(tun->pcpu_stats); u64_stats_update_begin(&stats->syncp); - stats->rx_packets++; - stats->rx_bytes += len; + if (err) { + stats->rx_dropped++; + } else { + stats->rx_packets++; + stats->rx_bytes += len; + } u64_stats_update_end(&stats->syncp); put_cpu_ptr(stats); @@ -1312,7 +1360,8 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from) if (!tun) return -EBADFD; - result = tun_get_user(tun, tfile, NULL, from, file->f_flags & O_NONBLOCK); + result = tun_get_user(tun, tfile, NULL, from, + file->f_flags & O_NONBLOCK, false); tun_put(tun); return result; @@ -1570,7 +1619,8 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) return -EBADFD; ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter, - m->msg_flags & MSG_DONTWAIT); + m->msg_flags & MSG_DONTWAIT, + m->msg_flags & MSG_MORE); tun_put(tun); return ret; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 3/3] tun: rx batching 2016-12-28 8:09 ` [PATCH net-next V2 3/3] tun: rx batching Jason Wang @ 2016-12-29 16:35 ` David Miller 2016-12-30 5:14 ` Jason Wang 2017-01-03 13:33 ` Stefan Hajnoczi 1 sibling, 1 reply; 11+ messages in thread From: David Miller @ 2016-12-29 16:35 UTC (permalink / raw) To: jasowang; +Cc: netdev, virtualization, linux-kernel, kvm, mst From: Jason Wang <jasowang@redhat.com> Date: Wed, 28 Dec 2016 16:09:31 +0800 > + spin_lock(&queue->lock); > + qlen = skb_queue_len(queue); > + if (qlen > rx_batched) > + goto drop; > + __skb_queue_tail(queue, skb); > + if (!more || qlen + 1 > rx_batched) { > + __skb_queue_head_init(&process_queue); > + skb_queue_splice_tail_init(queue, &process_queue); > + rcv = true; > + } > + spin_unlock(&queue->lock); Since you always clear the 'queue' when you insert the skb that hits the limit, I don't see how the "goto drop" path can be possibly taken. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 3/3] tun: rx batching 2016-12-29 16:35 ` David Miller @ 2016-12-30 5:14 ` Jason Wang 0 siblings, 0 replies; 11+ messages in thread From: Jason Wang @ 2016-12-30 5:14 UTC (permalink / raw) To: David Miller; +Cc: mst, kvm, virtualization, netdev, linux-kernel On 2016年12月30日 00:35, David Miller wrote: > From: Jason Wang <jasowang@redhat.com> > Date: Wed, 28 Dec 2016 16:09:31 +0800 > >> + spin_lock(&queue->lock); >> + qlen = skb_queue_len(queue); >> + if (qlen > rx_batched) >> + goto drop; >> + __skb_queue_tail(queue, skb); >> + if (!more || qlen + 1 > rx_batched) { >> + __skb_queue_head_init(&process_queue); >> + skb_queue_splice_tail_init(queue, &process_queue); >> + rcv = true; >> + } >> + spin_unlock(&queue->lock); > Since you always clear the 'queue' when you insert the skb that hits > the limit, I don't see how the "goto drop" path can be possibly taken. True, will fix this. Thanks ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 3/3] tun: rx batching 2016-12-28 8:09 ` [PATCH net-next V2 3/3] tun: rx batching Jason Wang 2016-12-29 16:35 ` David Miller @ 2017-01-03 13:33 ` Stefan Hajnoczi 2017-01-04 3:03 ` Jason Wang 1 sibling, 1 reply; 11+ messages in thread From: Stefan Hajnoczi @ 2017-01-03 13:33 UTC (permalink / raw) To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, mst [-- Attachment #1.1: Type: text/plain, Size: 492 bytes --] On Wed, Dec 28, 2016 at 04:09:31PM +0800, Jason Wang wrote: > +static int tun_rx_batched(struct tun_file *tfile, struct sk_buff *skb, > + int more) > +{ > + struct sk_buff_head *queue = &tfile->sk.sk_write_queue; > + struct sk_buff_head process_queue; > + int qlen; > + bool rcv = false; > + > + spin_lock(&queue->lock); Should this be spin_lock_bh()? Below and in tun_get_user() there are explicit local_bh_disable() calls so I guess BHs can interrupt us here and this would deadlock. [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] [-- Attachment #2: Type: text/plain, Size: 183 bytes --] _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 3/3] tun: rx batching 2017-01-03 13:33 ` Stefan Hajnoczi @ 2017-01-04 3:03 ` Jason Wang 2017-01-05 9:27 ` Stefan Hajnoczi 0 siblings, 1 reply; 11+ messages in thread From: Jason Wang @ 2017-01-04 3:03 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: netdev, virtualization, linux-kernel, kvm, mst On 2017年01月03日 21:33, Stefan Hajnoczi wrote: > On Wed, Dec 28, 2016 at 04:09:31PM +0800, Jason Wang wrote: >> +static int tun_rx_batched(struct tun_file *tfile, struct sk_buff *skb, >> + int more) >> +{ >> + struct sk_buff_head *queue = &tfile->sk.sk_write_queue; >> + struct sk_buff_head process_queue; >> + int qlen; >> + bool rcv = false; >> + >> + spin_lock(&queue->lock); > Should this be spin_lock_bh()? Below and in tun_get_user() there are > explicit local_bh_disable() calls so I guess BHs can interrupt us here > and this would deadlock. sk_write_queue were accessed only in this function which runs under process context, so no need for spin_lock_bh() here. _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 3/3] tun: rx batching 2017-01-04 3:03 ` Jason Wang @ 2017-01-05 9:27 ` Stefan Hajnoczi 0 siblings, 0 replies; 11+ messages in thread From: Stefan Hajnoczi @ 2017-01-05 9:27 UTC (permalink / raw) To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, mst [-- Attachment #1.1: Type: text/plain, Size: 830 bytes --] On Wed, Jan 04, 2017 at 11:03:32AM +0800, Jason Wang wrote: > On 2017年01月03日 21:33, Stefan Hajnoczi wrote: > > On Wed, Dec 28, 2016 at 04:09:31PM +0800, Jason Wang wrote: > > > +static int tun_rx_batched(struct tun_file *tfile, struct sk_buff *skb, > > > + int more) > > > +{ > > > + struct sk_buff_head *queue = &tfile->sk.sk_write_queue; > > > + struct sk_buff_head process_queue; > > > + int qlen; > > > + bool rcv = false; > > > + > > > + spin_lock(&queue->lock); > > Should this be spin_lock_bh()? Below and in tun_get_user() there are > > explicit local_bh_disable() calls so I guess BHs can interrupt us here > > and this would deadlock. > > sk_write_queue were accessed only in this function which runs under process > context, so no need for spin_lock_bh() here. I see, thanks! Stefan [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] [-- Attachment #2: Type: text/plain, Size: 183 bytes --] _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2017-01-05 9:27 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-12-28 8:09 [PATCH net-next V2 0/3] vhost net tx batching Jason Wang 2016-12-28 8:09 ` [PATCH net-next V2 1/3] vhost: better detection of available buffers Jason Wang 2017-01-03 13:08 ` Stefan Hajnoczi 2016-12-28 8:09 ` [PATCH net-next V2 2/3] vhost_net: tx batching Jason Wang 2017-01-03 13:16 ` Stefan Hajnoczi 2016-12-28 8:09 ` [PATCH net-next V2 3/3] tun: rx batching Jason Wang 2016-12-29 16:35 ` David Miller 2016-12-30 5:14 ` Jason Wang 2017-01-03 13:33 ` Stefan Hajnoczi 2017-01-04 3:03 ` Jason Wang 2017-01-05 9:27 ` Stefan Hajnoczi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).