[PATCH net-next V3 1/3] vhost: better detection of available buffers

virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next V3 1/3] vhost: better detection of available buffers
       [not found] <1483075251-6889-1-git-send-email-jasowang@redhat.com>
@ 2016-12-30  5:20 ` Jason Wang
  2016-12-30  5:20 ` [PATCH net-next V3 2/3] vhost_net: tx batching Jason Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Jason Wang @ 2016-12-30  5:20 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: wexu

This patch tries to do several tweaks on vhost_vq_avail_empty() for a
better performance:

- check cached avail index first which could avoid userspace memory access.
- using unlikely() for the failure of userspace access
- check vq->last_avail_idx instead of cached avail index as the last
  step.

This patch is need for batching supports which needs to peek whether
or not there's still available buffers in the ring.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d643260..9f11838 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2241,11 +2241,15 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	__virtio16 avail_idx;
 	int r;
 
+	if (vq->avail_idx != vq->last_avail_idx)
+		return false;
+
 	r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
-	if (r)
+	if (unlikely(r))
 		return false;
+	vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
 
-	return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx;
+	return vq->avail_idx == vq->last_avail_idx;
 }
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next V3 2/3] vhost_net: tx batching
       [not found] <1483075251-6889-1-git-send-email-jasowang@redhat.com>
  2016-12-30  5:20 ` [PATCH net-next V3 1/3] vhost: better detection of available buffers Jason Wang
@ 2016-12-30  5:20 ` Jason Wang
  2016-12-30  5:20 ` [PATCH net-next V3 3/3] tun: rx batching Jason Wang
       [not found] ` <1483075251-6889-4-git-send-email-jasowang@redhat.com>
  3 siblings, 0 replies; 7+ messages in thread
From: Jason Wang @ 2016-12-30  5:20 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: wexu

This patch tries to utilize tuntap rx batching by peeking the tx
virtqueue during transmission, if there's more available buffers in
the virtqueue, set MSG_MORE flag for a hint for backend (e.g tuntap)
to batch the packets.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc3465..c42e9c3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -351,6 +351,15 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 	return r;
 }
 
+static bool vhost_exceeds_maxpend(struct vhost_net *net)
+{
+	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+	struct vhost_virtqueue *vq = &nvq->vq;
+
+	return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
+		== nvq->done_idx;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -394,8 +403,7 @@ static void handle_tx(struct vhost_net *net)
 		/* If more outstanding DMAs, queue the work.
 		 * Handle upend_idx wrap around
 		 */
-		if (unlikely((nvq->upend_idx + vq->num - VHOST_MAX_PEND)
-			      % UIO_MAXIOV == nvq->done_idx))
+		if (unlikely(vhost_exceeds_maxpend(net)))
 			break;
 
 		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
@@ -454,6 +462,16 @@ static void handle_tx(struct vhost_net *net)
 			msg.msg_control = NULL;
 			ubufs = NULL;
 		}
+
+		total_len += len;
+		if (total_len < VHOST_NET_WEIGHT &&
+		    !vhost_vq_avail_empty(&net->dev, vq) &&
+		    likely(!vhost_exceeds_maxpend(net))) {
+			msg.msg_flags |= MSG_MORE;
+		} else {
+			msg.msg_flags &= ~MSG_MORE;
+		}
+
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
@@ -472,7 +490,6 @@ static void handle_tx(struct vhost_net *net)
 			vhost_add_used_and_signal(&net->dev, vq, head, 0);
 		else
 			vhost_zerocopy_signal_used(net, vq);
-		total_len += len;
 		vhost_net_tx_packet(net);
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next V3 3/3] tun: rx batching
       [not found] <1483075251-6889-1-git-send-email-jasowang@redhat.com>
  2016-12-30  5:20 ` [PATCH net-next V3 1/3] vhost: better detection of available buffers Jason Wang
  2016-12-30  5:20 ` [PATCH net-next V3 2/3] vhost_net: tx batching Jason Wang
@ 2016-12-30  5:20 ` Jason Wang
       [not found] ` <1483075251-6889-4-git-send-email-jasowang@redhat.com>
  3 siblings, 0 replies; 7+ messages in thread
From: Jason Wang @ 2016-12-30  5:20 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: wexu

We can only process 1 packet at one time during sendmsg(). This often
lead bad cache utilization under heavy load. So this patch tries to do
some batching during rx before submitting them to host network
stack. This is done through accepting MSG_MORE as a hint from
sendmsg() caller, if it was set, batch the packet temporarily in a
linked list and submit them all once MSG_MORE were cleared.

Tests were done by pktgen (burst=128) in guest over mlx4(noqueue) on host:

                  Mpps  -+%
    rx_batched=0  0.90  +0%
    rx_batched=4  0.97  +7.8%
    rx_batched=8  0.97  +7.8%
    rx_batched=16 0.98  +8.9%
    rx_batched=32 1.03  +14.4%
    rx_batched=48 1.09  +21.1%
    rx_batched=64 1.02  +13.3%

The maximum number of batched packets were specified through a module
parameter.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 50 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index cd8e02c..a268ed9 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -75,6 +75,10 @@
 
 #include <linux/uaccess.h>
 
+static int rx_batched;
+module_param(rx_batched, int, 0444);
+MODULE_PARM_DESC(rx_batched, "Number of packets batched in rx");
+
 /* Uncomment to enable debugging */
 /* #define TUN_DEBUG 1 */
 
@@ -522,6 +526,7 @@ static void tun_queue_purge(struct tun_file *tfile)
 	while ((skb = skb_array_consume(&tfile->tx_array)) != NULL)
 		kfree_skb(skb);
 
+	skb_queue_purge(&tfile->sk.sk_write_queue);
 	skb_queue_purge(&tfile->sk.sk_error_queue);
 }
 
@@ -1140,10 +1145,36 @@ static struct sk_buff *tun_alloc_skb(struct tun_file *tfile,
 	return skb;
 }
 
+static void tun_rx_batched(struct tun_file *tfile, struct sk_buff *skb,
+			   int more)
+{
+	struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
+	struct sk_buff_head process_queue;
+	int qlen;
+	bool rcv = false;
+
+	spin_lock(&queue->lock);
+	qlen = skb_queue_len(queue);
+	__skb_queue_tail(queue, skb);
+	if (!more || qlen == rx_batched) {
+		__skb_queue_head_init(&process_queue);
+		skb_queue_splice_tail_init(queue, &process_queue);
+		rcv = true;
+	}
+	spin_unlock(&queue->lock);
+
+	if (rcv) {
+		local_bh_disable();
+		while ((skb = __skb_dequeue(&process_queue)))
+			netif_receive_skb(skb);
+		local_bh_enable();
+	}
+}
+
 /* Get packet from user space buffer */
 static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 			    void *msg_control, struct iov_iter *from,
-			    int noblock)
+			    int noblock, bool more)
 {
 	struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) };
 	struct sk_buff *skb;
@@ -1283,10 +1314,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	skb_probe_transport_header(skb, 0);
 
 	rxhash = skb_get_hash(skb);
+
 #ifndef CONFIG_4KSTACKS
-	local_bh_disable();
-	netif_receive_skb(skb);
-	local_bh_enable();
+	if (!rx_batched) {
+		local_bh_disable();
+		netif_receive_skb(skb);
+		local_bh_enable();
+	} else {
+		tun_rx_batched(tfile, skb, more);
+	}
 #else
 	netif_rx_ni(skb);
 #endif
@@ -1312,7 +1348,8 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (!tun)
 		return -EBADFD;
 
-	result = tun_get_user(tun, tfile, NULL, from, file->f_flags & O_NONBLOCK);
+	result = tun_get_user(tun, tfile, NULL, from,
+			      file->f_flags & O_NONBLOCK, false);
 
 	tun_put(tun);
 	return result;
@@ -1570,7 +1607,8 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 		return -EBADFD;
 
 	ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
-			   m->msg_flags & MSG_DONTWAIT);
+			   m->msg_flags & MSG_DONTWAIT,
+			   m->msg_flags & MSG_MORE);
 	tun_put(tun);
 	return ret;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next V3 3/3] tun: rx batching
       [not found] ` <1483075251-6889-4-git-send-email-jasowang@redhat.com>
@ 2016-12-31 17:31   ` David Miller
  2017-01-03  3:12     ` Jason Wang
  2016-12-31 21:03   ` Stephen Hemminger
  1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2016-12-31 17:31 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, virtualization, wexu, kvm, mst

From: Jason Wang <jasowang@redhat.com>
Date: Fri, 30 Dec 2016 13:20:51 +0800

> @@ -1283,10 +1314,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>  	skb_probe_transport_header(skb, 0);
>  
>  	rxhash = skb_get_hash(skb);
> +
>  #ifndef CONFIG_4KSTACKS
> -	local_bh_disable();
> -	netif_receive_skb(skb);
> -	local_bh_enable();
> +	if (!rx_batched) {
> +		local_bh_disable();
> +		netif_receive_skb(skb);
> +		local_bh_enable();
> +	} else {
> +		tun_rx_batched(tfile, skb, more);
> +	}
>  #else
>  	netif_rx_ni(skb);
>  #endif

If rx_batched has been set, and we are talking to clients not using
this new MSG_MORE facility (or such clients don't have multiple TX
packets to send to you, thus MSG_MORE is often clear), you are doing a
lot more work per-packet than the existing code.

You take the queue lock, you test state, you splice into a local queue
on the stack, then you walk that local stack queue to submit just one
SKB to netif_receive_skb().

I think you want to streamline this sequence in such cases so that the
cost before and after is similar if not equivalent.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next V3 3/3] tun: rx batching
       [not found] ` <1483075251-6889-4-git-send-email-jasowang@redhat.com>
  2016-12-31 17:31   ` David Miller
@ 2016-12-31 21:03   ` Stephen Hemminger
  2017-01-03  3:18     ` Jason Wang
  1 sibling, 1 reply; 7+ messages in thread
From: Stephen Hemminger @ 2016-12-31 21:03 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, virtualization, wexu, kvm, mst

On Fri, 30 Dec 2016 13:20:51 +0800
Jason Wang <jasowang@redhat.com> wrote:

> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index cd8e02c..a268ed9 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -75,6 +75,10 @@
>  
>  #include <linux/uaccess.h>
>  
> +static int rx_batched;
> +module_param(rx_batched, int, 0444);
> +MODULE_PARM_DESC(rx_batched, "Number of packets batched in rx");
> +
>  /* Uncomment to enable debugging */

I like the concept or rx batching. But controlling it via a module parameter
is one of the worst API choices.  Ethtool would be better to use because that is
how other network devices control batching.

If you do ethtool, you could even extend it to have an number of packets
and max latency value.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next V3 3/3] tun: rx batching
  2016-12-31 17:31   ` David Miller
@ 2017-01-03  3:12     ` Jason Wang
  0 siblings, 0 replies; 7+ messages in thread
From: Jason Wang @ 2017-01-03  3:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, virtualization, wexu, kvm, mst



On 2017年01月01日 01:31, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Fri, 30 Dec 2016 13:20:51 +0800
>
>> @@ -1283,10 +1314,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>>   	skb_probe_transport_header(skb, 0);
>>   
>>   	rxhash = skb_get_hash(skb);
>> +
>>   #ifndef CONFIG_4KSTACKS
>> -	local_bh_disable();
>> -	netif_receive_skb(skb);
>> -	local_bh_enable();
>> +	if (!rx_batched) {
>> +		local_bh_disable();
>> +		netif_receive_skb(skb);
>> +		local_bh_enable();
>> +	} else {
>> +		tun_rx_batched(tfile, skb, more);
>> +	}
>>   #else
>>   	netif_rx_ni(skb);
>>   #endif
> If rx_batched has been set, and we are talking to clients not using
> this new MSG_MORE facility (or such clients don't have multiple TX
> packets to send to you, thus MSG_MORE is often clear), you are doing a
> lot more work per-packet than the existing code.
>
> You take the queue lock, you test state, you splice into a local queue
> on the stack, then you walk that local stack queue to submit just one
> SKB to netif_receive_skb().
>
> I think you want to streamline this sequence in such cases so that the
> cost before and after is similar if not equivalent.

Yes, so I will do a skb_queue_empty() check if !MSG_MORE and call 
netif_receive_skb() immediately in this case. This can save the wasted 
efforts.

Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next V3 3/3] tun: rx batching
  2016-12-31 21:03   ` Stephen Hemminger
@ 2017-01-03  3:18     ` Jason Wang
  0 siblings, 0 replies; 7+ messages in thread
From: Jason Wang @ 2017-01-03  3:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, virtualization, wexu, kvm, mst



On 2017年01月01日 05:03, Stephen Hemminger wrote:
> On Fri, 30 Dec 2016 13:20:51 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index cd8e02c..a268ed9 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -75,6 +75,10 @@
>>   
>>   #include <linux/uaccess.h>
>>   
>> +static int rx_batched;
>> +module_param(rx_batched, int, 0444);
>> +MODULE_PARM_DESC(rx_batched, "Number of packets batched in rx");
>> +
>>   /* Uncomment to enable debugging */
> I like the concept or rx batching. But controlling it via a module parameter
> is one of the worst API choices.  Ethtool would be better to use because that is
> how other network devices control batching.
>
> If you do ethtool, you could even extend it to have an number of packets
> and max latency value.

Right, this is better (I believe you mean rx-frames). For rx-usecs, we 
could do it on top in the future.

Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-01-03  3:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1483075251-6889-1-git-send-email-jasowang@redhat.com>
2016-12-30  5:20 ` [PATCH net-next V3 1/3] vhost: better detection of available buffers Jason Wang
2016-12-30  5:20 ` [PATCH net-next V3 2/3] vhost_net: tx batching Jason Wang
2016-12-30  5:20 ` [PATCH net-next V3 3/3] tun: rx batching Jason Wang
     [not found] ` <1483075251-6889-4-git-send-email-jasowang@redhat.com>
2016-12-31 17:31   ` David Miller
2017-01-03  3:12     ` Jason Wang
2016-12-31 21:03   ` Stephen Hemminger
2017-01-03  3:18     ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).