* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-09-28 0:25 [PATCH net-next] vhost_net: do not stall on zerocopy depletion Willem de Bruijn
@ 2017-09-28 0:33 ` Willem de Bruijn
2017-09-28 7:41 ` Jason Wang
` (4 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Willem de Bruijn @ 2017-09-28 0:33 UTC (permalink / raw)
To: Network Development
Cc: David Miller, Michael S. Tsirkin, Jason Wang, Koichiro Den,
virtualization, Willem de Bruijn
On Wed, Sep 27, 2017 at 8:25 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Vhost-net has a hard limit on the number of zerocopy skbs in flight.
> When reached, transmission stalls. Stalls cause latency, as well as
> head-of-line blocking of other flows that do not use zerocopy.
>
> Instead of stalling, revert to copy-based transmission.
>
> Tested by sending two udp flows from guest to host, one with payload
> of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
> large flow is redirected to a netem instance with 1MBps rate limit
> and deep 1000 entry queue.
>
> modprobe ifb
> ip link set dev ifb0 up
> tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit
>
> tc qdisc add dev tap0 ingress
> tc filter add dev tap0 parent ffff: protocol ip \
> u32 match ip dport 8000 0xffff \
> action mirred egress redirect dev ifb0
>
> Before the delay, both flows process around 80K pps. With the delay,
> before this patch, both process around 400. After this patch, the
> large flow is still rate limited, while the small reverts to its
> original rate. See also discussion in the first link, below.
>
> The limit in vhost_exceeds_maxpend must be carefully chosen. When
> vq->num >> 1, the flows remain correlated. This value happens to
> correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller
> fractions and ensure correctness also for much smaller values of
> vq->num, by testing the min() of both explicitly. See also the
> discussion in the second link below.
>
> Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
From the same discussion thread: it would be good to expose stats
on the number of zerocopy skbs sent and the number completed without
copy.
To test this patch, I also added ethtool stats to tun, extended with
two zerocopy counters. I then had tun override uarg->callback with its
own and update the counters before calling the original callback.
The one useful datapoint I did not get out of that is why skbs revert
to non-zerocopy: whether because of size, vhost_exceeds_maxpend
or vhost_net_tx_select_zcopy. The simplistic implementation, with an
extra indirect function call and without percpu counters, is also not
suitable for submission as is.
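
A minimal sketch of that instrumentation, with hypothetical struct and
counter names (tun_zc_wrap, tx_zcopy_done, tx_zcopy_copied are made up
for illustration; this is not the actual test patch):

    /* Wrap vhost's ubuf_info so tun can count how many zerocopy skbs
     * complete without a copy. The wrapper would be allocated when the
     * skb is built in tun_get_user() and freed on completion.
     */
    struct tun_zc_wrap {
            struct ubuf_info uarg;          /* handed to the skb instead */
            struct ubuf_info *orig;         /* vhost's original ubuf_info */
            struct tun_struct *tun;
    };

    static void tun_zerocopy_callback(struct ubuf_info *uarg, bool success)
    {
            struct tun_zc_wrap *wrap =
                    container_of(uarg, struct tun_zc_wrap, uarg);

            /* hypothetical counters exposed via ethtool -S */
            if (success)
                    atomic64_inc(&wrap->tun->tx_zcopy_done);
            else
                    atomic64_inc(&wrap->tun->tx_zcopy_copied);

            /* forward to vhost so its upend/done accounting still runs */
            wrap->orig->callback(wrap->orig, success);
            kfree(wrap);
    }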
^ permalink raw reply [flat|nested] 13+ messages in thread

* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-09-28 0:25 [PATCH net-next] vhost_net: do not stall on zerocopy depletion Willem de Bruijn
2017-09-28 0:33 ` Willem de Bruijn
@ 2017-09-28 7:41 ` Jason Wang
2017-09-28 16:05 ` Willem de Bruijn
2017-09-29 19:38 ` Michael S. Tsirkin
` (3 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2017-09-28 7:41 UTC (permalink / raw)
To: Willem de Bruijn, netdev
Cc: davem, mst, den, virtualization, Willem de Bruijn
On 2017-09-28 08:25, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Vhost-net has a hard limit on the number of zerocopy skbs in flight.
> When reached, transmission stalls. Stalls cause latency, as well as
> head-of-line blocking of other flows that do not use zerocopy.
>
> Instead of stalling, revert to copy-based transmission.
>
> Tested by sending two udp flows from guest to host, one with payload
> of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
> large flow is redirected to a netem instance with 1MBps rate limit
> and deep 1000 entry queue.
>
> modprobe ifb
> ip link set dev ifb0 up
> tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit
>
> tc qdisc add dev tap0 ingress
> tc filter add dev tap0 parent ffff: protocol ip \
> u32 match ip dport 8000 0xffff \
> action mirred egress redirect dev ifb0
>
> Before the delay, both flows process around 80K pps. With the delay,
> before this patch, both process around 400. After this patch, the
> large flow is still rate limited, while the small reverts to its
> original rate. See also discussion in the first link, below.
>
> The limit in vhost_exceeds_maxpend must be carefully chosen. When
> vq->num >> 1, the flows remain correlated. This value happens to
> correspond to VHOST_MAX_PENDING for vq->num == 256.
Have you tested e.g. vq->num = 512 or 1024?
> Allow smaller
> fractions and ensure correctness also for much smaller values of
> vq->num, by testing the min() of both explicitly. See also the
> discussion in the second link below.
>
> Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
> Link:http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
> drivers/vhost/net.c | 14 ++++----------
> 1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 58585ec8699e..50758602ae9d 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -436,8 +436,8 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
> struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> struct vhost_virtqueue *vq = &nvq->vq;
>
> - return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
> - == nvq->done_idx;
> + return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
> + min(VHOST_MAX_PEND, vq->num >> 2);
> }
>
> /* Expects to be always run from workqueue - which acts as
> @@ -480,12 +480,6 @@ static void handle_tx(struct vhost_net *net)
> if (zcopy)
> vhost_zerocopy_signal_used(net, vq);
>
> - /* If more outstanding DMAs, queue the work.
> - * Handle upend_idx wrap around
> - */
> - if (unlikely(vhost_exceeds_maxpend(net)))
> - break;
> -
> head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
> ARRAY_SIZE(vq->iov),
> &out, &in);
> @@ -509,6 +503,7 @@ static void handle_tx(struct vhost_net *net)
> len = iov_length(vq->iov, out);
> iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
> iov_iter_advance(&msg.msg_iter, hdr_size);
> +
Looks unnecessary. The rest looks good.
> /* Sanity check */
> if (!msg_data_left(&msg)) {
> vq_err(vq, "Unexpected header len for TX: "
> @@ -519,8 +514,7 @@ static void handle_tx(struct vhost_net *net)
> len = msg_data_left(&msg);
>
> zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
> - && (nvq->upend_idx + 1) % UIO_MAXIOV !=
> - nvq->done_idx
> + && !vhost_exceeds_maxpend(net)
> && vhost_net_tx_select_zcopy(net);
>
> /* use msg_control to pass vhost zerocopy ubuf info to skb */
^ permalink raw reply [flat|nested] 13+ messages in thread

* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-09-28 7:41 ` Jason Wang
@ 2017-09-28 16:05 ` Willem de Bruijn
0 siblings, 0 replies; 13+ messages in thread
From: Willem de Bruijn @ 2017-09-28 16:05 UTC (permalink / raw)
To: Jason Wang
Cc: Network Development, David Miller, Michael S. Tsirkin,
Koichiro Den, virtualization, Willem de Bruijn
On Thu, Sep 28, 2017 at 3:41 AM, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2017-09-28 08:25, Willem de Bruijn wrote:
>>
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Vhost-net has a hard limit on the number of zerocopy skbs in flight.
>> When reached, transmission stalls. Stalls cause latency, as well as
>> head-of-line blocking of other flows that do not use zerocopy.
>>
>> Instead of stalling, revert to copy-based transmission.
>>
>> Tested by sending two udp flows from guest to host, one with payload
>> of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
>> large flow is redirected to a netem instance with 1MBps rate limit
>> and deep 1000 entry queue.
>>
>> modprobe ifb
>> ip link set dev ifb0 up
>> tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit
>>
>> tc qdisc add dev tap0 ingress
>> tc filter add dev tap0 parent ffff: protocol ip \
>> u32 match ip dport 8000 0xffff \
>> action mirred egress redirect dev ifb0
>>
>> Before the delay, both flows process around 80K pps. With the delay,
>> before this patch, both process around 400. After this patch, the
>> large flow is still rate limited, while the small reverts to its
>> original rate. See also discussion in the first link, below.
>>
>> The limit in vhost_exceeds_maxpend must be carefully chosen. When
>> vq->num >> 1, the flows remain correlated. This value happens to
>> correspond to VHOST_MAX_PENDING for vq->num == 256.
>
>
> Have you tested e.g. vq->num = 512 or 1024?
I did test with 1024 previously, but let me run that again
with this patch applied.
>
>
>> Allow smaller
>> fractions and ensure correctness also for much smaller values of
>> vq->num, by testing the min() of both explicitly. See also the
>> discussion in the second link below.
>>
>>
>> Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
>> Link:http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>> ---
>> drivers/vhost/net.c | 14 ++++----------
>> 1 file changed, 4 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 58585ec8699e..50758602ae9d 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -436,8 +436,8 @@ static bool vhost_exceeds_maxpend(struct vhost_net
>> *net)
>> struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>> struct vhost_virtqueue *vq = &nvq->vq;
>> - return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
>> - == nvq->done_idx;
>> + return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV
>> >
>> + min(VHOST_MAX_PEND, vq->num >> 2);
>> }
>> /* Expects to be always run from workqueue - which acts as
>> @@ -480,12 +480,6 @@ static void handle_tx(struct vhost_net *net)
>> if (zcopy)
>> vhost_zerocopy_signal_used(net, vq);
>> - /* If more outstanding DMAs, queue the work.
>> - * Handle upend_idx wrap around
>> - */
>> - if (unlikely(vhost_exceeds_maxpend(net)))
>> - break;
>> -
>> head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
>> ARRAY_SIZE(vq->iov),
>> &out, &in);
>> @@ -509,6 +503,7 @@ static void handle_tx(struct vhost_net *net)
>> len = iov_length(vq->iov, out);
>> iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
>> iov_iter_advance(&msg.msg_iter, hdr_size);
>> +
>
>
> Looks unnecessary. The rest looks good.
Oops, indeed. Thanks.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-09-28 0:25 [PATCH net-next] vhost_net: do not stall on zerocopy depletion Willem de Bruijn
2017-09-28 0:33 ` Willem de Bruijn
2017-09-28 7:41 ` Jason Wang
@ 2017-09-29 19:38 ` Michael S. Tsirkin
2017-09-30 1:25 ` Willem de Bruijn
2017-09-30 22:12 ` kbuild test robot
` (2 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2017-09-29 19:38 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netdev, davem, jasowang, den, virtualization, Willem de Bruijn
On Wed, Sep 27, 2017 at 08:25:56PM -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Vhost-net has a hard limit on the number of zerocopy skbs in flight.
> When reached, transmission stalls. Stalls cause latency, as well as
> head-of-line blocking of other flows that do not use zerocopy.
>
> Instead of stalling, revert to copy-based transmission.
>
> Tested by sending two udp flows from guest to host, one with payload
> of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
> large flow is redirected to a netem instance with 1MBps rate limit
> and deep 1000 entry queue.
>
> modprobe ifb
> ip link set dev ifb0 up
> tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit
>
> tc qdisc add dev tap0 ingress
> tc filter add dev tap0 parent ffff: protocol ip \
> u32 match ip dport 8000 0xffff \
> action mirred egress redirect dev ifb0
>
> Before the delay, both flows process around 80K pps. With the delay,
> before this patch, both process around 400. After this patch, the
> large flow is still rate limited, while the small reverts to its
> original rate. See also discussion in the first link, below.
>
> The limit in vhost_exceeds_maxpend must be carefully chosen. When
> vq->num >> 1, the flows remain correlated. This value happens to
> correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller
> fractions and ensure correctness also for much smaller values of
> vq->num, by testing the min() of both explicitly. See also the
> discussion in the second link below.
>
> Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
> Link:http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
> Signed-off-by: Willem de Bruijn <willemb@google.com>
I'd like to see the effect on the non rate limited case though.
If guest is quick won't we have lots of copies then?
> ---
> drivers/vhost/net.c | 14 ++++----------
> 1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 58585ec8699e..50758602ae9d 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -436,8 +436,8 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
> struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> struct vhost_virtqueue *vq = &nvq->vq;
>
> - return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
> - == nvq->done_idx;
> + return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
> + min(VHOST_MAX_PEND, vq->num >> 2);
> }
>
> /* Expects to be always run from workqueue - which acts as
> @@ -480,12 +480,6 @@ static void handle_tx(struct vhost_net *net)
> if (zcopy)
> vhost_zerocopy_signal_used(net, vq);
>
> - /* If more outstanding DMAs, queue the work.
> - * Handle upend_idx wrap around
> - */
> - if (unlikely(vhost_exceeds_maxpend(net)))
> - break;
> -
> head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
> ARRAY_SIZE(vq->iov),
> &out, &in);
> @@ -509,6 +503,7 @@ static void handle_tx(struct vhost_net *net)
> len = iov_length(vq->iov, out);
> iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
> iov_iter_advance(&msg.msg_iter, hdr_size);
> +
> /* Sanity check */
> if (!msg_data_left(&msg)) {
> vq_err(vq, "Unexpected header len for TX: "
> @@ -519,8 +514,7 @@ static void handle_tx(struct vhost_net *net)
> len = msg_data_left(&msg);
>
> zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
> - && (nvq->upend_idx + 1) % UIO_MAXIOV !=
> - nvq->done_idx
> + && !vhost_exceeds_maxpend(net)
> && vhost_net_tx_select_zcopy(net);
>
> /* use msg_control to pass vhost zerocopy ubuf info to skb */
> --
> 2.14.2.822.g60be5d43e6-goog
^ permalink raw reply [flat|nested] 13+ messages in thread

* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-09-29 19:38 ` Michael S. Tsirkin
@ 2017-09-30 1:25 ` Willem de Bruijn
2017-10-02 4:08 ` Michael S. Tsirkin
2017-10-02 21:34 ` Willem de Bruijn
0 siblings, 2 replies; 13+ messages in thread
From: Willem de Bruijn @ 2017-09-30 1:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Network Development, David Miller, Jason Wang, Koichiro Den,
virtualization, Willem de Bruijn
On Fri, Sep 29, 2017 at 3:38 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, Sep 27, 2017 at 08:25:56PM -0400, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Vhost-net has a hard limit on the number of zerocopy skbs in flight.
>> When reached, transmission stalls. Stalls cause latency, as well as
>> head-of-line blocking of other flows that do not use zerocopy.
>>
>> Instead of stalling, revert to copy-based transmission.
>>
>> Tested by sending two udp flows from guest to host, one with payload
>> of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
>> large flow is redirected to a netem instance with 1MBps rate limit
>> and deep 1000 entry queue.
>>
>> modprobe ifb
>> ip link set dev ifb0 up
>> tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit
>>
>> tc qdisc add dev tap0 ingress
>> tc filter add dev tap0 parent ffff: protocol ip \
>> u32 match ip dport 8000 0xffff \
>> action mirred egress redirect dev ifb0
>>
>> Before the delay, both flows process around 80K pps. With the delay,
>> before this patch, both process around 400. After this patch, the
>> large flow is still rate limited, while the small reverts to its
>> original rate. See also discussion in the first link, below.
>>
>> The limit in vhost_exceeds_maxpend must be carefully chosen. When
>> vq->num >> 1, the flows remain correlated. This value happens to
>> correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller
>> fractions and ensure correctness also for much smaller values of
>> vq->num, by testing the min() of both explicitly. See also the
>> discussion in the second link below.
>>
>> Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
>> Link:http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>
> I'd like to see the effect on the non rate limited case though.
> If guest is quick won't we have lots of copies then?
Yes, but not significantly more than without this patch.
I ran 1, 10 and 100 flow tcp_stream throughput tests from a sender
in the guest to a receiver in the host.
To answer the other benchmark question first, I did not see anything
noteworthy when increasing vq->num from 256 to 1024.
With 1 and 10 flows without this patch all packets use zerocopy.
With the patch, less than 1% eschews zerocopy.
With 100 flows, even without this patch, 90+% of packets are copied.
Some zerocopy packets from vhost_net fail this test in tun.c,

    if (iov_iter_npages(&i, INT_MAX) <= MAX_SKB_FRAGS)

as they are generated with up to 21 frags. I'm not sure yet why, or
what the fraction of these packets is. But this in turn can
disable zcopy_used in vhost_net_tx_select_zcopy for a
larger share of packets:

    return !net->tx_flush &&
           net->tx_packets / 64 >= net->tx_zcopy_err;
Because the numbers of copied and zerocopy packets are the
same before and after the patch, so are the overall throughput
numbers.
^ permalink raw reply [flat|nested] 13+ messages in thread

* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-09-30 1:25 ` Willem de Bruijn
@ 2017-10-02 4:08 ` Michael S. Tsirkin
2017-10-02 21:34 ` Willem de Bruijn
1 sibling, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2017-10-02 4:08 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Network Development, David Miller, Jason Wang, Koichiro Den,
virtualization, Willem de Bruijn
On Fri, Sep 29, 2017 at 09:25:27PM -0400, Willem de Bruijn wrote:
> On Fri, Sep 29, 2017 at 3:38 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Wed, Sep 27, 2017 at 08:25:56PM -0400, Willem de Bruijn wrote:
> >> From: Willem de Bruijn <willemb@google.com>
> >>
> >> Vhost-net has a hard limit on the number of zerocopy skbs in flight.
> >> When reached, transmission stalls. Stalls cause latency, as well as
> >> head-of-line blocking of other flows that do not use zerocopy.
> >>
> >> Instead of stalling, revert to copy-based transmission.
> >>
> >> Tested by sending two udp flows from guest to host, one with payload
> >> of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
> >> large flow is redirected to a netem instance with 1MBps rate limit
> >> and deep 1000 entry queue.
> >>
> >> modprobe ifb
> >> ip link set dev ifb0 up
> >> tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit
> >>
> >> tc qdisc add dev tap0 ingress
> >> tc filter add dev tap0 parent ffff: protocol ip \
> >> u32 match ip dport 8000 0xffff \
> >> action mirred egress redirect dev ifb0
> >>
> >> Before the delay, both flows process around 80K pps. With the delay,
> >> before this patch, both process around 400. After this patch, the
> >> large flow is still rate limited, while the small reverts to its
> >> original rate. See also discussion in the first link, below.
> >>
> >> The limit in vhost_exceeds_maxpend must be carefully chosen. When
> >> vq->num >> 1, the flows remain correlated. This value happens to
> >> correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller
> >> fractions and ensure correctness also for much smaller values of
> >> vq->num, by testing the min() of both explicitly. See also the
> >> discussion in the second link below.
> >>
> >> Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
> >> Link:http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
> >> Signed-off-by: Willem de Bruijn <willemb@google.com>
> >
> > I'd like to see the effect on the non rate limited case though.
> > If guest is quick won't we have lots of copies then?
>
> Yes, but not significantly more than without this patch.
>
> I ran 1, 10 and 100 flow tcp_stream throughput tests from a sender
> in the guest to a receiver in the host.
>
> To answer the other benchmark question first, I did not see anything
> noteworthy when increasing vq->num from 256 to 1024.
>
> With 1 and 10 flows without this patch all packets use zerocopy.
> With the patch, less than 1% eschews zerocopy.
>
> With 100 flows, even without this patch, 90+% of packets are copied.
> Some zerocopy packets from vhost_net fail this test in tun.c
>
> if (iov_iter_npages(&i, INT_MAX) <= MAX_SKB_FRAGS)
>
> as they are generated with up to 21 frags. I'm not sure yet why, or
> what the fraction of these packets is. But this in turn can
> disable zcopy_used in vhost_net_tx_select_zcopy for a
> larger share of packets:
>
> return !net->tx_flush &&
> net->tx_packets / 64 >= net->tx_zcopy_err;
>
> Because the number of copied and zerocopy packets are the
> same before and after the patch, so are the overall throughput
> numbers.
OK, thanks!
Are you looking into the new warnings that the kbuild system reported
with this patch?
Thanks,
--
MST
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-09-30 1:25 ` Willem de Bruijn
2017-10-02 4:08 ` Michael S. Tsirkin
@ 2017-10-02 21:34 ` Willem de Bruijn
1 sibling, 0 replies; 13+ messages in thread
From: Willem de Bruijn @ 2017-10-02 21:34 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Network Development, David Miller, Jason Wang, Koichiro Den,
virtualization, Willem de Bruijn
On Fri, Sep 29, 2017 at 9:25 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Fri, Sep 29, 2017 at 3:38 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Wed, Sep 27, 2017 at 08:25:56PM -0400, Willem de Bruijn wrote:
>>> From: Willem de Bruijn <willemb@google.com>
>>>
>>> Vhost-net has a hard limit on the number of zerocopy skbs in flight.
>>> When reached, transmission stalls. Stalls cause latency, as well as
>>> head-of-line blocking of other flows that do not use zerocopy.
>>>
>>> Instead of stalling, revert to copy-based transmission.
>>>
>>> Tested by sending two udp flows from guest to host, one with payload
>>> of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
>>> large flow is redirected to a netem instance with 1MBps rate limit
>>> and deep 1000 entry queue.
>>>
>>> modprobe ifb
>>> ip link set dev ifb0 up
>>> tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit
>>>
>>> tc qdisc add dev tap0 ingress
>>> tc filter add dev tap0 parent ffff: protocol ip \
>>> u32 match ip dport 8000 0xffff \
>>> action mirred egress redirect dev ifb0
>>>
>>> Before the delay, both flows process around 80K pps. With the delay,
>>> before this patch, both process around 400. After this patch, the
>>> large flow is still rate limited, while the small reverts to its
>>> original rate. See also discussion in the first link, below.
>>>
>>> The limit in vhost_exceeds_maxpend must be carefully chosen. When
>>> vq->num >> 1, the flows remain correlated. This value happens to
>>> correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller
>>> fractions and ensure correctness also for much smaller values of
>>> vq->num, by testing the min() of both explicitly. See also the
>>> discussion in the second link below.
>>>
>>> Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
>>> Link:http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
>>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>>
>> I'd like to see the effect on the non rate limited case though.
>> If guest is quick won't we have lots of copies then?
>
> Yes, but not significantly more than without this patch.
>
> I ran 1, 10 and 100 flow tcp_stream throughput tests from a sender
> in the guest to a receiver in the host.
>
> To answer the other benchmark question first, I did not see anything
> noteworthy when increasing vq->num from 256 to 1024.
>
> With 1 and 10 flows without this patch all packets use zerocopy.
> With the patch, less than 1% eschews zerocopy.
>
> With 100 flows, even without this patch, 90+% of packets are copied.
> Some zerocopy packets from vhost_net fail this test in tun.c
>
> if (iov_iter_npages(&i, INT_MAX) <= MAX_SKB_FRAGS)
>
> as they are generated with up to 21 frags. I'm not sure yet why, or
> what the fraction of these packets is.
This seems to be a mix of page alignment and compound pages.
The iov_len is always well below the maximum, but frags exceed
page size and can start high in the initial page.
    tun_get_user: num_pages=21 max=17 iov_len=6 len=65226
      0: p_off=3264 len=6888
      1: p_off=1960 len=16384
      2: p_off=1960 len=6232
      3: p_off=0 len=10152
      4: p_off=1960 len=16384
      5: p_off=1960 len=9120
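
Working through that trace (assuming 4 KiB pages, so MAX_SKB_FRAGS is
17): the six segments span ceil((3264+6888)/4096) = 3,
ceil((1960+16384)/4096) = 5, ceil((1960+6232)/4096) = 2,
ceil(10152/4096) = 3, 5 and ceil((1960+9120)/4096) = 3 pages, for the
reported total of 21 pages, above the 17-frag limit even though the
payload itself fits within a single 64KB GSO packet.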
> But this in turn can
> disable zcopy_used in vhost_net_tx_select_zcopy for a
> larger share of packets:
>
> return !net->tx_flush &&
> net->tx_packets / 64 >= net->tx_zcopy_err;
>
Testing iov_iter_npages() in handle_tx to inform zcopy_used allows
skipping these without turning off zerocopy for all other packets.
After implementing that, tx_zcopy_err drops to zero, but only around
40% of packets use zerocopy.
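
A sketch of how that test might be folded into the zcopy_used decision
in handle_tx (assuming msg.msg_iter has already been advanced past the
header, as in the current code; this is not the actual follow-up patch):

    zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
                 && !vhost_exceeds_maxpend(net)
                 && vhost_net_tx_select_zcopy(net)
                 && iov_iter_npages(&msg.msg_iter, INT_MAX)
                    <= MAX_SKB_FRAGS;

With this, iovs that would exceed MAX_SKB_FRAGS are copied up front
instead of failing the check in tun and inflating tx_zcopy_err.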
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-09-28 0:25 [PATCH net-next] vhost_net: do not stall on zerocopy depletion Willem de Bruijn
` (2 preceding siblings ...)
2017-09-29 19:38 ` Michael S. Tsirkin
@ 2017-09-30 22:12 ` kbuild test robot
2017-09-30 22:20 ` kbuild test robot
2017-10-01 0:09 ` kbuild test robot
5 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2017-09-30 22:12 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Willem de Bruijn, mst, netdev, den, virtualization, kbuild-all,
davem
Hi Willem,
[auto build test WARNING on net-next/master]
url: https://github.com/0day-ci/linux/commits/Willem-de-Bruijn/vhost_net-do-not-stall-on-zerocopy-depletion/20171001-054709
config: x86_64-randconfig-x002-201740 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
In file included from include/linux/list.h:8:0,
from include/linux/wait.h:6,
from include/linux/eventfd.h:12,
from drivers/vhost/net.c:10:
drivers/vhost/net.c: In function 'vhost_exceeds_maxpend':
include/linux/kernel.h:772:16: warning: comparison of distinct pointer types lacks a cast
(void) (&min1 == &min2); \
^
include/linux/kernel.h:775:2: note: in expansion of macro '__min'
__min(typeof(x), typeof(y), \
^~~~~
>> drivers/vhost/net.c:440:9: note: in expansion of macro 'min'
min(VHOST_MAX_PEND, vq->num >> 2);
^~~
vim +/min +440 drivers/vhost/net.c
433
434 static bool vhost_exceeds_maxpend(struct vhost_net *net)
435 {
436 struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
437 struct vhost_virtqueue *vq = &nvq->vq;
438
439 return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
> 440 min(VHOST_MAX_PEND, vq->num >> 2);
441 }
442
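
The warning is triggered because min() compares the plain int constant
VHOST_MAX_PEND with the unsigned vq->num. The usual remedy is min_t()
with an explicit type; a likely fix, sketched here rather than taken
from a posted revision:

    return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
           min_t(unsigned int, VHOST_MAX_PEND, vq->num >> 2);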
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply [flat|nested] 13+ messages in thread

* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-09-28 0:25 [PATCH net-next] vhost_net: do not stall on zerocopy depletion Willem de Bruijn
` (3 preceding siblings ...)
2017-09-30 22:12 ` kbuild test robot
@ 2017-09-30 22:20 ` kbuild test robot
2017-10-01 0:09 ` kbuild test robot
5 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2017-09-30 22:20 UTC (permalink / raw)
To: Willem de Bruijn
Cc: kbuild-all, netdev, davem, mst, jasowang, den, virtualization,
Willem de Bruijn
Hi Willem,
[auto build test WARNING on net-next/master]
url: https://github.com/0day-ci/linux/commits/Willem-de-Bruijn/vhost_net-do-not-stall-on-zerocopy-depletion/20171001-054709
config: tile-allyesconfig (attached as .config)
compiler: tilegx-linux-gcc (GCC) 4.6.2
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=tile
All warnings (new ones prefixed by >>):
drivers/vhost/net.c: In function 'vhost_exceeds_maxpend':
>> drivers/vhost/net.c:440:9: warning: comparison of distinct pointer types lacks a cast [enabled by default]
vim +440 drivers/vhost/net.c
433
434 static bool vhost_exceeds_maxpend(struct vhost_net *net)
435 {
436 struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
437 struct vhost_virtqueue *vq = &nvq->vq;
438
439 return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
> 440 min(VHOST_MAX_PEND, vq->num >> 2);
441 }
442
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply [flat|nested] 13+ messages in thread

* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-09-28 0:25 [PATCH net-next] vhost_net: do not stall on zerocopy depletion Willem de Bruijn
` (4 preceding siblings ...)
2017-09-30 22:20 ` kbuild test robot
@ 2017-10-01 0:09 ` kbuild test robot
2017-10-01 3:20 ` Michael S. Tsirkin
5 siblings, 1 reply; 13+ messages in thread
From: kbuild test robot @ 2017-10-01 0:09 UTC (permalink / raw)
To: Willem de Bruijn
Cc: kbuild-all, netdev, davem, mst, jasowang, den, virtualization,
Willem de Bruijn
Hi Willem,
[auto build test WARNING on net-next/master]
url: https://github.com/0day-ci/linux/commits/Willem-de-Bruijn/vhost_net-do-not-stall-on-zerocopy-depletion/20171001-054709
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__
sparse warnings: (new ones prefixed by >>)
vim +440 drivers/vhost/net.c
433
434 static bool vhost_exceeds_maxpend(struct vhost_net *net)
435 {
436 struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
437 struct vhost_virtqueue *vq = &nvq->vq;
438
439 return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
> 440 min(VHOST_MAX_PEND, vq->num >> 2);
441 }
442
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply [flat|nested] 13+ messages in thread

* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-10-01 0:09 ` kbuild test robot
@ 2017-10-01 3:20 ` Michael S. Tsirkin
2017-10-01 3:26 ` [kbuild-all] " Fengguang Wu
0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2017-10-01 3:20 UTC (permalink / raw)
To: kbuild test robot
Cc: Willem de Bruijn, kbuild-all, netdev, davem, jasowang, den,
virtualization, Willem de Bruijn
On Sun, Oct 01, 2017 at 08:09:30AM +0800, kbuild test robot wrote:
> Hi Willem,
>
> [auto build test WARNING on net-next/master]
>
> url: https://github.com/0day-ci/linux/commits/Willem-de-Bruijn/vhost_net-do-not-stall-on-zerocopy-depletion/20171001-054709
> reproduce:
> # apt-get install sparse
> make ARCH=x86_64 allmodconfig
> make C=1 CF=-D__CHECK_ENDIAN__
BTW __CHECK_ENDIAN__ is the default now, I think you can drop it from
your scripts.
>
> sparse warnings: (new ones prefixed by >>)
>
>
> vim +440 drivers/vhost/net.c
>
> 433
> 434 static bool vhost_exceeds_maxpend(struct vhost_net *net)
> 435 {
> 436 struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> 437 struct vhost_virtqueue *vq = &nvq->vq;
> 438
> 439 return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
> > 440 min(VHOST_MAX_PEND, vq->num >> 2);
> 441 }
> 442
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply [flat|nested] 13+ messages in thread

* Re: [kbuild-all] [PATCH net-next] vhost_net: do not stall on zerocopy depletion
2017-10-01 3:20 ` Michael S. Tsirkin
@ 2017-10-01 3:26 ` Fengguang Wu
0 siblings, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2017-10-01 3:26 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Willem de Bruijn, netdev, jasowang, Willem de Bruijn, den,
virtualization, kbuild-all, davem
On Sun, Oct 01, 2017 at 06:20:49AM +0300, Michael S. Tsirkin wrote:
>On Sun, Oct 01, 2017 at 08:09:30AM +0800, kbuild test robot wrote:
>> Hi Willem,
>>
>> [auto build test WARNING on net-next/master]
>>
>> url: https://github.com/0day-ci/linux/commits/Willem-de-Bruijn/vhost_net-do-not-stall-on-zerocopy-depletion/20171001-054709
>> reproduce:
>> # apt-get install sparse
>> make ARCH=x86_64 allmodconfig
>> make C=1 CF=-D__CHECK_ENDIAN__
>
>BTW __CHECK_ENDIAN__ is the default now, I think you can drop it from
>your scripts.
Good tip, thank you! However, since we're testing old kernels too,
we'll need to keep it for probably 1-3 more years.
Thanks,
Fengguang
>>
>> sparse warnings: (new ones prefixed by >>)
>>
>>
>> vim +440 drivers/vhost/net.c
>>
>> 433
>> 434 static bool vhost_exceeds_maxpend(struct vhost_net *net)
>> 435 {
>> 436 struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>> 437 struct vhost_virtqueue *vq = &nvq->vq;
>> 438
>> 439 return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
>> > 440 min(VHOST_MAX_PEND, vq->num >> 2);
>> 441 }
>> 442
>>
>> ---
>> 0-DAY kernel test infrastructure Open Source Technology Center
>> https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply [flat|nested] 13+ messages in thread