netdev.vger.kernel.org archive mirror
* [PATCH net 1/2] vhost-net: unbreak busy polling
@ 2025-09-12  8:26 Jason Wang
  2025-09-12  8:26 ` [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification Jason Wang
  2025-09-12  8:51 ` [PATCH net 1/2] vhost-net: unbreak busy polling Michael S. Tsirkin
  0 siblings, 2 replies; 16+ messages in thread
From: Jason Wang @ 2025-09-12  8:26 UTC (permalink / raw)
  To: mst, jasowang, eperezma
  Cc: jonah.palmer, kuba, jon, kvm, virtualization, netdev,
	linux-kernel, stable

Commit 67a873df0c41 ("vhost: basic in order support") passes the number
of used elements to vhost_net_rx_peek_head_len() to make sure it can
signal the used ring correctly before trying to do busy polling. But it
forgot to clear the count, so the count runs out of sync with
handle_rx() and breaks busy polling.

Fix this by passing a pointer to the count and clearing it after
signaling the used ring.

Cc: stable@vger.kernel.org
Fixes: 67a873df0c41 ("vhost: basic in order support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index c6508fe0d5c8..16e39f3ab956 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1014,7 +1014,7 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
 }
 
 static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
-				      bool *busyloop_intr, unsigned int count)
+				      bool *busyloop_intr, unsigned int *count)
 {
 	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
 	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
@@ -1024,7 +1024,8 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
 
 	if (!len && rvq->busyloop_timeout) {
 		/* Flush batched heads first */
-		vhost_net_signal_used(rnvq, count);
+		vhost_net_signal_used(rnvq, *count);
+		*count = 0;
 		/* Both tx vq and rx socket were polled here */
 		vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, true);
 
@@ -1180,7 +1181,7 @@ static void handle_rx(struct vhost_net *net)
 
 	do {
 		sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
-						      &busyloop_intr, count);
+						      &busyloop_intr, &count);
 		if (!sock_len)
 			break;
 		sock_len += sock_hlen;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-12  8:26 [PATCH net 1/2] vhost-net: unbreak busy polling Jason Wang
@ 2025-09-12  8:26 ` Jason Wang
  2025-09-12  8:50   ` Michael S. Tsirkin
  2025-09-15 16:03   ` Michael S. Tsirkin
  2025-09-12  8:51 ` [PATCH net 1/2] vhost-net: unbreak busy polling Michael S. Tsirkin
  1 sibling, 2 replies; 16+ messages in thread
From: Jason Wang @ 2025-09-12  8:26 UTC (permalink / raw)
  To: mst, jasowang, eperezma
  Cc: jonah.palmer, kuba, jon, kvm, virtualization, netdev,
	linux-kernel, stable

Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
sendmsg") tries to defer enabling notifications by moving the logic
out of the loop, after the vhost_tx_batch(), when nothing new is
spotted. This brings side effects, as the new logic is also reached
from several other error conditions.

One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
might return -EAGAIN and exit the loop while there are still available
buffers, so the tx work is queued again until userspace feeds the
IOTLB entry correctly. This slows down tx processing and may
trigger the TX watchdog in the guest.

Fix this by keeping the notification enabling logic inside the loop
when nothing new is spotted, flushing the batched packets first.

Reported-by: Jon Kohler <jon@nutanix.com>
Cc: stable@vger.kernel.org
Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 16e39f3ab956..3611b7537932 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 	int err;
 	int sent_pkts = 0;
 	bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
-	bool busyloop_intr;
 	bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
 
 	do {
-		busyloop_intr = false;
+		bool busyloop_intr = false;
+
 		if (nvq->done_idx == VHOST_NET_BATCH)
 			vhost_tx_batch(net, nvq, sock, &msg);
 
@@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 			break;
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
 		if (head == vq->num) {
-			/* Kicks are disabled at this point, break loop and
-			 * process any remaining batched packets. Queue will
-			 * be re-enabled afterwards.
+			/* Flush batched packets before enabling
+			 * virtqueue notification to reduce
+			 * unnecessary virtqueue kicks.
 			 */
+			vhost_tx_batch(net, nvq, sock, &msg);
+			if (unlikely(busyloop_intr)) {
+				vhost_poll_queue(&vq->poll);
+			} else if (unlikely(vhost_enable_notify(&net->dev,
+								vq))) {
+				vhost_disable_notify(&net->dev, vq);
+				continue;
+			}
 			break;
 		}
 
@@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 		++nvq->done_idx;
 	} while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
 
-	/* Kicks are still disabled, dispatch any remaining batched msgs. */
 	vhost_tx_batch(net, nvq, sock, &msg);
-
-	if (unlikely(busyloop_intr))
-		/* If interrupted while doing busy polling, requeue the
-		 * handler to be fair handle_rx as well as other tasks
-		 * waiting on cpu.
-		 */
-		vhost_poll_queue(&vq->poll);
-	else
-		/* All of our work has been completed; however, before
-		 * leaving the TX handler, do one last check for work,
-		 * and requeue handler if necessary. If there is no work,
-		 * queue will be reenabled.
-		 */
-		vhost_net_busy_poll_try_queue(net, vq);
 }
 
 static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
-- 
2.34.1



* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-12  8:26 ` [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification Jason Wang
@ 2025-09-12  8:50   ` Michael S. Tsirkin
  2025-09-12 15:24     ` Jon Kohler
  2025-09-15 16:03   ` Michael S. Tsirkin
  1 sibling, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2025-09-12  8:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: eperezma, jonah.palmer, kuba, jon, kvm, virtualization, netdev,
	linux-kernel, stable

On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
> Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
> sendmsg") tries to defer the notification enabling by moving the logic
> out of the loop after the vhost_tx_batch() when nothing new is
> spotted. This will bring side effects as the new logic would be reused
> for several other error conditions.
> 
> One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
> might return -EAGAIN and exit the loop and see there's still available
> buffers, so it will queue the tx work again until userspace feed the
> IOTLB entry correctly. This will slowdown the tx processing and may
> trigger the TX watchdog in the guest.

It's not that it might.
Pls clarify that it *has been reported* to do exactly that,
and add a link to the report.


> Fixing this by stick the notificaiton enabling logic inside the loop
> when nothing new is spotted and flush the batched before.
> 
> Reported-by: Jon Kohler <jon@nutanix.com>
> Cc: stable@vger.kernel.org
> Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

So this is mostly a revert, but with
                     vhost_tx_batch(net, nvq, sock, &msg);
added in to avoid regressing performance.

If you do not want to structure it like this (revert+optimization),
then pls make that clear in the message.


> ---
>  drivers/vhost/net.c | 33 +++++++++++++--------------------
>  1 file changed, 13 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 16e39f3ab956..3611b7537932 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  	int err;
>  	int sent_pkts = 0;
>  	bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
> -	bool busyloop_intr;
>  	bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
>  
>  	do {
> -		busyloop_intr = false;
> +		bool busyloop_intr = false;
> +
>  		if (nvq->done_idx == VHOST_NET_BATCH)
>  			vhost_tx_batch(net, nvq, sock, &msg);
>  
> @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  			break;
>  		/* Nothing new?  Wait for eventfd to tell us they refilled. */
>  		if (head == vq->num) {
> -			/* Kicks are disabled at this point, break loop and
> -			 * process any remaining batched packets. Queue will
> -			 * be re-enabled afterwards.
> +			/* Flush batched packets before enabling
> +			 * virqtueue notification to reduce
> +			 * unnecssary virtqueue kicks.

typos: virtqueue, unnecessary

>  			 */
> +			vhost_tx_batch(net, nvq, sock, &msg);
> +			if (unlikely(busyloop_intr)) {
> +				vhost_poll_queue(&vq->poll);
> +			} else if (unlikely(vhost_enable_notify(&net->dev,
> +								vq))) {
> +				vhost_disable_notify(&net->dev, vq);
> +				continue;
> +			}
>  			break;
>  		}
>  
> @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  		++nvq->done_idx;
>  	} while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
>  
> -	/* Kicks are still disabled, dispatch any remaining batched msgs. */
>  	vhost_tx_batch(net, nvq, sock, &msg);
> -
> -	if (unlikely(busyloop_intr))
> -		/* If interrupted while doing busy polling, requeue the
> -		 * handler to be fair handle_rx as well as other tasks
> -		 * waiting on cpu.
> -		 */
> -		vhost_poll_queue(&vq->poll);
> -	else
> -		/* All of our work has been completed; however, before
> -		 * leaving the TX handler, do one last check for work,
> -		 * and requeue handler if necessary. If there is no work,
> -		 * queue will be reenabled.
> -		 */
> -		vhost_net_busy_poll_try_queue(net, vq);
>  }
>  
>  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> -- 
> 2.34.1



* Re: [PATCH net 1/2] vhost-net: unbreak busy polling
  2025-09-12  8:26 [PATCH net 1/2] vhost-net: unbreak busy polling Jason Wang
  2025-09-12  8:26 ` [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification Jason Wang
@ 2025-09-12  8:51 ` Michael S. Tsirkin
  1 sibling, 0 replies; 16+ messages in thread
From: Michael S. Tsirkin @ 2025-09-12  8:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: eperezma, jonah.palmer, kuba, jon, kvm, virtualization, netdev,
	linux-kernel, stable

On Fri, Sep 12, 2025 at 04:26:57PM +0800, Jason Wang wrote:
> Commit 67a873df0c41 ("vhost: basic in order support") pass the number
> of used elem to vhost_net_rx_peek_head_len() to make sure it can
> signal the used correctly before trying to do busy polling. But it
> forgets to clear the count, this would cause the count run out of sync
> with handle_rx() and break the busy polling.
> 
> Fixing this by passing the pointer of the count and clearing it after
> the signaling the used.
> 
> Cc: stable@vger.kernel.org
> Fixes: 67a873df0c41 ("vhost: basic in order support")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/vhost/net.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index c6508fe0d5c8..16e39f3ab956 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1014,7 +1014,7 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
>  }
>  
>  static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
> -				      bool *busyloop_intr, unsigned int count)
> +				      bool *busyloop_intr, unsigned int *count)
>  {
>  	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
>  	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
> @@ -1024,7 +1024,8 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
>  
>  	if (!len && rvq->busyloop_timeout) {
>  		/* Flush batched heads first */
> -		vhost_net_signal_used(rnvq, count);
> +		vhost_net_signal_used(rnvq, *count);
> +		*count = 0;
>  		/* Both tx vq and rx socket were polled here */
>  		vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, true);
>  
> @@ -1180,7 +1181,7 @@ static void handle_rx(struct vhost_net *net)
>  
>  	do {
>  		sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
> -						      &busyloop_intr, count);
> +						      &busyloop_intr, &count);
>  		if (!sock_len)
>  			break;
>  		sock_len += sock_hlen;
> -- 
> 2.34.1



* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-12  8:50   ` Michael S. Tsirkin
@ 2025-09-12 15:24     ` Jon Kohler
  2025-09-12 15:30       ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2025-09-12 15:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eperezma@redhat.com, jonah.palmer@oracle.com,
	kuba@kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org



> On Sep 12, 2025, at 4:50 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> 
> On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
>> Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
>> sendmsg") tries to defer the notification enabling by moving the logic
>> out of the loop after the vhost_tx_batch() when nothing new is
>> spotted. This will bring side effects as the new logic would be reused
>> for several other error conditions.
>> 
>> One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
>> might return -EAGAIN and exit the loop and see there's still available
>> buffers, so it will queue the tx work again until userspace feed the
>> IOTLB entry correctly. This will slowdown the tx processing and may
>> trigger the TX watchdog in the guest.
> 
> It's not that it might.
> Pls clarify that it *has been reported* to do exactly that,
> and add a link to the report.
> 
> 
>> Fixing this by stick the notificaiton enabling logic inside the loop
>> when nothing new is spotted and flush the batched before.
>> 
>> Reported-by: Jon Kohler <jon@nutanix.com>
>> Cc: stable@vger.kernel.org
>> Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> So this is mostly a revert, but with
>                     vhost_tx_batch(net, nvq, sock, &msg);
> added in to avoid regressing performance.
> 
> If you do not want to structure it like this (revert+optimization),
> then pls make that clear in the message.
> 
> 
>> ---
>> drivers/vhost/net.c | 33 +++++++++++++--------------------
>> 1 file changed, 13 insertions(+), 20 deletions(-)
>> 
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 16e39f3ab956..3611b7537932 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>> int err;
>> int sent_pkts = 0;
>> bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
>> - bool busyloop_intr;
>> bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
>> 
>> do {
>> - busyloop_intr = false;
>> + bool busyloop_intr = false;
>> +
>> if (nvq->done_idx == VHOST_NET_BATCH)
>> vhost_tx_batch(net, nvq, sock, &msg);
>> 
>> @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>> break;
>> /* Nothing new?  Wait for eventfd to tell us they refilled. */
>> if (head == vq->num) {
>> - /* Kicks are disabled at this point, break loop and
>> - * process any remaining batched packets. Queue will
>> - * be re-enabled afterwards.
>> + /* Flush batched packets before enabling
>> + * virqtueue notification to reduce
>> + * unnecssary virtqueue kicks.
> 
> typos: virtqueue, unnecessary
> 
>> */
>> + vhost_tx_batch(net, nvq, sock, &msg);
>> + if (unlikely(busyloop_intr)) {
>> + vhost_poll_queue(&vq->poll);
>> + } else if (unlikely(vhost_enable_notify(&net->dev,
>> + vq))) {
>> + vhost_disable_notify(&net->dev, vq);
>> + continue;
>> + }
>> break;
>> }

See my comment below, but how about something like this?
 		if (head == vq->num) {
			/* Flush batched packets before enabling
			 * virtqueue notification to reduce
			 * unnecessary virtqueue kicks.
			 */
			vhost_tx_batch(net, nvq, sock, &msg);
			if (unlikely(busyloop_intr))
				/* If interrupted while doing busy polling,
				 * requeue the handler to be fair handle_rx
				 * as well as other tasks waiting on cpu.
				 */
				vhost_poll_queue(&vq->poll);
			else
				/* All of our work has been completed;
				 * however, before leaving the TX handler,
				 * do one last check for work, and requeue
				 * handler if necessary. If there is no work,
				 * queue will be reenabled.
				 */
				vhost_net_busy_poll_try_queue(net, vq);
 			break;
 		}


>> 
>> @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>> ++nvq->done_idx;
>> } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
>> 
>> - /* Kicks are still disabled, dispatch any remaining batched msgs. */
>> vhost_tx_batch(net, nvq, sock, &msg);
>> -
>> - if (unlikely(busyloop_intr))
>> - /* If interrupted while doing busy polling, requeue the
>> - * handler to be fair handle_rx as well as other tasks
>> - * waiting on cpu.
>> - */
>> - vhost_poll_queue(&vq->poll);
>> - else
>> - /* All of our work has been completed; however, before
>> - * leaving the TX handler, do one last check for work,
>> - * and requeue handler if necessary. If there is no work,
>> - * queue will be reenabled.
>> - */
>> - vhost_net_busy_poll_try_queue(net, vq);

Note: the use of vhost_net_busy_poll_try_queue was intentional in my
patch as it was checking to see both conditionals.

Can we simply hoist my logic up instead?

>> }
>> 
>> static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>> -- 
>> 2.34.1
> 

Tested-by: Jon Kohler <jon@nutanix.com>

Tried this out on a 6.16 host / guest that locked up with iotlb miss loop,
applied this patch and all was well. 


* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-12 15:24     ` Jon Kohler
@ 2025-09-12 15:30       ` Michael S. Tsirkin
  2025-09-12 15:33         ` Jon Kohler
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2025-09-12 15:30 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Jason Wang, eperezma@redhat.com, jonah.palmer@oracle.com,
	kuba@kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org

On Fri, Sep 12, 2025 at 03:24:42PM +0000, Jon Kohler wrote:
> 
> 
> > On Sep 12, 2025, at 4:50 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > 
> > On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
> >> Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
> >> sendmsg") tries to defer the notification enabling by moving the logic
> >> out of the loop after the vhost_tx_batch() when nothing new is
> >> spotted. This will bring side effects as the new logic would be reused
> >> for several other error conditions.
> >> 
> >> One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
> >> might return -EAGAIN and exit the loop and see there's still available
> >> buffers, so it will queue the tx work again until userspace feed the
> >> IOTLB entry correctly. This will slowdown the tx processing and may
> >> trigger the TX watchdog in the guest.
> > 
> > It's not that it might.
> > Pls clarify that it *has been reported* to do exactly that,
> > and add a link to the report.
> > 
> > 
> >> Fixing this by stick the notificaiton enabling logic inside the loop
> >> when nothing new is spotted and flush the batched before.
> >> 
> >> Reported-by: Jon Kohler <jon@nutanix.com>
> >> Cc: stable@vger.kernel.org
> >> Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > 
> > So this is mostly a revert, but with
> >                     vhost_tx_batch(net, nvq, sock, &msg);
> > added in to avoid regressing performance.
> > 
> > If you do not want to structure it like this (revert+optimization),
> > then pls make that clear in the message.
> > 
> > 
> >> ---
> >> drivers/vhost/net.c | 33 +++++++++++++--------------------
> >> 1 file changed, 13 insertions(+), 20 deletions(-)
> >> 
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index 16e39f3ab956..3611b7537932 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >> int err;
> >> int sent_pkts = 0;
> >> bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
> >> - bool busyloop_intr;
> >> bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
> >> 
> >> do {
> >> - busyloop_intr = false;
> >> + bool busyloop_intr = false;
> >> +
> >> if (nvq->done_idx == VHOST_NET_BATCH)
> >> vhost_tx_batch(net, nvq, sock, &msg);
> >> 
> >> @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >> break;
> >> /* Nothing new?  Wait for eventfd to tell us they refilled. */
> >> if (head == vq->num) {
> >> - /* Kicks are disabled at this point, break loop and
> >> - * process any remaining batched packets. Queue will
> >> - * be re-enabled afterwards.
> >> + /* Flush batched packets before enabling
> >> + * virqtueue notification to reduce
> >> + * unnecssary virtqueue kicks.
> > 
> > typos: virtqueue, unnecessary
> > 
> >> */
> >> + vhost_tx_batch(net, nvq, sock, &msg);
> >> + if (unlikely(busyloop_intr)) {
> >> + vhost_poll_queue(&vq->poll);
> >> + } else if (unlikely(vhost_enable_notify(&net->dev,
> >> + vq))) {
> >> + vhost_disable_notify(&net->dev, vq);
> >> + continue;
> >> + }
> >> break;
> >> }
> 
> See my comment below, but how about something like this?
>  		if (head == vq->num) {
> 			/* Flush batched packets before enabling
> 			 * virtqueue notification to reduce
> 			 * unnecessary virtqueue kicks.
> 			 */
> 			vhost_tx_batch(net, nvq, sock, &msg);
> 			if (unlikely(busyloop_intr))
> 				/* If interrupted while doing busy polling,
> 				 * requeue the handler to be fair handle_rx
> 				 * as well as other tasks waiting on cpu.
> 				 */
> 				vhost_poll_queue(&vq->poll);
> 			else
> 				/* All of our work has been completed;
> 				 * however, before leaving the TX handler,
> 				 * do one last check for work, and requeue
> 				 * handler if necessary. If there is no work,
> 				 * queue will be reenabled.
> 				 */
> 				vhost_net_busy_poll_try_queue(net, vq);


I mean it's functionally equivalent, but vhost_net_busy_poll_try_queue 
checks the avail ring again and we just checked it.
Why is this a good idea?
This happens on good path so I dislike unnecessary work like this.


>  			break;
>  		}
> 
> 
> >> 
> >> @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >> ++nvq->done_idx;
> >> } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
> >> 
> >> - /* Kicks are still disabled, dispatch any remaining batched msgs. */
> >> vhost_tx_batch(net, nvq, sock, &msg);
> >> -
> >> - if (unlikely(busyloop_intr))
> >> - /* If interrupted while doing busy polling, requeue the
> >> - * handler to be fair handle_rx as well as other tasks
> >> - * waiting on cpu.
> >> - */
> >> - vhost_poll_queue(&vq->poll);
> >> - else
> >> - /* All of our work has been completed; however, before
> >> - * leaving the TX handler, do one last check for work,
> >> - * and requeue handler if necessary. If there is no work,
> >> - * queue will be reenabled.
> >> - */
> >> - vhost_net_busy_poll_try_queue(net, vq);
> 
> Note: the use of vhost_net_busy_poll_try_queue was intentional in my
> patch as it was checking to see both conditionals.
> 
> Can we simply hoist my logic up instead?
> 
> >> }
> >> 
> >> static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> >> -- 
> >> 2.34.1
> > 
> 
> Tested-by: Jon Kohler <jon@nutanix.com <mailto:jon@nutanix.com>>
> 
> Tried this out on a 6.16 host / guest that locked up with iotlb miss loop,
> applied this patch and all was well. 



* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-12 15:30       ` Michael S. Tsirkin
@ 2025-09-12 15:33         ` Jon Kohler
  2025-09-12 15:38           ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2025-09-12 15:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eperezma@redhat.com, jonah.palmer@oracle.com,
	kuba@kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org



> On Sep 12, 2025, at 11:30 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> 
> On Fri, Sep 12, 2025 at 03:24:42PM +0000, Jon Kohler wrote:
>> 
>> 
>>> On Sep 12, 2025, at 4:50 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> 
>>> On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
>>>> Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
>>>> sendmsg") tries to defer the notification enabling by moving the logic
>>>> out of the loop after the vhost_tx_batch() when nothing new is
>>>> spotted. This will bring side effects as the new logic would be reused
>>>> for several other error conditions.
>>>> 
>>>> One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
>>>> might return -EAGAIN and exit the loop and see there's still available
>>>> buffers, so it will queue the tx work again until userspace feed the
>>>> IOTLB entry correctly. This will slowdown the tx processing and may
>>>> trigger the TX watchdog in the guest.
>>> 
>>> It's not that it might.
>>> Pls clarify that it *has been reported* to do exactly that,
>>> and add a link to the report.
>>> 
>>> 
>>>> Fixing this by stick the notificaiton enabling logic inside the loop
>>>> when nothing new is spotted and flush the batched before.
>>>> 
>>>> Reported-by: Jon Kohler <jon@nutanix.com>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> 
>>> So this is mostly a revert, but with
>>>                    vhost_tx_batch(net, nvq, sock, &msg);
>>> added in to avoid regressing performance.
>>> 
>>> If you do not want to structure it like this (revert+optimization),
>>> then pls make that clear in the message.
>>> 
>>> 
>>>> ---
>>>> drivers/vhost/net.c | 33 +++++++++++++--------------------
>>>> 1 file changed, 13 insertions(+), 20 deletions(-)
>>>> 
>>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>> index 16e39f3ab956..3611b7537932 100644
>>>> --- a/drivers/vhost/net.c
>>>> +++ b/drivers/vhost/net.c
>>>> @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>>>> int err;
>>>> int sent_pkts = 0;
>>>> bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
>>>> - bool busyloop_intr;
>>>> bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
>>>> 
>>>> do {
>>>> - busyloop_intr = false;
>>>> + bool busyloop_intr = false;
>>>> +
>>>> if (nvq->done_idx == VHOST_NET_BATCH)
>>>> vhost_tx_batch(net, nvq, sock, &msg);
>>>> 
>>>> @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>>>> break;
>>>> /* Nothing new?  Wait for eventfd to tell us they refilled. */
>>>> if (head == vq->num) {
>>>> - /* Kicks are disabled at this point, break loop and
>>>> - * process any remaining batched packets. Queue will
>>>> - * be re-enabled afterwards.
>>>> + /* Flush batched packets before enabling
>>>> + * virqtueue notification to reduce
>>>> + * unnecssary virtqueue kicks.
>>> 
>>> typos: virtqueue, unnecessary
>>> 
>>>> */
>>>> + vhost_tx_batch(net, nvq, sock, &msg);
>>>> + if (unlikely(busyloop_intr)) {
>>>> + vhost_poll_queue(&vq->poll);
>>>> + } else if (unlikely(vhost_enable_notify(&net->dev,
>>>> + vq))) {
>>>> + vhost_disable_notify(&net->dev, vq);
>>>> + continue;
>>>> + }
>>>> break;
>>>> }
>> 
>> See my comment below, but how about something like this?
>> if (head == vq->num) {
>> /* Flush batched packets before enabling
>> * virtqueue notification to reduce
>> * unnecessary virtqueue kicks.
>> */
>> vhost_tx_batch(net, nvq, sock, &msg);
>> if (unlikely(busyloop_intr))
>> /* If interrupted while doing busy polling,
>> * requeue the handler to be fair handle_rx
>> * as well as other tasks waiting on cpu.
>> */
>> vhost_poll_queue(&vq->poll);
>> else
>> /* All of our work has been completed;
>> * however, before leaving the TX handler,
>> * do one last check for work, and requeue
>> * handler if necessary. If there is no work,
>> * queue will be reenabled.
>> */
>> vhost_net_busy_poll_try_queue(net, vq);
> 
> 
> I mean it's functionally equivalent, but vhost_net_busy_poll_try_queue 
> checks the avail ring again and we just checked it.
> Why is this a good idea?
> This happens on good path so I dislike unnecessary work like this.

For the sake of discussion, let’s say vhost_tx_batch and the
sendmsg within took 1 full second to complete. A lot could potentially
happen in that amount of time. So sure, control path wise it looks like
we just checked it, but time wise, that could have been ages ago.

> 
> 
>> break;
>> }
>> 
>> 
>>>> 
>>>> @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>>>> ++nvq->done_idx;
>>>> } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
>>>> 
>>>> - /* Kicks are still disabled, dispatch any remaining batched msgs. */
>>>> vhost_tx_batch(net, nvq, sock, &msg);
>>>> -
>>>> - if (unlikely(busyloop_intr))
>>>> - /* If interrupted while doing busy polling, requeue the
>>>> - * handler to be fair handle_rx as well as other tasks
>>>> - * waiting on cpu.
>>>> - */
>>>> - vhost_poll_queue(&vq->poll);
>>>> - else
>>>> - /* All of our work has been completed; however, before
>>>> - * leaving the TX handler, do one last check for work,
>>>> - * and requeue handler if necessary. If there is no work,
>>>> - * queue will be reenabled.
>>>> - */
>>>> - vhost_net_busy_poll_try_queue(net, vq);
>> 
>> Note: the use of vhost_net_busy_poll_try_queue was intentional in my
>> patch as it was checking to see both conditionals.
>> 
>> Can we simply hoist my logic up instead?
>> 
>>>> }
>>>> 
>>>> static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>>>> -- 
>>>> 2.34.1
>>> 
>> 
>> Tested-by: Jon Kohler <jon@nutanix.com>
>> 
>> Tried this out on a 6.16 host / guest that locked up with iotlb miss loop,
>> applied this patch and all was well.
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-12 15:33         ` Jon Kohler
@ 2025-09-12 15:38           ` Michael S. Tsirkin
  2025-09-12 15:40             ` Jon Kohler
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2025-09-12 15:38 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Jason Wang, eperezma@redhat.com, jonah.palmer@oracle.com,
	kuba@kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org

On Fri, Sep 12, 2025 at 03:33:32PM +0000, Jon Kohler wrote:
> 
> 
> > On Sep 12, 2025, at 11:30 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > 
> > On Fri, Sep 12, 2025 at 03:24:42PM +0000, Jon Kohler wrote:
> >> 
> >> 
> >>> On Sep 12, 2025, at 4:50 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>> 
> >>> 
> >>> On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
> >>>> Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
> >>>> sendmsg") tries to defer the notification enabling by moving the logic
> >>>> out of the loop after the vhost_tx_batch() when nothing new is
> >>>> spotted. This will bring side effects as the new logic would be reused
> >>>> for several other error conditions.
> >>>> 
> >>>> One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
> >>>> might return -EAGAIN and exit the loop and see there's still available
> >>>> buffers, so it will queue the tx work again until userspace feed the
> >>>> IOTLB entry correctly. This will slowdown the tx processing and may
> >>>> trigger the TX watchdog in the guest.
> >>> 
> >>> It's not that it might.
> >>> Pls clarify that it *has been reported* to do exactly that,
> >>> and add a link to the report.
> >>> 
> >>> 
> >>>> Fixing this by stick the notificaiton enabling logic inside the loop
> >>>> when nothing new is spotted and flush the batched before.
> >>>> 
> >>>> Reported-by: Jon Kohler <jon@nutanix.com>
> >>>> Cc: stable@vger.kernel.org
> >>>> Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>> 
> >>> So this is mostly a revert, but with
> >>>                    vhost_tx_batch(net, nvq, sock, &msg);
> >>> added in to avoid regressing performance.
> >>> 
> >>> If you do not want to structure it like this (revert+optimization),
> >>> then pls make that clear in the message.
> >>> 
> >>> 
> >>>> ---
> >>>> drivers/vhost/net.c | 33 +++++++++++++--------------------
> >>>> 1 file changed, 13 insertions(+), 20 deletions(-)
> >>>> 
> >>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >>>> index 16e39f3ab956..3611b7537932 100644
> >>>> --- a/drivers/vhost/net.c
> >>>> +++ b/drivers/vhost/net.c
> >>>> @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >>>> int err;
> >>>> int sent_pkts = 0;
> >>>> bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
> >>>> - bool busyloop_intr;
> >>>> bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
> >>>> 
> >>>> do {
> >>>> - busyloop_intr = false;
> >>>> + bool busyloop_intr = false;
> >>>> +
> >>>> if (nvq->done_idx == VHOST_NET_BATCH)
> >>>> vhost_tx_batch(net, nvq, sock, &msg);
> >>>> 
> >>>> @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >>>> break;
> >>>> /* Nothing new?  Wait for eventfd to tell us they refilled. */
> >>>> if (head == vq->num) {
> >>>> - /* Kicks are disabled at this point, break loop and
> >>>> - * process any remaining batched packets. Queue will
> >>>> - * be re-enabled afterwards.
> >>>> + /* Flush batched packets before enabling
> >>>> + * virqtueue notification to reduce
> >>>> + * unnecssary virtqueue kicks.
> >>> 
> >>> typos: virtqueue, unnecessary
> >>> 
> >>>> */
> >>>> + vhost_tx_batch(net, nvq, sock, &msg);
> >>>> + if (unlikely(busyloop_intr)) {
> >>>> + vhost_poll_queue(&vq->poll);
> >>>> + } else if (unlikely(vhost_enable_notify(&net->dev,
> >>>> + vq))) {
> >>>> + vhost_disable_notify(&net->dev, vq);
> >>>> + continue;
> >>>> + }
> >>>> break;
> >>>> }
> >> 
> >> See my comment below, but how about something like this?
> >> if (head == vq->num) {
> >> /* Flush batched packets before enabling
> >> * virtqueue notification to reduce
> >> * unnecessary virtqueue kicks.
> >> */
> >> vhost_tx_batch(net, nvq, sock, &msg);
> >> if (unlikely(busyloop_intr))
> >> /* If interrupted while doing busy polling,
> >> * requeue the handler to be fair handle_rx
> >> * as well as other tasks waiting on cpu.
> >> */
> >> vhost_poll_queue(&vq->poll);
> >> else
> >> /* All of our work has been completed;
> >> * however, before leaving the TX handler,
> >> * do one last check for work, and requeue
> >> * handler if necessary. If there is no work,
> >> * queue will be reenabled.
> >> */
> >> vhost_net_busy_poll_try_queue(net, vq);
> > 
> > 
> > I mean it's functionally equivalent, but vhost_net_busy_poll_try_queue 
> > checks the avail ring again and we just checked it.
> > Why is this a good idea?
> > This happens on good path so I dislike unnecessary work like this.
> 
> For the sake of discussion, let’s say vhost_tx_batch and the
> sendmsg within took 1 full second to complete. A lot could potentially
> happen in that amount of time. So sure, control path wise it looks like
> we just checked it, but time wise, that could have been ages ago.


Oh I forgot we had the tx batch in there.
OK then, I don't have a problem with this.


However, what I like about Jason's patch is that
it is actually a simple revert of your patch plus
a single call to
vhost_tx_batch(net, nvq, sock, &msg);

So it is a more obviously correct approach.


I'd be fine with doing what you propose on top,
with testing showing it is beneficial for performance.






> > 
> > 
> >> break;
> >> }
> >> 
> >> 
> >>>> 
> >>>> @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >>>> ++nvq->done_idx;
> >>>> } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
> >>>> 
> >>>> - /* Kicks are still disabled, dispatch any remaining batched msgs. */
> >>>> vhost_tx_batch(net, nvq, sock, &msg);
> >>>> -
> >>>> - if (unlikely(busyloop_intr))
> >>>> - /* If interrupted while doing busy polling, requeue the
> >>>> - * handler to be fair handle_rx as well as other tasks
> >>>> - * waiting on cpu.
> >>>> - */
> >>>> - vhost_poll_queue(&vq->poll);
> >>>> - else
> >>>> - /* All of our work has been completed; however, before
> >>>> - * leaving the TX handler, do one last check for work,
> >>>> - * and requeue handler if necessary. If there is no work,
> >>>> - * queue will be reenabled.
> >>>> - */
> >>>> - vhost_net_busy_poll_try_queue(net, vq);
> >> 
> >> Note: the use of vhost_net_busy_poll_try_queue was intentional in my
> >> patch as it was checking to see both conditionals.
> >> 
> >> Can we simply hoist my logic up instead?
> >> 
> >>>> }
> >>>> 
> >>>> static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> >>>> -- 
> >>>> 2.34.1
> >>> 
> >> 
> >> Tested-by: Jon Kohler <jon@nutanix.com>
> >> 
> >> Tried this out on a 6.16 host / guest that locked up with iotlb miss loop,
> >> applied this patch and all was well.
> > 
> 



* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-12 15:38           ` Michael S. Tsirkin
@ 2025-09-12 15:40             ` Jon Kohler
  0 siblings, 0 replies; 16+ messages in thread
From: Jon Kohler @ 2025-09-12 15:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eperezma@redhat.com, jonah.palmer@oracle.com,
	kuba@kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org



> On Sep 12, 2025, at 11:38 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> 
> On Fri, Sep 12, 2025 at 03:33:32PM +0000, Jon Kohler wrote:
>> 
>> 
>>> On Sep 12, 2025, at 11:30 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> 
>>> On Fri, Sep 12, 2025 at 03:24:42PM +0000, Jon Kohler wrote:
>>>> 
>>>> 
>>>>> On Sep 12, 2025, at 4:50 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> 
>>>>> 
>>>>> On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
>>>>>> Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
>>>>>> sendmsg") tries to defer the notification enabling by moving the logic
>>>>>> out of the loop after the vhost_tx_batch() when nothing new is
>>>>>> spotted. This will bring side effects as the new logic would be reused
>>>>>> for several other error conditions.
>>>>>> 
>>>>>> One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
>>>>>> might return -EAGAIN and exit the loop and see there's still available
>>>>>> buffers, so it will queue the tx work again until userspace feed the
>>>>>> IOTLB entry correctly. This will slowdown the tx processing and may
>>>>>> trigger the TX watchdog in the guest.
>>>>> 
>>>>> It's not that it might.
>>>>> Pls clarify that it *has been reported* to do exactly that,
>>>>> and add a link to the report.
>>>>> 
>>>>> 
>>>>>> Fixing this by stick the notificaiton enabling logic inside the loop
>>>>>> when nothing new is spotted and flush the batched before.
>>>>>> 
>>>>>> Reported-by: Jon Kohler <jon@nutanix.com>
>>>>>> Cc: stable@vger.kernel.org
>>>>>> Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> 
>>>>> So this is mostly a revert, but with
>>>>>                   vhost_tx_batch(net, nvq, sock, &msg);
>>>>> added in to avoid regressing performance.
>>>>> 
>>>>> If you do not want to structure it like this (revert+optimization),
>>>>> then pls make that clear in the message.
>>>>> 
>>>>> 
>>>>>> ---
>>>>>> drivers/vhost/net.c | 33 +++++++++++++--------------------
>>>>>> 1 file changed, 13 insertions(+), 20 deletions(-)
>>>>>> 
>>>>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>>>> index 16e39f3ab956..3611b7537932 100644
>>>>>> --- a/drivers/vhost/net.c
>>>>>> +++ b/drivers/vhost/net.c
>>>>>> @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>>>>>> int err;
>>>>>> int sent_pkts = 0;
>>>>>> bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
>>>>>> - bool busyloop_intr;
>>>>>> bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
>>>>>> 
>>>>>> do {
>>>>>> - busyloop_intr = false;
>>>>>> + bool busyloop_intr = false;
>>>>>> +
>>>>>> if (nvq->done_idx == VHOST_NET_BATCH)
>>>>>> vhost_tx_batch(net, nvq, sock, &msg);
>>>>>> 
>>>>>> @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>>>>>> break;
>>>>>> /* Nothing new?  Wait for eventfd to tell us they refilled. */
>>>>>> if (head == vq->num) {
>>>>>> - /* Kicks are disabled at this point, break loop and
>>>>>> - * process any remaining batched packets. Queue will
>>>>>> - * be re-enabled afterwards.
>>>>>> + /* Flush batched packets before enabling
>>>>>> + * virqtueue notification to reduce
>>>>>> + * unnecssary virtqueue kicks.
>>>>> 
>>>>> typos: virtqueue, unnecessary
>>>>> 
>>>>>> */
>>>>>> + vhost_tx_batch(net, nvq, sock, &msg);
>>>>>> + if (unlikely(busyloop_intr)) {
>>>>>> + vhost_poll_queue(&vq->poll);
>>>>>> + } else if (unlikely(vhost_enable_notify(&net->dev,
>>>>>> + vq))) {
>>>>>> + vhost_disable_notify(&net->dev, vq);
>>>>>> + continue;
>>>>>> + }
>>>>>> break;
>>>>>> }
>>>> 
>>>> See my comment below, but how about something like this?
>>>> if (head == vq->num) {
>>>> /* Flush batched packets before enabling
>>>> * virtqueue notification to reduce
>>>> * unnecessary virtqueue kicks.
>>>> */
>>>> vhost_tx_batch(net, nvq, sock, &msg);
>>>> if (unlikely(busyloop_intr))
>>>> /* If interrupted while doing busy polling,
>>>> * requeue the handler to be fair handle_rx
>>>> * as well as other tasks waiting on cpu.
>>>> */
>>>> vhost_poll_queue(&vq->poll);
>>>> else
>>>> /* All of our work has been completed;
>>>> * however, before leaving the TX handler,
>>>> * do one last check for work, and requeue
>>>> * handler if necessary. If there is no work,
>>>> * queue will be reenabled.
>>>> */
>>>> vhost_net_busy_poll_try_queue(net, vq);
>>> 
>>> 
>>> I mean it's functionally equivalent, but vhost_net_busy_poll_try_queue 
>>> checks the avail ring again and we just checked it.
>>> Why is this a good idea?
>>> This happens on good path so I dislike unnecessary work like this.
>> 
>> For the sake of discussion, let’s say vhost_tx_batch and the
>> sendmsg within took 1 full second to complete. A lot could potentially
>> happen in that amount of time. So sure, control path wise it looks like
>> we just checked it, but time wise, that could have been ages ago.
> 
> 
> Oh I forgot we had the tx batch in there.
> OK then, I don't have a problem with this.
> 
> 
> However, what I like about Jason's patch is that
> it is actually a simple revert of your patch plus
> a single call to
> vhost_tx_batch(net, nvq, sock, &msg);
> 
> So it is a more obviously correct approach.
> 
> 
> I'd be fine with doing what you propose on top,
> with testing showing it is beneficial for performance.

OK, fair enough, agreed: let’s fix the bug first,
then re-optimize on top.

> 
> 
> 
> 
> 
> 
>>> 
>>> 
>>>> break;
>>>> }
>>>> 
>>>> 
>>>>>> 
>>>>>> @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>>>>>> ++nvq->done_idx;
>>>>>> } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
>>>>>> 
>>>>>> - /* Kicks are still disabled, dispatch any remaining batched msgs. */
>>>>>> vhost_tx_batch(net, nvq, sock, &msg);
>>>>>> -
>>>>>> - if (unlikely(busyloop_intr))
>>>>>> - /* If interrupted while doing busy polling, requeue the
>>>>>> - * handler to be fair handle_rx as well as other tasks
>>>>>> - * waiting on cpu.
>>>>>> - */
>>>>>> - vhost_poll_queue(&vq->poll);
>>>>>> - else
>>>>>> - /* All of our work has been completed; however, before
>>>>>> - * leaving the TX handler, do one last check for work,
>>>>>> - * and requeue handler if necessary. If there is no work,
>>>>>> - * queue will be reenabled.
>>>>>> - */
>>>>>> - vhost_net_busy_poll_try_queue(net, vq);
>>>> 
>>>> Note: the use of vhost_net_busy_poll_try_queue was intentional in my
>>>> patch as it was checking to see both conditionals.
>>>> 
>>>> Can we simply hoist my logic up instead?
>>>> 
>>>>>> }
>>>>>> 
>>>>>> static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>>>>>> -- 
>>>>>> 2.34.1
>>>>> 
>>>> 
>>>> Tested-by: Jon Kohler <jon@nutanix.com>
>>>> 
>>>> Tried this out on a 6.16 host / guest that locked up with iotlb miss loop,
>>>> applied this patch and all was well.
>>> 
>> 
> 



* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-12  8:26 ` [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification Jason Wang
  2025-09-12  8:50   ` Michael S. Tsirkin
@ 2025-09-15 16:03   ` Michael S. Tsirkin
  2025-09-16  2:37     ` Jason Wang
  1 sibling, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2025-09-15 16:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: eperezma, jonah.palmer, kuba, jon, kvm, virtualization, netdev,
	linux-kernel, stable

On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
> Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
> sendmsg") tries to defer the notification enabling by moving the logic
> out of the loop after the vhost_tx_batch() when nothing new is
> spotted. This will bring side effects as the new logic would be reused
> for several other error conditions.
> 
> One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
> might return -EAGAIN and exit the loop and see there's still available
> buffers, so it will queue the tx work again until userspace feed the
> IOTLB entry correctly. This will slowdown the tx processing and may
> trigger the TX watchdog in the guest.
> 
> Fixing this by stick the notificaiton enabling logic inside the loop
> when nothing new is spotted and flush the batched before.
> 
> Reported-by: Jon Kohler <jon@nutanix.com>
> Cc: stable@vger.kernel.org
> Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c | 33 +++++++++++++--------------------
>  1 file changed, 13 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 16e39f3ab956..3611b7537932 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  	int err;
>  	int sent_pkts = 0;
>  	bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
> -	bool busyloop_intr;
>  	bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
>  
>  	do {
> -		busyloop_intr = false;
> +		bool busyloop_intr = false;
> +
>  		if (nvq->done_idx == VHOST_NET_BATCH)
>  			vhost_tx_batch(net, nvq, sock, &msg);
>  
> @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  			break;
>  		/* Nothing new?  Wait for eventfd to tell us they refilled. */
>  		if (head == vq->num) {
> -			/* Kicks are disabled at this point, break loop and
> -			 * process any remaining batched packets. Queue will
> -			 * be re-enabled afterwards.
> +			/* Flush batched packets before enabling
> +			 * virqtueue notification to reduce
> +			 * unnecssary virtqueue kicks.
>  			 */
> +			vhost_tx_batch(net, nvq, sock, &msg);

So why don't we do this in the "else" branch"? If we are busy polling
then we are not enabling kicks, so is there a reason to flush?


> +			if (unlikely(busyloop_intr)) {
> +				vhost_poll_queue(&vq->poll);
> +			} else if (unlikely(vhost_enable_notify(&net->dev,
> +								vq))) {
> +				vhost_disable_notify(&net->dev, vq);
> +				continue;
> +			}
>  			break;
>  		}
>  
> @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  		++nvq->done_idx;
>  	} while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
>  
> -	/* Kicks are still disabled, dispatch any remaining batched msgs. */
>  	vhost_tx_batch(net, nvq, sock, &msg);
> -
> -	if (unlikely(busyloop_intr))
> -		/* If interrupted while doing busy polling, requeue the
> -		 * handler to be fair handle_rx as well as other tasks
> -		 * waiting on cpu.
> -		 */
> -		vhost_poll_queue(&vq->poll);
> -	else
> -		/* All of our work has been completed; however, before
> -		 * leaving the TX handler, do one last check for work,
> -		 * and requeue handler if necessary. If there is no work,
> -		 * queue will be reenabled.
> -		 */
> -		vhost_net_busy_poll_try_queue(net, vq);
>  }
>  
>  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> -- 
> 2.34.1



* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-15 16:03   ` Michael S. Tsirkin
@ 2025-09-16  2:37     ` Jason Wang
  2025-09-16  5:18       ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2025-09-16  2:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eperezma, jonah.palmer, kuba, jon, kvm, virtualization, netdev,
	linux-kernel, stable

On Tue, Sep 16, 2025 at 12:03 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
> > Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
> > sendmsg") tries to defer the notification enabling by moving the logic
> > out of the loop after the vhost_tx_batch() when nothing new is
> > spotted. This will bring side effects as the new logic would be reused
> > for several other error conditions.
> >
> > One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
> > might return -EAGAIN and exit the loop and see there's still available
> > buffers, so it will queue the tx work again until userspace feed the
> > IOTLB entry correctly. This will slowdown the tx processing and may
> > trigger the TX watchdog in the guest.
> >
> > Fixing this by stick the notificaiton enabling logic inside the loop
> > when nothing new is spotted and flush the batched before.
> >
> > Reported-by: Jon Kohler <jon@nutanix.com>
> > Cc: stable@vger.kernel.org
> > Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/vhost/net.c | 33 +++++++++++++--------------------
> >  1 file changed, 13 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 16e39f3ab956..3611b7537932 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >       int err;
> >       int sent_pkts = 0;
> >       bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
> > -     bool busyloop_intr;
> >       bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
> >
> >       do {
> > -             busyloop_intr = false;
> > +             bool busyloop_intr = false;
> > +
> >               if (nvq->done_idx == VHOST_NET_BATCH)
> >                       vhost_tx_batch(net, nvq, sock, &msg);
> >
> > @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >                       break;
> >               /* Nothing new?  Wait for eventfd to tell us they refilled. */
> >               if (head == vq->num) {
> > -                     /* Kicks are disabled at this point, break loop and
> > -                      * process any remaining batched packets. Queue will
> > -                      * be re-enabled afterwards.
> > +                     /* Flush batched packets before enabling
> > +                      * virqtueue notification to reduce
> > +                      * unnecssary virtqueue kicks.
> >                        */
> > +                     vhost_tx_batch(net, nvq, sock, &msg);
>
> So why don't we do this in the "else" branch"? If we are busy polling
> then we are not enabling kicks, so is there a reason to flush?

It should be functionally equivalent:

do {
    if (head == vq->num) {
        vhost_tx_batch();
        if (unlikely(busyloop_intr)) {
            vhost_poll_queue();
        } else if () {
            vhost_disable_notify(&net->dev, vq);
            continue;
        }
        return;
    }
} while (...);

vs

do {
    if (head == vq->num) {
        if (unlikely(busyloop_intr)) {
            vhost_poll_queue();
        } else if () {
            vhost_tx_batch();
            vhost_disable_notify(&net->dev, vq);
            continue;
        }
        break;
    }
} while (...);

vhost_tx_batch();
return;

Thanks


>
>
> > +                     if (unlikely(busyloop_intr)) {
> > +                             vhost_poll_queue(&vq->poll);
> > +                     } else if (unlikely(vhost_enable_notify(&net->dev,
> > +                                                             vq))) {
> > +                             vhost_disable_notify(&net->dev, vq);
> > +                             continue;
> > +                     }
> >                       break;
> >               }
> >
> > @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >               ++nvq->done_idx;
> >       } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
> >
> > -     /* Kicks are still disabled, dispatch any remaining batched msgs. */
> >       vhost_tx_batch(net, nvq, sock, &msg);
> > -
> > -     if (unlikely(busyloop_intr))
> > -             /* If interrupted while doing busy polling, requeue the
> > -              * handler to be fair handle_rx as well as other tasks
> > -              * waiting on cpu.
> > -              */
> > -             vhost_poll_queue(&vq->poll);
> > -     else
> > -             /* All of our work has been completed; however, before
> > -              * leaving the TX handler, do one last check for work,
> > -              * and requeue handler if necessary. If there is no work,
> > -              * queue will be reenabled.
> > -              */
> > -             vhost_net_busy_poll_try_queue(net, vq);
> >  }
> >
> >  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> > --
> > 2.34.1
>



* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-16  2:37     ` Jason Wang
@ 2025-09-16  5:18       ` Michael S. Tsirkin
  2025-09-16  6:24         ` Jason Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2025-09-16  5:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: eperezma, jonah.palmer, kuba, jon, kvm, virtualization, netdev,
	linux-kernel, stable

On Tue, Sep 16, 2025 at 10:37:35AM +0800, Jason Wang wrote:
> On Tue, Sep 16, 2025 at 12:03 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
> > > Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
> > > sendmsg") tries to defer the notification enabling by moving the logic
> > > out of the loop after the vhost_tx_batch() when nothing new is
> > > spotted. This will bring side effects as the new logic would be reused
> > > for several other error conditions.
> > >
> > > One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
> > > might return -EAGAIN and exit the loop and see there's still available
> > > buffers, so it will queue the tx work again until userspace feed the
> > > IOTLB entry correctly. This will slowdown the tx processing and may
> > > trigger the TX watchdog in the guest.
> > >
> > > Fixing this by stick the notificaiton enabling logic inside the loop
> > > when nothing new is spotted and flush the batched before.
> > >
> > > Reported-by: Jon Kohler <jon@nutanix.com>
> > > Cc: stable@vger.kernel.org
> > > Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >  drivers/vhost/net.c | 33 +++++++++++++--------------------
> > >  1 file changed, 13 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index 16e39f3ab956..3611b7537932 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > >       int err;
> > >       int sent_pkts = 0;
> > >       bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
> > > -     bool busyloop_intr;
> > >       bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
> > >
> > >       do {
> > > -             busyloop_intr = false;
> > > +             bool busyloop_intr = false;
> > > +
> > >               if (nvq->done_idx == VHOST_NET_BATCH)
> > >                       vhost_tx_batch(net, nvq, sock, &msg);
> > >
> > > @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > >                       break;
> > >               /* Nothing new?  Wait for eventfd to tell us they refilled. */
> > >               if (head == vq->num) {
> > > -                     /* Kicks are disabled at this point, break loop and
> > > -                      * process any remaining batched packets. Queue will
> > > -                      * be re-enabled afterwards.
> > > +                     /* Flush batched packets before enabling
> > > +                      * virqtueue notification to reduce
> > > +                      * unnecssary virtqueue kicks.
> > >                        */
> > > +                     vhost_tx_batch(net, nvq, sock, &msg);
> >
> > So why don't we do this in the "else" branch"? If we are busy polling
> > then we are not enabling kicks, so is there a reason to flush?
> 
> It should be functionally equivalent:
> 
> do {
>     if (head == vq->num) {
>         vhost_tx_batch();
>         if (unlikely(busyloop_intr)) {
>             vhost_poll_queue()
>         } else if () {
>             vhost_disable_notify(&net->dev, vq);
>             continue;
>         }
>         return;
> }
> 
> vs
> 
> do {
>     if (head == vq->num) {
>         if (unlikely(busyloop_intr)) {
>             vhost_poll_queue()
>         } else if () {
>             vhost_tx_batch();
>             vhost_disable_notify(&net->dev, vq);
>             continue;
>         }
>         break;
> }
> 
> vhost_tx_batch();
> return;
> 
> Thanks
>

But this is not what the code comment says:

                     /* Flush batched packets before enabling
                      * virqtueue notification to reduce
                      * unnecssary virtqueue kicks.


So I ask: if we queued more polling, why do we need
to flush batched packets? We might get more in the next
polling round; this is what polling is designed to do.

 
> 
> >
> >
> > > +                     if (unlikely(busyloop_intr)) {
> > > +                             vhost_poll_queue(&vq->poll);
> > > +                     } else if (unlikely(vhost_enable_notify(&net->dev,
> > > +                                                             vq))) {
> > > +                             vhost_disable_notify(&net->dev, vq);
> > > +                             continue;
> > > +                     }
> > >                       break;
> > >               }
> > >
> > > @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > >               ++nvq->done_idx;
> > >       } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
> > >
> > > -     /* Kicks are still disabled, dispatch any remaining batched msgs. */
> > >       vhost_tx_batch(net, nvq, sock, &msg);
> > > -
> > > -     if (unlikely(busyloop_intr))
> > > -             /* If interrupted while doing busy polling, requeue the
> > > -              * handler to be fair handle_rx as well as other tasks
> > > -              * waiting on cpu.
> > > -              */
> > > -             vhost_poll_queue(&vq->poll);
> > > -     else
> > > -             /* All of our work has been completed; however, before
> > > -              * leaving the TX handler, do one last check for work,
> > > -              * and requeue handler if necessary. If there is no work,
> > > -              * queue will be reenabled.
> > > -              */
> > > -             vhost_net_busy_poll_try_queue(net, vq);
> > >  }
> > >
> > >  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> > > --
> > > 2.34.1
> >


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-16  5:18       ` Michael S. Tsirkin
@ 2025-09-16  6:24         ` Jason Wang
  2025-09-16  7:07           ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2025-09-16  6:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eperezma, jonah.palmer, kuba, jon, kvm, virtualization, netdev,
	linux-kernel, stable

On Tue, Sep 16, 2025 at 1:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Sep 16, 2025 at 10:37:35AM +0800, Jason Wang wrote:
> > On Tue, Sep 16, 2025 at 12:03 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
> > > > Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
> > > > sendmsg") tries to defer the notification enabling by moving the logic
> > > > out of the loop after the vhost_tx_batch() when nothing new is
> > > > spotted. This will bring side effects as the new logic would be reused
> > > > for several other error conditions.
> > > >
> > > > One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
> > > > might return -EAGAIN and exit the loop and see there's still available
> > > > buffers, so it will queue the tx work again until userspace feed the
> > > > IOTLB entry correctly. This will slowdown the tx processing and may
> > > > trigger the TX watchdog in the guest.
> > > >
> > > > Fixing this by stick the notificaiton enabling logic inside the loop
> > > > when nothing new is spotted and flush the batched before.
> > > >
> > > > Reported-by: Jon Kohler <jon@nutanix.com>
> > > > Cc: stable@vger.kernel.org
> > > > Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > >  drivers/vhost/net.c | 33 +++++++++++++--------------------
> > > >  1 file changed, 13 insertions(+), 20 deletions(-)
> > > >
> > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > > index 16e39f3ab956..3611b7537932 100644
> > > > --- a/drivers/vhost/net.c
> > > > +++ b/drivers/vhost/net.c
> > > > @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > >       int err;
> > > >       int sent_pkts = 0;
> > > >       bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
> > > > -     bool busyloop_intr;
> > > >       bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
> > > >
> > > >       do {
> > > > -             busyloop_intr = false;
> > > > +             bool busyloop_intr = false;
> > > > +
> > > >               if (nvq->done_idx == VHOST_NET_BATCH)
> > > >                       vhost_tx_batch(net, nvq, sock, &msg);
> > > >
> > > > @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > >                       break;
> > > >               /* Nothing new?  Wait for eventfd to tell us they refilled. */
> > > >               if (head == vq->num) {
> > > > -                     /* Kicks are disabled at this point, break loop and
> > > > -                      * process any remaining batched packets. Queue will
> > > > -                      * be re-enabled afterwards.
> > > > +                     /* Flush batched packets before enabling
> > > > +                      * virqtueue notification to reduce
> > > > +                      * unnecssary virtqueue kicks.
> > > >                        */
> > > > +                     vhost_tx_batch(net, nvq, sock, &msg);
> > >
> > > So why don't we do this in the "else" branch"? If we are busy polling
> > > then we are not enabling kicks, so is there a reason to flush?
> >
> > It should be functional equivalent:
> >
> > do {
> >     if (head == vq->num) {
> >         vhost_tx_batch();
> >         if (unlikely(busyloop_intr)) {
> >             vhost_poll_queue()
> >         } else if () {
> >             vhost_disable_notify(&net->dev, vq);
> >             continue;
> >         }
> >         return;
> > }
> >
> > vs
> >
> > do {
> >     if (head == vq->num) {
> >         if (unlikely(busyloop_intr)) {
> >             vhost_poll_queue()
> >         } else if () {
> >             vhost_tx_batch();
> >             vhost_disable_notify(&net->dev, vq);
> >             continue;
> >         }
> >         break;
> > }
> >
> > vhost_tx_batch();
> > return;
> >
> > Thanks
> >
>
> But this is not what the code comment says:
>
>                      /* Flush batched packets before enabling
>                       * virqtueue notification to reduce
>                       * unnecssary virtqueue kicks.
>
>
> So I ask - of we queued more polling, why do we need
> to flush batched packets? We might get more in the next
> polling round, this is what polling is designed to do.

The reason is that there could be RX work pending when busyloop_intr is
true, so we need to flush.

Thanks

>
>
> >
> > >
> > >
> > > > +                     if (unlikely(busyloop_intr)) {
> > > > +                             vhost_poll_queue(&vq->poll);
> > > > +                     } else if (unlikely(vhost_enable_notify(&net->dev,
> > > > +                                                             vq))) {
> > > > +                             vhost_disable_notify(&net->dev, vq);
> > > > +                             continue;
> > > > +                     }
> > > >                       break;
> > > >               }
> > > >
> > > > @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > >               ++nvq->done_idx;
> > > >       } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
> > > >
> > > > -     /* Kicks are still disabled, dispatch any remaining batched msgs. */
> > > >       vhost_tx_batch(net, nvq, sock, &msg);
> > > > -
> > > > -     if (unlikely(busyloop_intr))
> > > > -             /* If interrupted while doing busy polling, requeue the
> > > > -              * handler to be fair handle_rx as well as other tasks
> > > > -              * waiting on cpu.
> > > > -              */
> > > > -             vhost_poll_queue(&vq->poll);
> > > > -     else
> > > > -             /* All of our work has been completed; however, before
> > > > -              * leaving the TX handler, do one last check for work,
> > > > -              * and requeue handler if necessary. If there is no work,
> > > > -              * queue will be reenabled.
> > > > -              */
> > > > -             vhost_net_busy_poll_try_queue(net, vq);
> > > >  }
> > > >
> > > >  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> > > > --
> > > > 2.34.1
> > >
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-16  6:24         ` Jason Wang
@ 2025-09-16  7:07           ` Michael S. Tsirkin
  2025-09-16  7:20             ` Jason Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2025-09-16  7:07 UTC (permalink / raw)
  To: Jason Wang
  Cc: eperezma, jonah.palmer, kuba, jon, kvm, virtualization, netdev,
	linux-kernel, stable

On Tue, Sep 16, 2025 at 02:24:22PM +0800, Jason Wang wrote:
> On Tue, Sep 16, 2025 at 1:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Sep 16, 2025 at 10:37:35AM +0800, Jason Wang wrote:
> > > On Tue, Sep 16, 2025 at 12:03 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
> > > > > Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
> > > > > sendmsg") tries to defer the notification enabling by moving the logic
> > > > > out of the loop after the vhost_tx_batch() when nothing new is
> > > > > spotted. This will bring side effects as the new logic would be reused
> > > > > for several other error conditions.
> > > > >
> > > > > One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
> > > > > might return -EAGAIN and exit the loop and see there's still available
> > > > > buffers, so it will queue the tx work again until userspace feed the
> > > > > IOTLB entry correctly. This will slowdown the tx processing and may
> > > > > trigger the TX watchdog in the guest.
> > > > >
> > > > > Fixing this by stick the notificaiton enabling logic inside the loop
> > > > > when nothing new is spotted and flush the batched before.
> > > > >
> > > > > Reported-by: Jon Kohler <jon@nutanix.com>
> > > > > Cc: stable@vger.kernel.org
> > > > > Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > ---
> > > > >  drivers/vhost/net.c | 33 +++++++++++++--------------------
> > > > >  1 file changed, 13 insertions(+), 20 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > > > index 16e39f3ab956..3611b7537932 100644
> > > > > --- a/drivers/vhost/net.c
> > > > > +++ b/drivers/vhost/net.c
> > > > > @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > > >       int err;
> > > > >       int sent_pkts = 0;
> > > > >       bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
> > > > > -     bool busyloop_intr;
> > > > >       bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
> > > > >
> > > > >       do {
> > > > > -             busyloop_intr = false;
> > > > > +             bool busyloop_intr = false;
> > > > > +
> > > > >               if (nvq->done_idx == VHOST_NET_BATCH)
> > > > >                       vhost_tx_batch(net, nvq, sock, &msg);
> > > > >
> > > > > @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > > >                       break;
> > > > >               /* Nothing new?  Wait for eventfd to tell us they refilled. */
> > > > >               if (head == vq->num) {
> > > > > -                     /* Kicks are disabled at this point, break loop and
> > > > > -                      * process any remaining batched packets. Queue will
> > > > > -                      * be re-enabled afterwards.
> > > > > +                     /* Flush batched packets before enabling
> > > > > +                      * virqtueue notification to reduce
> > > > > +                      * unnecssary virtqueue kicks.
> > > > >                        */
> > > > > +                     vhost_tx_batch(net, nvq, sock, &msg);
> > > >
> > > > So why don't we do this in the "else" branch"? If we are busy polling
> > > > then we are not enabling kicks, so is there a reason to flush?
> > >
> > > It should be functional equivalent:
> > >
> > > do {
> > >     if (head == vq->num) {
> > >         vhost_tx_batch();
> > >         if (unlikely(busyloop_intr)) {
> > >             vhost_poll_queue()
> > >         } else if () {
> > >             vhost_disable_notify(&net->dev, vq);
> > >             continue;
> > >         }
> > >         return;
> > > }
> > >
> > > vs
> > >
> > > do {
> > >     if (head == vq->num) {
> > >         if (unlikely(busyloop_intr)) {
> > >             vhost_poll_queue()
> > >         } else if () {
> > >             vhost_tx_batch();
> > >             vhost_disable_notify(&net->dev, vq);
> > >             continue;
> > >         }
> > >         break;
> > > }
> > >
> > > vhost_tx_batch();
> > > return;
> > >
> > > Thanks
> > >
> >
> > But this is not what the code comment says:
> >
> >                      /* Flush batched packets before enabling
> >                       * virqtueue notification to reduce
> >                       * unnecssary virtqueue kicks.
> >
> >
> > So I ask - of we queued more polling, why do we need
> > to flush batched packets? We might get more in the next
> > polling round, this is what polling is designed to do.
> 
> The reason is there could be a rx work when busyloop_intr is true, so
> we need to flush.
> 
> Thanks

Then you need to update the comment to explain.
Want to post your version of this patchset?


> >
> >
> > >
> > > >
> > > >
> > > > > +                     if (unlikely(busyloop_intr)) {
> > > > > +                             vhost_poll_queue(&vq->poll);
> > > > > +                     } else if (unlikely(vhost_enable_notify(&net->dev,
> > > > > +                                                             vq))) {
> > > > > +                             vhost_disable_notify(&net->dev, vq);
> > > > > +                             continue;
> > > > > +                     }
> > > > >                       break;
> > > > >               }
> > > > >
> > > > > @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > > >               ++nvq->done_idx;
> > > > >       } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
> > > > >
> > > > > -     /* Kicks are still disabled, dispatch any remaining batched msgs. */
> > > > >       vhost_tx_batch(net, nvq, sock, &msg);
> > > > > -
> > > > > -     if (unlikely(busyloop_intr))
> > > > > -             /* If interrupted while doing busy polling, requeue the
> > > > > -              * handler to be fair handle_rx as well as other tasks
> > > > > -              * waiting on cpu.
> > > > > -              */
> > > > > -             vhost_poll_queue(&vq->poll);
> > > > > -     else
> > > > > -             /* All of our work has been completed; however, before
> > > > > -              * leaving the TX handler, do one last check for work,
> > > > > -              * and requeue handler if necessary. If there is no work,
> > > > > -              * queue will be reenabled.
> > > > > -              */
> > > > > -             vhost_net_busy_poll_try_queue(net, vq);
> > > > >  }
> > > > >
> > > > >  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> > > > > --
> > > > > 2.34.1
> > > >
> >


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-16  7:07           ` Michael S. Tsirkin
@ 2025-09-16  7:20             ` Jason Wang
  2025-09-16  9:39               ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2025-09-16  7:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eperezma, jonah.palmer, kuba, jon, kvm, virtualization, netdev,
	linux-kernel, stable

On Tue, Sep 16, 2025 at 3:08 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Sep 16, 2025 at 02:24:22PM +0800, Jason Wang wrote:
> > On Tue, Sep 16, 2025 at 1:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Sep 16, 2025 at 10:37:35AM +0800, Jason Wang wrote:
> > > > On Tue, Sep 16, 2025 at 12:03 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
> > > > > > Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
> > > > > > sendmsg") tries to defer the notification enabling by moving the logic
> > > > > > out of the loop after the vhost_tx_batch() when nothing new is
> > > > > > spotted. This will bring side effects as the new logic would be reused
> > > > > > for several other error conditions.
> > > > > >
> > > > > > One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
> > > > > > might return -EAGAIN and exit the loop and see there's still available
> > > > > > buffers, so it will queue the tx work again until userspace feed the
> > > > > > IOTLB entry correctly. This will slowdown the tx processing and may
> > > > > > trigger the TX watchdog in the guest.
> > > > > >
> > > > > > Fixing this by stick the notificaiton enabling logic inside the loop
> > > > > > when nothing new is spotted and flush the batched before.
> > > > > >
> > > > > > Reported-by: Jon Kohler <jon@nutanix.com>
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > ---
> > > > > >  drivers/vhost/net.c | 33 +++++++++++++--------------------
> > > > > >  1 file changed, 13 insertions(+), 20 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > > > > index 16e39f3ab956..3611b7537932 100644
> > > > > > --- a/drivers/vhost/net.c
> > > > > > +++ b/drivers/vhost/net.c
> > > > > > @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > > > >       int err;
> > > > > >       int sent_pkts = 0;
> > > > > >       bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
> > > > > > -     bool busyloop_intr;
> > > > > >       bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
> > > > > >
> > > > > >       do {
> > > > > > -             busyloop_intr = false;
> > > > > > +             bool busyloop_intr = false;
> > > > > > +
> > > > > >               if (nvq->done_idx == VHOST_NET_BATCH)
> > > > > >                       vhost_tx_batch(net, nvq, sock, &msg);
> > > > > >
> > > > > > @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > > > >                       break;
> > > > > >               /* Nothing new?  Wait for eventfd to tell us they refilled. */
> > > > > >               if (head == vq->num) {
> > > > > > -                     /* Kicks are disabled at this point, break loop and
> > > > > > -                      * process any remaining batched packets. Queue will
> > > > > > -                      * be re-enabled afterwards.
> > > > > > +                     /* Flush batched packets before enabling
> > > > > > +                      * virqtueue notification to reduce
> > > > > > +                      * unnecssary virtqueue kicks.
> > > > > >                        */
> > > > > > +                     vhost_tx_batch(net, nvq, sock, &msg);
> > > > >
> > > > > So why don't we do this in the "else" branch"? If we are busy polling
> > > > > then we are not enabling kicks, so is there a reason to flush?
> > > >
> > > > It should be functional equivalent:
> > > >
> > > > do {
> > > >     if (head == vq->num) {
> > > >         vhost_tx_batch();
> > > >         if (unlikely(busyloop_intr)) {
> > > >             vhost_poll_queue()
> > > >         } else if () {
> > > >             vhost_disable_notify(&net->dev, vq);
> > > >             continue;
> > > >         }
> > > >         return;
> > > > }
> > > >
> > > > vs
> > > >
> > > > do {
> > > >     if (head == vq->num) {
> > > >         if (unlikely(busyloop_intr)) {
> > > >             vhost_poll_queue()
> > > >         } else if () {
> > > >             vhost_tx_batch();
> > > >             vhost_disable_notify(&net->dev, vq);
> > > >             continue;
> > > >         }
> > > >         break;
> > > > }
> > > >
> > > > vhost_tx_batch();
> > > > return;
> > > >
> > > > Thanks
> > > >
> > >
> > > But this is not what the code comment says:
> > >
> > >                      /* Flush batched packets before enabling
> > >                       * virqtueue notification to reduce
> > >                       * unnecssary virtqueue kicks.
> > >
> > >
> > > So I ask - of we queued more polling, why do we need
> > > to flush batched packets? We might get more in the next
> > > polling round, this is what polling is designed to do.
> >
> > The reason is there could be a rx work when busyloop_intr is true, so
> > we need to flush.
> >
> > Thanks
>
> Then you need to update the comment to explain.
> Want to post your version of this patchset?

I'm fine with that, if you wish. Just to make sure: do you prefer a
patch for your vhost tree or for net?

For net, I would stick to 2 patches, since if we go for 3, the last
patch that brings back the flush looks more like an optimization.
For vhost, I can go with 3 patches, but I see that your series has been queued.

And the build of the current vhost tree is broken by:

commit 41bafbdcd27bf5ce8cd866a9b68daeb28f3ef12b (HEAD)
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Mon Sep 15 10:47:03 2025 +0800

    vhost-net: flush batched before enabling notifications

It looks like it is missing a brace.

Thanks

>
>
> > >
> > >
> > > >
> > > > >
> > > > >
> > > > > > +                     if (unlikely(busyloop_intr)) {
> > > > > > +                             vhost_poll_queue(&vq->poll);
> > > > > > +                     } else if (unlikely(vhost_enable_notify(&net->dev,
> > > > > > +                                                             vq))) {
> > > > > > +                             vhost_disable_notify(&net->dev, vq);
> > > > > > +                             continue;
> > > > > > +                     }
> > > > > >                       break;
> > > > > >               }
> > > > > >
> > > > > > @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > > > >               ++nvq->done_idx;
> > > > > >       } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
> > > > > >
> > > > > > -     /* Kicks are still disabled, dispatch any remaining batched msgs. */
> > > > > >       vhost_tx_batch(net, nvq, sock, &msg);
> > > > > > -
> > > > > > -     if (unlikely(busyloop_intr))
> > > > > > -             /* If interrupted while doing busy polling, requeue the
> > > > > > -              * handler to be fair handle_rx as well as other tasks
> > > > > > -              * waiting on cpu.
> > > > > > -              */
> > > > > > -             vhost_poll_queue(&vq->poll);
> > > > > > -     else
> > > > > > -             /* All of our work has been completed; however, before
> > > > > > -              * leaving the TX handler, do one last check for work,
> > > > > > -              * and requeue handler if necessary. If there is no work,
> > > > > > -              * queue will be reenabled.
> > > > > > -              */
> > > > > > -             vhost_net_busy_poll_try_queue(net, vq);
> > > > > >  }
> > > > > >
> > > > > >  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> > > > > > --
> > > > > > 2.34.1
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification
  2025-09-16  7:20             ` Jason Wang
@ 2025-09-16  9:39               ` Michael S. Tsirkin
  0 siblings, 0 replies; 16+ messages in thread
From: Michael S. Tsirkin @ 2025-09-16  9:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: eperezma, jonah.palmer, kuba, jon, kvm, virtualization, netdev,
	linux-kernel, stable

On Tue, Sep 16, 2025 at 03:20:36PM +0800, Jason Wang wrote:
> On Tue, Sep 16, 2025 at 3:08 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Sep 16, 2025 at 02:24:22PM +0800, Jason Wang wrote:
> > > On Tue, Sep 16, 2025 at 1:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Sep 16, 2025 at 10:37:35AM +0800, Jason Wang wrote:
> > > > > On Tue, Sep 16, 2025 at 12:03 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Fri, Sep 12, 2025 at 04:26:58PM +0800, Jason Wang wrote:
> > > > > > > Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after
> > > > > > > sendmsg") tries to defer the notification enabling by moving the logic
> > > > > > > out of the loop after the vhost_tx_batch() when nothing new is
> > > > > > > spotted. This will bring side effects as the new logic would be reused
> > > > > > > for several other error conditions.
> > > > > > >
> > > > > > > One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
> > > > > > > might return -EAGAIN and exit the loop and see there's still available
> > > > > > > buffers, so it will queue the tx work again until userspace feed the
> > > > > > > IOTLB entry correctly. This will slowdown the tx processing and may
> > > > > > > trigger the TX watchdog in the guest.
> > > > > > >
> > > > > > > Fixing this by stick the notificaiton enabling logic inside the loop
> > > > > > > when nothing new is spotted and flush the batched before.
> > > > > > >
> > > > > > > Reported-by: Jon Kohler <jon@nutanix.com>
> > > > > > > Cc: stable@vger.kernel.org
> > > > > > > Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
> > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > ---
> > > > > > >  drivers/vhost/net.c | 33 +++++++++++++--------------------
> > > > > > >  1 file changed, 13 insertions(+), 20 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > > > > > index 16e39f3ab956..3611b7537932 100644
> > > > > > > --- a/drivers/vhost/net.c
> > > > > > > +++ b/drivers/vhost/net.c
> > > > > > > @@ -765,11 +765,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > > > > >       int err;
> > > > > > >       int sent_pkts = 0;
> > > > > > >       bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
> > > > > > > -     bool busyloop_intr;
> > > > > > >       bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
> > > > > > >
> > > > > > >       do {
> > > > > > > -             busyloop_intr = false;
> > > > > > > +             bool busyloop_intr = false;
> > > > > > > +
> > > > > > >               if (nvq->done_idx == VHOST_NET_BATCH)
> > > > > > >                       vhost_tx_batch(net, nvq, sock, &msg);
> > > > > > >
> > > > > > > @@ -780,10 +780,18 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > > > > >                       break;
> > > > > > >               /* Nothing new?  Wait for eventfd to tell us they refilled. */
> > > > > > >               if (head == vq->num) {
> > > > > > > -                     /* Kicks are disabled at this point, break loop and
> > > > > > > -                      * process any remaining batched packets. Queue will
> > > > > > > -                      * be re-enabled afterwards.
> > > > > > > +                     /* Flush batched packets before enabling
> > > > > > > +                      * virqtueue notification to reduce
> > > > > > > +                      * unnecssary virtqueue kicks.
> > > > > > >                        */
> > > > > > > +                     vhost_tx_batch(net, nvq, sock, &msg);
> > > > > >
> > > > > > So why don't we do this in the "else" branch"? If we are busy polling
> > > > > > then we are not enabling kicks, so is there a reason to flush?
> > > > >
> > > > > It should be functional equivalent:
> > > > >
> > > > > do {
> > > > >     if (head == vq->num) {
> > > > >         vhost_tx_batch();
> > > > >         if (unlikely(busyloop_intr)) {
> > > > >             vhost_poll_queue()
> > > > >         } else if () {
> > > > >             vhost_disable_notify(&net->dev, vq);
> > > > >             continue;
> > > > >         }
> > > > >         return;
> > > > > }
> > > > >
> > > > > vs
> > > > >
> > > > > do {
> > > > >     if (head == vq->num) {
> > > > >         if (unlikely(busyloop_intr)) {
> > > > >             vhost_poll_queue()
> > > > >         } else if () {
> > > > >             vhost_tx_batch();
> > > > >             vhost_disable_notify(&net->dev, vq);
> > > > >             continue;
> > > > >         }
> > > > >         break;
> > > > > }
> > > > >
> > > > > vhost_tx_batch();
> > > > > return;
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > > > But this is not what the code comment says:
> > > >
> > > >                      /* Flush batched packets before enabling
> > > >                       * virqtueue notification to reduce
> > > >                       * unnecssary virtqueue kicks.
> > > >
> > > >
> > > > So I ask - of we queued more polling, why do we need
> > > > to flush batched packets? We might get more in the next
> > > > polling round, this is what polling is designed to do.
> > >
> > > The reason is there could be a rx work when busyloop_intr is true, so
> > > we need to flush.
> > >
> > > Thanks
> >
> > Then you need to update the comment to explain.
> > Want to post your version of this patchset?
> 
> I'm fine if you wish. Just want to make sure, do you prefer a patch
> for your vhost tree or net?
> 
> For net, I would stick to 2 patches as if we go for 3, the last patch
> that brings back flush looks more like an optimization.

Jason, it does not matter how it looks. We do not need to sneak in
features - if the right thing is to add patch 3 in net, then it is;
just add an explanation why in the cover letter. And if it is not,
then it is not, and squashing it with a revert is not a good idea.

> For vhost, I can go with 3 patches, but I see that your series has been queued.
>
> And the build of the current vhost tree is broken by:
> 
> commit 41bafbdcd27bf5ce8cd866a9b68daeb28f3ef12b (HEAD)
> Author: Michael S. Tsirkin <mst@redhat.com>
> Date:   Mon Sep 15 10:47:03 2025 +0800
> 
>     vhost-net: flush batched before enabling notifications
> 
> It looks like it misses a brace.
> 
> Thanks

Ugh forgot to commit :(
I guess this is what happens when one tries to code past midnight.
Dropped now pls do proceed.

> >
> >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > > +                     if (unlikely(busyloop_intr)) {
> > > > > > > +                             vhost_poll_queue(&vq->poll);
> > > > > > > +                     } else if (unlikely(vhost_enable_notify(&net->dev,
> > > > > > > +                                                             vq))) {
> > > > > > > +                             vhost_disable_notify(&net->dev, vq);
> > > > > > > +                             continue;
> > > > > > > +                     }
> > > > > > >                       break;
> > > > > > >               }
> > > > > > >
> > > > > > > @@ -839,22 +847,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > > > > >               ++nvq->done_idx;
> > > > > > >       } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
> > > > > > >
> > > > > > > -     /* Kicks are still disabled, dispatch any remaining batched msgs. */
> > > > > > >       vhost_tx_batch(net, nvq, sock, &msg);
> > > > > > > -
> > > > > > > -     if (unlikely(busyloop_intr))
> > > > > > > -             /* If interrupted while doing busy polling, requeue the
> > > > > > > -              * handler to be fair handle_rx as well as other tasks
> > > > > > > -              * waiting on cpu.
> > > > > > > -              */
> > > > > > > -             vhost_poll_queue(&vq->poll);
> > > > > > > -     else
> > > > > > > -             /* All of our work has been completed; however, before
> > > > > > > -              * leaving the TX handler, do one last check for work,
> > > > > > > -              * and requeue handler if necessary. If there is no work,
> > > > > > > -              * queue will be reenabled.
> > > > > > > -              */
> > > > > > > -             vhost_net_busy_poll_try_queue(net, vq);
> > > > > > >  }
> > > > > > >
> > > > > > >  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> > > > > > > --
> > > > > > > 2.34.1
> > > > > >
> > > >
> >


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-09-16  9:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-12  8:26 [PATCH net 1/2] vhost-net: unbreak busy polling Jason Wang
2025-09-12  8:26 ` [PATCH net 2/2] vhost-net: correctly flush batched packet before enabling notification Jason Wang
2025-09-12  8:50   ` Michael S. Tsirkin
2025-09-12 15:24     ` Jon Kohler
2025-09-12 15:30       ` Michael S. Tsirkin
2025-09-12 15:33         ` Jon Kohler
2025-09-12 15:38           ` Michael S. Tsirkin
2025-09-12 15:40             ` Jon Kohler
2025-09-15 16:03   ` Michael S. Tsirkin
2025-09-16  2:37     ` Jason Wang
2025-09-16  5:18       ` Michael S. Tsirkin
2025-09-16  6:24         ` Jason Wang
2025-09-16  7:07           ` Michael S. Tsirkin
2025-09-16  7:20             ` Jason Wang
2025-09-16  9:39               ` Michael S. Tsirkin
2025-09-12  8:51 ` [PATCH net 1/2] vhost-net: unbreak busy polling Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).