* [PATCH net] virtio-net: enable all napis before scheduling refill work
@ 2025-12-08 15:34 Bui Quang Minh
2025-12-08 15:48 ` Bui Quang Minh
2025-12-09 4:30 ` Jason Wang
0 siblings, 2 replies; 10+ messages in thread
From: Bui Quang Minh @ 2025-12-08 15:34 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
virtualization, linux-kernel, bpf, Bui Quang Minh
Calling napi_disable() on an already disabled napi can cause a
deadlock. In commit 4bc12818b363 ("virtio-net: disable delayed refill
when pausing rx"), to avoid that deadlock, we disable and cancel the
delayed refill work when pausing RX in virtnet_rx_pause[_all]().
However, in virtnet_rx_resume_all(), we re-enable the delayed refill
work too early, before all the receive queue napis are enabled.

The deadlock can be reproduced by running
selftests/drivers/net/hw/xsk_reconfig.py with a multiqueue virtio-net
device and inserting a cond_resched() inside the for loop in
virtnet_rx_resume_all() to increase the reproduction rate. Because the
worker processing the delayed refill work runs on the same CPU as
virtnet_rx_resume_all(), a reschedule is needed to trigger the deadlock.
In a real scenario, contention on netdev_lock can cause that reschedule.
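
For reference, the reproduction hack looks roughly like this; it is an
illustrative sketch of the pre-patch virtnet_rx_resume_all() (taken from
the lines removed below) and is not part of this patch:

/* Pre-patch virtnet_rx_resume_all(), with a cond_resched() added only
 * to widen the race window for the reproducer.
 */
static void virtnet_rx_resume_all(struct virtnet_info *vi)
{
	int i;

	enable_delayed_refill(vi);
	for (i = 0; i < vi->max_queue_pairs; i++) {
		if (i < vi->curr_queue_pairs)
			__virtnet_rx_resume(vi, &vi->rq[i], true);
		else
			__virtnet_rx_resume(vi, &vi->rq[i], false);
		cond_resched();	/* added only to reproduce the deadlock */
	}
}
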
This fixes the deadlock by ensuring that all receive queues' napis are
enabled before the delayed refill work is enabled in
virtnet_rx_resume_all() and virtnet_open().
Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
drivers/net/virtio_net.c | 59 +++++++++++++++++++---------------------
1 file changed, 28 insertions(+), 31 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8e04adb57f52..f2b1ea65767d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2858,6 +2858,20 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
 	return err != -ENOMEM;
 }
 
+static void virtnet_rx_refill_all(struct virtnet_info *vi)
+{
+	bool schedule_refill = false;
+	int i;
+
+	enable_delayed_refill(vi);
+	for (i = 0; i < vi->curr_queue_pairs; i++)
+		if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
+			schedule_refill = true;
+
+	if (schedule_refill)
+		schedule_delayed_work(&vi->refill, 0);
+}
+
 static void skb_recv_done(struct virtqueue *rvq)
 {
 	struct virtnet_info *vi = rvq->vdev->priv;
@@ -3216,19 +3230,14 @@ static int virtnet_open(struct net_device *dev)
 	struct virtnet_info *vi = netdev_priv(dev);
 	int i, err;
 
-	enable_delayed_refill(vi);
-
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-		if (i < vi->curr_queue_pairs)
-			/* Make sure we have some buffers: if oom use wq. */
-			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-				schedule_delayed_work(&vi->refill, 0);
-
 		err = virtnet_enable_queue_pair(vi, i);
 		if (err < 0)
 			goto err_enable_qp;
 	}
 
+	virtnet_rx_refill_all(vi);
+
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
 		if (vi->status & VIRTIO_NET_S_LINK_UP)
 			netif_carrier_on(vi->dev);
@@ -3463,39 +3472,27 @@ static void virtnet_rx_pause(struct virtnet_info *vi, struct receive_queue *rq)
 		__virtnet_rx_pause(vi, rq);
 }
 
-static void __virtnet_rx_resume(struct virtnet_info *vi,
-				struct receive_queue *rq,
-				bool refill)
-{
-	bool running = netif_running(vi->dev);
-	bool schedule_refill = false;
-
-	if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
-		schedule_refill = true;
-	if (running)
-		virtnet_napi_enable(rq);
-
-	if (schedule_refill)
-		schedule_delayed_work(&vi->refill, 0);
-}
-
 static void virtnet_rx_resume_all(struct virtnet_info *vi)
 {
 	int i;
 
-	enable_delayed_refill(vi);
-	for (i = 0; i < vi->max_queue_pairs; i++) {
-		if (i < vi->curr_queue_pairs)
-			__virtnet_rx_resume(vi, &vi->rq[i], true);
-		else
-			__virtnet_rx_resume(vi, &vi->rq[i], false);
+	if (netif_running(vi->dev)) {
+		for (i = 0; i < vi->max_queue_pairs; i++)
+			virtnet_napi_enable(&vi->rq[i]);
+
+		virtnet_rx_refill_all(vi);
 	}
 }
 
 static void virtnet_rx_resume(struct virtnet_info *vi, struct receive_queue *rq)
 {
-	enable_delayed_refill(vi);
-	__virtnet_rx_resume(vi, rq, true);
+	if (netif_running(vi->dev)) {
+		virtnet_napi_enable(rq);
+
+		enable_delayed_refill(vi);
+		if (!try_fill_recv(vi, rq, GFP_KERNEL))
+			schedule_delayed_work(&vi->refill, 0);
+	}
 }
 
 static int virtnet_rx_resize(struct virtnet_info *vi,
--
2.43.0
* Re: [PATCH net] virtio-net: enable all napis before scheduling refill work
2025-12-08 15:34 [PATCH net] virtio-net: enable all napis before scheduling refill work Bui Quang Minh
@ 2025-12-08 15:48 ` Bui Quang Minh
2025-12-09 4:30 ` Jason Wang
1 sibling, 0 replies; 10+ messages in thread
From: Bui Quang Minh @ 2025-12-08 15:48 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
virtualization, linux-kernel, bpf
On 12/8/25 22:34, Bui Quang Minh wrote:
> Calling napi_disable() on an already disabled napi can cause the
> deadlock. In commit 4bc12818b363 ("virtio-net: disable delayed refill
> when pausing rx"), to avoid the deadlock, when pausing the RX in
> virtnet_rx_pause[_all](), we disable and cancel the delayed refill work.
> However, in the virtnet_rx_resume_all(), we enable the delayed refill
> work too early before enabling all the receive queue napis.
>
> The deadlock can be reproduced by running
> selftests/drivers/net/hw/xsk_reconfig.py with multiqueue virtio-net
> device and inserting a cond_resched() inside the for loop in
> virtnet_rx_resume_all() to increase the success rate. Because the worker
> processing the delayed refilled work runs on the same CPU as
> virtnet_rx_resume_all(), a reschedule is needed to cause the deadlock.
> In real scenario, the contention on netdev_lock can cause the
> reschedule.
>
> This fixes the deadlock by ensuring all receive queue's napis are
> enabled before we enable the delayed refill work in
> virtnet_rx_resume_all() and virtnet_open().
>
> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> Reported-by: Paolo Abeni <pabeni@redhat.com>
> Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
I forgot to add Cc: stable@vger.kernel.org. I will add it in the next version.
Thanks,
Quang Minh.
> ---
> drivers/net/virtio_net.c | 59 +++++++++++++++++++---------------------
> 1 file changed, 28 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8e04adb57f52..f2b1ea65767d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2858,6 +2858,20 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> return err != -ENOMEM;
> }
>
> +static void virtnet_rx_refill_all(struct virtnet_info *vi)
> +{
> + bool schedule_refill = false;
> + int i;
> +
> + enable_delayed_refill(vi);
> + for (i = 0; i < vi->curr_queue_pairs; i++)
> + if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> + schedule_refill = true;
> +
> + if (schedule_refill)
> + schedule_delayed_work(&vi->refill, 0);
> +}
> +
> static void skb_recv_done(struct virtqueue *rvq)
> {
> struct virtnet_info *vi = rvq->vdev->priv;
> @@ -3216,19 +3230,14 @@ static int virtnet_open(struct net_device *dev)
> struct virtnet_info *vi = netdev_priv(dev);
> int i, err;
>
> - enable_delayed_refill(vi);
> -
> for (i = 0; i < vi->max_queue_pairs; i++) {
> - if (i < vi->curr_queue_pairs)
> - /* Make sure we have some buffers: if oom use wq. */
> - if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> - schedule_delayed_work(&vi->refill, 0);
> -
> err = virtnet_enable_queue_pair(vi, i);
> if (err < 0)
> goto err_enable_qp;
> }
>
> + virtnet_rx_refill_all(vi);
> +
> if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> if (vi->status & VIRTIO_NET_S_LINK_UP)
> netif_carrier_on(vi->dev);
> @@ -3463,39 +3472,27 @@ static void virtnet_rx_pause(struct virtnet_info *vi, struct receive_queue *rq)
> __virtnet_rx_pause(vi, rq);
> }
>
> -static void __virtnet_rx_resume(struct virtnet_info *vi,
> - struct receive_queue *rq,
> - bool refill)
> -{
> - bool running = netif_running(vi->dev);
> - bool schedule_refill = false;
> -
> - if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
> - schedule_refill = true;
> - if (running)
> - virtnet_napi_enable(rq);
> -
> - if (schedule_refill)
> - schedule_delayed_work(&vi->refill, 0);
> -}
> -
> static void virtnet_rx_resume_all(struct virtnet_info *vi)
> {
> int i;
>
> - enable_delayed_refill(vi);
> - for (i = 0; i < vi->max_queue_pairs; i++) {
> - if (i < vi->curr_queue_pairs)
> - __virtnet_rx_resume(vi, &vi->rq[i], true);
> - else
> - __virtnet_rx_resume(vi, &vi->rq[i], false);
> + if (netif_running(vi->dev)) {
> + for (i = 0; i < vi->max_queue_pairs; i++)
> + virtnet_napi_enable(&vi->rq[i]);
> +
> + virtnet_rx_refill_all(vi);
> }
> }
>
> static void virtnet_rx_resume(struct virtnet_info *vi, struct receive_queue *rq)
> {
> - enable_delayed_refill(vi);
> - __virtnet_rx_resume(vi, rq, true);
> + if (netif_running(vi->dev)) {
> + virtnet_napi_enable(rq);
> +
> + enable_delayed_refill(vi);
> + if (!try_fill_recv(vi, rq, GFP_KERNEL))
> + schedule_delayed_work(&vi->refill, 0);
> + }
> }
>
> static int virtnet_rx_resize(struct virtnet_info *vi,
* Re: [PATCH net] virtio-net: enable all napis before scheduling refill work
2025-12-08 15:34 [PATCH net] virtio-net: enable all napis before scheduling refill work Bui Quang Minh
2025-12-08 15:48 ` Bui Quang Minh
@ 2025-12-09 4:30 ` Jason Wang
2025-12-09 15:23 ` Bui Quang Minh
1 sibling, 1 reply; 10+ messages in thread
From: Jason Wang @ 2025-12-09 4:30 UTC (permalink / raw)
To: Bui Quang Minh
Cc: netdev, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
virtualization, linux-kernel, bpf
On Mon, Dec 8, 2025 at 11:35 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>
> Calling napi_disable() on an already disabled napi can cause the
> deadlock. In commit 4bc12818b363 ("virtio-net: disable delayed refill
> when pausing rx"), to avoid the deadlock, when pausing the RX in
> virtnet_rx_pause[_all](), we disable and cancel the delayed refill work.
> However, in the virtnet_rx_resume_all(), we enable the delayed refill
> work too early before enabling all the receive queue napis.
>
> The deadlock can be reproduced by running
> selftests/drivers/net/hw/xsk_reconfig.py with multiqueue virtio-net
> device and inserting a cond_resched() inside the for loop in
> virtnet_rx_resume_all() to increase the success rate. Because the worker
> processing the delayed refilled work runs on the same CPU as
> virtnet_rx_resume_all(), a reschedule is needed to cause the deadlock.
> In real scenario, the contention on netdev_lock can cause the
> reschedule.
>
> This fixes the deadlock by ensuring all receive queue's napis are
> enabled before we enable the delayed refill work in
> virtnet_rx_resume_all() and virtnet_open().
>
> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> Reported-by: Paolo Abeni <pabeni@redhat.com>
> Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
> ---
> drivers/net/virtio_net.c | 59 +++++++++++++++++++---------------------
> 1 file changed, 28 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8e04adb57f52..f2b1ea65767d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2858,6 +2858,20 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> return err != -ENOMEM;
> }
>
> +static void virtnet_rx_refill_all(struct virtnet_info *vi)
> +{
> + bool schedule_refill = false;
> + int i;
> +
> + enable_delayed_refill(vi);
This seems to be still racy?
For example, in virtnet_open() we had:
static int virtnet_open(struct net_device *dev)
{
	struct virtnet_info *vi = netdev_priv(dev);
	int i, err;

	for (i = 0; i < vi->max_queue_pairs; i++) {
		err = virtnet_enable_queue_pair(vi, i);
		if (err < 0)
			goto err_enable_qp;
	}

	virtnet_rx_refill_all(vi);
So both the NAPI and the refill work are enabled at this point, which
means the refill work could be scheduled and run at the same time?
Thanks
* Re: [PATCH net] virtio-net: enable all napis before scheduling refill work
2025-12-09 4:30 ` Jason Wang
@ 2025-12-09 15:23 ` Bui Quang Minh
2025-12-10 5:45 ` Jason Wang
0 siblings, 1 reply; 10+ messages in thread
From: Bui Quang Minh @ 2025-12-09 15:23 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
virtualization, linux-kernel, bpf
On 12/9/25 11:30, Jason Wang wrote:
> On Mon, Dec 8, 2025 at 11:35 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>> Calling napi_disable() on an already disabled napi can cause the
>> deadlock. In commit 4bc12818b363 ("virtio-net: disable delayed refill
>> when pausing rx"), to avoid the deadlock, when pausing the RX in
>> virtnet_rx_pause[_all](), we disable and cancel the delayed refill work.
>> However, in the virtnet_rx_resume_all(), we enable the delayed refill
>> work too early before enabling all the receive queue napis.
>>
>> The deadlock can be reproduced by running
>> selftests/drivers/net/hw/xsk_reconfig.py with multiqueue virtio-net
>> device and inserting a cond_resched() inside the for loop in
>> virtnet_rx_resume_all() to increase the success rate. Because the worker
>> processing the delayed refilled work runs on the same CPU as
>> virtnet_rx_resume_all(), a reschedule is needed to cause the deadlock.
>> In real scenario, the contention on netdev_lock can cause the
>> reschedule.
>>
>> This fixes the deadlock by ensuring all receive queue's napis are
>> enabled before we enable the delayed refill work in
>> virtnet_rx_resume_all() and virtnet_open().
>>
>> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
>> Reported-by: Paolo Abeni <pabeni@redhat.com>
>> Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
>> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
>> ---
>> drivers/net/virtio_net.c | 59 +++++++++++++++++++---------------------
>> 1 file changed, 28 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 8e04adb57f52..f2b1ea65767d 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -2858,6 +2858,20 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
>> return err != -ENOMEM;
>> }
>>
>> +static void virtnet_rx_refill_all(struct virtnet_info *vi)
>> +{
>> + bool schedule_refill = false;
>> + int i;
>> +
>> + enable_delayed_refill(vi);
> This seems to be still racy?
>
> For example, in virtnet_open() we had:
>
> static int virtnet_open(struct net_device *dev)
> {
> struct virtnet_info *vi = netdev_priv(dev);
> int i, err;
>
> for (i = 0; i < vi->max_queue_pairs; i++) {
> err = virtnet_enable_queue_pair(vi, i);
> if (err < 0)
> goto err_enable_qp;
> }
>
> virtnet_rx_refill_all(vi);
>
> So NAPI and refill work is enabled in this case, so the refill work
> could be scheduled and run at the same time?
Yes, that's what we expect. We must ensure that refill work is scheduled
only when all NAPIs are enabled. The deadlock happens when refill work
is scheduled but there are still disabled RX NAPIs.
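
For context, here is a rough sketch of what refill_work() does,
reconstructed from this discussion; the exact upstream code and helper
names may differ. The key point is that it disables and re-enables every
RX queue's NAPI (taking the netdev lock in the process), which is why it
must not run while any RX NAPI is still disabled:

static void refill_work(struct work_struct *work)
{
	struct virtnet_info *vi =
		container_of(work, struct virtnet_info, refill.work);
	bool still_empty;
	int i;

	for (i = 0; i < vi->curr_queue_pairs; i++) {
		struct receive_queue *rq = &vi->rq[i];

		/* Disabling an already-disabled NAPI here is what hangs. */
		virtnet_napi_disable(rq);
		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
		virtnet_napi_enable(rq);

		/* Retry later if the ring could not be refilled. */
		if (still_empty)
			schedule_delayed_work(&vi->refill, HZ / 2);
	}
}
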
Thanks,
Quang Minh.
* Re: [PATCH net] virtio-net: enable all napis before scheduling refill work
2025-12-09 15:23 ` Bui Quang Minh
@ 2025-12-10 5:45 ` Jason Wang
2025-12-10 15:33 ` Bui Quang Minh
0 siblings, 1 reply; 10+ messages in thread
From: Jason Wang @ 2025-12-10 5:45 UTC (permalink / raw)
To: Bui Quang Minh
Cc: netdev, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
virtualization, linux-kernel, bpf
On Tue, Dec 9, 2025 at 11:23 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>
> On 12/9/25 11:30, Jason Wang wrote:
> > On Mon, Dec 8, 2025 at 11:35 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> >> Calling napi_disable() on an already disabled napi can cause the
> >> deadlock. In commit 4bc12818b363 ("virtio-net: disable delayed refill
> >> when pausing rx"), to avoid the deadlock, when pausing the RX in
> >> virtnet_rx_pause[_all](), we disable and cancel the delayed refill work.
> >> However, in the virtnet_rx_resume_all(), we enable the delayed refill
> >> work too early before enabling all the receive queue napis.
> >>
> >> The deadlock can be reproduced by running
> >> selftests/drivers/net/hw/xsk_reconfig.py with multiqueue virtio-net
> >> device and inserting a cond_resched() inside the for loop in
> >> virtnet_rx_resume_all() to increase the success rate. Because the worker
> >> processing the delayed refilled work runs on the same CPU as
> >> virtnet_rx_resume_all(), a reschedule is needed to cause the deadlock.
> >> In real scenario, the contention on netdev_lock can cause the
> >> reschedule.
> >>
> >> This fixes the deadlock by ensuring all receive queue's napis are
> >> enabled before we enable the delayed refill work in
> >> virtnet_rx_resume_all() and virtnet_open().
> >>
> >> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> >> Reported-by: Paolo Abeni <pabeni@redhat.com>
> >> Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
> >> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
> >> ---
> >> drivers/net/virtio_net.c | 59 +++++++++++++++++++---------------------
> >> 1 file changed, 28 insertions(+), 31 deletions(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index 8e04adb57f52..f2b1ea65767d 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -2858,6 +2858,20 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> >> return err != -ENOMEM;
> >> }
> >>
> >> +static void virtnet_rx_refill_all(struct virtnet_info *vi)
> >> +{
> >> + bool schedule_refill = false;
> >> + int i;
> >> +
> >> + enable_delayed_refill(vi);
> > This seems to be still racy?
> >
> > For example, in virtnet_open() we had:
> >
> > static int virtnet_open(struct net_device *dev)
> > {
> > struct virtnet_info *vi = netdev_priv(dev);
> > int i, err;
> >
> > for (i = 0; i < vi->max_queue_pairs; i++) {
> > err = virtnet_enable_queue_pair(vi, i);
> > if (err < 0)
> > goto err_enable_qp;
> > }
> >
> > virtnet_rx_refill_all(vi);
> >
> > So NAPI and refill work is enabled in this case, so the refill work
> > could be scheduled and run at the same time?
>
> Yes, that's what we expect. We must ensure that refill work is scheduled
> only when all NAPIs are enabled. The deadlock happens when refill work
> is scheduled but there are still disabled RX NAPIs.
Just to make sure we are on the same page: I meant that after the refill
work is enabled and rq0's NAPI is enabled, the refill work could be
triggered via rq0's NAPI, so we may end up in the refill work trying to
disable rq1's NAPI while holding the netdev lock.
Thanks
>
> Thanks,
> Quang Minh.
>
* Re: [PATCH net] virtio-net: enable all napis before scheduling refill work
2025-12-10 5:45 ` Jason Wang
@ 2025-12-10 15:33 ` Bui Quang Minh
2025-12-11 7:27 ` Jason Wang
0 siblings, 1 reply; 10+ messages in thread
From: Bui Quang Minh @ 2025-12-10 15:33 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
virtualization, linux-kernel, bpf
On 12/10/25 12:45, Jason Wang wrote:
> On Tue, Dec 9, 2025 at 11:23 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>> On 12/9/25 11:30, Jason Wang wrote:
>>> On Mon, Dec 8, 2025 at 11:35 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>>>> Calling napi_disable() on an already disabled napi can cause the
>>>> deadlock. In commit 4bc12818b363 ("virtio-net: disable delayed refill
>>>> when pausing rx"), to avoid the deadlock, when pausing the RX in
>>>> virtnet_rx_pause[_all](), we disable and cancel the delayed refill work.
>>>> However, in the virtnet_rx_resume_all(), we enable the delayed refill
>>>> work too early before enabling all the receive queue napis.
>>>>
>>>> The deadlock can be reproduced by running
>>>> selftests/drivers/net/hw/xsk_reconfig.py with multiqueue virtio-net
>>>> device and inserting a cond_resched() inside the for loop in
>>>> virtnet_rx_resume_all() to increase the success rate. Because the worker
>>>> processing the delayed refilled work runs on the same CPU as
>>>> virtnet_rx_resume_all(), a reschedule is needed to cause the deadlock.
>>>> In real scenario, the contention on netdev_lock can cause the
>>>> reschedule.
>>>>
>>>> This fixes the deadlock by ensuring all receive queue's napis are
>>>> enabled before we enable the delayed refill work in
>>>> virtnet_rx_resume_all() and virtnet_open().
>>>>
>>>> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
>>>> Reported-by: Paolo Abeni <pabeni@redhat.com>
>>>> Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
>>>> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
>>>> ---
>>>> drivers/net/virtio_net.c | 59 +++++++++++++++++++---------------------
>>>> 1 file changed, 28 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index 8e04adb57f52..f2b1ea65767d 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -2858,6 +2858,20 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
>>>> return err != -ENOMEM;
>>>> }
>>>>
>>>> +static void virtnet_rx_refill_all(struct virtnet_info *vi)
>>>> +{
>>>> + bool schedule_refill = false;
>>>> + int i;
>>>> +
>>>> + enable_delayed_refill(vi);
>>> This seems to be still racy?
>>>
>>> For example, in virtnet_open() we had:
>>>
>>> static int virtnet_open(struct net_device *dev)
>>> {
>>> struct virtnet_info *vi = netdev_priv(dev);
>>> int i, err;
>>>
>>> for (i = 0; i < vi->max_queue_pairs; i++) {
>>> err = virtnet_enable_queue_pair(vi, i);
>>> if (err < 0)
>>> goto err_enable_qp;
>>> }
>>>
>>> virtnet_rx_refill_all(vi);
>>>
>>> So NAPI and refill work is enabled in this case, so the refill work
>>> could be scheduled and run at the same time?
>> Yes, that's what we expect. We must ensure that refill work is scheduled
>> only when all NAPIs are enabled. The deadlock happens when refill work
>> is scheduled but there are still disabled RX NAPIs.
> Just to make sure we are on the same page, I meant, after refill work
> is enabled, rq0 is NAPI is enabled, in this case the refill work could
> be triggered by the rq0's NAPI so we may end up in the refill work
> that it tries to disable rq1's NAPI while holding the netdev lock.
I don't quite get your point. The current deadlock scenario is this:

virtnet_rx_resume_all
    napi_enable(rq0) (the rq1 napi is still disabled)
    enable_refill_work

refill_work
    napi_disable(rq0) -> still okay
    napi_enable(rq0)  -> still okay
    napi_disable(rq1)
        -> holds netdev_lock
        -> stuck inside the while loop in napi_disable_locked()
           while (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) {
                   usleep_range(20, 200);
                   val = READ_ONCE(n->state);
           }

virtnet_rx_resume_all (continuing)
    napi_enable(rq1)
        -> stuck while trying to acquire the netdev_lock

The problem is that we must not call napi_disable() on an already
disabled NAPI (rq1's NAPI in the example).
In the new virtnet_open():

static int virtnet_open(struct net_device *dev)
{
	struct virtnet_info *vi = netdev_priv(dev);
	int i, err;

	// Note that at this point, refill work is still disabled
	// (vi->refill_enabled == false), so even if virtnet_receive is
	// called, the refill_work will not be scheduled.
	for (i = 0; i < vi->max_queue_pairs; i++) {
		err = virtnet_enable_queue_pair(vi, i);
		if (err < 0)
			goto err_enable_qp;
	}

	// Here all RX NAPIs are enabled, so it's safe to enable refill work again.
	virtnet_rx_refill_all(vi);
Thanks,
Quang Minh.
* Re: [PATCH net] virtio-net: enable all napis before scheduling refill work
2025-12-10 15:33 ` Bui Quang Minh
@ 2025-12-11 7:27 ` Jason Wang
2025-12-11 15:04 ` Bui Quang Minh
0 siblings, 1 reply; 10+ messages in thread
From: Jason Wang @ 2025-12-11 7:27 UTC (permalink / raw)
To: Bui Quang Minh
Cc: netdev, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
virtualization, linux-kernel, bpf
On Wed, Dec 10, 2025 at 11:33 PM Bui Quang Minh
<minhquangbui99@gmail.com> wrote:
>
> On 12/10/25 12:45, Jason Wang wrote:
> > On Tue, Dec 9, 2025 at 11:23 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> >> On 12/9/25 11:30, Jason Wang wrote:
> >>> On Mon, Dec 8, 2025 at 11:35 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> >>>> Calling napi_disable() on an already disabled napi can cause the
> >>>> deadlock. In commit 4bc12818b363 ("virtio-net: disable delayed refill
> >>>> when pausing rx"), to avoid the deadlock, when pausing the RX in
> >>>> virtnet_rx_pause[_all](), we disable and cancel the delayed refill work.
> >>>> However, in the virtnet_rx_resume_all(), we enable the delayed refill
> >>>> work too early before enabling all the receive queue napis.
> >>>>
> >>>> The deadlock can be reproduced by running
> >>>> selftests/drivers/net/hw/xsk_reconfig.py with multiqueue virtio-net
> >>>> device and inserting a cond_resched() inside the for loop in
> >>>> virtnet_rx_resume_all() to increase the success rate. Because the worker
> >>>> processing the delayed refilled work runs on the same CPU as
> >>>> virtnet_rx_resume_all(), a reschedule is needed to cause the deadlock.
> >>>> In real scenario, the contention on netdev_lock can cause the
> >>>> reschedule.
> >>>>
> >>>> This fixes the deadlock by ensuring all receive queue's napis are
> >>>> enabled before we enable the delayed refill work in
> >>>> virtnet_rx_resume_all() and virtnet_open().
> >>>>
> >>>> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> >>>> Reported-by: Paolo Abeni <pabeni@redhat.com>
> >>>> Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
> >>>> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
> >>>> ---
> >>>> drivers/net/virtio_net.c | 59 +++++++++++++++++++---------------------
> >>>> 1 file changed, 28 insertions(+), 31 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>> index 8e04adb57f52..f2b1ea65767d 100644
> >>>> --- a/drivers/net/virtio_net.c
> >>>> +++ b/drivers/net/virtio_net.c
> >>>> @@ -2858,6 +2858,20 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> >>>> return err != -ENOMEM;
> >>>> }
> >>>>
> >>>> +static void virtnet_rx_refill_all(struct virtnet_info *vi)
> >>>> +{
> >>>> + bool schedule_refill = false;
> >>>> + int i;
> >>>> +
> >>>> + enable_delayed_refill(vi);
> >>> This seems to be still racy?
> >>>
> >>> For example, in virtnet_open() we had:
> >>>
> >>> static int virtnet_open(struct net_device *dev)
> >>> {
> >>> struct virtnet_info *vi = netdev_priv(dev);
> >>> int i, err;
> >>>
> >>> for (i = 0; i < vi->max_queue_pairs; i++) {
> >>> err = virtnet_enable_queue_pair(vi, i);
> >>> if (err < 0)
> >>> goto err_enable_qp;
> >>> }
> >>>
> >>> virtnet_rx_refill_all(vi);
> >>>
> >>> So NAPI and refill work is enabled in this case, so the refill work
> >>> could be scheduled and run at the same time?
> >> Yes, that's what we expect. We must ensure that refill work is scheduled
> >> only when all NAPIs are enabled. The deadlock happens when refill work
> >> is scheduled but there are still disabled RX NAPIs.
> > Just to make sure we are on the same page, I meant, after refill work
> > is enabled, rq0 is NAPI is enabled, in this case the refill work could
> > be triggered by the rq0's NAPI so we may end up in the refill work
> > that it tries to disable rq1's NAPI while holding the netdev lock.
>
> I don't quite get your point. The current deadlock scenario is this
>
> virtnet_rx_resume_all
> napi_enable(rq0) (the rq1 napi is still disabled)
> enable_refill_work
>
> refill_work
> napi_disable(rq0) -> still okay
> napi_enable(rq0) -> still okay
> napi_disable(rq1)
> -> hold netdev_lock
> -> stuck inside the while loop in napi_disable_locked
> while (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) {
> usleep_range(20, 200);
> val = READ_ONCE(n->state);
> }
>
>
> napi_enable(rq1)
> -> stuck while trying to acquire the netdev_lock
>
> The problem is that we must not call napi_disable() on an already
> disabled NAPI (rq1's NAPI in the example).
>
> In the new virtnet_open
>
> static int virtnet_open(struct net_device *dev)
> {
> struct virtnet_info *vi = netdev_priv(dev);
> int i, err;
>
> // Note that at this point, refill work is still disabled, vi->refill_enabled == false,
> // so even if virtnet_receive is called, the refill_work will not be scheduled.
> for (i = 0; i < vi->max_queue_pairs; i++) {
> err = virtnet_enable_queue_pair(vi, i);
> if (err < 0)
> goto err_enable_qp;
> }
>
> // Here all RX NAPIs are enabled so it's safe to enable refill work again
> virtnet_rx_refill_all(vi);
>
I meant this part:

+static void virtnet_rx_refill_all(struct virtnet_info *vi)
+{
+	bool schedule_refill = false;
+	int i;
+
+	enable_delayed_refill(vi);

refill_work could run here.

+	for (i = 0; i < vi->curr_queue_pairs; i++)
+		if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
+			schedule_refill = true;
+

I think it can be fixed by moving enable_delayed_refill() here.

+	if (schedule_refill)
+		schedule_delayed_work(&vi->refill, 0);
+}
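
For illustration, the reordering suggested above would look roughly like
this; it is only a sketch of the idea, not the final code:

static void virtnet_rx_refill_all(struct virtnet_info *vi)
{
	bool schedule_refill = false;
	int i;

	/* Fill the rings first, while the delayed refill work is still
	 * disabled, so try_fill_recv() cannot race with the worker on the
	 * same receive queue.
	 */
	for (i = 0; i < vi->curr_queue_pairs; i++)
		if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
			schedule_refill = true;

	/* Only now allow the delayed refill work to run again. */
	enable_delayed_refill(vi);

	if (schedule_refill)
		schedule_delayed_work(&vi->refill, 0);
}
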
Thanks
>
> Thanks,
> Quang Minh.
>
* Re: [PATCH net] virtio-net: enable all napis before scheduling refill work
2025-12-11 7:27 ` Jason Wang
@ 2025-12-11 15:04 ` Bui Quang Minh
2025-12-12 6:48 ` Jason Wang
0 siblings, 1 reply; 10+ messages in thread
From: Bui Quang Minh @ 2025-12-11 15:04 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
virtualization, linux-kernel, bpf
On 12/11/25 14:27, Jason Wang wrote:
> On Wed, Dec 10, 2025 at 11:33 PM Bui Quang Minh
> <minhquangbui99@gmail.com> wrote:
>> On 12/10/25 12:45, Jason Wang wrote:
>>> On Tue, Dec 9, 2025 at 11:23 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>>>> On 12/9/25 11:30, Jason Wang wrote:
>>>>> On Mon, Dec 8, 2025 at 11:35 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>>>>>> Calling napi_disable() on an already disabled napi can cause the
>>>>>> deadlock. In commit 4bc12818b363 ("virtio-net: disable delayed refill
>>>>>> when pausing rx"), to avoid the deadlock, when pausing the RX in
>>>>>> virtnet_rx_pause[_all](), we disable and cancel the delayed refill work.
>>>>>> However, in the virtnet_rx_resume_all(), we enable the delayed refill
>>>>>> work too early before enabling all the receive queue napis.
>>>>>>
>>>>>> The deadlock can be reproduced by running
>>>>>> selftests/drivers/net/hw/xsk_reconfig.py with multiqueue virtio-net
>>>>>> device and inserting a cond_resched() inside the for loop in
>>>>>> virtnet_rx_resume_all() to increase the success rate. Because the worker
>>>>>> processing the delayed refilled work runs on the same CPU as
>>>>>> virtnet_rx_resume_all(), a reschedule is needed to cause the deadlock.
>>>>>> In real scenario, the contention on netdev_lock can cause the
>>>>>> reschedule.
>>>>>>
>>>>>> This fixes the deadlock by ensuring all receive queue's napis are
>>>>>> enabled before we enable the delayed refill work in
>>>>>> virtnet_rx_resume_all() and virtnet_open().
>>>>>>
>>>>>> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
>>>>>> Reported-by: Paolo Abeni <pabeni@redhat.com>
>>>>>> Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
>>>>>> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
>>>>>> ---
>>>>>> drivers/net/virtio_net.c | 59 +++++++++++++++++++---------------------
>>>>>> 1 file changed, 28 insertions(+), 31 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>> index 8e04adb57f52..f2b1ea65767d 100644
>>>>>> --- a/drivers/net/virtio_net.c
>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>> @@ -2858,6 +2858,20 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
>>>>>> return err != -ENOMEM;
>>>>>> }
>>>>>>
>>>>>> +static void virtnet_rx_refill_all(struct virtnet_info *vi)
>>>>>> +{
>>>>>> + bool schedule_refill = false;
>>>>>> + int i;
>>>>>> +
>>>>>> + enable_delayed_refill(vi);
>>>>> This seems to be still racy?
>>>>>
>>>>> For example, in virtnet_open() we had:
>>>>>
>>>>> static int virtnet_open(struct net_device *dev)
>>>>> {
>>>>> struct virtnet_info *vi = netdev_priv(dev);
>>>>> int i, err;
>>>>>
>>>>> for (i = 0; i < vi->max_queue_pairs; i++) {
>>>>> err = virtnet_enable_queue_pair(vi, i);
>>>>> if (err < 0)
>>>>> goto err_enable_qp;
>>>>> }
>>>>>
>>>>> virtnet_rx_refill_all(vi);
>>>>>
>>>>> So NAPI and refill work is enabled in this case, so the refill work
>>>>> could be scheduled and run at the same time?
>>>> Yes, that's what we expect. We must ensure that refill work is scheduled
>>>> only when all NAPIs are enabled. The deadlock happens when refill work
>>>> is scheduled but there are still disabled RX NAPIs.
>>> Just to make sure we are on the same page, I meant, after refill work
>>> is enabled, rq0 is NAPI is enabled, in this case the refill work could
>>> be triggered by the rq0's NAPI so we may end up in the refill work
>>> that it tries to disable rq1's NAPI while holding the netdev lock.
>> I don't quite get your point. The current deadlock scenario is this
>>
>> virtnet_rx_resume_all
>> napi_enable(rq0) (the rq1 napi is still disabled)
>> enable_refill_work
>>
>> refill_work
>> napi_disable(rq0) -> still okay
>> napi_enable(rq0) -> still okay
>> napi_disable(rq1)
>> -> hold netdev_lock
>> -> stuck inside the while loop in napi_disable_locked
>> while (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) {
>> usleep_range(20, 200);
>> val = READ_ONCE(n->state);
>> }
>>
>>
>> napi_enable(rq1)
>> -> stuck while trying to acquire the netdev_lock
>>
>> The problem is that we must not call napi_disable() on an already
>> disabled NAPI (rq1's NAPI in the example).
>>
>> In the new virtnet_open
>>
>> static int virtnet_open(struct net_device *dev)
>> {
>> struct virtnet_info *vi = netdev_priv(dev);
>> int i, err;
>>
>> // Note that at this point, refill work is still disabled, vi->refill_enabled == false,
>> // so even if virtnet_receive is called, the refill_work will not be scheduled.
>> for (i = 0; i < vi->max_queue_pairs; i++) {
>> err = virtnet_enable_queue_pair(vi, i);
>> if (err < 0)
>> goto err_enable_qp;
>> }
>>
>> // Here all RX NAPIs are enabled so it's safe to enable refill work again
>> virtnet_rx_refill_all(vi);
>>
> I meant this part:
>
> +static void virtnet_rx_refill_all(struct virtnet_info *vi)
> +{
> + bool schedule_refill = false;
> + int i;
> +
> + enable_delayed_refill(vi);
>
> refill_work could run here.
I don't see how this can trigger the current deadlock race. However, I
do see that this code is racy: try_fill_recv() is not safe to execute
concurrently on the same receive queue. So there is a requirement that
we call try_fill_recv() before enabling the napi. Is that what you mean?
>
> + for (i = 0; i < vi->curr_queue_pairs; i++)
> + if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> + schedule_refill = true;
> +
>
> I think it can be fixed by moving enable_delayed_refill() here.
>
> + if (schedule_refill)
> + schedule_delayed_work(&vi->refill, 0);
> +}
Thanks,
Quang Minh.
* Re: [PATCH net] virtio-net: enable all napis before scheduling refill work
2025-12-11 15:04 ` Bui Quang Minh
@ 2025-12-12 6:48 ` Jason Wang
2025-12-12 14:55 ` Bui Quang Minh
0 siblings, 1 reply; 10+ messages in thread
From: Jason Wang @ 2025-12-12 6:48 UTC (permalink / raw)
To: Bui Quang Minh
Cc: netdev, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
virtualization, linux-kernel, bpf
On Thu, Dec 11, 2025 at 11:04 PM Bui Quang Minh
<minhquangbui99@gmail.com> wrote:
>
> On 12/11/25 14:27, Jason Wang wrote:
> > On Wed, Dec 10, 2025 at 11:33 PM Bui Quang Minh
> > <minhquangbui99@gmail.com> wrote:
> >> On 12/10/25 12:45, Jason Wang wrote:
> >>> On Tue, Dec 9, 2025 at 11:23 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> >>>> On 12/9/25 11:30, Jason Wang wrote:
> >>>>> On Mon, Dec 8, 2025 at 11:35 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> >>>>>> Calling napi_disable() on an already disabled napi can cause the
> >>>>>> deadlock. In commit 4bc12818b363 ("virtio-net: disable delayed refill
> >>>>>> when pausing rx"), to avoid the deadlock, when pausing the RX in
> >>>>>> virtnet_rx_pause[_all](), we disable and cancel the delayed refill work.
> >>>>>> However, in the virtnet_rx_resume_all(), we enable the delayed refill
> >>>>>> work too early before enabling all the receive queue napis.
> >>>>>>
> >>>>>> The deadlock can be reproduced by running
> >>>>>> selftests/drivers/net/hw/xsk_reconfig.py with multiqueue virtio-net
> >>>>>> device and inserting a cond_resched() inside the for loop in
> >>>>>> virtnet_rx_resume_all() to increase the success rate. Because the worker
> >>>>>> processing the delayed refilled work runs on the same CPU as
> >>>>>> virtnet_rx_resume_all(), a reschedule is needed to cause the deadlock.
> >>>>>> In real scenario, the contention on netdev_lock can cause the
> >>>>>> reschedule.
> >>>>>>
> >>>>>> This fixes the deadlock by ensuring all receive queue's napis are
> >>>>>> enabled before we enable the delayed refill work in
> >>>>>> virtnet_rx_resume_all() and virtnet_open().
> >>>>>>
> >>>>>> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> >>>>>> Reported-by: Paolo Abeni <pabeni@redhat.com>
> >>>>>> Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
> >>>>>> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
> >>>>>> ---
> >>>>>> drivers/net/virtio_net.c | 59 +++++++++++++++++++---------------------
> >>>>>> 1 file changed, 28 insertions(+), 31 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>>>> index 8e04adb57f52..f2b1ea65767d 100644
> >>>>>> --- a/drivers/net/virtio_net.c
> >>>>>> +++ b/drivers/net/virtio_net.c
> >>>>>> @@ -2858,6 +2858,20 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> >>>>>> return err != -ENOMEM;
> >>>>>> }
> >>>>>>
> >>>>>> +static void virtnet_rx_refill_all(struct virtnet_info *vi)
> >>>>>> +{
> >>>>>> + bool schedule_refill = false;
> >>>>>> + int i;
> >>>>>> +
> >>>>>> + enable_delayed_refill(vi);
> >>>>> This seems to be still racy?
> >>>>>
> >>>>> For example, in virtnet_open() we had:
> >>>>>
> >>>>> static int virtnet_open(struct net_device *dev)
> >>>>> {
> >>>>> struct virtnet_info *vi = netdev_priv(dev);
> >>>>> int i, err;
> >>>>>
> >>>>> for (i = 0; i < vi->max_queue_pairs; i++) {
> >>>>> err = virtnet_enable_queue_pair(vi, i);
> >>>>> if (err < 0)
> >>>>> goto err_enable_qp;
> >>>>> }
> >>>>>
> >>>>> virtnet_rx_refill_all(vi);
> >>>>>
> >>>>> So NAPI and refill work is enabled in this case, so the refill work
> >>>>> could be scheduled and run at the same time?
> >>>> Yes, that's what we expect. We must ensure that refill work is scheduled
> >>>> only when all NAPIs are enabled. The deadlock happens when refill work
> >>>> is scheduled but there are still disabled RX NAPIs.
> >>> Just to make sure we are on the same page, I meant, after refill work
> >>> is enabled, rq0 is NAPI is enabled, in this case the refill work could
> >>> be triggered by the rq0's NAPI so we may end up in the refill work
> >>> that it tries to disable rq1's NAPI while holding the netdev lock.
> >> I don't quite get your point. The current deadlock scenario is this
> >>
> >> virtnet_rx_resume_all
> >> napi_enable(rq0) (the rq1 napi is still disabled)
> >> enable_refill_work
> >>
> >> refill_work
> >> napi_disable(rq0) -> still okay
> >> napi_enable(rq0) -> still okay
> >> napi_disable(rq1)
> >> -> hold netdev_lock
> >> -> stuck inside the while loop in napi_disable_locked
> >> while (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) {
> >> usleep_range(20, 200);
> >> val = READ_ONCE(n->state);
> >> }
> >>
> >>
> >> napi_enable(rq1)
> >> -> stuck while trying to acquire the netdev_lock
> >>
> >> The problem is that we must not call napi_disable() on an already
> >> disabled NAPI (rq1's NAPI in the example).
> >>
> >> In the new virtnet_open
> >>
> >> static int virtnet_open(struct net_device *dev)
> >> {
> >> struct virtnet_info *vi = netdev_priv(dev);
> >> int i, err;
> >>
> >> // Note that at this point, refill work is still disabled, vi->refill_enabled == false,
> >> // so even if virtnet_receive is called, the refill_work will not be scheduled.
> >> for (i = 0; i < vi->max_queue_pairs; i++) {
> >> err = virtnet_enable_queue_pair(vi, i);
> >> if (err < 0)
> >> goto err_enable_qp;
> >> }
> >>
> >> // Here all RX NAPIs are enabled so it's safe to enable refill work again
> >> virtnet_rx_refill_all(vi);
> >>
> > I meant this part:
> >
> > +static void virtnet_rx_refill_all(struct virtnet_info *vi)
> > +{
> > + bool schedule_refill = false;
> > + int i;
> > +
> > + enable_delayed_refill(vi);
> >
> > refill_work could run here.
>
> I don't see how this can trigger the current deadlock race. However, I
> see that this code is racy, the try_fill_recv function is not safe to
> concurrently executed on the same receive queue. So there is a
> requirement that we need to call try_fill_recv before enabling napi. Is
> it what you mean?
Exactly, I meant it's racy.
>
> >
> > + for (i = 0; i < vi->curr_queue_pairs; i++)
> > + if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> > + schedule_refill = true;
> > +
> >
> > I think it can be fixed by moving enable_delayed_refill() here.
> >
> > + if (schedule_refill)
> > + schedule_delayed_work(&vi->refill, 0);
> > +}
>
> Thanks,
> Quang Minh.
>
Thanks
>
* Re: [PATCH net] virtio-net: enable all napis before scheduling refill work
2025-12-12 6:48 ` Jason Wang
@ 2025-12-12 14:55 ` Bui Quang Minh
0 siblings, 0 replies; 10+ messages in thread
From: Bui Quang Minh @ 2025-12-12 14:55 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
virtualization, linux-kernel, bpf
On 12/12/25 13:48, Jason Wang wrote:
> On Thu, Dec 11, 2025 at 11:04 PM Bui Quang Minh
> <minhquangbui99@gmail.com> wrote:
>> On 12/11/25 14:27, Jason Wang wrote:
>>> On Wed, Dec 10, 2025 at 11:33 PM Bui Quang Minh
>>> <minhquangbui99@gmail.com> wrote:
>>>> On 12/10/25 12:45, Jason Wang wrote:
>>>>> On Tue, Dec 9, 2025 at 11:23 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>>>>>> On 12/9/25 11:30, Jason Wang wrote:
>>>>>>> On Mon, Dec 8, 2025 at 11:35 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>>>>>>>> Calling napi_disable() on an already disabled napi can cause the
>>>>>>>> deadlock. In commit 4bc12818b363 ("virtio-net: disable delayed refill
>>>>>>>> when pausing rx"), to avoid the deadlock, when pausing the RX in
>>>>>>>> virtnet_rx_pause[_all](), we disable and cancel the delayed refill work.
>>>>>>>> However, in the virtnet_rx_resume_all(), we enable the delayed refill
>>>>>>>> work too early before enabling all the receive queue napis.
>>>>>>>>
>>>>>>>> The deadlock can be reproduced by running
>>>>>>>> selftests/drivers/net/hw/xsk_reconfig.py with multiqueue virtio-net
>>>>>>>> device and inserting a cond_resched() inside the for loop in
>>>>>>>> virtnet_rx_resume_all() to increase the success rate. Because the worker
>>>>>>>> processing the delayed refilled work runs on the same CPU as
>>>>>>>> virtnet_rx_resume_all(), a reschedule is needed to cause the deadlock.
>>>>>>>> In real scenario, the contention on netdev_lock can cause the
>>>>>>>> reschedule.
>>>>>>>>
>>>>>>>> This fixes the deadlock by ensuring all receive queue's napis are
>>>>>>>> enabled before we enable the delayed refill work in
>>>>>>>> virtnet_rx_resume_all() and virtnet_open().
>>>>>>>>
>>>>>>>> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
>>>>>>>> Reported-by: Paolo Abeni <pabeni@redhat.com>
>>>>>>>> Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
>>>>>>>> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
>>>>>>>> ---
>>>>>>>> drivers/net/virtio_net.c | 59 +++++++++++++++++++---------------------
>>>>>>>> 1 file changed, 28 insertions(+), 31 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>>> index 8e04adb57f52..f2b1ea65767d 100644
>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>> @@ -2858,6 +2858,20 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
>>>>>>>> return err != -ENOMEM;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +static void virtnet_rx_refill_all(struct virtnet_info *vi)
>>>>>>>> +{
>>>>>>>> + bool schedule_refill = false;
>>>>>>>> + int i;
>>>>>>>> +
>>>>>>>> + enable_delayed_refill(vi);
>>>>>>> This seems to be still racy?
>>>>>>>
>>>>>>> For example, in virtnet_open() we had:
>>>>>>>
>>>>>>> static int virtnet_open(struct net_device *dev)
>>>>>>> {
>>>>>>> struct virtnet_info *vi = netdev_priv(dev);
>>>>>>> int i, err;
>>>>>>>
>>>>>>> for (i = 0; i < vi->max_queue_pairs; i++) {
>>>>>>> err = virtnet_enable_queue_pair(vi, i);
>>>>>>> if (err < 0)
>>>>>>> goto err_enable_qp;
>>>>>>> }
>>>>>>>
>>>>>>> virtnet_rx_refill_all(vi);
>>>>>>>
>>>>>>> So NAPI and refill work is enabled in this case, so the refill work
>>>>>>> could be scheduled and run at the same time?
>>>>>> Yes, that's what we expect. We must ensure that refill work is scheduled
>>>>>> only when all NAPIs are enabled. The deadlock happens when refill work
>>>>>> is scheduled but there are still disabled RX NAPIs.
>>>>> Just to make sure we are on the same page, I meant, after refill work
>>>>> is enabled, rq0 is NAPI is enabled, in this case the refill work could
>>>>> be triggered by the rq0's NAPI so we may end up in the refill work
>>>>> that it tries to disable rq1's NAPI while holding the netdev lock.
>>>> I don't quite get your point. The current deadlock scenario is this
>>>>
>>>> virtnet_rx_resume_all
>>>> napi_enable(rq0) (the rq1 napi is still disabled)
>>>> enable_refill_work
>>>>
>>>> refill_work
>>>> napi_disable(rq0) -> still okay
>>>> napi_enable(rq0) -> still okay
>>>> napi_disable(rq1)
>>>> -> hold netdev_lock
>>>> -> stuck inside the while loop in napi_disable_locked
>>>> while (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) {
>>>> usleep_range(20, 200);
>>>> val = READ_ONCE(n->state);
>>>> }
>>>>
>>>>
>>>> napi_enable(rq1)
>>>> -> stuck while trying to acquire the netdev_lock
>>>>
>>>> The problem is that we must not call napi_disable() on an already
>>>> disabled NAPI (rq1's NAPI in the example).
>>>>
>>>> In the new virtnet_open
>>>>
>>>> static int virtnet_open(struct net_device *dev)
>>>> {
>>>> struct virtnet_info *vi = netdev_priv(dev);
>>>> int i, err;
>>>>
>>>> // Note that at this point, refill work is still disabled, vi->refill_enabled == false,
>>>> // so even if virtnet_receive is called, the refill_work will not be scheduled.
>>>> for (i = 0; i < vi->max_queue_pairs; i++) {
>>>> err = virtnet_enable_queue_pair(vi, i);
>>>> if (err < 0)
>>>> goto err_enable_qp;
>>>> }
>>>>
>>>> // Here all RX NAPIs are enabled so it's safe to enable refill work again
>>>> virtnet_rx_refill_all(vi);
>>>>
>>> I meant this part:
>>>
>>> +static void virtnet_rx_refill_all(struct virtnet_info *vi)
>>> +{
>>> + bool schedule_refill = false;
>>> + int i;
>>> +
>>> + enable_delayed_refill(vi);
>>>
>>> refill_work could run here.
>> I don't see how this can trigger the current deadlock race. However, I
>> see that this code is racy, the try_fill_recv function is not safe to
>> concurrently executed on the same receive queue. So there is a
>> requirement that we need to call try_fill_recv before enabling napi. Is
>> it what you mean?
> Exactly, I meant it's racy.
Okay, I'll fix it in the next version.
Thanks,
Quang Minh.