netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled
@ 2025-04-29 14:31 Jakub Kicinski
  2025-04-29 16:26 ` Bui Quang Minh
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-04-29 14:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	mst, jasowang, xuanzhuo, eperezma, minhquangbui99, romieu, kuniyu,
	virtualization

Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
fixed a deadlock between reconfig paths and refill work trying to disable
the same NAPI instance. The refill work can't run in parallel with reconfig
because trying to double-disable a NAPI instance causes a stall under the
instance lock, which the reconfig path needs to re-enable the NAPI and
therefore unblock the stalled thread.

There are two cases where we re-enable refill too early. One is in the
virtnet_set_queues() handler. We call it when installing XDP:

   virtnet_rx_pause_all(vi);
   ...
   virtnet_napi_tx_disable(..);
   ...
   virtnet_set_queues(..);
   ...
   virtnet_rx_resume_all(..);

We want the work to be disabled until we call virtnet_rx_resume_all(),
but virtnet_set_queues() kicks it before NAPIs were re-enabled.

The other case is a more trivial case of mis-ordering in
__virtnet_rx_resume() found by code inspection.

Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: mst@redhat.com
CC: jasowang@redhat.com
CC: xuanzhuo@linux.alibaba.com
CC: eperezma@redhat.com
CC: minhquangbui99@gmail.com
CC: romieu@fr.zoreil.com
CC: kuniyu@amazon.com
CC: virtualization@lists.linux.dev
---
 drivers/net/virtio_net.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 848fab51dfa1..4c904e176495 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3383,12 +3383,15 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
 				bool refill)
 {
 	bool running = netif_running(vi->dev);
+	bool schedule_refill = false;
 
 	if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
-		schedule_delayed_work(&vi->refill, 0);
-
+		schedule_refill = true;
 	if (running)
 		virtnet_napi_enable(rq);
+
+	if (schedule_refill)
+		schedule_delayed_work(&vi->refill, 0);
 }
 
 static void virtnet_rx_resume_all(struct virtnet_info *vi)
@@ -3728,7 +3731,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 succ:
 	vi->curr_queue_pairs = queue_pairs;
 	/* virtnet_open() will refill when device is going to up. */
-	if (dev->flags & IFF_UP)
+	if (dev->flags & IFF_UP && vi->refill_enabled)
 		schedule_delayed_work(&vi->refill, 0);
 
 	return 0;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled
  2025-04-29 14:31 [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled Jakub Kicinski
@ 2025-04-29 16:26 ` Bui Quang Minh
  2025-04-30  3:49 ` Jason Wang
  2025-04-30  4:16 ` Michael S. Tsirkin
  2 siblings, 0 replies; 6+ messages in thread
From: Bui Quang Minh @ 2025-04-29 16:26 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, mst, jasowang,
	xuanzhuo, eperezma, romieu, kuniyu, virtualization

On 4/29/25 21:31, Jakub Kicinski wrote:
> Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> fixed a deadlock between reconfig paths and refill work trying to disable
> the same NAPI instance. The refill work can't run in parallel with reconfig
> because trying to double-disable a NAPI instance causes a stall under the
> instance lock, which the reconfig path needs to re-enable the NAPI and
> therefore unblock the stalled thread.
>
> There are two cases where we re-enable refill too early. One is in the
> virtnet_set_queues() handler. We call it when installing XDP:
>
>     virtnet_rx_pause_all(vi);
>     ...
>     virtnet_napi_tx_disable(..);
>     ...
>     virtnet_set_queues(..);
>     ...
>     virtnet_rx_resume_all(..);
>
> We want the work to be disabled until we call virtnet_rx_resume_all(),
> but virtnet_set_queues() kicks it before NAPIs were re-enabled.
>
> The other case is a more trivial case of mis-ordering in
> __virtnet_rx_resume() found by code inspection.
>
> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: mst@redhat.com
> CC: jasowang@redhat.com
> CC: xuanzhuo@linux.alibaba.com
> CC: eperezma@redhat.com
> CC: minhquangbui99@gmail.com
> CC: romieu@fr.zoreil.com
> CC: kuniyu@amazon.com
> CC: virtualization@lists.linux.dev
> ---
>   drivers/net/virtio_net.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 848fab51dfa1..4c904e176495 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3383,12 +3383,15 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
>   				bool refill)
>   {
>   	bool running = netif_running(vi->dev);
> +	bool schedule_refill = false;
>   
>   	if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
> -		schedule_delayed_work(&vi->refill, 0);
> -
> +		schedule_refill = true;
>   	if (running)
>   		virtnet_napi_enable(rq);
> +
> +	if (schedule_refill)
> +		schedule_delayed_work(&vi->refill, 0);
>   }
>   
>   static void virtnet_rx_resume_all(struct virtnet_info *vi)
> @@ -3728,7 +3731,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
>   succ:
>   	vi->curr_queue_pairs = queue_pairs;
>   	/* virtnet_open() will refill when device is going to up. */
> -	if (dev->flags & IFF_UP)
> +	if (dev->flags & IFF_UP && vi->refill_enabled)
>   		schedule_delayed_work(&vi->refill, 0);
>   
>   	return 0;

Reviewed-by: Bui Quang Minh <minhquangbui99@gmail.com>

Thanks,
Quang Minh.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled
  2025-04-29 14:31 [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled Jakub Kicinski
  2025-04-29 16:26 ` Bui Quang Minh
@ 2025-04-30  3:49 ` Jason Wang
  2025-04-30  5:29   ` Michael S. Tsirkin
  2025-04-30  4:16 ` Michael S. Tsirkin
  2 siblings, 1 reply; 6+ messages in thread
From: Jason Wang @ 2025-04-30  3:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, mst,
	xuanzhuo, eperezma, minhquangbui99, romieu, kuniyu,
	virtualization

On Tue, Apr 29, 2025 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> fixed a deadlock between reconfig paths and refill work trying to disable
> the same NAPI instance. The refill work can't run in parallel with reconfig
> because trying to double-disable a NAPI instance causes a stall under the
> instance lock, which the reconfig path needs to re-enable the NAPI and
> therefore unblock the stalled thread.
>
> There are two cases where we re-enable refill too early. One is in the
> virtnet_set_queues() handler. We call it when installing XDP:
>
>    virtnet_rx_pause_all(vi);
>    ...
>    virtnet_napi_tx_disable(..);
>    ...
>    virtnet_set_queues(..);
>    ...
>    virtnet_rx_resume_all(..);
>
> We want the work to be disabled until we call virtnet_rx_resume_all(),
> but virtnet_set_queues() kicks it before NAPIs were re-enabled.
>
> The other case is a more trivial case of mis-ordering in
> __virtnet_rx_resume() found by code inspection.
>
> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: mst@redhat.com
> CC: jasowang@redhat.com
> CC: xuanzhuo@linux.alibaba.com
> CC: eperezma@redhat.com
> CC: minhquangbui99@gmail.com
> CC: romieu@fr.zoreil.com
> CC: kuniyu@amazon.com
> CC: virtualization@lists.linux.dev
> ---
>  drivers/net/virtio_net.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 848fab51dfa1..4c904e176495 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3383,12 +3383,15 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
>                                 bool refill)
>  {
>         bool running = netif_running(vi->dev);
> +       bool schedule_refill = false;
>
>         if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
> -               schedule_delayed_work(&vi->refill, 0);
> -
> +               schedule_refill = true;
>         if (running)
>                 virtnet_napi_enable(rq);
> +
> +       if (schedule_refill)
> +               schedule_delayed_work(&vi->refill, 0);
>  }
>
>  static void virtnet_rx_resume_all(struct virtnet_info *vi)
> @@ -3728,7 +3731,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
>  succ:
>         vi->curr_queue_pairs = queue_pairs;
>         /* virtnet_open() will refill when device is going to up. */
> -       if (dev->flags & IFF_UP)
> +       if (dev->flags & IFF_UP && vi->refill_enabled)
>                 schedule_delayed_work(&vi->refill, 0);

This has the assumption that the toggle of the refill_enabled is under
RTNL. Though it's true now but it looks to me it's better to protect
it against refill_lock.

Thanks

>
>         return 0;
> --
> 2.49.0
>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled
  2025-04-29 14:31 [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled Jakub Kicinski
  2025-04-29 16:26 ` Bui Quang Minh
  2025-04-30  3:49 ` Jason Wang
@ 2025-04-30  4:16 ` Michael S. Tsirkin
  2 siblings, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2025-04-30  4:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, jasowang,
	xuanzhuo, eperezma, minhquangbui99, romieu, kuniyu,
	virtualization

On Tue, Apr 29, 2025 at 07:31:04AM -0700, Jakub Kicinski wrote:
> Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> fixed a deadlock between reconfig paths and refill work trying to disable
> the same NAPI instance. The refill work can't run in parallel with reconfig
> because trying to double-disable a NAPI instance causes a stall under the
> instance lock, which the reconfig path needs to re-enable the NAPI and
> therefore unblock the stalled thread.
> 
> There are two cases where we re-enable refill too early. One is in the
> virtnet_set_queues() handler. We call it when installing XDP:
> 
>    virtnet_rx_pause_all(vi);
>    ...
>    virtnet_napi_tx_disable(..);
>    ...
>    virtnet_set_queues(..);
>    ...
>    virtnet_rx_resume_all(..);
> 
> We want the work to be disabled until we call virtnet_rx_resume_all(),
> but virtnet_set_queues() kicks it before NAPIs were re-enabled.
> 
> The other case is a more trivial case of mis-ordering in
> __virtnet_rx_resume() found by code inspection.
> 
> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
> CC: mst@redhat.com
> CC: jasowang@redhat.com
> CC: xuanzhuo@linux.alibaba.com
> CC: eperezma@redhat.com
> CC: minhquangbui99@gmail.com
> CC: romieu@fr.zoreil.com
> CC: kuniyu@amazon.com
> CC: virtualization@lists.linux.dev
> ---
>  drivers/net/virtio_net.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 848fab51dfa1..4c904e176495 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3383,12 +3383,15 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
>  				bool refill)
>  {
>  	bool running = netif_running(vi->dev);
> +	bool schedule_refill = false;
>  
>  	if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
> -		schedule_delayed_work(&vi->refill, 0);
> -
> +		schedule_refill = true;
>  	if (running)
>  		virtnet_napi_enable(rq);
> +
> +	if (schedule_refill)
> +		schedule_delayed_work(&vi->refill, 0);
>  }
>  
>  static void virtnet_rx_resume_all(struct virtnet_info *vi)
> @@ -3728,7 +3731,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
>  succ:
>  	vi->curr_queue_pairs = queue_pairs;
>  	/* virtnet_open() will refill when device is going to up. */
> -	if (dev->flags & IFF_UP)
> +	if (dev->flags & IFF_UP && vi->refill_enabled)
>  		schedule_delayed_work(&vi->refill, 0);
>  
>  	return 0;
> -- 
> 2.49.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled
  2025-04-30  3:49 ` Jason Wang
@ 2025-04-30  5:29   ` Michael S. Tsirkin
  2025-04-30 14:02     ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Michael S. Tsirkin @ 2025-04-30  5:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, andrew+netdev,
	horms, xuanzhuo, eperezma, minhquangbui99, romieu, kuniyu,
	virtualization

On Wed, Apr 30, 2025 at 11:49:15AM +0800, Jason Wang wrote:
> On Tue, Apr 29, 2025 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> > fixed a deadlock between reconfig paths and refill work trying to disable
> > the same NAPI instance. The refill work can't run in parallel with reconfig
> > because trying to double-disable a NAPI instance causes a stall under the
> > instance lock, which the reconfig path needs to re-enable the NAPI and
> > therefore unblock the stalled thread.
> >
> > There are two cases where we re-enable refill too early. One is in the
> > virtnet_set_queues() handler. We call it when installing XDP:
> >
> >    virtnet_rx_pause_all(vi);
> >    ...
> >    virtnet_napi_tx_disable(..);
> >    ...
> >    virtnet_set_queues(..);
> >    ...
> >    virtnet_rx_resume_all(..);
> >
> > We want the work to be disabled until we call virtnet_rx_resume_all(),
> > but virtnet_set_queues() kicks it before NAPIs were re-enabled.
> >
> > The other case is a more trivial case of mis-ordering in
> > __virtnet_rx_resume() found by code inspection.
> >
> > Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> > Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> > ---
> > CC: mst@redhat.com
> > CC: jasowang@redhat.com
> > CC: xuanzhuo@linux.alibaba.com
> > CC: eperezma@redhat.com
> > CC: minhquangbui99@gmail.com
> > CC: romieu@fr.zoreil.com
> > CC: kuniyu@amazon.com
> > CC: virtualization@lists.linux.dev
> > ---
> >  drivers/net/virtio_net.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 848fab51dfa1..4c904e176495 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -3383,12 +3383,15 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
> >                                 bool refill)
> >  {
> >         bool running = netif_running(vi->dev);
> > +       bool schedule_refill = false;
> >
> >         if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
> > -               schedule_delayed_work(&vi->refill, 0);
> > -
> > +               schedule_refill = true;
> >         if (running)
> >                 virtnet_napi_enable(rq);
> > +
> > +       if (schedule_refill)
> > +               schedule_delayed_work(&vi->refill, 0);
> >  }
> >
> >  static void virtnet_rx_resume_all(struct virtnet_info *vi)
> > @@ -3728,7 +3731,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
> >  succ:
> >         vi->curr_queue_pairs = queue_pairs;
> >         /* virtnet_open() will refill when device is going to up. */
> > -       if (dev->flags & IFF_UP)
> > +       if (dev->flags & IFF_UP && vi->refill_enabled)
> >                 schedule_delayed_work(&vi->refill, 0);
> 
> This has the assumption that the toggle of the refill_enabled is under
> RTNL. Though it's true now but it looks to me it's better to protect
> it against refill_lock.
> 
> Thanks

Good point.

> >
> >         return 0;
> > --
> > 2.49.0
> >


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled
  2025-04-30  5:29   ` Michael S. Tsirkin
@ 2025-04-30 14:02     ` Jakub Kicinski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-04-30 14:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, davem, netdev, edumazet, pabeni, andrew+netdev, horms,
	xuanzhuo, eperezma, minhquangbui99, romieu, kuniyu,
	virtualization

On Wed, 30 Apr 2025 01:29:06 -0400 Michael S. Tsirkin wrote:
> > > @@ -3728,7 +3731,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
> > >  succ:
> > >         vi->curr_queue_pairs = queue_pairs;
> > >         /* virtnet_open() will refill when device is going to up. */
> > > -       if (dev->flags & IFF_UP)
> > > +       if (dev->flags & IFF_UP && vi->refill_enabled)
> > >                 schedule_delayed_work(&vi->refill, 0);  
> > 
> > This has the assumption that the toggle of the refill_enabled is under
> > RTNL.

Yes, this line of code must be under rtnl_lock to be correct, since 
it is also checking flags & IFF_UP. I was thinking of moving it to
virtnet_restore(), AFAIU that's the only place that needs it.
But that's more of a -next change.

> > Though it's true now but it looks to me it's better to protect
> > it against refill_lock.
> 
> Good point.

Sure. I'll wrap the check and the call to schedule_.. with the lock.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-04-30 14:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-29 14:31 [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled Jakub Kicinski
2025-04-29 16:26 ` Bui Quang Minh
2025-04-30  3:49 ` Jason Wang
2025-04-30  5:29   ` Michael S. Tsirkin
2025-04-30 14:02     ` Jakub Kicinski
2025-04-30  4:16 ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).