* Re: [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled
2025-04-29 14:31 [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled Jakub Kicinski
@ 2025-04-29 16:26 ` Bui Quang Minh
2025-04-30 3:49 ` Jason Wang
2025-04-30 4:16 ` Michael S. Tsirkin
2 siblings, 0 replies; 6+ messages in thread
From: Bui Quang Minh @ 2025-04-29 16:26 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, mst, jasowang,
xuanzhuo, eperezma, romieu, kuniyu, virtualization
On 4/29/25 21:31, Jakub Kicinski wrote:
> Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> fixed a deadlock between reconfig paths and refill work trying to disable
> the same NAPI instance. The refill work can't run in parallel with reconfig
> because trying to double-disable a NAPI instance causes a stall under the
> instance lock, which the reconfig path needs to re-enable the NAPI and
> therefore unblock the stalled thread.
>
> There are two cases where we re-enable refill too early. One is in the
> virtnet_set_queues() handler. We call it when installing XDP:
>
> virtnet_rx_pause_all(vi);
> ...
> virtnet_napi_tx_disable(..);
> ...
> virtnet_set_queues(..);
> ...
> virtnet_rx_resume_all(..);
>
> We want the work to be disabled until we call virtnet_rx_resume_all(),
> but virtnet_set_queues() kicks it before NAPIs were re-enabled.
>
> The other case is a more trivial case of mis-ordering in
> __virtnet_rx_resume() found by code inspection.
>
> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: mst@redhat.com
> CC: jasowang@redhat.com
> CC: xuanzhuo@linux.alibaba.com
> CC: eperezma@redhat.com
> CC: minhquangbui99@gmail.com
> CC: romieu@fr.zoreil.com
> CC: kuniyu@amazon.com
> CC: virtualization@lists.linux.dev
> ---
> drivers/net/virtio_net.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 848fab51dfa1..4c904e176495 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3383,12 +3383,15 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
> bool refill)
> {
> bool running = netif_running(vi->dev);
> + bool schedule_refill = false;
>
> if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
> - schedule_delayed_work(&vi->refill, 0);
> -
> + schedule_refill = true;
> if (running)
> virtnet_napi_enable(rq);
> +
> + if (schedule_refill)
> + schedule_delayed_work(&vi->refill, 0);
> }
>
> static void virtnet_rx_resume_all(struct virtnet_info *vi)
> @@ -3728,7 +3731,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
> succ:
> vi->curr_queue_pairs = queue_pairs;
> /* virtnet_open() will refill when device is going to up. */
> - if (dev->flags & IFF_UP)
> + if (dev->flags & IFF_UP && vi->refill_enabled)
> schedule_delayed_work(&vi->refill, 0);
>
> return 0;
Reviewed-by: Bui Quang Minh <minhquangbui99@gmail.com>
Thanks,
Quang Minh.
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled
2025-04-29 14:31 [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled Jakub Kicinski
2025-04-29 16:26 ` Bui Quang Minh
@ 2025-04-30 3:49 ` Jason Wang
2025-04-30 5:29 ` Michael S. Tsirkin
2025-04-30 4:16 ` Michael S. Tsirkin
2 siblings, 1 reply; 6+ messages in thread
From: Jason Wang @ 2025-04-30 3:49 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, mst,
xuanzhuo, eperezma, minhquangbui99, romieu, kuniyu,
virtualization
On Tue, Apr 29, 2025 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> fixed a deadlock between reconfig paths and refill work trying to disable
> the same NAPI instance. The refill work can't run in parallel with reconfig
> because trying to double-disable a NAPI instance causes a stall under the
> instance lock, which the reconfig path needs to re-enable the NAPI and
> therefore unblock the stalled thread.
>
> There are two cases where we re-enable refill too early. One is in the
> virtnet_set_queues() handler. We call it when installing XDP:
>
> virtnet_rx_pause_all(vi);
> ...
> virtnet_napi_tx_disable(..);
> ...
> virtnet_set_queues(..);
> ...
> virtnet_rx_resume_all(..);
>
> We want the work to be disabled until we call virtnet_rx_resume_all(),
> but virtnet_set_queues() kicks it before NAPIs were re-enabled.
>
> The other case is a more trivial case of mis-ordering in
> __virtnet_rx_resume() found by code inspection.
>
> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: mst@redhat.com
> CC: jasowang@redhat.com
> CC: xuanzhuo@linux.alibaba.com
> CC: eperezma@redhat.com
> CC: minhquangbui99@gmail.com
> CC: romieu@fr.zoreil.com
> CC: kuniyu@amazon.com
> CC: virtualization@lists.linux.dev
> ---
> drivers/net/virtio_net.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 848fab51dfa1..4c904e176495 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3383,12 +3383,15 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
> bool refill)
> {
> bool running = netif_running(vi->dev);
> + bool schedule_refill = false;
>
> if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
> - schedule_delayed_work(&vi->refill, 0);
> -
> + schedule_refill = true;
> if (running)
> virtnet_napi_enable(rq);
> +
> + if (schedule_refill)
> + schedule_delayed_work(&vi->refill, 0);
> }
>
> static void virtnet_rx_resume_all(struct virtnet_info *vi)
> @@ -3728,7 +3731,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
> succ:
> vi->curr_queue_pairs = queue_pairs;
> /* virtnet_open() will refill when device is going to up. */
> - if (dev->flags & IFF_UP)
> + if (dev->flags & IFF_UP && vi->refill_enabled)
> schedule_delayed_work(&vi->refill, 0);
This has the assumption that the toggle of the refill_enabled is under
RTNL. Though it's true now but it looks to me it's better to protect
it against refill_lock.
Thanks
>
> return 0;
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled
2025-04-30 3:49 ` Jason Wang
@ 2025-04-30 5:29 ` Michael S. Tsirkin
2025-04-30 14:02 ` Jakub Kicinski
0 siblings, 1 reply; 6+ messages in thread
From: Michael S. Tsirkin @ 2025-04-30 5:29 UTC (permalink / raw)
To: Jason Wang
Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, andrew+netdev,
horms, xuanzhuo, eperezma, minhquangbui99, romieu, kuniyu,
virtualization
On Wed, Apr 30, 2025 at 11:49:15AM +0800, Jason Wang wrote:
> On Tue, Apr 29, 2025 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> > fixed a deadlock between reconfig paths and refill work trying to disable
> > the same NAPI instance. The refill work can't run in parallel with reconfig
> > because trying to double-disable a NAPI instance causes a stall under the
> > instance lock, which the reconfig path needs to re-enable the NAPI and
> > therefore unblock the stalled thread.
> >
> > There are two cases where we re-enable refill too early. One is in the
> > virtnet_set_queues() handler. We call it when installing XDP:
> >
> > virtnet_rx_pause_all(vi);
> > ...
> > virtnet_napi_tx_disable(..);
> > ...
> > virtnet_set_queues(..);
> > ...
> > virtnet_rx_resume_all(..);
> >
> > We want the work to be disabled until we call virtnet_rx_resume_all(),
> > but virtnet_set_queues() kicks it before NAPIs were re-enabled.
> >
> > The other case is a more trivial case of mis-ordering in
> > __virtnet_rx_resume() found by code inspection.
> >
> > Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> > Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> > ---
> > CC: mst@redhat.com
> > CC: jasowang@redhat.com
> > CC: xuanzhuo@linux.alibaba.com
> > CC: eperezma@redhat.com
> > CC: minhquangbui99@gmail.com
> > CC: romieu@fr.zoreil.com
> > CC: kuniyu@amazon.com
> > CC: virtualization@lists.linux.dev
> > ---
> > drivers/net/virtio_net.c | 9 ++++++---
> > 1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 848fab51dfa1..4c904e176495 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -3383,12 +3383,15 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
> > bool refill)
> > {
> > bool running = netif_running(vi->dev);
> > + bool schedule_refill = false;
> >
> > if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
> > - schedule_delayed_work(&vi->refill, 0);
> > -
> > + schedule_refill = true;
> > if (running)
> > virtnet_napi_enable(rq);
> > +
> > + if (schedule_refill)
> > + schedule_delayed_work(&vi->refill, 0);
> > }
> >
> > static void virtnet_rx_resume_all(struct virtnet_info *vi)
> > @@ -3728,7 +3731,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
> > succ:
> > vi->curr_queue_pairs = queue_pairs;
> > /* virtnet_open() will refill when device is going to up. */
> > - if (dev->flags & IFF_UP)
> > + if (dev->flags & IFF_UP && vi->refill_enabled)
> > schedule_delayed_work(&vi->refill, 0);
>
> This has the assumption that the toggle of the refill_enabled is under
> RTNL. Though it's true now but it looks to me it's better to protect
> it against refill_lock.
>
> Thanks
Good point.
> >
> > return 0;
> > --
> > 2.49.0
> >
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled
2025-04-30 5:29 ` Michael S. Tsirkin
@ 2025-04-30 14:02 ` Jakub Kicinski
0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-04-30 14:02 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jason Wang, davem, netdev, edumazet, pabeni, andrew+netdev, horms,
xuanzhuo, eperezma, minhquangbui99, romieu, kuniyu,
virtualization
On Wed, 30 Apr 2025 01:29:06 -0400 Michael S. Tsirkin wrote:
> > > @@ -3728,7 +3731,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
> > > succ:
> > > vi->curr_queue_pairs = queue_pairs;
> > > /* virtnet_open() will refill when device is going to up. */
> > > - if (dev->flags & IFF_UP)
> > > + if (dev->flags & IFF_UP && vi->refill_enabled)
> > > schedule_delayed_work(&vi->refill, 0);
> >
> > This has the assumption that the toggle of the refill_enabled is under
> > RTNL.
Yes, this line of code must be under rtnl_lock to be correct, since
it is also checking flags & IFF_UP. I was thinking of moving it to
virtnet_restore(), AFAIU that's the only place that needs it.
But that's more of a -next change.
> > Though it's true now but it looks to me it's better to protect
> > it against refill_lock.
>
> Good point.
Sure. I'll wrap the check and the call to schedule_.. with the lock.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled
2025-04-29 14:31 [PATCH net] virtio-net: don't re-enable refill work too early when NAPI is disabled Jakub Kicinski
2025-04-29 16:26 ` Bui Quang Minh
2025-04-30 3:49 ` Jason Wang
@ 2025-04-30 4:16 ` Michael S. Tsirkin
2 siblings, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2025-04-30 4:16 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, jasowang,
xuanzhuo, eperezma, minhquangbui99, romieu, kuniyu,
virtualization
On Tue, Apr 29, 2025 at 07:31:04AM -0700, Jakub Kicinski wrote:
> Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> fixed a deadlock between reconfig paths and refill work trying to disable
> the same NAPI instance. The refill work can't run in parallel with reconfig
> because trying to double-disable a NAPI instance causes a stall under the
> instance lock, which the reconfig path needs to re-enable the NAPI and
> therefore unblock the stalled thread.
>
> There are two cases where we re-enable refill too early. One is in the
> virtnet_set_queues() handler. We call it when installing XDP:
>
> virtnet_rx_pause_all(vi);
> ...
> virtnet_napi_tx_disable(..);
> ...
> virtnet_set_queues(..);
> ...
> virtnet_rx_resume_all(..);
>
> We want the work to be disabled until we call virtnet_rx_resume_all(),
> but virtnet_set_queues() kicks it before NAPIs were re-enabled.
>
> The other case is a more trivial case of mis-ordering in
> __virtnet_rx_resume() found by code inspection.
>
> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
> Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> CC: mst@redhat.com
> CC: jasowang@redhat.com
> CC: xuanzhuo@linux.alibaba.com
> CC: eperezma@redhat.com
> CC: minhquangbui99@gmail.com
> CC: romieu@fr.zoreil.com
> CC: kuniyu@amazon.com
> CC: virtualization@lists.linux.dev
> ---
> drivers/net/virtio_net.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 848fab51dfa1..4c904e176495 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3383,12 +3383,15 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
> bool refill)
> {
> bool running = netif_running(vi->dev);
> + bool schedule_refill = false;
>
> if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
> - schedule_delayed_work(&vi->refill, 0);
> -
> + schedule_refill = true;
> if (running)
> virtnet_napi_enable(rq);
> +
> + if (schedule_refill)
> + schedule_delayed_work(&vi->refill, 0);
> }
>
> static void virtnet_rx_resume_all(struct virtnet_info *vi)
> @@ -3728,7 +3731,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
> succ:
> vi->curr_queue_pairs = queue_pairs;
> /* virtnet_open() will refill when device is going to up. */
> - if (dev->flags & IFF_UP)
> + if (dev->flags & IFF_UP && vi->refill_enabled)
> schedule_delayed_work(&vi->refill, 0);
>
> return 0;
> --
> 2.49.0
^ permalink raw reply [flat|nested] 6+ messages in thread