* [PATCH net] netpoll: fix a use-after-free on shutdown path
@ 2026-06-22 15:01 Breno Leitao
  2026-06-23  4:05 ` Pavan Chebbi
  2026-06-25  2:25 ` Jakub Kicinski
  0 siblings, 2 replies; 3+ messages in thread
From: Breno Leitao @ 2026-06-22 15:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Amerigo Wang
  Cc: netdev, linux-kernel, vlad.wing, asantostc, kernel-team, stable,
	Breno Leitao

netpoll has a use-after-free on its shutdown path, which KASAN clearly
detects:

      BUG: KASAN: slab-use-after-free in _raw_spin_lock_irqsave+0x3b/0x80
      Read of size 1 at addr ... by task kworker/9:1
      Workqueue: events queue_process
      Call Trace:
       skb_dequeue+0x1e/0xb0
       queue_process+0x2c/0x600
       process_scheduled_works+0x4b6/0x850
       worker_thread+0x414/0x5a0
      Allocated by task 242:
       __netpoll_setup+0x201/0x4a0
       netpoll_setup+0x249/0x550
       enabled_store+0x32f/0x380
      Freed by task 0:
       kfree+0x1b7/0x540
       rcu_core+0x3f8/0x7a0

The problem occurs when a pending TX worker (queue_process()) runs in
parallel with the cleanup path.

This is what happens on the netpoll shutdown path:

1) __netpoll_cleanup() is called
2) it sets dev->npinfo to NULL
3) it calls call_rcu() with rcu_cleanup_netpoll_info()
  3.1) after the grace period, rcu_cleanup_netpoll_info() tries to cancel
       the TX work with cancel_delayed_work(), but does not wait for a
       running worker to finish
4) rcu_cleanup_netpoll_info() then calls kfree(npinfo)

Because step 3.1 does not actually wait for the work (as the comment
says, "we can't call cancel_delayed_work_sync here, as we are in
softirq"), the TX worker can still run after step 4.
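A minimal, deterministic model of why step 3.1 is not enough. The names
(struct work_model, model_cancel(), model_safe_to_free() and the
three-state enum) are hypothetical, and timers and concurrency are
ignored. The model only encodes the documented contract of
cancel_delayed_work(): a pending item is dequeued and true is returned,
but a callback that is already running is left alone and the call
returns without waiting for it.

```c
#include <stdbool.h>

/* Hypothetical, simplified work item states (timers and CPUs ignored). */
enum work_state { WORK_IDLE, WORK_PENDING, WORK_RUNNING };

struct work_model {
	enum work_state state;
};

/*
 * Models cancel_delayed_work(): a pending item is dequeued and true is
 * returned; a running callback is not touched and the call returns at once.
 */
static bool model_cancel(struct work_model *w)
{
	if (w->state == WORK_PENDING) {
		w->state = WORK_IDLE;
		return true;
	}
	return false;
}

/* The object embedding the work item may only be freed once it is idle. */
static bool model_safe_to_free(const struct work_model *w)
{
	return w->state == WORK_IDLE;
}
```

In the WORK_RUNNING case, the kfree(npinfo) in step 4 frees memory the
callback is still using, which is the pattern in the KASAN report above.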

TL;DR: queue_process() is not an RCU reader. It reaches npinfo through
the work item via container_of(), not through dev->npinfo, so waiting
for an RCU grace period does not protect it.
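A userspace sketch of the container_of() pattern behind the TL;DR. The
structures are hypothetical stand-ins: in the kernel, tx_work is a
struct delayed_work embedded in struct netpoll_info, and queue_process()
receives a struct work_struct pointer. The worker derives npinfo by
pointer arithmetic on its own argument, so no rcu_dereference() of
dev->npinfo is involved.

```c
#include <stddef.h>

/* Userspace form of container_of(); the kernel version adds type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct work_model {
	int pending;
};

struct npinfo_model {
	int txq_len;                 /* stands in for npinfo->txq */
	struct work_model tx_work;   /* embedded, like npinfo->tx_work */
};

/*
 * Models how queue_process() finds npinfo: only the work pointer is
 * passed in, and the enclosing object is computed from it. Nothing here
 * reads dev->npinfo, so clearing that pointer cannot stop this lookup.
 */
static struct npinfo_model *work_to_npinfo(struct work_model *work)
{
	return container_of(work, struct npinfo_model, tx_work);
}
```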

This cleanup path could be improved considerably, but since this patch
targets net, take the simple, safe approach:

1) set dev->npinfo to NULL
2) call synchronize_net() to wait for an RCU grace period
3) cancel_delayed_work_sync() the TX work, waiting for any worker that
   may still show up after the grace period (it should exit soon, since
   it will see dev->npinfo == NULL)
4) only then let rcu_cleanup_netpoll_info() kfree() npinfo
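The new ordering can be modelled in userspace with POSIX threads. This is
a sketch under explicit assumptions: dev_npinfo, struct npinfo_model and
the helpers are hypothetical; the model has no senders, so the
synchronize_net() step is only a comment; and pthread_join() stands in
for the waiting half of cancel_delayed_work_sync(). What it shows is the
invariant the fix relies on: the free happens only after the worker has
returned.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct netpoll_info. */
struct npinfo_model {
	atomic_int txq_len;       /* packets still queued */
	atomic_bool worker_done;  /* set when the worker returns */
};

/* Hypothetical stand-in for dev->npinfo. */
static _Atomic(struct npinfo_model *) dev_npinfo;

/* Models queue_process(): np is handed over directly, as container_of()
 * would give it, so clearing dev_npinfo does not stop this worker. */
static void *tx_worker(void *arg)
{
	struct npinfo_model *np = arg;

	while (atomic_load(&np->txq_len) > 0)
		atomic_fetch_sub(&np->txq_len, 1);
	atomic_store(&np->worker_done, true);
	return NULL;
}

/* Models the fixed ordering; returns true if the worker finished first. */
static bool cleanup_fixed(pthread_t worker)
{
	struct npinfo_model *np = atomic_load(&dev_npinfo);
	bool done;

	atomic_store(&dev_npinfo, NULL);  /* 1) unpublish */
	/* 2) synchronize_net() would wait for in-flight senders here. */
	pthread_join(worker, NULL);       /* 3) wait, like the _sync cancel */
	done = atomic_load(&np->worker_done);
	free(np);                         /* 4) only now free */
	return done;
}

/* Publishes one npinfo, starts its worker, then tears it down. */
static bool run_fixed_model(int packets)
{
	struct npinfo_model *np = calloc(1, sizeof(*np));
	pthread_t worker;

	if (!np)
		return false;
	atomic_init(&np->txq_len, packets);
	atomic_init(&np->worker_done, false);
	atomic_store(&dev_npinfo, np);
	if (pthread_create(&worker, NULL, tx_worker, np) != 0) {
		free(np);
		return false;
	}
	return cleanup_fixed(worker);
}
```

Swapping the pthread_join() and free() lines would reintroduce the
original bug: the worker could keep touching txq_len in freed memory.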

In the future, the cleanup can be done inline here, which removes the
need for the npinfo->rcu rcu_head, but that is net-next material.

Cc: stable@vger.kernel.org
Fixes: 38e6bc185d95 ("netpoll: make __netpoll_cleanup non-block")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 net/core/netpoll.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 229dde818ab33..5765015b40720 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -634,9 +634,6 @@ static void rcu_cleanup_netpoll_info(struct rcu_head *rcu_head)
 
 	skb_queue_purge(&npinfo->txq);
 
-	/* we can't call cancel_delayed_work_sync here, as we are in softirq */
-	cancel_delayed_work(&npinfo->tx_work);
-
 	/* clean after last, unfinished work */
 	__skb_queue_purge(&npinfo->txq);
 	/* now cancel it again */
@@ -664,6 +661,14 @@ static void __netpoll_cleanup(struct netpoll *np)
 			ops->ndo_netpoll_cleanup(np->dev);
 
 		RCU_INIT_POINTER(np->dev->npinfo, NULL);
+		/*
+		 * synchronize_net() does not protect the worker
+		 * (queue_process() is not an RCU reader). It fences the
+		 * senders -- the real RCU readers -- so they cannot re-arm
+		 * tx_work after the np->dev->npinfo was set to NULL.
+		 */
+		synchronize_net();
+		cancel_delayed_work_sync(&npinfo->tx_work);
 		call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
 	}
 

---
base-commit: d07d80b6a129a44538cda1549b7acf95154fb197
change-id: 20260622-netpoll_rcu_fix-def7bce1207a

Best regards,
-- 
Breno Leitao <leitao@debian.org>



* Re: [PATCH net] netpoll: fix a use-after-free on shutdown path
@ 2026-06-23  4:05 ` Pavan Chebbi
From: Pavan Chebbi @ 2026-06-23  4:05 UTC (permalink / raw)
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Amerigo Wang, netdev, linux-kernel, vlad.wing,
	asantostc, kernel-team, stable


On Mon, Jun 22, 2026 at 8:31 PM Breno Leitao <leitao@debian.org> wrote:
>

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>



* Re: [PATCH net] netpoll: fix a use-after-free on shutdown path
@ 2026-06-25  2:25 ` Jakub Kicinski
From: Jakub Kicinski @ 2026-06-25  2:25 UTC (permalink / raw)
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Amerigo Wang, netdev, linux-kernel, vlad.wing, asantostc,
	kernel-team, stable

On Mon, 22 Jun 2026 08:01:23 -0700 Breno Leitao wrote:
> +		 * synchronize_net() does not protect the worker
> +		 * (queue_process() is not an RCU reader). It fences the
> +		 * senders -- the real RCU readers -- so they cannot re-arm
> +		 * tx_work after the np->dev->npinfo was set to NULL.
> +		 */
> +		synchronize_net();
> +		cancel_delayed_work_sync(&npinfo->tx_work);

Maybe we can avoid the synchronize_net() and the comment by using
disable_delayed_work_sync()?
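A minimal model of the difference this suggestion relies on, assuming
the documented workqueue semantics that a disabled work item cannot be
queued until it is re-enabled (queueing attempts return false). The
struct and helpers are hypothetical; real work items also track running
callbacks, timers and enable/disable nesting. After
cancel_delayed_work_sync(), a sender that still holds the old npinfo can
re-arm tx_work, which is what synchronize_net() fences in the patch;
after disable_delayed_work_sync(), that late re-arm is simply refused.

```c
#include <stdbool.h>

/* Hypothetical work item with a disable count (> 0 means disabled). */
struct work_model {
	bool pending;
	int disable;
};

/* Models schedule_delayed_work(): returns true if the item was queued. */
static bool model_queue(struct work_model *w)
{
	if (w->disable > 0 || w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Models cancel_delayed_work_sync(): the item may be re-queued later. */
static void model_cancel_sync(struct work_model *w)
{
	w->pending = false;
}

/* Models disable_delayed_work_sync(): cancel, then refuse re-queueing. */
static void model_disable_sync(struct work_model *w)
{
	w->pending = false;
	w->disable++;
}
```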
