public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net: sched: avoid qdisc_reset_all_tx_gt() vs dequeue race for lockless qdiscs
@ 2026-02-28 14:53 Koichiro Den
  2026-03-05  2:00 ` patchwork-bot+netdevbpf
  0 siblings, 1 reply; 2+ messages in thread
From: Koichiro Den @ 2026-02-28 14:53 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, John Fastabend
  Cc: netdev, linux-kernel

When shrinking the number of real tx queues,
netif_set_real_num_tx_queues() calls qdisc_reset_all_tx_gt() to flush
qdiscs for queues which will no longer be used.

qdisc_reset_all_tx_gt() currently serializes qdisc_reset() with
qdisc_lock(). However, for lockless qdiscs, the dequeue path is
serialized by qdisc_run_begin/end() using qdisc->seqlock instead, so
qdisc_reset() can run concurrently with __qdisc_run() and free skbs
while they are still being dequeued, leading to UAF.

This can easily be reproduced on e.g. virtio-net by imposing heavy
traffic while frequently changing the number of queue pairs:

  iperf3 -ub0 -c $peer -t 0 &
  while :; do
    ethtool -L eth0 combined 1
    ethtool -L eth0 combined 2
  done

With KASAN enabled, this leads to reports like:

  BUG: KASAN: slab-use-after-free in __qdisc_run+0x133f/0x1760
  ...
  Call Trace:
   <TASK>
   ...
   __qdisc_run+0x133f/0x1760
   __dev_queue_xmit+0x248f/0x3550
   ip_finish_output2+0xa42/0x2110
   ip_output+0x1a7/0x410
   ip_send_skb+0x2e6/0x480
   udp_send_skb+0xb0a/0x1590
   udp_sendmsg+0x13c9/0x1fc0
   ...
   </TASK>

  Allocated by task 1270 on cpu 5 at 44.558414s:
   ...
   alloc_skb_with_frags+0x84/0x7c0
   sock_alloc_send_pskb+0x69a/0x830
   __ip_append_data+0x1b86/0x48c0
   ip_make_skb+0x1e8/0x2b0
   udp_sendmsg+0x13a6/0x1fc0
   ...

  Freed by task 1306 on cpu 3 at 44.558445s:
   ...
   kmem_cache_free+0x117/0x5e0
   pfifo_fast_reset+0x14d/0x580
   qdisc_reset+0x9e/0x5f0
   netif_set_real_num_tx_queues+0x303/0x840
   virtnet_set_channels+0x1bf/0x260 [virtio_net]
   ethnl_set_channels+0x684/0xae0
   ethnl_default_set_doit+0x31a/0x890
   ...

Serialize qdisc_reset_all_tx_gt() against the lockless dequeue path by
taking qdisc->seqlock for TCQ_F_NOLOCK qdiscs, matching the
serialization model already used by dev_reset_queue().

Additionally clear QDISC_STATE_NON_EMPTY after reset so the qdisc state
reflects an empty queue, avoiding needless re-scheduling.

Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 include/net/sch_generic.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index c3a7268b567e..d5d55cb21686 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -778,13 +778,23 @@ static inline bool skb_skip_tc_classify(struct sk_buff *skb)
 static inline void qdisc_reset_all_tx_gt(struct net_device *dev, unsigned int i)
 {
 	struct Qdisc *qdisc;
+	bool nolock;
 
 	for (; i < dev->num_tx_queues; i++) {
 		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
 		if (qdisc) {
+			nolock = qdisc->flags & TCQ_F_NOLOCK;
+
+			if (nolock)
+				spin_lock_bh(&qdisc->seqlock);
 			spin_lock_bh(qdisc_lock(qdisc));
 			qdisc_reset(qdisc);
 			spin_unlock_bh(qdisc_lock(qdisc));
+			if (nolock) {
+				clear_bit(__QDISC_STATE_MISSED, &qdisc->state);
+				clear_bit(__QDISC_STATE_DRAINING, &qdisc->state);
+				spin_unlock_bh(&qdisc->seqlock);
+			}
 		}
 	}
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] net: sched: avoid qdisc_reset_all_tx_gt() vs dequeue race for lockless qdiscs
  2026-02-28 14:53 [PATCH] net: sched: avoid qdisc_reset_all_tx_gt() vs dequeue race for lockless qdiscs Koichiro Den
@ 2026-03-05  2:00 ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 2+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-03-05  2:00 UTC (permalink / raw)
  To: Koichiro Den
  Cc: davem, edumazet, kuba, pabeni, horms, john.fastabend, netdev,
	linux-kernel

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 28 Feb 2026 23:53:07 +0900 you wrote:
> When shrinking the number of real tx queues,
> netif_set_real_num_tx_queues() calls qdisc_reset_all_tx_gt() to flush
> qdiscs for queues which will no longer be used.
> 
> qdisc_reset_all_tx_gt() currently serializes qdisc_reset() with
> qdisc_lock(). However, for lockless qdiscs, the dequeue path is
> serialized by qdisc_run_begin/end() using qdisc->seqlock instead, so
> qdisc_reset() can run concurrently with __qdisc_run() and free skbs
> while they are still being dequeued, leading to UAF.
> 
> [...]

Here is the summary with links:
  - net: sched: avoid qdisc_reset_all_tx_gt() vs dequeue race for lockless qdiscs
    https://git.kernel.org/netdev/net/c/7f083faf59d1

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-03-05  2:00 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-28 14:53 [PATCH] net: sched: avoid qdisc_reset_all_tx_gt() vs dequeue race for lockless qdiscs Koichiro Den
2026-03-05  2:00 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox