Netdev List
 help / color / mirror / Atom feed
* [PATCH 0/2] net/sched: finish the qdisc_dequeue_peeked conversion (taprio, multiq)
@ 2026-06-25  9:51 Bryam Vargas via B4 Relay
  2026-06-25  9:51 ` [PATCH 1/2] net/sched: sch_taprio: Replace direct dequeue call with peek and qdisc_dequeue_peeked Bryam Vargas via B4 Relay
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Bryam Vargas via B4 Relay @ 2026-06-25  9:51 UTC (permalink / raw)
  To: Vinicius Costa Gomes, Paolo Abeni, Jamal Hadi Salim, Jiri Pirko,
	Jakub Kicinski, David S. Miller, Eric Dumazet
  Cc: Simon Horman, netdev, Jarek Poplawski, Vladimir Oltean,
	linux-kernel

Commit 77be155cba4e added peek emulation: a non-work-conserving qdisc's
->peek dequeues one skb and stashes it in the child's gso_skb. A parent
that peeks such a child must then take the packet with
qdisc_dequeue_peeked(), not a direct ->dequeue(), or the stashed skb is
bypassed and the child's qlen/backlog desync. sch_red and sch_sfb were
just fixed for this; taprio and multiq still take the direct path.

With a qfq child the desync re-enters qfq_dequeue on an emptied aggregate
list and dereferences NULL, panicking from softirq on ordinary egress.
taprio reaches it on its own (root-only software path, all gates open);
multiq reaches it when a peeking parent such as tbf wraps it over a
non-work-conserving grandchild. Both need only CAP_NET_ADMIN.

Confirmed under KASAN: the unpatched arm panics, the patched arm is
clean, and a work-conserving-child control is clean. The reproducers and
splats for both are below; the per-patch changes are one line each.

taprio reproducer (self-triggering, no parent qdisc needed):

  ip link add dummy0 numtxqueues 4 type dummy; ip link set dummy0 up
  ip addr add 10.10.11.10/24 dev dummy0
  tc qdisc add dev dummy0 root handle 1: taprio num_tc 2 \
     map 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 1@1 \
     base-time 9000000000000000000 sched-entry S 03 200000 flags 0x0 clockid CLOCK_TAI
  tc qdisc replace dev dummy0 parent 1:1 handle 3: qfq
  tc class  add dev dummy0 classid 3:1 parent 3: qfq maxpkt 512 weight 1
  tc filter add dev dummy0 parent 3: protocol ip prio 1 matchall classid 3:1
  ping -c1 10.10.11.99 -I dummy0

[  903.769174] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] SMP KASAN NOPTI
[  903.769953] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[  903.770456] CPU: 7 UID: 0 PID: 16162 Comm: ping Not tainted 7.1.0-rc5 #1 PREEMPT(lazy)
[  903.771725] RIP: 0010:qfq_dequeue+0x362/0x1580 [sch_qfq]
[  903.777452] Call Trace:
[  903.778311]  taprio_dequeue_from_txq+0x383/0x680 [sch_taprio]
[  903.778685]  taprio_dequeue_tc_priority+0x19a/0x330 [sch_taprio]
[  903.779645]  taprio_dequeue+0xa6/0x330 [sch_taprio]
[  903.780299]  __qdisc_run+0x16c/0x1890
[  903.780854]  __dev_queue_xmit+0x1ece/0x3390
[  903.784109]  ip_finish_output2+0x571/0x1da0
[  903.785996]  ip_output+0x26c/0x4d0
[  903.789572]  ping_v4_sendmsg+0xd22/0x12b0
[  903.796118]  __x64_sys_sendto+0xe0/0x1c0
[  903.796612]  do_syscall_64+0xee/0x590
[  903.818669] Kernel panic - not syncing: Fatal exception in interrupt

multiq reproducer (needs a peeking parent over a stashing child; tbf
values chosen to force it to throttle):

  ip link add dummy0 numtxqueues 2 type dummy; ip link set dummy0 up
  ip addr add 10.10.11.10/24 dev dummy0
  tc qdisc add dev dummy0 root handle 1: tbf rate 88bit burst 1661b \
     peakrate 2257333 minburst 1024 limit 7b
  tc qdisc add dev dummy0 parent 1: handle 2: multiq
  for b in 1 2; do                          # qfq on every band
    tc qdisc  add dev dummy0 parent 2:$b handle 3$b: qfq
    tc class  add dev dummy0 classid 3$b:1 parent 3$b: qfq maxpkt 512 weight 1
    tc filter add dev dummy0 parent 3$b: protocol ip prio 1 matchall classid 3$b:1
  done
  ping -c12 10.10.11.99 -I dummy0

[ 1066.385097] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] SMP KASAN NOPTI
[ 1066.386385] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[ 1066.387227] CPU: 1 UID: 0 PID: 5357 Comm: ping Not tainted 7.1.0-rc5 #1 PREEMPT(lazy)
[ 1066.389183] RIP: 0010:qfq_dequeue+0x362/0x1580 [sch_qfq]
[ 1066.396316] Call Trace:
[ 1066.396768]  multiq_dequeue+0x163/0x360 [sch_multiq]
[ 1066.397885]  tbf_dequeue+0x6b9/0xf17 [sch_tbf]
[ 1066.398269]  __qdisc_run+0x16c/0x1890
[ 1066.399315]  __dev_queue_xmit+0x1ece/0x3390
[ 1066.403276]  ip_finish_output2+0x571/0x1da0
[ 1066.404818]  ip_output+0x26c/0x4d0
[ 1066.408620]  ping_v4_sendmsg+0xd22/0x12b0
[ 1066.415264]  __x64_sys_sendto+0xe0/0x1c0
[ 1066.416251]  do_syscall_64+0xee/0x590
[ 1066.441210] Kernel panic - not syncing: Fatal exception in interrupt

---
Bryam Vargas (2):
      net/sched: sch_taprio: Replace direct dequeue call with peek and qdisc_dequeue_peeked
      net/sched: sch_multiq: Replace direct dequeue call with peek and qdisc_dequeue_peeked

 net/sched/sch_multiq.c | 2 +-
 net/sched/sch_taprio.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
---
base-commit: 02f144fbb4c86c360495d33debe307cb46a57f95
change-id: 20260625-b4-disp-31bcb279-082e59a3aa36

Best regards,
-- 
Bryam Vargas <hexlabsecurity@proton.me>



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-06-27  2:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-25  9:51 [PATCH 0/2] net/sched: finish the qdisc_dequeue_peeked conversion (taprio, multiq) Bryam Vargas via B4 Relay
2026-06-25  9:51 ` [PATCH 1/2] net/sched: sch_taprio: Replace direct dequeue call with peek and qdisc_dequeue_peeked Bryam Vargas via B4 Relay
2026-06-26 17:16   ` Victor Nogueira
2026-06-26 17:46     ` Jamal Hadi Salim
2026-06-25  9:51 ` [PATCH 2/2] net/sched: sch_multiq: " Bryam Vargas via B4 Relay
2026-06-26 17:18   ` Victor Nogueira
2026-06-26 17:47     ` Jamal Hadi Salim
2026-06-27  2:00 ` [PATCH 0/2] net/sched: finish the qdisc_dequeue_peeked conversion (taprio, multiq) patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox