* [PATCH 0/2] net/sched: finish the qdisc_dequeue_peeked conversion (taprio, multiq)
From: Bryam Vargas via B4 Relay @ 2026-06-25 9:51 UTC
To: Vinicius Costa Gomes, Paolo Abeni, Jamal Hadi Salim, Jiri Pirko,
Jakub Kicinski, David S. Miller, Eric Dumazet
Cc: Simon Horman, netdev, Jarek Poplawski, Vladimir Oltean,
linux-kernel
Commit 77be155cba4e added peek emulation: a non-work-conserving qdisc's
->peek dequeues one skb and stashes it in the child's gso_skb. A parent
that peeks such a child must then take the packet with
qdisc_dequeue_peeked(), not a direct ->dequeue(), or the stashed skb is
bypassed and the child's qlen/backlog desync. sch_red and sch_sfb were
just fixed for this; taprio and multiq still take the direct path.
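The stash-then-take contract can be illustrated with a small userspace model. This is a sketch, not kernel code: toy_skb, toy_qdisc and the toy_* functions are hypothetical stand-ins for struct sk_buff, struct Qdisc, the child's ->dequeue(), the peek emulation and qdisc_dequeue_peeked(). Backlog accounting and qfq's internal state are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for struct sk_buff and struct Qdisc (hypothetical names). */
struct toy_skb { int id; };

struct toy_qdisc {
	struct toy_skb *fifo[8];	/* the child's internal queue */
	int head, tail;
	struct toy_skb *stash;		/* models the child's gso_skb slot */
	int qlen;			/* packets accounted for, stash included */
};

void toy_enqueue(struct toy_qdisc *q, struct toy_skb *skb)
{
	assert(q->tail < 8);
	q->fifo[q->tail++] = skb;
	q->qlen++;
}

/* The child's own ->dequeue(): it knows nothing about the stash. */
struct toy_skb *toy_dequeue(struct toy_qdisc *q)
{
	if (q->head == q->tail)
		return NULL;
	q->qlen--;
	return q->fifo[q->head++];
}

/* Peek emulation: dequeue one skb, park it in the stash, keep counting it. */
struct toy_skb *toy_peek(struct toy_qdisc *q)
{
	if (!q->stash) {
		q->stash = toy_dequeue(q);
		if (q->stash)
			q->qlen++;
	}
	return q->stash;
}

/* Models qdisc_dequeue_peeked(): stash first, otherwise plain dequeue. */
struct toy_skb *toy_dequeue_peeked(struct toy_qdisc *q)
{
	struct toy_skb *skb = q->stash;

	if (!skb)
		return toy_dequeue(q);
	q->stash = NULL;
	q->qlen--;
	return skb;
}

struct outcome { int taken_id; int stranded; };

/* Parent behaviour: peek once, then take a packet with `take`. */
struct outcome peek_then_take(struct toy_skb *(*take)(struct toy_qdisc *))
{
	struct toy_skb a = { 1 }, b = { 2 };
	struct toy_qdisc child = { 0 };
	struct outcome out;

	toy_enqueue(&child, &a);
	toy_enqueue(&child, &b);
	toy_peek(&child);			/* stashes packet 1 */
	out.taken_id = take(&child)->id;
	out.stranded = child.stash != NULL;	/* peeked packet left behind? */
	return out;
}

/* No prior peek: the helper falls through to the child's own dequeue. */
int helper_without_peek(void)
{
	struct toy_skb a = { 1 };
	struct toy_qdisc child = { 0 };

	toy_enqueue(&child, &a);
	return toy_dequeue_peeked(&child)->id;
}
```

In this model, peek_then_take(toy_dequeue_peeked) hands back packet 1 with nothing stranded, while peek_then_take(toy_dequeue) hands back packet 2 and leaves packet 1 stranded in the stash. The model shows only the bypass; it does not reproduce the resulting NULL dereference in qfq.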
With a qfq child the desync re-enters qfq_dequeue on an emptied aggregate
list and dereferences NULL, panicking from softirq on ordinary egress.
taprio reaches it on its own (root-only software path, all gates open);
multiq reaches it when a peeking parent such as tbf wraps it over a
non-work-conserving grandchild. Both need only CAP_NET_ADMIN.
Confirmed under KASAN: the unpatched arm panics, the patched arm is
clean, and a work-conserving-child control is clean. The reproducers and
splats for both are below; the per-patch changes are one line each.
taprio reproducer (self-triggering, no parent qdisc needed):
    ip link add dummy0 numtxqueues 4 type dummy; ip link set dummy0 up
    ip addr add 10.10.11.10/24 dev dummy0
    tc qdisc add dev dummy0 root handle 1: taprio num_tc 2 \
        map 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 1@1 \
        base-time 9000000000000000000 sched-entry S 03 200000 flags 0x0 clockid CLOCK_TAI
    tc qdisc replace dev dummy0 parent 1:1 handle 3: qfq
    tc class add dev dummy0 classid 3:1 parent 3: qfq maxpkt 512 weight 1
    tc filter add dev dummy0 parent 3: protocol ip prio 1 matchall classid 3:1
    ping -c1 10.10.11.99 -I dummy0
[ 903.769174] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] SMP KASAN NOPTI
[ 903.769953] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[ 903.770456] CPU: 7 UID: 0 PID: 16162 Comm: ping Not tainted 7.1.0-rc5 #1 PREEMPT(lazy)
[ 903.771725] RIP: 0010:qfq_dequeue+0x362/0x1580 [sch_qfq]
[ 903.777452] Call Trace:
[ 903.778311] taprio_dequeue_from_txq+0x383/0x680 [sch_taprio]
[ 903.778685] taprio_dequeue_tc_priority+0x19a/0x330 [sch_taprio]
[ 903.779645] taprio_dequeue+0xa6/0x330 [sch_taprio]
[ 903.780299] __qdisc_run+0x16c/0x1890
[ 903.780854] __dev_queue_xmit+0x1ece/0x3390
[ 903.784109] ip_finish_output2+0x571/0x1da0
[ 903.785996] ip_output+0x26c/0x4d0
[ 903.789572] ping_v4_sendmsg+0xd22/0x12b0
[ 903.796118] __x64_sys_sendto+0xe0/0x1c0
[ 903.796612] do_syscall_64+0xee/0x590
[ 903.818669] Kernel panic - not syncing: Fatal exception in interrupt
multiq reproducer (needs a peeking parent over a stashing child; tbf
values chosen to force it to throttle):
    ip link add dummy0 numtxqueues 2 type dummy; ip link set dummy0 up
    ip addr add 10.10.11.10/24 dev dummy0
    tc qdisc add dev dummy0 root handle 1: tbf rate 88bit burst 1661b \
        peakrate 2257333 minburst 1024 limit 7b
    tc qdisc add dev dummy0 parent 1: handle 2: multiq
    for b in 1 2; do # qfq on every band
        tc qdisc add dev dummy0 parent 2:$b handle 3$b: qfq
        tc class add dev dummy0 classid 3$b:1 parent 3$b: qfq maxpkt 512 weight 1
        tc filter add dev dummy0 parent 3$b: protocol ip prio 1 matchall classid 3$b:1
    done
    ping -c12 10.10.11.99 -I dummy0
[ 1066.385097] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] SMP KASAN NOPTI
[ 1066.386385] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[ 1066.387227] CPU: 1 UID: 0 PID: 5357 Comm: ping Not tainted 7.1.0-rc5 #1 PREEMPT(lazy)
[ 1066.389183] RIP: 0010:qfq_dequeue+0x362/0x1580 [sch_qfq]
[ 1066.396316] Call Trace:
[ 1066.396768] multiq_dequeue+0x163/0x360 [sch_multiq]
[ 1066.397885] tbf_dequeue+0x6b9/0xf17 [sch_tbf]
[ 1066.398269] __qdisc_run+0x16c/0x1890
[ 1066.399315] __dev_queue_xmit+0x1ece/0x3390
[ 1066.403276] ip_finish_output2+0x571/0x1da0
[ 1066.404818] ip_output+0x26c/0x4d0
[ 1066.408620] ping_v4_sendmsg+0xd22/0x12b0
[ 1066.415264] __x64_sys_sendto+0xe0/0x1c0
[ 1066.416251] do_syscall_64+0xee/0x590
[ 1066.441210] Kernel panic - not syncing: Fatal exception in interrupt
---
Bryam Vargas (2):
net/sched: sch_taprio: Replace direct dequeue call with peek and qdisc_dequeue_peeked
net/sched: sch_multiq: Replace direct dequeue call with peek and qdisc_dequeue_peeked
net/sched/sch_multiq.c | 2 +-
net/sched/sch_taprio.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
---
base-commit: 02f144fbb4c86c360495d33debe307cb46a57f95
change-id: 20260625-b4-disp-31bcb279-082e59a3aa36
Best regards,
--
Bryam Vargas <hexlabsecurity@proton.me>
* [PATCH 1/2] net/sched: sch_taprio: Replace direct dequeue call with peek and qdisc_dequeue_peeked
From: Bryam Vargas via B4 Relay @ 2026-06-25 9:51 UTC
To: Vinicius Costa Gomes, Paolo Abeni, Jamal Hadi Salim, Jiri Pirko,
Jakub Kicinski, David S. Miller, Eric Dumazet
Cc: Simon Horman, netdev, Jarek Poplawski, Vladimir Oltean,
linux-kernel
From: Bryam Vargas <hexlabsecurity@proton.me>
When taprio's software path peeks a non-work-conserving child qdisc, the
child stashes the peeked skb in its gso_skb; taprio_dequeue_from_txq()
then takes the packet with a direct child ->dequeue() call, which ignores
that stash, orphans the peeked skb and desyncs the child's qlen/backlog.
With a qfq child this re-enters the child on an emptied list and
dereferences NULL, panicking the kernel from softirq on ordinary egress.
Take the packet through qdisc_dequeue_peeked(), as sch_red and sch_sfb
now do. The helper returns the child's stashed skb first and falls back
to the child's ->dequeue() when there is none, so a work-conserving child
is unaffected and the gated path now consumes the skb whose length was
charged to the budget.
Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Cc: stable@vger.kernel.org
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
net/sched/sch_taprio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 558987d9b977..299234a5f0fe 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -749,7 +749,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
return NULL;
skip_peek_checks:
- skb = child->ops->dequeue(child);
+ skb = qdisc_dequeue_peeked(child);
if (unlikely(!skb))
return NULL;
--
2.43.0
* Re: [PATCH 1/2] net/sched: sch_taprio: Replace direct dequeue call with peek and qdisc_dequeue_peeked
From: Victor Nogueira @ 2026-06-26 17:16 UTC
To: hexlabsecurity, Vinicius Costa Gomes, Paolo Abeni,
Jamal Hadi Salim, Jiri Pirko, Jakub Kicinski, David S. Miller,
Eric Dumazet
Cc: Simon Horman, netdev, Jarek Poplawski, Vladimir Oltean,
linux-kernel
On 25/06/2026 06:51, Bryam Vargas via B4 Relay wrote:
> From: Bryam Vargas <hexlabsecurity@proton.me>
>
> When taprio's software path peeks a non-work-conserving child qdisc, the
> child stashes the peeked skb in its gso_skb; taprio_dequeue_from_txq()
> then takes the packet with a direct child ->dequeue() call, which ignores
> that stash, orphans the peeked skb and desyncs the child's qlen/backlog.
> With a qfq child this re-enters the child on an emptied list and
> dereferences NULL, panicking the kernel from softirq on ordinary egress.
>
> Take the packet through qdisc_dequeue_peeked(), as sch_red and sch_sfb
> now do. The helper returns the child's stashed skb first and is a no-op
> when there is none, so a work-conserving child is unaffected and the
> gated path now consumes the skb whose length was charged to the budget.
>
> Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
> Cc: stable@vger.kernel.org
> Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
> Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
* Re: [PATCH 1/2] net/sched: sch_taprio: Replace direct dequeue call with peek and qdisc_dequeue_peeked
From: Jamal Hadi Salim @ 2026-06-26 17:46 UTC
To: Victor Nogueira
Cc: hexlabsecurity, Vinicius Costa Gomes, Paolo Abeni, Jiri Pirko,
Jakub Kicinski, David S. Miller, Eric Dumazet, Simon Horman,
netdev, Jarek Poplawski, Vladimir Oltean, linux-kernel
On Fri, Jun 26, 2026 at 1:16 PM Victor Nogueira <victor@mojatatu.com> wrote:
>
> On 25/06/2026 06:51, Bryam Vargas via B4 Relay wrote:
> > From: Bryam Vargas <hexlabsecurity@proton.me>
> >
> > When taprio's software path peeks a non-work-conserving child qdisc, the
> > child stashes the peeked skb in its gso_skb; taprio_dequeue_from_txq()
> > then takes the packet with a direct child ->dequeue() call, which ignores
> > that stash, orphans the peeked skb and desyncs the child's qlen/backlog.
> > With a qfq child this re-enters the child on an emptied list and
> > dereferences NULL, panicking the kernel from softirq on ordinary egress.
> >
> > Take the packet through qdisc_dequeue_peeked(), as sch_red and sch_sfb
> > now do. The helper returns the child's stashed skb first and is a no-op
> > when there is none, so a work-conserving child is unaffected and the
> > gated path now consumes the skb whose length was charged to the budget.
> >
> > Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
> > Cc: stable@vger.kernel.org
> > Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
> > Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
>
> Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
* [PATCH 2/2] net/sched: sch_multiq: Replace direct dequeue call with peek and qdisc_dequeue_peeked
From: Bryam Vargas via B4 Relay @ 2026-06-25 9:51 UTC
To: Vinicius Costa Gomes, Paolo Abeni, Jamal Hadi Salim, Jiri Pirko,
Jakub Kicinski, David S. Miller, Eric Dumazet
Cc: Simon Horman, netdev, Jarek Poplawski, Vladimir Oltean,
linux-kernel
From: Bryam Vargas <hexlabsecurity@proton.me>
multiq_dequeue() takes a packet from a band's child with a direct
->dequeue() call after multiq_peek() peeked it. When the child is
non-work-conserving the peek stashes the skb in the child's gso_skb, so
the direct dequeue returns a different skb and orphans the stash,
desyncing the child's qlen/backlog. With a qfq child reached through a
peeking parent (e.g. tbf) this re-enters the child on an emptied list and
dereferences NULL, panicking the kernel from softirq on ordinary egress.
Take the packet through qdisc_dequeue_peeked(), as sch_prio already does
and as sch_red and sch_sfb were just fixed to do. The helper falls back to
the child's ->dequeue() when the child has no stash, so a work-conserving
child is unaffected.
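The multiq case involves three layers, which a toy model can make concrete. This is a minimal sketch assuming a single band, with hypothetical names throughout: toy_leaf for the qfq-like child, toy_mid for multiq, top_peek_then_take() for a tbf-like parent. In the kernel the parent calls qdisc_dequeue_peeked() on multiq, which should find no stash on multiq itself and so end up in multiq_dequeue(); the model stands in for that by calling mid_dequeue() directly.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb { int id; };

/* Leaf child standing in for a non-work-conserving qdisc such as qfq. */
struct toy_leaf {
	struct toy_skb *fifo[4];
	int head, tail;
	struct toy_skb *stash;	/* models the leaf's gso_skb slot */
};

/* The leaf's own ->dequeue(): ignores the stash. */
struct toy_skb *leaf_dequeue(struct toy_leaf *l)
{
	return l->head == l->tail ? NULL : l->fifo[l->head++];
}

/* Peek emulation on the leaf: dequeue one packet and park it. */
struct toy_skb *leaf_peek(struct toy_leaf *l)
{
	if (!l->stash)
		l->stash = leaf_dequeue(l);
	return l->stash;
}

/* Models qdisc_dequeue_peeked() applied to the leaf. */
struct toy_skb *leaf_dequeue_peeked(struct toy_leaf *l)
{
	struct toy_skb *skb = l->stash;

	if (!skb)
		return leaf_dequeue(l);
	l->stash = NULL;
	return skb;
}

/* Middle layer standing in for multiq with one band. */
struct toy_mid {
	struct toy_leaf *band;
	int use_helper;	/* 0: direct dequeue (old), 1: dequeue_peeked (patched) */
};

/* Like multiq_peek(): forwards the peek to the band's child. */
struct toy_skb *mid_peek(struct toy_mid *m)
{
	return leaf_peek(m->band);
}

/* Like multiq_dequeue(): takes the packet from the band's child. */
struct toy_skb *mid_dequeue(struct toy_mid *m)
{
	return m->use_helper ? leaf_dequeue_peeked(m->band)
			     : leaf_dequeue(m->band);
}

/*
 * Top layer standing in for tbf: peek to inspect the packet, then take it.
 * Returns 1 when the packet taken is the one that was peeked and the
 * leaf's stash is empty afterwards, 0 otherwise.
 */
int top_peek_then_take(int use_helper)
{
	struct toy_skb a = { 1 }, b = { 2 };
	struct toy_leaf leaf = { .fifo = { &a, &b }, .tail = 2 };
	struct toy_mid mid = { &leaf, use_helper };
	struct toy_skb *peeked = mid_peek(&mid);
	struct toy_skb *taken = mid_dequeue(&mid);

	return peeked == taken && leaf.stash == NULL;
}
```

With use_helper set to 0 the parent peeks packet 1 but receives packet 2, and packet 1 stays in the leaf's stash; with use_helper set to 1 the peeked packet is the one delivered.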
Fixes: 77be155cba4e ("pkt_sched: Add peek emulation for non-work-conserving qdiscs.")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
net/sched/sch_multiq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 4e465d11e3d7..a467dd122369 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -103,7 +103,7 @@ static struct sk_buff *multiq_dequeue(struct Qdisc *sch)
if (!netif_xmit_stopped(
netdev_get_tx_queue(qdisc_dev(sch), q->curband))) {
qdisc = q->queues[q->curband];
- skb = qdisc->dequeue(qdisc);
+ skb = qdisc_dequeue_peeked(qdisc);
if (skb) {
qdisc_bstats_update(sch, skb);
qdisc_qlen_dec(sch);
--
2.43.0
* Re: [PATCH 2/2] net/sched: sch_multiq: Replace direct dequeue call with peek and qdisc_dequeue_peeked
From: Victor Nogueira @ 2026-06-26 17:18 UTC
To: hexlabsecurity, Vinicius Costa Gomes, Paolo Abeni,
Jamal Hadi Salim, Jiri Pirko, Jakub Kicinski, David S. Miller,
Eric Dumazet
Cc: Simon Horman, netdev, Jarek Poplawski, Vladimir Oltean,
linux-kernel
On 25/06/2026 06:51, Bryam Vargas via B4 Relay wrote:
> From: Bryam Vargas <hexlabsecurity@proton.me>
>
> multiq_dequeue() takes a packet from a band's child with a direct
> ->dequeue() call after multiq_peek() peeked it. When the child is
> non-work-conserving the peek stashes the skb in the child's gso_skb, so
> the direct dequeue returns a different skb and orphans the stash,
> desyncing the child's qlen/backlog. With a qfq child reached through a
> peeking parent (e.g. tbf) this re-enters the child on an emptied list and
> dereferences NULL, panicking the kernel from softirq on ordinary egress.
>
> Take the packet through qdisc_dequeue_peeked(), as sch_prio already does
> and as sch_red and sch_sfb were just fixed to do. The helper is a no-op
> when the child has no stash, so a work-conserving child is unaffected.
>
> Fixes: 77be155cba4e ("pkt_sched: Add peek emulation for non-work-conserving qdiscs.")
> Cc: stable@vger.kernel.org
> Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
* Re: [PATCH 2/2] net/sched: sch_multiq: Replace direct dequeue call with peek and qdisc_dequeue_peeked
From: Jamal Hadi Salim @ 2026-06-26 17:47 UTC
To: Victor Nogueira
Cc: hexlabsecurity, Vinicius Costa Gomes, Paolo Abeni, Jiri Pirko,
Jakub Kicinski, David S. Miller, Eric Dumazet, Simon Horman,
netdev, Jarek Poplawski, Vladimir Oltean, linux-kernel
On Fri, Jun 26, 2026 at 1:18 PM Victor Nogueira <victor@mojatatu.com> wrote:
>
> On 25/06/2026 06:51, Bryam Vargas via B4 Relay wrote:
> > From: Bryam Vargas <hexlabsecurity@proton.me>
> >
> > multiq_dequeue() takes a packet from a band's child with a direct
> > ->dequeue() call after multiq_peek() peeked it. When the child is
> > non-work-conserving the peek stashes the skb in the child's gso_skb, so
> > the direct dequeue returns a different skb and orphans the stash,
> > desyncing the child's qlen/backlog. With a qfq child reached through a
> > peeking parent (e.g. tbf) this re-enters the child on an emptied list and
> > dereferences NULL, panicking the kernel from softirq on ordinary egress.
> >
> > Take the packet through qdisc_dequeue_peeked(), as sch_prio already does
> > and as sch_red and sch_sfb were just fixed to do. The helper is a no-op
> > when the child has no stash, so a work-conserving child is unaffected.
> >
> > Fixes: 77be155cba4e ("pkt_sched: Add peek emulation for non-work-conserving qdiscs.")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
>
> Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
* Re: [PATCH 0/2] net/sched: finish the qdisc_dequeue_peeked conversion (taprio, multiq)
From: patchwork-bot+netdevbpf @ 2026-06-27 2:00 UTC
To: Bryam Vargas
Cc: vinicius.gomes, pabeni, jhs, jiri, kuba, davem, edumazet, horms,
netdev, jarkao2, vladimir.oltean, linux-kernel
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 25 Jun 2026 04:51:18 -0500 you wrote:
> Commit 77be155cba4e added peek emulation: a non-work-conserving qdisc's
> ->peek dequeues one skb and stashes it in the child's gso_skb. A parent
> that peeks such a child must then take the packet with
> qdisc_dequeue_peeked(), not a direct ->dequeue(), or the stashed skb is
> bypassed and the child's qlen/backlog desync. sch_red and sch_sfb were
> just fixed for this; taprio and multiq still take the direct path.
>
> [...]
Here is the summary with links:
- [1/2] net/sched: sch_taprio: Replace direct dequeue call with peek and qdisc_dequeue_peeked
https://git.kernel.org/netdev/net/c/e056e1dfcddc
- [2/2] net/sched: sch_multiq: Replace direct dequeue call with peek and qdisc_dequeue_peeked
https://git.kernel.org/netdev/net/c/54f6b0c843e2
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html