* [PATCH v2 net 1/1] net/sched: sch_dualpi2: fix limit/memlimit enforcement when dequeueing L-queue
From: chia-yu.chang @ 2026-04-16 17:09 UTC (permalink / raw)
To: victor, hxzene, linux-hardening, kees, gustavoars, jhs, jiri,
davem, edumazet, kuba, pabeni, linux-kernel, netdev, horms, ij,
ncardwell, koen.de_schepper, g.white, ingemar.s.johansson,
mirja.kuehlewind, cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Fix dualpi2_change() to correctly enforce updated limit and memlimit values
after a configuration change of the dualpi2 qdisc.
Before this patch, dualpi2_change() always attempted to dequeue packets via
the root qdisc (C-queue) when reducing backlog or memory usage, and
unconditionally assumed that a valid skb will be returned. When traffic
classification results in packets being queued in the L-queue while the
C-queue is empty, this leads to a NULL skb dereference during limit or
memlimit enforcement.
This is fixed by first dequeuing from the C-queue path if it is non-empty.
Once the C-queue is empty, packets are dequeued directly from the L-queue.
Return values from qdisc_dequeue_internal() are checked for both queues. When
dequeuing from the L-queue, the parent qdisc qlen and backlog counters are
updated explicitly to keep overall qdisc statistics consistent.
Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
net/sched/sch_dualpi2.c | 30 +++++++++++++++++++++++++-----
1 file changed, 25 insertions(+), 5 deletions(-)
diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
index fe6f5e889625..5fcec5e6e97d 100644
--- a/net/sched/sch_dualpi2.c
+++ b/net/sched/sch_dualpi2.c
@@ -868,11 +868,31 @@ static int dualpi2_change(struct Qdisc *sch, struct nlattr *opt,
 	old_backlog = sch->qstats.backlog;
 	while (qdisc_qlen(sch) > sch->limit ||
 	       q->memory_used > q->memory_limit) {
-		struct sk_buff *skb = qdisc_dequeue_internal(sch, true);
-
-		q->memory_used -= skb->truesize;
-		qdisc_qstats_backlog_dec(sch, skb);
-		rtnl_qdisc_drop(skb, sch);
+		int c_len = qdisc_qlen(sch) - qdisc_qlen(q->l_queue);
+		struct sk_buff *skb = NULL;
+
+		if (c_len) {
+			skb = qdisc_dequeue_internal(sch, true);
+			if (!skb)
+				break;
+			q->memory_used -= skb->truesize;
+			rtnl_qdisc_drop(skb, sch);
+		} else if (qdisc_qlen(q->l_queue)) {
+			skb = qdisc_dequeue_internal(q->l_queue, true);
+			if (!skb)
+				break;
+			/* Keep the overall qdisc stats consistent */
+			--sch->q.qlen;
+			qdisc_qstats_backlog_dec(sch, skb);
+
+                        q->memory_used -= skb->truesize;
+                        rtnl_qdisc_drop(skb, q->l_queue);
+
+			/* After incrementing the drop counter for the L-queue
+			   via rtnl_qdisc_drop(), update the parent qdisc
+			   drop counter via qdisc_qstats_drop(sch) */
+			qdisc_qstats_drop(sch);
+		}
 	}
 	qdisc_tree_reduce_backlog(sch, old_qlen - qdisc_qlen(sch),
 				  old_backlog - sch->qstats.backlog);
--
2.34.1

* Re: [PATCH v2 net 1/1] net/sched: sch_dualpi2: fix limit/memlimit enforcement when dequeueing L-queue
From: Stephen Hemminger @ 2026-04-16 17:55 UTC (permalink / raw)
To: chia-yu.chang
Cc: victor, hxzene, linux-hardening, kees, gustavoars, jhs, jiri,
davem, edumazet, kuba, pabeni, linux-kernel, netdev, horms, ij,
ncardwell, koen.de_schepper, g.white, ingemar.s.johansson,
mirja.kuehlewind, cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On Thu, 16 Apr 2026 19:09:06 +0200
chia-yu.chang@nokia-bell-labs.com wrote:

> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>
> Fix dualpi2_change() to correctly enforce updated limit and memlimit values
> after a configuration change of the dualpi2 qdisc.
>
> Before this patch, dualpi2_change() always attempted to dequeue packets via
> the root qdisc (C-queue) when reducing backlog or memory usage, and
> unconditionally assumed that a valid skb will be returned. When traffic
> classification results in packets being queued in the L-queue while the
> C-queue is empty, this leads to a NULL skb dereference during limit or
> memlimit enforcement.
>
> This is fixed by first dequeuing from the C-queue path if it is non-empty.
> Once the C-queue is empty, packets are dequeued directly from the L-queue.
> Return values from qdisc_dequeue_internal() are checked for both queues. When
> dequeuing from the L-queue, the parent qdisc qlen and backlog counters are
> updated explicitly to keep overall qdisc statistics consistent.
>
> Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
> Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> ---

I was a little concerned about the complexity of managing qlen here.
But could not find anything obvious.

Turned to AI review and it found some things:

Right fix direction and the reported crash is real. A few issues before
this is ready:

1. The `c_len` construction is fragile. Declared `int`, initialized from
   a `u32 - u32`. If the invariant
   `qdisc_qlen(sch) >= qdisc_qlen(q->l_queue)` is ever violated, you get
   a large positive value, the C-queue branch is taken on an empty
   C-queue, `qdisc_dequeue_internal()` returns NULL, and the loop breaks
   out without draining the L-queue -- leaving the qdisc over limit.
   Simpler and more robust to just compare the two qlens directly and
   drop the delta variable entirely.

2. Missing else/termination. If both branches' conditions are false
   (neither `c_len` nor `qdisc_qlen(q->l_queue)`) but the outer `while`
   still holds because `memory_used > memory_limit`, the loop spins
   forever. An explicit `else break;` guards against an accounting
   desync becoming a hang.

3. Whitespace: two lines in the L-queue branch use spaces instead of
   tabs --

   +                        q->memory_used -= skb->truesize;
   +                        rtnl_qdisc_drop(skb, q->l_queue);

   checkpatch will flag this.

4. Comment style. The three-line comment at the end of the L-queue
   branch doesn't follow the net subsystem multi-line comment style
   (leading ' * ' on continuation lines, closing ' */' on its own line).
   Once the code is cleaner, the comment could also just be dropped or
   shortened to one line.

5. The accounting in the L-queue branch is correct, but only if you
   trace the enqueue invariants carefully: L-queue packets are counted
   in *both* `sch` and `q->l_queue` on enqueue (see dualpi2_enqueue_skb
   lines 413-423), `qdisc_dequeue_internal(q->l_queue, true)` adjusts
   l_queue's side, and the explicit `--sch->q.qlen` +
   `qdisc_qstats_backlog_dec(sch, skb)` adjusts sch's side. Separately,
   the C-queue branch now quietly relies on the post-CVE-2025-39677
   semantics of `qdisc_dequeue_internal()` handling parent backlog --
   which is why the pre-patch `qdisc_qstats_backlog_dec(sch, skb)` could
   be removed. Neither of these load-bearing invariants is documented in
   the code or the commit message. Please add an inline comment in the
   L-queue branch explaining the double-count-on-enqueue, and mention
   the qdisc_dequeue_internal() dependency in the commit log.

6. Commit message / subject. Subject reads as if only the L-queue path
   changed, but the whole drain loop was restructured. Something like
   "sch_dualpi2: drain both C-queue and L-queue in dualpi2_change()"
   would describe it better. Also, on NULL return from
   qdisc_dequeue_internal() the loop silently breaks -- if that ever
   triggers it means qdisc_qlen() > 0 but dequeue returned NULL, which
   is a real invariant violation. Worth a WARN_ON_ONCE().

Suggested shape:

	while (qdisc_qlen(sch) > sch->limit ||
	       q->memory_used > q->memory_limit) {
		struct sk_buff *skb;

		if (qdisc_qlen(sch) > qdisc_qlen(q->l_queue)) {
			skb = qdisc_dequeue_internal(sch, true);
			if (!skb)
				break;
			q->memory_used -= skb->truesize;
			rtnl_qdisc_drop(skb, sch);
		} else if (qdisc_qlen(q->l_queue)) {
			skb = qdisc_dequeue_internal(q->l_queue, true);
			if (!skb)
				break;
			/* L-queue packets are counted in both sch and
			 * l_queue on enqueue; qdisc_dequeue_internal()
			 * handled l_queue, account sch here.
			 */
			sch->q.qlen--;
			qdisc_qstats_backlog_dec(sch, skb);
			q->memory_used -= skb->truesize;
			rtnl_qdisc_drop(skb, q->l_queue);
			qdisc_qstats_drop(sch);
		} else {
			break;
		}
	}

As with any AI feedback, expect it to generate hints but also be wrong.
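
Editorial sketch of the accounting discussed in point 5 and in the suggested shape above. This is a userspace toy model, not kernel code: the struct, the fixed per-packet truesize, and every toy_* name are assumptions made for illustration. It only models the property the thread relies on, namely that the parent counter covers both C-queue and L-queue packets while the child counter covers L-queue packets alone.

```c
#include <assert.h>

/* Hypothetical stand-in for the dualpi2 private state; not the kernel layout. */
struct toy_dualpi2 {
	unsigned int total_qlen;   /* models qdisc_qlen(sch): C + L packets */
	unsigned int l_qlen;       /* models qdisc_qlen(q->l_queue) */
	unsigned int memory_used;  /* sum of truesize over all queued packets */
	unsigned int limit;
	unsigned int memory_limit;
	unsigned int pkt_truesize; /* assumption: every toy packet has this size */
	unsigned int drops;
};

static struct toy_dualpi2 toy_make(unsigned int c_pkts, unsigned int l_pkts,
				   unsigned int truesize, unsigned int limit,
				   unsigned int memory_limit)
{
	struct toy_dualpi2 q = {
		/* L-queue packets are counted in both counters on enqueue. */
		.total_qlen = c_pkts + l_pkts,
		.l_qlen = l_pkts,
		.memory_used = (c_pkts + l_pkts) * truesize,
		.limit = limit,
		.memory_limit = memory_limit,
		.pkt_truesize = truesize,
		.drops = 0,
	};
	return q;
}

/* Drain the C-queue first, then the L-queue, until both limits hold. */
static void toy_drain(struct toy_dualpi2 *q)
{
	while (q->total_qlen > q->limit ||
	       q->memory_used > q->memory_limit) {
		/* The model assumes the parent never counts fewer than the child. */
		assert(q->total_qlen >= q->l_qlen);

		if (q->total_qlen > q->l_qlen) {
			/* C-queue packet: only the parent counter moves. */
			q->total_qlen--;
		} else if (q->l_qlen) {
			/* L-queue packet: child and parent counters both move. */
			q->l_qlen--;
			q->total_qlen--;
		} else {
			break; /* nothing left to drop */
		}
		q->memory_used -= q->pkt_truesize;
		q->drops++;
	}
}
```

With 3 C-queue and 5 L-queue packets and a limit of 4, the model drops the 3 C-queue packets and then 1 L-queue packet, leaving both counters at 4. With only L-queue packets queued (the reported crash scenario) it drains the L-queue directly.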

* RE: [PATCH v2 net 1/1] net/sched: sch_dualpi2: fix limit/memlimit enforcement when dequeueing L-queue
From: Chia-Yu Chang (Nokia) @ 2026-04-16 18:30 UTC (permalink / raw)
To: Stephen Hemminger
Cc: victor@mojatatu.com, hxzene@gmail.com,
linux-hardening@vger.kernel.org, kees@kernel.org,
gustavoars@kernel.org, jhs@mojatatu.com, jiri@resnulli.us,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, horms@kernel.org, ij@kernel.org,
ncardwell@google.com, Koen De Schepper (Nokia),
g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
mirja.kuehlewind@ericsson.com, cheshire@apple.com, rs.ietf@gmx.at,
Jason_Livingood@comcast.com, vidhi_goel@apple.com

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, April 16, 2026 7:55 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>
> Cc: victor@mojatatu.com; hxzene@gmail.com; linux-hardening@vger.kernel.org; kees@kernel.org; gustavoars@kernel.org; jhs@mojatatu.com; jiri@resnulli.us; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; horms@kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
> Subject: Re: [PATCH v2 net 1/1] net/sched: sch_dualpi2: fix limit/memlimit enforcement when dequeueing L-queue
>
> On Thu, 16 Apr 2026 19:09:06 +0200
> chia-yu.chang@nokia-bell-labs.com wrote:
>
> > From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >
> > Fix dualpi2_change() to correctly enforce updated limit and memlimit
> > values after a configuration change of the dualpi2 qdisc.
> >
> > Before this patch, dualpi2_change() always attempted to dequeue
> > packets via the root qdisc (C-queue) when reducing backlog or memory
> > usage, and unconditionally assumed that a valid skb will be returned.
> > When traffic classification results in packets being queued in the
> > L-queue while the C-queue is empty, this leads to a NULL skb
> > dereference during limit or memlimit enforcement.
> >
> > This is fixed by first dequeuing from the C-queue path if it is non-empty.
> > Once the C-queue is empty, packets are dequeued directly from the L-queue.
> > Return values from qdisc_dequeue_internal() are checked for both
> > queues. When dequeuing from the L-queue, the parent qdisc qlen and
> > backlog counters are updated explicitly to keep overall qdisc statistics consistent.
> >
> > Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2
> > qdisc")
> > Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com>
> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> > ---
>
> I was a little concerned about the complexity of managing qlen here.
> But could not find anything obvious.

Hi Stephen,

This fix relies on some existing assumptions of DualPI2.

>
> Turned to AI review and it found some things:
>
> Right fix direction and the reported crash is real. A few issues before this is ready:
>
> 1. The `c_len` construction is fragile. Declared `int`, initialized from a `u32 - u32`. If the invariant `qdisc_qlen(sch) >= qdisc_qlen(q->l_queue)` is ever violated, you get a large positive value, the C-queue branch is taken on an empty C-queue, `qdisc_dequeue_internal()` returns NULL, and the loop breaks out without draining the L-queue -- leaving the qdisc over limit. Simpler and more robust to just compare the two qlens directly and drop the delta variable entirely.
>

In the current dequeue_packet() of DualPI2, we also calculate c_len with
the same approach (line 524).
As we only have the queue length of the L-queue and the combined length
of both C- and L-queues, this is how we derive the queue length of the
C-queue.

> 2. Missing else/termination. If both branches' conditions are false (neither `c_len` nor `qdisc_qlen(q->l_queue)`) but the outer `while` still holds because `memory_used > memory_limit`, the loop spins forever. An explicit `else break;` guards against an accounting desync becoming a hang.
>

This should not happen, but adding an extra else guard is definitely a
good suggestion.

> 3. Whitespace: two lines in the L-queue branch use spaces instead of tabs --
>
> +                        q->memory_used -= skb->truesize;
> +                        rtnl_qdisc_drop(skb, q->l_queue);
>
> checkpatch will flag this.

Sure, I will fix this; sorry for the oversight.

>
> 4. Comment style. The three-line comment at the end of the L-queue branch doesn't follow the net subsystem multi-line comment style (leading ' * ' on continuation lines, closing ' */' on its own line).
> Once the code is cleaner, the comment could also just be dropped or shortened to one line.
>

Thanks, I will fix this as well.

> 5. The accounting in the L-queue branch is correct, but only if you trace the enqueue invariants carefully: L-queue packets are counted in
> *both* `sch` and `q->l_queue` on enqueue (see dualpi2_enqueue_skb lines 413-423), `qdisc_dequeue_internal(q->l_queue, true)` adjusts l_queue's side, and the explicit `--sch->q.qlen` + `qdisc_qstats_backlog_dec(sch, skb)` adjusts sch's side. Separately, the C-queue branch now quietly relies on the post-CVE-2025-39677 semantics of `qdisc_dequeue_internal()` handling parent backlog -- which is why the pre-patch `qdisc_qstats_backlog_dec(sch, skb)` could be removed.
> Neither of these load-bearing invariants is documented in the code or the commit message. Please add an inline comment in the L-queue branch explaining the double-count-on-enqueue, and mention the
> qdisc_dequeue_internal() dependency in the commit log.

Yes, L-queue packets are counted in both the parent qdisc (sch) and the
child qdisc (q->l_queue) during enqueue.
And we reuse qdisc_dequeue_internal() from sch_generic.h for the C-queue
case.

> 6. Commit message / subject. Subject reads as if only the L-queue path changed, but the whole drain loop was restructured. Something like
> "sch_dualpi2: drain both C-queue and L-queue in dualpi2_change()" would describe it better. Also, on NULL return from qdisc_dequeue_internal() the loop silently breaks -- if that ever triggers it means qdisc_qlen() >
> 0 but dequeue returned NULL, which is a real invariant violation.
> Worth a WARN_ON_ONCE().
>
> Suggested shape:
>
> 	while (qdisc_qlen(sch) > sch->limit ||
> 	       q->memory_used > q->memory_limit) {
> 		struct sk_buff *skb;
>
> 		if (qdisc_qlen(sch) > qdisc_qlen(q->l_queue)) {
> 			skb = qdisc_dequeue_internal(sch, true);
> 			if (!skb)
> 				break;
> 			q->memory_used -= skb->truesize;
> 			rtnl_qdisc_drop(skb, sch);
> 		} else if (qdisc_qlen(q->l_queue)) {
> 			skb = qdisc_dequeue_internal(q->l_queue, true);
> 			if (!skb)
> 				break;
> 			/* L-queue packets are counted in both sch and
> 			 * l_queue on enqueue; qdisc_dequeue_internal()
> 			 * handled l_queue, account sch here.
> 			 */
> 			sch->q.qlen--;
> 			qdisc_qstats_backlog_dec(sch, skb);
> 			q->memory_used -= skb->truesize;
> 			rtnl_qdisc_drop(skb, q->l_queue);
> 			qdisc_qstats_drop(sch);
> 		} else {
> 			break;
> 		}
> 	}
>
>
> As with any AI feedback, expect it to generate hints but also be wrong.

I am OK with this suggestion and will apply it in v3.
I would note, though, that the original c_len calculation already exists
in dequeue_packet() of dualpi2. It works because we maintain both parent
and child qdisc statistics during normal enqueue and dequeue operations.

Thanks!
Chia-Yu
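
Editorial sketch comparing the two ways of testing "C-queue non-empty" debated under point 1. It is plain userspace C with hypothetical helper names; uint32_t stands in for the kernel's u32. One hedged observation: on the usual two's-complement targets a wrapped u32 difference stored in an int shows up as a negative number rather than a large positive one, but either way it is nonzero, so `if (c_len)` still takes the C-queue branch. When the invariant total >= L-queue length holds, as the author says it does during normal operation, both forms agree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the v2 form: int c_len = qdisc_qlen(sch) - qdisc_qlen(q->l_queue); */
static int c_len_from_delta(uint32_t total_qlen, uint32_t l_qlen)
{
	/* Unsigned subtraction wraps modulo 2^32 before the conversion to int. */
	return (int)(total_qlen - l_qlen);
}

/* Mirrors the suggested form: compare the two lengths directly. */
static bool c_queue_nonempty(uint32_t total_qlen, uint32_t l_qlen)
{
	return total_qlen > l_qlen;
}

/* Small self-check covering the intact and the violated invariant. */
static void c_len_demo(void)
{
	/* Invariant holds: both forms agree. */
	assert(c_len_from_delta(5, 3) == 2);
	assert(c_queue_nonempty(5, 3));
	assert(c_len_from_delta(3, 3) == 0);
	assert(!c_queue_nonempty(3, 3));

	/* Invariant violated: the delta is nonzero, the comparison is false. */
	assert(c_len_from_delta(2, 3) != 0);
	assert(!c_queue_nonempty(2, 3));
}
```

The direct comparison falls through to the L-queue branch when the counters are inconsistent, which is the robustness argument made in the review; it does not change behavior when the counters are consistent.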

* RE: [PATCH v2 net 1/1] net/sched: sch_dualpi2: fix limit/memlimit enforcement when dequeueing L-queue
From: Ilpo Järvinen @ 2026-04-16 19:35 UTC (permalink / raw)
To: Chia-Yu Chang (Nokia)
Cc: Stephen Hemminger, victor@mojatatu.com, hxzene@gmail.com,
linux-hardening@vger.kernel.org, kees@kernel.org,
gustavoars@kernel.org, jhs@mojatatu.com, jiri@resnulli.us,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, horms@kernel.org, ncardwell@google.com,
Koen De Schepper (Nokia), g.white@cablelabs.com,
ingemar.s.johansson@ericsson.com, mirja.kuehlewind@ericsson.com,
cheshire@apple.com, rs.ietf@gmx.at, Jason_Livingood@comcast.com,
vidhi_goel@apple.com

On Thu, 16 Apr 2026, Chia-Yu Chang (Nokia) wrote:

> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Thursday, April 16, 2026 7:55 PM
> > To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>
> > Cc: victor@mojatatu.com; hxzene@gmail.com; linux-hardening@vger.kernel.org; kees@kernel.org; gustavoars@kernel.org; jhs@mojatatu.com; jiri@resnulli.us; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; horms@kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
> > Subject: Re: [PATCH v2 net 1/1] net/sched: sch_dualpi2: fix limit/memlimit enforcement when dequeueing L-queue
> >
> > On Thu, 16 Apr 2026 19:09:06 +0200
> > chia-yu.chang@nokia-bell-labs.com wrote:
> >
> > > From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> > >
> > > Fix dualpi2_change() to correctly enforce updated limit and memlimit
> > > values after a configuration change of the dualpi2 qdisc.
> > >
> > > Before this patch, dualpi2_change() always attempted to dequeue
> > > packets via the root qdisc (C-queue) when reducing backlog or memory
> > > usage, and unconditionally assumed that a valid skb will be returned.
> > > When traffic classification results in packets being queued in the
> > > L-queue while the C-queue is empty, this leads to a NULL skb
> > > dereference during limit or memlimit enforcement.
> > >
> > > This is fixed by first dequeuing from the C-queue path if it is non-empty.
> > > Once the C-queue is empty, packets are dequeued directly from the L-queue.
> > > Return values from qdisc_dequeue_internal() are checked for both
> > > queues. When dequeuing from the L-queue, the parent qdisc qlen and
> > > backlog counters are updated explicitly to keep overall qdisc statistics consistent.
> > >
> > > Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2
> > > qdisc")
> > > Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com>
> > > Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> > > ---
> >
> > I was a little concerned about the complexity of managing qlen here.
> > But could not find anything obvious.
>
> Hi Stephen,
>
> This fix relies on some existing assumptions of DualPI2.
>
> >
> > Turned to AI review and it found some things:
> >
> > Right fix direction and the reported crash is real. A few issues before this is ready:
> >
> > 1. The `c_len` construction is fragile. Declared `int`, initialized from a `u32 - u32`. If the invariant `qdisc_qlen(sch) >= qdisc_qlen(q->l_queue)` is ever violated, you get a large positive value, the C-queue branch is taken on an empty C-queue, `qdisc_dequeue_internal()` returns NULL, and the loop breaks out without draining the L-queue -- leaving the qdisc over limit. Simpler and more robust to just compare the two qlens directly and drop the delta variable entirely.
> >
>
> In the current dequeue_packet() of DualPI2, we also calculate c_len with
> the same approach (line 524).
> As we only have the queue length of the L-queue and the combined length
> of both C- and L-queues, this is how we derive the queue length of the
> C-queue.
>
> > 2. Missing else/termination. If both branches' conditions are false
> > (neither `c_len` nor `qdisc_qlen(q->l_queue)`) but the outer `while`
> > still holds because `memory_used > memory_limit`, the loop spins
> > forever. An explicit `else break;` guards against an accounting
> > desync becoming a hang.
>
> This should not happen, but adding an extra else guard is definitely a
> good suggestion.

Hi,

Maybe also add WARN_ON_ONCE() there so that such a problem would be
exposed if it ever happens.

-- 
 i.
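
Editorial sketch of the guard suggested here: an else branch that warns once and breaks when the loop is still over limit although both queues are empty. This is a userspace model under stated assumptions. toy_warn_on_once() is a hypothetical stand-in that only imitates the warn-once behavior of the kernel's WARN_ON_ONCE(); the struct and the fixed packet size are likewise invented for illustration, and nothing here is the code that v3 will contain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static unsigned int toy_warn_count; /* number of warnings actually printed */

/* Hypothetical stand-in: report only the first time cond is true, return cond. */
static bool toy_warn_on_once(bool cond, bool *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = true;
		toy_warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

struct toy_state {
	unsigned int total_qlen;   /* C + L packets */
	unsigned int l_qlen;       /* L packets only */
	unsigned int memory_used;
	unsigned int limit;
	unsigned int memory_limit;
	unsigned int pkt_truesize; /* assumption: uniform packet size */
};

/* Returns the number of packets dropped; always terminates. */
static unsigned int toy_drain_guarded(struct toy_state *q)
{
	static bool warned;
	unsigned int dropped = 0;

	while (q->total_qlen > q->limit ||
	       q->memory_used > q->memory_limit) {
		if (q->total_qlen > q->l_qlen) {
			q->total_qlen--;
		} else if (q->l_qlen) {
			q->l_qlen--;
			q->total_qlen--;
		} else {
			/* Over limit with nothing queued: accounting desync. */
			toy_warn_on_once(true, &warned,
					 "over limit but both queues are empty");
			break;
		}
		/* Keep the unsigned counter from wrapping in this model. */
		assert(q->memory_used >= q->pkt_truesize);
		q->memory_used -= q->pkt_truesize;
		dropped++;
	}
	return dropped;
}
```

Feeding the model a state whose memory_used is larger than its queued packets account for makes the loop drop what it can, warn, and stop instead of spinning; a second desynced call stops as well without printing another warning.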