From: Stephen Hemminger <stephen@networkplumber.org>
To: netdev@vger.kernel.org
Cc: Stephen Hemminger <stephen@networkplumber.org>, stable@vger.kernel.org
Subject: [PATCH 04/12] net/sched: netem: restructure dequeue to avoid re-entrancy with child qdisc
Date: Fri, 13 Mar 2026 14:15:04 -0700
Message-ID: <20260313211646.12549-5-stephen@networkplumber.org>
In-Reply-To: <20260313211646.12549-1-stephen@networkplumber.org>

netem_dequeue() currently enqueues time-ready packets into the child
qdisc during the dequeue call path. This creates two problems:

1. Parent qdiscs like HFSC track class active/inactive state based on
   qlen transitions. The child enqueue during netem's dequeue can cause
   qlen to increase while the parent is mid-dequeue, leading to
   double-insertion in HFSC's eltree (CVE-2025-37890, CVE-2025-38001).

2. If the child qdisc is non-work-conserving (e.g., TBF), it may refuse
   to release packets during its dequeue even though they were just
   enqueued. The parent then sees netem returning NULL despite having
   backlog, violating the work-conserving contract and causing stalls
   with parents like DRR that deactivate classes in this case.

Restructure netem_dequeue() so that when a child qdisc is present, all
time-ready packets are transferred from the tfifo to the child in a
batch before asking the child for output. This ensures the child only
receives packets whose delay has already elapsed. The no-child path
(tfifo direct dequeue) is unchanged.

Fixes: 50612537e9ab ("netem: fix classful handling")
Cc: stable@vger.kernel.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
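Illustrative note, not part of the patch: the ordering described in the
commit message (move every time-ready packet to the child first, then ask
the child for output once) can be modelled in user space roughly as
follows. This is only a sketch under stated assumptions. All model_*
names are hypothetical and do not exist in the kernel, slot handling is
omitted, and a full child simply stops the transfer here, whereas the
real code accounts a failed child enqueue as a drop.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_CAP 16

/* Hypothetical stand-in for an skb: only the release time matters here. */
struct model_pkt {
	uint64_t time_to_send;
};

/* Fixed-size ring used for both the time-sorted tfifo and the child. */
struct model_fifo {
	struct model_pkt slot[MODEL_CAP];
	size_t head, len;
};

static int model_push(struct model_fifo *f, struct model_pkt p)
{
	if (f->len == MODEL_CAP)
		return -1;
	f->slot[(f->head + f->len) % MODEL_CAP] = p;
	f->len++;
	return 0;
}

static const struct model_pkt *model_peek(const struct model_fifo *f)
{
	return f->len ? &f->slot[f->head] : NULL;
}

static int model_pop(struct model_fifo *f, struct model_pkt *out)
{
	if (!f->len)
		return -1;
	*out = f->slot[f->head];
	f->head = (f->head + 1) % MODEL_CAP;
	f->len--;
	return 0;
}

/*
 * Move every packet whose time_to_send has elapsed from tfifo to child.
 * Assumes tfifo is ordered by time_to_send, so the loop can stop at the
 * first packet that is not yet ready. Returns the number moved.
 */
static size_t model_transfer_ready(struct model_fifo *tfifo,
				   struct model_fifo *child, uint64_t now)
{
	const struct model_pkt *p;
	size_t moved = 0;

	while ((p = model_peek(tfifo)) != NULL && p->time_to_send <= now) {
		struct model_pkt tmp;

		if (child->len == MODEL_CAP)
			break;	/* simplification: leave the rest queued */
		model_pop(tfifo, &tmp);
		model_push(child, tmp);
		moved++;
	}
	return moved;
}

/* Batch transfer first, then a single dequeue attempt on the child. */
static int model_dequeue(struct model_fifo *tfifo, struct model_fifo *child,
			 uint64_t now, struct model_pkt *out)
{
	model_transfer_ready(tfifo, child, now);
	return model_pop(child, out);
}
```

In this model, a call with "now" earlier than the head packet's
time_to_send leaves both queues untouched and returns -1, which
corresponds to the case where the watchdog would be armed.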
net/sched/sch_netem.c | 82 +++++++++++++++++++++++++++++--------------
1 file changed, 56 insertions(+), 26 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 085fa3ad6f83..08006a60849e 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -726,7 +726,6 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
struct netem_sched_data *q = qdisc_priv(sch);
struct sk_buff *skb;
-tfifo_dequeue:
skb = __qdisc_dequeue_head(&sch->q);
if (skb) {
deliver:
@@ -734,24 +733,28 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
qdisc_bstats_update(sch, skb);
return skb;
}
- skb = netem_peek(q);
- if (skb) {
- u64 time_to_send;
+
+ /* If we have a child qdisc, transfer all time-ready packets
+ * from the tfifo into the child, then dequeue from the child.
+ * This avoids enqueueing into the child during the parent's
+ * dequeue callback, which can confuse parents that track
+ * active/inactive state based on qlen transitions (HFSC).
+ */
+ if (q->qdisc) {
u64 now = ktime_get_ns();
- /* if more time remaining? */
- time_to_send = netem_skb_cb(skb)->time_to_send;
- if (q->slot.slot_next && q->slot.slot_next < time_to_send)
- get_slot_next(q, now);
+ while ((skb = netem_peek(q)) != NULL) {
+ u64 t = netem_skb_cb(skb)->time_to_send;
+
+ if (t > now)
+ break;
+ if (q->slot.slot_next && q->slot.slot_next > now)
+ break;
- if (time_to_send <= now && q->slot.slot_next <= now) {
netem_erase_head(q, skb);
q->t_len--;
skb->next = NULL;
skb->prev = NULL;
- /* skb->dev shares skb->rbnode area,
- * we need to restore its value.
- */
skb->dev = qdisc_dev(sch);
if (q->slot.slot_next) {
@@ -762,7 +765,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
get_slot_next(q, now);
}
- if (q->qdisc) {
+ {
unsigned int pkt_len = qdisc_pkt_len(skb);
struct sk_buff *to_free = NULL;
int err;
@@ -774,34 +777,61 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
qdisc_qstats_drop(sch);
sch->qstats.backlog -= pkt_len;
sch->q.qlen--;
- qdisc_tree_reduce_backlog(sch, 1, pkt_len);
+ qdisc_tree_reduce_backlog(sch,
+ 1, pkt_len);
}
- goto tfifo_dequeue;
}
+ }
+
+ skb = q->qdisc->ops->dequeue(q->qdisc);
+ if (skb) {
sch->q.qlen--;
goto deliver;
}
-
- if (q->qdisc) {
- skb = q->qdisc->ops->dequeue(q->qdisc);
- if (skb) {
+ } else {
+ /* No child qdisc: dequeue directly from tfifo */
+ skb = netem_peek(q);
+ if (skb) {
+ u64 time_to_send;
+ u64 now = ktime_get_ns();
+
+ time_to_send = netem_skb_cb(skb)->time_to_send;
+ if (q->slot.slot_next &&
+ q->slot.slot_next < time_to_send)
+ get_slot_next(q, now);
+
+ if (time_to_send <= now &&
+ q->slot.slot_next <= now) {
+ netem_erase_head(q, skb);
+ q->t_len--;
+ skb->next = NULL;
+ skb->prev = NULL;
+ skb->dev = qdisc_dev(sch);
+
+ if (q->slot.slot_next) {
+ q->slot.packets_left--;
+ q->slot.bytes_left -=
+ qdisc_pkt_len(skb);
+ if (q->slot.packets_left <= 0 ||
+ q->slot.bytes_left <= 0)
+ get_slot_next(q, now);
+ }
sch->q.qlen--;
goto deliver;
}
}
+ }
+
+ /* Schedule watchdog for next time-ready packet */
+ skb = netem_peek(q);
+ if (skb) {
+ u64 time_to_send = netem_skb_cb(skb)->time_to_send;
qdisc_watchdog_schedule_ns(&q->watchdog,
max(time_to_send,
q->slot.slot_next));
}
- if (q->qdisc) {
- skb = q->qdisc->ops->dequeue(q->qdisc);
- if (skb) {
- sch->q.qlen--;
- goto deliver;
- }
- }
return NULL;
}
--
2.51.0