From: Stephen Hemminger <stephen@networkplumber.org>
To: netdev@vger.kernel.org
Cc: Stephen Hemminger <stephen@networkplumber.org>, stable@vger.kernel.org
Subject: [PATCH net v2 04/10] net/sched: netem: restructure dequeue to avoid re-entrancy with child qdisc
Date: Sat, 14 Mar 2026 17:14:08 -0700
Message-ID: <20260315001649.23931-5-stephen@networkplumber.org>
In-Reply-To: <20260315001649.23931-1-stephen@networkplumber.org>

netem_dequeue() currently enqueues time-ready packets into the child
qdisc from within its own dequeue call path. This causes two problems:

1. Parent qdiscs such as HFSC track class active/inactive state based on
qlen transitions. Enqueueing into the child during netem's dequeue can
make qlen increase while the parent is in the middle of its own
dequeue, leading to double insertion into HFSC's eltree
(CVE-2025-37890, CVE-2025-38001).

2. If the child qdisc is non-work-conserving (e.g., TBF), it may refuse
to release packets from its dequeue even though they were just
enqueued. The parent then sees netem return NULL while netem still has
backlog. This violates the work-conserving contract and causes stalls
with parents such as DRR that deactivate classes in this case.

Restructure netem_dequeue() so that, when a child qdisc is present, all
time-ready packets are transferred from the tfifo to the child in a
single batch before the child is asked for output. This ensures the
child only receives packets whose delay has already elapsed. The
no-child path (dequeueing directly from the tfifo) is functionally
unchanged.

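The new ordering can be illustrated with a small userspace model in C. This is a sketch under stated assumptions, not the kernel code. The child is modeled as a plain bounded FIFO. netem's slot logic (q->slot) and byte backlog accounting are left out. The tfifo is a sorted array rather than the kernel's rbtree-based tfifo. All names below (struct model, tfifo_insert(), model_dequeue(), and so on) are hypothetical.

```c
/* Hypothetical userspace model of the batched netem dequeue; not kernel code. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CAP 16

/* Hypothetical packet: only the fields the model needs. */
struct pkt {
	int id;
	uint64_t time_to_send;
};

/* Stand-in for the child qdisc: a plain bounded FIFO. */
struct child_fifo {
	struct pkt buf[MODEL_CAP];
	int head, len;
};

struct model {
	struct pkt tfifo[MODEL_CAP];	/* kept sorted by time_to_send */
	int t_len;
	struct child_fifo child;
	unsigned int qlen;		/* like sch->q.qlen: tfifo + child */
	unsigned int drops;
};

static int child_enqueue(struct child_fifo *c, struct pkt p)
{
	if (c->len == MODEL_CAP)
		return -1;
	c->buf[(c->head + c->len) % MODEL_CAP] = p;
	c->len++;
	return 0;
}

static int child_dequeue(struct child_fifo *c, struct pkt *out)
{
	if (c->len == 0)
		return -1;
	*out = c->buf[c->head];
	c->head = (c->head + 1) % MODEL_CAP;
	c->len--;
	return 0;
}

/* Enqueue side: insert in time_to_send order (stable for ties). */
static int tfifo_insert(struct model *m, struct pkt p)
{
	int i = m->t_len;

	if (m->t_len == MODEL_CAP)
		return -1;
	while (i > 0 && m->tfifo[i - 1].time_to_send > p.time_to_send) {
		m->tfifo[i] = m->tfifo[i - 1];
		i--;
	}
	m->tfifo[i] = p;
	m->t_len++;
	m->qlen++;
	return 0;
}

/* Returns 1 and fills *out if a packet is delivered; otherwise returns
 * 0 and sets *wakeup to the earliest pending time_to_send (0 if none),
 * i.e. the point where the kernel code would arm its watchdog. */
static int model_dequeue(struct model *m, uint64_t now,
			 struct pkt *out, uint64_t *wakeup)
{
	int moved = 0;
	int delivered;

	*wakeup = 0;
	/* Phase 1: move every time-ready packet to the child in one batch. */
	while (moved < m->t_len && m->tfifo[moved].time_to_send <= now) {
		if (child_enqueue(&m->child, m->tfifo[moved]) != 0) {
			m->qlen--;	/* child full: drop the packet */
			m->drops++;
		}
		moved++;
	}
	memmove(m->tfifo, m->tfifo + moved,
		(size_t)(m->t_len - moved) * sizeof(m->tfifo[0]));
	m->t_len -= moved;

	/* Phase 2: only now ask the child for a packet. */
	delivered = child_dequeue(&m->child, out) == 0;
	if (delivered)
		m->qlen--;
	else if (m->t_len > 0)
		*wakeup = m->tfifo[0].time_to_send;
	assert(m->qlen == (unsigned int)(m->t_len + m->child.len));
	return delivered;
}
```

With packets due at 50, 100 and 200, a call to model_dequeue() at now = 150 moves both ready packets into the child before either is dequeued. That is the batching the patch introduces. The FIFO stand-in always releases what it holds, so this model does not reproduce a non-work-conserving child such as TBF returning nothing in phase 2.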
Fixes: 50612537e9ab ("netem: fix classful handling")
Cc: stable@vger.kernel.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
net/sched/sch_netem.c | 79 +++++++++++++++++++++++++++++--------------
1 file changed, 54 insertions(+), 25 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 085fa3ad6f83..7488ff9f2933 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -726,7 +726,6 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
struct netem_sched_data *q = qdisc_priv(sch);
struct sk_buff *skb;
-tfifo_dequeue:
skb = __qdisc_dequeue_head(&sch->q);
if (skb) {
deliver:
@@ -734,24 +733,28 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
qdisc_bstats_update(sch, skb);
return skb;
}
- skb = netem_peek(q);
- if (skb) {
- u64 time_to_send;
+
+ /* If we have a child qdisc, transfer all time-ready packets
+ * from the tfifo into the child, then dequeue from the child.
+ * This avoids enqueueing into the child during the parent's
+ * dequeue callback, which can confuse parents that track
+ * active/inactive state based on qlen transitions (HFSC).
+ */
+ if (q->qdisc) {
u64 now = ktime_get_ns();
- /* if more time remaining? */
- time_to_send = netem_skb_cb(skb)->time_to_send;
- if (q->slot.slot_next && q->slot.slot_next < time_to_send)
- get_slot_next(q, now);
+ while ((skb = netem_peek(q)) != NULL) {
+ u64 t = netem_skb_cb(skb)->time_to_send;
+
+ if (t > now)
+ break;
+ if (q->slot.slot_next && q->slot.slot_next > now)
+ break;
- if (time_to_send <= now && q->slot.slot_next <= now) {
netem_erase_head(q, skb);
q->t_len--;
skb->next = NULL;
skb->prev = NULL;
- /* skb->dev shares skb->rbnode area,
- * we need to restore its value.
- */
skb->dev = qdisc_dev(sch);
if (q->slot.slot_next) {
@@ -762,7 +765,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
get_slot_next(q, now);
}
- if (q->qdisc) {
+ {
unsigned int pkt_len = qdisc_pkt_len(skb);
struct sk_buff *to_free = NULL;
int err;
@@ -776,32 +779,58 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
sch->q.qlen--;
qdisc_tree_reduce_backlog(sch, 1, pkt_len);
}
- goto tfifo_dequeue;
}
+ }
+
+ skb = q->qdisc->ops->dequeue(q->qdisc);
+ if (skb) {
sch->q.qlen--;
goto deliver;
}
-
- if (q->qdisc) {
- skb = q->qdisc->ops->dequeue(q->qdisc);
- if (skb) {
+ } else {
+ /* No child qdisc: dequeue directly from tfifo */
+ skb = netem_peek(q);
+ if (skb) {
+ u64 time_to_send;
+ u64 now = ktime_get_ns();
+
+ time_to_send = netem_skb_cb(skb)->time_to_send;
+ if (q->slot.slot_next &&
+ q->slot.slot_next < time_to_send)
+ get_slot_next(q, now);
+
+ if (time_to_send <= now &&
+ q->slot.slot_next <= now) {
+ netem_erase_head(q, skb);
+ q->t_len--;
+ skb->next = NULL;
+ skb->prev = NULL;
+ skb->dev = qdisc_dev(sch);
+
+ if (q->slot.slot_next) {
+ q->slot.packets_left--;
+ q->slot.bytes_left -=
+ qdisc_pkt_len(skb);
+ if (q->slot.packets_left <= 0 ||
+ q->slot.bytes_left <= 0)
+ get_slot_next(q, now);
+ }
sch->q.qlen--;
goto deliver;
}
}
+ }
+
+ /* Schedule watchdog for next time-ready packet */
+ skb = netem_peek(q);
+ if (skb) {
+ u64 time_to_send = netem_skb_cb(skb)->time_to_send;
qdisc_watchdog_schedule_ns(&q->watchdog,
max(time_to_send,
q->slot.slot_next));
}
- if (q->qdisc) {
- skb = q->qdisc->ops->dequeue(q->qdisc);
- if (skb) {
- sch->q.qlen--;
- goto deliver;
- }
- }
return NULL;
}
--
2.51.0