public inbox for netdev@vger.kernel.org
From: Stephen Hemminger <stephen@networkplumber.org>
To: netdev@vger.kernel.org
Cc: Stephen Hemminger <stephen@networkplumber.org>, stable@vger.kernel.org
Subject: [PATCH net v2 04/10] net/sched: netem: restructure dequeue to avoid re-entrancy with child qdisc
Date: Sat, 14 Mar 2026 17:14:08 -0700
Message-ID: <20260315001649.23931-5-stephen@networkplumber.org>
In-Reply-To: <20260315001649.23931-1-stephen@networkplumber.org>

netem_dequeue() currently enqueues time-ready packets into the child
qdisc during the dequeue call path. This creates several problems:

1. Parent qdiscs like HFSC track class active/inactive state based on
   qlen transitions. The child enqueue during netem's dequeue can cause
   qlen to increase while the parent is mid-dequeue, leading to
   double-insertion in HFSC's eltree (CVE-2025-37890, CVE-2025-38001).

2. If the child qdisc is non-work-conserving (e.g., TBF), it may refuse
   to release packets during its dequeue even though they were just
   enqueued. The parent then sees netem returning NULL despite having
   backlog, violating the work-conserving contract and causing stalls
   with parents like DRR that deactivate classes in this case.
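
To make problem 2 concrete, here is a minimal userspace sketch, not
kernel code. All toy_* names are hypothetical, and the parent is a
deliberately simplified model that deactivates a class as soon as its
dequeue yields nothing, which is the behaviour described above for
DRR-like parents rather than a copy of any real scheduler.

```c
#include <stddef.h>

/* Hypothetical token-bucket child: non-work-conserving, because it
 * may hold packets back while it has no tokens.
 */
struct toy_tbf {
	unsigned int qlen;	/* packets queued in the child */
	unsigned int tokens;	/* packets it may release right now */
};

static void toy_tbf_enqueue(struct toy_tbf *c)
{
	c->qlen++;
}

/* Returns 1 if a packet was released, 0 if the child refused. */
static int toy_tbf_dequeue(struct toy_tbf *c)
{
	if (c->qlen == 0 || c->tokens == 0)
		return 0;
	c->tokens--;
	c->qlen--;
	return 1;
}

/* Hypothetical parent class (assumption: it treats an empty dequeue
 * as "class is idle" and drops it from its active set).
 */
struct toy_class {
	struct toy_tbf child;
	int active;
};

static int toy_parent_dequeue(struct toy_class *cl)
{
	int got = toy_tbf_dequeue(&cl->child);

	if (!got)
		cl->active = 0;	/* backlog may still be queued below */
	return got;
}
```

With zero tokens, one toy_tbf_enqueue() followed by
toy_parent_dequeue() leaves the class inactive while child.qlen is
still 1, which is the stall scenario in miniature.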

Restructure netem_dequeue() so that, when a child qdisc is present, all
time-ready packets are transferred from the tfifo to the child in a
batch before asking the child for output. This ensures the child only
receives packets whose delay has already elapsed. The no-child path
(tfifo direct dequeue) is unchanged.

Fixes: 50612537e9ab ("netem: fix classful handling")
Cc: stable@vger.kernel.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
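Note for reviewers (not part of the commit log): the two-phase shape of
the child path can be sketched in userspace as below. This is only an
illustrative model under stated assumptions: the toy_* types and
functions are hypothetical, the delay queue is a plain FIFO that the
caller fills in time_to_send order, and slot handling, drops, and
statistics are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CAP 16

/* Hypothetical stand-in for an skb: only the release time matters. */
struct toy_pkt {
	uint64_t time_to_send;
	int id;
};

/* Fixed-size ring used for both the delay queue and the child. */
struct toy_fifo {
	struct toy_pkt slot[TOY_CAP];
	size_t head, len;
};

static void toy_push(struct toy_fifo *f, struct toy_pkt p)
{
	assert(f->len < TOY_CAP);
	f->slot[(f->head + f->len) % TOY_CAP] = p;
	f->len++;
}

static const struct toy_pkt *toy_peek(const struct toy_fifo *f)
{
	return f->len ? &f->slot[f->head] : NULL;
}

static struct toy_pkt toy_pop(struct toy_fifo *f)
{
	struct toy_pkt p;

	assert(f->len > 0);
	p = f->slot[f->head];
	f->head = (f->head + 1) % TOY_CAP;
	f->len--;
	return p;
}

struct toy_netem {
	struct toy_fifo tfifo;	/* must be filled in time_to_send order */
	struct toy_fifo child;	/* work-conserving FIFO child (assumption) */
};

/* Returns 1 and fills *out if a packet is delivered, 0 otherwise. */
static int toy_dequeue(struct toy_netem *q, uint64_t now,
		       struct toy_pkt *out)
{
	const struct toy_pkt *p;

	/* Phase 1: move every packet whose delay has elapsed. */
	while ((p = toy_peek(&q->tfifo)) != NULL && p->time_to_send <= now)
		toy_push(&q->child, toy_pop(&q->tfifo));

	/* Phase 2: ask the child for output exactly once. */
	if (q->child.len == 0)
		return 0;
	*out = toy_pop(&q->child);
	return 1;
}
```

In this model, a call with now earlier than the head packet's
time_to_send moves nothing and returns 0, while a later call first
drains all ready packets into the child and then delivers the oldest
one. The real function additionally honours slot_next and arms the
watchdog for the next pending packet.
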
 net/sched/sch_netem.c | 79 +++++++++++++++++++++++++++++--------------
 1 file changed, 54 insertions(+), 25 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 085fa3ad6f83..7488ff9f2933 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -726,7 +726,6 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 	struct netem_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
 
-tfifo_dequeue:
 	skb = __qdisc_dequeue_head(&sch->q);
 	if (skb) {
 deliver:
@@ -734,24 +733,28 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 		qdisc_bstats_update(sch, skb);
 		return skb;
 	}
-	skb = netem_peek(q);
-	if (skb) {
-		u64 time_to_send;
+
+	/* If we have a child qdisc, transfer all time-ready packets
+	 * from the tfifo into the child, then dequeue from the child.
+	 * This avoids enqueueing into the child during the parent's
+	 * dequeue callback, which can confuse parents that track
+	 * active/inactive state based on qlen transitions (HFSC).
+	 */
+	if (q->qdisc) {
 		u64 now = ktime_get_ns();
 
-		/* if more time remaining? */
-		time_to_send = netem_skb_cb(skb)->time_to_send;
-		if (q->slot.slot_next && q->slot.slot_next < time_to_send)
-			get_slot_next(q, now);
+		while ((skb = netem_peek(q)) != NULL) {
+			u64 t = netem_skb_cb(skb)->time_to_send;
+
+			if (t > now)
+				break;
+			if (q->slot.slot_next && q->slot.slot_next > now)
+				break;
 
-		if (time_to_send <= now && q->slot.slot_next <= now) {
 			netem_erase_head(q, skb);
 			q->t_len--;
 			skb->next = NULL;
 			skb->prev = NULL;
-			/* skb->dev shares skb->rbnode area,
-			 * we need to restore its value.
-			 */
 			skb->dev = qdisc_dev(sch);
 
 			if (q->slot.slot_next) {
@@ -762,7 +765,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 					get_slot_next(q, now);
 			}
 
-			if (q->qdisc) {
+			{
 				unsigned int pkt_len = qdisc_pkt_len(skb);
 				struct sk_buff *to_free = NULL;
 				int err;
@@ -776,32 +779,58 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 					sch->q.qlen--;
 					qdisc_tree_reduce_backlog(sch, 1, pkt_len);
 				}
-				goto tfifo_dequeue;
 			}
+		}
+
+		skb = q->qdisc->ops->dequeue(q->qdisc);
+		if (skb) {
 			sch->q.qlen--;
 			goto deliver;
 		}
-
-		if (q->qdisc) {
-			skb = q->qdisc->ops->dequeue(q->qdisc);
-			if (skb) {
+	} else {
+		/* No child qdisc: dequeue directly from tfifo */
+		skb = netem_peek(q);
+		if (skb) {
+			u64 time_to_send;
+			u64 now = ktime_get_ns();
+
+			time_to_send = netem_skb_cb(skb)->time_to_send;
+			if (q->slot.slot_next &&
+			    q->slot.slot_next < time_to_send)
+				get_slot_next(q, now);
+
+			if (time_to_send <= now &&
+			    q->slot.slot_next <= now) {
+				netem_erase_head(q, skb);
+				q->t_len--;
+				skb->next = NULL;
+				skb->prev = NULL;
+				skb->dev = qdisc_dev(sch);
+
+				if (q->slot.slot_next) {
+					q->slot.packets_left--;
+					q->slot.bytes_left -=
+						qdisc_pkt_len(skb);
+					if (q->slot.packets_left <= 0 ||
+					    q->slot.bytes_left <= 0)
+						get_slot_next(q, now);
+				}
 				sch->q.qlen--;
 				goto deliver;
 			}
 		}
+	}
+
+	/* Schedule watchdog for next time-ready packet */
+	skb = netem_peek(q);
+	if (skb) {
+		u64 time_to_send = netem_skb_cb(skb)->time_to_send;
 
 		qdisc_watchdog_schedule_ns(&q->watchdog,
 					   max(time_to_send,
 					       q->slot.slot_next));
 	}
 
-	if (q->qdisc) {
-		skb = q->qdisc->ops->dequeue(q->qdisc);
-		if (skb) {
-			sch->q.qlen--;
-			goto deliver;
-		}
-	}
 	return NULL;
 }
 
-- 
2.51.0

