public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/5] net/sched: netem: bug fixes found during AI-assisted review
@ 2026-03-28 18:21 Stephen Hemminger
  2026-03-28 18:21 ` [PATCH net 1/5] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Stephen Hemminger @ 2026-03-28 18:21 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger

These bugs were identified while using AI-assisted code review of
sch_netem.c to analyze the packet duplication re-entrancy problem
(CVE-2025-37890, CVE-2025-38001), which are addressed in a separate
series.

The review uncovered several additional issues:

- probability gaps in the 4-state Markov loss model where
  boundary values produce no state transition
- queue limit check not accounting for reordered packets
- PRNG reseeded on every tc change, breaking reproducibility
- the core dequeue re-entrancy issue with child qdiscs
  causing HFSC eltree corruption and DRR class stalls
- missing NULL termination on the tfifo linear list tail

Stephen Hemminger (5):
  net/sched: netem: fix probability gaps in 4-state loss model
  net/sched: netem: fix queue limit check to include reordered packets
  net/sched: netem: only reseed PRNG when seed is explicitly provided
  net/sched: netem: restructure dequeue to avoid re-entrancy with child
    qdisc
  net/sched: netem: null-terminate tfifo linear queue tail

 net/sched/sch_netem.c | 222 ++++++++++++++++++++++++++----------------
 1 file changed, 140 insertions(+), 82 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net 1/5] net/sched: netem: fix probability gaps in 4-state loss model
  2026-03-28 18:21 [PATCH net 0/5] net/sched: netem: bug fixes found during AI-assisted review Stephen Hemminger
@ 2026-03-28 18:21 ` Stephen Hemminger
  2026-03-28 18:21 ` [PATCH net 2/5] net/sched: netem: fix queue limit check to include reordered packets Stephen Hemminger
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Stephen Hemminger @ 2026-03-28 18:21 UTC (permalink / raw)
  To: netdev
  Cc: Stephen Hemminger, Jamal Hadi Salim, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	open list

The 4-state Markov chain in loss_4state() has gaps at the boundaries
between transition probability ranges. The comparisons use:

  if (rnd < a4)
  else if (a4 < rnd && rnd < a1 + a4)

When rnd equals a boundary value exactly, neither branch matches and
no state transition occurs. The redundant lower-bound check (a4 < rnd)
is already implied by being in the else branch.

Remove the unnecessary lower-bound comparisons so the ranges are
contiguous and every random value produces a transition, matching
the GI (General and Intuitive) loss model specification.

This bug goes back to original implementation of this model.

Fixes: 661b79725fea ("netem: revised correlated loss generator")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 net/sched/sch_netem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 5de1c932944a..2cc3acaa4068 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -227,10 +227,10 @@ static bool loss_4state(struct netem_sched_data *q)
 		if (rnd < clg->a4) {
 			clg->state = LOST_IN_GAP_PERIOD;
 			return true;
-		} else if (clg->a4 < rnd && rnd < clg->a1 + clg->a4) {
+		} else if (rnd < clg->a1 + clg->a4) {
 			clg->state = LOST_IN_BURST_PERIOD;
 			return true;
-		} else if (clg->a1 + clg->a4 < rnd) {
+		} else {
 			clg->state = TX_IN_GAP_PERIOD;
 		}
 
@@ -247,9 +247,9 @@ static bool loss_4state(struct netem_sched_data *q)
 	case LOST_IN_BURST_PERIOD:
 		if (rnd < clg->a3)
 			clg->state = TX_IN_BURST_PERIOD;
-		else if (clg->a3 < rnd && rnd < clg->a2 + clg->a3) {
+		else if (rnd < clg->a2 + clg->a3) {
 			clg->state = TX_IN_GAP_PERIOD;
-		} else if (clg->a2 + clg->a3 < rnd) {
+		} else {
 			clg->state = LOST_IN_BURST_PERIOD;
 			return true;
 		}
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net 2/5] net/sched: netem: fix queue limit check to include reordered packets
  2026-03-28 18:21 [PATCH net 0/5] net/sched: netem: bug fixes found during AI-assisted review Stephen Hemminger
  2026-03-28 18:21 ` [PATCH net 1/5] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
@ 2026-03-28 18:21 ` Stephen Hemminger
  2026-03-28 18:21 ` [PATCH net 3/5] net/sched: netem: only reseed PRNG when seed is explicitly provided Stephen Hemminger
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Stephen Hemminger @ 2026-03-28 18:21 UTC (permalink / raw)
  To: netdev
  Cc: Stephen Hemminger, Jamal Hadi Salim, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	open list

The queue limit check in netem_enqueue() uses q->t_len which only
counts packets in the internal tfifo. Packets placed in sch->q by
the reorder path (__qdisc_enqueue_head) are not counted, allowing
the total queue occupancy to exceed sch->limit under reordering.

Include sch->q.qlen in the limit check.

Fixes: 50612537e9ab ("netem: fix classful handling")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 net/sched/sch_netem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 2cc3acaa4068..6cc48b698e48 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -523,7 +523,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			1<<get_random_u32_below(8);
 	}
 
-	if (unlikely(q->t_len >= sch->limit)) {
+	if (unlikely(sch->q.qlen >= sch->limit)) {
 		/* re-link segs, so that qdisc_drop_all() frees them all */
 		skb->next = segs;
 		qdisc_drop_all(skb, sch, to_free);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net 3/5] net/sched: netem: only reseed PRNG when seed is explicitly provided
  2026-03-28 18:21 [PATCH net 0/5] net/sched: netem: bug fixes found during AI-assisted review Stephen Hemminger
  2026-03-28 18:21 ` [PATCH net 1/5] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
  2026-03-28 18:21 ` [PATCH net 2/5] net/sched: netem: fix queue limit check to include reordered packets Stephen Hemminger
@ 2026-03-28 18:21 ` Stephen Hemminger
  2026-03-28 18:21 ` [PATCH net 4/5] net/sched: netem: restructure dequeue to avoid re-entrancy with child qdisc Stephen Hemminger
  2026-03-28 18:21 ` [PATCH net 5/5] net/sched: netem: null-terminate tfifo linear queue tail Stephen Hemminger
  4 siblings, 0 replies; 6+ messages in thread
From: Stephen Hemminger @ 2026-03-28 18:21 UTC (permalink / raw)
  To: netdev
  Cc: Stephen Hemminger, Jamal Hadi Salim, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	François Michel, open list

netem_change() unconditionally reseeds the PRNG on every tc change
command. If TCA_NETEM_PRNG_SEED is not specified, a new random seed
is generated, destroying reproducibility for users who set a
deterministic seed on a previous change.

Move the initial random seed generation to netem_init() and only
reseed in netem_change() when TCA_NETEM_PRNG_SEED is explicitly
provided by the user.

Fixes: 4072d97ddc44 ("netem: add prng attribute to netem_sched_data")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 net/sched/sch_netem.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 6cc48b698e48..448097fc88a4 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -1111,11 +1111,10 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 	/* capping jitter to the range acceptable by tabledist() */
 	q->jitter = min_t(s64, abs(q->jitter), INT_MAX);
 
-	if (tb[TCA_NETEM_PRNG_SEED])
+	if (tb[TCA_NETEM_PRNG_SEED]) {
 		q->prng.seed = nla_get_u64(tb[TCA_NETEM_PRNG_SEED]);
-	else
-		q->prng.seed = get_random_u64();
-	prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+		prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+	}
 
 unlock:
 	sch_tree_unlock(sch);
@@ -1138,6 +1137,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt,
 		return -EINVAL;
 
 	q->loss_model = CLG_RANDOM;
+	q->prng.seed = get_random_u64();
+	prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+
 	ret = netem_change(sch, opt, extack);
 	if (ret)
 		pr_info("netem: change failed\n");
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net 4/5] net/sched: netem: restructure dequeue to avoid re-entrancy with child qdisc
  2026-03-28 18:21 [PATCH net 0/5] net/sched: netem: bug fixes found during AI-assisted review Stephen Hemminger
                   ` (2 preceding siblings ...)
  2026-03-28 18:21 ` [PATCH net 3/5] net/sched: netem: only reseed PRNG when seed is explicitly provided Stephen Hemminger
@ 2026-03-28 18:21 ` Stephen Hemminger
  2026-03-28 18:21 ` [PATCH net 5/5] net/sched: netem: null-terminate tfifo linear queue tail Stephen Hemminger
  4 siblings, 0 replies; 6+ messages in thread
From: Stephen Hemminger @ 2026-03-28 18:21 UTC (permalink / raw)
  To: netdev
  Cc: Stephen Hemminger, Jamal Hadi Salim, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	open list

netem_dequeue() enqueues packets into its child qdisc while being
called from the parent's dequeue path. This causes two problems:

- HFSC tracks class active/inactive state on qlen transitions.
  A child enqueue during dequeue causes double-insertion into
  the eltree (CVE-2025-37890, CVE-2025-38001).

- Non-work-conserving children like TBF may refuse to dequeue
  packets just enqueued, causing netem to return NULL despite
  having backlog. Parents like DRR then incorrectly deactivate
  the class.

Split the dequeue into helpers:

  netem_pull_tfifo()    - remove head packet from tfifo
  netem_slot_account()  - update slot pacing counters
  netem_dequeue_child() - batch-transfer ready packets to the
                          child, then dequeue from the child
  netem_dequeue_direct()- dequeue from tfifo when no child

When a child qdisc is present, all time-ready packets are moved
into the child before calling its dequeue. This separates the
enqueue and dequeue phases so the parent sees consistent qlen
transitions.

Fixes: 50612537e9ab ("netem: fix classful handling")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 net/sched/sch_netem.c | 201 +++++++++++++++++++++++++++---------------
 1 file changed, 128 insertions(+), 73 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 448097fc88a4..ce12b64603b2 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -688,99 +688,154 @@ static struct sk_buff *netem_peek(struct netem_sched_data *q)
 	return q->t_head;
 }
 
-static void netem_erase_head(struct netem_sched_data *q, struct sk_buff *skb)
+/*
+ * Pop the head packet from the tfifo and prepare it for delivery.
+ * skb->dev shares the rbnode area and must be restored after removal.
+ */
+static struct sk_buff *netem_pull_tfifo(struct netem_sched_data *q,
+					struct Qdisc *sch)
 {
-	if (skb == q->t_head) {
+	struct sk_buff *skb;
+
+	if (q->t_head) {
+		skb = q->t_head;
 		q->t_head = skb->next;
 		if (!q->t_head)
 			q->t_tail = NULL;
 	} else {
-		rb_erase(&skb->rbnode, &q->t_root);
+		struct rb_node *p = rb_first(&q->t_root);
+
+		if (!p)
+			return NULL;
+		skb = rb_to_skb(p);
+		rb_erase(p, &q->t_root);
 	}
+
+	q->t_len--;
+	skb->next = NULL;
+	skb->prev = NULL;
+	skb->dev = qdisc_dev(sch);
+
+	return skb;
 }
 
-static struct sk_buff *netem_dequeue(struct Qdisc *sch)
+/* Update slot pacing counters after releasing a packet */
+static void netem_slot_account(struct netem_sched_data *q,
+			       const struct sk_buff *skb, u64 now)
+{
+	if (!q->slot.slot_next)
+		return;
+
+	q->slot.packets_left--;
+	q->slot.bytes_left -= qdisc_pkt_len(skb);
+	if (q->slot.packets_left <= 0 || q->slot.bytes_left <= 0)
+		get_slot_next(q, now);
+}
+
+/*
+ * Transfer all time-ready packets from the tfifo into the child qdisc,
+ * then dequeue from the child.  Batching the transfers avoids calling
+ * qdisc_enqueue() inside the parent's dequeue path, which confuses
+ * parents that track active/inactive state on qlen transitions (HFSC).
+ */
+static struct sk_buff *netem_dequeue_child(struct Qdisc *sch)
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
+	u64 now = ktime_get_ns();
 	struct sk_buff *skb;
 
-tfifo_dequeue:
-	skb = __qdisc_dequeue_head(&sch->q);
-	if (skb) {
-deliver:
-		qdisc_qstats_backlog_dec(sch, skb);
-		qdisc_bstats_update(sch, skb);
-		return skb;
-	}
-	skb = netem_peek(q);
-	if (skb) {
-		u64 time_to_send;
-		u64 now = ktime_get_ns();
-
-		/* if more time remaining? */
-		time_to_send = netem_skb_cb(skb)->time_to_send;
-		if (q->slot.slot_next && q->slot.slot_next < time_to_send)
-			get_slot_next(q, now);
-
-		if (time_to_send <= now && q->slot.slot_next <= now) {
-			netem_erase_head(q, skb);
-			q->t_len--;
-			skb->next = NULL;
-			skb->prev = NULL;
-			/* skb->dev shares skb->rbnode area,
-			 * we need to restore its value.
-			 */
-			skb->dev = qdisc_dev(sch);
-
-			if (q->slot.slot_next) {
-				q->slot.packets_left--;
-				q->slot.bytes_left -= qdisc_pkt_len(skb);
-				if (q->slot.packets_left <= 0 ||
-				    q->slot.bytes_left <= 0)
-					get_slot_next(q, now);
-			}
+	while ((skb = netem_peek(q)) != NULL) {
+		struct sk_buff *to_free = NULL;
+		unsigned int pkt_len;
+		int err;
 
-			if (q->qdisc) {
-				unsigned int pkt_len = qdisc_pkt_len(skb);
-				struct sk_buff *to_free = NULL;
-				int err;
-
-				err = qdisc_enqueue(skb, q->qdisc, &to_free);
-				kfree_skb_list(to_free);
-				if (err != NET_XMIT_SUCCESS) {
-					if (net_xmit_drop_count(err))
-						qdisc_qstats_drop(sch);
-					sch->qstats.backlog -= pkt_len;
-					sch->q.qlen--;
-					qdisc_tree_reduce_backlog(sch, 1, pkt_len);
-				}
-				goto tfifo_dequeue;
-			}
+		if (netem_skb_cb(skb)->time_to_send > now)
+			break;
+		if (q->slot.slot_next && q->slot.slot_next > now)
+			break;
+
+		skb = netem_pull_tfifo(q, sch);
+		netem_slot_account(q, skb, now);
+
+		pkt_len = qdisc_pkt_len(skb);
+		err = qdisc_enqueue(skb, q->qdisc, &to_free);
+		kfree_skb_list(to_free);
+		if (unlikely(err != NET_XMIT_SUCCESS)) {
+			if (net_xmit_drop_count(err))
+				qdisc_qstats_drop(sch);
+			sch->qstats.backlog -= pkt_len;
 			sch->q.qlen--;
-			goto deliver;
+			qdisc_tree_reduce_backlog(sch, 1, pkt_len);
 		}
+	}
 
-		if (q->qdisc) {
-			skb = q->qdisc->ops->dequeue(q->qdisc);
-			if (skb) {
-				sch->q.qlen--;
-				goto deliver;
-			}
-		}
+	skb = q->qdisc->ops->dequeue(q->qdisc);
+	if (skb)
+		sch->q.qlen--;
 
-		qdisc_watchdog_schedule_ns(&q->watchdog,
-					   max(time_to_send,
-					       q->slot.slot_next));
-	}
+	return skb;
+}
 
-	if (q->qdisc) {
-		skb = q->qdisc->ops->dequeue(q->qdisc);
-		if (skb) {
-			sch->q.qlen--;
-			goto deliver;
-		}
+/* Dequeue directly from the tfifo when no child qdisc is configured. */
+static struct sk_buff *netem_dequeue_direct(struct Qdisc *sch)
+{
+	struct netem_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+	u64 time_to_send;
+	u64 now;
+
+	skb = netem_peek(q);
+	if (!skb)
+		return NULL;
+
+	now = ktime_get_ns();
+	time_to_send = netem_skb_cb(skb)->time_to_send;
+
+	if (q->slot.slot_next && q->slot.slot_next < time_to_send)
+		get_slot_next(q, now);
+
+	if (time_to_send > now || q->slot.slot_next > now)
+		return NULL;
+
+	skb = netem_pull_tfifo(q, sch);
+	netem_slot_account(q, skb, now);
+	sch->q.qlen--;
+
+	return skb;
+}
+
+static struct sk_buff *netem_dequeue(struct Qdisc *sch)
+{
+	struct netem_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	/* First check the reorder queue */
+	skb = __qdisc_dequeue_head(&sch->q);
+	if (skb)
+		goto deliver;
+
+	if (q->qdisc)
+		skb = netem_dequeue_child(sch);
+	else
+		skb = netem_dequeue_direct(sch);
+
+	if (skb)
+		goto deliver;
+
+	/* Nothing ready — schedule watchdog for next packet */
+	skb = netem_peek(q);
+	if (skb) {
+		u64 time_to_send = netem_skb_cb(skb)->time_to_send;
+
+		qdisc_watchdog_schedule_ns(&q->watchdog,
+					   max(time_to_send, q->slot.slot_next));
 	}
 	return NULL;
+
+deliver:
+	qdisc_qstats_backlog_dec(sch, skb);
+	qdisc_bstats_update(sch, skb);
+	return skb;
 }
 
 static void netem_reset(struct Qdisc *sch)
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net 5/5] net/sched: netem: null-terminate tfifo linear queue tail
  2026-03-28 18:21 [PATCH net 0/5] net/sched: netem: bug fixes found during AI-assisted review Stephen Hemminger
                   ` (3 preceding siblings ...)
  2026-03-28 18:21 ` [PATCH net 4/5] net/sched: netem: restructure dequeue to avoid re-entrancy with child qdisc Stephen Hemminger
@ 2026-03-28 18:21 ` Stephen Hemminger
  4 siblings, 0 replies; 6+ messages in thread
From: Stephen Hemminger @ 2026-03-28 18:21 UTC (permalink / raw)
  To: netdev
  Cc: Stephen Hemminger, Jamal Hadi Salim, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Peter Oskolkov, open list

When tfifo_enqueue() appends a packet to the linear queue tail,
nskb->next is never set to NULL. The list terminates correctly
only by accident if the skb arrived with next already NULL.

Explicitly null-terminate the tail to prevent list corruption.

Fixes: d66280b12bd7 ("net: netem: use a list in addition to rbtree")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 net/sched/sch_netem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index ce12b64603b2..4b27fab72fef 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -398,6 +398,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 			q->t_tail->next = nskb;
 		else
 			q->t_head = nskb;
+		nskb->next = NULL;
 		q->t_tail = nskb;
 	} else {
 		struct rb_node **p = &q->t_root.rb_node, *parent = NULL;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-28 18:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-28 18:21 [PATCH net 0/5] net/sched: netem: bug fixes found during AI-assisted review Stephen Hemminger
2026-03-28 18:21 ` [PATCH net 1/5] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
2026-03-28 18:21 ` [PATCH net 2/5] net/sched: netem: fix queue limit check to include reordered packets Stephen Hemminger
2026-03-28 18:21 ` [PATCH net 3/5] net/sched: netem: only reseed PRNG when seed is explicitly provided Stephen Hemminger
2026-03-28 18:21 ` [PATCH net 4/5] net/sched: netem: restructure dequeue to avoid re-entrancy with child qdisc Stephen Hemminger
2026-03-28 18:21 ` [PATCH net 5/5] net/sched: netem: null-terminate tfifo linear queue tail Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox