* [PATCH net v4 0/8] net/sched: netem bug fixes
@ 2026-04-06 17:25 Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 1/8] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
` (7 more replies)
0 siblings, 8 replies; 12+ messages in thread
From: Stephen Hemminger @ 2026-04-06 17:25 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger
These bugs were identified while using AI-assisted code review of
sch_netem.c to analyze the packet duplication re-entrancy problem
(CVE-2025-37890, CVE-2025-38001), which are addressed in a separate
series.
The review uncovered several additional issues:
- probability gaps in the 4-state Markov loss model where
boundary values produce no state transition
- queue limit check not accounting for reordered packets
- PRNG reseeded on every tc change, breaking reproducibility
- the core dequeue re-entrancy issue with child qdiscs
causing HFSC eltree corruption and DRR class stalls
- missing NULL termination on the tfifo linear list tail
- slot delay configuration not validated for inverted ranges
- slot delay arithmetic overflow for ranges above ~2.1 seconds
v4 - split refactoring and fix for dequeue into two patches
Stephen Hemminger (8):
net/sched: netem: fix probability gaps in 4-state loss model
net/sched: netem: fix queue limit check to include reordered packets
net/sched: netem: only reseed PRNG when seed is explicitly provided
net/sched: netem: refactor dequeue into helper functions
net/sched: netem: batch-transfer ready packets to avoid child
re-entrancy
net/sched: netem: null-terminate tfifo linear queue tail
net/sched: netem: check for invalid slot range
net/sched: netem: fix slot delay calculation overflow
net/sched/sch_netem.c | 245 +++++++++++++++++++++++++++---------------
1 file changed, 160 insertions(+), 85 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net v4 1/8] net/sched: netem: fix probability gaps in 4-state loss model
2026-04-06 17:25 [PATCH net v4 0/8] net/sched: netem bug fixes Stephen Hemminger
@ 2026-04-06 17:25 ` Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 2/8] net/sched: netem: fix queue limit check to include reordered packets Stephen Hemminger
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2026-04-06 17:25 UTC (permalink / raw)
To: netdev
Cc: Stephen Hemminger, Simon Horman, Jamal Hadi Salim, Jiri Pirko,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
open list
The 4-state Markov chain in loss_4state() has gaps at the boundaries
between transition probability ranges. The comparisons use:
if (rnd < a4)
else if (a4 < rnd && rnd < a1 + a4)
When rnd equals a boundary value exactly, neither branch matches and
no state transition occurs. The redundant lower-bound check (a4 < rnd)
is already implied by being in the else branch.
Remove the unnecessary lower-bound comparisons so the ranges are
contiguous and every random value produces a transition, matching
the GI (General and Intuitive) loss model specification.
This bug goes back to original implementation of this model.
Fixes: 661b79725fea ("netem: revised correlated loss generator")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
net/sched/sch_netem.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 20df1c08b1e9..8ee72cac1faf 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -227,10 +227,10 @@ static bool loss_4state(struct netem_sched_data *q)
if (rnd < clg->a4) {
clg->state = LOST_IN_GAP_PERIOD;
return true;
- } else if (clg->a4 < rnd && rnd < clg->a1 + clg->a4) {
+ } else if (rnd < clg->a1 + clg->a4) {
clg->state = LOST_IN_BURST_PERIOD;
return true;
- } else if (clg->a1 + clg->a4 < rnd) {
+ } else {
clg->state = TX_IN_GAP_PERIOD;
}
@@ -247,9 +247,9 @@ static bool loss_4state(struct netem_sched_data *q)
case LOST_IN_BURST_PERIOD:
if (rnd < clg->a3)
clg->state = TX_IN_BURST_PERIOD;
- else if (clg->a3 < rnd && rnd < clg->a2 + clg->a3) {
+ else if (rnd < clg->a2 + clg->a3) {
clg->state = TX_IN_GAP_PERIOD;
- } else if (clg->a2 + clg->a3 < rnd) {
+ } else {
clg->state = LOST_IN_BURST_PERIOD;
return true;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net v4 2/8] net/sched: netem: fix queue limit check to include reordered packets
2026-04-06 17:25 [PATCH net v4 0/8] net/sched: netem bug fixes Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 1/8] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
@ 2026-04-06 17:25 ` Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 3/8] net/sched: netem: only reseed PRNG when seed is explicitly provided Stephen Hemminger
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2026-04-06 17:25 UTC (permalink / raw)
To: netdev
Cc: Stephen Hemminger, Simon Horman, Jamal Hadi Salim, Jiri Pirko,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
open list
The queue limit check in netem_enqueue() uses q->t_len which only
counts packets in the internal tfifo. Packets placed in sch->q by
the reorder path (__qdisc_enqueue_head) are not counted, allowing
the total queue occupancy to exceed sch->limit under reordering.
Include sch->q.qlen in the limit check.
Fixes: 50612537e9ab ("netem: fix classful handling")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
net/sched/sch_netem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 8ee72cac1faf..d400a730eadd 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -524,7 +524,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
1 << get_random_u32_below(8);
}
- if (unlikely(q->t_len >= sch->limit)) {
+ if (unlikely(sch->q.qlen >= sch->limit)) {
/* re-link segs, so that qdisc_drop_all() frees them all */
skb->next = segs;
qdisc_drop_all(skb, sch, to_free);
--
2.53.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net v4 3/8] net/sched: netem: only reseed PRNG when seed is explicitly provided
2026-04-06 17:25 [PATCH net v4 0/8] net/sched: netem bug fixes Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 1/8] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 2/8] net/sched: netem: fix queue limit check to include reordered packets Stephen Hemminger
@ 2026-04-06 17:25 ` Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 4/8] net/sched: netem: refactor dequeue into helper functions Stephen Hemminger
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2026-04-06 17:25 UTC (permalink / raw)
To: netdev
Cc: Stephen Hemminger, Simon Horman, Jamal Hadi Salim, Jiri Pirko,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
François Michel, open list
netem_change() unconditionally reseeds the PRNG on every tc change
command. If TCA_NETEM_PRNG_SEED is not specified, a new random seed
is generated, destroying reproducibility for users who set a
deterministic seed on a previous change.
Move the initial random seed generation to netem_init() and only
reseed in netem_change() when TCA_NETEM_PRNG_SEED is explicitly
provided by the user.
Fixes: 4072d97ddc44 ("netem: add prng attribute to netem_sched_data")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
net/sched/sch_netem.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index d400a730eadd..556f9747f0e7 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -1112,11 +1112,10 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
/* capping jitter to the range acceptable by tabledist() */
q->jitter = min_t(s64, abs(q->jitter), INT_MAX);
- if (tb[TCA_NETEM_PRNG_SEED])
+ if (tb[TCA_NETEM_PRNG_SEED]) {
q->prng.seed = nla_get_u64(tb[TCA_NETEM_PRNG_SEED]);
- else
- q->prng.seed = get_random_u64();
- prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+ prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+ }
unlock:
sch_tree_unlock(sch);
@@ -1139,6 +1138,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt,
return -EINVAL;
q->loss_model = CLG_RANDOM;
+ q->prng.seed = get_random_u64();
+ prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+
ret = netem_change(sch, opt, extack);
if (ret)
pr_info("netem: change failed\n");
--
2.53.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net v4 4/8] net/sched: netem: refactor dequeue into helper functions
2026-04-06 17:25 [PATCH net v4 0/8] net/sched: netem bug fixes Stephen Hemminger
` (2 preceding siblings ...)
2026-04-06 17:25 ` [PATCH net v4 3/8] net/sched: netem: only reseed PRNG when seed is explicitly provided Stephen Hemminger
@ 2026-04-06 17:25 ` Stephen Hemminger
2026-04-10 11:39 ` Simon Horman
2026-04-06 17:25 ` [PATCH net v4 5/8] net/sched: netem: batch-transfer ready packets to avoid child re-entrancy Stephen Hemminger
` (3 subsequent siblings)
7 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2026-04-06 17:25 UTC (permalink / raw)
To: netdev
Cc: Stephen Hemminger, Jamal Hadi Salim, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
open list
Extract the tfifo removal, slot accounting, and child/direct dequeue
paths from the monolithic netem_dequeue() into separate helpers:
netem_pull_tfifo() - remove head packet from tfifo
netem_slot_account() - update slot pacing counters
netem_dequeue_child() - enqueue to child, then dequeue from child
netem_dequeue_direct()- dequeue from tfifo when no child
This replaces the goto-based control flow with straightforward function
calls, making the code easier to follow and modify.
No functional change intended.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
net/sched/sch_netem.c | 190 +++++++++++++++++++++++++++---------------
1 file changed, 123 insertions(+), 67 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 556f9747f0e7..e264f7aefb97 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -689,99 +689,155 @@ static struct sk_buff *netem_peek(struct netem_sched_data *q)
return q->t_head;
}
-static void netem_erase_head(struct netem_sched_data *q, struct sk_buff *skb)
+/*
+ * Pop the head packet from the tfifo and prepare it for delivery.
+ * skb->dev shares the rbnode area and must be restored after removal.
+ */
+static struct sk_buff *netem_pull_tfifo(struct netem_sched_data *q,
+ struct Qdisc *sch)
{
- if (skb == q->t_head) {
+ struct sk_buff *skb;
+
+ if (q->t_head) {
+ skb = q->t_head;
q->t_head = skb->next;
if (!q->t_head)
q->t_tail = NULL;
} else {
- rb_erase(&skb->rbnode, &q->t_root);
+ struct rb_node *p = rb_first(&q->t_root);
+
+ if (!p)
+ return NULL;
+ skb = rb_to_skb(p);
+ rb_erase(p, &q->t_root);
}
+
+ q->t_len--;
+ skb->next = NULL;
+ skb->prev = NULL;
+ skb->dev = qdisc_dev(sch);
+
+ return skb;
}
-static struct sk_buff *netem_dequeue(struct Qdisc *sch)
+/* Update slot pacing counters after releasing a packet */
+static void netem_slot_account(struct netem_sched_data *q,
+ const struct sk_buff *skb, u64 now)
+{
+ if (!q->slot.slot_next)
+ return;
+
+ q->slot.packets_left--;
+ q->slot.bytes_left -= qdisc_pkt_len(skb);
+ if (q->slot.packets_left <= 0 || q->slot.bytes_left <= 0)
+ get_slot_next(q, now);
+}
+
+/*
+ * Transfer time-ready packets from the tfifo into the child qdisc,
+ * then dequeue from the child.
+ */
+static struct sk_buff *netem_dequeue_child(struct Qdisc *sch)
{
struct netem_sched_data *q = qdisc_priv(sch);
+ u64 now = ktime_get_ns();
struct sk_buff *skb;
-tfifo_dequeue:
- skb = __qdisc_dequeue_head(&sch->q);
- if (skb) {
-deliver:
- qdisc_qstats_backlog_dec(sch, skb);
- qdisc_bstats_update(sch, skb);
- return skb;
- }
skb = netem_peek(q);
if (skb) {
- u64 time_to_send;
- u64 now = ktime_get_ns();
+ u64 time_to_send = netem_skb_cb(skb)->time_to_send;
- /* if more time remaining? */
- time_to_send = netem_skb_cb(skb)->time_to_send;
if (q->slot.slot_next && q->slot.slot_next < time_to_send)
get_slot_next(q, now);
if (time_to_send <= now && q->slot.slot_next <= now) {
- netem_erase_head(q, skb);
- q->t_len--;
- skb->next = NULL;
- skb->prev = NULL;
- /* skb->dev shares skb->rbnode area,
- * we need to restore its value.
- */
- skb->dev = qdisc_dev(sch);
-
- if (q->slot.slot_next) {
- q->slot.packets_left--;
- q->slot.bytes_left -= qdisc_pkt_len(skb);
- if (q->slot.packets_left <= 0 ||
- q->slot.bytes_left <= 0)
- get_slot_next(q, now);
- }
-
- if (q->qdisc) {
- unsigned int pkt_len = qdisc_pkt_len(skb);
- struct sk_buff *to_free = NULL;
- int err;
-
- err = qdisc_enqueue(skb, q->qdisc, &to_free);
- kfree_skb_list(to_free);
- if (err != NET_XMIT_SUCCESS) {
- if (net_xmit_drop_count(err))
- qdisc_qstats_drop(sch);
- sch->qstats.backlog -= pkt_len;
- sch->q.qlen--;
- qdisc_tree_reduce_backlog(sch, 1, pkt_len);
- }
- goto tfifo_dequeue;
- }
- sch->q.qlen--;
- goto deliver;
- }
-
- if (q->qdisc) {
- skb = q->qdisc->ops->dequeue(q->qdisc);
- if (skb) {
+ struct sk_buff *to_free = NULL;
+ unsigned int pkt_len;
+ int err;
+
+ skb = netem_pull_tfifo(q, sch);
+ netem_slot_account(q, skb, now);
+
+ pkt_len = qdisc_pkt_len(skb);
+ err = qdisc_enqueue(skb, q->qdisc, &to_free);
+ kfree_skb_list(to_free);
+ if (err != NET_XMIT_SUCCESS) {
+ if (net_xmit_drop_count(err))
+ qdisc_qstats_drop(sch);
+ sch->qstats.backlog -= pkt_len;
sch->q.qlen--;
- goto deliver;
+ qdisc_tree_reduce_backlog(sch, 1, pkt_len);
}
}
-
- qdisc_watchdog_schedule_ns(&q->watchdog,
- max(time_to_send,
- q->slot.slot_next));
}
- if (q->qdisc) {
- skb = q->qdisc->ops->dequeue(q->qdisc);
- if (skb) {
- sch->q.qlen--;
- goto deliver;
- }
+ skb = q->qdisc->ops->dequeue(q->qdisc);
+ if (skb)
+ sch->q.qlen--;
+
+ return skb;
+}
+
+/* Dequeue directly from the tfifo when no child qdisc is configured. */
+static struct sk_buff *netem_dequeue_direct(struct Qdisc *sch)
+{
+ struct netem_sched_data *q = qdisc_priv(sch);
+ struct sk_buff *skb;
+ u64 time_to_send;
+ u64 now;
+
+ skb = netem_peek(q);
+ if (!skb)
+ return NULL;
+
+ now = ktime_get_ns();
+ time_to_send = netem_skb_cb(skb)->time_to_send;
+
+ if (q->slot.slot_next && q->slot.slot_next < time_to_send)
+ get_slot_next(q, now);
+
+ if (time_to_send > now || q->slot.slot_next > now)
+ return NULL;
+
+ skb = netem_pull_tfifo(q, sch);
+ netem_slot_account(q, skb, now);
+ sch->q.qlen--;
+
+ return skb;
+}
+
+static struct sk_buff *netem_dequeue(struct Qdisc *sch)
+{
+ struct netem_sched_data *q = qdisc_priv(sch);
+ struct sk_buff *skb;
+
+ /* First check the reorder queue */
+ skb = __qdisc_dequeue_head(&sch->q);
+ if (skb)
+ goto deliver;
+
+ if (q->qdisc)
+ skb = netem_dequeue_child(sch);
+ else
+ skb = netem_dequeue_direct(sch);
+
+ if (skb)
+ goto deliver;
+
+ /* Nothing ready — schedule watchdog for next packet */
+ skb = netem_peek(q);
+ if (skb) {
+ u64 time_to_send = netem_skb_cb(skb)->time_to_send;
+
+ qdisc_watchdog_schedule_ns(&q->watchdog,
+ max(time_to_send, q->slot.slot_next));
}
return NULL;
+
+deliver:
+ qdisc_qstats_backlog_dec(sch, skb);
+ qdisc_bstats_update(sch, skb);
+ return skb;
}
static void netem_reset(struct Qdisc *sch)
--
2.53.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net v4 5/8] net/sched: netem: batch-transfer ready packets to avoid child re-entrancy
2026-04-06 17:25 [PATCH net v4 0/8] net/sched: netem bug fixes Stephen Hemminger
` (3 preceding siblings ...)
2026-04-06 17:25 ` [PATCH net v4 4/8] net/sched: netem: refactor dequeue into helper functions Stephen Hemminger
@ 2026-04-06 17:25 ` Stephen Hemminger
2026-04-10 11:36 ` Simon Horman
2026-04-10 11:39 ` Simon Horman
2026-04-06 17:25 ` [PATCH net v4 6/8] net/sched: netem: null-terminate tfifo linear queue tail Stephen Hemminger
` (2 subsequent siblings)
7 siblings, 2 replies; 12+ messages in thread
From: Stephen Hemminger @ 2026-04-06 17:25 UTC (permalink / raw)
To: netdev
Cc: Stephen Hemminger, Jamal Hadi Salim, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
open list
netem_dequeue_child() previously transferred one packet from the tfifo
to the child qdisc per dequeue call. Parents like HFSC that track
class active/inactive state on qlen transitions could see an enqueue
during dequeue, causing double-insertion into the eltree
(CVE-2025-37890, CVE-2025-38001). Non-work-conserving children like
TBF could also refuse to return a just-enqueued packet, making netem
return NULL despite having backlog, which causes parents like DRR to
incorrectly deactivate the class.
Move all time-ready packets into the child before calling its dequeue.
This separates the enqueue and dequeue phases so the parent sees
consistent qlen transitions.
Fixes: 50612537e9ab ("netem: fix classful handling")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
net/sched/sch_netem.c | 49 +++++++++++++++++++++----------------------
1 file changed, 24 insertions(+), 25 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index e264f7aefb97..b93f0e886a2b 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -734,8 +734,10 @@ static void netem_slot_account(struct netem_sched_data *q,
}
/*
- * Transfer time-ready packets from the tfifo into the child qdisc,
- * then dequeue from the child.
+ * Transfer all time-ready packets from the tfifo into the child qdisc,
+ * then dequeue from the child. Batching the transfers avoids calling
+ * qdisc_enqueue() inside the parent's dequeue path, which confuses
+ * parents that track active/inactive state on qlen transitions (HFSC).
*/
static struct sk_buff *netem_dequeue_child(struct Qdisc *sch)
{
@@ -743,31 +745,28 @@ static struct sk_buff *netem_dequeue_child(struct Qdisc *sch)
u64 now = ktime_get_ns();
struct sk_buff *skb;
- skb = netem_peek(q);
- if (skb) {
- u64 time_to_send = netem_skb_cb(skb)->time_to_send;
-
- if (q->slot.slot_next && q->slot.slot_next < time_to_send)
- get_slot_next(q, now);
-
- if (time_to_send <= now && q->slot.slot_next <= now) {
- struct sk_buff *to_free = NULL;
- unsigned int pkt_len;
- int err;
+ while ((skb = netem_peek(q)) != NULL) {
+ struct sk_buff *to_free = NULL;
+ unsigned int pkt_len;
+ int err;
- skb = netem_pull_tfifo(q, sch);
- netem_slot_account(q, skb, now);
+ if (netem_skb_cb(skb)->time_to_send > now)
+ break;
+ if (q->slot.slot_next && q->slot.slot_next > now)
+ break;
- pkt_len = qdisc_pkt_len(skb);
- err = qdisc_enqueue(skb, q->qdisc, &to_free);
- kfree_skb_list(to_free);
- if (err != NET_XMIT_SUCCESS) {
- if (net_xmit_drop_count(err))
- qdisc_qstats_drop(sch);
- sch->qstats.backlog -= pkt_len;
- sch->q.qlen--;
- qdisc_tree_reduce_backlog(sch, 1, pkt_len);
- }
+ skb = netem_pull_tfifo(q, sch);
+ netem_slot_account(q, skb, now);
+
+ pkt_len = qdisc_pkt_len(skb);
+ err = qdisc_enqueue(skb, q->qdisc, &to_free);
+ kfree_skb_list(to_free);
+ if (unlikely(err != NET_XMIT_SUCCESS)) {
+ if (net_xmit_drop_count(err))
+ qdisc_qstats_drop(sch);
+ sch->qstats.backlog -= pkt_len;
+ sch->q.qlen--;
+ qdisc_tree_reduce_backlog(sch, 1, pkt_len);
}
}
--
2.53.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net v4 6/8] net/sched: netem: null-terminate tfifo linear queue tail
2026-04-06 17:25 [PATCH net v4 0/8] net/sched: netem bug fixes Stephen Hemminger
` (4 preceding siblings ...)
2026-04-06 17:25 ` [PATCH net v4 5/8] net/sched: netem: batch-transfer ready packets to avoid child re-entrancy Stephen Hemminger
@ 2026-04-06 17:25 ` Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 7/8] net/sched: netem: check for invalid slot range Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 8/8] net/sched: netem: fix slot delay calculation overflow Stephen Hemminger
7 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2026-04-06 17:25 UTC (permalink / raw)
To: netdev
Cc: Stephen Hemminger, Simon Horman, Jamal Hadi Salim, Jiri Pirko,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Peter Oskolkov, open list
When tfifo_enqueue() appends a packet to the linear queue tail,
nskb->next is never set to NULL. The list terminates correctly
only by accident if the skb arrived with next already NULL.
Explicitly null-terminate the tail to prevent list corruption.
Fixes: d66280b12bd7 ("net: netem: use a list in addition to rbtree")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
net/sched/sch_netem.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index b93f0e886a2b..e405bf862163 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -398,6 +398,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
q->t_tail->next = nskb;
else
q->t_head = nskb;
+ nskb->next = NULL;
q->t_tail = nskb;
} else {
struct rb_node **p = &q->t_root.rb_node, *parent = NULL;
--
2.53.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net v4 7/8] net/sched: netem: check for invalid slot range
2026-04-06 17:25 [PATCH net v4 0/8] net/sched: netem bug fixes Stephen Hemminger
` (5 preceding siblings ...)
2026-04-06 17:25 ` [PATCH net v4 6/8] net/sched: netem: null-terminate tfifo linear queue tail Stephen Hemminger
@ 2026-04-06 17:25 ` Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 8/8] net/sched: netem: fix slot delay calculation overflow Stephen Hemminger
7 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2026-04-06 17:25 UTC (permalink / raw)
To: netdev
Cc: Stephen Hemminger, Simon Horman, Jamal Hadi Salim, Jiri Pirko,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Neal Cardwell, Yousuk Seung, open list
Reject slot configuration where min_delay exceeds max_delay.
The delay range computation in get_slot_next() underflows in
this case, producing bogus results.
Fixes: 0a9fe5c375b5 ("netem: slotting with non-uniform distribution")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
net/sched/sch_netem.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index e405bf862163..c82f76af41aa 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -883,6 +883,18 @@ static int get_dist_table(struct disttable **tbl, const struct nlattr *attr)
return 0;
}
+static int validate_slot(const struct nlattr *attr,
+ struct netlink_ext_ack *extack)
+{
+ const struct tc_netem_slot *c = nla_data(attr);
+
+ if (c->min_delay > c->max_delay) {
+ NL_SET_ERR_MSG(extack, "slot min delay greater than max delay");
+ return -EINVAL;
+ }
+ return 0;
+}
+
static void get_slot(struct netem_sched_data *q, const struct nlattr *attr)
{
const struct tc_netem_slot *c = nla_data(attr);
@@ -1096,6 +1108,12 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
goto table_free;
}
+ if (tb[TCA_NETEM_SLOT]) {
+ ret = validate_slot(tb[TCA_NETEM_SLOT], extack);
+ if (ret)
+ goto table_free;
+ }
+
sch_tree_lock(sch);
/* backup q->clg and q->loss_model */
old_clg = q->clg;
--
2.53.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net v4 8/8] net/sched: netem: fix slot delay calculation overflow
2026-04-06 17:25 [PATCH net v4 0/8] net/sched: netem bug fixes Stephen Hemminger
` (6 preceding siblings ...)
2026-04-06 17:25 ` [PATCH net v4 7/8] net/sched: netem: check for invalid slot range Stephen Hemminger
@ 2026-04-06 17:25 ` Stephen Hemminger
7 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2026-04-06 17:25 UTC (permalink / raw)
To: netdev
Cc: Stephen Hemminger, Simon Horman, Jamal Hadi Salim, Jiri Pirko,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Yousuk Seung, Neal Cardwell, open list
get_slot_next() computes a random delay between min_delay and
max_delay using:
get_random_u32() * (max_delay - min_delay) >> 32
This overflows signed 64-bit arithmetic when the delay range exceeds
approximately 2.1 seconds (2^31 nanoseconds), producing a negative
result that effectively disables slot-based pacing. This is a
realistic configuration for WAN emulation (e.g., slot 1s 5s).
Use mul_u64_u32_shr() which handles the widening multiply without
overflow.
Fixes: 0a9fe5c375b5 ("netem: slotting with non-uniform distribution")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
net/sched/sch_netem.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index c82f76af41aa..d47e1b0d7942 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -660,9 +660,8 @@ static void get_slot_next(struct netem_sched_data *q, u64 now)
if (!q->slot_dist)
next_delay = q->slot_config.min_delay +
- (get_random_u32() *
- (q->slot_config.max_delay -
- q->slot_config.min_delay) >> 32);
+ mul_u64_u32_shr(q->slot_config.max_delay - q->slot_config.min_delay,
+ get_random_u32(), 32);
else
next_delay = tabledist(q->slot_config.dist_delay,
(s32)(q->slot_config.dist_jitter),
--
2.53.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net v4 5/8] net/sched: netem: batch-transfer ready packets to avoid child re-entrancy
2026-04-06 17:25 ` [PATCH net v4 5/8] net/sched: netem: batch-transfer ready packets to avoid child re-entrancy Stephen Hemminger
@ 2026-04-10 11:36 ` Simon Horman
2026-04-10 11:39 ` Simon Horman
1 sibling, 0 replies; 12+ messages in thread
From: Simon Horman @ 2026-04-10 11:36 UTC (permalink / raw)
To: stephen
Cc: Simon Horman, ncardwell, open list, francois.michel, davem,
netdev, pabeni, ysseung, edumazet, jhs, kuba, posk, jiri
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net/sched: netem: batch-transfer ready packets to avoid child re-entrancy
netem_dequeue_child() previously transferred one packet from the tfifo
to the child qdisc per dequeue call. Parents like HFSC that track
class active/inactive state on qlen transitions could see an enqueue
during dequeue, causing double-insertion into the eltree
(CVE-2025-37890, CVE-2025-38001). Non-work-conserving children like
TBF could also refuse to return a just-enqueued packet, making netem
return NULL despite having backlog, which causes parents like DRR to
incorrectly deactivate the class.
Move all time-ready packets into the child before calling its dequeue.
This separates the enqueue and dequeue phases so the parent sees
consistent qlen transitions.
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index e264f7aefb97..b93f0e886a2b 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
[ ... ]
> @@ -743,31 +745,28 @@ static struct sk_buff *netem_dequeue_child(struct Qdisc *sch)
> u64 now = ktime_get_ns();
> struct sk_buff *skb;
>
> - skb = netem_peek(q);
> - if (skb) {
> - u64 time_to_send = netem_skb_cb(skb)->time_to_send;
> -
> - if (q->slot.slot_next && q->slot.slot_next < time_to_send)
> - get_slot_next(q, now);
> -
> - if (time_to_send <= now && q->slot.slot_next <= now) {
> - struct sk_buff *to_free = NULL;
> - unsigned int pkt_len;
> - int err;
> + while ((skb = netem_peek(q)) != NULL) {
> + struct sk_buff *to_free = NULL;
> + unsigned int pkt_len;
> + int err;
>
> - skb = netem_pull_tfifo(q, sch);
> - netem_slot_account(q, skb, now);
> + if (netem_skb_cb(skb)->time_to_send > now)
> + break;
> + if (q->slot.slot_next && q->slot.slot_next > now)
> + break;
Does the batch transfer preserve slot gap behavior?
The old code called get_slot_next(q, now) when a slot expired by time:
if (q->slot.slot_next && q->slot.slot_next < time_to_send)
get_slot_next(q, now);
This advanced the slot to a new future window (slot_next = now + next_delay),
enforcing inter-slot gaps when a slot expired by time rather than by packet
or byte budget exhaustion.
The new batching loop only checks 'q->slot.slot_next > now' to break, and
relies solely on netem_slot_account() -> get_slot_next() for slot advancement
(which only triggers when packets_left or bytes_left reach zero).
When a slot expires by time but still has remaining budget, the new code
transfers packets immediately without creating a new inter-slot gap.
netem_dequeue_direct() still has the old slot gap logic:
if (q->slot.slot_next && q->slot.slot_next < time_to_send)
get_slot_next(q, now);
Should netem_dequeue_child() call get_slot_next() before the batching loop
to maintain consistent slot-based traffic shaping semantics between the
direct and child paths?
>
> - pkt_len = qdisc_pkt_len(skb);
> - err = qdisc_enqueue(skb, q->qdisc, &to_free);
> - kfree_skb_list(to_free);
> - if (err != NET_XMIT_SUCCESS) {
> - if (net_xmit_drop_count(err))
> - qdisc_qstats_drop(sch);
> - sch->qstats.backlog -= pkt_len;
> - sch->q.qlen--;
> - qdisc_tree_reduce_backlog(sch, 1, pkt_len);
> - }
> + skb = netem_pull_tfifo(q, sch);
> + netem_slot_account(q, skb, now);
> +
> + pkt_len = qdisc_pkt_len(skb);
> + err = qdisc_enqueue(skb, q->qdisc, &to_free);
> + kfree_skb_list(to_free);
> + if (unlikely(err != NET_XMIT_SUCCESS)) {
> + if (net_xmit_drop_count(err))
> + qdisc_qstats_drop(sch);
> + sch->qstats.backlog -= pkt_len;
> + sch->q.qlen--;
> + qdisc_tree_reduce_backlog(sch, 1, pkt_len);
> }
> }
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v4 5/8] net/sched: netem: batch-transfer ready packets to avoid child re-entrancy
2026-04-06 17:25 ` [PATCH net v4 5/8] net/sched: netem: batch-transfer ready packets to avoid child re-entrancy Stephen Hemminger
2026-04-10 11:36 ` Simon Horman
@ 2026-04-10 11:39 ` Simon Horman
1 sibling, 0 replies; 12+ messages in thread
From: Simon Horman @ 2026-04-10 11:39 UTC (permalink / raw)
To: Stephen Hemminger
Cc: netdev, Jamal Hadi Salim, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, open list
On Mon, Apr 06, 2026 at 10:25:13AM -0700, Stephen Hemminger wrote:
> netem_dequeue_child() previously transferred one packet from the tfifo
> to the child qdisc per dequeue call. Parents like HFSC that track
> class active/inactive state on qlen transitions could see an enqueue
> during dequeue, causing double-insertion into the eltree
> (CVE-2025-37890, CVE-2025-38001). Non-work-conserving children like
> TBF could also refuse to return a just-enqueued packet, making netem
> return NULL despite having backlog, which causes parents like DRR to
> incorrectly deactivate the class.
>
> Move all time-ready packets into the child before calling its dequeue.
> This separates the enqueue and dequeue phases so the parent sees
> consistent qlen transitions.
>
> Fixes: 50612537e9ab ("netem: fix classful handling")
>
nit: no blank line here
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
I forwarded an AI generated review separately.
Because I couldn't convince myself it wasn't valid.
...
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v4 4/8] net/sched: netem: refactor dequeue into helper functions
2026-04-06 17:25 ` [PATCH net v4 4/8] net/sched: netem: refactor dequeue into helper functions Stephen Hemminger
@ 2026-04-10 11:39 ` Simon Horman
0 siblings, 0 replies; 12+ messages in thread
From: Simon Horman @ 2026-04-10 11:39 UTC (permalink / raw)
To: Stephen Hemminger
Cc: netdev, Jamal Hadi Salim, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, open list
On Mon, Apr 06, 2026 at 10:25:12AM -0700, Stephen Hemminger wrote:
> Extract the tfifo removal, slot accounting, and child/direct dequeue
> paths from the monolithic netem_dequeue() into separate helpers:
>
> netem_pull_tfifo() - remove head packet from tfifo
> netem_slot_account() - update slot pacing counters
> netem_dequeue_child() - enqueue to child, then dequeue from child
> netem_dequeue_direct()- dequeue from tfifo when no child
>
> This replaces the goto-based control flow with straightforward function
> calls, making the code easier to follow and modify.
>
> No functional change intended.
Manually verified (as best I could :)
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Thanks for splitting this out.
It was very helpful.
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-04-10 11:40 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-06 17:25 [PATCH net v4 0/8] net/sched: netem bug fixes Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 1/8] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 2/8] net/sched: netem: fix queue limit check to include reordered packets Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 3/8] net/sched: netem: only reseed PRNG when seed is explicitly provided Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 4/8] net/sched: netem: refactor dequeue into helper functions Stephen Hemminger
2026-04-10 11:39 ` Simon Horman
2026-04-06 17:25 ` [PATCH net v4 5/8] net/sched: netem: batch-transfer ready packets to avoid child re-entrancy Stephen Hemminger
2026-04-10 11:36 ` Simon Horman
2026-04-10 11:39 ` Simon Horman
2026-04-06 17:25 ` [PATCH net v4 6/8] net/sched: netem: null-terminate tfifo linear queue tail Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 7/8] net/sched: netem: check for invalid slot range Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 8/8] net/sched: netem: fix slot delay calculation overflow Stephen Hemminger
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.