public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/5] netem: bug fixes
@ 2026-04-15 14:27 Stephen Hemminger
  2026-04-15 14:27 ` [PATCH v7 1/5] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-04-15 14:27 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger

These bugs were found when doing AI assisted  review of sch_netem.c
during investigation of the packet duplication recursion problem
addressed in Jamal's series.

The fixes cover:

 - probability gaps in the 4-state Markov loss model
 - queue limit not accounting for reordered packets
 - PRNG reseeded on every tc change, breaking reproducibility
 - slot delay configuration not validated for inverted ranges
 - slot delay arithmetic overflow for ranges above ~2.1 seconds

v7 - queue limit check Fixes: goes back further to earlier change
   - use NL_SET_ERR_MSG_ATTR

Stephen Hemminger (5):
  net/sched: netem: fix probability gaps in 4-state loss model
  net/sched: netem: fix queue limit check to include reordered packets
  net/sched: netem: only reseed PRNG when seed is explicitly provided
  net/sched: netem: check for invalid slot range
  net/sched: netem: fix slot delay calculation overflow

 net/sched/sch_netem.c | 44 +++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 12 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v7 1/5] net/sched: netem: fix probability gaps in 4-state loss model
  2026-04-15 14:27 [PATCH v7 0/5] netem: bug fixes Stephen Hemminger
@ 2026-04-15 14:27 ` Stephen Hemminger
  2026-04-15 14:27 ` [PATCH v7 2/5] net/sched: netem: fix queue limit check to include reordered packets Stephen Hemminger
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-04-15 14:27 UTC (permalink / raw)
  To: netdev
  Cc: Stephen Hemminger, Simon Horman, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	open list

The 4-state Markov chain in loss_4state() has gaps at the boundaries
between transition probability ranges. The comparisons use:

  if (rnd < a4)
  else if (a4 < rnd && rnd < a1 + a4)

When rnd equals a boundary value exactly, neither branch matches and
no state transition occurs. The redundant lower-bound check (a4 < rnd)
is already implied by being in the else branch.

Remove the unnecessary lower-bound comparisons so the ranges are
contiguous and every random value produces a transition, matching
the GI (General and Intuitive) loss model specification.

This bug goes back to original implementation of this model.

Fixes: 661b79725fea ("netem: revised correlated loss generator")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
 net/sched/sch_netem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 20df1c08b1e9..8ee72cac1faf 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -227,10 +227,10 @@ static bool loss_4state(struct netem_sched_data *q)
 		if (rnd < clg->a4) {
 			clg->state = LOST_IN_GAP_PERIOD;
 			return true;
-		} else if (clg->a4 < rnd && rnd < clg->a1 + clg->a4) {
+		} else if (rnd < clg->a1 + clg->a4) {
 			clg->state = LOST_IN_BURST_PERIOD;
 			return true;
-		} else if (clg->a1 + clg->a4 < rnd) {
+		} else {
 			clg->state = TX_IN_GAP_PERIOD;
 		}
 
@@ -247,9 +247,9 @@ static bool loss_4state(struct netem_sched_data *q)
 	case LOST_IN_BURST_PERIOD:
 		if (rnd < clg->a3)
 			clg->state = TX_IN_BURST_PERIOD;
-		else if (clg->a3 < rnd && rnd < clg->a2 + clg->a3) {
+		else if (rnd < clg->a2 + clg->a3) {
 			clg->state = TX_IN_GAP_PERIOD;
-		} else if (clg->a2 + clg->a3 < rnd) {
+		} else {
 			clg->state = LOST_IN_BURST_PERIOD;
 			return true;
 		}
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v7 2/5] net/sched: netem: fix queue limit check to include reordered packets
  2026-04-15 14:27 [PATCH v7 0/5] netem: bug fixes Stephen Hemminger
  2026-04-15 14:27 ` [PATCH v7 1/5] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
@ 2026-04-15 14:27 ` Stephen Hemminger
  2026-04-15 14:27 ` [PATCH v7 3/5] net/sched: netem: only reseed PRNG when seed is explicitly provided Stephen Hemminger
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-04-15 14:27 UTC (permalink / raw)
  To: netdev
  Cc: Stephen Hemminger, Simon Horman, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Martin Ottens, open list

The queue limit check in netem_enqueue() uses q->t_len which only
counts packets in the internal tfifo. Packets placed in sch->q by
the reorder path (__qdisc_enqueue_head) are not counted, allowing
the total queue occupancy to exceed sch->limit under reordering.

Include sch->q.qlen in the limit check.

Fixes: f8d4bc455047 ("net/sched: netem: account for backlog updates from child qdisc")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
 net/sched/sch_netem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 8ee72cac1faf..d400a730eadd 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -524,7 +524,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 				1 << get_random_u32_below(8);
 	}
 
-	if (unlikely(q->t_len >= sch->limit)) {
+	if (unlikely(sch->q.qlen >= sch->limit)) {
 		/* re-link segs, so that qdisc_drop_all() frees them all */
 		skb->next = segs;
 		qdisc_drop_all(skb, sch, to_free);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v7 3/5] net/sched: netem: only reseed PRNG when seed is explicitly provided
  2026-04-15 14:27 [PATCH v7 0/5] netem: bug fixes Stephen Hemminger
  2026-04-15 14:27 ` [PATCH v7 1/5] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
  2026-04-15 14:27 ` [PATCH v7 2/5] net/sched: netem: fix queue limit check to include reordered packets Stephen Hemminger
@ 2026-04-15 14:27 ` Stephen Hemminger
  2026-04-15 14:27 ` [PATCH v7 4/5] net/sched: netem: check for invalid slot range Stephen Hemminger
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-04-15 14:27 UTC (permalink / raw)
  To: netdev
  Cc: Stephen Hemminger, Simon Horman, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	François Michel, open list

netem_change() unconditionally reseeds the PRNG on every tc change
command. If TCA_NETEM_PRNG_SEED is not specified, a new random seed
is generated, destroying reproducibility for users who set a
deterministic seed on a previous change.

Move the initial random seed generation to netem_init() and only
reseed in netem_change() when TCA_NETEM_PRNG_SEED is explicitly
provided by the user.

Fixes: 4072d97ddc44 ("netem: add prng attribute to netem_sched_data")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
 net/sched/sch_netem.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index d400a730eadd..556f9747f0e7 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -1112,11 +1112,10 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 	/* capping jitter to the range acceptable by tabledist() */
 	q->jitter = min_t(s64, abs(q->jitter), INT_MAX);
 
-	if (tb[TCA_NETEM_PRNG_SEED])
+	if (tb[TCA_NETEM_PRNG_SEED]) {
 		q->prng.seed = nla_get_u64(tb[TCA_NETEM_PRNG_SEED]);
-	else
-		q->prng.seed = get_random_u64();
-	prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+		prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+	}
 
 unlock:
 	sch_tree_unlock(sch);
@@ -1139,6 +1138,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt,
 		return -EINVAL;
 
 	q->loss_model = CLG_RANDOM;
+	q->prng.seed = get_random_u64();
+	prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+
 	ret = netem_change(sch, opt, extack);
 	if (ret)
 		pr_info("netem: change failed\n");
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v7 4/5] net/sched: netem: check for invalid slot range
  2026-04-15 14:27 [PATCH v7 0/5] netem: bug fixes Stephen Hemminger
                   ` (2 preceding siblings ...)
  2026-04-15 14:27 ` [PATCH v7 3/5] net/sched: netem: only reseed PRNG when seed is explicitly provided Stephen Hemminger
@ 2026-04-15 14:27 ` Stephen Hemminger
  2026-04-15 14:27 ` [PATCH v7 5/5] net/sched: netem: fix slot delay calculation overflow Stephen Hemminger
  2026-04-17 16:02 ` [PATCH v7 0/5] netem: bug fixes Simon Horman
  5 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-04-15 14:27 UTC (permalink / raw)
  To: netdev
  Cc: Stephen Hemminger, Simon Horman, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Yousuk Seung, Neal Cardwell, open list

Reject slot configuration where min_delay exceeds max_delay.
The delay range computation in get_slot_next() underflows in
this case, producing bogus results.

Fixes: 0a9fe5c375b5 ("netem: slotting with non-uniform distribution")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
 net/sched/sch_netem.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 556f9747f0e7..8593e62f3c6a 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -827,6 +827,19 @@ static int get_dist_table(struct disttable **tbl, const struct nlattr *attr)
 	return 0;
 }
 
+static int validate_slot(const struct nlattr *attr,
+			 struct netlink_ext_ack *extack)
+{
+	const struct tc_netem_slot *c = nla_data(attr);
+
+	if (c->min_delay > c->max_delay) {
+		NL_SET_ERR_MSG_ATTR(extack, attr,
+				    "slot min delay greater than max delay");
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static void get_slot(struct netem_sched_data *q, const struct nlattr *attr)
 {
 	const struct tc_netem_slot *c = nla_data(attr);
@@ -1040,6 +1053,12 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 			goto table_free;
 	}
 
+	if (tb[TCA_NETEM_SLOT]) {
+		ret = validate_slot(tb[TCA_NETEM_SLOT], extack);
+		if (ret)
+			goto table_free;
+	}
+
 	sch_tree_lock(sch);
 	/* backup q->clg and q->loss_model */
 	old_clg = q->clg;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v7 5/5] net/sched: netem: fix slot delay calculation overflow
  2026-04-15 14:27 [PATCH v7 0/5] netem: bug fixes Stephen Hemminger
                   ` (3 preceding siblings ...)
  2026-04-15 14:27 ` [PATCH v7 4/5] net/sched: netem: check for invalid slot range Stephen Hemminger
@ 2026-04-15 14:27 ` Stephen Hemminger
  2026-04-17 16:02 ` [PATCH v7 0/5] netem: bug fixes Simon Horman
  5 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-04-15 14:27 UTC (permalink / raw)
  To: netdev
  Cc: Stephen Hemminger, Simon Horman, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Yousuk Seung, Neal Cardwell, open list

get_slot_next() computes a random delay between min_delay and
max_delay using:

  get_random_u32() * (max_delay - min_delay) >> 32

This overflows signed 64-bit arithmetic when the delay range exceeds
approximately 2.1 seconds (2^31 nanoseconds), producing a negative
result that effectively disables slot-based pacing. This is a
realistic configuration for WAN emulation (e.g., slot 1s 5s).

Use mul_u64_u32_shr() which handles the widening multiply without
overflow.

Fixes: 0a9fe5c375b5 ("netem: slotting with non-uniform distribution")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
 net/sched/sch_netem.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 8593e62f3c6a..41e56908ab0c 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -659,9 +659,8 @@ static void get_slot_next(struct netem_sched_data *q, u64 now)
 
 	if (!q->slot_dist)
 		next_delay = q->slot_config.min_delay +
-				(get_random_u32() *
-				 (q->slot_config.max_delay -
-				  q->slot_config.min_delay) >> 32);
+			mul_u64_u32_shr(q->slot_config.max_delay - q->slot_config.min_delay,
+					get_random_u32(), 32);
 	else
 		next_delay = tabledist(q->slot_config.dist_delay,
 				       (s32)(q->slot_config.dist_jitter),
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v7 0/5] netem: bug fixes
  2026-04-15 14:27 [PATCH v7 0/5] netem: bug fixes Stephen Hemminger
                   ` (4 preceding siblings ...)
  2026-04-15 14:27 ` [PATCH v7 5/5] net/sched: netem: fix slot delay calculation overflow Stephen Hemminger
@ 2026-04-17 16:02 ` Simon Horman
  5 siblings, 0 replies; 7+ messages in thread
From: Simon Horman @ 2026-04-17 16:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

On Wed, Apr 15, 2026 at 07:27:03AM -0700, Stephen Hemminger wrote:
> These bugs were found when doing AI assisted  review of sch_netem.c
> during investigation of the packet duplication recursion problem
> addressed in Jamal's series.
> 
> The fixes cover:
> 
>  - probability gaps in the 4-state Markov loss model
>  - queue limit not accounting for reordered packets
>  - PRNG reseeded on every tc change, breaking reproducibility
>  - slot delay configuration not validated for inverted ranges
>  - slot delay arithmetic overflow for ranges above ~2.1 seconds
> 
> v7 - queue limit check Fixes: goes back further to earlier change
>    - use NL_SET_ERR_MSG_ATTR
> 
> Stephen Hemminger (5):
>   net/sched: netem: fix probability gaps in 4-state loss model
>   net/sched: netem: fix queue limit check to include reordered packets
>   net/sched: netem: only reseed PRNG when seed is explicitly provided
>   net/sched: netem: check for invalid slot range
>   net/sched: netem: fix slot delay calculation overflow

To the maintainers: I'd like to ask for more time to complete review of this.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-04-17 16:02 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-15 14:27 [PATCH v7 0/5] netem: bug fixes Stephen Hemminger
2026-04-15 14:27 ` [PATCH v7 1/5] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
2026-04-15 14:27 ` [PATCH v7 2/5] net/sched: netem: fix queue limit check to include reordered packets Stephen Hemminger
2026-04-15 14:27 ` [PATCH v7 3/5] net/sched: netem: only reseed PRNG when seed is explicitly provided Stephen Hemminger
2026-04-15 14:27 ` [PATCH v7 4/5] net/sched: netem: check for invalid slot range Stephen Hemminger
2026-04-15 14:27 ` [PATCH v7 5/5] net/sched: netem: fix slot delay calculation overflow Stephen Hemminger
2026-04-17 16:02 ` [PATCH v7 0/5] netem: bug fixes Simon Horman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox