netdev.vger.kernel.org archive mirror
From: Stephen Hemminger <stephen@networkplumber.org>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: netdev@vger.kernel.org, kuba@kernel.org,
	William Liu <will@willsroot.io>,
	Savino Dicanosa <savy@syst3mfailure.io>
Subject: Re: [Patch net v5 3/9] net_sched: Implement the right netem duplication behavior
Date: Wed, 3 Dec 2025 07:05:40 -0800
Message-ID: <20251203070540.6ea53471@phoenix.local>
In-Reply-To: <20251126195244.88124-4-xiyou.wangcong@gmail.com>

On Wed, 26 Nov 2025 11:52:38 -0800
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> In the old behavior, duplicated packets were sent back to the root qdisc,
> which could create dangerous infinite loops in hierarchical setups -
> imagine a scenario where each level of a multi-stage netem hierarchy kept
> feeding duplicates back to the top, potentially causing system instability
> or resource exhaustion.
> 
> The new behavior elegantly solves this by enqueueing duplicates to the same
> qdisc that created them, ensuring that packet duplication occurs exactly
> once per netem stage in a controlled, predictable manner. This change
> enables users to safely construct complex network emulation scenarios using
> netem hierarchies (like the 4x multiplication demonstrated in testing)
> without worrying about runaway packet generation, while still preserving
> the intended duplication effects.
> 
> Another advantage of this approach is that it eliminates the enqueue reentrant
> behaviour which triggered many vulnerabilities. See the last patch in this
> patchset which updates the test cases for such vulnerabilities.
> 
> Now users can confidently chain multiple netem qdiscs together to achieve
> sophisticated network impairment combinations, knowing that each stage will
> apply its effects exactly once to the packet flow, making network testing
> scenarios more reliable and results more deterministic.
> 
> I tested netem packet duplication in two configurations:
> 1. Nested netem-to-netem hierarchy using parent/child attachment
> 2. Single netem using prio qdisc with netem leaf
> 
> Setup commands and results:
> 
> Single netem hierarchy (prio + netem):
>   tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>   tc filter add dev lo parent 1:0 protocol ip matchall classid 1:1
>   tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> 
> Result: 2x packet multiplication (1→2 packets)
>   2 echo requests + 4 echo replies = 6 total packets
> 
> Expected behavior: Only one netem stage exists in this hierarchy, so
> 1 ping becomes 2 packets (100% duplication). The 2 echo requests generate
> 2 echo replies, which also get duplicated to 4 replies, yielding the
> predictable total of 6 packets (2 requests + 4 replies).
> 
> Nested netem hierarchy (netem + netem):
>   tc qdisc add dev lo root handle 1: netem limit 1000 duplicate 100%
>   tc qdisc add dev lo parent 1: handle 2: netem limit 1000 duplicate 100%
> 
> Result: 4x packet multiplication (1→2→4 packets)
>   4 echo requests + 16 echo replies = 20 total packets
> 
> Expected behavior: Root netem duplicates 1 ping to 2 packets, child netem
> receives 2 packets and duplicates each to create 4 total packets. Since
> ping operates bidirectionally, 4 echo requests generate 4 echo replies,
> which also get duplicated through the same hierarchy (4→8→16), resulting
> in the predictable total of 20 packets (4 requests + 16 replies).
> 
> The new netem duplication behavior does not break the documented
> semantics of "creates a copy of the packet before queuing." The man page
> description remains true since duplication occurs before the queuing
> process, creating both original and duplicate packets that are then
> enqueued. The documentation does not specify which qdisc should receive
> the duplicates, only that copying happens before queuing. The implementation
> choice to enqueue duplicates to the same qdisc (rather than root) is an
> internal detail that maintains the documented behavior while preventing
> infinite loops in hierarchical configurations.
> 
> Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
> Reported-by: William Liu <will@willsroot.io>
> Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  net/sched/sch_netem.c | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index fdd79d3ccd8c..191f64bd68ff 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -165,6 +165,7 @@ struct netem_sched_data {
>   */
>  struct netem_skb_cb {
>  	u64	        time_to_send;
> +	u8		duplicate : 1;
>  };
>  
>  static inline struct netem_skb_cb *netem_skb_cb(struct sk_buff *skb)
> @@ -460,8 +461,16 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  	skb->prev = NULL;
>  
>  	/* Random duplication */
> -	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> -		++count;
> +	if (q->duplicate) {
> +		bool dup = true;
> +
> +		if (netem_skb_cb(skb)->duplicate) {
> +			netem_skb_cb(skb)->duplicate = 0;
> +			dup = false;
> +		}
> +		if (dup && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> +			++count;
> +	}
>  
>  	/* Drop packet? */
>  	if (loss_event(q)) {
> @@ -532,17 +541,12 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  	}
>  
>  	/*
> -	 * If doing duplication then re-insert at top of the
> -	 * qdisc tree, since parent queuer expects that only one
> -	 * skb will be queued.
> +	 * If doing duplication then re-insert at the same qdisc,
> +	 * as going back to the root would induce loops.
>  	 */
>  	if (skb2) {
> -		struct Qdisc *rootq = qdisc_root_bh(sch);
> -		u32 dupsave = q->duplicate; /* prevent duplicating a dup... */
> -
> -		q->duplicate = 0;
> -		rootq->enqueue(skb2, rootq, to_free);
> -		q->duplicate = dupsave;
> +		netem_skb_cb(skb2)->duplicate = 1;
> +		qdisc_enqueue(skb2, sch, to_free);
>  		skb2 = NULL;
>  	}
>  
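
For readers following the diff, here is a minimal userspace sketch of the
marking scheme in the hunks above, assuming 100% duplication at every stage
and a simple linear chain of netem stages. It is a model, not kernel code:
struct model_pkt, struct model_netem, model_enqueue(), model_fanout() and
model_ping_total() are hypothetical names, and the real qdisc hands packets
to its child on dequeue rather than directly from enqueue. It only shows why
a clone re-enqueued to the same qdisc is not cloned again, and how the
1→2→4 fan-out and the 6 and 20 packet totals from the commit message follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_STAGES 8

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_pkt {
	bool duplicate;	/* mirrors the new 'duplicate' bit in netem_skb_cb */
};

struct model_netem {
	bool dup_enabled;		/* stands in for q->duplicate at 100% */
	struct model_netem *child;	/* next stage, or NULL for the last one */
	unsigned long delivered;	/* packets that passed through this stage */
};

/* Model of the patched enqueue path for one stage. */
void model_enqueue(struct model_netem *q, struct model_pkt pkt)
{
	int count = 1;

	if (q->dup_enabled) {
		if (pkt.duplicate)
			pkt.duplicate = false;	/* marked clone: clear, do not clone */
		else
			count++;		/* 100% duplication assumed */
	}

	if (count > 1) {
		struct model_pkt clone = pkt;

		clone.duplicate = true;
		model_enqueue(q, clone);	/* same qdisc, not the root */
	}

	q->delivered++;
	if (q->child)
		model_enqueue(q->child, pkt);	/* simplification: real netem dequeues */
}

/* Send one packet through 'stages' chained netems; return copies at the end. */
unsigned long model_fanout(int stages)
{
	struct model_netem chain[MODEL_MAX_STAGES] = {0};
	struct model_pkt pkt = { .duplicate = false };
	int i;

	assert(stages >= 1 && stages <= MODEL_MAX_STAGES);
	for (i = 0; i < stages; i++) {
		chain[i].dup_enabled = true;
		chain[i].child = (i + 1 < stages) ? &chain[i + 1] : NULL;
	}
	model_enqueue(&chain[0], pkt);
	return chain[stages - 1].delivered;
}

/* One ping over lo: requests fan out, then each reply fans out the same way. */
unsigned long model_ping_total(int stages)
{
	unsigned long f = model_fanout(stages);

	return f + f * f;
}
```

Under these assumptions model_fanout(1) gives 2 and model_fanout(2) gives 4,
and model_ping_total() gives 6 and 20 for one and two stages, which matches
the packet counts quoted in the commit message.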

I wonder whether many of these issues would go away if netem used a
workqueue to do the duplication. That would avoid the nested enqueue calls,
among other things.
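
To make the idea concrete, a rough userspace sketch of deferring the clone
instead of enqueueing it from inside the enqueue path. This is only an
illustration under loud assumptions: struct defer_pkt, struct defer_qdisc,
defer_enqueue() and defer_worker() are hypothetical names, defer_worker()
merely stands in for a work item, 100% duplication is assumed, and locking,
qdisc reset/destroy and packet timing are ignored entirely. A real change
would presumably have to deal with all of those.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_MAX_PENDING 16

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct defer_pkt {
	bool duplicate;		/* set on clones so they are not cloned again */
};

struct defer_qdisc {
	struct defer_pkt pending[DEFER_MAX_PENDING];	/* clones awaiting the worker */
	size_t npending;
	unsigned long queued;	/* packets accepted by this qdisc */
	int depth;		/* current enqueue nesting */
	int max_depth;		/* deepest nesting observed */
};

/* Accept one packet; park a clone for later instead of recursing. */
void defer_enqueue(struct defer_qdisc *q, struct defer_pkt pkt)
{
	q->depth++;
	if (q->depth > q->max_depth)
		q->max_depth = q->depth;

	if (pkt.duplicate) {
		pkt.duplicate = false;		/* marked clone: clear, do not clone */
	} else {
		assert(q->npending < DEFER_MAX_PENDING);
		q->pending[q->npending].duplicate = true;
		q->npending++;
	}

	q->queued++;
	q->depth--;
}

/* Stand-in for the work item: runs only after defer_enqueue() has returned. */
void defer_worker(struct defer_qdisc *q)
{
	while (q->npending > 0) {
		struct defer_pkt clone = q->pending[--q->npending];

		defer_enqueue(q, clone);
	}
}
```

In this model one call to defer_enqueue() followed by defer_worker() leaves
two packets queued, and max_depth never exceeds 1 because the enqueue
function is never entered while it is already running.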
