public inbox for netdev@vger.kernel.org
From: Stephen Hemminger <stephen@networkplumber.org>
To: William Liu <will@willsroot.io>
Cc: netdev@vger.kernel.org, stable@vger.kernel.org,
	Savino Dicanosa <savy@syst3mfailure.io>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Victor Nogueira <victor@mojatatu.com>
Subject: Re: [PATCH 03/12] net/sched: netem: add per-CPU recursion guard for duplication
Date: Sun, 15 Mar 2026 09:06:22 -0700	[thread overview]
Message-ID: <20260315090622.5fbdf074@phoenix.local> (raw)
In-Reply-To: <ydqKfVXU_4_kSsU89EbfSyd66aGeaCaVHFr6kXhrF-qG7G2WCAAOS51LTAM2y_thGnB0pp-pbGLDWqM4XooMX-jelz30W0KbpE3KzCKiWqo=@willsroot.io>

On Sat, 14 Mar 2026 19:29:10 +0000
William Liu <will@willsroot.io> wrote:

> Looping in Jamal and Victor.
> 
> On Friday, March 13th, 2026 at 9:17 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> > Add a per-CPU recursion depth counter to netem_enqueue(). When netem
> > duplicates a packet, the clone is re-enqueued at the root qdisc. If
> > the tree contains other netem instances, this can recurse without
> > bound, causing soft lockups and OOM.
> > 
> > This approach was previously considered but rejected on the grounds
> > that netem_dequeue calling enqueue on a child netem could bypass the
> > depth check. That concern does not apply: the child netem's
> > netem_enqueue() increments the same per-CPU counter, so the total
> > nesting depth across all netem instances in the call chain is tracked
> > correctly.  
> 
> I'm assuming you are referring to [1] (and other relevant followup messages), but has this setup been tested against the original repro? I think there was a similar draft fix originally but it failed during testing because DOS still happened [2].
> 
> If I remember correctly,  the issue is less so the recursive depth but more so being able to differentiate between packets that are previously involved in duplication or not.
> 
> > 
> > A depth limit of 4 is generous for any legitimate configuration.
> > 
> > Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=220774
> > Cc: stable@vger.kernel.org
> > Reported-by: William Liu <will@willsroot.io>
> > Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  net/sched/sch_netem.c | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> > 
> > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > index 0ccf74a9cb82..085fa3ad6f83 100644
> > --- a/net/sched/sch_netem.c
> > +++ b/net/sched/sch_netem.c
> > @@ -21,6 +21,7 @@
> >  #include <linux/rtnetlink.h>
> >  #include <linux/reciprocal_div.h>
> >  #include <linux/rbtree.h>
> > +#include <linux/percpu.h>
> > 
> >  #include <net/gso.h>
> >  #include <net/netlink.h>
> > @@ -29,6 +30,15 @@
> > 
> >  #define VERSION "1.3"
> > 
> > +/*
> > + * Limit for recursion from duplication.
> > + * Duplicated packets are re-enqueued at the root qdisc, which may
> > + * reach this or another netem instance, causing nested calls to
> > + * netem_enqueue(). This per-CPU counter limits the total depth.
> > + */
> > +static DEFINE_PER_CPU(unsigned int, netem_enqueue_depth);
> > +#define NETEM_RECURSION_LIMIT	4
> > +
> >  /*	Network Emulation Queuing algorithm.
> >  	====================================
> > 
> > @@ -460,6 +470,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >  	/* Do not fool qdisc_drop_all() */
> >  	skb->prev = NULL;
> > 
> > +	/* Guard against recursion from duplication re-injection. */
> > +	if (unlikely(this_cpu_inc_return(netem_enqueue_depth) >
> > +		     NETEM_RECURSION_LIMIT)) {
> > +		this_cpu_dec(netem_enqueue_depth);
> > +		qdisc_drop(skb, sch, to_free);
> > +		return NET_XMIT_DROP;
> > +	}
> > +
> >  	/* Random duplication */
> >  	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> >  		++count;
> > @@ -474,6 +492,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >  	if (count == 0) {
> >  		qdisc_qstats_drop(sch);
> >  		__qdisc_drop(skb, to_free);
> > +		this_cpu_dec(netem_enqueue_depth);
> >  		return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
> >  	}
> > 
> > @@ -529,6 +548,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >  		qdisc_drop_all(skb, sch, to_free);
> >  		if (skb2)
> >  			__qdisc_drop(skb2, to_free);
> > +		this_cpu_dec(netem_enqueue_depth);
> >  		return NET_XMIT_DROP;
> >  	}
> > 
> > @@ -643,8 +663,10 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >  		/* Parent qdiscs accounted for 1 skb of size @prev_len */
> >  		qdisc_tree_reduce_backlog(sch, -(nb - 1), -(len - prev_len));
> >  	} else if (!skb) {
> > +		this_cpu_dec(netem_enqueue_depth);
> >  		return NET_XMIT_DROP;
> >  	}
> > +	this_cpu_dec(netem_enqueue_depth);
> >  	return NET_XMIT_SUCCESS;
> >  }
> > 
> > --
> > 2.51.0
> > 
> >   
> 
> What about the last suggestion for a robust fix from [3]?
> 
> Best,
> Will
> 
> [1] https://lore.kernel.org/netdev/DISZZlS5CdbUKITzkIyT3jki3inTWSMecT6FplNmkpYs9bJizbs0iwRbTGMrnqEXrL3-__IjOQxdULPdZwGdKFSXJ1DZYIj6xmWPBZxerdk=@willsroot.io/
> [2] https://lore.kernel.org/netdev/q7G0Z7oMR2x9TWwNHOiPNsZ8lHzAuXuVgrZgGmAgkH8lkIYyTgeqXwcDrelE_fdS9OdJ4TlfS96px6O9SvnmKigNKFkiaFlStvAGPIJ3b84=@willsroot.io/
> [3] https://lore.kernel.org/netdev/20260111163947.811248-6-jhs@mojatatu.com/

Thanks. This is a tricky corner, and not every approach works out.

You're right that the per-CPU guard alone doesn't cover the
dequeue-to-child-enqueue path you described. That's exactly why
the series has two patches working together:

This patch (03) adds the per-CPU recursion guard, which handles the
direct enqueue recursion (a packet duplicated via rootq->enqueue hits
another netem_enqueue() in the same call chain).

Patch 04 restructures netem_dequeue to eliminate the pump. The
old code had "goto tfifo_dequeue" which looped back after each
child enqueue, so packets the child duplicated back to root would
immediately get picked up by the same dequeue iteration. The new
code transfers all currently-ready packets from the tfifo to the
child in one batch, then does a single dequeue from the child and
returns. Packets that the child duplicates back to root land in
the tfifo but won't be processed until the next dequeue call from
the parent — breaking the loop you diagrammed.

The original repro is:

  tc qdisc add dev lo root handle 1: netem delay 1ms duplicate 100%
  tc qdisc add dev lo parent 1:1 handle 2: netem delay 1ms duplicate 100%
  ping -f localhost

This is covered by tdc test f2a3 (nested netem config acceptance)
and test 7a07 (nested netem with duplication, traffic via scapy).
More tests are added in the new version.

Jamal's proposed change using an skb ttl would also work, but it
was rejected because it required adding a ttl field to struct
sk_buff, and skb size is performance critical. As Cong pointed
out, even adding a couple of bits for the ttl grows the struct.
So the idea was considered and decided against.

Thanks.


Thread overview: 22+ messages
2026-03-13 21:15 [PATCH 00/12] netem: fixes, cleanup, and selftest Stephen Hemminger
2026-03-13 21:15 ` [PATCH 01/12] selftests: net: add netem qdisc test Stephen Hemminger
2026-03-13 21:15 ` [PATCH 02/12] Revert "net/sched: Restrict conditions for adding duplicating netems to qdisc tree" Stephen Hemminger
2026-03-13 21:15 ` [PATCH 03/12] net/sched: netem: add per-CPU recursion guard for duplication Stephen Hemminger
2026-03-14 19:29   ` William Liu
2026-03-15 16:06     ` Stephen Hemminger [this message]
2026-03-15 16:19       ` Jamal Hadi Salim
2026-03-15 17:18         ` Stephen Hemminger
2026-03-16 17:52           ` Jamal Hadi Salim
2026-03-13 21:15 ` [PATCH 04/12] net/sched: netem: restructure dequeue to avoid re-entrancy with child qdisc Stephen Hemminger
2026-03-13 21:15 ` [PATCH 05/12] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
2026-03-13 21:15 ` [PATCH 06/12] net/sched: netem: fix slot delay calculation overflow Stephen Hemminger
2026-03-13 21:15 ` [PATCH 07/12] net/sched: netem: fix queue limit check to include reordered packets Stephen Hemminger
2026-03-13 21:15 ` [PATCH 08/12] net/sched: netem: null-terminate tfifo linear queue tail Stephen Hemminger
2026-03-13 21:15 ` [PATCH 09/12] net/sched: netem: only reseed PRNG when seed is explicitly provided Stephen Hemminger
2026-03-13 21:15 ` [PATCH 10/12] net/sched: netem: move state enums out of struct netem_sched_data Stephen Hemminger
2026-03-13 21:15 ` [PATCH 11/12] net/sched: netem: remove useless VERSION Stephen Hemminger
2026-03-13 21:15 ` [PATCH 12/12] net/sched: netem: replace pr_info with netlink extack error messages Stephen Hemminger
2026-03-14 14:09 ` [PATCH 00/12] netem: fixes, cleanup, and selftest Jakub Kicinski
2026-03-14 15:39   ` Stephen Hemminger
2026-03-14 15:51   ` Stephen Hemminger
2026-03-14 16:00     ` Jakub Kicinski
