netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch net 0/2] netem: Fix skb duplication logic to prevent infinite loops
@ 2025-07-01 23:13 Cong Wang
  2025-07-01 23:13 ` [Patch net 1/2] " Cong Wang
  2025-07-01 23:13 ` [Patch net 2/2] selftests/tc-testing: Add a nested netem duplicate test Cong Wang
  0 siblings, 2 replies; 16+ messages in thread
From: Cong Wang @ 2025-07-01 23:13 UTC (permalink / raw)
  To: netdev; +Cc: jhs, will, stephen, Cong Wang

This patchset fixes the infinite loops due to skb duplication in netem.

This replaces the patches from William, with much less code and without
any workaround. More importantly, this does not break any use case at
all.

---

Cong Wang (2):
  netem: Fix skb duplication logic to prevent infinite loops
  selftests/tc-testing: Add a nested netem duplicate test

 include/net/sch_generic.h                     |  1 +
 net/sched/sch_netem.c                         |  7 +++---
 .../tc-testing/tc-tests/qdiscs/netem.json     | 25 +++++++++++++++++++
 3 files changed, 29 insertions(+), 4 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-01 23:13 [Patch net 0/2] netem: Fix skb duplication logic to prevent infinite loops Cong Wang
@ 2025-07-01 23:13 ` Cong Wang
  2025-07-02  1:57   ` Cong Wang
  2025-07-01 23:13 ` [Patch net 2/2] selftests/tc-testing: Add a nested netem duplicate test Cong Wang
  1 sibling, 1 reply; 16+ messages in thread
From: Cong Wang @ 2025-07-01 23:13 UTC (permalink / raw)
  To: netdev; +Cc: jhs, will, stephen, Cong Wang, Savino Dicanosa

This patch refines the packet duplication handling in netem_enqueue() to ensure
that only newly cloned skbs are marked as duplicates. This prevents scenarios
where nested netem qdiscs with 100% duplication could cause infinite loops of
skb duplication.

By ensuring the duplicate flag is properly managed, this patch maintains skb
integrity and avoids excessive packet duplication in complex qdisc setups.

Now we could also get rid of the ugly temporary overwrite of
q->duplicate.

Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/sch_generic.h | 1 +
 net/sched/sch_netem.c     | 7 +++----
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 638948be4c50..595b24180d62 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1067,6 +1067,7 @@ struct tc_skb_cb {
 	u8 post_ct:1;
 	u8 post_ct_snat:1;
 	u8 post_ct_dnat:1;
+	u8 duplicate:1;
 };
 
 static inline struct tc_skb_cb *tc_skb_cb(const struct sk_buff *skb)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index fdd79d3ccd8c..33de9c3e4d1b 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	skb->prev = NULL;
 
 	/* Random duplication */
-	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
+	if (tc_skb_cb(skb)->duplicate &&
+	    q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
 		++count;
 
 	/* Drop packet? */
@@ -538,11 +539,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	 */
 	if (skb2) {
 		struct Qdisc *rootq = qdisc_root_bh(sch);
-		u32 dupsave = q->duplicate; /* prevent duplicating a dup... */
 
-		q->duplicate = 0;
+		tc_skb_cb(skb2)->duplicate = 1;
 		rootq->enqueue(skb2, rootq, to_free);
-		q->duplicate = dupsave;
 		skb2 = NULL;
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Patch net 2/2] selftests/tc-testing: Add a nested netem duplicate test
  2025-07-01 23:13 [Patch net 0/2] netem: Fix skb duplication logic to prevent infinite loops Cong Wang
  2025-07-01 23:13 ` [Patch net 1/2] " Cong Wang
@ 2025-07-01 23:13 ` Cong Wang
  1 sibling, 0 replies; 16+ messages in thread
From: Cong Wang @ 2025-07-01 23:13 UTC (permalink / raw)
  To: netdev; +Cc: jhs, will, stephen, Cong Wang

Integrate the reproducer from William into tc-testing and use scapy
to generate packets for testing:

 # ./tdc.py -e 1676
  -- ns/SubPlugin.__init__
  -- scapy/SubPlugin.__init__
 Test 1676: NETEM nested duplicate 100%
 [ 1154.071135] v0p0id1676: entered promiscuous mode
 [ 1154.101066] virtio_net virtio0 enp1s0: entered promiscuous mode
 [ 1154.146301] virtio_net virtio0 enp1s0: left promiscuous mode
 .
 Sent 1 packets.
 [ 1154.173695] v0p0id1676: left promiscuous mode
 [ 1154.216159] v0p0id1676: entered promiscuous mode
 .
 Sent 1 packets.
 [ 1154.238398] v0p0id1676: left promiscuous mode
 [ 1154.260909] v0p0id1676: entered promiscuous mode
 .
 Sent 1 packets.
 [ 1154.282708] v0p0id1676: left promiscuous mode
 [ 1154.309399] v0p0id1676: entered promiscuous mode
 .
 Sent 1 packets.
 [ 1154.337949] v0p0id1676: left promiscuous mode
 [ 1154.360924] v0p0id1676: entered promiscuous mode
 .
 Sent 1 packets.
 [ 1154.383522] v0p0id1676: left promiscuous mode

 All test results:

 1..1
 ok 1 1676 - NETEM nested duplicate 100%

Reported-by: William Liu <will@willsroot.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 .../tc-testing/tc-tests/qdiscs/netem.json     | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json
index 3c4444961488..e5425b3bdd5e 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json
@@ -336,5 +336,30 @@
         "teardown": [
             "$TC qdisc del dev $DUMMY handle 1: root"
         ]
+    },
+    {
+        "id": "1676",
+        "name": "NETEM nested duplicate 100%",
+        "category": ["qdisc", "netem"],
+        "setup": [
+            "$TC qdisc add dev $DEV1 root handle 1: netem limit 1 duplicate 100%",
+            "$TC qdisc add dev $DEV1 parent 1: handle 2: netem gap 1 limit 1 duplicate 100% delay 1us reorder 100%"
+        ],
+        "teardown": [
+            "$TC qdisc del dev $DEV1 root"
+        ],
+        "plugins": {
+            "requires": ["nsPlugin", "scapyPlugin"]
+        },
+        "scapy": {
+            "iface": "$DEV0",
+            "count": 5,
+            "packet": "Ether()/IP(dst='10.10.10.1', src='10.10.10.10')/TCP(sport=12345, dport=80)"
+        },
+        "cmdUnderTest": "$TC -s qdisc show dev $DEV1",
+        "expExitCode": "0",
+        "verifyCmd": "$TC -s qdisc show dev $DEV1",
+        "matchPattern": "qdisc netem",
+        "matchCount": "2"
     }
 ]
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-01 23:13 ` [Patch net 1/2] " Cong Wang
@ 2025-07-02  1:57   ` Cong Wang
  2025-07-02 14:12     ` Jamal Hadi Salim
  0 siblings, 1 reply; 16+ messages in thread
From: Cong Wang @ 2025-07-02  1:57 UTC (permalink / raw)
  To: netdev; +Cc: jhs, will, stephen, Savino Dicanosa

On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index fdd79d3ccd8c..33de9c3e4d1b 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  	skb->prev = NULL;
>  
>  	/* Random duplication */
> -	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> +	if (tc_skb_cb(skb)->duplicate &&

Oops, this is clearly should be !duplicate... It was lost during my
stupid copy-n-paste... Sorry for this mistake.

I will send v2 after waiting for other feedback.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-02  1:57   ` Cong Wang
@ 2025-07-02 14:12     ` Jamal Hadi Salim
  2025-07-02 15:04       ` Jamal Hadi Salim
  0 siblings, 1 reply; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-02 14:12 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, will, stephen, Savino Dicanosa

On Tue, Jul 1, 2025 at 9:57 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > --- a/net/sched/sch_netem.c
> > +++ b/net/sched/sch_netem.c
> > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >       skb->prev = NULL;
> >
> >       /* Random duplication */
> > -     if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > +     if (tc_skb_cb(skb)->duplicate &&
>
> Oops, this is clearly should be !duplicate... It was lost during my
> stupid copy-n-paste... Sorry for this mistake.
>

I understood you earlier, Cong. My view still stands:
You are adding logic to a common data structure for a use case that
really makes no sense. The ROI is not good.
BTW: I am almost certain you will hit other issues when this goes out
or when you actually start to test and then you will have to fix more
spots.

cheers,
jamal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-02 14:12     ` Jamal Hadi Salim
@ 2025-07-02 15:04       ` Jamal Hadi Salim
  2025-07-02 15:06         ` Jamal Hadi Salim
  2025-07-05  0:48         ` Cong Wang
  0 siblings, 2 replies; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-02 15:04 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, will, stephen, Savino Dicanosa

[-- Attachment #1: Type: text/plain, Size: 1976 bytes --]

On Wed, Jul 2, 2025 at 10:12 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Tue, Jul 1, 2025 at 9:57 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > > --- a/net/sched/sch_netem.c
> > > +++ b/net/sched/sch_netem.c
> > > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > >       skb->prev = NULL;
> > >
> > >       /* Random duplication */
> > > -     if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > > +     if (tc_skb_cb(skb)->duplicate &&
> >
> > Oops, this is clearly should be !duplicate... It was lost during my
> > stupid copy-n-paste... Sorry for this mistake.
> >
>
> I understood you earlier, Cong. My view still stands:
> You are adding logic to a common data structure for a use case that
> really makes no sense. The ROI is not good.
> BTW: I am almost certain you will hit other issues when this goes out
> or when you actually start to test and then you will have to fix more
> spots.
>
Here's an example that breaks it:

sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
netem_bug_test.o sec classifier/pass classid 1:1
sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
sudo tc qdisc add dev lo parent 10: handle 30: netem gap 1 limit 4
duplicate 100% delay 1us reorder 100%

And the ping 127.0.0.1 -c 1
I had to fix your patch for correctness (attached)


the ebpf prog is trivial - make it just return the classid or even zero.

William, as a middle ground can you take a crack at using cb_ext -
take a look for example at struct tc_skb_ext_alloc in cls_api.c (that
one is safe to extend).


cheers
jamal

> cheers,
> jamal

[-- Attachment #2: p1 --]
[-- Type: application/octet-stream, Size: 1225 bytes --]

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 638948be4c50..595b24180d62 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1067,6 +1067,7 @@ struct tc_skb_cb {
 	u8 post_ct:1;
 	u8 post_ct_snat:1;
 	u8 post_ct_dnat:1;
+	u8 duplicate:1;
 };
 
 static inline struct tc_skb_cb *tc_skb_cb(const struct sk_buff *skb)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index fdd79d3ccd8c..d5b969df9143 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	skb->prev = NULL;
 
 	/* Random duplication */
-	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
+	if (tc_skb_cb(skb)->duplicate == 0 &&
+	    q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
 		++count;
 
 	/* Drop packet? */
@@ -538,11 +539,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	 */
 	if (skb2) {
 		struct Qdisc *rootq = qdisc_root_bh(sch);
-		u32 dupsave = q->duplicate; /* prevent duplicating a dup... */
 
-		q->duplicate = 0;
+		tc_skb_cb(skb2)->duplicate = 1;
 		rootq->enqueue(skb2, rootq, to_free);
-		q->duplicate = dupsave;
 		skb2 = NULL;
 	}
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-02 15:04       ` Jamal Hadi Salim
@ 2025-07-02 15:06         ` Jamal Hadi Salim
  2025-07-02 15:20           ` William Liu
  2025-07-05  0:48         ` Cong Wang
  1 sibling, 1 reply; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-02 15:06 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, will, stephen, Savino Dicanosa

On Wed, Jul 2, 2025 at 11:04 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Wed, Jul 2, 2025 at 10:12 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Tue, Jul 1, 2025 at 9:57 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > >
> > > On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > > > --- a/net/sched/sch_netem.c
> > > > +++ b/net/sched/sch_netem.c
> > > > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > > >       skb->prev = NULL;
> > > >
> > > >       /* Random duplication */
> > > > -     if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > > > +     if (tc_skb_cb(skb)->duplicate &&
> > >
> > > Oops, this is clearly should be !duplicate... It was lost during my
> > > stupid copy-n-paste... Sorry for this mistake.
> > >
> >
> > I understood you earlier, Cong. My view still stands:
> > You are adding logic to a common data structure for a use case that
> > really makes no sense. The ROI is not good.
> > BTW: I am almost certain you will hit other issues when this goes out
> > or when you actually start to test and then you will have to fix more
> > spots.
> >
> Here's an example that breaks it:
>
> sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0
> sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> netem_bug_test.o sec classifier/pass classid 1:1
> sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> sudo tc qdisc add dev lo parent 10: handle 30: netem gap 1 limit 4
> duplicate 100% delay 1us reorder 100%
>
> And the ping 127.0.0.1 -c 1
> I had to fix your patch for correctness (attached)
>
>
> the ebpf prog is trivial - make it just return the classid or even zero.
>
> William, as a middle ground can you take a crack at using cb_ext -
> take a look for example at struct tc_skb_ext_alloc in cls_api.c (that
> one is safe to extend).
>
>
Meant: struct tc_skb_ext *ex

If you need help ping me privately - some latency will be involved..

cheers,
jamal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-02 15:06         ` Jamal Hadi Salim
@ 2025-07-02 15:20           ` William Liu
  0 siblings, 0 replies; 16+ messages in thread
From: William Liu @ 2025-07-02 15:20 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Cong Wang, netdev, stephen, Savino Dicanosa






On Wednesday, July 2nd, 2025 at 3:06 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> 
> 
> On Wed, Jul 2, 2025 at 11:04 AM Jamal Hadi Salim jhs@mojatatu.com wrote:
> 
> > On Wed, Jul 2, 2025 at 10:12 AM Jamal Hadi Salim jhs@mojatatu.com wrote:
> > 
> > > On Tue, Jul 1, 2025 at 9:57 PM Cong Wang xiyou.wangcong@gmail.com wrote:
> > > 
> > > > On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > > > 
> > > > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > > > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > > > > --- a/net/sched/sch_netem.c
> > > > > +++ b/net/sched/sch_netem.c
> > > > > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > > > > skb->prev = NULL;
> > > > > 
> > > > > /* Random duplication */
> > > > > - if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > > > > + if (tc_skb_cb(skb)->duplicate &&
> > > > 
> > > > Oops, this is clearly should be !duplicate... It was lost during my
> > > > stupid copy-n-paste... Sorry for this mistake.
> > > 
> > > I understood you earlier, Cong. My view still stands:
> > > You are adding logic to a common data structure for a use case that
> > > really makes no sense. The ROI is not good.
> > > BTW: I am almost certain you will hit other issues when this goes out
> > > or when you actually start to test and then you will have to fix more
> > > spots.
> > 
> > Here's an example that breaks it:
> > 
> > sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0
> > sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> > netem_bug_test.o sec classifier/pass classid 1:1
> > sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> > sudo tc qdisc add dev lo parent 10: handle 30: netem gap 1 limit 4
> > duplicate 100% delay 1us reorder 100%
> > 
> > And the ping 127.0.0.1 -c 1
> > I had to fix your patch for correctness (attached)
> > 
> > the ebpf prog is trivial - make it just return the classid or even zero.
> > 
> > William, as a middle ground can you take a crack at using cb_ext -
> > take a look for example at struct tc_skb_ext_alloc in cls_api.c (that
> > one is safe to extend).
> 
> Meant: struct tc_skb_ext *ex
> 
> If you need help ping me privately - some latency will be involved..
> 
> cheers,
> jamal

Sure, I can take a crack at it in the upcoming days.

Best,
Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-02 15:04       ` Jamal Hadi Salim
  2025-07-02 15:06         ` Jamal Hadi Salim
@ 2025-07-05  0:48         ` Cong Wang
  2025-07-05 13:52           ` Jamal Hadi Salim
  1 sibling, 1 reply; 16+ messages in thread
From: Cong Wang @ 2025-07-05  0:48 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: netdev, will, stephen, Savino Dicanosa

On Wed, Jul 02, 2025 at 11:04:22AM -0400, Jamal Hadi Salim wrote:
> On Wed, Jul 2, 2025 at 10:12 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Tue, Jul 1, 2025 at 9:57 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > >
> > > On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > > > --- a/net/sched/sch_netem.c
> > > > +++ b/net/sched/sch_netem.c
> > > > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > > >       skb->prev = NULL;
> > > >
> > > >       /* Random duplication */
> > > > -     if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > > > +     if (tc_skb_cb(skb)->duplicate &&
> > >
> > > Oops, this is clearly should be !duplicate... It was lost during my
> > > stupid copy-n-paste... Sorry for this mistake.
> > >
> >
> > I understood you earlier, Cong. My view still stands:
> > You are adding logic to a common data structure for a use case that

You are exaggerating this. I only added 1 bit to the core data structure,
the code logic remains in the netem, so it is contained within netem.

> > really makes no sense. The ROI is not good.

Speaking of ROI, I think you need to look at the patch stats:

William/Your patch:
 1 file changed, 40 insertions(+)

My patch:
 2 files changed, 4 insertions(+), 4 deletions(-)


> > BTW: I am almost certain you will hit other issues when this goes out
> > or when you actually start to test and then you will have to fix more
> > spots.
> >
> Here's an example that breaks it:
> 
> sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0
> sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> netem_bug_test.o sec classifier/pass classid 1:1
> sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> sudo tc qdisc add dev lo parent 10: handle 30: netem gap 1 limit 4
> duplicate 100% delay 1us reorder 100%
> 
> And the ping 127.0.0.1 -c 1
> I had to fix your patch for correctness (attached)
> 
> 
> the ebpf prog is trivial - make it just return the classid or even zero.

Interesting, are you sure this works before my patch?

I don't intend to change any logic except closing the infinite loop. IOW,
if it didn't work before, I don't expect to make it work with this patch,
this patch merely fixes the infinite loop, which is sufficient as a bug fix.
Otherwise it would become a feature improvement. (Don't get me wrong, I
think this feature should be improved rather than simply forbidden, it just
belongs to a different patch.)

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-05  0:48         ` Cong Wang
@ 2025-07-05 13:52           ` Jamal Hadi Salim
  2025-07-06 14:59             ` William Liu
  2025-07-07 19:40             ` Cong Wang
  0 siblings, 2 replies; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-05 13:52 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, will, stephen, Savino Dicanosa

On Fri, Jul 4, 2025 at 8:48 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Wed, Jul 02, 2025 at 11:04:22AM -0400, Jamal Hadi Salim wrote:
> > On Wed, Jul 2, 2025 at 10:12 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > >
> > > On Tue, Jul 1, 2025 at 9:57 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > >
> > > > On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > > > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > > > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > > > > --- a/net/sched/sch_netem.c
> > > > > +++ b/net/sched/sch_netem.c
> > > > > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > > > >       skb->prev = NULL;
> > > > >
> > > > >       /* Random duplication */
> > > > > -     if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > > > > +     if (tc_skb_cb(skb)->duplicate &&
> > > >
> > > > Oops, this is clearly should be !duplicate... It was lost during my
> > > > stupid copy-n-paste... Sorry for this mistake.
> > > >
> > >
> > > I understood you earlier, Cong. My view still stands:
> > > You are adding logic to a common data structure for a use case that
>
> You are exaggerating this. I only added 1 bit to the core data structure,
> the code logic remains in the netem, so it is contained within netem.

Try it out ;->
Here's an even simpler setup:

sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
netem_bug_test.o sec classifier/pass classid 1:1
sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
then:
ping -c 1 127.0.0.1

Note: there are other issues as well but i thought citing the ebpf one
was sufficient to get the point across.

>
> > > really makes no sense. The ROI is not good.
>
> Speaking of ROI, I think you need to look at the patch stats:
>
> William/Your patch:
>  1 file changed, 40 insertions(+)
>
> My patch:
>  2 files changed, 4 insertions(+), 4 deletions(-)
>

ROI is not just about LOC. The consequences of a patch are also part
of that formula. And let's not forget the time spent so far debating
instead of plugging the hole.

>
> > > BTW: I am almost certain you will hit other issues when this goes out
> > > or when you actually start to test and then you will have to fix more
> > > spots.
> > >
> > Here's an example that breaks it:
> >
> > sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0
> > sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> > netem_bug_test.o sec classifier/pass classid 1:1
> > sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> > sudo tc qdisc add dev lo parent 10: handle 30: netem gap 1 limit 4
> > duplicate 100% delay 1us reorder 100%
> >
> > And the ping 127.0.0.1 -c 1
> > I had to fix your patch for correctness (attached)
> >
> >
> > the ebpf prog is trivial - make it just return the classid or even zero.
>
> Interesting, are you sure this works before my patch?
>
> I don't intend to change any logic except closing the infinite loop. IOW,
> if it didn't work before, I don't expect to make it work with this patch,
> this patch merely fixes the infinite loop, which is sufficient as a bug fix.
> Otherwise it would become a feature improvement. (Don't get me wrong, I
> think this feature should be improved rather than simply forbidden, it just
> belongs to a different patch.)

A quick solution is what William had. I asked him to use ext_cb not
because i think it is a better solution but just so we can move
forward.
Agree that for a longer term we need a more generic solution as discussed ...

cheers,
jamal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-05 13:52           ` Jamal Hadi Salim
@ 2025-07-06 14:59             ` William Liu
  2025-07-07 20:49               ` Jamal Hadi Salim
  2025-07-07 19:40             ` Cong Wang
  1 sibling, 1 reply; 16+ messages in thread
From: William Liu @ 2025-07-06 14:59 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Cong Wang, netdev, stephen, Savino Dicanosa

On Saturday, July 5th, 2025 at 1:52 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> 
> 
> On Fri, Jul 4, 2025 at 8:48 PM Cong Wang xiyou.wangcong@gmail.com wrote:
> 
> > On Wed, Jul 02, 2025 at 11:04:22AM -0400, Jamal Hadi Salim wrote:
> > 
> > > On Wed, Jul 2, 2025 at 10:12 AM Jamal Hadi Salim jhs@mojatatu.com wrote:
> > > 
> > > > On Tue, Jul 1, 2025 at 9:57 PM Cong Wang xiyou.wangcong@gmail.com wrote:
> > > > 
> > > > > On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > > > > 
> > > > > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > > > > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > > > > > --- a/net/sched/sch_netem.c
> > > > > > +++ b/net/sched/sch_netem.c
> > > > > > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > > > > > skb->prev = NULL;
> > > > > > 
> > > > > > /* Random duplication */
> > > > > > - if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > > > > > + if (tc_skb_cb(skb)->duplicate &&
> > > > > 
> > > > > Oops, this is clearly should be !duplicate... It was lost during my
> > > > > stupid copy-n-paste... Sorry for this mistake.
> > > > 
> > > > I understood you earlier, Cong. My view still stands:
> > > > You are adding logic to a common data structure for a use case that
> > 
> > You are exaggerating this. I only added 1 bit to the core data structure,
> > the code logic remains in the netem, so it is contained within netem.
> 
> 
> Try it out ;->
> 
> Here's an even simpler setup:
> 
> sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0
> sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> netem_bug_test.o sec classifier/pass classid 1:1
> sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> then:
> ping -c 1 127.0.0.1
> 
> Note: there are other issues as well but i thought citing the ebpf one
> was sufficient to get the point across.
> 
> > > > really makes no sense. The ROI is not good.
> > 
> > Speaking of ROI, I think you need to look at the patch stats:
> > 
> > William/Your patch:
> > 1 file changed, 40 insertions(+)
> > 
> > My patch:
> > 2 files changed, 4 insertions(+), 4 deletions(-)
> 
> 
> ROI is not just about LOC. The consequences of a patch are also part
> of that formula. And let's not forget the time spent so far debating
> instead of plugging the hole.
> 
> > > > BTW: I am almost certain you will hit other issues when this goes out
> > > > or when you actually start to test and then you will have to fix more
> > > > spots.
> > > 
> > > Here's an example that breaks it:
> > > 
> > > sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> > > 0 0 0 0 0 0 0 0 0 0 0
> > > sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> > > netem_bug_test.o sec classifier/pass classid 1:1
> > > sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> > > sudo tc qdisc add dev lo parent 10: handle 30: netem gap 1 limit 4
> > > duplicate 100% delay 1us reorder 100%
> > > 
> > > And the ping 127.0.0.1 -c 1
> > > I had to fix your patch for correctness (attached)
> > > 
> > > the ebpf prog is trivial - make it just return the classid or even zero.
> > 
> > Interesting, are you sure this works before my patch?
> > 
> > I don't intend to change any logic except closing the infinite loop. IOW,
> > if it didn't work before, I don't expect to make it work with this patch,
> > this patch merely fixes the infinite loop, which is sufficient as a bug fix.
> > Otherwise it would become a feature improvement. (Don't get me wrong, I
> > think this feature should be improved rather than simply forbidden, it just
> > belongs to a different patch.)
> 
> 
> A quick solution is what William had. I asked him to use ext_cb not
> because i think it is a better solution but just so we can move
> forward.
> Agree that for a longer term we need a more generic solution as discussed ...
> 
> cheers,
> jamal

The tc_skb_ext approach has a problem... the config option that enables it is NET_TC_SKB_EXT. I assumed this is a generic name for skb extensions in the tc subsystem, but unfortunately this is hardcoded for NET_CLS_ACT recirculation support.

So what this means is we have the following choices:
1. Make SCH_NETEM depend on NET_CLS_ACT and NET_TC_SKB_EXT
2. Add "|| IS_ENABLED(CONFIG_SCH_NETEM)" next to "IS_ENABLED(CONFIG_NET_TC_SKB_EXT)"
3. Separate NET_TC_SKB_EXT and the idea of recirculation support. But I'm not sure how people feel about renaming config options. And this would require a small change to the Mellanox driver subsystem.

None of these sound too nice to do, and I'm not sure which approach to take. In an ideal world, 3 would be best, but I'm not sure how others would feel about all that just to account for a netem edge case.

Best,
William

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-05 13:52           ` Jamal Hadi Salim
  2025-07-06 14:59             ` William Liu
@ 2025-07-07 19:40             ` Cong Wang
  2025-07-07 20:24               ` Jamal Hadi Salim
  1 sibling, 1 reply; 16+ messages in thread
From: Cong Wang @ 2025-07-07 19:40 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: netdev, will, stephen, Savino Dicanosa

On Sat, Jul 05, 2025 at 09:52:05AM -0400, Jamal Hadi Salim wrote:
> On Fri, Jul 4, 2025 at 8:48 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Wed, Jul 02, 2025 at 11:04:22AM -0400, Jamal Hadi Salim wrote:
> > > On Wed, Jul 2, 2025 at 10:12 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > > >
> > > > On Tue, Jul 1, 2025 at 9:57 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > > > > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > > > > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > > > > > --- a/net/sched/sch_netem.c
> > > > > > +++ b/net/sched/sch_netem.c
> > > > > > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > > > > >       skb->prev = NULL;
> > > > > >
> > > > > >       /* Random duplication */
> > > > > > -     if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > > > > > +     if (tc_skb_cb(skb)->duplicate &&
> > > > >
> > > > > Oops, this is clearly should be !duplicate... It was lost during my
> > > > > stupid copy-n-paste... Sorry for this mistake.
> > > > >
> > > >
> > > > I understood you earlier, Cong. My view still stands:
> > > > You are adding logic to a common data structure for a use case that
> >
> > You are exaggerating this. I only added 1 bit to the core data structure,
> > the code logic remains in the netem, so it is contained within netem.
> 
> Try it out ;->
> Here's an even simpler setup:
> 
> sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0
> sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> netem_bug_test.o sec classifier/pass classid 1:1
> sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> then:
> ping -c 1 127.0.0.1

Of course (I replaced your ebpf filter with matchall):

[root@localhost ~]# cat netem_from_jamal.sh
tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# tc filter add dev lo parent 1:0 protocol ip bpf obj netem_bug_test.o sec classifier/pass classid 1:1
tc filter add dev lo parent 1:0 protocol ip matchall classid 1:1
tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%

[root@localhost ~]# bash -x netem_from_jamal.sh
+ tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+ tc filter add dev lo parent 1:0 protocol ip matchall classid 1:1
+ tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
[root@localhost ~]# ping -c 1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=3.84 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 3.836/3.836/3.836/0.000 ms

There is clearly no soft lockup. Hence the original issue has been successfully fixed.

> 
> Note: there are other issues as well but i thought citing the ebpf one
> was sufficient to get the point across.

Please kindly define "issues" here. My definition for issue in this
context is the soft lockup issue reported by William. Like I already
explained, I have _no_ intention to solve any other issue than the one
reported by William, simply because they probably can be deferred to
-net-next.

> 
> >
> > > > really makes no sense. The ROI is not good.
> >
> > Speaking of ROI, I think you need to look at the patch stats:
> >
> > William/Your patch:
> >  1 file changed, 40 insertions(+)
> >
> > My patch:
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> >
> 
> ROI is not just about LOC. The consequences of a patch are also part
> of that formula. And let's not forget the time spent so far debating
> instead of plugging the hole.

LOC matters a lot for code review and maintainance.

> 
> >
> > > > BTW: I am almost certain you will hit other issues when this goes out
> > > > or when you actually start to test and then you will have to fix more
> > > > spots.
> > > >
> > > Here's an example that breaks it:
> > >
> > > sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> > > 0 0 0 0 0 0 0 0 0 0 0
> > > sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> > > netem_bug_test.o sec classifier/pass classid 1:1
> > > sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> > > sudo tc qdisc add dev lo parent 10: handle 30: netem gap 1 limit 4
> > > duplicate 100% delay 1us reorder 100%
> > >
> > > And the ping 127.0.0.1 -c 1
> > > I had to fix your patch for correctness (attached)
> > >
> > >
> > > the ebpf prog is trivial - make it just return the classid or even zero.
> >
> > Interesting, are you sure this works before my patch?
> >
> > I don't intend to change any logic except closing the infinite loop. IOW,
> > if it didn't work before, I don't expect to make it work with this patch,
> > this patch merely fixes the infinite loop, which is sufficient as a bug fix.
> > Otherwise it would become a feature improvement. (Don't get me wrong, I
> > think this feature should be improved rather than simply forbidden, it just
> > belongs to a different patch.)
> 
> A quick solution is what William had. I asked him to use ext_cb not
> because i think it is a better solution but just so we can move
> forward.

I already posted a patch, instead of just arguing. Now you are arguing
about the patch I posted...

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-07 19:40             ` Cong Wang
@ 2025-07-07 20:24               ` Jamal Hadi Salim
  0 siblings, 0 replies; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-07 20:24 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, will, stephen, Savino Dicanosa

On Mon, Jul 7, 2025 at 3:40 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Sat, Jul 05, 2025 at 09:52:05AM -0400, Jamal Hadi Salim wrote:
> > On Fri, Jul 4, 2025 at 8:48 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > >
> > > On Wed, Jul 02, 2025 at 11:04:22AM -0400, Jamal Hadi Salim wrote:
> > > > On Wed, Jul 2, 2025 at 10:12 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > > > >
> > > > > On Tue, Jul 1, 2025 at 9:57 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > > > > > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > > > > > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > > > > > > --- a/net/sched/sch_netem.c
> > > > > > > +++ b/net/sched/sch_netem.c
> > > > > > > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > > > > > >       skb->prev = NULL;
> > > > > > >
> > > > > > >       /* Random duplication */
> > > > > > > -     if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > > > > > > +     if (tc_skb_cb(skb)->duplicate &&
> > > > > >
> > > > > > Oops, this is clearly should be !duplicate... It was lost during my
> > > > > > stupid copy-n-paste... Sorry for this mistake.
> > > > > >
> > > > >
> > > > > I understood you earlier, Cong. My view still stands:
> > > > > You are adding logic to a common data structure for a use case that
> > >
> > > You are exaggerating this. I only added 1 bit to the core data structure,
> > > the code logic remains in the netem, so it is contained within netem.
> >
> > Try it out ;->
> > Here's an even simpler setup:
> >
> > sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0
> > sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> > netem_bug_test.o sec classifier/pass classid 1:1
> > sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> > then:
> > ping -c 1 127.0.0.1
>
> Of course (I replaced your ebpf filter with matchall):
>

Replacing ebpf with matchall is missing the core issue ;->
We cant just tell people "please dont create such a config using ebpf
classifier, use matchall instead"
Cong, just run the example and then look at the code.

> [root@localhost ~]# cat netem_from_jamal.sh
> tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> # tc filter add dev lo parent 1:0 protocol ip bpf obj netem_bug_test.o sec classifier/pass classid 1:1
> tc filter add dev lo parent 1:0 protocol ip matchall classid 1:1
> tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
>
> [root@localhost ~]# bash -x netem_from_jamal.sh
> + tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> + tc filter add dev lo parent 1:0 protocol ip matchall classid 1:1
> + tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> [root@localhost ~]# ping -c 1 127.0.0.1
> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=3.84 ms
>
> --- 127.0.0.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 3.836/3.836/3.836/0.000 ms
>
> There is clearly no soft lockup. Hence the original issue has been successfully fixed.
>
> >
> > Note: there are other issues as well but i thought citing the ebpf one
> > was sufficient to get the point across.
>
> Please kindly define "issues" here. My definition for issue in this
> context is the soft lockup issue reported by William. Like I already
> explained, I have _no_ intention to solve any other issue than the one
> reported by William, simply because they probably can be deferred to
> -net-next.

But you cant just introduce new issues to solve an existing one. Take
a look: your cb values will be overwritten by ebpf.

cheers,
jamal

> > >
> > > > > really makes no sense. The ROI is not good.
> > >
> > > Speaking of ROI, I think you need to look at the patch stats:
> > >
> > > William/Your patch:
> > >  1 file changed, 40 insertions(+)
> > >
> > > My patch:
> > >  2 files changed, 4 insertions(+), 4 deletions(-)
> > >
> >
> > ROI is not just about LOC. The consequences of a patch are also part
> > of that formula. And let's not forget the time spent so far debating
> > instead of plugging the hole.
>
> LOC matters a lot for code review and maintainance.
>
> >
> > >
> > > > > BTW: I am almost certain you will hit other issues when this goes out
> > > > > or when you actually start to test and then you will have to fix more
> > > > > spots.
> > > > >
> > > > Here's an example that breaks it:
> > > >
> > > > sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> > > > 0 0 0 0 0 0 0 0 0 0 0
> > > > sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> > > > netem_bug_test.o sec classifier/pass classid 1:1
> > > > sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> > > > sudo tc qdisc add dev lo parent 10: handle 30: netem gap 1 limit 4
> > > > duplicate 100% delay 1us reorder 100%
> > > >
> > > > And the ping 127.0.0.1 -c 1
> > > > I had to fix your patch for correctness (attached)
> > > >
> > > >
> > > > the ebpf prog is trivial - make it just return the classid or even zero.
> > >
> > > Interesting, are you sure this works before my patch?
> > >
> > > I don't intend to change any logic except closing the infinite loop. IOW,
> > > if it didn't work before, I don't expect to make it work with this patch,
> > > this patch merely fixes the infinite loop, which is sufficient as a bug fix.
> > > Otherwise it would become a feature improvement. (Don't get me wrong, I
> > > think this feature should be improved rather than simply forbidden, it just
> > > belongs to a different patch.)
> >
> > A quick solution is what William had. I asked him to use ext_cb not
> > because i think it is a better solution but just so we can move
> > forward.
>
> I already posted a patch, instead of just arguing. Now you are arguing
> about the patch I posted...
>
> Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-06 14:59             ` William Liu
@ 2025-07-07 20:49               ` Jamal Hadi Salim
  2025-07-07 21:26                 ` Jakub Kicinski
  0 siblings, 1 reply; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-07 20:49 UTC (permalink / raw)
  To: William Liu; +Cc: Cong Wang, netdev, stephen, Savino Dicanosa

On Sun, Jul 6, 2025 at 10:59 AM William Liu <will@willsroot.io> wrote:
>
> On Saturday, July 5th, 2025 at 1:52 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> >
> >
> > On Fri, Jul 4, 2025 at 8:48 PM Cong Wang xiyou.wangcong@gmail.com wrote:
> >
> > > On Wed, Jul 02, 2025 at 11:04:22AM -0400, Jamal Hadi Salim wrote:
> > >
> > > > On Wed, Jul 2, 2025 at 10:12 AM Jamal Hadi Salim jhs@mojatatu.com wrote:
> > > >
> > > > > On Tue, Jul 1, 2025 at 9:57 PM Cong Wang xiyou.wangcong@gmail.com wrote:
> > > > >
> > > > > > On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > > > > >
> > > > > > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > > > > > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > > > > > > --- a/net/sched/sch_netem.c
> > > > > > > +++ b/net/sched/sch_netem.c
> > > > > > > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > > > > > > skb->prev = NULL;
> > > > > > >
> > > > > > > /* Random duplication */
> > > > > > > - if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > > > > > > + if (tc_skb_cb(skb)->duplicate &&
> > > > > >
> > > > > > Oops, this is clearly should be !duplicate... It was lost during my
> > > > > > stupid copy-n-paste... Sorry for this mistake.
> > > > >
> > > > > I understood you earlier, Cong. My view still stands:
> > > > > You are adding logic to a common data structure for a use case that
> > >
> > > You are exaggerating this. I only added 1 bit to the core data structure,
> > > the code logic remains in the netem, so it is contained within netem.
> >
> >
> > Try it out ;->
> >
> > Here's an even simpler setup:
> >
> > sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0
> > sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> > netem_bug_test.o sec classifier/pass classid 1:1
> > sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> > then:
> > ping -c 1 127.0.0.1
> >
> > Note: there are other issues as well but i thought citing the ebpf one
> > was sufficient to get the point across.
> >
> > > > > really makes no sense. The ROI is not good.
> > >
> > > Speaking of ROI, I think you need to look at the patch stats:
> > >
> > > William/Your patch:
> > > 1 file changed, 40 insertions(+)
> > >
> > > My patch:
> > > 2 files changed, 4 insertions(+), 4 deletions(-)
> >
> >
> > ROI is not just about LOC. The consequences of a patch are also part
> > of that formula. And let's not forget the time spent so far debating
> > instead of plugging the hole.
> >
> > > > > BTW: I am almost certain you will hit other issues when this goes out
> > > > > or when you actually start to test and then you will have to fix more
> > > > > spots.
> > > >
> > > > Here's an example that breaks it:
> > > >
> > > > sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0
> > > > 0 0 0 0 0 0 0 0 0 0 0
> > > > sudo tc filter add dev lo parent 1:0 protocol ip bpf obj
> > > > netem_bug_test.o sec classifier/pass classid 1:1
> > > > sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> > > > sudo tc qdisc add dev lo parent 10: handle 30: netem gap 1 limit 4
> > > > duplicate 100% delay 1us reorder 100%
> > > >
> > > > And the ping 127.0.0.1 -c 1
> > > > I had to fix your patch for correctness (attached)
> > > >
> > > > the ebpf prog is trivial - make it just return the classid or even zero.
> > >
> > > Interesting, are you sure this works before my patch?
> > >
> > > I don't intend to change any logic except closing the infinite loop. IOW,
> > > if it didn't work before, I don't expect to make it work with this patch,
> > > this patch merely fixes the infinite loop, which is sufficient as a bug fix.
> > > Otherwise it would become a feature improvement. (Don't get me wrong, I
> > > think this feature should be improved rather than simply forbidden, it just
> > > belongs to a different patch.)
> >
> >
> > A quick solution is what William had. I asked him to use ext_cb not
> > because i think it is a better solution but just so we can move
> > forward.
> > Agree that for a longer term we need a more generic solution as discussed ...
> >
> > cheers,
> > jamal
>
> The tc_skb_ext approach has a problem... the config option that enables it is NET_TC_SKB_EXT. I assumed this is a generic name for skb extensions in the tc subsystem, but unfortunately this is hardcoded for NET_CLS_ACT recirculation support.
>
> So what this means is we have the following choices:
> 1. Make SCH_NETEM depend on NET_CLS_ACT and NET_TC_SKB_EXT
> 2. Add "|| IS_ENABLED(CONFIG_SCH_NETEM)" next to "IS_ENABLED(CONFIG_NET_TC_SKB_EXT)"
> 3. Separate NET_TC_SKB_EXT and the idea of recirculation support. But I'm not sure how people feel about renaming config options. And this would require a small change to the Mellanox driver subsystem.
>
> None of these sound too nice to do, and I'm not sure which approach to take. In an ideal world, 3 would be best, but I'm not sure how others would feel about all that just to account for a netem edge case.
>

I think you should just create a new field/type, add it here:
include/linux/skbuff.h around line 4814 and make netem just select
CONFIG_SKB_EXTENSIONS kconfig
It's not the best solution but we are grasping for straws at this point.

cheers,
jamal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-07 20:49               ` Jamal Hadi Salim
@ 2025-07-07 21:26                 ` Jakub Kicinski
  2025-07-08 13:18                   ` Jamal Hadi Salim
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2025-07-07 21:26 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: William Liu, Cong Wang, netdev, stephen, Savino Dicanosa

On Mon, 7 Jul 2025 16:49:46 -0400 Jamal Hadi Salim wrote:
> > The tc_skb_ext approach has a problem... the config option that
> > enables it is NET_TC_SKB_EXT. I assumed this is a generic name for
> > skb extensions in the tc subsystem, but unfortunately this is
> > hardcoded for NET_CLS_ACT recirculation support.
> >
> > So what this means is we have the following choices:
> > 1. Make SCH_NETEM depend on NET_CLS_ACT and NET_TC_SKB_EXT
> > 2. Add "|| IS_ENABLED(CONFIG_SCH_NETEM)" next to
> > "IS_ENABLED(CONFIG_NET_TC_SKB_EXT)" 3. Separate NET_TC_SKB_EXT and
> > the idea of recirculation support. But I'm not sure how people feel
> > about renaming config options. And this would require a small
> > change to the Mellanox driver subsystem.
> >
> > None of these sound too nice to do, and I'm not sure which approach
> > to take. In an ideal world, 3 would be best, but I'm not sure how
> > others would feel about all that just to account for a netem edge
> > case. 
> 
> I think you should just create a new field/type, add it here:
> include/linux/skbuff.h around line 4814 and make netem just select
> CONFIG_SKB_EXTENSIONS kconfig
> It's not the best solution but we are grasping for straws at this
> point.

Did someone report a real user of nested duplication?

Let's go ahead with the patch preventing such configurations and worry
about supporting them IIF someone actually asks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops
  2025-07-07 21:26                 ` Jakub Kicinski
@ 2025-07-08 13:18                   ` Jamal Hadi Salim
  0 siblings, 0 replies; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-08 13:18 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: William Liu, Cong Wang, netdev, stephen, Savino Dicanosa

On Mon, Jul 7, 2025 at 5:26 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 7 Jul 2025 16:49:46 -0400 Jamal Hadi Salim wrote:
> > > The tc_skb_ext approach has a problem... the config option that
> > > enables it is NET_TC_SKB_EXT. I assumed this is a generic name for
> > > skb extensions in the tc subsystem, but unfortunately this is
> > > hardcoded for NET_CLS_ACT recirculation support.
> > >
> > > So what this means is we have the following choices:
> > > 1. Make SCH_NETEM depend on NET_CLS_ACT and NET_TC_SKB_EXT
> > > 2. Add "|| IS_ENABLED(CONFIG_SCH_NETEM)" next to
> > > "IS_ENABLED(CONFIG_NET_TC_SKB_EXT)" 3. Separate NET_TC_SKB_EXT and
> > > the idea of recirculation support. But I'm not sure how people feel
> > > about renaming config options. And this would require a small
> > > change to the Mellanox driver subsystem.
> > >
> > > None of these sound too nice to do, and I'm not sure which approach
> > > to take. In an ideal world, 3 would be best, but I'm not sure how
> > > others would feel about all that just to account for a netem edge
> > > case.
> >
> > I think you should just create a new field/type, add it here:
> > include/linux/skbuff.h around line 4814 and make netem just select
> > CONFIG_SKB_EXTENSIONS kconfig
> > It's not the best solution but we are grasping for straws at this
> > point.
>
> Did someone report a real user of nested duplication?
>

None i have seen - regardless, if there was one it would be a strange use case.

> Let's go ahead with the patch preventing such configurations and worry
> about supporting them IIF someone actually asks.

That was my view as well.

cheers,
jamal

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-07-08 13:18 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-01 23:13 [Patch net 0/2] netem: Fix skb duplication logic to prevent infinite loops Cong Wang
2025-07-01 23:13 ` [Patch net 1/2] " Cong Wang
2025-07-02  1:57   ` Cong Wang
2025-07-02 14:12     ` Jamal Hadi Salim
2025-07-02 15:04       ` Jamal Hadi Salim
2025-07-02 15:06         ` Jamal Hadi Salim
2025-07-02 15:20           ` William Liu
2025-07-05  0:48         ` Cong Wang
2025-07-05 13:52           ` Jamal Hadi Salim
2025-07-06 14:59             ` William Liu
2025-07-07 20:49               ` Jamal Hadi Salim
2025-07-07 21:26                 ` Jakub Kicinski
2025-07-08 13:18                   ` Jamal Hadi Salim
2025-07-07 19:40             ` Cong Wang
2025-07-07 20:24               ` Jamal Hadi Salim
2025-07-01 23:13 ` [Patch net 2/2] selftests/tc-testing: Add a nested netem duplicate test Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).