netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
@ 2025-07-08 16:43 William Liu
  2025-07-08 16:44 ` [PATCH net v5 2/2] selftests/tc-testing: Add tests for restrictions on netem duplication William Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: William Liu @ 2025-07-08 16:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, victor, pctammela, pabeni, kuba, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms, William Liu

netem_enqueue's duplication prevention logic breaks when a netem
resides in a qdisc tree with other netems - this can lead to a
soft lockup and OOM loop in netem_dequeue, as seen in [1].
Ensure that a duplicating netem cannot exist in a tree with other
netems.

Previous approaches suggested in discussions in chronological order:

1) Track duplication status or ttl in the sk_buff struct. Considered
too specific a use case to extend such a struct, though this would
be a resilient fix and address other previous and potential future
DOS bugs like the one described in loopy fun [2].

2) Restrict netem_enqueue recursion depth like in act_mirred with a
per cpu variable. However, netem_dequeue can call enqueue on its
child, and the depth restriction could be bypassed if the child is a
netem.

3) Use the same approach as in 2, but add metadata in netem_skb_cb
to handle the netem_dequeue case and track a packet's involvement
in duplication. This is an overly complex approach, and Jamal
notes that the skb cb can be overwritten to circumvent this
safeguard.

4) Prevent the addition of a netem to a qdisc tree if its ancestral
path contains a netem. However, filters and actions can cause a
packet to change paths when re-enqueued to the root from netem
duplication, leading us to the current solution: prevent a
duplicating netem from inhabiting the same tree as other netems.

[1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/
[2] https://lwn.net/Articles/719297/

Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Signed-off-by: William Liu <will@willsroot.io>
Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io>
---
v4 -> v5: no changes, reposting per Jakub's request
v3 -> v4:
  - Clarify changelog with chronological order of attempted solutions
v2 -> v3:
  - Clarify reasoning for approach in changelog
  - Removed has_duplication
v1 -> v2:
  - Renamed only_duplicating to duplicates and invert logic for clarity
---
 net/sched/sch_netem.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index fdd79d3ccd8c..eafc316ae319 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -973,6 +973,41 @@ static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
 	return 0;
 }
 
+static const struct Qdisc_class_ops netem_class_ops;
+
+static int check_netem_in_tree(struct Qdisc *sch, bool duplicates,
+			       struct netlink_ext_ack *extack)
+{
+	struct Qdisc *root, *q;
+	unsigned int i;
+
+	root = qdisc_root_sleeping(sch);
+
+	if (sch != root && root->ops->cl_ops == &netem_class_ops) {
+		if (duplicates ||
+		    ((struct netem_sched_data *)qdisc_priv(root))->duplicate)
+			goto err;
+	}
+
+	if (!qdisc_dev(root))
+		return 0;
+
+	hash_for_each(qdisc_dev(root)->qdisc_hash, i, q, hash) {
+		if (sch != q && q->ops->cl_ops == &netem_class_ops) {
+			if (duplicates ||
+			    ((struct netem_sched_data *)qdisc_priv(q))->duplicate)
+				goto err;
+		}
+	}
+
+	return 0;
+
+err:
+	NL_SET_ERR_MSG(extack,
+		       "netem: cannot mix duplicating netems with other netems in tree");
+	return -EINVAL;
+}
+
 /* Parse netlink message to set options */
 static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 			struct netlink_ext_ack *extack)
@@ -1031,6 +1066,11 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
 	q->gap = qopt->gap;
 	q->counter = 0;
 	q->loss = qopt->loss;
+
+	ret = check_netem_in_tree(sch, qopt->duplicate, extack);
+	if (ret)
+		goto unlock;
+
 	q->duplicate = qopt->duplicate;
 
 	/* for compatibility with earlier versions.
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net v5 2/2] selftests/tc-testing: Add tests for restrictions on netem duplication
  2025-07-08 16:43 [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree William Liu
@ 2025-07-08 16:44 ` William Liu
  2025-07-08 18:50   ` Jamal Hadi Salim
  2025-07-08 18:50 ` [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree Jamal Hadi Salim
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: William Liu @ 2025-07-08 16:44 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, victor, pctammela, pabeni, kuba, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms, William Liu

Ensure that a duplicating netem cannot exist in a tree with other netems
in both qdisc addition and change. This is meant to prevent the soft
lockup and OOM loop scenario discussed in [1]. Also adjust a HFSC's
re-entrancy test case with netem for this new restriction - KASAN
still triggers upon its failure.

[1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/

Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
---
v1 -> v2:
  - Fixed existing test case for new restrictions
  - Add test cases
  - Use dummy instead of lo
---
 .../tc-testing/tc-tests/infra/qdiscs.json     |  5 +-
 .../tc-testing/tc-tests/qdiscs/netem.json     | 81 +++++++++++++++++++
 2 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
index 9aa44d8176d9..9acc88297484 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
@@ -478,7 +478,6 @@
             "$TC qdisc add dev $DUMMY parent 1:1 handle 2:0 netem duplicate 100%",
             "$TC filter add dev $DUMMY parent 1:0 protocol ip prio 1 u32 match ip dst 10.10.10.1/32 flowid 1:1",
             "$TC class add dev $DUMMY parent 1:0 classid 1:2 hfsc ls m2 10Mbit",
-            "$TC qdisc add dev $DUMMY parent 1:2 handle 3:0 netem duplicate 100%",
             "$TC filter add dev $DUMMY parent 1:0 protocol ip prio 2 u32 match ip dst 10.10.10.2/32 flowid 1:2",
             "ping -c 1 10.10.10.1 -I$DUMMY > /dev/null || true",
             "$TC filter del dev $DUMMY parent 1:0 protocol ip prio 1",
@@ -491,8 +490,8 @@
             {
                 "kind": "hfsc",
                 "handle": "1:",
-                "bytes": 392,
-                "packets": 4
+                "bytes": 294,
+                "packets": 3
             }
         ],
         "matchCount": "1",
diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json
index 3c4444961488..718d2df2aafa 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json
@@ -336,5 +336,86 @@
         "teardown": [
             "$TC qdisc del dev $DUMMY handle 1: root"
         ]
+    },
+    {
+        "id": "d34d",
+        "name": "NETEM test qdisc duplication restriction in qdisc tree in netem_change root",
+        "category": ["qdisc", "netem"],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DUMMY root handle 1: netem limit 1",
+            "$TC qdisc add dev $DUMMY parent 1: handle 2: netem limit 1"
+        ],
+        "cmdUnderTest": "$TC qdisc change dev $DUMMY handle 1: netem duplicate 50%",
+        "expExitCode": "2",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "qdisc netem",
+        "matchCount": "2",
+        "teardown": [
+            "$TC qdisc del dev $DUMMY handle 1:0 root"
+        ]
+    },
+    {
+        "id": "b33f",
+        "name": "NETEM test qdisc duplication restriction in qdisc tree in netem_change non-root",
+        "category": ["qdisc", "netem"],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DUMMY root handle 1: netem limit 1",
+            "$TC qdisc add dev $DUMMY parent 1: handle 2: netem limit 1"
+        ],
+        "cmdUnderTest": "$TC qdisc change dev $DUMMY handle 2: netem duplicate 50%",
+        "expExitCode": "2",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "qdisc netem",
+        "matchCount": "2",
+        "teardown": [
+            "$TC qdisc del dev $DUMMY handle 1:0 root"
+        ]
+    },
+    {
+        "id": "cafe",
+        "name": "NETEM test qdisc duplication restriction in qdisc tree",
+        "category": ["qdisc", "netem"],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DUMMY root handle 1: netem limit 1 duplicate 100%"
+        ],
+        "cmdUnderTest": "$TC qdisc add dev $DUMMY parent 1: handle 2: netem duplicate 100%",
+        "expExitCode": "2",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "qdisc netem",
+        "matchCount": "1",
+        "teardown": [
+            "$TC qdisc del dev $DUMMY handle 1:0 root"
+        ]
+    },
+    {
+        "id": "1337",
+        "name": "NETEM test qdisc duplication restriction in qdisc tree across branches",
+        "category": ["qdisc", "netem"],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DUMMY parent root handle 1:0 hfsc",
+            "$TC class add dev $DUMMY parent 1:0 classid 1:1 hfsc rt m2 10Mbit",
+            "$TC qdisc add dev $DUMMY parent 1:1 handle 2:0 netem",
+            "$TC class add dev $DUMMY parent 1:0 classid 1:2 hfsc rt m2 10Mbit"
+        ],
+        "cmdUnderTest": "$TC qdisc add dev $DUMMY parent 1:2 handle 3:0 netem duplicate 100%",
+        "expExitCode": "2",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "qdisc netem",
+        "matchCount": "1",
+        "teardown": [
+            "$TC qdisc del dev $DUMMY handle 1:0 root"
+        ]
     }
 ]
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
  2025-07-08 16:43 [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree William Liu
  2025-07-08 16:44 ` [PATCH net v5 2/2] selftests/tc-testing: Add tests for restrictions on netem duplication William Liu
@ 2025-07-08 18:50 ` Jamal Hadi Salim
  2025-07-08 19:42 ` This breaks netem use cases Cong Wang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-08 18:50 UTC (permalink / raw)
  To: William Liu
  Cc: netdev, xiyou.wangcong, victor, pctammela, pabeni, kuba, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms

On Tue, Jul 8, 2025 at 12:43 PM William Liu <will@willsroot.io> wrote:
>
> netem_enqueue's duplication prevention logic breaks when a netem
> resides in a qdisc tree with other netems - this can lead to a
> soft lockup and OOM loop in netem_dequeue, as seen in [1].
> Ensure that a duplicating netem cannot exist in a tree with other
> netems.
>
> Previous approaches suggested in discussions in chronological order:
>
> 1) Track duplication status or ttl in the sk_buff struct. Considered
> too specific a use case to extend such a struct, though this would
> be a resilient fix and address other previous and potential future
> DOS bugs like the one described in loopy fun [2].
>
> 2) Restrict netem_enqueue recursion depth like in act_mirred with a
> per cpu variable. However, netem_dequeue can call enqueue on its
> child, and the depth restriction could be bypassed if the child is a
> netem.
>
> 3) Use the same approach as in 2, but add metadata in netem_skb_cb
> to handle the netem_dequeue case and track a packet's involvement
> in duplication. This is an overly complex approach, and Jamal
> notes that the skb cb can be overwritten to circumvent this
> safeguard.
>
> 4) Prevent the addition of a netem to a qdisc tree if its ancestral
> path contains a netem. However, filters and actions can cause a
> packet to change paths when re-enqueued to the root from netem
> duplication, leading us to the current solution: prevent a
> duplicating netem from inhabiting the same tree as other netems.
>
> [1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/
> [2] https://lwn.net/Articles/719297/
>
> Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
> Reported-by: William Liu <will@willsroot.io>
> Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
> Signed-off-by: William Liu <will@willsroot.io>
> Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

> ---
> v4 -> v5: no changes, reposting per Jakub's request
> v3 -> v4:
>   - Clarify changelog with chronological order of attempted solutions
> v2 -> v3:
>   - Clarify reasoning for approach in changelog
>   - Removed has_duplication
> v1 -> v2:
>   - Renamed only_duplicating to duplicates and invert logic for clarity
> ---
>  net/sched/sch_netem.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index fdd79d3ccd8c..eafc316ae319 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -973,6 +973,41 @@ static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
>         return 0;
>  }
>
> +static const struct Qdisc_class_ops netem_class_ops;
> +
> +static int check_netem_in_tree(struct Qdisc *sch, bool duplicates,
> +                              struct netlink_ext_ack *extack)
> +{
> +       struct Qdisc *root, *q;
> +       unsigned int i;
> +
> +       root = qdisc_root_sleeping(sch);
> +
> +       if (sch != root && root->ops->cl_ops == &netem_class_ops) {
> +               if (duplicates ||
> +                   ((struct netem_sched_data *)qdisc_priv(root))->duplicate)
> +                       goto err;
> +       }
> +
> +       if (!qdisc_dev(root))
> +               return 0;
> +
> +       hash_for_each(qdisc_dev(root)->qdisc_hash, i, q, hash) {
> +               if (sch != q && q->ops->cl_ops == &netem_class_ops) {
> +                       if (duplicates ||
> +                           ((struct netem_sched_data *)qdisc_priv(q))->duplicate)
> +                               goto err;
> +               }
> +       }
> +
> +       return 0;
> +
> +err:
> +       NL_SET_ERR_MSG(extack,
> +                      "netem: cannot mix duplicating netems with other netems in tree");
> +       return -EINVAL;
> +}
> +
>  /* Parse netlink message to set options */
>  static int netem_change(struct Qdisc *sch, struct nlattr *opt,
>                         struct netlink_ext_ack *extack)
> @@ -1031,6 +1066,11 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
>         q->gap = qopt->gap;
>         q->counter = 0;
>         q->loss = qopt->loss;
> +
> +       ret = check_netem_in_tree(sch, qopt->duplicate, extack);
> +       if (ret)
> +               goto unlock;
> +
>         q->duplicate = qopt->duplicate;
>
>         /* for compatibility with earlier versions.
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net v5 2/2] selftests/tc-testing: Add tests for restrictions on netem duplication
  2025-07-08 16:44 ` [PATCH net v5 2/2] selftests/tc-testing: Add tests for restrictions on netem duplication William Liu
@ 2025-07-08 18:50   ` Jamal Hadi Salim
  0 siblings, 0 replies; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-08 18:50 UTC (permalink / raw)
  To: William Liu
  Cc: netdev, xiyou.wangcong, victor, pctammela, pabeni, kuba, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms

On Tue, Jul 8, 2025 at 12:44 PM William Liu <will@willsroot.io> wrote:
>
> Ensure that a duplicating netem cannot exist in a tree with other netems
> in both qdisc addition and change. This is meant to prevent the soft
> lockup and OOM loop scenario discussed in [1]. Also adjust a HFSC's
> re-entrancy test case with netem for this new restriction - KASAN
> still triggers upon its failure.
>
> [1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/
>
> Signed-off-by: William Liu <will@willsroot.io>
> Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

> v1 -> v2:
>   - Fixed existing test case for new restrictions
>   - Add test cases
>   - Use dummy instead of lo
> ---
>  .../tc-testing/tc-tests/infra/qdiscs.json     |  5 +-
>  .../tc-testing/tc-tests/qdiscs/netem.json     | 81 +++++++++++++++++++
>  2 files changed, 83 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
> index 9aa44d8176d9..9acc88297484 100644
> --- a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
> +++ b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
> @@ -478,7 +478,6 @@
>              "$TC qdisc add dev $DUMMY parent 1:1 handle 2:0 netem duplicate 100%",
>              "$TC filter add dev $DUMMY parent 1:0 protocol ip prio 1 u32 match ip dst 10.10.10.1/32 flowid 1:1",
>              "$TC class add dev $DUMMY parent 1:0 classid 1:2 hfsc ls m2 10Mbit",
> -            "$TC qdisc add dev $DUMMY parent 1:2 handle 3:0 netem duplicate 100%",
>              "$TC filter add dev $DUMMY parent 1:0 protocol ip prio 2 u32 match ip dst 10.10.10.2/32 flowid 1:2",
>              "ping -c 1 10.10.10.1 -I$DUMMY > /dev/null || true",
>              "$TC filter del dev $DUMMY parent 1:0 protocol ip prio 1",
> @@ -491,8 +490,8 @@
>              {
>                  "kind": "hfsc",
>                  "handle": "1:",
> -                "bytes": 392,
> -                "packets": 4
> +                "bytes": 294,
> +                "packets": 3
>              }
>          ],
>          "matchCount": "1",
> diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json
> index 3c4444961488..718d2df2aafa 100644
> --- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json
> +++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/netem.json
> @@ -336,5 +336,86 @@
>          "teardown": [
>              "$TC qdisc del dev $DUMMY handle 1: root"
>          ]
> +    },
> +    {
> +        "id": "d34d",
> +        "name": "NETEM test qdisc duplication restriction in qdisc tree in netem_change root",
> +        "category": ["qdisc", "netem"],
> +        "plugins": {
> +            "requires": "nsPlugin"
> +        },
> +        "setup": [
> +            "$TC qdisc add dev $DUMMY root handle 1: netem limit 1",
> +            "$TC qdisc add dev $DUMMY parent 1: handle 2: netem limit 1"
> +        ],
> +        "cmdUnderTest": "$TC qdisc change dev $DUMMY handle 1: netem duplicate 50%",
> +        "expExitCode": "2",
> +        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
> +        "matchPattern": "qdisc netem",
> +        "matchCount": "2",
> +        "teardown": [
> +            "$TC qdisc del dev $DUMMY handle 1:0 root"
> +        ]
> +    },
> +    {
> +        "id": "b33f",
> +        "name": "NETEM test qdisc duplication restriction in qdisc tree in netem_change non-root",
> +        "category": ["qdisc", "netem"],
> +        "plugins": {
> +            "requires": "nsPlugin"
> +        },
> +        "setup": [
> +            "$TC qdisc add dev $DUMMY root handle 1: netem limit 1",
> +            "$TC qdisc add dev $DUMMY parent 1: handle 2: netem limit 1"
> +        ],
> +        "cmdUnderTest": "$TC qdisc change dev $DUMMY handle 2: netem duplicate 50%",
> +        "expExitCode": "2",
> +        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
> +        "matchPattern": "qdisc netem",
> +        "matchCount": "2",
> +        "teardown": [
> +            "$TC qdisc del dev $DUMMY handle 1:0 root"
> +        ]
> +    },
> +    {
> +        "id": "cafe",
> +        "name": "NETEM test qdisc duplication restriction in qdisc tree",
> +        "category": ["qdisc", "netem"],
> +        "plugins": {
> +            "requires": "nsPlugin"
> +        },
> +        "setup": [
> +            "$TC qdisc add dev $DUMMY root handle 1: netem limit 1 duplicate 100%"
> +        ],
> +        "cmdUnderTest": "$TC qdisc add dev $DUMMY parent 1: handle 2: netem duplicate 100%",
> +        "expExitCode": "2",
> +        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
> +        "matchPattern": "qdisc netem",
> +        "matchCount": "1",
> +        "teardown": [
> +            "$TC qdisc del dev $DUMMY handle 1:0 root"
> +        ]
> +    },
> +    {
> +        "id": "1337",
> +        "name": "NETEM test qdisc duplication restriction in qdisc tree across branches",
> +        "category": ["qdisc", "netem"],
> +        "plugins": {
> +            "requires": "nsPlugin"
> +        },
> +        "setup": [
> +            "$TC qdisc add dev $DUMMY parent root handle 1:0 hfsc",
> +            "$TC class add dev $DUMMY parent 1:0 classid 1:1 hfsc rt m2 10Mbit",
> +            "$TC qdisc add dev $DUMMY parent 1:1 handle 2:0 netem",
> +            "$TC class add dev $DUMMY parent 1:0 classid 1:2 hfsc rt m2 10Mbit"
> +        ],
> +        "cmdUnderTest": "$TC qdisc add dev $DUMMY parent 1:2 handle 3:0 netem duplicate 100%",
> +        "expExitCode": "2",
> +        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
> +        "matchPattern": "qdisc netem",
> +        "matchCount": "1",
> +        "teardown": [
> +            "$TC qdisc del dev $DUMMY handle 1:0 root"
> +        ]
>      }
>  ]
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* This breaks netem use cases
  2025-07-08 16:43 [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree William Liu
  2025-07-08 16:44 ` [PATCH net v5 2/2] selftests/tc-testing: Add tests for restrictions on netem duplication William Liu
  2025-07-08 18:50 ` [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree Jamal Hadi Salim
@ 2025-07-08 19:42 ` Cong Wang
  2025-07-08 20:35   ` Jamal Hadi Salim
  2025-07-10  8:26   ` Paolo Abeni
  2025-07-11 22:55 ` [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree Jakub Kicinski
  2025-07-11 23:00 ` patchwork-bot+netdevbpf
  4 siblings, 2 replies; 16+ messages in thread
From: Cong Wang @ 2025-07-08 19:42 UTC (permalink / raw)
  To: William Liu
  Cc: netdev, jhs, victor, pctammela, pabeni, kuba, stephen, dcaratti,
	savy, jiri, davem, edumazet, horms, linux-kernel

(Cc LKML for more audience, since this clearly breaks potentially useful
use cases)

On Tue, Jul 08, 2025 at 04:43:26PM +0000, William Liu wrote:
> netem_enqueue's duplication prevention logic breaks when a netem
> resides in a qdisc tree with other netems - this can lead to a
> soft lockup and OOM loop in netem_dequeue, as seen in [1].
> Ensure that a duplicating netem cannot exist in a tree with other
> netems.

As I already warned in your previous patchset, this breaks the following
potentially useful use case:

sudo tc qdisc add dev eth0 root handle 1: mq
sudo tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 100%
sudo tc qdisc add dev eth0 parent 1:2 handle 20: netem duplicate 100%

I don't see any logical problem of such use case, therefore we should
consider it as valid, we can't break it.

> 
> Previous approaches suggested in discussions in chronological order:
> 
> 1) Track duplication status or ttl in the sk_buff struct. Considered
> too specific a use case to extend such a struct, though this would
> be a resilient fix and address other previous and potential future
> DOS bugs like the one described in loopy fun [2].

The link you provid is from 8 years ago, since then the redirection
logic has been improved. I am not sure why it helps to justify your
refusal of this approach. 

I also strongly disagree with "too specific a use case to extend such
a struct", we simply have so many use-case-specific fields within
sk_buff->cb. For example, the tc_skb_cb->zone is very specific
for act_ct.

skb->cb is precisely designed to be use-case-specific and layer-specific.

None of the above points stands.

> 
> 2) Restrict netem_enqueue recursion depth like in act_mirred with a
> per cpu variable. However, netem_dequeue can call enqueue on its
> child, and the depth restriction could be bypassed if the child is a
> netem.
> 
> 3) Use the same approach as in 2, but add metadata in netem_skb_cb
> to handle the netem_dequeue case and track a packet's involvement
> in duplication. This is an overly complex approach, and Jamal
> notes that the skb cb can be overwritten to circumvent this
> safeguard.

This is not true, except qdisc_skb_cb(skb)->data, other area of
skb->cb is preserved within Qdisc layer.

Based on the above reasoning, this is clearly no way to go:

NACK-by: Cong Wang <xiyou.wangcong@gmail.com>

Sorry for standing firmly for the users, we simply don't break use
cases. This is nothing personal, just a firm principle.

Please let me know if there is anything else I can help you with. I am
always ready to help (but not in a way of breaking use cases).

Thanks for your understanding!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: This breaks netem use cases
  2025-07-08 19:42 ` This breaks netem use cases Cong Wang
@ 2025-07-08 20:35   ` Jamal Hadi Salim
  2025-07-08 21:32     ` Cong Wang
  2025-07-10  8:26   ` Paolo Abeni
  1 sibling, 1 reply; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-08 20:35 UTC (permalink / raw)
  To: Cong Wang
  Cc: William Liu, netdev, victor, pctammela, pabeni, kuba, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms, linux-kernel

On Tue, Jul 8, 2025 at 3:42 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> (Cc LKML for more audience, since this clearly breaks potentially useful
> use cases)
>
> On Tue, Jul 08, 2025 at 04:43:26PM +0000, William Liu wrote:
> > netem_enqueue's duplication prevention logic breaks when a netem
> > resides in a qdisc tree with other netems - this can lead to a
> > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > Ensure that a duplicating netem cannot exist in a tree with other
> > netems.
>
> As I already warned in your previous patchset, this breaks the following
> potentially useful use case:
>
> sudo tc qdisc add dev eth0 root handle 1: mq
> sudo tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 100%
> sudo tc qdisc add dev eth0 parent 1:2 handle 20: netem duplicate 100%
>
> I don't see any logical problem of such use case, therefore we should
> consider it as valid, we can't break it.
>

I thought we are trying to provide an intermediate solution to plug an
existing hole and come up with a longer term solution.
If there are users of such a "potential setup" you show above we are
going to find out very quickly.
We are working against security people who are finding all sorts of
"potential use cases" to create CVEs.

> >
> > Previous approaches suggested in discussions in chronological order:
> >
> > 1) Track duplication status or ttl in the sk_buff struct. Considered
> > too specific a use case to extend such a struct, though this would
> > be a resilient fix and address other previous and potential future
> > DOS bugs like the one described in loopy fun [2].
>
> The link you provid is from 8 years ago, since then the redirection
> logic has been improved. I am not sure why it helps to justify your
> refusal of this approach.
>
> I also strongly disagree with "too specific a use case to extend such
> a struct", we simply have so many use-case-specific fields within
> sk_buff->cb. For example, the tc_skb_cb->zone is very specific
> for act_ct.
>
> skb->cb is precisely designed to be use-case-specific and layer-specific.
>
> None of the above points stands.
>

I doubt you have looked at the code based on how you keep coming back
with the same points.

> >
> > 2) Restrict netem_enqueue recursion depth like in act_mirred with a
> > per cpu variable. However, netem_dequeue can call enqueue on its
> > child, and the depth restriction could be bypassed if the child is a
> > netem.
> >
> > 3) Use the same approach as in 2, but add metadata in netem_skb_cb
> > to handle the netem_dequeue case and track a packet's involvement
> > in duplication. This is an overly complex approach, and Jamal
> > notes that the skb cb can be overwritten to circumvent this
> > safeguard.
>
> This is not true, except qdisc_skb_cb(skb)->data, other area of
> skb->cb is preserved within Qdisc layer.
>

Your approach has issues as i pointed out. At minimum invest the time
please and look at code.
I am certain you could keep changing other code outside of netem and
fix all issues you are exposing.
We agreed this is for a short term solution and needs to be contained
on just netem, what is the point of this whole long discussion really?
We have spent over a month already..


cheers,
jamal

> Based on the above reasoning, this is clearly no way to go:
>
> NACK-by: Cong Wang <xiyou.wangcong@gmail.com>
>
> Sorry for standing firmly for the users, we simply don't break use
> cases. This is nothing personal, just a firm principle.
>
> Please let me know if there is anything else I can help you with. I am
> always ready to help (but not in a way of breaking use cases).
>
> Thanks for your understanding!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: This breaks netem use cases
  2025-07-08 20:35   ` Jamal Hadi Salim
@ 2025-07-08 21:32     ` Cong Wang
  2025-07-08 22:26       ` Jamal Hadi Salim
  0 siblings, 1 reply; 16+ messages in thread
From: Cong Wang @ 2025-07-08 21:32 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: William Liu, netdev, victor, pctammela, pabeni, kuba, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms, linux-kernel,
	torvalds

(Cc Linus Torvalds)

On Tue, Jul 08, 2025 at 04:35:37PM -0400, Jamal Hadi Salim wrote:
> On Tue, Jul 8, 2025 at 3:42 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > (Cc LKML for more audience, since this clearly breaks potentially useful
> > use cases)
> >
> > On Tue, Jul 08, 2025 at 04:43:26PM +0000, William Liu wrote:
> > > netem_enqueue's duplication prevention logic breaks when a netem
> > > resides in a qdisc tree with other netems - this can lead to a
> > > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > > Ensure that a duplicating netem cannot exist in a tree with other
> > > netems.
> >
> > As I already warned in your previous patchset, this breaks the following
> > potentially useful use case:
> >
> > sudo tc qdisc add dev eth0 root handle 1: mq
> > sudo tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 100%
> > sudo tc qdisc add dev eth0 parent 1:2 handle 20: netem duplicate 100%
> >
> > I don't see any logical problem of such use case, therefore we should
> > consider it as valid, we can't break it.
> >
> 
> I thought we are trying to provide an intermediate solution to plug an
> existing hole and come up with a longer term solution.

Breaking valid use cases even for a short period is still no way to go.
Sorry, Jamal. Since I can't convince you, please ask Linus.

Also, I don't see you have proposed any long term solution. If you
really have one, please state it clearly and provide a clear timeline to
users.

> If there are users of such a "potential setup" you show above we are
> going to find out very quickly.

Please read the above specific example. It is more than just valid, it
is very reasonable, installing netem for each queue is the right way of
using netem duplication to avoid the global root spinlock in a multiqueue
setup.

Breaking users and letting them complain is not a good strategy either.

On the other hand, thanks for acknowledging it breaks users, which
confirms my point.

I will wait for Linus' response.

> We are working against security people who are finding all sorts of
> "potential use cases" to create CVEs.

I seriouly doubt the urgency of those CVE's, because none of them can be
triggered without root. Please don't get me wrong, I already fixed many of
them, but I believe they can wait, false urgency does not help anything.

> 
> > >
> > > Previous approaches suggested in discussions in chronological order:
> > >
> > > 1) Track duplication status or ttl in the sk_buff struct. Considered
> > > too specific a use case to extend such a struct, though this would
> > > be a resilient fix and address other previous and potential future
> > > DOS bugs like the one described in loopy fun [2].
> >
> > The link you provid is from 8 years ago, since then the redirection
> > logic has been improved. I am not sure why it helps to justify your
> > refusal of this approach.
> >
> > I also strongly disagree with "too specific a use case to extend such
> > a struct", we simply have so many use-case-specific fields within
> > sk_buff->cb. For example, the tc_skb_cb->zone is very specific
> > for act_ct.
> >
> > skb->cb is precisely designed to be use-case-specific and layer-specific.
> >
> > None of the above points stands.
> >
> 
> I doubt you have looked at the code based on how you keep coming back
> with the same points.

Please avoid personal attacks. It helps nothing to your argument here,
in fact, it will only weaken your arguments.

> > >
> > > 2) Restrict netem_enqueue recursion depth like in act_mirred with a
> > > per cpu variable. However, netem_dequeue can call enqueue on its
> > > child, and the depth restriction could be bypassed if the child is a
> > > netem.
> > >
> > > 3) Use the same approach as in 2, but add metadata in netem_skb_cb
> > > to handle the netem_dequeue case and track a packet's involvement
> > > in duplication. This is an overly complex approach, and Jamal
> > > notes that the skb cb can be overwritten to circumvent this
> > > safeguard.
> >
> > This is not true, except qdisc_skb_cb(skb)->data, other area of
> > skb->cb is preserved within Qdisc layer.
> >
> 
> Your approach has issues as i pointed out. At minimum invest the time
> please and look at code.

Sure, no one's patch is perfect. I am open to improve any of my patch.
First, let's discard this user-breaking patch for disatractions and
start focusing other solutions (not necessarily mine).

If it could help you, I can set the author to be you. I have no interest
to take any credit out of here, the reason why I sent out a patch is only
because you asked me to help.

> I am certain you could keep changing other code outside of netem and
> fix all issues you are exposing.
> We agreed this is for a short term solution and needs to be contained

I never agreed with you on breaking users, to me it is out of table for
discussion. Just to clarify.

> on just netem, what is the point of this whole long discussion really?

To defend users, obviously.

> We have spent over a month already..

Sorry to hear that, I think you are going to a wrong direction. The
earlier you switch to a right direction, the sooner we will have a right
solution without breaking any users.

Once again, please let me know how I specifically can help you out of
this situation. I am here to solve problems, not to bring one.

(If you need a video call for help, my calendar is open.)

Thanks for your understanding!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: This breaks netem use cases
  2025-07-08 21:32     ` Cong Wang
@ 2025-07-08 22:26       ` Jamal Hadi Salim
  2025-07-08 22:45         ` Stephen Hemminger
  2025-07-11  5:19         ` Cong Wang
  0 siblings, 2 replies; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-08 22:26 UTC (permalink / raw)
  To: Cong Wang
  Cc: William Liu, netdev, victor, pctammela, pabeni, kuba, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms, linux-kernel,
	torvalds

On Tue, Jul 8, 2025 at 5:32 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> (Cc Linus Torvalds)
>
> On Tue, Jul 08, 2025 at 04:35:37PM -0400, Jamal Hadi Salim wrote:
> > On Tue, Jul 8, 2025 at 3:42 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > >
> > > (Cc LKML for more audience, since this clearly breaks potentially useful
> > > use cases)
> > >
> > > On Tue, Jul 08, 2025 at 04:43:26PM +0000, William Liu wrote:
> > > > netem_enqueue's duplication prevention logic breaks when a netem
> > > > resides in a qdisc tree with other netems - this can lead to a
> > > > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > > > Ensure that a duplicating netem cannot exist in a tree with other
> > > > netems.
> > >
> > > As I already warned in your previous patchset, this breaks the following
> > > potentially useful use case:
> > >
> > > sudo tc qdisc add dev eth0 root handle 1: mq
> > > sudo tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 100%
> > > sudo tc qdisc add dev eth0 parent 1:2 handle 20: netem duplicate 100%
> > >
> > > I don't see any logical problem of such use case, therefore we should
> > > consider it as valid, we can't break it.
> > >
> >
> > I thought we are trying to provide an intermediate solution to plug an
> > existing hole and come up with a longer term solution.
>
> Breaking valid use cases even for a short period is still no way to go.
> Sorry, Jamal. Since I can't convince you, please ask Linus.
>
> Also, I don't see you have proposed any long term solution. If you
> really have one, please state it clearly and provide a clear timeline to
> users.
>

I explained my approach a few times: We need to come up with a long
term solution that looks at the sanity of hierarchies.
Equivalent to init/change()
Today we only look at netlink requests for a specific qdisc. The new
approach (possibly an ops) will also look at the sanity of configs in
relation to hierarchies.
You can work on it or come with an alternative proposal.
That is not the scope of this discussion though

> > If there are users of such a "potential setup" you show above we are
> > going to find out very quickly.
>
> Please read the above specific example. It is more than just valid, it
> is very reasonable, installing netem for each queue is the right way of
> using netem duplication to avoid the global root spinlock in a multiqueue
> setup.
>

In all my years working on tc I have never seen _anyone_ using
duplication where netem is _not the root qdisc_. And i have done a lot
of "support" in this area.
You can craft any example you want but it needs to be practical - I
dont see the practicality in your example.
Just because we allow arbitrary crafting of hierarchies doesnt mean
they are correct.
The choice is between complicating things to fix a "potential" corner
use case vs simplicity (especially of a short term approach that is
intended to be obsoleted in the long term).


> Breaking users and letting them complain is not a good strategy either.
>
> On the other hand, thanks for acknowledging it breaks users, which
> confirms my point.
>
> I will wait for Linus' response.
>
> > We are working against security people who are finding all sorts of
> > "potential use cases" to create CVEs.
>
> I seriouly doubt the urgency of those CVE's, because none of them can be
> triggered without root. Please don't get me wrong, I already fixed many of
> them, but I believe they can wait, false urgency does not help anything.
>

All tc rules require root including your example  - afaik, bounties
are being given for unprivileged user namespaces

> >
> > > >
> > > > Previous approaches suggested in discussions in chronological order:
> > > >
> > > > 1) Track duplication status or ttl in the sk_buff struct. Considered
> > > > too specific a use case to extend such a struct, though this would
> > > > be a resilient fix and address other previous and potential future
> > > > DOS bugs like the one described in loopy fun [2].
> > >
> > > The link you provid is from 8 years ago, since then the redirection
> > > logic has been improved. I am not sure why it helps to justify your
> > > refusal of this approach.
> > >
> > > I also strongly disagree with "too specific a use case to extend such
> > > a struct", we simply have so many use-case-specific fields within
> > > sk_buff->cb. For example, the tc_skb_cb->zone is very specific
> > > for act_ct.
> > >
> > > skb->cb is precisely designed to be use-case-specific and layer-specific.
> > >
> > > None of the above points stands.
> > >
> >
> > I doubt you have looked at the code based on how you keep coming back
> > with the same points.
>
> Please avoid personal attacks. It helps nothing to your argument here,
> in fact, it will only weaken your arguments.
>

How is this a personal attack? You posted a patch that breaks things further.
I pointed it to you _multiple times_. You still posted it as a solution!


> > > >
> > > > 2) Restrict netem_enqueue recursion depth like in act_mirred with a
> > > > per cpu variable. However, netem_dequeue can call enqueue on its
> > > > child, and the depth restriction could be bypassed if the child is a
> > > > netem.
> > > >
> > > > 3) Use the same approach as in 2, but add metadata in netem_skb_cb
> > > > to handle the netem_dequeue case and track a packet's involvement
> > > > in duplication. This is an overly complex approach, and Jamal
> > > > notes that the skb cb can be overwritten to circumvent this
> > > > safeguard.
> > >
> > > This is not true, except qdisc_skb_cb(skb)->data, other area of
> > > skb->cb is preserved within Qdisc layer.
> > >
> >
> > Your approach has issues as i pointed out. At minimum invest the time
> > please and look at code.
>
> Sure, no one's patch is perfect. I am open to improve any of my patch
> First, let's discard this user-breaking patch for disatractions and
> start focusing other solutions (not necessarily mine).
>
> If it could help you, I can set the author to be you.

You think i am after getting my name on patches after all these years?
The patch is not sent by me - it is William's. There's zero credit on
my name. I could have written the patch myself, instead i guided a new
contributor on a path forward. That is my general approach to these
things.

> I have no interest
> to take any credit out of here, the reason why I sent out a patch is only
> because you asked me to help.
>

I think you are still missing the point. Let William get his patch in.

cheers,
jamal

> > I am certain you could keep changing other code outside of netem and
> > fix all issues you are exposing.
> > We agreed this is for a short term solution and needs to be contained
>
> I never agreed with you on breaking users, to me it is out of table for
> discussion. Just to clarify.
>
> > on just netem, what is the point of this whole long discussion really?
>
> To defend users, obviously.
>
> > We have spent over a month already..
>
> Sorry to hear that, I think you are going to a wrong direction. The
> earlier you switch to a right direction, the sooner we will have a right
> solution without breaking any users.
>
> Once again, please let me know how I specifically can help you out of
> this situation. I am here to solve problems, not to bring one.
>
> (If you need a video call for help, my calendar is open.)
>
> Thanks for your understanding!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: This breaks netem use cases
  2025-07-08 22:26       ` Jamal Hadi Salim
@ 2025-07-08 22:45         ` Stephen Hemminger
  2025-07-11  5:19         ` Cong Wang
  1 sibling, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2025-07-08 22:45 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Cong Wang, William Liu, netdev, victor, pctammela, pabeni, kuba,
	dcaratti, savy, jiri, davem, edumazet, horms, linux-kernel,
	torvalds

On Tue, 8 Jul 2025 18:26:28 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> On Tue, Jul 8, 2025 at 5:32 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > (Cc Linus Torvalds)
> >
> > On Tue, Jul 08, 2025 at 04:35:37PM -0400, Jamal Hadi Salim wrote:  
> > > On Tue, Jul 8, 2025 at 3:42 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:  
> > > >
> > > > (Cc LKML for more audience, since this clearly breaks potentially useful
> > > > use cases)
> > > >
> > > > On Tue, Jul 08, 2025 at 04:43:26PM +0000, William Liu wrote:  
> > > > > netem_enqueue's duplication prevention logic breaks when a netem
> > > > > resides in a qdisc tree with other netems - this can lead to a
> > > > > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > > > > Ensure that a duplicating netem cannot exist in a tree with other
> > > > > netems.  
> > > >
> > > > As I already warned in your previous patchset, this breaks the following
> > > > potentially useful use case:
> > > >
> > > > sudo tc qdisc add dev eth0 root handle 1: mq
> > > > sudo tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 100%
> > > > sudo tc qdisc add dev eth0 parent 1:2 handle 20: netem duplicate 100%
> > > >
> > > > I don't see any logical problem of such use case, therefore we should
> > > > consider it as valid, we can't break it.
> > > >  
> > >
> > > I thought we are trying to provide an intermediate solution to plug an
> > > existing hole and come up with a longer term solution.  
> >
> > Breaking valid use cases even for a short period is still no way to go.
> > Sorry, Jamal. Since I can't convince you, please ask Linus.
> >
> > Also, I don't see you have proposed any long term solution. If you
> > really have one, please state it clearly and provide a clear timeline to
> > users.
> >  
> 
> I explained my approach a few times: We need to come up with a long
> term solution that looks at the sanity of hierarchies.
> Equivalent to init/change()
> Today we only look at netlink requests for a specific qdisc. The new
> approach (possibly an ops) will also look at the sanity of configs in
> relation to hierarchies.
> You can work on it or come with an alternative proposal.
> That is not the scope of this discussion though
> 
> > > If there are users of such a "potential setup" you show above we are
> > > going to find out very quickly.  
> >
> > Please read the above specific example. It is more than just valid, it
> > is very reasonable, installing netem for each queue is the right way of
> > using netem duplication to avoid the global root spinlock in a multiqueue
> > setup.
> >  
> 
> In all my years working on tc I have never seen _anyone_ using
> duplication where netem is _not the root qdisc_. And i have done a lot
> of "support" in this area.
> You can craft any example you want but it needs to be practical - I
> dont see the practicality in your example.
> Just because we allow arbitrary crafting of hierarchies doesnt mean
> they are correct.
> The choice is between complicating things to fix a "potential" corner
> use case vs simplicity (especially of a short term approach that is
> intended to be obsoleted in the long term).

One of my initial examples was to use HTB and netem.
Where each class got a different loss rate etc.
But duplication is usually link wide.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: This breaks netem use cases
  2025-07-08 19:42 ` This breaks netem use cases Cong Wang
  2025-07-08 20:35   ` Jamal Hadi Salim
@ 2025-07-10  8:26   ` Paolo Abeni
  2025-07-11  5:44     ` Cong Wang
  1 sibling, 1 reply; 16+ messages in thread
From: Paolo Abeni @ 2025-07-10  8:26 UTC (permalink / raw)
  To: Cong Wang, William Liu
  Cc: netdev, jhs, victor, pctammela, kuba, stephen, dcaratti, savy,
	jiri, davem, edumazet, horms, linux-kernel

On 7/8/25 9:42 PM, Cong Wang wrote:
> (Cc LKML for more audience, since this clearly breaks potentially useful
> use cases)
> 
> On Tue, Jul 08, 2025 at 04:43:26PM +0000, William Liu wrote:
>> netem_enqueue's duplication prevention logic breaks when a netem
>> resides in a qdisc tree with other netems - this can lead to a
>> soft lockup and OOM loop in netem_dequeue, as seen in [1].
>> Ensure that a duplicating netem cannot exist in a tree with other
>> netems.
> 
> As I already warned in your previous patchset, this breaks the following
> potentially useful use case:
> 
> sudo tc qdisc add dev eth0 root handle 1: mq
> sudo tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 100%
> sudo tc qdisc add dev eth0 parent 1:2 handle 20: netem duplicate 100%
> 
> I don't see any logical problem of such use case, therefore we should
> consider it as valid, we can't break it.

My understanding is that even the solution you proposed breaks a
currently accepted configuration:

https://lore.kernel.org/netdev/CAM0EoMmBdZBzfUAms5-0hH5qF5ODvxWfgqrbHaGT6p3-uOD6vg@mail.gmail.com/

I call them (both the linked one and the inline one) 'configurations'
instead of 'use-cases' because I don't see how any of them could have
real users, other than: https://xkcd.com/1172/.

TC historically allowing every configuration, even non completely
nonsensical ones, makes very hard to impossible to address this kind of
issues without breaking any previously accepted configuration.

My personal take would be to go with the change posted here: IMHO
keeping the fix self-encapsulated is better than saving an handful of
LoC and spreading the hack in more visible part of the code.

@Cong: would you reconsider your position?

Paolo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: This breaks netem use cases
  2025-07-08 22:26       ` Jamal Hadi Salim
  2025-07-08 22:45         ` Stephen Hemminger
@ 2025-07-11  5:19         ` Cong Wang
  1 sibling, 0 replies; 16+ messages in thread
From: Cong Wang @ 2025-07-11  5:19 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: William Liu, netdev, victor, pctammela, pabeni, kuba, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms, linux-kernel,
	torvalds

On Tue, Jul 08, 2025 at 06:26:28PM -0400, Jamal Hadi Salim wrote:
> On Tue, Jul 8, 2025 at 5:32 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > (Cc Linus Torvalds)
> >
> > On Tue, Jul 08, 2025 at 04:35:37PM -0400, Jamal Hadi Salim wrote:
> > > On Tue, Jul 8, 2025 at 3:42 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > >
> > > > (Cc LKML for more audience, since this clearly breaks potentially useful
> > > > use cases)
> > > >
> > > > On Tue, Jul 08, 2025 at 04:43:26PM +0000, William Liu wrote:
> > > > > netem_enqueue's duplication prevention logic breaks when a netem
> > > > > resides in a qdisc tree with other netems - this can lead to a
> > > > > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > > > > Ensure that a duplicating netem cannot exist in a tree with other
> > > > > netems.
> > > >
> > > > As I already warned in your previous patchset, this breaks the following
> > > > potentially useful use case:
> > > >
> > > > sudo tc qdisc add dev eth0 root handle 1: mq
> > > > sudo tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 100%
> > > > sudo tc qdisc add dev eth0 parent 1:2 handle 20: netem duplicate 100%
> > > >
> > > > I don't see any logical problem of such use case, therefore we should
> > > > consider it as valid, we can't break it.
> > > >
> > >
> > > I thought we are trying to provide an intermediate solution to plug an
> > > existing hole and come up with a longer term solution.
> >
> > Breaking valid use cases even for a short period is still no way to go.
> > Sorry, Jamal. Since I can't convince you, please ask Linus.
> >
> > Also, I don't see you have proposed any long term solution. If you
> > really have one, please state it clearly and provide a clear timeline to
> > users.
> >
> 
> I explained my approach a few times: We need to come up with a long
> term solution that looks at the sanity of hierarchies.

I interpret as you have no long term solution, so without any long term
solution, how do you convince users you will unbreak them after breaking
them? This looks more and more concerning.


> Equivalent to init/change()
> Today we only look at netlink requests for a specific qdisc. The new
> approach (possibly an ops) will also look at the sanity of configs in
> relation to hierarchies.
> You can work on it or come with an alternative proposal.

You misunderstood this. It is never about me, mentioning me is not even
relevant. I defend users, please think for users, not me (or youself).

If you think from users' perspective, you wouldn't even suggest breaking
any of their use cases for any time. All what you said here is from your
own perspective, surely you understand all the TC details, but users
don't.

> That is not the scope of this discussion though
> 
> > > If there are users of such a "potential setup" you show above we are
> > > going to find out very quickly.
> >
> > Please read the above specific example. It is more than just valid, it
> > is very reasonable, installing netem for each queue is the right way of
> > using netem duplication to avoid the global root spinlock in a multiqueue
> > setup.
> >
> 
> In all my years working on tc I have never seen _anyone_ using
> duplication where netem is _not the root qdisc_. And i have done a lot
> of "support" in this area.
> You can craft any example you want but it needs to be practical - I
> dont see the practicality in your example.

The example I provide is real and practical, in fact, it is _the only_
reasonable way to use netem duplication directly on multiqueue NIC.

I bet you don't have another way (unless you don't care about the global
spinlock) in such setup.


> Just because we allow arbitrary crafting of hierarchies doesnt mean
> they are correct.

Can we let users decide? Why do we have the priviledge to decide
everything for users? Users are usually more correct us, this is why so
many bugs actually became features, it is just simply not up to us.

We serve users, not vice versa, apparently. Let's be humble.

> The choice is between complicating things to fix a "potential" corner
> use case vs simplicity (especially of a short term approach that is
> intended to be obsoleted in the long term).

I don't see any simplicity from your patch, it is not maintainable at
all (I already explained why and suggested a better way). 40-LOC vs
4+/4-,  you call the 40LOC simplicity?

And this case is not corner, nor potential, it is valid and reasonable.
Downplaying use cases only hurts users.

> 
> 
> > Breaking users and letting them complain is not a good strategy either.
> >
> > On the other hand, thanks for acknowledging it breaks users, which
> > confirms my point.
> >
> > I will wait for Linus' response.
> >
> > > We are working against security people who are finding all sorts of
> > > "potential use cases" to create CVEs.
> >
> > I seriouly doubt the urgency of those CVE's, because none of them can be
> > triggered without root. Please don't get me wrong, I already fixed many of
> > them, but I believe they can wait, false urgency does not help anything.
> >
> 
> All tc rules require root including your example  - afaik, bounties
> are being given for unprivileged user namespaces

Sure, many CVE's have bounties. This does not mean all of them are
urgent. They are important, but just not urgent. Creating false urgency
is harmful for decision making.

> > >
> > > > >
> > > > > Previous approaches suggested in discussions in chronological order:
> > > > >
> > > > > 1) Track duplication status or ttl in the sk_buff struct. Considered
> > > > > too specific a use case to extend such a struct, though this would
> > > > > be a resilient fix and address other previous and potential future
> > > > > DOS bugs like the one described in loopy fun [2].
> > > >
> > > > The link you provid is from 8 years ago, since then the redirection
> > > > logic has been improved. I am not sure why it helps to justify your
> > > > refusal of this approach.
> > > >
> > > > I also strongly disagree with "too specific a use case to extend such
> > > > a struct", we simply have so many use-case-specific fields within
> > > > sk_buff->cb. For example, the tc_skb_cb->zone is very specific
> > > > for act_ct.
> > > >
> > > > skb->cb is precisely designed to be use-case-specific and layer-specific.
> > > >
> > > > None of the above points stands.
> > > >
> > >
> > > I doubt you have looked at the code based on how you keep coming back
> > > with the same points.
> >
> > Please avoid personal attacks. It helps nothing to your argument here,
> > in fact, it will only weaken your arguments.
> >
> 
> How is this a personal attack? You posted a patch that breaks things further.
> I pointed it to you _multiple times_. You still posted it as a solution!

Maybe you are not helping at all? You kept mentioning "issues" without
even explaining what issues are.

Here, you keep mentioning I didn't look at the code base without saying
anything helpful. The fact is I looked at all the qdisc_skb_cb and
tc_skb_cb use cases, I tried to place a new field/bit in at least 3
different locations with multiple failures.

Maybe you need to be helpful and respectful?

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: This breaks netem use cases
  2025-07-10  8:26   ` Paolo Abeni
@ 2025-07-11  5:44     ` Cong Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Cong Wang @ 2025-07-11  5:44 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: William Liu, netdev, jhs, victor, pctammela, kuba, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms, linux-kernel

On Thu, Jul 10, 2025 at 10:26:46AM +0200, Paolo Abeni wrote:
> On 7/8/25 9:42 PM, Cong Wang wrote:
> > (Cc LKML for more audience, since this clearly breaks potentially useful
> > use cases)
> > 
> > On Tue, Jul 08, 2025 at 04:43:26PM +0000, William Liu wrote:
> >> netem_enqueue's duplication prevention logic breaks when a netem
> >> resides in a qdisc tree with other netems - this can lead to a
> >> soft lockup and OOM loop in netem_dequeue, as seen in [1].
> >> Ensure that a duplicating netem cannot exist in a tree with other
> >> netems.
> > 
> > As I already warned in your previous patchset, this breaks the following
> > potentially useful use case:
> > 
> > sudo tc qdisc add dev eth0 root handle 1: mq
> > sudo tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 100%
> > sudo tc qdisc add dev eth0 parent 1:2 handle 20: netem duplicate 100%
> > 
> > I don't see any logical problem of such use case, therefore we should
> > consider it as valid, we can't break it.
> 
> My understanding is that even the solution you proposed breaks a
> currently accepted configuration:
> 
> https://lore.kernel.org/netdev/CAM0EoMmBdZBzfUAms5-0hH5qF5ODvxWfgqrbHaGT6p3-uOD6vg@mail.gmail.com/

Maybe it is not obvious, my patch does not reject users' setup. It
probably has bugs, I am more than just happy to address any bugs in the
next iteration (like for any patch), in fact it is my obligation.

My appologize if I misled any of you to believe my patch is bug-free or
perfect, it is never the case.

For Jamal's patch, it is his intention to break users' setup, and this
won't change during any iteration.

They are significantly different.

> 
> I call them (both the linked one and the inline one) 'configurations'
> instead of 'use-cases' because I don't see how any of them could have
> real users, other than: https://xkcd.com/1172/.

Please let me know if you have any other way to use netem duplication on
a multiqueue NIC _directly_ without worrying about the global spinlock.

I bet you have none. Either you need to use it indirectly (attaching to
a non-root) or you have to face the global spinlock (aka without using
mq).

I am open to your education. :)

Thanks a lot!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
  2025-07-08 16:43 [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree William Liu
                   ` (2 preceding siblings ...)
  2025-07-08 19:42 ` This breaks netem use cases Cong Wang
@ 2025-07-11 22:55 ` Jakub Kicinski
  2025-07-12 16:51   ` William Liu
  2025-07-11 23:00 ` patchwork-bot+netdevbpf
  4 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2025-07-11 22:55 UTC (permalink / raw)
  To: William Liu
  Cc: netdev, jhs, xiyou.wangcong, victor, pctammela, pabeni, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms

On Tue, 08 Jul 2025 16:43:26 +0000 William Liu wrote:
> netem_enqueue's duplication prevention logic breaks when a netem
> resides in a qdisc tree with other netems - this can lead to a
> soft lockup and OOM loop in netem_dequeue, as seen in [1].
> Ensure that a duplicating netem cannot exist in a tree with other
> netems.

We already had one regression scare this week, so given that Cong
is not relenting I'll do the unusual thing of parking this fix in
net-next. It will reach Linus during the merge window, hopefully
the <2 week delay is not a big deal given how long we've taken already.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
  2025-07-08 16:43 [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree William Liu
                   ` (3 preceding siblings ...)
  2025-07-11 22:55 ` [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree Jakub Kicinski
@ 2025-07-11 23:00 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 16+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-07-11 23:00 UTC (permalink / raw)
  To: William Liu
  Cc: netdev, jhs, xiyou.wangcong, victor, pctammela, pabeni, kuba,
	stephen, dcaratti, savy, jiri, davem, edumazet, horms

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 08 Jul 2025 16:43:26 +0000 you wrote:
> netem_enqueue's duplication prevention logic breaks when a netem
> resides in a qdisc tree with other netems - this can lead to a
> soft lockup and OOM loop in netem_dequeue, as seen in [1].
> Ensure that a duplicating netem cannot exist in a tree with other
> netems.
> 
> Previous approaches suggested in discussions in chronological order:
> 
> [...]

Here is the summary with links:
  - [net,v5,1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
    https://git.kernel.org/netdev/net-next/c/ec8e0e3d7ade
  - [net,v5,2/2] selftests/tc-testing: Add tests for restrictions on netem duplication
    https://git.kernel.org/netdev/net-next/c/ecdec65ec78d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
  2025-07-11 22:55 ` [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree Jakub Kicinski
@ 2025-07-12 16:51   ` William Liu
  2025-07-14 14:58     ` Jakub Kicinski
  0 siblings, 1 reply; 16+ messages in thread
From: William Liu @ 2025-07-12 16:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, jhs, xiyou.wangcong, victor, pctammela, pabeni, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms

On Friday, July 11th, 2025 at 10:55 PM, Jakub Kicinski <kuba@kernel.org> wrote:

> 
> 
> On Tue, 08 Jul 2025 16:43:26 +0000 William Liu wrote:
> 
> > netem_enqueue's duplication prevention logic breaks when a netem
> > resides in a qdisc tree with other netems - this can lead to a
> > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > Ensure that a duplicating netem cannot exist in a tree with other
> > netems.
> 
> 
> We already had one regression scare this week, so given that Cong
> is not relenting I'll do the unusual thing of parking this fix in
> net-next. It will reach Linus during the merge window, hopefully
> the <2 week delay is not a big deal given how long we've taken already.

If this is going to net-next, will this still be a candidate for backporting like other recent net/sched bug-fixes? Or will this only be considered fixed starting 6.17 onwards?

Best,
William

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
  2025-07-12 16:51   ` William Liu
@ 2025-07-14 14:58     ` Jakub Kicinski
  0 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2025-07-14 14:58 UTC (permalink / raw)
  To: William Liu
  Cc: netdev, jhs, xiyou.wangcong, victor, pctammela, pabeni, stephen,
	dcaratti, savy, jiri, davem, edumazet, horms

On Sat, 12 Jul 2025 16:51:39 +0000 William Liu wrote:
> > We already had one regression scare this week, so given that Cong
> > is not relenting I'll do the unusual thing of parking this fix in
> > net-next. It will reach Linus during the merge window, hopefully
> > the <2 week delay is not a big deal given how long we've taken already.  
> 
> If this is going to net-next, will this still be a candidate for
> backporting like other recent net/sched bug-fixes? Or will this only
> be considered fixed starting 6.17 onwards?

It will be a candidate for backporting once it reaches Linus.
The only difference is timing.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-07-14 14:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-08 16:43 [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree William Liu
2025-07-08 16:44 ` [PATCH net v5 2/2] selftests/tc-testing: Add tests for restrictions on netem duplication William Liu
2025-07-08 18:50   ` Jamal Hadi Salim
2025-07-08 18:50 ` [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree Jamal Hadi Salim
2025-07-08 19:42 ` This breaks netem use cases Cong Wang
2025-07-08 20:35   ` Jamal Hadi Salim
2025-07-08 21:32     ` Cong Wang
2025-07-08 22:26       ` Jamal Hadi Salim
2025-07-08 22:45         ` Stephen Hemminger
2025-07-11  5:19         ` Cong Wang
2025-07-10  8:26   ` Paolo Abeni
2025-07-11  5:44     ` Cong Wang
2025-07-11 22:55 ` [PATCH net v5 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree Jakub Kicinski
2025-07-12 16:51   ` William Liu
2025-07-14 14:58     ` Jakub Kicinski
2025-07-11 23:00 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).