* [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
@ 2025-06-27 6:17 William Liu
2025-06-28 0:15 ` Cong Wang
0 siblings, 1 reply; 16+ messages in thread
From: William Liu @ 2025-06-27 6:17 UTC (permalink / raw)
To: netdev
Cc: jhs, xiyou.wangcong, victor, pctammela, pabeni, kuba, stephen,
dcaratti, savy, jiri, davem, edumazet, horms, William Liu
netem_enqueue's duplication prevention logic breaks when a netem
resides in a qdisc tree with other netems - this can lead to a
soft lockup and OOM loop in netem_dequeue, as seen in [1].
Ensure that a duplicating netem cannot exist in a tree with other
netems.
Previous approaches suggested in discussions in chronological order:
1) Track duplication status or ttl in the sk_buff struct. Considered
too specific a use case to extend such a struct, though this would
be a resilient fix and address other previous and potential future
DOS bugs like the one described in loopy fun [2].
2) Restrict netem_enqueue recursion depth like in act_mirred with a
per cpu variable. However, netem_dequeue can call enqueue on its
child, and the depth restriction could be bypassed if the child is a
netem.
3) Use the same approach as in 2, but add metadata in netem_skb_cb
to handle the netem_dequeue case and track a packet's involvement
in duplication. This is an overly complex approach, and Jamal
notes that the skb cb can be overwritten to circumvent this
safeguard.
4) Prevent the addition of a netem to a qdisc tree if its ancestral
path contains a netem. However, filters and actions can cause a
packet to change paths when re-enqueued to the root from netem
duplication, leading us to the current solution: prevent a
duplicating netem from inhabiting the same tree as other netems.
[1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/
[2] https://lwn.net/Articles/719297/
Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Signed-off-by: William Liu <will@willsroot.io>
Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io>
---
v3 -> v4:
- Clarify changelog with chronological order of attempted solutions
v2 -> v3:
- Clarify reasoning for approach in changelog
- Removed has_duplication
v1 -> v2:
- Renamed only_duplicating to duplicates and invert logic for clarity
---
net/sched/sch_netem.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index fdd79d3ccd8c..eafc316ae319 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -973,6 +973,41 @@ static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
return 0;
}
+static const struct Qdisc_class_ops netem_class_ops;
+
+static int check_netem_in_tree(struct Qdisc *sch, bool duplicates,
+ struct netlink_ext_ack *extack)
+{
+ struct Qdisc *root, *q;
+ unsigned int i;
+
+ root = qdisc_root_sleeping(sch);
+
+ if (sch != root && root->ops->cl_ops == &netem_class_ops) {
+ if (duplicates ||
+ ((struct netem_sched_data *)qdisc_priv(root))->duplicate)
+ goto err;
+ }
+
+ if (!qdisc_dev(root))
+ return 0;
+
+ hash_for_each(qdisc_dev(root)->qdisc_hash, i, q, hash) {
+ if (sch != q && q->ops->cl_ops == &netem_class_ops) {
+ if (duplicates ||
+ ((struct netem_sched_data *)qdisc_priv(q))->duplicate)
+ goto err;
+ }
+ }
+
+ return 0;
+
+err:
+ NL_SET_ERR_MSG(extack,
+ "netem: cannot mix duplicating netems with other netems in tree");
+ return -EINVAL;
+}
+
/* Parse netlink message to set options */
static int netem_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
@@ -1031,6 +1066,11 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
q->gap = qopt->gap;
q->counter = 0;
q->loss = qopt->loss;
+
+ ret = check_netem_in_tree(sch, qopt->duplicate, extack);
+ if (ret)
+ goto unlock;
+
q->duplicate = qopt->duplicate;
/* for compatibility with earlier versions.
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-06-27 6:17 [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree William Liu
@ 2025-06-28 0:15 ` Cong Wang
2025-06-28 4:23 ` William Liu
2025-06-28 15:15 ` Stephen Hemminger
0 siblings, 2 replies; 16+ messages in thread
From: Cong Wang @ 2025-06-28 0:15 UTC (permalink / raw)
To: William Liu
Cc: netdev, jhs, victor, pctammela, pabeni, kuba, stephen, dcaratti,
savy, jiri, davem, edumazet, horms
On Fri, Jun 27, 2025 at 06:17:31AM +0000, William Liu wrote:
> netem_enqueue's duplication prevention logic breaks when a netem
> resides in a qdisc tree with other netems - this can lead to a
> soft lockup and OOM loop in netem_dequeue, as seen in [1].
> Ensure that a duplicating netem cannot exist in a tree with other
> netems.
>
Thanks for providing more details.
> Previous approaches suggested in discussions in chronological order:
>
> 1) Track duplication status or ttl in the sk_buff struct. Considered
> too specific a use case to extend such a struct, though this would
> be a resilient fix and address other previous and potential future
> DOS bugs like the one described in loopy fun [2].
>
> 2) Restrict netem_enqueue recursion depth like in act_mirred with a
> per cpu variable. However, netem_dequeue can call enqueue on its
> child, and the depth restriction could be bypassed if the child is a
> netem.
>
> 3) Use the same approach as in 2, but add metadata in netem_skb_cb
> to handle the netem_dequeue case and track a packet's involvement
> in duplication. This is an overly complex approach, and Jamal
> notes that the skb cb can be overwritten to circumvent this
> safeguard.
This approach looks most elegant to me since it is per-skb and only
contained for netem. Since netem_skb_cb is shared among qdisc's, what
about just extending qdisc_skb_cb? Something like:
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 638948be4c50..4c5505661986 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -436,6 +436,7 @@ struct qdisc_skb_cb {
unsigned int pkt_len;
u16 slave_dev_queue_mapping;
u16 tc_classid;
+ u32 reserved;
};
#define QDISC_CB_PRIV_LEN 20
unsigned char data[QDISC_CB_PRIV_LEN];
Then we just set and check it for duplicated skbs:
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index fdd79d3ccd8c..4290f8fca0e9 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -486,7 +486,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
* If we need to duplicate packet, then clone it before
* original is modified.
*/
- if (count > 1)
+ if (count > 1 && !qdisc_skb_cb(skb)->reserved)
skb2 = skb_clone(skb, GFP_ATOMIC);
/*
@@ -540,9 +540,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct Qdisc *rootq = qdisc_root_bh(sch);
u32 dupsave = q->duplicate; /* prevent duplicating a dup... */
- q->duplicate = 0;
+ qdisc_skb_cb(skb2)->reserved = dupsave;
rootq->enqueue(skb2, rootq, to_free);
- q->duplicate = dupsave;
skb2 = NULL;
}
Could this work? It looks even shorter than your patch. :-)
Note, I don't even compile test it, I just show it to you for discussion.
Regards,
Cong Wang
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-06-28 0:15 ` Cong Wang
@ 2025-06-28 4:23 ` William Liu
2025-06-28 21:25 ` Jamal Hadi Salim
2025-06-28 15:15 ` Stephen Hemminger
1 sibling, 1 reply; 16+ messages in thread
From: William Liu @ 2025-06-28 4:23 UTC (permalink / raw)
To: Cong Wang
Cc: netdev, jhs, victor, pctammela, pabeni, kuba, stephen, dcaratti,
savy, jiri, davem, edumazet, horms
On Saturday, June 28th, 2025 at 12:15 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
>
> On Fri, Jun 27, 2025 at 06:17:31AM +0000, William Liu wrote:
>
> > netem_enqueue's duplication prevention logic breaks when a netem
> > resides in a qdisc tree with other netems - this can lead to a
> > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > Ensure that a duplicating netem cannot exist in a tree with other
> > netems.
>
>
> Thanks for providing more details.
>
> > Previous approaches suggested in discussions in chronological order:
> >
> > 1) Track duplication status or ttl in the sk_buff struct. Considered
> > too specific a use case to extend such a struct, though this would
> > be a resilient fix and address other previous and potential future
> > DOS bugs like the one described in loopy fun [2].
> >
> > 2) Restrict netem_enqueue recursion depth like in act_mirred with a
> > per cpu variable. However, netem_dequeue can call enqueue on its
> > child, and the depth restriction could be bypassed if the child is a
> > netem.
> >
> > 3) Use the same approach as in 2, but add metadata in netem_skb_cb
> > to handle the netem_dequeue case and track a packet's involvement
> > in duplication. This is an overly complex approach, and Jamal
> > notes that the skb cb can be overwritten to circumvent this
> > safeguard.
>
>
> This approach looks most elegant to me since it is per-skb and only
> contained for netem. Since netem_skb_cb is shared among qdisc's, what
> about just extending qdisc_skb_cb? Something like:
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 638948be4c50..4c5505661986 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -436,6 +436,7 @@ struct qdisc_skb_cb {
> unsigned int pkt_len;
> u16 slave_dev_queue_mapping;
> u16 tc_classid;
> + u32 reserved;
> };
> #define QDISC_CB_PRIV_LEN 20
> unsigned char data[QDISC_CB_PRIV_LEN];
>
>
> Then we just set and check it for duplicated skbs:
>
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index fdd79d3ccd8c..4290f8fca0e9 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -486,7 +486,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> * If we need to duplicate packet, then clone it before
> * original is modified.
> */
> - if (count > 1)
>
> + if (count > 1 && !qdisc_skb_cb(skb)->reserved)
>
> skb2 = skb_clone(skb, GFP_ATOMIC);
>
> /*
> @@ -540,9 +540,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> struct Qdisc rootq = qdisc_root_bh(sch);
> u32 dupsave = q->duplicate; / prevent duplicating a dup... */
>
>
> - q->duplicate = 0;
>
> + qdisc_skb_cb(skb2)->reserved = dupsave;
>
> rootq->enqueue(skb2, rootq, to_free);
>
> - q->duplicate = dupsave;
>
> skb2 = NULL;
> }
>
>
> Could this work? It looks even shorter than your patch. :-)
>
> Note, I don't even compile test it, I just show it to you for discussion.
>
Thank you for the suggestion. Jamal, would this work in this case? I recall you mentioning that the cb approach can be circumvented by zeroing the cb at the root: https://lore.kernel.org/netdev/CAM0EoMk4dxOFoN_=3yOy+XrtU=yvjJXAw3fVTmN9=M=R=vtbxA@mail.gmail.com/
I'm not sure if qdisc_skb_cb would be different than the case for private data for qdiscs.
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-06-28 4:23 ` William Liu
@ 2025-06-28 21:25 ` Jamal Hadi Salim
2025-06-29 20:16 ` Cong Wang
0 siblings, 1 reply; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-06-28 21:25 UTC (permalink / raw)
To: William Liu
Cc: Cong Wang, netdev, victor, pctammela, pabeni, kuba, stephen,
dcaratti, savy, jiri, davem, edumazet, horms
On Sat, Jun 28, 2025 at 12:23 AM William Liu <will@willsroot.io> wrote:
>
> On Saturday, June 28th, 2025 at 12:15 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> >
> >
> > On Fri, Jun 27, 2025 at 06:17:31AM +0000, William Liu wrote:
> >
> > > netem_enqueue's duplication prevention logic breaks when a netem
> > > resides in a qdisc tree with other netems - this can lead to a
> > > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > > Ensure that a duplicating netem cannot exist in a tree with other
> > > netems.
> >
> >
> > Thanks for providing more details.
> >
> > > Previous approaches suggested in discussions in chronological order:
> > >
> > > 1) Track duplication status or ttl in the sk_buff struct. Considered
> > > too specific a use case to extend such a struct, though this would
> > > be a resilient fix and address other previous and potential future
> > > DOS bugs like the one described in loopy fun [2].
> > >
> > > 2) Restrict netem_enqueue recursion depth like in act_mirred with a
> > > per cpu variable. However, netem_dequeue can call enqueue on its
> > > child, and the depth restriction could be bypassed if the child is a
> > > netem.
> > >
> > > 3) Use the same approach as in 2, but add metadata in netem_skb_cb
> > > to handle the netem_dequeue case and track a packet's involvement
> > > in duplication. This is an overly complex approach, and Jamal
> > > notes that the skb cb can be overwritten to circumvent this
> > > safeguard.
> >
> >
> > This approach looks most elegant to me since it is per-skb and only
> > contained for netem. Since netem_skb_cb is shared among qdisc's, what
> > about just extending qdisc_skb_cb? Something like:
> >
> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > index 638948be4c50..4c5505661986 100644
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -436,6 +436,7 @@ struct qdisc_skb_cb {
> > unsigned int pkt_len;
> > u16 slave_dev_queue_mapping;
> > u16 tc_classid;
> > + u32 reserved;
> > };
> > #define QDISC_CB_PRIV_LEN 20
> > unsigned char data[QDISC_CB_PRIV_LEN];
> >
> >
> > Then we just set and check it for duplicated skbs:
> >
> >
> > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > index fdd79d3ccd8c..4290f8fca0e9 100644
> > --- a/net/sched/sch_netem.c
> > +++ b/net/sched/sch_netem.c
> > @@ -486,7 +486,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > * If we need to duplicate packet, then clone it before
> > * original is modified.
> > */
> > - if (count > 1)
> >
> > + if (count > 1 && !qdisc_skb_cb(skb)->reserved)
> >
> > skb2 = skb_clone(skb, GFP_ATOMIC);
> >
> > /*
> > @@ -540,9 +540,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > struct Qdisc rootq = qdisc_root_bh(sch);
> > u32 dupsave = q->duplicate; / prevent duplicating a dup... */
> >
> >
> > - q->duplicate = 0;
> >
> > + qdisc_skb_cb(skb2)->reserved = dupsave;
> >
> > rootq->enqueue(skb2, rootq, to_free);
> >
> > - q->duplicate = dupsave;
> >
> > skb2 = NULL;
> > }
> >
> >
> > Could this work? It looks even shorter than your patch. :-)
> >
> > Note, I don't even compile test it, I just show it to you for discussion.
> >
>
> Thank you for the suggestion. Jamal, would this work in this case? I recall you mentioning that the cb approach can be circumvented by zeroing the cb at the root:
your approach was to overwrite the netem specific cb which is exposed
via the cb ->data that can be overwritten for example by a trivial
ebpf program attach to any level of the hierarchy. This specific
variant from Cong is not accessible to ebpf but as i expressed my view
in other email i feel it is not a good solution.
cheers,
jamal
https://lore.kernel.org/netdev/CAM0EoMk4dxOFoN_=3yOy+XrtU=yvjJXAw3fVTmN9=M=R=vtbxA@mail.gmail.com/
>
> I'm not sure if qdisc_skb_cb would be different than the case for private data for qdiscs.
>
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-06-28 21:25 ` Jamal Hadi Salim
@ 2025-06-29 20:16 ` Cong Wang
2025-06-30 11:32 ` Jamal Hadi Salim
0 siblings, 1 reply; 16+ messages in thread
From: Cong Wang @ 2025-06-29 20:16 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: William Liu, netdev, victor, pctammela, pabeni, kuba, stephen,
dcaratti, savy, jiri, davem, edumazet, horms
On Sat, Jun 28, 2025 at 05:25:25PM -0400, Jamal Hadi Salim wrote:
> your approach was to overwrite the netem specific cb which is exposed
> via the cb ->data that can be overwritten for example by a trivial
> ebpf program attach to any level of the hierarchy. This specific
> variant from Cong is not accessible to ebpf but as i expressed my view
> in other email i feel it is not a good solution.
>
> https://lore.kernel.org/netdev/CAM0EoMk4dxOFoN_=3yOy+XrtU=yvjJXAw3fVTmN9=M=R=vtbxA@mail.gmail.com/
Hi Jamal,
I have two concerns regarding your/Will's proposal:
1) I am not sure whether disallowing such case is safe. From my
understanding this case is not obviously or logically wrong. So if we
disallow it, we may have a chance to break some application.
2) Singling out this case looks not elegant to me. Even _if_ we really
want to disallow such case, we still need to find a better and more
elegant way to do so, for example, adding a new ops for netem and calling
it in sch_api.c. Something like below:
// Implement netem_avoid_duplicate()
// ...
static struct Qdisc_ops netem_qdisc_ops __read_mostly = {
.avoid_duplicate = netem_avoid_duplicate,
};
// In sch_api.c
// traverse the Qdisch hierarch and call ->avoid_duplicate()
What do you think?
Thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-06-29 20:16 ` Cong Wang
@ 2025-06-30 11:32 ` Jamal Hadi Salim
2025-06-30 22:39 ` Cong Wang
0 siblings, 1 reply; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-06-30 11:32 UTC (permalink / raw)
To: Cong Wang
Cc: William Liu, netdev, victor, pctammela, pabeni, kuba, stephen,
dcaratti, savy, jiri, davem, edumazet, horms
On Sun, Jun 29, 2025 at 4:16 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Sat, Jun 28, 2025 at 05:25:25PM -0400, Jamal Hadi Salim wrote:
> > your approach was to overwrite the netem specific cb which is exposed
> > via the cb ->data that can be overwritten for example by a trivial
> > ebpf program attach to any level of the hierarchy. This specific
> > variant from Cong is not accessible to ebpf but as i expressed my view
> > in other email i feel it is not a good solution.
> >
> > https://lore.kernel.org/netdev/CAM0EoMk4dxOFoN_=3yOy+XrtU=yvjJXAw3fVTmN9=M=R=vtbxA@mail.gmail.com/
>
> Hi Jamal,
>
> I have two concerns regarding your/Will's proposal:
>
> 1) I am not sure whether disallowing such case is safe. From my
> understanding this case is not obviously or logically wrong. So if we
> disallow it, we may have a chance to break some application.
>
I dont intentionaly creating a loop-inside-a-loop as being correct.
Stephen, is this a legit use case?
Agreed that we need to be careful about some corner cases which may
look crazy but are legit.
> 2) Singling out this case looks not elegant to me.
My thinking is to long term disallow all nonsense hierarchy use cases,
such as this one, with some
"feature bits". ATM, it's easy to catch the bad configs within a
single qdisc in ->init() but currently not possible if it affects a
hierarchy.
For starters this is the first one such "deny" feature specific to
netem to not allow hierarchies where we can have a netem loop inside a
loop. It didnt seem like we need the infrastructure just to handle one
case and for that reason i thought William's solution was reasonable.
I could swear i posted a sample patch a while back but i cant find it
now.
The other obvious case seems to be ->qlen_notify() effect (but there
may be other solutions to that one which we can discuss).
The problem we have is all the bounty hunting is focussed on finding
such nonsensical hierarchy (and it is getting exhausting). We can keep
fixing things that break because the bounty hunters find another
netlink message to send that will put it back into an awkward
position.
> Even _if_ we really
> want to disallow such case, we still need to find a better and more
> elegant way to do so, for example, adding a new ops for netem and calling
> it in sch_api.c. Something like below:
>
> // Implement netem_avoid_duplicate()
> // ...
>
> static struct Qdisc_ops netem_qdisc_ops __read_mostly = {
> .avoid_duplicate = netem_avoid_duplicate,
> };
>
> // In sch_api.c
> // traverse the Qdisch hierarch and call ->avoid_duplicate()
>
> What do you think?
>
Could this not be circumvented by some filter setting this to some bad
state? The answer seems to be "yes" but i may be misreading.
If you can isolate the solution to just netem and netem fields then it
should be fine. What i am not comfortable with is adding some feature
or metadata that is available across all qdiscs just to solve this
(nonsensical) hierarchy use case.
cheers,
jamal
> Thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-06-30 11:32 ` Jamal Hadi Salim
@ 2025-06-30 22:39 ` Cong Wang
2025-07-01 13:36 ` Paolo Abeni
2025-07-01 14:15 ` Jamal Hadi Salim
0 siblings, 2 replies; 16+ messages in thread
From: Cong Wang @ 2025-06-30 22:39 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: William Liu, netdev, victor, pctammela, pabeni, kuba, stephen,
dcaratti, savy, jiri, davem, edumazet, horms
On Mon, Jun 30, 2025 at 07:32:48AM -0400, Jamal Hadi Salim wrote:
> On Sun, Jun 29, 2025 at 4:16 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Sat, Jun 28, 2025 at 05:25:25PM -0400, Jamal Hadi Salim wrote:
> > > your approach was to overwrite the netem specific cb which is exposed
> > > via the cb ->data that can be overwritten for example by a trivial
> > > ebpf program attach to any level of the hierarchy. This specific
> > > variant from Cong is not accessible to ebpf but as i expressed my view
> > > in other email i feel it is not a good solution.
> > >
> > > https://lore.kernel.org/netdev/CAM0EoMk4dxOFoN_=3yOy+XrtU=yvjJXAw3fVTmN9=M=R=vtbxA@mail.gmail.com/
> >
> > Hi Jamal,
> >
> > I have two concerns regarding your/Will's proposal:
> >
> > 1) I am not sure whether disallowing such case is safe. From my
> > understanding this case is not obviously or logically wrong. So if we
> > disallow it, we may have a chance to break some application.
> >
>
> I dont intentionaly creating a loop-inside-a-loop as being correct.
> Stephen, is this a legit use case?
> Agreed that we need to be careful about some corner cases which may
> look crazy but are legit.
Maybe I misunderstand your patch, to me duplicating packets in
parallel sub-hierarchy is not wrong, may be even useful.
>
> > 2) Singling out this case looks not elegant to me.
>
> My thinking is to long term disallow all nonsense hierarchy use cases,
> such as this one, with some
> "feature bits". ATM, it's easy to catch the bad configs within a
> single qdisc in ->init() but currently not possible if it affects a
> hierarchy.
The problem with this is it becomes harder to get a clear big picture,
today netem, tomorrow maybe hfsc etc.? We could end up with hiding such
bad-config-prevention code in different Qdisc's.
With the approach I suggested, we have a central place (probably
sch_api.c) to have all the logics, nothing is hidden, easier to
understand and easier to introduce more bad-config-prevention code.
I hope this makes sense to you.
Thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-06-30 22:39 ` Cong Wang
@ 2025-07-01 13:36 ` Paolo Abeni
2025-07-01 14:15 ` Jamal Hadi Salim
1 sibling, 0 replies; 16+ messages in thread
From: Paolo Abeni @ 2025-07-01 13:36 UTC (permalink / raw)
To: Cong Wang, Jamal Hadi Salim
Cc: William Liu, netdev, victor, pctammela, kuba, stephen, dcaratti,
savy, jiri, davem, edumazet, horms
On 7/1/25 12:39 AM, Cong Wang wrote:
> On Mon, Jun 30, 2025 at 07:32:48AM -0400, Jamal Hadi Salim wrote:
>> On Sun, Jun 29, 2025 at 4:16 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> On Sat, Jun 28, 2025 at 05:25:25PM -0400, Jamal Hadi Salim wrote:
>>>> your approach was to overwrite the netem specific cb which is exposed
>>>> via the cb ->data that can be overwritten for example by a trivial
>>>> ebpf program attach to any level of the hierarchy. This specific
>>>> variant from Cong is not accessible to ebpf but as i expressed my view
>>>> in other email i feel it is not a good solution.
>>>>
>>>> https://lore.kernel.org/netdev/CAM0EoMk4dxOFoN_=3yOy+XrtU=yvjJXAw3fVTmN9=M=R=vtbxA@mail.gmail.com/
>>>
>>> Hi Jamal,
>>>
>>> I have two concerns regarding your/Will's proposal:
>>>
>>> 1) I am not sure whether disallowing such case is safe. From my
>>> understanding this case is not obviously or logically wrong. So if we
>>> disallow it, we may have a chance to break some application.
>>>
>>
>> I dont intentionaly creating a loop-inside-a-loop as being correct.
>> Stephen, is this a legit use case?
>> Agreed that we need to be careful about some corner cases which may
>> look crazy but are legit.
>
> Maybe I misunderstand your patch, to me duplicating packets in
> parallel sub-hierarchy is not wrong, may be even useful.
>
>>
>>> 2) Singling out this case looks not elegant to me.
>>
>> My thinking is to long term disallow all nonsense hierarchy use cases,
>> such as this one, with some
>> "feature bits". ATM, it's easy to catch the bad configs within a
>> single qdisc in ->init() but currently not possible if it affects a
>> hierarchy.
>
> The problem with this is it becomes harder to get a clear big picture,
> today netem, tomorrow maybe hfsc etc.? We could end up with hiding such
> bad-config-prevention code in different Qdisc's.
>
> With the approach I suggested, we have a central place (probably
> sch_api.c) to have all the logics, nothing is hidden, easier to
> understand and easier to introduce more bad-config-prevention code.
Since an agreement about an optimal long term solution looks not
trivial, and the proposed code is AFACS solving the issue at hand, WDYT
about accepting this one and incrementally improve it as needed?
/P
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-06-30 22:39 ` Cong Wang
2025-07-01 13:36 ` Paolo Abeni
@ 2025-07-01 14:15 ` Jamal Hadi Salim
2025-07-01 17:31 ` Cong Wang
1 sibling, 1 reply; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-01 14:15 UTC (permalink / raw)
To: Cong Wang
Cc: William Liu, netdev, victor, pctammela, pabeni, kuba, stephen,
dcaratti, savy, jiri, davem, edumazet, horms
On Mon, Jun 30, 2025 at 6:39 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Mon, Jun 30, 2025 at 07:32:48AM -0400, Jamal Hadi Salim wrote:
> > On Sun, Jun 29, 2025 at 4:16 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > >
> > > On Sat, Jun 28, 2025 at 05:25:25PM -0400, Jamal Hadi Salim wrote:
> > > > your approach was to overwrite the netem specific cb which is exposed
> > > > via the cb ->data that can be overwritten for example by a trivial
> > > > ebpf program attach to any level of the hierarchy. This specific
> > > > variant from Cong is not accessible to ebpf but as i expressed my view
> > > > in other email i feel it is not a good solution.
> > > >
> > > > https://lore.kernel.org/netdev/CAM0EoMk4dxOFoN_=3yOy+XrtU=yvjJXAw3fVTmN9=M=R=vtbxA@mail.gmail.com/
> > >
> > > Hi Jamal,
> > >
> > > I have two concerns regarding your/Will's proposal:
> > >
> > > 1) I am not sure whether disallowing such case is safe. From my
> > > understanding this case is not obviously or logically wrong. So if we
> > > disallow it, we may have a chance to break some application.
> > >
> >
> > I dont intentionaly creating a loop-inside-a-loop as being correct.
> > Stephen, is this a legit use case?
> > Agreed that we need to be careful about some corner cases which may
> > look crazy but are legit.
>
> Maybe I misunderstand your patch, to me duplicating packets in
> parallel sub-hierarchy is not wrong, may be even useful.
>
TBH, there's no real world value for that specific config/repro and
worse that it causes the infinite loop.
I also cant see a good reason to have multiple netem children that all
loop back to root.
If there is one, we are going to find out when the patch goes in and
someone complains.
> >
> > > 2) Singling out this case looks not elegant to me.
> >
> > My thinking is to long term disallow all nonsense hierarchy use cases,
> > such as this one, with some
> > "feature bits". ATM, it's easy to catch the bad configs within a
> > single qdisc in ->init() but currently not possible if it affects a
> > hierarchy.
>
> The problem with this is it becomes harder to get a clear big picture,
> today netem, tomorrow maybe hfsc etc.? We could end up with hiding such
> bad-config-prevention code in different Qdisc's.
>
> With the approach I suggested, we have a central place (probably
> sch_api.c) to have all the logics, nothing is hidden, easier to
> understand and easier to introduce more bad-config-prevention code.
>
> I hope this makes sense to you.
>
To me the most positive outcome from the bounty hunters is getting
clarity that we not only need a per-qdisc validation as we do today,
but per-hierarchy as well; however, that is a different discussion we
can have after.
IIUC, we may be saying the same thing - a generic way to do hierarchy
validation. I even had a patch which i didnt think was the right thing
to do at the time. We can have that discussion.
But let's _please_ move forward with this patch, it fixes the
outstanding issues then we can discuss the best path forward more
calmly. The issue this patch fixes can be retrofitted into whatever
new scheme that we agree on after (and we may have to undo all the
backlog fixes as well).
cheers,
jamal
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-07-01 14:15 ` Jamal Hadi Salim
@ 2025-07-01 17:31 ` Cong Wang
2025-07-01 17:37 ` William Liu
2025-07-01 19:08 ` Jamal Hadi Salim
0 siblings, 2 replies; 16+ messages in thread
From: Cong Wang @ 2025-07-01 17:31 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: William Liu, netdev, victor, pctammela, pabeni, kuba, stephen,
dcaratti, savy, jiri, davem, edumazet, horms
On Tue, Jul 01, 2025 at 10:15:10AM -0400, Jamal Hadi Salim wrote:
> On Mon, Jun 30, 2025 at 6:39 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Mon, Jun 30, 2025 at 07:32:48AM -0400, Jamal Hadi Salim wrote:
> > > On Sun, Jun 29, 2025 at 4:16 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > >
> > > > On Sat, Jun 28, 2025 at 05:25:25PM -0400, Jamal Hadi Salim wrote:
> > > > > your approach was to overwrite the netem specific cb which is exposed
> > > > > via the cb ->data that can be overwritten for example by a trivial
> > > > > ebpf program attach to any level of the hierarchy. This specific
> > > > > variant from Cong is not accessible to ebpf but as i expressed my view
> > > > > in other email i feel it is not a good solution.
> > > > >
> > > > > https://lore.kernel.org/netdev/CAM0EoMk4dxOFoN_=3yOy+XrtU=yvjJXAw3fVTmN9=M=R=vtbxA@mail.gmail.com/
> > > >
> > > > Hi Jamal,
> > > >
> > > > I have two concerns regarding your/Will's proposal:
> > > >
> > > > 1) I am not sure whether disallowing such case is safe. From my
> > > > understanding this case is not obviously or logically wrong. So if we
> > > > disallow it, we may have a chance to break some application.
> > > >
> > >
> > > I dont intentionaly creating a loop-inside-a-loop as being correct.
> > > Stephen, is this a legit use case?
> > > Agreed that we need to be careful about some corner cases which may
> > > look crazy but are legit.
> >
> > Maybe I misunderstand your patch, to me duplicating packets in
> > parallel sub-hierarchy is not wrong, may be even useful.
> >
>
> TBH, there's no real world value for that specific config/repro and
> worse that it causes the infinite loop.
> I also cant see a good reason to have multiple netem children that all
> loop back to root.
> If there is one, we are going to find out when the patch goes in and
> someone complains.
I tend to be conservative here since breaking potential users is not a
good practice. It takes a long time for regular users to realize this
get removed since many of them use long term stable releases rather than
the latest release.
Also, the patch using qdisc_skb_cb() looks smaller than this one,
which means it is easier to review.
>
> > >
> > > > 2) Singling out this case looks not elegant to me.
> > >
> > > My thinking is to long term disallow all nonsense hierarchy use cases,
> > > such as this one, with some
> > > "feature bits". ATM, it's easy to catch the bad configs within a
> > > single qdisc in ->init() but currently not possible if it affects a
> > > hierarchy.
> >
> > The problem with this is it becomes harder to get a clear big picture,
> > today netem, tomorrow maybe hfsc etc.? We could end up with hiding such
> > bad-config-prevention code in different Qdisc's.
> >
> > With the approach I suggested, we have a central place (probably
> > sch_api.c) to have all the logics, nothing is hidden, easier to
> > understand and easier to introduce more bad-config-prevention code.
> >
> > I hope this makes sense to you.
> >
>
> To me the most positive outcome from the bounty hunters is getting
> clarity that we not only need a per-qdisc validation as we do today,
> but per-hierarchy as well; however, that is a different discussion we
> can have after.
>
> IIUC, we may be saying the same thing - a generic way to do hierarchy
> validation. I even had a patch which i didnt think was the right thing
> to do at the time. We can have that discussion.
Why not? It is not even necessarily more complex to have a generic
solution. With AI copilot, it is pretty quick. :)
FYI: I wrote the GSO segmentation patches with AI, they work well to fix
the UAF report by Mingi. I can post them at any time, just waiting for
Mingi's response to decide whether they are for -net or -net-next. I
hope this more complicated case could convince you to use AI to write
kernel code, if you still haven't.
>
> But let's _please_ move forward with this patch, it fixes the
> outstanding issues then we can discuss the best path forward more
> calmly. The issue this patch fixes can be retrofitted into whatever
> new scheme that we agree on after (and we may have to undo all the
> backlog fixes as well).
Sure, I will send out a patch to use qdisc_skb_cb() to help everyone out
of this situation.
Thanks!
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-07-01 17:31 ` Cong Wang
@ 2025-07-01 17:37 ` William Liu
2025-07-01 19:08 ` Jamal Hadi Salim
1 sibling, 0 replies; 16+ messages in thread
From: William Liu @ 2025-07-01 17:37 UTC (permalink / raw)
To: Cong Wang
Cc: Jamal Hadi Salim, netdev, victor, pctammela, pabeni, kuba,
stephen, dcaratti, savy, jiri, davem, edumazet, horms
On Tuesday, July 1st, 2025 at 5:31 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
>
> On Tue, Jul 01, 2025 at 10:15:10AM -0400, Jamal Hadi Salim wrote:
>
> > On Mon, Jun 30, 2025 at 6:39 PM Cong Wang xiyou.wangcong@gmail.com wrote:
> >
> > > On Mon, Jun 30, 2025 at 07:32:48AM -0400, Jamal Hadi Salim wrote:
> > >
> > > > On Sun, Jun 29, 2025 at 4:16 PM Cong Wang xiyou.wangcong@gmail.com wrote:
> > > >
> > > > > On Sat, Jun 28, 2025 at 05:25:25PM -0400, Jamal Hadi Salim wrote:
> > > > >
> > > > > > your approach was to overwrite the netem specific cb which is exposed
> > > > > > via the cb ->data that can be overwritten for example by a trivial
> > > > > > ebpf program attach to any level of the hierarchy. This specific
> > > > > > variant from Cong is not accessible to ebpf but as i expressed my view
> > > > > > in other email i feel it is not a good solution.
> > > > > >
> > > > > > https://lore.kernel.org/netdev/CAM0EoMk4dxOFoN_=3yOy+XrtU=yvjJXAw3fVTmN9=M=R=vtbxA@mail.gmail.com/
> > > > >
> > > > > Hi Jamal,
> > > > >
> > > > > I have two concerns regarding your/Will's proposal:
> > > > >
> > > > > 1) I am not sure whether disallowing such case is safe. From my
> > > > > understanding this case is not obviously or logically wrong. So if we
> > > > > disallow it, we may have a chance to break some application.
> > > >
> > > > I dont intentionaly creating a loop-inside-a-loop as being correct.
> > > > Stephen, is this a legit use case?
> > > > Agreed that we need to be careful about some corner cases which may
> > > > look crazy but are legit.
> > >
> > > Maybe I misunderstand your patch, to me duplicating packets in
> > > parallel sub-hierarchy is not wrong, may be even useful.
> >
> > TBH, there's no real world value for that specific config/repro and
> > worse that it causes the infinite loop.
> > I also cant see a good reason to have multiple netem children that all
> > loop back to root.
> > If there is one, we are going to find out when the patch goes in and
> > someone complains.
>
>
> I tend to be conservative here since breaking potential users is not a
> good practice. It takes a long time for regular users to realize this
> get removed since many of them use long term stable releases rather than
> the latest release.
>
> Also, the patch using qdisc_skb_cb() looks smaller than this one,
> which means it is easier to review.
>
> > > > > 2) Singling out this case looks not elegant to me.
> > > >
> > > > My thinking is to long term disallow all nonsense hierarchy use cases,
> > > > such as this one, with some
> > > > "feature bits". ATM, it's easy to catch the bad configs within a
> > > > single qdisc in ->init() but currently not possible if it affects a
> > > > hierarchy.
> > >
> > > The problem with this is it becomes harder to get a clear big picture,
> > > today netem, tomorrow maybe hfsc etc.? We could end up with hiding such
> > > bad-config-prevention code in different Qdisc's.
> > >
> > > With the approach I suggested, we have a central place (probably
> > > sch_api.c) to have all the logics, nothing is hidden, easier to
> > > understand and easier to introduce more bad-config-prevention code.
> > >
> > > I hope this makes sense to you.
> >
> > To me the most positive outcome from the bounty hunters is getting
> > clarity that we not only need a per-qdisc validation as we do today,
> > but per-hierarchy as well; however, that is a different discussion we
> > can have after.
> >
> > IIUC, we may be saying the same thing - a generic way to do hierarchy
> > validation. I even had a patch which i didnt think was the right thing
> > to do at the time. We can have that discussion.
>
>
> Why not? It is not even necessarily more complex to have a generic
> solution. With AI copilot, it is pretty quick. :)
>
> FYI: I wrote the GSO segmentation patches with AI, they work well to fix
> the UAF report by Mingi. I can post them at any time, just waiting for
> Mingi's response to decide whether they are for -net or -net-next. I
> hope this more complicated case could convince you to use AI to write
> kernel code, if you still haven't.
>
> > But let's please move forward with this patch, it fixes the
> > outstanding issues then we can discuss the best path forward more
> > calmly. The issue this patch fixes can be retrofitted into whatever
> > new scheme that we agree on after (and we may have to undo all the
> > backlog fixes as well).
>
>
> Sure, I will send out a patch to use qdisc_skb_cb() to help everyone out
> of this situation.
>
I can send a patch with that variant for you all to compare sometime in the next 24 hrs - I have been working on fixing this back and forth with Jamal for the past month.
Best,
Will
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-07-01 17:31 ` Cong Wang
2025-07-01 17:37 ` William Liu
@ 2025-07-01 19:08 ` Jamal Hadi Salim
1 sibling, 0 replies; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-07-01 19:08 UTC (permalink / raw)
To: Cong Wang
Cc: William Liu, netdev, victor, pctammela, pabeni, kuba, stephen,
dcaratti, savy, jiri, davem, edumazet, horms
On Tue, Jul 1, 2025 at 1:31 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Tue, Jul 01, 2025 at 10:15:10AM -0400, Jamal Hadi Salim wrote:
> > On Mon, Jun 30, 2025 at 6:39 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > >
> > > On Mon, Jun 30, 2025 at 07:32:48AM -0400, Jamal Hadi Salim wrote:
> > > > On Sun, Jun 29, 2025 at 4:16 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > > >
> > > > > On Sat, Jun 28, 2025 at 05:25:25PM -0400, Jamal Hadi Salim wrote:
> > > > > > your approach was to overwrite the netem specific cb which is exposed
> > > > > > via the cb ->data that can be overwritten for example by a trivial
> > > > > > ebpf program attach to any level of the hierarchy. This specific
> > > > > > variant from Cong is not accessible to ebpf but as i expressed my view
> > > > > > in other email i feel it is not a good solution.
> > > > > >
> > > > > > https://lore.kernel.org/netdev/CAM0EoMk4dxOFoN_=3yOy+XrtU=yvjJXAw3fVTmN9=M=R=vtbxA@mail.gmail.com/
> > > > >
> > > > > Hi Jamal,
> > > > >
> > > > > I have two concerns regarding your/Will's proposal:
> > > > >
> > > > > 1) I am not sure whether disallowing such case is safe. From my
> > > > > understanding this case is not obviously or logically wrong. So if we
> > > > > disallow it, we may have a chance to break some application.
> > > > >
> > > >
> > > > I dont intentionaly creating a loop-inside-a-loop as being correct.
> > > > Stephen, is this a legit use case?
> > > > Agreed that we need to be careful about some corner cases which may
> > > > look crazy but are legit.
> > >
> > > Maybe I misunderstand your patch, to me duplicating packets in
> > > parallel sub-hierarchy is not wrong, may be even useful.
> > >
> >
> > TBH, there's no real world value for that specific config/repro and
> > worse that it causes the infinite loop.
> > I also cant see a good reason to have multiple netem children that all
> > loop back to root.
> > If there is one, we are going to find out when the patch goes in and
> > someone complains.
>
> I tend to be conservative here since breaking potential users is not a
> good practice. It takes a long time for regular users to realize this
> get removed since many of them use long term stable releases rather than
> the latest release.
>
> Also, the patch using qdisc_skb_cb() looks smaller than this one,
> which means it is easier to review.
>
I think we are at a stalemate.
> >
> > > >
> > > > > 2) Singling out this case looks not elegant to me.
> > > >
> > > > My thinking is to long term disallow all nonsense hierarchy use cases,
> > > > such as this one, with some
> > > > "feature bits". ATM, it's easy to catch the bad configs within a
> > > > single qdisc in ->init() but currently not possible if it affects a
> > > > hierarchy.
> > >
> > > The problem with this is it becomes harder to get a clear big picture,
> > > today netem, tomorrow maybe hfsc etc.? We could end up with hiding such
> > > bad-config-prevention code in different Qdisc's.
> > >
> > > With the approach I suggested, we have a central place (probably
> > > sch_api.c) to have all the logics, nothing is hidden, easier to
> > > understand and easier to introduce more bad-config-prevention code.
> > >
> > > I hope this makes sense to you.
> > >
> >
> > To me the most positive outcome from the bounty hunters is getting
> > clarity that we not only need a per-qdisc validation as we do today,
> > but per-hierarchy as well; however, that is a different discussion we
> > can have after.
> >
> > IIUC, we may be saying the same thing - a generic way to do hierarchy
> > validation. I even had a patch which i didnt think was the right thing
> > to do at the time. We can have that discussion.
>
> Why not? It is not even necessarily more complex to have a generic
> solution. With AI copilot, it is pretty quick. :)
>
> FYI: I wrote the GSO segmentation patches with AI, they work well to fix
> the UAF report by Mingi. I can post them at any time, just waiting for
> Mingi's response to decide whether they are for -net or -net-next. I
> hope this more complicated case could convince you to use AI to write
> kernel code, if you still haven't.
>
> >
> > But let's _please_ move forward with this patch, it fixes the
> > outstanding issues then we can discuss the best path forward more
> > calmly. The issue this patch fixes can be retrofitted into whatever
> > new scheme that we agree on after (and we may have to undo all the
> > backlog fixes as well).
>
> Sure, I will send out a patch to use qdisc_skb_cb() to help everyone out
> of this situation.
>
I really dont see the point. Let's get Williams patch in please.
cheers,
jamal
> Thanks!
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-06-28 0:15 ` Cong Wang
2025-06-28 4:23 ` William Liu
@ 2025-06-28 15:15 ` Stephen Hemminger
2025-06-28 21:15 ` Jamal Hadi Salim
2025-07-01 18:11 ` Eric Dumazet
1 sibling, 2 replies; 16+ messages in thread
From: Stephen Hemminger @ 2025-06-28 15:15 UTC (permalink / raw)
To: Cong Wang
Cc: William Liu, netdev, jhs, victor, pctammela, pabeni, kuba,
dcaratti, savy, jiri, davem, edumazet, horms
On Fri, 27 Jun 2025 17:15:08 -0700
Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Fri, Jun 27, 2025 at 06:17:31AM +0000, William Liu wrote:
> > netem_enqueue's duplication prevention logic breaks when a netem
> > resides in a qdisc tree with other netems - this can lead to a
> > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > Ensure that a duplicating netem cannot exist in a tree with other
> > netems.
> >
>
> Thanks for providing more details.
>
> > Previous approaches suggested in discussions in chronological order:
> >
> > 1) Track duplication status or ttl in the sk_buff struct. Considered
> > too specific a use case to extend such a struct, though this would
> > be a resilient fix and address other previous and potential future
> > DOS bugs like the one described in loopy fun [2].
> >
> > 2) Restrict netem_enqueue recursion depth like in act_mirred with a
> > per cpu variable. However, netem_dequeue can call enqueue on its
> > child, and the depth restriction could be bypassed if the child is a
> > netem.
> >
> > 3) Use the same approach as in 2, but add metadata in netem_skb_cb
> > to handle the netem_dequeue case and track a packet's involvement
> > in duplication. This is an overly complex approach, and Jamal
> > notes that the skb cb can be overwritten to circumvent this
> > safeguard.
>
> This approach looks most elegant to me since it is per-skb and only
> contained for netem. Since netem_skb_cb is shared among qdisc's, what
> about just extending qdisc_skb_cb? Something like:
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 638948be4c50..4c5505661986 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -436,6 +436,7 @@ struct qdisc_skb_cb {
> unsigned int pkt_len;
> u16 slave_dev_queue_mapping;
> u16 tc_classid;
> + u32 reserved;
> };
> #define QDISC_CB_PRIV_LEN 20
> unsigned char data[QDISC_CB_PRIV_LEN];
>
>
> Then we just set and check it for duplicated skbs:
>
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index fdd79d3ccd8c..4290f8fca0e9 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -486,7 +486,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> * If we need to duplicate packet, then clone it before
> * original is modified.
> */
> - if (count > 1)
> + if (count > 1 && !qdisc_skb_cb(skb)->reserved)
> skb2 = skb_clone(skb, GFP_ATOMIC);
>
> /*
> @@ -540,9 +540,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> struct Qdisc *rootq = qdisc_root_bh(sch);
> u32 dupsave = q->duplicate; /* prevent duplicating a dup... */
>
> - q->duplicate = 0;
> + qdisc_skb_cb(skb2)->reserved = dupsave;
> rootq->enqueue(skb2, rootq, to_free);
> - q->duplicate = dupsave;
> skb2 = NULL;
> }
>
>
> Could this work? It looks even shorter than your patch. :-)
>
> Note, I don't even compile test it, I just show it to you for discussion.
>
> Regards,
> Cong Wang
Looks like an ok workaround, but the name 'reserved' is most often used
as placeholder in API's. Maybe something like 'duplicated'?
Why a whole u32 for one flag?
This increases qdisc_skb_cb from 28 bytes to 32 bytes.
So still ok, but there should be a build check that it is less than
space in skb->cb.
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-06-28 15:15 ` Stephen Hemminger
@ 2025-06-28 21:15 ` Jamal Hadi Salim
2025-07-01 18:11 ` Eric Dumazet
1 sibling, 0 replies; 16+ messages in thread
From: Jamal Hadi Salim @ 2025-06-28 21:15 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Cong Wang, William Liu, netdev, victor, pctammela, pabeni, kuba,
dcaratti, savy, jiri, davem, edumazet, horms
On Sat, Jun 28, 2025 at 11:15 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Fri, 27 Jun 2025 17:15:08 -0700
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> > On Fri, Jun 27, 2025 at 06:17:31AM +0000, William Liu wrote:
> > > netem_enqueue's duplication prevention logic breaks when a netem
> > > resides in a qdisc tree with other netems - this can lead to a
> > > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > > Ensure that a duplicating netem cannot exist in a tree with other
> > > netems.
> > >
> >
> > Thanks for providing more details.
> >
> > > Previous approaches suggested in discussions in chronological order:
> > >
> > > 1) Track duplication status or ttl in the sk_buff struct. Considered
> > > too specific a use case to extend such a struct, though this would
> > > be a resilient fix and address other previous and potential future
> > > DOS bugs like the one described in loopy fun [2].
> > >
> > > 2) Restrict netem_enqueue recursion depth like in act_mirred with a
> > > per cpu variable. However, netem_dequeue can call enqueue on its
> > > child, and the depth restriction could be bypassed if the child is a
> > > netem.
> > >
> > > 3) Use the same approach as in 2, but add metadata in netem_skb_cb
> > > to handle the netem_dequeue case and track a packet's involvement
> > > in duplication. This is an overly complex approach, and Jamal
> > > notes that the skb cb can be overwritten to circumvent this
> > > safeguard.
> >
> > This approach looks most elegant to me since it is per-skb and only
> > contained for netem. Since netem_skb_cb is shared among qdisc's, what
> > about just extending qdisc_skb_cb? Something like:
> >
> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > index 638948be4c50..4c5505661986 100644
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -436,6 +436,7 @@ struct qdisc_skb_cb {
> > unsigned int pkt_len;
> > u16 slave_dev_queue_mapping;
> > u16 tc_classid;
> > + u32 reserved;
> > };
> > #define QDISC_CB_PRIV_LEN 20
> > unsigned char data[QDISC_CB_PRIV_LEN];
> >
> >
> > Then we just set and check it for duplicated skbs:
> >
> >
> > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > index fdd79d3ccd8c..4290f8fca0e9 100644
> > --- a/net/sched/sch_netem.c
> > +++ b/net/sched/sch_netem.c
> > @@ -486,7 +486,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > * If we need to duplicate packet, then clone it before
> > * original is modified.
> > */
> > - if (count > 1)
> > + if (count > 1 && !qdisc_skb_cb(skb)->reserved)
> > skb2 = skb_clone(skb, GFP_ATOMIC);
> >
> > /*
> > @@ -540,9 +540,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > struct Qdisc *rootq = qdisc_root_bh(sch);
> > u32 dupsave = q->duplicate; /* prevent duplicating a dup... */
> >
> > - q->duplicate = 0;
> > + qdisc_skb_cb(skb2)->reserved = dupsave;
> > rootq->enqueue(skb2, rootq, to_free);
> > - q->duplicate = dupsave;
> > skb2 = NULL;
> > }
> >
> >
> > Could this work? It looks even shorter than your patch. :-)
> >
> > Note, I don't even compile test it, I just show it to you for discussion.
> >
> > Regards,
> > Cong Wang
>
> Looks like an ok workaround, but the name 'reserved' is most often used
> as placeholder in API's. Maybe something like 'duplicated'?
>
> Why a whole u32 for one flag?
>
> This increases qdisc_skb_cb from 28 bytes to 32 bytes.
> So still ok, but there should be a build check that it is less than
> space in skb->cb.
>
This is a nonsensical setup (in this case, it is nonsensical to have a
netem duplicate with a child netem in the hierarchy both with
duplicates) and imo adding fields to handle nonsensical setups is not
a good idea. I doubt that code will compile as is for example..
From my POV Ive had it with these nonsensical setups from the bounty
hunting crowd (the majority of the pawning ones are nonsensical) -
disallowing these setups by adding deny lists is a good approach. I
would not have minded to add the field if this was a legitimate
setup....
cheers,
jamal
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-06-28 15:15 ` Stephen Hemminger
2025-06-28 21:15 ` Jamal Hadi Salim
@ 2025-07-01 18:11 ` Eric Dumazet
2025-07-01 18:46 ` William Liu
1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2025-07-01 18:11 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Cong Wang, William Liu, netdev, jhs, victor, pctammela, pabeni,
kuba, dcaratti, savy, jiri, davem, horms
On Sat, Jun 28, 2025 at 8:15 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
>
>
> Why a whole u32 for one flag?
>
> This increases qdisc_skb_cb from 28 bytes to 32 bytes.
> So still ok, but there should be a build check that it is less than
> space in skb->cb.
I am pretty sure this will break some configs. This has been discussed
in the past.
I would guess drivers/net/amt.c would need some tweaks.
commit bec161add35b478a7746bf58bcdea6faa19129ef
Author: Taehee Yoo <ap420073@gmail.com>
Date: Sun Jan 7 14:42:41 2024 +0000
amt: do not use overwrapped cb area
Also net/core/filter.c:9740 would complain.
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree
2025-07-01 18:11 ` Eric Dumazet
@ 2025-07-01 18:46 ` William Liu
0 siblings, 0 replies; 16+ messages in thread
From: William Liu @ 2025-07-01 18:46 UTC (permalink / raw)
To: Eric Dumazet
Cc: Stephen Hemminger, Cong Wang, netdev, jhs, victor, pctammela,
pabeni, kuba, dcaratti, savy, jiri, davem, horms
On Tuesday, July 1st, 2025 at 6:11 PM, Eric Dumazet <edumazet@google.com> wrote:
>
>
> On Sat, Jun 28, 2025 at 8:15 AM Stephen Hemminger
> stephen@networkplumber.org wrote:
>
> > Why a whole u32 for one flag?
> >
> > This increases qdisc_skb_cb from 28 bytes to 32 bytes.
> > So still ok, but there should be a build check that it is less than
> > space in skb->cb.
>
>
> I am pretty sure this will break some configs. This has been discussed
> in the past.
>
> I would guess drivers/net/amt.c would need some tweaks.
>
> commit bec161add35b478a7746bf58bcdea6faa19129ef
> Author: Taehee Yoo ap420073@gmail.com
>
> Date: Sun Jan 7 14:42:41 2024 +0000
>
> amt: do not use overwrapped cb area
>
> Also net/core/filter.c:9740 would complain.
In that case, maybe it is safer to just break a config very few (if any) people use than risk spreading more breakage? I doubt people are stacking netems together as they would pretty easily encounter the OOM loop issue - the only legitimate use case is having multiple duplicating netems as children in separate paths of the tree.
Of course, this can all be solved if people would be happy to add a bit to sk_buff. From some brief testing, this did not add a byte, and even if it did, it shouldn't matter memory wise since we are backed by a slab allocator. And if we take that route, we might as well bring back the ttl fields Jamal was referring to in [1] and prevent potential future loop based DOS bugs.
I just hope this can be resolved soon, this bug report has been going on for quite a while (since May 6th).
Best,
Will
[1] https://lwn.net/Articles/719297/
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-07-01 19:09 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-27 6:17 [PATCH net v4 1/2] net/sched: Restrict conditions for adding duplicating netems to qdisc tree William Liu
2025-06-28 0:15 ` Cong Wang
2025-06-28 4:23 ` William Liu
2025-06-28 21:25 ` Jamal Hadi Salim
2025-06-29 20:16 ` Cong Wang
2025-06-30 11:32 ` Jamal Hadi Salim
2025-06-30 22:39 ` Cong Wang
2025-07-01 13:36 ` Paolo Abeni
2025-07-01 14:15 ` Jamal Hadi Salim
2025-07-01 17:31 ` Cong Wang
2025-07-01 17:37 ` William Liu
2025-07-01 19:08 ` Jamal Hadi Salim
2025-06-28 15:15 ` Stephen Hemminger
2025-06-28 21:15 ` Jamal Hadi Salim
2025-07-01 18:11 ` Eric Dumazet
2025-07-01 18:46 ` William Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).