* [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications
@ 2025-04-07 10:55 Toke Høiland-Jørgensen
2025-04-07 11:12 ` Jiri Pirko
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-04-07 10:55 UTC (permalink / raw)
To: Jamal Hadi Salim, Cong Wang, Jiri Pirko
Cc: Ilya Maximets, Toke Høiland-Jørgensen, Frode Nordahl,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
The tfilter_notify() and tfilter_del_notify() functions assume that
NLMSG_GOODSIZE is always enough to dump the filter chain. This is not
always the case, which can lead to silent notify failures (because the
return code of tfilter_notify() is not always checked). In particular,
this can lead to NLM_F_ECHO not being honoured even though an action
succeeds, which forces userspace to create workarounds[0].
Fix this by increasing the message size if dumping the filter chain into
the allocated skb fails. Use the size of the incoming skb as a size hint
if set, so we can start at a larger value when appropriate.
To trigger this, run the following commands:
# ip link add type veth
# tc qdisc replace dev veth0 root handle 1: fq_codel
# tc -echo filter add dev veth0 parent 1: u32 match u32 0 0 $(for i in $(seq 32); do echo action pedit munge ip dport set 22; done)
Before this fix, tc just returns:
Not a filter(cmd 2)
After the fix, we get the correct echo:
added filter dev veth0 parent 1: protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid not_in_hw
match 00000000/00000000 at 0
action order 1: pedit action pass keys 1
index 1 ref 1 bind 1
key #0 at 20: val 00000016 mask ffff0000
[repeated 32 times]
[0] https://github.com/openvswitch/ovs/commit/106ef21860c935e5e0017a88bf42b94025c4e511
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Frode Nordahl <frode.nordahl@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/2018500
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
net/sched/cls_api.c | 66 ++++++++++++++++++++++++++++++---------------
1 file changed, 45 insertions(+), 21 deletions(-)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 4f648af8cfaa..ecec0a1e1c1a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2057,6 +2057,7 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
struct tcmsg *tcm;
struct nlmsghdr *nlh;
unsigned char *b = skb_tail_pointer(skb);
+ int ret = -EMSGSIZE;
nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
if (!nlh)
@@ -2101,11 +2102,45 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
return skb->len;
+cls_op_not_supp:
+ ret = -EOPNOTSUPP;
out_nlmsg_trim:
nla_put_failure:
-cls_op_not_supp:
nlmsg_trim(skb, b);
- return -1;
+ return ret;
+}
+
+static struct sk_buff *tfilter_notify_prep(struct net *net,
+ struct sk_buff *oskb,
+ struct nlmsghdr *n,
+ struct tcf_proto *tp,
+ struct tcf_block *block,
+ struct Qdisc *q, u32 parent,
+ void *fh, int event,
+ u32 portid, bool rtnl_held,
+ struct netlink_ext_ack *extack)
+{
+ unsigned int size = oskb ? max(NLMSG_GOODSIZE, oskb->len) : NLMSG_GOODSIZE;
+ struct sk_buff *skb;
+ int ret;
+
+retry:
+ skb = alloc_skb(size, GFP_KERNEL);
+ if (!skb)
+ return ERR_PTR(-ENOBUFS);
+
+ ret = tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
+ n->nlmsg_seq, n->nlmsg_flags, event, false,
+ rtnl_held, extack);
+ if (ret <= 0) {
+ kfree_skb(skb);
+ if (ret == -EMSGSIZE) {
+ size += NLMSG_GOODSIZE;
+ goto retry;
+ }
+ return ERR_PTR(-EINVAL);
+ }
+ return skb;
}
static int tfilter_notify(struct net *net, struct sk_buff *oskb,
@@ -2121,16 +2156,10 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb,
if (!unicast && !rtnl_notify_needed(net, n->nlmsg_flags, RTNLGRP_TC))
return 0;
- skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
- if (!skb)
- return -ENOBUFS;
-
- if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
- n->nlmsg_seq, n->nlmsg_flags, event,
- false, rtnl_held, extack) <= 0) {
- kfree_skb(skb);
- return -EINVAL;
- }
+ skb = tfilter_notify_prep(net, oskb, n, tp, block, q, parent, fh, event,
+ portid, rtnl_held, extack);
+ if (IS_ERR(skb))
+ return PTR_ERR(skb);
if (unicast)
err = rtnl_unicast(skb, net, portid);
@@ -2153,16 +2182,11 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
if (!rtnl_notify_needed(net, n->nlmsg_flags, RTNLGRP_TC))
return tp->ops->delete(tp, fh, last, rtnl_held, extack);
- skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
- if (!skb)
- return -ENOBUFS;
-
- if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
- n->nlmsg_seq, n->nlmsg_flags, RTM_DELTFILTER,
- false, rtnl_held, extack) <= 0) {
+ skb = tfilter_notify_prep(net, oskb, n, tp, block, q, parent, fh,
+ RTM_DELTFILTER, portid, rtnl_held, extack);
+ if (IS_ERR(skb)) {
NL_SET_ERR_MSG(extack, "Failed to build del event notification");
- kfree_skb(skb);
- return -EINVAL;
+ return PTR_ERR(skb);
}
err = tp->ops->delete(tp, fh, last, rtnl_held, extack);
--
2.49.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications
2025-04-07 10:55 [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications Toke Høiland-Jørgensen
@ 2025-04-07 11:12 ` Jiri Pirko
2025-04-08 11:57 ` Paolo Abeni
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Jiri Pirko @ 2025-04-07 11:12 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Jamal Hadi Salim, Cong Wang, Ilya Maximets, Frode Nordahl,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
Mon, Apr 07, 2025 at 12:55:34PM +0200, toke@redhat.com wrote:
>The tfilter_notify() and tfilter_del_notify() functions assume that
>NLMSG_GOODSIZE is always enough to dump the filter chain. This is not
>always the case, which can lead to silent notify failures (because the
>return code of tfilter_notify() is not always checked). In particular,
>this can lead to NLM_F_ECHO not being honoured even though an action
>succeeds, which forces userspace to create workarounds[0].
>
>Fix this by increasing the message size if dumping the filter chain into
>the allocated skb fails. Use the size of the incoming skb as a size hint
>if set, so we can start at a larger value when appropriate.
>
>To trigger this, run the following commands:
>
> # ip link add type veth
> # tc qdisc replace dev veth0 root handle 1: fq_codel
> # tc -echo filter add dev veth0 parent 1: u32 match u32 0 0 $(for i in $(seq 32); do echo action pedit munge ip dport set 22; done)
>
>Before this fix, tc just returns:
>
>Not a filter(cmd 2)
>
>After the fix, we get the correct echo:
>
>added filter dev veth0 parent 1: protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid not_in_hw
> match 00000000/00000000 at 0
> action order 1: pedit action pass keys 1
> index 1 ref 1 bind 1
> key #0 at 20: val 00000016 mask ffff0000
>[repeated 32 times]
>
>[0] https://github.com/openvswitch/ovs/commit/106ef21860c935e5e0017a88bf42b94025c4e511
>
>Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>Reported-by: Frode Nordahl <frode.nordahl@canonical.com>
>Closes: https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/2018500
>Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications
2025-04-07 10:55 [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications Toke Høiland-Jørgensen
2025-04-07 11:12 ` Jiri Pirko
@ 2025-04-08 11:57 ` Paolo Abeni
2025-04-08 12:59 ` Toke Høiland-Jørgensen
2025-04-08 12:10 ` patchwork-bot+netdevbpf
2025-04-09 17:36 ` Cong Wang
3 siblings, 1 reply; 8+ messages in thread
From: Paolo Abeni @ 2025-04-08 11:57 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, Jamal Hadi Salim, Cong Wang,
Jiri Pirko
Cc: Ilya Maximets, Frode Nordahl, David S. Miller, Eric Dumazet,
Jakub Kicinski, Simon Horman, netdev
On 4/7/25 12:55 PM, Toke Høiland-Jørgensen wrote:
> To trigger this, run the following commands:
>
> # ip link add type veth
> # tc qdisc replace dev veth0 root handle 1: fq_codel
> # tc -echo filter add dev veth0 parent 1: u32 match u32 0 0 $(for i in $(seq 32); do echo action pedit munge ip dport set 22; done)
>
> Before this fix, tc just returns:
>
> Not a filter(cmd 2)
>
> After the fix, we get the correct echo:
>
> added filter dev veth0 parent 1: protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid not_in_hw
> match 00000000/00000000 at 0
> action order 1: pedit action pass keys 1
> index 1 ref 1 bind 1
> key #0 at 20: val 00000016 mask ffff0000
> [repeated 32 times]
I think it would be great if you could follow-up capturing the above in
a self-test.
Thanks!
Paolo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications
2025-04-08 11:57 ` Paolo Abeni
@ 2025-04-08 12:59 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-04-08 12:59 UTC (permalink / raw)
To: Paolo Abeni, Jamal Hadi Salim, Cong Wang, Jiri Pirko
Cc: Ilya Maximets, Frode Nordahl, David S. Miller, Eric Dumazet,
Jakub Kicinski, Simon Horman, netdev
Paolo Abeni <pabeni@redhat.com> writes:
> On 4/7/25 12:55 PM, Toke Høiland-Jørgensen wrote:
>> To trigger this, run the following commands:
>>
>> # ip link add type veth
>> # tc qdisc replace dev veth0 root handle 1: fq_codel
>> # tc -echo filter add dev veth0 parent 1: u32 match u32 0 0 $(for i in $(seq 32); do echo action pedit munge ip dport set 22; done)
>>
>> Before this fix, tc just returns:
>>
>> Not a filter(cmd 2)
>>
>> After the fix, we get the correct echo:
>>
>> added filter dev veth0 parent 1: protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid not_in_hw
>> match 00000000/00000000 at 0
>> action order 1: pedit action pass keys 1
>> index 1 ref 1 bind 1
>> key #0 at 20: val 00000016 mask ffff0000
>> [repeated 32 times]
>
> I think it would be great if you could follow-up capturing the above in
> a self-test.
Sure, can do!
-Toke
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications
2025-04-07 10:55 [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications Toke Høiland-Jørgensen
2025-04-07 11:12 ` Jiri Pirko
2025-04-08 11:57 ` Paolo Abeni
@ 2025-04-08 12:10 ` patchwork-bot+netdevbpf
2025-04-09 17:36 ` Cong Wang
3 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-04-08 12:10 UTC (permalink / raw)
To: =?utf-8?b?VG9rZSBIw7hpbGFuZC1Kw7hyZ2Vuc2VuIDx0b2tlQHJlZGhhdC5jb20+?=
Cc: jhs, xiyou.wangcong, jiri, i.maximets, frode.nordahl, davem,
edumazet, kuba, pabeni, horms, netdev
Hello:
This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Mon, 7 Apr 2025 12:55:34 +0200 you wrote:
> The tfilter_notify() and tfilter_del_notify() functions assume that
> NLMSG_GOODSIZE is always enough to dump the filter chain. This is not
> always the case, which can lead to silent notify failures (because the
> return code of tfilter_notify() is not always checked). In particular,
> this can lead to NLM_F_ECHO not being honoured even though an action
> succeeds, which forces userspace to create workarounds[0].
>
> [...]
Here is the summary with links:
- [net] tc: Ensure we have enough buffer space when sending filter netlink notifications
https://git.kernel.org/netdev/net/c/369609fc6272
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications
2025-04-07 10:55 [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications Toke Høiland-Jørgensen
` (2 preceding siblings ...)
2025-04-08 12:10 ` patchwork-bot+netdevbpf
@ 2025-04-09 17:36 ` Cong Wang
2025-04-10 10:28 ` Toke Høiland-Jørgensen
3 siblings, 1 reply; 8+ messages in thread
From: Cong Wang @ 2025-04-09 17:36 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Jamal Hadi Salim, Jiri Pirko, Ilya Maximets, Frode Nordahl,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
On Mon, Apr 07, 2025 at 12:55:34PM +0200, Toke Høiland-Jørgensen wrote:
> +static struct sk_buff *tfilter_notify_prep(struct net *net,
> + struct sk_buff *oskb,
> + struct nlmsghdr *n,
> + struct tcf_proto *tp,
> + struct tcf_block *block,
> + struct Qdisc *q, u32 parent,
> + void *fh, int event,
> + u32 portid, bool rtnl_held,
> + struct netlink_ext_ack *extack)
> +{
> + unsigned int size = oskb ? max(NLMSG_GOODSIZE, oskb->len) : NLMSG_GOODSIZE;
> + struct sk_buff *skb;
> + int ret;
> +
> +retry:
> + skb = alloc_skb(size, GFP_KERNEL);
> + if (!skb)
> + return ERR_PTR(-ENOBUFS);
> +
> + ret = tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
> + n->nlmsg_seq, n->nlmsg_flags, event, false,
> + rtnl_held, extack);
> + if (ret <= 0) {
> + kfree_skb(skb);
> + if (ret == -EMSGSIZE) {
> + size += NLMSG_GOODSIZE;
> + goto retry;
It is a bit concerning to see this technically unbound loop.
> + }
> + return ERR_PTR(-EINVAL);
I think you probably want to propagate the error code from
tcf_fill_node() here.
Thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications
2025-04-09 17:36 ` Cong Wang
@ 2025-04-10 10:28 ` Toke Høiland-Jørgensen
2025-04-14 21:00 ` Cong Wang
0 siblings, 1 reply; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-04-10 10:28 UTC (permalink / raw)
To: Cong Wang
Cc: Jamal Hadi Salim, Jiri Pirko, Ilya Maximets, Frode Nordahl,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
Cong Wang <xiyou.wangcong@gmail.com> writes:
> On Mon, Apr 07, 2025 at 12:55:34PM +0200, Toke Høiland-Jørgensen wrote:
>> +static struct sk_buff *tfilter_notify_prep(struct net *net,
>> + struct sk_buff *oskb,
>> + struct nlmsghdr *n,
>> + struct tcf_proto *tp,
>> + struct tcf_block *block,
>> + struct Qdisc *q, u32 parent,
>> + void *fh, int event,
>> + u32 portid, bool rtnl_held,
>> + struct netlink_ext_ack *extack)
>> +{
>> + unsigned int size = oskb ? max(NLMSG_GOODSIZE, oskb->len) : NLMSG_GOODSIZE;
>> + struct sk_buff *skb;
>> + int ret;
>> +
>> +retry:
>> + skb = alloc_skb(size, GFP_KERNEL);
>> + if (!skb)
>> + return ERR_PTR(-ENOBUFS);
>> +
>> + ret = tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
>> + n->nlmsg_seq, n->nlmsg_flags, event, false,
>> + rtnl_held, extack);
>> + if (ret <= 0) {
>> + kfree_skb(skb);
>> + if (ret == -EMSGSIZE) {
>> + size += NLMSG_GOODSIZE;
>> + goto retry;
>
> It is a bit concerning to see this technically unbound loop.
Well, I did think about that. The loop will terminate eventually by
either succeeding, or failing the allocation. Most likely the former,
since this is only called after a filter has been successfully
installed. I.e., it's not like the amount of data being put into the skb
is unbounded.
>> + }
>> + return ERR_PTR(-EINVAL);
>
> I think you probably want to propagate the error code from
> tcf_fill_node() here.
I just kept the existing return value (of tfilter_notify()) for the same
error case. tcf_fill_node() always returns -1 on error, so I think it
makes more sense to keep this?
Paolo already merged the patch, and I don't think it's worth it to
follow up with any fixes, cf the above. WDYT?
-Toke
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications
2025-04-10 10:28 ` Toke Høiland-Jørgensen
@ 2025-04-14 21:00 ` Cong Wang
0 siblings, 0 replies; 8+ messages in thread
From: Cong Wang @ 2025-04-14 21:00 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Jamal Hadi Salim, Jiri Pirko, Ilya Maximets, Frode Nordahl,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
On Thu, Apr 10, 2025 at 12:28:43PM +0200, Toke Høiland-Jørgensen wrote:
> Cong Wang <xiyou.wangcong@gmail.com> writes:
>
> > On Mon, Apr 07, 2025 at 12:55:34PM +0200, Toke Høiland-Jørgensen wrote:
> >> +static struct sk_buff *tfilter_notify_prep(struct net *net,
> >> + struct sk_buff *oskb,
> >> + struct nlmsghdr *n,
> >> + struct tcf_proto *tp,
> >> + struct tcf_block *block,
> >> + struct Qdisc *q, u32 parent,
> >> + void *fh, int event,
> >> + u32 portid, bool rtnl_held,
> >> + struct netlink_ext_ack *extack)
> >> +{
> >> + unsigned int size = oskb ? max(NLMSG_GOODSIZE, oskb->len) : NLMSG_GOODSIZE;
> >> + struct sk_buff *skb;
> >> + int ret;
> >> +
> >> +retry:
> >> + skb = alloc_skb(size, GFP_KERNEL);
> >> + if (!skb)
> >> + return ERR_PTR(-ENOBUFS);
> >> +
> >> + ret = tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
> >> + n->nlmsg_seq, n->nlmsg_flags, event, false,
> >> + rtnl_held, extack);
> >> + if (ret <= 0) {
> >> + kfree_skb(skb);
> >> + if (ret == -EMSGSIZE) {
> >> + size += NLMSG_GOODSIZE;
> >> + goto retry;
> >
> > It is a bit concerning to see this technically unbound loop.
>
> Well, I did think about that. The loop will terminate eventually by
> either succeeding, or failing the allocation. Most likely the former,
> since this is only called after a filter has been successfully
> installed. I.e., it's not like the amount of data being put into the skb
> is unbounded.
Yeah, I totally agree, it is probably just a theoretical problem.
>
> >> + }
> >> + return ERR_PTR(-EINVAL);
> >
> > I think you probably want to propagate the error code from
> > tcf_fill_node() here.
>
> I just kept the existing return value (of tfilter_notify()) for the same
> error case. tcf_fill_node() always returns -1 on error, so I think it
> makes more sense to keep this?
>
> Paolo already merged the patch, and I don't think it's worth it to
> follow up with any fixes, cf the above. WDYT?
Not a big deal, we can keep as it is.
Thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-04-14 21:00 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-07 10:55 [PATCH net] tc: Ensure we have enough buffer space when sending filter netlink notifications Toke Høiland-Jørgensen
2025-04-07 11:12 ` Jiri Pirko
2025-04-08 11:57 ` Paolo Abeni
2025-04-08 12:59 ` Toke Høiland-Jørgensen
2025-04-08 12:10 ` patchwork-bot+netdevbpf
2025-04-09 17:36 ` Cong Wang
2025-04-10 10:28 ` Toke Høiland-Jørgensen
2025-04-14 21:00 ` Cong Wang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).