* [PATCH net] inet: bring NLM_DONE out to a separate recv() again
@ 2024-04-11 18:02 Jakub Kicinski
2024-04-11 18:42 ` Eric Dumazet
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Jakub Kicinski @ 2024-04-11 18:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, Jakub Kicinski, Stefano Brivio,
Ilya Maximets, dsahern, donald.hunter
Commit under Fixes optimized the number of recv() calls
needed during RTM_GETROUTE dumps, but we got multiple
reports of applications hanging on recv() calls.
Applications expect that a route dump will be terminated
with a recv() reading an individual NLM_DONE message.
Coalescing NLM_DONE is perfectly legal in netlink,
but even tho reporters fixed the code in respective
projects, chances are it will take time for those
applications to get updated. So revert to old behavior
(for now)?
Old kernel (5.19):
$ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
--dump getroute --json '{"rtm-family": 2}'
Recv: read 692 bytes, 11 messages
nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
...
nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
Recv: read 20 bytes, 1 messages
nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
Before (6.9-rc2):
$ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
--dump getroute --json '{"rtm-family": 2}'
Recv: read 712 bytes, 12 messages
nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
...
nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
After:
$ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
--dump getroute --json '{"rtm-family": 2}'
Recv: read 692 bytes, 11 messages
nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
...
nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
Recv: read 20 bytes, 1 messages
nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
Reported-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: dsahern@kernel.org
CC: donald.hunter@gmail.com
---
net/ipv4/fib_frontend.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 48741352a88a..c484b1c0fc00 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1050,6 +1050,11 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
e++;
}
}
+
+ /* Don't let NLM_DONE coalesce into a message, even if it could.
+ * Some user space expects NLM_DONE in a separate recv().
+ */
+ err = skb->len;
out:
cb->args[1] = e;
--
2.44.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-11 18:02 [PATCH net] inet: bring NLM_DONE out to a separate recv() again Jakub Kicinski
@ 2024-04-11 18:42 ` Eric Dumazet
2024-04-11 18:57 ` Jakub Kicinski
2024-04-11 19:35 ` Ilya Maximets
` (2 subsequent siblings)
3 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2024-04-11 18:42 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, pabeni, Stefano Brivio, Ilya Maximets, dsahern,
donald.hunter
On Thu, Apr 11, 2024 at 8:02 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Commit under Fixes optimized the number of recv() calls
> needed during RTM_GETROUTE dumps, but we got multiple
> reports of applications hanging on recv() calls.
> Applications expect that a route dump will be terminated
> with a recv() reading an individual NLM_DONE message.
>
> Coalescing NLM_DONE is perfectly legal in netlink,
> but even tho reporters fixed the code in respective
> projects, chances are it will take time for those
> applications to get updated. So revert to old behavior
> (for now)?
>
> Old kernel (5.19):
>
> Reported-by: Stefano Brivio <sbrivio@redhat.com>
> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
> Reported-by: Ilya Maximets <i.maximets@ovn.org>
> Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
> Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: dsahern@kernel.org
> CC: donald.hunter@gmail.com
> ---
> net/ipv4/fib_frontend.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 48741352a88a..c484b1c0fc00 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -1050,6 +1050,11 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
> e++;
> }
> }
> +
> + /* Don't let NLM_DONE coalesce into a message, even if it could.
> + * Some user space expects NLM_DONE in a separate recv().
> + */
> + err = skb->len;
My plan was to perform this generically from netlink_dump_done().
This would still avoid calling a RTNL-enabled-dump() again if EOF has
been met already.
A sysctl could opt-in for the coalescing, if there is interest.
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index dc8c3c01d51b709c132ff63a0c534c1cc286589a..cad1124393ac74f3d5bfa86556ed9028f5ec8f65
100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2282,7 +2282,12 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
cb->extack = NULL;
}
+ /* Don't let NLM_DONE coalesce into a message, even if it could.
+ * Some user space expects NLM_DONE in a separate recv().
+ * Maybe opt-in this coalescing with a sysctl or socket option ?
+ */
if (nlk->dump_done_errno > 0 ||
+ skb->len ||
skb_tailroom(skb) <
nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
mutex_unlock(&nlk->nl_cb_mutex);
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-11 18:42 ` Eric Dumazet
@ 2024-04-11 18:57 ` Jakub Kicinski
0 siblings, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2024-04-11 18:57 UTC (permalink / raw)
To: Eric Dumazet
Cc: davem, netdev, pabeni, Stefano Brivio, Ilya Maximets, dsahern,
donald.hunter
On Thu, 11 Apr 2024 20:42:45 +0200 Eric Dumazet wrote:
> My plan was to perform this generically from netlink_dump_done().
>
> This would still avoid calling a RTNL-enabled-dump() again if EOF has
> been met already.
>
> A sysctl could opt-in for the coalescing, if there is interest.
I'd prefer to keep hacks contained to the few places that need them.
Netlink is full of strange quirks, I don't think this one deserves
a sysctl.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-11 18:02 [PATCH net] inet: bring NLM_DONE out to a separate recv() again Jakub Kicinski
2024-04-11 18:42 ` Eric Dumazet
@ 2024-04-11 19:35 ` Ilya Maximets
2024-06-17 15:09 ` Ilya Maximets
2024-04-11 19:45 ` David Ahern
2024-04-15 9:30 ` patchwork-bot+netdevbpf
3 siblings, 1 reply; 15+ messages in thread
From: Ilya Maximets @ 2024-04-11 19:35 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: i.maximets, netdev, edumazet, pabeni, Stefano Brivio, dsahern,
donald.hunter
On 4/11/24 20:02, Jakub Kicinski wrote:
> Commit under Fixes optimized the number of recv() calls
> needed during RTM_GETROUTE dumps, but we got multiple
> reports of applications hanging on recv() calls.
> Applications expect that a route dump will be terminated
> with a recv() reading an individual NLM_DONE message.
>
> Coalescing NLM_DONE is perfectly legal in netlink,
> but even tho reporters fixed the code in respective
> projects, chances are it will take time for those
> applications to get updated. So revert to old behavior
> (for now)?
>
> Old kernel (5.19):
>
> $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
> --dump getroute --json '{"rtm-family": 2}'
> Recv: read 692 bytes, 11 messages
> nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
> ...
> nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
> Recv: read 20 bytes, 1 messages
> nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>
> Before (6.9-rc2):
>
> $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
> --dump getroute --json '{"rtm-family": 2}'
> Recv: read 712 bytes, 12 messages
> nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
> ...
> nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
> nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>
> After:
>
> $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
> --dump getroute --json '{"rtm-family": 2}'
> Recv: read 692 bytes, 11 messages
> nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
> ...
> nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
> Recv: read 20 bytes, 1 messages
> nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>
> Reported-by: Stefano Brivio <sbrivio@redhat.com>
> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
> Reported-by: Ilya Maximets <i.maximets@ovn.org>
> Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
> Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: dsahern@kernel.org
> CC: donald.hunter@gmail.com
> ---
> net/ipv4/fib_frontend.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 48741352a88a..c484b1c0fc00 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -1050,6 +1050,11 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
> e++;
> }
> }
> +
> + /* Don't let NLM_DONE coalesce into a message, even if it could.
> + * Some user space expects NLM_DONE in a separate recv().
> + */
> + err = skb->len;
> out:
>
> cb->args[1] = e;
FWIW, on current net-next this fixes the issue with Libreswan and IPv4
(IPv6 issue remains, obviously). I also did a round of other OVS system
tests and they worked fine.
Tested-by: Ilya Maximets <i.maximets@ovn.org>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-11 18:02 [PATCH net] inet: bring NLM_DONE out to a separate recv() again Jakub Kicinski
2024-04-11 18:42 ` Eric Dumazet
2024-04-11 19:35 ` Ilya Maximets
@ 2024-04-11 19:45 ` David Ahern
2024-04-11 21:01 ` Jakub Kicinski
2024-04-15 9:30 ` patchwork-bot+netdevbpf
3 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2024-04-11 19:45 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, Stefano Brivio, Ilya Maximets,
donald.hunter
On 4/11/24 12:02 PM, Jakub Kicinski wrote:
> Commit under Fixes optimized the number of recv() calls
> needed during RTM_GETROUTE dumps, but we got multiple
> reports of applications hanging on recv() calls.
> Applications expect that a route dump will be terminated
> with a recv() reading an individual NLM_DONE message.
>
> Coalescing NLM_DONE is perfectly legal in netlink,
> but even tho reporters fixed the code in respective
> projects, chances are it will take time for those
> applications to get updated. So revert to old behavior
> (for now)?
>
> Old kernel (5.19):
>
> $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
> --dump getroute --json '{"rtm-family": 2}'
> Recv: read 692 bytes, 11 messages
> nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
> ...
> nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
> Recv: read 20 bytes, 1 messages
> nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>
> Before (6.9-rc2):
>
> $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
> --dump getroute --json '{"rtm-family": 2}'
> Recv: read 712 bytes, 12 messages
> nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
> ...
> nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
> nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>
> After:
>
> $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
> --dump getroute --json '{"rtm-family": 2}'
> Recv: read 692 bytes, 11 messages
> nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
> ...
> nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
> Recv: read 20 bytes, 1 messages
> nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>
> Reported-by: Stefano Brivio <sbrivio@redhat.com>
> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
> Reported-by: Ilya Maximets <i.maximets@ovn.org>
> Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
> Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: dsahern@kernel.org
> CC: donald.hunter@gmail.com
> ---
> net/ipv4/fib_frontend.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 48741352a88a..c484b1c0fc00 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -1050,6 +1050,11 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
> e++;
> }
> }
> +
> + /* Don't let NLM_DONE coalesce into a message, even if it could.
> + * Some user space expects NLM_DONE in a separate recv().
that's unfortunate
> + */
> + err = skb->len;
> out:
>
> cb->args[1] = e;
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-11 19:45 ` David Ahern
@ 2024-04-11 21:01 ` Jakub Kicinski
2024-04-11 21:14 ` David Ahern
2024-04-12 17:22 ` Stefano Brivio
0 siblings, 2 replies; 15+ messages in thread
From: Jakub Kicinski @ 2024-04-11 21:01 UTC (permalink / raw)
To: David Ahern
Cc: davem, netdev, edumazet, pabeni, Stefano Brivio, Ilya Maximets,
donald.hunter
On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:
> > + /* Don't let NLM_DONE coalesce into a message, even if it could.
> > + * Some user space expects NLM_DONE in a separate recv().
>
> that's unfortunate
Do you have an opinion on the sysfs/opt-in question?
Feels to me like there shouldn't be that much user space doing raw
netlink, without a library. Old crufty code usually does ioctls, right?
So maybe we can periodically reintroduce this bug to shake out all
the bad apps? :D
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-11 21:01 ` Jakub Kicinski
@ 2024-04-11 21:14 ` David Ahern
2024-04-12 17:22 ` Stefano Brivio
1 sibling, 0 replies; 15+ messages in thread
From: David Ahern @ 2024-04-11 21:14 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, Stefano Brivio, Ilya Maximets,
donald.hunter
On 4/11/24 3:01 PM, Jakub Kicinski wrote:
> On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:
>>> + /* Don't let NLM_DONE coalesce into a message, even if it could.
>>> + * Some user space expects NLM_DONE in a separate recv().
>>
>> that's unfortunate
>
> Do you have an opinion on the sysfs/opt-in question?
Not a performance critical path, so I would not add it right now.
> Feels to me like there shouldn't be that much user space doing raw
> netlink, without a library. Old crufty code usually does ioctls, right?
> So maybe we can periodically reintroduce this bug to shake out all
> the bad apps? :D
:-) On the one hand, yea, push apps to improve. On the other, Donald
(FRR) is one who complains about nightmares chasing nuances across kernels.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-11 21:01 ` Jakub Kicinski
2024-04-11 21:14 ` David Ahern
@ 2024-04-12 17:22 ` Stefano Brivio
2024-04-12 17:38 ` Ilya Maximets
1 sibling, 1 reply; 15+ messages in thread
From: Stefano Brivio @ 2024-04-12 17:22 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Ahern, davem, netdev, edumazet, pabeni, Ilya Maximets,
donald.hunter
On Thu, 11 Apr 2024 14:01:54 -0700
Jakub Kicinski <kuba@kernel.org> wrote:
> On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:
> > > + /* Don't let NLM_DONE coalesce into a message, even if it could.
> > > + * Some user space expects NLM_DONE in a separate recv().
> >
> > that's unfortunate
>
> Do you have an opinion on the sysfs/opt-in question?
> Feels to me like there shouldn't be that much user space doing raw
> netlink, without a library. Old crufty code usually does ioctls, right?
I think so too -- if there were more (maintained) applications with
this issue, we would have noticed by now.
> So maybe we can periodically reintroduce this bug to shake out all
> the bad apps? :D
Actually, I had half a mind of proposing something on these lines: add
a TODO comment here and revisit in, say, two years.
I guess it's definitely more painful for libreswan, but for passt, I
think it's quite unlikely that distribution users could get the
"breaking" kernel change without a fixed version of the application: we
made a new release relatively close to the NLM_DONE change.
There might be substantial value in keeping this type of short netlink
exchanges fast, for example for container engines that need to be able
to spawn a bazillion containers per second.
--
Stefano
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-12 17:22 ` Stefano Brivio
@ 2024-04-12 17:38 ` Ilya Maximets
2024-04-12 18:03 ` Stefano Brivio
0 siblings, 1 reply; 15+ messages in thread
From: Ilya Maximets @ 2024-04-12 17:38 UTC (permalink / raw)
To: Stefano Brivio, Jakub Kicinski
Cc: i.maximets, David Ahern, davem, netdev, edumazet, pabeni,
donald.hunter
On 4/12/24 19:22, Stefano Brivio wrote:
> On Thu, 11 Apr 2024 14:01:54 -0700
> Jakub Kicinski <kuba@kernel.org> wrote:
>
>> On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:
>>>> + /* Don't let NLM_DONE coalesce into a message, even if it could.
>>>> + * Some user space expects NLM_DONE in a separate recv().
>>>
>>> that's unfortunate
>>
>> Do you have an opinion on the sysfs/opt-in question?
>> Feels to me like there shouldn't be that much user space doing raw
>> netlink, without a library. Old crufty code usually does ioctls, right?
>
> I think so too -- if there were more (maintained) applications with
> this issue, we would have noticed by now.
It depends on how you define "maintained". Most application devs
do not test with unreleased kernels.
>
>> So maybe we can periodically reintroduce this bug to shake out all
>> the bad apps? :D
>
> Actually, I had half a mind of proposing something on these lines: add
> a TODO comment here and revisit in, say, two years.
>
> I guess it's definitely more painful for libreswan, but for passt, I
> think it's quite unlikely that distribution users could get the
> "breaking" kernel change without a fixed version of the application: we
> made a new release relatively close to the NLM_DONE change.
>
> There might be substantial value in keeping this type of short netlink
> exchanges fast, for example for container engines that need to be able
> to spawn a bazillion containers per second.
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-12 17:38 ` Ilya Maximets
@ 2024-04-12 18:03 ` Stefano Brivio
2024-04-12 18:22 ` Ilya Maximets
0 siblings, 1 reply; 15+ messages in thread
From: Stefano Brivio @ 2024-04-12 18:03 UTC (permalink / raw)
To: Ilya Maximets
Cc: Jakub Kicinski, David Ahern, davem, netdev, edumazet, pabeni,
donald.hunter
On Fri, 12 Apr 2024 19:38:53 +0200
Ilya Maximets <i.maximets@ovn.org> wrote:
> On 4/12/24 19:22, Stefano Brivio wrote:
> > On Thu, 11 Apr 2024 14:01:54 -0700
> > Jakub Kicinski <kuba@kernel.org> wrote:
> >
> >> On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:
> >>>> + /* Don't let NLM_DONE coalesce into a message, even if it could.
> >>>> + * Some user space expects NLM_DONE in a separate recv().
> >>>
> >>> that's unfortunate
> >>
> >> Do you have an opinion on the sysfs/opt-in question?
> >> Feels to me like there shouldn't be that much user space doing raw
> >> netlink, without a library. Old crufty code usually does ioctls, right?
> >
> > I think so too -- if there were more (maintained) applications with
> > this issue, we would have noticed by now.
>
> It depends on how you define "maintained". Most application devs
> do not test with unreleased kernels.
I haven't, either, but users started shouting: we have nowadays plenty
of distributions shipping unreleased kernels.
--
Stefano
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-12 18:03 ` Stefano Brivio
@ 2024-04-12 18:22 ` Ilya Maximets
0 siblings, 0 replies; 15+ messages in thread
From: Ilya Maximets @ 2024-04-12 18:22 UTC (permalink / raw)
To: Stefano Brivio
Cc: i.maximets, Jakub Kicinski, David Ahern, davem, netdev, edumazet,
pabeni, donald.hunter
On 4/12/24 20:03, Stefano Brivio wrote:
> On Fri, 12 Apr 2024 19:38:53 +0200
> Ilya Maximets <i.maximets@ovn.org> wrote:
>
>> On 4/12/24 19:22, Stefano Brivio wrote:
>>> On Thu, 11 Apr 2024 14:01:54 -0700
>>> Jakub Kicinski <kuba@kernel.org> wrote:
>>>
>>>> On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:
>>>>>> + /* Don't let NLM_DONE coalesce into a message, even if it could.
>>>>>> + * Some user space expects NLM_DONE in a separate recv().
>>>>>
>>>>> that's unfortunate
>>>>
>>>> Do you have an opinion on the sysfs/opt-in question?
>>>> Feels to me like there shouldn't be that much user space doing raw
>>>> netlink, without a library. Old crufty code usually does ioctls, right?
>>>
>>> I think so too -- if there were more (maintained) applications with
>>> this issue, we would have noticed by now.
>>
>> It depends on how you define "maintained". Most application devs
>> do not test with unreleased kernels.
>
> I haven't, either, but users started shouting: we have nowadays plenty
> of distributions shipping unreleased kernels.
>
In OVS we typically don't get any bug reports from users until the
issue hits a major distribution like RHEL or Ubuntu.
Slightly different crowd, I guess. :)
Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-11 18:02 [PATCH net] inet: bring NLM_DONE out to a separate recv() again Jakub Kicinski
` (2 preceding siblings ...)
2024-04-11 19:45 ` David Ahern
@ 2024-04-15 9:30 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 15+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-04-15 9:30 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, sbrivio, i.maximets, dsahern,
donald.hunter
Hello:
This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:
On Thu, 11 Apr 2024 11:02:02 -0700 you wrote:
> Commit under Fixes optimized the number of recv() calls
> needed during RTM_GETROUTE dumps, but we got multiple
> reports of applications hanging on recv() calls.
> Applications expect that a route dump will be terminated
> with a recv() reading an individual NLM_DONE message.
>
> Coalescing NLM_DONE is perfectly legal in netlink,
> but even tho reporters fixed the code in respective
> projects, chances are it will take time for those
> applications to get updated. So revert to old behavior
> (for now)?
>
> [...]
Here is the summary with links:
- [net] inet: bring NLM_DONE out to a separate recv() again
https://git.kernel.org/netdev/net/c/460b0d33cf10
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-04-11 19:35 ` Ilya Maximets
@ 2024-06-17 15:09 ` Ilya Maximets
2024-06-17 16:36 ` Jakub Kicinski
0 siblings, 1 reply; 15+ messages in thread
From: Ilya Maximets @ 2024-06-17 15:09 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: i.maximets, netdev, edumazet, pabeni, Stefano Brivio, dsahern,
donald.hunter, Sabrina Dubroca
On 4/11/24 21:35, Ilya Maximets wrote:
> On 4/11/24 20:02, Jakub Kicinski wrote:
>> Commit under Fixes optimized the number of recv() calls
>> needed during RTM_GETROUTE dumps, but we got multiple
>> reports of applications hanging on recv() calls.
>> Applications expect that a route dump will be terminated
>> with a recv() reading an individual NLM_DONE message.
>>
>> Coalescing NLM_DONE is perfectly legal in netlink,
>> but even tho reporters fixed the code in respective
>> projects, chances are it will take time for those
>> applications to get updated. So revert to old behavior
>> (for now)?
>>
>> Old kernel (5.19):
>>
>> $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>> --dump getroute --json '{"rtm-family": 2}'
>> Recv: read 692 bytes, 11 messages
>> nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>> ...
>> nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>> Recv: read 20 bytes, 1 messages
>> nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>>
>> Before (6.9-rc2):
>>
>> $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>> --dump getroute --json '{"rtm-family": 2}'
>> Recv: read 712 bytes, 12 messages
>> nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>> ...
>> nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>> nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>>
>> After:
>>
>> $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>> --dump getroute --json '{"rtm-family": 2}'
>> Recv: read 692 bytes, 11 messages
>> nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>> ...
>> nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>> Recv: read 20 bytes, 1 messages
>> nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>>
>> Reported-by: Stefano Brivio <sbrivio@redhat.com>
>> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
>> Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
>> Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> ---
>> CC: dsahern@kernel.org
>> CC: donald.hunter@gmail.com
>> ---
>> net/ipv4/fib_frontend.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>> index 48741352a88a..c484b1c0fc00 100644
>> --- a/net/ipv4/fib_frontend.c
>> +++ b/net/ipv4/fib_frontend.c
>> @@ -1050,6 +1050,11 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
>> e++;
>> }
>> }
>> +
>> + /* Don't let NLM_DONE coalesce into a message, even if it could.
>> + * Some user space expects NLM_DONE in a separate recv().
>> + */
>> + err = skb->len;
>> out:
>>
>> cb->args[1] = e;
>
> FWIW, on current net-next this fixes the issue with Libreswan and IPv4
> (IPv6 issue remains, obviously). I also did a round of other OVS system
> tests and they worked fine.
>
> Tested-by: Ilya Maximets <i.maximets@ovn.org>
Hi, Jakub. Now that IPv6 change is in 6.10-rc, do you plan to submit a similar
fix for it as well? (Sorry if I missed it.) Libreswan is getting stuck on IPv6
route lookups with 6.10-rc4.
Note: Libreswan fixed the issue on their main branch, but it is not available in
any release yet, and I'm not sure if the fix is going to make it into stable
releases.
Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-06-17 15:09 ` Ilya Maximets
@ 2024-06-17 16:36 ` Jakub Kicinski
2024-06-17 17:05 ` Ilya Maximets
0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2024-06-17 16:36 UTC (permalink / raw)
To: Ilya Maximets
Cc: davem, netdev, edumazet, pabeni, Stefano Brivio, dsahern,
donald.hunter, Sabrina Dubroca
On Mon, 17 Jun 2024 17:09:44 +0200 Ilya Maximets wrote:
> > FWIW, on current net-next this fixes the issue with Libreswan and IPv4
> > (IPv6 issue remains, obviously). I also did a round of other OVS system
> > tests and they worked fine.
> >
> > Tested-by: Ilya Maximets <i.maximets@ovn.org>
>
> Hi, Jakub. Now that IPv6 change is in 6.10-rc, do you plan to submit a similar
> fix for it as well? (Sorry if I missed it.) Libreswan is getting stuck on IPv6
> route lookups with 6.10-rc4.
>
> Note: Libreswan fixed the issue on their main branch, but it is not available in
> any release yet, and I'm not sure if the fix is going to make it into stable
> releases.
Sorry for the delay, we'll get it resolved before this week's PR.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again
2024-06-17 16:36 ` Jakub Kicinski
@ 2024-06-17 17:05 ` Ilya Maximets
0 siblings, 0 replies; 15+ messages in thread
From: Ilya Maximets @ 2024-06-17 17:05 UTC (permalink / raw)
To: Jakub Kicinski
Cc: i.maximets, davem, netdev, edumazet, pabeni, Stefano Brivio,
dsahern, donald.hunter, Sabrina Dubroca
On 6/17/24 18:36, Jakub Kicinski wrote:
> On Mon, 17 Jun 2024 17:09:44 +0200 Ilya Maximets wrote:
>>> FWIW, on current net-next this fixes the issue with Libreswan and IPv4
>>> (IPv6 issue remains, obviously). I also did a round of other OVS system
>>> tests and they worked fine.
>>>
>>> Tested-by: Ilya Maximets <i.maximets@ovn.org>
>>
>> Hi, Jakub. Now that IPv6 change is in 6.10-rc, do you plan to submit a similar
>> fix for it as well? (Sorry if I missed it.) Libreswan is getting stuck on IPv6
>> route lookups with 6.10-rc4.
>>
>> Note: Libreswan fixed the issue on their main branch, but it is not available in
>> any release yet, and I'm not sure if the fix is going to make it into stable
>> releases.
>
> Sorry for the delay, we'll get it resolved before this week's PR.
Ack. Thanks!
Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-06-17 17:05 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-11 18:02 [PATCH net] inet: bring NLM_DONE out to a separate recv() again Jakub Kicinski
2024-04-11 18:42 ` Eric Dumazet
2024-04-11 18:57 ` Jakub Kicinski
2024-04-11 19:35 ` Ilya Maximets
2024-06-17 15:09 ` Ilya Maximets
2024-06-17 16:36 ` Jakub Kicinski
2024-06-17 17:05 ` Ilya Maximets
2024-04-11 19:45 ` David Ahern
2024-04-11 21:01 ` Jakub Kicinski
2024-04-11 21:14 ` David Ahern
2024-04-12 17:22 ` Stefano Brivio
2024-04-12 17:38 ` Ilya Maximets
2024-04-12 18:03 ` Stefano Brivio
2024-04-12 18:22 ` Ilya Maximets
2024-04-15 9:30 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).