* Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
From: Maciej Żenczykowski @ 2024-06-13 12:18 UTC
To: Linux NetDev, Eric Dumazet, Paolo Abeni, Jakub Kicinski,
David S. Miller
The Android net tests
(available at https://cs.android.com/android/platform/superproject/main/+/main:kernel/tests/net/test/
more specifically multinetwork_test.py & neighbour_test.py)
run via:
/...aosp-tests.../net/test/run_net_test.sh --builder
from within a 6.10-rc3 kernel tree are falling over due to a *plethora* of:
TypeError: NLMsgHdr requires a bytes object of length 16, got 4
The problems might be limited to RTM_GETROUTE, RTM_GETRULE and RTM_GETNEIGH,
as various other netlink-using xfrm tests appear to be okay...
(note: 6.10-rc3 also fails to build for UML due to a buggy bpf change,
but I sent out a 1-line fix for that already:
https://patchwork.kernel.org/project/netdevbpf/patch/20240613112520.1526350-1-maze@google.com/
)
It is of course entirely possible the test code is buggy in how it
parses netlink, but it has worked for years and years...
Before I go trying to bisect this... anyone have any idea what might
be the cause?
Perhaps some sort of change to how these dumps work? Some sort of new
netlink extended errors?
- Maciej
* Re: Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
From: Jakub Kicinski @ 2024-06-13 13:29 UTC
To: Maciej Żenczykowski
Cc: Linux NetDev, Eric Dumazet, Paolo Abeni, David S. Miller
On Thu, 13 Jun 2024 14:18:41 +0200 Maciej Żenczykowski wrote:
> The Android net tests
> (available at https://cs.android.com/android/platform/superproject/main/+/main:kernel/tests/net/test/
> more specifically multinetwork_test.py & neighbour_test.py)
> run via:
> /...aosp-tests.../net/test/run_net_test.sh --builder
> from within a 6.10-rc3 kernel tree are falling over due to a *plethora* of:
> TypeError: NLMsgHdr requires a bytes object of length 16, got 4
>
> The problems might be limited to RTM_GETROUTE and RTM_GETRULE and RTM_GETNEIGH,
> as various other netlink using xfrm tests appear to be okay...
>
> (note: 6.10-rc3 also fails to build for UML due to a buggy bpf change,
> but I sent out a 1-line fix for that already:
> https://patchwork.kernel.org/project/netdevbpf/patch/20240613112520.1526350-1-maze@google.com/
> )
>
> It is of course entirely possible the test code is buggy in how it
> parses netlink, but it has worked for years and years...
>
> Before I go trying to bisect this... anyone have any idea what might
> be the cause?
> Perhaps some sort of change to how these dumps work? Some sort of new
> netlink extended errors?
Take a look at commit 5b4b62a169e1 ("rtnetlink: make the "split"
NLM_DONE handling generic"), there may be more such workarounds missing.
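
For context, a minimal Python sketch of what the "split" matters for on the reader side: a dump may end with NLMSG_DONE either in its own recv() buffer or appended to the last buffer of data. One plausible reading of the TypeError, not confirmed in this thread, is that an NLMSG_DONE message is 20 bytes (a 16-byte header plus a 4-byte status), so a parser that steps over only the header of a DONE message found mid-buffer is left with 4 stray bytes. The constants follow linux/netlink.h and linux/rtnetlink.h; parse_datagram() and make_msg() are hypothetical helpers, not the Android test code, and the 12-byte route payload is filler rather than a real struct rtmsg.

```python
import struct

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSG_HDR = struct.Struct("=IHHII")
NLMSG_DONE = 3
NLM_F_MULTI = 0x2
RTM_NEWROUTE = 24


def nlmsg_align(length):
    """Round up to the 4-byte netlink alignment (NLMSG_ALIGN)."""
    return (length + 3) & ~3


def make_msg(msg_type, payload=b"", seq=1):
    """Hypothetical helper: build one netlink message for the demo."""
    length = NLMSG_HDR.size + len(payload)
    padding = b"\0" * (nlmsg_align(length) - length)
    return NLMSG_HDR.pack(length, msg_type, NLM_F_MULTI, seq, 0) + payload + padding


def parse_datagram(buf):
    """Split one recv() buffer into (nlmsg_type, payload) tuples.

    Returns (messages, done); done is True once NLMSG_DONE was seen,
    wherever in the buffer it appears.
    """
    messages, done, offset = [], False, 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            raise ValueError("truncated or malformed netlink message")
        if msg_type == NLMSG_DONE:
            done = True
            break
        messages.append((msg_type, buf[offset + NLMSG_HDR.size:offset + length]))
        # Advance by nlmsg_len (aligned), never by the header size alone.
        offset += nlmsg_align(length)
    return messages, done


route = make_msg(RTM_NEWROUTE, b"\x02" * 12)
done_msg = make_msg(NLMSG_DONE, struct.pack("=i", 0))

# NLMSG_DONE in a separate buffer, and NLMSG_DONE sharing the last buffer.
separate = [parse_datagram(route), parse_datagram(done_msg)]
combined = parse_datagram(route + done_msg)
```

Walking the buffer by nlmsg_len gives the same records in both layouts, which is why such a parser would not care whether the kernel splits NLM_DONE out or not.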
* Re: Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
From: Maciej Żenczykowski @ 2024-06-13 14:21 UTC
To: Jakub Kicinski; +Cc: Linux NetDev, Eric Dumazet, Paolo Abeni, David S. Miller
On Thu, Jun 13, 2024 at 3:29 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 13 Jun 2024 14:18:41 +0200 Maciej Żenczykowski wrote:
> > The Android net tests
> > (available at https://cs.android.com/android/platform/superproject/main/+/main:kernel/tests/net/test/
> > more specifically multinetwork_test.py & neighbour_test.py)
> > run via:
> > /...aosp-tests.../net/test/run_net_test.sh --builder
> > from within a 6.10-rc3 kernel tree are falling over due to a *plethora* of:
> > TypeError: NLMsgHdr requires a bytes object of length 16, got 4
> >
> > The problems might be limited to RTM_GETROUTE and RTM_GETRULE and RTM_GETNEIGH,
> > as various other netlink using xfrm tests appear to be okay...
> >
> > (note: 6.10-rc3 also fails to build for UML due to a buggy bpf change,
> > but I sent out a 1-line fix for that already:
> > https://patchwork.kernel.org/project/netdevbpf/patch/20240613112520.1526350-1-maze@google.com/
> > )
> >
> > It is of course entirely possible the test code is buggy in how it
> > parses netlink, but it has worked for years and years...
> >
> > Before I go trying to bisect this... anyone have any idea what might
> > be the cause?
> > Perhaps some sort of change to how these dumps work? Some sort of new
> > netlink extended errors?
>
> Take a look at commit 5b4b62a169e1 ("rtnetlink: make the "split"
> NLM_DONE handling generic"), there may be more such workarounds missing.
Ok, I sent out 2 patches adding the flag in 3 more spots that are
enough to get both tests working.
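
As a rough illustration of what the flag changes from userspace's point of view, here is a toy Python model. It is a sketch under assumptions, not kernel code: dump_datagrams() and collect() are hypothetical names, records are plain strings, and per_datagram stands in for however many records fit in one buffer. With split_done=True the end-of-dump marker arrives in its own recv(), as these tests have seen for years; with split_done=False it rides along with the last batch of records.

```python
DONE = "NLMSG_DONE"


def dump_datagrams(records, per_datagram, split_done):
    """Toy model: return the list of recv() buffers for one dump."""
    batches = [list(records[i:i + per_datagram])
               for i in range(0, len(records), per_datagram)]
    if split_done or not batches:
        # Legacy layout: the end marker gets a buffer of its own.
        batches.append([DONE])
    else:
        # Converted layout: the end marker shares the last data buffer.
        batches[-1].append(DONE)
    return batches


def collect(datagrams):
    """Receiver that copes with both layouts: scan every buffer for DONE."""
    records = []
    for datagram in datagrams:
        for message in datagram:
            if message == DONE:
                return records
            records.append(message)
    raise RuntimeError("dump ended without NLMSG_DONE")


legacy = dump_datagrams(["r1", "r2", "r3"], per_datagram=2, split_done=True)
converted = dump_datagrams(["r1", "r2", "r3"], per_datagram=2, split_done=False)
```

A receiver written like collect() sees the same three records either way; one that only checks whether a whole buffer is "the DONE buffer" works with the legacy layout and trips over the converted one.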
The first in RTM_GETNEIGH seems obvious enough.
$ git grep rtnl_register.*RTM_GETNEIGH,
net/core/neighbour.c:3894: rtnl_register(PF_UNSPEC, RTM_GETNEIGH, neigh_get, neigh_dump_info,
net/core/rtnetlink.c:6752: rtnl_register(PF_BRIDGE, RTM_GETNEIGH, rtnl_fdb_get, rtnl_fdb_dump, 0);
net/mctp/neigh.c:331: rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETNEIGH,
but there is also PF_BRIDGE and PF_MCTP... (though obviously the test
doesn't care)
(and also RTM_GETNEIGHTBL...)
The RTM_GETRULE portion of the second one seems fine too:
$ git grep rtnl_register.*RTM_GETRULE
net/core/fib_rules.c:1296: rtnl_register(PF_UNSPEC, RTM_GETRULE, NULL, fib_nl_dumprule,
but I'm less certain about the RTM_GETROUTE portion thereof, as
there are a lot of hits:
$ git grep rtnl_register.*RTM_GETROUTE
net/can/gw.c:1293: ret = rtnl_register_module(THIS_MODULE, PF_CAN, RTM_GETROUTE,
net/core/rtnetlink.c:6743: rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, 0);
net/ipv4/fib_frontend.c:1662: rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib,
net/ipv4/ipmr.c:3162: rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE,
net/ipv4/route.c:3696: rtnl_register(PF_INET, RTM_GETROUTE, inet_rtm_getroute, NULL,
net/ipv6/ip6_fib.c:2516: ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE, NULL,
net/ipv6/ip6mr.c:1394: err = rtnl_register_module(THIS_MODULE, RTNL_FAMILY_IP6MR, RTM_GETROUTE,
net/ipv6/route.c:6737: ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE,
net/mctp/route.c:1481: rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETROUTE,
net/mpls/af_mpls.c:2755: rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_GETROUTE,
net/phonet/pn_netlink.c:304: rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_GETROUTE,
It seems like maybe v4 and both mr's should be changed too?
* Re: Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
From: Jakub Kicinski @ 2024-06-13 14:59 UTC
To: Maciej Żenczykowski
Cc: Linux NetDev, Eric Dumazet, Paolo Abeni, David S. Miller
On Thu, 13 Jun 2024 16:21:15 +0200 Maciej Żenczykowski wrote:
> Ok, I sent out 2 patches adding the flag in 3 more spots that are
> enough to get both tests working.
Thanks!
> The first in RTM_GETNEIGH seems obvious enough.
>
> $ git grep rtnl_register.*RTM_GETNEIGH,
> net/core/neighbour.c:3894: rtnl_register(PF_UNSPEC, RTM_GETNEIGH, neigh_get, neigh_dump_info,
> net/core/rtnetlink.c:6752: rtnl_register(PF_BRIDGE, RTM_GETNEIGH, rtnl_fdb_get, rtnl_fdb_dump, 0);
> net/mctp/neigh.c:331: rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETNEIGH,
>
> but there is also PF_BRIDGE and PF_MCTP... (though obviously the test
> doesn't care)
> (and also RTM_GETNEIGHTBL...)
These weren't converted to the new way, so they will be okay.
> The RTM_GETRULE portion of the second one seems fine too:
>
> $ git grep rtnl_register.*RTM_GETRULE
> net/core/fib_rules.c:1296: rtnl_register(PF_UNSPEC, RTM_GETRULE, NULL, fib_nl_dumprule,
>
> but I'm less certain about the RTM_GETROUTE portion thereof, as
> there are a lot of hits:
>
> $ git grep rtnl_register.*RTM_GETROUTE
> net/can/gw.c:1293: ret = rtnl_register_module(THIS_MODULE, PF_CAN, RTM_GETROUTE,
> net/core/rtnetlink.c:6743: rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, 0);
> net/ipv4/fib_frontend.c:1662: rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib,
> net/ipv4/ipmr.c:3162: rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE,
> net/ipv4/route.c:3696: rtnl_register(PF_INET, RTM_GETROUTE, inet_rtm_getroute, NULL,
> net/ipv6/ip6_fib.c:2516: ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE, NULL,
> net/ipv6/ip6mr.c:1394: err = rtnl_register_module(THIS_MODULE, RTNL_FAMILY_IP6MR, RTM_GETROUTE,
> net/ipv6/route.c:6737: ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE,
> net/mctp/route.c:1481: rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETROUTE,
> net/mpls/af_mpls.c:2755: rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_GETROUTE,
> net/phonet/pn_netlink.c:304: rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_GETROUTE,
>
> It seems like maybe v4 and both mr's should be changed too?
Didn't check MR, the v4 route dump has the flag already, AFAICS.