* Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
From: Maciej Żenczykowski @ 2024-06-13 12:18 UTC
To: Linux NetDev, Eric Dumazet, Paolo Abeni, Jakub Kicinski, David S. Miller

The Android net tests
(available at https://cs.android.com/android/platform/superproject/main/+/main:kernel/tests/net/test/
more specifically multinetwork_test.py & neighbour_test.py)
run via:
/...aosp-tests.../net/test/run_net_test.sh --builder
from within a 6.10-rc3 kernel tree are falling over due to a *plethora* of:
TypeError: NLMsgHdr requires a bytes object of length 16, got 4

The problems might be limited to RTM_GETROUTE and RTM_GETRULE and RTM_GETNEIGH,
as various other netlink using xfrm tests appear to be okay...

(note: 6.10-rc3 also fails to build for UML due to a buggy bpf change,
but I sent out a 1-line fix for that already:
https://patchwork.kernel.org/project/netdevbpf/patch/20240613112520.1526350-1-maze@google.com/
)

It is of course entirely possible the test code is buggy in how it
parses netlink, but it has worked for years and years...

Before I go trying to bisect this... anyone have any idea what might
be the cause?
Perhaps some sort of change to how these dumps work? Some sort of new
netlink extended errors?

- Maciej
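Note for readers: the two numbers in the TypeError match sizes in the netlink wire format. struct nlmsghdr is 16 bytes, and the NLMSG_DONE message that ends a dump carries a 4-byte integer after its header. The Python sketch below only illustrates that arithmetic. Whether the Android parser really ends up holding the 4-byte tail of an NLMSG_DONE message is an assumption, not something established in the message above, and make_done is a hypothetical helper that is not part of the tests.

```python
import struct

# struct nlmsghdr from <linux/netlink.h>:
# u32 nlmsg_len, u16 nlmsg_type, u16 nlmsg_flags, u32 nlmsg_seq, u32 nlmsg_pid
NLMSGHDR = struct.Struct("=IHHII")
NLMSG_DONE = 3
NLM_F_MULTI = 0x2


def make_done(seq, pid, status=0):
    """Build an NLMSG_DONE message: a header followed by a 4-byte int status.

    Hypothetical helper for illustration only.
    """
    payload = struct.pack("=i", status)
    header = NLMSGHDR.pack(NLMSGHDR.size + len(payload), NLMSG_DONE,
                           NLM_F_MULTI, seq, pid)
    return header + payload


done = make_done(seq=1, pid=0)
header_fields = NLMSGHDR.unpack_from(done)

# What is left over if a parser consumes only the 16-byte header.
leftover = done[NLMSGHDR.size:]
```

If a parser consumed the header of such a message and then tried to read another header from what remains, it would be handed 4 bytes where it expects 16.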
* Re: Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
From: Jakub Kicinski @ 2024-06-13 13:29 UTC
To: Maciej Żenczykowski
Cc: Linux NetDev, Eric Dumazet, Paolo Abeni, David S. Miller

On Thu, 13 Jun 2024 14:18:41 +0200 Maciej Żenczykowski wrote:
> The Android net tests
> (available at https://cs.android.com/android/platform/superproject/main/+/main:kernel/tests/net/test/
> more specifically multinetwork_test.py & neighbour_test.py)
> run via:
> /...aosp-tests.../net/test/run_net_test.sh --builder
> from within a 6.10-rc3 kernel tree are falling over due to a *plethora* of:
> TypeError: NLMsgHdr requires a bytes object of length 16, got 4
>
> The problems might be limited to RTM_GETROUTE and RTM_GETRULE and RTM_GETNEIGH,
> as various other netlink using xfrm tests appear to be okay...
>
> (note: 6.10-rc3 also fails to build for UML due to a buggy bpf change,
> but I sent out a 1-line fix for that already:
> https://patchwork.kernel.org/project/netdevbpf/patch/20240613112520.1526350-1-maze@google.com/
> )
>
> It is of course entirely possible the test code is buggy in how it
> parses netlink, but it has worked for years and years...
>
> Before I go trying to bisect this... anyone have any idea what might
> be the cause?
> Perhaps some sort of change to how these dumps work? Some sort of new
> netlink extended errors?

Take a look at commit 5b4b62a169e1 ("rtnetlink: make the "split"
NLM_DONE handling generic"), there may be more such workarounds missing.
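Note for readers: the commit title above is about where NLMSG_DONE lands at the end of a dump, either in a recv() buffer of its own ("split") or appended to the buffer that holds the last data messages. A parser that checks only the first header of each buffer for NLMSG_DONE copes with the first layout but not the second. The Python sketch below is not taken from the Android tests. It walks every buffer by nlmsg_len and stops at NLMSG_DONE wherever it appears. The names parse_buffer, read_dump and make_msg are hypothetical, and NLMSG_ERROR handling is left out for brevity.

```python
import struct

NLMSGHDR = struct.Struct("=IHHII")  # nlmsg_len, type, flags, seq, pid
NLMSG_DONE = 3
NLM_F_MULTI = 0x2


def nlmsg_align(length):
    """Round up to the 4-byte netlink alignment (NLMSG_ALIGNTO)."""
    return (length + 3) & ~3


def make_msg(msg_type, payload):
    """Build one padded netlink message (hypothetical test helper)."""
    length = NLMSGHDR.size + len(payload)
    padding = b"\0" * (nlmsg_align(length) - length)
    return NLMSGHDR.pack(length, msg_type, NLM_F_MULTI, 1, 0) + payload + padding


def parse_buffer(data):
    """Return (messages, done) for one recv() buffer.

    messages is a list of (type, payload) tuples. done becomes True as soon
    as NLMSG_DONE is seen, whether it is first in the buffer or follows data.
    """
    messages = []
    offset = 0
    while offset + NLMSGHDR.size <= len(data):
        length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(data, offset)
        if length < NLMSGHDR.size or offset + length > len(data):
            raise ValueError("truncated or malformed netlink message")
        if msg_type == NLMSG_DONE:
            return messages, True
        messages.append((msg_type, data[offset + NLMSGHDR.size:offset + length]))
        offset += nlmsg_align(length)
    return messages, False


def read_dump(recv):
    """Collect dump messages, calling recv() until NLMSG_DONE is seen."""
    out = []
    done = False
    while not done:
        data = recv()
        if not data:
            raise EOFError("socket returned no data before NLMSG_DONE")
        messages, done = parse_buffer(data)
        out.extend(messages)
    return out


# 24 is RTM_NEWROUTE; 12 zero bytes stand in for a struct rtmsg.
route = make_msg(24, b"\0" * 12)
done_msg = make_msg(NLMSG_DONE, struct.pack("=i", 0))

split_layout = [route + route, done_msg]    # NLMSG_DONE in its own buffer
merged_layout = [route + route + done_msg]  # NLMSG_DONE appended to the data

split_result = read_dump(iter(split_layout).__next__)
merged_result = read_dump(iter(merged_layout).__next__)
```

Both layouts yield the same two data messages, so a reader written this way does not depend on which one the kernel produces.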
* Re: Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
From: Maciej Żenczykowski @ 2024-06-13 14:21 UTC
To: Jakub Kicinski
Cc: Linux NetDev, Eric Dumazet, Paolo Abeni, David S. Miller

On Thu, Jun 13, 2024 at 3:29 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 13 Jun 2024 14:18:41 +0200 Maciej Żenczykowski wrote:
> > The Android net tests
> > (available at https://cs.android.com/android/platform/superproject/main/+/main:kernel/tests/net/test/
> > more specifically multinetwork_test.py & neighbour_test.py)
> > run via:
> > /...aosp-tests.../net/test/run_net_test.sh --builder
> > from within a 6.10-rc3 kernel tree are falling over due to a *plethora* of:
> > TypeError: NLMsgHdr requires a bytes object of length 16, got 4
> >
> > The problems might be limited to RTM_GETROUTE and RTM_GETRULE and RTM_GETNEIGH,
> > as various other netlink using xfrm tests appear to be okay...
> >
> > (note: 6.10-rc3 also fails to build for UML due to a buggy bpf change,
> > but I sent out a 1-line fix for that already:
> > https://patchwork.kernel.org/project/netdevbpf/patch/20240613112520.1526350-1-maze@google.com/
> > )
> >
> > It is of course entirely possible the test code is buggy in how it
> > parses netlink, but it has worked for years and years...
> >
> > Before I go trying to bisect this... anyone have any idea what might
> > be the cause?
> > Perhaps some sort of change to how these dumps work? Some sort of new
> > netlink extended errors?
>
> Take a look at commit 5b4b62a169e1 ("rtnetlink: make the "split"
> NLM_DONE handling generic"), there may be more such workarounds missing.

Ok, I sent out 2 patches adding the flag in 3 more spots that are
enough to get both tests working.

The first in RTM_GETNEIGH seems obvious enough.

$ git grep rtnl_register.*RTM_GETNEIGH,
net/core/neighbour.c:3894: rtnl_register(PF_UNSPEC, RTM_GETNEIGH, neigh_get, neigh_dump_info,
net/core/rtnetlink.c:6752: rtnl_register(PF_BRIDGE, RTM_GETNEIGH, rtnl_fdb_get, rtnl_fdb_dump, 0);
net/mctp/neigh.c:331: rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETNEIGH,

but there is also PF_BRIDGE and PF_MCTP... (though obviously the test
doesn't care)
(and also RTM_GETNEIGHTBL...)

The RTM_GETRULE portion of the second one seems fine too:

$ git grep rtnl_register.*RTM_GETRULE
net/core/fib_rules.c:1296: rtnl_register(PF_UNSPEC, RTM_GETRULE, NULL, fib_nl_dumprule,

but I'm less certain about the GET_ROUTE portion there-of... as
there's a lot of hits:

$ git grep rtnl_register.*RTM_GETROUTE
net/can/gw.c:1293: ret = rtnl_register_module(THIS_MODULE, PF_CAN, RTM_GETROUTE,
net/core/rtnetlink.c:6743: rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, 0);
net/ipv4/fib_frontend.c:1662: rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib,
net/ipv4/ipmr.c:3162: rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE,
net/ipv4/route.c:3696: rtnl_register(PF_INET, RTM_GETROUTE, inet_rtm_getroute, NULL,
net/ipv6/ip6_fib.c:2516: ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE, NULL,
net/ipv6/ip6mr.c:1394: err = rtnl_register_module(THIS_MODULE, RTNL_FAMILY_IP6MR, RTM_GETROUTE,
net/ipv6/route.c:6737: ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE,
net/mctp/route.c:1481: rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETROUTE,
net/mpls/af_mpls.c:2755: rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_GETROUTE,
net/phonet/pn_netlink.c:304: rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_GETROUTE,

It seems like maybe v4 and both mr's should be changed too?
* Re: Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
From: Jakub Kicinski @ 2024-06-13 14:59 UTC
To: Maciej Żenczykowski
Cc: Linux NetDev, Eric Dumazet, Paolo Abeni, David S. Miller

On Thu, 13 Jun 2024 16:21:15 +0200 Maciej Żenczykowski wrote:
> Ok, I sent out 2 patches adding the flag in 3 more spots that are
> enough to get both tests working.

Thanks!

> The first in RTM_GETNEIGH seems obvious enough.
>
> $ git grep rtnl_register.*RTM_GETNEIGH,
> net/core/neighbour.c:3894: rtnl_register(PF_UNSPEC, RTM_GETNEIGH, neigh_get, neigh_dump_info,
> net/core/rtnetlink.c:6752: rtnl_register(PF_BRIDGE, RTM_GETNEIGH, rtnl_fdb_get, rtnl_fdb_dump, 0);
> net/mctp/neigh.c:331: rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETNEIGH,
>
> but there is also PF_BRIDGE and PF_MCTP... (though obviously the test
> doesn't care)
> (and also RTM_GETNEIGHTBL...)

These weren't converted to the new way, so they will be okay.

> The RTM_GETRULE portion of the second one seems fine too:
>
> $ git grep rtnl_register.*RTM_GETRULE
> net/core/fib_rules.c:1296: rtnl_register(PF_UNSPEC, RTM_GETRULE, NULL, fib_nl_dumprule,
>
> but I'm less certain about the GET_ROUTE portion there-of... as
> there's a lot of hits:
>
> $ git grep rtnl_register.*RTM_GETROUTE
> net/can/gw.c:1293: ret = rtnl_register_module(THIS_MODULE, PF_CAN, RTM_GETROUTE,
> net/core/rtnetlink.c:6743: rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, 0);
> net/ipv4/fib_frontend.c:1662: rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib,
> net/ipv4/ipmr.c:3162: rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE,
> net/ipv4/route.c:3696: rtnl_register(PF_INET, RTM_GETROUTE, inet_rtm_getroute, NULL,
> net/ipv6/ip6_fib.c:2516: ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE, NULL,
> net/ipv6/ip6mr.c:1394: err = rtnl_register_module(THIS_MODULE, RTNL_FAMILY_IP6MR, RTM_GETROUTE,
> net/ipv6/route.c:6737: ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE,
> net/mctp/route.c:1481: rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETROUTE,
> net/mpls/af_mpls.c:2755: rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_GETROUTE,
> net/phonet/pn_netlink.c:304: rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_GETROUTE,
>
> It seems like maybe v4 and both mr's should be changed too?

Didn't check MR, the v4 route dump has the flag already, AFAICS.