* Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
@ 2024-06-13 12:18 Maciej Żenczykowski
  2024-06-13 13:29 ` Jakub Kicinski
  0 siblings, 1 reply; 4+ messages in thread
From: Maciej Żenczykowski @ 2024-06-13 12:18 UTC
  To: Linux NetDev, Eric Dumazet, Paolo Abeni, Jakub Kicinski,
	David S. Miller

The Android net tests
(available at https://cs.android.com/android/platform/superproject/main/+/main:kernel/tests/net/test/
more specifically multinetwork_test.py & neighbour_test.py)
run via:
  /...aosp-tests.../net/test/run_net_test.sh --builder
from within a 6.10-rc3 kernel tree are falling over due to a *plethora* of:
  TypeError: NLMsgHdr requires a bytes object of length 16, got 4

The problems might be limited to RTM_GETROUTE and RTM_GETRULE and RTM_GETNEIGH,
as various other netlink using xfrm tests appear to be okay...

(note: 6.10-rc3 also fails to build for UML due to a buggy bpf change,
but I sent out a 1-line fix for that already:
https://patchwork.kernel.org/project/netdevbpf/patch/20240613112520.1526350-1-maze@google.com/
)

It is of course entirely possible the test code is buggy in how it
parses netlink, but it has worked for years and years...

Before I go trying to bisect this... anyone have any idea what might
be the cause?
Perhaps some sort of change to how these dumps work? Some sort of new
netlink extended errors?

- Maciej


* Re: Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
From: Jakub Kicinski @ 2024-06-13 13:29 UTC
  To: Maciej Żenczykowski
  Cc: Linux NetDev, Eric Dumazet, Paolo Abeni, David S. Miller

On Thu, 13 Jun 2024 14:18:41 +0200 Maciej Żenczykowski wrote:
> The Android net tests
> (available at https://cs.android.com/android/platform/superproject/main/+/main:kernel/tests/net/test/
> more specifically multinetwork_test.py & neighbour_test.py)
> run via:
>   /...aosp-tests.../net/test/run_net_test.sh --builder
> from within a 6.10-rc3 kernel tree are falling over due to a *plethora* of:
>   TypeError: NLMsgHdr requires a bytes object of length 16, got 4
> 
> The problems might be limited to RTM_GETROUTE and RTM_GETRULE and RTM_GETNEIGH,
> as various other netlink using xfrm tests appear to be okay...
> 
> (note: 6.10-rc3 also fails to build for UML due to a buggy bpf change,
> but I sent out a 1-line fix for that already:
> https://patchwork.kernel.org/project/netdevbpf/patch/20240613112520.1526350-1-maze@google.com/
> )
> 
> It is of course entirely possible the test code is buggy in how it
> parses netlink, but it has worked for years and years...
> 
> Before I go trying to bisect this... anyone have any idea what might
> be the cause?
> Perhaps some sort of change to how these dumps work? Some sort of new
> netlink extended errors?

Take a look at commit 5b4b62a169e1 ("rtnetlink: make the "split"
NLM_DONE handling generic"), there may be more such workarounds missing. 
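
For readers following along: the "split" NLM_DONE workaround exists for userspace that expects the final NLMSG_DONE of a dump to arrive in its own recv() buffer. A minimal Python sketch of a parser that does not depend on that is below. It is not the Android test code; the names `parse_dump_buffer` and `make_msg` are made up for illustration, and whether this is exactly what tripped the NLMsgHdr error is an assumption. NLMSG_DONE is a 16-byte header plus a 4-byte status, so the parser walks by nlmsg_len instead of assuming one message per buffer.

```python
import struct

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSG_HDR = struct.Struct("=IHHII")  # 16 bytes
NLMSG_DONE = 3
NLM_F_MULTI = 0x2
RTM_NEWROUTE = 24  # used only to build the synthetic demo buffer


def nlmsg_align(length):
    """Round up to the 4-byte netlink alignment."""
    return (length + 3) & ~3


def make_msg(msg_type, payload, seq=1):
    """Build one synthetic netlink message (hypothetical helper for the demo)."""
    length = NLMSG_HDR.size + len(payload)
    padding = b"\x00" * (nlmsg_align(length) - length)
    return NLMSG_HDR.pack(length, msg_type, NLM_F_MULTI, seq, 0) + payload + padding


def parse_dump_buffer(buf):
    """Parse one recv() buffer of a dump.

    Returns (messages, done). NLMSG_DONE may be the only message in the
    buffer or may follow data messages in the same buffer; both work.
    """
    messages = []
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            raise ValueError("truncated or malformed netlink message")
        if msg_type == NLMSG_DONE:
            return messages, True
        messages.append((msg_type, buf[offset + NLMSG_HDR.size:offset + length]))
        offset += nlmsg_align(length)
    return messages, False


if __name__ == "__main__":
    route = make_msg(RTM_NEWROUTE, b"\x00" * 12)
    done = make_msg(NLMSG_DONE, struct.pack("=i", 0))
    print(parse_dump_buffer(route + done))  # DONE shares the buffer
    print(parse_dump_buffer(done))          # DONE in its own buffer
```

A caller would keep calling recv() and feeding each buffer to `parse_dump_buffer` until it reports done, regardless of how the kernel groups the messages.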


* Re: Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
From: Maciej Żenczykowski @ 2024-06-13 14:21 UTC
  To: Jakub Kicinski; +Cc: Linux NetDev, Eric Dumazet, Paolo Abeni, David S. Miller

On Thu, Jun 13, 2024 at 3:29 PM Jakub Kicinski <kuba@kernel.org> wrote:
> Take a look at commit 5b4b62a169e1 ("rtnetlink: make the "split"
> NLM_DONE handling generic"), there may be more such workarounds missing.

Ok, I sent out 2 patches adding the flag in 3 more spots that are
enough to get both tests working.

The first in RTM_GETNEIGH seems obvious enough.

$ git grep rtnl_register.*RTM_GETNEIGH,
net/core/neighbour.c:3894:      rtnl_register(PF_UNSPEC, RTM_GETNEIGH, neigh_get, neigh_dump_info,
net/core/rtnetlink.c:6752:      rtnl_register(PF_BRIDGE, RTM_GETNEIGH, rtnl_fdb_get, rtnl_fdb_dump, 0);
net/mctp/neigh.c:331:   rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETNEIGH,

but there is also PF_BRIDGE and PF_MCTP... (though obviously the test
doesn't care)
(and also RTM_GETNEIGHTBL...)

The RTM_GETRULE portion of the second one seems fine too:

$ git grep rtnl_register.*RTM_GETRULE
net/core/fib_rules.c:1296:      rtnl_register(PF_UNSPEC, RTM_GETRULE, NULL, fib_nl_dumprule,

but I'm less certain about the RTM_GETROUTE portion thereof, as
there are a lot of hits:

$ git grep rtnl_register.*RTM_GETROUTE
net/can/gw.c:1293:      ret = rtnl_register_module(THIS_MODULE, PF_CAN, RTM_GETROUTE,
net/core/rtnetlink.c:6743:      rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, 0);
net/ipv4/fib_frontend.c:1662:   rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib,
net/ipv4/ipmr.c:3162:   rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE,
net/ipv4/route.c:3696:  rtnl_register(PF_INET, RTM_GETROUTE, inet_rtm_getroute, NULL,
net/ipv6/ip6_fib.c:2516:        ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE, NULL,
net/ipv6/ip6mr.c:1394:  err = rtnl_register_module(THIS_MODULE, RTNL_FAMILY_IP6MR, RTM_GETROUTE,
net/ipv6/route.c:6737:  ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE,
net/mctp/route.c:1481:  rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETROUTE,
net/mpls/af_mpls.c:2755:        rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_GETROUTE,
net/phonet/pn_netlink.c:304:    rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_GETROUTE,

It seems like maybe v4 and both mr's should be changed too?
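
Since the grep hits above cut each call off before its flags argument, a small script can list the full registrations instead. This is a rough Python sketch, not an existing kernel tool. It assumes the flag added by the commit mentioned earlier is named RTNL_FLAG_DUMP_SPLIT_NLM_DONE and that the calls take (protocol, msgtype, doit, dumpit, flags), with a leading owner argument for rtnl_register_module(). `find_registrations` and the RTM_GETFOO/BAR/BAZ sample are made up for illustration.

```python
import re

# Assumed flag name; check include/net/rtnetlink.h in the tree being audited.
SPLIT_FLAG = "RTNL_FLAG_DUMP_SPLIT_NLM_DONE"

# Match a whole call up to the closing ");", even when it spans several lines.
CALL_RE = re.compile(r"\b(rtnl_register(?:_module)?)\s*\((.*?)\)\s*;", re.DOTALL)


def find_registrations(source_text):
    """Return one dict per rtnl_register()/rtnl_register_module() call."""
    results = []
    for match in CALL_RE.finditer(source_text):
        func, arg_text = match.groups()
        # Collapse the whitespace left over from continuation lines.
        args = [" ".join(arg.split()) for arg in arg_text.split(",")]
        if func == "rtnl_register_module":
            args = args[1:]  # drop the owner argument (THIS_MODULE)
        if len(args) != 5:
            continue  # not the shape this sketch understands
        protocol, msgtype, doit, dumpit, flags = args
        results.append({
            "protocol": protocol,
            "msgtype": msgtype,
            "doit": doit,
            "dumpit": dumpit,
            "flags": flags,
            "split_done": SPLIT_FLAG in flags,
        })
    return results


# Synthetic input; these are not real kernel registrations.
SAMPLE = """
	rtnl_register(PF_UNSPEC, RTM_GETFOO, foo_get, foo_dump,
		      RTNL_FLAG_DUMP_UNLOCKED | RTNL_FLAG_DUMP_SPLIT_NLM_DONE);
	rtnl_register(PF_UNSPEC, RTM_GETBAR, NULL, bar_dump, 0);
	ret = rtnl_register_module(THIS_MODULE, PF_BAZ, RTM_GETBAZ,
				   baz_get, NULL, 0);
"""

if __name__ == "__main__":
    for reg in find_registrations(SAMPLE):
        if reg["dumpit"] != "NULL" and not reg["split_done"]:
            print("dump without split flag:", reg["protocol"], reg["msgtype"])
```

A hit from this script is only a candidate: it cannot tell whether a given dump handler actually changed behaviour, so each one still needs a look at the handler itself.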


* Re: Some sort of netlink RTM_GET(ROUTE|RULE|NEIGH) regression(?) in 6.10-rc3 vs 6.9
From: Jakub Kicinski @ 2024-06-13 14:59 UTC
  To: Maciej Żenczykowski
  Cc: Linux NetDev, Eric Dumazet, Paolo Abeni, David S. Miller

On Thu, 13 Jun 2024 16:21:15 +0200 Maciej Żenczykowski wrote:
> Ok, I sent out 2 patches adding the flag in 3 more spots that are
> enough to get both tests working.

Thanks!

> The first in RTM_GETNEIGH seems obvious enough.
> 
> $ git grep rtnl_register.*RTM_GETNEIGH,
> net/core/neighbour.c:3894:      rtnl_register(PF_UNSPEC, RTM_GETNEIGH, neigh_get, neigh_dump_info,
> net/core/rtnetlink.c:6752:      rtnl_register(PF_BRIDGE, RTM_GETNEIGH, rtnl_fdb_get, rtnl_fdb_dump, 0);
> net/mctp/neigh.c:331:   rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETNEIGH,
> 
> but there is also PF_BRIDGE and PF_MCTP... (though obviously the test
> doesn't care)
> (and also RTM_GETNEIGHTBL...)

These weren't converted to the new way, so they will be okay.

> The RTM_GETRULE portion of the second one seems fine too:
> 
> $ git grep rtnl_register.*RTM_GETRULE
> net/core/fib_rules.c:1296:      rtnl_register(PF_UNSPEC, RTM_GETRULE, NULL, fib_nl_dumprule,
> 
> but I'm less certain about the RTM_GETROUTE portion thereof, as
> there are a lot of hits:
> 
> $ git grep rtnl_register.*RTM_GETROUTE
> net/can/gw.c:1293:      ret = rtnl_register_module(THIS_MODULE, PF_CAN, RTM_GETROUTE,
> net/core/rtnetlink.c:6743:      rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, 0);
> net/ipv4/fib_frontend.c:1662:   rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib,
> net/ipv4/ipmr.c:3162:   rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE,
> net/ipv4/route.c:3696:  rtnl_register(PF_INET, RTM_GETROUTE, inet_rtm_getroute, NULL,
> net/ipv6/ip6_fib.c:2516:        ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE, NULL,
> net/ipv6/ip6mr.c:1394:  err = rtnl_register_module(THIS_MODULE, RTNL_FAMILY_IP6MR, RTM_GETROUTE,
> net/ipv6/route.c:6737:  ret = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETROUTE,
> net/mctp/route.c:1481:  rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETROUTE,
> net/mpls/af_mpls.c:2755:        rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_GETROUTE,
> net/phonet/pn_netlink.c:304:    rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_GETROUTE,
> 
> It seems like maybe v4 and both mr's should be changed too?

I didn't check MR; the v4 route dump already has the flag, AFAICS.
