* [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features [not found] <20260310-offload_compute-v1-0-3df79c09ea65@gmail.com> @ 2026-03-10 17:02 ` syzbot ci 2026-03-10 19:17 ` Sabrina Dubroca 0 siblings, 1 reply; 10+ messages in thread From: syzbot ci @ 2026-03-10 17:02 UTC (permalink / raw) To: andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba, linux-kernel, liuhangbin, netdev, pabeni, razor, sridhar.samudrala Cc: syzbot, syzkaller-bugs syzbot ci has tested the following series [v1] net: move netdev_compute_master_upper_features to ndo_set_features https://lore.kernel.org/all/20260310-offload_compute-v1-0-3df79c09ea65@gmail.com * [PATCH net-next 1/3] net: use ndo_set_features to set offload features for bonding/bridge/team * [PATCH net-next 2/3] failover: use ndo_set_features for failover offload compute * [PATCH net-next 3/3] net: no need to disable LRO specifically and found the following issue: WARNING in rtmsg_ifinfo_build_skb Full report is available here: https://ci.syzbot.org/series/d5001c4a-a51e-49c1-9106-624836e43ec2 *** WARNING in rtmsg_ifinfo_build_skb tree: net-next URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git base: 52ede1bce557c66309f41ac29dd190be23ca9129 arch: amd64 compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8 config: https://ci.syzbot.org/builds/281e7e02-f6af-4b4a-845e-d1d5842a9301/config batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active hsr_slave_0: entered promiscuous mode hsr_slave_1: entered promiscuous mode ------------[ cut here ]------------ err == -EMSGSIZE WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496 Modules linked in: CPU: 0 UID: 0 PID: 6496 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:rtmsg_ifinfo_build_skb+0x218/0x260 Code: f6 ba 01 00 00 00 89 e9 e8 25 15 3a 00 4c 89 f0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d e9 7f 3a 2e 02 cc e8 49 3b 42 f8 90 <0f> 0b 90 eb 90 89 d9 80 e1 07 fe c1 38 c1 0f 8c 95 fe ff ff 48 89 RSP: 0018:ffffc9000637e9a0 EFLAGS: 00010293 RAX: ffffffff89835e27 RBX: 0000000000000000 RCX: ffff8881b80a57c0 RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6 RBP: 00000000ffffffa6 R08: 0000000000000004 R09: 0000000000000004 R10: fffff52000c6fcdc R11: 0000000000000000 R12: 1ffff110235ddc21 R13: 0000000000000000 R14: ffff8881133dc780 R15: ffff88811aeee000 FS: 0000555557c4a500(0000) GS:ffff88818de65000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055555e1838c8 CR3: 0000000168b80000 CR4: 00000000000006f0 Call Trace: <TASK> rtnetlink_event+0x1b7/0x270 notifier_call_chain+0x1be/0x400 netdev_change_features+0x95/0xe0 __netdev_upper_dev_link+0xb20/0xc80 netdev_upper_dev_link+0xb0/0x100 macsec_newlink+0xb11/0x1200 rtnl_newlink_create+0x329/0xb70 rtnl_newlink+0x1666/0x1be0 rtnetlink_rcv_msg+0x7d5/0xbe0 netlink_rcv_skb+0x232/0x4b0 netlink_unicast+0x80f/0x9b0 netlink_sendmsg+0x813/0xb40 __sys_sendto+0x672/0x710 __x64_sys_sendto+0xde/0x100 do_syscall_64+0x14d/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ffadf757917 Code: 48 89 fa 4c 89 df e8 a8 56 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff RSP: 002b:00007ffd0eb79f80 EFLAGS: 00000202 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000555557c4a500 RCX: 00007ffadf757917 RDX: 0000000000000054 RSI: 00007ffae0544670 RDI: 0000000000000003 RBP: 0000000000000001 R08: 00007ffd0eb79fe4 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003 R13: 0000000000000000 R14: 00007ffae0544670 R15: 0000000000000000 </TASK> *** If these findings have caused you to resend the series or submit a separate fix, please add the following tag to your commit message: Tested-by: syzbot@syzkaller.appspotmail.com --- This report is generated by a bot. It may contain errors. syzbot ci engineers can be reached at syzkaller@googlegroups.com. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features 2026-03-10 17:02 ` [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features syzbot ci @ 2026-03-10 19:17 ` Sabrina Dubroca 2026-03-11 0:47 ` Hangbin Liu 0 siblings, 1 reply; 10+ messages in thread From: Sabrina Dubroca @ 2026-03-10 19:17 UTC (permalink / raw) To: syzbot ci Cc: andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba, linux-kernel, liuhangbin, netdev, pabeni, razor, sridhar.samudrala, syzbot, syzkaller-bugs 2026-03-10, 10:02:09 -0700, syzbot ci wrote: > batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active > hsr_slave_0: entered promiscuous mode > hsr_slave_1: entered promiscuous mode > ------------[ cut here ]------------ > err == -EMSGSIZE > WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496 I'm not sure this one is caused by this series, but either way, reviewing if_nlmsg_size/rtnl_fill_ifinfo for mismatches is really unpleasant :/ Things I see in rtnl_fill_ifinfo but don't find in if_nlmsg_size: - IFLA_PARENT_DEV_NAME - IFLA_PARENT_DEV_BUS_NAME (both from 00e77ed8e64d ("rtnetlink: add IFLA_PARENT_[DEV|DEV_BUS]_NAME"), which doesn't include a change to if_nlmsg_size) - rtnl_link_slave_info_fill also outputs IFLA_INFO_SLAVE_KIND + the IFLA_INFO_SLAVE_DATA nest, but rtnl_link_get_slave_info_data_size only counts the nest, and its caller (rtnl_link_get_size) doesn't have anything more about the slave info. This may be what syzbot is tripping on here. But there's a + nla_total_size(4) /* IFLA_WEIGHT */ that doesn't get filled anywhere. > Modules linked in: > CPU: 0 UID: 0 PID: 6496 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full) > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > RIP: 0010:rtmsg_ifinfo_build_skb+0x218/0x260 > Code: f6 ba 01 00 00 00 89 e9 e8 25 15 3a 00 4c 89 f0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d e9 7f 3a 2e 02 cc e8 49 3b 42 f8 90 <0f> 0b 90 eb 90 89 d9 80 e1 07 fe c1 38 c1 0f 8c 95 fe ff ff 48 89 > RSP: 0018:ffffc9000637e9a0 EFLAGS: 00010293 > RAX: ffffffff89835e27 RBX: 0000000000000000 RCX: ffff8881b80a57c0 > RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6 > RBP: 00000000ffffffa6 R08: 0000000000000004 R09: 0000000000000004 > R10: fffff52000c6fcdc R11: 0000000000000000 R12: 1ffff110235ddc21 > R13: 0000000000000000 R14: ffff8881133dc780 R15: ffff88811aeee000 > FS: 0000555557c4a500(0000) GS:ffff88818de65000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000055555e1838c8 CR3: 0000000168b80000 CR4: 00000000000006f0 > Call Trace: > <TASK> > rtnetlink_event+0x1b7/0x270 > notifier_call_chain+0x1be/0x400 > netdev_change_features+0x95/0xe0 > __netdev_upper_dev_link+0xb20/0xc80 > netdev_upper_dev_link+0xb0/0x100 > macsec_newlink+0xb11/0x1200 > rtnl_newlink_create+0x329/0xb70 > rtnl_newlink+0x1666/0x1be0 -- Sabrina ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features 2026-03-10 19:17 ` Sabrina Dubroca @ 2026-03-11 0:47 ` Hangbin Liu 2026-03-11 21:18 ` Sabrina Dubroca 0 siblings, 1 reply; 10+ messages in thread From: Hangbin Liu @ 2026-03-11 0:47 UTC (permalink / raw) To: Sabrina Dubroca Cc: syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba, linux-kernel, netdev, pabeni, razor, sridhar.samudrala, syzbot, syzkaller-bugs On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote: > 2026-03-10, 10:02:09 -0700, syzbot ci wrote: > > batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active > > hsr_slave_0: entered promiscuous mode > > hsr_slave_1: entered promiscuous mode > > ------------[ cut here ]------------ > > err == -EMSGSIZE > > WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496 > > I'm not sure this one is caused by this series, but either way, rtnetlink_event+0x1b7/0x270 notifier_call_chain+0x1be/0x400 netdev_change_features+0x95/0xe0 __netdev_upper_dev_link+0xb20/0xc80 netdev_upper_dev_link+0xb0/0x100 This patch calls netdev_change_features() after __netdev_upper_dev_link(), Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event() to fill the new link info. Maybe the event is a bit early and macsec has data not ready? Thanks Hangbin > reviewing if_nlmsg_size/rtnl_fill_ifinfo for mismatches is really > unpleasant :/ > > Things I see in rtnl_fill_ifinfo but don't find in if_nlmsg_size: > - IFLA_PARENT_DEV_NAME > - IFLA_PARENT_DEV_BUS_NAME > (both from 00e77ed8e64d ("rtnetlink: add > IFLA_PARENT_[DEV|DEV_BUS]_NAME"), which doesn't include a change to > if_nlmsg_size) > - rtnl_link_slave_info_fill also outputs IFLA_INFO_SLAVE_KIND + the > IFLA_INFO_SLAVE_DATA nest, but rtnl_link_get_slave_info_data_size > only counts the nest, and its caller (rtnl_link_get_size) doesn't > have anything more about the slave info. This may be what syzbot is > tripping on here. > > > But there's a > > + nla_total_size(4) /* IFLA_WEIGHT */ > > that doesn't get filled anywhere. > > > > Modules linked in: > > CPU: 0 UID: 0 PID: 6496 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full) > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > > RIP: 0010:rtmsg_ifinfo_build_skb+0x218/0x260 > > Code: f6 ba 01 00 00 00 89 e9 e8 25 15 3a 00 4c 89 f0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d e9 7f 3a 2e 02 cc e8 49 3b 42 f8 90 <0f> 0b 90 eb 90 89 d9 80 e1 07 fe c1 38 c1 0f 8c 95 fe ff ff 48 89 > > RSP: 0018:ffffc9000637e9a0 EFLAGS: 00010293 > > RAX: ffffffff89835e27 RBX: 0000000000000000 RCX: ffff8881b80a57c0 > > RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6 > > RBP: 00000000ffffffa6 R08: 0000000000000004 R09: 0000000000000004 > > R10: fffff52000c6fcdc R11: 0000000000000000 R12: 1ffff110235ddc21 > > R13: 0000000000000000 R14: ffff8881133dc780 R15: ffff88811aeee000 > > FS: 0000555557c4a500(0000) GS:ffff88818de65000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000055555e1838c8 CR3: 0000000168b80000 CR4: 00000000000006f0 > > Call Trace: > > <TASK> > > rtnetlink_event+0x1b7/0x270 > > notifier_call_chain+0x1be/0x400 > > netdev_change_features+0x95/0xe0 > > __netdev_upper_dev_link+0xb20/0xc80 > > netdev_upper_dev_link+0xb0/0x100 > > macsec_newlink+0xb11/0x1200 > > rtnl_newlink_create+0x329/0xb70 > > rtnl_newlink+0x1666/0x1be0 > > -- > Sabrina ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features 2026-03-11 0:47 ` Hangbin Liu @ 2026-03-11 21:18 ` Sabrina Dubroca 2026-03-12 9:40 ` Paolo Abeni 0 siblings, 1 reply; 10+ messages in thread From: Sabrina Dubroca @ 2026-03-11 21:18 UTC (permalink / raw) To: Hangbin Liu Cc: syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba, linux-kernel, netdev, pabeni, razor, sridhar.samudrala, syzbot, syzkaller-bugs 2026-03-11, 00:47:41 +0000, Hangbin Liu wrote: > On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote: > > 2026-03-10, 10:02:09 -0700, syzbot ci wrote: > > > batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active > > > hsr_slave_0: entered promiscuous mode > > > hsr_slave_1: entered promiscuous mode > > > ------------[ cut here ]------------ > > > err == -EMSGSIZE > > > WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496 > > > > I'm not sure this one is caused by this series, but either way, > > > rtnetlink_event+0x1b7/0x270 > notifier_call_chain+0x1be/0x400 > netdev_change_features+0x95/0xe0 > __netdev_upper_dev_link+0xb20/0xc80 > netdev_upper_dev_link+0xb0/0x100 > > > This patch calls netdev_change_features() after __netdev_upper_dev_link(), > Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event() > to fill the new link info. Maybe the event is a bit early and macsec has > data not ready? But this would still mean that there's a mismatch between if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only revealing it. I'll send fixes for the stuff I mentioned, no idea if that's what syzbot saw since we don't have a repro. -- Sabrina ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features 2026-03-11 21:18 ` Sabrina Dubroca @ 2026-03-12 9:40 ` Paolo Abeni 2026-03-12 11:13 ` Sabrina Dubroca 0 siblings, 1 reply; 10+ messages in thread From: Paolo Abeni @ 2026-03-12 9:40 UTC (permalink / raw) To: Sabrina Dubroca, Hangbin Liu Cc: syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba, linux-kernel, netdev, razor, sridhar.samudrala, syzbot, syzkaller-bugs On 3/11/26 10:18 PM, Sabrina Dubroca wrote: > 2026-03-11, 00:47:41 +0000, Hangbin Liu wrote: >> On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote: >>> 2026-03-10, 10:02:09 -0700, syzbot ci wrote: >>>> batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active >>>> hsr_slave_0: entered promiscuous mode >>>> hsr_slave_1: entered promiscuous mode >>>> ------------[ cut here ]------------ >>>> err == -EMSGSIZE >>>> WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496 >>> >>> I'm not sure this one is caused by this series, but either way, >> >> >> rtnetlink_event+0x1b7/0x270 >> notifier_call_chain+0x1be/0x400 >> netdev_change_features+0x95/0xe0 >> __netdev_upper_dev_link+0xb20/0xc80 >> netdev_upper_dev_link+0xb0/0x100 >> >> >> This patch calls netdev_change_features() after __netdev_upper_dev_link(), >> Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event() >> to fill the new link info. Maybe the event is a bit early and macsec has >> data not ready? > > But this would still mean that there's a mismatch between > if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only > revealing it. > > I'll send fixes for the stuff I mentioned, no idea if that's what > syzbot saw since we don't have a repro. It looks like even the nipa CI is reproducing the issue, i.e.: https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-dbg/results/554921/17-rtnetlink-sh/ more failures here: https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2026-03-12--06-00&pw-n=0&pass=0 the fail in mascsec-offload looks quite deterministic, could you please have a look? /P ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features 2026-03-12 9:40 ` Paolo Abeni @ 2026-03-12 11:13 ` Sabrina Dubroca 2026-03-12 14:34 ` Hangbin Liu 0 siblings, 1 reply; 10+ messages in thread From: Sabrina Dubroca @ 2026-03-12 11:13 UTC (permalink / raw) To: Paolo Abeni Cc: Hangbin Liu, syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba, linux-kernel, netdev, razor, sridhar.samudrala, syzbot, syzkaller-bugs 2026-03-12, 10:40:43 +0100, Paolo Abeni wrote: > On 3/11/26 10:18 PM, Sabrina Dubroca wrote: > > 2026-03-11, 00:47:41 +0000, Hangbin Liu wrote: > >> On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote: > >>> 2026-03-10, 10:02:09 -0700, syzbot ci wrote: > >>>> batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active > >>>> hsr_slave_0: entered promiscuous mode > >>>> hsr_slave_1: entered promiscuous mode > >>>> ------------[ cut here ]------------ > >>>> err == -EMSGSIZE > >>>> WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496 > >>> > >>> I'm not sure this one is caused by this series, but either way, > >> > >> > >> rtnetlink_event+0x1b7/0x270 > >> notifier_call_chain+0x1be/0x400 > >> netdev_change_features+0x95/0xe0 > >> __netdev_upper_dev_link+0xb20/0xc80 > >> netdev_upper_dev_link+0xb0/0x100 > >> > >> > >> This patch calls netdev_change_features() after __netdev_upper_dev_link(), > >> Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event() > >> to fill the new link info. Maybe the event is a bit early and macsec has > >> data not ready? > > > > But this would still mean that there's a mismatch between > > if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only > > revealing it. > > > > I'll send fixes for the stuff I mentioned, no idea if that's what > > syzbot saw since we don't have a repro. > > It looks like even the nipa CI is reproducing the issue, i.e.: > > https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-dbg/results/554921/17-rtnetlink-sh/ > > more failures here: > > https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2026-03-12--06-00&pw-n=0&pass=0 > > the fail in mascsec-offload looks quite deterministic, could you please > have a look? Ah crap, sorry Hangbin, you were right. macsec_fill_info() returns -EMSGSIZE when the key length is unexpected, and at this point we haven't set it to its proper value yet. Bandaid solution would be: diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index f6cad0746a02..0f7ef7bfbdde 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -4337,7 +4337,7 @@ static int macsec_fill_info(struct sk_buff *skb, csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 : MACSEC_CIPHER_ID_GCM_AES_256; break; default: - goto nla_put_failure; + return 0; } if (nla_put_sci(skb, IFLA_MACSEC_SCI, secy->sci, Proper fix (so that the notification we're sending during upper_dev_link has full linkinfo) would be to move netdev_upper_dev_link() to after macsec_changelink_common() and fix up the error handling. I don't see anything in macsec_add_dev or macsec_changelink_common that needs the device to be linked. But anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE on invalid data, so the "bandaid" should be included as well. Should this be part of this series (either just the "bandaid" or the "proper fix"+bandaid), since we never saw a problem before? -- Sabrina ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features 2026-03-12 11:13 ` Sabrina Dubroca @ 2026-03-12 14:34 ` Hangbin Liu 2026-03-12 15:58 ` Sabrina Dubroca 0 siblings, 1 reply; 10+ messages in thread From: Hangbin Liu @ 2026-03-12 14:34 UTC (permalink / raw) To: Sabrina Dubroca Cc: Paolo Abeni, syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba, linux-kernel, netdev, razor, sridhar.samudrala, syzbot, syzkaller-bugs On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote: > > >> This patch calls netdev_change_features() after __netdev_upper_dev_link(), > > >> Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event() > > >> to fill the new link info. Maybe the event is a bit early and macsec has > > >> data not ready? > > > > > > But this would still mean that there's a mismatch between > > > if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only > > > revealing it. > > > > > > I'll send fixes for the stuff I mentioned, no idea if that's what > > > syzbot saw since we don't have a repro. > > > > It looks like even the nipa CI is reproducing the issue, i.e.: > > > > https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-dbg/results/554921/17-rtnetlink-sh/ > > > > more failures here: > > > > https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2026-03-12--06-00&pw-n=0&pass=0 > > > > the fail in mascsec-offload looks quite deterministic, could you please > > have a look? > > Ah crap, sorry Hangbin, you were right. macsec_fill_info() returns > -EMSGSIZE when the key length is unexpected, and at this point we > haven't set it to its proper value yet. > > Bandaid solution would be: > > diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c > index f6cad0746a02..0f7ef7bfbdde 100644 > --- a/drivers/net/macsec.c > +++ b/drivers/net/macsec.c > @@ -4337,7 +4337,7 @@ static int macsec_fill_info(struct sk_buff *skb, > csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 : MACSEC_CIPHER_ID_GCM_AES_256; > break; > default: > - goto nla_put_failure; > + return 0; > } > > if (nla_put_sci(skb, IFLA_MACSEC_SCI, secy->sci, > > > Proper fix (so that the notification we're sending during > upper_dev_link has full linkinfo) would be to move > netdev_upper_dev_link() to after macsec_changelink_common() and fix up > the error handling. I don't see anything in macsec_add_dev or > macsec_changelink_common that needs the device to be linked. But If we move the netdev_upper_dev_link() after macsec_changelink_common(), we will not goto nla_put_failure via default, right? > anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE > on invalid data, so the "bandaid" should be included as well. > > Should this be part of this series (either just the "bandaid" or the > "proper fix"+bandaid), since we never saw a problem before? Since macsec need the "bandaid" fix either way. How about you post the "bandaid" fix to net. And I include the "proper fix" in this series for net-next? BTW, here is my draft patch, would you like to review it first? diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index f6cad0746a02..1f8da4c291c6 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -4161,10 +4161,6 @@ static int macsec_newlink(struct net_device *dev, lockdep_set_class(&dev->addr_list_lock, &macsec_netdev_addr_lock_key); - err = netdev_upper_dev_link(real_dev, dev, extack); - if (err < 0) - goto unregister; - /* need to be already registered so that ->init has run and * the MAC addr is set */ @@ -4177,12 +4173,12 @@ static int macsec_newlink(struct net_device *dev, if (rx_handler && sci_exists(real_dev, sci)) { err = -EBUSY; - goto unlink; + goto unregister; } err = macsec_add_dev(dev, sci, icv_len); if (err) - goto unlink; + goto unregister; if (data) { err = macsec_changelink_common(dev, data); @@ -4190,6 +4186,10 @@ static int macsec_newlink(struct net_device *dev, goto del_dev; } + err = netdev_upper_dev_link(real_dev, dev, extack); + if (err < 0) + goto unlink; + /* If h/w offloading is available, propagate to the device */ if (macsec_is_offloaded(macsec)) { const struct macsec_ops *ops; @@ -4200,7 +4200,7 @@ static int macsec_newlink(struct net_device *dev, ctx.secy = &macsec->secy; err = macsec_offload(ops->mdo_add_secy, &ctx); if (err) - goto del_dev; + goto unlink; macsec->insert_tx_tag = macsec_needs_tx_tag(macsec, ops); @@ -4209,7 +4209,7 @@ static int macsec_newlink(struct net_device *dev, err = register_macsec_dev(real_dev, dev); if (err < 0) - goto del_dev; + goto unlink; netdev_update_features(dev); netif_stacked_transfer_operstate(real_dev, dev); @@ -4219,10 +4219,10 @@ static int macsec_newlink(struct net_device *dev, return 0; -del_dev: - macsec_del_dev(macsec); unlink: netdev_upper_dev_unlink(real_dev, dev); +del_dev: + macsec_del_dev(macsec); unregister: unregister_netdevice(dev); return err; Thanks Hangbin ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features 2026-03-12 14:34 ` Hangbin Liu @ 2026-03-12 15:58 ` Sabrina Dubroca 2026-03-12 16:47 ` Paolo Abeni 0 siblings, 1 reply; 10+ messages in thread From: Sabrina Dubroca @ 2026-03-12 15:58 UTC (permalink / raw) To: Hangbin Liu Cc: Paolo Abeni, syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba, linux-kernel, netdev, razor, sridhar.samudrala, syzbot, syzkaller-bugs 2026-03-12, 14:34:44 +0000, Hangbin Liu wrote: > On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote: > > Proper fix (so that the notification we're sending during > > upper_dev_link has full linkinfo) would be to move > > netdev_upper_dev_link() to after macsec_changelink_common() and fix up > > the error handling. I don't see anything in macsec_add_dev or > > macsec_changelink_common that needs the device to be linked. But > > If we move the netdev_upper_dev_link() after macsec_changelink_common(), > we will not goto nla_put_failure via default, right? Yes. > > anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE > > on invalid data, so the "bandaid" should be included as well. > > > > Should this be part of this series (either just the "bandaid" or the > > "proper fix"+bandaid), since we never saw a problem before? > > Since macsec need the "bandaid" fix either way. How about you post the > "bandaid" fix to net. And I include the "proper fix" in this series for > net-next? But I don't think it's needed in net. Am I missing a codepath (before your series) where macsec_fill_info could be called for the new device before macsec_newlink returns? If not, it doesn't really qualify as a fix, that's why I was asking Paolo. > BTW, here is my draft patch, would you like to review it first? > > diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c > index f6cad0746a02..1f8da4c291c6 100644 > --- a/drivers/net/macsec.c > +++ b/drivers/net/macsec.c > @@ -4161,10 +4161,6 @@ static int macsec_newlink(struct net_device *dev, > lockdep_set_class(&dev->addr_list_lock, > &macsec_netdev_addr_lock_key); > > - err = netdev_upper_dev_link(real_dev, dev, extack); > - if (err < 0) > - goto unregister; > - > /* need to be already registered so that ->init has run and > * the MAC addr is set > */ > @@ -4177,12 +4173,12 @@ static int macsec_newlink(struct net_device *dev, > > if (rx_handler && sci_exists(real_dev, sci)) { > err = -EBUSY; > - goto unlink; > + goto unregister; > } > > err = macsec_add_dev(dev, sci, icv_len); > if (err) > - goto unlink; > + goto unregister; > > if (data) { > err = macsec_changelink_common(dev, data); > @@ -4190,6 +4186,10 @@ static int macsec_newlink(struct net_device *dev, > goto del_dev; > } > > + err = netdev_upper_dev_link(real_dev, dev, extack); > + if (err < 0) > + goto unlink; This one shouldn't be "goto unlink" since we haven't linked yet. I'm not sure netdev_upper_dev_unlink can handle "not linked yet" devices. Otherwise this looks ok. -- Sabrina ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features 2026-03-12 15:58 ` Sabrina Dubroca @ 2026-03-12 16:47 ` Paolo Abeni 2026-03-12 17:07 ` Sabrina Dubroca 0 siblings, 1 reply; 10+ messages in thread From: Paolo Abeni @ 2026-03-12 16:47 UTC (permalink / raw) To: Sabrina Dubroca, Hangbin Liu Cc: syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba, linux-kernel, netdev, razor, sridhar.samudrala, syzbot, syzkaller-bugs On 3/12/26 4:58 PM, Sabrina Dubroca wrote: > 2026-03-12, 14:34:44 +0000, Hangbin Liu wrote: >> On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote: >>> Proper fix (so that the notification we're sending during >>> upper_dev_link has full linkinfo) would be to move >>> netdev_upper_dev_link() to after macsec_changelink_common() and fix up >>> the error handling. I don't see anything in macsec_add_dev or >>> macsec_changelink_common that needs the device to be linked. But >> >> If we move the netdev_upper_dev_link() after macsec_changelink_common(), >> we will not goto nla_put_failure via default, right? > > Yes. > >>> anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE >>> on invalid data, so the "bandaid" should be included as well. >>> >>> Should this be part of this series (either just the "bandaid" or the >>> "proper fix"+bandaid), since we never saw a problem before? >> >> Since macsec need the "bandaid" fix either way. How about you post the >> "bandaid" fix to net. And I include the "proper fix" in this series for >> net-next? > > But I don't think it's needed in net. Am I missing a codepath (before > your series) where macsec_fill_info could be called for the new device > before macsec_newlink returns? If not, it doesn't really qualify as a > fix, that's why I was asking Paolo. FWIW, I don't see a codepath calling into rtmsg_ifinfo_build_skb() before device initialization, so I would be fine targeting net-next with the EMSGSIZE-related change. Side note, it looks like that the WARN() in the rnnetlink code here helped identifying a real problem and correctly returning 0 when the key_len is not yet initialized will silence it forever, what about preserving a warning for this kind of race? something alike: --- diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index f6cad0746a02..82974d4fa3f6 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -4337,7 +4337,8 @@ static int macsec_fill_info(struct sk_buff *skb, csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 : MACSEC_CIPHER_ID_GCM_AES_256; break; default: - goto nla_put_failure; + WARN_ON_ONCE(1); + return 0; } ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features 2026-03-12 16:47 ` Paolo Abeni @ 2026-03-12 17:07 ` Sabrina Dubroca 0 siblings, 0 replies; 10+ messages in thread From: Sabrina Dubroca @ 2026-03-12 17:07 UTC (permalink / raw) To: Paolo Abeni Cc: Hangbin Liu, syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba, linux-kernel, netdev, razor, sridhar.samudrala, syzbot, syzkaller-bugs 2026-03-12, 17:47:45 +0100, Paolo Abeni wrote: > > > On 3/12/26 4:58 PM, Sabrina Dubroca wrote: > > 2026-03-12, 14:34:44 +0000, Hangbin Liu wrote: > >> On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote: > >>> Proper fix (so that the notification we're sending during > >>> upper_dev_link has full linkinfo) would be to move > >>> netdev_upper_dev_link() to after macsec_changelink_common() and fix up > >>> the error handling. I don't see anything in macsec_add_dev or > >>> macsec_changelink_common that needs the device to be linked. But > >> > >> If we move the netdev_upper_dev_link() after macsec_changelink_common(), > >> we will not goto nla_put_failure via default, right? > > > > Yes. > > > >>> anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE > >>> on invalid data, so the "bandaid" should be included as well. > >>> > >>> Should this be part of this series (either just the "bandaid" or the > >>> "proper fix"+bandaid), since we never saw a problem before? > >> > >> Since macsec need the "bandaid" fix either way. How about you post the > >> "bandaid" fix to net. And I include the "proper fix" in this series for > >> net-next? > > > > But I don't think it's needed in net. Am I missing a codepath (before > > your series) where macsec_fill_info could be called for the new device > > before macsec_newlink returns? If not, it doesn't really qualify as a > > fix, that's why I was asking Paolo. > > FWIW, I don't see a codepath calling into rtmsg_ifinfo_build_skb() > before device initialization, so I would be fine targeting net-next with > the EMSGSIZE-related change. > > Side note, it looks like that the WARN() in the rnnetlink code here > helped identifying a real problem and correctly returning 0 when the > key_len is not yet initialized will silence it forever, what about > preserving a warning for this kind of race? something alike: Right, I was thinking about putting a DEBUG_NET_WARN_ON_ONCE there. -- Sabrina ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-03-12 17:07 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260310-offload_compute-v1-0-3df79c09ea65@gmail.com>
2026-03-10 17:02 ` [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features syzbot ci
2026-03-10 19:17 ` Sabrina Dubroca
2026-03-11 0:47 ` Hangbin Liu
2026-03-11 21:18 ` Sabrina Dubroca
2026-03-12 9:40 ` Paolo Abeni
2026-03-12 11:13 ` Sabrina Dubroca
2026-03-12 14:34 ` Hangbin Liu
2026-03-12 15:58 ` Sabrina Dubroca
2026-03-12 16:47 ` Paolo Abeni
2026-03-12 17:07 ` Sabrina Dubroca
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox