* [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
[not found] <20260310-offload_compute-v1-0-3df79c09ea65@gmail.com>
@ 2026-03-10 17:02 ` syzbot ci
2026-03-10 19:17 ` Sabrina Dubroca
0 siblings, 1 reply; 10+ messages in thread
From: syzbot ci @ 2026-03-10 17:02 UTC (permalink / raw)
To: andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba,
linux-kernel, liuhangbin, netdev, pabeni, razor,
sridhar.samudrala
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v1] net: move netdev_compute_master_upper_features to ndo_set_features
https://lore.kernel.org/all/20260310-offload_compute-v1-0-3df79c09ea65@gmail.com
* [PATCH net-next 1/3] net: use ndo_set_features to set offload features for bonding/bridge/team
* [PATCH net-next 2/3] failover: use ndo_set_features for failover offload compute
* [PATCH net-next 3/3] net: no need to disable LRO specifically
and found the following issue:
WARNING in rtmsg_ifinfo_build_skb
Full report is available here:
https://ci.syzbot.org/series/d5001c4a-a51e-49c1-9106-624836e43ec2
***
WARNING in rtmsg_ifinfo_build_skb
tree: net-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
base: 52ede1bce557c66309f41ac29dd190be23ca9129
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/281e7e02-f6af-4b4a-845e-d1d5842a9301/config
batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
hsr_slave_0: entered promiscuous mode
hsr_slave_1: entered promiscuous mode
------------[ cut here ]------------
err == -EMSGSIZE
WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
Modules linked in:
CPU: 0 UID: 0 PID: 6496 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:rtmsg_ifinfo_build_skb+0x218/0x260
Code: f6 ba 01 00 00 00 89 e9 e8 25 15 3a 00 4c 89 f0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d e9 7f 3a 2e 02 cc e8 49 3b 42 f8 90 <0f> 0b 90 eb 90 89 d9 80 e1 07 fe c1 38 c1 0f 8c 95 fe ff ff 48 89
RSP: 0018:ffffc9000637e9a0 EFLAGS: 00010293
RAX: ffffffff89835e27 RBX: 0000000000000000 RCX: ffff8881b80a57c0
RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6
RBP: 00000000ffffffa6 R08: 0000000000000004 R09: 0000000000000004
R10: fffff52000c6fcdc R11: 0000000000000000 R12: 1ffff110235ddc21
R13: 0000000000000000 R14: ffff8881133dc780 R15: ffff88811aeee000
FS: 0000555557c4a500(0000) GS:ffff88818de65000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055555e1838c8 CR3: 0000000168b80000 CR4: 00000000000006f0
Call Trace:
<TASK>
rtnetlink_event+0x1b7/0x270
notifier_call_chain+0x1be/0x400
netdev_change_features+0x95/0xe0
__netdev_upper_dev_link+0xb20/0xc80
netdev_upper_dev_link+0xb0/0x100
macsec_newlink+0xb11/0x1200
rtnl_newlink_create+0x329/0xb70
rtnl_newlink+0x1666/0x1be0
rtnetlink_rcv_msg+0x7d5/0xbe0
netlink_rcv_skb+0x232/0x4b0
netlink_unicast+0x80f/0x9b0
netlink_sendmsg+0x813/0xb40
__sys_sendto+0x672/0x710
__x64_sys_sendto+0xde/0x100
do_syscall_64+0x14d/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ffadf757917
Code: 48 89 fa 4c 89 df e8 a8 56 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
RSP: 002b:00007ffd0eb79f80 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000555557c4a500 RCX: 00007ffadf757917
RDX: 0000000000000054 RSI: 00007ffae0544670 RDI: 0000000000000003
RBP: 0000000000000001 R08: 00007ffd0eb79fe4 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
R13: 0000000000000000 R14: 00007ffae0544670 R15: 0000000000000000
</TASK>
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
2026-03-10 17:02 ` [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features syzbot ci
@ 2026-03-10 19:17 ` Sabrina Dubroca
2026-03-11 0:47 ` Hangbin Liu
0 siblings, 1 reply; 10+ messages in thread
From: Sabrina Dubroca @ 2026-03-10 19:17 UTC (permalink / raw)
To: syzbot ci
Cc: andrew, bridge, davem, edumazet, horms, idosch, jiri, jv, kuba,
linux-kernel, liuhangbin, netdev, pabeni, razor,
sridhar.samudrala, syzbot, syzkaller-bugs
2026-03-10, 10:02:09 -0700, syzbot ci wrote:
> batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
> hsr_slave_0: entered promiscuous mode
> hsr_slave_1: entered promiscuous mode
> ------------[ cut here ]------------
> err == -EMSGSIZE
> WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
I'm not sure this one is caused by this series, but either way,
reviewing if_nlmsg_size/rtnl_fill_ifinfo for mismatches is really
unpleasant :/
Things I see in rtnl_fill_ifinfo but don't find in if_nlmsg_size:
- IFLA_PARENT_DEV_NAME
- IFLA_PARENT_DEV_BUS_NAME
(both from 00e77ed8e64d ("rtnetlink: add
IFLA_PARENT_[DEV|DEV_BUS]_NAME"), which doesn't include a change to
if_nlmsg_size)
- rtnl_link_slave_info_fill also outputs IFLA_INFO_SLAVE_KIND + the
IFLA_INFO_SLAVE_DATA nest, but rtnl_link_get_slave_info_data_size
only counts the nest, and its caller (rtnl_link_get_size) doesn't
have anything more about the slave info. This may be what syzbot is
tripping on here.
But there's a
+ nla_total_size(4) /* IFLA_WEIGHT */
that doesn't get filled anywhere.
> Modules linked in:
> CPU: 0 UID: 0 PID: 6496 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:rtmsg_ifinfo_build_skb+0x218/0x260
> Code: f6 ba 01 00 00 00 89 e9 e8 25 15 3a 00 4c 89 f0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d e9 7f 3a 2e 02 cc e8 49 3b 42 f8 90 <0f> 0b 90 eb 90 89 d9 80 e1 07 fe c1 38 c1 0f 8c 95 fe ff ff 48 89
> RSP: 0018:ffffc9000637e9a0 EFLAGS: 00010293
> RAX: ffffffff89835e27 RBX: 0000000000000000 RCX: ffff8881b80a57c0
> RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6
> RBP: 00000000ffffffa6 R08: 0000000000000004 R09: 0000000000000004
> R10: fffff52000c6fcdc R11: 0000000000000000 R12: 1ffff110235ddc21
> R13: 0000000000000000 R14: ffff8881133dc780 R15: ffff88811aeee000
> FS: 0000555557c4a500(0000) GS:ffff88818de65000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055555e1838c8 CR3: 0000000168b80000 CR4: 00000000000006f0
> Call Trace:
> <TASK>
> rtnetlink_event+0x1b7/0x270
> notifier_call_chain+0x1be/0x400
> netdev_change_features+0x95/0xe0
> __netdev_upper_dev_link+0xb20/0xc80
> netdev_upper_dev_link+0xb0/0x100
> macsec_newlink+0xb11/0x1200
> rtnl_newlink_create+0x329/0xb70
> rtnl_newlink+0x1666/0x1be0
--
Sabrina
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
2026-03-10 19:17 ` Sabrina Dubroca
@ 2026-03-11 0:47 ` Hangbin Liu
2026-03-11 21:18 ` Sabrina Dubroca
0 siblings, 1 reply; 10+ messages in thread
From: Hangbin Liu @ 2026-03-11 0:47 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri,
jv, kuba, linux-kernel, netdev, pabeni, razor, sridhar.samudrala,
syzbot, syzkaller-bugs
On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote:
> 2026-03-10, 10:02:09 -0700, syzbot ci wrote:
> > batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
> > hsr_slave_0: entered promiscuous mode
> > hsr_slave_1: entered promiscuous mode
> > ------------[ cut here ]------------
> > err == -EMSGSIZE
> > WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
>
> I'm not sure this one is caused by this series, but either way,
rtnetlink_event+0x1b7/0x270
notifier_call_chain+0x1be/0x400
netdev_change_features+0x95/0xe0
__netdev_upper_dev_link+0xb20/0xc80
netdev_upper_dev_link+0xb0/0x100
This patch calls netdev_change_features() after __netdev_upper_dev_link(),
Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event()
to fill the new link info. Maybe the event is a bit early and macsec has
data not ready?
Thanks
Hangbin
> reviewing if_nlmsg_size/rtnl_fill_ifinfo for mismatches is really
> unpleasant :/
>
> Things I see in rtnl_fill_ifinfo but don't find in if_nlmsg_size:
> - IFLA_PARENT_DEV_NAME
> - IFLA_PARENT_DEV_BUS_NAME
> (both from 00e77ed8e64d ("rtnetlink: add
> IFLA_PARENT_[DEV|DEV_BUS]_NAME"), which doesn't include a change to
> if_nlmsg_size)
> - rtnl_link_slave_info_fill also outputs IFLA_INFO_SLAVE_KIND + the
> IFLA_INFO_SLAVE_DATA nest, but rtnl_link_get_slave_info_data_size
> only counts the nest, and its caller (rtnl_link_get_size) doesn't
> have anything more about the slave info. This may be what syzbot is
> tripping on here.
>
>
> But there's a
>
> + nla_total_size(4) /* IFLA_WEIGHT */
>
> that doesn't get filled anywhere.
>
>
> > Modules linked in:
> > CPU: 0 UID: 0 PID: 6496 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > RIP: 0010:rtmsg_ifinfo_build_skb+0x218/0x260
> > Code: f6 ba 01 00 00 00 89 e9 e8 25 15 3a 00 4c 89 f0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d e9 7f 3a 2e 02 cc e8 49 3b 42 f8 90 <0f> 0b 90 eb 90 89 d9 80 e1 07 fe c1 38 c1 0f 8c 95 fe ff ff 48 89
> > RSP: 0018:ffffc9000637e9a0 EFLAGS: 00010293
> > RAX: ffffffff89835e27 RBX: 0000000000000000 RCX: ffff8881b80a57c0
> > RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6
> > RBP: 00000000ffffffa6 R08: 0000000000000004 R09: 0000000000000004
> > R10: fffff52000c6fcdc R11: 0000000000000000 R12: 1ffff110235ddc21
> > R13: 0000000000000000 R14: ffff8881133dc780 R15: ffff88811aeee000
> > FS: 0000555557c4a500(0000) GS:ffff88818de65000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 000055555e1838c8 CR3: 0000000168b80000 CR4: 00000000000006f0
> > Call Trace:
> > <TASK>
> > rtnetlink_event+0x1b7/0x270
> > notifier_call_chain+0x1be/0x400
> > netdev_change_features+0x95/0xe0
> > __netdev_upper_dev_link+0xb20/0xc80
> > netdev_upper_dev_link+0xb0/0x100
> > macsec_newlink+0xb11/0x1200
> > rtnl_newlink_create+0x329/0xb70
> > rtnl_newlink+0x1666/0x1be0
>
> --
> Sabrina
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
2026-03-11 0:47 ` Hangbin Liu
@ 2026-03-11 21:18 ` Sabrina Dubroca
2026-03-12 9:40 ` Paolo Abeni
0 siblings, 1 reply; 10+ messages in thread
From: Sabrina Dubroca @ 2026-03-11 21:18 UTC (permalink / raw)
To: Hangbin Liu
Cc: syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri,
jv, kuba, linux-kernel, netdev, pabeni, razor, sridhar.samudrala,
syzbot, syzkaller-bugs
2026-03-11, 00:47:41 +0000, Hangbin Liu wrote:
> On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote:
> > 2026-03-10, 10:02:09 -0700, syzbot ci wrote:
> > > batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
> > > hsr_slave_0: entered promiscuous mode
> > > hsr_slave_1: entered promiscuous mode
> > > ------------[ cut here ]------------
> > > err == -EMSGSIZE
> > > WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
> >
> > I'm not sure this one is caused by this series, but either way,
>
>
> rtnetlink_event+0x1b7/0x270
> notifier_call_chain+0x1be/0x400
> netdev_change_features+0x95/0xe0
> __netdev_upper_dev_link+0xb20/0xc80
> netdev_upper_dev_link+0xb0/0x100
>
>
> This patch calls netdev_change_features() after __netdev_upper_dev_link(),
> Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event()
> to fill the new link info. Maybe the event is a bit early and macsec has
> data not ready?
But this would still mean that there's a mismatch between
if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only
revealing it.
I'll send fixes for the stuff I mentioned, no idea if that's what
syzbot saw since we don't have a repro.
--
Sabrina
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
2026-03-11 21:18 ` Sabrina Dubroca
@ 2026-03-12 9:40 ` Paolo Abeni
2026-03-12 11:13 ` Sabrina Dubroca
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2026-03-12 9:40 UTC (permalink / raw)
To: Sabrina Dubroca, Hangbin Liu
Cc: syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri,
jv, kuba, linux-kernel, netdev, razor, sridhar.samudrala, syzbot,
syzkaller-bugs
On 3/11/26 10:18 PM, Sabrina Dubroca wrote:
> 2026-03-11, 00:47:41 +0000, Hangbin Liu wrote:
>> On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote:
>>> 2026-03-10, 10:02:09 -0700, syzbot ci wrote:
>>>> batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
>>>> hsr_slave_0: entered promiscuous mode
>>>> hsr_slave_1: entered promiscuous mode
>>>> ------------[ cut here ]------------
>>>> err == -EMSGSIZE
>>>> WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
>>>
>>> I'm not sure this one is caused by this series, but either way,
>>
>>
>> rtnetlink_event+0x1b7/0x270
>> notifier_call_chain+0x1be/0x400
>> netdev_change_features+0x95/0xe0
>> __netdev_upper_dev_link+0xb20/0xc80
>> netdev_upper_dev_link+0xb0/0x100
>>
>>
>> This patch calls netdev_change_features() after __netdev_upper_dev_link(),
>> Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event()
>> to fill the new link info. Maybe the event is a bit early and macsec has
>> data not ready?
>
> But this would still mean that there's a mismatch between
> if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only
> revealing it.
>
> I'll send fixes for the stuff I mentioned, no idea if that's what
> syzbot saw since we don't have a repro.
It looks like even the nipa CI is reproducing the issue, i.e.:
https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-dbg/results/554921/17-rtnetlink-sh/
more failures here:
https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2026-03-12--06-00&pw-n=0&pass=0
the fail in mascsec-offload looks quite deterministic, could you please
have a look?
/P
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
2026-03-12 9:40 ` Paolo Abeni
@ 2026-03-12 11:13 ` Sabrina Dubroca
2026-03-12 14:34 ` Hangbin Liu
0 siblings, 1 reply; 10+ messages in thread
From: Sabrina Dubroca @ 2026-03-12 11:13 UTC (permalink / raw)
To: Paolo Abeni
Cc: Hangbin Liu, syzbot ci, andrew, bridge, davem, edumazet, horms,
idosch, jiri, jv, kuba, linux-kernel, netdev, razor,
sridhar.samudrala, syzbot, syzkaller-bugs
2026-03-12, 10:40:43 +0100, Paolo Abeni wrote:
> On 3/11/26 10:18 PM, Sabrina Dubroca wrote:
> > 2026-03-11, 00:47:41 +0000, Hangbin Liu wrote:
> >> On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote:
> >>> 2026-03-10, 10:02:09 -0700, syzbot ci wrote:
> >>>> batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
> >>>> hsr_slave_0: entered promiscuous mode
> >>>> hsr_slave_1: entered promiscuous mode
> >>>> ------------[ cut here ]------------
> >>>> err == -EMSGSIZE
> >>>> WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
> >>>
> >>> I'm not sure this one is caused by this series, but either way,
> >>
> >>
> >> rtnetlink_event+0x1b7/0x270
> >> notifier_call_chain+0x1be/0x400
> >> netdev_change_features+0x95/0xe0
> >> __netdev_upper_dev_link+0xb20/0xc80
> >> netdev_upper_dev_link+0xb0/0x100
> >>
> >>
> >> This patch calls netdev_change_features() after __netdev_upper_dev_link(),
> >> Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event()
> >> to fill the new link info. Maybe the event is a bit early and macsec has
> >> data not ready?
> >
> > But this would still mean that there's a mismatch between
> > if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only
> > revealing it.
> >
> > I'll send fixes for the stuff I mentioned, no idea if that's what
> > syzbot saw since we don't have a repro.
>
> It looks like even the nipa CI is reproducing the issue, i.e.:
>
> https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-dbg/results/554921/17-rtnetlink-sh/
>
> more failures here:
>
> https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2026-03-12--06-00&pw-n=0&pass=0
>
> the fail in mascsec-offload looks quite deterministic, could you please
> have a look?
Ah crap, sorry Hangbin, you were right. macsec_fill_info() returns
-EMSGSIZE when the key length is unexpected, and at this point we
haven't set it to its proper value yet.
Bandaid solution would be:
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index f6cad0746a02..0f7ef7bfbdde 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -4337,7 +4337,7 @@ static int macsec_fill_info(struct sk_buff *skb,
csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 : MACSEC_CIPHER_ID_GCM_AES_256;
break;
default:
- goto nla_put_failure;
+ return 0;
}
if (nla_put_sci(skb, IFLA_MACSEC_SCI, secy->sci,
Proper fix (so that the notification we're sending during
upper_dev_link has full linkinfo) would be to move
netdev_upper_dev_link() to after macsec_changelink_common() and fix up
the error handling. I don't see anything in macsec_add_dev or
macsec_changelink_common that needs the device to be linked. But
anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE
on invalid data, so the "bandaid" should be included as well.
Should this be part of this series (either just the "bandaid" or the
"proper fix"+bandaid), since we never saw a problem before?
--
Sabrina
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
2026-03-12 11:13 ` Sabrina Dubroca
@ 2026-03-12 14:34 ` Hangbin Liu
2026-03-12 15:58 ` Sabrina Dubroca
0 siblings, 1 reply; 10+ messages in thread
From: Hangbin Liu @ 2026-03-12 14:34 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: Paolo Abeni, syzbot ci, andrew, bridge, davem, edumazet, horms,
idosch, jiri, jv, kuba, linux-kernel, netdev, razor,
sridhar.samudrala, syzbot, syzkaller-bugs
On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote:
> > >> This patch calls netdev_change_features() after __netdev_upper_dev_link(),
> > >> Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event()
> > >> to fill the new link info. Maybe the event is a bit early and macsec has
> > >> data not ready?
> > >
> > > But this would still mean that there's a mismatch between
> > > if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only
> > > revealing it.
> > >
> > > I'll send fixes for the stuff I mentioned, no idea if that's what
> > > syzbot saw since we don't have a repro.
> >
> > It looks like even the nipa CI is reproducing the issue, i.e.:
> >
> > https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-dbg/results/554921/17-rtnetlink-sh/
> >
> > more failures here:
> >
> > https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2026-03-12--06-00&pw-n=0&pass=0
> >
> > the fail in mascsec-offload looks quite deterministic, could you please
> > have a look?
>
> Ah crap, sorry Hangbin, you were right. macsec_fill_info() returns
> -EMSGSIZE when the key length is unexpected, and at this point we
> haven't set it to its proper value yet.
>
> Bandaid solution would be:
>
> diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
> index f6cad0746a02..0f7ef7bfbdde 100644
> --- a/drivers/net/macsec.c
> +++ b/drivers/net/macsec.c
> @@ -4337,7 +4337,7 @@ static int macsec_fill_info(struct sk_buff *skb,
> csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 : MACSEC_CIPHER_ID_GCM_AES_256;
> break;
> default:
> - goto nla_put_failure;
> + return 0;
> }
>
> if (nla_put_sci(skb, IFLA_MACSEC_SCI, secy->sci,
>
>
> Proper fix (so that the notification we're sending during
> upper_dev_link has full linkinfo) would be to move
> netdev_upper_dev_link() to after macsec_changelink_common() and fix up
> the error handling. I don't see anything in macsec_add_dev or
> macsec_changelink_common that needs the device to be linked. But
If we move the netdev_upper_dev_link() after macsec_changelink_common(),
we will not goto nla_put_failure via default, right?
> anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE
> on invalid data, so the "bandaid" should be included as well.
>
> Should this be part of this series (either just the "bandaid" or the
> "proper fix"+bandaid), since we never saw a problem before?
Since macsec need the "bandaid" fix either way. How about you post the
"bandaid" fix to net. And I include the "proper fix" in this series for
net-next?
BTW, here is my draft patch, would you like to review it first?
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index f6cad0746a02..1f8da4c291c6 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -4161,10 +4161,6 @@ static int macsec_newlink(struct net_device *dev,
lockdep_set_class(&dev->addr_list_lock,
&macsec_netdev_addr_lock_key);
- err = netdev_upper_dev_link(real_dev, dev, extack);
- if (err < 0)
- goto unregister;
-
/* need to be already registered so that ->init has run and
* the MAC addr is set
*/
@@ -4177,12 +4173,12 @@ static int macsec_newlink(struct net_device *dev,
if (rx_handler && sci_exists(real_dev, sci)) {
err = -EBUSY;
- goto unlink;
+ goto unregister;
}
err = macsec_add_dev(dev, sci, icv_len);
if (err)
- goto unlink;
+ goto unregister;
if (data) {
err = macsec_changelink_common(dev, data);
@@ -4190,6 +4186,10 @@ static int macsec_newlink(struct net_device *dev,
goto del_dev;
}
+ err = netdev_upper_dev_link(real_dev, dev, extack);
+ if (err < 0)
+ goto unlink;
+
/* If h/w offloading is available, propagate to the device */
if (macsec_is_offloaded(macsec)) {
const struct macsec_ops *ops;
@@ -4200,7 +4200,7 @@ static int macsec_newlink(struct net_device *dev,
ctx.secy = &macsec->secy;
err = macsec_offload(ops->mdo_add_secy, &ctx);
if (err)
- goto del_dev;
+ goto unlink;
macsec->insert_tx_tag =
macsec_needs_tx_tag(macsec, ops);
@@ -4209,7 +4209,7 @@ static int macsec_newlink(struct net_device *dev,
err = register_macsec_dev(real_dev, dev);
if (err < 0)
- goto del_dev;
+ goto unlink;
netdev_update_features(dev);
netif_stacked_transfer_operstate(real_dev, dev);
@@ -4219,10 +4219,10 @@ static int macsec_newlink(struct net_device *dev,
return 0;
-del_dev:
- macsec_del_dev(macsec);
unlink:
netdev_upper_dev_unlink(real_dev, dev);
+del_dev:
+ macsec_del_dev(macsec);
unregister:
unregister_netdevice(dev);
return err;
Thanks
Hangbin
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
2026-03-12 14:34 ` Hangbin Liu
@ 2026-03-12 15:58 ` Sabrina Dubroca
2026-03-12 16:47 ` Paolo Abeni
0 siblings, 1 reply; 10+ messages in thread
From: Sabrina Dubroca @ 2026-03-12 15:58 UTC (permalink / raw)
To: Hangbin Liu
Cc: Paolo Abeni, syzbot ci, andrew, bridge, davem, edumazet, horms,
idosch, jiri, jv, kuba, linux-kernel, netdev, razor,
sridhar.samudrala, syzbot, syzkaller-bugs
2026-03-12, 14:34:44 +0000, Hangbin Liu wrote:
> On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote:
> > Proper fix (so that the notification we're sending during
> > upper_dev_link has full linkinfo) would be to move
> > netdev_upper_dev_link() to after macsec_changelink_common() and fix up
> > the error handling. I don't see anything in macsec_add_dev or
> > macsec_changelink_common that needs the device to be linked. But
>
> If we move the netdev_upper_dev_link() after macsec_changelink_common(),
> we will not goto nla_put_failure via default, right?
Yes.
> > anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE
> > on invalid data, so the "bandaid" should be included as well.
> >
> > Should this be part of this series (either just the "bandaid" or the
> > "proper fix"+bandaid), since we never saw a problem before?
>
> Since macsec need the "bandaid" fix either way. How about you post the
> "bandaid" fix to net. And I include the "proper fix" in this series for
> net-next?
But I don't think it's needed in net. Am I missing a codepath (before
your series) where macsec_fill_info could be called for the new device
before macsec_newlink returns? If not, it doesn't really qualify as a
fix, that's why I was asking Paolo.
> BTW, here is my draft patch, would you like to review it first?
>
> diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
> index f6cad0746a02..1f8da4c291c6 100644
> --- a/drivers/net/macsec.c
> +++ b/drivers/net/macsec.c
> @@ -4161,10 +4161,6 @@ static int macsec_newlink(struct net_device *dev,
> lockdep_set_class(&dev->addr_list_lock,
> &macsec_netdev_addr_lock_key);
>
> - err = netdev_upper_dev_link(real_dev, dev, extack);
> - if (err < 0)
> - goto unregister;
> -
> /* need to be already registered so that ->init has run and
> * the MAC addr is set
> */
> @@ -4177,12 +4173,12 @@ static int macsec_newlink(struct net_device *dev,
>
> if (rx_handler && sci_exists(real_dev, sci)) {
> err = -EBUSY;
> - goto unlink;
> + goto unregister;
> }
>
> err = macsec_add_dev(dev, sci, icv_len);
> if (err)
> - goto unlink;
> + goto unregister;
>
> if (data) {
> err = macsec_changelink_common(dev, data);
> @@ -4190,6 +4186,10 @@ static int macsec_newlink(struct net_device *dev,
> goto del_dev;
> }
>
> + err = netdev_upper_dev_link(real_dev, dev, extack);
> + if (err < 0)
> + goto unlink;
This one shouldn't be "goto unlink" since we haven't linked yet. I'm
not sure netdev_upper_dev_unlink can handle "not linked yet" devices.
Otherwise this looks ok.
--
Sabrina
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
2026-03-12 15:58 ` Sabrina Dubroca
@ 2026-03-12 16:47 ` Paolo Abeni
2026-03-12 17:07 ` Sabrina Dubroca
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2026-03-12 16:47 UTC (permalink / raw)
To: Sabrina Dubroca, Hangbin Liu
Cc: syzbot ci, andrew, bridge, davem, edumazet, horms, idosch, jiri,
jv, kuba, linux-kernel, netdev, razor, sridhar.samudrala, syzbot,
syzkaller-bugs
On 3/12/26 4:58 PM, Sabrina Dubroca wrote:
> 2026-03-12, 14:34:44 +0000, Hangbin Liu wrote:
>> On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote:
>>> Proper fix (so that the notification we're sending during
>>> upper_dev_link has full linkinfo) would be to move
>>> netdev_upper_dev_link() to after macsec_changelink_common() and fix up
>>> the error handling. I don't see anything in macsec_add_dev or
>>> macsec_changelink_common that needs the device to be linked. But
>>
>> If we move the netdev_upper_dev_link() after macsec_changelink_common(),
>> we will not goto nla_put_failure via default, right?
>
> Yes.
>
>>> anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE
>>> on invalid data, so the "bandaid" should be included as well.
>>>
>>> Should this be part of this series (either just the "bandaid" or the
>>> "proper fix"+bandaid), since we never saw a problem before?
>>
>> Since macsec need the "bandaid" fix either way. How about you post the
>> "bandaid" fix to net. And I include the "proper fix" in this series for
>> net-next?
>
> But I don't think it's needed in net. Am I missing a codepath (before
> your series) where macsec_fill_info could be called for the new device
> before macsec_newlink returns? If not, it doesn't really qualify as a
> fix, that's why I was asking Paolo.
FWIW, I don't see a codepath calling into rtmsg_ifinfo_build_skb()
before device initialization, so I would be fine targeting net-next with
the EMSGSIZE-related change.
Side note, it looks like that the WARN() in the rnnetlink code here
helped identifying a real problem and correctly returning 0 when the
key_len is not yet initialized will silence it forever, what about
preserving a warning for this kind of race? something alike:
---
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index f6cad0746a02..82974d4fa3f6 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -4337,7 +4337,8 @@ static int macsec_fill_info(struct sk_buff *skb,
csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 :
MACSEC_CIPHER_ID_GCM_AES_256;
break;
default:
- goto nla_put_failure;
+ WARN_ON_ONCE(1);
+ return 0;
}
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
2026-03-12 16:47 ` Paolo Abeni
@ 2026-03-12 17:07 ` Sabrina Dubroca
0 siblings, 0 replies; 10+ messages in thread
From: Sabrina Dubroca @ 2026-03-12 17:07 UTC (permalink / raw)
To: Paolo Abeni
Cc: Hangbin Liu, syzbot ci, andrew, bridge, davem, edumazet, horms,
idosch, jiri, jv, kuba, linux-kernel, netdev, razor,
sridhar.samudrala, syzbot, syzkaller-bugs
2026-03-12, 17:47:45 +0100, Paolo Abeni wrote:
>
>
> On 3/12/26 4:58 PM, Sabrina Dubroca wrote:
> > 2026-03-12, 14:34:44 +0000, Hangbin Liu wrote:
> >> On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote:
> >>> Proper fix (so that the notification we're sending during
> >>> upper_dev_link has full linkinfo) would be to move
> >>> netdev_upper_dev_link() to after macsec_changelink_common() and fix up
> >>> the error handling. I don't see anything in macsec_add_dev or
> >>> macsec_changelink_common that needs the device to be linked. But
> >>
> >> If we move the netdev_upper_dev_link() after macsec_changelink_common(),
> >> we will not goto nla_put_failure via default, right?
> >
> > Yes.
> >
> >>> anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE
> >>> on invalid data, so the "bandaid" should be included as well.
> >>>
> >>> Should this be part of this series (either just the "bandaid" or the
> >>> "proper fix"+bandaid), since we never saw a problem before?
> >>
> >> Since macsec need the "bandaid" fix either way. How about you post the
> >> "bandaid" fix to net. And I include the "proper fix" in this series for
> >> net-next?
> >
> > But I don't think it's needed in net. Am I missing a codepath (before
> > your series) where macsec_fill_info could be called for the new device
> > before macsec_newlink returns? If not, it doesn't really qualify as a
> > fix, that's why I was asking Paolo.
>
> FWIW, I don't see a codepath calling into rtmsg_ifinfo_build_skb()
> before device initialization, so I would be fine targeting net-next with
> the EMSGSIZE-related change.
>
> Side note, it looks like that the WARN() in the rnnetlink code here
> helped identifying a real problem and correctly returning 0 when the
> key_len is not yet initialized will silence it forever, what about
> preserving a warning for this kind of race? something alike:
Right, I was thinking about putting a DEBUG_NET_WARN_ON_ONCE there.
--
Sabrina
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-03-12 17:07 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260310-offload_compute-v1-0-3df79c09ea65@gmail.com>
2026-03-10 17:02 ` [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features syzbot ci
2026-03-10 19:17 ` Sabrina Dubroca
2026-03-11 0:47 ` Hangbin Liu
2026-03-11 21:18 ` Sabrina Dubroca
2026-03-12 9:40 ` Paolo Abeni
2026-03-12 11:13 ` Sabrina Dubroca
2026-03-12 14:34 ` Hangbin Liu
2026-03-12 15:58 ` Sabrina Dubroca
2026-03-12 16:47 ` Paolo Abeni
2026-03-12 17:07 ` Sabrina Dubroca
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox