* Sleeping in atomic context with VLAN and netdev instance lock drivers
@ 2025-07-15 15:04 Cosmin Ratiu
2025-07-15 15:55 ` Stanislav Fomichev
From: Cosmin Ratiu @ 2025-07-15 15:04 UTC
To: sdf@fomichev.me; +Cc: kuba@kernel.org, netdev@vger.kernel.org
Hi Stanislav,
There's a bug that was uncovered recently in a kernel with
DEBUG_ATOMIC_SLEEP related to the new netdev instance locking.
I looked into it a bit but am not sure how to solve it, so I'd like your
help. On a netdevice with instance locking enabled that supports
macsec (e.g. mlx5), with a kernel built with:
CONFIG_MACSEC=y
CONFIG_MLX5_MACSEC=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
Run these:
IF=eth1
ip link del macsec0
ip link add link $IF macsec0 type macsec sci 3154 cipher gcm-aes-256 \
    encrypt on encodingsa 0
ip link set dev macsec0 up
ip link add link macsec0 name macsec_vlan type vlan id 1
ip link set dev macsec_vlan address 00:11:22:33:44:88
ip link set dev macsec_vlan up
And you get this splat:
# BUG: sleeping function called from invalid context at kernel/locking/mutex.c:275
# dump_stack_lvl+0x4f/0x60
# __might_resched+0xeb/0x140
# mutex_lock+0x1a/0x40
# dev_set_promiscuity+0x26/0x90
# __dev_set_promiscuity+0x85/0x170
# __dev_set_rx_mode+0x69/0xa0
# dev_uc_add+0x6d/0x80
# vlan_dev_open+0x5f/0x120 [8021q]
# __dev_open+0x10c/0x2a0
# __dev_change_flags+0x1a4/0x210
# netif_change_flags+0x22/0x60
# do_setlink.isra.0+0xdb0/0x10f0
# rtnl_newlink+0x797/0xb00
# rtnetlink_rcv_msg+0x1cb/0x3f0
# netlink_rcv_skb+0x53/0x100
# netlink_unicast+0x273/0x3b0
# netlink_sendmsg+0x1f2/0x430
The problem is taking the netdev instance lock (a mutex, which may
sleep) while holding the dev->addr_list_lock spinlock.
Any suggestions on how to refactor things to avoid this? Maybe schedule
a wq task from vlan_dev_change_rx_flags instead of synchronously trying
to do the change? I'm not sure that would entirely solve the issue
though.
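
To make the failure mode concrete, here is a rough userspace sketch of the two call shapes discussed above. It is only a model: every `model_*` name is hypothetical, the "spinlock" is just a depth counter, and the "mutex" only performs the kind of check that produces the splat. It contrasts the synchronous path (instance lock requested under the address list lock) with a deferred path where the spinlock section only records pending work.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of these are real kernel APIs. */
static int model_atomic_depth;     /* > 0 while a modelled spinlock is held */
static int model_sleep_violations; /* "sleeping in atomic context" events */
static bool model_pending;         /* work recorded for later execution */

static void model_spin_lock(void)   { model_atomic_depth++; }
static void model_spin_unlock(void) { model_atomic_depth--; }

/* Models the sleep check done on entry to a sleeping lock. */
static void model_mutex_lock(void)
{
	if (model_atomic_depth > 0) {
		model_sleep_violations++;
		fprintf(stderr, "BUG: sleeping function called from invalid context\n");
	}
}

static void model_mutex_unlock(void) { }

/* Stands in for the rx-mode change on the lower device, which needs
 * that device's instance lock. */
static void model_lower_set_promiscuity(void)
{
	model_mutex_lock();
	model_mutex_unlock();
}

/* Synchronous shape: the nested call runs under the address list lock.
 * Returns the number of violations observed. */
static int model_uc_add_sync(void)
{
	model_sleep_violations = 0;
	model_spin_lock();             /* models dev->addr_list_lock */
	model_lower_set_promiscuity(); /* needs the instance lock: flagged */
	model_spin_unlock();
	return model_sleep_violations;
}

/* Deferred shape: only note the pending change under the spinlock and
 * apply it afterwards (the "if" below stands in for a work item). */
static int model_uc_add_deferred(void)
{
	model_sleep_violations = 0;
	model_spin_lock();
	model_pending = true;
	model_spin_unlock();
	if (model_pending) {
		model_pending = false;
		model_lower_set_promiscuity(); /* no spinlock held here */
	}
	return model_sleep_violations;
}
```

In this model `model_uc_add_sync()` reports one violation and `model_uc_add_deferred()` reports none. The sketch says nothing about ordering or races between the deferred work and later address list changes, which is the part a workqueue approach would still have to get right.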
Cosmin.
* Re: Sleeping in atomic context with VLAN and netdev instance lock drivers
From: Stanislav Fomichev @ 2025-07-15 15:55 UTC
To: Cosmin Ratiu; +Cc: sdf@fomichev.me, kuba@kernel.org, netdev@vger.kernel.org
On 07/15, Cosmin Ratiu wrote:
> Hi Stanislav,
>
> There's a bug that was uncovered recently in a kernel with
> DEBUG_ATOMIC_SLEEP related to the new netdev instance locking.
>
> I looked a bit into it and I am not sure how to solve it, I'd like your
> help. On a netdevice with instance locking enabled which supports
> macsec (e.g. mlx5) and a kernel with:
> CONFIG_MACSEC=y
> CONFIG_MLX5_MACSEC=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
>
> Run these:
>
> IF=eth1
> ip link del macsec0
> ip link add link $IF macsec0 type macsec sci 3154 cipher gcm-aes-256 \
>     encrypt on encodingsa 0
> ip link set dev macsec0 up
> ip link add link macsec0 name macsec_vlan type vlan id 1
> ip link set dev macsec_vlan address 00:11:22:33:44:88
> ip link set dev macsec_vlan up
>
> And you get this splat:
> # BUG: sleeping function called from invalid context at kernel/locking/mutex.c:275
> # dump_stack_lvl+0x4f/0x60
> # __might_resched+0xeb/0x140
> # mutex_lock+0x1a/0x40
> # dev_set_promiscuity+0x26/0x90
> # __dev_set_promiscuity+0x85/0x170
> # __dev_set_rx_mode+0x69/0xa0
> # dev_uc_add+0x6d/0x80
> # vlan_dev_open+0x5f/0x120 [8021q]
> # __dev_open+0x10c/0x2a0
> # __dev_change_flags+0x1a4/0x210
> # netif_change_flags+0x22/0x60
> # do_setlink.isra.0+0xdb0/0x10f0
> # rtnl_newlink+0x797/0xb00
> # rtnetlink_rcv_msg+0x1cb/0x3f0
> # netlink_rcv_skb+0x53/0x100
> # netlink_unicast+0x273/0x3b0
> # netlink_sendmsg+0x1f2/0x430
>
> The problem is taking the netdev instance lock while holding the
> dev->addr_list_lock spinlock.
>
> Any suggestions on how to refactor things to avoid this? Maybe schedule
> a wq task from vlan_dev_change_rx_flags instead of synchronously trying
> to do the change? I'm not sure that would entirely solve the issue
> though.
Thanks for the report. I was looking at a similar issue in [0], and for
macsec I was thinking about the following:
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 7edbe76b5455..4c75d1fea552 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -3868,7 +3868,7 @@ static void macsec_setup(struct net_device *dev)
 	ether_setup(dev);
 	dev->min_mtu = 0;
 	dev->max_mtu = ETH_MAX_MTU;
-	dev->priv_flags |= IFF_NO_QUEUE;
+	dev->priv_flags |= IFF_NO_QUEUE | IFF_UNICAST_FLT;
 	dev->netdev_ops = &macsec_netdev_ops;
 	dev->needs_free_netdev = true;
 	dev->priv_destructor = macsec_free_netdev;
macsec has an ndo_set_rx_mode handler that propagates the uc list, so
I'm not sure why it lacks IFF_UNICAST_FLT.
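
For readers following along, here is a small userspace sketch of why the flag changes the call path. It is a simplified model of the rx-mode decision, not the kernel function itself: the `model_*` names, the struct layout, and the flag's bit value are all assumptions made for illustration. The idea it shows is that a device without the unicast-filtering flag falls back to promiscuous mode when a secondary unicast address is added, while a device with the flag only has its rx-mode handler invoked.

```c
#include <stdbool.h>

/* Hypothetical bit value for the model; not the kernel's definition. */
#define MODEL_IFF_UNICAST_FLT 0x1u

/* Hypothetical, heavily reduced stand-in for a net device. */
struct model_netdev {
	unsigned int priv_flags;
	int uc_count;          /* secondary unicast addresses on the device */
	bool uc_promisc;       /* promiscuity currently forced by uc addresses */
	int promiscuity_calls; /* times the promiscuity fallback was taken */
	int rx_mode_calls;     /* times the rx-mode handler was invoked */
};

/* Models the rx-mode update: without unicast filtering, toggle
 * promiscuity based on whether the uc list is empty; in both cases
 * invoke the device's rx-mode handler. */
static void model_set_rx_mode(struct model_netdev *dev)
{
	if (!(dev->priv_flags & MODEL_IFF_UNICAST_FLT)) {
		if (dev->uc_count > 0 && !dev->uc_promisc) {
			dev->promiscuity_calls++; /* path seen in the splat */
			dev->uc_promisc = true;
		} else if (dev->uc_count == 0 && dev->uc_promisc) {
			dev->promiscuity_calls++;
			dev->uc_promisc = false;
		}
	}
	dev->rx_mode_calls++; /* handler that propagates the uc list */
}

/* Models adding a unicast address, e.g. when an upper VLAN device with
 * a different MAC address is brought up. */
static void model_uc_add(struct model_netdev *dev)
{
	dev->uc_count++;
	model_set_rx_mode(dev);
}
```

In this model, calling `model_uc_add()` on a zero-initialized device takes the promiscuity fallback once, while a device with `MODEL_IFF_UNICAST_FLT` set skips it and only reaches the rx-mode handler, which matches the effect the one-line change is meant to have.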
This is not a systemic fix, but I guess with the limited number of
stacking devices, that should do? If that fixes the issue for you,
I can send a patch.
[0]: https://lore.kernel.org/netdev/686d55b4.050a0220.1ffab7.0014.GAE@google.com/
* Re: Sleeping in atomic context with VLAN and netdev instance lock drivers
From: Cosmin Ratiu @ 2025-07-19 10:23 UTC
To: stfomichev@gmail.com
Cc: kuba@kernel.org, netdev@vger.kernel.org, sdf@fomichev.me
On Tue, 2025-07-15 at 08:55 -0700, Stanislav Fomichev wrote:
>
> Thanks for the report, I was looking at similar issue in [0] and for
> macsec I was thinking about the following:
>
> diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
> index 7edbe76b5455..4c75d1fea552 100644
> --- a/drivers/net/macsec.c
> +++ b/drivers/net/macsec.c
> @@ -3868,7 +3868,7 @@ static void macsec_setup(struct net_device *dev)
>  	ether_setup(dev);
>  	dev->min_mtu = 0;
>  	dev->max_mtu = ETH_MAX_MTU;
> -	dev->priv_flags |= IFF_NO_QUEUE;
> +	dev->priv_flags |= IFF_NO_QUEUE | IFF_UNICAST_FLT;
>  	dev->netdev_ops = &macsec_netdev_ops;
>  	dev->needs_free_netdev = true;
>  	dev->priv_destructor = macsec_free_netdev;
>
> macsec has an ndo_set_rx_mode handler that propagates the uc list so
> not sure why it lacks IFF_UNICAST_FLT.
>
> This is not a systemic fix, but I guess with the limited number of
> stacking devices, that should do? If that fixes the issue for you,
> I can send a patch..
>
> 0: https://lore.kernel.org/netdev/686d55b4.050a0220.1ffab7.0014.GAE@google.com/
I tested it and it works, thank you.
I guess avoiding nested calls requiring the instance lock while holding
the spinlock is one way of avoiding the problem. Looking forward to the
fix.
Thank you,
Cosmin.