linux-kselftest.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] net: Lock lower level devices when updating features
@ 2025-05-06 14:21 Cosmin Ratiu
  2025-05-06 17:12 ` Stanislav Fomichev
  0 siblings, 1 reply; 10+ messages in thread
From: Cosmin Ratiu @ 2025-05-06 14:21 UTC (permalink / raw)
  To: netdev, cratiu
  Cc: Stanislav Fomichev, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, jiri @ resnulli . us, Saeed Mahameed,
	Dragos Tatulea, linux-kselftest

__netdev_update_features() expects the netdevice to be ops-locked, but
it gets called recursively on the lower level netdevices to sync their
features, and nothing locks those.

This commit fixes that, with the assumption that it shouldn't be possible
for both higher-level and lover-level netdevices to require the instance
lock, because that would lead to lock dependency warnings.

Without this, playing with higher level (e.g. vxlan) netdevices on top
of netdevices with instance locking enabled can run into issues:

WARNING: CPU: 59 PID: 206496 at ./include/net/netdev_lock.h:17 netif_napi_add_weight_locked+0x753/0xa60
[...]
Call Trace:
 <TASK>
 mlx5e_open_channel+0xc09/0x3740 [mlx5_core]
 mlx5e_open_channels+0x1f0/0x770 [mlx5_core]
 mlx5e_safe_switch_params+0x1b5/0x2e0 [mlx5_core]
 set_feature_lro+0x1c2/0x330 [mlx5_core]
 mlx5e_handle_feature+0xc8/0x140 [mlx5_core]
 mlx5e_set_features+0x233/0x2e0 [mlx5_core]
 __netdev_update_features+0x5be/0x1670
 __netdev_update_features+0x71f/0x1670
 dev_ethtool+0x21c5/0x4aa0
 dev_ioctl+0x438/0xae0
 sock_ioctl+0x2ba/0x690
 __x64_sys_ioctl+0xa78/0x1700
 do_syscall_64+0x6d/0x140
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
 </TASK>

Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
---
 net/core/dev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1be7cb73a602..77472364225c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10620,8 +10620,11 @@ int __netdev_update_features(struct net_device *dev)
 	/* some features must be disabled on lower devices when disabled
 	 * on an upper device (think: bonding master or bridge)
 	 */
-	netdev_for_each_lower_dev(dev, lower, iter)
+	netdev_for_each_lower_dev(dev, lower, iter) {
+		netdev_lock_ops(lower);
 		netdev_sync_lower_features(dev, lower, features);
+		netdev_unlock_ops(lower);
+	}
 
 	if (!err) {
 		netdev_features_t diff = features ^ dev->features;
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: Lock lower level devices when updating features
  2025-05-06 14:21 [PATCH net] net: Lock lower level devices when updating features Cosmin Ratiu
@ 2025-05-06 17:12 ` Stanislav Fomichev
  2025-05-06 17:40   ` Stanislav Fomichev
  2025-05-06 17:47   ` Cosmin Ratiu
  0 siblings, 2 replies; 10+ messages in thread
From: Stanislav Fomichev @ 2025-05-06 17:12 UTC (permalink / raw)
  To: Cosmin Ratiu
  Cc: netdev, Stanislav Fomichev, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, jiri @ resnulli . us, Saeed Mahameed,
	Dragos Tatulea, linux-kselftest

On 05/06, Cosmin Ratiu wrote:
> __netdev_update_features() expects the netdevice to be ops-locked, but
> it gets called recursively on the lower level netdevices to sync their
> features, and nothing locks those.
> 
> This commit fixes that, with the assumption that it shouldn't be possible
> for both higher-level and lover-level netdevices to require the instance
> lock, because that would lead to lock dependency warnings.
> 
> Without this, playing with higher level (e.g. vxlan) netdevices on top
> of netdevices with instance locking enabled can run into issues:

Mentioning vxlan is a bit confusing here; it shouldn't let you flip lro (I
think). Which upper are you testing against?

Trying to understand if we can cover this case in the selftests.
netdevsim also doesn't expose F_LRO feature... (yet?)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: Lock lower level devices when updating features
  2025-05-06 17:12 ` Stanislav Fomichev
@ 2025-05-06 17:40   ` Stanislav Fomichev
  2025-05-06 17:47   ` Cosmin Ratiu
  1 sibling, 0 replies; 10+ messages in thread
From: Stanislav Fomichev @ 2025-05-06 17:40 UTC (permalink / raw)
  To: Cosmin Ratiu
  Cc: netdev, Stanislav Fomichev, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, jiri @ resnulli . us, Saeed Mahameed,
	Dragos Tatulea, linux-kselftest

On 05/06, Stanislav Fomichev wrote:
> On 05/06, Cosmin Ratiu wrote:
> > __netdev_update_features() expects the netdevice to be ops-locked, but
> > it gets called recursively on the lower level netdevices to sync their
> > features, and nothing locks those.
> > 
> > This commit fixes that, with the assumption that it shouldn't be possible
> > for both higher-level and lover-level netdevices to require the instance
> > lock, because that would lead to lock dependency warnings.
> > 
> > Without this, playing with higher level (e.g. vxlan) netdevices on top
> > of netdevices with instance locking enabled can run into issues:
> 
> Mentioning vxlan is a bit confusing here; it shouldn't let you flip lro (I
> think). Which upper are you testing against?
> 
> Trying to understand if we can cover this case in the selftests.
> netdevsim also doesn't expose F_LRO feature... (yet?)

Ok, yeah, I can see it with the teaming on top of netdevsim and the following
patch applied. Might try to send a testcase to cover this and the promisc part
that I missed.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 2aa999345fe1..af545d42961c 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -881,11 +881,13 @@ static void nsim_setup(struct net_device *dev)
 			 NETIF_F_SG |
 			 NETIF_F_FRAGLIST |
 			 NETIF_F_HW_CSUM |
+			 NETIF_F_LRO |
 			 NETIF_F_TSO;
 	dev->hw_features |= NETIF_F_HW_TC |
 			    NETIF_F_SG |
 			    NETIF_F_FRAGLIST |
 			    NETIF_F_HW_CSUM |
+			    NETIF_F_LRO |
 			    NETIF_F_TSO;
 	dev->max_mtu = ETH_MAX_MTU;
 	dev->xdp_features = NETDEV_XDP_ACT_HW_OFFLOAD;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: Lock lower level devices when updating features
  2025-05-06 17:12 ` Stanislav Fomichev
  2025-05-06 17:40   ` Stanislav Fomichev
@ 2025-05-06 17:47   ` Cosmin Ratiu
  2025-05-06 18:13     ` Stanislav Fomichev
  1 sibling, 1 reply; 10+ messages in thread
From: Cosmin Ratiu @ 2025-05-06 17:47 UTC (permalink / raw)
  To: stfomichev@gmail.com
  Cc: davem@davemloft.net, kuba@kernel.org,
	linux-kselftest@vger.kernel.org, Dragos Tatulea, sdf@fomichev.me,
	pabeni@redhat.com, jiri@resnulli.us, edumazet@google.com,
	Saeed Mahameed, netdev@vger.kernel.org

On Tue, 2025-05-06 at 10:12 -0700, Stanislav Fomichev wrote:
> On 05/06, Cosmin Ratiu wrote:
> > __netdev_update_features() expects the netdevice to be ops-locked,
> > but
> > it gets called recursively on the lower level netdevices to sync
> > their
> > features, and nothing locks those.
> > 
> > This commit fixes that, with the assumption that it shouldn't be
> > possible
> > for both higher-level and lover-level netdevices to require the
> > instance
> > lock, because that would lead to lock dependency warnings.
> > 
> > Without this, playing with higher level (e.g. vxlan) netdevices on
> > top
> > of netdevices with instance locking enabled can run into issues:
> 
> Mentioning vxlan is a bit confusing here; it shouldn't let you flip
> lro (I
> think). Which upper are you testing against?

It is vxlan, but LRO is just a red herring in this case, 
mlx5e_set_features calls every feature handler in turn, and this is
just the example I picked from the sea of stack traces.

> 
> Trying to understand if we can cover this case in the selftests.
> netdevsim also doesn't expose F_LRO feature... (yet?)

I see you found a way with teaming, but I think in essence a sequence
of commands that makes __netdev_update_features call itself recursively
once on the lower dev will trigger the netdev_ops_assert_locked on the
lower dev.

Thanks for the review!

Cosmin.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: Lock lower level devices when updating features
  2025-05-06 17:47   ` Cosmin Ratiu
@ 2025-05-06 18:13     ` Stanislav Fomichev
  2025-05-07 14:35       ` Cosmin Ratiu
  0 siblings, 1 reply; 10+ messages in thread
From: Stanislav Fomichev @ 2025-05-06 18:13 UTC (permalink / raw)
  To: Cosmin Ratiu
  Cc: davem@davemloft.net, kuba@kernel.org,
	linux-kselftest@vger.kernel.org, Dragos Tatulea, sdf@fomichev.me,
	pabeni@redhat.com, jiri@resnulli.us, edumazet@google.com,
	Saeed Mahameed, netdev@vger.kernel.org

On 05/06, Cosmin Ratiu wrote:
> On Tue, 2025-05-06 at 10:12 -0700, Stanislav Fomichev wrote:
> > On 05/06, Cosmin Ratiu wrote:
> > > __netdev_update_features() expects the netdevice to be ops-locked,
> > > but
> > > it gets called recursively on the lower level netdevices to sync
> > > their
> > > features, and nothing locks those.
> > > 
> > > This commit fixes that, with the assumption that it shouldn't be
> > > possible
> > > for both higher-level and lover-level netdevices to require the
> > > instance
> > > lock, because that would lead to lock dependency warnings.
> > > 
> > > Without this, playing with higher level (e.g. vxlan) netdevices on
> > > top
> > > of netdevices with instance locking enabled can run into issues:
> > 
> > Mentioning vxlan is a bit confusing here; it shouldn't let you flip
> > lro (I
> > think). Which upper are you testing against?
> 
> It is vxlan, but LRO is just a red herring in this case, 
> mlx5e_set_features calls every feature handler in turn, and this is
> just the example I picked from the sea of stack traces.
> 
> > 
> > Trying to understand if we can cover this case in the selftests.
> > netdevsim also doesn't expose F_LRO feature... (yet?)
> 
> I see you found a way with teaming, but I think in essence a sequence
> of commands that makes __netdev_update_features call itself recursively
> once on the lower dev will trigger the netdev_ops_assert_locked on the
> lower dev.

Right, but netdev_sync_lower_features calls lower's __netdev_update_features
only for NETIF_F_UPPER_DISABLES. So it doesn't propagate all features,
only LRO AFAICT.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: Lock lower level devices when updating features
  2025-05-06 18:13     ` Stanislav Fomichev
@ 2025-05-07 14:35       ` Cosmin Ratiu
  2025-05-07 15:13         ` Cosmin Ratiu
  0 siblings, 1 reply; 10+ messages in thread
From: Cosmin Ratiu @ 2025-05-07 14:35 UTC (permalink / raw)
  To: stfomichev@gmail.com
  Cc: davem@davemloft.net, linux-kselftest@vger.kernel.org,
	Dragos Tatulea, sdf@fomichev.me, pabeni@redhat.com,
	jiri@resnulli.us, edumazet@google.com, kuba@kernel.org,
	Saeed Mahameed, netdev@vger.kernel.org

On Tue, 2025-05-06 at 11:13 -0700, Stanislav Fomichev wrote:
> On 05/06, Cosmin Ratiu wrote:
> 
> 
> Right, but netdev_sync_lower_features calls lower's
> __netdev_update_features
> only for NETIF_F_UPPER_DISABLES. So it doesn't propagate all
> features,
> only LRO AFAICT.

Got it, I didn't look into what netdev_sync_lower_features actually
does besides noticing it can call __netdev_update_feature...

In any case, please hold off with picking this patch up, it seems
there's a possibility of a real deadlock. Here's the scenario:

============================================
WARNING: possible recursive locking detected
--------------------------------------------
ethtool/44150 is trying to acquire lock:
ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
__netdev_update_features+0x31e/0xe20

but task is already holding lock:
ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
ethnl_set_features+0xbc/0x4b0
and the lock comparison function returns 0:

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&dev_instance_lock_key#7);
  lock(&dev_instance_lock_key#7);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by ethtool/44150:
 #0: ffffffff830e5a50 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
 #1: ffffffff830cf708 (rtnl_mutex){+.+.}-{4:4}, at:
ethnl_set_features+0x88/0x4b0
 #2: ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
ethnl_set_features+0xbc/0x4b0

stack backtrace:
Call Trace:
 <TASK>
 dump_stack_lvl+0x69/0xa0
 print_deadlock_bug.cold+0xbd/0xca
 __lock_acquire+0x163c/0x2f00
 lock_acquire+0xd3/0x2e0
 __mutex_lock+0x98/0xf10
 __netdev_update_features+0x31e/0xe20
 netdev_update_features+0x1f/0x60
 vlan_device_event+0x57d/0x930 [8021q]
 notifier_call_chain+0x3d/0x100
 netdev_features_change+0x32/0x50
 ethnl_set_features+0x17e/0x4b0
 genl_family_rcv_msg_doit+0xe0/0x130
 genl_rcv_msg+0x188/0x290
[...]

I'm not sure how to solve this yet...
Cosmin.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: Lock lower level devices when updating features
  2025-05-07 14:35       ` Cosmin Ratiu
@ 2025-05-07 15:13         ` Cosmin Ratiu
  2025-05-07 20:29           ` Cosmin Ratiu
  0 siblings, 1 reply; 10+ messages in thread
From: Cosmin Ratiu @ 2025-05-07 15:13 UTC (permalink / raw)
  To: stfomichev@gmail.com
  Cc: davem@davemloft.net, linux-kselftest@vger.kernel.org,
	Dragos Tatulea, sdf@fomichev.me, kuba@kernel.org,
	jiri@resnulli.us, edumazet@google.com, pabeni@redhat.com,
	Saeed Mahameed, netdev@vger.kernel.org

On Wed, 2025-05-07 at 14:35 +0000, Cosmin Ratiu wrote:
> On Tue, 2025-05-06 at 11:13 -0700, Stanislav Fomichev wrote:
> > On 05/06, Cosmin Ratiu wrote:
> > 
> > 
> > Right, but netdev_sync_lower_features calls lower's
> > __netdev_update_features
> > only for NETIF_F_UPPER_DISABLES. So it doesn't propagate all
> > features,
> > only LRO AFAICT.
> 
> Got it, I didn't look into what netdev_sync_lower_features actually
> does besides noticing it can call __netdev_update_feature...
> 
> In any case, please hold off with picking this patch up, it seems
> there's a possibility of a real deadlock. Here's the scenario:
> 
> ============================================
> WARNING: possible recursive locking detected
> --------------------------------------------
> ethtool/44150 is trying to acquire lock:
> ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> __netdev_update_features+0x31e/0xe20
> 
> but task is already holding lock:
> ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> ethnl_set_features+0xbc/0x4b0
> and the lock comparison function returns 0:
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&dev_instance_lock_key#7);
>   lock(&dev_instance_lock_key#7);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 3 locks held by ethtool/44150:
>  #0: ffffffff830e5a50 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
>  #1: ffffffff830cf708 (rtnl_mutex){+.+.}-{4:4}, at:
> ethnl_set_features+0x88/0x4b0
>  #2: ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> ethnl_set_features+0xbc/0x4b0
> 
> stack backtrace:
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x69/0xa0
>  print_deadlock_bug.cold+0xbd/0xca
>  __lock_acquire+0x163c/0x2f00
>  lock_acquire+0xd3/0x2e0
>  __mutex_lock+0x98/0xf10
>  __netdev_update_features+0x31e/0xe20
>  netdev_update_features+0x1f/0x60
>  vlan_device_event+0x57d/0x930 [8021q]
>  notifier_call_chain+0x3d/0x100
>  netdev_features_change+0x32/0x50
>  ethnl_set_features+0x17e/0x4b0
>  genl_family_rcv_msg_doit+0xe0/0x130
>  genl_rcv_msg+0x188/0x290
> [...]
> 
> I'm not sure how to solve this yet...
> Cosmin.

If it's not clear, the problem is that:
1. the lower device is already ops locked
2. netdev_feature_change gets called.
3. __netdev_update_features gets called for the vlan (upper) dev.
4. It tries to acquire the same lock instance as 1 (this patch).
5. Deadlock.

One solution I can think of would be to run device notifiers for
changing features outside the lock, it doesn't seem like the netdev
lock has anything to do with what other devices might do with this
information.

This can be triggered from many scenarios, I have another similar stack
involving bonding.

What do you think?

Cosmin.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: Lock lower level devices when updating features
  2025-05-07 15:13         ` Cosmin Ratiu
@ 2025-05-07 20:29           ` Cosmin Ratiu
  2025-05-07 21:20             ` Stanislav Fomichev
  0 siblings, 1 reply; 10+ messages in thread
From: Cosmin Ratiu @ 2025-05-07 20:29 UTC (permalink / raw)
  To: stfomichev@gmail.com
  Cc: davem@davemloft.net, linux-kselftest@vger.kernel.org,
	Dragos Tatulea, sdf@fomichev.me, pabeni@redhat.com,
	jiri@resnulli.us, edumazet@google.com, kuba@kernel.org,
	Saeed Mahameed, netdev@vger.kernel.org

On Wed, 2025-05-07 at 15:13 +0000, Cosmin Ratiu wrote:
> > In any case, please hold off with picking this patch up, it seems
> > there's a possibility of a real deadlock. Here's the scenario:
> > 
> > ============================================
> > WARNING: possible recursive locking detected
> > --------------------------------------------
> > ethtool/44150 is trying to acquire lock:
> > ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > __netdev_update_features+0x31e/0xe20
> > 
> > but task is already holding lock:
> > ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > ethnl_set_features+0xbc/0x4b0
> > and the lock comparison function returns 0:
> > 
> > other info that might help us debug this:
> >  Possible unsafe locking scenario:
> > 
> >        CPU0
> >        ----
> >   lock(&dev_instance_lock_key#7);
> >   lock(&dev_instance_lock_key#7);
> > 
> >  *** DEADLOCK ***
> > 
> >  May be due to missing lock nesting notation
> > 
> > 3 locks held by ethtool/44150:
> >  #0: ffffffff830e5a50 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
> >  #1: ffffffff830cf708 (rtnl_mutex){+.+.}-{4:4}, at:
> > ethnl_set_features+0x88/0x4b0
> >  #2: ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > ethnl_set_features+0xbc/0x4b0
> > 
> > stack backtrace:
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x69/0xa0
> >  print_deadlock_bug.cold+0xbd/0xca
> >  __lock_acquire+0x163c/0x2f00
> >  lock_acquire+0xd3/0x2e0
> >  __mutex_lock+0x98/0xf10
> >  __netdev_update_features+0x31e/0xe20
> >  netdev_update_features+0x1f/0x60
> >  vlan_device_event+0x57d/0x930 [8021q]
> >  notifier_call_chain+0x3d/0x100
> >  netdev_features_change+0x32/0x50
> >  ethnl_set_features+0x17e/0x4b0
> >  genl_family_rcv_msg_doit+0xe0/0x130
> >  genl_rcv_msg+0x188/0x290
> > [...]
> > 
> > I'm not sure how to solve this yet...
> > Cosmin.
> 
> If it's not clear, the problem is that:
> 1. the lower device is already ops locked
> 2. netdev_feature_change gets called.
> 3. __netdev_update_features gets called for the vlan (upper) dev.
> 4. It tries to acquire the same lock instance as 1 (this patch).
> 5. Deadlock.
> 
> One solution I can think of would be to run device notifiers for
> changing features outside the lock, it doesn't seem like the netdev
> lock has anything to do with what other devices might do with this
> information.
> 
> This can be triggered from many scenarios, I have another similar
> stack
> involving bonding.
> 
> What do you think?

All I could think of was to drop the lock during the
netdev_features_changed notifier calls, like in the following hunk.
I'm running this through regressions, let's see if it's a good idea or
not.

diff --git a/net/core/dev.c b/net/core/dev.c
index 1be7cb73a602..817fd5bc21b1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1514,7 +1514,12 @@ int dev_get_alias(const struct net_device *dev,
char *name, size_t len)
  */
 void netdev_features_change(struct net_device *dev)
 {
+	/* Drop the lock to avoid potential deadlocks from e.g. upper
dev
+	 * notifiers altering features of 'dev' and acquiring dev lock
again.
+	 */
+	netdev_unlock_ops(dev);
 	call_netdevice_notifiers(NETDEV_FEAT_CHANGE, dev);
+	netdev_lock_ops(dev);
 }
 EXPORT_SYMBOL(netdev_features_change);


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: Lock lower level devices when updating features
  2025-05-07 20:29           ` Cosmin Ratiu
@ 2025-05-07 21:20             ` Stanislav Fomichev
  2025-05-08 10:33               ` Cosmin Ratiu
  0 siblings, 1 reply; 10+ messages in thread
From: Stanislav Fomichev @ 2025-05-07 21:20 UTC (permalink / raw)
  To: Cosmin Ratiu
  Cc: davem@davemloft.net, linux-kselftest@vger.kernel.org,
	Dragos Tatulea, sdf@fomichev.me, pabeni@redhat.com,
	jiri@resnulli.us, edumazet@google.com, kuba@kernel.org,
	Saeed Mahameed, netdev@vger.kernel.org

On 05/07, Cosmin Ratiu wrote:
> On Wed, 2025-05-07 at 15:13 +0000, Cosmin Ratiu wrote:
> > > In any case, please hold off with picking this patch up, it seems
> > > there's a possibility of a real deadlock. Here's the scenario:
> > > 
> > > ============================================
> > > WARNING: possible recursive locking detected
> > > --------------------------------------------
> > > ethtool/44150 is trying to acquire lock:
> > > ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > > __netdev_update_features+0x31e/0xe20
> > > 
> > > but task is already holding lock:
> > > ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > > ethnl_set_features+0xbc/0x4b0
> > > and the lock comparison function returns 0:
> > > 
> > > other info that might help us debug this:
> > >  Possible unsafe locking scenario:
> > > 
> > >        CPU0
> > >        ----
> > >   lock(&dev_instance_lock_key#7);
> > >   lock(&dev_instance_lock_key#7);
> > > 
> > >  *** DEADLOCK ***
> > > 
> > >  May be due to missing lock nesting notation
> > > 
> > > 3 locks held by ethtool/44150:
> > >  #0: ffffffff830e5a50 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
> > >  #1: ffffffff830cf708 (rtnl_mutex){+.+.}-{4:4}, at:
> > > ethnl_set_features+0x88/0x4b0
> > >  #2: ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > > ethnl_set_features+0xbc/0x4b0
> > > 
> > > stack backtrace:
> > > Call Trace:
> > >  <TASK>
> > >  dump_stack_lvl+0x69/0xa0
> > >  print_deadlock_bug.cold+0xbd/0xca
> > >  __lock_acquire+0x163c/0x2f00
> > >  lock_acquire+0xd3/0x2e0
> > >  __mutex_lock+0x98/0xf10
> > >  __netdev_update_features+0x31e/0xe20
> > >  netdev_update_features+0x1f/0x60
> > >  vlan_device_event+0x57d/0x930 [8021q]
> > >  notifier_call_chain+0x3d/0x100
> > >  netdev_features_change+0x32/0x50
> > >  ethnl_set_features+0x17e/0x4b0
> > >  genl_family_rcv_msg_doit+0xe0/0x130
> > >  genl_rcv_msg+0x188/0x290
> > > [...]
> > > 
> > > I'm not sure how to solve this yet...
> > > Cosmin.
> > 
> > If it's not clear, the problem is that:
> > 1. the lower device is already ops locked
> > 2. netdev_feature_change gets called.
> > 3. __netdev_update_features gets called for the vlan (upper) dev.
> > 4. It tries to acquire the same lock instance as 1 (this patch).
> > 5. Deadlock.
> > 
> > One solution I can think of would be to run device notifiers for
> > changing features outside the lock, it doesn't seem like the netdev
> > lock has anything to do with what other devices might do with this
> > information.
> > 
> > This can be triggered from many scenarios, I have another similar
> > stack
> > involving bonding.
> > 
> > What do you think?
> 
> All I could think of was to drop the lock during the
> netdev_features_changed notifier calls, like in the following hunk.
> I'm running this through regressions, let's see if it's a good idea or
> not.
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1be7cb73a602..817fd5bc21b1 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1514,7 +1514,12 @@ int dev_get_alias(const struct net_device *dev,
> char *name, size_t len)
>   */
>  void netdev_features_change(struct net_device *dev)
>  {
> +	/* Drop the lock to avoid potential deadlocks from e.g. upper
> dev
> +	 * notifiers altering features of 'dev' and acquiring dev lock
> again.
> +	 */
> +	netdev_unlock_ops(dev);
>  	call_netdevice_notifiers(NETDEV_FEAT_CHANGE, dev);
> +	netdev_lock_ops(dev);
>  }
>  EXPORT_SYMBOL(netdev_features_change);
> 

Hmm, are you sure you're calling __netdev_update_features on the upper?
I don't see how the lower would be locked in that case. From my POW,
this is what happens:

1. your dev (lower) has a vlan on it (upper)
2. you call lro=off on the _lower_
3. this triggers FEAT_CHANGE notifier and vlan_device_event catches it
4. since the lower has a vlan device (dev->vlan_info != NULL), it goes
   over every other vlan in the group and triggers netdev_update_features
   for the upper (netdev_update_features vlandev)
5. the upper tries to sync the features into the lower (including the
   one that triggered FEAT_CHANGE) and that's where the deadlock happens

But I think (5) should be largely a no-op for the device triggering the
notification, because the features have been already applied in ethnl_set_features.
I'd move the lock into netdev_sync_lower_features, and only after checking
the features (and making sure that we are going to change them). The feature
check might be racy, but I think it should still work?

Can you also share the bonding stacktrace as well to confirm it's the
same issue?

diff --git a/net/core/dev.c b/net/core/dev.c
index bcb266ab2912..b5fc8a740e8b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10312,6 +10312,7 @@ static void netdev_sync_lower_features(struct net_device *upper,
        for_each_netdev_feature(upper_disables, feature_bit) {
                feature = __NETIF_F_BIT(feature_bit);
                if (!(features & feature) && (lower->features & feature)) {
+                       netdev_lock_ops(lower);
                        netdev_dbg(upper, "Disabling feature %pNF on lower dev %s.\n",
                                   &feature, lower->name);
                        lower->wanted_features &= ~feature;
@@ -10322,6 +10323,7 @@ static void netdev_sync_lower_features(struct net_device *upper,
                                            &feature, lower->name);
                        else
                                netdev_features_change(lower);
+                       netdev_unlock_ops(lower);
                }
        }
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: Lock lower level devices when updating features
  2025-05-07 21:20             ` Stanislav Fomichev
@ 2025-05-08 10:33               ` Cosmin Ratiu
  0 siblings, 0 replies; 10+ messages in thread
From: Cosmin Ratiu @ 2025-05-08 10:33 UTC (permalink / raw)
  To: stfomichev@gmail.com
  Cc: davem@davemloft.net, linux-kselftest@vger.kernel.org,
	Dragos Tatulea, sdf@fomichev.me, kuba@kernel.org,
	jiri@resnulli.us, edumazet@google.com, pabeni@redhat.com,
	Saeed Mahameed, netdev@vger.kernel.org

On Wed, 2025-05-07 at 14:20 -0700, Stanislav Fomichev wrote:
> On 05/07, Cosmin Ratiu wrote:
> > On Wed, 2025-05-07 at 15:13 +0000, Cosmin Ratiu wrote:
> > > > In any case, please hold off with picking this patch up, it
> > > > seems
> > > > there's a possibility of a real deadlock. Here's the scenario:
> 
> Hmm, are you sure you're calling __netdev_update_features on the
> upper?
> I don't see how the lower would be locked in that case. From my POW,
> this is what happens:
> 
> 1. your dev (lower) has a vlan on it (upper)
> 2. you call lro=off on the _lower_
> 3. this triggers FEAT_CHANGE notifier and vlan_device_event catches
> it
> 4. since the lower has a vlan device (dev->vlan_info != NULL), it
> goes
>    over every other vlan in the group and triggers
> netdev_update_features
>    for the upper (netdev_update_features vlandev)
> 5. the upper tries to sync the features into the lower (including the
>    one that triggered FEAT_CHANGE) and that's where the deadlock
> happens
> 
> But I think (5) should be largely a no-op for the device triggering
> the
> notification, because the features have been already applied in
> ethnl_set_features.
> I'd move the lock into netdev_sync_lower_features, and only after
> checking
> the features (and making sure that we are going to change them). The
> feature
> check might be racy, but I think it should still work?
> 

You are right, if I restrict the lower dev critical section to only the
call to __netdev_update_features for the lower dev there's no deadlock
any more, because the device with the lock held already had its
features updated.

I will send a new version of this patch soon after the full regression
suite finishes and I convince myself there are no more issues related
to this that we can encounter.

> Can you also share the bonding stacktrace as well to confirm it's the
> same issue?

Sure, here it is, it's the same scenario. bond_netdev_event gets called
on a slave dev, it recomputes features and updates all slaves
(bond_compute_features), and then the same lock is reacquired.

But this is also fixed with your suggestion above.

 ============================================
 WARNING: possible recursive locking detected

 devlink/14341 is trying to acquire lock:
 ffff88810ebd8c80 (&dev_instance_lock_key#9){+.+.}-{4:4}, at:
__netdev_update_features+0x31e/0xe20

 but task is already holding lock:
 ffff88810ebd8c80 (&dev_instance_lock_key#9){+.+.}-{4:4}, at:
mlx5e_attach_netdev+0x31f/0x360 [mlx5_core]
 and the lock comparison function returns 0:
    
 other info that might help us debug this: 
  Possible unsafe locking scenario:
    
        CPU0
        ----
   lock(&dev_instance_lock_key#9);
   lock(&dev_instance_lock_key#9);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 4 locks held by devlink/14341:
  #0: ffffffff830e5a50 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
  #1: ffff888164a5c250 (&devlink->lock_key){+.+.}-{4:4}, at:
devlink_get_from_attrs_lock+0xbc/0x180
  #2: ffffffff830cf708 (rtnl_mutex){+.+.}-{4:4}, at:
mlx5e_attach_netdev+0x30d/0x360 [mlx5_core]
  #3: ffff88810ebd8c80 (&dev_instance_lock_key#9){+.+.}-{4:4}, at:
mlx5e_attach_netdev+0x31f/0x360 [mlx5_core]

 Call Trace:
  <TASK>
  dump_stack_lvl+0x69/0xa0
  print_deadlock_bug.cold+0xbd/0xca
  __lock_acquire+0x163c/0x2f00
  lock_acquire+0xd3/0x2e0
  __mutex_lock+0x98/0xf10
  __netdev_update_features+0x31e/0xe20
  netdev_change_features+0x1f/0x60
  bond_compute_features+0x24e/0x300 [bonding]
  bond_netdev_event+0x2e0/0x400 [bonding]
  notifier_call_chain+0x3d/0x100
  netdev_update_features+0x52/0x60
  mlx5e_attach_netdev+0x32f/0x360 [mlx5_core]
  mlx5e_netdev_attach_profile+0x48/0x90 [mlx5_core]
  mlx5e_netdev_change_profile+0x90/0xf0 [mlx5_core]
  mlx5e_vport_rep_load+0x414/0x490 [mlx5_core]
  __esw_offloads_load_rep+0x87/0xd0 [mlx5_core]
  mlx5_esw_offloads_rep_load+0x45/0xe0 [mlx5_core]
  esw_offloads_enable+0xb7b/0xca0 [mlx5_core] 
  mlx5_eswitch_enable_locked+0x293/0x430 [mlx5_core]
  mlx5_devlink_eswitch_mode_set+0x229/0x620 [mlx5_core]
  devlink_nl_eswitch_set_doit+0x60/0xd0
  genl_family_rcv_msg_doit+0xe0/0x130
  genl_rcv_msg+0x188/0x290
  netlink_rcv_skb+0x4b/0xf0
  genl_rcv+0x24/0x40
  netlink_unicast+0x1e1/0x2c0
  netlink_sendmsg+0x210/0x450

Cosmin.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-05-08 10:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-06 14:21 [PATCH net] net: Lock lower level devices when updating features Cosmin Ratiu
2025-05-06 17:12 ` Stanislav Fomichev
2025-05-06 17:40   ` Stanislav Fomichev
2025-05-06 17:47   ` Cosmin Ratiu
2025-05-06 18:13     ` Stanislav Fomichev
2025-05-07 14:35       ` Cosmin Ratiu
2025-05-07 15:13         ` Cosmin Ratiu
2025-05-07 20:29           ` Cosmin Ratiu
2025-05-07 21:20             ` Stanislav Fomichev
2025-05-08 10:33               ` Cosmin Ratiu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).