netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v1] dev: remove netdev_lock() and netdev_lock_ops() in register_netdevice().
@ 2025-03-08 20:37 Kohei Enju
  2025-03-08 21:18 ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Kohei Enju @ 2025-03-08 20:37 UTC (permalink / raw)
  To: netdev, linux-kernel, bpf
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Sebastian Andrzej Siewior,
	Ahmed Zaki, Stanislav Fomichev, Alexander Lobakin, Kohei Enju,
	Kohei Enju

ieee80211_register_hw() takes the wiphy lock, and then
register_netdevice() calls netdev_lock(), since commit 5fda3f35349b ("net:
make netdev_lock() protect netdev->reg_state").
On the other hand, do_setlink() calls netdev_lock() and then
ieee80211_change_mac() takes the wiphy lock, after commit df43d8bf1031
("net: replace dev_addr_sem with netdev instance lock").
This causes the circular locking dependency.

Like netdev_lock(), netdev_lock_ops() introduced in commit 97246d6d21c2
("net: hold netdev instance lock during ndo_bpf") would also cause the
same type of issue.

Both netdev_lock() and netdev_lock_ops() are called before
list_netdevice() in register_netdevice().
No other context can access the struct net_device, so we don't need these
locks in this context.
Remove them.

WARNING: possible circular locking dependency detected
6.14.0-rc5-next-20250307 #44 Not tainted

NetworkManager/8289 is trying to acquire lock:
ffff88810fb30768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ieee80211_change_mac+0x110/0x1450

but task is already holding lock:
ffff88810fb48d30 (&dev->lock){+.+.}-{4:4}, at: do_setlink.isra.0+0x326c/0x3f40

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&dev->lock){+.+.}-{4:4}:
       lock_acquire+0x1b2/0x550
       __mutex_lock+0x1a2/0x14b0
       register_netdevice+0x12dc/0x2000
       cfg80211_register_netdevice+0x149/0x330
       ieee80211_if_add+0xcfe/0x1880
       ieee80211_register_hw+0x3655/0x3f60
       mac80211_hwsim_new_radio+0x2760/0x53a0
       init_mac80211_hwsim+0x6c6/0x7d0
       do_one_initcall+0x11a/0x6d0
       kernel_init_freeable+0x6dd/0x770
       kernel_init+0x1f/0x1e0
       ret_from_fork+0x45/0x80
       ret_from_fork_asm+0x1a/0x30

-> #0 (&rdev->wiphy.mtx){+.+.}-{4:4}:
       check_prev_add+0x1af/0x23e0
       __lock_acquire+0x2d55/0x49a0
       lock_acquire+0x1b2/0x550
       __mutex_lock+0x1a2/0x14b0
       ieee80211_change_mac+0x110/0x1450
       netif_set_mac_address+0x30a/0x4a0
       do_setlink.isra.0+0x77a/0x3f40
       rtnl_newlink+0x11ef/0x2370
       rtnetlink_rcv_msg+0x95b/0xe90
       netlink_rcv_skb+0x16b/0x440
       netlink_unicast+0x532/0x7f0
       netlink_sendmsg+0x8ca/0xda0
       ____sys_sendmsg+0x9cf/0xb60
       ___sys_sendmsg+0x11a/0x1a0
       __sys_sendmsg+0x136/0x1e0
       do_syscall_64+0x74/0x190
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev->lock);
                               lock(&rdev->wiphy.mtx);
                               lock(&dev->lock);
  lock(&rdev->wiphy.mtx);

 *** DEADLOCK ***

Fixes: df43d8bf1031 ("net: replace dev_addr_sem with netdev instance lock")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/dev.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index c4cc33f73629..df9661961558 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11003,17 +11003,11 @@ int register_netdevice(struct net_device *dev)
 		goto err_ifindex_release;
 
 	ret = netdev_register_kobject(dev);
-
-	netdev_lock(dev);
 	WRITE_ONCE(dev->reg_state, ret ? NETREG_UNREGISTERED : NETREG_REGISTERED);
-	netdev_unlock(dev);
-
 	if (ret)
 		goto err_uninit_notify;
 
-	netdev_lock_ops(dev);
 	__netdev_update_features(dev);
-	netdev_unlock_ops(dev);
 
 	/*
 	 *	Default initial state at registry is that the
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v1] dev: remove netdev_lock() and netdev_lock_ops() in register_netdevice().
  2025-03-08 20:37 [PATCH net-next v1] dev: remove netdev_lock() and netdev_lock_ops() in register_netdevice() Kohei Enju
@ 2025-03-08 21:18 ` Jakub Kicinski
  2025-03-08 22:41   ` Stanislav Fomichev
  2025-03-08 22:41   ` Jakub Kicinski
  0 siblings, 2 replies; 7+ messages in thread
From: Jakub Kicinski @ 2025-03-08 21:18 UTC (permalink / raw)
  To: Kohei Enju
  Cc: netdev, linux-kernel, bpf, David S . Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	Sebastian Andrzej Siewior, Ahmed Zaki, Stanislav Fomichev,
	Alexander Lobakin, Kohei Enju

On Sun, 9 Mar 2025 05:37:18 +0900 Kohei Enju wrote:
> Both netdev_lock() and netdev_lock_ops() are called before
> list_netdevice() in register_netdevice().
> No other context can access the struct net_device, so we don't need these
> locks in this context.

Doesn't sysfs get registered earlier?
I'm afraid not being able to take the lock from the registration
path ties our hands too much. Maybe we need to make a more serious
attempt at letting the caller take the lock?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v1] dev: remove netdev_lock() and netdev_lock_ops() in register_netdevice().
  2025-03-08 21:18 ` Jakub Kicinski
@ 2025-03-08 22:41   ` Stanislav Fomichev
  2025-03-09 21:41     ` Stanislav Fomichev
  2025-03-08 22:41   ` Jakub Kicinski
  1 sibling, 1 reply; 7+ messages in thread
From: Stanislav Fomichev @ 2025-03-08 22:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Kohei Enju, netdev, linux-kernel, bpf, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	Sebastian Andrzej Siewior, Ahmed Zaki, Stanislav Fomichev,
	Alexander Lobakin, Kohei Enju

On 03/08, Jakub Kicinski wrote:
> On Sun, 9 Mar 2025 05:37:18 +0900 Kohei Enju wrote:
> > Both netdev_lock() and netdev_lock_ops() are called before
> > list_netdevice() in register_netdevice().
> > No other context can access the struct net_device, so we don't need these
> > locks in this context.

That's technically true, but it will set off a bunch of lockdep
warnings :-(

> Doesn't sysfs get registered earlier?
> I'm afraid not being able to take the lock from the registration
> path ties our hands too much. Maybe we need to make a more serious
> attempt at letting the caller take the lock?

This looks like another case of upper/lower :-( So maybe we need to solve
it for real? With an extra upper_lock pointer in the netdev?
Untested patch to convey the general idea:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d3c549f73909..9c85179431e6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2520,6 +2520,7 @@ struct net_device {
 	 * Ordering: take after rtnl_lock.
 	 */
 	struct mutex		lock;
+	struct mutex		*upper_lock;
 
 #if IS_ENABLED(CONFIG_NET_SHAPER)
 	/**
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 90597bf84e3d..3d0fda6e9bca 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3022,6 +3022,9 @@ static int do_setlink(const struct sk_buff *skb, struct net_device *dev,
 	char ifname[IFNAMSIZ];
 	int err;
 
+	/* TODO: add another wrapper for this */
+	if (dev->upper_lock)
+		mutex_lock(dev->upper_lock);
 	netdev_lock_ops(dev);
 
 	err = validate_linkmsg(dev, tb, extack);
@@ -3394,6 +3397,8 @@ static int do_setlink(const struct sk_buff *skb, struct net_device *dev,
 	}
 
 	netdev_unlock_ops(dev);
+	if (dev->upper_lock)
+		mutex_unlock(dev->upper_lock);
 
 	return err;
 }
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index b0423046028c..818ff487b363 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -304,7 +304,7 @@ static int ieee80211_change_mac(struct net_device *dev, void *addr)
 	if (!dev->ieee80211_ptr->registered)
 		return 0;
 
-	guard(wiphy)(local->hw.wiphy);
+	/* TODO: remove guard from other places */
 
 	return _ieee80211_change_mac(sdata, addr);
 }
@@ -2227,6 +2227,8 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
 			free_netdev(ndev);
 			return ret;
 		}
+
+		ndev->upper_lock = &local->hw.wiphy.mtx;
 	}
 
 	mutex_lock(&local->iflist_mtx);


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v1] dev: remove netdev_lock() and netdev_lock_ops() in register_netdevice().
  2025-03-08 21:18 ` Jakub Kicinski
  2025-03-08 22:41   ` Stanislav Fomichev
@ 2025-03-08 22:41   ` Jakub Kicinski
  2025-03-08 22:59     ` Stanislav Fomichev
  1 sibling, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2025-03-08 22:41 UTC (permalink / raw)
  To: Kohei Enju
  Cc: netdev, linux-kernel, bpf, David S . Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	Sebastian Andrzej Siewior, Ahmed Zaki, Stanislav Fomichev,
	Alexander Lobakin, Kohei Enju

On Sat, 8 Mar 2025 13:18:13 -0800 Jakub Kicinski wrote:
> On Sun, 9 Mar 2025 05:37:18 +0900 Kohei Enju wrote:
> > Both netdev_lock() and netdev_lock_ops() are called before
> > list_netdevice() in register_netdevice().
> > No other context can access the struct net_device, so we don't need these
> > locks in this context.  
> 
> Doesn't sysfs get registered earlier?
> I'm afraid not being able to take the lock from the registration
> path ties our hands too much. Maybe we need to make a more serious
> attempt at letting the caller take the lock?

Looking closer at the report - we are violating the contract that only
drivers which opted in get their ops called under the instance lock.
iavf had a similar problem but it had to opt in. WiFi doesn't.

Maybe we can bring the address semaphore back?
We just need to take it before the ops lock in do_setlink.
A bit ugly but would work?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v1] dev: remove netdev_lock() and netdev_lock_ops() in register_netdevice().
  2025-03-08 22:41   ` Jakub Kicinski
@ 2025-03-08 22:59     ` Stanislav Fomichev
  2025-03-08 23:51       ` Stanislav Fomichev
  0 siblings, 1 reply; 7+ messages in thread
From: Stanislav Fomichev @ 2025-03-08 22:59 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Kohei Enju, netdev, linux-kernel, bpf, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	Sebastian Andrzej Siewior, Ahmed Zaki, Stanislav Fomichev,
	Alexander Lobakin, Kohei Enju

On 03/08, Jakub Kicinski wrote:
> On Sat, 8 Mar 2025 13:18:13 -0800 Jakub Kicinski wrote:
> > On Sun, 9 Mar 2025 05:37:18 +0900 Kohei Enju wrote:
> > > Both netdev_lock() and netdev_lock_ops() are called before
> > > list_netdevice() in register_netdevice().
> > > No other context can access the struct net_device, so we don't need these
> > > locks in this context.  
> > 
> > Doesn't sysfs get registered earlier?
> > I'm afraid not being able to take the lock from the registration
> > path ties our hands too much. Maybe we need to make a more serious
> > attempt at letting the caller take the lock?
> 
> Looking closer at the report - we are violating the contract that only
> drivers which opted in get their ops called under the instance lock.
> iavf had a similar problem but it had to opt in. WiFi doesn't.
> 
> Maybe we can bring the address semaphore back?
> We just need to take it before the ops lock in do_setlink.
> A bit ugly but would work?

I remember I was having another lockdep circular report with the addr
sema, but maybe moving it before the ops lock fill fix it not sure.

But coming back to "No other context can access the struct net_device,
so we don't need these locks in this context.". What if we move
netdev_set_addr_lockdep_class() call down a bit? Right before list_netdevice
happens. Will it help with the lockdep?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v1] dev: remove netdev_lock() and netdev_lock_ops() in register_netdevice().
  2025-03-08 22:59     ` Stanislav Fomichev
@ 2025-03-08 23:51       ` Stanislav Fomichev
  0 siblings, 0 replies; 7+ messages in thread
From: Stanislav Fomichev @ 2025-03-08 23:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Kohei Enju, netdev, linux-kernel, bpf, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	Sebastian Andrzej Siewior, Ahmed Zaki, Stanislav Fomichev,
	Alexander Lobakin, Kohei Enju

On 03/08, Stanislav Fomichev wrote:
> On 03/08, Jakub Kicinski wrote:
> > On Sat, 8 Mar 2025 13:18:13 -0800 Jakub Kicinski wrote:
> > > On Sun, 9 Mar 2025 05:37:18 +0900 Kohei Enju wrote:
> > > > Both netdev_lock() and netdev_lock_ops() are called before
> > > > list_netdevice() in register_netdevice().
> > > > No other context can access the struct net_device, so we don't need these
> > > > locks in this context.  
> > > 
> > > Doesn't sysfs get registered earlier?
> > > I'm afraid not being able to take the lock from the registration
> > > path ties our hands too much. Maybe we need to make a more serious
> > > attempt at letting the caller take the lock?
> > 
> > Looking closer at the report - we are violating the contract that only
> > drivers which opted in get their ops called under the instance lock.
> > iavf had a similar problem but it had to opt in. WiFi doesn't.
> > 
> > Maybe we can bring the address semaphore back?
> > We just need to take it before the ops lock in do_setlink.
> > A bit ugly but would work?
> 
> I remember I was having another lockdep circular report with the addr
> sema, but maybe moving it before the ops lock fill fix it not sure.
> 
> But coming back to "No other context can access the struct net_device,
> so we don't need these locks in this context.". What if we move
> netdev_set_addr_lockdep_class() call down a bit? Right before list_netdevice
> happens. Will it help with the lockdep?

Hmm, netdev_set_addr_lockdep_class is not touching instance lock :-(
But basically do lockdep_set_novalidate_class early and undo it
before list_netdevice...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v1] dev: remove netdev_lock() and netdev_lock_ops() in register_netdevice().
  2025-03-08 22:41   ` Stanislav Fomichev
@ 2025-03-09 21:41     ` Stanislav Fomichev
  0 siblings, 0 replies; 7+ messages in thread
From: Stanislav Fomichev @ 2025-03-09 21:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Kohei Enju, netdev, linux-kernel, bpf, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	Sebastian Andrzej Siewior, Ahmed Zaki, Stanislav Fomichev,
	Alexander Lobakin, Kohei Enju

On 03/08, Stanislav Fomichev wrote:
> On 03/08, Jakub Kicinski wrote:
> > On Sun, 9 Mar 2025 05:37:18 +0900 Kohei Enju wrote:
> > > Both netdev_lock() and netdev_lock_ops() are called before
> > > list_netdevice() in register_netdevice().
> > > No other context can access the struct net_device, so we don't need these
> > > locks in this context.
> 
> That's technically true, but it will set off a bunch of lockdep
> warnings :-(
Let me drop it for now from the nipa because it does complain about it:
https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/24641/67-nl-netdev-py/stderr

---
pw-bot: cr

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-03-09 21:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-08 20:37 [PATCH net-next v1] dev: remove netdev_lock() and netdev_lock_ops() in register_netdevice() Kohei Enju
2025-03-08 21:18 ` Jakub Kicinski
2025-03-08 22:41   ` Stanislav Fomichev
2025-03-09 21:41     ` Stanislav Fomichev
2025-03-08 22:41   ` Jakub Kicinski
2025-03-08 22:59     ` Stanislav Fomichev
2025-03-08 23:51       ` Stanislav Fomichev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).