netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops
@ 2025-04-08 19:59 Jakub Kicinski
  2025-04-08 19:59 ` [PATCH net-next v2 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch Jakub Kicinski
                   ` (8 more replies)
  0 siblings, 9 replies; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-08 19:59 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato, Jakub Kicinski

netdev-genl used to be protected by rtnl_lock. In previous release
we already switched the queue management ops (for Rx zero-copy) to
the instance lock. This series converts other ops to depend on the
instance lock when possible.

Unfortunately queue related state is hard to lock (unlike NAPI)
as the process of switching the number of queues usually involves
a large reconfiguration of the driver. The reconfig process has
historically been under rtnl_lock, but for drivers which opt into
ops locking it is also under the instance lock. Leverage that
and conditionally take rtnl_lock or instance lock depending
on the device capabilities.

v2:
 - fix a bug in patch 2
 - reword the doc a tiny bit (patch 7)
v1: https://lore.kernel.org/20250407190117.16528-1-kuba@kernel.org

Jakub Kicinski (8):
  net: avoid potential race between netdev_get_by_index_lock() and netns
    switch
  net: designate XSK pool pointers in queues as "ops protected"
  netdev: add "ops compat locking" helpers
  netdev: don't hold rtnl_lock over nl queue info get when possible
  xdp: double protect netdev->xdp_flags with netdev->lock
  netdev: depend on netdev->lock for xdp features
  docs: netdev: break down the instance locking info per ops struct
  netdev: depend on netdev->lock for qstats in ops locked drivers

 Documentation/networking/netdevices.rst    | 61 +++++++++++++----
 include/linux/netdevice.h                  |  7 +-
 include/net/netdev_lock.h                  | 16 +++++
 include/net/netdev_queues.h                |  4 +-
 include/net/netdev_rx_queue.h              |  6 +-
 include/net/xdp.h                          |  1 +
 net/core/dev.h                             | 17 ++++-
 drivers/net/ethernet/google/gve/gve_main.c |  2 +-
 net/core/dev.c                             | 76 ++++++++++++++++++++--
 net/core/lock_debug.c                      |  2 +-
 net/core/netdev-genl.c                     | 73 ++++++++++-----------
 net/core/xdp.c                             | 12 +++-
 net/xdp/xsk_buff_pool.c                    |  6 +-
 13 files changed, 217 insertions(+), 66 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH net-next v2 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch
  2025-04-08 19:59 [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
@ 2025-04-08 19:59 ` Jakub Kicinski
  2025-04-08 19:59 ` [PATCH net-next v2 2/8] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-08 19:59 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato, Jakub Kicinski

netdev_get_by_index_lock() performs following steps:

  rcu_lock();
  dev = lookup(netns, ifindex);
  dev_get(dev);
  rcu_unlock();
  [... lock & validate the dev ...]
  return dev

Validation right now only checks if the device is registered but since
the lookup is netns-aware we must also protect against the device
switching netns right after we dropped the RCU lock. Otherwise
the caller in netns1 may get a pointer to a device which has just
switched to netns2.

We can't hold the lock for the entire netns change process (because of
the NETDEV_UNREGISTER notifier), and there's no existing marking to
indicate that the netns is unlisted because of netns move, so add one.

AFAIU none of the existing netdev_get_by_index_lock() callers can
suffer from this problem (NAPI code double checks the netns membership
and other callers are either under rtnl_lock or not ns-sensitive),
so this patch does not have to be treated as a fix.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/netdevice.h |  6 +++++-
 net/core/dev.h            |  2 +-
 net/core/dev.c            | 25 ++++++++++++++++++-------
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cf3b6445817b..8e9be80bc167 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1952,6 +1952,7 @@ enum netdev_reg_state {
  *	@priv_destructor:	Called from unregister
  *	@npinfo:		XXX: need comments on this one
  * 	@nd_net:		Network namespace this network device is inside
+ *				protected by @lock
  *
  * 	@ml_priv:	Mid-layer private
  *	@ml_priv_type:  Mid-layer private type
@@ -2359,6 +2360,9 @@ struct net_device {
 
 	bool dismantle;
 
+	/** @moving_ns: device is changing netns, protected by @lock */
+	bool moving_ns;
+
 	enum {
 		RTNL_LINK_INITIALIZED,
 		RTNL_LINK_INITIALIZING,
@@ -2521,7 +2525,7 @@ struct net_device {
 	 *	@net_shaper_hierarchy, @reg_state, @threaded
 	 *
 	 * Double protects:
-	 *	@up
+	 *	@up, @moving_ns, @nd_net
 	 *
 	 * Double ops protects:
 	 *	@real_num_rx_queues, @real_num_tx_queues
diff --git a/net/core/dev.h b/net/core/dev.h
index 710abc05ebdb..e7bf21f52fc7 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -30,7 +30,7 @@ netdev_napi_by_id_lock(struct net *net, unsigned int napi_id);
 struct net_device *dev_get_by_napi_id(unsigned int napi_id);
 
 struct net_device *netdev_get_by_index_lock(struct net *net, int ifindex);
-struct net_device *__netdev_put_lock(struct net_device *dev);
+struct net_device *__netdev_put_lock(struct net_device *dev, struct net *net);
 struct net_device *
 netdev_xa_find_lock(struct net *net, struct net_device *dev,
 		    unsigned long *index);
diff --git a/net/core/dev.c b/net/core/dev.c
index 4ccc6dc5303e..8c13e5339cc9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -828,7 +828,7 @@ netdev_napi_by_id_lock(struct net *net, unsigned int napi_id)
 	dev_hold(dev);
 	rcu_read_unlock();
 
-	dev = __netdev_put_lock(dev);
+	dev = __netdev_put_lock(dev, net);
 	if (!dev)
 		return NULL;
 
@@ -1039,10 +1039,11 @@ struct net_device *dev_get_by_napi_id(unsigned int napi_id)
  * This helper is intended for locking net_device after it has been looked up
  * using a lockless lookup helper. Lock prevents the instance from going away.
  */
-struct net_device *__netdev_put_lock(struct net_device *dev)
+struct net_device *__netdev_put_lock(struct net_device *dev, struct net *net)
 {
 	netdev_lock(dev);
-	if (dev->reg_state > NETREG_REGISTERED) {
+	if (dev->reg_state > NETREG_REGISTERED ||
+	    dev->moving_ns || !net_eq(dev_net(dev), net)) {
 		netdev_unlock(dev);
 		dev_put(dev);
 		return NULL;
@@ -1070,7 +1071,7 @@ struct net_device *netdev_get_by_index_lock(struct net *net, int ifindex)
 	if (!dev)
 		return NULL;
 
-	return __netdev_put_lock(dev);
+	return __netdev_put_lock(dev, net);
 }
 
 struct net_device *
@@ -1090,7 +1091,7 @@ netdev_xa_find_lock(struct net *net, struct net_device *dev,
 		dev_hold(dev);
 		rcu_read_unlock();
 
-		dev = __netdev_put_lock(dev);
+		dev = __netdev_put_lock(dev, net);
 		if (dev)
 			return dev;
 
@@ -12157,7 +12158,11 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
 	netif_close(dev);
 	/* And unlink it from device chain */
 	unlist_netdevice(dev);
-	netdev_unlock_ops(dev);
+
+	if (!netdev_need_ops_lock(dev))
+		netdev_lock(dev);
+	dev->moving_ns = true;
+	netdev_unlock(dev);
 
 	synchronize_net();
 
@@ -12193,7 +12198,9 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
 	move_netdevice_notifiers_dev_net(dev, net);
 
 	/* Actually switch the network namespace */
+	netdev_lock(dev);
 	dev_net_set(dev, net);
+	netdev_unlock(dev);
 	dev->ifindex = new_ifindex;
 
 	if (new_name[0]) {
@@ -12219,7 +12226,11 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
 	err = netdev_change_owner(dev, net_old, net);
 	WARN_ON(err);
 
-	netdev_lock_ops(dev);
+	netdev_lock(dev);
+	dev->moving_ns = false;
+	if (!netdev_need_ops_lock(dev))
+		netdev_unlock(dev);
+
 	/* Add the device back in the hashes */
 	list_netdevice(dev);
 	/* Notify protocols, that a new device appeared. */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v2 2/8] net: designate XSK pool pointers in queues as "ops protected"
  2025-04-08 19:59 [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
  2025-04-08 19:59 ` [PATCH net-next v2 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch Jakub Kicinski
@ 2025-04-08 19:59 ` Jakub Kicinski
  2025-04-08 19:59 ` [PATCH net-next v2 3/8] netdev: add "ops compat locking" helpers Jakub Kicinski
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-08 19:59 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato, Jakub Kicinski

Read accesses go via xsk_get_pool_from_qid(), the call coming
from the core and gve look safe (other "ops locked" drivers
don't support XSK).

Write accesses go via xsk_reg_pool_at_qid() and xsk_clear_pool_at_qid().
Former is already under the ops lock, latter is not (both coming from
the workqueue via xp_clear_dev() and NETDEV_UNREGISTER via xsk_notifier()).

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
---
v2:
 - move the locking to xp_clear_dev()
v1: https://lore.kernel.org/20250407190117.16528-3-kuba@kernel.org
---
 include/linux/netdevice.h     | 1 +
 include/net/netdev_rx_queue.h | 6 +++---
 net/xdp/xsk_buff_pool.c       | 6 +++++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8e9be80bc167..7242fb8a22fc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -688,6 +688,7 @@ struct netdev_queue {
 	/* Subordinate device that the queue has been assigned to */
 	struct net_device	*sb_dev;
 #ifdef CONFIG_XDP_SOCKETS
+	/* "ops protected", see comment about net_device::lock */
 	struct xsk_buff_pool    *pool;
 #endif
 
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index b2238b551dce..8cdcd138b33f 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -20,12 +20,12 @@ struct netdev_rx_queue {
 	struct net_device		*dev;
 	netdevice_tracker		dev_tracker;
 
+	/* All fields below are "ops protected",
+	 * see comment about net_device::lock
+	 */
 #ifdef CONFIG_XDP_SOCKETS
 	struct xsk_buff_pool            *pool;
 #endif
-	/* NAPI instance for the queue
-	 * "ops protected", see comment about net_device::lock
-	 */
 	struct napi_struct		*napi;
 	struct pp_memory_provider_params mp_params;
 } ____cacheline_aligned_in_smp;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 25a76c5ce0f1..cbf2129e808b 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -266,13 +266,17 @@ int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_sock *umem_xs,
 
 void xp_clear_dev(struct xsk_buff_pool *pool)
 {
+	struct net_device *netdev = pool->netdev;
+
 	if (!pool->netdev)
 		return;
 
+	netdev_lock_ops(netdev);
 	xp_disable_drv_zc(pool);
 	xsk_clear_pool_at_qid(pool->netdev, pool->queue_id);
-	dev_put(pool->netdev);
 	pool->netdev = NULL;
+	netdev_unlock_ops(netdev);
+	dev_put(netdev);
 }
 
 static void xp_release_deferred(struct work_struct *work)
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v2 3/8] netdev: add "ops compat locking" helpers
  2025-04-08 19:59 [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
  2025-04-08 19:59 ` [PATCH net-next v2 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch Jakub Kicinski
  2025-04-08 19:59 ` [PATCH net-next v2 2/8] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
@ 2025-04-08 19:59 ` Jakub Kicinski
  2025-04-08 19:59 ` [PATCH net-next v2 4/8] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-08 19:59 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato, Jakub Kicinski

Add helpers to "lock a netdev in a backward-compatible way",
which for ops-locked netdevs will mean take the instance lock.
For drivers which haven't opted into the ops locking we'll take
rtnl_lock.

The scoped foreach is dropping and re-taking the lock for each
device, even if prev and next are both under rtnl_lock.
I hope that's fine since we expect that netdev nl to be mostly
supported by modern drivers, and modern drivers should also
opt into the instance locking.

Note that these helpers are mostly needed for queue related state,
because drivers modify queue config in their ops in a non-atomic
way. Or differently put, queue changes don't have a clear-cut API
like NAPI configuration. Any state that can should just use the
instance lock directly, not the "compat" hacks.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/net/netdev_lock.h | 16 ++++++++++++
 net/core/dev.h            | 15 ++++++++++++
 net/core/dev.c            | 51 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)

diff --git a/include/net/netdev_lock.h b/include/net/netdev_lock.h
index c316b551df8d..5706835a660c 100644
--- a/include/net/netdev_lock.h
+++ b/include/net/netdev_lock.h
@@ -64,6 +64,22 @@ netdev_ops_assert_locked_or_invisible(const struct net_device *dev)
 		netdev_ops_assert_locked(dev);
 }
 
+static inline void netdev_lock_ops_compat(struct net_device *dev)
+{
+	if (netdev_need_ops_lock(dev))
+		netdev_lock(dev);
+	else
+		rtnl_lock();
+}
+
+static inline void netdev_unlock_ops_compat(struct net_device *dev)
+{
+	if (netdev_need_ops_lock(dev))
+		netdev_unlock(dev);
+	else
+		rtnl_unlock();
+}
+
 static inline int netdev_lock_cmp_fn(const struct lockdep_map *a,
 				     const struct lockdep_map *b)
 {
diff --git a/net/core/dev.h b/net/core/dev.h
index e7bf21f52fc7..e93f36b7ddf3 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -42,6 +42,21 @@ DEFINE_FREE(netdev_unlock, struct net_device *, if (_T) netdev_unlock(_T));
 	     (var_name = netdev_xa_find_lock(net, var_name, &ifindex)); \
 	     ifindex++)
 
+struct net_device *
+netdev_get_by_index_lock_ops_compat(struct net *net, int ifindex);
+struct net_device *
+netdev_xa_find_lock_ops_compat(struct net *net, struct net_device *dev,
+			       unsigned long *index);
+
+DEFINE_FREE(netdev_unlock_ops_compat, struct net_device *,
+	    if (_T) netdev_unlock_ops_compat(_T));
+
+#define for_each_netdev_lock_ops_compat_scoped(net, var_name, ifindex)	\
+	for (struct net_device *var_name __free(netdev_unlock_ops_compat) = NULL; \
+	     (var_name = netdev_xa_find_lock_ops_compat(net, var_name,	\
+							&ifindex));	\
+	     ifindex++)
+
 #ifdef CONFIG_PROC_FS
 int __init dev_proc_init(void);
 #else
diff --git a/net/core/dev.c b/net/core/dev.c
index 8c13e5339cc9..b52efa4cec56 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1052,6 +1052,20 @@ struct net_device *__netdev_put_lock(struct net_device *dev, struct net *net)
 	return dev;
 }
 
+static struct net_device *
+__netdev_put_lock_ops_compat(struct net_device *dev, struct net *net)
+{
+	netdev_lock_ops_compat(dev);
+	if (dev->reg_state > NETREG_REGISTERED ||
+	    dev->moving_ns || !net_eq(dev_net(dev), net)) {
+		netdev_unlock_ops_compat(dev);
+		dev_put(dev);
+		return NULL;
+	}
+	dev_put(dev);
+	return dev;
+}
+
 /**
  *	netdev_get_by_index_lock() - find a device by its ifindex
  *	@net: the applicable net namespace
@@ -1074,6 +1088,18 @@ struct net_device *netdev_get_by_index_lock(struct net *net, int ifindex)
 	return __netdev_put_lock(dev, net);
 }
 
+struct net_device *
+netdev_get_by_index_lock_ops_compat(struct net *net, int ifindex)
+{
+	struct net_device *dev;
+
+	dev = dev_get_by_index(net, ifindex);
+	if (!dev)
+		return NULL;
+
+	return __netdev_put_lock_ops_compat(dev, net);
+}
+
 struct net_device *
 netdev_xa_find_lock(struct net *net, struct net_device *dev,
 		    unsigned long *index)
@@ -1099,6 +1125,31 @@ netdev_xa_find_lock(struct net *net, struct net_device *dev,
 	} while (true);
 }
 
+struct net_device *
+netdev_xa_find_lock_ops_compat(struct net *net, struct net_device *dev,
+			       unsigned long *index)
+{
+	if (dev)
+		netdev_unlock_ops_compat(dev);
+
+	do {
+		rcu_read_lock();
+		dev = xa_find(&net->dev_by_index, index, ULONG_MAX, XA_PRESENT);
+		if (!dev) {
+			rcu_read_unlock();
+			return NULL;
+		}
+		dev_hold(dev);
+		rcu_read_unlock();
+
+		dev = __netdev_put_lock_ops_compat(dev, net);
+		if (dev)
+			return dev;
+
+		(*index)++;
+	} while (true);
+}
+
 static DEFINE_SEQLOCK(netdev_rename_lock);
 
 void netdev_copy_name(struct net_device *dev, char *name)
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v2 4/8] netdev: don't hold rtnl_lock over nl queue info get when possible
  2025-04-08 19:59 [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (2 preceding siblings ...)
  2025-04-08 19:59 ` [PATCH net-next v2 3/8] netdev: add "ops compat locking" helpers Jakub Kicinski
@ 2025-04-08 19:59 ` Jakub Kicinski
  2025-04-08 19:59 ` [PATCH net-next v2 5/8] xdp: double protect netdev->xdp_flags with netdev->lock Jakub Kicinski
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-08 19:59 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato, Jakub Kicinski

Netdev queue dump accesses: NAPI, memory providers, XSk pointers.
All three are "ops protected" now, switch to the op compat locking.
rtnl lock does not have to be taken for "ops locked" devices.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/core/netdev-genl.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 5d7af50fe702..7ef9b0191936 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -481,18 +481,15 @@ int netdev_nl_queue_get_doit(struct sk_buff *skb, struct genl_info *info)
 	if (!rsp)
 		return -ENOMEM;
 
-	rtnl_lock();
-
-	netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
+	netdev = netdev_get_by_index_lock_ops_compat(genl_info_net(info),
+						     ifindex);
 	if (netdev) {
 		err = netdev_nl_queue_fill(rsp, netdev, q_id, q_type, info);
-		netdev_unlock(netdev);
+		netdev_unlock_ops_compat(netdev);
 	} else {
 		err = -ENODEV;
 	}
 
-	rtnl_unlock();
-
 	if (err)
 		goto err_free_msg;
 
@@ -541,17 +538,17 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 	if (info->attrs[NETDEV_A_QUEUE_IFINDEX])
 		ifindex = nla_get_u32(info->attrs[NETDEV_A_QUEUE_IFINDEX]);
 
-	rtnl_lock();
 	if (ifindex) {
-		netdev = netdev_get_by_index_lock(net, ifindex);
+		netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
 		if (netdev) {
 			err = netdev_nl_queue_dump_one(netdev, skb, info, ctx);
-			netdev_unlock(netdev);
+			netdev_unlock_ops_compat(netdev);
 		} else {
 			err = -ENODEV;
 		}
 	} else {
-		for_each_netdev_lock_scoped(net, netdev, ctx->ifindex) {
+		for_each_netdev_lock_ops_compat_scoped(net, netdev,
+						       ctx->ifindex) {
 			err = netdev_nl_queue_dump_one(netdev, skb, info, ctx);
 			if (err < 0)
 				break;
@@ -559,7 +556,6 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 			ctx->txq_idx = 0;
 		}
 	}
-	rtnl_unlock();
 
 	return err;
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v2 5/8] xdp: double protect netdev->xdp_flags with netdev->lock
  2025-04-08 19:59 [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (3 preceding siblings ...)
  2025-04-08 19:59 ` [PATCH net-next v2 4/8] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
@ 2025-04-08 19:59 ` Jakub Kicinski
  2025-04-08 22:15   ` Harshitha Ramamurthy
  2025-04-09 18:40   ` Martin KaFai Lau
  2025-04-08 19:59 ` [PATCH net-next v2 6/8] netdev: depend on netdev->lock for xdp features Jakub Kicinski
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-08 19:59 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato, Jakub Kicinski, bpf

Protect xdp_features with netdev->lock. This way pure readers
no longer have to take rtnl_lock to access the field.

This includes calling NETDEV_XDP_FEAT_CHANGE under the lock.
Looks like that's fine for bonding, the only "real" listener,
it's the same as ethtool feature change.

In terms of normal drivers - only GVE need special consideration
(other drivers don't use instance lock or don't support XDP).
It calls xdp_set_features_flag() helper from gve_init_priv() which
in turn is called from gve_reset_recovery() (locked), or prior
to netdev registration. So switch to _locked.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: bpf@vger.kernel.org
---
 Documentation/networking/netdevices.rst    |  1 +
 include/linux/netdevice.h                  |  2 +-
 include/net/xdp.h                          |  1 +
 drivers/net/ethernet/google/gve/gve_main.c |  2 +-
 net/core/lock_debug.c                      |  2 +-
 net/core/xdp.c                             | 12 +++++++++++-
 6 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 6c2d8945f597..d6357472d3f1 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -354,6 +354,7 @@ For devices with locked ops, currently only the following notifiers are
 running under the lock:
 * ``NETDEV_REGISTER``
 * ``NETDEV_UP``
+* ``NETDEV_XDP_FEAT_CHANGE``
 
 The following notifiers are running without the lock:
 * ``NETDEV_UNREGISTER``
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7242fb8a22fc..dece2ae396a1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2526,7 +2526,7 @@ struct net_device {
 	 *	@net_shaper_hierarchy, @reg_state, @threaded
 	 *
 	 * Double protects:
-	 *	@up, @moving_ns, @nd_net
+	 *	@up, @moving_ns, @nd_net, @xdp_flags
 	 *
 	 * Double ops protects:
 	 *	@real_num_rx_queues, @real_num_tx_queues
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 48efacbaa35d..20e41b5ff319 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -616,6 +616,7 @@ struct xdp_metadata_ops {
 u32 bpf_xdp_metadata_kfunc_id(int id);
 bool bpf_dev_bound_kfunc_id(u32 btf_id);
 void xdp_set_features_flag(struct net_device *dev, xdp_features_t val);
+void xdp_set_features_flag_locked(struct net_device *dev, xdp_features_t val);
 void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg);
 void xdp_features_clear_redirect_target(struct net_device *dev);
 #else
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index f9a73c956861..7a249baee316 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -2185,7 +2185,7 @@ static void gve_set_netdev_xdp_features(struct gve_priv *priv)
 		xdp_features = 0;
 	}
 
-	xdp_set_features_flag(priv->dev, xdp_features);
+	xdp_set_features_flag_locked(priv->dev, xdp_features);
 }
 
 static int gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
diff --git a/net/core/lock_debug.c b/net/core/lock_debug.c
index b7f22dc92a6f..598c443ef2f3 100644
--- a/net/core/lock_debug.c
+++ b/net/core/lock_debug.c
@@ -20,6 +20,7 @@ int netdev_debug_event(struct notifier_block *nb, unsigned long event,
 	switch (cmd) {
 	case NETDEV_REGISTER:
 	case NETDEV_UP:
+	case NETDEV_XDP_FEAT_CHANGE:
 		netdev_ops_assert_locked(dev);
 		fallthrough;
 	case NETDEV_DOWN:
@@ -58,7 +59,6 @@ int netdev_debug_event(struct notifier_block *nb, unsigned long event,
 	case NETDEV_OFFLOAD_XSTATS_DISABLE:
 	case NETDEV_OFFLOAD_XSTATS_REPORT_USED:
 	case NETDEV_OFFLOAD_XSTATS_REPORT_DELTA:
-	case NETDEV_XDP_FEAT_CHANGE:
 		ASSERT_RTNL();
 		break;
 
diff --git a/net/core/xdp.c b/net/core/xdp.c
index f86eedad586a..3cd0db9c9d2d 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -17,6 +17,7 @@
 #include <net/page_pool/helpers.h>
 
 #include <net/hotdata.h>
+#include <net/netdev_lock.h>
 #include <net/xdp.h>
 #include <net/xdp_priv.h> /* struct xdp_mem_allocator */
 #include <trace/events/xdp.h>
@@ -991,17 +992,26 @@ static int __init xdp_metadata_init(void)
 }
 late_initcall(xdp_metadata_init);
 
-void xdp_set_features_flag(struct net_device *dev, xdp_features_t val)
+void xdp_set_features_flag_locked(struct net_device *dev, xdp_features_t val)
 {
 	val &= NETDEV_XDP_ACT_MASK;
 	if (dev->xdp_features == val)
 		return;
 
+	netdev_assert_locked_or_invisible(dev);
 	dev->xdp_features = val;
 
 	if (dev->reg_state == NETREG_REGISTERED)
 		call_netdevice_notifiers(NETDEV_XDP_FEAT_CHANGE, dev);
 }
+EXPORT_SYMBOL_GPL(xdp_set_features_flag_locked);
+
+void xdp_set_features_flag(struct net_device *dev, xdp_features_t val)
+{
+	netdev_lock(dev);
+	xdp_set_features_flag_locked(dev, val);
+	netdev_unlock(dev);
+}
 EXPORT_SYMBOL_GPL(xdp_set_features_flag);
 
 void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg)
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v2 6/8] netdev: depend on netdev->lock for xdp features
  2025-04-08 19:59 [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (4 preceding siblings ...)
  2025-04-08 19:59 ` [PATCH net-next v2 5/8] xdp: double protect netdev->xdp_flags with netdev->lock Jakub Kicinski
@ 2025-04-08 19:59 ` Jakub Kicinski
  2025-04-10 17:10   ` Kuniyuki Iwashima
  2025-04-08 19:59 ` [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct Jakub Kicinski
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-08 19:59 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato, Jakub Kicinski

Writes to XDP features are now protected by netdev->lock.
Other things we report are based on ops which don't change
once device has been registered. It is safe to stop taking
rtnl_lock, and depend on netdev->lock instead.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/core/netdev-genl.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 7ef9b0191936..8c58261de969 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -38,6 +38,8 @@ netdev_nl_dev_fill(struct net_device *netdev, struct sk_buff *rsp,
 	u64 xdp_rx_meta = 0;
 	void *hdr;
 
+	netdev_assert_locked(netdev); /* note: rtnl_lock may not be held! */
+
 	hdr = genlmsg_iput(rsp, info);
 	if (!hdr)
 		return -EMSGSIZE;
@@ -122,15 +124,14 @@ int netdev_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info)
 	if (!rsp)
 		return -ENOMEM;
 
-	rtnl_lock();
-
-	netdev = __dev_get_by_index(genl_info_net(info), ifindex);
-	if (netdev)
-		err = netdev_nl_dev_fill(netdev, rsp, info);
-	else
+	netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
+	if (!netdev) {
 		err = -ENODEV;
+		goto err_free_msg;
+	}
 
-	rtnl_unlock();
+	err = netdev_nl_dev_fill(netdev, rsp, info);
+	netdev_unlock(netdev);
 
 	if (err)
 		goto err_free_msg;
@@ -146,18 +147,15 @@ int netdev_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct netdev_nl_dump_ctx *ctx = netdev_dump_ctx(cb);
 	struct net *net = sock_net(skb->sk);
-	struct net_device *netdev;
-	int err = 0;
+	int err;
 
-	rtnl_lock();
-	for_each_netdev_dump(net, netdev, ctx->ifindex) {
+	for_each_netdev_lock_scoped(net, netdev, ctx->ifindex) {
 		err = netdev_nl_dev_fill(netdev, skb, genl_info_dump(cb));
 		if (err < 0)
-			break;
+			return err;
 	}
-	rtnl_unlock();
 
-	return err;
+	return 0;
 }
 
 static int
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct
  2025-04-08 19:59 [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (5 preceding siblings ...)
  2025-04-08 19:59 ` [PATCH net-next v2 6/8] netdev: depend on netdev->lock for xdp features Jakub Kicinski
@ 2025-04-08 19:59 ` Jakub Kicinski
  2025-04-09  2:59   ` Joe Damato
  2025-04-10  6:01   ` Jacob Keller
  2025-04-08 19:59 ` [PATCH net-next v2 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers Jakub Kicinski
  2025-04-10  0:40 ` [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops patchwork-bot+netdevbpf
  8 siblings, 2 replies; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-08 19:59 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato, Jakub Kicinski

Explicitly list all the ops structs and what locking they provide.
Use "ops locked" as a term for drivers which have ops called under
the instance lock.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
 - an exception -> exceptions
v1: https://lore.kernel.org/20250407190117.16528-8-kuba@kernel.org
---
 Documentation/networking/netdevices.rst | 54 +++++++++++++++++++------
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index d6357472d3f1..7ae28c5fb835 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -314,13 +314,8 @@ struct napi_struct synchronization rules
 		 softirq
 		 will be called with interrupts disabled by netconsole.
 
-struct netdev_queue_mgmt_ops synchronization rules
-==================================================
-
-All queue management ndo callbacks are holding netdev instance lock.
-
-RTNL and netdev instance lock
-=============================
+netdev instance lock
+====================
 
 Historically, all networking control operations were protected by a single
 global lock known as ``rtnl_lock``. There is an ongoing effort to replace this
@@ -328,10 +323,13 @@ global lock with separate locks for each network namespace. Additionally,
 properties of individual netdev are increasingly protected by per-netdev locks.
 
 For device drivers that implement shaping or queue management APIs, all control
-operations will be performed under the netdev instance lock. Currently, this
-instance lock is acquired within the context of ``rtnl_lock``. The drivers
-can also explicitly request instance lock to be acquired via
-``request_ops_lock``. In the future, there will be an option for individual
+operations will be performed under the netdev instance lock.
+Drivers can also explicitly request instance lock to be held during ops
+by setting ``request_ops_lock`` to true. Code comments and docs refer
+to drivers which have ops called under the instance lock as "ops locked".
+See also the documentation of the ``lock`` member of struct net_device.
+
+In the future, there will be an option for individual
 drivers to opt out of using ``rtnl_lock`` and instead perform their control
 operations directly under the netdev instance lock.
 
@@ -343,8 +341,40 @@ there are two sets of interfaces: ``dev_xxx`` and ``netif_xxx`` (e.g.,
 acquiring the instance lock themselves, while the ``netif_xxx`` functions
 assume that the driver has already acquired the instance lock.
 
+struct net_device_ops
+---------------------
+
+``ndos`` are called without holding the instance lock for most drivers.
+
+"Ops locked" drivers will have most of the ``ndos`` invoked under
+the instance lock.
+
+struct ethtool_ops
+------------------
+
+Similarly to ``ndos`` the instance lock is only held for select drivers.
+For "ops locked" drivers all ethtool ops without exceptions should
+be called under the instance lock.
+
+struct net_shaper_ops
+---------------------
+
+All net shaper callbacks are invoked while holding the netdev instance
+lock. ``rtnl_lock`` may or may not be held.
+
+Note that supporting net shapers automatically enables "ops locking".
+
+struct netdev_queue_mgmt_ops
+----------------------------
+
+All queue management callbacks are invoked while holding the netdev instance
+lock. ``rtnl_lock`` may or may not be held.
+
+Note that supporting struct netdev_queue_mgmt_ops automatically enables
+"ops locking".
+
 Notifiers and netdev instance lock
-==================================
+----------------------------------
 
 For device drivers that implement shaping or queue management APIs,
 some of the notifiers (``enum netdev_cmd``) are running under the netdev
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v2 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers
  2025-04-08 19:59 [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (6 preceding siblings ...)
  2025-04-08 19:59 ` [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct Jakub Kicinski
@ 2025-04-08 19:59 ` Jakub Kicinski
  2025-04-10  5:23   ` Jacob Keller
  2025-04-10  0:40 ` [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops patchwork-bot+netdevbpf
  8 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-08 19:59 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato, Jakub Kicinski

We mostly needed rtnl_lock in qstat to make sure the queue count
is stable while we work. For "ops locked" drivers the instance
lock protects the queue count, so we don't have to take rtnl_lock.

For currently ops-locked drivers: netdevsim and bnxt need
the protection from netdev going down while we dump, which
instance lock provides. gve doesn't care.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/netdevices.rst |  6 +++++
 include/net/netdev_queues.h             |  4 +++-
 net/core/netdev-genl.c                  | 29 +++++++++++++++----------
 3 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 7ae28c5fb835..0ccc7dcf4390 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -356,6 +356,12 @@ Similarly to ``ndos`` the instance lock is only held for select drivers.
 For "ops locked" drivers all ethtool ops without exceptions should
 be called under the instance lock.
 
+struct netdev_stat_ops
+----------------------
+
+"qstat" ops are invoked under the instance lock for "ops locked" drivers,
+and under rtnl_lock for all other drivers.
+
 struct net_shaper_ops
 ---------------------
 
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 825141d675e5..ea709b59d827 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -85,9 +85,11 @@ struct netdev_queue_stats_tx {
  * for some of the events is not maintained, and reliable "total" cannot
  * be provided).
  *
+ * Ops are called under the instance lock if netdev_need_ops_lock()
+ * returns true, otherwise under rtnl_lock.
  * Device drivers can assume that when collecting total device stats,
  * the @get_base_stats and subsequent per-queue calls are performed
- * "atomically" (without releasing the rtnl_lock).
+ * "atomically" (without releasing the relevant lock).
  *
  * Device drivers are encouraged to reset the per-queue statistics when
  * number of queues change. This is because the primary use case for
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 8c58261de969..b64c614a00c4 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -795,26 +795,31 @@ int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
 	if (info->attrs[NETDEV_A_QSTATS_IFINDEX])
 		ifindex = nla_get_u32(info->attrs[NETDEV_A_QSTATS_IFINDEX]);
 
-	rtnl_lock();
 	if (ifindex) {
-		netdev = __dev_get_by_index(net, ifindex);
-		if (netdev && netdev->stat_ops) {
+		netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
+		if (!netdev) {
+			NL_SET_BAD_ATTR(info->extack,
+					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
+			return -ENODEV;
+		}
+		if (netdev->stat_ops) {
 			err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
 							    info, ctx);
 		} else {
 			NL_SET_BAD_ATTR(info->extack,
 					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
-			err = netdev ? -EOPNOTSUPP : -ENODEV;
-		}
-	} else {
-		for_each_netdev_dump(net, netdev, ctx->ifindex) {
-			err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
-							    info, ctx);
-			if (err < 0)
-				break;
+			err = -EOPNOTSUPP;
 		}
+		netdev_unlock_ops_compat(netdev);
+		return err;
+	}
+
+	for_each_netdev_lock_ops_compat_scoped(net, netdev, ctx->ifindex) {
+		err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
+						    info, ctx);
+		if (err < 0)
+			break;
 	}
-	rtnl_unlock();
 
 	return err;
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 5/8] xdp: double protect netdev->xdp_flags with netdev->lock
  2025-04-08 19:59 ` [PATCH net-next v2 5/8] xdp: double protect netdev->xdp_flags with netdev->lock Jakub Kicinski
@ 2025-04-08 22:15   ` Harshitha Ramamurthy
  2025-04-09 18:40   ` Martin KaFai Lau
  1 sibling, 0 replies; 27+ messages in thread
From: Harshitha Ramamurthy @ 2025-04-08 22:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	kuniyu, jdamato, bpf

On Tue, Apr 8, 2025 at 1:00 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Protect xdp_features with netdev->lock. This way pure readers
> no longer have to take rtnl_lock to access the field.
>
> This includes calling NETDEV_XDP_FEAT_CHANGE under the lock.
> Looks like that's fine for bonding, the only "real" listener,
> it's the same as ethtool feature change.
>
> In terms of normal drivers - only GVE need special consideration
> (other drivers don't use instance lock or don't support XDP).
> It calls xdp_set_features_flag() helper from gve_init_priv() which
> in turn is called from gve_reset_recovery() (locked), or prior
> to netdev registration. So switch to _locked.
>
> Reviewed-by: Joe Damato <jdamato@fastly.com>
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>

> ---
> CC: bpf@vger.kernel.org
> ---
>  Documentation/networking/netdevices.rst    |  1 +
>  include/linux/netdevice.h                  |  2 +-
>  include/net/xdp.h                          |  1 +
>  drivers/net/ethernet/google/gve/gve_main.c |  2 +-
>  net/core/lock_debug.c                      |  2 +-
>  net/core/xdp.c                             | 12 +++++++++++-
>  6 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
> index 6c2d8945f597..d6357472d3f1 100644
> --- a/Documentation/networking/netdevices.rst
> +++ b/Documentation/networking/netdevices.rst
> @@ -354,6 +354,7 @@ For devices with locked ops, currently only the following notifiers are
>  running under the lock:
>  * ``NETDEV_REGISTER``
>  * ``NETDEV_UP``
> +* ``NETDEV_XDP_FEAT_CHANGE``
>
>  The following notifiers are running without the lock:
>  * ``NETDEV_UNREGISTER``
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 7242fb8a22fc..dece2ae396a1 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2526,7 +2526,7 @@ struct net_device {
>          *      @net_shaper_hierarchy, @reg_state, @threaded
>          *
>          * Double protects:
> -        *      @up, @moving_ns, @nd_net
> +        *      @up, @moving_ns, @nd_net, @xdp_flags
>          *
>          * Double ops protects:
>          *      @real_num_rx_queues, @real_num_tx_queues
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 48efacbaa35d..20e41b5ff319 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -616,6 +616,7 @@ struct xdp_metadata_ops {
>  u32 bpf_xdp_metadata_kfunc_id(int id);
>  bool bpf_dev_bound_kfunc_id(u32 btf_id);
>  void xdp_set_features_flag(struct net_device *dev, xdp_features_t val);
> +void xdp_set_features_flag_locked(struct net_device *dev, xdp_features_t val);
>  void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg);
>  void xdp_features_clear_redirect_target(struct net_device *dev);
>  #else
> diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> index f9a73c956861..7a249baee316 100644
> --- a/drivers/net/ethernet/google/gve/gve_main.c
> +++ b/drivers/net/ethernet/google/gve/gve_main.c
> @@ -2185,7 +2185,7 @@ static void gve_set_netdev_xdp_features(struct gve_priv *priv)
>                 xdp_features = 0;
>         }
>
> -       xdp_set_features_flag(priv->dev, xdp_features);
> +       xdp_set_features_flag_locked(priv->dev, xdp_features);
>  }
>
>  static int gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
> diff --git a/net/core/lock_debug.c b/net/core/lock_debug.c
> index b7f22dc92a6f..598c443ef2f3 100644
> --- a/net/core/lock_debug.c
> +++ b/net/core/lock_debug.c
> @@ -20,6 +20,7 @@ int netdev_debug_event(struct notifier_block *nb, unsigned long event,
>         switch (cmd) {
>         case NETDEV_REGISTER:
>         case NETDEV_UP:
> +       case NETDEV_XDP_FEAT_CHANGE:
>                 netdev_ops_assert_locked(dev);
>                 fallthrough;
>         case NETDEV_DOWN:
> @@ -58,7 +59,6 @@ int netdev_debug_event(struct notifier_block *nb, unsigned long event,
>         case NETDEV_OFFLOAD_XSTATS_DISABLE:
>         case NETDEV_OFFLOAD_XSTATS_REPORT_USED:
>         case NETDEV_OFFLOAD_XSTATS_REPORT_DELTA:
> -       case NETDEV_XDP_FEAT_CHANGE:
>                 ASSERT_RTNL();
>                 break;
>
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index f86eedad586a..3cd0db9c9d2d 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -17,6 +17,7 @@
>  #include <net/page_pool/helpers.h>
>
>  #include <net/hotdata.h>
> +#include <net/netdev_lock.h>
>  #include <net/xdp.h>
>  #include <net/xdp_priv.h> /* struct xdp_mem_allocator */
>  #include <trace/events/xdp.h>
> @@ -991,17 +992,26 @@ static int __init xdp_metadata_init(void)
>  }
>  late_initcall(xdp_metadata_init);
>
> -void xdp_set_features_flag(struct net_device *dev, xdp_features_t val)
> +void xdp_set_features_flag_locked(struct net_device *dev, xdp_features_t val)
>  {
>         val &= NETDEV_XDP_ACT_MASK;
>         if (dev->xdp_features == val)
>                 return;
>
> +       netdev_assert_locked_or_invisible(dev);
>         dev->xdp_features = val;
>
>         if (dev->reg_state == NETREG_REGISTERED)
>                 call_netdevice_notifiers(NETDEV_XDP_FEAT_CHANGE, dev);
>  }
> +EXPORT_SYMBOL_GPL(xdp_set_features_flag_locked);
> +
> +void xdp_set_features_flag(struct net_device *dev, xdp_features_t val)
> +{
> +       netdev_lock(dev);
> +       xdp_set_features_flag_locked(dev, val);
> +       netdev_unlock(dev);
> +}
>  EXPORT_SYMBOL_GPL(xdp_set_features_flag);
>
>  void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg)
> --
> 2.49.0
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct
  2025-04-08 19:59 ` [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct Jakub Kicinski
@ 2025-04-09  2:59   ` Joe Damato
  2025-04-10  6:01   ` Jacob Keller
  1 sibling, 0 replies; 27+ messages in thread
From: Joe Damato @ 2025-04-09  2:59 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu

On Tue, Apr 08, 2025 at 12:59:54PM -0700, Jakub Kicinski wrote:
> Explicitly list all the ops structs and what locking they provide.
> Use "ops locked" as a term for drivers which have ops called under
> the instance lock.
> 
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v2:
>  - an exception -> exceptions
> v1: https://lore.kernel.org/20250407190117.16528-8-kuba@kernel.org
> ---
>  Documentation/networking/netdevices.rst | 54 +++++++++++++++++++------
>  1 file changed, 42 insertions(+), 12 deletions(-)

Reviewed-by: Joe Damato <jdamato@fastly.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 5/8] xdp: double protect netdev->xdp_flags with netdev->lock
  2025-04-08 19:59 ` [PATCH net-next v2 5/8] xdp: double protect netdev->xdp_flags with netdev->lock Jakub Kicinski
  2025-04-08 22:15   ` Harshitha Ramamurthy
@ 2025-04-09 18:40   ` Martin KaFai Lau
  1 sibling, 0 replies; 27+ messages in thread
From: Martin KaFai Lau @ 2025-04-09 18:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu, jdamato, bpf

On Tue, Apr 8, 2025 at 1:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Protect xdp_features with netdev->lock. This way pure readers
> no longer have to take rtnl_lock to access the field.
>
> This includes calling NETDEV_XDP_FEAT_CHANGE under the lock.
> Looks like that's fine for bonding, the only "real" listener,
> it's the same as ethtool feature change.
>
> In terms of normal drivers - only GVE need special consideration
> (other drivers don't use instance lock or don't support XDP).
> It calls xdp_set_features_flag() helper from gve_init_priv() which
> in turn is called from gve_reset_recovery() (locked), or prior
> to netdev registration. So switch to _locked.

Acked-by: Martin KaFai Lau <martin.lau@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops
  2025-04-08 19:59 [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (7 preceding siblings ...)
  2025-04-08 19:59 ` [PATCH net-next v2 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers Jakub Kicinski
@ 2025-04-10  0:40 ` patchwork-bot+netdevbpf
  8 siblings, 0 replies; 27+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-04-10  0:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu, jdamato

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  8 Apr 2025 12:59:47 -0700 you wrote:
> netdev-genl used to be protected by rtnl_lock. In previous release
> we already switched the queue management ops (for Rx zero-copy) to
> the instance lock. This series converts other ops to depend on the
> instance lock when possible.
> 
> Unfortunately queue related state is hard to lock (unlike NAPI)
> as the process of switching the number of queues usually involves
> a large reconfiguration of the driver. The reconfig process has
> historically been under rtnl_lock, but for drivers which opt into
> ops locking it is also under the instance lock. Leverage that
> and conditionally take rtnl_lock or instance lock depending
> on the device capabilities.
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch
    (no matching commit)
  - [net-next,v2,2/8] net: designate XSK pool pointers in queues as "ops protected"
    https://git.kernel.org/netdev/net-next/c/606048cbd834
  - [net-next,v2,3/8] netdev: add "ops compat locking" helpers
    (no matching commit)
  - [net-next,v2,4/8] netdev: don't hold rtnl_lock over nl queue info get when possible
    https://git.kernel.org/netdev/net-next/c/d02e3b388221
  - [net-next,v2,5/8] xdp: double protect netdev->xdp_flags with netdev->lock
    (no matching commit)
  - [net-next,v2,6/8] netdev: depend on netdev->lock for xdp features
    https://git.kernel.org/netdev/net-next/c/99e44f39a8f7
  - [net-next,v2,7/8] docs: netdev: break down the instance locking info per ops struct
    https://git.kernel.org/netdev/net-next/c/87eba404f2e1
  - [net-next,v2,8/8] netdev: depend on netdev->lock for qstats in ops locked drivers
    https://git.kernel.org/netdev/net-next/c/ce7b14947484

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers
  2025-04-08 19:59 ` [PATCH net-next v2 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers Jakub Kicinski
@ 2025-04-10  5:23   ` Jacob Keller
  2025-04-10 23:46     ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: Jacob Keller @ 2025-04-10  5:23 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato



On 4/8/2025 12:59 PM, Jakub Kicinski wrote:
> We mostly needed rtnl_lock in qstat to make sure the queue count
> is stable while we work. For "ops locked" drivers the instance
> lock protects the queue count, so we don't have to take rtnl_lock.
> 
> For currently ops-locked drivers: netdevsim and bnxt need
> the protection from netdev going down while we dump, which
> instance lock provides. gve doesn't care.
> 
> Reviewed-by: Joe Damato <jdamato@fastly.com>
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  Documentation/networking/netdevices.rst |  6 +++++
>  include/net/netdev_queues.h             |  4 +++-
>  net/core/netdev-genl.c                  | 29 +++++++++++++++----------
>  3 files changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
> index 7ae28c5fb835..0ccc7dcf4390 100644
> --- a/Documentation/networking/netdevices.rst
> +++ b/Documentation/networking/netdevices.rst
> @@ -356,6 +356,12 @@ Similarly to ``ndos`` the instance lock is only held for select drivers.
>  For "ops locked" drivers all ethtool ops without exceptions should
>  be called under the instance lock.
>  
> +struct netdev_stat_ops
> +----------------------
> +
> +"qstat" ops are invoked under the instance lock for "ops locked" drivers,
> +and under rtnl_lock for all other drivers.
> +
>  struct net_shaper_ops
>  ---------------------
>  

What determines if a driver is "ops locked"? Is that defined above this
chunk in the doc? I see its when netdev_need_ops_lock() is set? Ok.
Sounds like it would be good to start migrating drivers over to this
locking paradigm over time.

> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> index 825141d675e5..ea709b59d827 100644
> --- a/include/net/netdev_queues.h
> +++ b/include/net/netdev_queues.h
> @@ -85,9 +85,11 @@ struct netdev_queue_stats_tx {
>   * for some of the events is not maintained, and reliable "total" cannot
>   * be provided).
>   *
> + * Ops are called under the instance lock if netdev_need_ops_lock()
> + * returns true, otherwise under rtnl_lock.
>   * Device drivers can assume that when collecting total device stats,
>   * the @get_base_stats and subsequent per-queue calls are performed
> - * "atomically" (without releasing the rtnl_lock).
> + * "atomically" (without releasing the relevant lock).
>   *
>   * Device drivers are encouraged to reset the per-queue statistics when
>   * number of queues change. This is because the primary use case for
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index 8c58261de969..b64c614a00c4 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -795,26 +795,31 @@ int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
>  	if (info->attrs[NETDEV_A_QSTATS_IFINDEX])
>  		ifindex = nla_get_u32(info->attrs[NETDEV_A_QSTATS_IFINDEX]);
>  
> -	rtnl_lock();

We used to lock here..

>  	if (ifindex) {
> -		netdev = __dev_get_by_index(net, ifindex);
> -		if (netdev && netdev->stat_ops) {
> +		netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
> +		if (!netdev) {
> +			NL_SET_BAD_ATTR(info->extack,
> +					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
> +			return -ENODEV;
> +		}

I guess netdev_get_by_index_lock_ops_compat acquires the lock when it
returns success?

> +		if (netdev->stat_ops) {
>  			err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
>  							    info, ctx);
>  		} else {
>  			NL_SET_BAD_ATTR(info->extack,
>  					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
> -			err = netdev ? -EOPNOTSUPP : -ENODEV;
> -		}
> -	} else {

But there's an else branch here so now I'm confused with how this
locking works.

> -		for_each_netdev_dump(net, netdev, ctx->ifindex) {
> -			err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
> -							    info, ctx);
> -			if (err < 0)
> -				break;
> +			err = -EOPNOTSUPP;
>  		}
> +		netdev_unlock_ops_compat(netdev);

And we call netdev_unlock_ops_compat() here... but I don't see how this
branch acquired the lock?

> +		return err;
> +	}
> +
> +	for_each_netdev_lock_ops_compat_scoped(net, netdev, ctx->ifindex) {
> +		err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
> +						    info, ctx);
> +		if (err < 0)
> +			break;

This looks like its scope guarded so its fine.

>  	}
> -	rtnl_unlock();
>  

What am I missing?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct
  2025-04-08 19:59 ` [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct Jakub Kicinski
  2025-04-09  2:59   ` Joe Damato
@ 2025-04-10  6:01   ` Jacob Keller
  2025-04-10 16:08     ` Jakub Kicinski
  1 sibling, 1 reply; 27+ messages in thread
From: Jacob Keller @ 2025-04-10  6:01 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato



On 4/8/2025 12:59 PM, Jakub Kicinski wrote:
> Explicitly list all the ops structs and what locking they provide.
> Use "ops locked" as a term for drivers which have ops called under
> the instance lock.
> 
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v2:
>  - an exception -> exceptions
> v1: https://lore.kernel.org/20250407190117.16528-8-kuba@kernel.org
> ---
>  Documentation/networking/netdevices.rst | 54 +++++++++++++++++++------
>  1 file changed, 42 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
> index d6357472d3f1..7ae28c5fb835 100644
> --- a/Documentation/networking/netdevices.rst
> +++ b/Documentation/networking/netdevices.rst
> @@ -314,13 +314,8 @@ struct napi_struct synchronization rules
>  		 softirq
>  		 will be called with interrupts disabled by netconsole.
>  
> -struct netdev_queue_mgmt_ops synchronization rules
> -==================================================
> -
> -All queue management ndo callbacks are holding netdev instance lock.
> -
> -RTNL and netdev instance lock
> -=============================
> +netdev instance lock
> +====================
>  
>  Historically, all networking control operations were protected by a single
>  global lock known as ``rtnl_lock``. There is an ongoing effort to replace this
> @@ -328,10 +323,13 @@ global lock with separate locks for each network namespace. Additionally,
>  properties of individual netdev are increasingly protected by per-netdev locks.
>  
>  For device drivers that implement shaping or queue management APIs, all control
> -operations will be performed under the netdev instance lock. Currently, this
> -instance lock is acquired within the context of ``rtnl_lock``. The drivers
> -can also explicitly request instance lock to be acquired via
> -``request_ops_lock``. In the future, there will be an option for individual
> +operations will be performed under the netdev instance lock.
> +Drivers can also explicitly request instance lock to be held during ops
> +by setting ``request_ops_lock`` to true. Code comments and docs refer
> +to drivers which have ops called under the instance lock as "ops locked".
> +See also the documentation of the ``lock`` member of struct net_device.
> +
> +In the future, there will be an option for individual
>  drivers to opt out of using ``rtnl_lock`` and instead perform their control
>  operations directly under the netdev instance lock.
>  
> @@ -343,8 +341,40 @@ there are two sets of interfaces: ``dev_xxx`` and ``netif_xxx`` (e.g.,
>  acquiring the instance lock themselves, while the ``netif_xxx`` functions
>  assume that the driver has already acquired the instance lock.
>  
> +struct net_device_ops
> +---------------------
> +
> +``ndos`` are called without holding the instance lock for most drivers.
> +
> +"Ops locked" drivers will have most of the ``ndos`` invoked under
> +the instance lock.
> +
> +struct ethtool_ops
> +------------------
> +
> +Similarly to ``ndos`` the instance lock is only held for select drivers.
> +For "ops locked" drivers all ethtool ops without exceptions should
> +be called under the instance lock.
> +
> +struct net_shaper_ops
> +---------------------
> +
> +All net shaper callbacks are invoked while holding the netdev instance
> +lock. ``rtnl_lock`` may or may not be held.
> +
> +Note that supporting net shapers automatically enables "ops locking".
> +
> +struct netdev_queue_mgmt_ops
> +----------------------------
> +
> +All queue management callbacks are invoked while holding the netdev instance
> +lock. ``rtnl_lock`` may or may not be held.
> +
> +Note that supporting struct netdev_queue_mgmt_ops automatically enables
> +"ops locking".
> +

Does this mean we don't allow drivers which support
netdev_queue_mgmt_ops but don't set request_ops_lock? Or does it mean
that supporting netdev_queue_mgmt_ops and/or netdev shapers
automatically implies request_ops_lock? Or is there some other
behavioral difference?

From the wording this sounds like its enforced via code, and it seems
reasonable to me that we wouldn't allow these without setting
request_ops_lock to true...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct
  2025-04-10  6:01   ` Jacob Keller
@ 2025-04-10 16:08     ` Jakub Kicinski
  2025-04-10 22:35       ` Keller, Jacob E
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-10 16:08 UTC (permalink / raw)
  To: Jacob Keller
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu, jdamato

On Wed, 9 Apr 2025 23:01:18 -0700 Jacob Keller wrote:
> > +All queue management callbacks are invoked while holding the netdev instance
> > +lock. ``rtnl_lock`` may or may not be held.
> > +
> > +Note that supporting struct netdev_queue_mgmt_ops automatically enables
> > +"ops locking".
> > +  
> 
> Does this mean we don't allow drivers which support
> netdev_queue_mgmt_ops but don't set request_ops_lock? Or does it mean
> that supporting netdev_queue_mgmt_ops and/or netdev shapers
> automatically implies request_ops_lock? Or is there some other
> behavioral difference?
> 
> From the wording this sounds like its enforced via code, and it seems
> reasonable to me that we wouldn't allow these without setting
> request_ops_lock to true...

"request" is for drivers to optionally request.
If the driver supports queue or shaper APIs it doesn't have a say.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 6/8] netdev: depend on netdev->lock for xdp features
  2025-04-08 19:59 ` [PATCH net-next v2 6/8] netdev: depend on netdev->lock for xdp features Jakub Kicinski
@ 2025-04-10 17:10   ` Kuniyuki Iwashima
  2025-04-11  2:10     ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-10 17:10 UTC (permalink / raw)
  To: kuba
  Cc: andrew+netdev, davem, edumazet, horms, hramamurthy, jdamato,
	kuniyu, netdev, pabeni, sdf

From: Jakub Kicinski <kuba@kernel.org>
Date: Tue,  8 Apr 2025 12:59:53 -0700
> Writes to XDP features are now protected by netdev->lock.
> Other things we report are based on ops which don't change
> once device has been registered. It is safe to stop taking
> rtnl_lock, and depend on netdev->lock instead.
> 
> Reviewed-by: Joe Damato <jdamato@fastly.com>
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  net/core/netdev-genl.c | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)
> 
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index 7ef9b0191936..8c58261de969 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -38,6 +38,8 @@ netdev_nl_dev_fill(struct net_device *netdev, struct sk_buff *rsp,
>  	u64 xdp_rx_meta = 0;
>  	void *hdr;
>  
> +	netdev_assert_locked(netdev); /* note: rtnl_lock may not be held! */

syzkaller reported splats in register_netdevice() and
unregister_netdevice_many_notify().

In register_netdevice(), some devices cannot use
netdev_assert_locked().

In unregister_netdevice_many_notify(), maybe we need to
hold ops lock in UNREGISTER as you initially suggested.
Now do_setlink() deadlock does not happen.


register_netdevice:

WARNING: CPU: 1 PID: 4014 at ./include/net/netdev_lock.h:17 netdev_assert_locked include/net/netdev_lock.h:17 [inline]
WARNING: CPU: 1 PID: 4014 at ./include/net/netdev_lock.h:17 netdev_nl_dev_fill+0x4e8/0x628 net/core/netdev-genl.c:41
Modules linked in:
CPU: 1 UID: 0 PID: 4014 Comm: syz.2.1132 Not tainted 6.14.0-13344-ga9843689e2de #27 PREEMPT  796b8818af72d9451c8904a331bb7c0385e61b12
Hardware name: linux,dummy-virt (DT)
pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : netdev_assert_locked include/net/netdev_lock.h:17 [inline]
pc : netdev_nl_dev_fill+0x4e8/0x628 net/core/netdev-genl.c:41
lr : netdev_assert_locked include/net/netdev_lock.h:17 [inline]
lr : netdev_nl_dev_fill+0x4e8/0x628 net/core/netdev-genl.c:41
sp : ffff80008d1274a0
x29: ffff80008d1274b0 x28: 1ffff00010bed1b5 x27: 1fffe000038dd021
x26: ffff800085f68000 x25: dfff800000000000 x24: dfff800000000000
x23: ffff80008d127510 x22: 0000000000000000 x21: ffff80008d127510
x20: ffff00001c6e8000 x19: ffff00001b9fc640 x18: 0000000000000000
x17: 4d45545359534255 x16: 5300302d78742f73 x15: 0000000000000001
x14: 1fffe00003abf9dc x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000080000 x10: 0000000000067d91 x9 : ffff80008ecb7000
x8 : 0000000000067d92 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000040 x4 : 0000000000000000 x3 : ffff8000832814c4
x2 : ffff80008d127510 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 netdev_assert_locked include/net/netdev_lock.h:17 [inline] (P)
 netdev_nl_dev_fill+0x4e8/0x628 net/core/netdev-genl.c:41 (P)
 netdev_genl_dev_notify+0x168/0x370 net/core/netdev-genl.c:102
 netdev_genl_netdevice_event+0x98/0xc0 net/core/netdev-genl.c:-1
 notifier_call_chain+0x1ac/0x4c8 kernel/notifier.c:85
 raw_notifier_call_chain+0x48/0x68 kernel/notifier.c:453
 call_netdevice_notifiers_info net/core/dev.c:2235 [inline]
 call_netdevice_notifiers_extack net/core/dev.c:2273 [inline]
 call_netdevice_notifiers+0xd8/0x150 net/core/dev.c:2287
 register_netdevice+0x1170/0x1620 net/core/dev.c:11109
 register_netdev+0x84/0xb0 net/core/dev.c:11188
 loopback_net_init+0x70/0x168 drivers/net/loopback.c:218
 ops_init+0x31c/0x558 net/core/net_namespace.c:138
 setup_net+0x21c/0x7f0 net/core/net_namespace.c:364
 copy_net_ns+0x2b0/0x4b8 net/core/net_namespace.c:518
 create_new_namespaces+0x324/0x608 kernel/nsproxy.c:110
 copy_namespaces+0x3d4/0x448 kernel/nsproxy.c:179
 copy_process+0x1350/0x3230 kernel/fork.c:2432
 kernel_clone+0x18c/0x6e8 kernel/fork.c:2844
 __do_sys_clone kernel/fork.c:2987 [inline]
 __se_sys_clone kernel/fork.c:2955 [inline]
 __arm64_sys_clone+0x100/0x140 kernel/fork.c:2955
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151
 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744
 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762
 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600
irq event stamp: 3880
hardirqs last  enabled at (3879): [<ffff800084532260>] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline]
hardirqs last  enabled at (3879): [<ffff800084532260>] _raw_spin_unlock_irqrestore+0x48/0xc0 kernel/locking/spinlock.c:194
hardirqs last disabled at (3880): [<ffff80008450fa94>] el1_dbg+0x24/0x50 arch/arm64/kernel/entry-common.c:488
softirqs last  enabled at (2984): [<ffff800083268488>] spin_unlock_bh include/linux/spinlock.h:396 [inline]
softirqs last  enabled at (2984): [<ffff800083268488>] release_sock+0x158/0x1c0 net/core/sock.c:3721
softirqs last disabled at (2982): [<ffff800083268370>] spin_lock_bh include/linux/spinlock.h:356 [inline]
softirqs last disabled at (2982): [<ffff800083268370>] release_sock+0x40/0x1c0 net/core/sock.c:3710


unregister_netdevice_many_notify:

WARNING: CPU: 1 PID: 230 at ./include/net/netdev_lock.h:17 netdev_assert_locked include/net/netdev_lock.h:17 [inline]
WARNING: CPU: 1 PID: 230 at ./include/net/netdev_lock.h:17 netdev_nl_dev_fill+0x4e8/0x628 net/core/netdev-genl.c:41
Modules linked in:
CPU: 1 UID: 0 PID: 230 Comm: kworker/u8:4 Not tainted 6.14.0-13344-ga9843689e2de #27 PREEMPT  796b8818af72d9451c8904a331bb7c0385e61b12
Hardware name: linux,dummy-virt (DT)
Workqueue: netns cleanup_net
pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : netdev_assert_locked include/net/netdev_lock.h:17 [inline]
pc : netdev_nl_dev_fill+0x4e8/0x628 net/core/netdev-genl.c:41
lr : netdev_assert_locked include/net/netdev_lock.h:17 [inline]
lr : netdev_nl_dev_fill+0x4e8/0x628 net/core/netdev-genl.c:41
sp : ffff80008d027790
x29: ffff80008d0277a0 x28: 1ffff00010bed1b5 x27: 1fffe00018db0021
x26: ffff800085f68000 x25: dfff800000000000 x24: dfff800000000000
x23: ffff80008d027800 x22: 0000000000000000 x21: ffff80008d027800
x20: ffff0000c6d80000 x19: ffff0000c91f3c80 x18: 0000000000000000
x17: ffff800080297454 x16: ffff8000833ddf10 x15: 0000000000000001
x14: 1fffe00000f651dc x13: 0000000000000000 x12: 0000000000000000
x11: ffff600000f651dd x10: 0000000000ff0100 x9 : 0000000000000000
x8 : ffff0000c7009d00 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000040 x4 : 0000000000000000 x3 : ffff8000832814c4
x2 : ffff80008d027800 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 netdev_assert_locked include/net/netdev_lock.h:17 [inline] (P)
 netdev_nl_dev_fill+0x4e8/0x628 net/core/netdev-genl.c:41 (P)
 netdev_genl_dev_notify+0x168/0x370 net/core/netdev-genl.c:102
 netdev_genl_netdevice_event+0x98/0xc0 net/core/netdev-genl.c:-1
 notifier_call_chain+0x1ac/0x4c8 kernel/notifier.c:85
 raw_notifier_call_chain+0x48/0x68 kernel/notifier.c:453
 call_netdevice_notifiers_info net/core/dev.c:2235 [inline]
 call_netdevice_notifiers_extack net/core/dev.c:2273 [inline]
 call_netdevice_notifiers net/core/dev.c:2287 [inline]
 unregister_netdevice_many_notify+0xf40/0x1c98 net/core/dev.c:12035
 unregister_netdevice_many+0x34/0x50 net/core/dev.c:12099
 cleanup_net+0x540/0x968 net/core/net_namespace.c:649
 process_one_work+0x744/0x13e0 kernel/workqueue.c:3238
 process_scheduled_works kernel/workqueue.c:3319 [inline]
 worker_thread+0x928/0xe98 kernel/workqueue.c:3400
 kthread+0x4dc/0x638 kernel/kthread.c:464
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:862
irq event stamp: 526602
hardirqs last  enabled at (526601): [<ffff800084532260>] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline]
hardirqs last  enabled at (526601): [<ffff800084532260>] _raw_spin_unlock_irqrestore+0x48/0xc0 kernel/locking/spinlock.c:194
hardirqs last disabled at (526602): [<ffff80008450fa94>] el1_dbg+0x24/0x50 arch/arm64/kernel/entry-common.c:488
softirqs last  enabled at (526394): [<ffff80008332274c>] spin_unlock_bh include/linux/spinlock.h:396 [inline]
softirqs last  enabled at (526394): [<ffff80008332274c>] netif_addr_unlock_bh include/linux/netdevice.h:4809 [inline]
softirqs last  enabled at (526394): [<ffff80008332274c>] dev_mc_flush+0x1bc/0x208 net/core/dev_addr_lists.c:1037
softirqs last disabled at (526392): [<ffff800083322ca8>] local_bh_disable+0x10/0x40 include/linux/bottom_half.h:19

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct
  2025-04-10 16:08     ` Jakub Kicinski
@ 2025-04-10 22:35       ` Keller, Jacob E
  2025-04-10 23:39         ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: Keller, Jacob E @ 2025-04-10 22:35 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Dumazet, Eric,
	pabeni@redhat.com, andrew+netdev@lunn.ch, horms@kernel.org,
	sdf@fomichev.me, hramamurthy@google.com, kuniyu@amazon.com,
	Damato, Joe



> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, April 10, 2025 9:08 AM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; Dumazet, Eric
> <edumazet@google.com>; pabeni@redhat.com; andrew+netdev@lunn.ch;
> horms@kernel.org; sdf@fomichev.me; hramamurthy@google.com;
> kuniyu@amazon.com; Damato, Joe <jdamato@fastly.com>
> Subject: Re: [PATCH net-next v2 7/8] docs: netdev: break down the instance
> locking info per ops struct
> 
> On Wed, 9 Apr 2025 23:01:18 -0700 Jacob Keller wrote:
> > > +All queue management callbacks are invoked while holding the netdev
> instance
> > > +lock. ``rtnl_lock`` may or may not be held.
> > > +
> > > +Note that supporting struct netdev_queue_mgmt_ops automatically enables
> > > +"ops locking".
> > > +
> >
> > Does this mean we don't allow drivers which support
> > netdev_queue_mgmt_ops but don't set request_ops_lock? Or does it mean
> > that supporting netdev_queue_mgmt_ops and/or netdev shapers
> > automatically implies request_ops_lock? Or is there some other
> > behavioral difference?
> >
> > From the wording this sounds like its enforced via code, and it seems
> > reasonable to me that we wouldn't allow these without setting
> > request_ops_lock to true...
> 
> "request" is for drivers to optionally request.
> If the driver supports queue or shaper APIs it doesn't have a say.

Which is to say: if you support either of the new APIs, or you automatically get ops locking regardless of what request_ops_lock is, so that if you do support one of those interfaces, there is no behavioral difference between setting or not setting request_ops_lock.

Ok, I think that's reasonable.

Regards,
Jake

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct
  2025-04-10 22:35       ` Keller, Jacob E
@ 2025-04-10 23:39         ` Jakub Kicinski
  2025-04-11 21:26           ` Jacob Keller
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-10 23:39 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Dumazet, Eric,
	pabeni@redhat.com, andrew+netdev@lunn.ch, horms@kernel.org,
	sdf@fomichev.me, hramamurthy@google.com, kuniyu@amazon.com,
	Damato, Joe

On Thu, 10 Apr 2025 22:35:43 +0000 Keller, Jacob E wrote:
> > > Does this mean we don't allow drivers which support
> > > netdev_queue_mgmt_ops but don't set request_ops_lock? Or does it mean
> > > that supporting netdev_queue_mgmt_ops and/or netdev shapers
> > > automatically implies request_ops_lock? Or is there some other
> > > behavioral difference?
> > >
> > > From the wording this sounds like its enforced via code, and it seems
> > > reasonable to me that we wouldn't allow these without setting
> > > request_ops_lock to true...  
> > 
> > "request" is for drivers to optionally request.
> > If the driver supports queue or shaper APIs it doesn't have a say.  
> 
> Which is to say: if you support either of the new APIs, or you
> automatically get ops locking regardless of what request_ops_lock is,
> so that if you do support one of those interfaces, there is no
> behavioral difference between setting or not setting request_ops_lock.
> 
> Ok, I think that's reasonable.

Right, and FWIW we may one day actually make the request_ops_lock 
bit be _the_ decider and auto-set it based on op presence when netdev
is registered. Purely to simplify the condition in netdev_need_ops_lock().
For now it isn't that. I was worried if I go into too much detail here
we'll then forget to update it and stale docs are worse than no docs :(

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers
  2025-04-10  5:23   ` Jacob Keller
@ 2025-04-10 23:46     ` Jakub Kicinski
  2025-04-11 21:29       ` Jacob Keller
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-10 23:46 UTC (permalink / raw)
  To: Jacob Keller
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu, jdamato

On Wed, 9 Apr 2025 22:23:28 -0700 Jacob Keller wrote:
> > +struct netdev_stat_ops
> > +----------------------
> > +
> > +"qstat" ops are invoked under the instance lock for "ops locked" drivers,
> > +and under rtnl_lock for all other drivers.
> > +
> >  struct net_shaper_ops
> >  ---------------------
> >    
> 
> What determines if a driver is "ops locked"? Is that defined above this
> chunk in the doc? I see its when netdev_need_ops_lock() is set? Ok.

Yup, it was hiding in the previous patch:

   Code comments and docs refer to drivers which have ops called under
   the instance lock as "ops locked".

> Sounds like it would be good to start migrating drivers over to this
> locking paradigm over time.

At least for the drivers which implement queue stats its nice to be able 
to dump stats without taking the global lock. 

> >  	if (ifindex) {
> > -		netdev = __dev_get_by_index(net, ifindex);
> > -		if (netdev && netdev->stat_ops) {
> > +		netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
> > +		if (!netdev) {
> > +			NL_SET_BAD_ATTR(info->extack,
> > +					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
> > +			return -ENODEV;
> > +		}  
> 
> I guess netdev_get_by_index_lock_ops_compat acquires the lock when it
> returns success?

Yes.

> > +		if (netdev->stat_ops) {
> >  			err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
> >  							    info, ctx);
> >  		} else {
> >  			NL_SET_BAD_ATTR(info->extack,
> >  					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
> > -			err = netdev ? -EOPNOTSUPP : -ENODEV;
> > -		}
> > -	} else {  
> 
> But there's an else branch here so now I'm confused with how this
> locking works.

The diff is really hard to read, sorry, I should have done two patches.
The else branch is _removed_. The code is now:

	if (ifindex) {
		netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
		...
		netdev_unlock_ops_compat(netdev);  
		return ;
	}

	for_each_lock_scoped() {
	}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 6/8] netdev: depend on netdev->lock for xdp features
  2025-04-10 17:10   ` Kuniyuki Iwashima
@ 2025-04-11  2:10     ` Jakub Kicinski
  2025-04-11  2:23       ` Jakub Kicinski
  2025-04-11  2:36       ` Kuniyuki Iwashima
  0 siblings, 2 replies; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-11  2:10 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: andrew+netdev, davem, edumazet, horms, hramamurthy, jdamato,
	netdev, pabeni, sdf

On Thu, 10 Apr 2025 10:10:01 -0700 Kuniyuki Iwashima wrote:
> syzkaller reported splats in register_netdevice() and
> unregister_netdevice_many_notify().
> 
> In register_netdevice(), some devices cannot use
> netdev_assert_locked().
> 
> In unregister_netdevice_many_notify(), maybe we need to
> hold ops lock in UNREGISTER as you initially suggested.
> Now do_setlink() deadlock does not happen.

Ah...  Thank you.

Do you have a reference to use as Reported-by, or its from a
non-public instance ?

I'll test this shortly:

diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index b64c614a00c4..891e2f60922f 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -38,7 +38,8 @@ netdev_nl_dev_fill(struct net_device *netdev, struct sk_buff *rsp,
        u64 xdp_rx_meta = 0;
        void *hdr;
 
-       netdev_assert_locked(netdev); /* note: rtnl_lock may not be held! */
+       /* note: rtnl_lock may or may not be held! */
+       netdev_assert_locked_or_invisible(netdev);
 
        hdr = genlmsg_iput(rsp, info);
        if (!hdr)
@@ -966,7 +967,9 @@ static int netdev_genl_netdevice_event(struct notifier_block *nb,
                netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_ADD_NTF);
                break;
        case NETDEV_UNREGISTER:
+               netdev_lock(netdev);
                netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_DEL_NTF);
+               netdev_unlock(netdev);
                break;
        case NETDEV_XDP_FEAT_CHANGE:
                netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_CHANGE_NTF);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 6/8] netdev: depend on netdev->lock for xdp features
  2025-04-11  2:10     ` Jakub Kicinski
@ 2025-04-11  2:23       ` Jakub Kicinski
  2025-04-11 17:41         ` Stanislav Fomichev
  2025-04-11  2:36       ` Kuniyuki Iwashima
  1 sibling, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-11  2:23 UTC (permalink / raw)
  To: sdf
  Cc: Kuniyuki Iwashima, andrew+netdev, davem, edumazet, horms,
	hramamurthy, jdamato, netdev, pabeni

On Thu, 10 Apr 2025 19:10:28 -0700 Jakub Kicinski wrote:
> On Thu, 10 Apr 2025 10:10:01 -0700 Kuniyuki Iwashima wrote:
> > syzkaller reported splats in register_netdevice() and
> > unregister_netdevice_many_notify().
> > 
> > In register_netdevice(), some devices cannot use
> > netdev_assert_locked().
> > 
> > In unregister_netdevice_many_notify(), maybe we need to
> > hold ops lock in UNREGISTER as you initially suggested.
> > Now do_setlink() deadlock does not happen.  
> 
> Ah...  Thank you.
> 
> Do you have a reference to use as Reported-by, or its from a
> non-public instance ?
> 
> I'll test this shortly:
> 
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index b64c614a00c4..891e2f60922f 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -38,7 +38,8 @@ netdev_nl_dev_fill(struct net_device *netdev, struct sk_buff *rsp,
>         u64 xdp_rx_meta = 0;
>         void *hdr;
>  
> -       netdev_assert_locked(netdev); /* note: rtnl_lock may not be held! */
> +       /* note: rtnl_lock may or may not be held! */
> +       netdev_assert_locked_or_invisible(netdev);
>  
>         hdr = genlmsg_iput(rsp, info);
>         if (!hdr)
> @@ -966,7 +967,9 @@ static int netdev_genl_netdevice_event(struct notifier_block *nb,
>                 netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_ADD_NTF);
>                 break;
>         case NETDEV_UNREGISTER:
> +               netdev_lock(netdev);
>                 netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_DEL_NTF);
> +               netdev_unlock(netdev);
>                 break;
>         case NETDEV_XDP_FEAT_CHANGE:
>                 netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_CHANGE_NTF);

Ugh, REGISTER is ops locked we'd need conditional locking here.

Stanislav, I can make the REGISTERED notifier fully locked, right?
I suspect any new object we add that's protected by the instance
lock will want to lock the dev.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 6/8] netdev: depend on netdev->lock for xdp features
  2025-04-11  2:10     ` Jakub Kicinski
  2025-04-11  2:23       ` Jakub Kicinski
@ 2025-04-11  2:36       ` Kuniyuki Iwashima
  1 sibling, 0 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-11  2:36 UTC (permalink / raw)
  To: kuba
  Cc: andrew+netdev, davem, edumazet, horms, hramamurthy, jdamato,
	kuniyu, netdev, pabeni, sdf

From: Jakub Kicinski <kuba@kernel.org>
Date: Thu, 10 Apr 2025 19:10:28 -0700
> On Thu, 10 Apr 2025 10:10:01 -0700 Kuniyuki Iwashima wrote:
> > syzkaller reported splats in register_netdevice() and
> > unregister_netdevice_many_notify().
> > 
> > In register_netdevice(), some devices cannot use
> > netdev_assert_locked().
> > 
> > In unregister_netdevice_many_notify(), maybe we need to
> > hold ops lock in UNREGISTER as you initially suggested.
> > Now do_setlink() deadlock does not happen.
> 
> Ah...  Thank you.
> 
> Do you have a reference to use as Reported-by, or its from a
> non-public instance ?

It's from private instance on my EC2, so

  Reported-by: syzkaller <syzkaller@googlegroups.com>

would be fine unless Eric can release a public report from syzbot
if exists.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 6/8] netdev: depend on netdev->lock for xdp features
  2025-04-11  2:23       ` Jakub Kicinski
@ 2025-04-11 17:41         ` Stanislav Fomichev
  2025-04-11 19:19           ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: Stanislav Fomichev @ 2025-04-11 17:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: sdf, Kuniyuki Iwashima, andrew+netdev, davem, edumazet, horms,
	hramamurthy, jdamato, netdev, pabeni

On 04/10, Jakub Kicinski wrote:
> On Thu, 10 Apr 2025 19:10:28 -0700 Jakub Kicinski wrote:
> > On Thu, 10 Apr 2025 10:10:01 -0700 Kuniyuki Iwashima wrote:
> > > syzkaller reported splats in register_netdevice() and
> > > unregister_netdevice_many_notify().
> > > 
> > > In register_netdevice(), some devices cannot use
> > > netdev_assert_locked().
> > > 
> > > In unregister_netdevice_many_notify(), maybe we need to
> > > hold ops lock in UNREGISTER as you initially suggested.
> > > Now do_setlink() deadlock does not happen.  
> > 
> > Ah...  Thank you.
> > 
> > Do you have a reference to use as Reported-by, or its from a
> > non-public instance ?
> > 
> > I'll test this shortly:
> > 
> > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> > index b64c614a00c4..891e2f60922f 100644
> > --- a/net/core/netdev-genl.c
> > +++ b/net/core/netdev-genl.c
> > @@ -38,7 +38,8 @@ netdev_nl_dev_fill(struct net_device *netdev, struct sk_buff *rsp,
> >         u64 xdp_rx_meta = 0;
> >         void *hdr;
> >  
> > -       netdev_assert_locked(netdev); /* note: rtnl_lock may not be held! */
> > +       /* note: rtnl_lock may or may not be held! */
> > +       netdev_assert_locked_or_invisible(netdev);
> >  
> >         hdr = genlmsg_iput(rsp, info);
> >         if (!hdr)
> > @@ -966,7 +967,9 @@ static int netdev_genl_netdevice_event(struct notifier_block *nb,
> >                 netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_ADD_NTF);
> >                 break;
> >         case NETDEV_UNREGISTER:
> > +               netdev_lock(netdev);
> >                 netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_DEL_NTF);
> > +               netdev_unlock(netdev);
> >                 break;
> >         case NETDEV_XDP_FEAT_CHANGE:
> >                 netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_CHANGE_NTF);
> 
> Ugh, REGISTER is ops locked we'd need conditional locking here.
> 
> Stanislav, I can make the REGISTERED notifier fully locked, right?
> I suspect any new object we add that's protected by the instance
> lock will want to lock the dev.

Are you suggesting to do s/netdev_lock_ops/netdev_lock/ around
call_netdevice_notifiers in register_netdevice? We can try, the biggest
concern, as usual, are the stacking devices (with an extra lock), but
casually grepping for NETDEV_REGISTER doesn't bring up anything
suspicious.

But if you're gonna do conditional locking for NETDEV_UNREGISTER, any
reason not to play it safe and add conditional locking to NETDEV_REGISTER
in netdev_genl_netdevice_event?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 6/8] netdev: depend on netdev->lock for xdp features
  2025-04-11 17:41         ` Stanislav Fomichev
@ 2025-04-11 19:19           ` Jakub Kicinski
  0 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2025-04-11 19:19 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: sdf, Kuniyuki Iwashima, andrew+netdev, davem, edumazet, horms,
	hramamurthy, jdamato, netdev, pabeni

On Fri, 11 Apr 2025 10:41:58 -0700 Stanislav Fomichev wrote:
> > Ugh, REGISTER is ops locked we'd need conditional locking here.
> > 
> > Stanislav, I can make the REGISTERED notifier fully locked, right?
> > I suspect any new object we add that's protected by the instance
> > lock will want to lock the dev.  
> 
> Are you suggesting to do s/netdev_lock_ops/netdev_lock/ around
> call_netdevice_notifiers in register_netdevice?

Aha

> We can try, the biggest concern, as usual, are the stacking devices
> (with an extra lock), but casually grepping for NETDEV_REGISTER
> doesn't bring up anything suspicious.
> 
> But if you're gonna do conditional locking for NETDEV_UNREGISTER, any
> reason not to play it safe and add conditional locking to
> NETDEV_REGISTER in netdev_genl_netdevice_event?

Just trying to think what will lead to fewer problems down the line.

Let's me think it thru. So we have this situation:
 - device A - getting registered
 - device L - listens / has the notifier
with upper devs our concern is usually that taking the A's lock
around the notifier forces A -> L lock ordering (between A's instance
lock and whatever lock of L, can be either instance or some other).

If A is an arbitrary device then L has to already be ready to handle
its REGISTER callback under A's instance lock, because A may be ops
locked. So as you say for generic bonding/team changing lock type
is a noop.

If A is a specific device that L is looking for - changing the lock type
around REGISTER may impact L. /me goes to look at the code
Ugh there is a bunch of drivers that wait for a specific device to
register and then take a lock. Let me play it safe then...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct
  2025-04-10 23:39         ` Jakub Kicinski
@ 2025-04-11 21:26           ` Jacob Keller
  0 siblings, 0 replies; 27+ messages in thread
From: Jacob Keller @ 2025-04-11 21:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Dumazet, Eric,
	pabeni@redhat.com, andrew+netdev@lunn.ch, horms@kernel.org,
	sdf@fomichev.me, hramamurthy@google.com, kuniyu@amazon.com,
	Damato, Joe



On 4/10/2025 4:39 PM, Jakub Kicinski wrote:
> On Thu, 10 Apr 2025 22:35:43 +0000 Keller, Jacob E wrote:
>>>> Does this mean we don't allow drivers which support
>>>> netdev_queue_mgmt_ops but don't set request_ops_lock? Or does it mean
>>>> that supporting netdev_queue_mgmt_ops and/or netdev shapers
>>>> automatically implies request_ops_lock? Or is there some other
>>>> behavioral difference?
>>>>
>>>> From the wording this sounds like its enforced via code, and it seems
>>>> reasonable to me that we wouldn't allow these without setting
>>>> request_ops_lock to true...  
>>>
>>> "request" is for drivers to optionally request.
>>> If the driver supports queue or shaper APIs it doesn't have a say.  
>>
>> Which is to say: if you support either of the new APIs, or you
>> automatically get ops locking regardless of what request_ops_lock is,
>> so that if you do support one of those interfaces, there is no
>> behavioral difference between setting or not setting request_ops_lock.
>>
>> Ok, I think that's reasonable.
> 
> Right, and FWIW we may one day actually make the request_ops_lock 
> bit be _the_ decider and auto-set it based on op presence when netdev
> is registered. Purely to simplify the condition in netdev_need_ops_lock().
> For now it isn't that. I was worried if I go into too much detail here
> we'll then forget to update it and stale docs are worse than no docs :(

Yea. I don't think the doc itself needs more comment. My questions are
also for my own education as to whats going on, and to make sure I
understood what the code and docs are saying now.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v2 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers
  2025-04-10 23:46     ` Jakub Kicinski
@ 2025-04-11 21:29       ` Jacob Keller
  0 siblings, 0 replies; 27+ messages in thread
From: Jacob Keller @ 2025-04-11 21:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu, jdamato



On 4/10/2025 4:46 PM, Jakub Kicinski wrote:
> On Wed, 9 Apr 2025 22:23:28 -0700 Jacob Keller wrote:
>>> +struct netdev_stat_ops
>>> +----------------------
>>> +
>>> +"qstat" ops are invoked under the instance lock for "ops locked" drivers,
>>> +and under rtnl_lock for all other drivers.
>>> +
>>>  struct net_shaper_ops
>>>  ---------------------
>>>    
>>
>> What determines if a driver is "ops locked"? Is that defined above this
>> chunk in the doc? I see its when netdev_need_ops_lock() is set? Ok.
> 
> Yup, it was hiding in the previous patch:
> 
>    Code comments and docs refer to drivers which have ops called under
>    the instance lock as "ops locked".
> 
>> Sounds like it would be good to start migrating drivers over to this
>> locking paradigm over time.
> 
> At least for the drivers which implement queue stats its nice to be able 
> to dump stats without taking the global lock. 
> 

Yep. Lots of good reasons to do this work, even if it takes a long time
because of how interconnected the problems are. A measured approach
where we do things slowly is great for reducing the risk of such a big
refactor.

>>>  	if (ifindex) {
>>> -		netdev = __dev_get_by_index(net, ifindex);
>>> -		if (netdev && netdev->stat_ops) {
>>> +		netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
>>> +		if (!netdev) {
>>> +			NL_SET_BAD_ATTR(info->extack,
>>> +					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
>>> +			return -ENODEV;
>>> +		}  
>>
>> I guess netdev_get_by_index_lock_ops_compat acquires the lock when it
>> returns success?
> 
> Yes.
> 

Thanks!

>>> +		if (netdev->stat_ops) {
>>>  			err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
>>>  							    info, ctx);
>>>  		} else {
>>>  			NL_SET_BAD_ATTR(info->extack,
>>>  					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
>>> -			err = netdev ? -EOPNOTSUPP : -ENODEV;
>>> -		}
>>> -	} else {  
>>
>> But there's an else branch here so now I'm confused with how this
>> locking works.
> 

Ugh.. Yea I should have noticed that :D

> The diff is really hard to read, sorry, I should have done two patches.
> The else branch is _removed_. The code is now:
> 
> 	if (ifindex) {
> 		netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
> 		...
> 		netdev_unlock_ops_compat(netdev);  
> 		return ;
> 	}
> 
> 	for_each_lock_scoped() {
> 	}

I should have fetched this to review locally, that is much better.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-04-11 21:29 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-08 19:59 [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
2025-04-08 19:59 ` [PATCH net-next v2 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch Jakub Kicinski
2025-04-08 19:59 ` [PATCH net-next v2 2/8] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
2025-04-08 19:59 ` [PATCH net-next v2 3/8] netdev: add "ops compat locking" helpers Jakub Kicinski
2025-04-08 19:59 ` [PATCH net-next v2 4/8] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
2025-04-08 19:59 ` [PATCH net-next v2 5/8] xdp: double protect netdev->xdp_flags with netdev->lock Jakub Kicinski
2025-04-08 22:15   ` Harshitha Ramamurthy
2025-04-09 18:40   ` Martin KaFai Lau
2025-04-08 19:59 ` [PATCH net-next v2 6/8] netdev: depend on netdev->lock for xdp features Jakub Kicinski
2025-04-10 17:10   ` Kuniyuki Iwashima
2025-04-11  2:10     ` Jakub Kicinski
2025-04-11  2:23       ` Jakub Kicinski
2025-04-11 17:41         ` Stanislav Fomichev
2025-04-11 19:19           ` Jakub Kicinski
2025-04-11  2:36       ` Kuniyuki Iwashima
2025-04-08 19:59 ` [PATCH net-next v2 7/8] docs: netdev: break down the instance locking info per ops struct Jakub Kicinski
2025-04-09  2:59   ` Joe Damato
2025-04-10  6:01   ` Jacob Keller
2025-04-10 16:08     ` Jakub Kicinski
2025-04-10 22:35       ` Keller, Jacob E
2025-04-10 23:39         ` Jakub Kicinski
2025-04-11 21:26           ` Jacob Keller
2025-04-08 19:59 ` [PATCH net-next v2 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers Jakub Kicinski
2025-04-10  5:23   ` Jacob Keller
2025-04-10 23:46     ` Jakub Kicinski
2025-04-11 21:29       ` Jacob Keller
2025-04-10  0:40 ` [PATCH net-next v2 0/8] net: depend on instance lock for queue related netlink ops patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).