netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops
@ 2025-04-07 19:01 Jakub Kicinski
  2025-04-07 19:01 ` [PATCH net-next 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch Jakub Kicinski
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Jakub Kicinski @ 2025-04-07 19:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato

netdev-genl used to be protected by rtnl_lock. In previous release
we already switched the queue management ops (for Rx zero-copy) to
the instance lock. This series converts other ops to depend on the
instance lock when possible.

Unfortunately queue related state is hard to lock (unlike NAPI)
as the process of switching the number of queues usually involves
a large reconfiguration of the driver. The reconfig process has
historically been under rtnl_lock, but for drivers which opt into
ops locking it is also under the instance lock. Leverage that
and conditionally take rtnl_lock or instance lock depending
on the device capabilities.

Jakub Kicinski (8):
  net: avoid potential race between netdev_get_by_index_lock() and netns
    switch
  net: designate XSK pool pointers in queues as "ops protected"
  netdev: add "ops compat locking" helpers
  netdev: don't hold rtnl_lock over nl queue info get when possible
  xdp: double protect netdev->xdp_flags with netdev->lock
  netdev: depend on netdev->lock for xdp features
  docs: netdev: break down the instance locking info per ops struct
  netdev: depend on netdev->lock for qstats in ops locked drivers

 Documentation/networking/netdevices.rst    | 61 +++++++++++++----
 include/linux/netdevice.h                  |  7 +-
 include/net/netdev_lock.h                  | 16 +++++
 include/net/netdev_queues.h                |  4 +-
 include/net/netdev_rx_queue.h              |  6 +-
 include/net/xdp.h                          |  1 +
 net/core/dev.h                             | 17 ++++-
 drivers/net/ethernet/google/gve/gve_main.c |  2 +-
 net/core/dev.c                             | 76 ++++++++++++++++++++--
 net/core/lock_debug.c                      |  2 +-
 net/core/netdev-genl.c                     | 73 ++++++++++-----------
 net/core/xdp.c                             | 12 +++-
 net/xdp/xsk.c                              |  2 +
 net/xdp/xsk_buff_pool.c                    |  7 +-
 14 files changed, 220 insertions(+), 66 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch
  2025-04-07 19:01 [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
@ 2025-04-07 19:01 ` Jakub Kicinski
  2025-04-08  2:40   ` Joe Damato
  2025-04-07 19:01 ` [PATCH net-next 2/8] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2025-04-07 19:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato

netdev_get_by_index_lock() performs following steps:

  rcu_lock();
  dev = lookup(netns, ifindex);
  dev_get(dev);
  rcu_unlock();
  [... lock & validate the dev ...]
  return dev

Validation right now only checks if the device is registered but since
the lookup is netns-aware we must also protect against the device
switching netns right after we dropped the RCU lock. Otherwise
the caller in netns1 may get a pointer to a device which has just
switched to netns2.

We can't hold the lock for the entire netns change process (because of
the NETDEV_UNREGISTER notifier), and there's no existing marking to
indicate that the netns is unlisted because of netns move, so add one.

AFAIU none of the existing netdev_get_by_index_lock() callers can
suffer from this problem (NAPI code double checks the netns membership
and other callers are either under rtnl_lock or not ns-sensitive),
so this patch does not have to be treated as a fix.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/netdevice.h |  6 +++++-
 net/core/dev.h            |  2 +-
 net/core/dev.c            | 25 ++++++++++++++++++-------
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cf3b6445817b..8e9be80bc167 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1952,6 +1952,7 @@ enum netdev_reg_state {
  *	@priv_destructor:	Called from unregister
  *	@npinfo:		XXX: need comments on this one
  * 	@nd_net:		Network namespace this network device is inside
+ *				protected by @lock
  *
  * 	@ml_priv:	Mid-layer private
  *	@ml_priv_type:  Mid-layer private type
@@ -2359,6 +2360,9 @@ struct net_device {
 
 	bool dismantle;
 
+	/** @moving_ns: device is changing netns, protected by @lock */
+	bool moving_ns;
+
 	enum {
 		RTNL_LINK_INITIALIZED,
 		RTNL_LINK_INITIALIZING,
@@ -2521,7 +2525,7 @@ struct net_device {
 	 *	@net_shaper_hierarchy, @reg_state, @threaded
 	 *
 	 * Double protects:
-	 *	@up
+	 *	@up, @moving_ns, @nd_net
 	 *
 	 * Double ops protects:
 	 *	@real_num_rx_queues, @real_num_tx_queues
diff --git a/net/core/dev.h b/net/core/dev.h
index 7ee203395d8e..3cc2d8787c83 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -29,7 +29,7 @@ netdev_napi_by_id_lock(struct net *net, unsigned int napi_id);
 struct net_device *dev_get_by_napi_id(unsigned int napi_id);
 
 struct net_device *netdev_get_by_index_lock(struct net *net, int ifindex);
-struct net_device *__netdev_put_lock(struct net_device *dev);
+struct net_device *__netdev_put_lock(struct net_device *dev, struct net *net);
 struct net_device *
 netdev_xa_find_lock(struct net *net, struct net_device *dev,
 		    unsigned long *index);
diff --git a/net/core/dev.c b/net/core/dev.c
index 0608605cfc24..7060c3171cd8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -828,7 +828,7 @@ netdev_napi_by_id_lock(struct net *net, unsigned int napi_id)
 	dev_hold(dev);
 	rcu_read_unlock();
 
-	dev = __netdev_put_lock(dev);
+	dev = __netdev_put_lock(dev, net);
 	if (!dev)
 		return NULL;
 
@@ -1039,10 +1039,11 @@ struct net_device *dev_get_by_napi_id(unsigned int napi_id)
  * This helper is intended for locking net_device after it has been looked up
  * using a lockless lookup helper. Lock prevents the instance from going away.
  */
-struct net_device *__netdev_put_lock(struct net_device *dev)
+struct net_device *__netdev_put_lock(struct net_device *dev, struct net *net)
 {
 	netdev_lock(dev);
-	if (dev->reg_state > NETREG_REGISTERED) {
+	if (dev->reg_state > NETREG_REGISTERED ||
+	    dev->moving_ns || !net_eq(dev_net(dev), net)) {
 		netdev_unlock(dev);
 		dev_put(dev);
 		return NULL;
@@ -1070,7 +1071,7 @@ struct net_device *netdev_get_by_index_lock(struct net *net, int ifindex)
 	if (!dev)
 		return NULL;
 
-	return __netdev_put_lock(dev);
+	return __netdev_put_lock(dev, net);
 }
 
 struct net_device *
@@ -1090,7 +1091,7 @@ netdev_xa_find_lock(struct net *net, struct net_device *dev,
 		dev_hold(dev);
 		rcu_read_unlock();
 
-		dev = __netdev_put_lock(dev);
+		dev = __netdev_put_lock(dev, net);
 		if (dev)
 			return dev;
 
@@ -12154,7 +12155,11 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
 	netif_close(dev);
 	/* And unlink it from device chain */
 	unlist_netdevice(dev);
-	netdev_unlock_ops(dev);
+
+	if (!netdev_need_ops_lock(dev))
+		netdev_lock(dev);
+	dev->moving_ns = true;
+	netdev_unlock(dev);
 
 	synchronize_net();
 
@@ -12190,7 +12195,9 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
 	move_netdevice_notifiers_dev_net(dev, net);
 
 	/* Actually switch the network namespace */
+	netdev_lock(dev);
 	dev_net_set(dev, net);
+	netdev_unlock(dev);
 	dev->ifindex = new_ifindex;
 
 	if (new_name[0]) {
@@ -12216,7 +12223,11 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
 	err = netdev_change_owner(dev, net_old, net);
 	WARN_ON(err);
 
-	netdev_lock_ops(dev);
+	netdev_lock(dev);
+	dev->moving_ns = false;
+	if (!netdev_need_ops_lock(dev))
+		netdev_unlock(dev);
+
 	/* Add the device back in the hashes */
 	list_netdevice(dev);
 	/* Notify protocols, that a new device appeared. */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 2/8] net: designate XSK pool pointers in queues as "ops protected"
  2025-04-07 19:01 [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
  2025-04-07 19:01 ` [PATCH net-next 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch Jakub Kicinski
@ 2025-04-07 19:01 ` Jakub Kicinski
  2025-04-08  2:17   ` Joe Damato
  2025-04-07 19:01 ` [PATCH net-next 3/8] netdev: add "ops compat locking" helpers Jakub Kicinski
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2025-04-07 19:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato

Read accesses go via xsk_get_pool_from_qid(), the call coming
from the core and gve look safe (other "ops locked" drivers
don't support XSK).

Write accesses go via xsk_reg_pool_at_qid() and xsk_clear_pool_at_qid().
Former is already under the ops lock, latter needs to be locked when
coming from the workqueue via xp_clear_dev().

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
---
 include/linux/netdevice.h     | 1 +
 include/net/netdev_rx_queue.h | 6 +++---
 net/xdp/xsk.c                 | 2 ++
 net/xdp/xsk_buff_pool.c       | 7 ++++++-
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8e9be80bc167..7242fb8a22fc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -688,6 +688,7 @@ struct netdev_queue {
 	/* Subordinate device that the queue has been assigned to */
 	struct net_device	*sb_dev;
 #ifdef CONFIG_XDP_SOCKETS
+	/* "ops protected", see comment about net_device::lock */
 	struct xsk_buff_pool    *pool;
 #endif
 
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index b2238b551dce..8cdcd138b33f 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -20,12 +20,12 @@ struct netdev_rx_queue {
 	struct net_device		*dev;
 	netdevice_tracker		dev_tracker;
 
+	/* All fields below are "ops protected",
+	 * see comment about net_device::lock
+	 */
 #ifdef CONFIG_XDP_SOCKETS
 	struct xsk_buff_pool            *pool;
 #endif
-	/* NAPI instance for the queue
-	 * "ops protected", see comment about net_device::lock
-	 */
 	struct napi_struct		*napi;
 	struct pp_memory_provider_params mp_params;
 } ____cacheline_aligned_in_smp;
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5696af45bcf7..2ecf063a2c38 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1654,7 +1654,9 @@ static int xsk_notifier(struct notifier_block *this,
 				xsk_unbind_dev(xs);
 
 				/* Clear device references. */
+				netdev_lock_ops(dev);
 				xp_clear_dev(xs->pool);
+				netdev_unlock_ops(dev);
 			}
 			mutex_unlock(&xs->mutex);
 		}
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 25a76c5ce0f1..c7e50fd86c6a 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -279,9 +279,14 @@ static void xp_release_deferred(struct work_struct *work)
 {
 	struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool,
 						  work);
+	struct net_device *netdev = pool->netdev;
 
 	rtnl_lock();
-	xp_clear_dev(pool);
+	if (netdev) {
+		netdev_lock_ops(netdev);
+		xp_clear_dev(pool);
+		netdev_unlock_ops(netdev);
+	}
 	rtnl_unlock();
 
 	if (pool->fq) {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 3/8] netdev: add "ops compat locking" helpers
  2025-04-07 19:01 [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
  2025-04-07 19:01 ` [PATCH net-next 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch Jakub Kicinski
  2025-04-07 19:01 ` [PATCH net-next 2/8] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
@ 2025-04-07 19:01 ` Jakub Kicinski
  2025-04-08  2:41   ` Joe Damato
  2025-04-07 19:01 ` [PATCH net-next 4/8] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2025-04-07 19:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato

Add helpers to "lock a netdev in a backward-compatible way",
which for ops-locked netdevs will mean take the instance lock.
For drivers which haven't opted into the ops locking we'll take
rtnl_lock.

The scoped foreach is dropping and re-taking the lock for each
device, even if prev and next are both under rtnl_lock.
I hope that's fine since we expect that netdev nl to be mostly
supported by modern drivers, and modern drivers should also
opt into the instance locking.

Note that these helpers are mostly needed for queue related state,
because drivers modify queue config in their ops in a non-atomic
way. Or differently put, queue changes don't have a clear-cut API
like NAPI configuration. Any state that can should just use the
instance lock directly, not the "compat" hacks.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/net/netdev_lock.h | 16 ++++++++++++
 net/core/dev.h            | 15 ++++++++++++
 net/core/dev.c            | 51 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)

diff --git a/include/net/netdev_lock.h b/include/net/netdev_lock.h
index c316b551df8d..5706835a660c 100644
--- a/include/net/netdev_lock.h
+++ b/include/net/netdev_lock.h
@@ -64,6 +64,22 @@ netdev_ops_assert_locked_or_invisible(const struct net_device *dev)
 		netdev_ops_assert_locked(dev);
 }
 
+static inline void netdev_lock_ops_compat(struct net_device *dev)
+{
+	if (netdev_need_ops_lock(dev))
+		netdev_lock(dev);
+	else
+		rtnl_lock();
+}
+
+static inline void netdev_unlock_ops_compat(struct net_device *dev)
+{
+	if (netdev_need_ops_lock(dev))
+		netdev_unlock(dev);
+	else
+		rtnl_unlock();
+}
+
 static inline int netdev_lock_cmp_fn(const struct lockdep_map *a,
 				     const struct lockdep_map *b)
 {
diff --git a/net/core/dev.h b/net/core/dev.h
index 3cc2d8787c83..3fd7847d6d60 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -41,6 +41,21 @@ DEFINE_FREE(netdev_unlock, struct net_device *, if (_T) netdev_unlock(_T));
 	     (var_name = netdev_xa_find_lock(net, var_name, &ifindex)); \
 	     ifindex++)
 
+struct net_device *
+netdev_get_by_index_lock_ops_compat(struct net *net, int ifindex);
+struct net_device *
+netdev_xa_find_lock_ops_compat(struct net *net, struct net_device *dev,
+			       unsigned long *index);
+
+DEFINE_FREE(netdev_unlock_ops_compat, struct net_device *,
+	    if (_T) netdev_unlock_ops_compat(_T));
+
+#define for_each_netdev_lock_ops_compat_scoped(net, var_name, ifindex)	\
+	for (struct net_device *var_name __free(netdev_unlock_ops_compat) = NULL; \
+	     (var_name = netdev_xa_find_lock_ops_compat(net, var_name,	\
+							&ifindex));	\
+	     ifindex++)
+
 #ifdef CONFIG_PROC_FS
 int __init dev_proc_init(void);
 #else
diff --git a/net/core/dev.c b/net/core/dev.c
index 7060c3171cd8..baf615e0ae27 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1052,6 +1052,20 @@ struct net_device *__netdev_put_lock(struct net_device *dev, struct net *net)
 	return dev;
 }
 
+static struct net_device *
+__netdev_put_lock_ops_compat(struct net_device *dev, struct net *net)
+{
+	netdev_lock_ops_compat(dev);
+	if (dev->reg_state > NETREG_REGISTERED ||
+	    dev->moving_ns || !net_eq(dev_net(dev), net)) {
+		netdev_unlock_ops_compat(dev);
+		dev_put(dev);
+		return NULL;
+	}
+	dev_put(dev);
+	return dev;
+}
+
 /**
  *	netdev_get_by_index_lock() - find a device by its ifindex
  *	@net: the applicable net namespace
@@ -1074,6 +1088,18 @@ struct net_device *netdev_get_by_index_lock(struct net *net, int ifindex)
 	return __netdev_put_lock(dev, net);
 }
 
+struct net_device *
+netdev_get_by_index_lock_ops_compat(struct net *net, int ifindex)
+{
+	struct net_device *dev;
+
+	dev = dev_get_by_index(net, ifindex);
+	if (!dev)
+		return NULL;
+
+	return __netdev_put_lock_ops_compat(dev, net);
+}
+
 struct net_device *
 netdev_xa_find_lock(struct net *net, struct net_device *dev,
 		    unsigned long *index)
@@ -1099,6 +1125,31 @@ netdev_xa_find_lock(struct net *net, struct net_device *dev,
 	} while (true);
 }
 
+struct net_device *
+netdev_xa_find_lock_ops_compat(struct net *net, struct net_device *dev,
+			       unsigned long *index)
+{
+	if (dev)
+		netdev_unlock_ops_compat(dev);
+
+	do {
+		rcu_read_lock();
+		dev = xa_find(&net->dev_by_index, index, ULONG_MAX, XA_PRESENT);
+		if (!dev) {
+			rcu_read_unlock();
+			return NULL;
+		}
+		dev_hold(dev);
+		rcu_read_unlock();
+
+		dev = __netdev_put_lock_ops_compat(dev, net);
+		if (dev)
+			return dev;
+
+		(*index)++;
+	} while (true);
+}
+
 static DEFINE_SEQLOCK(netdev_rename_lock);
 
 void netdev_copy_name(struct net_device *dev, char *name)
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 4/8] netdev: don't hold rtnl_lock over nl queue info get when possible
  2025-04-07 19:01 [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (2 preceding siblings ...)
  2025-04-07 19:01 ` [PATCH net-next 3/8] netdev: add "ops compat locking" helpers Jakub Kicinski
@ 2025-04-07 19:01 ` Jakub Kicinski
  2025-04-08  2:28   ` Joe Damato
  2025-04-07 19:01 ` [PATCH net-next 5/8] xdp: double protect netdev->xdp_flags with netdev->lock Jakub Kicinski
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2025-04-07 19:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato

Netdev queue dump accesses: NAPI, memory providers, XSk pointers.
All three are "ops protected" now, switch to the op compat locking.
rtnl lock does not have to be taken for "ops locked" devices.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/core/netdev-genl.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 5d7af50fe702..7ef9b0191936 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -481,18 +481,15 @@ int netdev_nl_queue_get_doit(struct sk_buff *skb, struct genl_info *info)
 	if (!rsp)
 		return -ENOMEM;
 
-	rtnl_lock();
-
-	netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
+	netdev = netdev_get_by_index_lock_ops_compat(genl_info_net(info),
+						     ifindex);
 	if (netdev) {
 		err = netdev_nl_queue_fill(rsp, netdev, q_id, q_type, info);
-		netdev_unlock(netdev);
+		netdev_unlock_ops_compat(netdev);
 	} else {
 		err = -ENODEV;
 	}
 
-	rtnl_unlock();
-
 	if (err)
 		goto err_free_msg;
 
@@ -541,17 +538,17 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 	if (info->attrs[NETDEV_A_QUEUE_IFINDEX])
 		ifindex = nla_get_u32(info->attrs[NETDEV_A_QUEUE_IFINDEX]);
 
-	rtnl_lock();
 	if (ifindex) {
-		netdev = netdev_get_by_index_lock(net, ifindex);
+		netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
 		if (netdev) {
 			err = netdev_nl_queue_dump_one(netdev, skb, info, ctx);
-			netdev_unlock(netdev);
+			netdev_unlock_ops_compat(netdev);
 		} else {
 			err = -ENODEV;
 		}
 	} else {
-		for_each_netdev_lock_scoped(net, netdev, ctx->ifindex) {
+		for_each_netdev_lock_ops_compat_scoped(net, netdev,
+						       ctx->ifindex) {
 			err = netdev_nl_queue_dump_one(netdev, skb, info, ctx);
 			if (err < 0)
 				break;
@@ -559,7 +556,6 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 			ctx->txq_idx = 0;
 		}
 	}
-	rtnl_unlock();
 
 	return err;
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 5/8] xdp: double protect netdev->xdp_flags with netdev->lock
  2025-04-07 19:01 [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (3 preceding siblings ...)
  2025-04-07 19:01 ` [PATCH net-next 4/8] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
@ 2025-04-07 19:01 ` Jakub Kicinski
  2025-04-08  2:30   ` Joe Damato
  2025-04-07 19:01 ` [PATCH net-next 6/8] netdev: depend on netdev->lock for xdp features Jakub Kicinski
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2025-04-07 19:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato

Protect xdp_features with netdev->lock. This way pure readers
no longer have to take rtnl_lock to access the field.

This includes calling NETDEV_XDP_FEAT_CHANGE under the lock.
Looks like that's fine for bonding, the only "real" listener,
it's the same as ethtool feature change.

In terms of normal drivers - only GVE need special consideration
(other drivers don't use instance lock or don't support XDP).
It calls xdp_set_features_flag() helper from gve_init_priv() which
in turn is called from gve_reset_recovery() (locked), or prior
to netdev registration. So switch to _locked.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: bpf@vger.kernel.org
---
 Documentation/networking/netdevices.rst    |  1 +
 include/linux/netdevice.h                  |  2 +-
 include/net/xdp.h                          |  1 +
 drivers/net/ethernet/google/gve/gve_main.c |  2 +-
 net/core/lock_debug.c                      |  2 +-
 net/core/xdp.c                             | 12 +++++++++++-
 6 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 6c2d8945f597..d6357472d3f1 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -354,6 +354,7 @@ For devices with locked ops, currently only the following notifiers are
 running under the lock:
 * ``NETDEV_REGISTER``
 * ``NETDEV_UP``
+* ``NETDEV_XDP_FEAT_CHANGE``
 
 The following notifiers are running without the lock:
 * ``NETDEV_UNREGISTER``
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7242fb8a22fc..dece2ae396a1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2526,7 +2526,7 @@ struct net_device {
 	 *	@net_shaper_hierarchy, @reg_state, @threaded
 	 *
 	 * Double protects:
-	 *	@up, @moving_ns, @nd_net
+	 *	@up, @moving_ns, @nd_net, @xdp_flags
 	 *
 	 * Double ops protects:
 	 *	@real_num_rx_queues, @real_num_tx_queues
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 48efacbaa35d..20e41b5ff319 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -616,6 +616,7 @@ struct xdp_metadata_ops {
 u32 bpf_xdp_metadata_kfunc_id(int id);
 bool bpf_dev_bound_kfunc_id(u32 btf_id);
 void xdp_set_features_flag(struct net_device *dev, xdp_features_t val);
+void xdp_set_features_flag_locked(struct net_device *dev, xdp_features_t val);
 void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg);
 void xdp_features_clear_redirect_target(struct net_device *dev);
 #else
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index f9a73c956861..7a249baee316 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -2185,7 +2185,7 @@ static void gve_set_netdev_xdp_features(struct gve_priv *priv)
 		xdp_features = 0;
 	}
 
-	xdp_set_features_flag(priv->dev, xdp_features);
+	xdp_set_features_flag_locked(priv->dev, xdp_features);
 }
 
 static int gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
diff --git a/net/core/lock_debug.c b/net/core/lock_debug.c
index b7f22dc92a6f..598c443ef2f3 100644
--- a/net/core/lock_debug.c
+++ b/net/core/lock_debug.c
@@ -20,6 +20,7 @@ int netdev_debug_event(struct notifier_block *nb, unsigned long event,
 	switch (cmd) {
 	case NETDEV_REGISTER:
 	case NETDEV_UP:
+	case NETDEV_XDP_FEAT_CHANGE:
 		netdev_ops_assert_locked(dev);
 		fallthrough;
 	case NETDEV_DOWN:
@@ -58,7 +59,6 @@ int netdev_debug_event(struct notifier_block *nb, unsigned long event,
 	case NETDEV_OFFLOAD_XSTATS_DISABLE:
 	case NETDEV_OFFLOAD_XSTATS_REPORT_USED:
 	case NETDEV_OFFLOAD_XSTATS_REPORT_DELTA:
-	case NETDEV_XDP_FEAT_CHANGE:
 		ASSERT_RTNL();
 		break;
 
diff --git a/net/core/xdp.c b/net/core/xdp.c
index f86eedad586a..3cd0db9c9d2d 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -17,6 +17,7 @@
 #include <net/page_pool/helpers.h>
 
 #include <net/hotdata.h>
+#include <net/netdev_lock.h>
 #include <net/xdp.h>
 #include <net/xdp_priv.h> /* struct xdp_mem_allocator */
 #include <trace/events/xdp.h>
@@ -991,17 +992,26 @@ static int __init xdp_metadata_init(void)
 }
 late_initcall(xdp_metadata_init);
 
-void xdp_set_features_flag(struct net_device *dev, xdp_features_t val)
+void xdp_set_features_flag_locked(struct net_device *dev, xdp_features_t val)
 {
 	val &= NETDEV_XDP_ACT_MASK;
 	if (dev->xdp_features == val)
 		return;
 
+	netdev_assert_locked_or_invisible(dev);
 	dev->xdp_features = val;
 
 	if (dev->reg_state == NETREG_REGISTERED)
 		call_netdevice_notifiers(NETDEV_XDP_FEAT_CHANGE, dev);
 }
+EXPORT_SYMBOL_GPL(xdp_set_features_flag_locked);
+
+void xdp_set_features_flag(struct net_device *dev, xdp_features_t val)
+{
+	netdev_lock(dev);
+	xdp_set_features_flag_locked(dev, val);
+	netdev_unlock(dev);
+}
 EXPORT_SYMBOL_GPL(xdp_set_features_flag);
 
 void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg)
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 6/8] netdev: depend on netdev->lock for xdp features
  2025-04-07 19:01 [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (4 preceding siblings ...)
  2025-04-07 19:01 ` [PATCH net-next 5/8] xdp: double protect netdev->xdp_flags with netdev->lock Jakub Kicinski
@ 2025-04-07 19:01 ` Jakub Kicinski
  2025-04-08  2:31   ` Joe Damato
  2025-04-07 19:01 ` [PATCH net-next 7/8] docs: netdev: break down the instance locking info per ops struct Jakub Kicinski
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2025-04-07 19:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato

Writes to XDP features are now protected by netdev->lock.
Other things we report are based on ops which don't change
once device has been registered. It is safe to stop taking
rtnl_lock, and depend on netdev->lock instead.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/core/netdev-genl.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 7ef9b0191936..8c58261de969 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -38,6 +38,8 @@ netdev_nl_dev_fill(struct net_device *netdev, struct sk_buff *rsp,
 	u64 xdp_rx_meta = 0;
 	void *hdr;
 
+	netdev_assert_locked(netdev); /* note: rtnl_lock may not be held! */
+
 	hdr = genlmsg_iput(rsp, info);
 	if (!hdr)
 		return -EMSGSIZE;
@@ -122,15 +124,14 @@ int netdev_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info)
 	if (!rsp)
 		return -ENOMEM;
 
-	rtnl_lock();
-
-	netdev = __dev_get_by_index(genl_info_net(info), ifindex);
-	if (netdev)
-		err = netdev_nl_dev_fill(netdev, rsp, info);
-	else
+	netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
+	if (!netdev) {
 		err = -ENODEV;
+		goto err_free_msg;
+	}
 
-	rtnl_unlock();
+	err = netdev_nl_dev_fill(netdev, rsp, info);
+	netdev_unlock(netdev);
 
 	if (err)
 		goto err_free_msg;
@@ -146,18 +147,15 @@ int netdev_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct netdev_nl_dump_ctx *ctx = netdev_dump_ctx(cb);
 	struct net *net = sock_net(skb->sk);
-	struct net_device *netdev;
-	int err = 0;
+	int err;
 
-	rtnl_lock();
-	for_each_netdev_dump(net, netdev, ctx->ifindex) {
+	for_each_netdev_lock_scoped(net, netdev, ctx->ifindex) {
 		err = netdev_nl_dev_fill(netdev, skb, genl_info_dump(cb));
 		if (err < 0)
-			break;
+			return err;
 	}
-	rtnl_unlock();
 
-	return err;
+	return 0;
 }
 
 static int
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 7/8] docs: netdev: break down the instance locking info per ops struct
  2025-04-07 19:01 [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (5 preceding siblings ...)
  2025-04-07 19:01 ` [PATCH net-next 6/8] netdev: depend on netdev->lock for xdp features Jakub Kicinski
@ 2025-04-07 19:01 ` Jakub Kicinski
  2025-04-08  2:34   ` Joe Damato
  2025-04-07 19:01 ` [PATCH net-next 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers Jakub Kicinski
  2025-04-08  0:28 ` [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Stanislav Fomichev
  8 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2025-04-07 19:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato

Explicitly list all the ops structs and what locking they provide.
Use "ops locked" as a term for drivers which have ops called under
the instance lock.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/netdevices.rst | 54 +++++++++++++++++++------
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index d6357472d3f1..0cfff56b436e 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -314,13 +314,8 @@ struct napi_struct synchronization rules
 		 softirq
 		 will be called with interrupts disabled by netconsole.
 
-struct netdev_queue_mgmt_ops synchronization rules
-==================================================
-
-All queue management ndo callbacks are holding netdev instance lock.
-
-RTNL and netdev instance lock
-=============================
+netdev instance lock
+====================
 
 Historically, all networking control operations were protected by a single
 global lock known as ``rtnl_lock``. There is an ongoing effort to replace this
@@ -328,10 +323,13 @@ global lock with separate locks for each network namespace. Additionally,
 properties of individual netdev are increasingly protected by per-netdev locks.
 
 For device drivers that implement shaping or queue management APIs, all control
-operations will be performed under the netdev instance lock. Currently, this
-instance lock is acquired within the context of ``rtnl_lock``. The drivers
-can also explicitly request instance lock to be acquired via
-``request_ops_lock``. In the future, there will be an option for individual
+operations will be performed under the netdev instance lock.
+Drivers can also explicitly request instance lock to be held during ops
+by setting ``request_ops_lock`` to true. Code comments and docs refer
+to drivers which have ops called under the instance lock as "ops locked".
+See also the documentation of the ``lock`` member of struct net_device.
+
+In the future, there will be an option for individual
 drivers to opt out of using ``rtnl_lock`` and instead perform their control
 operations directly under the netdev instance lock.
 
@@ -343,8 +341,40 @@ there are two sets of interfaces: ``dev_xxx`` and ``netif_xxx`` (e.g.,
 acquiring the instance lock themselves, while the ``netif_xxx`` functions
 assume that the driver has already acquired the instance lock.
 
+struct net_device_ops
+---------------------
+
+``ndos`` are called without holding the instance lock for most drivers.
+
+"Ops locked" drivers will have most of the ``ndos`` invoked under
+the instance lock.
+
+struct ethtool_ops
+------------------
+
+Similarly to ``ndos`` the instance lock is only held for select drivers.
+For "ops locked" drivers all ethtool ops without an exception should
+be called under the instance lock.
+
+struct net_shaper_ops
+---------------------
+
+All net shaper callbacks are invoked while holding the netdev instance
+lock. ``rtnl_lock`` may or may not be held.
+
+Note that supporting net shapers automatically enables "ops locking".
+
+struct netdev_queue_mgmt_ops
+----------------------------
+
+All queue management callbacks are invoked while holding the netdev instance
+lock. ``rtnl_lock`` may or may not be held.
+
+Note that supporting struct netdev_queue_mgmt_ops automatically enables
+"ops locking".
+
 Notifiers and netdev instance lock
-==================================
+----------------------------------
 
 For device drivers that implement shaping or queue management APIs,
 some of the notifiers (``enum netdev_cmd``) are running under the netdev
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers
  2025-04-07 19:01 [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (6 preceding siblings ...)
  2025-04-07 19:01 ` [PATCH net-next 7/8] docs: netdev: break down the instance locking info per ops struct Jakub Kicinski
@ 2025-04-07 19:01 ` Jakub Kicinski
  2025-04-08  2:37   ` Joe Damato
  2025-04-08  0:28 ` [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Stanislav Fomichev
  8 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2025-04-07 19:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf, hramamurthy,
	kuniyu, jdamato

We mostly needed rtnl_lock in qstat to make sure the queue count
is stable while we work. For "ops locked" drivers the instance
lock protects the queue count, so we don't have to take rtnl_lock.

For currently ops-locked drivers: netdevsim and bnxt need
the protection from netdev going down while we dump, which
instance lock provides. gve doesn't care.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/netdevices.rst |  6 +++++
 include/net/netdev_queues.h             |  4 +++-
 net/core/netdev-genl.c                  | 29 +++++++++++++++----------
 3 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 0cfff56b436e..ec9d9a2cefe7 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -356,6 +356,12 @@ Similarly to ``ndos`` the instance lock is only held for select drivers.
 For "ops locked" drivers all ethtool ops without an exception should
 be called under the instance lock.
 
+struct netdev_stat_ops
+----------------------
+
+"qstat" ops are invoked under the instance lock for "ops locked" drivers,
+and under rtnl_lock for all other drivers.
+
 struct net_shaper_ops
 ---------------------
 
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 825141d675e5..ea709b59d827 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -85,9 +85,11 @@ struct netdev_queue_stats_tx {
  * for some of the events is not maintained, and reliable "total" cannot
  * be provided).
  *
+ * Ops are called under the instance lock if netdev_need_ops_lock()
+ * returns true, otherwise under rtnl_lock.
  * Device drivers can assume that when collecting total device stats,
  * the @get_base_stats and subsequent per-queue calls are performed
- * "atomically" (without releasing the rtnl_lock).
+ * "atomically" (without releasing the relevant lock).
  *
  * Device drivers are encouraged to reset the per-queue statistics when
  * number of queues change. This is because the primary use case for
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 8c58261de969..b64c614a00c4 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -795,26 +795,31 @@ int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
 	if (info->attrs[NETDEV_A_QSTATS_IFINDEX])
 		ifindex = nla_get_u32(info->attrs[NETDEV_A_QSTATS_IFINDEX]);
 
-	rtnl_lock();
 	if (ifindex) {
-		netdev = __dev_get_by_index(net, ifindex);
-		if (netdev && netdev->stat_ops) {
+		netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
+		if (!netdev) {
+			NL_SET_BAD_ATTR(info->extack,
+					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
+			return -ENODEV;
+		}
+		if (netdev->stat_ops) {
 			err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
 							    info, ctx);
 		} else {
 			NL_SET_BAD_ATTR(info->extack,
 					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
-			err = netdev ? -EOPNOTSUPP : -ENODEV;
-		}
-	} else {
-		for_each_netdev_dump(net, netdev, ctx->ifindex) {
-			err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
-							    info, ctx);
-			if (err < 0)
-				break;
+			err = -EOPNOTSUPP;
 		}
+		netdev_unlock_ops_compat(netdev);
+		return err;
+	}
+
+	for_each_netdev_lock_ops_compat_scoped(net, netdev, ctx->ifindex) {
+		err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
+						    info, ctx);
+		if (err < 0)
+			break;
 	}
-	rtnl_unlock();
 
 	return err;
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops
  2025-04-07 19:01 [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
                   ` (7 preceding siblings ...)
  2025-04-07 19:01 ` [PATCH net-next 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers Jakub Kicinski
@ 2025-04-08  0:28 ` Stanislav Fomichev
  8 siblings, 0 replies; 20+ messages in thread
From: Stanislav Fomichev @ 2025-04-08  0:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu, jdamato

On 04/07, Jakub Kicinski wrote:
> netdev-genl used to be protected by rtnl_lock. In previous release
> we already switched the queue management ops (for Rx zero-copy) to
> the instance lock. This series converts other ops to depend on the
> instance lock when possible.
> 
> Unfortunately queue related state is hard to lock (unlike NAPI)
> as the process of switching the number of queues usually involves
> a large reconfiguration of the driver. The reconfig process has
> historically been under rtnl_lock, but for drivers which opt into
> ops locking it is also under the instance lock. Leverage that
> and conditionally take rtnl_lock or instance lock depending
> on the device capabilities.
> 
> Jakub Kicinski (8):
>   net: avoid potential race between netdev_get_by_index_lock() and netns
>     switch
>   net: designate XSK pool pointers in queues as "ops protected"
>   netdev: add "ops compat locking" helpers
>   netdev: don't hold rtnl_lock over nl queue info get when possible
>   xdp: double protect netdev->xdp_flags with netdev->lock
>   netdev: depend on netdev->lock for xdp features
>   docs: netdev: break down the instance locking info per ops struct
>   netdev: depend on netdev->lock for qstats in ops locked drivers

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] net: designate XSK pool pointers in queues as "ops protected"
  2025-04-07 19:01 ` [PATCH net-next 2/8] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
@ 2025-04-08  2:17   ` Joe Damato
  2025-04-08 14:59     ` Jakub Kicinski
  0 siblings, 1 reply; 20+ messages in thread
From: Joe Damato @ 2025-04-08  2:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu

On Mon, Apr 07, 2025 at 12:01:11PM -0700, Jakub Kicinski wrote:
> Read accesses go via xsk_get_pool_from_qid(), the call coming
> from the core and gve look safe (other "ops locked" drivers
> don't support XSK).
> 
> Write accesses go via xsk_reg_pool_at_qid() and xsk_clear_pool_at_qid().
> Former is already under the ops lock, latter needs to be locked when
> coming from the workqueue via xp_clear_dev().
> 
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
> ---
>  include/linux/netdevice.h     | 1 +
>  include/net/netdev_rx_queue.h | 6 +++---
>  net/xdp/xsk.c                 | 2 ++
>  net/xdp/xsk_buff_pool.c       | 7 ++++++-
>  4 files changed, 12 insertions(+), 4 deletions(-)

[...]

> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 25a76c5ce0f1..c7e50fd86c6a 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -279,9 +279,14 @@ static void xp_release_deferred(struct work_struct *work)
>  {
>  	struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool,
>  						  work);
> +	struct net_device *netdev = pool->netdev;
>  
>  	rtnl_lock();
> -	xp_clear_dev(pool);
> +	if (netdev) {
> +		netdev_lock_ops(netdev);
> +		xp_clear_dev(pool);
> +		netdev_unlock_ops(netdev);
> +	}
>  	rtnl_unlock();

Is it actually possible for netdev to be NULL here?

I feel like it probably isn't, but if it were possible we'd need an
else case here to xp_clear_dev(pool) without the netdev_lock_ops?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/8] netdev: don't hold rtnl_lock over nl queue info get when possible
  2025-04-07 19:01 ` [PATCH net-next 4/8] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
@ 2025-04-08  2:28   ` Joe Damato
  0 siblings, 0 replies; 20+ messages in thread
From: Joe Damato @ 2025-04-08  2:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu

On Mon, Apr 07, 2025 at 12:01:13PM -0700, Jakub Kicinski wrote:
> Netdev queue dump accesses: NAPI, memory providers, XSk pointers.
> All three are "ops protected" now, switch to the op compat locking.
> rtnl lock does not have to be taken for "ops locked" devices.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  net/core/netdev-genl.c | 18 +++++++-----------
>  1 file changed, 7 insertions(+), 11 deletions(-)

Reviewed-by: Joe Damato <jdamato@fastly.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 5/8] xdp: double protect netdev->xdp_flags with netdev->lock
  2025-04-07 19:01 ` [PATCH net-next 5/8] xdp: double protect netdev->xdp_flags with netdev->lock Jakub Kicinski
@ 2025-04-08  2:30   ` Joe Damato
  0 siblings, 0 replies; 20+ messages in thread
From: Joe Damato @ 2025-04-08  2:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu

On Mon, Apr 07, 2025 at 12:01:14PM -0700, Jakub Kicinski wrote:
> Protect xdp_features with netdev->lock. This way pure readers
> no longer have to take rtnl_lock to access the field.
> 
> This includes calling NETDEV_XDP_FEAT_CHANGE under the lock.
> Looks like that's fine for bonding, the only "real" listener,
> it's the same as ethtool feature change.
> 
> In terms of normal drivers - only GVE need special consideration
> (other drivers don't use instance lock or don't support XDP).
> It calls xdp_set_features_flag() helper from gve_init_priv() which
> in turn is called from gve_reset_recovery() (locked), or prior
> to netdev registration. So switch to _locked.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: bpf@vger.kernel.org
> ---
>  Documentation/networking/netdevices.rst    |  1 +
>  include/linux/netdevice.h                  |  2 +-
>  include/net/xdp.h                          |  1 +
>  drivers/net/ethernet/google/gve/gve_main.c |  2 +-
>  net/core/lock_debug.c                      |  2 +-
>  net/core/xdp.c                             | 12 +++++++++++-
>  6 files changed, 16 insertions(+), 4 deletions(-)

Reviewed-by: Joe Damato <jdamato@fastly.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 6/8] netdev: depend on netdev->lock for xdp features
  2025-04-07 19:01 ` [PATCH net-next 6/8] netdev: depend on netdev->lock for xdp features Jakub Kicinski
@ 2025-04-08  2:31   ` Joe Damato
  0 siblings, 0 replies; 20+ messages in thread
From: Joe Damato @ 2025-04-08  2:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu

On Mon, Apr 07, 2025 at 12:01:15PM -0700, Jakub Kicinski wrote:
> Writes to XDP features are now protected by netdev->lock.
> Other things we report are based on ops which don't change
> once device has been registered. It is safe to stop taking
> rtnl_lock, and depend on netdev->lock instead.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  net/core/netdev-genl.c | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)

Reviewed-by: Joe Damato <jdamato@fastly.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 7/8] docs: netdev: break down the instance locking info per ops struct
  2025-04-07 19:01 ` [PATCH net-next 7/8] docs: netdev: break down the instance locking info per ops struct Jakub Kicinski
@ 2025-04-08  2:34   ` Joe Damato
  2025-04-08 15:08     ` Jakub Kicinski
  0 siblings, 1 reply; 20+ messages in thread
From: Joe Damato @ 2025-04-08  2:34 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu

On Mon, Apr 07, 2025 at 12:01:16PM -0700, Jakub Kicinski wrote:
> Explicitly list all the ops structs and what locking they provide.
> Use "ops locked" as a term for drivers which have ops called under
> the instance lock.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  Documentation/networking/netdevices.rst | 54 +++++++++++++++++++------
>  1 file changed, 42 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
> index d6357472d3f1..0cfff56b436e 100644
> --- a/Documentation/networking/netdevices.rst
> +++ b/Documentation/networking/netdevices.rst
> @@ -314,13 +314,8 @@ struct napi_struct synchronization rules

[...]

> +struct ethtool_ops
> +------------------
> +
> +Similarly to ``ndos`` the instance lock is only held for select drivers.
> +For "ops locked" drivers all ethtool ops without an exception should
> +be called under the instance lock.

Extreme nit (which you can ignore): "without an exception" read
oddly to me. Did you mean "without exception" ?

At any rate:

Reviewed-by: Joe Damato <jdamato@fastly.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers
  2025-04-07 19:01 ` [PATCH net-next 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers Jakub Kicinski
@ 2025-04-08  2:37   ` Joe Damato
  0 siblings, 0 replies; 20+ messages in thread
From: Joe Damato @ 2025-04-08  2:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu

On Mon, Apr 07, 2025 at 12:01:17PM -0700, Jakub Kicinski wrote:
> We mostly needed rtnl_lock in qstat to make sure the queue count
> is stable while we work. For "ops locked" drivers the instance
> lock protects the queue count, so we don't have to take rtnl_lock.
> 
> For currently ops-locked drivers: netdevsim and bnxt need
> the protection from netdev going down while we dump, which
> instance lock provides. gve doesn't care.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  Documentation/networking/netdevices.rst |  6 +++++
>  include/net/netdev_queues.h             |  4 +++-
>  net/core/netdev-genl.c                  | 29 +++++++++++++++----------
>  3 files changed, 26 insertions(+), 13 deletions(-)

Reviewed-by: Joe Damato <jdamato@fastly.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch
  2025-04-07 19:01 ` [PATCH net-next 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch Jakub Kicinski
@ 2025-04-08  2:40   ` Joe Damato
  0 siblings, 0 replies; 20+ messages in thread
From: Joe Damato @ 2025-04-08  2:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu

On Mon, Apr 07, 2025 at 12:01:10PM -0700, Jakub Kicinski wrote:
> netdev_get_by_index_lock() performs following steps:
> 
>   rcu_lock();
>   dev = lookup(netns, ifindex);
>   dev_get(dev);
>   rcu_unlock();
>   [... lock & validate the dev ...]
>   return dev
> 
> Validation right now only checks if the device is registered but since
> the lookup is netns-aware we must also protect against the device
> switching netns right after we dropped the RCU lock. Otherwise
> the caller in netns1 may get a pointer to a device which has just
> switched to netns2.
> 
> We can't hold the lock for the entire netns change process (because of
> the NETDEV_UNREGISTER notifier), and there's no existing marking to
> indicate that the netns is unlisted because of netns move, so add one.
> 
> AFAIU none of the existing netdev_get_by_index_lock() callers can
> suffer from this problem (NAPI code double checks the netns membership
> and other callers are either under rtnl_lock or not ns-sensitive),
> so this patch does not have to be treated as a fix.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  include/linux/netdevice.h |  6 +++++-
>  net/core/dev.h            |  2 +-
>  net/core/dev.c            | 25 ++++++++++++++++++-------
>  3 files changed, 24 insertions(+), 9 deletions(-)

Reviewed-by: Joe Damato <jdamato@fastly.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/8] netdev: add "ops compat locking" helpers
  2025-04-07 19:01 ` [PATCH net-next 3/8] netdev: add "ops compat locking" helpers Jakub Kicinski
@ 2025-04-08  2:41   ` Joe Damato
  0 siblings, 0 replies; 20+ messages in thread
From: Joe Damato @ 2025-04-08  2:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu

On Mon, Apr 07, 2025 at 12:01:12PM -0700, Jakub Kicinski wrote:
> Add helpers to "lock a netdev in a backward-compatible way",
> which for ops-locked netdevs will mean take the instance lock.
> For drivers which haven't opted into the ops locking we'll take
> rtnl_lock.
> 
> The scoped foreach is dropping and re-taking the lock for each
> device, even if prev and next are both under rtnl_lock.
> I hope that's fine since we expect that netdev nl to be mostly
> supported by modern drivers, and modern drivers should also
> opt into the instance locking.
> 
> Note that these helpers are mostly needed for queue related state,
> because drivers modify queue config in their ops in a non-atomic
> way. Or differently put, queue changes don't have a clear-cut API
> like NAPI configuration. Any state that can should just use the
> instance lock directly, not the "compat" hacks.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  include/net/netdev_lock.h | 16 ++++++++++++
>  net/core/dev.h            | 15 ++++++++++++
>  net/core/dev.c            | 51 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 82 insertions(+)
> 

Reviewed-by: Joe Damato <jdamato@fastly.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] net: designate XSK pool pointers in queues as "ops protected"
  2025-04-08  2:17   ` Joe Damato
@ 2025-04-08 14:59     ` Jakub Kicinski
  0 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2025-04-08 14:59 UTC (permalink / raw)
  To: Joe Damato
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu

On Mon, 7 Apr 2025 19:17:36 -0700 Joe Damato wrote:
> > -	xp_clear_dev(pool);
> > +	if (netdev) {
> > +		netdev_lock_ops(netdev);
> > +		xp_clear_dev(pool);
> > +		netdev_unlock_ops(netdev);
> > +	}
> >  	rtnl_unlock();  
> 
> Is it actually possible for netdev to be NULL here?

So I've been told in v1 review :) I should have probably linked to
previous postings, these patches had been a part of various series
before.

The code is indeed buggy, tho, we need to move the netdev = pool->netdev
assignment under rtnl_lock..

> I feel like it probably isn't, but if it were possible we'd need an
> else case here to xp_clear_dev(pool) without the netdev_lock_ops?

It does nothing when netdev is null. Actually, due to "other changes"
all callers of xp_clear_dev() are now wrapped in the lock. We can
move the locking inside that helper.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 7/8] docs: netdev: break down the instance locking info per ops struct
  2025-04-08  2:34   ` Joe Damato
@ 2025-04-08 15:08     ` Jakub Kicinski
  0 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2025-04-08 15:08 UTC (permalink / raw)
  To: Joe Damato
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
	hramamurthy, kuniyu

On Mon, 7 Apr 2025 19:34:34 -0700 Joe Damato wrote:
> > +Similarly to ``ndos`` the instance lock is only held for select drivers.
> > +For "ops locked" drivers all ethtool ops without an exception should
> > +be called under the instance lock.  
> 
> Extreme nit (which you can ignore): "without an exception" read
> oddly to me. Did you mean "without exception" ?

I guess I'll never understand indefinite articles.
Let me change to "without exceptions".
AFAIU plural forms are a get out of jail free card..

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-04-08 15:08 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-07 19:01 [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Jakub Kicinski
2025-04-07 19:01 ` [PATCH net-next 1/8] net: avoid potential race between netdev_get_by_index_lock() and netns switch Jakub Kicinski
2025-04-08  2:40   ` Joe Damato
2025-04-07 19:01 ` [PATCH net-next 2/8] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
2025-04-08  2:17   ` Joe Damato
2025-04-08 14:59     ` Jakub Kicinski
2025-04-07 19:01 ` [PATCH net-next 3/8] netdev: add "ops compat locking" helpers Jakub Kicinski
2025-04-08  2:41   ` Joe Damato
2025-04-07 19:01 ` [PATCH net-next 4/8] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
2025-04-08  2:28   ` Joe Damato
2025-04-07 19:01 ` [PATCH net-next 5/8] xdp: double protect netdev->xdp_flags with netdev->lock Jakub Kicinski
2025-04-08  2:30   ` Joe Damato
2025-04-07 19:01 ` [PATCH net-next 6/8] netdev: depend on netdev->lock for xdp features Jakub Kicinski
2025-04-08  2:31   ` Joe Damato
2025-04-07 19:01 ` [PATCH net-next 7/8] docs: netdev: break down the instance locking info per ops struct Jakub Kicinski
2025-04-08  2:34   ` Joe Damato
2025-04-08 15:08     ` Jakub Kicinski
2025-04-07 19:01 ` [PATCH net-next 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers Jakub Kicinski
2025-04-08  2:37   ` Joe Damato
2025-04-08  0:28 ` [PATCH net-next 0/8] net: depend on instance lock for queue related netlink ops Stanislav Fomichev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).