* [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET
@ 2025-03-24 22:45 Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 01/11] net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf() Jakub Kicinski
` (12 more replies)
0 siblings, 13 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Skip taking rtnl_lock for queue GET ops on devices which opt
into running all ops under the instance lock.
This fixes and completes Stan's ops-locking work, so I think
for sanity / ease of backporting fixes we should merge it for
v6.15.
v2:
- rebase
- only clear XSK if netdev still set
v1: https://lore.kernel.org/20250312223507.805719-1-kuba@kernel.org
Jakub Kicinski (11):
net: bubble up taking netdev instance lock to callers of
net_devmem_unbind_dmabuf()
net: remove netif_set_real_num_rx_queues() helper for when SYSFS=n
net: constify dev pointer in misc instance lock helpers
net: explain "protection types" for the instance lock
net: designate queue counts as "double ops protected" by instance lock
net: designate queue -> napi linking as "ops protected"
net: protect rxq->mp_params with the instance lock
net: make NETDEV_UNREGISTER and instance lock more consistent
net: designate XSK pool pointers in queues as "ops protected"
netdev: add "ops compat locking" helpers
netdev: don't hold rtnl_lock over nl queue info get when possible
include/linux/netdevice.h | 41 ++++++++++++---------
include/net/netdev_lock.h | 36 ++++++++++++++++--
include/net/netdev_rx_queue.h | 6 +--
net/core/dev.h | 15 ++++++++
net/core/dev.c | 69 +++++++++++++++++++++++++++++++----
net/core/devmem.c | 2 -
net/core/net-sysfs.c | 2 +
net/core/netdev-genl.c | 27 ++++++++------
net/core/netdev_rx_queue.c | 3 ++
net/core/page_pool.c | 7 +---
net/xdp/xsk_buff_pool.c | 7 +++-
11 files changed, 165 insertions(+), 50 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH net-next v2 01/11] net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf()
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
@ 2025-03-24 22:45 ` Jakub Kicinski
2025-03-25 5:19 ` Mina Almasry
2025-03-24 22:45 ` [PATCH net-next v2 02/11] net: remove netif_set_real_num_rx_queues() helper for when SYSFS=n Jakub Kicinski
` (11 subsequent siblings)
12 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
A recent commit added taking the netdev instance lock
in netdev_nl_bind_rx_doit(), but didn't remove it in
net_devmem_unbind_dmabuf() which it calls from an error path.
Always expect the callers of net_devmem_unbind_dmabuf() to
hold the lock. This is consistent with net_devmem_bind_dmabuf().
(Not so) coincidentally this also protects mp_param with the instance
lock, which the rest of this series needs.
Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- rebase on a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()")
v1: https://lore.kernel.org/20250312223507.805719-2-kuba@kernel.org
---
net/core/devmem.c | 2 --
net/core/netdev-genl.c | 2 ++
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 6802e82a4d03..ee145a2aa41c 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -128,12 +128,10 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
rxq->mp_params.mp_priv = NULL;
rxq->mp_params.mp_ops = NULL;
- netdev_lock(binding->dev);
rxq_idx = get_netdev_rx_queue_index(rxq);
err = netdev_rx_queue_restart(binding->dev, rxq_idx);
WARN_ON(err && err != -ENETDOWN);
- netdev_unlock(binding->dev);
}
xa_erase(&net_devmem_dmabuf_bindings, binding->id);
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index a186fea63c09..9e4882a22407 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -947,7 +947,9 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
mutex_lock(&priv->lock);
list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
+ netdev_lock(binding->dev);
net_devmem_unbind_dmabuf(binding);
+ netdev_unlock(binding->dev);
}
mutex_unlock(&priv->lock);
}
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next v2 02/11] net: remove netif_set_real_num_rx_queues() helper for when SYSFS=n
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 01/11] net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf() Jakub Kicinski
@ 2025-03-24 22:45 ` Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 03/11] net: constify dev pointer in misc instance lock helpers Jakub Kicinski
` (10 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Since commit a953be53ce40 ("net-sysfs: add support for device-specific
rx queue sysfs attributes"), so for at least a decade now it is safe
to call net_rx_queue_update_kobjects() when SYSFS=n. That function
does its own ifdef-inery and will return 0. Remove the unnecessary
stub for netif_set_real_num_rx_queues().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/linux/netdevice.h | 10 ----------
net/core/dev.c | 2 --
2 files changed, 12 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f22cca7c03ad..55859c565f84 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4062,17 +4062,7 @@ static inline bool netif_is_multiqueue(const struct net_device *dev)
}
int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq);
-
-#ifdef CONFIG_SYSFS
int netif_set_real_num_rx_queues(struct net_device *dev, unsigned int rxq);
-#else
-static inline int netif_set_real_num_rx_queues(struct net_device *dev,
- unsigned int rxqs)
-{
- dev->real_num_rx_queues = rxqs;
- return 0;
-}
-#endif
int netif_set_real_num_queues(struct net_device *dev,
unsigned int txq, unsigned int rxq);
diff --git a/net/core/dev.c b/net/core/dev.c
index bcf81c3ff6a3..1a6e62792bb5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3160,7 +3160,6 @@ int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
}
EXPORT_SYMBOL(netif_set_real_num_tx_queues);
-#ifdef CONFIG_SYSFS
/**
* netif_set_real_num_rx_queues - set actual number of RX queues used
* @dev: Network device
@@ -3191,7 +3190,6 @@ int netif_set_real_num_rx_queues(struct net_device *dev, unsigned int rxq)
return 0;
}
EXPORT_SYMBOL(netif_set_real_num_rx_queues);
-#endif
/**
* netif_set_real_num_queues - set actual number of RX and TX queues used
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next v2 03/11] net: constify dev pointer in misc instance lock helpers
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 01/11] net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf() Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 02/11] net: remove netif_set_real_num_rx_queues() helper for when SYSFS=n Jakub Kicinski
@ 2025-03-24 22:45 ` Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 04/11] net: explain "protection types" for the instance lock Jakub Kicinski
` (9 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
lockdep asserts and predicates can operate on const pointers.
In the future this will let us add asserts in functions
which operate on const pointers like dev_get_min_mp_channel_count().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/net/netdev_lock.h | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/net/netdev_lock.h b/include/net/netdev_lock.h
index 99631fbd7f54..689ffdfae50d 100644
--- a/include/net/netdev_lock.h
+++ b/include/net/netdev_lock.h
@@ -11,19 +11,20 @@ static inline bool netdev_trylock(struct net_device *dev)
return mutex_trylock(&dev->lock);
}
-static inline void netdev_assert_locked(struct net_device *dev)
+static inline void netdev_assert_locked(const struct net_device *dev)
{
lockdep_assert_held(&dev->lock);
}
-static inline void netdev_assert_locked_or_invisible(struct net_device *dev)
+static inline void
+netdev_assert_locked_or_invisible(const struct net_device *dev)
{
if (dev->reg_state == NETREG_REGISTERED ||
dev->reg_state == NETREG_UNREGISTERING)
netdev_assert_locked(dev);
}
-static inline bool netdev_need_ops_lock(struct net_device *dev)
+static inline bool netdev_need_ops_lock(const struct net_device *dev)
{
bool ret = dev->request_ops_lock || !!dev->queue_mgmt_ops;
@@ -46,7 +47,7 @@ static inline void netdev_unlock_ops(struct net_device *dev)
netdev_unlock(dev);
}
-static inline void netdev_ops_assert_locked(struct net_device *dev)
+static inline void netdev_ops_assert_locked(const struct net_device *dev)
{
if (netdev_need_ops_lock(dev))
lockdep_assert_held(&dev->lock);
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next v2 04/11] net: explain "protection types" for the instance lock
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (2 preceding siblings ...)
2025-03-24 22:45 ` [PATCH net-next v2 03/11] net: constify dev pointer in misc instance lock helpers Jakub Kicinski
@ 2025-03-24 22:45 ` Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 05/11] net: designate queue counts as "double ops protected" by " Jakub Kicinski
` (8 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Try to define some terminology for which fields are protected
by which lock and how. Some fields are protected by both rtnl_lock
and instance lock which is hard to talk about without having
a "key phrase" to refer to a particular protection scheme.
"ops protected" fields are defined later in the series, one by one.
Add ASSERT_RTNL() to netdev_ops_assert_locked() for drivers
not other instance protection of ops. Hopefully it's not too
confusion that netdev_lock_ops() does not match the lock which
netdev_ops_assert_locked() will assert, exactly. The noun "ops"
is in a different place in the name, so I think it's acceptable...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/linux/netdevice.h | 24 ++++++++++++++++++------
include/net/netdev_lock.h | 3 +++
2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 55859c565f84..09773e5c109a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2496,19 +2496,31 @@ struct net_device {
* Should always be taken using netdev_lock() / netdev_unlock() helpers.
* Drivers are free to use it for other protection.
*
- * Protects:
+ * For the drivers that implement shaper or queue API, the scope
+ * of this lock is expanded to cover most ndo/queue/ethtool/sysfs
+ * operations. Drivers may opt-in to this behavior by setting
+ * @request_ops_lock.
+ *
+ * @lock protection mixes with rtnl_lock in multiple ways, fields are
+ * either:
+ * - simply protected by the instance @lock;
+ * - double protected - writers hold both locks, readers hold either;
+ * - ops protected - protected by the lock held around the NDOs
+ * and other callbacks, that is the instance lock on devices for
+ * which netdev_need_ops_lock() returns true, otherwise by rtnl_lock;
+ * - double ops protected - always protected by rtnl_lock but for
+ * devices for which netdev_need_ops_lock() returns true - also
+ * the instance lock.
+ *
+ * Simply protects:
* @gro_flush_timeout, @napi_defer_hard_irqs, @napi_list,
* @net_shaper_hierarchy, @reg_state, @threaded
*
- * Partially protects (writers must hold both @lock and rtnl_lock):
+ * Double protects:
* @up
*
* Also protects some fields in struct napi_struct.
*
- * For the drivers that implement shaper or queue API, the scope
- * of this lock is expanded to cover most ndo/queue/ethtool/sysfs
- * operations.
- *
* Ordering: take after rtnl_lock.
*/
struct mutex lock;
diff --git a/include/net/netdev_lock.h b/include/net/netdev_lock.h
index 689ffdfae50d..efd302375ef2 100644
--- a/include/net/netdev_lock.h
+++ b/include/net/netdev_lock.h
@@ -5,6 +5,7 @@
#include <linux/lockdep.h>
#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
static inline bool netdev_trylock(struct net_device *dev)
{
@@ -51,6 +52,8 @@ static inline void netdev_ops_assert_locked(const struct net_device *dev)
{
if (netdev_need_ops_lock(dev))
lockdep_assert_held(&dev->lock);
+ else
+ ASSERT_RTNL();
}
static inline int netdev_lock_cmp_fn(const struct lockdep_map *a,
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next v2 05/11] net: designate queue counts as "double ops protected" by instance lock
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (3 preceding siblings ...)
2025-03-24 22:45 ` [PATCH net-next v2 04/11] net: explain "protection types" for the instance lock Jakub Kicinski
@ 2025-03-24 22:45 ` Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 06/11] net: designate queue -> napi linking as "ops protected" Jakub Kicinski
` (7 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Drivers which opt into instance lock protection of ops should
only call set_real_num_*_queues() under the instance lock.
This means that queue counts are double protected (writes
are under both rtnl_lock and instance lock, readers under
either).
Some readers may still be under the rtnl_lock, however, so for
now we need double protection of writers.
OTOH queue API paths are only under the protection of the instance
lock, so we need to validate that the instance is actually locking
ops, otherwise the input checks we do against queue count are racy.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/linux/netdevice.h | 3 +++
net/core/dev.c | 2 ++
net/core/net-sysfs.c | 2 ++
net/core/netdev-genl.c | 7 +++++++
net/core/netdev_rx_queue.c | 3 +++
5 files changed, 17 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 09773e5c109a..3dbce9cd3f5a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2519,6 +2519,9 @@ struct net_device {
* Double protects:
* @up
*
+ * Double ops protects:
+ * @real_num_rx_queues, @real_num_tx_queues
+ *
* Also protects some fields in struct napi_struct.
*
* Ordering: take after rtnl_lock.
diff --git a/net/core/dev.c b/net/core/dev.c
index 1a6e62792bb5..aa99c91fd68f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3130,6 +3130,7 @@ int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
if (dev->reg_state == NETREG_REGISTERED ||
dev->reg_state == NETREG_UNREGISTERING) {
ASSERT_RTNL();
+ netdev_ops_assert_locked(dev);
rc = netdev_queue_update_kobjects(dev, dev->real_num_tx_queues,
txq);
@@ -3179,6 +3180,7 @@ int netif_set_real_num_rx_queues(struct net_device *dev, unsigned int rxq)
if (dev->reg_state == NETREG_REGISTERED) {
ASSERT_RTNL();
+ netdev_ops_assert_locked(dev);
rc = net_rx_queue_update_kobjects(dev, dev->real_num_rx_queues,
rxq);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index abaa1c919b98..fa8c8c364a9a 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -2148,8 +2148,10 @@ static void remove_queue_kobjects(struct net_device *dev)
net_rx_queue_update_kobjects(dev, real_rx, 0);
netdev_queue_update_kobjects(dev, real_tx, 0);
+ netdev_lock_ops(dev);
dev->real_num_rx_queues = 0;
dev->real_num_tx_queues = 0;
+ netdev_unlock_ops(dev);
#ifdef CONFIG_SYSFS
kset_unregister(dev->queues_kset);
#endif
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 9e4882a22407..fd1cfa9707dc 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -867,6 +867,13 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
goto err_unlock_sock;
}
+ if (!netdev_need_ops_lock(netdev)) {
+ err = -EOPNOTSUPP;
+ NL_SET_BAD_ATTR(info->extack,
+ info->attrs[NETDEV_A_DEV_IFINDEX]);
+ goto err_unlock;
+ }
+
if (dev_xdp_prog_count(netdev)) {
NL_SET_ERR_MSG(info->extack, "unable to bind dmabuf to device with XDP program attached");
err = -EEXIST;
diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
index a5b234b33cd5..3af716f77a13 100644
--- a/net/core/netdev_rx_queue.c
+++ b/net/core/netdev_rx_queue.c
@@ -92,6 +92,9 @@ static int __net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
struct netdev_rx_queue *rxq;
int ret;
+ if (!netdev_need_ops_lock(dev))
+ return -EOPNOTSUPP;
+
if (ifq_idx >= dev->real_num_rx_queues)
return -EINVAL;
ifq_idx = array_index_nospec(ifq_idx, dev->real_num_rx_queues);
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next v2 06/11] net: designate queue -> napi linking as "ops protected"
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (4 preceding siblings ...)
2025-03-24 22:45 ` [PATCH net-next v2 05/11] net: designate queue counts as "double ops protected" by " Jakub Kicinski
@ 2025-03-24 22:45 ` Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 07/11] net: protect rxq->mp_params with the instance lock Jakub Kicinski
` (6 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
netdev netlink is the only reader of netdev_{,rx_}queue->napi,
and it already holds netdev->lock. Switch protection of
the writes to netdev->lock to "ops protected".
The expectation will be now that accessing queue->napi
will require netdev->lock for "ops locked" drivers, and
rtnl_lock for all other drivers.
Current "ops locked" drivers don't require any changes.
gve and netdevsim use _locked() helpers right next to
netif_queue_set_napi() so they must be holding the instance
lock. iavf doesn't call it. bnxt is a bit messy but all paths
seem locked.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/linux/netdevice.h | 5 +++--
include/net/netdev_lock.h | 8 ++++++++
include/net/netdev_rx_queue.h | 2 +-
net/core/dev.c | 3 +--
4 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3dbce9cd3f5a..fd508e9f148c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -710,7 +710,7 @@ struct netdev_queue {
* slow- / control-path part
*/
/* NAPI instance for the queue
- * Readers and writers must hold RTNL
+ * "ops protected", see comment about net_device::lock
*/
struct napi_struct *napi;
@@ -2522,7 +2522,8 @@ struct net_device {
* Double ops protects:
* @real_num_rx_queues, @real_num_tx_queues
*
- * Also protects some fields in struct napi_struct.
+ * Also protects some fields in:
+ * struct napi_struct, struct netdev_queue, struct netdev_rx_queue
*
* Ordering: take after rtnl_lock.
*/
diff --git a/include/net/netdev_lock.h b/include/net/netdev_lock.h
index efd302375ef2..1c0c9a94cc22 100644
--- a/include/net/netdev_lock.h
+++ b/include/net/netdev_lock.h
@@ -56,6 +56,14 @@ static inline void netdev_ops_assert_locked(const struct net_device *dev)
ASSERT_RTNL();
}
+static inline void
+netdev_ops_assert_locked_or_invisible(const struct net_device *dev)
+{
+ if (dev->reg_state == NETREG_REGISTERED ||
+ dev->reg_state == NETREG_UNREGISTERING)
+ netdev_ops_assert_locked(dev);
+}
+
static inline int netdev_lock_cmp_fn(const struct lockdep_map *a,
const struct lockdep_map *b)
{
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index af40842f229d..b2238b551dce 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -24,7 +24,7 @@ struct netdev_rx_queue {
struct xsk_buff_pool *pool;
#endif
/* NAPI instance for the queue
- * Readers and writers must hold RTNL
+ * "ops protected", see comment about net_device::lock
*/
struct napi_struct *napi;
struct pp_memory_provider_params mp_params;
diff --git a/net/core/dev.c b/net/core/dev.c
index aa99c91fd68f..690d46497b2f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6896,8 +6896,7 @@ void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
if (WARN_ON_ONCE(napi && !napi->dev))
return;
- if (dev->reg_state >= NETREG_REGISTERED)
- ASSERT_RTNL();
+ netdev_ops_assert_locked_or_invisible(dev);
switch (type) {
case NETDEV_QUEUE_TYPE_RX:
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next v2 07/11] net: protect rxq->mp_params with the instance lock
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (5 preceding siblings ...)
2025-03-24 22:45 ` [PATCH net-next v2 06/11] net: designate queue -> napi linking as "ops protected" Jakub Kicinski
@ 2025-03-24 22:45 ` Jakub Kicinski
2025-03-25 5:34 ` Mina Almasry
2025-03-24 22:45 ` [PATCH net-next v2 08/11] net: make NETDEV_UNREGISTER and instance lock more consistent Jakub Kicinski
` (5 subsequent siblings)
12 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Ensure that all accesses to mp_params are under the netdev
instance lock. The only change we need is to move
dev_memory_provider_uninstall() under the lock.
Appropriately swap the asserts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/core/dev.c | 4 ++--
net/core/page_pool.c | 7 ++-----
2 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 690d46497b2f..652f2c6f5674 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10353,7 +10353,7 @@ u32 dev_get_min_mp_channel_count(const struct net_device *dev)
{
int i;
- ASSERT_RTNL();
+ netdev_ops_assert_locked(dev);
for (i = dev->real_num_rx_queues - 1; i >= 0; i--)
if (dev->_rx[i].mp_params.mp_priv)
@@ -11957,9 +11957,9 @@ void unregister_netdevice_many_notify(struct list_head *head,
dev_tcx_uninstall(dev);
netdev_lock_ops(dev);
dev_xdp_uninstall(dev);
+ dev_memory_provider_uninstall(dev);
netdev_unlock_ops(dev);
bpf_dev_bound_netdev_unregister(dev);
- dev_memory_provider_uninstall(dev);
netdev_offload_xstats_disable_all(dev);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index acef1fcd8ddc..7745ad924ae2 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -11,6 +11,7 @@
#include <linux/slab.h>
#include <linux/device.h>
+#include <net/netdev_lock.h>
#include <net/netdev_rx_queue.h>
#include <net/page_pool/helpers.h>
#include <net/page_pool/memory_provider.h>
@@ -279,11 +280,7 @@ static int page_pool_init(struct page_pool *pool,
get_device(pool->p.dev);
if (pool->slow.flags & PP_FLAG_ALLOW_UNREADABLE_NETMEM) {
- /* We rely on rtnl_lock()ing to make sure netdev_rx_queue
- * configuration doesn't change while we're initializing
- * the page_pool.
- */
- ASSERT_RTNL();
+ netdev_assert_locked(pool->slow.netdev);
rxq = __netif_get_rx_queue(pool->slow.netdev,
pool->slow.queue_idx);
pool->mp_priv = rxq->mp_params.mp_priv;
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next v2 08/11] net: make NETDEV_UNREGISTER and instance lock more consistent
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (6 preceding siblings ...)
2025-03-24 22:45 ` [PATCH net-next v2 07/11] net: protect rxq->mp_params with the instance lock Jakub Kicinski
@ 2025-03-24 22:45 ` Jakub Kicinski
2025-03-25 12:17 ` Cosmin Ratiu
2025-03-24 22:45 ` [PATCH net-next v2 09/11] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
` (4 subsequent siblings)
12 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
The NETDEV_UNREGISTER notifier gets called under the ops lock
when device changes namespace but not during real unregistration.
Take it consistently, XSK tries to poke at netdev queue state
from this notifier.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/core/dev.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 652f2c6f5674..7bd8bd82f66f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1848,7 +1848,9 @@ static void call_netdevice_unregister_notifiers(struct notifier_block *nb,
dev);
call_netdevice_notifier(nb, NETDEV_DOWN, dev);
}
+ netdev_lock_ops(dev);
call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
+ netdev_unlock_ops(dev);
}
static int call_netdevice_register_net_notifiers(struct notifier_block *nb,
@@ -11174,8 +11176,11 @@ static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
rtnl_lock();
/* Rebroadcast unregister notification */
- list_for_each_entry(dev, list, todo_list)
+ list_for_each_entry(dev, list, todo_list) {
+ netdev_lock_ops(dev);
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+ netdev_unlock_ops(dev);
+ }
__rtnl_unlock();
rcu_barrier();
@@ -11966,7 +11971,9 @@ void unregister_netdevice_many_notify(struct list_head *head,
/* Notify protocols, that we are about to destroy
* this device. They should clean all the things.
*/
+ netdev_lock_ops(dev);
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+ netdev_unlock_ops(dev);
if (!dev->rtnl_link_ops ||
dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next v2 09/11] net: designate XSK pool pointers in queues as "ops protected"
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (7 preceding siblings ...)
2025-03-24 22:45 ` [PATCH net-next v2 08/11] net: make NETDEV_UNREGISTER and instance lock more consistent Jakub Kicinski
@ 2025-03-24 22:45 ` Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 10/11] netdev: add "ops compat locking" helpers Jakub Kicinski
` (3 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Read accesses go via xsk_get_pool_from_qid(), the call coming
from the core and gve look safe (other "ops locked" drivers
don't support XSK).
Write accesses go via xsk_reg_pool_at_qid() and xsk_clear_pool_at_qid().
Former is already under the ops lock, latter needs to be locked when
coming from the workqueue via xp_clear_dev().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- only clear if netdev still set
v1: https://lore.kernel.org/20250312223507.805719-10-kuba@kernel.org
---
include/linux/netdevice.h | 1 +
include/net/netdev_rx_queue.h | 6 +++---
net/xdp/xsk_buff_pool.c | 7 ++++++-
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fd508e9f148c..15f8b596da66 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -688,6 +688,7 @@ struct netdev_queue {
/* Subordinate device that the queue has been assigned to */
struct net_device *sb_dev;
#ifdef CONFIG_XDP_SOCKETS
+ /* "ops protected", see comment about net_device::lock */
struct xsk_buff_pool *pool;
#endif
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index b2238b551dce..8cdcd138b33f 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -20,12 +20,12 @@ struct netdev_rx_queue {
struct net_device *dev;
netdevice_tracker dev_tracker;
+ /* All fields below are "ops protected",
+ * see comment about net_device::lock
+ */
#ifdef CONFIG_XDP_SOCKETS
struct xsk_buff_pool *pool;
#endif
- /* NAPI instance for the queue
- * "ops protected", see comment about net_device::lock
- */
struct napi_struct *napi;
struct pp_memory_provider_params mp_params;
} ____cacheline_aligned_in_smp;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 25a76c5ce0f1..c7e50fd86c6a 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -279,9 +279,14 @@ static void xp_release_deferred(struct work_struct *work)
{
struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool,
work);
+ struct net_device *netdev = pool->netdev;
rtnl_lock();
- xp_clear_dev(pool);
+ if (netdev) {
+ netdev_lock_ops(netdev);
+ xp_clear_dev(pool);
+ netdev_unlock_ops(netdev);
+ }
rtnl_unlock();
if (pool->fq) {
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next v2 10/11] netdev: add "ops compat locking" helpers
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (8 preceding siblings ...)
2025-03-24 22:45 ` [PATCH net-next v2 09/11] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
@ 2025-03-24 22:45 ` Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 11/11] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
` (2 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Add helpers to "lock a netdev in a backward-compatible way",
which for ops-locked netdevs will mean take the instance lock.
For drivers which haven't opted into the ops locking we'll take
rtnl_lock.
The scoped foreach is dropping and re-taking the lock for each
device, even if prev and next are both under rtnl_lock.
I hope that's fine since we expect that netdev nl to be mostly
supported by modern drivers, and modern drivers should also
opt into the instance locking.
Note that these helpers are mostly needed for queue related state,
because drivers modify queue config in their ops in a non-atomic
way. Or differently put, queue changes don't have a clear-cut API
like NAPI configuration. Any state that can should just use the
instance lock directly, not the "compat" hacks.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/net/netdev_lock.h | 16 +++++++++++++
net/core/dev.h | 15 ++++++++++++
net/core/dev.c | 49 +++++++++++++++++++++++++++++++++++++++
3 files changed, 80 insertions(+)
diff --git a/include/net/netdev_lock.h b/include/net/netdev_lock.h
index 1c0c9a94cc22..76cbf5a449b6 100644
--- a/include/net/netdev_lock.h
+++ b/include/net/netdev_lock.h
@@ -64,6 +64,22 @@ netdev_ops_assert_locked_or_invisible(const struct net_device *dev)
netdev_ops_assert_locked(dev);
}
+static inline void netdev_lock_ops_compat(struct net_device *dev)
+{
+ if (netdev_need_ops_lock(dev))
+ netdev_lock(dev);
+ else
+ rtnl_lock();
+}
+
+static inline void netdev_unlock_ops_compat(struct net_device *dev)
+{
+ if (netdev_need_ops_lock(dev))
+ netdev_unlock(dev);
+ else
+ rtnl_unlock();
+}
+
static inline int netdev_lock_cmp_fn(const struct lockdep_map *a,
const struct lockdep_map *b)
{
diff --git a/net/core/dev.h b/net/core/dev.h
index 7ee203395d8e..c4b645120d72 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -41,6 +41,21 @@ DEFINE_FREE(netdev_unlock, struct net_device *, if (_T) netdev_unlock(_T));
(var_name = netdev_xa_find_lock(net, var_name, &ifindex)); \
ifindex++)
+struct net_device *
+netdev_get_by_index_lock_ops_compat(struct net *net, int ifindex);
+struct net_device *
+netdev_xa_find_lock_ops_compat(struct net *net, struct net_device *dev,
+ unsigned long *index);
+
+DEFINE_FREE(netdev_unlock_ops_compat, struct net_device *,
+ if (_T) netdev_unlock_ops_compat(_T));
+
+#define for_each_netdev_lock_ops_compat_scoped(net, var_name, ifindex) \
+ for (struct net_device *var_name __free(netdev_unlock_ops_compat) = NULL; \
+ (var_name = netdev_xa_find_lock_ops_compat(net, var_name, \
+ &ifindex)); \
+ ifindex++)
+
#ifdef CONFIG_PROC_FS
int __init dev_proc_init(void);
#else
diff --git a/net/core/dev.c b/net/core/dev.c
index 7bd8bd82f66f..01139d8ef840 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1051,6 +1051,18 @@ struct net_device *__netdev_put_lock(struct net_device *dev)
return dev;
}
+static struct net_device *__netdev_put_lock_ops_compat(struct net_device *dev)
+{
+ netdev_lock_ops_compat(dev);
+ if (dev->reg_state > NETREG_REGISTERED) {
+ netdev_unlock_ops_compat(dev);
+ dev_put(dev);
+ return NULL;
+ }
+ dev_put(dev);
+ return dev;
+}
+
/**
* netdev_get_by_index_lock() - find a device by its ifindex
* @net: the applicable net namespace
@@ -1073,6 +1085,18 @@ struct net_device *netdev_get_by_index_lock(struct net *net, int ifindex)
return __netdev_put_lock(dev);
}
+struct net_device *
+netdev_get_by_index_lock_ops_compat(struct net *net, int ifindex)
+{
+ struct net_device *dev;
+
+ dev = dev_get_by_index(net, ifindex);
+ if (!dev)
+ return NULL;
+
+ return __netdev_put_lock_ops_compat(dev);
+}
+
struct net_device *
netdev_xa_find_lock(struct net *net, struct net_device *dev,
unsigned long *index)
@@ -1098,6 +1122,31 @@ netdev_xa_find_lock(struct net *net, struct net_device *dev,
} while (true);
}
+struct net_device *
+netdev_xa_find_lock_ops_compat(struct net *net, struct net_device *dev,
+ unsigned long *index)
+{
+ if (dev)
+ netdev_unlock_ops_compat(dev);
+
+ do {
+ rcu_read_lock();
+ dev = xa_find(&net->dev_by_index, index, ULONG_MAX, XA_PRESENT);
+ if (!dev) {
+ rcu_read_unlock();
+ return NULL;
+ }
+ dev_hold(dev);
+ rcu_read_unlock();
+
+ dev = __netdev_put_lock_ops_compat(dev);
+ if (dev)
+ return dev;
+
+ (*index)++;
+ } while (true);
+}
+
static DEFINE_SEQLOCK(netdev_rename_lock);
void netdev_copy_name(struct net_device *dev, char *name)
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next v2 11/11] netdev: don't hold rtnl_lock over nl queue info get when possible
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (9 preceding siblings ...)
2025-03-24 22:45 ` [PATCH net-next v2 10/11] netdev: add "ops compat locking" helpers Jakub Kicinski
@ 2025-03-24 22:45 ` Jakub Kicinski
2025-03-25 4:04 ` [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Stanislav Fomichev
2025-03-25 17:30 ` patchwork-bot+netdevbpf
12 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-24 22:45 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Netdev queue dump accesses: NAPI, memory providers, XSk pointers.
All three are "ops protected" now, switch to the op compat locking.
rtnl lock does not have to be taken for "ops locked" devices.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/core/netdev-genl.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index fd1cfa9707dc..39f52a311f07 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -481,18 +481,15 @@ int netdev_nl_queue_get_doit(struct sk_buff *skb, struct genl_info *info)
if (!rsp)
return -ENOMEM;
- rtnl_lock();
-
- netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
+ netdev = netdev_get_by_index_lock_ops_compat(genl_info_net(info),
+ ifindex);
if (netdev) {
err = netdev_nl_queue_fill(rsp, netdev, q_id, q_type, info);
- netdev_unlock(netdev);
+ netdev_unlock_ops_compat(netdev);
} else {
err = -ENODEV;
}
- rtnl_unlock();
-
if (err)
goto err_free_msg;
@@ -541,17 +538,17 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
if (info->attrs[NETDEV_A_QUEUE_IFINDEX])
ifindex = nla_get_u32(info->attrs[NETDEV_A_QUEUE_IFINDEX]);
- rtnl_lock();
if (ifindex) {
- netdev = netdev_get_by_index_lock(net, ifindex);
+ netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
if (netdev) {
err = netdev_nl_queue_dump_one(netdev, skb, info, ctx);
- netdev_unlock(netdev);
+ netdev_unlock_ops_compat(netdev);
} else {
err = -ENODEV;
}
} else {
- for_each_netdev_lock_scoped(net, netdev, ctx->ifindex) {
+ for_each_netdev_lock_ops_compat_scoped(net, netdev,
+ ctx->ifindex) {
err = netdev_nl_queue_dump_one(netdev, skb, info, ctx);
if (err < 0)
break;
@@ -559,7 +556,6 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
ctx->txq_idx = 0;
}
}
- rtnl_unlock();
return err;
}
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (10 preceding siblings ...)
2025-03-24 22:45 ` [PATCH net-next v2 11/11] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
@ 2025-03-25 4:04 ` Stanislav Fomichev
2025-03-25 17:30 ` patchwork-bot+netdevbpf
12 siblings, 0 replies; 19+ messages in thread
From: Stanislav Fomichev @ 2025-03-25 4:04 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf
On 03/24, Jakub Kicinski wrote:
> Skip taking rtnl_lock for queue GET ops on devices which opt
> into running all ops under the instance lock.
>
> This fixes and completes Stan's ops-locking work, so I think
> for sanity / ease of backporting fixes we should merge it for
> v6.15.
>
> v2:
> - rebase
> - only clear XSK if netdev still set
> v1: https://lore.kernel.org/20250312223507.805719-1-kuba@kernel.org
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 01/11] net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf()
2025-03-24 22:45 ` [PATCH net-next v2 01/11] net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf() Jakub Kicinski
@ 2025-03-25 5:19 ` Mina Almasry
0 siblings, 0 replies; 19+ messages in thread
From: Mina Almasry @ 2025-03-25 5:19 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf
On Mon, Mar 24, 2025 at 3:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> A recent commit added taking the netdev instance lock
> in netdev_nl_bind_rx_doit(), but didn't remove it in
> net_devmem_unbind_dmabuf() which it calls from an error path.
> Always expect the callers of net_devmem_unbind_dmabuf() to
> hold the lock. This is consistent with net_devmem_bind_dmabuf().
>
> (Not so) coincidentally this also protects mp_param with the instance
> lock, which the rest of this series needs.
>
> Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sorry for not noticing this earlier.
Reviewed-by: Mina Almasry <almasrymina@google.com>
--
Thanks,
Mina
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 07/11] net: protect rxq->mp_params with the instance lock
2025-03-24 22:45 ` [PATCH net-next v2 07/11] net: protect rxq->mp_params with the instance lock Jakub Kicinski
@ 2025-03-25 5:34 ` Mina Almasry
2025-03-25 9:50 ` Jakub Kicinski
0 siblings, 1 reply; 19+ messages in thread
From: Mina Almasry @ 2025-03-25 5:34 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf
On Mon, Mar 24, 2025 at 3:47 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Ensure that all accesses to mp_params are under the netdev
> instance lock. The only change we need is to move
> dev_memory_provider_uninstall() under the lock.
>
> Appropriately swap the asserts.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
> ---
> net/core/dev.c | 4 ++--
> net/core/page_pool.c | 7 ++-----
> 2 files changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 690d46497b2f..652f2c6f5674 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -10353,7 +10353,7 @@ u32 dev_get_min_mp_channel_count(const struct net_device *dev)
> {
> int i;
>
> - ASSERT_RTNL();
> + netdev_ops_assert_locked(dev);
>
> for (i = dev->real_num_rx_queues - 1; i >= 0; i--)
> if (dev->_rx[i].mp_params.mp_priv)
> @@ -11957,9 +11957,9 @@ void unregister_netdevice_many_notify(struct list_head *head,
> dev_tcx_uninstall(dev);
> netdev_lock_ops(dev);
> dev_xdp_uninstall(dev);
> + dev_memory_provider_uninstall(dev);
> netdev_unlock_ops(dev);
> bpf_dev_bound_netdev_unregister(dev);
> - dev_memory_provider_uninstall(dev);
So initially I thought this may be wrong because netdev_lock_ops()
only locks if there are queue_mgmt_ops, but access to mp_params should
be locked anyway. But I guess you're relying on the fact that if the
device doesn't support queue_mgmt_ops memory providers don't work
anyway.
--
Thanks,
Mina
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 07/11] net: protect rxq->mp_params with the instance lock
2025-03-25 5:34 ` Mina Almasry
@ 2025-03-25 9:50 ` Jakub Kicinski
0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-25 9:50 UTC (permalink / raw)
To: Mina Almasry; +Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf
On Mon, 24 Mar 2025 22:34:43 -0700 Mina Almasry wrote:
> > @@ -11957,9 +11957,9 @@ void unregister_netdevice_many_notify(struct list_head *head,
> > dev_tcx_uninstall(dev);
> > netdev_lock_ops(dev);
> > dev_xdp_uninstall(dev);
> > + dev_memory_provider_uninstall(dev);
> > netdev_unlock_ops(dev);
> > bpf_dev_bound_netdev_unregister(dev);
> > - dev_memory_provider_uninstall(dev);
>
> So initially I thought this may be wrong because netdev_lock_ops()
> only locks if there are queue_mgmt_ops, but access to mp_params should
> be locked anyway. But I guess you're relying on the fact that if the
> device doesn't support queue_mgmt_ops memory providers don't work
> anyway.
Right, my expectation is that they must be NULL if device is not
ops-locked. Not sure if that's what textbooks would consider "correct"
but I think KCSAN will not complain :)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 08/11] net: make NETDEV_UNREGISTER and instance lock more consistent
2025-03-24 22:45 ` [PATCH net-next v2 08/11] net: make NETDEV_UNREGISTER and instance lock more consistent Jakub Kicinski
@ 2025-03-25 12:17 ` Cosmin Ratiu
2025-03-25 17:04 ` Jakub Kicinski
0 siblings, 1 reply; 19+ messages in thread
From: Cosmin Ratiu @ 2025-03-25 12:17 UTC (permalink / raw)
To: kuba@kernel.org, davem@davemloft.net, sdf@fomichev.me
Cc: horms@kernel.org, edumazet@google.com, netdev@vger.kernel.org,
andrew+netdev@lunn.ch, pabeni@redhat.com
On Mon, 2025-03-24 at 15:45 -0700, Jakub Kicinski wrote:
> The NETDEV_UNREGISTER notifier gets called under the ops lock
> when device changes namespace but not during real unregistration.
> Take it consistently, XSK tries to poke at netdev queue state
> from this notifier.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> net/core/dev.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 652f2c6f5674..7bd8bd82f66f 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1848,7 +1848,9 @@ static void
> call_netdevice_unregister_notifiers(struct notifier_block *nb,
> dev);
> call_netdevice_notifier(nb, NETDEV_DOWN, dev);
> }
> + netdev_lock_ops(dev);
> call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
> + netdev_unlock_ops(dev);
> }
This introduces a potential deadlock when changing a device namespace:
do_setlink already locks the instance lock and
call_netdevice_unregister_notifiers will then deadlock.
============================================
WARNING: possible recursive locking detected
6.14.0-rc6+ #143 Not tainted
--------------------------------------------
ip/2459 is trying to acquire lock:
ffff888113ca8c80 (&dev->lock){+.+.}-{4:4}, at:
call_netdevice_unregister_notifiers+0x73/0x110
but task is already holding lock:
ffff888155a10c80 (&dev->lock){+.+.}-{4:4}, at:
do_setlink.isra.0+0x5b/0x1220
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&dev->lock);
lock(&dev->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by ip/2459:
#0: ffffffff82df18c8 (rtnl_mutex){+.+.}-{4:4}, at:
rtnl_newlink+0x35d/0xb50
#1: ffff888155a10c80 (&dev->lock){+.+.}-{4:4}, at:
do_setlink.isra.0+0x5b/0x1220
Call Trace:
<TASK>
dump_stack_lvl+0x62/0x90
print_deadlock_bug+0x274/0x3b0
__lock_acquire+0x1229/0x2470
lock_acquire+0xb7/0x2b0
__mutex_lock+0xa6/0xd20
call_netdevice_unregister_notifiers+0x73/0x110
netif_change_net_namespace+0x4b7/0xa90
do_setlink.isra.0+0xd5/0x1220
rtnl_newlink+0x7ea/0xb50
rtnetlink_rcv_msg+0x459/0x5e0
netlink_rcv_skb+0x54/0x100
netlink_unicast+0x193/0x270
netlink_sendmsg+0x204/0x450
<snip>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 08/11] net: make NETDEV_UNREGISTER and instance lock more consistent
2025-03-25 12:17 ` Cosmin Ratiu
@ 2025-03-25 17:04 ` Jakub Kicinski
0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-25 17:04 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: davem@davemloft.net, sdf@fomichev.me, horms@kernel.org,
edumazet@google.com, netdev@vger.kernel.org,
andrew+netdev@lunn.ch, pabeni@redhat.com
On Tue, 25 Mar 2025 12:17:15 +0000 Cosmin Ratiu wrote:
> > + netdev_lock_ops(dev);
> > call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
> > + netdev_unlock_ops(dev);
> > }
>
> This introduces a potential deadlock when changing a device namespace:
> do_setlink already locks the instance lock and
> call_netdevice_unregister_notifiers will then deadlock.
Thanks for catching, the notifiers need a closer look.
Let me apply the earlier patches (most importantly the fix)
and we'll see what shakes out as we address the outstanding
bugs in notifiers..
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (11 preceding siblings ...)
2025-03-25 4:04 ` [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Stanislav Fomichev
@ 2025-03-25 17:30 ` patchwork-bot+netdevbpf
12 siblings, 0 replies; 19+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-03-25 17:30 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 24 Mar 2025 15:45:26 -0700 you wrote:
> Skip taking rtnl_lock for queue GET ops on devices which opt
> into running all ops under the instance lock.
>
> This fixes and completes Stan's ops-locking work, so I think
> for sanity / ease of backporting fixes we should merge it for
> v6.15.
>
> [...]
Here is the summary with links:
- [net-next,v2,01/11] net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf()
https://git.kernel.org/netdev/net-next/c/ba6f418fbf64
- [net-next,v2,02/11] net: remove netif_set_real_num_rx_queues() helper for when SYSFS=n
https://git.kernel.org/netdev/net-next/c/bae2da826196
- [net-next,v2,03/11] net: constify dev pointer in misc instance lock helpers
https://git.kernel.org/netdev/net-next/c/e2f81e8f4d0c
- [net-next,v2,04/11] net: explain "protection types" for the instance lock
(no matching commit)
- [net-next,v2,05/11] net: designate queue counts as "double ops protected" by instance lock
https://git.kernel.org/netdev/net-next/c/0a65dcf6249b
- [net-next,v2,06/11] net: designate queue -> napi linking as "ops protected"
https://git.kernel.org/netdev/net-next/c/310ae9eb2617
- [net-next,v2,07/11] net: protect rxq->mp_params with the instance lock
https://git.kernel.org/netdev/net-next/c/b52458652eca
- [net-next,v2,08/11] net: make NETDEV_UNREGISTER and instance lock more consistent
(no matching commit)
- [net-next,v2,09/11] net: designate XSK pool pointers in queues as "ops protected"
(no matching commit)
- [net-next,v2,10/11] netdev: add "ops compat locking" helpers
(no matching commit)
- [net-next,v2,11/11] netdev: don't hold rtnl_lock over nl queue info get when possible
(no matching commit)
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2025-03-25 17:29 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-24 22:45 [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 01/11] net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf() Jakub Kicinski
2025-03-25 5:19 ` Mina Almasry
2025-03-24 22:45 ` [PATCH net-next v2 02/11] net: remove netif_set_real_num_rx_queues() helper for when SYSFS=n Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 03/11] net: constify dev pointer in misc instance lock helpers Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 04/11] net: explain "protection types" for the instance lock Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 05/11] net: designate queue counts as "double ops protected" by " Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 06/11] net: designate queue -> napi linking as "ops protected" Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 07/11] net: protect rxq->mp_params with the instance lock Jakub Kicinski
2025-03-25 5:34 ` Mina Almasry
2025-03-25 9:50 ` Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 08/11] net: make NETDEV_UNREGISTER and instance lock more consistent Jakub Kicinski
2025-03-25 12:17 ` Cosmin Ratiu
2025-03-25 17:04 ` Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 09/11] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 10/11] netdev: add "ops compat locking" helpers Jakub Kicinski
2025-03-24 22:45 ` [PATCH net-next v2 11/11] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
2025-03-25 4:04 ` [PATCH net-next v2 00/11] net: skip taking rtnl_lock for queue GET Stanislav Fomichev
2025-03-25 17:30 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).