* [PATCH net-next 01/11] net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf()
2025-03-12 22:34 [PATCH net-next 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
@ 2025-03-12 22:34 ` Jakub Kicinski
2025-03-12 22:34 ` [PATCH net-next 02/11] net: remove netif_set_real_num_rx_queues() helper for when SYSFS=n Jakub Kicinski
` (9 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-03-12 22:34 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
A recent commit added taking the netdev instance lock in
netdev_nl_bind_rx_doit(), but didn't remove the locking inside
net_devmem_unbind_dmabuf(), which netdev_nl_bind_rx_doit() calls
on its error path.

Always expect the callers of net_devmem_unbind_dmabuf() to
hold the lock. This is consistent with net_devmem_bind_dmabuf().

(Not so) coincidentally this also protects mp_params with the instance
lock, which the rest of this series needs.
Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
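For reference (not a hunk from this patch): a minimal sketch of the
calling convention this change establishes, mirroring the
netdev_nl_sock_priv_destroy() hunk below; the wrapper name is made up.

static void example_destroy_binding(struct net_devmem_dmabuf_binding *binding)
{
	struct net_device *dev = binding->dev;

	netdev_lock(dev);		/* callers now own the locking */
	net_devmem_unbind_dmabuf(binding);
	netdev_unlock(dev);
}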
net/core/devmem.c | 2 --
net/core/netdev-genl.c | 2 ++
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 5c4d79a1bcd8..9e9db6de8631 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -127,10 +127,8 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
rxq->mp_params.mp_priv = NULL;
rxq->mp_params.mp_ops = NULL;
- netdev_lock(binding->dev);
rxq_idx = get_netdev_rx_queue_index(rxq);
WARN_ON(netdev_rx_queue_restart(binding->dev, rxq_idx));
- netdev_unlock(binding->dev);
}
xa_erase(&net_devmem_dmabuf_bindings, binding->id);
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index a186fea63c09..9e4882a22407 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -947,7 +947,9 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
mutex_lock(&priv->lock);
list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
+ netdev_lock(binding->dev);
net_devmem_unbind_dmabuf(binding);
+ netdev_unlock(binding->dev);
}
mutex_unlock(&priv->lock);
}
--
2.48.1
^ permalink raw reply related	[flat|nested] 14+ messages in thread
* [PATCH net-next 02/11] net: remove netif_set_real_num_rx_queues() helper for when SYSFS=n
2025-03-12 22:34 [PATCH net-next 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
2025-03-12 22:34 ` [PATCH net-next 01/11] net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf() Jakub Kicinski
@ 2025-03-12 22:34 ` Jakub Kicinski
2025-03-12 22:34 ` [PATCH net-next 03/11] net: constify dev pointer in misc instance lock helpers Jakub Kicinski
` (8 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-03-12 22:34 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Since commit a953be53ce40 ("net-sysfs: add support for device-specific
rx queue sysfs attributes"), i.e. for at least a decade now, it has been
safe to call net_rx_queue_update_kobjects() when SYSFS=n. That function
does its own ifdef-inery and simply returns 0 in that case. Remove the
unnecessary stub for netif_set_real_num_rx_queues().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
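For reference, a generic illustration of the pattern the commit message
relies on; the names below are made up and this is not the exact kernel
implementation. The idea is that the callee hides the CONFIG_SYSFS
distinction once, so callers don't need their own stubs:

#ifdef CONFIG_SYSFS
int example_update_kobjects(struct net_device *dev, int old_num, int new_num);
#else
static inline int example_update_kobjects(struct net_device *dev,
					  int old_num, int new_num)
{
	return 0;	/* nothing to update without sysfs */
}
#endif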
include/linux/netdevice.h | 10 ----------
net/core/dev.c | 2 --
2 files changed, 12 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0dbfe069a6e3..2f344d5ad953 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4062,17 +4062,7 @@ static inline bool netif_is_multiqueue(const struct net_device *dev)
}
int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq);
-
-#ifdef CONFIG_SYSFS
int netif_set_real_num_rx_queues(struct net_device *dev, unsigned int rxq);
-#else
-static inline int netif_set_real_num_rx_queues(struct net_device *dev,
- unsigned int rxqs)
-{
- dev->real_num_rx_queues = rxqs;
- return 0;
-}
-#endif
int netif_set_real_num_queues(struct net_device *dev,
unsigned int txq, unsigned int rxq);
diff --git a/net/core/dev.c b/net/core/dev.c
index 0bf5af9706b1..3f35f3f8b2f1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3162,7 +3162,6 @@ int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
}
EXPORT_SYMBOL(netif_set_real_num_tx_queues);
-#ifdef CONFIG_SYSFS
/**
* netif_set_real_num_rx_queues - set actual number of RX queues used
* @dev: Network device
@@ -3193,7 +3192,6 @@ int netif_set_real_num_rx_queues(struct net_device *dev, unsigned int rxq)
return 0;
}
EXPORT_SYMBOL(netif_set_real_num_rx_queues);
-#endif
/**
* netif_set_real_num_queues - set actual number of RX and TX queues used
--
2.48.1
^ permalink raw reply related	[flat|nested] 14+ messages in thread
* [PATCH net-next 03/11] net: constify dev pointer in misc instance lock helpers
2025-03-12 22:34 [PATCH net-next 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
2025-03-12 22:34 ` [PATCH net-next 01/11] net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf() Jakub Kicinski
2025-03-12 22:34 ` [PATCH net-next 02/11] net: remove netif_set_real_num_rx_queues() helper for when SYSFS=n Jakub Kicinski
@ 2025-03-12 22:34 ` Jakub Kicinski
2025-03-12 22:35 ` [PATCH net-next 04/11] net: explain "protection types" for the instance lock Jakub Kicinski
` (7 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-03-12 22:34 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
lockdep asserts and predicates can operate on const pointers.
In the future this will let us add asserts in functions
which operate on const pointers like dev_get_min_mp_channel_count().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
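For reference, a rough sketch (not part of the patch) of the kind of
const-pointer helper the commit message has in mind; the function below
is illustrative only, loosely modeled on dev_get_min_mp_channel_count()
as it appears later in the series:

static u32 example_min_mp_channel_count(const struct net_device *dev)
{
	u32 i, count = 0;

	/* now compiles with a const dev argument */
	netdev_ops_assert_locked(dev);

	for (i = 0; i < dev->real_num_rx_queues; i++)
		if (dev->_rx[i].mp_params.mp_priv)
			count = i + 1;

	return count;
}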
include/net/netdev_lock.h | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/net/netdev_lock.h b/include/net/netdev_lock.h
index 99631fbd7f54..689ffdfae50d 100644
--- a/include/net/netdev_lock.h
+++ b/include/net/netdev_lock.h
@@ -11,19 +11,20 @@ static inline bool netdev_trylock(struct net_device *dev)
return mutex_trylock(&dev->lock);
}
-static inline void netdev_assert_locked(struct net_device *dev)
+static inline void netdev_assert_locked(const struct net_device *dev)
{
lockdep_assert_held(&dev->lock);
}
-static inline void netdev_assert_locked_or_invisible(struct net_device *dev)
+static inline void
+netdev_assert_locked_or_invisible(const struct net_device *dev)
{
if (dev->reg_state == NETREG_REGISTERED ||
dev->reg_state == NETREG_UNREGISTERING)
netdev_assert_locked(dev);
}
-static inline bool netdev_need_ops_lock(struct net_device *dev)
+static inline bool netdev_need_ops_lock(const struct net_device *dev)
{
bool ret = dev->request_ops_lock || !!dev->queue_mgmt_ops;
@@ -46,7 +47,7 @@ static inline void netdev_unlock_ops(struct net_device *dev)
netdev_unlock(dev);
}
-static inline void netdev_ops_assert_locked(struct net_device *dev)
+static inline void netdev_ops_assert_locked(const struct net_device *dev)
{
if (netdev_need_ops_lock(dev))
lockdep_assert_held(&dev->lock);
--
2.48.1
^ permalink raw reply related	[flat|nested] 14+ messages in thread
* [PATCH net-next 04/11] net: explain "protection types" for the instance lock
2025-03-12 22:34 [PATCH net-next 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (2 preceding siblings ...)
2025-03-12 22:34 ` [PATCH net-next 03/11] net: constify dev pointer in misc instance lock helpers Jakub Kicinski
@ 2025-03-12 22:35 ` Jakub Kicinski
2025-03-12 22:35 ` [PATCH net-next 05/11] net: designate queue counts as "double ops protected" by " Jakub Kicinski
` (6 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-03-12 22:35 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Try to define some terminology for which fields are protected
by which lock, and how. Some fields are protected by both rtnl_lock
and the instance lock, which is hard to talk about without having
a "key phrase" to refer to a particular protection scheme.

"ops protected" fields are defined later in the series, one by one.

Add ASSERT_RTNL() to netdev_ops_assert_locked() for drivers which
don't use the instance lock to protect their ops. Hopefully it's not
too confusing that netdev_lock_ops() does not exactly match the lock
which netdev_ops_assert_locked() will assert. The noun "ops"
is in a different place in the name, so I think it's acceptable...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
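As a reading aid (not part of the patch): a sketch of how a driver ends
up with the "ops" protection described above. The driver and struct
names are made up; the opt-in field is the @request_ops_lock mentioned
in the comment:

struct example_priv {
	struct net_device *netdev;
};

static int example_probe(struct example_priv *priv)
{
	struct net_device *netdev = priv->netdev;

	/* opt into instance ("ops") locking; netdev_need_ops_lock()
	 * will return true and the core takes dev->lock around ops
	 */
	netdev->request_ops_lock = true;

	return register_netdev(netdev);
}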
include/linux/netdevice.h | 24 ++++++++++++++++++------
include/net/netdev_lock.h | 3 +++
2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2f344d5ad953..e8e4e6fbcef4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2496,19 +2496,31 @@ struct net_device {
* Should always be taken using netdev_lock() / netdev_unlock() helpers.
* Drivers are free to use it for other protection.
*
- * Protects:
+ * For the drivers that implement shaper or queue API, the scope
+ * of this lock is expanded to cover most ndo/queue/ethtool/sysfs
+ * operations. Drivers may opt-in to this behavior by setting
+ * @request_ops_lock.
+ *
+ * @lock protection mixes with rtnl_lock in multiple ways, fields are
+ * either:
+ * - simply protected by the instance @lock;
+ * - double protected - writers hold both locks, readers hold either;
+ * - ops protected - protected by the lock held around the NDOs
+ * and other callbacks, that is the instance lock on devices for
+ * which netdev_need_ops_lock() returns true, otherwise by rtnl_lock;
+ * - double ops protected - always protected by rtnl_lock but for
+ * devices for which netdev_need_ops_lock() returns true - also
+ * the instance lock.
+ *
+ * Simply protects:
* @gro_flush_timeout, @napi_defer_hard_irqs, @napi_list,
* @net_shaper_hierarchy, @reg_state, @threaded, @dev_addr
*
- * Partially protects (writers must hold both @lock and rtnl_lock):
+ * Double protects:
* @up
*
* Also protects some fields in struct napi_struct.
*
- * For the drivers that implement shaper or queue API, the scope
- * of this lock is expanded to cover most ndo/queue/ethtool/sysfs
- * operations.
- *
* Ordering: take after rtnl_lock.
*/
struct mutex lock;
diff --git a/include/net/netdev_lock.h b/include/net/netdev_lock.h
index 689ffdfae50d..efd302375ef2 100644
--- a/include/net/netdev_lock.h
+++ b/include/net/netdev_lock.h
@@ -5,6 +5,7 @@
#include <linux/lockdep.h>
#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
static inline bool netdev_trylock(struct net_device *dev)
{
@@ -51,6 +52,8 @@ static inline void netdev_ops_assert_locked(const struct net_device *dev)
{
if (netdev_need_ops_lock(dev))
lockdep_assert_held(&dev->lock);
+ else
+ ASSERT_RTNL();
}
static inline int netdev_lock_cmp_fn(const struct lockdep_map *a,
--
2.48.1
^ permalink raw reply related	[flat|nested] 14+ messages in thread
* [PATCH net-next 05/11] net: designate queue counts as "double ops protected" by instance lock
2025-03-12 22:34 [PATCH net-next 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (3 preceding siblings ...)
2025-03-12 22:35 ` [PATCH net-next 04/11] net: explain "protection types" for the instance lock Jakub Kicinski
@ 2025-03-12 22:35 ` Jakub Kicinski
2025-03-12 22:35 ` [PATCH net-next 06/11] net: designate queue -> napi linking as "ops protected" Jakub Kicinski
` (5 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-03-12 22:35 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Drivers which opt into instance lock protection of ops should
only call set_real_num_*_queues() under the instance lock.
This means that queue counts are double protected (writes
are under both rtnl_lock and instance lock, readers under
either).
Some readers may still be under the rtnl_lock, however, so for
now we need double protection of writers.
OTOH queue API paths are only under the protection of the instance
lock, so we need to validate that the device actually uses the ops
lock, otherwise the input checks we do against the queue count are racy.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
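Illustrative only (the function name is invented, the logic mirrors the
__net_mp_open_rxq() hunk below): a queue API path holds just the
instance lock, so it must first check that the device really is
ops-locked before trusting the queue count it validates against:

static int example_validate_rxq(struct net_device *dev, unsigned int idx)
{
	netdev_assert_locked(dev);

	/* for other drivers the count is only stable under rtnl_lock,
	 * which this path does not hold
	 */
	if (!netdev_need_ops_lock(dev))
		return -EOPNOTSUPP;

	if (idx >= dev->real_num_rx_queues)
		return -EINVAL;

	return 0;
}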
include/linux/netdevice.h | 3 +++
net/core/dev.c | 2 ++
net/core/net-sysfs.c | 2 ++
net/core/netdev-genl.c | 7 +++++++
net/core/netdev_rx_queue.c | 3 +++
5 files changed, 17 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e8e4e6fbcef4..0e2fa80a556d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2519,6 +2519,9 @@ struct net_device {
* Double protects:
* @up
*
+ * Double ops protects:
+ * @real_num_rx_queues, @real_num_tx_queues
+ *
* Also protects some fields in struct napi_struct.
*
* Ordering: take after rtnl_lock.
diff --git a/net/core/dev.c b/net/core/dev.c
index 3f35f3f8b2f1..8ae4f56169fe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3132,6 +3132,7 @@ int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
if (dev->reg_state == NETREG_REGISTERED ||
dev->reg_state == NETREG_UNREGISTERING) {
ASSERT_RTNL();
+ netdev_ops_assert_locked(dev);
rc = netdev_queue_update_kobjects(dev, dev->real_num_tx_queues,
txq);
@@ -3181,6 +3182,7 @@ int netif_set_real_num_rx_queues(struct net_device *dev, unsigned int rxq)
if (dev->reg_state == NETREG_REGISTERED) {
ASSERT_RTNL();
+ netdev_ops_assert_locked(dev);
rc = net_rx_queue_update_kobjects(dev, dev->real_num_rx_queues,
rxq);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 529a0f721268..6776b1080864 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -2145,8 +2145,10 @@ static void remove_queue_kobjects(struct net_device *dev)
net_rx_queue_update_kobjects(dev, real_rx, 0);
netdev_queue_update_kobjects(dev, real_tx, 0);
+ netdev_lock_ops(dev);
dev->real_num_rx_queues = 0;
dev->real_num_tx_queues = 0;
+ netdev_unlock_ops(dev);
#ifdef CONFIG_SYSFS
kset_unregister(dev->queues_kset);
#endif
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 9e4882a22407..fd1cfa9707dc 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -867,6 +867,13 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
goto err_unlock_sock;
}
+ if (!netdev_need_ops_lock(netdev)) {
+ err = -EOPNOTSUPP;
+ NL_SET_BAD_ATTR(info->extack,
+ info->attrs[NETDEV_A_DEV_IFINDEX]);
+ goto err_unlock;
+ }
+
if (dev_xdp_prog_count(netdev)) {
NL_SET_ERR_MSG(info->extack, "unable to bind dmabuf to device with XDP program attached");
err = -EEXIST;
diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
index a5b234b33cd5..3af716f77a13 100644
--- a/net/core/netdev_rx_queue.c
+++ b/net/core/netdev_rx_queue.c
@@ -92,6 +92,9 @@ static int __net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
struct netdev_rx_queue *rxq;
int ret;
+ if (!netdev_need_ops_lock(dev))
+ return -EOPNOTSUPP;
+
if (ifq_idx >= dev->real_num_rx_queues)
return -EINVAL;
ifq_idx = array_index_nospec(ifq_idx, dev->real_num_rx_queues);
--
2.48.1
^ permalink raw reply related	[flat|nested] 14+ messages in thread
* [PATCH net-next 06/11] net: designate queue -> napi linking as "ops protected"
2025-03-12 22:34 [PATCH net-next 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (4 preceding siblings ...)
2025-03-12 22:35 ` [PATCH net-next 05/11] net: designate queue counts as "double ops protected" by " Jakub Kicinski
@ 2025-03-12 22:35 ` Jakub Kicinski
2025-03-12 22:35 ` [PATCH net-next 07/11] net: protect rxq->mp_params with the instance lock Jakub Kicinski
` (4 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-03-12 22:35 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
netdev netlink is the only reader of netdev_{,rx_}queue->napi,
and it already holds netdev->lock. Switch protection of the writes
to these pointers over to the "ops protected" scheme.

The expectation is now that accessing queue->napi requires
netdev->lock for "ops locked" drivers, and rtnl_lock for all
other drivers.

Current "ops locked" drivers don't require any changes.
gve and netdevsim use _locked() helpers right next to
netif_queue_set_napi(), so they must be holding the instance
lock. iavf doesn't call it. bnxt is a bit messy, but all paths
seem locked.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
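A driver-side sketch of what the new rule means in practice; the
function is invented, but the helpers are the ones touched by this
series. For an "ops locked" driver the core already holds dev->lock
around the NDO, so linking a queue to its NAPI needs no extra locking:

static void example_link_rx_napi(struct net_device *dev, unsigned int qid,
				 struct napi_struct *napi)
{
	/* instance lock for "ops locked" drivers, rtnl_lock otherwise */
	netdev_ops_assert_locked(dev);

	netif_queue_set_napi(dev, qid, NETDEV_QUEUE_TYPE_RX, napi);
}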
include/linux/netdevice.h | 5 +++--
include/net/netdev_lock.h | 8 ++++++++
include/net/netdev_rx_queue.h | 2 +-
net/core/dev.c | 3 +--
4 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0e2fa80a556d..0fc79ae60ff5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -710,7 +710,7 @@ struct netdev_queue {
* slow- / control-path part
*/
/* NAPI instance for the queue
- * Readers and writers must hold RTNL
+ * "ops protected", see comment about net_device::lock
*/
struct napi_struct *napi;
@@ -2522,7 +2522,8 @@ struct net_device {
* Double ops protects:
* @real_num_rx_queues, @real_num_tx_queues
*
- * Also protects some fields in struct napi_struct.
+ * Also protects some fields in:
+ * struct napi_struct, struct netdev_queue, struct netdev_rx_queue
*
* Ordering: take after rtnl_lock.
*/
diff --git a/include/net/netdev_lock.h b/include/net/netdev_lock.h
index efd302375ef2..1c0c9a94cc22 100644
--- a/include/net/netdev_lock.h
+++ b/include/net/netdev_lock.h
@@ -56,6 +56,14 @@ static inline void netdev_ops_assert_locked(const struct net_device *dev)
ASSERT_RTNL();
}
+static inline void
+netdev_ops_assert_locked_or_invisible(const struct net_device *dev)
+{
+ if (dev->reg_state == NETREG_REGISTERED ||
+ dev->reg_state == NETREG_UNREGISTERING)
+ netdev_ops_assert_locked(dev);
+}
+
static inline int netdev_lock_cmp_fn(const struct lockdep_map *a,
const struct lockdep_map *b)
{
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index af40842f229d..b2238b551dce 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -24,7 +24,7 @@ struct netdev_rx_queue {
struct xsk_buff_pool *pool;
#endif
/* NAPI instance for the queue
- * Readers and writers must hold RTNL
+ * "ops protected", see comment about net_device::lock
*/
struct napi_struct *napi;
struct pp_memory_provider_params mp_params;
diff --git a/net/core/dev.c b/net/core/dev.c
index 8ae4f56169fe..8f30938ed402 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6886,8 +6886,7 @@ void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
if (WARN_ON_ONCE(napi && !napi->dev))
return;
- if (dev->reg_state >= NETREG_REGISTERED)
- ASSERT_RTNL();
+ netdev_ops_assert_locked_or_invisible(dev);
switch (type) {
case NETDEV_QUEUE_TYPE_RX:
--
2.48.1
^ permalink raw reply related	[flat|nested] 14+ messages in thread
* [PATCH net-next 07/11] net: protect rxq->mp_params with the instance lock
2025-03-12 22:34 [PATCH net-next 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (5 preceding siblings ...)
2025-03-12 22:35 ` [PATCH net-next 06/11] net: designate queue -> napi linking as "ops protected" Jakub Kicinski
@ 2025-03-12 22:35 ` Jakub Kicinski
2025-03-12 22:35 ` [PATCH net-next 08/11] net: make NETDEV_UNREGISTER and instance lock more consistent Jakub Kicinski
` (3 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-03-12 22:35 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Ensure that all accesses to mp_params are under the netdev
instance lock. The only change we need is to move
dev_memory_provider_uninstall() under the lock.
Appropriately swap the asserts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
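A sketch (not a patch hunk, the function is made up) of what the rule
means for a reader of rxq->mp_params after this change; the instance
lock, not rtnl_lock, is what must be held:

static bool example_rxq_has_mp(struct net_device *dev, unsigned int idx)
{
	netdev_assert_locked(dev);

	return !!__netif_get_rx_queue(dev, idx)->mp_params.mp_priv;
}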
net/core/dev.c | 4 ++--
net/core/page_pool.c | 7 ++-----
2 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 8f30938ed402..ded6ffbda965 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10334,7 +10334,7 @@ u32 dev_get_min_mp_channel_count(const struct net_device *dev)
{
int i;
- ASSERT_RTNL();
+ netdev_ops_assert_locked(dev);
for (i = dev->real_num_rx_queues - 1; i >= 0; i--)
if (dev->_rx[i].mp_params.mp_priv)
@@ -11938,9 +11938,9 @@ void unregister_netdevice_many_notify(struct list_head *head,
dev_tcx_uninstall(dev);
netdev_lock_ops(dev);
dev_xdp_uninstall(dev);
+ dev_memory_provider_uninstall(dev);
netdev_unlock_ops(dev);
bpf_dev_bound_netdev_unregister(dev);
- dev_memory_provider_uninstall(dev);
netdev_offload_xstats_disable_all(dev);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index acef1fcd8ddc..7745ad924ae2 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -11,6 +11,7 @@
#include <linux/slab.h>
#include <linux/device.h>
+#include <net/netdev_lock.h>
#include <net/netdev_rx_queue.h>
#include <net/page_pool/helpers.h>
#include <net/page_pool/memory_provider.h>
@@ -279,11 +280,7 @@ static int page_pool_init(struct page_pool *pool,
get_device(pool->p.dev);
if (pool->slow.flags & PP_FLAG_ALLOW_UNREADABLE_NETMEM) {
- /* We rely on rtnl_lock()ing to make sure netdev_rx_queue
- * configuration doesn't change while we're initializing
- * the page_pool.
- */
- ASSERT_RTNL();
+ netdev_assert_locked(pool->slow.netdev);
rxq = __netif_get_rx_queue(pool->slow.netdev,
pool->slow.queue_idx);
pool->mp_priv = rxq->mp_params.mp_priv;
--
2.48.1
^ permalink raw reply related	[flat|nested] 14+ messages in thread
* [PATCH net-next 08/11] net: make NETDEV_UNREGISTER and instance lock more consistent
2025-03-12 22:34 [PATCH net-next 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (6 preceding siblings ...)
2025-03-12 22:35 ` [PATCH net-next 07/11] net: protect rxq->mp_params with the instance lock Jakub Kicinski
@ 2025-03-12 22:35 ` Jakub Kicinski
2025-03-12 22:35 ` [PATCH net-next 09/11] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
` (2 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-03-12 22:35 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
The NETDEV_UNREGISTER notifier gets called under the ops lock
when the device changes namespace, but not during real unregistration.
Take the lock consistently; XSK tries to poke at netdev queue state
from this notifier.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
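A sketch of the consumer this consistency is for; the handler below is
invented (XSK's real notifier is the motivating case). A
NETDEV_UNREGISTER handler which touches per-queue state can now rely on
the ops lock no matter which path delivered the event:

static int example_netdev_event(struct notifier_block *nb,
				unsigned long event, void *ptr)
{
	struct net_device *dev = netdev_notifier_info_to_dev(ptr);

	if (event == NETDEV_UNREGISTER) {
		netdev_ops_assert_locked(dev);
		/* safe to touch ops-protected per-queue state here */
	}

	return NOTIFY_DONE;
}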
net/core/dev.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index ded6ffbda965..8296b4c159fe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1856,7 +1856,9 @@ static void call_netdevice_unregister_notifiers(struct notifier_block *nb,
dev);
call_netdevice_notifier(nb, NETDEV_DOWN, dev);
}
+ netdev_lock_ops(dev);
call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
+ netdev_unlock_ops(dev);
}
static int call_netdevice_register_net_notifiers(struct notifier_block *nb,
@@ -11155,8 +11157,11 @@ static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
rtnl_lock();
/* Rebroadcast unregister notification */
- list_for_each_entry(dev, list, todo_list)
+ list_for_each_entry(dev, list, todo_list) {
+ netdev_lock_ops(dev);
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+ netdev_unlock_ops(dev);
+ }
__rtnl_unlock();
rcu_barrier();
@@ -11947,7 +11952,9 @@ void unregister_netdevice_many_notify(struct list_head *head,
/* Notify protocols, that we are about to destroy
* this device. They should clean all the things.
*/
+ netdev_lock_ops(dev);
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+ netdev_unlock_ops(dev);
if (!dev->rtnl_link_ops ||
dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
--
2.48.1
^ permalink raw reply related	[flat|nested] 14+ messages in thread
* [PATCH net-next 09/11] net: designate XSK pool pointers in queues as "ops protected"
2025-03-12 22:34 [PATCH net-next 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (7 preceding siblings ...)
2025-03-12 22:35 ` [PATCH net-next 08/11] net: make NETDEV_UNREGISTER and instance lock more consistent Jakub Kicinski
@ 2025-03-12 22:35 ` Jakub Kicinski
2025-03-13 8:28 ` Stanislav Fomichev
2025-03-19 21:51 ` Paolo Abeni
2025-03-12 22:35 ` [PATCH net-next 10/11] netdev: add "ops compat locking" helpers Jakub Kicinski
2025-03-12 22:35 ` [PATCH net-next 11/11] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
10 siblings, 2 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-03-12 22:35 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Read accesses go via xsk_get_pool_from_qid(); the calls coming
from the core and gve look safe (other "ops locked" drivers
don't support XSK).

Write accesses go via xsk_reg_pool_at_qid() and xsk_clear_pool_at_qid().
The former is already under the ops lock; the latter needs to be locked
when coming from the workqueue via xp_clear_dev().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
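As a reading aid (not a hunk from this patch, the wrapper is invented):
the read-side rule described above, expressed as the assertion a caller
of xsk_get_pool_from_qid() can now make:

static struct xsk_buff_pool *
example_get_pool(struct net_device *dev, u16 queue_id)
{
	/* instance lock for "ops locked" drivers, rtnl_lock otherwise */
	netdev_ops_assert_locked(dev);

	return xsk_get_pool_from_qid(dev, queue_id);
}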
include/linux/netdevice.h | 1 +
include/net/netdev_rx_queue.h | 6 +++---
net/xdp/xsk_buff_pool.c | 3 +++
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0fc79ae60ff5..7d802ef1c864 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -688,6 +688,7 @@ struct netdev_queue {
/* Subordinate device that the queue has been assigned to */
struct net_device *sb_dev;
#ifdef CONFIG_XDP_SOCKETS
+ /* "ops protected", see comment about net_device::lock */
struct xsk_buff_pool *pool;
#endif
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index b2238b551dce..8cdcd138b33f 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -20,12 +20,12 @@ struct netdev_rx_queue {
struct net_device *dev;
netdevice_tracker dev_tracker;
+ /* All fields below are "ops protected",
+ * see comment about net_device::lock
+ */
#ifdef CONFIG_XDP_SOCKETS
struct xsk_buff_pool *pool;
#endif
- /* NAPI instance for the queue
- * "ops protected", see comment about net_device::lock
- */
struct napi_struct *napi;
struct pp_memory_provider_params mp_params;
} ____cacheline_aligned_in_smp;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 14716ad3d7bc..60b3adb7b2d7 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -279,9 +279,12 @@ static void xp_release_deferred(struct work_struct *work)
{
struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool,
work);
+ struct net_device *netdev = pool->netdev;
rtnl_lock();
+ netdev_lock_ops(netdev);
xp_clear_dev(pool);
+ netdev_unlock_ops(netdev);
rtnl_unlock();
if (pool->fq) {
--
2.48.1
^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: [PATCH net-next 09/11] net: designate XSK pool pointers in queues as "ops protected"
2025-03-12 22:35 ` [PATCH net-next 09/11] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
@ 2025-03-13 8:28 ` Stanislav Fomichev
2025-03-19 21:51 ` Paolo Abeni
1 sibling, 0 replies; 14+ messages in thread
From: Stanislav Fomichev @ 2025-03-13 8:28 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf
On 03/12, Jakub Kicinski wrote:
> Read accesses go via xsk_get_pool_from_qid(), the call coming
> from the core and gve look safe (other "ops locked" drivers
> don't support XSK).
>
> Write accesses go via xsk_reg_pool_at_qid() and xsk_clear_pool_at_qid().
> Former is already under the ops lock, latter needs to be locked when
> coming from the workqueue via xp_clear_dev().
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> include/linux/netdevice.h | 1 +
> include/net/netdev_rx_queue.h | 6 +++---
> net/xdp/xsk_buff_pool.c | 3 +++
> 3 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0fc79ae60ff5..7d802ef1c864 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -688,6 +688,7 @@ struct netdev_queue {
> /* Subordinate device that the queue has been assigned to */
> struct net_device *sb_dev;
> #ifdef CONFIG_XDP_SOCKETS
> + /* "ops protected", see comment about net_device::lock */
> struct xsk_buff_pool *pool;
> #endif
>
> diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
> index b2238b551dce..8cdcd138b33f 100644
> --- a/include/net/netdev_rx_queue.h
> +++ b/include/net/netdev_rx_queue.h
> @@ -20,12 +20,12 @@ struct netdev_rx_queue {
> struct net_device *dev;
> netdevice_tracker dev_tracker;
>
> + /* All fields below are "ops protected",
> + * see comment about net_device::lock
> + */
> #ifdef CONFIG_XDP_SOCKETS
> struct xsk_buff_pool *pool;
> #endif
> - /* NAPI instance for the queue
> - * "ops protected", see comment about net_device::lock
> - */
> struct napi_struct *napi;
> struct pp_memory_provider_params mp_params;
> } ____cacheline_aligned_in_smp;
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 14716ad3d7bc..60b3adb7b2d7 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -279,9 +279,12 @@ static void xp_release_deferred(struct work_struct *work)
> {
> struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool,
> work);
> + struct net_device *netdev = pool->netdev;
It looks like netdev might be null here. At least xp_clear_dev has
an explicit check. Presumably this happens when the device goes down
and invalidates the open sockets (in xsk_notifier) and we call
xp_release_deferred on socket close afterwards with netdev==null.
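Maybe something along these lines (completely untested, just to
illustrate the check I mean):

	struct net_device *netdev;

	rtnl_lock();
	netdev = pool->netdev;	/* may already be NULL at this point */
	if (netdev)
		netdev_lock_ops(netdev);
	xp_clear_dev(pool);
	if (netdev)
		netdev_unlock_ops(netdev);
	rtnl_unlock();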
> rtnl_lock();
> + netdev_lock_ops(netdev);
> xp_clear_dev(pool);
> + netdev_unlock_ops(netdev);
> rtnl_unlock();
>
> if (pool->fq) {
> --
> 2.48.1
>
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH net-next 09/11] net: designate XSK pool pointers in queues as "ops protected"
2025-03-12 22:35 ` [PATCH net-next 09/11] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
2025-03-13 8:28 ` Stanislav Fomichev
@ 2025-03-19 21:51 ` Paolo Abeni
1 sibling, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2025-03-19 21:51 UTC (permalink / raw)
To: Jakub Kicinski, davem; +Cc: netdev, edumazet, andrew+netdev, horms, sdf
On 3/12/25 11:35 PM, Jakub Kicinski wrote:
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 14716ad3d7bc..60b3adb7b2d7 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -279,9 +279,12 @@ static void xp_release_deferred(struct work_struct *work)
> {
> struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool,
> work);
> + struct net_device *netdev = pool->netdev;
>
> rtnl_lock();
> + netdev_lock_ops(netdev);
I agree with Stan, NULL ptr deref. looks possible here.
/P
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH net-next 10/11] netdev: add "ops compat locking" helpers
2025-03-12 22:34 [PATCH net-next 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (8 preceding siblings ...)
2025-03-12 22:35 ` [PATCH net-next 09/11] net: designate XSK pool pointers in queues as "ops protected" Jakub Kicinski
@ 2025-03-12 22:35 ` Jakub Kicinski
2025-03-12 22:35 ` [PATCH net-next 11/11] netdev: don't hold rtnl_lock over nl queue info get when possible Jakub Kicinski
10 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-03-12 22:35 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
Add helpers to "lock a netdev in a backward-compatible way",
which for ops-locked netdevs means taking the instance lock.
For drivers which haven't opted into the ops locking we take
rtnl_lock instead.

The scoped foreach drops and re-takes the lock for each device,
even if prev and next are both under rtnl_lock.
I hope that's fine, since we expect netdev nl to be mostly
supported by modern drivers, and modern drivers should also
opt into the instance locking.

Note that these helpers are mostly needed for queue-related state,
because drivers modify queue config in their ops in a non-atomic
way. Put differently, queue changes don't have a clear-cut API
like NAPI configuration does. Any state that can take the instance
lock directly should do so, rather than use the "compat" hacks.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
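For reference, a sketch of the call pattern the new helpers are meant
for; the function and its body are invented, only the lock/unlock pair
comes from this patch:

static int example_queue_info_get(struct net_device *dev, unsigned int idx)
{
	int err = 0;

	netdev_lock_ops_compat(dev);	/* dev->lock or rtnl_lock */

	if (idx >= dev->real_num_rx_queues)
		err = -EINVAL;
	/* else: fill the response while the config can't change */

	netdev_unlock_ops_compat(dev);

	return err;
}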
include/net/netdev_lock.h | 16 +++++++++++++
net/core/dev.h | 15 ++++++++++++
net/core/dev.c | 49 +++++++++++++++++++++++++++++++++++++++
3 files changed, 80 insertions(+)
diff --git a/include/net/netdev_lock.h b/include/net/netdev_lock.h
index 1c0c9a94cc22..76cbf5a449b6 100644
--- a/include/net/netdev_lock.h
+++ b/include/net/netdev_lock.h
@@ -64,6 +64,22 @@ netdev_ops_assert_locked_or_invisible(const struct net_device *dev)
netdev_ops_assert_locked(dev);
}
+static inline void netdev_lock_ops_compat(struct net_device *dev)
+{
+ if (netdev_need_ops_lock(dev))
+ netdev_lock(dev);
+ else
+ rtnl_lock();
+}
+
+static inline void netdev_unlock_ops_compat(struct net_device *dev)
+{
+ if (netdev_need_ops_lock(dev))
+ netdev_unlock(dev);
+ else
+ rtnl_unlock();
+}
+
static inline int netdev_lock_cmp_fn(const struct lockdep_map *a,
const struct lockdep_map *b)
{
diff --git a/net/core/dev.h b/net/core/dev.h
index 0ddd3631acb0..f1e9f51f36d4 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -42,6 +42,21 @@ DEFINE_FREE(netdev_unlock, struct net_device *, if (_T) netdev_unlock(_T));
(var_name = netdev_xa_find_lock(net, var_name, &ifindex)); \
ifindex++)
+struct net_device *
+netdev_get_by_index_lock_ops_compat(struct net *net, int ifindex);
+struct net_device *
+netdev_xa_find_lock_ops_compat(struct net *net, struct net_device *dev,
+ unsigned long *index);
+
+DEFINE_FREE(netdev_unlock_ops_compat, struct net_device *,
+ if (_T) netdev_unlock_ops_compat(_T));
+
+#define for_each_netdev_lock_ops_compat_scoped(net, var_name, ifindex) \
+ for (struct net_device *var_name __free(netdev_unlock_ops_compat) = NULL; \
+ (var_name = netdev_xa_find_lock_ops_compat(net, var_name, \
+ &ifindex)); \
+ ifindex++)
+
#ifdef CONFIG_PROC_FS
int __init dev_proc_init(void);
#else
diff --git a/net/core/dev.c b/net/core/dev.c
index 8296b4c159fe..1477bbae0ee0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1037,6 +1037,18 @@ struct net_device *__netdev_put_lock(struct net_device *dev)
return dev;
}
+static struct net_device *__netdev_put_lock_ops_compat(struct net_device *dev)
+{
+ netdev_lock_ops_compat(dev);
+ if (dev->reg_state > NETREG_REGISTERED) {
+ netdev_unlock_ops_compat(dev);
+ dev_put(dev);
+ return NULL;
+ }
+ dev_put(dev);
+ return dev;
+}
+
/**
* netdev_get_by_index_lock() - find a device by its ifindex
* @net: the applicable net namespace
@@ -1059,6 +1071,18 @@ struct net_device *netdev_get_by_index_lock(struct net *net, int ifindex)
return __netdev_put_lock(dev);
}
+struct net_device *
+netdev_get_by_index_lock_ops_compat(struct net *net, int ifindex)
+{
+ struct net_device *dev;
+
+ dev = dev_get_by_index(net, ifindex);
+ if (!dev)
+ return NULL;
+
+ return __netdev_put_lock_ops_compat(dev);
+}
+
/**
* netdev_get_by_name_lock() - find a device by its name
* @net: the applicable net namespace
@@ -1106,6 +1130,31 @@ netdev_xa_find_lock(struct net *net, struct net_device *dev,
} while (true);
}
+struct net_device *
+netdev_xa_find_lock_ops_compat(struct net *net, struct net_device *dev,
+ unsigned long *index)
+{
+ if (dev)
+ netdev_unlock_ops_compat(dev);
+
+ do {
+ rcu_read_lock();
+ dev = xa_find(&net->dev_by_index, index, ULONG_MAX, XA_PRESENT);
+ if (!dev) {
+ rcu_read_unlock();
+ return NULL;
+ }
+ dev_hold(dev);
+ rcu_read_unlock();
+
+ dev = __netdev_put_lock_ops_compat(dev);
+ if (dev)
+ return dev;
+
+ (*index)++;
+ } while (true);
+}
+
static DEFINE_SEQLOCK(netdev_rename_lock);
void netdev_copy_name(struct net_device *dev, char *name)
--
2.48.1
^ permalink raw reply related	[flat|nested] 14+ messages in thread
* [PATCH net-next 11/11] netdev: don't hold rtnl_lock over nl queue info get when possible
2025-03-12 22:34 [PATCH net-next 00/11] net: skip taking rtnl_lock for queue GET Jakub Kicinski
` (9 preceding siblings ...)
2025-03-12 22:35 ` [PATCH net-next 10/11] netdev: add "ops compat locking" helpers Jakub Kicinski
@ 2025-03-12 22:35 ` Jakub Kicinski
10 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-03-12 22:35 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sdf,
Jakub Kicinski
The netdev queue dump accesses three things: NAPI, memory provider,
and XSK pool pointers. All three are "ops protected" now, so switch
to the "ops compat" locking; rtnl_lock does not have to be taken for
"ops locked" devices.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/core/netdev-genl.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index fd1cfa9707dc..39f52a311f07 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -481,18 +481,15 @@ int netdev_nl_queue_get_doit(struct sk_buff *skb, struct genl_info *info)
if (!rsp)
return -ENOMEM;
- rtnl_lock();
-
- netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
+ netdev = netdev_get_by_index_lock_ops_compat(genl_info_net(info),
+ ifindex);
if (netdev) {
err = netdev_nl_queue_fill(rsp, netdev, q_id, q_type, info);
- netdev_unlock(netdev);
+ netdev_unlock_ops_compat(netdev);
} else {
err = -ENODEV;
}
- rtnl_unlock();
-
if (err)
goto err_free_msg;
@@ -541,17 +538,17 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
if (info->attrs[NETDEV_A_QUEUE_IFINDEX])
ifindex = nla_get_u32(info->attrs[NETDEV_A_QUEUE_IFINDEX]);
- rtnl_lock();
if (ifindex) {
- netdev = netdev_get_by_index_lock(net, ifindex);
+ netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
if (netdev) {
err = netdev_nl_queue_dump_one(netdev, skb, info, ctx);
- netdev_unlock(netdev);
+ netdev_unlock_ops_compat(netdev);
} else {
err = -ENODEV;
}
} else {
- for_each_netdev_lock_scoped(net, netdev, ctx->ifindex) {
+ for_each_netdev_lock_ops_compat_scoped(net, netdev,
+ ctx->ifindex) {
err = netdev_nl_queue_dump_one(netdev, skb, info, ctx);
if (err < 0)
break;
@@ -559,7 +556,6 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
ctx->txq_idx = 0;
}
}
- rtnl_unlock();
return err;
}
--
2.48.1
^ permalink raw reply related [flat|nested] 14+ messages in thread