* [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking
@ 2025-05-21 12:08 Tariq Toukan
2025-05-21 12:08 ` [PATCH net-next 1/5] IB/IPoIB: Enqueue separate work_structs for each flushed interface Tariq Toukan
` (6 more replies)
0 siblings, 7 replies; 13+ messages in thread
From: Tariq Toukan @ 2025-05-21 12:08 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: Jason Gunthorpe, Leon Romanovsky, Saeed Mahameed, Tariq Toukan,
Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, linux-rdma, linux-kernel,
netdev, bpf, Moshe Shemesh, Mark Bloch, Gal Pressman,
Cosmin Ratiu
Hi,
This series by Cosmin converts mlx5 to use the recently added netdev
instance locking scheme.
Find detailed description by Cosmin below [1].
Regards,
Tariq
[1]
mlx5 manages multiple kinds of netdevs, from basic Ethernet to
InfiniBand. This patch series converts the driver to use netdev instance
locking for everything, in preparation for TCP devmem zero-copy.
Because mlx5 is tightly coupled with the ipoib driver, a series of
changes first happens in ipoib to allow it to work with mlx5 netdevs
that use instance locking:
IB/IPoIB: Enqueue separate work_structs for each flushed interface
IB/IPoIB: Replace vlan_rwsem with the netdev instance lock
IB/IPoIB: Allow using netdevs that require the instance lock
A small patch then avoids dropping RTNL during firmware update:
net/mlx5e: Don't drop RTNL during firmware flash
The main patch then converts all mlx5 netdevs to use instance locking:
net/mlx5e: Convert mlx5 netdevs to instance locking
Cosmin Ratiu (5):
IB/IPoIB: Enqueue separate work_structs for each flushed interface
IB/IPoIB: Replace vlan_rwsem with the netdev instance lock
IB/IPoIB: Allow using netdevs that require the instance lock
net/mlx5e: Don't drop RTNL during firmware flash
net/mlx5e: Convert mlx5 netdevs to instance locking
drivers/infiniband/ulp/ipoib/ipoib.h | 13 +-
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 65 ++++++---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 127 ++++++++++++------
drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 8 +-
drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 19 +--
.../ethernet/mellanox/mlx5/core/en/health.c | 2 +
.../net/ethernet/mellanox/mlx5/core/en/ptp.c | 25 +++-
.../mellanox/mlx5/core/en/reporter_tx.c | 4 -
.../net/ethernet/mellanox/mlx5/core/en/trap.c | 12 +-
.../ethernet/mellanox/mlx5/core/en_dcbnl.c | 2 +
.../ethernet/mellanox/mlx5/core/en_ethtool.c | 5 -
.../net/ethernet/mellanox/mlx5/core/en_fs.c | 4 +
.../net/ethernet/mellanox/mlx5/core/en_main.c | 82 ++++++-----
.../net/ethernet/mellanox/mlx5/core/en_rep.c | 7 +
.../ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 3 +
15 files changed, 246 insertions(+), 132 deletions(-)
base-commit: f685204c57e87d2a88b159c7525426d70ee745c9
--
2.31.1
* [PATCH net-next 1/5] IB/IPoIB: Enqueue separate work_structs for each flushed interface
2025-05-21 12:08 [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking Tariq Toukan
@ 2025-05-21 12:08 ` Tariq Toukan
2025-05-21 12:08 ` [PATCH net-next 2/5] IB/IPoIB: Replace vlan_rwsem with the netdev instance lock Tariq Toukan
` (5 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Tariq Toukan @ 2025-05-21 12:08 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: Jason Gunthorpe, Leon Romanovsky, Saeed Mahameed, Tariq Toukan,
Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, linux-rdma, linux-kernel,
netdev, bpf, Moshe Shemesh, Mark Bloch, Gal Pressman,
Cosmin Ratiu
From: Cosmin Ratiu <cratiu@nvidia.com>
Previously, flushing a netdevice involved first flushing all child
devices from the flush task itself. That required holding the lock that
protects the child list for the entire duration of the flush.
This poses a problem when converting from vlan_rwsem to the netdev
instance lock (next patch), because holding the parent lock while trying
to acquire a child lock rightfully makes lockdep unhappy.
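As an illustration (not part of the patch), the problematic nesting
would look roughly like this once parent and child are both converted
to the netdev instance lock ('parent', 'ppriv', 'cpriv' and
__flush_child() are hypothetical here):

	/* Flushing children from the parent's flush task nests each
	 * child's instance lock under the parent's. By default all
	 * netdev instance locks share one lockdep class, so lockdep
	 * rightfully flags this nesting as a potential deadlock.
	 */
	netdev_lock(parent);
	list_for_each_entry(cpriv, &ppriv->child_intfs, list) {
		netdev_lock(cpriv->dev);	/* child under parent */
		__flush_child(cpriv);
		netdev_unlock(cpriv->dev);
	}
	netdev_unlock(parent);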
Fix this by splitting the big flush task into individual per-device
flush tasks (all already present in their respective ipoib_dev_priv
structs) and defining a helper function that enqueues all of them while
holding the list lock.
In ipoib_set_mac(), the helper is not used and the tasks are enqueued
directly, because subsequent patches change the locking and
ipoib_set_mac() may then be called with the netdev instance lock held.
This is effectively a no-op: the wq is single-threaded and ordered, so
it will execute the same flush operations in the same order as before.
Furthermore, there should be no new races because
ipoib_parent_unregister_pre() calls flush_workqueue() after stopping new
work generation to wait for pending work to complete. flush_workqueue()
waits for all currently enqueued work to finish before returning.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/infiniband/ulp/ipoib/ipoib.h | 2 +
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 46 ++++++++++++++--------
drivers/infiniband/ulp/ipoib/ipoib_main.c | 10 ++++-
drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 8 ++--
4 files changed, 44 insertions(+), 22 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index abe0522b7df4..2e05e9c9317d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -512,6 +512,8 @@ int ipoib_intf_init(struct ib_device *hca, u32 port, const char *format,
void ipoib_ib_dev_flush_light(struct work_struct *work);
void ipoib_ib_dev_flush_normal(struct work_struct *work);
void ipoib_ib_dev_flush_heavy(struct work_struct *work);
+void ipoib_queue_work(struct ipoib_dev_priv *priv,
+ enum ipoib_flush_level level);
void ipoib_ib_tx_timeout_work(struct work_struct *work);
void ipoib_ib_dev_cleanup(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 5cde275daa94..e0e7f600097d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -1172,24 +1172,11 @@ static bool ipoib_dev_addr_changed_valid(struct ipoib_dev_priv *priv)
}
static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
- enum ipoib_flush_level level,
- int nesting)
+ enum ipoib_flush_level level)
{
- struct ipoib_dev_priv *cpriv;
struct net_device *dev = priv->dev;
int result;
- down_read_nested(&priv->vlan_rwsem, nesting);
-
- /*
- * Flush any child interfaces too -- they might be up even if
- * the parent is down.
- */
- list_for_each_entry(cpriv, &priv->child_intfs, list)
- __ipoib_ib_dev_flush(cpriv, level, nesting + 1);
-
- up_read(&priv->vlan_rwsem);
-
if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) &&
level != IPOIB_FLUSH_HEAVY) {
/* Make sure the dev_addr is set even if not flushing */
@@ -1280,7 +1267,7 @@ void ipoib_ib_dev_flush_light(struct work_struct *work)
struct ipoib_dev_priv *priv =
container_of(work, struct ipoib_dev_priv, flush_light);
- __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_LIGHT, 0);
+ __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_LIGHT);
}
void ipoib_ib_dev_flush_normal(struct work_struct *work)
@@ -1288,7 +1275,7 @@ void ipoib_ib_dev_flush_normal(struct work_struct *work)
struct ipoib_dev_priv *priv =
container_of(work, struct ipoib_dev_priv, flush_normal);
- __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_NORMAL, 0);
+ __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_NORMAL);
}
void ipoib_ib_dev_flush_heavy(struct work_struct *work)
@@ -1297,10 +1284,35 @@ void ipoib_ib_dev_flush_heavy(struct work_struct *work)
container_of(work, struct ipoib_dev_priv, flush_heavy);
rtnl_lock();
- __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_HEAVY, 0);
+ __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_HEAVY);
rtnl_unlock();
}
+void ipoib_queue_work(struct ipoib_dev_priv *priv,
+ enum ipoib_flush_level level)
+{
+ if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
+ struct ipoib_dev_priv *cpriv;
+
+ down_read(&priv->vlan_rwsem);
+ list_for_each_entry(cpriv, &priv->child_intfs, list)
+ ipoib_queue_work(cpriv, level);
+ up_read(&priv->vlan_rwsem);
+ }
+
+ switch (level) {
+ case IPOIB_FLUSH_LIGHT:
+ queue_work(ipoib_workqueue, &priv->flush_light);
+ break;
+ case IPOIB_FLUSH_NORMAL:
+ queue_work(ipoib_workqueue, &priv->flush_normal);
+ break;
+ case IPOIB_FLUSH_HEAVY:
+ queue_work(ipoib_workqueue, &priv->flush_heavy);
+ break;
+ }
+}
+
void ipoib_ib_dev_cleanup(struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 3b463db8ce39..55b1f3cbee17 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -2415,6 +2415,14 @@ static int ipoib_set_mac(struct net_device *dev, void *addr)
set_base_guid(priv, (union ib_gid *)(ss->__data + 4));
+ if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
+ struct ipoib_dev_priv *cpriv;
+
+ down_read(&priv->vlan_rwsem);
+ list_for_each_entry(cpriv, &priv->child_intfs, list)
+ queue_work(ipoib_workqueue, &cpriv->flush_light);
+ up_read(&priv->vlan_rwsem);
+ }
queue_work(ipoib_workqueue, &priv->flush_light);
return 0;
@@ -2526,7 +2534,7 @@ static struct net_device *ipoib_add_port(const char *format,
ib_register_event_handler(&priv->event_handler);
/* call event handler to ensure pkey in sync */
- queue_work(ipoib_workqueue, &priv->flush_heavy);
+ ipoib_queue_work(priv, IPOIB_FLUSH_HEAVY);
ndev->rtnl_link_ops = ipoib_get_link_ops();
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 368e5d77416d..86983080d28b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -280,15 +280,15 @@ void ipoib_event(struct ib_event_handler *handler,
dev_name(&record->device->dev), record->element.port_num);
if (record->event == IB_EVENT_CLIENT_REREGISTER) {
- queue_work(ipoib_workqueue, &priv->flush_light);
+ ipoib_queue_work(priv, IPOIB_FLUSH_LIGHT);
} else if (record->event == IB_EVENT_PORT_ERR ||
record->event == IB_EVENT_PORT_ACTIVE ||
record->event == IB_EVENT_LID_CHANGE) {
- queue_work(ipoib_workqueue, &priv->flush_normal);
+ ipoib_queue_work(priv, IPOIB_FLUSH_NORMAL);
} else if (record->event == IB_EVENT_PKEY_CHANGE) {
- queue_work(ipoib_workqueue, &priv->flush_heavy);
+ ipoib_queue_work(priv, IPOIB_FLUSH_HEAVY);
} else if (record->event == IB_EVENT_GID_CHANGE &&
!test_bit(IPOIB_FLAG_DEV_ADDR_SET, &priv->flags)) {
- queue_work(ipoib_workqueue, &priv->flush_light);
+ ipoib_queue_work(priv, IPOIB_FLUSH_LIGHT);
}
}
--
2.31.1
* [PATCH net-next 2/5] IB/IPoIB: Replace vlan_rwsem with the netdev instance lock
2025-05-21 12:08 [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking Tariq Toukan
2025-05-21 12:08 ` [PATCH net-next 1/5] IB/IPoIB: Enqueue separate work_structs for each flushed interface Tariq Toukan
@ 2025-05-21 12:08 ` Tariq Toukan
2025-05-21 12:09 ` [PATCH net-next 3/5] IB/IPoIB: Allow using netdevs that require the " Tariq Toukan
` (4 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Tariq Toukan @ 2025-05-21 12:08 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: Jason Gunthorpe, Leon Romanovsky, Saeed Mahameed, Tariq Toukan,
Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, linux-rdma, linux-kernel,
netdev, bpf, Moshe Shemesh, Mark Bloch, Gal Pressman,
Cosmin Ratiu
From: Cosmin Ratiu <cratiu@nvidia.com>
vlan_rwsem was added more than a decade ago to work around a deadlock
involving the original mutex being acquired twice, once from the wq.
Subsequent changes then tweaked it to partially protect access to
ipoib_dev_priv->child_intfs together with the RTNL. Flushing the wq
synchronously has since been refactored to happen separately.
This semaphore unfortunately prevents updating ipoib to work with
devices that require the netdev lock, because of lock ordering issues
between RTNL, vlan_rwsem and the netdev instance locks of parent and
child devices.
To uncomplicate things, this commit replaces vlan_rwsem with the netdev
instance lock of the parent device. Both the parent's child_intfs list
and the children's membership in it are now protected by the parent's
netdev instance lock.
All call paths were carefully reviewed and no-longer-needed ASSERT_RTNL
calls were dropped. Some non-trivial changes:
- ipoib_match_gid_pkey_addr() now only acquires the instance lock and
iterates through child_intfs for the first level of recursion (the
parent), as it's not possible to have multiple levels of nested
subinterfaces.
- ipoib_open() and ipoib_stop() schedule tasks on the global workqueue
to open/stop child interfaces, to avoid acquiring nested netdev instance
locks. To keep the device from going away between task scheduling and
execution, netdev_hold()/netdev_put() are used; see the condensed sketch
after this list.
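A condensed view of that last point, taken from the diff below
(allocation and error handling omitted):

	/* Scheduling side (per child): pin the netdev and defer the
	 * flag change to a work task.
	 */
	work->dev = dev;
	netdev_hold(dev, &work->dev_tracker, GFP_KERNEL);
	work->up = up;
	INIT_WORK(&work->work, ipoib_ifupdown_task);
	queue_work(ipoib_workqueue, &work->work);

	/* Work side: only the RTNL is needed, no nested instance locks. */
	rtnl_lock();
	if (dev->flags != flags)
		dev_change_flags(dev, flags, NULL);
	rtnl_unlock();
	netdev_put(dev, &pwork->dev_tracker);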
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/infiniband/ulp/ipoib/ipoib.h | 11 +--
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 4 +-
drivers/infiniband/ulp/ipoib/ipoib_main.c | 110 ++++++++++++++--------
drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 19 ++--
4 files changed, 87 insertions(+), 57 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 2e05e9c9317d..91f866e3fb8b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -329,14 +329,6 @@ struct ipoib_dev_priv {
unsigned long flags;
- /*
- * This protects access to the child_intfs list.
- * To READ from child_intfs the RTNL or vlan_rwsem read side must be
- * held. To WRITE RTNL and the vlan_rwsem write side must be held (in
- * that order) This lock exists because we have a few contexts where
- * we need the child_intfs, but do not want to grab the RTNL.
- */
- struct rw_semaphore vlan_rwsem;
struct mutex mcast_mutex;
struct rb_root path_tree;
@@ -399,6 +391,9 @@ struct ipoib_dev_priv {
struct ib_event_handler event_handler;
struct net_device *parent;
+ /* 'child_intfs' and 'list' membership of all child devices are
+ * protected by the netdev instance lock of 'dev'.
+ */
struct list_head child_intfs;
struct list_head list;
int child_type;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index e0e7f600097d..dc670b4a191b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -1294,10 +1294,10 @@ void ipoib_queue_work(struct ipoib_dev_priv *priv,
if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
struct ipoib_dev_priv *cpriv;
- down_read(&priv->vlan_rwsem);
+ netdev_lock(priv->dev);
list_for_each_entry(cpriv, &priv->child_intfs, list)
ipoib_queue_work(cpriv, level);
- up_read(&priv->vlan_rwsem);
+ netdev_unlock(priv->dev);
}
switch (level) {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 55b1f3cbee17..4879fd17e868 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -132,6 +132,52 @@ static int ipoib_netdev_event(struct notifier_block *this,
}
#endif
+struct ipoib_ifupdown_work {
+ struct work_struct work;
+ struct net_device *dev;
+ netdevice_tracker dev_tracker;
+ bool up;
+};
+
+static void ipoib_ifupdown_task(struct work_struct *work)
+{
+ struct ipoib_ifupdown_work *pwork =
+ container_of(work, struct ipoib_ifupdown_work, work);
+ struct net_device *dev = pwork->dev;
+ unsigned int flags;
+
+ rtnl_lock();
+ flags = dev->flags;
+ if (pwork->up)
+ flags |= IFF_UP;
+ else
+ flags &= ~IFF_UP;
+
+ if (dev->flags != flags)
+ dev_change_flags(dev, flags, NULL);
+ rtnl_unlock();
+ netdev_put(dev, &pwork->dev_tracker);
+ kfree(pwork);
+}
+
+static void ipoib_schedule_ifupdown_task(struct net_device *dev, bool up)
+{
+ struct ipoib_ifupdown_work *work;
+
+ if ((up && (dev->flags & IFF_UP)) ||
+ (!up && !(dev->flags & IFF_UP)))
+ return;
+
+ work = kmalloc(sizeof(*work), GFP_KERNEL);
+ if (!work)
+ return;
+ work->dev = dev;
+ netdev_hold(dev, &work->dev_tracker, GFP_KERNEL);
+ work->up = up;
+ INIT_WORK(&work->work, ipoib_ifupdown_task);
+ queue_work(ipoib_workqueue, &work->work);
+}
+
int ipoib_open(struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
@@ -154,17 +200,10 @@ int ipoib_open(struct net_device *dev)
struct ipoib_dev_priv *cpriv;
/* Bring up any child interfaces too */
- down_read(&priv->vlan_rwsem);
- list_for_each_entry(cpriv, &priv->child_intfs, list) {
- int flags;
-
- flags = cpriv->dev->flags;
- if (flags & IFF_UP)
- continue;
-
- dev_change_flags(cpriv->dev, flags | IFF_UP, NULL);
- }
- up_read(&priv->vlan_rwsem);
+ netdev_lock(dev);
+ list_for_each_entry(cpriv, &priv->child_intfs, list)
+ ipoib_schedule_ifupdown_task(cpriv->dev, true);
+ netdev_unlock(dev);
} else if (priv->parent) {
struct ipoib_dev_priv *ppriv = ipoib_priv(priv->parent);
@@ -199,17 +238,10 @@ static int ipoib_stop(struct net_device *dev)
struct ipoib_dev_priv *cpriv;
/* Bring down any child interfaces too */
- down_read(&priv->vlan_rwsem);
- list_for_each_entry(cpriv, &priv->child_intfs, list) {
- int flags;
-
- flags = cpriv->dev->flags;
- if (!(flags & IFF_UP))
- continue;
-
- dev_change_flags(cpriv->dev, flags & ~IFF_UP, NULL);
- }
- up_read(&priv->vlan_rwsem);
+ netdev_lock(dev);
+ list_for_each_entry(cpriv, &priv->child_intfs, list)
+ ipoib_schedule_ifupdown_task(cpriv->dev, false);
+ netdev_unlock(dev);
}
return 0;
@@ -426,17 +458,20 @@ static int ipoib_match_gid_pkey_addr(struct ipoib_dev_priv *priv,
}
}
+ if (test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
+ return matches;
+
/* Check child interfaces */
- down_read_nested(&priv->vlan_rwsem, nesting);
+ netdev_lock(priv->dev);
list_for_each_entry(child_priv, &priv->child_intfs, list) {
matches += ipoib_match_gid_pkey_addr(child_priv, gid,
- pkey_index, addr,
- nesting + 1,
- found_net_dev);
+ pkey_index, addr,
+ nesting + 1,
+ found_net_dev);
if (matches > 1)
break;
}
- up_read(&priv->vlan_rwsem);
+ netdev_unlock(priv->dev);
return matches;
}
@@ -1992,9 +2027,9 @@ static int ipoib_ndo_init(struct net_device *ndev)
dev_hold(priv->parent);
- down_write(&ppriv->vlan_rwsem);
+ netdev_lock(priv->parent);
list_add_tail(&priv->list, &ppriv->child_intfs);
- up_write(&ppriv->vlan_rwsem);
+ netdev_unlock(priv->parent);
}
return 0;
@@ -2004,8 +2039,6 @@ static void ipoib_ndo_uninit(struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
- ASSERT_RTNL();
-
/*
* ipoib_remove_one guarantees the children are removed before the
* parent, and that is the only place where a parent can be removed.
@@ -2015,9 +2048,9 @@ static void ipoib_ndo_uninit(struct net_device *dev)
if (priv->parent) {
struct ipoib_dev_priv *ppriv = ipoib_priv(priv->parent);
- down_write(&ppriv->vlan_rwsem);
+ netdev_lock(ppriv->dev);
list_del(&priv->list);
- up_write(&ppriv->vlan_rwsem);
+ netdev_unlock(ppriv->dev);
}
ipoib_neigh_hash_uninit(dev);
@@ -2167,7 +2200,6 @@ static void ipoib_build_priv(struct net_device *dev)
priv->dev = dev;
spin_lock_init(&priv->lock);
- init_rwsem(&priv->vlan_rwsem);
mutex_init(&priv->mcast_mutex);
INIT_LIST_HEAD(&priv->path_list);
@@ -2372,10 +2404,10 @@ static void set_base_guid(struct ipoib_dev_priv *priv, union ib_gid *gid)
netif_addr_unlock_bh(netdev);
if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
- down_read(&priv->vlan_rwsem);
+ netdev_lock(priv->dev);
list_for_each_entry(child_priv, &priv->child_intfs, list)
set_base_guid(child_priv, gid);
- up_read(&priv->vlan_rwsem);
+ netdev_unlock(priv->dev);
}
}
@@ -2418,10 +2450,10 @@ static int ipoib_set_mac(struct net_device *dev, void *addr)
if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
struct ipoib_dev_priv *cpriv;
- down_read(&priv->vlan_rwsem);
+ netdev_lock(dev);
list_for_each_entry(cpriv, &priv->child_intfs, list)
queue_work(ipoib_workqueue, &cpriv->flush_light);
- up_read(&priv->vlan_rwsem);
+ netdev_unlock(dev);
}
queue_work(ipoib_workqueue, &priv->flush_light);
@@ -2632,9 +2664,11 @@ static void ipoib_remove_one(struct ib_device *device, void *client_data)
rtnl_lock();
+ netdev_lock(priv->dev);
list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs,
list)
unregister_netdevice_queue(cpriv->dev, &head);
+ netdev_unlock(priv->dev);
unregister_netdevice_queue(priv->dev, &head);
unregister_netdevice_many(&head);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 562df2b3ef18..243e8f555eca 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -53,8 +53,7 @@ static bool is_child_unique(struct ipoib_dev_priv *ppriv,
struct ipoib_dev_priv *priv)
{
struct ipoib_dev_priv *tpriv;
-
- ASSERT_RTNL();
+ bool result = true;
/*
* Since the legacy sysfs interface uses pkey for deletion it cannot
@@ -73,13 +72,17 @@ static bool is_child_unique(struct ipoib_dev_priv *ppriv,
if (ppriv->pkey == priv->pkey)
return false;
+ netdev_lock(ppriv->dev);
list_for_each_entry(tpriv, &ppriv->child_intfs, list) {
if (tpriv->pkey == priv->pkey &&
- tpriv->child_type == IPOIB_LEGACY_CHILD)
- return false;
+ tpriv->child_type == IPOIB_LEGACY_CHILD) {
+ result = false;
+ break;
+ }
}
+ netdev_unlock(ppriv->dev);
- return true;
+ return result;
}
/*
@@ -98,8 +101,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
int result;
struct rdma_netdev *rn = netdev_priv(ndev);
- ASSERT_RTNL();
-
/*
* We do not need to touch priv if register_netdevice fails, so just
* always use this flow.
@@ -267,6 +268,7 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
ppriv = ipoib_priv(pdev);
rc = -ENODEV;
+ netdev_lock(ppriv->dev);
list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) {
if (priv->pkey == pkey &&
priv->child_type == IPOIB_LEGACY_CHILD) {
@@ -278,9 +280,7 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
goto out;
}
- down_write(&ppriv->vlan_rwsem);
list_del_init(&priv->list);
- up_write(&ppriv->vlan_rwsem);
work->dev = priv->dev;
INIT_WORK(&work->work, ipoib_vlan_delete_task);
queue_work(ipoib_workqueue, &work->work);
@@ -291,6 +291,7 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
}
out:
+ netdev_unlock(ppriv->dev);
rtnl_unlock();
return rc;
--
2.31.1
* [PATCH net-next 3/5] IB/IPoIB: Allow using netdevs that require the instance lock
2025-05-21 12:08 [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking Tariq Toukan
2025-05-21 12:08 ` [PATCH net-next 1/5] IB/IPoIB: Enqueue separate work_structs for each flushed interface Tariq Toukan
2025-05-21 12:08 ` [PATCH net-next 2/5] IB/IPoIB: Replace vlan_rwsem with the netdev instance lock Tariq Toukan
@ 2025-05-21 12:09 ` Tariq Toukan
2025-05-21 12:09 ` [PATCH net-next 4/5] net/mlx5e: Don't drop RTNL during firmware flash Tariq Toukan
` (3 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Tariq Toukan @ 2025-05-21 12:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: Jason Gunthorpe, Leon Romanovsky, Saeed Mahameed, Tariq Toukan,
Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, linux-rdma, linux-kernel,
netdev, bpf, Moshe Shemesh, Mark Bloch, Gal Pressman,
Cosmin Ratiu
From: Cosmin Ratiu <cratiu@nvidia.com>
With vlan_rwsem removed by the previous patch, allowing ipoib to work
with netdevs that require the instance lock is an incremental step.
In several places, netdev_lock() is changed to netdev_lock_ops_to_full(),
which takes care of not acquiring the lock again when the netdev is
already locked.
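For reference, a simplified sketch of what this helper is expected to
do; this is an assumption about the <net/netdev_lock.h> primitives, not
a copy of them:

	/* Upgrade from "ops locked" to "fully locked": devices that run
	 * their ops under the instance lock already hold it here, so
	 * only lock devices that don't.
	 */
	static inline void netdev_lock_ops_to_full(struct net_device *dev)
	{
		if (netdev_need_ops_lock(dev))
			netdev_assert_locked(dev);
		else
			netdev_lock(dev);
	}

netdev_unlock_full_to_ops() mirrors this, unlocking only when the lock
was taken above.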
In ipoib_ib_tx_timeout_work() and, for HEAVY flushes, in
__ipoib_ib_dev_flush(), the netdev lock is acquired and released. This
is needed because these functions end up calling .ndo_stop()/.ndo_open()
on subinterfaces, and the device may expect the netdev instance lock to
be held.
ipoib_set_mode() now explicitly acquires the ops lock while manipulating
features, MTU and TX queues.
Finally, ipoib_napi_enable()/ipoib_napi_disable() now use the *_locked
variants of the napi_enable()/napi_disable() calls and optionally
acquire the netdev lock themselves depending on the dev they operate on.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 19 +++++++++++-----
drivers/infiniband/ulp/ipoib/ipoib_main.c | 27 ++++++++++++++---------
2 files changed, 31 insertions(+), 15 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index dc670b4a191b..10b0dbda6cd5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -40,6 +40,7 @@
#include <linux/ip.h>
#include <linux/tcp.h>
+#include <net/netdev_lock.h>
#include <rdma/ib_cache.h>
#include "ipoib.h"
@@ -781,16 +782,20 @@ static void ipoib_napi_enable(struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
- napi_enable(&priv->recv_napi);
- napi_enable(&priv->send_napi);
+ netdev_lock_ops_to_full(dev);
+ napi_enable_locked(&priv->recv_napi);
+ napi_enable_locked(&priv->send_napi);
+ netdev_unlock_full_to_ops(dev);
}
static void ipoib_napi_disable(struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
- napi_disable(&priv->recv_napi);
- napi_disable(&priv->send_napi);
+ netdev_lock_ops_to_full(dev);
+ napi_disable_locked(&priv->recv_napi);
+ napi_disable_locked(&priv->send_napi);
+ netdev_unlock_full_to_ops(dev);
}
int ipoib_ib_dev_stop_default(struct net_device *dev)
@@ -1240,10 +1245,14 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
ipoib_ib_dev_down(dev);
if (level == IPOIB_FLUSH_HEAVY) {
+ netdev_lock_ops(dev);
if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
ipoib_ib_dev_stop(dev);
- if (ipoib_ib_dev_open(dev))
+ result = ipoib_ib_dev_open(dev);
+ netdev_unlock_ops(dev);
+
+ if (result)
return;
if (netif_queue_stopped(dev))
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 4879fd17e868..f2f5465f2a90 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -49,6 +49,7 @@
#include <linux/jhash.h>
#include <net/arp.h>
#include <net/addrconf.h>
+#include <net/netdev_lock.h>
#include <net/pkt_sched.h>
#include <linux/inetdevice.h>
#include <rdma/ib_cache.h>
@@ -200,10 +201,10 @@ int ipoib_open(struct net_device *dev)
struct ipoib_dev_priv *cpriv;
/* Bring up any child interfaces too */
- netdev_lock(dev);
+ netdev_lock_ops_to_full(dev);
list_for_each_entry(cpriv, &priv->child_intfs, list)
ipoib_schedule_ifupdown_task(cpriv->dev, true);
- netdev_unlock(dev);
+ netdev_unlock_full_to_ops(dev);
} else if (priv->parent) {
struct ipoib_dev_priv *ppriv = ipoib_priv(priv->parent);
@@ -238,10 +239,10 @@ static int ipoib_stop(struct net_device *dev)
struct ipoib_dev_priv *cpriv;
/* Bring down any child interfaces too */
- netdev_lock(dev);
+ netdev_lock_ops_to_full(dev);
list_for_each_entry(cpriv, &priv->child_intfs, list)
ipoib_schedule_ifupdown_task(cpriv->dev, false);
- netdev_unlock(dev);
+ netdev_unlock_full_to_ops(dev);
}
return 0;
@@ -566,9 +567,11 @@ int ipoib_set_mode(struct net_device *dev, const char *buf)
set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
ipoib_warn(priv, "enabling connected mode "
"will cause multicast packet drops\n");
+ netdev_lock_ops(dev);
netdev_update_features(dev);
- dev_set_mtu(dev, ipoib_cm_max_mtu(dev));
+ netif_set_mtu(dev, ipoib_cm_max_mtu(dev));
netif_set_real_num_tx_queues(dev, 1);
+ netdev_unlock_ops(dev);
rtnl_unlock();
priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM;
@@ -578,9 +581,11 @@ int ipoib_set_mode(struct net_device *dev, const char *buf)
if (!strcmp(buf, "datagram\n")) {
clear_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+ netdev_lock_ops(dev);
netdev_update_features(dev);
- dev_set_mtu(dev, min(priv->mcast_mtu, dev->mtu));
+ netif_set_mtu(dev, min(priv->mcast_mtu, dev->mtu));
netif_set_real_num_tx_queues(dev, dev->num_tx_queues);
+ netdev_unlock_ops(dev);
rtnl_unlock();
ipoib_flush_paths(dev);
return (!rtnl_trylock()) ? -EBUSY : 0;
@@ -1247,6 +1252,7 @@ void ipoib_ib_tx_timeout_work(struct work_struct *work)
int err;
rtnl_lock();
+ netdev_lock_ops(priv->dev);
if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
goto unlock;
@@ -1261,6 +1267,7 @@ void ipoib_ib_tx_timeout_work(struct work_struct *work)
netif_tx_wake_all_queues(priv->dev);
unlock:
+ netdev_unlock_ops(priv->dev);
rtnl_unlock();
}
@@ -2404,10 +2411,10 @@ static void set_base_guid(struct ipoib_dev_priv *priv, union ib_gid *gid)
netif_addr_unlock_bh(netdev);
if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
- netdev_lock(priv->dev);
+ netdev_lock_ops_to_full(priv->dev);
list_for_each_entry(child_priv, &priv->child_intfs, list)
set_base_guid(child_priv, gid);
- netdev_unlock(priv->dev);
+ netdev_unlock_full_to_ops(priv->dev);
}
}
@@ -2450,10 +2457,10 @@ static int ipoib_set_mac(struct net_device *dev, void *addr)
if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
struct ipoib_dev_priv *cpriv;
- netdev_lock(dev);
+ netdev_lock_ops_to_full(dev);
list_for_each_entry(cpriv, &priv->child_intfs, list)
queue_work(ipoib_workqueue, &cpriv->flush_light);
- netdev_unlock(dev);
+ netdev_unlock_full_to_ops(dev);
}
queue_work(ipoib_workqueue, &priv->flush_light);
--
2.31.1
* [PATCH net-next 4/5] net/mlx5e: Don't drop RTNL during firmware flash
2025-05-21 12:08 [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking Tariq Toukan
` (2 preceding siblings ...)
2025-05-21 12:09 ` [PATCH net-next 3/5] IB/IPoIB: Allow using netdevs that require the " Tariq Toukan
@ 2025-05-21 12:09 ` Tariq Toukan
2025-05-22 16:00 ` Jakub Kicinski
2025-05-21 12:09 ` [PATCH net-next 5/5] net/mlx5e: Convert mlx5 netdevs to instance locking Tariq Toukan
` (2 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Tariq Toukan @ 2025-05-21 12:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: Jason Gunthorpe, Leon Romanovsky, Saeed Mahameed, Tariq Toukan,
Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, linux-rdma, linux-kernel,
netdev, bpf, Moshe Shemesh, Mark Bloch, Gal Pressman,
Cosmin Ratiu
From: Cosmin Ratiu <cratiu@nvidia.com>
The original commit doesn't explain why the RTNL was dropped around the
flash, but presumably flashing takes a long time and holding the RTNL
for that long blocks other interactions with the netdev layer.
However, the stack is moving towards netdev instance locking and
dropping and reacquiring RTNL in the context of flashing introduces
locking ordering issues: RTNL must be acquired before the netdev
instance lock and released after it.
This patch therefore takes the simpler approach of no longer dropping
and reacquiring the RTNL; RTNL for ethtool will soon be removed anyway,
leaving only the instance lock to protect against races.
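Condensed from the diff below, the flash path goes from:

	dev_hold(dev);
	rtnl_unlock();
	err = mlx5_firmware_flash(mdev, fw, NULL);	/* long-running */
	rtnl_lock();
	dev_put(dev);

to simply calling mlx5_firmware_flash() with the RTNL held throughout.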
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index e399d7a3d6cb..ea078c9f5d15 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -2060,14 +2060,9 @@ int mlx5e_ethtool_flash_device(struct mlx5e_priv *priv,
if (err)
return err;
- dev_hold(dev);
- rtnl_unlock();
-
err = mlx5_firmware_flash(mdev, fw, NULL);
release_firmware(fw);
- rtnl_lock();
- dev_put(dev);
return err;
}
--
2.31.1
* [PATCH net-next 5/5] net/mlx5e: Convert mlx5 netdevs to instance locking
2025-05-21 12:08 [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking Tariq Toukan
` (3 preceding siblings ...)
2025-05-21 12:09 ` [PATCH net-next 4/5] net/mlx5e: Don't drop RTNL during firmware flash Tariq Toukan
@ 2025-05-21 12:09 ` Tariq Toukan
2025-05-21 18:27 ` Stanislav Fomichev
2025-05-22 15:51 ` [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev " Jakub Kicinski
2025-05-22 16:30 ` patchwork-bot+netdevbpf
6 siblings, 1 reply; 13+ messages in thread
From: Tariq Toukan @ 2025-05-21 12:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: Jason Gunthorpe, Leon Romanovsky, Saeed Mahameed, Tariq Toukan,
Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, linux-rdma, linux-kernel,
netdev, bpf, Moshe Shemesh, Mark Bloch, Gal Pressman,
Cosmin Ratiu
From: Cosmin Ratiu <cratiu@nvidia.com>
This patch converts mlx5 to use the new netdev instance lock in addition
to the pre-existing state_lock (and the RTNL).
mlx5e_priv.state_lock was already used throughout mlx5 to protect
against concurrent state modifications on the same netdev, usually in
addition to the RTNL. The new netdev instance lock will eventually
replace it, but for now, it is acquired in addition to the existing
locks in the order RTNL -> instance lock -> state_lock.
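Condensed from the mlx5e_health_recover_channels() hunk below, the
resulting acquisition order is:

	rtnl_lock();
	netdev_lock(priv->netdev);
	mutex_lock(&priv->state_lock);
	/* ... recover channels ... */
	mutex_unlock(&priv->state_lock);
	netdev_unlock(priv->netdev);
	rtnl_unlock();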
All three netdev types handled by mlx5 are converted to the new style of
locking, because they share a lot of code related to initializing
channels and dealing with NAPI, so it's better to convert all three
rather than introduce different assumptions deep in the call stack
depending on the type of device.
Because of the nature of the call graphs in mlx5, it wasn't possible to
incrementally convert parts of the driver to use the new lock, since
either all call paths into NAPI have to possess the new lock if the
*_locked variants are used, or none of them can have the lock.
One area which required extra care is the interaction between closing
channels and devlink health reporter tasks.
Previously, the recovery tasks were unconditionally acquiring the
RTNL, which could lead to deadlocks in these scenarios:
T1: mlx5e_close (== .ndo_stop(), has RTNL) -> mlx5e_close_locked
-> mlx5e_close_channels -> mlx5e_ptp_close
-> mlx5e_ptp_close_queues -> mlx5e_ptp_close_txqsqs
-> mlx5e_ptp_close_txqsq
-> cancel_work_sync(&ptpsq->report_unhealthy_work) waits for
T2: mlx5e_ptpsq_unhealthy_work -> mlx5e_reporter_tx_ptpsq_unhealthy
-> mlx5e_health_report -> devlink_health_report
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_ptpsq_unhealthy_recover which does:
rtnl_lock(); => Deadlock.
Another similar instance of this is:
T1: mlx5e_close (== .ndo_stop(), has RTNL) -> mlx5e_close_locked
-> mlx5e_close_channels -> mlx5e_ptp_close
-> mlx5e_ptp_close_queues -> mlx5e_ptp_close_txqsqs
-> mlx5e_ptp_close_txqsq
-> cancel_work_sync(&sq->recover_work) waits for
T2: mlx5e_tx_err_cqe_work -> mlx5e_reporter_tx_err_cqe
-> mlx5e_health_report -> devlink_health_report
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_err_cqe_recover which does:
rtnl_lock(); => Another deadlock.
Fix that by using the same pattern previously applied in
mlx5e_tx_timeout_work, where acquiring the RTNL is retried until either:
a) it is successfully acquired, or
b) there's no need for the work to be done any more (the channel is
being closed).
Now, for all three recovery tasks, acquiring the instance lock is
retried until it succeeds or the channel/SQ is closed.
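The resulting pattern, condensed from the mlx5e_tx_err_cqe_work() hunk
in the diff below:

	/* Either get the instance lock or observe that the SQ is
	 * closing and the recovery is moot.
	 */
	while (!netdev_trylock(sq->netdev)) {
		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
			return;
		msleep(20);
	}
	mlx5e_reporter_tx_err_cqe(sq);
	netdev_unlock(sq->netdev);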
As a side effect, drop the !test_bit(MLX5E_STATE_OPENED, &priv->state)
check from mlx5e_tx_timeout_work; it's weaker than
!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state) and therefore
unnecessary.
Future patches will introduce new call paths (from netdev queue
management ops) which can close channels (and call cancel_work_sync on
the recovery tasks) without the RTNL lock and only with the netdev
instance lock.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../ethernet/mellanox/mlx5/core/en/health.c | 2 +
.../net/ethernet/mellanox/mlx5/core/en/ptp.c | 25 ++++--
.../mellanox/mlx5/core/en/reporter_tx.c | 4 -
.../net/ethernet/mellanox/mlx5/core/en/trap.c | 12 +--
.../ethernet/mellanox/mlx5/core/en_dcbnl.c | 2 +
.../net/ethernet/mellanox/mlx5/core/en_fs.c | 4 +
.../net/ethernet/mellanox/mlx5/core/en_main.c | 82 ++++++++++++-------
.../net/ethernet/mellanox/mlx5/core/en_rep.c | 7 ++
.../ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 3 +
9 files changed, 96 insertions(+), 45 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
index 81523825faa2..cb972b2d46e2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
@@ -114,6 +114,7 @@ int mlx5e_health_recover_channels(struct mlx5e_priv *priv)
int err = 0;
rtnl_lock();
+ netdev_lock(priv->netdev);
mutex_lock(&priv->state_lock);
if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
@@ -123,6 +124,7 @@ int mlx5e_health_recover_channels(struct mlx5e_priv *priv)
out:
mutex_unlock(&priv->state_lock);
+ netdev_unlock(priv->netdev);
rtnl_unlock();
return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index 131ed97ca997..5d0014129a7e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -8,6 +8,7 @@
#include "en/fs_tt_redirect.h"
#include <linux/list.h>
#include <linux/spinlock.h>
+#include <net/netdev_lock.h>
struct mlx5e_ptp_fs {
struct mlx5_flow_handle *l2_rule;
@@ -449,8 +450,22 @@ static void mlx5e_ptpsq_unhealthy_work(struct work_struct *work)
{
struct mlx5e_ptpsq *ptpsq =
container_of(work, struct mlx5e_ptpsq, report_unhealthy_work);
+ struct mlx5e_txqsq *sq = &ptpsq->txqsq;
+
+ /* Recovering the PTP SQ means re-enabling NAPI, which requires the
+ * netdev instance lock. However, SQ closing has to wait for this work
+ * task to finish while also holding the same lock. So either get the
+ * lock or find that the SQ is no longer enabled and thus this work is
+ * not relevant anymore.
+ */
+ while (!netdev_trylock(sq->netdev)) {
+ if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
+ return;
+ msleep(20);
+ }
mlx5e_reporter_tx_ptpsq_unhealthy(ptpsq);
+ netdev_unlock(sq->netdev);
}
static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn,
@@ -892,7 +907,7 @@ int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
if (err)
goto err_free;
- netif_napi_add(netdev, &c->napi, mlx5e_ptp_napi_poll);
+ netif_napi_add_locked(netdev, &c->napi, mlx5e_ptp_napi_poll);
mlx5e_ptp_build_params(c, cparams, params);
@@ -910,7 +925,7 @@ int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
return 0;
err_napi_del:
- netif_napi_del(&c->napi);
+ netif_napi_del_locked(&c->napi);
err_free:
kvfree(cparams);
kvfree(c);
@@ -920,7 +935,7 @@ int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
void mlx5e_ptp_close(struct mlx5e_ptp *c)
{
mlx5e_ptp_close_queues(c);
- netif_napi_del(&c->napi);
+ netif_napi_del_locked(&c->napi);
kvfree(c);
}
@@ -929,7 +944,7 @@ void mlx5e_ptp_activate_channel(struct mlx5e_ptp *c)
{
int tc;
- napi_enable(&c->napi);
+ napi_enable_locked(&c->napi);
if (test_bit(MLX5E_PTP_STATE_TX, c->state)) {
for (tc = 0; tc < c->num_tc; tc++)
@@ -957,7 +972,7 @@ void mlx5e_ptp_deactivate_channel(struct mlx5e_ptp *c)
mlx5e_deactivate_txqsq(&c->ptpsq[tc].txqsq);
}
- napi_disable(&c->napi);
+ napi_disable_locked(&c->napi);
}
int mlx5e_ptp_get_rqn(struct mlx5e_ptp *c, u32 *rqn)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index dbd9482359e1..c3bda4612fa9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -107,9 +107,7 @@ static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
mlx5e_reset_txqsq_cc_pc(sq);
sq->stats->recover++;
clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
- rtnl_lock();
mlx5e_activate_txqsq(sq);
- rtnl_unlock();
if (sq->channel)
mlx5e_trigger_napi_icosq(sq->channel);
@@ -176,7 +174,6 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
priv = ptpsq->txqsq.priv;
- rtnl_lock();
mutex_lock(&priv->state_lock);
chs = &priv->channels;
netdev = priv->netdev;
@@ -196,7 +193,6 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
netif_carrier_on(netdev);
mutex_unlock(&priv->state_lock);
- rtnl_unlock();
return err;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
index 140606fcd23b..b5c19396e096 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
@@ -149,7 +149,7 @@ static struct mlx5e_trap *mlx5e_open_trap(struct mlx5e_priv *priv)
t->mkey_be = cpu_to_be32(priv->mdev->mlx5e_res.hw_objs.mkey);
t->stats = &priv->trap_stats.ch;
- netif_napi_add(netdev, &t->napi, mlx5e_trap_napi_poll);
+ netif_napi_add_locked(netdev, &t->napi, mlx5e_trap_napi_poll);
err = mlx5e_open_trap_rq(priv, t);
if (unlikely(err))
@@ -164,7 +164,7 @@ static struct mlx5e_trap *mlx5e_open_trap(struct mlx5e_priv *priv)
err_close_trap_rq:
mlx5e_close_trap_rq(&t->rq);
err_napi_del:
- netif_napi_del(&t->napi);
+ netif_napi_del_locked(&t->napi);
kvfree(t);
return ERR_PTR(err);
}
@@ -173,13 +173,13 @@ void mlx5e_close_trap(struct mlx5e_trap *trap)
{
mlx5e_tir_destroy(&trap->tir);
mlx5e_close_trap_rq(&trap->rq);
- netif_napi_del(&trap->napi);
+ netif_napi_del_locked(&trap->napi);
kvfree(trap);
}
static void mlx5e_activate_trap(struct mlx5e_trap *trap)
{
- napi_enable(&trap->napi);
+ napi_enable_locked(&trap->napi);
mlx5e_activate_rq(&trap->rq);
mlx5e_trigger_napi_sched(&trap->napi);
}
@@ -189,7 +189,7 @@ void mlx5e_deactivate_trap(struct mlx5e_priv *priv)
struct mlx5e_trap *trap = priv->en_trap;
mlx5e_deactivate_rq(&trap->rq);
- napi_disable(&trap->napi);
+ napi_disable_locked(&trap->napi);
}
static struct mlx5e_trap *mlx5e_add_trap_queue(struct mlx5e_priv *priv)
@@ -285,6 +285,7 @@ int mlx5e_handle_trap_event(struct mlx5e_priv *priv, struct mlx5_trap_ctx *trap_
if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
return 0;
+ netdev_lock(priv->netdev);
switch (trap_ctx->action) {
case DEVLINK_TRAP_ACTION_TRAP:
err = mlx5e_handle_action_trap(priv, trap_ctx->id);
@@ -297,6 +298,7 @@ int mlx5e_handle_trap_event(struct mlx5e_priv *priv, struct mlx5_trap_ctx *trap_
trap_ctx->action);
err = -EINVAL;
}
+ netdev_unlock(priv->netdev);
return err;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 8705cffc747f..5fe016e477b3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -1147,6 +1147,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state)
bool reset = true;
int err;
+ netdev_lock(priv->netdev);
mutex_lock(&priv->state_lock);
new_params = priv->channels.params;
@@ -1162,6 +1163,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state)
&trust_state, reset);
mutex_unlock(&priv->state_lock);
+ netdev_unlock(priv->netdev);
return err;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 05058710d2c7..04a969128161 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -484,7 +484,9 @@ static int mlx5e_vlan_rx_add_svid(struct mlx5e_flow_steering *fs,
}
/* Need to fix some features.. */
+ netdev_lock(netdev);
netdev_update_features(netdev);
+ netdev_unlock(netdev);
return err;
}
@@ -521,7 +523,9 @@ int mlx5e_fs_vlan_rx_kill_vid(struct mlx5e_flow_steering *fs,
} else if (be16_to_cpu(proto) == ETH_P_8021AD) {
clear_bit(vid, fs->vlan->active_svlans);
mlx5e_fs_del_vlan_rule(fs, MLX5E_VLAN_RULE_TYPE_MATCH_STAG_VID, vid);
+ netdev_lock(netdev);
netdev_update_features(netdev);
+ netdev_unlock(netdev);
}
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 9bd166f489e7..ea822c69d137 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -39,6 +39,7 @@
#include <linux/debugfs.h>
#include <linux/if_bridge.h>
#include <linux/filter.h>
+#include <net/netdev_lock.h>
#include <net/netdev_queues.h>
#include <net/page_pool/types.h>
#include <net/pkt_sched.h>
@@ -1903,7 +1904,20 @@ void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
recover_work);
+ /* Recovering queues means re-enabling NAPI, which requires the netdev
+ * instance lock. However, SQ closing flows have to wait for work tasks
+ * to finish while also holding the netdev instance lock. So either get
+ * the lock or find that the SQ is no longer enabled and thus this work
+ * is not relevant anymore.
+ */
+ while (!netdev_trylock(sq->netdev)) {
+ if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
+ return;
+ msleep(20);
+ }
+
mlx5e_reporter_tx_err_cqe(sq);
+ netdev_unlock(sq->netdev);
}
static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
@@ -2705,8 +2719,8 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
c->aff_mask = irq_get_effective_affinity_mask(irq);
c->lag_port = mlx5e_enumerate_lag_port(mdev, ix);
- netif_napi_add_config(netdev, &c->napi, mlx5e_napi_poll, ix);
- netif_napi_set_irq(&c->napi, irq);
+ netif_napi_add_config_locked(netdev, &c->napi, mlx5e_napi_poll, ix);
+ netif_napi_set_irq_locked(&c->napi, irq);
err = mlx5e_open_queues(c, params, cparam);
if (unlikely(err))
@@ -2728,7 +2742,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
mlx5e_close_queues(c);
err_napi_del:
- netif_napi_del(&c->napi);
+ netif_napi_del_locked(&c->napi);
err_free:
kvfree(cparam);
@@ -2741,7 +2755,7 @@ static void mlx5e_activate_channel(struct mlx5e_channel *c)
{
int tc;
- napi_enable(&c->napi);
+ napi_enable_locked(&c->napi);
for (tc = 0; tc < c->num_tc; tc++)
mlx5e_activate_txqsq(&c->sq[tc]);
@@ -2773,7 +2787,7 @@ static void mlx5e_deactivate_channel(struct mlx5e_channel *c)
mlx5e_deactivate_txqsq(&c->sq[tc]);
mlx5e_qos_deactivate_queues(c);
- napi_disable(&c->napi);
+ napi_disable_locked(&c->napi);
}
static void mlx5e_close_channel(struct mlx5e_channel *c)
@@ -2782,7 +2796,7 @@ static void mlx5e_close_channel(struct mlx5e_channel *c)
mlx5e_close_xsk(c);
mlx5e_close_queues(c);
mlx5e_qos_close_queues(c);
- netif_napi_del(&c->napi);
+ netif_napi_del_locked(&c->napi);
kvfree(c);
}
@@ -4276,7 +4290,7 @@ void mlx5e_set_xdp_feature(struct net_device *netdev)
if (!netdev->netdev_ops->ndo_bpf ||
params->packet_merge.type != MLX5E_PACKET_MERGE_NONE) {
- xdp_clear_features_flag(netdev);
+ xdp_set_features_flag_locked(netdev, 0);
return;
}
@@ -4285,7 +4299,7 @@ void mlx5e_set_xdp_feature(struct net_device *netdev)
NETDEV_XDP_ACT_RX_SG |
NETDEV_XDP_ACT_NDO_XMIT |
NETDEV_XDP_ACT_NDO_XMIT_SG;
- xdp_set_features_flag(netdev, val);
+ xdp_set_features_flag_locked(netdev, val);
}
int mlx5e_set_features(struct net_device *netdev, netdev_features_t features)
@@ -4968,21 +4982,19 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
struct net_device *netdev = priv->netdev;
int i;
- /* Take rtnl_lock to ensure no change in netdev->real_num_tx_queues
- * through this flow. However, channel closing flows have to wait for
- * this work to finish while holding rtnl lock too. So either get the
- * lock or find that channels are being closed for other reason and
- * this work is not relevant anymore.
+ /* Recovering the TX queues implies re-enabling NAPI, which requires
+ * the netdev instance lock.
+ * However, channel closing flows have to wait for this work to finish
+ * while holding the same lock. So either get the lock or find that
+ * channels are being closed for other reason and this work is not
+ * relevant anymore.
*/
- while (!rtnl_trylock()) {
+ while (!netdev_trylock(netdev)) {
if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
return;
msleep(20);
}
- if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
- goto unlock;
-
for (i = 0; i < netdev->real_num_tx_queues; i++) {
struct netdev_queue *dev_queue =
netdev_get_tx_queue(netdev, i);
@@ -4996,8 +5008,7 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
break;
}
-unlock:
- rtnl_unlock();
+ netdev_unlock(netdev);
}
static void mlx5e_tx_timeout(struct net_device *dev, unsigned int txqueue)
@@ -5321,7 +5332,6 @@ static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
struct mlx5e_rq_stats *xskrq_stats;
struct mlx5e_rq_stats *rq_stats;
- ASSERT_RTNL();
if (mlx5e_is_uplink_rep(priv) || !priv->stats_nch)
return;
@@ -5341,7 +5351,6 @@ static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5e_sq_stats *sq_stats;
- ASSERT_RTNL();
if (!priv->stats_nch)
return;
@@ -5362,7 +5371,6 @@ static void mlx5e_get_base_stats(struct net_device *dev,
struct mlx5e_ptp *ptp_channel;
int i, tc;
- ASSERT_RTNL();
if (!mlx5e_is_uplink_rep(priv)) {
rx->packets = 0;
rx->bytes = 0;
@@ -5458,6 +5466,8 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
netdev->netdev_ops = &mlx5e_netdev_ops;
netdev->xdp_metadata_ops = &mlx5e_xdp_metadata_ops;
netdev->xsk_tx_metadata_ops = &mlx5e_xsk_tx_metadata_ops;
+ netdev->request_ops_lock = true;
+ netdev_lockdep_set_classes(netdev);
mlx5e_dcbnl_build_netdev(netdev);
@@ -5839,9 +5849,11 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
mlx5e_nic_set_rx_mode(priv);
rtnl_lock();
+ netdev_lock(netdev);
if (netif_running(netdev))
mlx5e_open(netdev);
udp_tunnel_nic_reset_ntf(priv->netdev);
+ netdev_unlock(netdev);
netif_device_attach(netdev);
rtnl_unlock();
}
@@ -5854,9 +5866,16 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
mlx5e_dcbnl_delete_app(priv);
rtnl_lock();
+ netdev_lock(priv->netdev);
if (netif_running(priv->netdev))
mlx5e_close(priv->netdev);
netif_device_detach(priv->netdev);
+ if (priv->en_trap) {
+ mlx5e_deactivate_trap(priv);
+ mlx5e_close_trap(priv->en_trap);
+ priv->en_trap = NULL;
+ }
+ netdev_unlock(priv->netdev);
rtnl_unlock();
mlx5e_nic_set_rx_mode(priv);
@@ -5866,11 +5885,6 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
mlx5e_monitor_counter_cleanup(priv);
mlx5e_disable_blocking_events(priv);
- if (priv->en_trap) {
- mlx5e_deactivate_trap(priv);
- mlx5e_close_trap(priv->en_trap);
- priv->en_trap = NULL;
- }
mlx5e_disable_async_events(priv);
mlx5_lag_remove_netdev(mdev, priv->netdev);
mlx5_vxlan_reset_to_default(mdev->vxlan);
@@ -6125,7 +6139,9 @@ static void mlx5e_update_features(struct net_device *netdev)
return; /* features will be updated on netdev registration */
rtnl_lock();
+ netdev_lock(netdev);
netdev_update_features(netdev);
+ netdev_unlock(netdev);
rtnl_unlock();
}
@@ -6136,7 +6152,7 @@ static void mlx5e_reset_channels(struct net_device *netdev)
int mlx5e_attach_netdev(struct mlx5e_priv *priv)
{
- const bool take_rtnl = priv->netdev->reg_state == NETREG_REGISTERED;
+ const bool need_lock = priv->netdev->reg_state == NETREG_REGISTERED;
const struct mlx5e_profile *profile = priv->profile;
int max_nch;
int err;
@@ -6178,15 +6194,19 @@ int mlx5e_attach_netdev(struct mlx5e_priv *priv)
* 2. Set our default XPS cpumask.
* 3. Build the RQT.
*
- * rtnl_lock is required by netif_set_real_num_*_queues in case the
+ * Locking is required by netif_set_real_num_*_queues in case the
* netdev has been registered by this point (if this function was called
* in the reload or resume flow).
*/
- if (take_rtnl)
+ if (need_lock) {
rtnl_lock();
+ netdev_lock(priv->netdev);
+ }
err = mlx5e_num_channels_changed(priv);
- if (take_rtnl)
+ if (need_lock) {
+ netdev_unlock(priv->netdev);
rtnl_unlock();
+ }
if (err)
goto out;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 2abab241f03b..719aa16bd404 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -33,6 +33,7 @@
#include <linux/dim.h>
#include <linux/debugfs.h>
#include <linux/mlx5/fs.h>
+#include <net/netdev_lock.h>
#include <net/switchdev.h>
#include <net/pkt_cls.h>
#include <net/act_api.h>
@@ -885,6 +886,8 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev,
{
SET_NETDEV_DEV(netdev, mdev->device);
netdev->netdev_ops = &mlx5e_netdev_ops_rep;
+ netdev->request_ops_lock = true;
+ netdev_lockdep_set_classes(netdev);
eth_hw_addr_random(netdev);
netdev->ethtool_ops = &mlx5e_rep_ethtool_ops;
@@ -1344,9 +1347,11 @@ static void mlx5e_uplink_rep_enable(struct mlx5e_priv *priv)
netdev->wanted_features |= NETIF_F_HW_TC;
rtnl_lock();
+ netdev_lock(netdev);
if (netif_running(netdev))
mlx5e_open(netdev);
udp_tunnel_nic_reset_ntf(priv->netdev);
+ netdev_unlock(netdev);
netif_device_attach(netdev);
rtnl_unlock();
}
@@ -1356,9 +1361,11 @@ static void mlx5e_uplink_rep_disable(struct mlx5e_priv *priv)
struct mlx5_core_dev *mdev = priv->mdev;
rtnl_lock();
+ netdev_lock(priv->netdev);
if (netif_running(priv->netdev))
mlx5e_close(priv->netdev);
netif_device_detach(priv->netdev);
+ netdev_unlock(priv->netdev);
rtnl_unlock();
mlx5e_rep_bridge_cleanup(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 0979d672d47f..79ae3a51a4b3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -32,6 +32,7 @@
#include <rdma/ib_verbs.h>
#include <linux/mlx5/fs.h>
+#include <net/netdev_lock.h>
#include "en.h"
#include "en/params.h"
#include "ipoib.h"
@@ -102,6 +103,8 @@ int mlx5i_init(struct mlx5_core_dev *mdev, struct net_device *netdev)
netdev->netdev_ops = &mlx5i_netdev_ops;
netdev->ethtool_ops = &mlx5i_ethtool_ops;
+ netdev->request_ops_lock = true;
+ netdev_lockdep_set_classes(netdev);
return 0;
}
--
2.31.1
* Re: [PATCH net-next 5/5] net/mlx5e: Convert mlx5 netdevs to instance locking
2025-05-21 12:09 ` [PATCH net-next 5/5] net/mlx5e: Convert mlx5 netdevs to instance locking Tariq Toukan
@ 2025-05-21 18:27 ` Stanislav Fomichev
2025-05-21 20:37 ` Cosmin Ratiu
0 siblings, 1 reply; 13+ messages in thread
From: Stanislav Fomichev @ 2025-05-21 18:27 UTC (permalink / raw)
To: Tariq Toukan
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Jason Gunthorpe, Leon Romanovsky, Saeed Mahameed,
Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, linux-rdma, linux-kernel,
netdev, bpf, Moshe Shemesh, Mark Bloch, Gal Pressman,
Cosmin Ratiu
On 05/21, Tariq Toukan wrote:
> From: Cosmin Ratiu <cratiu@nvidia.com>
>
> This patch converts mlx5 to use the new netdev instance lock in addition
> to the pre-existing state_lock (and the RTNL).
>
> mlx5e_priv.state_lock was already used throughout mlx5 to protect
> against concurrent state modifications on the same netdev, usually in
> addition to the RTNL. The new netdev instance lock will eventually
> replace it, but for now, it is acquired in addition to the existing
> locks in the order RTNL -> instance lock -> state_lock.
>
> All three netdev types handled by mlx5 are converted to the new style of
> locking, because they share a lot of code related to initializing
> channels and dealing with NAPI, so it's better to convert all three
> rather than introduce different assumptions deep in the call stack
> depending on the type of device.
>
> Because of the nature of the call graphs in mlx5, it wasn't possible to
> incrementally convert parts of the driver to use the new lock, since
> either all call paths into NAPI have to possess the new lock if the
> *_locked variants are used, or none of them can have the lock.
>
> One area which required extra care is the interaction between closing
> channels and devlink health reporter tasks.
> Previously, the recovery tasks were unconditionally acquiring the
> RTNL, which could lead to deadlocks in these scenarios:
>
> T1: mlx5e_close (== .ndo_stop(), has RTNL) -> mlx5e_close_locked
> -> mlx5e_close_channels -> mlx5e_ptp_close
> -> mlx5e_ptp_close_queues -> mlx5e_ptp_close_txqsqs
> -> mlx5e_ptp_close_txqsq
> -> cancel_work_sync(&ptpsq->report_unhealthy_work) waits for
>
> T2: mlx5e_ptpsq_unhealthy_work -> mlx5e_reporter_tx_ptpsq_unhealthy
> -> mlx5e_health_report -> devlink_health_report
> -> devlink_health_reporter_recover
> -> mlx5e_tx_reporter_ptpsq_unhealthy_recover which does:
> rtnl_lock(); => Deadlock.
>
> Another similar instance of this is:
> T1: mlx5e_close (== .ndo_stop(), has RTNL) -> mlx5e_close_locked
> -> mlx5e_close_channels -> mlx5e_ptp_close
> -> mlx5e_ptp_close_queues -> mlx5e_ptp_close_txqsqs
> -> mlx5e_ptp_close_txqsq
> -> cancel_work_sync(&sq->recover_work) waits for
>
> T2: mlx5e_tx_err_cqe_work -> mlx5e_reporter_tx_err_cqe
> -> mlx5e_health_report -> devlink_health_report
> -> devlink_health_reporter_recover
> -> mlx5e_tx_reporter_err_cqe_recover which does:
> rtnl_lock(); => Another deadlock.
>
> Fix that by using the same pattern already used in
> mlx5e_tx_timeout_work, where acquiring the RTNL is retried until either:
> a) it is successfully acquired, or
> b) the work no longer needs to run (the channel is being closed).
>
> Now, for all three recovery tasks, acquiring the instance lock is
> retried until it succeeds or the channel/SQ is closed.
> As a side effect, drop the !test_bit(MLX5E_STATE_OPENED, &priv->state)
> check from mlx5e_tx_timeout_work; it's weaker than
> !test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state) and unnecessary.
>
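[ The recovery-side pattern, extracted here for readability (SQ case
shown; the PTP and tx-timeout variants test their own teardown bit):

        while (!netdev_trylock(sq->netdev)) {
                /* The closing path holds the instance lock and waits
                 * for this work in cancel_work_sync(); bail out once
                 * the SQ is marked disabled instead of spinning.
                 */
                if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
                        return;
                msleep(20);
        }
        /* ... recover with the lock held ... */
        netdev_unlock(sq->netdev);
]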
> Future patches will introduce new call paths (from netdev queue
> management ops) which can close channels (and call cancel_work_sync on
> the recovery tasks) without the RTNL lock and only with the netdev
> instance lock.
>
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
> .../ethernet/mellanox/mlx5/core/en/health.c | 2 +
> .../net/ethernet/mellanox/mlx5/core/en/ptp.c | 25 ++++--
> .../mellanox/mlx5/core/en/reporter_tx.c | 4 -
> .../net/ethernet/mellanox/mlx5/core/en/trap.c | 12 +--
> .../ethernet/mellanox/mlx5/core/en_dcbnl.c | 2 +
> .../net/ethernet/mellanox/mlx5/core/en_fs.c | 4 +
> .../net/ethernet/mellanox/mlx5/core/en_main.c | 82 ++++++++++++-------
> .../net/ethernet/mellanox/mlx5/core/en_rep.c | 7 ++
> .../ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 3 +
> 9 files changed, 96 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
> index 81523825faa2..cb972b2d46e2 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
> @@ -114,6 +114,7 @@ int mlx5e_health_recover_channels(struct mlx5e_priv *priv)
> int err = 0;
>
> rtnl_lock();
> + netdev_lock(priv->netdev);
> mutex_lock(&priv->state_lock);
>
> if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
> @@ -123,6 +124,7 @@ int mlx5e_health_recover_channels(struct mlx5e_priv *priv)
>
> out:
> mutex_unlock(&priv->state_lock);
> + netdev_unlock(priv->netdev);
> rtnl_unlock();
>
> return err;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> index 131ed97ca997..5d0014129a7e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> @@ -8,6 +8,7 @@
> #include "en/fs_tt_redirect.h"
> #include <linux/list.h>
> #include <linux/spinlock.h>
> +#include <net/netdev_lock.h>
>
> struct mlx5e_ptp_fs {
> struct mlx5_flow_handle *l2_rule;
> @@ -449,8 +450,22 @@ static void mlx5e_ptpsq_unhealthy_work(struct work_struct *work)
> {
> struct mlx5e_ptpsq *ptpsq =
> container_of(work, struct mlx5e_ptpsq, report_unhealthy_work);
> + struct mlx5e_txqsq *sq = &ptpsq->txqsq;
> +
> + /* Recovering the PTP SQ means re-enabling NAPI, which requires the
> + * netdev instance lock. However, SQ closing has to wait for this work
> + * task to finish while also holding the same lock. So either get the
> + * lock or find that the SQ is no longer enabled and thus this work is
> + * not relevant anymore.
> + */
> + while (!netdev_trylock(sq->netdev)) {
> + if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
> + return;
> + msleep(20);
> + }
>
> mlx5e_reporter_tx_ptpsq_unhealthy(ptpsq);
> + netdev_unlock(sq->netdev);
> }
>
> static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn,
> @@ -892,7 +907,7 @@ int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
> if (err)
> goto err_free;
>
> - netif_napi_add(netdev, &c->napi, mlx5e_ptp_napi_poll);
> + netif_napi_add_locked(netdev, &c->napi, mlx5e_ptp_napi_poll);
>
> mlx5e_ptp_build_params(c, cparams, params);
>
> @@ -910,7 +925,7 @@ int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
> return 0;
>
> err_napi_del:
> - netif_napi_del(&c->napi);
> + netif_napi_del_locked(&c->napi);
> err_free:
> kvfree(cparams);
> kvfree(c);
> @@ -920,7 +935,7 @@ int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
> void mlx5e_ptp_close(struct mlx5e_ptp *c)
> {
> mlx5e_ptp_close_queues(c);
> - netif_napi_del(&c->napi);
> + netif_napi_del_locked(&c->napi);
>
> kvfree(c);
> }
> @@ -929,7 +944,7 @@ void mlx5e_ptp_activate_channel(struct mlx5e_ptp *c)
> {
> int tc;
>
> - napi_enable(&c->napi);
> + napi_enable_locked(&c->napi);
>
> if (test_bit(MLX5E_PTP_STATE_TX, c->state)) {
> for (tc = 0; tc < c->num_tc; tc++)
> @@ -957,7 +972,7 @@ void mlx5e_ptp_deactivate_channel(struct mlx5e_ptp *c)
> mlx5e_deactivate_txqsq(&c->ptpsq[tc].txqsq);
> }
>
> - napi_disable(&c->napi);
> + napi_disable_locked(&c->napi);
> }
>
> int mlx5e_ptp_get_rqn(struct mlx5e_ptp *c, u32 *rqn)
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> index dbd9482359e1..c3bda4612fa9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> @@ -107,9 +107,7 @@ static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
> mlx5e_reset_txqsq_cc_pc(sq);
> sq->stats->recover++;
> clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
> - rtnl_lock();
> mlx5e_activate_txqsq(sq);
> - rtnl_unlock();
>
> if (sq->channel)
> mlx5e_trigger_napi_icosq(sq->channel);
> @@ -176,7 +174,6 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
>
> priv = ptpsq->txqsq.priv;
>
> - rtnl_lock();
> mutex_lock(&priv->state_lock);
> chs = &priv->channels;
> netdev = priv->netdev;
> @@ -196,7 +193,6 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
> netif_carrier_on(netdev);
>
> mutex_unlock(&priv->state_lock);
> - rtnl_unlock();
>
> return err;
> }
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
> index 140606fcd23b..b5c19396e096 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
> @@ -149,7 +149,7 @@ static struct mlx5e_trap *mlx5e_open_trap(struct mlx5e_priv *priv)
> t->mkey_be = cpu_to_be32(priv->mdev->mlx5e_res.hw_objs.mkey);
> t->stats = &priv->trap_stats.ch;
>
> - netif_napi_add(netdev, &t->napi, mlx5e_trap_napi_poll);
> + netif_napi_add_locked(netdev, &t->napi, mlx5e_trap_napi_poll);
>
> err = mlx5e_open_trap_rq(priv, t);
> if (unlikely(err))
> @@ -164,7 +164,7 @@ static struct mlx5e_trap *mlx5e_open_trap(struct mlx5e_priv *priv)
> err_close_trap_rq:
> mlx5e_close_trap_rq(&t->rq);
> err_napi_del:
> - netif_napi_del(&t->napi);
> + netif_napi_del_locked(&t->napi);
> kvfree(t);
> return ERR_PTR(err);
> }
> @@ -173,13 +173,13 @@ void mlx5e_close_trap(struct mlx5e_trap *trap)
> {
> mlx5e_tir_destroy(&trap->tir);
> mlx5e_close_trap_rq(&trap->rq);
> - netif_napi_del(&trap->napi);
> + netif_napi_del_locked(&trap->napi);
> kvfree(trap);
> }
>
> static void mlx5e_activate_trap(struct mlx5e_trap *trap)
> {
> - napi_enable(&trap->napi);
> + napi_enable_locked(&trap->napi);
> mlx5e_activate_rq(&trap->rq);
> mlx5e_trigger_napi_sched(&trap->napi);
> }
> @@ -189,7 +189,7 @@ void mlx5e_deactivate_trap(struct mlx5e_priv *priv)
> struct mlx5e_trap *trap = priv->en_trap;
>
> mlx5e_deactivate_rq(&trap->rq);
> - napi_disable(&trap->napi);
> + napi_disable_locked(&trap->napi);
> }
>
> static struct mlx5e_trap *mlx5e_add_trap_queue(struct mlx5e_priv *priv)
> @@ -285,6 +285,7 @@ int mlx5e_handle_trap_event(struct mlx5e_priv *priv, struct mlx5_trap_ctx *trap_
> if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
> return 0;
>
> + netdev_lock(priv->netdev);
> switch (trap_ctx->action) {
> case DEVLINK_TRAP_ACTION_TRAP:
> err = mlx5e_handle_action_trap(priv, trap_ctx->id);
> @@ -297,6 +298,7 @@ int mlx5e_handle_trap_event(struct mlx5e_priv *priv, struct mlx5_trap_ctx *trap_
> trap_ctx->action);
> err = -EINVAL;
> }
> + netdev_unlock(priv->netdev);
> return err;
> }
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
> index 8705cffc747f..5fe016e477b3 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
> @@ -1147,6 +1147,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state)
> bool reset = true;
> int err;
>
> + netdev_lock(priv->netdev);
> mutex_lock(&priv->state_lock);
>
> new_params = priv->channels.params;
> @@ -1162,6 +1163,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state)
> &trust_state, reset);
>
> mutex_unlock(&priv->state_lock);
> + netdev_unlock(priv->netdev);
>
> return err;
> }
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
> index 05058710d2c7..04a969128161 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
> @@ -484,7 +484,9 @@ static int mlx5e_vlan_rx_add_svid(struct mlx5e_flow_steering *fs,
> }
>
> /* Need to fix some features.. */
> + netdev_lock(netdev);
> netdev_update_features(netdev);
> + netdev_unlock(netdev);
> return err;
> }
>
> @@ -521,7 +523,9 @@ int mlx5e_fs_vlan_rx_kill_vid(struct mlx5e_flow_steering *fs,
> } else if (be16_to_cpu(proto) == ETH_P_8021AD) {
> clear_bit(vid, fs->vlan->active_svlans);
> mlx5e_fs_del_vlan_rule(fs, MLX5E_VLAN_RULE_TYPE_MATCH_STAG_VID, vid);
> + netdev_lock(netdev);
> netdev_update_features(netdev);
> + netdev_unlock(netdev);
> }
>
> return 0;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 9bd166f489e7..ea822c69d137 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -39,6 +39,7 @@
> #include <linux/debugfs.h>
> #include <linux/if_bridge.h>
> #include <linux/filter.h>
> +#include <net/netdev_lock.h>
> #include <net/netdev_queues.h>
> #include <net/page_pool/types.h>
> #include <net/pkt_sched.h>
> @@ -1903,7 +1904,20 @@ void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
> struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
> recover_work);
>
> + /* Recovering queues means re-enabling NAPI, which requires the netdev
> + * instance lock. However, SQ closing flows have to wait for work tasks
> + * to finish while also holding the netdev instance lock. So either get
> + * the lock or find that the SQ is no longer enabled and thus this work
> + * is not relevant anymore.
> + */
> + while (!netdev_trylock(sq->netdev)) {
> + if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
> + return;
> + msleep(20);
> + }
> +
> mlx5e_reporter_tx_err_cqe(sq);
> + netdev_unlock(sq->netdev);
> }
>
> static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
> @@ -2705,8 +2719,8 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
> c->aff_mask = irq_get_effective_affinity_mask(irq);
> c->lag_port = mlx5e_enumerate_lag_port(mdev, ix);
>
> - netif_napi_add_config(netdev, &c->napi, mlx5e_napi_poll, ix);
> - netif_napi_set_irq(&c->napi, irq);
> + netif_napi_add_config_locked(netdev, &c->napi, mlx5e_napi_poll, ix);
> + netif_napi_set_irq_locked(&c->napi, irq);
>
> err = mlx5e_open_queues(c, params, cparam);
> if (unlikely(err))
> @@ -2728,7 +2742,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
> mlx5e_close_queues(c);
>
> err_napi_del:
> - netif_napi_del(&c->napi);
> + netif_napi_del_locked(&c->napi);
>
> err_free:
> kvfree(cparam);
> @@ -2741,7 +2755,7 @@ static void mlx5e_activate_channel(struct mlx5e_channel *c)
> {
> int tc;
>
> - napi_enable(&c->napi);
> + napi_enable_locked(&c->napi);
>
> for (tc = 0; tc < c->num_tc; tc++)
> mlx5e_activate_txqsq(&c->sq[tc]);
> @@ -2773,7 +2787,7 @@ static void mlx5e_deactivate_channel(struct mlx5e_channel *c)
> mlx5e_deactivate_txqsq(&c->sq[tc]);
> mlx5e_qos_deactivate_queues(c);
>
> - napi_disable(&c->napi);
> + napi_disable_locked(&c->napi);
> }
>
> static void mlx5e_close_channel(struct mlx5e_channel *c)
> @@ -2782,7 +2796,7 @@ static void mlx5e_close_channel(struct mlx5e_channel *c)
> mlx5e_close_xsk(c);
> mlx5e_close_queues(c);
> mlx5e_qos_close_queues(c);
> - netif_napi_del(&c->napi);
> + netif_napi_del_locked(&c->napi);
>
> kvfree(c);
> }
> @@ -4276,7 +4290,7 @@ void mlx5e_set_xdp_feature(struct net_device *netdev)
>
> if (!netdev->netdev_ops->ndo_bpf ||
> params->packet_merge.type != MLX5E_PACKET_MERGE_NONE) {
> - xdp_clear_features_flag(netdev);
> + xdp_set_features_flag_locked(netdev, 0);
> return;
> }
>
> @@ -4285,7 +4299,7 @@ void mlx5e_set_xdp_feature(struct net_device *netdev)
> NETDEV_XDP_ACT_RX_SG |
> NETDEV_XDP_ACT_NDO_XMIT |
> NETDEV_XDP_ACT_NDO_XMIT_SG;
> - xdp_set_features_flag(netdev, val);
> + xdp_set_features_flag_locked(netdev, val);
> }
>
> int mlx5e_set_features(struct net_device *netdev, netdev_features_t features)
> @@ -4968,21 +4982,19 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
> struct net_device *netdev = priv->netdev;
> int i;
>
> - /* Take rtnl_lock to ensure no change in netdev->real_num_tx_queues
> - * through this flow. However, channel closing flows have to wait for
> - * this work to finish while holding rtnl lock too. So either get the
> - * lock or find that channels are being closed for other reason and
> - * this work is not relevant anymore.
> + /* Recovering the TX queues implies re-enabling NAPI, which requires
> + * the netdev instance lock.
> + * However, channel closing flows have to wait for this work to finish
> + * while holding the same lock. So either get the lock or find that
> + * channels are being closed for other reason and this work is not
> + * relevant anymore.
> */
> - while (!rtnl_trylock()) {
> + while (!netdev_trylock(netdev)) {
> if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
> return;
> msleep(20);
> }
>
> - if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
> - goto unlock;
> -
> for (i = 0; i < netdev->real_num_tx_queues; i++) {
> struct netdev_queue *dev_queue =
> netdev_get_tx_queue(netdev, i);
> @@ -4996,8 +5008,7 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
> break;
> }
>
> -unlock:
> - rtnl_unlock();
> + netdev_unlock(netdev);
> }
>
> static void mlx5e_tx_timeout(struct net_device *dev, unsigned int txqueue)
> @@ -5321,7 +5332,6 @@ static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
> struct mlx5e_rq_stats *xskrq_stats;
> struct mlx5e_rq_stats *rq_stats;
>
> - ASSERT_RTNL();
> if (mlx5e_is_uplink_rep(priv) || !priv->stats_nch)
> return;
>
> @@ -5341,7 +5351,6 @@ static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
> struct mlx5e_priv *priv = netdev_priv(dev);
> struct mlx5e_sq_stats *sq_stats;
>
> - ASSERT_RTNL();
> if (!priv->stats_nch)
> return;
>
> @@ -5362,7 +5371,6 @@ static void mlx5e_get_base_stats(struct net_device *dev,
> struct mlx5e_ptp *ptp_channel;
> int i, tc;
>
> - ASSERT_RTNL();
> if (!mlx5e_is_uplink_rep(priv)) {
> rx->packets = 0;
> rx->bytes = 0;
> @@ -5458,6 +5466,8 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
> netdev->netdev_ops = &mlx5e_netdev_ops;
> netdev->xdp_metadata_ops = &mlx5e_xdp_metadata_ops;
> netdev->xsk_tx_metadata_ops = &mlx5e_xsk_tx_metadata_ops;
> + netdev->request_ops_lock = true;
> + netdev_lockdep_set_classes(netdev);
>
> mlx5e_dcbnl_build_netdev(netdev);
>
> @@ -5839,9 +5849,11 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
> mlx5e_nic_set_rx_mode(priv);
>
> rtnl_lock();
> + netdev_lock(netdev);
> if (netif_running(netdev))
> mlx5e_open(netdev);
> udp_tunnel_nic_reset_ntf(priv->netdev);
> + netdev_unlock(netdev);
> netif_device_attach(netdev);
> rtnl_unlock();
> }
> @@ -5854,9 +5866,16 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
> mlx5e_dcbnl_delete_app(priv);
>
> rtnl_lock();
> + netdev_lock(priv->netdev);
> if (netif_running(priv->netdev))
> mlx5e_close(priv->netdev);
> netif_device_detach(priv->netdev);
> + if (priv->en_trap) {
> + mlx5e_deactivate_trap(priv);
> + mlx5e_close_trap(priv->en_trap);
> + priv->en_trap = NULL;
> + }
> + netdev_unlock(priv->netdev);
> rtnl_unlock();
>
> mlx5e_nic_set_rx_mode(priv);
> @@ -5866,11 +5885,6 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
> mlx5e_monitor_counter_cleanup(priv);
>
> mlx5e_disable_blocking_events(priv);
> - if (priv->en_trap) {
> - mlx5e_deactivate_trap(priv);
> - mlx5e_close_trap(priv->en_trap);
> - priv->en_trap = NULL;
> - }
> mlx5e_disable_async_events(priv);
> mlx5_lag_remove_netdev(mdev, priv->netdev);
> mlx5_vxlan_reset_to_default(mdev->vxlan);
> @@ -6125,7 +6139,9 @@ static void mlx5e_update_features(struct net_device *netdev)
> return; /* features will be updated on netdev registration */
>
> rtnl_lock();
> + netdev_lock(netdev);
> netdev_update_features(netdev);
> + netdev_unlock(netdev);
> rtnl_unlock();
> }
>
> @@ -6136,7 +6152,7 @@ static void mlx5e_reset_channels(struct net_device *netdev)
>
> int mlx5e_attach_netdev(struct mlx5e_priv *priv)
> {
> - const bool take_rtnl = priv->netdev->reg_state == NETREG_REGISTERED;
> + const bool need_lock = priv->netdev->reg_state == NETREG_REGISTERED;
> const struct mlx5e_profile *profile = priv->profile;
> int max_nch;
> int err;
> @@ -6178,15 +6194,19 @@ int mlx5e_attach_netdev(struct mlx5e_priv *priv)
> * 2. Set our default XPS cpumask.
> * 3. Build the RQT.
> *
> - * rtnl_lock is required by netif_set_real_num_*_queues in case the
> + * Locking is required by netif_set_real_num_*_queues in case the
> * netdev has been registered by this point (if this function was called
> * in the reload or resume flow).
> */
> - if (take_rtnl)
> + if (need_lock) {
> rtnl_lock();
> + netdev_lock(priv->netdev);
> + }
> err = mlx5e_num_channels_changed(priv);
> - if (take_rtnl)
> + if (need_lock) {
> + netdev_unlock(priv->netdev);
> rtnl_unlock();
> + }
> if (err)
> goto out;
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 2abab241f03b..719aa16bd404 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -33,6 +33,7 @@
> #include <linux/dim.h>
> #include <linux/debugfs.h>
> #include <linux/mlx5/fs.h>
> +#include <net/netdev_lock.h>
> #include <net/switchdev.h>
> #include <net/pkt_cls.h>
> #include <net/act_api.h>
> @@ -885,6 +886,8 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev,
> {
> SET_NETDEV_DEV(netdev, mdev->device);
> netdev->netdev_ops = &mlx5e_netdev_ops_rep;
> + netdev->request_ops_lock = true;
> + netdev_lockdep_set_classes(netdev);
> eth_hw_addr_random(netdev);
> netdev->ethtool_ops = &mlx5e_rep_ethtool_ops;
>
> @@ -1344,9 +1347,11 @@ static void mlx5e_uplink_rep_enable(struct mlx5e_priv *priv)
> netdev->wanted_features |= NETIF_F_HW_TC;
>
> rtnl_lock();
> + netdev_lock(netdev);
> if (netif_running(netdev))
> mlx5e_open(netdev);
> udp_tunnel_nic_reset_ntf(priv->netdev);
> + netdev_unlock(netdev);
> netif_device_attach(netdev);
> rtnl_unlock();
> }
> @@ -1356,9 +1361,11 @@ static void mlx5e_uplink_rep_disable(struct mlx5e_priv *priv)
> struct mlx5_core_dev *mdev = priv->mdev;
>
> rtnl_lock();
> + netdev_lock(priv->netdev);
> if (netif_running(priv->netdev))
> mlx5e_close(priv->netdev);
> netif_device_detach(priv->netdev);
> + netdev_unlock(priv->netdev);
> rtnl_unlock();
>
> mlx5e_rep_bridge_cleanup(priv);
[..]
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
> index 0979d672d47f..79ae3a51a4b3 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
> @@ -32,6 +32,7 @@
>
> #include <rdma/ib_verbs.h>
> #include <linux/mlx5/fs.h>
> +#include <net/netdev_lock.h>
> #include "en.h"
> #include "en/params.h"
> #include "ipoib.h"
> @@ -102,6 +103,8 @@ int mlx5i_init(struct mlx5_core_dev *mdev, struct net_device *netdev)
>
> netdev->netdev_ops = &mlx5i_netdev_ops;
> netdev->ethtool_ops = &mlx5i_ethtool_ops;
> + netdev->request_ops_lock = true;
> + netdev_lockdep_set_classes(netdev);
>
> return 0;
> }
Out of curiosity: any reason this is part of patch 5 and not patch 4?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 5/5] net/mlx5e: Convert mlx5 netdevs to instance locking
2025-05-21 18:27 ` Stanislav Fomichev
@ 2025-05-21 20:37 ` Cosmin Ratiu
2025-05-22 15:28 ` Stanislav Fomichev
0 siblings, 1 reply; 13+ messages in thread
From: Cosmin Ratiu @ 2025-05-21 20:37 UTC (permalink / raw)
To: Tariq Toukan, stfomichev@gmail.com
Cc: andrew+netdev@lunn.ch, hawk@kernel.org, davem@davemloft.net,
leon@kernel.org, john.fastabend@gmail.com,
linux-kernel@vger.kernel.org, edumazet@google.com,
linux-rdma@vger.kernel.org, richardcochran@gmail.com,
pabeni@redhat.com, ast@kernel.org, kuba@kernel.org,
daniel@iogearbox.net, bpf@vger.kernel.org, Saeed Mahameed,
netdev@vger.kernel.org, Mark Bloch, Moshe Shemesh, jgg@ziepe.ca,
Gal Pressman
On Wed, 2025-05-21 at 11:27 -0700, Stanislav Fomichev wrote:
> On 05/21, Tariq Toukan wrote:
>
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
> > index 0979d672d47f..79ae3a51a4b3 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
> > @@ -32,6 +32,7 @@
> >
> > #include <rdma/ib_verbs.h>
> > #include <linux/mlx5/fs.h>
> > +#include <net/netdev_lock.h>
> > #include "en.h"
> > #include "en/params.h"
> > #include "ipoib.h"
> > @@ -102,6 +103,8 @@ int mlx5i_init(struct mlx5_core_dev *mdev,
> > struct net_device *netdev)
> >
> > netdev->netdev_ops = &mlx5i_netdev_ops;
> > netdev->ethtool_ops = &mlx5i_ethtool_ops;
> > + netdev->request_ops_lock = true;
> > + netdev_lockdep_set_classes(netdev);
> >
> > return 0;
> > }
>
> Out of curiosity: any reason this is part of patch 5 and not patch 4?
If you're referring to enabling instance locking in
drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c (and by patch 5
you meant patch 3), this part cannot be submitted separately from the
other changes in this patch: without all of the changes we'd either get
assertion failures from missing the instance lock, or deadlocks (e.g.
from using the dev_* instead of the netif_* functions).
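To illustrate the latter (a sketch, assuming the dev_api.c convention in
current net-next: dev_*() wrappers take the instance lock themselves,
while netif_*() variants expect the caller to already hold it):

        netdev_lock(netdev);
        /* dev_set_mtu(netdev, mtu); would re-take the instance lock
         * here and deadlock, since the lock is not recursive.
         */
        err = netif_set_mtu(netdev, mtu); /* fine: lock already held */
        netdev_unlock(netdev);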
As I tried to explain in the description, I couldn't figure out a way
to split this change into smaller units, as the call graph looks like a
ball of hair spit out by a cat.
Cosmin.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 5/5] net/mlx5e: Convert mlx5 netdevs to instance locking
2025-05-21 20:37 ` Cosmin Ratiu
@ 2025-05-22 15:28 ` Stanislav Fomichev
0 siblings, 0 replies; 13+ messages in thread
From: Stanislav Fomichev @ 2025-05-22 15:28 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: Tariq Toukan, andrew+netdev@lunn.ch, hawk@kernel.org,
davem@davemloft.net, leon@kernel.org, john.fastabend@gmail.com,
linux-kernel@vger.kernel.org, edumazet@google.com,
linux-rdma@vger.kernel.org, richardcochran@gmail.com,
pabeni@redhat.com, ast@kernel.org, kuba@kernel.org,
daniel@iogearbox.net, bpf@vger.kernel.org, Saeed Mahameed,
netdev@vger.kernel.org, Mark Bloch, Moshe Shemesh, jgg@ziepe.ca,
Gal Pressman
On 05/21, Cosmin Ratiu wrote:
> On Wed, 2025-05-21 at 11:27 -0700, Stanislav Fomichev wrote:
> > On 05/21, Tariq Toukan wrote:
> >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
> > > b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
> > > index 0979d672d47f..79ae3a51a4b3 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
> > > @@ -32,6 +32,7 @@
> > >
> > > #include <rdma/ib_verbs.h>
> > > #include <linux/mlx5/fs.h>
> > > +#include <net/netdev_lock.h>
> > > #include "en.h"
> > > #include "en/params.h"
> > > #include "ipoib.h"
> > > @@ -102,6 +103,8 @@ int mlx5i_init(struct mlx5_core_dev *mdev,
> > > struct net_device *netdev)
> > >
> > > netdev->netdev_ops = &mlx5i_netdev_ops;
> > > netdev->ethtool_ops = &mlx5i_ethtool_ops;
> > > + netdev->request_ops_lock = true;
> > > + netdev_lockdep_set_classes(netdev);
> > >
> > > return 0;
> > > }
> >
> > Out of curiosity: any reason this is part of patch 5 and not patch 4?
>
> If you're referring to enabling instance locking in
> drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c and by patch 5
> you meant patch 3, this part cannot be submitted separately from the
> other changes in this patch, as without all of the changes we'd either
> get assertion failures from missing the instance lock or deadlocks
> (e.g. from using the dev_* instead of netif_* functions).
>
> As I tried to explain in the description, I couldn't figure out a way
> to split this change into smaller units, as the call graph looks like a
> ball of hair spit out by a cat.
SG, thanks for clarifying!
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking
2025-05-21 12:08 [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking Tariq Toukan
` (4 preceding siblings ...)
2025-05-21 12:09 ` [PATCH net-next 5/5] net/mlx5e: Convert mlx5 netdevs to instance locking Tariq Toukan
@ 2025-05-22 15:51 ` Jakub Kicinski
2025-05-22 16:10 ` Leon Romanovsky
2025-05-22 16:30 ` patchwork-bot+netdevbpf
6 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2025-05-22 15:51 UTC (permalink / raw)
To: Tariq Toukan
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Jason Gunthorpe, Leon Romanovsky, Saeed Mahameed, Richard Cochran,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, linux-rdma, linux-kernel, netdev, bpf,
Moshe Shemesh, Mark Bloch, Gal Pressman, Cosmin Ratiu
On Wed, 21 May 2025 15:08:57 +0300 Tariq Toukan wrote:
> This series by Cosmin converts mlx5 to use the recently added netdev
> instance locking scheme.
Are you planning to re-submit this as a PR?
The subject tag and the presence of Leon's reviews make me think
that applying directly is fine, but I wanted to confirm.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 4/5] net/mlx5e: Don't drop RTNL during firmware flash
2025-05-21 12:09 ` [PATCH net-next 4/5] net/mlx5e: Don't drop RTNL during firmware flash Tariq Toukan
@ 2025-05-22 16:00 ` Jakub Kicinski
0 siblings, 0 replies; 13+ messages in thread
From: Jakub Kicinski @ 2025-05-22 16:00 UTC (permalink / raw)
To: Tariq Toukan
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Jason Gunthorpe, Leon Romanovsky, Saeed Mahameed, Richard Cochran,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, linux-rdma, linux-kernel, netdev, bpf,
Moshe Shemesh, Mark Bloch, Gal Pressman, Cosmin Ratiu
On Wed, 21 May 2025 15:09:01 +0300 Tariq Toukan wrote:
> However, the stack is moving towards netdev instance locking, and
> dropping and reacquiring RTNL in the context of flashing introduces
> lock ordering issues: RTNL must be acquired before the netdev
> instance lock and released after it.
>
> This patch therefore takes the simpler approach of no longer dropping
> and reacquiring the RTNL; RTNL will soon be removed for ethtool anyway,
> leaving only the instance lock to protect against races.
You didn't mention it, so just in case someone tries to report this
as a regression later: devlink has been the preferred way to flash
devices for 5+ years. It has much better UX, with progress
notifications, and already does per-instance locking.
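For reference, the devlink flow is a one-liner (device name and
firmware path are illustrative):

  # devlink dev flash pci/0000:01:00.0 file mellanox/mlx5_fw.bin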
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking
2025-05-22 15:51 ` [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev " Jakub Kicinski
@ 2025-05-22 16:10 ` Leon Romanovsky
0 siblings, 0 replies; 13+ messages in thread
From: Leon Romanovsky @ 2025-05-22 16:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Jason Gunthorpe, Saeed Mahameed, Richard Cochran,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, linux-rdma, linux-kernel, netdev, bpf,
Moshe Shemesh, Mark Bloch, Gal Pressman, Cosmin Ratiu
On Thu, May 22, 2025 at 08:51:32AM -0700, Jakub Kicinski wrote:
> On Wed, 21 May 2025 15:08:57 +0300 Tariq Toukan wrote:
> > This series by Cosmin converts mlx5 to use the recently added netdev
> > instance locking scheme.
>
> Are you planning to re-submit this as a PR?
> The subject tag and Leon's reviews being present makes me think
> that applying directly is fine, but I wanted to confirm..
Yes, please apply them directly. There are no changes in IPoIB in this
cycle, so it will be safe to merge the code through netdev.
Cosmin added my Reviewed-by tags after internal review, so I didn't want
to be silly and send my Acked-by on top of the already existing tags.
Thanks
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking
2025-05-21 12:08 [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking Tariq Toukan
` (5 preceding siblings ...)
2025-05-22 15:51 ` [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev " Jakub Kicinski
@ 2025-05-22 16:30 ` patchwork-bot+netdevbpf
6 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-05-22 16:30 UTC (permalink / raw)
To: Tariq Toukan
Cc: davem, kuba, pabeni, edumazet, andrew+netdev, jgg, leon, saeedm,
richardcochran, ast, daniel, hawk, john.fastabend, linux-rdma,
linux-kernel, netdev, bpf, moshe, mbloch, gal, cratiu
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 21 May 2025 15:08:57 +0300 you wrote:
> Hi,
>
> This series by Cosmin converts mlx5 to use the recently added netdev
> instance locking scheme.
>
> Find detailed description by Cosmin below [1].
>
> [...]
Here is the summary with links:
- [net-next,1/5] IB/IPoIB: Enqueue separate work_structs for each flushed interface
https://git.kernel.org/netdev/net-next/c/5f85120e7462
- [net-next,2/5] IB/IPoIB: Replace vlan_rwsem with the netdev instance lock
https://git.kernel.org/netdev/net-next/c/463e51769697
- [net-next,3/5] IB/IPoIB: Allow using netdevs that require the instance lock
https://git.kernel.org/netdev/net-next/c/fd07ba1680ba
- [net-next,4/5] net/mlx5e: Don't drop RTNL during firmware flash
https://git.kernel.org/netdev/net-next/c/d7d4f9f7365a
- [net-next,5/5] net/mlx5e: Convert mlx5 netdevs to instance locking
https://git.kernel.org/netdev/net-next/c/8f7b00307bf1
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 13+ messages in thread
Thread overview: 13+ messages (newest: 2025-05-22 16:30 UTC)
2025-05-21 12:08 [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking Tariq Toukan
2025-05-21 12:08 ` [PATCH net-next 1/5] IB/IPoIB: Enqueue separate work_structs for each flushed interface Tariq Toukan
2025-05-21 12:08 ` [PATCH net-next 2/5] IB/IPoIB: Replace vlan_rwsem with the netdev instance lock Tariq Toukan
2025-05-21 12:09 ` [PATCH net-next 3/5] IB/IPoIB: Allow using netdevs that require the " Tariq Toukan
2025-05-21 12:09 ` [PATCH net-next 4/5] net/mlx5e: Don't drop RTNL during firmware flash Tariq Toukan
2025-05-22 16:00 ` Jakub Kicinski
2025-05-21 12:09 ` [PATCH net-next 5/5] net/mlx5e: Convert mlx5 netdevs to instance locking Tariq Toukan
2025-05-21 18:27 ` Stanislav Fomichev
2025-05-21 20:37 ` Cosmin Ratiu
2025-05-22 15:28 ` Stanislav Fomichev
2025-05-22 15:51 ` [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev " Jakub Kicinski
2025-05-22 16:10 ` Leon Romanovsky
2025-05-22 16:30 ` patchwork-bot+netdevbpf