From: Tariq Toukan <tariqt@nvidia.com>
To: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>,
"Andrew Lunn" <andrew+netdev@lunn.ch>
Cc: Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>,
"Saeed Mahameed" <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>,
"Richard Cochran" <richardcochran@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
<linux-rdma@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
Moshe Shemesh <moshe@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
Gal Pressman <gal@nvidia.com>, Cosmin Ratiu <cratiu@nvidia.com>
Subject: [PATCH net-next 1/5] IB/IPoIB: Enqueue separate work_structs for each flushed interface
Date: Wed, 21 May 2025 15:08:58 +0300
Message-ID: <1747829342-1018757-2-git-send-email-tariqt@nvidia.com>
In-Reply-To: <1747829342-1018757-1-git-send-email-tariqt@nvidia.com>
From: Cosmin Ratiu <cratiu@nvidia.com>

Previously, flushing a netdevice first flushed all of its child devices
from within the parent's flush task. That required holding the lock
that protects the child list (vlan_rwsem) while every child was flushed.

This poses a problem when converting from vlan_rwsem to the netdev
instance lock (next patch): holding the parent's lock while acquiring a
child's lock makes lockdep unhappy, and rightfully so.

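The lockdep concern can be illustrated with a toy user-space model. This is not how lockdep works internally; it is a sketch assuming that every per-device lock belongs to one shared lock class, and it loosely mimics lockdep's "possible recursive locking" report by flagging any acquisition made while another lock of that class is already held. All names (toy_dev, toy_lock, old_flush) are hypothetical. The old parent-holds-lock, recurse-into-children pattern trips the check once per child:

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CHILDREN 4

/* Toy model: every device lock belongs to the same lock class. */
struct toy_dev {
	const char *name;
	int lock_held;                 /* 1 while this device's lock is held */
	struct toy_dev *children[MAX_CHILDREN];
	int nchildren;
};

static int held_in_class;              /* held instances of the shared class */
static int nested_reports;             /* "recursive locking" reports so far */

static void toy_lock(struct toy_dev *d)
{
	assert(!d->lock_held);         /* same instance twice would deadlock */
	if (held_in_class > 0)
		nested_reports++;      /* another lock of this class is held */
	d->lock_held = 1;
	held_in_class++;
}

static void toy_unlock(struct toy_dev *d)
{
	assert(d->lock_held);
	d->lock_held = 0;
	held_in_class--;
}

/* Old scheme: the parent's lock stays held while each child is flushed,
 * and each child flush takes its own lock of the same class. */
static void old_flush(struct toy_dev *d)
{
	toy_lock(d);
	for (int i = 0; i < d->nchildren; i++)
		old_flush(d->children[i]);
	toy_unlock(d);
	printf("flush %s\n", d->name);
}
```

Calling old_flush() on a parent with two children yields two reports. The removed lines of the diff show that the old code avoided the equivalent vlan_rwsem warning by passing a nesting level to down_read_nested().
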
Fix this by splitting the single flush task into individual
per-interface flush tasks (the work_structs already exist in each
ipoib_dev_priv) and adding a helper, ipoib_queue_work(), that enqueues
all of them while holding the list lock.

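A user-space sketch of the helper's shape follows. It is a hypothetical model, not the kernel code: a plain array stands in for the ordered ipoib_workqueue, is_subinterface stands in for IPOIB_FLAG_SUBINTERFACE, and each queued item records the flush level instead of selecting one of the three work_structs. As in ipoib_queue_work() in the diff, children are enqueued before the parent, and subinterfaces skip the child walk:

```c
#include <assert.h>

enum flush_level { FLUSH_LIGHT, FLUSH_NORMAL, FLUSH_HEAVY };

#define MAX_CHILDREN 4
#define QUEUE_CAP 16

struct toy_priv {
	const char *name;
	int is_subinterface;           /* stands in for IPOIB_FLAG_SUBINTERFACE */
	struct toy_priv *children[MAX_CHILDREN];
	int nchildren;
};

struct toy_work {
	struct toy_priv *priv;
	enum flush_level level;
};

/* A FIFO standing in for the ordered workqueue. Unlike the kernel's
 * queue_work(), it does not skip items that are already pending. */
static struct toy_work queue[QUEUE_CAP];
static int queue_len;

static void toy_queue_work(struct toy_priv *priv, enum flush_level level)
{
	assert(queue_len < QUEUE_CAP);
	queue[queue_len].priv = priv;
	queue[queue_len].level = level;
	queue_len++;
}

/* Same shape as ipoib_queue_work(): a parent enqueues one item per child
 * (the real code holds the list lock around this loop), then its own. */
static void toy_ipoib_queue_work(struct toy_priv *priv, enum flush_level level)
{
	if (!priv->is_subinterface) {
		/* real code: down_read(&priv->vlan_rwsem) */
		for (int i = 0; i < priv->nchildren; i++)
			toy_ipoib_queue_work(priv->children[i], level);
		/* real code: up_read(&priv->vlan_rwsem) */
	}
	toy_queue_work(priv, level);
}
```

The list lock is held only while items are being added to the queue; the flush bodies run later, each on its own, so no flush task needs to hold a parent lock while taking a child lock.
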
ipoib_set_mac() does not use the helper and enqueues the flush tasks
directly, because later patches in this series change the locking and
ipoib_set_mac() may then be called with the netdev instance lock
already held.

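The reason for open-coding the loop in ipoib_set_mac() can be sketched under one assumption not spelled out here: that after the later locking changes, the helper takes the same non-recursive lock that ipoib_set_mac()'s caller may already hold. The toy lock below reports a second acquisition instead of hanging; toy_lock, toy_queue_all_light() and toy_set_mac_locked() are hypothetical names:

```c
#include <assert.h>

#define MAX_CHILDREN 4

/* Toy non-recursive lock: a second acquisition from the same context
 * returns an error instead of self-deadlocking. */
struct toy_lock { int held; };

static int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -1;             /* would self-deadlock */
	l->held = 1;
	return 0;
}

static void toy_lock_release(struct toy_lock *l) { l->held = 0; }

struct toy_priv {
	struct toy_lock lock;          /* assumed list lock after later patches */
	struct toy_priv *children[MAX_CHILDREN];
	int nchildren;
	int flush_light_queued;
};

/* Helper that takes the list lock itself (assumed future shape). */
static int toy_queue_all_light(struct toy_priv *priv)
{
	if (toy_lock_acquire(&priv->lock))
		return -1;
	for (int i = 0; i < priv->nchildren; i++)
		priv->children[i]->flush_light_queued = 1;
	toy_lock_release(&priv->lock);
	priv->flush_light_queued = 1;
	return 0;
}

/* set_mac-style caller that already holds the lock: it walks the child
 * list directly instead of calling the helper. */
static void toy_set_mac_locked(struct toy_priv *priv)
{
	assert(priv->lock.held);
	for (int i = 0; i < priv->nchildren; i++)
		priv->children[i]->flush_light_queued = 1;
	priv->flush_light_queued = 1;
}
```
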
This is effectively a no-op: the workqueue is single-threaded and
ordered, so it executes the same flush operations in the same order as
before.

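The ordering argument can be checked with a small model (hypothetical names throughout). The old scheme runs one work item that recurses into the children before flushing the parent; the new scheme enqueues one item per device, children first, on a FIFO that runs items one at a time, as an ordered workqueue does. Both produce the same sequence of flush bodies:

```c
#include <assert.h>
#include <string.h>

#define MAX_CHILDREN 4
#define MAX_LOG 16

struct toy_dev {
	const char *name;
	struct toy_dev *children[MAX_CHILDREN];
	int nchildren;
};

/* Order in which flush bodies actually ran. */
static const char *ran[MAX_LOG];
static int nran;

static void flush_body(struct toy_dev *d)
{
	assert(nran < MAX_LOG);
	ran[nran++] = d->name;
}

/* Old: a single work item flushes the children recursively, then itself. */
static void old_flush(struct toy_dev *d)
{
	for (int i = 0; i < d->nchildren; i++)
		old_flush(d->children[i]);
	flush_body(d);
}

/* New: one item per device on a FIFO, children enqueued first. */
static struct toy_dev *fifo[MAX_LOG];
static int fifo_len;

static void enqueue_all(struct toy_dev *d)
{
	for (int i = 0; i < d->nchildren; i++)
		enqueue_all(d->children[i]);
	assert(fifo_len < MAX_LOG);
	fifo[fifo_len++] = d;
}

/* Ordered queue: items run one at a time, in enqueue order. */
static void drain(void)
{
	for (int i = 0; i < fifo_len; i++)
		flush_body(fifo[i]);
	fifo_len = 0;
}

/* Test helper: did the i-th flush body belong to the named device? */
static int ran_at(int i, const char *name)
{
	return i < nran && strcmp(ran[i], name) == 0;
}
```
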
Furthermore, no new races should be introduced:
ipoib_parent_unregister_pre() stops new work from being generated and
then calls flush_workqueue(), which waits for all currently enqueued
work, including the per-child tasks, to finish before returning.

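The teardown argument depends on ordering: stop the producers first, then flush. A minimal single-threaded sketch follows, with hypothetical names; toy_flush_workqueue() plays the role of flush_workqueue() by running everything queued at call time. It shows that the extra per-child items are drained along with the parent's and that nothing can be queued afterwards:

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_CAP 8

/* Toy single-threaded queue; each item is just a device id. */
static int pending[QUEUE_CAP];
static int npending;
static bool stopped;                   /* models "new work generation stopped" */
static int executed;                   /* number of items that have run */
static int last_run;                   /* id of the most recently run item */

/* Producer side, e.g. an event handler. Refuses work once stopped. */
static bool toy_queue(int id)
{
	if (stopped)
		return false;
	assert(npending < QUEUE_CAP);
	pending[npending++] = id;
	return true;
}

/* Models flush_workqueue(): returns only after every item queued at
 * call time has run, in FIFO order. */
static void toy_flush_workqueue(void)
{
	for (int i = 0; i < npending; i++) {
		last_run = pending[i];
		executed++;
	}
	npending = 0;
}

/* Teardown order: stop producers first, then flush. */
static void toy_unregister_pre(void)
{
	stopped = true;
	toy_flush_workqueue();
}
```

If the order were reversed, a producer could enqueue new items after the flush returned, which is exactly the window the stop-then-flush sequence closes.
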
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/infiniband/ulp/ipoib/ipoib.h | 2 +
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 46 ++++++++++++++--------
drivers/infiniband/ulp/ipoib/ipoib_main.c | 10 ++++-
drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 8 ++--
4 files changed, 44 insertions(+), 22 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index abe0522b7df4..2e05e9c9317d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -512,6 +512,8 @@ int ipoib_intf_init(struct ib_device *hca, u32 port, const char *format,
void ipoib_ib_dev_flush_light(struct work_struct *work);
void ipoib_ib_dev_flush_normal(struct work_struct *work);
void ipoib_ib_dev_flush_heavy(struct work_struct *work);
+void ipoib_queue_work(struct ipoib_dev_priv *priv,
+ enum ipoib_flush_level level);
void ipoib_ib_tx_timeout_work(struct work_struct *work);
void ipoib_ib_dev_cleanup(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 5cde275daa94..e0e7f600097d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -1172,24 +1172,11 @@ static bool ipoib_dev_addr_changed_valid(struct ipoib_dev_priv *priv)
}
static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
- enum ipoib_flush_level level,
- int nesting)
+ enum ipoib_flush_level level)
{
- struct ipoib_dev_priv *cpriv;
struct net_device *dev = priv->dev;
int result;
- down_read_nested(&priv->vlan_rwsem, nesting);
-
- /*
- * Flush any child interfaces too -- they might be up even if
- * the parent is down.
- */
- list_for_each_entry(cpriv, &priv->child_intfs, list)
- __ipoib_ib_dev_flush(cpriv, level, nesting + 1);
-
- up_read(&priv->vlan_rwsem);
-
if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) &&
level != IPOIB_FLUSH_HEAVY) {
/* Make sure the dev_addr is set even if not flushing */
@@ -1280,7 +1267,7 @@ void ipoib_ib_dev_flush_light(struct work_struct *work)
struct ipoib_dev_priv *priv =
container_of(work, struct ipoib_dev_priv, flush_light);
- __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_LIGHT, 0);
+ __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_LIGHT);
}
void ipoib_ib_dev_flush_normal(struct work_struct *work)
@@ -1288,7 +1275,7 @@ void ipoib_ib_dev_flush_normal(struct work_struct *work)
struct ipoib_dev_priv *priv =
container_of(work, struct ipoib_dev_priv, flush_normal);
- __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_NORMAL, 0);
+ __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_NORMAL);
}
void ipoib_ib_dev_flush_heavy(struct work_struct *work)
@@ -1297,10 +1284,35 @@ void ipoib_ib_dev_flush_heavy(struct work_struct *work)
container_of(work, struct ipoib_dev_priv, flush_heavy);
rtnl_lock();
- __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_HEAVY, 0);
+ __ipoib_ib_dev_flush(priv, IPOIB_FLUSH_HEAVY);
rtnl_unlock();
}
+void ipoib_queue_work(struct ipoib_dev_priv *priv,
+ enum ipoib_flush_level level)
+{
+ if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
+ struct ipoib_dev_priv *cpriv;
+
+ down_read(&priv->vlan_rwsem);
+ list_for_each_entry(cpriv, &priv->child_intfs, list)
+ ipoib_queue_work(cpriv, level);
+ up_read(&priv->vlan_rwsem);
+ }
+
+ switch (level) {
+ case IPOIB_FLUSH_LIGHT:
+ queue_work(ipoib_workqueue, &priv->flush_light);
+ break;
+ case IPOIB_FLUSH_NORMAL:
+ queue_work(ipoib_workqueue, &priv->flush_normal);
+ break;
+ case IPOIB_FLUSH_HEAVY:
+ queue_work(ipoib_workqueue, &priv->flush_heavy);
+ break;
+ }
+}
+
void ipoib_ib_dev_cleanup(struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 3b463db8ce39..55b1f3cbee17 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -2415,6 +2415,14 @@ static int ipoib_set_mac(struct net_device *dev, void *addr)
set_base_guid(priv, (union ib_gid *)(ss->__data + 4));
+ if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
+ struct ipoib_dev_priv *cpriv;
+
+ down_read(&priv->vlan_rwsem);
+ list_for_each_entry(cpriv, &priv->child_intfs, list)
+ queue_work(ipoib_workqueue, &cpriv->flush_light);
+ up_read(&priv->vlan_rwsem);
+ }
queue_work(ipoib_workqueue, &priv->flush_light);
return 0;
@@ -2526,7 +2534,7 @@ static struct net_device *ipoib_add_port(const char *format,
ib_register_event_handler(&priv->event_handler);
/* call event handler to ensure pkey in sync */
- queue_work(ipoib_workqueue, &priv->flush_heavy);
+ ipoib_queue_work(priv, IPOIB_FLUSH_HEAVY);
ndev->rtnl_link_ops = ipoib_get_link_ops();
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 368e5d77416d..86983080d28b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -280,15 +280,15 @@ void ipoib_event(struct ib_event_handler *handler,
dev_name(&record->device->dev), record->element.port_num);
if (record->event == IB_EVENT_CLIENT_REREGISTER) {
- queue_work(ipoib_workqueue, &priv->flush_light);
+ ipoib_queue_work(priv, IPOIB_FLUSH_LIGHT);
} else if (record->event == IB_EVENT_PORT_ERR ||
record->event == IB_EVENT_PORT_ACTIVE ||
record->event == IB_EVENT_LID_CHANGE) {
- queue_work(ipoib_workqueue, &priv->flush_normal);
+ ipoib_queue_work(priv, IPOIB_FLUSH_NORMAL);
} else if (record->event == IB_EVENT_PKEY_CHANGE) {
- queue_work(ipoib_workqueue, &priv->flush_heavy);
+ ipoib_queue_work(priv, IPOIB_FLUSH_HEAVY);
} else if (record->event == IB_EVENT_GID_CHANGE &&
!test_bit(IPOIB_FLAG_DEV_ADDR_SET, &priv->flags)) {
- queue_work(ipoib_workqueue, &priv->flush_light);
+ ipoib_queue_work(priv, IPOIB_FLUSH_LIGHT);
}
}
--
2.31.1
Thread overview: 13+ messages
2025-05-21 12:08 [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev instance locking Tariq Toukan
2025-05-21 12:08 ` Tariq Toukan [this message]
2025-05-21 12:08 ` [PATCH net-next 2/5] IB/IPoIB: Replace vlan_rwsem with the netdev instance lock Tariq Toukan
2025-05-21 12:09 ` [PATCH net-next 3/5] IB/IPoIB: Allow using netdevs that require the " Tariq Toukan
2025-05-21 12:09 ` [PATCH net-next 4/5] net/mlx5e: Don't drop RTNL during firmware flash Tariq Toukan
2025-05-22 16:00 ` Jakub Kicinski
2025-05-21 12:09 ` [PATCH net-next 5/5] net/mlx5e: Convert mlx5 netdevs to instance locking Tariq Toukan
2025-05-21 18:27 ` Stanislav Fomichev
2025-05-21 20:37 ` Cosmin Ratiu
2025-05-22 15:28 ` Stanislav Fomichev
2025-05-22 15:51 ` [PATCH net-next 0/5] net/mlx5: Convert mlx5 to netdev " Jakub Kicinski
2025-05-22 16:10 ` Leon Romanovsky
2025-05-22 16:30 ` patchwork-bot+netdevbpf