* [PATCH rdma-next 0/3] Allow parallel cleanup of HW objects
@ 2024-10-31 11:22 Leon Romanovsky
2024-10-31 11:22 ` [PATCH rdma-next 1/3] RDMA/core: Add device ufile cleanup operation Leon Romanovsky
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Leon Romanovsky @ 2024-10-31 11:22 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: linux-kernel, linux-rdma, Patrisious Haddad
This series from Patrisious adds a new device operation that lets the
driver clean up HW objects in parallel with the ufile cleanup. This is
useful for drivers that have HW objects which are not associated with
kernel structures and don't have any dependencies on other objects.

In the mlx5 case, we use this new operation to clean up DEVX QP
objects, which are independent of the rest of the verbs objects (like
PD, CQ, etc.).
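
As a rough sketch of how a driver opts in (the mydrv_* names are
hypothetical; the actual mlx5 wiring is in patch 3):

/* Hypothetical driver hookup for the new operation. The callback may
 * destroy independent HW objects in parallel; the core still runs the
 * regular serialized ufile cleanup afterwards.
 */
static void mydrv_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
{
	/* issue parallel HW destroy commands for eligible uobjects */
}

static const struct ib_device_ops mydrv_dev_ops = {
	.ufile_hw_cleanup = mydrv_ufile_hw_cleanup,
};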
Thanks
Patrisious Haddad (3):
RDMA/core: Add device ufile cleanup operation
RDMA/core: Move ib_uverbs_file struct to uverbs_types.h
RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation
drivers/infiniband/core/device.c | 1 +
drivers/infiniband/core/rdma_core.c | 12 +++-
drivers/infiniband/core/uverbs.h | 31 ----------
drivers/infiniband/hw/mlx5/devx.c | 93 ++++++++++++++++++++++++++++-
drivers/infiniband/hw/mlx5/devx.h | 4 ++
drivers/infiniband/hw/mlx5/main.c | 1 +
include/rdma/ib_verbs.h | 6 ++
include/rdma/uverbs_types.h | 33 ++++++++++
8 files changed, 146 insertions(+), 35 deletions(-)
--
2.46.2
* [PATCH rdma-next 1/3] RDMA/core: Add device ufile cleanup operation
2024-10-31 11:22 [PATCH rdma-next 0/3] Allow parallel cleanup of HW objects Leon Romanovsky
@ 2024-10-31 11:22 ` Leon Romanovsky
2024-10-31 11:22 ` [PATCH rdma-next 2/3] RDMA/core: Move ib_uverbs_file struct to uverbs_types.h Leon Romanovsky
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2024-10-31 11:22 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Patrisious Haddad, linux-rdma
From: Patrisious Haddad <phaddad@nvidia.com>
Add a driver operation to allow preemptive cleanup of ufile HW
resources before the standard ufile cleanup flow begins, expediting
the final cleanup phase and leading to a faster teardown overall.

This allows the use of driver-specific cleanup procedures to make the
cleanup process more efficient.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/device.c | 1 +
drivers/infiniband/core/rdma_core.c | 7 ++++++-
include/rdma/ib_verbs.h | 6 ++++++
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 93c6d27b5d8f..ca9b956c034d 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2760,6 +2760,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, resize_cq);
SET_DEVICE_OP(dev_ops, set_vf_guid);
SET_DEVICE_OP(dev_ops, set_vf_link_state);
+ SET_DEVICE_OP(dev_ops, ufile_hw_cleanup);
SET_OBJ_SIZE(dev_ops, ib_ah);
SET_OBJ_SIZE(dev_ops, ib_counters);
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 29b1ab1d5f93..02ef09e77bf8 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -880,9 +880,14 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
static int __uverbs_cleanup_ufile(struct ib_uverbs_file *ufile,
enum rdma_remove_reason reason)
{
+ struct uverbs_attr_bundle attrs = { .ufile = ufile };
+ struct ib_ucontext *ucontext = ufile->ucontext;
+ struct ib_device *ib_dev = ucontext->device;
struct ib_uobject *obj, *next_obj;
int ret = -EINVAL;
- struct uverbs_attr_bundle attrs = { .ufile = ufile };
+
+ if (ib_dev->ops.ufile_hw_cleanup)
+ ib_dev->ops.ufile_hw_cleanup(ufile);
/*
* This shouldn't run while executing other commands on this
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 67551133b522..3417636da960 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2675,6 +2675,12 @@ struct ib_device_ops {
*/
void (*del_sub_dev)(struct ib_device *sub_dev);
+ /**
+ * ufile_hw_cleanup - Attempt to clean up the HW resources of the
+ * uobjects inside the ufile.
+ */
+ void (*ufile_hw_cleanup)(struct ib_uverbs_file *ufile);
+
DECLARE_RDMA_OBJ_SIZE(ib_ah);
DECLARE_RDMA_OBJ_SIZE(ib_counters);
DECLARE_RDMA_OBJ_SIZE(ib_cq);
--
2.46.2
* [PATCH rdma-next 2/3] RDMA/core: Move ib_uverbs_file struct to uverbs_types.h
2024-10-31 11:22 [PATCH rdma-next 0/3] Allow parallel cleanup of HW objects Leon Romanovsky
2024-10-31 11:22 ` [PATCH rdma-next 1/3] RDMA/core: Add device ufile cleanup operation Leon Romanovsky
@ 2024-10-31 11:22 ` Leon Romanovsky
2024-10-31 11:22 ` [PATCH rdma-next 3/3] RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation Leon Romanovsky
2024-11-04 8:46 ` [PATCH rdma-next 0/3] Allow parallel cleanup of HW objects Leon Romanovsky
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2024-10-31 11:22 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Patrisious Haddad, linux-rdma
From: Patrisious Haddad <phaddad@nvidia.com>
In light of the previous commit, make ib_uverbs_file accessible to
drivers by moving its definition to uverbs_types.h, allowing drivers
to freely access the struct argument and build their own cleanup flow.

For the same reason, expose the uverbs_try_lock_object() function so
that drivers can safely take ownership of uverbs objects.
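
As a rough usage sketch (hypothetical mydrv_* naming; patch 3 carries
the real mlx5 implementation), a driver cleanup callback can now walk
the file's uobjects directly:

/* Sketch only: assumes the core invokes this from
 * __uverbs_cleanup_ufile() with hw_destroy_rwsem held for write, so
 * the uobjects list is stable and the try-lock is expected to succeed.
 */
static void mydrv_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
{
	struct ib_uobject *uobject;

	list_for_each_entry(uobject, &ufile->uobjects, list) {
		if (uverbs_try_lock_object(uobject, UVERBS_LOOKUP_WRITE))
			continue;
		/* ... destroy HW state of recognized object types ... */

		/* Drop exclusive ownership so the serialized cleanup
		 * that follows can lock the object again.
		 */
		atomic_set(&uobject->usecnt, 0);
	}
}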
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/rdma_core.c | 5 +++--
drivers/infiniband/core/uverbs.h | 31 ---------------------------
include/rdma/uverbs_types.h | 33 +++++++++++++++++++++++++++++
3 files changed, 36 insertions(+), 33 deletions(-)
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 02ef09e77bf8..90c177edf9b0 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -58,8 +58,8 @@ void uverbs_uobject_put(struct ib_uobject *uobject)
}
EXPORT_SYMBOL(uverbs_uobject_put);
-static int uverbs_try_lock_object(struct ib_uobject *uobj,
- enum rdma_lookup_mode mode)
+int uverbs_try_lock_object(struct ib_uobject *uobj,
+ enum rdma_lookup_mode mode)
{
/*
* When a shared access is required, we use a positive counter. Each
@@ -84,6 +84,7 @@ static int uverbs_try_lock_object(struct ib_uobject *uobj,
}
return 0;
}
+EXPORT_SYMBOL(uverbs_try_lock_object);
static void assert_uverbs_usecnt(struct ib_uobject *uobj,
enum rdma_lookup_mode mode)
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index dfd2e5a86e6f..797e2fcc8072 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -133,37 +133,6 @@ struct ib_uverbs_completion_event_file {
struct ib_uverbs_event_queue ev_queue;
};
-struct ib_uverbs_file {
- struct kref ref;
- struct ib_uverbs_device *device;
- struct mutex ucontext_lock;
- /*
- * ucontext must be accessed via ib_uverbs_get_ucontext() or with
- * ucontext_lock held
- */
- struct ib_ucontext *ucontext;
- struct ib_uverbs_async_event_file *default_async_file;
- struct list_head list;
-
- /*
- * To access the uobjects list hw_destroy_rwsem must be held for write
- * OR hw_destroy_rwsem held for read AND uobjects_lock held.
- * hw_destroy_rwsem should be called across any destruction of the HW
- * object of an associated uobject.
- */
- struct rw_semaphore hw_destroy_rwsem;
- spinlock_t uobjects_lock;
- struct list_head uobjects;
-
- struct mutex umap_lock;
- struct list_head umaps;
- struct page *disassociate_page;
-
- struct xarray idr;
-
- struct mutex disassociation_lock;
-};
-
struct ib_uverbs_event {
union {
struct ib_uverbs_async_event_desc async;
diff --git a/include/rdma/uverbs_types.h b/include/rdma/uverbs_types.h
index ccd11631c167..26ba919ac245 100644
--- a/include/rdma/uverbs_types.h
+++ b/include/rdma/uverbs_types.h
@@ -134,6 +134,8 @@ static inline void uverbs_uobject_get(struct ib_uobject *uobject)
}
void uverbs_uobject_put(struct ib_uobject *uobject);
+int uverbs_try_lock_object(struct ib_uobject *uobj, enum rdma_lookup_mode mode);
+
struct uverbs_obj_fd_type {
/*
* In fd based objects, uverbs_obj_type_ops points to generic
@@ -150,6 +152,37 @@ struct uverbs_obj_fd_type {
int flags;
};
+struct ib_uverbs_file {
+ struct kref ref;
+ struct ib_uverbs_device *device;
+ struct mutex ucontext_lock;
+ /*
+ * ucontext must be accessed via ib_uverbs_get_ucontext() or with
+ * ucontext_lock held
+ */
+ struct ib_ucontext *ucontext;
+ struct ib_uverbs_async_event_file *default_async_file;
+ struct list_head list;
+
+ /*
+ * To access the uobjects list hw_destroy_rwsem must be held for write
+ * OR hw_destroy_rwsem held for read AND uobjects_lock held.
+ * hw_destroy_rwsem should be called across any destruction of the HW
+ * object of an associated uobject.
+ */
+ struct rw_semaphore hw_destroy_rwsem;
+ spinlock_t uobjects_lock;
+ struct list_head uobjects;
+
+ struct mutex umap_lock;
+ struct list_head umaps;
+ struct page *disassociate_page;
+
+ struct xarray idr;
+
+ struct mutex disassociation_lock;
+};
+
extern const struct uverbs_obj_type_class uverbs_idr_class;
extern const struct uverbs_obj_type_class uverbs_fd_class;
int uverbs_uobject_fd_release(struct inode *inode, struct file *filp);
--
2.46.2
* [PATCH rdma-next 3/3] RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation
2024-10-31 11:22 [PATCH rdma-next 0/3] Allow parallel cleanup of HW objects Leon Romanovsky
2024-10-31 11:22 ` [PATCH rdma-next 1/3] RDMA/core: Add device ufile cleanup operation Leon Romanovsky
2024-10-31 11:22 ` [PATCH rdma-next 2/3] RDMA/core: Move ib_uverbs_file struct to uverbs_types.h Leon Romanovsky
@ 2024-10-31 11:22 ` Leon Romanovsky
2024-11-04 8:46 ` [PATCH rdma-next 0/3] Allow parallel cleanup of HW objects Leon Romanovsky
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2024-10-31 11:22 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Patrisious Haddad, linux-rdma
From: Patrisious Haddad <phaddad@nvidia.com>
Implement the ufile_hw_cleanup device operation, which iterates over
the ufile uobjects list and attempts to destroy DevX QPs by issuing up
to 8 destruction commands in parallel.

This function is responsible only for cleaning up the FW resources of
a QP, and doesn't necessarily clean up all of its resources. Hence,
the normal serialized cleanup flow is still executed afterwards, in
__uverbs_cleanup_ufile(), to release the remaining resources and
handle the cleanup of the SW objects.

In order to avoid double cleanup of the FW resources, a new DevX flag,
DEVX_OBJ_FLAGS_HW_FREED, was added; it marks an object's FW resources
as already freed.

Since QP destruction is the most time-consuming operation in FW,
parallelizing it reduces the cleanup time of applications that use
DevX QPs.
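
In sketch form, the parallelism is a simple sliding window of at most
MAX_ASYNC_CMDS (8) commands in flight; issue_one(), wait_one() and
is_devx_qp() are hypothetical stand-ins for devx_async_destroy(),
devx_wait_async_destroy() and the object-type check in the diff below:

/* Bounded-parallelism pattern used by mlx5_ib_ufile_hw_cleanup():
 * slots are reused modulo MAX_ASYNC_CMDS, and the oldest command is
 * waited on whenever the window is full.
 */
struct mlx5_async_cmd cmds[MAX_ASYNC_CMDS];
struct ib_uobject *uobject;
int head = 0, tail = 0;

list_for_each_entry(uobject, &ufile->uobjects, list) {
	if (!is_devx_qp(uobject))
		continue;
	issue_one(&cmds[tail % MAX_ASYNC_CMDS], uobject);
	tail++;
	if (tail - head == MAX_ASYNC_CMDS) {	/* window is full */
		wait_one(&cmds[head % MAX_ASYNC_CMDS]);
		head++;
	}
}
while (head != tail) {	/* drain commands still in flight */
	wait_one(&cmds[head % MAX_ASYNC_CMDS]);
	head++;
}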
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/devx.c | 93 ++++++++++++++++++++++++++++++-
drivers/infiniband/hw/mlx5/devx.h | 4 ++
drivers/infiniband/hw/mlx5/main.c | 1 +
3 files changed, 97 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 5027a39ab1dd..a4a661e533bf 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -27,6 +27,19 @@ enum devx_obj_flags {
DEVX_OBJ_FLAGS_INDIRECT_MKEY = 1 << 0,
DEVX_OBJ_FLAGS_DCT = 1 << 1,
DEVX_OBJ_FLAGS_CQ = 1 << 2,
+ DEVX_OBJ_FLAGS_HW_FREED = 1 << 3,
+};
+
+#define MAX_ASYNC_CMDS 8
+
+struct mlx5_async_cmd {
+ struct ib_uobject *uobject;
+ void *in;
+ int in_size;
+ u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)];
+ int err;
+ struct mlx5_async_work cb_work;
+ struct completion comp;
};
struct devx_async_data {
@@ -1405,7 +1418,9 @@ static int devx_obj_cleanup(struct ib_uobject *uobject,
*/
mlx5r_deref_wait_odp_mkey(&obj->mkey);
- if (obj->flags & DEVX_OBJ_FLAGS_DCT)
+ if (obj->flags & DEVX_OBJ_FLAGS_HW_FREED)
+ ret = 0;
+ else if (obj->flags & DEVX_OBJ_FLAGS_DCT)
ret = mlx5_core_destroy_dct(obj->ib_dev, &obj->core_dct);
else if (obj->flags & DEVX_OBJ_FLAGS_CQ)
ret = mlx5_core_destroy_cq(obj->ib_dev->mdev, &obj->core_cq);
@@ -2596,6 +2611,82 @@ void mlx5_ib_devx_cleanup(struct mlx5_ib_dev *dev)
}
}
+static void devx_async_destroy_cb(int status, struct mlx5_async_work *context)
+{
+ struct mlx5_async_cmd *devx_out = container_of(context,
+ struct mlx5_async_cmd, cb_work);
+ struct devx_obj *obj = devx_out->uobject->object;
+
+ if (!status)
+ obj->flags |= DEVX_OBJ_FLAGS_HW_FREED;
+
+ complete(&devx_out->comp);
+}
+
+static void devx_async_destroy(struct mlx5_ib_dev *dev,
+ struct mlx5_async_cmd *cmd)
+{
+ init_completion(&cmd->comp);
+ cmd->err = mlx5_cmd_exec_cb(&dev->async_ctx, cmd->in, cmd->in_size,
+ &cmd->out, sizeof(cmd->out),
+ devx_async_destroy_cb, &cmd->cb_work);
+}
+
+static void devx_wait_async_destroy(struct mlx5_async_cmd *cmd)
+{
+ if (!cmd->err)
+ wait_for_completion(&cmd->comp);
+ atomic_set(&cmd->uobject->usecnt, 0);
+}
+
+void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
+{
+ struct mlx5_async_cmd async_cmd[MAX_ASYNC_CMDS];
+ struct ib_ucontext *ucontext = ufile->ucontext;
+ struct ib_device *device = ucontext->device;
+ struct mlx5_ib_dev *dev = to_mdev(device);
+ struct ib_uobject *uobject;
+ struct devx_obj *obj;
+ int head = 0;
+ int tail = 0;
+
+ list_for_each_entry(uobject, &ufile->uobjects, list) {
+ WARN_ON(uverbs_try_lock_object(uobject, UVERBS_LOOKUP_WRITE));
+
+ /*
+ * Currently we only support QP destruction, if other objects
+ * are to be destroyed need to add type synchronization to the
+ * cleanup algorithm and handle pre/post FW cleanup for the
+ * new types if needed.
+ */
+ if (uobj_get_object_id(uobject) != MLX5_IB_OBJECT_DEVX_OBJ ||
+ (get_dec_obj_type(uobject->object, MLX5_EVENT_TYPE_MAX) !=
+ MLX5_OBJ_TYPE_QP)) {
+ atomic_set(&uobject->usecnt, 0);
+ continue;
+ }
+
+ obj = uobject->object;
+
+ async_cmd[tail % MAX_ASYNC_CMDS].in = obj->dinbox;
+ async_cmd[tail % MAX_ASYNC_CMDS].in_size = obj->dinlen;
+ async_cmd[tail % MAX_ASYNC_CMDS].uobject = uobject;
+
+ devx_async_destroy(dev, &async_cmd[tail % MAX_ASYNC_CMDS]);
+ tail++;
+
+ if (tail - head == MAX_ASYNC_CMDS) {
+ devx_wait_async_destroy(&async_cmd[head % MAX_ASYNC_CMDS]);
+ head++;
+ }
+ }
+
+ while (head != tail) {
+ devx_wait_async_destroy(&async_cmd[head % MAX_ASYNC_CMDS]);
+ head++;
+ }
+}
+
static ssize_t devx_async_cmd_event_read(struct file *filp, char __user *buf,
size_t count, loff_t *pos)
{
diff --git a/drivers/infiniband/hw/mlx5/devx.h b/drivers/infiniband/hw/mlx5/devx.h
index ee2213275fd6..1344bf4c9d21 100644
--- a/drivers/infiniband/hw/mlx5/devx.h
+++ b/drivers/infiniband/hw/mlx5/devx.h
@@ -28,6 +28,7 @@ int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user);
void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev, u16 uid);
int mlx5_ib_devx_init(struct mlx5_ib_dev *dev);
void mlx5_ib_devx_cleanup(struct mlx5_ib_dev *dev);
+void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile);
#else
static inline int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user)
{
@@ -41,5 +42,8 @@ static inline int mlx5_ib_devx_init(struct mlx5_ib_dev *dev)
static inline void mlx5_ib_devx_cleanup(struct mlx5_ib_dev *dev)
{
}
+static inline void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
+{
+}
#endif
#endif /* _MLX5_IB_DEVX_H */
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 5038c52b79aa..65da5df05d02 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4149,6 +4149,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
.req_notify_cq = mlx5_ib_arm_cq,
.rereg_user_mr = mlx5_ib_rereg_user_mr,
.resize_cq = mlx5_ib_resize_cq,
+ .ufile_hw_cleanup = mlx5_ib_ufile_hw_cleanup,
INIT_RDMA_OBJ_SIZE(ib_ah, mlx5_ib_ah, ibah),
INIT_RDMA_OBJ_SIZE(ib_counters, mlx5_ib_mcounters, ibcntrs),
--
2.46.2
* Re: [PATCH rdma-next 0/3] Allow parallel cleanup of HW objects
2024-10-31 11:22 [PATCH rdma-next 0/3] Allow parallel cleanup of HW objects Leon Romanovsky
` (2 preceding siblings ...)
2024-10-31 11:22 ` [PATCH rdma-next 3/3] RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation Leon Romanovsky
@ 2024-11-04 8:46 ` Leon Romanovsky
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2024-11-04 8:46 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: linux-kernel, linux-rdma, Patrisious Haddad
On Thu, 31 Oct 2024 13:22:50 +0200, Leon Romanovsky wrote:
> This series from Patrisious adds a new device operation that lets the
> driver clean up HW objects in parallel with the ufile cleanup. This is
> useful for drivers that have HW objects which are not associated with
> kernel structures and don't have any dependencies on other objects.
>
> In the mlx5 case, we use this new operation to clean up DEVX QP
> objects, which are independent of the rest of the verbs objects (like
> PD, CQ, etc.).
>
> [...]
Applied, thanks!
[1/3] RDMA/core: Add device ufile cleanup operation
https://git.kernel.org/rdma/rdma/c/e18f73a885df74
[2/3] RDMA/core: Move ib_uverbs_file struct to uverbs_types.h
https://git.kernel.org/rdma/rdma/c/1e1faa6232cf05
[3/3] RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation
https://git.kernel.org/rdma/rdma/c/6c2af7e3ebe6b5
Best regards,
--
Leon Romanovsky <leon@kernel.org>