netdev.vger.kernel.org archive mirror
* [PATCH rdma-next 0/3] Handle FW failures to destroy QP/RQ objects
@ 2022-04-05  8:12 Leon Romanovsky
  2022-04-05  8:12 ` [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction Leon Romanovsky
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Leon Romanovsky @ 2022-04-05  8:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Jakub Kicinski, linux-rdma, netdev, Paolo Abeni,
	Patrisious Haddad, Saeed Mahameed, Yishai Hadas

From: Leon Romanovsky <leonro@nvidia.com>

Hi,

This series from Patrisious extends the mlx5 driver to convey FW
failures back to the upper layers and to allow retrying the deletion
of these hardware resources.

Thanks

Patrisious Haddad (3):
  net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction
  RDMA/mlx5: Handling dct common resource destruction upon firmware
    failure
  RDMA/mlx5: Return the firmware result upon destroying QP/RQ

 drivers/infiniband/hw/mlx5/qpc.c              | 13 +++++++------
 .../net/ethernet/mellanox/mlx5/core/debugfs.c | 12 ++++++------
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  | 19 +++++++++++++------
 3 files changed, 26 insertions(+), 18 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction
  2022-04-05  8:12 [PATCH rdma-next 0/3] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
@ 2022-04-05  8:12 ` Leon Romanovsky
  2022-04-05 19:48   ` Saeed Mahameed
  2022-04-05  8:12 ` [PATCH rdma-next 2/3] RDMA/mlx5: Handling dct common resource destruction upon firmware failure Leon Romanovsky
  2022-04-05  8:12 ` [PATCH rdma-next 3/3] RDMA/mlx5: Return the firmware result upon destroying QP/RQ Leon Romanovsky
  2 siblings, 1 reply; 8+ messages in thread
From: Leon Romanovsky @ 2022-04-05  8:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Patrisious Haddad, Jakub Kicinski, linux-rdma, netdev,
	Paolo Abeni, Saeed Mahameed, Yishai Hadas

From: Patrisious Haddad <phaddad@nvidia.com>

Prior to this patch, if destroy_unmap_eq() failed and was called
again, it triggered an additional call of mlx5_debug_eq_remove(),
which caused a kernel crash since eq->dbg was not nullified in the
previous call.

Fix it by nullifying the eq->dbg pointer after removal.

As for qp->dbg, the change is a preparation for the next patches in
the series, in which mlx5_core_destroy_qp() could actually fail and
have the same outcome.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/debugfs.c | 12 ++++++------
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  | 19 +++++++++++++------
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
index 3d3e55a5cb11..9b96a1ca0779 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
@@ -486,11 +486,11 @@ EXPORT_SYMBOL(mlx5_debug_qp_add);
 
 void mlx5_debug_qp_remove(struct mlx5_core_dev *dev, struct mlx5_core_qp *qp)
 {
-	if (!mlx5_debugfs_root)
+	if (!mlx5_debugfs_root || !qp->dbg)
 		return;
 
-	if (qp->dbg)
-		rem_res_tree(qp->dbg);
+	rem_res_tree(qp->dbg);
+	qp->dbg = NULL;
 }
 EXPORT_SYMBOL(mlx5_debug_qp_remove);
 
@@ -512,11 +512,11 @@ int mlx5_debug_eq_add(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 
 void mlx5_debug_eq_remove(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 {
-	if (!mlx5_debugfs_root)
+	if (!mlx5_debugfs_root || !eq->dbg)
 		return;
 
-	if (eq->dbg)
-		rem_res_tree(eq->dbg);
+	rem_res_tree(eq->dbg);
+	eq->dbg = NULL;
 }
 
 int mlx5_debug_cq_add(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 229728c80233..3c61f355cdac 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -386,16 +386,20 @@ void mlx5_eq_disable(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 }
 EXPORT_SYMBOL(mlx5_eq_disable);
 
-static int destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
+static int destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
+			    bool reentry)
 {
 	int err;
 
 	mlx5_debug_eq_remove(dev, eq);
 
 	err = mlx5_cmd_destroy_eq(dev, eq->eqn);
-	if (err)
+	if (err) {
 		mlx5_core_warn(dev, "failed to destroy a previously created eq: eqn %d\n",
 			       eq->eqn);
+		if (reentry)
+			return err;
+	}
 
 	mlx5_frag_buf_free(dev, &eq->frag_buf);
 	return err;
@@ -481,7 +485,7 @@ static int destroy_async_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 	int err;
 
 	mutex_lock(&eq_table->lock);
-	err = destroy_unmap_eq(dev, eq);
+	err = destroy_unmap_eq(dev, eq, false);
 	mutex_unlock(&eq_table->lock);
 	return err;
 }
@@ -748,12 +752,15 @@ EXPORT_SYMBOL(mlx5_eq_create_generic);
 
 int mlx5_eq_destroy_generic(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 {
+	struct mlx5_eq_table *eq_table = dev->priv.eq_table;
 	int err;
 
 	if (IS_ERR(eq))
 		return -EINVAL;
 
-	err = destroy_async_eq(dev, eq);
+	mutex_lock(&eq_table->lock);
+	err = destroy_unmap_eq(dev, eq, true);
+	mutex_unlock(&eq_table->lock);
 	if (err)
 		goto out;
 
@@ -851,7 +858,7 @@ static void destroy_comp_eqs(struct mlx5_core_dev *dev)
 	list_for_each_entry_safe(eq, n, &table->comp_eqs_list, list) {
 		list_del(&eq->list);
 		mlx5_eq_disable(dev, &eq->core, &eq->irq_nb);
-		if (destroy_unmap_eq(dev, &eq->core))
+		if (destroy_unmap_eq(dev, &eq->core, false))
 			mlx5_core_warn(dev, "failed to destroy comp EQ 0x%x\n",
 				       eq->core.eqn);
 		tasklet_disable(&eq->tasklet_ctx.task);
@@ -915,7 +922,7 @@ static int create_comp_eqs(struct mlx5_core_dev *dev)
 			goto clean_eq;
 		err = mlx5_eq_enable(dev, &eq->core, &eq->irq_nb);
 		if (err) {
-			destroy_unmap_eq(dev, &eq->core);
+			destroy_unmap_eq(dev, &eq->core, false);
 			goto clean_eq;
 		}
 
-- 
2.35.1



* [PATCH rdma-next 2/3] RDMA/mlx5: Handling dct common resource destruction upon firmware failure
  2022-04-05  8:12 [PATCH rdma-next 0/3] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
  2022-04-05  8:12 ` [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction Leon Romanovsky
@ 2022-04-05  8:12 ` Leon Romanovsky
  2022-04-05  8:12 ` [PATCH rdma-next 3/3] RDMA/mlx5: Return the firmware result upon destroying QP/RQ Leon Romanovsky
  2 siblings, 0 replies; 8+ messages in thread
From: Leon Romanovsky @ 2022-04-05  8:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Patrisious Haddad, Jakub Kicinski, linux-rdma, netdev,
	Paolo Abeni, Saeed Mahameed, Yishai Hadas

From: Patrisious Haddad <phaddad@nvidia.com>

Previously when destroying a DCT, if the firmware destruction function
failed, the common resource would have been destroyed either way, since
it was destroyed before the firmware object. This leads to the kernel
warning "refcount_t: underflow", which indicates a possible
use-after-free and is triggered when we try to destroy the common
resource for the second time and execute
refcount_dec_and_test(&common->refcount).

Now, before destroying the common resource, we check its refcount and
continue with the destruction only if it isn't zero.

refcount_t: underflow; use-after-free.
WARNING: CPU: 8 PID: 1002 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse
CPU: 8 PID: 1002 Comm: python3 Not tainted 5.16.0-rc5+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0xd8/0xe0
Code: ff 48 c7 c7 18 f5 23 82 c6 05 60 70 ff 00 01 e8 d0 0a 45 00 0f 0b c3 48 c7 c7 c0 f4 23 82 c6 05 4c 70 ff 00 01 e8 ba 0a 45 00 <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13
RSP: 0018:ffff8881221d3aa8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8881313e8d40 RCX: ffff88852cc1b5c8
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88852cc1b5c0
RBP: ffff888100f70000 R08: ffff88853ffd1ba8 R09: 0000000000000003
R10: 00000000fffff000 R11: 3fffffffffffffff R12: 0000000000000246
R13: ffff888100f71fa0 R14: ffff8881221d3c68 R15: 0000000000000020
FS:  00007efebbb13740(0000) GS:ffff88852cc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005611aac29f80 CR3: 00000001313de004 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 destroy_resource_common+0x6e/0x95 [mlx5_ib]
 mlx5_core_destroy_rq_tracked+0x38/0xbe [mlx5_ib]
 mlx5_ib_destroy_wq+0x22/0x80 [mlx5_ib]
 ib_destroy_wq_user+0x1f/0x40 [ib_core]
 uverbs_free_wq+0x19/0x40 [ib_uverbs]
 destroy_hw_idr_uobject+0x18/0x50 [ib_uverbs]
 uverbs_destroy_uobject+0x2f/0x190 [ib_uverbs]
 uobj_destroy+0x3c/0x80 [ib_uverbs]
 ib_uverbs_cmd_verbs+0x3e4/0xb80 [ib_uverbs]
 ? uverbs_free_wq+0x40/0x40 [ib_uverbs]
 ? ip_list_rcv+0xf7/0x120
 ? netif_receive_skb_list_internal+0x1b6/0x2d0
 ? task_tick_fair+0xbf/0x450
 ? __handle_mm_fault+0x11fc/0x1450
 ib_uverbs_ioctl+0xa4/0x110 [ib_uverbs]
 __x64_sys_ioctl+0x3e4/0x8e0
 ? handle_mm_fault+0xb9/0x210
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7efebc0be17b
Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe71813e78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe71813fb8 RCX: 00007efebc0be17b
RDX: 00007ffe71813fa0 RSI: 00000000c0181b01 RDI: 0000000000000005
RBP: 00007ffe71813f80 R08: 00005611aae96020 R09: 000000000000004f
R10: 00007efebbf9ffa0 R11: 0000000000000246 R12: 00007ffe71813f80
R13: 00007ffe71813f4c R14: 00005611aae2eca0 R15: 00007efeae6c89d0
 </TASK>

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/qpc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/qpc.c b/drivers/infiniband/hw/mlx5/qpc.c
index 542e4c63a8de..d9ce73a2fbeb 100644
--- a/drivers/infiniband/hw/mlx5/qpc.c
+++ b/drivers/infiniband/hw/mlx5/qpc.c
@@ -178,6 +178,9 @@ static void destroy_resource_common(struct mlx5_ib_dev *dev,
 	struct mlx5_qp_table *table = &dev->qp_table;
 	unsigned long flags;
 
+	if (refcount_read(&qp->common.refcount) == 0)
+		return;
+
 	spin_lock_irqsave(&table->lock, flags);
 	radix_tree_delete(&table->tree,
 			  qp->qpn | (qp->common.res << MLX5_USER_INDEX_LEN));
-- 
2.35.1



* [PATCH rdma-next 3/3] RDMA/mlx5: Return the firmware result upon destroying QP/RQ
  2022-04-05  8:12 [PATCH rdma-next 0/3] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
  2022-04-05  8:12 ` [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction Leon Romanovsky
  2022-04-05  8:12 ` [PATCH rdma-next 2/3] RDMA/mlx5: Handling dct common resource destruction upon firmware failure Leon Romanovsky
@ 2022-04-05  8:12 ` Leon Romanovsky
  2 siblings, 0 replies; 8+ messages in thread
From: Leon Romanovsky @ 2022-04-05  8:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Patrisious Haddad, Jakub Kicinski, linux-rdma, netdev,
	Paolo Abeni, Saeed Mahameed, Yishai Hadas

From: Patrisious Haddad <phaddad@nvidia.com>

Previously when destroying a QP/RQ, the result of the firmware
destruction function was ignored and upper layers weren't informed
about the failure. This could lead to various problems: since the
upper layer isn't aware of the failure, it continues its operation
thinking that the related QP/RQ was successfully destroyed while it
actually wasn't, which could lead to the kernel WARN below.

Now we return the firmware destruction status to the upper layers,
which in the case of the RQ is mlx5_ib_destroy_wq(), already capable
of handling RQ destruction failure, or in the case of a QP is
destroy_qp_common(), which now actually warns upon QP destruction
failure.

WARNING: CPU: 3 PID: 995 at drivers/infiniband/core/rdma_core.c:940 uverbs_destroy_ufile_hw+0xcb/0xe0 [ib_uverbs]
Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse
CPU: 3 PID: 995 Comm: python3 Not tainted 5.16.0-rc5+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:uverbs_destroy_ufile_hw+0xcb/0xe0 [ib_uverbs]
Code: 41 5c 41 5d 41 5e e9 44 34 f0 e0 48 89 df e8 4c 77 ff ff 49 8b 86 10 01 00 00 48 85 c0 74 a1 4c 89 e7 ff d0 eb 9a 0f 0b eb c1 <0f> 0b be 04 00 00 00 48 89 df e8 b6 f6 ff ff e9 75 ff ff ff 90 0f
RSP: 0018:ffff8881533e3e78 EFLAGS: 00010287
RAX: ffff88811b2cf3e0 RBX: ffff888106209700 RCX: 0000000000000000
RDX: ffff888106209780 RSI: ffff8881533e3d30 RDI: ffff888109b101a0
RBP: 0000000000000001 R08: ffff888127cb381c R09: 0de9890000000009
R10: ffff888127cb3800 R11: 0000000000000000 R12: ffff888106209780
R13: ffff888106209750 R14: ffff888100f20660 R15: 0000000000000000
FS:  00007f8be353b740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8bd5b117c0 CR3: 000000012cd8a004 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ib_uverbs_close+0x1a/0x90 [ib_uverbs]
 __fput+0x82/0x230
 task_work_run+0x59/0x90
 exit_to_user_mode_prepare+0x138/0x140
 syscall_exit_to_user_mode+0x1d/0x50
 ? __x64_sys_close+0xe/0x40
 do_syscall_64+0x4a/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f8be3ae0abb
Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 83 43 f9 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 c1 43 f9 ff 8b 44
RSP: 002b:00007ffdb51909c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000557bb7f7c020 RCX: 00007f8be3ae0abb
RDX: 0000557bb7c74010 RSI: 0000557bb7f14ca0 RDI: 0000000000000005
RBP: 0000557bb7fbd598 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 0000557bb7fbd5b8
R13: 0000557bb7fbd5a8 R14: 0000000000001000 R15: 0000557bb7f7c020
 </TASK>

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/qpc.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qpc.c b/drivers/infiniband/hw/mlx5/qpc.c
index d9ce73a2fbeb..73bae024c01a 100644
--- a/drivers/infiniband/hw/mlx5/qpc.c
+++ b/drivers/infiniband/hw/mlx5/qpc.c
@@ -300,8 +300,7 @@ int mlx5_core_destroy_qp(struct mlx5_ib_dev *dev, struct mlx5_core_qp *qp)
 	MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP);
 	MLX5_SET(destroy_qp_in, in, qpn, qp->qpn);
 	MLX5_SET(destroy_qp_in, in, uid, qp->uid);
-	mlx5_cmd_exec_in(dev->mdev, destroy_qp, in);
-	return 0;
+	return mlx5_cmd_exec_in(dev->mdev, destroy_qp, in);
 }
 
 int mlx5_core_set_delay_drop(struct mlx5_ib_dev *dev,
@@ -551,14 +550,14 @@ int mlx5_core_xrcd_dealloc(struct mlx5_ib_dev *dev, u32 xrcdn)
 	return mlx5_cmd_exec_in(dev->mdev, dealloc_xrcd, in);
 }
 
-static void destroy_rq_tracked(struct mlx5_ib_dev *dev, u32 rqn, u16 uid)
+static int destroy_rq_tracked(struct mlx5_ib_dev *dev, u32 rqn, u16 uid)
 {
 	u32 in[MLX5_ST_SZ_DW(destroy_rq_in)] = {};
 
 	MLX5_SET(destroy_rq_in, in, opcode, MLX5_CMD_OP_DESTROY_RQ);
 	MLX5_SET(destroy_rq_in, in, rqn, rqn);
 	MLX5_SET(destroy_rq_in, in, uid, uid);
-	mlx5_cmd_exec_in(dev->mdev, destroy_rq, in);
+	return mlx5_cmd_exec_in(dev->mdev, destroy_rq, in);
 }
 
 int mlx5_core_create_rq_tracked(struct mlx5_ib_dev *dev, u32 *in, int inlen,
@@ -589,8 +588,7 @@ int mlx5_core_destroy_rq_tracked(struct mlx5_ib_dev *dev,
 				 struct mlx5_core_qp *rq)
 {
 	destroy_resource_common(dev, rq);
-	destroy_rq_tracked(dev, rq->qpn, rq->uid);
-	return 0;
+	return destroy_rq_tracked(dev, rq->qpn, rq->uid);
 }
 
 static void destroy_sq_tracked(struct mlx5_ib_dev *dev, u32 sqn, u16 uid)
-- 
2.35.1



* Re: [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction
  2022-04-05  8:12 ` [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction Leon Romanovsky
@ 2022-04-05 19:48   ` Saeed Mahameed
  2022-04-06  7:55     ` Leon Romanovsky
  0 siblings, 1 reply; 8+ messages in thread
From: Saeed Mahameed @ 2022-04-05 19:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Patrisious Haddad, Jakub Kicinski, linux-rdma,
	netdev, Paolo Abeni, Saeed Mahameed, Yishai Hadas

On 05 Apr 11:12, Leon Romanovsky wrote:
>From: Patrisious Haddad <phaddad@nvidia.com>
>
>Prior to this patch in the case that destroy_unmap_eq()
>failed and was called again, it triggered an additional call of

Where is it failing and being called again? This shouldn't even be an
option; we try to keep mlx5 symmetrical: constructors and destructors are
supposed to be called only once, in their respective positions.
The callers must be fixed to avoid re-entry, or the destructors changed to
clean up all resources even on failures. No matter what, do not invent
re-entry protocols for mlx5 destructors.

>mlx5_debug_eq_remove() which causes a kernel crash, since
>eq->dbg was not nullified in previous call.
>

[...]

> int mlx5_debug_cq_add(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq)
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>index 229728c80233..3c61f355cdac 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>@@ -386,16 +386,20 @@ void mlx5_eq_disable(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
> }
> EXPORT_SYMBOL(mlx5_eq_disable);
>
>-static int destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
>+static int destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
>+			    bool reentry)
> {
> 	int err;
>
> 	mlx5_debug_eq_remove(dev, eq);
>
> 	err = mlx5_cmd_destroy_eq(dev, eq->eqn);
>-	if (err)
>+	if (err) {
> 		mlx5_core_warn(dev, "failed to destroy a previously created eq: eqn %d\n",
> 			       eq->eqn);
>+		if (reentry)
>+			return err;
>+	}
>
> 	mlx5_frag_buf_free(dev, &eq->frag_buf);
> 	return err;
>@@ -481,7 +485,7 @@ static int destroy_async_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
> 	int err;
>
> 	mutex_lock(&eq_table->lock);
>-	err = destroy_unmap_eq(dev, eq);
>+	err = destroy_unmap_eq(dev, eq, false);
> 	mutex_unlock(&eq_table->lock);
> 	return err;
> }
>@@ -748,12 +752,15 @@ EXPORT_SYMBOL(mlx5_eq_create_generic);
>
> int mlx5_eq_destroy_generic(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
> {
>+	struct mlx5_eq_table *eq_table = dev->priv.eq_table;
> 	int err;
>
> 	if (IS_ERR(eq))
> 		return -EINVAL;
>
>-	err = destroy_async_eq(dev, eq);
>+	mutex_lock(&eq_table->lock);

Here you are inventing the re-entry.
Please drop this and fix it properly. And avoid boolean parameters to mlx5
core functions as much as possible; let's keep mlx5_core simple.

>+	err = destroy_unmap_eq(dev, eq, true);
>+	mutex_unlock(&eq_table->lock);
> 	if (err)
> 		goto out;
>
>@@ -851,7 +858,7 @@ static void destroy_comp_eqs(struct mlx5_core_dev *dev)
> 	list_for_each_entry_safe(eq, n, &table->comp_eqs_list, list) {
> 		list_del(&eq->list);
> 		mlx5_eq_disable(dev, &eq->core, &eq->irq_nb);
>-		if (destroy_unmap_eq(dev, &eq->core))
>+		if (destroy_unmap_eq(dev, &eq->core, false))
> 			mlx5_core_warn(dev, "failed to destroy comp EQ 0x%x\n",
> 				       eq->core.eqn);
> 		tasklet_disable(&eq->tasklet_ctx.task);
>@@ -915,7 +922,7 @@ static int create_comp_eqs(struct mlx5_core_dev *dev)
> 			goto clean_eq;
> 		err = mlx5_eq_enable(dev, &eq->core, &eq->irq_nb);
> 		if (err) {
>-			destroy_unmap_eq(dev, &eq->core);
>+			destroy_unmap_eq(dev, &eq->core, false);
> 			goto clean_eq;
> 		}
>
>-- 
>2.35.1
>


* Re: [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction
  2022-04-05 19:48   ` Saeed Mahameed
@ 2022-04-06  7:55     ` Leon Romanovsky
  2022-04-08 19:30       ` Saeed Mahameed
  0 siblings, 1 reply; 8+ messages in thread
From: Leon Romanovsky @ 2022-04-06  7:55 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Jason Gunthorpe, Patrisious Haddad, Jakub Kicinski, linux-rdma,
	netdev, Paolo Abeni, Saeed Mahameed, Yishai Hadas

On Tue, Apr 05, 2022 at 12:48:45PM -0700, Saeed Mahameed wrote:
> On 05 Apr 11:12, Leon Romanovsky wrote:
> > From: Patrisious Haddad <phaddad@nvidia.com>
> > 
> > Prior to this patch in the case that destroy_unmap_eq()
> > failed and was called again, it triggered an additional call of
> 
> Where is it being failed and called again ? this shouldn't even be an
> option, we try to keep mlx5 symmetrical, constructors and destructors are
> supposed to be called only once in their respective positions.
> the callers must be fixed to avoid re-entry, or change destructors to clear
> up all resources even on failures, no matter what do not invent a reentry
> protocols to mlx5 destructors.

It can happen when a QP is exposed through the DEVX interface. In that
flow, only FW knows about it and reference counts all users. This means
that an attempt to destroy such a QP will fail, but mlx5_core is
structured in such a way that all cleanup is done before calling the FW
to get the success/fail response.

For more detailed information, see this cover letter:
https://lore.kernel.org/all/20200907120921.476363-1-leon@kernel.org/

<...>

> > int mlx5_eq_destroy_generic(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
> > {
> > +	struct mlx5_eq_table *eq_table = dev->priv.eq_table;
> > 	int err;
> > 
> > 	if (IS_ERR(eq))
> > 		return -EINVAL;
> > 
> > -	err = destroy_async_eq(dev, eq);
> > +	mutex_lock(&eq_table->lock);
> 
> Here you are inventing the re-entry. Please drop this and fix properly. And
> avoid boolean parameters to mlx5 core
> functions as much as possible, let's keep mlx5_core simple.

If after reading the link above, you were not convinced, let's take it offline.

Thanks


* Re: [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction
  2022-04-06  7:55     ` Leon Romanovsky
@ 2022-04-08 19:30       ` Saeed Mahameed
  2022-04-10  7:58         ` Leon Romanovsky
  0 siblings, 1 reply; 8+ messages in thread
From: Saeed Mahameed @ 2022-04-08 19:30 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Saeed Mahameed, Jason Gunthorpe, Patrisious Haddad,
	Jakub Kicinski, linux-rdma, netdev, Paolo Abeni, Yishai Hadas

On 06 Apr 10:55, Leon Romanovsky wrote:
>On Tue, Apr 05, 2022 at 12:48:45PM -0700, Saeed Mahameed wrote:
>> On 05 Apr 11:12, Leon Romanovsky wrote:
>> > From: Patrisious Haddad <phaddad@nvidia.com>
>> >
>> > Prior to this patch in the case that destroy_unmap_eq()
>> > failed and was called again, it triggered an additional call of
>>
>> Where is it being failed and called again ? this shouldn't even be an
>> option, we try to keep mlx5 symmetrical, constructors and destructors are
>> supposed to be called only once in their respective positions.
>> the callers must be fixed to avoid re-entry, or change destructors to clear
>> up all resources even on failures, no matter what do not invent a reentry
>> protocols to mlx5 destructors.
>
>It can happen when QP is exposed through DEVX interface. In that flow,
>only FW knows about it and reference count all users. This means that
>attempt to destroy such QP will fail, but mlx5_core is structured in
>such way that all cleanup was done before calling to FW to get
>success/fail response.

I wasn't talking about destroy_qp; actually destroy_qp is implemented the
way I am asking you to implement destroy_eq(): remove debugfs on the first
call to destroy the EQ, and drop the reentry logic from
mlx5_eq_destroy_generic and destroy_async_eq.

The EQ is a core/mlx5_ib resource; it's not exposed to users nor to DEVX,
so it shouldn't be subject to DEVX limitations.

Also, looking at the destroy_qp implementation, it removes the debugfs
entry unconditionally even if the QP has a ref count and removal will fail
in FW. Just FYI.

For EQ I don't even understand why DEVX can cause ODP EQ removal to fail.
You must fix this at the mlx5_ib layer, but for this patch, please drop the
re-entry and remove debugfs in destroy_eq unconditionally.

>
>For more detailed information, see this cover letter:
>https://lore.kernel.org/all/20200907120921.476363-1-leon@kernel.org/
>
><...>
>
>> > int mlx5_eq_destroy_generic(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
>> > {
>> > +	struct mlx5_eq_table *eq_table = dev->priv.eq_table;
>> > 	int err;
>> >
>> > 	if (IS_ERR(eq))
>> > 		return -EINVAL;
>> >
>> > -	err = destroy_async_eq(dev, eq);
>> > +	mutex_lock(&eq_table->lock);
>>
>> Here you are inventing the re-entry. Please drop this and fix properly. And
>> avoid boolean parameters to mlx5 core
>> functions as much as possible, let's keep mlx5_core simple.
>
>If after reading the link above, you were not convinced, let's take it offline.
>

I am not convinced, see above.

>Thanks


* Re: [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction
  2022-04-08 19:30       ` Saeed Mahameed
@ 2022-04-10  7:58         ` Leon Romanovsky
  0 siblings, 0 replies; 8+ messages in thread
From: Leon Romanovsky @ 2022-04-10  7:58 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, Jason Gunthorpe, Patrisious Haddad,
	Jakub Kicinski, linux-rdma, netdev, Paolo Abeni, Yishai Hadas

On Fri, Apr 08, 2022 at 12:30:35PM -0700, Saeed Mahameed wrote:
> On 06 Apr 10:55, Leon Romanovsky wrote:
> > On Tue, Apr 05, 2022 at 12:48:45PM -0700, Saeed Mahameed wrote:
> > > On 05 Apr 11:12, Leon Romanovsky wrote:
> > > > From: Patrisious Haddad <phaddad@nvidia.com>
> > > >
> > > > Prior to this patch in the case that destroy_unmap_eq()
> > > > failed and was called again, it triggered an additional call of
> > > 
> > > Where is it being failed and called again ? this shouldn't even be an
> > > option, we try to keep mlx5 symmetrical, constructors and destructors are
> > > supposed to be called only once in their respective positions.
> > > the callers must be fixed to avoid re-entry, or change destructors to clear
> > > up all resources even on failures, no matter what do not invent a reentry
> > > protocols to mlx5 destructors.
> > 
> > It can happen when QP is exposed through DEVX interface. In that flow,
> > only FW knows about it and reference count all users. This means that
> > attempt to destroy such QP will fail, but mlx5_core is structured in
> > such way that all cleanup was done before calling to FW to get
> > success/fail response.
> 
> I wasn't talking about destroy_qp, actually destroy_qp is implemented the
> way i am asking you to implement destroy_eq(); remove debugfs on first call
> to destroy EQ, and drop the reentry logic from from mlx5_eq_destroy_generic
> and destroy_async_eq.
> 
> EQ is a core/mlx5_ib resources, it's not exposed to user nor DEVX, it
> shouldn't be subject to DEVX limitations.

I tend to agree with you. I'll take another look on it and resubmit.

> 
> Also looking at the destroy_qp implementation, it removes the debugfs
> unconditionally even if the QP has ref count and removal will fail in FW.
> just FYI.

Right, we don't care about debugfs.

> 
> For EQ I don't even understand why devx can cause ODP EQ removal to fail..
> you must fix this at mlx5_ib layer, but for this patch, please drop the
> re-entry and remove debugfs in destroy_eq, unconditionally.

The reason for the complexity is not debugfs, but the existence of the
"mlx5_frag_buf_free(dev, &eq->frag_buf);" line, which runs after the FW
command is executed.

We need to separate two flows: the one that can tolerate FW cmd failures
and the one that can't. If you don't add the "reentry" flag, you can
(theoretically) find yourself leaking ->frag_buf in the flows that don't
know how to re-enter.

I'll resubmit.

Thanks

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread

Thread overview: 8+ messages
2022-04-05  8:12 [PATCH rdma-next 0/3] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
2022-04-05  8:12 ` [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction Leon Romanovsky
2022-04-05 19:48   ` Saeed Mahameed
2022-04-06  7:55     ` Leon Romanovsky
2022-04-08 19:30       ` Saeed Mahameed
2022-04-10  7:58         ` Leon Romanovsky
2022-04-05  8:12 ` [PATCH rdma-next 2/3] RDMA/mlx5: Handling dct common resource destruction upon firmware failure Leon Romanovsky
2022-04-05  8:12 ` [PATCH rdma-next 3/3] RDMA/mlx5: Return the firmware result upon destroying QP/RQ Leon Romanovsky
