* [PATCH rdma-next v2 0/4] Handle FW failures to destroy QP/RQ objects
@ 2023-06-05 10:14 Leon Romanovsky
2023-06-05 10:14 ` [PATCH rdma-next v2 1/4] net/mlx5: Nullify qp->dbg pointer post destruction Leon Romanovsky
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Leon Romanovsky @ 2023-06-05 10:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Eric Dumazet, Jakub Kicinski, linux-kernel,
linux-rdma, netdev, Paolo Abeni, Patrisious Haddad,
Saeed Mahameed
From: Leon Romanovsky <leonro@nvidia.com>
Changelog:
v2:
* Reworked DCT patch
v1: https://lore.kernel.org/all/cover.1678973858.git.leon@kernel.org
* Dropped EQ changes
v0: https://lore.kernel.org/all/cover.1649139915.git.leonro@nvidia.com
-----------------------------------------------------------------------
Hi,
This series from Patrisious extends mlx5 driver to convey FW failures
back to the upper layers and allow retry to delete these hardware
resources.
Thanks
Leon Romanovsky (1):
RDMA/mlx5: Reduce QP table exposure
Patrisious Haddad (3):
net/mlx5: Nullify qp->dbg pointer post destruction
RDMA/mlx5: Handle DCT QP logic separately from low level QP interface
RDMA/mlx5: Return the firmware result upon destroying QP/RQ
drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
drivers/infiniband/hw/mlx5/qp.h | 12 ++-
drivers/infiniband/hw/mlx5/qpc.c | 93 +++++++++++--------
.../net/ethernet/mellanox/mlx5/core/debugfs.c | 6 +-
include/linux/mlx5/driver.h | 10 --
5 files changed, 69 insertions(+), 53 deletions(-)
--
2.40.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH rdma-next v2 1/4] net/mlx5: Nullify qp->dbg pointer post destruction
2023-06-05 10:14 [PATCH rdma-next v2 0/4] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
@ 2023-06-05 10:14 ` Leon Romanovsky
2023-06-05 10:14 ` [PATCH rdma-next v2 2/4] RDMA/mlx5: Reduce QP table exposure Leon Romanovsky
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2023-06-05 10:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Patrisious Haddad, Eric Dumazet, Jakub Kicinski, linux-rdma,
netdev, Paolo Abeni, Saeed Mahameed
From: Patrisious Haddad <phaddad@nvidia.com>
Nullifying qp->dbg is a preparation for the next patches
from the series in which mlx5_core_destroy_qp() could actually fail,
and then it can be called again which causes a kernel crash, since
qp->dbg was not nullified in previous call.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/debugfs.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
index bb95b40d25eb..b08b5695ee45 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
@@ -513,11 +513,11 @@ EXPORT_SYMBOL(mlx5_debug_qp_add);
void mlx5_debug_qp_remove(struct mlx5_core_dev *dev, struct mlx5_core_qp *qp)
{
- if (!mlx5_debugfs_root)
+ if (!mlx5_debugfs_root || !qp->dbg)
return;
- if (qp->dbg)
- rem_res_tree(qp->dbg);
+ rem_res_tree(qp->dbg);
+ qp->dbg = NULL;
}
EXPORT_SYMBOL(mlx5_debug_qp_remove);
--
2.40.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH rdma-next v2 2/4] RDMA/mlx5: Reduce QP table exposure
2023-06-05 10:14 [PATCH rdma-next v2 0/4] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
2023-06-05 10:14 ` [PATCH rdma-next v2 1/4] net/mlx5: Nullify qp->dbg pointer post destruction Leon Romanovsky
@ 2023-06-05 10:14 ` Leon Romanovsky
2023-06-05 10:14 ` [PATCH rdma-next v2 3/4] RDMA/mlx5: Handle DCT QP logic separately from low level QP interface Leon Romanovsky
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2023-06-05 10:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Eric Dumazet, Jakub Kicinski, linux-rdma, netdev,
Paolo Abeni, Patrisious Haddad, Saeed Mahameed
From: Leon Romanovsky <leonro@nvidia.com>
driver.h is common header to whole mlx5 code base, but struct
mlx5_qp_table is used in mlx5_ib driver only. So move that struct
to be under sole responsibility of mlx5_ib.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
drivers/infiniband/hw/mlx5/qp.h | 11 ++++++++++-
include/linux/mlx5/driver.h | 9 ---------
3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index fe5e23a06735..1eefefdaa009 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -25,6 +25,7 @@
#include <rdma/mlx5_user_ioctl_verbs.h>
#include "srq.h"
+#include "qp.h"
#define mlx5_ib_dbg(_dev, format, arg...) \
dev_dbg(&(_dev)->ib_dev.dev, "%s:%d:(pid %d): " format, __func__, \
diff --git a/drivers/infiniband/hw/mlx5/qp.h b/drivers/infiniband/hw/mlx5/qp.h
index 77f9b4a54816..f5130873dd52 100644
--- a/drivers/infiniband/hw/mlx5/qp.h
+++ b/drivers/infiniband/hw/mlx5/qp.h
@@ -6,7 +6,16 @@
#ifndef _MLX5_IB_QP_H
#define _MLX5_IB_QP_H
-#include "mlx5_ib.h"
+struct mlx5_ib_dev;
+
+struct mlx5_qp_table {
+ struct notifier_block nb;
+
+ /* protect radix tree
+ */
+ spinlock_t lock;
+ struct radix_tree_root tree;
+};
int mlx5_init_qp_table(struct mlx5_ib_dev *dev);
void mlx5_cleanup_qp_table(struct mlx5_ib_dev *dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index a4c4f737f9c1..e3b616388b18 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -443,15 +443,6 @@ struct mlx5_core_health {
struct delayed_work update_fw_log_ts_work;
};
-struct mlx5_qp_table {
- struct notifier_block nb;
-
- /* protect radix tree
- */
- spinlock_t lock;
- struct radix_tree_root tree;
-};
-
enum {
MLX5_PF_NOTIFY_DISABLE_VF,
MLX5_PF_NOTIFY_ENABLE_VF,
--
2.40.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH rdma-next v2 3/4] RDMA/mlx5: Handle DCT QP logic separately from low level QP interface
2023-06-05 10:14 [PATCH rdma-next v2 0/4] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
2023-06-05 10:14 ` [PATCH rdma-next v2 1/4] net/mlx5: Nullify qp->dbg pointer post destruction Leon Romanovsky
2023-06-05 10:14 ` [PATCH rdma-next v2 2/4] RDMA/mlx5: Reduce QP table exposure Leon Romanovsky
@ 2023-06-05 10:14 ` Leon Romanovsky
2023-06-05 10:14 ` [PATCH rdma-next v2 4/4] RDMA/mlx5: Return the firmware result upon destroying QP/RQ Leon Romanovsky
2023-06-11 9:19 ` [PATCH rdma-next v2 0/4] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
4 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2023-06-05 10:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Patrisious Haddad, Eric Dumazet, Jakub Kicinski, linux-rdma,
netdev, Paolo Abeni, Saeed Mahameed
From: Patrisious Haddad <phaddad@nvidia.com>
Previously when destroying a DCT, if the firmware function for the
destruction failed, the common resource would have been destroyed
either way, since it was destroyed before the firmware object.
Which leads to kernel warning "refcount_t: underflow" which indicates
possible use-after-free.
Which is triggered when we try to destroy the common resource for the
second time and execute refcount_dec_and_test(&common->refcount).
So, let's fix the destruction order by factoring out the DCT QP logic
to be in separate XArray database.
refcount_t: underflow; use-after-free.
WARNING: CPU: 8 PID: 1002 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse
CPU: 8 PID: 1002 Comm: python3 Not tainted 5.16.0-rc5+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0xd8/0xe0
Code: ff 48 c7 c7 18 f5 23 82 c6 05 60 70 ff 00 01 e8 d0 0a 45 00 0f 0b c3 48 c7 c7 c0 f4 23 82 c6 05 4c 70 ff 00 01 e8 ba 0a 45 00 <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13
RSP: 0018:ffff8881221d3aa8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8881313e8d40 RCX: ffff88852cc1b5c8
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88852cc1b5c0
RBP: ffff888100f70000 R08: ffff88853ffd1ba8 R09: 0000000000000003
R10: 00000000fffff000 R11: 3fffffffffffffff R12: 0000000000000246
R13: ffff888100f71fa0 R14: ffff8881221d3c68 R15: 0000000000000020
FS: 00007efebbb13740(0000) GS:ffff88852cc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005611aac29f80 CR3: 00000001313de004 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
destroy_resource_common+0x6e/0x95 [mlx5_ib]
mlx5_core_destroy_rq_tracked+0x38/0xbe [mlx5_ib]
mlx5_ib_destroy_wq+0x22/0x80 [mlx5_ib]
ib_destroy_wq_user+0x1f/0x40 [ib_core]
uverbs_free_wq+0x19/0x40 [ib_uverbs]
destroy_hw_idr_uobject+0x18/0x50 [ib_uverbs]
uverbs_destroy_uobject+0x2f/0x190 [ib_uverbs]
uobj_destroy+0x3c/0x80 [ib_uverbs]
ib_uverbs_cmd_verbs+0x3e4/0xb80 [ib_uverbs]
? uverbs_free_wq+0x40/0x40 [ib_uverbs]
? ip_list_rcv+0xf7/0x120
? netif_receive_skb_list_internal+0x1b6/0x2d0
? task_tick_fair+0xbf/0x450
? __handle_mm_fault+0x11fc/0x1450
ib_uverbs_ioctl+0xa4/0x110 [ib_uverbs]
__x64_sys_ioctl+0x3e4/0x8e0
? handle_mm_fault+0xb9/0x210
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7efebc0be17b
Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe71813e78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe71813fb8 RCX: 00007efebc0be17b
RDX: 00007ffe71813fa0 RSI: 00000000c0181b01 RDI: 0000000000000005
RBP: 00007ffe71813f80 R08: 00005611aae96020 R09: 000000000000004f
R10: 00007efebbf9ffa0 R11: 0000000000000246 R12: 00007ffe71813f80
R13: 00007ffe71813f4c R14: 00005611aae2eca0 R15: 00007efeae6c89d0
</TASK>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/qp.h | 1 +
drivers/infiniband/hw/mlx5/qpc.c | 83 +++++++++++++++++++-------------
include/linux/mlx5/driver.h | 1 -
3 files changed, 51 insertions(+), 34 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/qp.h b/drivers/infiniband/hw/mlx5/qp.h
index f5130873dd52..b6ee7c3ee1ca 100644
--- a/drivers/infiniband/hw/mlx5/qp.h
+++ b/drivers/infiniband/hw/mlx5/qp.h
@@ -10,6 +10,7 @@ struct mlx5_ib_dev;
struct mlx5_qp_table {
struct notifier_block nb;
+ struct xarray dct_xa;
/* protect radix tree
*/
diff --git a/drivers/infiniband/hw/mlx5/qpc.c b/drivers/infiniband/hw/mlx5/qpc.c
index bae0334d6e7f..3a2f755d4985 100644
--- a/drivers/infiniband/hw/mlx5/qpc.c
+++ b/drivers/infiniband/hw/mlx5/qpc.c
@@ -88,23 +88,35 @@ static bool is_event_type_allowed(int rsc_type, int event_type)
}
}
+static int dct_event_notifier(struct mlx5_ib_dev *dev, struct mlx5_eqe *eqe)
+{
+ struct mlx5_core_dct *dct;
+ unsigned long flags;
+ u32 qpn;
+
+ qpn = be32_to_cpu(eqe->data.dct.dctn) & 0xFFFFFF;
+ xa_lock_irqsave(&dev->qp_table.dct_xa, flags);
+ dct = xa_load(&dev->qp_table.dct_xa, qpn);
+ if (dct)
+ complete(&dct->drained);
+ xa_unlock_irqrestore(&dev->qp_table.dct_xa, flags);
+ return NOTIFY_OK;
+}
+
static int rsc_event_notifier(struct notifier_block *nb,
unsigned long type, void *data)
{
+ struct mlx5_ib_dev *dev =
+ container_of(nb, struct mlx5_ib_dev, qp_table.nb);
struct mlx5_core_rsc_common *common;
- struct mlx5_qp_table *table;
- struct mlx5_core_dct *dct;
+ struct mlx5_eqe *eqe = data;
u8 event_type = (u8)type;
struct mlx5_core_qp *qp;
- struct mlx5_eqe *eqe;
u32 rsn;
switch (event_type) {
case MLX5_EVENT_TYPE_DCT_DRAINED:
- eqe = data;
- rsn = be32_to_cpu(eqe->data.dct.dctn) & 0xffffff;
- rsn |= (MLX5_RES_DCT << MLX5_USER_INDEX_LEN);
- break;
+ return dct_event_notifier(dev, eqe);
case MLX5_EVENT_TYPE_PATH_MIG:
case MLX5_EVENT_TYPE_COMM_EST:
case MLX5_EVENT_TYPE_SQ_DRAINED:
@@ -113,7 +125,6 @@ static int rsc_event_notifier(struct notifier_block *nb,
case MLX5_EVENT_TYPE_PATH_MIG_FAILED:
case MLX5_EVENT_TYPE_WQ_INVAL_REQ_ERROR:
case MLX5_EVENT_TYPE_WQ_ACCESS_ERROR:
- eqe = data;
rsn = be32_to_cpu(eqe->data.qp_srq.qp_srq_n) & 0xffffff;
rsn |= (eqe->data.qp_srq.type << MLX5_USER_INDEX_LEN);
break;
@@ -121,8 +132,7 @@ static int rsc_event_notifier(struct notifier_block *nb,
return NOTIFY_DONE;
}
- table = container_of(nb, struct mlx5_qp_table, nb);
- common = mlx5_get_rsc(table, rsn);
+ common = mlx5_get_rsc(&dev->qp_table, rsn);
if (!common)
return NOTIFY_OK;
@@ -137,11 +147,6 @@ static int rsc_event_notifier(struct notifier_block *nb,
qp->event(qp, event_type);
/* Need to put resource in event handler */
return NOTIFY_OK;
- case MLX5_RES_DCT:
- dct = (struct mlx5_core_dct *)common;
- if (event_type == MLX5_EVENT_TYPE_DCT_DRAINED)
- complete(&dct->drained);
- break;
default:
break;
}
@@ -188,28 +193,15 @@ static void destroy_resource_common(struct mlx5_ib_dev *dev,
}
static int _mlx5_core_destroy_dct(struct mlx5_ib_dev *dev,
- struct mlx5_core_dct *dct, bool need_cleanup)
+ struct mlx5_core_dct *dct)
{
u32 in[MLX5_ST_SZ_DW(destroy_dct_in)] = {};
struct mlx5_core_qp *qp = &dct->mqp;
- int err;
- err = mlx5_core_drain_dct(dev, dct);
- if (err) {
- if (dev->mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
- goto destroy;
-
- return err;
- }
- wait_for_completion(&dct->drained);
-destroy:
- if (need_cleanup)
- destroy_resource_common(dev, &dct->mqp);
MLX5_SET(destroy_dct_in, in, opcode, MLX5_CMD_OP_DESTROY_DCT);
MLX5_SET(destroy_dct_in, in, dctn, qp->qpn);
MLX5_SET(destroy_dct_in, in, uid, qp->uid);
- err = mlx5_cmd_exec_in(dev->mdev, destroy_dct, in);
- return err;
+ return mlx5_cmd_exec_in(dev->mdev, destroy_dct, in);
}
int mlx5_core_create_dct(struct mlx5_ib_dev *dev, struct mlx5_core_dct *dct,
@@ -227,13 +219,13 @@ int mlx5_core_create_dct(struct mlx5_ib_dev *dev, struct mlx5_core_dct *dct,
qp->qpn = MLX5_GET(create_dct_out, out, dctn);
qp->uid = MLX5_GET(create_dct_in, in, uid);
- err = create_resource_common(dev, qp, MLX5_RES_DCT);
+ err = xa_err(xa_store_irq(&dev->qp_table.dct_xa, qp->qpn, dct, GFP_KERNEL));
if (err)
goto err_cmd;
return 0;
err_cmd:
- _mlx5_core_destroy_dct(dev, dct, false);
+ _mlx5_core_destroy_dct(dev, dct);
return err;
}
@@ -284,7 +276,31 @@ static int mlx5_core_drain_dct(struct mlx5_ib_dev *dev,
int mlx5_core_destroy_dct(struct mlx5_ib_dev *dev,
struct mlx5_core_dct *dct)
{
- return _mlx5_core_destroy_dct(dev, dct, true);
+ struct mlx5_qp_table *table = &dev->qp_table;
+ struct mlx5_core_dct *tmp;
+ int err;
+
+ err = mlx5_core_drain_dct(dev, dct);
+ if (err) {
+ if (dev->mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
+ goto destroy;
+
+ return err;
+ }
+ wait_for_completion(&dct->drained);
+
+destroy:
+ tmp = xa_cmpxchg_irq(&table->dct_xa, dct->mqp.qpn, dct, XA_ZERO_ENTRY, GFP_KERNEL);
+ if (WARN_ON(tmp != dct))
+ return xa_err(tmp) ?: -EINVAL;
+
+ err = _mlx5_core_destroy_dct(dev, dct);
+ if (err) {
+ xa_cmpxchg_irq(&table->dct_xa, dct->mqp.qpn, XA_ZERO_ENTRY, dct, 0);
+ return err;
+ }
+ xa_erase_irq(&table->dct_xa, dct->mqp.qpn);
+ return 0;
}
int mlx5_core_destroy_qp(struct mlx5_ib_dev *dev, struct mlx5_core_qp *qp)
@@ -488,6 +504,7 @@ int mlx5_init_qp_table(struct mlx5_ib_dev *dev)
spin_lock_init(&table->lock);
INIT_RADIX_TREE(&table->tree, GFP_ATOMIC);
+ xa_init(&table->dct_xa);
mlx5_qp_debugfs_init(dev->mdev);
table->nb.notifier_call = rsc_event_notifier;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index e3b616388b18..e67c603d507b 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -382,7 +382,6 @@ enum mlx5_res_type {
MLX5_RES_SRQ = 3,
MLX5_RES_XSRQ = 4,
MLX5_RES_XRQ = 5,
- MLX5_RES_DCT = MLX5_EVENT_QUEUE_TYPE_DCT,
};
struct mlx5_core_rsc_common {
--
2.40.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH rdma-next v2 4/4] RDMA/mlx5: Return the firmware result upon destroying QP/RQ
2023-06-05 10:14 [PATCH rdma-next v2 0/4] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
` (2 preceding siblings ...)
2023-06-05 10:14 ` [PATCH rdma-next v2 3/4] RDMA/mlx5: Handle DCT QP logic separately from low level QP interface Leon Romanovsky
@ 2023-06-05 10:14 ` Leon Romanovsky
2023-06-11 9:19 ` [PATCH rdma-next v2 0/4] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
4 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2023-06-05 10:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Patrisious Haddad, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed
From: Patrisious Haddad <phaddad@nvidia.com>
Previously when destroying a QP/RQ, the result of the firmware
destruction function was ignored and upper layers weren't informed
about the failure.
Which in turn could lead to various problems since when upper layer
isn't aware of the failure it continues its operation thinking that the
related QP/RQ was successfully destroyed while it actually wasn't,
which could lead to the below kernel WARN.
Currently, we return the correct firmware destruction status to upper
layers which in case of the RQ would be mlx5_ib_destroy_wq() which
was already capable of handling RQ destruction failure or in case of
a QP to destroy_qp_common(), which now would actually warn upon qp
destruction failure.
WARNING: CPU: 3 PID: 995 at drivers/infiniband/core/rdma_core.c:940 uverbs_destroy_ufile_hw+0xcb/0xe0 [ib_uverbs]
Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse
CPU: 3 PID: 995 Comm: python3 Not tainted 5.16.0-rc5+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:uverbs_destroy_ufile_hw+0xcb/0xe0 [ib_uverbs]
Code: 41 5c 41 5d 41 5e e9 44 34 f0 e0 48 89 df e8 4c 77 ff ff 49 8b 86 10 01 00 00 48 85 c0 74 a1 4c 89 e7 ff d0 eb 9a 0f 0b eb c1 <0f> 0b be 04 00 00 00 48 89 df e8 b6 f6 ff ff e9 75 ff ff ff 90 0f
RSP: 0018:ffff8881533e3e78 EFLAGS: 00010287
RAX: ffff88811b2cf3e0 RBX: ffff888106209700 RCX: 0000000000000000
RDX: ffff888106209780 RSI: ffff8881533e3d30 RDI: ffff888109b101a0
RBP: 0000000000000001 R08: ffff888127cb381c R09: 0de9890000000009
R10: ffff888127cb3800 R11: 0000000000000000 R12: ffff888106209780
R13: ffff888106209750 R14: ffff888100f20660 R15: 0000000000000000
FS: 00007f8be353b740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8bd5b117c0 CR3: 000000012cd8a004 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ib_uverbs_close+0x1a/0x90 [ib_uverbs]
__fput+0x82/0x230
task_work_run+0x59/0x90
exit_to_user_mode_prepare+0x138/0x140
syscall_exit_to_user_mode+0x1d/0x50
? __x64_sys_close+0xe/0x40
do_syscall_64+0x4a/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f8be3ae0abb
Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 83 43 f9 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 c1 43 f9 ff 8b 44
RSP: 002b:00007ffdb51909c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000557bb7f7c020 RCX: 00007f8be3ae0abb
RDX: 0000557bb7c74010 RSI: 0000557bb7f14ca0 RDI: 0000000000000005
RBP: 0000557bb7fbd598 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 0000557bb7fbd5b8
R13: 0000557bb7fbd5a8 R14: 0000000000001000 R15: 0000557bb7f7c020
</TASK>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/qpc.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/qpc.c b/drivers/infiniband/hw/mlx5/qpc.c
index 3a2f755d4985..d9cf6982d645 100644
--- a/drivers/infiniband/hw/mlx5/qpc.c
+++ b/drivers/infiniband/hw/mlx5/qpc.c
@@ -314,8 +314,7 @@ int mlx5_core_destroy_qp(struct mlx5_ib_dev *dev, struct mlx5_core_qp *qp)
MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP);
MLX5_SET(destroy_qp_in, in, qpn, qp->qpn);
MLX5_SET(destroy_qp_in, in, uid, qp->uid);
- mlx5_cmd_exec_in(dev->mdev, destroy_qp, in);
- return 0;
+ return mlx5_cmd_exec_in(dev->mdev, destroy_qp, in);
}
int mlx5_core_set_delay_drop(struct mlx5_ib_dev *dev,
@@ -568,14 +567,14 @@ int mlx5_core_xrcd_dealloc(struct mlx5_ib_dev *dev, u32 xrcdn)
return mlx5_cmd_exec_in(dev->mdev, dealloc_xrcd, in);
}
-static void destroy_rq_tracked(struct mlx5_ib_dev *dev, u32 rqn, u16 uid)
+static int destroy_rq_tracked(struct mlx5_ib_dev *dev, u32 rqn, u16 uid)
{
u32 in[MLX5_ST_SZ_DW(destroy_rq_in)] = {};
MLX5_SET(destroy_rq_in, in, opcode, MLX5_CMD_OP_DESTROY_RQ);
MLX5_SET(destroy_rq_in, in, rqn, rqn);
MLX5_SET(destroy_rq_in, in, uid, uid);
- mlx5_cmd_exec_in(dev->mdev, destroy_rq, in);
+ return mlx5_cmd_exec_in(dev->mdev, destroy_rq, in);
}
int mlx5_core_create_rq_tracked(struct mlx5_ib_dev *dev, u32 *in, int inlen,
@@ -606,8 +605,7 @@ int mlx5_core_destroy_rq_tracked(struct mlx5_ib_dev *dev,
struct mlx5_core_qp *rq)
{
destroy_resource_common(dev, rq);
- destroy_rq_tracked(dev, rq->qpn, rq->uid);
- return 0;
+ return destroy_rq_tracked(dev, rq->qpn, rq->uid);
}
static void destroy_sq_tracked(struct mlx5_ib_dev *dev, u32 sqn, u16 uid)
--
2.40.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH rdma-next v2 0/4] Handle FW failures to destroy QP/RQ objects
2023-06-05 10:14 [PATCH rdma-next v2 0/4] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
` (3 preceding siblings ...)
2023-06-05 10:14 ` [PATCH rdma-next v2 4/4] RDMA/mlx5: Return the firmware result upon destroying QP/RQ Leon Romanovsky
@ 2023-06-11 9:19 ` Leon Romanovsky
4 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2023-06-11 9:19 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Eric Dumazet, Jakub Kicinski, linux-kernel, linux-rdma, netdev,
Paolo Abeni, Patrisious Haddad, Saeed Mahameed, Leon Romanovsky
On Mon, 05 Jun 2023 13:14:03 +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Changelog:
> v2:
> * Reworked DCT patch
> v1: https://lore.kernel.org/all/cover.1678973858.git.leon@kernel.org
> * Dropped EQ changes
> v0: https://lore.kernel.org/all/cover.1649139915.git.leonro@nvidia.com
> -----------------------------------------------------------------------
>
> [...]
Applied, thanks!
[1/4] net/mlx5: Nullify qp->dbg pointer post destruction
https://git.kernel.org/rdma/rdma/c/c023b61ac8285d
[2/4] RDMA/mlx5: Reduce QP table exposure
https://git.kernel.org/rdma/rdma/c/2ecfd946169e7f
[3/4] RDMA/mlx5: Handle DCT QP logic separately from low level QP interface
https://git.kernel.org/rdma/rdma/c/afff24899846ff
[4/4] RDMA/mlx5: Return the firmware result upon destroying QP/RQ
https://git.kernel.org/rdma/rdma/c/22664c06e99708
Best regards,
--
Leon Romanovsky <leon@kernel.org>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-06-11 9:19 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-05 10:14 [PATCH rdma-next v2 0/4] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
2023-06-05 10:14 ` [PATCH rdma-next v2 1/4] net/mlx5: Nullify qp->dbg pointer post destruction Leon Romanovsky
2023-06-05 10:14 ` [PATCH rdma-next v2 2/4] RDMA/mlx5: Reduce QP table exposure Leon Romanovsky
2023-06-05 10:14 ` [PATCH rdma-next v2 3/4] RDMA/mlx5: Handle DCT QP logic separately from low level QP interface Leon Romanovsky
2023-06-05 10:14 ` [PATCH rdma-next v2 4/4] RDMA/mlx5: Return the firmware result upon destroying QP/RQ Leon Romanovsky
2023-06-11 9:19 ` [PATCH rdma-next v2 0/4] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).