* [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel
@ 2024-09-19 3:05 Selvin Xavier
2024-09-19 3:05 ` [PATCH v2 for-rc 1/6] RDMA/bnxt_re: Fix a possible memory leak Selvin Xavier
` (7 more replies)
0 siblings, 8 replies; 14+ messages in thread
From: Selvin Xavier @ 2024-09-19 3:05 UTC (permalink / raw)
To: leon, jgg
Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
Selvin Xavier
A few generic fixes in the bnxt_re driver for the 6.12 kernel.
Please review and apply.
Thanks,
Selvin Xavier
v1 -> v2:
- Add a patch that removes the irq variant of the spinlock and
  uses spin_lock_bh, since control path processing happens in
  tasklet context.
- Address Zhu Yanjun's comments by initializing the newly
  added spin lock.
- Include one more fix in the series.
Kalesh AP (2):
RDMA/bnxt_re: Fix a possible memory leak
RDMA/bnxt_re: Add a check for memory allocation
Saravanan Vajravel (1):
RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
Selvin Xavier (3):
RDMA/bnxt_re: Fix the usage of control path spin locks
RDMA/bnxt_re: synchronize the qp-handle table array
RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode
drivers/infiniband/hw/bnxt_re/ib_verbs.c | 6 ++++-
drivers/infiniband/hw/bnxt_re/main.c | 5 +++-
drivers/infiniband/hw/bnxt_re/qplib_fp.c | 4 ++++
drivers/infiniband/hw/bnxt_re/qplib_fp.h | 2 +-
drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 38 +++++++++++++++---------------
drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 ++
drivers/infiniband/hw/bnxt_re/qplib_res.c | 2 ++
7 files changed, 37 insertions(+), 22 deletions(-)
--
2.5.5
* [PATCH v2 for-rc 1/6] RDMA/bnxt_re: Fix a possible memory leak
2024-09-19 3:05 [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier
@ 2024-09-19 3:05 ` Selvin Xavier
2024-09-19 3:05 ` [PATCH v2 for-rc 2/6] RDMA/bnxt_re: Fix incorrect AVID type in WQE structure Selvin Xavier
` (6 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Selvin Xavier @ 2024-09-19 3:05 UTC (permalink / raw)
To: leon, jgg
Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
Selvin Xavier
From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
In bnxt_re_setup_chip_ctx(), when bnxt_qplib_map_db_bar() fails,
the driver does not free the memory allocated for "rdev->chip_ctx".
Free it in the error path.
Fixes: 0ac20faf5d83 ("RDMA/bnxt_re: Reorg the bar mapping")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
---
drivers/infiniband/hw/bnxt_re/main.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 16a84ca..72719c8 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -186,8 +186,11 @@ static int bnxt_re_setup_chip_ctx(struct bnxt_re_dev *rdev)
bnxt_re_set_db_offset(rdev);
rc = bnxt_qplib_map_db_bar(&rdev->qplib_res);
- if (rc)
+ if (rc) {
+ kfree(rdev->chip_ctx);
+ rdev->chip_ctx = NULL;
return rc;
+ }
if (bnxt_qplib_determine_atomics(en_dev->pdev))
ibdev_info(&rdev->ibdev,
--
2.5.5
* [PATCH v2 for-rc 2/6] RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
2024-09-19 3:05 [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier
2024-09-19 3:05 ` [PATCH v2 for-rc 1/6] RDMA/bnxt_re: Fix a possible memory leak Selvin Xavier
@ 2024-09-19 3:05 ` Selvin Xavier
2024-09-19 3:05 ` [PATCH v2 for-rc 3/6] RDMA/bnxt_re: Add a check for memory allocation Selvin Xavier
` (5 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Selvin Xavier @ 2024-09-19 3:05 UTC (permalink / raw)
To: leon, jgg
Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
Saravanan Vajravel, Selvin Xavier
From: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
The driver uses an internal data structure to construct the WQE
frame. It declared the AVID field as a u16, which can accommodate
only up to 64K AVs. When the outstanding AVID crosses 64K, the
driver truncates the AVID and hence posts an incorrect AVID in
the WR. This leads to WR failure due to an invalid AV ID, and the
QP is moved to the error state with the reason set to 19 (INVALID
AVID). When the RDMA CM path is used, this issue hits QP1 and it
is moved to the error state.
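To make the truncation concrete, here is a minimal userspace
sketch (an editorial illustration, not part of the patch; the
variable names are hypothetical):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t allocated_avid = 70000;    /* AV index beyond 64K */
	uint16_t wqe_avid = allocated_avid; /* old u16 field silently truncates */

	/* prints "allocated 70000, posted 4464" (70000 % 65536) */
	printf("allocated %u, posted %u\n",
	       (unsigned int)allocated_avid, (unsigned int)wqe_avid);
	return 0;
}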
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
drivers/infiniband/hw/bnxt_re/qplib_fp.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.h b/drivers/infiniband/hw/bnxt_re/qplib_fp.h
index b62df87..820611a 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_fp.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.h
@@ -170,7 +170,7 @@ struct bnxt_qplib_swqe {
};
u32 q_key;
u32 dst_qp;
- u16 avid;
+ u32 avid;
} send;
/* Send Raw Ethernet and QP1 */
--
2.5.5
* [PATCH v2 for-rc 3/6] RDMA/bnxt_re: Add a check for memory allocation
2024-09-19 3:05 [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier
2024-09-19 3:05 ` [PATCH v2 for-rc 1/6] RDMA/bnxt_re: Fix a possible memory leak Selvin Xavier
2024-09-19 3:05 ` [PATCH v2 for-rc 2/6] RDMA/bnxt_re: Fix incorrect AVID type in WQE structure Selvin Xavier
@ 2024-09-19 3:05 ` Selvin Xavier
2024-09-19 3:05 ` [PATCH v2 for-rc 4/6] RDMA/bnxt_re: Fix the usage of control path spin locks Selvin Xavier
` (4 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Selvin Xavier @ 2024-09-19 3:05 UTC (permalink / raw)
To: leon, jgg
Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
Selvin Xavier
From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
__alloc_pbl() can return an error when memory allocation fails.
The driver was not checking the return status at one of the call
sites. Add the missing check.
Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation")
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
drivers/infiniband/hw/bnxt_re/qplib_res.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
index dfc943f..1fdffd6 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
@@ -244,6 +244,8 @@ int bnxt_qplib_alloc_init_hwq(struct bnxt_qplib_hwq *hwq,
sginfo.pgsize = npde * pg_size;
sginfo.npages = 1;
rc = __alloc_pbl(res, &hwq->pbl[PBL_LVL_0], &sginfo);
+ if (rc)
+ goto fail;
/* Alloc PBL pages */
sginfo.npages = npbl;
--
2.5.5
* [PATCH v2 for-rc 4/6] RDMA/bnxt_re: Fix the usage of control path spin locks
2024-09-19 3:05 [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier
` (2 preceding siblings ...)
2024-09-19 3:05 ` [PATCH v2 for-rc 3/6] RDMA/bnxt_re: Add a check for memory allocation Selvin Xavier
@ 2024-09-19 3:05 ` Selvin Xavier
2024-09-19 3:06 ` [PATCH v2 for-rc 5/6] RDMA/bnxt_re: synchronize the qp-handle table array Selvin Xavier
` (3 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Selvin Xavier @ 2024-09-19 3:05 UTC (permalink / raw)
To: leon, jgg
Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
Selvin Xavier
Control path completion processing always runs in tasklet
context. To synchronize with the posting thread, there is no
need to use the irq variant of the spin lock; use spin_lock_bh
instead.
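As background, a minimal editorial sketch of the pattern (the
lock and function names below are illustrative, not from the
driver): spin_lock_bh() disables softirq processing on the local
CPU, which is enough to synchronize a process-context poster with
a tasklet, while the irqsave variant would also disable hard
interrupts, which is unnecessary when the lock is never taken
from hardirq context.

#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(demo_lock);	/* illustrative lock */

static void demo_post_cmd(void)		/* process context */
{
	spin_lock_bh(&demo_lock);	/* blocks local softirqs, takes the lock */
	/* ... post a command ... */
	spin_unlock_bh(&demo_lock);
}

static void demo_creq_tasklet(struct tasklet_struct *t)	/* softirq context */
{
	spin_lock_bh(&demo_lock);	/* the _bh variant nests fine in softirq */
	/* ... process completions ... */
	spin_unlock_bh(&demo_lock);
}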
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
index 3ffaef0c..5bef9b4 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
@@ -290,7 +290,6 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw,
struct bnxt_qplib_hwq *hwq;
u32 sw_prod, cmdq_prod;
struct pci_dev *pdev;
- unsigned long flags;
u16 cookie;
u8 *preq;
@@ -301,7 +300,7 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw,
/* Cmdq are in 16-byte units, each request can consume 1 or more
* cmdqe
*/
- spin_lock_irqsave(&hwq->lock, flags);
+ spin_lock_bh(&hwq->lock);
required_slots = bnxt_qplib_get_cmd_slots(msg->req);
free_slots = HWQ_FREE_SLOTS(hwq);
cookie = cmdq->seq_num & RCFW_MAX_COOKIE_VALUE;
@@ -311,7 +310,7 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw,
dev_info_ratelimited(&pdev->dev,
"CMDQ is full req/free %d/%d!",
required_slots, free_slots);
- spin_unlock_irqrestore(&hwq->lock, flags);
+ spin_unlock_bh(&hwq->lock);
return -EAGAIN;
}
if (msg->block)
@@ -367,7 +366,7 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw,
wmb();
writel(cmdq_prod, cmdq->cmdq_mbox.prod);
writel(RCFW_CMDQ_TRIG_VAL, cmdq->cmdq_mbox.db);
- spin_unlock_irqrestore(&hwq->lock, flags);
+ spin_unlock_bh(&hwq->lock);
/* Return the CREQ response pointer */
return 0;
}
@@ -486,7 +485,6 @@ static int __bnxt_qplib_rcfw_send_message(struct bnxt_qplib_rcfw *rcfw,
{
struct creq_qp_event *evnt = (struct creq_qp_event *)msg->resp;
struct bnxt_qplib_crsqe *crsqe;
- unsigned long flags;
u16 cookie;
int rc;
u8 opcode;
@@ -512,12 +510,12 @@ static int __bnxt_qplib_rcfw_send_message(struct bnxt_qplib_rcfw *rcfw,
rc = __poll_for_resp(rcfw, cookie);
if (rc) {
- spin_lock_irqsave(&rcfw->cmdq.hwq.lock, flags);
+ spin_lock_bh(&rcfw->cmdq.hwq.lock);
crsqe = &rcfw->crsqe_tbl[cookie];
crsqe->is_waiter_alive = false;
if (rc == -ENODEV)
set_bit(FIRMWARE_STALL_DETECTED, &rcfw->cmdq.flags);
- spin_unlock_irqrestore(&rcfw->cmdq.hwq.lock, flags);
+ spin_unlock_bh(&rcfw->cmdq.hwq.lock);
return -ETIMEDOUT;
}
@@ -628,7 +626,6 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
u16 cookie, blocked = 0;
bool is_waiter_alive;
struct pci_dev *pdev;
- unsigned long flags;
u32 wait_cmds = 0;
int rc = 0;
@@ -659,8 +656,7 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
*
*/
- spin_lock_irqsave_nested(&hwq->lock, flags,
- SINGLE_DEPTH_NESTING);
+ spin_lock_nested(&hwq->lock, SINGLE_DEPTH_NESTING);
cookie = le16_to_cpu(qp_event->cookie);
blocked = cookie & RCFW_CMD_IS_BLOCKING;
cookie &= RCFW_MAX_COOKIE_VALUE;
@@ -672,7 +668,7 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
dev_info(&pdev->dev,
"rcfw timedout: cookie = %#x, free_slots = %d",
cookie, crsqe->free_slots);
- spin_unlock_irqrestore(&hwq->lock, flags);
+ spin_unlock(&hwq->lock);
return rc;
}
@@ -720,7 +716,7 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
__destroy_timedout_ah(rcfw,
(struct creq_create_ah_resp *)
qp_event);
- spin_unlock_irqrestore(&hwq->lock, flags);
+ spin_unlock(&hwq->lock);
}
*num_wait += wait_cmds;
return rc;
@@ -734,12 +730,11 @@ static void bnxt_qplib_service_creq(struct tasklet_struct *t)
u32 type, budget = CREQ_ENTRY_POLL_BUDGET;
struct bnxt_qplib_hwq *hwq = &creq->hwq;
struct creq_base *creqe;
- unsigned long flags;
u32 num_wakeup = 0;
u32 hw_polled = 0;
/* Service the CREQ until budget is over */
- spin_lock_irqsave(&hwq->lock, flags);
+ spin_lock_bh(&hwq->lock);
while (budget > 0) {
creqe = bnxt_qplib_get_qe(hwq, hwq->cons, NULL);
if (!CREQ_CMP_VALID(creqe, creq->creq_db.dbinfo.flags))
@@ -782,7 +777,7 @@ static void bnxt_qplib_service_creq(struct tasklet_struct *t)
if (hw_polled)
bnxt_qplib_ring_nq_db(&creq->creq_db.dbinfo,
rcfw->res->cctx, true);
- spin_unlock_irqrestore(&hwq->lock, flags);
+ spin_unlock_bh(&hwq->lock);
if (num_wakeup)
wake_up_nr(&rcfw->cmdq.waitq, num_wakeup);
}
--
2.5.5
* [PATCH v2 for-rc 5/6] RDMA/bnxt_re: synchronize the qp-handle table array
2024-09-19 3:05 [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier
` (3 preceding siblings ...)
2024-09-19 3:05 ` [PATCH v2 for-rc 4/6] RDMA/bnxt_re: Fix the usage of control path spin locks Selvin Xavier
@ 2024-09-19 3:06 ` Selvin Xavier
2024-09-19 8:28 ` Zhu Yanjun
2024-10-04 19:27 ` Jason Gunthorpe
2024-09-19 3:06 ` [PATCH v2 for-rc 6/6] RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode Selvin Xavier
` (2 subsequent siblings)
7 siblings, 2 replies; 14+ messages in thread
From: Selvin Xavier @ 2024-09-19 3:06 UTC (permalink / raw)
To: leon, jgg
Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
Selvin Xavier
There is a race between the CREQ tasklet and destroy QP when
accessing the qp-handle table: the CREQ tasklet handler can read
a valid qp-handle while the QP is already being destroyed.
Fix this race by introducing a table lock to synchronize the
access.
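In condensed form, the rule the diff below implements (an
editorial restatement, simplified from the actual hunks): the
handle must be looked up and used under tbl_lock, because the
destroy path clears the entry under the same lock.

/* destroy path (process context) */
spin_lock_bh(&rcfw->tbl_lock);
rcfw->qp_tbl[tbl_indx].qp_handle = NULL;
spin_unlock_bh(&rcfw->tbl_lock);

/* CREQ tasklet (softirqs already disabled, so a plain spin_lock) */
spin_lock(&rcfw->tbl_lock);
qp = rcfw->qp_tbl[tbl_indx].qp_handle;
if (qp)
	rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp); /* qp cannot be freed here */
spin_unlock(&rcfw->tbl_lock);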
Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error")
Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
drivers/infiniband/hw/bnxt_re/qplib_fp.c | 4 ++++
drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 13 +++++++++----
drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 ++
3 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
index 42e98e5..bc358bd 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
@@ -1527,9 +1527,11 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res,
u32 tbl_indx;
int rc;
+ spin_lock_bh(&rcfw->tbl_lock);
tbl_indx = map_qp_id_to_tbl_indx(qp->id, rcfw);
rcfw->qp_tbl[tbl_indx].qp_id = BNXT_QPLIB_QP_ID_INVALID;
rcfw->qp_tbl[tbl_indx].qp_handle = NULL;
+ spin_unlock_bh(&rcfw->tbl_lock);
bnxt_qplib_rcfw_cmd_prep((struct cmdq_base *)&req,
CMDQ_BASE_OPCODE_DESTROY_QP,
@@ -1540,8 +1542,10 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res,
sizeof(resp), 0);
rc = bnxt_qplib_rcfw_send_message(rcfw, &msg);
if (rc) {
+ spin_lock_bh(&rcfw->tbl_lock);
rcfw->qp_tbl[tbl_indx].qp_id = qp->id;
rcfw->qp_tbl[tbl_indx].qp_handle = qp;
+ spin_unlock_bh(&rcfw->tbl_lock);
return rc;
}
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
index 5bef9b4..85bfedc 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
@@ -634,17 +634,21 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
err_event = (struct creq_qp_error_notification *)qp_event;
qp_id = le32_to_cpu(err_event->xid);
+ spin_lock_nested(&rcfw->tbl_lock, SINGLE_DEPTH_NESTING);
tbl_indx = map_qp_id_to_tbl_indx(qp_id, rcfw);
qp = rcfw->qp_tbl[tbl_indx].qp_handle;
+ if (!qp) {
+ spin_unlock(&rcfw->tbl_lock);
+ break;
+ }
+ bnxt_qplib_mark_qp_error(qp);
+ rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp);
+ spin_unlock(&rcfw->tbl_lock);
dev_dbg(&pdev->dev, "Received QP error notification\n");
dev_dbg(&pdev->dev,
"qpid 0x%x, req_err=0x%x, resp_err=0x%x\n",
qp_id, err_event->req_err_state_reason,
err_event->res_err_state_reason);
- if (!qp)
- break;
- bnxt_qplib_mark_qp_error(qp);
- rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp);
break;
default:
/*
@@ -973,6 +977,7 @@ int bnxt_qplib_alloc_rcfw_channel(struct bnxt_qplib_res *res,
GFP_KERNEL);
if (!rcfw->qp_tbl)
goto fail;
+ spin_lock_init(&rcfw->tbl_lock);
rcfw->max_timeout = res->cctx->hwrm_cmd_max_timeout;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
index 45996e6..07779ae 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
@@ -224,6 +224,8 @@ struct bnxt_qplib_rcfw {
struct bnxt_qplib_crsqe *crsqe_tbl;
int qp_tbl_size;
struct bnxt_qplib_qp_node *qp_tbl;
+ /* To synchronize the qp-handle hash table */
+ spinlock_t tbl_lock;
u64 oos_prev;
u32 init_oos_stats;
u32 cmdq_depth;
--
2.5.5
* [PATCH v2 for-rc 6/6] RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode
2024-09-19 3:05 [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier
` (4 preceding siblings ...)
2024-09-19 3:06 ` [PATCH v2 for-rc 5/6] RDMA/bnxt_re: synchronize the qp-handle table array Selvin Xavier
@ 2024-09-19 3:06 ` Selvin Xavier
2024-09-20 12:56 ` [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel Jason Gunthorpe
2024-10-04 20:05 ` Jason Gunthorpe
7 siblings, 0 replies; 14+ messages in thread
From: Selvin Xavier @ 2024-09-19 3:06 UTC (permalink / raw)
To: leon, jgg
Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
Selvin Xavier
The max_sw_wqe used in static WQE mode should be the same as
max_wqe. Calculate max_sw_wqe only for the variable WQE mode.
Fixes: de1d364c3815 ("RDMA/bnxt_re: Add support for Variable WQE in Genp7 adapters")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
drivers/infiniband/hw/bnxt_re/ib_verbs.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 460f339..e66ae9f 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -1307,7 +1307,11 @@ static int bnxt_re_init_sq_attr(struct bnxt_re_qp *qp,
0 : BNXT_QPLIB_RESERVED_QP_WRS;
entries = bnxt_re_init_depth(entries + diff + 1, uctx);
sq->max_wqe = min_t(u32, entries, dev_attr->max_qp_wqes + diff + 1);
- sq->max_sw_wqe = bnxt_qplib_get_depth(sq, qplqp->wqe_mode, true);
+ if (qplqp->wqe_mode == BNXT_QPLIB_WQE_MODE_VARIABLE)
+ sq->max_sw_wqe = bnxt_qplib_get_depth(sq, qplqp->wqe_mode, true);
+ else
+ sq->max_sw_wqe = sq->max_wqe;
+
}
sq->q_full_delta = diff + 1;
/*
--
2.5.5
* Re: [PATCH v2 for-rc 5/6] RDMA/bnxt_re: synchronize the qp-handle table array
2024-09-19 3:06 ` [PATCH v2 for-rc 5/6] RDMA/bnxt_re: synchronize the qp-handle table array Selvin Xavier
@ 2024-09-19 8:28 ` Zhu Yanjun
2024-10-04 19:27 ` Jason Gunthorpe
1 sibling, 0 replies; 14+ messages in thread
From: Zhu Yanjun @ 2024-09-19 8:28 UTC (permalink / raw)
To: Selvin Xavier, leon, jgg
Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil
On 2024/9/19 11:06, Selvin Xavier wrote:
> There is a race between the CREQ tasklet and destroy QP
> when accessing the qp-handle table: the CREQ tasklet
> handler can read a valid qp-handle while the QP is
> already being destroyed.
>
> Fix this race by introducing a table lock to
> synchronize the access.
>
> Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error")
> Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing")
> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
> ---
> drivers/infiniband/hw/bnxt_re/qplib_fp.c | 4 ++++
> drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 13 +++++++++----
> drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 ++
> 3 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
> index 42e98e5..bc358bd 100644
> --- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c
> +++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
> @@ -1527,9 +1527,11 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res,
> u32 tbl_indx;
> int rc;
>
> + spin_lock_bh(&rcfw->tbl_lock);
> tbl_indx = map_qp_id_to_tbl_indx(qp->id, rcfw);
> rcfw->qp_tbl[tbl_indx].qp_id = BNXT_QPLIB_QP_ID_INVALID;
> rcfw->qp_tbl[tbl_indx].qp_handle = NULL;
> + spin_unlock_bh(&rcfw->tbl_lock);
>
> bnxt_qplib_rcfw_cmd_prep((struct cmdq_base *)&req,
> CMDQ_BASE_OPCODE_DESTROY_QP,
> @@ -1540,8 +1542,10 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res,
> sizeof(resp), 0);
> rc = bnxt_qplib_rcfw_send_message(rcfw, &msg);
> if (rc) {
> + spin_lock_bh(&rcfw->tbl_lock);
> rcfw->qp_tbl[tbl_indx].qp_id = qp->id;
> rcfw->qp_tbl[tbl_indx].qp_handle = qp;
> + spin_unlock_bh(&rcfw->tbl_lock);
> return rc;
> }
>
> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> index 5bef9b4..85bfedc 100644
> --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> @@ -634,17 +634,21 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
> case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
> err_event = (struct creq_qp_error_notification *)qp_event;
> qp_id = le32_to_cpu(err_event->xid);
> + spin_lock_nested(&rcfw->tbl_lock, SINGLE_DEPTH_NESTING);
> tbl_indx = map_qp_id_to_tbl_indx(qp_id, rcfw);
> qp = rcfw->qp_tbl[tbl_indx].qp_handle;
> + if (!qp) {
> + spin_unlock(&rcfw->tbl_lock);
> + break;
> + }
> + bnxt_qplib_mark_qp_error(qp);
> + rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp);
> + spin_unlock(&rcfw->tbl_lock);
> dev_dbg(&pdev->dev, "Received QP error notification\n");
> dev_dbg(&pdev->dev,
> "qpid 0x%x, req_err=0x%x, resp_err=0x%x\n",
> qp_id, err_event->req_err_state_reason,
> err_event->res_err_state_reason);
> - if (!qp)
> - break;
> - bnxt_qplib_mark_qp_error(qp);
> - rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp);
> break;
> default:
> /*
> @@ -973,6 +977,7 @@ int bnxt_qplib_alloc_rcfw_channel(struct bnxt_qplib_res *res,
> GFP_KERNEL);
> if (!rcfw->qp_tbl)
> goto fail;
> + spin_lock_init(&rcfw->tbl_lock);
Thanks. I am fine with this use of spin_lock_bh and spin_lock_init.
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Zhu Yanjun
>
> rcfw->max_timeout = res->cctx->hwrm_cmd_max_timeout;
>
> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
> index 45996e6..07779ae 100644
> --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
> @@ -224,6 +224,8 @@ struct bnxt_qplib_rcfw {
> struct bnxt_qplib_crsqe *crsqe_tbl;
> int qp_tbl_size;
> struct bnxt_qplib_qp_node *qp_tbl;
> + /* To synchronize the qp-handle hash table */
> + spinlock_t tbl_lock;
> u64 oos_prev;
> u32 init_oos_stats;
> u32 cmdq_depth;
* Re: [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel
2024-09-19 3:05 [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier
` (5 preceding siblings ...)
2024-09-19 3:06 ` [PATCH v2 for-rc 6/6] RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode Selvin Xavier
@ 2024-09-20 12:56 ` Jason Gunthorpe
2024-10-04 20:05 ` Jason Gunthorpe
7 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2024-09-20 12:56 UTC (permalink / raw)
To: Selvin Xavier; +Cc: leon, linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil
On Wed, Sep 18, 2024 at 08:05:55PM -0700, Selvin Xavier wrote:
> Few generic fixes in bnxt_re driver for 6.12 kernel.
> Please review and apply.
These will have to wait till the next rc1 due to the LPC conference.
Jason
* Re: [PATCH v2 for-rc 5/6] RDMA/bnxt_re: synchronize the qp-handle table array
2024-09-19 3:06 ` [PATCH v2 for-rc 5/6] RDMA/bnxt_re: synchronize the qp-handle table array Selvin Xavier
2024-09-19 8:28 ` Zhu Yanjun
@ 2024-10-04 19:27 ` Jason Gunthorpe
2024-10-07 5:50 ` Selvin Xavier
1 sibling, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2024-10-04 19:27 UTC (permalink / raw)
To: Selvin Xavier; +Cc: leon, linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil
On Wed, Sep 18, 2024 at 08:06:00PM -0700, Selvin Xavier wrote:
> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> index 5bef9b4..85bfedc 100644
> --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> @@ -634,17 +634,21 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
> case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
> err_event = (struct creq_qp_error_notification *)qp_event;
> qp_id = le32_to_cpu(err_event->xid);
> + spin_lock_nested(&rcfw->tbl_lock, SINGLE_DEPTH_NESTING);
Why would you need this lockdep annotation? tbl_lock doesn't look
nested into itself to me.
Regardless it is a big red flag to see lockdep exception stuff like
this without a clear explanation what it is for..
Jason
* Re: [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel
2024-09-19 3:05 [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier
` (6 preceding siblings ...)
2024-09-20 12:56 ` [PATCH v2 for-rc 0/6] RDMA/bnxt_re: Bug fixes for 6.12 kernel Jason Gunthorpe
@ 2024-10-04 20:05 ` Jason Gunthorpe
7 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2024-10-04 20:05 UTC (permalink / raw)
To: Selvin Xavier; +Cc: leon, linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil
On Wed, Sep 18, 2024 at 08:05:55PM -0700, Selvin Xavier wrote:
> Few generic fixes in bnxt_re driver for 6.12 kernel.
> Please review and apply.
>
> Thanks,
> Selvin Xavier
>
> v1 - v2:
> - Add a patch that removes irq variant of spinlock and use
> spin_lock_bh as the control path processing happens from
> tasklet context
> - Address the comments from Zhu Yanjun by initializing the
> newly added spin lock.
> - One more fix included in the series
>
> Kalesh AP (2):
> RDMA/bnxt_re: Fix a possible memory leak
> RDMA/bnxt_re: Add a check for memory allocation
>
> Saravanan Vajravel (1):
> RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
>
> Selvin Xavier (3):
> RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode
Applied to for-rc
> RDMA/bnxt_re: Fix the usage of control path spin locks
> RDMA/bnxt_re: synchronize the qp-handle table array
I deferred these two with remarks
Thanks,
Jason
* Re: [PATCH v2 for-rc 5/6] RDMA/bnxt_re: synchronize the qp-handle table array
2024-10-04 19:27 ` Jason Gunthorpe
@ 2024-10-07 5:50 ` Selvin Xavier
2024-10-07 12:28 ` Jason Gunthorpe
0 siblings, 1 reply; 14+ messages in thread
From: Selvin Xavier @ 2024-10-07 5:50 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: leon, linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil
On Sat, Oct 5, 2024 at 12:57 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Sep 18, 2024 at 08:06:00PM -0700, Selvin Xavier wrote:
>
> > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> > index 5bef9b4..85bfedc 100644
> > --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> > +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> > @@ -634,17 +634,21 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
> > case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
> > err_event = (struct creq_qp_error_notification *)qp_event;
> > qp_id = le32_to_cpu(err_event->xid);
> > + spin_lock_nested(&rcfw->tbl_lock, SINGLE_DEPTH_NESTING);
>
> Why would you need this lockdep annotation? tbl_lock doesn't look
> nested into itself to me.
bnxt_qplib_process_qp_event() is always called with a spinlock
(hwq->lock) held by the caller, i.e. bnxt_qplib_service_creq().
That is why I used the nested variant.
>
> Regardless it is a big red flag to see lockdep exception stuff like
> this without a clear explanation what it is for..
Will add an explanation for this.
>
> Jason
* Re: [PATCH v2 for-rc 5/6] RDMA/bnxt_re: synchronize the qp-handle table array
2024-10-07 5:50 ` Selvin Xavier
@ 2024-10-07 12:28 ` Jason Gunthorpe
2024-10-07 18:26 ` Selvin Xavier
0 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2024-10-07 12:28 UTC (permalink / raw)
To: Selvin Xavier; +Cc: leon, linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil
On Mon, Oct 07, 2024 at 11:20:24AM +0530, Selvin Xavier wrote:
> On Sat, Oct 5, 2024 at 12:57 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Wed, Sep 18, 2024 at 08:06:00PM -0700, Selvin Xavier wrote:
> >
> > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> > > index 5bef9b4..85bfedc 100644
> > > --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> > > +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> > > @@ -634,17 +634,21 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
> > > case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
> > > err_event = (struct creq_qp_error_notification *)qp_event;
> > > qp_id = le32_to_cpu(err_event->xid);
> > > + spin_lock_nested(&rcfw->tbl_lock, SINGLE_DEPTH_NESTING);
> >
> > Why would you need this lockdep annotation? tbl_lock doesn't look
> > nested into itself to me.
> bnxt_qplib_process_qp_event is always called with a spinlock
> (hwq->lock ) in the caller. i.e. bnxt_qplib_service_creq. I have used
> the nested variant because of this.
That is not what nested is for. Nesting different locks is fine;
you only need the nested annotation when you are nesting the same
lock class.
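For reference, a minimal editorial sketch of the case
spin_lock_nested() is meant for, namely taking two locks of the
same lockdep class in a defined order; the types and names below
are illustrative:

#include <linux/spinlock.h>
#include <linux/kernel.h>

struct demo_qp {
	spinlock_t lock;	/* every demo_qp shares one lockdep class */
};

static void demo_lock_qp_pair(struct demo_qp *a, struct demo_qp *b)
{
	if (a > b)		/* fixed ordering prevents AB/BA deadlock */
		swap(a, b);
	spin_lock(&a->lock);
	/* same lock class taken twice: tell lockdep the nesting is intended */
	spin_lock_nested(&b->lock, SINGLE_DEPTH_NESTING);
	/* ... operate on both QPs ... */
	spin_unlock(&b->lock);
	spin_unlock(&a->lock);
}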
Jason
* Re: [PATCH v2 for-rc 5/6] RDMA/bnxt_re: synchronize the qp-handle table array
2024-10-07 12:28 ` Jason Gunthorpe
@ 2024-10-07 18:26 ` Selvin Xavier
0 siblings, 0 replies; 14+ messages in thread
From: Selvin Xavier @ 2024-10-07 18:26 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: leon, linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil
On Mon, Oct 7, 2024 at 5:58 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Mon, Oct 07, 2024 at 11:20:24AM +0530, Selvin Xavier wrote:
> > On Sat, Oct 5, 2024 at 12:57 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Wed, Sep 18, 2024 at 08:06:00PM -0700, Selvin Xavier wrote:
> > >
> > > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> > > > index 5bef9b4..85bfedc 100644
> > > > --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> > > > +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> > > > @@ -634,17 +634,21 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
> > > > case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
> > > > err_event = (struct creq_qp_error_notification *)qp_event;
> > > > qp_id = le32_to_cpu(err_event->xid);
> > > > + spin_lock_nested(&rcfw->tbl_lock, SINGLE_DEPTH_NESTING);
> > >
> > > Why would you need this lockdep annotation? tbl_lock doesn't look
> > > nested into itself to me.
> > bnxt_qplib_process_qp_event is always called with a spinlock
> > (hwq->lock ) in the caller. i.e. bnxt_qplib_service_creq. I have used
> > the nested variant because of this.
>
> That is not what nested is for. Nested different locks are fine, you
> only need nested if you are nesting the same lock
Ok. I will modify the patch and post it this week.
>
> Jason