* [PATCH 0/4] RDMA/bnxt_re: Bug fixes for 6.12 kernel
@ 2024-09-11 18:29 Selvin Xavier
2024-09-11 18:29 ` [PATCH 1/4] RDMA/bnxt_re: Fix a possible memory leak Selvin Xavier
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Selvin Xavier @ 2024-09-11 18:29 UTC (permalink / raw)
To: leon, jgg
Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
Selvin Xavier
Few generic fixes in bnxt_re driver for 6.12 kernel.
Please review and apply.
Thanks,
Selvin Xavier
Kalesh AP (2):
RDMA/bnxt_re: Fix a possible memory leak
RDMA/bnxt_re: Add a check for memory allocation
Saravanan Vajravel (1):
RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
Selvin Xavier (1):
RDMA/bnxt_re: synchronize the qp-handle table array
drivers/infiniband/hw/bnxt_re/main.c | 5 ++++-
drivers/infiniband/hw/bnxt_re/qplib_fp.c | 5 +++++
drivers/infiniband/hw/bnxt_re/qplib_fp.h | 2 +-
drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 12 ++++++++----
drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 ++
drivers/infiniband/hw/bnxt_re/qplib_res.c | 2 ++
6 files changed, 22 insertions(+), 6 deletions(-)
--
2.5.5
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH 1/4] RDMA/bnxt_re: Fix a possible memory leak 2024-09-11 18:29 [PATCH 0/4] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier @ 2024-09-11 18:29 ` Selvin Xavier 2024-09-11 18:29 ` [PATCH 2/4] RDMA/bnxt_re: Fix incorrect AVID type in WQE structure Selvin Xavier ` (2 subsequent siblings) 3 siblings, 0 replies; 8+ messages in thread From: Selvin Xavier @ 2024-09-11 18:29 UTC (permalink / raw) To: leon, jgg Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil, Selvin Xavier From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> In bnxt_re_setup_chip_ctx() when bnxt_qplib_map_db_bar() fails driver is not freeing the memory allocated for "rdev->chip_ctx". Fixes: 0ac20faf5d83 ("RDMA/bnxt_re: Reorg the bar mapping") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> --- drivers/infiniband/hw/bnxt_re/main.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c index 16a84ca..72719c8 100644 --- a/drivers/infiniband/hw/bnxt_re/main.c +++ b/drivers/infiniband/hw/bnxt_re/main.c @@ -186,8 +186,11 @@ static int bnxt_re_setup_chip_ctx(struct bnxt_re_dev *rdev) bnxt_re_set_db_offset(rdev); rc = bnxt_qplib_map_db_bar(&rdev->qplib_res); - if (rc) + if (rc) { + kfree(rdev->chip_ctx); + rdev->chip_ctx = NULL; return rc; + } if (bnxt_qplib_determine_atomics(en_dev->pdev)) ibdev_info(&rdev->ibdev, -- 2.5.5 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/4] RDMA/bnxt_re: Fix incorrect AVID type in WQE structure 2024-09-11 18:29 [PATCH 0/4] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier 2024-09-11 18:29 ` [PATCH 1/4] RDMA/bnxt_re: Fix a possible memory leak Selvin Xavier @ 2024-09-11 18:29 ` Selvin Xavier 2024-09-11 18:29 ` [PATCH 3/4] RDMA/bnxt_re: Add a check for memory allocation Selvin Xavier 2024-09-11 18:29 ` [PATCH 4/4] RDMA/bnxt_re: synchronize the qp-handle table array Selvin Xavier 3 siblings, 0 replies; 8+ messages in thread From: Selvin Xavier @ 2024-09-11 18:29 UTC (permalink / raw) To: leon, jgg Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil, Saravanan Vajravel, Selvin Xavier From: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Driver uses internal data structure to construct WQE frame. It used avid type as u16 which can accommodate up to 64K AVs. When outstanding AVID crosses 64K, driver truncates AVID and hence it uses incorrect AVID to WR. This leads to WR failure due to invalid AV ID and QP is moved to error state with reason set to 19 (INVALID AVID). When RDMA CM path is used, this issue hits QP1 and it is moved to error state Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> --- drivers/infiniband/hw/bnxt_re/qplib_fp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.h b/drivers/infiniband/hw/bnxt_re/qplib_fp.h index b62df87..820611a 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_fp.h +++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.h @@ -170,7 +170,7 @@ struct bnxt_qplib_swqe { }; u32 q_key; u32 dst_qp; - u16 avid; + u32 avid; } send; /* Send Raw Ethernet and QP1 */ -- 2.5.5 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/4] RDMA/bnxt_re: Add a check for memory allocation 2024-09-11 18:29 [PATCH 0/4] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier 2024-09-11 18:29 ` [PATCH 1/4] RDMA/bnxt_re: Fix a possible memory leak Selvin Xavier 2024-09-11 18:29 ` [PATCH 2/4] RDMA/bnxt_re: Fix incorrect AVID type in WQE structure Selvin Xavier @ 2024-09-11 18:29 ` Selvin Xavier 2024-09-11 18:29 ` [PATCH 4/4] RDMA/bnxt_re: synchronize the qp-handle table array Selvin Xavier 3 siblings, 0 replies; 8+ messages in thread From: Selvin Xavier @ 2024-09-11 18:29 UTC (permalink / raw) To: leon, jgg Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil, Selvin Xavier From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> __alloc_pbl() can return error when memory allocation fails. Driver is not checking the status on one of the instances. Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> --- drivers/infiniband/hw/bnxt_re/qplib_res.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c index dfc943f..1fdffd6 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_res.c +++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c @@ -244,6 +244,8 @@ int bnxt_qplib_alloc_init_hwq(struct bnxt_qplib_hwq *hwq, sginfo.pgsize = npde * pg_size; sginfo.npages = 1; rc = __alloc_pbl(res, &hwq->pbl[PBL_LVL_0], &sginfo); + if (rc) + goto fail; /* Alloc PBL pages */ sginfo.npages = npbl; -- 2.5.5 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 4/4] RDMA/bnxt_re: synchronize the qp-handle table array 2024-09-11 18:29 [PATCH 0/4] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier ` (2 preceding siblings ...) 2024-09-11 18:29 ` [PATCH 3/4] RDMA/bnxt_re: Add a check for memory allocation Selvin Xavier @ 2024-09-11 18:29 ` Selvin Xavier 2024-09-13 4:26 ` Zhu Yanjun 2024-09-13 4:30 ` Zhu Yanjun 3 siblings, 2 replies; 8+ messages in thread From: Selvin Xavier @ 2024-09-11 18:29 UTC (permalink / raw) To: leon, jgg Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil, Selvin Xavier There is a race between the CREQ tasklet and destroy qp when accessing the qp-handle table. There is a chance of reading a valid qp-handle in the CREQ tasklet handler while the QP is already moving ahead with the destruction. Fixing this race by implementing a table-lock to synchronize the access. Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error") Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> --- drivers/infiniband/hw/bnxt_re/qplib_fp.c | 5 +++++ drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 12 ++++++++---- drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 ++ 3 files changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c index 42e98e5..5d36216 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c +++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c @@ -1524,12 +1524,15 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res, struct creq_destroy_qp_resp resp = {}; struct bnxt_qplib_cmdqmsg msg = {}; struct cmdq_destroy_qp req = {}; + unsigned long flags; u32 tbl_indx; int rc; + spin_lock_irqsave(&rcfw->tbl_lock, flags); tbl_indx = map_qp_id_to_tbl_indx(qp->id, rcfw); rcfw->qp_tbl[tbl_indx].qp_id = BNXT_QPLIB_QP_ID_INVALID; rcfw->qp_tbl[tbl_indx].qp_handle = NULL; + spin_unlock_irqrestore(&rcfw->tbl_lock, flags); bnxt_qplib_rcfw_cmd_prep((struct cmdq_base *)&req, CMDQ_BASE_OPCODE_DESTROY_QP, @@ -1540,8 +1543,10 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res, sizeof(resp), 0); rc = bnxt_qplib_rcfw_send_message(rcfw, &msg); if (rc) { + spin_lock_irqsave(&rcfw->tbl_lock, flags); rcfw->qp_tbl[tbl_indx].qp_id = qp->id; rcfw->qp_tbl[tbl_indx].qp_handle = qp; + spin_unlock_irqrestore(&rcfw->tbl_lock, flags); return rc; } diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c index 3ffaef0c..993c356 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c @@ -637,17 +637,21 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw, case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION: err_event = (struct creq_qp_error_notification *)qp_event; qp_id = le32_to_cpu(err_event->xid); + spin_lock(&rcfw->tbl_lock); tbl_indx = map_qp_id_to_tbl_indx(qp_id, rcfw); qp = rcfw->qp_tbl[tbl_indx].qp_handle; + if (!qp) { + spin_unlock(&rcfw->tbl_lock); + break; + } + bnxt_qplib_mark_qp_error(qp); + rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp); + spin_unlock(&rcfw->tbl_lock); dev_dbg(&pdev->dev, "Received QP error notification\n"); dev_dbg(&pdev->dev, "qpid 0x%x, req_err=0x%x, resp_err=0x%x\n", qp_id, err_event->req_err_state_reason, err_event->res_err_state_reason); - if (!qp) - break; - bnxt_qplib_mark_qp_error(qp); - rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp); break; default: /* diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h index 45996e6..07779ae 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h @@ -224,6 +224,8 @@ struct bnxt_qplib_rcfw { struct bnxt_qplib_crsqe *crsqe_tbl; int qp_tbl_size; struct bnxt_qplib_qp_node *qp_tbl; + /* To synchronize the qp-handle hash table */ + spinlock_t tbl_lock; u64 oos_prev; u32 init_oos_stats; u32 cmdq_depth; -- 2.5.5 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 4/4] RDMA/bnxt_re: synchronize the qp-handle table array 2024-09-11 18:29 ` [PATCH 4/4] RDMA/bnxt_re: synchronize the qp-handle table array Selvin Xavier @ 2024-09-13 4:26 ` Zhu Yanjun 2024-09-13 4:30 ` Zhu Yanjun 1 sibling, 0 replies; 8+ messages in thread From: Zhu Yanjun @ 2024-09-13 4:26 UTC (permalink / raw) To: Selvin Xavier, leon, jgg Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil 在 2024/9/12 2:29, Selvin Xavier 写道: > There is a race between the CREQ tasklet and destroy > qp when accessing the qp-handle table. There is > a chance of reading a valid qp-handle in the > CREQ tasklet handler while the QP is already moving > ahead with the destruction. > > Fixing this race by implementing a table-lock to > synchronize the access. > > Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error") > Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing") > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> > Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> > --- > drivers/infiniband/hw/bnxt_re/qplib_fp.c | 5 +++++ > drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 12 ++++++++---- > drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 ++ > 3 files changed, 15 insertions(+), 4 deletions(-) > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c > index 42e98e5..5d36216 100644 > --- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c > +++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c > @@ -1524,12 +1524,15 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res, > struct creq_destroy_qp_resp resp = {}; > struct bnxt_qplib_cmdqmsg msg = {}; > struct cmdq_destroy_qp req = {}; > + unsigned long flags; > u32 tbl_indx; > int rc; > > + spin_lock_irqsave(&rcfw->tbl_lock, flags); > tbl_indx = map_qp_id_to_tbl_indx(qp->id, rcfw); > rcfw->qp_tbl[tbl_indx].qp_id = BNXT_QPLIB_QP_ID_INVALID; > rcfw->qp_tbl[tbl_indx].qp_handle = NULL; > + spin_unlock_irqrestore(&rcfw->tbl_lock, flags); > > bnxt_qplib_rcfw_cmd_prep((struct cmdq_base *)&req, > CMDQ_BASE_OPCODE_DESTROY_QP, > @@ -1540,8 +1543,10 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res, > sizeof(resp), 0); > rc = bnxt_qplib_rcfw_send_message(rcfw, &msg); > if (rc) { > + spin_lock_irqsave(&rcfw->tbl_lock, flags); > rcfw->qp_tbl[tbl_indx].qp_id = qp->id; > rcfw->qp_tbl[tbl_indx].qp_handle = qp; > + spin_unlock_irqrestore(&rcfw->tbl_lock, flags); > return rc; > } > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > index 3ffaef0c..993c356 100644 > --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > @@ -637,17 +637,21 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw, > case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION: > err_event = (struct creq_qp_error_notification *)qp_event; > qp_id = le32_to_cpu(err_event->xid); > + spin_lock(&rcfw->tbl_lock); > tbl_indx = map_qp_id_to_tbl_indx(qp_id, rcfw); > qp = rcfw->qp_tbl[tbl_indx].qp_handle; > + if (!qp) { > + spin_unlock(&rcfw->tbl_lock); > + break; > + } > + bnxt_qplib_mark_qp_error(qp); > + rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp); > + spin_unlock(&rcfw->tbl_lock); > dev_dbg(&pdev->dev, "Received QP error notification\n"); > dev_dbg(&pdev->dev, > "qpid 0x%x, req_err=0x%x, resp_err=0x%x\n", > qp_id, err_event->req_err_state_reason, > err_event->res_err_state_reason); > - if (!qp) > - break; > - bnxt_qplib_mark_qp_error(qp); > - rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp); > break; > default: > /* > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > index 45996e6..07779ae 100644 > --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > @@ -224,6 +224,8 @@ struct bnxt_qplib_rcfw { > struct bnxt_qplib_crsqe *crsqe_tbl; > int qp_tbl_size; > struct bnxt_qplib_qp_node *qp_tbl; > + /* To synchronize the qp-handle hash table */ > + spinlock_t tbl_lock; missing spin_lock_init? In this commit, spin_lock_init is not found, maybe this function is called in other commit? Or now spin_lock_init is not used again? Zhu Yanjun > u64 oos_prev; > u32 init_oos_stats; > u32 cmdq_depth; ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 4/4] RDMA/bnxt_re: synchronize the qp-handle table array 2024-09-11 18:29 ` [PATCH 4/4] RDMA/bnxt_re: synchronize the qp-handle table array Selvin Xavier 2024-09-13 4:26 ` Zhu Yanjun @ 2024-09-13 4:30 ` Zhu Yanjun 2024-09-13 7:59 ` Selvin Xavier 1 sibling, 1 reply; 8+ messages in thread From: Zhu Yanjun @ 2024-09-13 4:30 UTC (permalink / raw) To: Selvin Xavier, leon, jgg Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil 在 2024/9/12 2:29, Selvin Xavier 写道: > There is a race between the CREQ tasklet and destroy > qp when accessing the qp-handle table. There is > a chance of reading a valid qp-handle in the > CREQ tasklet handler while the QP is already moving > ahead with the destruction. > > Fixing this race by implementing a table-lock to > synchronize the access. > > Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error") > Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing") > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> > Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> > --- > drivers/infiniband/hw/bnxt_re/qplib_fp.c | 5 +++++ > drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 12 ++++++++---- > drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 ++ > 3 files changed, 15 insertions(+), 4 deletions(-) > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c > index 42e98e5..5d36216 100644 > --- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c > +++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c > @@ -1524,12 +1524,15 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res, > struct creq_destroy_qp_resp resp = {}; > struct bnxt_qplib_cmdqmsg msg = {}; > struct cmdq_destroy_qp req = {}; > + unsigned long flags; > u32 tbl_indx; > int rc; > > + spin_lock_irqsave(&rcfw->tbl_lock, flags); Because the race occurs between tasklets, spin_lock_bh is enough to this? Zhu Yanjun > tbl_indx = map_qp_id_to_tbl_indx(qp->id, rcfw); > rcfw->qp_tbl[tbl_indx].qp_id = BNXT_QPLIB_QP_ID_INVALID; > rcfw->qp_tbl[tbl_indx].qp_handle = NULL; > + spin_unlock_irqrestore(&rcfw->tbl_lock, flags); > > bnxt_qplib_rcfw_cmd_prep((struct cmdq_base *)&req, > CMDQ_BASE_OPCODE_DESTROY_QP, > @@ -1540,8 +1543,10 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res, > sizeof(resp), 0); > rc = bnxt_qplib_rcfw_send_message(rcfw, &msg); > if (rc) { > + spin_lock_irqsave(&rcfw->tbl_lock, flags); > rcfw->qp_tbl[tbl_indx].qp_id = qp->id; > rcfw->qp_tbl[tbl_indx].qp_handle = qp; > + spin_unlock_irqrestore(&rcfw->tbl_lock, flags); > return rc; > } > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > index 3ffaef0c..993c356 100644 > --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > @@ -637,17 +637,21 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw, > case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION: > err_event = (struct creq_qp_error_notification *)qp_event; > qp_id = le32_to_cpu(err_event->xid); > + spin_lock(&rcfw->tbl_lock); > tbl_indx = map_qp_id_to_tbl_indx(qp_id, rcfw); > qp = rcfw->qp_tbl[tbl_indx].qp_handle; > + if (!qp) { > + spin_unlock(&rcfw->tbl_lock); > + break; > + } > + bnxt_qplib_mark_qp_error(qp); > + rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp); > + spin_unlock(&rcfw->tbl_lock); > dev_dbg(&pdev->dev, "Received QP error notification\n"); > dev_dbg(&pdev->dev, > "qpid 0x%x, req_err=0x%x, resp_err=0x%x\n", > qp_id, err_event->req_err_state_reason, > err_event->res_err_state_reason); > - if (!qp) > - break; > - bnxt_qplib_mark_qp_error(qp); > - rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp); > break; > default: > /* > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > index 45996e6..07779ae 100644 > --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > @@ -224,6 +224,8 @@ struct bnxt_qplib_rcfw { > struct bnxt_qplib_crsqe *crsqe_tbl; > int qp_tbl_size; > struct bnxt_qplib_qp_node *qp_tbl; > + /* To synchronize the qp-handle hash table */ > + spinlock_t tbl_lock; > u64 oos_prev; > u32 init_oos_stats; > u32 cmdq_depth; ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 4/4] RDMA/bnxt_re: synchronize the qp-handle table array 2024-09-13 4:30 ` Zhu Yanjun @ 2024-09-13 7:59 ` Selvin Xavier 0 siblings, 0 replies; 8+ messages in thread From: Selvin Xavier @ 2024-09-13 7:59 UTC (permalink / raw) To: Zhu Yanjun Cc: leon, jgg, linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil [-- Attachment #1: Type: text/plain, Size: 5204 bytes --] On Fri, Sep 13, 2024 at 10:00 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > 在 2024/9/12 2:29, Selvin Xavier 写道: > > There is a race between the CREQ tasklet and destroy > > qp when accessing the qp-handle table. There is > > a chance of reading a valid qp-handle in the > > CREQ tasklet handler while the QP is already moving > > ahead with the destruction. > > > > Fixing this race by implementing a table-lock to > > synchronize the access. > > > > Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error") > > Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing") > > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> > > Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> > > --- > > drivers/infiniband/hw/bnxt_re/qplib_fp.c | 5 +++++ > > drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 12 ++++++++---- > > drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 ++ > > 3 files changed, 15 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c > > index 42e98e5..5d36216 100644 > > --- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c > > +++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c > > @@ -1524,12 +1524,15 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res, > > struct creq_destroy_qp_resp resp = {}; > > struct bnxt_qplib_cmdqmsg msg = {}; > > struct cmdq_destroy_qp req = {}; > > + unsigned long flags; > > u32 tbl_indx; > > int rc; > > > > + spin_lock_irqsave(&rcfw->tbl_lock, flags); > > Because the race occurs between tasklets, spin_lock_bh is enough to this? You are right. The race occurs in the BH handler. bnxt_qplib_service_creq , which is scheduled in the tasklet context, is already using the spin_lock_irqsave for synchronizing this function with other contexts. The current code change is with-in this context. I added this change on top of the current code and used irqsave. But I will fix both the spin lock usage and post a v2. Thanks for the review. > > Zhu Yanjun > > > tbl_indx = map_qp_id_to_tbl_indx(qp->id, rcfw); > > rcfw->qp_tbl[tbl_indx].qp_id = BNXT_QPLIB_QP_ID_INVALID; > > rcfw->qp_tbl[tbl_indx].qp_handle = NULL; > > + spin_unlock_irqrestore(&rcfw->tbl_lock, flags); > > > > bnxt_qplib_rcfw_cmd_prep((struct cmdq_base *)&req, > > CMDQ_BASE_OPCODE_DESTROY_QP, > > @@ -1540,8 +1543,10 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res, > > sizeof(resp), 0); > > rc = bnxt_qplib_rcfw_send_message(rcfw, &msg); > > if (rc) { > > + spin_lock_irqsave(&rcfw->tbl_lock, flags); > > rcfw->qp_tbl[tbl_indx].qp_id = qp->id; > > rcfw->qp_tbl[tbl_indx].qp_handle = qp; > > + spin_unlock_irqrestore(&rcfw->tbl_lock, flags); > > return rc; > > } > > > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > > index 3ffaef0c..993c356 100644 > > --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > > +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > > @@ -637,17 +637,21 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw, > > case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION: > > err_event = (struct creq_qp_error_notification *)qp_event; > > qp_id = le32_to_cpu(err_event->xid); > > + spin_lock(&rcfw->tbl_lock); > > tbl_indx = map_qp_id_to_tbl_indx(qp_id, rcfw); > > qp = rcfw->qp_tbl[tbl_indx].qp_handle; > > + if (!qp) { > > + spin_unlock(&rcfw->tbl_lock); > > + break; > > + } > > + bnxt_qplib_mark_qp_error(qp); > > + rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp); > > + spin_unlock(&rcfw->tbl_lock); > > dev_dbg(&pdev->dev, "Received QP error notification\n"); > > dev_dbg(&pdev->dev, > > "qpid 0x%x, req_err=0x%x, resp_err=0x%x\n", > > qp_id, err_event->req_err_state_reason, > > err_event->res_err_state_reason); > > - if (!qp) > > - break; > > - bnxt_qplib_mark_qp_error(qp); > > - rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp); > > break; > > default: > > /* > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > > index 45996e6..07779ae 100644 > > --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > > +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > > @@ -224,6 +224,8 @@ struct bnxt_qplib_rcfw { > > struct bnxt_qplib_crsqe *crsqe_tbl; > > int qp_tbl_size; > > struct bnxt_qplib_qp_node *qp_tbl; > > + /* To synchronize the qp-handle hash table */ > > + spinlock_t tbl_lock; > > u64 oos_prev; > > u32 init_oos_stats; > > u32 cmdq_depth; > [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 4224 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-09-13 7:59 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-09-11 18:29 [PATCH 0/4] RDMA/bnxt_re: Bug fixes for 6.12 kernel Selvin Xavier 2024-09-11 18:29 ` [PATCH 1/4] RDMA/bnxt_re: Fix a possible memory leak Selvin Xavier 2024-09-11 18:29 ` [PATCH 2/4] RDMA/bnxt_re: Fix incorrect AVID type in WQE structure Selvin Xavier 2024-09-11 18:29 ` [PATCH 3/4] RDMA/bnxt_re: Add a check for memory allocation Selvin Xavier 2024-09-11 18:29 ` [PATCH 4/4] RDMA/bnxt_re: synchronize the qp-handle table array Selvin Xavier 2024-09-13 4:26 ` Zhu Yanjun 2024-09-13 4:30 ` Zhu Yanjun 2024-09-13 7:59 ` Selvin Xavier
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.