linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes
@ 2023-05-11  7:26 Selvin Xavier
  2023-05-17 19:35 ` Jason Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Selvin Xavier @ 2023-05-11  7:26 UTC (permalink / raw)
  To: jgg, leon; +Cc: linux-rdma, andrew.gospodarek, Selvin Xavier

[-- Attachment #1: Type: text/plain, Size: 1167 bytes --]

Includes some of the generic bug fixes in bnxt_re driver.
Please review and apply.

Thanks,
Selvin Xavier

Kalesh AP (9):
  RDMA/bnxt_re: Fix a possible memory leak
  RDMA/bnxt_re: Fix to remove unnecessary return labels
  RDMA/bnxt_re: Use unique names while registering interrupts
  RDMA/bnxt_re: Remove a redundant check inside bnxt_re_update_gid
  RDMA/bnxt_re: Fix return value of bnxt_re_process_raw_qp_pkt_rx
  RDMA/bnxt_re: Fix to remove an unnecessary log
  RDMA/bnxt_re: Do not enable congestion control on VFs
  RDMA/bnxt_re: Return directly without goto jumps
  RDMA/bnxt_re: Remove unnecessary checks

Selvin Xavier (1):
  RDMA/bnxt_re: Disable/kill tasklet only if it is enabled

 drivers/infiniband/hw/bnxt_re/ib_verbs.c   |   2 +-
 drivers/infiniband/hw/bnxt_re/main.c       |  24 ++++---
 drivers/infiniband/hw/bnxt_re/qplib_fp.c   | 111 ++++++++++++++---------------
 drivers/infiniband/hw/bnxt_re/qplib_fp.h   |   2 +-
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c |  29 ++++++--
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.h |   1 +
 drivers/infiniband/hw/bnxt_re/qplib_sp.c   |   8 ---
 7 files changed, 91 insertions(+), 86 deletions(-)

-- 
2.5.5


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4224 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes
  2023-05-11  7:26 Selvin Xavier
@ 2023-05-17 19:35 ` Jason Gunthorpe
  2023-05-18  8:17   ` Selvin Xavier
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2023-05-17 19:35 UTC (permalink / raw)
  To: Selvin Xavier; +Cc: leon, linux-rdma, andrew.gospodarek

On Thu, May 11, 2023 at 12:26:15AM -0700, Selvin Xavier wrote:
> Includes some of the generic bug fixes in bnxt_re driver.
> Please review and apply.
> 
> Thanks,
> Selvin Xavier
> 
> Kalesh AP (9):
>   RDMA/bnxt_re: Fix a possible memory leak
>   RDMA/bnxt_re: Fix to remove unnecessary return labels
>   RDMA/bnxt_re: Use unique names while registering interrupts
>   RDMA/bnxt_re: Remove a redundant check inside bnxt_re_update_gid
>   RDMA/bnxt_re: Fix return value of bnxt_re_process_raw_qp_pkt_rx
>   RDMA/bnxt_re: Fix to remove an unnecessary log
>   RDMA/bnxt_re: Do not enable congestion control on VFs
>   RDMA/bnxt_re: Return directly without goto jumps
>   RDMA/bnxt_re: Remove unnecessary checks
> 
> Selvin Xavier (1):
>   RDMA/bnxt_re: Disable/kill tasklet only if it is enabled

A lot of this is not for-rc material, please split it up properly

Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes
  2023-05-17 19:35 ` Jason Gunthorpe
@ 2023-05-18  8:17   ` Selvin Xavier
  0 siblings, 0 replies; 15+ messages in thread
From: Selvin Xavier @ 2023-05-18  8:17 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: leon, linux-rdma, andrew.gospodarek

[-- Attachment #1: Type: text/plain, Size: 1057 bytes --]

On Thu, May 18, 2023 at 1:05 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, May 11, 2023 at 12:26:15AM -0700, Selvin Xavier wrote:
> > Includes some of the generic bug fixes in bnxt_re driver.
> > Please review and apply.
> >
> > Thanks,
> > Selvin Xavier
> >
> > Kalesh AP (9):
> >   RDMA/bnxt_re: Fix a possible memory leak
> >   RDMA/bnxt_re: Fix to remove unnecessary return labels
> >   RDMA/bnxt_re: Use unique names while registering interrupts
> >   RDMA/bnxt_re: Remove a redundant check inside bnxt_re_update_gid
> >   RDMA/bnxt_re: Fix return value of bnxt_re_process_raw_qp_pkt_rx
> >   RDMA/bnxt_re: Fix to remove an unnecessary log
> >   RDMA/bnxt_re: Do not enable congestion control on VFs
> >   RDMA/bnxt_re: Return directly without goto jumps
> >   RDMA/bnxt_re: Remove unnecessary checks
> >
> > Selvin Xavier (1):
> >   RDMA/bnxt_re: Disable/kill tasklet only if it is enabled
>
> A lot of this is not for-rc material, please split it up properly
sure. will post it as two series.
-Selvin
>
> Jason

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4224 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes
@ 2024-10-08  7:41 Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 01/10] RDMA/bnxt_re: Fix the max CQ WQEs for older adapters Selvin Xavier
                   ` (10 more replies)
  0 siblings, 11 replies; 15+ messages in thread
From: Selvin Xavier @ 2024-10-08  7:41 UTC (permalink / raw)
  To: leon, jgg
  Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
	Selvin Xavier

Few bugfixes for bnxt_re driver. Please review and apply.

Thanks,
Selvin Xavier

Abhishek Mohapatra (1):
  RDMA/bnxt_re: Fix the max CQ WQEs for older adapters

Bhargava Chenna Marreddy (1):
  RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages

Chandramohan Akula (1):
  RDMA/bnxt_re: Change the sequence of updating the CQ toggle value

Kalesh AP (5):
  RDMA/bnxt_re: Fix out of bound check
  RDMA/ bnxt_re: Return more meaningful error
  RDMA/bnxt_re: Fix a possible NULL pointer dereference
  RDMA/bnxt_re: Fix an error path in bnxt_re_add_device
  RDMA/bnxt_re: Fix the GID table length

Kashyap Desai (1):
  RDMA/bnxt_re: Fix incorrect dereference of srq in async event

Selvin Xavier (1):
  RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop

 drivers/infiniband/hw/bnxt_re/hw_counters.c |  2 +-
 drivers/infiniband/hw/bnxt_re/main.c        | 42 ++++++++++++++---------------
 drivers/infiniband/hw/bnxt_re/qplib_fp.c    |  5 ++++
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c  |  2 +-
 drivers/infiniband/hw/bnxt_re/qplib_res.c   | 19 +++----------
 drivers/infiniband/hw/bnxt_re/qplib_sp.c    | 11 +++++++-
 drivers/infiniband/hw/bnxt_re/qplib_sp.h    |  1 +
 7 files changed, 42 insertions(+), 40 deletions(-)

-- 
2.5.5


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH for-rc 01/10] RDMA/bnxt_re: Fix the max CQ WQEs for older adapters
  2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
@ 2024-10-08  7:41 ` Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 02/10] RDMA/bnxt_re: Fix out of bound check Selvin Xavier
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Selvin Xavier @ 2024-10-08  7:41 UTC (permalink / raw)
  To: leon, jgg
  Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
	Abhishek Mohapatra, Selvin Xavier

From: Abhishek Mohapatra <abhishek.mohapatra@broadcom.com>

Older adapters doesn't support the MAX CQ WQEs reported by
older FW. So restrict the value reported to 1M always for
older adapters.

Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Abhishek Mohapatra<abhishek.mohapatra@broadcom.com>
Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/qplib_sp.c | 2 ++
 drivers/infiniband/hw/bnxt_re/qplib_sp.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
index 4f75e7e..32c1cc7 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
@@ -140,6 +140,8 @@ int bnxt_qplib_get_dev_attr(struct bnxt_qplib_rcfw *rcfw,
 			    min_t(u32, sb->max_sge_var_wqe, BNXT_VAR_MAX_SGE) : 6;
 	attr->max_cq = le32_to_cpu(sb->max_cq);
 	attr->max_cq_wqes = le32_to_cpu(sb->max_cqe);
+	if (!bnxt_qplib_is_chip_gen_p7(rcfw->res->cctx))
+		attr->max_cq_wqes = min_t(u32, BNXT_QPLIB_MAX_CQ_WQES, attr->max_cq_wqes);
 	attr->max_cq_sges = attr->max_qp_sges;
 	attr->max_mr = le32_to_cpu(sb->max_mr);
 	attr->max_mw = le32_to_cpu(sb->max_mw);
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.h b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
index acd9c14..ecf3f45 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
@@ -56,6 +56,7 @@ struct bnxt_qplib_dev_attr {
 	u32				max_qp_wqes;
 	u32				max_qp_sges;
 	u32				max_cq;
+#define BNXT_QPLIB_MAX_CQ_WQES          0xfffff
 	u32				max_cq_wqes;
 	u32				max_cq_sges;
 	u32				max_mr;
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH for-rc 02/10] RDMA/bnxt_re: Fix out of bound check
  2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 01/10] RDMA/bnxt_re: Fix the max CQ WQEs for older adapters Selvin Xavier
@ 2024-10-08  7:41 ` Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 03/10] RDMA/bnxt_re: Fix incorrect dereference of srq in async event Selvin Xavier
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Selvin Xavier @ 2024-10-08  7:41 UTC (permalink / raw)
  To: leon, jgg
  Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
	Selvin Xavier

From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

Driver exports pacing stats only on GenP5 and P7 adapters. But while
parsing the pacing stats, driver has a check for "rdev->dbr_pacing".
This caused a trace when KASAN is enabled.

BUG: KASAN: slab-out-of-bounds in bnxt_re_get_hw_stats+0x2b6a/0x2e00 [bnxt_re]
Write of size 8 at addr ffff8885942a6340 by task modprobe/4809

Fixes: 8b6573ff3420 ("bnxt_re: Update the debug counters for doorbell pacing")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/hw_counters.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/bnxt_re/hw_counters.c b/drivers/infiniband/hw/bnxt_re/hw_counters.c
index 128651c..1e63f80 100644
--- a/drivers/infiniband/hw/bnxt_re/hw_counters.c
+++ b/drivers/infiniband/hw/bnxt_re/hw_counters.c
@@ -366,7 +366,7 @@ int bnxt_re_ib_get_hw_stats(struct ib_device *ibdev,
 				goto done;
 			}
 		}
-		if (rdev->pacing.dbr_pacing)
+		if (rdev->pacing.dbr_pacing && bnxt_qplib_is_chip_gen_p5_p7(rdev->chip_ctx))
 			bnxt_re_copy_db_pacing_stats(rdev, stats);
 	}
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH for-rc 03/10] RDMA/bnxt_re: Fix incorrect dereference of srq in async event
  2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 01/10] RDMA/bnxt_re: Fix the max CQ WQEs for older adapters Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 02/10] RDMA/bnxt_re: Fix out of bound check Selvin Xavier
@ 2024-10-08  7:41 ` Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 04/10] RDMA/ bnxt_re: Return more meaningful error Selvin Xavier
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Selvin Xavier @ 2024-10-08  7:41 UTC (permalink / raw)
  To: leon, jgg
  Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
	Kashyap Desai, Selvin Xavier

From: Kashyap Desai <kashyap.desai@broadcom.com>

Currently driver is not getting correct srq. Dereference only if
qplib has a valid srq.

Fixes: b02fd3f79ec3 ("RDMA/bnxt_re: Report async events and errors")
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/main.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 686e405..dd39948 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -1028,12 +1028,15 @@ static int bnxt_re_handle_unaffi_async_event(struct creq_func_event
 static int bnxt_re_handle_qp_async_event(struct creq_qp_event *qp_event,
 					 struct bnxt_re_qp *qp)
 {
-	struct bnxt_re_srq *srq = container_of(qp->qplib_qp.srq, struct bnxt_re_srq,
-					       qplib_srq);
 	struct creq_qp_error_notification *err_event;
+	struct bnxt_re_srq *srq = NULL;
 	struct ib_event event = {};
 	unsigned int flags;
 
+	if (qp->qplib_qp.srq)
+		srq =  container_of(qp->qplib_qp.srq, struct bnxt_re_srq,
+				    qplib_srq);
+
 	if (qp->qplib_qp.state == CMDQ_MODIFY_QP_NEW_STATE_ERR &&
 	    rdma_is_kernel_res(&qp->ib_qp.res)) {
 		flags = bnxt_re_lock_cqs(qp);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH for-rc 04/10] RDMA/ bnxt_re: Return more meaningful error
  2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
                   ` (2 preceding siblings ...)
  2024-10-08  7:41 ` [PATCH for-rc 03/10] RDMA/bnxt_re: Fix incorrect dereference of srq in async event Selvin Xavier
@ 2024-10-08  7:41 ` Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 05/10] RDMA/bnxt_re: Fix a possible NULL pointer dereference Selvin Xavier
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Selvin Xavier @ 2024-10-08  7:41 UTC (permalink / raw)
  To: leon, jgg
  Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
	Selvin Xavier

From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

When the HWRM command fails, driver currently returns
-EFAULT(Bad address). This does not look correct.

Modified to return -EIO(I/O error).

Fixes: cc1ec769b87c ("RDMA/bnxt_re: Fixing the Control path command and response handling")
Fixes: 65288a22ddd8 ("RDMA/bnxt_re: use shadow qd while posting non blocking rcfw command")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
index 3ffaef0c..7294221 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
@@ -525,7 +525,7 @@ static int __bnxt_qplib_rcfw_send_message(struct bnxt_qplib_rcfw *rcfw,
 		/* failed with status */
 		dev_err(&rcfw->pdev->dev, "cmdq[%#x]=%#x status %#x\n",
 			cookie, opcode, evnt->status);
-		rc = -EFAULT;
+		rc = -EIO;
 	}
 
 	return rc;
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH for-rc 05/10] RDMA/bnxt_re: Fix a possible NULL pointer dereference
  2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
                   ` (3 preceding siblings ...)
  2024-10-08  7:41 ` [PATCH for-rc 04/10] RDMA/ bnxt_re: Return more meaningful error Selvin Xavier
@ 2024-10-08  7:41 ` Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 06/10] RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop Selvin Xavier
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Selvin Xavier @ 2024-10-08  7:41 UTC (permalink / raw)
  To: leon, jgg
  Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
	Selvin Xavier

From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

There is a possibility of a NULL pointer dereference in the failure
path of bnxt_re_add_device().
To address that, moved the update of "rdev->adev" to bnxt_re_dev_add().

Fixes: dee3da3422d5 ("RDMA/bnxt_re: Change aux driver data to en_info to hold more information")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-rdma/CAH-L+nMCwymKGqf5pd8-FZNhxEkDD=kb6AoCaE6fAVi7b3e5Qw@mail.gmail.com/T/#t
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index dd39948..915b0d3 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -960,7 +960,7 @@ static int bnxt_re_register_ib(struct bnxt_re_dev *rdev)
 	return ib_register_device(ibdev, "bnxt_re%d", &rdev->en_dev->pdev->dev);
 }
 
-static struct bnxt_re_dev *bnxt_re_dev_add(struct bnxt_aux_priv *aux_priv,
+static struct bnxt_re_dev *bnxt_re_dev_add(struct auxiliary_device *adev,
 					   struct bnxt_en_dev *en_dev)
 {
 	struct bnxt_re_dev *rdev;
@@ -976,6 +976,7 @@ static struct bnxt_re_dev *bnxt_re_dev_add(struct bnxt_aux_priv *aux_priv,
 	rdev->nb.notifier_call = NULL;
 	rdev->netdev = en_dev->net;
 	rdev->en_dev = en_dev;
+	rdev->adev = adev;
 	rdev->id = rdev->en_dev->pdev->devfn;
 	INIT_LIST_HEAD(&rdev->qp_list);
 	mutex_init(&rdev->qp_lock);
@@ -1829,7 +1830,6 @@ static void bnxt_re_update_en_info_rdev(struct bnxt_re_dev *rdev,
 	 */
 	rtnl_lock();
 	en_info->rdev = rdev;
-	rdev->adev = adev;
 	rtnl_unlock();
 }
 
@@ -1846,7 +1846,7 @@ static int bnxt_re_add_device(struct auxiliary_device *adev, u8 op_type)
 	en_dev = en_info->en_dev;
 
 
-	rdev = bnxt_re_dev_add(aux_priv, en_dev);
+	rdev = bnxt_re_dev_add(adev, en_dev);
 	if (!rdev || !rdev_to_dev(rdev)) {
 		rc = -ENOMEM;
 		goto exit;
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH for-rc 06/10] RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop
  2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
                   ` (4 preceding siblings ...)
  2024-10-08  7:41 ` [PATCH for-rc 05/10] RDMA/bnxt_re: Fix a possible NULL pointer dereference Selvin Xavier
@ 2024-10-08  7:41 ` Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 07/10] RDMA/bnxt_re: Fix an error path in bnxt_re_add_device Selvin Xavier
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Selvin Xavier @ 2024-10-08  7:41 UTC (permalink / raw)
  To: leon, jgg
  Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
	Selvin Xavier

Driver waits indefinitely for the fifo occupancy to go below
a threshold as soon as the pacing interrupt is received. This can
cause soft lockup on one of the processors, if the rate of DB is very high.

Add a loop count for FPGA and exit the __wait_for_fifo_occupancy_below_th
if the loop is taking more time. Pacing will be continuing until the
occupancy is below the threshold. This is ensured by the
checks in bnxt_re_pacing_timer_exp and further scheduling the
work for pacing based on the fifo occupancy.

Fixes: 2ad4e6303a6d ("RDMA/bnxt_re: Implement doorbell pacing algorithm")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 915b0d3..b1dcb6b 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -534,6 +534,7 @@ static bool is_dbr_fifo_full(struct bnxt_re_dev *rdev)
 static void __wait_for_fifo_occupancy_below_th(struct bnxt_re_dev *rdev)
 {
 	struct bnxt_qplib_db_pacing_data *pacing_data = rdev->qplib_res.pacing_data;
+	u32 retry_fifo_check = 1000;
 	u32 fifo_occup;
 
 	/* loop shouldn't run infintely as the occupancy usually goes
@@ -547,6 +548,14 @@ static void __wait_for_fifo_occupancy_below_th(struct bnxt_re_dev *rdev)
 
 		if (fifo_occup < pacing_data->pacing_th)
 			break;
+		if (!retry_fifo_check--) {
+			dev_info_once(rdev_to_dev(rdev),
+				      "%s: fifo_occup = 0x%xfifo_max_depth = 0x%x pacing_th = 0x%x\n",
+				      __func__, fifo_occup, pacing_data->fifo_max_depth,
+					pacing_data->pacing_th);
+			break;
+		}
+
 	}
 }
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH for-rc 07/10] RDMA/bnxt_re: Fix an error path in bnxt_re_add_device
  2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
                   ` (5 preceding siblings ...)
  2024-10-08  7:41 ` [PATCH for-rc 06/10] RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop Selvin Xavier
@ 2024-10-08  7:41 ` Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 08/10] RDMA/bnxt_re: Change the sequence of updating the CQ toggle value Selvin Xavier
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Selvin Xavier @ 2024-10-08  7:41 UTC (permalink / raw)
  To: leon, jgg
  Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
	Selvin Xavier

From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

In bnxt_re_add_device(), when register netdev notifier fails, driver is
not unregistering the IB device in the error cleanup path.
Also, removed the duplicate cleanup in error path of bnxt_re_probe.

Fixes: 94a9dc6ac8f7 ("RDMA/bnxt_re: Group all operations under add_device and remove_device")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/main.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index b1dcb6b..63ca600 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -1880,12 +1880,14 @@ static int bnxt_re_add_device(struct auxiliary_device *adev, u8 op_type)
 		rdev->nb.notifier_call = NULL;
 		pr_err("%s: Cannot register to netdevice_notifier",
 		       ROCE_DRV_MODULE_NAME);
-		return rc;
+		goto re_dev_unreg;
 	}
 	bnxt_re_setup_cc(rdev, true);
 
 	return 0;
 
+re_dev_unreg:
+	ib_unregister_device(&rdev->ibdev);
 re_dev_uninit:
 	bnxt_re_update_en_info_rdev(NULL, en_info, adev);
 	bnxt_re_dev_uninit(rdev, BNXT_RE_COMPLETE_REMOVE);
@@ -2029,15 +2031,7 @@ static int bnxt_re_probe(struct auxiliary_device *adev,
 	auxiliary_set_drvdata(adev, en_info);
 
 	rc = bnxt_re_add_device(adev, BNXT_RE_COMPLETE_INIT);
-	if (rc)
-		goto err;
 	mutex_unlock(&bnxt_re_mutex);
-	return 0;
-
-err:
-	mutex_unlock(&bnxt_re_mutex);
-	bnxt_re_remove(adev);
-
 	return rc;
 }
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH for-rc 08/10] RDMA/bnxt_re: Change the sequence of updating the CQ toggle value
  2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
                   ` (6 preceding siblings ...)
  2024-10-08  7:41 ` [PATCH for-rc 07/10] RDMA/bnxt_re: Fix an error path in bnxt_re_add_device Selvin Xavier
@ 2024-10-08  7:41 ` Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 09/10] RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages Selvin Xavier
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Selvin Xavier @ 2024-10-08  7:41 UTC (permalink / raw)
  To: leon, jgg
  Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
	Chandramohan Akula, Selvin Xavier

From: Chandramohan Akula <chandramohan.akula@broadcom.com>

Currently the CQ toggle value in the shared page (read by
the userlib) is updated as part of the cqn_handler. There
is a potential race of application calling the CQ ARM
doorbell immediately and using the old toggle value.

Change the sequence of updating CQ toggle value to update
in the bnxt_qplib_service_nq function immediately after
reading the toggle value to be in sync with the HW updated
value.

Fixes: e275919d9669 ("RDMA/bnxt_re: Share a page to expose per CQ info with userspace")
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/main.c     | 8 +-------
 drivers/infiniband/hw/bnxt_re/qplib_fp.c | 5 +++++
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 63ca600..6715c96 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -1274,15 +1274,9 @@ static int bnxt_re_cqn_handler(struct bnxt_qplib_nq *nq,
 {
 	struct bnxt_re_cq *cq = container_of(handle, struct bnxt_re_cq,
 					     qplib_cq);
-	u32 *cq_ptr;
 
-	if (cq->ib_cq.comp_handler) {
-		if (cq->uctx_cq_page) {
-			cq_ptr = (u32 *)cq->uctx_cq_page;
-			*cq_ptr = cq->qplib_cq.toggle;
-		}
+	if (cq->ib_cq.comp_handler)
 		(*cq->ib_cq.comp_handler)(&cq->ib_cq, cq->ib_cq.cq_context);
-	}
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
index 42e98e5..2ebcb2d 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
@@ -327,6 +327,7 @@ static void bnxt_qplib_service_nq(struct tasklet_struct *t)
 		case NQ_BASE_TYPE_CQ_NOTIFICATION:
 		{
 			struct nq_cn *nqcne = (struct nq_cn *)nqe;
+			struct bnxt_re_cq *cq_p;
 
 			q_handle = le32_to_cpu(nqcne->cq_handle_low);
 			q_handle |= (u64)le32_to_cpu(nqcne->cq_handle_high)
@@ -337,6 +338,10 @@ static void bnxt_qplib_service_nq(struct tasklet_struct *t)
 			cq->toggle = (le16_to_cpu(nqe->info10_type) &
 					NQ_CN_TOGGLE_MASK) >> NQ_CN_TOGGLE_SFT;
 			cq->dbinfo.toggle = cq->toggle;
+			cq_p = container_of(cq, struct bnxt_re_cq, qplib_cq);
+			if (cq_p->uctx_cq_page)
+				*((u32 *)cq_p->uctx_cq_page) = cq->toggle;
+
 			bnxt_qplib_armen_db(&cq->dbinfo,
 					    DBC_DBC_TYPE_CQ_ARMENA);
 			spin_lock_bh(&cq->compl_lock);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH for-rc 09/10] RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages
  2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
                   ` (7 preceding siblings ...)
  2024-10-08  7:41 ` [PATCH for-rc 08/10] RDMA/bnxt_re: Change the sequence of updating the CQ toggle value Selvin Xavier
@ 2024-10-08  7:41 ` Selvin Xavier
  2024-10-08  7:41 ` [PATCH for-rc 10/10] RDMA/bnxt_re: Fix the GID table length Selvin Xavier
  2024-10-11 23:51 ` [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Jason Gunthorpe
  10 siblings, 0 replies; 15+ messages in thread
From: Selvin Xavier @ 2024-10-08  7:41 UTC (permalink / raw)
  To: leon, jgg
  Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
	Bhargava Chenna Marreddy, Selvin Xavier

From: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>

Avoid memory corruption while setting up Level-2 PBL
pages for the non MR resources when num_pages > 256K.

There will be a single PDE page address (contiguous
pages in the case of > PAGE_SIZE), but, current logic
assumes multiple pages, leading to invalid memory
access after 256K PBL entries in the PDE.

Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation")
Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/qplib_res.c | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
index 1fdffd6..96ceec1 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
@@ -257,22 +257,9 @@ int bnxt_qplib_alloc_init_hwq(struct bnxt_qplib_hwq *hwq,
 			dst_virt_ptr =
 				(dma_addr_t **)hwq->pbl[PBL_LVL_0].pg_arr;
 			src_phys_ptr = hwq->pbl[PBL_LVL_1].pg_map_arr;
-			if (hwq_attr->type == HWQ_TYPE_MR) {
-			/* For MR it is expected that we supply only 1 contigous
-			 * page i.e only 1 entry in the PDL that will contain
-			 * all the PBLs for the user supplied memory region
-			 */
-				for (i = 0; i < hwq->pbl[PBL_LVL_1].pg_count;
-				     i++)
-					dst_virt_ptr[0][i] = src_phys_ptr[i] |
-						flag;
-			} else {
-				for (i = 0; i < hwq->pbl[PBL_LVL_1].pg_count;
-				     i++)
-					dst_virt_ptr[PTR_PG(i)][PTR_IDX(i)] =
-						src_phys_ptr[i] |
-						PTU_PDE_VALID;
-			}
+			for (i = 0; i < hwq->pbl[PBL_LVL_1].pg_count; i++)
+				dst_virt_ptr[0][i] = src_phys_ptr[i] | flag;
+
 			/* Alloc or init PTEs */
 			rc = __alloc_pbl(res, &hwq->pbl[PBL_LVL_2],
 					 hwq_attr->sginfo);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH for-rc 10/10] RDMA/bnxt_re: Fix the GID table length
  2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
                   ` (8 preceding siblings ...)
  2024-10-08  7:41 ` [PATCH for-rc 09/10] RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages Selvin Xavier
@ 2024-10-08  7:41 ` Selvin Xavier
  2024-10-11 23:51 ` [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Jason Gunthorpe
  10 siblings, 0 replies; 15+ messages in thread
From: Selvin Xavier @ 2024-10-08  7:41 UTC (permalink / raw)
  To: leon, jgg
  Cc: linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil,
	Selvin Xavier

From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

GID table length is reported by FW. The gid index
which is passed to the driver during modify_qp/create_ah
is restricted by the sgid_index field of struct ib_global_route.
sgid_index is u8 and the max sgid possible is 256.

Each GID entry in HW will have 2 GID entries in the kernel gid table.
So we can support twice the gid table size reported by FW. Also, restrict
the max GID to 256 also.

Fixes: 847b97887ed4 ("RDMA/bnxt_re: Restrict the max_gids to 256")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/qplib_sp.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
index 32c1cc7..e29fbbd 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
@@ -159,7 +159,14 @@ int bnxt_qplib_get_dev_attr(struct bnxt_qplib_rcfw *rcfw,
 	if (!bnxt_qplib_is_chip_gen_p7(rcfw->res->cctx))
 		attr->l2_db_size = (sb->l2_db_space_size + 1) *
 				    (0x01 << RCFW_DBR_BASE_PAGE_SHIFT);
-	attr->max_sgid = BNXT_QPLIB_NUM_GIDS_SUPPORTED;
+	/*
+	 * Read the max gid supported by HW.
+	 * For each entry in HW  GID in HW table, we consume 2
+	 * GID entries in the kernel GID table.  So max_gid reported
+	 * to stack can be up to twice the value reported by the HW, up to 256 gids.
+	 */
+	attr->max_sgid = le32_to_cpu(sb->max_gid);
+	attr->max_sgid = min_t(u32, BNXT_QPLIB_NUM_GIDS_SUPPORTED, 2 * attr->max_sgid);
 	attr->dev_cap_flags = le16_to_cpu(sb->dev_cap_flags);
 	attr->dev_cap_flags2 = le16_to_cpu(sb->dev_cap_ext_flags_2);
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes
  2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
                   ` (9 preceding siblings ...)
  2024-10-08  7:41 ` [PATCH for-rc 10/10] RDMA/bnxt_re: Fix the GID table length Selvin Xavier
@ 2024-10-11 23:51 ` Jason Gunthorpe
  10 siblings, 0 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2024-10-11 23:51 UTC (permalink / raw)
  To: Selvin Xavier; +Cc: leon, linux-rdma, andrew.gospodarek, kalesh-anakkur.purayil

On Tue, Oct 08, 2024 at 12:41:32AM -0700, Selvin Xavier wrote:
> Few bugfixes for bnxt_re driver. Please review and apply.
> 
> Thanks,
> Selvin Xavier
> 
> Abhishek Mohapatra (1):
>   RDMA/bnxt_re: Fix the max CQ WQEs for older adapters
> 
> Bhargava Chenna Marreddy (1):
>   RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages
> 
> Chandramohan Akula (1):
>   RDMA/bnxt_re: Change the sequence of updating the CQ toggle value
> 
> Kalesh AP (5):
>   RDMA/bnxt_re: Fix out of bound check
>   RDMA/ bnxt_re: Return more meaningful error
>   RDMA/bnxt_re: Fix a possible NULL pointer dereference
>   RDMA/bnxt_re: Fix an error path in bnxt_re_add_device
>   RDMA/bnxt_re: Fix the GID table length
> 
> Kashyap Desai (1):
>   RDMA/bnxt_re: Fix incorrect dereference of srq in async event
> 
> Selvin Xavier (1):
>   RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop

Applied to for-rc, thanks

Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-10-11 23:51 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-08  7:41 [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Selvin Xavier
2024-10-08  7:41 ` [PATCH for-rc 01/10] RDMA/bnxt_re: Fix the max CQ WQEs for older adapters Selvin Xavier
2024-10-08  7:41 ` [PATCH for-rc 02/10] RDMA/bnxt_re: Fix out of bound check Selvin Xavier
2024-10-08  7:41 ` [PATCH for-rc 03/10] RDMA/bnxt_re: Fix incorrect dereference of srq in async event Selvin Xavier
2024-10-08  7:41 ` [PATCH for-rc 04/10] RDMA/ bnxt_re: Return more meaningful error Selvin Xavier
2024-10-08  7:41 ` [PATCH for-rc 05/10] RDMA/bnxt_re: Fix a possible NULL pointer dereference Selvin Xavier
2024-10-08  7:41 ` [PATCH for-rc 06/10] RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop Selvin Xavier
2024-10-08  7:41 ` [PATCH for-rc 07/10] RDMA/bnxt_re: Fix an error path in bnxt_re_add_device Selvin Xavier
2024-10-08  7:41 ` [PATCH for-rc 08/10] RDMA/bnxt_re: Change the sequence of updating the CQ toggle value Selvin Xavier
2024-10-08  7:41 ` [PATCH for-rc 09/10] RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages Selvin Xavier
2024-10-08  7:41 ` [PATCH for-rc 10/10] RDMA/bnxt_re: Fix the GID table length Selvin Xavier
2024-10-11 23:51 ` [PATCH for-rc 00/10] RDMA/bnxt_re: Bug fixes Jason Gunthorpe
  -- strict thread matches above, loose matches on Subject: below --
2023-05-11  7:26 Selvin Xavier
2023-05-17 19:35 ` Jason Gunthorpe
2023-05-18  8:17   ` Selvin Xavier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).