* [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
@ 2026-03-03 4:36 Kamal Heib
2026-03-04 9:02 ` Siva Reddy Kallam
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Kamal Heib @ 2026-03-03 4:36 UTC (permalink / raw)
To: linux-rdma
Cc: Siva Reddy Kallam, Jason Gunthorpe, Leon Romanovsky, Kamal Heib

If the firmware version query fails, the driver currently ignores the
error and continues initializing. This leaves the device in a bad state.

Fix this by making bng_re_query_hwrm_version() return the error code and
update the driver to check for this error and stop the setup process
safely if it happens.

Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver")
Signed-off-by: Kamal Heib <kheib@redhat.com>
---
 drivers/infiniband/hw/bng_re/bng_dev.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/bng_re/bng_dev.c b/drivers/infiniband/hw/bng_re/bng_dev.c
index d34b5f88cd40..17147175a9b0 100644
--- a/drivers/infiniband/hw/bng_re/bng_dev.c
+++ b/drivers/infiniband/hw/bng_re/bng_dev.c
@@ -210,7 +210,7 @@ static int bng_re_stats_ctx_alloc(struct bng_re_dev *rdev)
 	return rc;
 }
 
-static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
+static int bng_re_query_hwrm_version(struct bng_re_dev *rdev)
 {
 	struct bnge_auxr_dev *aux_dev = rdev->aux_dev;
 	struct hwrm_ver_get_output ver_get_resp = {};
@@ -230,7 +230,7 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
 	if (rc) {
 		ibdev_err(&rdev->ibdev, "Failed to query HW version, rc = 0x%x",
 			  rc);
-		return;
+		return rc;
 	}
 
 	cctx = rdev->chip_ctx;
@@ -244,6 +244,8 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
 
 	if (!cctx->hwrm_cmd_max_timeout)
 		cctx->hwrm_cmd_max_timeout = BNG_ROCE_FW_MAX_TIMEOUT;
+
+	return 0;
 }
 
 static void bng_re_dev_uninit(struct bng_re_dev *rdev)
@@ -306,7 +308,9 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
 		goto msix_ctx_fail;
 	}
 
-	bng_re_query_hwrm_version(rdev);
+	rc = bng_re_query_hwrm_version(rdev);
+	if (rc)
+		goto query_hwrm_ver_fail;
 
 	rc = bng_re_alloc_fw_channel(&rdev->bng_res, &rdev->rcfw);
 	if (rc) {
@@ -392,6 +396,7 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
 nq_alloc_fail:
 	bng_re_free_rcfw_channel(&rdev->rcfw);
 alloc_fw_chl_fail:
+query_hwrm_ver_fail:
 	bng_re_destroy_chip_ctx(rdev);
 msix_ctx_fail:
 	bnge_unregister_dev(rdev->aux_dev);
--
2.52.0
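
Editor's sketch (not part of the patch): the change above relies on the
usual kernel goto-unwind idiom, where a failing step jumps to a label
that undoes only the steps already completed. The stand-alone C below
illustrates that shape under stated assumptions. struct demo_dev, its
fields, demo_query_version() and demo_dev_init() are hypothetical names
invented for this illustration, and the order of the setup steps is
assumed; the diff itself only shows that query_hwrm_ver_fail falls
through to bng_re_destroy_chip_ctx() and then bnge_unregister_dev().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for driver state; not the real struct bng_re_dev. */
struct demo_dev {
	bool registered;		/* set by the assumed registration step */
	bool chip_ctx;			/* set by the assumed chip context setup */
	bool fw_channel;		/* set once the firmware channel exists */
	bool fail_version_query;	/* test knob: force the query to fail */
};

/* Returns 0 on success or a negative errno, like the patched helper. */
static int demo_query_version(struct demo_dev *dev)
{
	return dev->fail_version_query ? -EIO : 0;
}

/* Each label undoes the steps that completed before the failing one. */
static int demo_dev_init(struct demo_dev *dev)
{
	int rc;

	assert(dev != NULL);

	dev->registered = true;		/* step 1 (assumed order) */
	dev->chip_ctx = true;		/* step 2 (assumed order) */

	rc = demo_query_version(dev);	/* step 3: result is now checked */
	if (rc)
		goto query_ver_fail;

	dev->fw_channel = true;		/* step 4: only reached on success */
	return 0;

query_ver_fail:
	dev->chip_ctx = false;		/* undo step 2 */
	dev->registered = false;	/* undo step 1 */
	return rc;
}
```

With fail_version_query set, demo_dev_init() returns -EIO and leaves
every flag cleared, which is the "stop the setup process safely"
behaviour the commit message describes. Placing the new label next to
an existing one, as the patch does, reuses the teardown that is already
there instead of duplicating it.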
^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
  2026-03-03  4:36 [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query Kamal Heib
@ 2026-03-04  9:02 ` Siva Reddy Kallam
  2026-03-04 15:37 ` Leon Romanovsky
  2026-03-05  9:34 ` Leon Romanovsky
  2 siblings, 0 replies; 6+ messages in thread
From: Siva Reddy Kallam @ 2026-03-04  9:02 UTC (permalink / raw)
  To: Kamal Heib; +Cc: linux-rdma, Jason Gunthorpe, Leon Romanovsky

[-- Attachment #1: Type: text/plain, Size: 2654 bytes --]

On Tue, Mar 3, 2026 at 10:06 AM Kamal Heib <kheib@redhat.com> wrote:
>
> If the firmware version query fails, the driver currently ignores the
> error and continues initializing. This leaves the device in a bad state.
>
> Fix this by making bng_re_query_hwrm_version() return the error code and
> update the driver to check for this error and stop the setup process
> safely if it happens.
>
> Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver")
> Signed-off-by: Kamal Heib <kheib@redhat.com>

Reviewed-by: Siva Reddy Kallam <siva.kallam@broadcom.com>

> ---
>  drivers/infiniband/hw/bng_re/bng_dev.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/hw/bng_re/bng_dev.c b/drivers/infiniband/hw/bng_re/bng_dev.c
> index d34b5f88cd40..17147175a9b0 100644
> --- a/drivers/infiniband/hw/bng_re/bng_dev.c
> +++ b/drivers/infiniband/hw/bng_re/bng_dev.c
> @@ -210,7 +210,7 @@ static int bng_re_stats_ctx_alloc(struct bng_re_dev *rdev)
>  	return rc;
>  }
>
> -static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> +static int bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>  {
>  	struct bnge_auxr_dev *aux_dev = rdev->aux_dev;
>  	struct hwrm_ver_get_output ver_get_resp = {};
> @@ -230,7 +230,7 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>  	if (rc) {
>  		ibdev_err(&rdev->ibdev, "Failed to query HW version, rc = 0x%x",
>  			  rc);
> -		return;
> +		return rc;
>  	}
>
>  	cctx = rdev->chip_ctx;
> @@ -244,6 +244,8 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>
>  	if (!cctx->hwrm_cmd_max_timeout)
>  		cctx->hwrm_cmd_max_timeout = BNG_ROCE_FW_MAX_TIMEOUT;
> +
> +	return 0;
>  }
>
>  static void bng_re_dev_uninit(struct bng_re_dev *rdev)
> @@ -306,7 +308,9 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
>  		goto msix_ctx_fail;
>  	}
>
> -	bng_re_query_hwrm_version(rdev);
> +	rc = bng_re_query_hwrm_version(rdev);
> +	if (rc)
> +		goto query_hwrm_ver_fail;
>
>  	rc = bng_re_alloc_fw_channel(&rdev->bng_res, &rdev->rcfw);
>  	if (rc) {
> @@ -392,6 +396,7 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
>  nq_alloc_fail:
>  	bng_re_free_rcfw_channel(&rdev->rcfw);
>  alloc_fw_chl_fail:
> +query_hwrm_ver_fail:
>  	bng_re_destroy_chip_ctx(rdev);
>  msix_ctx_fail:
>  	bnge_unregister_dev(rdev->aux_dev);
> --
> 2.52.0
>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5471 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
  2026-03-03  4:36 [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query Kamal Heib
  2026-03-04  9:02 ` Siva Reddy Kallam
@ 2026-03-04 15:37 ` Leon Romanovsky
  2026-03-05  3:49   ` Kamal Heib
  2026-03-05  9:34 ` Leon Romanovsky
  2 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2026-03-04 15:37 UTC (permalink / raw)
  To: Kamal Heib; +Cc: linux-rdma, Siva Reddy Kallam, Jason Gunthorpe

On Mon, Mar 02, 2026 at 11:36:45PM -0500, Kamal Heib wrote:
> If the firmware version query fails, the driver currently ignores the
> error and continues initializing. This leaves the device in a bad state.

Can you please elaborate what will it cause?

Thanks

>
> Fix this by making bng_re_query_hwrm_version() return the error code and
> update the driver to check for this error and stop the setup process
> safely if it happens.
>
> Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver")
> Signed-off-by: Kamal Heib <kheib@redhat.com>
> ---
>  drivers/infiniband/hw/bng_re/bng_dev.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/hw/bng_re/bng_dev.c b/drivers/infiniband/hw/bng_re/bng_dev.c
> index d34b5f88cd40..17147175a9b0 100644
> --- a/drivers/infiniband/hw/bng_re/bng_dev.c
> +++ b/drivers/infiniband/hw/bng_re/bng_dev.c
> @@ -210,7 +210,7 @@ static int bng_re_stats_ctx_alloc(struct bng_re_dev *rdev)
>  	return rc;
>  }
>
> -static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> +static int bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>  {
>  	struct bnge_auxr_dev *aux_dev = rdev->aux_dev;
>  	struct hwrm_ver_get_output ver_get_resp = {};
> @@ -230,7 +230,7 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>  	if (rc) {
>  		ibdev_err(&rdev->ibdev, "Failed to query HW version, rc = 0x%x",
>  			  rc);
> -		return;
> +		return rc;
>  	}
>
>  	cctx = rdev->chip_ctx;
> @@ -244,6 +244,8 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>
>  	if (!cctx->hwrm_cmd_max_timeout)
>  		cctx->hwrm_cmd_max_timeout = BNG_ROCE_FW_MAX_TIMEOUT;
> +
> +	return 0;
>  }
>
>  static void bng_re_dev_uninit(struct bng_re_dev *rdev)
> @@ -306,7 +308,9 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
>  		goto msix_ctx_fail;
>  	}
>
> -	bng_re_query_hwrm_version(rdev);
> +	rc = bng_re_query_hwrm_version(rdev);
> +	if (rc)
> +		goto query_hwrm_ver_fail;
>
>  	rc = bng_re_alloc_fw_channel(&rdev->bng_res, &rdev->rcfw);
>  	if (rc) {
> @@ -392,6 +396,7 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
>  nq_alloc_fail:
>  	bng_re_free_rcfw_channel(&rdev->rcfw);
>  alloc_fw_chl_fail:
> +query_hwrm_ver_fail:
>  	bng_re_destroy_chip_ctx(rdev);
>  msix_ctx_fail:
>  	bnge_unregister_dev(rdev->aux_dev);
> --
> 2.52.0
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
  2026-03-04 15:37 ` Leon Romanovsky
@ 2026-03-05  3:49   ` Kamal Heib
  2026-03-05  9:32     ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: Kamal Heib @ 2026-03-05  3:49 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma, Siva Reddy Kallam, Jason Gunthorpe

On Wed, Mar 04, 2026 at 05:37:07PM +0200, Leon Romanovsky wrote:
> On Mon, Mar 02, 2026 at 11:36:45PM -0500, Kamal Heib wrote:
> > If the firmware version query fails, the driver currently ignores the
> > error and continues initializing. This leaves the device in a bad state.
>
> Can you please elaborate what will it cause?
>
> Thanks
>

If bng_re_query_hwrm_version() fails, the code returns early and leaves
cctx->hwrm_cmd_max_timeout uninitialized. This parameter is subsequently
assigned to rcfw->max_timeout, which is used by __wait_for_resp(). Later,
when the driver sends firmware commands and enters __wait_for_resp(), it
passes a zero timeout to the commands being sent, which can lead to a
lockup.

Also, cctx->hwrm_intf_ver is left uninitialized, which will likely
be used in the future to determine if a specific feature is supported
or not (like how it is done in bnxt_re).

Thanks,
Kamal

> >
> > Fix this by making bng_re_query_hwrm_version() return the error code and
> > update the driver to check for this error and stop the setup process
> > safely if it happens.
> >
> > Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver")
> > Signed-off-by: Kamal Heib <kheib@redhat.com>
> > ---
> >  drivers/infiniband/hw/bng_re/bng_dev.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/bng_re/bng_dev.c b/drivers/infiniband/hw/bng_re/bng_dev.c
> > index d34b5f88cd40..17147175a9b0 100644
> > --- a/drivers/infiniband/hw/bng_re/bng_dev.c
> > +++ b/drivers/infiniband/hw/bng_re/bng_dev.c
> > @@ -210,7 +210,7 @@ static int bng_re_stats_ctx_alloc(struct bng_re_dev *rdev)
> >  	return rc;
> >  }
> >
> > -static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> > +static int bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> >  {
> >  	struct bnge_auxr_dev *aux_dev = rdev->aux_dev;
> >  	struct hwrm_ver_get_output ver_get_resp = {};
> > @@ -230,7 +230,7 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> >  	if (rc) {
> >  		ibdev_err(&rdev->ibdev, "Failed to query HW version, rc = 0x%x",
> >  			  rc);
> > -		return;
> > +		return rc;
> >  	}
> >
> >  	cctx = rdev->chip_ctx;
> > @@ -244,6 +244,8 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> >
> >  	if (!cctx->hwrm_cmd_max_timeout)
> >  		cctx->hwrm_cmd_max_timeout = BNG_ROCE_FW_MAX_TIMEOUT;
> > +
> > +	return 0;
> >  }
> >
> >  static void bng_re_dev_uninit(struct bng_re_dev *rdev)
> > @@ -306,7 +308,9 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
> >  		goto msix_ctx_fail;
> >  	}
> >
> > -	bng_re_query_hwrm_version(rdev);
> > +	rc = bng_re_query_hwrm_version(rdev);
> > +	if (rc)
> > +		goto query_hwrm_ver_fail;
> >
> >  	rc = bng_re_alloc_fw_channel(&rdev->bng_res, &rdev->rcfw);
> >  	if (rc) {
> > @@ -392,6 +396,7 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
> >  nq_alloc_fail:
> >  	bng_re_free_rcfw_channel(&rdev->rcfw);
> >  alloc_fw_chl_fail:
> > +query_hwrm_ver_fail:
> >  	bng_re_destroy_chip_ctx(rdev);
> >  msix_ctx_fail:
> >  	bnge_unregister_dev(rdev->aux_dev);
> > --
> > 2.52.0
> >
>

^ permalink raw reply	[flat|nested] 6+ messages in thread
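
Editor's sketch (not part of the thread): the reply above says an early
return from the version query skips the fallback that assigns a default
command timeout, so later code sees a zero timeout. The stand-alone C
below models only that control flow, assuming the context starts out
zeroed as the reply implies. struct demo_chip_ctx, demo_query_timeout(),
its fw_rc and fw_timeout parameters, and the DEMO_FW_MAX_TIMEOUT value
are all hypothetical; the real value of BNG_ROCE_FW_MAX_TIMEOUT and the
behaviour of __wait_for_resp() are not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical default; NOT the real BNG_ROCE_FW_MAX_TIMEOUT value. */
#define DEMO_FW_MAX_TIMEOUT 60u

/* Hypothetical stand-in for the chip context that holds the timeout. */
struct demo_chip_ctx {
	unsigned int cmd_max_timeout;	/* 0 until a query succeeds */
};

/*
 * fw_rc simulates the result of the firmware call and fw_timeout the
 * timeout the firmware reported. Both are test inputs for this sketch.
 */
static int demo_query_timeout(struct demo_chip_ctx *cctx, int fw_rc,
			      unsigned int fw_timeout)
{
	assert(cctx != NULL);

	if (fw_rc)
		return fw_rc;	/* early return: fallback below is skipped */

	cctx->cmd_max_timeout = fw_timeout;
	if (!cctx->cmd_max_timeout)
		cctx->cmd_max_timeout = DEMO_FW_MAX_TIMEOUT;

	return 0;
}
```

On failure the timeout field keeps whatever it held before the call
(zero in this sketch), so a caller that ignores the return value goes
on to use it. Returning the error, as the patch does, lets the caller
stop before any code depends on that field.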

* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
  2026-03-05  3:49   ` Kamal Heib
@ 2026-03-05  9:32     ` Leon Romanovsky
  0 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2026-03-05  9:32 UTC (permalink / raw)
  To: Kamal Heib; +Cc: linux-rdma, Siva Reddy Kallam, Jason Gunthorpe

On Wed, Mar 04, 2026 at 10:49:40PM -0500, Kamal Heib wrote:
> On Wed, Mar 04, 2026 at 05:37:07PM +0200, Leon Romanovsky wrote:
> > On Mon, Mar 02, 2026 at 11:36:45PM -0500, Kamal Heib wrote:
> > > If the firmware version query fails, the driver currently ignores the
> > > error and continues initializing. This leaves the device in a bad state.
> >
> > Can you please elaborate what will it cause?
> >
> > Thanks
> >
>
> If bng_re_query_hwrm_version() fails, the code returns early and leaves
> cctx->hwrm_cmd_max_timeout uninitialized. This parameter is subsequently
> assigned to rcfw->max_timeout, which is used by __wait_for_resp(). Later,
> when the driver sends firmware commands and enters __wait_for_resp(), it
> passes a zero timeout to the commands being sent, which can lead to a
> lockup.
>
> Also, cctx->hwrm_intf_ver is left uninitialized, which will likely
> be used in the future to determine if a specific feature is supported
> or not (like how it is done in bnxt_re).

I'm not concerned about these flows. If something as fundamental as
querying the HW version fails, it's likely that nothing else will behave
correctly.

Let's apply this patch.

Thanks

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
  2026-03-03  4:36 [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query Kamal Heib
  2026-03-04  9:02 ` Siva Reddy Kallam
  2026-03-04 15:37 ` Leon Romanovsky
@ 2026-03-05  9:34 ` Leon Romanovsky
  2 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2026-03-05  9:34 UTC (permalink / raw)
  To: linux-rdma, Kamal Heib; +Cc: Siva Reddy Kallam, Jason Gunthorpe

On Mon, 02 Mar 2026 23:36:45 -0500, Kamal Heib wrote:
> If the firmware version query fails, the driver currently ignores the
> error and continues initializing. This leaves the device in a bad state.
>
> Fix this by making bng_re_query_hwrm_version() return the error code and
> update the driver to check for this error and stop the setup process
> safely if it happens.
>
> [...]

Applied, thanks!

[1/1] RDMA/bng_re: Fix silent failure in HWRM version query
      https://git.kernel.org/rdma/rdma/c/c242e92c9da456

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-05  9:34 UTC | newest]

Thread overview: 6+ messages
2026-03-03  4:36 [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query Kamal Heib
2026-03-04  9:02 ` Siva Reddy Kallam
2026-03-04 15:37 ` Leon Romanovsky
2026-03-05  3:49   ` Kamal Heib
2026-03-05  9:32     ` Leon Romanovsky
2026-03-05  9:34 ` Leon Romanovsky