* [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
From: Kamal Heib @ 2026-03-03 4:36 UTC
To: linux-rdma
Cc: Siva Reddy Kallam, Jason Gunthorpe, Leon Romanovsky, Kamal Heib
If the firmware version query fails, the driver currently ignores the
error and continues initializing. This leaves the device in a bad state.
Fix this by making bng_re_query_hwrm_version() return the error code and
update the driver to check for this error and stop the setup process
safely if it happens.
Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver")
Signed-off-by: Kamal Heib <kheib@redhat.com>
---
drivers/infiniband/hw/bng_re/bng_dev.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/bng_re/bng_dev.c b/drivers/infiniband/hw/bng_re/bng_dev.c
index d34b5f88cd40..17147175a9b0 100644
--- a/drivers/infiniband/hw/bng_re/bng_dev.c
+++ b/drivers/infiniband/hw/bng_re/bng_dev.c
@@ -210,7 +210,7 @@ static int bng_re_stats_ctx_alloc(struct bng_re_dev *rdev)
return rc;
}
-static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
+static int bng_re_query_hwrm_version(struct bng_re_dev *rdev)
{
struct bnge_auxr_dev *aux_dev = rdev->aux_dev;
struct hwrm_ver_get_output ver_get_resp = {};
@@ -230,7 +230,7 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
if (rc) {
ibdev_err(&rdev->ibdev, "Failed to query HW version, rc = 0x%x",
rc);
- return;
+ return rc;
}
cctx = rdev->chip_ctx;
@@ -244,6 +244,8 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
if (!cctx->hwrm_cmd_max_timeout)
cctx->hwrm_cmd_max_timeout = BNG_ROCE_FW_MAX_TIMEOUT;
+
+ return 0;
}
static void bng_re_dev_uninit(struct bng_re_dev *rdev)
@@ -306,7 +308,9 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
goto msix_ctx_fail;
}
- bng_re_query_hwrm_version(rdev);
+ rc = bng_re_query_hwrm_version(rdev);
+ if (rc)
+ goto query_hwrm_ver_fail;
rc = bng_re_alloc_fw_channel(&rdev->bng_res, &rdev->rcfw);
if (rc) {
@@ -392,6 +396,7 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
nq_alloc_fail:
bng_re_free_rcfw_channel(&rdev->rcfw);
alloc_fw_chl_fail:
+query_hwrm_ver_fail:
bng_re_destroy_chip_ctx(rdev);
msix_ctx_fail:
bnge_unregister_dev(rdev->aux_dev);
--
2.52.0
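The diff above adds one more label to the driver's existing goto-based unwind ladder. As a rough userspace sketch of that pattern (every name here, such as `fake_dev` and `fake_dev_init`, is hypothetical and only mirrors the ordering visible in the diff, not the real bng_re code), a failed version query jumps to a label that releases only what was set up before it:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; not the real struct bng_re_dev. */
struct fake_dev {
	bool registered;         /* models the registration with the bnge driver */
	bool chip_ctx;           /* models the chip context allocation */
	bool fw_channel;         /* models the firmware channel allocation */
	bool fail_version_query; /* knob used to simulate a failing query */
};

/* Returns 0 on success or a negative errno, like the patched helper. */
static int fake_query_version(const struct fake_dev *d)
{
	return d->fail_version_query ? -EIO : 0;
}

static int fake_dev_init(struct fake_dev *d)
{
	int rc;

	assert(d != NULL);

	d->registered = true;
	d->chip_ctx = true;

	/* The return value is now checked instead of being dropped. */
	rc = fake_query_version(d);
	if (rc)
		goto query_ver_fail;

	d->fw_channel = true;
	return 0;

query_ver_fail:
	/* Undo, in reverse order, only the steps that completed. */
	d->chip_ctx = false;
	d->registered = false;
	return rc;
}
```

Calling `fake_dev_init()` with `fail_version_query` set returns `-EIO` and leaves all three flags cleared, which is the "stop the setup process safely" behaviour the commit message describes.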
* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
From: Siva Reddy Kallam @ 2026-03-04 9:02 UTC
To: Kamal Heib; +Cc: linux-rdma, Jason Gunthorpe, Leon Romanovsky
On Tue, Mar 3, 2026 at 10:06 AM Kamal Heib <kheib@redhat.com> wrote:
>
> If the firmware version query fails, the driver currently ignores the
> error and continues initializing. This leaves the device in a bad state.
>
> Fix this by making bng_re_query_hwrm_version() return the error code and
> update the driver to check for this error and stop the setup process
> safely if it happens.
>
> Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver")
> Signed-off-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
> [...]
* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
From: Leon Romanovsky @ 2026-03-04 15:37 UTC
To: Kamal Heib; +Cc: linux-rdma, Siva Reddy Kallam, Jason Gunthorpe
On Mon, Mar 02, 2026 at 11:36:45PM -0500, Kamal Heib wrote:
> If the firmware version query fails, the driver currently ignores the
> error and continues initializing. This leaves the device in a bad state.
Can you please elaborate on what it will cause?
Thanks
>
> [...]
* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
From: Kamal Heib @ 2026-03-05 3:49 UTC
To: Leon Romanovsky; +Cc: linux-rdma, Siva Reddy Kallam, Jason Gunthorpe
On Wed, Mar 04, 2026 at 05:37:07PM +0200, Leon Romanovsky wrote:
> On Mon, Mar 02, 2026 at 11:36:45PM -0500, Kamal Heib wrote:
> > If the firmware version query fails, the driver currently ignores the
> > error and continues initializing. This leaves the device in a bad state.
>
> Can you please elaborate on what it will cause?
>
> Thanks
>
If bng_re_query_hwrm_version() fails, the code returns early and leaves
cctx->hwrm_cmd_max_timeout uninitialized. This parameter is subsequently
assigned to rcfw->max_timeout, which is used by __wait_for_resp(). Later,
when the driver sends firmware commands and enters __wait_for_resp(), it
passes a zero timeout to the commands being sent, which can lead to a
lockup.
Also, cctx->hwrm_intf_ver is left uninitialized, which will likely
be used in the future to determine if a specific feature is supported
or not (like how it is done in bnxt_re).
Thanks,
Kamal
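The early-return behaviour described in the explanation above can be sketched in isolation. This is a simplified userspace model, not the driver code: `fake_chip_ctx`, `fake_query_timeout` and `FAKE_FW_MAX_TIMEOUT` are hypothetical names, the value 60 is a placeholder rather than the real `BNG_ROCE_FW_MAX_TIMEOUT`, and it assumes the timeout is normally taken from the firmware response before the zero check shown in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder default; the real BNG_ROCE_FW_MAX_TIMEOUT value is not shown in the thread. */
#define FAKE_FW_MAX_TIMEOUT 60

/* Hypothetical stand-in for the chip context holding the command timeout. */
struct fake_chip_ctx {
	unsigned int hwrm_cmd_max_timeout;
};

/*
 * fw_rc models the result of the firmware query and fw_timeout the value
 * the firmware would report (assumption: that is where the field comes from).
 */
static int fake_query_timeout(struct fake_chip_ctx *cctx, int fw_rc,
			      unsigned int fw_timeout)
{
	assert(cctx != NULL);

	if (fw_rc)
		return fw_rc; /* early return: the timeout field is never written */

	cctx->hwrm_cmd_max_timeout = fw_timeout;

	/* Fallback from the diff: a zero value is replaced by the default. */
	if (!cctx->hwrm_cmd_max_timeout)
		cctx->hwrm_cmd_max_timeout = FAKE_FW_MAX_TIMEOUT;

	return 0;
}
```

In this model a failed query leaves a zero-initialized context with a timeout of 0, because the fallback assignment sits after the early return. Propagating the error lets the caller abort before that value is ever consumed.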
> > [...]
* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
From: Leon Romanovsky @ 2026-03-05 9:32 UTC
To: Kamal Heib; +Cc: linux-rdma, Siva Reddy Kallam, Jason Gunthorpe
On Wed, Mar 04, 2026 at 10:49:40PM -0500, Kamal Heib wrote:
> On Wed, Mar 04, 2026 at 05:37:07PM +0200, Leon Romanovsky wrote:
> > On Mon, Mar 02, 2026 at 11:36:45PM -0500, Kamal Heib wrote:
> > > If the firmware version query fails, the driver currently ignores the
> > > error and continues initializing. This leaves the device in a bad state.
> >
> > Can you please elaborate on what it will cause?
> >
> > Thanks
> >
>
> If bng_re_query_hwrm_version() fails, the code returns early and leaves
> cctx->hwrm_cmd_max_timeout uninitialized. This parameter is subsequently
> assigned to rcfw->max_timeout, which is used by __wait_for_resp(). Later,
> when the driver sends firmware commands and enters __wait_for_resp(), it
> passes a zero timeout to the commands being sent, which can lead to a
> lockup.
>
> Also, cctx->hwrm_intf_ver is left uninitialized, which will likely
> be used in the future to determine if a specific feature is supported
> or not (like how it is done in bnxt_re).
I'm not concerned about these flows. If something as fundamental as querying
the HW version fails, it's likely that nothing else will behave correctly.
Let's apply this patch.
Thanks
* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
From: Leon Romanovsky @ 2026-03-05 9:34 UTC
To: linux-rdma, Kamal Heib; +Cc: Siva Reddy Kallam, Jason Gunthorpe
On Mon, 02 Mar 2026 23:36:45 -0500, Kamal Heib wrote:
> If the firmware version query fails, the driver currently ignores the
> error and continues initializing. This leaves the device in a bad state.
>
> Fix this by making bng_re_query_hwrm_version() return the error code and
> update the driver to check for this error and stop the setup process
> safely if it happens.
>
> [...]
Applied, thanks!
[1/1] RDMA/bng_re: Fix silent failure in HWRM version query
https://git.kernel.org/rdma/rdma/c/c242e92c9da456
Best regards,
--
Leon Romanovsky <leon@kernel.org>