public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
@ 2026-03-03  4:36 Kamal Heib
  2026-03-04  9:02 ` Siva Reddy Kallam
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Kamal Heib @ 2026-03-03  4:36 UTC (permalink / raw)
  To: linux-rdma
  Cc: Siva Reddy Kallam, Jason Gunthorpe, Leon Romanovsky, Kamal Heib

If the firmware version query fails, the driver currently ignores the
error and continues initializing. This leaves the device in a bad state.

Fix this by making bng_re_query_hwrm_version() return the error code and
update the driver to check for this error and stop the setup process
safely if it happens.

Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver")
Signed-off-by: Kamal Heib <kheib@redhat.com>
---
 drivers/infiniband/hw/bng_re/bng_dev.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/bng_re/bng_dev.c b/drivers/infiniband/hw/bng_re/bng_dev.c
index d34b5f88cd40..17147175a9b0 100644
--- a/drivers/infiniband/hw/bng_re/bng_dev.c
+++ b/drivers/infiniband/hw/bng_re/bng_dev.c
@@ -210,7 +210,7 @@ static int bng_re_stats_ctx_alloc(struct bng_re_dev *rdev)
 	return rc;
 }
 
-static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
+static int bng_re_query_hwrm_version(struct bng_re_dev *rdev)
 {
 	struct bnge_auxr_dev *aux_dev = rdev->aux_dev;
 	struct hwrm_ver_get_output ver_get_resp = {};
@@ -230,7 +230,7 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
 	if (rc) {
 		ibdev_err(&rdev->ibdev, "Failed to query HW version, rc = 0x%x",
 			  rc);
-		return;
+		return rc;
 	}
 
 	cctx = rdev->chip_ctx;
@@ -244,6 +244,8 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
 
 	if (!cctx->hwrm_cmd_max_timeout)
 		cctx->hwrm_cmd_max_timeout = BNG_ROCE_FW_MAX_TIMEOUT;
+
+	return 0;
 }
 
 static void bng_re_dev_uninit(struct bng_re_dev *rdev)
@@ -306,7 +308,9 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
 		goto msix_ctx_fail;
 	}
 
-	bng_re_query_hwrm_version(rdev);
+	rc = bng_re_query_hwrm_version(rdev);
+	if (rc)
+		goto query_hwrm_ver_fail;
 
 	rc = bng_re_alloc_fw_channel(&rdev->bng_res, &rdev->rcfw);
 	if (rc) {
@@ -392,6 +396,7 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
 nq_alloc_fail:
 	bng_re_free_rcfw_channel(&rdev->rcfw);
 alloc_fw_chl_fail:
+query_hwrm_ver_fail:
 	bng_re_destroy_chip_ctx(rdev);
 msix_ctx_fail:
 	bnge_unregister_dev(rdev->aux_dev);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
  2026-03-03  4:36 [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query Kamal Heib
@ 2026-03-04  9:02 ` Siva Reddy Kallam
  2026-03-04 15:37 ` Leon Romanovsky
  2026-03-05  9:34 ` Leon Romanovsky
  2 siblings, 0 replies; 6+ messages in thread
From: Siva Reddy Kallam @ 2026-03-04  9:02 UTC (permalink / raw)
  To: Kamal Heib; +Cc: linux-rdma, Jason Gunthorpe, Leon Romanovsky

[-- Attachment #1: Type: text/plain, Size: 2654 bytes --]

On Tue, Mar 3, 2026 at 10:06 AM Kamal Heib <kheib@redhat.com> wrote:
>
> If the firmware version query fails, the driver currently ignores the
> error and continues initializing. This leaves the device in a bad state.
>
> Fix this by making bng_re_query_hwrm_version() return the error code and
> update the driver to check for this error and stop the setup process
> safely if it happens.
>
> Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver")
> Signed-off-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
> ---
>  drivers/infiniband/hw/bng_re/bng_dev.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/hw/bng_re/bng_dev.c b/drivers/infiniband/hw/bng_re/bng_dev.c
> index d34b5f88cd40..17147175a9b0 100644
> --- a/drivers/infiniband/hw/bng_re/bng_dev.c
> +++ b/drivers/infiniband/hw/bng_re/bng_dev.c
> @@ -210,7 +210,7 @@ static int bng_re_stats_ctx_alloc(struct bng_re_dev *rdev)
>         return rc;
>  }
>
> -static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> +static int bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>  {
>         struct bnge_auxr_dev *aux_dev = rdev->aux_dev;
>         struct hwrm_ver_get_output ver_get_resp = {};
> @@ -230,7 +230,7 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>         if (rc) {
>                 ibdev_err(&rdev->ibdev, "Failed to query HW version, rc = 0x%x",
>                           rc);
> -               return;
> +               return rc;
>         }
>
>         cctx = rdev->chip_ctx;
> @@ -244,6 +244,8 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>
>         if (!cctx->hwrm_cmd_max_timeout)
>                 cctx->hwrm_cmd_max_timeout = BNG_ROCE_FW_MAX_TIMEOUT;
> +
> +       return 0;
>  }
>
>  static void bng_re_dev_uninit(struct bng_re_dev *rdev)
> @@ -306,7 +308,9 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
>                 goto msix_ctx_fail;
>         }
>
> -       bng_re_query_hwrm_version(rdev);
> +       rc = bng_re_query_hwrm_version(rdev);
> +       if (rc)
> +               goto query_hwrm_ver_fail;
>
>         rc = bng_re_alloc_fw_channel(&rdev->bng_res, &rdev->rcfw);
>         if (rc) {
> @@ -392,6 +396,7 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
>  nq_alloc_fail:
>         bng_re_free_rcfw_channel(&rdev->rcfw);
>  alloc_fw_chl_fail:
> +query_hwrm_ver_fail:
>         bng_re_destroy_chip_ctx(rdev);
>  msix_ctx_fail:
>         bnge_unregister_dev(rdev->aux_dev);
> --
> 2.52.0
>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5471 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
  2026-03-03  4:36 [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query Kamal Heib
  2026-03-04  9:02 ` Siva Reddy Kallam
@ 2026-03-04 15:37 ` Leon Romanovsky
  2026-03-05  3:49   ` Kamal Heib
  2026-03-05  9:34 ` Leon Romanovsky
  2 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2026-03-04 15:37 UTC (permalink / raw)
  To: Kamal Heib; +Cc: linux-rdma, Siva Reddy Kallam, Jason Gunthorpe

On Mon, Mar 02, 2026 at 11:36:45PM -0500, Kamal Heib wrote:
> If the firmware version query fails, the driver currently ignores the
> error and continues initializing. This leaves the device in a bad state.

Can you please elaborate what will it cause?

Thanks

> 
> Fix this by making bng_re_query_hwrm_version() return the error code and
> update the driver to check for this error and stop the setup process
> safely if it happens.
> 
> Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver")
> Signed-off-by: Kamal Heib <kheib@redhat.com>
> ---
>  drivers/infiniband/hw/bng_re/bng_dev.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/bng_re/bng_dev.c b/drivers/infiniband/hw/bng_re/bng_dev.c
> index d34b5f88cd40..17147175a9b0 100644
> --- a/drivers/infiniband/hw/bng_re/bng_dev.c
> +++ b/drivers/infiniband/hw/bng_re/bng_dev.c
> @@ -210,7 +210,7 @@ static int bng_re_stats_ctx_alloc(struct bng_re_dev *rdev)
>  	return rc;
>  }
>  
> -static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> +static int bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>  {
>  	struct bnge_auxr_dev *aux_dev = rdev->aux_dev;
>  	struct hwrm_ver_get_output ver_get_resp = {};
> @@ -230,7 +230,7 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>  	if (rc) {
>  		ibdev_err(&rdev->ibdev, "Failed to query HW version, rc = 0x%x",
>  			  rc);
> -		return;
> +		return rc;
>  	}
>  
>  	cctx = rdev->chip_ctx;
> @@ -244,6 +244,8 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
>  
>  	if (!cctx->hwrm_cmd_max_timeout)
>  		cctx->hwrm_cmd_max_timeout = BNG_ROCE_FW_MAX_TIMEOUT;
> +
> +	return 0;
>  }
>  
>  static void bng_re_dev_uninit(struct bng_re_dev *rdev)
> @@ -306,7 +308,9 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
>  		goto msix_ctx_fail;
>  	}
>  
> -	bng_re_query_hwrm_version(rdev);
> +	rc = bng_re_query_hwrm_version(rdev);
> +	if (rc)
> +		goto query_hwrm_ver_fail;
>  
>  	rc = bng_re_alloc_fw_channel(&rdev->bng_res, &rdev->rcfw);
>  	if (rc) {
> @@ -392,6 +396,7 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
>  nq_alloc_fail:
>  	bng_re_free_rcfw_channel(&rdev->rcfw);
>  alloc_fw_chl_fail:
> +query_hwrm_ver_fail:
>  	bng_re_destroy_chip_ctx(rdev);
>  msix_ctx_fail:
>  	bnge_unregister_dev(rdev->aux_dev);
> -- 
> 2.52.0
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
  2026-03-04 15:37 ` Leon Romanovsky
@ 2026-03-05  3:49   ` Kamal Heib
  2026-03-05  9:32     ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: Kamal Heib @ 2026-03-05  3:49 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma, Siva Reddy Kallam, Jason Gunthorpe

On Wed, Mar 04, 2026 at 05:37:07PM +0200, Leon Romanovsky wrote:
> On Mon, Mar 02, 2026 at 11:36:45PM -0500, Kamal Heib wrote:
> > If the firmware version query fails, the driver currently ignores the
> > error and continues initializing. This leaves the device in a bad state.
> 
> Can you please elaborate what will it cause?
> 
> Thanks
>

If bng_re_query_hwrm_version() fails, the code returns early and leaves
cctx->hwrm_cmd_max_timeout uninitialized. This parameter is subsequently
assigned to rcfw->max_timeout, which is used by __wait_for_resp(). Later,
when the driver sends firmware commands and enters __wait_for_resp(), it
passes a zero timeout to the commands being sent, which can lead to a
lockup.

Also, cctx->hwrm_intf_ver is left uninitialized, which will likely
be used in the future to determine if a specific feature is supported
or not (like how it is done in bnxt_re).

Thanks,
Kamal
> > 
> > Fix this by making bng_re_query_hwrm_version() return the error code and
> > update the driver to check for this error and stop the setup process
> > safely if it happens.
> > 
> > Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver")
> > Signed-off-by: Kamal Heib <kheib@redhat.com>
> > ---
> >  drivers/infiniband/hw/bng_re/bng_dev.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/infiniband/hw/bng_re/bng_dev.c b/drivers/infiniband/hw/bng_re/bng_dev.c
> > index d34b5f88cd40..17147175a9b0 100644
> > --- a/drivers/infiniband/hw/bng_re/bng_dev.c
> > +++ b/drivers/infiniband/hw/bng_re/bng_dev.c
> > @@ -210,7 +210,7 @@ static int bng_re_stats_ctx_alloc(struct bng_re_dev *rdev)
> >  	return rc;
> >  }
> >  
> > -static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> > +static int bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> >  {
> >  	struct bnge_auxr_dev *aux_dev = rdev->aux_dev;
> >  	struct hwrm_ver_get_output ver_get_resp = {};
> > @@ -230,7 +230,7 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> >  	if (rc) {
> >  		ibdev_err(&rdev->ibdev, "Failed to query HW version, rc = 0x%x",
> >  			  rc);
> > -		return;
> > +		return rc;
> >  	}
> >  
> >  	cctx = rdev->chip_ctx;
> > @@ -244,6 +244,8 @@ static void bng_re_query_hwrm_version(struct bng_re_dev *rdev)
> >  
> >  	if (!cctx->hwrm_cmd_max_timeout)
> >  		cctx->hwrm_cmd_max_timeout = BNG_ROCE_FW_MAX_TIMEOUT;
> > +
> > +	return 0;
> >  }
> >  
> >  static void bng_re_dev_uninit(struct bng_re_dev *rdev)
> > @@ -306,7 +308,9 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
> >  		goto msix_ctx_fail;
> >  	}
> >  
> > -	bng_re_query_hwrm_version(rdev);
> > +	rc = bng_re_query_hwrm_version(rdev);
> > +	if (rc)
> > +		goto query_hwrm_ver_fail;
> >  
> >  	rc = bng_re_alloc_fw_channel(&rdev->bng_res, &rdev->rcfw);
> >  	if (rc) {
> > @@ -392,6 +396,7 @@ static int bng_re_dev_init(struct bng_re_dev *rdev)
> >  nq_alloc_fail:
> >  	bng_re_free_rcfw_channel(&rdev->rcfw);
> >  alloc_fw_chl_fail:
> > +query_hwrm_ver_fail:
> >  	bng_re_destroy_chip_ctx(rdev);
> >  msix_ctx_fail:
> >  	bnge_unregister_dev(rdev->aux_dev);
> > -- 
> > 2.52.0
> > 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
  2026-03-05  3:49   ` Kamal Heib
@ 2026-03-05  9:32     ` Leon Romanovsky
  0 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2026-03-05  9:32 UTC (permalink / raw)
  To: Kamal Heib; +Cc: linux-rdma, Siva Reddy Kallam, Jason Gunthorpe

On Wed, Mar 04, 2026 at 10:49:40PM -0500, Kamal Heib wrote:
> On Wed, Mar 04, 2026 at 05:37:07PM +0200, Leon Romanovsky wrote:
> > On Mon, Mar 02, 2026 at 11:36:45PM -0500, Kamal Heib wrote:
> > > If the firmware version query fails, the driver currently ignores the
> > > error and continues initializing. This leaves the device in a bad state.
> > 
> > Can you please elaborate what will it cause?
> > 
> > Thanks
> >
> 
> If bng_re_query_hwrm_version() fails, the code returns early and leaves
> cctx->hwrm_cmd_max_timeout uninitialized. This parameter is subsequently
> assigned to rcfw->max_timeout, which is used by __wait_for_resp(). Later,
> when the driver sends firmware commands and enters __wait_for_resp(), it
> passes a zero timeout to the commands being sent, which can lead to a
> lockup.
> 
> Also, cctx->hwrm_intf_ver is left uninitialized, which will likely
> be used in the future to determine if a specific feature is supported
> or not (like how it is done in bnxt_re).

I'm not concerned about these flows. If something as fundamental as querying
the HW version fails, it's likely that nothing else will behave correctly.

Let's apply this patch.

Thanks

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query
  2026-03-03  4:36 [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query Kamal Heib
  2026-03-04  9:02 ` Siva Reddy Kallam
  2026-03-04 15:37 ` Leon Romanovsky
@ 2026-03-05  9:34 ` Leon Romanovsky
  2 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2026-03-05  9:34 UTC (permalink / raw)
  To: linux-rdma, Kamal Heib; +Cc: Siva Reddy Kallam, Jason Gunthorpe


On Mon, 02 Mar 2026 23:36:45 -0500, Kamal Heib wrote:
> If the firmware version query fails, the driver currently ignores the
> error and continues initializing. This leaves the device in a bad state.
> 
> Fix this by making bng_re_query_hwrm_version() return the error code and
> update the driver to check for this error and stop the setup process
> safely if it happens.
> 
> [...]

Applied, thanks!

[1/1] RDMA/bng_re: Fix silent failure in HWRM version query
      https://git.kernel.org/rdma/rdma/c/c242e92c9da456

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-05  9:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-03  4:36 [PATCH for-rc] RDMA/bng_re: Fix silent failure in HWRM version query Kamal Heib
2026-03-04  9:02 ` Siva Reddy Kallam
2026-03-04 15:37 ` Leon Romanovsky
2026-03-05  3:49   ` Kamal Heib
2026-03-05  9:32     ` Leon Romanovsky
2026-03-05  9:34 ` Leon Romanovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox