* [PATCH net-next 1/4] bnxt_en: Change FW message timeout warning
2025-04-15 17:48 [PATCH net-next 0/4] bnxt_en: Update for net-next Michael Chan
@ 2025-04-15 17:48 ` Michael Chan
2025-04-16 3:14 ` Jakub Kicinski
2025-04-15 17:48 ` [PATCH net-next 2/4] bnxt_en: Report the ethtool coredump length after copying the coredump Michael Chan
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Michael Chan @ 2025-04-15 17:48 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, kuba, pabeni, andrew+netdev, pavan.chebbi,
andrew.gospodarek, Kalesh AP
The firmware advertises a "hwrm_cmd_max_timeout" value to the driver
for NVRAM and coredump related functions that can take tens of seconds
to complete. The driver polls for the operation to complete under
mutex and may trigger hung task watchdog warning if the wait is too long.
To warn the user about this, the driver currently prints a warning if
this advertised value exceeds 40 seconds:
Device requests max timeout of %d seconds, may trigger hung task watchdog
The original hope was that newer FW would be able to reduce this timeout
below 40 seconds. But 60 seconds is the timeout on most production FW
and cannot be reduced further. Change the driver's warning threshold to
60 seconds to avoid triggering this warning on all production devices.
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.h
index 15ca51b5d204..fb5f5b063c3d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.h
@@ -58,7 +58,7 @@ void hwrm_update_token(struct bnxt *bp, u16 seq, enum bnxt_hwrm_wait_state s);
#define BNXT_HWRM_MAX_REQ_LEN (bp->hwrm_max_req_len)
#define BNXT_HWRM_SHORT_REQ_LEN sizeof(struct hwrm_short_input)
-#define HWRM_CMD_MAX_TIMEOUT 40000U
+#define HWRM_CMD_MAX_TIMEOUT 60000U
#define SHORT_HWRM_CMD_TIMEOUT 20
#define HWRM_CMD_TIMEOUT (bp->hwrm_cmd_timeout)
#define HWRM_RESET_TIMEOUT ((HWRM_CMD_TIMEOUT) * 4)
--
2.30.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH net-next 1/4] bnxt_en: Change FW message timeout warning
2025-04-15 17:48 ` [PATCH net-next 1/4] bnxt_en: Change FW message timeout warning Michael Chan
@ 2025-04-16 3:14 ` Jakub Kicinski
2025-04-16 21:41 ` Michael Chan
0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2025-04-16 3:14 UTC (permalink / raw)
To: Michael Chan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, pavan.chebbi,
andrew.gospodarek, Kalesh AP
On Tue, 15 Apr 2025 10:48:15 -0700 Michael Chan wrote:
> The firmware advertises a "hwrm_cmd_max_timeout" value to the driver
> for NVRAM and coredump related functions that can take tens of seconds
> to complete. The driver polls for the operation to complete under
> mutex and may trigger hung task watchdog warning if the wait is too long.
> To warn the user about this, the driver currently prints a warning if
> this advertised value exceeds 40 seconds:
>
> Device requests max timeout of %d seconds, may trigger hung task watchdog
>
> The original hope was that newer FW would be able to reduce this timeout
> below 40 seconds. But 60 seconds is the timeout on most production FW
> and cannot be reduced further. Change the driver's warning threshold to
> 60 seconds to avoid triggering this warning on all production devices.
>
> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
> drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.h
> index 15ca51b5d204..fb5f5b063c3d 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.h
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.h
> @@ -58,7 +58,7 @@ void hwrm_update_token(struct bnxt *bp, u16 seq, enum bnxt_hwrm_wait_state s);
>
> #define BNXT_HWRM_MAX_REQ_LEN (bp->hwrm_max_req_len)
> #define BNXT_HWRM_SHORT_REQ_LEN sizeof(struct hwrm_short_input)
> -#define HWRM_CMD_MAX_TIMEOUT 40000U
> +#define HWRM_CMD_MAX_TIMEOUT 60000U
> #define SHORT_HWRM_CMD_TIMEOUT 20
> #define HWRM_CMD_TIMEOUT (bp->hwrm_cmd_timeout)
> #define HWRM_RESET_TIMEOUT ((HWRM_CMD_TIMEOUT) * 4)
sysctl_hung_task_timeout_secs is an exported symbol, and it defaults to 120.
Should you not use it in the warning (assuming I understand the intent
there)?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 1/4] bnxt_en: Change FW message timeout warning
2025-04-16 3:14 ` Jakub Kicinski
@ 2025-04-16 21:41 ` Michael Chan
2025-04-16 21:44 ` Jakub Kicinski
0 siblings, 1 reply; 9+ messages in thread
From: Michael Chan @ 2025-04-16 21:41 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, pavan.chebbi,
andrew.gospodarek, Kalesh AP
[-- Attachment #1: Type: text/plain, Size: 629 bytes --]
On Tue, Apr 15, 2025 at 8:14 PM Jakub Kicinski <kuba@kernel.org> wrote:
> sysctl_hung_task_timeout_secs is an exported symbol, and it defaults to 120.
> Should you not use it in the warning (assuming I understand the intent
> there)?
Yes, we have considered that. This is only printed once at driver
load time, but the sysctl value can be changed at any time after the
driver is loaded. So we just want to use a reasonable value well
below the default sysctl_hung_task_timeout_secs value as the
threshold.
But we can reference and compare with the
sysctl_hung_task_timeout_secs value if that makes more sense.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4196 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 1/4] bnxt_en: Change FW message timeout warning
2025-04-16 21:41 ` Michael Chan
@ 2025-04-16 21:44 ` Jakub Kicinski
2025-04-16 22:07 ` Michael Chan
0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2025-04-16 21:44 UTC (permalink / raw)
To: Michael Chan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, pavan.chebbi,
andrew.gospodarek, Kalesh AP
On Wed, 16 Apr 2025 14:41:10 -0700 Michael Chan wrote:
> On Tue, Apr 15, 2025 at 8:14 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> > sysctl_hung_task_timeout_secs is an exported symbol, and it defaults to 120.
> > Should you not use it in the warning (assuming I understand the intent
> > there)?
> Yes, we have considered that. This is only printed once at driver
> load time, but the sysctl value can be changed at any time after the
> driver is loaded. So we just want to use a reasonable value well
> below the default sysctl_hung_task_timeout_secs value as the
> threshold.
>
> But we can reference and compare with the
> sysctl_hung_task_timeout_secs value if that makes more sense.
I see your point. We could also check against
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT ?
I noticed that some arches set this value really low (10 or 20 sec),
it may be worth warning the users in such cases.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 1/4] bnxt_en: Change FW message timeout warning
2025-04-16 21:44 ` Jakub Kicinski
@ 2025-04-16 22:07 ` Michael Chan
0 siblings, 0 replies; 9+ messages in thread
From: Michael Chan @ 2025-04-16 22:07 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, pavan.chebbi,
andrew.gospodarek, Kalesh AP
[-- Attachment #1: Type: text/plain, Size: 1122 bytes --]
On Wed, Apr 16, 2025 at 2:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 16 Apr 2025 14:41:10 -0700 Michael Chan wrote:
> > On Tue, Apr 15, 2025 at 8:14 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > > sysctl_hung_task_timeout_secs is an exported symbol, and it defaults to 120.
> > > Should you not use it in the warning (assuming I understand the intent
> > > there)?
> > Yes, we have considered that. This is only printed once at driver
> > load time, but the sysctl value can be changed at any time after the
> > driver is loaded. So we just want to use a reasonable value well
> > below the default sysctl_hung_task_timeout_secs value as the
> > threshold.
> >
> > But we can reference and compare with the
> > sysctl_hung_task_timeout_secs value if that makes more sense.
>
> I see your point. We could also check against
> CONFIG_DEFAULT_HUNG_TASK_TIMEOUT ?
>
> I noticed that some arches set this value really low (10 or 20 sec),
> it may be worth warning the users in such cases.
OK, that sounds good. I will check CONFIG_DEFAULT_HUNG_TASK_TIMEOUT
in v2. Thanks.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4196 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH net-next 2/4] bnxt_en: Report the ethtool coredump length after copying the coredump
2025-04-15 17:48 [PATCH net-next 0/4] bnxt_en: Update for net-next Michael Chan
2025-04-15 17:48 ` [PATCH net-next 1/4] bnxt_en: Change FW message timeout warning Michael Chan
@ 2025-04-15 17:48 ` Michael Chan
2025-04-15 17:48 ` [PATCH net-next 3/4] bnxt_en: Remove unused field "ref_count" in struct bnxt_ulp Michael Chan
2025-04-15 17:48 ` [PATCH net-next 4/4] bnxt_en: Remove unused macros in bnxt_ulp.h Michael Chan
3 siblings, 0 replies; 9+ messages in thread
From: Michael Chan @ 2025-04-15 17:48 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, kuba, pabeni, andrew+netdev, pavan.chebbi,
andrew.gospodarek, Shruti Parab, Damodharam Ammepalli
From: Shruti Parab <shruti.parab@broadcom.com>
ethtool first calls .get_dump_flags() to get the dump length. For
coredump, the driver calls the FW to get the coredump length (L1). The
min. of L1 and the user specified length is then passed to
.get_dump_data() (L2) to get the coredump. The actual coredump length
retrieved by the FW (L3) during .get_dump_data() may be smaller than L1.
This length discrepancy will trigger a WARN_ON() in
ethtool_get_dump_data().
ethtool has already vzalloc'ed a buffer with size L1. Just report
the coredump length as L2 even though the actual coredump length L3
may be smaller. The extra zero padding does not matter. This will
prevent the warning that may alarm the user.
For correctness, only do the final length update if there is no error.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_coredump.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_coredump.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_coredump.c
index 5576e7cf8463..9b6489e417fc 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_coredump.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_coredump.c
@@ -496,9 +496,16 @@ static int __bnxt_get_coredump(struct bnxt *bp, u16 dump_type, void *buf,
start_utc, coredump.total_segs + 1,
rc);
kfree(coredump.data);
- *dump_len += sizeof(struct bnxt_coredump_record);
- if (rc == -ENOBUFS)
+ if (!rc) {
+ *dump_len += sizeof(struct bnxt_coredump_record);
+ /* The actual coredump length can be smaller than the FW
+ * reported length earlier. Use the ethtool provided length.
+ */
+ if (buf_len)
+ *dump_len = buf_len;
+ } else if (rc == -ENOBUFS) {
netdev_err(bp->dev, "Firmware returned large coredump buffer\n");
+ }
return rc;
}
--
2.30.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH net-next 3/4] bnxt_en: Remove unused field "ref_count" in struct bnxt_ulp
2025-04-15 17:48 [PATCH net-next 0/4] bnxt_en: Update for net-next Michael Chan
2025-04-15 17:48 ` [PATCH net-next 1/4] bnxt_en: Change FW message timeout warning Michael Chan
2025-04-15 17:48 ` [PATCH net-next 2/4] bnxt_en: Report the ethtool coredump length after copying the coredump Michael Chan
@ 2025-04-15 17:48 ` Michael Chan
2025-04-15 17:48 ` [PATCH net-next 4/4] bnxt_en: Remove unused macros in bnxt_ulp.h Michael Chan
3 siblings, 0 replies; 9+ messages in thread
From: Michael Chan @ 2025-04-15 17:48 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, kuba, pabeni, andrew+netdev, pavan.chebbi,
andrew.gospodarek, Kalesh AP, Somnath Kotur
From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
The "ref_count" field in struct bnxt_ulp is unused after
commit a43c26fa2e6c ("RDMA/bnxt_re: Remove the sriov config callback").
So we can just remove it now.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 5 -----
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 1 -
2 files changed, 6 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index a8e930d5dbb0..238db9a1aebf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -148,7 +148,6 @@ void bnxt_unregister_dev(struct bnxt_en_dev *edev)
struct net_device *dev = edev->net;
struct bnxt *bp = netdev_priv(dev);
struct bnxt_ulp *ulp;
- int i = 0;
ulp = edev->ulp_tbl;
netdev_lock(dev);
@@ -164,10 +163,6 @@ void bnxt_unregister_dev(struct bnxt_en_dev *edev)
synchronize_rcu();
ulp->max_async_event_id = 0;
ulp->async_events_bmap = NULL;
- while (atomic_read(&ulp->ref_count) != 0 && i < 10) {
- msleep(100);
- i++;
- }
mutex_unlock(&edev->en_dev_lock);
netdev_unlock(dev);
return;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
index 7fa3b8d1ebd2..f6b5efb5e775 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
@@ -50,7 +50,6 @@ struct bnxt_ulp {
unsigned long *async_events_bmap;
u16 max_async_event_id;
u16 msix_requested;
- atomic_t ref_count;
};
struct bnxt_en_dev {
--
2.30.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH net-next 4/4] bnxt_en: Remove unused macros in bnxt_ulp.h
2025-04-15 17:48 [PATCH net-next 0/4] bnxt_en: Update for net-next Michael Chan
` (2 preceding siblings ...)
2025-04-15 17:48 ` [PATCH net-next 3/4] bnxt_en: Remove unused field "ref_count" in struct bnxt_ulp Michael Chan
@ 2025-04-15 17:48 ` Michael Chan
3 siblings, 0 replies; 9+ messages in thread
From: Michael Chan @ 2025-04-15 17:48 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, kuba, pabeni, andrew+netdev, pavan.chebbi,
andrew.gospodarek, Kalesh AP, Shruti Parab
From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
BNXT_ROCE_ULP and BNXT_MAX_ULP are no longer used. Remove them to
clean up the code.
Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
index f6b5efb5e775..7b9dd8ebe4bc 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
@@ -10,9 +10,6 @@
#ifndef BNXT_ULP_H
#define BNXT_ULP_H
-#define BNXT_ROCE_ULP 0
-#define BNXT_MAX_ULP 1
-
#define BNXT_MIN_ROCE_CP_RINGS 2
#define BNXT_MIN_ROCE_STAT_CTXS 1
--
2.30.1
^ permalink raw reply related [flat|nested] 9+ messages in thread