* [PATCH net 0/2] bnxt_en: Error recovery bug fixes.
@ 2021-02-26 9:43 Michael Chan
2021-02-26 9:43 ` [PATCH net 1/2] bnxt_en: Fix race between firmware reset and driver remove Michael Chan
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Michael Chan @ 2021-02-26 9:43 UTC (permalink / raw)
To: davem; +Cc: netdev, kuba, gospo
Two error recovery related bug fixes for 2 corner cases.
Please queue patch #2 for -stable. Thanks.
Edwin Peer (1):
  bnxt_en: reliably allocate IRQ table on reset to avoid crash

Vasundhara Volam (1):
  bnxt_en: Fix race between firmware reset and driver remove.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
--
2.18.1

* [PATCH net 1/2] bnxt_en: Fix race between firmware reset and driver remove.
@ 2021-02-26 9:43 ` Michael Chan
From: Michael Chan @ 2021-02-26 9:43 UTC (permalink / raw)
To: davem; +Cc: netdev, kuba, gospo

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

The driver's error recovery reset sequence can take many seconds to
complete and only the critical sections are protected by rtnl_lock.
A recent change has introduced a regression in this sequence.

bnxt_remove_one() may be called while the recovery is in progress.
Normally, unregister_netdev() would cause bnxt_close_nic() to be
called and this would cause the error recovery to safely abort
with the BNXT_STATE_ABORT_ERR flag set in bnxt_close_nic().

Recently, we added bnxt_reinit_after_abort() to allow the user to
reopen the device after an aborted recovery.  This causes the
regression in the scenario described above because we would
attempt to re-open even after the netdev has been unregistered.

Fix it by checking the netdev reg_state in
bnxt_reinit_after_abort() and abort if it is unregistered.

Fixes: 6882c36cf82e ("bnxt_en: attempt to reinitialize after aborted reset")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index a680fd9c68ea..c55189c7bb36 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -9890,6 +9890,9 @@ static int bnxt_reinit_after_abort(struct bnxt *bp)
 	if (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
 		return -EBUSY;
 
+	if (bp->dev->reg_state == NETREG_UNREGISTERED)
+		return -ENODEV;
+
 	rc = bnxt_fw_init_one(bp);
 	if (!rc) {
 		bnxt_clear_int_mode(bp);
-- 
2.18.1
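
For readers who want to see the effect of the new guard in isolation, here is a small user-space model. It is only a sketch and not driver code: `struct model_netdev`, `struct model_bnxt`, `model_reinit_after_abort()` and the `fw_init_calls` counter are hypothetical stand-ins invented for illustration. Only the order of the two checks is taken from the hunk in patch 1/2.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; all names are illustrative. */
enum model_reg_state { MODEL_NETREG_REGISTERED, MODEL_NETREG_UNREGISTERED };

struct model_netdev {
	enum model_reg_state reg_state;
};

struct model_bnxt {
	struct model_netdev *dev;
	bool in_fw_reset;   /* models the BNXT_STATE_IN_FW_RESET bit */
	int fw_init_calls;  /* counts how often re-initialization was attempted */
};

/* Models the order of checks in bnxt_reinit_after_abort() after the patch. */
int model_reinit_after_abort(struct model_bnxt *bp)
{
	assert(bp != NULL && bp->dev != NULL);

	if (bp->in_fw_reset)
		return -EBUSY;

	/* New check: never re-open a device that is already unregistered. */
	if (bp->dev->reg_state == MODEL_NETREG_UNREGISTERED)
		return -ENODEV;

	bp->fw_init_calls++;  /* stands in for bnxt_fw_init_one() and the rest */
	return 0;
}
```

In this model, a call on an unregistered device returns -ENODEV without incrementing `fw_init_calls`, which corresponds to the re-open attempt that the patch prevents.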

* [PATCH net 2/2] bnxt_en: reliably allocate IRQ table on reset to avoid crash
@ 2021-02-26 9:43 ` Michael Chan
From: Michael Chan @ 2021-02-26 9:43 UTC (permalink / raw)
To: davem; +Cc: netdev, kuba, gospo

From: Edwin Peer <edwin.peer@broadcom.com>

The following trace excerpt corresponds with a NULL pointer dereference
of 'bp->irq_tbl' in bnxt_setup_inta() on an Aarch64 system after many
device resets:

    Unable to handle kernel NULL pointer dereference at ... 000000d
    ...
    pc : string+0x3c/0x80
    lr : vsnprintf+0x294/0x7e0
    sp : ffff00000f61ba70 pstate : 20000145
    x29: ffff00000f61ba70 x28: 000000000000000d
    x27: ffff0000009c8b5a x26: ffff00000f61bb80
    x25: ffff0000009c8b5a x24: 0000000000000012
    x23: 00000000ffffffe0 x22: ffff000008990428
    x21: ffff00000f61bb80 x20: 000000000000000d
    x19: 000000000000001f x18: 0000000000000000
    x17: 0000000000000000 x16: ffff800b6d0fb400
    x15: 0000000000000000 x14: ffff800b7fe31ae8
    x13: 00001ed16472c920 x12: ffff000008c6b1c9
    x11: ffff000008cf0580 x10: ffff00000f61bb80
    x9 : 00000000ffffffd8 x8 : 000000000000000c
    x7 : ffff800b684b8000 x6 : 0000000000000000
    x5 : 0000000000000065 x4 : 0000000000000001
    x3 : ffff0a00ffffff04 x2 : 000000000000001f
    x1 : 0000000000000000 x0 : 000000000000000d
    Call trace:
    string+0x3c/0x80
    vsnprintf+0x294/0x7e0
    snprintf+0x44/0x50
    __bnxt_open_nic+0x34c/0x928 [bnxt_en]
    bnxt_open+0xe8/0x238 [bnxt_en]
    __dev_open+0xbc/0x130
    __dev_change_flags+0x12c/0x168
    dev_change_flags+0x20/0x60
    ...

Ordinarily, a call to bnxt_setup_inta() (not in trace due to inlining)
would not be expected on a system supporting MSIX at all.
However, if bnxt_init_int_mode() does not end up being called after the
call to bnxt_clear_int_mode() in bnxt_fw_reset_close(), then the driver
will think that only INTA is supported and bp->irq_tbl will be NULL,
causing the above crash.

In the error recovery scenario, we call bnxt_clear_int_mode() in
bnxt_fw_reset_close() early in the sequence.  Ordinarily, we will
call bnxt_init_int_mode() in bnxt_hwrm_if_change() after we
reestablish communication with the firmware after reset.  However,
if the sequence has to abort before we call bnxt_init_int_mode() and
if the user later attempts to re-open the device, then it will cause
the crash above.

We fix it in 2 ways:

1. Check for bp->irq_tbl in bnxt_setup_int_mode().  If it is NULL, call
   bnxt_init_int_mode().

2. If we need to abort in bnxt_hwrm_if_change() and cannot complete
   the error recovery sequence, set the BNXT_STATE_ABORT_ERR flag.  This
   will cause more drastic recovery at the next attempt to re-open the
   device, including a call to bnxt_init_int_mode().

Fixes: 3bc7d4a352ef ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index c55189c7bb36..b53a0d87371a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8556,10 +8556,18 @@ static void bnxt_setup_inta(struct bnxt *bp)
 	bp->irq_tbl[0].handler = bnxt_inta;
 }
 
+static int bnxt_init_int_mode(struct bnxt *bp);
+
 static int bnxt_setup_int_mode(struct bnxt *bp)
 {
 	int rc;
 
+	if (!bp->irq_tbl) {
+		rc = bnxt_init_int_mode(bp);
+		if (rc || !bp->irq_tbl)
+			return rc ?: -ENODEV;
+	}
+
 	if (bp->flags & BNXT_FLAG_USING_MSIX)
 		bnxt_setup_msix(bp);
 	else
@@ -8744,7 +8752,7 @@ static int bnxt_init_inta(struct bnxt *bp)
 
 static int bnxt_init_int_mode(struct bnxt *bp)
 {
-	int rc = 0;
+	int rc = -ENODEV;
 
 	if (bp->flags & BNXT_FLAG_MSIX_CAP)
 		rc = bnxt_init_msix(bp);
@@ -9514,7 +9522,8 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
 {
 	struct hwrm_func_drv_if_change_output *resp = bp->hwrm_cmd_resp_addr;
 	struct hwrm_func_drv_if_change_input req = {0};
-	bool resc_reinit = false, fw_reset = false;
+	bool fw_reset = !bp->irq_tbl;
+	bool resc_reinit = false;
 	int rc, retry = 0;
 	u32 flags = 0;
 
@@ -9557,6 +9566,7 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
 
 	if (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state) && !fw_reset) {
 		netdev_err(bp->dev, "RESET_DONE not set during FW reset.\n");
+		set_bit(BNXT_STATE_ABORT_ERR, &bp->state);
 		return -ENODEV;
 	}
 	if (resc_reinit || fw_reset) {
-- 
2.18.1
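
The first part of the fix in patch 2/2 (allocate the IRQ table on demand, and treat "nothing was set up" as an error) can be sketched as a user-space model. This is a simplified illustration under stated assumptions, not the driver's code: `struct model_dev`, `MODEL_NR_IRQS`, `model_init_int_mode()` and `model_setup_int_mode()` are hypothetical names, the MSIX path is reduced to a single `calloc()`, and the INTA fallback is left out. The GNU `rc ?: -ENODEV` shorthand from the patch is spelled out as `rc ? rc : -ENODEV` so that the sketch stays within standard C.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_IRQS 4  /* arbitrary table size for the model */

struct model_irq {
	int vector;
};

/* Hypothetical stand-in for struct bnxt; only the fields the model needs. */
struct model_dev {
	struct model_irq *irq_tbl;  /* NULL after a clear_int_mode-style teardown */
	bool msix_cap;              /* models BNXT_FLAG_MSIX_CAP */
};

/* Simplified model of bnxt_init_int_mode(): allocate the table if possible. */
int model_init_int_mode(struct model_dev *bp)
{
	int rc = -ENODEV;  /* patched default: setting up nothing is an error */

	assert(bp != NULL);
	if (bp->msix_cap) {
		bp->irq_tbl = calloc(MODEL_NR_IRQS, sizeof(*bp->irq_tbl));
		rc = bp->irq_tbl ? 0 : -ENOMEM;
	}
	return rc;
}

/* Models the guard added at the top of bnxt_setup_int_mode(). */
int model_setup_int_mode(struct model_dev *bp)
{
	int rc;

	assert(bp != NULL);
	if (!bp->irq_tbl) {
		rc = model_init_int_mode(bp);
		if (rc || !bp->irq_tbl)
			return rc ? rc : -ENODEV;  /* portable form of rc ?: -ENODEV */
	}

	bp->irq_tbl[0].vector = 0;  /* safe: the table is known to be non-NULL */
	return 0;
}
```

With the old default of `rc = 0`, a device in this model with `msix_cap` false would report success from `model_init_int_mode()` while leaving `irq_tbl` NULL. The extra `!bp->irq_tbl` test in the caller covers that case as well, so the dereference is reached only with a valid table.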

* Re: [PATCH net 0/2] bnxt_en: Error recovery bug fixes.
@ 2021-02-27 0:00 ` patchwork-bot+netdevbpf
From: patchwork-bot+netdevbpf @ 2021-02-27 0:00 UTC (permalink / raw)
To: Michael Chan; +Cc: davem, netdev, kuba, gospo

Hello:

This series was applied to netdev/net.git (refs/heads/master):

On Fri, 26 Feb 2021 04:43:08 -0500 you wrote:
> Two error recovery related bug fixes for 2 corner cases.
>
> Please queue patch #2 for -stable. Thanks.
>
> Edwin Peer (1):
>   bnxt_en: reliably allocate IRQ table on reset to avoid crash
>
> [...]

Here is the summary with links:
  - [net,1/2] bnxt_en: Fix race between firmware reset and driver remove.
    https://git.kernel.org/netdev/net/c/d20cd745218c
  - [net,2/2] bnxt_en: reliably allocate IRQ table on reset to avoid crash
    https://git.kernel.org/netdev/net/c/20d7d1c5c9b1

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html