* Re: [Patch v6 0/4] bnx2x: Fix error recovering in switch configuration
@ 2023-11-16 16:08 Thinh Tran
2023-11-16 20:56 ` Simon Horman
2024-01-17 21:56 ` Thinh Tran
0 siblings, 2 replies; 6+ messages in thread
From: Thinh Tran @ 2023-11-16 16:08 UTC (permalink / raw)
To: Jakub Kicinski
Cc: aelior, davem, edumazet, manishc, netdev, pabeni, skalluru,
VENKATA.SAI.DUGGI, Thinh Tran, Abdul Haleem, David Christensen,
Simon Horman
Hi,
Could we proceed with advancing these patches? They've been in the
"Awaiting Upstream" state for a while now. Notably, one of them has
successfully made it to the mainline kernel:
[v6,1/4] bnx2x: new flag for tracking HW resource
https://github.com/torvalds/linux/commit/bf23ffc8a9a777dfdeb04232e0946b803adbb6a9
As testing the latest kernel, we are still encountering crashes due to
the absence of one of the patches:
[v6,3/4] bnx2x: Prevent access to a freed page in page_pool.
Is there anything specific I need to do to help moving these patches
forward?
We would greatly appreciate if they could be incorporated into the
mainline kernel.
Thank you,
Thinh Tran
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Patch v6 0/4] bnx2x: Fix error recovering in switch configuration
2023-11-16 16:08 [Patch v6 0/4] bnx2x: Fix error recovering in switch configuration Thinh Tran
@ 2023-11-16 20:56 ` Simon Horman
2024-01-17 21:56 ` Thinh Tran
1 sibling, 0 replies; 6+ messages in thread
From: Simon Horman @ 2023-11-16 20:56 UTC (permalink / raw)
To: Thinh Tran
Cc: Jakub Kicinski, aelior, davem, edumazet, manishc, netdev, pabeni,
skalluru, VENKATA.SAI.DUGGI, Abdul Haleem, David Christensen
On Thu, Nov 16, 2023 at 10:08:34AM -0600, Thinh Tran wrote:
> Hi,
>
> Could we proceed with advancing these patches? They've been in the
> "Awaiting Upstream" state for a while now. Notably, one of them has
> successfully made it to the mainline kernel:
> [v6,1/4] bnx2x: new flag for tracking HW resource
>
> https://github.com/torvalds/linux/commit/bf23ffc8a9a777dfdeb04232e0946b803adbb6a9
>
> As testing the latest kernel, we are still encountering crashes due to
> the absence of one of the patches:
> [v6,3/4] bnx2x: Prevent access to a freed page in page_pool.
>
> Is there anything specific I need to do to help moving these patches
> forward?
> We would greatly appreciate if they could be incorporated into the
> mainline kernel.
Hi Thinh Tran,
I'd suggest that the best way to move this forwards would
be to rebase the remaining patches on net-next and posting them as a v7.
It would be useful to include the information above in the cover letter.
And to annotate that they are targeting net-next in the subject of
each patch and the cover letter.
Subject: [PATCH net-next v8 x/3] ...
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Patch v6 0/4] bnx2x: Fix error recovering in switch configuration
2023-11-16 16:08 [Patch v6 0/4] bnx2x: Fix error recovering in switch configuration Thinh Tran
2023-11-16 20:56 ` Simon Horman
@ 2024-01-17 21:56 ` Thinh Tran
2024-01-17 23:55 ` Jakub Kicinski
1 sibling, 1 reply; 6+ messages in thread
From: Thinh Tran @ 2024-01-17 21:56 UTC (permalink / raw)
To: Jakub Kicinski
Cc: aelior, davem, edumazet, manishc, netdev, pabeni, skalluru,
VENKATA.SAI.DUGGI, Abdul Haleem, David Christensen, Simon Horman
Hi all,
I hope this message finds you well. I'm reaching out to move forward
with these patches. If there are any remaining concerns or if
additional information is needed from my side, please let me know.
Your guidance on the next steps would be greatly appreciated.
Best regards,
Thinh Tran
On 11/16/2023 10:08 AM, Thinh Tran wrote:
> Hi,
>
> Could we proceed with advancing these patches? They've been in the
> "Awaiting Upstream" state for a while now. Notably, one of them has
> successfully made it to the mainline kernel:
> [v6,1/4] bnx2x: new flag for tracking HW resource
>
> https://github.com/torvalds/linux/commit/bf23ffc8a9a777dfdeb04232e0946b803adbb6a9
>
> As testing the latest kernel, we are still encountering crashes due to
> the absence of one of the patches:
> [v6,3/4] bnx2x: Prevent access to a freed page in page_pool.
>
> Is there anything specific I need to do to help moving these patches
> forward?
> We would greatly appreciate if they could be incorporated into the
> mainline kernel.
>
> Thank you,
> Thinh Tran
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Patch v6 0/4] bnx2x: Fix error recovering in switch configuration
2024-01-17 21:56 ` Thinh Tran
@ 2024-01-17 23:55 ` Jakub Kicinski
2024-01-18 16:53 ` Thinh Tran
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2024-01-17 23:55 UTC (permalink / raw)
To: Thinh Tran
Cc: aelior, davem, edumazet, manishc, netdev, pabeni, skalluru,
VENKATA.SAI.DUGGI, Abdul Haleem, David Christensen, Simon Horman
On Wed, 17 Jan 2024 15:56:21 -0600 Thinh Tran wrote:
> I hope this message finds you well. I'm reaching out to move forward
> with these patches. If there are any remaining concerns or if
> additional information is needed from my side, please let me know.
> Your guidance on the next steps would be greatly appreciated.
If there are any patches that got stuck in a limbo for a long time
please repost them in a new thread. If I'm looking this up right in
online archives the thread is 6 months old, I've deleted the old
messages already :(
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Patch v6 0/4] bnx2x: Fix error recovering in switch configuration
2024-01-17 23:55 ` Jakub Kicinski
@ 2024-01-18 16:53 ` Thinh Tran
0 siblings, 0 replies; 6+ messages in thread
From: Thinh Tran @ 2024-01-18 16:53 UTC (permalink / raw)
To: Jakub Kicinski
Cc: aelior, davem, edumazet, manishc, netdev, pabeni, skalluru,
VENKATA.SAI.DUGGI, Abdul Haleem, David Christensen, Simon Horman
On 1/17/2024 5:55 PM, Jakub Kicinski wrote:
> If there are any patches that got stuck in a limbo for a long time
> please repost them in a new thread. If I'm looking this up right in
> online archives the thread is 6 months old, I've deleted the old
> messages already :(
I will work on re-posting them in a new different thread.
Thank you.
Thinh Tran
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v4] bnx2x: Fix error recovering in switch configuration
@ 2023-07-28 21:11 Thinh Tran
2023-08-18 16:14 ` [Patch v6 0/4] " Thinh Tran
0 siblings, 1 reply; 6+ messages in thread
From: Thinh Tran @ 2023-07-28 21:11 UTC (permalink / raw)
To: kuba
Cc: aelior, davem, edumazet, manishc, netdev, pabeni, skalluru, drc,
abdhalee, simon.horman, Thinh Tran
As the BCM57810 and other I/O adapters are connected
through a PCIe switch, the bnx2x driver causes unexpected
system hang/crash while handling PCIe switch errors, if
its error handler is called after other drivers' handlers.
In this case, after numbers of bnx2x_tx_timout(), the
bnx2x_nic_unload() is called, frees up resources and
calls bnx2x_napi_disable(). Then when EEH calls its
error handler, the bnx2x_io_error_detected() and
bnx2x_io_slot_reset() also calling bnx2x_napi_disable()
and freeing the resources.
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Reviewed-by: Manish Chopra <manishc@marvell.com>
Tested-by: Abdul Haleem <abdhalee@in.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
v4:
- factoring common code into new function bnx2x_stop_nic()
that disables and releases IRQs and NAPIs
v3:
- no changes, just repatched to the latest driver level
- updated the reviewed-by Manish in October, 2022
v2:
- Check the state of the NIC before calling disable nappi
and freeing the IRQ
- Prevent recurrence of TX timeout by turning off the carrier,
calling netif_carrier_off() in bnx2x_tx_timeout()
- Check and bail out early if fp->page_pool already freed
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 2 ++
.../net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 18 +++++++------
.../net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 18 +++++++++++++
.../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 25 +++----------------
.../net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c | 9 ++-----
5 files changed, 36 insertions(+), 36 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 8bcde0a6e011..e2a4e1088b7f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1508,6 +1508,8 @@ struct bnx2x {
bool cnic_loaded;
struct cnic_eth_dev *(*cnic_probe)(struct net_device *);
+ bool nic_stopped;
+
/* Flag that indicates that we can start looking for FCoE L2 queue
* completions in the default status block.
*/
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 6ea5521074d3..b4143c427936 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2715,6 +2715,7 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
bnx2x_add_all_napi(bp);
DP(NETIF_MSG_IFUP, "napi added\n");
bnx2x_napi_enable(bp);
+ bp->nic_stopped = false;
if (IS_PF(bp)) {
/* set pf load just before approaching the MCP */
@@ -2960,6 +2961,7 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
load_error1:
bnx2x_napi_disable(bp);
bnx2x_del_all_napi(bp);
+ bp->nic_stopped = true;
/* clear pf_load status, as it was already set */
if (IS_PF(bp))
@@ -3095,14 +3097,8 @@ int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode, bool keep_link)
if (!CHIP_IS_E1x(bp))
bnx2x_pf_disable(bp);
- /* Disable HW interrupts, NAPI */
- bnx2x_netif_stop(bp, 1);
- /* Delete all NAPI objects */
- bnx2x_del_all_napi(bp);
- if (CNIC_LOADED(bp))
- bnx2x_del_all_napi_cnic(bp);
- /* Release IRQs */
- bnx2x_free_irq(bp);
+ /* Disable HW interrupts, delete NAPIs, Release IRQs */
+ bnx2x_stop_nic(bp);
/* Report UNLOAD_DONE to MCP */
bnx2x_send_unload_done(bp, false);
@@ -4987,6 +4983,12 @@ void bnx2x_tx_timeout(struct net_device *dev, unsigned int txqueue)
{
struct bnx2x *bp = netdev_priv(dev);
+ /* Immediately indicate link as down */
+ bp->link_vars.link_up = 0;
+ bp->force_link_down = true;
+ netif_carrier_off(dev);
+ BNX2X_ERR("Indicating link is down due to Tx-timeout\n");
+
/* We want the information of the dump logged,
* but calling bnx2x_panic() would kill all chances of recovery.
*/
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index d8b1824c334d..f5ecbe8d604a 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -1015,6 +1015,9 @@ static inline void bnx2x_free_rx_sge_range(struct bnx2x *bp,
{
int i;
+ if (!fp->page_pool.page)
+ return;
+
if (fp->mode == TPA_MODE_DISABLED)
return;
@@ -1399,5 +1402,20 @@ void bnx2x_set_os_driver_state(struct bnx2x *bp, u32 state);
*/
int bnx2x_nvram_read(struct bnx2x *bp, u32 offset, u8 *ret_buf,
int buf_size);
+static inline void bnx2x_stop_nic(struct bnx2x *bp)
+{
+ if (!bp->nic_stopped) {
+ /* Disable HW interrupts, NAPI */
+ bnx2x_netif_stop(bp, 1);
+ /* Delete all NAPI objects */
+ bnx2x_del_all_napi(bp);
+ if (CNIC_LOADED(bp))
+ bnx2x_del_all_napi_cnic(bp);
+ /* Release IRQs */
+ bnx2x_free_irq(bp);
+ bp->nic_stopped = true;
+ }
+}
+
#endif /* BNX2X_CMN_H */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 1e7a6f1d4223..e40c83b8b32e 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9474,15 +9474,8 @@ void bnx2x_chip_cleanup(struct bnx2x *bp, int unload_mode, bool keep_link)
}
}
- /* Disable HW interrupts, NAPI */
- bnx2x_netif_stop(bp, 1);
- /* Delete all NAPI objects */
- bnx2x_del_all_napi(bp);
- if (CNIC_LOADED(bp))
- bnx2x_del_all_napi_cnic(bp);
-
- /* Release IRQs */
- bnx2x_free_irq(bp);
+ /* Disable HW interrupts, delete NAPIs, Release IRQs */
+ bnx2x_stop_nic(bp);
/* Reset the chip, unless PCI function is offline. If we reach this
* point following a PCI error handling, it means device is really
@@ -10273,12 +10266,6 @@ static void bnx2x_sp_rtnl_task(struct work_struct *work)
bp->sp_rtnl_state = 0;
smp_mb();
- /* Immediately indicate link as down */
- bp->link_vars.link_up = 0;
- bp->force_link_down = true;
- netif_carrier_off(bp->dev);
- BNX2X_ERR("Indicating link is down due to Tx-timeout\n");
-
bnx2x_nic_unload(bp, UNLOAD_NORMAL, true);
/* When ret value shows failure of allocation failure,
* the nic is rebooted again. If open still fails, a error
@@ -14238,13 +14225,9 @@ static pci_ers_result_t bnx2x_io_slot_reset(struct pci_dev *pdev)
}
bnx2x_drain_tx_queues(bp);
bnx2x_send_unload_req(bp, UNLOAD_RECOVERY);
- bnx2x_netif_stop(bp, 1);
- bnx2x_del_all_napi(bp);
-
- if (CNIC_LOADED(bp))
- bnx2x_del_all_napi_cnic(bp);
- bnx2x_free_irq(bp);
+ /* Disable HW interrupts, delete NAPIs, Release IRQs */
+ bnx2x_stop_nic(bp);
/* Report UNLOAD_DONE to MCP */
bnx2x_send_unload_done(bp, true);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index 0657a0f5170f..d8af7c453172 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -529,13 +529,8 @@ void bnx2x_vfpf_close_vf(struct bnx2x *bp)
bnx2x_vfpf_finalize(bp, &req->first_tlv);
free_irq:
- /* Disable HW interrupts, NAPI */
- bnx2x_netif_stop(bp, 0);
- /* Delete all NAPI objects */
- bnx2x_del_all_napi(bp);
-
- /* Release IRQs */
- bnx2x_free_irq(bp);
+ /* Disable HW interrupts, delete NAPIs, Release IRQs */
+ bnx2x_stop_nic(bp);
}
static void bnx2x_leading_vfq_init(struct bnx2x *bp, struct bnx2x_virtf *vf,
--
2.31.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Patch v6 0/4] bnx2x: Fix error recovering in switch configuration
2023-07-28 21:11 [PATCH v4] " Thinh Tran
@ 2023-08-18 16:14 ` Thinh Tran
0 siblings, 0 replies; 6+ messages in thread
From: Thinh Tran @ 2023-08-18 16:14 UTC (permalink / raw)
To: kuba
Cc: aelior, davem, edumazet, manishc, netdev, pabeni, skalluru,
VENKATA.SAI.DUGGI, Thinh Tran, Abdul Haleem, David Christensen,
Simon Horman, Venkata Sai Duggi
While injecting PCIe errors to the upstream PCIe switch of
a BCM57810 NIC, system hangs/crashes were observed.
After several calls to bnx2x_tx_timout() complete,
bnx2x_nic_unload() is called to free up HW resources
and bnx2x_napi_disable() is called to release NAPI objects.
Later, when the EEH driver calls bnx2x_io_slot_reset() to
complete the recovery process, bnx2x attempts to disable
NAPI again by calling bnx2x_napi_disable() and freeing
resources which have already been freed, resulting in a
hang or crash.
This patch set introduces a new flag to track the HW
resource and NAPI allocation state, refactor duplicated
code into a single function, check page pool allocation
status before freeing, and reduces debug output when
a TX timeout event occurs.
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Reviewed-by: Manish Chopra <manishc@marvell.com>
Tested-by: Abdul Haleem <abdhalee@in.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
v6:
- Clarifying and updating commit messages
v5:
- Breaking down into a series of individual patches
v4:
- factoring common code into new function bnx2x_stop_nic()
that disables and releases IRQs and NAPIs
v3:
- no changes, just repatched to the latest driver level
- updated the reviewed-by Manish in October, 2022
v2:
- Check the state of the NIC before calling disable nappi
and freeing the IRQ
- Prevent recurrence of TX timeout by turning off the carrier,
calling netif_carrier_off() in bnx2x_tx_timeout()
- Check and bail out early if fp->page_pool already freed
Thinh Tran (4):
bnx2x: new the bp->nic_stopped variable for checking NIC status
bnx2x: factor out common code to bnx2x_stop_nic()
bnx2x: Prevent access to a freed page in page_pool
bnx2x: prevent excessive debug information during a TX timeout
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 2 ++
.../net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 33 ++++++++++++++-----
.../net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 4 +++
.../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 26 +++------------
.../net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c | 9 ++---
5 files changed, 37 insertions(+), 37 deletions(-)
--
2.27.0
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-01-18 16:53 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-16 16:08 [Patch v6 0/4] bnx2x: Fix error recovering in switch configuration Thinh Tran
2023-11-16 20:56 ` Simon Horman
2024-01-17 21:56 ` Thinh Tran
2024-01-17 23:55 ` Jakub Kicinski
2024-01-18 16:53 ` Thinh Tran
-- strict thread matches above, loose matches on Subject: below --
2023-07-28 21:11 [PATCH v4] " Thinh Tran
2023-08-18 16:14 ` [Patch v6 0/4] " Thinh Tran
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).