All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Cc: netdev@vger.kernel.org, aelior@marvell.com, davem@davemloft.net,
	manishc@marvell.com, skalluru@marvell.com
Subject: Re: [PATCH] bnx2x: Fix error recovering in switch configuration
Date: Fri, 26 Aug 2022 18:44:40 -0700	[thread overview]
Message-ID: <20220826184440.37e7cb85@kernel.org> (raw)
In-Reply-To: <20220825200029.4143670-1-thinhtr@linux.vnet.ibm.com>

On Thu, 25 Aug 2022 20:00:29 +0000 Thinh Tran wrote:
> As the BCM57810 and other I/O adapters are connected
> through a PCIe switch, the bnx2x driver causes unexpected
> system hang/crash while handling PCIe switch errors, if 
> its error handler is called after other drivers' handlers.
> 
> In this case, after numbers of bnx2x_tx_timout(), the
> bnx2x_nic_unload() is  called, frees up resources and
> calls bnx2x_napi_disable(). Then when EEH calls its
> error handler, the bnx2x_io_error_detected() and
> bnx2x_io_slot_reset() also calling bnx2x_napi_disable()
> and freeing the resources.
> 
> This patch will:
> - reduce the numbers of bnx2x_panic_dump() while in
>   bnx2x_tx_timeout(), avoid filling up dmesg buffer.
> - use checking new napi_enable flags to prevent calling 
>   disable again which causing system hangs.
> - cheking if fp->page_pool already freed avoid system
>   crash.

> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> index 712b5595bc39..bb8d91f44642 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> @@ -1860,37 +1860,49 @@ static int bnx2x_setup_irqs(struct bnx2x *bp)
>  static void bnx2x_napi_enable_cnic(struct bnx2x *bp)
>  {
>  	int i;
> +	if (bp->cnic_napi_enable)

empty line between variables and code, pls

> +		return;
>  
>  	for_each_rx_queue_cnic(bp, i) {
>  		napi_enable(&bnx2x_fp(bp, i, napi));
>  	}
> +	bp->cnic_napi_enable = true;

The concept of not calling napi_enable() / disable()
feels a little wrong. It's the state of the driver,
not the NAPI that's the problem so perhaps you could
a appropriately named bool for that (IDK, maybe 
nic_stopped) and prevent coming into the NAPI handling
functions completely?

Is all other code in the driver on the path in question 
really idempotent?

>  }
>  
>  static void bnx2x_napi_enable(struct bnx2x *bp)
>  {
>  	int i;
> +	if (bp->napi_enable)
> +		return;
>  
>  	for_each_eth_queue(bp, i) {
>  		napi_enable(&bnx2x_fp(bp, i, napi));
>  	}
> +	bp->napi_enable = true;
>  }
>  
>  static void bnx2x_napi_disable_cnic(struct bnx2x *bp)
>  {
>  	int i;
> +	if (!bp->cnic_napi_enable)
> +		return;
>  
>  	for_each_rx_queue_cnic(bp, i) {
>  		napi_disable(&bnx2x_fp(bp, i, napi));
>  	}
> +	bp->cnic_napi_enable = false;
>  }
>  
>  static void bnx2x_napi_disable(struct bnx2x *bp)
>  {
>  	int i;
> +	if (!bp->napi_enable)
> +		return;
>  
>  	for_each_eth_queue(bp, i) {
>  		napi_disable(&bnx2x_fp(bp, i, napi));
>  	}
> +	bp->napi_enable = false;
>  }
>  
>  void bnx2x_netif_start(struct bnx2x *bp)
> @@ -2554,6 +2566,7 @@ int bnx2x_load_cnic(struct bnx2x *bp)
>  	}
>  
>  	/* Add all CNIC NAPI objects */
> +	bp->cnic_napi_enable = false;
>  	bnx2x_add_all_napi_cnic(bp);
>  	DP(NETIF_MSG_IFUP, "cnic napi added\n");
>  	bnx2x_napi_enable_cnic(bp);
> @@ -2701,7 +2714,9 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
>  	 */
>  	bnx2x_setup_tc(bp->dev, bp->max_cos);
>  
> +	bp->tx_timeout_cnt = 0;
>  	/* Add all NAPI objects */
> +	bp->napi_enable = false;
>  	bnx2x_add_all_napi(bp);
>  	DP(NETIF_MSG_IFUP, "napi added\n");
>  	bnx2x_napi_enable(bp);
> @@ -4982,7 +4997,14 @@ void bnx2x_tx_timeout(struct net_device *dev, unsigned int txqueue)
>  	 */
>  	if (!bp->panic)
>  #ifndef BNX2X_STOP_ON_ERROR
> -		bnx2x_panic_dump(bp, false);
> +	{
> +		if (++bp->tx_timeout_cnt > 3) {
> +			bnx2x_panic_dump(bp, false);
> +			bp->tx_timeout_cnt = 0;
> +		} else {
> +			netdev_err(bp->dev, "TX timeout %d times\n", bp->tx_timeout_cnt);
> +		}
> +	}

Please put this code in a helper function so that the oddly looking
brackets are not needed.

>  #else
>  		bnx2x_panic();
>  #endif
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> index d8b1824c334d..7e1d38a2c7ec 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> @@ -1018,6 +1018,9 @@ static inline void bnx2x_free_rx_sge_range(struct bnx2x *bp,
>  	if (fp->mode == TPA_MODE_DISABLED)
>  		return;
>  
> +	if (!fp->page_pool.page)
> +		return;

See, another thing that's not idempotent. Better to bail higher up,
in the callee.

>  	for (i = 0; i < last; i++)
>  		bnx2x_free_rx_sge(bp, fp, i);
>  


  reply	other threads:[~2022-08-27  1:44 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-25 20:00 [PATCH] bnx2x: Fix error recovering in switch configuration Thinh Tran
2022-08-27  1:44 ` Jakub Kicinski [this message]
2022-08-30 16:15   ` Thinh Tran
2022-09-16 19:51   ` [PATCH v2] " Thinh Tran
2022-09-19 13:53     ` [EXT] " Manish Chopra
2022-09-21 20:11       ` Thinh Tran
2022-09-28 21:33         ` Thinh Tran
2022-10-10 22:01           ` Thinh Tran
2022-10-17 11:14             ` Manish Chopra
2022-10-20 19:12               ` Thinh Tran
2023-07-19 22:02     ` [Patch v3] " Thinh Tran
2023-07-24 11:48       ` Simon Horman
2023-07-24 22:50       ` Jakub Kicinski
2023-07-27 13:29         ` Thinh Tran
2023-07-28 21:11     ` [PATCH v4] " Thinh Tran
2023-08-01  0:47       ` Jakub Kicinski
2023-08-01  8:30         ` Paolo Abeni
2023-08-07 21:15           ` Thinh Tran
2023-08-07 21:08         ` Thinh Tran
2023-08-07 21:24           ` Jakub Kicinski
2023-08-07 21:35           ` Jakub Kicinski
2023-08-11 20:15       ` [Patch v5 0/4] " Thinh Tran
2023-08-11 20:15         ` [Patch v5 1/4] bnx2x: new the bp->nic_stopped variable for checking NIC status Thinh Tran
2023-08-11 20:15         ` [Patch v5 2/4] bnx2x: factor out common code to bnx2x_stop_nic() Thinh Tran
2023-08-11 22:08           ` Jakub Kicinski
2023-08-11 20:15         ` [Patch v5 3/4] bnx2x: Prevent access to a freed page in page_pool Thinh Tran
2023-08-11 20:15         ` [Patch v5 4/4] bnx2x: prevent excessive debug information during a TX timeout Thinh Tran
2023-08-11 22:09         ` [Patch v5 0/4] bnx2x: Fix error recovering in switch configuration Jakub Kicinski
2023-08-18 13:24           ` Thinh Tran
2023-08-18 16:14       ` [Patch v6 " Thinh Tran
2023-08-18 16:14         ` [Patch v6 1/4] bnx2x: new flag for track HW resource Thinh Tran
2023-08-18 16:14         ` [Patch v6 2/4] bnx2x: factor out common code to bnx2x_stop_nic() Thinh Tran
2023-08-18 16:14         ` [Patch v6 3/4] bnx2x: Prevent access to a freed page in page_pool Thinh Tran
2023-08-18 16:14         ` [Patch v6 4/4] bnx2x: prevent excessive debug information during a TX timeout Thinh Tran
2022-08-30  9:19 ` [EXT] [PATCH] bnx2x: Fix error recovering in switch configuration Manish Chopra
2022-08-30 21:01   ` Thinh Tran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220826184440.37e7cb85@kernel.org \
    --to=kuba@kernel.org \
    --cc=aelior@marvell.com \
    --cc=davem@davemloft.net \
    --cc=manishc@marvell.com \
    --cc=netdev@vger.kernel.org \
    --cc=skalluru@marvell.com \
    --cc=thinhtr@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.