All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Cc: aelior@marvell.com, davem@davemloft.net, edumazet@google.com,
	manishc@marvell.com, netdev@vger.kernel.org, pabeni@redhat.com,
	skalluru@marvell.com, drc@linux.vnet.ibm.com,
	abdhalee@in.ibm.com, simon.horman@corigine.com
Subject: Re: [PATCH v4] bnx2x: Fix error recovering in switch configuration
Date: Mon, 31 Jul 2023 17:47:16 -0700	[thread overview]
Message-ID: <20230731174716.0898ff62@kernel.org> (raw)
In-Reply-To: <20230728211133.2240873-1-thinhtr@linux.vnet.ibm.com>

On Fri, 28 Jul 2023 16:11:33 -0500 Thinh Tran wrote:
> As the BCM57810 and other I/O adapters are connected
> through a PCIe switch, the bnx2x driver causes unexpected
> system hang/crash while handling PCIe switch errors, if
> its error handler is called after other drivers' handlers.
> 
> In this case, after numbers of bnx2x_tx_timout(), the
> bnx2x_nic_unload() is  called, frees up resources and
> calls bnx2x_napi_disable(). Then when EEH calls its
> error handler, the bnx2x_io_error_detected() and
> bnx2x_io_slot_reset() also calling bnx2x_napi_disable()
> and freeing the resources.
> 
> 
> Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
> Reviewed-by: Manish Chopra <manishc@marvell.com>
> Tested-by: Abdul Haleem <abdhalee@in.ibm.com>
> Tested-by: David Christensen <drc@linux.vnet.ibm.com>
> 
> Reviewed-by: Simon Horman <simon.horman@corigine.com>

nit: no empty lines between tags

There should be a "---" line between the tags and changelog.

>   v4:
>    - factoring common code into new function bnx2x_stop_nic()
>      that disables and releases IRQs and NAPIs 
>   v3:
>     - no changes, just repatched to the latest driver level
>     - updated the reviewed-by Manish in October, 2022
> 
>   v2:
>    - Check the state of the NIC before calling disable nappi
>      and freeing the IRQ
>    - Prevent recurrence of TX timeout by turning off the carrier,
>      calling netif_carrier_off() in bnx2x_tx_timeout()
>    - Check and bail out early if fp->page_pool already freed
>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x.h   |  2 ++

> @@ -3095,14 +3097,8 @@ int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode, bool keep_link)
>  		if (!CHIP_IS_E1x(bp))
>  			bnx2x_pf_disable(bp);
>  
> -		/* Disable HW interrupts, NAPI */
> -		bnx2x_netif_stop(bp, 1);
> -		/* Delete all NAPI objects */
> -		bnx2x_del_all_napi(bp);
> -		if (CNIC_LOADED(bp))
> -			bnx2x_del_all_napi_cnic(bp);
> -		/* Release IRQs */
> -		bnx2x_free_irq(bp);

Could you split the change into two patches - one factoring out the
code into bnx2x_stop_nic() and the other adding the nic_stopped
variable? First one should be pure code refactoring with no functional
changes. That'd make the reviewing process easier.

> +		/* Disable HW interrupts, delete NAPIs, Release IRQs */
> +		bnx2x_stop_nic(bp);
>  
>  		/* Report UNLOAD_DONE to MCP */
>  		bnx2x_send_unload_done(bp, false);
> @@ -4987,6 +4983,12 @@ void bnx2x_tx_timeout(struct net_device *dev, unsigned int txqueue)
>  {
>  	struct bnx2x *bp = netdev_priv(dev);
>  
> +	/* Immediately indicate link as down */
> +	bp->link_vars.link_up = 0;
> +	bp->force_link_down = true;
> +	netif_carrier_off(dev);
> +	BNX2X_ERR("Indicating link is down due to Tx-timeout\n");

Is this code move to make the shutdown more immediate?
That could also be a separate patch.

>  	/* We want the information of the dump logged,
>  	 * but calling bnx2x_panic() would kill all chances of recovery.
>  	 */
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> index d8b1824c334d..f5ecbe8d604a 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> @@ -1015,6 +1015,9 @@ static inline void bnx2x_free_rx_sge_range(struct bnx2x *bp,
>  {
>  	int i;
>  
> +	if (!fp->page_pool.page)
> +		return;
> +
>  	if (fp->mode == TPA_MODE_DISABLED)
>  		return;
>  
> @@ -1399,5 +1402,20 @@ void bnx2x_set_os_driver_state(struct bnx2x *bp, u32 state);
>   */
>  int bnx2x_nvram_read(struct bnx2x *bp, u32 offset, u8 *ret_buf,
>  		     int buf_size);
> +static inline void bnx2x_stop_nic(struct bnx2x *bp)

can't it live in bnx2x_cmn.c ? Why make it a static inline?

> +{
> +	if (!bp->nic_stopped) {
> +		/* Disable HW interrupts, NAPI */
> +		bnx2x_netif_stop(bp, 1);
> +		/* Delete all NAPI objects */
> +		bnx2x_del_all_napi(bp);
> +		if (CNIC_LOADED(bp))
> +			bnx2x_del_all_napi_cnic(bp);
> +		/* Release IRQs */
> +		bnx2x_free_irq(bp);
> +		bp->nic_stopped = true;
> +	}
> +}
> +
>  

nit: double new line

> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
> @@ -529,13 +529,8 @@ void bnx2x_vfpf_close_vf(struct bnx2x *bp)
>  	bnx2x_vfpf_finalize(bp, &req->first_tlv);
>  
>  free_irq:
> -	/* Disable HW interrupts, NAPI */
> -	bnx2x_netif_stop(bp, 0);

This used to say 

	bnx2x_netif_stop(bp, 0);

but bnx2x_stop_nic() will do:

	bnx2x_netif_stop(bp, 1);

is it okay to shut down the HW here ? (whatever that entails)

> -	/* Delete all NAPI objects */
> -	bnx2x_del_all_napi(bp);
> -
> -	/* Release IRQs */
> -	bnx2x_free_irq(bp);
> +	/* Disable HW interrupts, delete NAPIs, Release IRQs */
> +	bnx2x_stop_nic(bp);
-- 
pw-bot: cr

  reply	other threads:[~2023-08-01  0:47 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-25 20:00 [PATCH] bnx2x: Fix error recovering in switch configuration Thinh Tran
2022-08-27  1:44 ` Jakub Kicinski
2022-08-30 16:15   ` Thinh Tran
2022-09-16 19:51   ` [PATCH v2] " Thinh Tran
2022-09-19 13:53     ` [EXT] " Manish Chopra
2022-09-21 20:11       ` Thinh Tran
2022-09-28 21:33         ` Thinh Tran
2022-10-10 22:01           ` Thinh Tran
2022-10-17 11:14             ` Manish Chopra
2022-10-20 19:12               ` Thinh Tran
2023-07-19 22:02     ` [Patch v3] " Thinh Tran
2023-07-24 11:48       ` Simon Horman
2023-07-24 22:50       ` Jakub Kicinski
2023-07-27 13:29         ` Thinh Tran
2023-07-28 21:11     ` [PATCH v4] " Thinh Tran
2023-08-01  0:47       ` Jakub Kicinski [this message]
2023-08-01  8:30         ` Paolo Abeni
2023-08-07 21:15           ` Thinh Tran
2023-08-07 21:08         ` Thinh Tran
2023-08-07 21:24           ` Jakub Kicinski
2023-08-07 21:35           ` Jakub Kicinski
2023-08-11 20:15       ` [Patch v5 0/4] " Thinh Tran
2023-08-11 20:15         ` [Patch v5 1/4] bnx2x: new the bp->nic_stopped variable for checking NIC status Thinh Tran
2023-08-11 20:15         ` [Patch v5 2/4] bnx2x: factor out common code to bnx2x_stop_nic() Thinh Tran
2023-08-11 22:08           ` Jakub Kicinski
2023-08-11 20:15         ` [Patch v5 3/4] bnx2x: Prevent access to a freed page in page_pool Thinh Tran
2023-08-11 20:15         ` [Patch v5 4/4] bnx2x: prevent excessive debug information during a TX timeout Thinh Tran
2023-08-11 22:09         ` [Patch v5 0/4] bnx2x: Fix error recovering in switch configuration Jakub Kicinski
2023-08-18 13:24           ` Thinh Tran
2023-08-18 16:14       ` [Patch v6 " Thinh Tran
2023-08-18 16:14         ` [Patch v6 1/4] bnx2x: new flag for track HW resource Thinh Tran
2023-08-18 16:14         ` [Patch v6 2/4] bnx2x: factor out common code to bnx2x_stop_nic() Thinh Tran
2023-08-18 16:14         ` [Patch v6 3/4] bnx2x: Prevent access to a freed page in page_pool Thinh Tran
2023-08-18 16:14         ` [Patch v6 4/4] bnx2x: prevent excessive debug information during a TX timeout Thinh Tran
2022-08-30  9:19 ` [EXT] [PATCH] bnx2x: Fix error recovering in switch configuration Manish Chopra
2022-08-30 21:01   ` Thinh Tran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230731174716.0898ff62@kernel.org \
    --to=kuba@kernel.org \
    --cc=abdhalee@in.ibm.com \
    --cc=aelior@marvell.com \
    --cc=davem@davemloft.net \
    --cc=drc@linux.vnet.ibm.com \
    --cc=edumazet@google.com \
    --cc=manishc@marvell.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=simon.horman@corigine.com \
    --cc=skalluru@marvell.com \
    --cc=thinhtr@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.