[PATCH net v2] ice: Fix ice module unload

* [PATCH net v2] ice: Fix ice module unload
@ 2023-06-12 17:14 Tony Nguyen
  2023-06-14  8:05 ` Simon Horman
  2023-06-15  6:10 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 3+ messages in thread
From: Tony Nguyen @ 2023-06-12 17:14 UTC
  To: davem, kuba, pabeni, edumazet, netdev
  Cc: Jakub Buchocki, anthony.l.nguyen, michal.swiatkowski, jiri,
	Przemek Kitszel, Pucha Himasekhar Reddy

From: Jakub Buchocki <jakubx.buchocki@intel.com>

Clearing the interrupt scheme before the PFR reset,
during the removal routine, could cause hardware
errors and possibly lead to a system reboot, as the PF
reset can cause an interrupt to be generated.

Move the PFR reset call into ice_deinit_dev(),
wait until the reset and all pending transactions are done,
and only then call ice_clear_interrupt_scheme().

As a side effect, this introduces a PFR reset to multiple error paths.

Additionally, remove the reset call from ice_load();
the reset is now part of ice_unload().

Error example:
[   75.229328] ice 0000:ca:00.1: Failed to read Tx Scheduler Tree - User Selection data from flash
[   77.571315] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[   77.571418] {1}[Hardware Error]: event severity: recoverable
[   77.571459] {1}[Hardware Error]:  Error 0, type: recoverable
[   77.571500] {1}[Hardware Error]:   section_type: PCIe error
[   77.571540] {1}[Hardware Error]:   port_type: 4, root port
[   77.571580] {1}[Hardware Error]:   version: 3.0
[   77.571615] {1}[Hardware Error]:   command: 0x0547, status: 0x4010
[   77.571661] {1}[Hardware Error]:   device_id: 0000:c9:02.0
[   77.571703] {1}[Hardware Error]:   slot: 25
[   77.571736] {1}[Hardware Error]:   secondary_bus: 0xca
[   77.571773] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
[   77.571821] {1}[Hardware Error]:   class_code: 060400
[   77.571858] {1}[Hardware Error]:   bridge: secondary_status: 0x2800, control: 0x0013
[   77.572490] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
[   77.572870] pcieport 0000:c9:02.0:    [21] ACSViol                (First)
[   77.573222] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[   77.573554] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
[   77.691273] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[   77.691738] {2}[Hardware Error]: event severity: recoverable
[   77.691971] {2}[Hardware Error]:  Error 0, type: recoverable
[   77.692192] {2}[Hardware Error]:   section_type: PCIe error
[   77.692403] {2}[Hardware Error]:   port_type: 4, root port
[   77.692616] {2}[Hardware Error]:   version: 3.0
[   77.692825] {2}[Hardware Error]:   command: 0x0547, status: 0x4010
[   77.693032] {2}[Hardware Error]:   device_id: 0000:c9:02.0
[   77.693238] {2}[Hardware Error]:   slot: 25
[   77.693440] {2}[Hardware Error]:   secondary_bus: 0xca
[   77.693641] {2}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
[   77.693853] {2}[Hardware Error]:   class_code: 060400
[   77.694054] {2}[Hardware Error]:   bridge: secondary_status: 0x0800, control: 0x0013
[   77.719115] pci 0000:ca:00.1: AER: can't recover (no error_detected callback)
[   77.719140] pcieport 0000:c9:02.0: AER: device recovery failed
[   77.719216] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
[   77.719390] pcieport 0000:c9:02.0:    [21] ACSViol                (First)
[   77.719557] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[   77.719723] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
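For readers decoding the log above, here is a minimal sketch of how the
three AER values relate. It assumes the printed aer_status, aer_mask and
aer_uncor_severity are the Uncorrectable Error Status, Mask and Severity
registers of the root port. The helper names are hypothetical and are not
kernel functions.

```c
#include <stdint.h>

/* Bit 21 of the AER Uncorrectable Error Status register is ACS Violation,
 * matching the "[21] ACSViol" line in the log. */
#define UNCOR_ACS_VIOLATION_BIT 21

/* Hypothetical helper: lowest status bit that is not hidden by the mask,
 * or -1 if every reported error is masked. */
static int lowest_unmasked_bit(uint32_t status, uint32_t mask)
{
	uint32_t active = status & ~mask;

	for (int bit = 0; bit < 32; bit++)
		if (active & (UINT32_C(1) << bit))
			return bit;
	return -1;
}

/* Hypothetical helper: a set bit in the severity register marks the
 * corresponding uncorrectable error as fatal, a clear bit as non-fatal. */
static int bit_is_fatal(uint32_t severity, int bit)
{
	return (int)((severity >> bit) & 1u);
}
```

With the values from the log (status 0x00200000, mask 0x00100020,
severity 0x00463010), the only unmasked error is bit 21 and its severity
bit is clear, which agrees with the "recoverable" event severity reported.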

Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
Signed-off-by: Jakub Buchocki <jakubx.buchocki@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
v2: Changed to avoid multiple, individual calls to ice_clear_interrupt_scheme().

v1: https://lore.kernel.org/netdev/20230523173033.3577110-1-anthony.l.nguyen@intel.com/
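
Illustration only, not part of the patch: a toy model of the ordering
problem described in the commit message. Everything here (struct mock_pf
and the mock_* functions) is hypothetical and stands in for the real
driver state. It only encodes the stated assumption that a PF reset may
raise an interrupt, which is harmful if the interrupt scheme is already
gone.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not struct ice_pf. */
struct mock_pf {
	bool irq_scheme_present;	/* interrupt scheme still set up */
	bool reset_done;
	int stray_irqs;			/* interrupts raised with no scheme */
};

/* Assumption from the commit message: the PF reset can generate an
 * interrupt. Count it as stray when nothing is left to receive it. */
static void mock_reset(struct mock_pf *pf)
{
	if (!pf->irq_scheme_present)
		pf->stray_irqs++;
	pf->reset_done = true;
}

static void mock_clear_interrupt_scheme(struct mock_pf *pf)
{
	pf->irq_scheme_present = false;
}

/* Old order: clear the interrupt scheme first, reset afterwards. */
static int teardown_old_order(void)
{
	struct mock_pf pf = { .irq_scheme_present = true };

	mock_clear_interrupt_scheme(&pf);
	mock_reset(&pf);
	return pf.stray_irqs;
}

/* New order: reset first, clear the interrupt scheme last. */
static int teardown_new_order(void)
{
	struct mock_pf pf = { .irq_scheme_present = true };

	mock_reset(&pf);
	mock_clear_interrupt_scheme(&pf);
	return pf.stray_irqs;
}
```

In this model the old order leaves one stray interrupt and the new order
leaves none. The real driver additionally waits for pending PCI
transactions between the reset and the clear, which the toy omits.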

 drivers/net/ethernet/intel/ice/ice_main.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 03513d4871ab..42c318ceff61 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4802,9 +4802,13 @@ static int ice_init_dev(struct ice_pf *pf)
 static void ice_deinit_dev(struct ice_pf *pf)
 {
 	ice_free_irq_msix_misc(pf);
-	ice_clear_interrupt_scheme(pf);
 	ice_deinit_pf(pf);
 	ice_deinit_hw(&pf->hw);
+
+	/* Service task is already stopped, so call reset directly. */
+	ice_reset(&pf->hw, ICE_RESET_PFR);
+	pci_wait_for_pending_transaction(pf->pdev);
+	ice_clear_interrupt_scheme(pf);
 }
 
 static void ice_init_features(struct ice_pf *pf)
@@ -5094,10 +5098,6 @@ int ice_load(struct ice_pf *pf)
 	struct ice_vsi *vsi;
 	int err;
 
-	err = ice_reset(&pf->hw, ICE_RESET_PFR);
-	if (err)
-		return err;
-
 	err = ice_init_dev(pf);
 	if (err)
 		return err;
@@ -5354,12 +5354,6 @@ static void ice_remove(struct pci_dev *pdev)
 	ice_setup_mc_magic_wake(pf);
 	ice_set_wake(pf);
 
-	/* Issue a PFR as part of the prescribed driver unload flow.  Do not
-	 * do it via ice_schedule_reset() since there is no need to rebuild
-	 * and the service task is already stopped.
-	 */
-	ice_reset(&pf->hw, ICE_RESET_PFR);
-	pci_wait_for_pending_transaction(pdev);
 	pci_disable_device(pdev);
 }
 
-- 
2.38.1


* Re: [PATCH net v2] ice: Fix ice module unload
  2023-06-12 17:14 [PATCH net v2] ice: Fix ice module unload Tony Nguyen
@ 2023-06-14  8:05 ` Simon Horman
  2023-06-15  6:10 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: Simon Horman @ 2023-06-14  8:05 UTC
  To: Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, netdev, Jakub Buchocki,
	michal.swiatkowski, jiri, Przemek Kitszel, Pucha Himasekhar Reddy

On Mon, Jun 12, 2023 at 10:14:21AM -0700, Tony Nguyen wrote:
> From: Jakub Buchocki <jakubx.buchocki@intel.com>
> 
> Clearing the interrupt scheme before the PFR reset,
> during the removal routine, could cause hardware
> errors and possibly lead to a system reboot, as the PF
> reset can cause an interrupt to be generated.
> 
> Move the PFR reset call into ice_deinit_dev(),
> wait until the reset and all pending transactions are done,
> and only then call ice_clear_interrupt_scheme().
> 
> As a side effect, this introduces a PFR reset to multiple error paths.
> 
> Additionally, remove the reset call from ice_load();
> the reset is now part of ice_unload().
> 
> Error example:
> [   75.229328] ice 0000:ca:00.1: Failed to read Tx Scheduler Tree - User Selection data from flash
> [   77.571315] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [   77.571418] {1}[Hardware Error]: event severity: recoverable
> [   77.571459] {1}[Hardware Error]:  Error 0, type: recoverable
> [   77.571500] {1}[Hardware Error]:   section_type: PCIe error
> [   77.571540] {1}[Hardware Error]:   port_type: 4, root port
> [   77.571580] {1}[Hardware Error]:   version: 3.0
> [   77.571615] {1}[Hardware Error]:   command: 0x0547, status: 0x4010
> [   77.571661] {1}[Hardware Error]:   device_id: 0000:c9:02.0
> [   77.571703] {1}[Hardware Error]:   slot: 25
> [   77.571736] {1}[Hardware Error]:   secondary_bus: 0xca
> [   77.571773] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
> [   77.571821] {1}[Hardware Error]:   class_code: 060400
> [   77.571858] {1}[Hardware Error]:   bridge: secondary_status: 0x2800, control: 0x0013
> [   77.572490] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
> [   77.572870] pcieport 0000:c9:02.0:    [21] ACSViol                (First)
> [   77.573222] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   77.573554] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
> [   77.691273] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [   77.691738] {2}[Hardware Error]: event severity: recoverable
> [   77.691971] {2}[Hardware Error]:  Error 0, type: recoverable
> [   77.692192] {2}[Hardware Error]:   section_type: PCIe error
> [   77.692403] {2}[Hardware Error]:   port_type: 4, root port
> [   77.692616] {2}[Hardware Error]:   version: 3.0
> [   77.692825] {2}[Hardware Error]:   command: 0x0547, status: 0x4010
> [   77.693032] {2}[Hardware Error]:   device_id: 0000:c9:02.0
> [   77.693238] {2}[Hardware Error]:   slot: 25
> [   77.693440] {2}[Hardware Error]:   secondary_bus: 0xca
> [   77.693641] {2}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
> [   77.693853] {2}[Hardware Error]:   class_code: 060400
> [   77.694054] {2}[Hardware Error]:   bridge: secondary_status: 0x0800, control: 0x0013
> [   77.719115] pci 0000:ca:00.1: AER: can't recover (no error_detected callback)
> [   77.719140] pcieport 0000:c9:02.0: AER: device recovery failed
> [   77.719216] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
> [   77.719390] pcieport 0000:c9:02.0:    [21] ACSViol                (First)
> [   77.719557] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   77.719723] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
> 
> Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
> Signed-off-by: Jakub Buchocki <jakubx.buchocki@intel.com>
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>


* Re: [PATCH net v2] ice: Fix ice module unload
  2023-06-12 17:14 [PATCH net v2] ice: Fix ice module unload Tony Nguyen
  2023-06-14  8:05 ` Simon Horman
@ 2023-06-15  6:10 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-06-15  6:10 UTC
  To: Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, netdev, jakubx.buchocki,
	michal.swiatkowski, jiri, przemyslaw.kitszel,
	himasekharx.reddy.pucha

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 12 Jun 2023 10:14:21 -0700 you wrote:
> From: Jakub Buchocki <jakubx.buchocki@intel.com>
> 
> Clearing the interrupt scheme before the PFR reset,
> during the removal routine, could cause hardware
> errors and possibly lead to a system reboot, as the PF
> reset can cause an interrupt to be generated.
> 
> [...]

Here is the summary with links:
  - [net,v2] ice: Fix ice module unload
    https://git.kernel.org/netdev/net/c/24b454bc354a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


