* [PATCH net v2] ice: Fix ice module unload
@ 2023-06-12 17:14 Tony Nguyen
2023-06-14 8:05 ` Simon Horman
2023-06-15 6:10 ` patchwork-bot+netdevbpf
From: Tony Nguyen @ 2023-06-12 17:14 UTC
To: davem, kuba, pabeni, edumazet, netdev
Cc: Jakub Buchocki, anthony.l.nguyen, michal.swiatkowski, jiri,
Przemek Kitszel, Pucha Himasekhar Reddy
From: Jakub Buchocki <jakubx.buchocki@intel.com>

Clearing the interrupt scheme before the PFR reset
during the removal routine could cause hardware
errors and possibly lead to a system reboot, as the
PF reset can generate an interrupt.

Move the PFR reset call into ice_deinit_dev(),
wait until the reset and all pending transactions
are done, then call ice_clear_interrupt_scheme().

This introduces a PFR reset to multiple error paths.

Additionally, remove the reset call from ice_load();
it is now part of ice_unload().

Error example:
[ 75.229328] ice 0000:ca:00.1: Failed to read Tx Scheduler Tree - User Selection data from flash
[ 77.571315] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 77.571418] {1}[Hardware Error]: event severity: recoverable
[ 77.571459] {1}[Hardware Error]: Error 0, type: recoverable
[ 77.571500] {1}[Hardware Error]: section_type: PCIe error
[ 77.571540] {1}[Hardware Error]: port_type: 4, root port
[ 77.571580] {1}[Hardware Error]: version: 3.0
[ 77.571615] {1}[Hardware Error]: command: 0x0547, status: 0x4010
[ 77.571661] {1}[Hardware Error]: device_id: 0000:c9:02.0
[ 77.571703] {1}[Hardware Error]: slot: 25
[ 77.571736] {1}[Hardware Error]: secondary_bus: 0xca
[ 77.571773] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x347a
[ 77.571821] {1}[Hardware Error]: class_code: 060400
[ 77.571858] {1}[Hardware Error]: bridge: secondary_status: 0x2800, control: 0x0013
[ 77.572490] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
[ 77.572870] pcieport 0000:c9:02.0: [21] ACSViol (First)
[ 77.573222] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[ 77.573554] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
[ 77.691273] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 77.691738] {2}[Hardware Error]: event severity: recoverable
[ 77.691971] {2}[Hardware Error]: Error 0, type: recoverable
[ 77.692192] {2}[Hardware Error]: section_type: PCIe error
[ 77.692403] {2}[Hardware Error]: port_type: 4, root port
[ 77.692616] {2}[Hardware Error]: version: 3.0
[ 77.692825] {2}[Hardware Error]: command: 0x0547, status: 0x4010
[ 77.693032] {2}[Hardware Error]: device_id: 0000:c9:02.0
[ 77.693238] {2}[Hardware Error]: slot: 25
[ 77.693440] {2}[Hardware Error]: secondary_bus: 0xca
[ 77.693641] {2}[Hardware Error]: vendor_id: 0x8086, device_id: 0x347a
[ 77.693853] {2}[Hardware Error]: class_code: 060400
[ 77.694054] {2}[Hardware Error]: bridge: secondary_status: 0x0800, control: 0x0013
[ 77.719115] pci 0000:ca:00.1: AER: can't recover (no error_detected callback)
[ 77.719140] pcieport 0000:c9:02.0: AER: device recovery failed
[ 77.719216] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
[ 77.719390] pcieport 0000:c9:02.0: [21] ACSViol (First)
[ 77.719557] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[ 77.719723] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
Signed-off-by: Jakub Buchocki <jakubx.buchocki@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
v2: Changed to avoid multiple, individual calls to ice_clear_interrupt_scheme().
v1: https://lore.kernel.org/netdev/20230523173033.3577110-1-anthony.l.nguyen@intel.com/
drivers/net/ethernet/intel/ice/ice_main.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 03513d4871ab..42c318ceff61 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4802,9 +4802,13 @@ static int ice_init_dev(struct ice_pf *pf)
 static void ice_deinit_dev(struct ice_pf *pf)
 {
 	ice_free_irq_msix_misc(pf);
-	ice_clear_interrupt_scheme(pf);
 	ice_deinit_pf(pf);
 	ice_deinit_hw(&pf->hw);
+
+	/* Service task is already stopped, so call reset directly. */
+	ice_reset(&pf->hw, ICE_RESET_PFR);
+	pci_wait_for_pending_transaction(pf->pdev);
+	ice_clear_interrupt_scheme(pf);
 }
 
 static void ice_init_features(struct ice_pf *pf)
@@ -5094,10 +5098,6 @@ int ice_load(struct ice_pf *pf)
 	struct ice_vsi *vsi;
 	int err;
 
-	err = ice_reset(&pf->hw, ICE_RESET_PFR);
-	if (err)
-		return err;
-
 	err = ice_init_dev(pf);
 	if (err)
 		return err;
@@ -5354,12 +5354,6 @@ static void ice_remove(struct pci_dev *pdev)
 	ice_setup_mc_magic_wake(pf);
 	ice_set_wake(pf);
 
-	/* Issue a PFR as part of the prescribed driver unload flow. Do not
-	 * do it via ice_schedule_reset() since there is no need to rebuild
-	 * and the service task is already stopped.
-	 */
-	ice_reset(&pf->hw, ICE_RESET_PFR);
-	pci_wait_for_pending_transaction(pdev);
 	pci_disable_device(pdev);
 }
 
--
2.38.1
* Re: [PATCH net v2] ice: Fix ice module unload
From: Simon Horman @ 2023-06-14 8:05 UTC
To: Tony Nguyen
Cc: davem, kuba, pabeni, edumazet, netdev, Jakub Buchocki,
michal.swiatkowski, jiri, Przemek Kitszel, Pucha Himasekhar Reddy
On Mon, Jun 12, 2023 at 10:14:21AM -0700, Tony Nguyen wrote:
> From: Jakub Buchocki <jakubx.buchocki@intel.com>
>
> Clearing the interrupt scheme before the PFR reset
> during the removal routine could cause hardware
> errors and possibly lead to a system reboot, as the
> PF reset can generate an interrupt.
>
> Move the PFR reset call into ice_deinit_dev(),
> wait until the reset and all pending transactions
> are done, then call ice_clear_interrupt_scheme().
>
> This introduces a PFR reset to multiple error paths.
>
> Additionally, remove the reset call from ice_load();
> it is now part of ice_unload().
>
> Error example:
> [ 75.229328] ice 0000:ca:00.1: Failed to read Tx Scheduler Tree - User Selection data from flash
> [ 77.571315] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [ 77.571418] {1}[Hardware Error]: event severity: recoverable
> [ 77.571459] {1}[Hardware Error]: Error 0, type: recoverable
> [ 77.571500] {1}[Hardware Error]: section_type: PCIe error
> [ 77.571540] {1}[Hardware Error]: port_type: 4, root port
> [ 77.571580] {1}[Hardware Error]: version: 3.0
> [ 77.571615] {1}[Hardware Error]: command: 0x0547, status: 0x4010
> [ 77.571661] {1}[Hardware Error]: device_id: 0000:c9:02.0
> [ 77.571703] {1}[Hardware Error]: slot: 25
> [ 77.571736] {1}[Hardware Error]: secondary_bus: 0xca
> [ 77.571773] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x347a
> [ 77.571821] {1}[Hardware Error]: class_code: 060400
> [ 77.571858] {1}[Hardware Error]: bridge: secondary_status: 0x2800, control: 0x0013
> [ 77.572490] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
> [ 77.572870] pcieport 0000:c9:02.0: [21] ACSViol (First)
> [ 77.573222] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [ 77.573554] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
> [ 77.691273] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [ 77.691738] {2}[Hardware Error]: event severity: recoverable
> [ 77.691971] {2}[Hardware Error]: Error 0, type: recoverable
> [ 77.692192] {2}[Hardware Error]: section_type: PCIe error
> [ 77.692403] {2}[Hardware Error]: port_type: 4, root port
> [ 77.692616] {2}[Hardware Error]: version: 3.0
> [ 77.692825] {2}[Hardware Error]: command: 0x0547, status: 0x4010
> [ 77.693032] {2}[Hardware Error]: device_id: 0000:c9:02.0
> [ 77.693238] {2}[Hardware Error]: slot: 25
> [ 77.693440] {2}[Hardware Error]: secondary_bus: 0xca
> [ 77.693641] {2}[Hardware Error]: vendor_id: 0x8086, device_id: 0x347a
> [ 77.693853] {2}[Hardware Error]: class_code: 060400
> [ 77.694054] {2}[Hardware Error]: bridge: secondary_status: 0x0800, control: 0x0013
> [ 77.719115] pci 0000:ca:00.1: AER: can't recover (no error_detected callback)
> [ 77.719140] pcieport 0000:c9:02.0: AER: device recovery failed
> [ 77.719216] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
> [ 77.719390] pcieport 0000:c9:02.0: [21] ACSViol (First)
> [ 77.719557] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [ 77.719723] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
>
> Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
> Signed-off-by: Jakub Buchocki <jakubx.buchocki@intel.com>
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
* Re: [PATCH net v2] ice: Fix ice module unload
From: patchwork-bot+netdevbpf @ 2023-06-15 6:10 UTC
To: Tony Nguyen
Cc: davem, kuba, pabeni, edumazet, netdev, jakubx.buchocki,
michal.swiatkowski, jiri, przemyslaw.kitszel,
himasekharx.reddy.pucha
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 12 Jun 2023 10:14:21 -0700 you wrote:
> From: Jakub Buchocki <jakubx.buchocki@intel.com>
>
> Clearing the interrupt scheme before the PFR reset
> during the removal routine could cause hardware
> errors and possibly lead to a system reboot, as the
> PF reset can generate an interrupt.
>
> [...]
Here is the summary with links:
- [net,v2] ice: Fix ice module unload
https://git.kernel.org/netdev/net/c/24b454bc354a
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html