* [Intel-wired-lan] [v1] e1000e: EEH on e1000e adapter detects io perm failure can trigger crash
From: David Dai @ 2019-10-03 16:54 UTC (permalink / raw)
  To: intel-wired-lan

When EEH reports a permanent I/O failure on an e1000e adapter, the
kernel crashes with this stack:
EEH: Beginning: 'error_detected(permanent failure)'
EEH: PE#900000 (PCI 0115:90:00.1): Invoking e1000e->error_detected(permanent failure)
EEH: PE#900000 (PCI 0115:90:00.1): e1000e driver reports: 'disconnect'
EEH: PE#900000 (PCI 0115:90:00.0): Invoking e1000e->error_detected(permanent failure)
EEH: PE#900000 (PCI 0115:90:00.0): e1000e driver reports: 'disconnect'
EEH: Finished:'error_detected(permanent failure)'
Oops: Exception in kernel mode, sig: 5 [#1]
NIP [c0000000007b1be0] free_msi_irqs+0xa0/0x280
 LR [c0000000007b1bd0] free_msi_irqs+0x90/0x280
Call Trace:
[c0000004f491ba10] [c0000000007b1bd0] free_msi_irqs+0x90/0x280 (unreliable)
[c0000004f491ba70] [c0000000007b260c] pci_disable_msi+0x13c/0x180
[c0000004f491bab0] [d0000000046381ac] e1000_remove+0x234/0x2a0 [e1000e]
[c0000004f491baf0] [c000000000783cec] pci_device_remove+0x6c/0x120
[c0000004f491bb30] [c00000000088da6c] device_release_driver_internal+0x2bc/0x3f0
[c0000004f491bb80] [c00000000076f5a8] pci_stop_and_remove_bus_device+0xb8/0x110
[c0000004f491bbc0] [c00000000006e890] pci_hp_remove_devices+0x90/0x130
[c0000004f491bc50] [c00000000004ad34] eeh_handle_normal_event+0x1d4/0x660
[c0000004f491bd10] [c00000000004bf10] eeh_event_handler+0x1c0/0x1e0
[c0000004f491bdc0] [c00000000017c4ac] kthread+0x1ac/0x1c0
[c0000004f491be30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80

Basically, the e1000e IRQs have not been freed by the time EEH tries to
remove the e1000e device.
When e1000e_close() is called to bring down the NIC and the adapter's
error_state is pci_channel_io_perm_failure, it should also bring down
the link and free the IRQs.
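
For illustration only, here is a minimal userspace sketch of the close-path
condition this patch proposes. It is not driver code: struct mock_adapter,
enum mock_channel_state, close_must_teardown() and mock_close() are
hypothetical stand-ins invented for this sketch, and it assumes a state in
which the "down" flag is already set while the IRQ is still requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PCI channel state; not the kernel enum. */
enum mock_channel_state {
	MOCK_IO_NORMAL,
	MOCK_IO_FROZEN,
	MOCK_IO_PERM_FAILURE,
};

/* Hypothetical stand-in for the adapter; only the fields the sketch needs. */
struct mock_adapter {
	bool down;                           /* models the __E1000_DOWN bit */
	enum mock_channel_state error_state; /* models pdev->error_state */
	bool irq_requested;                  /* true while an IRQ is held */
};

/*
 * Models the patched condition: tear down if the adapter is not yet
 * marked down, or if the channel has failed permanently.
 */
static bool close_must_teardown(const struct mock_adapter *a)
{
	assert(a != NULL);
	return !a->down || a->error_state == MOCK_IO_PERM_FAILURE;
}

/* Models the close path: bring the adapter down and release the IRQ. */
static void mock_close(struct mock_adapter *a)
{
	assert(a != NULL);
	if (close_must_teardown(a)) {
		a->down = true;            /* stands in for e1000e_down() */
		a->irq_requested = false;  /* stands in for e1000_free_irq() */
	}
}
```

With the original condition (only the "down" check), an adapter that is
already marked down but still holds its IRQ would skip the teardown; with
the extra error_state check, the sketch releases the IRQ in that case.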

Reported-by: Morumuri Srivalli  <smorumu1@in.ibm.com>
Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index d7d56e4..cf618e1 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4715,7 +4715,8 @@ int e1000e_close(struct net_device *netdev)
 
 	pm_runtime_get_sync(&pdev->dev);
 
-	if (!test_bit(__E1000_DOWN, &adapter->state)) {
+	if (!test_bit(__E1000_DOWN, &adapter->state) ||
+	    (adapter->pdev->error_state == pci_channel_io_perm_failure)) {
 		e1000e_down(adapter, true);
 		e1000_free_irq(adapter);
 
-- 
1.7.1