* [PATCH v2] powerpc/eeh: Permanently disable the removed device
@ 2024-04-22 7:57 Ganesh Goudar
2024-05-03 10:41 ` Michael Ellerman
From: Ganesh Goudar @ 2024-04-22 7:57 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Sahitya.Damerla, Ganesh Goudar, mahesh
When a device is hot removed on powernv, the hotplug driver clears
the device's state. However, on pseries, if a device is removed by
phyp after reaching the error threshold, the kernel remains unaware,
leading to the device not being torn down. This prevents necessary
remediation actions like failover.

Permanently disable the device if the presence check fails.

Also, in eeh_dev_check_failure() we may treat the error as a false
positive if the device is hot-unplugged, because the get_state call
returns EEH_STATE_NOT_SUPPORT, and we may end up not clearing the
device state. So log the event if the device has not been moved to
the permanent failure state.

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
V2:
* Elaborate the commit message.
* Fix formatting issues in commit message and comments.
---
arch/powerpc/kernel/eeh.c | 11 ++++++++++-
arch/powerpc/kernel/eeh_driver.c | 13 +++++++++++--
2 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index ab316e155ea9..6670063a7a6c 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -506,9 +506,18 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
* We will punt with the following conditions: Failure to get
* PE's state, EEH not support and Permanently unavailable
* state, PE is in good state.
+ *
+ * On the pSeries, after reaching the threshold, get_state might
+ * return EEH_STATE_NOT_SUPPORT. However, it's possible that the
+ * device state remains uncleared if the device is not marked
+ * pci_channel_io_perm_failure. Therefore, consider logging the
+ * event to let device removal happen.
+ *
*/
if ((ret < 0) ||
- (ret == EEH_STATE_NOT_SUPPORT) || eeh_state_active(ret)) {
+ (ret == EEH_STATE_NOT_SUPPORT &&
+ dev->error_state == pci_channel_io_perm_failure) ||
+ eeh_state_active(ret)) {
eeh_stats.false_positives++;
pe->false_positives++;
rc = 0;
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 48773d2d9be3..7efe04c68f0f 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -865,9 +865,18 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
devices++;
if (!devices) {
- pr_debug("EEH: Frozen PHB#%x-PE#%x is empty!\n",
+ pr_warn("EEH: Frozen PHB#%x-PE#%x is empty!\n",
pe->phb->global_number, pe->addr);
- goto out; /* nothing to recover */
+ /*
+ * The device is removed, tear down its state, on powernv
+ * hotplug driver would take care of it but not on pseries,
+ * permanently disable the card as it is hot removed.
+ *
+ * In the case of powernv, note that the removal of device
+ * is covered by pci rescan lock, so no problem even if hotplug
+ * driver attempts to remove the device.
+ */
+ goto recover_failed;
}
/* Log the event */
--
2.44.0
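
The eeh.c hunk above narrows the "punt" condition in eeh_dev_check_failure(). A minimal, standalone sketch of that predicate before and after the change may make the difference easier to see. It is not kernel code: the SKETCH_* constants, sketch_state_active() and the perm_failure flag are hypothetical stand-ins for EEH_STATE_NOT_SUPPORT, eeh_state_active() and the pci_channel_io_perm_failure comparison, and the bit values are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's state bits. The values are
 * illustrative and are NOT the kernel's definitions. */
#define SKETCH_STATE_NOT_SUPPORT (1 << 0)
#define SKETCH_STATE_ACTIVE      (1 << 1)

/* Simplified analogue of eeh_state_active(): a single bit is tested
 * here; the kernel helper's real check is not reproduced. */
static bool sketch_state_active(int ret)
{
    return (ret & SKETCH_STATE_ACTIVE) != 0;
}

/* Condition before the patch: NOT_SUPPORT always counts as a false
 * positive, whatever the device's channel state is. */
static bool sketch_punt_before(int ret, bool perm_failure)
{
    (void)perm_failure; /* not consulted by the old condition */
    return ret < 0 ||
           ret == SKETCH_STATE_NOT_SUPPORT ||
           sketch_state_active(ret);
}

/* Condition after the patch: NOT_SUPPORT counts as a false positive
 * only if the device is already marked permanently failed. */
static bool sketch_punt_after(int ret, bool perm_failure)
{
    return ret < 0 ||
           (ret == SKETCH_STATE_NOT_SUPPORT && perm_failure) ||
           sketch_state_active(ret);
}

/* Walk through the cases the commit message cares about. */
static void sketch_punt_self_check(void)
{
    /* Removed device, not yet marked failed: old code punts, new code
     * lets the event through so it can be logged and handled. */
    assert(sketch_punt_before(SKETCH_STATE_NOT_SUPPORT, false));
    assert(!sketch_punt_after(SKETCH_STATE_NOT_SUPPORT, false));

    /* Already permanently failed: both versions punt. */
    assert(sketch_punt_before(SKETCH_STATE_NOT_SUPPORT, true));
    assert(sketch_punt_after(SKETCH_STATE_NOT_SUPPORT, true));

    /* Unchanged cases: get_state failure and a healthy PE. */
    assert(sketch_punt_before(-1, false) && sketch_punt_after(-1, false));
    assert(sketch_punt_before(SKETCH_STATE_ACTIVE, false));
    assert(sketch_punt_after(SKETCH_STATE_ACTIVE, false));
}
```

The only input for which the two versions disagree is NOT_SUPPORT on a device that has not yet reached the permanent failure state, which is the hot-removed pseries case described in the commit message.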
* Re: [PATCH v2] powerpc/eeh: Permanently disable the removed device
2024-04-22 7:57 [PATCH v2] powerpc/eeh: Permanently disable the removed device Ganesh Goudar
@ 2024-05-03 10:41 ` Michael Ellerman
From: Michael Ellerman @ 2024-05-03 10:41 UTC (permalink / raw)
To: linuxppc-dev, mpe, Ganesh Goudar; +Cc: Sahitya.Damerla, mahesh
On Mon, 22 Apr 2024 13:27:37 +0530, Ganesh Goudar wrote:
> When a device is hot removed on powernv, the hotplug driver clears
> the device's state. However, on pseries, if a device is removed by
> phyp after reaching the error threshold, the kernel remains unaware,
> leading to the device not being torn down. This prevents necessary
> remediation actions like failover.
>
> Permanently disable the device if the presence check fails.
>
> [...]
Applied to powerpc/next.
[1/1] powerpc/eeh: Permanently disable the removed device
https://git.kernel.org/powerpc/c/d1679b4fa1722e6bb4a17b13aacdc01a130ba362
cheers
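
For readers skimming the thread, the eeh_driver.c part of the applied patch ("Permanently disable the device if the presence check fails") can be modelled as a small standalone sketch. Everything here is an assumption made for illustration: struct sketch_dev, its present flag, the outcome enum and the patched switch are hypothetical and stand in for the kernel's presence check and its `goto out` / `goto recover_failed` paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes standing in for the kernel's control flow. */
enum sketch_outcome {
    SKETCH_SKIPPED,           /* old behaviour: "goto out"             */
    SKETCH_PERM_DISABLED,     /* new behaviour: "goto recover_failed"  */
    SKETCH_RECOVERY_ATTEMPTED /* at least one device is still present  */
};

/* Hypothetical device record; 'present' models the result of the
 * per-device presence check. */
struct sketch_dev {
    int present;
};

/* Count the devices that still answer the presence check. */
static size_t sketch_count_present(const struct sketch_dev *devs, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (devs[i].present)
            count++;
    return count;
}

/* Decide what to do with a frozen PE. 'patched' selects between the
 * behaviour before (0) and after (non-zero) the patch. */
static enum sketch_outcome
sketch_handle_frozen_pe(const struct sketch_dev *devs, size_t n, int patched)
{
    if (sketch_count_present(devs, n) == 0)
        return patched ? SKETCH_PERM_DISABLED : SKETCH_SKIPPED;
    return SKETCH_RECOVERY_ATTEMPTED;
}

/* Exercise the empty-PE and non-empty-PE cases. */
static void sketch_frozen_pe_self_check(void)
{
    const struct sketch_dev gone[]  = { { 0 }, { 0 } };
    const struct sketch_dev mixed[] = { { 0 }, { 1 } };

    assert(sketch_handle_frozen_pe(gone, 2, 0) == SKETCH_SKIPPED);
    assert(sketch_handle_frozen_pe(gone, 2, 1) == SKETCH_PERM_DISABLED);
    assert(sketch_handle_frozen_pe(mixed, 2, 0) == SKETCH_RECOVERY_ATTEMPTED);
    assert(sketch_handle_frozen_pe(mixed, 2, 1) == SKETCH_RECOVERY_ATTEMPTED);
}
```

In this model the patch changes only the empty-PE branch: a PE with no present devices is no longer silently skipped but takes the permanent-failure path, while PEs that still have devices go through recovery as before.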