* [PATCH] powerpc/eeh: Enable IO path on permanent error
From: Gavin Shan @ 2017-01-05 23:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: ruscur, mpe, Gavin Shan
On a permanent error we give up recovery and simply shut down and remove
the affected devices. If the devices can't be put into a quiet state,
they keep generating PCI traffic that is likely to cause another
unexpected EEH error. This was observed on a "p8dtu2u" machine with the
following devices:

0002:00:00.0 PCI bridge: IBM Device 03dc
0002:01:00.0 Ethernet controller: Intel Corporation \
Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.1 Ethernet controller: Intel Corporation \
Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.2 Ethernet controller: Intel Corporation \
Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.3 Ethernet controller: Intel Corporation \
Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)

On the P8 PowerNV platform, the IO path is still frozen while the devices
are being shut down, so their memory-mapped registers are inaccessible.
This is why the devices can't be put into a quiet state before they are
removed. Fix the issue by enabling the IO path before the devices are
put into a quiet state.

Link: https://github.com/open-power/supermicro-openpower/issues/419
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 8180bfd..9de7f79 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -298,9 +298,17 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 	 *
 	 * For pHyp, we have to enable IO for log retrieval. Otherwise,
 	 * 0xFF's is always returned from PCI config space.
+	 *
+	 * When the @severity is EEH_LOG_PERM, the PE is going to be
+	 * removed. Prior to that, the drivers for devices included in
+	 * the PE will be closed. The drivers rely on working IO path
+	 * to bring the devices to quiet state. Otherwise, PCI traffic
+	 * from those devices after they are removed is likely to cause
+	 * another unexpected EEH error.
 	 */
 	if (!(pe->type & EEH_PE_PHB)) {
-		if (eeh_has_flag(EEH_ENABLE_IO_FOR_LOG))
+		if (eeh_has_flag(EEH_ENABLE_IO_FOR_LOG) ||
+		    severity == EEH_LOG_PERM)
 			eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
 
 		/*
--
2.7.4
* Re: [PATCH] powerpc/eeh: Enable IO path on permanent error
From: Russell Currey @ 2017-01-05 23:46 UTC (permalink / raw)
To: Gavin Shan, linuxppc-dev
On Fri, 2017-01-06 at 10:39 +1100, Gavin Shan wrote:
> Link: https://github.com/open-power/supermicro-openpower/issues/419
FYI this link isn't publicly accessible.
* Re: [PATCH] powerpc/eeh: Enable IO path on permanent error
From: Gavin Shan @ 2017-01-06 1:56 UTC (permalink / raw)
To: Russell Currey; +Cc: Gavin Shan, linuxppc-dev, mpe
On Fri, Jan 06, 2017 at 10:46:21AM +1100, Russell Currey wrote:
>On Fri, 2017-01-06 at 10:39 +1100, Gavin Shan wrote:
>> Link: https://github.com/open-power/supermicro-openpower/issues/419
>
>FYI this link isn't publicly accessible.
>
Yeah, I know. I included it anyway because it has more details, which
may be useful to you or me.
* Re: [PATCH] powerpc/eeh: Enable IO path on permanent error
From: Russell Currey @ 2017-01-18 4:06 UTC (permalink / raw)
To: Gavin Shan, linuxppc-dev
On Fri, 2017-01-06 at 10:39 +1100, Gavin Shan wrote:
(forgot to ack this)
Acked-by: Russell Currey <ruscur@russell.cc>
* Re: powerpc/eeh: Enable IO path on permanent error
From: Michael Ellerman @ 2017-01-18 12:10 UTC (permalink / raw)
To: Gavin Shan, linuxppc-dev; +Cc: Gavin Shan
On Thu, 2017-01-05 at 23:39:49 UTC, Gavin Shan wrote:
> Link: https://github.com/open-power/supermicro-openpower/issues/419
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Acked-by: Russell Currey <ruscur@russell.cc>
Applied to powerpc fixes, thanks.
https://git.kernel.org/powerpc/c/387bbc974f6adf91aa635090f73434
cheers