linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset
@ 2015-10-07  3:12 Gavin Shan
  2015-10-07  3:12 ` [PATCH 2/5] powerpc/eeh: More relexed hotplug criterion Gavin Shan
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Gavin Shan @ 2015-10-07  3:12 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

On PowerNV platform, the PE is kept in frozen state until the PE
reset is completed to avoid recursive EEH error caused by MMIO
access during the period of EEH reset. The PE's frozen state is
cleared after BARs of PCI device included in the PE are restored
and enabled. However, we needn't clear the frozen state for PHB PE
explicitly at this point as there is no real PE for PHB PE. As the
PHB PE is always binding with PE#0, we actually clear PE#0, which
is wrong. It doesn't incur any problem though.

This checks if the PE is PHB PE and doesn't clear the frozen state
if it is.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 89eb4bc..3a626ed 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -587,10 +587,16 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	eeh_ops->configure_bridge(pe);
 	eeh_pe_restore_bars(pe);
 
-	/* Clear frozen state */
-	rc = eeh_clear_pe_frozen_state(pe, false);
-	if (rc)
-		return rc;
+	/*
+	 * If it's PHB PE, the frozen state on all available PEs should have
+	 * been cleared by the PHB reset. Otherwise, we unfreeze the PE and its
+	 * child PEs because they might be in frozen state.
+	 */
+	if (!(pe->type & EEH_PE_PHB)) {
+		rc = eeh_clear_pe_frozen_state(pe, false);
+		if (rc)
+			return rc;
+	}
 
 	/* Give the system 5 seconds to finish running the user-space
 	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/5] powerpc/eeh: More relexed hotplug criterion
  2015-10-07  3:12 [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
@ 2015-10-07  3:12 ` Gavin Shan
  2015-10-07  3:12 ` [PATCH 3/5] powerpc/eeh: Force reset on fenced PHB Gavin Shan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Gavin Shan @ 2015-10-07  3:12 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

Currently, we rely on the existence of struct pci_driver::err_handler
to judge if the corresponding PCI device should be unplugged during
EEH recovery (partially hotplug case). However, it's not elaborate.
some device drivers are implementing part of the EEH error handlers
to collect diag-data. That means the driver still expects a hotplug
to recover from the EEH error.

This makes the hotplug criterion more relaxed: if the device driver
doesn't provide all necessary EEH error handlers, it will experience
hotplug during EEH recovery.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 3a626ed..32178a4 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (driver) {
 		eeh_pcid_put(dev);
-		if (driver->err_handler)
+		if (driver->err_handler &&
+		    driver->err_handler->error_detected &&
+		    driver->err_handler->slot_reset &&
+		    driver->err_handler->resume)
 			return NULL;
 	}
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/5] powerpc/eeh: Force reset on fenced PHB
  2015-10-07  3:12 [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
  2015-10-07  3:12 ` [PATCH 2/5] powerpc/eeh: More relexed hotplug criterion Gavin Shan
@ 2015-10-07  3:12 ` Gavin Shan
  2015-10-07  3:12 ` [PATCH 4/5] powerpc/eeh: More relxed condition for enabled IO path Gavin Shan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Gavin Shan @ 2015-10-07  3:12 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

On fenced PHB, the error handlers in the drivers of its subordinate
devices could return PCI_ERS_RESULT_CAN_RECOVER, indicating no reset
will be issued during the recovery. It's conflicting with the fact
that fenced PHB won't be recovered without reset.

This limits the return value from the error handlers in the drivers
of the fenced PHB's subordinate devices to PCI_ERS_RESULT_NEED_NONE
or PCI_ERS_RESULT_NEED_RESET, to ensure reset will be issued during
recovery.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 32178a4..76d918b 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -664,9 +664,17 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
 	 * to accomplish the reset.  Each child gets a report of the
 	 * status ... if any child can't handle the reset, then the entire
 	 * slot is dlpar removed and added.
+	 *
+	 * When the PHB is fenced, we have to issue a reset to recover from
+	 * the error. Override the result if necessary to have partially
+	 * hotplug for this case.
 	 */
 	pr_info("EEH: Notify device drivers to shutdown\n");
 	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
+	if ((pe->type & EEH_PE_PHB) &&
+	    result != PCI_ERS_RESULT_NEED_NONE &&
+	    result != PCI_ERS_RESULT_NEED_RESET)
+		result = PCI_ERS_RESULT_NEED_RESET;
 
 	/* Get the current PCI slot state. This can take a long time,
 	 * sometimes over 300 seconds for certain systems.
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/5] powerpc/eeh: More relxed condition for enabled IO path
  2015-10-07  3:12 [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
  2015-10-07  3:12 ` [PATCH 2/5] powerpc/eeh: More relexed hotplug criterion Gavin Shan
  2015-10-07  3:12 ` [PATCH 3/5] powerpc/eeh: Force reset on fenced PHB Gavin Shan
@ 2015-10-07  3:12 ` Gavin Shan
  2015-10-07  3:12 ` [PATCH 5/5] powerpc/pseries: Cleanup on pseries_eeh_get_state() Gavin Shan
  2015-10-07 23:20 ` [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
  4 siblings, 0 replies; 7+ messages in thread
From: Gavin Shan @ 2015-10-07  3:12 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

When one of below two flags or both of them are marked in the PE
state, the PE's IO path is regarded as enabled: EEH_STATE_MMIO_ACTIVE
or EEH_STATE_MMIO_ENABLED.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index e968533..ddbf406 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -630,7 +630,7 @@ int eeh_pci_enable(struct eeh_pe *pe, int function)
 	 */
 	switch (function) {
 	case EEH_OPT_THAW_MMIO:
-		active_flag = EEH_STATE_MMIO_ACTIVE;
+		active_flag = EEH_STATE_MMIO_ACTIVE | EEH_STATE_MMIO_ENABLED;
 		break;
 	case EEH_OPT_THAW_DMA:
 		active_flag = EEH_STATE_DMA_ACTIVE;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 5/5] powerpc/pseries: Cleanup on pseries_eeh_get_state()
  2015-10-07  3:12 [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
                   ` (2 preceding siblings ...)
  2015-10-07  3:12 ` [PATCH 4/5] powerpc/eeh: More relxed condition for enabled IO path Gavin Shan
@ 2015-10-07  3:12 ` Gavin Shan
  2015-10-07 23:20 ` [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
  4 siblings, 0 replies; 7+ messages in thread
From: Gavin Shan @ 2015-10-07  3:12 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

This cleans up pseries_eeh_get_state(), no functional changes:

   * Return EEH_STATE_NOT_SUPPORT early when the 2nd RTAS output
     argument is zero to avoid nested if statements.
   * Skip clearing bits in the PE state represented by variable
     "result" to simplify the code.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 60 ++++++++++++----------------
 1 file changed, 26 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 1ba55d0..ac3ffd9 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -433,42 +433,34 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
 		return ret;
 
 	/* Parse the result out */
-	result = 0;
-	if (rets[1]) {
-		switch(rets[0]) {
-		case 0:
-			result &= ~EEH_STATE_RESET_ACTIVE;
-			result |= EEH_STATE_MMIO_ACTIVE;
-			result |= EEH_STATE_DMA_ACTIVE;
-			break;
-		case 1:
-			result |= EEH_STATE_RESET_ACTIVE;
-			result |= EEH_STATE_MMIO_ACTIVE;
-			result |= EEH_STATE_DMA_ACTIVE;
-			break;
-		case 2:
-			result &= ~EEH_STATE_RESET_ACTIVE;
-			result &= ~EEH_STATE_MMIO_ACTIVE;
-			result &= ~EEH_STATE_DMA_ACTIVE;
-			break;
-		case 4:
-			result &= ~EEH_STATE_RESET_ACTIVE;
-			result &= ~EEH_STATE_MMIO_ACTIVE;
-			result &= ~EEH_STATE_DMA_ACTIVE;
-			result |= EEH_STATE_MMIO_ENABLED;
-			break;
-		case 5:
-			if (rets[2]) {
-				if (state) *state = rets[2];
-				result = EEH_STATE_UNAVAILABLE;
-			} else {
-				result = EEH_STATE_NOT_SUPPORT;
-			}
-			break;
-		default:
+	if (!rets[1])
+		return EEH_STATE_NOT_SUPPORT;
+
+	switch(rets[0]) {
+	case 0:
+		result = EEH_STATE_MMIO_ACTIVE |
+			 EEH_STATE_DMA_ACTIVE;
+		break;
+	case 1:
+		result = EEH_STATE_RESET_ACTIVE |
+			 EEH_STATE_MMIO_ACTIVE  |
+			 EEH_STATE_DMA_ACTIVE;
+		break;
+	case 2:
+		result = 0;
+		break;
+	case 4:
+		result = EEH_STATE_MMIO_ENABLED;
+		break;
+	case 5:
+		if (rets[2]) {
+			if (state) *state = rets[2];
+			result = EEH_STATE_UNAVAILABLE;
+		} else {
 			result = EEH_STATE_NOT_SUPPORT;
 		}
-	} else {
+		break;
+	default:
 		result = EEH_STATE_NOT_SUPPORT;
 	}
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset
  2015-10-07  3:12 [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
                   ` (3 preceding siblings ...)
  2015-10-07  3:12 ` [PATCH 5/5] powerpc/pseries: Cleanup on pseries_eeh_get_state() Gavin Shan
@ 2015-10-07 23:20 ` Gavin Shan
  2015-10-08  4:05   ` Gavin Shan
  4 siblings, 1 reply; 7+ messages in thread
From: Gavin Shan @ 2015-10-07 23:20 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev, mpe

On Wed, Oct 07, 2015 at 02:12:05PM +1100, Gavin Shan wrote:
>On PowerNV platform, the PE is kept in frozen state until the PE
>reset is completed to avoid recursive EEH error caused by MMIO
>access during the period of EEH reset. The PE's frozen state is
>cleared after BARs of PCI device included in the PE are restored
>and enabled. However, we needn't clear the frozen state for PHB PE
>explicitly at this point as there is no real PE for PHB PE. As the
>PHB PE is always binding with PE#0, we actually clear PE#0, which
>is wrong. It doesn't incur any problem though.
>
>This checks if the PE is PHB PE and doesn't clear the frozen state
>if it is.
>
>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Michael, please ignore this series for now. One or two patches are
still missed. Also, this series needs respin.

Thanks,
Gavin

>---
> arch/powerpc/kernel/eeh_driver.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
>diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>index 89eb4bc..3a626ed 100644
>--- a/arch/powerpc/kernel/eeh_driver.c
>+++ b/arch/powerpc/kernel/eeh_driver.c
>@@ -587,10 +587,16 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
> 	eeh_ops->configure_bridge(pe);
> 	eeh_pe_restore_bars(pe);
>
>-	/* Clear frozen state */
>-	rc = eeh_clear_pe_frozen_state(pe, false);
>-	if (rc)
>-		return rc;
>+	/*
>+	 * If it's PHB PE, the frozen state on all available PEs should have
>+	 * been cleared by the PHB reset. Otherwise, we unfreeze the PE and its
>+	 * child PEs because they might be in frozen state.
>+	 */
>+	if (!(pe->type & EEH_PE_PHB)) {
>+		rc = eeh_clear_pe_frozen_state(pe, false);
>+		if (rc)
>+			return rc;
>+	}
>
> 	/* Give the system 5 seconds to finish running the user-space
> 	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
>-- 
>2.1.0
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset
  2015-10-07 23:20 ` [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
@ 2015-10-08  4:05   ` Gavin Shan
  0 siblings, 0 replies; 7+ messages in thread
From: Gavin Shan @ 2015-10-08  4:05 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev, mpe

On Thu, Oct 08, 2015 at 10:20:14AM +1100, Gavin Shan wrote:
>On Wed, Oct 07, 2015 at 02:12:05PM +1100, Gavin Shan wrote:
>>On PowerNV platform, the PE is kept in frozen state until the PE
>>reset is completed to avoid recursive EEH error caused by MMIO
>>access during the period of EEH reset. The PE's frozen state is
>>cleared after BARs of PCI device included in the PE are restored
>>and enabled. However, we needn't clear the frozen state for PHB PE
>>explicitly at this point as there is no real PE for PHB PE. As the
>>PHB PE is always binding with PE#0, we actually clear PE#0, which
>>is wrong. It doesn't incur any problem though.
>>
>>This checks if the PE is PHB PE and doesn't clear the frozen state
>>if it is.
>>
>>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>Michael, please ignore this series for now. One or two patches are
>still missed. Also, this series needs respin.
>

v2 was just sent out. Please look at v2 instead of this one.

Thanks,
Gavin

>>---
>> arch/powerpc/kernel/eeh_driver.c | 14 ++++++++++----
>> 1 file changed, 10 insertions(+), 4 deletions(-)
>>
>>diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>index 89eb4bc..3a626ed 100644
>>--- a/arch/powerpc/kernel/eeh_driver.c
>>+++ b/arch/powerpc/kernel/eeh_driver.c
>>@@ -587,10 +587,16 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>> 	eeh_ops->configure_bridge(pe);
>> 	eeh_pe_restore_bars(pe);
>>
>>-	/* Clear frozen state */
>>-	rc = eeh_clear_pe_frozen_state(pe, false);
>>-	if (rc)
>>-		return rc;
>>+	/*
>>+	 * If it's PHB PE, the frozen state on all available PEs should have
>>+	 * been cleared by the PHB reset. Otherwise, we unfreeze the PE and its
>>+	 * child PEs because they might be in frozen state.
>>+	 */
>>+	if (!(pe->type & EEH_PE_PHB)) {
>>+		rc = eeh_clear_pe_frozen_state(pe, false);
>>+		if (rc)
>>+			return rc;
>>+	}
>>
>> 	/* Give the system 5 seconds to finish running the user-space
>> 	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
>>-- 
>>2.1.0
>>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-10-08  4:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-07  3:12 [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
2015-10-07  3:12 ` [PATCH 2/5] powerpc/eeh: More relexed hotplug criterion Gavin Shan
2015-10-07  3:12 ` [PATCH 3/5] powerpc/eeh: Force reset on fenced PHB Gavin Shan
2015-10-07  3:12 ` [PATCH 4/5] powerpc/eeh: More relxed condition for enabled IO path Gavin Shan
2015-10-07  3:12 ` [PATCH 5/5] powerpc/pseries: Cleanup on pseries_eeh_get_state() Gavin Shan
2015-10-07 23:20 ` [PATCH 1/5] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
2015-10-08  4:05   ` Gavin Shan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).